Best Undocumented Feature of Backbone.js

It’s fairly well known that you can override Backbone.sync in order to provide your own backing store.*  However, what isn’t documented but is clearly visible in the source, is that this.sync not only exists but takes priority.  This is pretty vital if you ever want a presentation-model style collection, where adding and removing causes changes to other models in your application.  I’ll warn you that you need to step through the framework code fairly carefully to get this to work, but it’s really useful.
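To make the lookup order concrete, here is a minimal sketch in plain JavaScript.  It does not load Backbone: FakeBackbone and FakeCollection are hypothetical stand-ins that only mimic the "instance sync first, global sync second" behaviour described above, so treat it as an illustration rather than the framework's actual code.

```javascript
// Stand-in for the lookup order described above; this is NOT Backbone's code.
const FakeBackbone = {
  sync(method, model) {
    return `global:${method}`;
  },
};

class FakeCollection {
  constructor() {
    this.models = [];
  }
  fetch() {
    // Prefer an instance-level sync when one exists, else fall back to global.
    return (this.sync || FakeBackbone.sync).call(this, 'read', this);
  }
}

const plain = new FakeCollection();
const presentation = new FakeCollection();
// Hypothetical presentation-model collection: "reading" never hits a server.
presentation.sync = function (method, collection) {
  return `local:${method}`;
};

console.log(plain.fetch());        // global:read
console.log(presentation.fetch()); // local:read
```

Only the collection that defines its own sync changes behaviour; everything else keeps using the global backing store.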

*Another useful tip: if you’re using ASP.NET MVC, add a route like this

routes.MapRoute("Delete", "{controller}/{id}", new { action = "DELETE" },
    new { httpMethod = new HttpMethodConstraint("DELETE") });

Because ASP.NET MVC doesn’t seem to be able to cope with just slapping an [HttpDelete] attribute on an Index method.

Second Thoughts on Jetlang Remoting

So, after my first article I decided to ask Mike Rettig what I’d missed.  I have to admit to still not having had time to play with it myself;* however, his reply was sufficiently interesting that I asked to be allowed to share it.  He was kind enough to permit this.  There’s no explicit quoting here, but pretty much all the credit for the pain points section is owed to Mike.  Any errors are my own.

Publish/Subscribe Pain Points

In the original article I stated that client and server were identical.  This isn’t really true.  In particular, a server is an acceptor and a client is an initiator, in standard TCP style.  Although both can publish and subscribe, the presence of topics in the protocol allows for server side filtering.  To make all of this easier to understand, imagine you’re writing a simple application that prints up the last trade in a particular list of stocks.  In classic pub/sub as ZeroMQ implements, you’d connect up to a stream that published every trade as it went down.  This would mean that you were receiving many updates in which your client had no interest.  You’re flooding the network with stuff that doesn’t matter, and usually your network is your most limited resource (certainly, very few have to worry about processing power and storage anymore…).  Using the topic, the server can be made aware of the necessary filtering that needs to occur.

This brings us to the next strength of Jetlang Remoting that I missed, probably because I only read the protocol and not the code.  When you subscribe, the server receives an event telling it this has occurred.  This is actually really significant.  Let’s return to our example application.  Let’s say we’re interested in the trades of some obscure stock, call it Recondite Inc.  Now, Recondite is extremely illiquid and only trades once an hour.  In the classic model, you will subscribe and only get told about the stock when the next trade occurs.  Pretty much everyone regards this as part and parcel of the way pub/sub works and the way around it is fairly well understood: you just get the current state via an RPC first.  However, with the Jetlang model, the server is not only aware of what interests you, the server is aware of when you began.  Therefore it can publish the initial state through the topic.
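Both ideas, filtering by topic on the server and sending the current state when a subscription arrives, fit in a few lines.  This is a hypothetical in-memory broker written for illustration: TopicBroker and the topic names are my own inventions, not part of Jetlang.

```javascript
// Hypothetical in-memory broker; none of these names come from Jetlang.
class TopicBroker {
  constructor() {
    this.subscribers = new Map(); // topic -> Set of callbacks
    this.lastValue = new Map();   // topic -> most recent message
  }
  subscribe(topic, callback) {
    if (!this.subscribers.has(topic)) this.subscribers.set(topic, new Set());
    this.subscribers.get(topic).add(callback);
    // The broker sees the subscription, so it can send the current state now.
    if (this.lastValue.has(topic)) callback(this.lastValue.get(topic));
  }
  publish(topic, message) {
    this.lastValue.set(topic, message);
    // Server-side filtering: only subscribers of this topic get the message.
    for (const cb of this.subscribers.get(topic) || []) cb(message);
  }
}

const broker = new TopicBroker();
broker.publish('trades/RECONDITE', { price: 12.5 });
broker.publish('trades/BUSY', { price: 99 });

const received = [];
broker.subscribe('trades/RECONDITE', (m) => received.push(m.price));
broker.publish('trades/BUSY', { price: 100 });
broker.publish('trades/RECONDITE', { price: 12.75 });
console.log(received); // [ 12.5, 12.75 ]
```

The subscriber gets the last Recondite trade immediately, never sees the busy stock, and needs no separate RPC to fetch the initial state.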

These aren’t obscure points.  Most systems I’ve worked with that involved any form of real-time data distribution encountered both of those pain points.

The last point that Mike made concerns versioning.  The topic nature of subscriptions makes it relatively easy to identify the version of the client.  In turn this enables the server to filter messages that it knows the client can’t handle.  Achieving this without server-side filtering is possible, but only by throwing ports at the problem.

Complex Subscriptions

Lastly, I made a point last time that topics could have been made byte arrays, for complex subscription, but in practice the fact that every reply contains the topic suggests you’ll want to keep the topic small.  An alternative, but more complex arrangement, would be for subscriptions to return a subscription ID first and then send data messages with the subscription ID.  Once you recognize that model, you can see that you could set up such topics using an RPC call first.  Jetlang therefore provides the flexibility to do complex subscriptions without requiring every program to implement code for it.

*I’m spending most of my time with client side UI and backbone.js at the moment, of which more another time.

Node.js Thinks Like A Man

Any man with a wife or girlfriend knows a fundamental fact: women can multi-task and men can’t.  V8 in this respect is exactly like a man: it can only do one thing at once.  Node.js uses V8 and makes a virtue of it.  Only doing one thing at once means there’s no race conditions in your memory model.  None.  I’ve lost count of the number of times I’ve been writing node.js code and worrying about “what happens if the array length changes while I’m reading” before realizing it 100% can’t happen.

One thing that men know is that you don’t actually need to do two things at once.  Consider how you make a cup of tea

kettle.boil()
cup.fill()
cup.waitUntilBrewed()
cup.serve()

Well, actually, if you want to do the dishes at the same time, this isn’t how you do it at all.  What you do is switch the kettle on, then you go and start washing.  When the kettle clicks, you then stop what you were doing and fill the cup.

kettle.switchOn()
kettle.on 'boiled', -> cup.fill()
cup.on 'brewed', -> cup.serve()

Notice that we’ve lost the names “boil” and “waitUntilBrewed”.  These are words that denote processes.  Everything in node.js is asynchronous, and process verbs imply a synchronous approach.  The words we are left with indicate actions and events.  It’s also worth noticing that with a multi-threaded environment, you’d be very careful to ensure that event bindings were set up before you kicked off the process that could fire the event.  With node.js, you’re guaranteed that your current code will finish before anything else happens.  While you’re talking, you have node’s undivided attention until you stop talking.
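Here is the same tea-making sequence as a runnable JavaScript sketch using Node's built-in EventEmitter.  The kettle and cup objects are hypothetical, and setImmediate stands in for the time the water takes to boil and the tea to brew; the point is only that handlers bound after switchOn() are still in place before any event can fire.

```javascript
const { EventEmitter } = require('events');

const log = [];
const kettle = new EventEmitter();
const cup = new EventEmitter();

// Hypothetical kettle: boiling finishes on a later turn of the event loop.
kettle.switchOn = () => {
  log.push('switched on');
  setImmediate(() => kettle.emit('boiled'));
};
cup.fill = () => {
  log.push('filled');
  setImmediate(() => cup.emit('brewed'));
};
cup.serve = () => log.push('served');

kettle.switchOn();
// Bound after the process started, yet still in time: nothing else can run
// until this block of synchronous code has finished.
kettle.on('boiled', () => cup.fill());
cup.on('brewed', () => cup.serve());
log.push('handlers bound');
```

By the time the event loop gets a turn, the log already reads "switched on, handlers bound"; "filled" and "served" follow afterwards.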

Doing The Dishes

So, how do you do the dishes?  Well, the simplest way to handle this would be

_.forEach dishes, (d) -> d.wash()

If you do this, you’ll never hear the kettle boil.  Remember, a man can only do one thing at once, even pay attention.  So, you need a way to specify that you want to pay attention

interruptibleForEach = (list, f) ->
    index = 0
    action = ->
        return if index >= list.length
        f list[index]
        index++
        process.nextTick action
    action()

You might want to compare this code to the Caliburn Micro Co-Routine Trick.  It’s fundamentally the same code.  There’s a catch, though.  You won’t stop washing dishes for very long.  In fact, you’re guaranteed to wash a dish between filling the cup and it being brewed to your satisfaction.  Now, you could generalize all of this to interruptible processes, but you’d be missing the point: doing a synchronous process like washing dishes is a fundamentally bad idea in node.js.  If you can’t figure out a decent way to express it as a series of events, you need to hive it off to a sub-process.  You then communicate with that through asynchronous RPC or pub/sub.  I’m going to avoid any obvious gender stereotype humour here.  If you want that sort of stuff, you can read the ZeroMQ Guide.
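For reference, here is the interruptible loop in plain JavaScript, with one change that matters on current Node.js releases.  As far as I know, process.nextTick callbacks now run before the event loop moves on, so a loop that keeps rescheduling itself with nextTick would not let timers or I/O in; setImmediate is the call that yields.  The done callback and the timer standing in for the kettle are my own additions for the demonstration.

```javascript
const events = [];

function interruptibleForEach(list, f, done) {
  let index = 0;
  const action = () => {
    if (index >= list.length) {
      if (done) done();
      return;
    }
    f(list[index]);
    index++;
    setImmediate(action); // yield to the event loop between items
  };
  action();
}

// A timer standing in for the kettle boiling while the washing is under way.
setTimeout(() => events.push('kettle boiled'), 0);
interruptibleForEach(
  ['plate', 'bowl', 'mug'],
  (d) => events.push(`washed ${d}`),
  () => console.log(events)
);
```

The first dish is washed synchronously; after that the timer gets a chance to fire between dishes, though exactly where it lands depends on timing.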

Personal Note: I was hit by Hurricane Penelope six weeks ago.  She is insanely gorgeous and an awful lot of work.  As a consequence, it’s quite hard to get out blog articles, so please forgive me any stupid errors and formatting problems.  Next time, I hope to talk about Jetlang Remoting again, but it might take a while…

Debugging Usages of the Thrush Operator

Creating long chains of transformation using the thrush operator looks attractive, but what if you need to see what it’s doing?

Consider the following code:

(-> 1  
  inc
  inc)

Now it’s obvious that this returns 3, and that the intermediate value was 2, but the following code proves it:

(defn thread-println [v prefix]
  (doto v (->> (str prefix) println)))

(-> 1
  inc
  (thread-println "Intermediate value was ")
  inc)

You’d need to write a second function for ->>, sadly.

Another Go at Explaining The Thrush Operator In Clojure

I was recently at a dojo where a few people asked what on earth those -> and ->> symbols did, so I thought I’d have another go at trying to explain them.  You can find my C# heavy first go here.

To start with, (-> a b c d) in clojure expands to (d (c (b a))).  You could describe this as reversing the arguments but I don’t find that very helpful.  I find it easier to think about it imperatively:

  • start with a
  • do b to it
  • do c to the result
  • do d to the result of that

Or, to express it another way (-> a b c d) is equivalent to:

(let [r0 a
      r1 (b r0)
      r2 (c r1)
      r3 (d r2)]
  r3)

However, unlike the two alternative forms above, (-> a b c d) is pretty concise and doesn’t involve a lot of brackets.
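For readers more at home in JavaScript, the same idea can be sketched as a reduce over a list of functions.  The thread helper below is hypothetical, and because it is an ordinary function rather than a macro it only covers the simple case where every step takes a single argument.

```javascript
// Hypothetical helper: feed a value through each function in turn.
const thread = (value, ...fns) => fns.reduce((result, f) => f(result), value);

const b = (x) => x + 1;
const c = (x) => x * 10;
const d = (x) => `result: ${x}`;

console.log(thread(1, b, c, d)); // result: 20
console.log(d(c(b(1))));         // result: 20
```

The first call reads in the order the steps happen; the second is the nested form it stands for.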

Composing Functions

Another way to think of it is like a function composition operator.  Imagine you had code like the following:

(map move predators)

But then you wanted the predators to think before moving.  Naively, you could write

(map move (map think predators))

Now, Clojure does have a perfectly sensible function composition operator, so you could write

(map (comp move think) predators)

but even this disguises the fact that you wanted think to occur before move.  With ->, the code reads in the order things happen:

(map #(-> % think move) predators)

Multiple Parameters

So, if you thread through single parameter functions, -> is effectively the same as comp with the arguments reversed.  If you had functions with multiple parameters, though, getting comp to do what you want would involve a lot of lambda functions.  However, -> is a macro and has some more tricks up its sleeve.  With multiple parameters, -> puts in the argument as the first parameter of the function.*

So (-> p (a b) (c d)) expands to (c (a p b) d).  This may look weird, but it helps if you put your OO hat back on and remember that a lot of functions take “the object” as the first parameter and return a mutated object.  For instance, the jQuery home page contains the following code:

$("p.neat").addClass("ohmy").show("slow");

This could be written in clojure as

(-> ($ "p.neat")
    (addClass "ohmy")
    (show "slow"))

In general terms, if I’m doing anything even slightly complex, I’ll put each form on a separate line for clarity.

*Formally, it inserts it as the second form, a distinction that I wouldn’t worry about at this point.

So what does ->> do, then?

There’s only one difference between -> and ->>: where it inserts the result into the next function.  -> puts it in as the first parameter, ->> puts it in as the last.  So (-> a (b c)) is equivalent to (b a c) while (->> a (b c)) is equivalent to (b c a).  This makes ->> feel a lot more like you’re using “partial” on every line.  You’re pretty likely to need to use ->> if you’re dealing with sequences, because pretty much all of the clojure sequence functions take the sequence as the last parameter.  Here’s a simplified example from the dojo code.

(->> animals
     (filter surviving?)
     (map move))

So, the code says “take the animals, strip out the ones that didn’t make it, then move the rest”.

If you just thread through single parameter functions (or macros), there’s no difference between -> and ->>.

Using the Thrush Operator

I think the thrush/threading operators make people uncomfortable because it’s the first time they encounter the power of macros in Clojure.  However, it’s a really useful tool that can make your code significantly more concise.  As with all concise code, you need to find the correct balance with readability.  The more complex your intermediate expressions are, the more likely you are to need a let block to clearly express what you’re trying to do.

Using the min-key function in Clojure

I was discussing today at work the problem of dealing with a variable length list of parameters in a typeless language.  If you explicitly declare the variable part of your parameters to be a list of strings and someone only passes in one parameter, the compiler can check whether it’s a list of strings or a single string and compile accordingly.  Without the type information (which, ironically, includes C) you’re out of luck.  It’s got to assume you are dealing with a list.

So, if you’ve got a list and you want the smallest item in Clojure

(min [-2 1 3])

will just return [-2 1 3].  Not really a useful behaviour.  However, you can use apply to solve the problem:

(apply min [-2 1 3])

actually returns -2.  It gets trickier, though, when you’re dealing with min-key.  Let’s say you want the number that’s closest to zero.

(min-key #(Math/abs %) -2 1 3)

works, but what if you have a list?  Now you need to get creative:

(apply min-key #(Math/abs %) [-2 1 3])

This relies on the fact that apply always takes the last parameter and flattens it.  So (apply f a b [c d e]) is the same as (f a b c d e).
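JavaScript has a similar variadic-versus-list wrinkle, with the spread syntax playing the part of apply.  The sketch below is only an analogy: passing an array to Math.min gives NaN rather than handing the array back as Clojure's min does, and minKey is a hypothetical helper written to mirror min-key.

```javascript
const numbers = [-2, 1, 3];

// Passing the array itself is the JavaScript cousin of (min [-2 1 3]).
console.log(Math.min(numbers));    // NaN
// Spreading it is the cousin of (apply min [-2 1 3]).
console.log(Math.min(...numbers)); // -2

// Hypothetical minKey helper: fixed argument first, then the spread list,
// mirroring (apply min-key f [-2 1 3]).
const minKey = (key, ...xs) =>
  xs.reduce((best, x) => (key(x) < key(best) ? x : best));
console.log(minKey(Math.abs, ...numbers)); // 1
```

As with apply, the leading arguments are passed as they are and only the final list is spread out.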

CORRECTION: My original solution to this was

((partial apply min-key #(Math/abs %)) [-2 1 3])

which I didn’t really like.  I invited alternatives in the comments, and got a superior one from Huw.  I’ve replaced the main text with his solution.  I’ve also learned a bit more Clojure, which is the point of blogging in the first place…

Thoughts on the Jetlang Remoting Specification

Mike Rettig announced the Jetlang Remoting Specification back in June.  This extends the Jetlang library to IPC and distributed scenarios.  It’s intentionally language-agnostic but to the best of my knowledge there is currently only a Java implementation (Retlang doesn’t have it.)  Its flavour is of a really stripped down version of ZeroMQ:

  • It implements packet sending across TCP/IP
  • It doesn’t tell you how to serialize your objects; that’s your concern, not Jetlang’s.
  • It handles publish/subscribe and request/reply scenarios.

In other ways, it feels like Retlang 0.2.  In particular, subscriptions are on the basis of topics.  Retlang 0.3 replaced this with the channel model, which only works in memory.

So, given that I’ve got a well-known obsession with Retlang and have recently been working with ZeroMQ, how do they stack up?

Advantages

Well, for one thing it’s a lot, lot easier to use.  The implementation is in easy-to-read Java, not C, then JNI, then Java.  The wire protocol is simple and documented so you can connect things to it without having to port the code.  Jetlang’s documentation has never been the greatest, but ZeroMQ’s is nowhere near as good as it looks.  In short, if I needed to do something in a hurry, I’d favour Jetlang Remoting.

Second, Jetlang Remoting is designed to be asynchronous.  Sending two requests to the same destination and getting back the results when they finish (rather than the order in which they were sent) is considered “advanced usage” by ZeroMQ, and is the default use case for Jetlang Remoting.
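The usual way to get replies back in completion order is to tag each request with an ID and match replies against it.  The sketch below shows that bookkeeping only; RequestTracker is a hypothetical class of my own and is not taken from the Jetlang spec or from ZeroMQ.

```javascript
// Hypothetical client-side bookkeeping; not taken from the Jetlang spec.
class RequestTracker {
  constructor() {
    this.nextId = 1;
    this.pending = new Map(); // request id -> reply callback
  }
  send(payload, onReply) {
    const id = this.nextId++;
    this.pending.set(id, onReply);
    return { id, payload }; // what would go down the wire
  }
  receive(id, result) {
    const onReply = this.pending.get(id);
    if (!onReply) return false; // unknown or already answered
    this.pending.delete(id);
    onReply(result);
    return true;
  }
}

const tracker = new RequestTracker();
const results = [];
const slow = tracker.send('slow query', (r) => results.push(r));
const fast = tracker.send('fast query', (r) => results.push(r));
// Replies arrive in completion order, not send order.
tracker.receive(fast.id, 'fast done');
tracker.receive(slow.id, 'slow done');
console.log(results); // [ 'fast done', 'slow done' ]
```

Nothing here blocks, so the slow request never holds up the fast one.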

The biggest difference has to be in pub/sub scenarios, though.  In ZeroMQ, if you subscribe to a socket, you get the firehose.  There’s no way to say “please only send me orders that this trader can see” without opening up another socket.  Jetlang’s topics provide a mechanism for achieving that.  I would not be surprised if a parallel ZeroMQ and Jetlang pub/sub project resulted in far less wire traffic in the Jetlang case.

Finally, Jetlang Remoting doesn’t pretend to be sockets.  I’m sure it’s a lovely idea and there are many, many powerful things you can achieve with it, but if all I needed was asynchronous RPC and pub/sub, ZeroMQ proves bizarrely opinionated about what exactly I’m allowed to put down the wire.  You’ll note, for instance, that the Jetlang spec makes no reference to “client” and “server”.  The concepts aren’t needed.

Disadvantages

Jetlang doesn’t handle anything like store and forward.  One of the cool things with a ZeroMQ solution is that you can power-cycle a server and everyone connecting will continue to work.  This includes those who sent messages while the box was down, providing they didn’t choose to time out.  Jetlang does do automatic reconnection, but that’s about it. 

ZeroMQ also supports more protocols than just TCP.  In particular, it would probably be significantly faster in the case of two processes communicating in the same memory space.  (Mongrel does this.)  It probably wouldn’t be hard to extend Jetlang Remoting to these cases, it probably just isn’t necessary for the project they’re working on.

Differences

ZeroMQ does its damnedest to make intra-process communication look the same as inter-process communication.  Mike Rettig doesn’t believe that’s a good idea.  If you’re looking for a unified API where you can invisibly move components into and out of process, you really should look somewhere else.  YMMV on this one.

Criticisms

Just a couple of annoyances I found in the spec:

  • Topics are ASCII only.  I really don’t think it would have done much harm to make it UTF-8.  Anything that actually is ASCII would have the exact same wire representation.
  • For that matter, why not make topics byte arrays? or integers?
  • Receivers detect heartbeats, but senders configure when they send them, which introduces a co-ordination annoyance.  One extra message would enable the sender to inform the receiver of his heartbeat schedule.  (FIX suffers from the same problem.)

Conclusion

All in all, it looks like Mike Rettig has done it again.  He’s taken something that’s widely regarded as difficult and made it look easy.  If you need all the bells and whistles of ZeroMQ, obviously you’re going to go down that route.  But if you just need asynchronous rpc and pub/sub, Jetlang Remoting looks like the best option.

Thoughts on ZeroMQ

I’ve been doing a bit of ZeroMQ work in Clojure at work, and I thought I’d share my thoughts on it.  It’s worth understanding that what I was building was a globally distributed system, written for the JVM, involving long-running queries and peer-to-peer communication.  If your use case is different from this, you might have a radically different experience.

The Good

ZeroMQ basically gives you an interface at the C API level that looks very like a POSIX socket.  It then implements a packet protocol on top, so you never need to worry about receiving partial messages.  It’s also got some nice magic in there to allow you to connect to a server that is currently down and have the messages queued.  This is pretty nice in terms of reducing the fragility of the system.  Also, as a developer, it means that you can stop worrying about whether you started the client or the server first when checking something out.  (Also, doing the demo where you bring up a missing server and communication resumes is great for boss wow points.)

What it doesn’t do is tell you how to serialize your objects.  This only sounds like a good idea if you think that local proxy objects are a good idea.  The rest of us would like control over our wire object format, thank you very much.  So, for instance, for small objects you can use Protocol Buffers or Thrift.  If you’re sending a lot of data you can knock yourself out and gzip the entire thing.  All of this without having to figure out where the hooks in the serialization API are.

So, ZeroMQ basically takes your socket programming and simplifies it.  Well, it would except for a couple of things. 

The Bad

The documentation looks much better on a skim through than when you dive into it:

  • There’s an awful lot to learn for even simple cases.
  • Most of the examples aren’t available in Java.  (Or most of the many other languages for which ZeroMQ has wrappers.)
  • The guide keeps repeating itself.
  • The Javadocs for JZMQ aren’t available anywhere online.
  • The wire protocol is completely undocumented.
  • The guide keeps repeating itself.
  • JZMQ has a wilfully different API from the C API.*

One or two of these would be fine.  All together, it’s a bit of a perfect storm for a newbie.

Next, for all that they talk about supporting multi-threading, asynchronous programming is very much regarded as the advanced option.  Most of the simple use cases involve blocking calls and enforced send/receive ordered pairs.  There’s some very very clever code going on in ZeroMQ to make it do what it does, but all of the REQ/REP stuff does is make it easy to write blocking IO calls in an asynchronous environment.  It’s kind of like node.js in reverse.  Frankly, I’ve spent years getting up to speed with thinking asynchronously.  Sticking a synchronous layer in the way is just going to complicate matters for me.

The pub/sub stuff is only appropriate if you want every client to receive every message.  There’s no real subscription API.  This would require a whole extra layer on top of the socket layer, so it’s understandable why they haven’t done it.

*And sometimes the implementation is a bit… odd.  For instance, there’s two different ways to set a poll timeout.  Only one works.

The Ugly

This bit may make no sense unless you’ve spent a couple of days with ZeroMQ.  I should say right now that I never did succeed in getting router to router communication working.  No idea why, my code looked pretty similar to the C code and the req to router stuff worked fine.  I did, finally, figure out that I could solve my problem using dealer to router. This also improved the design a bit.

With that said, seriously, have you looked at the examples of router to router communication?  There’s a sleep command in there.  That’s not just lazy coding, that’s the only way to avoid dropped messages.  The only general way to handle this is to manually send heartbeats to your peers.  Pretty much most of the benefits of ZeroMQ just start to disappear with router to router communication.  It comes with a health warning in the guide but frankly, the typeface isn’t nearly large enough.

A Few Thoughts More

It seems like I’ve ragged on ZeroMQ quite extensively.  It’s been an intensely frustrating experience getting it to work, but it does achieve a fair bit of dull and tricky stuff in the background that I no longer need to worry about.  I think it’s got a slightly odd attitude to asynchronous processing. (To be fair, it was pretty much the state of the art when I read Tanenbaum.)  However, it’s fast and it solves problems.  What more did you want?

I’m not sure I’d use it for intra-process communication the way that Mongrel does; Retlang/Jetlang/Rx all seem to solve the problem with significantly less fuss.  No, none of these are available in C, but it wouldn’t be that hard to port Retlang to anything with function pointers, mutexes and finally clauses.

Finally, I’ve only been playing with it for a couple of weeks.  A ZeroMQ expert could undoubtedly introduce you to many more advantages of it than I could.

A Little More Clojure Enlightenment

If there’s one thing I’ve been thinking about all year, it’s tool-based thinking.  I’ve spent my time getting out of Visual Studio, getting out of .NET and smelling the flowers.  Turns out Vim is a great editor, Coffeescript is a great language and Clojure… still has a lot to teach me.  One of the odd things with Clojure is that there’s not many people writing about it.  I have a theory that this is basically because there’s not much to say.  It’s elegant and it works.

It is, however, insanely expressive, which can lead to some rather entertaining debate as to the most elegant way to implement something.  This is one I asked on the Clojure IRC room:

(fn [f coll] (reduce #(assoc %1 %2 (f %2)) {} coll))

For those of you not familiar with Clojure, this takes a list and creates a map where the keys are the original list and the values are what you would get by applying f to the key.  This is pretty much the inverse of what “group-by” does.  If you read the source of group-by, you’ll notice that it uses a transient hashmap internally, which is faster than the naive implementation above.  Here’s two that were proposed:

(fn [f coll] (apply hash-map (mapcat (juxt identity f) coll)))

(fn [f coll] (zipmap coll (map f coll)))

Of the two, I prefer the second, because it involves constructing fewer intermediate objects.  For those of you interested in performance, you’ll be disappointed to learn that zipmap doesn’t use transients (yet).

It’s worth revisiting why I’d want such a function.  In C#, I’ll commonly pre-compute results of functions and throw them around in Dictionaries, and when translating to a functional language I just translated the behaviour.  (Yadic commits the same sin.)  So why does Clojure not have a built-in function for generating maps like this?  Basically, because memoize is more powerful, more flexible and doesn’t require you to know your keys beforehand.  In particular, it would be hard to distinguish between the first code example and

(fn [f coll] (memoize f))

For those of you not familiar with Clojure, this does rely on one of the neater syntax shortenings in Clojure: any map is a function that takes a key and returns the corresponding value or nil.
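The trade-off between precomputing a table and memoizing is not specific to Clojure, so here it is in JavaScript.  Both zipWith and memoize are hypothetical helpers written for this illustration, and the call counter is there only to show when the underlying function actually runs.

```javascript
// Precompute f for a known list of keys (the zipmap approach).
const zipWith = (f, keys) => new Map(keys.map((k) => [k, f(k)]));

// Compute lazily and remember the answer (the memoize approach).
const memoize = (f) => {
  const cache = new Map();
  return (k) => {
    if (!cache.has(k)) cache.set(k, f(k));
    return cache.get(k);
  };
};

let calls = 0;
const square = (n) => {
  calls++;
  return n * n;
};

const table = zipWith(square, [1, 2, 3]);
console.log(table.get(3)); // 9

const lazySquare = memoize(square);
lazySquare(4);
lazySquare(4);
console.log(calls); // 4: three for the table, one for the memoized call
```

The table needs every key up front; the memoized function pays only for the keys that are actually asked for.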

NPOI: Generating Excel Files from C#

I’m constantly amazed at the time that Microsoft wastes on OLE automation, when it typically isn’t even the best way of integrating with Excel.  Easily the best way to generate (or read, for that matter) an Excel document is to use NPOI.  (There are competitors, such as ExcelLibrary, but they lack the depth of features of NPOI.)  The version on NuGet is pretty up to date.  The one catch is that the API is hermetic.  Here’s some tips:

  • You need to create a HSSFWorkbook.
  • You should create any styles and fonts before you start to process the cells.  (You don’t have to, but it’ll be easier to understand the code.)
  • The correct way to create a custom data format is workbook.CreateDataFormat().GetFormat("0.00"), which will return a short.  You then assign that number to the DataFormat property of the cell style.

It’s also blindingly fast compared to most other techniques.  Finally, there are quite a lot of questions about it on Stack Overflow.
