July 2011 – Colour Coding

Debugging Usages of the Thrush Operator

Creating long chains of transformation using the thrush operator looks attractive, but what if you need to see what it’s doing?

Consider the following code:

(-> 1  
  inc  
  inc)

Now it’s obvious that this returns 3, and that the intermediate value was 2, but the following code proves it:

(defn thread-println [v prefix]  
  (doto v (->> (str prefix) println)))

(-> 1  
  inc  
  (thread-println "Intermediate value was ")  
  inc)

You’d need to write a second function for ->>, sadly.
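
A minimal sketch of what that second function might look like. The name thread-println-last is made up for illustration; the only real change is that ->> supplies the threaded value as the last argument, so the parameters swap places. It uses println with two arguments rather than str so that lazy sequences print readably.

```clojure
;; Hypothetical companion to thread-println, for use inside ->>.
;; ->> passes the threaded value as the LAST argument, so v comes second.
(defn thread-println-last [prefix v]
  (println prefix v)   ; println separates its arguments with a space
  v)                   ; return the value unchanged so the chain continues

(->> [1 2 3]
  (map inc)
  (thread-println-last "Intermediate value was")
  (reduce +))
;; prints: Intermediate value was (2 3 4)
;; => 9
```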

Technorati Tags: Clojure,Thrush Operator

Another Go at Explaining The Thrush Operator In Clojure

I was recently at a dojo where a few people asked what on earth those -> and ->> symbols did, so I thought I’d have another go at trying to explain them. You can find my C# heavy first go here.

To start with, (-> a b c d) in Clojure expands to (d (c (b a))). You could describe this as reversing the arguments but I don’t find that very helpful. I find it easier to think about it imperatively:

start with a
do b to it
do c to the result
do d to the result of that

Or, to express it another way (-> a b c d) is equivalent to:

(let [r0 a
      r1 (b r0)
      r2 (c r1)
      r3 (d r2)]
    r3)

However, unlike the two alternative forms above, (-> a b c d) is pretty concise and doesn’t involve a lot of brackets.
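
If you want to check the expansion rather than take it on trust, something like the following should work at a REPL. It uses macroexpand-all from clojure.walk; the symbols a to d are quoted placeholders, so nothing is actually evaluated in the first form.

```clojure
(require '[clojure.walk :refer [macroexpand-all]])

;; The form is quoted, so we only inspect its shape.
(macroexpand-all '(-> a b c d))
;; => (d (c (b a)))

;; A concrete version: start with 1, increment twice, then make a string.
(-> 1 inc inc str)
;; => "3"
```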

Composing Functions

Another way to think of it is like a function composition operator. Imagine you had code like the following:

(map move predators)

But then you wanted the predators to think before moving. Naively, you could write

(map move (map think predators))

Now, Clojure does have a perfectly sensible function composition operator, so you could write

(map (comp move think) predators)

but even this disguises the fact that you wanted think to occur before move. With ->, the steps read in the order they happen:

(map #(-> % think move) predators)
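
Here is a small sketch showing that the three forms agree. The think and move functions and the predator maps are invented stand-ins, not the dojo code.

```clojure
;; Hypothetical stand-ins: each takes a predator map and returns an updated one.
(defn think [predator] (assoc predator :target :rabbit))
(defn move  [predator] (update-in predator [:x] inc))

(def predators [{:name "fox" :x 0} {:name "owl" :x 5}])

;; Nested maps, comp, and -> all produce the same sequence.
(= (map move (map think predators))
   (map (comp move think) predators)
   (map #(-> % think move) predators))
;; => true
```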

Multiple Parameters

So, if you thread through single-parameter functions, -> is effectively the same as comp with the arguments reversed. If you had functions with multiple parameters, though, getting comp to do what you want would involve a lot of lambda functions. Fortunately, -> is a macro and has some more tricks up its sleeve. With multiple parameters, -> puts in the argument as the first parameter of the function.*

So (-> p (a b) (c d)) expands to (c (a p b) d). This may look weird, but it helps if you put your OO hat back on and remember that a lot of functions take “the object” as the first parameter and return a mutated object. For instance, the jQuery home page contains the following code:

$("p.neat").addClass("ohmy").show("slow");

This could be written in Clojure as

(-> 
  ($ "p.neat") 
  (addClass "ohmy") 
  (show "slow"))

In general terms, if I’m doing anything even slightly complex, I’ll put each form on a separate line for clarity.

*Formally, it inserts it as the second form, a distinction that I wouldn’t worry about at this point.
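
The same “object first” pattern can be tried with core functions. This is only an illustrative sketch: assoc and dissoc take the map as their first parameter and return an updated map, which is why they thread cleanly through ->. The keys and values are arbitrary.

```clojure
(require '[clojure.walk :refer [macroexpand-all]])

;; The expansion for multi-parameter forms, with quoted placeholder symbols.
(macroexpand-all '(-> p (a b) (c d)))
;; => (c (a p b) d)

;; assoc and dissoc both take "the object" first and return a new one.
(-> {}
  (assoc :selector "p.neat")
  (assoc :class "ohmy")
  (dissoc :selector))
;; => {:class "ohmy"}
```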

So what does ->> do, then?

There’s only one difference between -> and ->>: where it inserts the result into the next function. -> puts it in as the first parameter, ->> puts it in as the last. So (-> a (b c)) is equivalent to (b a c) while (->> a (b c)) is equivalent to (b c a). This makes ->> feel a lot more like you’re using “partial” on every line. You’re pretty likely to need to use ->> if you’re dealing with sequences, because pretty much all of the Clojure sequence functions take the sequence as the last parameter. Here’s a simplified example from the dojo code.

(->> animals
  (filter surviving?)
  (map move))

So, the code says “take the animals, strip out the ones that didn’t make it, then move the rest”.

If you just thread through single parameter functions (or macros), there’s no difference between -> and ->>.
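
A quick way to see both claims at a REPL, using subtraction because the order of its arguments matters. The symbols in the quoted forms are placeholders.

```clojure
(require '[clojure.walk :refer [macroexpand-all]])

(macroexpand-all '(-> a (b c)))   ;; => (b a c)
(macroexpand-all '(->> a (b c)))  ;; => (b c a)

;; With a non-commutative function the difference shows up in the result.
(-> 10 (- 3))   ;; (- 10 3) => 7
(->> 10 (- 3))  ;; (- 3 10) => -7

;; With single-parameter functions the two macros agree.
(= (-> 1 inc str) (->> 1 inc str))
;; => true
```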

Using the Thrush Operator

I think the thrush/threading operators make people uncomfortable because it’s the first time they encounter the power of macros in Clojure. However, it’s a really useful tool that can make your code significantly more concise. As with all concise code, you need to find the correct balance with readability. The more complex your intermediate expressions are, the more likely you are to need a let block to clearly express what you’re trying to do.

Technorati Tags: Clojure,Thrush Operator

Using the min-key function in Clojure

I was discussing today at work the problem of dealing with a variable-length list of parameters in a typeless language. If you explicitly declare the variable part of your parameters to be a list of strings and someone passes in only one parameter, the compiler can check whether it’s a list of strings or a single string and compile accordingly. Without the type information (which, ironically, includes C) you’re out of luck. It’s got to assume you are dealing with a list.

So, if you’ve got a list and you want the smallest item in Clojure

(min [-2 1 3])

will just return [-2 1 3]. Not really a useful behaviour. However, you can use apply to solve the problem:

(apply min [-2 1 3])

actually returns -2. It gets trickier, though, when you’re dealing with min-key. Let’s say you want the number that’s closest to zero.

(min-key #(Math/abs %) -2 1 3)

works, but what if you have a list? Now you need to get creative:

(apply min-key #(Math/abs %) [-2 1 3])

This relies on the fact that apply always takes its last parameter, which must be a sequence, and spreads its elements out as individual arguments. So (apply f a b [c d e]) is the same as (f a b c d e).
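
The cases above, collected into one sketch you can paste into a REPL. The helper name distance-from-zero is invented here; the coercion to long is just to avoid a reflective call and is not part of the original solution.

```clojure
;; Hypothetical helper standing in for #(Math/abs %).
(defn distance-from-zero [n] (Math/abs (long n)))

(min [-2 1 3])                               ;; => [-2 1 3] (one argument, returned as-is)
(apply min [-2 1 3])                         ;; => -2
(apply min-key distance-from-zero [-2 1 3])  ;; => 1

;; Leading arguments pass straight through; only the final sequence is spread.
(= (apply + 1 2 [3 4 5]) (+ 1 2 3 4 5))
;; => true
```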

CORRECTION: My original solution to this was

((partial apply min-key #(Math/abs %)) [-2 1 3])

which I didn’t really like. I invited alternatives in the comments, and got a superior one from Huw. I’ve replaced the main text with his solution. I’ve also learned a bit more Clojure, which is the point of blogging in the first place…

Technorati Tags: Clojure

Thoughts on the Jetlang Remoting Specification

Mike Rettig announced the Jetlang Remoting Specification back in June. This extends the Jetlang library to IPC and distributed scenarios. It’s intentionally language-agnostic but to the best of my knowledge there is currently only a Java implementation (Retlang doesn’t have it.) Its flavour is that of a really stripped-down version of ZeroMQ:

It implements packet sending across TCP/IP.
It doesn’t tell you how to serialize your objects; that’s your concern, not Jetlang’s.
It handles publish/subscribe and request/reply scenarios.

In other ways, it feels like Retlang 0.2. In particular, subscriptions are on the basis of topics. Retlang 0.3 replaced this with the channel model, which only works in memory.

So, given that I’ve got a well-known obsession with Retlang and have recently been working with ZeroMQ, how do they stack up?

Advantages

Well, for one thing it’s a lot, lot easier to use. The implementation is in easy-to-read Java, not C, then JNI, then Java. The wire protocol is simple and documented so you can connect things to it without having to port the code. Jetlang’s documentation has never been the greatest, but ZeroMQ’s is nowhere near as good as it looks. In short, if I needed to do something in a hurry, I’d favour Jetlang Remoting.

Second, Jetlang Remoting is designed to be asynchronous. Sending two requests to the same destination and getting back the results when they finish (rather than the order in which they were sent) is considered “advanced usage” by ZeroMQ, and is the default use case for Jetlang Remoting.

The biggest difference has to be in pub/sub scenarios, though. In ZeroMQ, if you subscribe to a socket, you get the firehose. There’s no way to say “please only send me orders that this trader can see” without opening up another socket. Jetlang’s topics provide a mechanism for achieving that. I would be surprised if a parallel ZeroMQ and Jetlang pub/sub project didn’t result in far less wire traffic in the Jetlang case.

Finally, Jetlang Remoting doesn’t pretend to be sockets. I’m sure it’s a lovely idea and there are many, many powerful things you can achieve with it, but if all I needed was asynchronous RPC and pub/sub, ZeroMQ proves bizarrely opinionated about what exactly I’m allowed to put down the wire. You’ll note, for instance, that the Jetlang spec makes no reference to “client” and “server”. The concepts aren’t needed.

Disadvantages

Jetlang doesn’t handle anything like store and forward. One of the cool things with a ZeroMQ solution is that you can power-cycle a server and everyone connecting will continue to work. This includes those who sent messages while the box was down, providing they didn’t choose to time out. Jetlang does do automatic reconnection, but that’s about it.

ZeroMQ also supports more protocols than just TCP. In particular, it would probably be significantly faster in the case of two processes communicating in the same memory space. (Mongrel does this.) It probably wouldn’t be hard to extend Jetlang Remoting to these cases; it probably just isn’t necessary for the project they’re working on.

Differences

ZeroMQ does its damnedest to make intra-process communication look the same as inter-process communication. Mike Rettig doesn’t believe that’s a good idea. If you’re looking for a unified API where you can invisibly move components into and out of process, you really should look somewhere else. YMMV on this one.

Criticisms

Just a couple of annoyances I found in the spec:

Topics are ASCII only. I really don’t think it would have done much harm to make it UTF-8. Anything that actually is ASCII would have the exact same wire representation.
For that matter, why not make topics byte arrays? Or integers?
Receivers detect heartbeats, but senders configure when they send them, which introduces a co-ordination annoyance. One extra message would enable the sender to inform the receiver of his heartbeat schedule. (FIX suffers from the same problem.)
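
The point about ASCII and UTF-8 sharing a wire representation is easy to check. A small sketch follows; the topic name is made up and does not come from the spec.

```clojure
;; "orders.london" is an invented topic name, used only for illustration.
(def topic "orders.london")

;; For pure-ASCII text the two encodings produce identical bytes.
(= (seq (.getBytes topic "US-ASCII"))
   (seq (.getBytes topic "UTF-8")))
;; => true
```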

Conclusion

All in all, it looks like Mike Rettig has done it again. He’s taken something that’s widely regarded as difficult and made it look easy. If you need all the bells and whistles of ZeroMQ, obviously you’re going to go down that route. But if you just need asynchronous RPC and pub/sub, Jetlang Remoting looks like the best option.

Technorati Tags: Jetlang,Jetlang Remoting,ZeroMQ,node.js,Mongrel,FIX

Thoughts on ZeroMQ

I’ve been doing a bit of ZeroMQ work in Clojure at work, and I thought I’d share my thoughts on it. It’s worth understanding that what I was building was a globally distributed system, written for the JVM, involving long-running queries and peer-to-peer communication. If your use case is different from this, you might have a radically different experience.

The Good

ZeroMQ basically gives you an interface at the C API level that looks very like a POSIX socket. It then implements a packet protocol on top, so you never need to worry about receiving partial messages. It’s also got some nice magic in there to allow you to connect to a server that is currently down and have the messages queued. This is pretty nice in terms of reducing the fragility of the system. Also, as a developer, it means that you can stop worrying about whether you started the client or the server first when checking something out. (Also, doing the demo where you bring up a missing server and communication resumes is great for boss wow points.)

What it doesn’t do is tell you how to serialize your objects. This only sounds like a good idea if you think that local proxy objects are a good idea. The rest of us would like control over our wire object format, thank you very much. So, for instance, for small objects you can use Protocol Buffers or Thrift. If you’re sending a lot of data you can knock yourself out and gzip the entire thing. All of this without having to figure out where the hooks in the serialization API are.

So, ZeroMQ basically takes your socket programming and simplifies it. Well, it would except for a couple of things.

The Bad

The documentation looks much better on a skim through than when you dive into it:

There’s an awful lot to learn for even simple cases.
Most of the examples aren’t available in Java. (Or most of the many other languages for which ZeroMQ has wrappers.)
The guide keeps repeating itself.
The Javadocs for JZMQ aren’t available anywhere online.
The wire protocol is completely undocumented.
The guide keeps repeating itself.
JZMQ has a wilfully different API from the C API.*

One or two of these would be fine. All together, it’s a bit of a perfect storm for a newbie.

Next, for all that they talk about supporting multi-threading, asynchronous programming is very much regarded as the advanced option. Most of the simple use cases involve blocking calls and enforced send/receive ordered pairs. There’s some very very clever code going on in ZeroMQ to make it do what it does, but all of the REQ/REP stuff does is make it easy to write blocking IO calls in an asynchronous environment. It’s kind of like node.js in reverse. Frankly, I’ve spent years getting up to speed with thinking asynchronously. Sticking a synchronous layer in the way is just going to complicate matters for me.

The pub/sub stuff is only appropriate if you want every client to receive every message. There’s no real subscription API. This would require a whole extra layer on top of the socket layer, so it’s understandable why they haven’t done it.

*And sometimes the implementation is a bit… odd. For instance, there are two different ways to set a poll timeout. Only one works.

The Ugly

This bit may make no sense unless you’ve spent a couple of days with ZeroMQ. I should say right now that I never did succeed in getting router to router communication working. No idea why, my code looked pretty similar to the C code and the req to router stuff worked fine. I did, finally, figure out that I could solve my problem using dealer to router. This also improved the design a bit.

With that said, seriously, have you looked at the examples of router to router communication? There’s a sleep command in there. That’s not just lazy coding, that’s the only way to avoid dropped messages. The only general way to handle this is to manually send heartbeats to your peers. Most of the benefits of ZeroMQ just start to disappear with router to router communication. It comes with a health warning in the guide but frankly, the typeface isn’t nearly large enough.

A Few Thoughts More

It seems like I’ve ragged on ZeroMQ quite extensively. It’s been an intensely frustrating experience getting it to work, but it does achieve a fair bit of dull and tricky stuff in the background that I no longer need to worry about. I think it’s got a slightly odd attitude to asynchronous processing. (To be fair, it was pretty much the state of the art when I read Tanenbaum.) However, it’s fast and it solves problems. What more did you want?

I’m not sure I’d use it for intra-process communication the way that Mongrel does; Retlang/Jetlang/Rx all seem to solve the problem with significantly less fuss. No, none of these are available in C, but it wouldn’t be that hard to port Retlang to anything with function pointers, mutexes and finally clauses.

Finally, I’ve only been playing with it for a couple of weeks. A ZeroMQ expert could undoubtedly introduce you to many more advantages of it than I could.

Technorati Tags: Clojure,ZeroMQ,jzmq,Protocol Buffers,Thrift,node.js

A Little More Clojure Enlightenment

If there’s one thing I’ve been thinking about all year, it’s tool-based thinking. I’ve spent my time getting out of Visual Studio, getting out of .NET and smelling the flowers. Turns out Vim is a great editor, CoffeeScript is a great language and Clojure… still has a lot to teach me. One of the odd things with Clojure is that there aren’t many people writing about it. I have a theory that this is basically because there’s not much to say. It’s elegant and it works.

It is, however, insanely expressive, which can lead to some rather entertaining debate as to the most elegant way to implement something. This is one I asked on the Clojure IRC room:

(fn [f coll] (reduce #(assoc %1 %2 (f %2)) {} coll))

For those of you not familiar with Clojure, this takes a list and creates a map where the keys are the items of the original list and the values are what you would get by applying f to each key. This is pretty much the inverse of what “group-by” does. If you read the source of group-by, you’ll notice that it uses a transient hashmap internally, which is faster than the naive implementation above. Here are two that were proposed:

(fn [f coll] (apply hash-map (mapcat (juxt identity f) coll)))

(fn [f coll] (zipmap coll (map f coll)))

Of the two, I prefer the second, because it involves constructing fewer intermediate objects. For those of you interested in performance, you’ll be disappointed to learn that zipmap doesn’t use transients (yet).
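
To convince yourself that the three versions really are interchangeable, you could name them and compare the results. The names via-reduce, via-mapcat and via-zipmap are invented for this sketch.

```clojure
;; The three implementations from above, given hypothetical names.
(def via-reduce (fn [f coll] (reduce #(assoc %1 %2 (f %2)) {} coll)))
(def via-mapcat (fn [f coll] (apply hash-map (mapcat (juxt identity f) coll))))
(def via-zipmap (fn [f coll] (zipmap coll (map f coll))))

(def words ["a" "bb" "ccc"])

(via-zipmap count words)
;; => {"a" 1, "bb" 2, "ccc" 3}

(= (via-reduce count words)
   (via-mapcat count words)
   (via-zipmap count words))
;; => true
```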

It’s worth revisiting why I’d want such a function. In C#, I’ll commonly pre-compute results of functions and throw them around in Dictionaries, and when translating to a functional language I just translated the behaviour. (Yadic commits the same sin.) So why does Clojure not have a built-in function for generating maps like this? Basically, because memoize is more powerful, more flexible and doesn’t require you to know your keys beforehand. In particular, it would be hard to distinguish between the first code example and

(fn [f coll] (memoize f))

For those of you not familiar with Clojure, this does rely on one of the neater syntax shortenings in Clojure: any map is a function that takes a key and returns the corresponding value or nil.
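
A short sketch of the comparison, with invented names. For keys in the original collection, calling the pre-computed map and calling the memoized function look identical; they only differ when you ask about a key you didn’t know beforehand.

```clojure
(def words ["a" "bb" "ccc"])

;; Pre-computed map versus memoized function (both names are hypothetical).
(def lengths   (zipmap words (map count words)))
(def length-of (memoize count))

;; A map is a function of its keys, so the call sites look the same.
(lengths "bb")      ;; => 2
(length-of "bb")    ;; => 2

;; The difference only appears for a key that wasn't in the collection.
(lengths "dddd")    ;; => nil
(length-of "dddd")  ;; => 4
```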

Technorati Tags: Clojure,Yadic,Tool Based Thinking