For me, Rich Hickey’s original post on transducers raised more questions than it answered. Stian Eikeland wrote a good guide on how to use them, but it didn’t really answer the questions I had. However, there’s an early release of Clojure 1.7, so I thought I’d take a look.
Let’s start with a simple example using an existing transducer:
(def z [1 2 3 4 5 6])
(sequence (filter odd?) z)
;;; (1 3 5)
Okay, so far so good: we understand how to use an existing transducer to create a sequence.
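The same transducer isn’t tied to sequence. As a minimal sketch, assuming the into and transduce arities that accept a transducer (both are part of the 1.7 API), here it is filling a vector and feeding a sum. The names odds-vec and odds-sum are just mine for illustration.

```clojure
(def z [1 2 3 4 5 6])

;; Pour the filtered values into a vector instead of a lazy sequence.
(def odds-vec (into [] (filter odd?) z))
;; odds-vec is [1 3 5]

;; Reduce the filtered values with +, starting from 0.
(def odds-sum (transduce (filter odd?) + 0 z))
;; odds-sum is 9
```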
Now, is identity a transducer?
(sequence identity z)
;;; (1 2 3 4 5 6)
Perfect. Now let’s try doing it ourselves. We’ll write a transducer that preserves all its input.
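Why does identity qualify? A rough way to see it, assuming plain conj and transduce stand in for whatever sequence does internally: identity hands back the reducing function it was given, untouched. The var names below are mine.

```clojure
(def z [1 2 3 4 5 6])

;; identity returns the very function it was given...
(def same-fn? (identical? (identity conj) conj))
;; same-fn? is true

;; ...so one step through the "transformed" function is just conj.
(def one-step ((identity conj) [] 1))
;; one-step is [1]

;; Running the whole input through it preserves every element.
(def all-steps (transduce identity conj [] z))
;; all-steps is [1 2 3 4 5 6]
```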
Arity Island
Rich says the type of a transducer is (x->b->x)->(x->a->x). In practice, arity matters in Clojure, so it’s really ((x,b)->x)->((x,a)->x). So let’s write my-identity:
(defn my-identity [yield] (fn [x b] (yield x b)))
(sequence my-identity z)
;;; ArityException Wrong number of args (1) passed to:
;;; user/my-identity/fn--1347 clojure.lang.AFn.throwArity (AFn.java:429)
Wait, it’s being called with only one argument? Let’s try a one-argument version:
(defn my-identity [yield] (fn [x] (yield x)))
(sequence my-identity z)
;;; ArityException Wrong number of args (2) passed to:
;;; user/my-identity/fn--1342 clojure.lang.AFn.throwArity (AFn.java:429)
Unsurprising. Let’s combine the two.
(defn my-identity [yield] (fn ([x b] (yield x b)) ([x] (yield x))))
(sequence my-identity z)
;;; (1 2 3 4 5 6)
OK. So, a transducer is actually two functions. What the heck are these functions being passed?
(defn my-identity [yield]
  (fn ([x b] (println "Arity2: " x) (yield x b))
      ([x] (println "Arity1: " x) (yield x))))
(sequence my-identity z)
;;; StackOverflowError clojure.lang.RT.boundedLength (RT.java:1697)
Oh dear. Maybe we can see the class instead:
(defn my-identity [yield]
  (fn ([x b] (println "2A " (class x)) (yield x b))
      ([x] (println "1A " (class x)) (yield x))))
(sequence my-identity [5 7 9])
;;; (2A clojure.lang.LazyTransformer
;;; 2A clojure.lang.LazyTransformer
;;; 5 2A clojure.lang.LazyTransformer
;;; 7 1A clojure.lang.LazyTransformer
;;; 9)
Well, that’s a bit of a mess, but we can see the 5, 7 and 9 streaming out. Weirdly, they seem to be coming out slightly too late. And the arity-1 function is called at the end. It’s not clear what you can usefully do with its parameter other than pass it through: its type isn’t fixed, it has no guaranteed protocols and, in the case of LazyTransformer, it blows up if you try to evaluate it.
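The call order is easier to read if the accumulator is something safe to look at. Here’s a sketch of the same probe, assuming we run it under transduce with conj and a plain vector instead of sequence, and record the calls in an atom instead of printing. traced-identity and calls are hypothetical names of my own.

```clojure
;; Log of every call made to the function our transducer returns.
(def calls (atom []))

(defn traced-identity [yield]
  (fn
    ([x]   (swap! calls conj [:arity-1 x])   (yield x))
    ([x b] (swap! calls conj [:arity-2 x b]) (yield x b))))

(def result (transduce traced-identity conj [] [5 7 9]))
;; result is [5 7 9]

;; @calls is
;; [[:arity-2 [] 5]
;;  [:arity-2 [5] 7]
;;  [:arity-2 [5 7] 9]
;;  [:arity-1 [5 7 9]]]
```

With an ordinary vector as the accumulator, the pattern is one arity-2 call per input followed by a single arity-1 call with the finished result.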
If you take a look at actual transducers, you’ll see there’s a third, zero-arity function declared as well. I haven’t discovered what that’s for yet.
State of Play
So what’s that arity-1 function for, then? Well, the doc string for drop gives us a clanger of a clue:
Returns a stateful transducer when no collection is provided.
Transducers can have state. They start when the yield function is passed to them and finish when the arity-1 function is called, which is where you can clean up resources. This start/reduce/finish lifecycle is actually vital to making drop and other stateful transducers work.
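To make the lifecycle concrete, here’s a minimal sketch of a hand-rolled drop. It’s my own illustration, not the core implementation: my-drop is a hypothetical name, and I’m assuming an atom is an acceptable place to keep the counter.

```clojure
(defn my-drop [n]
  (fn [yield]
    ;; Start: fresh state is created each time yield is passed in.
    (let [remaining (atom n)]
      (fn
        ;; Finish: nothing to clean up here, so just pass the result on.
        ([x] (yield x))
        ;; Reduce: swallow inputs until the counter runs out.
        ([x b]
         (if (pos? @remaining)
           (do (swap! remaining dec)
               x)                 ; return the accumulator unchanged
           (yield x b)))))))

(def dropped (into [] (my-drop 2) [1 2 3 4 5 6]))
;; dropped is [3 4 5 6]

;; Each use gets its own counter, so a second run behaves the same way.
(def dropped-again (into [] (my-drop 2) [1 2 3 4 5 6]))
;; dropped-again is [3 4 5 6]
```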
OK, this is starting to look an awful lot like the IObserver interface in C#. (The Subscribe method corresponds to the initial start step.) That suggests the arity zero function is for some form of error handling, but I haven’t managed to trigger it.
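One way to test that guess is to call the zero-arity function directly. This is a sketch assuming the released Clojure 1.7 behaviour of the built-in map transducer and of transduce when no initial value is given; summing is a hypothetical name.

```clojure
;; Wrap + with a built-in transducer and poke at each arity by hand.
(def summing ((map inc) +))

(def zero-arity (summing))       ; delegates to (+)
;; zero-arity is 0

(def step (summing 10 1))        ; (+ 10 (inc 1))
;; step is 12

(def finish (summing 12))        ; (+ 12)
;; finish is 12

;; With no initial value, transduce asks the reducing function for one.
(def total (transduce (map inc) + [1 2 3]))
;; total is 9
```

Under those assumptions the zero-arity call looks more like a way of producing an initial value than an error hook.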
Bad Reputation
Okay, now let’s try something a bit harder. Let’s repeat our input.
(defn duplicate [yield]
(fn ([x b] (yield x b) (yield x b)) ([x] (yield x))))
(sequence duplicate [1 2 3 4 5 6])
;;; (1)
What the heck happened there? We ignored the result of the first call to yield. Let’s fix that.
(defn duplicate [yield]
(fn ([x b] (yield (yield x b) b)) ([x] (yield x))))
(sequence duplicate [1 2 3 4 5 6])
;;; (1 1 2 2 3 3 4 4 5 5 6 6)
Perfect! It’s a mystery to me how exactly it failed, but we’ve gained a bit more insight: you can only make several calls to yield by threading the result of each call into the first parameter of the next.
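I can at least show one way the first version goes wrong, using an ordinary immutable vector as the accumulator instead of whatever sequence uses internally. This is a sketch under that assumption; duplicate-broken and duplicate-fixed are just my names for the two versions above.

```clojure
;; First attempt: the result of the first yield is thrown away.
(defn duplicate-broken [yield]
  (fn ([x b] (yield x b) (yield x b))
      ([x] (yield x))))

;; Second attempt: the first result is threaded into the second call.
(defn duplicate-fixed [yield]
  (fn ([x b] (yield (yield x b) b))
      ([x] (yield x))))

(def broken-result (transduce duplicate-broken conj [] [1 2 3]))
;; broken-result is [1 2 3]

(def fixed-result (transduce duplicate-fixed conj [] [1 2 3]))
;; fixed-result is [1 1 2 2 3 3]
```

With a persistent vector, the discarded conj builds a new vector that nobody keeps, so the broken version quietly behaves like identity. The symptom differs from the (1) we got from sequence, but the cause is the same: the accumulator returned by yield was dropped.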
So, here’s what we’ve learned:
- A transducer is a function that takes one parameter and returns a “function”
- Said function is actually two or three functions, using arity overloading
- It has a start/reduce/finish lifecycle. The finish step can’t transform the result further.
- It can have state.
- Calls to yield in the reduce step have to be well-behaved.
I’d like to write some more, but this is easily enough for this post.