Fox Goose Corn in Haskell for Clojure Programmers

span.kw { color: #007020; font-weight: bold; }
code > span.dt { color: #902000; }
code > span.dv { color: #40a070; }
code > span.bn { color: #40a070; }
code > span.fl { color: #40a070; }
code > span.ch { color: #4070a0; }
code > span.st { color: #4070a0; }
code > span.co { color: #60a0b0; font-style: italic; }
code > span.ot { color: #007020; }
code > span.al { color: #ff0000; font-weight: bold; }
code > span.fu { color: #06287e; }
code > span.er { color: #ff0000; font-weight: bold; }
]]>

This is my attempt at a solution to the fox/goose/corn problem in Haskell. It was inspired by Carin Meier’s Clojure Kata for the same problem, although it deviates from the approach. A better Haskell developer might significantly improve on my version. I didn’t find much use for the standard typeclasses in this, sadly. As a consequence, however, the code is relatively understandable from the perspective of a Clojure programmer with no Haskell experience.

I’ll explain each construct as we encounter it.

Preliminaries

First, we have the namespace declaration. Unlike Clojure, we need to declare any identifiers we export. Since we’re writing an executable, we export main just as we would in C.

module Main (main) where

Data.Set exports a lot of things with the same names that Data.List exports, so it’s pretty common to import it qualified. It’s not strictly necessary for the code that follows, though.

import qualified Data.Set as S
import Data.List(unfoldr)
import Data.Foldable(Foldable,foldr,find)

The equivalent of clojure.core is the Prelude. We hide Left and Right because we’ll be using our own concept using those identifiers. We hide foldr because the version in Data.Foldable is more general.

import Prelude hiding (Left, Right, foldr)

The Haskell Prelude is actually kind of frustrating, in that it doesn’t show off the language in its full power. It’s heading in that direction, though. In particular, this particular problem is getting addressed soon. Some people opt out of the Prelude altogether and use an alternative.

Basic Data Types

We’re writing Haskell, so we should write some types down.

You’ll recognize the following declarations as being identical to Java enums. Ord means it’s orderable, which in turn means you can put it in a set (hash sets aren’t the default in Haskell), Eq means you can test for equality using it, which comes along for the ride with Ord, Show means you can print it. Haskell magically writes the code in deriving.

data Item = Fox | Goose | Corn | Me deriving (Ord, Eq, Show)
data Side = Left | Right

We’ll represent everything using only the representation of the right hand side. This has the nice property that the initial state is the empty set. So we’re travelling from the Left to the Right. If we’d used a list, some of the code below would be prettier than using a set, but I believe set is the correct representation since it’s fundamentally unordered. It’s worth considering how it would look in Clojure with Set.

This is a newtype. type would indicate a type alias (so State was exactly the same thing as S.Set Item.) A newtype can’t be mixed with a raw set (which is what a Clojure programmer would naturally do) and requires you to explicitly construct and deconstruct it from the set as necessary. This obviously has a cost in verbosity, but has no runtime overhead because it’s all optimised out. It’s especially useful if you’re dealing with two concepts wih the same type representation. In our case, State and History (defined later) could be very easily confused in Clojure.

newtype State = State (S.Set Item) deriving (Ord, Eq, Show)

State of Play

We’ll need some way of mapping booleans to Left/Right. We’re adopting a convention that Left = True here, and we’ve named the function to help keep this straight. Note that we have two definitions. Each definition is a pattern match on the right hand side. Basically, you need this for two things: identifying the side you’re on, and the side you’re not on, so the Bool -> Side mapping makes sense.

toRight :: Bool -> Side
toRight True = Right
toRight False = Left

Now let’s figure out which side we’re on. Here we destructure State for the first time.

onRight :: State -> Bool
onRight (State s) = S.member Me s

We also need a function that tells you what is on which side.

  • \ means “difference”. Since Data.Set is namespace qualified, so is the operator.
  • Sadly there’s no general type that subsumes sets and lists so there’s a List.\ and a Set.\ and they don’t interoperate well

Coming up with a good type system for lists and list like things is regarded as an open problem in the Haskell world and they’re not prepared to take the kinds of compromises Clojure and Scala have made. (Consider, for instance, that mapping a set returns a list.) However, in practice that means that using different types of lists together or writing general list-like code is a pain I could have introduced my own abstraction, but seriously, what’s the point?

Again, we have two definitions. This is the first time we use a where clause. A where clause is similar to a postfix let clause. Note that we don’t need type declarations for non-top-level declarations.

Also, this is an arity-2 function. Only there’s no such thing in Haskell. Haskell, like most FP languages (and unlike Clojure) only ever has functions that take one parameter and return one. So what you’re really looking at here is a function that takes a Side and returns another function which takes a State that then returns a set of items. If you just don’t apply enough parameters, you get the partial application of the function. I’ve long since been an advocate of programming Clojure like this ever since I spent a couple of hours in F#’s company.

side :: Side -> State -> S.Set Item
side Left (State s) = everyone S.\ s
  where everyone = S.fromList [Fox,Goose,Corn,Me]
side Right (State s) = s

The whole reason we’ve defined the operations above is this: after this point we’ll never destructure State again, just interact with the State functions we’ve already defined. The hope is that this enables us to think at a higher level about what we’re doing. (I’m not going to argue this point, but there’s plenty of people on the internet prepared to do so.)

Haskell! So We Can Be Safe!

Let’s figure out if a State is safe. Turns out the rules for whether or not you’re safe are pretty easy

  • The attended side is always safe
  • The unattended side is safe if only the Goose is there
  • No other unattended Goose is safe
  • Every other unattended side is safe

We’re using more Haskell features here.

  • . performs functional composition, so (toRight . not . onRight) is equivalent to (comp toRight not onRight).
  • We can have multiple definitions in a where clause.
  • We can call a variable _ if we don’t care about its value.
  • You can put a “guard” on a pattern match. I prefer to use guards and pattern matching over explicit branching primitives.
  • a $ b c d means the same as a (b c d). This prevents ridiculous paren buildup. Clojure has different ways of avoiding this, most obviously ->.
safe :: State -> Bool
safe s = safeSide $ side unattendedSide s
  where unattendedSide = (toRight . not . onRight) s
        safeSide l | S.member Goose l = S.size l == 1
        safeSide _ = True

In practice, the side function is only used within safe so we could have just stuck it into the where clause and saved some newtype book-keeping.

Moving the Boat

I’m not 100% happy with the readability of this next function, mostly because it’s quite long. Suggestions are welcome.

We need to find the next possible states. We’re mapping to set, because there’s no inherent ordering of the future states. You can do the same in Clojure. Unlike clojure, we need a separate map function, S.map rather than map. The good news is that it returns a set rather than a lazy list.

There is a general map function, fmap that will map anything to its correct container type (and more!) but we can’t use fmap here for technical reasons (for the curious, lookup: “Set is not a Functor in Haskell”).

Also, note that this is where we finally actually create a new State, and that we can just use State, the constructor, as a straight function that we can map over.

transitions :: State -> S.Set State
transitions state = S.map State $ S.insert moveBoat carry
  where onRight = S.member Me $ side Right state
        mySide = side (toRight onRight) state

The move command is either a delete or an insert, depending on the direction of travel. In Clojure this would be (if onRight dissoc assoc)

       move = if onRight
         then S.delete
         else S.insert

The list of items is the things that are on your side that aren’t you.

       items = S.delete Me mySide

Effectively, this next line just destructures State.

       right = side Right state

Whatever else happens, you’re definitely moving youself Note that moveBoat is the State represented by just moving yourself.

       moveBoat = move Me right

If you choose to move an item, it’s a motion on top of moveBoat, not on top of s, since you’re also moving.

We’re using flip, which swaps the parameters of move. We could also have said moveItem x = move x moveBoat or something with lambdas (IMO, lambdas are rarely the most clear option, and in this code they’re never used.) Although you could write flip in Clojure, it really isn’t Clojure “style”, but definitely is Haskell style

       moveItem = flip move moveBoat

carry is the set of states if you carry an item with you

       carry = S.map moveItem items

There’s a huge number of different types in the preceding function, and no type declarations other than the top level. You can put more type declarations in that I have, but you can’t put in fewer and compile with -Wall (If you’re OK with warnings, you can throw away the top level type declarations some of the time, but there’s a lot of reasons that’s a bad idea.)

Desperately Seeking Solution

We’ll ignore the State type completely for a while and just talk in general about how you solve this kind of problem.

We need to think about how to represent a sequence of moves. Here we newtype List (List here is a good choice of type, since history is fundamentally ordered). History is stored backwards for convenience.

newtype History a = History [a] deriving (Eq, Ord)

Let’s make history print the right way around though. To do this, we need to implement the Show typeclass by hand. (Typeclasses are a bit like interfaces, but behave very differently.)

=> is the first example in the code of a type restriction. Here we’re saying “If a is showable, then History of a is showable.” Then the implementation says “The way you show it is by taking the list, reversing it and then showing that.”

instance (Show a) => Show (History a) where
  show (History l) = show $ reverse l

How are we’re going to find the solution? You want to use your transition function to construct a set of possible histories and then search that list for a solution. You could basically do this as a breadth-first or a depth-first search. A breadth-first search will have the advantage of finding the minimal solution. To avoid wasting time on cycles such as the boat just going backwards and forwards, we’ll keep track of what positions we’ve already generated.

So, how to we go from all combinations of 2 moves to all combinations of 3 moves? We define a data structure, Generation.

data Generation a = Generation {
  previous :: S.Set a,
  states :: S.Set (History a)
}

In practice, we know that a will be State, but it’s generally good Haskell style to use the most general type possible. When you get the hang of it, this aids clarity, rather than impeding it. (See also: parametricity and theorems for free).

Generation is a record data type. Like Clojure, you can use previous and states as accessor functions. Unlike Clojure, these functions are strongly typed. That means you can’t have fields with the same name in different records (within the same file/namespace).

Working with Generations would be better if we used lenses, but lets stick to things in the base libraries.

We need to map the function that generates new states to a function that creates new Generations. In Clojure, we’d probably use reduce. In Haskell, we use foldr, which is pretty similar, modulo some laziness and argument order differences.

  • (a -> S.Set a) is a parameter that is a function.
  • We’re specifying that a implements Ord, which we need to be able to put it into a Set.
  • Due to the wonders of partial application, (a -> S.Set a) -> Generation a -> Generation a is exactly the same as (a -> S.Set a) -> (Generation a -> Generation a)
liftG :: (Ord a) => (a -> S.Set a) -> Generation a -> Generation a
liftG f t = foldr (stepG f) initial (states t)
  where initial = Generation {
          previous = (previous t),
          states = S.empty
        }

Actually, I’ve skipped the most important bit of this: the step function. I could have inlined it, but it’s pretty complex I prefer to give it its own top level declaration, along with a semi-scary type signature.

stepG :: (Ord a) => (a -> S.Set a) -> History a -> Generation a -> Generation a
stepG f (History h@(s : _)) t = result

The destructuring of History is a bit more complicated. Here we’re assigning h to the whole history, and s to the latest state in the history. Note that if History is empty, the pattern match won’t work. Clojure would just match it and put nil in s. Type safety is pretty cool here but it means we need a new pattern match for empty histories. Strictly speaking, they aren’t valid, but the way we defined the type they can happen. (If you’re seriously thinking you want a type system that can express “non-empty list” I have two answers for you: core.typed and Idris.) This is the point at which Haskell goes “Well, I’m trying to be a practical FP language, you know.”

  where result = Generation {

Add the new states into the list of known states.

          previous = S.union (previous t) nextStates,

Add the new histories into the current generation.

          states = S.union (states t) (S.map newHistory nextStates)
        }

The next states are the states of the transition function minus the known states.

        nextStates = f s S.\ (previous t)

The newHistory function is interesting. Observe (: h). Now (x : xs) is the same as (cons x xs) in Clojure. (x :) would be (partial cons x) and (: xs) would be #(cons % xs). So (: h) is a function that takes a t and puts it in front of the existing list. This is operator section and works for all operators (you can define your own) except (- x) (which is special cased to unary minus).

Again, History is just an ordinary function, that wouldn’t have been needed if we’d done types instead of newtypes.

        newHistory = History . (: h)

Finally, to avoid compiler warnings, tell it what happens when History is empty. This case should never happen.

stepG _ _ t = Generation { previous = previous t, states = S.empty }

The Under-Appreciated Unfold

So, now we’ve got a Generation to Generation function, how do we get the list of all possible histories? Well, we could always just write some recursive code, but like in Clojure, there’s functions that exemplify common recursion structures. In Clojure, iterate might be good choice here. In Haskell, there’s unfoldr.

The type declaration of iterate in Clojure would be iterate :: (a -> a) -> a -> [a].

In comparison, the type declaration of unfoldr is quite complex: unfoldr :: (b -> Maybe (a, b)) -> b -> [a].

You might be wondering why they’re so different. The short answer is that unfoldr is awesome. The key is the step function itself b -> Maybe (a,b). This says that it takes a b and returns either Nothing (nil) or Just a pair of a and b. (Did I mention one of the coolest things about Haskell? null/nil doesn’t exist.) The b gets passed to the next step, the a gets output. So unfoldr supports having an internal state and an external state. What happens if Nothing is returned? The list stops generating. Clojure expects you to then terminate the list in a separate step, an approach that seems simpler but falls down when you start to use things like the state monad.

So, our output a is going to be the set of states of the generation, while b is going to be the Generations themselves. We’ll return Nothing when there’s no states in the Generation.

iterations :: (Ord a) => a -> (a -> S.Set a) -> [S.Set (History a)]
iterations start f = unfoldr (forUnfoldr . (liftG f)) initial
  where forUnfoldr t | S.null (states t) = Nothing
        forUnfoldr t = Just ((states t),t)
        initial = Generation {
          previous = S.empty,
          states = S.singleton $ History [start]
        }

So we just call unfoldr with a generation producing function using forUnfoldr to adapt it to fit.

We’ve done this using unfoldr, which has explicit state. Control.Monad.Loops exposes unfoldM which could be used with a state monad to achieve a similar effect.

Fun with Types

Let’s have some fun. We’ve got a list of sets that contains the solution. There’s a perfectly good function for finding an element in a a list called find (as an aside: there’s no such perfectly reasonable function in Clojure). Small catch: it takes a Foldable (in Clojure, a reducable). List is Foldable, Set is Foldable, but a list of sets of states iterates through the sets, not the states.

We’ll do some type magic and make it iterate through the states. (Thanks to Tony Morris for pointing me to a way to achieve this. Much more brain-bending stuff is available in Control.Compose)

newtype Compose g f a = O (g (f a))
instance (Foldable f1, Foldable f2) => Foldable (Compose f1 f2) where
  foldr f start (O list) = foldr g start list
    where g = flip $ foldr f

So, here we’ve said that a foldable of a foldable of a can be used as a single foldable by using flip $ foldr f as the step function. We could have just written this function out, but hey, why not live a litte.

The Finish Line

Finally, we get to main. Often this is expressed in do notation, but I don’t feel the need here, since it’s literally one line: print solution.

main :: IO ()
main = print solution
  where solution = find success (O allHistories)
        success (History (s : _)) = side Left s == S.empty
        success _ = False
        allHistories = iterations allOnLeft next
        allOnLeft = State S.empty
        next = S.filter safe . transitions

So, you can build it, and run it. time reports that it takes 2ms on my machine. How on earth did it run so fast? Aren’t fully lazy functional languages meant to be slow? Well, there are advantages to running an optimizing compiler, but they’re helped by understanding a bit of what is going on under the hood. An unfold followed by a fold is called a hylomorphism. The thing is, you never need to build the whole structure, you could just run each iteration through the fold as it comes. The Haskell compiler is smart enough that it actually rewrites the code. So a large chunk of our code is actually running imperatively.

How much have types helped me write this code? Well, the early functions, especially safe, I needed to nail in GHCi, the Haskell REPL. On the other hand, the later parts of the code actually worked first time (after I’d managed to fix all of the type errors.). Make of that what you will.

I hope you’ve found this interesting. I’m still very much a beginner Haskell programmer, but I hope the presentation enables you to see how you can express ideas in Haskell. If you’d like to learn more, I can highly recommend starting with Brent Yorgey’s course.

Design Patterns: Happy Birthday and Goodbye

One of the biggest lies we tell starting developers is that design patterns are language independent. Whilst true at a high level, the truth is that a programmer in a modern programming language can junk most of the Gang of Four book. A couple of days ago, it was twenty years old. It’s time to celebrate its lasting positive influences, and then bury it.

Some things are potentially useful as terminology for discussing with people, but others aren’t even useful as that. The really obvious example is the template pattern: if you’re programming in a language that can use functions as values it’s utterly meaningless. Another is iterator: most programming languages have a list/sequence implementation and you just use that.

Prototype, equally is meaningless for two, entirely opposite, reasons: first, the whole concept originates in C++ where you can perform a raw memory copy. In a language such as Java that doesn’t have one it’s so cumbersome you’ll prefer a factory method. In a language such as F# or Clojure, ubiquitous persistence data structures mean that everything’s a prototype.

Command is basically a pattern that replaces functions with objects. In a functional programming language, this is just the normal way you do things. In languages such as Python and Clojure where objects can act as functions the line is further blurred. But that’s nothing compared to what you can do with Clojure’s multimethods.

Multimethods and Protocols

Quite a few patterns are just workarounds for the painfully restricted dispatch patterns in old OO languages. The visitor and adapter patterns are both ways of circumventing the closed nature of classes in C++/Java. When you can just associate new methods with existing data structures, even third party code, you just don’t need them.

Also, if you understand multimethods for more than just class based dispatch, you see that it subsumes the state pattern.

(defmulti state-pattern (fn [tool data] tool))
(defmethod state-pattern pen-tool
  [tool data]
  nil)

How about a strategy pattern?

(defmulti strategy-pattern (fn determine-strategy [tool data] ...))
(defmethod state-pattern :strategy1
  [tool data]
  nil)

In practice, you can use multimethods to mix and match dispatch on raw parameter value (state), dispatch on computed value (strategy) and dispatch on class (visitor). Similar effects can be achieved using Haskell’s type features.

Trivial

Then there’s stuff that’s just a special case of something more general. Chain of responsibilty in Clojure is easily implemented using the some function:

(defn chain-of-responsibility
  ([elements] (partial chain-of-responsibility elements))
  ([elements data] (some #(% data) elements)))

Is chain of responsibility really useful terminology here, or is it just “using the some function”?

Then there’s ones that are just plain outdated: observer and mediator are rarely a better choice than a decent pub/sub mechanism. Heck, even your language’s event system is often a better choice. And I think everyone’s got the message about singleton by now.

Outdated

I’m concerned this will be seen as down on the whole concept of patterns. Actually, high level patterns, the kind that Martin Fowler talks about are fine and last a long time. But our understanding of patterns constantly evolves (see pub/sub) and the ergonomics of specific patterns varies wildly between languages. GoF was a great book, and made a huge positive impact, but it’s time to take it off our shelves.

Let’s Write a Transducer!

For me, Rich Hickey’s original post on transducers raised more questions than it answered. Stian Eikeland wrote a good guide on how to use them, but it didn’t really answer the questions I had. However, there’s an early release of Clojure 1.7, so I thought I’d take a look.

Let’s start with a simple example using an existing transducer:

(def z [1 2 3 4 5 6])
(sequence (filter odd?) z)
;;; (1 3 5)

Okay, so far so good, we understand how to use an existing transducer to create a sequence.

Now, is identity a transducer?

(sequence identity z)
;;; (1 2 3 4 5 6) 

Perfect. Now let’s try doing it ourselves. We’ll write a transducer that preserves all its input.

Arity Island

Rich says the type of a transducer is (x->b->x)->(x->a->x). In practice, arity matters in Clojure, so it’s really (x->b-x)->(x,a)->x. So let’s write my-identity

(defn my-identity [yield] (fn [x b] (yield x b)))
(sequence my-identity z)
;;; ArityException Wrong number of args (1) passed to: 
;;; user/my-identity/fn--1347  clojure.lang.AFn.throwArity (AFn.java:429)

Wait, it’s only expecting one argument? Let’s try one

(defn my-identity [yield] (fn [x] (yield x)))
(sequence my-identity z)
;;; ArityException Wrong number of args (2) passed to: 
;;; user/my-identity/fn--1342  clojure.lang.AFn.throwArity (AFn.java:429)

Unsurprising. Let’s combine the two.

(defn my-identity [yield] (fn ([x b] (yield x b)) ([x] (yield x))))
(sequence my-identity z)
;;; (1 2 3 4 5 6) 

OK. So, a transducer is actually two functions. What the heck are these functions being passed?

(defn my-identity [yield] 
  (fn ([x b] (println "Arity2:  " x) (yield x b)) 
      ([x] (println "Arity1:  " x) (yield x))))
(sequence my-identity z)
;;; StackOverflowError   clojure.lang.RT.boundedLength (RT.java:1697)

Oh dear. Maybe we can see the class instead:

(defn my-identity [yield] 
  (fn ([x b] (println "2A " (class x)) (yield x b)) 
      ([x] (println "1A " (class x)) (yield x))))
(sequence my-identity [5 7 9])
(2A  clojure.lang.LazyTransformer
2A  clojure.lang.LazyTransformer
5 2A  clojure.lang.LazyTransformer
7 1A  clojure.lang.LazyTransformer
9)

Well, that’s a bit of a mess, but we can see the 5, 7 and 9 streaming out. Weirdly, they seem to be coming out slightly too late. And the arity-1 function is called at the end. It’s not clear what you can usefully do with it’s parameter other than pass it through since it’s not fixed, has no guaranteed protocols and in the case of LazyTransformer, blows up if you try to evaluate it.

If you take a look at actual transducers, you’ll see there’s a third, zero-arity function declared as well. I haven’t discovered what that’s for yet.

State of Play

So what’s that arity-1 function for, then? Well, the doc string for drop gives us a clanger of a clue:

Returns a stateful transducer when no collection is provided.

Transducers can have state. They start when the yield function is passed them, and finish when the arity-1 function is called, and you can clean up resources when it ends. This start/reduce/finish lifecycle is actually vital to making drop and other reducers work.

OK, this is starting to look an awful lot like the IObserver interface in C#. (The Subcribe method corresponds to the initial start step.) That suggests the arity zero function is for some form of error handling, but I haven’t managed to trigger it.

Bad Reputation

Okay, now let’s try something a bit harder. Let’s repeat our input.

(defn duplicate [yield] 
  (fn ([x b] (yield x b) (yield x b)) ([x] (yield x))))
(sequence duplicate [1 2 3 4 5 6])
;;; (1)

What the heck happened there? We ignored the result of the first call to yield. Let’s fix that.

(defn duplicate [yield] 
  (fn ([x b] (yield (yield x b) b)) ([x] (yield x))))
(sequence duplicate [1 2 3 4 5 6])
(1 1 2 2 3 3 4 4 5 5 6 6)

Perfect! It’s a mystery to me how exactly it failed, but we’ve gained a bit more insight: you can only do calls to yield by passing the result of one into the first parameter of the next.

So, here’s what we’ve learned:

  • A transducer is a function that takes one parameter and returns a “function”
  • Said function is actually 2/3 other functions, using arity overloading
  • It has a start/reduce/finish lifecycle. The finish step can’t transform the result further.
  • It can have state.
  • Calls to yield in the reduce step have to be well-behaved.

I’d like to write some more, but this is easily enough for this post.

Not a Haskell Monad Tutorial: Applicatives

Functors are all very well, but they only allow you to map with a function that takes only one parameter. But there’s plenty of functions that take more than one parameter, including useful ones like add and multiply. So how do we want to multiply to work on nullable integers?

  • 2 times 3 should be 6
  • 2 times null should be null
  • null times 3 should be null
  • null times null should be null

There’s something else we need to do. What if 2 is just an integer, not a nullable integer? Really, we need to be able to promote an integer to a nullable integer. The more parameters a function has, the more likely one of them isn’t in exactly the right format. Haskell calls this function pure. (+)

Now let’s get a bit more complicated. What about multiplying two lists together? Multiplying [2] and [3] should obviously give [6]. But what happens if you’re multiplying [2,3] and [5,7]? Turns out there’s at least three sensible answers:

  • Multiply the pairs in sequence: [10,21]
  • Multiply the pairs like a cross join: [10,14,15,21]
  • Actually, you could also iterate the first sequence first [10,15,14,21

More than one way to skin a list

Let’s just concentrate on the first two. How are they going to deal with lists of different length?

  • [2] * [1,3] should be [2] OR
  • [2] * [1,3] should be [2,6]

But what if the first parameter isn’t a list. What should that look like? Well, 2 * [1,3] should definitely be [2,6]. But that means that, depending on how we generalise multiplication, we also need to generalise turning a number into a list.

  • To multiply like a cross join, 2 can just become [2]
  • To multiply the pairs in sequence 2 needs to be [2,2,2,2,2,...], an infinite sequence of 2s.

So, generalizing multiple-arity functions to functor contexts isn’t as obvious as it is for single-arity functions. What on earth do we do about this? Well, the approach Haskell goes with is “pick an answer and stick with it”. In particular, for most purposes, it picks the cross join. But if you want the other behaviour, you just wrap the list in a type called ZipList and then ZipLists do the pairwise behaviour.

Back to the Functor

So, how should we handle the various examples of functors that we covered in the first part? We’ve already dealt with nullables and lists and sets are a dead loss because of language limitations.

Multiplying two 1d6 distributions just gives you the distribution given by rolling two dice and multiplying the result. Promoting a value e.g. 3 to a random number is just a distribution that has a 100% chance of being 3.

You can multiply two functions returning integer values by creating a function that plugs its input into both functions and then returns the product of the results. You can promote the value 3 to a function that ignores its input and returns 3.

How about records in general? Well, here’s the thing: you can’t promote a record without having a default value for every field. And that isn’t possible in general. So, while you can undoubtedly make some specific datastructures into applicatives, you can’t even turn the abstract pair (a,b) (where you’re mapping over a) into an applicative without knowing something about b.

We could make the mapping work for pair if we were actually supplied with a value. But that doesn’t make sense, does it? How about, instead of (a,b) we work on functions b -> (a,b). Now we can map a, on single and multiple-arity functions, and just leave the b input and output values well alone. It turns out this concept is rather useful: it’s usually called the State Monad.

Would you like Curry with your Applicative?

Up until now, I’ve mostly talked about pairwise functions on integers. It’s pretty obvious how you’d generalize the argument to arbitrary tuples of arbitrary input times. However, it turns out that the formulation I’ve used isn’t really that useful for actual coding, partly because constructing the tuples is a real mess. So let’s look at it a different way.

Let’s go back to multiplying integers. You can use the normal fmap mapping on the first parameter to get partially applied functions. So our [2,3] * [5,7] example gives up [2*,3*] and [5,7]. Now we just need a way of “applying” the functions in the list. We’ll call that <*>. It needs to do the same thing as before and the promotion function, pure is unchanged.

It turns out that once you’ve got that, further applications just need you to do <*> again, so if you’ve got a function f and you’d normally write f a b c to call it, you can instead write

f <$> a <*> pure b <*> c

Assuming a and c are already in the correct type and b isn’t. This is equivalent to

pure f <*> a <*> pure b <*> c

but in practice people tend to write the dollar-star-star form. Finally, you can also write

(liftA3 f) a (pure b) c

which is much more useful when you’re going pointfree.

And finally…

So, here’s the quick version:

  • a functor that can “lift” functions with multiple parameters is termed an “applicative functor”, “idiom” or just “applicative”
  • a functor is uniquely defined by the data type you’re mapping to(*)
  • some data structures like list, however, give rise to multiple possible implementations of Applicative

Functors have been well understood for a long time, and monads provided the big conceptual breakthrough that made Haskell a “useful” language. The appreciative of applicative functors as an abstraction that occupies a power level between the two is a more recent development. When going around the Haskell libraries you’ll often discover two versions of a function, one of which is designed for applicatives and one for monads but they’re the same function. It’s just that the monad version was implemented first. With time, the monad versions will be phased out, but it’s going to take a long tuime. You can read more about the progress of this on the Haskell wiki.

If you want a much more rigorous approach to what I’ve been talking about here, read Brent Yorgey’s excellent lecture notes.

(+) and return, for historical reasons.

(*) Indeed, and this is awesome, Haskell will just automatically generate
the Functor fmap function for you.

Functors: Programming Language Limitations

What does a functor actually look like in various programming languages? We already said it’s something you can use to map, so let’s take a look at some language’s mapping functions:

  • Clojure has map, mapv and fnil
  • Haskell has map and fmap. (It also has <$>, which is the same as fmap)
  • C# has Select
  • Java’s streams library has map and mapToInt

So, are any of these functor mapping operations? Well, it won’t take a genius to guess that fmap does the right thing. map in Haskell does the same thing, but only works for Lists. The definition of fmap for list is map, so it’s pretty much a wash. (*)

Land of Compromises

The others? Well, kind of, but you tend to need to squint a bit. The problem is that if you map over an identity function, (e.g. x => x in C#) you should get the same type back as you put in. And actually, that’s very rarely true. map in Clojure can be called on a vector and will return a lazy list. mapv can be called on a lazy list and get back a vector. map in Java and Select in C# send an interface type to the same interface type, but rarely return the exact same type as you were expecting.

Moreover, there isn’t a general mapping interface that lots of functor-like things implement and there isn’t any way to make one. This isn’t a problem for list comprehension, but it horribly breaks functors as a general model of computation. (#) You can still use the concepts, but you’ll end up with a lot of code duplication. Indeed, you’ll probably already have this code duplication and be thinking of it as a pattern. As is all to often the case, low level programming patterns reveal deficiencies in your programming language.

There’s good reasons for these mapping functions not behaving exactly like a functor, though: performance. The haskell compiler has everything, not just lists, as lazy and can optimize types away. Clojure, C# and Java can’t and treat them as hard optimization boundaries.

Haskell ain’t perfect either

We already established in the previous article that there are plenty of functors that have nothing to do with types. Haskell’s Functor type class is therefore only a functor on the category of Haskell types (usually referred to as Hask). This seems good enough, but actually it isn’t.

Consider a set of values. You can easily define a mapping function that satisfies the functor rules. Sadly, Set in Haskell isn’t a Haskell Functor. This is because Set imposes a condition on its values that they be sortable. Whilst this isn’t a problem for real Functor’s, it’s a problem for Haskell functors because type classes don’t admit restrictions on their type parameters. To put it another way, Functor in Haskell is a functor over the whole of Hask, never a subcategory. For that matter, you can’t do the (*2) functor that I described last time in any sensible way because you can’t restrict its action to integers.

It turns out this problem is fixable, with Rank-2 typeclasses, but don’t hold your breath for that to land in Prelude any time soon. In the meantime, you can’t use Functor to represent functors with domain type restrictions.

(*) Many smart Haskellers believe (and I agree with them(+)) map should work on all functors and fmap should be retired. There’s a general theme here: the standard libraries are showing their age and need work to simplify them.

(+) If you’ve never seen Alan Rickman play Obadiah Slope, you’re missing out.

(#) If you’re prepared to lose information, all you really need is reduce/Foldable anyway.

Functors: Category Theory Stuff

Functors : Category Theory Stuff

If you ever want to talk the same language as smart Haskellers, you need to know a bit of category theory. Here’s some notes on how I understand category theory right now.

The first thing to to appreciate is a list isn’t a functor, “list” is a functor. In particular, it’s a mapping from one type to another e.g. int to list of int. Furthermore, it’s a mapping that preserves the structure of int, in that performing “map” works.

Considered this way, there’s no such thing as a “higher order type”, there’s just functions from one type to another. Types with more than one type parameter in Java/C# are just multiple arity functions on types.

Some other things that are worth considering: you can make a list of any type, even a list. Not only that, but if a and b are different types, list of a and list of b are different as well. So, in maths terms, it’s an injection of the type space onto a subsection of the same type space.

What the heck is a category?

Now, let’s go back to the start and talk terminology. A category is a bunch of “objects” and “arrows” between them. They behave basically like values and functions. Indeed, values and functions form a category. The only real requirement is that arrows compose like functions and that there’s an identity map that does nothing.

In the context of type theory, the objects are the types themselves. The arrows are higher order type constructors. Just like normal functions, they’re not reversible. Now let’s make a bit weirder. Just the lists and the functions between lists and other lists form a category too.

The next bit may or may not make sense if you don’t have a maths background. Mathematically, a functor isn’t anything to do with types at all, it’s just a mapping between one category and another that preserves some structure.

Wait what?

Let’s think of a really simple category. Let’s have the objects be integers and the arrows be rotations of integers e.g. add three, subtract two. And “add zero” is an identity map.

Now let’s have another one which is the same, only all of the numbers and rotations are even. Then “times two” maps objects and functions between the two categories. So 3 becomes 6 and “add 3” maps to “add 6”. And finally, “add zero” becomes… “add zero”. So “times two” is a perfectly valid functor that is absolutely nothing to do type theory at all.

Finally, a small note, if you’re just looking at category theory for the purposes of understanding Haskell you’ll come across the phrase “locally small” a lot. Every last category you are ever going to worry about is locally small, so don’t sweat it.

Not a Haskell Monad Tutorial: Functors

One of the things that people new to Haskell may not appreciate is that academia’s love affair with monads has been waning for some time. In its place is a more nuanced hierarchy of Functor, Applicative, Monad. (*)

So what the heck is a functor? Well, really it’s just something you can map over and it makes sense. “Makes sense” has a specific mathematical meaning, but I’m going to gloss over it and keep going.

Let’s talk about some things you can map over:

  • A list, [a]
  • A set, Set a
  • A nullable value, Maybe a
  • A random value, Rand StdGen a
  • A function returning a value, (->) a
  • A program returning a value, IO a (We’ll ignore this one from now on, I’m just mentioning it because IO is kind of important.)
  • A record, where we only map over one of the fields. e.g. imagine a pair (x,y) where x and y are different types

So, if you were mapping “add one” you’d get

  • A list where all of the values were one larger e.g. [2,4] becomes [3,5]
  • A set where all of the values were one larger e.g. #(2,4) becomes #(3,5).
  • A value one larger, or null. So null becomes null, and 3 becomes 4.
  • A random integer value between 1 and 6 becomes a random integer value between 2 and 7.
  • A function f(x) becomes a function g(x) where g(x) = f(x) + 1
  • The pair (x,y) becomes (f(x),y)

I’ve found that while getting your head around the concept, it’s best to just concentrate on nullable values and lists. In particular, if you’re familiar with Clojure or LINQ, it’s about understanding that things like nil-punning and fnil are exactly the same concept as map and behave the same way.

Just to complicate matters, the set example doesn’t actually work in Haskell, but I’ll get to that in a later post.

My Head is Hurting

If you want to learn the stuff I’m saying properly, go do Brent Yorgey’s Introductory Haskell course and do the exercises. It’s a significant time investment, but well worth it.

(*) This is a gross over-simplification, so sue me.

Evaluating Clojure Libraries

My last post on Clojure template libraries needs updating already, but before I do that, I’d like to jot down some notes on how I try to evaluate libraries. Ultimately, it’s actually hard to accurately evaluate a library without using it in anger. Sadly, by that point you tend to be committed. Try to find someone who uses Rails heavily to tell you that it’s not fit for purpose.

There’s already ten or so HTML template libraries. You’re not going to use them all before committing, so you need a way to choose which you’re going to try first.

  1. The first question, of course, is: does it do what you need? It’s usually worth browsing the documentation beyond the first couple of paragraphs of the readme. What you mean by “rest middleware” could be very different from the author’s understanding of the term.
  2. Is it correct? Clojure is still in its amateur phase. When I looked at node postgres libraries, only one was actually capable of querying the first system table I tried. Unit tests are a good signal. (This is a problem with REPL driven development: there’s no evidence after the fact you did it.)
  3. Is it simple/composable? Can you vary what it does? Macros, sadly, are a bad sign. If no part of the API takes a function as a parameter, it’s probably not that flexible. Calling yourself a framework is a red flag.
  4. Is it fast? Given the choice between two otherwise identical libraries, you’ll always prefer the faster. Usually, all you can really tell from the documentation is whether or not the author cares about speed.
  5. Does it have a community? Large numbers of committers, even large numbers of issues on github are usually a good sign. These make it more likely that the library will continue to evolve as your requirements change.
  6. Can I change it? This one is horribly underrated. At this point, there’s still a good chance that you’ll find something you want to submit as a pull request. Take a browse of the code and see if it’s a code base you can work with. Watch out for libraries that are really Java libraries, like mustache.clj.

Clojure Web Stack: Server Side HTML Generation

I’m going to try to outline your current choices when generating HTML in Clojure.  To enable you to skip the entire article: there are no bad libraries, but they’re very different and suitable for different things.  If you need to, there’s nothing stopping you using multiple technologies in different parts of the program.

I’d be lying if I said I was an expert on any of these technologies, so please feel free to correct me. 

Hiccup

If there’s a default stack for Clojure, it’s the work done by James Reeves.  Hiccup is an HTML DSL for Clojure, like HAML but everything is valid Clojure.  The principal advantage of doing thing is this way is the ability to build things on top of it: if you want to create an API for standardized HTML components e.g. a form library, Hiccup’s your friend.

With Hiccup, you’re going to be ready to roll in about ten seconds after you’ve read the documentation, and it’s extremely fast.  (I’ll let someone else run the micro-benchmarks.)  However, it’s a bit fiddly, precisely because it’s very low level and macro-friendly.

Enlive

If James Reeves is the Beatles, Christophe Grande is the Rolling Stones of Clojure Web Development.*  Enlive is the most insanely fully featured project here.  For instance, the latest version contains a helper with the entire functionality of Hiccup.

Enlive is properly an HTML parsing and transformation engine based on JSoup, but it contains a capable templating solution.  Here you provide external files that are valid HTML with no additional markup It then performs transformations on nodes you identify using a CSS-like syntax.  The transformation you’re going to be using most often, of course, is to insert some text or HTML, but you can do arbitrarily smart things here.

Enlive does have an acknowledged weakness: the documentation.  There’s some good introductions, but Enlive is ridiculously deep.  However, if you the time in, you’ll find it’s incredibly powerful and elegant.

Laser

Laser is a new project by the talented Anthony Grimes. It aims to take the best parts of enlive and put them into a simpler package.  The syntax is more verbose, but fully composable (everything’s a function).

It also has the ability to use unparsed HTML in its output, which Enlive discourages.

Fleet

Neither Hiccup nor Enlive are very close to a traditional templating engine as you’d expect to find in other languages.  It seems like node has as many templating solutions as it has developers.  Clojure, at the time of writing, basically has two.

Fleet is a classic “mix code with your markup” templating language.  You can insert arbitrary functions into your HTML, like class ASP and ERB.  It also has a host of support functions, including the ability to create namespaces on the basis of a directory of templates.

Clostache

Clostache, on the other hand, is a mustache implementation.  It is programmable, but only in limited and well-defined manners.  Whilst Fleet has an extensive API, Clostache declares only two methods: render and render-resource.

Choosing an Engine

When choosing between node libraries, the principal question was “does it work?”.  And the answer was, typically, “not after you’ve been using it for half an hour”.  Clojure libraries just aren’t like that.  All of them are good at what they do.

The biggest question here is: what’s your approach to templating?  If you believe in pure HTML templates, Enlive or Laser is for you.  If you want lots of code in your HTML, you’re going to want Hiccup or Fleet.  If you’re looking for somewhere in between, Clostache is worth a look.  And, as I said before, you can always use one solution for one part of your system and another elsewhere.  That said, the different systems aren’t composable with one another, so be very clear as to what you’re using each for.  (I’m pretty sure this is why Enlive has added hiccup-style generation.)

For what it’s worth, my current project uses Clostache for straight forward page serving and Enlive for HTML reprocessing, although I’m thinking about Laser.  To date, I’ve found it pretty easy to change my mind about which libraries to use.  I’ve never found that with any other platform.  I’m ascribing that to the design of Clojure and the aesthetic of the community, and it’s a huge win.

(NB For some reason my blog won’t take comments from Chrome. I promise I’m working on it, but it’s taking a while.  A long while.)

Clojure has a Problem with Async

Clojure, like node.js, is a very opinionated platform.  The funny thing is that almost every opinion is different. 

Clojure embraces Java as a platform. 

  • Originally, every declared identifier was overrideable on a per-thread basis. 
  • There’s many features (e.g. Futures and Reducers) that allow you embrace multi-threading at a high level.
  • Data is immutable.
  • Data is globally shared between threads.
  • It adds STM to Java’s already extensive thread-synchronization primitives.
  • Everything’s a function. 

Node, conversely embraces Javascript

  • It’s aggressively single thread and asynchronous.
  • If you want another thread, you’ll have to start another process.
  • Everything’s mutable, even class definitions.
  • Share data between processes?  I hope you like memory mapping.
  • Synchronization barriers?  You don’t need them. 
  • Everything’s an event with a callback.

Clojure and Node have completely different sweet spots: clojure is truly excellent at computation, node at IO.  Like it or not, multiple threads aren’t really a good solution to blocking IO solutions.  Which is a pity, because all the main Clojure libraries feature blocking IO (e.g. clojure.java.jdbc, ring).  That’s not to say there isn’t some amazing stuff being done in Clojure, just that it could be even better.

JDBC is an interesting case because it’s a Java problem that works its way through to Clojure.  Node.js made a virtue of being the only API on a new platform.  However, it introduces a couple of oddities of its own.  For instance, the jdbc library can only have one open database connection at once.  Usually the case, but sometimes undesirable (try performing a reconciliation of a million records between two databases).  To some extent, this is a hangover of Clojure being envisaged as an application language that used libraries written in Java. 

There’s nothing stopping you from writing Clojure code in a node-like style, as long as you’re prepared to write your own web-server (Webbit, Aleph) and DB libraries (er… no-one).  Equally, implementing a feature like co-routines wouldn’t actually be that hard, but you’d lose bindings, which is a problem for any library that assumes that they work.  And you’d still need all of your libraries to be async.

For all these reasons, I don’t think we’re going to be seeing a proper Clojure async solution any time soon.  Ironically, I think it’s the complete absence of async DB libraries that is really holding it back.  Without that, solving most of the other things isn’t really that useful.