Principled Casts in Haskell

TL;DR I continue trying to implement a routing library, but instead end up learning about Typeable, writing about orphan instances, reading and (so far) failing to understand type-magic and sending my first Haskell PR.

I remember when I was starting Clojure, one of the big catchphrases was that everything was opt in. A type system, inheritance, multiple dispatch, &c. On the other hand, there were actually plenty of things that weren’t opt-in: Java itself, polymorphism, reflection and so on.

Haskell is another opt-in language. The basic type system and language is a requisite, but there’s still a phenomenal number of things to opt into. Equality is opt in, Hashable is opt in, as we saw in the previous article, polymorphism through existential types is opt-in. Next, we’re going to see “opt-in” type casts, and hopefully you’ll see how they’re better than what you can achieve in Java or C#.

{-# LANGUAGE TypeSynonymInstances #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE GADTs #-}
import qualified Network.Wai as Wai
import qualified Network.HTTP.Types as H

So, the question I asked last time was, how can I tell if two RequestConditionBoxs are equal? To do that, we’re going to want to make RequestConditions themselves implement equality.

(As an aside: the whole of the functionality of the last post might have been better implemented using good old functions or possibly the reader monad. However, I always wanted the conditions to be instances of Eq and Show. That’s not going to be possible with that approach.)

class (Show rc, Eq rc) => RequestCondition rc where
  isMatch :: Wai.Request -> rc -> Bool
data RequestConditionBox where
  RC :: (RequestCondition rc) => rc -> RequestConditionBox
  deriving (Show)

Oops, that isn’t going to work: you can’t derive Show on a GADT. So delete it. We’ll need to implement Eq and Show for RequestConditionBox. (I’m going to skip Show.)

instance Eq RequestConditionBox where
  (==) (RC a) (RC b) = a == b

Small problem: a and b are different types. And Eq only allows you to test that two members of the same type are equal. We need some way of checking that the two types are equal. Now, you can test for type equality in a type precondition but I can’t see how I could make that work. We need something more like

testEqual :: (Eq a, Eq b) => a -> b -> Bool

Only right now we have no idea how to implement it.

He’s My Typeable

George Pollard pointed me to an experimental class called Typeable. As I alluded to earlier, it’s opt-in, although I think the opt-in nature is more to do with the fact that it’s not standardized yet than that there are types that can’t logically have a typeclass instance.

Typeable looks like a pretty unpromising typeclass:

typeRep# :: Proxy# a -> TypeRep

Actually, it’s more than just unpromising, it looks positively hostile. What are those hashes? Well, it turns out that hash is a valid character in an identifier if you enable the MagicHash extension. As a convention, GHC uses it to represent unboxed types. Unboxed means exactly the same thing as it does in C# and Java: something that doesn’t have a garbage collected pointer around it. This is a very deep rabbit-hole that I’m just going to carefully step around right now.

Actually, I’m going to skip pretty much everything except to notice that Data.Typeable exports a rather useful function called cast.

cast :: (Typeable a, Typeable b) => a -> Maybe b

Yep, that’s exactly what as does in C#. I’ll skip over the implementation, because it’s slightly scary and I’d need to get into unsafeCoerce. One thing I can’t tell is if this code is actually run at runtime or whether it’s possible for the compiler to optimize it out. After all, the types of a and b are known at compile time.

With that, we can actually test if two values of different types are equal:

testEqual :: (Typeable a, Eq a, Typeable b, Eq b) => a -> b -> Bool
testEqual x y = fromMaybe False $ (== x) <$> cast y

Reading from right to left:

  • cast y
  • map (<$>) the maybe with (== x)
  • this gives us Nothing if x and y are different types, and Just (x==y) if they’re the same.
  • finally, we use fromMaybe to strip off the Just and replace Nothing with False

Orphan Black

To use testEqual, we need to make our RequestConditions typeable

class (Typeable rc, Show rc, Eq rc) => RequestCondition rc where
  isMatch :: W.Request -> rc -> Bool

How do we implement it? Well, we don’t. Typeable is special. Not only is it derivable, the compiler requires you use the deriving version. And that needs an extension:

-- Put this up at the top
{-# LANGUAGE DeriveDataTypeable #-}
newtype And rc = And [rc] deriving Typeable

Unfortunately, H.HttpVersion doesn’t implement Typeable. Luckily we can implement it ourselves. But, you guessed it, we need another extension:

-- Put this up at the top
{-# LANGUAGE StandaloneDeriving #-}
deriving instance Typeable H.HttpVersion

We’re probably alright here, but what we’ve done is, in general, ridiculously dangerous. We’ve implemented an instance in a library that is neither the library that declares the typeclass, nor the library that declares the type. This is known as an orphan instance and will have seasoned Haskellers gathering with torches and pitchforks around your codebase. The reason for this is that, while typeclasses provide the power of ruby’s mixins, orphan instances provide the problems. (They call it “incoherence”, and they mean it.)

While we’re on the subject, you’ll probably have already noticed that when you add projects into your cabal file, you pull in the world, Maven style. This is pretty horrific, but the reason for this is orphan instances. For instance, the functionality of the semigroups package looks pretty small: it just exposes a couple of typeclasses. But when you take a look at what is an instance just of Semigroup you’ll see a whole list of types that the semigroups package needs just to compile. Semigroups itself has defines to try to ameliorate this situation but the truth is that it’s just too much work (at least given cabal in its current design) to enforce small dependency lists and coherence.

Long story short, it’d probably be best to just expose Typeable from the library, so I’ve sent a pull request. (As everyone knows, open source software collaboration is a variable experience. But even at my beginner level, it is possible to make small contributions.)

The Equalizer

Remember last time I mentioned that we could destructure existential types? Now we can actually use this.

equalRC1 :: RequestConditionBox -> RequestConditionBox -> Bool
equalRC1 (RC a) (RC b) = testEqual a b

That looks pretty promising. But we haven’t handled the case where a or b are themselves a RequestConditionBox

equalRC2 :: RequestConditionBox -> RequestConditionBox -> Bool
equalRC2 a1@(RC a) b1@(RC b) = eq3 (cast a) (cast b)
  where eq3 (Just x) _ = equalRC2 x b1
        eq3 _ (Just y) = equalRC2 a1 y
        eq3 _ _ = testEqual a b

Well, that’s kind of fun, but an alternative formulation is arguably better:

infixl 3 <|!> -- Left associative, use same precedence as <|>
(<|!>) :: Maybe a -> a -> a
(<|!>) = flip fromMaybe
equalRC4 :: (Typeable a, Eq a) => a -> RequestConditionBox -> Bool
equalRC4 x (RC y) = a <|> b <|!> c
  where a = equalRC3 x <$> (cast y)
        b = equalRC3 y <$> (cast x)
        c = testEqual x y

<|> is a fairly general function, but in general it means “take the first valid parameter”. Its type is

(<|>) :: Alternative f => f a -> f a -> f a

Here, just remember that Maybe is an Alternative. I’ve also introduced my own infix operator <|!> to get me out of Maybe land. (Hey, I don’t even need an extension for this!)

We now have a vastly better implementation of Eq:

instance Eq RequestConditionBox where
  (==) == equalRC4

(Aside: there’s a very interesting looking function in Data.Typeable called gcast that I thought could be useful here, but I couldn’t figure it out, so everything here stays at the cast level.)

Designed By An Idiot In London

Let’s load what we’re got into a REPL.

> :m + Main
> :m + Data.List
> :m + Network.HTTP.Types
> let td = [RC methodGet, RC methodGet, RC (RC methodGet), RC http10, RC http11]
> nub td

gives us

[RC "GET",RC HTTP/1.0,RC HTTP/1.1]

Well, that’s demonstrated that Eq works. But it also demonstrates something else: Eq isn’t actually what we wanted in the first place. Really we want to be unifying to [RC "GET",RC HTTP/1.1]. To do that, we’re going to have to rip up everything we’ve done so far and start again.

FOOTNOTE: Elise Huard pointed me to the AdvancedOverlap page on the wiki, which details techniques for branching your code by typeclass rather than type. In practice, I decided to just make everything an instance of Eq, which isn’t so much of a problem given the problem domain I’m working within.

A Route To Learning The Haskell Type System

TL;DR I start trying to write a library and get sidetracked into learning about Haskell’s type system.

So last time, I talked about Wai and how you could use it directly. However, if you’re going to do that, you’ll need a routing library. So, let’s talk about how we could build one up. One of the first things you’d need to do is to provide simple boolean conditions on the request object.

It turns out that this raises enough questions for someone at my level to fill more than one blog post.

{-# LANGUAGE TypeSynonymInstances #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE GADTs #-}
import qualified Network.Wai as Wai
import qualified Network.HTTP.Types as H

So, how should we define conditions? Well, the Clojure model of keyword and string isn’t going to work here, because the Wai.Request object is heavily strongly typed. So how about instead we just use the expected values and deduce the key from the type?

So, we’re going to want to implement the same method for several different types. There’s several different ways of doing that: * Create a union/enum class. This is a good approach, but not extensible. * Create a typeclass, which is extensible. * Create a type family, which is also extensible, but I don’t really understand.

You Can’t Buy Class

With that in mind, let’s create our first typeclass!

class RequestCondition rc where
  isMatch :: Wai.Request -> rc -> Bool

So, in English this says “If the type rc is a RequestCondition then there is a method isMatch which takes a Wai.Request and an rc and returns a Bool.” This is pretty interesting from an OO standpoint. The OO representation would look like rc.isMatch(request). A Clojure protocol would change this to (isMatch rc request). In practice, it doesn’t matter: what’s happening is that there’s dynamic dispatch going on on the first parameter.

In the Haskell case, there’s no dynamic dispatch in sight and the first parameter isn’t special. isMatch on HTTPVersion and isMatch on Method are different functions.

We can now implement the RequestCondition for some obvious data types.

instance RequestCondition H.HttpVersion where
  isMatch = (>=) . W.httpVersion

So, here we’ve said “calling isMatch with a HttpVersion as a parameter calls (>=) . W.httpVersion i.e. checks the request is using the version specified. We’d probably need a more sophisticated way of dealing with this if we were writing a real system.

instance RequestCondition H.Method where
  isMatch = (==) . W.requestMethod

This is much the same, with one wrinkle: H.Method isn’t actually a type. It’s a type synonym. In C++ you’d introduce one with typedef, in C# with using. Haskell, because it likes to confuse you, introduces something that is not a type with the keyword type. If you look up method on Hackage you see:

type Method = ByteString

You might wonder why this matters. The answer is that the Haskell standard doesn’t allow you to declare instances of synonyms. You can understand why when you realize that you might have multiple synonyms for ByteString and shoot yourself in the foot. However, for now I’m going to assume we know what we’re doing and just switch on TypeSynonyms in the header.

Let’s do one more, because three’s a charm.

instance RequestCondition H.Header where
  isMatch = flip elem . W.requestHeaders

We’d need (a lot) more functionality regarding headers, but let’s not worry about that now. However, again this will fail to compile. This time H.Header is a type synonym, but a type synonym for a very specific tuple.

type Header = (CIByteString, ByteString)

Problem is, Haskell doesn’t like you declaring instances of specific tuples either. This time, you need FlexibleInstances to make the compiler error go away. To the best of my knowledge, FlexibleInstances is much less of a hefalump trap than TypeSynonyms could be.

For fun, let’s throw in a newtype

newtype IsSecure = IsSecure Bool
isSecure :: IsSecure
isSecure = IsSecure True
instance RequestCondition IsSecure where
  isMatch r (IsSecure isSecure) = W.isSecure r == isSecure

Under Construction

How about when we’ve got multiple conditions to apply? Well, if we were writing Java, we’d be calling for a composite pattern right now. Let’s declare some types for these.

newtype And rc = MkAnd [rc]
newtype Or rc = MkOr [rc]

I described newtypes back in Fox Goose Corn Haskell. Note that there’s no reference to RequestCondition in the declaration. By default, type variables in declarations are completely unbound.

Before we go any futher, let’s fire up a REPL (if you’re in a Haskell project right now you can type cabal repl) and take a look at what that does:

data And rc = MkAnd [rc]
:t MkAnd
MkAnd :: [rc] -> And rc

Yes, MkAnd is just a function. (Not exactly, it can also be used in destructuring, but there isn’t a type for that.) Let’s try expressing it a different way while we’re here:

:set -XGADTs
data And2 rc where MkAnd2 :: [rc] -> And2 rc

(You’ll need to hit return twice) Now we’re saying “And2 has one constructor, MkAnd2, which takes a list of m. The GADTs extension does way more than this, some of which I’ll cover later on, but even then I’m only really scratching the surface of what this does. For now I’ll just observe how the GADTs extension provides a syntax that is actually more regular than the standard syntax.

Incidentally, I could have called MkAnd just And, but I’ve avoided doing so for clarity.

Composing Ourselves

With the data types, we can easily write quick functions that implement the RequestCondition typeclass.

instance (RequestCondition rc) => RequestCondition (And rc) where
  isMatch r (MkAnd l) = all (isMatch r) l
instance (RequestCondition rc) => RequestCondition (Or rc) where
  isMatch r (MkOr l) = any (isMatch r) l

The most interesting thing here is that we haven’t said that And is an instance of RequestCondition, we’re say that it is if its type parameter is an instance of RequestCondition. Since data types normally don’t have type restrictions themselves, this is the standard mode of operation in Haskell.

So, now I can write

Or [H.methodGet, H.methodPost]

and it’ll behave. So we’re finished. Right? Not even close.

What if we wanted to write

And [H.methodGet, H.http10]

It’s going to throw a type error at you because HTTP methods aren’t HTTP versions. If you take a look at the declaration, it says “list of rcs that are instances of RequestCondition” not “list of arbitrary types that are instances of RequestCondition“. If you’re used to OO, (and I have some bad news for you if you’re a Clojure programmer, that means you) this makes no sense at all. If you’re a C++ programmer, this is going to make a lot more sense. You see, when you do that in Java you’re telling Java to call through a vtable to the correct method. Haskell doesn’t have pervasive vtables in the same way. If you want one, you’re going to have to ask nicely.

Pretty Please and Other Existential Questions

What we want, then, is a function that boxes up a RequestCondition and returns a type that isn’t parameterized by the original type of the RequestCondition. What would that function look like?

boxItUp :: (RequestCondition rc) => rc -> RequestConditionBox

Hang on, that looks like the type of a constructor! Except for one really annoying little detail: as I said before, you can’t put type restrictions in data declarations.

Except you can, if you enable GADTs.

data RequestConditionBox where
  RC :: (RequestCondition rc) => rc -> RequestConditionBox

RequestConditionBox is what’s known as an “existential type”. As I understand it that should be interpreted as “RequestConditionBox declares that it boxes a RequestCondition, but declares nothing else”. So its quite like declaring a variable to be an interface.

Since I wrote this, I’ve learned that existential types are indeed very like interfaces in C#/Java: they are bags of vtables for the relevant type classes. They don’t expose their parameterization externally, but destructuring them still gets the original type out. This is bonkers.

It just remains to actually implement the typeclass:

instance RequestCondition RequestConditionBox where
  isMatch r (RC m) = isMatch r m

And now we can finally write

And [RC H.methodPost, RC isSecure]

And the compiler will finally accept it. Not quite as pretty as in an OO language where polymorphism is baked into everything, but keeping the character count low isn’t everything. We’ve traded implicit polymorphism for explicit polymorphism.

So we’re done, right? Well, we could be, but I want to go further.

The Power of Equality

If you take a look, what we’ve built looks very much like a toy interpreter (because it is one). What if we wanted a toy compiler instead? In particular, imagine that we really were building a routing library and we had thousands of routes. We might want to only check any given condition once by grouping, for example, all of the GET routes togther.

Now, you could leave that to the user of the library, but let’s pose the question: given two RequestConditions, both of which may be composite, how do you determine what conditions are common between the two?

One route is to backtrack, and look at HLists. I think that’s probably an extremely strong approach, but I really haven’t got my head around the type equality proofs-as-types stuff. Another approach is add some stuff to RequestCondition to track the types in some way. It turns out there’s a way to get the compiler to do most of the work here, so I’ll talk about that next time.

FOOTNOTE: On the Reddit discussion it was pointed out that RequestConditionBox is an example of the existential type anti-pattern. To summarize: if all you’ve got is a bunch of methods, why not just have a record with those methods as properties? If all you’ve got is one method, why not just use a function.

This is a completely valid criticism of the code in this post as a practical approach. However, we wouldn’t have learned about existential types in the first place, and we couldn’t make functions implement Eq and Show. Implementing Eq is the subject of the next post.

The commenter also added an elegant implementation of the functionality given above in terms of pure functions.

EDIT: Lennart Augustsson clarified that existential types do indeed construct vtables. So “boxing” something in an existential type is very like casting a struct to an interface it implements in C#. I should also clarify that the word bonkers used in the above text was meant as a good thing. 🙂