Code Fatigue: Performance Edition

It’s a truth universally acknowledged that sometimes you have to compromise your coding practices in order to eke out more performance.  However, I think it’s not appreciated how often just writing code well actually results in better performance.  For instance, I’ve seen more systems that would be faster with a more normalized schema in my career than systems that were crying out for denormalization*,

Let’s see how the two implementations of triangular number calculation from the last post compare in terms of performance.  In my unscientific testings:

  • The naive recursive algorithm calculates (t1 25000) in about 3 milliseconds.
  • The threaded sequence processing algorithm calculates (t2 25000) in about 5 milliseconds.

Well, it’s not looking good for readable code right now.  Although the fact that t1 can’t calculate (t1 26000) without blowing the stack indicates that it has bigger problems. 

Jean-Louis suggests an alternative: the tail-recursive implementation**

(defn t3 [n]
(loop [n n result 0] (if (zero? n) result (recur (dec n) (+ result n)))))

Let’s compare this to the original r1 in terms of readabilty:

  • We’re now using clojure functions zero? and dec, which requires a bit more clojure knowledge but reduces noise (and hence code fatigue)
  • We’re using a well-known accumulator pattern.  This doesn’t impose a cost on anyone familiar with how recur is used.
  • We’re still dealing with a branch and a recursive statement that interact.  On the other hand, the recur expression now only goes two deep.
  • We’ve got to keep track of positional parameters.  Not really 

So, in terms of readability, it’s better than t1 but we’d still prefer t2.  However, the figures speak for themselves: t2 can calculate (t3 25000) typically in about 1.5 milliseconds.  As I said at the start, sometimes the most elegant solution isn’t the fastest.  I doubt I could squeeze much more out of the sequence processing approach, although in the real world the application of fork-join reducers may outperform any explicit TCO solution.  However, is it possible to create a much faster implementation if we simply allow ourselves to assume more knowledge?  This time, though, we need more knowledge of our business domain (mathematics) than of our programming environment.

(defn t4 [n] (/ (* n (inc n)) 2))
(defn t4threaded [n] (-> (inc n) (* n) (/ 2)))

That calculates (t4 25000) in 0.1 milliseconds.  Needless to say, it can handle numbers much larger than any of the other implementations without breaking a sweat.  From a readability perspective, again, it’s excellent providing you have the required background knowledge.  The last example may feel like cheating, but the truth is that solutions like that are pretty common in the real world: insight into your problem area can radically simplify your code and improve your performance.

SIDEBAR:  I’m afraid I’m having trouble with the comment system on this site.  My apologies if this is affecting you (it isn’t affecting everyone).  It’s certainly affecting my ability to respond in anything other than a full post.

*although there is a special circle of Hell reserved for key-value pair implementations on SQL

**I should point out this differs stylistically from Jean-Louis’ original implementation, but the performance characteristics are the same.

Published by

Julian Birch

Full time dad, does a bit of coding on the side.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s