Chuck Norris Understands Concurrency

…but I sure don’t.  After promising that I had posted my very last article on the subject of load balancing, I spent another three months tweaking the code I’d put up in production.  The gist is now pretty close to the production code (there’s a couple more Console.Writes so you can see what’s going on).  So, here’s what I learned:

QueueChannel and the NServiceBus distributor are dumb for a very good reason: even small, rare failures in the distribution code can be horribly fatal.  This I knew intellectually, but not in my gut.  The code now has an amazing amount of defensive code around AssignShardsToPendingQueueItems.  Furthermore, it’s got a channel for reporting errors in the sharding.  Obviously, you have two choices when such a piece of code fails: you can kill processing or attempt to keep going.  ShardingChannel attempts to keep going, but it’s a decision everyone has to make for themselves.

I had to simplify the locking.  Taking out multiple finer-grained locks turned out to slow down the system.  Now there’s one lock on pendingQueue, the list of items that have yet to be assigned.

Receiving whilst processing can present challenges of its own.  The wakeup code is appreciably different.  Publishing a message now wakes at most one thread.  Since the consumers now track whether they are sleeping, this can be done without locks.  This prevented a situation in which a large number of empty consumers receiving messages that processed quickly could actually choke the channel so that it couldn’t even receive messages.
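To make the "wake at most one sleeper, without locks" idea concrete, here is a minimal sketch.  It is not the gist's code: SleepyConsumer, Waker and every member name are made up for illustration, and I'm assuming a per-consumer flag flipped with Interlocked plus a semaphore to do the actual waking.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical consumer-side state; not taken from ShardingChannel.
public sealed class SleepyConsumer
{
    private int _sleeping;  // 0 = running, 1 = sleeping
    private readonly SemaphoreSlim _wakeSignal = new SemaphoreSlim(0);

    // Exposed only so the sketch can be inspected from outside.
    public int PendingWakeups { get { return _wakeSignal.CurrentCount; } }

    // Consumer side: announce the intention to sleep.  A real consumer has to
    // re-check for work after this call and before WaitForWake, or it can miss
    // a message published in between.
    public void MarkSleeping() { Interlocked.Exchange(ref _sleeping, 1); }

    // Consumer side: block until a publisher has claimed this sleeper.
    public void WaitForWake() { _wakeSignal.Wait(); }

    // Publisher side: atomically flip sleeping -> running.  Only the caller
    // that wins the flip signals, so each sleep is answered by one wakeup.
    public bool TryWake()
    {
        if (Interlocked.CompareExchange(ref _sleeping, 0, 1) != 1) return false;
        _wakeSignal.Release();
        return true;
    }
}

public static class Waker
{
    // Wakes the first sleeping consumer found and stops; wakes nobody if
    // every consumer is already running.
    public static bool WakeOne(IEnumerable<SleepyConsumer> consumers)
    {
        foreach (var consumer in consumers)
        {
            if (consumer.TryWake()) return true;
        }
        return false;
    }
}
```

A publisher would call Waker.WakeOne(consumers) after enqueuing a message.  Busy consumers are skipped with a single atomic read each, which is the property that stops a crowd of idle consumers from stampeding the channel.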

Sometimes you should use degenerate data representations.  This violates DRY and makes your code hard to keep correct but sometimes, that doesn’t matter.   The mapping of items to queues is now significantly more complex:

  • A dictionary that maps all shards to the list of items for the shard.
  • A queue of (shard, list) pairs, in the order that the shards were created.  Only inactive shards appear in this queue.
  • A hashset of active shards.

All three of these were needed to keep the code running fast.

Finally, and most surprisingly, if a consumer finishes with a shard, it now asks the channel if there are any more messages for that shard.  This change has produced a phenomenal performance improvement in the target system.  In particular, on the test system it went from being IO-bound to CPU-bound.  If there’s a lesson to be learned here, it’s that disk caches really do matter.
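The three structures and the "any more for this shard?" check might fit together roughly like this.  It's a sketch under assumptions rather than the production code: ShardIndex, Add, TryTakeOldest and TryTakeMore are names I've invented for the example, shards are plain strings, and everything sits behind a single lock.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the redundant bookkeeping; not the ShardingChannel source.
public sealed class ShardIndex<TItem>
{
    private readonly object _gate = new object();

    // 1. Every shard with pending items, active or not.
    private readonly Dictionary<string, List<TItem>> _itemsByShard =
        new Dictionary<string, List<TItem>>();

    // 2. Inactive shards only, in the order their lists were created.
    private readonly Queue<KeyValuePair<string, List<TItem>>> _inactive =
        new Queue<KeyValuePair<string, List<TItem>>>();

    // 3. Shards currently owned by a consumer.
    private readonly HashSet<string> _active = new HashSet<string>();

    public void Add(string shard, TItem item)
    {
        lock (_gate)
        {
            if (!_itemsByShard.TryGetValue(shard, out var items))
            {
                items = new List<TItem>();
                _itemsByShard.Add(shard, items);
                // An active shard already has an owner, so it is not queued.
                if (!_active.Contains(shard))
                    _inactive.Enqueue(new KeyValuePair<string, List<TItem>>(shard, items));
            }
            items.Add(item);
        }
    }

    // An idle consumer takes the oldest inactive shard and becomes its owner.
    public bool TryTakeOldest(out string shard, out List<TItem> items)
    {
        lock (_gate)
        {
            if (_inactive.Count == 0)
            {
                shard = "";
                items = new List<TItem>();
                return false;
            }
            var pair = _inactive.Dequeue();
            _itemsByShard.Remove(pair.Key);
            _active.Add(pair.Key);
            shard = pair.Key;
            items = pair.Value;
            return true;
        }
    }

    // The owner of a shard asks for anything that arrived while it was busy.
    // Returning false releases the shard.
    public bool TryTakeMore(string shard, out List<TItem> items)
    {
        lock (_gate)
        {
            if (!_active.Contains(shard))
                throw new InvalidOperationException("Shard is not active: " + shard);
            if (_itemsByShard.TryGetValue(shard, out var pending))
            {
                _itemsByShard.Remove(shard);
                items = pending;
                return true;
            }
            _active.Remove(shard);
            items = new List<TItem>();
            return false;
        }
    }
}
```

A consumer would loop on TryTakeMore until it returns false before going back to TryTakeOldest, so that work for a shard it has just touched stays with it.  The cost is that the same fact (which shards are pending, and who owns them) lives in three places that must be kept in step.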

Sometimes, Complexity Wins

There was one other change: the wait until empty method is now an empty event.  The interesting thing about this change was that it was the only one that relates to the way we usually discuss code quality: I slightly reduced the responsibilities of the class.  Pretty much all of the other changes made the code harder to understand and pretty much none of the performance improvements were susceptible to naive complexity analysis.

Tuning code for performance is fascinating.  It’s very rarely the same problem twice, and it can be a constant challenge to unlearn your standard “best practice” responses. 

  • Complexity analysis is fine, but the exact implementation of the algorithm matters.
  • SOLID principles are fine, but they’re just part of a larger trade-off.
  • Sometimes the more complex version of the code is better than the simple one.
  • You’ve got to actually understand the machine upon which the code executes.

Vim Windows Paths Can be Case Sensitive

I just spent 15 minutes working this one out. I’ve got a machine called “Julian-Lucid”. I’ll let you guess what operating system it’s running. However, my Windows copy of Vim wouldn’t allow me to cd to the directory. Typing in “:cd \julian-lucid” gave me this error:

E344: Can't find directory "\julian-lucid" in cdpath
E472: Command failed

Long story short, Vim on Windows can be case-sensitive even when the path isn’t. “:cd \JULIAN-LUCID” actually works.

Design Patterns: Factories are a Function of the Consumer

Sometimes I think I spend my time on this blog just trying to explain the thinking of people much smarter than me.  This is definitely one of those cases.  I’ve probably learned more about programming from reading the Retlang source code than any other.  It’s both remarkably free of compromises and easy to read.  Sometimes you learn not from any individual snapshot, but from the evolution of the code base.  For instance, here was the process context factory in Retlang 0.3:

    public interface IProcessContextFactory : IThreadController, IObjectPublisher
    {
        IMessageBus MessageBus { get; }
        IProcessContext CreateAndStart();
        IProcessContext Create();
        IProcessContext CreateAndStart(ICommandExecutor executor);
        IProcessContext Create(ICommandExecutor executor);
        IProcessContext CreateAndStart(string threadName);
        IProcessContext Create(string threadName);
        IProcessContext CreateAndStart(ICommandExecutor executor, string threadName);
        IProcessContext Create(ICommandExecutor executor, string threadName);
        IProcessBus CreatePooledAndStart(ICommandExecutor executor);
        IProcessBus CreatePooledAndStart();
        IProcessBus CreatePooled(ICommandExecutor executor);
        IProcessBus CreatePooled();
    }

Even ignoring the extra inherited interfaces, this is a pretty big interface.  There are 12 different ways to create 2 different objects.  Now, when I decided to use this, I created my own interface:

    public interface IRetlangFactory : IDisposable, IHaltable, IChannelCreator
    {
        ICommandTimer CreateCommandTimer(bool isHaltable);

        ICommandQueue CreateContext();

        ICommandQueue CreateContext(ICommandExecutor executor);

        void Stop(ICommandQueue queue);

        void Stop(ICommandTimer timer);

        void WaitForQueues(IEnumerable<ICommandQueue> queues);
        void WaitForTimers(IEnumerable<ICommandTimer> queues);
    }

So, it’s a smaller interface that does more.  I’m not holding this up as an example of great design: it understands ISP about as well as I understand Xhosa.  But from 12 methods to create objects, I’ve got only three.  What isn’t obvious is that things like CreateContext always start a queue at the same time.

So why is this interface so much smaller than the Retlang one?  It’s pretty simple:  I knew which methods I needed.

Still Getting Schooled

This might seem like an insurmountable problem for a library writer, but it isn’t.  By Retlang 0.4, the interface simply wasn’t there.  Graham Nash had resolved the problem with a couple of hints from Alexander the Great.  So, in the old days my factory called through to Retlang’s factory.  My standard “make a thread” method now reads like this:

    var fiber = new ThreadFiber(new DefaultQueue(_haltableExecutor));
    fiber.Start();
    return fiber;

The haltable executor is actually from my very first post from two and a half years ago.  Of course, if I hadn’t wanted to change the default behaviour, I could have just used the empty constructor, because that implements some sensible defaults.  This technique is decried in some quarters as being “poor man’s IoC”, but I fail to see the problem.  You can write your own factory, or configure a DI container to do the construction for you.  Either way, you’ve lost no flexibility.
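Here's a minimal sketch of the shape I mean, with nothing borrowed from Retlang: IExecutor, DefaultExecutor, HaltableExecutor, Worker and MyWorkerFactory are all hypothetical names invented for the example.  The library type offers an empty constructor with a sensible default plus a constructor that accepts the dependency, and the one-method factory lives in the consumer's code.

```csharp
using System;

// --- "Library" side (hypothetical types) ---
public interface IExecutor
{
    void Execute(Action action);
}

public sealed class DefaultExecutor : IExecutor
{
    public void Execute(Action action) { action(); }
}

public sealed class Worker
{
    private readonly IExecutor _executor;

    // Empty constructor: sensible defaults for callers who don't care.
    public Worker() : this(new DefaultExecutor()) { }

    // Full constructor: callers who do care pass their own executor.
    public Worker(IExecutor executor)
    {
        if (executor == null) throw new ArgumentNullException("executor");
        _executor = executor;
    }

    public void Run(Action action) { _executor.Execute(action); }
}

// --- Consumer side ---
// An executor the application can switch off.
public sealed class HaltableExecutor : IExecutor
{
    public bool Halted { get; set; }

    public void Execute(Action action)
    {
        if (!Halted) action();
    }
}

// The factory is written by the consumer, so it only exposes the single
// creation path this application needs.
public sealed class MyWorkerFactory
{
    private readonly HaltableExecutor _haltable = new HaltableExecutor();

    public Worker Create() { return new Worker(_haltable); }

    public void HaltAll() { _haltable.Halted = true; }
}
```

The library never has to guess which combinations of arguments its callers want.  A DI container could be configured to call the same two-argument-free constructor pair, so nothing is lost by leaving the factory interface out of the library.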

So, I think all in all, there’s no point in defining a factory interface within your library unless you’re defining an injection point for your own code.  Controller Factories in ASP.NET MVC are an example of this.  On the other hand, it’s hard to see why IWindsorContainer exists. 

FOOTNOTE:  For design patternistas, you’ll observe that I’m making no distinction between factories, abstract factories or builders.  Truth be told, I’m not seeing many codebases that bother either.

What’s an architect?

I was recently watching Dan North’s latest presentation on InfoQ.  Right near the start, he pops up a Dreyfus ladder, leading to expert programmer and then… novice architect.  Now, I’m a fully paid up member of the school of “once you’ve mastered something, it’s time to go find something you suck at”.  However, I was thinking about the various things we mean by architect.  Looking back, I’ve been described as an architect a few times and met many others described as such.  But the roles didn’t have very much in common.  Here’s a quick taxonomy of software architects:

The Policeman

For some people, the architect is the guy who insists that everything is done on Oracle, regardless of context.  Or writes documents that say things like “All IPC must go through SOAP”.  This guy is the policeman of the company’s software infrastructure.  Typically, you’ll come up with a design and he’ll have to “validate” it before you build it.  Usually, you’ll find that this process and this role treats innovation as a defect.  In general terms, the very existence of this job is a sign your employer doesn’t trust her developers.

I’ve been this guy.  Luckily I had the opportunity to get this out of my system when I was 24.  Apart from anything else, getting talented developers to toe the line is a value-destroying exercise.

The Numbers Guy

Then there’s the numbers guy.  He’s actually contributing.  What he’s doing is trying to figure out when things are going to break.  Can you handle twice as much business as you currently do?  Could you do what you’re currently doing faster?  Is our pipe wide enough?  And most importantly, what’s going to break first?  How do we avoid this? 

Sometimes this guy’s actually a DBA.  I’ve met a number of DBAs with deep and intimate knowledge of routers, disk access times and an uncanny ability to pull information (rather than data) out of PerfMon.  Sometimes he’s a junior developer with a Rain Man-like grasp of the latency figures coming out of Firebug.

If there’s a problem with this role it’s that it shouldn’t really be a full time job.  Everyone should care about this.  A lot of the time you’ll see him gathering data.  Data that should be there in plain sight, in the logs.  In fact, some of his responsibility actually lies within the operations team.  All in all, this is a good guy, and he’s doing valuable work.  But when you decide to hire someone to own this, it’s a sign you have a problem elsewhere.

The System Expert

These days you’ve got a big important system that does a number of critical things.  But five years ago, it was stuck together with gaffer tape and string by someone in a month.  You’re lucky, and he’s still with the firm.  Sometimes the expert didn’t build it, but he has supported it for three years, and knows its quirks pretty well.

At best, this guy’s your shaman: the guy who understands the history of the project and your organization.  The guy who remembers “why we didn’t do it that way”.  The guy who’s got a good idea of the quickest way to implement the feature your manager needs yesterday.  He’s incredibly valuable. 

The danger here is emotional attachment to the system in its current form.  When this happens, you retard the progress of the project by failing to recognize that circumstances have changed.  You’re offended when fundamental structures in the system are replaced.  If you’re recognizing yourself here, it’s time to take a back seat, listen to some Sting and chill out.

The Great Developer

You’re a cracking developer, so they give you a fancy title.  Make sure it comes with a raise.  Recognition is nice, but you’re doing the same job as before.

The Go-to Guy

I like being this guy.  A lot of the time, he doesn’t actually have architect in his title.  He’s just the guy you go to when you can’t figure a bug out, or you need to talk something over, or you suspect there’s a better way of doing something than the one you’ve come up with.

You get to be this guy through three qualities: technical ability, accessibility and patience.  The first is a given: if you’re just starting out, no-one wants your opinion (which is their loss, frankly).  Accessibility is necessary as well: you’ve got to have time for people.  You’ve got to be genuinely glad to see them and make them welcome.  Talk to them; don’t just fob them off with a link (URLs are for the end of the conversation, not the beginning).  Finally, patience is key.  Sometimes you’ve got to go back to the start of the problem and painstakingly work through every step to find the solution.  Sometimes you’ve got to answer the same question five times in the same day.

The best bit about go-to guys is that you can have more than one.  Even if someone else occupies this role, there’s nothing stopping you trying it too.  You don’t need to know everything, you just need to be useful.

Apart from anything else, even the go-to guy needs someone to go to.

The Literal Architect

Picture of All Souls Church

It’s worth remembering that we used to believe that building construction was a viable metaphor for software development.  This guy still thinks it is.  His job is to design the software down to the last detail, yours is to do the rather uninteresting task of actually building it.  Usually in a development environment the architect doesn’t understand.

If you’re working in this organization, I’m sorry.  I feel for you.  You’ve probably also got a modified waterfall SDLC.  You probably even know what SDLC stands for.  If you’re really unlucky, you know what PRINCE2 means as well.

Ironically, this guy is the only one who really warrants the title architect.  He does the thinking, you just lug bricks around.  However, beautiful as the ivory tower shown is, it hasn’t changed in the last two hundred years.  The requirements for your project have changed two hundred times in the last month.  Architect is a nice title, but it’s the systems experts, the numbers guys, the great developers and the go-to guys who will make a system sing.  And design?  Well, everybody should be doing that.

Notes: The “numbers guy” is, of course, Al Capone.  The Sting video is from the Nelson Mandela concert.  The “ivory tower” is All Souls Church, a phenomenally beautiful building by John Nash.  If you’re ever near Oxford Circus in London you should visit it.  Nash also designed Buckingham Palace.  An artist’s best known work isn’t necessarily his best.

What’s the True Price?

I’ve just been in New York, and they’ve instituted what I believe to be an absolutely brilliant law: all restaurants have to show the calorific content of the items on the menus.  It’s seriously hard to see how you could be against it: it simply provides more information to the consumer.

The interesting thing about this is how I spotted myself responding to it: I saw the calorific content as the price.  I paid almost no attention to how much the order would cost (unless you’re on the knife edge, you don’t) but calories: one way or another I’m going to have to pay those off the hard way.

When optimizing code, everything comes with a price.  Sure, there are some quick wins where you stare at the code and think “what bonehead wrote that?”.  (In my experience, the answer is nearly always “Oh, me”.)  After that point, you can trade memory usage for CPU cycles, CPU cycles for IO and all three for time and money.  Then there’s the nasty one: speed for flexibility.  Some examples of this are parallelism, denormalisation, responsibility combining and ultimately, kernel hacking.  In pretty much all of these cases, you’re giving up the ability to change the code fast for the ability to make the code run fast.  And that cost isn’t printed on the menu.
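The simplest of those trades, memory for CPU cycles, looks something like the sketch below.  Memo and Memoize are names I've made up; this is an illustration of the trade, not a recommendation for any particular code base.

```csharp
using System;
using System.Collections.Generic;

public static class Memo
{
    // Wraps a function so each distinct argument is computed only once.
    // The price: the cache grows without bound, is not thread-safe, and will
    // happily return stale answers if compute stops being a pure function.
    public static Func<TArg, TResult> Memoize<TArg, TResult>(Func<TArg, TResult> compute)
    {
        var cache = new Dictionary<TArg, TResult>();
        return arg =>
        {
            if (cache.TryGetValue(arg, out var cached)) return cached;
            var result = compute(arg);
            cache.Add(arg, result);
            return result;
        };
    }
}
```

Calling Memo.Memoize on an expensive function gives you back a faster one, and also a function you can no longer change quite so casually: anything that makes the results depend on hidden state now has to deal with the cache as well.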

What’s worse, 9 times out of 10 your ability to pivot the code is actually its single most important quality.  You’ve got to be doing something pretty special and pretty specific to give that up. 
