Dan North once described an advanced beginner as someone who makes all the mistakes an advanced beginner makes. It doesn’t matter if you warn them about the mistake or not, they’ll still make it. This post is about one of those problems. Hopefully someone will find it useful, but what I know of the Dreyfus model suggests that it’ll only be useful after something fails in production.
Here’s the problem: let’s say you have a process that divides fairly naturally into two parts. The first, A, is fast and feeds the second, B, which is slow. So you make it concurrent, following some kind of pipes-and-filters arrangement. The problem is, you get exactly the same effect as when you push a motorway’s traffic onto a B-road: traffic jams for miles.
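The arithmetic of the jam is worth seeing concretely. Here’s a minimal, hypothetical simulation (no real pipeline library, just counters) of A feeding B through a queue: whenever A produces faster than B consumes, the backlog grows linearly and never recovers.

```python
# A minimal simulation of the A -> B pipeline: A produces items faster
# than B can consume them, so the queue between them grows without bound.
def queue_length_over_time(produce_per_tick, consume_per_tick, ticks):
    """Return the backlog after each tick for a producer feeding a consumer."""
    backlog = 0
    history = []
    for _ in range(ticks):
        backlog += produce_per_tick                # A enqueues its output
        backlog -= min(backlog, consume_per_tick)  # B drains what it can
        history.append(backlog)
    return history

# A is 4x faster than B: the backlog grows by 30 items every tick.
print(queue_length_over_time(40, 10, 5))  # → [30, 60, 90, 120, 150]
```

The model is deliberately crude, but the shape of the result is the whole point: the backlog doesn’t level off, it climbs for as long as the input keeps coming.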
A traffic jam in your software may not sound like a big deal, but you need to monitor it: any long system queue can be a sign that your architecture is failing. The problem is even worse if you’re using in-process concurrency like Retlang or the TPL, because all of your queues are in memory. Let one grow too long and you’ll crash. But even if you’re using MSMQ, you’re still subject to the Micawber Principle.
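Monitoring doesn’t have to be elaborate. Here’s a sketch of the idea using Python’s standard `queue` module: a hypothetical watchdog samples the queue’s depth and warns once it crosses a threshold, well before memory pressure becomes a crash. (`qsize()` is only approximate while producers and consumers are live, but approximate is fine for an alarm.)

```python
import queue

# Hypothetical watchdog check: warn when an in-process queue's depth
# crosses a threshold. The threshold and message format are illustrative.
def check_backlog(q, warn_at):
    depth = q.qsize()  # approximate on a live queue; good enough for alerting
    if depth >= warn_at:
        return f"WARN: backlog {depth} >= {warn_at}"
    return "OK"

q = queue.Queue()
for item in range(500):  # a producer that has run far ahead of its consumer
    q.put(item)
print(check_backlog(q, warn_at=100))  # → WARN: backlog 500 >= 100
```

In a real system you’d feed that depth into whatever metrics pipeline you already have; the important part is that queue length is measured at all.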
Dealing With It
Here are some things you can do that don’t fix the problem, but help you scale further:
- If the queue is going to be huge, consider reducing the memory footprint of the message passed between A and B.
- Make your application 64-bit.
- Make the queue persist to a database. This doesn’t reduce the jam, but it does reduce the memory consumption. (Obviously, if you’re using MSMQ or similar, this is done for you.)
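The first and third points combine naturally: rather than queueing the full payload, persist it and queue only a small reference. Here’s a sketch of that idea, with a dict standing in for the database; the function names and the store itself are illustrative, not a real persistence API.

```python
import sys

store = {}  # stand-in for a database table keyed by message id

# Footprint reduction: persist the heavy payload and queue only its key,
# so a long backlog costs a few bytes per item instead of the full message.
def enqueue_by_reference(q, msg_id, payload):
    store[msg_id] = payload  # persist the heavy part
    q.append(msg_id)         # queue only the small reference

def dequeue_by_reference(q):
    msg_id = q.pop(0)
    return store.pop(msg_id)  # B loads the payload only when it's ready

q = []
enqueue_by_reference(q, 1, "x" * 1_000_000)  # ~1 MB payload
print(sys.getsizeof(q[0]) < 100)             # the queued reference is tiny
print(dequeue_by_reference(q) == "x" * 1_000_000)
```

The trade-off is an extra round trip to the store per message, which is usually cheap next to the cost of a queue that eats your address space.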
Although they don’t address the root cause, these measures can be useful nonetheless. They may let you scale as far as you need, but they’re not an unlimited solution.
An approach that sometimes fixes the problem is to parallelize B’s processing. This is only appropriate in certain circumstances, and there’s a law of diminishing returns as you add more threads. For instance, if B is 4 times slower than A, and perfectly parallelizable, then four worker threads will solve the problem. However, if it’s 25 times slower, and parallelism maxes out at a factor of 16, it’s going to improve matters, but not fix it.
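Here’s a sketch of that worker-pool arrangement using Python’s standard `queue` and `threading` modules. The workload (doubling a number) is a stand-in for B’s slow processing; the sentinel-shutdown pattern is a common idiom rather than anything specific to the original post.

```python
import queue
import threading

# Parallelizing B: if B is N times slower than A and the work parallelizes
# cleanly, N workers can keep pace with A. Here four workers drain the queue.
def run_stage_b(work_q, results, workers=4):
    def worker():
        while True:
            item = work_q.get()
            if item is None:          # sentinel: no more work for this thread
                break
            results.append(item * 2)  # stand-in for B's slow processing

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for _ in threads:
        work_q.put(None)              # one shutdown sentinel per worker
    for t in threads:
        t.join()

work_q = queue.Queue()
for i in range(100):                  # A's output, produced up front here
    work_q.put(i)
results = []
run_stage_b(work_q, results)
print(len(results))  # → 100
```

Note what this doesn’t buy you: if B’s work doesn’t parallelize, or the speed gap exceeds the worker count you can sustain, the queue still grows, just more slowly.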
Finally, there’s addressing the root cause:
- Optimize B so that it’s fast enough. Sometimes possible and worth considering.
- After you’ve tried that, redesign B into multiple parts, each of which is, on average, at least as fast as A. If you’ve got a factor of 100 difference, that’s pretty much the only thing that’ll solve the problem.
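The redesign can be sketched as follows. The stage functions here are illustrative placeholders, and the batch composition stands in for a real pipeline where each stage would run concurrently with its own queue; the point is the decomposition: B becomes B1 and B2, each cheap enough to keep pace with A, so no single queue has to absorb the whole speed mismatch.

```python
# Splitting the slow stage B into two cheaper stages, B1 and B2.
# In a real pipeline each stage runs on its own thread with its own queue;
# this batch version shows only the decomposition of the work.
def stage_a(n):
    return list(range(n))           # the fast producer

def stage_b1(items):
    return [x + 1 for x in items]   # first half of B's old work

def stage_b2(items):
    return [x * 10 for x in items]  # second half of B's old work

def pipeline(n):
    return stage_b2(stage_b1(stage_a(n)))

print(pipeline(3))  # → [10, 20, 30]
```

Splitting one slow stage into several balanced ones is exactly the pipes-and-filters idea applied properly: the filters only help if no single filter dominates the cost.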
Basically, balancing performance isn’t a nice-to-have; it’s essential if you want your system to work at all.