Thinking Again About the Decorator Pattern

Speaking to a colleague about the Decorator pattern, I think I was unduly harsh on decorators the last time I wrote on the subject.  I'll start off by reiterating my opinion that writing pass-through methods is for the most part a horrible waste of time and a code smell to boot.  However, proxy/decorator functionality is useful in cases where there are relatively few methods you need to proxy.  Ironically, one of these is my very first post.  It employs the decorator pattern to give you a class that executes Retlang Commands, but allows you to halt processing at a time of your choosing.  This kind of buffering is generally useful: it lets you do things like process a message based on the message that follows it (and if you know enough about Bloomberg feeds, you'll know why this isn't an academic example).

Some other examples of where the decorator pattern can come in handy:

  • Currency Conversion: here the converted amount can be a decorator on the underlying value.
  • A read-through cache: here the read-through functionality is added as a decorator on the underlying value.
  • Parameter interception: mostly useful for testing, you log the parameters and then call through to an underlying implementation.  (I actually used to have a Rhino Mocks capture constraint that was a decorator around a proper constraint.)

A good rule of thumb for using the decorator pattern:

  • You have an underlying behaviour that you wish to leverage.  In the case of the haltable executor, I wanted to allow any underlying executor.
  • You want to change one particular aspect of the underlying behaviour.  Again, in the case of the haltable executor, I wanted to allow for halting.
  • There are very few methods that need delegating.
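To make that rule of thumb concrete, here's a minimal sketch of a haltable executor.  The interface and class names are mine, not Retlang's actual API; the point is that the decorator delegates to any underlying executor and changes exactly one aspect of its behaviour, halting.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical executor interface, standing in for Retlang's command executor.
public interface IExecutor
{
    void Execute(Action command);
}

// The simplest possible target: just runs the command.
public class ImmediateExecutor : IExecutor
{
    public void Execute(Action command) => command();
}

// Decorator: delegates to any underlying executor, but can be halted.
// While halted, commands are buffered; Resume() flushes them in order.
public class HaltableExecutor : IExecutor
{
    private readonly IExecutor _inner;
    private readonly Queue<Action> _buffer = new Queue<Action>();
    private bool _halted;

    public HaltableExecutor(IExecutor inner) { _inner = inner; }

    public void Execute(Action command)
    {
        if (_halted) _buffer.Enqueue(command);
        else _inner.Execute(command);
    }

    public void Halt() { _halted = true; }

    public void Resume()
    {
        _halted = false;
        while (_buffer.Count > 0) _inner.Execute(_buffer.Dequeue());
    }
}
```

Only the one method needs delegating, which is exactly when the pattern earns its keep.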

Decorator vs Composite

The problem with Decorator is that you're constantly in danger of violating the Law of Demeter.  You've got to ask yourself "do I really need the underlying object, or is it just convenient?"  Let's take the example of the Rhino Mocks capture constraint.  In practice, I didn't really have any underlying behaviour I wished to leverage.  I just wanted to deal with the fact that you can only have one constraint per parameter.  Problem is, we've already got a pattern for dealing with that situation: Composite.  A proxy or decorator needs exactly one target; a composite takes any number.

Now imagine you had a constraint that evaluated two constraints and passed only if both did (it's not hard, it actually exists: Rhino.Mocks.Constraints.And).  Now you can have the capture constraint just always return true.  Your constraint becomes the composite of the underlying constraint and the capture constraint.
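Here's a hedged sketch of that composite approach, with simplified stand-ins for the Rhino Mocks constraint classes (the names are mine, not the real API):

```csharp
// Simplified stand-in for Rhino Mocks' constraint base class.
public abstract class Constraint
{
    public abstract bool Eval(object arg);
}

// The underlying constraint we actually care about.
public class NotNullConstraint : Constraint
{
    public override bool Eval(object arg) => arg != null;
}

// Capture "constraint": always passes, it just records the argument it saw.
public class CaptureConstraint : Constraint
{
    public object Captured { get; private set; }
    public override bool Eval(object arg) { Captured = arg; return true; }
}

// Composite: combines two peer constraints; no single privileged target.
public class AndConstraint : Constraint
{
    private readonly Constraint _left;
    private readonly Constraint _right;

    public AndConstraint(Constraint left, Constraint right)
    {
        _left = left;
        _right = right;
    }

    // Non-short-circuiting & so the capture always sees the argument,
    // even when the other constraint fails.
    public override bool Eval(object arg) => _left.Eval(arg) & _right.Eval(arg);
}
```

You'd then pass `new AndConstraint(underlying, capture)` where one constraint is allowed, and neither leaf needs to know about the other.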

Decorator vs Chain of Responsibility

Some have argued that Chain of Responsibility, one of my personal favourite patterns, should be replaced wholesale with Decorator.  I seriously disagree with this.  First, let me point out that the version I used in the previous post on the subject isn't actually the classic Gang of Four pattern.  The original is a linked list, hence the "Chain" terminology.  The interface I gave has rules that don't need to know about other rules.  This gives you a bit more flexibility in constructing different rule chains.  Provided the rules don't need to share information (which would usually cause problems of its own), wiring them together as a linked list, or via a decorator, is just a Law of Demeter violation.
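As a sketch of what I mean, here's one way rules can be composed without knowing about each other.  This isn't the interface from the previous post, just an illustration of rules-as-a-list versus rules-as-a-linked-list:

```csharp
using System;
using System.Collections.Generic;

// A rule knows nothing about other rules; the chain is just an ordered list.
public interface IRule<T>
{
    // Returns true if this rule handled the input, ending the chain.
    bool TryHandle(T input);
}

// Convenience rule for building chains from lambdas.
public class DelegateRule<T> : IRule<T>
{
    private readonly Func<T, bool> _handler;
    public DelegateRule(Func<T, bool> handler) { _handler = handler; }
    public bool TryHandle(T input) => _handler(input);
}

public class RuleChain<T>
{
    private readonly IReadOnlyList<IRule<T>> _rules;
    public RuleChain(IReadOnlyList<IRule<T>> rules) { _rules = rules; }

    // Walk the rules in order; the first one to handle the input wins.
    public bool Process(T input)
    {
        foreach (var rule in _rules)
            if (rule.TryHandle(input))
                return true;
        return false;
    }
}
```

Reordering or swapping rules is now a change to the list, not to any rule.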

Does Anyone Out There Understand The Microsoft Public License?

Seriously, I’ve raised the subject of this before, but it’s getting to be a bit of a joke.  Take a look at this recent discussion on the castle list.  Whereas the Free Software Foundation will tell anyone who cares to listen (and frankly, many who’d rather not) the minutiae of compliance with the various versions of the GPL, Microsoft can’t or won’t answer a straight question about what they regard as being compliant with a license they themselves drafted.  Rather laughably, they then point amateur open source projects at lawyers.  Lawyers who are going to tell them “well, it depends, there’s no case law on the subject”.

Believe me, I would love to see a good answer to this question, but as it is it’s hard to recommend the use of this licence to anyone.  I think people who do choose to use it are going to be receiving a lot of requests for dual licensing.


SOLID Principles: O is for Open / Closed

Any tour of SOLID should start with the Open/Closed principle.  The Open/Closed principle is actually different from the others.  All of the others are development practices.  OCP is a philosophical goal.  Here’s the classic statement:

Software entities should be open for extension, but closed for modification.

A great phrase, but meaningless unless you’ve had it explained to you.  Now, the SOLID principles are all about improving project velocity.  What the Open/Closed Principle actually says is: you don’t want to be editing code to change its behaviour.  You should be changing a class to fix a bug, not to make it do something else.  Now, when this was originally formulated, TDD was in its infancy and changing code was regarded as being inherently fragile.  We’ve mitigated that with TDD, but actually it turns out that the same principles enable testing: you shouldn’t have a separate code path for testing.

Let’s take a look at some examples of code that fail the Open / Closed test.

public static void PrintLines(string[] lines) {
    foreach (string line in lines) {
        Console.WriteLine(line);
    }
}

Okay, let's think about how we've violated the Open/Closed principle.  First off, we've got a great big ugly static method.  I've talked a fair bit about these already.  Let's talk about possible scenarios that could come up:

  • What happens if you want to write to a file?  You'd have to change the code.
  • What happens if you want to disable writing altogether?  Because the method isn't virtual, you'd have to change the code.
  • What happens if the lines are streaming from a database?  Passing them in as an array isn't ideal, so you'd have to change the code.

Let’s look at another example:

public void PrintLines(string[] lines) {
    using (var writer = new StreamWriter(@"c:\x.txt")) {
        foreach (string line in lines) {
            writer.WriteLine(line);
        }
    }
}

Now, obviously many of the objections to the last code are valid again, but this one’s got some more to worry about:

  • You can’t change the filename.
  • Even assuming you only ever wanted to write to a file, you can’t choose to buffer the file.

Finally, consider this code:

public void PrintLines(ILineProvider lineProvider) {
    using (var writer = new StreamWriter(@"c:\x.txt")) {
        foreach (string line in lineProvider.Lines) {
            writer.WriteLine(line);
        }
    }
}

This has an interface in it, so it must be better, right?  Sadly, it isn't.  This code is actually less flexible than the previous example.  Now you've got to implement ILineProvider just to use it, rather than passing in any old array of strings.  This is a violation of the Law of Demeter.  The Law of Demeter isn't explicitly mentioned in the SOLID principles, but it should be.  Maybe it could be SOLIDD…

Danger Points

These two examples alone have given us some talking points, highlighting the places where you're most likely to violate the Open/Closed principle:

  • Static Methods
  • Non-virtual methods
  • Creating objects
  • Too-specific variable declarations
  • Hard-coded values
  • Using the wrong object (the Law of Demeter)

If there’s a summary to all of this, it is this: be careful what you depend upon.  Next, I’ll talk about how we actually go about achieving this.
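As a taste of the direction, here's one possible reworking of the PrintLines examples with the dependencies made explicit.  It's a sketch, not the only answer:

```csharp
using System.Collections.Generic;
using System.IO;

public class LinePrinter
{
    private readonly TextWriter _writer;

    // Depend on the abstract TextWriter: the caller decides whether output
    // goes to the console, a file, a buffered stream, or a test double.
    public LinePrinter(TextWriter writer) { _writer = writer; }

    // IEnumerable rather than string[], so lines can stream from anywhere;
    // virtual, so behaviour can be extended without editing this class.
    public virtual void PrintLines(IEnumerable<string> lines)
    {
        foreach (var line in lines)
            _writer.WriteLine(line);
    }
}
```

`new LinePrinter(Console.Out)` recovers the first example and `new LinePrinter(new StreamWriter(path))` the second, without touching the class either time.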


Everything I Learned About Object Oriented Design Was Wrong

For all the kerfuffle about maintainability recently, it's worth noting that actually the principal participants agree much more about best practices than they disagree.  The SOLID principles underpin pretty much every major open source .NET project I've examined.  (There's a selection bias here, but I'm not going to worry about it.)  They are, in my opinion, much more important than the design patterns I've been talking about.  Sadly, they're not half as well known as the Gang of Four book.  They should be, because they can be summarized as follows:  you need to change the way you develop.

Now, people are always saying this, and they’re usually wrong.  AOP, Workflow, there’s any number of new development ideas that have fallen by the wayside.  In practice, this one won’t, mostly because actually the ideas are quite old.  They’re just not widely understood.

When I was at University, I had to deliver a project in Turbo Pascal.  I think the Maths Department must have been allergic to curly braces.  Like a good little geek, I opened the user guide and started reading.  Now, most technical documentation is completely unreadable, but this was different.  It was, in fact, one of the single best bits of technical documentation I’ve ever read.  It introduced a whole new concept to Turbo Pascal: object-oriented development.  C++ had been around for a while, but the concepts were entirely new to me; programming for me was pretty much defined by Kernighan and Ritchie, as it was to most of the people I knew.

This new OOP concept was a revelation: it was, finally, a logical way of organizing your code built right into the language.  All you had to do was identify your objects, which behaved like actors in a play.  The actors would play their parts, and the program would come about as a combination of the actors.

Unfortunately, this was dead wrong.

Objects aren't actors, and a single business concept doesn't necessarily correspond to one object.  Thinking that it does will lead you directly to the creation of god objects.  It turns out that the development of code isn't principally about modelling.  It's about flexibility, dependencies and coupling, and only then about modelling.  Ironically, this approach leads to better models in the long term, simply because the model will track changes in requirements faster.

What is “good” code?

I think I’ve probably seen more fruitless arguments about this than any other subject.  Everyone thinks their code is good, and sometimes they’re right.  Good code definitely isn’t the same thing as useful code, which is an error a lot of developers make.  Good code isn’t about aesthetics or huge architectures either.  Here’s the only definition of good code worth having:  good code is code you can easily change.  There are many formerly popular technologies (CORBA particularly springs to mind) that failed that one test.  The SOLID principles are guidelines designed to make your code easy to change.

So, I’ve decided to write a York Notes guide to SOLID.  My principal sources are Uncle Bob’s own website, and the excellent Hanselminutes podcasts on the subject.  One of the things you’ll probably notice as you go through is that your code tends to get longer:

  • There’s more interfaces
  • There’s more small classes
  • There’s more constructor parameters
  • There’s more instance variables

I really, really advise getting ReSharper; it turns this from what feels like an extremely bureaucratic endeavour into a case of pressing "Alt-Enter, Enter" a lot.  One day I hope I'll be able to see this stuff baked into the language.


Why I Don’t Regard Algorithm Proof as Useful

Okay, now that I’ve swapped insults with Frans (in fairness, I started it) I think I should explain my position on provability more carefully.  Some context: I’m a mathematician by background, I came to computing relatively late, and am extremely aware of the gap between the two disciplines. 

Let’s be clear, I actually agree with a fair amount of what Frans said in the original article.  I’m certainly not advocating a plough-in “RAD” approach to development, I just think that the concept of proof in computing isn’t particularly useful. 

The Importance of Algorithms

Let me start with sorting.  There are a number of well-known algorithms for sorting on one thread:

  • Bubble sort
  • Quick sort
  • Merge sort

Of these, one is stupid, one has the wrong complexity but low memory usage and one is fast but uses more memory.  Which one should I implement?

Well, actually, usually the answer is: I shouldn't bother.  I'll just type "orderby person.Name" and get on with my life.  I've got a bunch of algorithmic knowledge in my head and it's about as useful as my encyclopedic knowledge of Babylon 5.  Frameworks like .NET and, frankly, LLBLGen have implemented high-quality algorithms for most use cases, which means that most of the time, I don't need to bother.  Sure, there are very few general frameworks for graph traversal, but that's mostly because it's of specialized utility.
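For the avoidance of doubt, this really is the entirety of the sorting code I write these days (Person is a made-up type for illustration):

```csharp
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string Name { get; set; }
}

public static class Sorting
{
    // One line of LINQ instead of hand-rolling a sort.  Enumerable.OrderBy
    // is documented as a stable sort, and that's all I need to know about it.
    public static List<Person> ByName(IEnumerable<Person> people) =>
        (from person in people orderby person.Name select person).ToList();
}
```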

I used to work at a dot-com with over 50 developers.  Every so often, something somewhat mathematical came across our desks.  When this happened, it got passed to me.  I estimate that I did something like this for about two days in every three months.  The truth is, there are many environments in which this stuff doesn’t come up very often.

The Problem of Proving Anything

Let’s first deal with a philosophical point: what do you mean by proof?  As a former mathmo, I have a pretty strong idea of what that means.  It means 100%.  Sadly, C# isn’t a proven language, and Windows isn’t a proven operating system.  In terms of formal specification and verification, hardware is ahead of the game, but your average processor is chock-full of errata.  In general terms, we’ve no good model for proving imperative style code, even if there have been recent breakthroughs.

This sounds all very academic, but it has some real consequences.  One of the principal ones is that, even if you prove something, proving that your implementation matches your conceptual model isn’t really possible.  There’s a huge gap between your pen and paper and the keyboard.

Now, previously I talked about how algorithms were a relatively small part of the average development workload.  Problem is, algorithms are actually the things that are easiest to deal with using formal proof techniques, because they’re small and they’re abstract.  Your average workflow is harder.

This is all assuming you have formal training in the discipline that this involves.  Most people don’t.  Your proof is only as good as your confidence that you haven’t made a mistake.

What Are You Proving?

I referred to proof before as specification and verification.  So far I’ve talked about verification, the part that most people regard as proof, but specification is just as important.  Let me tell another abstract story: Roger Needham was pretty much the world authority on authentication.  He pioneered third-party authentication protocols and helped develop the world’s first authentication logic, a system for verifying authentication protocols.  You might think that a paper such as that would be the last word on the problem.  The problem is in the specification.  All the logic deals with is verifying that specific types of attack are not possible.  It didn’t necessarily protect against guessing attacks on poorly chosen secrets, which is unfortunate since pretty much all passwords are poorly chosen. 

The problem here is specification.  Your employer or your client is only interested in “Is it secure?” or even more nebulous concepts.  The only way to formally specify “secure” is to restrict your attention.  Sometimes things slip through the gaps.  Famously, the initial authentication logic failed to spot a password sent in plain text.  Needham always argued that this wasn’t a problem with the logic because it was a violation of the principles, but even that argument should illustrate the practical limitations of the techniques of formal proof.

Again, this isn't an academic objection.  About six months ago, I put live an extremely sensitive piece of code.  It dealt with something that was actually appropriate to this kind of analysis.  I and several others spent a lot of time testing it, trying to poke holes in it.  There's a raft of automated tests for it.  We even ran it in parallel to the live systems for months to check that it behaved.

It failed a month after it went live.  Why?  Because we missed something.  The system worked perfectly, but it was only part of a larger system.  That larger system had complexities we hadn’t captured that caused it to fail.  In short, we didn’t ask the right questions.

Unattainable Goals

So, what’s the alternative?  Well, I’d still favour TDD.  I know Frans wasn’t arguing against this, but I think that talking about provable algorithms is a distraction.  Your average developer, when being told that he needs to prove things, won’t outline the arguments that I’ve just given, but he’ll know in his gut:

  • It doesn't seem very relevant to what he's doing.
  • It sounds impossible.
  • It doesn't sound like what he does for a living.

On the other hand, approaching things in a story-based fashion, where cussed exceptional cases are built into the test framework, that’s something he can see is working. 

Your average developer wouldn’t be able to read the phrase transitive closure without reaching for a dictionary (or wikipedia).  What he found there would be unlikely to enlighten him.  In fact, most developers don’t really ever analyze complexity.  I used to regard that as shocking.  These days, I’ve come to the conclusion that it rarely makes them less productive.  Complexity does occasionally come up and bite them, but it’s rare. 

I’m not arguing against learning, or against continuous improvement.  But you’ve got to have priorities and there are a phenomenal number of things to learn.  For instance, if you don’t know anything about graph algorithms or the SOLID principles, I’d say you were better off studying and practicing the latter.

Summary

Frameworks are different, they’re useful precisely because they do things that the average developer would find at best time-consuming and at worst error-prone or impossible.  They’re the exception to the build vs buy rule: you’re building so that someone else can buy.  For these reasons, I don’t necessarily think the best practices for an ORM developer are necessarily the same as those for a LoB developer.  In short:

  • I don’t think algorithms are that important in LoB development.  Those that are common are already written.
  • Formal verification isn't as error-free as it first appears.
  • You can only verify what you specify.  Errors in specification can and do happen.
  • Finally, for the reasons above, I think that the concept of proving your algorithms is of niche utility.  TDD is the tool I’d reach for first, not formal proof.



Microsoft’s Game Plan for C# 4.0

There's a reason certain companies are on top.  People always talk about Google's long-term strategy and how collectively bright an organization it is.  Less is said about Microsoft, partly because it occasionally missteps quite badly.  A good example of this is Windows Phone.  Microsoft have had a strategy for a phone platform for going on ten years.  Google, seeing a long-term threat from the iPhone, knocked together something better reactively.  However, for every Windows Phone, there's an Xbox Live.

But .NET’s what I’m interested in, and C# 4.0 is focusing on, of all things, compatibility with Microsoft Office.  Now, if you seriously think there are any competitors to Excel, you really don’t understand the product well enough.  You can already write Office automation apps in .NET, it’s not brilliant, but it’s doable.  I’m really not looking forward to the next round of marketing that tells me that I should really be using Office as “A massive DLL” for my app.  Microsoft do this kind of marketing because it’s part of their long term strategy: keep selling Windows and Office.  But, to be honest, I don’t think even they believe that they’re going to get that many people to start writing office workflow apps, especially after SharePoint.

So, if it isn't really about integration apps, what's it about?  My apologies if this was obvious, but the light only just went on in my head: it's about macros.  I'd be watching what Microsoft's doing to the CodeDom very carefully.  I'll bet you'll start to see the sort of enhancements that would make macro recording in C# possible.  Making it easier to link to Word isn't really that fascinating.  On the other hand, VBA is 15 years old and hasn't really evolved in all of that time.  But there's a phenomenal amount of code out there in VBA, most of which is written by non-programmers.  Allowing people to write their sensitivity analyses in a proper programming language, that's huge.  Stopping having to support VBA, that's huge.

I seriously doubt they're going to turn the average end user into a C# programmer.  Python, though, sounds just about possible.  I'd keep an eye out for new work on language design coming out of the labs, too.  I'm not saying VBA is dead just yet, but Anders is measuring its coffin as we speak.

Technorati Tags: C# 4.0,Com Interop,Excel

Metrics: Signal to Noise

Okay, I promised I'd stop talking about metrics, but it occurred to me that there's a serious point I hadn't covered: sometimes your measures are broken, and you need to fix the measure before it's going to tell you anything useful.  The classic example of this is error reporting: if you're reporting too many errors, you won't notice when something serious is going on.  I once saw a system that generated 2,400 errors a week, under unexceptional circumstances.  At 30 seconds per error, that's 20 hours: half your working week gone just triaging them.

It doesn’t take a genius to figure out that no-one was looking at those errors.  So, if there were a couple of really serious problems occurring, no-one would ever know.  Well, not until they started to get the phone calls.

Thing is, the code entanglement statistics from NDepend look like that: I can pretty much guarantee that 95% of them aren't a problem.  But you're never going to know unless you go through them all.  Oren's random sampling doesn't really prove anything other than that there's a lot of noise.  It doesn't mean there isn't some signal there.

Personally, I find the idea of code metrics fascinating, but I can’t help feeling that we’re still a long way from having usable ones.  An NDepend-driven refactoring might yield genuine results, or it might be a complete waste of time.  This doesn’t mean that Patrick shouldn’t be trying to improve the state of the art.


Maintainability: I think there’s something in the water

Did someone declare it “talented developers talk rubbish” week when I wasn’t paying attention?  Maybe I wasn’t on the mailing list… *sigh*

First we had the extremely smart Frans Bouma talking about the importance of proving your algorithms as a development methodology, which at least had the virtue of being funny.  We then had one of the single most productive developers in Alt.Net talking absolute garbage about maintainability.  At least Patrick Smacchia is still talking sense.  Let’s go back to what Ayende’s saying. 

Maintainable is a value that can only be applied by someone who is familiar with the codebase.

This is the exact defence I have seen of most of the worst systems I have ever encountered, from dodgy spreadsheets to distributed Access macros to over-engineered C++.  I have personally, to my shame,  delivered systems that had exactly these problems: they were well architected (to my definition of a good architecture), they were easy to modify (by me), followed consistent conventions (known to me).  That’s not a good system.  Even if you replace “me” with “us” you haven’t got a good system.  This is what I would term the “I’m incredibly smart” anti-pattern.

The problem with the “I’m incredibly smart” anti-pattern is: often the projects using it have incredibly high velocities.  Until someone leaves.  Companies often don’t have too much of a problem with this, especially during a recession.  Also, they know that the next bunch of developers who work on it will probably want to either a) rewrite everything or b) run the project in maintenance mode until it’s as bad as any legacy system.  The good news is that the latter scenario definitely isn’t happening to NHibernate.

Part of the problem is that maintainability should be a concept that is separate from the IQ of the developers:  NHibernate is developed by extremely smart people, that shouldn’t have any impact on its maintainability as a code base.  However, this doesn’t mean that Joe Graduate on his first coding job should be able to pick up and modify the code.  Maintainability isn’t a case of the lowest common denominator.  If so, we’d have to ensure we never used the following:

  • yield return
  • lambda expressions
  • LINQ
  • and frankly, NHibernate

I’ll talk a bit more about the middle ground in a later post.


Don’t register controllers as singletons in your container

This is just me documenting a particularly weird behaviour of ASP.NET MVC.  Usually, it doesn't matter, but if you use your own controller factory, as with MvcContrib, it's a pain to diagnose.  Basically, if you re-use a controller instance, the ModelBindingContext's values are cached.  This is, to say the least, freaky, especially since you can see that the url has changed.  So, you fire up one page, move to another id and get back the first page.  In short, register controllers as transient explicitly.  If you're wondering how I found this out, consider that AllTypes in Castle Windsor registers services as singletons by default.
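For illustration, the registration probably wants to look something like this.  This is a sketch from memory of Windsor's fluent API, and HomeController is a placeholder; check the exact method names against the Windsor version you're using:

```csharp
// Register every controller in the assembly, explicitly transient so each
// request gets a fresh instance and nothing from the last bind is cached.
container.Register(
    AllTypes.FromAssemblyContaining<HomeController>()
        .BasedOn<IController>()
        .Configure(component => component.LifeStyle.Transient));
```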

NDepend Maintainability Metrics

Now, I have some contempt for code metrics, ever since I discovered that TFS thought that my worst code was a constructor that checked its parameters for null.  Metrics are useful indicators.  Profit is a metric used to measure the health of a company.  It's not always useful, but that's the nature of metrics: they're indicators, not hard and fast rules.  I thought it might be interesting to examine Patrick's original post about NHibernate 2.1, which started the whole recent argument about maintainability.

First, he took a look at the number of changes to the code base.  There’s been a phenomenal amount of work done here, and I don’t regard that as a problem.  It still passes pretty much the same unit tests.  Replacing a dreadful piece of code with a good piece of code that satisfies the same conditions doesn’t make it less maintainable, it just means Steve Strong is smoking.  However, metrics are pointers, and the sheer number of places changed by the HQL changes suggests the possibility that the old HQL code was a mess.  This isn’t necessarily the case, and I’m not qualified to judge, but that is what the metrics are suggesting.

Next, he took a look at the assembly dependencies.  Here the story is that the code base is improving: the dependencies on Castle have been removed, part of the project to allow pluggable interceptors, and a dependency on ANTLR3 has been taken.  This isn’t a problem: it doesn’t introduce an externality and it doesn’t affect user preferences: no-one’s about to demand a different HQL lexer, but plenty of people wanted to plug in their own proxy framework.

Entangled Code

Okay, now we get to something interesting: Entangled Code.  Patrick's been pushing the idea that you shouldn't have namespace circularities for quite some time.  It's not standard practice (Udi Dahan does it in nServiceBus, albeit in a bizarre and inconvenient way), but actually it's probably a good idea.  I think it's an idea that would gain more traction if an open source tool integrated the concept directly into Visual Studio.

But entanglement at the class level?  As Ayende points out, it’s often the only way you can get things done.  No-one worries about circular dependencies with inner classes, and they shouldn’t.  Since the difference between an inner class and an external class is often file size, I think a two element cycle is completely ignorable.  On the other hand, some of the dependencies are solvable:  the dependency of SessionImpl on SessionFactoryImpl, for instance, is only really required by serialization.  This is solvable.  Whether you regard it as a problem is quite another matter.  With my purist hat on, I’d certainly prefer it was separated out, but I’m sure many would argue it doesn’t much matter.

The truth is: no-one’s going to address this in a hurry.  Whatever the costs of having your code all over your namespaces, it’s probably smaller than the cost of diverging too far from the Hibernate source base.

Breaking Changes

Well, first let me say a breaking change is a breaking change.  On the other hand, I’m not sure I’d be too bothered about

  • Removed public types that had no public constructors and were never returned with their concrete type
  • Changing interfaces: interfaces are brittle, abstract classes aren't.  We know that.  It'd be nice if there were a public document that explained NHibernate's upgrade policy, but I'm not going to lose sleep over this.

There’s a question of ALT.NET philosophy here.  If you really wanted to avoid breaking changes, you’d use abstract classes for everything just like Microsoft.  The cost of the occasional breaking change isn’t really that high.

Truth is, if you drill in, Oren’s right not to worry.  Most people are going to drop in the new assembly and not notice the difference.  The one thing they’re really going to notice, the requirement to specify a ByteCodeProvider, is completely intentional and isn’t picked up by any of these metrics.  Most of the other “breaking changes” aren’t going to break real code.  NHibernate actually has a very strong story here: the release notes highlight all of the likely breaking changes.

Code Metrics

If you actually take a look at Patrick's report on "methods that could do with refactoring", you'll see that a particularly large offender is one of SessionFactoryImpl's constructors.  Personally, I think that many code metrics don't take constructor injection into account: there are certain patterns of behaviour you expect to see in a constructor that send cyclomatic complexity through the roof.  It doesn't do the code any harm, though.

Measures such as cyclomatic complexity sometimes punish desirable behaviours.  It’s time we got better metrics.  For example, I don’t believe checking a parameter for null and then throwing an ArgumentNullException should make a code quality metric look worse.
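To illustrate, here's the shape of constructor I mean.  This is a hypothetical sketch, not NHibernate's actual SessionFactoryImpl constructor; the interfaces are made up for the example:

```csharp
using System;

public interface IConnectionProvider { }
public interface IDialect { }

public class SessionFactory
{
    private readonly IConnectionProvider _connections;
    private readonly IDialect _dialect;

    // Every injected dependency brings a guard clause, and every guard
    // clause is a branch, so cyclomatic complexity climbs with the number
    // of constructor parameters even though nothing here is complicated.
    public SessionFactory(IConnectionProvider connections, IDialect dialect)
    {
        if (connections == null) throw new ArgumentNullException("connections");
        if (dialect == null) throw new ArgumentNullException("dialect");
        _connections = connections;
        _dialect = dialect;
    }
}
```

A metric that flagged this as "worst code" would be punishing exactly the defensive style we want to encourage.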

A Bad Example

NHibernate was picked by Patrick for this analysis because it's a large, well-known open source code base.  Unfortunately, it's a singularly bad code base to pick for this kind of analysis.  Like it or not, NHibernate's maintainability is directly affected by its similarity to the Hibernate code base.  A metrics-driven refactor would actually decrease project velocity, which is ultimately all we care about.