The Law of Demeter: Context Matters

It’s funny, everyone knows the Law of Demeter, but everyone still seems to think it’s about dots.  There seem to be a million definitions of it, but here’s mine:

Don’t pass in the wrong object.

Laying down the law

Now, Derick’s saying extension methods don’t count.  I’d go further: they don’t matter at all.  The Law of Demeter has nothing to do with how you get an object, just whether or not you started with the right object.  Derick shows some code that is described as making Demeter scream.  (It’s a projection from a variable “assets” to “asset”.)  He shows a version of the code that looks completely different and argues that if that code is okay, so is the original code.  That’s completely correct, but neither has told us anything about whether or not “assets” was the right parameter to the function in the first place.  Here are my mental rules of thumb:

  • Is “assets” used anywhere in the function other than to generate “asset”?
  • If not, is the principal purpose of the function to return “asset”?

If the answer to the first question is yes, then the function properly depends upon assets and Demeter is happy.  Equally, if the function is principally a projection or an overload, there’s not a problem: assets is still the correct subject and asset is the result.  It’s when you answer “no” to both questions that you’ve got a problem.  In other words, you’ve violated SRP: the function now projects “assets” to “asset” and then does something else with asset.

If you think of it from a testing perspective, you can apply a different measure.  If you were writing tests for this code, would you start to get annoyed at having to construct the list of assets each time?  If most of your tests had only one item in the list, that’s a bad sign.  So, in answer to Derick’s basic question (does the code violate LoD?) I’d have to answer: I don’t know, I need to see the rest of the code.
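To make the two questions concrete, here’s a minimal sketch.  Derick’s actual code isn’t reproduced here, so the Asset class and every method name below are hypothetical, invented purely for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical type, invented for this sketch.
public class Asset
{
    public string Name { get; set; }
    public int Value { get; set; }
}

public static class AssetReports
{
    // Suspicious: "assets" is only used to produce "asset", and the real job
    // of the function is formatting.  The answer to both questions is "no".
    public static string DescribeLargest(IEnumerable<Asset> assets)
    {
        var asset = assets.OrderByDescending(a => a.Value).First();
        return asset.Name + ": " + asset.Value;
    }

    // A pure projection: "assets" is the right subject, "asset" the result.
    public static Asset Largest(IEnumerable<Asset> assets)
    {
        return assets.OrderByDescending(a => a.Value).First();
    }

    // The formatting now starts with the right object.
    public static string Describe(Asset asset)
    {
        return asset.Name + ": " + asset.Value;
    }
}
```

Tests for Describe need a single Asset; tests for DescribeLargest have to build a list every time, which is exactly the annoyance described above.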

More About Python and the Interface Segregation Principle

One of the joys of blogging is that you occasionally discover people thoughtfully and politely reducing your arguments to shreds.  I recently came across an article by William Caputo on the subject of my discussion with Ryan back in November.*  I’ll try to summarize the original discussion:

  • Ryan contended that using Python fundamentally changed the principles of OOP.
  • I argued that the SOLID principles still held.

Now, in my original article, I accepted that dynamic languages helped ameliorate the sharp edges of statically typed languages.  Importantly,

  • Python’s constructor syntax means that any constructor is effectively an implicit abstract factory.  (This advantage is unique to Python; Ruby is nowhere near as slick in this respect.)
  • The dynamic nature of Python means that your interaction surface with another class is exactly those methods you call, no more no less.

Now, in certain aspects, I assumed that certain principles became less important simply because the language took some of the burden.  William, however, has pointed out that I was wrong.*

The thing is, we were both concentrating on one aspect of SOLID here: statically typed languages have fairly high friction related to their type system that can render code brittle.  We therefore have practices closely associated with SOLID principles that are pretty much the only way to keep code flexible in languages like C#.  These practices, such as always creating an interface to go with an implementation, are themselves a form of friction which Ryan was arguing was unnecessary in Python.

As William points out, that’s a good benefit of SOLID; it’s not the whole story.

The Interface Segregation Principle: It's not rocket science.

ISP Isn’t About Code

Imagine you’ve got a space station.  This station gets visited by two kinds of ships: shuttles, which carry people, and refuelling tankers.  Now, the requirements for the shuttle’s docking interface are quite large: you’ve got to be able to comfortably get a stable human shaped hole between the two for an extended period of time.  Refuelling, on the other hand, is carried out by attaching a pipe to the tanker.

Now imagine that you were told that both ships needed to use the same connector.  You’d end up with a massively overcomplex connector.  Now, this metaphor works perfectly well if you consider the space station to be exposing a single IConnector interface and the ships to be consuming classes.  However, William’s first point is that actually, it still holds for data feeds, web services, any interaction between two systems.  Indeed, the ISP does, in fact, apply to space stations.  In many ways, interfaces are cheap in code.  But in third party integration they’re expensive, and so the ISP is more important.  Something to bear in mind the next time you try to reuse the web service you built for the last client.

Just Because You Can, Doesn’t Mean You Should

Since I’m interviewing at the moment, I’m getting heartily sick of hearing the phrase “an interface is a contract”, but it’s relevant in this context.  In a statically typed language the contract is fixed and enforced by the consumed class.  Because of this friction, often you get an interface that is larger than it should be because it’s trying to be forgiving enough to handle multiple clients.  ISP says you should be doing the opposite: having interfaces for each consumer.  In a dynamic language, the consumed class can’t enforce the contract.  However, that doesn’t remove the concern, it just rebalances the responsibilities. 
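As a rough sketch of “an interface for each consumer” in a statically typed language, here’s the space station again.  Every name below is made up to mirror the metaphor; none of it comes from William’s article or the original discussion.

```csharp
// All names are hypothetical, invented to mirror the space station metaphor.
public interface ICrewDock      // the role a shuttle needs
{
    void TransferCrew(int people);
}

public interface IFuelPort      // the role a tanker needs
{
    void PumpFuel(double litres);
}

// One station, two small role-specific interfaces rather than one big connector.
public class SpaceStation : ICrewDock, IFuelPort
{
    public int Crew { get; private set; }
    public double Fuel { get; private set; }

    public void TransferCrew(int people) { Crew += people; }
    public void PumpFuel(double litres) { Fuel += litres; }
}

public class Shuttle
{
    // Depends only on the crew-transfer role, not on the whole station.
    public void Visit(ICrewDock dock) { dock.TransferCrew(3); }
}

public class Tanker
{
    // Depends only on the refuelling role.
    public void Visit(IFuelPort port) { port.PumpFuel(500.0); }
}
```

The station can change anything that isn’t part of ICrewDock or IFuelPort without upsetting either visitor.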

Returning to the space station, imagine if you allowed a ship to attach itself to any part of the hull.  That would certainly help with adding in new types of vessel to the mix.  The problem would come when you wanted to change the space station itself.  Maybe those solar panels aren’t very useful anymore and you’d like to get rid of them.  Unfortunately, it turns out that there’s a visiting space monster that wraps its tentacles around the panels.  You don’t want to upset the monster, so you end up leaving the useless panels on the station.

Reducing Entanglement

This is the danger in dynamic languages.  In a statically typed language, the space monster wouldn’t have been able to visit at all without work on the part of the station.  However, if we observe the ISP, we still have to do the work.  Equally, the space monster needs to be responsible and not just attach itself to anything that provides purchase.  To put it more formally, the consumed class still needs to export an interface the consuming class is going to find useful, and the consuming class has to avoid taking unnecessary dependencies.  The expression of the problem may be different, but the concerns and the principle remain.

I originally said that because Python automatically keeps interface surfaces as small as the developer is actually using there wasn’t much you could do about ISP in Python, but in fact that’s not the case.  Interaction interfaces between classes can still be made smaller, they can still be made more role specific.  You can still attempt to create Unified Modelzilla in Python, and it will be as bad an idea as it was when you tried it in J2EE.   In many ways, paying attention to ISP is more important in Python than it is with a statically typed language.

*If you want to read it, William’s article is on his home page dated 21 November. I’m afraid I don’t have a permalink.

Decorator Pattern: The Leaking This Problem

With all of the substitution patterns, the principle is that the proxied target doesn’t need to be aware of the proxying object.  That’s pretty achievable if what you’re trying to do is provide a local proxy to a remote object.  However, when you’re using a decorator, things get a bit trickier.  Welcome to the “leaking this” problem.

This is leaking

To start with an easy example, what if the target does this:

return this;

What do you do?  Well, it’s not obvious, but typically you’ll get the decorator to return itself.  This case is relatively easy to spot.  But how about if it does this:

return this.AnotherMethodOnInterface();

Here you can’t intercept the call at all.  Maybe you didn’t want to, but this is the case in which inheritance can actually be more useful than composition.  But there’s an even worse case:

return new SomeOtherObject(this);

Okay, well your decorator can give SomeOtherObject a decorator as well, and often that’s what you wanted to do.  But sometimes you actually wanted SomeOtherObject to take a dependency on the decorator, and that can’t be achieved.  Using a factory doesn’t help, since it’s typically a constructor dependency and as such unaware of the decorator.*

It just gets worse.  What if your target raises an event?  You’re going to have to make sure the sender points to the decorator.  The target could stick itself into a global variable (ugly, but possible).  So what’s the solution?  Here’s the thing: there isn’t one.  There are solutions for specific cases, but there’s no general way of replacing an object with a decorated version.  Sometimes, you’ve just got to redesign your target object to make sure you get the behaviour you want.
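Here’s a small sketch of the first two cases.  The IGreeter interface and both classes are hypothetical, invented for this example; the point is only to show where the decorator does and doesn’t get a say.

```csharp
// Hypothetical interface and classes, invented for this sketch.
public interface IGreeter
{
    string Name();
    string Greet();     // implemented by the target in terms of Name()
    IGreeter Self();    // the target returns "this"
}

public class Greeter : IGreeter
{
    public string Name() { return "world"; }
    public string Greet() { return "Hello, " + this.Name(); }  // self-call
    public IGreeter Self() { return this; }                    // leaks the target
}

public class ShoutingGreeter : IGreeter
{
    private readonly IGreeter target;

    public ShoutingGreeter(IGreeter target) { this.target = target; }

    // Callers that go through the decorator get the changed behaviour...
    public string Name() { return target.Name().ToUpperInvariant(); }

    // ...but the target calls its own Name(), so the decoration is bypassed.
    public string Greet() { return target.Greet(); }

    // Forwarding would hand out the undecorated target, so return the decorator.
    public IGreeter Self() { return this; }
}
```

With this sketch, Name() on the decorator gives “WORLD” while Greet() still gives “Hello, world”: the self-call never sees the decorator.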

Why Isn’t This a Problem For Remote Proxies?

You might be thinking “but I’ve been using Remoting/WCF/RMI for years and I’ve never had a problem”.  And you’d be right.  The thing is: proxies don’t change behaviour, so you never encounter a position in which using the unproxied version would cause an issue.  The original object stays on the server, the proxy stays on the client.  If you take a look at the examples above, it’s really easy to answer the question “What should the proxy do?”

If you think that it’s painful to deal with hand-written decorators, wait until you try to build a framework for decorators.  Castle’s InterfaceProxyWithTarget and InterfaceProxyWithTargetInterface** methods are exactly that: general ways of writing decorators.  Anyone who uses DynamicProxy runs into the problem sooner or later.  The knee jerk reaction is that there’s something wrong with DynamicProxy.  Later on you realize it’s a limitation of the programming language: there’s simply no way of expressing what you actually wanted to achieve.

*You could pass the factory in as a function parameter, but you’d typically have to redesign your target to achieve this.

**Read about the difference on Krzysztof’s blog.  Read about how leaking this pertains to Dynamic Proxy specifically here.

Open Source Should Be Read/Write

Linus Torvalds is a busy guy.  Not content with pushing badge-name Unix vendors into the land of the legacy system, and creating an operating system that’s nigh on won the server market (It’s not looking too shabby on mobile and embedded systems, either), he somehow managed to find the time to revolutionise source code management as well.  Anyone who’s jumped straight between Subversion* and git will know quite how radical a change it is.  But the reason I love it is that it’s fundamentally changed the mindset of open source software.

You see, although open source projects were meant to be collaborative, the repositories were necessarily authoritative.  Access was limited by the governing body (or person).  If you weren’t on the approved list, you had to prove yourself before you got access.  So, imagine you’re using Fluent NHibernate and discover that you can’t make a composite primary key that references another composite key (true at the time of writing).  You download the code and make the change.  But now what are you going to do?

  • Stick together a patch (you need to change all of five lines, IIRC) and hope that the overworked maintainers get around to incorporating it before you need to download the code again.
  • Change the code, and keep your own copy of the code.  Stop updating your version, losing all subsequent improvements to the main branch.

If Fluent NHibernate was on Subversion, those would be your only options.  However, Fluent NHibernate is on GitHub, so you can create a quick fork, make the change and send a pull request to the maintainers.  They can choose to incorporate the change or ignore it, but you can keep updating your version in line with their changes.  Git and GitHub are part of a revolutionary democratisation of the open source software development process.  The whole mindset is different.  Before, a fork was something you had to justify carefully, putting clear blue water between you and the maintainers.  You set yourself up as an authority in competition with them.  On GitHub, you fork because you’re interested; it’s a compliment to the maintainers, not a challenge.

Power To The People

The great thing with Git and GitHub is the implicit change in approach, but there’s still much more to do.  One of my personal bugbears is the completely impenetrable nature of build systems on .NET.  There’s a reason I keep documenting build instructions, and there’s a reason that this is one of the most popular articles on my site, entirely from natural search.  Those articles typically were the result of well over a day’s work.  If you’re thinking that everyone should spend a week of pain before they can build your project, you’re an elitist and I suggest you consider whether you’re on the right side of the argument. 

For that matter, exactly where do I download the binaries?  Every project has a different answer.  We’ve just got friction on top of friction here.  We’ve got nothing even remotely resembling Gems in ALT.Net (why exactly that is would probably take another 900 word post…).

My First OSS Project

My first try at an OSS project was an unmitigated disaster.  The idea was really simple: provide a tool that would download and build common .NET open source packages.  Sounds easy enough unless you’ve ever tried it.  Just figuring out a reliable command line way of installing ruby was a challenge (not sure I ever exactly nailed it).  The Horn guys gave up as well and started to redirect the project towards something they could actually build.  The SymbolSource guys tried to address another problem with not being able to build the code: the fact that you can’t sensibly debug it.

But, with all due respect to the Horn and SymbolSource projects, they’re just introducing some extra authorities to help me with some of the problems of my existing authorities.  What I, and most developers, want and need is to be able to easily build our own versions of open source projects, whatever version we like (including trunk).  You want to work with some code, discover a problem, step through it, figure out what’s going on and fix it.

Here’s an idea: how about open source projects have solution files which you can get straight out of source control that actually build?**  Radical, I know.  The Castle project can do that, and goes to a fair amount of effort to achieve that.  nServiceBus is way off.  If you had that, combining versions would be a cinch: just merge the solutions together and away you go.

GitHub’s great, but it’s only part of the story.  There’s still a lot more to do before Open Source is truly open to all.  It’s the right thing to do, and it sure beats complaining about how the average developer doesn’t care about ALT.NET.

*or worse, I’m using TFS…

**There’s a couple of technical issues like what you do with parallel Silverlight and .NET versions, but at worst that’s solvable with a quick batch file entitled “RunMeAfterPerformingGetLatest.cmd”.

Update: I’d originally asserted that Castle Core didn’t 100% build from the solution.  Roelof Blom pointed out on Buzz that this was incorrect.  I’m happy to correct the article.

Postscript: If you want to know more about the poster, read the full article at the International Museum of Women.

 

Substitution Patterns in Pictures

I’ve said it before and I’ll say it again.  I hate UML.  That doesn’t mean there aren’t pictures that can say 1000 words.  Here’s approximately 6000 words on the subject of substitution patterns.  Substitution patterns are patterns where you want functionality to expose a certain interface, but need some sort of bridging logic to achieve that.

One nice thing about all of these patterns is that they describe an object.  Patterns that involve several actors are harder to name and explain.  Here, you’ve got the object that implements the pattern, and the target.  The target never knows anything about the pattern (which is why I haven’t labelled them in the diagrams).

Proxy

A proxy is an object that behaves exactly like its target.  Usually it only exists to cross a machine or process boundary.  WCF and RMI create proxies all of the time.  I’ve written more about proxies before, but I ran with the more colloquial use of the term, where proxy and decorator are basically the same thing.  I’m using the formal terminology here, but it’s still probably time we just accepted that common practice treats them as the same.

Proxy Pattern 

Adapter

A square peg in a round hole.  Used typically to deal with interfacing issues.

Adapter Pattern
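A minimal sketch of the square peg and the round hole, using made-up names (ITemperatureSensor, FahrenheitProbe and ProbeAdapter are all hypothetical):

```csharp
// The round hole: the interface the rest of the system expects.
public interface ITemperatureSensor
{
    double Celsius();
}

// The square peg: useful functionality with the wrong shape.
public class FahrenheitProbe
{
    public double ReadFahrenheit() { return 212.0; }
}

// The adapter holds the bridging logic; the probe knows nothing about it.
public class ProbeAdapter : ITemperatureSensor
{
    private readonly FahrenheitProbe probe;

    public ProbeAdapter(FahrenheitProbe probe) { this.probe = probe; }

    public double Celsius()
    {
        return (probe.ReadFahrenheit() - 32.0) * 5.0 / 9.0;
    }
}
```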

Decorator

A round peg in a round hole.  Decorator differs from proxy in that it changes behaviour.

Decorator Pattern

Circuit Breaker

Not a classic Gang of Four pattern, although it’s technically a special case of decorator.  Here you have two implementations: a primary implementation and a fallback implementation.  Typically, the primary implementation is a remote proxy and the fallback is a dummy class that just throws exceptions, but obviously other arrangements are possible. The circuit breaker flips between the two depending on the behaviour of the primary implementation.  If connectivity goes down or performance degrades unacceptably, subsequent calls are routed to the fallback until the circuit breaker decides to try the primary again.  This prevents the calling system from contributing to the load on the remote system.  There’s a good implementation using Castle DynamicProxy over on Davy Brion’s blog (it conflates the circuit breaker with the fallback).

Circuit Breaker Pattern

I’ve marked the Fallback target in this case because it is an actor in the pattern.  The remote target is still unaware of the circuit breaker.
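For the curious, here’s a deliberately simplified sketch of the idea.  It is not Davy Brion’s implementation and doesn’t use DynamicProxy; the interface, the failure threshold, the retry interval and the injected clock are all assumptions made for the example, and it makes no attempt at thread safety.

```csharp
using System;

// All names are hypothetical, invented for this sketch.
public interface IQuoteService { decimal GetQuote(string symbol); }

// The fallback: a dummy that just throws.
public class UnavailableQuoteService : IQuoteService
{
    public decimal GetQuote(string symbol)
    {
        throw new InvalidOperationException("Quote service unavailable");
    }
}

// Stand-in for the remote proxy; flip Healthy to simulate an outage.
public class StubQuoteService : IQuoteService
{
    public bool Healthy = true;
    public int Calls;

    public decimal GetQuote(string symbol)
    {
        Calls++;
        if (!Healthy) throw new TimeoutException("Remote system not responding");
        return 42m;
    }
}

public class QuoteCircuitBreaker : IQuoteService
{
    private readonly IQuoteService primary;
    private readonly IQuoteService fallback;
    private readonly int failureThreshold;
    private readonly TimeSpan retryAfter;
    private readonly Func<DateTime> clock;   // injected so the sketch is testable
    private int consecutiveFailures;
    private DateTime openedAt;

    public QuoteCircuitBreaker(IQuoteService primary, IQuoteService fallback,
        int failureThreshold, TimeSpan retryAfter, Func<DateTime> clock)
    {
        this.primary = primary;
        this.fallback = fallback;
        this.failureThreshold = failureThreshold;
        this.retryAfter = retryAfter;
        this.clock = clock;
    }

    // Open means: too many failures in a row, and not yet time to retry.
    private bool IsOpen
    {
        get
        {
            return consecutiveFailures >= failureThreshold
                && clock() - openedAt < retryAfter;
        }
    }

    public decimal GetQuote(string symbol)
    {
        if (IsOpen)
        {
            return fallback.GetQuote(symbol);   // spare the remote system
        }
        try
        {
            var result = primary.GetQuote(symbol);
            consecutiveFailures = 0;            // a success closes the circuit
            return result;
        }
        catch (Exception)
        {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) openedAt = clock();
            throw;
        }
    }
}
```

While the circuit is open the primary isn’t called at all, which is the whole point: the caller stops adding load to a system that’s already struggling.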

Composite

Multiple round pegs, one round hole.  Not really anything to do with trees.

Composite Pattern
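A sketch with no trees in sight, again using hypothetical names (ILogger, ListLogger and CompositeLogger are invented for the example):

```csharp
using System.Collections.Generic;

public interface ILogger { void Log(string message); }   // the round hole

// A simple round peg that records what it was given.
public class ListLogger : ILogger
{
    public readonly List<string> Messages = new List<string>();

    public void Log(string message) { Messages.Add(message); }
}

// Many round pegs presented as one: callers see a single ILogger.
public class CompositeLogger : ILogger
{
    private readonly ILogger[] targets;

    public CompositeLogger(params ILogger[] targets) { this.targets = targets; }

    public void Log(string message)
    {
        foreach (var target in targets)
        {
            target.Log(message);
        }
    }
}
```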

Façade

Loads of pegs, lots of shapes, one round hole.  Used to simplify a subsystem.

Facade Pattern

Solution Transform Now Supports Round Tripping to VS2010 and .NET 4.0

Headline says it all.  I’ve spent a fair bit of time recently knocking the code base into shape and adding minor details which sound dull but reduce friction when you’re using it, such as the ability to explicitly remove an assembly reference.  But more exciting is the ability to switch between VS2010 and VS2008.  This is a “dumb conversion”, it doesn’t modify your CS files (personally I think that’s what you want).  It does, however, tweak assemblies.

Some example command lines:

Convert to VS2010:  SolutionTransform Retarget –solution %cd%InversionOfControl-vs2008.sln –ide vs2010 –target dotnet40 –rename –VS2010

Convert back:  SolutionTransform Retarget –solution %cd%InversionOfControl-vs2010.sln –ide vs2008 –target dotnet35 –rename –VS2008

Convert to Silverlight 3.0:  SolutionTransform Retarget –solution %CD%InversionOfControl-vs2008.sln –target silverlight30 –rename -Silverlight –assemblyPaths ..libsilverlight-3.0

Now, obviously, if you’re migrating your project to VS2010, you’ll be using the Visual Studio upgrade wizard.  However, this isn’t for that: it’s for including in build scripts so that you can keep parallel versions of your code running without pain.  Code is found on GitHub, and the documentation on the Castle Wiki.


Writing a more C#-ish Intersection Function

Okay, first off, this isn’t a Ruby versus C# post.  I actually find Ruby quite interesting and want to learn more: that’s why I’m reading posts about it.  It’s a post about how the language idioms lead you to different code designs.  This is following on from Alan Skorkin’s post about writing an intersection function in a Ruby idiom.  So I thought I’d do the equivalent for Alt.Net programmers.  Neither is massively useful code: both languages already have perfectly good intersection functions in them.  This is just a bit of fun (as was Alan’s post).

First let’s talk about the problem space.  The basic problem is to get the intersection of two pre-sorted lists with distinct elements.  The first interesting difference is the language: when asked for a list, the natural primitive in Ruby is an array.  In C#, the natural primitive is IEnumerable<T>.  Then there’s the question of sorting.  For Ruby, the natural answer: whatever “<” says the sorting is.  For C#, the natural thing is to introduce a Comparison<T> function that defines the ordering.

Implementing Each Continued

C# is, of course, statically typed, so monkey patching tricks aren’t going to work.  However, we have static extension methods which can be used for similar effects.  So, here’s my implementation of “Each_Continued”

public static IEnumerable<TValue> AsContinuable<TValue>(this IEnumerable<TValue> list)
{
    return new ContinuableEnumerable<TValue>(list);
}

It’s at times like this that I miss the ability to declare anonymous types.  However, one of the big changes is obvious here.  Rather than have a dedicated function that’s added into the class I’m trying to use, I’m just slapping a decorator on top of it.  This prevents any problems with two users of the each_continued function since the scopes will be different.

Now, IEnumerable<T> doesn’t have a concept of indexing (IList<T> does, but requiring that wouldn’t be very idiomatic).  As a consequence, Alan’s trick of feeding back the next index isn’t going to work for me.  Instead, I’ll use a simpler approach: whenever you restart the inner loop, you’ll start on the same element as last time.  I could fix this and make it more efficient, but I’m going for idiomatic here.

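using System.Collections;
using System.Collections.Generic;

// Decorator over an IEnumerable<T> that remembers where the last enumeration stopped.
public class ContinuableEnumerable<TValue> : IEnumerable<TValue> {
    private readonly IEnumerable<TValue> underlying;
    private IEnumerator<TValue> enumerator;  // shared across calls to GetEnumerator
    private TValue current;                  // the last element handed out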

    public ContinuableEnumerable(IEnumerable<TValue> underlying)
    {
        this.underlying = underlying;
        enumerator = null;
    }

    public IEnumerator<TValue> GetEnumerator()
    {
        if (enumerator == null)
        {
            // First time, start everything off
            enumerator = underlying.GetEnumerator();
            
        } else
        {
            // Later time, restart from the current position
            yield return current;
        }
        while (enumerator.MoveNext())
        {
            current = enumerator.Current;
            yield return current;
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

Okay, the code is inarguably longer, even if you ignore the curly-brace-induced vspace.  The support for non-generic enumerations is enforced by the type system, despite the fact we never intend to use it.  The necessity to declare, then initialise variables makes it longer than the Ruby equivalent.  However, I’d argue that the main implementation is concise and clear.

Implementing The Intersection Function

Ruby and C# both have generators, a for/foreach loop and a function for iterating (Select in C#, each in Ruby).  However, since I’ve decided to go for a pure IEnumerable<T> approach, I really want to use foreach for the main loop.  The implementation of ContinuableEnumerable has been driven by that.

public static IEnumerable<TValue> PreSortedIntersection<TValue>(this IEnumerable<TValue> list1, IEnumerable<TValue> list2,
    Comparison<TValue> comparison
    )
{
    var continuable = list2.AsContinuable();
    foreach (var v1 in list1)
    {
        foreach (var v2 in continuable)
        {
            var comparisonResult = comparison(v1, v2);
            if (comparisonResult == 0)
            {
                yield return v1;
            }
            if (comparisonResult <= 0)
            {
                break;
            }
        }
    }
}

Personally, I find this function much easier to understand than the Ruby version, which operated through an interaction between the calling function and each_continued.  (You could, however, object that if you made it do the same thing as the Ruby version, it would be more complicated.)

Obviously, that’s fine in the general case, but what if you wanted to perform an intersection on numbers or dates?  You shouldn’t really have to define a comparison function every time.  So we add

public static IEnumerable<TValue> PreSortedIntersection<TValue>(this IEnumerable<TValue> list1, IEnumerable<TValue> list2)
    where TValue : IComparable<TValue>
{
    return PreSortedIntersection(list1, list2, (x, y) => x.CompareTo(y));
}

Which will, of course, work on any class that defines a natural order, including integers and dates.  Since you can’t polymorphically overload operators, IComparable<TValue> is the nearest C# has to Ruby’s “<”.  We can then write code like this:

var list1 = new[] { 1, 2, 3, 6, 7, 8 };
var list2 = new[] { 3, 4, 5, 6, 8, 9, 10 };
Write(list1.PreSortedIntersection(list2));

Finally, there’s an extremely large but invisible elephant in the room.  The Ruby function evaluated eagerly; the C# version is lazy by default.  Again, this is idiomatic: we nearly always return lazy lists and call the extension method “ToList()” if we want them evaluated eagerly.  You could, of course, implement something much closer to the Ruby implementation if you used lists and anonymous delegates, but it wouldn’t be as idiomatic.  What you couldn’t do is to start throwing LINQ at the problem: I don’t think there’s a succinct way of expressing it using LINQ that has the right complexity.
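If the lazy/eager distinction isn’t familiar, this small sketch shows it.  LazinessDemo and its counter are hypothetical and exist only to make the evaluation order visible.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LazinessDemo
{
    // Counts how many elements the iterator has actually produced.
    public static int Produced;

    public static IEnumerable<int> Numbers()
    {
        for (var i = 1; i <= 3; i++)
        {
            Produced++;
            yield return i;
        }
    }

    // Returns { produced before ToList, produced after ToList, list length }.
    public static int[] Run()
    {
        Produced = 0;
        var lazy = Numbers();        // nothing in the iterator body has run yet
        var before = Produced;
        var eager = lazy.ToList();   // forces the whole sequence
        return new[] { before, Produced, eager.Count };
    }
}
```

Run() reports zero elements produced before the call to ToList() and three afterwards.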

Way Too Complicated

The only problem with all of this: Alan’s original implementation was, in many ways, clearer.  It was undoubtedly more efficient.  Ironically, removing the index variables and going straight to the C# enumeration interface gives you a very elegant implementation:

public static IEnumerable<TValue> PreSortedIntersection2<TValue>(this IEnumerable<TValue> list1, 
    IEnumerable<TValue> list2, Comparison<TValue> comparison
) {
    var enum1 = list1.GetEnumerator();
    var enum2 = list2.GetEnumerator();
    while (enum1.MoveNext() && enum2.MoveNext())
    {
        int comparisonResult;
        while (0 != (comparisonResult = comparison(enum1.Current, enum2.Current)))
        {
            if (!(comparisonResult < 0 ? enum1 : enum2).MoveNext())
            {
                yield break;
            }
        }
        yield return enum1.Current;
    }
}
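For anyone who wants to paste something into a project and try it, here’s a self-contained variant of the same merge-style idea.  The class name, the method name, the optional comparison and the using blocks that dispose the enumerators are my additions for this sketch, not part of the function above.

```csharp
using System;
using System.Collections.Generic;

public static class SortedSequences
{
    // Intersection of two pre-sorted sequences with distinct elements.
    // Falls back to the type's natural ordering if no comparison is given.
    public static IEnumerable<T> IntersectSorted<T>(
        this IEnumerable<T> first, IEnumerable<T> second,
        Comparison<T> comparison = null)
    {
        if (comparison == null)
        {
            comparison = Comparer<T>.Default.Compare;
        }
        using (var e1 = first.GetEnumerator())
        using (var e2 = second.GetEnumerator())
        {
            if (!e1.MoveNext() || !e2.MoveNext()) yield break;
            while (true)
            {
                var result = comparison(e1.Current, e2.Current);
                if (result == 0)
                {
                    // Match: emit it and advance both sides.
                    yield return e1.Current;
                    if (!e1.MoveNext() || !e2.MoveNext()) yield break;
                }
                else if (result < 0)
                {
                    // First is behind: advance it.
                    if (!e1.MoveNext()) yield break;
                }
                else
                {
                    // Second is behind: advance it.
                    if (!e2.MoveNext()) yield break;
                }
            }
        }
    }
}
```

Called on the two arrays from earlier, new[] { 1, 2, 3, 6, 7, 8 }.IntersectSorted(new[] { 3, 4, 5, 6, 8, 9, 10 }) yields 3, 6 and 8, and each input is walked at most once.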

C# and Ruby are both great languages, and closer together than many think, but it’s interesting to see how the differences send you down very different coding paths. 

P.S. I’ll explain the MapReduce tag in a later post…
