Running MTA code in an STA thread

I don’t claim to understand the Windows threading model that well.  For that matter, I don’t want to learn either.  But every so often you hit an error like this: “WaitAll for multiple handles on a STA thread is not supported.”  Now, we’ll calmly step away from the car crash that is understanding the apartment threading model and skip straight to making the problem go away.

private void RunInMtaThread(ThreadStart action)
{
    var thread = new Thread(action);
    thread.SetApartmentState(ApartmentState.MTA);
    thread.Start();
    thread.Join();
}

Incidentally, the SetApartmentState call isn’t actually necessary, since it’s the default.  I’m including it so that it’s obvious how to achieve the reverse.  As an aside, I’ve been spending a lot of time thinking about API design (Ayende’s opinionated API article started me on this road), and I can think of no good reason that the apartment state should not be an optional parameter of thread start, rather than a settable property.  It’s not like you can change the apartment state once it’s running.

AutoGen for Castle Windsor

I recently published a tool called AutoGen for Castle.  You can check it out at Google Code.  In essence, it auto-generates factory objects for you and adds them to your container.  This is an extremely useful thing to be able to do.  I do, however, find it a bit hard to explain, so bear with me.

A relatively good rule of thumb these days is that a class should instantiate objects or do something with objects but never both.  Miško Hevery has written a lot of good stuff on this subject, and I don’t propose to mangle his thinking here.  Now, if we go back to our old Gang of Four maze construction example, the abstract factory in that case produced objects that were closely related and independent.  However, that’s rarely the case these days, if only because we design systems to have very few interdependent components.  It’s actually much more likely that the objects could have been produced using a DI framework such as StructureMap or Castle.

Now, when you start using dependency injection containers, you seem to end up putting “Resolve” or “GetInstance” all over your code.  This is an extremely bad idea, for two principal reasons:

  • Calling Resolve is conceptually as bad as calling new
  • Your code should not be taking a dependency upon its DI framework.

Now, Jeremy Miller wrote an excellent article on the question of libraries taking a dependency upon IoC containers.  It’s a known problem and hard to deal with without Microsoft stepping up to the plate.  However, your code typically doesn’t need an interface as general as the one Jeremy proposed.  That’s only going to be useful for people building frameworks.  It’d be better if you could specify your own.

That’s what AutoGen does: it lets you specify an interface (or multiple interfaces) for how you interact with Castle.  Anything you like, really.  By default, a parameter called “key” is the component key, and anything else gets passed to the constructor.  (Obviously, it uses Castle’s semantics for doing this; there’s not a lot of control there.)  It even, if you so wish, allows you to implement Jeremy’s interface.  That won’t help you with standardization, however.

Ideally, this means that you can actually restrict your interaction with your container to your main method.  You only need one call to Resolve/GetInstance: the call that resolves your service class.  The rest of your code can now be container-agnostic.
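
To make that a little more concrete, here’s the flavour of interface I have in mind.  The names are invented for illustration; the point is that you write the interface and AutoGen supplies the implementation.

// A factory interface you define yourself; AutoGen generates the implementation
// and registers it in the container.  Following the conventions above, "key"
// selects the component and reportDate is passed to the resolved constructor.
public interface IReportFactory
{
    IReport Create(string key, DateTime reportDate);
}

Your consuming classes then take an IReportFactory in their constructors and never see the container at all.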

Anyway, if you’re interested, you can take a look here:

http://code.google.com/p/castleautogen/

It depends upon Castle Core, Dynamic Proxy and (obviously) Castle Windsor.  The tests are written in NUnit 2.4.

Understanding Inversion of Control Containers

I thought I’d write a bit about how I understand the philosophy of IoC containers.  I realize I’m actually building up to something that I started talking about a while ago.  I’m probably not saying anything that Martin Fowler didn’t say before, but I have my own slant on it.  To start off with, I’d like to just review what we actually mean by various terms:

  • Inversion of Control (IoC) is a general name for the pattern where an object isn’t responsible for managing the lifecycle of the services it uses. 
  • The simplest way to implement this (in .NET) is passing services in through the constructor.  This is termed constructor injection (there’s a minimal example after this list).
  • Typically, services are passed in using interfaces, which eases testability.  However, Inversion of Control is not about testability.
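
In case the terminology is obscuring how simple this is, constructor injection is nothing more than the following (the names are made up):

// The consumer declares what it needs; something else decides what it gets.
public class OrderProcessor
{
    private readonly IPaymentGateway gateway;

    public OrderProcessor(IPaymentGateway gateway)
    {
        this.gateway = gateway;
    }
}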

So what is an IoC container?  It’s a configuration tool.  That’s it.  Typically, it implements the constructor injection pattern like so:

  • For each object registered, you usually specify:
    • A name for the component
    • The interface it implements
    • The class that implements it.
  • For primitive values, you just say what the constructor parameter is and what the value should be.
  • For interfaces, you either don’t specify the implementation, in which case you get the default, or you specify a particular component reference.  (There’s a sketch of how this looks in practice after this list.)
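
In Castle Windsor’s fluent registration, that tends to look something like the following (the names are invented; other containers have their own equivalents):

container.Register(
    Component.For<IPaymentGateway>()        // the interface it implements
        .ImplementedBy<PayPalGateway>()     // the class that implements it
        .Named("paypal")                    // the component name
        .Parameters(
            Parameter.ForKey("retryCount").Eq("3")));  // a primitive value (note: a string)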

Actually, there is one other thing the container does: it handles lifecycles.  This is a major feature that people often take for granted.  The clue is in the name, really.  Containers are things that hold objects, not produce them.  Containers typically allow you to specify the lifecycle of the object e.g.

  • one instance per process (singleton)
  • one instance per thread
  • one instance per HttpContext

This lifecycle management is crucial to the use of IoC containers in most environments.  The catch is that it can have side effects you do not expect.  For instance, if you call a parameterized resolve on an object with a singleton lifecycle, the object will only ever have the first set of parameters passed in.  Any others will be ignored (the moral of this story is to always use transient lifecycles when dealing with run-time parameters).
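
Here’s a sketch of that gotcha in Windsor-ish terms (the component and parameter names are invented, and the exact resolve overload varies between containers and versions):

container.Register(
    Component.For<IReportWriter>()
        .ImplementedBy<ReportWriter>()
        .LifeStyle.Singleton);

var first = container.Resolve<IReportWriter>(new { outputPath = "monday.txt" });
var second = container.Resolve<IReportWriter>(new { outputPath = "tuesday.txt" });
// first and second are the same instance, built with "monday.txt";
// the second set of parameters is silently ignored.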

A fundamental part of the philosophy of IoC containers is that they should be extremely low footprint and non-invasive.  The code should not need to know it is running in a container.  Nor should the interfaces.  There are, however, a number of times that you do need to know about the container.  The obvious one is when reasoning about lifecycle management, but there are also a number of times the abstraction gets broken.  Having the abstraction broken is not as painful as having no abstraction at all, but it can be a distraction.

Evaluation of Containers

There are, of course, a lot of subtleties about containers.  Quite a lot of people come to the conclusion that the libraries out there are too “heavy-weight” and that they would be better off rolling their own.  If you’re one of those people, hopefully after reading this list you will either decide to refocus your efforts on improving the existing libraries, or you will have a USP that merits the duplication of effort.  (Or you just want to have fun, which is always a valid reason for writing code.)  I’ve listed some of those subtleties out here:

Most of this is specific to Castle Windsor, since it’s the one I’ve worked with most, but many of these questions are common across implementations and are things you should watch out for when evaluating.  I will re-iterate that whilst it is easy to write a simple IoC container, writing a very good one such as Castle is a challenge.

Are Primitives Strings?

My personal bugbear is that IoC containers started out when XML was fashionable.  As a consequence, there’s a tendency in most of them to treat everything as a string.  Since these days there’s a move towards DSLs such as Binsor or fluent configuration, the requirement that parameters be strings is out of date.  There are a number of side effects of this.  Castle Windsor RC3, for instance, fails one of its unit tests in a UK environment due to different date formats.  Equally, adding a primitive type that isn’t easily expressed as a string is painful.  Custom Type Converters are a useful concept for dealing with text-based file formats, but seriously, why can’t you say

Component.For<IDownloader>()
    .ImplementedBy<Downloader>()
    .Parameters(
        Parameter.ForKey("target").Eq(new Uri("http://www.google.com")));

The current way of achieving this is unnecessarily verbose.

How are Lists Handled?

If there is one thing I heartily dislike about Castle, it’s the list handling.  Ironically, in many ways, the list handling is strong: it’s relatively easy to register an array of array of strings, for instance.  However, once you leave primitives, it gets more ambitious.  If you create a constructor parameter of IEnumerable<IService>, it will by default pass in a list of all components that are registered with the IService interface.  There are a number of problems with this:

  • The worst is that it gets in the way of the second simplest use case of a list: one where you specify a list of component references yourself.  If you try this, you end up with a type conversion error.
  • It can’t handle super-interfaces, it will only ever do exact matches.
  • You can’t specify that you care about interfaces on the registered implementations.  Thus, requesting IEnumerable<IDisposable> wouldn’t return the “Right Thing” (all registered disposable objects) even if you could specify that you wanted super-interfaces.
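
For reference, the scenario in question is a constructor like this (the names are invented):

public interface IService { }

public class ServiceRunner
{
    private readonly IEnumerable<IService> services;

    // As described above, Castle will by default try to satisfy this with
    // every component registered against IService.
    public ServiceRunner(IEnumerable<IService> services)
    {
        this.services = services;
    }
}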

I would advise anyone evaluating a container to pay particular attention to how you can specify lists of components, because it comes up a lot in real use cases.

What Issues are there with Open/Closed Generics?

There’s always a couple of bugs to do with open and closed generics.  Castle recently fixed another one.  In March of this year, it wasn’t possible to express this concept in StructureMap:

Component.For<IMessageHandler<string>>()
    .ImplementedBy<MessageHandler>();

Indeed, this issue was pretty much why I moved to Castle in the first place.  These days you’ve got to come up with something fairly involved to run into a problem (e.g. an open generic type relying on a closed one).  However, if you’re using one of the many less-popular frameworks, or rolling your own, you need to watch out for this.

How does the Container Deal with Multiple Interfaces?

If you register the same class as the implementation of multiple interfaces, typically you will end up with multiple instances.  It’s possible to mitigate this by using explicit component references, but that’s not a perfect solution.  Sometimes you want a service that exposes different interfaces to different consumers.  Castle Windsor calls this feature “forwarding”.
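
In Windsor’s fluent registration, forwarding looks something like this (the names are invented):

// One component, one lifecycle, exposed as two different services.
container.Register(
    Component.For<IUserReader, IUserWriter>()
        .ImplementedBy<UserRepository>());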

How can you Inject your own code?

How good is the container at handling the case where it doesn’t create the object itself?  Can you say something like this?

Component.For<IConnection>()
    .CreatedBy(() => ConnectionFactory.NewConnection());

Windsor’s story here is rather painful, with two facilities defined which use reflection to run.  On the other hand, they support Dynamic Proxy out of the box, so intercepting method calls to the interfaces is pretty simple and powerful.

Can you Create a Derived Container?

I am, frankly, amazed this does not come up more often.  It should be relatively easy to create a container based upon another container, overriding and extending parts of the configuration.  This is actually extremely useful.  Binsor has the Extend keyword (you’ll need to check the unit tests for documentation) which achieves this, but frankly this is too important a feature to be left to the DSL; it should be a fundamental part of the container.  Certainly there’s no easy way to achieve this in Windsor without using Binsor.  I think there will probably be a whole separate post about this.

Automated deployments #1: What’s on your server?

No really, what’s actually on your server?  If your first answer isn’t “erm”, you’re either very good or don’t understand the question.  Servers are huge; they can store War and Peace on their hard drives without you even noticing.  For that matter, they can store any number of PDFs of books on patterns and practices without any appreciable benefit to civilization, but I don’t think that’s really the fault of the servers.  It’s practically impossible to really know what’s on there.  What’s worse, the way most people do development, they make the job harder for themselves.

I had a meeting with our auditor today and thanked my lucky stars that we had automated deployments.  Automated deployments save an awful lot of effort if they’re done right and really save your hide when people start poking around your process.  Let’s talk about a really simple question: what’s on your server?

If you tell me it’s version 1.2.6, I’m going to have a few questions.

  • What was in version 1.2.6?  Is there a label in source control?
  • Was every file checked in?
  • What build does that correspond to?
  • How can you check that the build is what got deployed?
  • How about the config, is that in source control?  The actual config that’s on the server right now.
  • How do you know nothing’s changed on the server since then?

Look at Microsoft, or any large company, and they’ve got this sorted out.  It’s internal development teams that tend to have a problem.  When people ask these questions:

  • What’s changed since we last deployed?
  • What could this affect?
  • Can we rollback?

You want to have good answers.  And absolutely fundamental to this is: know what’s on your server.  Exactly.

First, you need to have a build server.  Download and love CruiseControl.NET.  Builds on local machines always turn out to have the wrong version, a reference to something that isn’t in source control, a dependency that isn’t well worked out.  A real pain for anyone starting with this is that it turns out your solution files aren’t really as flexible as you’d like.  You can get going with MSBuild, but there’s a reason every open source project uses NAnt.  (NAnt is far from perfect, but it’s a heck of a lot easier than MSBuild for anything slightly complex.)

Anyway, here are my answers:

  • Version numbers are build numbers.  “1.2” is just for PR, it’s got nothing to do with versioning.  Call it what you like (you can call it Vista if you must) but the real version number is the build number from the build server.
  • Build Servers will only build files that are checked in.
  • I said that version numbers are build numbers, right?
  • We label every assembly that gets built with the build number (I stick it in the summary in the DLL).  This makes it really easy to just check the version on the server.  Also, we stick the information in an About Box, or a web service call.
  • The actual config on the server isn’t in source control, but the environmental delta is.  The deployment process stamps the config with the information as well.
  • Making sure that nothing’s been changed is harder, because no-one’s written a general tool for doing so, but taking a hash of the directory straight after deployment and checking it each day will catch most of that.  (You can hash files individually for better diagnostics; there’s a sketch of the idea after this list.)  Tracking every setting on the server is probably more trouble than it’s worth, but I do have a tool for downloading IIS Virtual Directory settings to an XML file, because that turned out to be on the right side of the cost/benefit calculation.
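
Here’s a minimal sketch of the directory-hashing idea (the choice of SHA-1 is incidental, and it needs System, System.IO, System.Linq, System.Security.Cryptography and System.Text):

// Hash every file under the deployment directory in a deterministic order,
// recording per-file hashes so a diff tells you exactly what changed.
public static string HashDirectory(string root)
{
    using (var sha = SHA1.Create())
    {
        var report = new StringBuilder();
        var files = Directory.GetFiles(root, "*", SearchOption.AllDirectories)
                             .OrderBy(f => f, StringComparer.OrdinalIgnoreCase);
        foreach (var file in files)
        {
            var hash = sha.ComputeHash(File.ReadAllBytes(file));
            report.AppendLine(file + " " + BitConverter.ToString(hash));
        }
        return report.ToString();
    }
}

Run it straight after deployment, keep the output somewhere safe, and diff it against a fresh run each day.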

Your answers don’t need to be the same, but I guarantee you your life will be easier when you have answers to these questions.  Importantly, the work scales: the more people join your team, the more important this stuff becomes.  Incidentally, you can do all of this in TFS.  I know, I’ve done it.  And I’ve regretted not using CruiseControl.NET, NUnit, SVN and NAnt every time.  Open source, depressingly, appears to be better documented than the stuff I paid for.

Mono’s still got a long way to go

Reading this didn’t impress me massively, and not only because it’s a reheated blog post.  Don’t get me wrong, the Mono team has done some superb work, but it’s really not ready for primetime.  Miguel does a phenomenal job of cheerleading, but let’s take a look at this particular example.  That’s not an industry standard benchmark they’re running there, it’s some code on some guy’s blog.  It’s quite a nice and interesting blog, but it’s nowhere close to the mainstream.

Sadly, where Mono still falls down is meat and potatoes issues.  Look at the number of patches the Ubuntu team need to make to each release before they’re happy packaging it.  Look at the weird behaviour problems that the Castle team discuss on their mailing lists (e.g. why on earth does Mono 1.9 believe it can convert from a string to an interface?  Don’t they have a regression test for that?).  Worst of the lot, however, has to be the garbage collector.

Getting the garbage collector wrong is second only to getting the compiler wrong.  People won’t understand what the problem is, but they’ll suffer when it doesn’t work right.  Mono currently uses the Boehm garbage collector, which is a non-compacting, conservative collector designed for C and C++.  If you use vanilla .NET, you don’t need to know about the garbage collector for the most part (unless you’re doing stupid things with finalizers).  However, if you’re running on Mono, the same program that runs fine on .NET can give you nonsense like this:  http://www.mail-archive.com/mono-list@lists.ximian.com/msg22436.html.  (Incidentally, the suggested remedy is a sticking-plaster over a sword wound.)

At the moment, the only real solution to this problem is to use allocation patterns that the Boehm GC likes, which is ridiculous to anyone who has stopped worrying about memory fragmentation for the last five years.  In fairness, the Mono Project is planning to address this at some point towards the end of the year.  Then all I’ll be worried about is their quality problems.

Unforeseen Consequences: My lavatory made a funny noise

Now, everyone in my office has already heard this story, but it deserves a wider audience.

Six months ago, I bought a house.  One of the lavatories made a funny noise when it flushed.  Actually, not so much funny as extremely loud.  This noise would go on for about a minute.  It actually sounded like the house was shaking apart.  I ignored this for months and worked around the problem.  Being a geek, I figured out that running the bath at the same time stopped the noise, so I knew I was dealing with a resonance problem.  I’m not, however, a plumber, and had no idea what was starting the whole thing off.

There are a couple of weird things about the house.  One is that the lower bathroom had a shower head, but no shower rail.  Not particularly wanting to soak my new house every time I used the shower, I used the shower upstairs.  Finally, I (or should I say, my significantly more organised wife) got the plumber ’round.  He installed a shower rail, took one look at the loo and determined it had the wrong washer on it.  Replacing the washer for a fiver, he managed to fix a problem that I’d assumed was going to cost me thousands.

I then went away for a couple of days, and came back to a leak in my kitchen.  Water was seeping through from the ceiling.  I went nuts, thinking the house was about to fall down.  I phoned up the plumber and he agreed to come back on the Sunday morning.  (Our plumber, you will appreciate, is an absolute brick.  Couldn’t praise him more highly.)  In the morning, we started discussing the problem.  Maurice (really) first wanted to check that he hadn’t drilled through a pipe.  He was quite happy to admit that he had done so before, but he doubted this was the problem since we’d have a lot more water leaking.  We then started on a relatively serious discussion on whether it was better to rip up the floorboards or break through the plaster.  Another difficulty was working out from where exactly it was leaking.  Finally, I asked him if it was possible that fixing the washer had affected something else.  Maurice said “No, that can’t happen.  Let me explain why.”.  He lifted off the cover of the lavatory tank, stared at it and said “There’s no overflow”.

For those of you who don’t know, the overflow is a pipe out of the back of your loo that goes outside.  In the event of a minor problem, you end up with water being dumped outside your property.  Since the property can handle rain, it’s not an urgent problem and is easily fixed.  What my loo did, was drop the water from the overflow onto the floor, and eventually through the kitchen ceiling.  Basically, the guy who’d installed it in the first place had done a dreadful, incompetent job.  So now I have a bucket where the overflow should be and another date with Maurice where he’s going to install some overflows.

The reason I mention this is, the experience was nigh on identical to conversations I have with my manager about some legacy systems I deal with:

  • Problems sometimes aren’t as serious as their symptoms suggest.
  • Fixing one thing may highlight a problem somewhere else.
  • Always explain to someone else why something can’t possibly happen.
  • An audit won’t find every problem.
  • You’re always going to get these problems when you’re taking over a badly done job.

And sadly, sometimes you won’t get lucky and will have to rip up the floorboards to figure out what’s going wrong.

How to support Default.aspx in ASP.NET MVC

If you’re trying to use MVC incrementally on an existing project, this can be a bit of a pain.  There must be some way of getting the routing logic to redirect itself, but in practice the following does the trick (assuming you’re using the standard routing).

    public class HomeController : Controller {
        public ActionResult Index()
        {
            return new RedirectResult("~/Default.aspx");
        }

    }
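
For context, this assumes roughly the stock route registration from the project template:

routes.IgnoreRoute("{resource}.axd/{*pathInfo}");

routes.MapRoute(
    "Default",                                              // Route name
    "{controller}/{action}/{id}",                           // URL with parameters
    new { controller = "Home", action = "Index", id = "" }  // Parameter defaults
);

With that in place, requests to the site root hit HomeController.Index, which simply bounces them to the existing Default.aspx.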

Understanding MapReduce #1: The Assumptions

I finally had a light-bulb go off in my head about MapReduce.  To be honest, part of the problem with understanding it is that the implementation most of us look at (Hadoop) has a considerable amount of implementation detail visible at all times.  I’m going to try to explain some of the fundamentals behind it in terms of C#, which has two great advantages:

  • I know it
  • LINQ gives us a fairly decent syntax for expressing algorithms declaratively.

Now, as everyone knows, Map and Reduce are Lisp terms for projection and aggregation respectively.  In LINQ, these are called Select (or SelectMany) and Aggregate (or just “apply function”).  MapReduce simply applies one and then the other.

        public static TResult MapReduce1
            <TResult, TMapped, TInput>
            (
            Func<TInput, TMapped> map,
            Func<IEnumerable<TMapped>, TResult> reduce,
            IEnumerable<TInput> inputs) {

            return reduce(
                from input in inputs
                select map(input)
                );
        }

That’s it!  So why is it so clever?  Well, what Google did was to change the assumptions a bit.  The irony is that by adding in more conditions, they actually came up with something more general, not less.  So, let’s take a look at some of those assumptions:

  • The map always returns a list. 
  • The reduce function operates on the same input type as output type.
  • The reduce function is idempotent.  In plain English, if you reduce the output of a reduce, your output will be equal to your input.

The first one’s a gimme.  Returning a list doesn’t make a blind bit of difference.  You could just return one item for every input and you’d be back to the original function.  However, the restriction on the reduce is hugely powerful.  In particular, it allows for the distribution of partial reduces.  I’m not going to show that in code today.

Version 2 of the code looks pretty similar:

        public static IEnumerable<TResult> MapReduce2
            <TResult, TInput>
            (
            Func<TInput, IEnumerable<TResult>> map,
            Func<IEnumerable<TResult>, IEnumerable<TResult>> reduce,
            IEnumerable<TInput> inputs) {

            return reduce(
                from input in inputs
                from mapped in map(input)
                select mapped
                );
        }

We’ve got an extra “from” clause to deal with, but otherwise this is pretty tame.  Note that we’ve made the reduce return a list as well.  Again, it doesn’t make much of a difference.  We’ll abstract away the concept of applying a map.

        public static IEnumerable<TResult> MapReduce2b
            <TResult, TInput>
            (
            Func<TInput, IEnumerable<TResult>> map,
            Func<IEnumerable<TResult>, IEnumerable<TResult>> reduce,
            IEnumerable<TInput> inputs) {

            Func<IEnumerable<TInput>, IEnumerable<TResult>> applyMap =
                mapInputs => mapInputs.SelectMany(map);
            return reduce(applyMap(inputs));
        }

Now things get interesting.  MapReduce assumes that you’re using Tuples everywhere.  This is the most important step.  The point is, it groups on the basis of the keys.  We can also use different keys for mapped data and the results of reduces, although the type system restricts how useful that could be.  Now version 3 does look somewhat more complex.

        public class Tuple<TKey, TValue> 
        {
            public TKey Key;
            public TValue Value;
        }

        public static IEnumerable<Tuple<TKey, TValue>> MapReduce3
            <TKey, TValue, TInput>
            (
            Func<TInput, IEnumerable<Tuple<TKey, TValue>>> map,
            Func<TKey, IEnumerable<TValue>, IEnumerable<Tuple<TKey, TValue>>> reduce,
            IEnumerable<TInput> inputs) {
            Func<IEnumerable<Tuple<TKey, TValue>>, IEnumerable<Tuple<TKey, TValue>>> applyReduce =
                results => from result in results
                           group result.Value by result.Key into grouped
                           from reduced in reduce(grouped.Key, grouped)
                           select reduced;
            Func<IEnumerable<TInput>, IEnumerable<Tuple<TKey, TValue>>> applyMap =
                mapInputs => mapInputs.SelectMany(map);
            return applyReduce(applyMap(inputs));
        }

The important bit is the way we’ve redefined the reduce operation.  Now the reduce operation operates on a list of values for a particular key (it can still return whatever it likes).  The applyReduce function demonstrates how this concept of reduce maps onto the old concept of reduce.

The LINQ syntax obscures one thing we’ve overlooked so far: how the grouping actually works.  The Hadoop implementation makes this far from explicit as well.  Hadoop does it by requiring all keys to implement “WriteableComparable”.  The direct translation would be to require TKey to implement IComparable.  However, we’ll go with a more .NET like way of doing things using IEqualityComparer<TKey>.  Here’s version 3 with an IEqualityComparer.

        public static IEnumerable<Tuple<TKey, TValue>> MapReduce4
            <TKey, TValue, TInput>
            (
            Func<TInput, IEnumerable<Tuple<TKey, TValue>>> map,
            Func<TKey, IEnumerable<TValue>, IEnumerable<Tuple<TKey, TValue>>> reduce,
            IEqualityComparer<TKey> groupRule,
            IEnumerable<TInput> inputs) {
            Func<IEnumerable<Tuple<TKey, TValue>>, IEnumerable<Tuple<TKey, TValue>>> applyReduce =
                results => results
                            .GroupBy(result => result.Key, result => result.Value, groupRule)
                            .SelectMany(grouped => reduce(grouped.Key, grouped));
            Func<IEnumerable<TInput>, IEnumerable<Tuple<TKey, TValue>>> applyMap =
                mapInputs => mapInputs.SelectMany(map);
            return applyReduce(applyMap(inputs));
        }

Now, I’ve tried to avoid talking about distribution concerns in this post, but here we’re forced into it.  The results of maps will potentially be transmitted across the network.  Therefore, it makes sense for the grouping to actually occur during the map.  Again, you might not see this in the Hadoop examples as the grouping is actually performed by the OutputCollector.  While we’re here, we’ll observe that the Hadoop standard of taking two inputs to the reduce function doesn’t make much sense in an environment in which IGrouping is a standard concept.  Thus, we can move the grouping call to the map as follows:

        public static IEnumerable<Tuple<TKey, TValue>> MapReduce5
            <TKey, TValue, TInput>
            (
            Func<TInput, IEnumerable<Tuple<TKey, TValue>>> map,
            Func<IGrouping<TKey, TValue>, IEnumerable<Tuple<TKey, TValue>>> reduce,
            IEqualityComparer<TKey> groupRule,
            IEnumerable<TInput> inputs) {
            Func<IEnumerable<IGrouping<TKey, TValue>>, IEnumerable<Tuple<TKey, TValue>>> applyReduce =
                results => results.SelectMany(reduce);
            Func<IEnumerable<TInput>, IEnumerable<IGrouping<TKey, TValue>>> applyMap =
                mapInputs => mapInputs
                    .SelectMany(map)
                    .GroupBy(result => result.Key, result => result.Value, groupRule);
            return applyReduce(applyMap(inputs));
        }

The problem with writing it out like this is that the Func definitions get to be most of the code.  Let’s see it again, simplified:

        public static IEnumerable<Tuple<TKey, TValue>> MapReduce6
            <TKey, TValue, TInput>
            (
                Func<TInput, IEnumerable<Tuple<TKey, TValue>>> map,
                Func<IGrouping<TKey, TValue>, IEnumerable<Tuple<TKey, TValue>>> reduce,
                IEqualityComparer<TKey> groupRule,
                IEnumerable<TInput> inputs) {
            Func<IEnumerable<Tuple<TKey, TValue>>, IEnumerable<IGrouping<TKey, TValue>>> collectOutput =
                mapped => mapped.GroupBy(result => result.Key, result => result.Value, groupRule);
            return collectOutput(inputs.SelectMany(map)).SelectMany(reduce);
        }
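
Before moving on, a quick sanity check: the canonical word count, expressed against MapReduce6.  The input is made up, and the reduce is idempotent because summing a list of sums changes nothing.

var lines = new[] { "the cat sat on the mat", "the cat" };

var counts = MapReduce6<string, int, string>(
    // map: every word becomes a (word, 1) tuple
    line => line.Split(' ')
                .Select(word => new Tuple<string, int> { Key = word, Value = 1 }),
    // reduce: sum the values for each key
    group => new[] { new Tuple<string, int> { Key = group.Key, Value = group.Sum() } },
    EqualityComparer<string>.Default,
    lines);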

Now, Hadoop goes one stage further by insisting that the inputs also be tuples.  It then has a file handling system for generating those tuples from files.  Let us just, for the moment, observe that actually generating the list of inputs may be an expensive operation in itself.  So, we need to be able to deal with batches of inputs.  We’ll leave that problem until next time.

When will I ever learn?

So, I just noticed before deploying the latest version of a system that one of the drop downs wasn’t populating on the front end.  Now, the front end is far from perfect, but this part of the code actually has pretty good test coverage.  So it was a bit puzzling as to why this hadn’t been flagged by the build.  Diving into some controller tests I knocked together about nine months ago, I find the following line.

            IgnoreStaticData(view);

Need I say more?

It seems like I need to keep re-learning the lesson: anything that you’re not testing is wrong.

The LINQ vs Generators Shootout, Round #1

I have to admit, Python is growing on me.  I’m still not entirely convinced of the utility of IronPython, especially given that Boo exists (why don’t more scripting languages allow you to meddle with compilation?).  However, Python as CPython and Jython are actually rather interesting beasts, with some very cool stuff being done on them (I really like the look of Pylons, for instance, I’ll probably write something up on that in the future.)

I thought I should probably expand on my remark that LINQ was more powerful than list comprehension.  It was pointed out to me that Python supports both lazy and greedy evaluation (the greedy form is called a list comprehension, the lazy form a generator).  LINQ is purely lazy, although adding “ToList” onto the end will typically deal with that if it is a problem (and it would be if you used it naively).

So, how is LINQ a better form of list comprehension?  Four reasons:

  • It’s implemented as a data model, allowing stuff such as LINQ to NHibernate to exist.
  • It supports group by
  • It supports order by
  • It supports intermediate assignments through let and into

The first is probably the most technically impressive, but it’s also the most controversial.  It means that LINQ is much more than just a list comprehension system, but no-one’s got enough experience of it yet to know exactly how these features are best used.

The grouping is cool, although I have to admit I’ve rarely needed it.  The ordering, on the other hand, is huge.  Python’s model for sorting is the IComparable model in C#.  If you’ve ever tried to sort by three keys, you’ll know the problems with it.  In contrast, you can just specify the keys and let LINQ sort it out for you. 
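
For instance, sorting by three keys (the property names are invented) is just:

var sorted = people.OrderBy(p => p.LastName)
                   .ThenBy(p => p.FirstName)
                   .ThenByDescending(p => p.DateOfBirth);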

The final one is probably the most useful of the lot, even if it seems minor.  Take a look at the following code:

public static string Deduce7ZipDirectory(IEnumerable<string> output)
{
    var regex = new Regex(@"\s(?<Folder>[^\s\\]+[^:])[\\]");
    var result = (from value in output
                  let match = regex.Match(value)
                  where match != Match.Empty
                  select match.Groups["Folder"].Value)
        .FirstOrDefault();
    return result;
}

I actually wrote this code to parse the output of 7zip’s command line list function and I think it’s pretty elegantly declarative.  I’m not entirely happy with the debugging story, however.  You can put breakpoints within the LINQ statement, but seeing the local variables doesn’t seem to work for me.  Ironically, this is a bigger problem for C# than it is for JavaScript or Python, simply because it’s possible to write rather complex things in these statements.

Personal Note

I made the mistake of posting shortly before I went on holiday for several weeks.  I’d like to thank everyone who commented; I learnt a lot.  Amongst the things I learned was that I really need to get around to writing an “About Me” page, mostly because of my aversion to posting noise rather than signal.  For the record, my name is Julian Birch and I live in London.