Writing a more C#-ish Intersection Function

Okay, first off, this isn’t a Ruby versus C# post.  I actually find Ruby quite interesting and want to learn more: that’s why I’m reading posts about it.  It’s a post about how the language idioms lead you to different code designs.  This is following on from Alan Skorkin’s post about writing an intersection function in a Ruby idiom.  So I thought I’d do the equivalent for Alt.Net programmers.  Neither is massively useful code: both languages already have perfectly good intersection functions in them.  This is just a bit of fun (as was Alan’s post).

First let’s talk about the problem space.  The basic problem is to get the intersection of two pre-sorted lists with distinct elements.  The first interesting difference is the language: when asked for a list, the natural primitive in Ruby is an array.  In C#, the natural primitive is IEnumerable<T>.  Then there’s the question of sorting.  For Ruby, the natural answer: whatever “<” says the sorting is.  For C#, the natural thing is to introduce a Comparison<T> function that defines the ordering.

Implementing Each Continued

C# is, of course, statically typed, so monkey patching tricks aren’t going to work.  However, we have static extension methods which can be used for similar effects.  So, here’s my implementation of “Each_Continued”

public static IEnumerable<TValue> AsContinuable<TValue>(this IEnumerable<TValue> list)
{
    return new ContinuableEnumerable<TValue>(list);
}

It’s at times like this that I miss the ability to declare anonymous types.  However, one of the big changes is obvious here.  Rather than have a dedicated function that’s added into the class I’m trying to use, I’m just slapping a decorator on top of it.  This prevents any problems with two users of the each_continued function since the scopes will be different.

Now, IEnumerable<T> doesn’t have a concept of indexing (IList<T> does, but requiring that wouldn’t be very idiomatic).  As a consequence, Alan’s trick of feeding back the next index isn’t going to work for me.  Instead, I’ll use a simpler approach: whenever you restart the inner loop, you’ll start on the same element as last time.  I could fix this and make it more efficient, but I’m going for idiomatic here.

public class ContinuableEnumerable<TValue> : IEnumerable<TValue> {
    private IEnumerable<TValue> underlying;
    private IEnumerator<TValue> enumerator;
    TValue current;

    public ContinuableEnumerable(IEnumerable<TValue> underlying)
    {
        this.underlying = underlying;
        enumerator = null;
    }

    public IEnumerator<TValue> GetEnumerator()
    {
        if (enumerator == null)
        {
            // First time, start everything off
            enumerator = underlying.GetEnumerator();
            
        } else
        {
            // Later time, restart from the current position
            yield return current;
        }
        while (enumerator.MoveNext())
        {
            current = enumerator.Current;
            yield return current;
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

Okay, the code is inarguably longer, even if you ignore the curly-brace-induced vspace.  The support for non-generic enumerations is enforced by the type system, despite the fact we never intend to use it.  The necessity to declare, then initialise variables makes it longer than the Ruby equivalent.  However, I’d argue that the main implementation is concise and clear.

Implementing The Intersection Function

Ruby and C# both have generators, a for/foreach loop and function for iterating (Select in C#, each in Ruby).  However, since I’ve decided to go for a pure IEnumerable<T> approach, I really want to use foreach for the main loop.  The implementation of ContinuableEnumerable has been drive by that.

public static IEnumerable<TValue> PreSortedIntersection<TValue>(this IEnumerable<TValue> list1, IEnumerable<TValue> list2,
    Comparison<TValue> comparison
    )
{
    var continuable = list2.AsContinuable();
    foreach (var v1 in list1)
    {
        foreach (var v2 in continuable)
        {
            var comparisonResult = comparison(v1, v2);
            if (comparisonResult == 0)
            {
                yield return v1;
            }
            if (comparisonResult <= 0)
            {
                break;
            }
        }
    }
}

Personally, I find this function much easier to understand than the Ruby version, which operated through an interaction between the calling function and each_continue.  (You could, however, object that if you made it do the same thing as the Ruby version, it would be more complicated.)

Obviously, that’s fine in the general case, but what if you wanted to perform an intersection on numbers or dates?  You shouldn’t really have to define a comparison function every time.  So we add

public static IEnumerable<TValue> PreSortedIntersection<TValue>(this IEnumerable<TValue> list1, IEnumerable<TValue> list2)
    where TValue : IComparable<TValue>
{
    return PreSortedIntersection(list1, list2, (x, y) => x.CompareTo(y));
}

Which will, of course, work on any class that defines a natural order, including integers and dates.  Since you can’t polymorphically overload operators, IComparable<TValue> is the nearest C# has to Ruby’s “<“.  We can then write code like this:

var list1 = new[] { 1, 2, 3, 6, 7, 8 };
var list2 = new[] { 3, 4, 5, 6, 8, 9, 10 };
Write(list1.PreSortedIntersection(list2));

Finally, there’s an extremely large but invisible elephant in the room.  The ruby function evaluated eagerly, the C# version is lazy by default.  Again, this is idiomatic: we nearly always return lazy lists and call the extension method “ToList()” if we want them evaluated eagerly.  You could, of course, implement something much closer to the Ruby implementation if you used lists and anonymous delegates, but it wouldn’t be as idiomatic.  What you couldn’t do is to start throwing LINQ at the problem: I don’t think there’s a succinct way of expressing it using LINQ that has the right complexity.

Way Too Complicated

The only problem with all of this: Alan’s original implementation was, in many ways, clearer.  It was undoubtedly more efficient.  Ironically, removing the index variables and going straight to the C# enumeration interface gives you a very elegant implementation:

public static IEnumerable<TValue> PreSortedIntersection2<TValue>(this IEnumerable<TValue> list1, 
    IEnumerable<TValue> list2, Comparison<TValue> comparison
) {
    var enum1 = list1.GetEnumerator();
    var enum2 = list2.GetEnumerator();
    while (enum1.MoveNext() && enum2.MoveNext())
    {
        int comparisonResult;
        while (0 != (comparisonResult = comparison(enum1.Current, enum2.Current)))
        {
            if (!(comparisonResult < 0 ? enum1 : enum2).MoveNext())
            {
                yield break;
            }
        }
        yield return enum1.Current;
    }
}

C# and Ruby are both great languages, and closer together than many think, but it’s interesting to see how the differences send you down very different coding paths. 

P.S. I’ll explain the MapReduce tag in a later post…

Technorati Tags: ,,

Technical Debt isn’t Like Borrowing From The Bank

I had an issue with a legacy component today.  It’s pretty much a poster child for technical debt, ill-understood, no documented procedures, precious little logging and everyone avoids touching it in case it breaks.  It’s also using an outdated architecture and the code is frankly bad.  In fact, the debt’s so bad… I’ve been putting off paying it off.  For the most part it doesn’t bother me and the inconveniences are mild.

raining_stones

The inconvenience today wasn’t.  Actually, I got really lucky.  Some messing around with IP addresses fixed it.  But I don’t know why it broke, and that bothers me, because next time I won’t be as lucky.

Borrowing from the bank is a relatively safe occupation.  You agree a sum, they check you can repay, you agree an interest rate and you agree a schedule of repayments.  You keep up the payments and eventually, the debt is paid off.

Technical debt isn’t like borrowing from the bank.  It’s like going to a loan shark.  You can borrow any amount, irrespective of your ability to repay.  The interest rate is high, and subject to change.  You don’t even need to pay it off all of the time.  In fact, it’s all smiles until the he comes round to break your legs.

If you’re lucky, you’ll get some warning.  If you’re not, the first thing you know is when someone’s banging on your door telling you it’s time to pay up.  And you’d better pay up there and then because the consequences of not doing so are pretty severe.  And that’s the really pernicious thing.  Technical debt often feels like a small problem, but it mounts up all of the time.  You can put off repaying it most of the time, so it doesn’t feel that big.  Until one day a something breaks and there’s no sensible way of fixing it.  So pay it off before the payment becomes due.

Technorati Tags:

The Composite Pattern Revisited

The composite pattern has been dismissed as being tricky and slightly special case.  Like the visitor pattern, we tend to associate it with dealing with trees* and such relatively obscure topics as semantic evaluation. I’m going to argue that in fact the opposite is the case: it’s a really easy to use thing that you probably see all the time and don’t think about too much.

Actually, it’s closest to the adapter pattern.  But while the adapter pattern is about jamming a square peg into a round hole, the composite pattern is about slipping two round pegs into the same round hole.  Of course, the physical realization of this would be impossible, but code’s a lot more flexible than IKEA furniture.

So, how do we achieve this?  Well, imagine a simple interface like this

public interface ILogger {
    void WriteLine(string line);
}

and two implementations

public class ConsoleLogger : ILogger {
    public void WriteLine(string line) {
        Console.WriteLine(line);
    }
}

// Yeah, the implementation of FileLogger is awful
// you don't need to tell me
public class FileLogger : ILogger {
    string fileName;

    public FileLogger(string fileName) {
        this.fileName = fileName;
    }

    public void WriteLine(string line) {
        using (var writer = new StreamWriter(fileName, true)) {
            writer.WriteLine(line);
            writer.Flush();
        }
    }
}

So, imagine we now want to write to the console and to the file.  Or to two different files.  Do we need to write a new logger each of these cases?  No, we just need the one, the Composite logger.

public class CompositeLogger : ILogger {
    IEnumerable<ILogger> underlying;

    public CompositeLogger(IEnumerable<ILogger> underlying) {
        this.underlying = underlying;
    }

    public void WriteLine(string line) {
        foreach (var logger in underlying) {
            logger.WriteLine(line);
        }
    }
}

This class gives us all the aggregation we need and allows us to easily implement the single responsibility principle.  A class taking one logger can now write to two loggers without changing the code.  Incidentally, if you look at the Solution Transform source code, you’ll see that I have two composite classes that work exactly as defined above.

Aggregation of Results

That’s the basic idea.  However, my example rather deliberately skirted an important detail: what happens if your interface returns values?**  The answer is, it depends.  We’re starting to move away from the classical definition Let’s take an example from the dotcom world.  Imagine that you were running a shopfront application.  You’ve got several suppliers with price providers.  But you only want the one price.  Well, the sensible thing to do would be to quote the lowest price.  So, on top of IPriceProvider you could have MinimumPriceProvider as a composite implementation. In fact, most aggregate functions could be used in various contexts:

  • Minimum
  • Maximum
  • Sum
  • Product
  • First (or First not null)
  • And (True if all of the underlying implementations return true)***
  • Or (True if one of the underlying implementations return true)

The price example is oversimplified, however.  Usually, you don’t only want to know the lowest price, you want to know who is quoting the lowest price.  The composite pattern doesn’t really apply here.  Actually, we’re probably talking about chain of responsibility again.  Composite objects are dumb.  If you’re talking logic and decision making, composite isn’t the right design.

Composites in Use

There’s a couple of composite implementations that you never even think about.  Any logging system is an obvious example, but MulticastDelegates are composites as well: they expose the method signature and call multiple implementations when invoked.  It’s also a fundamental building block of any publish and subscribe system.  Jeremy Miller’s got a list of common composites but, inevitably, most of them are tree structures.  Obviously, you start putting composites in other composites and you get a tree structure, but there are plenty of flatter problems that find composites useful.

So, when should you use it?  Basically. anytime you process a list of objects, revisit whether your code would be simpler if you just dealt with one object, and used a composite to for aggregation/distribution duties.

*Please bear in mind that I’m referring to that JavaWorld article as an example of someone making a mistake.  I don’t really want to dissect exactly what’s wrong with the rest of the article, but I’m not recommending it.

**In fact, I don’t think you can return values according to the classical definition.  It’s hard to find an example on the internet that does.  I’m sure someone with a copy of Gamma et al handy will correct me.

***Take a look at the Rhino Mocks constraints code for a good example of this.

Technorati Tags: ,

CSS Keeps Getting Better

I’ve knocked together a quick demonstration of what CSS can do these days.  Take a look (I’ll wait).

Ultimately, this is how this sort of styling should be done on the web. Importantly, I’m not a web designer and I haven’t used any “tricks” to deliver this. This is all hand-coded from easy to read articles on the web.  You can do this yourself without very much work; cutting and pasting the code on this page would be a good start.

The weakness is the anti-aliasing (Firefox does this slightly better than Chrome). If you had a single colour background and no transparency, you could fix that with the excellent cornerz plug-in. I’m not sure what other approaches you could do here. There’s a Javascript PNG generator that could be used to create corner images, but it would be very hard to be compatible with both the gradient and the background image. There’s also some nasty issues with the compatibility of background images and background effects: sometimes the wrong one shows on top. Finally, the opacity isn’t all it could be: there’s no easy way of getting the text fully opaque and getting the gradient bleed through at the same time.

As with most things CSS, it’s all a bit more complex than you think it should be. But it does work. Here’s hoping browsers improve their anti-aliasing and render the rest of this discussion moot.

Technorati Tags: ,,,,,

Notes on dotLess

I’ve been working on a restructuring of this blog and decided it was about time I entered the modern age.  I’ve always pretty much developed CSS by hand in Notepad.  This is, to say the least, a frustrating and time-consuming experience.  Now I’d heard of dotLess from Daniel Hölbling’s blog and thought it was time to give it a go.  (As an aside, I know that dotLess is a ruby port.  Many of these issues may well also apply to the original codebase.)

In summary, dotLess is good and I’m enjoying using it, but it could be great.  Let’s start with usability.  If you fail to terminate a variable, it doesn’t tell you.  It simply keeps going until it finds an appropriate termination.  No warning that you’re generating a variable name with >8000 characters and multiple newlines in its name or value.  It has issues with unterminated CSS directives as well.

Then there’s what happens if you refer to a mixin or variable with the wrong name.  It crashes, which isn’t too bad.  However, it doesn’t tell you what name it couldn’t find or where in the file it was processing when it failed.  Again, this just makes more work for the user, and wouldn’t be very hard to fix.  It’s also a pity the –watch command line parameter wasn’t working in the version I downloaded.

Current Selectors Nested Rules

I’ll admit, I wish I’d thought of the whole nested rules concept.  It’s simple, it’s powerful and it encourages good CSS design.  However, I can’t help feeling the whole thing could be even more powerful.

In particular, nested rules currently only handle elements within the declaring element.  Wouldn’t it be great if we could say the following easily?

  • The text for this element is red, including any links in the text.
  • The left margin on a UL is 0px, except in IE7 where it needs to be 8px.
  • The post’s heading should be red, unless it’s within a digest where it should be blue.

Basically, if we could have a syntax for “current selector”, we’d be able to do really cool things and make the dotLess files even smaller and more readable.  Here’s a proposal: have ampersand (“&”) represent the current selector in a nested rule.  You can keep the existing syntax as well: there’s no conflict between the two.  But now you can say things like this:

.redText { 
    &, & a, & a:hover { color : red; }
}

UL {
    margin-left : 0px;
    *:first-child+html & { margin-left : 8px }
}

.post h1 {
    color : red;
    .digest & { color : blue; }
}

This isn’t my idea, it’s already fully implemented in sass and less.js.  It’s an awful lot simpler than turning dotLess into progamming language, and solves most of the problems that people trying to solve by introducing programming language features.  This is part of a much longer rant: JavaScript already exists, Modernizr already exists.  dotLess is a useful tool, but duplicating functionality is going to be a waste of time.

CSS3 Problems

Using modern features and, in particular, unadopted features causes a few problems.  There’s two main issues here: if you define multiple values for a property, it just uses the last one.*  This is fine, except that web designers sometimes do this in the knowledge that different browsers will read different lines.

For instance,

h1 {
    background: -moz-linear-gradient(top, @primary, @primaryDark);
    background: -webkit-gradient(linear, left top, left bottom, from(@primary), to(@primaryDark));
}

Will generate a file containing only the webkit specific instruction.  Historically, that hasn’t been such an issue, but it’s becoming one at the moment.  Also, there’s another reason this code doesn’t work: dotLess variables don’t like being in a larger CSS expression, so in practice this code doesn’t even work for webkit browsers.

Use it

All in all, I’m going to continue to use it.  The operations stuff is incredible: I can actually do things like say “It needs to be the colour which at 80% opacity will merge with the background to make navy blue.”  Expressing stuff like the holy grail layout is relatively easy as well (never thought I’d say that, it’s normally a complete pain). 

*All of this was from the 1.0 beta download in January.  I’m not the first to spot this issue and for all I know it’s already fixed in source.

Technorati Tags: ,

Fail Early, Fail Partially

I seem to be starting a series of posts on good advice gone bad.  KISS is a great piece of advice, Fail Early is another.  Like KISS, it’s advice that scales: it’s as appropriate to function design (don’t return spurious default values on failure) as it is to project management (find out that something’s gone wrong now, not in three months time).  The basic idea is the same: provide instant feedback that something is wrong.  Don’t let the problem grow.

On the other hand, there’s nothing in “Fail Early” that says that you need to completely fail.  I once was advising on a store-front application.  It took the prices and products from various third-party suppliers.  However, if there was anything wrong with the data, the whole feed would fail.

Now, it certainly “Failed Early”.  But it didn’t observe the spirit of the phrase.  For one thing, it didn’t actually tell anyone.  It waited for someone to notice that the prices were wrong on the live site.  It’s definitely feedback, but really the sort you want.  The worst of it, though, was the “all or nothing” approach to failure.  Now, sometimes you’ll make a decision to do it this way, but typically this will be because data is inconsistent if you don’t.  A price list is not one of those cases.  99% of prices could typically be processed completely successfully.

So, Fail Early.  But don’t Fail Catastrophically.

Technorati Tags:

Top 10 Reasons You Should Write Documentation

10. Because C# isn’t your first language.

9. Because your mate on the desk next to you isn’t going to be there for ever.

8. Because activity is obvious, purpose is not.

7. Because you need to take a break from talking jargon all the time.

6. Because it may not reduce the number of stupid phone calls you get, but at least it’ll shorten them when you tell them the URL where you’ve already answered the question.

5. Because it keeps you honest: if it’s hard to explain, it’s probably wrong.

4. Because you’re not going to remember how it works in five years time.

3. Because it makes it easier to spot holes in your use cases.

2. Because the internet isn’t full of source code reading savants.

1. Because one day, someone might thank you for it.

And yes, for the record, I have been writing documentation recently.

Technorati Tags:

More Solution Transforms

I’ve finally managed to get enough time to knock out a release of Solution Transform.  The original Silverlight transform code is unchanged, but there’s a lot of new features.  It’s now usable for quite a number of build tasks.  I’ve even actually written a documentation page on how to use it.  if feedback is positive, I’ll actually make a binary distribution of the tool.

Basically, it now allows you to script conversion to Silverlight, adding and removing of projects, rebasing of assemblies, renaming of projects, and merging and demerging of solutions.  When I next get to it, I’m planning on enabling parallel VS2010 and VS2008 work. 

A note about the source code repository, the Castle Project as a whole is currently moving to github, and Contrib will no longer be hosted directly by Castle.  I’ve moved Solution Transform already because it was actually pretty easy for me to do, having no history that was worth keeping.

Technorati Tags:

Syntactic Macros in Boo

I recently had occasion to use Boo’s macro feature in anger.  In particular, I wanted to build something like “with” in JavaScript.  Those who aren’t familiar with Boo’s macros might think this couldn’t be that hard.  I mean, there’s an example that does this right on the page, isn’t there?

Ironically, the example actually demonstrates the limitations of boo’s macros, rather than their power.  Let’s start with the requirement to prepend method names with an underscore.  It shouldn’t be too hard to make it use a dot just like VB, should it?  Well, no, an expression that began with a dot wouldn’t be syntactically valid.  So here we have the first major restriction: you can’t create new syntactic structures, you can only abuse the ones that are already there.

That isn’t what I wanted to do, however, I wanted JavaScript’s with statement.  I didn’t want to prepend anything to the method name, I just wanted to use the method name and have it resolve to the instance’s method.  Shouldn’t be too hard, just look up the instance’s type’s method names and rewrite the appropriate expressions.  Only you can’t do that: they’re called “syntactic” macros for a reason: there’s no semantic information, including type, available at this point.  You’ll note that this has a knock-on effect on the using macro.  The check that the subject of the macro implements IDisposable is at runtime.  In C#’, it’s at compile time.  Probably not the end of the world, but I think people would be surprised if “using (3) {}” compiled in C#.*

There are other ways to achieve the effect, you could push delegates into the execution scope, but supporting overloading would be hairy in the least.  You could do an import, but that will only work with static methods.  In general terms, the syntactic macros don’t give you enough unless you’re prepared to be really ugly and the static type system takes away flexibility you’d have had in a proper dynamic language like JavaScript, Python or Ruby.

Now, I’ve had some reservations about the conceptual value of macros anyway, and this has firmed up my opinions.  Macros are only useful if they’re relatively easy for the user to understand intuitively what they do.  It’s interesting to look at Binsor here.  For the most part it presents you with a view of the world that suggests the mapping from the DSL to Castle Windsor is simple.  However, once you start digging under the hood, it gets unpredictable.  To this day, I have no idea how to configure a facility; I’m using code ripped off the internet for each one.  I remember it took me a day to figure out how to set a config value on a component.  Ultimately, it’s a pretty big code base for something that can be 90% replaced with 5 lines in a blog post.  (Not 100% yet, though.  I’ll let you know when I’ve done that.  🙂 )

Obviously, all tools have advantages and disadvantages.  But with Boo, I’m just not sure it’s got an advantage over IronPython I care about.  I eventually ended up adopting pretty much the “with” code in the sample, because I don’t believe it can be usefully improved. 

Technorati Tags: Boo,Binsor,DSLs