Writing a more C#-ish Intersection Function

Okay, first off, this isn’t a Ruby versus C# post.  I actually find Ruby quite interesting and want to learn more: that’s why I’m reading posts about it.  It’s a post about how the language idioms lead you to different code designs.  This is following on from Alan Skorkin’s post about writing an intersection function in a Ruby idiom.  So I thought I’d do the equivalent for Alt.Net programmers.  Neither is massively useful code: both languages already have perfectly good intersection functions in them.  This is just a bit of fun (as was Alan’s post).

First let’s talk about the problem space.  The basic problem is to get the intersection of two pre-sorted lists with distinct elements.  The first interesting difference is the language: when asked for a list, the natural primitive in Ruby is an array.  In C#, the natural primitive is IEnumerable<T>.  Then there’s the question of sorting.  For Ruby, the natural answer: whatever “<” says the sorting is.  For C#, the natural thing is to introduce a Comparison<T> function that defines the ordering.

Implementing Each Continued

C# is, of course, statically typed, so monkey patching tricks aren’t going to work.  However, we have static extension methods which can be used for similar effects.  So, here’s my implementation of “Each_Continued”

public static IEnumerable<TValue> AsContinuable<TValue>(this IEnumerable<TValue> list)
{
    return new ContinuableEnumerable<TValue>(list);
}

It’s at times like this that I miss the ability to declare anonymous types.  However, one of the big changes is obvious here.  Rather than have a dedicated function that’s added into the class I’m trying to use, I’m just slapping a decorator on top of it.  This prevents any problems with two users of the each_continued function since the scopes will be different.

Now, IEnumerable<T> doesn’t have a concept of indexing (IList<T> does, but requiring that wouldn’t be very idiomatic).  As a consequence, Alan’s trick of feeding back the next index isn’t going to work for me.  Instead, I’ll use a simpler approach: whenever you restart the inner loop, you’ll start on the same element as last time.  I could fix this and make it more efficient, but I’m going for idiomatic here.

public class ContinuableEnumerable<TValue> : IEnumerable<TValue> {
    private IEnumerable<TValue> underlying;
    private IEnumerator<TValue> enumerator;
    TValue current;

    public ContinuableEnumerable(IEnumerable<TValue> underlying)
    {
        this.underlying = underlying;
        enumerator = null;
    }

    public IEnumerator<TValue> GetEnumerator()
    {
        if (enumerator == null)
        {
            // First time, start everything off
            enumerator = underlying.GetEnumerator();
            
        } else
        {
            // Later time, restart from the current position
            yield return current;
        }
        while (enumerator.MoveNext())
        {
            current = enumerator.Current;
            yield return current;
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

Okay, the code is inarguably longer, even if you ignore the curly-brace-induced vspace.  The support for non-generic enumerations is enforced by the type system, despite the fact we never intend to use it.  The necessity to declare, then initialise variables makes it longer than the Ruby equivalent.  However, I’d argue that the main implementation is concise and clear.

Implementing The Intersection Function

Ruby and C# both have generators, a for/foreach loop and function for iterating (Select in C#, each in Ruby).  However, since I’ve decided to go for a pure IEnumerable<T> approach, I really want to use foreach for the main loop.  The implementation of ContinuableEnumerable has been drive by that.

public static IEnumerable<TValue> PreSortedIntersection<TValue>(this IEnumerable<TValue> list1, IEnumerable<TValue> list2,
    Comparison<TValue> comparison
    )
{
    var continuable = list2.AsContinuable();
    foreach (var v1 in list1)
    {
        foreach (var v2 in continuable)
        {
            var comparisonResult = comparison(v1, v2);
            if (comparisonResult == 0)
            {
                yield return v1;
            }
            if (comparisonResult <= 0)
            {
                break;
            }
        }
    }
}

Personally, I find this function much easier to understand than the Ruby version, which operated through an interaction between the calling function and each_continue.  (You could, however, object that if you made it do the same thing as the Ruby version, it would be more complicated.)

Obviously, that’s fine in the general case, but what if you wanted to perform an intersection on numbers or dates?  You shouldn’t really have to define a comparison function every time.  So we add

public static IEnumerable<TValue> PreSortedIntersection<TValue>(this IEnumerable<TValue> list1, IEnumerable<TValue> list2)
    where TValue : IComparable<TValue>
{
    return PreSortedIntersection(list1, list2, (x, y) => x.CompareTo(y));
}

Which will, of course, work on any class that defines a natural order, including integers and dates.  Since you can’t polymorphically overload operators, IComparable<TValue> is the nearest C# has to Ruby’s “<“.  We can then write code like this:

var list1 = new[] { 1, 2, 3, 6, 7, 8 };
var list2 = new[] { 3, 4, 5, 6, 8, 9, 10 };
Write(list1.PreSortedIntersection(list2));

Finally, there’s an extremely large but invisible elephant in the room.  The ruby function evaluated eagerly, the C# version is lazy by default.  Again, this is idiomatic: we nearly always return lazy lists and call the extension method “ToList()” if we want them evaluated eagerly.  You could, of course, implement something much closer to the Ruby implementation if you used lists and anonymous delegates, but it wouldn’t be as idiomatic.  What you couldn’t do is to start throwing LINQ at the problem: I don’t think there’s a succinct way of expressing it using LINQ that has the right complexity.

Way Too Complicated

The only problem with all of this: Alan’s original implementation was, in many ways, clearer.  It was undoubtedly more efficient.  Ironically, removing the index variables and going straight to the C# enumeration interface gives you a very elegant implementation:

public static IEnumerable<TValue> PreSortedIntersection2<TValue>(this IEnumerable<TValue> list1, 
    IEnumerable<TValue> list2, Comparison<TValue> comparison
) {
    var enum1 = list1.GetEnumerator();
    var enum2 = list2.GetEnumerator();
    while (enum1.MoveNext() && enum2.MoveNext())
    {
        int comparisonResult;
        while (0 != (comparisonResult = comparison(enum1.Current, enum2.Current)))
        {
            if (!(comparisonResult < 0 ? enum1 : enum2).MoveNext())
            {
                yield break;
            }
        }
        yield return enum1.Current;
    }
}

C# and Ruby are both great languages, and closer together than many think, but it’s interesting to see how the differences send you down very different coding paths. 

P.S. I’ll explain the MapReduce tag in a later post…

Technorati Tags: ,,

Published by

Julian Birch

Full time dad, does a bit of coding on the side.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s