LazyWeak: A Low Memory Implementation of Lazy

Lazy<T> is a new class in .NET 4, but I imagine most developers have been running their own versions for years.  It’s used to defer fetching expensive resources until they’re needed.  However, the fundamental assumption is that it’s the fetching of T that is expensive.  Assume, instead, that the cost of fetching or calculating T isn’t your main concern, but the cost of keeping it in memory is.  For this, you want something with slightly different characteristics:

  • T is computed when needed.
  • T is cached.
  • T is thrown away if memory is at a premium.
  • T can be recreated if necessary.
  • The external API should be the same as Lazy<T>’s.

Sadly, the last isn’t quite possible, since Microsoft didn’t create an ILazy<T> while they were there.  (One day someone at Microsoft will read this and figure out where they’ve been going wrong.)  So, this is as close as I can manage:

using System;

/// <summary>
/// Like Lazy, only it can recreate the object on demand
/// </summary>
/// <typeparam name="T">The type of the cached value.</typeparam>
public class LazyWeak<T> 
{
    // Sentinel meaning "no value has been supplied yet".
    private static readonly object __noObject = new object();

    private readonly Func<T> _factory;
    private readonly WeakReference _reference;

    public LazyWeak(Func<T> factory, T initial) {
        _factory = factory;
        _reference = new WeakReference(initial);
    }

    public LazyWeak(Func<T> factory) 
    {
        _factory = factory;
        _reference = new WeakReference(__noObject);
    }

    public bool IsValueCreated
    {
        get
        {
            // Read Target once: checking IsAlive and then reading Target
            // races with the garbage collector.
            var target = _reference.Target;
            return target != null && !ReferenceEquals(target, __noObject);
        }
    }

    public T Value
    {
        get
        {
            // Holding a strong reference keeps the object alive while we use it.
            var result = _reference.Target;
            if (result == null || ReferenceEquals(result, __noObject))
            {
                // Never created, or collected since: rebuild it.
                _reference.Target = result = _factory();
            }
            return (T)result;
        }
    }
}

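You can create the LazyWeak<T> with an initial value.  This makes no sense in the case of Lazy<T>, but can be pretty useful here.  I’ve also aimed to make the code thread-safe, in the sense that callers always get a valid value.  There’s no locking, though, so under contention the factory may run more than once and should be free of side effects.  If you spot a bug, let me know.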
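If you need the factory never to run on two threads at once, one option is to serialise the rebuild with a lock and re-check the weak reference inside it (double-checked locking). The following is a minimal sketch of that variant, not the class above; the name `LockedLazyWeak<T>` is hypothetical, and for brevity it drops the initial-value constructor and `IsValueCreated`.

```csharp
using System;

/// <summary>
/// Hypothetical variant of LazyWeak<T> that serialises calls to the factory,
/// so concurrent readers never rebuild the value at the same time.
/// </summary>
public class LockedLazyWeak<T>
{
    private readonly Func<T> _factory;
    private readonly WeakReference _reference = new WeakReference(null);
    private readonly object _sync = new object();

    public LockedLazyWeak(Func<T> factory)
    {
        if (factory == null) throw new ArgumentNullException("factory");
        _factory = factory;
    }

    public T Value
    {
        get
        {
            // Fast path: take a strong reference so the GC can't collect
            // the object between the check and the return.
            object result = _reference.Target;
            if (result != null) return (T)result;

            lock (_sync)
            {
                // Re-check: another thread may have rebuilt it while we waited.
                result = _reference.Target;
                if (result == null)
                {
                    result = _factory();
                    _reference.Target = result;
                }
                return (T)result;
            }
        }
    }
}
```

The cost is that readers who find the value collected queue up behind whichever thread is rebuilding it, which is usually what you want when the factory is expensive.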

Finally, a quick test:

using System;

public class LazyTest
{
    static void Main()
    {
        int calls = 0;
        Func<int> factory = () => { calls++; return 7; };
        var lazy = new LazyWeak<int>(factory, 7);
        Console.WriteLine("{0}:{1}", lazy.Value, calls);
        Console.WriteLine("{0}:{1}", lazy.Value, calls);
        Console.WriteLine("GC");
        GC.Collect();
        // The boxed 7 is only weakly referenced, so it can be collected and rebuilt.
        Console.WriteLine("{0}:{1}", lazy.Value, calls);
    }
}

There are lots of advantages to relying on the garbage collector, but you lose control over what’s going on.

So, what’s it good for?  Well, it uses less memory than just storing T or Lazy<T>, and it’s faster than repeatedly calling Func<T>.  So, it falls between those two, which may be the performance sweet spot for some problem in your application.  On the other hand, it’s fairly important that T is immutable.  There’s nothing stopping you changing T, but your changes would be lost whenever the garbage collector reclaimed it and the factory built a fresh copy.

It’s worth mentioning that this is one of many partial solutions to the long queue problem: put LazyWeak<T> on the queue instead of T.  This lets you keep just the keys in memory and leave the values persisted elsewhere.  If you want to get the keys out of memory as well, you want a proper external queue.
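To make the queue idea concrete, here is a minimal sketch of a queue that holds LazyWeak<T> handles rather than the values themselves. It assumes each value can be reloaded from its key; `LongQueueSketch`, its `Enqueue` method and the `load` delegate are hypothetical names, with `load` standing in for whatever database or file read you’d really use. The first class is a condensed copy of LazyWeak<T> with only the parts this sketch needs, so the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Condensed copy of LazyWeak<T>: only the factory constructor and Value.
public class LazyWeak<T>
{
    private readonly Func<T> _factory;
    private readonly WeakReference _reference = new WeakReference(null);

    public LazyWeak(Func<T> factory) { _factory = factory; }

    public T Value
    {
        get
        {
            object result = _reference.Target;
            if (result == null)
            {
                // Not loaded yet, or collected since: reload it.
                result = _factory();
                _reference.Target = result;
            }
            return (T)result;
        }
    }
}

public static class LongQueueSketch
{
    // Queue lightweight handles instead of the (large) values themselves.
    // "load" is a hypothetical stand-in for reading from a database or disk.
    public static Queue<LazyWeak<TValue>> Enqueue<TKey, TValue>(
        IEnumerable<TKey> keys, Func<TKey, TValue> load)
    {
        var queue = new Queue<LazyWeak<TValue>>();
        foreach (var key in keys)
        {
            var captured = key; // copy so each lambda closes over its own key
            queue.Enqueue(new LazyWeak<TValue>(() => load(captured)));
        }
        return queue;
    }
}
```

Each handle keeps only its key and the loader alive, so a long queue costs roughly the size of its keys, and a value that gets collected before it’s dequeued is simply loaded again.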

Tweaking the Retlang Queue Channel

Okay, it’s time for one of my Friday code dumps.  I’ll warn you now: this post is ten pages long and I’m not really happy with it.  It’s probably going to continue to develop over time.  However, one of the things I like about Clojure is also something I like about Retlang: the code is short.  As developers, we know you can’t measure productivity, but we still tend to associate large amounts of code with large amounts of functionality, which in turn we associate with “better”.

Retlang, on the other hand, is full of short pieces of code that do powerful and useful things.  The QueueChannel is an example of this.  Conceptually similar to nServiceBus’s distributor, it’s a pure load balancing solution: messages get processed on the first available queue.  However, a hardware load balancer typically does a bit more than this.  In particular, users tend to get associated with particular servers, which allows local caching to be effective.  Good servers take advantage of this for performance, but don’t rely on it.

Partition and Sum

One of the standard tutorials for Retlang is a lock-free way of summing a large number of asynchronously produced numbers.  This basically works by single-threading the addition in the SummationSink.  Let’s complicate the example a bit:

  • You are receiving a large number of executions.
  • Executions have an order reference and a quantity.
  • You need a sum of the quantities per order.
  • For the sake of argument, imagine that adding these numbers takes significant time.

Now, ignoring the obvious PLINQ implementation, how would a reactive implementation of this problem look?

using System.Collections.Generic;
using System.Linq;
using System.Threading;

namespace LoadBalancer {
    class Execution {
        public int Quantity { get; set; }
        public int OrderId { get; set; }
    }

    // Not thread-safe: each instance is meant to be driven by a single fiber.
    public class SumExecutionsService {
        readonly Dictionary<int, int> sumByOrderId = new Dictionary<int, int>();

        internal void Add(Execution value) {
            int current; // 0 if this is the first execution for the order
            sumByOrderId.TryGetValue(value.OrderId, out current);
            sumByOrderId[value.OrderId] = current + value.Quantity;
            // Simulate an expensive addition.
            Thread.Sleep(value.Quantity);
        }

        public IEnumerable<int> OrderIds() {
            return sumByOrderId.Keys;
        }

        public long Sum() {
            // Widen before summing so large totals don't overflow int.
            return sumByOrderId.Values.Sum(v => (long)v);
        }
    }
}

Ripping off the code for the summation tutorial would give us a working example, except that it misses an important non-functional requirement: the actual summation is taking too long.  Using QueueChannel would make it faster, but you’d end up with orders being processed on multiple threads by multiple services.  (In this case you could patch things up afterwards, but let’s assume you’re doing something where the operations don’t commute.)

So, what you want is something where you can tell which service/fiber is handling which request and keep it there.

using System.Collections.Generic;
using Retlang.Core;

namespace LoadBalancer {
    public interface IBalancer<T> {
        IDisposingExecutor Appropriate(T value);
        void Assigned(T value, IDisposingExecutor fiber);
    }

    internal class NullBalancer<T> : IBalancer<T>
    {
        public IDisposingExecutor Appropriate(T value) {
            return null;
        }

        public void Assigned(T value, IDisposingExecutor fiber) {
            return;
        }
    }


    class ExecutionBalancer : IBalancer<Execution> {
        Dictionary<int, IDisposingExecutor> fibersByOrderId = new Dictionary<int, IDisposingExecutor>();

        public IDisposingExecutor Appropriate(Execution value) {
            IDisposingExecutor result;
            return fibersByOrderId.TryGetValue(value.OrderId, out result) ? result : null;
        }

        public void Assigned(Execution value, IDisposingExecutor fiber) {
            fibersByOrderId[value.OrderId] = fiber;
        }
    }
}

So, what we want now is an implementation of QueueChannel that takes a balancer as a constructor argument.  When it’s given a NullBalancer, its behaviour should be unchanged from the original implementation.

A Sticky Queue Channel

So, here’s the modified version of QueueChannel.  It’d be nice to be able to unpick it from the consumer, but at the moment they’re pretty tightly coupled.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using Retlang.Channels;
using Retlang.Core;

namespace LoadBalancer
{
    public class BalancingQueueChannel<T> : IQueueChannel<T> {
        private readonly IBalancer<T> _balancer;
        private readonly Queue<T> _globalQueue = new Queue<T>();
        private readonly Dictionary<IDisposingExecutor, Queue<T>> _localQueues = new Dictionary<IDisposingExecutor, Queue<T>>();
        private volatile AutoResetEvent waitHandle; // created by FlushAndWait, set by CheckQueueEmpty
        internal event Action SignalEvent;

        public BalancingQueueChannel(IBalancer<T> balancer) {
            _balancer = balancer;
        }

        public IUnsubscriber Subscribe(IDisposingExecutor executor, Action<T> onMessage) {
            lock (_localQueues) {
                if (!_localQueues.ContainsKey(executor)) {
                    _localQueues[executor] = new Queue<T>();
                }
            }
            var consumer = new BalancingQueueConsumer<T>(executor, onMessage, this);
            consumer.Subscribe();
            return consumer;
        }

        internal bool Pop(IDisposingExecutor destination, out T msg) {
            bool hasTransferredToLocalQueue = false;
            lock (_localQueues) {
                try {
                    var localQueue = _localQueues[destination];
                    if (localQueue.Count > 0) {
                        msg = localQueue.Dequeue();
                        return true;
                    }
                    // Not found in local queue, now it's time to process the global queue
                    lock (_globalQueue) {
                        while (_globalQueue.Count > 0) {
                            T candidateMessage = _globalQueue.Dequeue();
                            var fiber = _balancer.Appropriate(candidateMessage);
                            // Unassigned, or already ours: take it and record the assignment.
                            if (fiber == null || fiber == destination) {
                                _balancer.Assigned(candidateMessage, destination);
                                msg = candidateMessage;
                                return true;
                            }
                            // Owned by another fiber: park it on that fiber's local queue.
                            hasTransferredToLocalQueue = true;
                            _localQueues[fiber].Enqueue(candidateMessage);
                        }
                        msg = default(T);
                        return false;
                    }
                } finally {
                    if (!hasTransferredToLocalQueue) {
                        CheckQueueEmpty();
                    }
                }
            }
        }

        private void CheckQueueEmpty() {
            if (waitHandle != null && _localQueues.All(l => l.Value.Count == 0)) {
                lock (_globalQueue) {
                    if (_globalQueue.Count == 0) {
                        waitHandle.Set();
                    }
                }
            }
        }

        internal int Count() {
            lock (_localQueues) {
                lock (_globalQueue) {
                    return _globalQueue.Count + _localQueues.Sum(x => x.Value.Count);
                }
            }
        }

        internal int Count(IDisposingExecutor executor) {
            lock (_localQueues) {
                lock (_globalQueue) {
                    return _globalQueue.Count + _localQueues[executor].Count;
                }
            }
        }

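        public void FlushAndWait() {
            // Phase 1: wait until the global queue and every local queue are empty.
            waitHandle = new AutoResetEvent(false);
            lock (_localQueues) {
                CheckQueueEmpty();
            }
            waitHandle.WaitOne();
            // Phase 2: the last popped messages may still be running, so enqueue a
            // marker on each fiber and wait for all of them to reach it.
            // (WaitHandle.WaitAll accepts at most 64 handles.)
            AutoResetEvent[] queues;
            lock (_localQueues) {
                queues = _localQueues.Select(x => {
                    var handle = new AutoResetEvent(false);
                    x.Key.Enqueue(() => handle.Set());
                    return handle;
                }).ToArray();
            }
            WaitHandle.WaitAll(queues);
        }

        /// <summary>Adds a message to the global queue and signals every consumer.</summary>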
        public void Publish(T message) {
            lock (_globalQueue) {
                _globalQueue.Enqueue(message);
            }
            var onSignal = SignalEvent;
            if (onSignal != null) {
                onSignal();
            }
        }
    }

    internal class BalancingQueueConsumer<T> : IUnsubscriber {
        private bool _flushPending;
        private readonly IDisposingExecutor _target;
        private readonly Action<T> _callback;
        private readonly BalancingQueueChannel<T> _channel;

        public BalancingQueueConsumer(IDisposingExecutor target, Action<T> callback, BalancingQueueChannel<T> channel) {
            _target = target;
            _callback = callback;
            _channel = channel;
        }

        public void Signal() {
            lock (this) {
                if (_flushPending) {
                    return;
                }
                _target.Enqueue(ConsumeNext);
                _flushPending = true;
            }
        }

        private void ConsumeNext() {
            try {
                T msg;
                if (_channel.Pop(_target, out msg)) {
                    _callback(msg);
                }
            } finally {
                lock (this) {
                    if (_channel.Count(_target) == 0) {
                        _flushPending = false;
                    } else {
                        _target.Enqueue(ConsumeNext);
                    }
                }
            }
        }

        public void Dispose() {
            _channel.SignalEvent -= Signal;
        }

        internal void Subscribe() {
            _channel.SignalEvent += Signal;
        }
    }
}

You’ll notice the addition of FlushAndWait.  Retlang typically doesn’t have operations for checking that something is finished.  This reflects a real-time bias in the code: I imagine the original use cases simply shut down without clearing their queues.  However, it’s pretty useful functionality for testing purposes.  It’s also useful if you ever want to rebalance the queue: you’d have to stop processing first, then reset your balancer and continue.

Let’s Test It

using System;
using System.Linq;
using Retlang.Fibers;

namespace LoadBalancer {
    class Program {
        static void Main()
        {
            // Set up services
            var serviceFibers = Enumerable.Range(0, 16).Select(
                x => new { Service = new SumExecutionsService(), Fiber = new ThreadFiber() }
            ).ToList();
            var balancingQueue = new BalancingQueueChannel<Execution>(new ExecutionBalancer());
            foreach (var serviceFiber in serviceFibers)
            {
                serviceFiber.Fiber.Start();
                balancingQueue.Subscribe(serviceFiber.Fiber, serviceFiber.Service.Add);
            }

            // Create the orders
            var random = new Random();
            for (int i = 0; i < 25000; i++)
            {
                balancingQueue.Publish(new Execution {
                    OrderId = random.Next(1, 1000),
                    Quantity = random.Next(1, 10)
                });
            }

            // Wait for the end of the calculation
            balancingQueue.FlushAndWait();

            // Check the results
            var orderIds = serviceFibers.SelectMany(x => x.Service.OrderIds());
            Console.WriteLine("Order IDs were {0}handled by individual services.",
                orderIds.Count() == orderIds.Distinct().Count() ? "" : "NOT ");
            foreach (var service in serviceFibers)
            {
                Console.WriteLine(service.Service.Sum());
            }
        }
    }
}

So why doesn’t nServiceBus do this?

It’s fairly obvious that you could implement something like this in nServiceBus as well.  However, the nature of the application causes additional complications:

  • The balancer would need to be implemented on the distributor.  This isn’t really a problem for Retlang, where everything’s in memory anyway.
  • The distributor can handle multiple message types.  The code above simply can’t do that.
  • Currently the distributor has no state.  State on the QueueChannel isn’t as big a problem, simply because if the process fails, both the QueueChannel and its consumers lose their state simultaneously.
  • Addressing these would result in significantly more infrastructure around code loading and rebalancing.  Retlang has no code loading other than the CLR.

So, the solution above is good for situations in which your services have important in-memory state that is closely associated with non-commutative messages.  It suffers, on the other hand, from the same issues as sharding: the need to rebalance every so often. 

Problems

As I said before, I’m not 100% happy with this post, and not only because it’s ten pages long.  The stickiness solves some problems and creates others.  In the worst case, you end up with no load balancing at all.  If you’ve got persistent state, you could move orders between fibers.  However, this would require you to check whether an order was currently assigned to a particular thread, which in turn would involve tracking a count of currently queued messages per order.  This would make IBalancer significantly more complex.  Then there’s the question of dynamic rebalancing, which I can assure you is a real, not a theoretical, problem.  Basically, there’s no perfect solution: you’re always going to be trading off scaling, raw performance and potential race conditions.  I suspect this is basically just the CAP theorem in disguise.  Next, I’m going to try an approach based around sharding, which will trade in-memory state for better balancing.
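Here’s a minimal sketch of the per-order counting idea, deliberately detached from Retlang so it stands alone. `TWorker` stands in for `IDisposingExecutor`, and `CountingBalancer`, `Completed` and the pinning rule are hypothetical, not part of the `IBalancer<T>` interface above. An order stays pinned to a fiber only while it has messages outstanding; once its count reaches zero, the next message for it can go to any fiber.

```csharp
using System.Collections.Generic;

// Hypothetical balancer: pins a key to a worker only while the key has
// messages outstanding, so idle keys are free to move to another worker.
public class CountingBalancer<TKey, TWorker> where TWorker : class
{
    private readonly Dictionary<TKey, TWorker> _workerByKey = new Dictionary<TKey, TWorker>();
    private readonly Dictionary<TKey, int> _pendingByKey = new Dictionary<TKey, int>();
    private readonly object _sync = new object();

    // Returns the worker that must handle this key, or null if any worker may.
    public TWorker Appropriate(TKey key)
    {
        lock (_sync)
        {
            TWorker worker;
            return _workerByKey.TryGetValue(key, out worker) ? worker : null;
        }
    }

    // Called whenever a message for the key is handed to a worker.
    public void Assigned(TKey key, TWorker worker)
    {
        lock (_sync)
        {
            _workerByKey[key] = worker;
            int pending; // 0 if the key had nothing outstanding
            _pendingByKey.TryGetValue(key, out pending);
            _pendingByKey[key] = pending + 1;
        }
    }

    // Called by the worker after it has processed a message for the key.
    public void Completed(TKey key)
    {
        lock (_sync)
        {
            int pending;
            if (!_pendingByKey.TryGetValue(key, out pending)) return;
            if (pending > 1)
            {
                _pendingByKey[key] = pending - 1;
            }
            else
            {
                // Nothing outstanding: release the pin so the key can rebalance.
                _pendingByKey.Remove(key);
                _workerByKey.Remove(key);
            }
        }
    }
}
```

To use something like this, the channel would have to call `Assigned` for messages it parks on another fiber’s local queue as well as ones it hands out directly, and each consumer would have to report `Completed` after its callback, which is exactly the extra complexity described above.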

Got Religion?

One of the accusations that gets levelled at agile/lean enthusiasts is that they seem to have “got religion”.  It’s an interesting metaphor, in that it focuses on one aspect of religious behaviour: the evangelistic, animated, passionate guy who wants everyone to know what he has learned and feel what he feels.  Sure, that guy can be irritating, especially if he’s got a megaphone on a major crossing when you’re trying to get to work*, but let’s be honest, this behaviour isn’t dangerous and it’s often actually beneficial.

There are other forms of behaviour we associate with religion, however, and some are dangerous and positively insidious.  I’m thinking here of the guys who went after Galileo.  This behaviour starts from a set of assumptions and seeks to conform the world to those assumptions.  Received wisdom is treated as authoritative and inconvenient facts are either ignored, suppressed or co-opted into the world view.  Dissenting views are regarded as things to be eliminated.  Thankfully, these days we prefer insults thinly disguised as humour and negative performance reviews to thumbscrews, but these behaviours are nonetheless corrosive.

Personally, I think we’ve all got our little religions, whether they’re lean processes, time to market, PRINCE2 or constructor injection.  Belief itself is pretty neutral.  The question is whether our beliefs, and our behaviours, make us better, or make us worse. 

*If you’re wearing a cowboy hat right now, yes, I am talking about you.

Building Leiningen and Circumspec from Trunk

It’s always the way that when you start learning a new technology, you spend your entire time running into problems outside your comfort zone.  Getting Circumspec up and running was a particular challenge.  There are three big problems:

  • Circumspec uses Clojure 1.2.0, which isn’t finished.
  • Worse, to run lein repl, you need to compile Leiningen from trunk as well.
  • Circumspec trunk doesn’t actually work on Windows.

These are solvable problems.  But it’s a bit of a nightmare.  First step, you’re going to need my modified lein.ps1 script.  If you’ve been following this thread, you’ll already have it, but I’ve fixed yet more problems with it (I can’t quite believe how much time I’ve spent on this thing.)

Compiling Leiningen from Trunk

So, from your root OSS folder (mine is d:\julian\documents\projects\oss), type the following:

git clone http://github.com/technomancy/leiningen.git
cd leiningen
lein deps
lein compile
lein uberjar

You should now see a file called leiningen-standalone.jar.  We need this for circumspec.

Compiling Circumspec from Trunk

Circumspec trunk doesn’t actually work under Windows.  So you’ll need to use my fix.

cd ..
git clone git://github.com/JulianBirch/circumspec.git
cd circumspec
lein deps
lein compile
lein jar
lein repl -clojureJar ..\leiningen\lib\clojure-1.2.0-master-20100528.120302-79.jar -verbose -leinJar ..\leiningen\leiningen-standalone.jar

That last line is why you need the modified ps1 script.  You’ll now be in a Clojure trunk repl.

Now type (use 'circumspec.watch) (watch).  You should see Circumspec run its own tests.  You’ll also see that the colourizing code isn’t working, but that’s a problem for another time.

Thrush Operator: Extension Methods in Clojure

I recently had a go at Uncle Bob’s latest challenge.  Part of the problem was: given s and n, find the smallest factor of n that is greater than or equal to s.  In C#, you could express this as follows:

Enumerable.Range(s, n - s)   // Range takes (start, count), not (start, end)
    .TakeWhile(x => x <= Math.Sqrt(n))
    .Where(x => n % x == 0)
    .FirstOrDefault();

I think of this as the results of the Range method being “piped” through the other methods.  I use similar programming techniques in PowerShell, where I actually use a pipe symbol.  Now, it’s worth bearing in mind that this syntax only works because of extension methods.  What we’ve just written actually compiles to

Enumerable.FirstOrDefault(Enumerable.Where(Enumerable.TakeWhile(Enumerable.Range(s, n - s), x => x <= Math.Sqrt(n)), x => n % x == 0));

If you’re thinking that’s not very readable, you’d be right.  Now let’s write the same thing in Clojure.

(first
  (filter #(zero? (rem n %))
    (take-while #(<= % (Math/sqrt n))
      (range s n))))

I’ve split it up onto four lines to make it anything like readable, but determining scope would be hard on a single line.  Wouldn’t it be nice if there was an equivalent of extension methods in Clojure?  (You’ve guessed it, there is…)

(->> 
    (range s n)
    (take-while #(<= % (Math/sqrt n)))
    (filter #(zero? (rem n %)))
    first)

Although we haven’t reduced the number of brackets, this feels a lot more readable, mostly because the scope of the brackets is much easier to see visually.  For me, this feels more elegant.

So, how do extension methods and thrush operators differ?  Clojure has two “thrush” operators: ->> puts the “piped” object at the end of the parameter list, -> at the start.*  Extension methods in C# perform a restricted version of ->.  In particular, the function’s author has to explicitly declare that it supports the syntax.  In Clojure, the caller declares that she wants the code interpreted that way.
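C# can get part of the way to a caller-side pipe with a single generic extension method. Here’s a minimal sketch; `Pipe` is a hypothetical helper, not something in the BCL. Because it takes a one-argument function, the caller decides inside the lambda where the piped value goes, rather than the function’s author declaring it as an extension.

```csharp
using System;

public static class PipeExtensions
{
    // Hypothetical helper: passes the value on the left to the function on
    // the right, so any function can join a left-to-right chain without its
    // author declaring it as an extension method.
    public static TResult Pipe<T, TResult>(this T value, Func<T, TResult> function)
    {
        return function(value);
    }
}

public static class PipeDemo
{
    // Reads left to right: trim (an instance method), then int.Parse and
    // Math.Sqrt (plain static methods, not extensions), chained via Pipe.
    public static double RootOfParsed(string text)
    {
        return text
            .Trim()
            .Pipe(t => int.Parse(t))
            .Pipe(n => Math.Sqrt(n));
    }
}
```

In the demo, only `Trim` is a real instance method; `int.Parse` and `Math.Sqrt` are ordinary static methods that join the chain without anyone having written extensions for them.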

Finally, it’s worth bearing in mind that extension methods are a compiler-level technology.  You can’t write your own constructs like them.  In Clojure, ->> is just a macro.  You can read the source on GitHub, and it’s not even long.  (It is, however, incomprehensible if you don’t grok macros.)

*Since this is Lisp we’re using here, the start of the parameter list is usually referred to as the second element of the list.

SIDEBAR:  F#, on the other hand, has |>.  Because F# functions are curried, x |> f a is f a x: the piped value goes last, so |> behaves like ->>, and -> just doesn’t make a lot of sense in that context.  There’s also >>, which does nearly the same thing but composes functions rather than requiring an input value.  (>> & forms) would be equivalent to #(->> % forms).  (It’s also pretty close to comp, with the arguments reversed.)  Clojure may not have >>, but it would be trivially easy to add.  On the other hand, it’s worth noting that the F#/ML syntax for expressing this concept is arguably more elegant.  LISP isn’t always the last answer in any discussion of elegance.
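For comparison, here’s F#’s >> sketched in C#. `Then` is a hypothetical extension method on `Func<TIn, TMid>`, not a BCL method; like >>, it builds a new function instead of needing an input value up front.

```csharp
using System;

public static class ComposeExtensions
{
    // Hypothetical helper mirroring F#'s >>: first.Then(second) is the
    // function x => second(first(x)). No input value is needed up front.
    public static Func<TIn, TOut> Then<TIn, TMid, TOut>(
        this Func<TIn, TMid> first, Func<TMid, TOut> second)
    {
        return x => second(first(x));
    }
}

public static class ComposeDemo
{
    // Built once, applied later: trim, then parse, then double.
    // Clojure's comp would list the same steps in the reverse order.
    public static readonly Func<string, int> ParseThenDouble =
        ((Func<string, string>)(s => s.Trim()))
            .Then(s => int.Parse(s))
            .Then(n => n * 2);
}
```

Calling `ComposeDemo.ParseThenDouble(" 21 ")` returns 42.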

The Long Queue Problem

Dan North once described an advanced beginner as someone who makes all the mistakes an advanced beginner makes.  It doesn’t matter if you warn them about the mistake or not, they’ll still make it.  This post is about one of those problems.  Hopefully someone will find it useful, but what I know of the Dreyfus model suggests that it’ll only be useful after something fails in production.

Here’s the problem: let’s say you have a process.  It’s fairly easy to divide into two parts.  The first, A, is fast and feeds the second, B, which is slow.  So, you make it concurrent following some kind of pipes and filters arrangement.  The problem is, you get exactly the same effect as when you push a motorway’s traffic onto a B-road: traffic jams for miles.

A traffic jam may not sound like a serious problem, but you need to monitor queue lengths.  Any long system queue can be a sign that your architecture is failing.  The problem is even worse if you’re using in-process concurrency like Retlang or the TPL, because all of your queues are in memory.  Too long a queue and you’ll run out of memory and crash.  But even if you’re using MSMQ, you’re still subject to the Micawber Principle: if messages arrive faster than you process them, the backlog only ever grows.

Dealing With It

Here are some things you can do that don’t fix the problem, but help you scale further:

  • If the queue is going to be huge, consider reducing the memory footprint of the message passed between A and B. 
  • Make your application 64-bit.
  • Make the queue persist to a database.  This doesn’t reduce the jam, but it does reduce the memory consumption.  (Obviously, if you’re using MSMQ or similar, this is done for you.)

Although they don’t address the root cause, these can be useful nonetheless.  They could allow you to scale as much as you need, but they’re not an unlimited solution.

An approach that sometimes fixes the problem is to parallelize B’s processing.  This is only appropriate in certain circumstances, and there’s a law of diminishing returns as you add more threads.  For instance, if B is 4 times slower than A, and perfectly parallelizable, then four worker threads will solve the problem.  However, if it’s 25 times slower, and parallelism maxes out at a factor of 16, it’s going to improve matters, but not fix it.
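The arithmetic in that example is worth writing down, since it’s the same sum you’d use to size a real pipeline. This is a sketch under simplifying assumptions: A produces at a steady rate, B’s per-message cost is a fixed multiple of A’s, and B scales linearly up to a hard parallelism ceiling. `PipelineArithmetic.BacklogGrowthPerSecond` is a hypothetical helper.

```csharp
using System;

public static class PipelineArithmetic
{
    // arrivalRate: messages per second produced by A.
    // slowdown:    how many times slower B is than A per message.
    // threads:     worker threads running B.
    // maxSpeedup:  the most parallelism B can actually exploit.
    // Returns messages per second added to the queue (0 if B keeps up).
    public static double BacklogGrowthPerSecond(
        double arrivalRate, double slowdown, int threads, double maxSpeedup)
    {
        double effectiveParallelism = Math.Min(threads, maxSpeedup);
        // A single B thread handles arrivalRate / slowdown messages per second.
        double serviceRate = arrivalRate * effectiveParallelism / slowdown;
        return Math.Max(0.0, arrivalRate - serviceRate);
    }
}
```

With 1,000 messages a second, B 25 times slower and parallelism capped at 16, the queue grows by 360 messages a second, roughly 1.3 million an hour. Note that exactly matching A’s rate, as in the four-thread case, only stops the queue growing; it never drains a backlog that’s already there.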

Finally, there’s addressing the root cause: 

  • Optimize B so that it’s fast enough.  Sometimes possible and worth considering.
  • After you’ve tried that, redesign B into multiple parts, each of which is, on average, at least as fast as A.  If you’ve got a factor of 100 difference, that’s pretty much the only thing that’ll solve the problem.

Basically, balancing performance isn’t a nice-to-have; it’s essential if you want your system to work at all.

Leiningen and .NET Dependency Management

Over the past few years, I’ve spent a lot of time thinking about dependency management, usually in the context of NHibernate and Castle Windsor.  The truth is that the problems these projects have are the problems any sufficiently complex open source project has.  So, having been playing with Clojure for several days, I’ve got some thoughts about how the other half lives.

For the most part, the JVM has been phenomenally stable for the last few years, so there’s no analogue of the multiplicity of .NET and Silverlight versions: it just isn’t a problem for them.  Another way of saying this, of course, is that the JVM is atrophying.  Clojure doesn’t support conditional compilation, and I don’t think anyone’s come up with a way of compiling the same code base against Clojure and ClojureCLR (yet).  It’ll be interesting to see how that’s addressed.  Equally, there’s no particularly sensible way of compiling the same code base against different versions of another library.

Versioning

JARs and Assemblies are roughly analogous, but JARs don’t contain JVM-level versioning information.  This means you don’t get any of those “version doesn’t match” compilation errors like you get with .NET.  However, that doesn’t mean the problem isn’t there, just that you can’t see it.  If the versions really don’t match, your code will break (hopefully visibly, but that’s not guaranteed) and you’ll have no idea what you need to do to fix it.

So, how do they handle compatibility between versions?  Well, to be frank, they don’t.  Here’s the worst-case scenario I recently encountered:  If you want to use Circumspec from trunk, you’ll need to be using its version of Clojure.  That’s not too bad, because of Leiningen’s dependency resolution.  However, in order to be able to run “lein repl”, you’ll need Leiningen to be running (approximately) the same version of Clojure as Circumspec.

Now for the good news: Leiningen’s version handling (which it shares with Maven and Ivy) beats anything in the .NET space.  You just specify something like [jline "0.9.94"] to declare a dependency on jline version 0.9.94.  If you want to use it, that’s all you do.  Leiningen will download it from the internet for you, probably from clojars.org.

So, why don’t we have something similar in .NET?  Well, basically, because our thinking is constrained by our tools.  MSBuild, and by extension Visual Studio, says that a reference is a file on your local hard drive.  Adding assemblies is a pain in the neck:

  • Find the assembly on the internet
  • Download whatever format they’ve chosen to use
  • Incorporate it into your project using whatever convention you use
  • Now go into Visual Studio and add the reference.

Publishing

The story is as bad for publishing as it is for consuming assemblies.  Here’s the process if you’re using Leiningen:

  • lein jar
  • scp $PROJECT.jar clojars@clojars.org:
  • (Update the version in project.clj)

Anyone who works on a .NET OSS project will recognize that the story is significantly more complex on their side of the fence.  This has a huge side effect: we make our assemblies large.  Take a look at this project.  The source is 17 lines long.  I post significantly longer scripts on my blog for people to use via cut and paste.  It was easy to publish, and it was easy to consume.  Equally, you don’t tend to see solutions made up of large numbers of related projects.  Everyone just directly references each other’s libraries.  (It’s worth noting that typically in Clojure the jar files actually contain the source code as well as (possibly) compiled class files.)

In short, there’s a lot we can learn from the Clojure space.  The problem is that we’d need to break things in order to make them work again.  My ambition of having projects just load from solution files is impossible if you want this kind of flexibility.  However, the goal is the same: make code read/write.  It’s just that there are multiple paths to get there.

Getting Compojure Working On Windows 7

It’s a lot easier to write decent installation documentation on a clean computer.  Today I tried building Clojure and Compojure from scratch.  First you need to be able to run PowerShell scripts.  If you can’t, do the following:

  • Close all PowerShell windows.
  • Start a PowerShell window as administrator.
  • Run: set-executionpolicy Unrestricted

(I’m assuming that if you have more sophisticated security requirements, you’ll know enough to implement them yourself.)

Building Clojure

I’m going to assume that everything gets built into a directory I’m calling D:\OSS.

First, download the following simultaneously:

  • MSysGit.  I used the net install.  Try something like D:\OSS\MSysGit as the installation directory.
  • At the same time, download and install the latest JDK.  Windows 7 64-bit uses the x64 version.
  • Download Ant and unpack it into d:\oss\ant.

Once git is installed, create a file git-powershell.bat in the installation directory with the following content.  (Note the explicit setting of JAVA_HOME: if you don’t do this, Ant will try to use the JRE, which won’t work.)

    @set PLINK_PROTOCOL=ssh
    @setlocal
    @for /F "delims=" %%I in ("%~dp0") do @set git_install_root=%%~fI
    @set path=%git_install_root%\bin;%git_install_root%\mingw\bin;%git_install_root%\cmd;%git_install_root%\..\ant\bin;%PATH%
    @if "%HOME%"=="" @set HOME=%USERPROFILE%
    @cd %HOME%
    @set JAVA_HOME=C:\Program Files\Java\jdk1.6.0_20
    @powershell

This file is going to be your standard way of opening a dev command line.  Run it, switch to the OSS directory, and type:

git clone git://github.com/richhickey/clojure.git
cd clojure
ant

Building Compojure

Now get the lein PowerShell script from GitHub, save it as lein.ps1, and put it on your path.  Then type:*

cd ..
git clone git://github.com/weavejester/compojure.git
cd compojure
lein self-install
lein deps
lein jar

Running Hello World

We're still not finished.  We need a script to run Compojure:

$compojureDirectory = (split-path $MyInvocation.MyCommand.Path)
# Full paths of compojure's own jar(s) plus every jar in lib
$jars = @(ls (join-path $compojureDirectory *.jar),(join-path $compojureDirectory lib\*.jar) | % { $_.FullName })
$classPath = [String]::Join(";", $jars)  # Create a class path with compojure and every jar in lib on it
$script = (join-path $pwd.Path $args[0])
java -cp $classPath clojure.main -i $script -r $args

Put that in compojure.ps1 in the Compojure directory (the script treats its own location as the Compojure directory).  Now we can finally get started with Compojure.
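To make the class-path logic easier to follow outside PowerShell, here is a minimal Ruby sketch of what compojure.ps1 assembles.  It assumes the same layout the script does (Compojure's jar in the Compojure directory, its dependencies in lib), and compojure_class_path and compojure_command are hypothetical names, not part of Compojure or lein.

```ruby
# Hypothetical Ruby equivalent of the class path compojure.ps1 builds.
# Assumes the same layout: Compojure's jar(s) in compojure_dir, dependencies in lib.
def compojure_class_path(compojure_dir, separator = File::PATH_SEPARATOR)
  patterns = [File.join(compojure_dir, "*.jar"),
              File.join(compojure_dir, "lib", "*.jar")]
  # Compojure's own jars first, then every dependency jar, each group sorted
  patterns.flat_map { |pattern| Dir.glob(pattern).sort }.join(separator)
end

# Mirror the script's java invocation: load the script named by the first
# argument (resolved against the current directory), then start a REPL with
# all the arguments passed through, as compojure.ps1 does with $args.
def compojure_command(compojure_dir, args)
  script = File.expand_path(args.first)
  ["java", "-cp", compojure_class_path(compojure_dir), "clojure.main",
   "-i", script, "-r", *args]
end
```

Something like system(*compojure_command(__dir__, ARGV)) should then launch Clojure the same way the PowerShell script does.  File::PATH_SEPARATOR picks ";" on Windows and ":" elsewhere, which is the only platform-specific part.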

Create a file hello.clj

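(ns hello-world
  (:use compojure.core ring.adapter.jetty))

;; Named main-routes because "routes" would clash with compojure.core/routes,
;; which the :use above refers into this namespace
(defroutes main-routes
  (GET "/" []
    "<h1>Hello World</h1>")
  (ANY "*" []
    {:status 404, :body "<h1>Page not found</h1>"}))

(run-jetty main-routes
    {:port 8080})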

You can theoretically put this anywhere you like, but put it in the Compojure directory for now.  Now type “.\compojure.ps1 hello.clj” and your web server should start.  Navigate to http://localhost:8080/ and you should see:

Hello World
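If you'd rather check from a script than a browser, a small Ruby sketch using the standard net/http library can confirm the page is being served.  hello_world_up? is a hypothetical helper, and its default URL and expected text are simply taken from the hello.clj example.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: fetch the page and check that it contains the
# expected text. Defaults match the hello.clj example (port 8080, "Hello World").
def hello_world_up?(url = "http://localhost:8080/", expected = "Hello World")
  response = Net::HTTP.get_response(URI(url))
  response.is_a?(Net::HTTPSuccess) && response.body.to_s.include?(expected)
rescue SystemCallError, Net::OpenTimeout
  # Connection refused or timed out: the server isn't running (yet)
  false
end
```

Call hello_world_up? once the server has started; it returns false rather than raising if nothing is listening on port 8080 yet.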

If you’re thinking apt-get was easier, you’d be right.

*These instructions get Compojure working, but not against trunk Clojure.  For that, you'll need to copy the Clojure jar you built earlier into place after running lein deps.

UPDATE: The original lein.ps1 script worked as far as these instructions were concerned, but failed in a number of hard-to-debug ways.  As a consequence, I've updated it and merged it with lein.bat, which works but needs constant source hacking to keep it working.  Hopefully the new version will be the best of both worlds.


Proxy Server: Getting IronRuby Working

Quick installation instructions:

Getting the Proxy Server Running

Annoyingly, you need to change the code from the previous post: it used a Ruby 1.9 idiom (the new lambda syntax), and WEBrick doesn't seem to work under IronRuby's 1.9 mode.

require 'webrick'
require 'webrick/httpproxy'

server = WEBrick::HTTPProxyServer.new(
    :Port => 8080,
    :BindAddress => '0.0.0.0',
    :ServerType => Thread,
    :RequestCallback => Proc.new {|request,response| puts "#{request.unparsed_uri}" }
)
server.start

puts "Hit return to quit"
STDIN.gets

Sure enough, the performance is perfectly fine, which suggests the whole virtual machine installation is the source of the problems.  This is unfortunate, since that was the purpose of the exercise.  It's starting to look like I may have to do a full installation onto my computer, which in turn means giving up Notepad++.

There’s a bit more about WEBrick and proxy servers here, if you can cope with the automatic smiley generator damaging the code.  There’s an MSDN example of a web server as well.
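The key change to the callback was swapping Ruby 1.9's -> lambda literal for Proc.new.  This sketch, with plain strings standing in for WEBrick's request and response objects (an assumption for illustration only), shows that the three callback forms give the same result when called with two arguments, as WEBrick calls :RequestCallback, and differ only in how strictly they check arity.

```ruby
# Three ways to build the same two-argument callback. The -> literal needs
# Ruby 1.9 or later; Proc.new and lambda {} also existed in Ruby 1.8.
stabby   = ->(request, response) { "Intercepted: #{request}" }
with_new = Proc.new { |request, response| "Intercepted: #{request}" }
classic  = lambda { |request, response| "Intercepted: #{request}" }

# Called with exactly two arguments, all three produce the same result.
results = [stabby, with_new, classic].map { |cb| cb.call("GET / HTTP/1.1", nil) }
puts results.uniq.inspect

# The difference: lambdas enforce their arity (raising ArgumentError),
# while a proc quietly pads a missing argument with nil.
def accepts_one_arg?(callback)
  callback.call("GET / HTTP/1.1")
  true
rescue ArgumentError
  false
end

puts accepts_one_arg?(stabby)   # prints false
puts accepts_one_arg?(with_new) # prints true
```

For a callback like this one, where the server always passes both arguments, Proc.new is a safe substitute for the 1.9 syntax.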


My First Ruby App

Okay, if you've got a working Ubuntu installation from the previous steps, this gets Ruby working:

  • sudo apt-get -y install ruby1.9.1
  • sudo apt-get install rubygems1.9.1

Note that the commands installed are “ruby1.9.1” and “gem1.9.1” instead of “ruby” and “gem”.  You can now follow the Sinatra home page instructions and get Hello World running on your machine.

At this point, I'm starting to get annoyed at how long everything takes to install on Windows.  But the next bit was more of a shock to me.

You see, my plan was to sit down, learn some Ruby and try to put together a simple, well-specified program.  So, I thought, let's try an HTTP proxy server.  It's not hard, it's easy to verify that it works, and it involves threads and sockets, so there's plenty of stuff to play with.  Ruby fans will know what's coming next: it's already in the standard library.

So, if you want to write a proxy server in Ruby, consider the following code:

require 'webrick'
require 'webrick/httpproxy'

server = WEBrick::HTTPProxyServer.new(
    :Port => 8080,
    :RequestCallback => ->(request, response) { puts "Intercepted:  " + request.request_line }
)
trap("INT") { server.shutdown }  # This catches Ctrl-C
server.start # start the server

It is, however, pretty slow, though I'm not sure whether that's just a reflection of the way I've got everything set up with VMware instances.  So next I'm going to see if I can get this working on IronRuby.
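To put a number on “pretty slow”, a minimal Ruby sketch can time the same request with and without the proxy.  It assumes the WEBrick proxy is listening on localhost:8080 as configured above; timed_get is a hypothetical helper built on the standard net/http library.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: time a single GET, optionally through an HTTP proxy.
# Returns the response code and the elapsed wall-clock seconds.
def timed_get(url, proxy_host = nil, proxy_port = nil)
  uri = URI(url)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  # A nil proxy address means a direct connection (http_proxy is ignored)
  http = Net::HTTP.new(uri.host, uri.port, proxy_host, proxy_port)
  response = http.request(Net::HTTP::Get.new(uri.request_uri))
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  [response.code, elapsed]
end
```

Comparing timed_get("http://example.com/") with timed_get("http://example.com/", "localhost", 8080) a few times over gives a rough measure of the proxy's overhead, separate from whatever the VMware setup adds.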
