Abstract Factory Revisited

I can’t be the only one glad that the Google Testing Blog has become active again (and I don’t mean “we’re having a conference on another continent” style posts, much as I eagerly await those missives).  But this post has actually crystallized my understanding of one aspect of designing for testability.  There are probably some who will regard this as old news.  It is, however, pretty fundamental.  To summarize, it says that any given class should either create objects or use them.

So, let’s assume most of your objects are constructed using auto-wiring of dependencies.  We can ignore cases like List&lt;T&gt;, since they’ve got no externalities and have well-known behaviour.  That still leaves us with some cases where we need to use an object whose only purpose is to create other, specific, objects.  Which is, of course, the Abstract Factory pattern.  Now, anyone who’s skimmed the Gang of Four book (and let’s face it, most of us skimmed it) will know the pattern, although they’ll probably have trouble remembering which way around Abstract Factory and Factory go.  (Factory creates one object, and hence is probably best represented as a delegate in C#.)  However, what the “keep instantiation quarantined” approach means is that the section on Abstract Factory use cases can be rewritten as:

Use:  Any time you want to create an object that wasn’t instantiated by your auto-wiring of dependencies.

You’ll note that I didn’t say “that wasn’t instantiated by your IoC container”.  That’s because calls to the container should be treated exactly the same way as calls to new, which is why Windsor doesn’t give you a static method to access the container.  Jeremy Miller emphasizes this too, even though StructureMap does have a static accessor.  I’ve written some pretty bad code before today using that accessor…
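To make the distinction concrete, here’s a minimal sketch (the names are my own invention, not from any of the posts linked above): a Factory creates one kind of object, so a delegate will do, while an Abstract Factory is an interface whose single implementation quarantines the calls to new.

```csharp
using System;

// A "Factory" creates one kind of object, so a delegate is enough in C#.
delegate IConnection ConnectionFactory();

interface IConnection
{
    void Send(string message);
}

// An Abstract Factory groups related creation methods behind an interface,
// so consumers can create objects without ever calling new themselves.
interface IMessagingFactory
{
    IConnection CreateConnection(string endpoint);
}

class TcpConnection : IConnection
{
    public TcpConnection(string endpoint) { /* ... */ }
    public void Send(string message) { Console.WriteLine("Sending: " + message); }
}

// The only class that calls new; everything else takes IMessagingFactory
// as an auto-wired dependency.
class TcpMessagingFactory : IMessagingFactory
{
    public IConnection CreateConnection(string endpoint)
    {
        return new TcpConnection(endpoint);
    }
}
```

A consumer class then declares a dependency on IMessagingFactory and never news anything up itself, which keeps it trivially testable with a fake factory.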

Quick Tip: Don’t use Atom.aspx with Feedburner

If, like me, you’ve set up FeedBurner with a Subtext 1.9.6 blog, point FeedBurner at RSS.aspx, not at Atom.aspx.  Most things work whichever way you do it, but when you view an article in Google Reader, it won’t include a link back to the original article.  As far as I can tell, this is because the Atom feed uses rel=self rather than rel=alternate.  This appears to be fixed in the latest code base, but that means it’s part of the huge 2.0 release.
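For illustration, the difference between the two link relations looks like this (a generic Atom fragment, not Subtext’s actual markup):

```xml
<entry>
  <title>Some article</title>
  <!-- rel="self" points at the feed resource itself -->
  <link rel="self" href="http://example.com/Atom.aspx" />
  <!-- rel="alternate" is what readers use to link back to the article -->
  <link rel="alternate" type="text/html" href="http://example.com/some-article.aspx" />
</entry>
```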

It will take a while for FeedBurner and Google Reader to catch up.  I mean hours, not minutes, but it will fix the problem.


Reporting considered harmful

Oren has been saying that he completely disagrees with Stephen Forte’s assertion that database models support more than one application.

Well, he’s right and he’s wrong.  He’s right in that data models (and especially database models) can and should be private to the application.  In practice, they never are.  There’s always someone adding a “quick fix” piece of functionality on the side.  Let’s look at the main categories:

  • Importing data into the system (including data entry systems).
  • Adding a separate program, e.g. invoicing added to a tracking system.
  • Exporting data to other systems.
  • Reporting.

Now, of these, the first is the only one with a right to exist.  The only real problem is that people don’t treat it as a proper project.  The second is a hugely bad idea, because the database is not a good API integration point.  The data model you’ve used for the tracking system isn’t the model needed for invoicing; you’ve basically started down the route of trying to develop the canonical One True Data Model for your business.  (Udi Dahan makes a remark in his latest post about this being a red herring.)  I’m going to slightly labour this point, because it’s important and counter-intuitive to most developers, including the younger me.  Beyond a certain point, trying to standardize the data that your system uses is an exercise that may produce intellectual satisfaction and look very impressive on paper, but in fact makes communication extremely difficult and tedious.  If you try to build the Tower of Babel, Babel is exactly what you’ll get.

The third is functionality that needs to be there, but it shouldn’t be hitting the database directly.  No, not even a “quick export”.

Reporting, on the other hand, is the devil.  Anyone who, like me, worked on VB3 back in the 90s will know that Crystal Reports is the devil.  Unfortunately, some people remember technical limitations as being the problem, not that reporting packages are inherently diabolical.  Business Objects, MS-SQL Reporting Services, the lot.  That’s not to say they’re not powerful and occasionally useful, but I highly recommend using a long spoon.

So, why are they so dangerous?  Well, we need to think about what they’re actually used for.  In practice, they’re usually used as a short-cut to add an extra feature (management reporting, invoicing) or as a data export feature.  (MS-SQL Reporting Services is actually really good at this.)  And how do they achieve it?  By direct data binding.  Yup, the very technique that agilists, alt.netters and anyone with an ounce of self-respect have been trying to eliminate from our arsenal of techniques for years.

Direct Data Access

Now, let’s go back to Oren and Stephen’s disagreement and look at why people want to access the database directly.

  • It’s easy to develop.
  • It’s ready right now.
  • It’s stable (if the database wasn’t there, you wouldn’t have an application anyway)

The problem is, you’re running up a huge balance on the credit card of technical debt.  Here are some of the problems:

  • You’ve just made whatever database structure you have right now a de-facto specification.
  • It’s unlikely you have any real idea of who is accessing your database or why.
  • The external systems don’t synchronize through your business logic (this is not so bad for read-only scenarios, but can be lethal in writeable scenarios.)
  • You’ve no control over exactly what locks are getting put on your DB.

Now, I have to admit, when I need to get data from an external system, I often request direct database access.  Why?  Because I’m not the one who’s going to suffer when these problems come up.  On the other hand, I fight tooth and nail when the shoe is on the other foot.

Getting that spoon out

One of the smartest things my last company did was to develop a database specifically for management reporting.  It was constantly being changed, but the truth is that those costs would have been there anyway, just less visible.  It also enabled us to highlight that concepts in one part of the business were subtly different from what appeared to be identical concepts in another part.  Of course, it can be quite hard to get people to buy into this for “just a quick report”, which means you’re usually better off arguing that it would be cheaper to add the functionality directly to the application.  After all, the implementation of a proper reporting package can be quite time-consuming.

Arguing for a separate database to contain data “you’ve already got” can be a hard sell, especially to non-technical types.  As with all sales, however, you’re probably best off differentiating the systems.  Selling an OLAP cube is, ironically, easier, even if you don’t think the requirements really justify it.  However, reporting requirements only ever grow.  Seven year old reports are still being used by someone in the organization, even if you don’t know their name.

And if you’ve already got a system that reporting has got its tentacles into?  Then I’m sorry, but it’s going to take a lot of spade work to dig out of that hole.

Developers aren’t designers

I know a fair bit of CSS.  I know about the three pixel bug, I’ve even contributed IE5 fixes to three column layout solutions.  However, I’ve just been reminded extremely forcefully of my limitations.

As is pretty obvious, I’m using SubText as my blogging engine.  It’s a good, fully featured system that supports more use cases than you might imagine if you’re still thinking “How hard can it be to write a blog engine?”.  It comes with a number of default skins and a “Naked” skin for developing your own skins.

The skinning system is really powerful and very easy to understand.  Still, I highly recommend avoiding it.  Six hours later and I’ve re-learned what I already knew: developers aren’t designers.  Don’t get me wrong, it was alright, in a Web 1.0 kind of a way.  I could have spent a couple of days learning Photoshop and getting rounded corners and gradients working.  I could have banged my head against a wall for several days trying to figure out why IE rendered a block one way on my local machine and another once I’d uploaded the skin.  The fact remains, at the end of the day, in that time I’d still have something that looked like a MySpace page, because visual design isn’t my strength.  In that time, I could have done something far more rewarding, like writing another article or watching Wall-E.

You want some evidence?  I’m afraid I’m too embarrassed by my own effort to publish it.  Take a look at Ayende’s site.  Cracking content, amateur design.  Now take a look at this one.  This is the weblog of the guy who wrote SubText.  Looks a lot better, doesn’t it?  That’s because the developer of SubText didn’t write his own skin.  This is Adam Smith’s theory of comparative advantage writ large.  Or, to put it another way, yet another example of why you should buy rather than build.  (Writing your own blog engine is another, although that hasn’t stopped Martin Fowler, another case of amateur design and fabulous content.)

So, if you’re blogging, what are your choices?

  • Be a crack visual designer
  • Buy a skin
  • Use one of the defaults

In practice, only the third option was really open to me.  So, having spent a week publishing using the Origami skin, I’m now publishing with a slightly modified Origami skin that’s more to my taste.

It seems like sooner or later, every blog becomes about blogging.  Let’s hope it’s just a phase.  Scott Hanselman recently said exactly the same thing, although he wasn’t talking about blogging at the time.  However, whereas I’ve just wasted six hours of my own time, it’s amazing how many companies delegate design duties to developers or other unskilled workers.  The results are predictable, and much more costly than one night’s sleep.


P.S. If, after reading all of this, you still want to give it a go, Simon Phils has written an excellent guide to the architecture.

Denormalize data using NHibernate’s Property Accessors

One of the most useful things you can do when setting up an NHibernate environment is to sort out the XSDs so that Intellisense works in Visual Studio.  Note that the documentation is rather out of date.  Most people are now using the 2.2 schema, not the 2.0 schema, and the Visual Studio directory in 2008 is C:\Program Files\Microsoft Visual Studio 9.0\Xml\Schemas.  Especially notice the “.0”, which is new.

Sorting this out not only allows you to more quickly see errors in your HBM files (a major benefit in itself) but also enables one of my favourite learning approaches: Intellisense exploration.  This, in combination with reading the documentation, can give you a fairly broad picture of the functionality.  In particular, it provided me with an answer to a question that had been bugging me for a while: how to denormalize data with NHibernate. NHibernate’s support for dealing with database structures that do not map directly to your domain is one of its strongest features, but nearly all examples on the Internet deal with the simple case of mapping directly to a closely related data structure.  This is hardly NHibernate’s fault: Microsoft examples suffer from exactly the same problem. 
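Concretely, Intellisense kicks in once your mapping file declares the 2.2 namespace (a skeletal example; the table name and id column here are made up, though the class comes from the FIX example later in this post):

```xml
<?xml version="1.0" encoding="utf-8" ?>
<!-- The urn:nhibernate-mapping-2.2 namespace is what ties this file
     to the XSD you copied into the Visual Studio schema directory. -->
<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2"
                   assembly="ColourCoding" namespace="ColourCoding">
  <class name="FixMessageTracker" table="FixMessages">
    <id name="Id" type="Int32">
      <generator class="native" />
    </id>
  </class>
</hibernate-mapping>
```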

So, here’s the scenario: I’m writing a lot of code that deals with FIX messages.  FIX messages do not easily map to database structures; however, certain parts of the average message are really useful for searching (generally, the order and execution references that allow you to reconcile the messages against other systems).  Now, this produces a use case which favours denormalization:

  • You can’t throw information away, so you have to log the full message in its native format.
  • Since this isn’t searchable, you need relational fields corresponding to some of the content of the message.
  • Furthermore, none of this is in any way related to the domain requirements of the original FIX message class.

So, really what you want to do is create virtual properties that can be associated with your class, which NHibernate can use and nothing else can.  This feature is called “Property Accessors”.  So, in my HBM file, I write:

<property column="ClOrdId" name="Tag11" type="String" access="ColourCoding.FixPropertyAccessor, ColourCoding" />

This maps tag 11 of the FIX message to the “ClOrdId” column in the database.  FixPropertyAccessor is a class that implements NHibernate.Properties.IPropertyAccessor.  Now, when using custom accessors, NHibernate can’t use reflection to determine types, so you will need to remember to use the type attribute to specify the target type.

I’ve included an example at the bottom.  This one won’t compile until you delete the contents of the “Get” method, since the logic is specific to the application from which this is excerpted. 

Aside:  There is, in fact, a feature to allow the property accessor itself to give you this information, but it’s pretty much designed only for internal use.  If it just obtained the information from the IGetter’s target type, there wouldn’t be a problem.  Ho hum.

using System;
using System.Text.RegularExpressions;

using NHibernate.Properties;

namespace ColourCoding
{
    public class FixPropertyAccessor : IPropertyAccessor
    {
        public IGetter GetGetter(Type theClass, string propertyName)
        {
            return new FixProperty(propertyName);
        }

        public ISetter GetSetter(Type theClass, string propertyName)
        {
            return new FixProperty(propertyName);
        }

        class FixProperty : IGetter, ISetter
        {
            int _tagNumber;
            string _propertyName;
            TypeCode _typeCode;

            public FixProperty(string name)
            {
                _propertyName = name;
                _tagNumber = GetTagNumber(name);
                _typeCode = GetTypeCode(_tagNumber);
            }

            static TypeCode GetTypeCode(int tagNumber)
            {
                // Stripped down for clarity.
                switch (tagNumber)
                {
                    case TagNumbers.SendingTime:
                        return TypeCode.DateTime;
                    case TagNumbers.MsgSeqNum:
                        return TypeCode.Int32;
                    default:
                        return TypeCode.String;
                }
            }

            static Regex __tagRegex = new Regex(@"Tag(?<Id>\d+)");

            static int GetTagNumber(string propertyName)
            {
                Match match = __tagRegex.Match(propertyName);
                if (match == Match.Empty)
                {
                    throw new ArgumentException(
                        string.Format("Property name '{0}' should be in the format 'Tag' followed by a number.", propertyName),
                        "propertyName");
                }
                int result;
                if (int.TryParse(match.Groups["Id"].Value, out result))
                {
                    return result;
                }
                throw new ArgumentException(
                    string.Format("Property name '{0}' was in the correct format, but the numeric part did not parse as a number.", propertyName),
                    "propertyName");
            }

            public object Get(object target)
            {
                // Obviously, this part of the code will not compile on your machine.
                // I'm including it to give the general flavour of what you'll need to implement.
                FixMessageTracker tracker = (FixMessageTracker)target;
                FixMessage message = tracker.OriginalMessage;
                string value = _tagNumber == TagNumbers.MsgSeqNum
                    ? message.Header[_tagNumber]
                    : message[_tagNumber];
                if (string.IsNullOrEmpty(value))
                {
                    return null;
                }
                switch (_typeCode)
                {
                    case TypeCode.Int32:
                        int result;
                        return int.TryParse(value, out result) ? result : 0;
                    case TypeCode.DateTime:
                        return FixMessage.ParseDate(value);
                    default:
                        return value;
                }
            }

            public object GetForInsert(object owner, System.Collections.IDictionary mergeMap,
                NHibernate.Engine.ISessionImplementor session)
            {
                return Get(owner);
            }

            public System.Reflection.MethodInfo Method
            {
                // We don't support this feature, so we return null.
                get { return null; }
            }

            public string PropertyName
            {
                get { return _propertyName; }
            }

            public Type ReturnType
            {
                get
                {
                    switch (_typeCode)
                    {
                        case TypeCode.Int32:
                            return typeof(int);
                        case TypeCode.DateTime:
                            return typeof(DateTime);
                        default:
                            return typeof(string);
                    }
                }
            }

            public void Set(object target, object value)
            {
                // You can't write to a denormalized FIX property.
                return;
            }
        }
    }
}

Gotcha: Don’t use anonymous delegates within loops

Okay, what do you think the following code prints?

using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        var strings = new string[] { "Hello ", "World" };
        var actions = new List<Action>();
        foreach (var s in strings)
        {
            actions.Add(() => Console.Write(s));
        }
        actions.ForEach(action => action());
    }
}

If you think it says “WorldWorld”, you don’t really need to read the rest of this article.  You can skip to the section entitled “So you think you’re so smart”.

If, on the other hand, you think it prints “Hello World”, it’s worth explaining why the code doesn’t behave as expected.  The issue is the semantics of anonymous delegates.  Variables referenced within anonymous delegates are “live”: the delegate sees the variable itself, not a copy.  If the variable changes before the delegate runs, the latest value is used.  If it goes out of scope, the last value it held is used.

The moral of this story is: don’t create a delegate in a loop, because it probably won’t do what you’re expecting.*  This bit me recently whilst trying to write code to stop a set of threads simultaneously.  The best option is to refactor the code and extract the line that adds the action into its own method.  Of course, that gives you an ugly dialog box to deal with.

*Actually, any time the variable changes before you use it, but loops are the most common case.

[Screenshot: the refactoring tool’s semantic-change warning dialog]
Actually, what the dialog is telling you is that extracting the method will result in the behaviour changing.  Luckily that’s what we actually want.  So, here you go, an even less elegant way to print “Hello World” on the console:

static void Main(string[] args)
{
    var strings = new string[] { "Hello ", "World" };
    var actions = new List<Action>();
    foreach (var s in strings)
    {
        AddAction(actions, s);
    }
    actions.ForEach(action => action());
}

private static void AddAction(List<Action> actions, string s)
{
    actions.Add(() => Console.Write(s));
}

So you think you’re so smart

Okay, how about this code: 

static void Main(string[] args)
{
    var actions = from s in new string[] { "Hello ", "World" }
                  select new Action(() => Console.Write(s));
    actions.ToList().ForEach(action => action());
}

If anyone can succinctly explain why this prints “Hello World”, there’s a beer in it.  The interesting thing is that the “ToList” ensures that all of the actions are produced prior to execution, so it’s not to do with the deferred-execution aspect of Linq.  I believe it’s an example of why expressions of delegates and delegates are slightly different.  A more prosaic explanation would be simply to point out that Linq would be horribly broken if this didn’t behave.
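For what it’s worth, one way to see why the query version behaves is to desugar it by hand (roughly what the compiler generates): the range variable becomes a lambda parameter, so each element gets its own s rather than sharing one loop variable.

```csharp
using System;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        // The query expression compiles down to a Select call. Here s is a
        // parameter of the lambda, not a single captured loop variable, so
        // every Action closes over a distinct s.
        var actions = new string[] { "Hello ", "World" }
            .Select(s => new Action(() => Console.Write(s)));
        actions.ToList().ForEach(action => action());  // prints "Hello World"
    }
}
```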

Getting Started: How to Pause a Retlang Process Context

Well, in the words of Scott Hanselman “We’ll see how long this lasts.”.  Anyway, I should start as I mean to go on by providing some code that’s actually useful.

I’ve been using Mike Rettig‘s Retlang library a fair bit recently, and have nothing but praise for it.  I’ll go into more detail about why it’s great at a later date, but here I’ll just detail a problem that I encountered and how to solve it.

Let’s say that you’ve got a system that writes to the database and you need to bring the database down.  You need to stop your Retlang-using service from processing (or at least certain process contexts). 

Luckily, Retlang is a very well designed system with lots of injection points.  Here, we use the fact that each process context has an executor associated with it that actually processes the commands, and we pause that.  The following design decisions are worth understanding:

  • Retlang is heading increasingly towards a thread pool model, and this code only works with the original “one thread per context” model.  If you used it in a thread pool model, it would pause not only the contexts you wanted to pause, but any contexts sharing the thread.
  • Batched commands are paused halfway through.  This behaviour is what I required, but it’s easy to change.
  • Existing running commands will complete.  This is desirable, since any process that could be paused mid-command could also be split into separate commands.
  • There’s no way to tell when the currently running commands complete.  This is unlikely to be a problem for you.

To use it, just create the context with this executor.  As you can see, the HaltableExecutor passes its commands on to an underlying executor, which allows you to add logging or any custom logic you’ve already implemented.

using Retlang;
using System.Threading;

namespace ColourCoding.Parallel
{
    public class HaltableExecutor : ICommandExecutor
    {
        ManualResetEvent _block = new ManualResetEvent(true);
        ICommandExecutor _underlying;
        bool _isHalted = false;

        public HaltableExecutor()
        {
            _underlying = new Retlang.CommandExecutor();
        }

        public HaltableExecutor(ICommandExecutor underlying)
        {
            _underlying = underlying;
        }

        public void ExecuteAll(Command[] toExecute)
        {
            foreach (Command command in toExecute)
            {
                // WaitOne blocks here while we're halted, so batches pause
                // between commands rather than mid-command.
                if (_block.WaitOne())
                {
                    _underlying.ExecuteAll(new Command[] { command });
                }
            }
        }

        public void Halt()
        {
            lock (_block)
            {
                _isHalted = true;
                _block.Reset();
            }
        }

        public void Resume()
        {
            lock (_block)
            {
                _isHalted = false;
                _block.Set();
            }
        }

        public bool IsHalted
        {
            get
            {
                lock (_block)
                {
                    return _isHalted;
                }
            }
        }
    }
}
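As a sketch of the pause semantics (the Retlang context wiring is omitted here; I’m only exercising the executor directly, and the comments describe the intended sequence rather than a complete program):

```csharp
// Wrap the default executor, hand the result to your process context,
// and halt it when you need to take the database down.
var executor = new HaltableExecutor();

// ... create the Retlang process context with this executor, then later:
executor.Halt();
System.Diagnostics.Debug.Assert(executor.IsHalted);

// Commands queued while halted wait at the WaitOne call in ExecuteAll.
// When the database comes back up:
executor.Resume();
System.Diagnostics.Debug.Assert(!executor.IsHalted);
```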