Sequence number breaks in FIX sessions seem to terrify most people. This is understandable: all of a sudden, a perfectly decent connection refuses to come up. What’s worse, if you get it wrong whilst fixing it, you run the risk of trashing data. However, not understanding them is more dangerous still. For one thing, if you don’t understand them, it’s unlikely you’ll be able to prevent them. For another, even if you never take a system down during trading hours, it could still crash, and then you’ll have a break to deal with whether you like it or not.
Anyone who deals with a FIX system should know how to resolve a sequence number mismatch. I recommend practicing on a test system, though.
A Refresher on How FIX Works
Let’s say that Alice is sending Bob messages. Since they don’t want to miss a message, Alice labels every message with a sequence number that goes up by one each time. If Bob gets a sequence number that’s higher than he expected, he knows he’s missed something and just sends Alice a request to resend the missing messages.
Now, breaks occur when systems recover from an outage. When they start off again, Alice and Bob tell each other where they think they’re up to. If Bob gets a sequence number that’s lower than he expected, he gets worried. Alice seems to have forgotten she’s sent some messages. Bob now doesn’t trust Alice and refuses to talk to her until she gets her head together. That’s a sequence break.
So, here’s what can happen:
- Counterpart sequence number too high: Bob hasn’t received enough, so the connection comes up and he asks for a resend.
- Counterpart sequence number correct: Bob has received exactly the right number, so the connection comes up and there are no problems.
- Counterpart sequence number too low: Alice has forgotten something. Connection doesn’t come up.
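The three cases above boil down to a single comparison from Bob’s point of view. Here’s a minimal sketch of that logic in Python. The names `LogonOutcome` and `classify_logon` are made up for illustration and aren’t part of QuickFix or any other FIX library.

```python
from enum import Enum


class LogonOutcome(Enum):
    RESEND_REQUEST = "connection comes up, Bob asks for a resend"
    IN_SYNC = "connection comes up, nothing to do"
    SEQUENCE_BREAK = "connection does not come up"


def classify_logon(expected: int, received: int) -> LogonOutcome:
    """Classify the sequence number Bob receives against the one he expected."""
    if received > expected:
        # Too high: Bob has missed messages and can ask for them again.
        return LogonOutcome.RESEND_REQUEST
    if received == expected:
        return LogonOutcome.IN_SYNC
    # Too low: Alice has forgotten messages she already sent.
    return LogonOutcome.SEQUENCE_BREAK
```

Only the last case needs a human to step in; the first two sort themselves out.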
When a connection doesn’t come up, it’s important to figure out exactly why the sender has lost information. In practice, however, the most common cause is that someone’s forgotten a heartbeat message. QuickFix sometimes does this on shutdown.
QuickFix-Specific Stuff
If you open up the QuickFix sequence numbers file (your service has to be off to open it) you’ll see two numbers. On the left is the current sequence number for messages you’ve sent. On the right is the one for messages you’ve received. Bear in mind that the numbers are the other way around for your counterpart. If you forget this, your connection may come up, but you’ll wish it hadn’t. All you need to do is agree a pair of numbers with your counterpart.
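If you’d rather read the file with a script than by eye, something like the following would do. It’s a sketch that assumes the file holds two zero-padded integers separated by a colon, sent on the left and received on the right. That layout is an assumption, so check your own file first. `SeqNums` and `parse_seqnums` are hypothetical names.

```python
import re
from typing import NamedTuple


class SeqNums(NamedTuple):
    sent: int      # left-hand number in the file
    received: int  # right-hand number in the file


# Assumed layout: "<digits> : <digits>", with optional surrounding whitespace.
_PATTERN = re.compile(r"\s*(\d+)\s*:\s*(\d+)\s*")


def parse_seqnums(text: str) -> SeqNums:
    """Parse the contents of a sequence numbers file into (sent, received)."""
    match = _PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognised sequence numbers file contents: {text!r}")
    return SeqNums(sent=int(match.group(1)), received=int(match.group(2)))
```

For example, `parse_seqnums("0000000012 : 0000000034")` gives `SeqNums(sent=12, received=34)`.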
Bear in mind that Alice and Bob are in fact indulging in two-way communication. The QuickFix logs for your side may show a resend request followed by a logout. In those circumstances, your counterpart is probably seeing a bad sequence number on his side, which means you’re the guy who’s forgotten how much he’s sent.
Whatever number you decide upon, check the main message log for what you’re about to replay. This will usually just be heartbeats. If it isn’t, you’ve got a reliability problem.
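Checking the log by hand gets tedious, so here’s a rough sketch of how you might scan for anything other than heartbeats in the range you’re about to replay. It relies on standard FIX tags (34 is MsgSeqNum, 35 is MsgType, and MsgType 0 is a heartbeat) and assumes each log line holds one raw message with SOH-delimited fields, and that you only feed it messages you sent. The function names are hypothetical.

```python
SOH = "\x01"  # standard FIX field delimiter


def parse_fields(line: str) -> dict[str, str]:
    """Split one raw FIX message into a tag -> value mapping."""
    fields = {}
    for part in line.strip().split(SOH):
        tag, sep, value = part.partition("=")
        if sep:
            fields[tag] = value
    return fields


def non_heartbeats_in_range(lines, first_seq: int, last_seq: int):
    """Return (sequence number, message type) for every message in the
    inclusive range that is not a heartbeat."""
    flagged = []
    for line in lines:
        fields = parse_fields(line)
        try:
            seq = int(fields.get("34", ""))
        except ValueError:
            continue  # no usable sequence number on this line
        if first_seq <= seq <= last_seq and fields.get("35") != "0":
            flagged.append((seq, fields.get("35", "?")))
    return flagged
```

An empty result means the replay is just heartbeats. Anything else deserves a proper look before you touch the numbers.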
So, here’s the quick guide:
- On the left (sent): a number at least as high as the other guy thinks he’s received.
- On the right (received): a number lower than or equal to what the other guy thinks he’s sent.
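The quick guide translates directly into two comparisons. Here’s a small sketch you could use to sanity-check a pair of numbers before starting the service back up. `check_agreed_numbers` is a made-up helper, not a QuickFix function.

```python
def check_agreed_numbers(
    my_sent: int, my_received: int, their_sent: int, their_received: int
) -> list[str]:
    """Return a list of problems with the proposed numbers (empty if fine)."""
    problems = []
    if my_sent < their_received:
        # They would see a sequence number that is too low and refuse us.
        problems.append(
            f"sent ({my_sent}) is lower than what they have received ({their_received})"
        )
    if my_received > their_sent:
        # We would see a sequence number that is too low and refuse them.
        problems.append(
            f"received ({my_received}) is higher than what they have sent ({their_sent})"
        )
    return problems
```

For instance, `check_agreed_numbers(12, 34, their_sent=34, their_received=12)` returns an empty list, because both sides agree exactly.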
If you’re running QuickFix on both sides, you can always just email the file across and swap the numbers round.
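Swapping the numbers round is simple enough to script. This sketch assumes the file contents look like two runs of digits separated by a colon (an assumption about the layout, so check yours). It keeps each number exactly as written, padding included. `swap_seqnums` is a hypothetical name.

```python
import re


def swap_seqnums(text: str) -> str:
    """Turn the counterpart's file contents into the contents for your side."""
    match = re.fullmatch(r"\s*(\d+)\s*:\s*(\d+)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised sequence numbers file contents: {text!r}")
    left, right = match.groups()
    # Their sent becomes your received, and vice versa.
    return f"{right} : {left}"
```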
Deep Hacking
You can actually say you’ve sent more messages than you think you have. This can get you out of trouble, but it will completely snarf you if someone then asks for a replay, because you have no record of the extra messages. Thankfully, most counterparts don’t ask for a replay unless they lose messages.