Why Your Code Has Lots of Spelling Mistakes

Spelling mistakes are cognitive noise. Everyone makes them, but you’d be hard pressed to find any obvious ones in the average word document for a simple reason: Word’s got a spell checker. So, for that matter, has Firefox. In general terms, if you’re writing English text, a spell checker’s got you covered.

So, what’s the state of play for spell checkers on programming language text? Parlous. Forget contextual errors and weird homophones, just basic “Is this word in the dictionary?” stuff doesn’t work in most editors.

I probably wouldn’t have thought that deeply about the problem if it wasn’t for a rather excellent plug-in for Visual Studio with a profoundly boring but search friendly name. This does everything I need and improves my day job immensely.

So, I thought, how about I write something to do the same thing for Visual Studio Code? Well, I’ll save you reading the rest of the article and say that you are basically doomed. And if you use Vim or Emacs and are feeling superior right now, let me tell you right now that you’re basically doomed for the same reason.

Check Yourself

Let’s start with what a spell checker for a programming language should do:

  • It should be able to spell check comments.
  • It should be able to spell check identifiers.
  • It should be able to distinguish between identifiers defined in the file from identifiers that are not, or you’re going to spend your life with errors you can’t correct.

This is pretty minimal. I’d also like to be able to include directives that specify language etcetera.

The comment requirement’s not too bad. There’s only a couple of comment formats out there and you could easily write some code to support them all. Identifiers are a whole different ball game, though. To recognize when something’s an identifier, you need at least a lexer for the language. Even that’s surmountable: there’s CSON files for most languages out there. The final requirement is the one that’s going to break you. Recognizing when and where an identifier is defined requires a parser.

Just as well we’ve got language servers to help us out isn’t it? No, they’re not going to help a bit. Language servers don’t really have any high level protocols. They just say “offer the following actions here”. What our putative spell checker plug-in would need is a protocol that said “this is the definition of a PascalCase identifier”. This is kind of doubly moot since Visual Studio Code only allows one language server at once, so any hope for modularity is gone before we started.

So, what can be done? Well, you could write a spell check library (most of the hard work’s already done for you) and then try to submit PRs to every last language server there is. This sounds like a prohibitively large amount of unpaid work.

Check Please

Unless your editor has spell checking built in and a way for your language plug-ins to interact with it and explain their identifier conventions et al, the chances are you’ll never have a decent spellchecker in your code editor. Meanwhile your code will continue to have spelling errors and your colleagues will curse you for being an illiterate. Of course, you’ll be cursing them at the same time whilst staring at the screen displaying the tool whose fault it really is.

 

Published by

Julian Birch

Full time dad, does a bit of coding on the side.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s