Okay, first off, there hasn’t been a post in, conservatively forever. There have been a number of reasons for this: holidays, people visiting during the weekend (when I write this blog) and swine flu are probably the top three. That and I managed to lose my post on Liskov, so I’m going to have to re-write. But anyway, I thought that more than one person I know could do with reading a quick guide to SQL Server indexes.
Basics of SQL Server Performance
The first thing you need to know about is pages. SQL Server puts rows onto pages. If you access a row, the DB will load the entire page. Whenever you’re talking about performance, 99% of the time you’re talking about how many pages get accessed. Pretty much everything else, including processor time, is irrelevant. Well, unless you decide to do something stupid like a user-defined function or a trigger. But for straight DB access, pages are what it’s all about.
Now, an index is basically a tree. It’s a tree of pages. The top level of the index is a page with rows. Each row points to the page representing the next level of the index. This works all the way down. Finally, you get a link to the actual row itself. This is your basic non-clustered index.
There’s a few things to note:
- If you’ve got 5 columns in your table, but only two in your index, the index rows will only contain two index columns.
- The rows in the index will be ordered, typically in ascending order. (You can do it descending, but it’s rarely useful.)
This can speed up the following operations:
- Filtering on the basis of columns in the index
- Ordering using the exact ordering in the index
- Selecting just the columns of the index.
The second one is highly unlikely to be used, the first is the typical usage, but the last is really rather interesting. Let’s say that you’ve got two columns in your index, and your query only uses those two columns. Then, since the index structure is pretty much identical to the structure of the table itself (rows on pages), it can actually query the last level of the non-clustered index and never actually hit the table at all. If you’ve got some “lite” versions of tables in your database structure, now would be a good time to throw them away and just replace them with indexes. This is called “index covering” and is a vastly under-used technique.
Every index you add has to be maintained on every insert and update. This, obviously, adds to the cost of entering data into the system. In practice, most systems I encounter could easily handle more non-clustered indexes, since the alternative is dreadful select performance. However, keep an eye on it and measure it.