Indexes are an incredibly important tool that, if used correctly, will help you see huge performance improvements in your database. This article takes you through the basics.
As a database grows, queries that were once fast slow down quite considerably. If you have any queries of even slight complexity (for example using joins and group by) the performance problems start to appear all the sooner.
The reason is simple: the database needs to find the data that is part of a query each time the query runs. If there is lots of data, and if it is in no particular order, this can be a slow and intensive process.
To speed things up, we use indexes.
Two types of Index
A clustered index is an index that affects the order in which data is stored on the physical disk. If I have a phone book, and I create a clustered index on the LastName field, the record for, e.g., Mr Cumberdale will be stored before, e.g., Mrs Stewart-Baxter.
A non-clustered index is more like an index in a book. It points to where a record can be found.
Both indexes are optional for a table in a database but most tables contain a clustered index by default. Only one clustered index is possible for a table (since the data can only be stored once and in one physical order).
When you create a table that contains a Primary Key, that field will typically become the clustered index automatically. This is by far the best typical usage for clustered indexes on most tables, and you should leave it this way until you more fully understand indexes. Even then, you will almost certainly leave it this way more often than not.
How to Index
In Enterprise Manager (SQL Server 2000), right click the table to be indexed. In the pop-up menu, choose Manage and then Indexes. You can click ‘Add’ to create a new index.
In SQL Server Management Studio (SQL Server 2005 and later) ‘open’ the table’s icon in the navigation pane, and then open the sub-folder ‘Indexes’. Right click the Indexes folder and choose ‘New Index’.
I usually name my indexes “Idx_TableName”. If I need more than one index (mostly you won’t, and you should certainly try and avoid it) I use another _Suffix to describe the reason for the index.
You click the tick-box next to each column name to add it to the index. In SQL Server 2005 you need to click the ‘Add’ button to see these check-boxes.
What to Index
You must not index all of the columns in a table. Instead, you should index the columns that are used for joining, searching and ordering.
For example, consider this query:
SELECT PersonId, FirstName, LastName, PhoneNumber, CompanyName
FROM People p
INNER JOIN Company c
ON c.CompanyId = p.CompanyId
WHERE p.LastName LIKE 'a%'
ORDER BY p.FirstName, p.LastName
We’re going to index the People table.
First of all, I always index the primary key column, even if it already has an index of its own. I don’t believe this is necessary, but I used to think that it helped the query ‘link’ the data in one index to the real data (via the clustered index) and it certainly doesn’t hurt.
Next, we index columns that we JOIN on. In this case, CompanyId. This is quite crucial, especially where either table is large.
Additionally (and equally importantly), we index columns that we search on in our WHERE clauses. LastName is the only column in this particular query, but FirstName is another likely candidate.
Finally, we index columns that are part of the ORDER BY: FirstName and LastName again.
So, our index will contain the following columns:
- PersonId
- CompanyId
- LastName
- FirstName
Notice that we didn’t index PhoneNumber – that’s because we don’t search on it. We can imagine other non-searched columns – Notes, FavouriteColour, etc etc. They would not be indexed either.
Column Order in Indexes
The order of the columns in your indexes is important. If a column is used for joining to other tables much, it is a candidate for being very near the top of the list, after the primary key.
Out of the non-joined columns (e.g., those used purely to search) that remain in the index, you should try and consider the type of data in those columns. Columns that are ‘more distinguishing’ should be placed nearer the top. E.g., City should be placed before Country, since for most records Country will be the same (and will not distinguish one record from another).
However, you might search far more often on one column than another, even though it is less distinguishing. In this case, it makes sense to place the more-often searched column nearer the top. E.g., Town vs Street.
Indexes slow things down!
Please understand, indexes will make your database significantly faster. However, indexes only improve the performance of SELECTs. They actually slow down UPDATEs, INSERTs and DELETEs. This is because the index needs to be updated and sometimes rebuilt when data changes. Thus, the smaller your index, the less impact it will have when data changes.
If you have a clustered index on anything other than an auto-incrementing IDENTITY field, you must be aware that updating the data in that field will potentially require lots of data to be physically moved between files. This can have a significant impact on performance!
Practice makes perfect
There is no substitute for experience. The more you work with databases the more you will naturally understand what to index and when. It is a vital skill that makes the difference between a sluggish system and a lightening fast one.
I hope this advice is useful to you. Please feedback if you notice anything that I have missed or could improve.
About
We are a small British company that produces business-oriented software and solutions. These articles are a product of our daily work - information that we think might be useful to share. We hope you find them useful.
Our Software
These are some of our products. Several are open source, some are web-based and others are proprietary:
Categories
- .NET (10)
- Apple (2)
- Business (5)
- CSS (1)
- HTML (2)
- Innovation (4)
- Java (4)
- Javascript (1)
- Leadership (1)
- MySQL (2)
- Oracle (6)
- Postgres (1)
- Programming (5)
- Rails (4)
- Ruby (10)
- SQL Server (9)
- Subversion (1)
- Web (5)
- Windows Server (2)
Archives
- July 2010 (2)
- September 2009 (5)
- August 2009 (1)
- July 2009 (12)
- June 2009 (16)
- May 2009 (3)