Monday, February 13, 2012

Benefits of Database Indexing

All About SQL!
In my previous Post I talked about various types of indexes you can create in SQL Server. Today we will discuss the benefits of Indexing and why you should have at least a clustered index in every table.
Every database system offers some kind of indexing that allows for sublinear time lookup for increased performance as opposed to linear search which is highly inefficient. For example, if a table is un-indexed, database engine will examine each record to find the searched record, resulting in approximately half the data rows being scanned on an average. Further, if the searched record doesn't exist, all the data rows will be scanned before a match not found result is returned. Indexing allows for faster searches resulting in improved performance.
Indexing however is a double edged sword. While more indexes generally will result in faster data retrieval from a table, but the more indexes you have, the longer it will take to insert new data in the table.
Depending on your table definition, some indexes are automatically created by the SQL Server. For example, to enforce a UNIQUE constraint. You can create other indexes either via CREATE INDEX statement or by using SQL Management Studio.
Below are some of the guidelines you can use when creating indexes. This is not an exhaustive list and may not be suitable in all situations, so use this as a general guideline.
1. When creating an index, choose the right data type. Certain data types are more suitable for indexing than others. Integer data types (INT, BIGINT, SMALLINT, TINYINT) are good candidates for indexing because of their specific size and are easy to compare and perform other mathematical operation. CHAR, VARCHAR etc. on the other hand are much less inefficient.
2. Make sure that indexes are actually used. I have often seen database schema where indexes were created in advance but any of the data retrieval operations don't use the keys that are indexed.
3. Also keep in mind when a retrieval query applies a SQL function to the keys that are part of the index, the indexed value is not used in the query. For example, you convert a date column which is part of the cluster to a string while retrieving the data. In this case, the index won't be used in the retrieval process.
4. If you create an index with multiple columns, pay attention to the order of the columns. Data is stored in order the columns are used in the index. The column with less unique data should be used first and so on.
5. Remember each non-clustered index column is a pointer to the clustered index columns (assuming clustered index exists in the table). Hence, care should be taken in selecting the number of columns used in the clustered index. For the same reason, try to avoid or limit using the columns that are frequently updated in a clustered index.
6. Be extra careful when rebuilding a clustered index. I have seen it many times where someone has created a SQL job to drop a clustered index and then used the "CREATE INDEX" command to recreate the index. When you drop a clustered index and then recreate it, you end up rebuilding non-clustered indexes multiple times. To avoid this, use DROP_EXISTING clause along with CREATE INDEX which will rebuild non-clustered indexes only once.
As explained in previous Post, use Fill Factor wisely.
I realize this is a long post but I wanted to keep all the crucial points in one blog to make it easier to read. As always your comments are appreciated and please don't hesitate to add other points I may have missed or let me know if I missed anything.

No comments:

Post a Comment

As always, your comments are welcome and appreciated!