Monday, February 13, 2012

Clustered vs. Non-Clustered Index

All About SQL!
The other day someone asked me a question about the difference between clustered and non-clustered indexes in SQL Server and why can you only have one clustered index in a table? It is one of those things that you know but often find it difficult to explain in a non-technical manner, at least that's how I felt. So, what is the difference and why can you only have one clustered index?
Clustered Index: Clustered index sort the data physically in the data table according to the keys in the clustered index. Basically all the data rows are physically sorted according to the clustered index. Since you can only sort data rows one way, there can only be one clustered index. The lowest level of clustered index are the data rows themselves. If you have a primary key defined in the table, then SQL Server automatically creates a clustered index on the primary key.
You don't have to have clustered index and the table without a clustered index stores the data on a heap. Data inserted in this table will return in the same order it was inserted in, unless you apply ORDER BY clause to your query which is SLOW!!!
Non-Clustered Index: A Non-Clustered index on the other hand is stored independently of the data rows' physical order. Think of this as a logical order of the data rows and hence there can be more than one non-clustered index. Non-clustered index's lowest level contains the keys used in the index and are a pointer to the actual data rows containing that key. This pointer is called a row location and its structure depend on whether the table has a clustered index or not. As mentioned earlier, if a table does not have a clustered index, the data rows are stored in a heap and in this case the row locator is simply a pointer to the row. For tables that do have a clustered index, the row locator is the key of the clustered index.
Both clustered and non-clustered indexes can be unique. Primary key by default is a unique index whether it is defined as a clustered or non-clustered index. Unique indexes ensure that no two rows can contain the same keys used in the unique index.
IGNORE_DUP_KEY: When creating a unique index, you can also use "IGNORE_DUP_KEY" hint. When a duplicate key is inserted, if IGNORE_DUP_KEY was specified, SQL will issue a warning and ignore the duplicate row, but if it was not specified, SQL server will issue an error message and roll back the entire insert. IGNORE_DUP_KEY is only allowed when you create a UNIQUE clustered or a non-clustered index.
FILL FACTOR: A Fill Factor defines how densely the index will be packed when it is created. If you are going to have many inserts and updates, you should use a lower fill factor to leave room for new data, which reduces the number of page splits that may occur when data is inserted. The space doesn't change when data is inserted or updated, only when index is created, so it is recommended to rebuild indexes periodically. If the table is rather static, create an index with a high fill-factor.
Indexes generally speed up the retrieval, update and deletion but often slow down inserts because every new record will have to be added to the index. Care should be taken in creating the number of indexes on a table, especially when data is frequently inserted.
While you can create some indexes during database design, often time it is not possible to know the table usage until it is out in the field. Once the database is out in the field and you want to optimize certain heavily used tables, you can do the following
  • In SQL Management Studio (SQL 2005 or SQL 2008) go to Tools > SQL Profiler and run the trace for the tables that are heavily used.
  • Collect some metrics and save the file.
  • Go to Tools > Database Engine Tuning Advisor and load the workload file, select the database and start analysis.
  • Review the recommendations and apply them to the appropriate tables.
I have generally found the tuning advisor to be a good tool, but be careful in applying all its recommendations. As always, do this in staging environment first, before applying to your production database.
As always, your comments are welcome and feel free to correct me if I made any mistake!

1 comment:

  1. Thanks - this is really good for my understanding. Another good post as well:

    Clustered vs non clustered index

    ReplyDelete

As always, your comments are welcome and appreciated!