Friday, March 2, 2012

SQL Server Statistics

When you run a SQL Query, SQL Engine has to choose the best path to execute that query. It can opt for a table scan, i.e. scanning entire table to determine the result set. It can look at the indexes to determine the best way to retrieve the data or it can use statistics information it collects to execute the query. The goal is to minimize the query execution time for faster data retrieval and minimal data locking.

Statistics allow SQL Server to keep information about the number of records in a table, page density, histogram and any available indexes to determine the best path to execute a requested query.

Starting from SQL Server 2000, all versions have ways to collect necessary information and create / update statistics, provided this feature is on (it is on by default) and for the most part you don't have to do anything.

You can also manually create / update statistics.

Implicit Statistics Creation and Update
When automatic statistics creation and update is enabled (default is on), anytime you execute a query with a WHERE or JOIN clause with a condition column, the statistics is automatically updated or created if necessary.

Manually Create and Update Statistics
You can also manually create / drop / update statistics with either the default sampling rate or your own desired sampling rate.

Let's use an example to check when the statistics is automatically created and then manually create statistics.

--Create a new Table in Temp database

USE TempDB
GO
CREATE TABLE Customers
(
   CustomerID INT IDENTITY,
   FirstName varchar(50),
   LastName varchar(50),
   EmailAddress varchar(255),
   PhoneNumber varchar(15)
)
GO

--Let's insert some records
INSERT INTO Customers(FirstName,LastName,EmailAddress,PhoneNumber)
VALUES('Jane','Doe','janed@email.com','404-111-1111')

INSERT INTO Customers(FirstName,LastName,EmailAddress,PhoneNumber)
VALUES('John','Doe','johnd@email.com','404-111-1111')

INSERT INTO Customers(FirstName,LastName,EmailAddress,PhoneNumber)
VALUES('Sara','Lee','saral@email.com','404-222-2222')

INSERT INTO Customers(FirstName,LastName,EmailAddress,PhoneNumber)
VALUES('Chris','Smith','csmith@email.com','404-333-3333')

INSERT INTO Customers(FirstName,LastName,EmailAddress,PhoneNumber)
VALUES('Shania','Rogers','sr@email.com','404-444-4444')

GO

--now lets check if there is any statistics on this table
sp_helpstats N'Customers', 'ALL'
GO

--Following message is displayed
--This object does not have any statistics or indexes.

--Now run a query
SELECT * FROM Customers WHERE LastName='Lee'
GO
--check statistics again
sp_helpstats N'Customers', 'ALL'
GO

--following message is displayed
statistics_namestatistics_keys
_WA_Sys_00000003_0CBAE877LastName
--If you create an index on this table, it will also automatically create statistics. CREATE NONCLUSTERED INDEX ix_Email ON Customers(EmailAddress) GO --check statistics again sp_helpstats N'Customers', 'ALL' GO --Query Analyzer will now display following message
statistics_namestatistics_keys
_WA_Sys_00000003_0CBAE877LastName
ix_EmailEmailAddress
--Let's create an statistic manually CREATE STATISTICS stat_Name ON Customers(FirstName,LastName) -- multicolumn GO --check statistics again sp_helpstats N'Customers', 'ALL' GO --Query Analyzer will now display 3 statistics
statistics_namestatistics_keys
_WA_Sys_00000003_0CBAE877LastName
ix_EmailEmailAddress
stat_NameFirstName, LastName
--If you want to know what columns are part of statistics and also the range as well density, --you can run the following DBCC command. DBCC SHOW_STATISTICS (N'Customers', ix_Email) GO --You can also also define a sample size when creating a script. --For example, in the above create statistics you can --also pass the same size. CREATE STATISTICS stat_Name ON Contact(FirstName,LastName) WITH SAMPLE 75 PERCENT GO --While a larger sample size is better, because it will result --in faster query execution, but large sample also means statistics --creation will take longer because engine has to scan the table more. --To drop a statistic, you have to provide object name.statistics name DROP STATISTICS Customers.stat_Name GO --As I mentioned previously, by default SQL Server --automatically creates and updates statistics. --You can turn auto scanning off or on by running the following query ALTER DATABSE <yourDBName> SET AUTO_CREATE_STATISTICS OFF GO ALTER DATABSE <yourDBName> SET AUTO_CREATE_STATISTICS ON GO

There have been several improvements in this area in SQL Server 2008 and you rarely have to do anything with them, but choices are available should you need them.

Thank you.



No comments:

Post a Comment

As always, your comments are welcome and appreciated!