Saturday, March 24, 2012

SQL Server Collation

Collation determines the rules for sorting / comparing the string characters based on a specific language / locale. For example, depending on your collation, the "ORDER BY" clause may return different results. Default collation in SQL Server is Latin1_General which would sort "Children" before "College" when you run an ORDER BY ASC clause. If your database collation however was "Traditional_Spanish", the sort order will be College before Children because in Spanish "Ch" will be treated as one word and will come after all words beginning with "C".
A specific code page is associated with non-Unicode characters such as char, varchar, text. For example, Latin1_General collation Char is interpreted by SQL Server using 1252 code page. Multiple collations may use the same code page. Unicode data such nchar, nvarchar, ntext doesn't use code pages to handle the data interpretation.

In addition to using SQL Server Collations, you can also use Windows Collations. When using Windows Collation, SQL Server will use collation of the windows OS it is running on to determine the sorting of the characters. To know more about Windows and SQL Server Collations, checkout the following MSDN article - http://msdn.microsoft.com/en-us/library/ms175194.aspx

Determining which Collation to Use
If your application is going to be used where all users speak the same language, you should use the collation that supports the language of your user base. If however your users may speak different languages, you should pick the best available collation that would support most of the languages. For example, Latin1_General collation will support western European languages. Alternatively,you can use Unicode data-types such as nchar, nvarchar, ntext (we will discuss implications of using Unicode data-types in future post). Even though Unicode data-types don't use code pages, it is good practice to pick the collation for the language(s) used by majority of your users in case a column or a variable is declared as a non-Unicode data-type.

Collation can be specified at the Server level, Database level, Column, Parameters or variable level. When you install SQL Server, you can specify a collation, which will be the default collation for all the lower level objects i.e. Database, Column etc. You can also change the collation at each level.

Database level Collation
You can specify database collation when creating a database either via management studio or via T-SQL. In Management Studio, when you are creating a database, go to options and pick the collation from the drop-down list. To specify collation using T-SQL you can use a script like this one...
USE master;
GO
CREATE DATABASE MyCollationTest
COLLATE French_CI_AI;
GO
Verify collation
SELECT name, collation_name
FROM sys.databases
WHERE name = N'MyCollationTest';
GO


You can also change the Collation of a database after it has been created. Before you change the collation of an existing database, make sure you are connected to the database in a Single User mode. Also, if any Schema-bound objects such as UDFs, Computed Columns etc. depend on current collation, SQL Server will generate an error. You can change use the following script...
USE master;
GO

ALTER DATABASE MyCollationTest
COLLATE SQL_Latin1_General_CP1_CI_AS ;
GO



Column level Collation
You can specify a different collation for char, nchar, varchar, nvarchar, text or ntext columns. When creating a new table or modifying an existing table via Management Studio, you can specify the collation in column properties section. Alternatively you can use T-SQL like this...
CREATE TABLE MyTable
  (ID   int PRIMARY KEY,
   Name      varchar(50) COLLATE French_CI_AS NOT NULL
  )
GO
OR

ALTER TABLE MyTable ALTER COLUMN Name
            varchar(50)COLLATE Latin1_General_CI_AS NOT NULL
GO


You cannot alter the collation for a computed column, an indexed column, or a column is used as a foreign key, has a check constraint or is part of the statistics statement.

You can also specify which collation to use in ORDER BY clause of your query. For example...
USE AdventureWorks2008R2;
GO
SELECT LastName FROM Person.Person
ORDER BY LastName
COLLATE Traditional_Spanish_ci_ai ASC;
GO

For the most part default collation works and if you always use Unicode data types, then you don't need to specify collation, but different collation types exist, if you need them.

Thank you.

No comments:

Post a Comment

As always, your comments are welcome and appreciated!