Monday, June 11, 2012

Compressing your data


If deleting data is not an option, SQL Server 2008 and 2012 provide another option to compress individual tables by compressing rows or pages.

Before you compress a particular table, it may be prudent to know which table will provide you the biggest gain. In my previous post I talked about running a script to determine the tables with largest amount of data which can be used to determine which tables to tackle first.

Let's assume you have determined the table(s) to compress, fire up your management studio and then right click on the table > Storage > Manage Compression


Click on Compression and select Compression Type - Row Level Data Compression or Page Level Data Compression.

After you select compression type, you can click on Calculate which will show you how much you will gain.
  • Row Level Data Compression: This compression turns the fixed length data types into variable length data types and frees up empty space. As you can image, a table with more columns of char/nchar data type will yield more space. Row level compression can also ignore zero and null values resulting in additional space savings. 
  • Page Level Data Compression: When you choose this compression type, it starts with row level data compression and then adds additional compression by performing two extra layer of compression...
    • Prefix Compression
      • SQL Server analyzes the repeated pattern at the beginning of the values in each column and substitutes them with an abbreviated reference which is stored in the compression information structure. This type of compression only works on one column.
    • Dictionary Compression
      • It works similarly to prefix compression except that it searches for repeated patterns anywhere in a page and replaces them with an abbreviated reference and stores the reference information in the compression information structure.

How much savings you will gain by the page level compression depends on the patterns of the data. Repeatable patterns will result in more savings, whereas random patterns will result in less savings.

As you can imagine, this will result in performance degradation, but how much and whether it is acceptable will depend on your environment.

Thank you.




No comments:

Post a Comment

As always, your comments are welcome and appreciated!