All About SQL!

Wednesday, September 9, 2015

Pivot Function

Normalization principle dictate that we keep our tables as narrow as possible, i.e. only few columns wide to keep the table schema simple and performant.

It does create a problem when you need to display data in a columnar format or pivot format. For example take a look at the following query...

SELECT C.Name, O.Total As Total,O.Year
FROM Customers C INNER JOIN Orders O ON C.CustomerID = O.CustomerID
The result will look something like this...

but what if you want to display the information in a columnar format, like an excel spreadsheet with Names going vertically but years going horizontally.

Using Pivot function you can achieve just that. Take a look at the query below...

SELECT * FROM
(SELECT C.Name, O.Total As Total,O.Year
FROM Customers C INNER JOIN Orders O ON C.CustomerID = O.CustomerID
) A

PIVOT
( SUM(Total) FOR Year IN ([2006],[2007])

) AS TotalAmt

The result will be as follows...

Data is displayed in much more readable and organized manner.

PIVOT function allows you to pivot a column for another column, as we did in the above example, i.e. pivot Total for Year.

You may be wondering about using nested query. If you don't use nested query you will get a duplicate field error. This is because as you can see from the previous result set, there are multiple customers with same name and we are joining two tables (Orders and Customers) on Customer ID field which results in duplicate field error when trying to pivot.

Thank you.

Monday, September 7, 2015

Transparent Data Encryption (TDE)

Yesterday, we reviewed multiple encryption strategies available to secure data. Starting with SQL Server 2008, Microsoft introduced Transparent Data Encryption (TDE). Today, we will review TDE in detail.

TDE is also knows as data at rest encryption because the data is encrypted while saving to data store using keys. Anyone without keys will be unable to use data, even if they were able to get hold of it. TDE is a real-time I/O encryption / decryption technique. When data is being saved to the data store, both data and log files are encrypted. Conversely, when being read, the data is decrypted in real time.

The encryption / decryption process uses a database encryption key, also called as DEK. This key is stored in the database boot record to ensure it is available for recovery.

Database Encryption Key is a symmetric key which is secured by a certificate stored in the master database. It could also be an asymmetric key which is protected by an EKM module. Both AES and 3DES algorithms can be used to encrypt/decrypt data at this level. TDE encryption is performed at the page level. Page is encrypted before saving to data store and decrypted when read from the data store.

If you move the database from one server to another, you must decrypt the database first, move the database, restore and then re-enable TDE on the destination server.

When you enable TDE, make sure to backup the certificate and the associated private key. If the certificate ever becomes unavailable, you will need the backups of both the certificate and the private key to read the data.

Even if you remove TDE, make sure to keep the certificate handy because some transaction logs may still be encrypted.

Below is the architectural illustration of TDE

Following commands can be used to enable TDE

USE master;
GO
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strongpassword>';
go
CREATE CERTIFICATE MyServerCert WITH SUBJECT = 'My DEK Certificate';
go
USE <Your_Database>;
GO
CREATE DATABASE ENCRYPTION KEY
WITH ALGORITHM = AES_128
ENCRYPTION BY SERVER CERTIFICATE MyServerCert;
GO
ALTER DATABASE <Your_Database>
SET ENCRYPTION ON;
GO

Thank you.

Sunday, September 6, 2015

Data Encryption

SQL Data Encryption Techniques

With several recent high profile hacks many companies, especially the ones that operate in healthcare space or retail space are worried about whether they are HIPPA and PCI compliant.

Most organizations have taken steps to ensure physical security and have deployed robust tools such as expensive firewalls, SSL and digital code signing techniques.

But, often the data at the database layer is left un-encrypted and even though there are various options to encrypt the data, there is often confusion about the optimal strategy.

Today, we will briefly discuss various options available to encrypt the data at the database level.

Transparent Data Encryption (TDE) - first introduced in SQL 2008.
Custom Encryption - Programmatically encrypting sensitive data before saving to database.
Using SQL Views / Triggers without having to modify your code.
Using third party tools that encrypt and decrypt the data seamlessly using encryption keys.
Drive Level encryption - Using third party tools or built in OS level encryption you can encrypt the drive where the data is stored.

Options 2 thru 5 are self-explanatory, in the next post we will discuss TDE in more detail.

Thank you.

Saturday, August 4, 2012

Duplicate a table with or without the data

I have often been asked to purge data from a table, but if you work with databases, you know never to purge anything from the production database without having a restore plan, in case you need it back. I have used several tricks to ensure I don't lose data.

I've seen some folks export the data out to a spreadsheet, make a backup of the database etc. While it works, it is time consuming specially if you need to just preserve one table. A quick solution is to create another copy of the original table and then delete data from the main table. In order to quickly create an identical copy of the table along with data, you can run SELECT * INTO query.

For example, if I want to duplicate Orders table in Northwind database, I can run the following command...

SELECT * INTO OrdersCopy FROM Orders

What if I only want to create schema of a table and not copy the content. To achieve this, you can pass a false condition to WHERE clause in above script. For example...

SELECT * INTO OrdersCopy FROM Orders WHERE 1 = 0;

Any False condition will do and not just WHERE 1 = 0.

Basically SELECT * INTO statement first creates the schema and then evaluates the WHERE condition to copy data from the source to the destination table. You can also copy records that satisfy a certain criteria.

Thank you.

Monday, July 16, 2012

Difference between COALESCE and ISNULL

I have often been asked about the difference between ISNULL and COALESCE. In most cases they work the same, but there are some fundamental differences. Let's review some of them...

ISNULL is a T-SQL function while Coalesce is an ANSI Standard.

ISNULL only takes two arguments and basically evaluates the first argument for null value, if true, it returns the second value back. For example, if you execute the following command

SELECT ISNULL(NULL,NULL) you will get NULL as your result set. Now try executing the same statement with COALESCE -- SELECT COALESCE(NULL,NULL), you will receive the following error

At least one of the arguments to COALESCE must be a typed NULL.

What does this error mean? Does it mean you can't pass two null arguments to Coalesce? What it really means is that at least one of the argument must be of a particular data type even if it is NULL. For Example, I can do the following..

DECLARE @First Varchar(10)
DECLARE @Second varchar(10) and execute the following statements, they will all work fine.
SELECT COALESCE(@First,@Second)
SELECT COALESCE(NULL, @Second)
SELECT COALESCE(@First,NULL)

Let's run another test...

DECLARE @First char(2) and run the following statements
SELECT ISNULL(@First,'This is a Test') -- returns Th
SELECT COALESCE(@First,'This is a Test') -- returns This is a Test

The data type in ISNULL is determined by the first argument. In this case, the data type of the first argument is Char(2) so it only returns 2 characters of the non-null argument which is the second argument. Coalesce on the other hand determines the data type based on the precedence. Check out the following article about data type precedence ( http://msdn.microsoft.com/en-us/library/ms190309.aspx).

Depending on how you use one or the other, you could actually get different results or an error. For example, if I run the following code...

DECLARE @First varchar(4)='Test'
DECLARE @Second INT = 1
SELECT ISNULL(@First,@Second)
SELECT COALESCE(@First,@Second)

ISNULL will work just fine but COALESCE will generate an error. This is because an INT has higher precedence than varchar and COALESCE will try to convert @First to INT, generating an error. If @First is NULL, then it will work fine and return you the value of @Second.

COALESCE can take more than 2 arguments and will work like Switch/Case (Select/Case) working down the cases until it finds the non-null value or until it has evaluated all the arguments.

Performance wise they both are same in most situations, except there are cases where SQL Server generates different execution plan based on whether you use ISNULL or COALESCE. This generally happens when you are using scalar sub-queries as part of your arguments to ISNULL or COALESCE.

Thank you.

Monday, July 9, 2012

T-SQL Best Practices

When writing a T-SQL either as an ad-hoc query or a stored procedure, besides ensuring the queries are fully optimized, you should keep few things in mind. Although none of these are required, but trust me someone reading your queries in the future will thank you for it, and it may even save you a headache in the future. I don't have many, but there are few guidelines / best practices that I try to adhere to.

Indentation
Properly Indenting your query makes it so much easier to read and / or troubleshoot.
For example, consider the following query..

SELECT FirstName,LastName,EmailAddress,Address1,Address2,Phone,Fax FROM Customers

Can be rewritten as...

SELECT
    FirstName,
    LastName,
    EmailAddress,
    Address1,
    Address2,
    Phone,
    Fax
FROM Customers

It is certainly easier to read and you can quickly figure out what columns are being selected. You can make it even more useful by making the following change...

SELECT
    FirstName
    ,LastName
    ,EmailAddress
    ,Address1
    ,Address2
    ,Phone
    ,Fax
FROM Customers

By using "," as a prefix in subsequent columns, you can easily comment out a line containing the column that you don't want your query to return instead of having to rearrange it.

Using Square Brackets
Although square brackets are not necessary unless you have a space in column name, using it ensures that if your column name ever becomes a reserved word in SQL, you query won't break. For example, you can write the above query as follows...

SELECT
    [FirstName]
    ,[LastName]
    ,[EmailAddress]
    ,[Address1]
    ,[Address2]
    ,[Phone]
    ,[Fax]
FROM [Customers]

Using Statement Terminators
Using semicolon (;) at the end of your SQL Statement. If you use C# or C++ or Java for application development, this may be second nature, but not so much for VB Developers. Although SQL Server currently doesn't require semicolon in most cases, it may change in the future as it is part of the ANSI SQL-92 standard. So to avoid future headaches, always use semicolon. Thus the proper way to write the above query would be...

SELECT
    [FirstName]
    ,[LastName]
    ,[EmailAddress]
    ,[Address1]
    ,[Address2]
    ,[Phone]
    ,[Fax]
FROM [Customers];

If you have multiple statements in your query, use semicolon after each one. For example, I can add SET NOCOUNT ON in my query above.

SET NOCOUNT ON;
SELECT
    [FirstName]
    ,[LastName]
    ,[EmailAddress]
    ,[Address1]
    ,[Address2]
    ,[Phone]
    ,[Fax]
FROM [Customers];

Capitalize all T-SQL Keywords
Capitalizing T-SQL Keywords such as SELECT, FROM, WHERE, JOIN etc. makes it so much easier to read your query and to differentiate between the T-SQL keywords and your own columns etc.

Happy Programming!