Saturday, August 4, 2012

Duplicate a table with or without the data

I have often been asked to purge data from a table, but if you work with databases, you know never to purge anything from the production database without having a restore plan, in case you need it back. I have used several tricks to ensure I don't lose data.

I've seen some folks export the data to a spreadsheet, make a backup of the database, etc. While that works, it is time-consuming, especially if you only need to preserve one table. A quicker solution is to create a copy of the original table and then delete the data from the main table. To quickly create an identical copy of the table along with its data, you can run a SELECT * INTO query.

For example, if I want to duplicate the Orders table in the Northwind database, I can run the following command...

SELECT * INTO OrdersCopy FROM Orders

What if I only want to create the schema of the table and not copy the contents? To achieve this, you can add a WHERE clause with a condition that is always false to the script above. For example...

SELECT * INTO OrdersCopy FROM Orders WHERE 1 = 0;

Any always-false condition will do, not just WHERE 1 = 0.

The SELECT * INTO statement first creates the destination table's schema and then evaluates the WHERE condition to decide which rows to copy from the source table. This also means you can copy only the records that satisfy a certain criterion, as shown below.
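For instance, a minimal sketch that copies only a subset of rows (assuming Northwind's Orders table, which has an OrderDate column):

SELECT *
INTO OrdersCopy1997
FROM Orders
WHERE OrderDate >= '19970101' AND OrderDate < '19980101';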

Thank you.

Monday, July 16, 2012

Difference between COALESCE and ISNULL

I have often been asked about the difference between ISNULL and COALESCE. In most cases they work the same, but there are some fundamental differences. Let's review some of them...

ISNULL is a T-SQL function, while COALESCE is part of the ANSI SQL standard.

ISNULL takes exactly two arguments: it evaluates the first argument and, if it is NULL, returns the second one. For example, if you execute the following command

SELECT ISNULL(NULL, NULL)

you will get NULL as your result. Now try executing the same statement with COALESCE -- SELECT COALESCE(NULL, NULL) -- and you will receive the following error:

At least one of the arguments to COALESCE must be a typed NULL. 

What does this error mean? Does it mean you can't pass two NULL arguments to COALESCE? What it really means is that at least one of the arguments must have a known data type, even if its value is NULL. For example, if I declare two typed variables, the following statements will all work fine.

DECLARE @First varchar(10)
DECLARE @Second varchar(10)
SELECT COALESCE(@First,@Second)
SELECT COALESCE(NULL, @Second)
SELECT COALESCE(@First,NULL)

Let's run another test...

Declare a char(2) variable and run the following statements:

DECLARE @First char(2)
SELECT ISNULL(@First,'This is a Test')    -- returns Th
SELECT COALESCE(@First,'This is a Test')  -- returns This is a Test

The return data type of ISNULL is determined by its first argument. In this case, the first argument is char(2), so only the first 2 characters of the non-null argument (the second one) are returned. COALESCE, on the other hand, determines the return data type based on data type precedence. Check out the following article about data type precedence: http://msdn.microsoft.com/en-us/library/ms190309.aspx.

Depending on how you use one or the other, you could actually get different results or an error. For example, if I run the following code...

DECLARE @First varchar(4)='Test'
DECLARE @Second INT = 1
SELECT ISNULL(@First,@Second)
SELECT COALESCE(@First,@Second)


ISNULL will work just fine, but COALESCE will generate an error. This is because INT has a higher precedence than varchar, so COALESCE will try to convert @First to INT, which fails. If @First were NULL, it would work fine and return the value of @Second.

COALESCE can also take more than two arguments and works like a Switch/Case (Select/Case) statement, working down the list until it finds the first non-null value or until it has evaluated all the arguments.
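For example, a quick sketch with more than two arguments (the variable names are just placeholders):

DECLARE @First varchar(10) = NULL
DECLARE @Second varchar(10) = NULL
DECLARE @Third varchar(10) = 'Third'
SELECT COALESCE(@First, @Second, @Third, 'Fallback')   -- returns Third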

Performance-wise they are the same in most situations, although there are cases where SQL Server generates a different execution plan depending on whether you use ISNULL or COALESCE. This generally happens when you use scalar sub-queries as arguments.

Thank you.

Monday, July 9, 2012

T-SQL Best Practices

When writing T-SQL, whether as an ad-hoc query or a stored procedure, besides ensuring the queries are fully optimized, you should keep a few things in mind. None of these are required, but trust me, someone reading your queries in the future will thank you for them, and they may even save you a headache down the road. I don't have many, but here are a few guidelines / best practices that I try to adhere to.

Indentation
Properly indenting your query makes it so much easier to read and troubleshoot. For example, consider the following query...
SELECT FirstName,LastName,EmailAddress,Address1,Address2,Phone,Fax FROM Customers


It can be rewritten as...
SELECT
    FirstName,
    LastName,
    EmailAddress,
    Address1,
    Address2,
    Phone,
    Fax
FROM Customers

It is certainly easier to read and you can quickly figure out what columns are being selected. You can make it even more useful by making the following change...


SELECT
    FirstName
    ,LastName
    ,EmailAddress
    ,Address1
    ,Address2
    ,Phone
    ,Fax
FROM Customers

By using "," as a prefix on subsequent columns, you can easily comment out the line containing a column you don't want the query to return, instead of having to rearrange the list.


Using Square Brackets
Although square brackets are not necessary unless you have a space in a column name, using them ensures that if a column name ever becomes a reserved word in SQL, your query won't break. For example, you can write the above query as follows...

SELECT
    [FirstName]
    ,[LastName]
    ,[EmailAddress]
    ,[Address1]
    ,[Address2]
    ,[Phone]
    ,[Fax]
FROM [Customers]

Using Statement Terminators
Use a semicolon (;) at the end of each SQL statement. If you use C#, C++, or Java for application development, this may be second nature, but not so much for VB developers. Although SQL Server currently doesn't require semicolons in most cases, that may change in the future, since they are part of the ANSI SQL-92 standard. So to avoid future headaches, always use semicolons. The proper way to write the above query would be...

SELECT
    [FirstName]
    ,[LastName]
    ,[EmailAddress]
    ,[Address1]
    ,[Address2]
    ,[Phone]
    ,[Fax]
FROM [Customers];


If you have multiple statements in your batch, use a semicolon after each one. For example, I can add SET NOCOUNT ON to the query above.

SET NOCOUNT ON;
SELECT
    [FirstName]
    ,[LastName]
    ,[EmailAddress]
    ,[Address1]
    ,[Address2]
    ,[Phone]
    ,[Fax]
FROM [Customers];

Capitalize all T-SQL Keywords
Capitalizing T-SQL keywords such as SELECT, FROM, WHERE, JOIN, etc. makes it much easier to read your query and to differentiate between the T-SQL keywords and your own columns and tables, as in the short sketch below.
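For example, here is a short sketch that applies all of the guidelines above (the LastName filter is only for illustration):

SET NOCOUNT ON;
SELECT
    [FirstName]
    ,[LastName]
FROM [Customers]
WHERE [LastName] = 'Smith'
ORDER BY [FirstName];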

Happy Programming!




Friday, July 6, 2012

SQL SERVER - Little things can save you a big headache

Recently I came across a situation where someone was running a query, basically copying data from one table to another.

The query was simple enough. It used a fast forward cursor to copy one record at a time from one table into another. Using SCOPE_IDENTITY(), the query retrieved the identity value of the newly inserted record and then inserted another record into a child table using that newly generated identity key. There were several hundred million records in the table from which the data was being moved. Whether this could have been done another way is not the topic for today.

In this particular instance, the query was running in Query Analyzer; it would run for a while and then throw an out-of-memory exception. Since the query didn't track which records had already been inserted into the new table before the exception was thrown, the developer truncated the new table and started all over again, only to hit the same issue.

The reason the query was throwing an out-of-memory exception was that it didn't have SET NOCOUNT ON at the beginning. As a result, the query was outputting a "rows affected" message for each statement, not only consuming memory but also degrading performance and increasing network traffic. SET NOCOUNT ON prevents the sending of DONE_IN_PROC messages to the client.

A query need not run in a loop like the one above, or in Query Analyzer, to benefit from SET NOCOUNT ON. If you don't need the row-count messages returned from SQL Server, or aren't capturing them, be sure to add this statement to your stored procedures.
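A minimal sketch of where the statement goes (the procedure name and body are placeholders, not the actual query from the story above):

CREATE PROCEDURE dbo.CopyStagedOrders
AS
BEGIN
    SET NOCOUNT ON;   -- suppress the "rows affected" message for every statement in the procedure

    -- ... the rest of the procedure body goes here ...
END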

Thank you.

Monday, July 2, 2012

SQL Server - Degree of Parallelism

SQL Server has a setting called max degree of parallelism. It controls how many of the available processors on the server (up to 64) the engine may use to execute a single SQL statement. By default, SQL Server is configured to use all available processors; however, there may be instances where you want to limit the number of processors a query should use. For example, you may have a low-priority SQL Agent job running in the background and want to limit it to a single processor; in that case you can set max degree of parallelism to 1.

The query below will only use up to 2 processors for parallel operation regardless of the number of processors available.

USE AdventureWorks
GO
SELECT ProductID, OrderQty, SUM(LineTotal) As Total
FROM Sales.SalesOrderDetail 
WHERE UnitPrice < $10.00
GROUP BY ProductID, OrderQty
ORDER BY ProductID, OrderQty
OPTION (MAXDOP 2)
GO

To configure SQL Server to always use up to a certain number of processors, you can use sp_configure.

For example, the following script limits the SQL engine to at most 4 processors per parallel query.

USE Master
GO
SP_Configure 'Show Advanced Options', 1
GO
RECONFIGURE
GO
SP_Configure 'Max Degree of Parallelism', 4
GO
RECONFIGURE
GO

The default "max degree of parallelism" option is 0, which means use all available processors, up to 64.

Thank you.

Thursday, June 28, 2012

SQL 2012 (Denali) - Paged Query Results

Anyone who has been developing web-based or even Windows-based systems must have gone through the pain of fetching a few records at a time from the database.

Suppose your grid view displays 20 records per page. One option was to fetch all records on every grid page: every time the user changed the page, make another trip and fetch all records again (or save everything in view state / session) - not an ideal solution. Another option was to get a list of all primary / identity keys and then pass a set of keys to the query to fetch only the records that pertain to those keys. For example, get keys 1 - 20 on page 1, 21 - 40 on page 2, and so on. But what if someone deleted, say, record #22 from the database? Another implementation I have seen is where the code passes the page number and the number of records each page displays, and the stored procedure applies some complex logic to figure out which records to send back. Using ROW_NUMBER() and a CTE was a better alternative, but it was still not very intuitive or easy to use.

SQL 2012 introduced ad-hoc paging, allowing you to fetch only the records you need. Two clauses, OFFSET and FETCH, make this happen: OFFSET tells the query to skip the first Y rows, and FETCH returns only the next X rows after that.

For example, suppose I have 1000 users in my table and my grid only shows 10 users at a time. I can pass two parameters to my query and it will return the correct 10 users every time.
CREATE PROCEDURE getPagedRecords

@Page INT = 1,
@RecordsPerPage INT=10

AS
BEGIN
    DECLARE @Offset INT
    SET @Offset = (@Page - 1) * @RecordsPerPage   -- page 1 starts at offset 0

    SELECT 
        UserID,
        FirstName,
        LastName,
        EmailAddress 
    FROM Users 
    ORDER BY UserID
    OFFSET @Offset ROWS
    FETCH NEXT @RecordsPerPage ROWS ONLY;
END
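A quick usage sketch - to fetch the third page of ten users (rows 21 through 30):

EXEC getPagedRecords @Page = 3, @RecordsPerPage = 10;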

You can even use expressions in the OFFSET and FETCH clauses, such as @Offset - 1 and @RecordsPerPage - 1.

You don't have to be paging a grid view to make use of this. Often you may need X records from the middle of a result set, where this can be very helpful.

For FETCH to return the expected set of records every time, you obviously have to make sure other routines aren't deleting or inserting records in the middle, which may cause unexpected behavior. For example, say your OFFSET clause tells the query to start at record #11, but someone deleted record #5 after you retrieved the first 10 records. What was record #12 is now record #11, so your query will skip one record (the original record #11, which is now #10).

Thank you.

Monday, June 25, 2012

SQL Server 2012 (Denali) Sequence Object

Among the several enhancements in SQL 2012 (some of which we will cover in subsequent posts), one of the key features introduced is the sequence object.

Think of a sequence object as an identity field on steroids. An identity field generates an auto-incrementing number, incremented by whatever value you desire (the default is one), every time a record is inserted into a table. But an identity field is specific to a table, just like any other field defined in that table. Additionally, if you delete a record, the identity value associated with it is lost forever unless you do something to reseed it. Once the identity field hits its maximum limit (which depends on the data type of the field), you've got a problem and must reseed the table to start over, or do something else.

A sequence object can help mitigate this problem somewhat. First, let's briefly review it: a sequence object is created independently of any table and can be used across multiple tables.

For example, I can create a sequence object called OrderNumber that starts at 1000 and increments by 1.
CREATE SEQUENCE [dbo].[OrderNumber]
AS INT
START WITH 1000
INCREMENT BY 1;


Now this sequence object is available to be used in multiple tables.

Let's imagine I have a table called OrderHeader and OrderDetails in my database.
CREATE TABLE [dbo].[OrderHeader](
    [OrderNumber] [int] NOT NULL,
    [OrderDate] [smalldatetime] NOT NULL,
    [StatusID] [int] NOT NULL
) ON [PRIMARY]

CREATE TABLE [dbo].[OrderDetails](
    [OrderNumber] [int] NOT NULL,
    [ItemID] [int] NOT NULL,
    [Quantity] [int] NOT NULL,
    [Price] [money] NOT NULL
) ON [PRIMARY]

Instead of inserting a record into OrderHeader and then using SCOPE_IDENTITY() or some other method to get the OrderNumber before inserting the record into the OrderDetails table, I can use the sequence "OrderNumber" I created above and insert the same number into both tables.
DECLARE @OrderNumber INT
SELECT @OrderNumber = NEXT VALUE FOR dbo.OrderNumber
INSERT INTO OrderHeader(OrderNumber,OrderDate,StatusID)
VALUES (@OrderNumber,GetDate(),1)

INSERT INTO OrderDetails(OrderNumber,ItemID,Quantity,Price)
VALUES (@OrderNumber,100,10,'2.25')

NEXT VALUE FOR gives you the next available sequence number. Every time you call it, the object will return the next available number, so if you want to use the same number in multiple tables, you have to be careful to call it only once.


Another nice feature of the sequence object is that you can reset it. As in my example above, if you are deleting old records, you know their order numbers are becoming available again. After you have reached a fairly large number, you can start over. You obviously have to ensure you don't reset the sequence and then try to use a new sequence number that already exists in your table.

There are two ways to reset a sequence (dropping and recreating it is a third). One is to define a maximum value and the CYCLE option when you create the sequence; the other is to alter an existing sequence.
CREATE SEQUENCE [dbo].[OrderNumber]
AS INT
START WITH 1000
INCREMENT BY 1
MINVALUE 1000
MAXVALUE 1000000
CYCLE;


The OrderNumber sequence in the above example will cycle back to its minimum value once it hits 1000000.

ALTER SEQUENCE OrderNumber
 RESTART WITH 1000 ;


As you can see, the sequence object is quite flexible and, used judiciously, can be a nice tool. System catalog views such as sys.sequences let you check the defined range, increment, current value, etc.
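For example, a quick sketch that looks up the sequence created above:

SELECT name, start_value, increment, minimum_value, maximum_value, is_cycling, current_value
FROM sys.sequences
WHERE name = 'OrderNumber';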

Thank you.

Friday, June 22, 2012

SQL Server - Checking Where an Object is used

As databases grow in size and complexity, dead objects are often left behind. By dead, I mean objects that are no longer being used, but that everyone is afraid to drop for fear of breaking something. SQL Server provides a simple way to determine where an object is being used so you can decide whether it is still needed. SQL Server will tell you whether your object is referenced by a function, stored procedure, view, etc.; it obviously can't tell you whether another application references the object. So even if the query below doesn't return anything, further research may be necessary before you delete an object.

The script below will accept the object name and list all other objects that are using/referencing this object.


--Declare a variable, or alternatively you could create
--a stored procedure and pass this as a parameter.
DECLARE @ObjectName nvarchar(4000)
--name of your object.
SET @ObjectName = 'MyObject'

SELECT
    S.Name,
    C.Text
FROM SysObjects S
INNER JOIN SysComments C
    ON S.ID = C.ID
WHERE
    C.Text LIKE '%' + @ObjectName + '%'

For example, if I run the above script in the Northwind database to check where the "Categories" table is being used, I get a list of the stored procedures and views whose definitions reference it.


Happy Coding!

Thank you.

Saturday, June 16, 2012

SQL Server Pivot Function

In the previous post we discussed exporting data from SQL Server into an Excel file and pivoting the data there. Pivoting is basically a way to convert rows into columns. Starting with SQL Server 2005, T-SQL includes a PIVOT operator that lets you turn row values into columns directly in a query. PIVOT doesn't work with certain data types, such as XML.

Suppose you want to group employees by the city they reside in. You can use a GROUP BY clause to get the count of employees by city, but such a query returns each city as one row, and you would have to do something extra to convert the rows into columns. The query below returns the number of employees residing in three cities, with each city and its count appearing as a row.

USE AdventureWorks
GO

SELECT City,COUNT(*) AS [Count] FROM HumanResources.vEmployee 
WHERE City IN ('Monroe','Redmond','Seattle')
Group By City

The result will look like this

City      Count
Monroe    14
Redmond   21
Seattle   44

Suppose you want to pivot this data so that each city appears as a column. You can modify the above query and use the PIVOT operator.

Use AdventureWorks
GO
SELECT [Monroe],[Redmond],[Seattle]
FROM
(
    SELECT E.City FROM HumanResources.VEmployee E
) C
PIVOT
( 
 COUNT (City)
 FOR City
 IN ([Monroe],[Redmond],[Seattle])
 ) AS PivotTable;

Now the result will look like this
Monroe    Redmond    Seattle
14        21         44

You may be wondering about the missing WHERE clause. The FOR City IN () list inside the PIVOT operator takes care of limiting the output to the cities you listed. This query works, but the nested query still returns all rows. You can add a WHERE clause to the nested query so it returns only the rows that satisfy your criteria. The above query can be rewritten as follows...
Use AdventureWorks
GO
SELECT [Monroe],[Redmond],[Seattle]
FROM
(
    SELECT E.City FROM HumanResources.VEmployee E
    WHERE City IN ('Monroe','Redmond','Seattle')
) C
PIVOT
( 
 COUNT (City)
 FOR City
 IN ([Monroe],[Redmond],[Seattle])
 ) AS PivotTable;

PIVOT is useful but not very intuitive; as you can see above, it takes a minute to make sense of it. Question for you guys - have you used PIVOT before, and if so, how did you use it?

Thank you.

Thursday, June 14, 2012

Exporting data from SQL Server to Excel

Excel is a very powerful tool and it can come in handy if you just want to build a quick report from your SQL Server database. In this post, we will discuss ways to export data and create a quick employee report.

1. Fire up Excel and browse to the Data tab, click on From Other Sources, and select From SQL Server.


2. Enter the SQL Server name. If your SQL Server instance is not the default instance, you will also need to enter the instance name.


You will have to do this only once. After successfully connecting to your database, you can save the connection string and in the future just click on Existing Connections from the data tab to reuse the connection.

3. After entering the information, click on Next and select the database you want to export data from. In this example, we are going to use AdventureWorks. Select a table or a view to retrieve data from and click Next.

4. This is your chance to save your connection string for any future usage. Rename the connection string to a more appropriate name and click on Finish.

5. You are now ready to export data. You can either export the table as it is saved in the database, create a pivot report, or create a pivot report with a chart. In this example, I chose the second option.


6. The Properties tab allows you to set certain parameters. The defaults generally work fine for most situations.


7. When you click OK, Excel lists all the fields available in your selected table or view, and you can pick the ones you want to show up on the report. As you select a column, you will notice its values are added to the spreadsheet. Select the columns in the order you want them to appear in the report. For example, if you want Last Name to appear before First Name, select the Last Name column first and then the First Name column.

8. Close the Fields List and here is your nice little report.

Thank you.

Monday, June 11, 2012

Compressing your data


If deleting data is not an option, SQL Server 2008 and 2012 provide another way to save space: compressing individual tables at the row or page level.

Before you compress a particular table, it may be prudent to know which table will give you the biggest gain. In my previous post I talked about running a script to find the tables holding the most data, which you can use to decide which tables to tackle first.

Let's assume you have determined the table(s) to compress. Fire up Management Studio and right-click on the table > Storage > Manage Compression.


Click on Compression and select Compression Type - Row Level Data Compression or Page Level Data Compression.

After you select compression type, you can click on Calculate which will show you how much you will gain.
  • Row Level Data Compression: This compression turns fixed-length data types into variable-length data types and frees up empty space. As you can imagine, a table with more char/nchar columns will yield more space. Row-level compression can also ignore zero and null values, resulting in additional space savings.
  • Page Level Data Compression: When you choose this compression type, it starts with row-level data compression and then adds two extra layers of compression...
    • Prefix Compression
      • SQL Server analyzes repeated patterns at the beginning of the values in each column and substitutes them with an abbreviated reference, which is stored in the compression information structure. This type of compression works on one column at a time.
    • Dictionary Compression
      • It works similarly to prefix compression, except that it searches for repeated patterns anywhere on a page, replaces them with an abbreviated reference, and stores the reference information in the compression information structure.

How much you will gain from page-level compression depends on the patterns in the data: repeated patterns result in more savings, whereas random data results in less.
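If you prefer T-SQL over the Management Studio dialog, the same estimate and rebuild can be scripted. A sketch (dbo.Orders is just a placeholder table name):

-- estimate the savings for page compression (SQL 2008 and later)
EXEC sp_estimate_data_compression_savings
    @schema_name = 'dbo',
    @object_name = 'Orders',
    @index_id = NULL,
    @partition_number = NULL,
    @data_compression = 'PAGE';

-- rebuild the table with the chosen compression type
ALTER TABLE dbo.Orders REBUILD WITH (DATA_COMPRESSION = PAGE);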

As you can imagine, compression adds some CPU overhead, but how much and whether it is acceptable will depend on your environment.

Thank you.




Thursday, June 7, 2012

Determining SQL Table Size

You may come across a situation where you need to shrink your database by selectively deleting data from some tables.

It sure would be nice to know which tables are the biggest data hogs, see if you can delete records to free up some space, and then shrink the database. There is indeed a nice little script you can use to determine table sizes and act accordingly.

CREATE TABLE #tmpTable
(name SYSNAME, 
rows CHAR(11), 
reserved VARCHAR(18),  
data VARCHAR(18), 
index_size VARCHAR(18), 
unused VARCHAR(18)
)  
EXEC sp_msforeachtable 'INSERT INTO #tmpTable EXEC sp_spaceused ''?''' 
-- sp_spaceused returns sizes as strings such as '123 KB', so strip the suffix to sort numerically
SELECT * FROM #tmpTable ORDER BY CAST(REPLACE(data, ' KB', '') AS BIGINT) DESC
DROP TABLE #tmpTable 

If you are running SQL 2008 or SQL 2012 (Enterprise Edition), you can also compress individual tables. We will discuss table compression in the next post.

Thank you

Tuesday, June 5, 2012

Checking Foreign Key References

You may come across a situation where you need to check which tables are connected to a particular table via a foreign key referential constraint. You may need this information when you are trying to delete one or more records or truncate a table.

Below is a script that you can use to check all the tables that are related to the table in question. In this example I am checking for the table "Employees"

SELECT
    f.name AS ForeignKey,
    OBJECT_NAME(f.parent_object_id) AS TableName,
    COL_NAME(fc.parent_object_id, fc.parent_column_id) AS ColumnName,
    OBJECT_NAME(f.referenced_object_id) AS RefTableName,
    COL_NAME(fc.referenced_object_id, fc.referenced_column_id) AS RefColumnName
FROM sys.foreign_keys AS f
INNER JOIN sys.foreign_key_columns AS fc
    ON f.OBJECT_ID = fc.constraint_object_id
WHERE OBJECT_NAME(f.referenced_object_id) = 'Employees'

Below is a result showing the three tables it is connected with.

ForeignKey          TableName       ColumnName   RefTableName   RefColumnName
FK_Emp_Addresses    EmpAddresses    EmployeeID   Employees      EmployeeID
FK_Emp_Phones       EmpPhones       EmployeeID   Employees      EmployeeID
FK_Emp_Attributes   EmpAttributes   EmployeeID   Employees      EmployeeID

Thank you.

Monday, June 4, 2012

Change Database Compatibility Level

Recently I came across an environment running SQL Server 2005 (don't ask me why) in which the databases (about 50 of them) were set to the SQL Server 2000 compatibility level. After ensuring that nothing would break if I changed a database's compatibility level to 2005, I decided to go ahead and run a script to change every database's compatibility level to 2005.

Below is a script that you can run in your environment.
DECLARE  @tmpTable TABLE (Name varchar(50),compLevel INT,StateDesc varchar(50))

INSERT INTO @tmpTable(Name,CompLevel,StateDesc)
SELECT Name,Compatibility_level,State_Desc FROM sys.databases 

DECLARE @DBName varchar(50)
DECLARE curDBName CURSOR FOR
    SELECT Name FROM @tmpTable WHERE StateDesc='ONLINE'
OPEN curDBName
FETCH NEXT FROM curDBName INTO @DBName
WHILE @@FETCH_STATUS=0
    BEGIN
        DECLARE @compLevel INT
        SELECT @compLevel=compLevel FROM @tmpTable WHERE Name=@DBName
        IF @compLevel = 80
            BEGIN
                EXEC sp_dbcmptlevel @DBName, 90
            END
        FETCH NEXT FROM curDBName INTO @DBName
    END
CLOSE curDBName
DEALLOCATE curDBName

You can change the level to move from 2000 to 2008 or from 2005 to 2008, but before you do, make sure your database will not break. One way to check is to generate database scripts from Management Studio > right-click the database > Tasks > Generate Scripts. Make sure you select all objects, i.e. tables, views, stored procedures, functions, etc., and save the scripts. Run these scripts to create an empty database in the SQL Server environment you are upgrading to. If there are no errors, there is a good chance your production database will work fine.
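Note that sp_dbcmptlevel has since been deprecated; on SQL Server 2008 and later you can set the level directly with ALTER DATABASE. A quick sketch (the database name is a placeholder):

ALTER DATABASE [MyDatabase] SET COMPATIBILITY_LEVEL = 100;  -- 90 = SQL 2005, 100 = SQL 2008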

Thank you.



Saturday, June 2, 2012

Creating a Composite Foreign Key

In the previous post we discussed various types of primary keys: natural, surrogate, and composite. To recap, a composite primary key is a combination of more than one column. Ordinarily a single-column primary key is preferable, but there are situations, as we discussed previously, when it makes sense to create a composite primary key.

Naturally, if you want to use a composite primary key as a foreign key in other tables, you will have to create columns in the secondary table that correspond to those in the primary table. For example, say you have a table called Customers with a composite primary key on "FirstName, LastName". To use this key as a foreign key in another table, say "CustomerAddresses", you will have to create FirstName and LastName columns in the CustomerAddresses table and then add the referential integrity constraint. (I am not saying this is a good way to design your schema; it is just an example.)

Below is the script that you can run to create a foreign key in CustomerAddresses table.
ALTER TABLE dbo.CustomerAddresses
   ADD CONSTRAINT FK_Customer
   FOREIGN KEY(FirstName, LastName)
   REFERENCES dbo.Customers(FirstName, LastName)


You can use the same script to create a foreign key constraint on any other columns. For example, if your Customers table has "CustomerID" as its primary key, you would add a "CustomerID" column to the CustomerAddresses table and then use the script above, replacing "FirstName, LastName" with "CustomerID".

Thank you.

Monday, May 21, 2012

Surrogate / Natural / Composite Primary Keys

I have heard several arguments both for and against the composite primary keys. First, let's understand the type of primary keys that a table may have...

  • Surrogate Key - A non-essential data element that simply represents a record in the table but is not part of the business data. For example, an auto-generated identity key or a GUID.


  • Natural Key - One or more data elements that are part of the business data and are defined as the primary key. For example, ItemNumber or ItemDescription in an InventoryItems table.


  • Composite Key - Two or more columns combined to form the primary key; generally a combination of two or more natural keys.

This is not cast in stone, but most developers would prefer a surrogate key such as an identity column as the primary key, which makes it easier to join multiple tables for querying. It also makes things easier if you use code generators such as NHibernate to build your data access layer. But if you ask a DBA, you may get a different answer: they would much rather see a natural key as the primary key, even if it takes a composite key to define it.

Personally, I like having an auto-generated identity column as the primary key in base tables - for example, an Employees table with EmployeeID as its primary key and a Stores table with StoreID.

For linking tables, I would much rather use a composite key instead of an identity column. For example, if I have an EmployeeStores table that links Employees and Stores, instead of an identity column EmployeeStoreID I would rather have EmployeeID and StoreID as a composite primary key. You may be wondering: why not use an identity key instead?

Let's assume your application has a business constraint: an employee can only be assigned to one store. If you have EmployeeStoreID as your primary key, the following records are perfectly legal...

EmployeeStoreID   EmployeeID   StoreID
1                 100          20
2                 100          30

This doesn't satisfy the business constraint, does it? Sure, your application should enforce it, but what if someone enters data directly? You could also enforce it by creating a unique constraint on EmployeeID and StoreID, but to me that is another layer of complexity. If I made EmployeeID and StoreID a composite primary key, I wouldn't have this problem.

Linking tables like these are generally not involved in complex join operations, so having a composite key shouldn't force you to write more complex queries. The code generators I am aware of or have worked with also support composite primary keys. A sketch of such a linking table is shown below.
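Here is a minimal sketch of such a linking table, assuming the Employees and Stores base tables mentioned above use EmployeeID and StoreID as their primary keys:

CREATE TABLE dbo.EmployeeStores
(
    EmployeeID INT NOT NULL,
    StoreID    INT NOT NULL,
    CONSTRAINT PK_EmployeeStores PRIMARY KEY (EmployeeID, StoreID),
    CONSTRAINT FK_EmployeeStores_Employees FOREIGN KEY (EmployeeID) REFERENCES dbo.Employees (EmployeeID),
    CONSTRAINT FK_EmployeeStores_Stores FOREIGN KEY (StoreID) REFERENCES dbo.Stores (StoreID)
);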

There is still one issue I haven't addressed: if you want to update a record, how do you determine which record to update?

For example, assume an employee can be assigned to multiple stores, and consider the following table...

EmployeeID   StoreID
100          20
100          30
110          30
120          40

Employee 100 is assigned to stores 20 and 30. Now imagine this employee no longer works in store 20 and instead works in store 40. The manager updates this employee's record via the UI and changes the store from 20 to 40.

How would you know which record to update? Do you update the first record (the correct one) or the second record (StoreID=30), given that your stored procedure likely received only EmployeeID=100 and StoreID=40?

There are a couple of ways to get around this problem.

  • You can have your application send the stored procedure both the old store ID and the new store ID, so you know which record to update.
  • You can have EmployeeStoreID as an auto-increment column and your application can pass this ID to your stored procedure to update the record.

Personally, I prefer option #1. If you are using typed data sets, or a data access layer built with code generators, you will have access to the row state, which gives you the original and updated values that you can pass to your stored procedure. Alternatively, you can write your code to keep track of the old value and pass it along with the new value.

I generally don't post as a programmer on this blog (I have another blog for that), but here I felt it necessary to discuss a bit of programming. I hope you've enjoyed reading this as much as I've enjoyed writing it. Your comments are appreciated.

Thank you






Thursday, May 17, 2012

Using Temp Tables across Stored Procedures

I often come across a situation where it is hugely beneficial to create a temporary table, populate it with necessary information and then have another stored procedure use this temporary table.

In the past, developers were mostly relegated to one of the following scenarios...

Option 1
  • Create a temporary table in your database.
  • Call the stored procedure which can use this table just like any other table.
But you must delete this table after you are done; otherwise the next execution may fail (if you create the table without checking whether it already exists).

Option 2
  • Create a temporary table with # prefix 
  • Call the stored procedure which can use this table just like any other table.
A table created with the # prefix is only available for that session (a ## table is global and visible to other sessions). As long as you do everything within the same connection, you are good; once you close the connection, the table is automatically dropped.

This works; however, creating temporary tables has a cost, because creating one can force the stored procedure to be recompiled and a new execution plan to be generated. Moreover, if your stored procedure is used in a multi-user environment, a copy of the temporary table is created for every user (you can avoid this by creating a global temp table with the ## prefix, but then you must account for the fact that several users may share the same data, which may not be desirable).

Table variables solve this problem to a great extent, although there can be performance issues because you can't create non-clustered indexes on them. Further, you can't use a table variable created in one stored procedure in another, because it will be out of scope.

Creating Temp Table
CREATE TABLE #tmpTable(ID INT, Name varchar(100))

Creating Table Variable
DECLARE @tmpTable TABLE(ID INT, Name varchar(100))

As I mentioned earlier, you can't create a table variable in one stored procedure and use it in another. SQL Server 2008 introduced user-defined table types (table-valued parameters) that let you pass a table variable from one stored procedure to another.

Create a Table Data Type

/****** Object:  UserDefinedTableType [dbo].[TableType]    Script Date: 05/17/2012 23:07:07 ******/
CREATE TYPE [dbo].[TableType] AS TABLE(
    [CustomerID] [int] NOT NULL,
    [Name] [varchar](255) NULL,
    [City] [varchar](50) NULL,
    [State] [char](2) NULL,
    [Zipcode] [varchar](20) NULL
)
GO

Create a Stored Procedure that declares a variable of this type - Here I am declaring @Customers variable and populating it with data.

CREATE PROCEDURE sp_GetCustomers

AS
BEGIN
    DECLARE @Customers AS TableType
    INSERT INTO @Customers(CustomerID,Name,City,State,ZipCode)
    SELECT C.CustomerID,C.FirstName + ' ' + C.LastName,A.City,A.State,A.ZipCode
    FROM Customers C INNER JOIN CustomerAddress A ON C.CustomerID=A.CustomerID
    EXEC sp_GetCustomersWithAddresses @Customers
    
END

Pass this variable to another Stored Procedure - You must pass a table type variable as READONLY

CREATE PROCEDURE sp_GetCustomersWithAddresses
@Customers TableType READONLY
AS
BEGIN
    SELECT * FROM @Customers
END

You can use a table type in your applications as well and pass it as a parameter much like a regular SQL data type.

Thank you.


Friday, May 11, 2012

Disconnect Users from Database

Recently I came across a situation where I wanted to take a database offline, but someone at work was still connected to the database and had left for the day. I could have rebooted his machine to disconnect him, but I didn't want to do that. So I used the power of T-SQL to disconnect this user and any other users connected to the database.
USE MASTER
GO

DECLARE @DBName varchar(50), @spid varchar(10)

SET @DBName= 'Orders'

WHILE(EXISTS(SELECT * FROM sysprocesses WHERE dbid = db_id(@DBName)))
    BEGIN
        DECLARE spids CURSOR FAST_FORWARD FOR
               SELECT convert(varchar, spid) FROM sysprocesses
                       WHERE dbid = db_id(@DBName)
        OPEN spids
        FETCH NEXT FROM spids INTO @spid
        WHILE @@FETCH_STATUS = 0
        BEGIN
               EXEC('kill ' + @spid)
               FETCH NEXT FROM spids INTO @spid
        END
        CLOSE spids
        DEALLOCATE spids
   END
IF NOT exists(Select * FROM sysprocesses WHERE dbid = db_id(@DBName))
        EXEC sp_dboption @DBName, offline, true
ELSE
    BEGIN
        PRINT 'The following processes are still using the database:'
        SELECT spid, cmd, status, last_batch, open_tran, program_name, hostname
        FROM sysprocesses
        WHERE dbid = db_id(@DBName)
     END

GO

This is one of many useful queries in my arsenal that I use from time to time to get the job done. You may find it useful.
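On SQL Server 2005 and later, a shorter alternative is to let ALTER DATABASE do the disconnecting for you. A sketch (using the same Orders database as above):

ALTER DATABASE [Orders] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
ALTER DATABASE [Orders] SET OFFLINE;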

Thank you.

Wednesday, May 9, 2012

SQL Server - Grant Permission to Objects

Suppose you use integrated security to log in to SQL Server Management Studio and create stored procedures, views, etc., but in the production environment the application connects to SQL Server via a SQL Server account that belongs only to the public role.

Unless you have granted execute permissions to the public role, you will get a nasty permission-denied error message. While manually assigning the proper rights to a single stored procedure isn't a problem, it becomes a hassle when you have several stored procedures to take care of.

Here is a simple script that loops through all your stored procedures and grants execute permission to public.
--Grant permissions on Procedures
DECLARE curProcedures CURSOR FOR
    SELECT Name FROM sysobjects where type='P'
OPEN curProcedures
DECLARE @Procedure varchar(100)
DECLARE @strSQL nvarchar(200)
FETCH NEXT FROM curProcedures INTO @Procedure
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @strSQL = 'GRANT EXECUTE ON ' + @Procedure + ' TO PUBLIC'
    --PRINT @strSQL
    EXEC SP_EXECUTESQL @strSQL
    FETCH NEXT FROM curProcedures INTO @Procedure
END
CLOSE curProcedures
DEALLOCATE curProcedures

You can grant SELECT/UPDATE/DELETE permissions on tables and views using the same pattern.

Tables
--Grant permissions on Tables
DECLARE curProcedures CURSOR FOR
    SELECT Name FROM sysobjects where type='U'
OPEN curProcedures
DECLARE @Procedure varchar(100)
DECLARE @strSQL nvarchar(200)
FETCH NEXT FROM curProcedures INTO @Procedure
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @strSQL = 'GRANT SELECT ON ' + @Procedure + ' TO PUBLIC'
    --PRINT @strSQL
    EXEC SP_EXECUTESQL @strSQL
    SET @strSQL = 'GRANT UPDATE ON ' + @Procedure + ' TO PUBLIC'
    --PRINT @strSQL
    EXEC SP_EXECUTESQL @strSQL
    SET @strSQL = 'GRANT DELETE ON ' + @Procedure + ' TO PUBLIC'
    --PRINT @strSQL
    EXEC SP_EXECUTESQL @strSQL
    FETCH NEXT FROM curProcedures INTO @Procedure
END
CLOSE curProcedures
DEALLOCATE curProcedures

Views
--Grant permissions on Views
DECLARE curProcedures CURSOR FOR
    SELECT Name FROM sysobjects where type='V'
OPEN curProcedures
DECLARE @Procedure varchar(100)
DECLARE @strSQL nvarchar(200)
FETCH NEXT FROM curProcedures INTO @Procedure
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @strSQL = 'GRANT SELECT ON ' + @Procedure + ' TO PUBLIC'
    --PRINT @strSQL
    EXEC SP_EXECUTESQL @strSQL
    SET @strSQL = 'GRANT UPDATE ON ' + @Procedure + ' TO PUBLIC'
    --PRINT @strSQL
    EXEC SP_EXECUTESQL @strSQL
    SET @strSQL = 'GRANT DELETE ON ' + @Procedure + ' TO PUBLIC'
    --PRINT @strSQL
    EXEC SP_EXECUTESQL @strSQL
    FETCH NEXT FROM curProcedures INTO @Procedure
END
CLOSE curProcedures
DEALLOCATE curProcedures

Thank You.

Friday, May 4, 2012

SQL Server - Truncate Table

The other day I ran into an issue where a SQL table was used as a staging area. The application would insert data into this table, a user would review its contents via the application, make any corrections, and then click "Verified". When Verified was clicked, the application would execute a stored procedure that copied data from this table to a permanent table and then deleted the records from the staging table. Depending on the number of records, the delete could take a while. Further, since the table had an identity key defined as SMALLINT, it frequently ran out of ID values and had to be reseeded to start over.

The fix was rather simple. Since this was a stand-alone table and the data was deleted once it was successfully copied to another table, the stored procedure could simply truncate the table instead of deleting the records. Truncation is faster, and it resets the identity seed every time, so the identity key starts from its initial value (1 in this case).

What if you have a table that you want to truncate every now and then, but it also has foreign key constraints with other tables? Obviously you will first have to remove the constraints, truncate the table, and add the constraints back. This can get cumbersome rather quickly. Fortunately, you can create a generic stored procedure that accepts the table name, figures out any constraints referencing it, drops them, truncates the table, and recreates the constraints. It can then be used to truncate any table (although you had better be careful, because you may end up truncating live data).

Here is a stored procedure that I have used in the past.

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER OFF
GO
CREATE PROC [dbo].[sp_TruncateTable]
    @TableName    varchar(255)

AS

SET NOCOUNT ON

DECLARE @strSQL    nvarchar(MAX)

DECLARE @CheckForeignKeys TABLE
(
    Name            varchar(100),
    ParentColumn    varchar(100),
    ChildTable        varchar(100),
    ChildColumn        varchar(100))

--    Get Foreign Key info
INSERT INTO @CheckForeignKeys
SELECT
    FK.Name,
    PC.Name,
    CT.Name,
    CC.Name

FROM SYS.TABLES T
    INNER JOIN SYS.FOREIGN_KEYS FK
        ON FK.REFERENCED_OBJECT_ID = T.OBJECT_ID
    INNER JOIN SYS.TABLES CT
        ON CT.OBJECT_ID = FK.PARENT_OBJECT_ID
    INNER JOIN SYS.FOREIGN_KEY_COLUMNS FKC
        ON FKC.CONSTRAINT_OBJECT_ID = FK.OBJECT_ID
    INNER JOIN SYS.COLUMNS PC
        ON PC.OBJECT_ID = FKC.REFERENCED_OBJECT_ID
        AND PC.COLUMN_ID = FKC.REFERENCED_COLUMN_ID
    INNER JOIN SYS.COLUMNS CC
        ON CC.OBJECT_ID = FKC.PARENT_OBJECT_ID
        AND CC.COLUMN_ID = FKC.PARENT_COLUMN_ID

WHERE T.Name = @TableName

SET NOCOUNT OFF

--    Drop foreign keys
PRINT 'Dropping Foreign Keys'

DECLARE DropConstraints CURSOR FAST_FORWARD FOR
    SELECT 'ALTER TABLE ' + FK.ChildTable + ' DROP CONSTRAINT ' + FK.Name
    FROM @CheckForeignKeys FK
    
OPEN DropConstraints
FETCH NEXT FROM DropConstraints INTO @strSQL
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC sp_executeSQL @strSQL
    PRINT @strSQL
    FETCH NEXT FROM DropConstraints INTO @strSQL
END

CLOSE DropConstraints
DEALLOCATE DropConstraints

--    Truncate table
BEGIN TRY

    SELECT @strSQL = 'TRUNCATE TABLE ' + @TableName
    PRINT @strSQL
    EXEC sp_executeSQL @strSQL

END TRY
BEGIN CATCH
    PRINT 'ERROR OCCURRED -- ' + ERROR_MESSAGE()
END CATCH


--Add constraints
PRINT 'Adding Constraints'
DECLARE AddConstraints CURSOR FAST_FORWARD FOR

SELECT 'ALTER TABLE ' + FK.ChildTable + ' WITH NOCHECK ADD CONSTRAINT ' + FK.Name + 
       ' FOREIGN KEY (' + FK.ChildColumn + ') REFERENCES ' + @TableName + '(' + FK.ParentColumn + ')
ALTER TABLE ' + FK.ChildTable + ' CHECK CONSTRAINT ' + FK.Name

    FROM @CheckForeignKeys FK

OPEN AddConstraints

FETCH NEXT FROM AddConstraints INTO @strSQL
WHILE @@FETCH_STATUS = 0
BEGIN

    EXEC sp_executeSQL @strSQL
    PRINT @strSQL
    FETCH NEXT FROM AddConstraints INTO @strSQL

END

CLOSE AddConstraints
DEALLOCATE AddConstraints
PRINT 'DONE!'
GO

If I execute this stored procedure, passing the table name I want to truncate, here is the message I get...
EXEC sp_TruncateTable 'Customers_Audit'


And here is the output...

Dropping Foreign Keys
ALTER TABLE Customers_Audit2 DROP CONSTRAINT FK_Customers_Audit2_Customers_Audit
TRUNCATE TABLE Customers_Audit
Adding Constraints
ALTER TABLE Customers_Audit2 WITH NOCHECK ADD CONSTRAINT FK_Customers_Audit2_Customers_Audit FOREIGN KEY (CustomerID) REFERENCES Customers_Audit(CustomerID)
ALTER TABLE Customers_Audit2 CHECK CONSTRAINT FK_Customers_Audit2_Customers_Audit
DONE!


Keep in mind that sp_executesql expects a Unicode (nvarchar) parameter, which is why I declared @strSQL as nvarchar(MAX).

Thank You.

Sunday, April 29, 2012

SQL Server - Reseeding Identity Value

Recently we came across an issue where a table was not accepting any new records. Upon further inspection we discovered that the table had an identity key of type INT and it had reached its limit. The INT data type is 4 bytes and its maximum value is 2,147,483,647. The table in question only had about 300,000 rows.

An identity field is an auto-generated number with a default increment of 1, so in a table where you simply insert or update records, you should technically never run out of INT values, or at least it would take a very long time. But let's assume you insert 1000 records into the table and then delete 500 of them. Because the identity value is generated incrementally, the next identity value will be 1001, even though the table only has 500 records. That is precisely what happened with this table. At most about 2000 rows existed in it at any time, but the data was deleted and re-inserted throughout the day: 2000 rows might be inserted in the morning, then an hour later the same 2000 rows were inserted again after deleting the first 2000, and so on. So with every insert cycle, even though the number of records stayed more or less the same, another 2000 identity values were eaten up. In a day, close to 20,000 identity values were used for just 2000 entries.

Extrapolate that over several years and you can imagine why the table ran out of values. There are a couple of ways to fix this problem, but the right one largely depends on how complex your database schema is, i.e. whether the identity key is also used as a foreign key in other table(s). Fortunately, this table was stand-alone, so it was rather simple.

A temporary solution is to find a contiguous block of unused identity values. For example, imagine your table has 1,000,000 rows but the first record's identity value is 500,000. That means you can reseed the table to restart its identity values from 1 by running the following DBCC command...

DBCC CHECKIDENT (YourTableName, reseed, 0)


Once you run this command, the next record inserted into the table will get identity value 1. But as I said, this is a temporary solution: once the identity value reaches 499,999 in the above example, the next insert will fail (assuming the identity column is the primary key), because identity value 500,000 is already present.

But what if you can't find a contiguous block of unused identity values? If you can afford to keep the database down for a few hours, here is what you can do...

  • Disconnect users from the database (with proper notification, of course) and bring it into single-user mode.
  • Create a new table with the same schema (name it something other than the table in question) and write a query to move all the records from the old table to the new one (excluding the identity column); the new table will generate its own identity values in contiguous fashion, starting from 1.
  • Once you have completed the move, rename or drop the old table (I would rename it at least until the dust has settled) and rename the new table to the same name as the old table.

But what if you can't afford to keep your database down for more than a few minutes? It gets a little tricky, but it is still doable. Here is what I would do...

  • Switch the database to single user mode.
  • Rename the current table to something else. 
  • Create a new table with the same schema and name it same as the old table.
  • Bring the database back up so users can start using it; all new records will now be added to this new table, starting from identity value 1. (The old data is still not available.)
  • Write a query to move the data from the old table to the new table (ideally as a scheduled task that runs when the load on the database, especially on this table, is low, to avoid deadlocks). You can also play with query wait times or the Resource Governor, but I would recommend the following...
    • Normally I wouldn't recommend this, but since this is a background operation and the key is to lock the table for the least amount of time, I would use a FAST_FORWARD cursor over the old table and insert one record at a time into the new table. This, coupled with scheduling during low load, will prevent or at least minimize deadlocks.
There may be other ways, but this has worked for me. Also, if you want to check the current identity value of your table, you can run the following command

DBCC CHECKIDENT (YourTableName)

While these are workarounds, deleting and reinserting the same data is never a good idea, and nothing beats good database design. Don't use an identity key if you don't have a particular reason for it, and if you must use one, choose a proper data type (SMALLINT, INT, or BIGINT) based on the potential table size.

Thank you.

Thursday, April 26, 2012

SQL Server Triggers

In a previous post we talked about views; today we will review triggers and the purpose they serve. Triggers are useful, but they should be used judiciously.

A trigger is a special stored procedure that is bound to a table and executes automatically in response to a certain action on that table. For example, you can have a trigger that executes when a record is inserted, updated, or deleted.
There are two types of triggers: Data Definition Language (DDL) triggers and Data Manipulation Language (DML) triggers. DML triggers execute in response to data modification, for example when you insert/update/delete a record. DDL triggers execute when you modify the data definition, for example when you add a column to a table.

DML Triggers
DML triggers can be further categorized into AFTER triggers and INSTEAD OF triggers. AFTER triggers execute after the statement they are associated with has completed.
For example, suppose you have an audit table, and any time a record is inserted into the main table you want to copy the newly inserted record into the audit table.
CREATE TRIGGER trgAfterInsert ON [dbo].[Customers] 
FOR INSERT
AS
    Declare @CustomerID INT;
    Declare @FirstName varchar(50);
    Declare @LastName varchar(50);
    Declare @EmailAddress varchar(100);
    
    SELECT @CustomerID =newRow.CustomerID from inserted newRow;    
    SELECT @FirstName = newRow.FirstName from inserted newRow;    
    SELECT @LastName = newRow.LastName from inserted newRow;    
    SELECT @EmailAddress = newRow.EmailAddress from inserted newRow;    
    
    INSERT INTO Customers_Audit
           (CustomerID,FirstName,LastName,EmailAddress) 
    values(@CustomerID,@FirstName,@LastName,@EmailAddress);

    PRINT 'AFTER INSERT trigger fired.'
GO

When you insert a record into the Customers table, this AFTER trigger fires and copies the newly inserted record into the Customers_Audit table.

Similarly, you can create a trigger for UPDATE or DELETE.

Notice the "from inserted" clause: inserted is a logical table you can use to get the values of newly inserted rows. There is no updated table, but in an UPDATE trigger the same inserted table holds the new values of the updated rows. For deleted rows, there is a logical deleted table holding the deleted values. Also note that these examples read single values from inserted/deleted, so they assume one row is modified at a time; for multi-row statements you would insert directly from the logical table, as in the UPDATE sketch further down. For example, I can create a trigger that fires when a record is deleted.

CREATE TRIGGER trgAfterDelete ON [dbo].[Customers] 
FOR DELETE
AS
    Declare @CustomerID INT;
    Declare @FirstName varchar(50);
    Declare @LastName varchar(50);
    Declare @EmailAddress varchar(100);
    
    SELECT @CustomerID =deletedRow.CustomerID from deleted deletedRow;    
    SELECT @FirstName = deletedRow.FirstName from deleted deletedRow;    
    SELECT @LastName = deletedRow.LastName from deleted deletedRow;    
    SELECT @EmailAddress = deletedRow.EmailAddress from deleted deletedRow;    
    
    INSERT INTO Customers_Audit
           (CustomerID,FirstName,LastName,EmailAddress) 
    values(@CustomerID,@FirstName,@LastName,@EmailAddress);

    PRINT 'AFTER DELETED trigger fired.'
GO
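For UPDATE, here is a hedged sketch of a set-based audit trigger; by selecting straight from the inserted table it copies every updated row, not just one:

CREATE TRIGGER trgAfterUpdate ON [dbo].[Customers]
FOR UPDATE
AS
    INSERT INTO Customers_Audit
           (CustomerID,FirstName,LastName,EmailAddress)
    SELECT CustomerID,FirstName,LastName,EmailAddress
    FROM inserted;

    PRINT 'AFTER UPDATE trigger fired.'
GO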

INSTEAD OF Trigger
An INSTEAD OF trigger allows you to intercept a statement and decide whether to carry it out, based on your criteria. For example, suppose you don't want certain records deleted. You can create an INSTEAD OF trigger that intercepts the deletion, checks whether the condition is satisfied, and then either performs the deletion or rolls it back.

For example, I don't want to delete any customer whose last name is "Smith". I can create an INSTEAD OF trigger to prevent deleting such customers.
CREATE TRIGGER trgInsteadOfDelete ON [dbo].[Customers] 
INSTEAD OF DELETE
AS
    Declare @CustomerID INT;
    Declare @FirstName varchar(50);
    Declare @LastName varchar(50);
    Declare @EmailAddress varchar(100);
    
    SELECT @CustomerID =deletedRow.CustomerID from deleted deletedRow;    
    SELECT @FirstName = deletedRow.FirstName from deleted deletedRow;    
    SELECT @LastName = deletedRow.LastName from deleted deletedRow;    
    SELECT @EmailAddress = deletedRow.EmailAddress from deleted deletedRow;    
    
    IF (@LastName='Smith')
        BEGIN
            RAISERROR('Cannot Delete',16,1)
            ROLLBACK;
        END
    ELSE
        BEGIN
            -- perform the delete ourselves; it commits with the outer transaction, so no explicit COMMIT is needed
            DELETE FROM Customers WHERE CustomerID=@CustomerID
        END
        
GO


DDL Trigger
DDL triggers fire when a schema change is made. If I want to prevent tables from being dropped or altered, I can create a trigger like this...

CREATE TRIGGER TrgAlterTable
ON DATABASE 
FOR DROP_TABLE, ALTER_TABLE 
AS 
PRINT 'You must disable Trigger "TrgAlterTable" to drop or alter tables!' 
ROLLBACK ;

Unless I disable or drop this trigger, I will not be able to drop or modify a table.

Enabling / Disabling a Trigger
You can disable all triggers or a specific trigger by executing the following script.
ALTER TABLE Customers {ENABLE | DISABLE} TRIGGER ALL

To enable or disable a specific trigger, simply replace "ALL" with the trigger name.
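For example, to temporarily disable and then re-enable the insert trigger created above:

ALTER TABLE Customers DISABLE TRIGGER trgAfterInsert;
ALTER TABLE Customers ENABLE TRIGGER trgAfterInsert;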

While triggers serve a useful purpose, one of the issues I have with them is that they are hidden; long after you have forgotten about them, troubleshooting becomes a problem because you just don't think to look at the triggers. That's why it is important to have some kind of messaging/logging in triggers so issues can be traced.

Thank you.



Sunday, April 22, 2012

Using SQL Views to Insert/Update/Delete Data

In the previous post we discussed SQL Server views and how you can use them to abstract away some of the complexity of the underlying schema. We focused mainly on selecting data through views.

Views also allow you to insert/update/delete data, but they are limited in their power when it comes to modifying data.

Think back for a second: when you insert data, you can only insert into one table at a time. If more than one table is related via referential integrity, you must insert into the primary table first and then insert into the related tables using the key from the primary table. Similarly, when you update or delete data, you can only update or delete from one table at a time. Additionally, the user must have the proper permissions to do so.

The same restrictions apply when you use views to modify data. Some of the restrictions that apply to views are...
  • Insert/Update/Delete statements must reference columns from only one base table.
  • Columns in the view must map directly to columns in the base table, i.e. they cannot be derived through expressions or aggregate functions such as AVG, COUNT, SUM, MIN, MAX etc.
  • The columns being modified cannot be affected by DISTINCT, GROUP BY or HAVING clauses.
    • If you think about it, this makes sense: SQL Server wouldn't know which underlying rows to modify once these clauses have been applied.
  • If you have used WITH CHECK OPTION (see the previous post, and the quick refresher sketch after this list), you can't use TOP in the SELECT statement of the view.
  • If you have any sub/nested queries, the same restrictions apply to them as well.
  • Constraints defined on the table columns, such as NOT NULL, referential integrity etc., also apply when modifying data via views.
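
As a quick refresher on WITH CHECK OPTION (mentioned in the list above), here is a minimal sketch; the view name and the filter are made up for illustration, but the idea is that any row inserted or modified through the view must still satisfy the view's WHERE clause.

CREATE VIEW view_NonSmithCustomers
AS
SELECT CustomerID, FirstName, LastName, EmailAddress
FROM dbo.Customers
WHERE LastName <> 'Smith'
WITH CHECK OPTION
GO

-- This update would fail, because the modified row would no longer
-- satisfy the view's WHERE clause:
-- UPDATE view_NonSmithCustomers SET LastName = 'Smith' WHERE CustomerID = 1
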
In the previous post we created a view that retrieves records from multiple tables. Let's see if we can use the same view to update or delete a record.

CREATE VIEW view_Orders

AS
SELECT C.FirstName As CustomerFirstName, C.LastName As CustomerLastName,
       Count(O.OrderID) As TotalOrders,
       S.FirstName As SalesFirstName, S.LastName As SalesLastName
FROM Customers C
INNER JOIN Orders O ON C.CustomerID = O.CustomerID
INNER JOIN SalesPerson S ON O.SalesPersonID = S.SalesPersonID
GROUP BY C.FirstName, C.LastName, S.FirstName, S.LastName
GO

Since both the customer's and the sales person's first and last names are used in the GROUP BY clause, and the OrderID column is wrapped in the COUNT function (OrderID is an identity column, so you couldn't update it anyway, but the point stands), none of the columns in this view can be updated, and rows cannot be deleted through it.

What if I create a view like this...

CREATE VIEW view_OrdersList

AS
SELECT C.FirstName As CustomerFirstName, C.LastName As CustomerLastName,
       O.OrderID As OrderNumber,
       S.FirstName As SalesFirstName, S.LastName As SalesLastName
FROM Customers C
INNER JOIN Orders O ON C.CustomerID = O.CustomerID
INNER JOIN SalesPerson S ON O.SalesPersonID = S.SalesPersonID
GO

I should be able to use this view to update pretty much any column, i.e. the customer first and last name and the sales person first and last name. I can't update OrderNumber because it maps to an identity column.

In my database, this view returns a handful of records: one row per order, with the customer name, order number and sales person name.

I can run the following update commands one after another to update the customer last name and sales person last name using the same view, even though they are in two different underlying tables.

UPDATE View_OrdersList SET CustomerLastName='Roberts' WHERE 
CustomerLastName='Smith'
GO
UPDATE View_OrdersList SET SalesLastName='Kimberly' WHERE 
SalesLastName='Kimber'
GO

But what about updating both records using the same update query?

UPDATE View_OrdersList SET CustomerLastName='Roberts',SalesLastName='Kimberly' 
WHERE CustomerLastName='Smith' AND SalesLastName='Kimber'
GO

This will result in an error along the lines of "View or function 'View_OrdersList' is not updatable because the modification affects multiple base tables."

Needless to say, this violates the rules defined above, i.e. only one base table can be modified at a time, hence the error.

I can also insert through the same view, as long as the insert references columns from only one base table and no referential integrity or other constraints are violated. Deleting through this particular view, however, is more restrictive, because a delete removes entire rows and therefore generally requires the view to be based on a single table.
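
For instance, an insert like the following only touches columns of the Customers table, so it should go through. This is just a sketch: it assumes CustomerID is an identity column and that Customers has no other NOT NULL columns without defaults.

-- Inserts a new row into the underlying Customers table only
INSERT INTO view_OrdersList (CustomerFirstName, CustomerLastName)
VALUES ('Jane', 'Doe')
GO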

A view abstracts away some of the complexities and allows you to use the same view to select/update/insert/delete, although you still have to ensure all the conditions are satisfied just as you would when making modifications directly in tables.

Thank you.


Saturday, April 21, 2012

SQL Server Views

So far we have been discussing database management, backups, transactions, log shipping etc. In most shops developers are generally not concerned with these tasks; they fall under the domain of DBAs. We will now turn our attention to database development, primarily the tasks that fall in the domain of a developer, although admittedly today's topic could go either way.

In this post, we will take a look at views and how you can create them to simplify some complex operations. A view is a logical representation of your schema: it hides the underlying complexities of your database schema, table relationships etc.

For example, below is a subset of tables from my database with related tables tied together via referential integrity.



Let's assume you want to run the following query...

SELECT C.FirstName,C.LastName,Count(O.OrderID)As TotalOrders,S.FirstName,S.LastName
FROM Customers C INNER JOIN Orders O ON C.CustomerID=O.CustomerID
INNER JOIN SalesPerson S ON O.SalesPersonID=S.SalesPersonID
Group By C.FirstName,C.LastName,S.FirstName,S.LastName

Even though this query only joins three tables, it is already fairly involved. What if you could create a view that encapsulates all of this, so your application can simply query the view and get the same results?

CREATE VIEW view_Orders

AS
SELECT C.FirstName As CustomerFirstName, C.LastName As CustomerLastName,
       Count(O.OrderID) As TotalOrders,
       S.FirstName As SalesFirstName, S.LastName As SalesLastName
FROM Customers C
INNER JOIN Orders O ON C.CustomerID = O.CustomerID
INNER JOIN SalesPerson S ON O.SalesPersonID = S.SalesPersonID
GROUP BY C.FirstName, C.LastName, S.FirstName, S.LastName
GO

Now I can simply write a query against this view, which returns the same results as the query above:

SELECT * FROM view_Orders
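
And because the view behaves like a table in a query, you can filter it just as you would a table; for example (using the column aliases defined above):

SELECT * FROM view_Orders WHERE TotalOrders > 5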

In many ways, views are pseudo-tables. You can open a view in design mode and it will show you the tables used in that view and the relationships between them. For example, if I right-click on the view I created above and select Design, I see the three tables listed by their alias names. Also notice that the columns the view returns are checked, and the column being computed (OrderID) has a function symbol next to it.




While a view allows you to use most of the clauses available to SELECT, such as GROUP BY and TOP, and functions such as COUNT and SUM, you can't use an ORDER BY clause unless you also use TOP. For example, if I change my view to something like this...

CREATE VIEW view_Orders

AS
SELECT C.FirstName As CustomerFirstName,C.LastName As CustomerLastName,
Count(O.OrderID)As TotalOrders,S.FirstName As SalesFirstName,
S.LastName As SalesLastName FROM Customers C 
INNER JOIN Orders O ON C.CustomerID=O.CustomerID
INNER JOIN SalesPerson S ON O.SalesPersonID=S.SalesPersonID
Group By C.FirstName,C.LastName,S.FirstName,S.LastName 
ORDER BY C.FirstName
GO

I will get an error. But I can use the ORDER BY clause if I want to return the TOP x rows, like this...

CREATE VIEW view_Orders

AS
SELECT TOP 10 C.FirstName As CustomerFirstName,C.LastName As CustomerLastName,
Count(O.OrderID)As TotalOrders,S.FirstName As SalesFirstName,
S.LastName As SalesLastName FROM Customers C 
INNER JOIN Orders O ON C.CustomerID=O.CustomerID
INNER JOIN SalesPerson S ON O.SalesPersonID=S.SalesPersonID
Group By C.FirstName,C.LastName,S.FirstName,S.LastName 
ORDER BY C.FirstName
GO

If you want to order the result set returned by a view, you can always supply the ORDER BY clause in the query that uses the view

SELECT * FROM view_Orders ORDER BY CustomerFirstName

Views also allow you to add certain options when creating them. The two most important ones are WITH SCHEMABINDING and WITH ENCRYPTION.

If you create a view WITH SCHEMABINDING, you will not be able to alter the schema of any table referenced by the view without first dropping the view, or altering the view to remove the SCHEMABINDING option. For example, the above view can be created as


CREATE VIEW view_Orders WITH SCHEMABINDING

Also, you cannot use SCHEMABINDING unless you reference tables by their two-part names, i.e. schema name.table name. In my example all tables are in the dbo schema, so I will modify the statement like this...

CREATE VIEW view_Orders WITH SCHEMABINDING

AS
SELECT TOP 10 C.FirstName As CustomerFirstName,C.LastName As CustomerLastName,
Count(O.OrderID)As TotalOrders,S.FirstName As SalesFirstName,S.LastName As SalesLastName
FROM dbo.Customers C INNER JOIN dbo.Orders O ON C.CustomerID=O.CustomerID
INNER JOIN dbo.SalesPerson S ON O.SalesPersonID=S.SalesPersonID
Group By C.FirstName,C.LastName,S.FirstName,S.LastName ORDER BY C.FirstName
GO

You can also encrypt a view by using WITH ENCRYPTION. When a view is encrypted, you cannot open it in design mode or generate a CREATE, ALTER, or "DROP and CREATE" script for it; the only script you can generate is DROP. There is no supported way to decrypt the definition afterwards, so keep the original script somewhere safe before you encrypt.

CREATE VIEW view_Orders WITH SCHEMABINDING, ENCRYPTION

AS
SELECT TOP 10 C.FirstName As CustomerFirstName,C.LastName As CustomerLastName,
Count(O.OrderID)As TotalOrders,S.FirstName As SalesFirstName,S.LastName As SalesLastName
FROM dbo.Customers C INNER JOIN dbo.Orders O ON C.CustomerID=O.CustomerID
INNER JOIN dbo.SalesPerson S ON O.SalesPersonID=S.SalesPersonID
Group By C.FirstName,C.LastName,S.FirstName,S.LastName ORDER BY C.FirstName
GO
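
One way to confirm that the view is now encrypted is that its definition is no longer visible in the catalog; both of the following should return NULL for an encrypted view:

SELECT OBJECT_DEFINITION(OBJECT_ID('dbo.view_Orders')) AS ViewDefinition

SELECT definition
FROM sys.sql_modules
WHERE object_id = OBJECT_ID('dbo.view_Orders')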

Partitioned Views
A partitioned view is a view that selects data from multiple tables that have the same structure, with each table holding a slice (partition) of the overall data, hence the name. The results are combined using UNION ALL. The member tables can live in the same database, on the same server, or even on different servers (a distributed partitioned view).
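
As a rough sketch, suppose orders were stored in one table per year (Orders2011 and Orders2012 are made-up names); a partitioned view over them might look like this:

CREATE VIEW view_AllOrders
AS
-- Each member table holds one year's worth of orders
SELECT OrderID, CustomerID, SalesPersonID, OrderDate FROM dbo.Orders2011
UNION ALL
SELECT OrderID, CustomerID, SalesPersonID, OrderDate FROM dbo.Orders2012
GO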

Views are a powerful feature in the sense that they abstract away some of the underlying complexity and simplify working with data.

Although views are generally used for selecting data, you can also use them to insert and update data, though you are limited in what you can do when modifying data through a view. In the next post we will discuss using views to insert and update data.

Thank you


Saturday, April 14, 2012

Log Shipping Configuration - Things to watch out for

Recently I came across a situation where a server was configured for log shipping. Everything was set up properly and the primary server had the security rights needed to copy transaction log files to the secondary server.

Once configured, log shipping would work for a day, restoring the logs to the secondary database as you would expect, but then it would start failing. Based on the error message, it was clear that the log chain was being broken somehow.

The person in charge of database management had disabled differential backups, erroneously thinking they were somehow breaking the log chain.

Remember that full and differential backups do not break the transaction log chain. You break the chain either by switching the database to the simple recovery model (which truncates the log) or by taking a transaction log backup outside of the log shipping backup process.

Log shipping consists of the following main operations:

  • Backup Transaction Log
  • Copy it to the secondary (destination) server (if you configure the backup to be written directly to a location on the secondary server, this step may not be necessary).
  • Restore the transaction log to the secondary database (on the secondary server).
Each log shipping backup picks up where the previous one left off, based on the LSN, and the restores must happen in the same order, i.e. the previous log must be restored before the next one. Naturally, if an outside process takes a transaction log backup, it breaks the chain, and subsequent restores that skip that backup will fail. The only way to recover is to find the missing log backup or to re-initialize log shipping from scratch.
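
If someone genuinely needs an ad-hoc log backup on a log-shipped database, a copy-only backup is the safer option, since it does not affect the log backup sequence. A minimal sketch, where the database name and file path are placeholders:

-- A copy-only log backup does not truncate the log or disturb
-- the log shipping backup chain
BACKUP LOG MyDatabase
TO DISK = N'D:\AdHocBackups\MyDatabase_log_copyonly.trn'
WITH COPY_ONLY;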

Indeed, that was the case here. Even though there wasn't another transaction log backup process as such, a nightly job had been set up to shrink the log file, and as part of this process the job was first taking a log backup and then shrinking the file.

Transaction log backups automatically truncate the log (although it may not happen right away), but truncation only marks space inside the file as reusable; the physical file does not shrink. So if you are low on disk space it is reasonable to shrink the log file periodically, but the shrink job should not take its own log backup if transaction log backups or log shipping are already configured elsewhere.
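
A shrink job like that only needs the shrink itself. A minimal sketch follows, where the database name, the logical log file name and the 512 MB target are placeholders (you can look up the logical file name in sys.database_files):

USE MyDatabase;
GO
-- Shrink the log file to roughly 512 MB. Do NOT take a log backup here;
-- the log shipping backup job already handles log backups.
DBCC SHRINKFILE (N'MyDatabase_log', 512);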

To know more about transaction log shipping and how to configure it, check out this post.

Thank you.