Wednesday, February 29, 2012

Using Cursors in SQL Server


Cursors are generally not recommended and are slower than set-based methods of reading data, but there are situations where you must use them. I have found them useful when I am running an ad hoc SELECT against a table in Query Analyzer and performance is not a concern.

For example, consider a Stores table that has a one-to-many relationship with another table called StoreHours. Each store closes at a different time depending on the day of the week, and stores in different regions close at different times. You need to run a query to determine the latest closing time for each store. You can easily achieve this with a cursor.

As I mentioned earlier, cursors are rather slow, but you can improve their performance using the FAST_FORWARD and READ_ONLY options. Also remember to always close and deallocate the cursor when you are done.

Let's use a cursor to select the latest closing time for each store, insert it into a temporary table and then select the records from the temp table.

DECLARE @tmpStoreClosing TABLE (StoreID INT, ClosingTime DATETIME)
DECLARE @StoreID INT
-- FAST_FORWARD already implies FORWARD_ONLY and READ_ONLY
DECLARE curStore CURSOR FAST_FORWARD FOR
    SELECT StoreID FROM Stores
OPEN curStore
FETCH NEXT FROM curStore INTO @StoreID
WHILE @@FETCH_STATUS=0
    BEGIN
         INSERT INTO @tmpStoreClosing(StoreID,ClosingTime)
         SELECT TOP 1 @StoreID,ClosingTime FROM StoreHours 
         WHERE StoreID=@StoreID ORDER BY ClosingTime DESC
         FETCH NEXT FROM curStore INTO @StoreID
    END
CLOSE curStore
DEALLOCATE curStore

SELECT StoreID,ClosingTime FROM @tmpStoreClosing

You can also improve performance using the INSENSITIVE or STATIC keyword. When this keyword is used, the cursor copies the data into tempdb and reads it from there. The drawback of this approach is that any changes made to the underlying data while the cursor is open are not reflected.
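For example, a minimal sketch of a STATIC cursor over the same Stores table:

DECLARE curStoreStatic CURSOR STATIC FOR
    SELECT StoreID FROM Stores
OPEN curStoreStatic
-- fetch rows as usual; they come from the tempdb copy taken when the cursor was opened
CLOSE curStoreStatic
DEALLOCATE curStoreStatic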

FAST_FORWARD is the same as FORWARD_ONLY and READ_ONLY combined: the data can only be read, and only in a forward-only manner. When using this option, you can only use FETCH NEXT.

There are ways to achieve the same thing using a WHILE loop, which doesn't have the drawbacks of a cursor. We will discuss using a WHILE loop to achieve the same objective in a future post.

Cursors should be avoided, and if used, they should be used primarily for reading data in a forward-only manner.

Thank you.



Monday, February 27, 2012

SQL Server Locking Hints

                                                                SQL Server Locks - Part I, Part II, Part III, Part IV

We have been discussing the locking mechanism in SQL Server for the last three posts. In previous posts we covered locks, deadlocks and isolation levels. Today we will discuss ways to control how SQL Server locks data by passing locking hints.

Generally, leaving the SQL Server engine to handle locks works as expected, but there are circumstances where you may want to use locking hints to lock, or not lock, certain data for a transaction.

Locking Hints

  • HOLDLOCK - When you pass this hint to your select queries, shared locks are held until the entire transaction completes, instead of being released as soon as the read is complete and the lock (table, row or page) is no longer needed. HOLDLOCK is equivalent to the SERIALIZABLE isolation level.
  • NOLOCK - When you pass this hint, SQL Server doesn't issue shared locks for reading data and doesn't honor exclusive locks. In other words, if there is an exclusive lock on a row, page or table, a query with the NOLOCK hint will still execute instead of waiting for the exclusive lock to be released. With this hint, dirty reads are possible. This hint only applies to SELECT statements.
  • PAGLOCK - Forces SQL Server to issue page locks where individual row locks or a single table lock might otherwise be taken.
  • READCOMMITTED - Same as the READ COMMITTED isolation level, i.e. only read committed data.
  • READPAST - When reading data, if a row is exclusively locked, read past that row, i.e. any locked rows are skipped. Beware, your results may not include all the data if you use this hint. Also, you can only use this hint if the isolation level is set to READ COMMITTED. It also only applies to SELECT.
  • READUNCOMMITTED - Same as NOLOCK hint.
  • REPEATABLEREAD - Same as REPEATABLE READ isolation level, i.e. once a shared lock is acquired, it isn't released until the entire transaction completes. 
  • ROWLOCK - Forces SQL to issue row level locks instead of page and table level locks.
  • SERIALIZABLE - Same as the SERIALIZABLE isolation level, and the same as HOLDLOCK. Other transactions are blocked from modifying the data until the current transaction completes.
  • TABLOCK - Forces SQL to issue a table lock instead of row or page level lock. This lock is only held until the executing statement completes. However if you are running under SERIALIZABLE isolation level or also pass HOLDLOCK, the lock will be held until the end of the transaction.
  • TABLOCKX - Issues an exclusive lock on the table, preventing others from reading or updating the table. 
  • UPDLOCK - This is a special lock used when you are updating data. Data must first be read and then updated. Ordinarily, when the data is read a shared lock is issued, which is then converted to an exclusive lock to update the data. But if two transactions both acquire shared locks on the same data and each then tries to convert its lock to an exclusive lock, neither conversion can proceed and a deadlock results. To prevent this, SQL Server issues an update lock instead of a shared lock. Other transactions can still read the data while the transaction with UPDLOCK is reading it, but no other transaction can obtain an update or exclusive lock on that data. The update lock is then converted to an exclusive lock to modify the data. This ensures data integrity, since you can be sure the data hasn't been updated since this transaction read it.
  • XLOCK - An exclusive lock that will be held until the end of the transaction. You can specify it with PAGLOCK or TABLOCK, indicating whether to exclusively lock a page or a table.
Let's see a lock in action...

SELECT FirstName, LastName FROM CUSTOMERS WITH (NOLOCK)

SELECT FirstName,LastName FROM CUSTOMERS WITH (ROWLOCK)
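Hints can also be combined. For example, a hedged sketch of reading a row you intend to update, holding an update lock on just that row for the duration of the transaction (the Customers table is reused from above):

BEGIN TRAN
    -- read the row with an update lock so no other transaction can grab an update or exclusive lock on it
    SELECT FirstName, LastName FROM Customers WITH (UPDLOCK, ROWLOCK)
    WHERE CustomerID = 1
    UPDATE Customers SET LastName = 'Smith' WHERE CustomerID = 1
COMMIT TRAN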

Hope this helps you understand how you can use the various hints available to you. This was a long series, but hopefully it will be a good refresher on how the locking mechanism works in SQL Server.

Thank you.

Saturday, February 25, 2012

SQL Server Isolation Levels

                                                                    SQL Server Locks - Part I, Part II, Part III, Part IV

In the previous two posts we have been discussing the SQL Server locking mechanism and how deadlocks can occur. Today we will talk about isolation levels. Different isolation levels allow for different degrees of concurrent access to the same data.

Isolation is the degree to which one operation / transaction is isolated from other operations.

Serialization is an important concept in database management systems; it allows concurrent transactions to appear to run one after another, serially. For example, if one transaction is running a read operation on a set of data and another concurrent operation is running an update on the same data, both operations will be serialized in the order the requests were received by the database engine. If the read operation was requested first, a shared lock will be issued to that transaction and any other update/delete operations will be blocked or queued. Once the read operation is complete, an update lock will be issued to the update operation next in line, which will read the data and then receive an exclusive lock to update it.

But what if you don't really need this level of isolation? For example, you may want to read data even though it is exclusively locked by another transaction. You can achieve this by using a lower isolation level or by using locking hints (we will discuss locking hints in a future post).

The SQL-92 specification defines the following isolation levels.

Read Uncommitted - This is the lowest level of isolation. At this level, transactions are isolated only enough to ensure that physically corrupt data is not read. This level allows dirty reads, i.e. reading data that has not yet been committed.

Read Committed - This is the default level. Only committed data will be read. While a transaction holds an exclusive lock, all other transactions requesting a shared lock on that data will be blocked.

Repeatable Read - Repeatable read locks all the rows that it touches unlike read committed. Let's explain this with an example.

Let's say you are running a cursor where you are fetching one row at a time and then adding them to a temporary table (see below)

    DECLARE @tmpTable TABLE (FirstName varchar(50), LastName varchar(50),
                             Address varchar(255))

    DECLARE @customerID INT
    DECLARE curCustomer CURSOR READ_ONLY FOR
    SELECT CustomerID FROM Customers
    OPEN curCustomer
    FETCH NEXT FROM curCustomer INTO @customerID
    WHILE @@FETCH_STATUS=0
          BEGIN
              INSERT INTO @tmpTable(FirstName,LastName,Address)
              SELECT FirstName,LastName,Address FROM Customers
              WHERE   CustomerID=@customerID
              FETCH NEXT FROM curCustomer INTO @customerID
          END
    CLOSE curCustomer
    DEALLOCATE curCustomer

If you run this transaction at the read committed level, only one row will be locked for reading at a time and then the lock will be released. Imagine you read the first row, where the address was "123, Roswell Road, Atlanta, GA, 30330", and insert this row into @tmpTable. Now you lock the second row to read and insert into @tmpTable. Meanwhile, another concurrent transaction locks the first row and updates the address to "567, Peachtree Road, Atlanta, GA 30101". The record in your temp table is now incorrect.

If you run this transaction under repeatable read, this isolation level will lock the first row, then the second row, then the third row and so on, but will not release any lock until the entire transaction is complete. This ensures that the data read cannot be updated while the transaction is still running. While this promotes better data integrity, concurrency suffers because all other transactions must now wait for the resources to be released, possibly resulting in timeouts.

Although repeatable read protects the rows you have already read, phantom rows are still possible. If one transaction reads the data more than once, the rows are locked as they are read the first time and stay locked, but before the data is read a second time another transaction may insert new rows that satisfy the query's criteria, so the second read returns phantom data.

Serializable Level - The serializable level works by using key-range locks. Using the same example as above, when the data is read under this isolation level, the entire key range (in this case the entire table, since we are reading all records in the Customers table) is locked until the entire transaction completes. This prevents phantom rows, since other transactions must wait until this transaction ends. Serializable is the highest isolation level, and while it promotes data integrity, it reduces concurrency. To learn more about key-range locks, refer back to my post about SQL Server Locks.

In addition to these isolation levels, SQL Server supports two additional, row-versioning based options.

SNAPSHOT - A transaction running under snapshot isolation reads the version of each row that existed when the transaction started, using row versions stored in tempdb instead of shared locks. The ALLOW_SNAPSHOT_ISOLATION database option must be ON before this isolation level can be used.

READ_COMMITTED_SNAPSHOT - This database option changes the behavior of the default read committed level to use row versioning instead of shared locks when reading data. Read operations only require Sch-S (schema stability) table level locks (i.e. to prevent DDL modifications); no page or row level locks are needed. When reading rows modified by another transaction, the version of the row that existed when the statement started is read. Unlike snapshot isolation, this option does not require ALLOW_SNAPSHOT_ISOLATION to be ON.

You can set isolation level by using the following syntax...

    SET TRANSACTION ISOLATION LEVEL
    {
        READ UNCOMMITTED | READ COMMITTED | REPEATABLE READ
        | SNAPSHOT |  SERIALIZABLE
     }
For Example,

    SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    GO
    DECLARE @tmpTable TABLE (FirstName varchar(50), LastName varchar(50),
                             Address varchar(255))

    DECLARE @customerID INT
    DECLARE curCustomer CURSOR READ_ONLY FOR
    SELECT CustomerID FROM Customers
    OPEN curCustomer
    FETCH NEXT FROM curCustomer INTO @customerID
    WHILE @@FETCH_STATUS=0
          BEGIN
              INSERT INTO @tmpTable(FirstName,LastName,Address)
              SELECT FirstName,LastName,Address FROM Customers
              WHERE CustomerID=@customerID
              FETCH NEXT FROM curCustomer INTO @customerID
          END
    CLOSE curCustomer
    DEALLOCATE curCustomer
    GO

You can set ALLOW_SNAPSHOT_ISOLATION and READ_COMMITTED_SNAPSHOT option at the database level by using the following script...
    ALTER DATABASE <DBName>
    SET
    {
      ALLOW_SNAPSHOT_ISOLATION {ON | OFF}
       | READ_COMMITTED_SNAPSHOT {ON | OFF}
    }

Hopefully this post will help you determine the best isolation level if you must change it, although the SQL Server defaults work well in most cases.

Thank you!

Thursday, February 23, 2012

SQL Server Deadlocks

                                                                   SQL Server Locks - Part I, Part II, Part III, Part IV

Deadlocks
In the previous post we discussed the locking mechanism in SQL Server, the different types of locks, and how you can control the default locking behavior by passing various hints with your queries.

Today, we will discuss deadlocks. Locking is a necessity for any DBMS to work in a multi-user environment and to maintain data consistency. A deadlock occurs when two transactions are each waiting on the other to release a lock. Eventually SQL Server chooses one of the transactions as the victim and kills it.

Let's explain this using an example -

Suppose you have a Customers table and you run the following queries in two separate transactions.

Transaction A
--statement 1
SELECT * FROM Customers WHERE CustomerID = 1
--statement 2
UPDATE Customers SET EmailAddress='123@mail.com' WHERE CustomerID = 2

Transaction B
--statement 1
SELECT * FROM Customers WHERE CustomerID = 2
--statement 2
UPDATE Customers SET EmailAddress='123@mail.com' WHERE CustomerID = 1

Let's assume one user calls a routine that executes transaction A and another user calls a routine that executes transaction B concurrently.

Assuming the transactions hold their shared locks (for example, they run under the REPEATABLE READ isolation level or use the HOLDLOCK hint), transaction A will acquire a shared lock on row 1 while executing statement 1, and transaction B will acquire a shared lock on row 2 while executing its statement 1.

Now transaction A will request an exclusive lock on row 2, while transaction B will request an exclusive lock on row 1.

Transaction A cannot complete because it is waiting on transaction B, while transaction B cannot complete because it is waiting on transaction A.

This condition is also called a cyclic dependency. The SQL Server deadlock monitor periodically checks for cyclic dependencies, and when it detects one, it chooses one of the transactions as the victim and terminates it. The transaction chosen as the deadlock victim is rolled back, while the other transaction completes successfully.

Remember, deadlocking is different from blocking. Blocking occurs when a transaction requests a lock on a resource that is exclusively locked by another transaction; in that case the requesting transaction simply waits for the lock to be released.
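The victim transaction receives error 1205 and is rolled back; the application is expected to retry it. A minimal sketch of such a retry, reusing the Customers table from the example above:

DECLARE @retry INT
SET @retry = 3
WHILE @retry > 0
BEGIN
    BEGIN TRY
        BEGIN TRAN
        UPDATE Customers SET EmailAddress='123@mail.com' WHERE CustomerID = 2
        COMMIT TRAN
        SET @retry = 0                    -- success, stop retrying
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRAN
        IF ERROR_NUMBER() = 1205          -- chosen as the deadlock victim
            SET @retry = @retry - 1       -- try again
        ELSE
            SET @retry = 0                -- some other error, handle or re-raise as needed
    END CATCH
END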

Minimizing Deadlocks
Deadlocks are a fact of life, but you can minimize them by following some basic steps...

  • Objects should be accessed in the same order
    • All concurrent transactions should access objects in the same order. For example, if there are two tables - Orders and Order Details - all concurrent transactions should request locks in the same order, say first on the Orders table and then on the Order Details table. If all lock requests flow in the same direction, subsequent requests will be blocked until a lock is released from the previous request, but a deadlock should not occur.
  • Avoid prompting for User input when a transaction is running
    • A transaction should complete as fast as possible so that locks are acquired and released quickly. Requiring user input delays the transaction, so resources stay locked for a longer period of time.
  • Keep Transactions Short and in one batch
    • Keeping transactions short and in one batch will minimize resources, resulting in faster transaction completion.
  • Use a lower Isolation Level
    • We will discuss isolation levels in more detail in a subsequent post, but the lower the isolation level, the less locking. For example, the read committed isolation level allows a transaction to read data previously read (but not modified) by another transaction without waiting for that transaction to complete.
  • Set READ_COMMITTED_SNAPSHOT to ON
    • When this database option is ON, a transaction running under the read committed isolation level uses row versioning instead of shared locks during read operations, so readers do not block writers. Note that READ_COMMITTED_SNAPSHOT is independent of the ALLOW_SNAPSHOT_ISOLATION option, which enables the separate SNAPSHOT isolation level.
  • Use bound connections
    • If your application opens two or more connections, you can bind them together so that one is the primary connection and the others are secondary connections. Under this scenario, any locks acquired by the primary connection are treated as if they were also acquired by the secondary connections, and vice versa.
This concludes our topic for today. There are some other aspects of locking such as providing locking hints with your queries or setting up different isolation levels. We will discuss them in subsequent posts.

Thank you.


Sunday, February 19, 2012

SQL Server Locks

                                                                    SQL Server Locks - Part I, Part II, Part III, Part IV

Locking is a major part of any relational database, including SQL Server. The topic of locking, how the database engine issues locks, and how you can troubleshoot the locking problems your DBMS may be facing is quite large and complex. As a result, we will be covering it in multiple parts. In this part, I will explain the types of locks available to SQL Server and how they are used. Subsequent posts will cover them in more detail.

Types of Locks 


Shared Locks
Shared locks are issued at the database level when you connect to a database. They are also issued when you run a SELECT statement against a table, i.e. when you read data under the pessimistic concurrency model. When a shared lock is issued, more than one transaction can read the data, but the data cannot be modified. After the data has been read the lock is released, unless you run your query with the READCOMMITTED or READCOMMITTEDLOCK locking hint (more on these later) or you are using a more restrictive isolation level.

Update Locks
Update locks are a combination of shared and exclusive locks. When you issue an UPDATE statement, SQL Server first searches for the data to be modified. Since the engine has to read the data before modifying it, it could in theory issue a shared lock while reading and then convert it to an exclusive lock for the update, but if two transactions did this on the same data at the same time it could result in a deadlock. So instead of acquiring a shared lock and upgrading it to an exclusive lock, SQL Server uses an update lock. It is very similar to an exclusive lock in that only one update lock can be held on the data at a time. The difference between an update lock and an exclusive lock is that before the data is modified, the update lock has to be converted to an exclusive lock, since an update lock by itself cannot modify the data. You can also provide the UPDLOCK hint in your statements to force an update lock.

Exclusive Locks
These locks are issued to lock data that is being modified by one transaction and to prevent modifications by other concurrent transactions. You cannot even read data held under an exclusive lock unless you provide the NOLOCK hint to your select query or you are using the read uncommitted isolation level. Since data must be read before it can be modified, exclusive locks are usually preceded by shared or update locks on the same data. The NOLOCK hint allows you to read data locked by an exclusive lock, but you may end up reading dirty data, i.e. data that has not yet been committed.

Intent Locks
Intent locks are a mechanism for one transaction to notify other transactions that it intends to lock data. Generally, a transaction issues an intent lock on the objects that are higher in the lock hierarchy than what it is currently locking. For example, if a transaction has an exclusive lock at the row level, it will place intent locks at the page and table level. This prevents other transactions from locking the higher-level objects in an incompatible way. The way this works is that before a transaction obtains a lock at the row or page level, it first sets an intent lock on the table. Thus, you get a more granular lock (update lock or exclusive lock) at the lower level after setting an intent lock on its parent.

Schema Locks
When the database engine is generating execution plans, it issues a schema stability (Sch-S) lock. It doesn't block access to the underlying data. A schema modification (Sch-M) lock is used when a DDL (Data Definition Language) statement is being executed, i.e. the schema is being modified. This lock blocks access to the underlying data.

Bulk Update Locks
These locks are issued when you perform bulk operations and use the TABLOCK hint. They allow multiple bulk insert threads to load data into the table concurrently while preventing other transactions from reading the data. In essence this is a table lock.

Conversion Locks
Converting one type of lock to another results in a conversion lock. There are three types of conversion locks.
  • Shared With Intent Exclusive - A transaction with a shared lock also has exclusive intent lock on some pages/rows.
  • Shared with Intent Update - A transaction with a shared lock also has an update intent lock on some pages/rows.
  • Update with Intent Exclusive- A transaction with an update lock also has an exclusive intent lock on some pages/rows.
Key-Range Locks
When using the serializable transaction isolation level, any query that is executed more than once in a transaction must return the same set of rows. For example, if a query returns 10 rows the first time it is executed, any subsequent execution of the same query within the same transaction must return the same 10 rows. If the query tries to fetch a row that doesn't exist, that row cannot be inserted by other transactions until the first transaction completes, because if a second transaction were allowed to insert the row, it would appear as a phantom to the first transaction. A key-range lock locks the index rows and the ranges between those index rows. Because any attempt by a second transaction to insert, update or delete a row within this range would modify the index, the key-range lock blocks the second transaction until the first transaction completes.

Two common key-range lock modes are
  • RangeX-X - Exclusive lock is issued on the interval between the keys and an exclusive lock is issued on the last key in the range.
  • RangeS-U - Shared lock is issued on the interval between the keys and update lock is issued on the last key in the range.
Spin Locks
A lightweight synchronization mechanism in which a thread does not immediately yield when a resource is already locked, but instead spins for a short period of time waiting for the lock to become free. It is a mutual-exclusion mechanism used internally to reduce context switching between multiple threads.

Lock Granularity
Locks can be issued against a table, a page or a row. If a table has a clustered index, there are also key locks. Locking at a lower level, with intent locks at the higher levels, increases concurrency since multiple operations can get the locks they need. Granularity also depends on the isolation level you choose: the more restrictive the isolation level, the more (and higher level) locks are taken to keep the data intact. SQL Server allows you to pass hints such as NOLOCK, ROWLOCK, PAGLOCK or TABLOCK to override the default locking behavior for an isolation level, although I would discourage using locking hints except in very rare situations.
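If you want to see which locks are currently held or requested, a quick way on SQL Server 2005 and later is to query the sys.dm_tran_locks dynamic management view; a minimal sketch:

SELECT resource_type,      -- DATABASE, OBJECT, PAGE, KEY, RID, ...
       request_mode,       -- S, U, X, IS, IX, Sch-S, ...
       request_status,     -- GRANT, WAIT, CONVERT
       request_session_id
FROM sys.dm_tran_locks
ORDER BY request_session_id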

As I mentioned at the beginning of this post, Locking is a complex topic and we will be covering it in multiple posts. So, watch out for the next post.

Thank you and your comments are welcome!

Saturday, February 18, 2012

SQL Server Transactions

Transactions in SQL Server allow you to batch multiple inserts/updates/deletes into one single unit of work; an error in one sub-step will invalidate, or roll back, the entire transaction. This ensures data integrity and atomicity.

Let's assume for example, you have a set of three tables in a SQL database
  • An Employee Profile table
  • An Earnings table recording the salary, taxes paid, etc.
  • A TaxRate table that records an employee's tax rate.
If the tax rate changes, you would want to update the TaxRate table first, then update the Earnings table to reflect the taxes paid (based on the new tax rate). For example, if the earnings were $100,000 and the tax rate was 20%, the taxes paid in the Earnings table would be $20,000. Now, if the tax rate jumps to 25%, the TaxRate table would be updated to reflect a tax rate of 25% and the Earnings table would be updated to show the new tax amount of $25,000.

Now let's assume you are performing these operations as individual updates without wrapping them in one transaction.

      --1st you update tax rate table
       UPDATE TaxRate SET rate=0.25 WHERE EmployeeID=1
       GO
      -- now you update taxes paid 
      UPDATE Earnings SET TaxesPaid = 
      (SELECT Salary * 0.25 FROM Earnings WHERE EmployeeID=1)
      WHERE EmployeeID=1
      GO

Now, let's imagine the first update executed successfully but a deadlock or some other error occurred before the second update could be performed (more on deadlocks in future posts), resulting in out-of-sync data. The TaxRate table will show a tax rate of 25% but the Earnings table will still show only $20,000 in taxes paid.

You want to ensure that either both tables are updated together or neither is. You can wrap both updates in one transaction, and if an error occurs, the entire transaction will roll back, ensuring data integrity.

Here is how you would wrap the above updates in a transaction.

DECLARE @errorcode INT
BEGIN TRAN
    -- 1st update the tax rate table
    UPDATE TaxRate SET rate=0.25 WHERE EmployeeID=1
    -- check if an error occurred (@@ERROR reports any error raised by the previous statement)
    IF @@ERROR <> 0 GOTO ERROR
    -- now update the taxes paid
    UPDATE Earnings SET TaxesPaid =
        (SELECT Salary * 0.25 FROM Earnings WHERE EmployeeID=1)
        WHERE EmployeeID=1
    -- check again if an error occurred
    IF @@ERROR <> 0 GOTO ERROR
COMMIT TRAN
RETURN    -- skip the error handler when both updates succeed

ERROR:    -- GOTO ERROR jumps here whenever an error occurs (the label must be in the same batch, so no GO separators)
PRINT ('An error occurred')
ROLLBACK TRAN
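On SQL Server 2005 and later, the same script can also be written with TRY/CATCH instead of checking @@ERROR after every statement; a minimal sketch:

BEGIN TRY
    BEGIN TRAN
        UPDATE TaxRate SET rate=0.25 WHERE EmployeeID=1
        UPDATE Earnings SET TaxesPaid =
            (SELECT Salary * 0.25 FROM Earnings WHERE EmployeeID=1)
            WHERE EmployeeID=1
    COMMIT TRAN
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRAN
    PRINT ('An error occurred: ' + ERROR_MESSAGE())
END CATCH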

This allows you to declare a transaction at the SQL script level - in a stored procedure, for example. You can achieve the same effect by using an ADO.NET SqlTransaction or a TransactionScope object in code when executing two or more separate stored procedures or in-line SQL statements.

Using Transaction
// requires: using System; using System.Data.SqlClient;
public void updateTaxInformation (string connectstring)
{
   using (SqlConnection conn = new SqlConnection(connectstring))
   {
      conn.Open();
      SqlCommand cmd = conn.CreateCommand();    // create sql command
      SqlTransaction trans;                     // the sql transaction
      // start the transaction
      trans = conn.BeginTransaction("TransName");
      // assign connection and transaction to the command
      cmd.Connection = conn;
      cmd.Transaction = trans;
      try
      {
           cmd.CommandText = "UPDATE TaxRate SET rate=0.25 WHERE EmployeeID=1";
           cmd.ExecuteNonQuery();
           // second update
           cmd.CommandText = "UPDATE Earnings SET TaxesPaid = " +
                             "(SELECT Salary * 0.25 FROM Earnings WHERE EmployeeID=1) " +
                             "WHERE EmployeeID=1";
           cmd.ExecuteNonQuery();
           // commit the transaction
           trans.Commit();
      }
      catch (Exception ex)
      {
          Console.WriteLine("Exception Occurred: " + ex.Message);
          trans.Rollback();
      }
   }
}

Using TransactionScope
Beginning in .NET 2.0, Microsoft introduced TransactionScope, which basically does the same thing as above, with a few subtle differences. TransactionScope simplifies some of the code and takes care of enlisting SQL commands in a transaction automatically. The above code can be simplified using TransactionScope:

// requires: using System.Data.SqlClient; using System.Transactions;
using (TransactionScope transscope = new TransactionScope())
{
   using (SqlConnection conn = new SqlConnection(connectionstring))
   {
      conn.Open();
      SqlCommand cmd = conn.CreateCommand();    // create sql command
      cmd.CommandText = "UPDATE TaxRate SET rate=0.25 WHERE EmployeeID=1";
      cmd.ExecuteNonQuery();
      // second update
      cmd.CommandText = "UPDATE Earnings SET TaxesPaid = " +
                        "(SELECT Salary * 0.25 FROM Earnings WHERE EmployeeID=1) " +
                        "WHERE EmployeeID=1";
      cmd.ExecuteNonQuery();
      // mark the transaction as complete so it commits when the scope is disposed
      transscope.Complete();
   }
}
       
That's all there is to it. While a transaction has its advantages, such as maintaining data integrity, you have to be careful when creating one. Large transactions could result in dreaded deadlocks. We will discuss deadlocks in future posts.

Thank you and your comments are welcome.








Thursday, February 16, 2012

Database Mail - Send Emails from SQL Server

Today, we will discuss how to configure and send emails using SQL Server 2008, although the setup process is more or less the same in SQL 2005 and SQL 2008. Prior to SQL Server 2005, database mail was called SQL Mail and lacked a few features that Microsoft introduced in Database Mail.

Before you can send emails using SQL Server, you must first create a profile to use when sending emails. Below are step by step instructions on configuring database mail using management studio.

1. Expand Management and right click on Database Mail

2. Click on Next


3. Select the first option "Set up Database Mail by performing the following tasks".


4. Enter a profile name and click on Add. Note, since you won't have a profile setup, you will see the screen in step 5 below.


5. This is where you will enter the SMTP server information. In theory, if you are running IIS on the database server, you can configure IIS SMTP mail (IIS 7.x no longer has this feature, although there are still ways to configure it for now, but I expect this to go away soon). Whether you should run IIS on a database server or not is a whole other argument.

Alternatively, you can configure SMTP server on another machine in your network and use it to send emails from both SQL Server via database mail and also from other applications.


6. Click on OK and click on Next, which will show you the following screen. You can leave defaults in most cases, but you can change the file attachments that you want to prohibit here.


7. Click on Next and click on Finish. Now Database mail is configured and ready to be used. You can right click on Database Mail again and click on Send Test Email to send a test email.

Before you send a test email, make sure SQL Server is configured to send emails. Run the following script.


sp_CONFIGURE 'Show Advanced', 1
GO
RECONFIGURE
GO
sp_CONFIGURE 'Database Mail XPs',1
GO
RECONFIGURE
GO


Before you can use the database mail to send emails or alerts from SQL Agent jobs, you need to add operators.

8. Right click on Operators from the Management Studio.

9. Setup one or more Email Operators


10. Now you are ready to configure a SQL Agent job to send an alert or a notification email based on an action. I will not go through creating a SQL Agent job in this post, but let's assume you have a job that should send you an alert or an email when it fails. First, we will configure an alert. Right click the job, click on Properties, and then click on Alerts.


11. You can pick an operator that you created below and the type of alert you want to send. Response and Options allows you to configure additional options.


12. Now let's set up Notifications. Click on OK on the Alerts screen to go back to the main screen and click on Notifications. Here you can select the email or page that you want to send and the condition, along with the operator you want to send it to.

Click on OK, and you have configured database mail to work with your SQL agent job.

You can also use "sp_send_dbmail" to send an email using SQL script

USE msdb
GO
EXEC sp_send_dbmail @profile_name='YourProfile', @recipients='youremail@email.com',
@subject='Test',@Body='Test Email'

SQL Server stores email information in the following views in the msdb database. You can run SELECT statements against them to check whether an email has been successfully processed or has failed.

sysmail_allitems, sysmail_sentitems, sysmail_unsentitems, sysmail_faileditems.
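For example, a quick check of recent mail items and their status might look like this:

USE msdb
GO
SELECT mailitem_id, recipients, subject, sent_status, sent_date
FROM dbo.sysmail_allitems
ORDER BY mailitem_id DESC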

sysmail_mailitems holds the underlying emails; when an email is sent or fails, its sent_status flag is updated and the email shows up in the appropriate view above.

You can also check the log of all the emails in sysmail_log table.

Database Mail is a powerful feature and you can use it to ensure you are getting notified when a critical job didn't execute as it should have.

As always, your comments are welcome.

Tuesday, February 14, 2012

Database Replication

In previous posts we discussed transaction log shipping and database mirroring and the key differences between the two. Continuing the series, today we will discuss database replication.

There are several uses for replication. For Example, disaster recovery or for reporting (analysis).

When used for disaster recovery, replication is generally set up between geographically dispersed data centers. For example, suppose you are hosting your servers with a hosting company that has multiple data centers located around the country. You can set up replication so that your database is replicated in one or more other data centers. If the database in your primary data center becomes inaccessible, traffic can be re-routed to a secondary data center.

Apart from disaster recovery, another use I have seen is to periodically replicate data from the primary server to a secondary server for running reports or heavy analysis. This way, you can run your reports, set up data cubes, etc. on the secondary server while the primary server handles online transaction processing.

A key difference between transaction log shipping, mirroring and replication is that the secondary databases in both log shipping and mirroring are in standby mode and are mostly unusable, whereas in replication both the primary and secondary databases are active.

Replication can be configured in several ways such as...

  • Snapshot Replication - Data is simply copied from the main server to one or more secondary servers, sometimes even to a secondary database on the same server.
  • Merge Replication - Data from two or more databases is merged into a single database.
  • Transactional Replication - A full initial copy of the database is created on one or more secondary servers and then periodic updates are applied.

A distributed database management system ensures that any inserts/updates/deletes performed on the data in one location are automatically synchronized with the other locations.

There are two modes of replication - synchronous and asynchronous. In synchronous mode, the data is written to both the primary and the secondary server(s) at the same time, while in asynchronous replication there is a delay before the data is written to the secondary server. While asynchronous is a better option for performance reasons, data loss could occur in case of a disaster.

Replication generally occurs either in the storage array (the array of disks where the data is stored), at the server, or in the network. Most SAN vendors provide replication software that you can use to set up replication at the storage array level. There are also software solutions for setting up replication at the server level when you don't have a storage array; this is usually cheaper but could impact server performance. Network replication is usually hardware based, using an appliance.

Below are some of the key differences between log shipping and replication.

  • Latency - A minute or more in Log Shipping, just few seconds in Replication
  • Schema altered at the primary server (also known as publisher)?
    • Log Shipping - No
    • Replication
      • Snapshot - No
      • Transactional - No
      • Merge / Updates- Yes
        • A GUID column is added if one doesn't exist.
  • Schema altered at the secondary server (also known as subscriber)?
    • Log Shipping - No
    • Replication - Depends - see below
      • Snapshot and Transactional Replication may make schema changes. Standard Snapshot and Transactional replication will not transfer identity attributes. Instead they become normal numeric columns on the secondary server.
  • Are there any requirements at the Database Schema level?
    • Log Shipping - No
    • Replication - Primary Keys are required on the tables involved in replication.
  • Can you Select individual Articles?
    • Log Shipping - No
    • Replication - Yes
  • Is the Secondary server usable 
    • Log Shipping - Mostly No - Requires exclusive lock to apply the transaction log.
    • Replication - Yes
Hopefully this article gives you enough insight into replication and how to get started if you are thinking about setting it up.

Thank you and as always, your comments are welcome.





Monday, February 13, 2012

Clustered vs. Non-Clustered Index

All About SQL!
The other day someone asked me about the difference between clustered and non-clustered indexes in SQL Server and why you can only have one clustered index on a table. It is one of those things that you know but often find difficult to explain in a non-technical manner; at least that's how I felt. So, what is the difference, and why can you only have one clustered index?
Clustered Index: A clustered index physically sorts the data in the table according to the keys of the clustered index. Basically, all the data rows are stored in the order of the clustered index key. Since you can only sort the data rows one way, there can only be one clustered index. The lowest level of a clustered index is the data rows themselves. If you define a primary key on the table, SQL Server by default creates a clustered index on the primary key (unless a clustered index already exists or you specify otherwise).
You don't have to have a clustered index; a table without one stores its data in a heap. Rows in a heap have no guaranteed order, so if you need results in a particular order you must apply an ORDER BY clause to your query, which is slow without a supporting index.
Non-Clustered Index: A non-clustered index, on the other hand, is stored independently of the data rows' physical order. Think of this as a logical ordering of the data rows, and hence there can be more than one non-clustered index. The lowest level of a non-clustered index contains the keys used in the index and a pointer to the actual data row containing that key. This pointer is called a row locator, and its structure depends on whether the table has a clustered index or not. As mentioned earlier, if a table does not have a clustered index, the data rows are stored in a heap and the row locator is simply a pointer to the row. For tables that do have a clustered index, the row locator is the key of the clustered index.
Both clustered and non-clustered indexes can be unique. A primary key is by default a unique index, whether it is defined as a clustered or non-clustered index. Unique indexes ensure that no two rows can contain the same key values.
IGNORE_DUP_KEY: When creating a unique index, you can also use the IGNORE_DUP_KEY option. When a duplicate key is inserted, if IGNORE_DUP_KEY was specified, SQL Server will issue a warning and ignore the duplicate row; if it was not specified, SQL Server will issue an error message and roll back the entire insert. IGNORE_DUP_KEY is only allowed when you create a UNIQUE clustered or non-clustered index.
FILL FACTOR: The fill factor defines how densely the index pages are packed when the index is created. If you are going to have many inserts and updates, you should use a lower fill factor to leave room for new data, which reduces the number of page splits that may occur when data is inserted. The free space applies only when the index is created or rebuilt, not as data is inserted or updated, so it is recommended to rebuild indexes periodically. If the table is rather static, create the index with a high fill factor.
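For example, a minimal sketch combining these options on a hypothetical Products table (the table and SKU column are just placeholders):

CREATE UNIQUE NONCLUSTERED INDEX idx_Products_SKU
ON dbo.Products (SKU)
WITH (IGNORE_DUP_KEY = ON, FILLFACTOR = 80)
GO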
Indexes generally speed up retrieval, updates and deletions but often slow down inserts, because every new record has to be added to each index. Care should be taken with the number of indexes on a table, especially when data is frequently inserted.
While you can create some indexes during database design, often it is not possible to know how a table will be used until it is out in the field. Once the database is in the field and you want to optimize certain heavily used tables, you can do the following:
  • In SQL Management Studio (SQL 2005 or SQL 2008) go to Tools > SQL Profiler and run the trace for the tables that are heavily used.
  • Collect some metrics and save the file.
  • Go to Tools > Database Engine Tuning Advisor and load the workload file, select the database and start analysis.
  • Review the recommendations and apply them to the appropriate tables.
I have generally found the tuning advisor to be a good tool, but be careful in applying all its recommendations. As always, do this in staging environment first, before applying to your production database.
As always, your comments are welcome and feel free to correct me if I made any mistake!

Benefits of Database Indexing

All About SQL!
In my previous Post I talked about various types of indexes you can create in SQL Server. Today we will discuss the benefits of Indexing and why you should have at least a clustered index in every table.
Every database system offers some kind of indexing that allows for sublinear-time lookups, as opposed to a linear search, which is highly inefficient. For example, if a table is un-indexed, the database engine must examine each record to find the searched record, resulting in approximately half the data rows being scanned on average. Further, if the searched record doesn't exist, all the data rows will be scanned before a "not found" result is returned. Indexing allows for faster searches, resulting in improved performance.
Indexing, however, is a double-edged sword. More indexes generally result in faster data retrieval from a table, but the more indexes you have, the longer it will take to insert new data into the table.
Depending on your table definition, some indexes are automatically created by SQL Server, for example to enforce a UNIQUE constraint. You can create other indexes either via the CREATE INDEX statement or by using SQL Management Studio.
Below are some of the guidelines you can use when creating indexes. This is not an exhaustive list and may not be suitable in all situations, so use this as a general guideline.
1. When creating an index, choose the right data type. Certain data types are more suitable for indexing than others. Integer data types (INT, BIGINT, SMALLINT, TINYINT) are good candidates because of their fixed size and because they are easy to compare. CHAR, VARCHAR, etc., on the other hand, are much less efficient.
2. Make sure that indexes are actually used. I have often seen database schemas where indexes were created in advance but none of the data retrieval operations use the indexed keys.
3. Also keep in mind that when a retrieval query applies a SQL function to a key that is part of the index, the index is not used for that query. For example, if you convert a date column that is part of the clustered index to a string while retrieving the data, the index won't be used in the retrieval (see the sketch after this list).
4. If you create an index with multiple columns, pay attention to the order of the columns. Data is stored in the order the columns are listed in the index. The column with less unique data should be used first, and so on.
5. Remember that each non-clustered index row contains the clustered index key as its row locator (assuming a clustered index exists on the table). Hence, care should be taken in selecting the number and size of columns used in the clustered index. For the same reason, try to avoid or limit using columns that are frequently updated in a clustered index.
6. Be extra careful when rebuilding a clustered index. I have seen it many times where someone created a SQL job to drop a clustered index and then used the CREATE INDEX command to recreate it. When you drop a clustered index and then recreate it, you end up rebuilding the non-clustered indexes twice. To avoid this, use the DROP_EXISTING clause with CREATE INDEX, which rebuilds the non-clustered indexes only once.
As explained in a previous Post, use the fill factor wisely.
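Here is the sketch mentioned in point 3, assuming a hypothetical Orders table with an index on OrderDate:

-- the function on the indexed column prevents the index from being used (a scan results)
SELECT OrderID FROM Orders
WHERE CONVERT(varchar(10), OrderDate, 101) = '02/01/2012'

-- rewriting the predicate as a range on the raw column allows the index to be used
SELECT OrderID FROM Orders
WHERE OrderDate >= '20120201' AND OrderDate < '20120202'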
I realize this is a long post, but I wanted to keep all the crucial points in one place to make it easier to read. As always, your comments are appreciated; please don't hesitate to add points I may have missed.

Sunday, February 12, 2012

Database Mirroring

All About SQL! In one of my previous posts I talked about transaction log shipping and how it can be used to create a copy of a database on a secondary server and periodically apply the transaction logs to it. Mirroring is a similar process, except for a few differences.

In log shipping, transaction logs are periodically backed up, physically copied to the secondary server, and applied to a database that is in recovery mode. In mirroring, SQL Server reads the transaction log records directly and sends them from the principal server to the mirrored server.

Mirroring can be configured to operate synchronously or asynchronously. In the synchronous configuration, the primary (or principal) server sends the data to the mirrored server and waits until the mirrored server saves the data to disk before committing the transaction. Essentially, the buffered log records are sent to the mirrored server and the principal waits for an acknowledgment from the mirrored instance. Only when that acknowledgment is received is the transaction committed on the principal. As you can imagine, the performance of your database then depends on several factors, such as network bandwidth and disk write speed.

In asynchronous mode, the principal server doesn't wait for the mirrored server before committing the transaction. It sends the buffered log records to the mirrored instance while simultaneously committing the transaction.

Mirroring supports only one mirrored instance. This is yet another difference between mirroring and transaction log shipping, which supports multiple secondary servers.

Mirroring supports automatic failover when a witness server is configured and the session is running in synchronous (high-safety) mode. In case of a failure on the principal server, SQL Server automatically brings the mirrored instance online, usually within a few seconds.

While Log Shipping has been available since SQL 2000, mirroring was introduced only in SQL Server 2005.

Another key advantage of mirroring is built-in support in the .NET Framework, which means you don't need special routing or switching code to handle the failover from one server to another (the mirror is specified as the failover partner in the connection string). It requires ADO.NET 2.0 or higher.

SQL Server 2008 introduced another key feature in mirroring: the ability to automatically repair corrupt pages. For example, if a page is corrupted on the principal server, SQL Server will replace it with the copy of the page from the mirror, and vice versa.

We can go into more technical detail in further posts if needed, but in a nutshell this is what mirroring is all about.

As always, your comments are welcome!

Saturday, February 11, 2012

Database Recovery Models

All About SQL! In my previous post, we talked about transaction log shipping and noted that for log shipping your database must be set to the Full or Bulk-Logged recovery model. Recovery models specify the way SQL Server manages log files. There are three recovery models:

  • Simple Recovery Model
  • Full Recovery Model
  • Bulk-Logged Recovery Model
Simple Recovery Model

As the name implies, the simple recovery model is just that: simple. SQL Server maintains the transaction log only until the database reaches a transaction checkpoint; each time a checkpoint is reached, SQL Server truncates the transaction log. Hence, point-in-time restores are not possible if you have to restore the database. You can only restore such a database by using full or differential backups, which means you will lose any changes made between the latest full / differential backup and the time you restore the database.

Full Recovery Model

In this model, SQL Server keeps the transaction log records until you back them up. As you can imagine, this model allows for point-in-time database recovery. You restore the database by first applying the latest full / differential backup and then applying the transaction log backups up to the desired point in time. Of course, you will have to have a transaction log backup process in place to back up the log periodically.

Bulk Logged Recovery Model


This model is similar to the full recovery model except in the way it handles bulk data operations (for example, BULK INSERT). It uses a process called minimal logging for bulk operations, which saves significant log space and processing time. However, it doesn't allow a point-in-time restore from a log backup that contains bulk-logged operations. It is recommended to use this model only for a short period of time when you are performing bulk operations. The best way to implement this is to switch to the bulk-logged recovery model before performing bulk operations and then switch back to the full recovery model.
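For example (using the same <DBName> placeholder convention as the scripts below):

-- switch to bulk-logged just before the bulk operation
ALTER DATABASE <DBName> SET RECOVERY BULK_LOGGED
GO
-- ... perform the bulk operation here ...
-- switch back to full, then take a log backup so point-in-time restores are possible from here on
ALTER DATABASE <DBName> SET RECOVERY FULL
GO
BACKUP LOG <DBName> TO DISK = N'C:\Backup\TestDb_log.bak'
GO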

Transaction Log File Management

Transaction log files, if not handled correctly, can become a nightmare: they grow too large, and suddenly there is no hard drive space and your database is down. I have seen situations where the transaction log file has taken up so much space on the hard drive that it must be truncated, but it cannot be backed up because there is no space on the hard drive and no network location where the backup file can be written. Then you are forced to truncate the log file without making a backup, leaving you vulnerable until the next full database backup is performed.

It is absolutely essential that you have a transaction log backup process in place if your database recovery model is Full or Bulk-Logged. If you are not backing up the transaction log under the full or bulk-logged recovery model, your log is not being truncated, period. Even when you back up the transaction log, remember the log is only truncated after a checkpoint has been reached.

If you are not backing up transaction logs and don't need point-in-time restore in case of a disaster, you should switch your recovery model to Simple. The simple recovery model automatically truncates the transaction log after each checkpoint.

Also, log truncation does not shrink the file; it simply marks the space inside the file for reuse by future transactions. If your file has grown too large and you want to recover some hard drive space, you must shrink the file manually. You can use the following SQL script.


-- find the logical name of the log file first
SELECT Name FROM sys.database_files WHERE type_desc='LOG'
GO

-- shrink the file, back up the log, then shrink again
DBCC SHRINKFILE(<logName>, 1)
GO
BACKUP LOG <DBName> TO DISK = N'C:\Backup\TestDb.bak'
GO
DBCC SHRINKFILE(<logName>, 1)
GO


NOTE: In SQL 2005 you can also use the following script. I do not recommend this, because it doesn't back up the log and it breaks the transaction log chain, which means you cannot restore to a point in time: not only have you lost transactions, but any further transaction log backups will be useless until the next full or differential backup.

BACKUP LOG <DBName> WITH TRUNCATE_ONLY

You can also set your database to auto-shrink, but I recommend against this because it can create performance problems.

Under certain situations the transaction log may not truncate even when you are backing up transaction logs. We will discuss some of those situations in a future post, but usually this is not a concern for most environments.

That's all there is to it. As always, your comments are welcome!

Friday, February 10, 2012

Transaction Log Shipping

All About SQL!
Recently I had to set up log shipping for our production environment. Although this wasn't entirely new to me, it was a good refresher. I decided to go ahead and note the steps involved in configuring transaction log shipping and the various available options.

Points to Ponder
Some of you may disagree, but I feel these are important considerations, especially in a production environment where high availability is important.
In our environment, we have about 100 databases. There are two clustered environments; we host half the databases on one cluster and the other half on the other cluster. Each cluster has 2 SQL Servers.
Our initial thought was to set up log shipping so that the secondary server for the databases on one cluster would be the other cluster, and vice versa. It just didn't feel right: if cluster 1 goes down, all of the load ends up on cluster 2, which may not be feasible or desirable.
So, we decided to set up a third environment just to act as the secondary for both clusters. It may be overkill for some, but it certainly was a worthwhile investment for us.

What is Log Shipping?
Transaction log shipping is basically a way to automatically back up transaction logs, copy them to a secondary server and restore them to a database on the secondary server.
You can actually set it up yourself using SQL Server Agent and the following steps (a sketch of what those job steps might run follows the list)…
         Create a SQL Agent job to back up the transaction log for a specific database.
         As a second step in the same SQL Agent job, execute xp_cmdshell to copy the transaction log files to the secondary server.
         Create a SQL Agent job on the secondary server to restore the log files to a copy of the same database on the secondary server. The database on the secondary server obviously has to be in “Restoring” mode.
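A minimal sketch of what those job steps might run, assuming a hypothetical database named SalesDB and a share named \\secondary\logs:

-- step on the primary: back up the transaction log to a network share
BACKUP LOG SalesDB TO DISK = N'\\secondary\logs\SalesDB_20120210.trn'
GO
-- step on the secondary: apply the log, leaving the database in restoring mode
RESTORE LOG SalesDB FROM DISK = N'\\secondary\logs\SalesDB_20120210.trn'
WITH NORECOVERY
GO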
But SQL Server takes care of all that for you. Using SQL Management Studio, you can simply walk through the steps to set up log shipping in just a few minutes.

Using SQL Management Studio to Setup Log Shipping
Below is a step by step process on how to setup log shipping for a specific database.
Note: Your database recovery model must be full or bulk logged in order to configure transaction log shipping.

         Right click on Database > Properties



         Select Transaction Log Shipping

   Check “Enable this as a primary database in a log shipping configuration” and then click on Backup Settings.






         Define the network path where you want to back up the transaction log. If the backup path is on the same server, also type the physical path; otherwise leave it blank. You can change settings such as deleting files older than 72 hours and the alert settings, but generally the defaults are good. If you are short on space, you may want to lower the 72-hour threshold.




         You can change the schedule by clicking on the Schedule button. The 15-minute default is generally good.

         Click OK and then, from the main window, click on the “Add” button to configure the secondary server. Click on Connect to connect to the secondary server.

   The three tabs allow you to set up the secondary database information and restore location. 
       There are three ways to set up the database on the secondary server.

  •             Manually copy the primary database and restore it on the secondary server. Remember to keep it in Restoring mode.
  •       Choose one of the two options on the first tab (see above) to restore the database to the secondary server.


  Second tab allows you to configure the restore location


       You can change the restore schedule by clicking on the Schedule button, but the default works just as well.
       On the last tab, you can set the database recovery mode and alert settings.

      Click on OK and you have configured the transaction log shipping.
      You can optionally set up a monitor server, which allows you to monitor your transaction log shipping jobs remotely. This is only useful when you have a separate monitor server, because otherwise the history and status of the backup operation are stored on the primary server and those of the restore operation are stored on the secondary server. A remote monitor server keeps the history and status of the backup, copy and restore operations in one place. You can also configure alerts to fire if a backup or restore operation fails.




      That’s all there is to it. One issue that I have seen is permissions: make sure the account under which the SQL Agent runs has read/write permissions on the backup share on both the primary and secondary servers.

Wednesday, February 8, 2012

Creating Indexes using T-SQL

All About SQL!
In my previous post, we discussed creating indexes using SQL Management Studio. You can achieve the same results using SQL Scripts.

Using T-SQL

1.  Create a Non-Clustered, Non-Unique Index

/****** Object:  Index [idx_Test]    Script Date: 02/05/2012 12:56:50 ******/
CREATE NONCLUSTERED INDEX [idx_Test] ON [dbo].[DTA_input]
(
[SessionID] ASC,
[GlobalSessionID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF,
SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF,
DROP_EXISTING = OFF,
ONLINE = OFF,
ALLOW_ROW_LOCKS  = ON,
ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO

2. Create a Unique Non-Clustered Index

/****** Object:  Index [idx_Test]    Script Date: 02/05/2012 12:59:41 ******/
CREATE UNIQUE NONCLUSTERED INDEX [idx_Test] ON [dbo].[DTA_input]
(
[SessionID] ASC,
[GlobalSessionID] ASC
)WITH (PAD_INDEX  = OFF,
STATISTICS_NORECOMPUTE  = OFF,
SORT_IN_TEMPDB = OFF,
IGNORE_DUP_KEY = OFF,
DROP_EXISTING = OFF,
ONLINE = OFF,
ALLOW_ROW_LOCKS  = ON,
ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO

HINTS / CLAUSE DEFINITIONS

PAD_INDEX
This option specifies whether you want to leave some free space in each intermediate-level page of the index for future inserts/updates. It is only useful when you specify a fill factor, because it uses the percentage specified by the fill factor. The default is OFF.

STATISTICS_NORECOMPUTE
This flag determines whether index statistics are automatically recomputed. If you set it to ON, statistics are not automatically updated, and the query optimizer may not be able to pick the optimal execution plan for queries using this table.

SORT_IN_TEMPDB
Specifies that during an index build/rebuild, the intermediate sort results will be stored in tempdb. If your tempdb is on different disk(s) than your production database, this may reduce the time needed to create the index.

IGNORE_DUP_KEY
As I discussed in one of my previous posts, when this option is ON and an attempt is made to insert a duplicate key, the server issues a warning and ignores the duplicate row. If this option is OFF, the server issues an error message and rolls back the entire INSERT. This clause can only be turned on if you have specified the UNIQUE clause in your index.

DROP_EXISTING
This clause tells SQL Server to drop and rebuild the pre-existing index with the same name. When you drop a clustered index, all non-clustered indexes must normally be rebuilt because they contain pointers to the clustered index keys. This clause is extremely useful when rebuilding a clustered index on a table that also has non-clustered indexes: the non-clustered indexes are rebuilt only once, and only if the key columns change.
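For example, a minimal sketch of rebuilding a clustered index in place on a hypothetical Orders table:

CREATE UNIQUE CLUSTERED INDEX [PK_Orders] ON [dbo].[Orders]
(
[OrderID] ASC
)WITH (DROP_EXISTING = ON) ON [PRIMARY]
GO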

ONLINE
This clause needs some explanation. When it is ON, the underlying table remains available, i.e. it can be used by other processes while the index is being built or rebuilt. In other words, the index operation does not need a long-term exclusive lock. The default is OFF, meaning the indexing operation requires an exclusive lock on the table. This was a nice enhancement in SQL 2005; prior versions required an exclusive lock for index operations. Indexes containing certain column types, such as VARCHAR(MAX), cannot be built or rebuilt online.

ALLOW_ROW_LOCKS
This clause specifies whether individual data rows can be locked when performing operations on the indexed keys. For OLTP workloads, it is a good practice to leave this ON.

ALLOW_PAGE_LOCKS
This clause determines whether the entire data page will be locked during index operations.

If both clauses are OFF, the SQL engine will not lock data pages or data rows; instead, the entire table will be locked during the operation.

Generally, it is a good idea to leave the defaults alone unless you have a very good reason to change them.

Thank you and as always, your comments are welcome!