Selecting a data access strategy for large numbers of concurrent users

Application performance can suffer when many users read and write the same data in the database. If you don't take this into account in your design, the performance of your application will be less than optimal. Here's an overview of the available transaction isolation levels:

  • Level 0: read uncommitted
  • Level 1: read committed
  • Level 2: repeatable read
  • Level 3: serializable

Level 0. read uncommitted

Results in dirty reads. Data modifications still take locks, but during a write or a rollback a read may return half-modified records. Schema changes will still block reads. Dirty reads can be acceptable for sum or count aggregations where an approximation of the result is good enough.
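
A minimal sketch of an approximate count that doesn't wait on writers, assuming a hypothetical dbo.Orders table; both forms allow dirty reads:

-- Session-wide: allow dirty reads
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COUNT(*) AS ApproximateOrderCount FROM dbo.Orders;

-- Per-query alternative: the NOLOCK table hint has the same effect for this one table
SELECT COUNT(*) AS ApproximateOrderCount FROM dbo.Orders WITH (NOLOCK);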

Level 1. read committed (default isolation level)

Long reads may result in reading inconsistencies, like phantoms or non repeatable reads. To understand this, imagine that SQL Server is like a finger flipping through the telephone book, looking for a record. A record is locked as soon as it's found, and the lock is released as soon as it has been read. Now suppose the record is first found on the first page and the finger flips further through the book. Another process then changes the record, so that it no longer belongs on page 1 but on the last page. The finger continues and finds the record again, but now modified. This is a non repeatable read: within the same transaction, reads are not repeatable.

A phantom is a record that is inserted by another transaction while yours is running. If you read the same result set twice in a transaction, the second result set might be larger because of this phenomenon. Locked rows are not accessible to other users for writing, but they remain readable.
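
A minimal sketch of a non repeatable read under read committed, assuming a hypothetical dbo.PhoneBook table and a second session that commits a change between the two reads:

-- Session 1
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN TRAN;
SELECT Phone FROM dbo.PhoneBook WHERE LastName = 'Smith'; -- first read

-- Session 2 (meanwhile): UPDATE dbo.PhoneBook SET Phone = '555-0102' WHERE LastName = 'Smith';

SELECT Phone FROM dbo.PhoneBook WHERE LastName = 'Smith'; -- second read may return the new value: a non repeatable read
COMMIT TRAN;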

Level 2. repeatable read

SQL Server locks a row, reads the next row and leaves the lock on, leaving a trail of locks behind until the transaction is complete. Depending on how large the reads are, this may imply a lot of blocking. Phantoms are still an issue, as a new record might be added by another transaction while you're reading. Locked rows are not accessible to other users for writing.
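
Under repeatable read the second read returns the same rows, but a newly inserted row can still appear. A sketch, again using the hypothetical dbo.PhoneBook table:

-- Session 1
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRAN;
SELECT * FROM dbo.PhoneBook WHERE LastName LIKE 'S%'; -- returns n rows, locks stay until commit

-- Session 2 (meanwhile): INSERT INTO dbo.PhoneBook (LastName, Phone) VALUES ('Summers', '555-0199');

SELECT * FROM dbo.PhoneBook WHERE LastName LIKE 'S%'; -- may return n + 1 rows: a phantom
COMMIT TRAN;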

Level 3. serializable

To prevent phantoms, SQL Server locks a 'set'. A set is usually a range in an index (one or more nodes in the index tree), but if the indexes are not created correctly, the entire table is locked.
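
A sketch of the same range query under serializable, assuming there is an index on the hypothetical LastName column so that range locks can be taken:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN;
SELECT * FROM dbo.PhoneBook WHERE LastName LIKE 'S%'; -- range locks on the index; without a usable index the whole table is locked
-- another session's INSERT of 'Summers' now blocks until this transaction completes
COMMIT TRAN;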

The isolation levels go from almost no isolation at all to completely isolated, while availability goes from most available to least available. Level 1 is a good compromise, but it will still give you phantoms and non repeatable reads, and it gives you blocking issues when more than one user needs the same record.

Since SQL Server 2005 there's the option of snapshot isolation. This is a feature that needs to be turned on at the database level, so it's all or nothing.

There are two options available:

  1. READ_COMMITTED_SNAPSHOT (statement level isolation), also known as Read Committed Isolation using Row Versioning
  2. ALLOW_SNAPSHOT_ISOLATION (transaction level isolation)

Both options give readers a consistent, point-in-time view of the data without the downside of blocking. The difference is that READ_COMMITTED_SNAPSHOT gives you statement level isolation and ALLOW_SNAPSHOT_ISOLATION gives you transaction level isolation. This means you get the state of the records either from the point in time the statement started, or from the point in time the transaction started.
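
The difference shows up when a transaction reads the same data twice. A sketch, again with the hypothetical dbo.PhoneBook table, assuming another session commits a change between the two reads:

BEGIN TRAN;
SELECT Phone FROM dbo.PhoneBook WHERE LastName = 'Smith'; -- read 1
-- another session commits an update to this row here
SELECT Phone FROM dbo.PhoneBook WHERE LastName = 'Smith'; -- read 2
COMMIT TRAN;

-- READ_COMMITTED_SNAPSHOT: each statement sees the data as of the start of that statement,
--   so read 2 returns the new value (no blocking, but not repeatable).
-- SNAPSHOT isolation: both reads see the data as of the start of the transaction,
--   so read 2 returns the same value as read 1.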

Both options also come with a slight performance cost, because SQL Server has to turn on row versioning to make snapshot isolation work.

Turning them on is as simple as:

ALTER DATABASE <databasename>
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK AFTER 5

Attention: setting READ_COMMITTED_SNAPSHOT requires all other connections to the database to be closed. WITH ROLLBACK AFTER 5 forces a rollback of all running transactions after 5 seconds and closes the open connections, so that the command can be executed.

ALTER DATABASE <databasename>
SET ALLOW_SNAPSHOT_ISOLATION ON
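
Whether the options are active for a database can be checked in sys.databases:

SELECT name,
       is_read_committed_snapshot_on,
       snapshot_isolation_state_desc
FROM   sys.databases
WHERE  name = '<databasename>';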

Disadvantages of snapshot isolation:

  • row versioning uses tempdb, which costs disk space and IO
  • you need to have an optimized tempdb

When you also use ALLOW_SNAPSHOT_ISOLATION, you will need to issue a SET TRANSACTION ISOLATION LEVEL SNAPSHOT command to actually use the transaction level isolation:

SET TRANSACTION ISOLATION LEVEL SNAPSHOT
BEGIN TRAN
-- your SQL statements here
COMMIT TRAN

Impact of READ_COMMITTED_SNAPSHOT and ALLOW_SNAPSHOT_ISOLATION:

  • possibility of update conflicts when two transactions write to the same row under snapshot isolation; the later writer gets an error and has to retry
  • 14 bytes of versioning overhead per row (only added when a version is needed), which may result in index fragmentation
  • use the READCOMMITTEDLOCK hint for tables used as queues (see the sketch below)
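
The READCOMMITTEDLOCK hint restores the old locking behaviour for a single table even when READ_COMMITTED_SNAPSHOT is on, which is what you usually want when a table is used as a queue. A sketch, with a hypothetical dbo.WorkQueue table:

BEGIN TRAN;
SELECT TOP (1) QueueId, Payload
FROM   dbo.WorkQueue WITH (READCOMMITTEDLOCK) -- locking read instead of a versioned read
ORDER BY QueueId;
-- process the row, delete it, then commit
COMMIT TRAN;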

You can monitor the version store using DMVs.
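
For example, these row versioning DMVs have been available since SQL Server 2005:

-- rows currently in the version store (which lives in tempdb)
SELECT COUNT(*) AS version_store_rows FROM sys.dm_tran_version_store;

-- heaviest producers of row versions
SELECT * FROM sys.dm_tran_top_version_generators;

-- transactions currently relying on row versioning
SELECT * FROM sys.dm_tran_active_snapshot_database_transactions;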

In the overall design of your application, consider the read and write strategy. When you have a highly volatile table that needs to process lots of writes, don't run aggregations or analysis on that same table; it will make the performance horrible. Separate OLTP and OLAP environments: when you need analysis functionality, do it in your data warehouse, and separate readers and writers as much as possible.

Conclusion

When designing an application for large numbers of users, you need to think carefully about your data access strategy. Snapshot isolation is a good alternative for the default transaction isolation level, but comes at a cost. This cost will have to be evaluated before turning this feature on. Nonetheless, the advantages are great: no phantoms, repeatable reads and no blocking.

Reference

Using Snapshot Isolation

SQL Server 2008 High volume maintenance plan logs

A nice way to implement simple batch processes in SQL Server is to create a stored procedure that gets run periodically by a maintenance plan. I've used this frequently and it gets the job done. However, if the maintenance plan is executed frequently, the logs quickly fill up. The default behaviour is to always log, whether the plan succeeded or failed. The logs of all maintenance plans are kept in the 'Log' folder of your SQL Server instance installation path. A symptom of an overloaded log folder is that looking at the history of a maintenance plan takes forever. I've been searching for a way to tone this down.

Ideal situation for me would be:

1. only log errors (with as much detail as possible)
2. log when the plan was last run and whether it was successful

Unfortunately it doesn’t seem to be possible to only log errors. It’s all or nothing.

Let’s have a look at the possible options.

Option 1: Create logs in different folders to minimize number of log files per folder

The log history of every maintenance plan is stored in the same folder. When editing a maintenance plan, click the Reporting and Logging button. In the dialog that appears there are options to:

– store the logs in a new text file for every execution
– append the logs of every execution to the same file
– send the report to an e-mail recipient (if Database Mail is configured properly)
– add extended information to the logs
– log to a remote server

There is no option to only log when an error has occurred.

Option 2: Disable logging

Next my search turned to the properties of the maintenance plan itself.

There is a property called 'LoggingMode'. To my dismay, when I hit F1 on this property I actually get forwarded to Visual Studio documentation about property pages, so no information about the LoggingMode property! After this grumble, I remembered that a maintenance plan is actually nothing more than an SSIS package that gets executed, so this property maps to DTSLoggingMode. The property has three possible values, which basically come down to on or off. So this road is also a dead end: I don't want to disable all logging, I just want to stop logging successful executions.

Option 3: Keep logs from growing infinitely

As I've found no way to tone down the logging, I've decided to search for ways to keep the logs from growing infinitely. There's a maintenance task called 'History Cleanup Task', with which the history of the maintenance plans can be deleted. This, too, is all or nothing: I can't delete only the logs of a particular plan that gets executed very often.

Conclusion

Logging maintenance plans works, but when the volume of the logs gets high, fine tuning is almost impossible. I've decided to give every maintenance plan that produces high volume logs its own log folder; that should speed up searching through the log history. I've also created a maintenance plan that removes logs older than 3 days. As the stored procedures do their own logging in the database itself, this shouldn't be a big issue: the application that uses this database is used so intensely that I usually know within 3 days if a serious error has occurred, although I'd rather keep this window somewhat wider. I could also create an application that copies the logs every three days, so more history is kept in a safe place should I need it. Finally, I've found an application called 'DTLoggedExec' which might solve my problems. I'll have to dive into it another time, to see if it also fits this issue.