Selecting a data access strategy for large numbers of concurrent users

Application performance can suffer when multiple users read and write the same data in the database. When you don't take this into account in your design, the performance of your application will be less than optimal. Here's an overview of the available transaction isolation levels:

Transaction isolation levels:

  • Level 0: read uncommitted
  • Level 1: read committed
  • Level 2: repeatable read
  • Level 3: serializable

Level 0. read uncommitted

Results in dirty reads. Data modification still uses locking. During a write or a rollback, a read may return half-modified records. Schema changes will still block reads. Dirty reads may be acceptable for SUM or COUNT aggregations where an approximation is good enough.
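A minimal sketch of such an approximate count, assuming a hypothetical dbo.Orders table:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
-- the count may include uncommitted rows, but the scan takes no shared locks
SELECT COUNT(*) FROM dbo.Orders
-- the per-table equivalent uses the NOLOCK hint
SELECT COUNT(*) FROM dbo.Orders WITH (NOLOCK)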

Level 1. read committed (default isolation level)

Long reads may result in reading inconsistencies, such as phantoms or non-repeatable reads. To understand this, imagine that SQL Server is a finger flipping through the telephone book, looking for a record. The record is locked as soon as it's found and the lock is released once it has been read. Now suppose the record is first found on page 1, the finger flips further through the book, and another process changes the record so that it no longer sorts on page 1 but on the last page. The finger continues and finds the record again, but now modified. This is a non-repeatable read: within the same transaction, reads are not repeatable. A phantom is a record inserted during the transaction by another transaction: read the same result set twice in one transaction and the second result set might be larger because of this phenomenon. Locked rows are not accessible to other users for writing, but they remain readable.
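A minimal sketch of a non-repeatable read, assuming a hypothetical dbo.PhoneBook table (run the update from a second session during the delay):

SET TRANSACTION ISOLATION LEVEL READ COMMITTED
BEGIN TRAN
SELECT Phone FROM dbo.PhoneBook WHERE Name = 'Smith'  -- first read
WAITFOR DELAY '00:00:10'  -- meanwhile another session updates Smith's row and commits
SELECT Phone FROM dbo.PhoneBook WHERE Name = 'Smith'  -- may return a different value
COMMIT TRAN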

Level 2. repeatable read

SQL Server will lock a row and read the next row while keeping the lock in place, leaving a trail of locks until the transaction completes. Depending on how large the reads are, this may cause a lot of blocking. Phantoms are still an issue, as a new record might be added during a read. Locked rows are not accessible to other users for writing.
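A minimal sketch, again with the hypothetical dbo.PhoneBook table:

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRAN
SELECT Phone FROM dbo.PhoneBook WHERE Name = 'Smith'  -- shared locks are held until COMMIT
-- other sessions can no longer update these rows, but can still insert a new 'Smith' (a phantom)
SELECT Phone FROM dbo.PhoneBook WHERE Name = 'Smith'  -- same values, but possibly extra rows
COMMIT TRAN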

Level 3. serializable

To prevent phantoms, SQL Server locks a 'set' using key-range locks. The range is usually in an index (one or more nodes in the index tree), but if the indexes are not created correctly, the entire table is locked.
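A minimal sketch of a range read under serializable, reusing the hypothetical dbo.Orders table:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN
SELECT COUNT(*) FROM dbo.Orders WHERE CustomerId = 42  -- key-range locks cover the range
-- another session inserting a row with CustomerId = 42 now blocks until COMMIT
COMMIT TRAN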

The isolation levels go from almost no isolation at all to completely isolated; availability goes from most available to least available in the same order. Level 1 is a good compromise, but it will still give you phantoms and non-repeatable reads, and it gives you blocking issues when more than one user needs the same record.

Since SQL Server 2005 there's the option of snapshot isolation. This is a feature that needs to be turned on at the database level, so: all or nothing.

There are two options available:

  1. READ_COMMITTED_SNAPSHOT (statement-level isolation), also known as Read Committed Isolation using Row Versioning
  2. ALLOW_SNAPSHOT_ISOLATION (transaction-level isolation)

Both options deliver consistent reads from row versions instead of locks, without the downside: blocking. The difference is that READ_COMMITTED_SNAPSHOT gives you statement-level isolation and ALLOW_SNAPSHOT_ISOLATION gives you transaction-level isolation. This means that you will get the state of the records either from the point in time the statement started, or from the point in time the transaction started (more precisely, its first data access).
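A minimal sketch of the difference, assuming both options are ON and a hypothetical dbo.Accounts table:

-- statement-level (READ_COMMITTED_SNAPSHOT): each statement sees its own point in time
BEGIN TRAN
SELECT SUM(Balance) FROM dbo.Accounts  -- snapshot as of this statement
-- a transfer committed by another session here changes the next result
SELECT SUM(Balance) FROM dbo.Accounts  -- snapshot as of this (later) statement
COMMIT TRAN

-- transaction-level (SNAPSHOT): every statement sees the same point in time
SET TRANSACTION ISOLATION LEVEL SNAPSHOT
BEGIN TRAN
SELECT SUM(Balance) FROM dbo.Accounts  -- snapshot taken at the first data access
SELECT SUM(Balance) FROM dbo.Accounts  -- guaranteed identical result
COMMIT TRAN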

Both options also introduce a slight performance degradation, because SQL Server needs to turn on row versioning to make snapshot isolation work.

Turning them on is as simple as:

ALTER DATABASE <databasename>
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK AFTER 5

Attention: READ_COMMITTED_SNAPSHOT needs all other connections to be closed. (WITH ROLLBACK AFTER 5 forces a rollback of all running transactions after 5 seconds and closes all open connections, so that the command can execute.)

ALTER DATABASE <databasename>
SET ALLOW_SNAPSHOT_ISOLATION ON

Disadvantages of snapshot isolation:

  • row versioning uses tempdb, which takes disk space and IO
  • you need to have an optimized tempdb

When you also use ALLOW_SNAPSHOT_ISOLATION, you will need to issue a SET TRANSACTION ISOLATION LEVEL SNAPSHOT command to use the transaction-level isolation.

SET TRANSACTION ISOLATION LEVEL SNAPSHOT
BEGIN TRAN
-- your statements here; they all see the data as of the transaction's first data access
COMMIT TRAN

Impact of READ_COMMITTED_SNAPSHOT and ALLOW_SNAPSHOT_ISOLATION:

  • possibility of update conflicts when two writes occur on the same row from different transactions
  • 14 bytes of overhead per row (only added when a version is needed), which may result in index fragmentation
  • use the READCOMMITTEDLOCK hint for queues (see the sketch below)
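A minimal sketch of a queue read that opts back into locking under READ_COMMITTED_SNAPSHOT, assuming a hypothetical dbo.WorkQueue table:

BEGIN TRAN
SELECT TOP (1) Id, Payload
FROM dbo.WorkQueue WITH (READCOMMITTEDLOCK, UPDLOCK, READPAST)
ORDER BY Id
-- process the row and DELETE it before committing
COMMIT TRAN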

You can monitor the version store using DMVs.
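For example (these DMVs have been available since SQL Server 2005):

-- number of records currently in the tempdb version store
SELECT COUNT(*) AS version_records FROM sys.dm_tran_version_store
-- snapshot transactions that are keeping versions alive
SELECT * FROM sys.dm_tran_active_snapshot_database_transactions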

In the overall design of your application, consider the read and write strategy. When you have a highly volatile table that needs to process lots of writes, don't run aggregations/analysis on that same table; it will make performance horrible. Separate OLTP and OLAP environments. When you need analysis functionality, do it in your data warehouse: separate readers and writers as much as possible.

Conclusion

When designing an application for large numbers of users, you need to think carefully about your data access strategy. Snapshot isolation is a good alternative to the default transaction isolation level, but it comes at a cost that has to be evaluated before turning the feature on. Nonetheless, the advantages are great: no phantoms, repeatable reads, and no blocking.

Reference

Using Snapshot Isolation
