Ensuring Data Integrity: The Role of Database Transactions

1. What is database translation

Database transactions are activities transferring, changing data from one consistent state to another consistent state for example in everyday life we make different kind of business transactions from buying products, changing or cancelling orders, buying tickets, etc. And all these activities involve the movement/transaction of data in the database.

For example, consider a simple transaction of moving an amount of 5000 from one bank account to another. This transaction includes several steps: decrease the account balance by 5000, and then increase the other account balance by 50003. Each of these steps is a part of the transaction, and if any step fails, the entire transaction fails

2. Why it is crucial in reliable software systems?

When it comes to business, whether it is personal or financial data, it is crucial to ensure the reliability and stability of data management.

In our daily lives, we may frequently encounter errors related to database transactions like customer orders, payments, or product shipment being lost in an online store, or fail to register a seat reservation, double booking a train/airplane passengers.

To ensure reliability of data operations we could follow ACID properties:

Atomicity: "All or nothing". Either the DBMS makes all the changes to the database state permanent or no change is made permanent.
Consistency: Database must be consistent before and after a transaction. The transaction should not be allowed to commit if any rule violation occurs
Isolation: If multiple transactions are happening at the same time (concurrently), they should be isolated from each others, and only happens one after another to keep the consistency and avoid conflicts
Durability: Once a transaction is done and successful, it should be permanent regardless of any failure like program crash

3. What is SQL transaction and SQL statements need to use to control SQL transaction?

SQL Transaction: An SQL transaction consists of a sequence of SQL statements that are executed as a single atomic unit. This means the transaction is executed in an “all-or-nothing” manner. When an application triggers a transaction, it may involve various operations such as setting the access mode, performing constraint checks, establishing transaction isolation levels, or defining the start and end times of the transaction.

Workflow for SQL transactions:

Begin transaction: starts a new transaction (explicit transaction start in SQL Server)

Commit: terminates the transaction and make all changes permanent

Rollback: terminate the transaction and roll back all the changes ( previous state will be stored in database)

BEGIN TRANSACTION
UPDATE account SET balance = balance + 100 WHERE accountNo = 2
UPDATE account SET balance = balance - 100 WHERE accountNo = 1
  if (everything succeeded) then
   COMMIT -- commit to the changes and make them permanent
  else
   ROLLBACK -- undo all changes from the database
  end-if

4. What is concurrency control?

A mechanism to make sure that a transaction is isolated from the effect of other transactions, that concurrently happens at the same time. The level of isolation should be set by programmer like below:

Locking

When a transaction changes a row in database with INSERT, UPDATE, DELETE, the DBMS lock the row exclusively so that other concurrent transactions are not allowed to access (read or modify) the row as long as this transaction is running.

Locking resolves concurrency conflicts in two ways:

Blocking a Transaction: In this scenario, a transaction is required to wait for the necessary resource. By default, the maximum duration for this waiting period is set to an indefinite length of time, often referred to as “forever”.
Forcing a transaction to abort: When two transactions start to permanently block each other, the DBMS forces either one transaction to abort (rollback). This allows the other transaction to continue normally.

Deadlock

This occurs when two or more transactions permanently block each other by each transaction having a lock on a resource which the other transaction is trying to lock exclusively.

For example, transaction A has locked a row in the Students table and needs to update some rows in the Grades table. At the same time, Transaction B has locked those very rows in the Grades table that Transaction A needs to update, but Transaction B needs to update the rows in the Student table that are locked by Transaction A. Now, Transaction A is waiting for Transaction B to release its lock, and Transaction B is waiting for Transaction A to release its lock. They’re both stuck waiting for each other, and that’s a deadlock actions permanently block each other by each transaction having a lock on a resource which the other transaction is trying to lock exclusively.

When the DBMS detects a deadlock it chooses one of the transactions as a victim and terminates it (rollback) with an error message. This allows the other transaction to continue normally.

Locks protect your data but decrease concurrency. Even without a deadlock a transaction can make others wait for a long time and it can seriously affect the performance of the application.

Row-level locking in SQL Server

Depending on the lock type, when one user has a lock on a row, the lock prevents other users from modifying or even reading that row. The basic lock types are write lock and read lock. In SQL Server, to avoid a certain type of deadlock, "intent to update lock" is also available

Write lock : This allows user to read and modify the database and other users are not allow to write or even read until the transaction ends

Read lock: this allows user can select a row if another transaction doesn't have the Write Lock on. There are two type of Read Lock:

READ COMMITTED: Lock is released immediately after user has finished reading
REPEATABLE READ OR SERIALIZABLE: lock is not released before the transaction ends

"Intent to update" lock (UPDATE LOCK) In SQL Server: this prevents a certain form of deadlock that occurs when two transactions are first reading the same row and then updating the row. The DBMS allows only one transaction at time to read the same row with the UPDLOCK. If another transaction tries to read the same row with the UPDLOCK, it has to wait until the first transaction ends.

5. Best practices

Avoid User Interaction During Transactions: To keep things running smoothly, it’s best not to have users do anything inside a transaction. If they do, the database transaction might be blocking other concurrent transactions for all too long.
Understand Your Database System: Different database systems have their own ways of handling concurrency control and managing behavior. It is important for reliability and performance that a developer must know how their chosen database system works in these areas.
Always Ensure Accurate Data: To prevent mistakes in your database, you should always explicitly set the transaction isolation level for a database connection.

In SQL Server, Oracle, and PostgreSQL the default transaction isolation level is READ COMMITTED. This transaction isolation level provides maximum concurrency, which allows the most users to work at the same time, but it is usually not enough for such transactions that both read and update the database in a multi-user environment.

Hung Hoang

Search This Blog