Demystifying ACID: Ensuring Data Integrity in Database Management Systems

What is ACID?

After delving into Database Management Systems (DBMS) like Postgres, you might have lingering questions about ACID. What exactly is ACID, and what purpose does it serve?

ACID stands for Atomicity, Consistency, Isolation, and Durability. It is a vital compliance requirement for DBMS, enforcing four essential characteristics. The main objective of ACID is to ensure the integrity and reliability of transactions within a database, even in the face of errors, power failures, or other unexpected incidents.

To determine if your DBMS already incorporates ACID, you can conduct a verification process. Notably, both MySQL (specifically InnoDB's engine) and Postgres (since 2001) are recognized as ACID-compliant. In this article, we will delve into the inner workings of ACID within the context of Postgres.

Setup Project

To practice ACID, it is necessary to set up the project using PostgreSQL. Personally, I prefer using psql for executing SQL commands via the terminal.

  • Create the acid_example database and connect to it:
postgres=# CREATE DATABASE acid_example;
CREATE DATABASE
postgres=# \c acid_example
You are now connected to database "acid_example" as user "xxx".
acid_example=#
  • Create the accounts table:
CREATE TABLE accounts (
    id INT GENERATED BY DEFAULT AS IDENTITY,
    name TEXT NOT NULL,
    balance BIGINT NOT NULL,
    PRIMARY KEY(id)
);

SELECT * FROM accounts;
 id | name | balance
----+------+---------
(0 rows)

Atomicity

The first characteristic of ACID is Atomicity. Atomicity treats every statement within a transaction (create, read, update, or delete) as an indivisible unit. If the transaction is successful, all the statements should be executed, and if not, none of the statements should take effect. This property ensures that data remains intact and prevents any loss or damage during the transaction process.

In Postgres, Atomicity is achieved using database transactions. By default, PostgreSQL executes transactions in an unchained mode, also referred to as "auto-commit" in other database systems. However, if we want to properly implement Atomicity, we need to manually use database transactions, where each transaction can consist of one or more statements.

Successful Transaction

For example, let's store two pieces of data in the database. When we commit the transaction, both pieces of data will be successfully stored in the database. To initiate a transaction, you can use BEGIN TRANSACTION, BEGIN WORK, or simply BEGIN. To make the changes visible to other sessions, you need to commit the transaction using COMMIT WORK, COMMIT TRANSACTION, or COMMIT.

Session 1:

BEGIN TRANSACTION;

INSERT INTO accounts(name, balance)
VALUES('Adi', 1000000);

To verify that the data has not been stored yet, we can use Session 2 (or another terminal) to connect to the database.

Session 2:

postgres=# \c acid_example;
You are now connected to database "acid_example" as user "xxx".
acid_example=# SELECT * FROM accounts;
 id | name | balance
----+------+---------
(0 rows)

As we can see, the data is currently empty. This indicates that even if we store or insert data, Postgres does not consider it a success or failure because we have not yet committed the data.

Session 1:

INSERT INTO accounts(name, balance)
VALUES('Angli', 2000000);

COMMIT;

Session 2:

SELECT * FROM accounts;
 id | name  | balance
----+-------+---------
  1 | Adi   | 1000000
  2 | Angli | 2000000
(2 rows)

Rollback Transaction

When we decide to undo the transactions, we can use the rollback feature. Rollback cancels all the statements within the current transaction.

Session 1:

BEGIN TRANSACTION;

DELETE FROM accounts WHERE id = 1;

DELETE FROM accounts WHERE id = 2;

SELECT * FROM accounts;
 id | name | balance
----+------+---------
(0 rows)

Remember, this is just temporary data because we have not yet committed the transaction. You can observe that the data from Session 2 is still present.

Session 2:

SELECT * FROM accounts;
 id | name  | balance
----+-------+---------
  1 | Adi   | 1000000
  2 | Angli | 2000000
(2 rows)

We return to Session 1 and perform a rollback on the transactions, but we can observe that the data is not deleted; it remains intact.

Session 1:

ROLLBACK;

SELECT * FROM accounts;
 id | name  | balance
----+-------+---------
  1 | Adi   | 1000000
  2 | Angli | 2000000
(2 rows)

Consistency

Consistency is the second characteristic of ACID. It ensures that a transaction can only change data from one consistent state to another consistent state, guaranteeing data integrity. Every data written to the database must adhere to established constraints. By enforcing consistency, we prevent data from becoming inconsistent and restrict arbitrary changes. For instance, setting a data field as not null requires ensuring that the data is always filled.

Invalid Update

BEGIN TRANSACTION;

UPDATE accounts SET name = null WHERE id = 2;
ERROR:  null value in column "name" of relation "accounts" violates not-null constraint
DETAIL:  Failing row contains (2, null, 2000000).

COMMIT;

SELECT * FROM accounts;
 id | name  | balance
----+-------+---------
  1 | Adi   | 1000000
  2 | Angli | 2000000
(2 rows)

As we can see, even inside the transaction, the query generates an error when attempting to update data that does not adhere to the constraints. Therefore, if we commit the data, it won't have any effect, and the data will remain unchanged. This principle also applies to relational foreign keys. If we store a foreign key that does not match the actual foreign key defined in the related table, it will result in an error.

Remember, we need to add consistency at the database level, not just at the application layer. This effectively prevents users from making arbitrary changes to the data.

Isolation

The third characteristic is Isolation. If we build an application where transactions are executed by 10, 100, or 1000 people simultaneously, Isolation ensures that these transactions are executed sequentially, thereby preventing race conditions that can result in data conflicts. The primary purpose of Isolation is to control concurrency, allowing processes to run independently. For example, if we have two transactions, with the first one performing an update, the second transaction cannot immediately access the data modified by the first one until the first transaction is executed. This separation between transactions helps maintain data consistency and prevents conflicts.

Session 1:

BEGIN TRANSACTION;

SELECT * FROM accounts WHERE id IN (1, 2) FOR UPDATE;
 id | name  | balance
----+-------+---------
  1 | Adi   | 1000000
  2 | Angli | 2000000
(2 rows)

The usage of FOR UPDATE in a SELECT statement locks the retrieved rows, simulating an update operation. This prevents other transactions from modifying or deleting those rows until the current transaction is completed.

Session 2:

BEGIN TRANSACTION;

SELECT * FROM accounts WHERE id IN (1, 2) FOR UPDATE;

As you can see, the transaction in Session 2 is still in progress because the transactions in Session 1 have not yet been completed or committed.

Session 1:

UPDATE accounts SET balance = balance + 500000
WHERE id = 1;

UPDATE accounts SET balance = balance - 500000
WHERE id = 2;

COMMIT;

Session 2:

SELECT * FROM accounts WHERE id IN (1, 2) FOR UPDATE;
 id | name  | balance
----+-------+---------
  1 | Adi   | 1500000
  2 | Angli | 1500000
(2 rows)

UPDATE accounts SET balance = balance + 500000
WHERE id = 1;

COMMIT;

SELECT * FROM accounts;
 id | name  | balance
----+-------+---------
  2 | Angli | 1500000
  1 | Adi   | 2000000
(2 rows)

After Session 1 is committed, Session 2 automatically shows the updated data, indicating that the data has been changed.

Why do we need isolation? Imagine if multiple processes are working simultaneously, the balance could become chaotic and pose a significant risk to our data. Even the database may prioritize the data that arrives first, even though the data is being processed concurrently.

Durability

The last characteristic of ACID is Durability. Durability guarantees that once a transaction is saved, it will remain saved even in the event of system failure, such as a blackout or crash.

Session 1:

BEGIN TRANSACTION;

UPDATE accounts SET balance = balance - 1000000
WHERE id = 1;

SELECT * FROM accounts;
 id | name  | balance
----+-------+---------
  2 | Angli | 1500000
  1 | Adi   | 1000000
(2 rows)

# stop postgres

UPDATE accounts SET balance = balance - 500000
WHERE id = 2;
FATAL:  terminating connection due to unexpected postmaster exit
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

# start postgres
SELECT * FROM accounts;
 id | name  | balance
----+-------+---------
  2 | Angli | 1500000
  1 | Adi   | 2000000
(2 rows)

As we can see, the data remains consistent even when PostgreSQL is experiencing a shutdown. So, when the data has not been committed and our database encounters issues, the transaction is not considered successful.

"So when you are using a database, make sure that the database complies with the ACID principles. However, there are some databases that do not follow these principles due to their different requirements, such as Redis."

"That's all from me, and thank you for reading" (=`ェ´=).