Database crash just before appending the checkpoint entry to write ahead log

Question

From what I read about WAL, its an append-only file where all the operations to the DB are written to before the operations are actually performed to the data.
There is also a concept of a "checkpoint" which is when the DB actually writes the data to disk from memory, and appends a special checkpoint entry at the end of the WAL.
Now if the DB crashes at any point, it can read the WAL starting from the latest checkpoint entry and redo all the subsequent operations.
But how does the DB ensure that the checkpoint WAL entry and the actual flushing of the data to disk happen in a transactional way?
What if the data is flushed but the DB crashes before the checkpoint entry is made in the WAL?
Conversely, if the WAL is modified first, then what happens if the DB crashes after the checkpoint entry but before the data is actually flushed.

For example, consider the following case:

We have a dummy table Person(name, age, salary).
It has an entry John, 25, 100.
At time T1, a new transaction arrives UPDATE Person SET salary += 100 WHERE name='John'.
Assume that before T1, all the data had been flushed and the checkpoint entry had been appended to WAL.
Now after this transaction, the DB will first append the log with the exact transaction statement UPDATE Person SET salary += 100 WHERE name='John'.
Now the data become John, 25, 200.
Then after some time, lets say the DB decides to flush the data to disk at time T2.
Then at time T3 (just after T2), the DB attempts to write the checkpoint entry to the WAL.
However, before it could finish, there was a power failure between T2 and T3.
Now when the DB restarts and tries to recover, it will notice that there is one transaction after the latest checkpoint and will try to execute that: UPDATE Person SET salary += 100 WHERE name='John'
But since the transaction was already executed before the crash, this time the salary will take the value 300, although it should have been 200.

How does the DB prevent these redundant updates during the recovery?

Charlieface · Accepted Answer · 2024-10-02 11:38:44Z

after this transaction, the DB will first append the log with the exact transaction statement UPDATE Person SET salary += 100 WHERE name='John'.

You seem to have a misunderstanding here.

The WAL doesn't contain UPDATE Person SET salary += 100 WHERE name='John' so it doesn't "run" that again and get the wrong result.

It just contains new versions of whichever rows/pages were modified by that statement. So all it needs to do to checkpoint is write all those pages to the main file. It can always do that again, as it's idempotent, so a second roll-forward on recovery doesn't affect anything.

Oh ok. But then if it's actually writing the values in the file, then why doesn't the db update the actual table itself? Because it's anyways writing to the disk. Is there a performance improvement in writing to the WAL instead? — Anmol Singh Jaggi
– Anmol Singh Jaggi, Commented Oct 3, 2024 at 11:47
Yes it's much faster to use an append-only sequential file than to do loads of random writes to the main file. The WAL is also often stored on faster disks or at least on separate disks from the main file. — Charlieface
– Charlieface, Commented Oct 5, 2024 at 19:35

Stack Exchange Network

Database crash just before appending the checkpoint entry to write ahead log

1 Answer 1

Hot Network Questions

Database crash just before appending the checkpoint entry to write ahead log

1 Answer 1

Related

Hot Network Questions