Reiser4 transaction design

Joshua MacDonald
Hans Reiser


A “transcrash” is a set of operations, of which all or none must
survive a crash. A “particle” is the minimum amount of data whose
modification will be individually tracked, which can be larger than
the unit of modification itself. An “atom” is the collection of
particles a transcrash has attempted to modify, plus all particles
that are part of atoms that have “fused” with this atom. Two atoms
fuse when one transcrash attempts to modify particles that are part of
another atom.
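
The fusion rule above can be sketched with a union-find structure.
This is only an illustration under stated assumptions: the class
AtomTable and its methods are hypothetical names, particles are plain
strings, and nothing here reflects the actual Reiser4 data structures.

```python
class AtomTable:
    """Toy model of atoms and fusion (hypothetical, for illustration only)."""

    def __init__(self):
        self._parent = {}           # atom id -> parent atom id (union-find forest)
        self._particle_atom = {}    # particle -> atom id that first captured it
        self._transcrash_atom = {}  # transcrash -> atom id it started with

    def _find(self, atom):
        # Follow parent links to the representative atom, halving the path.
        while self._parent[atom] != atom:
            self._parent[atom] = self._parent[self._parent[atom]]
            atom = self._parent[atom]
        return atom

    def atom_of(self, transcrash):
        # Lazily create a fresh atom the first time a transcrash is seen.
        if transcrash not in self._transcrash_atom:
            atom = ("atom", transcrash)
            self._parent[atom] = atom
            self._transcrash_atom[transcrash] = atom
        return self._find(self._transcrash_atom[transcrash])

    def modify(self, transcrash, particle):
        """Record an attempted modification; fuse atoms if the particle is taken."""
        mine = self.atom_of(transcrash)
        owner = self._particle_atom.get(particle)
        if owner is None:
            self._particle_atom[particle] = mine
        else:
            theirs = self._find(owner)
            if theirs != mine:
                self._parent[theirs] = mine  # the two atoms fuse
        return self.atom_of(transcrash)

    def particles(self, transcrash):
        # All particles in the (possibly fused) atom of this transcrash.
        mine = self.atom_of(transcrash)
        return {p for p, a in self._particle_atom.items()
                if self._find(a) == mine}
```

In this sketch, two transcrashes that touch disjoint particles stay in
separate atoms; as soon as one of them attempts to modify a particle
held by the other's atom, both report the same atom and the same
particle set.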

In traditional Unix semantics, a sequence of write() system calls is
not expected to be atomic, meaning that an in-progress write could be
interrupted by a crash and leave part new, part old data behind.
Writes are not even guaranteed to be ordered in the traditional
semantics, meaning that newer data could survive a crash while older
data does not. File systems with atomic writes are called
“data-journaling”. The straightforward way to add data-journaling to
a file system is to log the contents of every modified block before
overwriting its real location. This technique doubles the amount of
data written to the disk, which is significant when disk transfer rate
is the limiting performance factor.
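
A minimal sketch of the log-then-overwrite scheme makes the doubling
visible. It assumes a toy in-memory disk; JournalingDisk, its fields
and its methods are hypothetical names chosen for illustration and are
not Reiser4 interfaces.

```python
class JournalingDisk:
    """Toy log-then-overwrite disk (hypothetical, for illustration only)."""

    def __init__(self):
        self.home = {}                # block address -> contents at real location
        self.log = []                 # sequential journal area
        self.data_blocks_written = 0  # counts data blocks only, not commit records

    def log_transaction(self, updates):
        # First write: every modified block goes to the log.
        for addr, data in updates.items():
            self.log.append(("data", addr, data))
            self.data_blocks_written += 1
        # The commit record marks the transaction as complete.
        self.log.append(("commit", None, None))

    def checkpoint(self):
        """Copy committed blocks to their real locations and empty the log.

        Log entries with no following commit record are discarded, which
        is what gives all-or-nothing behaviour after a crash.
        """
        pending = []
        for kind, addr, data in self.log:
            if kind == "data":
                pending.append((addr, data))
            else:
                # Second write: overwrite the real location.
                for a, d in pending:
                    self.home[a] = d
                    self.data_blocks_written += 1
                pending = []
        self.log = []
```

Committing two blocks through this sketch costs four data-block
writes, two to the log and two to the real locations, which is the
doubling described above.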

Something more clever is possible. Instead of writing every modified
block twice, we can write the block once to a new location and then
update the pointer in its parent. However, the parent modification
must be included in the transaction too. The WAFL (Write Anywhere
File Layout) technique [Hitz94] handles this by propagating file
modifications all the way to the root node, which is then updated
atomically. In Reiser4, whether we log and overwrite, or relocate,
depends on what the block allocation plugin determines is optimal for
the blocks in question based on the current layout.
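
The relocation alternative can be sketched as follows. This models
only the write-anywhere idea of propagating pointer updates to the
root; it does not model the Reiser4 choice between relocating and
logging. WriteAnywhereTree and its methods are hypothetical names,
interior nodes are dictionaries of child pointers, and the disk is an
in-memory dictionary.

```python
class WriteAnywhereTree:
    """Toy relocate-on-write tree (hypothetical, for illustration only)."""

    def __init__(self):
        self.disk = {}      # block address -> contents (dict for interior nodes)
        self.next_free = 0  # trivial allocator: always use a fresh address
        self.root = None    # address of the current root node

    def alloc(self, contents):
        # Write contents once, to a location that was not in use before.
        addr = self.next_free
        self.next_free += 1
        self.disk[addr] = contents
        return addr

    def read(self, path, root=None):
        # Follow child pointers from the given root (default: current root).
        addr = self.root if root is None else root
        for name in path:
            addr = self.disk[addr][name]
        return self.disk[addr]

    def update(self, path, data):
        # Collect the existing ancestors from the root down to the parent.
        addrs = [self.root]
        for name in path[:-1]:
            addrs.append(self.disk[addrs[-1]][name])
        # Write the new leaf once, to a new location.
        child = self.alloc(data)
        # Copy each ancestor bottom-up so it points at the relocated child.
        for addr, name in zip(reversed(addrs), reversed(path)):
            node = dict(self.disk[addr])
            node[name] = child
            child = self.alloc(node)
        # Switching the root is the single atomic step of the update.
        self.root = child
```

Until the final assignment to root, the old tree is untouched, so a
crash before that point leaves the old contents fully readable; in
this sketch the old root still resolves the path to the old data.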


Most file systems perform write caching, meaning that modified data
are not immediately written to the disk. Writes are deferred for a
period of time, which allows the system greater control over disk
scheduling. A system crash can happen at any time, and some recent
modifications will be lost. This can be a serious problem if an
application has made several interdependent modifications, some of
which are lost while others are not. Such an application is said to
require atomicity: a guarantee that all or none of a sequence of
interdependent operations will survive a crash. Without atomicity, a
system crash can leave the file system in an inconsistent state.

Dependent modifications may also arise when an application reads
modified data and then produces further output. Consider the
following sequence of events:

1. Process P_a writes file F_a
2. Process P_b reads file F_a
3. Process P_b writes file F_b

At this point, the file F_b might be dependent on F_a. If the
write-caching strategy allows F_b to be written before F_a and a
crash occurs, the file system may again be left in an inconsistent
state.
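
The hazard in the three-step sequence above can be sketched by
recording which unwritten files a process has read before it writes.
This is a conservative illustration of "might be dependent" and not
the mechanism Reiser4 uses; DependencyTracker and its methods are
hypothetical names.

```python
class DependencyTracker:
    """Toy tracker of possible read-then-write dependencies (hypothetical)."""

    def __init__(self):
        self.dirty = set()      # files with modifications not yet on disk
        self.read_dirty = {}    # process -> dirty files it has read
        self.depends_on = {}    # file -> files it might depend on

    def write(self, process, file):
        self.dirty.add(file)
        # Anything dirty that this process has read may have influenced
        # what it writes, so record it as a possible dependency.
        deps = self.read_dirty.get(process, set()) - {file}
        self.depends_on.setdefault(file, set()).update(deps)

    def read(self, process, file):
        # Only reads of not-yet-written data create a dependency.
        if file in self.dirty:
            self.read_dirty.setdefault(process, set()).add(file)

    def order_is_safe(self, flush_order):
        """True if no file reaches disk before a file it might depend on."""
        flushed = set()
        for file in flush_order:
            if not self.depends_on.get(file, set()) <= flushed:
                return False
            flushed.add(file)
        return True
```

Replaying the three events through this sketch records that F_b might
depend on F_a, so writing F_a first is accepted while writing F_b
first is flagged as unsafe.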

Our definition of atomicity is based on the notion of a “sphere of
influence” which encompasses a set of modifications that must commit
atomically. In our implementation, an atom maintains a sphere of
influence. We offer transcrashes with two degrees of fusion,
“explicit dependence” and “assumed dependence”. A transcrash with
explicit dependence is allowed to read the modified data of other
atoms without causing the atoms to fuse except when it explicitly
