Introduction
This glossary is an overview of the concepts that you’ll need to think about distributed systems reliability. We’re writing chiefly for industry practitioners – software developers who are learning about distributed systems testing at any stage of their careers.
It’s meant as a handy guide, bringing together information that was previously scattered all over the internet – because the concepts here originate in many different disciplines (and naturally everyone’s too shy to talk to people outside their field, us included). To the best of our knowledge, it’s the first resource to do so. At the same time, we hope that simply putting all these ideas together in one place starts to show how they all fit together.
But! And we cannot stress this enough – this is a reference, not required reading!
We’re not saying you need to understand every one of these concepts in order to test a distributed system. Every time you write an integration test, you’re testing a distributed system already! This glossary is here to encourage you to get deeper into a topic that’s increasingly important for every developer committing production code today.
So our goal is to provide intuitive explanations, with pointers to more formal definitions should you need them. We present clear, univalent definitions of terms that are actually messy and contested, like “process”, “repeatable read” or “eventual consistency.” In such cases, we attempt to nod at the diversity of definitions and usages that exists, but our priority is to give a reader something that’s directionally correct and actually useful for a learner.
We’ve also included essential concepts for which there are no formally defined or widely accepted terms in existing literature, like “garbage reads” and “g-nonadjacent.” Maybe the names will stick?
We know it’s incomplete, and if you care about software reliability or distributed systems, we’d love your help!
This glossary is organized as follows:
- Preliminaries: concepts used in defining phenomena and consistency models.
- Consistency models: which define what systems are allowed to do.
- Availability models: which describe different ways systems can be available.
- Phenomena: something a system does which someone, somewhere, thought was a bad idea.
- Faults: fault models to which you might want your system to be resilient.
- Testing techniques: ways to test whether your system actually obeys these models, or experiences these failure modes.
- Further reading: a reading list of key resources.

Regardless of whether you’re working on your first distributed system or your fiftieth, we hope this will help you make it more reliable.
Yours, Jepsen & Antithesis
Preliminaries
These concepts are used often in defining phenomena and consistency models.
**Dependency**
In consistency models, a dependency is a relationship between two operations (e.g. transactions). For example, a single process could execute one operation before another: a process dependency. One operation could read data that was written by another: a write-read dependency.

**Definite error**
A definite error is returned by an operation which definitely did not happen. For instance, a transaction abort error is usually a definite error: the state of the system should be as if the transaction never happened at all. By contrast, an indefinite error may mean that the requested operation did or did not happen, or might happen later.

**Indefinite error**
An indefinite error is returned by an operation which may or may not have happened, or might happen later. For instance, a timeout is an indefinite error: the operation may not have been received at all, or it may have taken place without an acknowledgement, or it may be in-flight and execute five minutes later. By contrast, a definite error is known to have not executed. Distinguishing between definite and indefinite errors is a key challenge in distributed systems design and testing. If one writes a unique value x = 3, receives a definite error, and later reads x = 3, that very likely signals an invariant violation. If the write receives an indefinite error, it is legal to read x = 3 now, at some later time, or never at all. Checkers must account for all possible outcomes.

**Object**
In consistency models, a database usually contains a set of distinct objects, also called items. Each object often has a unique identifier, and over time, goes through a series of versions.

**Operation**
Operations are “the things a system does.” Exactly what constitutes an operation depends on the system or formalism. For instance, the operations performed by a queuing system might be “enqueue” and “dequeue.” A counter system might have “increment” and “read.” A user-registration system might have operations like “register a user,” “log in,” and “change password.” Systems can also have other kinds of operations. For example, a test of a database system might define an operation for “compact old files”, or “add a node to the cluster.” Similarly, faults can be viewed as operations: “kill a process” or “partition the network” are things one can do to a system. Transaction systems can be interpreted a few ways. Much of the transaction consistency literature uses the term “operation” to refer to (e.g.) an individual read or write of some object. These small operations are then grouped into transactions. On the other hand, it can also be convenient to think of a transaction as a single operation; this makes symmetries between single-object consistency models like Linearizability and multi-object models like Serializability clearer.

**Predicate**
A predicate identifies a set of objects, rather than identifying a single object by primary key. For example, “every box containing a cat” is a predicate; so is “all boxes.” A system may offer different levels of safety for operations which access individual items vs. operations which use a predicate.

**Process**
For our purposes, a process (which, depending on context, may also be called an agent, actor, node, machine, replica, server, session, etc.) is a logically single-threaded state machine which participates in a distributed system. We can often interpret a single database client as a process. However, once a client requests an operation which results in an indefinite error, it can perform no further operations – that operation might take place at some later time. Were it to perform another operation, it might execute the two concurrently, and no longer be “logically single-threaded”.
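The definite/indefinite distinction above has direct consequences for checkers. As a minimal sketch (the function name and outcome labels are illustrative, not from any particular framework), a checker deciding whether a later read of a uniquely written value is consistent with the write’s outcome might look like:

```python
# Hypothetical sketch: how a checker might treat definite vs. indefinite
# write outcomes when deciding whether a later read is legal.

def read_is_legal(write_outcome: str, value_written, value_read) -> bool:
    """Given the outcome of a write of a unique value, decide whether a
    later read observing value_read is consistent with that outcome."""
    if write_outcome == "ok":
        return True                       # acknowledged writes may be read
    if write_outcome == "definite-error":
        # The write definitely did not happen: observing it is a violation.
        return value_read != value_written
    if write_outcome == "indefinite-error":
        # The write may or may not have happened: either result is legal.
        return True
    raise ValueError(f"unknown outcome: {write_outcome}")
```

Reading x = 3 after a definite error on the write of 3 is flagged; after an indefinite error, any outcome passes, which is precisely why indefinite errors make checking harder.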
**Process dependency**
A process dependency relates two subsequent operations performed by the same process. If process P performs operation A, then B, we say that B process-depends on A. It can be convenient to identify a session as a single process and vice versa (making session dependencies process dependencies), but a single process could execute more than one session, and a session could in some systems be handed off between two processes (thereby making it useful to distinguish the two types of dependency).

**Real-time dependency**
A real-time dependency relates two operations which were not concurrent in real time. If operation A ends before operation B begins, B real-time-depends on A. Real-time dependencies are useful for finding violations of real-time consistency models like Strong Serializable.

**Read-write dependency**
A read-write dependency relates two operations A and B, where A reads some version v1 of an object x, and B writes the next version v2 of x. Speaking loosely, we can say A did not see some part of B, and therefore must have executed before it. Read-write dependencies are used to define stronger consistency models like Repeatable Read, Snapshot Isolation, and Serializability.

**Session dependency**
Some systems provide a notion of a session: a sequence of operations performed in total order, usually by a single client or process. A session dependency relates two operations performed within a session: if the session performs operation A, then B, we say that B session-depends on A. Session dependencies can be used to define session consistency models like Strong Session Serializable. A single process could execute more than one session, and a session could (in some systems) be handed off between two processes. A test harness might choose to view each session as a single process; in this case, session dependencies and process dependencies are equivalent.

**Version**
In consistency models, a version refers to a specific state that some object took on. Formalisms differ, but in broad terms, every write of an object generally produces a new version of that object, and every read generally returns one (or more) versions. Two different versions may have the same value: if one operation sets x = 3, and another also sets x = 3, those are two different versions of x.

**Version order**
A version order is an order over versions of objects in a database, which encodes the sequence of versions each object took on during some execution. In Adya’s formalism, the version order is total over the versions of any single object.

**Write-read dependency**
A write-read dependency relates two operations A and B, where A writes some version v1 of an object x, and B reads v1. Speaking loosely, B observes A’s write. Write-read dependencies help trace the way information flows between operations, and are used to define consistency models like Read Committed.

**Write-write dependency**
A write-write dependency relates two operations A and B, where A writes some version v1 of an object x, and B writes v2: the next version of x in the version order. Speaking informally, B overwrites A’s write. Write-write dependencies help trace the way information flows between operations, and are used to define consistency models like Read Uncommitted.
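The dependency definitions above can be mechanized. A hypothetical sketch, assuming we have a per-object version order plus maps recording which transaction wrote, and which transactions read, each version (the data layout is illustrative):

```python
# Hypothetical sketch: deriving write-write, write-read, and read-write
# dependencies from a per-object version order.
#   version_order: object -> list of versions, in order
#   writes:        (object, version) -> transaction that wrote it
#   reads:         (object, version) -> set of transactions that read it

def dependencies(version_order, writes, reads):
    ww, wr, rw = set(), set(), set()
    for obj, versions in version_order.items():
        for v1, v2 in zip(versions, versions[1:]):
            a, b = writes[(obj, v1)], writes[(obj, v2)]
            ww.add((a, b))                    # b overwrites a's write
            for r in reads.get((obj, v1), ()):
                rw.add((r, b))                # r read v1; b wrote the next version
        for v in versions:
            w = writes[(obj, v)]
            for r in reads.get((obj, v), ()):
                wr.add((w, r))                # r observes w's write
    return ww, wr, rw
```

Cycle checks over graphs built this way are the backbone of the phenomena definitions later in this glossary.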
Consistency models
These properties define what systems are allowed to do. Each consistency model usually proscribes some set of phenomena.
We draw many of our definitions of consistency models and phenomena from papers like Berenson et al’s A Critique of ANSI SQL Isolation Levels, Adya’s 1999 thesis Weak Consistency, and Cerone et al’s A Framework for Transactional Consistency Models with Atomic Visibility. In a broad sense, this line of research is an attempt to fix the incomplete and ambiguous definitions in the ANSI SQL standard. Other models and phenomena are adapted from Fekete et al’s Making Snapshot Isolation Serializable, Cerone & Gotsman’s Analyzing Snapshot Isolation, Bailis et al’s Highly Available Transactions and Scalable Atomic Visibility with RAMP Transactions, and so on.
**Consistency model**
A consistency model is a set of allowed histories. It constrains which operations are permissible, and in which orders. We say that one model is stronger than another if the stronger model’s allowed histories are a strict subset of the weaker model’s allowed histories. For example, Monotonic Reads is a consistency model which requires that for each process, reads never “go backwards”. Serializability is a consistency model which requires that the outcomes of operations are equivalent to executing those operations in some total order.

**Causal consistency**
Causal consistency ensures that when a single process (sometimes called a “node” or “session”) performs a series of operations on an object, other processes observe those operations in the same order. A variant of Causal consistency, Real Time Causal, is one of the strongest totally available consistency models.

**Eventual consistency**
Eventual consistency requires that after updates cease, given sufficient time and network messages, every node reaches the same final value. Eventually consistent systems provide total availability: reads and updates can always be performed locally, without network communication.

**History**
Speaking loosely, a history is a set of operations performed by some system, along with information like “what process performed each operation” and “when did each operation begin and end.” A history is often concurrent: at any given time, several operations may be executed by different processes. Conceptually, a consistency model is a set of allowable histories. In practice, distributed systems tests often record a history, then try to determine whether it satisfied some properties. To do this, the test may establish a mapping between the recorded history as seen by the test harness and the abstract history used to define some consistency model.
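To make “history” concrete, here is a minimal sketch of how a test harness might record one. The field names are illustrative, not from any particular framework:

```python
# A minimal sketch of a recorded history: each operation notes its process,
# its invocation and completion times, and what it did.

history = [
    {"process": 0, "start": 0.0, "end": 1.0, "op": ("write", "x", 1)},
    {"process": 1, "start": 0.5, "end": 1.5, "op": ("write", "x", 2)},  # overlaps op 0
    {"process": 0, "start": 2.0, "end": 3.0, "op": ("read", "x", 2)},
]

def concurrent(a, b):
    """Two operations are concurrent if neither ends before the other begins."""
    return not (a["end"] < b["start"] or b["end"] < a["start"])
```

A checker for some consistency model is then, conceptually, a predicate over structures like this one.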
**Linearizability**
Linearizability is a consistency model which ensures that, on a single object, operations appear to execute atomically (i.e. not interleaved with other operations), in a total order, and that order is consistent with the real-time order of operations. If operation A ends before operation B begins, A must appear to execute before B. This holds regardless of which process performs an operation. Relaxing Linearizability’s real-time constraint gives Sequential consistency. If one identifies a transaction as an operation, and the entire database as a single object, Linearizability is equivalent to Strong Serializability.

**Monotonic Atomic View**
Monotonic Atomic View is a less-common, totally available transactional consistency model. It strengthens Read Committed by requiring that once a transaction T1 observes any effect of some other transaction T2, T1 must observe all of T2’s effects. This is particularly helpful for preventing missing foreign keys, or for ensuring indices and materialized views don’t miss their underlying objects. Monotonic Atomic View is weaker than Read Atomic, Repeatable Read, and Snapshot Isolation.

**Monotonic Reads**
Monotonic Reads is a consistency model which guarantees that each process observes a monotonically advancing view of the system. Once a process observes the effects of some write, it must always observe those effects.

**Monotonic Writes**
Monotonic Writes is a consistency model which guarantees that if a single process performs two writes, all processes observe those two writes in that order. The second write should “overwrite” the first.

**Read Atomic**
Read Atomic is a transactional consistency model which ensures atomic visibility: either all or none of a transaction’s updates are observed by other transactions. Read Atomic prevents Fractured Read, Aborted Read, and Intermediate Read; it is stronger than Monotonic Atomic View in that it prevents Fractured Read. Read Atomic is weaker than Snapshot Isolation.
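The Monotonic Reads guarantee above lends itself to a simple check. A hypothetical sketch, assuming versions of a single object can be represented as monotonically increasing integers (a simplification; real checkers track visibility sets):

```python
# Hypothetical sketch of a Monotonic Reads check over a single object whose
# versions are integers that grow over time. For each process, successive
# reads must never observe an older version than one already observed.

def monotonic_reads_ok(read_history):
    """read_history: list of (process, version_read), in the order each
    process performed its reads. True if no process's view goes backwards."""
    latest = {}  # highest version each process has observed so far
    for process, version in read_history:
        if version < latest.get(process, version):
            return False  # this process's view went backwards
        latest[process] = version
    return True
```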
**Read Committed**
Read Committed is a relatively weak transactional consistency model which strengthens Read Uncommitted by proscribing, in addition to Write Cycle, the phenomena Aborted Read, Intermediate Read, and Cyclic Information Flow. However, it allows all forms of Anti-Dependency Cycle: cycles where one transaction fails to observe another’s effects. Read Committed can be totally available. It is weaker than Snapshot Isolation (which proscribes Non-Adjacent Anti-Dependency Cycle) and Repeatable Read (which proscribes Item Anti-Dependency Cycle).

**Read Uncommitted**
Read Uncommitted is one of the weakest transactional consistency models. Definitions vary, but in Adya’s formalism, Read Uncommitted proscribes G0: transactions should not overwrite each other’s effects. However, Read Uncommitted allows G1, and more. Read Uncommitted can be totally available. It is weaker than Read Committed, which, in addition to G0, proscribes G1.

**Read Your Writes**
Read Your Writes is a consistency model which ensures that if a process performs a write, any subsequent read performed by that same process must observe that write’s effects.

**Repeatable Read**
Repeatable Read is a transactional consistency model which strengthens Read Committed. It proscribes G0, G1, and G2-item: cycles between transactions which fail to observe each other’s effects, based on single-object reads. Repeatable Read is stronger than Read Committed, which allows G2-item. It is weaker than Serializability, which proscribes G2 in general, not just on single-item reads. It follows that Repeatable Read and Serializability are indistinguishable for transactions which interact with objects by primary key; they differ only with respect to predicate phenomena. Definitions and implementations of Repeatable Read vary widely. We use Adya’s PL-2.99, which is often used in research on consistency models. The ANSI SQL specification offers a much weaker, ambiguous definition which, unlike PL-2.99, allows G0 (Write Cycle).

**Serializability**
Serializability is a relatively strong transactional consistency model which requires that all transactions appear to execute in some total order. It is stronger than Repeatable Read and Snapshot Isolation, proscribing G0, G1, and G2. It does not require that the apparent order of transactions be consistent with the per-process or real-time order: for that, see Strong Session Serializability and Strong Serializability, respectively.

**Sequential (consistency model)**
Sequential consistency is a relatively strong consistency model which guarantees that all operations appear to take place in some total order, and that order is consistent with the order on each process. Sequential is stronger than Causal, in that it requires equivalence to a total order of operations. It is weaker than Linearizable, in that it does not require real-time order.

**Snapshot Isolation**
Snapshot Isolation is a transactional consistency model which strengthens Read Committed by proscribing G0, G1, and G-nonadjacent: a kind of dependency cycle in which some transactions fail to observe each other’s effects. Informally, Snapshot Isolation provides each transaction with an isolated snapshot of the entire database. At commit time, the transaction’s writes are applied atomically to produce a new version of the database. The transaction only commits if no other transaction has written the same keys since the transaction’s snapshot was taken. Snapshot Isolation is weaker than Serializability, which prohibits G2 in general. It is neither stronger nor weaker than Repeatable Read: Repeatable Read allows some predicate phenomena which Snapshot Isolation doesn’t, and Snapshot Isolation allows some single-item phenomena, like Write Skew, which Repeatable Read doesn’t.
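Snapshot Isolation’s commit rule (“first committer wins”) can be sketched in a few lines. The representation of the commit log is illustrative, and timestamps stand in for whatever ordering mechanism a real system uses:

```python
# Hypothetical sketch of Snapshot Isolation's commit check: a transaction may
# commit only if no other transaction has committed a write to any of the
# same keys since this transaction's snapshot was taken.

def can_commit(txn_write_set, snapshot_time, commit_log):
    """commit_log: list of (commit_time, keys_written) for committed txns."""
    for commit_time, keys in commit_log:
        if commit_time > snapshot_time and keys & txn_write_set:
            return False  # conflicting write committed after our snapshot
    return True
```

Note that two transactions which read the same keys but write disjoint keys both pass this check, which is exactly how Write Skew slips through Snapshot Isolation.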
**Strong consistency**
There is no one accepted definition of “strong consistency”, and the term is used to cover a broad range of consistency models, including Sequential, Linearizable, Serializable, Strong Session Serializable, Strong Serializable, and so on. In general, “strong” and “weak” are relative terms.

**Strong Serializable**
Strong Serializable is one of the strongest transactional consistency models. It strengthens Serializable by proscribing real-time phenomena: the apparent order of transactions must be consistent with their real-time order. If transaction A commits before transaction B begins, B must appear to execute after A. Strong Serializability is also stronger than Strong Session Serializability, which allows processes to skew relative to one another. If one considers a transaction as a single operation, and the database as a single object, Strong Serializability is equivalent to Linearizability.

**Strong Session Serializable**
Strong Session Serializable is a relatively strong transactional consistency model. It strengthens Serializable by proscribing session phenomena: the apparent order of transactions must be consistent with the order within each session. If we identify each session as a process, Strong Session Serializable requires that the graph of write-write, write-read, read-write, and session dependencies between transactions is acyclic. Strong Session Serializable is weaker than Strong Serializable, which ensures that all transactions, regardless of which process performs them, appear to execute in real-time order.

**Strong Session Snapshot Isolation**
Strong Session Snapshot Isolation is a relatively strong transactional consistency model. Just as Strong Session Serializable strengthens Serializable, Strong Session Snapshot Isolation strengthens Snapshot Isolation by proscribing session phenomena: the apparent order of transactions must be consistent with the order within each session. If we identify each session as a process, Strong Session Snapshot Isolation requires that the graph of write-write, write-read, read-write, and session dependencies has no cycles, except for cycles which have adjacent read-write edges. Strong Session Snapshot Isolation is weaker than Strong Snapshot Isolation, which enforces real-time order between processes, not just within each process.

**Strong Snapshot Isolation**
Strong Snapshot Isolation is a relatively strong transactional consistency model. Just as Strong Serializable strengthens Serializable, Strong Snapshot Isolation strengthens Snapshot Isolation by proscribing real-time phenomena: the apparent order of transactions must be consistent with their real-time order. If transaction A commits before transaction B begins, B’s snapshot must reflect A’s commit. Strong Snapshot Isolation requires that the graph of write-write, write-read, read-write, and real-time dependencies is acyclic, except for cycles which have adjacent read-write edges. Strong Snapshot Isolation is also stronger than Strong Session Snapshot Isolation, which only enforces order within each process, rather than across all processes.

**Writes Follow Reads**
Writes Follow Reads is a consistency model which ensures that if a process performs a read r observing some write w1, any later write w2 that process executes must take effect after w1. Reading a value “seals” the past.
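The real-time dependencies used by the Strong models above can be derived mechanically from operation intervals. A minimal sketch, assuming each operation records wall-clock start and end times as observed by the test harness:

```python
# Hypothetical sketch: deriving real-time dependency edges from operation
# intervals. If A ends before B begins, B real-time-depends on A. A strong
# model then requires the combined dependency graph (write-write, write-read,
# read-write, plus these edges) to contain no forbidden cycles.

def real_time_edges(ops):
    """ops: dict of name -> (start, end). Returns the set of (a, b) pairs
    where b real-time-depends on a."""
    return {(a, b)
            for a, (_, a_end) in ops.items()
            for b, (b_start, _) in ops.items()
            if a != b and a_end < b_start}
```

Operations whose intervals overlap are concurrent, so no edge is produced in either direction between them.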
Availability models
These are different ways systems can be available.
| Model | Operations succeed when… | Achievable consistency models |
|---|---|---|
| Total | The node is non-faulty | Monotonic Atomic View, Read Committed, Read Uncommitted, Monotonic Read, Monotonic Write, Writes Follow Reads |
| Sticky | A non-faulty client is connected to a non-faulty server | Causal, Read Your Writes |
| Majority | The node is non-faulty and can communicate with a majority of nodes | Sequential, Linearizable, Snapshot Isolation, Repeatable Read, Serializable, Strong Serializable |
**High availability**
Many systems claim to offer high availability, but the term is not well-defined. The Megastore paper uses “high availability” to mean majority availability, but the Dynamo paper uses the term to mean total availability. One working definition is that a highly available system is available much of the time; this is the definition adopted in, for example, Ladin, Liskov, Shrira, and Ghemawat’s “Providing High Availability Using Lazy Replication”. Another definition is that a highly available system should generally be available more often than a single node.

**Total availability**
Totally available systems ensure that every non-faulty node can execute any operation. Operations continue to execute even when other nodes, or the network between them, fail. Total availability is particularly useful for mobile apps, geo-replicated systems, and IoT devices with limited power or connectivity. Total availability also has important performance implications: regardless of node or network failures, a totally available system can process requests without waiting for any network communication. This is particularly useful for latency-sensitive applications. Consistency models like Read Uncommitted, Read Committed, Monotonic Atomic View, Writes Follow Reads, Monotonic Reads, and Monotonic Writes can all be implemented in totally available systems. Bailis et al, Abadi, and DeCandia et al’s Dynamo paper all call this model of availability “highly available”. Gilbert & Lynch simply call it “available” in their CAP Theorem paper, as do the PNUTS and COPS papers. We call it “total availability”, in an attempt to disambiguate.

**Majority availability**
Majority available systems ensure that if a majority of non-faulty nodes can communicate with one another, those nodes can execute operations. Operations may fail on faulty nodes, or when nodes cannot exchange messages with a majority. Many systems use “just over half” as their majority, but some Byzantine-fault-tolerant systems use a different fraction, like a two-thirds majority. Some systems can use large quorums for (e.g.) leader election in exchange for smaller quorums when committing operations under a stable leader. Consistency models like Sequential, Linearizable, Snapshot Isolation, Repeatable Read, Serializable, and their stronger variants can be, at best, majority available. In practice, systems like ZooKeeper, Raft, Consul, MongoDB, and CockroachDB are (broadly speaking) majority available. The distributed systems literature uses varying names for this model. Bailis et al’s HAT paper calls it “unavailable”, whereas Megastore calls it “highly available”. We opt for “majority available”.

**Sticky availability**
Sticky availability means that if a client executes an operation against a database state that reflects all the client’s past transactions, it eventually receives a response – even in the presence of indefinite node or network failures. Broadly speaking, stateless clients must “stick” to one server, which ensures the consistency of that client’s operations. In the limiting case where every process is a server (or maintains its own state), this reduces to total availability. Consistency models like Read Your Writes and Causal consistency can be implemented in sticky available systems, or in totally available systems in which all processes retain state about the past.
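For illustration, the usual quorum arithmetic behind majority availability: a crash-fault-tolerant majority quorum is just over half the cluster, while classic Byzantine-fault-tolerant protocols tolerate f faulty nodes with clusters of 3f + 1 and quorums of 2f + 1. A sketch:

```python
# Sketch of majority-quorum sizing. Any two majority quorums must intersect,
# which is what lets a new quorum always observe the most recent committed
# operation.

def majority(n: int) -> int:
    """Smallest quorum size such that any two quorums of n nodes intersect."""
    return n // 2 + 1

def bft_quorum(f: int) -> tuple[int, int]:
    """(cluster size, quorum size) for classic BFT protocols tolerating
    f Byzantine faults."""
    return 3 * f + 1, 2 * f + 1
```

So a five-node cluster needs three nodes to make progress, and a cluster tolerating one Byzantine fault needs four nodes with quorums of three.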
Phenomena
A phenomenon is something a system does which someone, somewhere, thought was a bad idea. For example, G0 (Write Cycle) describes a set of transactions which all overwrite each other. Since some names for phenomena have multiple interpretations, we try to use the less-ambiguous, short names presented in the literature – e.g. G1a for “aborted read”.
**A5A (Read Skew)**
A5A (also known as Read Skew) is a phenomenon where a transaction observes part, but not all, of another transaction’s writes. A5A is a special case of G-single. It is allowed by Read Committed, but forbidden by Snapshot Isolation and Repeatable Read.

**A5B (Write Skew)**
A5B (also known as Write Skew) is a phenomenon where two transactions write different objects, and each fails to observe the other’s write. A5B is a special case of G2-item; it is allowed by Snapshot Isolation, but forbidden by Repeatable Read.

**Fractured Read**
Fractured Read is a phenomenon where one transaction writes two objects x and y, and a second transaction reads that version of x, but an older version of y. Fractured Reads are prevented by Read Atomic.

**G0 (Write Cycle)**
G0 (also known as Write Cycle) is a phenomenon where a set of transactions overwrite one another, forming a cycle linked purely by write-write dependencies. G0 is proscribed by Read Uncommitted.

**G1**
G1 comprises three phenomena: G1a (Aborted Read), G1b (Intermediate Read), and G1c (Cyclic Information Flow). G1 is used to define Read Committed.

**G1a (Aborted Read)**
G1a (also known as Aborted Read) is a phenomenon where an aborted transaction’s write is visible to a committed transaction. G1a is prohibited by Read Committed. Checking for G1a is straightforward: build a set of the versions written by every aborted transaction, then scan every committed read. If any read observes a version written by an aborted transaction, one has a direct example of G1a.

**G1b (Intermediate Read)**
G1b (also known as Intermediate Read) is a phenomenon where a transaction reads a version of some object from the middle of a different transaction; i.e., a version other than that transaction’s last write of that object. G1b is prohibited by Read Committed. Checking for G1b is straightforward. Start by scanning through every transaction, building up a set of intermediate versions: those versions which were overwritten by writes later in the same transaction. Then scan every committed read: if any observes an intermediate version, it constitutes an example of G1b.

**G1c (Cyclic Information Flow)**
G1c (also known as Cyclic Information Flow) is a phenomenon where a set of transactions all observe or overwrite each other’s writes, forming a cycle of write-write and write-read dependencies. G1c is prohibited by Read Committed.

**Garbage Read**
Garbage Read is a phenomenon where a database returns a version of some object that was not the product of any write(s). Garbage reads can occur due to memory corruption, serialization errors, race conditions, and more. We are not aware of a formal name for this phenomenon, but it occurs in a surprising number of real systems. Checking for garbage reads is relatively simple: compare every version read against the set of versions actually written.

**G-single (Single Anti-Dependency Cycle)**
G-single (also known as Single Anti-Dependency Cycle) is a phenomenon where a set of transactions are linked in a cycle whose edges are write-write, write-read, or read-write dependencies, and there is exactly one read-write dependency. This is one of the cycles proscribed by Snapshot Isolation.

**G-nonadjacent (Non-Adjacent Anti-Dependency Cycle)**
G-nonadjacent is a phenomenon where a set of transactions are linked in a cycle whose edges consist of write-write, write-read, or read-write dependencies, and no two read-write dependencies appear adjacent to each other. This is not a standardized term in the literature, but it is the general kind of cycle proscribed by Snapshot Isolation.

**G2-item (Item Anti-Dependency Cycle)**
G2-item (also known as Item Anti-Dependency Cycle) is a phenomenon where a set of transactions are linked in a cycle whose edges consist of write-write, write-read, or read-write dependencies on single objects (rather than predicates), with at least one read-write dependency. This is the general kind of cycle proscribed by Repeatable Read.
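The G1a and G1b checks described above can be sketched as follows. The transaction representation (lists of read/write micro-operations plus a committed flag) is illustrative, not from any particular checker:

```python
# A sketch of the G1a (Aborted Read) and G1b (Intermediate Read) checks.
# Each transaction is (ops, committed), where ops is a list of
# ("r" | "w", object, version) micro-operations.

def check_g1a_g1b(transactions):
    """Returns the subset of {"G1a", "G1b"} found in the history."""
    aborted, intermediate = set(), set()
    for ops, committed in transactions:
        last_write = {}  # object -> last version this txn wrote
        for kind, obj, version in ops:
            if kind == "w":
                if obj in last_write:
                    # earlier write was overwritten within the same txn
                    intermediate.add((obj, last_write[obj]))
                last_write[obj] = version
                if not committed:
                    aborted.add((obj, version))  # any write by an aborted txn
    found = set()
    for ops, committed in transactions:
        if not committed:
            continue
        for kind, obj, version in ops:
            if kind == "r" and (obj, version) in aborted:
                found.add("G1a")        # committed read of an aborted write
            if kind == "r" and (obj, version) in intermediate:
                found.add("G1b")        # committed read of an intermediate version
    return found
```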
G2 (Anti-Dependency Cycle) G2 (also known as an Anti-Dependency Cycle) is a phenomenon where a set of transactions are linked in a cycle whose edges consist of write-write, write-read, or read-write dependencies. These cycles are proscribed by Serializability. Internal Consistency Anomaly An Internal Consistency Anomaly occurs when a transaction fails to provide sequential semantics within a single transaction. For read-write registers, an internal violation occurs when a read of object x fails to observe that transaction’s most recent write of x, or the version of x from the start of the transaction. This comes from Cerone, Bernardi, and Gotsman’s A Framework for Transactional Consistency Models with Atomic Visibility, where it is formalized as the axiom INT. Similar behaviors are required in Adya’s and Alvisi et al’s formalisms. Internal consistency is relatively easy to verify. Within each transaction independently, one steps through each sub-operation and keeps track of what the database state should have been, given that operation. For instance, once one observes a write of x=5, any later reads of x should observe 5, until another write of x supervenes. Long Fork Long Fork is a phenomenon where two transactions make concurrent updates to separate objects, creating a “fork” in the timeline of the database, and two other transactions each observe one of those forks before they are logically merged. Lost Write (Phenomenon) At the level of a distributed system, Lost Write is a phenomenon where the system acknowledges a write as complete, but after some time, the effects of that write are no longer visible. A write could be lost immediately. For instance, a server might internally abort a write transaction, but due to a bug, inform the client that the transaction had actually committed. A write could also be visible for some time, then vanish. For example, a write might be acknowledged by a server and stored in memory, but not written to disk before a crash. 
Alternatively, a write could be replicated to, say, two out of five nodes in a cluster, visible in reads on those nodes, but later overwritten by the remaining three nodes.
P0 (Dirty Write)
P0 (also known as Dirty Write) is a phenomenon where one transaction overwrites a version written by another transaction before that transaction completes. See also the generalized definition, G0.
P1 (Dirty Read)
P1 (also known as Dirty Read) is a phenomenon where one transaction reads data written by another transaction before that transaction completes. See also the generalized definition, G1.
P2 (Non-Repeatable Read)
P2 (also known as Non-Repeatable Read or Fuzzy Read) is a phenomenon where a transaction modifies data that another, ongoing transaction has read. See also the generalized definition, G2-item.
P3 (Phantom)
P3 (also known as Phantom) is a phenomenon where one transaction modifies data that another, ongoing transaction has read via a predicate. See also the generalized definition, G2.
P4 (Lost Update)
P4 (also known as Lost Update) is a phenomenon where the effects of a committed transaction are effectively lost due to another transaction’s concurrent write. Specifically, two committed transactions read the same version of some object, and each writes to it. Since neither observed the other’s write, one of their updates is effectively lost. P4 is prohibited by Snapshot Isolation and Repeatable Read.
P4 is relatively straightforward to test for: scan through all transactions in a history, finding any that read and then wrote the same object. Construct a map of object versions to sets of those transactions. If any set has more than one element, it indicates P4.
Real-Time Phenomena
Real-Time phenomena are variants of cycle phenomena which involve one or more real-time dependencies between transactions.
For instance, in a Serializable system, one process could perform a write, and after that transaction commits, a second process could begin a transaction which fails to observe the write. The two transactions form a cycle involving a read-write and a real-time dependency: a violation of Strong Serializability. We are not aware of “real-time phenomena” as a term in the literature, but the concept arises in work on Strong Serializability and Strong Snapshot Isolation, along with Adya’s formalism. Real-time phenomena are permitted by most consistency models, but proscribed by Linearizable, Strong Snapshot Isolation, and Strong Serializability.
Process Phenomena
Process phenomena (or session phenomena) are variants of cycle phenomena which involve one or more process (session) dependencies between transactions. For instance, a Serializable system could allow a single process to execute a transaction which writes something, then a second transaction which observes the state before its earlier write. We are not aware of “session phenomena” or “process phenomena” as terms in the literature, but the concept arises in work on Strong Session Serializability and Strong Session Snapshot Isolation, along with Adya’s formalism. Process/session phenomena are allowed by most consistency models, but proscribed by Sequential, Strong Session Serializability, and Strong Session Snapshot Isolation.
Stale Read
Stale Read is a kind of Real-Time phenomenon where one operation begins after another ends, but appears to execute before the prior operation. Note that the two operations can execute on any processes. Stale reads are allowed by most consistency models, but proscribed by Linearizable, Strong Snapshot Isolation, and Strong Serializability.
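The P4 (Lost Update) check described earlier — mapping each object version to the set of transactions that read and then wrote it — can be sketched like this, using the same hypothetical ("w"/"r", key, value) history representation as a checker might:

```python
from collections import defaultdict

def find_p4(history):
    """Map (key, version-read) -> indices of transactions that read that
    version and then wrote the key. Any set with >1 element is P4."""
    readers_writers = defaultdict(set)
    for i, txn in enumerate(history):
        reads = {}  # key -> first version this txn read of that key
        for op, key, value in txn:
            if op == "r" and key not in reads:
                reads[key] = value
            elif op == "w" and key in reads:
                # This txn read a version of key, then wrote key.
                readers_writers[(key, reads[key])].add(i)
    return {v: txns for v, txns in readers_writers.items() if len(txns) > 1}

# Two committed transactions both read x=0, then write it; since
# neither observed the other's write, one update is lost.
history = [
    [("r", "x", 0), ("w", "x", 1)],
    [("r", "x", 0), ("w", "x", 2)],
]
print(find_p4(history))  # {('x', 0): {0, 1}}
```

Unlike the general cycle checks, this scan never builds a dependency graph, which is why P4 is comparatively cheap to detect.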
Faults
These are things that can go wrong in distributed systems.
Amnesia
Amnesia means a node has forgotten something. Total amnesia occurs when a node forgets everything it has received. Most nodes undergo partial amnesia when they crash: information in memory is lost, but some state (depending on sync calls, the filesy