Saturday, June 24, 2023

My Explanation Of Sequential Consistency

Clearing up a common misconception about sequential consistency

The purpose of this post is to clear up a common misunderstanding that many might have about Sequential Consistency.

Background: Most modern CPUs are not sequentially consistent. But many modern programming languages have an SC-DRF memory model (sequential consistency for data-race-free programs), which guarantees that your program will be executed in a sequentially consistent manner as long as it does not contain any data races.
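To make the SC-DRF contract concrete, here is a minimal C++ sketch (my own illustration; the function names are invented for this example). The plain-int version contains a data race, so the language promises nothing about it; the std::atomic version is data-race-free, so it is guaranteed to behave as if its operations were interleaved into a single sequential order.

#include <atomic>
#include <cstdio>
#include <thread>

// Data race: 'counter' is a plain int written by two threads with no
// synchronization. SC-DRF makes no promise about this version at all.
int counter = 0;
void racy_increment() { ++counter; }   // undefined behavior if two threads race here

// Data-race-free: 'safe_counter' is atomic, so the program is guaranteed to
// behave as some sequentially consistent interleaving of the two threads.
std::atomic<int> safe_counter{0};
void safe_increment() { safe_counter.fetch_add(1); }

int main()
{
    std::thread t1(safe_increment), t2(safe_increment);
    t1.join();
    t2.join();
    std::printf("%d\n", safe_counter.load());   // always prints 2
    return 0;
}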

What is sequential consistency?

The simplest analogy I’ve seen for the “interleaved sequence” (or “total order”) is a riffle shuffle, which I find very intuitive. Imagine you have two threads and that each operation of a thread is a card, so each thread is a deck of cards. A riffle shuffle interleaves the two decks, and the resulting merged deck is what gets executed. The shuffle preserves the order of the cards within each deck, but many different interleavings are possible.
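To make the analogy concrete, here is a small C++ sketch (my own illustration, not taken from any of the sources quoted below) that enumerates every riffle shuffle of two small “decks” of operations while preserving the order within each deck:

#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Enumerate every interleaving (riffle shuffle) of decks a and b that keeps
// the relative order of the cards within each deck.
void shuffles(const std::vector<std::string>& a, std::size_t i,
              const std::vector<std::string>& b, std::size_t j,
              std::vector<std::string>& current,
              std::vector<std::vector<std::string>>& out)
{
    if (i == a.size() && j == b.size()) {
        out.push_back(current);
        return;
    }
    if (i < a.size()) {                 // deal the next card from deck a
        current.push_back(a[i]);
        shuffles(a, i + 1, b, j, current, out);
        current.pop_back();
    }
    if (j < b.size()) {                 // deal the next card from deck b
        current.push_back(b[j]);
        shuffles(a, i, b, j + 1, current, out);
        current.pop_back();
    }
}

int main()
{
    std::vector<std::string> deckA = {"a1", "a2"};   // thread A's operations
    std::vector<std::string> deckB = {"b1", "b2"};   // thread B's operations
    std::vector<std::string> current;
    std::vector<std::vector<std::string>> all;
    shuffles(deckA, 0, deckB, 0, current, all);
    std::cout << all.size() << " interleavings\n";   // prints 6
    for (const auto& seq : all) {
        for (const auto& op : seq) std::cout << op << " ";
        std::cout << "\n";
    }
    return 0;
}

Each line of output is one possible total order; in every one of them, a1 comes before a2 and b1 comes before b2.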

The best explanation of sequential consistency that I’ve come across is from the 2010 ACM article by Adve & Boehm titled “Memory Models: A Case for Rethinking Parallel Languages and Hardware”:

A natural view of the execution of a multithreaded program operating on shared variables is as follows. Each step in the execution consists of choosing one of the threads to execute, and then performing the next step in that thread’s execution (as dictated by the thread’s program text, or program order). This process is repeated until the program as a whole terminates.

Effectively, the execution can be viewed as taking all the steps executed by each thread, and interleaving them in some way. Whenever an object (that is, variable, field, or array element) is accessed, the last value stored to the object by this interleaved sequence is retrieved. For example, consider Figure 1, which gives the core of Dekker’s mutual exclusion algorithm.

Figure 1. Core of Dekker’s algorithm.
Initially X = Y = 0. Can r1 == r2 == 0?

Red Thread       Blue Thread
X = 1;           Y = 1;
r1 = Y;          r2 = X;

The program can be executed by interleaving the steps from the two threads in many ways. Formally, each of these interleavings is a total order over all the steps performed by all the threads, consistent with the program order of each thread. Each access to a shared variable “sees” the last prior value stored to that variable in the interleaving.

Figure 2. Some executions for Figure 1.

Execution 1      Execution 2      Execution 3
X = 1;           Y = 1;           X = 1;
r1 = Y;          r2 = X;          Y = 1;
Y = 1;           X = 1;           r1 = Y;
r2 = X;          r1 = Y;          r2 = X;
// r1 == 0       // r1 == 1       // r1 == 1
// r2 == 1       // r2 == 0       // r2 == 1

Figure 2 gives three possible executions that together illustrate all possible final values of the non-shared variables r1 and r2. Although many other interleavings are also possible, it is not possible that both r1 and r2 are 0 at the end of an execution; any execution must start with the first statement of one of the two threads, and the variable assigned there will later be read as one. Following Lamport, an execution that can be understood as such an interleaving is referred to as sequentially consistent.
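To check this mechanically, here is a small brute-force sketch (my own illustration; the encoding of the four statements is shown in the comments) that runs every interleaving of the two threads from Figure 1 that respects program order and records which (r1, r2) outcomes are possible. r1 == r2 == 0 never appears.

#include <cstdio>

int main()
{
    // The six interleavings of {X = 1; r1 = Y;} and {Y = 1; r2 = X;}.
    // Steps are encoded as: 0: X = 1   1: r1 = Y   2: Y = 1   3: r2 = X
    const int orders[6][4] = {
        {0, 1, 2, 3}, {0, 2, 1, 3}, {0, 2, 3, 1},
        {2, 0, 1, 3}, {2, 0, 3, 1}, {2, 3, 0, 1}
    };
    bool seen[2][2] = {};                 // seen[r1][r2]
    for (const auto& order : orders) {
        int X = 0, Y = 0, r1 = 0, r2 = 0;
        for (int step : order) {
            if      (step == 0) X = 1;
            else if (step == 1) r1 = Y;
            else if (step == 2) Y = 1;
            else                r2 = X;
        }
        seen[r1][r2] = true;
    }
    for (int a = 0; a < 2; ++a)
        for (int b = 0; b < 2; ++b)
            std::printf("r1 == %d, r2 == %d : %s\n", a, b,
                        seen[a][b] ? "possible" : "impossible");
    return 0;
}

The three outcomes it reports as possible are exactly the three shown in Figure 2; (0, 0) is reported as impossible.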

It is very important to note the phrasing they used: they did not write “is”, but rather “can be understood as”. This matters because the memory model they describe above is in fact NOT sequential consistency, but strict consistency. That is deliberate, because sequential consistency is best understood in terms of strict consistency.

Here is my definition of sequential consistency:

An execution of a program is sequentially consistent if its output could have been produced by a strictly consistent execution of the same program.

Strict consistency is the most intuitive, natural, and easy to understand consistency model. It is actually what Adve and Boehm described above, and also here:

Each step in the execution consists of choosing one of the threads to execute next, and then performing the next step in its execution. This process is repeated until the program as a whole terminates.

Thus effectively the execution of the whole program can be viewed as taking all the steps executed by each thread, and interleaving them in some way.

Whenever an object (i.e. variable, field, or array element) is accessed, the last value stored by this interleaved sequence is retrieved as the value of that object.

Source: https://www.hpl.hp.com/techreports/2009/HPL-2009-259html.html#seqcon

Here’s another explanation of strict consistency:

The intuitive notion of memory consistency is the strict consistency model. In the strict model, any read to a memory location X returns the value stored by the most recent write operation to X. If we have a bunch of processors, with no caches, talking to memory through a bus then we will have strict consistency. The point here is the precise serialization of all memory accesses.

Here’s Sinha’s definition of strict consistency:

The strict consistency model is the strongest form of memory coherence, having the most stringent consistency requirements.

A shared-memory system is said to support the strict consistency model if the value returned by a read operation on a memory address is always the same as the value written by the most recent write operation to that address, irrespective of the locations of the processes performing the read and write operations. That is, all writes instantaneously become visible to all processes. (Sinha97)

What’s the difference between strict consistency and sequential consistency?

Strict consistency is concerned with what actually happens in the execution (i.e. it cares about the actual ordering of instructions), whereas sequential consistency cares only about the output of the execution, not about what actually happened inside it.

Sequential consistency is a slightly weaker model than strict consistency. It was defined by Lamport as “the result of any execution is the same as if the reads and writes occurred in some order, and the operations of each individual processor appear in this sequence in the order specified by its program.”

In essence, any ordering that could have been produced by a strict ordering regardless of processor speeds is valid under sequential consistency. The idea is that by expanding from the sets of reads and writes that actually happened to the sets that could have happened, we can reason more effectively about the program (since we can ask the far more useful question, “could the program have broken?”). We can reason about the program itself, with less interference from the details of the hardware on which it is running. It’s probably fair to say that if we have a computer system that really uses strict consistency, we’ll want to reason about it using sequential consistency.

Source: https://web.archive.org/web/20160312012103/https://www.cs.nmsu.edu/~pfeiffer/classes/573/notes/consistency.html

Thus, an execution of a program is sequentially consistent if and only if its output could have been produced by a strictly consistent execution of that program. See Leslie Lamport’s definition of sequential consistency in his seminal 1979 paper “How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs”:

[A multiprocessor is sequentially consistent if] The result of any execution [on the multiprocessor] is the same as if the [memory fetch and store] operations of all the processors [of that multiprocessor] were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.

Yes, this definition has some counter-intuitive implications. For example, it allows reads from the future:

Oddly enough, the precise definition, as given by Lamport, doesn’t even require that ordinary notions of causality be maintained; it’s possible to see the result of a write before the write itself takes place, as in:


P1:        W(x)1
-----------------------
P2:  R(x)1

This is valid because there is a different ordering which, in strict consistency, would yield P2 reading x as having a value of 1.

Definitions

An execution of a program is sequentially consistent if there exists a strictly consistent execution of that program which would produce the same output.

A multiprocessor is sequentially consistent if any possible execution on that multiprocessor is sequentially consistent. So for example, modern x86 CPUs are not sequentially consistent:

These days, you won’t easily find a modern multicore device which guarantees sequential consistency at the hardware level. Sequential consistency only really becomes interesting as a software memory model, when working in higher-level programming languages. In Java 5 and higher, you can declare shared variables as volatile. In C++11, you can use the default ordering constraint, memory_order_seq_cst, when performing operations on atomic library types. If you do those things, the toolchain will restrict compiler reordering and emit CPU-specific instructions which act as the appropriate memory barrier types. In this way, a sequentially consistent memory model can be “emulated” even on weakly-ordered multicore devices.
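For example, the Dekker core from Figure 1 can be written in C++11 with the ordering constraint spelled out. This is only a sketch of what the quote above describes; memory_order_seq_cst is already the default, so the explicit argument changes nothing, it just makes the intent visible.

#include <atomic>

std::atomic<int> X{0}, Y{0};
int r1, r2;

void redThread()
{
    X.store(1, std::memory_order_seq_cst);
    r1 = Y.load(std::memory_order_seq_cst);
}

void blueThread()
{
    Y.store(1, std::memory_order_seq_cst);
    r2 = X.load(std::memory_order_seq_cst);
}

With these constraints the toolchain emits whatever barriers the target CPU needs, and the outcome r1 == r2 == 0 remains impossible. If the operations were weakened to std::memory_order_relaxed, those barriers would no longer be required and r1 == r2 == 0 would become a permitted outcome on a weakly-ordered machine.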

When you see the sources quoted here using the terms “total order” and “interleaved sequence”, they are all talking about the same thing. It is also referred to as “memory order” or “global memory order” in this source: https://www.cis.upenn.edu/~devietti/classes/cis601-spring2016/sc_tso.pdf

The total order of operations is called memory order.

In SC, memory order respects each core’s program order, but other consistency models may permit memory orders that do not always respect the program orders.

The Misconception

The misconception that I wanted to address in this post comes from this question asked on cstheory.stackexchange.com: https://cstheory.stackexchange.com/questions/5937/difference-between-strict-consistency-and-sequential-consistency

But it makes sense from the idea that sequential consistency allows writes to propagate slowly and one thread may not have any idea as to what other processors are up to.

This question demonstrates the misunderstanding that can arise from the widespread assertion that writes in a sequentially consistent system can be “delayed”.

In a sequentially consistent system, writes can indeed be delayed. In fact, as shown above, a sequentially consistent execution can even violate causality and read values from the future. The output is the only thing that matters in order for an execution (or a system) to be called sequentially consistent.

As explained earlier, a sequentially consistent execution of Dekker’s algorithm cannot result in r1==r2==0, because, as the asker said, there is no total order that can give this result. The fact that reads and writes can be delayed in a sequentially consistent execution is irrelevant because only the output matters.

For an execution to be sequentially consistent, the actual ordering of reads and writes in the execution doesn’t matter; the only thing that matters is that the output of the execution could have been produced by a strictly consistent execution of the same program.

Further Notes

Found this interesting (https://preshing.com/20120612/an-introduction-to-lock-free-programming/#sequential-consistency):

A simple (but obviously impractical) way to achieve sequential consistency is to disable compiler optimizations and force all your threads to run on a single processor. A processor never sees its own memory effects out of order, even when threads are pre-empted and scheduled at arbitrary times.

Some programming languages offer sequential consistency even for optimized code running in a multiprocessor environment. In C++11, you can declare all shared variables as C++11 atomic types with default memory ordering constraints. In Java, you can mark all shared variables as volatile. Here’s the example from my previous post, rewritten in C++11 style:

#include <atomic>

std::atomic<int> X(0), Y(0);
int r1, r2;

void thread1()
{
    X.store(1);
    r1 = Y.load();
}

void thread2()
{
    Y.store(1);
    r2 = X.load();
}

Because the C++11 atomic types guarantee sequential consistency, the outcome r1 = r2 = 0 is impossible. To achieve this, the compiler outputs additional instructions behind the scenes – typically memory fences and/or RMW operations. Those additional instructions may make the implementation less efficient compared to one where the programmer has dealt with memory ordering directly.

Conclusion

In the end, I finally settled on this definition for Sequential Consistency:


An execution is Sequentially Consistent if there exists a Strictly Consistent execution of the same program that produces the same output.

Or, alternatively:

An execution is Sequentially Consistent if there exists an interleaved sequence (aka total order) of the program’s operations, in which the operations of each thread appear in the order specified by the source code, that produces the same output when executed sequentially by a uniprocessor.