Hoare-Style Specifications as Correctness Conditions for Non-linearizable Concurrent Objects

Ilya Sergey† Aleksandar Nanevski‡ Anindya Banerjee‡ Germán Andrés Delbianco‡
†University College London, UK ‡IMDEA Software Institute, Spain
i.sergey@ucl.ac.uk {aleks.nanevski, anindya.banerjee, german.delbianco}@imdea.org

Abstract
Designing efficient concurrent objects often requires abandoning the standard specification technique of linearizability in favor of more relaxed correctness conditions. However, the variety of alternatives makes it difficult to choose which condition to employ, and how to compose them when using objects specified by different conditions.

In this work, we propose a uniform alternative in the form of Hoare logic, which can explicitly capture—in the auxiliary state—the interference of environment threads. We demonstrate the expressiveness of our method by verifying a number of concurrent objects and their clients, which have so far been specified only by non-standard conditions of concurrency-aware linearizability, quiescent, and quantitative quiescent consistency. We report on the implementation of the ideas in an existing Coq-based tool, providing the first mechanized proofs for all the examples in the paper.

Categories and Subject Descriptors D.3.1 [Programming Languages]: Formal Definitions and Theory; F.3.1 [Logics and Meanings of Programs]: Specifying and Verifying and Reasoning about Programs

General Terms Theory, Verification

Keywords Concurrency, Hoare logic, linearizability, quiescent consistency, counting networks, mechanized proofs.

1. Introduction
Linearizability [25] remains the most well-known correctness condition for concurrent objects. It works by relating a concurrent object to a sequential behavior. More precisely, for each concurrent history of an object, linearizability requires that there exists a mapping to a sequential history, such that the ordering of matching call/return pairs is preserved either if they are performed by the same thread, or if they do not overlap. As such, linearizability has been used to establish the correctness of a variety of concurrent objects such as stacks, queues, sets, locks, and snapshots—all of which have intuitive sequential specs.

However, as argued by Shavit [42], efficient parallelization may require the development of concurrent objects that are inherently non-linearizable: in the presence of interference, such objects exhibit behavior that is not reducible to any sequential behavior via linearizability. To reason about such objects, a variety of novel conditions has been developed: concurrency-aware linearizability (CAL) [21], quiescent consistency (QC) [4, 10], quasi-linearizability (QL) [2], quantitative relaxation [23], quantitative quiescent consistency (QQC) [28], and local linearizability [20], to name a few. These conditions, formulated as relations on execution traces, specify a program’s behavior under concurrent interference. Some, such as QC, devote special treatment to the quiescent (i.e., interference-free) moments.

This proliferation of alternative conditions is problematic, as it makes all of them non-canonical. For any specific example, it is difficult to determine which condition to use, or if a new one should be developed. Worse, each new condition requires a development of its own dedicated program logic or verification tool. Furthermore, it is unclear how to combine the conditions/logics/tools, when different ones have been used for different subprograms. Finally, having criteria defined semantically, e.g., in terms of execution traces, makes it challenging to employ them directly for reasoning about clients of the corresponding data structures.

1.1 Concurrency Specification via Program Logics
In this paper, we propose an alternative, uniform, approach: a Hoare logic equipped with special subjective kind of auxiliary state [32] that makes it possible to name the amount of concurrent interference, and relate it to the program’s inputs and outputs directly, without reducing to sequential behavior. We use Fine-grained Concurrent Separation Logic (FCSL) [35], designed to reason about higher-order concurrent programs, and has been recently implemented as a verification tool on top of Coq [40], but whose ability to address non-linearizable programs has not been observed previously.
More specifically, subjective auxiliary state permits that within a spec of a thread, one can refer to the private state (real or auxiliary) of other interfering threads in a local manner. This private state can have arbitrary user-specified structure, as long as it satisfies the properties of a partial commutative monoid (PCM). A particularly important PCM is that of time-stamped histories, which has previously been applied to linearizable objects [41], where it replaced call/return histories. A (logically) time-stamped history consists of entries of the form \( t \mapsto a \), signifying that an atomic behavior \( a \) occurred at a time (or linearization point) \( t \). A subjective specification further distinguishes the histories of the thread and its interfering environment, and usefully relates both to the thread’s input and output.

Of course, Hoare-style reasoning about histories is a natural idea, exploited recently in several works [5, 16, 18, 22]. Here, however, we rely on the unifying power of PCMs, in combination with subjective specifications, to show that by generalizing histories in different ways—though all subject to PCM laws—we can capture the essence of several different conditions, such as CAL, QC and QQC in one-and-same off-the-shelf logical system and tool. More precisely, our histories need not merely identify a point at which an atomic behavior logically occurred, but can also include information about interference, or lack thereof. Moreover, we will use generic FCSL constructs for delimiting the scope of auxiliary state, to reason about quiescent moments.

1.2 Contributions and Outline

The ability to use FCSL for specifying and verifying linearizable objects (e.g., fine-grained stacks and atomic snapshots) has been recognized before [41]. In contrast, the main conceptual contribution of this work is an observation that the very same abstractions provided by FCSL are sufficient to ascribe non-trivial non-linearizable objects with specs that can hide object implementation details, but are sufficiently strong to be used in proofs of concurrent client programs, as we demonstrate in Section 2. Specifically, we recognize that auxiliary histories can be subject of user-defined invariants beyond mere adherence to sequential executions (e.g., be concurrency-aware [21]), and can be used to capture intermediate interference, allowing for quantitative reasoning about outcomes of concurrent executions (e.g., in the spirit of QQC [28]). These observations, surprisingly, enabled reasoning about non-linearizable data structures and their clients, which were never previously approached from the perspective of program logics or mechanically verified.

In this unified approach based on program logic, it seems inherently impossible (and contrary to the whole idea) to classify Hoare triples as corresponding to this or that correctness condition. Thus, instead of providing theorems that relate Hoare triples to existing conditions, we justify the adequacy of our approach by proof-of-concept verifications of concurrent objects and their clients.

Hence, as key technical contributions, we present subjective specs and the first mechanized proofs (in Coq) of (1) an elimination-based exchanger [39] (Section 3), previously specified using CAL, and (2) a simple counting network [4] (Section 6) that inspired definitions of QC and QQC. We then employ these specs to verify client programs (Sections 5 and 7). We discuss alternative design choices for specs and further applications of our verification approach in Section 8, and summarize our mechanization experience in Section 9. Section 10 compares to related work and Section 11 concludes.

2. Main Ideas and Overview

We begin by outlining the high-level intuition of our specification approach, and summarize the main formalization steps. As the first motivating example, we consider the concurrent exchanger structure from java.util.concurrent [14, 39]. The main purpose of the exchanger is to allow two threads to efficiently swap values in a non-blocking way via a globally shared channel. The exchange might fail, if a thread trying to swap a value does not encounter a peer to do that in a predefined period of time.

For instance, the result of the two-thread program

\[
T_1 \quad T_2
\]

\[
\begin{align*}
T_1 & := \text{exchange} \ 1 \ |\ | \\
T_2 & := \text{exchange} \ 2
\end{align*}
\]

(1)

can be described by the following assertion:

\[
T_1 = T_2 = \text{None} \lor T_1 = \text{Some} \ 2 \land T_2 = \text{Some} \ 1
\]

(2)

That is, \( T_1 \) and \( T_2 \) store the results of the execution of subthreads \( T_1 \) and \( T_2 \) correspondingly, and both threads either succeed, exchanging the values, or fail. The ascribed outcome is only correct under the assumption that no other threads besides \( T_1 \) and \( T_2 \) attempt to use the very same exchange channel concurrently.

Why is the exchanger not a linearizable data structure? To see that, recall that linearizability reduces the concurrent behavior to a sequential one [25]. If the exchanger were linearizable, all possible outcomes of the program (1) would be captured by the following two sequential programs, modelling selected interleavings of the threads \( T_1 \) and \( T_2 \):

\[
\begin{align*}
T_1 & := \text{exchange} \ 1; \ T_2 := \text{exchange} \ 2; \\
& \quad \text{and} \\
T_2 & := \text{exchange} \ 2; \ T_1 := \text{exchange} \ 1;
\end{align*}
\]

(3)

However, both programs (3) will always result in \( T_1 = T_2 = \text{None} \), as, in order to succeed, a call to the exchanger needs another thread, running concurrently, with which to

\[1\]

We use ML-style option data type with two constructors, Some and None to indicate success and failure of an operation, correspondingly.
exchange values. This observation demonstrates that linearizability with respect to a sequential specification is too weak a correctness criterion to capture the exchanger’s behavior observed in a truly concurrent context [21]: an adequate notion of correctness for exchange must mention the effect of interference.

Consider another structure, whose concurrent behavior cannot be related to sequential executions via linearizability:

\[
\text{flip2}(x : \text{ptr nat}) : \text{nat} = \\
a := \text{flip} \; x; \\
b := \text{flip} \; x; \\
\text{return} \; a + b \]  \tag{4}

The procedure \text{flip2} takes a pointer \(x\), whose value is either 0 or 1 and changes its value to the opposite, twice, via the \textit{atomic} operation \text{flip}, returning the sum of the previous values. Assuming that \(x\) is being modified only by the calls to \text{flip2}, what is the outcome \(r\) of the following program?

\[
r := \text{flip2} \; x; \]  \tag{5}

The answer depends on the presence or absence of interfering threads that invoke \text{flip2} concurrently with the program (5). Indeed, in the absence of interference, \text{flip2} will flip the value of \(x\) twice, returning the sum of 0 and 1, i.e., 1. However, in the presence of other threads calling \text{flip2} in parallel, the value of \(r\) may vary from 0 to 2.

What are the intrinsic properties of \text{flip2} to be specified? Since the effect of \text{flip2} is distributed between two internal calls to \text{flip}, both subject to interference, the specification should capture that the variation in \text{flip2}’s result is subject to interference. Furthermore, the specification should be expressive enough to allow reasoning under bounded interference. For example, in the absence of interference from any other threads besides \(T_1\) and \(T_2\) that invoke \text{flip2} concurrently, the program below will always result in \(r = 2:\)

\[
\begin{array}{c}
T_1 \\
r_1 := \text{flip2} \; x \\
r := r_1 + r_2 \\
\end{array} \quad \begin{array}{c}
T_2 \\
r_2 := \text{flip2} \; x \\
\end{array} \]  \tag{6}

2.1 Abstract Histories of Non-linearizable Objects

Execution histories capture the traces of a concurrent object’s interaction with various threads, and are a central notion for specifying concurrent data structures. For example, linearizability specifies the behavior of an object by mapping the object’s global history of method invocations and returns to a sequence of operations that can be observed when the object is used sequentially [25]. However, as we have shown, neither exchange nor \text{flip2} can be understood in terms of sequential executions.

We propose to specify the behavior and outcome of such objects in terms of \textit{abstract concurrent histories}, as follows. Instead of tracking method invocations and returns, our histories track the “interesting” changes to the object’s state. What is “interesting” is determined by the user, depending on the intended clients of the concurrent object. Moreover, our specifications are subjective (i.e., thread-relative) in the following sense. Our histories do not identify threads by their thread IDs. Instead, each method is specified by relating two different history variables: the history of the invoking thread (aka. \textit{self}-history), and the history of its concurrent environment (aka. \textit{other}-history). In each thread, these two variables have different values.

For example, in the case of the exchanger, the interesting changes to the object’s state are the exchanges themselves. Thus, the global history \(\chi_E\) tracks the successful exchanges in the form of pairs of values, as shown in below:

\[
\chi_E = [..., (1, 2), (2, 1), (4, 5), (5, 4), (9, 8), (8, 9), ...] \\
\begin{array}{cccccccc}
\text{exchange ok} & \text{exchange ok} & \text{exchange ok} \\
T_1 & T_2 & T_2 & T_3 & T_2 & T_1 \\
\end{array}
\]

The diagram presents the history from the viewpoint of thread \(T_1\). The exchanges made by \(T_1\) are colored white, determining the \textit{self}-history of \(T_1\). The gray parts are the exchanges made by the other threads (e.g., \(T_2, T_3, \text{etc.}\)), and determine the \textit{other}-history for \(T_1\).

The subjective division between \textit{self} and \textit{other} histories emphasizes that a successful exchange is actually represented by \textit{two} pairs of numbers \((x, y)\) and \((y, x)\), that appear consecutively in \(\chi_E\), and encode the two ends of an exchange from the viewpoint of the exchanging threads. We call such pairs \textit{twins}. As an illustration, the white entry \((2, 1)\) from the self-history of \(T_1\), is matched by a twin gray entry \((1, 2)\) from the other-history of \(T_1\), encoding that \(T_1\) exchanging 2 for 1 corresponds to \(T_1\)’s environment exchanging 1 for 2.

The subjective division is important, because it will enable us to specify threads \textit{locally}, i.e., without referring to the code of other threads. For example, in the case of program (1), we will specify that \(T_1\), in the case of a successful exchange, adds a pair \((1, r_1)\) to its self history, where \texttt{Some } \(r_1\) is \(T_1\)’s return value. Similarly, \(T_2\) adds a pair \((2, r_2)\) to its self history, where \texttt{Some } \(r_2\) is \(T_2\)’s return value.

On the other hand, it is an important invariant of the exchanger object—but not of any individual thread—that twin entries are symmetric pairs encoding different viewpoints of the one-and-the-same exchange. This object invariant will allow us to reason about clients containing combinations of exchanging threads. Taking program (1) as an example again, the object invariant will imply of the individual specifications of \(T_1\) and \(T_2\), that \(r_1\) must equal 2, and \(r_2\) must equal 1, if no threads interfered with \(T_1\) and \(T_2\).

We can similarly employ abstract histories to specify \text{flip2}. One way to do it is to notice that the value of the shared counter \(x\) will be changing as 0, 1, 0, 1, ..., and exactly \textit{two} of these values will be contributed by each call to
specified (we postpone the full presentation until Section 3):

\[ \chi_E = [\_\_\_\_] \]

\[
\text{exchange } v \\
\begin{cases} 
\chi_E = [\_\_\_\_] & \text{if res is Some } w \text{ then} \\
\chi_E = [\_\_\_\_, (v, w), \_\_\_] & \text{else } \chi_E = [\_\_\_] 
\end{cases}
\] (7)

The ellipsis (\_) stands for an existentially-quantified chunk of the history. The spec (7) says that a successful exchange adds an entry \((v, w)\) to the self-history (hence, the entry is white). In the case of failed exchange, no entry is added.

In the complete and formal specification in Section 3, we will have to add a timing aspect, and say that the new entry appears after all the history entries from the precondition. We will also have to say that no entries are removed from the other history (i.e., the exchanger cannot erase the behavior of other threads), but we elide those details here.

The spec of \texttt{flip2} is defined with respect to history \(\chi_F\):

\[ \chi_F = [\_\_\_] \]

\[
\text{flip2 } x \\
\begin{cases} 
\exists a, b, \chi_F = [\_\_\_, a, \_\_\_, b, \_\_\_] & \text{res} = \overset{\ddash}{a} + \overset{\ddash}{b} 
\end{cases}
\] (8)

It says that the return value \(\text{res}\) is equal to the sum of binary complements \(\overset{\ddash}{a} + \overset{\ddash}{b}\) for the thread’s two separate self-contributions to the history. Due to the effects of the interference, the history entries \(a\) and \(b\) may be separated in the overall history by the contributions of the environment, as indicated by \(_\_\_\_\_\_) between them.

To illustrate, consider the history \(\chi_F\) of a successful exchange:

\[ \chi_F = [\_\_\_, (\_\_\_, \overset{\ddash}{a}, \overset{\ddash}{b}), \_\_\_] \]

The leftmost entry \(\_\_\_\_\_\_) is the initial state, and \((\_\_\_, \overset{\ddash}{a}, \overset{\ddash}{b})\) is an entry added by the exchange. The rightmost entry \(\_\_\_\_]\) is the current state.

2.3 Using Subjective Specifications in the Client Code

The immediate benefit of using Hoare logic is that one can easily reason about programs whose components use different object invariants, whereas there is not much one can say about programs whose components require different correctness conditions. For example, Figure 1 shows a proof sketch for a toy program that uses both \texttt{exchange} and \texttt{flip2}. As each of these methods requires its own auxiliary history variable (\(\chi_E\) for the exchanger, and \(\chi_F\) for \texttt{flip2}), the combined program uses both, but the proof simply ignores those histories that are not relevant for any specific method (i.e., we can “frame” the specs (7) and (8) wrt. the histories of the objects that they do not depend upon).

The program first forks two instances of \texttt{flip2}, storing the results in \(r_1\) and \(r_2\) (line 4). Next, two new threads are forked, trying to exchange \(r_1\) and \(r_2\) (line 8). The conditional (line 12) checks if the exchange was successful, and if so, assigns the sum of exchanged values to \(t\) (line 14); otherwise \(t\) gets assigned 2. We want to prove via the specs (7) and (8), that in the absence of external interference on the \texttt{flip2}'s pointer \(x\) and the exchanger, the outcome is always \(t = 2\).
framing out) $\chi_\varepsilon$, as this history variable does not apply to them $\text{flip2}$. Upon finishing, the postconditions of $\text{flip2}$ in line 4 capture the relationship between the contributions to the history $\chi_F$ and the results $r_1$ and $r_2$ of the two calls.

Both postconditions in line 4 talk about the very same history $\chi_F$, just using different colors to express that the contributions of the two threads are disjoint: $a$ and $b$ being white in the left thread, implies that $a$ and $b$ are history entries added by the left thread. Thus, they must be gray in the right thread, as they cannot overlap with the entries contributed by the right thread. The right thread cannot explicitly specify in its postcondition that $a$ and $b$ are gray, since the right thread is unaware of the specific contributions of the left thread.

Dually, $c$ and $d$ being white in the right thread in line 4, implies that they must be gray on the left. Thus, overall, in line 5, we know that $\chi_F$ contains all four entries in some permutation, and in the absence of interference, it contains no other entries but these four. From the object invariant on $\chi_F$ it then follows that the entries are some permutation of $\{1, 0, 1, 0\}$, which makes their sum total $r_1 + r_2 = 2$.

Similarly, we ignore $\chi_F$ while reasoning about calls to $\text{exchange}$ via $\text{spec}$ (7) (lines 7 and 9). As before, we know that the entry $(r_1, v_1)$, which is white in the left postcondition in line 9, must be gray on the right, and dually for $(r_2, v_2)$. In total, the history $\chi_\varepsilon$ must contain both of the entries, but, by the invariant, it must also contain their twins. In the absence of any other interference, it therefore must be that $(r_1, v_1)$ is a twin for $(r_2, v_2)$, i.e., $r_1 = v_2$ and $r_2 = v_1$, as line 11 expresses for the case of a successful exchange. The rest of the proof is then trivial.

The sketch relied on several important aspects of program verification in FCSL: (i) the invariants constraining $\chi_F$ and $\chi_\varepsilon$ were preserved by the methods, (ii) upon joining the threads, we can rely on the disjointness of history contributions of the two threads, in order to combine the thread-local views into a specification of the parent thread, and, (iii) we could guarantee the absence of the external interference.

The aspect (i) is a significant component of what it means to specify and verify a concurrent object. As we will show in Sections 3 and 6, defining a sufficiently strong object invariant, and then proving that it is indeed an invariant, i.e., that it is preserved by the implementation of the program, is a major part of the verification challenge. We will explain FCSL rules for parallel composition and hiding in Section 4, justifying the reasoning principles (ii) and (iii).

2.4 Specifying Non-linearizable Objects in Three Steps

As shown by Sections 2.1–2.3, our method for specifying and verifying non-linearizable concurrent objects and their clients boils down to the following three systematic steps.

**Step 1 (§2.1):** Define object-specific auxiliary state and its invariants. The auxiliary state will typically include a specific notion of abstract histories, recording whatever behavior is perceived as essential by the implementor of the object. To account for the variety of object-specific correctness conditions, we do not fix a specific shape for the histories. We do not restrict them to always record pairs of numbers (as in the exchanger), or record single numbers (as in $\text{flip2}$). The only requirement that we impose on auxiliary state in general, and on histories in particular, is that the chosen type of auxiliary state is an instance of the PCM algebraic structure [41], thus providing an abstract, and user-defined, notion of disjointness between self/other contributions.

**Step 2 (§2.2):** Formulate Hoare-style specifications, parametrized by interference, and verify them. This step provides a suitable “interface” for the methods of the concur-
rent object, which the clients use to reason, without knowing the details of the object and method implementations. Naturally, the interface can refer to the auxiliary state and histories defined in the previous step. When dealing with non-linearizable objects in FCSL, it is customary to formulate the spec in a subjective way (i.e., using self/other, dually white/gray division between history entries) so that the specification has a way to refer to the effects of the interfering calls to the same object. The amount of interference can be later instantiated with more specific information, once we know more about the context of concurrent threads in which the specified program is being run.

**Step 3** ([2.3]): Restrict the interference when using object specs for verification of clients. Eventually, thread-local knowledge about effects of individual clients of one and the same object, should be combined into a cumulative knowledge about the effect of the composition. To measure this effect, one usually considers the object in a quiescent situation, FCSL provides a program-level constructor for hiding. In particular, hide e executes e, but statically prevents other threads from interfering with e, by making e’s auxiliary history invisible. Program e’s other contribution is fixed to be empty, thus modeling quiescence.

### 3. Verifying the Exchanger Implementation

We now proceed with more rigorous development of the invariants and specification for the exchanger data structure, necessary to verify its real-world implementation [14], which was so far elided from the overview of the approach.

The exchanger implementation is presented in ML-style pseudo-code in Figure 2. It takes a value \( v : A \) and creates an offer from it (line 2). An offer is a pointer \( p \) to two consecutive locations in the heap.\(^3\) The first location stores \( v \), and the second is a “hole” which the interfering thread tries to fill with a matching value. The hole is drawn from the type \( \text{hole} = U \mid R \mid M \ w \). Constructor \( U \) signals that the offer is unmatched; \( R \) that the exchanger retired (i.e., withdrew) the offer, and does not expect any matches on it; and \( M \ w \) that the offer has been matched with a value \( w \).

The global pointer \( g \) stores the latest offer proposed for matching. The exchanger proposes \( p \) for matching by making \( g \) point to \( p \) via the atomic compare-and-set instruction \( \text{CAS} \) (line 3). We assume that \( \text{CAS} \) returns the value read, which can be used to determine if it failed or succeeded. If \( \text{CAS} \) succeeds, exchanger waits a bit, then checks if the offer has been matched by some \( w \) (lines 6, 7). If so, Some \( w \) is returned (line 7). Otherwise, the offer is retired by storing \( R \) into its hole (line 6). Retired offers remain allocated (thus, exchanger has a memory leak) in order to avoid the ABA problem, as usual in many concurrent structures [24, 46]. If the exchanger fails to link \( p \) into \( g \) in line 3, it deallocates the offer \( p \) (line 10), and instead tries to match the offer \( cur \) that is current in \( g \). If no offer is current, perhaps because another thread already matched the offer that made the \( \text{CAS} \) in line 3 fail, the exchanger returns None (line 12). Otherwise, the exchanger tries to make a match, by changing the hole of \( cur \) into \( M \ w \) (line 14). If successful (line 16), it reads the value \( w \) stored in \( cur \) that was initially proposed for matching, and returns it. In any case, it unlinks \( cur \) from \( g \) (line 15) to make space for other offers.

### 3.1 Step 1: Defining Auxiliary State and Invariants

To formally specify the exchanger, we decorate it with auxiliary state. In addition to histories, necessary for specifying the observable behavior, the auxiliary state is used for capturing the coherence constraints of the actual implementation, e.g., with respect to memory allocation and management of outstanding offers. The state is subjective as described in Section 2: it keeps thread-local auxiliary variables that name the thread’s private state (self), but also the private state of all other threads combined (other).

The subjective state of the exchanger for each thread in this example consists of three groups of two components: (1) thread-private heap \( h_S \) of the thread, and of the environment \( h_O \), (2) a set of outstanding offers \( \pi_S \) created by the thread, and by the environment \( \pi_O \), and (3) a time-stamped history of values \( \chi_S \) that the thread exchanged so far, and dually \( \chi_O \) for the environment. In Section 2, we illustrated subjectivity by means of histories, white we used white and gray entries, respectively, to describe what here we name \( \chi_S \) and \( \chi_O \), respectively. Now we see that the dichotomy extends beyond histories, and this example requires the dichotomy applied to heaps, and to sets of offers as well. In addition to self/other components of heaps, permissions and histories, we also need shared (aka. joint) state consisting of two components: a heap \( h_J \) of storing the offers that have been made, and

---

\(^3\) In our mechanization, we simplify a bit by making \( p \) point to a pair instead.
a map \( m_j \) of offers that have been matched, but not yet collected by the thread that made them. Heaps, sets and histories are all PCMs under the operation of disjoint union, with empty heap/set/history as a unit. We overload the notation and write \( x \mapsto v \) for a singleton heap with a pointer \( x \) storing value \( v \), and \( t \mapsto a \) for a singleton history. Similarly, we apply disjoint union \( \cup \) and subset \( \subseteq \), to all three types uniformly.

We next describe how the exchanger manipulates the above variables. First, \( h_j \) is a heap that serves as the “staging” area for the offers. It includes the global pointer \( g \). Whenever a thread wants to make an offer, it allocates a pointer \( p \) in \( h_s \), and then tries to move \( p \) from \( h_s \) into \( h_j \), simultaneously linking \( g \) to \( p \), via the CAS in line 3 of Figure 2.

Second, \( \pi_S \) and \( \pi_O \) are sets of offers (hence, sets of pointers) that determine offer ownership. A thread that has the offer \( p \in \pi_S \) is the one that created it, and thus has the sole right to retire \( p \), or to collect the value that \( p \) was matched with. Upon collection or retirement, \( p \) is removed from \( \pi_S \).

Third, \( \chi_S \) and \( \chi_O \) are exchanger-specific histories, each mapping a time-stamp (isomorphic to nats), to a pair of exchanged values. A singleton history \( t \mapsto (v, w) \) symbolizes that a thread having this singleton as a subcomponent of \( \chi_S \), has exchanged \( v \) for \( w \) at time \( t \). As we describe below, the most important invariant of the exchanger is that each such singleton is matched by a “symmetric” one to capture that a thread having this singleton as a subcomponent of \( \chi_S \), has exchanged \( v \) for \( w \) at time \( t \). Classical linearizability cannot express this simultaneous behavior, making the exchanger non-linearizable.

Fourth, \( m_j \) is a map storing the offers that were matched, but not yet acknowledged and collected. Thus, \( \text{dom} \ m_j = \pi_S \cup \pi_O \). A singleton entry in \( m_j \) has the form \( p \mapsto (t, v, w) \) and denotes that offer \( p \), initially storing \( v \), was matched at time \( t \) with \( w \). A singleton entry is entered into \( m_j \) when a thread on the one end of matching, matches \( v \) with \( w \). Such a thread also places the twin entry \( \vec{t} \mapsto (w, v) \), with inverted order of \( v \) and \( w \), into its own private history \( \chi_S \), where:

\[
\vec{t} = \begin{cases} 
  t + 1 & \text{if } t \text{ is odd} \\
  t - 1 & \text{if } t > 0 \text{ and } t \text{ is even}
\end{cases}
\]

For technical reasons, \( 0 \) is not a valid time-stamp, and has no distinct twin. The pending entry for \( p \) resides in \( m_j \) until the thread that created the offer \( p \) decides to “collect” it. It removes \( p \) from \( m_j \), and simultaneously adds the entry \( t \mapsto (v, w) \) into its own \( \chi_S \), thereby logically completing the exchange. Since twin time-stamps are consecutive integers, a history cannot contain entries between twins.

Thus, two twin entries in the combined history including \( \chi_S \), \( \chi_O \) and \( m_j \), jointly represent a single exchange, as if it occurred atomically. For example, the entries \( 1 \mapsto (v_1, w_1) \) and \( 2 \mapsto (w_1, v_1) \) will encode the end-points of the first exchange; the entries \( 3 \mapsto (v_2, w_2) \) and \( 4 \mapsto (w_2, v_2) \) will encode the end-points of the second exchange, etc., the entries at timestamps \( t \) and \( t + 1 \), for odd \( t \), will encode the end-points of the \( \frac{t+1}{2} \)-th exchange. Concurrency-aware histories [21] capture this by making the ends of an exchange occur as simultaneous events. We capture it via twin timestamps. More formally, consider \( \chi = \chi_S \cup \chi_O \cup \| m_j \| \). Then, the exchanger’s main invariant is that \( \chi \) always contains matching twin entries:

\[
t \mapsto (v, w) \subseteq \chi \iff \vec{t} \mapsto (w, v) \subseteq \chi
\]

Here \( \| m_j \| \) is the collection of all the entries in \( m_j \). That is, \( \| \emptyset \| = \emptyset \), and \( \| p \mapsto (t, v, w) \cup \| m_j \| \| = t \mapsto (v, w) \cup \| m_j \| \| \).

In our implementation, we prove that atomic actions, such as CAS, preserve the invariant, therefore, the whole program, being just a composition of actions, doesn’t violate it.

### 3.2 Step 2: Hoare-style Specification of Exchanger

We can now give the desired formal Hoare-style spec.

\[
\{ h_s = \emptyset, \pi_S = \emptyset, \chi_S = \emptyset, \eta \subseteq \chi_O \cup \| m_j \| \} \\
\text{exchange } v \\
\{ h_s = \emptyset, \pi_S = \emptyset, \eta \subseteq \chi_O \cup \| m_j \|, \text{if res is Some } w \text{ then} \\
\exists t. \chi_S = t \mapsto (v, w), \text{last}(\eta) < t, \text{else } \chi_S = \emptyset \}
\]

The precondition says that the exchanger starts with the empty private heap \( h_s \), set of offers \( \pi_S \) and history \( \chi_S \); hence by framing, it can start with any value for these components.\(^4\) The logical variable \( \eta \) names the initial history of all threads, \( \chi_O \cup \| m_j \| \), which may grow during the call, thus, we use subset instead of equality to make the precondition stable under other threads adding entries to \( \chi_O \) or \( m_j \).

In the postcondition, the self heap \( h_s \) and the set of offers \( \pi_S \) didn’t change. Hence, if exchange made an offer during its execution, it also collected or retired it by the end. The history \( \eta \) is still a subset of the ending value for \( \chi_O \cup \| m_j \| \), signifying that the environment history only grows by interference. We will make a crucial use of this part of the spec when verifying a client of the exchanger in Section 5.

If the exchange fails (i.e., \( \text{res} = \text{None} \)), then \( \chi_S \) remains empty. If it succeeds (either in line 7 or line 16 in Figure 2), i.e., if the result \( \text{res} = \text{Some } w \), then there exists a time-stamp \( t \), such that self-history \( \chi_S \) contains the entry \( t \mapsto (v, w) \), symbolizing that \( v \) and \( w \) were exchanged at time \( t \).

Importantly, the postcondition implies, by invariant (9), that in the success case, the twin entry \( \vec{t} \mapsto (w, v) \) must belong to \( \chi_O \cup \| m_j \| \), i.e., another thread matched the exchange (this was made explicit by the spec (7)). Moreover, the exchange occurred after the call to exchange: whichever \( \eta \) we chose in the pre-state, both \( t \) and \( \vec{t} \) are larger than the last time-stamp in \( \eta \).

The proof outline for the exchanger is available in Appendix A. In Section 5, after introducing necessary FCSL background, we will illustrate Step 3 of our method and show how to employ the subjective Hoare spec (10) for modular verification of a concurrent client.

\(^4\)Framing in FCSL is similar to that of separation logic, allowing extensions to the initial state that remain invariant by program execution. In FCSL, however, framing applies to any PCM-valued state component (e.g., heaps, histories, etc.), whereas in separation logic, it applies just to heaps.
4. Background on FCSL

In order to formally present Step 3 of our method, we first need to introduce some important parts of FCSL.

A Hoare specification in FCSL has the form \( \{P\} e \{Q\} @ R \). \( P \) and \( Q \) are pre- and postcondition for partial correctness, and \( R \) defines the shared resource on which \( e \) operates. The latter is a state transition system describing the invariants of the state (real and auxiliary) and atomic operations that can be invoked by the threads that simultaneously operate on that state. We elide the transition system aspect of resources here, and refer to [35] for detailed treatment.

An important secondary role of a resource is to declare the variables that \( P \) and \( Q \) may scope over. For example, in the case of exchanger, we use the variables \( h_S, \pi_S, \chi_S, h_Q, \pi_O, \chi_O, \) and \( h_m, m_J \). The mechanism by which the variables are declared is as follows. Underneath, a resource comes with only three variables: \( a_S, a_O \) and \( a_J \) standing for abstract self state, other state, and shared (joint) state, but the user can pick their types depending on the application. In the case of exchanger, \( a_S \) and \( a_O \) are triples containing a heap, an offer-set and a history. The variables we used in Section 2 are projections out of such triples: \( a_S = (h_S, \pi_S, \chi_S) \), and \( a_O = (h_Q, \pi_O, \chi_O) \). Similarly, \( a_J = (h_m, m_J) \).

It is essential that \( a_S \) and \( a_O \) have a common type exhibiting the algebraic structure of a PCM, under a partial binary operation \( \sqcup \). PCMs give a way, generic in \( R \), to define the inference rule for parallel composition.

\[
\{P\} e_1 \{Q_1\} @ R \quad \{P\} e_2 \{Q_2\} @ R
\]
\[
\{P \oplus P\} e_1 \parallel e_2 \{[\text{res.1}/\text{res}Q_1 \oplus [\text{res.2}/\text{res}Q_2] \} \quad @ R
\]

Here, \( \oplus \) is defined as follows.

\[
(P_1 \oplus P_2)(a_S, a_J, a_O) \iff \exists x_1, x_2, a_S = x_1 \sqcup x_2, \\
(P_1(x_1, a_J, x_2 \sqcup a_O), P_2(x_2, a_J, x_1 \sqcup a_O)
\]

Thereby, when a parent thread forks \( e_1 \) and \( e_2 \), then \( e_1 \) becomes part of the environment for \( e_2 \), and vice-versa. This is so because the self component \( a_S \) of the parent is split into \( x_1 \) and \( x_2 \); \( x_1 \) becomes the self part of \( e_1 \), but \( x_2 \) is added to the other part \( a_O \) of \( e_1 \) (and symmetrically for \( e_2 \)).

To reason about quiescent moments, we use one more constructor of FCSL: hiding. The program hide \( e \) operationally executes \( e \), but logically installs a resource within the scope of \( e \). In the case of the exchanger, hide \( e \) starts only with private heaps \( h_S \) and \( h_Q \), then takes a chunk of heap out of \( h_S \) and "installs" an exchanger in this heap, allowing the threads in \( e \) to exchange values. hide \( e \) is quiescent wrt. exchanger, as the typechecker will prevent composing hide \( e \) with threads that want to exchange values with \( e \).

The auxiliaries \( \pi_S, \chi_S, \pi_O, \chi_O, \) and \( h_m, m_J \), belonging to the exchanger (denoted as resource \( E \)) are visible within hide, but outside, only \( h_S \) persists (denoted as a resource \( P \) for private state). We elide the general hiding rule [35], and just show the special case for the exchanger.

\[
\{P\} e \{Q\} @ E
\]
\[
\{h_S = \Phi_1(h_J), \Phi_2(h_J)\} \quad \text{hide} \ e \{\exists \Phi_2, h_S = \Phi_2(h_J), \Phi_2(Q)\} @ P
\]

Read bottom-up, the rule says that we can install the exchanger \( E \) in the scope of a thread that works with \( P \), but then we need substitutions \( \Phi_1 \) and \( \Phi_2 \), to map variables of \( E \) (\( h_S, \pi_S, \chi_S, \) etc) to values expressed with variables from \( P \) (\( h_S \) and \( h_Q \)). \( \Phi_1 \) is an initial such substitution (user provided), and the rule guarantees the existence of an ending substitution \( \Phi_2 \). The substitutions have to satisfy a number of side conditions, which we elide here for brevity. The most important one is that other variable \( a_O = (h_Q, \pi_O, \chi_O) \) is fixed to be the PCM unit (i.e., a triple of empty sets). Fixing \( a_O \) to unit captures that hide protects \( e \) from interference.

At the beginning of hide \( e \), the private heap equals the value that \( \Phi_1 \) gives to \( h_J \) (\( h_S = \Phi_1(h_J) \)). In other words, the hide rule takes the private heap of a thread, and makes it shared, i.e., gives to it the \( h_J \) component of \( E \). Upon finishing, hide \( e \) makes \( h_J \) private again.

In the subsequent text we elide the resources from specs.

5. Verifying Exchanger’s Client

We next illustrate how the formally specified exchanger from Section 3 can be used by real-world client programs, and how the other component, asserted by the spec to satisfy \( \forall \eta \subseteq \chi_O \cup \{m_J\} \), is crucial for their verification. We emphasize that the proof of the client does not see the implementation details, which are hidden by the spec (10).

While simple, our client is realistic, and has been used in java.util.concurrent [14]. It is defined as follows. First, the exchanger loops until it exchanges the value.

\[
\text{exchange}'(v : A) : A = \{
\text{if } w \text{ is Some } w \text{ then return } w \text{ else exchange}' v\}
\]

Next, exchange' is iterated to exchange a sequence in order, appending the received matches to an accumulator.

\[
\text{ex_seq}(v_S, ac : \text{seq} A) : \text{seq} A = \{
\text{if } v_S \text{ is Some ac' then}
\text{w }\leftarrow \text{exchange}' v_S; \text{ex_seq}(v_S', \text{snoc ac w})
\text{else return ac}\}
\]

Our goal is to prove, via (10), that the parallel composition

\[
e = \text{ex_seq}(v_S1, \text{nil}) \parallel \text{ex_seq}(v_S2, \text{nil})
\]

exchanges \( v_S1 \) and \( v_S2 \), i.e., returns the pair \( (v_S2, v_S1) \). This holds only under the assumption that \( e \) runs without interference (i.e., quiescently), so that the two threads in \( e \) have no choice but to exchange the values between themselves.

We make the quiescence assumption explicit using the FCSL hide constructor, as described in Section 4. Thus, we
establish the following Hoare triple:

\[
\{ h_5 = g \rightarrow \text{null} \} \quad \text{hide} \quad e \quad \{ g \in \text{dom } h_5, \text{res} = (v_{s2}, v_{s1}) \} \tag{13}
\]

It says that we start with a heap where \( g \) stores null, and end with a possibly larger heap (due to the memory leak), but with the result \((v_{s2}, v_{s1})\). The auxiliaries \( \pi_5, \pi_0, \eta, \eta_0, h_1, m_j \) are visible inside hide, but outside, only \( h_5 \) persists.

**Explaining the verification.** We illustrate the verification by listing the specs of selected subprograms. First, the spec of \( \text{exchange}' \) easily derives from (10) by removing the now-impossible failing case.

\[
\{ h_5 = \emptyset, \pi_5 = \emptyset, \chi_5 = \emptyset, \eta \subseteq \chi_0 \cup \{ m_j \} \} \quad \text{exchange}' \quad v \quad \{ h_5 = \emptyset, \pi_5 = \emptyset, \eta \subseteq \chi_0 \cup \{ m_j \}, \exists t. \chi_5 = t \rightarrow (v, \text{res}), \text{last}(\eta) < t, t \}
\]

Next, \( \text{ex}_\text{seq} \) has the following spec:

\[
\{ h_5 = \emptyset, \pi_5 = \emptyset, \chi_5 = \emptyset \} \quad \text{ex}_\text{seq} \quad (v, \text{nil}) \quad \{ \text{grows-notwins} \ t_1, \text{zip} \ t_2 \text{ vs res}, \}
\]

Here, \( t_1 \) is a list of time-stamps, and \( \text{zip} \ t_2 \text{ vs ws} \) joins up the singleton histories \( t \rightarrow (v, w) \), for each \( t, v, w \) drawn, in order, from the lists \( t_1, v, w \). The spec says that at the time-stamps from \( t_1 \), \( \text{ex}_\text{seq} \) exchanged the elements of \( v \) for those of \( w \). That \( t_1 \) is increasing and contains no twins, follows from the spec of \( \text{exchange}' \) which says that the time-stamps \( t \) and \( t \) that populate \( t_1 \) and \( t_2 \), are larger than anything in \( \eta \), and thus only grow with iteration. From the same postcondition, it follows that \( \chi_0 \cup \{ m_j \} \) contains all the twin exchanges, by invariant (9), as commented in Section 2 about the spec for \( \text{exchange} \).

Next, by the FCSL parallel composition rule (Section 4):

\[
\{ h_5 = \emptyset, \pi_5 = \emptyset, \chi_5 = \emptyset \} \quad \text{ex}_\text{seq} \quad (v, \text{nil}) \quad \{ \text{grows-notwins} \ t_1, \text{grows-notwins} \ t_2, \}
\]

To explain: \( t_1 \) and \( t_2 \) from the left and right \( \text{ex}_\text{seq} \) threads become \( t_{s1}, t_{s2}, \text{res}_1 \) and \( \text{res}_2 \), respectively. The values of each self component \( h_5, \pi_5, \chi_5 \) from the two threads are joined into the self component of the composition. At the same time, the other component \( \chi_0 \) of the left (resp. right) thread equals the sum of \( \chi_5 \) of the right (resp. left) thread, and the \( \chi_0 \) of the composition. This formalizes the intuition that upon forking, the left thread becomes part of the environment for the right thread, and vice-versa.

The postcondition says that the self history of \( e \) contains both \( \text{zip} t_{s1} v_{s1} \text{ res}_1 \) and \( \text{zip} t_{s2} v_{s2} \text{ res}_2 \). Thus, \( v_{s1} \) is exchanged for \( \text{res}_1 \), and \( v_{s2} \) for \( \text{res}_2 \). But we further want to derive \( \text{res}_1 = v_{s2} \) and \( \text{res}_2 = v_{s1} \), i.e., the lists are exchanged for each other, in the absence of interference.

We next explain how this desired property follows for \( \text{hide} \), from the two inequalities in \( e \)'s postcondition

\[
\text{zip} t_{s1} \text{ res}_1 v_{s1} \subseteq \text{zip} t_{s2} v_{s2} \text{ res}_2 \cup \chi_0 \cup \{ m_j \}, \tag{14}
\]

\[
\text{zip} t_{s2} \text{ res}_2 v_{s2} \subseteq \text{zip} t_{s1} v_{s1} \text{ res}_1 \cup \chi_0 \cup \{ m_j \}. \tag{15}
\]

Notice that (14) and (15) are ultimately instances of the conjunction \( \chi_0 \cup \{ m_j \} \) that was part of the specification (10), thereby justifying the use of subjective other variables.

We know that dom \( m_j = \pi_5 \cup \pi_0 \) (from Section 2), that \( \pi_5 = \emptyset \) (from \( e \)'s postcondition), and that by hiding, \( \pi_0 = \chi_0 = \emptyset \). Thus, towards deriving the postcondition of \( \text{hide} \), we simplify (14) and (15) into:

\[
\text{zip} t_{s1} \text{ res}_1 v_{s1} \subseteq \text{zip} t_{s2} v_{s2} \text{ res}_2
\]

\[
\text{zip} t_{s2} \text{ res}_2 v_{s2} \subseteq \text{zip} t_{s1} v_{s1} \text{ res}_1
\]

Because \( t_{s1} \) and \( t_{s2} \) are increasing lists of time-stamps, and contain no twins, the above implies \( t_{s2} = t_{s1} \). Hence:

\[
\text{zip} t_{s1} \text{ res}_1 v_{s1} = \text{zip} t_{s2} v_{s2} \text{ res}_2
\]

and thus \( \text{res}_1 = v_{s2}, v_{s1} = v_{s2} \). We omit the remaining technical argument that explains how the heap \( h_j \), with the pointer \( g \), is folded into \( h_5 \), which ultimately obtains (13).

### 6. Specifying Counting Networks

We now show how to use subjective histories to specify another class of non-linearizable objects—counting networks. Counting networks are a special case of balancing networks introduced by Aspnes et al. [4], themselves building on sorting networks [3], aimed to implement concurrent counters in a way free from synchronization bottlenecks. The key idea is to decompose the workload between several counters, so that each of them is responsible for a disjoint set of values. A thread trying to increment first approaches the balancer, which is a logical “switch” that “directs” the thread, i.e., provides it with the address of the counter to increment. The balancers make counting networks’ operations non-linearizable, as in the presence of interference the results of increments might be observed out of order.

Figure 3 presents a schematic outline and a pseudocode implementation of a counting network with a single balancer. The implementation contains three pointers: the balancer \( bal \), which stores either 0 or 1, thus directing

![Figure 3. Simple counting network](image-url)
threads to the shared pointers $c_0$ or $c_1$, which count the even and odd values, respectively. Threads increment by calling getAndInc, which works as follows. It first atomically changes the bit value of the balancer via a call to atomic operation flip (line 2). The flip operation returns the previous value $b$ of the balancer as a result, thus determining which of the counters, $c_0$ or $c_1$, should be incremented. The thread proceeds to atomically add 2 to the value of $c_0$ via fetchAndAdd2 (line 3). The old value of $c_0$ is returned as the result of the procedure.5

Assuming that $c_0$ and $c_1$ are initialized with 0 and 1, it is easy to see that in a single-threaded program, the network will behave as a conventional counter; that is, consecutive invocations of getAndInc return consecutive nats. However, in the concurrent setting, getAndInc may return results out of order, as follows.

Example 6.1. Consider two threads, $T_1$ and $T_2$ operating on the network initialized with $bal \rightarrow 0$, $c_0 \rightarrow b$. $T_1$ calls getAndInc and executes its line 2 to set $bal$ to 1. It gets suspended, so $T_2$ proceeds to execute lines 2 and 3, therefore setting $bal$ back to 0 and returning 1. While $T_1$ is still suspended, $T_2$ calls getAndInc again, gets directed to $c_0$, and returns 0, after it has just returned 1.

This out-of-order behavior, however, is not random, and can be precisely characterized as a function of the number of threads operating on the network [2, 28]. In the rest of this section and in Section 7, we show how to capture such bounds in the spec using auxiliary state of (subjective) histories in a client-sensitive manner. As a form of road map, we list the desired requirements for the spec of getAndInc, adapting the design goals of the criteria, such as QC, QQC and QL [2, 4, 28], which we will proceed to verify formally, following Step 1 and Step 2 of our approach, and then employ in client-side reasoning via Step 3:

- **R1**: Two different calls to getAndInc should return distinct results (strong concurrent counter semantics).
- **R2**: The results of calls to getAndInc, separated by a period of quiescence (i.e., absence of interference), should appear in their sequential order (quiescent consistency).
- **R3**: The results of two sequential calls $C_1$ and $C_2$, in a single thread should be out of order by no more than $2N$, where $N$ is the number of interfering calls that overlap with $C_1$ and $C_2$ (quantitative quiescent consistency).

6.1 Step 1: Counting Network’s Histories and Invariants

To formalize the necessary invariants, we elaborate the counting network with auxiliary state: tokens (isomorphic to nats) and novel interference-capturing histories.

Figure 4. Tokens and histories of the simple network

A token provides a thread that owns it with the right to increment an appropriate counter [4]. In our example, a thread that performs the flip in line 2 of getAndInc will be awarded a token which it can then spend to execute fetchAndAdd2. Thus, any individual token represents a “pending” call to getAndInc, and the set of unspent tokens serves as a bound on the out-of-order behavior that the network exhibits. We introduce auxiliary variables for the held tokens: $\tau_0$ keeps the tokens owned by the self thread, with its even and odd projections $\tau_0^e$ and $\tau_0^o$, such that $\tau_0 = \tau_0^e \cup \tau_0^o$, administering access to $c_0$ and $c_1$, respectively. Similarly, $\tau_0$, featuring the same projections, keeps the tokens owned by the other thread. We abbreviate $\tau_i = \tau_i^e \cup \tau_i^o$ for $i = 0, 1$.

Figure 4 illustrates a network with three even tokens: $(x^0, y^0, z^0) \in \tau_0$, held by threads that will increment $c_0$, and one odd token $u^1 \in \tau_1$, whose owner will increment $c_1$.

A history of the counting network is an auxiliary finite map, consisting of entries of the form $t \mapsto (i, z)$. Such an entry records that the value $t$ has been written into an appropriate counter ($c_0$ or $c_1$, depending on the parity of $t$), at the moment when $\tau_0$ and $\tau_1$ held values of $\hat{z}$'s even/odd projections $\hat{z}^e$ and $\hat{z}^o$, respectively. Moreover, in order to write $t$ into a counter, the token $z$ was spent by the thread. We will refer to $z$ as the spent token. Notice that the entries in the history contain tokens held by both self and other threads. Thus, a history captures the behavior of a thread subjectively, i.e., as a function of the interfering threads’ behavior.

Similarly to tokens, network histories are represented by the auxiliary variables $\chi_5$, tracking counter updates (even and odd) performed by the self thread, and dually $\chi_4$ for the other thread. We abbreviate $\chi_i = \chi_i^e \cup \chi_i^o$ for $i = 0, 1$.

Figure 4 illustrates a moment in network’s history and how it relates to the state of the counters. Only 0 has been written to $c_0$ so far (upon initialization), hence $\chi_0^o$ only contains an entry for $t = 0$ (we ignore at the moment the contents of the history entries). On the other hand, $\chi_1^o$ has entries for 1 and 3, because after initialization, one thread has increased $c_1$. The gray boxes indicate that 0 and 3 are the current values of $c_0$ and $c_1$, and thus also the latest entries in $\chi_0$ and $\chi_1$, respectively. In particular, these values will be returned by the next invocations of fetchAndAdd2. The dashed boxes correspond to the entries to be contributed by the currently running threads holding tokens $x^0$, $y^0$, $z^0$, $u^1$.

In addition to $\tau$ and $\chi$ which come in flavors private to self and other threads, we require the following shared variables:

---

5 In the counting network from Figure 3, the balancer itself might seem like a contention point. However, the flip operation is much less expensive than CAS as a synchronization mechanism. The performance can be further improved by constructing a diffracting tree of several balancers [24, §12.6], but we do not consider diffracting trees here.
(1) \(h_j\) for the joint heap of the network, and (2) \(b_j, n_2^j\) and \(n_1^j\) for the contents of \(bal, c_0\) and \(c_1\), respectively.

**Invariants of the counting network** The main invariant of the network relates the number of tokens, the size of histories and the value of the balancer:

\[
|\chi^0| + |\tau^0| = |\chi^1| + |\tau^1| + b_j
\]  
(16)

The equation formalizes the intuition that out-of-order anomalies of the counting network appear if one of the two counters is too far ahead of the other one. The invariant (16) provides a bound on such a situation. One counter can get ahead temporarily, but then there must be a number of threads waiting to spend their tokens on the other counter. Thus, the other counter will eventually catch up.

The approaches such as quiescent and quantitative quiescent consistency describe this situation by referring to the number of unmatched call events in an event history \([10, 28]\). In contrast, we formalize this property via auxiliary state: the sets of tokens \(i\) recorded in the entry for the number \(t\) determine the environment’s capability to add new history entries, and thus “run ahead” or “catch up” after \(t\) has been returned. The other invariants of the counting network are as follows:

(i) \(h_j = bal \mapsto b_j \cup c_0 \mapsto n_2^j \cup c_1 \mapsto n_1^j\)

(ii) The histories contain disjoint-\(i\) stamps.

(iii) The history \(\chi^0\) contains all even (resp. odd) values in \([0, n_2^0]\) (resp. \([1, n_1^0]\)). This ensures that \(n_1^0\) and \(n_2^0\) are the last time-stamps in \(\chi^0\) and \(\chi^1\), respectively.

(iv) \(\tau^0, \tau^1\) and spent tokens \((\chi_S \cup \chi_O)\) contain mutually disjoint tokens, where \(\text{spent}(t \mapsto (i, z) \cup \chi') = \{z\} \cup \text{spent} \chi'\), and spent \(\emptyset = \emptyset\). In other words, a spent token never appears among the “alive” ones (i.e., in \(\tau^0 \cup \tau^1\)).

(v) \(t \mapsto (i, z) \subseteq \chi_S \cup \chi_O \implies z \in i\).

(vi) For any \(t, i, z, z': \)

\[
\begin{align*}
& \cdot t \mapsto (i, z) \subseteq \chi_O \implies t + 2 |i \cap \tau^0| < n_1^j + 2 |i \cap \tau^1| + 2, \\
& \cdot t \mapsto (i, z) \subseteq \chi_O \implies t + 2 |i \cap \tau^1| < n_2^j + 2 |i \cap \tau^0| + 2.
\end{align*}
\]

The invariant (vi) provides quantitative information about the network history by relating the actual \((n_2^j, n_1^j)\) and the past \(t\) counter values, via the current amount of interference \((\tau)\) and the snapshot interference \((i)\). To explain (vi), we resort to the intuition provided by the following equality, which, however, being not quite valid, cannot be used as an invariant, as we shall see. Focusing on the first clause in (vi), if \(t \mapsto (i, z) \subseteq \chi_O\), then, intuitively:

\[
t + 2 |i^0 \setminus \tau^0| + 2 |i \cap \tau^0| = n_2^j + 2 |i \cap \tau^1| + (2b_j - 1)
\]

The equality says the following. When \(t\) is snapshot from \(c_0\) and placed into the history \(\chi^0\), the set of outstanding even tokens was \(i^0\). By the present time, \(c_0\) has been increased \(|i^0 \setminus \tau^0|\) times, each time by 2, thus \(n_2^0 = t + 2 |i^0 \setminus \tau^0|\). What is left to add to \(c_0\) to reach the period of quiescence, when no threads interfere with us, is \(2 |i \cap \tau^0|\). Similar reasoning applies to \(c_1\). It is easy to see at the period of quiescence, \(c_0\) and \(c_1\) differ by \(2b_j - 1\); that is, the counter pointed to by \(bal\) is behind by 1. However, the equality is invalid, as \(b_j\) can be read off only in the present, whereas the “intuitive” reasoning behind the equality requires a value of \(b_j\) from a quiescent period in the future. Hence, in order to get a valid property, we bound \(2b_j - 1\) by 2. For simplicity, we even further weaken the bounds by dropping \(|i^0 \setminus \tau^0|\) to obtain (vi); as it will turn out, even such a simpler bound will suffice for proving R1–R3.

**Allowed changes in the counting network** The state of the counting network (auxiliary and real) can be changed in two possible ways by concurrent threads. These changes formalize the way the atomic operations \(\text{flip}\) and \(\text{fetchAndAdd2}\) from Figure 3 (b) work with auxiliary state. Flipping alters the bit value \(b_j\) of \(bal\) to the complementary one, \(\neg b_j\). It also generates a token \(z\) (of parity \(b_j\)) and stores it into \(\tau_5\). The token is fresh, i.e., distinct from all alive and spent tokens in \(\tau_5 \cup \tau_0 \cup \text{spent} (\chi_S \cup \chi_O)\). Incrementation spends a token \(z\) from \(\tau_5\), and depending on its \(i\), it atomically increases the value \(n_j^i\) of \(c_i\) by two, while simultaneously removing \(z\) from \(\tau_5\) (thus, the precondition is that \(z \in \tau_5\)). It also adds the entry \((n_2^j + 2) \mapsto (\tau^0 \cup \tau^1, z')\) to \(\chi_S\), thus snapshoting the values of \(\tau^0\) and \(\tau^1\). It is easy to check that both these allowed changes preserve the state-space invariants (16), (i)–(vi), and that their effect on real state (with auxiliary state erased) are those of \(\text{flip}\) and \(\text{fetchAndAdd2}\).

**6.2 Step 2: a Hoare Spec for getAndInc**

Figure 5 provides a Hoare-style spec for \(\text{getAndInc}\) and verified in our proof scripts. We use the logical variable \(i\) and its variants to range over token sets, and \(\eta\) to range over histories.

The precondition starts with an empty token set \((\tau_5 = \emptyset)\), and hence by framing, any set of tokens. The initial self-history \(\chi_5\) is set to an arbitrary \(\eta_5\). The precondition records the other components of the initial state as follows. First, \(\eta_0\) names (a subset of) \(\chi_O\), to make it stable under interference, as in Section 2. Next, we use \(i_0\) to name the (subset of) initially live tokens \(\tau_0\). However, as \(\tau_0\) may shrink due to other threads spending tokens, simply writing \(i_0 \subseteq \tau_0\) is unstable. Instead, we write \(i_0 \subseteq \tau_0 \cup \text{spent} \chi_O \cup \text{spent} \eta_5\).
spent \( \eta_0 \) to account for the tokens spent by other threads as well. The set \( \tau_0 \cup (\text{spent } \chi_0 \setminus \text{spent } \eta_0) \) only grows under interference, as new live tokens are generated, or old live tokens are spent, making the inclusion of \( \iota_0 \) stable. Indeed, one cannot take any arbitrary \( \eta_0 \) and \( \iota_0 \) to name the other components of the initial state. Therefore, we constrain these two variables by the invariant \( I \), that relates them to the self-components of the actual state and to each other according to the invariants (ii)-(vi).\(^7\) This is natural, since, as we will see in Section 7, all clients instantiate \( \eta_0 \) and \( \iota_0 \) with the other-components of the actual pre-state, respecting (ii)-(vi).

The postcondition asserts that the final token set \( \tau_5 \) is also empty (i.e., the token that \( \text{getAndInc} \) generates by \( \text{f1isp} \), is spent by the end). The history \( \chi_5 \) is defined by an entry \( (\text{res} + 2) \mapsto (i, z) \), corresponding to writing the value of the result (plus two) into one of the network’s counters, snapshoting the tokens of that moment into \( \iota \), and spending the token \( z \) on the write. \( \eta_0 \) is a subset of the new value of \( \chi_0 \), and \( \iota_0 \) is a subset of the new value of \( \tau_0 \cup (\text{spent } \chi_0 \setminus \text{spent } \eta_0) \), by the already discussed stability.

The next inequality describes where the entry for \( \text{res} + 2 \) is placed wrt. the pre-state history \( \eta = \eta_0 \cup \eta_0. \eta \) may have gaps arising due to out-of-order behavior of the network, and \( \text{res} + 2 \) may fill one such gap. However, there is a bound on how far \( \text{res} \) (and hence \( \text{res} + 2 \)) may be from the tail of \( \eta \). We express it as a function of \( \iota_0 \) and \( i \), derived from the bounds in (vi), taking \( \text{res} + 2 \) for \( t \) and over-approximating the instant value \( n^i_j \) of the incremented counter via \( \eta_0 \cup \eta_0 \). The inequality weakens the invariant (vi), making it hold for even and odd entries by moving \( 2 \lfloor |i \cap \iota| \rfloor \) (for \( i = 0, 1 \)) to the right side of \( < \) and joining them, since \( \iota_0 \cap \iota_0 = \emptyset \).

Finally, the predicate \( \text{ResPast} \) provides more bounds that we will need in the proofs of the client code’s properties.

\[
\text{ResPast} \eta \text{ res } i \text{ z \equiv } i \subseteq \tau_0 \cup (\text{spent } \chi_0 \cup \{z\}),
\forall t \ i, t \mapsto (i, \neg), \neg \subseteq \eta \Rightarrow z \notin \iota, t < \text{res} + 2 + 2 \lfloor |i \cap \iota| \rfloor \tag{18}
\]

When instantiated with \( \eta = \eta_0 \cup \eta_0 \), \( \text{ResPast} \) says the following. The token set \( i \mapsto 2 \) snapshot when \( \text{res} + 2 \) was committed to history, is a subset of all the tokens in post-state, including the live ones (\( \tau_0 \)), and spent ones (\( \text{spent } \chi_0 \cup \{z\} \)). Moreover, if \( t \) is an entry in \( \eta, \) with contents \( (i, \neg), \) then: (1) \( z \notin \iota, \) because \( z \) is a token generated when \( \text{getAndInc} \) executed \( \text{f1isp} \). Hence, \( z \) is fresh wrt. any token-set from the pre-state history \( \eta; \) and (2) \( \iota \) and \( t \) satisfy the same bounds wrt. \( \text{res} + 2 \), as those described for the last history entry and \( \iota_0 \).

**How will the spec (17) be used?** The clause \( \chi_S = \eta_0 \cup (\text{res} + 2) \mapsto - \) of (17), in conjunction with the invariant (ii), ensures that any two calls to \( \text{getAndInc} \), sequential or concurrent, yield different history entries, and hence different results. This establishes \( \textbf{R1} \), which we will not discuss further.

The inequality on last \( (\eta_0 \cup \eta_0) \) will provide for \( \textbf{R2} \) in client reasoning. To see how to consider the particular case

\[
\{ \tau_5 = \emptyset, \chi_S = \eta_0 \cup \eta_0 \subseteq \chi_0, \iota_0 \cup \iota_0, \}
\]

\[
\begin{aligned}
\text{getAndInc()} & \models e_i \tag{19}
\end{aligned}
\]

when \( \iota_0 \) is empty, i.e., the pre-state is quiescent. In that case, the intersection with \( \iota \) is empty, and we can infer that \( \text{res} + 2 \) is larger than either counter’s value in the pre-state. As we shall see in Section 7, this captures the essence of QC.

Finally, the predicate \( \text{ResPast} \) (18) establishes a bound for the “out-of-order” discrepancy between the result \( \text{res} \) and any value \( t \) committed to the history in the past, via \( 2 \lfloor |i \cap \iota| \rfloor \). We will further bound this value using the size of \( i \), and the inclusion \( \iota \subseteq \tau_0 \cup (\text{spent } \chi_0 \setminus \text{spent } \eta_0) \) from (18). These bounds will ultimately enable us to derive the requirement \( \textbf{R3} \).

### 7. Verifying Counting Network’s Clients

Following \textit{Step 3} of our verification method, we now illustrate requirements \( \textbf{R2} \) and \( \textbf{R3} \) from the previous section via two different clients which execute two sequential calls to \( \text{getAndInc} \). Both clients are higher-order, i.e., they are parametrized by subprograms, which can be “plugged in”. The first client will exhibit a quiescence between the two calls, and we will prove that the call results appear in order, as required by \( \textbf{R2} \). The second client will experience interference of a program with a \( N \) concurrent calls to \( \text{getAndInc} \), and we will derive a bound on the results in terms of \( N \), as required by \( \textbf{R3} \).

Both our examples will rely on the general mechanism of hiding, presented in Section 4, as a way to logically restrict the interference on a concurrent object, in this case, a counting network, in a lexically-scoped way. To “initialize” the counting network data structure, we provide the starting values for the shared heap (\( h_0 \)) and for the history (\( \eta_0 \)), assuming that the initial set of tokens is empty:

\[
h_0 \equiv b 0 d 0 \mapsto 0 \cup c_0 \mapsto 0 \cup c_1 \mapsto 1
\]

\[
\eta_0 \equiv \{0 \mapsto 0\}, 1 \mapsto \{1\}, 1\tag{19}
\]

That is, \( \eta_0 \) provides the “default” history for the initial values 0 and 1 of \( c_0 \) and \( c_1 \), with the corresponding tokens represented by numbers 0 and 1. As always with hiding, the post-condition of the hidden program will imply that \( \tau_0 \) and \( \chi_0 \) are both empty, as there is no interference at the end.
7.1 Exercising Quiescent Consistency

Our first client is the following program $e_{qc}$:

$$
\begin{align*}
\text{(res}_1, \text{res}_2) & \leftarrow (\text{getAndInc()} \cup e_1) ; \\
\text{(res}_2, \text{res}_2) & \leftarrow (\text{getAndInc()} \cup e_2) ;
\end{align*}
$$

(20)

Each of the calls to getAndInc interferes with either $e_1$ or $e_2$, but in the absence of external interference, the quiescent state is reached between the lines 1 and 2. Hence, after executing $e_{qc}$, it should be $\text{res}_1 < \text{res}_2$, following R2.

The programs $e_1$ and $e_2$ can invoke getAndInc and modify the counters concurrently with the two calls of $e_{qc}$, which we capture by giving both the following generic spec:

$$
\begin{align*}
\{ \chi \in \emptyset, \tau_1 = 0, \text{spent } \chi_0 \} \\
\{ \exists \eta, \chi \in \eta, \tau_2 = 0, \text{spent } \chi_0 \}
\end{align*}
$$

(21)

The postcondition allows for a number of increments via calls to getAndInc, which is reflected in the addition $\eta_i$ to $\chi S$. However, all such calls are required to be finished by the end of $e_i$ ($\tau_1 = \emptyset$). As customary, by now, we use the logical variable $i$ to name the initial set of other tokens.

Figure 6 provides a spec for each of the parallel compositions in the program (20), proved via the corresponding FCSL inference rule for parallel composition (11). The spec is very similar to (17), with the differences highlighted via gray boxes: (a) the self-history $\chi S$ is increased by $e_i$’s contribution $\eta_i$ in addition to the entry, introduced by getAndInc, (b) the result of the parallel composition is a pair, but we only constrain its first component $\text{res}_1$, resulting from the left subprogram. We also drop the last conjunct with ResPast from Figure 17, which we won’t require for this example.

Next, we use the spec from Figure 6 to specify and verify the program $e_{qc}$, so far assuming external interference.

$$
\begin{align*}
\{ \text{Fig. 6’s precondition with } \eta_0 := \eta_0, \eta_0 := \chi_0, \text{ and } \iota_0 := \tau_0 \} & \quad // P \\
\{ (\text{res}_1, \text{res}_2) \leftarrow (\text{getAndInc()} \cup e_1) ; \\
\{ \exists \eta_1, \tau_2 = 0, \chi \in \eta_0' \} & \quad \text{ where } \eta_0' = \eta_0 \cup \eta_i \cup (\text{res}_1 + 2) \rightarrow \rightarrow, \eta_0 := \chi \text{ and } \iota_0 := \tau_0
\end{align*}
$$

(22)

The inequality in the postcondition $Q$ gives the boundary on the out-of-order position of $\text{res}_2$ with respect to the last value in the history captured in between the two parallel compositions. The boundary is given via the size of intersection of the two sets of tokens: snapshot $(i)$ and “alive” between the calls $(\iota_0)$. Now, to ensure the absence of external interference, we consider the program (hide $(i)$ and “alive” between the calls $(\iota_0)$). By the general property of hiding (Section 4), we know that at the final state there is no interference, hence $\tau_0 = \emptyset$ and $\chi_0 = \emptyset$ in $Q$. Therefore, from the set inclusion on $\iota_0$ in $Q$ (the grayed part), we deduce that $\iota_0 = \emptyset$. As a consequence, the intersection $i \cap \iota_0 = \emptyset$, so from the inequality we obtain

$$
\text{last (} \eta_0' \cup \eta_0 \text{)} < \text{res}_2 + 2
$$

(23)

But $\eta_0'$ is defined as $(\text{res}_1 + 2) \rightarrow \rightarrow \ldots$, hence, $\text{res}_1 + 2 \in \text{dom } \eta_0'$, and thus $\text{res}_1 + 2 \leq \text{last } \eta_0'$. Even more:

$$
\text{res}_1 + 2 \leq \text{last } (\eta_0' \cup \eta_0).
$$

(24)

From (22) and (23) follows the result R2: $\text{res}_1 < \text{res}_2$.

7.2 Proving Quantitative Bounds

We next show how the spec (17) also obtains quantitative bounds on the out-of-order anomalies in terms of a number of running threads in the following program $e_{qqc}$:

$$
\begin{align*}
\text{res}_1 & \leftarrow \text{getAndInc()} ; \\
\text{res}_2 & \leftarrow \text{getAndInc()} ; \\
\text{return } (\text{res}_1, \text{res}_2) & \quad // : = : \text{res}
\end{align*}
$$

(24)

The $e$’s spec says that the number of calls to getAndInc in $e$ (i.e., the size of interference $e$ exhibits) is some fixed $N$:

$$
\{ \tau_1 = \emptyset, \chi = \eta \} \in \{ (\exists \eta, \tau_1 = \emptyset, \chi = \eta, |\eta| = N) \}
$$

(25)

Our goal is to prove that in the absence of external interference for $e_{qqc}$, $\text{res}_1 < \text{res}_2 + N$ (requirement R3).

We first verify the sequential composition of the two calls in (24); the proof outline is in Figure 7. As previously, we
start by instantiating the logical variables $\eta_5, \eta_0$ and $\iota_0$
from spec (17) with $\eta_5, \chi_0$ and $\tau_0$, respectively. In the
assertion, resulting by of the first getAndInc, we keep only
the clauses involving $\tau_5$ and $\chi_5$, dropping the rest. To verify
the second getAndInc call, we instantiate $\eta_5, \eta_0$ and $\iota_0$
with $\eta'_5 = \eta_5 \cup (\text{res}_1 + 2) \rightarrow (\iota, -), \text{current } \chi_0$ and $\tau_0$.
In the postcondition of the second call to getAndInc, we
focus on the ResPast $\eta'_5 \cup \eta_0$ res$_2$ $\iota$ $z$ clause, where $\iota$
is the set of tokens snapshot when contributing res$_2 + 2$.
Unfolding the definition of ResPast from (18), we obtain
$\iota \subseteq \tau_0 \cup \text{spent } \chi_0 \cup \{z\}$. Also, using (res$_1 + 2) \rightarrow (\iota, -)$ in
the implication that the unfolding obtains, we get $z \notin \iota$ and
\[
\text{res}_1 + 2 < \text{res}_2 + 2 + 2 |\iota \cap z| \tag{26}
\]
Now we use the following trivial fact to simplify.

**Lemma 7.1.** If $z \in \iota$ and $z \notin \iota$, then $|\iota \cap z| \leq |\iota| - 1$.

Using the invariant (v), Lemma 7.1 derives $|\iota \cap z| \leq |\iota| - 1$
after which, the inclusion $\iota \subseteq \tau_0 \cup \text{spent } \chi_0 \cup \{z\}$ leads to
\[
|\iota \cap z| \leq |\tau_0 \cup \text{spent } \chi_0| \tag{27}
\]
Combined with (26), this gives us $\text{res}_1 < \text{res}_2 + 2 |\tau_0 \cup
\text{spent } \chi_0|$, as shown in Figure 7’s postcondition. In words,
its asserts that the discrepancy between res1 and res2 is
bounded by the size of the tokens, which are either held by
the interfering threads at the end or are spent.

Figure 8 shows the proof outline for $e_{q\text{eq}}$ via the spec
from Figure 7. By the parallel composition rule (11), the
precondition splits into two subjective views, where we send
the initial history $\eta_0$ to the left thread, and the empty history
to the right thread. The proof from Figure 7 then applies to
the left thread, and the spec (25) applies to the right one.
Final $\chi_0$ of the left thread is the union of $\chi_0$ from
the joined thread with $\eta_1$ since the environment of the left thread
includes the right thread and of the join. Rewriting by this
property in the postcondition of the left thread gives us the
post of the joint thread: $\text{res}_1 < \text{res}_2 + 2 |\tau_0 \cup \text{spent } (\chi_0 \cup \eta)|$, which
we can next simplify into
\[
\text{res}_1 < \text{res}_2 + 2 |\tau_0 \cup \text{spent } \chi_0| + 2 N
\]
because spent distributes over $\cup$, and $|\text{spent } \eta| = |\eta| = N$.
Finally, we restrict the external interference by considering
(hide $e_{q\text{eq}}$). From the properties of hiding, we deduce that
$\tau_0$ and $\chi_0$ in $Q$ are empty, hence we can simplify into
$\text{res}_1 < \text{res}_2 + 2 N$, which is the desired result R3.

### 8. Discussion

**Reasoning about quantitatively quiescent queues** The idea of
interference-capturing histories, which allowed us to
characterize the out-of-order discrepancies between the results
of a counting network in Section 6, can be applied to
specify other balancer-based data structures, for instance,
queues [10]. The picture on the right illustrates schematically
a non-linearizable queue [10], which is built out of
two atomic queues, $q_0$ and $q_1$, and two balancers, $bal_e$ and
$bal_d$. The balancers are used to distribute the workload between
the two queues by directing the threads willing to enqueue and
dequeue elements, correspondingly.

One can think of representing the pending enqueue/dequeue
requests to each of the two queues, $q_0$ and $q_1$, by two
separate sets of tokens, as shown in Figure 9. The white and
grey boxes correspond to the present and dequeued nodes
in the queue in the order they were added/removed. Therefore,
white elements are those that are currently in the queue.
Similarly, the white-colored tokens are for enqueueing elements,
so the elements $x, y, z$ and $k$ are going to be added to
the corresponding atomic queues. Gray-colored tokens
are for dequeueing capabilities for one or another atomic
queue, distributed among the threads, so the elements $c$ and
d are going to be removed next, on the expense of the corresponding
dequeue tokens. The timestamps of the entries in
the queue history, omitted from the figure, are created, as elements
are being enqueued to $q_0$ and $q_1$, and the parity of a
timestamp corresponds to the atomic queue being changed.
Thus, there might be “gaps” in the combined queue history,
reminiscent to the gaps in the counter history from Section 6
(e.g., the gap caused by the absence of an “even” element in
the combined history right between $d$ and $e$ in Figure 9, as
indicated by “?"), which will cause out-of-order anomalies
during concurrent executions. By accounting for the number
of past and present tokens for enqueuing and dequeueing,
one should be able to capture the effects of interference and
express a quantitative boundary on the discrepancy between
the results, coming out of order.

**How much information to expose in a spec?** The specs
we have proved for concurrent objects in Sections 2 and 6
allow for efficient compositional reasoning about clients, but
they are also non-trivial to formulate and verify. Luckily, the
FCSL way of reasoning provides a flexible solution for the
compositionality-versus-complexity conundrum [31, §7].
Facts
–
1085
Stab
446
Main
–
27
440
115
688
Inv
374
182
321
2058
162
57s
–
Total
180
12m 23s
258
3m 11s
1879
Build
tokens for enqueueing/dequeueing

e
\( k \)

\( x \)
of resource invariants (and Ssreflect 1.6 [17]. As the table indicates, a large fraction of the development is essentially "in the eye of the beholder", as is typical in programming, when designing libraries and abstract data structures, and the logic-based approach we advocate provides precisely this flexibility in choosing desired specs.

Hoare-style specifications of concurrent objects. Hoare-style program logics were used with great success to verify a number of concurrent data structures and algorithms, which are much more natural to specify in terms of observable state modifications, rather than via call/return histories. The examples of such objects and programs include barriers [12, 26], concurrent indices [8], flat combiner [41, 47], event handlers [44], shared graph manipulations [37, 40], as well as their multiple client programs. The observation about a possibility of using program logics as a correctness criterion, alternative to linearity, has been made in some of the prior works [8, 27, 45]. Their criticism of linearity addressed its incapability to capture the state-based

Figure 9. Tokens and histories of a balancer-based queue.

In FCSL, it is up to the library implementor to decide, how much of implementation-specific insight should go into a spec. The amount of such details is determined based on the foreseen client scenarios. For instance, we have hidden the balancer in the spec (17), but decided keep the exact constant 2, which would allow us to derive more precise quantitative bounds later (see Section 7.2). We could have hidden this component too (as well as some parts of the invariant \( I \)), by employing in the specification sigma-types (a dependently-typed analogue of existential types), provided by FCSL as it’s embedded into Coq [7]. We could have also omitted tokens from the spec, therefore, reducing the set of derivable client-specific properties to Section 6’s \( R1 \) only.

9. Mechanization and Evaluation

We have mechanized the specs and the proofs of all the examples from this paper [1], taking advantage of the fact that FCSL has been recently implemented as a tool for concurrency verification [40] on top of the Coq proof assistant [7].

Table 1 summarizes the statistics with respect to our mechanization in terms of lines of code and compilation times. The examples were proof-checked on a 3.1 GHz Intel Core i7 OS X machine with 16 Gb RAM, using Coq 8.5pl2 and Ssreflect 1.6 [17]. As the table indicates, a large fraction of the implementation is dedicated to proofs of preservation of resource invariants (Inv), i.e., checking that the actual implementations do not “go wrong”. In our experience, these parts of the development are the most tricky, as they require library-specific insights to define and reason about auxiliary histories. Since FCSL is a general-purpose verification framework, which does not target any specific class of programs or properties, we had to prove problem-specific facts, e.g., lemmas about histories of a particular kind (Facts), and to establish the specs of interest stable (Stab). Once this infrastructure has been developed, the proofs of main procedures turned out to be relatively small (Main).

Fortunately, trickiness in libraries is invisible to clients, as FCSL proofs are compositional. Indeed, because specs are encoded as Coq types [40], the substitution principle automatically applies to programs and proofs. At the moment, our goal was not to optimize the proof sizes, but to demonstrate that FCSL as a tool is suitable off-the-shelf for machine-checked verification of properties in the spirit of novel correctness conditions [4, 22, 28]. Therefore, we didn’t invest into building advanced tactics [34] for specific classes of programs [52] or properties [6, 13, 51].

Table 1. Mechanization of the examples: lines of code for program-specific facts (Facts), resource invariants and transitions (Inv), stability proofs for desired specs (Stab), spec and proof sizes for main functions (Main), total LOC count (Total), and build times (Build). The “–” entries indicate the components that were not needed for the example.

<table>
<thead>
<tr>
<th>Program</th>
<th>Facts</th>
<th>Inv</th>
<th>Stab</th>
<th>Main</th>
<th>Total</th>
<th>Build</th>
</tr>
</thead>
<tbody>
<tr>
<td>Exchanger</td>
<td>365</td>
<td>1085</td>
<td>446</td>
<td>162</td>
<td>2058</td>
<td>4m 46s</td>
</tr>
<tr>
<td>Exch. Client</td>
<td>258</td>
<td>182</td>
<td>440</td>
<td>1879</td>
<td>12m 23s</td>
<td></td>
</tr>
<tr>
<td>Count. Netw.</td>
<td>379</td>
<td>785</td>
<td>688</td>
<td>27</td>
<td>27</td>
<td>3m 11s</td>
</tr>
<tr>
<td>CN Client 1</td>
<td>141</td>
<td>-</td>
<td>180</td>
<td>321</td>
<td>3m</td>
<td>3m 9s</td>
</tr>
<tr>
<td>CN Client 2</td>
<td>115</td>
<td>-</td>
<td>259</td>
<td>374</td>
<td>3m</td>
<td>-</td>
</tr>
</tbody>
</table>

10. Related Work

Linearizability and history-based criteria. The need for correctness criteria alternative to linearizability [25], which is more relaxed yet compositional, was recognized in the work on counting networks [4]. The suggested notion of quiescent consistency [43] required the operations separated by a quiescent state to take effect in their logical order. A more refined correctness condition, quasi-linearizability, implementing a relaxed version of linearity with an upper bound on nondeterminism, was proposed by Afek et al. [2], allowing them to obtain the quantitative boundaries similar to what we proved in Section 7.2. The idea of relaxed linearizability was later used in the work on quantitative relaxation (QR) [23] for designing scalable concurrent data structures by changing the specification set of sequential histories. Most recently, quantitative quiescent consistency has been proposed as another criterion incorporating the possibility to reason about effects of bounded thread interference [28]. It is worth noticing that some of these correctness criteria are incomparable (e.g., QC and QR [23], QL and QQC [28]) hence, for a particular concurrent object, choosing one or another criterion should be justified by the needs of the object’s client. Therefore, a suitable correctness condition is essentially “in the eye of the beholder”, as is typical in programming, when designing libraries and abstract data structures, and the logic-based approach we advocate provides precisely this flexibility in choosing desired specs.
properties, such as dynamic memory ownership [27]—something that linearizability indeed cannot tackle, unless it’s extended [19]. However, we are not aware of any prior attempts to capture CAL, QC and QQC-like properties of concurrent executions by means of one and the same program logic and employ them in client-side reasoning.

Several logics for proving linearizability or, equivalently, observational refinement [15, 49], have been proposed recently [33, 47, 50], all employing variations of the idea of using specifications as resources, and identifying (possibly, non-fixed or non-local) linearization points, at which such specification should be “run”. In these logics, after establishing linearizability of an operation, one must still devise its Hoare-style spec, such that the spec is useful for the clients.

Similarly to the way linearizability allows one to replace a concurrent operation by an atomic one, several logics have implemented the notion of logical atomicity, allowing the clients of a data structure to implement application-specific synchronization on top of the data structure operations. Logical atomicity can be implemented either by parametrizing specs with client-specific auxiliary code [27, 30, 44, 45] or by engineering dedicated rules relying on the simulation between the actual implementation and the “atomic” one [9].

Instead of trying to extend the existing approaches for logical atomicity to non-linearizable objects (for which the notion of atomicity is not intuitive), we relied on a general mechanism of auxiliary state, provided by FCSL [35]. Specifically, we adopted the idea of histories as auxiliary state [41], which, however, was previously explored in the context of FCSL only for specifying linearizable structures. We introduced enhanced notation for referring directly to histories (e.g., \( \chi_S \), \( \chi_O \)), although FCSL’s initial logical infrastructure and inference rules remained unchanged.

In this work, we do not argue that FCSL is the only logic capable of encoding custom correctness conditions and their combinations, though, we are not aware of any other work exploring a similar possibility. However, we believe that FCSL’s explicit other subjective state component provides the most straightforward way to do so. The logics like CAP [11] and TaDA [9], from our experience and personal communication with their authors, may be capable of implementing our approach at the expense of engineering a complicated structure of capabilities to encode histories and “snapshot” interference of an environment. Other logics incorporating the generic PCM structure [29, 30, 37, 48] might be able to implement our approach, although none of these logics provide an FCSL-style rule for hiding (12) as a uniform mechanism to express explicit quiescence.

Concurrently with this work, Hemed et al. developed a (not yet mechanized) verification technique for CAL [22], which they applied to the exchanger and the elimination stack. Similarly to our proposal, they specify CAL-objects via Hoare logic, but using one global auxiliary history, rather than subjective auxiliary state. This tailors their system specifically to CAL (without a possibility to incorporate reasoning about other, non CAL-linearizable, concurrent structures), and to programs with a fixed number of threads. In contrast, FCSL supports dynamic thread creation, and is capable of uniformly expressing and mechanically verifying several different criteria, with CAL merely a special case, obtained by a special choice of PCM. Moreover, in FCSL the criteria combine, as illustrated in Section 5, where we combined quiescence with CAL via hiding. Hiding is crucial for verifying clients with explicit concurrency, but is currently unsupported by Hemed et al.’s method.

11. Conclusion and Future Work

We have presented a number of formalization techniques, enabling specification and verification of non-linearizable concurrent objects and their clients in Hoare-style program logics. All the explored reasoning patterns involve the idea of formulating execution histories as auxiliary state, capturing the expected concurrent behavior. We have discovered that quantitative logic-based reasoning about concurrent behaviors can be done by storing relevant information about interference directly into the entries of a logical history.

We believe that our results help to bring the Hoare-style reasoning into the area of non-linearizable concurrent objects and open a number of exciting opportunities for the field of mechanized logic-based concurrency verification.

For instance, in this paper we have deliberately chosen to focus on simple client programs to showcase the specs we gave to concurrent libraries. However, any larger program incorporating these examples can be verified compositionally in FCSL, out of these clients’ specs, via the substitution principles of FCSL [35, 40], without the need to deal with concepts such as histories and tokens that are specific to particular libraries. We believe that the reasoning patterns we have described will be useful for mechanical verification of larger weakly-synchronized approximate parallel computations [38], exploiting the QC and QQC-like behavior.

Furthermore, by ascribing interference-sensitive quantitative specs in the spirit of (17) to relaxed concurrent libraries [23], one can assess the applicability of a library implementation for its clients: the clients should tolerate the anomalies caused by interference, as long as they can logically infer the desired safety assertions from a library spec, which is fine-tuned for particular usage scenarios.

Acknowledgements We thank the anonymous reviewers from OOPSLA’16 PC and AEC for their feedback. We are grateful to Yannis Smaragdakis for his efforts as OOPSLA PC chair and to Sophia Drossopoulou for her dedication to bring out the best of the paper. This research was partially supported by the Spanish MINECO project RISCO (TIN2015-71819-P) and the US National Science Foundation (NSF). Any opinion, findings, and conclusions or recommendations expressed in the material are those of the authors and do not necessarily reflect the views of NSF.
A. Exchanger Invariants and Proof Outline

In this section, we formally define the exchanger’s state invariants, and present the proof outline for its spec (10).

Additional exchanger invariants The states in the exchanger-state space must satisfy other invariants in addition to (9). These properties arise from our description of how the exchanger behaves on decorated state. We abbreviate with $p \mapsto (x; y)$ the heap $p \mapsto x \cup p + 1 \mapsto y$.

(i) $h_1$ contains a pointer $g$ and a number of offers $p \mapsto (v; x)$, and $g$ points to either null or to some offer in $h_j$.

(ii) $\chi_S, \chi_O$ and $|m_1|$ contain only disjoint-time stamps. Similarly, $\pi_S$ is disjoint from $\pi_O$.

(iii) All offers in $m_1$ are matched and owned by some thread: $\exists, p \mapsto (t, v, w) \subseteq m_1 \Rightarrow p \in \pi_S \cup \pi_O, p \mapsto (v; M w) \subseteq h_j$.

(iv) There is at most one unmatched offer; it is the one linked from $g$. It is owned by someone: $p \mapsto (v; U) \subseteq h_j \Rightarrow p \in \pi_S \cup \pi_O, g \mapsto p \subseteq h_j$.

(v) Retired offers aren’t owned: $p \mapsto (v; R) \subseteq h_1 \Rightarrow p \notin \pi_S \cup \pi_O$.

(vi) The outstanding offers are included in the joint heap, i.e., if $p \in \pi_S \cup \pi_O$ then $p \in \text{dom } h_j$.

(vii) The combined history $\chi_S \cup \chi_O \cup |m_1|$ is gapless: if it contains a time-stamp $t$, it also contains all the smaller time-stamps (sans 0).

Explaining the proof outline Figure 10 presents the proof outline for the spec (10). We start with the precondition, and after allocation in line 2, $h_0$ stores the offer $p$ in line 3.

If $\mathsf{CAS}$ at line 4 succeeds, the program “installs” the offer; that is, the state (real and auxiliary) is changed simultaneously with the modification of $g$. In particular, $p$ is added to $\pi_S$, and the offer $p$ changes ownership, to move from $h_0$ to $h_j$.

Since $b$ will be bound to null, this leads us to the assertion in line 7. We explain in Section 4 how these kinds of changes to the auxiliary state, which are supposed to occur simultaneously with some atomic operation (in this case, $\mathsf{CAS}$), are specified and verified in FCSL. The assertion in line 7 further states bounded $p \mapsto \eta$. We do not formally define bounded here (it is in the proof scripts, accompanying the paper), but it says that $p$ has been moved to $h_j$, i.e., $p \mapsto (v; -) \subseteq h_j$, and that any time-stamp $t$ at which another thread may match $p$, and thus place the entry $p \mapsto (t, v, -)$ into $m_j$, must satisfy last($\eta$) $< t$, $t$. Intuitively, this property is valid, and stable under interference, because entries in $m_j$ can be added only by generating fresh time-stamps w.t. the collective history $\chi_O \cup |m_j|$, and $\eta$ is a subset of it. If $\mathsf{CAS}$ in line 4 fails, then nothing changes, so we move to the spec in line 15.

At line 8, $\mathsf{CAS}$ succeeds if $x = U$, and fails if $x = M w$. Notice that $x$ cannot be $R$; since we own $p \in \pi_S$, no other thread could retire $p$. If $\mathsf{CAS}$ fails, then the offer has been matched with $w$. $\mathsf{CAS}$ simultaneously “collects” the offer as follows.

By invariant (iii), and bounded $p \mapsto \eta$, the auxiliary map $m_j$ contains an entry $p \mapsto (t, v, w)$, where last($\eta$) $< t$, $t$. The auxiliary state is changed to remove $p$ from $m_j$, and simultaneously place $t \mapsto (v, w)$ into $\chi_S$. If $\mathsf{CAS}$ succeeds, the offer was unmatched, and is “retired” by removing $p$ from $\pi_S$. Lines 12-13 branch on $x$, selecting either the assertion 10 or 11, so the postcondition follows.

After reading $\mathsf{cur}$ in line 18, by invariant (i), we know that $\mathsf{cur}$ either points to null, or to some offer $p \mapsto (w; -) \subseteq h_j$.

At line 24, the $\mathsf{CAS}$ succeeds if $x = U$ and fails otherwise. If $\mathsf{CAS}$ succeeded, then it “matches” the offer in $\mathsf{cur}$; that is, it writes $M w$ into the hole of $\mathsf{cur}$, and changes the auxiliary state as follows. It takes $t$ to be the smallest unused time-stamp in the history $\chi_S \cup \chi_O \cup |m_j|$. Thus last($\chi$) $< t$, and because $\chi$ has even size by invariant (9), $t$ must be odd, and hence $t < t = t + 1$. The $t \mapsto (v, w)$ is placed into $\chi_S$, giving us assertion 26. To preserve the invariant (iii), $\mathsf{CAS}$ simultaneously puts the entry $p \mapsto (t, w, v)$ into $m_j$, for future collection by the thread that introduced offer $\mathsf{cur}$. But, we do not need to reflect this in line 26. If the $\mathsf{CAS}$ fails, the history $\chi_S$ remains empty, as no matching is done. However, the hole $y$ associated with $\mathsf{cur}$ cannot be $U$, as then $\mathsf{CAS}$ would have succeeded. Therefore, it is sound in line 28 to “unlink” $\mathsf{cur}$ from $g$, as the unlinking will not violate the invariant (iv), which says that an unmatched offer must be pointed to by $g$. Finally, lines 30 and 33 select the assertion 26 or 27, and either way, directly imply the postcondition.
References


