Xr0

C But Safe

Blog

Theses

Tutorial

Try

A first example of static debugging in Xr0

As programmers we reason about our programs using very simple-minded models, and this is right. The computer is simple — it understands nothing — and we ourselves are subject to great limitations of mind. Perhaps the day will come when AI-based systems will pick up on human intuitions, but until then, the hallmark of good programming will be humility:

The competent programmer is fully aware of the strictly limited size of his own skull; therefore he approaches the programming task in full humility, and among other things he avoids clever tricks like the plague.
– Dijkstra, The Humble Programmer.

Turing compared the process of reasoning about a subroutine’s correctness to that of summing a list of numbers. Each operation is simple — if you know arithmetic you know how to sum two digits — but if there are enough numbers in the list, errors will tend to creep in. In the same way, it is easy enough to reason about each line of code in a function like this one:

#include <stdlib.h>

int
func(int flag, int x, int y)
{
	int k; int *p;

	p = &k;
	if (flag) {
		*p = x;
	} else {
		*p = y;
	}
	return k;
}

The challenge in making entire functions and entire programs correct is there are so, so, so many opportunities to make a small mistake that throws off everything. Every line is simple, but the whole program is impossible to understand. E pluribus unum may be the politician’s motto, but the rule in programming is plures ex uno, complexity arising out of simplicity.¹ The interplay between these two realities when we reason about programs, local simplicity and composite inscrutability, is a fascinating paradox: we may well call it Turing’s paradox.

Understanding the paradox shows why static debugging (and Xr0, more broadly) is possible. Programmers do not need an ingenious system to help them analyse their code. We already understand it, i.e. line-by-line. What we need is the aid of a simple-minded accountant that can help us check our reasoning, to prevent errors from creeping in as the parts of the program multiply.

All Xr0 does, then, is attempt to model the locally-simple reasoning that programmers employ in their analysis of code. Below is what Xr0 considers as knowable on the line p = &k; in func above:

phase:	ACTUAL

text:
        int k;
        int * p;
-->     p = &(k);
        if (flag) {
                *(p) = x;
        } else {
                *(p) = y;
        }
        return k;

context:
        typedef int size_t

rconst:
        $0: int:{MIN:MAX}       (flag)
        $1: int:{MIN:MAX}       (x)
        $2: int:{MIN:MAX}       (y)

stack:
        0: {int := {0:<rconst:$0>}} (flag, π)
        1: {int := {0:<rconst:$1>}} (x, π)
        2: {int := {0:<rconst:$2>}} (y, π)
        3: {int := {0:<>}}          (k)
        4: {int * := {0:<>}}        (p)
        ---------------------------- func

The text section is quite intuitive: it shows us where we are in the program. Some things appear slightly differently from the source, because we do some normalising of representations during parsing. For example, the declarations have been placed on separate lines.

The context shows us the user-defined² types; this is where structures and union types would be. (Technically, size_t should be an unsigned integer, but we haven’t implemented unsigned yet.)

The next section, rconst, has what we call runtime constants. These are values which are fixed at runtime but statically unknowable. For example, at runtime the values of flag, x and y can each be any integer between INT_MIN and INT_MAX.³

Finally we get to the stack. The lowest line indicates that we are in the stack frame corresponding to func, which happens to be stack[0], the only stack frame. Above this line we have the variables in the frame, in order of appearance. The parameters are technically also local variables, so we begin with them: flag, x, and y all have the tag “π” for “parameter”, and their values are preceded by “rconst:” to indicate they’re runtime constants. Last of all we have k and p, which both have <> as their value, indicating indeterminate value, which is distinct from 0 or NULL.⁴

The effect of the assignment on the state is as follows:

phase:	ACTUAL

text:
        int k;
        int * p;
        p = &(k);
-->     if (flag) {
                *(p) = x;
        } else {
                *(p) = y;
        }
        return k;

context:
        typedef int size_t

rconst:
        $0: int:{MIN:MAX}       (flag)
        $1: int:{MIN:MAX}       (x)
        $2: int:{MIN:MAX}       (y)

stack:
        0: {int := {0:<rconst:$0>}}        (flag, π)
        1: {int := {0:<rconst:$1>}}        (x, π)
        2: {int := {0:<rconst:$2>}}        (y, π)
        3: {int := {0:<>}}                 (k)
        4: {int * := {0:<ptr:stack[0]:3>}} (p)
        ---------------------------- func

p has been assigned the value ptr:stack[0]:3, a pointer to the third spot in stack[0], which is k’s address.

At this point we are confronted with something of a difficulty. The controlling expression of the if-statement is flag, but, as we stated above, flag’s value is unknowable until runtime. Xr0 resolves the difficulty by employing simple-minded reasoning, similar to the way you would say, “well if flag is true, the first branch will execute; otherwise, the second branch will run”. Xr0 accordingly splits the state into two realities, into parallel universes: one in which $0, flag’s value at runtime, is logically true (nonzero) and one in which it’s false (zero).

It begins with the “true path”, incorporating the assumption that $0 is nonzero, which assumption guarantees execution of the first branch of the if-statement. Thus x’s value is assigned to k through the indirection *p:

phase:	ACTUAL

text:
        int k;
        int * p;
        p = &(k);
        if (flag) {
                *(p) = x;
        } else {
                *(p) = y;
        }
-->     return k;

context:
        typedef int size_t

rconst:
        $0: int:{MIN:MAX}       (flag)
        $1: int:{MIN:MAX}       (x)
        $2: int:{MIN:MAX}       (y)

assume:
        ⊢ $0

stack:
        0: {int := {0:<rconst:$0>}}        (flag, π)
        1: {int := {0:<rconst:$1>}}        (x, π)
        2: {int := {0:<rconst:$2>}}        (y, π)
        3: {int := {0:<rconst:$1>}}        (k)
        4: {int * := {0:<ptr:stack[0]:3>}} (p)
        ---------------------------- func | $0

With the execution of the return statement we’re done with this branch:

phase:	ACTUAL

text:
        <end of frame>

context:
        typedef int size_t

return:	<rconst:$1>

rconst:
        $0: int:{MIN:MAX}       (flag)
        $1: int:{MIN:MAX}       (x)
        $2: int:{MIN:MAX}       (y)

assume:
        ⊢ $0

stack:
        0: {int := {0:<rconst:$0>}}        (flag, π)
        1: {int := {0:<rconst:$1>}}        (x, π)
        2: {int := {0:<rconst:$2>}}        (y, π)
        3: {int := {0:<rconst:$1>}}        (k)
        4: {int * := {0:<ptr:stack[0]:3>}} (p)
        ---------------------------- func | $0

At this point Xr0 goes down the “false branch” corresponding to the assumption that $0 is logically false:

phase:	ACTUAL

text:
        int k;
        int * p;
        p = &(k);
        if (flag) {
                *(p) = x;
        } else {
                *(p) = y;
        }
-->     return k;

context:
        typedef int size_t

rconst:
        $0: int:{MIN:MAX}       (flag)
        $1: int:{MIN:MAX}       (x)
        $2: int:{MIN:MAX}       (y)

assume:
        ⊢ !($0)

stack:
        0: {int := {0:<rconst:$0>}}        (flag, π)
        1: {int := {0:<rconst:$1>}}        (x, π)
        2: {int := {0:<rconst:$2>}}        (y, π)
        3: {int := {0:<rconst:$2>}}        (k)
        4: {int * := {0:<ptr:stack[0]:3>}} (p)
        ---------------------------- func | !($0)

Accordingly, the else branch is now taken, and we get rconst:$2 as k’s value this time. The execution of the return statement is similar to the previous case:

phase:	ACTUAL

text:
        <end of frame>

context:
        typedef int size_t

return:	<rconst:$2>

rconst:
        $0: int:{MIN:MAX}       (flag)
        $1: int:{MIN:MAX}       (x)
        $2: int:{MIN:MAX}       (y)

assume:
        ⊢ !($0)

stack:
        0: {int := {0:<rconst:$0>}}        (flag, π)
        1: {int := {0:<rconst:$1>}}        (x, π)
        2: {int := {0:<rconst:$2>}}        (y, π)
        3: {int := {0:<rconst:$2>}}        (k)
        4: {int * := {0:<ptr:stack[0]:3>}} (p)
        ---------------------------- func | !($0)

So there you have it: we’ve stepped-through all the possibilities that could arise when executing func, instead of the miniscule cases that we could examine using GDB.

Continue to Anatomy of the debugging (and verification) process or return to the contents page.

E pluribus unum: “out of the many, one”, one of the mottos of the US. Plures ex uno: many out of the one. ↩︎
User-defined in the sense that it appears within a header file. It is not a language primitive in the strict sense, even though it is required by the Standard. All source-defined structures, unions and typedefs fall under this umbrella of user-defined. ↩︎
The ranges, such as MIN:MAX include the lower bound and exclude the upper bound. The values of MIN and MAX correspond to the type, so here MIN is INT_MIN, and MAX is INT_MAX+1. ↩︎
The Standard puts it like this in 3.5.7:

If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.
↩︎