Thesis 1 reads ‘It is easy to write safe C code in theory, but impossible in practice’. This statement is somewhat paradoxical, but contains the key to understanding why a system like Xr0 is possible. Both parts of this statement are controversial, the first to critics (sworn enemies) of C and the second to large numbers of its capable users (devotees).
What is meant in the first part of the thesis is that the vast majority, if not all, of the safety vulnerabilities that exist in programs written by competent C programmers (whom we shall call ‘programmers’ from now on, to avoid drudgery) are what may be called minor errors. Minor, not in their costs, for
‘…the cost of error in certain types of program may be almost incalculable—a lost spacecraft, a collapsed building, a crashed aeroplane, or a world war.’ [1]
Rather, these errors are minor in that they do not reflect genuine misunderstandings of C’s semantics. In particular, we may focus upon dormant or latent errors, those errors which will only be detected when the program is run under circumstances or with inputs that the programmer does not test for.
When a NULL-dereference, or buffer overflow, or leak, or double free, or other similar error occurs, it is not usually the case that the programmer is unfamiliar with the kind of error, or misunderstands its causes, for he is able to fix it the moment it is shown to him. There is nothing particularly complicated about these errors, and the challenge is not one of understanding, but of absolutely rigorous, never-failing consistency. To prevent leaks and double-frees, for example, programmers are forced to develop conventions around memory ownership and apply these uniformly.
Unsafe:
int *
foo(int x);

void
bar(int x)
{
	int *p = foo(x);
	/* should p be freed? */
}
‘Safe’:
/* foo: Return a reference to the heap if x > 0 */
int *
foo(int x);

void
bar(int x)
{
	int *p = foo(x);
	if (x > 0) {
		free(p);
	}
}
It is possible, as in the unsafe example above, for a programmer to misunderstand the semantics of a given function, like foo, and call it without realising that it allocates memory on the heap. He may do this and introduce a leak unwittingly into his program. But upon being shown his mistake, he has no fundamental difficulty in freeing the memory as necessary, or indeed, restructuring the functions to avoid the error:
/* new_foo: Same as foo but will never return reference to heap */
int *
new_foo(int x);

void
bar(int x)
{
	int *p = new_foo(x);
}

/* foo: Return a reference to the heap if x > 0 */
int *
foo(int x);

int *
new_foo(int x)
{
	int *p = foo(x);
	if (x > 0) {
		free(p);
		p = NULL;
	}
	return p;
}
The case is analogous to that of finding the sum of a list of ten (or a hundred) single-digit numbers. [2] Each successive summation is trivial, and if the wrong sum is achieved, no one who is shown the line (or lines) on which he erred will have difficulty admitting and rectifying his mistake. Yet provided the number of lines is high enough, the fallibility of our nature guarantees to us the presence of mistakes. For a human being, flawlessly summing a list of $N$ numbers is easy in theory, but for large $N$, impossible in practice.
We see then that it is possible to write safe C code if one is extremely careful. While this point may be debated by proponents of so-called higher-level languages, their deeds betray them, because, in the final analysis, nearly all these safe languages are compiled by compilers either written in C or bootstrapped from it. A safety vulnerability in the compiler could always lead to a safety vulnerability in programs built by it, so those who claim that there are safer languages than C are expressing their faith in the safety of some programs written in C. [3]
On the other hand, a chain is only as strong as its weakest link, and if we define safety as the absence of the classes of errors described above (NULL-dereference, double free, etc.), a single instance disqualifies a C program as safe. This implies that almost no C program of substantial size is safe. We shall adapt an argument of Dijkstra to clarify this point.
Let us suppose that a programmer is writing a C program. Let us further suppose that this programmer, call him Ken, is extremely competent, indeed outrageously so, and makes minor errors of the kind described above only very rarely. At any given point, the program is composed of $N$ lines of code. Let $p$ denote the probability that Ken will not make a minor error on any given line of code that he writes. One could argue that $p$ should be a monotonically (if not strictly) decreasing function of $N$, but let us assume, to the credit of Ken's incredible skill, that this number remains constant no matter what $N$ is. Let us say then that $p$ is as high as 99.99%, that is, that on average Ken makes a minor error only once in ten thousand lines of code.
It is worth emphasizing how high an average this is. Steve McConnell lists [4] several estimates of error rates. While the industry average is between 1 and 25 errors per 1000 LOC, he mentions that ‘cases that have one-tenth as many errors as this [i.e. cases with Ken's $p$ above] are rare’. He further cites two cases with high values for $p$: a technique introduced by Harlan Mills that can achieve ‘0.1 defects per 1000 lines of code’ (which corresponds to Ken's $p$) and another by Watts Humphrey that achieved defect levels of ‘0.06 defects per 1000 lines of code’. However, Mills's approach involves exhaustive formal specification and hand-verification, so it is hardly comparable to programming as practiced today. Humphrey's case appears to yield a higher value of $p$ until one finds that this figure, 0.06 errors per 1000 LOC, is obtained through testing rather than rigorous verification, meaning that this admittedly remarkable ratio (0.99994) gives an upper bound for $p$ but no lower bound. So our conceiving of Ken as capable of producing code with $p$ at 0.9999 reflects the least error-prone programming imaginable with current techniques.
For what value of $N$ do we begin to predict that Ken's program contains a safety vulnerability? With the above numbers, we have $p^N = 0.5$ with $N$ just under 7000 LOC. [5] So even Ken cannot write a C program much larger than six or seven thousand LOC without (probably) introducing a safety vulnerability.
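To spell the arithmetic out: a rate of 0.1 defects per 1000 lines corresponds to $p = 1 - 0.1/1000 = 0.9999$, and the break-even point quoted above is obtained by solving $p^N = 0.5$ for $N$:

$$ p^N = 0.5 \quad\Longrightarrow\quad N = \log_p(0.5) = \frac{\ln 0.5}{\ln 0.9999} \approx 6931. $$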
Most if not all major OSs are north of 1M LOC. GCC and LLVM are well beyond 1M LOC. PostgreSQL and MySQL are north of this same figure. [6] Now, we mean no disrespect to these projects. The PostgreSQL codebase, in particular, is phenomenal for its organisation and consistency, and we have a lot to learn from these projects. Modern computing relies upon them. But even if Ken wrote one of these projects and passed the 1M mark, his $p^N$ would be down to 3.7e-44, which tells us there are many, many, many safety vulnerabilities in each of these systems, or they have found a way around the exponential curve. [7]
Nevertheless, these figures ought to startle us, chasten our pride, and teach us humility. [8] Dijkstra's argument, in fact, was not about safety vulnerabilities, but bugs of all sorts. Formalising the principle deducible from the argument, we may speak of
Dijkstra's Law. Every non-trivial program (whose correctness has not been proven mathematically) contains bugs. As the number of lines of code in such a program tends to infinity, the number of bugs tends to infinity also. [9]
It may be argued, and not without some merit, [10] that multiple programmers working together could have a higher $p$. OK. Make it Ken & DMR [11] working in unison. Suppose that this brings $p$ up to 99.999% (surely an order of magnitude is a fair upper bound for the benefit of working in a group?). What now is $p^N$ for $N$ equal to one million? 0.00005. The expected number of vulnerabilities? 10. These numbers suggest that even the best teams in the world would struggle to make a program of this size without introducing a significant number of vulnerabilities.
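For readers who would like to check these figures, here is a small stand-alone C program that evaluates $p^N$ and the expected error count $N(1-p)$ for the cases discussed above. It is a sketch for illustration only, not part of Xr0, and the helper name zero_error_probability is ours:

#include <math.h>
#include <stdio.h>

/* zero_error_probability: chance that N lines contain no error, given
 * per-line success probability p */
static double
zero_error_probability(double p, double n)
{
	return pow(p, n);
}

int
main(void)
{
	/* {p, N}: Ken alone at ~7000 and 1M LOC, then Ken & DMR at 1M LOC */
	double cases[3][2] = {
		{0.9999, 6931},
		{0.9999, 1000000},
		{0.99999, 1000000},
	};
	for (int i = 0; i < 3; i++) {
		double p = cases[i][0], n = cases[i][1];
		printf("p=%.5f N=%7.0f  p^N=%.3g  expected errors=%.1f\n",
		    p, n, zero_error_probability(p, n), n * (1 - p));
	}
	return 0;
}

The printed values agree, up to rounding, with the figures quoted above.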
The only solution is to get off the exponential curve entirely, and require that $p = 1$. The fact that the errors which compromise safety in C programs are minor means that it is easy to write a single line (or ten lines) without making one of these errors. It is simply a matter of setting up local assumptions correctly.
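In plain C, setting up local assumptions usually means making them explicit at the point of use, for instance with guards or assertions. A hedged sketch (the function name consume is hypothetical):

#include <assert.h>
#include <stdlib.h>

/* consume: takes ownership of p, which must point to a heap allocation */
void
consume(int *p)
{
	assert(p != NULL);	/* local assumption: p is valid */
	*p = 0;
	free(p);		/* local assumption: ownership was transferred to us */
}

At runtime the assert checks the first assumption, but the ownership assumption lives only in a comment and in the programmer's head.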
This fact means also that automated verification of the safety of a few lines of C code is feasible, provided one is able to state assumptions clearly:
int *
foo(int x) ~ [ if (x > 0) return malloc(sizeof(int)); ];

void
bar(int x)
{
	int *p = foo(x);
	if (x > 0) {
		free(p);
	}
}
Xr0 will give the C programmer the only thing he lacks: an accountant to help him avoid making the errors he already understands. We will have our C and eat it.
[1] Hoare, An Axiomatic Basis for Computer Programming, 1969.
[2] This example is based on Turing's in Checking a Large Routine, 1949.
[3] Thompson, Reflections on Trusting Trust, 1984. To be clear, we are not denying that there are safer languages than C, for we are comfortable expressing our faith in the safety of some programs written in C.
[4] McConnell, Code Complete 2, 2004.
[5] The formula is $N = \log_p(0.5)$, which for $p = 0.9999$ is approximately 6931.125.
[6] The same, if not far worse, is obviously the case for SQL Server, Oracle DB, etc., but the figures are not available online.
[7] In the presence of this startling number 3.7e-44 it is important to remind ourselves that $p^N$ is the probability that there are zero safety vulnerabilities. Although this probability decays exponentially, the number of expected vulnerabilities will grow linearly: with the above per-line error probability of 0.01%, we expect a system with 1M LOC to contain 100 vulnerabilities. Another fact worth bearing in mind is that large systems of the kind referenced are generally complexes of modules that interface with very loose coupling, which can be expected to reduce the effective $N$. But surely it will not reduce $N$ by a factor of 10 or 100, so the exponential trajectories being referred to are legitimate.
[8] ‘The competent programmer is fully aware of the strictly limited size of his own skull; therefore he approaches the programming task in full humility, and among other things he avoids clever tricks like the plague.’ Dijkstra, The Humble Programmer, 1972.
[9] We are using the term bug for the sake of clarity, but do not mean to disagree with Dijkstra in his insistence that the correct word is error, as usage in the rest of the article reflects.
[10] ‘Given a sufficiently large number of eyeballs, all bugs are shallow’, “Linus's Law”, formulated by Eric S. Raymond.
[11] Any resemblance to actual persons is purely coincidental.