Probabilities in Proofreading

Suppose you write a program and you send the source code to two of your friends, ${\cal A}$ and ${\cal B}$. Your two friends read the code and when they finish, $A$ errors are detected by ${\cal A}$, $B$ errors are detected by ${\cal B}$, and $C$ errors are detected by both. So, in total, $A+B-C$ errors are detected and can now be eliminated. We wish to estimate the number of errors that remain unnoticed and uncorrected.

The original version of this problem concerns manuscripts and proofreaders, instead of source code and programmers. It was posed and solved by George Polya and published in 1976 on The American Mathematical Monthly under the name of Probabilities in Proofreading. Because the problem is interesting and Polya’s solution is short and elegant, I have decided to record and share it. Also, since code sharing and reading is a frequent activity in the software development world, estimating the desired value can be helpful for some readers of this blog.

Estimating the number of unnoticed errors

Let $E$ be the number of all errors, noticed and unnoticed, in the source code. Our goal is to estimate the value of $E-(A+B-C)$. Let $p$ be the probability that friend ${\cal A}$ notices any given error and $q$ the analogous probability for friend ${\cal B}$. The expected number of errors that may be detected by ${\cal A}$ is $E{\cdot}p$ and by ${\cal B}$ is $E{\cdot}q$. Assuming that these probabilities are independent, the expected number of errors that may be mutually detected by both friends is $E{\cdot}p{\cdot}q$.

Because we are interested in an estimate, we can safely assume that the expected numbers are approximately equal to the number of errors detected, that is, $E{\cdot}p \sim A$, $E{\cdot}q \sim B$, and $E{\cdot}p{\cdot}q \sim C$. (We use the notation $\sim$ to denote that two numbers are approximately equal.)

We now have all the ingredients to conclude the solution. Recall that our goal is to estimate the value of $E-(A+B-C)$. We calculate:

\[
\beginproof
\pexp{E-(A+B-C)}
\hint{=}{we rewrite $E$ to prepare the introduction of $A$, $B$, and $C$}
\pexp{\frac{E{\cdot}p{\cdot}E{\cdot}q}{E{\cdot}p{\cdot}q} - (A+B-C)}
\hint{\sim}{assumption on estimates}
\pexp{\frac{A{\cdot}B}{C} - (A+B-C)}
\hint{=}{arithmetic}
\pexp{\frac{(A-C){\cdot}(B-C)}{C}~~.}
\endproof
\]

This is the desired estimate!

Related Posts:

A Calculational Proof of the Handshaking Lemma

UPDATE (2011/09/20): This post was superseded by An improved proof of the handshaking lemma.

In graph theory, the degree of a vertex $A$, $\fapp{d}{A}$, is the number of edges incident with the vertex $A$, counting loops twice. So, considering Graph 0 below, we have $\fapp{d}{A}=3$, $\fapp{d}{B}=3$, $\fapp{d}{C}=1$, $\fapp{d}{D}=3$, and $\fapp{d}{E}=2$.

Example of an undirected graph with five nodes

Graph 0: Example of an undirected graph with five nodes

A well-known property is that every undirected graph contains an even number of vertices with odd degree. The result first appeared in Euler’s 1736 paper on the Seven Bridges of Königsberg and is also known as the handshaking lemma (that’s because another way of formulating the property is that the number of people that have shaken hands an odd number of times is even).

As we can easily verify, Graph 0 satisfies this property. There are four vertices with odd degree ($A$,$B$, $C$, and $D$), and 4, of course, is an even number.

Although the proof of this property is simple, I have never seen it proved in a calculational and goal-oriented way. My aim with this post is to show you a development of a goal-oriented proof.
Continue reading

Related Posts:

Multiples in the Fibonacci series

I found the following problem on K. Rustan M. Leino’s puzzles page:

[Carroll Morgan told me this puzzle.]

Prove that for any positive K, every Kth number in the Fibonacci sequence is a multiple of the Kth number in the Fibonacci sequence.

More formally, for any natural number n, let F(n) denote Fibonacci number n. That is, F(0) = 0, F(1) = 1, and F(n+2) = F(n+1) + F(n). Prove that for any positive K and natural n, F(n*K) is a multiple of F(K).

This problem caught my attention, because it looks like a good example for using a result that I have derived last year. My result gives a reasonable sufficient condition for showing that a function distributes over the greatest common divisor and shows that the Fibonacci function satisfies the condition.

In fact, using the property that the Fibonacci function distributes over the greatest common divisor, we can solve this problem very easily. Using $\fapp{fib}{n}$ to denote the Fibonacci number $n$, $m{\nabla}n$ to denote the greatest common divisor of $m$ and $n$, and $\setminus$ to denote the division relation, a possible proof is:

\[
\beginproof
\pexp{\text{$\fapp{fib}{(n{\times}k)}$ is a multiple of $\fapp{fib}{k}$}}
\hint{=}{definition}
\pexp{\fapp{fib}{k} \setminus \fapp{fib}{(n{\times}k)}}
\hint{=}{rewrite in terms of $\nabla$}
\pexp{\fapp{fib}{k} ~\nabla~ \fapp{fib}{(n{\times}k)} ~=~ \fapp{fib}{k}}
\hint{=}{$fib$ distributes over $\nabla$}
\pexp{\fapp{fib}{(k{\nabla}(n{\times}k))} = \fapp{fib}{k}}
\hint{=}{$k{\nabla}(n{\times}k) = k$}
\pexp{\fapp{fib}{k} = \fapp{fib}{k}}
\hint{=}{reflexivity}
\pexp{true~~.}
\endproof
\]

The crucial step is clearly the one where we apply the distributivity property. Distributivity properties are very important, because they allow us to rewrite expressions in a way that prioritizes the function that has the most relevant properties. In the example above we could not simplify $\fapp{fib}{k}$ nor $\fapp{fib}{(n{\times}k)}$, but applying the distributivity property prioritised the $\nabla$ operator — and we know how to simplify $k{\nabla}(n{\times}k)$. Furthermore, in practice, distributivity properties reduce to simple syntactic manipulations, thus reducing the introduction of error and simplifying the verification of our arguments.

(Now that I think about it, perhaps it would be a good idea to write a note on distributivity properties, summarizing their importance and their relation with symbol dynamics.)

If you have any corrections, questions, or alternative proofs, please leave a comment!

Related Posts:

Calculational proofs are usually direct

jd2718 asked in his blog if anyone knew a direct proof of the irrationality of $\sqrt{2}$. In this post I present a proof that, even if some don’t consider it direct, is a nice example of the effectiveness of calculational proof. But first, there are two concepts that need to be clarified: direct proof and irrational number.

Direct proofs

The concept of direct proof can vary slightly from person to person. For instance, Wikipedia defines it as:

In mathematics and logic, a direct proof is a way of showing the truth or falsehood of a given statement by a straightforward combination of established facts, usually existing lemmas and theorems, without making any further assumptions.

Alternatively, in Larry W. Cusick’s website we can read:

A direct poof [sic] should be thought of as a flow of implications beginning with “P” and ending with “Q”.

P -> … -> Q

Most proofs are (and should be) direct proofs. Always try direct proof first, unless you have a good reason not to.

I consider the wording ‘without making any further assumptions‘ in the first definition ambiguous and I don’t understand why the second definition only applies to implications. But anyway, with these definitions in mind, a direct proof for the irrationality of $\sqrt{2}$ can be something like:

\[
\beginproof
\pexp{\text{$\sqrt{2}$ is irrational}}
\hint{=}{justification}
\pexp{true~~.}
\endproof
\]

Or, alternatively, we can also use a proof of the following shape:

\[
\beginproof
\pexp{\text{$\sqrt{2}$ is irrational}}
\hint{\Leftarrow}{justification}
\pexp{true~~.}
\endproof
\]

Irrational numbers

An irrational number is a real number that can’t be expressed as a simple fraction. Therefore, the number $\sqrt{2}$ is irrational because for all integers $m$ and $n$, with $n$ non-negative, we have that:

\[
\sqrt{2} \neq \frac{m}{n} ~~.
\]

A direct proof for the irrationality of $\sqrt{2}$

Now that we have clarified the concepts above, we prove that $\sqrt{2}$ is irrational. For all integers $m$ and $n$, with $n$ non-negative, we have:

\[
\beginproof
\pexp{\sqrt{2} \neq \frac{m}{n}}
\hint{=}{Use arithmetic to eliminate the square root operator.}
\pexp{{n^2{\times}2}\neq{m^2}}
\hintf{\Leftarrow}{Two values are different if applying the same function to them}
\hintl{yields different values.}
\pexp{\fapp{exp}{(n^2{\times}2)} \neq \fapp{exp}{m^2}}
\hintf{=}{Now we choose the function $exp$.}
\hintm{Let $\fapp{exp}{k}$ be the number of times that $2$ divides $k$.}
\hintm{The function $exp$ has two important properties:}
\hintm{$~~~\fapp{exp}{2}=1$ and}
\hintm{$~~~\fapp{exp}{k{\times}l} = \fapp{exp}{k} + \fapp{exp}{l}$}
\hintl{We apply these properties to simplify the left and right sides.}
\pexp{1{\,+\,}2{\times}\fapp{exp}{n} ~\neq~ 2{\times}\fapp{exp}{m}}
\hintf{=}{The left side is an odd number and the right side is an even}
\hintl{number. Odd numbers and even numbers are different.}
\pexp{true ~~.}
\endproof
\]

Note that, unlike traditional proofs, we don’t assume that $m$ and $n$ are co-prime, nor that $\sqrt{2}$ is a rational number. We simply derive the boolean value of the expression $\sqrt{2}~{\neq}~{\frac{m}{n}}$.

If you have any suggestions or corrections, please leave a comment. I’d be more than happy to hear from you.

Note: I learnt the contrapositive of this proof from Roland Backhouse (page 38, Program Construction — Calculating Implementations from Specifications).

Related Posts: