# Probabilities in Proofreading

Suppose you write a program and you send the source code to two of your
friends, ${\cal A}$ and ${\cal B}$. Your two friends read the code and
when they finish, $A$ errors are detected by ${\cal A}$, $B$ errors are
detected by ${\cal B}$, and $C$ errors are detected by both. So, in total,
$A+B−C$ errors are detected and can now be eliminated. We
wish to *estimate* the number of errors that remain unnoticed and uncorrected.

The original version of this problem concerns manuscripts and proofreaders,
instead of source code and programmers. It was posed and solved by George Polya
and published in 1976 on The American Mathematical Monthly
under the name of
*Probabilities in Proofreading*. Because the problem is interesting and Polya’s
solution is short and elegant, I have decided to record and share it. Also,
since code sharing and reading is a frequent activity in the software
development world, estimating the desired value can be helpful for some readers
of this blog.

## Estimating the number of unnoticed errors

Let $E$ be the number of all errors, noticed and unnoticed, in the source code. Our goal is to estimate the value of $E-(A+B-C)$. Let $p$ be the probability that friend ${\cal A}$ notices any given error and $q$ the analogous probability for friend ${\cal B}$. The expected number of errors that may be detected by ${\cal A}$ is $E{\cdot}p$ and by ${\cal B}$ is $E{\cdot}q$. Assuming that these probabilities are independent, the expected number of errors that may be mutually detected by both friends is $E{\cdot}p{\cdot}q$.

Because we are interested in an *estimate*, we can safely assume that
the expected numbers are approximately equal to the number of errors detected,
that is, $E{\cdot}p \sim A$, $E{\cdot}q \sim B$, and $E{\cdot}p{\cdot}q \sim
C$. (We use the notation $\sim$ to denote that two numbers are approximately
equal.)

We now have all the ingredients to conclude the solution. Recall that our goal is to estimate the value of $E-(A+B-C)$. We calculate:>

$$ \beginproof \pexp{E-(A+B-C)} \hint{=}{we rewrite $E$ to prepare the introduction of $A$, $B$, and $C$} \pexp{\frac{E{\cdot}p{\cdot}E{\cdot}q}{E{\cdot}p{\cdot}q} - (A+B-C)} \hint{\sim}{assumption on estimates} \pexp{\frac{A{\cdot}B}{C} - (A+B-C)} \hint{=}{arithmetic} \pexp{\frac{(A-C){\cdot}(B-C)}{C}~~.} \endproof $$

This is the desired estimate!