Probabilities in Proofreading

Suppose you write a program and you send the source code to two of your friends, ${\cal A}$ and ${\cal B}$. Your two friends read the code and when they finish, $A$ errors are detected by ${\cal A}$, $B$ errors are detected by ${\cal B}$, and $C$ errors are detected by both. So, in total, $A+B-C$ errors are detected and can now be eliminated. We wish to estimate the number of errors that remain unnoticed and uncorrected.

The original version of this problem concerns manuscripts and proofreaders, instead of source code and programmers. It was posed and solved by George Polya and published in 1976 on The American Mathematical Monthly under the name of Probabilities in Proofreading. Because the problem is interesting and Polya’s solution is short and elegant, I have decided to record and share it. Also, since code sharing and reading is a frequent activity in the software development world, estimating the desired value can be helpful for some readers of this blog.

Estimating the number of unnoticed errors

Let $E$ be the number of all errors, noticed and unnoticed, in the source code. Our goal is to estimate the value of $E-(A+B-C)$. Let $p$ be the probability that friend ${\cal A}$ notices any given error and $q$ the analogous probability for friend ${\cal B}$. The expected number of errors that may be detected by ${\cal A}$ is $E{\cdot}p$ and by ${\cal B}$ is $E{\cdot}q$. Assuming that these probabilities are independent, the expected number of errors that may be mutually detected by both friends is $E{\cdot}p{\cdot}q$.

Because we are interested in an estimate, we can safely assume that the expected numbers are approximately equal to the number of errors detected, that is, $E{\cdot}p \sim A$, $E{\cdot}q \sim B$, and $E{\cdot}p{\cdot}q \sim C$. (We use the notation $\sim$ to denote that two numbers are approximately equal.)

We now have all the ingredients to conclude the solution. Recall that our goal is to estimate the value of $E-(A+B-C)$. We calculate:

\hint{=}{we rewrite $E$ to prepare the introduction of $A$, $B$, and $C$}
\pexp{\frac{E{\cdot}p{\cdot}E{\cdot}q}{E{\cdot}p{\cdot}q} - (A+B-C)}
\hint{\sim}{assumption on estimates}
\pexp{\frac{A{\cdot}B}{C} - (A+B-C)}

This is the desired estimate!

Related Posts: