Recently, I wrote a Bayesian formulation of Carl Sagan’s famous maxim, ‘Extraordinary claims require extraordinary evidence.’ However, since the aim of that post was not to teach Bayes’ Theorem, but to reply to a criticism of the maxim, I may have left readers unprepared to actually use this theorem. I thought it might be helpful to provide an explanation. The equation looks difficult, but it’s actually quite simple once you understand the symbols. It’s just a matter of figuring out a few numbers.
Bayes’ Theorem
Here is one way to formulate Bayes’ Theorem when you want to know the probability of A given B:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A)}$$
Since A and B might sound a little abstract, let's take a concrete example and walk through the equation step by step.
Breast Cancer
Eliezer Yudkowsky uses the following example[i], which most doctors get wrong when polled:
1% of women at age forty who participate in routine screening have breast cancer. 80% of women with breast cancer will get a positive mammography. 9.6% of women without breast cancer will also get a positive mammography. A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?
So, we want to find the probability of breast cancer (A) given a positive mammography (B). We only need to concern ourselves with four figures: the two in the numerator (which is repeated exactly as the first term of the denominator) and the two in the second term of the denominator. These four figures are:
- P(B│A) = the probability that you will have a positive mammography if you have breast cancer
- P(A) = the probability that you have breast cancer prior to considering the evidence, which is why it is often called the ‘prior probability’
- P(B│¬A) = the probability that you will have a positive mammography if you do not have breast cancer
- P(¬A) = the prior probability that you do not have breast cancer
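It may help to know where the denominator comes from. It is the total probability of seeing the evidence B at all, added up over the two possibilities that A is true or that A is false (the law of total probability). Writing that total as P(B) gives the more compact form of the theorem:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}, \qquad P(B) = P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A)$$

So the formula takes the share of probability where A and B are both true and divides it by all the probability that B is true.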
So, now we should be able to figure out where to plug in the numbers from the initial problem:
- P(B│A) = 0.80
- P(A) = 0.01
- P(B│¬A) = 0.096
- P(¬A) = 0.99
The last figure, P(¬A), was not given explicitly above, but we can always work it out if we know P(A). That's because P(A) + P(¬A) = 1. Think of it this way: the total probability to be divided up among all the options is 1, which is the same as 100%. So whatever the probability that A is true, the probability that A is not true must make up whatever is left of that 1. If there is a 70% chance that A is true (0.7), then there is a 30% chance that A is not true (0.3). In this particular case, the prior probability of A was 1%, so the prior probability of ¬A is necessarily 99%.
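In symbols, using the prior from the mammography problem:

$$P(\neg A) = 1 - P(A) = 1 - 0.01 = 0.99$$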
Now that we have our numbers matched to the terms, we can return to the original formula:
$$P(A \mid B) = \frac{0.80 \times 0.01}{(0.80 \times 0.01) + (0.096 \times 0.99)}$$
Simplified further, it reads:
$$P(A \mid B) = \frac{0.008}{0.008 + 0.09504} = \frac{0.008}{0.10304} \approx 0.0776$$
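If you'd like to check the arithmetic yourself, a few lines of Python will do it. This is a minimal sketch. The function name `posterior` and its parameter names are my own labels, not part of any library, and it assumes exactly two hypotheses, A and ¬A:

```python
def posterior(p_b_given_a: float, p_a: float, p_b_given_not_a: float) -> float:
    """Return P(A|B) using the two-hypothesis form of Bayes' Theorem."""
    p_not_a = 1 - p_a  # P(¬A) fills out the remainder left by P(A)
    numerator = p_b_given_a * p_a
    denominator = numerator + p_b_given_not_a * p_not_a
    return numerator / denominator


# Breast cancer example: P(B|A) = 0.80, P(A) = 0.01, P(B|¬A) = 0.096
p = posterior(0.80, 0.01, 0.096)
print(f"{p:.4f}")  # prints 0.0776
```

Notice that you only pass three numbers. The function works out P(¬A) on its own, because it is always 1 minus P(A).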
This means that if you are a woman at age 40 and receive a positive mammography, there is actually only a 7.76% chance that you have breast cancer. Only 15% of doctors across several surveys gave the correct answer. Most of them replied that there was an 80% chance that you have breast cancer in this scenario. That’s quite a difference, especially to the woman wondering whether she has cancer!
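Another way to see why the answer is so low is to picture a concrete group of women and count them. The sketch below assumes a hypothetical group of 10,000 women. The group size is my own choice, and any size gives the same ratio. It then applies the percentages from the problem:

```python
# Hypothetical population size; only the ratios matter
women = 10_000

with_cancer = women * 0.01            # 100 women have breast cancer
without_cancer = women - with_cancer  # 9,900 women do not

true_positives = with_cancer * 0.80       # 80 positives among women with cancer
false_positives = without_cancer * 0.096  # about 950 positives among women without cancer

share = true_positives / (true_positives + false_positives)
print(f"{true_positives:.0f} of {true_positives + false_positives:.0f} positive results are true positives")
print(f"P(cancer | positive) = {share:.4f}")
```

Only 80 of roughly 1,030 positive results come from women who actually have cancer. The false positives from the much larger cancer-free group swamp the true ones, and that is exactly what the 0.096 × 0.99 term in the denominator captures.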
Belle and Gaston
That problem is relatively easy, since you are given the numbers. What happens, though, in a fairly ambiguous situation where estimation is required?
Let's say that Gaston has approached Belle for a date and has been rejected on multiple occasions. He is trying to determine whether Belle really likes him and is playing hard to get, or whether she genuinely does not like him. So, we are looking for the probability that Belle likes Gaston (A) given the evidence of her repeated rejections (B). Again, we only need to concern ourselves with four figures, tackled one by one:
- P(B│A) = the probability of rejections if Belle likes Gaston
We can recognize that the first figure should be quite low, given that Belle is known to be a very honest and straightforward person. If Belle actually likes Gaston, we would expect an honest admission of this, rather than repeated rejections. However, women can be funny in showing their affection, so I would say it’s still greater than 0. Let’s place the probability at 0.05. It’s very low, but leaves a little room for error in case our perception of Belle has been mistaken.
- P(A) = the prior probability that Belle likes Gaston
The second figure should be high, given Gaston's popularity with women in general. Remember that in estimating this number, the evidence, such as the rejections, should not count. In other words, there is no B in this part of the equation. So, let's err on the side of Gaston's prominent features (good looks, size, fighting ability, egg consumption) and place the probability at 0.9.
- P(B│¬A) = the probability of rejections if Belle does not like Gaston
The third figure should be very high for the reasons previously discussed about Belle’s honest manner. If Belle does not like Gaston, then it is almost certain she would reject him. Let’s place that probability at 0.99.
- P(¬A) = the prior probability that Belle does not like Gaston
Finally, the last figure is 0.1. Remember, it simply fills out the remainder left by P(A): 1 - 0.9 = 0.1.
So, now we can plug in the numbers:
$$P(A \mid B) = \frac{0.05 \times 0.9}{(0.05 \times 0.9) + (0.99 \times 0.1)}$$
Simplified further, it reads:
$$P(A \mid B) = \frac{0.045}{0.045 + 0.099} = \frac{0.045}{0.144} = 0.3125$$
Even though Gaston finds himself desired by women in the overwhelming majority of cases, the evidence here is strong enough that he should conclude Belle probably does not like him. There is only a 31.25% chance that she does in fact like him (A) given her rejections (B), which leaves a 68.75% chance that she does not.
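Because every figure in this example is an estimate, it's worth checking how much the answer depends on them. Here is a minimal sketch that uses the same two-hypothesis formula. The function name `posterior` is my own label, not a library call. It varies only the guess for P(B│A) and leaves the other estimates as they were:

```python
def posterior(p_b_given_a, p_a, p_b_given_not_a):
    """P(A|B) for two hypotheses, A and ¬A."""
    numerator = p_b_given_a * p_a
    return numerator / (numerator + p_b_given_not_a * (1 - p_a))


# Estimates from the Belle and Gaston example
print(f"{posterior(0.05, 0.9, 0.99):.4f}")  # prints 0.3125

# P(B|A) is the shakiest guess, so try a few alternatives
for p_b_given_a in (0.01, 0.10, 0.20, 0.50):
    print(f"P(B|A) = {p_b_given_a:.2f} -> P(A|B) = {posterior(p_b_given_a, 0.9, 0.99):.3f}")
```

Even doubling the guess for P(B│A) to 0.10 only raises the answer to about 48%. A fairly generous reading of Belle still leaves Gaston at worse than even odds.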
Conclusion
If you came into this not knowing much about Bayes' Theorem, hopefully you now understand it a bit more. The math involved here is really not that difficult. You simply have to determine four numbers and then let the formula do the heavy lifting. I would recommend plugging in some numbers to see what happens to the outcome. For example, when prior probabilities are very high or very low, it's hard to overcome them with evidence. This was the essential point in my earlier post about extraordinary evidence.
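To take that suggestion literally, here is a small sketch that holds the mammography evidence fixed (P(B│A) = 0.80, P(B│¬A) = 0.096) and changes only the prior P(A). The list of priors is arbitrary and chosen only for illustration, and `posterior` is again my own helper rather than a library function:

```python
def posterior(p_b_given_a, p_a, p_b_given_not_a):
    """P(A|B) for two hypotheses, A and ¬A."""
    numerator = p_b_given_a * p_a
    return numerator / (numerator + p_b_given_not_a * (1 - p_a))


# Same evidence every time; only the prior P(A) changes
for prior in (0.0001, 0.001, 0.01, 0.1, 0.5, 0.9):
    print(f"prior {prior:<7} -> posterior {posterior(0.80, prior, 0.096):.4f}")
```

With a prior of 1 in 10,000, a positive result lifts the probability to less than 0.1%. With a prior of 0.5, the very same evidence lifts it to about 89%. The evidence is identical in both cases. Only the starting point differs.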