Base Rate Error, or Forgetting the Base Frequency

Base rate error

Base rate error refers to the tendency to ignore relevant statistical information, namely the base rate or prior probability of an event, in favor of case-specific information. As a result, people often make inaccurate probability judgments when making decisions. Base rate error is also called base rate neglect, base rate bias, or the base rate fallacy.

Examples

Consider a city with a population of one million. Of these 1,000,000 individuals, 100 are suspected criminals and are on a watch list, while the other 999,900 are presumed non-criminals. To spot these criminals, the city installs CCTV cameras with an automatic facial recognition system that is supposed to trigger an alert whenever a filmed face matches that of one of the 100 listed criminals. Unfortunately, the system is not perfect. Let's assume it has an "error rate of 1 %", or, more precisely, that:

its sensitivity is 99 %, i.e. a false negative rate of 1 % among real offenders;
its specificity is 99 %, i.e. a false positive rate of 1 % among non-offenders.
When an alert is triggered, what is the probability that it involves a listed offender?

If we reason by "ignoring the base frequency", that is, by considering only the "1 % error rate", we hastily conclude that when an alert is triggered there is a 99 % probability that the individual is indeed a criminal. This is incorrect. When counting all alerts, two situations must be considered together:

99 % of offenders trigger the alert, i.e. 99 of the 100 listed offenders (by the definition of sensitivity);
1 % of non-offenders also trigger the alert, i.e. 9,999 of the 999,900 non-offenders (by the definition of specificity).

This gives a total of 99 + 9,999 = 10,098 alerts. When an alert is triggered, the probability that the individual is indeed a criminal is therefore 99 out of 10,098, or about 0.98 %, not 99 %. The same result follows from Bayes' theorem.
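The arithmetic above can be checked with a short Python sketch. It computes the answer twice, once by counting expected alerts and once with Bayes' theorem, P(offender | alert) = P(alert | offender) × P(offender) / P(alert), and both approaches give the same value. The constant names are illustrative, and `fractions.Fraction` is used only to keep the results exact.

```python
from fractions import Fraction

# Figures from the CCTV example (names are illustrative)
POPULATION = 1_000_000
OFFENDERS = 100
NON_OFFENDERS = POPULATION - OFFENDERS   # 999,900
SENSITIVITY = Fraction(99, 100)          # P(alert | offender)
SPECIFICITY = Fraction(99, 100)          # P(no alert | non-offender)

# Counting approach: expected number of alerts coming from each group
true_alerts = SENSITIVITY * OFFENDERS               # 99
false_alerts = (1 - SPECIFICITY) * NON_OFFENDERS    # 9,999
p_by_counting = true_alerts / (true_alerts + false_alerts)

# Bayes' theorem: P(offender | alert) = P(alert | offender) * P(offender) / P(alert)
prior = Fraction(OFFENDERS, POPULATION)             # the base rate
p_alert = SENSITIVITY * prior + (1 - SPECIFICITY) * (1 - prior)
p_by_bayes = SENSITIVITY * prior / p_alert

print(f"alerts: {int(true_alerts + false_alerts)}")
print(f"P(offender | alert) = {float(p_by_bayes):.2%}")
```

Running it prints 10098 alerts and a probability of 0.98%; the exact value is 99/10,098, which reduces to 1/102.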

Here's another example.

During the pandemic, the news often quoted statistics such as “70 % of hospitalized Covid patients have been vaccinated” or “7 in 10 hospitalized Covid patients are vaccinated”. Many people wondered what this implied about the vaccine's effectiveness.

The key to interpreting this information correctly lies in the base vaccination rate (i.e., the percentage of the population that is vaccinated). If most of the population is vaccinated and only a small fraction is not, vaccinated patients can outnumber unvaccinated ones in hospital even when the vaccine works, simply because there are far more vaccinated people.

For example, suppose that 99 % of a population is vaccinated and that 51 % of infected individuals have been vaccinated. Base rate error would lead most people to believe that the vaccine has no preventive effect. However, if the vaccine were ineffective, we would expect about 99 % of infected people to be vaccinated, matching their share of the population. The fact that only 51 % are vaccinated means that vaccinated people are far less likely to be infected than unvaccinated people.
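This reasoning can be made quantitative with a minimal Python sketch. By Bayes' theorem, the infection risk of each group is proportional to (share of infected in that group) / (share of population in that group), so the ratio of the two gives the relative risk; one minus the relative risk is the implied effectiveness. The function name `relative_risk` is hypothetical, and attributing the ratio to the vaccine assumes the two groups are otherwise comparable (similar exposure, age, and so on), which real data does not guarantee.

```python
def relative_risk(vaccinated_share_population: float,
                  vaccinated_share_infected: float) -> float:
    """Return P(infected | vaccinated) / P(infected | unvaccinated).

    By Bayes' theorem, P(infected | group) = P(group | infected) * P(infected) / P(group);
    the common factor P(infected) cancels in the ratio. Reading the result as a
    vaccine effect assumes the groups are otherwise comparable.
    """
    p = vaccinated_share_population   # base rate: share of the population vaccinated
    q = vaccinated_share_infected     # share of infected people who are vaccinated
    risk_vaccinated = q / p
    risk_unvaccinated = (1 - q) / (1 - p)
    return risk_vaccinated / risk_unvaccinated


rr = relative_risk(0.99, 0.51)
print(f"relative risk: {rr:.4f}")
print(f"implied effectiveness: {1 - rr:.1%}")
```

With the figures above, the relative risk is about 0.0105: a vaccinated person is roughly 95 times less likely to be infected, an implied effectiveness of about 98.9 %. If the vaccine were ineffective, the two shares would be equal (0.99 and 0.99) and the function would return 1.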

The base rate fallacy has fuelled the misconception that vaccines are ineffective: in highly vaccinated populations, the majority of COVID-19 cases occur among vaccinated people simply because they make up most of the population.

How to Avoid Base Rate Mistakes

This is simple: in addition to the percentages, you need to know the number of individuals in each population concerned, that is, the base rates. The easiest way to avoid base rate error is to lay the numbers out in a confusion matrix.
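As a minimal illustration, the Python sketch below builds the expected confusion matrix for the CCTV example from the figures given earlier (rows: actual status; columns: system output). Sensitivity and specificity are written as integer percentages so the counts stay exact; the variable names are illustrative. The column totals make the base rate visible: the "alert" column mixes 99 true alerts with 9,999 false ones.

```python
# Figures from the CCTV example; rates as integer percentages for exact counts
OFFENDERS, NON_OFFENDERS = 100, 999_900
SENSITIVITY_PCT, SPECIFICITY_PCT = 99, 99

tp = OFFENDERS * SENSITIVITY_PCT // 100              # offenders who trigger an alert
fn = OFFENDERS - tp                                  # offenders the system misses
fp = NON_OFFENDERS * (100 - SPECIFICITY_PCT) // 100  # non-offenders who trigger an alert
tn = NON_OFFENDERS - fp                              # non-offenders correctly ignored

# Print the matrix with row and column totals
print(f"{'':>14}{'alert':>10}{'no alert':>10}{'total':>11}")
print(f"{'offender':>14}{tp:>10,}{fn:>10,}{tp + fn:>11,}")
print(f"{'non-offender':>14}{fp:>10,}{tn:>10,}{fp + tn:>11,}")
print(f"{'total':>14}{tp + fp:>10,}{fn + tn:>10,}{tp + fn + fp + tn:>11,}")

# Share of alerts that concern a real offender (positive predictive value)
ppv = tp / (tp + fp)
print(f"P(offender | alert) = {ppv:.2%}")
```

Reading down the "alert" column rather than across the "offender" row is exactly the step that base rate neglect skips.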