Recall and Precision

Jojo de León
Oct 28, 2021 · 4 min read

Which would be better for a model…

Precision and recall are two diagnostic tools that help in interpreting probabilistic forecasts for binary (two-class) predictive modeling problems. Depending on the goal for a model, it can be more valuable to predict the probability that an observation (data point or instance) belongs to each class rather than predicting classes directly. The reason is that probabilities give you the ability to choose, and even calibrate, the threshold used to interpret them. When making a prediction for a binary or two-class classification problem, there are two types of errors that we could make.

  • False Positive. Predict an event when there was no event.
  • False Negative. Predict no event when in fact there was an event.

By predicting probabilities and calibrating a threshold, a balance of these two concerns can be chosen by the operator of the model.
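As a minimal sketch of what "calibrating a threshold" means in practice, the function below turns predicted probabilities into class labels; the probabilities and the threshold values are illustrative, not from any real model.

```python
# A minimal sketch: converting predicted probabilities into class labels
# with an adjustable decision threshold.

def classify(probabilities, threshold=0.5):
    """Label an observation positive (1) when its predicted
    probability meets or exceeds the threshold, else negative (0)."""
    return [1 if p >= threshold else 0 for p in probabilities]

probs = [0.10, 0.45, 0.55, 0.90]

# Default threshold of 0.5: only the more confident predictions are positive.
print(classify(probs))       # [0, 0, 1, 1]

# Lowering the threshold flags more observations as positive,
# trading false negatives for false positives.
print(classify(probs, 0.4))  # [0, 1, 1, 1]
```

Sliding the threshold up or down is exactly the balancing act between the two error types described above.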

A note on hypothesis testing…

In hypothesis testing, you are performing statistical tests to determine whether you believe a statement to be true or false. This initial statement you are testing is called the null hypothesis.

The notion of a statistical error is an integral part of hypothesis testing. The test chooses between two competing propositions: the null hypothesis, denoted by H0, and the alternative hypothesis, denoted by H1.

The null hypothesis may be true, and yet we reject H0. On the other hand, the alternative hypothesis H1 may be true, and yet we fail to reject H0. Two types of error are distinguished: type I error and type II error.

When conducting hypothesis testing, there will almost always be a chance of accidentally rejecting a null hypothesis when it should not have been rejected. Data scientists choose a significance level, alpha (𝛼), as the threshold for rejecting the null hypothesis; it is the probability of rejecting the null hypothesis when it is actually true. This scenario is a type I error, more commonly known as a False Positive (example: “an innocent person is convicted”). Conversely, a type II error is the mistaken acceptance of an actually false null hypothesis, also known as a False Negative (example: “a guilty person is not convicted”).

Different scenarios call for scientists to minimize one type of error over the other. For a fixed sample size, the two error types are inversely related to one another: reducing type I errors will increase type II errors and vice versa.

Now back to precision and recall…

Precision is the ratio of the number of true positives to the sum of the true positives and false positives. It describes how good a model is at predicting the positive class, and is also referred to as the positive predictive value.

Recall is calculated as the ratio of the number of true positives to the sum of the true positives and false negatives. Recall is the same as sensitivity and is also known as the true positive rate.

The two measures are sometimes combined in the F1 score (or F-measure), the harmonic mean of precision and recall, to provide a single measurement for a system.
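The three definitions above can be sketched directly from the counts in a confusion matrix; the counts below are made up for illustration.

```python
# A minimal sketch of precision, recall, and F1 computed from raw counts
# of true positives (tp), false positives (fp), and false negatives (fn).

def precision(tp, fp):
    # Of everything predicted positive, how much actually was positive?
    return tp / (tp + fp)

def recall(tp, fn):
    # Of everything actually positive, how much did we catch?
    return tp / (tp + fn)

def f1_score(tp, fp, fn):
    # Harmonic mean of precision and recall.
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

# Illustrative confusion-matrix counts.
tp, fp, fn = 80, 20, 10
print(precision(tp, fp))     # 0.8
print(recall(tp, fn))        # 80/90, roughly 0.889
print(f1_score(tp, fp, fn))  # roughly 0.842
```

Because F1 is a harmonic mean, it is dragged toward whichever of the two scores is lower, so a model cannot score well on F1 by excelling at only one of them.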

Note: Other related measures used in classification include the true negative rate and accuracy. (Beware a terminology clash: in measurement science, accuracy is how close you are to the true value, while precision is how close two or more measurements are to each other. See the Bias-Variance Trade-Off.)

More generally, recall is simply the complement of the type II error rate, i.e. one minus the type II error rate. Precision is related to the type I error rate, but in a slightly more complicated way, as it also depends upon the prior distribution of seeing a relevant vs an irrelevant item. So, precision is the fraction of relevant instances among the retrieved instances, while recall is the fraction of relevant instances that were retrieved.

Now to choose between the two…

So precision or recall? As often is the answer in this field, it depends — is it acceptable to miss some relevant instances?

Let’s take cancer screening, for example. In this situation, it would not be acceptable to miss relevant instances, as that would mean a patient with cancer is classified as cancer free. Therefore, a high recall score (i.e. few false negatives, even at the cost of more false positives), and consequently a lower precision score, would be preferred.

As a counterexample, consider a recommendation system for films. Here, it is acceptable to miss some relevant instances. Therefore, a high precision score (i.e. few false positives, even at the cost of more false negatives), and consequently a lower recall score, would be preferred.
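The trade-off in these two examples can be seen by sweeping the decision threshold over a small, made-up set of (predicted probability, true label) pairs; none of these numbers come from a real model.

```python
# A minimal sketch of the precision-recall trade-off: the same predictions
# scored at three different decision thresholds.

data = [(0.95, 1), (0.85, 1), (0.70, 0), (0.60, 1),
        (0.40, 0), (0.30, 1), (0.10, 0)]  # (probability, true label)

def metrics_at(threshold):
    tp = sum(1 for p, y in data if p >= threshold and y == 1)
    fp = sum(1 for p, y in data if p >= threshold and y == 0)
    fn = sum(1 for p, y in data if p < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

for t in (0.2, 0.5, 0.8):
    p, r = metrics_at(t)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
```

A low threshold behaves like the cancer-screening setting (recall 1.0, precision around 0.67 here), while a high threshold behaves like the recommender setting (precision 1.0, recall 0.5 here).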
