A Bayesian Approach to Basketball


motivation

Bayesian inference is the process of using evidence (data) to update one’s prior beliefs about phenomena. The Bayesian approach can be especially useful in a field like sports, where we often don’t have large samples to work with (rendering frequentist approaches less effective). In this post, I’ll apply Bayesian principles in a basketball context.


prerequisites

This post assumes that the reader is familiar with probability theory and notation. I’ll provide a quick review, but I recommend reading up on Conditional Probability, Bayes’ Theorem, and the Law of Total Probability.

Conditional Probability

\[P(A|B) = \frac{P(A,B)}{P(B)}\] This says that the probability of event \(A\) conditioned on event \(B\) (we usually say “\(A\) given \(B\)”) is equal to the joint probability of \(A\) and \(B\) divided by the probability of \(B\).
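To make this concrete, here’s a quick sketch in R (the numbers are made up purely for illustration):

# illustrative numbers, not from the post
p_a_and_b <- 0.12  # P(A, B), the joint probability
p_b <- 0.30        # P(B)
p_a_and_b / p_b    # P(A|B), the conditional probability
## [1] 0.4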

Bayes’ Theorem

Bayesian inference is centered around one equation: Bayes’ Theorem. \[P(H|E) = \frac{P(E|H) \cdot P(H)}{P(E)}\] Here’s a quick summary of the various parts of the theorem:

  • \(H\) represents the hypothesis, which is updated based on \(E\), the evidence.
  • \(P(H)\) is called the prior probability. This is the probability of \(H\) prior to observing \(E\).
  • \(P(E|H)\) is called the likelihood. This is the probability of observing \(E\) given that \(H\) is true.
  • \(P(E)\) is the total probability of the evidence being observed; we’ll use the Law of Total Probability to break it out. According to Wikipedia, \(P(E)\) is sometimes called the marginal likelihood.
  • \(P(H|E)\) is the posterior probability. This is the probability of \(H\) given that we have observed \(E\). In simpler terms, this is the updated probability of our hypothesis after we’ve accounted for new evidence.
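To see how the pieces fit together, here’s a minimal R sketch of a single Bayesian update for a hypothesis \(H\) and its complement (the function name, arguments, and numbers are just illustrative):

# a single Bayesian update for a hypothesis H and its complement
# (function name and arguments are illustrative)
bayes_update <- function(prior_h, lik_e_given_h, lik_e_given_not_h) {
  # P(E) via the Law of Total Probability
  p_e <- lik_e_given_h * prior_h + lik_e_given_not_h * (1 - prior_h)
  # Bayes' Theorem: P(H|E) = P(E|H) * P(H) / P(E)
  lik_e_given_h * prior_h / p_e
}

# example: a 50% prior, likelihoods of 0.8 under H and 0.3 under not-H
bayes_update(0.50, 0.8, 0.3)
## [1] 0.7272727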

Law of Total Probability

\[P(A) = \sum_{i=1}^n P(A, B_i)\] The Law of Total Probability says that the total probability of some event, \(A\), is equal to the sum of the joint probabilities of \(A\) with each of the disjoint sets \(B_1, \dots, B_n\) that partition the probability space.
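A quick numeric check in R, marginalizing over a made-up three-part partition \(B_1, B_2, B_3\):

# illustrative joint probabilities P(A, B_i) over a three-part partition
p_a_and_b <- c(0.10, 0.20, 0.05)
sum(p_a_and_b)  # P(A)
## [1] 0.35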


bayesian inference applied to basketball

problem

Suppose a good 3-point shooter makes 35% of their 3-pointers from the top of the arc, and a bad one makes 15% of their 3-pointers from the top of the arc.

We want to know if Christina is a good 3-point shooter. Before she shoots, we think that there is a 20% chance of her being a good 3-point shooter. Christina takes three 3-point shots from the top of the arc, and makes all three.

  1. What is the probability that Christina is a good 3-point shooter?
  2. How many 3-point shots from the top of the arc would Christina have to make in a row for us to believe there’s at least a 95% chance Christina is a good 3-point shooter?

solution

> part 1.

Based on the problem statement, we want to update our belief about Christina being a good 3-point shooter after each shot that she makes in a row. So, \(H\) – our hypothesis – is that Christina is a good 3-point shooter, and \(E\) – the evidence – is that she makes three shots in a row. Since we’re going to update our prior probability iteratively, I’m going to modify our notation a bit and define \(E_i\) as the \(i\)th shot that she makes in a row (i.e., \(E_1\) is the first make, \(E_2\) is the second make, etc.).

From the problem statement:

  • \(P(H) = 0.20 \Longleftrightarrow P(\neg H) = 0.80\)
  • \(P(E_i|H) = 0.35\)
  • \(P(E_i|\neg H) = 0.15\)
  • \(P(E_i) = P(E_i,H) + P(E_i,\neg H) = P(E_i|H) \cdot P(H) + P(E_i|\neg H) \cdot P(\neg H)\) (by the Law of Total Probability)

Let’s look at our Bayesian update after Christina makes her first shot: \[ \begin{aligned} P(H|E_1) &= \frac{P(E_1|H) \cdot P(H)}{P(E_1)} \\ &= \frac{P(E_1|H) \cdot P(H)}{P(E_1|H) \cdot P(H) + P(E_1|\neg H) \cdot P(\neg H)} \\ &= \frac{0.35 \cdot 0.20}{0.35 \cdot 0.20 + 0.15 \cdot 0.80} \end{aligned} \]

Here’s our posterior probability after Christina makes her first shot: \(P(H|E_1) = 0.3684211\).
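As a quick sanity check in R:

# P(H|E_1), straight from Bayes' Theorem and the numbers above
(0.35 * 0.20) / (0.35 * 0.20 + 0.15 * 0.80)
## [1] 0.3684211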
We can now update our prior probability: \(P(H) := P(H|E_1)\).

We can apply the same iterative update procedure to find the posterior probabilities after Christina’s second and third made shots, but first, let’s see if we can simplify our equation a bit. We’ll start by examining the equation for the posterior after Christina’s second make: \[ \begin{aligned} P(H|E_2) &= \frac{P(E_2|H) \cdot P(H)}{P(E_2)} \\ &= \frac{P(E_2|H) \cdot P(H)}{P(E_2|H) \cdot P(H) + P(E_2|\neg H) \cdot P(\neg H)} \\ \end{aligned} \] Now, remember, we updated our initial prior probability: (\(P(H) := P(H|E_1)\) and \(P(\neg H) := 1 - P(H|E_1)\)).
So: \[ \begin{aligned} P(H|E_2) &= \frac{P(E_2|H) \cdot P(H|E_1)}{P(E_2|H) \cdot P(H|E_1) + P(E_2|\neg H) \cdot P(\neg H|E_1)} \\ &= \frac{P(E_2|H) \cdot \frac{P(E_1|H) \cdot P(H)}{P(E_1|H) \cdot P(H) + P(E_1|\neg H) \cdot P(\neg H)}}{P(E_2|H) \cdot \frac{P(E_1|H) \cdot P(H)}{P(E_1|H) \cdot P(H) + P(E_1|\neg H) \cdot P(\neg H)} + P(E_2|\neg H) \cdot \Big(1 - \frac{P(E_1|H) \cdot P(H)}{P(E_1|H) \cdot P(H) + P(E_1|\neg H) \cdot P(\neg H)}\Big)} \\ \end{aligned} \] We can simplify this further by noting that \(1 - \frac{P(E_1|H) \cdot P(H)}{P(E_1|H) \cdot P(H) + P(E_1|\neg H) \cdot P(\neg H)} = \frac{P(E_1|\neg H) \cdot P(\neg H)}{P(E_1|H) \cdot P(H) + P(E_1|\neg H) \cdot P(\neg H)}\).
Now, we have: \[ \begin{aligned} P(H|E_2) &= \frac{P(E_2|H) \cdot \frac{P(E_1|H) \cdot P(H)}{P(E_1|H) \cdot P(H) + P(E_1|\neg H) \cdot P(\neg H)}}{P(E_2|H) \cdot \frac{P(E_1|H) \cdot P(H)}{P(E_1|H) \cdot P(H) + P(E_1|\neg H) \cdot P(\neg H)} + P(E_2|\neg H) \cdot \frac{P(E_1|\neg H) \cdot P(\neg H)}{P(E_1|H) \cdot P(H) + P(E_1|\neg H) \cdot P(\neg H)}} \\ &= \frac{P(E_2|H) \cdot P(E_1|H) \cdot P(H)} {P(E_2|H) \cdot P(E_1|H) \cdot P(H) + P(E_2| \neg H) \cdot P(E_1| \neg H) \cdot P(\neg H)} \\ \end{aligned} \] I won’t write out all of the math, but if you follow the same procedure to find the posterior probability after Christina’s third made shot, you’ll get the following: \[ \begin{aligned} P(H|E_3) &= \frac{P(E_3|H) \cdot P(E_2|H) \cdot P(E_1|H) \cdot P(H)} { P(E_3|H) \cdot P(E_2|H) \cdot P(E_1|H) \cdot P(H) + P(E_3| \neg H) \cdot P(E_2| \neg H) \cdot P(E_1| \neg H) \cdot P(\neg H)} \\ \end{aligned} \] Do you see the pattern? Let me make it a bit more obvious. In the following equation, \(P(H|E_n)\) represents the posterior probability of Christina being a good 3-point shooter, given that she has made \(n\) shots in a row from the top of the arc. \[ \begin{aligned} P(H|E_n) &= \frac{\big[ \prod_{i=1}^n P(E_i|H) \big] \cdot P(H) } {\big[ \prod_{i=1}^n P(E_i|H) \big] \cdot P(H) + \big[ \prod_{i=1}^n P(E_i|\neg H) \big] \cdot P(\neg H)} \end{aligned} \] But the probability of making a shot from the top of the arc given that you’re a good or bad shooter doesn’t change. As we noted earlier, \(P(E_i|H) = 0.35\) and \(P(E_i|\neg H) = 0.15\), regardless of the shot number (\(i\)). This is another way of saying that \(E_1|H, E_2|H, \dots, E_n|H\) are conditionally independent and identical (\(E_1|\neg H, E_2|\neg H, \dots, E_n|\neg H\) are conditionally independent and identical as well). These properties allow us to make a final simplification: \[ \begin{aligned} P(H|E_n) &= \frac{[P(E_1|H) \cdot P(E_2|H) \cdot ... \cdot P(E_n|H) ] \cdot P(H)} {[P(E_1|H) \cdot P(E_2|H) \cdot ... \cdot P(E_n|H) ] \cdot P(H) + [P(E_1|\neg H) \cdot P(E_2|\neg H) \cdot ... \cdot P(E_n|\neg H) ] \cdot P(\neg H)} \\ &= \frac{[P(E_i|H)]^n \cdot P(H)} {[P(E_i|H)]^n \cdot P(H) + [P(E_i|\neg H)]^n \cdot P(\neg H)} \end{aligned} \] It took some work, but we finally have a simple equation for calculating the posterior after \(n\) made shots. Let’s finish up part 1 of the problem: \[ \begin{aligned} P(H|E_3) &= \frac{[P(E_i|H)]^3 \cdot P(H)} {[P(E_i|H)]^3 \cdot P(H) + [P(E_i|\neg H)]^3 \cdot P(\neg H)} \\ &= \frac{(0.35)^3 \cdot 0.20} {(0.35)^3 \cdot 0.20 + (0.15)^3 \cdot 0.80} \end{aligned} \]

The posterior probability after Christina makes her third shot in a row: \(P(H|E_3) = 0.7605322\).
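If you want to verify that number in R (the variable names are just for readability):

# posterior after three makes in a row, via the closed-form expression
p_h <- 0.20; p_good <- 0.35; p_bad <- 0.15
(p_good^3 * p_h) / (p_good^3 * p_h + p_bad^3 * (1 - p_h))
## [1] 0.7605322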

> part 2.

We can write a simple loop to figure out how many makes in a row it would take for us to believe there’s at least a 95% chance Christina is a good 3-point shooter:

# clear environment
rm(list=ls())

# variables
p_h <- 0.2  # prior probability
p_e_given_h <- 0.35  # probability of making shot given good shooter
p_e_given_not_h <- 0.15  # probability of making shot given bad shooter

posterior <- 0
thresh <- 0.95
i <- 0

# loop through shots until threshold is breached
while (posterior < thresh) {
  # update shot iteration
  i <- i + 1
  # posterior probability after i makes in a row (closed-form update)
  posterior <- (p_e_given_h^i * p_h) / (p_e_given_h^i * p_h + p_e_given_not_h^i * (1-p_h))
  
  bayes_update <- glue::glue(
    "shots made in a row: {i} \nposterior probability: {posterior}\n\n"
  )
  print(bayes_update)
}
## shots made in a row: 1 
## posterior probability: 0.368421052631579
## 
## shots made in a row: 2 
## posterior probability: 0.576470588235294
## 
## shots made in a row: 3 
## posterior probability: 0.760532150776053
## 
## shots made in a row: 4 
## posterior probability: 0.881100917431193
## 
## shots made in a row: 5 
## posterior probability: 0.945328758647843
## 
## shots made in a row: 6 
## posterior probability: 0.975813876332269

Christina would have to make 6 shots in a row from the top of the arc for us to believe there’s at least a 95% chance that she is a good 3-point shooter.
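As an aside, if you’d rather skip the while loop, a vectorized version gives the same answer (a sketch that reuses the variables defined in the code block above; the cap of 20 shots is arbitrary):

# posterior for n = 1, ..., 20 makes in a row, using the closed-form expression
n <- 1:20
posterior_n <- (p_e_given_h^n * p_h) /
  (p_e_given_h^n * p_h + p_e_given_not_h^n * (1 - p_h))

# smallest n whose posterior clears the 95% threshold
min(n[posterior_n >= thresh])
## [1] 6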

> an alternate approach

Let’s try tackling this problem from a different perspective. We’ll start by asking “what’s the probability that a good shooter will make \(n\) shots from the top of the arc in a row?” \[ \begin{aligned} P(E_1,\dots,E_n | H) &= \frac{P(E_1,\dots,E_n, H)}{P(H)} \end{aligned} \] But our equation doesn’t include the value that the problem is asking for, \(P(H | E_1,\dots, E_n)\). We can fix this by applying the definition of conditional probability to the term in the numerator.
We then have: \[ \begin{aligned} P(E_1,\dots,E_n | H) &= \frac{P(H | E_1,\dots, E_n) \cdot P(E_1, \dots, E_n)}{P(H)} \end{aligned} \] Now that we’ve got the term of interest in our equation, we can rearrange our equation to isolate it: \[ \begin{aligned} P(H | E_1,\dots,E_n) &= \frac{P(E_1,\dots,E_n | H) \cdot P(H) }{P(E_1, \dots, E_n)} \end{aligned} \] We can break out the denominator using the Law of Total Probability and the definition of conditional probability: \[ \begin{aligned} P(H | E_1,\dots,E_n) &= \frac{P(E_1,\dots,E_n | H) \cdot P(H) }{P(E_1, \dots, E_n , H) + P(E_1, \dots, E_n , \neg H)} \\ &= \frac{P(E_1,\dots,E_n | H) \cdot P(H) }{P(E_1, \dots, E_n | H) \cdot P(H) + P(E_1, \dots, E_n | \neg H) \cdot P(\neg H)} \end{aligned} \] We can apply the fact that \(E_1|H \perp \!\!\!\!\!\! \perp E_2|H \perp \!\!\!\!\!\! \perp\dots \perp \!\!\!\!\!\! \perp E_n | H\) (\(\perp \!\!\!\!\!\! \perp\) is just notation for independence) to simplify further: \[ \begin{aligned} P(H | E_1,\dots,E_n) &= \frac{P(E_1 | H) \cdot ... \cdot P(E_n | H) \cdot P(H)}{P(E_1 | H) \cdot ... \cdot P(E_n | H) \cdot P(H) + P(E_1 |\neg H) \cdot ... \cdot P(E_n |\neg H) \cdot P(\neg H)} \\ \end{aligned} \] And lastly, we use the facts that \(P(E_i | H) = P(E_j | H) \ \forall i,j\) and \(P(E_k |\neg H) = P(E_l |\neg H) \ \forall k,l\) to make a final simplification: \[ \begin{aligned} P(H | E_1,\dots,E_n) &= \frac{[P(E_i|H)]^n \cdot P(H)} {[P(E_i|H)]^n \cdot P(H) + [P(E_i|\neg H)]^n \cdot P(\neg H)} \end{aligned} \] We’ve found the same result as when we used the Bayesian update procedure!
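Numerically, the direct formula agrees with the iterative updates from part 2. Here’s a quick check in R for \(n = 6\) makes in a row (variable names are just for readability):

# direct computation of P(H | E_1, ..., E_n) for n = 6 makes in a row
n <- 6
lik_good <- 0.35^n  # P(E_1, ..., E_n | H)
lik_bad  <- 0.15^n  # P(E_1, ..., E_n | not H)
prior    <- 0.20

lik_good * prior / (lik_good * prior + lik_bad * (1 - prior))
## [1] 0.9758139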


wrapping things up

In this post, we worked through a probability problem in a sports context using two different methods, and we saw that both methods arrived at the same answer. That’s the beauty of math – there’s rarely just one path to the solution. This was meant to be a straightforward application of Bayesian inference to a simple problem, but it turned into a pretty lengthy post. For those of you who stuck it out with me, I hope you were able to follow along and that you found the process of solving this problem as rewarding as I did. Thanks for reading!
