Bayesian Inference in Baseball: A Simple Estimate of a Player’s Hitting Ability

What is Bayesian Inference?

Bayesian inference is a way to update our beliefs about something using new data.

In baseball, we can use it to estimate a player’s batting average.

Example: A player has 5 hits in 20 at bats (a .250 average). Is he really a .250 hitter, or could that be luck?

We will use Bayesian methods to combine prior knowledge with this data.

Why Use Bayesian Inference in Baseball?

It lets us include what we already know (like the league-wide batting average).
It gives a range of plausible true hitting abilities, not just one number.
It helps predict future performance.

The league-average batting average is about .250. We will use that as our starting belief, or prior.

Data: 5 hits in 20 at bats. Prior: league average of about .250.

The Prior Distribution

We start with a “prior” belief about the batting average $\theta$ (theta).

We will use a Beta distribution, which is a natural model for probabilities between 0 and 1.

Prior: Beta($\alpha = 5$, $\beta = 15$). This means we think the average is around $5/(5+15) = 0.25$, with uncertainty roughly equivalent to $\alpha + \beta = 20$ at bats of prior information.
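To see what this prior implies, we can compute its mean, standard deviation, and central 95% interval with base R's Beta functions. This is a minimal sketch; the variable names are illustrative.

```r
# Beta(5, 15) prior for the batting average theta
prior_alpha <- 5
prior_beta <- 15

# Mean of a Beta(a, b) distribution is a / (a + b)
prior_mean <- prior_alpha / (prior_alpha + prior_beta)

# Variance of a Beta(a, b) is a*b / ((a + b)^2 * (a + b + 1))
prior_var <- prior_alpha * prior_beta /
  ((prior_alpha + prior_beta)^2 * (prior_alpha + prior_beta + 1))
prior_sd <- sqrt(prior_var)

# Central 95% interval: 2.5th and 97.5th percentiles of the prior
prior_interval <- qbeta(c(0.025, 0.975), prior_alpha, prior_beta)

cat("Prior mean:", prior_mean, "\n")
cat("Prior SD:", round(prior_sd, 3), "\n")
cat("Prior 95% interval:", round(prior_interval, 3), "\n")
```

The prior mean is 0.25 and its standard deviation is about 0.094. So before seeing any data, batting averages well away from .250 are still quite plausible.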

Bayes’ Theorem

We model hits as Binomial: each at bat is a “trial” with success probability $\theta$.

Likelihood: $P(X = k \mid \theta) = \binom{n}{k} \theta^k (1 - \theta)^{n - k}$, where $n = 20$ at bats and $k = 5$ hits.
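As a quick check of the likelihood, we can evaluate it over a grid of candidate batting averages with R's `dbinom()` and confirm that it peaks at the observed rate $k/n = 0.25$. This is a sketch; the grid spacing of 0.001 is an arbitrary choice.

```r
# Observed data: k hits in n at bats
n <- 20
k <- 5

# Grid of candidate batting averages
theta <- seq(0, 1, by = 0.001)

# Binomial likelihood P(X = k | theta) at each grid point
likelihood <- dbinom(k, size = n, prob = theta)

# The likelihood is maximized at the sample proportion k / n
theta_mle <- theta[which.max(likelihood)]
cat("Likelihood is maximized at theta =", theta_mle, "\n")
```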

Prior: $\theta \sim \text{Beta}(\alpha, \beta)$, so $p(\theta) \propto \theta^{\alpha - 1} (1 - \theta)^{\beta - 1}$

Bayes’ Theorem: $p(\theta | X) \propto p(X | \theta) \cdot p(\theta)$
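Writing out the product in Bayes' theorem shows why the posterior stays in the Beta family. The binomial coefficient $\binom{n}{k}$ does not depend on $\theta$, so it is absorbed into the proportionality constant:

```latex
\begin{aligned}
p(\theta \mid X) &\propto \underbrace{\theta^{k}(1-\theta)^{n-k}}_{\text{likelihood}}
  \cdot \underbrace{\theta^{\alpha-1}(1-\theta)^{\beta-1}}_{\text{prior}} \\
&= \theta^{(\alpha + k) - 1}\,(1-\theta)^{(\beta + n - k) - 1},
\end{aligned}
\qquad\Longrightarrow\qquad
\theta \mid X \sim \text{Beta}(\alpha + k,\ \beta + n - k).
```

With $\alpha = 5$, $\beta = 15$, $n = 20$, and $k = 5$, this gives Beta(10, 30).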

Updating to Posterior

With a Beta prior and Binomial data, the posterior is also Beta (the Beta prior is conjugate to the Binomial likelihood).

Posterior parameters: $\alpha' = \alpha + k = 5 + 5 = 10$ and $\beta' = \beta + (n - k) = 15 + (20 - 5) = 30$
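To check the conjugate result numerically, one can apply Bayes' theorem directly on a grid of $\theta$ values and compare the result with the Beta(10, 30) density. The grid approximation is swapped in here only as a sanity check; it is not the method the slides use.

```r
hits <- 5
at_bats <- 20
prior_alpha <- 5
prior_beta <- 15

# Midpoints of 1000 equal-width bins on [0, 1]
n_grid <- 1000
step <- 1 / n_grid
theta <- (seq_len(n_grid) - 0.5) / n_grid

# Unnormalized posterior = likelihood * prior
unnormalized <- dbinom(hits, at_bats, theta) * dbeta(theta, prior_alpha, prior_beta)

# Normalize so the grid posterior integrates to (approximately) 1
grid_posterior <- unnormalized / sum(unnormalized * step)

# Closed-form conjugate posterior: Beta(alpha + k, beta + n - k)
exact_posterior <- dbeta(theta, prior_alpha + hits, prior_beta + at_bats - hits)

max_abs_diff <- max(abs(grid_posterior - exact_posterior))
cat("Largest grid vs. Beta(10, 30) difference:", signif(max_abs_diff, 3), "\n")
```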

Posterior mean: $E[\theta \mid X] = \frac{\alpha'}{\alpha' + \beta'} = \frac{10}{40} = 0.25$

95% credible interval: a range containing 95% of the posterior probability. The usual equal-tailed version runs from the 2.5th to the 97.5th percentile of the posterior.
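In R, the posterior mean and an equal-tailed 95% credible interval come straight from the Beta(10, 30) posterior via `qbeta()`. This is a sketch. The equal-tailed interval is one common choice, and a highest-density interval is another.

```r
# Posterior Beta(10, 30) from the conjugate update
post_alpha <- 5 + 5
post_beta <- 15 + (20 - 5)

post_mean <- post_alpha / (post_alpha + post_beta)

# Equal-tailed 95% credible interval: 2.5th and 97.5th posterior percentiles
credible_interval <- qbeta(c(0.025, 0.975), post_alpha, post_beta)

# Check: the interval holds 95% of the posterior probability
coverage <- diff(pbeta(credible_interval, post_alpha, post_beta))

cat("Posterior mean:", post_mean, "\n")
cat("95% credible interval:", round(credible_interval, 3), "\n")
```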

Prior vs Posterior

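The data update our prior to the posterior. Because the observed average (.250) matches the prior mean, the center stays at .250, but the posterior is narrower than the prior.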

Posterior mean: 0.25. 95% credible interval: [0.13, 0.393].

Sensitivity to Prior Strength

How does changing the strength of our prior affect the posterior?

We vary the prior strength $\alpha + \beta$ (roughly, how many “fake” at bats we believe in).
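A minimal sketch of this sensitivity analysis is shown below. It assumes the prior mean is held at .250 while the strength $\alpha + \beta$ takes a few hypothetical values (4, 20, 100, 400). For each strength, the code applies the conjugate update and reports the posterior mean and the width of the 95% credible interval.

```r
hits <- 5
at_bats <- 20
prior_mean <- 0.25  # assumption: prior centered on the league average

# Hypothetical prior strengths (alpha + beta = number of "fake" at bats)
strengths <- c(4, 20, 100, 400)

results <- data.frame(
  strength = strengths,
  prior_alpha = prior_mean * strengths,
  prior_beta = (1 - prior_mean) * strengths
)

# Conjugate update for each prior
results$post_alpha <- results$prior_alpha + hits
results$post_beta <- results$prior_beta + (at_bats - hits)
results$post_mean <- results$post_alpha / (results$post_alpha + results$post_beta)

# Width of the equal-tailed 95% credible interval
results$ci_width <- qbeta(0.975, results$post_alpha, results$post_beta) -
  qbeta(0.025, results$post_alpha, results$post_beta)

print(results)
```

Because the observed .250 equals the prior mean in this example, every posterior mean is .250, and only the interval width changes as the prior gets stronger. With data that differ from .250, a stronger prior would also pull the posterior mean closer to the league average.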

Predicting Future Hits

We use the posterior to predict hits in the next 10 at bats.

Sample many plausible values of $\theta$ from the posterior, then simulate future hits for each one.
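The two-step simulation can be sketched with `rbeta()` and `rbinom()`. The seed and the number of simulations are arbitrary choices made here for illustration.

```r
set.seed(42)  # arbitrary seed for reproducibility

post_alpha <- 10
post_beta <- 30
future_at_bats <- 10
n_sims <- 100000

# Step 1: draw plausible batting averages from the Beta(10, 30) posterior
theta_draws <- rbeta(n_sims, post_alpha, post_beta)

# Step 2: for each draw, simulate hits in the next 10 at bats
future_hits <- rbinom(n_sims, size = future_at_bats, prob = theta_draws)

cat("Expected future hits:", round(mean(future_hits), 2), "\n")

# Estimated predictive probability of each possible number of hits
print(table(future_hits) / n_sims)
```

Because each simulated season uses its own $\theta$, the spread of `future_hits` reflects both uncertainty about the player's true ability and ordinary at-bat-to-at-bat randomness.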

Expected future hits: 2.5 (10 at bats × posterior mean 0.25).
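For this conjugate model, the predictive distribution can also be computed exactly without simulation. It is a beta-binomial distribution with $P(Y = y) = \binom{m}{y} \frac{B(y + \alpha', m - y + \beta')}{B(\alpha', \beta')}$, where $m$ is the number of future at bats and $B$ is the Beta function (base R's `beta()`). This is a sketch for $m = 10$.

```r
post_alpha <- 10
post_beta <- 30
m <- 10       # future at bats
y <- 0:m      # possible numbers of future hits

# Beta-binomial pmf: choose(m, y) * B(y + a, m - y + b) / B(a, b)
pred_prob <- choose(m, y) * beta(y + post_alpha, m - y + post_beta) /
  beta(post_alpha, post_beta)

expected_hits <- sum(y * pred_prob)
cat("Expected future hits:", expected_hits, "\n")
```

The exact expectation is $m \cdot \alpha' / (\alpha' + \beta') = 10 \times 0.25 = 2.5$, which the simulation approximates.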

R Code for plots

Example of the code for the Prior vs. Posterior plot:

library(ggplot2)

# Observed data and Beta prior parameters
hits <- 5
at_bats <- 20
prior_alpha <- 5
prior_beta <- 15

# Conjugate update: add hits to alpha and non-hits to beta
post_alpha <- prior_alpha + hits
post_beta <- prior_beta + (at_bats - hits)

# Evaluate both densities on a grid of batting averages
theta <- seq(0, 1, length.out = 500)
df_prior <- data.frame(theta = theta, density = dbeta(theta, prior_alpha, prior_beta))
df_post <- data.frame(theta = theta, density = dbeta(theta, post_alpha, post_beta))
df_combined <- rbind(cbind(df_prior, dist = "Prior"), cbind(df_post, dist = "Posterior"))

# linewidth requires ggplot2 >= 3.4.0 (older versions use size)
ggplot(df_combined, aes(x = theta, y = density, color = dist)) +
  geom_line(linewidth = 1) +
  labs(title = "Prior vs. Posterior Distribution",
       x = expression(paste("Batting average (", theta, ")")),
       y = "Density") +
  theme_minimal()

Summary and Notes

Bayesian inference helps us make better estimates in baseball by combining data and prior knowledge.

This example used simple numbers to show:

Prior-to-posterior update
Credible intervals
Sensitivity analysis
Future predictions

Try changing the hits, at bats, or prior to see how it affects results!
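To make this kind of experimenting easier, the conjugate update can be wrapped in a small function. `update_batting()` is a hypothetical helper, not part of the original slides. It defaults to the Beta(5, 15) prior and returns the posterior parameters, mean, and an equal-tailed credible interval.

```r
# Hypothetical helper: conjugate Beta-Binomial update for a batting average
update_batting <- function(hits, at_bats, prior_alpha = 5, prior_beta = 15,
                           level = 0.95) {
  stopifnot(hits >= 0, hits <= at_bats)
  post_alpha <- prior_alpha + hits
  post_beta <- prior_beta + (at_bats - hits)
  tail_prob <- (1 - level) / 2
  list(
    post_alpha = post_alpha,
    post_beta = post_beta,
    post_mean = post_alpha / (post_alpha + post_beta),
    interval = qbeta(c(tail_prob, 1 - tail_prob), post_alpha, post_beta)
  )
}

# Original example: 5 hits in 20 at bats
update_batting(5, 20)

# A hotter start with the same prior: 10 hits in 20 at bats
update_batting(10, 20)
```

In the second call, the raw average is .500, but the posterior mean is $15/40 = 0.375$ because the prior pulls the estimate toward the league average.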

Created with R Markdown and ioslides.

Uses ggplot2, plotly, and LaTeX.