2026-02-05

Introduction

Logistic Regression

We use logistic regression to predict the probability of a home win from match statistics.

Variables:

  • Outcome (\(Y\)): Home Win (1 = Yes, 0 = No)
  • Predictor 1 (\(X_1\)): Shots on Target
  • Predictor 2 (\(X_2\)): Goals Scored by Half Time

The Logistic Model

We model the log-odds of winning as a linear function of the predictors:

\[\ln \left( \frac{P(Y=1)}{1 - P(Y=1)} \right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2\]

Solving for the probability gives the sigmoid function:

\[P(Y=1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2)}}\]
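The two equations above are inverses of each other, which can be checked numerically. The following is a minimal sketch in base R; the helper names sigmoid and logit and the test values in eta are illustrative choices, not part of the original analysis.

```r
# Sigmoid: maps a log-odds value (eta) to a probability in (0, 1).
sigmoid <- function(eta) 1 / (1 + exp(-eta))

# Logit: maps a probability back to log-odds.
logit <- function(p) log(p / (1 - p))

eta <- c(-4, -2, 0, 2, 4)   # hypothetical log-odds values
p <- sigmoid(eta)

# Round trip: logit(sigmoid(eta)) recovers eta.
all.equal(logit(p), eta)

# Base R ships the same function as plogis().
all.equal(p, plogis(eta))
```

A log-odds of 0 corresponds to a probability of exactly 0.5, and the probability approaches 0 or 1 as the log-odds move far from 0 in either direction.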

Visualizing Wins vs Shots

Winning teams generally have a higher median number of shots on target.

The Sigmoid Curve

The “S-Curve” shows how the probability of winning increases as shots on target increase.
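A curve of this kind can be sketched from the fitted coefficients. The example below is only an illustration: it uses the estimates reported in the Results section (rounded), and it assumes half-time goals are held at 0 and that shots on target range from 0 to 20. Neither assumption comes from the original plot.

```r
# Coefficients copied (rounded) from the Results table.
b0      <- -1.9977   # intercept
b_shots <-  0.1681   # shots on target
b_ht    <-  1.2511   # half-time home goals

shots    <- 0:20     # assumed plotting range
ht_goals <- 0        # assumption: hold half-time goals fixed at 0

# plogis() is base R's logistic (sigmoid) function.
p_win <- plogis(b0 + b_shots * shots + b_ht * ht_goals)

plot(shots, p_win, type = "l", ylim = c(0, 1),
     xlab = "Shots on target", ylab = "P(home win)")
```

Under these assumptions the curve crosses 0.5 at roughly 12 shots on target, since 1.9977 / 0.1681 is about 11.9.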

3D Probability Surface

Visualizing the probability of winning (Z-axis) based on both half time goals and shots.
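One way to build such a surface is to evaluate the fitted model on a grid of both predictors. This is a sketch rather than the code behind the original figure: the coefficients are the rounded estimates from the Results section, and the grid ranges and viewing angles are assumptions.

```r
# Coefficients copied (rounded) from the Results table.
b0 <- -1.9977; b_shots <- 0.1681; b_ht <- 1.2511

shots    <- 0:15   # assumed range of shots on target
ht_goals <- 0:4    # assumed range of half-time home goals

# outer() evaluates the model at every (shots, ht_goals) pair,
# giving a matrix with one row per shots value and one column per goal value.
surface <- outer(shots, ht_goals,
                 function(s, g) plogis(b0 + b_shots * s + b_ht * g))

# Base R perspective plot; theta and phi only set the viewing angle.
persp(shots, ht_goals, surface, theta = 40, phi = 25,
      xlab = "Shots on target", ylab = "Half-time goals",
      zlab = "P(home win)", ticktype = "detailed")
```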

Estimation: Maximum Likelihood

Logistic Regression uses Maximum Likelihood Estimation (MLE) rather than Least Squares. We maximize the likelihood function \(L\):

\[L(\beta) = \prod_{i=1}^{n} P(Y_i = 1)^{y_i} \left(1 - P(Y_i = 1)\right)^{1-y_i}\]

This finds the parameters that make the observed data most probable.
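To make the idea concrete, the sketch below maximizes the log of this likelihood directly with optim() and compares the result with glm(). It runs on simulated data, not the match data: the sample size, the Poisson rates, and beta_true are arbitrary assumptions chosen only so the example is self-contained.

```r
set.seed(1)

# Simulated stand-ins for the two predictors (assumed distributions).
n  <- 500
x1 <- rpois(n, 4)      # plays the role of shots on target
x2 <- rpois(n, 0.7)    # plays the role of half-time goals
X  <- cbind(1, x1, x2) # design matrix with an intercept column

beta_true <- c(-2, 0.2, 1.2)                 # arbitrary "true" coefficients
y <- rbinom(n, 1, plogis(drop(X %*% beta_true)))

# Negative log-likelihood: log L = sum(y * eta - log(1 + exp(eta))).
neg_log_lik <- function(beta) {
  eta <- drop(X %*% beta)
  -sum(y * eta - log1p(exp(eta)))
}

# Minimizing the negative log-likelihood maximizes the likelihood.
fit_optim <- optim(c(0, 0, 0), neg_log_lik, method = "BFGS")
fit_glm   <- glm(y ~ x1 + x2, family = binomial)

round(cbind(optim = fit_optim$par, glm = unname(coef(fit_glm))), 3)
```

The two columns should agree closely. glm() reaches the same maximum with iteratively reweighted least squares instead of a general-purpose optimizer.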

R Code Implementation

We use the ‘glm()’ function with the binomial family.

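library(dplyr)  # provides %>% and mutate()

# Code the outcome as 1 for a home win ("H"), 0 otherwise
df <- df %>% mutate(Win = ifelse(Result == "H", 1, 0))

# Fit the logistic regression (binomial family, default logit link)
logit_model <- glm(Win ~ HomeShotsOnTarget + HalfTimeHomeGoals,
                   data = df, family = "binomial")

summary(logit_model)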

Results and Interpretation

##                    Estimate  Std. Error   z value      Pr(>|z|)
## (Intercept)       -1.997725 0.056554017 -35.32420 2.497009e-273
## HomeShotsOnTarget  0.168077 0.007898827  21.27873 1.787012e-100
## HalfTimeHomeGoals  1.251115 0.035724330  35.02138 1.063579e-268

Conclusion:

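  • Intercept: Negative (\(-2.00\)). A home team with 0 shots on target and 0 half-time goals has a low predicted win probability, about 12% (\(1 / (1 + e^{2.00}) \approx 0.12\)).
  • Shots on Target: Each additional shot on target multiplies the odds of a home win by about 1.18 (\(e^{0.168}\)), holding half-time goals fixed.
  • Half Time Goals: The largest effect per unit. Each additional home goal by half time multiplies the odds of a home win by about 3.5 (\(e^{1.251}\)), holding shots on target fixed.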
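The coefficients in the table are on the log-odds scale, so it can help to convert them to odds ratios and probabilities. The sketch below does this with the reported estimates; the helper predict_win and the example match (5 shots on target, 1 half-time goal) are hypothetical and used only for illustration.

```r
# Estimates copied from the coefficient table above.
coefs <- c(intercept = -1.997725, shots = 0.168077, ht_goals = 1.251115)

# Odds ratios: multiplicative change in the odds per one-unit increase.
odds_ratios <- exp(coefs[c("shots", "ht_goals")])

# Baseline probability when both predictors are 0.
p_baseline <- plogis(coefs[["intercept"]])

# Hypothetical helper: predicted home-win probability for given inputs.
predict_win <- function(shots, ht_goals) {
  plogis(coefs[["intercept"]] +
         coefs[["shots"]] * shots +
         coefs[["ht_goals"]] * ht_goals)
}

p_example <- predict_win(shots = 5, ht_goals = 1)

round(odds_ratios, 2)
round(c(baseline = p_baseline, example = p_example), 3)
```

With these estimates the baseline probability is about 0.12, and the hypothetical match with 5 shots on target and 1 half-time goal comes out at roughly 0.52.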

Summary

This Logistic Regression analysis demonstrates:

  1. Binary Modeling: We successfully modeled the Win/No-Win outcome.
  2. Sigmoid Relationship: The probability of winning follows an “S-shape”, not a straight line.
  3. Key Factors: Shots on Target and Half Time Goals are both statistically significant predictors of a home win.