4/1/2025

Background Info for Project

  • Got really into Premier League soccer recently
  • Curious about which teams actually outperform expectations over time
  • Used historical data from 1992 to 2023
  • Tried to build models that could predict match outcomes
  • Used Kaggle for my data

Data Overview

epl_data <- read.csv("prepared_epl_data.csv")
  • Structure: 12,026 matches across 31 seasons
  • Key variables:
    • Teams (Home/Away)
    • Goals scored
    • Match date and season
    • Final result (win/draw/loss)

Match Outcome Distribution

  • Home teams win: 45.9% of matches
  • Draws: 25.8% of matches
  • Away teams win: 28.4% of matches
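
A minimal sketch of how shares like these could be computed in base R. The toy data frame, the column name `result`, and the codes H/D/A are assumptions for illustration; the actual columns in prepared_epl_data.csv may differ.

```r
# Toy stand-in for epl_data; `result` and the H/D/A coding are hypothetical.
toy_matches <- data.frame(
  result = c("H", "H", "D", "A", "H", "A", "D", "H")
)

# Percentage of matches ending in each outcome, rounded to one decimal.
outcome_share <- function(matches, result_col = "result") {
  counts <- table(factor(matches[[result_col]], levels = c("H", "D", "A")))
  shares <- as.numeric(counts) / sum(counts)
  names(shares) <- names(counts)
  round(100 * shares, 1)
}

shares <- outcome_share(toy_matches)
print(shares)
```

Because each share is rounded separately, the three percentages on the real data can add up to slightly more or less than 100.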

Home Advantage Over Time

Home Advantage by Decade
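
One possible way to aggregate home win rate by decade, sketched on toy data. The columns `season_start` (the year a season begins) and `result` (H/D/A) are hypothetical names, and home win rate is only one of several ways home advantage could be measured.

```r
# Toy stand-in for epl_data; both column names are assumptions.
toy_matches <- data.frame(
  season_start = c(1992, 1995, 1999, 2003, 2008, 2012, 2019, 2022),
  result       = c("H",  "H",  "A",  "H",  "D",  "A",  "H",  "D")
)

# Share of matches won by the home team, grouped by decade label.
home_win_by_decade <- function(matches) {
  decade <- paste0(floor(matches$season_start / 10) * 10, "s")
  tapply(matches$result == "H", decade, mean)
}

rates <- home_win_by_decade(toy_matches)
print(round(rates, 3))
```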

Goals Analysis

Goal Distribution Patterns

Team Performance - Home & Away

Random Forest Model & Performance

  • Random Forest: 73% accuracy vs. baseline of 46%
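
A sketch of one common baseline, assuming the 46% figure comes from always predicting the most frequent outcome (a home win). That reading is an assumption, and the toy vector below is illustrative, not the project data.

```r
# Toy outcomes; the H/D/A coding is hypothetical.
actual <- factor(
  c("H", "H", "D", "A", "H", "A", "D", "H", "H", "A"),
  levels = c("H", "D", "A")
)

# Fraction of predictions that match the observed outcome.
accuracy <- function(predicted, actual) {
  mean(as.character(predicted) == as.character(actual))
}

# Majority-class baseline: predict the most common outcome for every match.
majority_class <- names(which.max(table(actual)))
baseline_pred  <- rep(majority_class, length(actual))
baseline_acc   <- accuracy(baseline_pred, actual)

print(majority_class)
print(baseline_acc)
```

The same `accuracy()` helper can be applied to a model's held-out predictions so that both numbers are computed the same way.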

Conclusion

  • Home advantage has declined significantly over three decades
    • From ~60% in the 1990s to ~45% in recent seasons
    • Suggests tactical evolution and professionalization of away approaches
  • Modern Premier League is slightly more goal-rich than early seasons
  • Elite clubs maintain their dominance despite a more competitive league
    • Top teams win ~65% of home games (vs. league average of ~46%)
  • Random Forest achieved 73% accuracy
    • Goal difference and team identity are the strongest predictors
    • Clearly outperforms the 46% baseline