This project explores the role and influence of penalty kicks in
professional soccer. Drawing on data from several top-tier and
second-tier leagues in Europe and North America, the analysis aims to
quantify how penalty kicks affect match outcomes, league standings, and
individual scoring achievements.
Motivation
The idea for this project originated from an offhand remark during a
soccer match: “Maybe penalty kicks should be worth half as much.” That
casual comment lingered in my mind and sparked a deeper curiosity about
the true value and impact of penalty kicks in the modern game. What
began as a thought experiment has grown into a full-fledged data
analysis project.
Key Questions
This analysis seeks to answer several core questions:
What is the relationship between penalty kicks (both awarded and
converted) and game results, league rankings, and top scorer races?
How does this relationship vary across different leagues and
regions?
How have the patterns and influence of penalty kicks evolved over
time?
Can penalty kick statistics serve as predictors for broader
outcomes, such as a team’s season success or an individual player’s
awards?
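As a rough illustration of the first question, and of the "half as much" idea behind this project, here is a minimal Python sketch that recomputes league points when each converted penalty counts as a fraction of a goal. Everything in it is an assumption made for illustration: the `Match` record, the `league_points` function, the `penalty_weight` parameter, and the three fixtures are hypothetical, and the real data format for this project has not been settled. The only rule taken as given is the standard 3 points for a win, 1 for a draw, and 0 for a loss.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """One match, with goals split by source (hypothetical schema)."""
    home: str
    away: str
    home_open_play: int   # home goals not scored from penalty kicks
    home_penalties: int   # home goals scored from penalty kicks
    away_open_play: int   # away goals not scored from penalty kicks
    away_penalties: int   # away goals scored from penalty kicks


def league_points(matches, penalty_weight=1.0):
    """Return {team: points} with 3 for a win, 1 for a draw, 0 for a loss.

    Each converted penalty counts as `penalty_weight` goals, so 1.0
    reproduces the normal rules and 0.5 is the "half as much" scenario.
    """
    table = {}
    for m in matches:
        home_score = m.home_open_play + penalty_weight * m.home_penalties
        away_score = m.away_open_play + penalty_weight * m.away_penalties
        if home_score > away_score:
            home_pts, away_pts = 3, 0
        elif home_score < away_score:
            home_pts, away_pts = 0, 3
        else:
            home_pts, away_pts = 1, 1
        table[m.home] = table.get(m.home, 0) + home_pts
        table[m.away] = table.get(m.away, 0) + away_pts
    return table


# Made-up fixtures for illustration only; these are not real results.
sample = [
    Match("A", "B", 1, 1, 1, 0),  # A wins 2-1, one goal from a penalty
    Match("B", "C", 0, 2, 1, 0),  # B wins 2-1, both goals from penalties
    Match("C", "A", 1, 0, 1, 0),  # 1-1 draw, no penalties
]

standard = league_points(sample, penalty_weight=1.0)
half = league_points(sample, penalty_weight=0.5)

for team in sorted(standard):
    print(team, standard[team], half[team])
```

In this toy table, team B's win on two penalties becomes a draw under the half-value rule, which is the kind of shift in results and standings the questions above are meant to measure on real data.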
Project Deliverables
The results of this project will be presented in three distinct
formats:
Code & Analysis: Annotated code used in the
data collection and analysis process, along with commentary explaining
the methods and findings.
Discussion: A narrative interpretation of the
results, aimed at a general audience. This document will highlight key
takeaways while minimizing technical jargon.
Reports: Visually appealing presentations of the
findings, designed for hypothetical stakeholders. These reports will
focus on clarity and accessibility while retaining essential technical
context when needed.
Planning & Approach
Audience
This project is intended to be accessible to anyone with a basic
understanding of soccer. If you know that a penalty kick is a one-on-one
shot against the goalkeeper from a designated spot, you have all the
background needed. More technical or sport-specific concepts will be
explained as necessary.
Data Sources
I plan to collect data from regular-season matches in the following
leagues:
Top-tier leagues:
Premier League (England)
Ligue 1 (France)
Bundesliga (Germany)
Serie A (Italy)
La Liga (Spain)
Major League Soccer (USA and Canada)
Liga MX (Mexico)
Second-tier leagues:
EFL Championship (England)
Ligue 2 (France)
2. Bundesliga (Germany)
Serie B (Italy)
Segunda División (Spain)
Anticipated Challenges
The primary challenge will be acquiring comprehensive and consistent
data. While top-division leagues are well documented, lower-tier leagues
may have less publicly available data, and what exists may be less
granular, particularly at the match level. This could limit the depth of
analysis for certain leagues or time periods.