Dataset Description

This dataset contains detailed information about all FIFA World Cup matches from 1930 to 2022.

Source: Kaggle – FIFA World Cup Matches Dataset

Variables include:

  • Teams, match scores, managers, captains
  • Attendance, venue, date, round
  • Expected goals (xG), penalties, red/yellow cards, substitutions, and more

Line Plot: Average Attendance by Year
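One way to compute the series behind this plot is sketched below. It assumes the attendance column is named `Attendance` and that matches with missing attendance should be skipped. Neither detail is confirmed by the dataset description above, and `avg_attendance_by_year` is a hypothetical helper name.

```r
# Hypothetical helper: mean match attendance for each tournament year.
# Assumes df has numeric columns `Attendance` and `Year` (column names are assumptions).
avg_attendance_by_year <- function(df) {
  # The formula interface of aggregate() drops rows containing NA by default (na.action = na.omit)
  out <- aggregate(Attendance ~ Year, data = df, FUN = mean)
  out[order(out$Year), ]
}

# Usage (base graphics):
# att <- avg_attendance_by_year(df)
# plot(att$Year, att$Attendance, type = "l", xlab = "Year", ylab = "Average attendance")
```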

Bar Chart: Average Goals by Match Round
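A sketch of the aggregation follows. It assumes that "goals" means total goals per match (`home_score + away_score`) and that the round label is stored in a column named `Round`. Both are our assumptions, and `avg_goals_by_round` is a hypothetical name.

```r
# Hypothetical helper: average total goals per match, grouped by round.
# Assumes columns `home_score`, `away_score` and `Round` (the `Round` name is an assumption).
avg_goals_by_round <- function(df) {
  df$total_goals <- df$home_score + df$away_score  # goals scored by both sides in the match
  out <- aggregate(total_goals ~ Round, data = df, FUN = mean)
  out[order(-out$total_goals), ]                   # highest-scoring rounds first
}

# Usage (base graphics):
# gr <- avg_goals_by_round(df)
# barplot(gr$total_goals, names.arg = gr$Round, las = 2, ylab = "Average goals per match")
```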

3D Scatter Plot: Home Score vs Away Score Over Time

Pie Chart: Top 10 Goal Scoring Teams
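Counting a team's goals means combining the goals it scored as the home side with those it scored as the away side. The sketch below does this by stacking both sides into one long table before summing. The team columns `home_team` and `away_team` are assumed names, and `top_scoring_teams` is a hypothetical helper.

```r
# Hypothetical helper: total goals per team across all matches, keeping the top n teams.
# Assumes columns `home_team`, `away_team`, `home_score`, `away_score`
# (the team column names are assumptions).
top_scoring_teams <- function(df, n = 10) {
  # Stack home and away sides so each row is (team, goals it scored in one match)
  long <- data.frame(
    team  = c(as.character(df$home_team), as.character(df$away_team)),
    goals = c(df$home_score, df$away_score),
    stringsAsFactors = FALSE
  )
  totals <- aggregate(goals ~ team, data = long, FUN = sum)
  totals <- totals[order(-totals$goals), ]  # most goals first; ties keep alphabetical order
  head(totals, n)
}

# Usage (base graphics):
# top <- top_scoring_teams(df)
# pie(top$goals, labels = top$team)
```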

Statistical Analysis: Multiple Linear Regression for Home Score

We fit a multiple linear regression model to explore how a match’s home score relates to two predictors:

  • Predictors: away_score, Year
  • Response: home_score

This lets us examine whether higher away scores are associated with higher home scores, and whether home scoring has changed over time.

# Regress home_score on away_score and Year using the match-level data frame df
model <- lm(home_score ~ away_score + Year, data = df)
## R-squared: 0.116
Linear Regression Coefficients

Term          Estimate  Std. Error   t value   Pr(>|t|)
(Intercept)    46.9163      4.0207   11.6688    < 0.001
away_score      0.0479      0.0453    1.0565      0.291
Year           -0.0227      0.0020  -11.2304    < 0.001
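The quantities reported above (R-squared and the coefficient table) can be read off the fitted `lm` object with base R's `summary()` and `coef()`. The sketch below wraps this in a hypothetical helper, `fit_home_score_model`, and also rescales the Year coefficient to a per-decade effect, which is easier to interpret than a per-year one.

```r
# Hypothetical helper: fit the home-score model and extract the reported statistics.
# Assumes df has numeric columns home_score, away_score and Year.
fit_home_score_model <- function(df) {
  model <- lm(home_score ~ away_score + Year, data = df)
  s <- summary(model)
  list(
    r_squared = s$r.squared,            # share of variance in home_score explained
    coefficients = coef(s),             # columns: Estimate, Std. Error, t value, Pr(>|t|)
    year_effect_per_decade = 10 * coef(model)[["Year"]]  # change in home goals per 10 years
  )
}

# Usage:
# res <- fit_home_score_model(df)
# round(res$coefficients, 4)
# res$year_effect_per_decade
```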

Key Insights:

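  • R-squared = 0.116 → the model explains about 11.6% of the variation in home scores.
  • away_score has a small positive coefficient (0.048): holding Year fixed, each additional away goal is associated with about 0.05 more home goals, but the effect is not statistically significant (p = 0.291).
  • Year has a statistically significant negative coefficient (-0.0227, p < 0.001): holding away_score fixed, home teams score about 0.023 fewer goals per year on average (roughly 0.23 fewer per decade), suggesting that home scoring has declined slightly over time.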

Taken together, the low R-squared and the non-significant away_score effect suggest that more variables are needed to understand what drives home-team scoring.

Concluding Thoughts

  • The visualizations highlight clear trends in World Cup history, such as fluctuating attendance, variation in goal scoring across rounds, and differences between teams.
  • The regression shows limited predictive power from away score and year alone, which suggests that other variables (team strength, venue, round) likely play a role.
  • Further analysis could explore more sophisticated models using categorical variables like team and round.
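As a possible starting point for that extension, the sketch below adds the match round as a categorical predictor and uses a nested-model F-test (`anova()`) to check whether it improves on the away_score + Year model. It assumes a column named `Round`; the column name and the helper `compare_with_round` are hypothetical. Rows with a missing value in any model variable are dropped up front so that both models are fitted to the same matches, which `anova()` needs for a valid comparison.

```r
# Hypothetical extension: does adding the round (as a factor) explain more of home_score?
# Assumes columns home_score, away_score, Year and Round (the `Round` name is an assumption).
compare_with_round <- function(df) {
  vars <- c("home_score", "away_score", "Year", "Round")
  df <- df[complete.cases(df[, vars]), ]  # same rows for both models
  base <- lm(home_score ~ away_score + Year, data = df)
  extended <- lm(home_score ~ away_score + Year + factor(Round), data = df)
  list(
    r_squared = c(base = summary(base)$r.squared,
                  extended = summary(extended)$r.squared),
    f_test = anova(base, extended)  # nested-model F-test for the Round terms
  )
}

# Usage:
# cmp <- compare_with_round(df)
# cmp$r_squared
# cmp$f_test
```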