Welcome to our exploration of FIFA Players Data. In this notebook, we’re diving into the world of football players to uncover some interesting insights about their attributes and how they stack up against each other.
Getting to Know the Data: We’ll start by summarizing the basics—like the ages, heights, and overall ratings of the players. We’ll also check out how players are distributed across different nationalities. Finally, we will visualize how a player’s physical attributes change by age and how their wages are related to their overall rating.
Foot Preferences and Ratings: Ever wondered if a player’s preferred foot (left or right) affects their performance? We’ll explore whether there’s a noticeable difference in average ratings between players who favor their left foot versus their right.
Top Footballing Nations: We’ll identify which countries have the most players in the dataset. This will give us a snapshot of which nations are most represented in the world of football.
Player Count vs. Performance: Finally, we’ll investigate if there’s a pattern between the number of players from a country and their average ratings. Do countries with more players tend to have higher ratings?
Visualization: Physical Attributes vs. Age: How do a player’s physical attributes change with age?
Visualization: Player Earnings vs. Overall Rating: How do a football player’s earnings depend on their overall rating?
# Load the packages used throughout: dplyr for data wrangling, ggplot2 for plots
library(dplyr)
library(ggplot2)
# For numeric columns: min, max, mean, median, and quartiles of age, height, and rating
numeric_summary <- Fifa_Players_Data |>
  summarise(
    age_min = min(age, na.rm = TRUE),
    age_max = max(age, na.rm = TRUE),
    age_mean = mean(age, na.rm = TRUE),
    age_median = median(age, na.rm = TRUE),
    age_25th = quantile(age, 0.25, na.rm = TRUE),
    age_75th = quantile(age, 0.75, na.rm = TRUE),
    height_min = min(height_cm, na.rm = TRUE),
    height_max = max(height_cm, na.rm = TRUE),
    height_mean = mean(height_cm, na.rm = TRUE),
    height_median = median(height_cm, na.rm = TRUE),
    height_25th = quantile(height_cm, 0.25, na.rm = TRUE),
    height_75th = quantile(height_cm, 0.75, na.rm = TRUE),
    overall_rating_min = min(overall_rating, na.rm = TRUE),
    overall_rating_max = max(overall_rating, na.rm = TRUE),
    overall_rating_mean = mean(overall_rating, na.rm = TRUE),
    overall_rating_median = median(overall_rating, na.rm = TRUE),
    overall_rating_25th = quantile(overall_rating, 0.25, na.rm = TRUE),
    overall_rating_75th = quantile(overall_rating, 0.75, na.rm = TRUE)
  )
# For categorical columns: count players per nationality
nationality_counts <- Fifa_Players_Data |>
  count(nationality) |>
  rename(Category = nationality, Count = n) |>
  mutate(Column = "nationality")
# Collect the categorical summaries into one table (only nationality for now)
summary_table <- bind_rows(nationality_counts)
# Sort by Column, then by descending Count, for better readability
summary_table <- summary_table |>
  arrange(Column, desc(Count))
print(numeric_summary)
## # A tibble: 1 × 18
## age_min age_max age_mean age_median age_25th age_75th height_min height_max
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 17 46 25.6 25 22 29 152. 206.
## # ℹ 10 more variables: height_mean <dbl>, height_median <dbl>,
## # height_25th <dbl>, height_75th <dbl>, overall_rating_min <dbl>,
## # overall_rating_max <dbl>, overall_rating_mean <dbl>,
## # overall_rating_median <dbl>, overall_rating_25th <dbl>,
## # overall_rating_75th <dbl>
The summary statistics show that ages range from 17 to 46, with a median of 25 and a mean of 25.6. Heights range from about 152 cm to about 206 cm, with an average of 175.26 cm. The overall rating ranges from 3 to 94, with a median of 67.
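The eighteen hand-written summary columns above repeat one pattern per variable, so they could also be generated with `across()`. The following is a minimal sketch, assuming dplyr 1.0 or later and R 4.1 or later. `toy_players` and its values are invented stand-ins for `Fifa_Players_Data`, and the `q25`/`q75` suffixes are naming chosen here, not the `_25th`/`_75th` names used above.

```r
library(dplyr)

# Invented stand-in for Fifa_Players_Data (values are illustrative only)
toy_players <- data.frame(
  age = c(17, 22, 25, 29, 46),
  height_cm = c(152.4, 170.2, 175.3, 182.9, 205.7),
  overall_rating = c(55, 62, 67, 72, 94)
)

# Apply the same six statistics to each of the three numeric columns
compact_summary <- toy_players |>
  summarise(
    across(
      c(age, height_cm, overall_rating),
      list(
        min = \(x) min(x, na.rm = TRUE),
        max = \(x) max(x, na.rm = TRUE),
        mean = \(x) mean(x, na.rm = TRUE),
        median = \(x) median(x, na.rm = TRUE),
        q25 = \(x) unname(quantile(x, 0.25, na.rm = TRUE)),
        q75 = \(x) unname(quantile(x, 0.75, na.rm = TRUE))
      ),
      .names = "{.col}_{.fn}"
    )
  )

print(compact_summary)
```

The result is still a one-row table with one column per variable and statistic, so it can be printed the same way as `numeric_summary`.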
print(summary_table)
## # A tibble: 160 × 3
## Category Count Column
## <chr> <int> <chr>
## 1 England 1658 nationality
## 2 Germany 1199 nationality
## 3 Spain 1070 nationality
## 4 France 925 nationality
## 5 Argentina 904 nationality
## 6 Brazil 832 nationality
## 7 Italy 655 nationality
## 8 Colombia 624 nationality
## 9 Japan 466 nationality
## 10 Netherlands 441 nationality
## # ℹ 150 more rows
England has the most players in the dataset (1,658), followed by Germany (1,199) and Spain (1,070). The largest drop comes right after England; from there the counts decline more gradually, and the tenth-ranked Netherlands still contributes 441 players. Overall, these figures highlight England’s prominent presence in the dataset alongside broad representation from other major footballing nations.
average_rating_by_foot <- Fifa_Players_Data |>
  group_by(preferred_foot) |>
  summarise(average_overall_rating = mean(overall_rating, na.rm = TRUE)) |>
  arrange(desc(average_overall_rating))
print(average_rating_by_foot)
## # A tibble: 2 × 2
## preferred_foot average_overall_rating
## <chr> <dbl>
## 1 Left 66.8
## 2 Right 66.1
Left-footed players average 66.8 and right-footed players 66.1, a gap of about 0.7 rating points. This small difference suggests that a player’s preferred foot has a negligible impact on their average overall rating.
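A gap between two group means is easier to judge alongside group sizes and spread. The sketch below, which is not part of the original analysis, adds those to the grouped summary and runs a Welch two-sample t-test with base R's `t.test()`. `toy_players` and its ratings are made up for illustration; with the real data one would pass `Fifa_Players_Data` instead.

```r
library(dplyr)

# Made-up ratings standing in for Fifa_Players_Data
toy_players <- data.frame(
  preferred_foot = c("Left", "Left", "Left", "Left",
                     "Right", "Right", "Right", "Right", "Right"),
  overall_rating = c(66, 70, 64, 68, 65, 67, 63, 69, 66)
)

# Group size, mean, and standard deviation per preferred foot
foot_summary <- toy_players |>
  group_by(preferred_foot) |>
  summarise(
    n = n(),
    mean_rating = mean(overall_rating, na.rm = TRUE),
    sd_rating = sd(overall_rating, na.rm = TRUE)
  )

# Welch t-test: does the mean rating differ between the two groups?
welch <- t.test(overall_rating ~ preferred_foot, data = toy_players)

print(foot_summary)
print(welch$p.value)
```

With a dataset of this size, even a tiny gap can come out statistically significant, so the size of the difference is usually more informative than the p-value alone.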
# Compute the top 10 nations by player count
top_10_nations <- nationality_counts |>
  arrange(desc(Count)) |>
  head(10)
print(top_10_nations)
## # A tibble: 10 × 3
## Category Count Column
## <chr> <int> <chr>
## 1 England 1658 nationality
## 2 Germany 1199 nationality
## 3 Spain 1070 nationality
## 4 France 925 nationality
## 5 Argentina 904 nationality
## 6 Brazil 832 nationality
## 7 Italy 655 nationality
## 8 Colombia 624 nationality
## 9 Japan 466 nationality
## 10 Netherlands 441 nationality
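For reference, `slice_max()` from dplyr performs the sort-and-take-top step in one call. This sketch rebuilds the ten rows printed above by hand (the real `nationality_counts` has 160 rows) and keeps the top three; `printed_counts` is a name introduced here for illustration.

```r
library(dplyr)

# The ten rows shown in the output above, typed in by hand
printed_counts <- data.frame(
  Category = c("England", "Germany", "Spain", "France", "Argentina",
               "Brazil", "Italy", "Colombia", "Japan", "Netherlands"),
  Count = c(1658, 1199, 1070, 925, 904, 832, 655, 624, 466, 441)
)

# Keep the three largest counts; ties at the cutoff are kept by default
top_3_nations <- printed_counts |>
  slice_max(Count, n = 3)

print(top_3_nations)
```

Because ties are kept by default, `slice_max()` can return more than `n` rows when several nations share the cutoff count; `with_ties = FALSE` forces exactly `n`.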
avg_rating_by_nationality <- Fifa_Players_Data |>
  group_by(nationality) |>
  summarise(
    player_count = n(),
    avg_rating = mean(overall_rating, na.rm = TRUE)
  ) |>
  arrange(desc(player_count))
print(avg_rating_by_nationality)
## # A tibble: 160 × 3
## nationality player_count avg_rating
## <chr> <int> <dbl>
## 1 England 1658 63.6
## 2 Germany 1199 66.1
## 3 Spain 1070 69.6
## 4 France 925 67.9
## 5 Argentina 904 68.7
## 6 Brazil 832 71.1
## 7 Italy 655 68.8
## 8 Colombia 624 65.2
## 9 Japan 466 62.6
## 10 Netherlands 441 67.9
## # ℹ 150 more rows
Among the ten most-represented nations, Brazil stands out with the highest average rating, 71.05. Despite having fewer players than five other nations, Brazilian players are rated higher on average, while England, the largest group, averages only 63.6. Within this top ten, then, a larger player count does not go hand in hand with a higher average rating.
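Whether player count and average rating move together can be checked directly with a rank correlation. The sketch below uses only the ten rows printed above, typed in by hand, so it says nothing about the other 150 nations; on the full data the same `cor()` call could be run on `avg_rating_by_nationality`. `printed_nations` is a name introduced here.

```r
# The ten rows shown in the output above, typed in by hand
printed_nations <- data.frame(
  nationality = c("England", "Germany", "Spain", "France", "Argentina",
                  "Brazil", "Italy", "Colombia", "Japan", "Netherlands"),
  player_count = c(1658, 1199, 1070, 925, 904, 832, 655, 624, 466, 441),
  avg_rating = c(63.6, 66.1, 69.6, 67.9, 68.7, 71.1, 68.8, 65.2, 62.6, 67.9)
)

# Spearman rank correlation between player count and average rating
rho <- with(printed_nations,
            cor(player_count, avg_rating, method = "spearman"))

print(rho)
```

For these ten rows the coefficient comes out close to zero, which is consistent with the lack of a clear pattern in the table. Nations with very few players can have unstable averages, so a minimum player count may be worth applying before running this on all 160 rows.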
# Mean physical attributes at each age (players aged 45 or younger)
player_data_summary <- Fifa_Players_Data |>
  filter(age <= 45) |>
  group_by(age) |>
  summarize(
    mean_acceleration = mean(acceleration, na.rm = TRUE),
    mean_sprint_speed = mean(sprint_speed, na.rm = TRUE),
    mean_stamina = mean(stamina, na.rm = TRUE),
    mean_strength = mean(strength, na.rm = TRUE)
  )
# One line per attribute; linewidth replaces the size argument deprecated in ggplot2 3.4.0
ggplot(player_data_summary, aes(x = age)) +
  geom_line(aes(y = mean_acceleration, color = "Acceleration"), linetype = 1, linewidth = 0.8) +
  geom_line(aes(y = mean_sprint_speed, color = "Sprint Speed"), linetype = 2, linewidth = 0.8) +
  geom_line(aes(y = mean_stamina, color = "Stamina"), linetype = 3, linewidth = 0.8) +
  geom_line(aes(y = mean_strength, color = "Strength"), linetype = 4, linewidth = 0.8) +
  labs(
    title = "Mean Physical Attributes by Age",
    x = "Age",
    y = "Mean Attribute Rating"
  ) +
  scale_color_manual(values = c("Acceleration" = "#000000", "Sprint Speed" = "#FF0000", "Stamina" = "#008000", "Strength" = "#FFA500")) +
  theme_minimal()
As football players get older, their speed and agility naturally decline. They may gain some muscle mass and strength, but they often find it harder to keep up with younger players. The decline is most visible in acceleration and sprint speed, which are crucial for attacking players.
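One way to make the age trend concrete is to look up the age at which each mean attribute peaks. The sketch below does this on an invented table shaped like `player_data_summary`; the numbers are illustrative only and do not come from the dataset.

```r
library(dplyr)

# Invented values shaped like player_data_summary (not real results)
toy_summary <- data.frame(
  age = c(20, 25, 30, 35),
  mean_acceleration = c(70, 68, 64, 58),
  mean_strength = c(60, 66, 69, 67)
)

# Age at which each mean attribute is highest
peak_ages <- toy_summary |>
  summarise(
    peak_age_acceleration = age[which.max(mean_acceleration)],
    peak_age_strength = age[which.max(mean_strength)]
  )

print(peak_ages)
```

Applied to the real summary table, the same lookup would show whether strength really peaks later than acceleration and sprint speed.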
# Keep only players with a finite, non-missing wage
wage_data <- Fifa_Players_Data |>
  filter(is.finite(wage_euro))
# Scatter plot of wage against rating with a linear trend line
ggplot(wage_data, aes(x = overall_rating, y = wage_euro)) +
  geom_point(alpha = 0.5) +
  geom_smooth(formula = y ~ x, method = "lm", se = FALSE, color = "red") +
  labs(title = "Wage vs. Overall Rating",
       x = "Overall Rating",
       y = "Wage (EUR)") +
  scale_y_continuous(labels = scales::comma)
Overall rating appears to be a strong predictor of wage, but other factors such as age, potential, and nationality are also likely to influence player performance and earnings.
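How strongly rating predicts wage can be quantified instead of read off the plot. The following sketch fits a linear model on log-transformed wages, since wages tend to be heavily right-skewed, and reports R-squared. The log transform is an assumption made here, not something the analysis above does, and `toy_wages` holds invented values in place of `Fifa_Players_Data`.

```r
library(dplyr)

# Invented ratings and wages standing in for Fifa_Players_Data
toy_wages <- data.frame(
  overall_rating = c(60, 65, 70, 75, 80, 85, 90),
  wage_euro = c(2000, 4000, 9000, 20000, 60000, 150000, NA)
)

# Drop missing, infinite, or non-positive wages before taking logs
model_data <- toy_wages |>
  filter(is.finite(wage_euro), wage_euro > 0)

# Linear fit of log10(wage) on overall rating
fit <- lm(log10(wage_euro) ~ overall_rating, data = model_data)

# Share of variance explained, and the change in log10(wage) per rating point
r_squared <- summary(fit)$r.squared
slope <- unname(coef(fit)["overall_rating"])

print(r_squared)
print(slope)
```

Adding terms such as `age` or `potential` to the model formula would be a natural next step for checking how much the other factors contribute.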