Loading Libraries & Dataset

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.2.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(ggplot2)
 
setwd("~/Documents/EC/Spring 2026/DATA 101/Project 1")
 
fifa_21 <- read.csv("FIFA-21 Complete.csv", sep = ";")
str(fifa_21)
## 'data.frame':    17981 obs. of  9 variables:
##  $ player_id  : int  158023 20801 190871 203376 200389 192985 188545 183277 212831 209331 ...
##  $ name       : chr  "Lionel Messi" "Cristiano Ronaldo" "Neymar Jr" "Virgil van Dijk" ...
##  $ nationality: chr  "Argentina" "Portugal" "Brazil" "Netherlands" ...
##  $ position   : chr  "ST|CF|RW" "ST|LW" "CAM|LW" "CB" ...
##  $ overall    : int  94 93 92 91 91 91 91 91 90 90 ...
##  $ age        : int  33 35 28 29 27 29 31 29 27 28 ...
##  $ hits       : int  299 276 186 127 47 119 89 66 53 94 ...
##  $ potential  : int  94 93 92 92 93 91 91 91 91 90 ...
##  $ team       : chr  "FC Barcelona " "Juventus " "Paris Saint-Germain " "Liverpool " ...
head(fifa_21, n=6)
##   player_id              name nationality position overall age hits potential
## 1    158023      Lionel Messi   Argentina ST|CF|RW      94  33  299        94
## 2     20801 Cristiano Ronaldo    Portugal    ST|LW      93  35  276        93
## 3    190871         Neymar Jr      Brazil   CAM|LW      92  28  186        92
## 4    203376   Virgil van Dijk Netherlands       CB      91  29  127        92
## 5    200389         Jan Oblak    Slovenia       GK      91  27   47        93
## 6    192985   Kevin De Bruyne     Belgium   CM|CAM      91  29  119        91
##                   team
## 1        FC Barcelona 
## 2            Juventus 
## 3 Paris Saint-Germain 
## 4           Liverpool 
## 5     Atlético Madrid 
## 6     Manchester City

Introduction

Does a soccer player's age affect their current and potential overall rating? The data set I selected contains statistics for many soccer players in the video game FIFA 21. It includes each player's overall rating in the game, current age, nationality, position, and potential for growth. I will investigate this question throughout the project using several coding techniques, focusing on the variables overall, age, and potential. I found the data set through the GitHub link on Blackboard, which linked to Kaggle; the Kaggle page states that this data set was taken from fifaindex.com.

Data Analysis

To find out whether a player's age affects their current and potential overall rating, I will build tables showing the mean overall rating and the mean potential rating at each age. I will then plot these means as scatter plots to visualize the trend. First, I will clean the data set and select the main variables used in this project: age, overall, and potential. I also kept the name variable, since it is interesting to see how popular players are rated.

Cleaning

names(fifa_21) <- gsub("[(). \\-]", "_", names(fifa_21))
names(fifa_21) <- gsub("_$", "", names(fifa_21))
names(fifa_21) <- tolower(names(fifa_21))

head(fifa_21, n=6)
##   player_id              name nationality position overall age hits potential
## 1    158023      Lionel Messi   Argentina ST|CF|RW      94  33  299        94
## 2     20801 Cristiano Ronaldo    Portugal    ST|LW      93  35  276        93
## 3    190871         Neymar Jr      Brazil   CAM|LW      92  28  186        92
## 4    203376   Virgil van Dijk Netherlands       CB      91  29  127        92
## 5    200389         Jan Oblak    Slovenia       GK      91  27   47        93
## 6    192985   Kevin De Bruyne     Belgium   CM|CAM      91  29  119        91
##                   team
## 1        FC Barcelona 
## 2            Juventus 
## 3 Paris Saint-Germain 
## 4           Liverpool 
## 5     Atlético Madrid 
## 6     Manchester City
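
The column names in this data set were already clean, so the output above looks unchanged. To illustrate what the three renaming steps do, here is a minimal sketch on a few hypothetical column names (made up for this example, not taken from the FIFA file):

```r
# Hypothetical column names, chosen only to exercise each cleaning step
messy_names <- c("Player.ID", "Overall (Rating)", "Team-Name", "age")

clean_names <- gsub("[(). \\-]", "_", messy_names)  # punctuation and spaces -> "_"
clean_names <- gsub("_$", "", clean_names)          # drop a single trailing "_"
clean_names <- tolower(clean_names)                 # lower-case everything

clean_names
```

Note that these steps do not collapse repeated underscores, so a name such as "Overall (Rating)" ends up with a double underscore in the middle.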

Selecting Variables

fifa_21_ratings <- fifa_21 |>
  select(name, age, overall, potential)
head(fifa_21_ratings, n=6)
##                name age overall potential
## 1      Lionel Messi  33      94        94
## 2 Cristiano Ronaldo  35      93        93
## 3         Neymar Jr  28      92        92
## 4   Virgil van Dijk  29      91        92
## 5         Jan Oblak  27      91        93
## 6   Kevin De Bruyne  29      91        91

Table of mean overall rating by age

overall_rating_mean <- fifa_21_ratings |>
  group_by(age) |>
  summarize(mean_overall = mean(overall, na.rm = TRUE))
overall_rating_mean
## # A tibble: 27 × 2
##      age mean_overall
##    <int>        <dbl>
##  1    17         60.7
##  2    18         60.9
##  3    19         61.4
##  4    20         63.5
##  5    21         63.6
##  6    22         65.2
##  7    23         66.1
##  8    24         67.2
##  9    25         67.5
## 10    26         68.1
## # ℹ 17 more rows

Table of mean potential rating by age

overall_rating_potential <- fifa_21_ratings |>
  group_by(age) |>
  summarize(mean_potential = mean(potential, na.rm = TRUE))
overall_rating_potential
## # A tibble: 27 × 2
##      age mean_potential
##    <int>          <dbl>
##  1    17           78.5
##  2    18           78.0
##  3    19           76.5
##  4    20           75.6
##  5    21           74.6
##  6    22           74.5
##  7    23           73.8
##  8    24           72.8
##  9    25           72.0
## 10    26           71.0
## # ℹ 17 more rows
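
As a possible alternative, both means could be computed in a single grouped summary so they sit side by side in one table. This is only a sketch: `toy_ratings` and `rating_means_by_age` are hypothetical names, and the rows are made-up values standing in for `fifa_21_ratings`.

```r
library(dplyr)

# Hypothetical toy rows standing in for fifa_21_ratings (not real FIFA data)
toy_ratings <- data.frame(
  name      = c("A", "B", "C", "D", "E", "F"),
  age       = c(18, 18, 25, 25, 33, 33),
  overall   = c(60, 62, 70, 72, 80, 78),
  potential = c(80, 78, 75, 73, 80, 78)
)

# One grouped summary instead of two separate tables
rating_means_by_age <- toy_ratings |>
  group_by(age) |>
  summarize(
    mean_overall   = mean(overall, na.rm = TRUE),
    mean_potential = mean(potential, na.rm = TRUE)
  )

rating_means_by_age
```

Keeping both columns in one table makes it easier to compare the two means at each age.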

Scatterplot of mean overall rating by age

scatterplot_overall <- ggplot(overall_rating_mean, aes(x = age, y = mean_overall)) +
  labs(title = "Correlation between Age and Overall Ratings in FIFA 21",
       caption = "Source: fifaindex.com",
       x = "Age of Players in FIFA 21",
       y = "Mean Overall Rating of Players in FIFA 21") +
  theme_minimal(base_size = 10)
scatterplot_overall + geom_point()

Scatterplot of mean potential rating by age

scatterplot_potential <- ggplot(overall_rating_potential, aes(x = age, y = mean_potential)) +
  labs(title = "Correlation between Age and Potential Ratings in FIFA 21",
       caption = "Source: fifaindex.com",
       x = "Age of Players in FIFA 21",
       y = "Mean Potential Rating of Players in FIFA 21") +
  theme_minimal(base_size = 10)
scatterplot_potential + geom_point()
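
The two scatter plots could also be drawn on one set of axes to compare the trends directly. The sketch below is one way this might be done, assuming the tidyr and ggplot2 packages are available; `toy_means` holds illustrative values rather than results from the data, and all object names are hypothetical.

```r
library(tidyr)
library(ggplot2)

# Hypothetical per-age means (illustrative values, not results from the data)
toy_means <- data.frame(
  age            = c(18, 25, 33),
  mean_overall   = c(61, 71, 79),
  mean_potential = c(79, 74, 79)
)

# Reshape to one row per age and rating type so color can map to the type
toy_means_long <- toy_means |>
  pivot_longer(
    cols      = c(mean_overall, mean_potential),
    names_to  = "rating_type",
    values_to = "mean_rating"
  )

combined_plot <- ggplot(toy_means_long,
                        aes(x = age, y = mean_rating, color = rating_type)) +
  geom_point() +
  labs(x = "Age", y = "Mean rating", color = "Rating type") +
  theme_minimal(base_size = 10)

combined_plot
```

The long format is what lets a single `geom_point()` layer draw both series with different colors.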

Conclusion and Future Directions

Looking at my findings, the youngest players have the lowest mean overall rating of any age but the highest mean potential rating. The mean overall rating peaks in the early to mid thirties, which reflects the prime of many players' careers. It is low for younger players, who are still developing, and it starts to decrease as players get older. For potential, the younger a player is, the higher the potential rating tends to be, since there is still room to develop over the remaining years of a career; this explains why the mean potential declines as age increases. There is an outlier at age 42, where the point has an extremely high value in each graph. This likely means there were very few players of that age in the data set, so a single rating can swing the mean for that age. From these findings, the age of a soccer player does appear to affect both the potential and the overall rating of the player. A future direction would be to add a variable capturing the age at which players are healthiest. This would provide an extra variable to research and may make the findings easier to interpret.
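
One way to check whether a mean rests on very few players would be to count the players at each age and set aside the thinly populated ages. This is a minimal sketch on made-up rows; `toy_ratings`, `min_players`, and the cutoff of 3 are hypothetical choices for illustration, not values from the analysis.

```r
library(dplyr)

# Hypothetical toy rows (not real FIFA data): age 42 has a single player
toy_ratings <- data.frame(
  age     = c(20, 20, 20, 30, 30, 30, 42),
  overall = c(60, 64, 62, 75, 77, 73, 82)
)

min_players <- 3  # hypothetical cutoff; the right value is a judgment call

# Count players and compute the mean overall rating at each age
overall_by_age <- toy_ratings |>
  group_by(age) |>
  summarize(
    n_players    = n(),
    mean_overall = mean(overall, na.rm = TRUE)
  )

# Keep only ages whose mean rests on at least min_players players
reliable_ages <- overall_by_age |>
  filter(n_players >= min_players)

reliable_ages
```

In the toy data the lone 42-year-old is dropped, so the remaining means are each based on several players.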

References

Source: The data set was taken from fifaindex.com (found through the GitHub link on Blackboard, which linked to Kaggle). Source links: https://www.kaggle.com/datasets/aayushmishra1512/fifa-2021-complete-player-data, https://fifaindex.com/players/top/fifa21_486/