DATA-607 Project 2

IS 607 – Project 2

The goal of this assignment is to give you practice in preparing different datasets for downstream analysis work. Your task is to:

(1) Choose any three of the “wide” datasets identified in the Week 6 Discussion items. (You may use your own dataset; please don’t use my Sample Post dataset, since that was used in your Week 6 assignment!) For each of the three chosen datasets:

- Create a .CSV file (or optionally, a MySQL database!) that includes all of the information included in the dataset. You’re encouraged to use a “wide” structure similar to how the information appears in the discussion item, so that you can practice tidying and transformations as described below.
- Read the information from your .CSV file into R, and use tidyr and dplyr as needed to tidy and transform your data. [Most of your grade will be based on this step!]
- Perform the analysis requested in the discussion item.

Your code should be in an R Markdown file, posted to rpubs.com, and should include narrative descriptions of your data cleanup work, analysis, and conclusions.

Source: 2021 FIFA player data. We begin tidying this dataset by loading the necessary libraries and reading the raw CSV file into a data frame, which we call fifaplayer_data.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)

fifaraw_data <- "https://raw.githubusercontent.com/Zcash95/DATA607-Project2/main/fifa21_raw_data.csv"

fifaplayer_data <- read.csv(fifaraw_data, header = TRUE, sep = ",")


head(fifaplayer_data)
##                                           photoUrl                     LongName
## 1 https://cdn.sofifa.com/players/158/023/21_60.png                 Lionel Messi
## 2 https://cdn.sofifa.com/players/020/801/21_60.png C. Ronaldo dos Santos Aveiro
## 3 https://cdn.sofifa.com/players/200/389/21_60.png                    Jan Oblak
## 4 https://cdn.sofifa.com/players/192/985/21_60.png              Kevin De Bruyne
## 5 https://cdn.sofifa.com/players/190/871/21_60.png   Neymar da Silva Santos Jr.
## 6 https://cdn.sofifa.com/players/188/545/21_60.png           Robert Lewandowski
##                                                            playerUrl
## 1               http://sofifa.com/player/158023/lionel-messi/210005/
## 2 http://sofifa.com/player/20801/c-ronaldo-dos-santos-aveiro/210005/
## 3                  http://sofifa.com/player/200389/jan-oblak/210005/
## 4            http://sofifa.com/player/192985/kevin-de-bruyne/210005/
## 5  http://sofifa.com/player/190871/neymar-da-silva-santos-jr/210005/
## 6         http://sofifa.com/player/188545/robert-lewandowski/210005/
##   Nationality Positions              Name Age X.OVA POT
## 1   Argentina  RW ST CF          L. Messi  33    93  93
## 2    Portugal     ST LW Cristiano Ronaldo  35    92  92
## 3    Slovenia        GK          J. Oblak  27    91  93
## 4     Belgium    CAM CM      K. De Bruyne  29    91  91
## 5      Brazil    LW CAM         Neymar Jr  28    91  91
## 6      Poland        ST    R. Lewandowski  31    91  91
##                                Team...Contract     ID Height Weight  foot BOV
## 1        \n\n\n\nFC Barcelona\n2004 ~ 2021\n\n 158023   5'7" 159lbs  Left  93
## 2            \n\n\n\nJuventus\n2018 ~ 2022\n\n  20801   6'2" 183lbs Right  92
## 3     \n\n\n\nAtlético Madrid\n2014 ~ 2023\n\n 200389   6'2" 192lbs Right  91
## 4     \n\n\n\nManchester City\n2015 ~ 2023\n\n 192985  5'11" 154lbs Right  91
## 5 \n\n\n\nParis Saint-Germain\n2017 ~ 2022\n\n 190871   5'9" 150lbs Right  91
## 6   \n\n\n\nFC Bayern München\n2014 ~ 2023\n\n 188545   6'0" 176lbs Right  91
##    BP Growth       Joined Loan.Date.End  Value  Wage Release.Clause Attacking
## 1  RW      0  Jul 1, 2004           N/A €67.5M €560K        €138.4M       429
## 2  ST      0 Jul 10, 2018           N/A   €46M €220K         €75.9M       437
## 3  GK      2 Jul 16, 2014           N/A   €75M €125K        €159.4M        95
## 4 CAM      0 Aug 30, 2015           N/A   €87M €370K          €161M       407
## 5  LW      0  Aug 3, 2017           N/A   €90M €270K        €166.5M       408
## 6  ST      0  Jul 1, 2014           N/A   €80M €240K          €132M       423
##   Crossing Finishing Heading.Accuracy Short.Passing Volleys Skill Dribbling
## 1       85        95               70            91      88   470        96
## 2       84        95               90            82      86   414        88
## 3       13        11               15            43      13   109        12
## 4       94        82               55            94      82   441        88
## 5       85        87               62            87      87   448        95
## 6       71        94               85            84      89   407        85
##   Curve FK.Accuracy Long.Passing Ball.Control Movement Acceleration
## 1    93          94           91           96      451           91
## 2    81          76           77           92      431           87
## 3    13          14           40           30      307           43
## 4    85          83           93           92      398           77
## 5    88          89           81           95      453           94
## 6    79          85           70           88      407           77
##   Sprint.Speed Agility Reactions Balance Power Shot.Power Jumping Stamina
## 1           80      91        94      95   389         86      68      72
## 2           91      87        95      71   444         94      95      84
## 3           60      67        88      49   268         59      78      41
## 4           76      78        91      76   408         91      63      89
## 5           89      96        91      83   357         80      62      81
## 6           78      77        93      82   420         89      84      76
##   Strength Long.Shots Mentality Aggression Interceptions Positioning Vision
## 1       69         94       347         44            40          93     95
## 2       78         93       353         63            29          95     82
## 3       78         12       140         34            19          11     65
## 4       74         91       408         76            66          88     94
## 5       50         84       356         51            36          87     90
## 6       86         85       391         81            49          94     79
##   Penalties Composure Defending Marking Standing.Tackle Sliding.Tackle
## 1        75        96        91      32              35             24
## 2        84        95        84      28              32             24
## 3        11        68        57      27              12             18
## 4        84        91       186      68              65             53
## 5        92        93        94      35              30             29
## 6        88        88        96      35              42             19
##   Goalkeeping GK.Diving GK.Handling GK.Kicking GK.Positioning GK.Reflexes
## 1          54         6          11         15             14           8
## 2          58         7          11         15             14          11
## 3         437        87          92         78             90          90
## 4          56        15          13          5             10          13
## 5          59         9           9         15             15          11
## 6          51        15           6         12              8          10
##   Total.Stats Base.Stats W.F SM    A.W    D.W  IR PAC SHO PAS DRI DEF PHY  Hits
## 1        2231        466 4 ★ 4★ Medium    Low 5 ★  85  92  91  95  38  65 \n372
## 2        2221        464 4 ★ 5★   High    Low 5 ★  89  93  81  89  35  77 \n344
## 3        1413        489 3 ★ 1★ Medium Medium 3 ★  87  92  78  90  52  90  \n86
## 4        2304        485 5 ★ 4★   High   High 4 ★  76  86  93  88  64  78 \n163
## 5        2175        451 5 ★ 5★   High Medium 5 ★  91  85  86  94  36  59 \n273
## 6        2195        457 4 ★ 4★   High Medium 4 ★  78  91  78  85  43  82 \n182

Next, we remove the columns that are not needed for our analysis. The dataset contains 77 columns, many of which are individual skill ratings that are already added together in the Total.Stats column. We can therefore drop those individual ratings, along with the photoUrl, playerUrl, and Hits columns.

# Positions 23:63 hold the individual skill ratings; the three named columns are not needed either.
columns_to_remove <- c(23:63, which(names(fifaplayer_data) %in% c("photoUrl", "playerUrl", "Hits")))

# all_of() marks columns_to_remove as an external vector, which avoids the tidyselect deprecation warning.
fifaplayer_data1 <- fifaplayer_data %>%
  select(-all_of(columns_to_remove))
str(fifaplayer_data1)
## 'data.frame':    18979 obs. of  33 variables:
##  $ LongName       : chr  "Lionel Messi" "C. Ronaldo dos Santos Aveiro" "Jan Oblak" "Kevin De Bruyne" ...
##  $ Nationality    : chr  "Argentina" "Portugal" "Slovenia" "Belgium" ...
##  $ Positions      : chr  "RW ST CF" "ST LW" "GK" "CAM CM" ...
##  $ Name           : chr  "L. Messi" "Cristiano Ronaldo" "J. Oblak" "K. De Bruyne" ...
##  $ Age            : int  33 35 27 29 28 31 21 27 28 28 ...
##  $ X.OVA          : int  93 92 91 91 91 91 90 90 90 90 ...
##  $ POT            : int  93 92 93 91 91 91 95 91 90 90 ...
##  $ Team...Contract: chr  "\n\n\n\nFC Barcelona\n2004 ~ 2021\n\n" "\n\n\n\nJuventus\n2018 ~ 2022\n\n" "\n\n\n\nAtlético Madrid\n2014 ~ 2023\n\n" "\n\n\n\nManchester City\n2015 ~ 2023\n\n" ...
##  $ ID             : int  158023 20801 200389 192985 190871 188545 231747 212831 209331 208722 ...
##  $ Height         : chr  "5'7\"" "6'2\"" "6'2\"" "5'11\"" ...
##  $ Weight         : chr  "159lbs" "183lbs" "192lbs" "154lbs" ...
##  $ foot           : chr  "Left" "Right" "Right" "Right" ...
##  $ BOV            : int  93 92 91 91 91 91 91 90 90 90 ...
##  $ BP             : chr  "RW" "ST" "GK" "CAM" ...
##  $ Growth         : int  0 0 2 0 0 0 5 1 0 0 ...
##  $ Joined         : chr  "Jul 1, 2004" "Jul 10, 2018" "Jul 16, 2014" "Aug 30, 2015" ...
##  $ Loan.Date.End  : chr  "N/A" "N/A" "N/A" "N/A" ...
##  $ Value          : chr  "€67.5M" "€46M" "€75M" "€87M" ...
##  $ Wage           : chr  "€560K" "€220K" "€125K" "€370K" ...
##  $ Release.Clause : chr  "€138.4M" "€75.9M" "€159.4M" "€161M" ...
##  $ Total.Stats    : int  2231 2221 1413 2304 2175 2195 2147 1389 2211 2203 ...
##  $ Base.Stats     : int  466 464 489 485 451 457 466 490 470 469 ...
##  $ W.F            : chr  "4 ★" "4 ★" "3 ★" "5 ★" ...
##  $ SM             : chr  "4★" "5★" "1★" "4★" ...
##  $ A.W            : chr  "Medium" "High" "Medium" "High" ...
##  $ D.W            : chr  "Low" "Low" "Medium" "High" ...
##  $ IR             : chr  "5 ★" "5 ★" "3 ★" "4 ★" ...
##  $ PAC            : int  85 89 87 76 91 78 96 86 93 94 ...
##  $ SHO            : int  92 93 92 86 85 91 86 88 86 85 ...
##  $ PAS            : int  91 81 78 93 86 78 78 85 81 80 ...
##  $ DRI            : int  95 89 90 88 94 85 91 89 90 90 ...
##  $ DEF            : int  38 35 52 64 36 43 39 51 45 44 ...
##  $ PHY            : int  65 77 90 78 59 82 76 91 75 76 ...
head(fifaplayer_data1)
##                       LongName Nationality Positions              Name Age
## 1                 Lionel Messi   Argentina  RW ST CF          L. Messi  33
## 2 C. Ronaldo dos Santos Aveiro    Portugal     ST LW Cristiano Ronaldo  35
## 3                    Jan Oblak    Slovenia        GK          J. Oblak  27
## 4              Kevin De Bruyne     Belgium    CAM CM      K. De Bruyne  29
## 5   Neymar da Silva Santos Jr.      Brazil    LW CAM         Neymar Jr  28
## 6           Robert Lewandowski      Poland        ST    R. Lewandowski  31
##   X.OVA POT                              Team...Contract     ID Height Weight
## 1    93  93        \n\n\n\nFC Barcelona\n2004 ~ 2021\n\n 158023   5'7" 159lbs
## 2    92  92            \n\n\n\nJuventus\n2018 ~ 2022\n\n  20801   6'2" 183lbs
## 3    91  93     \n\n\n\nAtlético Madrid\n2014 ~ 2023\n\n 200389   6'2" 192lbs
## 4    91  91     \n\n\n\nManchester City\n2015 ~ 2023\n\n 192985  5'11" 154lbs
## 5    91  91 \n\n\n\nParis Saint-Germain\n2017 ~ 2022\n\n 190871   5'9" 150lbs
## 6    91  91   \n\n\n\nFC Bayern München\n2014 ~ 2023\n\n 188545   6'0" 176lbs
##    foot BOV  BP Growth       Joined Loan.Date.End  Value  Wage Release.Clause
## 1  Left  93  RW      0  Jul 1, 2004           N/A €67.5M €560K        €138.4M
## 2 Right  92  ST      0 Jul 10, 2018           N/A   €46M €220K         €75.9M
## 3 Right  91  GK      2 Jul 16, 2014           N/A   €75M €125K        €159.4M
## 4 Right  91 CAM      0 Aug 30, 2015           N/A   €87M €370K          €161M
## 5 Right  91  LW      0  Aug 3, 2017           N/A   €90M €270K        €166.5M
## 6 Right  91  ST      0  Jul 1, 2014           N/A   €80M €240K          €132M
##   Total.Stats Base.Stats W.F SM    A.W    D.W  IR PAC SHO PAS DRI DEF PHY
## 1        2231        466 4 ★ 4★ Medium    Low 5 ★  85  92  91  95  38  65
## 2        2221        464 4 ★ 5★   High    Low 5 ★  89  93  81  89  35  77
## 3        1413        489 3 ★ 1★ Medium Medium 3 ★  87  92  78  90  52  90
## 4        2304        485 5 ★ 4★   High   High 4 ★  76  86  93  88  64  78
## 5        2175        451 5 ★ 5★   High Medium 5 ★  91  85  86  94  36  59
## 6        2195        457 4 ★ 4★   High Medium 4 ★  78  91  78  85  43  82
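Selecting columns by position (23:63) works for this file, but it would silently drop the wrong columns if the column order ever changed. A minimal sketch of a name-based alternative is shown below. The toy data frame is hypothetical and contains only a handful of the real column names; the sketch assumes the individual ratings form one contiguous block from Attacking to GK.Reflexes, as they do in the head() output above.

```r
library(dplyr)

# Hypothetical toy frame with a few of the real column names.
toy <- data.frame(
  photoUrl    = "url",
  Name        = "L. Messi",
  Value       = "€67.5M",
  Attacking   = 429,
  Crossing    = 85,
  GK.Reflexes = 8,
  Total.Stats = 2231,
  Hits        = "372"
)

# Drop the contiguous block of individual ratings by name range, then drop
# the extra columns. any_of() ignores names that are absent (playerUrl here).
toy_small <- toy %>%
  select(-(Attacking:GK.Reflexes),
         -any_of(c("photoUrl", "playerUrl", "Hits")))

names(toy_small)
```

The same two selection helpers could be applied to fifaplayer_data in place of the positional vector.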

We now analyze the data by computing the ratio of each player's total stats to the value of their contract. Soccer players such as Lionel Messi and Kylian Mbappe have been given exorbitant amounts of money in their contracts with their teams. The ratio of total stats to contract value gives us insight into whether the amount of money in a player's contract reflects how well they perform.

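# Value strings look like "€67.5M"; the same format with a "K" suffix (as seen in Wage)
# must be rescaled, otherwise thousands and millions would be mixed on one scale.
# Assumes "M" and "K" are the only suffixes; values without a suffix are treated as euros.
value_raw <- fifaplayer_data1$Value
value_num <- as.numeric(gsub("[^0-9.]", "", value_raw))
value_scale <- ifelse(grepl("M", value_raw), 1, ifelse(grepl("K", value_raw), 1e-3, 1e-6))
fifaplayer_data1$Value <- value_num * value_scale  # now in millions of euros

fifaplayer_data1$Total.Stats <- as.numeric(fifaplayer_data1$Total.Stats)

fifaplayer_data1$Proportion <- fifaplayer_data1$Total.Stats / fifaplayer_data1$Value

complete_rows <- complete.cases(fifaplayer_data1[, c("Total.Stats", "Value")])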
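
To make the ratio concrete, here is a minimal sketch that uses the first three rows of the head() output above, with Value written in millions of euros. The helper stats_per_million() is a hypothetical name introduced only for this illustration; it returns NA when the value is zero so that the division does not produce Inf.

```r
# First three players from the head() output; Value is in millions of euros.
players <- data.frame(
  Name        = c("L. Messi", "Cristiano Ronaldo", "J. Oblak"),
  Total.Stats = c(2231, 2221, 1413),
  Value       = c(67.5, 46, 75)
)

# Hypothetical helper: total stats per million euros of value.
# A value of zero yields NA instead of Inf.
stats_per_million <- function(total_stats, value_millions) {
  ifelse(value_millions > 0, total_stats / value_millions, NA_real_)
}

players$Proportion <- stats_per_million(players$Total.Stats, players$Value)

# Sort from most to fewest stat points per million euros.
players[order(-players$Proportion), ]
```

A higher ratio means more stat points per million euros of value, so among these three rows the player with the lowest value ranks first.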

We first summarize the data and see that the mean total stats score is 1595. This gives us a baseline for gauging a player's rating at a glance. We then create a scatter plot to visualize contract value against each player's total stats in FIFA.

summary(fifaplayer_data1$`Total.Stats`)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     747    1452    1627    1595    1781    2316
ggplot(fifaplayer_data1[complete_rows,], aes(x = Total.Stats, y = Value)) +
  geom_point() +
  labs(title = "Contract Value vs. Total Stats",
       x = "Total Stats",
       y = "Contract Value (in millions)")
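
The scatter plot shows the relationship visually; one way to put a number on it is a rank correlation, which is less sensitive than Pearson's coefficient to the few very large contract values. The sketch below uses a small hypothetical data set, not the real FIFA values, purely to show the call.

```r
# Hypothetical toy values (Value in millions of euros), not taken from the data set.
total_stats <- c(1200, 1500, 1650, 1800, 2100, 2300)
value       <- c(0.3, 0.2, 1.5, 12, 8, 90)

# Spearman's rho compares the rank order of the two variables;
# use = "complete.obs" drops pairs that contain NA.
rho <- cor(total_stats, value, method = "spearman", use = "complete.obs")
rho
```

On the real data the equivalent call would be cor(fifaplayer_data1$Total.Stats, fifaplayer_data1$Value, method = "spearman", use = "complete.obs"); its result is not reported here because it has not been computed in this write-up.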

Conclusion

Based on the analysis of the dataset, players with high contract values tend to have high total stats in FIFA, which suggests a positive relationship between the two. The converse does not hold: players with high total stats do not necessarily have high contract values. This suggests that while player performance is a significant factor influencing contract value, it is not the sole determinant. Other factors, such as a player’s previous league experience, media exposure, and early career trajectory (e.g., child prodigy status), potentially influence contract values to a significant degree. This finding highlights the multi-faceted nature of contract valuation in professional sports, where on-field performance interacts with various external factors to determine a player’s financial worth.