IS 607 – Project 2

The goal of this assignment is to give you practice in preparing different datasets for downstream analysis work. Your task is to:

(1) Choose any three of the “wide” datasets identified in the Week 6 Discussion items. (You may use your own dataset; please don’t use my Sample Post dataset, since that was used in your Week 6 assignment!)

For each of the three chosen datasets:

- Create a .CSV file (or optionally, a MySQL database!) that includes all of the information included in the dataset. You’re encouraged to use a “wide” structure similar to how the information appears in the discussion item, so that you can practice tidying and transformations as described below.
- Read the information from your .CSV file into R, and use tidyr and dplyr as needed to tidy and transform your data. [Most of your grade will be based on this step!]
- Perform the analysis requested in the discussion item.

Your code should be in an R Markdown file, posted to rpubs.com, and should include narrative descriptions of your data cleanup work, analysis, and conclusions.
Source: 2021 FIFA player data. We begin tidying this dataset by loading the necessary libraries and reading the raw CSV file into a data frame called fifaplayer_data.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
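# Raw FIFA 21 player data hosted in the project's GitHub repository
fifaraw_data <- "https://raw.githubusercontent.com/Zcash95/DATA607-Project2/main/fifa21_raw_data.csv"
# read.csv() makes header names syntactic, which is why names such as X.OVA and Team...Contract contain dots
fifaplayer_data <- read.csv(fifaraw_data, header = TRUE, sep = ",")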
head(fifaplayer_data)
## photoUrl LongName
## 1 https://cdn.sofifa.com/players/158/023/21_60.png Lionel Messi
## 2 https://cdn.sofifa.com/players/020/801/21_60.png C. Ronaldo dos Santos Aveiro
## 3 https://cdn.sofifa.com/players/200/389/21_60.png Jan Oblak
## 4 https://cdn.sofifa.com/players/192/985/21_60.png Kevin De Bruyne
## 5 https://cdn.sofifa.com/players/190/871/21_60.png Neymar da Silva Santos Jr.
## 6 https://cdn.sofifa.com/players/188/545/21_60.png Robert Lewandowski
## playerUrl
## 1 http://sofifa.com/player/158023/lionel-messi/210005/
## 2 http://sofifa.com/player/20801/c-ronaldo-dos-santos-aveiro/210005/
## 3 http://sofifa.com/player/200389/jan-oblak/210005/
## 4 http://sofifa.com/player/192985/kevin-de-bruyne/210005/
## 5 http://sofifa.com/player/190871/neymar-da-silva-santos-jr/210005/
## 6 http://sofifa.com/player/188545/robert-lewandowski/210005/
## Nationality Positions Name Age X.OVA POT
## 1 Argentina RW ST CF L. Messi 33 93 93
## 2 Portugal ST LW Cristiano Ronaldo 35 92 92
## 3 Slovenia GK J. Oblak 27 91 93
## 4 Belgium CAM CM K. De Bruyne 29 91 91
## 5 Brazil LW CAM Neymar Jr 28 91 91
## 6 Poland ST R. Lewandowski 31 91 91
## Team...Contract ID Height Weight foot BOV
## 1 \n\n\n\nFC Barcelona\n2004 ~ 2021\n\n 158023 5'7" 159lbs Left 93
## 2 \n\n\n\nJuventus\n2018 ~ 2022\n\n 20801 6'2" 183lbs Right 92
## 3 \n\n\n\nAtlético Madrid\n2014 ~ 2023\n\n 200389 6'2" 192lbs Right 91
## 4 \n\n\n\nManchester City\n2015 ~ 2023\n\n 192985 5'11" 154lbs Right 91
## 5 \n\n\n\nParis Saint-Germain\n2017 ~ 2022\n\n 190871 5'9" 150lbs Right 91
## 6 \n\n\n\nFC Bayern München\n2014 ~ 2023\n\n 188545 6'0" 176lbs Right 91
## BP Growth Joined Loan.Date.End Value Wage Release.Clause Attacking
## 1 RW 0 Jul 1, 2004 N/A €67.5M €560K €138.4M 429
## 2 ST 0 Jul 10, 2018 N/A €46M €220K €75.9M 437
## 3 GK 2 Jul 16, 2014 N/A €75M €125K €159.4M 95
## 4 CAM 0 Aug 30, 2015 N/A €87M €370K €161M 407
## 5 LW 0 Aug 3, 2017 N/A €90M €270K €166.5M 408
## 6 ST 0 Jul 1, 2014 N/A €80M €240K €132M 423
## Crossing Finishing Heading.Accuracy Short.Passing Volleys Skill Dribbling
## 1 85 95 70 91 88 470 96
## 2 84 95 90 82 86 414 88
## 3 13 11 15 43 13 109 12
## 4 94 82 55 94 82 441 88
## 5 85 87 62 87 87 448 95
## 6 71 94 85 84 89 407 85
## Curve FK.Accuracy Long.Passing Ball.Control Movement Acceleration
## 1 93 94 91 96 451 91
## 2 81 76 77 92 431 87
## 3 13 14 40 30 307 43
## 4 85 83 93 92 398 77
## 5 88 89 81 95 453 94
## 6 79 85 70 88 407 77
## Sprint.Speed Agility Reactions Balance Power Shot.Power Jumping Stamina
## 1 80 91 94 95 389 86 68 72
## 2 91 87 95 71 444 94 95 84
## 3 60 67 88 49 268 59 78 41
## 4 76 78 91 76 408 91 63 89
## 5 89 96 91 83 357 80 62 81
## 6 78 77 93 82 420 89 84 76
## Strength Long.Shots Mentality Aggression Interceptions Positioning Vision
## 1 69 94 347 44 40 93 95
## 2 78 93 353 63 29 95 82
## 3 78 12 140 34 19 11 65
## 4 74 91 408 76 66 88 94
## 5 50 84 356 51 36 87 90
## 6 86 85 391 81 49 94 79
## Penalties Composure Defending Marking Standing.Tackle Sliding.Tackle
## 1 75 96 91 32 35 24
## 2 84 95 84 28 32 24
## 3 11 68 57 27 12 18
## 4 84 91 186 68 65 53
## 5 92 93 94 35 30 29
## 6 88 88 96 35 42 19
## Goalkeeping GK.Diving GK.Handling GK.Kicking GK.Positioning GK.Reflexes
## 1 54 6 11 15 14 8
## 2 58 7 11 15 14 11
## 3 437 87 92 78 90 90
## 4 56 15 13 5 10 13
## 5 59 9 9 15 15 11
## 6 51 15 6 12 8 10
## Total.Stats Base.Stats W.F SM A.W D.W IR PAC SHO PAS DRI DEF PHY Hits
## 1 2231 466 4 ★ 4★ Medium Low 5 ★ 85 92 91 95 38 65 \n372
## 2 2221 464 4 ★ 5★ High Low 5 ★ 89 93 81 89 35 77 \n344
## 3 1413 489 3 ★ 1★ Medium Medium 3 ★ 87 92 78 90 52 90 \n86
## 4 2304 485 5 ★ 4★ High High 4 ★ 76 86 93 88 64 78 \n163
## 5 2175 451 5 ★ 5★ High Medium 5 ★ 91 85 86 94 36 59 \n273
## 6 2195 457 4 ★ 4★ High Medium 4 ★ 78 91 78 85 43 82 \n182
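Next, we remove the columns that are not needed for our analysis. The dataset contains 77 columns, many of which are individual attribute ratings (Crossing, Finishing, and so on) that roll up into category subtotals (Attacking, Skill, Movement, Power, Mentality, Defending and Goalkeeping), which in turn sum to Total.Stats. Because our analysis only uses Total.Stats, we drop both the individual attributes and their subtotals (columns 23 to 63, Attacking through GK.Reflexes), along with the photoUrl, playerUrl and Hits columns.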
# Drop the attribute columns summarized by Total.Stats (Attacking through
# GK.Reflexes, i.e. columns 23:63) and the photoUrl, playerUrl and Hits columns
fifaplayer_data1 <- fifaplayer_data %>%
  select(-(Attacking:GK.Reflexes), -c(photoUrl, playerUrl, Hits))
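The claim that the individual stat columns are combined into Total.Stats can be checked directly. The sketch below tests whether the seven category subtotals shown in the head() output (Attacking, Skill, Movement, Power, Mentality, Defending and Goalkeeping) add up to Total.Stats. The helper total_stats_matches() is a hypothetical name, and the three demo rows are copied by hand from the first three players printed above (Messi, Ronaldo and Oblak); they are not new data.

```r
# The seven category subtotals that appear to add up to Total.Stats
category_totals <- c("Attacking", "Skill", "Movement", "Power",
                     "Mentality", "Defending", "Goalkeeping")

# Hypothetical helper: TRUE for each row whose subtotals sum to Total.Stats
total_stats_matches <- function(df) {
  rowSums(df[, category_totals]) == df$Total.Stats
}

# First three players from head(fifaplayer_data): Messi, Ronaldo, Oblak
demo_rows <- data.frame(
  Attacking   = c(429, 437, 95),
  Skill       = c(470, 414, 109),
  Movement    = c(451, 431, 307),
  Power       = c(389, 444, 268),
  Mentality   = c(347, 353, 140),
  Defending   = c(91, 84, 57),
  Goalkeeping = c(54, 58, 437),
  Total.Stats = c(2231, 2221, 1413)
)
all(total_stats_matches(demo_rows))  # TRUE for these three rows
```

Running `all(total_stats_matches(fifaplayer_data))` on the raw data frame (which still has these columns) would extend the check to every player.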
str(fifaplayer_data1)
## 'data.frame': 18979 obs. of 33 variables:
## $ LongName : chr "Lionel Messi" "C. Ronaldo dos Santos Aveiro" "Jan Oblak" "Kevin De Bruyne" ...
## $ Nationality : chr "Argentina" "Portugal" "Slovenia" "Belgium" ...
## $ Positions : chr "RW ST CF" "ST LW" "GK" "CAM CM" ...
## $ Name : chr "L. Messi" "Cristiano Ronaldo" "J. Oblak" "K. De Bruyne" ...
## $ Age : int 33 35 27 29 28 31 21 27 28 28 ...
## $ X.OVA : int 93 92 91 91 91 91 90 90 90 90 ...
## $ POT : int 93 92 93 91 91 91 95 91 90 90 ...
## $ Team...Contract: chr "\n\n\n\nFC Barcelona\n2004 ~ 2021\n\n" "\n\n\n\nJuventus\n2018 ~ 2022\n\n" "\n\n\n\nAtlético Madrid\n2014 ~ 2023\n\n" "\n\n\n\nManchester City\n2015 ~ 2023\n\n" ...
## $ ID : int 158023 20801 200389 192985 190871 188545 231747 212831 209331 208722 ...
## $ Height : chr "5'7\"" "6'2\"" "6'2\"" "5'11\"" ...
## $ Weight : chr "159lbs" "183lbs" "192lbs" "154lbs" ...
## $ foot : chr "Left" "Right" "Right" "Right" ...
## $ BOV : int 93 92 91 91 91 91 91 90 90 90 ...
## $ BP : chr "RW" "ST" "GK" "CAM" ...
## $ Growth : int 0 0 2 0 0 0 5 1 0 0 ...
## $ Joined : chr "Jul 1, 2004" "Jul 10, 2018" "Jul 16, 2014" "Aug 30, 2015" ...
## $ Loan.Date.End : chr "N/A" "N/A" "N/A" "N/A" ...
## $ Value : chr "€67.5M" "€46M" "€75M" "€87M" ...
## $ Wage : chr "€560K" "€220K" "€125K" "€370K" ...
## $ Release.Clause : chr "€138.4M" "€75.9M" "€159.4M" "€161M" ...
## $ Total.Stats : int 2231 2221 1413 2304 2175 2195 2147 1389 2211 2203 ...
## $ Base.Stats : int 466 464 489 485 451 457 466 490 470 469 ...
## $ W.F : chr "4 ★" "4 ★" "3 ★" "5 ★" ...
## $ SM : chr "4★" "5★" "1★" "4★" ...
## $ A.W : chr "Medium" "High" "Medium" "High" ...
## $ D.W : chr "Low" "Low" "Medium" "High" ...
## $ IR : chr "5 ★" "5 ★" "3 ★" "4 ★" ...
## $ PAC : int 85 89 87 76 91 78 96 86 93 94 ...
## $ SHO : int 92 93 92 86 85 91 86 88 86 85 ...
## $ PAS : int 91 81 78 93 86 78 78 85 81 80 ...
## $ DRI : int 95 89 90 88 94 85 91 89 90 90 ...
## $ DEF : int 38 35 52 64 36 43 39 51 45 44 ...
## $ PHY : int 65 77 90 78 59 82 76 91 75 76 ...
head(fifaplayer_data1)
## LongName Nationality Positions Name Age
## 1 Lionel Messi Argentina RW ST CF L. Messi 33
## 2 C. Ronaldo dos Santos Aveiro Portugal ST LW Cristiano Ronaldo 35
## 3 Jan Oblak Slovenia GK J. Oblak 27
## 4 Kevin De Bruyne Belgium CAM CM K. De Bruyne 29
## 5 Neymar da Silva Santos Jr. Brazil LW CAM Neymar Jr 28
## 6 Robert Lewandowski Poland ST R. Lewandowski 31
## X.OVA POT Team...Contract ID Height Weight
## 1 93 93 \n\n\n\nFC Barcelona\n2004 ~ 2021\n\n 158023 5'7" 159lbs
## 2 92 92 \n\n\n\nJuventus\n2018 ~ 2022\n\n 20801 6'2" 183lbs
## 3 91 93 \n\n\n\nAtlético Madrid\n2014 ~ 2023\n\n 200389 6'2" 192lbs
## 4 91 91 \n\n\n\nManchester City\n2015 ~ 2023\n\n 192985 5'11" 154lbs
## 5 91 91 \n\n\n\nParis Saint-Germain\n2017 ~ 2022\n\n 190871 5'9" 150lbs
## 6 91 91 \n\n\n\nFC Bayern München\n2014 ~ 2023\n\n 188545 6'0" 176lbs
## foot BOV BP Growth Joined Loan.Date.End Value Wage Release.Clause
## 1 Left 93 RW 0 Jul 1, 2004 N/A €67.5M €560K €138.4M
## 2 Right 92 ST 0 Jul 10, 2018 N/A €46M €220K €75.9M
## 3 Right 91 GK 2 Jul 16, 2014 N/A €75M €125K €159.4M
## 4 Right 91 CAM 0 Aug 30, 2015 N/A €87M €370K €161M
## 5 Right 91 LW 0 Aug 3, 2017 N/A €90M €270K €166.5M
## 6 Right 91 ST 0 Jul 1, 2014 N/A €80M €240K €132M
## Total.Stats Base.Stats W.F SM A.W D.W IR PAC SHO PAS DRI DEF PHY
## 1 2231 466 4 ★ 4★ Medium Low 5 ★ 85 92 91 95 38 65
## 2 2221 464 4 ★ 5★ High Low 5 ★ 89 93 81 89 35 77
## 3 1413 489 3 ★ 1★ Medium Medium 3 ★ 87 92 78 90 52 90
## 4 2304 485 5 ★ 4★ High High 4 ★ 76 86 93 88 64 78
## 5 2175 451 5 ★ 5★ High Medium 5 ★ 91 85 86 94 36 59
## 6 2195 457 4 ★ 4★ High Medium 4 ★ 78 91 78 85 43 82
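The Team...Contract column still packs the club name and the contract years into one string padded with newline characters. A minimal sketch of one way to split it, using stringr (attached by tidyverse above). The helper split_team_contract() and the output columns Team, Contract.Start and Contract.End are hypothetical names, and any row whose contract text does not follow the "YYYY ~ YYYY" pattern seen in the output above simply gets NA for both years.

```r
library(stringr)

# Hypothetical helper: split the newline-padded Team...Contract text
# (e.g. "\n\n\n\nFC Barcelona\n2004 ~ 2021\n\n") into separate columns
split_team_contract <- function(df) {
  trimmed <- str_trim(df$Team...Contract)             # drop the leading/trailing "\n" padding
  years <- str_match(trimmed, "(\\d{4}) ~ (\\d{4})")  # columns: full match, start year, end year
  df$Team <- str_extract(trimmed, "^[^\\n]+")         # first line holds the club name
  df$Contract.Start <- as.integer(years[, 2])         # NA when there is no "YYYY ~ YYYY" match
  df$Contract.End <- as.integer(years[, 3])
  df$Team...Contract <- NULL                          # remove the original combined column
  df
}
```

For example, `team_info <- split_team_contract(fifaplayer_data1)` returns a copy of the data with the three new columns in place of Team...Contract, leaving fifaplayer_data1 itself unchanged for the analysis below.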
We now analyze the data by comparing each player's total stats with the value of their contract (the Value column, which in FIFA is the player's estimated market value and serves here as a proxy for contract value), and by computing the ratio of the two. Players such as Lionel Messi and Kylian Mbappé have been given exorbitant amounts of money in their contracts with their teams. Comparing their total stats with their value should give some insight into whether the amount of money in a player's contract reflects how well they perform.
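# Value is stored as text such as "€67.5M" or "€560K": strip the currency symbol and
# suffix, then rescale "K" (thousands) amounts so that every value is in millions of euros
value_num <- as.numeric(gsub("[^0-9.]", "", fifaplayer_data1$Value))
fifaplayer_data1$Value <- ifelse(grepl("K$", fifaplayer_data1$Value), value_num / 1000, value_num)
fifaplayer_data1$Total.Stats <- as.numeric(fifaplayer_data1$Total.Stats)
# Ratio of total stats to value in millions (a value of 0 yields Inf)
fifaplayer_data1$Proportion <- fifaplayer_data1$Total.Stats / fifaplayer_data1$Value
complete_rows <- complete.cases(fifaplayer_data1[, c("Total.Stats", "Value")])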
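The Value, Wage and Release.Clause columns all store euro amounts as text with an M (millions) or K (thousands) suffix, so stripping every non-digit character on its own would read "€560K" as 560 rather than 0.56 million. A minimal sketch of a reusable converter, assuming M and K are the only suffixes that occur and treating amounts with no suffix (such as "€0") as plain euros; the helper name parse_euro_millions() is hypothetical.

```r
# Hypothetical helper: convert FIFA money strings such as "€67.5M", "€560K"
# or "€0" into numbers expressed in millions of euros
parse_euro_millions <- function(x) {
  amount <- as.numeric(gsub("[^0-9.]", "", x))   # keep only digits and the decimal point
  divisor <- ifelse(grepl("M$", x), 1,           # "M" amounts are already in millions
             ifelse(grepl("K$", x), 1000,        # "K" amounts are thousands
                    1e6))                        # no suffix: plain euros
  amount / divisor
}
```

Applied to the raw text columns, for example `fifaplayer_data %>% mutate(across(c(Value, Wage, Release.Clause), parse_euro_millions))`, it would convert all three money columns in one step.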
We first summarize the Total.Stats column and see that the average total stats score is 1595 (median 1627), which gives a baseline for gauging a player's performance at a glance. We then create a scatter plot to compare each player's contract value with their total stats in FIFA.
summary(fifaplayer_data1$`Total.Stats`)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 747 1452 1627 1595 1781 2316
# Scatter plot of contract value (millions of euros) against total stats
ggplot(fifaplayer_data1[complete_rows, ], aes(x = Total.Stats, y = Value)) +
  geom_point() +
  labs(title = "Total Stats vs. Contract Value",
       x = "Total Stats",
       y = "Contract Value (in millions)")
Based on the scatter plot, players with high contract values tend to have high total stats in FIFA, suggesting a positive association. The converse does not hold, however: players with high statistics do not necessarily have high contract values. This suggests that while player performance is a significant factor in contract value, it is not the sole determinant. Other factors, such as a player's previous league experience, media exposure, and early career trajectory (e.g., child prodigy status), may also influence contract values substantially. One caveat is that Total.Stats sums every attribute category, most of which are outfield skills, so goalkeepers score low on it (Jan Oblak has 1413 total stats against a €75M value); the comparison is therefore less meaningful for that position. Overall, the finding highlights the multifaceted nature of contract valuation in professional sports, where on-field performance interacts with various external factors to determine a player's financial worth.
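The conclusion above is drawn from the scatter plot alone. A correlation coefficient would put a number on the association; the sketch below reports both Pearson's r and Spearman's rank correlation, the latter being less sensitive to extreme values such as the very highest valuations. It assumes Value has already been converted to numeric millions and that goalkeepers have BP equal to "GK", as in the output above. The helper value_stats_correlation() is a hypothetical name, and no results are claimed here.

```r
# Hypothetical helper: summarize how strongly Value tracks Total.Stats.
# Assumes Value is numeric (millions of euros) and BP is "GK" for goalkeepers.
value_stats_correlation <- function(df, exclude_gk = FALSE) {
  if (exclude_gk) df <- df[which(df$BP != "GK"), ]        # optionally drop goalkeepers
  ok <- complete.cases(df[, c("Total.Stats", "Value")])   # ignore rows with missing values
  c(pearson  = cor(df$Total.Stats[ok], df$Value[ok]),
    spearman = cor(df$Total.Stats[ok], df$Value[ok], method = "spearman"))
}
```

Comparing `value_stats_correlation(fifaplayer_data1)` with `value_stats_correlation(fifaplayer_data1, exclude_gk = TRUE)` would show how much the goalkeeper caveat affects the overall picture.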