Unit 3 FLS

Left Midfielders (LM) versus Left Forwards (LF) analysis.

# Create dataframe that only contains Left Midfielders and Left Forwards:

library(plotly)
library(GGally)
library(dplyr)  # loaded explicitly for %>%, filter(), select() and mutate()
playerdata = read.csv("C:/Users/gerde/Downloads/FIFA Players.csv")
LMvLF = playerdata %>% filter(Position %in% c("LF", "LM"))

ggpairs output for LF and LM Agility and Acceleration

ggpairs(LMvLF,columns = c("Acceleration","Agility"),mapping = ggplot2::aes(color = Position))

# Based on the visuals, there is a positive correlation between agility and acceleration.
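
The visual impression can be backed with a number. Below is a minimal sketch that computes the Pearson correlation within each position. It assumes a small made-up data frame named toy (not the FIFA data) that only borrows the column names, so the printed values say nothing about the real players.

```r
# Made-up illustrative values, NOT taken from the FIFA data set
toy <- data.frame(
  Position     = c("LM", "LM", "LM", "LM", "LF", "LF", "LF", "LF"),
  Acceleration = c(70, 82, 90, 65, 88, 75, 93, 80),
  Agility      = c(72, 80, 91, 68, 85, 79, 95, 78)
)

# Split the rows by position and compute Pearson's r for each group
cor_by_position <- sapply(
  split(toy, toy$Position),
  function(d) cor(d$Acceleration, d$Agility)
)
print(cor_by_position)
```

With the real data, replacing toy with LMvLF should give per-position coefficients comparable to the ones ggpairs prints in its correlation panel.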

T-Test of the means of LF and LM Agility

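1) Hypotheses:

H0: mu_LM - mu_LF = 0 (mean agility is the same for LM and LF)

Ha: mu_LM - mu_LF != 0 (mean agility differs between LM and LF)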

2) The selected significance level is 0.05

3) Collect Data:

lmagi = LMvLF %>% filter(Position == "LM") %>% select("ID","Position","Agility")
lfagi = LMvLF %>% filter(Position == "LF") %>% select("ID","Position","Agility")
head(lmagi)
##       ID Position Agility
## 1 188567       LM      76
## 2 208722       LM      91
## 3 190483       LM      93
## 4 188350       LM      86
## 5 193747       LM      74
## 6 184267       LM      92
head(lfagi)
##       ID Position Agility
## 1 183277       LF      95
## 2 211110       LF      91
## 3     41       LF      79
## 4 198164       LF      89
## 5 190577       LF      87
## 6 204713       LF      84

4) Perform the T-Test:

t.test(lmagi$Agility, lfagi$Agility, alternative = "two.sided", var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  lmagi$Agility and lfagi$Agility
## t = -1.8109, df = 1108, p-value = 0.07043
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -9.0741435  0.3636412
## sample estimates:
## mean of x mean of y 
##  75.37808  79.73333

5) Decision:

Because the p-value from the two-sided t-test (0.07043) is greater than the significance level of 0.05, we fail to reject the null hypothesis that the LF and LM mean agility scores are equal.

6) Conclusion:

At the 0.05 significance level, the two-sample t-test does not provide sufficient evidence that mean agility scores differ between left midfielders and left forwards (p-value = 0.07043; 95% CI for the LM minus LF difference in means: -9.074 to 0.364).
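
As a quick consistency check, the reported p-value and confidence interval can be reproduced from the t statistic, degrees of freedom, and group means printed by t.test(). This sketch uses only those rounded printed values, so small rounding differences are expected. The variable names are illustrative.

```r
# Values copied from the t.test() output above
t_stat   <- -1.8109
deg_free <- 1108
mean_lm  <- 75.37808
mean_lf  <- 79.73333

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), deg_free)

# Standard error recovered from t = (difference in means) / SE
diff_means <- mean_lm - mean_lf
std_error  <- diff_means / t_stat

# 95% confidence interval for the difference (LM minus LF)
margin <- qt(0.975, deg_free) * std_error
ci     <- c(diff_means - margin, diff_means + margin)

print(p_value)
print(ci)
```

Both results should agree with the t.test() output to about three decimal places. The interval contains 0, which matches the decision not to reject the null hypothesis.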

Are the assumptions of this test reasonably met?

lmagi %>% ggplot(aes(x=Agility)) + geom_histogram(fill = "blue" ,color = "black") + labs(title = "Left Midfielder Agility Scores")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

lfagi %>% ggplot(aes(x=Agility)) + geom_histogram(fill = "blue" ,color = "black") + labs(title = "Left Forward Agility Scores")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

I would not consider the assumptions reasonably met, because the two groups differ drastically in sample size: LF has only 15 players versus 1095 for LM. With only 15 LF scores, the histogram gives little basis for judging whether that group is approximately normal.
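
One common way to relax the equal-variance assumption is Welch's t-test, which t.test() runs when var.equal = FALSE (its default). Below is a minimal sketch on simulated scores. The means, standard deviations, and seed are assumptions chosen only to mimic the two group sizes, not values from the FIFA data.

```r
set.seed(42)  # arbitrary seed so the simulation is repeatable

# Simulated agility scores with the same group sizes as LM and LF
lm_sim <- rnorm(1095, mean = 75, sd = 10)
lf_sim <- rnorm(15,   mean = 80, sd = 6)

# Pooled test (assumes equal variances) versus Welch test (does not)
pooled <- t.test(lm_sim, lf_sim, var.equal = TRUE)
welch  <- t.test(lm_sim, lf_sim, var.equal = FALSE)

# Welch adjusts the degrees of freedom downward when the variances differ
print(c(pooled_df = unname(pooled$parameter), welch_df = unname(welch$parameter)))
print(c(pooled_p = pooled$p.value, welch_p = welch$p.value))
```

On the real data the equivalent call would be t.test(lmagi$Agility, lfagi$Agility, var.equal = FALSE). Comparing its result with the pooled test is one way to see how much the conclusion depends on the equal-variance assumption.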

Select/create at least 2 categorical variables and select two continuous variables and perform an EDA. Also, at least one of the categorical variables should be created from a continuous variable (using the cut() function).

playerdata = read.csv("C:/Users/gerde/Downloads/FIFA Players.csv")
playerstats = playerdata %>% select("ID","Name","Age","Position","Jersey.Number","Height","Weight","Aggression","Acceleration","SprintSpeed","Agility","ShotPower")
Aggression.Level = cut(playerstats$Aggression, breaks = c(1,33,66,100), labels = c("Low","Medium","High"), include.lowest = TRUE)  # include.lowest keeps a score of exactly 1 in "Low" instead of NA

ps = playerstats %>% mutate(Aggression.Level = Aggression.Level)
ps$Weight = as.numeric(gsub("lbs","",ps$Weight))
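
A note on how cut() treats the break points: by default the intervals are open on the left and closed on the right, so a value equal to the lowest break falls outside every bin and becomes NA unless include.lowest = TRUE is set. The sketch below shows this with a few hand-picked boundary scores (illustrative values, not FIFA data).

```r
scores      <- c(1, 33, 34, 66, 67, 100)
break_pts   <- c(1, 33, 66, 100)
level_names <- c("Low", "Medium", "High")

# Default: intervals are (1,33], (33,66], (66,100], so a score of 1 is excluded
default_bins <- cut(scores, breaks = break_pts, labels = level_names)

# include.lowest = TRUE closes the first interval: [1,33]
closed_bins <- cut(scores, breaks = break_pts, labels = level_names,
                   include.lowest = TRUE)

print(data.frame(scores, default_bins, closed_bins))
```

The two columns differ only for the score of 1, which is NA in the default binning and "Low" once the first interval is closed.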

ggpairs(ps,columns = c("Acceleration","ShotPower","Weight"),mapping = ggplot2::aes(color = Aggression.Level), title = "Acceleration v. ShotPower v. Weight by Aggression Level")
## Warning: Removed 48 rows containing non-finite outside the scale range
## (`stat_density()`).
## Warning in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, :
## Removed 48 rows containing missing values
## Warning in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, :
## Removed 48 rows containing missing values
## Warning: Removed 48 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 48 rows containing non-finite outside the scale range
## (`stat_density()`).
## Warning in ggally_statistic(data = data, mapping = mapping, na.rm = na.rm, :
## Removed 48 rows containing missing values
## Warning: Removed 48 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Removed 48 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 48 rows containing non-finite outside the scale range
## (`stat_density()`).
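
Each warning reports 48 removed rows, which suggests the same incomplete rows are being dropped in every panel. One option is to drop those rows once, before plotting. The sketch below does that on a made-up data frame (toy_ps is a hypothetical stand-in for ps) so that it runs on its own.

```r
# Hypothetical stand-in for ps; values are invented for illustration
toy_ps <- data.frame(
  Acceleration = c(70, NA, 85, 60),
  ShotPower    = c(65, NA, 78, NA),
  Weight       = c(170, NA, 165, 180)
)

plot_cols <- c("Acceleration", "ShotPower", "Weight")

# Keep only rows where every plotted column is present
complete_ps <- toy_ps[complete.cases(toy_ps[, plot_cols]), ]

# Number of rows dropped because of missing values
print(nrow(toy_ps) - nrow(complete_ps))
```

Passing a filtered data frame like complete_ps to ggpairs should avoid the repeated warnings, and it makes the number of excluded players explicit.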

ggplot(data = ps, aes(x= Aggression, y = Age,color = Aggression.Level)) + geom_jitter()
## Warning: Removed 48 rows containing missing values or values outside the scale range
## (`geom_point()`).

ggplot(data = ps, aes(x = Age, fill = Aggression.Level)) + geom_histogram(color = "black") + labs(title = "Number of players by Age and Aggression Level")  # title belongs in labs(), not aes()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(data = ps, aes(x = Age, fill = Aggression.Level)) + geom_histogram(color = "black") + facet_wrap(~Position)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
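
The stat_bin() message appears because no bin width was given. Since Age is recorded in whole years, a binwidth of 1 is a natural choice. Below is a sketch, assuming ggplot2 is installed and using invented ages rather than the FIFA data.

```r
library(ggplot2)

# Invented ages for illustration only
toy_ages <- data.frame(Age = c(19, 21, 21, 23, 24, 24, 24, 27, 30, 33))

# One bar per year of age; an explicit binwidth avoids the stat_bin() message
age_hist <- ggplot(toy_ages, aes(x = Age)) +
  geom_histogram(binwidth = 1, fill = "blue", color = "black") +
  labs(title = "Number of players by Age")

print(age_hist)
```

The same binwidth = 1 argument can be added to the geom_histogram() calls above. For the Agility histograms a wider bin may read better, which is a judgment call.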

filtered_ps = ps[ps$Position == "" | is.na(ps$Position), ]

# Print the filtered data frame to show who has no data for column "Position"
print(filtered_ps)
##           ID            Name Age Position Jersey.Number Height Weight
## 5019  153160       R. Raldes  37                     NA   5'11    172
## 6737  175393         J. Arce  33                     NA    5'9    154
## 7923  195905    L. Gutiérrez  33                     NA   5'11    190
## 9906  226044       R. Vargas  23                     NA    5'7    143
## 10629 216751     D. Bejarano  26                     NA    5'9    154
## 13237 177971      J. McNulty  33                     NA            NA
## 13238 195380      J. Barrera  29                     NA            NA
## 13239 139317        J. Stead  35                     NA            NA
## 13240 240437     A. Semprini  20                     NA            NA
## 13241 209462      R. Bingham  24                     NA            NA
## 13242 219702    K. Dankowski  21                     NA            NA
## 13243 225590       I. Colman  23                     NA            NA
## 13244 233782       M. Feeney  19                     NA            NA
## 13245 239158        R. Minor  30                     NA            NA
## 13246 242998          Klauss  21                     NA            NA
## 13247 244022      I. Sissoko  22                     NA            NA
## 13248 189238         F. Hart  28                     NA            NA
## 13249 211511   L. McCullough  24                     NA            NA
## 13250 224055       Li Yunqiu  27                     NA            NA
## 13251 244535       F. Garcia  29                     NA            NA
## 13252 134968    R. Haemhouts  34                     NA            NA
## 13253 225336       E. Binaku  22                     NA            NA
## 13254 171320       G. Miller  31                     NA            NA
## 13255 246328      A. Aidonis  17                     NA            NA
## 13256 196921        L. Sowah  25                     NA            NA
## 13257 202809       R. Deacon  26                     NA            NA
## 13258 226617   Jang Hyun Soo  25                     NA            NA
## 13259 230713     A. Al Malki  23                     NA            NA
## 13260 234809     E. Guerrero  27                     NA            NA
## 13261 246073         Hernáiz  20                     NA            NA
## 13262 221498   H. Al Mansour  25                     NA            NA
## 13263 244026         H. Paul  24                     NA            NA
## 13264 244538        S. Bauer  25                     NA            NA
## 13265 201019      M. Chergui  29                     NA            NA
## 13266 221499      D. Gardner  28                     NA            NA
## 13267 237371    L. Bengtsson  20                     NA            NA
## 13268 242491    F. Jaramillo  22                     NA            NA
## 13269 153148      L. Gargu_a  37                     NA            NA
## 13270 244540       S. Rivera  26                     NA            NA
## 13271 245564        Vinicius  19                     NA            NA
## 13272 213821    F. Sepúlveda  26                     NA            NA
## 13273 240701       L. Spence  22                     NA            NA
## 13274 242237      B. Lepistu  25                     NA            NA
## 13275 244029     A. Abruscia  27                     NA            NA
## 13276 244541     E. González  23                     NA            NA
## 13277 211006      M. Al Amri  26                     NA            NA
## 13278 215102    J. Rebolledo  26                     NA            NA
## 13279 246078      C. Mamengi  17                     NA            NA
## 13280 239679    P. Mazzocchi  22                     NA            NA
## 13281 244543       Y. Ammour  19                     NA            NA
## 13282 212800  Jwa Joon Hyeop  27                     NA            NA
## 13283 231232      O. Marrufo  25                     NA            NA
## 13284 232256     Han Pengfei  25                     NA            NA
## 16451 193911         S. Paul  31                     NA    6'1    172
## 16540 245167 L. Lalruatthara  23                     NA   5'11    143
## 16794 228192      E. Lyngdoh  31                     NA    5'9    150
## 17130 228198        J. Singh  26                     NA    5'7    159
## 17340 233526        S. Passi  23                     NA    5'9    143
## 17437 236452  D. Lalhlimpuia  20                     NA    6'0    168
## 17540 234508        C. Singh  21                     NA    6'3    174
##       Aggression Acceleration SprintSpeed Agility ShotPower Aggression.Level
## 5019          74           47          46      59        74             High
## 6737          48           71          74      73        61           Medium
## 7923          76           64          61      68        51             High
## 9906          26           71          73      79        62              Low
## 10629         57           68          61      54        24           Medium
## 13237         NA           NA          NA      NA        NA             <NA>
## 13238         NA           NA          NA      NA        NA             <NA>
## 13239         NA           NA          NA      NA        NA             <NA>
## 13240         NA           NA          NA      NA        NA             <NA>
## 13241         NA           NA          NA      NA        NA             <NA>
## 13242         NA           NA          NA      NA        NA             <NA>
## 13243         NA           NA          NA      NA        NA             <NA>
## 13244         NA           NA          NA      NA        NA             <NA>
## 13245         NA           NA          NA      NA        NA             <NA>
## 13246         NA           NA          NA      NA        NA             <NA>
## 13247         NA           NA          NA      NA        NA             <NA>
## 13248         NA           NA          NA      NA        NA             <NA>
## 13249         NA           NA          NA      NA        NA             <NA>
## 13250         NA           NA          NA      NA        NA             <NA>
## 13251         NA           NA          NA      NA        NA             <NA>
## 13252         NA           NA          NA      NA        NA             <NA>
## 13253         NA           NA          NA      NA        NA             <NA>
## 13254         NA           NA          NA      NA        NA             <NA>
## 13255         NA           NA          NA      NA        NA             <NA>
## 13256         NA           NA          NA      NA        NA             <NA>
## 13257         NA           NA          NA      NA        NA             <NA>
## 13258         NA           NA          NA      NA        NA             <NA>
## 13259         NA           NA          NA      NA        NA             <NA>
## 13260         NA           NA          NA      NA        NA             <NA>
## 13261         NA           NA          NA      NA        NA             <NA>
## 13262         NA           NA          NA      NA        NA             <NA>
## 13263         NA           NA          NA      NA        NA             <NA>
## 13264         NA           NA          NA      NA        NA             <NA>
## 13265         NA           NA          NA      NA        NA             <NA>
## 13266         NA           NA          NA      NA        NA             <NA>
## 13267         NA           NA          NA      NA        NA             <NA>
## 13268         NA           NA          NA      NA        NA             <NA>
## 13269         NA           NA          NA      NA        NA             <NA>
## 13270         NA           NA          NA      NA        NA             <NA>
## 13271         NA           NA          NA      NA        NA             <NA>
## 13272         NA           NA          NA      NA        NA             <NA>
## 13273         NA           NA          NA      NA        NA             <NA>
## 13274         NA           NA          NA      NA        NA             <NA>
## 13275         NA           NA          NA      NA        NA             <NA>
## 13276         NA           NA          NA      NA        NA             <NA>
## 13277         NA           NA          NA      NA        NA             <NA>
## 13278         NA           NA          NA      NA        NA             <NA>
## 13279         NA           NA          NA      NA        NA             <NA>
## 13280         NA           NA          NA      NA        NA             <NA>
## 13281         NA           NA          NA      NA        NA             <NA>
## 13282         NA           NA          NA      NA        NA             <NA>
## 13283         NA           NA          NA      NA        NA             <NA>
## 13284         NA           NA          NA      NA        NA             <NA>
## 16451         28           56          46      65        13              Low
## 16540         52           78          82      70        24           Medium
## 16794         33           67          66      81        63              Low
## 17130         32           86          82      77        68              Low
## 17340         39           66          68      57        50           Medium
## 17437         33           53          58      59        51              Low
## 17540         44           71          73      62        31           Medium
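
Printing every row makes the pattern hard to see: of the 60 players listed above, 48 have no ratings at all. Below is a sketch of summarising that with counts instead, on an invented data frame (toy_players is hypothetical) so that it runs on its own.

```r
# Invented rows for illustration; not FIFA data
toy_players <- data.frame(
  Name     = c("A", "B", "C", "D", "E"),
  Position = c("LM", "", NA, "", "LF"),
  Agility  = c(80, 59, NA, NA, 91)
)

# Flag rows where Position is empty or missing
no_position <- is.na(toy_players$Position) | toy_players$Position == ""

# Count the flagged rows, and how many of them also lack an Agility rating
n_missing_position <- sum(no_position)
n_also_missing_agi <- sum(no_position & is.na(toy_players$Agility))

print(c(no_position = n_missing_position, no_rating = n_also_missing_agi))
```

Applied to ps, the same two counts would separate the players who are only missing a position from those whose records are empty.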

Takeaways

The ggpairs tool is fun to work with and makes it easy to show correlations between multiple variables. I need more time with it to learn its customization options.

I am still unsure how to judge whether the t-test assumptions are reasonably met. I suspect the test is affected by sample size, and especially by highly unequal group sizes as in this problem, but I am not sure how to make that connection.

EDAs are fun to do, and I do them regularly in my current job. To do a better EDA on the FIFA data, I would need to better understand what each column represents and why it matters to the overall picture the data presents.

The FIFA data comes from the video game, where the values control how each player behaves. At first I was curious how the data was generated, but the more I looked at it, the more it looked like a role-playing game character sheet.

Questions

The more I use R, the more I want to know. What would be a good online resource for finding more ways to do the things we are doing, and for adding more visual flair?