Modern football performance is characterized by a wide range of technical, physical, and mental attributes, resulting in highly multidimensional player evaluation data. Such high-dimensional datasets often contain strong correlations between variables, which complicates interpretation and comparative analysis. Dimensionality reduction techniques are therefore essential for uncovering the underlying structure of player performance while preserving the most relevant information.
This project applies Principal Component Analysis (PCA) to player attribute data derived from EA Sports FC 26. The objective is to reduce the dimensionality of the dataset and identify a smaller number of latent components that summarize the key performance characteristics of football players. By transforming correlated attributes into a set of uncorrelated principal components, PCA enables a more compact and interpretable representation of player profiles.
The analysis focuses on evaluating the correlation structure of the data, determining the appropriate number of components using established criteria, and interpreting the retained components in the context of football performance. The results demonstrate that a substantial proportion of the total variance can be captured by a limited number of components, providing meaningful insights into the fundamental dimensions of player ability.
The dataset consists of information on 18405 professional football players featured in the video game EA Sports FC 26. Each player is described using 35 quantitative attributes that reflect various aspects of football performance. The data were scraped from sofifa.com using the publicly available GitHub repository: github.com/rovnez/fifa-data/.
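The player attributes can be categorized into seven feature groups, which correspond to the column-name prefixes in the data: six summary ratings (pace, shooting, passing, dribbling, defending, physic) and the detailed attacking (5 attributes), skill (5), movement (5), power (5), mentality (6), and defending (3) attributes.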
All 35 features are measured on an interval scale ranging from 0 to 99, where higher values indicate stronger performance in the corresponding attribute:
library(corrplot)
library(ggplot2)
library(caret)
library(factoextra)
library(gridExtra)
player_stats <- read.csv('fc26_data.csv')
summary(player_stats)
## pace shooting passing dribbling
## Min. :30.00 Min. :21.00 Min. :25.00 Min. :22.00
## 1st Qu.:62.00 1st Qu.:42.00 1st Qu.:51.00 1st Qu.:57.00
## Median :69.00 Median :55.00 Median :58.00 Median :64.00
## Mean :68.37 Mean :52.81 Mean :57.61 Mean :62.88
## 3rd Qu.:75.00 3rd Qu.:63.00 3rd Qu.:64.00 3rd Qu.:69.00
## Max. :97.00 Max. :92.00 Max. :92.00 Max. :93.00
## NA's :2062 NA's :2062 NA's :2062 NA's :2062
## defending physic attacking_crossing attacking_finishing
## Min. :15.00 Min. :32.00 Min. : 6.00 Min. : 4.00
## 1st Qu.:38.00 1st Qu.:58.00 1st Qu.:38.00 1st Qu.:31.00
## Median :57.00 Median :66.00 Median :53.00 Median :52.00
## Mean :51.95 Mean :64.76 Mean :49.14 Mean :46.68
## 3rd Qu.:64.00 3rd Qu.:72.00 3rd Qu.:62.00 3rd Qu.:63.00
## Max. :90.00 Max. :91.00 Max. :93.00 Max. :94.00
## NA's :2062 NA's :2062
## attacking_heading_accuracy attacking_short_passing attacking_volleys
## Min. : 6.00 Min. :12.00 Min. : 3.00
## 1st Qu.:44.00 1st Qu.:55.00 1st Qu.:30.00
## Median :55.00 Median :62.00 Median :44.00
## Mean :51.83 Mean :59.29 Mean :42.39
## 3rd Qu.:64.00 3rd Qu.:68.00 3rd Qu.:56.00
## Max. :93.00 Max. :93.00 Max. :92.00
##
## skill_dribbling skill_curve skill_fk_accuracy skill_long_passing
## Min. : 5.00 Min. : 7.00 Min. : 7.00 Min. : 9.00
## 1st Qu.:51.00 1st Qu.:36.00 1st Qu.:31.00 1st Qu.:46.00
## Median :61.00 Median :50.00 Median :42.00 Median :57.00
## Mean :55.98 Mean :47.67 Mean :42.29 Mean :53.98
## 3rd Qu.:68.00 3rd Qu.:61.00 3rd Qu.:54.00 3rd Qu.:64.00
## Max. :94.00 Max. :93.00 Max. :94.00 Max. :93.00
##
## skill_ball_control movement_acceleration movement_sprint_speed
## Min. : 9.00 Min. :15.00 Min. :13.00
## 1st Qu.:55.00 1st Qu.:57.00 1st Qu.:58.00
## Median :63.00 Median :67.00 Median :68.00
## Mean :58.78 Mean :64.46 Mean :64.66
## 3rd Qu.:69.00 3rd Qu.:75.00 3rd Qu.:75.00
## Max. :94.00 Max. :97.00 Max. :97.00
##
## movement_agility movement_reactions movement_balance power_shot_power
## Min. :20.00 Min. :30.00 Min. :20.0 Min. :20.00
## 1st Qu.:55.00 1st Qu.:56.00 1st Qu.:55.0 1st Qu.:48.00
## Median :65.00 Median :62.00 Median :66.0 Median :58.00
## Mean :63.02 Mean :61.92 Mean :63.7 Mean :57.47
## 3rd Qu.:74.00 3rd Qu.:68.00 3rd Qu.:74.0 3rd Qu.:67.00
## Max. :94.00 Max. :94.00 Max. :95.0 Max. :94.00
##
## power_jumping power_stamina power_strength power_long_shots
## Min. :27.00 Min. :12.00 Min. :24.00 Min. : 4.00
## 1st Qu.:58.00 1st Qu.:56.00 1st Qu.:57.00 1st Qu.:32.00
## Median :67.00 Median :66.00 Median :65.00 Median :51.00
## Mean :65.62 Mean :62.59 Mean :64.75 Mean :46.64
## 3rd Qu.:74.00 3rd Qu.:74.00 3rd Qu.:74.00 3rd Qu.:62.00
## Max. :95.00 Max. :95.00 Max. :95.00 Max. :91.00
##
## mentality_aggression mentality_interceptions mentality_positioning
## Min. :11.0 Min. : 6.00 Min. : 3.00
## 1st Qu.:46.0 1st Qu.:26.00 1st Qu.:40.00
## Median :59.0 Median :54.00 Median :57.00
## Mean :55.8 Mean :46.76 Mean :50.69
## 3rd Qu.:68.0 3rd Qu.:64.00 3rd Qu.:65.00
## Max. :94.0 Max. :91.00 Max. :95.00
##
## mentality_vision mentality_penalties mentality_composure
## Min. :12.00 Min. : 5.0 Min. :15.00
## 1st Qu.:45.00 1st Qu.:38.0 1st Qu.:50.00
## Median :56.00 Median :48.0 Median :59.00
## Mean :54.61 Mean :47.3 Mean :57.75
## 3rd Qu.:64.00 3rd Qu.:59.0 3rd Qu.:66.00
## Max. :92.00 Max. :93.0 Max. :93.00
##
## defending_marking_awareness defending_standing_tackle defending_sliding_tackle
## Min. : 5.00 Min. : 7.00 Min. : 6.00
## 1st Qu.:28.00 1st Qu.:29.00 1st Qu.:26.00
## Median :52.00 Median :56.00 Median :53.00
## Mean :46.25 Mean :48.44 Mean :46.25
## 3rd Qu.:63.00 3rd Qu.:65.00 3rd Qu.:63.00
## Max. :91.00 Max. :91.00 Max. :89.00
##
dim(player_stats)
## [1] 18405 35
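The summary shows 2062 missing values in each of the six summary attributes (pace, shooting, passing, dribbling, defending, physic). Because standard principal component analysis requires complete observations, all rows containing missing values were removed first.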
# Keep only rows without missing values
player_stats <- player_stats[complete.cases(player_stats), ]
summary(player_stats)
## pace shooting passing dribbling
## Min. :30.00 Min. :21.00 Min. :25.00 Min. :22.00
## 1st Qu.:62.00 1st Qu.:42.00 1st Qu.:51.00 1st Qu.:57.00
## Median :69.00 Median :55.00 Median :58.00 Median :64.00
## Mean :68.37 Mean :52.81 Mean :57.61 Mean :62.88
## 3rd Qu.:75.00 3rd Qu.:63.00 3rd Qu.:64.00 3rd Qu.:69.00
## Max. :97.00 Max. :92.00 Max. :92.00 Max. :93.00
## defending physic attacking_crossing attacking_finishing
## Min. :15.00 Min. :32.00 Min. :20.00 Min. :15.00
## 1st Qu.:38.00 1st Qu.:58.00 1st Qu.:44.00 1st Qu.:39.00
## Median :57.00 Median :66.00 Median :55.00 Median :55.00
## Mean :51.95 Mean :64.76 Mean :53.64 Mean :51.33
## 3rd Qu.:64.00 3rd Qu.:72.00 3rd Qu.:64.00 3rd Qu.:64.00
## Max. :90.00 Max. :91.00 Max. :93.00 Max. :94.00
## attacking_heading_accuracy attacking_short_passing attacking_volleys
## Min. :20.00 Min. :25.00 Min. :15.00
## 1st Qu.:48.00 1st Qu.:58.00 1st Qu.:35.00
## Median :57.00 Median :64.00 Median :46.00
## Mean :56.64 Mean :63.16 Mean :46.45
## 3rd Qu.:65.00 3rd Qu.:69.00 3rd Qu.:57.00
## Max. :93.00 Max. :93.00 Max. :92.00
## skill_dribbling skill_curve skill_fk_accuracy skill_long_passing
## Min. :17.00 Min. :15.00 Min. :15.00 Min. :20.00
## 1st Qu.:56.00 1st Qu.:41.00 1st Qu.:35.00 1st Qu.:50.00
## Median :63.00 Median :52.00 Median :44.00 Median :58.00
## Mean :61.41 Mean :51.91 Mean :45.93 Mean :57.21
## 3rd Qu.:69.00 3rd Qu.:62.00 3rd Qu.:56.00 3rd Qu.:65.00
## Max. :94.00 Max. :93.00 Max. :94.00 Max. :93.00
## skill_ball_control movement_acceleration movement_sprint_speed
## Min. :20.00 Min. :27.00 Min. :30.00
## 1st Qu.:58.00 1st Qu.:62.00 1st Qu.:63.00
## Median :64.00 Median :69.00 Median :69.00
## Mean :63.77 Mean :68.27 Mean :68.43
## 3rd Qu.:70.00 3rd Qu.:76.00 3rd Qu.:76.00
## Max. :94.00 Max. :97.00 Max. :97.00
## movement_agility movement_reactions movement_balance power_shot_power
## Min. :25.00 Min. :31.00 Min. :29.00 Min. :20.00
## 1st Qu.:58.00 1st Qu.:57.00 1st Qu.:60.00 1st Qu.:51.00
## Median :67.00 Median :62.00 Median :68.00 Median :60.00
## Mean :66.29 Mean :62.38 Mean :66.74 Mean :58.84
## 3rd Qu.:75.00 3rd Qu.:68.00 3rd Qu.:75.00 3rd Qu.:68.00
## Max. :94.00 Max. :94.00 Max. :95.00 Max. :94.00
## power_jumping power_stamina power_strength power_long_shots
## Min. :27.00 Min. :27.00 Min. :24.00 Min. :15.00
## 1st Qu.:60.00 1st Qu.:60.00 1st Qu.:58.00 1st Qu.:40.00
## Median :68.00 Median :67.00 Median :66.00 Median :54.00
## Mean :66.94 Mean :66.97 Mean :65.43 Mean :51.22
## 3rd Qu.:75.00 3rd Qu.:75.00 3rd Qu.:74.00 3rd Qu.:63.00
## Max. :95.00 Max. :95.00 Max. :95.00 Max. :91.00
## mentality_aggression mentality_interceptions mentality_positioning
## Min. :24.00 Min. :10.00 Min. :15.00
## 1st Qu.:51.00 1st Qu.:35.00 1st Qu.:48.00
## Median :61.00 Median :57.00 Median :59.00
## Mean :59.67 Mean :50.88 Mean :55.94
## 3rd Qu.:69.00 3rd Qu.:65.00 3rd Qu.:66.00
## Max. :94.00 Max. :91.00 Max. :95.00
## mentality_vision mentality_penalties mentality_composure
## Min. :22.00 Min. :20.00 Min. :30.00
## 1st Qu.:48.00 1st Qu.:42.00 1st Qu.:53.00
## Median :58.00 Median :50.00 Median :60.00
## Mean :56.42 Mean :51.04 Mean :60.13
## 3rd Qu.:65.00 3rd Qu.:60.00 3rd Qu.:67.00
## Max. :92.00 Max. :93.00 Max. :93.00
## defending_marking_awareness defending_standing_tackle defending_sliding_tackle
## Min. :10.00 Min. :10.00 Min. :10.0
## 1st Qu.:36.00 1st Qu.:38.00 1st Qu.:35.0
## Median :56.00 Median :59.00 Median :56.0
## Mean :50.57 Mean :52.85 Mean :50.4
## 3rd Qu.:64.00 3rd Qu.:66.00 3rd Qu.:64.0
## Max. :91.00 Max. :91.00 Max. :89.0
dim(player_stats)
## [1] 16343 35
As a result, the final dataset contains information on 16343 players.
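The cleaning step can be illustrated on a small scale. The following is a minimal sketch on a toy data frame (all values are hypothetical and chosen only to mimic the missing-value pattern seen above); it counts missing values per column before dropping incomplete rows.

```r
# Toy data frame with hypothetical values; two rows lack pace and shooting
toy <- data.frame(
  pace     = c(70, NA, 82, NA, 65),
  shooting = c(55, NA, 71, NA, 40),
  strength = c(60, 75, 58, 80, 66)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Keep only rows without any missing value
toy_complete <- toy[complete.cases(toy), ]
rows_dropped <- nrow(toy) - nrow(toy_complete)
cat("Rows dropped:", rows_dropped, "\n")
```

Checking the per-column counts first makes it visible whether the missing values are concentrated in a few columns, as they are in the player data, or scattered across the whole table.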
Before applying the PCA algorithm, it is important to examine the degree of correlation among the variables.
data_corr <- cor(player_stats, method="pearson")
corrplot(data_corr, tl.cex=0.6)
The correlation matrix reveals substantial dependencies among player attributes, with several groups of variables exhibiting moderate to strong positive correlations. In particular, technical and attacking attributes, movement-related variables, and defensive skills form distinct correlation clusters, indicating the presence of underlying latent dimensions. The observed correlation structure suggests considerable multicollinearity among the variables, thereby justifying the application of principal component analysis as a dimensionality reduction technique.
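Correlation clusters like these can also be listed numerically rather than read off the plot. The sketch below is an illustration only: it uses the built-in `mtcars` data as a stand-in for the player attributes, and the cut-off of 0.8 is an arbitrary assumption, not a value used in this analysis.

```r
# Built-in mtcars stands in for the player data (assumption for illustration)
corr <- cor(mtcars, method = "pearson")

# Keep each pair once by blanking the lower triangle and the diagonal
corr[lower.tri(corr, diag = TRUE)] <- NA

# Positions of pairs whose absolute correlation exceeds a hypothetical cut-off
idx <- which(abs(corr) > 0.8, arr.ind = TRUE)

# Long table of strongly correlated pairs, strongest first
strong_pairs <- data.frame(
  var1 = rownames(corr)[idx[, "row"]],
  var2 = colnames(corr)[idx[, "col"]],
  r    = corr[idx]
)
strong_pairs <- strong_pairs[order(-abs(strong_pairs$r)), ]
print(strong_pairs)
```

Applied to `data_corr`, the same steps would give a ranked list of attribute pairs that complements the visual inspection of the correlation plot.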
preproc <- preProcess(player_stats, method=c("center", "scale"))
player_stats_s <- predict(preproc, player_stats)
summary(player_stats_s)
## pace shooting passing dribbling
## Min. :-3.64929 Min. :-2.2821 Min. :-3.29239 Min. :-4.3565
## 1st Qu.:-0.60615 1st Qu.:-0.7755 1st Qu.:-0.66752 1st Qu.:-0.6267
## Median : 0.05954 Median : 0.1571 Median : 0.03918 Median : 0.1192
## Mean : 0.00000 Mean : 0.0000 Mean : 0.00000 Mean : 0.0000
## 3rd Qu.: 0.63013 3rd Qu.: 0.7310 3rd Qu.: 0.64492 3rd Qu.: 0.6520
## Max. : 2.72229 Max. : 2.8115 Max. : 3.47170 Max. : 3.2096
## defending physic attacking_crossing attacking_finishing
## Min. :-2.2815 Min. :-3.3617 Min. :-2.5047 Min. :-2.2752
## 1st Qu.:-0.8614 1st Qu.:-0.6940 1st Qu.:-0.7177 1st Qu.:-0.7721
## Median : 0.3117 Median : 0.1268 Median : 0.1014 Median : 0.2300
## Mean : 0.0000 Mean : 0.0000 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 0.7439 3rd Qu.: 0.7424 3rd Qu.: 0.7715 3rd Qu.: 0.7937
## Max. : 2.3492 Max. : 2.6918 Max. : 2.9308 Max. : 2.6726
## attacking_heading_accuracy attacking_short_passing attacking_volleys
## Min. :-3.22360 Min. :-4.1876 Min. :-2.21413
## 1st Qu.:-0.76030 1st Qu.:-0.5659 1st Qu.:-0.80594
## Median : 0.03147 Median : 0.0926 Median :-0.03144
## Mean : 0.00000 Mean : 0.0000 Mean : 0.00000
## 3rd Qu.: 0.73527 3rd Qu.: 0.6413 3rd Qu.: 0.74306
## Max. : 3.19858 Max. : 3.2753 Max. : 3.20738
## skill_dribbling skill_curve skill_fk_accuracy skill_long_passing
## Min. :-3.8961 Min. :-2.59419 Min. :-2.2277 Min. :-3.2555
## 1st Qu.:-0.4743 1st Qu.:-0.76673 1st Qu.:-0.7874 1st Qu.:-0.6310
## Median : 0.1398 Median : 0.00643 Median :-0.1393 Median : 0.0689
## Mean : 0.0000 Mean : 0.00000 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 0.6662 3rd Qu.: 0.70930 3rd Qu.: 0.7249 3rd Qu.: 0.6813
## Max. : 2.8597 Max. : 2.88820 Max. : 3.4614 Max. : 3.1308
## skill_ball_control movement_acceleration movement_sprint_speed
## Min. :-4.75401 Min. :-3.72936 Min. :-3.54002
## 1st Qu.:-0.62675 1st Qu.:-0.56635 1st Qu.:-0.50052
## Median : 0.02492 Median : 0.06625 Median : 0.05211
## Mean : 0.00000 Mean : 0.00000 Mean : 0.00000
## 3rd Qu.: 0.67659 3rd Qu.: 0.69885 3rd Qu.: 0.69686
## Max. : 3.28328 Max. : 2.59666 Max. : 2.63108
## movement_agility movement_reactions movement_balance power_shot_power
## Min. :-3.43124 Min. :-3.69501 Min. :-3.1848 Min. :-3.01893
## 1st Qu.:-0.68885 1st Qu.:-0.63336 1st Qu.:-0.5690 1st Qu.:-0.60954
## Median : 0.05908 Median :-0.04459 Median : 0.1060 Median : 0.08996
## Mean : 0.00000 Mean : 0.00000 Mean : 0.0000 Mean : 0.00000
## 3rd Qu.: 0.72390 3rd Qu.: 0.66195 3rd Qu.: 0.6967 3rd Qu.: 0.71174
## Max. : 2.30285 Max. : 3.72360 Max. : 2.3843 Max. : 2.73251
## power_jumping power_stamina power_strength power_long_shots
## Min. :-3.43362 Min. :-3.565615 Min. :-3.36099 Min. :-2.3421
## 1st Qu.:-0.59667 1st Qu.:-0.621641 1st Qu.:-0.60249 1st Qu.:-0.7257
## Median : 0.09108 Median : 0.002838 Median : 0.04657 Median : 0.1795
## Mean : 0.00000 Mean : 0.000000 Mean : 0.00000 Mean : 0.0000
## 3rd Qu.: 0.69285 3rd Qu.: 0.716529 3rd Qu.: 0.69563 3rd Qu.: 0.7614
## Max. : 2.41222 Max. : 2.500756 Max. : 2.39941 Max. : 2.5718
## mentality_aggression mentality_interceptions mentality_positioning
## Min. :-2.7391 Min. :-2.2285 Min. :-2.8772
## 1st Qu.:-0.6659 1st Qu.:-0.8655 1st Qu.:-0.5582
## Median : 0.1019 Median : 0.3339 Median : 0.2148
## Mean : 0.0000 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 0.7162 3rd Qu.: 0.7700 3rd Qu.: 0.7067
## Max. : 2.6358 Max. : 2.1875 Max. : 2.7446
## mentality_vision mentality_penalties mentality_composure
## Min. :-2.7629 Min. :-2.57365 Min. :-3.00836
## 1st Qu.:-0.6761 1st Qu.:-0.74944 1st Qu.:-0.71196
## Median : 0.1265 Median :-0.08609 Median :-0.01306
## Mean : 0.0000 Mean : 0.00000 Mean : 0.00000
## 3rd Qu.: 0.6883 3rd Qu.: 0.74310 3rd Qu.: 0.68585
## Max. : 2.8554 Max. : 3.47941 Max. : 3.28178
## defending_marking_awareness defending_standing_tackle defending_sliding_tackle
## Min. :-2.2913 Min. :-2.3831 Min. :-2.2509
## 1st Qu.:-0.8228 1st Qu.:-0.8259 1st Qu.:-0.8579
## Median : 0.3069 Median : 0.3420 Median : 0.3122
## Mean : 0.0000 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 0.7587 3rd Qu.: 0.7313 3rd Qu.: 0.7580
## Max. : 2.2837 Max. : 2.1216 Max. : 2.1510
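The centering and scaling step above can be cross-checked with base R. The following sketch uses the built-in `USArrests` data as a stand-in (an assumption for illustration) and the base function `scale()` instead of `preProcess()`. It also shows why the step matters for what follows: the covariance matrix of standardized data equals the correlation matrix of the raw data.

```r
# Built-in USArrests stands in for the player attributes (assumption for illustration)
x <- USArrests

# Center each column to mean 0 and scale it to standard deviation 1
x_s <- scale(x, center = TRUE, scale = TRUE)

# Column means are numerically zero and standard deviations are one
print(round(colMeans(x_s), 10))
print(apply(x_s, 2, sd))

# Covariance of the standardized data equals correlation of the raw data
print(all.equal(cov(x_s), cor(x), check.attributes = FALSE))
```

Without standardization, attributes with a larger spread would dominate the leading components simply because of their scale.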
In the context of principal component analysis, the eigenvectors of the covariance matrix represent the orthogonal directions along which the data exhibit the greatest variance. Because the variables were standardized beforehand, this covariance matrix coincides with the correlation matrix of the original attributes. Each eigenvector defines a principal component and corresponds to a linear combination of the original variables. The coefficients of an eigenvector, commonly referred to as loadings, indicate the contribution of each variable to the respective principal component. Eigenvectors are orthogonal to one another, ensuring that the resulting principal components are uncorrelated and capture distinct dimensions of the data structure.
player_stats_cov <- cov(player_stats_s)
player_stats_eigen <- eigen(player_stats_cov)
head(player_stats_eigen$vectors)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] -0.13091801 0.10327800 0.19246498 -0.44289954 0.04801582 -0.18945177
## [2,] -0.23432513 0.08071868 -0.15035398 0.06081625 -0.11904862 -0.02646869
## [3,] -0.23257735 -0.10853384 0.09277772 0.11882206 0.08838166 -0.06331518
## [4,] -0.24777498 -0.01268462 0.07159458 -0.01264111 0.19050074 0.12481468
## [5,] 0.03344752 -0.32118870 0.15638780 0.03123606 -0.06446332 -0.11404401
## [6,] -0.02818010 -0.26397670 -0.23726938 -0.22005220 -0.20109751 0.13567466
## [,7] [,8] [,9] [,10] [,11] [,12]
## [1,] 0.0004378586 -0.16098706 -0.062097424 -0.01508688 -0.09445773 -0.03838649
## [2,] -0.0524458262 -0.22919265 0.063993987 0.07920146 0.10466542 0.05180801
## [3,] 0.1727186697 0.05764461 -0.063772197 -0.09923314 0.04527656 -0.15924432
## [4,] 0.0105173095 0.07269634 0.161509039 0.05931460 0.03775654 0.11312459
## [5,] -0.1056559260 -0.07648913 0.004848678 0.13880464 0.07868747 0.08651911
## [6,] 0.1849201141 0.04382540 0.084788161 -0.17655062 -0.01192083 0.05503545
## [,13] [,14] [,15] [,16] [,17] [,18]
## [1,] -0.01101953 -0.036524861 -0.014770631 -0.044115345 -0.03142472 0.03777432
## [2,] 0.04632439 -0.031419807 -0.036917854 -0.013869836 0.02169966 -0.05521629
## [3,] 0.01679664 -0.121750975 -0.034781962 0.003083505 0.05871849 -0.03531334
## [4,] -0.12451549 0.275578159 0.038755245 0.004888372 -0.09819981 0.04522243
## [5,] -0.04487788 -0.012593513 0.005598766 0.026800238 -0.01394130 0.02971122
## [6,] -0.11025764 0.001051517 -0.050799509 -0.065628552 0.02465322 0.02009157
## [,19] [,20] [,21] [,22] [,23]
## [1,] 0.001959211 0.008473139 0.00418110 9.033082e-05 0.0003642619
## [2,] 0.102566780 0.089882116 -0.13006975 -3.714073e-02 0.1057925222
## [3,] -0.014514287 -0.020173581 0.04736159 6.553020e-02 0.0623700693
## [4,] 0.002653114 -0.076765829 0.03932564 -1.089835e-01 -0.0054100265
## [5,] -0.007074844 -0.004471964 -0.01580564 1.902385e-02 0.0065157571
## [6,] -0.008929651 -0.033940587 -0.02491591 -3.809087e-04 0.0124472006
## [,24] [,25] [,26] [,27] [,28]
## [1,] 0.012666641 0.0093971299 -0.007169066 -0.002673820 -0.017547234
## [2,] -0.154958999 0.0178185405 0.019038005 -0.009300353 0.005578514
## [3,] -0.007637143 0.0006310612 0.113196607 -0.014380564 0.001083597
## [4,] -0.007633813 0.0176873585 0.012596302 -0.002161974 -0.005567428
## [5,] -0.005322801 0.0753531575 0.003783493 0.025770143 -0.125102590
## [6,] -0.005977473 -0.0148454076 0.003996730 0.004710472 -0.026211932
## [,29] [,30] [,31] [,32] [,33]
## [1,] -0.086720798 0.008976716 -0.021657563 0.0047292334 0.803464633
## [2,] -0.006307471 0.009633615 0.009691571 0.0050062230 -0.001146831
## [3,] -0.004752151 -0.099501032 -0.876991880 -0.0139867993 -0.022195372
## [4,] 0.001581349 -0.834699806 0.093348169 0.0418573589 0.011255448
## [5,] 0.001471690 0.009559796 -0.011504574 -0.0005855984 -0.003216995
## [6,] -0.159523368 -0.037615827 0.016733908 -0.7987985933 0.004649758
## [,34] [,35]
## [1,] -0.0008887022 2.439525e-03
## [2,] -0.8582352482 -1.357359e-02
## [3,] -0.0112137272 -1.037352e-02
## [4,] -0.0081900026 1.020609e-02
## [5,] -0.0134776782 8.808894e-01
## [6,] -0.0043501890 -8.424374e-05
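The orthogonality property described above can be checked numerically. This is a minimal sketch on the built-in `USArrests` data (a stand-in chosen for illustration, not the player data); the object names `decomp`, `V`, and `scores` are likewise illustrative.

```r
# Built-in USArrests stands in for the player attributes (assumption for illustration)
x_s <- scale(USArrests)
decomp <- eigen(cov(x_s))
V <- decomp$vectors

# Orthonormality: t(V) %*% V should be the identity matrix
print(round(crossprod(V), 10))

# Principal component scores are linear combinations of the standardized variables
scores <- x_s %*% V

# Scores are uncorrelated, and their variances equal the eigenvalues
print(round(cov(scores), 10))
print(round(decomp$values, 10))
```

The covariance matrix of the scores is diagonal, which is the numerical counterpart of the statement that the principal components are uncorrelated.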
Eigenvalues quantify the amount of variance explained by each corresponding eigenvector. A larger eigenvalue indicates that the associated principal component accounts for a greater proportion of the total variability in the dataset. In PCA, eigenvalues are commonly used to determine the number of components to retain, with criteria such as the Kaiser rule and the cumulative explained variance serving as standard selection methods. Components associated with small eigenvalues contribute little additional information and are often excluded from further analysis.
player_stats_s_pca1 <- prcomp(player_stats_s, center=FALSE, scale.=FALSE)
head(player_stats_s_pca1$rotation)
## PC1 PC2 PC3 PC4 PC5
## pace -0.13091801 -0.10327800 0.19246498 0.44289954 0.04801582
## shooting -0.23432513 -0.08071868 -0.15035398 -0.06081625 -0.11904862
## passing -0.23257735 0.10853384 0.09277772 -0.11882206 0.08838166
## dribbling -0.24777498 0.01268462 0.07159458 0.01264111 0.19050074
## defending 0.03344752 0.32118870 0.15638780 -0.03123606 -0.06446332
## physic -0.02818010 0.26397670 -0.23726938 0.22005220 -0.20109751
## PC6 PC7 PC8 PC9 PC10
## pace -0.18945177 0.0004378586 -0.16098706 0.062097424 -0.01508688
## shooting -0.02646869 -0.0524458262 -0.22919265 -0.063993987 0.07920146
## passing -0.06331518 0.1727186697 0.05764461 0.063772197 -0.09923314
## dribbling 0.12481468 0.0105173095 0.07269634 -0.161509039 0.05931460
## defending -0.11404401 -0.1056559260 -0.07648913 -0.004848678 0.13880464
## physic 0.13567466 0.1849201141 0.04382540 -0.084788161 -0.17655062
## PC11 PC12 PC13 PC14 PC15
## pace 0.09445773 -0.03838649 -0.01101953 -0.036524861 0.014770631
## shooting -0.10466542 0.05180801 0.04632439 -0.031419807 0.036917854
## passing -0.04527656 -0.15924432 0.01679664 -0.121750975 0.034781962
## dribbling -0.03775654 0.11312459 -0.12451549 0.275578159 -0.038755245
## defending -0.07868747 0.08651911 -0.04487788 -0.012593513 -0.005598766
## physic 0.01192083 0.05503545 -0.11025764 0.001051517 0.050799509
## PC16 PC17 PC18 PC19 PC20
## pace 0.044115345 -0.03142472 -0.03777432 0.001959211 0.008473139
## shooting 0.013869836 0.02169966 0.05521629 0.102566780 0.089882116
## passing -0.003083505 0.05871849 0.03531334 -0.014514287 -0.020173581
## dribbling -0.004888372 -0.09819981 -0.04522243 0.002653114 -0.076765829
## defending -0.026800238 -0.01394130 -0.02971122 -0.007074844 -0.004471964
## physic 0.065628552 0.02465322 -0.02009157 -0.008929651 -0.033940587
## PC21 PC22 PC23 PC24 PC25
## pace 0.00418110 9.033082e-05 0.0003642619 0.012666641 -0.0093971299
## shooting -0.13006975 -3.714073e-02 0.1057925222 -0.154958999 -0.0178185405
## passing 0.04736159 6.553020e-02 0.0623700693 -0.007637143 -0.0006310612
## dribbling 0.03932564 -1.089835e-01 -0.0054100265 -0.007633813 -0.0176873585
## defending -0.01580564 1.902385e-02 0.0065157571 -0.005322801 -0.0753531575
## physic -0.02491591 -3.809087e-04 0.0124472006 -0.005977473 0.0148454076
## PC26 PC27 PC28 PC29 PC30
## pace -0.007169066 -0.002673820 -0.017547234 0.086720798 0.008976716
## shooting 0.019038005 -0.009300353 0.005578514 0.006307471 0.009633615
## passing 0.113196607 -0.014380564 0.001083597 0.004752151 -0.099501032
## dribbling 0.012596302 -0.002161974 -0.005567428 -0.001581349 -0.834699806
## defending 0.003783493 0.025770143 -0.125102590 -0.001471690 0.009559796
## physic 0.003996730 0.004710472 -0.026211932 0.159523368 -0.037615827
## PC31 PC32 PC33 PC34 PC35
## pace 0.021657563 0.0047292334 -0.803464633 -0.0008887022 -2.439525e-03
## shooting -0.009691571 0.0050062230 0.001146831 -0.8582352482 1.357359e-02
## passing 0.876991880 -0.0139867993 0.022195372 -0.0112137272 1.037352e-02
## dribbling -0.093348169 0.0418573589 -0.011255448 -0.0081900026 -1.020609e-02
## defending 0.011504574 -0.0005855984 0.003216995 -0.0134776782 -8.808894e-01
## physic -0.016733908 -0.7987985933 -0.004649758 -0.0043501890 8.424374e-05
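Several columns of the rotation matrix above have the opposite sign of the corresponding eigenvectors from `eigen()` (for example PC2 and PC4). This is expected: an eigenvector is only defined up to sign, so both results describe the same components. A small sketch on the built-in `USArrests` data (a stand-in for illustration) confirms that the two approaches agree once signs are ignored.

```r
# Built-in USArrests stands in for the player attributes (assumption for illustration)
x_s <- scale(USArrests)
decomp <- eigen(cov(x_s))
pca <- prcomp(x_s, center = FALSE, scale. = FALSE)

# Squared standard deviations from prcomp() equal the eigenvalues
print(all.equal(pca$sdev^2, decomp$values))

# Loadings agree with the eigenvectors up to sign, so compare absolute values
print(all.equal(abs(unname(pca$rotation)), abs(decomp$vectors)))
```

A sign flip reverses the direction of a component's axis but changes neither the variance it explains nor the contribution of any variable.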
PCA was employed as a dimensionality reduction technique to identify latent structures within the player attribute data. By projecting the original variables onto a lower-dimensional orthogonal space, the method facilitates interpretation while preserving the majority of the information contained in the dataset.
summary(player_stats_s_pca1)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 3.8189 2.8872 1.8938 1.57465 0.90208 0.78299 0.73479
## Proportion of Variance 0.4167 0.2382 0.1025 0.07084 0.02325 0.01752 0.01543
## Cumulative Proportion 0.4167 0.6549 0.7573 0.82818 0.85143 0.86894 0.88437
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 0.66073 0.64057 0.62530 0.56324 0.53226 0.50676 0.48259
## Proportion of Variance 0.01247 0.01172 0.01117 0.00906 0.00809 0.00734 0.00665
## Cumulative Proportion 0.89684 0.90857 0.91974 0.92880 0.93690 0.94423 0.95089
## PC15 PC16 PC17 PC18 PC19 PC20 PC21
## Standard deviation 0.46461 0.44981 0.43194 0.41924 0.37636 0.36206 0.35340
## Proportion of Variance 0.00617 0.00578 0.00533 0.00502 0.00405 0.00375 0.00357
## Cumulative Proportion 0.95706 0.96284 0.96817 0.97319 0.97724 0.98098 0.98455
## PC22 PC23 PC24 PC25 PC26 PC27 PC28
## Standard deviation 0.32627 0.32267 0.28071 0.26706 0.2644 0.23600 0.16168
## Proportion of Variance 0.00304 0.00297 0.00225 0.00204 0.0020 0.00159 0.00075
## Cumulative Proportion 0.98759 0.99057 0.99282 0.99486 0.9969 0.99844 0.99919
## PC29 PC30 PC31 PC32 PC33 PC34 PC35
## Standard deviation 0.15927 0.02598 0.02570 0.02361 0.02220 0.01782 0.01563
## Proportion of Variance 0.00072 0.00002 0.00002 0.00002 0.00001 0.00001 0.00001
## Cumulative Proportion 0.99992 0.99994 0.99995 0.99997 0.99998 0.99999 1.00000
The results indicate that a small number of principal components captures a substantial proportion of the total variance. Specifically, the first four components explain approximately 82.8% of the original variance, reducing 35 variables to 4 dimensions at the cost of roughly 17% of the variance. When the number of retained components is increased to nine, the cumulative explained variance reaches 90.9%, so most of the variability in the original dataset is preserved.
Beyond this point, the contribution of additional components becomes progressively smaller. In particular, after approximately 20 components, the increase in cumulative explained variance is marginal, suggesting diminishing returns from retaining further dimensions. This pattern confirms that the underlying structure of the data can be effectively summarized using a limited number of principal components.
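The cumulative proportions can be recomputed directly from the standard deviations reported in the summary above. The sketch below copies those 35 values by hand (so small rounding differences are possible) and defines a helper, `n_components_for()`, which is a hypothetical name introduced here to find the smallest number of components reaching a given share of variance.

```r
# Standard deviations copied from the summary() output above (PC1 to PC35)
sdev <- c(3.8189, 2.8872, 1.8938, 1.57465, 0.90208, 0.78299, 0.73479,
          0.66073, 0.64057, 0.62530, 0.56324, 0.53226, 0.50676, 0.48259,
          0.46461, 0.44981, 0.43194, 0.41924, 0.37636, 0.36206, 0.35340,
          0.32627, 0.32267, 0.28071, 0.26706, 0.2644, 0.23600, 0.16168,
          0.15927, 0.02598, 0.02570, 0.02361, 0.02220, 0.01782, 0.01563)

# Eigenvalues are squared standard deviations; they sum to about 35,
# the number of standardized variables
eigenvalues <- sdev^2
prop_var <- eigenvalues / sum(eigenvalues)
cum_var <- cumsum(prop_var)

# Smallest number of components whose cumulative share reaches a threshold
n_components_for <- function(threshold) which(cum_var >= threshold)[1]

print(round(cum_var[c(4, 9, 20)], 3))
print(sapply(c(0.80, 0.90), n_components_for))
```

With a fitted model at hand, the same calculation can start from `player_stats_s_pca1$sdev` instead of hand-copied values.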
To facilitate interpretation of the retained principal components, a variable projection plot was constructed for the first two principal components. This correlation circle shows how strongly each original variable is associated with the two components and provides insight into the relationships among attributes: arrows pointing in similar directions indicate positively correlated variables, and longer arrows indicate variables that are better represented in this plane.
fviz_pca_var(player_stats_s_pca1, col.var="steelblue")
Together, the first two principal components capture the dominant performance dimensions of football players and provide an interpretable low-dimensional representation of the original high-dimensional attribute space.
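The arrow coordinates in the variable plot above can be reproduced by hand. To my understanding, `factoextra` obtains them by multiplying each loading by the standard deviation of its component, which for standardized data equals the correlation between the variable and the component scores. The sketch below checks this identity on the built-in `USArrests` data (a stand-in for illustration); `var_coord` and `cos2_plane` are illustrative names.

```r
# Built-in USArrests stands in for the player attributes (assumption for illustration)
x_s <- scale(USArrests)
pca <- prcomp(x_s, center = FALSE, scale. = FALSE)

# Variable coordinates: each loading multiplied by its component's standard deviation
var_coord <- sweep(pca$rotation, 2, pca$sdev, "*")

# For standardized data these equal the correlations between variables and scores
var_cor <- cor(x_s, pca$x)
print(all.equal(var_coord, var_cor, check.attributes = FALSE))

# Squared arrow length in the PC1-PC2 plane (cos2): how well the plane represents each variable
cos2_plane <- rowSums(var_coord[, 1:2]^2)
print(round(cos2_plane, 3))
```

A variable with a squared arrow length close to 1 is almost fully described by the first two components, while a short arrow signals that the variable depends mainly on later components.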
The analysis of principal components focuses on determining the appropriate number of components to retain and assessing their contribution to explaining the variability of the original dataset. Several complementary criteria were employed for this purpose, including the scree plot, the Kaiser criterion, and the cumulative explained variance.
eigen_plot <- fviz_eig(player_stats_s_pca1, choice='eigenvalue', ylim = c(0, 17),
addlabels=TRUE, main="Eigenvalues of Principal Components")
variances_plot <- fviz_eig(player_stats_s_pca1, ylim = c(0, 50),
addlabels=TRUE, main="Percentage of Variance Explained by Principal Components")
grid.arrange(eigen_plot, variances_plot, nrow=2)
According to the Kaiser rule, only components with eigenvalues greater than one should be retained, as such components explain more variance than an individual standardized variable. The scree plot illustrates the distribution of eigenvalues across successive components and reveals a pronounced decline after the fourth component. Only the first four principal components satisfy the Kaiser criterion, indicating that they capture the most meaningful structure in the data.
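The Kaiser criterion can be verified from the standard deviations reported earlier. The sketch below copies the values for the first six components from the summary output; since the components are ordered by decreasing variance, all later components also fall below the threshold.

```r
# Standard deviations of the first six components, copied from the summary() output
sdev_first6 <- c(PC1 = 3.8189, PC2 = 2.8872, PC3 = 1.8938,
                 PC4 = 1.57465, PC5 = 0.90208, PC6 = 0.78299)

# Eigenvalues are the squared standard deviations
eigenvalues <- sdev_first6^2
print(round(eigenvalues, 2))

# Kaiser rule: keep components whose eigenvalue exceeds 1
kaiser_keep <- names(eigenvalues)[eigenvalues > 1]
print(kaiser_keep)
```

The threshold of 1 is meaningful here only because the variables were standardized, so that each original variable carries exactly one unit of variance.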
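# Cumulative proportion of variance is the third row of the importance matrix
pca_summary <- summary(player_stats_s_pca1)
plot(pca_summary$importance[3, ], type = "l",
     xlab = "Number of components", ylab = "Cumulative proportion of variance")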
This conclusion is further supported by the cumulative explained variance curve, which shows that the first four components account for over 80% of the total variance. Beyond this point, the curve flattens considerably, and additional components contribute only marginal increases in explained variance. Together, these results suggest that retaining four principal components provides an efficient and interpretable low-dimensional representation of the original high-dimensional player attribute space.
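What retaining four components means in practice can be sketched as follows. The example uses the built-in `mtcars` data as a stand-in (an assumption for illustration), keeps the first `k = 4` score columns as the reduced representation, maps them back to the original variables, and confirms that the share of variance kept equals the cumulative proportion for four components.

```r
# Built-in mtcars stands in for the player attributes (assumption for illustration)
x_s <- scale(mtcars)
pca <- prcomp(x_s, center = FALSE, scale. = FALSE)

k <- 4  # number of retained components

# Reduced representation: one row per observation, k columns of scores
scores_k <- pca$x[, 1:k]

# Map the scores back to the space of the standardized variables
x_hat <- scores_k %*% t(pca$rotation[, 1:k])

# Share of total variance kept equals the cumulative proportion for k components
retained <- 1 - sum((x_s - x_hat)^2) / sum(x_s^2)
cum_prop <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
print(c(retained = retained, cumulative = cum_prop[k]))
```

For the player data, the analogous object `player_stats_s_pca1$x[, 1:4]` would describe each player by four scores instead of 35 attributes.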
The contribution plots provide insight into the relative importance of individual variables in defining each of the retained principal components. Variables with contributions exceeding the average threshold (indicated by the dashed line) play a dominant role in shaping the corresponding component and are therefore central to its interpretation.
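The contribution values can be derived from the loadings. To my understanding, the percentage contribution of a variable to a component equals 100 times its squared loading (the loadings of one component have unit length), and the dashed reference line marks the value expected if all variables contributed equally, here 100 / 35. The sketch below applies this to the PC1 loadings of the six summary attributes, copied from the rotation output shown earlier.

```r
# PC1 loadings of the six summary attributes, copied from the rotation output
pc1_loadings <- c(pace = -0.13091801, shooting = -0.23432513,
                  passing = -0.23257735, dribbling = -0.24777498,
                  defending = 0.03344752, physic = -0.02818010)

# Percentage contribution: squared loading times 100
contrib_pct <- 100 * pc1_loadings^2
print(round(contrib_pct, 2))

# Expected contribution if all 35 variables contributed equally
n_vars <- 35
threshold <- 100 / n_vars
print(round(threshold, 2))

# Variables that contribute more than the uniform expectation
above_average <- names(contrib_pct)[contrib_pct > threshold]
print(above_average)
```

Because contributions depend on squared loadings, the sign of a loading does not affect the ranking of variables in the contribution plots.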
PC1 <- fviz_contrib(player_stats_s_pca1, choice = "var", axes = 1, top=10)
PC2 <- fviz_contrib(player_stats_s_pca1, choice = "var", axes = 2, top=10)
PC3 <- fviz_contrib(player_stats_s_pca1, choice = "var", axes = 3, top=10)
PC4 <- fviz_contrib(player_stats_s_pca1, choice = "var", axes = 4, top=10)
grid.arrange(PC1, PC2)
The first principal component (Dim-1) is primarily driven by technical and attacking attributes, including dribbling, ball control, passing, shooting, and finishing. Mental attributes related to vision and positioning also contribute substantially. This component captures overall technical proficiency and offensive capability, reflecting a general measure of player quality in possession and attacking play.
The second principal component (Dim-2) is dominated by defensive and physical variables, such as marking awareness, standing and sliding tackles, interceptions, aggression, and strength. These attributes clearly distinguish players with strong defensive and physical profiles from those whose strengths lie in offensive skills. Accordingly, Dim-2 can be interpreted as a defensive and physical intensity dimension.
grid.arrange(PC3, PC4)
The third principal component (Dim-3) is characterized by contributions from power- and movement-related attributes, including strength, jumping ability, balance, agility, and acceleration. This component represents physical robustness and aerial ability, emphasizing attributes associated with physical duels and stability rather than technical finesse.
The fourth principal component (Dim-4) is strongly influenced by pace-related variables, particularly sprint speed, acceleration, and overall pace, with additional contributions from jumping and stamina. This component captures explosiveness and speed, distinguishing fast, dynamic players from those with lower mobility.
This project employed principal component analysis to examine football player attributes from EA Sports FC 26. After removing observations with missing values, the dataset comprised 16343 of the original 18405 players, described by 35 performance variables covering technical, physical, mental, and defensive aspects of play. Correlation analysis confirmed the presence of substantial interdependencies among variables, justifying the application of PCA.
Using the scree plot, Kaiser criterion, and cumulative explained variance, four principal components were retained, collectively accounting for over 80% of the total variance. These components represent key dimensions of player performance, including technical ability, defensive and physical strength, power, and speed. The findings demonstrate that PCA provides an effective method for reducing the dimensionality of football performance data while preserving its essential structure.