Introduction

In this analysis, we explore a dataset of FIFA 23 players and apply dimension reduction techniques to better understand the data. Dimension reduction methods such as PCA (Principal Component Analysis) and t-SNE (t-Distributed Stochastic Neighbor Embedding) represent high-dimensional data with far fewer dimensions. PCA keeps as much of the overall variance as possible. t-SNE preserves local neighborhoods, that is, which points lie close to which. This is particularly useful for visualizing high-dimensional data and for uncovering patterns or structures that might not be immediately apparent.

Objective:

The main goal of this project is to reduce the dimensionality of the FIFA 23 Players Data using PCA and t-SNE, visualize the results, and uncover any latent structures or relationships within the data that can aid in further analysis or clustering.


Step 1: Data Loading and Preprocessing

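We begin by loading the FIFA 23 Players Data, inspecting its structure, and performing some basic preprocessing. We keep only the numeric columns, dropping categorical variables such as names, clubs, and positions. We then standardize each column to mean 0 and standard deviation 1. This ensures that variables on very different scales contribute comparably to the dimension reduction; for example, market value is measured in euros, while ratings are out of 100.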

# Loading necessary libraries
library(ggplot2)
library(dplyr)
library(tidyr)
library(FactoMineR)
library(factoextra)
library(Rtsne)

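# Setting the working directory (machine-specific path: change it to the folder
# that contains the CSV, or drop this line and use a project-relative path)
setwd("/Users/zayne/Desktop")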

# Loading the FIFA 23 Players Data
data <- read.csv("Fifa 23 Players Data.csv")

# Checking the structure of the dataset
str(data)
## 'data.frame':    18539 obs. of  89 variables:
##  $ Known.As                   : chr  "L. Messi" "K. Benzema" "R. Lewandowski" "K. De Bruyne" ...
##  $ Full.Name                  : chr  "Lionel Messi" "Karim Benzema" "Robert Lewandowski" "Kevin De Bruyne" ...
##  $ Overall                    : int  91 91 91 91 91 90 90 90 90 90 ...
##  $ Potential                  : int  91 91 91 91 95 90 91 90 90 90 ...
##  $ Value.in.Euro.             : int  54000000 64000000 84000000 107500000 190500000 115500000 90000000 13500000 41000000 98000000 ...
##  $ Positions.Played           : chr  "RW" "CF,ST" "ST" "CM,CAM" ...
##  $ Best.Position              : chr  "CAM" "CF" "ST" "CM" ...
##  $ Nationality                : chr  "Argentina" "France" "Poland" "Belgium" ...
##  $ Image.Link                 : chr  "https://cdn.sofifa.net/players/158/023/23_60.png" "https://cdn.sofifa.net/players/165/153/23_60.png" "https://cdn.sofifa.net/players/188/545/23_60.png" "https://cdn.sofifa.net/players/192/985/23_60.png" ...
##  $ Age                        : int  35 34 33 31 23 30 30 36 37 30 ...
##  $ Height.in.cm.              : int  169 185 185 181 182 175 199 193 187 193 ...
##  $ Weight.in.kg.              : int  67 81 81 70 73 71 96 93 83 92 ...
##  $ TotalStats                 : int  2190 2147 2205 2303 2177 2226 1334 1535 2159 2117 ...
##  $ BaseStats                  : int  452 455 458 483 470 471 473 501 445 461 ...
##  $ Club.Name                  : chr  "Paris Saint-Germain" "Real Madrid CF" "FC Barcelona" "Manchester City" ...
##  $ Wage.in.Euro.              : int  195000 450000 420000 350000 230000 270000 250000 72000 220000 230000 ...
##  $ Release.Clause             : int  99900000 131199999 172200000 198900000 366700000 213700000 191300000 22300000 77900000 181300000 ...
##  $ Club.Position              : chr  "RW" "CF" "ST" "CM" ...
##  $ Contract.Until             : chr  "2023" "2023" "2025" "2025" ...
##  $ Club.Jersey.Number         : chr  "30" "9" "9" "17" ...
##  $ Joined.On                  : int  2021 2009 2022 2015 2018 2017 2018 2011 2021 2018 ...
##  $ On.Loan                    : chr  "-" "-" "-" "-" ...
##  $ Preferred.Foot             : chr  "Left" "Right" "Right" "Right" ...
##  $ Weak.Foot.Rating           : int  4 4 4 5 4 3 3 4 4 3 ...
##  $ Skill.Moves                : int  4 4 4 4 5 4 1 1 5 2 ...
##  $ International.Reputation   : int  5 4 5 4 4 4 4 5 5 4 ...
##  $ National.Team.Name         : chr  "Argentina" "France" "Poland" "Belgium" ...
##  $ National.Team.Image.Link   : chr  "https://cdn.sofifa.net/flags/ar.png" "https://cdn.sofifa.net/flags/fr.png" "https://cdn.sofifa.net/flags/pl.png" "https://cdn.sofifa.net/flags/be.png" ...
##  $ National.Team.Position     : chr  "RW" "ST" "ST" "RF" ...
##  $ National.Team.Jersey.Number: chr  "10" "19" "9" "7" ...
##  $ Attacking.Work.Rate        : chr  "Low" "Medium" "High" "High" ...
##  $ Defensive.Work.Rate        : chr  "Low" "Medium" "Medium" "High" ...
##  $ Pace.Total                 : int  81 80 75 74 97 90 84 87 81 81 ...
##  $ Shooting.Total             : int  89 88 91 88 89 89 89 88 92 60 ...
##  $ Passing.Total              : int  90 83 79 93 80 82 75 91 78 71 ...
##  $ Dribbling.Total            : int  94 87 86 87 92 90 90 88 85 72 ...
##  $ Defending.Total            : int  34 39 44 64 36 45 46 56 34 91 ...
##  $ Physicality.Total          : int  64 78 83 77 76 75 89 91 75 86 ...
##  $ Crossing                   : int  84 75 71 94 78 80 14 15 80 53 ...
##  $ Finishing                  : int  90 92 94 85 93 93 14 13 93 52 ...
##  $ Heading.Accuracy           : int  70 90 91 55 72 59 13 25 90 87 ...
##  $ Short.Passing              : int  91 89 84 93 85 84 33 60 80 79 ...
##  $ Volleys                    : int  88 88 89 83 83 84 12 11 86 45 ...
##  $ Dribbling                  : int  95 87 85 88 93 90 13 30 85 70 ...
##  $ Curve                      : int  93 82 79 89 80 84 19 14 81 60 ...
##  $ Freekick.Accuracy          : int  93 73 85 83 69 69 20 11 79 70 ...
##  $ LongPassing                : int  90 76 70 93 71 77 35 68 75 86 ...
##  $ BallControl                : int  93 91 89 90 91 88 23 46 88 76 ...
##  $ Acceleration               : int  87 79 76 76 97 89 42 54 79 68 ...
##  $ Sprint.Speed               : int  76 80 75 73 97 91 52 60 83 91 ...
##  $ Agility                    : int  91 78 77 76 93 90 63 51 77 61 ...
##  $ Reactions                  : int  92 92 93 91 93 93 84 87 94 89 ...
##  $ Balance                    : int  95 72 82 78 81 91 45 35 67 53 ...
##  $ Shot.Power                 : int  86 87 91 92 88 83 56 68 93 81 ...
##  $ Jumping                    : int  68 79 85 63 77 69 68 77 95 88 ...
##  $ Stamina                    : int  70 82 76 88 87 87 38 43 76 74 ...
##  $ Strength                   : int  68 82 87 74 76 75 70 80 77 93 ...
##  $ Long.Shots                 : int  91 80 84 91 82 85 17 16 90 64 ...
##  $ Aggression                 : int  44 63 81 75 64 63 23 29 63 85 ...
##  $ Interceptions              : int  40 39 49 66 38 55 15 30 29 90 ...
##  $ Positioning                : int  93 92 94 88 92 92 13 12 95 47 ...
##  $ Vision                     : int  94 89 81 94 83 85 44 70 76 65 ...
##  $ Penalties                  : int  75 84 90 83 80 86 27 47 90 62 ...
##  $ Composure                  : int  96 90 88 89 88 92 66 70 95 90 ...
##  $ Marking                    : int  20 43 35 68 26 38 20 17 24 92 ...
##  $ Standing.Tackle            : int  35 24 42 65 34 43 18 10 32 92 ...
##  $ Sliding.Tackle             : int  24 18 19 53 32 41 16 11 24 86 ...
##  $ Goalkeeper.Diving          : int  6 13 15 15 13 14 84 87 7 13 ...
##  $ Goalkeeper.Handling        : int  11 11 6 13 5 14 89 88 11 10 ...
##  $ GoalkeeperKicking          : int  15 5 12 5 7 9 75 91 15 13 ...
##  $ Goalkeeper.Positioning     : int  14 5 8 10 11 11 89 91 14 11 ...
##  $ Goalkeeper.Reflexes        : int  8 7 10 13 6 14 90 88 11 11 ...
##  $ ST.Rating                  : int  90 91 91 86 92 89 34 43 90 74 ...
##  $ LW.Rating                  : int  90 87 85 88 90 88 29 40 86 68 ...
##  $ LF.Rating                  : int  91 89 88 87 90 88 31 43 88 70 ...
##  $ CF.Rating                  : int  91 89 88 87 90 88 31 43 88 70 ...
##  $ RF.Rating                  : int  91 89 88 87 90 88 31 43 88 70 ...
##  $ RW.Rating                  : int  90 87 85 88 90 88 29 40 86 68 ...
##  $ CAM.Rating                 : int  91 91 88 91 92 90 35 50 88 73 ...
##  $ LM.Rating                  : int  91 89 86 91 92 90 34 47 87 73 ...
##  $ CM.Rating                  : int  88 84 83 91 84 85 35 53 81 79 ...
##  $ RM.Rating                  : int  91 89 86 91 92 90 34 47 87 73 ...
##  $ LWB.Rating                 : int  67 67 67 82 70 74 32 39 65 83 ...
##  $ CDM.Rating                 : int  66 67 69 82 66 71 34 46 62 88 ...
##  $ RWB.Rating                 : int  67 67 67 82 70 74 32 39 65 83 ...
##  $ LB.Rating                  : int  62 63 64 78 66 70 32 38 61 85 ...
##  $ CB.Rating                  : int  53 58 63 72 57 61 32 37 56 90 ...
##  $ RB.Rating                  : int  62 63 64 78 66 70 32 38 61 85 ...
##  $ GK.Rating                  : int  22 21 22 24 21 25 90 90 23 23 ...
# Keeping only the numeric columns (drops character/categorical variables)
data_numeric <- select_if(data, is.numeric)

# Standardizing the data
data_scaled <- scale(data_numeric)

# Displaying the first few rows of the scaled data
head(data_scaled)
##       Overall Potential Value.in.Euro.        Age Height.in.cm. Weight.in.kg.
## [1,] 3.704575  3.226831       6.695963  2.0685146   -1.83007610    -1.1654374
## [2,] 3.704575  3.226831       8.005698  1.8565677    0.50293272     0.8306864
## [3,] 3.704575  3.226831      10.625169  1.6446207    0.50293272     0.8306864
## [4,] 3.704575  3.226831      13.703048  1.2207269   -0.08031948    -0.7376966
## [5,] 3.704575  3.872735      24.573853 -0.4748484    0.06549357    -0.3099558
## [6,] 3.557263  3.065355      14.750836  1.0087800   -0.95519779    -0.5951163
##      TotalStats BaseStats Wage.in.Euro. Release.Clause  Joined.On
## [1,]   2.152163  2.373402      9.566823        6.46252  0.3076090
## [2,]   1.994747  2.449105     22.670268        8.59583 -5.5279118
## [3,]   2.207076  2.524809     21.128687       11.39026  0.7939024
## [4,]   2.565840  3.155672     17.531662       13.21005 -2.6101514
## [5,]   2.104572  2.827623     11.365335       24.64677 -1.1512712
## [6,]   2.283954  2.852858     13.420778       14.21877 -1.6375646
##      Weak.Foot.Rating Skill.Moves International.Reputation Pace.Total
## [1,]        1.5636654    2.115218                10.909157  1.2190469
## [2,]        1.5636654    2.115218                 8.121726  1.1251459
## [3,]        1.5636654    2.115218                10.909157  0.6556408
## [4,]        3.0478346    2.115218                 8.121726  0.5617398
## [5,]        1.5636654    3.409838                 8.121726  2.7214634
## [6,]        0.0794962    2.115218                 8.121726  2.0641562
##      Shooting.Total Passing.Total Dribbling.Total Defending.Total
## [1,]       2.586085      3.290372        3.308545      -0.9907794
## [2,]       2.512662      2.570056        2.558805      -0.6857624
## [3,]       2.732929      2.158446        2.451699      -0.3807455
## [4,]       2.512662      3.599079        2.558805       0.8393222
## [5,]       2.586085      2.261349        3.094333      -0.8687726
## [6,]       2.586085      2.467153        2.880122      -0.3197421
##      Physicality.Total Crossing Finishing Heading.Accuracy Short.Passing
## [1,]       -0.08095235 1.930027  2.229156        1.0481725      2.234634
## [2,]        1.38077422 1.426879  2.331073        2.2029771      2.094653
## [3,]        1.90281942 1.203258  2.432990        2.2607174      1.744702
## [4,]        1.27636518 2.489079  1.974365        0.1820691      2.374614
## [5,]        1.17195614 1.594595  2.382031        1.1636530      1.814692
## [6,]        1.06754710 1.706406  2.382031        0.4130300      1.744702
##       Volleys Dribbling    Curve Freekick.Accuracy LongPassing BallControl
## [1,] 2.579269  2.085212 2.529556          2.941171    2.489544    2.078579
## [2,] 2.579269  1.658584 1.915381          1.764545    1.532857    1.958025
## [3,] 2.635974  1.551927 1.747879          2.470521    1.122848    1.837471
## [4,] 2.295746  1.711912 2.306220          2.352858    2.694548    1.897748
## [5,] 2.295746  1.978555 1.803713          1.529220    1.191183    1.958025
## [6,] 2.352451  1.818569 2.027049          1.529220    1.601192    1.777194
##      Acceleration Sprint.Speed   Agility Reactions   Balance Shot.Power
## [1,]    1.4576850    0.7387786 1.8437092  3.422067 2.1363003   2.175492
## [2,]    0.9341539    1.0035345 0.9715380  3.422067 0.5482527   2.252713
## [3,]    0.7378297    0.6725897 0.9044479  3.534423 1.2387082   2.561596
## [4,]    0.7378297    0.5402118 0.8373578  3.309711 0.9625260   2.638817
## [5,]    2.1120989    2.1287469 1.9778894  3.534423 1.1696626   2.329934
## [6,]    1.5885678    1.7316131 1.7766191  3.534423 1.8601181   1.943830
##         Jumping   Stamina  Strength Long.Shots Aggression Interceptions
## [1,]  0.2606603 0.4277708 0.2256236   2.281486 -0.6902211    -0.3316107
## [2,]  1.1554404 1.1653550 1.3347301   1.713365  0.4336731    -0.3799979
## [3,]  1.6435023 0.7965629 1.7308395   1.919954  1.4984151     0.1038735
## [4,] -0.1460579 1.5341470 0.7009550   2.281486  1.1435011     0.9264550
## [5,]  0.9927531 1.4726817 0.8593987   1.816660  0.4928255    -0.4283850
## [6,]  0.3420040 1.4726817 0.7801768   1.971602  0.4336731     0.3941964
##      Positioning   Vision Penalties Composure    Marking Standing.Tackle
## [1,]    2.160710 2.953587  1.716816  3.154081 -1.3143590      -0.6322850
## [2,]    2.109846 2.582612  2.288970  2.655588 -0.1841505      -1.1513575
## [3,]    2.211575 1.989052  2.670406  2.489423 -0.5772665      -0.3019661
## [4,]    1.906387 2.953587  2.225397  2.572505  1.0443369       0.7833673
## [5,]    2.109846 2.137442  2.034679  2.489423 -1.0195220      -0.6794734
## [6,]    2.109846 2.285832  2.416115  2.821752 -0.4298480      -0.2547777
##      Sliding.Tackle Goalkeeper.Diving Goalkeeper.Handling GoalkeeperKicking
## [1,]     -1.0755030       -0.59138411          -0.3047212        -0.0636063
## [2,]     -1.3653421       -0.19341844          -0.3047212        -0.6630965
## [3,]     -1.3170356       -0.07971396          -0.6001549        -0.2434534
## [4,]      0.3253856       -0.07971396          -0.1865477        -0.6630965
## [5,]     -0.6890510       -0.19341844          -0.6592417        -0.5431985
## [6,]     -0.2542925       -0.13656620          -0.1274609        -0.4233004
##      Goalkeeper.Positioning Goalkeeper.Reflexes ST.Rating LW.Rating LF.Rating
## [1,]             -0.1290848          -0.4726173   2.46927  2.336032  2.481980
## [2,]             -0.6557360          -0.5283972   2.54348  2.131002  2.341299
## [3,]             -0.4801856          -0.3610575   2.54348  1.994316  2.270958
## [4,]             -0.3631520          -0.1937178   2.17243  2.199345  2.200617
## [5,]             -0.3046352          -0.5841771   2.61769  2.336032  2.411640
## [6,]             -0.3046352          -0.1379379   2.39506  2.199345  2.270958
##      CF.Rating RF.Rating RW.Rating CAM.Rating LM.Rating CM.Rating RM.Rating
## [1,]  2.481980  2.481980  2.336032   2.376748  2.327046  2.325220  2.327046
## [2,]  2.341299  2.341299  2.131002   2.376748  2.184058  2.021527  2.184058
## [3,]  2.270958  2.270958  1.994316   2.161005  1.969575  1.945604  1.969575
## [4,]  2.200617  2.200617  2.199345   2.376748  2.327046  2.552990  2.327046
## [5,]  2.411640  2.411640  2.336032   2.448662  2.398541  2.021527  2.398541
## [6,]  2.270958  2.270958  2.199345   2.304834  2.255552  2.097450  2.255552
##      LWB.Rating CDM.Rating RWB.Rating LB.Rating  CB.Rating RB.Rating
## [1,]  0.7708974  0.7260149  0.7708974 0.4484455 -0.1036483 0.4484455
## [2,]  0.7708974  0.7981016  0.7708974 0.5190697  0.2354743 0.5190697
## [3,]  0.7708974  0.9422749  0.7708974 0.5896938  0.5745969 0.5896938
## [4,]  1.8497364  1.8794016  1.8497364 1.5784317  1.1850177 1.5784317
## [5,]  0.9866652  0.7260149  0.9866652 0.7309421  0.1676498 0.7309421
## [6,]  1.2743556  1.0864483  1.2743556 1.0134386  0.4389479 1.0134386
##        GK.Rating
## [1,] -0.08320471
## [2,] -0.14939075
## [3,] -0.08320471
## [4,]  0.04916739
## [5,] -0.14939075
## [6,]  0.11535344
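To make the standardization step concrete, here is a minimal base-R sketch on a small made-up data frame. The values and the names `toy`, `manual`, and `const_scaled` are illustrative and are not taken from the FIFA file. The sketch checks that `scale()` produces ordinary z-scores, z = (x - mean(x)) / sd(x). It also shows one pitfall worth checking on the real data: a column with zero variance becomes `NaN` after scaling.

```r
# Toy data frame (made-up values, not read from the FIFA file)
toy <- data.frame(
  Pace     = c(81, 80, 75, 74, 97),
  Shooting = c(89, 88, 91, 88, 89),
  Name     = c("A", "B", "C", "D", "E"),  # character column, dropped below
  stringsAsFactors = FALSE
)

# Keep numeric columns only (base-R counterpart of select_if(data, is.numeric))
toy_num <- toy[, vapply(toy, is.numeric, logical(1)), drop = FALSE]

# scale() subtracts each column's mean and divides by its standard deviation
toy_scaled <- scale(toy_num)

# The same z-scores written out by hand: z = (x - mean(x)) / sd(x)
manual <- sapply(toy_num, function(x) (x - mean(x)) / sd(x))

all.equal(as.vector(toy_scaled), as.vector(manual))  # TRUE
colMeans(toy_scaled)       # approximately 0 for every column
apply(toy_scaled, 2, sd)   # 1 for every column

# A constant column has sd = 0, so scale() returns NaN (0 / 0) for it;
# such columns would need to be dropped before PCA or t-SNE
const_scaled <- scale(cbind(Constant = c(5, 5, 5, 5, 5), toy_num))
all(is.nan(const_scaled[, "Constant"]))  # TRUE
```

On the full dataset, a quick `anyNA(data_scaled)` reveals whether any column needs attention before PCA or t-SNE, because `anyNA()` is also `TRUE` for `NaN`.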

Step 2: PCA (Principal Component Analysis)

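Principal Component Analysis (PCA) is a linear dimension reduction technique that transforms the data into a new set of uncorrelated variables, known as principal components. Each component is a linear combination of the original variables. The components are ordered so that each one explains as much of the remaining variance as possible. Keeping only the first few therefore gives a smaller set of variables that retains most of the variance.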
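This definition can be checked numerically. The sketch below uses simulated data; the variables `a` to `d` are made up for illustration. It computes the principal components by hand as the eigenvectors of the correlation matrix and compares them with base R's `prcomp()`. It illustrates what PCA computes and is not a reproduction of FactoMineR's internals, which may use a different but equivalent decomposition.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
# Four variables: `a` and `b` are strongly correlated, `c` and `d` are noise
X <- cbind(a = x1, b = x1 + rnorm(n, sd = 0.3), c = rnorm(n), d = rnorm(n))

# PCA by hand on standardized data: eigen-decomposition of the correlation matrix
Z   <- scale(X)
eig <- eigen(cor(X), symmetric = TRUE)  # eigenvalues in decreasing order
scores_manual <- Z %*% eig$vectors      # principal component scores

# The same analysis with base R's prcomp()
p <- prcomp(X, center = TRUE, scale. = TRUE)

# Component variances equal the eigenvalues of the correlation matrix
all.equal(p$sdev^2, eig$values)  # TRUE

# Eigenvector signs are arbitrary, so compare the scores up to sign
all.equal(abs(unname(p$x)), abs(unname(scores_manual)))  # TRUE

# Scores of different components are uncorrelated
round(cor(p$x), 6)

# Share of the total variance captured by each component
eig$values / sum(eig$values)
```

With two nearly redundant variables, the first component absorbs most of their shared variance. This is the same mechanism that lets PCA compress many correlated FIFA ratings into a few components.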

We will apply PCA to the scaled data and visualize the results.

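# Applying PCA (FactoMineR's PCA() standardizes variables by default via
# scale.unit = TRUE, so the earlier scale() call is redundant here but harmless;
# it keeps ncp = 5 components by default)
pca_result <- PCA(data_scaled, graph = FALSE)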

# Scree plot to visualize the variance explained by each component
fviz_eig(pca_result, addlabels = TRUE, ylim = c(0, 50))

# Biplot for PCA to visualize the first two components
fviz_pca_biplot(pca_result, label = "var", addEllipses = TRUE)

Explanation:

The scree plot shows how much of the total variance each principal component explains. If the first few components explain a large percentage of the variance, we can reduce the number of features with minimal information loss. The biplot shows players as points on the first two principal components and overlays the original variables as arrows. An arrow's direction and length show how strongly that variable is associated with each component. For variables well represented in this plane, arrows pointing in similar directions indicate positively correlated variables.
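The variance shares can also be computed directly, which turns the scree plot into a concrete choice of how many components to keep. This sketch uses the built-in `mtcars` data as a stand-in for `data_scaled`. The 80% cut-off is an assumed rule of thumb rather than a fixed standard. The Kaiser rule (keep components with eigenvalue > 1) is shown as a second common heuristic.

```r
# Stand-in for data_scaled: the built-in mtcars data (11 numeric columns)
p <- prcomp(mtcars, center = TRUE, scale. = TRUE)

eigenvalues   <- p$sdev^2
var_explained <- eigenvalues / sum(eigenvalues)  # share of variance per PC
cum_explained <- cumsum(var_explained)           # running total

# Rule 1 (assumed cut-off): smallest k whose cumulative share reaches 80%
threshold <- 0.80
k_cum <- which(cum_explained >= threshold)[1]

# Rule 2 (Kaiser criterion): keep PCs whose eigenvalue exceeds 1, i.e. that
# explain more variance than a single standardized variable
k_kaiser <- sum(eigenvalues > 1)

round(100 * cum_explained, 1)  # cumulative % of variance, as on a scree plot
c(k_cum = k_cum, k_kaiser = k_kaiser)
```

With the FactoMineR object used in this analysis, the same figures are available in `pca_result$eig`. Its `cumulative percentage of variance` column gives the running total in percent.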

Step 3: t-SNE (t-Distributed Stochastic Neighbor Embedding)

Next, we apply t-SNE, a nonlinear technique designed for visualization. It preserves local structure by placing points that are close in the original space close together in the embedding. It is particularly useful for visualizing high-dimensional data in 2D.
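The main t-SNE parameter used below, `perplexity`, can be read as a smooth measure of how many neighbors each point pays attention to. The following simplified base-R sketch shows how t-SNE calibrates a Gaussian kernel for a single point. The kernel is tuned until the point's neighbor distribution has the requested perplexity, defined as 2 raised to the distribution's entropy in bits.

This illustrates the idea from the original t-SNE formulation and is not Rtsne's code. Rtsne's Barnes-Hut implementation, for instance, only considers roughly 3 * perplexity nearest neighbors per point. The function name `calibrate_point` and the toy data are hypothetical.

```r
# Simplified sketch of t-SNE's input-affinity step for one point i:
#   p(j|i) = exp(-beta_i * d2_ij) / sum_k exp(-beta_i * d2_ik), beta_i = 1 / (2 sigma_i^2)
# beta_i is found by bisection so that 2^H(P_i) equals the requested perplexity.
calibrate_point <- function(d2, perplexity, tol = 1e-5, max_iter = 200) {
  target_H <- log2(perplexity)  # target Shannon entropy in bits
  beta <- 1; beta_lo <- 0; beta_hi <- Inf
  d2 <- d2 - min(d2)            # shift for numerical stability; cancels in p
  for (iter in seq_len(max_iter)) {
    w <- exp(-beta * d2)
    p <- w / sum(w)
    H <- -sum(p[p > 0] * log2(p[p > 0]))
    if (abs(H - target_H) < tol) break
    if (H > target_H) {         # distribution too flat: narrow the kernel
      beta_lo <- beta
      beta <- if (is.finite(beta_hi)) (beta + beta_hi) / 2 else beta * 2
    } else {                    # distribution too peaked: widen the kernel
      beta_hi <- beta
      beta <- (beta + beta_lo) / 2
    }
  }
  list(p = p, beta = beta, perplexity = 2^H)
}

set.seed(3)
X  <- matrix(rnorm(100 * 5), ncol = 5)  # 100 toy points in 5 dimensions
D2 <- as.matrix(dist(X))^2              # squared Euclidean distances
res <- calibrate_point(D2[1, -1], perplexity = 30)  # neighbors of point 1

res$perplexity  # close to the requested 30
sum(res$p)      # the neighbor probabilities sum to 1
```

A larger perplexity spreads each point's attention over more neighbors. This is also why the perplexity has to stay small relative to the number of rows.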

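# Removing duplicate rows before t-SNE: Rtsne() stops with an error on duplicates
# by default (check_duplicates = TRUE). Keeping the logical mask lets each t-SNE
# point be matched back to its row in `data` (e.g. data$Best.Position[keep]).
keep <- !duplicated(data_scaled)
data_tsne <- data_scaled[keep, ]

# Applying t-SNE (Rtsne() rejects a perplexity where 3 * perplexity exceeds nrow - 1)
set.seed(42)  # t-SNE is stochastic: fix the seed for a reproducible layout
tsne_result <- Rtsne(data_tsne, perplexity = 30, theta = 0.5)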

# Converting t-SNE results to a data frame
tsne_data <- data.frame(tsne_result$Y)
colnames(tsne_data) <- c("Dim1", "Dim2")

# Plotting the t-SNE results
ggplot(tsne_data, aes(x = Dim1, y = Dim2)) +
  geom_point(color = "blue", alpha = 0.6) +
  labs(title = "t-SNE Visualization", x = "Dimension 1", y = "Dimension 2") +
  theme_minimal()

Explanation:

t-SNE reduces the high-dimensional data to two dimensions, making it easier to visualize. The plot shows how the data points are arranged in the 2D space. Each dot represents a player, and players with similar attribute profiles tend to be placed near one another.
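Whether similar players really end up next to each other can be checked rather than judged by eye. One simple diagnostic counts how many of each point's k nearest neighbors in the original space remain its neighbors in the 2-D map. The sketch assumes that Euclidean distance on the scaled attributes is the relevant notion of similarity. The `knn_overlap` helper is hypothetical and not part of Rtsne, and the demo uses simulated data with two well-separated groups.

```r
# Average share of each point's k nearest neighbors in the original space
# that are also among its k nearest neighbors in the low-dimensional map
knn_overlap <- function(X_high, Y_low, k = 10) {
  stopifnot(nrow(X_high) == nrow(Y_low), k >= 1, k < nrow(X_high))
  knn_index <- function(M) {
    D <- as.matrix(dist(M))  # n x n distances: O(n^2) memory
    diag(D) <- Inf           # a point is not its own neighbor
    do.call(rbind, lapply(seq_len(nrow(D)), function(i) order(D[i, ])[seq_len(k)]))
  }
  nn_high <- knn_index(X_high)
  nn_low  <- knn_index(Y_low)
  shared <- vapply(seq_len(nrow(nn_high)),
                   function(i) length(intersect(nn_high[i, ], nn_low[i, ])),
                   integer(1))
  mean(shared / k)
}

set.seed(11)
# Toy data: two well-separated groups of 100 points in 10 dimensions
X <- rbind(matrix(rnorm(100 * 10), ncol = 10),
           matrix(rnorm(100 * 10, mean = 5), ncol = 10))
Y_pca    <- prcomp(X)$x[, 1:2]                # a structure-preserving 2-D map
Y_random <- matrix(rnorm(200 * 2), ncol = 2)  # a map unrelated to X

knn_overlap(X, X, k = 10)         # 1: identical spaces share every neighbor
knn_overlap(X, Y_pca, k = 10)
knn_overlap(X, Y_random, k = 10)  # near 10 / 199, the chance level
```

Because `dist()` builds an n x n matrix, running this on all ~18,500 players would need several gigabytes of memory. A random subsample of a few thousand rows keeps it manageable: take them from the matrix passed to `Rtsne()`, together with the matching rows of `tsne_result$Y`. Neighbors are then found within the subsample only.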

Step 4: Results and Interpretation

After applying both PCA and t-SNE, we are able to reduce the complexity of the high-dimensional data and visualize it in 2D space.

PCA Results:

The scree plot indicated that the first few components capture most of the variance, making it possible to reduce the number of features while retaining the most important information. The biplot helped visualize how different features contributed to the variance along the principal components.
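The biplot's impression of which variables drive each component can be made quantitative. For a PCA on standardized data, a variable's contribution to a component is its squared loading expressed in percent. As far as we know, this matches what FactoMineR reports in `pca_result$var$contrib` and what factoextra's `fviz_contrib()` plots. The sketch again uses the built-in `mtcars` data as a stand-in for the FIFA attributes.

```r
# Stand-in for data_scaled: the built-in mtcars data
p <- prcomp(mtcars, center = TRUE, scale. = TRUE)

# Each column of p$rotation is a unit-length loading vector, so squared
# loadings give each variable's share of a component (in %)
contrib <- 100 * p$rotation^2
round(colSums(contrib), 6)  # 100 for every component

# Variables that contribute most to the first two components
sort(contrib[, "PC1"], decreasing = TRUE)[1:5]
sort(contrib[, "PC2"], decreasing = TRUE)[1:5]

# Reference value: the contribution if all variables contributed equally
100 / nrow(contrib)
```

Variables above the 100 / (number of variables) reference value contribute more than an average variable would. Listing them for the first few FIFA components gives a concrete label for what each component measures.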

t-SNE Results:

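The t-SNE plot displayed how the players were grouped in a 2D space, with similar players clustering together. This could suggest that players with similar attributes, such as overall rating or skill level, form natural clusters. However, t-SNE preserves local neighborhoods rather than global distances, so the sizes of the clusters and the gaps between them should not be read as meaningful quantities. Confirming what a cluster represents requires checking it against the original variables, for example by coloring the points by position.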

Step 5: Conclusion

In this project, we demonstrated how to apply dimension reduction techniques such as PCA and t-SNE on the FIFA 23 Players Data.

PCA helped in reducing the dimensionality of the dataset while preserving the maximum variance, making it easier to analyze large datasets with fewer variables.

t-SNE provided a visually informative 2D map, showing how the players are grouped in a lower-dimensional space.

These methods are useful for extracting meaningful patterns from high-dimensional data. They can serve as a foundation for further analysis, such as clustering players based on their attributes or improving player recommendations.