Importing the data and dropping observations without outfield ratings (goalkeepers)

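library(tidyr)  # drop_na(); tidyr also re-exports the %>% pipe

data <- read.csv("./playersFIFA21.csv", header = TRUE)  # comma-separated, "." as decimal mark

mydata <- data %>% drop_na(pace)  # goalkeepers have no pace rating (NA), so they are dropped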

head(mydata)
##          short_name age height_cm weight_kg nationality overall potential
## 1          L. Messi  33       170        72   Argentina      93        93
## 2 Cristiano Ronaldo  35       187        83    Portugal      92        92
## 3    R. Lewandowski  31       184        80      Poland      91        91
## 4         Neymar Jr  28       175        68      Brazil      91        91
## 5      K. De Bruyne  29       181        70     Belgium      91        91
## 6         K. Mbappe  21       178        73      France      90        95
##   value_eur preferred_foot weak_foot skill_moves pace shooting passing
## 1  67500000           Left         4           4   85       92      91
## 2  46000000          Right         4           5   89       93      81
## 3  80000000          Right         4           4   78       91      78
## 4  90000000          Right         5           5   91       85      86
## 5  87000000          Right         5           4   76       86      93
## 6 105500000          Right         4           5   96       86      78
##   dribbling defending physic gk_diving gk_handling gk_kicking gk_reflexes
## 1        95        38     65        NA          NA         NA          NA
## 2        89        35     77        NA          NA         NA          NA
## 3        85        43     82        NA          NA         NA          NA
## 4        94        36     59        NA          NA         NA          NA
## 5        88        64     78        NA          NA         NA          NA
## 6        91        39     76        NA          NA         NA          NA
##   gk_speed gk_positioning
## 1       NA             NA
## 2       NA             NA
## 3       NA             NA
## 4       NA             NA
## 5       NA             NA
## 6       NA             NA
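Goalkeepers have no outfield ratings in this file, which is why dropping rows with a missing `pace` removes them. The sketch below shows a base-R equivalent of `drop_na(pace)` on a small, made-up data frame; the player names and values are hypothetical and not taken from the FIFA 21 file.

```r
# Hypothetical toy data: two outfield players and one goalkeeper.
# As in the FIFA 21 file, the goalkeeper has no outfield ratings (NA).
players <- data.frame(
  short_name = c("Player A", "Player B", "Keeper C"),
  pace       = c(85, 78, NA),
  gk_diving  = c(NA, NA, 88)
)

# Base-R equivalent of tidyr::drop_na(pace): keep rows where pace is present
outfield <- players[!is.na(players$pace), ]

# Number of rows removed (here: the single goalkeeper)
n_dropped <- nrow(players) - nrow(outfield)
print(outfield$short_name)
```

Unlike `drop_na()` without arguments, filtering on one column keeps rows that have missing values elsewhere, which is what is wanted here because outfield players have `NA` in every goalkeeper column.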

Selecting only the first 100 observations and removing the 6 goalkeeper-skill variables (columns)

mydata <- mydata[1:100, 1:17]  # first 100 rows; columns 18-23 hold goalkeeper skills

colnames(mydata) <- c("Name", "Age", "Height", "Weight", "Nationality", "Overall",
                      "Potential", "Value", "PreferredFoot", "WeakFoot", "SkillMoves",
                      "Pace", "Shooting", "Passing", "Dribbling", "Defending", "Physic")

mydata$Value <- mydata$Value / 1000000  # value in million EUR

head(mydata,10)
##                 Name Age Height Weight Nationality Overall Potential Value
## 1           L. Messi  33    170     72   Argentina      93        93  67.5
## 2  Cristiano Ronaldo  35    187     83    Portugal      92        92  46.0
## 3     R. Lewandowski  31    184     80      Poland      91        91  80.0
## 4          Neymar Jr  28    175     68      Brazil      91        91  90.0
## 5       K. De Bruyne  29    181     70     Belgium      91        91  87.0
## 6          K. Mbappe  21    178     73      France      90        95 105.5
## 7        V. van Dijk  28    193     92 Netherlands      90        91  75.5
## 8            S. Mane  28    175     69     Senegal      90        90  78.0
## 9           M. Salah  28    175     71       Egypt      90        90  78.0
## 10         S. Aguero  32    173     70   Argentina      89        89  53.0
##    PreferredFoot WeakFoot SkillMoves Pace Shooting Passing Dribbling Defending
## 1           Left        4          4   85       92      91        95        38
## 2          Right        4          5   89       93      81        89        35
## 3          Right        4          4   78       91      78        85        43
## 4          Right        5          5   91       85      86        94        36
## 5          Right        5          4   76       86      93        88        64
## 6          Right        4          5   96       86      78        91        39
## 7          Right        3          2   76       60      71        71        91
## 8          Right        4          4   94       85      80        90        44
## 9           Left        3          4   93       86      81        90        45
## 10         Right        4          4   78       90      77        88        33
##    Physic
## 1      65
## 2      77
## 3      82
## 4      59
## 5      78
## 6      76
## 7      86
## 8      76
## 9      75
## 10     73
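Renaming columns by position silently mislabels them if the column order of the CSV ever changes. A sketch of a name-based alternative, assuming the original snake_case column names shown in the `head()` output of the raw data: rename through a named lookup vector, then rescale `value_eur` to millions. The toy data frame and its values are hypothetical.

```r
# Hypothetical toy frame using a few of the original FIFA 21 column names
raw <- data.frame(
  short_name = c("Player A", "Player B"),
  age        = c(33, 35),
  value_eur  = c(67500000, 46000000)
)

# Lookup: original name -> new name (extend with the remaining columns as needed)
new_names <- c(short_name = "Name", age = "Age", value_eur = "Value")

# Rename only the columns found in the lookup, matching by name, not by position
hit <- names(raw) %in% names(new_names)
names(raw)[hit] <- unname(new_names[names(raw)[hit]])

# Express value in million EUR
raw$Value <- raw$Value / 1e6
print(raw)
```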

Description of Variables:

Name: Name of a player

Age: Age of a player in years

Height: Height of a player in cm

Weight: Weight of a player in kg

Nationality: Player’s nationality

Overall: Overall rating (1-100)

Potential: Potential rating (1-100)

Value: Value of a player in million EUR

PreferredFoot: The foot the player naturally prefers (Left or Right)

WeakFoot: Assessed ability to play with the weaker (non-preferred) foot (1-5)

SkillMoves: Assessed skill moves (1-5)

Pace: Pace rating (1-100)

Shooting: Shooting rating (1-100)

Passing: Passing rating (1-100)

Dribbling: Dribbling rating (1-100)

Defending: Defending rating (1-100)

Physic: Physic rating (1-100)

Descriptive Statistics

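library(psych)  # describe(); the non-numeric Name, Nationality and PreferredFoot columns are excluded below
round(describe(mydata[c(-1, -5, -9)]), 1)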
##            vars   n  mean   sd median trimmed  mad   min   max range skew
## Age           1 100  27.6  3.8   28.0    27.7  4.4  19.0  35.0    16  0.0
## Height        2 100 181.1  7.1  181.0   181.2  8.2 163.0 195.0    32 -0.2
## Weight        3 100  76.1  7.5   75.5    75.8  8.2  59.0  97.0    38  0.3
## Overall       4 100  86.2  2.0   85.5    86.0  2.2  84.0  93.0     9  1.1
## Potential     5 100  87.9  2.3   88.0    87.7  3.0  85.0  95.0    10  0.6
## Value         6 100  45.9 16.5   44.2    44.8 12.2  11.5 105.5    94  0.8
## WeakFoot      7 100   3.5  0.7    4.0     3.5  1.5   2.0   5.0     3 -0.3
## SkillMoves    8 100   3.4  0.9    3.0     3.4  1.5   2.0   5.0     3  0.0
## Pace          9 100  76.4 10.9   77.5    76.9 11.9  42.0  96.0    54 -0.4
## Shooting     10 100  72.8 14.1   77.5    74.4 11.1  28.0  93.0    65 -1.0
## Passing      11 100  77.9  8.3   80.0    78.7  5.9  53.0  93.0    40 -0.9
## Dribbling    12 100  81.0  8.9   83.5    82.0  6.7  49.0  95.0    46 -1.1
## Defending    13 100  63.3 20.2   69.0    63.8 26.7  29.0  91.0    62 -0.1
## Physic       14 100  73.8  8.9   76.0    74.6  8.9  44.0  91.0    47 -0.8
##            kurtosis  se
## Age            -0.7 0.4
## Height         -0.6 0.7
## Weight          0.1 0.7
## Overall         0.9 0.2
## Potential      -0.2 0.2
## Value           1.2 1.7
## WeakFoot       -0.4 0.1
## SkillMoves     -0.9 0.1
## Pace           -0.1 1.1
## Shooting        0.2 1.4
## Passing         0.5 0.8
## Dribbling       1.1 0.9
## Defending      -1.7 2.0
## Physic          0.2 0.9
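To make the columns of the `describe()` table less opaque, the sketch below recomputes several of them with base R for one numeric vector. It assumes `psych`'s defaults of a 10% trimmed mean, `stats::mad()` with its usual 1.4826 scaling, and `se = sd / sqrt(n)`; the skewness line uses one common sample definition, which may differ slightly from the variant `psych` reports. The vector is made up.

```r
# Made-up ratings for illustration
x <- c(84, 85, 85, 86, 88, 90, 93)

n <- length(x)
summary_stats <- c(
  n       = n,
  mean    = mean(x),
  sd      = sd(x),
  median  = median(x),
  trimmed = mean(x, trim = 0.1),  # drops 10% of the values from each tail
  mad     = mad(x),               # median absolute deviation, scaled by 1.4826
  min     = min(x),
  max     = max(x),
  range   = max(x) - min(x),
  # sample skewness: third central moment divided by sd^3
  skew    = sum((x - mean(x))^3) / (n * sd(x)^3),
  se      = sd(x) / sqrt(n)       # standard error of the mean
)
round(summary_stats, 2)
```

A positive skew, as for this vector, goes together with a mean above the median; the same pattern appears for Overall and Value in the table above.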

AGE

Mean: The average age of the players is 27.6 years.

Median: The median age is 28 years: at least half of the players are 28 or younger, and at least half are 28 or older.

HEIGHT

Mean: The average height of the players is 181.1 cm.

Median: The median height is 181 cm: at least half of the players are 181 cm or shorter, and at least half are 181 cm or taller.

WEIGHT

Mean: The average weight of the players is 76.1 kg.

Median: Half of the players weigh more than 75.5 kg, and half weigh less.

OVERALL RATING

Mean: The average overall rating of the players is 86.2 out of 100.

Median: Half of the players have an overall rating above 85.5, and half have a rating below it.

POTENTIAL

Mean: The average potential of the players is 87.9 out of 100.

Median: The median potential is 88: at least half of the players have a potential of 88 or higher, and at least half 88 or lower.

VALUE

Mean: The average market value of the players is 45.9 million EUR.

Median: Half of the players are valued above 44.2 million EUR, and half below.

WEAK FOOT RATING

Mean: The average weak foot rating of the players is 3.5 out of 5.

Median: The median weak foot rating is 4: at least half of the players are rated 4 or higher, and at least half 4 or lower.

SKILL MOVES RATING

Mean: The average skill moves rating of the players is 3.4 out of 5.

Median: The median skill moves rating is 3: at least half of the players are rated 3 or higher, and at least half 3 or lower.

PACE RATING

Mean: The average pace rating of the players is 76.4 out of 100.

Median: Half of the players have a pace rating above 77.5, and half below.

SHOOTING RATING

Mean: The average shooting rating of the players is 72.8 out of 100.

Median: Half of the players have a shooting rating above 77.5, and half below.

PASSING RATING

Mean: The average passing rating of the players is 77.9 out of 100.

Median: The median passing rating is 80: at least half of the players are rated 80 or higher, and at least half 80 or lower.

DRIBBLING RATING

Mean: The average dribbling rating of the players is 81.0 out of 100.

Median: Half of the players have a dribbling rating above 83.5, and half below.

DEFENDING RATING

Mean: The average defending rating of the players is 63.3 out of 100.

Median: The median defending rating is 69: at least half of the players are rated 69 or higher, and at least half 69 or lower.

PHYSIC RATING

Mean: The average physic rating of the players is 73.8 out of 100.

Median: The median physic rating is 76: at least half of the players are rated 76 or higher, and at least half 76 or lower.
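For integer-valued ratings such as weak foot (1-5), many players share the median value, so "half above, half below" holds only approximately. A quick check on a made-up vector shows why a median is best read as "at least half of the values lie at or below it, and at least half at or above it".

```r
# Made-up weak foot ratings with many ties at the median
x <- c(3, 4, 4, 4, 5)

m        <- median(x)   # 4
share_le <- mean(x <= m) # share at or below the median
share_ge <- mean(x >= m) # share at or above the median
share_gt <- mean(x > m)  # share strictly above the median

c(median = m, at_or_below = share_le, at_or_above = share_ge, above = share_gt)
```

Here 80% of the values are at or below the median and 80% at or above it, while only 20% lie strictly above it.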

HISTOGRAMS

In the histograms, the red line marks the mean and the blue line marks the median.

library(ggplot2)

OVERALL <- ggplot(mydata, aes(x = Overall)) +
  geom_histogram(binwidth = 1) +
  ggtitle("Overall Rating") +
  ylab("Count") +
  xlab("Overall Rating") +
  theme(axis.title = element_text(size = 8)) +
  # mean (red) and median (blue) as fixed reference lines; linewidth replaces the deprecated size
  geom_vline(xintercept = mean(mydata$Overall), color = "red", linewidth = 0.5) +
  geom_vline(xintercept = median(mydata$Overall), color = "blue", linewidth = 0.5)

POTENTIAL <- ggplot(mydata, aes(x = Potential)) +
  geom_histogram(binwidth = 1) +
  ggtitle("Potential") +
  ylab("Count") +
  xlab("Potential") +
  theme(axis.title = element_text(size = 8)) +
  geom_vline(xintercept = mean(mydata$Potential), color = "red", linewidth = 0.5) +
  geom_vline(xintercept = median(mydata$Potential), color = "blue", linewidth = 0.5)


library(ggpubr)  # ggarrange()

ggarrange(OVERALL, POTENTIAL, ncol = 2, nrow = 1)

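Histogram Overall Rating: The distribution is not normal; it is positively (right) skewed (skewness 1.1), with the mean (86.2) above the median (85.5).

Histogram Potential: The distribution does not look normal; it is mildly positively skewed (skewness 0.6).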
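Each histogram in this section repeats the same block of code with only the variable name changed. The sketch below wraps that block in a hypothetical helper, `hist_with_lines()` (not part of any package), that builds one panel from a column name. It assumes ggplot2 3.4.0 or later (for `linewidth`) and computes the mean and median once from the data instead of inside `aes()`.

```r
library(ggplot2)

# Hypothetical helper: histogram of one column with mean (red) and median (blue) lines
hist_with_lines <- function(df, var, title = var) {
  x <- df[[var]]
  ggplot(df, aes(x = .data[[var]])) +
    geom_histogram(binwidth = 1) +
    ggtitle(title) +
    ylab("Count") +
    xlab(title) +
    theme(axis.title = element_text(size = 8)) +
    geom_vline(xintercept = mean(x, na.rm = TRUE), color = "red", linewidth = 0.5) +
    geom_vline(xintercept = median(x, na.rm = TRUE), color = "blue", linewidth = 0.5)
}

# Made-up data for illustration
toy <- data.frame(Pace = c(70, 75, 75, 80, 96))
p <- hist_with_lines(toy, "Pace", "Pace Rating")
```

With the real data, `hist_with_lines(mydata, "Shooting", "Shooting Rating")` would take the place of a hand-written block such as `SHO`, and the six panels could be built with `lapply()` and passed to `ggarrange()` through its `plotlist` argument.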

PAC <- ggplot(mydata, aes(x = Pace)) +
  geom_histogram(binwidth = 1) +
  ggtitle("Pace Rating") +
  ylab("Count") +
  xlab("Pace Rating") +
  theme(axis.title = element_text(size = 8)) +
  geom_vline(xintercept = mean(mydata$Pace), color = "red", linewidth = 0.5) +
  geom_vline(xintercept = median(mydata$Pace), color = "blue", linewidth = 0.5)

SHO <- ggplot(mydata, aes(x = Shooting)) +
  geom_histogram(binwidth = 1) +
  ggtitle("Shooting Rating") +
  ylab("Count") +
  xlab("Shooting Rating") +
  theme(axis.title = element_text(size = 8)) +
  geom_vline(xintercept = mean(mydata$Shooting), color = "red", linewidth = 0.5) +
  geom_vline(xintercept = median(mydata$Shooting), color = "blue", linewidth = 0.5)

PAS <- ggplot(mydata, aes(x = Passing)) +
  geom_histogram(binwidth = 1) +
  ggtitle("Passing Rating") +
  ylab("Count") +
  xlab("Passing Rating") +
  theme(axis.title = element_text(size = 8)) +
  geom_vline(xintercept = mean(mydata$Passing), color = "red", linewidth = 0.5) +
  geom_vline(xintercept = median(mydata$Passing), color = "blue", linewidth = 0.5)

DRI <- ggplot(mydata, aes(x = Dribbling)) +
  geom_histogram(binwidth = 1) +
  ggtitle("Dribbling Rating") +
  ylab("Count") +
  xlab("Dribbling Rating") +
  theme(axis.title = element_text(size = 8)) +
  geom_vline(xintercept = mean(mydata$Dribbling), color = "red", linewidth = 0.5) +
  geom_vline(xintercept = median(mydata$Dribbling), color = "blue", linewidth = 0.5)

DEF <- ggplot(mydata, aes(x = Defending)) +
  geom_histogram(binwidth = 1) +
  ggtitle("Defending Rating") +
  ylab("Count") +
  xlab("Defending Rating") +
  theme(axis.title = element_text(size = 8)) +
  geom_vline(xintercept = mean(mydata$Defending), color = "red", linewidth = 0.5) +
  geom_vline(xintercept = median(mydata$Defending), color = "blue", linewidth = 0.5)

PHY <- ggplot(mydata, aes(x = Physic)) +
  geom_histogram(binwidth = 1) +
  ggtitle("Physic Rating") +
  ylab("Count") +
  xlab("Physic Rating") +
  theme(axis.title = element_text(size = 8)) +
  geom_vline(xintercept = mean(mydata$Physic), color = "red", linewidth = 0.5) +
  geom_vline(xintercept = median(mydata$Physic), color = "blue", linewidth = 0.5)


ggarrange(PAC, SHO, PAS, DRI, DEF, PHY,
          ncol = 3, nrow = 2)

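Histogram Pace Rating: The distribution does not look normal; it is slightly negatively skewed (skewness -0.4).

Histogram Shooting Rating: The distribution is not normal; it is negatively (left) skewed (skewness -1.0), with the mean (72.8) well below the median (77.5).

Histogram Passing Rating: The distribution is roughly bell-shaped, with a longer left tail (skewness -0.9).

Histogram Dribbling Rating: The distribution is not normal; it is negatively skewed (skewness -1.1).

Histogram Defending Rating: The distribution is not normal; it is bimodal, consistent with its strongly negative kurtosis (-1.7). This likely reflects the mix of defenders, who have high defending ratings, and attacking players, who have low ones.

Histogram Physic Rating: The distribution is roughly bell-shaped, with a longer left tail (skewness -0.8).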
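The normality judgements above are made by eye. A formal check that could complement them is the Shapiro-Wilk test from base R's `shapiro.test()`, where a small p-value (for example below 0.05) is evidence against normality. The sketch runs it on a deliberately skewed, made-up vector and then on every column of a toy data frame; with the real data it could be applied the same way, e.g. `sapply(mydata[c("Pace", "Shooting")], function(v) shapiro.test(v)$p.value)`.

```r
# Made-up, strongly right-skewed sample (n = 100)
x <- (1:100)^3

res <- shapiro.test(x)
res$statistic  # W: values close to 1 are consistent with normality
res$p.value    # a small p-value rejects the hypothesis of normality

# The same test applied to every numeric column of a data frame
toy <- data.frame(skewed = x, linear = (1:100) / 10)
p_values <- sapply(toy, function(v) shapiro.test(v)$p.value)
p_values
```

With 100 observations the test can flag even modest departures from normality, so it is best read alongside the histograms rather than instead of them.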

BOXPLOT

ggplot(mydata, aes(y= WeakFoot, fill=PreferredFoot)) +
  geom_boxplot(position=position_dodge(1)) +
  ggtitle("Weak Foot Skill") +
  ylab("Assessed Weak Foot Skill") + 
  ylim(0,5) +
  theme(axis.text.x=element_blank(),
        axis.ticks.x=element_blank())

The boxplot above shows the weak foot rating of players who prefer their left foot (red) and their right foot (blue).

A boxplot summarizes the distribution of a variable. The box spans the first to the third quartile, and the bold horizontal line inside it marks the median. The vertical lines (whiskers) extend from the box to the most extreme observations lying within 1.5 times the interquartile range of the box; any points beyond them are drawn individually as outliers. When there are no outliers, the whiskers reach the minimum and maximum, so the plot shows the five-number summary (minimum, first quartile, median, third quartile, maximum). The plot suggests that right-footed players tend to have a higher weak foot rating than left-footed players.
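To see what the box and whiskers encode, base R's `boxplot.stats()` returns the numbers behind a Tukey boxplot: whisker ends, hinges, median and any outliers. ggplot2 computes its box from `quantile()` rather than Tukey's hinges, so its box can differ slightly; treat this as an approximation of the plot above. The values and the small left/right-foot table are made up.

```r
# Made-up values with one extreme observation
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

bs <- boxplot.stats(x)  # default coef = 1.5
bs$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out    # points drawn individually as outliers

# Group medians, as one would compare left- and right-footed players
foot <- data.frame(
  PreferredFoot = c("Left", "Left", "Right", "Right", "Right"),
  WeakFoot      = c(3, 3, 4, 4, 3)
)
medians <- tapply(foot$WeakFoot, foot$PreferredFoot, median)
medians
```

Here the value 100 lies far beyond 1.5 times the box length above the upper hinge, so the upper whisker stops at 9 and 100 is reported as an outlier.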

SCATTERPLOT

library(car)  # scatterplotMatrix()
scatterplotMatrix(~ Value + Potential + Overall, data = mydata, smooth = FALSE)

The scatterplot matrix above shows the pairwise relationships between players' market value, potential and overall rating.

Both performance indicators, potential and overall rating, are positively related to a player's market value: higher-rated players tend to be worth more. The relationship appears stronger between value and potential than between value and overall rating.
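Which correlation is stronger can be checked numerically rather than read off the scatterplots. A sketch with base R's `cor()`: on the real data the call would be `cor(mydata[c("Value", "Potential", "Overall")])`, and `method = "spearman"` is an option if the relationships look monotonic but not linear. The numbers below are made up, so the output says nothing about FIFA 21 players.

```r
# Made-up values for illustration only
toy <- data.frame(
  Value     = c(40, 50, 60, 90, 105),
  Potential = c(86, 88, 89, 93, 95),
  Overall   = c(88, 85, 90, 86, 91)
)

r          <- cor(toy)                       # Pearson correlation matrix
r_spearman <- cor(toy, method = "spearman")  # rank-based alternative

# Compare the two correlations discussed in the text
r["Value", "Potential"] > r["Value", "Overall"]
```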