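library(dplyr); library(tidyr)     # %>% and drop_na()
library(psych); library(car)       # describe() and scatterplotMatrix()
library(ggplot2); library(ggpubr)  # plotting and ggarrange()
data <- read.table("./playersFIFA21.csv", header=TRUE, sep=",", dec=";")
mydata <- data %>% drop_na(pace)  # drop rows where pace is missing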
head(mydata)
##          short_name age height_cm weight_kg nationality overall potential
## 1          L. Messi  33       170        72   Argentina      93        93
## 2 Cristiano Ronaldo  35       187        83    Portugal      92        92
## 3    R. Lewandowski  31       184        80      Poland      91        91
## 4         Neymar Jr  28       175        68      Brazil      91        91
## 5      K. De Bruyne  29       181        70     Belgium      91        91
## 6         K. Mbappe  21       178        73      France      90        95
##   value_eur preferred_foot weak_foot skill_moves pace shooting passing
## 1  67500000           Left         4           4   85       92      91
## 2  46000000          Right         4           5   89       93      81
## 3  80000000          Right         4           4   78       91      78
## 4  90000000          Right         5           5   91       85      86
## 5  87000000          Right         5           4   76       86      93
## 6 105500000          Right         4           5   96       86      78
##   dribbling defending physic gk_diving gk_handling gk_kicking gk_reflexes
## 1        95        38     65        NA          NA         NA          NA
## 2        89        35     77        NA          NA         NA          NA
## 3        85        43     82        NA          NA         NA          NA
## 4        94        36     59        NA          NA         NA          NA
## 5        88        64     78        NA          NA         NA          NA
## 6        91        39     76        NA          NA         NA          NA
##   gk_speed gk_positioning
## 1       NA             NA
## 2       NA             NA
## 3       NA             NA
## 4       NA             NA
## 5       NA             NA
## 6       NA             NA
mydata <- mydata[1:100,1:17]  # first 100 rows, without the goalkeeper (gk_*) columns
colnames(mydata) <- c("Name", "Age", "Height", "Weight", "Nationality", "Overall", "Potential", "Value", "PreferredFoot", "WeakFoot", "SkillMoves", "Pace", "Shooting", "Passing", "Dribbling", "Defending", "Physic")
mydata$Value <- mydata$Value/1000000  # express value in million EUR
head(mydata,10)
##                 Name Age Height Weight Nationality Overall Potential Value
## 1           L. Messi  33    170     72   Argentina      93        93  67.5
## 2  Cristiano Ronaldo  35    187     83    Portugal      92        92  46.0
## 3     R. Lewandowski  31    184     80      Poland      91        91  80.0
## 4          Neymar Jr  28    175     68      Brazil      91        91  90.0
## 5       K. De Bruyne  29    181     70     Belgium      91        91  87.0
## 6          K. Mbappe  21    178     73      France      90        95 105.5
## 7        V. van Dijk  28    193     92 Netherlands      90        91  75.5
## 8            S. Mane  28    175     69     Senegal      90        90  78.0
## 9           M. Salah  28    175     71       Egypt      90        90  78.0
## 10         S. Aguero  32    173     70   Argentina      89        89  53.0
##    PreferredFoot WeakFoot SkillMoves Pace Shooting Passing Dribbling Defending
## 1           Left        4          4   85       92      91        95        38
## 2          Right        4          5   89       93      81        89        35
## 3          Right        4          4   78       91      78        85        43
## 4          Right        5          5   91       85      86        94        36
## 5          Right        5          4   76       86      93        88        64
## 6          Right        4          5   96       86      78        91        39
## 7          Right        3          2   76       60      71        71        91
## 8          Right        4          4   94       85      80        90        44
## 9           Left        3          4   93       86      81        90        45
## 10         Right        4          4   78       90      77        88        33
##    Physic
## 1      65
## 2      77
## 3      82
## 4      59
## 5      78
## 6      76
## 7      86
## 8      76
## 9      75
## 10     73
Name: Name of a player
Age: Age of a player in years
Height: Height of a player in cm
Weight: Weight of a player in kg
Nationality: Player’s nationality
Overall: Overall rating (1-100)
Potential: Potential rating (1-100)
Value: Value of a player in million EUR
PreferredFoot: Player’s preferred foot (Left or Right)
WeakFoot: Assessed ability to play with weak (not preferred) foot (1-5)
SkillMoves: Assessed skill moves (1-5)
Pace: Pace rating (1-100)
Shooting: Shooting rating (1-100)
Passing: Passing rating (1-100)
Dribbling: Dribbling rating (1-100)
Defending: Defending rating (1-100)
Physic: Physic rating (1-100)
round(describe(mydata[c(-1,-5,-9)]),1)
##            vars   n  mean   sd median trimmed  mad   min   max range skew
## Age           1 100  27.6  3.8   28.0    27.7  4.4  19.0  35.0    16  0.0
## Height        2 100 181.1  7.1  181.0   181.2  8.2 163.0 195.0    32 -0.2
## Weight        3 100  76.1  7.5   75.5    75.8  8.2  59.0  97.0    38  0.3
## Overall       4 100  86.2  2.0   85.5    86.0  2.2  84.0  93.0     9  1.1
## Potential     5 100  87.9  2.3   88.0    87.7  3.0  85.0  95.0    10  0.6
## Value         6 100  45.9 16.5   44.2    44.8 12.2  11.5 105.5    94  0.8
## WeakFoot      7 100   3.5  0.7    4.0     3.5  1.5   2.0   5.0     3 -0.3
## SkillMoves    8 100   3.4  0.9    3.0     3.4  1.5   2.0   5.0     3  0.0
## Pace          9 100  76.4 10.9   77.5    76.9 11.9  42.0  96.0    54 -0.4
## Shooting     10 100  72.8 14.1   77.5    74.4 11.1  28.0  93.0    65 -1.0
## Passing      11 100  77.9  8.3   80.0    78.7  5.9  53.0  93.0    40 -0.9
## Dribbling    12 100  81.0  8.9   83.5    82.0  6.7  49.0  95.0    46 -1.1
## Defending    13 100  63.3 20.2   69.0    63.8 26.7  29.0  91.0    62 -0.1
## Physic       14 100  73.8  8.9   76.0    74.6  8.9  44.0  91.0    47 -0.8
##            kurtosis  se
## Age            -0.7 0.4
## Height         -0.6 0.7
## Weight          0.1 0.7
## Overall         0.9 0.2
## Potential      -0.2 0.2
## Value           1.2 1.7
## WeakFoot       -0.4 0.1
## SkillMoves     -0.9 0.1
## Pace           -0.1 1.1
## Shooting        0.2 1.4
## Passing         0.5 0.8
## Dribbling       1.1 0.9
## Defending      -1.7 2.0
## Physic          0.2 0.9
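To make the columns of the `describe()` table concrete, the following sketch recomputes several of them with base R on a small made-up vector (the values in `x` are hypothetical, not taken from the dataset). It assumes the defaults of `psych::describe()` as I understand them: a 10% trimmed mean, the scaled median absolute deviation, and the `type = 3` skewness formula. Treat these as assumptions to check against the package documentation.

```r
# Hypothetical ratings, used only for illustration (not taken from the FIFA data)
x <- c(84, 85, 85, 86, 86, 87, 88, 90, 91, 93)
n <- length(x)

stats <- c(
  mean    = mean(x),
  sd      = sd(x),                                  # sample standard deviation
  median  = median(x),
  trimmed = mean(x, trim = 0.1),                    # drops 10% of values at each end
  mad     = mad(x),                                 # median absolute deviation, scaled by 1.4826
  range   = max(x) - min(x),
  skew    = sum((x - mean(x))^3) / (n * sd(x)^3),   # assumed to match describe(type = 3)
  se      = sd(x) / sqrt(n)                         # standard error of the mean
)
round(stats, 1)
```

A mean above the median together with a positive `skew` points to a longer right tail, which is the pattern the table shows for Overall.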
AGE
Mean: The average age of the players is 27.6 years.
Median: Half of the players are 28 years old or younger, and half are 28 or older.
HEIGHT
Mean: The average height of the players is 181.1 centimeters.
Median: Half of the players are 181 centimeters or shorter, and half are 181 centimeters or taller.
WEIGHT
Mean: The average weight of the players is 76.1 kilograms.
Median: Half of the players weigh 75.5 kilograms or less, and half weigh 75.5 kilograms or more.
OVERALL RATING
Mean: The average overall rating of the players is 86.2 out of 100.
Median: Half of the players have an overall rating of 85.5 or lower, and half have 85.5 or higher.
POTENTIAL
Mean: The average potential of the players is 87.9 out of 100.
Median: Half of the players have a potential of 88 or lower, and half have 88 or higher.
VALUE
Mean: The average value of the players is 45.9 million EUR.
Median: Half of the players are valued at 44.2 million EUR or less, and half at 44.2 million EUR or more.
WEAK FOOT RATING
Mean: The average weak foot rating of the players is 3.5 out of 5.
Median: Half of the players have a weak foot rating of 4 or lower, and half have 4 or higher.
SKILL MOVES RATING
Mean: The average skill moves rating of the players is 3.4 out of 5.
Median: Half of the players have a skill moves rating of 3 or lower, and half have 3 or higher.
PACE RATING
Mean: The average pace rating of the players is 76.4 out of 100.
Median: Half of the players have a pace rating of 77.5 or lower, and half have 77.5 or higher.
SHOOTING RATING
Mean: The average shooting rating of the players is 72.8 out of 100.
Median: Half of the players have a shooting rating of 77.5 or lower, and half have 77.5 or higher.
PASSING RATING
Mean: The average passing rating of the players is 77.9 out of 100.
Median: Half of the players have a passing rating of 80 or lower, and half have 80 or higher.
DRIBBLING RATING
Mean: The average dribbling rating of the players is 81.0 out of 100.
Median: Half of the players have a dribbling rating of 83.5 or lower, and half have 83.5 or higher.
DEFENDING RATING
Mean: The average defending rating of the players is 63.3 out of 100.
Median: Half of the players have a defending rating of 69.0 or lower, and half have 69.0 or higher.
PHYSIC RATING
Mean: The average physic rating of the players is 73.8 out of 100.
Median: Half of the players have a physic rating of 76.0 or lower, and half have 76.0 or higher.
Histograms: the red line represents the mean value, while the blue line represents the median value.
OVERALL <- ggplot(mydata, aes(x=Overall)) +
  geom_histogram(position=position_identity(), alpha=1, binwidth=1) +
  ggtitle("Overall Rating") +
  ylab("Count") +
  xlab("Overall Rating") +
  theme(axis.title = element_text(size = 8)) +
  geom_vline(aes(xintercept=mean(Overall)),    # mean (red)
             color="red", linewidth=0.5) +
  geom_vline(aes(xintercept=median(Overall)),  # median (blue)
             color="blue", linewidth=0.5)
POTENTIAL <- ggplot(mydata, aes(x=Potential)) +
  geom_histogram(position=position_identity(), alpha=1, binwidth=1) +
  ggtitle("Potential") +
  ylab("Count") +
  xlab("Potential") +
  theme(axis.title = element_text(size = 8)) +
  geom_vline(aes(xintercept=mean(Potential)),
             color="red", linewidth=0.5) +
  geom_vline(aes(xintercept=median(Potential)),
             color="blue", linewidth=0.5)
ggarrange(OVERALL, POTENTIAL,
          ncol = 2, nrow = 1)
Histogram Overall Rating: The distribution does not look normal; it is positively skewed (skewness 1.1 in the table above).
Histogram Potential: The distribution does not look normal; it is mildly positively skewed (skewness 0.6).
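Visual inspection of a histogram can be complemented by a formal test. The sketch below is an addition rather than part of the original analysis: it applies the Shapiro-Wilk test from base R (`shapiro.test()`) to each column of a data frame. The helper name `normality_pvalues` and the simulated data frame `demo` are hypothetical; with the data above, one would pass `mydata[c("Overall", "Potential")]` instead.

```r
# Hypothetical helper: Shapiro-Wilk p-value for every column of a numeric data frame
normality_pvalues <- function(df) {
  vapply(df, function(col) shapiro.test(col)$p.value, numeric(1))
}

# Simulated data for illustration only (not the FIFA ratings)
set.seed(1)
demo <- data.frame(
  symmetric = rnorm(100, mean = 80, sd = 5),  # drawn from a normal distribution
  skewed    = rexp(100, rate = 0.2)           # strongly right-skewed
)

p <- normality_pvalues(demo)
p  # a small p-value (for example below 0.05) is evidence against normality
```

Integer ratings contain many ties, so such a test is best read alongside the histogram rather than in place of it.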
PAC <- ggplot(mydata, aes(x=Pace)) +
  geom_histogram(position=position_identity(), alpha=1, binwidth=1) +
  ggtitle("Pace Rating") +
  ylab("Count") +
  xlab("Pace Rating") +
  theme(axis.title = element_text(size = 8)) +
  geom_vline(aes(xintercept=mean(Pace)),
             color="red", linewidth=0.5) +
  geom_vline(aes(xintercept=median(Pace)),
             color="blue", linewidth=0.5)
SHO <- ggplot(mydata, aes(x=Shooting)) +
  geom_histogram(position=position_identity(), alpha=1, binwidth=1) +
  ggtitle("Shooting Rating") +
  ylab("Count") +
  xlab("Shooting Rating") +
  theme(axis.title = element_text(size = 8)) +
  geom_vline(aes(xintercept=mean(Shooting)),
             color="red", linewidth=0.5) +
  geom_vline(aes(xintercept=median(Shooting)),
             color="blue", linewidth=0.5)
PAS <- ggplot(mydata, aes(x=Passing)) +
  geom_histogram(position=position_identity(), alpha=1, binwidth=1) +
  ggtitle("Passing Rating") +
  ylab("Count") +
  xlab("Passing Rating") +
  theme(axis.title = element_text(size = 8)) +
  geom_vline(aes(xintercept=mean(Passing)),
             color="red", linewidth=0.5) +
  geom_vline(aes(xintercept=median(Passing)),
             color="blue", linewidth=0.5)
DRI <- ggplot(mydata, aes(x=Dribbling)) +
  geom_histogram(position=position_identity(), alpha=1, binwidth=1) +
  ggtitle("Dribbling Rating") +
  ylab("Count") +
  xlab("Dribbling Rating") +
  theme(axis.title = element_text(size = 8)) +
  geom_vline(aes(xintercept=mean(Dribbling)),
             color="red", linewidth=0.5) +
  geom_vline(aes(xintercept=median(Dribbling)),
             color="blue", linewidth=0.5)
DEF <- ggplot(mydata, aes(x=Defending)) +
  geom_histogram(position=position_identity(), alpha=1, binwidth=1) +
  ggtitle("Defending Rating") +
  ylab("Count") +
  xlab("Defending Rating") +
  theme(axis.title = element_text(size = 8)) +
  geom_vline(aes(xintercept=mean(Defending)),
             color="red", linewidth=0.5) +
  geom_vline(aes(xintercept=median(Defending)),
             color="blue", linewidth=0.5)
PHY <- ggplot(mydata, aes(x=Physic)) +
  geom_histogram(position=position_identity(), alpha=1, binwidth=1) +
  ggtitle("Physic Rating") +
  ylab("Count") +
  xlab("Physic Rating") +
  theme(axis.title = element_text(size = 8)) +
  geom_vline(aes(xintercept=mean(Physic)),
             color="red", linewidth=0.5) +
  geom_vline(aes(xintercept=median(Physic)),
             color="blue", linewidth=0.5)
ggarrange(PAC, SHO, PAS, DRI, DEF, PHY,
          ncol = 3, nrow = 2)
Histogram Pace Rating: The distribution does not look normal.
Histogram Shooting Rating: The distribution does not look normal; it is negatively skewed.
Histogram Passing Rating: The distribution is roughly bell-shaped, with some negative skew.
Histogram Dribbling Rating: The distribution does not look normal; it is negatively skewed.
Histogram Defending Rating: The distribution does not look normal; it is bimodal.
Histogram Physic Rating: The distribution is roughly bell-shaped, with some negative skew.
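The histogram definitions above differ only in the column and the title, so they could be generated by one function. The following is a minimal sketch of such a helper, not the code used for the figures: the name `rating_hist` and the data frame `demo` are hypothetical, and the `linewidth` argument assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)

# Hypothetical helper: histogram of one column with mean (red) and median (blue) lines
rating_hist <- function(df, column, title) {
  ggplot(df, aes(x = .data[[column]])) +
    geom_histogram(binwidth = 1) +
    ggtitle(title) +
    xlab(title) +
    ylab("Count") +
    theme(axis.title = element_text(size = 8)) +
    geom_vline(xintercept = mean(df[[column]]), color = "red", linewidth = 0.5) +
    geom_vline(xintercept = median(df[[column]]), color = "blue", linewidth = 0.5)
}

# Example call on a small made-up data frame (values are illustrative only)
demo <- data.frame(Pace = c(70, 72, 75, 75, 78, 80, 85, 91))
p <- rating_hist(demo, "Pace", "Pace Rating")
```

With the data above, calls such as `rating_hist(mydata, "Pace", "Pace Rating")` would produce the individual panels that are then passed to `ggarrange()`.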
ggplot(mydata, aes(y=WeakFoot, fill=PreferredFoot)) +
  geom_boxplot(position=position_dodge(1)) +
  ggtitle("Weak Foot Skill") +
  ylab("Assessed Weak Foot Skill") +
  ylim(0,5) +
  theme(axis.text.x=element_blank(),
        axis.ticks.x=element_blank())
The boxplots above show the weak foot skill rating for players who prefer the left (red) and the right (blue) foot.
A boxplot displays the five-number summary of a data set: the minimum, first quartile, median, third quartile, and maximum. The box spans the first to the third quartile and the bold horizontal line marks the median. The vertical lines (whiskers) extend to the most extreme values within 1.5 times the interquartile range from the box, so they reach the minimum and maximum only when there are no outliers; points beyond the whiskers are drawn individually. The plot suggests that players who prefer the right foot tend to have a higher weak foot skill rating.
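The numbers behind a boxplot can be inspected directly in base R. The sketch below uses a made-up vector of ratings (hypothetical values, not the FIFA data) and shows `fivenum()` and `boxplot.stats()`. Note that base R uses hinges, while ggplot2 computes the box from quantiles, so the two can differ slightly on some data.

```r
# Made-up weak foot ratings for illustration (not the FIFA data)
ratings <- c(2, 3, 3, 3, 4, 4, 4, 4, 5, 5)

# Minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(ratings)
five

# Whisker ends and box edges as used for drawing, plus any points flagged as outliers
bs <- boxplot.stats(ratings)
bs$stats
bs$out
```

With the data above, `tapply(mydata$WeakFoot, mydata$PreferredFoot, fivenum)` would give the same summary separately for left-footed and right-footed players.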
scatterplotMatrix(~ Value + Potential + Overall, data=mydata,
                  smooth=FALSE)
The scatterplot matrix above shows the relationships among players’ potential, overall rating, and market value.
Both performance indicators, potential and overall rating, are positively associated with market value: players with higher ratings tend to be valued more. The correlation between value and potential appears greater than that between value and overall rating.
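A comparison of correlation strengths read off a scatterplot matrix can be checked numerically with `cor()`. The following sketch uses a small made-up data frame (the values in `demo` are hypothetical, not the FIFA data); with the data above, the same call would be applied to `mydata[c("Value", "Potential", "Overall")]`.

```r
# Made-up values for illustration only (not the FIFA data)
demo <- data.frame(
  Value     = c(20, 35, 40, 55, 60, 90),
  Potential = c(85, 86, 88, 89, 90, 95),
  Overall   = c(84, 86, 85, 88, 87, 91)
)

# Pairwise Pearson correlations between the three variables
cors <- cor(demo, method = "pearson")
round(cors, 2)

# Correlation of Value with each rating, for direct comparison
cors["Value", c("Potential", "Overall")]
```

Pearson correlation measures linear association only; if the relationship with value looks curved in the scatterplots, `method = "spearman"` is a rank-based alternative.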