Football

Import the data into R.

mydata <- read.table("./Football.csv", header=TRUE, sep=";", dec=".")

head(mydata)

##   Rank           Name Position Age Value                Club
## 1    1  Kylian Mbappe        4  22   144 Paris Saint-Germain
## 2    2 Erling Haaland        4  21   135   Borussia Dortmund
## 3    3     Harry Kane        4  28   108   Tottenham Hotspur
## 4    4  Jack Grealish        1  26    90     Manchester City
## 5    5  Mohamed Salah        2  29    90        Liverpool FC
## 6    6  Romelu Lukaku        4  28    90          Chelsea FC
##   Games_played Goals Assists Card_yellow
## 1           16     7      11           3
## 2           10    13       4           1
## 3           16     7       2           2
## 4           15     2       3           1
## 5           15    15       6           1
## 6           11     4       1           0
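As a rough illustration of what the `sep` and `dec` arguments do, the sketch below writes the six rows shown above to a temporary semicolon-separated file and reads them back. The file layout is an assumption inferred from the `read.table()` call; the real `Football.csv` is not reproduced here, and `csv_lines`, `tmp` and `demo` are hypothetical names.

```r
# Six rows copied from the head(mydata) output; the layout of the real file is assumed
csv_lines <- c(
  "Rank;Name;Position;Age;Value;Club;Games_played;Goals;Assists;Card_yellow",
  "1;Kylian Mbappe;4;22;144;Paris Saint-Germain;16;7;11;3",
  "2;Erling Haaland;4;21;135;Borussia Dortmund;10;13;4;1",
  "3;Harry Kane;4;28;108;Tottenham Hotspur;16;7;2;2",
  "4;Jack Grealish;1;26;90;Manchester City;15;2;3;1",
  "5;Mohamed Salah;2;29;90;Liverpool FC;15;15;6;1",
  "6;Romelu Lukaku;4;28;90;Chelsea FC;11;4;1;0"
)

# Write to a temporary file so the example does not touch the working directory
tmp <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp)

# sep = ";" splits fields on semicolons; dec = "." is the decimal mark
demo <- read.table(tmp, header = TRUE, sep = ";", dec = ".")

# Check the column types that R inferred
str(demo)
```

If a file used a decimal comma instead, `dec = ","` would be the argument to change.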

Description of the variables:

- Rank: Rank of the player by estimated value.
- Name: Name of the player.
- Position: Player position (1: midfielder, 2: winger, 3: defender, 4: striker, 5: goalkeeper).
- Age: Age of the player.
- Value: Estimated value of the player in million EUR for 2021.
- Club: Club the player plays for.
- Games_played: Number of games played in 2021.
- Goals: Number of goals scored in 2021.
- Assists: Number of assists given in 2021.
- Card_yellow: Number of yellow cards received in 2021.

mydata$Position <- factor(mydata$Position, 
                           levels = c(1, 2, 3, 4, 5), 
                           labels = c("midfielder", "winger", "defender", "striker", "goalkeeper"))
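To show what this conversion does, here is a minimal sketch applied only to the position codes of the six players listed by `head(mydata)`. The vector names `position_codes` and `position` are hypothetical and used for illustration.

```r
# Position codes of the six players shown by head(mydata)
position_codes <- c(4, 4, 4, 1, 2, 4)

# Map each numeric code to its label; levels and labels are matched by order
position <- factor(position_codes,
                   levels = c(1, 2, 3, 4, 5),
                   labels = c("midfielder", "winger", "defender", "striker", "goalkeeper"))

# Levels with no observations still appear in the table, with a count of zero
print(table(position))
```

Because all five levels are declared, categories that are absent from a subset still show up in tables and plots.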

Display the frequency distribution of the players’ positions.

library(ggplot2)
ggplot(mydata, aes(x = Position)) +
  geom_bar() +
  ylab("Frequency") +
  xlab("Position")
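The bar chart shows absolute frequencies. A possible complement, sketched here with the counts that `table(mydata$Position)` reports later in this document, is to express them as shares of all players. The names `position_counts` and `position_share` are hypothetical.

```r
# Counts per position, copied from the table(mydata$Position) output
position_counts <- c(midfielder = 48, winger = 9, defender = 23,
                     striker = 14, goalkeeper = 6)

# prop.table() divides each count by the total number of players
position_share <- prop.table(position_counts)

# Shares in percent, rounded to one decimal place
print(round(100 * position_share, 1))
```

Since the sample contains exactly 100 players, each percentage equals the corresponding count.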

Draw a scatterplot of the number of goals scored against the number of games played and explain it.

library(car)
scatterplot(y = mydata$Goals,
            x = mydata$Games_played,
            ylab = "Goals scored",
            xlab = "Games played",
            smooth = FALSE)

library(ggplot2)
ggplot(mydata, aes(x=Games_played, y=Goals)) +
  geom_point(color = "chocolate1")

The relationship between the two variables is positive: players who played more games tended to score more goals.

Estimate the average number of yellow cards for defenders. Can you say that the average number of yellow cards for strikers is lower?

library(psych)

## 
## Attaching package: 'psych'

## The following object is masked from 'package:onewaytests':
## 
##     describe

## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha

## The following object is masked from 'package:car':
## 
##     logit

describeBy(mydata$Card_yellow, group = mydata$Position)

## 
##  Descriptive statistics by group 
## group: midfielder
##    vars  n mean   sd median trimmed  mad min max range skew kurtosis
## X1    1 48 1.56 1.44      1    1.45 1.48   0   5     5 0.52     -0.9
##      se
## X1 0.21
## ---------------------------------------------------- 
## group: winger
##    vars n mean   sd median trimmed mad min max range skew kurtosis
## X1    1 9 0.44 0.73      0    0.44   0   0   2     2 1.04     -0.5
##      se
## X1 0.24
## ---------------------------------------------------- 
## group: defender
##    vars  n mean   sd median trimmed  mad min max range skew kurtosis
## X1    1 23 1.91 1.24      2    1.84 1.48   0   5     5  0.7        0
##      se
## X1 0.26
## ---------------------------------------------------- 
## group: striker
##    vars  n mean   sd median trimmed  mad min max range skew kurtosis
## X1    1 14 1.64 1.28    1.5    1.58 1.48   0   4     4 0.22    -1.29
##      se
## X1 0.34
## ---------------------------------------------------- 
## group: goalkeeper
##    vars n mean   sd median trimmed  mad min max range skew kurtosis
## X1    1 6    1 0.89      1       1 1.48   0   2     2    0    -1.96
##      se
## X1 0.37

The average number of yellow cards for defenders is 1.91. To check whether the average for strikers is lower, we test:

H0: Mu = 1.91
H1: Mu < 1.91

This is a one-sided test.

t.test(mydata[mydata$Position == "striker", ]$Card_yellow,
       mu = 1.91,
       alternative = "less")

## 
##  One Sample t-test
## 
## data:  mydata[mydata$Position == "striker", ]$Card_yellow
## t = -0.78247, df = 13, p-value = 0.224
## alternative hypothesis: true mean is less than 1.91
## 95 percent confidence interval:
##      -Inf 2.247475
## sample estimates:
## mean of x 
##  1.642857
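The reported statistic can be approximately reproduced by hand from the summary statistics. The sketch below uses n = 14 and the sample mean from the output above, together with the standard deviation 1.28 as rounded by `describeBy`, so the results agree with the `t.test` output only up to rounding. All variable names are hypothetical.

```r
# Summary statistics for strikers, taken from the outputs above
n_strikers    <- 14
mean_strikers <- 1.642857
sd_strikers   <- 1.28      # rounded value from describeBy
mu0           <- 1.91      # hypothesised mean (defenders' average)

# Standard error of the mean and the t statistic
se     <- sd_strikers / sqrt(n_strikers)
t_stat <- (mean_strikers - mu0) / se

# One-sided p-value for H1: Mu < mu0, with n - 1 degrees of freedom
p_value <- pt(t_stat, df = n_strikers - 1)

# Upper limit of the one-sided 95% confidence interval
upper <- mean_strikers + qt(0.95, df = n_strikers - 1) * se

print(round(c(t = t_stat, p = p_value, upper = upper), 3))
```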

table(mydata$Position)

## 
## midfielder     winger   defender    striker goalkeeper 
##         48          9         23         14          6

We cannot reject H0 (p-value = 0.224, well above 0.05), so we cannot say that the average number of yellow cards for strikers is lower than 1.91.
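The one-sample test treats 1.91 as a known constant, although it is itself an estimate based on 23 defenders. One possible alternative, offered here as an assumption and not as part of the original analysis, is a Welch two-sample t-test. The sketch computes it from the rounded means, standard deviations and group sizes in the `describeBy` output, so the figures are approximate; all variable names are hypothetical.

```r
# Rounded summary statistics from the describeBy output
n_s <- 14; mean_s <- 1.64; sd_s <- 1.28   # strikers
n_d <- 23; mean_d <- 1.91; sd_d <- 1.24   # defenders

# Variance of each group mean
v_s <- sd_s^2 / n_s
v_d <- sd_d^2 / n_d

# Welch t statistic for the difference in means (strikers minus defenders)
t_welch <- (mean_s - mean_d) / sqrt(v_s + v_d)

# Welch-Satterthwaite approximation of the degrees of freedom
df_welch <- (v_s + v_d)^2 / (v_s^2 / (n_s - 1) + v_d^2 / (n_d - 1))

# One-sided p-value for H1: strikers' mean is lower than defenders' mean
p_welch <- pt(t_welch, df = df_welch)

print(round(c(t = t_welch, df = df_welch, p = p_welch), 3))
```

The p-value stays well above 0.05, so this variant leads to the same conclusion. With the raw data, the corresponding call would be `t.test(strikers, defenders, alternative = "less")`, where `strikers` and `defenders` stand for the two `Card_yellow` vectors.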