library(readxl)
mydata <- read_excel("./Football.xlsx")
head(mydata)
## # A tibble: 6 × 10
## Rank Name Position Age Value Club Games_played Goals Assists Card_yellow
## <dbl> <chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1 Kylia… 4 22 144 Pari… 16 7 11 3
## 2 2 Erlin… 4 21 135 Boru… 10 13 4 1
## 3 3 Harry… 4 28 108 Tott… 16 7 2 2
## 4 4 Jack … 1 26 90 Manc… 15 2 3 1
## 5 5 Moham… 2 29 90 Live… 15 15 6 1
## 6 6 Romel… 4 28 90 Chel… 11 4 1 0
mydata$Position <- factor(mydata$Position,
levels = c(1, 2, 3, 4, 5),
labels = c( "Midfielder", "Winger", "Defender", "Striker", "Goalkeeper"))
library(ggplot2)
ggplot(mydata, aes( x= Position)) +
geom_bar(colour = "blue", fill = "lightblue") +
ylab("Frequency")+
theme_minimal() +
geom_text(stat = "count",
aes(label = ..count..),
vjust = -0.3)
## Warning: The dot-dot notation (`..count..`) was deprecated in ggplot2 3.4.0.
## ℹ Please use `after_stat(count)` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
library(ggplot2)
ggplot(mydata, aes( x= Games_played, y= Goals))+
geom_point(color="blue", size= 3)+
geom_smooth(method= "lm", se=FALSE, color = "red")+
labs(title= "Scatterplot")
## `geom_smooth()` using formula = 'y ~ x'
d) Average number of cards for defenders
mean(mydata$Card_yellow[mydata$Position=="Defender"])
## [1] 1.913043
The average number of yellow cards for defenders is 1.91.
Hypothesis testing Ho: mu = 1.91 H1: Mu < 1.91
t.test(mydata[mydata$Position == "Striker", ]$Card_yellow,
mu= 1.91,
alternative = "less")
##
## One Sample t-test
##
## data: mydata[mydata$Position == "Striker", ]$Card_yellow
## t = -0.78247, df = 13, p-value = 0.224
## alternative hypothesis: true mean is less than 1.91
## 95 percent confidence interval:
## -Inf 2.247475
## sample estimates:
## mean of x
## 1.642857
We can not reject Ho, because p value is not lower than 0.05. We can not say that the average number of yellow card is smaller for strikers compared to defenders.