This post looks at how minutes were distributed among Barcelona and Real Madrid players in La Liga 2016/2017. We will check which team relied more on its 11 most-used players and whether there is a significant difference.
library(readxl)
Real_Madrid_and_Barcelona_Full_Stats <- read_excel("~/R/Real Madrid and Barcelona Full Stats.xlsx")
View(Real_Madrid_and_Barcelona_Full_Stats)
football<-Real_Madrid_and_Barcelona_Full_Stats
football
## # A tibble: 59 x 13
## Player Team Minutes Appearances Lineups `Substitute in`
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 K. Navas Real Madrid 2430 27 27 0
## 2 Kiko Casilla Real Madrid 990 11 11 0
## 3 Rubén Yáñez Real Madrid 0 0 0 0
## 4 Daniel Carvajal Real Madrid 2014 23 21 2
## 5 Pepe Real Madrid 1081 13 13 0
## 6 Sergio Ramos Real Madrid 2489 28 28 0
## 7 R. Varane Real Madrid 1928 23 23 0
## 8 Nacho Real Madrid 2301 28 24 4
## 9 Marcelo Real Madrid 2277 30 26 4
## 10 Fábio Coentrão Real Madrid 169 3 2 1
## # ... with 49 more rows, and 7 more variables: `Substitute out` <dbl>,
## # `Substitutes on bench` <dbl>, Goal <dbl>, Assist <dbl>, `Yellow
## # card` <dbl>, `Yellow 2nd/RC` <dbl>, `Red card` <dbl>
Let’s produce a scatter plot of minutes against appearances to see how playing time is distributed among all players.
library(plotly)
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.4.1
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
plot_ly(football, x=~Minutes, y=~Appearances, mode="markers", type="scatter", color=~Team, text = ~paste('Player: ', Player, '</br> Team: ', Team,'</br> Minutes: ', Minutes, '</br> Appearances: ', Appearances))
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
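Barcelona have 5 players in the top-right corner of the plot, that is, among the players with the most minutes and appearances.
Next step: look at the minutes distribution among the top 11 players of each team. We’ll use the dplyr package.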
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
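football %>%
  filter(Minutes > 0) %>%  # drop players who never played
  group_by(Team) %>%
  top_n(11, Minutes) %>%  # 11 most-used players per team
  summarise(total = sum(Minutes), n = n()) %>%
  mutate(media = total / n)  # media = mean minutes per player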
## # A tibble: 2 x 4
## Team total n media
## <chr> <dbl> <int> <dbl>
## 1 Barcelona 27216 11 2474.182
## 2 Real Madrid 23998 11 2181.636
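To make the filter / rank / average steps concrete, here is a minimal base-R sketch of the same logic on the ten Real Madrid rows printed at the top of this post. It is only an illustration: it uses a top 3 instead of a top 11 because only ten rows are shown, and the names `rm_sample`, `played` and `top3` are made up for this example.

```r
# The ten Real Madrid rows shown in the tibble preview above
rm_sample <- data.frame(
  Player  = c("K. Navas", "Kiko Casilla", "Rubén Yáñez", "Daniel Carvajal",
              "Pepe", "Sergio Ramos", "R. Varane", "Nacho", "Marcelo",
              "Fábio Coentrão"),
  Minutes = c(2430, 990, 0, 2014, 1081, 2489, 1928, 2301, 2277, 169),
  stringsAsFactors = FALSE
)

# Same idea as filter(Minutes > 0): drop players who never played
played <- rm_sample[rm_sample$Minutes > 0, ]

# Same idea as top_n(3, Minutes): sort by minutes, keep the three highest
top3 <- head(played[order(-played$Minutes), ], 3)

# Same idea as summarise() + mutate(): total and mean minutes
total <- sum(top3$Minutes)
media <- total / nrow(top3)
print(top3)
print(c(total = total, media = media))
```

One difference worth keeping in mind: `top_n()` keeps all tied rows, so it can return more than the requested number of players per group, whereas `head()` always cuts at exactly three. The `n` column in the output above confirms that each team has exactly 11 rows here.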
As we can see, Barcelona's 11 most-used players averaged 2474 minutes each, about 292.5 minutes more than the average for Real Madrid's top 11 (2182 minutes).
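The gap quoted above can be re-derived from the summary table alone. The sketch below re-enters the two printed totals by hand, so it runs without the spreadsheet; `summary_tbl` and `gap` are names chosen for this example only.

```r
# Totals and player counts copied from the summarise() output above
summary_tbl <- data.frame(
  Team  = c("Barcelona", "Real Madrid"),
  total = c(27216, 23998),
  n     = c(11L, 11L),
  stringsAsFactors = FALSE
)

# Mean minutes per player, as in mutate(media = total / n)
summary_tbl$media <- summary_tbl$total / summary_tbl$n

# Barcelona's average minus Real Madrid's average
gap <- summary_tbl$media[summary_tbl$Team == "Barcelona"] -
  summary_tbl$media[summary_tbl$Team == "Real Madrid"]

print(summary_tbl)
print(round(gap, 1))
```

Note that this only reproduces the arithmetic; it does not say anything about whether the difference is statistically significant.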