Objective

Look at how minutes were distributed among Barcelona and Real Madrid players in La Liga 2016/2017. We will check which team relies more on its 11 most-used players and whether the difference between the two is significant.

Load the data

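library(readxl)

# Read the spreadsheet of player statistics
Real_Madrid_and_Barcelona_Full_Stats <- read_excel("~/R/Real Madrid and Barcelona Full Stats.xlsx")
View(Real_Madrid_and_Barcelona_Full_Stats)  # opens a spreadsheet-style viewer

# Shorter name for the rest of the analysis
football <- Real_Madrid_and_Barcelona_Full_Stats
football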
## # A tibble: 59 x 13
##             Player        Team Minutes Appearances Lineups `Substitute in`
##              <chr>       <chr>   <dbl>       <dbl>   <dbl>           <dbl>
##  1        K. Navas Real Madrid    2430          27      27               0
##  2    Kiko Casilla Real Madrid     990          11      11               0
##  3     Rubén Yáñez Real Madrid       0           0       0               0
##  4 Daniel Carvajal Real Madrid    2014          23      21               2
##  5            Pepe Real Madrid    1081          13      13               0
##  6    Sergio Ramos Real Madrid    2489          28      28               0
##  7       R. Varane Real Madrid    1928          23      23               0
##  8           Nacho Real Madrid    2301          28      24               4
##  9         Marcelo Real Madrid    2277          30      26               4
## 10  Fábio Coentrão Real Madrid     169           3       2               1
## # ... with 49 more rows, and 7 more variables: `Substitute out` <dbl>,
## #   `Substitutes on bench` <dbl>, Goal <dbl>, Assist <dbl>, `Yellow
## #   card` <dbl>, `Yellow 2nd/RC` <dbl>, `Red card` <dbl>

Plot

Let’s produce a scatter plot of minutes against appearances to see how playing time is distributed among all players.

library(plotly)
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.4.1
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
plot_ly(football, x = ~Minutes, y = ~Appearances,
        type = "scatter", mode = "markers", color = ~Team,
        # hover text for each point; <br> starts a new line
        text = ~paste('Player: ', Player, '<br> Team: ', Team,
                      '<br> Minutes: ', Minutes, '<br> Appearances: ', Appearances))
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels

Barcelona have 5 players in the top-right corner of the plot, where the players with the most minutes and appearances sit.

Next step: look at how minutes are distributed among each team’s 11 most-used players. We’ll use the dplyr package.

dplyr

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
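football %>%
  filter(Minutes > 0) %>%        # drop players who never played
  group_by(Team) %>%
  top_n(11, Minutes) %>%         # 11 players with the most minutes per team
  summarise(total = sum(Minutes), n = n()) %>%
  mutate(media = total / n)      # media: mean minutes per player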
## # A tibble: 2 x 4
##          Team total     n    media
##         <chr> <dbl> <int>    <dbl>
## 1   Barcelona 27216    11 2474.182
## 2 Real Madrid 23998    11 2181.636
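
To make the pipeline above easier to follow, here is a minimal sketch that runs the same steps on just the ten Real Madrid rows shown in the tibble printout, taking the top 3 players instead of the top 11. It assumes dplyr 1.0.0 or later and swaps in slice_max(), the newer replacement for top_n(). The name sample_players and the choice of 3 are illustrative and not part of the original analysis.

```r
library(dplyr)

# Illustrative subset: the ten rows visible in the tibble printout
sample_players <- data.frame(
  Player  = c("K. Navas", "Kiko Casilla", "Rubén Yáñez", "Daniel Carvajal", "Pepe",
              "Sergio Ramos", "R. Varane", "Nacho", "Marcelo", "Fábio Coentrão"),
  Team    = "Real Madrid",
  Minutes = c(2430, 990, 0, 2014, 1081, 2489, 1928, 2301, 2277, 169),
  stringsAsFactors = FALSE
)

sample_summary <- sample_players %>%
  filter(Minutes > 0) %>%          # drop players who never played
  group_by(Team) %>%
  slice_max(Minutes, n = 3) %>%    # assumption: dplyr >= 1.0.0; keeps ties like top_n()
  summarise(total = sum(Minutes), n = n()) %>%
  mutate(media = total / n)        # mean minutes per selected player

# Top 3 here are Sergio Ramos, K. Navas and Nacho: total = 7220, n = 3
sample_summary
```

Changing n = 3 to n = 11 and passing the full football table instead of sample_players should give the same kind of two-row summary as the one printed above.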

As we can see, Barcelona’s 11 most-used players averaged 2,474 minutes each, about 292.5 minutes more than the average for Real Madrid’s top 11 (2,182 minutes).
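
The averages and the gap quoted above can be re-derived from the totals in the summary table. A small base R sketch, with the figures copied from that table (the names top11 and gap are illustrative):

```r
# Totals and player counts copied from the summary table
top11 <- data.frame(
  Team  = c("Barcelona", "Real Madrid"),
  total = c(27216, 23998),
  n     = c(11, 11),
  stringsAsFactors = FALSE
)

# Mean minutes per player, same definition as the media column
top11$media <- top11$total / top11$n

# Difference between the two team averages
gap <- top11$media[top11$Team == "Barcelona"] -
  top11$media[top11$Team == "Real Madrid"]

round(top11$media, 3)  # 2474.182 2181.636
round(gap, 1)          # 292.5
```

This only reproduces the arithmetic behind the comparison; it does not test whether the gap is statistically significant.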