(1) Introduction

The dataset covers the seasons from 2015-16 through 2018-19, but it does not include goalkeepers or players with fewer than 10 appearances in a season. Using this information, I’ve created a separate dataframe for each season. Since I follow the Premier League and am a huge soccer fan, I will frame expectations and questions using my own knowledge of the game. They are the following:

1. I’d expect the league to be made up mostly of players who do not originate from the UK.
2. I’d expect the ages of players in the league to be normally distributed around an average of 25.
3. Are the market values of players right-skewed?
4. Forwards (FW) are generally the most expensive players in the league.
5. Are important metrics such as goals, assists, and tackles normally distributed?
6. Who were the most important players in each season?

##  [1] "Player"                "Season"               
##  [3] "Born"                  "Age"                  
##  [5] "Squad"                 "Nation"               
##  [7] "Previous.Market.Value" "Market.Value"         
##  [9] "Position"              "App"                  
## [11] "Minutes"               "Goals"                
## [13] "Passes"                "Assists"              
## [15] "Yellow"                "Red"                  
## [17] "SubOn"                 "SubOff"               
## [19] "Shots"                 "SOT"                  
## [21] "HitPost"               "HeadClear"            
## [23] "HeadGoal"              "PKScored"             
## [25] "FKGoal"                "Offsides"             
## [27] "ThrBall"               "Misses"               
## [29] "Corners"               "Crosses"              
## [31] "Blocks"                "Interceptions"        
## [33] "Fouls"                 "Last.man"             
## [35] "Tackles"               "ELG"                  
## [37] "OwnGoal"               "Clears"               
## [39] "ABW"                   "ABL"                  
## [41] "ggratio"
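A minimal sketch of how the per-season dataframes could be built, assuming the full dataset has a `Season` column holding values like "2015-2016" (the format used in the median-age table). The toy rows and the `season16`/`season19` names are hypothetical; the naming assumes `seasonYY` means the season ending in 20YY, which matches how `season18` and `season19` are used for 2017-18 and 2018-19 later on.

```r
# Hypothetical toy rows standing in for the full dataset;
# only the Season column matters for the split.
toy_data <- data.frame(
  Player = c("Player A", "Player B", "Player C", "Player D"),
  Season = c("2015-2016", "2016-2017", "2015-2016", "2018-2019"),
  Goals  = c(5, 12, 0, 7),
  stringsAsFactors = FALSE
)

# split() returns a named list with one data frame per distinct Season value
by_season <- split(toy_data, toy_data$Season)
names(by_season)

# Assumed naming scheme: seasonYY refers to the season ending in 20YY
season16 <- by_season[["2015-2016"]]
season19 <- by_season[["2018-2019"]]
nrow(season16)
```

Keeping the seasons in one named list also makes it easy to apply the same summary to every season with `lapply()`.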

(2) Exploring the data

Demographics


The Premier League is the most popular league in football, attracting players from all over the world. Even though the league is based in the UK, I would expect most of the players to come from abroad, because some of the biggest teams sign foreign talent and global superstars. Visually, the breakdown is on average roughly 70% foreign players and 30% domestic players.
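A minimal sketch of the foreign/domestic breakdown per season, assuming the `Nation` column stores country codes such as "ENG". The real coding is not shown in this report, so the `uk_nations` vector is an assumption to adjust, and the rows are hypothetical toy data.

```r
# Hypothetical toy data; real Nation values may be coded differently
players <- data.frame(
  Season = c(rep("2015-2016", 4), rep("2016-2017", 4)),
  Nation = c("ENG", "FRA", "ESP", "WAL", "ENG", "BRA", "ARG", "BEL"),
  stringsAsFactors = FALSE
)

# Assumed codes for the four UK home nations
uk_nations <- c("ENG", "SCO", "WAL", "NIR")

players$Foreign <- !(players$Nation %in% uk_nations)

# The mean of a logical column is the proportion of TRUE values per season
foreign_share <- aggregate(Foreign ~ Season, data = players, FUN = mean)
foreign_share
```

In this toy data the foreign share is 0.50 in 2015-16 and 0.75 in 2016-17; on the real data the same call gives the per-season figure behind the roughly 70/30 split.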


The median player age sits around 26 in every season, ranging from 25.9 to 26.5. Professional players usually hit their prime around 26 and retire by 35. It is rare to see players over 35 starting consistently every week, and rarer still for a teenager new to the league. This pattern is consistent across the 4 seasons, which suggests that teams keep bringing in new talent to reinforce their squads.

##      Season Median Age
## 1 2015-2016   26.13564
## 2 2016-2017   26.31421
## 3 2017-2018   26.49861
## 4 2018-2019   25.93460
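The median-age table above can be produced with a single grouped summary. This is a sketch assuming `Season` and `Age` columns as listed at the top; the ages below are hypothetical toy values.

```r
# Hypothetical toy ages; the real data has one row per player-season
ages <- data.frame(
  Season = c(rep("2015-2016", 3), rep("2016-2017", 3)),
  Age    = c(22.5, 26.1, 30.0, 24.0, 27.3, 29.9),
  stringsAsFactors = FALSE
)

# Median Age within each Season
median_age <- aggregate(Age ~ Season, data = ages, FUN = median)
names(median_age)[2] <- "Median Age"
median_age
```

The median is used rather than the mean so that a few veterans or teenagers do not pull the summary away from the typical squad age.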

Market Values


As expected, the market values are right-skewed: most players are worth no more than 220k (euros). This could be because roughly two-thirds of the teams (19 of 28) have an average value below 10 million euros, which corresponds to 100 in the units of the table below.

##                        Team Avg_Team_Value
## 1           AFC Bournemouth       48.71284
## 3               Aston Villa       54.75000
## 26     West Bromwich Albion       54.93617
## 5                   Burnley       55.51000
## 22               Sunderland       56.18182
## 6              Cardiff City       56.47368
## 4  Brighton and Hove Albion       59.75143
## 11        Huddersfield Town       59.86842
## 25                  Watford       68.69178
## 21               Stoke City       69.21071
## 23             Swansea City       71.88898
## 19             Norwich City       72.97500
## 18         Newcastle United       79.54182
## 12                Hull City       82.70000
## 8            Crystal Palace       93.41901
## 17            Middlesbrough       94.08333
## 27          West Ham United       94.87179
## 10                   Fulham       96.05000
## 28                   Wolves       98.91250
## 13           Leicester City      102.88406
## 20              Southampton      106.29795
## 9                   Everton      119.92123
## 2                   Arsenal      215.76351
## 14                Liverpool      223.58219
## 16        Manchester United      223.76299
## 24                Tottenham      235.36957
## 7                   Chelsea      266.63571
## 15          Manchester City      289.52078
## [1] "Values in 100,000s (Euros)"
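A sketch of how a table like the one above could be computed, assuming `Avg_Team_Value` is the mean `Market.Value` of a squad's players, rescaled to units of 100,000 euros. That reading of the column, and the toy values, are assumptions.

```r
# Hypothetical toy market values in euros
values <- data.frame(
  Squad        = c("Team A", "Team A", "Team B", "Team B", "Team C"),
  Market.Value = c(4e6, 6e6, 20e6, 30e6, 9e6),
  stringsAsFactors = FALSE
)

# Mean player value per squad, expressed in units of 100,000 euros
team_avg <- aggregate(Market.Value ~ Squad, data = values, FUN = mean)
names(team_avg) <- c("Team", "Avg_Team_Value")
team_avg$Avg_Team_Value <- team_avg$Avg_Team_Value / 1e5

# Sort ascending; the original row indices are kept as row names
team_avg <- team_avg[order(team_avg$Avg_Team_Value), ]
team_avg

# Number of teams averaging below 10 million euros (100 in these units)
below_10m <- sum(team_avg$Avg_Team_Value < 100)
below_10m
```

Sorting with `order()` keeps the pre-sort row names, which is why the table above shows non-consecutive indices on the left.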


Looking at average market value by position, the averages for defenders (DF) and forwards (FW) increased steadily over the four seasons, while the average value of midfielders (MF) dropped sharply in 2018-19 for reasons that are not clear from this data. These four seasons therefore do not show that forwards are always clearly the most expensive position; rather, midfielders or forwards are generally more expensive on average than the other two positions.
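To check question 4 directly, one can average `Market.Value` by season and position and keep the top position in each season. This is a minimal sketch on hypothetical toy values (arbitrary units), not the report's actual figures.

```r
# Hypothetical toy values in arbitrary units
vals <- data.frame(
  Season       = rep(c("2017-2018", "2018-2019"), each = 4),
  Position     = rep(c("DF", "MF", "FW", "FW"), times = 2),
  Market.Value = c(10, 30, 20, 26, 12, 15, 40, 30),
  stringsAsFactors = FALSE
)

# Average value for every Season x Position combination
pos_avg <- aggregate(Market.Value ~ Season + Position, data = vals, FUN = mean)

# Within each season, keep the row for the position with the highest average
top_pos <- do.call(rbind, lapply(split(pos_avg, pos_avg$Season),
                                 function(d) d[which.max(d$Market.Value), ]))
top_pos
```

Here midfielders top 2017-18 and forwards top 2018-19, the kind of season-to-season switch that keeps forwards from being the clear-cut most expensive position.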

Analyzing Player Stats

Players in each position are judged on different metrics, such as tackles, passes, and goals. We will look at the metrics specific to each position and test my hypotheses.

Forwards

Forwards are typically judged on the number of goals they score. The distribution of goals scored is heavily right-skewed. Judging forwards solely on goal totals may be unfair, since some forwards play more games than others. A popular metric for a forward’s efficiency in front of goal is the goal-to-game ratio. The histograms of this ratio do not clearly follow any specific distribution. The highest ratio in a season usually falls below 80%, but in the 2016-17 and 2017-18 seasons some players reached 80% or more. Only three players hit this mark, and Harry Kane and Sergio Agüero each did so twice.

# Print each player-season whose goal-to-game ratio (ggratio) is at least 80
for (i in seq_len(nrow(data))) {
  if (data$ggratio[i] >= 80) print(data$Player[i])
}
## [1] Harry Kane
## 711 Levels: Aaron Cresswell Aaron Lennon Aaron Mooy ... Zlatan Ibrahimovic
## [1] Harry Kane
## 711 Levels: Aaron Cresswell Aaron Lennon Aaron Mooy ... Zlatan Ibrahimovic
## [1] Mohamed Salah
## 711 Levels: Aaron Cresswell Aaron Lennon Aaron Mooy ... Zlatan Ibrahimovic
## [1] Sergio Agüero
## 711 Levels: Aaron Cresswell Aaron Lennon Aaron Mooy ... Zlatan Ibrahimovic
## [1] Sergio Agüero
## 711 Levels: Aaron Cresswell Aaron Lennon Aaron Mooy ... Zlatan Ibrahimovic
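The loop above prints only names, so it does not show which season each hit came from. A vectorized sketch keeps `Season` alongside `Player`. It assumes `ggratio` is goals per appearance expressed as a percentage (100 * Goals / App), which is a guess at how the column was built, and the rows are hypothetical.

```r
# Hypothetical toy forwards
fw <- data.frame(
  Player = c("Player A", "Player B", "Player A", "Player C"),
  Season = c("2016-2017", "2016-2017", "2017-2018", "2017-2018"),
  Goals  = c(24, 8, 25, 3),
  App    = c(28, 30, 31, 15),
  stringsAsFactors = FALSE
)

# Assumed definition: goals per appearance, as a percentage
fw$ggratio <- 100 * fw$Goals / fw$App

# which() skips NA comparisons, whereas if() inside a loop would error on them
high <- fw[which(fw$ggratio >= 80), c("Player", "Season", "ggratio")]
high

# How many seasons each player reached the mark
table(high$Player)
```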

Midfielders

For midfielders, assists carry the same weight that goals do for forwards. Using the 2015-16 season as a preliminary check, the assist data is extremely right-skewed. It may therefore make more sense to judge midfielders by the number of passes they complete. I would still expect that distribution to be right-skewed, but less so than assists.
Passes were distributed somewhat more evenly in 2015-16 and 2016-17 than in 2017-18 and 2018-19, when some midfielders completed more than 3,000 passes. Extracting the players above 3,000 passes also turns up a couple of defenders, so it appears that defensive players can post passing figures similar to midfielders’.

# Row indices of 2017-18 players with more than 3,000 passes
over_3000 <- which(season18$Passes > 3000)
season18$Player[over_3000]
## [1] Granit Xhaka     Nicolás Otamendi
## 711 Levels: Aaron Cresswell Aaron Lennon Aaron Mooy ... Zlatan Ibrahimovic
# Row indices of 2018-19 players with more than 3,000 passes
over_3000 <- which(season19$Passes > 3000)
season19$Player[over_3000]
## [1] Jorginho        Virgil van Dijk
## 711 Levels: Aaron Cresswell Aaron Lennon Aaron Mooy ... Zlatan Ibrahimovic
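Returning `Position` alongside the names makes it visible at a glance that defenders appear in the 3,000-pass group. A minimal sketch on hypothetical toy rows that pools both seasons into one frame:

```r
# Hypothetical toy passing data for two seasons
passes <- data.frame(
  Player   = c("Player A", "Player B", "Player C", "Player D"),
  Season   = c("2017-2018", "2017-2018", "2018-2019", "2018-2019"),
  Position = c("MF", "DF", "MF", "FW"),
  Passes   = c(3100, 3050, 2500, 900),
  stringsAsFactors = FALSE
)

# Players above 3,000 passes, with their season and listed position
heavy <- passes[which(passes$Passes > 3000), c("Player", "Season", "Position")]
heavy

# Count of high-volume passers by position
table(heavy$Position)
```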

Defenders

Defenders are typically judged on the total number of interceptions, tackles, and clearances they make. I will analyze tackles and interceptions. Because tackles and interceptions occur more frequently than goals or assists, I would expect their distributions to resemble a uniform distribution.

## [1] Wilfred Ndidi
## 711 Levels: Aaron Cresswell Aaron Lennon Aaron Mooy ... Zlatan Ibrahimovic
## [1] Aaron Wan-Bissaka Idrissa Gueye     Wilfred Ndidi    
## 711 Levels: Aaron Cresswell Aaron Lennon Aaron Mooy ... Zlatan Ibrahimovic

## [1] Idrissa Gueye     Laurent Koscielny N'Golo Kanté     
## 711 Levels: Aaron Cresswell Aaron Lennon Aaron Mooy ... Zlatan Ibrahimovic
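The lists above come from threshold filters whose exact code and cutoffs are not shown. The sketch below uses a small hypothetical helper, `above()`, applied to every season in a named list. The cutoffs of 100 tackles and 120 interceptions and the toy rows are assumptions.

```r
# Hypothetical per-season toy data frames
seasons <- list(
  "2017-2018" = data.frame(Player = c("Player A", "Player B"),
                           Tackles = c(140, 60), Interceptions = c(90, 125),
                           stringsAsFactors = FALSE),
  "2018-2019" = data.frame(Player = c("Player A", "Player C"),
                           Tackles = c(130, 110), Interceptions = c(70, 50),
                           stringsAsFactors = FALSE)
)

# Players whose value in the column `stat` exceeds `threshold`
above <- function(df, stat, threshold) {
  df$Player[which(df[[stat]] > threshold)]
}

# Extra arguments to lapply() are passed on to above()
tackle_leaders       <- lapply(seasons, above, stat = "Tackles", threshold = 100)
interception_leaders <- lapply(seasons, above, stat = "Interceptions", threshold = 120)
tackle_leaders
interception_leaders
```

A season with no qualifying players returns `character(0)` rather than an error.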


The tackle data is more strongly right-skewed than the interception data. Wilfred Ndidi stood out in both 2017-18 and 2018-19, racking up a large number of tackles. Interceptions appear more evenly distributed, but most players still struggle to make more than 80 interceptions in a season. Idrissa Gueye (DF) was the only player to complete more than 120 interceptions during all 4 seasons.

The League’s Most Important Players

Now that we’ve analyzed the different metrics used to judge players, let’s extract the most important players from the last 4 seasons.

##            Most Goals Top Scorer    Most Assists Top Assister    MVP Value (EUR) MVP
## season1516         25 Harry Kane              19 Mesut Özil             88000000 N'Golo Kanté
## season1617         29 Harry Kane              18 Kevin De Bruyne       165000000 Kevin De Bruyne
## season1718         32 Mohamed Salah           16 Kevin De Bruyne       143000000 Kevin De Bruyne
## season1819         22 Mohamed Salah           15 Eden Hazard           165000000 Mohamed Salah

Harry Kane and Mohamed Salah have traded places at the top of the scoring charts: Kane led in 2015-16 and 2016-17, and Salah in 2017-18 and 2018-19. Despite this, neither has won the league. Mesut Özil had an impressive 2015-16 with 19 assists. As for the league’s MVP (the player with the highest market value), N’Golo Kanté held that spot after winning the league in 2015-16, but his value was overtaken by Kevin De Bruyne and Mohamed Salah in the following seasons.
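The per-season leaders table can be sketched with one function applied to each season's rows. As in the table, "MVP" here means the player with the highest `Market.Value`. The column names follow the list at the top of the report, and the rows are hypothetical toy data.

```r
# Hypothetical toy player-season rows
player_stats <- data.frame(
  Player       = rep(c("Player A", "Player B", "Player C"), times = 2),
  Season       = rep(c("2015-2016", "2016-2017"), each = 3),
  Goals        = c(21, 3, 10, 18, 4, 20),
  Assists      = c(5, 14, 7, 6, 12, 2),
  Market.Value = c(6e7, 4e7, 9e7, 7e7, 1.2e8, 5e7),
  stringsAsFactors = FALSE
)

# One summary row per season: leader and value for each metric
season_leaders <- function(d) {
  data.frame(
    Most.Goals   = max(d$Goals),        Scorer     = d$Player[which.max(d$Goals)],
    Most.Assists = max(d$Assists),      Assister   = d$Player[which.max(d$Assists)],
    MVP          = max(d$Market.Value), MVP.Player = d$Player[which.max(d$Market.Value)],
    stringsAsFactors = FALSE
  )
}

leaders <- do.call(rbind, lapply(split(player_stats, player_stats$Season), season_leaders))
leaders
```

`which.max()` returns the first maximum, so ties (such as a shared Golden Boot) would show only one player; listing all tied leaders would need `d$Player[d$Goals == max(d$Goals)]` instead.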