EMAIL: chawlarahul1997@gmail.com

COLLEGE / COMPANY: Maharaja Agrasen Institute Of Technology

1.Introduction

The Premier League is the most-watched sports league in the world, broadcast in 212 territories to 643 million homes and a potential TV audience of 4.7 billion people. In the 2014-15 season, the average Premier League match attendance exceeded 36,000, the second highest of any professional football league behind the Bundesliga’s 43,500. Most stadium occupancies are near capacity. The Premier League ranks third in the UEFA coefficients of leagues based on performances in European competitions over the past five seasons.

There are 20 clubs in the Premier League. During the course of a season (from August to May) each club plays the others twice (a double round-robin system), once at their home stadium and once at that of their opponents, for a total of 38 games. Teams receive three points for a win and one point for a draw. No points are awarded for a loss. Teams are ranked by total points, then goal difference, and then goals scored. If still equal, teams are deemed to occupy the same position. If there is a tie for the championship, for relegation, or for qualification to other competitions, a play-off match at a neutral venue decides rank. The three lowest-placed teams are relegated to the EFL Championship, and the top two teams from the Championship, together with the winner of play-offs involving the third- to sixth-placed Championship clubs, are promoted in their place.
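
The fixture count and the ranking rule described above can be sketched in R. This is only an illustration: the three clubs and their records below are made up, and the play-off rule for exact ties is not modelled.

```r
# Each of the 20 clubs plays the other 19 twice (home and away).
n_clubs <- 20
games_per_club <- 2 * (n_clubs - 1)        # games played by one club
total_matches  <- n_clubs * (n_clubs - 1)  # fixtures in a whole season

# Hypothetical end-of-season records (illustrative values only).
standings <- data.frame(
  club          = c("Club A", "Club B", "Club C"),
  wins          = c(20, 19, 20),
  draws         = c(10, 13, 10),
  goals_for     = c(70, 65, 68),
  goals_against = c(40, 30, 40)
)

# Three points for a win, one for a draw, none for a loss.
standings$points    <- 3 * standings$wins + standings$draws
standings$goal_diff <- standings$goals_for - standings$goals_against

# Rank by points, then goal difference, then goals scored (all descending).
ranked <- standings[order(-standings$points,
                          -standings$goal_diff,
                          -standings$goals_for), ]
ranked
```

All three hypothetical clubs finish level on points, so the order is decided by goal difference.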

2.Overview of Study

Our field of study concerns the market value of the players of the English Premier League (EPL). The competition was formed as the FA Premier League on 20 February 1992 following the decision of clubs in the Football League First Division to break away from the Football League, founded in 1888, and take advantage of a lucrative television rights deal. The deal was worth £1 billion a year domestically as of 2013-14, with BSkyB and BT Group securing the domestic rights to broadcast 116 and 38 games respectively. The league generates €2.2 billion per year in domestic and international television rights. In 2014-15, teams were apportioned revenues of £1,600 million, rising sharply to £2,400 million in 2016-17.

The Premier League is the most-watched football league in the world, broadcast in 212 territories to 643 million homes and a potential TV audience of 4.7 billion people. The Premier League’s production arm, Premier League Productions, is operated by IMG Productions and produces all content for its international television partners.

The Premier League is particularly popular in Asia, where it is the most widely distributed sports programme. In Australia, Optus telecommunications holds exclusive rights to the Premier League, providing live broadcasts and online access (Fox Sports formerly held rights). In India, the matches are broadcast live on STAR Sports. In China, the broadcast rights were awarded to Super Sports in a six-year agreement that began in the 2013-14 season. As of the 2013-14 season, Canadian broadcast rights to the Premier League are jointly owned by Sportsnet and TSN, with both rival networks holding rights to 190 matches per season.

3.The English Premier League whose field is limited to clubs from England

3.1 Overview

The specific objective of this study was to investigate how market values vary across players and which factors affect them. Our goal was to compare the market values of the players. The rationale behind this is summarized next. The Premier League is the top level of the English football league system. Contested by twenty clubs, it operates on a system of promotion and relegation with the English Football League (EFL).

The Premier League is a corporation in which the member clubs act as shareholders. Seasons run from August to May with each team playing 38 matches (playing each other home and away). Most games are played on Saturday and Sunday afternoons. It is known outside the UK as the English Premier League (EPL).

3.2 Data

For the study we took the data from Kaggle (https://www.kaggle.com/mauryashubham/english-premier-league-players-dataset). The dataset has the following variables:

–name: Name of the player

–club: Club of the player

–age : Age of the player

–position : The usual position on the pitch

–position_cat :

1 for attackers

2 for midfielders

3 for defenders

4 for goalkeepers

–market_value : As on transfermarkt.com on July 20th, 2017

–page_views : Average daily Wikipedia page views from September 1, 2016 to May 1, 2017

–fpl_value : Value in Fantasy Premier League as on July 20th, 2017

–fpl_sel : % of FPL players who have selected that player in their team

–fpl_points : FPL points accumulated over the previous season

–region:

1 for England

2 for EU

3 for Americas

4 for Rest of World

–nationality

–new_foreign : Whether a new signing from a different league, for 2017/18 (till 20th July)

–age_cat

–club_id

–big_club: Whether one of the Top 6 clubs

–new_signing: Whether a new signing for 2017/18 (till 20th July)
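
The integer codes listed above (position_cat and region) are easier to read in tables and plots once they carry labels. A minimal sketch, assuming only the code-to-label mapping given in the list; the ten values are the first ten rows shown in the str(epl) output in the Appendix, and the names sample_rows, position_lab and region_lab are introduced here for illustration.

```r
# First ten rows of the dataset, copied from the str(epl) output.
sample_rows <- data.frame(
  position_cat = c(1, 1, 4, 1, 3, 3, 1, 3, 3, 1),
  region       = c(3, 2, 2, 1, 2, 2, 2, 2, 2, 4)
)

# Attach readable labels to the integer codes listed above.
sample_rows$position_lab <- factor(
  sample_rows$position_cat, levels = 1:4,
  labels = c("Attacker", "Midfielder", "Defender", "Goalkeeper")
)
sample_rows$region_lab <- factor(
  sample_rows$region, levels = 1:4,
  labels = c("England", "EU", "Americas", "Rest of World")
)

# Counts per labelled category (categories with no rows show as 0).
table(sample_rows$position_lab)
table(sample_rows$region_lab)
```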

3.3 Model

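To test our hypothesis, we proposed the following model:

Market_value = a0 + a1*Age + a2*Page_views + a3*New_foreign + a4*Big_club + a5*New_signing + E

where a0 is the intercept, a1 to a5 are the regression coefficients and E is the error term.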

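# Read the dataset (strings as factors, matching the str() output in the Appendix)
epl <- read.csv("epldata_final.csv", stringsAsFactors = TRUE)
attach(epl)  # later Appendix code refers to columns by their bare names
# Linear regression of market value on the five predictors
fit1 <- lm(market_value ~ age + page_views + new_foreign + big_club + new_signing, data = epl)
summary(fit1)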
## 
## Call:
## lm(formula = market_value ~ age + page_views + new_foreign + 
##     big_club + new_signing, data = epl)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -46.766  -3.238  -0.530   2.896  37.532 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  6.6328483  2.5026212   2.650 0.008321 ** 
## age         -0.1619876  0.0899644  -1.801 0.072432 .  
## page_views   0.0078490  0.0004401  17.833  < 2e-16 ***
## new_foreign  6.8644448  1.9458643   3.528 0.000462 ***
## big_club     7.3710543  0.8912677   8.270 1.49e-15 ***
## new_signing  1.7176598  1.0067450   1.706 0.088662 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.506 on 455 degrees of freedom
## Multiple R-squared:  0.6291, Adjusted R-squared:  0.625 
## F-statistic: 154.4 on 5 and 455 DF,  p-value: < 2.2e-16

We estimated the effect of Age, Page views, New foreign, Big club and New signing on the market value of a player with the simplest model: a linear regression of market value on these fields.
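
To make the fitted equation concrete, the sketch below plugs one player's values into the coefficients printed in the summary(fit1) output. The helper predict_value is a hypothetical name introduced for illustration; with the fitted model at hand, predict(fit1) would be the usual route. The player is the first row of the dataset as shown by str(epl) in the Appendix.

```r
# Coefficient estimates copied from the summary(fit1) output above.
coefs <- c(
  intercept   =  6.6328483,
  age         = -0.1619876,
  page_views  =  0.0078490,
  new_foreign =  6.8644448,
  big_club    =  7.3710543,
  new_signing =  1.7176598
)

# Hypothetical helper: evaluates the fitted linear equation for one player.
predict_value <- function(age, page_views, new_foreign, big_club, new_signing) {
  unname(coefs["intercept"] +
           coefs["age"] * age +
           coefs["page_views"] * page_views +
           coefs["new_foreign"] * new_foreign +
           coefs["big_club"] * big_club +
           coefs["new_signing"] * new_signing)
}

# First player in the dataset: age 28, 4329 daily page views,
# big-club player, neither a new signing nor a new foreign signing.
fitted_1   <- predict_value(28, 4329, 0, 1, 0)
residual_1 <- 65 - fitted_1  # the observed market_value of this player is 65
c(fitted = fitted_1, residual = residual_1)
```

The gap between the observed and fitted value is this player's residual, which falls inside the residual range reported in the output.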

3.4 Results

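The output shows p < 0.05 for three variables, namely page_views, new_foreign and big_club, so these variables are significantly associated with market_value. The coefficients of age (p = 0.072) and new_signing (p = 0.089) are not significant at the 5% level, so the regression gives no clear evidence that either affects market value once the other variables are accounted for. Among the binary indicators, a4 (big_club) is the largest coefficient at 7.37, so playing for a big club is associated with a clearly higher market value. The coefficient of page_views looks small only because page views are measured on a much larger scale.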
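
The p-values can be cross-checked with 95% confidence intervals built from the estimates and standard errors in the regression output. This is a sketch using the printed numbers only; with the fitted model available, confint(fit1) should give the same intervals.

```r
# Estimates and standard errors copied from the summary(fit1) output.
est <- c(age = -0.1619876, page_views = 0.0078490, new_foreign = 6.8644448,
         big_club = 7.3710543, new_signing = 1.7176598)
se  <- c(age = 0.0899644, page_views = 0.0004401, new_foreign = 1.9458643,
         big_club = 0.8912677, new_signing = 1.0067450)

# Two-sided 95% critical value on the residual degrees of freedom (455).
t_crit <- qt(0.975, df = 455)

# Interval for each coefficient: estimate +/- critical value * standard error.
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
ci

# A coefficient is significant at the 5% level when its interval excludes zero.
excludes_zero <- ci[, "lower"] > 0 | ci[, "upper"] < 0
excludes_zero
```

The intervals for age and new_signing contain zero, which matches their p-values being above 0.05.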

4.Conclusion

This paper was motivated by the need for research that could improve our understanding of what really affects the market price of players in the English Premier League (EPL). The contribution of this paper is that we investigated the variables that affect the value of a player. We gained several insights into why a player is valued high or low. In particular, playing for a big club really does matter and is associated with a much higher market value in the league. So this is how we statistically analysed the Beautiful Game.




Appendix

Reading the data

epl <- read.csv("epldata_final.csv", stringsAsFactors = TRUE)
View(epl)

Dimensions of the data set

dim(epl)
## [1] 461  17

There are 461 rows (players to be analysed) and 17 columns (the variables on which the players are analysed).

Summarising

summary(epl)
##                       name                    club          age      
##  Łukasz Fabiański:  1   Arsenal          : 28   Min.   :17.0  
##  Aaron Cresswell        :  1   Everton          : 28   1st Qu.:24.0  
##  Aaron Lennon           :  1   Huddersfield     : 28   Median :27.0  
##  Aaron Mooy             :  1   Liverpool        : 27   Mean   :26.8  
##  Aaron Ramsey           :  1   Manchester+United: 25   3rd Qu.:30.0  
##  Abdoulaye Doucoure     :  1   Swansea          : 25   Max.   :38.0  
##  (Other)                :455   (Other)          :300                 
##     position    position_cat   market_value     page_views    
##  CB     : 85   Min.   :1.00   Min.   : 0.05   Min.   :   3.0  
##  CM     : 63   1st Qu.:1.00   1st Qu.: 3.00   1st Qu.: 220.0  
##  CF     : 61   Median :2.00   Median : 7.00   Median : 460.0  
##  GK     : 42   Mean   :2.18   Mean   :11.01   Mean   : 763.8  
##  DM     : 36   3rd Qu.:3.00   3rd Qu.:15.00   3rd Qu.: 896.0  
##  LW     : 36   Max.   :4.00   Max.   :75.00   Max.   :7664.0  
##  (Other):138                                                  
##    fpl_value         fpl_sel      fpl_points         region     
##  Min.   : 4.000   0.10%  : 64   Min.   :  0.00   Min.   :1.000  
##  1st Qu.: 4.500   0.20%  : 41   1st Qu.:  5.00   1st Qu.:1.000  
##  Median : 5.000   0.40%  : 30   Median : 51.00   Median :2.000  
##  Mean   : 5.448   0.30%  : 25   Mean   : 57.31   Mean   :1.993  
##  3rd Qu.: 5.500   0.60%  : 15   3rd Qu.: 94.00   3rd Qu.:2.000  
##  Max.   :12.500   0.00%  : 14   Max.   :264.00   Max.   :4.000  
##                   (Other):272                    NA's   :1      
##       nationality   new_foreign         age_cat         club_id     
##  England    :156   Min.   :0.00000   Min.   :1.000   Min.   : 1.00  
##  Spain      : 28   1st Qu.:0.00000   1st Qu.:2.000   1st Qu.: 6.00  
##  France     : 25   Median :0.00000   Median :3.000   Median :10.00  
##  Netherlands: 20   Mean   :0.03471   Mean   :3.206   Mean   :10.33  
##  Belgium    : 18   3rd Qu.:0.00000   3rd Qu.:4.000   3rd Qu.:15.00  
##  Argentina  : 17   Max.   :1.00000   Max.   :6.000   Max.   :20.00  
##  (Other)    :197                                                    
##     big_club       new_signing    
##  Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000  
##  Mean   :0.3037   Mean   :0.1453  
##  3rd Qu.:1.0000   3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :1.0000  
## 

Describing

library(psych)
describe(epl)[,c(1:5)]
##              vars   n   mean     sd median
## name*           1 461 231.00 133.22    231
## club*           2 461  10.33   5.73     10
## age             3 461  26.80   3.96     27
## position*       4 461   5.55   3.32      5
## position_cat    5 461   2.18   1.00      2
## market_value    6 461  11.01  12.26      7
## page_views      7 461 763.78 931.81    460
## fpl_value       8 461   5.45   1.35      5
## fpl_sel*        9 461  27.70  32.14     11
## fpl_points     10 461  57.31  53.11     51
## region         11 460   1.99   0.96      2
## nationality*   12 461  28.38  14.86     23
## new_foreign    13 461   0.03   0.18      0
## age_cat        14 461   3.21   1.28      3
## club_id        15 461  10.33   5.73     10
## big_club       16 461   0.30   0.46      0
## new_signing    17 461   0.15   0.35      0
str(epl)
## 'data.frame':    461 obs. of  17 variables:
##  $ name        : Factor w/ 461 levels "Łukasz Fabiański",..: 23 306 352 422 254 163 340 326 403 20 ...
##  $ club        : Factor w/ 20 levels "Arsenal","Bournemouth",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ age         : int  28 28 35 28 31 22 30 31 25 21 ...
##  $ position    : Factor w/ 13 levels "AM","CB","CF",..: 9 1 6 12 2 10 3 7 2 9 ...
##  $ position_cat: int  1 1 4 1 3 3 1 3 3 1 ...
##  $ market_value: num  65 50 7 20 22 30 22 13 30 10 ...
##  $ page_views  : int  4329 4395 1529 2393 912 1675 2230 555 1877 1812 ...
##  $ fpl_value   : num  12 9.5 5.5 7.5 6 6 8.5 5.5 5.5 5.5 ...
##  $ fpl_sel     : Factor w/ 113 levels "0.00%","0.10%",..: 44 93 94 16 8 37 56 83 77 11 ...
##  $ fpl_points  : int  264 167 134 122 121 119 116 115 90 89 ...
##  $ region      : int  3 2 2 1 2 2 2 2 2 4 ...
##  $ nationality : Factor w/ 61 levels "Algeria","Argentina",..: 13 27 19 23 26 52 26 52 27 41 ...
##  $ new_foreign : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ age_cat     : int  4 4 6 4 4 2 4 4 3 1 ...
##  $ club_id     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ big_club    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ new_signing : int  0 0 0 0 0 0 0 0 1 0 ...

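Apart from name, there are 4 factor variables to be analysed, namely club, position, nationality and fpl_sel. Note that fpl_sel is read as a factor only because its values carry a % sign; it is really a numeric percentage.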
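
Since fpl_sel holds percentages stored as text (for example 0.10%), it needs converting before it can be used numerically. A minimal sketch on a few values taken from the summary(epl) output; fpl_sel_raw and fpl_sel_num are names introduced here, and on the full data the same expression would be applied to epl$fpl_sel.

```r
# A few fpl_sel values as they appear in summary(epl): percentages stored as text.
fpl_sel_raw <- factor(c("0.10%", "0.20%", "0.40%", "0.30%", "0.60%", "0.00%"))

# Convert the factor to character first (as.numeric on a factor would return
# the internal level codes), drop the percent sign, then convert to a number.
fpl_sel_num <- as.numeric(sub("%", "", as.character(fpl_sel_raw), fixed = TRUE))
fpl_sel_num
```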

Let us look at the distribution of these variables

Club distribution

mytable1<-with(epl,table(club))
mytable1
## club
##           Arsenal       Bournemouth Brighton+and+Hove           Burnley 
##                28                24                22                18 
##           Chelsea    Crystal+Palace           Everton      Huddersfield 
##                20                21                28                28 
##    Leicester+City         Liverpool   Manchester+City Manchester+United 
##                24                27                20                25 
##  Newcastle+United       Southampton        Stoke+City           Swansea 
##                21                23                22                25 
##         Tottenham           Watford         West+Brom          West+Ham 
##                20                24                19                22

So the highest number of players comes from Arsenal, Everton and Huddersfield (28 each).

Position distribution

mytable2<-with(epl,table(position))
mytable2
## position
## AM CB CF CM DM GK LB LM LW RB RM RW SS 
## 17 85 61 63 36 42 35  8 36 34  5 32  7

Centre-backs (CB) are the most numerous position in the data, with 85 players.
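
A frequency table is easier to read when it is sorted and shown with shares. The sketch below does this for the position counts printed above; the counts are typed in by hand here, whereas on the full data the same calls would be applied to table(epl$position).

```r
# Position counts copied from the table(position) output above.
position_counts <- c(AM = 17, CB = 85, CF = 61, CM = 63, DM = 36, GK = 42,
                     LB = 35, LM = 8, LW = 36, RB = 34, RM = 5, RW = 32, SS = 7)

# Largest groups first, with each position's share of all players in percent.
sorted_counts <- sort(position_counts, decreasing = TRUE)
share_pct <- round(100 * sorted_counts / sum(position_counts), 1)

# Three most common positions.
head(cbind(count = sorted_counts, share_pct), 3)
```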

Nationality distribution

mytable3<-with(epl,table(nationality))
mytable3
## nationality
##             Algeria           Argentina             Armenia 
##                   3                  17                   1 
##           Australia             Austria             Belgium 
##                   4                   4                  18 
##               Benin             Bermuda              Bosnia 
##                   1                   1                   2 
##              Brazil            Cameroon              Canada 
##                  12                   3                   1 
##               Chile            Colombia            Congo DR 
##                   2                   1                   4 
##       Cote d'Ivoire             Croatia             Curacao 
##                   4                   1                   1 
##      Czech Republic             Denmark             Ecuador 
##                   2                   6                   2 
##               Egypt             England             Estonia 
##                   4                 156                   1 
##             Finland              France             Germany 
##                   1                  25                  16 
##               Ghana              Greece             Iceland 
##                   5                   1                   2 
##             Ireland              Israel               Italy 
##                  17                   2                   4 
##             Jamaica               Japan               Kenya 
##                   2                   2                   1 
##                Mali             Morocco         Netherlands 
##                   2                   2                  20 
##         New Zealand             Nigeria    Northern Ireland 
##                   1                   6                   6 
##              Norway              Poland            Portugal 
##                   1                   3                   6 
##             Romania            Scotland             Senegal 
##                   1                  14                   7 
##              Serbia            Slovenia         South Korea 
##                   5                   1                   3 
##               Spain              Sweden         Switzerland 
##                  28                   3                   4 
##          The Gambia Trinidad and Tobago             Tunisia 
##                   1                   1                   1 
##       United States             Uruguay           Venezuela 
##                   2                   1                   1 
##               Wales 
##                  12

As expected, England has the highest number of players (156) in the EPL.

Now let us look at how market value varies with these variables.

1.According to clubs

attach(epl)
## The following objects are masked from epl (pos = 4):
## 
##     age, age_cat, big_club, club, club_id, fpl_points, fpl_sel,
##     fpl_value, market_value, name, nationality, new_foreign,
##     new_signing, page_views, position, position_cat, region
mytable5<-aggregate(market_value,list(club),mean)
mytable5
##              Group.1         x
## 1            Arsenal 19.642857
## 2        Bournemouth  4.895833
## 3  Brighton+and+Hove  2.522727
## 4            Burnley  3.958333
## 5            Chelsea 27.677500
## 6     Crystal+Palace  7.726190
## 7            Everton 10.098214
## 8       Huddersfield  1.791071
## 9     Leicester+City  8.645833
## 10         Liverpool 16.314815
## 11   Manchester+City 28.200000
## 12 Manchester+United 20.564000
## 13  Newcastle+United  5.202381
## 14       Southampton 10.000000
## 15        Stoke+City  6.818182
## 16           Swansea  5.560000
## 17         Tottenham 23.000000
## 18           Watford  5.166667
## 19         West+Brom  5.328947
## 20          West+Ham  8.818182

On average, Manchester City has the players with the highest market value (mean 28.2), closely followed by Chelsea (27.7).

2.According to Position of the player

mytable6<-aggregate(market_value,list(position),mean)
mytable6
##    Group.1         x
## 1       AM 26.161765
## 2       CB  8.972353
## 3       CF 13.823770
## 4       CM 10.960317
## 5       DM 12.347222
## 6       GK  7.234524
## 7       LB  8.300000
## 8       LM  4.000000
## 9       LW 13.293056
## 10      RB  7.742647
## 11      RM 10.600000
## 12      RW 12.195312
## 13      SS 11.357143

Attacking midfielders (AM) have the highest mean market value in the EPL (26.2).

3.According to nationality

mytable7<-aggregate(market_value,list(nationality),mean)
mytable7
##                Group.1         x
## 1              Algeria 22.333333
## 2            Argentina 12.779412
## 3              Armenia 35.000000
## 4            Australia  2.875000
## 5              Austria  4.750000
## 6              Belgium 25.805556
## 7                Benin  5.500000
## 8              Bermuda  5.000000
## 9               Bosnia 11.500000
## 10              Brazil 21.750000
## 11            Cameroon 10.333333
## 12              Canada  3.000000
## 13               Chile 36.500000
## 14            Colombia  7.000000
## 15            Congo DR 10.125000
## 16       Cote d'Ivoire 15.750000
## 17             Croatia 17.000000
## 18             Curacao  2.000000
## 19      Czech Republic  4.250000
## 20             Denmark 10.416667
## 21             Ecuador  7.250000
## 22               Egypt 12.750000
## 23             England  8.328526
## 24             Estonia  3.500000
## 25             Finland  0.250000
## 26              France 17.980000
## 27             Germany 13.781250
## 28               Ghana  8.700000
## 29              Greece  2.000000
## 30             Iceland 13.750000
## 31             Ireland  4.911765
## 32              Israel  1.875000
## 33               Italy 10.500000
## 34             Jamaica  2.000000
## 35               Japan  6.000000
## 36               Kenya 25.000000
## 37                Mali  3.750000
## 38             Morocco  9.750000
## 39         Netherlands  9.575000
## 40         New Zealand 10.000000
## 41             Nigeria 13.000000
## 42    Northern Ireland  4.083333
## 43              Norway  8.000000
## 44              Poland  3.500000
## 45            Portugal 13.191667
## 46             Romania  3.000000
## 47            Scotland  4.196429
## 48             Senegal 14.142857
## 49              Serbia 14.000000
## 50            Slovenia  0.500000
## 51         South Korea 12.833333
## 52               Spain 16.803571
## 53              Sweden  9.500000
## 54         Switzerland 14.375000
## 55          The Gambia  3.000000
## 56 Trinidad and Tobago  0.650000
## 57             Tunisia  1.000000
## 58       United States  4.000000
## 59             Uruguay  3.500000
## 60           Venezuela 15.000000
## 61               Wales  7.708333

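Chilean players have the highest mean market value (36.5), but this average is based on only 2 players, and the runner-up, Armenia (35.0), is represented by a single player, so these figures should not be over-interpreted.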
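
Because some groups are very small, it helps to report the group size next to each group mean. A minimal sketch on the first ten rows of the dataset (values copied from the str(epl) output), grouped by position_cat; on the full data the same pattern could be used with nationality or club as the grouping variable. The names sample_rows and by_group are introduced for illustration.

```r
# First ten rows of the dataset, copied from the str(epl) output.
sample_rows <- data.frame(
  position_cat = c(1, 1, 4, 1, 3, 3, 1, 3, 3, 1),
  market_value = c(65, 50, 7, 20, 22, 30, 22, 13, 30, 10)
)

# Group mean and group size, computed separately and then joined.
group_mean <- aggregate(market_value ~ position_cat, data = sample_rows, FUN = mean)
group_n    <- aggregate(market_value ~ position_cat, data = sample_rows, FUN = length)
names(group_mean)[2] <- "mean_value"
names(group_n)[2]    <- "n"
by_group <- merge(group_mean, group_n, by = "position_cat")

# Highest mean first; the n column shows how many players each mean rests on.
by_group[order(-by_group$mean_value), ]
```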

Plots to analyse the data well

Age distribution of players

boxplot(age,horizontal = TRUE,col="light blue",xlab="Age(years)")

The middle half of the players are between 24 and 30 years old, with a median age of 27 (mean 26.8).

Club distribution

plot(club,col="light blue")

Position distribution

plot(position,col="light blue")

Plot for the position

hist(position_cat,col="light blue")

We have about 150 attackers (1) and about 150 defenders (3), a little over 100 midfielders (2) and a little under 50 goalkeepers (4).
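
The approximate counts read off the histogram can be checked against the position table shown earlier. The sketch below assumes a mapping from position to position_cat: the first rows of str(epl) show LW, AM, RW and CF coded 1, CB, RB and LB coded 3 and GK coded 4, while the assignment of SS to attackers and of CM, DM, LM and RM to midfielders is an assumption.

```r
# Position counts copied from the table(position) output.
position_counts <- c(AM = 17, CB = 85, CF = 61, CM = 63, DM = 36, GK = 42,
                     LB = 35, LM = 8, LW = 36, RB = 34, RM = 5, RW = 32, SS = 7)

# Assumed mapping of positions to position_cat codes
# (1 attackers, 2 midfielders, 3 defenders, 4 goalkeepers).
position_to_cat <- c(AM = 1, CF = 1, LW = 1, RW = 1, SS = 1,
                     CM = 2, DM = 2, LM = 2, RM = 2,
                     CB = 3, LB = 3, RB = 3,
                     GK = 4)

# Sum the position counts within each category.
cat_counts <- tapply(position_counts, position_to_cat[names(position_counts)], sum)
cat_counts

# Consistency check: mean of position_cat implied by these counts
# (summary(epl) reports a mean of 2.18).
implied_mean <- sum(as.numeric(names(cat_counts)) * cat_counts) / sum(cat_counts)
implied_mean
```

Under this mapping the implied mean agrees with the reported mean of position_cat, which supports the assumed assignment.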

How many players are from big clubs?

hist(big_club,col="light blue")

So most players are not from a big club: about 30% (140 of 461) play for one of the Top 6 clubs.

Variation of market value

boxplot(market_value,col="light blue",main="Box plot for market value distribution")

How many of them are new signings?

hist(new_signing,col="light blue")

Page views distribution

boxplot(page_views,col="light blue")

Correlation Matrix

Correlation between market value and age

y<-epl[,6]
x<-epl[,3]
cor(x,y)
## [1] -0.1323962

Correlation between market value, Page_views and fpl_value

y<-epl[,6]
x<-epl[,7:8]
cor(x,y)
##                 [,1]
## page_views 0.7396565
## fpl_value  0.7886534

Correlation between market value,new_foreign,age_cat,club_id,big_club and new_signing

x<-epl[,13:17]
cor(x,y)
##                    [,1]
## new_foreign  0.09805600
## age_cat     -0.11768198
## club_id     -0.04606806
## big_club     0.59348296
## new_signing  0.13132060
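
Selecting columns by position (for example epl[,6]) is fragile if the column order ever changes. A minimal sketch of the same calculation with column names, run on the first ten rows of the dataset (copied from the str(epl) output), so the numbers will differ from the full-data correlations above.

```r
# First ten rows of the dataset, copied from the str(epl) output.
sample_rows <- data.frame(
  age          = c(28, 28, 35, 28, 31, 22, 30, 31, 25, 21),
  market_value = c(65, 50, 7, 20, 22, 30, 22, 13, 30, 10),
  page_views   = c(4329, 4395, 1529, 2393, 912, 1675, 2230, 555, 1877, 1812),
  fpl_value    = c(12, 9.5, 5.5, 7.5, 6, 6, 8.5, 5.5, 5.5, 5.5)
)

# Correlation of each named predictor with market value.
predictors <- c("age", "page_views", "fpl_value")
cor_with_value <- cor(sample_rows[, predictors], sample_rows$market_value)
colnames(cor_with_value) <- "market_value"
cor_with_value
```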

Corrgram of the data

library(corrgram)
corrgram(epl, order=TRUE, lower.panel=panel.shade,
         upper.panel=panel.pie, text.panel=panel.txt,
         main="Corrgram of EPL Variable intercorrelations")

A scatterplot matrix to examine the distributions and pairwise relationships

library(car)
## 
## Attaching package: 'car'
## The following object is masked from 'package:psych':
## 
##     logit
scatterplotMatrix(formula=~age+market_value+page_views+fpl_points,diagonal="histogram",cex=0.6)

Testing the hypothesis

Null Hypothesis-1: The mean of Market value equals the mean of Page views.

t.test(market_value,page_views)
## 
##  Welch Two Sample t-test
## 
## data:  market_value and page_views
## t = -17.344, df = 460.16, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -838.0558 -667.4733
## sample estimates:
## mean of x mean of y 
##  11.01204 763.77657

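Since the p-value < 0.05, we reject the null hypothesis: the mean market value (11.01) differs from the mean page views (763.78). The two variables are measured in different units, so this test only compares their averages and does not by itself show that they are related; the correlation of 0.74 in the Correlation Matrix section and the regression in Section 3.3 address that question.

Null Hypothesis-2: The mean of Market Value equals the mean of Age.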

t.test(market_value,age)
## 
##  Welch Two Sample t-test
## 
## data:  market_value and age
## t = -26.323, df = 555.08, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -16.97121 -14.61425
## sample estimates:
## mean of x mean of y 
##  11.01204  26.80477

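Since the p-value < 0.05, we reject the null hypothesis: the mean market value (11.01) differs from the mean age (26.80). This again compares averages only; the relationship itself is weak, with a correlation of -0.13 and a regression coefficient for age that is not significant at the 5% level (p = 0.072).

Null Hypothesis-3: The mean of Market Value equals the mean of Fantasy Premier League value.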

t.test(market_value,fpl_value)
## 
##  Welch Two Sample t-test
## 
## data:  market_value and fpl_value
## t = 9.6882, df = 471.1, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  4.435555 6.692644
## sample estimates:
## mean of x mean of y 
## 11.012039  5.447939

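Since the p-value < 0.05, we reject the null hypothesis: the mean market value (11.01) differs from the mean fpl_value (5.45). The strength of the relationship between the two is shown by their correlation of 0.79 in the Correlation Matrix section.

Null Hypothesis-4: The mean of Market Value equals the mean of New signing.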

t.test(market_value,new_signing)
## 
##  Welch Two Sample t-test
## 
## data:  market_value and new_signing
## t = 19.027, df = 460.76, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   9.744379 11.989027
## sample estimates:
##  mean of x  mean of y 
## 11.0120390  0.1453362

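Since the p-value < 0.05, we reject the null hypothesis: the mean market value (11.01) differs from the mean of the new_signing indicator (0.15). As new_signing is a 0/1 indicator, this does not show that new signings are valued differently; the correlation is only 0.13 and the regression coefficient of new_signing is not significant at the 5% level (p = 0.089).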
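
To test whether two numeric variables are related, rather than whether their means differ, a correlation test is one option. A minimal sketch with cor.test on the first ten rows of the dataset (copied from the str(epl) output); ten rows are used only to keep the example self-contained, so the result is not the full-data result. For a 0/1 variable such as new_signing, a grouped comparison of the form t.test(market_value ~ new_signing, data = epl) would be the analogous check.

```r
# First ten rows of the dataset, copied from the str(epl) output.
sample_rows <- data.frame(
  market_value = c(65, 50, 7, 20, 22, 30, 22, 13, 30, 10),
  page_views   = c(4329, 4395, 1529, 2393, 912, 1675, 2230, 555, 1877, 1812)
)

# H0: the Pearson correlation between market value and page views is zero.
assoc_test <- cor.test(sample_rows$market_value, sample_rows$page_views)

assoc_test$estimate  # sample correlation
assoc_test$p.value   # small values are evidence against H0
```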

Thus, this concludes our report.