---
title: "EPL ANALYTICS"
author: "jackson"
date: "08/07/2019"
output: html_document
---

Let's load the data.

Data<-read.csv("EplData.csv",header=TRUE)
str(Data)
## 'data.frame':    1520 obs. of  16 variables:
##  $ X       : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ HomeTeam: Factor w/ 28 levels "Arsenal","Aston Villa",..: 3 7 9 13 16 19 1 18 21 26 ...
##  $ AwayTeam: Factor w/ 28 levels "Arsenal","Aston Villa",..: 2 23 25 22 24 8 27 20 14 15 ...
##  $ FTHG    : int  0 2 2 4 1 1 0 2 0 0 ...
##  $ FTAG    : int  1 2 2 2 0 3 2 2 1 3 ...
##  $ FTR     : Factor w/ 3 levels "A","D","H": 1 2 2 3 3 1 1 2 1 1 ...
##  $ HTHG    : int  0 2 0 3 1 0 0 1 0 0 ...
##  $ HTAG    : int  0 1 1 0 0 1 1 1 0 2 ...
##  $ HTR     : Factor w/ 3 levels "A","D","H": 2 3 1 3 3 1 1 2 2 1 ...
##  $ HS      : int  11 11 10 19 9 17 22 9 7 9 ...
##  $ AS      : int  7 18 11 10 9 11 8 15 8 19 ...
##  $ HST     : int  2 3 5 8 1 6 6 4 1 2 ...
##  $ AST     : int  3 10 5 5 4 7 4 5 3 7 ...
##  $ PSH     : num  1.95 1.39 1.7 1.99 1.65 2.52 1.31 2.88 3.48 5.75 ...
##  $ PSA     : num  4.27 10.39 5.62 4.34 5.9 ...
##  $ PSD     : num  3.65 4.92 3.95 3.48 4.09 3.35 5.75 3.33 3.46 3.98 ...

What percentage of EPL matches over the last 4 seasons ended in a home win, an away win, or a draw?

table(Data$FTR)
## 
##   A   D   H 
## 461 361 698

Home Win Rate.

Home<-c(698*100/1520)
print(Home)
## [1] 45.92105

Away Win Rate.

Away<-c(461*100/1520)
print(Away)
## [1] 30.32895

Draw Rate.

Draw<-c(361*100/1520)
print(Draw)
## [1] 23.75
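The three rates above are each typed in by hand. As a minimal sketch, they can also be computed in one step with base R's `prop.table()`, which divides each count by the total. The counts below are copied from the `table(Data$FTR)` output; with the data loaded, `100 * prop.table(table(Data$FTR))` should give the same figures directly.

```r
# Full-time result counts copied from table(Data$FTR): A = away win, D = draw, H = home win
counts <- c(A = 461, D = 361, H = 698)

# prop.table() divides each count by the total number of matches (1520)
rates <- 100 * prop.table(counts)

# Percentages of away wins, draws and home wins
round(rates, 2)
```

This avoids hard-coding 1520 and keeps the three rates consistent with each other: they always sum to 100.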

How many goals did home and away teams score?

sum(Data$FTHG)
## [1] 2352
sum(Data$FTAG)
## [1] 1828

EPL home-field advantage over the last 4 seasons, measured as the ratio of home goals to away goals:

HomeAdv<-c(2352/1828)
print(HomeAdv)
## [1] 1.286652
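The same advantage can be expressed per match, which is easier to compare across seasons of different lengths. This sketch uses the goal totals and match count printed above; with the data loaded, `colMeans(Data[, c("FTHG", "FTAG")])` should return the same per-match averages.

```r
# Goal totals copied from sum(Data$FTHG) and sum(Data$FTAG); 1520 matches in Data
home_goals <- 2352
away_goals <- 1828
n_matches  <- 1520

# Ratio of home goals to away goals (the HomeAdv figure above)
home_adv <- home_goals / away_goals

# Average goals scored per match by the home side and by the away side
per_match <- c(home = home_goals, away = away_goals) / n_matches

round(home_adv, 4)
round(per_match, 3)
```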

Now let's determine whether there is anchoring bias in bookmakers' opening prices, using Pinnacle odds for the last 4 EPL seasons.

Opening odds probability

The inverse of decimal odds gives the implied probability of an event. We therefore take the opening odds in their raw form, invert them, and use the results as our predicted probabilities. The outcome with the highest probability is our prediction for the match.

However, because of the bookmaker's margin, known as the over-round, the implied probabilities of the three outcomes do not add up to 1; they sum to slightly more. A theoretical probability for each outcome is therefore calculated by dividing its implied probability by the over-round. Two predictions are made: a raw prediction, straight from the odds, and a theoretical one, adjusted for the over-round.
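Here is the calculation for a single match, sketched with the opening odds of the first match in `Data` (PSH = 1.95, PSA = 4.27, PSD = 3.65, taken from the `str()` output). It also illustrates that dividing by the over-round, a positive constant, does not change which outcome is most likely, so the raw and theoretical predictions pick the same favorite.

```r
# Pinnacle opening odds for the first match in Data (from the str() output)
odds <- c(Home = 1.95, Away = 4.27, Draw = 3.65)

# Raw implied probabilities: inverse of the decimal odds
raw <- 1 / odds

# Over-round: sum of the raw probabilities, slightly above 1
over_round <- sum(raw)

# Theoretical probabilities: raw probabilities rescaled to sum to exactly 1
theo <- raw / over_round

round(raw, 4)
round(over_round, 4)
round(theo, 4)

# Rescaling by a positive constant keeps the ordering, so both predictions agree
names(which.max(raw)) == names(which.max(theo))
```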

Load R packages to manipulate the data

library(tidyverse)
## -- Attaching packages -------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.2.0     v purrr   0.3.2
## v tibble  2.1.3     v dplyr   0.8.3
## v tidyr   0.8.3     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
## -- Conflicts ----------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(caret)
## Loading required package: lattice
## 
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
## 
##     lift
library(dplyr)
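RawProbs <- select(Data, FTR, PSH, PSA, PSD) %>%
  mutate(HomeProbs = 1/PSH, AwayProbs = 1/PSA, DrawProbs = 1/PSD,  # raw implied probabilities
         OverRund = HomeProbs + AwayProbs + DrawProbs,              # over-round, slightly above 1
         HomeTheo = HomeProbs/OverRund,                             # theoretical probabilities,
         AwayTheo = AwayProbs/OverRund,                             # rescaled to sum to 1
         DrawTheo = DrawProbs/OverRund)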
str(RawProbs)
## 'data.frame':    1520 obs. of  11 variables:
##  $ FTR      : Factor w/ 3 levels "A","D","H": 1 2 2 3 3 1 1 2 1 1 ...
##  $ PSH      : num  1.95 1.39 1.7 1.99 1.65 2.52 1.31 2.88 3.48 5.75 ...
##  $ PSA      : num  4.27 10.39 5.62 4.34 5.9 ...
##  $ PSD      : num  3.65 4.92 3.95 3.48 4.09 3.35 5.75 3.33 3.46 3.98 ...
##  $ HomeProbs: num  0.513 0.719 0.588 0.503 0.606 ...
##  $ AwayProbs: num  0.2342 0.0962 0.1779 0.2304 0.1695 ...
##  $ DrawProbs: num  0.274 0.203 0.253 0.287 0.244 ...
##  $ OverRund : num  1.02 1.02 1.02 1.02 1.02 ...
##  $ HomeTheo : num  0.502 0.706 0.577 0.493 0.594 ...
##  $ AwayTheo : num  0.2294 0.0945 0.1746 0.2258 0.1662 ...
##  $ DrawTheo : num  0.268 0.199 0.248 0.282 0.24 ...
RawPreds <- RawProbs %>%
  select(HomeProbs, AwayProbs, DrawProbs) %>%
  apply(1, which.max)  # column index of the favorite: 1 = Home, 2 = Away, 3 = Draw

RawPreds holds, for each match, the favorite according to Pinnacle's opening odds (1 = home, 2 = away, 3 = draw). The home side was the bookmaker's favorite 1,036 times and the away side 484 times; a draw was never the favorite.

table(RawPreds)
## RawPreds
##    1    2 
## 1036  484

The actual results were 698 home wins, 461 away wins and 361 draws:

table(RawProbs$FTR)
## 
##   A   D   H 
## 461 361 698
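To make the mapping from `which.max` indices to predicted results concrete, here is a small sketch on seven matches whose odds and results are copied from the `str()` outputs: the first five rows of `Data`, plus the first two rows shown for `A_Pred`. It converts the favorite's column index into a result code and compares it with `FTR`, which is what the `filter()` calls below count at scale.

```r
# Opening odds and results copied from the str() outputs:
# first five rows of Data, then the first two rows shown for A_Pred
odds <- data.frame(
  FTR = c("A", "D", "D", "H", "H", "A", "A"),
  PSH = c(1.95, 1.39, 1.70, 1.99, 1.65, 3.48, 5.75),
  PSA = c(4.27, 10.39, 5.62, 4.34, 5.90, 2.25, 1.68),
  PSD = c(3.65, 4.92, 3.95, 3.48, 4.09, 3.46, 3.98)
)

# Implied probabilities in the same column order as RawProbs: Home, Away, Draw
probs <- 1 / odds[, c("PSH", "PSA", "PSD")]

# which.max gives the column index of the favorite: 1 = Home, 2 = Away, 3 = Draw
fav_index <- apply(probs, 1, which.max)
pred <- c("H", "A", "D")[fav_index]

# Compare each prediction with the actual full-time result
print(data.frame(pred, actual = odds$FTR, correct = pred == odds$FTR))
mean(pred == odds$FTR)
```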

Let's find out how many times Pinnacle's opening odds correctly predicted a home win.

H_Pred<-filter(RawProbs,FTR=="H" & HomeProbs > AwayProbs)
str(H_Pred)
## 'data.frame':    577 obs. of  11 variables:
##  $ FTR      : Factor w/ 3 levels "A","D","H": 3 3 3 3 3 3 3 3 3 3 ...
##  $ PSH      : num  1.99 1.65 1.94 2.08 1.44 1.85 1.19 2.05 1.28 1.94 ...
##  $ PSA      : num  4.34 5.9 4.3 3.87 8.6 ...
##  $ PSD      : num  3.48 4.09 3.66 3.56 4.74 3.63 8 3.54 6.12 3.49 ...
##  $ HomeProbs: num  0.503 0.606 0.515 0.481 0.694 ...
##  $ AwayProbs: num  0.23 0.169 0.233 0.258 0.116 ...
##  $ DrawProbs: num  0.287 0.244 0.273 0.281 0.211 ...
##  $ OverRund : num  1.02 1.02 1.02 1.02 1.02 ...
##  $ HomeTheo : num  0.493 0.594 0.505 0.471 0.68 ...
##  $ AwayTheo : num  0.226 0.166 0.228 0.253 0.114 ...
##  $ DrawTheo : num  0.282 0.24 0.268 0.275 0.206 ...

Pinnacle's opening odds correctly predicted 577 home wins out of the 1,036 matches in which the home team had the highest implied probability.

Accuracy<-c(577*100/1036)
print(Accuracy)
## [1] 55.69498

Let's find out how many times Pinnacle's opening odds correctly predicted an away win.

A_Pred<-filter(RawProbs,FTR=="A" & AwayProbs > HomeProbs)
str(A_Pred)
## 'data.frame':    265 obs. of  11 variables:
##  $ FTR      : Factor w/ 3 levels "A","D","H": 1 1 1 1 1 1 1 1 1 1 ...
##  $ PSH      : num  3.48 5.75 5.66 5.34 4.85 6.25 6.13 5.75 4.32 3.22 ...
##  $ PSA      : num  2.25 1.68 1.72 1.72 1.76 1.65 1.61 1.65 1.94 2.46 ...
##  $ PSD      : num  3.46 3.98 3.83 3.97 4.08 3.93 4.21 4.16 3.65 3.3 ...
##  $ HomeProbs: num  0.287 0.174 0.177 0.187 0.206 ...
##  $ AwayProbs: num  0.444 0.595 0.581 0.581 0.568 ...
##  $ DrawProbs: num  0.289 0.251 0.261 0.252 0.245 ...
##  $ OverRund : num  1.02 1.02 1.02 1.02 1.02 ...
##  $ HomeTheo : num  0.281 0.17 0.173 0.183 0.202 ...
##  $ AwayTheo : num  0.435 0.583 0.57 0.57 0.557 ...
##  $ DrawTheo : num  0.283 0.246 0.256 0.247 0.24 ...

Pinnacle's opening odds correctly predicted 265 away wins out of the 484 matches in which the away team had the highest implied probability.

Accuracy<-c(265*100/484)
print(Accuracy)
## [1] 54.75207

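Now let's find the overall accuracy of the Pinnacle opening-odds model: 577 + 265 = 842 correct predictions out of 1,520 matches.

Accuracy_R <- (577 + 265) * 100 / 1520  # correct home + away predictions over all matches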
print(Accuracy_R)
## [1] 55.39474
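The three accuracy figures can be derived together from the counts above. This sketch only restates numbers already printed (577 and 265 correct, 1,036 and 484 favorites); it shows that the overall figure is the total of correct predictions divided by all 1,520 matches, which works because every match had either a home or an away favorite.

```r
# Correct predictions and number of matches in which each side was the favorite
correct    <- c(home = 577, away = 265)
favorites  <- c(home = 1036, away = 484)

# Accuracy when the home / away side was the favorite
round(100 * correct / favorites, 2)

# Overall accuracy: all correct predictions over all matches (1036 + 484 = 1520)
overall <- 100 * sum(correct) / sum(favorites)
round(overall, 2)
```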

Home Profit

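# H_Profit: level stake of 100 multiplied by the implied home probability (1/PSH) of each correctly predicted home win
H_Pred %>% select(HomeProbs) %>% mutate(H_Profit = 100 * HomeProbs) -> HP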
Profit<-sum(HP$H_Profit)
print(Profit)
## [1] 35150.65

Home loss is the total stake: a level stake of 100 on each of the 1,036 matches in which the home team was the favorite.

loss<-c(1036*100)
print(loss)
## [1] 103600

Home P/L

P_L<-c(Profit-loss)
print(P_L)
## [1] -68449.35
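For comparison, the usual convention for decimal odds is that a winning bet of stake s at odds d returns s × d (a net profit of s × (d - 1)), while a losing bet forfeits s. The calculation above instead credits each correct prediction with 100 × HomeProbs. Below is a minimal sketch of a level-stakes P/L under the stake × odds convention; `level_stake_pl` is a hypothetical helper, not part of the analysis above, and the inputs are the first five matches of `Data`, in each of which the home side was the favorite (rows 4 and 5 were home wins).

```r
# Hypothetical helper: P/L of a level-stakes strategy on decimal odds
# odds:  decimal odds of the selection in every match bet on
# won:   TRUE if that selection won the match
# stake: amount wagered on every match
level_stake_pl <- function(odds, won, stake = 100) {
  staked  <- stake * length(odds)      # total amount wagered
  returns <- sum(stake * odds[won])    # each winning bet returns stake x odds
  returns - staked                     # net profit or loss
}

# Home odds of the first five matches in Data; only the last two were home wins
odds <- c(1.95, 1.39, 1.70, 1.99, 1.65)
won  <- c(FALSE, FALSE, FALSE, TRUE, TRUE)
level_stake_pl(odds, won)
```

Under the same assumption, the home-favorite strategy over the full data would be something like `fav <- filter(RawProbs, HomeProbs > AwayProbs)` followed by `level_stake_pl(fav$PSH, fav$FTR == "H")`.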

Away Profit

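# A_Profit: level stake of 100 multiplied by the implied away probability (1/PSA) of each correctly predicted away win
A_Pred %>% select(AwayProbs) %>% mutate(A_Profit = 100 * AwayProbs) -> AP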
Away_Profit<-sum(AP$A_Profit)
print(Away_Profit)
## [1] 15245.37

Away loss is the total stake: a level stake of 100 on each of the 484 matches in which the away team was the favorite.

A_loss<-c(484*100)
print(A_loss)
## [1] 48400

Away P/L

AwayP_L<-c(Away_Profit-A_loss)
print(AwayP_L)
## [1] -33154.63

Conclusion

The Pinnacle opening-odds model correctly predicted 55.39474% of EPL matches over the last 4 seasons, including the 2018-2019 season.

However, you cannot make a profit in the long run by betting only on the favorite, as evidenced by a P/L of -68449.35 on home favorites and -33154.63 on away favorites.

Hence, bookmakers' odds are not efficient.