title: "EPL ANALYTICS"
author: "jackson"
date: "08/07/2019"
output: html_document
Let's load the data.
Data<-read.csv("EplData.csv",header=TRUE)
str(Data)
## 'data.frame': 1520 obs. of 16 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ HomeTeam: Factor w/ 28 levels "Arsenal","Aston Villa",..: 3 7 9 13 16 19 1 18 21 26 ...
## $ AwayTeam: Factor w/ 28 levels "Arsenal","Aston Villa",..: 2 23 25 22 24 8 27 20 14 15 ...
## $ FTHG : int 0 2 2 4 1 1 0 2 0 0 ...
## $ FTAG : int 1 2 2 2 0 3 2 2 1 3 ...
## $ FTR : Factor w/ 3 levels "A","D","H": 1 2 2 3 3 1 1 2 1 1 ...
## $ HTHG : int 0 2 0 3 1 0 0 1 0 0 ...
## $ HTAG : int 0 1 1 0 0 1 1 1 0 2 ...
## $ HTR : Factor w/ 3 levels "A","D","H": 2 3 1 3 3 1 1 2 2 1 ...
## $ HS : int 11 11 10 19 9 17 22 9 7 9 ...
## $ AS : int 7 18 11 10 9 11 8 15 8 19 ...
## $ HST : int 2 3 5 8 1 6 6 4 1 2 ...
## $ AST : int 3 10 5 5 4 7 4 5 3 7 ...
## $ PSH : num 1.95 1.39 1.7 1.99 1.65 2.52 1.31 2.88 3.48 5.75 ...
## $ PSA : num 4.27 10.39 5.62 4.34 5.9 ...
## $ PSD : num 3.65 4.92 3.95 3.48 4.09 3.35 5.75 3.33 3.46 3.98 ...
What percentage of EPL matches ended in a home win, an away win, or a draw over the last 4 seasons?
table(Data$FTR)
##
## A D H
## 461 361 698
Home Win Rate.
Home<-c(698*100/1520)
print(Home)
## [1] 45.92105
Away Win Rate.
Away<-c(461*100/1520)
print(Away)
## [1] 30.32895
Draw Rate.
Draw<-c(361*100/1520)
print(Draw)
## [1] 23.75
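The three rates above are typed in by hand from the `table()` counts. As a minimal sketch using only base R, the same percentages can be computed in one step with `prop.table()`. The vector `ftr` below is a small made-up stand-in for `Data$FTR`, not the real data.

```r
# Toy stand-in for Data$FTR (made-up results, for illustration only)
ftr <- factor(c("H", "A", "D", "H", "H", "A", "H", "D"),
              levels = c("A", "D", "H"))

# Share of each result as a percentage of all matches
rates <- 100 * prop.table(table(ftr))
print(rates)
```

On the real data, `100 * prop.table(table(Data$FTR))` should give the same away, draw and home rates as the hand calculations, without hard-coding the counts.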
How many goals do home and away teams score?
sum(Data$FTHG)
## [1] 2352
sum(Data$FTAG)
## [1] 1828
EPL home-field advantage over the last 4 seasons, measured as the ratio of home goals to away goals
HomeAdv<-c(2352/1828)
print(HomeAdv)
## [1] 1.286652
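The ratio above can be read alongside goals per match, which is sometimes easier to interpret. The sketch below reuses the totals printed above (2352 home goals, 1828 away goals, 1520 matches); the variable names are illustrative and not part of the original analysis.

```r
# Totals taken from the output above
home_goals <- 2352
away_goals <- 1828
matches <- 1520

# Average goals per match for each side
home_per_match <- home_goals / matches
away_per_match <- away_goals / matches

# Same home-advantage ratio as HomeAdv
ratio <- home_goals / away_goals

print(round(c(home_per_match, away_per_match, ratio), 3))
```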
Now let's determine whether there is anchoring bias in bookmakers' opening prices, using Pinnacle odds for the last 4 EPL seasons.
Opening odds probability
The inverse of the decimal odds gives the implied probability of an event happening. We therefore take the opening odds in their raw form, invert them, and use the results as our prediction probabilities. The outcome with the highest probability then forms our prediction for the match.
However, there is a thing called the over-round: the probabilities for all the outcomes won't add up to 1 but will be slightly over it, because the bookmaker's margin is built into the odds. A theoretical outcome probability is therefore calculated by dividing each raw probability by the over-round. Two predictions are made: a raw prediction, straight from the odds, and a theoretical one, adjusted for the over-round.
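As a worked example of the adjustment described above, the sketch below uses the opening odds of the first match in the data (PSH = 1.95, PSA = 4.27, PSD = 3.65). The variable names are illustrative only.

```r
# Opening odds for the first match in the data (PSH, PSA, PSD)
odds <- c(Home = 1.95, Away = 4.27, Draw = 3.65)

raw_probs <- 1 / odds                  # raw implied probabilities
over_round <- sum(raw_probs)           # slightly above 1
theo_probs <- raw_probs / over_round   # rescaled so they sum to 1

print(round(raw_probs, 3))
print(round(over_round, 3))
print(round(theo_probs, 3))
```

The raw probabilities sum to roughly 1.02, so dividing by that total shrinks each one slightly while leaving their ranking, and hence the predicted outcome, unchanged.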
##Load R Packages To Manipulate Data
library(tidyverse)
## -- Attaching packages -------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.2.0 v purrr 0.3.2
## v tibble 2.1.3 v dplyr 0.8.3
## v tidyr 0.8.3 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## -- Conflicts ----------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(caret)
## Loading required package: lattice
##
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
##
## lift
library(dplyr)
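RawProbs <- select(Data, FTR, PSH, PSA, PSD) %>%
  mutate(
    # raw implied probabilities: inverse of the decimal odds
    HomeProbs = 1 / PSH,
    AwayProbs = 1 / PSA,
    DrawProbs = 1 / PSD,
    # over-round: sum of the raw probabilities
    OverRund = HomeProbs + AwayProbs + DrawProbs,
    # theoretical probabilities: raw probabilities divided by the over-round
    HomeTheo = HomeProbs / OverRund,
    AwayTheo = AwayProbs / OverRund,
    DrawTheo = DrawProbs / OverRund
  )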
str(RawProbs)
## 'data.frame': 1520 obs. of 11 variables:
## $ FTR : Factor w/ 3 levels "A","D","H": 1 2 2 3 3 1 1 2 1 1 ...
## $ PSH : num 1.95 1.39 1.7 1.99 1.65 2.52 1.31 2.88 3.48 5.75 ...
## $ PSA : num 4.27 10.39 5.62 4.34 5.9 ...
## $ PSD : num 3.65 4.92 3.95 3.48 4.09 3.35 5.75 3.33 3.46 3.98 ...
## $ HomeProbs: num 0.513 0.719 0.588 0.503 0.606 ...
## $ AwayProbs: num 0.2342 0.0962 0.1779 0.2304 0.1695 ...
## $ DrawProbs: num 0.274 0.203 0.253 0.287 0.244 ...
## $ OverRund : num 1.02 1.02 1.02 1.02 1.02 ...
## $ HomeTheo : num 0.502 0.706 0.577 0.493 0.594 ...
## $ AwayTheo : num 0.2294 0.0945 0.1746 0.2258 0.1662 ...
## $ DrawTheo : num 0.268 0.199 0.248 0.282 0.24 ...
RawProbs%>%select(HomeProbs,AwayProbs,DrawProbs)%>%apply(.,1,which.max)->RawPreds
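RawPreds holds, for each match, the index of the outcome with the highest raw probability (1 = Home, 2 = Away, 3 = Draw). According to Pinnacle's opening odds, the home side was the favorite 1036 times and the away side 484 times; the draw was never the most likely outcome.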
table(RawPreds)
## RawPreds
## 1 2
## 1036 484
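The indices returned by `which.max` are easier to compare with `FTR` once they are mapped back to result labels. A minimal sketch, assuming the column order Home, Away, Draw used above; the two rows are rounded values taken from the `str()` outputs in this document, and `labels` is an illustrative name.

```r
# Two example rows: one home favorite, one away favorite
probs <- data.frame(
  HomeProbs = c(0.513, 0.174),
  AwayProbs = c(0.234, 0.595),
  DrawProbs = c(0.274, 0.251)
)

# Column index of the largest probability in each row
idx <- apply(probs, 1, which.max)

# Map 1/2/3 to the same codes used in FTR
labels <- unname(c("H", "A", "D")[idx])
print(labels)
```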
The actual results, by contrast, were 698 home wins, 461 away wins and 361 draws.
table(RawProbs$FTR)
##
## A D H
## 461 361 698
Let's find out how many times Pinnacle's opening odds correctly predicted a home win.
H_Pred<-filter(RawProbs,FTR=="H" & HomeProbs > AwayProbs)
str(H_Pred)
## 'data.frame': 577 obs. of 11 variables:
## $ FTR : Factor w/ 3 levels "A","D","H": 3 3 3 3 3 3 3 3 3 3 ...
## $ PSH : num 1.99 1.65 1.94 2.08 1.44 1.85 1.19 2.05 1.28 1.94 ...
## $ PSA : num 4.34 5.9 4.3 3.87 8.6 ...
## $ PSD : num 3.48 4.09 3.66 3.56 4.74 3.63 8 3.54 6.12 3.49 ...
## $ HomeProbs: num 0.503 0.606 0.515 0.481 0.694 ...
## $ AwayProbs: num 0.23 0.169 0.233 0.258 0.116 ...
## $ DrawProbs: num 0.287 0.244 0.273 0.281 0.211 ...
## $ OverRund : num 1.02 1.02 1.02 1.02 1.02 ...
## $ HomeTheo : num 0.493 0.594 0.505 0.471 0.68 ...
## $ AwayTheo : num 0.226 0.166 0.228 0.253 0.114 ...
## $ DrawTheo : num 0.282 0.24 0.268 0.275 0.206 ...
Pinnacle's opening odds correctly predicted 577 home wins out of the 1036 matches in which the home team had the highest implied probability.
Accuracy<-c(577*100/1036)
print(Accuracy)
## [1] 55.69498
Let's find out how many times Pinnacle's opening odds correctly predicted an away win.
A_Pred<-filter(RawProbs,FTR=="A" & AwayProbs > HomeProbs)
str(A_Pred)
## 'data.frame': 265 obs. of 11 variables:
## $ FTR : Factor w/ 3 levels "A","D","H": 1 1 1 1 1 1 1 1 1 1 ...
## $ PSH : num 3.48 5.75 5.66 5.34 4.85 6.25 6.13 5.75 4.32 3.22 ...
## $ PSA : num 2.25 1.68 1.72 1.72 1.76 1.65 1.61 1.65 1.94 2.46 ...
## $ PSD : num 3.46 3.98 3.83 3.97 4.08 3.93 4.21 4.16 3.65 3.3 ...
## $ HomeProbs: num 0.287 0.174 0.177 0.187 0.206 ...
## $ AwayProbs: num 0.444 0.595 0.581 0.581 0.568 ...
## $ DrawProbs: num 0.289 0.251 0.261 0.252 0.245 ...
## $ OverRund : num 1.02 1.02 1.02 1.02 1.02 ...
## $ HomeTheo : num 0.281 0.17 0.173 0.183 0.202 ...
## $ AwayTheo : num 0.435 0.583 0.57 0.57 0.557 ...
## $ DrawTheo : num 0.283 0.246 0.256 0.247 0.24 ...
Pinnacle's opening odds correctly predicted 265 away wins out of the 484 matches in which the away team had the highest implied probability.
Accuracy<-c(265*100/484)
print(Accuracy)
## [1] 54.75207
Let's now find the overall accuracy rate of the Pinnacle opening-odds model: 577 + 265 = 842 correct predictions out of 1520 matches.
Accuracy_R<-c(842*100/1520)
print(Accuracy_R)
## [1] 55.39474
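The overall accuracy can also be obtained by comparing predicted and actual results directly, instead of adding up the counts by hand. The sketch below uses base R and two short made-up vectors; `predicted` and `actual` are illustrative names and the values are not from the real data.

```r
# Made-up actual and predicted results (illustration only)
lv <- c("A", "D", "H")
actual    <- factor(c("H", "D", "A", "H", "A", "H"), levels = lv)
predicted <- factor(c("H", "H", "A", "H", "H", "A"), levels = lv)

# Cross-tabulate predictions against outcomes
confusion <- table(Predicted = predicted, Actual = actual)
print(confusion)

# Share of matches where the prediction was right, in percent
accuracy <- 100 * mean(predicted == actual)
print(accuracy)
```

The diagonal of `confusion` counts the correct predictions, which here plays the role of the 842 used above.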
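Home Profit, taken here as 100 × HomeProbs summed over the correctly predicted home wins. Note that HomeProbs is 1/PSH, so this is not the bookmaker's payout: a winning 100 stake at decimal odds PSH returns 100 × PSH, and the figure below therefore understates the actual return.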
H_Pred%>%select(HomeProbs)%>%mutate(H_Profit=100*HomeProbs)->HP
Profit<-sum(HP$H_Profit)
print(Profit)
## [1] 35150.65
Home Loss: the total amount staked, i.e. a level stake of 100 on each of the 1036 matches in which the home team was the favorite.
loss<-c(1036*100)
print(loss)
## [1] 103600
Home P/L
P_L<-c(Profit-loss)
print(P_L)
## [1] -68449.35
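Away Profit, taken here as 100 × AwayProbs (that is, 100/PSA) summed over the correctly predicted away wins, which is smaller than the actual payout of 100 × PSA on a winning bet.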
A_Pred%>%select(AwayProbs)%>%mutate(A_Profit=100*AwayProbs)->AP
Away_Profit<-sum(AP$A_Profit)
print(Away_Profit)
## [1] 15245.37
Away Loss: the total amount staked, i.e. a level stake of 100 on each of the 484 matches in which the away team was the favorite.
A_loss<-c(484*100)
print(A_loss)
## [1] 48400
Away P/L
AwayP_L<-c(Away_Profit-A_loss)
print(AwayP_L)
## [1] -33154.63
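The profit figures above multiply the stake by the implied probability. A conventional level-stake P/L instead credits each winning bet with stake × decimal odds and subtracts the total staked. The sketch below shows that calculation on the first four matches in the data, where the home team is the favorite each time (PSH of 1.95, 1.39, 1.70, 1.99) and only the fourth ends in a home win. The helper `level_stake_pl` is hypothetical and not part of the original analysis.

```r
# Level-stake P/L: returns on winning bets minus total amount staked.
# odds:  decimal odds of the backed outcome for each bet
# won:   TRUE where the backed outcome occurred
# stake: amount placed on every bet
level_stake_pl <- function(odds, won, stake = 100) {
  returns <- sum(stake * odds[won])
  staked <- stake * length(odds)
  returns - staked
}

odds <- c(1.95, 1.39, 1.70, 1.99)     # PSH for the first four matches
won <- c(FALSE, FALSE, FALSE, TRUE)   # FTR was A, D, D, H

pl <- level_stake_pl(odds, won)
print(pl)
```

Applied to the full set of home and away favorites, this would give the P/L of backing the favorite at Pinnacle's opening price; no such result is claimed here.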
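Conclusion
The Pinnacle opening-odds model is 55.39474% correct in predicting EPL matches over the last 4 seasons, including the 2018-2019 season.
On the figures above, you cannot make a profit in the long run by betting only on the favorite, as evidenced by a P/L of -68449.35 on home favorites and -33154.63 on away favorites. Both figures credit winning bets with 100 × implied probability rather than the payout of 100 × decimal odds, so they overstate the loss.
Hence bookmakers' odds are not efficient.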