You probably all know about counting cards in Blackjack. Or at least you’ve heard about it. But did you know that you can do something very similar with sports betting? And R can be your accomplice. Of course, you won’t become a millionaire today, but here is a very fun exercise in R anyway.

An unfair bet

Suppose you offer a friend the following bet: You will flip a coin and if it lands on heads, your friend must pay you a dollar. If the coin lands on tails, you must pay a dollar to your friend. As long as the coin is fair, this is of course a fair bet. Had you been a professional bookmaker, you would have given your friend the following (decimal) odds:

* Heads: 2.0
* Tails: 2.0

You can work out these odds by taking the inverse of the probability associated with each of the different events of the bet. Here, this is 1/0.5 = 2 for both heads and tails. These odds mean that if your friend wagers one dollar on either heads or tails, he receives two dollars (his stake plus one dollar of winnings) if he is correct. If he is incorrect, he loses his stake. Thus, his expected return per dollar wagered is equal to 0.5 * 2.0 + 0.5 * 0 = 1. If you were to play this game over and over again, neither of you would make any significant long-run gains or losses, because for every dollar you gain, you lose a dollar in another round.

Now what if you had offered your friend the following odds:

* Heads: 1.9
* Tails: 1.9

Your friend’s expected return is now equal to 0.5 * 1.9 + 0.5 * 0 = 0.95 per dollar wagered. Your own expected return is therefore positive: on average you keep 5 cents of every dollar wagered, so you stand to make money in the long run if you and your friend play this game over and over again. At its core, this is how professional bookmaking works. You, the bookie, try to work out the probability of each event in the bet (i.e., heads or tails in the case of the coin toss). Then, you calculate the corresponding odds. Lastly, you slightly reduce these odds to give yourself a built-in “unfair” advantage (or “margin”), so that you make money in the long run regardless of how any single event turns out. Essentially, this is no different from the zero pocket in a roulette game.
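The coin-toss arithmetic can be reproduced in a few lines of R. This is only a minimal sketch: the variable names are illustrative, and the 5% reduction is an assumed figure chosen to reproduce the odds of 1.9 from the example.

```r
# probabilities of the two outcomes of a fair coin
prob <- c(heads = 0.5, tails = 0.5)

# fair decimal odds are the inverse of the probabilities
fair_odds <- 1 / prob

# the bookie shades the fair odds down (assumed 5% reduction,
# which turns odds of 2.0 into odds of 1.9)
offered_odds <- fair_odds * 0.95

# expected return per dollar wagered on heads:
# win probability times payout (a losing bet pays 0)
expected_fair    <- unname(prob["heads"] * fair_odds["heads"])
expected_offered <- unname(prob["heads"] * offered_odds["heads"])

print(fair_odds)
print(offered_odds)
print(c(fair = expected_fair, offered = expected_offered))
```

The gap between the two expected returns is the bookie's share of every dollar wagered.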

Why sports betting is different

In the above coin-toss example, you and your friend both know the probabilities associated with the two different events. In sports, this is different. Nobody knows exactly (or at least with a very high degree of certainty) how likely it is that one football team wins against another football team. There are simply too many factors that influence the outcome of a football match. So it is not unlikely that two bookmakers disagree on what the “proper” odds should be for a specific game.

Let’s ignore for the moment the possibility of a draw, and assume that two bookmakers offer you the following odds:

Bookmaker 1:

* Home team win: 1.8
* Away team win: 2.1

Bookmaker 2:

* Home team win: 2.2
* Away team win: 1.7

In the previous section, we calculated the inverse of known event probabilities to obtain the corresponding odds. Now, we can do the opposite to obtain so-called “implied probabilities” from odds. If we add these implied probabilities, we can see how great the bookmakers’ margins are:

* Bookmaker 1: 1/1.8 + 1/2.1 = 1.03
* Bookmaker 2: 1/2.2 + 1/1.7 = 1.04

Thus, Bookmaker 1 is helping himself to an unfair advantage of 0.03, while Bookmaker 2 has a margin of 0.04.
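The two margins can be checked with a short R sketch, using the odds of the two bookies in this example. The helper names `implied_prob` and `overround` are made up for illustration; "overround" is used here for the bookie's margin to keep it apart from the bettor's margin computed further down.

```r
# decimal odds from the example (home win, away win)
odds_bookie1 <- c(home = 1.8, away = 2.1)
odds_bookie2 <- c(home = 2.2, away = 1.7)

# implied probability of a whole book: sum of the inverse odds
implied_prob <- function(odds) sum(1 / odds)

# the bookie's margin: whatever the implied probabilities add up to beyond 1
overround <- function(odds) implied_prob(odds) - 1

print(round(c(bookie1 = overround(odds_bookie1),
              bookie2 = overround(odds_bookie2)), 2))
```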

Now what happens if we combine the odds of the two different bookmakers? Bookmaker 1 offers the better odds for betting on the away team (2.1 as opposed to 1.7), while Bookmaker 2 offers the better odds for betting on the home team (2.2 as opposed to 1.8). Combined odds:

* Home team win: 2.2
* Away team win: 2.1

The implied probability of this “synthetic bet” is equal to 1/2.2 + 1/2.1 = 0.93. This means that this hypothetical bookmaker has a negative margin. We can now exploit this fact! Our own margin is positive and equal to 1 – 0.93 = 0.07. Say we want to bet $100. We will then allocate $100 / 2.2 * 1.07 = $48.64 to a bet on a home team win at Bookmaker 2 and the remaining $51.36 to a bet on an away team win at Bookmaker 1. Here is what we stand to receive:

* If the home team wins: $48.64 * 2.2 = $107.01
* If the away team wins: $51.36 * 2.1 = $107.86

The two payouts differ slightly because the factor 1.07 (one plus our margin) is only a convenient approximation of 1/0.93. Either way we wind up with more money than before. Thus, by playing the two bookies against each other, we can turn a riskless profit.
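For readers who want both outcomes to pay exactly the same amount, here is a minimal sketch of an exact split. It is an assumption-based variant rather than the procedure used in the rest of this post: it divides by the implied probability instead of multiplying by one plus the margin. The variable names are illustrative.

```r
# best available decimal odds per outcome, taken from the example
best_odds <- c(home = 2.2, away = 2.1)
total_stake <- 100

# implied probability of the combined ("synthetic") book
impl_prob <- sum(1 / best_odds)

# an arbitrage exists only if the implied probability is below 1
stopifnot(impl_prob < 1)

# stake each outcome in proportion to its inverse odds, so that the
# stakes sum to total_stake and every outcome pays out the same amount
stakes  <- total_stake * (1 / best_odds) / impl_prob
payouts <- stakes * best_odds

print(round(stakes, 2))
print(round(payouts, 2))
print(round(payouts - total_stake, 2)) # guaranteed profit per outcome
```

With these numbers the split comes to roughly $48.84 and $51.16, and either outcome returns about $107.44.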

Not just a theoretical exercise

At this stage, you are probably thinking: “Come on, you specifically picked the numerical values in the example above to make the math work.” And you are right. I did cherry-pick the numbers. But you’d be surprised to see how often such situations arise in the real-world betting market, too. When researching this topic, I came across a very useful website called football-data.co.uk, which collects historical odds offered by different bookmakers for football matches played in all major European leagues and provides them as .csv files. A companion file explaining all the different variables can be found here. Let’s have a look at what these data look like for the 2018/2019 Premier League season.

library(tidyverse)

# clear workspace
rm(list = ls())

# import data
df <- read.csv("data/E0.csv")
df <- df %>% select(Date,HomeTeam,AwayTeam,FTR,BbMxH,BbMxD,BbMxA)
head(df)
##         Date     HomeTeam       AwayTeam FTR BbMxH BbMxD BbMxA
## 1 10/08/2018   Man United      Leicester   H  1.60  4.20  8.05
## 2 11/08/2018  Bournemouth        Cardiff   H  1.93  3.71  4.75
## 3 11/08/2018       Fulham Crystal Palace   A  2.60  3.49  3.05
## 4 11/08/2018 Huddersfield        Chelsea   A  6.85  4.07  1.66
## 5 11/08/2018    Newcastle      Tottenham   A  4.01  3.57  2.12
## 6 11/08/2018      Watford       Brighton   H  2.48  3.30  3.42

The variables “BbMxH”, “BbMxD” and “BbMxA” refer to the best odds available for the three possible outcomes of the game: “Home team wins”, “Draw” and “Away team wins”. Let’s start with our analysis.

# compute implied probability
df$ImplProb <- 1/df$BbMxH + 1/df$BbMxD + 1/df$BbMxA
df$Margin <- 1 - df$ImplProb
df$ShouldBet <- ifelse(df$Margin > 0, 1, 0)
head(df %>% select(-BbMxH,-BbMxD,-BbMxA))
##         Date     HomeTeam       AwayTeam FTR  ImplProb        Margin ShouldBet
## 1 10/08/2018   Man United      Leicester   H 0.9873188  0.0126811594         1
## 2 11/08/2018  Bournemouth        Cardiff   H 0.9982028  0.0017971902         1
## 3 11/08/2018       Fulham Crystal Palace   A 0.9990172  0.0009828116         1
## 4 11/08/2018 Huddersfield        Chelsea   A 0.9940953  0.0059047143         1
## 5 11/08/2018    Newcastle      Tottenham   A 1.0011867 -0.0011867166         0
## 6 11/08/2018      Watford       Brighton   H 0.9986538  0.0013462297         1
# visualize implied probabilities
plot(df$ImplProb ~ c(1:nrow(df)), 
     ylab = "Implied probability", 
     xlab = "Season games")
abline(h = 1, col = "red") # break even line

# how many profitable games?
sum(df$ShouldBet)
## [1] 170

Wow! Out of 380 Premier League games, a staggering 170 games would have offered a riskless betting opportunity to a gambler who plays different bookmakers against each other.
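To see how a game can be profitable even though every single bookie keeps a margin, consider the following sketch. The three bookies and all of their odds are made up for illustration; only the idea of taking the best odds per outcome mirrors the "BbMx" columns.

```r
# hypothetical odds for one game from three made-up bookies
odds <- data.frame(
  bookie = c("A", "B", "C"),
  H = c(2.40, 2.55, 2.45),
  D = c(3.30, 3.20, 3.40),
  A = c(3.10, 3.00, 3.25)
)
outcomes <- c("H", "D", "A")

# best available odds per outcome across all bookies
best <- sapply(odds[, outcomes], max)

# implied probability of each single book and of the combined best odds
book_impl <- rowSums(1 / odds[, outcomes])
best_impl <- sum(1 / best)

print(round(book_impl, 4)) # each bookie on its own
print(round(best_impl, 4)) # best odds combined
```

In this made-up example every bookie's own implied probability lies above 1, while the combination of the best odds lies below 1.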

Let’s make some money

We’ll start our betting business with a very simple strategy. We’ll simply bet $10 on each game that offers a riskless profit.

# place bets (ignoring minor rounding errors)
bet <- 10
df$BetH <- round(ifelse(df$ShouldBet == 1, bet * 1/df$BbMxH * (1+df$Margin), 0), 2)
df$BetD <- round(ifelse(df$ShouldBet == 1, bet * 1/df$BbMxD * (1+df$Margin), 0), 2)
df$BetA <- round(ifelse(df$ShouldBet == 1, bet * 1/df$BbMxA * (1+df$Margin), 0), 2)
head(df %>% select(-BbMxH,-BbMxD,-BbMxA,-Margin,-ShouldBet))
##         Date     HomeTeam       AwayTeam FTR  ImplProb BetH BetD BetA
## 1 10/08/2018   Man United      Leicester   H 0.9873188 6.33 2.41 1.26
## 2 11/08/2018  Bournemouth        Cardiff   H 0.9982028 5.19 2.70 2.11
## 3 11/08/2018       Fulham Crystal Palace   A 0.9990172 3.85 2.87 3.28
## 4 11/08/2018 Huddersfield        Chelsea   A 0.9940953 1.47 2.47 6.06
## 5 11/08/2018    Newcastle      Tottenham   A 1.0011867 0.00 0.00 0.00
## 6 11/08/2018      Watford       Brighton   H 0.9986538 4.04 3.03 2.93
# calculate earnings
df$WinH <- round(ifelse(df$FTR == "H",1,0) * df$BetH * df$BbMxH,2)
df$WinD <- round(ifelse(df$FTR == "D",1,0) * df$BetD * df$BbMxD,2) 
df$WinA <- round(ifelse(df$FTR == "A",1,0) * df$BetA * df$BbMxA,2)
df$WinSum <- df$WinH + df$WinD + df$WinA

# calculate profits
df$Profits <- df$WinSum - df$BetH - df$BetD - df$BetA

# how much money made in total?
sum(df$Profits)
## [1] 11.27

OK, now that seems like a bit of a set-back. We made a mere $11. Bear in mind, however, that we only invested $10 in each bet. What if we were to re-invest our earnings from previous games? In that case, we would profit from the compound interest effect and potentially make a lot more money.

Compound interest

We will start with $10. Of course, we can only invest in one game at a time. To keep it simple, we’ll only bet once a day. Every morning we identify the most profitable game of the day, if any, and invest all our money in that game using the procedure explained above.

# clear workspace
rm(list = ls())

# import data
df <- read.csv("data/E0.csv")
df <- df %>% select(Date,HomeTeam,AwayTeam,FTR,BbMxH,BbMxD,BbMxA)
df$Date <- as.Date(as.character(df$Date),"%d/%m/%Y")

# compute implied probability
df$ImplProb <- 1/df$BbMxH + 1/df$BbMxD + 1/df$BbMxA
df$Margin <- 1 - df$ImplProb
df$ShouldBet <- ifelse(df$Margin > 0, 1, 0)

# remove unprofitable games
df <- df %>% filter(ShouldBet == 1) 

# initial amount of money
initial <- 10
money <- c(initial)

# loop through each day in the season
dates <- unique(df$Date)
for(i in seq_along(dates)){

  # index into dates so that day keeps its Date class
  day <- dates[i]

  # how much money available by now?
  bet <- money[length(money)]
  
  # select most profitable game of the day (exactly one row, even if tied)
  df_day <- df %>% 
    filter(Date == day) %>%
    slice_min(ImplProb, n = 1, with_ties = FALSE)
  
  # place bet
  df_day$BetH <- round(bet * 1/df_day$BbMxH * (1+df_day$Margin), 2)
  df_day$BetD <- round(bet * 1/df_day$BbMxD * (1+df_day$Margin), 2)
  df_day$BetA <- round(bet * 1/df_day$BbMxA * (1+df_day$Margin), 2)
  
  # calculate earnings
  df_day$WinH <- round(ifelse(df_day$FTR == "H",1,0) * df_day$BetH * df_day$BbMxH,2)
  df_day$WinD <- round(ifelse(df_day$FTR == "D",1,0) * df_day$BetD * df_day$BbMxD,2) 
  df_day$WinA <- round(ifelse(df_day$FTR == "A",1,0) * df_day$BetA * df_day$BbMxA,2)
  df_day$WinSum <- df_day$WinH + df_day$WinD + df_day$WinA

  # update money
  money <- c(money,df_day$WinSum)
  
  
} # end loop

# plot money over time
plot(money[-1] ~ unique(df$Date), type="l", ylab="Money", xlab="Time")

We are still far away from becoming millionaires here. However, within just a single season we more than doubled our money. That’s more than a 100% return! And each day we only worked about 10 minutes, in which we quickly checked whether any games offered a riskless profit. Nonetheless, we want to make more money. Of course, we could bet multiple times a day, especially if the games are not scheduled at the same time. Unfortunately, the data from football-data.co.uk only show the dates, not the actual kick-off times of the games. However, so far we were only betting on Premier League games …
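The compounding effect itself can be sketched without any match data. The implied probabilities below are made-up values, and the sketch assumes an exact equal-payout split without rounding, so each bet multiplies the bankroll by one over the implied probability. This is a simplification of the loop above, not a replacement for it.

```r
# hypothetical implied probabilities of the best game on five betting days
# (made-up values below 1, i.e. each game offers an arbitrage)
impl_prob <- c(0.995, 0.990, 0.998, 0.985, 0.993)

initial <- 10

# assumed exact split: each bet grows the bankroll by 1 / implied
# probability, whatever the result of the game
growth <- 1 / impl_prob

# bankroll after each day: initial amount times the running product
money <- initial * cumprod(growth)
print(round(money, 2))

# total return in percent over the five days
print(round((money[length(money)] / initial - 1) * 100, 2))
```

Because the growth factors multiply, many small margins add up faster than the fixed $10 bets did.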

Why only England?

As I said earlier, we have data on all major European leagues. So let’s repeat our little exercise for nine different leagues. This should give us a lot more profitable betting opportunities and the compound interest effect will now skyrocket our earnings!

rm(list = ls())

# columns needed for the analysis
cols <- c("Date","HomeTeam","AwayTeam","FTR","BbMxH","BbMxD","BbMxA")

# league files to import
leagues <- c("D1","D2","E0","E1","E2","E3","SP1","SP2","I1")

# read each .csv file and keep only the needed columns before stacking;
# this sidesteps columns such as "Referee" that are not part of all datasets
df <- do.call(rbind, lapply(leagues, function(i){
  read.csv(paste0("data/",i,".csv"))[,cols]
}))
rm(cols,leagues)
df$Date <- as.Date(as.character(df$Date),"%d/%m/%Y")

# sort data by date
df <- df[order(df$Date),]

# compute implied probability
df$ImplProb <- 1/df$BbMxH + 1/df$BbMxD + 1/df$BbMxA
df$Margin <- 1 - df$ImplProb
df$ShouldBet <- ifelse(df$Margin > 0, 1, 0)

# remove unprofitable games
df <- df %>% filter(ShouldBet == 1) 
nrow(df)
## [1] 936

Instead of 170 profitable games, we now have 936 profitable games. Well, let’s re-run our earlier code and have a look at the results:

## [1] 1141.2

We just realized a whopping 1141.2% return! Our money pile has grown by a factor of more than 10 within just ten months. And let me stress it one more time: We only placed one bet per day. In real life, we could easily bet on multiple (non-overlapping) games per day. And we could of course scan even more leagues for profitable bets. Unfortunately, this now brings me to the question of why neither you nor I will soon become a millionaire.

Just like a casino

Many of you will have seen movies about card counters at the Blackjack tables in Las Vegas. Although card counting is legal, casinos don’t appreciate you doing it. If you get caught, they will kick you out of their casino. Something similar is at play with sports betting. For our strategy to work, we would need online betting accounts at dozens of different bookmakers. Of course, with every day we play the game, we make more money. But not at every bookmaker. To some bookmakers it will appear as if we were losing more and more money. Eventually they might have to close our accounts because of legal stipulations aimed at preventing gambling addicts from losing all their money. Moreover, the bookies will also notice that we are betting fairly unusual amounts of money like $153.27. At some point they will probably suspect that we are not regular gamblers but are either trying to run an arbitrage scheme, which we are, or they might even fear that we are trying to launder money. Either way, in the end they will probably kick us out, just like card counters get kicked out of casinos.

 

 

Disclaimer

Gambling can be addictive. Don’t fool yourself into “having a system” just because you’re playing around with data in R.