Introduction: It’s a Friday afternoon and I just cannot wait for this Sunday’s slate of NFL games. Out of anticipation I pull up the matchups on my phone and take a look at all the spreads. As always, 3 to 5 games jump out at me. “This line is a total joke,” I think to myself as I slap down $10 on each of my locks this week. Fast forward to Sunday evening, when my 5 locks have turned into a disgusting 1-4 record on the day. The realization sets in. I am a square.
This experience (that definitely only happened once) inspired a series of thoughts, questions, and hypotheses about betting NFL games.
First, I load NFL game data going back to the 2012-2013 season and join the tables together, which produces the following result:
df4
## # A tibble: 4,810 x 13
## Date Rot VH Team `1st` `2nd` `3rd` `4th` Final Open Close ML `2H`
## <dbl> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <chr>
## 1 905 451 V Dall~ 0 7 10 7 24 46.5 46 170 21.5
## 2 905 452 H NYGi~ 0 3 7 7 17 3 4 -200 1.5
## 3 909 453 V Indi~ 7 7 0 7 21 41 42 385 21
## 4 909 454 H Chic~ 7 17 10 7 41 10 9.5 -485 3
## 5 909 455 V Phil~ 0 10 0 7 17 8.5 9 -450 3
## 6 909 456 H Clev~ 3 0 3 10 16 41.5 42.5 350 20.5
## 7 909 457 V Buff~ 0 7 7 14 28 42.5 39.5 130 0.5
## 8 909 458 H NYJe~ 7 20 14 7 48 4.5 2.5 -150 18
## 9 909 459 V Wash~ 10 10 10 10 40 50.5 50 325 26.5
## 10 909 460 H NewO~ 7 7 3 15 32 9.5 8.5 -400 6
## # ... with 4,800 more rows
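The data is an absolute mess, so the next few steps are dedicated to cleaning. The variables I am most interested in are the opening and closing lines and the final score. In the raw table each game takes up two rows (visitor first, then home), and the Open and Close columns mix spreads and totals, so the code below works on a cleaned version with one row per game and columns such as home, away, home_final, away_final, spread_open, spread_close, total_open, and total_close.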
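The cleaning code itself isn't shown here, but as a rough sketch of one way to collapse the two raw rows per game into one, here is a toy version built from the first four rows printed above (team names left truncated as in the printout). It rests on assumptions read off that printout, not on any documentation of the data source: the spread sits on the favorite's row and is the smaller of the two numbers, the larger number is the game total, and a "pk" entry (if any) means a zero-point spread. The helpers `to_num()` and `away_spread()` are hypothetical names for illustration.

```r
library(dplyr)

# Toy copy of the first four raw rows shown above (two games, visitor row first).
raw <- tibble(
  VH    = c("V", "H", "V", "H"),
  Team  = c("Dall", "NYGi", "Indi", "Chic"),
  Final = c(24, 17, 21, 41),
  Open  = c("46.5", "3", "41", "10"),
  Close = c("46", "4", "42", "9.5")
)

# Assumption: "pk" (pick'em) means a 0-point spread.
to_num <- function(x) as.numeric(ifelse(tolower(x) == "pk", "0", x))

# Assumption: the smaller number in each pair is the spread and sits on the
# favorite's row. Returns the points added to the AWAY team's score.
away_spread <- function(v, h) ifelse(h < v, h, -v)

games <- raw %>%
  mutate(game_id = (row_number() + 1) %/% 2,   # rows 1-2 -> game 1, rows 3-4 -> game 2
         Open = to_num(Open), Close = to_num(Close)) %>%
  group_by(game_id) %>%
  summarise(
    away         = Team[VH == "V"],
    home         = Team[VH == "H"],
    away_final   = Final[VH == "V"],
    home_final   = Final[VH == "H"],
    spread_open  = away_spread(Open[VH == "V"], Open[VH == "H"]),
    spread_close = away_spread(Close[VH == "V"], Close[VH == "H"]),
    total_open   = max(Open),                  # the larger number is the total
    total_close  = max(Close)
  )
games
```

With this convention a positive spread means the home team is favored, which matches the cover test `away_final + spread > home_final` used in the code that follows.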
## Make new columns: spread and total results, spread winner at the open and close, over/under result, and line movement
df4=df4 %>% mutate(spread_results = home_final-away_final)
df4=df4 %>% mutate(total_results = away_final+home_final)
df4$home=as.character(df4$home)
df4$away=as.character(df4$away)
# Which side covered the opening and closing spreads? spread_* is the number of
# points added to the away team's score, so the away team covers when
# away_final + spread > home_final, and an exact tie is a push.
df4 <- df4 %>%
  mutate(
    spread_winner_open = case_when(
      away_final + spread_open > home_final ~ away,
      away_final + spread_open < home_final ~ home,
      TRUE ~ "push"
    ),
    spread_winner_close = case_when(
      away_final + spread_close > home_final ~ away,
      away_final + spread_close < home_final ~ home,
      TRUE ~ "push"
    ),
    # Over/under result against the closing total
    over_under = case_when(
      total_results > total_close ~ "over",
      total_results < total_close ~ "under",
      TRUE ~ "push"
    )
  )
# Signed line movement: closing spread minus opening spread
df4$line_movement <- df4$spread_close - df4$spread_open
The next variable that I will create is crucial. It is called “direction”, and it describes whether the line moved towards the winner or towards the loser. For example, say the Packers play the Bears and the line opens at Packers -2.5. The line closes at Packers -3.5, and the Packers win the game and cover the spread. We would say the line moved “towards the loser” because the Bears were getting a friendlier spread at the close but were the losing bet. In the same example, if the Bears covered the 3.5-point closing spread, the result would be “towards the winner”. Similarly, when I say “bet against the line movement” I mean a bet on the Packers, while “with the line movement” means a bet on the Bears.
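The definition above can be written as a small function for a single game. This is a sketch of the rule as worded in the paragraph, not the loop used in the analysis. It assumes spreads are the points added to the away team's score (the convention implied by the cover test `away_final + spread > home_final`), the name `line_direction()` is hypothetical, and unlike the prose it reports a push at the close as its own category.

```r
# Hypothetical helper: classify one game's line movement per the definition above.
# Assumption: a spread is the number of points added to the AWAY team's score.
line_direction <- function(spread_open, spread_close, away_final, home_final) {
  movement <- spread_close - spread_open              # > 0: away team's number improved
  margin   <- away_final + spread_close - home_final  # > 0: away team covered the close
  if (movement == 0) return("none")                   # the line never moved
  if (margin == 0) return("push")                     # closing spread landed exactly
  # The line moved "towards" whichever side got the friendlier number.
  if ((movement > 0) == (margin > 0)) "towards winner" else "towards loser"
}

# Packers at home open -2.5 and close -3.5, so the Bears get 2.5, then 3.5.
line_direction(2.5, 3.5, away_final = 17, home_final = 24)  # Packers cover
#> [1] "towards loser"
line_direction(2.5, 3.5, away_final = 21, home_final = 24)  # Bears cover
#> [1] "towards winner"
```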
#towards winner towards loser
direction<-list()
for (i in 1:length(df4$date)){
if (df4$line_movement[i]>0)
{
if (df4$away[i]==df4$spread_winner_close[i])
{
direction[i]<-"towards loser"
}
else
{
direction[i]<-"towards winner"
}
}
else if (df4$line_movement[i]<0)
{
if (df4$away[i]==df4$spread_winner_close[i])
{
direction[i]<-"towards winner"
}
else
{
direction[i]<-"towards loser"
}
}
else
{
direction[i]<-"none"
}
}
df4$direction<-direction
df4$total_movement<-df4$total_close-df4$total_open
total_direction <- list()
for (i in 1:length(df4$date)) {
  if (df4$total_movement[i] > 0) {
    # Compare this game's result only ([[i]]), not the whole column
    if (df4$over_under[[i]] == "under") {
      total_direction[i] <- "towards loser"
    } else {
      total_direction[i] <- "towards winner"
    }
  } else if (df4$total_movement[i] < 0) {
    if (df4$over_under[[i]] == "under") {
      total_direction[i] <- "towards winner"
    } else {
      total_direction[i] <- "towards loser"
    }
  } else {
    total_direction[i] <- "none"
  }
}
df4$total_direction <- total_direction
Before I analyzed the movements in the spreads, I looked for some more obvious patterns to see whether Vegas had any glaring weaknesses. I first checked whether home or away teams win against the spread (ATS) more often, and whether the over or the under hits more.
a=sum(df4$spread_winner_close==df4$home)
b=sum(df4$spread_winner_close==df4$away)
a
## [1] 1121
b
## [1] 1184
a/(a+b)
## [1] 0.4863341
Excluding pushes, there were 2,305 games since 2012: the home team won ATS 1,121 times and the away team 1,184. Away teams win more often, but a proportion test shows that the difference is not statistically significant (one-sided p-value of .095). In other words, the 63-game gap is consistent with ordinary variance, and there is no evidence that away teams are more likely to cover.
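Base R's `prop.test()` runs this kind of one-sample proportion test. The .095 quoted above appears to match a one-sided test (is the home cover rate below 50%?) without continuity correction; the default two-sided, continuity-corrected call gives a larger p-value of roughly 0.2. Either way the result is not significant. This is a sketch for checking the arithmetic, not necessarily the exact call used in the analysis.

```r
# Home teams covered 1,121 of the 2,305 games that were not pushes.
home_covers <- 1121
n_games     <- 2305

# Default call: two-sided alternative with continuity correction
two_sided <- prop.test(home_covers, n_games, p = 0.5)

# One-sided (home cover rate below 50%), no continuity correction
one_sided <- prop.test(home_covers, n_games, p = 0.5,
                       alternative = "less", correct = FALSE)

# About 0.197 and 0.095
round(c(two_sided = two_sided$p.value, one_sided = one_sided$p.value), 3)
```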
a=sum(df4$over_under=="under")
b=sum(df4$over_under=="over")
a
## [1] 1198
b
## [1] 1156
a/(a+b)
## [1] 0.508921
A very similar conclusion holds for the over/under: the under hits 50.9% of the time, but the difference is not statistically significant, which makes the over/under essentially a 50/50 bet. I examined numerous other splits and intersections of splits, such as spreads over 7, favored home teams, and game totals over 50. Nothing was statistically significant! In fact, most of the numbers were impressively close to 50%. I already had lots of respect for Las Vegas, but seeing these numbers forced me to give them even more.
Next, I looked at my direction variable to see whether the line moved towards the winner or the loser more often:
a=sum(df4$direction=="towards loser" )
b=sum(df4$direction=="towards winner")
a
## [1] 1025
b
## [1] 927
a/(a+b)
## [1] 0.5251025
As we can see, 52.5% of the time the line moves towards the loser. Very interesting, right? Vegas moves the line to entice more people to bet on the losing side by throwing them a few extra points. Additionally, a one-sided proportion test gives a p-value of .0135, so the result is significant and very unlikely to be due to variance alone. So, all I have to do is wait until the line is about to close and bet on the team that has a worse line than when it opened? Sort of… but not so fast. We have neglected the 10% vig that makes Vegas an absolute gold mine rather than a non-profit. To beat the vig and make a profit, you need to win about 52.4% of all bets (110/210 at -110 odds). This makes the rougher trends, like betting on the away team every time, very bad bets. For our line movement strategy we can construct a 90% confidence interval of [0.5064, 0.5436], which straddles that break-even rate. Yes, we will win bets more often than we lose them, but we will make little to no profit in the long run. Nevertheless, I am happy with these findings and tried to push them further.
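A quick way to check the interval, the test, and the break-even point in R. The interval quoted above matches the sample proportion plus or minus 1.645 standard errors, which is a 90% interval; plus or minus 1.96 standard errors gives the 95% version. This sketch assumes the normal (Wald) approximation was used to build the interval.

```r
towards_loser  <- 1025
towards_winner <- 927
n     <- towards_loser + towards_winner
p_hat <- towards_loser / n
se    <- sqrt(p_hat * (1 - p_hat) / n)         # normal-approximation (Wald) SE

ci90 <- p_hat + c(-1, 1) * qnorm(0.95)  * se   # about [0.5065, 0.5437]
ci95 <- p_hat + c(-1, 1) * qnorm(0.975) * se   # about [0.5029, 0.5473]
breakeven <- 110 / (110 + 100)                 # about 0.5238: win rate needed at -110

# One-sided test: does the line move towards the loser more than half the time?
p_value <- prop.test(towards_loser, n, p = 0.5, alternative = "greater",
                     correct = FALSE)$p.value  # about 0.013

round(rbind(ci90, ci95), 4)
ci95[1] < breakeven                            # TRUE
```

Both intervals reach below the 52.4% break-even rate, which is the statistical version of "little to no profit in the long run."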
Let’s look only at games where the line moved a lot (3 or more points). When we narrow it down to just these games, the rate jumps up to 54.9%, comfortably above the 52.4% break-even mark. It also makes some sense: a line that moves more than 2.5 points (player injuries aside) suggests that Vegas is scrambling and desperately trying to get people on the losing side of the line.
a=sum(df4$direction=="towards loser" & abs(df4$line_movement)>2.5)
b=sum(df4$direction=="towards winner"& abs(df4$line_movement)>2.5)
a
## [1] 185
b
## [1] 152
a/(a+b)
## [1] 0.5489614
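The break-even arithmetic can also be written as an expected profit per bet. Risking $110 to win $100, a bet that wins with probability p is worth 100p - 110(1 - p) = 210p - 110 dollars on average. The sketch below plugs in the observed win rates; treating an observed rate as the true rate is an assumption, and with only 337 games in the 3-or-more subset the 95% interval for that rate is wide.

```r
# Expected profit per bet when risking $110 to win $100 at win rate p
expected_profit <- function(p) 100 * p - 110 * (1 - p)

rates <- c(all_moves = 1025 / 1952, three_plus = 185 / 337)
ev <- expected_profit(rates)
round(ev, 2)                                   # about 0.27 and 5.28 dollars per bet

# 95% normal-approximation interval for the 3-or-more win rate (337 games)
p  <- rates[["three_plus"]]
ci <- p + c(-1, 1) * qnorm(0.975) * sqrt(p * (1 - p) / 337)
round(ci, 3)                                   # about 0.496 to 0.602
ci[1] < 110 / 210                              # TRUE: lower end is below break-even
```

At 54.9% the expected profit is about $5.28 per bet, but the lower end of the interval sits below break-even, so this subset is suggestive rather than proven.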
After looking further into how the size of the line movement relates to the outcome, I found games where the line moved exactly half a point to be very interesting. Since 2012, a whopping 56.3% of those games were won against the spread by the team whose line got worse! After thinking about it for a while, it does make sense. When Vegas moves the line half a point, they are telling us that the sharp money is on the side the line moved away from. The small line movement is Vegas turning their hand face up and hoping that you won’t look. They want you to pay attention to the distractions of starting quarterbacks, winning streaks, and home and away splits. All the while, their very small or very large movement of the line is a singing canary in a coal mine, warning you about where the bookies want your money to go!
a=sum(df4$direction=="towards loser" & abs(df4$line_movement)==.5)
b=sum(df4$direction=="towards winner"& abs(df4$line_movement)==.5)
a
## [1] 320
b
## [1] 248
a/(a+b)
## [1] 0.5633803
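The same two `sum()` calls are repeated for each movement filter. A sketch that tabulates every movement size at once with dplyr is below; it runs on a few made-up rows and assumes `direction` is a plain character column (in the real data that would mean something like `unlist(df4$direction)` first).

```r
library(dplyr)

# Made-up rows; "none" marks a game where the line never moved.
moves <- tibble(
  line_movement = c(0.5, -0.5, 0.5, 3, -3.5, 0),
  direction = c("towards loser", "towards loser", "towards winner",
                "towards loser", "towards winner", "none")
)

# One row per movement size: how often the line moved towards the ATS loser.
by_size <- moves %>%
  filter(direction != "none") %>%
  mutate(move_size = abs(line_movement)) %>%
  group_by(move_size) %>%
  summarise(
    towards_loser  = sum(direction == "towards loser"),
    towards_winner = sum(direction == "towards winner"),
    loser_rate     = towards_loser / (towards_loser + towards_winner),
    .groups = "drop"
  )
by_size
```

Filtering this table (for example `move_size == 0.5` or `move_size > 2.5`) reproduces each of the hand-written splits without repeating the counting code.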
Qualifications: Before you take out a loan and start hammering these 0.5-point and 3-or-more-point games, we need to consider a few important things. First, past success does not guarantee future success. When you look at thousands of different variables, some of them will show statistical anomalies. In other words, if we flip a coin 30 times, we wouldn’t expect to get 25 heads. However, if we flipped 10,000 coins 30 times each, it would be quite likely that one or two of them landed heads at least 25 times. It would be unwise to bet on those coins to come up heads more often than tails in the future. That being said, I don’t think that is the case here, and I believe line movement really does tell a big part of the story. The second, more problematic point is that Vegas is constantly changing its strategy to disrupt trends like the one we have found. To investigate, I did a year-by-year breakdown. The results of betting to win $100 on every qualifying game under each strategy are below, and as you can see, the pattern changes in the last couple of years. Is this Vegas catching on to our trend, or simply an outlier in the data? It is very tough to say.
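The coin-flip intuition can be checked with the binomial distribution in base R. The numbers below are exact binomial calculations for the example in the paragraph (30 flips per coin, 10,000 coins), not anything derived from the betting data.

```r
flips <- 30
coins <- 10000

# Chance that one fair coin shows 25 or more heads in 30 flips
p_one <- pbinom(24, size = flips, prob = 0.5, lower.tail = FALSE)

# Expected number of such coins, and chance that at least one of 10,000 does it
expected <- coins * p_one
p_any    <- 1 - (1 - p_one)^coins

signif(c(p_one = p_one, expected = expected, p_any = p_any), 3)
```

A single coin hits 25 or more heads only about 0.016% of the time, yet with 10,000 coins we expect about 1.6 such coins and have roughly an 80% chance of seeing at least one.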
Below is a data table and graph of the year-by-year breakdown of line movement direction across all games. The profit variable in the tables and graphs simulates a wager of $110 to win $100 on every game, always betting against the line movement.
year=(2012:2021)
towards_winner=c(100,108,93,97,96,100,99,118,116,62)
towards_loser=c(103,111,112,113,112,120,118,116,120,69)
winning_percent=towards_loser/(towards_winner+towards_loser)
profit=(towards_loser*100 - towards_winner*110)
overall=data.frame(year,towards_winner,towards_loser,winning_percent,profit)
overall
## year towards_winner towards_loser winning_percent profit
## 1 2012 100 103 0.5073892 -700
## 2 2013 108 111 0.5068493 -780
## 3 2014 93 112 0.5463415 970
## 4 2015 97 113 0.5380952 630
## 5 2016 96 112 0.5384615 640
## 6 2017 100 120 0.5454545 1000
## 7 2018 99 118 0.5437788 910
## 8 2019 118 116 0.4957265 -1380
## 9 2020 116 120 0.5084746 -760
## 10 2021 62 69 0.5267176 80
plot(year, profit, type = "l", lty = 1, main = "Betting to Win $100 Against the Line Movement, All Games")
abline(h=0, col="red")
#net profit if we bet to win $100 every time
sum(profit)
## [1] 610
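The yearly counts above are typed in by hand. As a sketch of deriving them straight from the game-level data instead, the block below groups by season; the `season` column is hypothetical (the raw Date column only holds month and day, so a season would have to be built during cleaning), and the rows are made up.

```r
library(dplyr)

# Made-up game-level rows; `season` is an assumed column built during cleaning.
games <- tibble(
  season    = c(2012, 2012, 2012, 2013, 2013),
  direction = c("towards loser", "towards winner", "towards loser",
                "none", "towards winner")
)

by_season <- games %>%
  filter(direction != "none") %>%              # drop games where the line never moved
  group_by(season) %>%
  summarise(
    towards_winner = sum(direction == "towards winner"),
    towards_loser  = sum(direction == "towards loser"),
    .groups = "drop"
  ) %>%
  mutate(
    winning_percent = towards_loser / (towards_winner + towards_loser),
    # Risk $110 to win $100 on every bet against the line movement
    profit = towards_loser * 100 - towards_winner * 110
  )
by_season
```

Building the table this way keeps the yearly rows consistent with the overall counts, since both come from the same game-level data.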
Next, we look at the year-by-year breakdown of only the half-point line movements.
year=(2012:2021)
towards_winner=c(36,21,23,22,32,28,29,28,29,11)
towards_loser=c(41,28,35,33,38,41,42,27,25,9)
winning_percent=towards_loser/(towards_winner+towards_loser)
profit=(towards_loser*100 - towards_winner*110)
half_point=data.frame(year,towards_winner,towards_loser,winning_percent,profit)
half_point
## year towards_winner towards_loser winning_percent profit
## 1 2012 36 41 0.5324675 140
## 2 2013 21 28 0.5714286 490
## 3 2014 23 35 0.6034483 970
## 4 2015 22 33 0.6000000 880
## 5 2016 32 38 0.5428571 280
## 6 2017 28 41 0.5942029 1020
## 7 2018 29 42 0.5915493 1010
## 8 2019 28 27 0.4909091 -380
## 9 2020 29 25 0.4629630 -690
## 10 2021 11 9 0.4500000 -310
plot(year, profit, type = "l", lty = 1, main = "Betting to Win $100 Against Line Movements of 0.5")
abline(h=0, col="red")
#net profit if we bet to win $100 every time
sum(profit)
## [1] 3410
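The yearly profit plot makes it hard to see the running total that the conclusion talks about. Plotting the cumulative sum of the half-point profits (numbers copied from the table above) shows it more directly.

```r
year <- 2012:2021
towards_winner <- c(36, 21, 23, 22, 32, 28, 29, 28, 29, 11)
towards_loser  <- c(41, 28, 35, 33, 38, 41, 42, 27, 25, 9)

# Yearly profit from risking $110 to win $100 against each half-point move
profit <- towards_loser * 100 - towards_winner * 110
cumulative <- cumsum(profit)

plot(year, cumulative, type = "l", lty = 1,
     main = "Cumulative Profit Betting Against Line Movements of 0.5")
abline(h = 0, col = "red")

max(cumulative) - cumulative[length(cumulative)]   # amount given back since the peak
```

The running total peaks at $4,790 after 2018 and gives back $1,380 over the following three seasons, ending at $3,410.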
Finally, we look at the year-by-year breakdown of our line movements of 3 or more points.
year=(2012:2021)
towards_winner=c(11,11,12,11,14,16,13,29,24,23)
towards_loser=c(8,16,16,16,18,12,16,29,41,28)
winning_percent=towards_loser/(towards_winner+towards_loser)
profit=(towards_loser*100 - towards_winner*110)
atleast_three=data.frame(year,towards_winner,towards_loser,winning_percent,profit)
atleast_three
## year towards_winner towards_loser winning_percent profit
## 1 2012 11 8 0.4210526 -410
## 2 2013 11 16 0.5925926 390
## 3 2014 12 16 0.5714286 280
## 4 2015 11 16 0.5925926 390
## 5 2016 14 18 0.5625000 260
## 6 2017 16 12 0.4285714 -560
## 7 2018 13 16 0.5517241 170
## 8 2019 29 29 0.5000000 -290
## 9 2020 24 41 0.6307692 1460
## 10 2021 23 28 0.5490196 270
plot(year, profit, type = "l", lty = 1, main = "Betting to Win $100 Against Line Movements of at Least 3")
abline(h=0, col="red")
#net profit if we bet to win $100 every time
sum(profit)
## [1] 1960
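Total profit favors strategies that place more bets, so it can help to divide by the amount risked. The hypothetical `roi()` helper below does that for the three year-by-year tables (counts copied from above), assuming $110 is risked on every bet.

```r
# Hypothetical helper: profit divided by total amount risked ($110 per bet).
roi <- function(towards_winner, towards_loser) {
  bets   <- sum(towards_winner) + sum(towards_loser)
  profit <- sum(towards_loser) * 100 - sum(towards_winner) * 110
  profit / (bets * 110)
}

all_games  <- roi(c(100, 108, 93, 97, 96, 100, 99, 118, 116, 62),
                  c(103, 111, 112, 113, 112, 120, 118, 116, 120, 69))
half_point <- roi(c(36, 21, 23, 22, 32, 28, 29, 28, 29, 11),
                  c(41, 28, 35, 33, 38, 41, 42, 27, 25, 9))
three_plus <- roi(c(11, 11, 12, 11, 14, 16, 13, 29, 24, 23),
                  c(8, 16, 16, 16, 18, 12, 16, 29, 41, 28))

round(c(all_games = all_games, half_point = half_point, three_plus = three_plus), 4)
```

That works out to roughly 0.3% for all games, 5.4% for half-point moves, and 4.9% for moves of 3 or more points.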
Conclusion: If you decide to bet on the team that the line moves away from, it certainly seems to improve your odds of winning. Betting against half-point line changes has made a killing over the last 10 years ($3,410 from betting to win $100 on each game at -110 odds), but a recent dip may be enough for us to want to wait and see what happens next. Our other strategy, betting against line changes of 3 or more points, has also done very well, swapping some profit for more consistency and perhaps the start of a dominant upward trend ($1,960 in profit, with $1,460 of it coming from the 2020 season!). My final statistical and gambling recommendation is that, at the very least, you heavily consider the line movement in all of your bets and listen to the canary’s profitable song!