Overview

This report analyzes data from NFL games, focusing on over/under betting lines. An over/under line predicts the total number of points that will be scored in a game, and a bettor wagers on whether the actual total will finish over or under that line. Oddsmakers ideally want half of the bets on the over and half on the under. We evaluate the data to see whether any circumstances skew results toward one side more often than the other.
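As a small illustration of how a game is graded against the line, here is a minimal sketch. `classify_ou` is a hypothetical helper, and the scores and lines are made up for the example; they are not taken from the dataset. A total that lands exactly on the line is a push.

```r
# Hypothetical helper (not part of the original analysis): grade one or more
# games against the over/under line.
classify_ou <- function(score_home, score_away, line) {
  total <- score_home + score_away
  ifelse(total > line, "Over",
         ifelse(total < line, "Under", "Push"))
}

# Made-up scores and lines, purely for illustration
classify_ou(score_home = c(24, 31, 20),
            score_away = c(17, 28, 21),
            line       = c(44.5, 47.5, 41))
# "Under" "Over" "Push"
```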

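library(dplyr)    # joins, mutate, filter, count
library(tidyr)    # complete()
library(ggplot2)  # all plots

scoresdata <- read.csv("spreadspoke_scores.csv", stringsAsFactors = FALSE)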
nfl_teams  <- read.csv("nfl_teams.csv", stringsAsFactors = FALSE)

df  <- scoresdata
df2 <- nfl_teams


stopifnot(all(c("team_name","team_id") %in% names(df2)))

df <- df %>%
  left_join(df2 %>% select(team_name, team_id),
            by = c("team_home" = "team_name")) %>%
  rename(home_team_id = team_id) %>%
  left_join(df2 %>% select(team_name, team_id),
            by = c("team_away" = "team_name")) %>%
  rename(away_team_id = team_id)


df <- df %>%
  mutate(
    point_dif    = abs(score_away - score_home),
    total_points = score_away + score_home,
    ou_result = ifelse(total_points > over_under_line, "Over",
                       ifelse(total_points < over_under_line, "Under", "Push"))
  )

# Weather
df_overunder <- df %>%
  select(schedule_season, weather_detail, over_under_line, total_points, ou_result) %>%
  filter(!is.na(over_under_line))

# treat blanks / indoor / Sunny as clear (case-insensitive for "sunny")
df_overunder$weather_detail <- ifelse(
  tolower(trimws(df_overunder$weather_detail)) %in% c("", "indoor", "sunny"),
  "clear",
  df_overunder$weather_detail
)

df_overunder_inclement <- df_overunder %>%
  filter(weather_detail != "clear")

ggplot(df_overunder_inclement, aes(x = weather_detail, fill = ou_result)) +
  geom_bar(position = "dodge") +
  labs(title = "Over vs Under by Weather Condition",
       x = "Weather Condition", y = "Number of Games", fill = "O/U result") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

This clustered bar chart shows the over/under results of games played in inclement weather, grouped by weather condition. If the over or under rate for a weather condition differs significantly from 50%, the oddsmakers are overestimating or underestimating the effect of that condition. Games played in rain hit the under more than 60% of the time, which suggests that the oddsmakers underestimate the effect of rain and that, all else being equal, the under is the better bet in rainy games.
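The bar chart reports counts, so a share such as the one quoted for rain has to be computed separately. The following is a minimal sketch of one way to do that, assuming a data frame with an `ou_result` column like the one built earlier. `ou_rate_by_group` is a hypothetical helper, not part of the original analysis; it drops pushes, computes the under rate per group, and adds an exact binomial p-value against a 50/50 split.

```r
library(dplyr)

# Hypothetical helper: under rate per group (pushes dropped), plus a two-sided
# exact binomial p-value against an even Over/Under split.
ou_rate_by_group <- function(data, group_col) {
  data %>%
    filter(ou_result %in% c("Over", "Under")) %>%
    group_by({{ group_col }}) %>%
    summarise(
      games      = n(),
      unders     = sum(ou_result == "Under"),
      under_rate = unders / games,
      p_value    = binom.test(unders, games, p = 0.5)$p.value,
      .groups    = "drop"
    )
}
```

Calling `ou_rate_by_group(df_overunder_inclement, weather_detail)` would return one row per weather condition. Conditions with only a handful of games can show a lopsided rate by chance, so the `games` column matters as much as `under_rate`.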

# Inclement-only by year (2008–2019)
df_overunder2 <- df_overunder %>%
  mutate(weather_simple = ifelse(weather_detail == "clear", "clear", "inclement")) %>%
  filter(weather_simple == "inclement", schedule_season >= 2008, schedule_season <= 2019)

ggplot(df_overunder2, aes(x = schedule_season, fill = ou_result)) +
  geom_bar(position = "dodge") +
  scale_x_continuous(breaks = seq(2008, 2019, 1)) +
  labs(title = "Over vs Under Inclement Weather by Year",
       x = "Year", y = "Number of Games", fill = "O/U result") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

The chart above shows more generally whether the effect of inclement weather is being overestimated or underestimated, and how that has changed over the years. In 2010 and 2013 the effect of inclement weather was overestimated, since more games hit the over. From 2015 to 2018, however, the under hit more often in inclement-weather games. This suggests a change in approach by the oddsmakers, who now appear to underestimate the effect of inclement weather.

# Weekly line (regular season 1–18)
dfwklyou <- df %>%
  select(schedule_week, schedule_season, ou_result, over_under_line, total_points) %>%
  filter(!schedule_week %in% c("Superbowl","Wildcard","Division","Conference"),
         !is.na(ou_result)) %>%
  mutate(schedule_week = suppressWarnings(as.numeric(schedule_week))) %>%
  filter(!is.na(schedule_week), schedule_week >= 1, schedule_week <= 18) %>%
  count(schedule_week, ou_result)

ggplot(dfwklyou, aes(x = schedule_week, y = n, color = ou_result)) +
  geom_line(linewidth = 1) +
  geom_point() +
  scale_x_continuous(breaks = 1:18) +
  labs(title = "Over vs Under by Week",
       x = "Week", y = "Number of Games", color = "O/U Result") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

How far into the season the league is may also affect results against the over/under line. The line graph above shows that the under hits more often in weeks 1-3, 6, 11, and 14. In the early weeks of the season, players are working with new teammates or coaches and have not yet hit their stride. The oddsmakers have not adjusted enough to compensate for this, so an under bet is recommended in these weeks.
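Because not every week contains the same number of games, the weekly comparison can be easier to read as shares than as raw counts. A minimal sketch, assuming a count table shaped like `dfwklyou` (columns `schedule_week`, `ou_result`, `n`); `week_shares` is a hypothetical helper, not part of the original analysis.

```r
library(dplyr)

# Hypothetical helper: convert weekly Over/Under counts into within-week shares.
# Pushes are dropped so the two shares in each week sum to 1.
week_shares <- function(counts) {
  counts %>%
    filter(ou_result != "Push") %>%
    group_by(schedule_week) %>%
    mutate(share = n / sum(n)) %>%
    ungroup()
}
```

Plotting `share` instead of `n` on the y-axis, with a reference line at 0.5, would show directly which weeks lean toward the under.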

# Heat map (season × O/U) with labels, 1990–2024
heat_df <- df %>%
  filter(!is.na(schedule_season), !is.na(ou_result)) %>%
  group_by(schedule_season, ou_result) %>%
  summarise(n = n(), .groups = "drop") %>%
  complete(
    schedule_season = seq(min(schedule_season), max(schedule_season), by = 1),
    ou_result = c("Over","Under","Push"),
    fill = list(n = 0)
  ) %>%
  mutate(ou_result = factor(ou_result, levels = c("Over","Under","Push"))) %>%
  filter(schedule_season >= 1990, schedule_season <= 2024)

ggplot(heat_df, aes(x = schedule_season, y = ou_result, fill = n)) +
  geom_tile(color = "white") +
  geom_text(aes(label = n), color = "black", size = 3) +   # print the game count on each tile
  scale_x_continuous(breaks = seq(min(heat_df$schedule_season),
                                  max(heat_df$schedule_season), by = 2)) +
  scale_fill_gradient(low = "red", high = "yellow") +
  labs(title = "NFL O/U/Push Counts by Season",
       x = "Season", y = "Result", fill = "Games") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

The heat map above shows the trend in total over and under results by year. The bright yellow in the Under row for 2021-2023 shows that the under hit more frequently in those years. The under appeared to be trending upward, but the oddsmakers seem to have compensated for it in 2024.

# Team trellis pies (facet by home team ID; home games only)
df_team_pies <- df %>%
  filter(!is.na(ou_result)) %>%
  filter(schedule_season >= 1990 & schedule_season <= 2024) %>%
  group_by(home_team_id, ou_result) %>%
  summarise(n = n(), .groups = "drop") %>%
  group_by(home_team_id) %>%
  mutate(prop = n / sum(n))

ggplot(df_team_pies, aes(x = "", y = prop, fill = ou_result)) +
  geom_bar(stat = "identity", width = 1, color = "white") +
  coord_polar(theta = "y") +
  facet_wrap(~ home_team_id) +
  scale_fill_manual(values = c("Over" = "blue", "Under" = "red", "Push" = "green")) +
  theme_void() +
  labs(
    title = "Over/Under/Push Distribution by Team",
    fill = "O/U Result") +
  theme(
    plot.title = element_text(hjust = 0.5),
    strip.text = element_text(size = 8),
    legend.position = "bottom")

Another factor that may influence whether the over or the under is a good bet is whether a certain team consistently hits one or the other. The trellis chart above shows how frequently each team's home games went over or under from the 1990 through 2024 seasons. Most teams are close to 50/50, but some show more variance. Tampa Bay, Kansas City, and the Los Angeles Chargers all seem to hit the under more frequently than the over. You may want to think twice before placing an over bet on those teams.
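When many teams are compared at once, a few will drift away from 50/50 purely by chance. One way to check whether a team's lean is more than noise is sketched below, assuming per-team counts of unders and decided games (pushes excluded) have already been tallied. `team_under_tests` is a hypothetical helper, and the Holm adjustment is one choice of multiple-comparison correction among several.

```r
# Hypothetical helper: exact binomial p-value per team against a 50/50 split,
# followed by a Holm adjustment for testing many teams at once.
team_under_tests <- function(team, unders, games) {
  p_raw <- mapply(function(u, g) binom.test(u, g, p = 0.5)$p.value,
                  unders, games)
  data.frame(
    team       = team,
    under_rate = unders / games,
    p_raw      = p_raw,
    p_holm     = p.adjust(p_raw, method = "holm")
  )
}
```

A team whose adjusted p-value stays small has a lean that is hard to explain by chance alone; a team that only looks lopsided in the pie chart may not.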

# Overall pie (1990–2024)
df1990to2024 = df %>%
  filter(schedule_season >= 1990 & schedule_season <= 2024)

years = sort(unique(df1990to2024$schedule_season))
years_to_show = years[seq(1,length(years), by = 2)]

df1990to2024$ou_result = factor(df1990to2024$ou_result,
                                levels = c("Over", "Under", "Push"))

df_pie <- df1990to2024 %>%
  filter(!is.na(ou_result)) %>%
  count(ou_result, name = "count") %>%
  mutate(prop = count / sum(count))

ggplot(df_pie, aes(x = "", y = prop, fill = ou_result)) +
  geom_bar(stat = "identity", width = 1, color = "white") +
  coord_polar(theta = "y") +
  geom_text(aes(label = scales::percent(prop, accuracy = 0.1)),
            position = position_stack(vjust = 0.5), color = "black", size = 4) +
  scale_fill_manual(values = c("Over" = "lightblue", "Under" = "pink", "Push" = "lightgreen")) +
  labs(title = "NFL Over/Under Results (1990–2024)",
       fill = "O/U Result") +
  theme_void() +
  theme(plot.title = element_text(hjust = 0.5))

Conclusion

Many factors affect whether the over or the under is a good bet in an NFL game. As the pie chart above shows, the oddsmakers are very accurate overall, setting lines such that about half of games finish over and half finish under. To bet successfully, you must drill down into the data to find every advantage you can. Time of season, team, weather conditions, and recent oddsmaker trends should all be considered before placing a bet.
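How large an edge needs to be depends on the price of the bet, which this report does not cover. As a rough sketch, assuming totals are priced at American odds of -110 on each side (an assumption, not something taken from the dataset), the break-even win rate can be computed as follows. `break_even` is a hypothetical helper.

```r
# Hypothetical helper: break-even win probability for American odds.
# Negative odds: risk |odds| to win 100. Positive odds: risk 100 to win odds.
break_even <- function(american_odds) {
  ifelse(american_odds < 0,
         -american_odds / (-american_odds + 100),
         100 / (american_odds + 100))
}

break_even(-110)  # 110 / 210, about 0.524
```

Under that pricing assumption, a situation has to win noticeably more than half the time before it becomes profitable, so a split only slightly off 50/50 would not be enough.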