There really is nothing like the first weekend of March Madness. I have fond middle and high school memories of huddling around a classmate’s computer as they streamed the first March Madness game at the end of the school day. We watched with bated breath during the 8 vs. 9 seed matchup, convinced the result was pivotal to our brackets’ success. We could not have been more wrong. The points awarded for each correct pick increase as the tournament goes on. In a standard bracket where points double each round, correctly picking a team to reach the Elite Eight is worth four times as much as a correct pick in the round of 64, and the point distribution only gets more skewed from there. Basically, brackets are won in April, not March.

With that in mind, I built this year’s bracket starting from the Championship game and working backward. I used KenPom efficiency data to make my predictions, aiming to correctly pick at least six Elite Eight teams and at least three Final Four teams. KenPom’s adjusted offensive efficiency is the number of points a team scores per 100 possessions, adjusted for the strength of the competition it faced. Adjusted defensive efficiency is the same measure using points allowed instead, so on defense a lower number is better. My R analysis below compares previous Elite Eight teams, Final Four teams, runners-up, and champions with the top KenPom teams heading into the 2025 tournament. Throughout this post I will incorporate a bit of analysis, reflecting on my results at the very end. Enjoy!
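To make the scoring and efficiency ideas concrete, here is a small R sketch. It assumes the common 10-20-40-80-160-320 scoring scheme (points double each round); your pool may use different values. The second half computes raw efficiency as points per 100 possessions using a common box-score possession estimate, FGA - ORB + TOV + 0.475 × FTA. The 0.475 coefficient and the helper names `estimate_possessions()` and `raw_efficiency()` are my own assumptions for illustration, and KenPom’s opponent adjustment is not reproduced.

```r
# Assumed scoring: 10 points per correct pick in the round of 64,
# doubling every round (10-20-40-80-160-320).
rounds <- c("Round of 64", "Round of 32", "Sweet 16",
            "Elite Eight", "Final Four", "Championship")
games_in_round <- c(32, 16, 8, 4, 2, 1)
points_per_pick <- 10 * 2^(0:5)

scoring <- data.frame(
  round = rounds,
  games = games_in_round,
  points_per_pick = points_per_pick,
  # More games but fewer points per game: every round totals the same
  round_total = games_in_round * points_per_pick,
  stringsAsFactors = FALSE
)
print(scoring)

# Winning a Sweet 16 game is what sends a team to the Elite Eight,
# so this is the value of one correct Elite Eight team vs. a round-of-64 pick
sweet16_multiplier <- scoring$points_per_pick[scoring$round == "Sweet 16"] /
  scoring$points_per_pick[scoring$round == "Round of 64"]
sweet16_multiplier  # 4

# Raw (unadjusted) efficiency: points per 100 possessions.
# Hypothetical helpers; the 0.475 free-throw coefficient is an assumption,
# and no adjustment for opponent strength is made here.
estimate_possessions <- function(fga, orb, tov, fta, ft_coef = 0.475) {
  fga - orb + tov + ft_coef * fta
}

raw_efficiency <- function(points, possessions) {
  100 * points / possessions
}

# Hypothetical box score: 80 points on 60 FGA, 10 ORB, 12 TOV, 20 FTA
poss <- estimate_possessions(fga = 60, orb = 10, tov = 12, fta = 20)
raw_efficiency(points = 80, possessions = poss)  # roughly 111.9
```

Every round totals 320 points under this scheme, so the single championship pick is worth as much as all 32 round-of-64 games combined.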

final_four_data <- data.frame(
  Year = c(2024, 2024, 2024, 2024, 2023, 2023, 2023, 2023, 2022, 2022, 2022, 2022, 2021, 2021, 2021, 2021),
  Team = c('UConn (24)', 'Purdue', 'Alabama', 'N.C. State',
           "UConn (23)", "Miami", "San Diego State", "FAU",
           "Kansas", "Duke", "UNC", "Villanova",
           "Baylor", "Gonzaga", "Houston", "UCLA"),
  Off_Rating = c(127.5, 125.2, 126.0, 114.3,
                 120.8, 119.1, 110.8, 115.1, 
                 119.2, 121.1, 114.4, 117.5, 
                 125.0, 126.4, 118.3, 116.9), 
  Def_Rating = c(91.1, 94.6, 103.0, 98.4,
                 90.9, 101.2, 90.4, 95.7, 
                 91.7, 95.9, 94.3, 92.9, 
                 91.1, 89.9, 89.6, 94.5),
  Outcome = c("Champion", "Runner-up", "Final Four", "Final Four",
              "Champion", "Final Four", "Runner-up", "Final Four",
              "Champion", "Final Four", "Runner-up", "Final Four",
              "Champion", "Runner-up", "Final Four", "Final Four")
)

str(final_four_data)
## 'data.frame':    16 obs. of  5 variables:
##  $ Year      : num  2024 2024 2024 2024 2023 ...
##  $ Team      : chr  "UConn (24)" "Purdue" "Alabama" "N.C. State" ...
##  $ Off_Rating: num  128 125 126 114 121 ...
##  $ Def_Rating: num  91.1 94.6 103 98.4 90.9 ...
##  $ Outcome   : chr  "Champion" "Runner-up" "Final Four" "Final Four" ...
summary(final_four_data)
##       Year          Team             Off_Rating      Def_Rating    
##  Min.   :2021   Length:16          Min.   :110.8   Min.   : 89.60  
##  1st Qu.:2022   Class :character   1st Qu.:116.5   1st Qu.: 91.05  
##  Median :2022   Mode  :character   Median :119.2   Median : 93.60  
##  Mean   :2022                      Mean   :119.8   Mean   : 94.08  
##  3rd Qu.:2023                      3rd Qu.:125.0   3rd Qu.: 95.75  
##  Max.   :2024                      Max.   :127.5   Max.   :103.00  
##    Outcome         
##  Length:16         
##  Class :character  
##  Mode  :character  
##                    
##                    
## 
cor(final_four_data$Off_Rating, final_four_data$Def_Rating)
## [1] -0.01159829
library(ggplot2)

ggplot(final_four_data, aes(x = Off_Rating, y = Def_Rating, color = Outcome)) +
  geom_point(size = 3) +
  geom_text(aes(label = Team), vjust = -1, hjust = .5, size = 3) +
  labs(title = "Offensive vs. Defensive Rating in Final Four Teams (2021 - 2024)",
       x = "Offensive Rating",
       y = "Defensive Rating") +
  theme_minimal()

I created a data frame by entering the KenPom efficiency ratings of each Final Four team since 2021, and it serves as the basis for my visualizations going forward. Teams with strong offensive and defensive efficiency (a high offensive rating and a low defensive rating) live in the bottom-right corner of the plot and tend to perform well in the tournament. Teams that are elite on only one end of the floor can make the Final Four, but recent results don’t suggest they can win the whole thing.
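One way to put a number on the “bottom-right corner” is net efficiency: offensive rating minus defensive rating. The sketch below rebuilds the Final Four table under a hypothetical name (`ff`) so it runs on its own, then averages net efficiency by outcome and finds the best net rating in each year’s Final Four. It is a quick summary of 16 teams, not a model.

```r
ff <- data.frame(
  Year = rep(c(2024, 2023, 2022, 2021), each = 4),
  Team = c("UConn (24)", "Purdue", "Alabama", "N.C. State",
           "UConn (23)", "Miami", "San Diego State", "FAU",
           "Kansas", "Duke", "UNC", "Villanova",
           "Baylor", "Gonzaga", "Houston", "UCLA"),
  Off_Rating = c(127.5, 125.2, 126.0, 114.3, 120.8, 119.1, 110.8, 115.1,
                 119.2, 121.1, 114.4, 117.5, 125.0, 126.4, 118.3, 116.9),
  Def_Rating = c(91.1, 94.6, 103.0, 98.4, 90.9, 101.2, 90.4, 95.7,
                 91.7, 95.9, 94.3, 92.9, 91.1, 89.9, 89.6, 94.5),
  Outcome = c("Champion", "Runner-up", "Final Four", "Final Four",
              "Champion", "Final Four", "Runner-up", "Final Four",
              "Champion", "Final Four", "Runner-up", "Final Four",
              "Champion", "Runner-up", "Final Four", "Final Four"),
  stringsAsFactors = FALSE
)

# Net efficiency: higher is better (high offense, low defense)
ff$Net_Rating <- ff$Off_Rating - ff$Def_Rating

# Average net efficiency by tournament outcome
net_by_outcome <- aggregate(Net_Rating ~ Outcome, data = ff, FUN = mean)
print(net_by_outcome)

# Team with the best net efficiency in each year's Final Four
best_net_by_year <- sapply(split(ff, ff$Year),
                           function(d) d$Team[which.max(d$Net_Rating)])
print(best_net_by_year)
```

On this data, champions average about 31.9 points per 100 possessions of net efficiency versus about 22.1 for the other Final Four teams, and the champion had the best net rating in its Final Four every year except 2021, when runner-up Gonzaga edged Baylor.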

elite_eight_twentyfour <- data.frame(
  team = c("UConn", "Purdue", "Alabama", "N.C. State", "Avg Top 50 Team",
           "Illinois", "Clemson", "Tennessee", "Duke"),
  Off_Rating = c(127.5, 125.2, 126.0, 114.3, 115.16, 125.5, 117.7, 116.8, 121.6),
  Def_Rating = c(91.1, 94.6, 103.0, 98.4, 97.35, 101, 98.3, 90.2, 95.2),
  Outcome = c("Champion", "Runner-up", "Final Four", "Final Four", "N/A",
              "Elite Eight", "Elite Eight", "Elite Eight", "Elite Eight")
)

ggplot(elite_eight_twentyfour, aes(x = Off_Rating, y = Def_Rating, color = Outcome)) +
  geom_point(size = 3) +
  geom_text(aes(label = team), vjust = -1, hjust = 0.5, size = 3) +
  labs(title = "Offensive vs. Defensive Rating in Elite Eight Teams (2024)",
       x = "Offensive Rating",
       y = "Defensive Rating") +
  theme_minimal()

To make the Final Four, you have to either completely dominate one facet of the game or be a well-balanced team capable of excelling at both ends of the floor… or be N.C. State. A program with an established penchant for miracle tournament runs did the unthinkable again in 2024. N.C. State was slightly worse than an average top-50 KenPom team on both offense (114.3 vs. 115.16) and defense (98.4 vs. 97.35), and its offense trailed each of the other three Final Four teams by more than 10 points per 100 possessions. Still, N.C. State’s run highlights the potential importance of momentum in tournament success (they won the ACC tournament), something that’s hard to capture in statistical analysis.
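The gaps against the average top-50 team are easy to compute from the 2024 Elite Eight table. In this sketch (hypothetical names `e8`, `off_vs_avg`, and `def_vs_avg`), the defensive subtraction is flipped so that a positive number always means better than average.

```r
e8 <- data.frame(
  team = c("UConn", "Purdue", "Alabama", "N.C. State",
           "Illinois", "Clemson", "Tennessee", "Duke"),
  Off_Rating = c(127.5, 125.2, 126.0, 114.3, 125.5, 117.7, 116.8, 121.6),
  Def_Rating = c(91.1, 94.6, 103.0, 98.4, 101, 98.3, 90.2, 95.2),
  stringsAsFactors = FALSE
)

# Baseline taken from the "Avg Top 50 Team" row above
avg_off <- 115.16
avg_def <- 97.35

# Positive = better than the average top-50 team on that end.
# Lower defensive ratings are better, so defense is avg minus team.
e8$off_vs_avg <- e8$Off_Rating - avg_off
e8$def_vs_avg <- avg_def - e8$Def_Rating

# Sorted from weakest to strongest combined margin
print(e8[order(e8$off_vs_avg + e8$def_vs_avg), ])

# Elite Eight teams below the top-50 average on BOTH ends
e8$team[e8$off_vs_avg < 0 & e8$def_vs_avg < 0]
```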

top_ten_twentyfour <- data.frame(
  team = c("UConn (1)", "Purdue (3)", "Houston (2)", "Auburn (4)", "Avg Top 50 Team",
           "Illinois (10)", "Arizona (6)", "Tennessee (5)", "Duke (7)", "Iowa St (8)",
           "North Carolina (9)"),
  Off_Rating = c(127.5, 125.2, 118.9, 120.4, 115.16, 125.5, 120.2, 116.8, 121.6, 113.9, 119.7),
  Def_Rating = c(91.1, 94.6, 87.7, 92.4, 97.35, 101, 93.7, 90.2, 95.2, 87.5, 93.5),
  Outcome = c("Champion", "Runner-up", "Sweet 16", "First Round", "N/A", "Elite Eight",
              "Sweet 16", "Elite Eight", "Elite Eight", "Sweet 16", "Sweet 16")
)

ggplot(top_ten_twentyfour, aes(x = Off_Rating, y = Def_Rating, color = Outcome)) +
  geom_point(size = 3) +
  geom_text(aes(label = team), vjust = -1, hjust = 0.5, size = 3) +
  labs(title = "Top 10 KenPom Teams and Their March Madness Results (2024)",
       x = "Offensive Rating",
       y = "Defensive Rating") +
  theme_minimal()

Houston was ranked #2 in KenPom in 2024, but their team was decimated by injury, another underrated factor in tournament success. Each of the top ten KenPom teams had its fair share of success in 2024 except for Auburn, which inexplicably lost to #13 seed Yale.
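To compare these outcomes numerically, each team’s exit round can be mapped to tournament wins. A minimal sketch, assuming “First Round” means a loss in the round of 64 (zero wins); the `wins_by_outcome` lookup, including its unused “Second Round” entry, is my own addition for illustration.

```r
# 2024 top-10 KenPom teams from the table above, listed in KenPom order
top10 <- data.frame(
  team = c("UConn (1)", "Houston (2)", "Purdue (3)", "Auburn (4)", "Tennessee (5)",
           "Arizona (6)", "Duke (7)", "Iowa St (8)", "North Carolina (9)", "Illinois (10)"),
  Off_Rating = c(127.5, 118.9, 125.2, 120.4, 116.8, 120.2, 121.6, 113.9, 119.7, 125.5),
  Def_Rating = c(91.1, 87.7, 94.6, 92.4, 90.2, 93.7, 95.2, 87.5, 93.5, 101),
  Outcome = c("Champion", "Sweet 16", "Runner-up", "First Round", "Elite Eight",
              "Sweet 16", "Elite Eight", "Sweet 16", "Sweet 16", "Elite Eight"),
  stringsAsFactors = FALSE
)

# Assumed mapping from where a run ended to games won
wins_by_outcome <- c("First Round" = 0, "Second Round" = 1, "Sweet 16" = 2,
                     "Elite Eight" = 3, "Final Four" = 4, "Runner-up" = 5,
                     "Champion" = 6)
top10$Wins <- unname(wins_by_outcome[top10$Outcome])
top10$Net_Rating <- top10$Off_Rating - top10$Def_Rating

# Highest net efficiency first, next to how far each team went
print(top10[order(-top10$Net_Rating), c("team", "Net_Rating", "Outcome", "Wins")])
```

Five of the ten reached at least the Elite Eight, which makes Auburn’s zero stand out.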

The 2023 tournament was defined by improper seeding. According to KenPom, FAU (#9 seed) had a real case to be seeded higher than Kansas St. (#3 seed), so their success should not have been all that surprising despite their seed line. UConn being a #4 seed was also ridiculous given that they clearly had the offensive and defensive chops to be among the tournament favorites.

It is also not very surprising that #1 seeds like Purdue and Kansas flamed out early in 2023 given their mediocre net efficiency, although Houston’s inability to make it out of the Sweet 16 is hard to explain from the data.

According to KenPom, the teams that made at least the Elite Eight in 2022 were only slightly above average (besides Kansas). Most of them were guided by a top-of-the-line coach, so coaching is another variable worth considering to some degree.

There were a lot of quality teams in 2022, but none that really established itself on both sides of the ball (other than Gonzaga).

Baylor and Gonzaga were on a collision course for the 2021 final based on their insane efficiency. Since people disproportionately picked Zaga to win it all, picking Baylor would have been a really good value.

This year’s tournament does not have a problem with improper seeding, that much is clear. The #1 seeds are all complete wagons that stand head and shoulders above their peers. After reflecting on the results of the past few seasons, these variables seem the most important for picking teams in the Elite Eight and beyond: net efficiency, injuries, coaching, and momentum.

As such, I am picking Duke to win the Championship this year, assuming Cooper Flagg is at full health. Even among these titans, they are in a tier of their own in terms of efficiency. Jon Scheyer is a good coach who took a pretty underwhelming squad to the Elite Eight last year. Add an ACC Championship run (without Flagg), and Duke checks all the boxes.

I have Florida as the runner-up for similar reasons: insane efficiency, no injuries, and an SEC title. Their coach does not have a single tournament win under his belt, though, and I think that will matter at the end of games. I have them beating Auburn in the Final Four; Auburn has similar efficiency but has been heading in the wrong direction over its last 10 games. I picked Tennessee as my last Final Four team. They are the 5th-best team in terms of efficiency and are comparable enough to Houston that I will give them the edge (also to avoid four #1 seeds in the Final Four). I also don’t buy St. John’s as a serious contender given their unusual efficiency profile, even though they check the other boxes.

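final_four_data$Champion <- ifelse(final_four_data$Outcome == "Champion", 1, 0)
str(final_four_data)
## 'data.frame':    16 obs. of  6 variables: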
##  $ Year      : num  2024 2024 2024 2024 2023 ...
##  $ Team      : chr  "UConn (24)" "Purdue" "Alabama" "N.C. State" ...
##  $ Off_Rating: num  128 125 126 114 121 ...
##  $ Def_Rating: num  91.1 94.6 103 98.4 90.9 ...
##  $ Outcome   : chr  "Champion" "Runner-up" "Final Four" "Final Four" ...
##  $ Champion  : num  1 0 0 0 1 0 0 0 1 0 ...
summary(final_four_data)
##       Year          Team             Off_Rating      Def_Rating    
##  Min.   :2021   Length:16          Min.   :110.8   Min.   : 89.60  
##  1st Qu.:2022   Class :character   1st Qu.:116.5   1st Qu.: 91.05  
##  Median :2022   Mode  :character   Median :119.2   Median : 93.60  
##  Mean   :2022                      Mean   :119.8   Mean   : 94.08  
##  3rd Qu.:2023                      3rd Qu.:125.0   3rd Qu.: 95.75  
##  Max.   :2024                      Max.   :127.5   Max.   :103.00  
##    Outcome             Champion   
##  Length:16          Min.   :0.00  
##  Class :character   1st Qu.:0.00  
##  Mode  :character   Median :0.00  
##                     Mean   :0.25  
##                     3rd Qu.:0.25  
##                     Max.   :1.00
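The `Champion` indicator makes it easy to pull out the champions and describe where they sit. A rough sketch, assuming a simple rectangular “champion range” bounded by the weakest champion offense and the worst champion defense from 2021-2024; `in_champion_range()` is a hypothetical helper, and with only four champions this is a heuristic, not a predictive model.

```r
ff <- data.frame(
  Year = rep(c(2024, 2023, 2022, 2021), each = 4),
  Team = c("UConn (24)", "Purdue", "Alabama", "N.C. State",
           "UConn (23)", "Miami", "San Diego State", "FAU",
           "Kansas", "Duke", "UNC", "Villanova",
           "Baylor", "Gonzaga", "Houston", "UCLA"),
  Off_Rating = c(127.5, 125.2, 126.0, 114.3, 120.8, 119.1, 110.8, 115.1,
                 119.2, 121.1, 114.4, 117.5, 125.0, 126.4, 118.3, 116.9),
  Def_Rating = c(91.1, 94.6, 103.0, 98.4, 90.9, 101.2, 90.4, 95.7,
                 91.7, 95.9, 94.3, 92.9, 91.1, 89.9, 89.6, 94.5),
  Outcome = c("Champion", "Runner-up", "Final Four", "Final Four",
              "Champion", "Final Four", "Runner-up", "Final Four",
              "Champion", "Final Four", "Runner-up", "Final Four",
              "Champion", "Runner-up", "Final Four", "Final Four"),
  stringsAsFactors = FALSE
)
ff$Champion <- ifelse(ff$Outcome == "Champion", 1, 0)

champs <- ff[ff$Champion == 1, ]
# Weakest offense and worst (highest) defense among the champions
min_champ_off <- min(champs$Off_Rating)
max_champ_def <- max(champs$Def_Rating)

# Hypothetical helper: TRUE when ratings fall inside the champions' range
in_champion_range <- function(off, def) {
  off >= min_champ_off & def <= max_champ_def
}

ff$In_Champ_Range <- in_champion_range(ff$Off_Rating, ff$Def_Rating)
print(ff[ff$In_Champ_Range, c("Year", "Team", "Outcome")])
```

On the 2021-2024 data, the range is an offensive rating of at least 119.2 and a defensive rating of at most 91.7, and the only non-champion inside it is 2021 runner-up Gonzaga. A 2025 team’s KenPom ratings could be passed to `in_champion_range()` the same way.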

Looking at where the last few champions sit in terms of efficiency, Florida and Auburn seem to have what it takes to win. But Duke has better efficiency than last year’s UConn team, which was a total buzzsaw. Yeah, I’m taking Duke and not looking back.