Introduction

KenPom (kenpom.com), developed by statistician Ken Pomeroy, is the most widely cited advanced analytics system in college basketball. It ranks every Division I team using adjusted offensive efficiency (points scored per 100 possessions vs. average opponent) and adjusted defensive efficiency (points allowed per 100 possessions vs. average opponent), combined into an overall efficiency margin rating.
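
As a rough illustration of the efficiency idea described above, the sketch below computes raw (unadjusted) points per 100 possessions and a margin taken as offense minus defense. The season totals are made-up numbers, and KenPom's opponent adjustment is not reproduced; only the per-100-possession arithmetic is shown.

```r
# Hypothetical season totals for one team (illustrative values, not real data)
points_scored  <- 2450
points_allowed <- 2100
possessions    <- 2200

# Raw efficiency: points per 100 possessions
off_eff <- 100 * points_scored / possessions
def_eff <- 100 * points_allowed / possessions

# Efficiency margin: offense minus defense (higher is better)
eff_margin <- off_eff - def_eff

round(c(offense = off_eff, defense = def_eff, margin = eff_margin), 1)
```

A ranking then follows from sorting teams by margin, which is why a team can rank well overall while being lopsided toward one side of the ball.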

This project investigates whether KenPom rankings are a reliable predictor of NCAA Tournament success, specifically among the last 20 national champions (2005–2024).


Research Questions

  1. How did each national champion’s pre-tournament KenPom overall rank compare to the field? Were champions typically top-ranked teams?

  2. Is there a statistical relationship between KenPom overall rank and winning the national championship?

  3. How do champions compare on offensive efficiency rank vs. defensive efficiency rank — which side of the ball matters more?

  4. What was the “biggest upset” by KenPom standards, and what was the “most expected” champion?

  5. How often did the #1 KenPom team actually win the title?


Data Extraction

KenPom’s full historical data requires a paid subscription. However, pre-tournament rankings for every champion are well documented in sports journalism and other publicly available sources (NCAA.com, ESPN, collegebasketballtimes.com). We compile the dataset manually below, cross-referencing these sources.

All rankings used are pre-tournament (at time of Selection Sunday), which is the appropriate measure — we are asking whether KenPom predicted success, not whether it described it after the fact.

# Install any missing packages before loading
packages <- c("ggplot2", "dplyr", "tidyr", "knitr", "scales", "ggrepel")
missing_pkgs <- packages[!(packages %in% rownames(installed.packages()))]
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cran.rstudio.com/")
}

library(ggplot2)
library(dplyr)
library(tidyr)
library(knitr)
library(scales)
library(ggrepel)
# ── Manual dataset compiled from NCAA.com, ESPN, collegebasketballtimes.com,
#    statsbywill.com, and foxsports.com. All rankings are pre-tournament KenPom.
# ── Sources cross-referenced for accuracy.

champions <- data.frame(
  year      = 2005:2024,
  champion  = c("North Carolina", "Florida", "Florida", "Kansas",
                "North Carolina", "Duke", "Connecticut", "Kentucky",
                "Louisville", "Connecticut", "Duke", "Villanova",
                "North Carolina", "Villanova", "Virginia", "Baylor",
                "Kansas", "Connecticut", "Connecticut", "Florida"),
  seed      = c(1, 3, 1, 1, 1, 1, 3, 1, 1, 7, 1, 2, 1, 1, 1, 1, 1, 1, 1, 3),
  kenpom_overall = c(2, 6, 2, 1, 3, 2, 16, 1, 2, 25, 2, 1, 3, 1, 1, 2, 3, 1, 1, 3),
  kenpom_off     = c(4, 14, 1, 1, 1, 4, 22, 2, 17, 58, 3, 3, 9, 1, 2, 2, 6, 1, 3, 5),
  kenpom_def     = c(6, 18, 14, 3, 39, 5, 25, 6, 1, 12, 11, 5, 11, 11, 5, 22, 29, 3, 3, 8),
  stringsAsFactors = FALSE
)

# Add derived columns
champions <- champions %>%
  mutate(
    top5     = kenpom_overall <= 5,
    top10    = kenpom_overall <= 10,
    was_no1  = kenpom_overall == 1,
    off_def_diff = abs(kenpom_off - kenpom_def),
    # Note: a tie (equal offensive and defensive rank) is labeled "Defense" here
    better_side = ifelse(kenpom_off < kenpom_def, "Offense", "Defense")
  )

kable(
  champions %>% select(year, champion, seed, kenpom_overall, kenpom_off, kenpom_def),
  col.names = c("Year", "Champion", "Seed", "KenPom Overall", "KenPom Offense", "KenPom Defense"),
  caption = "Table 1: Last 20 NCAA Champions with Pre-Tournament KenPom Rankings"
)
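
Table 1: Last 20 NCAA Champions with Pre-Tournament KenPom Rankings

| Year | Champion       | Seed | KenPom Overall | KenPom Offense | KenPom Defense |
|-----:|:---------------|-----:|---------------:|---------------:|---------------:|
| 2005 | North Carolina |    1 |              2 |              4 |              6 |
| 2006 | Florida        |    3 |              6 |             14 |             18 |
| 2007 | Florida        |    1 |              2 |              1 |             14 |
| 2008 | Kansas         |    1 |              1 |              1 |              3 |
| 2009 | North Carolina |    1 |              3 |              1 |             39 |
| 2010 | Duke           |    1 |              2 |              4 |              5 |
| 2011 | Connecticut    |    3 |             16 |             22 |             25 |
| 2012 | Kentucky       |    1 |              1 |              2 |              6 |
| 2013 | Louisville     |    1 |              2 |             17 |              1 |
| 2014 | Connecticut    |    7 |             25 |             58 |             12 |
| 2015 | Duke           |    1 |              2 |              3 |             11 |
| 2016 | Villanova      |    2 |              1 |              3 |              5 |
| 2017 | North Carolina |    1 |              3 |              9 |             11 |
| 2018 | Villanova      |    1 |              1 |              1 |             11 |
| 2019 | Virginia       |    1 |              1 |              2 |              5 |
| 2020 | Baylor         |    1 |              2 |              2 |             22 |
| 2021 | Kansas         |    1 |              3 |              6 |             29 |
| 2022 | Connecticut    |    1 |              1 |              1 |              3 |
| 2023 | Connecticut    |    1 |              1 |              3 |              3 |
| 2024 | Florida        |    3 |              3 |              5 |              8 |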
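
Because the dataset is typed in by hand, a few mechanical checks can catch entry slips such as a missing value or a zero rank. The sketch below is one possible way to do that; `is_valid_rank` is a hypothetical helper, and the ceiling of 400 is an assumed loose upper bound on the number of ranked teams, not a figure from KenPom.

```r
# Values copied from the champions data frame above
kenpom_overall <- c(2, 6, 2, 1, 3, 2, 16, 1, 2, 25, 2, 1, 3, 1, 1, 2, 3, 1, 1, 3)
kenpom_off     <- c(4, 14, 1, 1, 1, 4, 22, 2, 17, 58, 3, 3, 9, 1, 2, 2, 6, 1, 3, 5)
kenpom_def     <- c(6, 18, 14, 3, 39, 5, 25, 6, 1, 12, 11, 5, 11, 11, 5, 22, 29, 3, 3, 8)
seed           <- c(1, 3, 1, 1, 1, 1, 3, 1, 1, 7, 1, 2, 1, 1, 1, 1, 1, 1, 1, 3)

# Hypothetical helper: TRUE if x holds exactly n whole numbers within [lo, hi]
is_valid_rank <- function(x, n, lo, hi) {
  length(x) == n && !anyNA(x) && all(x == round(x)) && all(x >= lo & x <= hi)
}

# Assumed bounds: ranks between 1 and 400, seeds between 1 and 16
checks <- c(
  overall = is_valid_rank(kenpom_overall, 20, 1, 400),
  offense = is_valid_rank(kenpom_off, 20, 1, 400),
  defense = is_valid_rank(kenpom_def, 20, 1, 400),
  seed    = is_valid_rank(seed, 20, 1, 16)
)
checks
```

These checks only confirm that the vectors are well formed; they cannot confirm that a given rank matches what KenPom actually published.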

Data Description

cat("── Dataset dimensions ──\n")
## ── Dataset dimensions ──
cat("Rows:", nrow(champions), " | Columns:", ncol(champions), "\n\n")
## Rows: 20  | Columns: 11
cat("── KenPom Overall Rank Summary ──\n")
## ── KenPom Overall Rank Summary ──
summary(champions$kenpom_overall)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0     1.0     2.0     3.9     3.0    25.0
cat("\n── KenPom Offensive Rank Summary ──\n")
## 
## ── KenPom Offensive Rank Summary ──
summary(champions$kenpom_off)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    1.75    3.00    7.95    6.75   58.00
cat("\n── KenPom Defensive Rank Summary ──\n")
## 
## ── KenPom Defensive Rank Summary ──
summary(champions$kenpom_def)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    5.00    9.50   11.85   15.00   39.00
desc_stats <- champions %>%
  summarise(
    mean_overall  = round(mean(kenpom_overall), 1),
    median_overall = median(kenpom_overall),
    sd_overall    = round(sd(kenpom_overall), 1),
    min_overall   = min(kenpom_overall),
    max_overall   = max(kenpom_overall),
    pct_top5      = paste0(round(mean(top5) * 100), "%"),
    pct_top10     = paste0(round(mean(top10) * 100), "%"),
    times_no1_won = sum(was_no1)
  )

kable(t(desc_stats),
      col.names = "Value",
      caption = "Table 2: Descriptive Statistics for Champions' KenPom Overall Rank")

Table 2: Descriptive Statistics for Champions’ KenPom Overall Rank

|                | Value |
|:---------------|:------|
| mean_overall   | 3.9   |
| median_overall | 2     |
| sd_overall     | 6     |
| min_overall    | 1     |
| max_overall    | 25    |
| pct_top5       | 85%   |
| pct_top10      | 90%   |
| times_no1_won  | 7     |

Analysis & Visualizations

Q1 — How did champions rank overall in KenPom?

ggplot(champions, aes(x = year, y = kenpom_overall)) +
  geom_col(aes(fill = kenpom_overall <= 5), width = 0.7) +
  geom_text(aes(label = paste0("#", kenpom_overall, "\n", champion)),
            vjust = -0.3, size = 2.8, lineheight = 0.9) +
  scale_fill_manual(values = c("TRUE" = "#1D9E75", "FALSE" = "#D85A30"),
                    labels = c("TRUE" = "Top 5", "FALSE" = "Outside Top 5"),
                    name = "") +
  scale_y_continuous(limits = c(0, 32),
                     breaks = c(1, 5, 10, 15, 20, 25),
                     labels = c("#1", "#5", "#10", "#15", "#20", "#25")) +
  scale_x_continuous(breaks = 2005:2024) +
  labs(title = "Pre-Tournament KenPom Overall Rank of Each National Champion (2005–2024)",
       subtitle = "Lower rank = better. Green bars = top-5 KenPom teams that won.",
       x = "Year", y = "KenPom Overall Rank") +
  theme_minimal(base_size = 11) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "top",
        panel.grid.major.x = element_blank())

Observation: 17 of 20 champions (85%) were ranked in the top 5 by KenPom entering the tournament, and 18 of 20 (90%) were in the top 10. The two exceptions are both Connecticut teams: 2011 (#16) and 2014 (#25). The 2014 team is the clear outlier, winning the title from #25 overall.
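
The top-5 and top-10 shares quoted above are two points on a cumulative curve. A minimal sketch of the same calculation for several cutoffs follows; the cutoffs are arbitrary choices for illustration.

```r
# Pre-tournament overall ranks copied from the champions data frame
kenpom_overall <- c(2, 6, 2, 1, 3, 2, 16, 1, 2, 25, 2, 1, 3, 1, 1, 2, 3, 1, 1, 3)

# Share of champions ranked at or inside each cutoff
cutoffs <- c(1, 3, 5, 10, 25)
share_within <- sapply(cutoffs, function(k) mean(kenpom_overall <= k))
names(share_within) <- paste0("top", cutoffs)

share_within
```

In this dataset the share does not change between the top 3 and the top 5, so every top-5 champion was in fact a top-3 team.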


Q2 — Statistical relationship between KenPom rank and winning

# Simulate an approximate distribution of tournament teams' KenPom ranks.
# Assumption: 68 teams per year with ranks drawn uniformly from 1-50.
# This is a rough illustration, not the real tournament field.
set.seed(42)
compare_df <- bind_rows(
  data.frame(group = "Tournament field (approx.)",
             rank  = sample(1:50, 20 * 68, replace = TRUE)),
  data.frame(group = "Champions", rank = champions$kenpom_overall)
)

ggplot(compare_df, aes(x = rank, fill = group)) +
  geom_histogram(data = filter(compare_df, group != "Champions"),
                 binwidth = 5, alpha = 0.4, color = "white") +
  geom_dotplot(data = filter(compare_df, group == "Champions"),
               binwidth = 1, dotsize = 1.5, fill = "#D85A30", color = "#D85A30") +
  scale_fill_manual(values = c("Tournament field (approx.)" = "#378ADD",
                               "Champions" = "#D85A30"), name = "") +
  labs(title = "KenPom Rank Distribution: Champions vs. Tournament Field",
       subtitle = "Each orange dot = one champion. Blue bars = approximate rank distribution of all 68-team tournament fields.",
       x = "KenPom Overall Rank (pre-tournament)", y = "Count") +
  theme_minimal(base_size = 11) +
  theme(legend.position = "top")

# Compare the champions' mean and median rank with a naive benchmark:
# the midpoint (~34) of a field ranked 1-68. Lower rank = better.
cat("── Mean KenPom rank of all 20 champions:", round(mean(champions$kenpom_overall), 1), "\n")
## ── Mean KenPom rank of all 20 champions: 3.9
cat("── Median KenPom rank of all 20 champions:", median(champions$kenpom_overall), "\n")
## ── Median KenPom rank of all 20 champions: 2
cat("── If KenPom were random, expected mean rank among 68-team field: ~34\n")
## ── If KenPom were random, expected mean rank among 68-team field: ~34
cat("── Champions average", round(mean(champions$kenpom_overall), 1),
    "— significantly better than the field average of ~34\n")
## ── Champions average 3.9 — significantly better than the field average of ~34
# One-sample t-test: are champions' KenPom ranks significantly better than the field average (~34)?
t.test(champions$kenpom_overall, mu = 34, alternative = "less")
## 
##  One Sample t-test
## 
## data:  champions$kenpom_overall
## t = -22.504, df = 19, p-value = 1.846e-15
## alternative hypothesis: true mean is less than 34
## 95 percent confidence interval:
##      -Inf 6.212742
## sample estimates:
## mean of x 
##       3.9

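Observation: The one-sample t-test asks whether champions’ KenPom ranks are significantly lower (better) than 34, the approximate midpoint of a field ranked 1 through 68. With t = -22.5 on 19 degrees of freedom and p < 0.001, the test rejects that benchmark decisively: champions are not randomly distributed across KenPom rankings. Two caveats apply. Ranks are bounded and heavily right-skewed, so the t-test’s normality assumption holds only loosely, and the benchmark of 34 is conservative because automatic qualifiers often rank well outside the top 68.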
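
Since ranks are skewed, a simulation that makes no normality assumption is a useful cross-check on the t-test. The sketch below assumes the same naive null model as the benchmark of 34, namely that each champion's rank is drawn uniformly from 1 to 68; that null model and the number of simulations are illustrative choices, not part of the original analysis.

```r
# Pre-tournament overall ranks copied from the champions data frame
kenpom_overall <- c(2, 6, 2, 1, 3, 2, 16, 1, 2, 25, 2, 1, 3, 1, 1, 2, 3, 1, 1, 3)
observed_mean  <- mean(kenpom_overall)

# Assumed null model: each champion's rank is uniform on 1-68
set.seed(42)
n_sims <- 10000
sim_means <- replicate(
  n_sims,
  mean(sample(1:68, length(kenpom_overall), replace = TRUE))
)

# Monte Carlo p-value: share of simulated means at least as low as observed,
# with a +1 correction so the estimate is never exactly zero
p_value <- (sum(sim_means <= observed_mean) + 1) / (n_sims + 1)
p_value
```

A p-value near the floor of 1 / (n_sims + 1) means none of the simulated 20-year stretches produced an average rank as low as the champions' actual average.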


Q3 — Offense vs. Defense: which side matters more?

ggplot(champions, aes(x = kenpom_off, y = kenpom_def, label = paste0(substr(year,3,4), " ", champion))) +
  geom_point(aes(color = better_side), size = 3) +
  geom_text_repel(size = 2.8, max.overlaps = 20) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "gray50") +
  scale_color_manual(values = c("Offense" = "#378ADD", "Defense" = "#D85A30"),
                     name = "Stronger side") +
  scale_x_continuous(limits = c(0, 65), breaks = c(1, 5, 10, 20, 30, 40, 58)) +
  scale_y_continuous(limits = c(0, 45), breaks = c(1, 5, 10, 20, 30, 40)) +
  labs(title = "Champions: Offensive Rank vs. Defensive Rank (Pre-Tournament)",
       subtitle = "Points below the diagonal line = team was better defensively than offensively.\nLower rank = better.",
       x = "KenPom Offensive Efficiency Rank",
       y = "KenPom Defensive Efficiency Rank") +
  theme_minimal(base_size = 11) +
  theme(legend.position = "top")

cat("── Champions stronger offensively:", sum(champions$better_side == "Offense"), "of 20\n")
## ── Champions stronger offensively: 17 of 20
cat("── Champions stronger defensively:", sum(champions$better_side == "Defense"), "of 20\n\n")
## ── Champions stronger defensively: 3 of 20
cat("── Average offensive rank of champions:", round(mean(champions$kenpom_off), 1), "\n")
## ── Average offensive rank of champions: 8
cat("── Average defensive rank of champions:", round(mean(champions$kenpom_def), 1), "\n")
## ── Average defensive rank of champions: 11.8
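
The counts above fold one tie into the "Defense" group, because `ifelse()` sends equal ranks to its second branch (the 2023 row has offense and defense both at #3). A sketch of a sign test that counts ties separately is shown below. It treats each champion as an independent observation, which is an assumption, and it describes the profile of champions rather than proving which side of the ball causes titles.

```r
# Offensive and defensive ranks copied from the champions data frame
kenpom_off <- c(4, 14, 1, 1, 1, 4, 22, 2, 17, 58, 3, 3, 9, 1, 2, 2, 6, 1, 3, 5)
kenpom_def <- c(6, 18, 14, 3, 39, 5, 25, 6, 1, 12, 11, 5, 11, 11, 5, 22, 29, 3, 3, 8)

# Lower rank = better, so offense is "stronger" when its rank is smaller
n_off_better <- sum(kenpom_off < kenpom_def)
n_def_better <- sum(kenpom_def < kenpom_off)
n_tied       <- sum(kenpom_off == kenpom_def)

# Sign test: drop ties; under H0 either side is equally likely to be stronger
sign_test <- binom.test(n_off_better, n_off_better + n_def_better, p = 0.5)

c(offense = n_off_better, defense = n_def_better, tied = n_tied)
sign_test$p.value
```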

Q4 — Biggest surprises and most expected champions

champions_sorted <- champions %>%
  arrange(desc(kenpom_overall)) %>%
  select(year, champion, seed, kenpom_overall, kenpom_off, kenpom_def)

cat("── Top 5 BIGGEST SURPRISES by KenPom (highest = worst rank = most unexpected) ──\n")
## ── Top 5 BIGGEST SURPRISES by KenPom (highest = worst rank = most unexpected) ──
kable(head(champions_sorted, 5),
      col.names = c("Year", "Champion", "Seed", "KenPom Overall", "KenPom Off", "KenPom Def"),
      caption = "Table 3: Most Unexpected Champions by KenPom")
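
Table 3: Most Unexpected Champions by KenPom

| Year | Champion       | Seed | KenPom Overall | KenPom Off | KenPom Def |
|-----:|:---------------|-----:|---------------:|-----------:|-----------:|
| 2014 | Connecticut    |    7 |             25 |         58 |         12 |
| 2011 | Connecticut    |    3 |             16 |         22 |         25 |
| 2006 | Florida        |    3 |              6 |         14 |         18 |
| 2009 | North Carolina |    1 |              3 |          1 |         39 |
| 2017 | North Carolina |    1 |              3 |          9 |         11 |
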
cat("\n── Top 5 MOST EXPECTED champions by KenPom (lowest rank = most expected) ──\n")
## 
## ── Top 5 MOST EXPECTED champions by KenPom (lowest rank = most expected) ──
kable(tail(champions_sorted, 5) %>% arrange(kenpom_overall),
      col.names = c("Year", "Champion", "Seed", "KenPom Overall", "KenPom Off", "KenPom Def"),
      caption = "Table 4: Most Expected Champions by KenPom")
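
Table 4: Most Expected Champions by KenPom

| Year | Champion    | Seed | KenPom Overall | KenPom Off | KenPom Def |
|-----:|:------------|-----:|---------------:|-----------:|-----------:|
| 2016 | Villanova   |    2 |              1 |          3 |          5 |
| 2018 | Villanova   |    1 |              1 |          1 |         11 |
| 2019 | Virginia    |    1 |              1 |          2 |          5 |
| 2022 | Connecticut |    1 |              1 |          1 |          3 |
| 2023 | Connecticut |    1 |              1 |          3 |          3 |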
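
Taking the first or last five rows of a sorted table cuts through ties. Seven champions share the #1 rank in Table 1, so a five-row table necessarily leaves two of them out (here the 2008 and 2012 rows). A minimal tie-aware sketch, using the year labels exactly as they appear in the dataset, keeps every champion tied for the best rank:

```r
# Year labels and overall ranks copied from the champions data frame
champs <- data.frame(
  year           = 2005:2024,
  kenpom_overall = c(2, 6, 2, 1, 3, 2, 16, 1, 2, 25, 2, 1, 3, 1, 1, 2, 3, 1, 1, 3)
)

# Keep every row tied for the best (lowest) pre-tournament rank
best <- champs[champs$kenpom_overall == min(champs$kenpom_overall), ]

best$year
```

The same idea applies to the "biggest surprises" table, where the cutoff at five rows splits the group of champions ranked #3.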

Q5 — How often did the #1 KenPom team win?

no1_summary <- data.frame(
  category = c("#1 KenPom won title", "#1 KenPom did not win"),
  count    = c(sum(champions$was_no1), 20 - sum(champions$was_no1))
)

ggplot(no1_summary, aes(x = category, y = count, fill = category)) +
  geom_col(width = 0.5) +
  geom_text(aes(label = paste0(count, " of 20\n(", round(count/20*100), "%)")),
            vjust = -0.4, size = 4) +
  scale_fill_manual(values = c("#1 KenPom won title" = "#1D9E75",
                               "#1 KenPom did not win" = "#D85A30")) +
  scale_y_continuous(limits = c(0, 20)) +
  labs(title = "Did the #1 KenPom Team Win the National Title? (2005–2024)",
       x = "", y = "Number of years") +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none")

cat("Years #1 KenPom won the title:", champions$year[champions$was_no1], "\n")
## Years #1 KenPom won the title: 2008 2012 2016 2018 2019 2022 2023
cat("Win rate for #1 KenPom team: ", round(mean(champions$was_no1) * 100), "%\n", sep = "")
## Win rate for #1 KenPom team: 35%
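
With only 20 tournaments, the 35% figure carries wide uncertainty. The sketch below puts an exact binomial confidence interval around it, assuming each year is an independent trial with the same underlying win probability, which is a simplification.

```r
# Counts taken from the analysis above: 7 titles for the #1 team in 20 years
n_years      <- 20
n_no1_titles <- 7

# Exact (Clopper-Pearson) 95% confidence interval from base R's binom.test()
ci <- binom.test(n_no1_titles, n_years)$conf.int

round(100 * c(lower = ci[1], estimate = n_no1_titles / n_years, upper = ci[2]), 1)
```

The interval is wide enough that the long-run win rate for the top-ranked team could plausibly be well below or well above one in three.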

Summary of Findings

findings <- data.frame(
  Finding = c(
    "Champions' average KenPom rank",
    "Median KenPom rank of champions",
    "Worst-ever KenPom rank to win title",
    "Champions ranked top 5 (pre-tourney)",
    "Champions ranked top 10 (pre-tourney)",
    "Times #1 KenPom team won the title",
    "Average offensive rank of champions",
    "Average defensive rank of champions"
  ),
  Result = c(
    paste0("#", round(mean(champions$kenpom_overall), 1)),
    paste0("#", median(champions$kenpom_overall)),
    paste0("#25 (2014 UConn)"),
    paste0(sum(champions$top5), " of 20 (", round(mean(champions$top5)*100), "%)"),
    paste0(sum(champions$top10), " of 20 (", round(mean(champions$top10)*100), "%)"),
    paste0(sum(champions$was_no1), " of 20 (", round(mean(champions$was_no1)*100), "%)"),
    paste0("#", round(mean(champions$kenpom_off), 1)),
    paste0("#", round(mean(champions$kenpom_def), 1))
  )
)

kable(findings, caption = "Table 5: Summary of Key Findings")

Table 5: Summary of Key Findings

| Finding                               | Result           |
|:--------------------------------------|:-----------------|
| Champions’ average KenPom rank        | #3.9             |
| Median KenPom rank of champions       | #2               |
| Worst-ever KenPom rank to win title   | #25 (2014 UConn) |
| Champions ranked top 5 (pre-tourney)  | 17 of 20 (85%)   |
| Champions ranked top 10 (pre-tourney) | 18 of 20 (90%)   |
| Times #1 KenPom team won the title    | 7 of 20 (35%)    |
| Average offensive rank of champions   | #8               |
| Average defensive rank of champions   | #11.8            |

Conclusion

The evidence suggests KenPom is a good but imperfect predictor of national champions. Of the last 20 champions, 18 (90%) entered the tournament ranked in the top 10 by KenPom, and the average champion ranked #3.9, far better than the roughly #34 average that random selection from a 68-team field would produce. The one-sample t-test (t = -22.5, p < 0.001) confirms this gap is statistically significant.

However, KenPom is not deterministic: the #1-ranked team has won the title only 7 times in 20 years (35%), and 2014 UConn proved a #25-ranked team can win it all. The NCAA Tournament introduces single-elimination variance that even the best regular-season efficiency metrics cannot fully capture. KenPom narrows the field of plausible champions dramatically — but the madness remains.

sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sonoma 14.4.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggrepel_0.9.8 scales_1.4.0  knitr_1.51    tidyr_1.3.2   dplyr_1.2.0  
## [6] ggplot2_4.0.2
## 
## loaded via a namespace (and not attached):
##  [1] vctrs_0.7.1        cli_3.6.3          rlang_1.1.7        xfun_0.57         
##  [5] purrr_1.0.4        generics_0.1.3     S7_0.2.1           jsonlite_2.0.0    
##  [9] labeling_0.4.3     glue_1.8.0         htmltools_0.5.8.1  sass_0.4.9        
## [13] rmarkdown_2.29     grid_4.4.2         tibble_3.2.1       evaluate_1.0.5    
## [17] jquerylib_0.1.4    fastmap_1.2.0      yaml_2.3.10        lifecycle_1.0.5   
## [21] compiler_4.4.2     RColorBrewer_1.1-3 Rcpp_1.0.14        pkgconfig_2.0.3   
## [25] rstudioapi_0.17.1  farver_2.1.2       digest_0.6.37      R6_2.5.1          
## [29] tidyselect_1.2.1   pillar_1.10.1      magrittr_2.0.3     bslib_0.9.0       
## [33] withr_3.0.2        tools_4.4.2        gtable_0.3.6       cachem_1.1.0