KenPom Data

Question I intend to answer

Using the KenPom team data, I will dig deeper into the overall performance of conferences by looking at each conference's average team rank, as well as its average offensive and defensive efficiency. We all know the “Power 5” conferences (SEC, ACC, Big Ten, Big 12, and Big East), but I am curious how the next five best conferences stack up. Viewing these stats at the conference level shows which conferences are the best from top to bottom, adds nuance to the question of which conference is the best, and reveals which conferences may be underrated or overrated.

How I intend to answer

I am using KenPom's team data, which I obtained by web scraping and then cleaned. I will use several visualizations to show which of these conferences performs best in each category. Looking at each conference as a whole, instead of only its top teams, gives a better look at its true depth. To do this, I group all the teams in my scraped data by conference and compute the summary statistics I want for each one.

Rows: 365 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): team_name, team_conf, team_record
dbl (4): team_rank, adj_em, adj_o, adj_d

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
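
The loading step itself is not shown above. As a rough sketch only, the column types reported in that message could be declared explicitly, which also quiets it. The two sample rows and the name `kenpom_sample` are made up for illustration; the real scraped file and its path are not given here.

```r
library(readr)

# Hypothetical two-row sample standing in for the scraped file.
# The values are invented; only the column names come from the output above.
sample_csv <- I("team_name,team_conf,team_record,team_rank,adj_em,adj_o,adj_d
Team A,SEC,20-5,1,30.1,125.0,94.9
Team B,MWC,18-7,60,12.3,112.0,99.7
")

# Declare every column type up front so read_csv() does not have to guess.
kenpom_sample <- read_csv(
  sample_csv,
  col_types = cols(
    team_name   = col_character(),
    team_conf   = col_character(),
    team_record = col_character(),
    team_rank   = col_double(),
    adj_em      = col_double(),
    adj_o       = col_double(),
    adj_d       = col_double()
  )
)
```

For the real data, `sample_csv` would be replaced by the path to the scraped CSV.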

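# Average rank and adjusted efficiencies for each conference
kenpom_conf <- kenpom_data %>% 
  group_by(team_conf) %>% 
  summarise(avg_team_rank = round(mean(team_rank), 2),
            avg_adj_o = round(mean(adj_o), 2),
            avg_adj_d = round(mean(adj_d), 2))

# Keep the 10 conferences with the best (lowest) average team rank
top_conf <- kenpom_conf %>%
  arrange(avg_team_rank) %>%
  head(10)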

All I am doing here is grouping teams into their conferences and averaging the stats I want to keep for my visuals. I then keep the 10 conferences with the best (lowest) average team rank.

top_conf %>% 
  ggplot(aes(x = team_conf, y = avg_team_rank)) +
  geom_col() +
  labs(title = "Average Rank by Conference",
       x = "Conference",
       y = "Avg_Rank")

Looking at this graph, the next best five are not nearly as strong. After the Power 5, there is a gap of almost 40 spots in average team rank before the next conference (the MWC), which shows that even the top-end teams cannot carry these conferences.
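
One way to put a number on a gap like this, instead of reading it off the chart, is to sort the conferences by average rank and take the difference between neighbors. This is only a sketch: the conference labels and averages in `toy_rank` are invented, not the real KenPom values.

```r
library(dplyr)
library(tibble)

# Made-up averages for illustration only.
toy_rank <- tibble(
  team_conf     = c("A", "B", "C", "D"),
  avg_team_rank = c(40, 55, 95, 110)
)

# Sort best to worst, then measure how far each conference sits
# behind the one ranked just ahead of it.
rank_gaps <- toy_rank %>%
  arrange(avg_team_rank) %>%
  mutate(gap_to_previous = avg_team_rank - lag(avg_team_rank))

# The row with the largest jump marks the biggest drop-off.
largest_gap <- rank_gaps %>%
  slice_max(gap_to_previous, n = 1)
```

Running the same steps on `top_conf` would show where the biggest drop-off falls in the real data.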

top_conf %>% 
  ggplot(aes(x = team_conf, y = avg_adj_o)) + 
  geom_col() + 
  labs(title = "Average Offense by Conference", 
       x = "Conference",
       y = "Avg_Offense")

One of the bigger surprises in my findings was that the Power 5 still sat atop all the others in the offensive metrics. I expected at least one or two Group of 5 conferences to beat a Power 5 conference, but that was not the case.

top_conf %>% 
  ggplot(aes(x = team_conf, y = avg_adj_d)) + 
  geom_col() + 
  labs(title = "Average Defense by Conference", 
       x = "Conference",
       y = "Avg_Defense")

In an even more surprising finding, given that all of the Group of 5 offenses were worse, you would at least expect their defenses to be better, but no. The Power 5 once again leads the pack (for adjusted defensive efficiency, lower is better), which shows why they are considered the Power 5.
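
Because the bars are drawn in alphabetical order by default, it can be hard to see who leads at a glance. A possible tweak, sketched here with invented numbers in `toy_defense`, is to sort the bars with `reorder()` so the best defense comes first.

```r
library(ggplot2)

# Made-up values for illustration only.
toy_defense <- data.frame(
  team_conf = c("A", "B", "C"),
  avg_adj_d = c(101.2, 98.4, 104.9)
)

# reorder() sorts the x-axis by avg_adj_d in ascending order,
# so the lowest (best) defensive efficiency is the leftmost bar.
defense_plot <- ggplot(toy_defense,
                       aes(x = reorder(team_conf, avg_adj_d), y = avg_adj_d)) +
  geom_col() +
  labs(title = "Average Defense by Conference (toy data)",
       x = "Conference",
       y = "Avg_Defense")
```

The same `reorder()` call would work for the rank chart, where lower is also better. For the offense chart, `reorder(team_conf, -avg_adj_o)` would put the highest value first.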

What does understanding this data show

A lot of people question why some mid-majors don’t make the tournament even when they have a much better record than Power 5 teams. This data shows how hard it is for those mid-majors to play good teams on a nightly basis. To build a strong conference, it seems almost more important to have a strong lower end than a few elite teams, since the lower end of the conference is what drags down all of the metrics for these Group of 5 conferences.
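
The averages used in this post hint at this depth argument but do not isolate it. One way to check it more directly would be to compare the best, median, and worst team rank within each conference. The sketch below uses invented teams in `toy_teams`; it is not a result from the real data.

```r
library(dplyr)
library(tibble)

# Made-up ranks for two hypothetical conferences.
toy_teams <- tibble(
  team_conf = c("X", "X", "X", "Y", "Y", "Y"),
  team_rank = c(5, 40, 60, 10, 150, 300)
)

# Summarise the top, middle, and bottom of each conference.
# A conference with a strong best_rank but a poor worst_rank is top-heavy.
conf_depth <- toy_teams %>%
  group_by(team_conf) %>%
  summarise(
    best_rank   = min(team_rank),
    median_rank = median(team_rank),
    worst_rank  = max(team_rank),
    .groups = "drop"
  )
```

In this toy example both conferences have an elite top team, but conference Y's median and worst ranks are far weaker, which is the pattern the conclusion describes.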