Assignment 7

Author

Kevin Howse

Introduction to the Data

This dataset leverages the Simple Rating System (SRS) to evaluate the relative strength of college basketball conferences over the last decade. By quantifying performance through this metric, we can objectively compare conference depth across different eras and identify which specific league earned “elite” status in any given season.

Beyond historical rankings, this data serves as a lens through which we can observe the evolving landscape of the sport. Specifically, it allows us to analyze whether the gap between “Power” and “Mid-Major” conferences is widening, providing statistical insight into whether the recent introduction of Name, Image, and Likeness (NIL) is concentrating talent and further consolidating power within the nation’s top programs.

Seasons are indexed by their ending year. Accordingly, the 2024 season denotes the full 2023–2024 campaign.

Quick Summary of SRS

Simple Rating System; a rating that takes into account average point differential and strength of schedule. The rating is denominated in points above/below average, where zero is average. Non-Division I games are excluded from the ratings. (Sports Reference)

Doug Drinen wrote an in-depth explanation that you can view here. (Pro Football Reference)
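To make the definition above concrete, here is a minimal sketch of how a rating of this kind can be computed: each team's rating is its average point margin plus the average rating of its opponents, solved by repeated substitution. The teams, margins, and variable names are made up for illustration, and this is not presented as Sports Reference's actual implementation.

```r
# Hypothetical results: one row per game, margin = home score - away score
games <- data.frame(
  home   = c("A", "B", "C", "A"),
  away   = c("B", "C", "A", "C"),
  margin = c(7, 3, -10, 4),
  stringsAsFactors = FALSE
)

# One row per team per game, with the margin seen from that team's side
long <- rbind(
  data.frame(team = games$home, opp = games$away, margin = games$margin,
             stringsAsFactors = FALSE),
  data.frame(team = games$away, opp = games$home, margin = -games$margin,
             stringsAsFactors = FALSE)
)

teams <- sort(unique(long$team))

# Average point differential per team
avg_margin <- sapply(teams, function(t) mean(long$margin[long$team == t]))

# Start every rating at zero, then repeatedly apply
# rating = average margin + average opponent rating (strength of schedule)
srs <- setNames(numeric(length(teams)), teams)
for (i in seq_len(200)) {
  sos <- sapply(teams, function(t) mean(srs[long$opp[long$team == t]]))
  srs <- avg_margin + sos
  srs <- srs - mean(srs)  # keep the league average at zero
}

round(srs, 2)
```

A conference-level SRS like the one used in this report would then be an aggregate of team ratings, though the exact aggregation is an assumption here.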

Ethics Statement

Sports Reference publicly displays SRS and SOS data with the explicit intent of informing sports analysis and debate — it’s a stats reference site, not a proprietary database. The data being collected is narrow and purposeful: conference-level aggregate metrics across a bounded set of seasons to answer a specific analytical question, not a broad crawl designed to replicate or compete with their product. There’s no commercial angle, no redistribution, and no attempt to circumvent a paywall. Scraping a handful of summary tables to answer a legitimate research question falls well within the spirit of fair use, and is functionally no different from a researcher manually recording numbers from the same pages — automation just removes the tedium. Sports Reference itself makes their intent explicit on their front page: “We democratize data, so our users enjoy, understand, and share the sports they love” — and answering a conference-level analytical question about SRS and SOS trends is exactly the kind of understanding and sharing that mission describes. Their encouragement of CSV exports and embeds further signals that analysis of this kind is precisely the use case they’re built for.

To ensure our data collection remains ethically sound and compliant with the site’s terms of service, we have analyzed the /robots.txt file (provided below). Our scraping protocol strictly avoids all restricted paths and directories. By adhering to these Disallow directives, we ensure that our automated retrieval process respects the site’s crawling boundaries and remains within the scope of permissible access.

```text
User-agent: AhrefsBot
Disallow: /

User-agent: Twitterbot
Disallow:

User-agent: GPTBot
Disallow: /

User-agent:
Disallow: /bb/
Disallow: /olympics/athlete_search.cgi
Disallow: /cfb/search.cgi
Disallow: /cbb/search.cgi
Disallow: /cbb/boxscores/index.cgi?
Disallow: /cbb/players//splits//?
Disallow: /cbb/players//splits/?
Disallow: /cbb/players//splits/
Disallow: /cbb/players//splits
Disallow: /cfb/boxscores/index.cgi?
Disallow: /cfb/players//splits//?
Disallow: /cfb/players//splits/?
Disallow: /cfb/players//splits/
Disallow: /cfb/players//splits
Disallow: /cfb/schools///splits//?
Disallow: /cfb/schools///splits/?
Disallow: /cfb/schools///splits/
Disallow: /cfb/schools///splits*
Disallow: /cfb/req/
Disallow: /cfb/short/
Disallow: /cfb/nocdn/
Disallow: /cbb/req/
Disallow: /cbb/short/
Disallow: /cbb/nocdn/

Sitemap: https://www.sports-reference.com/sitemaps/sitemap.xml
```
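As a rough check of the claim that the scraper stays out of restricted paths, the sketch below tests the season-page paths used later in this report against a few of the /cbb/ Disallow prefixes listed above. The helper `is_disallowed` is hypothetical, and it only does simple prefix matching, so wildcard rules are not handled.

```r
# A subset of the plain-prefix Disallow rules from the robots.txt above
disallowed_prefixes <- c(
  "/cbb/search.cgi",
  "/cbb/req/",
  "/cbb/short/",
  "/cbb/nocdn/"
)

# Hypothetical helper: TRUE if the path starts with any disallowed prefix
is_disallowed <- function(path, prefixes) {
  any(startsWith(path, prefixes))
}

# The season pages requested by the scraping loop in this report
season_paths <- paste0("/cbb/seasons/men/", 2016:2026, ".html")

blocked <- vapply(season_paths, is_disallowed, logical(1),
                  prefixes = disallowed_prefixes)
any(blocked)
```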

Identify myself to Sports Reference

httr::set_config(httr::user_agent("howsek@xavier.edu; +https://www.xavier.edu/business-analytics-program"))

Load Libraries

library(tidyverse)  # The tidyverse collection of packages
library(rvest)      # Useful tools for working with HTML and XML
library(xml2)       # Tools for parsing and navigating HTML and XML documents
library(httr)       # Useful tools for working with HTTP verbs and authentication
library(magrittr)   # Tools for piping code
library(knitr)      # Tools for dynamic report generation in Quarto

We need to scrape the HTML for each season page, looping through the 2016 to 2026 seasons (11 seasons in total):

# Make a list to hold each season's table and define the years to loop through
all_conferences <- list()
years <- 2016:2026

# Insert each year into the base URL
for (year in years) {
  
  url <- paste0("https://www.sports-reference.com/cbb/seasons/men/", year, ".html")
  cat("Scraping:", year, "\n")
  
  tryCatch({
    
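    # Fetch through httr so the user agent configured above is actually sent
    # (read_html(url) does not pick up httr's global config)
    response <- httr::GET(url)
    httr::stop_for_status(response)
    raw_html <- httr::content(response, as = "text", encoding = "UTF-8")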
    
    uncommented <- raw_html %>%
      str_replace_all("<!--", "") %>%
      str_replace_all("-->", "") %>%
      read_html()
    
    table <- uncommented %>%
      html_element("#conference-summary") %>%
      html_table()
    
    table$Season <- year
    all_conferences[[as.character(year)]] <- table
    
  }, error = function(e) {
    cat("Failed for year:", year, "-", e$message, "\n")
  })
  
  # Pause 3 seconds between requests to avoid overloading the server
  Sys.sleep(3)
}
Scraping: 2016 
Scraping: 2017 
Scraping: 2018 
Scraping: 2019 
Scraping: 2020 
Scraping: 2021 
Scraping: 2022 
Scraping: 2023 
Scraping: 2024 
Scraping: 2025 
Scraping: 2026 
# Combine the per-season tables into a single data frame
conference_data <- bind_rows(all_conferences)
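The loop above removes the HTML comment markers before parsing, presumably because the target table sits inside a comment in the page source and is therefore invisible to the parser. The offline sketch below illustrates the idea with a made-up HTML snippet; the markup and values are assumptions, not a copy of the real page.

```r
library(rvest)

# Made-up page: the table is wrapped in an HTML comment
page <- '<html><body>
<div id="wrapper"><!--
<table id="conference-summary">
<tr><th>Conference</th><th>SRS</th></tr>
<tr><td>Example Conference</td><td>1.23</td></tr>
</table>
--></div></body></html>'

# Parsed as-is, the selector finds nothing because the table is commented out
before <- html_element(read_html(page), "#conference-summary")
inherits(before, "xml_missing")

# After stripping the comment markers, the same selector finds the table
uncommented <- read_html(gsub("<!--|-->", "", page))
after <- html_table(html_element(uncommented, "#conference-summary"))
after
```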

Question #1: What are the top 5 conferences each year since 2016?

Example: 2017 Top 5 Conferences (by SRS): Big 12, ACC, Big East, Big Ten, and SEC

#Make a table of the top 5 from each year and rank them (by SRS)
top5 <- conference_data %>%
  select(Season, Conference, SRS, SOS) %>%
  filter(!is.na(SRS), SRS != "SRS") %>% 
  mutate(
    SRS = as.numeric(SRS),
    SOS = as.numeric(SOS)
  ) %>%
  filter(!is.na(SRS)) %>%
  group_by(Season) %>%
  arrange(desc(SRS)) %>%
  slice_head(n = 5) %>%
  mutate(Rank = row_number()) %>%
  ungroup()

print(top5, n = 55)
# A tibble: 55 × 5
   Season Conference                  SRS   SOS  Rank
    <int> <chr>                     <dbl> <dbl> <int>
 1   2016 Big 12 Conference         15.0  10.1      1
 2   2016 Atlantic Coast Conference 14.0   9.24     2
 3   2016 Pac-12 Conference         11.8   8.52     3
 4   2016 Big East Conference       11.6   7.92     4
 5   2016 Big Ten Conference        10.9   7.12     5
 6   2017 Big 12 Conference         17.4  11.3      1
 7   2017 Atlantic Coast Conference 14.7   9.67     2
 8   2017 Big East Conference       13.2   9.11     3
 9   2017 Big Ten Conference        12.6   8.75     4
10   2017 Southeastern Conference   11.7   8.82     5
11   2018 Big 12 Conference         15.2   9.83     1
12   2018 Big East Conference       14.2   9.28     2
13   2018 Atlantic Coast Conference 13.4   8.51     3
14   2018 Big Ten Conference        13.0   7.99     4
15   2018 Southeastern Conference   13     9.34     5
16   2019 Big Ten Conference        15.1  10.8      1
17   2019 Big 12 Conference         14.8  10.5      2
18   2019 Atlantic Coast Conference 14.2   9.06     3
19   2019 Southeastern Conference   12.7   9.29     4
20   2019 Big East Conference        9.92  6.96     5
21   2020 Big Ten Conference        15.1  10.5      1
22   2020 Big 12 Conference         14.4  10.0      2
23   2020 Big East Conference       13.7   9.19     3
24   2020 Pac-12 Conference         11.5   7.66     4
25   2020 Atlantic Coast Conference 11.1   7.73     5
26   2021 Big Ten Conference        14.9  11.9      1
27   2021 Pac-12 Conference         11.9   9.86     2
28   2021 Big 12 Conference         11.7   8.94     3
29   2021 Southeastern Conference   11.2   8.48     4
30   2021 Big East Conference       10.6   8.73     5
31   2022 Big 12 Conference         15.5   9.75     1
32   2022 Southeastern Conference   12.3   8.45     2
33   2022 Big Ten Conference        11.9   8.6      3
34   2022 Big East Conference       11.2   8.16     4
35   2022 Pac-12 Conference          9.49  7.35     5
36   2023 Big 12 Conference         15.6  10.2      1
37   2023 Big Ten Conference        12.5   8.75     2
38   2023 Big East Conference       11.3   8.25     3
39   2023 Southeastern Conference   11.0   7.66     4
40   2023 Pac-12 Conference         10.5   8.01     5
41   2024 Big 12 Conference         15.1   9.12     1
42   2024 Big Ten Conference        13.0   9.15     2
43   2024 Big East Conference       12.8   9.58     3
44   2024 Southeastern Conference   12.2   8.38     4
45   2024 Atlantic Coast Conference 11.4   8.31     5
46   2025 Southeastern Conference   18.8  12.1      1
47   2025 Big Ten Conference        16.2  11.3      2
48   2025 Big 12 Conference         15.9  11.4      3
49   2025 Big East Conference       12.1   8.63     4
50   2025 Atlantic Coast Conference  9.81  7.38     5
51   2026 Southeastern Conference   16.8  11        1
52   2026 Big 12 Conference         16.4  11.2      2
53   2026 Big Ten Conference        16.0  11.4      3
54   2026 Atlantic Coast Conference 13.9   9.04     4
55   2026 Big East Conference       12.7   9.49     5

It is cool to be able to pull up any season and immediately see which five conferences were operating at the highest level that year, something that would have taken hours of manual research before tools like this existed. For most of the decade the answer was predictable: the Big 12 was “the basketball conference,” finishing first or second in SRS in eight of the nine seasons from 2016 through 2024. But the last two seasons have told a different story, with the SEC claiming the number one spot back to back. Its 18.8 SRS in 2025 is the highest mark in the dataset, although its 2026 lead over the Big 12 was much narrower (16.8 to 16.4). The most fun part of this kind of data, though, is being able to rewind. It’s one thing to know who’s on top now, but being able to ask “who were the top five conferences back in 2017, and in what order?” and actually get a clean, defensible answer is where SRS really earns its value.
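One way to put a number on how decisive each season's top spot was is to compare the first- and second-place SRS values. The sketch below hard-codes the rounded values from the printed table above, so the margins are only accurate to about one decimal place; the data frame and column names are illustrative.

```r
# First- and second-place SRS per season, copied from the printed top-5 table
top2 <- data.frame(
  Season = 2016:2026,
  first  = c(15.0, 17.4, 15.2, 15.1, 15.1, 14.9, 15.5, 15.6, 15.1, 18.8, 16.8),
  second = c(14.0, 14.7, 14.2, 14.8, 14.4, 11.9, 12.3, 12.5, 13.0, 16.2, 16.4)
)

# Gap between the best and second-best conference in each season
top2$margin <- round(top2$first - top2$second, 1)

# Seasons ordered from the widest to the narrowest gap
top2[order(-top2$margin), ]
```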

Question #2: What are the strongest seasons by a conference in the last 10 years?

# Create a single all-time ranking table across all years
all_time_ranking <- conference_data %>%
  select(Season, Conference, SRS, SOS) %>%
  filter(!is.na(SRS), SRS != "SRS") %>%
  mutate(
    SRS = as.numeric(SRS),
    SOS = as.numeric(SOS)
  ) %>%
  filter(!is.na(SRS)) %>%
  arrange(desc(SRS)) %>%
  mutate(Rank = row_number()) %>%
  select(Rank, Season, Conference, SRS, SOS)

#Limit it to see the top 25
print(all_time_ranking, n = 25)
# A tibble: 351 × 5
    Rank Season Conference                  SRS   SOS
   <int>  <int> <chr>                     <dbl> <dbl>
 1     1   2025 Southeastern Conference    18.8 12.1 
 2     2   2017 Big 12 Conference          17.4 11.3 
 3     3   2026 Southeastern Conference    16.8 11   
 4     4   2026 Big 12 Conference          16.4 11.2 
 5     5   2025 Big Ten Conference         16.2 11.3 
 6     6   2026 Big Ten Conference         16.0 11.4 
 7     7   2025 Big 12 Conference          15.9 11.4 
 8     8   2023 Big 12 Conference          15.6 10.2 
 9     9   2022 Big 12 Conference          15.5  9.75
10    10   2018 Big 12 Conference          15.2  9.83
11    11   2019 Big Ten Conference         15.1 10.8 
12    12   2020 Big Ten Conference         15.1 10.5 
13    13   2024 Big 12 Conference          15.1  9.12
14    14   2016 Big 12 Conference          15.0 10.1 
15    15   2021 Big Ten Conference         14.9 11.9 
16    16   2019 Big 12 Conference          14.8 10.5 
17    17   2017 Atlantic Coast Conference  14.7  9.67
18    18   2020 Big 12 Conference          14.4 10.0 
19    19   2018 Big East Conference        14.2  9.28
20    20   2019 Atlantic Coast Conference  14.2  9.06
21    21   2016 Atlantic Coast Conference  14.0  9.24
22    22   2026 Atlantic Coast Conference  13.9  9.04
23    23   2020 Big East Conference        13.7  9.19
24    24   2018 Atlantic Coast Conference  13.4  8.51
25    25   2017 Big East Conference        13.2  9.11
# ℹ 326 more rows

Some Key Takeaways:

1st: Southeastern Conference (2025), 18.76 SRS

2nd: Big 12 Conference (2017), 17.44 SRS

3rd: Southeastern Conference (2026), 16.76 SRS

4th: Big 12 Conference (2026), 16.4 SRS

5th: Big Ten Conference (2025), 16.2 SRS

The SEC has emerged as the new standard of conference excellence, claiming the top spot in both 2025 and 2026, although its 2026 rating (16.76) fell short of its record 2025 mark (18.76). The timing is consistent with the idea that NIL and roster construction through the transfer portal have shifted the balance of power in college basketball, though SRS alone cannot establish the cause. The Big 12 remains the most consistently elite conference of the decade, appearing six times in the top 10, but it is now being challenged at the very top; even during its 2016 to 2024 run, the Big Ten took the number one spot from 2019 through 2021. Perhaps most striking is the clustering of 2025 and 2026 seasons at the top of the all-time list: five of the top six performances come from just the last two years, which suggests that the strength of the top conferences is rising rather than merely cycling, and that the gap between elite and average conferences may be widening.

Question #3: Has NIL created stronger conferences, or were elite conference performances already the norm before the NIL era?

#Using the top 25 we just created, what years did they happen
top25_by_year <- all_time_ranking %>%
  slice_head(n = 25) %>%
  count(Season, name = "Count")

#plot the findings
ggplot(top25_by_year, aes(x = factor(Season), y = Count)) +
  geom_col(fill = "#1a3a5c", width = 0.6) +
  geom_text(aes(label = Count), vjust = -0.5, fontface = "bold", size = 5) +
  labs(
    title = "Top 25 Conference-Season Performances by Year",
    subtitle = "How many of the all-time top 25 SRS performances came from each season",
    x = "Season",
    y = "# of Conferences in Top 25"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(face = "bold"),
    panel.grid.major.x = element_blank()
  )

From 2017 to 2020, three conferences per season cracked the all-time top 25, suggesting an era of relatively balanced power at the top where multiple conferences could legitimately claim dominance. That pattern collapsed between 2021 and 2024, with only one conference per year cracking the all-time top 25. The period largely coincides with the early and chaotic years of NIL (which took effect in July 2021, ahead of the 2022 season), when roster instability and transfer portal volatility may have leveled the playing field by disrupting everyone equally. Now the trend is reversing sharply, with 2025 returning to three elite conference performances and 2026 hitting four, the highest single-season count in the dataset. One reading is that the programs and conferences who figured out NIL and the portal first are now pulling away, and that a new era of concentrated dominance may be just beginning.
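Counting top-25 appearances is one lens on concentration; a more direct one is the per-season gap between the average SRS of the top conferences and the average of everyone else. The sketch below shows the calculation on a made-up data frame (`toy`, with invented values and a top-2 cutoff chosen only to keep the example small). The same pattern could be applied to `all_time_ranking` with a top-5 cutoff.

```r
library(dplyr)

# Made-up stand-in for all_time_ranking: four conferences over two seasons
toy <- tibble(
  Season     = rep(c(2024, 2025), each = 4),
  Conference = rep(c("A", "B", "C", "D"), times = 2),
  SRS        = c(12, 10, 2, -4, 16, 12, 1, -5)
)

# For each season, compare the mean SRS of the top 2 against the rest
gap_by_season <- toy %>%
  group_by(Season) %>%
  summarise(
    top_mean  = mean(SRS[rank(-SRS) <= 2]),
    rest_mean = mean(SRS[rank(-SRS) > 2]),
    gap       = top_mean - rest_mean,
    .groups   = "drop"
  )

gap_by_season
```

A gap that grows from season to season would support the widening-gap reading more directly than the top-25 counts do.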

Question #4: Which conferences have been consistently elite over the last decade, and which ones are less consistent?


Using that all-time top 25 again, we will count how many times each conference appears in it.

# Count the number of times each conference appears in the top 25
top25_by_conference <- all_time_ranking %>%
  slice_head(n = 25) %>%
  count(Conference, name = "Count") %>%
  arrange(desc(Count))

#Plot the data 
ggplot(top25_by_conference, aes(x = reorder(Conference, Count), y = Count)) +
  geom_col(fill = "#1a3a5c", width = 0.6) +
  geom_text(aes(label = Count), hjust = -0.3, fontface = "bold", size = 5) +
  coord_flip() +
  labs(
    title = "Most Represented Conferences in Top 25 All-Time SRS Performances",
    subtitle = "Count of appearances in the top 25 conference-season performances (2016–2026)",
    x = NULL,
    y = "# of Appearances"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(face = "bold"),
    panel.grid.major.y = element_blank()
  )

The Big 12 is the definition of sustained excellence: it appears in the top 25 ten times across the 11 seasons, missing only 2021, when its 11.7 SRS fell below the cutoff. The SEC, by contrast, has been a quiet underperformer historically, but both of its top 25 appearances have come in the last two seasons, including the single strongest conference-season performance in the entire dataset in 2025, suggesting the SEC may be done waiting its turn. It holds 2 of the top 3 conference-seasons in the dataset, both from the last 2 years.
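The Big 12's appearance count can be cross-checked against the printed tables. The sketch below copies the Big 12's rounded SRS for each season from the top-5 table and compares it with 13.2, the rounded SRS of the 25th-ranked conference-season; the variable names are illustrative.

```r
# Big 12 SRS by season, copied from the printed top-5 table (rounded values)
big12 <- data.frame(
  Season = 2016:2026,
  SRS    = c(15.0, 17.4, 15.2, 14.8, 14.4, 11.7, 15.5, 15.6, 15.1, 15.9, 16.4)
)

# SRS of the 25th-ranked conference-season in the all-time table
cutoff <- 13.2

big12$in_top25 <- big12$SRS >= cutoff

sum(big12$in_top25)            # 10 seasons at or above the cutoff
big12$Season[!big12$in_top25]  # 2021 is the only season below it
```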

Question #5: Do the conferences with the highest SRS ratings actually earn the most NCAA tournament bids, or are there other factors such as strength of schedule that inflate their numbers?

Bring back the NCAA column (how many teams made the NCAA Tournament) and the FF column (Final Four appearances) for the top 5 conference-seasons of the last decade.

# Take the top 5 conference-seasons from all_time_ranking and join them back to conference_data to recover the NCAA and FF columns

top5_lookup <- all_time_ranking %>%
  slice_head(n = 5) %>%
  select(Season, Conference)

conference_data %>%
  inner_join(top5_lookup, by = c("Season", "Conference")) %>%
  select(Season, Conference, SRS, NCAA, FF) %>%
  mutate(SRS = as.numeric(SRS)) %>%
  arrange(desc(SRS))
# A tibble: 5 × 5
  Season Conference                SRS  NCAA    FF
   <int> <chr>                   <dbl> <int> <int>
1   2025 Southeastern Conference  18.8    14     2
2   2017 Big 12 Conference        17.4     6     0
3   2026 Southeastern Conference  16.8    10     0
4   2026 Big 12 Conference        16.4     8     1
5   2025 Big Ten Conference       16.2     8     0

The SEC’s 2025 season stands alone as the most dominant conference performance of the decade, posting an SRS of 18.76 and sending 14 teams to the NCAA tournament, more than any other top-5 conference-season by a wide margin. Their 2026 follow-up was nearly as impressive, placing 10 teams in the field, two more than the 2026 Big 12 and four more than the 2017 Big 12, which suggests that the SEC’s rise is not a one-year anomaly but a structural shift. The 2025 SEC season also delivered where it mattered most in March, producing both Auburn and the eventual national champion Florida Gators in the Final Four. Outside of that historic year, however, the other four top-5 conference-seasons combined for just a single Final Four appearance (Arizona in 2026, who were blown out by Michigan), raising real questions about whether conference-level SRS dominance actually translates to March success. The 2017 Big 12 is the clearest example: the second-highest SRS in the dataset produced only 6 bids and no Final Four teams. The Big 12, meanwhile, remains the most consistently elite conference of the decade, but its bid counts in this table (6 in 2017 and 8 in 2026) trail the SEC’s, which may indicate that its strength is concentrated at the top rather than running deep.
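A quick way to probe the relationship between SRS and tournament bids is a correlation. The sketch below uses the five rows from the printed table above, which is far too few for a reliable estimate; it only illustrates the calculation, and the same call could be run on every conference-season in `conference_data`.

```r
# The five top conference-seasons from the printed table
top5_seasons <- data.frame(
  Season     = c(2025, 2017, 2026, 2026, 2025),
  Conference = c("SEC", "Big 12", "SEC", "Big 12", "Big Ten"),
  SRS        = c(18.8, 17.4, 16.8, 16.4, 16.2),
  NCAA       = c(14, 6, 10, 8, 8)
)

# Pearson correlation between conference SRS and number of NCAA bids
r <- cor(top5_seasons$SRS, top5_seasons$NCAA)
r
```

Because the number of bids also depends on how many teams a conference has, bids per team would be a fairer comparison if conference sizes were added to the data.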

Overall Takeaways and Reasoning

The reason I chose this topic is simple — I am an avid college basketball fan who enjoys debating conference superiority with friends who root for rival schools, and SRS gave me a legitimate analytical framework to back those arguments up. What drew me to it specifically is that SRS puts every conference on a level playing field, using math rather than reputation to determine which conferences are actually having stronger years.

My biggest takeaway from the data is the noticeable decline in Cinderella stories between 2020 and 2023, and how that dry spell coincides almost exactly with the period where power conference SRS performances were at their weakest. Now that power conferences are posting their strongest numbers in the dataset, March has become increasingly predictable — and I don’t think that’s a coincidence.

For my final project, I plan to take this further by scraping public sentiment around NIL and the disappearance of Cinderella runs, exploring whether fans believe the power conferences have become too dominant and what that means for the future of the tournament. These initial questions gave me a strong foundation and a genuine curiosity about where the sport is heading. As for my own prediction — I expect the Big East to make a significant jump next season, driven by some of the strongest transfer portal classes they have assembled across the board, and the data suggests they have the infrastructure to back it up.