2025-11-02

Introduction to the Dataset

  • For this project, I will be using the “cbb.csv” dataset from Kaggle. The .csv file can be downloaded here: https://www.kaggle.com/datasets/andrewsundberg/college-basketball-dataset

  • The dataset above contains 24 columns of statistics for NCAA Division 1 Men’s Basketball teams from 2013 to 2024, excluding the 2020 COVID season.

  • We will use this dataset to look for trends and to test some old sayings in basketball. Without further ado, let’s get started.
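Several later slides use a team's win percentage, which does not appear to be one of the 24 columns. Here is a minimal R sketch of how to derive it. It assumes win percentage is wins divided by games played, on a 0 to 100 scale. The helper name `add_win_percentage` is hypothetical. `G` and `W` are the dataset's games-played and games-won columns.

```r
# Hypothetical helper: adds win_percentage (0-100 scale) to a cbb-style data
# frame, assuming win percentage = W / G * 100 (W = games won, G = games played).
add_win_percentage <- function(df) {
  stopifnot(all(c("G", "W") %in% names(df)))
  df$win_percentage <- df$W / df$G * 100
  df
}

# Usage (assumes cbb.csv has been downloaded to the working directory):
# cbb <- add_win_percentage(read.csv("cbb.csv"))
```

The 0 to 100 scale matches the way win percentages are quoted later, for example in the conference comparison.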

Old Saying #1: “Defense Wins Championships”

  • If you have ever played organized basketball, “Defense Wins Championships” is a phrase you have almost certainly heard from your coaches.

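  • Does the data back that up? Let’s find out with a boxplot of teams’ Adjusted Defensive Efficiency (ADJDE) grouped by postseason result. ADJDE estimates the points a team would allow per 100 possessions against an average Division 1 offense, so a lower ADJDE means a better defense.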

ADJDE vs. Postseason Result Boxplot

ADJDE vs. Postseason Result Boxplot: Five-Number Summary

Champions: min: 84.5, q1: 90.675, median: 91.3, q3: 93.075, max: 94.5

2nd Place: min: 85.2, q1: 89.575, median: 93.5, q3: 94.45, max: 96.2

Final 4: min: 84, q1: 91, median: 93.95, q3: 96, max: 102

Elite 8: min: 85.7, q1: 90.55, median: 93, q3: 96.6, max: 103.3

Round of 16: min: 85.4, q1: 90.6, median: 93.8, q3: 96.5, max: 107.1

Round of 32: min: 84.1, q1: 92.75, median: 95.2, q3: 98.1, max: 114.7

Round of 64: min: 84.3, q1: 95.2, median: 98.2, q3: 102.1, max: 114.5

Teams that did not qualify: min: 88, q1: 101.5, median: 105.5, q3: 109.5, max: 120.7
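A summary like the one above can be produced for each postseason group in a few lines of base R. This sketch makes two assumptions. First, quantiles use `quantile()`'s default type 7; base R's `fivenum()` uses Tukey's hinges, which can differ slightly. Second, teams without a POSTSEASON value are the ones that did not qualify. The name `five_num_by_round` is hypothetical.

```r
# Five-number summary (min, Q1, median, Q3, max) of ADJDE for each postseason
# group. Missing POSTSEASON values are treated as "Did not qualify" (assumption).
five_num_by_round <- function(df) {
  group <- ifelse(is.na(df$POSTSEASON), "Did not qualify",
                  as.character(df$POSTSEASON))
  stats <- sapply(split(df$ADJDE, group), quantile,
                  probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)
  rownames(stats) <- c("min", "q1", "median", "q3", "max")
  t(stats)  # one row per group, one column per statistic
}

# Usage (assumes cbb is loaded): five_num_by_round(cbb)
```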

ADJDE vs. Win Percentage Boxplot Analysis

  • As the boxplot and the five-number summary show, teams that advance further in March Madness generally have a better (lower) ADJDE. The median falls from 105.5 for teams that did not qualify to 98.2 in the Round of 64 and 91.3 for the champions. The most notable example is the 2013 Louisville Cardinals, coached by Rick Pitino, whose ADJDE of 84.5 is an outlier even among champions.

  • That said, some teams with excellent defense still did not make it very far in the tournament, such as the 2015 Virginia Cavaliers, who had an ADJDE of 84.1 and lost in the Round of 32.

  • Conclusion: the saying “Defense Wins Championships” is generally true!
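Whether a team shows up as an outlier depends on the boxplot's whisker rule. This sketch assumes the common 1.5 × IQR convention, which is the default in both base R's `boxplot()` and ggplot2's `geom_boxplot()`. Under that rule, the quartiles from the five-number summary are enough to check the two teams mentioned above. `tukey_fences` is a hypothetical helper.

```r
# Tukey's rule: a value is flagged as an outlier when it lies more than
# k * IQR below Q1 or above Q3 (k = 1.5 is the usual boxplot default).
tukey_fences <- function(q1, q3, k = 1.5) {
  iqr <- q3 - q1
  c(lower = q1 - k * iqr, upper = q3 + k * iqr)
}

# Quartiles taken from the five-number summary above
champions   <- tukey_fences(q1 = 90.675, q3 = 93.075)
round_of_32 <- tukey_fences(q1 = 92.75, q3 = 98.1)

champions                        # lower 87.075, upper 96.675
84.5 < champions[["lower"]]      # 2013 Louisville: TRUE
84.1 < round_of_32[["lower"]]    # 2015 Virginia (lower fence 84.725): TRUE
```

Under this rule, Virginia's 84.1 also falls below the Round of 32 lower fence of 84.725. It would likely be drawn as an outlier in that box as well.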

Old Saying #2: “The Best Defense is Good Offense”

  • “The best defense is good offense” is another phrase you have probably heard if you have played basketball, but does good offense actually lead to good defense?

  • Let’s find out with a scatter plot of teams’ Adjusted Offensive Efficiency (ADJOE) against their Adjusted Defensive Efficiency (ADJDE), with a fitted regression line.

ADJOE vs. ADJDE Scatter Plot with Regression Line

ADJOE vs. ADJDE Regression Summary

## 
## Call:
## lm(formula = ADJDE ~ ADJOE, data = cbb)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -16.6746  -3.8104  -0.1496   3.9382  16.6617 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 156.40152    1.92491   81.25   <2e-16 ***
## ADJOE        -0.51704    0.01805  -28.64   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.628 on 1625 degrees of freedom
## Multiple R-squared:  0.3355, Adjusted R-squared:  0.3351 
## F-statistic: 820.5 on 1 and 1625 DF,  p-value: < 2.2e-16

ADJOE vs. ADJDE Regression Analysis

  • Looking at the plot above, you can see an inverse trend between ADJOE and ADJDE. In the regression summary, the p-value for ADJOE is below 2*10^-16. This means the fitted line ADJDE = -0.517 * ADJOE + 156.4 is statistically significant and ADJOE is a statistically significant predictor of ADJDE. The R squared of 0.3355 means ADJOE explains about 34% of the variability in ADJDE.

  • Because the slope is negative, each additional point of ADJOE is associated with an ADJDE about 0.52 points lower. Since a lower ADJDE means a better defense, teams with better offenses also tend to have better defenses. Regression shows an association rather than cause and effect, but this supports the old saying that “The Best Defense is Good Offense”!
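To make the slope concrete, the fitted line can be evaluated at a couple of ADJOE values. This sketch hard-codes the coefficients from the `lm()` summary above. `predict_adjde` and the inputs 100 and 120 are illustrative choices, not part of the original analysis. With the fitted model object, `predict()` would do the same job.

```r
# Coefficients copied from the lm(ADJDE ~ ADJOE) summary above
intercept <- 156.40152
slope     <- -0.51704

predict_adjde <- function(adjoe) intercept + slope * adjoe

predict_adjde(c(100, 120))        # about 104.7 and 94.4
diff(predict_adjde(c(100, 120)))  # about -10.3: 20 more ADJOE, ~10 fewer points allowed per 100 possessions

# With one predictor, |r| = sqrt(R^2) and r takes the sign of the slope
r <- sign(slope) * sqrt(0.3355)
round(r, 3)                       # -0.579
```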

Old Saying #3: “Rebounding Wins Games”

  • To emphasize the importance of rebounding, coaches often remind their players throughout the game that “rebounding wins games.”

  • Just how much does rebounding affect a team’s chances of winning? Do offensive or defensive rebounds matter more? Let’s find out with a 3D plot of ORB (offensive rebound rate), DRB (offensive rebound rate allowed, so a lower DRB means better defensive rebounding), and win percentage.

ORB and DRB’s Impact on a Team’s Win Percentage, a 3D Plot

R Code for the 3D Plot on ORB, DRB, and Win Percentage

library(plotly)  # provides plot_ly(), layout(), and the %>% pipe

# Assumes cbb is already loaded and has a derived win_percentage column
# (e.g. W / G * 100), since cbb.csv does not include one.
plot_ly(data = cbb,
        x = ~ORB, y = ~DRB, z = ~win_percentage,
        type = "scatter3d", mode = "markers", color = ~CONF,
        # Hover template shows the team, year, ORB, DRB, and win percentage
        # of each dot on the scatter plot
        text = ~paste0("Team: ", TEAM, "<br>Year: ", YEAR),
        hovertemplate = paste0("<b>%{text}</b>",
                               "<br>ORB: %{x}",
                               "<br>DRB: %{y}",
                               "<br>Win Percentage: %{z}"),
        width = 600, height = 400) %>%
  layout(title = "ORB and DRB's Impact on a Team's Win Percentage",
         scene = list(
           xaxis = list(title = "Offensive Rebound Rate (ORB)"),
           yaxis = list(title = "Offensive Rebound Rate Allowed (DRB)"),
           zaxis = list(title = "Win Percentage")))

Correlations between ORB, DRB, and Win Percentage

  • Model Coefficients:
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 66.456578 4.40659008  15.08118 3.370207e-48
## ORB          1.633779 0.08867843  18.42364 5.542303e-69
## DRB         -1.906622 0.12952409 -14.72021 3.906443e-46
## [1] "R squared value of the overall regression model is:  0.24"
  • Individual Correlations:

    • Correlation between ORB and win percentage: 0.3769986

    • Correlation between DRB and win percentage: -0.2914118

    • Correlation between ORB and DRB: 0.0680083
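The coefficient table, R squared value, and correlations above could come from a helper along these lines. This is a sketch, not the author's code. `rebound_summary` is a hypothetical name, and the helper assumes `cbb` already has a `win_percentage` column on a 0 to 100 scale.

```r
# Sketch: fit win_percentage ~ ORB + DRB and collect the pieces reported above.
# Assumes df has ORB, DRB, and a derived win_percentage column (0-100 scale).
rebound_summary <- function(df) {
  model <- lm(win_percentage ~ ORB + DRB, data = df)
  list(
    coefficients = coef(summary(model)),  # Estimate, Std. Error, t value, Pr(>|t|)
    r_squared    = summary(model)$r.squared,
    correlations = cor(df[, c("ORB", "DRB", "win_percentage")])
  )
}

# Usage with the real data (assumes cbb is loaded and win_percentage exists):
# res <- rebound_summary(cbb)
# res$coefficients
# round(res$r_squared, 2)
# res$correlations
```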

ORB, DRB, and Win Percentage 3D Plot Analysis

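  • ORB and DRB are both statistically significant predictors of win percentage. Their p-values in the model coefficients are 5.54*10^-69 and 3.91*10^-46, respectively.

  • The R squared value of 0.24 indicates that around 24% of the variability in win percentage can be explained by ORB and DRB together.

  • Per percentage point, DRB has the larger effect. Holding the other rate fixed, a one-point drop in DRB is associated with a win percentage about 1.9 points higher, while a one-point rise in ORB is associated with one about 1.6 points higher. ORB, however, has the stronger individual correlation with win percentage (0.38 vs. -0.29) and the larger t value (18.4 vs. -14.7). This suggests that once each rate's typical spread across teams is taken into account, ORB matters at least as much.

  • Surprisingly, ORB and DRB are almost uncorrelated, with a correlation coefficient of 0.07.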

  • Conclusion: “Rebounding wins games” is true!
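The per-point comparison can be checked with the model's own coefficients. The team below, with a 30% offensive rebound rate and 28% allowed, is purely hypothetical. With an R squared of 0.24, any single prediction is rough.

```r
# Coefficients copied from the model output above:
# win_percentage = 66.456578 + 1.633779 * ORB - 1.906622 * DRB
b0    <- 66.456578
b_orb <- 1.633779
b_drb <- -1.906622

predict_win_pct <- function(orb, drb) b0 + b_orb * orb + b_drb * drb

# Hypothetical team: 30% offensive rebound rate, 28% offensive rebound rate allowed
baseline <- predict_win_pct(30, 28)
baseline                             # about 62.1
predict_win_pct(31, 28) - baseline   # about +1.63: one more point of ORB
predict_win_pct(30, 27) - baseline   # about +1.91: one fewer point of DRB
```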

Fun Fact: Which Conference is the Best Conference?

  • There has always been a debate about which conference is the best in NCAA Men’s Basketball. Let’s investigate using the data we have by finding out:
  1. The win percentage of each conference from 2013 - 2024, excluding 2020

  2. Which conference won the most championships from 2013 - 2024, excluding 2020
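Both questions can be answered with a short base R sketch. It makes two assumptions. First, conference win percentage is pooled as total wins over total games, times 100, which may differ slightly from averaging each team's win percentage. Second, champions are the rows whose POSTSEASON value is "Champions". `conference_summary` is a hypothetical name.

```r
# Sketch: pooled win percentage per conference and championship counts.
# Assumes a cbb-style data frame with CONF, W, G, and POSTSEASON columns.
conference_summary <- function(df) {
  totals <- aggregate(cbind(W, G) ~ CONF, data = df, FUN = sum)
  totals$win_percentage <- totals$W / totals$G * 100  # pooled (assumption)
  totals <- totals[order(-totals$win_percentage), c("CONF", "win_percentage")]

  is_champ <- !is.na(df$POSTSEASON) & df$POSTSEASON == "Champions"
  champs <- table(df$CONF[is_champ])
  list(win_percentage = totals,
       championships  = sort(champs, decreasing = TRUE))
}

# Usage (assumes cbb is loaded): conference_summary(cbb)
```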

Graph #1: Win Percentage of Each Conference From 2013 - 2024, Excluding 2020

Graph #2: Which Conference Won the Most Championships From 2013 - 2024, Excluding 2020

Which Conference is the Best Conference - Analysis

  • The Big 12 leads all conferences in win percentage at about 66.87%.

  • However, the Big East won the most championships with 5, more than double the Big 12’s 2.

  • So which conference is the best conference? Do you think overall conference win percentage or the number of championships is more important? I will leave that up to you guys to decide.

Conclusion

Hope you guys enjoyed this analysis of NCAA Division 1 Men’s Basketball over the last few seasons. It turns out that scoring baskets, playing good defense, and rebounding the ball well all contribute to winning basketball (Wow! What a surprise!). In case you missed it, the source for the .csv file is on the second slide. Go take a look at it sometime, and maybe you will find some interesting trends too!

Thank you for your time.

References: