The Secret To Making The Big Dance

kcin999

2023-04-30


Background

Every March, 64 teams get the opportunity to play in the largest tournament in college basketball. Over the span of three weeks, these teams meet on the floor and play to survive and advance. In the end, only one team is crowned the National Champion. This past year it was Connecticut, beating San Diego State 76-59. The year prior, Kansas was victorious over the University of North Carolina, 72-69.

On top of the yearly competition, individuals from around the world compete in bracket pools, in which they attempt to make the ‘perfect’ bracket. But the odds of accomplishing this task are 1 in 9.2 quintillion if you flip a coin for each game and 1 in 120.2 billion if you know a little bit about basketball. Either way, these are astronomical odds and no one has ever correctly predicted all of the games.
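The coin-flip figure can be checked with a little arithmetic. The following is a minimal sketch, assuming a 64-team single-elimination bracket in which every game is picked independently with a 50% chance of being right:

```python
# Every game in a single-elimination bracket eliminates exactly one team,
# so a 64-team bracket needs 63 games to leave one champion.
TEAMS = 64
games = TEAMS - 1

# With a coin flip per game, each of the 2**63 possible brackets is equally likely.
possible_brackets = 2 ** games

print(f"1 in {possible_brackets:,}")  # 1 in 9,223,372,036,854,775,808
```

That is roughly 9.2 quintillion, matching the figure quoted above.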

For the remainder of this report, I explore some statistical differences within data collected surrounding the NCAA tournament. I want to know whether there are any trends that help a team make the NCAA tournament.

The Data

Before we step into the analysis, it is important to explore some of the statistical data that has been captured and will be used throughout this report.

This report uses data captured from Sports Reference and Reddit.

Sports Reference is a website whose mission is to “democratize data, so our users enjoy, understand, and share the sports they love”. It houses all kinds of statistical information, some of which is detailed below. Each link is for the 2023 stats, but the data goes back further.

  • Basic School Stats
    • These are basic statistics detailing a school’s performance throughout all games in its season. They include information such as Total Points, Total Rebounds, etc.
  • Basic Opponent Stats
    • These are basic statistics detailing opponents’ performance against a given team across all games in its season. They include information such as Total Points, Total Rebounds, etc.
  • Advanced School Stats
    • These are advanced statistics detailing a school’s performance throughout all games in its season. They include information such as Pace, Effective Field Goal Percentage, etc.
  • Advanced Opponent Stats
    • These are advanced statistics detailing opponents’ performance against a given team across all games in its season. They include information such as Pace, Effective Field Goal Percentage, etc.

For a complete data dictionary, click here.

Important Note: I have removed ALL data from 2020 as the NCAA tournament did not happen.

Lastly, here are a few summary statistics that stood out in this data. The following table covers 2000 to 2023.
Statistic                          Value
Average Points Per Game            70.36828
Average Points Per Game Against    69.35283
Average Turnover Percentage        17.25193
Average Pace                       68.14219

We can see that over the last 23 years, teams scored, on average, 70.37 points per game, while allowing 69.35 points per game. Further, 17.25% of their possessions ended in turnovers. Finally, using average pace, we see that teams had approximately 68.14 possessions in each game.

Analysis

The following analysis is not exhaustive. These are factors that I was interested in exploring as they relate to the NCAA tournament. They focus more on style of play and advanced statistics than on base statistics such as total points and total rebounds. Base statistics are informative, but because some teams play more games than others, season totals lose some of their appeal. In addition, many of the ‘basic’ stats can be seen in the ‘eye test’, and I wanted to challenge myself to look even further.

Margin of Victory

The first thing I wanted to explore was margin of victory. Are teams that make the tournament dramatically outscoring their opponents, or are they in tight, competitive games? Looking at the comparison, we see that teams that make the tournament typically win by a larger margin than those that do not. Teams that do not make the tournament are in tighter games, with the 75th percentile of these teams winning by roughly 4 points on average. Among the teams that do make the tournament, there are some outliers with a negative margin of victory, but on average, tournament teams win by over 9 points per game.
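As a rough sketch of how this comparison can be computed, the snippet below derives a per-game margin of victory from season totals and summarizes it by group. The season totals are made-up numbers for illustration only, not values from the Sports Reference data, and the function and variable names are my own.

```python
from statistics import mean, quantiles

def margin_of_victory(points_for: int, points_against: int, games: int) -> float:
    """Average point differential per game over a season."""
    return (points_for - points_against) / games

# Made-up season totals: (points for, points against, games, made tournament)
toy_seasons = [
    (2550, 2210, 34, True),
    (2480, 2250, 33, True),
    (2390, 2200, 32, True),
    (2100, 2130, 30, False),
    (2050, 2000, 29, False),
    (1980, 2110, 30, False),
]

def summarize(made_tournament: bool) -> dict:
    """Mean and quartiles of margin of victory for one group of teams."""
    margins = [
        margin_of_victory(pf, pa, g)
        for pf, pa, g, made in toy_seasons
        if made == made_tournament
    ]
    q1, median, q3 = quantiles(margins, n=4)
    return {"mean": mean(margins), "q1": q1, "median": median, "q3": q3}

for flag in (True, False):
    print("made tournament" if flag else "missed tournament", summarize(flag))
```

The quartiles are the same quantities a box plot displays, so the 75th percentile mentioned above corresponds to `q3` for the group that missed the tournament.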

Simple Rating System

The Simple Rating System is a way to compare teams based on strength of schedule and point differential. While I was unable to find the exact calculation, I was able to find the following on Sports Reference as an example of how it can be used:

If Team A’s rating is 3 bigger than Team B’s, this means that the system thinks Team A is 3 points better than Team B. (Sports Reference)

For college basketball, it only counts games against major opponents.
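For intuition, a commonly described formulation of a simple rating system sets each team's rating to its average margin of victory plus the average rating of its opponents, solved by repeated updating. The sketch below follows that general idea only. It is not confirmed to be Sports Reference's exact college basketball calculation, it does not apply the major-opponent filter mentioned above, and the three-team schedule is invented for illustration.

```python
from collections import defaultdict

def simple_ratings(games, iterations=200):
    """games: list of (team, opponent, margin), margin from the first team's view."""
    margins = defaultdict(list)
    opponents = defaultdict(list)
    for team, opp, margin in games:
        margins[team].append(margin)
        margins[opp].append(-margin)
        opponents[team].append(opp)
        opponents[opp].append(team)

    # Average margin of victory per team
    mov = {t: sum(m) / len(m) for t, m in margins.items()}

    ratings = dict(mov)  # start from raw margin of victory
    for _ in range(iterations):
        # rating = own margin of victory + average rating of opponents faced
        updated = {
            t: mov[t] + sum(ratings[o] for o in opponents[t]) / len(opponents[t])
            for t in mov
        }
        # Re-center so that an average team is rated 0
        center = sum(updated.values()) / len(updated)
        ratings = {t: r - center for t, r in updated.items()}
    return ratings

# Invented round-robin: A beats B by 10, B beats C by 5, A beats C by 9
toy_games = [("A", "B", 10), ("B", "C", 5), ("A", "C", 9)]
ratings = simple_ratings(toy_games)
print({team: round(r, 2) for team, r in ratings.items()})
```

Under this formulation, the difference between two ratings reads as an expected point gap, which matches the interpretation quoted from Sports Reference.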

Compared to Margin of Victory, the Simple Rating System shows a much larger disparity between teams that made the tournament and those that did not. Over half of the teams that made the tournament had a Simple Rating System score of 12 or more, while those that did not typically had a negative score.

Again, we see some outliers among the teams that did not make the tournament. A few schools had a Simple Rating System score of more than 20 and still did not make the tournament. That could reflect the level of competition in that specific year.

Strength Of Schedule

As I mentioned earlier, I was more interested in diving into the statistics that go beyond the eye test, and for the remainder of this section, that is exactly what we will do.

Strength of Schedule is a measure of how hard a school’s schedule is. In other words, it measures how difficult a school’s opponents are. Looking at the distribution of Strength of Schedule, we can see that teams that make the tournament typically have a higher strength of schedule, but not always. A couple of schools with an extremely low strength of schedule made the tournament.

Upon further research, these outliers are Alcorn State, Alabama State, and Alabama A&M. Interestingly enough, all of these teams are members of the Southwestern Athletic Conference and made the tournament in 2002, 2001, and 2005 respectively. They each made the tournament by winning the conference. Since this is not a Power 5 or Mid Major Conference, my assumption is that they do not get the attention they deserve, nor do they face the level of competition needed for more than just the conference winner to make the tournament.

Turnover Percentage

A team’s turnover percentage describes how many times a team turns over the basketball per 100 plays. Typically, teams want their own number to be quite low, meaning they are controlling the ball, while simultaneously forcing the opponent to have a high turnover percentage.
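To make "turnovers per 100 plays" concrete, here is a small sketch using a widely used estimate in which a play ends in a field goal attempt, a trip to the free-throw line, or a turnover. The free-throw weight is an assumption on my part (values around 0.44 to 0.475 are commonly used), and the box-score numbers are invented.

```python
def turnover_pct(turnovers: float, fga: float, fta: float, ft_weight: float = 0.475) -> float:
    """Estimated turnovers per 100 plays.

    ft_weight approximates how many free-throw attempts end a play;
    the default here is an assumption, not a confirmed Sports Reference value.
    """
    plays = fga + ft_weight * fta + turnovers
    return 100 * turnovers / plays

# Invented single-game box score: 12 turnovers, 55 field goal attempts, 20 free throw attempts
print(round(turnover_pct(12, 55, 20), 1))
```

A lower value for a team's own number and a higher value for its opponents is the combination described above.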

I wanted to see if there was a correlation between these two categories. Comparing both, there is no significant difference between a school’s own turnover percentage and its opponents’ turnover percentage. Across all instances within the data, both the school turnover percentage and the opponent turnover percentage are centered around 16-18%.

Offensive Rating

Nearing the end of the statistical analysis, I wanted to explore Offensive Rating. Offensive Rating is an estimate of points scored per 100 possessions by either a player or a team.
Unsurprisingly, schools that made the tournament typically had a higher overall Offensive Rating than those that did not. This implies that they are simply scoring more efficiently on each possession.
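As a quick illustration of the definition, the sketch below converts points and possessions into a per-100-possession rating. It plugs in the averages from the summary table earlier in this report and treats average pace as possessions per game, which is a simplification: a ratio of averages is not the same as the average of each team's own rating.

```python
def offensive_rating(points: float, possessions: float) -> float:
    """Points scored per 100 possessions."""
    return 100 * points / possessions

# Averages taken from the summary table (2000 to 2023)
AVG_POINTS = 70.36828
AVG_POINTS_AGAINST = 69.35283
AVG_PACE = 68.14219  # treated here as possessions per game (an assumption)

own = offensive_rating(AVG_POINTS, AVG_PACE)
opponent = offensive_rating(AVG_POINTS_AGAINST, AVG_PACE)
print(round(own, 1), round(opponent, 1))
```

Because the rating is normalized by possessions, it allows fast-paced and slow-paced teams to be compared on the same scale.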

However, looking at the Opponent Offensive Rating, both clusters are relatively similar. There is much less of a difference in Opponent Offensive Rating than there is in a school’s own Offensive Rating. This tells me that a school that makes the tournament typically wins by focusing on scoring more points, rather than by significantly limiting its opponents’ points.

Personal Fouls Per Game

Up to this point, we have explored both offensive and defensive styles of play. In my mind, there is one more aspect of the game that should be explored: how aggressive a team is. Personal fouls are a measure of either how sloppily a team plays or how aggressive it is. Based on this chart, we can see that teams that make the tournament typically play with fewer fouls per game. This could be due to a more polished style of play, or to having more skill to avoid fouls and make clean plays. There is still a large amount of overlap between the two groups, as around 50% of all teams fall within 17-20 fouls per game.

The Reaction

On top of exploring statistical commonalities and differences between teams that make the NCAA tournament, I wanted to explore their respective fan reactions.

Originally, the plan was to use Twitter’s API to pull the data down, but as that is no longer possible due to the restrictions of the API, I had to pivot to another source: Reddit (more specifically, the subreddit r/CollegeBasketball).

Due to my own familiarity, I decided to focus only on schools in the Big East Conference, and then add in Purdue and FDU, since their game was one of the largest upsets in recent history. Further, I am using data after March 1st, 2023 in order to capture just Selection Sunday and the rest of the tournament.

Lastly, in order to find the overall sentiment, I am using the NRC lexicon.

Total Volume

First, I wanted to see if there was any meaningful difference in total volume by season result. Unsurprisingly, teams that made the tournament had a higher total number of words in each sentiment category. They were actively playing over this time period and individuals were engaging with each other over their teams, leading to a higher number of words being used.

Overall Sentiment

While volume is interesting, I think it is also important to examine the sentiment of the fans using a relative word count. This helps us see the frequency of each sentiment category in proportion to the total volume. Overall, we can see that the trends are similar. Positive is the highest, with trust and negative rounding out the top 3 sentiments. Interestingly, however, teams that did not make the tournament have a slightly higher positive and trusting sentiment towards their teams. This could be due to a number of reasons, including:

  • teams that did not make the tournament could not disappoint or upset their fans by having a bad game within the tournament.
  • the end of a season. While a specific season may not have gone well, teams will often look to the future for what is to come.
  • the changes happening within the Big East at the end of the season. Georgetown, Providence, and St. John’s have all hired new coaches leading to an increased feeling of hope for the future.
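To show what a relative word count means in practice, here is a minimal sketch that tags words with sentiment categories and converts the counts into shares of the total. The tiny lexicon and the two comments are invented stand-ins. They are not entries from the real NRC lexicon or actual Reddit data, and the function name is hypothetical.

```python
import re
from collections import Counter

# Tiny stand-in lexicon (NOT real NRC entries): word -> sentiment categories.
# As in lexicons of this kind, one word may belong to several categories.
TOY_LEXICON = {
    "win": ["positive", "joy"],
    "great": ["positive", "trust"],
    "coach": ["trust"],
    "loss": ["negative", "sadness"],
    "awful": ["negative", "disgust"],
}

def sentiment_shares(comments):
    """Share of sentiment-tagged words falling in each category."""
    counts = Counter()
    for comment in comments:
        for word in re.findall(r"[a-z']+", comment.lower()):
            counts.update(TOY_LEXICON.get(word, []))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}

shares = sentiment_shares(["Great win", "Awful loss for the coach"])
for category, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{category}: {share:.1%}")
```

Dividing by the total removes the volume advantage of tournament teams, so two groups with very different comment counts can still be compared category by category.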

Purdue vs FDU

Quickly, out of my own curiosity, I wanted to analyze the overall sentiment following the historic Purdue-Fairleigh Dickinson (FDU) upset. In all honesty, I was surprised by the similarities between these two teams. As in the overall sentiment analysis above, both teams had ‘positive’, ‘negative’ and ‘trust’ as their top three sentiments. FDU had a higher positive share than Purdue by around 2%, while Purdue had a roughly 2% higher negative share than FDU.

The Results

Through this report, we have examined a few key pieces of what helps a team make the NCAA Men’s Basketball Tournament and participate in March Madness. We have seen that teams that make the tournament have, on average, a:

  • 15-point higher Simple Rating System score than teams that do not make the tournament.
  • tougher strength of schedule, but not always.
  • more efficient offensive rating, outscoring their opponents, while not necessarily holding down their opponents’ offensive rating.
  • cleaner style of play as they foul at a lower rate per game.

Finally, using the r/CollegeBasketball subreddit, we explored fans’ sentiment as it relates to teams in the Big East and to Purdue vs FDU. We found that teams that make the tournament have larger engagement on the platform, but there is little difference in the relative word use percentage for any given sentiment category.

Overall, there are a lot of factors that play into a team making the tournament. Some of these factors are tangible and can be found within the statistical analysis, but others are intangible and are merely based on the ‘eye test’. Either way, it is important to look at the advanced statistics to understand not only whether your team will make the tournament, but also whether it has a shot at winning the national championship.