March Madness Data Analysis

Introduction

March Madness is often considered to be the most entertaining and unpredictable postseason in sports. Each year, fans spend hours preparing their brackets in hopes of correctly picking all 67 games. Additionally, the nation becomes captivated by unlikely schools that make deep and unexpected tournament runs.

As a lifelong college basketball fan, I thought it would be interesting to study data on March Madness and the various statistical factors that impact how teams fare in the tournament. The analysis seeks to understand how random and unpredictable the tournament truly is.

Data Used

The data used for this analysis comes from kenpom.com, a site created by Ken Pomeroy that provides ratings and statistics for every Division I college basketball team. He has run these metrics for more than 20 seasons, and his ratings are well respected across the game. Coaches, analysts, and even the NCAA Tournament selection committee pay attention to Pomeroy’s ratings and value them as a useful ranking tool.

The specific data used in this analysis consists of kenpom ratings for all 68 teams that qualified for each March Madness tournament over the last 15 years. It includes information such as team records, offensive and defensive ratings, shooting percentages, and the round each team reached in the tournament. The data set includes over 100 variables in total, but I narrowed my analysis to the variables in the dictionary below.

Data Dictionary

TEAM: Name of the school that competed in the NCAA Tournament
YEAR: Denotes the year in which the team played in March Madness
CONF: The conference that the competing team is a member of
SEED: The seed from 1-16 given to each of the qualifying teams
ROUND: The round of the tournament where the competing team lost
K OFF: The kenpom rating for team offense
K DEF: The kenpom rating for team defense
GAMES: The number of games played by the team that season
W: Number of games won by each competing team during the season
L: Number of games lost by each competing team during the season
3PT%: The team's three-point field goal percentage
FT%: The team's free throw percentage
ELITE SOS RANK: Ranking of how difficult a competing team’s regular season schedule was compared to all other Division I teams

Here is a link to the data used for the analysis:

https://myxavier-my.sharepoint.com/:x:/g/personal/niederjohnm_xavier_edu/EX_lWQG2mCtPqzdS-eBt6lsBe3E7qoeLJE8RxD9OQtGi3g?download=1

Analysis

What metrics provide the best explanation as to why certain teams advance far in March Madness?

Each season, teams with a wide range of records and seeds advance far into the NCAA Tournament, while top-seeded, heavily favored teams that are expected to go far sometimes lose early. The following analysis seeks to understand which traditional statistics best explain how far teams advance.

How important is free throw shooting to advancing further in the NCAA Tournament?

College basketball games that are close down the stretch often come down to which team can make the most free throws. A team that is leading late in the game will often have to go to the foul line to salt the game away. This question seeks to understand whether having a high team free throw percentage leads to more success in March Madness.

The data shows that in the first few rounds of the tournament, mean free throw percentage stays fairly steady from round to round. In the later rounds, however, mean free throw percentage rises, suggesting that the teams that advance the farthest tend to be the best free throw shooting teams. Thus, when predicting which teams will reach the Final Four and National Championship, it is worth looking for top free throw shooting teams. Early round matchups, on the other hand, do not seem to be decided at the free throw line.
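The round-by-round comparison above amounts to averaging FT% over the teams eliminated in each round. Below is a minimal Python sketch, assuming the spreadsheet has been exported to CSV with the column names from the data dictionary, and assuming ROUND holds a number for the round reached (for example 64 for a first-round exit and 1 for the champion). The file name in the usage line is hypothetical.

```python
import csv
from collections import defaultdict
from statistics import mean


def load_rows(path):
    """Read the exported spreadsheet (assumed to be a CSV file) into a list of dicts."""
    # utf-8-sig also strips the byte-order mark that Excel adds to CSV exports.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))


def mean_by_round(rows, column):
    """Average a numeric column over the teams whose tournament ended in each ROUND."""
    groups = defaultdict(list)
    for row in rows:
        groups[int(row["ROUND"])].append(float(row[column]))
    # Assumed coding: larger ROUND values are earlier exits, so a descending
    # sort lists rounds from first-round exits down to the champion.
    return {rnd: mean(values) for rnd, values in sorted(groups.items(), reverse=True)}


# Usage (hypothetical file name):
# ft_by_round = mean_by_round(load_rows("kenpom_tournament_teams.csv"), "FT%")
```

Because the function takes the column name as a parameter, the same call works for any other numeric variable in the dictionary, such as 3PT%.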

Do strong offensive and defensive ratings result in more NCAA Tournament success for participating teams?

Kenpom rates each team's offense and defense separately; in this data set the two ratings appear as K OFF and K DEF. To gauge the importance of strong offense and defense together, I created a variable that adds the two ratings to produce an overall rating by which to judge teams.
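The overall rating described above could be computed row by row as in the sketch below, which assumes K OFF and K DEF are stored as numbers. The TOTAL RATING column name and the subtract_defense switch are hypothetical. One caveat worth checking: if K DEF is KenPom's adjusted defensive efficiency (points allowed per 100 possessions), a lower value is better, and KenPom's own net rating (adjusted efficiency margin) is offense minus defense rather than the sum.

```python
def add_total_rating(rows, subtract_defense=False):
    """Return copies of the rows with an added 'TOTAL RATING' value.

    By default K OFF and K DEF are summed, as described in the text.
    With subtract_defense=True the defensive rating is subtracted instead,
    which fits a defense metric where lower numbers are better.
    """
    rated = []
    for row in rows:
        k_off = float(row["K OFF"])
        k_def = float(row["K DEF"])
        total = k_off - k_def if subtract_defense else k_off + k_def
        # Copy the row so the original data is left unchanged.
        rated.append({**row, "TOTAL RATING": total})
    return rated
```

For a made-up row with K OFF of 120.5 and K DEF of 95.0, the default gives 215.5, while subtract_defense=True gives 25.5.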

Similar to free throw percentage, the overall rating varies little across the early rounds of the tournament. There is a small spike in overall rating in the later rounds, but not much of an increase across rounds overall. One explanation may be that only the best teams qualify for March Madness, so the teams competing are all highly rated. This also helps explain why upsets and unpredictable results are so common in the tournament: the statistical gap between teams is not very large.

Is it beneficial for teams to play a more difficult regular season schedule?

Each season, every college basketball team plays roughly 30 games in the regular season, generally 18-20 games against conference opponents and 10-12 games against non-conference opponents. As a result, every team plays a different schedule, and some schedules are more difficult than others. This can happen because of the overall strength of a team’s conference as well as the caliber of the teams it chooses to play in non-conference games. This question studies whether programs that played a more difficult schedule were rewarded by advancing further in March Madness. To answer it, we can use kenpom’s strength of schedule ranking variable.

Strength of schedule differs sharply across the first few rounds of the tournament. Teams that advance to the Round of 32 and Sweet 16 appear to have played much more difficult schedules than first-round losers. The average strength of schedule rank also continues to fall slightly as the rounds progress; since a lower rank means a tougher schedule, this suggests that the teams that make the longest March Madness runs tend to be the ones most challenged in the regular season. The large drop-off after the first round is likely skewed by the fact that many teams with high seed numbers are smaller schools that won their conference tournaments, and most of their conference games are against weaker opponents, which leaves them with a poor strength of schedule ranking.
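The same grouped-average idea applies to strength of schedule, with one twist in reading the output: assuming rank 1 is the toughest schedule, a smaller average rank means a harder schedule. This sketch assumes numeric SEED, ROUND, and ELITE SOS RANK columns. The max_seed parameter is a hypothetical addition for probing the skew explained above: dropping teams with high seed numbers shows whether they drive the first-round average.

```python
from collections import defaultdict
from statistics import mean


def mean_sos_rank_by_round(rows, max_seed=16):
    """Average ELITE SOS RANK for the teams eliminated in each ROUND.

    Assumes rank 1 is the hardest schedule, so a smaller average means a
    tougher schedule. Teams seeded worse than max_seed (hypothetical filter)
    are skipped.
    """
    groups = defaultdict(list)
    for row in rows:
        if int(row["SEED"]) <= max_seed:
            groups[int(row["ROUND"])].append(float(row["ELITE SOS RANK"]))
    # Descending ROUND order lists first-round exits before deeper runs
    # (assuming larger ROUND values mean earlier exits).
    return {rnd: mean(ranks) for rnd, ranks in sorted(groups.items(), reverse=True)}
```

Comparing the default output with the output for max_seed=8 would show how much of the first-round figure comes from seeds 9 through 16.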

Which conferences have experienced the most March Madness success?

Which conferences have had the most teams to ultimately advance to the Final Four?

Reaching the Final Four is considered the big prize of March Madness and is where every team ultimately wants to go. For this question, I filtered the data to only the teams that advanced to the Final Four or the National Championship game, then compared them by conference to see which college basketball conference has had the most success on the biggest stage.
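The Final Four filter can be sketched as a count by conference. This assumes ROUND records the round reached as a number, with 4 for Final Four losers, 2 for the national runner-up, and 1 for the champion; that coding is an assumption about the spreadsheet, so FINAL_FOUR_ROUNDS would need adjusting if it differs.

```python
from collections import Counter

# Assumed ROUND values for teams that reached the Final Four:
# 4 = lost in the national semifinal, 2 = runner-up, 1 = champion.
FINAL_FOUR_ROUNDS = {4, 2, 1}


def final_four_counts(rows):
    """Count Final Four appearances per conference."""
    return Counter(
        row["CONF"] for row in rows if int(row["ROUND"]) in FINAL_FOUR_ROUNDS
    )
```

Calling `final_four_counts(rows).most_common()` then lists conferences from most to fewest appearances.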

Over the last 15 years, the Big East and ACC appear to be the two conferences with the most Final Four appearances, with 11 each. It is not surprising that the traditional “Power 6” conferences, which represent the largest and highest-revenue basketball programs, have the most Final Four teams. It is interesting to note, however, that 14 different conferences have been represented at the Final Four over the last 15 years.

Among the Power 6 conferences, which conference has been upset in the NCAA Tournament most often?

Part of what makes March Madness so captivating for fans is the wins by underdog schools. These victories are called ‘upsets,’ and they occur when a team with a higher seed number defeats a team with a lower (better) seed number. Because teams from power conferences generally receive the better seeds in every tournament, I thought it would be interesting to compare which conference has been upset in the first round most often. To do so, I filtered the data to only include results in which a team seeded 9 or higher won its first round game over a Power 6 school.
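Because the data has one row per team rather than one row per game, a hypothetical way to reproduce this filter is to flip it around. In the first round, seed s faces seed 17 - s, so any Power 6 team seeded 1 through 8 whose tournament ended in that round must have lost to a 9 through 16 seed. The sketch assumes ROUND is 64 for a first-round exit; the conference labels in POWER_SIX are placeholders that would need to match the CONF values in the spreadsheet.

```python
from collections import Counter

# Placeholder labels; match these to the CONF values used in the data.
POWER_SIX = {"ACC", "Big 12", "Big East", "Big Ten", "Pac-12", "SEC"}
FIRST_ROUND = 64  # assumed ROUND value for a team that lost in the round of 64


def first_round_upsets(rows, conferences=POWER_SIX):
    """Count first-round losses by seeds 1-8 from the given conferences.

    In the round of 64, seed s plays seed 17 - s, so a 1-8 seed that lost
    there was necessarily beaten by a 9-16 seed.
    """
    return Counter(
        row["CONF"]
        for row in rows
        if row["CONF"] in conferences
        and int(row["SEED"]) <= 8
        and int(row["ROUND"]) == FIRST_ROUND
    )
```

Starting from the favorite's row avoids needing game-level data, and First Four games are excluded automatically because those losers carry a different ROUND value.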

Over the last 15 years, the conference upset most often in the first round is the Big 12, with 11 upsets, followed by the Big Ten with 10. Each of the six conferences has been upset at least six times. This shows the parity seen in March Madness and how often an underdog from a smaller conference finds a way to pull out a victory.

Sentiment Analysis

After studying the relationship between kenpom statistics and March Madness success, I also thought it would be interesting to perform a sentiment analysis of college basketball arenas. In college sports, home crowds can have a large impact on the results of games, as fans give players energy and motivation. I was interested in whether arena sentiment is correlated with success, that is, whether a better arena atmosphere contributes to more success.

To perform this analysis, I collected Yelp reviews for each arena in the Big East and Pac-12 Conferences, scraping the review text from Yelp’s HTML elements to gather over 100 reviews per conference. I chose these conferences because the previous analysis identified the Big East as the power conference with the most Final Fours and the Pac-12 as the power conference with the fewest. Thus, I was interested in whether Big East home arenas are reviewed more positively, which might ultimately contribute to postseason success. The data collected includes information about the reviewer, the arena, and the reviewer's overall assessment of their experience.
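The HTML-parsing half of the scraping step can be sketched with Python's built-in html.parser module. This covers only pulling review text out of HTML that has already been downloaded. The class name review-text is a placeholder, not Yelp's actual markup, which is not documented here and changes over time, so the real element would have to be found by inspecting the page.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ReviewTextParser(HTMLParser):
    """Collect the text of every element whose class list contains review_class."""

    def __init__(self, review_class="review-text"):  # placeholder class name
        super().__init__()
        self.review_class = review_class
        self.depth = 0      # > 0 while inside a review element
        self.parts = []     # text fragments of the review being read
        self.reviews = []   # finished reviews

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            if self.depth:
                self.parts.append(" ")  # treat <br> and similar tags as a word break
            return
        if self.depth:
            self.depth += 1  # nested element inside a review
        elif self.review_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.parts = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Review element closed: collapse whitespace and store the text.
            self.reviews.append(" ".join("".join(self.parts).split()))

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)
```

Usage: create a parser, call `parser.feed(html)` and then `parser.close()`, and read the extracted text from `parser.reviews`. Tracking nesting depth keeps text inside bold or link tags attached to the review it belongs to.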

Do fans have a more positive experience attending games at Big East or Pac-12 Arenas?

To answer this question, I split every review into individual words and grouped them by the conference each arena belongs to. Each word was then matched against a sentiment word list and labeled positive or negative where it appears. The total positivity score is the number of positive words in the reviews minus the number of negative words.
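The positivity score described above can be sketched as follows. The word list here is a tiny hypothetical stand-in; the lexicon used in the original analysis is not specified, and a real run would use a published positive/negative word list. The tokenizer is a simple lowercase regular-expression split, also an assumption.

```python
import re
from collections import Counter

# Tiny hypothetical word list for illustration only; swap in a published
# positive/negative sentiment lexicon for a real analysis.
SAMPLE_LEXICON = {
    "great": "positive", "fun": "positive", "love": "positive",
    "cramped": "negative", "expensive": "negative", "boring": "negative",
}


def tokenize(text):
    """Lowercase the review and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def positivity_score(reviews, lexicon=SAMPLE_LEXICON):
    """Number of positive words minus number of negative words across all reviews."""
    labels = Counter(
        lexicon[word]
        for review in reviews
        for word in tokenize(review)
        if word in lexicon
    )
    return labels["positive"] - labels["negative"]


def scores_by_conference(reviews_by_conf, lexicon=SAMPLE_LEXICON):
    """Positivity score for each conference's list of reviews."""
    return {conf: positivity_score(reviews, lexicon) for conf, reviews in reviews_by_conf.items()}
```

Since the score is a raw count, a conference with more reviews can score higher simply by having more words; dividing by the number of reviews is one possible adjustment.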

Big East arenas receive much more positive sentiment from reviewers than Pac-12 arenas. Fans attending games at Big East venues appear to hold the conference’s arenas in high regard, with a positivity score of over 200. Pac-12 arenas do not reach the same level of positive sentiment, but their venues still have a fairly high positivity score. This suggests that fans enjoy their experience at Big East games more than at Pac-12 games.

What emotions are fans experiencing at Big East and Pac-12 basketball games?

This question can be answered using the NRC Emotion Lexicon, which associates English words with eight emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, and trust) and two sentiments (positive and negative). Matching the words in each review against the lexicon makes it possible to compare how often each emotion appears in the reviews for each conference.
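The emotion tally can be sketched as below. The lexicon entries are made up and only mimic the NRC lexicon's shape, where one word can carry several labels; they are not actual NRC entries, and the published lexicon would be loaded in their place. The word splitting is a simple regular expression, which is also an assumption.

```python
import re
from collections import Counter

# Made-up entries in a word -> set-of-labels format, for illustration only.
# Load the published NRC Emotion Lexicon for real counts.
SAMPLE_EMOTIONS = {
    "excited": {"anticipation", "joy", "positive"},
    "fun": {"joy", "positive"},
    "angry": {"anger", "negative"},
    "awful": {"anger", "disgust", "negative"},
}


def emotion_counts(reviews, lexicon=SAMPLE_EMOTIONS):
    """Count review words tagged with each label; one word can add to several labels."""
    counts = Counter()
    for review in reviews:
        for word in re.findall(r"[a-z']+", review.lower()):
            counts.update(lexicon.get(word, ()))
    return counts


def compare_conferences(reviews_by_conf, lexicon=SAMPLE_EMOTIONS):
    """Emotion counts for each conference's reviews, for side-by-side comparison."""
    return {conf: emotion_counts(reviews, lexicon) for conf, reviews in reviews_by_conf.items()}
```

Because Counter returns 0 for missing keys, looking up an emotion that never appears (for example `counts["fear"]`) is safe when building the comparison.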

The most common emotions linked to reviews of Pac-12 arenas include positivity, joy, and anticipation. This makes sense, since the venues host college basketball games, which are exciting events that fans look forward to and pay large sums of money to attend. It is interesting to note, however, that Pac-12 arenas had more reviews containing words linked to fear, anger, and negativity than Big East arenas. This helps explain why Pac-12 arenas have a lower overall sentiment and points to the source of the negativity surrounding the conference.

Big East arenas show a very similar trend, with reviewers experiencing positive emotions overall. However, Big East arenas score lower on anticipation than Pac-12 arenas. This difference raises the question of why Big East fans may not look forward to attending games as much as Pac-12 fans anticipate theirs.

Conclusion

The kenpom data helps show how statistically close the teams qualifying for the NCAA Tournament really are. The at-large bids to the tournament represent the best of the best, and it is clear that any team can win on any night. There are some factors in which stronger performance leads to more tournament success, but there is also a lot of unpredictability in how college teams will perform. If I were to continue the analysis, I would compare more factors that influence March Madness success, such as whether offensive or defensive metrics are more influential in advancing late into the tournament.

In terms of fan sentiment, the limited evidence we reviewed offers some support for the idea that a better home atmosphere in the regular season prepares teams better for March Madness. Big East teams had very positive sentiment toward their arenas overall, which may help explain why the conference has been so well prepared to succeed in March Madness over the last 15 seasons. If I were to continue this sentiment analysis, I would study other aspects of home crowds, such as average attendance, to draw a clearer connection between how well teams perform in the regular season and in the postseason.