Declan’s Final Project

NBA Analysis

The first statistics I ever really dealt with in my life were those of the NBA, while reading ESPN articles in 7th grade. Now, with years spent learning how to code and work with data, I hope to impress 13-year-old me by creating descriptive NBA stats and then comparing the Rockets and the Bucks in terms of stadium reviews.

2022-23 Data

First, I downloaded a data set published on Kaggle by Bryan Weather Chung that has all the player data from the 2022-2023 NBA regular season. I have hosted it at a OneDrive link for ease of use.

library(tidyverse) #Goat package
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
NBA_22_23 <- (read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/lewarnd_xavier_edu/EbYzgWWHHmFMvcfCVEu375oBa6dWRIQJ1xrkv9N4-ZnVhQ?download=1"))
Rows: 679 Columns: 29
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (3): Player, Pos, Tm
dbl (26): Age, G, GS, MP, FG, FGA, FG%, 3P, 3PA, 3P%, 2P, 2PA, 2P%, eFG%, FT...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

The data dictionary is as follows:

Player: Name of the player.

Pos: Position played by the player.

Age: Age of the player.

Tm: Team abbreviation or code.

G: Number of games played by the player.

GS: Number of games started by the player.

MP: Total minutes played by the player.

FG: Total field goals made by the player.

FGA: Total field goals attempted by the player.

FG%: Field goal percentage of the player (FG / FGA).

3P: Total three-point field goals made by the player.

3PA: Total three-point field goals attempted by the player.

3P%: Three-point field goal percentage of the player (3P / 3PA).

eFG%: Effective field goal percentage of the player ((FG + 0.5 * 3P) / FGA).

FT: Total free throws made by the player.

FTA: Total free throws attempted by the player.

FT%: Free throw percentage of the player (FT / FTA).

ORB: Total offensive rebounds grabbed by the player.

DRB: Total defensive rebounds grabbed by the player.

TRB: Total rebounds grabbed by the player.

AST: Total assists made by the player.

STL: Total steals made by the player.

BLK: Total blocks made by the player.

TOV: Total turnovers committed by the player.

PF: Total personal fouls committed by the player.

PTS: Total points scored by the player.
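
To make the percentage definitions above concrete, here is a small worked example. The stat line is made up for illustration and does not come from the data set; the variable names are likewise hypothetical.

```r
# Hypothetical stat line for one player (made-up numbers, not from the data set)
fg  <- 8    # field goals made (this count includes three-pointers)
fga <- 16   # field goals attempted
tp  <- 2    # three-point field goals made

fg_pct  <- fg / fga                # FG%  = FG / FGA
efg_pct <- (fg + 0.5 * tp) / fga   # eFG% = (FG + 0.5 * 3P) / FGA

fg_pct    # 0.5
efg_pct   # 0.5625
```

The extra 0.5 per made three is what lifts eFG% above FG% for players who shoot from deep.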

Player Points Per Game By Team

First, I want to examine each team’s average player PPG, as well as the variance within each team and across the league. All teams have players of different talent levels, so seeing the variance is helpful.
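
A minimal sketch of how such a summary could be built with dplyr is below. It runs on a toy table with made-up numbers rather than NBA_22_23, and it assumes PTS holds season totals, as the data dictionary states; if PTS is already a per-game figure, the division by G should be dropped. The names players and team_ppg are hypothetical.

```r
library(dplyr)

# Toy rows standing in for NBA_22_23 (made-up numbers, hypothetical teams)
players <- tibble(
  Tm  = c("AAA", "AAA", "AAA", "BBB", "BBB", "BBB"),
  G   = c(10, 10, 10, 10, 10, 10),
  PTS = c(300, 100, 50, 200, 180, 160)
)

team_ppg <- players %>%
  mutate(PPG = PTS / G) %>%            # assumption: PTS is a season total
  group_by(Tm) %>%
  summarise(
    mean_ppg = mean(PPG),              # average player PPG on the team
    var_ppg  = var(PPG),               # sample variance across the roster
    .groups  = "drop"
  ) %>%
  arrange(desc(mean_ppg))

team_ppg
```

In this toy data, team AAA has one high scorer and two low ones, so its variance is far larger than that of team BBB, whose scoring is spread evenly.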

The Suns, headed by three offense-first, max-contract players, lead the league in player average, thanks in no small part to having the PPG leader on their team. The Sacramento Kings (SAC) actually lead the league in offensive rating this season, so it is interesting to see their high variance and low average, especially compared to a team like OKC, which has a very low variance.

Assist to Turnover Ratio

This metric is used to determine which teams take care of the ball best in terms of passing. Generally, the more assists a team has, the more turnovers it commits, so a high ratio shows accurate passing and a good offensive focus.
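
One way the team-level ratio could be computed is sketched below on made-up numbers. The table passing and the result team_ratio are hypothetical names, and summing before dividing is a choice made for this sketch, not necessarily the method used for the chart.

```r
library(dplyr)

# Toy player rows (made-up numbers, hypothetical teams)
passing <- tibble(
  Tm  = c("AAA", "AAA", "BBB", "BBB"),
  AST = c(8, 4, 5, 3),
  TOV = c(3, 1, 4, 4)
)

team_ratio <- passing %>%
  group_by(Tm) %>%
  summarise(ast_tov = sum(AST) / sum(TOV), .groups = "drop") %>%  # team assists per team turnover
  arrange(desc(ast_tov))

team_ratio
```

Summing assists and turnovers before dividing avoids infinite ratios from players who have assists but no turnovers, which averaging per-player ratios would produce.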

The Denver Nuggets went on to win the NBA Finals largely due to the outstanding passing and ball movement of Nikola Jokic and Jamal Murray. Other top teams, like Portland and Toronto, had veteran point guards, Damian Lillard (POR) and Fred VanVleet (TOR), to control their offense. Houston and Detroit are two of the teams in the league with younger point guards, which often leads to more turnovers and a lower ratio, as seen in the chart below:

Fun Fact: In the 2023 offseason, Houston signed Fred VanVleet to be their new starting point guard. It is also worth noting that the spike at age 34 is due to Chris Paul, often referred to as “Point God” due to his play-making abilities.

Age: Both by Team and Scoring Impact

Seeing how age impacts ball handling has me interested in seeing the trend for scoring, so here is an examination of PPG by age:

LeBron James does skew these numbers a bit, so treating the timeless King as an outlier, it seems that players start off low, reach their peak in their late 20s, hold until their mid-30s, and then decline steeply. Here is the data presented in box plot format:

Age 33 is a year of great variance: some players are able to keep their careers afloat, while it is a low point for many others. That leaves the “prime” window of a player’s career, generously, between 26 and 32. Here is a look at age by team:

Houston and Detroit appear here as the two youngest teams, which helps explain their scoring and ball safety woes.
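
The two age summaries discussed in this section can be sketched as follows, again on a made-up roster. The table roster and its PPG column are assumptions for illustration; with the real data, PPG would first have to be derived as shown earlier or read directly if the file already stores per-game values.

```r
library(dplyr)

# Toy roster (made-up numbers, hypothetical teams)
roster <- tibble(
  Tm  = c("AAA", "AAA", "BBB", "BBB", "BBB"),
  Age = c(22, 24, 28, 28, 34),
  PPG = c(8, 12, 20, 24, 10)
)

# Average scoring at each age, with the number of players behind each average
ppg_by_age <- roster %>%
  group_by(Age) %>%
  summarise(mean_ppg = mean(PPG), n_players = n(), .groups = "drop")

# Average roster age per team, youngest first
age_by_team <- roster %>%
  group_by(Tm) %>%
  summarise(mean_age = mean(Age), .groups = "drop") %>%
  arrange(mean_age)

ppg_by_age
age_by_team
```

Keeping n_players next to each average makes it easy to spot ages where a single player drives the number, which is the outlier concern raised above.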

Stadium Sentiment Analysis

Here I have examined review data on NBA stadiums from the website https://www.stadiumjourney.com/stadiums/stadium-journey-s-2023-ranking-of-the-nba-arenas. The data was scraped, and I provide a copy of the CSV here; be warned that the link will stop working in 89 days. Additionally, sorry to Kings fans, but for some reason the data for Golden 1 Center just does not populate.

library(tidyverse)
info_df <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/lewarnd_xavier_edu/EULHOCOPatVIlFuIWeKg7G4BT2hkG8NZ0NuM0N5YGG9Cpw?download=1")
Rows: 1597 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): stadium_review, name

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

What is the overall NRC breakdown of NBA stadiums?

Just to get a lay of the land, let’s see how stadiums across the league as a whole are reviewed. This is compiled from all of the reviews on the website mentioned above, and I used the NRC lexicon, which tags close to 14k words with 10 sentiment categories.
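
The mechanics of a lexicon-based count can be sketched as below. This is an illustration under stated assumptions, not the exact pipeline behind the chart: the two reviews are invented, toy_lexicon is a four-row stand-in for the real NRC lexicon, and the tokenizing is done with base strsplit(). A real analysis would more likely use the tidytext package for tokenizing and for loading NRC. The relationship argument needs dplyr 1.1.0 or later.

```r
library(dplyr)
library(tidyr)

# Invented reviews using the same column names as info_df
reviews <- tibble(
  name           = c("Arena A", "Arena B"),
  stadium_review = c("Great sightlines and friendly staff",
                     "Terrible parking but great food")
)

# Hypothetical stand-in for the NRC lexicon: one row per word/sentiment pair
toy_lexicon <- tibble(
  word      = c("great", "great", "friendly", "terrible"),
  sentiment = c("positive", "joy", "trust", "negative")
)

# One row per lowercase word
words <- reviews %>%
  mutate(word = strsplit(tolower(stadium_review), "[^a-z']+")) %>%
  unnest(word) %>%
  filter(word != "")

# Keep only words found in the lexicon, then count each sentiment
sentiment_counts <- words %>%
  inner_join(toy_lexicon, by = "word", relationship = "many-to-many") %>%
  count(sentiment, sort = TRUE)

sentiment_counts
```

A word can carry more than one sentiment in the lexicon (here "great" counts as both positive and joy), which is why the join is many-to-many and why the sentiment totals can exceed the number of matched words.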


This baseline shows that the most common sentiments expressed are positive, trust, anticipation, and joy. All of these make sense, but I wanted to provide this league-wide average as a reference.

Houston vs Milwaukee Stadium Reviews

I discussed the impact of age on team performance above, and wanted to look into it further here by examining how the reviews align between the youngest NBA team and the oldest. Below are a sentiment comparison, a positivity/negativity comparison, and a few condensed charts from above on the Rockets and the Bucks:

The sentiment analysis gives mixed results. While Milwaukee has the higher positive rating, it also ranks lower than Houston in positivity, trust, and anticipation, although Houston also has nearly double the sadness. All of this comes even with Houston being a far younger and worse-performing team than Milwaukee, so more analysis is needed to determine whether there is any relationship.
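
When two arenas have different amounts of review text, raw sentiment counts are hard to compare, so one option is to convert counts to shares within each arena. The sketch below shows that step on invented, already-tagged words; the arena names, the table tagged, and the result sentiment_share are all hypothetical, and this is not necessarily how the comparison above was computed.

```r
library(dplyr)

# Invented output of a lexicon join: one row per matched word
tagged <- tibble(
  name      = c(rep("Arena A", 4), rep("Arena B", 2)),
  sentiment = c("positive", "positive", "trust", "sadness",
                "positive", "sadness")
)

sentiment_share <- tagged %>%
  count(name, sentiment) %>%          # raw count per arena and sentiment
  group_by(name) %>%
  mutate(share = n / sum(n)) %>%      # fraction of that arena's matched words
  ungroup()

sentiment_share
```

Here Arena A has twice as many positive words as Arena B, yet both have a positive share of 0.5, which is the kind of difference between counts and proportions worth checking before comparing two teams' arenas.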