Final Project - NBA

Author

Heather White

Introduction

I am interested in NBA data because I like to follow my favorite college players. I also created an NBA fantasy team this year, so it is relevant to me. This dataset lists various stats relevant to professional basketball. I would like to better understand how players differ at various stages of their careers in terms of shooting percentages, points, age, and position. Later, I will scrape data from other NBA stats websites for comparison. I will also perform a sentiment analysis of reviews of the home arenas of the two best NBA players.

Data

I retrieved the data used for the first part of this project from Kaggle. This data provides all of the important statistics for NBA players such as a player’s age, minutes, and shooting percentages.

Data Dictionary:

  1. Player: Player’s name
  2. Pos: Position
  3. Age: Player’s age
  4. Tm: Team
  5. G: Games played
  6. GS: Games started
  7. MP: Minutes played per game
  8. FG: Field goals per game
  9. FGA: Field goal attempts per game
  10. FG%: Field goal percentage
  11. 3P: 3-point field goals per game
  12. 3PA: 3-point field goal attempts per game
  13. 3P%: 3-point field goal percentage
  14. 2P: 2-point field goals per game
  15. 2PA: 2-point field goal attempts per game
  16. 2P%: 2-point field goal percentage
  17. eFG%: Effective field goal percentage
  18. FT: Free throws per game
  19. FTA: Free throw attempts per game
  20. FT%: Free throw percentage
  21. ORB: Offensive rebounds per game
  22. DRB: Defensive rebounds per game
  23. TRB: Total rebounds per game
  24. AST: Assists per game
  25. STL: Steals per game
  26. BLK: Blocks per game
  27. TOV: Turnovers per game
  28. PF: Personal fouls per game
  29. PTS: Points per game

Summary Statistics

A preview of some important summary statistics:

# A tibble: 477 × 8
   Player                   Pos     Age `FG%`  `3P%` `2P%`  `FT%`   PTS
   <chr>                    <chr> <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl>
 1 Precious Achiuwa         C        24 0.432  0.2   0.5    1       7.5
 2 Bam Adebayo              C        26 0.525  0.5   0.525  0.803  23  
 3 Ochai Agbaji             SG       23 0.396  0.345 0.474  0.5     4.5
 4 Santi Aldama             PF       23 0.479  0.4   0.536  0.444  11.6
 5 Nickeil Alexander-Walker SG       25 0.396  0.333 0.5    0.333   5  
 6 Grayson Allen            SG       28 0.475  0.492 0.45   1      12.4
 7 Jarrett Allen            C        25 0.676 NA     0.676  0.781  11.8
 8 Kyle Anderson            PF       30 0.541  0.2   0.608  0.65    7.4
 9 Giannis Antetokounmpo    PF       29 0.582  0.25  0.63   0.639  29.5
10 Thanasis Antetokounmpo   PF       31 0.286 NA     0.286 NA       1  
# ℹ 467 more rows

Descriptive Analysis

NBA players all peak at different points in their careers. This data covers only the 2023-24 season; however, it is still useful to weigh these factors to see what makes the best players. I am most interested in how stats vary by position. Bar graphs make it possible to compare the different statistics by position.

Data Wrangling

I counted how many players are in each position to ensure an accurate analysis. Only one player is listed as a small forward/power forward combo (SF-PF), so this player will be ignored in this analysis.

# A tibble: 6 × 2
  Pos       n
  <chr> <int>
1 C        89
2 PF       93
3 PG       82
4 SF      104
5 SF-PF     1
6 SG      108
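
The count-then-filter step described above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the code that produced the table; the `players` list is a small made-up sample, and the `min_group_size` threshold is an assumption chosen so that a one-player position is dropped.

```python
from collections import Counter

# Hypothetical sample of (player, position) pairs; the real data has 477 rows.
players = [
    ("Player A", "C"),
    ("Player B", "C"),
    ("Player C", "PG"),
    ("Player D", "PG"),
    ("Player E", "SF-PF"),  # a combo position held by a single player
]

def count_positions(rows):
    """Return a Counter mapping each position to its number of players."""
    return Counter(pos for _, pos in rows)

def drop_small_groups(rows, min_group_size=2):
    """Keep only rows whose position has at least min_group_size players."""
    counts = count_positions(rows)
    return [(name, pos) for name, pos in rows if counts[pos] >= min_group_size]

counts = count_positions(players)
kept = drop_small_groups(players)
print(dict(counts))
print(kept)
```

Filtering on the group size rather than on the literal label "SF-PF" means the same step would still work if another one-player position appeared in the data.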

Free Throw Percentage by Position

I used a bar graph because it summarizes a statistic by group clearly. Centers have the lowest free throw percentage, while the small forward/power forward combo has the highest. Setting that combo aside because it contains only one player, point guards have the highest free throw percentage.

Warning: Removed 86 rows containing non-finite values (`stat_summary()`).
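
The warning above indicates that rows with missing values were dropped before the per-position average was computed. A minimal Python sketch of that behavior, assuming missing percentages are represented as `None`, is shown here; the `rows` values are invented for illustration and are not taken from the dataset.

```python
from collections import defaultdict

# Hypothetical rows of (position, free throw percentage).
# None stands in for a missing value (NA in the data).
rows = [
    ("C", 0.80),
    ("C", 0.60),
    ("C", None),
    ("PG", 0.90),
    ("PG", 0.80),
]

def mean_by_position(rows):
    """Average the values per position, skipping missing (None) values.

    Returns the per-position means and the number of rows skipped.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    skipped = 0
    for pos, value in rows:
        if value is None:
            skipped += 1
            continue
        totals[pos] += value
        counts[pos] += 1
    means = {pos: totals[pos] / counts[pos] for pos in totals}
    return means, skipped

means, skipped = mean_by_position(rows)
print(means, skipped)
```

Returning the number of skipped rows alongside the means keeps the amount of missing data visible, which is the same information the warning reports.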

Field Goal Percentage by Position

I used a bar graph because it summarizes a statistic by group clearly. Centers have the highest field goal percentage, while the small forward/power forward combo has the lowest. Setting that combo aside because it contains only one player, small forwards have the lowest field goal percentage.

Warning: Removed 12 rows containing non-finite values (`stat_summary()`).

Three Point Percentage by Position

I used a bar graph because it summarizes a statistic by group clearly. Shooting guards have the highest three point percentage, while the small forward/power forward combo has the lowest. Setting that combo aside because it contains only one player, centers have the lowest three point percentage.

Warning: Removed 63 rows containing non-finite values (`stat_summary()`).

Two Point Percentage by Position

I used a bar graph because it summarizes a statistic by group clearly. Point guards have the lowest two point percentage, while the small forward/power forward combo has the highest. Setting that combo aside because it contains only one player, centers have the highest two point percentage.

Warning: Removed 26 rows containing non-finite values (`stat_summary()`).

Effective Field Goal Percentage by Position

I used a bar graph because it summarizes a statistic by group clearly. The effective field goal percentage accounts for the fact that some shots count for 3 points while others count for only 2. Centers have the highest effective field goal percentage, while the small forward/power forward combo has the lowest. Setting that combo aside because it contains only one player, small forwards have the lowest effective field goal percentage.

Warning: Removed 12 rows containing non-finite values (`stat_summary()`).
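
For reference, effective field goal percentage is conventionally computed as (FG + 0.5 × 3P) / FGA, which gives each made three-pointer 1.5 times the weight of a made two-pointer. The short Python sketch below applies that formula to made-up per-game numbers; returning `None` for zero attempts is an assumption that mirrors a missing value in the data.

```python
def effective_fg_pct(fg, three_p, fga):
    """Effective field goal percentage: (FG + 0.5 * 3P) / FGA.

    Returns None when there are no attempts, mirroring a missing value.
    """
    if fga == 0:
        return None
    return (fg + 0.5 * three_p) / fga

# Hypothetical per-game line: 4 made field goals, 2 of them threes, on 10 attempts.
plain = 4 / 10
effective = effective_fg_pct(fg=4, three_p=2, fga=10)
print(plain, effective)
```

In this example the plain field goal percentage is 0.4 while the effective percentage is 0.5, because the two made threes are credited with the extra point they are worth.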

Average Points by Position

I used a bar graph because it summarizes a statistic by group clearly. Point guards have the highest average points, while the small forward/power forward combo has the lowest. Setting that combo aside because it contains only one player, small forwards have the lowest average points.

Analysis

This analysis of statistics from the beginning of the 2023-24 NBA season shows that centers typically have the best shooting percentages in the league; however, there is great variation between positions across the different categories.

Data Scraping

I would like to understand which statistics matter most for NBA players to be successful, such as shooting percentages, points, blocks, and steals. The question is: which stats are the most important for each player’s position? I intend to answer it by scraping NBA statistics from nbastuffer.com. This data is suitable because it is updated often and is accurate. By comparing each player’s rank with his stats, I can learn what makes one player rank higher than another.

NBA stats categories: RANK, NAME, TEAM, POS, AGE, GP, MPG, USG%, TO%, FTA, FT%, 2PA, 2P%, 3PA, 3P%, eFG%, TS%, PPG, RPG, APG, SPG, BPG, TPG, P+R, P+A, P+R+A, VI, ORtg, and DRtg

I will be focusing on RANK, NAME, TEAM, POS, AGE, GP, MPG, FT%, 2P%, 3P%, eFG%, and PPG
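
Once a stats table has been downloaded, narrowing it to the focus columns listed above could look like the sketch below. It uses Python's standard `html.parser` module on an inline, made-up HTML snippet; the markup, player names, and numbers are placeholders and do not reflect the actual structure or content of nbastuffer.com, and no network request is made.

```python
from html.parser import HTMLParser

FOCUS = ["RANK", "NAME", "TEAM", "POS", "AGE", "GP", "MPG",
         "FT%", "2P%", "3P%", "eFG%", "PPG"]

class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

def select_columns(html, wanted):
    """Parse one HTML table and keep only the wanted columns, as dicts."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    keep = [c for c in wanted if c in header]
    return [{c: row[header.index(c)] for c in keep} for row in body]

# Made-up markup and values for illustration only; not real page content.
sample_html = """
<table>
  <tr><th>RANK</th><th>NAME</th><th>POS</th><th>USG%</th><th>PPG</th></tr>
  <tr><td>1</td><td>Player A</td><td>C</td><td>30.0</td><td>25.0</td></tr>
  <tr><td>2</td><td>Player B</td><td>G</td><td>28.0</td><td>24.0</td></tr>
</table>
"""

records = select_columns(sample_html, FOCUS)
print(records)
```

Selecting columns by header name rather than by position keeps the step working if the site reorders its columns, and any focus column missing from the page is skipped.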

Free Throw Percentage by Position NBA Stuffer

The combination forward and center position has the highest free throw percentage, and the combination guard and forward position has the lowest.

Three Point Percentage by Position NBA Stuffer

The position with the highest three point percentage is the forward and guard combination position. The position with the lowest three point percentage is the centers.

Two Point Percentage by Position NBA Stuffer

The position with the highest two point percentage is the combination forward and center position and the lowest is the guards.

Effective Field Goal Percentage by Position NBA Stuffer

The position with the highest effective field goal percentage is the forward and center combination position and the lowest is the forwards.

Average Points Per Game by Position NBA Stuffer

The position with the highest average points per game is the guards and the lowest is the combination guard and forward position.

Final results for the most important statistics:

Centers: Effective Field Goal Percentage, Points Per Game

Forwards: Points Per Game

Forward-Centers: Free Throw Percentage, Two Point Percentage, Effective Field Goal Percentage

Forwards-Guards: Three Point Percentage

Guards: Free Throw Percentage, Three Point Percentage, Points Per Game

Conclusions

Scraping NBA data has taught me that free throw percentage is one of the most important stats for a player to have.

Sentiment Analysis

I will be using review data for the home arenas of the two best players identified from NBA Stuffer. The two best players are Joel Embiid (Philadelphia) and Luka Doncic (Dallas). Philadelphia plays at the Wells Fargo Center and Dallas plays at the American Airlines Center. I will be using Trip Advisor to conduct this research and answer the questions.

Question #1

Are customers of the Wells Fargo Center or customers of the American Airlines Center more satisfied?

Data #1

I intend to collect review data from Trip Advisor for both stadiums to evaluate the sentiment to see what customers are saying about both arenas. This data will provide evidence of which arena is preferred by seeing which is more positive. I will be using the NRC Lexicon.
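
A lexicon-based sentiment count of this kind can be sketched in Python as follows. The `TOY_LEXICON` dictionary is a tiny stand-in written in the style of the NRC lexicon, where one word can carry several labels; it is not the real lexicon, and the review snippets are invented for illustration rather than taken from Trip Advisor.

```python
import re
from collections import Counter

# Tiny stand-in lexicon in the NRC style, where one word can carry several
# sentiment labels. These entries are illustrative, not the real NRC lexicon.
TOY_LEXICON = {
    "great": ["positive", "joy"],
    "friendly": ["positive", "trust"],
    "dirty": ["negative", "disgust"],
    "expensive": ["negative"],
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())

def sentiment_counts(reviews, lexicon):
    """Count sentiment labels across all reviews for one arena."""
    counts = Counter()
    for review in reviews:
        for word in tokenize(review):
            counts.update(lexicon.get(word, []))
    return counts

# Made-up review snippets, keyed by arena.
reviews_by_arena = {
    "Wells Fargo Center": ["Great seats but expensive food.", "Dirty restrooms."],
    "American Airlines Center": ["Friendly staff and a great view."],
}

results = {arena: sentiment_counts(texts, TOY_LEXICON)
           for arena, texts in reviews_by_arena.items()}
for arena, counts in results.items():
    print(arena, dict(counts))
```

Words that are not in the lexicon contribute nothing, so the counts depend heavily on which words the lexicon happens to cover.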

Visualization #1

This bar chart shows that the Wells Fargo Center is perceived more negatively than the American Airlines Center because many more negative words are associated with its reviews.

Question #2

Are the reviews of the Wells Fargo Center or the American Airlines Center more positive?

Data #2

I intend to collect review data from Trip Advisor for both stadiums to evaluate the sentiment to see what customers are saying about both arenas. This data will provide evidence of which arena is preferred by seeing which is more positive. I will be using the Bing Lexicon.

Visualization #2

This visualization shows that only a few words appear more than once across the two arenas’ reviews. The American Airlines Center has only two unique negative words, and one of them is “concession,” which most likely refers to a concession stand and does not carry a negative connotation. The Wells Fargo Center has four unique negative words. In total, the American Airlines Center has 11 positive words and 3 negative words that count, while the Wells Fargo Center has 19 positive words and 8 negative words. By net count (positive minus negative), the Wells Fargo Center scores 11 against 8 for the American Airlines Center, so despite having more negative words, its reviews are more positive.
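
The comparison above can be worked through from the reported word counts. The Python sketch below computes the net count (positive minus negative) for each arena, along with the share of matched words that are positive as a second measure; the choice of these two measures is an assumption for illustration, not necessarily how the original chart was summarized.

```python
# Word counts as reported in the paragraph above.
counts = {
    "American Airlines Center": {"positive": 11, "negative": 3},
    "Wells Fargo Center": {"positive": 19, "negative": 8},
}

def net_sentiment(c):
    """Positive words minus negative words."""
    return c["positive"] - c["negative"]

def positive_share(c):
    """Fraction of matched sentiment words that are positive."""
    return c["positive"] / (c["positive"] + c["negative"])

summary = {
    arena: {"net": net_sentiment(c), "share": round(positive_share(c), 3)}
    for arena, c in counts.items()
}
for arena, values in summary.items():
    print(arena, values)
```

On net count the Wells Fargo Center comes out ahead, 11 to 8. The positive share is included because the two arenas have different numbers of matched words, and the two measures need not agree.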

Question #3

Which parts of the reviews of the Wells Fargo Center and the American Airlines Center are more positive or negative, and which arena’s reviews are more positive overall?

Data #3

I intend to collect review data from Trip Advisor for both stadiums to evaluate the sentiment to see what customers are saying about both arenas. This data will provide evidence of which arena is preferred by seeing which is more positive. I will be using the Bing Lexicon and a chronological comparison of valence.
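
A chronological comparison of valence generally means splitting the text into consecutive blocks and computing the net sentiment of each block in order. The Python sketch below shows one way to do that; `TOY_BING` is a small stand-in for a Bing-style positive/negative lexicon, the `chunk_size` of 2 lines is an arbitrary assumption, and the review lines are invented.

```python
import re
from collections import defaultdict

# Stand-in for a Bing-style lexicon (word -> "positive" or "negative").
# These entries are illustrative, not the real lexicon.
TOY_BING = {
    "great": "positive", "clean": "positive", "friendly": "positive",
    "rude": "negative", "dirty": "negative", "crowded": "negative",
}

def valence_by_chunk(lines, lexicon, chunk_size=2):
    """Net sentiment (positive minus negative words) per block of lines."""
    net = defaultdict(int)
    for line_number, line in enumerate(lines):
        index = line_number // chunk_size  # consecutive lines share an index
        net[index] += 0  # make sure blocks with no matched words still appear
        for word in re.findall(r"[a-z']+", line.lower()):
            label = lexicon.get(word)
            if label == "positive":
                net[index] += 1
            elif label == "negative":
                net[index] -= 1
    return dict(net)

# Made-up review lines in the order they appear.
lines = [
    "Great arena and friendly staff.",
    "Clean seats.",
    "Rude security and crowded halls.",
    "Dirty restrooms.",
]
trajectory = valence_by_chunk(lines, TOY_BING)
print(trajectory)
```

Plotting the value for each index in order gives the kind of rising and falling pattern that a chronological valence chart displays.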

Visualization #3

The visualization shows that the reviews usually start out positive and quickly turn negative just to repeat the process over again. It appears that the Wells Fargo Center has more positive reviews overall.

Conclusions

The visualization shows that the reviews usually start out positive and quickly turn negative just to repeat the process over again. It appears that the Wells Fargo Center has more positive reviews overall.

Final Conclusions

The data presented shows that the guards are the most effective players on the court. The team with the best player also has more positive reviews for the arena where it plays. This could mean that the players with the most supportive fans perform better on the court.