# A tibble: 477 × 8
   Player                   Pos     Age `FG%` `3P%` `2P%` `FT%`   PTS
   <chr>                    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1 Precious Achiuwa         C        24 0.432 0.2   0.5   1       7.5
 2 Bam Adebayo              C        26 0.525 0.5   0.525 0.803  23
 3 Ochai Agbaji             SG       23 0.396 0.345 0.474 0.5     4.5
 4 Santi Aldama             PF       23 0.479 0.4   0.536 0.444  11.6
 5 Nickeil Alexander-Walker SG       25 0.396 0.333 0.5   0.333   5
 6 Grayson Allen            SG       28 0.475 0.492 0.45  1      12.4
 7 Jarrett Allen            C        25 0.676 NA    0.676 0.781  11.8
 8 Kyle Anderson            PF       30 0.541 0.2   0.608 0.65    7.4
 9 Giannis Antetokounmpo    PF       29 0.582 0.25  0.63  0.639  29.5
10 Thanasis Antetokounmpo   PF       31 0.286 NA    0.286 NA      1
# ℹ 467 more rows
Final Project - NBA
Introduction
I am interested in NBA data because I like to follow my favorite college players. I also created an NBA fantasy team this year, so it is relevant to me. This dataset lists various stats relevant to professional basketball. I would like to better understand how players differ at various stages of their careers in terms of shooting percentages, points, age, and position. I will later scrape data from another NBA stats website for comparison. I will also perform a sentiment analysis of reviews of the home arenas of the two best NBA players.
Data
I retrieved the data used for the first part of this project from Kaggle. This data provides all of the important statistics for NBA players such as a player’s age, minutes, and shooting percentages.
Data Dictionary:
- Player: Player’s name
- Pos: Position
- Age: Player’s age
- Tm: Team
- G: Games played
- GS: Games started
- MP: Minutes played per game
- FG: Field goals per game
- FGA: Field goal attempts per game
- FG%: Field goal percentage
- 3P: 3-point field goals per game
- 3PA: 3-point field goal attempts per game
- 3P%: 3-point field goal percentage
- 2P: 2-point field goals per game
- 2PA: 2-point field goal attempts per game
- 2P%: 2-point field goal percentage
- eFG%: Effective field goal percentage
- FT: Free throws per game
- FTA: Free throw attempts per game
- FT%: Free throw percentage
- ORB: Offensive rebounds per game
- DRB: Defensive rebounds per game
- TRB: Total rebounds per game
- AST: Assists per game
- STL: Steals per game
- BLK: Blocks per game
- TOV: Turnovers per game
- PF: Personal fouls per game
- PTS: Points per game
Summary Statistics
A preview of some important summary statistics:
Descriptive Analysis
NBA players peak at different points in their careers. This data covers only the 2023-24 season; however, it is still important to weigh these factors to see what makes the best players. I am most interested in how stats vary by position. Bar graphs let us compare the different statistics by position.
Data Wrangling
I counted how many players are in each position to ensure an accurate analysis. Only one player is listed as a combined small forward and power forward (SF-PF), so this player will be ignored in this analysis.
# A tibble: 6 × 2
  Pos       n
  <chr> <int>
1 C        89
2 PF       93
3 PG       82
4 SF      104
5 SF-PF     1
6 SG      108
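The counting and filtering step described above could be sketched roughly as follows. This is a minimal illustration, not the original code: the data frame name `nba` and the placeholder player names are assumptions, and only the `Player` and `Pos` columns from the data dictionary are used.

```r
library(dplyr)

# Toy stand-in for the Kaggle table (assumed name: nba); the real data has 477 rows.
nba <- tibble::tibble(
  Player = c("Player A", "Player B", "Player C", "Player D", "Player E"),
  Pos    = c("C", "C", "PG", "PG", "SF-PF")
)

# Count players per position to spot positions with too few players.
pos_counts <- nba %>% count(Pos)

# Drop the single combined SF-PF player before comparing positions.
nba_filtered <- nba %>% filter(Pos != "SF-PF")
```

Filtering on the position label keeps the rule explicit; filtering on a minimum count per position would be an alternative if more sparse positions appeared.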
Free Throw Percentage by Position
I have used a bar graph to display this data because it summarizes a statistic across groups clearly. Centers have the lowest free throw percentage, while the small forward/power forward combo has the highest. Excluding that combo because it contains only one player, point guards have the highest free throw percentage.
Warning: Removed 86 rows containing non-finite values (`stat_summary()`).
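The warning above indicates that rows with a missing free throw percentage were dropped before the bars were computed. A minimal sketch of how such a chart might be built, and how the missing values could be handled explicitly, is shown here. The data frame `nba` and its values are made up for illustration; only the column names `Pos` and `FT%` come from the data dictionary.

```r
library(dplyr)
library(ggplot2)

# Toy data with one missing FT% value (values are illustrative only).
nba <- tibble::tibble(
  Pos   = c("C", "C", "PG", "PG", "SG"),
  `FT%` = c(0.60, NA, 0.80, 0.90, 0.75)
)

# Explicit summary: mean FT% per position, plus how many values were missing.
ft_by_pos <- nba %>%
  group_by(Pos) %>%
  summarise(
    mean_ft   = mean(`FT%`, na.rm = TRUE),
    n_missing = sum(is.na(`FT%`))
  )

# Bar chart of the mean per position; na.rm = TRUE drops missing rows silently.
p <- ggplot(nba, aes(x = Pos, y = `FT%`)) +
  stat_summary(fun = mean, geom = "bar", na.rm = TRUE) +
  labs(x = "Position", y = "Mean free throw percentage")
```

Reporting `n_missing` alongside the mean makes it clear how many players each bar is actually based on.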
Field Goal Percentage by Position
Centers have the highest field goal percentage, while the small forward/power forward combo has the lowest. Excluding that combo because it contains only one player, small forwards have the lowest field goal percentage.
Warning: Removed 12 rows containing non-finite values (`stat_summary()`).
Three Point Percentage by Position
Shooting guards have the highest three point percentage, while the small forward/power forward combo has the lowest. Excluding that combo because it contains only one player, centers have the lowest three point percentage.
Warning: Removed 63 rows containing non-finite values (`stat_summary()`).
Two Point Percentage by Position
Point guards have the lowest two point percentage, while the small forward/power forward combo has the highest. Excluding that combo because it contains only one player, centers have the highest two point percentage.
Warning: Removed 26 rows containing non-finite values (`stat_summary()`).
Effective Field Goal Percentage by Position
Effective field goal percentage accounts for the fact that some shots count for 3 points while others count for only 2. Centers have the highest effective field goal percentage, while the small forward/power forward combo has the lowest. Excluding that combo because it contains only one player, small forwards have the lowest effective field goal percentage.
Warning: Removed 12 rows containing non-finite values (`stat_summary()`).
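For reference, effective field goal percentage is commonly defined as (FG + 0.5 × 3P) / FGA, using the column names from the data dictionary. The short sketch below works through one hypothetical stat line; the function name `efg` and the numbers are illustrative, not taken from the dataset.

```r
# Effective field goal percentage: made threes get an extra half credit.
efg <- function(fg, three_p, fga) {
  (fg + 0.5 * three_p) / fga
}

# Hypothetical per-game line: 8 made field goals, 2 of them threes, on 16 attempts.
plain_fg_pct <- 8 / 16         # 0.5
efg_pct      <- efg(8, 2, 16)  # (8 + 0.5 * 2) / 16 = 0.5625
```

A player who makes no threes has the same FG% and eFG%, which is one reason centers can lead both categories.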
Average Points by Position
Point guards have the highest average points per game, while the small forward/power forward combo has the lowest. Excluding that combo because it contains only one player, small forwards have the lowest average points per game.
Analysis
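This analysis of NBA statistics from the beginning of the 2023-24 NBA season shows that centers have the highest field goal, two point, and effective field goal percentages, but also the lowest free throw percentage and, among positions with more than one player, the lowest three point percentage. There is great variation between positions across the different categories.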
Data Scraping
I would like to understand which statistics matter most for NBA players to be successful, such as shooting percentages, points, blocks, and steals. The question is: which stats are the most important for each position? I intend to answer this question by scraping the NBA statistics on nbastuffer.com. This data is suitable because it is updated often and is accurate. By comparing each player's rank with their stats, I can learn what makes one player rank higher than another.
NBA stats categories: RANK, NAME, TEAM, POS, AGE, GP, MPG, USG%, TO%, FTA, FT%, 2PA, 2P%, 3PA, 3P%, eFG%, TS%, PPG, RPG, APG, SPG, BPG, TPG, P+R, P+A, P+R+A, VI, ORtg, and DRtg
I will be focusing on RANK, NAME, TEAM, POS, AGE, GP, MPG, FT%, 2P%, 3P%, eFG%, and PPG
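A scrape of this kind might look roughly like the sketch below, which uses the rvest package. To stay self-contained it parses a small inline HTML table instead of the live page; the table contents are made up, and the assumption that the real page exposes its stats as an HTML `<table>` with these header names has not been checked here.

```r
library(rvest)
library(dplyr)

# Stand-in for the downloaded page; with a real site this would come from
# read_html() on the page URL. Rows and values below are illustrative only.
page <- minimal_html('
  <table>
    <tr><th>RANK</th><th>NAME</th><th>POS</th><th>FT%</th><th>PPG</th><th>VI</th></tr>
    <tr><td>1</td><td>Player A</td><td>C</td><td>0.880</td><td>30.1</td><td>12.0</td></tr>
    <tr><td>2</td><td>Player B</td><td>G</td><td>0.790</td><td>28.4</td><td>11.5</td></tr>
  </table>')

# Pull the first table into a tibble and keep only the columns of interest.
stats <- page %>%
  html_element("table") %>%
  html_table() %>%
  select(RANK, NAME, POS, `FT%`, PPG)
```

Selecting columns by name right after parsing keeps the later per-position summaries independent of how many extra columns the site publishes.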
Free Throw Percentage by Position NBA Stuffer
The position with the highest free throw percentage is the combination forward and center position. The position with the lowest free throw percentage is the combination guard and forward position.
Three Point Percentage by Position NBA Stuffer
The position with the highest three point percentage is the combination forward and guard position. The position with the lowest three point percentage is center.
Two Point Percentage by Position NBA Stuffer
The position with the highest two point percentage is the combination forward and center position, and the position with the lowest is guard.
Effective Field Goal Percentage by Position NBA Stuffer
The position with the highest effective field goal percentage is the combination forward and center position, and the position with the lowest is forward.
Average Points Per Game by Position NBA Stuffer
The position with the highest average points per game is guard, and the position with the lowest is the combination guard and forward position.
Final results for the most important statistics:
Centers: Effective Field Goal Percentage, Points Per Game
Forwards: Points Per Game
Forward-Centers: Free Throw Percentage, Two Point Percentage, Effective Field Goal Percentage
Forwards-Guards: Three Point Percentage
Guards: Free Throw Percentage, Three Point Percentage, Points Per Game
Conclusions
Scraping NBA data has taught me that free throw percentage is one of the most important stats for a player to have.
Sentiment Analysis
I will be using data for the home arenas of the two best players identified from NBA Stuffer. The two best players are Joel Embiid (Philadelphia) and Luka Doncic (Dallas). Philadelphia plays at the Wells Fargo Center and Dallas plays at the American Airlines Center. I will be using Trip Advisor reviews to conduct this research and answer the questions below.
Question #1
Are customers of the Wells Fargo Center or the American Airlines Center more satisfied?
Data #1
I intend to collect review data from Trip Advisor for both stadiums to evaluate the sentiment to see what customers are saying about both arenas. This data will provide evidence of which arena is preferred by seeing which is more positive. I will be using the NRC Lexicon.
Visualization #1
In this bar chart, the Wells Fargo Center is perceived as more negative than the American Airlines Center because many more negative words are associated with its reviews.
Question #2
Are the reviews of the Wells Fargo Center or the American Airlines Center more positive?
Data #2
I intend to collect review data from Trip Advisor for both stadiums to evaluate the sentiment to see what customers are saying about both arenas. This data will provide evidence of which arena is preferred by seeing which is more positive. I will be using the Bing lexicon.
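A Bing-based word count of this kind could be sketched with the tidytext package as follows. The two review texts are invented placeholders, not real Trip Advisor reviews, and the object names are assumptions.

```r
library(dplyr)
library(tidytext)

# Placeholder reviews (made up for illustration), one per arena.
reviews <- tibble::tibble(
  arena = c("Wells Fargo Center", "American Airlines Center"),
  text  = c("Great seats but terrible parking",
            "Friendly staff and a clean arena")
)

# Split reviews into words, keep words found in the Bing lexicon,
# and count positive and negative words per arena.
bing_counts <- reviews %>%
  unnest_tokens(word, text) %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(arena, sentiment)
```

Words that are not in the lexicon are dropped by the inner join, so arena-specific terms such as "parking" only affect the counts if the lexicon scores them.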
Visualization #2
This visualization shows that only a few words are used more than once across the two arenas' reviews. The American Airlines Center has only two unique negative words; however, one of them is "concession", which most likely refers to a concession stand and does not carry a negative connotation. The Wells Fargo Center has four unique negative words. In total, the American Airlines Center has 11 positive words and 3 negative words that count, while the Wells Fargo Center has 19 positive words and 8 negative words. In net terms (19 - 8 = 11 versus 11 - 3 = 8), this means that despite having more negative words, the Wells Fargo Center's reviews are more positive.
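The comparison of word counts above can be worked through in a few lines. The counts are the ones quoted in the text; the two summary measures (net count and share of positive words) are one possible way to compare them, not necessarily the calculation behind the chart.

```r
# Positive and negative word counts as reported in the text.
counts <- data.frame(
  arena    = c("American Airlines Center", "Wells Fargo Center"),
  positive = c(11, 19),
  negative = c(3, 8)
)

# Net sentiment: positive minus negative words.
counts$net <- counts$positive - counts$negative

# Share of sentiment-bearing words that are positive.
counts$share_positive <- counts$positive / (counts$positive + counts$negative)
```

The net count favors the Wells Fargo Center (11 versus 8), while the positive share is higher for the American Airlines Center (about 0.79 versus 0.70), so the ranking depends on which measure is used.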
Question #3
Which parts of the Wells Fargo Center and American Airlines Center reviews are more positive or negative, and which arena's reviews are more positive overall?
Data #3
I intend to collect review data from Trip Advisor for both stadiums to evaluate the sentiment to see what customers are saying about both arenas. This data will provide evidence of which arena is preferred by seeing which is more positive. I will be using the Bing lexicon and a chronological comparison of valence.
Visualization #3
The visualization shows that the reviews usually start out positive and quickly turn negative just to repeat the process over again. It appears that the Wells Fargo Center has more positive reviews overall.
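A chronological comparison of valence along these lines could be sketched as below: the words of each arena's reviews are numbered in order, grouped into fixed-size blocks, and scored with the Bing lexicon. The review texts are invented, and the block size of 5 words is an arbitrary assumption chosen to suit the tiny example.

```r
library(dplyr)
library(tidytext)

# Placeholder review text per arena (made up for illustration).
reviews <- tibble::tibble(
  arena = c("Wells Fargo Center", "American Airlines Center"),
  text  = c("Great seats and friendly staff but terrible parking and awful traffic after the game",
            "Clean arena with friendly ushers and great sound")
)

valence <- reviews %>%
  unnest_tokens(word, text) %>%
  group_by(arena) %>%
  # Assign each word to a block of 5 consecutive words (assumed block size).
  mutate(index = (row_number() - 1) %/% 5) %>%
  ungroup() %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  # Score positive words +1 and negative words -1, then sum per block.
  mutate(score = if_else(sentiment == "positive", 1, -1)) %>%
  group_by(arena, index) %>%
  summarise(net = sum(score), .groups = "drop")
```

Plotting `net` against `index` for each arena then shows where in the text sentiment rises or falls. Setting `.groups = "drop"` and `by = "word"` explicitly also avoids the informational messages that dplyr otherwise prints.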
Conclusions
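The NRC comparison associates more negative words with the Wells Fargo Center, but the Bing word counts and the chronological comparison of valence both suggest that its reviews are more positive overall. Within the reviews, sentiment usually starts out positive and quickly turns negative before the cycle repeats.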
Final Conclusions
The data presented suggests that guards are the most effective players on the court. The team with the best player also has more positive reviews for its home arena. This could mean that players with the most supportive fans perform better on the court.