Sports are something many people, myself included, love and grow up playing. One sport holds a special place in my heart: basketball. I grew up watching and playing basketball, and I have always wondered what leads players to score more points and whether scoring more wins a championship. With that in mind, I wanted to take a deeper dive into what matters when it comes to scoring by asking a few questions:
The following document was created to better understand NBA statistics and results and to use statistical analysis to identify what affects the number of points scored.
The data used for this analysis was provided by a user on Kaggle and originally comes from Basketball Reference's website. The data set provides various statistics for players in the 2017-2018 NBA season.
You can view the data set from either website:
https://www.kaggle.com/datasets/mcamli/nba17-18
https://www.basketball-reference.com/leagues/NBA_2018_totals.html
The data set contains 30 variables and 664 rows. Each row represents a player who played in the NBA in the 2017-2018 season. Although there are 664 rows, there were not 664 players. Players who were traded during the season and played on multiple teams have a distinct row for each team they played on, holding the statistics they had while on that team. These players also have a row under team TOT, which lists their total statistics for the season. Also note that a player’s rank is based on last name (A-Z) and acts as a unique key for each player.
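As a minimal sketch of how this row structure can be handled, the base R snippet below counts distinct players and separates season totals from team-level rows. The data frame is a made-up stand-in (hypothetical players, team codes, and numbers); only the column names `Rk`, `Player`, `Tm`, and `PTS` come from the data set.

```r
# Toy stand-in for the player table; players, teams, and numbers are made up.
player_stats <- data.frame(
  Rk     = c(1, 2, 2, 2, 3),
  Player = c("Player A", "Player B", "Player B", "Player B", "Player C"),
  Tm     = c("AAA", "TOT", "BBB", "CCC", "DDD"),
  PTS    = c(500, 300, 120, 180, 700)
)

# Number of distinct players: Rk is unique per player, not per row.
n_players <- length(unique(player_stats$Rk))

# Season totals, one row per player: keep the TOT row for traded players
# and the single team row for everyone else.
traded <- player_stats$Rk %in% player_stats$Rk[player_stats$Tm == "TOT"]
season_totals <- player_stats[!traded | player_stats$Tm == "TOT", ]

# Team-level view: drop TOT rows so no points are counted twice.
team_rows <- player_stats[player_stats$Tm != "TOT", ]
```

Either view gives the same league-wide point total; which one to use depends on whether the question is about players or about teams.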
You can view the full data set, a data dictionary to help explain each variable, and a few summary statistics from the data in the tables below.
Listed below are each of the variables in the data set, as well as definitions of each variable.
| Variable | Description |
|---|---|
| Rk | Rank of the player |
| Player | Name of the player |
| Pos | Position that a player plays |
| Age | Age of the player |
| Tm | Team that the player is on |
| G | Number of games played in |
| GS | Number of games started |
| MP | Number of minutes played |
| FG | Number of field goals made |
| FGA | Number of field goals attempted |
| FG% | Field goal shooting percentage |
| 3P | Number of 3 point shots made |
| 3PA | Number of 3 point shots attempted |
| 3P% | 3 point shooting percentage |
| 2P | Number of 2 point shots made |
| 2PA | Number of 2 point shots attempted |
| 2P% | 2 point shooting percentage |
| eFG% | Effective field goal percentage |
| FT | Number of free throws made |
| FTA | Number of free throws attempted |
| FT% | Free throw shooting percentage |
| ORB | Number of offensive rebounds |
| DRB | Number of defensive rebounds |
| TRB | Total number of rebounds |
| AST | Number of assists |
| STL | Number of steals |
| BLK | Number of blocks |
| TOV | Number of turnovers |
| PF | Number of personal fouls |
| PTS | Number of points scored |
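
Several of the variables above are tied together by simple identities. As a rough sketch using the standard basketball definitions, the snippet below derives 2P, FG%, eFG%, and PTS from the other columns. The numbers are made up for illustration, and `shots` is a hypothetical data frame name.

```r
# Toy shooting lines; the values are made up for illustration.
# check.names = FALSE keeps column names such as `3P` as they appear in the data.
shots <- data.frame(
  FG   = c(10, 6),
  FGA  = c(20, 15),
  `3P` = c(4, 0),
  FT   = c(5, 3),
  check.names = FALSE
)

# Every made field goal is either a 2-pointer or a 3-pointer.
shots$`2P` <- shots$FG - shots$`3P`

# Field goal percentage: makes divided by attempts.
shots$`FG%` <- shots$FG / shots$FGA

# Effective field goal percentage gives a made 3-pointer 1.5x the weight
# of a made 2-pointer.
shots$`eFG%` <- (shots$FG + 0.5 * shots$`3P`) / shots$FGA

# Points: 2 per 2-pointer, 3 per 3-pointer, 1 per free throw.
shots$PTS <- 2 * shots$`2P` + 3 * shots$`3P` + shots$FT
```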
Below is a summary table of a few key statistics to help understand the NBA player data at a high level.
The first question I wanted to look into was whether a certain position scores the most points. If teams are looking to score more, is there a position they should target in order to gain more production?
Below is a bar chart showing the number of points scored throughout the season for each position. As you can see, one position contributes more points than the others: shooting guard (SG). Shooting guards contributed around 65,000 points, while the next highest position, point guard (PG), contributed around 55,000. Point guards are not far ahead of the next positions, though, with power forwards (PF) and centers (C) both over the 50,000 point mark. Small forwards (SF) contributed the fewest points, just under 40,000, and are the position that stands out the most on the graph.
Some might ask whether this is simply due to the number of players at each position, with more players leading to more points. To answer this question, I created a table with the number of players per position, the total number of points scored, and the average number of points per player at each position. The table shows that SG averaged 503 points, PF averaged 490 points, C averaged 486 points, PG averaged 481 points, and SF averaged 437 points. PG was the one position out of line with its total: it ranked second in total points but only fourth in average points per player, so its high total reflects the number of players at the position.
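A table like this one can be sketched in base R as follows. This is an illustration rather than the exact code behind the table: the rows are made up, and `team_rows` and `by_pos` are hypothetical names.

```r
# Toy player rows; team codes and point totals are made up.
player_stats <- data.frame(
  Pos = c("SG", "SG", "PG", "PG", "PG", "C"),
  Tm  = c("AAA", "BBB", "TOT", "CCC", "DDD", "EEE"),
  PTS = c(600, 400, 300, 100, 200, 450)
)

# Drop TOT rows so traded players are not counted twice.
team_rows <- player_stats[player_stats$Tm != "TOT", ]

# Rows and total points per position.
players <- table(team_rows$Pos)
total   <- tapply(team_rows$PTS, team_rows$Pos, sum)

by_pos <- data.frame(
  Pos     = names(total),
  players = as.vector(players[names(total)]),
  total   = as.vector(total)
)

# Average points per row, which separates head count from production.
by_pos$average <- by_pos$total / by_pos$players

# Sort from the highest total to the lowest.
by_pos <- by_pos[order(-by_pos$total), ]
```

Comparing the `total` and `average` columns side by side shows whether a position's total is driven by how many players it has or by how much each one scores.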
Based on this graph and analysis, if you are trying to score more points you should look to pick up an SG, because they score the most points on average. Conversely, teams should avoid picking up an SF, because they score the fewest points on average. Is position the only factor that goes into the number of points scored? I address this question later in the document, in the predictive analysis section.
*** Excludes rows with TOT (the combined totals for players on more than one team) as the team
Another question I was curious about was whether field goal attempts and field goal percentage are positively related. Teams could use this to see whether certain players need to take more shots in order to score more points, or whether they need to take fewer shots and pass the ball.
Below are two graphs, broken down by team, that help visualize the number of field goal attempts each player takes along with their field goal percentage. Each team could take its own graph further by labeling each point with the player it represents, making the relationship between attempts and percentage clear for each player.
Most teams had at least one player who attempted five shots or fewer, which can lead to a very high or very low field goal percentage. Some of the teams with the highest field goal attempts are the Cavaliers (CLE), Pelicans (NOP), Rockets (HOU), Thunder (OKC), Trail Blazers (POR), and Wizards (WAS). Next, I took a look at field goal percentages. To gain a better understanding, I kept only players who took 100 shots or more. Among these players, the teams with players who had high field goal percentages (.6 or higher) were DAL, DEN, GSW, HOU, MEM, PHO, UTA, LAC, LAL, OKC, and TOR.
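The filtering step can be sketched like this. The players, team codes, and shot counts are made up; the 100-attempt and .6 thresholds are the ones described above, and `FG_pct` is a hypothetical column name standing in for FG%.

```r
# Toy shooting data; players, teams, and counts are made up.
shooters <- data.frame(
  Player = c("Player A", "Player B", "Player C", "Player D"),
  Tm     = c("AAA", "AAA", "BBB", "BBB"),
  FG     = c(3, 330, 200, 45),
  FGA    = c(4, 500, 450, 100)
)
shooters$FG_pct <- shooters$FG / shooters$FGA

# Keep only players with 100 or more attempts, so that a 3-for-4 line
# does not show up as a 75% shooter.
qualified <- shooters[shooters$FGA >= 100, ]

# Among qualified players, find those shooting .6 or better and their teams.
high_pct <- qualified[qualified$FG_pct >= 0.6, ]
teams_with_high_pct <- sort(unique(high_pct$Tm))
```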
*** Excludes rows with TOT (the combined totals for players on more than one team) as the team
Next, I wanted to take a look at the number of points each team scored over the course of the season and how that scoring broke down by position. I wanted to see whether teams relied more on one position than another and where that left them in terms of total points scored.
The bar chart below shows which teams scored the most points and which positions scored the most and fewest points for them. The Golden State Warriors scored the most points and also won the NBA Finals. Their PFs scored the most points for them, with just over 2,500. This does not come as a surprise to me because their PFs include Kevin Durant and Draymond Green, two of the team's top five scorers. The other positions on the team were fairly close to one another, with only SF scoring significantly fewer points. I believe this is due to the offense they rely on, which emphasizes shooting 3s rather than driving the ball to the basket.
How does this compare to the team that scored the fewest points? The Sacramento Kings scored the fewest, with the Memphis Grizzlies within 40 points of them. These two teams were among the worst in the NBA. In order to win games, you have to be able to score points. The two teams differed in where their scoring fell short. The Kings did not get as much production from the PF position as the teams that made the playoffs, whereas the Grizzlies did not get production from their SGs. In fact, the Grizzlies had one of the worst SG groups in the NBA. Based on this graph and other analysis (see secondary data), teams that score more points tend to win more games.
*** Excludes rows with TOT (the combined totals for players on more than one team) as the team
Next, I wanted to take a look at the number of rebounds each team got over the year. This seemed important because the more rebounds you get, the more possessions you have, and the more possessions you have, the more chances you have to score.
Below is a bar graph that shows the total number of rebounds each team had and the number of offensive rebounds its players had. I have also included a table that breaks down the offensive, defensive, and total rebounds shown in the graph. The Philadelphia 76ers had the highest total number of rebounds with 3,889, but when it came to total points they were sixth in the NBA.
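One way to build the rebound table and its league ranks is sketched below in base R. The team codes and rebound counts are made up, and `by_team`, `TRB_rank`, and `ORB_rank` are hypothetical names.

```r
# Toy rebound rows, one per player; team codes and counts are made up.
rebounds <- data.frame(
  Tm  = c("BBB", "BBB", "AAA", "AAA", "CCC"),
  ORB = c(100, 50, 40, 30, 90),
  DRB = c(300, 250, 280, 260, 200)
)
rebounds$TRB <- rebounds$ORB + rebounds$DRB

# Sum offensive, defensive, and total rebounds for each team.
by_team <- aggregate(cbind(ORB, DRB, TRB) ~ Tm, data = rebounds, FUN = sum)

# Rank 1 is the team with the most rebounds in that category.
by_team$TRB_rank <- rank(-by_team$TRB, ties.method = "min")
by_team$ORB_rank <- rank(-by_team$ORB, ties.method = "min")
```

Placing the two rank columns next to a team's rank in total points makes it easy to see when strong rebounding and strong scoring do not go together.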
The Warriors scored the most points in the NBA this season but were sixteenth in total rebounds and twentieth in offensive rebounds, which puts them in the bottom half of the league in offensive rebounding. This suggests that a team does not need a high number of offensive rebounds to lead the league in scoring. The lack of offensive rebounds could be due to a small lineup, or to shooting the ball so well that there were few opportunities for offensive rebounds. Either way, offensive rebounding does not seem to determine whether a team can score the most points in the NBA.
*** Excludes rows with TOT (the combined totals for players on more than one team) as the team
One of my final questions was whether age has some impact on the number of points scored. I mainly wanted to see if there was one age group that contributes the most to total scoring.
I created two bar graphs and a table. The first graph shows the total number of players at each age, and the second shows the total number of points for those ages. The table shows the average age of each team.
When it comes to the age of players in the NBA, we can see that there are more younger players than older ones. This does not come as a surprise, as these younger athletes are in their prime, and as players get older they get injured more or decide that it is time to retire.
Looking at the table of average team ages, the Golden State Warriors and the Houston Rockets are tied as the fourth oldest teams in the NBA, at an average of 28 years. The Cleveland Cavaliers are the oldest team, with an average age of 29. These three teams were also the top three in total points scored, which suggests that an older team will not necessarily score fewer points. This could be because they have a good mix of younger and older players, with the older players passing their experience of the game on to the younger ones.
How do all of these compare to the Golden State Warriors who won the NBA Finals?
Below is an explanation of all of the variables that were used in the data.
| Variable | Description |
|---|---|
| game_id | Unique ID for each game |
| date_game | Date of the game |
| game_start_time | Time the game started |
| visitor_team_name | Name of the visiting team |
| visitor_pts | Number of points for the visiting team |
| home_team_name | Name of the home team |
| home_pts | Number of points for the home team |
| box_score_text | Text from the box score |
| overtimes | Number of overtimes the game went into |
| attendance | Number of people in attendance at the game |
| game_remarks | Game remarks that were made |
With the data that I scraped, I decided to add and remove some variables. I removed box_score_text and game_remarks because both columns are blank and would provide no insight into the data. I also added some variables that I thought might be useful. To distinguish playoff games from regular season games, I created a variable that flags each game based on the playoff start date of April 14th, 2018. I also created two columns that display the winner and loser of each game so the outcome is easy to identify.
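These cleaning steps might look something like the sketch below. The games are made up, and the new column names `game_type`, `winner`, and `loser` are assumptions; the original column names and the April 14th, 2018 cutoff come from the description above.

```r
# Toy game log; dates, teams, and scores are made up.
games <- data.frame(
  date_game         = as.Date(c("2018-04-10", "2018-04-14", "2018-05-01")),
  visitor_team_name = c("Team A", "Team B", "Team C"),
  visitor_pts       = c(101, 95, 110),
  home_team_name    = c("Team D", "Team E", "Team F"),
  home_pts          = c(99, 104, 108),
  box_score_text    = NA,
  game_remarks      = NA
)

# Drop the two blank columns.
games$box_score_text <- NULL
games$game_remarks   <- NULL

# Flag playoff games: anything on or after the playoff start date.
playoff_start <- as.Date("2018-04-14")
games$game_type <- ifelse(games$date_game >= playoff_start,
                          "Playoffs", "Regular Season")

# Winner and loser of each game (NBA games cannot end in a tie).
home_won <- games$home_pts > games$visitor_pts
games$winner <- ifelse(home_won, games$home_team_name, games$visitor_team_name)
games$loser  <- ifelse(home_won, games$visitor_team_name, games$home_team_name)
```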
The next few tables break down the number of wins and losses for each team over the course of the 2017-2018 season. These tables relate back to the graph that shows the number of wins each team achieved and the total number of points they scored.
Because I was looking into the Golden State Warriors, I wanted to see how they did in terms of wins and losses compared to the teams in their conference and their division. The tables show each team along with its conference, division, wins, losses, and win percentage. The graph shows only the number of wins, grouped by conference.
When looking at the Western Conference, we see that there are only two true contenders: the Houston Rockets (76 wins) and the Golden State Warriors (74 wins). The next highest win total after these two is 53, so there is roughly a 20 game gap between second and third. This is quite substantial compared to the Eastern Conference. Having only these two teams as contenders in the West probably gave the Warriors an easier road to the Finals than some teams in the Eastern Conference had, because the East is much more packed at the top than the West. There are roughly four or five teams in the East that you could consider contenders, as opposed to the two in the West. If we took the 20 game gap between second and third in the West and applied it to the East, it would cover first through sixth place. This shows how much more competitive the East was than the West.
Now, how did the Warriors get so many wins? To answer this question, I took a look at their division. The Warriors are in the Pacific Division with the Los Angeles Clippers, Los Angeles Lakers, Sacramento Kings, and Phoenix Suns. In the 2017-2018 season none of these other teams were very good, with the highest win total among them being 42. That is a .512 win percentage, meaning that team won just over half of its games. Being in this division gave the Warriors some “easy” wins and helped them reach their 74 wins.
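The win and loss tables can be derived from per-game winner and loser columns. The sketch below uses made-up results and hypothetical names (`results`, `standings`), and the last line checks the .512 figure under the assumption of an 82-game regular season.

```r
# Toy results, one row per game; teams and outcomes are made up.
results <- data.frame(
  winner = c("Team A", "Team A", "Team B", "Team A"),
  loser  = c("Team B", "Team C", "Team C", "Team B")
)

# Use a shared set of levels so teams with zero wins or losses still appear.
teams  <- sort(unique(c(results$winner, results$loser)))
wins   <- table(factor(results$winner, levels = teams))
losses <- table(factor(results$loser,  levels = teams))

standings <- data.frame(
  team   = teams,
  wins   = as.vector(wins),
  losses = as.vector(losses)
)

# Win percentage: wins divided by games played.
standings$win_pct <- round(standings$wins / (standings$wins + standings$losses), 3)

# Check of the figure quoted above: 42 wins in 82 games.
division_best_pct <- round(42 / 82, 3)
```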
After looking at both the graph and the tables, it is evident that the more points a team is able to score, the more games it tends to win. This is not the case only for the Golden State Warriors. Other teams in the NBA also scored a large number of points, like the Houston Rockets (76 wins and first in the Western Conference) and the Toronto Raptors (63 wins and second in the Eastern Conference).
The next graph will help provide more insight into how many points each team was able to score. In the descriptive analysis, there is another graph that broke down the number of points each team scored during the season and also how many points were scored by each position.
With 74 wins in the season, did the Warriors get lucky or did they just dominate the competition? I looked into the total number of points that each team scored at home and on the road. The tables below show that the Warriors are at the top in both categories. At home the Warriors scored 5,902 points and on the road they scored 5,721. It makes sense that they scored more points at home, as they would be more comfortable playing there. What came as a surprise to me was the number of points on the road. I would have expected this number to be a little lower, given the constant travel and the injuries they had to deal with all season. Being at the top of both categories helps explain why they were able to gather so many wins, but again, was it because they were good or because they were playing teams that were not so good? Looking over their schedule, I believe it is a mix of both. I believe they were challenged at times by their competition but, as in all major league sports, they also played teams that were terrible. When they played the better teams, they were still able to score a large number of points.
In the 2017-2018 season, the Golden State Warriors were one of the best shooting teams. Along with the tables above, if you go back and look at the descriptive analysis graphs of field goal percentage by team, all of their players except one shot 37.5% or higher. The only other teams that were able to do this were the Toronto Raptors and the Denver Nuggets. All of these teams finished several games above .500. Even though the Nuggets had good shooters, they were not able to score as many points as other teams, which may help explain why they finished with only 46 wins. This goes to show that having good shooters is not enough: you have to put the ball in the basket often enough to win games.
After examining both the original data and the secondary data, we can fit a multiple linear regression for points using position, field goals attempted, free throws attempted, and age as predictors. The results are below.
The equation for the regression is below, where position enters as a set of indicator variables with center (C) as the baseline: \[Points_i = \beta_0 + \beta_1 position_i + \beta_2 fieldgoalattempts_i + \beta_3 freethrowattempts_i + \beta_4 age_i + \varepsilon_i\]
##
## Call:
## lm(formula = PTS ~ Pos + FGA + FTA + Age, data = player_stats)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -179.235  -14.887    0.605   14.881  214.459
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -25.440275  10.697930  -2.378 0.017691 *
## PosPF        -8.343130   5.163123  -1.616 0.106597
## PosPG       -29.446243   4.957205  -5.940 4.63e-09 ***
## PosPG-SG    -20.694841  39.961559  -0.518 0.604725
## PosSF       -16.930531   5.261058  -3.218 0.001354 **
## PosSF-SG    -34.284086  28.373983  -1.208 0.227371
## PosSG       -15.609788   4.975289  -3.137 0.001781 **
## FGA           1.050763   0.009942 105.687  < 2e-16 ***
## FTA           0.798926   0.032327  24.714  < 2e-16 ***
## Age           1.285978   0.377185   3.409 0.000691 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 39.8 on 654 degrees of freedom
## Multiple R-squared:  0.9922, Adjusted R-squared:  0.9921
## F-statistic:  9204 on 9 and 654 DF,  p-value: < 2.2e-16
The regression analysis shows that field goal attempts, free throw attempts, and age were all statistically significant in predicting the number of points. Each position coefficient is measured relative to centers (C), the baseline level, and position was significant for only some of the positions. The positions that did not differ significantly from centers were PF, PG-SG, and SF-SG. The one that comes as a surprise is PF, because it was the position that averaged the second most points per player. Something to note here is that this model covers only the 2017-2018 player statistics, and the results could change from season to season.
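To make the coefficients concrete, here is a small worked prediction that plugs the estimates from the output above into the regression equation. The helper `predict_pts` and the example player (1,000 field goal attempts, 300 free throw attempts, age 25) are hypothetical and only illustrate how the terms add up.

```r
# Coefficient estimates copied from the regression output above.
coefs <- c(
  "(Intercept)" = -25.440275,
  "PosPF"       = -8.343130,
  "PosPG"       = -29.446243,
  "PosPG-SG"    = -20.694841,
  "PosSF"       = -16.930531,
  "PosSF-SG"    = -34.284086,
  "PosSG"       = -15.609788,
  "FGA"         = 1.050763,
  "FTA"         = 0.798926,
  "Age"         = 1.285978
)

# Hypothetical helper: predicted season points for one player.
predict_pts <- function(pos, fga, fta, age) {
  # Centers are the baseline level, so they get no position adjustment.
  pos_term <- if (pos == "C") 0 else coefs[[paste0("Pos", pos)]]
  coefs[["(Intercept)"]] + pos_term +
    coefs[["FGA"]] * fga + coefs[["FTA"]] * fta + coefs[["Age"]] * age
}

# Made-up player: 1,000 field goal attempts, 300 free throw attempts, age 25.
center_pts <- predict_pts("C",  fga = 1000, fta = 300, age = 25)  # about 1297
guard_pts  <- predict_pts("SG", fga = 1000, fta = 300, age = 25)  # about 1282
```

With shot volume held fixed, the two predictions differ only by the SG coefficient (about 15.6 points), which shows how small the position effect is next to field goal attempts.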