Introduction and Explanation

The purpose of this document is to look at professional basketball (NBA) stats from the 2018-2019 season. I already know who the best players were that year (LeBron, Giannis, Kevin Durant, etc.), so instead I want to look at players who did not play in many games. There are 82 games in the regular season, so I’m going to look at players who played in 41 or fewer games (half the season or less) and see who was most productive. This could be valuable because younger players from the G League, or players who don’t get much playing time, can turn out to be valuable up-and-coming contributors.

Packages Loaded

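library(tidyverse)   # readr, dplyr, and the %>% pipe
library(dplyr)       # also attached by tidyverse
library(rvest)       # web scraping
library(knitr)       # kable() tables
library(kableExtra)  # table styling
library(XML)         # XML/HTML parsing
library(sqldf)       # SQL queries on data frames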

Data Import

bball <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/largentm_xavier_edu/EeNYcGKVcHZHpYHLMxHa0YsBuqR4t9b8AJmcx4GXygnlCg?download=1")

Data Cleaning

Remove the 26 repeated header rows. They exist so readers can see which stat is in each column while scrolling, but they are not data and are not needed for analysis.

bballstats <-
bball %>% 
  filter(`Player` != "Player")

All variables were imported as character type, so most of them have to be converted to numeric before analysis.

bballstats<-transform(bballstats, Age = as.numeric(Age))
bballstats<-transform(bballstats, G = as.numeric(G))
bballstats<-transform(bballstats, GS = as.numeric(GS))
bballstats<-transform(bballstats, MP = as.numeric(MP))
bballstats<-transform(bballstats, FG = as.numeric(FG))
bballstats<-transform(bballstats, FGA = as.numeric(FGA))
bballstats<-transform(bballstats, FG. = as.numeric(FG.))
bballstats<-transform(bballstats, X3P = as.numeric(X3P))
bballstats<-transform(bballstats, X3PA = as.numeric(X3PA))
bballstats<-transform(bballstats, X3P. = as.numeric(X3P.))
bballstats<-transform(bballstats, X2P = as.numeric(X2P))
bballstats<-transform(bballstats, X2PA = as.numeric(X2PA))
bballstats<-transform(bballstats, X2P. = as.numeric(X2P.))
bballstats<-transform(bballstats, eFG. = as.numeric(eFG.))
bballstats<-transform(bballstats, FT = as.numeric(FT))
bballstats<-transform(bballstats, FTA = as.numeric(FTA))
bballstats<-transform(bballstats, FT. = as.numeric(FT.))
bballstats<-transform(bballstats, ORB = as.numeric(ORB))
bballstats<-transform(bballstats, DRB = as.numeric(DRB))
bballstats<-transform(bballstats, TRB = as.numeric(TRB))
bballstats<-transform(bballstats, AST = as.numeric(AST))
bballstats<-transform(bballstats, STL = as.numeric(STL))
bballstats<-transform(bballstats, BLK = as.numeric(BLK))
bballstats<-transform(bballstats, TOV = as.numeric(TOV))
bballstats<-transform(bballstats, PF = as.numeric(PF))
bballstats<-transform(bballstats, PTS = as.numeric(PTS))
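The 26 transform() calls above could be collapsed into one step. The sketch below is a base-R alternative, not the original code: it assumes the column names match those used above (FG., X3P, and so on), and the toy data frame is hypothetical. One difference to keep in mind: transform() passes its result through data.frame(), which by default runs column names through make.names(); that is likely how names such as FG. and X3P arise. The lapply() version leaves names untouched.

```r
# Columns converted one at a time by the transform() calls above
num_cols <- c("Age", "G", "GS", "MP", "FG", "FGA", "FG.", "X3P", "X3PA",
              "X3P.", "X2P", "X2PA", "X2P.", "eFG.", "FT", "FTA", "FT.",
              "ORB", "DRB", "TRB", "AST", "STL", "BLK", "TOV", "PF", "PTS")

# Convert every listed column that is present from character to numeric
to_numeric_cols <- function(df, cols = num_cols) {
  cols <- intersect(cols, names(df))
  df[cols] <- lapply(df[cols], as.numeric)
  df
}

# Hypothetical two-row example standing in for bballstats
toy <- data.frame(Player = c("A", "B"), Age = c("22", "31"),
                  G = c("4", "60"), PTS = c("0", "1200"),
                  stringsAsFactors = FALSE)
toy <- to_numeric_cols(toy)
str(toy)
```

Applied to the real data, this would be bballstats <- to_numeric_cols(bballstats).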

Create a new data frame with only the players who appeared in 41 or fewer games (G < 42).

newbball <-
bballstats %>% 
  filter(`G` < 42)

Analysis

After filtering, 327 players remain for analysis.

Question 1

How many of these players did not score and how many opportunities did they get?
Player            PTS   G
Ron Baker         0     4
Andre Ingram      0     4
Ike Anigbogu      0     3
Donte Grantham    0     3
Okaro White       0     3
Tyler Zeller      0     2
Tyler Davis       0     1
Jawun Evans       0     1
John Holland      0     1
George King       0     1
Zach Lofton       0     1
Eric Moreland     0     1
Kobi Simmons      0     1
Ray Spalding      0     1
Tyler Ulis        0     1
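A table like the one above could be produced by keeping rows with zero points and sorting by games played. This is a base-R sketch, not the original code; it assumes newbball has the Player, PTS, and G columns created above. The sample rows reuse three players from the table plus one hypothetical scorer ("Player X") to show the filter at work.

```r
# Return players with zero points, sorted by games played (most first)
zero_scorers <- function(df) {
  # which() drops rows where PTS is NA instead of returning NA rows
  out <- df[which(df$PTS == 0), c("Player", "PTS", "G")]
  out[order(out$G, decreasing = TRUE), ]
}

# Three rows from the table above plus a hypothetical scorer
sample_df <- data.frame(
  Player = c("Tyler Ulis", "Ron Baker", "Player X", "Okaro White"),
  PTS = c(0, 0, 35, 0),
  G = c(1, 4, 10, 3),
  stringsAsFactors = FALSE
)
zs <- zero_scorers(sample_df)
zs
```

On the real data, nrow(zero_scorers(newbball)) gives the count of non-scorers.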

15 players did not score in any of their games. Did any of them try to score?

  • 6 players did not attempt a field goal. That’s a bad look for Andre Ingram, though.
    • The most games anyone in this group played was 4. For this small sample, there is a strong positive correlation between games played and field goals attempted. To see whether this holds more broadly, you could compare the R-squared values for this sample of 15 and for all 327 players who played 41 or fewer games, as shown in the bottom graph.

  • The larger sample still shows a positive correlation, but more players sit in the bottom left of the graph and there is more variability overall.
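The R-squared comparison suggested above could be sketched with lm(). For a simple linear regression with one predictor, R-squared equals the squared Pearson correlation, which the example checks. The sketch assumes G and FGA columns, as in the data above; the numbers are hypothetical.

```r
# R-squared of a simple linear fit of field goal attempts on games played
r_squared <- function(df) {
  fit <- lm(FGA ~ G, data = df)
  summary(fit)$r.squared
}

# Hypothetical values for illustration only
small <- data.frame(G = c(1, 1, 2, 3, 4), FGA = c(0, 1, 2, 3, 5))
rs <- r_squared(small)
rs
# With one predictor this matches the squared correlation
all.equal(rs, cor(small$G, small$FGA)^2)
```

Calling r_squared() on the 15 non-scorers and on all of newbball would give the two values to compare.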

Question 2

Who were the most productive players in their limited game appearances in terms of points and total rebounds?

  • Four players appear in both top-ten lists: Jabari Parker, Nikola Mirotić, Otto Porter, and DeMarcus Cousins. Those are recognizable names in the NBA, so their low game counts are probably due to injury or to a mid-season trade (each team stint is listed as its own row).

  • Another interesting insight is that point totals drop sharply from the top scorers to the top rebounders, while rebound totals drop only a little from the top rebounders to the top scorers. This suggests that players who rebound more may not score as much (perhaps because most of them rarely shoot three-pointers).

Points

  • This result shows the top ten players, ranked by points
    Player            G    PTS
    Victor Oladipo    36   675
    Kelly Oubre Jr.   40   674
    John Wall         32   663
    Jabari Parker     39   556
    Caris LeVert      40   547
    Nikola Mirotić    32   534
    Otto Porter       41   518
    Goran Dragić      36   494
    Tobias Harris     27   492
    DeMarcus Cousins  30   488

Rebounds

  • This result shows the top ten players, ranked by total rebounds (offensive + defensive)
    Player             G    TRB
    Nikola Mirotić     32   264
    JaMychal Green     41   252
    Kenneth Faried     37   250
    DeMarcus Cousins   30   247
    Bobby Portis       28   242
    Jabari Parker      39   241
    Kevin Love         22   239
    Otto Porter        41   231
    DeAndre Jordan     19   216
    Jonas Valančiūnas  30   216
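The two top-ten lists and their overlap could be computed as follows. This is a base-R sketch rather than the original code; top_n_by() is a hypothetical helper, and the players and numbers are made up, with n = 3 instead of 10 to keep the example small.

```r
# Return the n rows with the largest values in column `col`
top_n_by <- function(df, col, n = 10) {
  ranked <- df[order(df[[col]], decreasing = TRUE), ]
  head(ranked, n)
}

# Hypothetical stats for illustration only
players <- data.frame(
  Player = c("A", "B", "C", "D", "E"),
  PTS = c(600, 550, 300, 200, 500),
  TRB = c(100, 260, 250, 240, 90),
  stringsAsFactors = FALSE
)

top_pts <- top_n_by(players, "PTS", n = 3)  # A, B, E
top_trb <- top_n_by(players, "TRB", n = 3)  # B, C, D

# Players who appear in both lists
both <- intersect(top_pts$Player, top_trb$Player)
both
```

On newbball with n = 10, intersect() returns the players shared by the points and rebounds tables.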

Question 3

How much of a factor is age in playing time?

  • I am going to create a new data frame with only players who played more than 41 games. I’ll then compare the average ages of the top 10 players in these categories: total points, assists, total rebounds, and turnovers. 381 players are in this new data frame.
morebball <-
bballstats %>% 
  filter(`G` > 41) 

Overall, the average age is 25.9 for players who played 42 or more games and 26.4 for those who played 41 or fewer.

  • Interestingly, points is the only category where the top ten among players with 42+ games are older than the top ten among those with fewer games. For assists, total rebounds, and turnovers, the 42+ group is younger.

  • I think this makes sense. Players have to adjust to the NBA style of play in order to score on opponents, so younger players are more likely to pass to an older scorer, and they are better at rebounding because their bodies are not worn down yet. Likewise, turnovers are easy to commit at this level, so younger players commit them more often.

  • To take this further, a two-sample t-test could determine whether the difference between the two groups’ average ages in each category is statistically significant.
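One way to run the suggested check is R's t.test(), which by default performs a Welch two-sample t-test (it does not assume equal variances). The sketch assumes two numeric vectors of ages, such as the Age column of the top ten in one category for each group; the ages below are hypothetical. With only ten players per group, the test has limited power to detect small differences.

```r
# Hypothetical ages of the top ten in one category, by group
few_games  <- c(24, 25, 26, 27, 28, 26, 25, 27, 29, 25)
many_games <- c(23, 24, 25, 24, 26, 25, 23, 27, 24, 25)

# Welch two-sample t-test (t.test's default: var.equal = FALSE)
res <- t.test(few_games, many_games)
res$estimate  # the two group means
res$p.value
```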

Average age of the top ten players in each category

Category          41 or fewer games   42+ games
Points            26.2                27.4
Assists           27.0                25.5
Total Rebounds    26.9                25.6
Turnovers         26.2                24.0
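The averages above come from taking the players with the highest totals in each category and averaging their ages. A sketch of that computation, assuming newbball and morebball contain Age, PTS, AST, TRB, and TOV columns; mean_age_top() is a hypothetical helper, and the tiny data frame uses n = 2 so the result is easy to check by hand.

```r
# Mean age (rounded to one decimal) of the n players with the largest `col`
mean_age_top <- function(df, col, n = 10) {
  top <- head(df[order(df[[col]], decreasing = TRUE), ], n)
  round(mean(top$Age), 1)
}

# Hypothetical rows standing in for newbball or morebball
df <- data.frame(Age = c(22, 30, 25, 35),
                 PTS = c(100, 900, 800, 50),
                 AST = c(400, 10, 20, 300),
                 TRB = c(50, 300, 200, 100),
                 TOV = c(120, 60, 80, 20))

categories <- c("PTS", "AST", "TRB", "TOV")
# One mean age per category, using the top two rows in this small example
ages <- sapply(categories, function(cat) mean_age_top(df, cat, n = 2))
ages
```

Running it with n = 10 on newbball and again on morebball would give the two columns of the table.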

Question 4

Are there certain teams that have more players playing fewer games than others?

  • First, the chart below shows how many players from each team are on this list. Players who played for multiple teams are not counted in this chart.
  • The second chart shows the total number of players on each team throughout the season.

  • Next, I found that the average number of players per team who played 41 or fewer games was 10.55. This average treats TOT as one of the groups, which matches 327 players / 31 groups (30 teams plus TOT).
  • 14 groups were above this average: 13 teams plus TOT.
  • Cleveland being at the top of the list (tied with Memphis, not counting TOT) makes sense: without LeBron they just were not very good, and they were probably trying out players to find a good fit and sending them back to the G League when it did not work out.
  • It would be interesting to combine this data set with team-level statistics to look at winning percentage. I would expect teams that used fewer players during the season to do better, because a more stable roster would help team chemistry.
    Tm    Players (41 or fewer games)
    TOT   28
    CLE   19
    MEM   19
    WAS   18
    PHI   16
    HOU   14
    MIL   13
    PHO   13
    TOR   13
    CHI   11
    LAC   11
    LAL   11
    MIN   11
    NYK   11
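The per-team counts, their mean, and the above-average list could be computed with table(). This is a base-R sketch, not the original code; it assumes newbball has a Tm column in which combined rows for traded players are labeled TOT, and team_counts() and the sample labels are hypothetical. The drop_tot argument shows how the mean changes if TOT is not treated as a team.

```r
# Count rows per team, then list teams whose count exceeds the mean count
team_counts <- function(tm, drop_tot = FALSE) {
  if (drop_tot) tm <- tm[tm != "TOT"]
  counts <- sort(table(tm), decreasing = TRUE)
  avg <- mean(as.vector(counts))
  list(counts = counts,
       mean = avg,
       above = names(counts)[as.vector(counts) > avg])
}

# Hypothetical team labels for illustration only
tm <- c("CLE", "CLE", "CLE", "MEM", "MEM", "TOT", "TOT", "BOS")
res <- team_counts(tm)
res$mean   # 8 rows / 4 groups
res$above
```

On the real data, team_counts(newbball$Tm)$mean would correspond to the 10.55 figure if TOT is kept as a group.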