I590-Intro to Stats in R

Purpose

In NBA media today there is a lot of discourse around the three point shot. Should teams be taking as many threes as they are? Do they need to shoot more? Do three pointers ruin the “beauty” of basketball? Inside the NBA analyst Charles Barkley dumbed down today’s NBA like this: “Just jack up 3’s all night. They go in, we win. We miss, they lose.” This comes from a clip referencing the 2021 Dallas Mavericks, the worst three point shooting team in the league at the time (just over 33%), coming out and shooting 50 threes in a bad loss to the Golden State Warriors. On the surface this seems crazy, and Barkley seems justified in his distaste for the league. However, it’s not quite that simple.

The purpose of the analysis done throughout this project is to lay out 3 reasons why the NBA has seen such a dramatic increase in three point attempts (3PA). The 3 reasons I will discuss in further detail below are that the math makes sense now, that shooters have developed, and that big men are shooting it more than ever. Whether you believe teams should be shooting more threes or fewer, you won’t find an argument to boost your point here. The purpose is purely to lay out the reasons why the following graph looks like it does.

[Figure: three point attempts (3PA) over time, with a smoothed trend line]

Data Description

The data used in this analysis is season by season data for every player to have played in the NBA since 1980. A single observation represents one player’s stats for one NBA season. Some stats recorded include typical box score statistics such as points, rebounds, steals, turnovers, etc. Also included in the data set are more advanced stats such as offensive/defensive win shares, offensive/defensive box plus-minus, and VORP.

The primary columns used throughout this analysis are Decade, Year, Pos, 3PM, 3PA, 3P%, and height.

Reason 1 - The Math Makes Sense

In the realm of probabilities you always want to find your way into a situation with a positive expected value (EV). This means that over the course of whatever you are doing you are expected to finish better than you started. In basketball terms, finishing better means scoring more points. So is there a positive EV situation that would result in us scoring more points? Yes, there is, and the answer may surprise you. It’s the 3 point shot. The idea of taking more 3 point shots has been dragged through the mud by many members of the media, but analytics people have held strong in saying that taking more 3s makes sense over taking mid to long range 2s.

The argument against 3s: The style of play is less visually appealing, and every game comes down to makes and misses rather than X’s, O’s, and defense.

The argument for 3s: For the last 30 seasons the EV of a 3P attempt has been greater than 1 point per attempt, while the EV of a 2P attempt was less than 1 point per attempt from 1980 through 2017. Another thing to note is that the EV of a 3P attempt has been higher than that of a 2P attempt since 1992.
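To make the EV idea concrete, here is a minimal sketch of the arithmetic: the expected points from one attempt are the points awarded for a make times the probability of making it. The shooting percentages below are made up for illustration and are not taken from the project data, and the helper names `expected_value` and `break_even_3p` are my own.

```r
# Expected points per attempt = points for a make * probability of a make
expected_value <- function(make_pct, points) {
  make_pct * points
}

# Hypothetical shooting percentages, for illustration only
ev_three <- expected_value(make_pct = 0.36, points = 3)  # 1.08 points per attempt
ev_two   <- expected_value(make_pct = 0.48, points = 2)  # 0.96 points per attempt

# 3P% needed for a three to match the EV of a two at a given 2P%
break_even_3p <- function(two_pct) {
  two_pct * 2 / 3
}
break_even_3p(0.48)  # 0.32
```

Under these assumed numbers, a 48% two point shooter is matched by anyone hitting 32% of their threes, which is why a modest 3P% can still be a positive EV shot.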

Test 1

Hypothesis: Since 1980 the average EV for a 3PA is higher than the average EV of a 2PA

H0 : mean EV 3PA = mean EV 2PA (or mean EV 3PA - mean EV 2PA = 0)

Ha: mean EV 3PA > mean EV 2PA (or mean EV 3PA - mean EV 2PA > 0)

The data used in the following t-test is the EV for 3PAs and 2PAs for each team from every year since 1980.

test_result <- t.test(three_ev,two_ev, paired = FALSE, alternative = "greater", conf.level = 0.95)
test_result

## 
##  Welch Two Sample t-test
## 
## data:  three_ev and two_ev
## t = 5.6892, df = 1474.5, p-value = 7.683e-09
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  0.01780246        Inf
## sample estimates:
## mean of x mean of y 
## 1.0022214 0.9771723

Test 1 Conclusion

Result: Strong evidence to reject H0

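Conclusion: Based on the results of the t-test (p-value = 7.683e-09) there is an extremely low probability of obtaining results as or more extreme than the ones we saw if there truly were no difference between the EV of 2PAs and 3PAs. The estimated gap itself is modest, about 0.025 points per attempt (1.002 for 3PAs vs 0.977 for 2PAs).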
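For readers who want to see where the t, df, and p-value in the output come from, the sketch below rebuilds a one-sided Welch test by hand and checks it against `t.test()`. The vectors `x` and `y` are simulated stand-ins, not the project's `three_ev` and `two_ev`, so the numbers will not match the output above.

```r
set.seed(1)  # simulated data, not the project's three_ev / two_ev
x <- rnorm(200, mean = 1.00, sd = 0.10)
y <- rnorm(200, mean = 0.98, sd = 0.05)

# Welch t statistic: difference in means divided by its standard error
vx <- var(x) / length(x)
vy <- var(y) / length(y)
t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation to the degrees of freedom
df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# One-sided p-value for Ha: mean(x) > mean(y)
p_value <- pt(t_stat, df = df, lower.tail = FALSE)

# Compare with the built-in test (var.equal = FALSE is the default)
res <- t.test(x, y, alternative = "greater")
all.equal(unname(res$statistic), t_stat)
all.equal(unname(res$parameter), df)
all.equal(res$p.value, p_value)
```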

So while the style of play may not be enticing or “pretty” to watch, the math is simple. If you expect to earn more points over the long term by shooting a three instead of a two, why wouldn’t you? One important thing to note here is that the data doesn’t go more granular than 3P vs 2P; for example, it doesn’t state the distance or difficulty of a shot. Obviously shooting a wide open layup is much better than any 3PA, and I would expect the EV of that shot to be much higher, but I don’t have the data to prove it. What this tells us is that taking threes is more effective in terms of scoring points than taking mid to long range twos, contested or not. The proof for this is in the graph above. Analytics has taken over basketball, and teams are starting to adopt the “in the paint or behind the 3 point line” strategy. This has resulted in an increase in the EV of 2P shots, as more shots are being taken closer to the basket now that teams have realized that threes beat mid to long range twos.

Reason 2 - Shooter Development

It’s fairly obvious that doing more of any task over time is only going to benefit you if you are constantly improving. The same holds true for three point shooting in the NBA. Why take more if you aren’t getting better? Well, for many reasons, including more focus on shooting in practice, offenses shaped around shooters, and rule changes benefiting offense more than defense, NBA players are simply becoming better shooters.

Test 2

Hypothesis: The proportion of good 3 point shooters (3P% of at least 33%) in the 2010s is higher than the proportion of good 3 point shooters in the 1980s, i.e., players have gotten better at shooting the 3 pointer.

H0: Decade and plusEV are independent

Ha: Decade and plusEV are NOT independent

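# Count good (3P% >= 0.33) and poor (3P% < 0.33) shooter seasons in each decade,
# keeping only player seasons with more than 10 three point attempts
fishers_table <- nba %>%
  filter(`3PA` > 10, Decade %in% c(1980, 2010)) %>%
  group_by(Decade) %>%
  summarise(plusEV = sum(`3P%` >= 0.33),
            negEV = sum(`3P%` < 0.33))
fishers_table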

## # A tibble: 2 × 3
##   Decade plusEV negEV
##    <dbl>  <int> <int>
## 1   1980    287   817
## 2   2010   2063  1374

fisher.test(select(fishers_table,plusEV,negEV))

## 
##  Fisher's Exact Test for Count Data
## 
## data:  select(fishers_table, plusEV, negEV)
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.2004941 0.2726828
## sample estimates:
## odds ratio 
##  0.2340234

Test 2 Conclusion

Result: reject H0

Conclusion: The odds of being a plus EV 3 point shooter in the 2010s are about 4.27 times the odds in the 1980s. (The reported odds ratio of 0.234 compares the 1980s to the 2010s, and 1 / 0.234 ≈ 4.27.) In other words, a player is much more likely to be a plus EV 3 point shooter if they played in the 2010s.
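The 4.27 figure can be checked directly from the counts in `fishers_table`. This is a small sketch of that arithmetic using the simple cross-product odds ratio; `fisher.test()` reports a conditional maximum likelihood estimate, so its value (0.2340234) differs very slightly from the one computed here.

```r
# Counts copied from fishers_table above
plusEV <- c(`1980` = 287, `2010` = 2063)
negEV  <- c(`1980` = 817, `2010` = 1374)

# Odds of being a plus EV shooter within each decade
odds <- plusEV / negEV

or_1980_vs_2010 <- odds[["1980"]] / odds[["2010"]]  # about 0.234
or_2010_vs_1980 <- 1 / or_1980_vs_2010              # about 4.27

# The same comparison expressed as proportions: about 0.26 vs 0.60
prop_plus <- plusEV / (plusEV + negEV)
prop_plus
```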

What caused this dramatic shift? A combination of players practicing long distance shooting more and coaches giving them the green light to shoot it. In the 80s, when the 3 point line was introduced, many players and coaches saw it as a gimmick or a novelty. John MacLeod, former Suns coach, said, “[w]e don’t need it, I say leave our game alone.” Another complaint came from Red Auerbach, former Celtics president and coaching legend, who said, “I’m not going to set up plays for guys to bomb from 23 feet [or farther]. I think that’s very boring basketball.” Spoiler alert, these dudes were flat out wrong. Little did they know their profession would quickly turn into scouting prolific 3 point shooters and building an entire offense around shooting the long ball.

Another way to illustrate that shooters have developed over time, and that coaches continue to seek out these good shooters, is to look at the proportion of good shooters on a team over the years.
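The plot below uses a data frame called `tm_prop_plusEV` whose construction is not shown in this report. As a rough sketch of how such a table could be built, the example below flags good shooters and averages that flag by team and season. The toy data, the `Tm` team column, and the name `tm_prop_sketch` are all assumptions for illustration, and it uses base R's `aggregate()` rather than whatever was actually used.

```r
# Toy player-season data; the Tm column name and every value are hypothetical
toy <- data.frame(
  Year = c(1985, 1985, 1985, 2015, 2015, 2015),
  Tm   = c("AAA", "AAA", "BBB", "AAA", "AAA", "BBB"),
  pct3 = c(0.25, 0.35, 0.30, 0.38, 0.36, 0.31)
)

# Flag good shooters using the same 33% cutoff as Test 2
toy$plusEV <- toy$pct3 >= 0.33

# Proportion of good shooters for each team in each season
tm_prop_sketch <- aggregate(
  toy$plusEV,
  by = list(Year = toy$Year, Tm = toy$Tm),
  FUN = mean
)
names(tm_prop_sketch)[3] <- "%plusEV"
tm_prop_sketch
```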

# theme_hc() comes from the ggthemes package
ggplot(data = tm_prop_plusEV, mapping = aes(x = Year, y = `%plusEV`)) +
  geom_point(alpha = 0.1) +
  geom_smooth(se = FALSE) +
  labs(title = "Proportion of Good 3 Point Shooters on Teams Since 1980",
       x = "Season",
       y = "Proportion of Good 3P Shooters") +
  theme_hc()

[Figure: Proportion of Good 3 Point Shooters on Teams Since 1980 (x: Season, y: Proportion of Good 3P Shooters), points with a smoothed trend line]

Here we can see the steady increase of good shooters throughout the league. Even though some teams through 2015 still had rosters made up of less than 25% good shooters, the general trend of the league has been to acquire more and more shooters (some teams even loaded up with 100%). A big impact on shooter development has come in the form of expanding the PF and C shooting capabilities out to the 3 point line, which leads me into my next point.

Reason 3 - 5 > 3 (a.k.a. The Big Guys Shoot Now)

When someone used to say that they played PF or C you immediately thought of all time greats such as Shaquille O’Neal, Kareem Abdul-Jabbar, or Hakeem Olajuwon. All of these guys are some of the best basketball players to ever live and were extremely dominant in their respective eras. Each “post player” listed was force-fed the ball in the post and would then use one of his signature moves or his sheer size and strength to put the ball in the basket with extreme efficiency. Just how extreme, you may ask?

## # A tibble: 3 × 5
##   Player              `avgEFG%` `PTS/Game` `2PA/Game` `3PA/Game`
##   <chr>                   <dbl>      <dbl>      <dbl>      <dbl>
## 1 Hakeem Olajuwon         0.507       21.8       16.9     0.100 
## 2 Kareem Abdul-Jabbar     0.566       20.6       14.9     0.0229
## 3 Shaquille O'Neal        0.587       23.7       16.1     0.0182

All 3 players held an effective field goal percentage over 50% while averaging over 20 PPG. This was the exact blueprint you were looking for out of a PF/C before the 3 point revolution. Today this blueprint would look almost unrecognizable.

## # A tibble: 4 × 6
## # Groups:   Pos [2]
##   Pos   `Year >= 2010` `avgEFG%` `PTS/Game` `2PA/Game` `3PA/Game`
##   <chr> <lgl>              <dbl>      <dbl>      <dbl>      <dbl>
## 1 C     FALSE              0.500       11.2       8.53      0.145
## 2 C     TRUE               0.546       12.0       8.32      0.771
## 3 PF    FALSE              0.496       13.4      10.0       0.627
## 4 PF    TRUE               0.519       13.5       8.44      2.30

Here we are looking at the same stats for Power Forwards and Centers who started at least 20 games in a season, split into before and after 2010. We can see that while the players are still shooting efficiently there is a pretty significant jump in 3PA/Game: it more than quintupled for Centers (0.145 to 0.771) and more than tripled for Power Forwards (0.627 to 2.30). What does this tell us? The big guys have started shooting the ball! To hit further on this point we can look at a graph showing the proportion of PFs and Centers taking more than one 3PA per game.

[Figure: proportion of PFs and Cs taking more than one 3PA per game, by year, with a smoothed trend line]

Test 3

Hypothesis: Year is a significant predictor of the proportion of PFs and Cs taking more than one 3PA per game

\[ H_0 : \beta_{Year} = 0 \]

\[ H_a: \beta_{Year} \neq 0 \]

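# Share of player seasons each year with at least one 3PA per game.
# As written there is no position filter, so every row of nba is included;
# a filter(Pos %in% c("PF", "C")) step would restrict this to PFs and Cs.
pfc_prop <- nba %>%
  mutate(`3PA/Game` = `3PA` / G, .after = `3PA`) %>%
  group_by(Year) %>%
  summarise(`prop>1` = mean(`3PA/Game` >= 1))
# Simple linear regression of that yearly proportion on Year
model <- lm(`prop>1` ~ Year, data = pfc_prop)
summary(model)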

## 
## Call:
## lm(formula = `prop>1` ~ Year, data = pfc_prop)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.066431 -0.033086 -0.007631  0.014984  0.140981 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -3.339e+01  1.188e+00  -28.11   <2e-16 ***
## Year         1.687e-02  5.936e-04   28.42   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04831 on 41 degrees of freedom
## Multiple R-squared:  0.9517, Adjusted R-squared:  0.9505 
## F-statistic: 807.8 on 1 and 41 DF,  p-value: < 2.2e-16

Test 3 Conclusion

The main point of the model summary above is that the slope estimate is significant and positive. This gives us strong evidence that as the year increases, the proportion of PFs and Centers who shoot more than one 3P shot a game increases. To be more specific, for every one year increase we expect this proportion to increase by about 1.69 percentage points (a slope of 0.01687 on the proportion scale).
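As a quick check on how to read the coefficient table, the sketch below reuses the rounded slope and standard error printed in the summary above. It only restates the output; it does not refit the model, and the results are approximate because the printed values are rounded.

```r
# Values copied from the summary(model) output above
slope    <- 1.687e-02
slope_se <- 5.936e-04

# The t value is the estimate divided by its standard error
t_value <- slope / slope_se   # about 28.42

# The slope is on the proportion scale, so multiply by 100 for percentage points
pp_per_year   <- slope * 100        # about 1.69 percentage points per year
pp_per_decade <- slope * 10 * 100   # about 16.9 percentage points per decade
```

Reading the slope over a decade rather than a single year makes the size of the shift easier to see.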

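# Per decade: number of PF/C player seasons with at least one 3PA per game (true)
# and the total number of PF/C player seasons (count)
data <- nba %>% 
  filter(Pos %in% c("PF","C")) %>%
  mutate(`3PA/Game` = `3PA`/G, .after = `3PA`) %>%
  group_by(Decade) %>%
  summarise(true = sum(`3PA/Game` >= 1),
            count = n()) %>%
  select(true, count)

# One-sided two-sample proportion tests for every pair of decades, Bonferroni adjusted
pairwise.prop.test(x = data$true, n = data$count, alternative = "greater", p.adjust.method = "bonferroni")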

## 
##  Pairwise comparisons using Pairwise comparison of proportions 
## 
## data:  data$true out of data$count 
## 
##   1       2       3       4      
## 2 3.1e-11 -       -       -      
## 3 < 2e-16 1.3e-11 -       -      
## 4 < 2e-16 < 2e-16 < 2e-16 -      
## 5 < 2e-16 < 2e-16 < 2e-16 < 2e-16
## 
## P value adjustment method: bonferroni

In the above pairwise comparison test the numbers correspond to the following decades:

1 = 1980, 2 = 1990, 3 = 2000, 4 = 2010, and 5 = 2020

Here we can see that every pairwise test between decades is highly significant after applying the Bonferroni correction. These tests further confirm the trend that we saw in the above graph. Another way to lay out these results is that since the 1980s, each subsequent decade has seen an increase in the proportion of Power Forwards and Centers attempting more than one 3 point shot a game.
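For context on what the Bonferroni correction does, here is a minimal sketch: with 5 decades there are 10 pairwise comparisons, and each raw p-value is multiplied by that count and capped at 1. The raw p-values below are hypothetical and are not taken from the test above.

```r
n_groups <- 5
n_tests  <- choose(n_groups, 2)   # 10 pairwise comparisons among 5 decades

# Hypothetical unadjusted p-values, for illustration only
raw_p <- c(0.001, 0.03, 0.2)

# Bonferroni: multiply each p-value by the number of tests, capped at 1
adj_p <- p.adjust(raw_p, method = "bonferroni", n = n_tests)
adj_p   # 0.01 0.30 1.00

# The same adjustment written out by hand
all.equal(adj_p, pmin(1, raw_p * n_tests))
```

A raw p-value of 0.03 would pass a 0.05 cutoff on its own but not after adjustment, which is why results that stay below 2e-16 after correction are so convincing.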

Test 4

Test 4 Assumptions

1) Independence - It is reasonable to assume that for every player one season is independent of the other seasons they played in. Of course, there may be a few exceptions, such as injuries that carried over from the previous season, but the large majority of the observations are independent.

2) Normality - As you can see in the histogram below the distribution of height in each Decade is approximately normal.

above33 %>%
  ggplot() +
  geom_histogram(mapping = aes(x=height), bins = 30) +
  facet_wrap(~Decade, scales = "free") +
  labs(title = "Distribution of Height by Decade")

3) Constant Variance (Homoscedasticity) - Based on the following output we are comfortable saying that the standard deviations, and therefore the variances, are roughly constant across groups.

above33 %>%
  group_by(Decade) %>%
  summarise(std_dev = sd(height))

## # A tibble: 5 × 2
##   Decade std_dev
##    <dbl>   <dbl>
## 1   1980    2.59
## 2   1990    3.28
## 3   2000    3.31
## 4   2010    3.24
## 5   2020    3.13
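One informal way to back up the "roughly constant" judgment is to compare the largest and smallest standard deviations. The sketch below uses the values printed above and a common rule of thumb that the ratio should stay below 2; that cutoff is a convention I am assuming here, not a formal test.

```r
# Standard deviations copied from the output above
std_dev <- c(`1980` = 2.59, `1990` = 3.28, `2000` = 3.31,
             `2010` = 3.24, `2020` = 3.13)

# Ratio of the largest to the smallest standard deviation
sd_ratio <- max(std_dev) / min(std_dev)   # about 1.28

# Rule of thumb (an assumption, not a formal test): ratio below 2
sd_ratio < 2
```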

ANOVA Hypothesis

H0: the mean height of players making more than 33% of their 3PA is equal for every decade

\[ H_0 : \mu_{1980} = \mu_{1990} = \mu_{2000} = \mu_{2010} = \mu_{2020} \]

Ha: the mean height of players making more than 33% of their 3PA differs for at least one pair of decades

\[ H_a : \mu_{i} \neq \mu_{j} \text{ for at least one pair } i \neq j \]

m <- aov(height ~ Decade, data = above33)
summary(m)

##               Df Sum Sq Mean Sq F value   Pr(>F)    
## Decade         1    582   581.8   56.04 8.17e-14 ***
## Residuals   5782  60030    10.4                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ANOVA Conclusion

The p-value from this test (8.17e-14) is the probability of getting results as or more extreme than ours, computed under the assumption that the null hypothesis of equal mean heights across all decades is true. Since that probability is extremely small, we reject H0 and can dive deeper into which decades are different.
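One detail worth noting in the ANOVA table above is that Decade has 1 degree of freedom. That is what `aov()` reports when the grouping variable is numeric, in which case it fits a single linear trend rather than comparing five group means. The sketch below, on simulated heights rather than the `above33` data, shows how wrapping the variable in `factor()` gives the 4 degree of freedom one-way ANOVA that matches the hypotheses stated above.

```r
set.seed(42)  # simulated heights, not the project's above33 data
sim <- data.frame(
  Decade = rep(c(1980, 1990, 2000, 2010, 2020), each = 50),
  height = rnorm(250, mean = 78, sd = 3)
)

# Numeric Decade: a single slope term, so Df = 1
m_numeric <- aov(height ~ Decade, data = sim)

# Factor Decade: one mean per decade, so Df = 5 - 1 = 4
m_factor <- aov(height ~ factor(Decade), data = sim)

df_numeric <- summary(m_numeric)[[1]][["Df"]][1]
df_factor  <- summary(m_factor)[[1]][["Df"]][1]
c(numeric = df_numeric, factor = df_factor)
```

The pairwise t tests that follow already treat each decade as its own group, so they are not affected by this choice.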

Pairwise Tests

pairwise.t.test(above33$height, above33$Decade, p.adjust.method = "bonferroni")

## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  above33$height and above33$Decade 
## 
##      1980    1990    2000    2010   
## 1990 1.00000 -       -       -      
## 2000 0.00211 1.3e-05 -       -      
## 2010 5.1e-06 8.0e-12 0.23024 -      
## 2020 0.00014 3.7e-07 1.00000 1.00000
## 
## P value adjustment method: bonferroni

Pairwise Conclusion

These pairwise t tests are very revealing, as they split the decades into two groups: no pair within a group differs significantly, while every pair across groups does.

\[ \text{Group 1: 1980 and 1990 } \blacksquare \text{ Group 2: 2000, 2010, and 2020} \]

The way the decades split into groups is the third form of evidence that the taller players, the Power Forwards and Centers, started shooting more threes.

This matters for the overall trend of increasing 3PA because, most of the time in today’s game, you have players at all 5 positions who are not only willing to shoot but also have the green light to shoot. So, I’ll refer back to the title of this section: it’s simple, 5 > 3. If you throw 5 players on the floor who can shoot the ball you are going to see more shots than if only 3 players on the floor can shoot it.

Conclusion

So, what does this all mean? As I said in the opening, these are simply 3 reasons I believe the number of 3 point shots taken in the NBA has increased year after year. To me, a fan of the game, they all seem intuitive, as I have followed the league from both a pure fan perspective and an analytics perspective. It is this second perspective where a lot of the understanding of the 3P trend comes from. The lack of this “controversial” perspective in the basketball landscape today plays a major role in people like our friend Charles Barkley hopping on TV most nights and continuing to complain about three pointers. All in all, what I hope to have accomplished is to shed some light on topics that are completely left out of similar conversations in the media.

I590-Intro to Stats in R - Final Project

Kael Ecord

2023-11-28