This notebook shows how Stephen Curry impacts the offense of his team, the Golden State Warriors.

Problem Statement: Do Stephen Curry’s stats affect the points scored by the Golden State Warriors?

Introduction

Stephen Curry, 33, is an American professional basketball player for the Golden State Warriors of the NBA. He is regarded as the most prolific shooter in NBA history and one of the best scorers in the game. It has been said that he is the heart of the team and that the offense runs through him: when he is absent or plays poorly, the team struggles to score and either wins narrowly or loses.

Some important basketball terms used in this notebook:

  1. PPG (Points per game): Number of points scored per game.
  2. FG% (Field Goal Percentage): The number of shots made divided by the number of shots attempted, expressed as a percentage (column FG. in the data).
  3. 3P% (3 point Field Goal Percentage): The same as FG%, but counting only shots taken from beyond the 3 point line (column X3P. in the data).
  4. T.Score: The points scored by the team in a game.
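
As a quick illustration of the percentage definitions above, the sketch below computes a shooting percentage from makes and attempts. The helper `shooting_pct` and the example numbers are made up for illustration and are not taken from the dataset; returning `NA` when there are no attempts is a choice made here, not necessarily how the dataset records such games.

```r
# Hypothetical helper: shooting percentage from made and attempted shots.
shooting_pct <- function(made, attempted) {
  # Return NA when no shots were attempted, to avoid dividing by zero.
  ifelse(attempted > 0, 100 * made / attempted, NA_real_)
}

# Made-up single-game line: 8 of 17 from the field, 3 of 9 from three.
fg_pct  <- shooting_pct(8, 17)
x3p_pct <- shooting_pct(3, 9)
round(c(FG = fg_pct, X3P = x3p_pct), 1)
```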

First, the libraries required for the analysis are loaded.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.8
## ✓ tidyr   1.2.0     ✓ stringr 1.4.0
## ✓ readr   2.1.2     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(corrplot)
## corrplot 0.92 loaded

After loading the libraries, the dataset is read with the read.csv() function and stored in a data frame for convenience. The dataset, obtained from Kaggle, contains Stephen Curry’s stats from 2009 to 2021.

# Read the CSV once and keep the result; a bare read.csv() call would only print the whole table.
steph <- read.csv('/cloud/project/Stephr.csv')
head(steph)

A summary of the data can be printed with the summary() function.

summary(steph)
##  Season_year         Season_div            Date               OPP           
##  Length:761         Length:761         Length:761         Length:761        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##     Result             T.Score         O.Score           MIN       
##  Length:761         Min.   : 72.0   Min.   : 63.0   Min.   : 2.00  
##  Class :character   1st Qu.:102.0   1st Qu.: 97.0   1st Qu.:30.00  
##  Mode  :character   Median :110.0   Median :105.0   Median :35.00  
##                     Mean   :110.6   Mean   :105.4   Mean   :34.37  
##                     3rd Qu.:120.0   3rd Qu.:114.0   3rd Qu.:38.00  
##                     Max.   :149.0   Max.   :141.0   Max.   :49.00  
##       FG                 FGM              FGA             FG.       
##  Length:761         Min.   : 0.000   Min.   : 0.00   Min.   :  0.0  
##  Class :character   1st Qu.: 6.000   1st Qu.:14.00   1st Qu.: 38.9  
##  Mode  :character   Median : 8.000   Median :18.00   Median : 47.4  
##                     Mean   : 8.331   Mean   :17.47   Mean   : 47.1  
##                     3rd Qu.:11.000   3rd Qu.:21.00   3rd Qu.: 55.6  
##                     Max.   :20.000   Max.   :36.00   Max.   :100.0  
##      X3PT               X3PTM            X3PTA             X3P.       
##  Length:761         Min.   : 0.000   Min.   : 0.000   Min.   :  0.00  
##  Class :character   1st Qu.: 2.000   1st Qu.: 6.000   1st Qu.: 30.00  
##  Mode  :character   Median : 3.000   Median : 9.000   Median : 42.90  
##                     Mean   : 3.721   Mean   : 8.594   Mean   : 42.02  
##                     3rd Qu.: 5.000   3rd Qu.:11.000   3rd Qu.: 54.50  
##                     Max.   :13.000   Max.   :22.000   Max.   :100.00  
##       FT                 FTM             FTA              FT.        
##  Length:761         Min.   : 0.00   Min.   : 0.000   Min.   :  0.00  
##  Class :character   1st Qu.: 2.00   1st Qu.: 2.000   1st Qu.: 75.00  
##  Mode  :character   Median : 3.00   Median : 4.000   Median :100.00  
##                     Mean   : 3.84   Mean   : 4.234   Mean   : 78.31  
##                     3rd Qu.: 6.00   3rd Qu.: 6.000   3rd Qu.:100.00  
##                     Max.   :18.00   Max.   :19.000   Max.   :100.00  
##       REB              AST              BLK              STL       
##  Min.   : 0.000   Min.   : 0.000   Min.   :0.0000   Min.   :0.000  
##  1st Qu.: 3.000   1st Qu.: 5.000   1st Qu.:0.0000   1st Qu.:1.000  
##  Median : 4.000   Median : 6.000   Median :0.0000   Median :1.000  
##  Mean   : 4.603   Mean   : 6.549   Mean   :0.2155   Mean   :1.683  
##  3rd Qu.: 6.000   3rd Qu.: 8.000   3rd Qu.:0.0000   3rd Qu.:3.000  
##  Max.   :14.000   Max.   :16.000   Max.   :2.0000   Max.   :7.000  
##        PF              TO              PTS       
##  Min.   :0.000   Min.   : 0.000   Min.   : 0.00  
##  1st Qu.:1.000   1st Qu.: 2.000   1st Qu.:17.00  
##  Median :2.000   Median : 3.000   Median :24.00  
##  Mean   :2.432   Mean   : 3.143   Mean   :24.22  
##  3rd Qu.:3.000   3rd Qu.: 4.000   3rd Qu.:31.00  
##  Max.   :6.000   Max.   :11.000   Max.   :62.00

The linear regression needs only a few columns of the data frame. These columns are selected and stored in a new data frame.

# Keep only the response (T.Score) and the candidate predictors.
newsteph <- steph[, c("T.Score", "FG.", "X3P.", "PTS")]

The selected columns are used for the regression that addresses the problem statement. Following the problem statement, the team score (T.Score) is the dependent variable, and the points scored by Curry (PTS), his field goal percentage (FG.) and his three point percentage (X3P.) are the independent variables. To see how strongly each of them is related to the dependent variable, their correlations with T.Score are calculated.

cor(newsteph$T.Score, newsteph$FG.)
## [1] 0.3534566
cor(newsteph$T.Score, newsteph$X3P.)
## [1] 0.2584599
cor(newsteph$T.Score, newsteph$PTS)
## [1] 0.401315

The output shows that all three variables are positively correlated with the dependent variable, although the correlations are weak to moderate (about 0.26 to 0.40). PTS has the highest correlation, followed by FG. and lastly the three point percentage (X3P.). Any of the three could serve as a predictor, but because PTS has the strongest correlation with T.Score, it is the most informative choice of independent variable.
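
The three separate cor() calls can also be collapsed into one correlation matrix, and cor.test() can check whether a single correlation differs from zero. The sketch below runs on simulated data, not the Kaggle dataset: the data frame `sim`, its coefficients and its noise level are assumptions chosen only to make the example self-contained. The corrplot call is skipped if the package is not installed.

```r
set.seed(1)
n <- 200

# Simulated stand-in for newsteph (NOT the real data).
sim <- data.frame(
  PTS  = rnorm(n, mean = 24, sd = 9),
  FG.  = rnorm(n, mean = 47, sd = 12),
  X3P. = rnorm(n, mean = 42, sd = 18)
)
sim$T.Score <- 96 + 0.5 * sim$PTS + 0.05 * sim$X3P. + rnorm(n, sd = 12)

# Full correlation matrix in one call instead of one cor() per pair.
cor_mat <- cor(sim)
round(cor_mat, 2)

# Test whether the T.Score/PTS correlation differs from zero.
ct <- cor.test(sim$T.Score, sim$PTS)
ct$estimate
ct$p.value

# Optional visual summary, only if corrplot is available.
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(cor_mat, method = "number")
}
```

With the real data, replacing `sim` by `newsteph` should reproduce the three correlations printed above in the T.Score row of the matrix.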

A scatterplot matrix of all pairs of variables is created with the pairs() function to visualize the correlations.

pairs(newsteph %>% select(where(is.numeric)), cex = .7)

The visualization shows an upward trend between T.Score and PTS. The other two variables show an upward trend as well, but the trend for PTS is slightly more pronounced.

Now that the correlations have been calculated and the independent variables chosen, the model can be created with the lm() function. Because Stephen Curry is a prolific three point shooter, his three point percentage is included as a second independent variable to see whether it also affects team scoring. After the model is created, its summary is printed with the summary() function.

Model1 <- lm(T.Score ~ PTS + X3P. , data = newsteph)
summary(Model1)
## 
## Call:
## lm(formula = T.Score ~ PTS + X3P., data = newsteph)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -30.808  -8.234  -0.361   8.115  42.725 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 96.41931    1.26832  76.021   <2e-16 ***
## PTS          0.48053    0.05035   9.545   <2e-16 ***
## X3P.         0.06022    0.02606   2.311   0.0211 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.15 on 758 degrees of freedom
## Multiple R-squared:  0.1669, Adjusted R-squared:  0.1647 
## F-statistic: 75.94 on 2 and 758 DF,  p-value: < 2.2e-16

From the results, the fitted equation is \[\text{T.Score} = 96.41931 + 0.48053 \cdot \text{PTS} + 0.06022 \cdot \text{X3P.}\] The adjusted R-squared is 0.1647, which is low: the model explains about 16% of the variance in the team score. Even so, for two statistics of a single player on a roster of around 11-13 players, 16% is a notable share.
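
The adjusted R-squared can be checked by hand from the values printed in the summary. The sketch below uses only numbers from that output (multiple R-squared 0.1669, 758 residual degrees of freedom, two predictors); the variable names are illustrative.

```r
r2 <- 0.1669   # Multiple R-squared from summary(Model1)
n  <- 761      # 758 residual df + 3 estimated coefficients
p  <- 2        # predictors: PTS and X3P.

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 4)

# Expected change in T.Score for 10 extra points by Curry, with X3P. held fixed.
pts_effect_10 <- 10 * 0.48053
pts_effect_10
```

The first result matches the 0.1647 reported by summary(), and the second reads the PTS coefficient as roughly 4.8 additional team points per 10 points scored by Curry, other things equal.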

To see how the predicted team score responds to Curry's performance, the model is used to predict the team's points for hypothetical combinations of points scored by Curry and his three point percentage. First, a data frame is created with hand-picked values for Curry's points (PTS) and three point percentage (X3P.).

nd <- data.frame(PTS = c(26, 40, 16, 32, 51, 9) , X3P. = c(32.6, 51, 42.3, 34.1, 43, 12.5))
print(nd)
##   PTS X3P.
## 1  26 32.6
## 2  40 51.0
## 3  16 42.3
## 4  32 34.1
## 5  51 43.0
## 6   9 12.5

The new data frame is then passed to predict() together with the linear model created earlier.

predict(Model1, newdata = nd)
##        1        2        3        4        5        6 
## 110.8764 118.7120 106.6553 113.8500 123.5161 101.4969

The predictions suggest that Curry's points scored and three point percentage do affect the team score, though modestly.
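
The predictions can be reproduced by hand from the coefficients in the model summary, which also shows how much each term contributes. This is a sketch using the rounded coefficients printed above, so the results agree with predict() only to about three decimal places; the column names `from_PTS`, `from_X3P` and `by_hand` are illustrative.

```r
# Coefficients copied from summary(Model1).
b0    <- 96.41931
b_pts <- 0.48053
b_x3p <- 0.06022

# Same hypothetical inputs as in nd.
nd <- data.frame(PTS  = c(26, 40, 16, 32, 51, 9),
                 X3P. = c(32.6, 51, 42.3, 34.1, 43, 12.5))

# Contribution of each term to the predicted team score.
contrib <- data.frame(
  from_PTS = b_pts * nd$PTS,
  from_X3P = b_x3p * nd$X3P.
)
contrib$by_hand <- b0 + contrib$from_PTS + contrib$from_X3P
round(contrib, 2)
```

Across these six rows the PTS term ranges from about 4.3 to 24.5 points, while the X3P. term stays between about 0.8 and 3.1, so most of the variation in the predictions comes from PTS.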

To see the predictions clearly, they are combined with the data frame of input variables using cbind(). The column labelled 'PPG Prediction' holds the predicted team score (T.Score).

prediction <- as.data.frame(predict(Model1, newdata = nd))
colnames(prediction) <- ('PPG Prediction')

StephCurryInfluence <- cbind(nd, prediction)

print(StephCurryInfluence)
##   PTS X3P. PPG Prediction
## 1  26 32.6       110.8764
## 2  40 51.0       118.7120
## 3  16 42.3       106.6553
## 4  32 34.1       113.8500
## 5  51 43.0       123.5161
## 6   9 12.5       101.4969
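
Point predictions hide how much individual games scatter around the fitted line; the residual standard error of 12.15 reported in the model summary suggests that scatter is large. One way to show it is a prediction interval, which predict() returns for lm models when called with interval = "prediction". The sketch below uses simulated data rather than the Kaggle dataset: `sim`, `fit` and `new_games` are assumptions made only so the example runs on its own.

```r
set.seed(42)
n <- 300

# Simulated stand-in for newsteph (NOT the real data).
sim <- data.frame(PTS = runif(n, 0, 60), X3P. = runif(n, 0, 100))
sim$T.Score <- 96 + 0.5 * sim$PTS + 0.06 * sim$X3P. + rnorm(n, sd = 12)

fit <- lm(T.Score ~ PTS + X3P., data = sim)

# Two hypothetical games to predict.
new_games <- data.frame(PTS = c(26, 51), X3P. = c(32.6, 43))

# 95% prediction intervals: columns fit, lwr, upr.
pred_int <- predict(fit, newdata = new_games,
                    interval = "prediction", level = 0.95)
pred_int
```

With the real model, the same call with Model1 and nd would attach a lower and upper bound to each of the six predictions.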