This notebook shows how Stephen Curry impacts the offense of his team, the Golden State Warriors.
Problem Statement: Do Stephen Curry’s stats affect the number of points the Golden State Warriors score?
Introduction
Stephen Curry, 33, is a professional American basketball player in the NBA for the Golden State Warriors. He is regarded as the most prolific shooter in NBA history and one of the best scorers in the game. It has been said that Stephen Curry is the heart of the team and that the offense runs through him. When he is absent or performs poorly, the team struggles to score and either wins narrowly or loses the game.
Some important basketball terms used in this notebook: T.Score (team score), PTS (points scored by Curry), FG. (his field goal percentage), and X3P. (his three-point percentage).
First, the libraries required for the analysis are loaded.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.6 ✓ dplyr 1.0.8
## ✓ tidyr 1.2.0 ✓ stringr 1.4.0
## ✓ readr 2.1.2 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(corrplot)
## corrplot 0.92 loaded
After loading the libraries, the dataset is read with the read.csv() function and stored in a dataframe for convenience. The dataset was obtained from Kaggle and contains all of Stephen Curry’s stats from 2009 to 2021.
steph <- read.csv('/cloud/project/Stephr.csv')
head(steph)
A summary of the data can be viewed with the summary() function.
summary(steph)
## Season_year Season_div Date OPP
## Length:761 Length:761 Length:761 Length:761
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Result T.Score O.Score MIN
## Length:761 Min. : 72.0 Min. : 63.0 Min. : 2.00
## Class :character 1st Qu.:102.0 1st Qu.: 97.0 1st Qu.:30.00
## Mode :character Median :110.0 Median :105.0 Median :35.00
## Mean :110.6 Mean :105.4 Mean :34.37
## 3rd Qu.:120.0 3rd Qu.:114.0 3rd Qu.:38.00
## Max. :149.0 Max. :141.0 Max. :49.00
## FG FGM FGA FG.
## Length:761 Min. : 0.000 Min. : 0.00 Min. : 0.0
## Class :character 1st Qu.: 6.000 1st Qu.:14.00 1st Qu.: 38.9
## Mode :character Median : 8.000 Median :18.00 Median : 47.4
## Mean : 8.331 Mean :17.47 Mean : 47.1
## 3rd Qu.:11.000 3rd Qu.:21.00 3rd Qu.: 55.6
## Max. :20.000 Max. :36.00 Max. :100.0
## X3PT X3PTM X3PTA X3P.
## Length:761 Min. : 0.000 Min. : 0.000 Min. : 0.00
## Class :character 1st Qu.: 2.000 1st Qu.: 6.000 1st Qu.: 30.00
## Mode :character Median : 3.000 Median : 9.000 Median : 42.90
## Mean : 3.721 Mean : 8.594 Mean : 42.02
## 3rd Qu.: 5.000 3rd Qu.:11.000 3rd Qu.: 54.50
## Max. :13.000 Max. :22.000 Max. :100.00
## FT FTM FTA FT.
## Length:761 Min. : 0.00 Min. : 0.000 Min. : 0.00
## Class :character 1st Qu.: 2.00 1st Qu.: 2.000 1st Qu.: 75.00
## Mode :character Median : 3.00 Median : 4.000 Median :100.00
## Mean : 3.84 Mean : 4.234 Mean : 78.31
## 3rd Qu.: 6.00 3rd Qu.: 6.000 3rd Qu.:100.00
## Max. :18.00 Max. :19.000 Max. :100.00
## REB AST BLK STL
## Min. : 0.000 Min. : 0.000 Min. :0.0000 Min. :0.000
## 1st Qu.: 3.000 1st Qu.: 5.000 1st Qu.:0.0000 1st Qu.:1.000
## Median : 4.000 Median : 6.000 Median :0.0000 Median :1.000
## Mean : 4.603 Mean : 6.549 Mean :0.2155 Mean :1.683
## 3rd Qu.: 6.000 3rd Qu.: 8.000 3rd Qu.:0.0000 3rd Qu.:3.000
## Max. :14.000 Max. :16.000 Max. :2.0000 Max. :7.000
## PF TO PTS
## Min. :0.000 Min. : 0.000 Min. : 0.00
## 1st Qu.:1.000 1st Qu.: 2.000 1st Qu.:17.00
## Median :2.000 Median : 3.000 Median :24.00
## Mean :2.432 Mean : 3.143 Mean :24.22
## 3rd Qu.:3.000 3rd Qu.: 4.000 3rd Qu.:31.00
## Max. :6.000 Max. :11.000 Max. :62.00
For the linear regression, only some of the columns in the dataframe are needed. The selected columns are placed in a new dataframe.
newsteph <- steph[, c("T.Score", "FG.", "X3P.", "PTS")]
The selected columns will be used in the regression that addresses the problem statement. From the problem statement, the team score (“T.Score”) is the dependent variable, and the remaining features, points scored by Curry (PTS), his field goal percentage (FG.), and his three-point percentage (X3P.), are the independent variables. To see how strongly each one is related to the dependent variable, its correlation with T.Score is calculated.
cor(newsteph$T.Score, newsteph$FG.)
## [1] 0.3534566
cor(newsteph$T.Score, newsteph$X3P.)
## [1] 0.2584599
cor(newsteph$T.Score, newsteph$PTS)
## [1] 0.401315
From the output, it can be seen that all three variables have a positive correlation with the dependent variable, T.Score. PTS has the highest correlation (0.40), followed by FG. (0.35) and lastly three-point percentage, X3P. (0.26). This suggests that a regression could be fitted with any of these variables, but because PTS has the strongest correlation with T.Score, it is the most informative choice of independent variable.
A scatterplot matrix of all the variables is created using the pairs() function to visualize the correlations.
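The three separate cor() calls can also be condensed into a single correlation matrix. The sketch below is only illustrative: it uses a made-up data frame named `toy` as a stand-in for newsteph (the values are randomly generated, not Curry's stats), so the printed numbers will not match the correlations reported in this notebook.

```r
# Synthetic stand-in for newsteph; these values are invented for illustration only.
set.seed(1)
n <- 200
pts <- pmax(0, round(rnorm(n, mean = 24, sd = 9)))
toy <- data.frame(
  T.Score = 96 + 0.5 * pts + rnorm(n, sd = 12),
  FG.     = runif(n, min = 20, max = 70),
  X3P.    = runif(n, min = 0, max = 80),
  PTS     = pts
)

# Correlation of every column with every other column in one call.
cor_matrix <- cor(toy)

# Keep only the correlations with T.Score, drop its self-correlation, and sort.
keep <- rownames(cor_matrix) != "T.Score"
cor_with_score <- sort(cor_matrix[keep, "T.Score"], decreasing = TRUE)
print(round(cor_with_score, 3))
```

Applied to the real data, the same pattern would be `cor(newsteph)[, "T.Score"]`, which returns all three correlations at once.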
pairs(newsteph %>% select(where(is.numeric)), cex = .7)
The visualization shows an upward trend between T.Score and PTS. The other two variables show an upward trend as well, but the trend for PTS is slightly steeper.
Now that the correlations have been calculated and the independent variables chosen, the model can be created using the ‘lm’ function. Because Stephen Curry is a prolific three-point shooter, his three-point percentage is also included as an independent variable to see whether it affects team scoring as well. After the model is created, its summary can be viewed using the summary() function.
Model1 <- lm(T.Score ~ PTS + X3P. , data = newsteph)
summary(Model1)
##
## Call:
## lm(formula = T.Score ~ PTS + X3P., data = newsteph)
##
## Residuals:
## Min 1Q Median 3Q Max
## -30.808 -8.234 -0.361 8.115 42.725
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 96.41931 1.26832 76.021 <2e-16 ***
## PTS 0.48053 0.05035 9.545 <2e-16 ***
## X3P. 0.06022 0.02606 2.311 0.0211 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.15 on 758 degrees of freedom
## Multiple R-squared: 0.1669, Adjusted R-squared: 0.1647
## F-statistic: 75.94 on 2 and 758 DF, p-value: < 2.2e-16
From the results, the fitted equation is \[\text{T.Score} = 96.41931 + 0.48053 \cdot \text{PTS} + 0.06022 \cdot \text{X3P.}\] The adjusted R-squared value is 0.1647, which is low: the model explains about 16% of the variation in the team score. Even so, 16% is a substantial share to attribute to one player's stats, given that the team consists of around 11-13 players.
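As a quick consistency check, the adjusted R-squared can be recomputed from the multiple R-squared with the standard formula \(1 - (1 - R^2)\frac{n - 1}{n - p - 1}\). The sketch below uses only numbers copied from the summary output; the variable names are illustrative.

```r
# Values copied from the summary(Model1) output.
r_squared <- 0.1669  # Multiple R-squared
n_obs     <- 761     # 758 residual degrees of freedom + 3 estimated coefficients
n_pred    <- 2       # predictors: PTS and X3P.

# Adjusted R-squared penalises R-squared for the number of predictors.
adj_r_squared <- 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_pred - 1)
print(round(adj_r_squared, 4))
```

Rounded to four decimals this gives 0.1647, matching the reported value. With 761 games and only two predictors, the adjustment is very small.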
To check how the team's scoring responds to Curry's stats, the model is used to predict the points scored by the team for a set of hypothetical inputs. First, a dataframe is created containing arbitrary values for points scored by Curry along with arbitrary three-point percentages.
nd <- data.frame(PTS = c(26, 40, 16, 32, 51, 9) , X3P. = c(32.6, 51, 42.3, 34.1, 43, 12.5))
print(nd)
## PTS X3P.
## 1 26 32.6
## 2 40 51.0
## 3 16 42.3
## 4 32 34.1
## 5 51 43.0
## 6 9 12.5
After the new dataframe is created, it will be used for the output prediction based on the linear model that was created earlier.
predict(Model1, newdata = nd)
## 1 2 3 4 5 6
## 110.8764 118.7120 106.6553 113.8500 123.5161 101.4969
The predictions suggest that Curry's points scored and three-point percentage do affect the predicted team score, though modestly.
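The predictions can be reproduced by hand from the coefficients printed in the model summary, which makes it clear where each number comes from. This is a sketch with illustrative variable names; because the printed coefficients are rounded, the results agree with predict() only to about two or three decimal places.

```r
# Coefficients as printed in summary(Model1); rounded, so the match is approximate.
intercept <- 96.41931
b_pts     <- 0.48053
b_x3p     <- 0.06022

# Same hypothetical inputs as the nd data frame.
pts <- c(26, 40, 16, 32, 51, 9)
x3p <- c(32.6, 51, 42.3, 34.1, 43, 12.5)

# Linear predictor: intercept plus each coefficient times its input.
by_hand <- intercept + b_pts * pts + b_x3p * x3p
print(round(by_hand, 2))
```

Read this way, each additional point scored by Curry raises the predicted team score by about 0.48 points, while each additional percentage point of three-point accuracy raises it by about 0.06 points.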
To clearly see the predictions, they are combined with the dataframe of input variables using cbind().
prediction <- as.data.frame(predict(Model1, newdata = nd))
colnames(prediction) <- ('PPG Prediction')
StephCurryInfluence <- cbind(nd, prediction)
print(StephCurryInfluence)
## PTS X3P. PPG Prediction
## 1 26 32.6 110.8764
## 2 40 51.0 118.7120
## 3 16 42.3 106.6553
## 4 32 34.1 113.8500
## 5 51 43.0 123.5161
## 6 9 12.5 101.4969
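Point predictions alone hide how uncertain each one is; with a residual standard error of 12.15, individual games can land well above or below the fitted value. The sketch below shows how predict() can return prediction intervals alongside the point estimates. It is fitted on a made-up data frame named `toy` (randomly generated values, not the Kaggle data), and `toy_model` is a hypothetical stand-in for Model1, so the numbers it prints are illustrative only.

```r
# Synthetic stand-in for newsteph; the numbers are invented for illustration only.
set.seed(42)
n <- 300
toy <- data.frame(
  PTS  = pmax(0, round(rnorm(n, mean = 24, sd = 9))),
  X3P. = runif(n, min = 0, max = 80)
)
toy$T.Score <- 96 + 0.5 * toy$PTS + 0.06 * toy$X3P. + rnorm(n, sd = 12)

# Hypothetical stand-in for Model1, using the same formula.
toy_model <- lm(T.Score ~ PTS + X3P., data = toy)

# Same hypothetical inputs as nd in the notebook.
nd <- data.frame(PTS  = c(26, 40, 16, 32, 51, 9),
                 X3P. = c(32.6, 51, 42.3, 34.1, 43, 12.5))

# interval = "prediction" adds lower (lwr) and upper (upr) bounds for a single new game.
pred <- predict(toy_model, newdata = nd, interval = "prediction", level = 0.95)
result <- cbind(nd, round(as.data.frame(pred), 1))
print(result)
```

On the real data, the same call would be `predict(Model1, newdata = nd, interval = "prediction")`, which returns the columns fit, lwr, and upr.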