Ge Chen

Feb.24th, 2015

RPI

Outline

1.Data

Data Selection

Kobe Bryant is currently a well-known active NBA player, and his career point total ranks third among all players in NBA history. The dataset of Kobe's historical statistics was collected from www.basketball-reference.com and contains adjusted game-by-game data (standardized to 36 minutes). The data span 2005 to 2007, chosen because:

  1. Kobe was the only superstar on the team; the other players were either rookies or role players.
  2. Kobe had unlimited field goal attempts.
  3. Kobe was at his peak (healthy and highly skilled).

This research uses linear regression models to check whether Kobe's points, field goal ratio, and total assists help the team toward victory, which we can define as score-to-victory efficiency.
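The 36-minute standardization mentioned above can be sketched as follows. This is a minimal illustration of the usual per-36 scaling, assuming a raw count and the minutes played are both available; the function name `per36` and the example numbers are hypothetical and are not taken from the dataset.

```r
# Scale a raw box-score count to a 36-minute basis.
# `stat` and `minutes` are hypothetical inputs, not columns of Kobe.csv.
per36 <- function(stat, minutes) {
  stopifnot(all(minutes > 0))
  stat * 36 / minutes
}

# Hypothetical game: 30 points scored in 40 minutes of play.
example_pts <- per36(30, 40)
print(example_pts)
```

Scaling every game to the same number of minutes makes games with different playing time comparable before they enter the regression.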

Data Summary

G: game number
Opp: opponent
FG: field goals
FGA: field goal attempts
FGRatio: field goal percentage
ORB: offensive rebounds
DRB: defensive rebounds
TRB: total rebounds
AST: assists
STL: steals
BLK: blocks
TOV: turnovers
PF: personal fouls
PTS: points scored
Plus_Minus: victory contribution (the team's point differential while Kobe is on the court; positive when the team outscores the opponent, negative when it is outscored)

Kobe<-read.csv("~/Desktop/Applied_Regression/Kobe.csv");
head(Kobe,n=14L);
##     G     Date Opp FG FGA FGRatio ORB DRB TRB AST STL BLK TOV PF PTS
## 1   1  11/2/05 DEN 13  28   0.464   0   5   5   4   1   2   6  4  33
## 2   2  11/3/05 PHO 13  26   0.500   2   5   7   5   0   0   3  3  39
## 3   3  11/6/05 DEN 16  31   0.516   3   5   8   5   0   1   4  2  37
## 4   4  11/8/05 ATL 15  26   0.577   1   2   3   5   1   1   1  5  37
## 5   5  11/9/05 MIN 12  26   0.462   1   3   4   4   1   0   3  0  28
## 6   6 11/11/05 PHI  7  27   0.259   3   6   9   7   1   0   3  4  17
## 7   7 11/14/05 MEM  7  18   0.389   0   3   3   2   0   1   4  3  18
## 8   8 11/16/05 NYK 15  36   0.417   3   2   5   3   2   1   0  3  42
## 9   9 11/18/05 LAC 12  35   0.343   1   3   4   5   0   0   2  2  36
## 10 10 11/20/05 CHI 17  34   0.500   1   5   6   3   2   0   3  3  43
## 11 11 11/24/05 SEA 12  26   0.462   1   1   2   5   0   0   2  4  34
## 12 12 11/27/05 NJN 14  36   0.389   0   3   3   3   2   0   5  5  46
## 13 13 11/29/05 SAS  9  33   0.273   1   3   4   0   4   1   3  0  25
## 14 14  12/1/05 UTA 11  31   0.355   2   5   7   3   2   0   2  6  30
##    Plus_Minus
## 1           6
## 2           4
## 3          24
## 4          12
## 5         -11
## 6          -2
## 7         -27
## 8           6
## 9          -3
## 10         -5
## 11         15
## 12        -10
## 13         -7
## 14          1
KobeStatistic<-Kobe[c("PTS","FGRatio","AST","Plus_Minus")];
attach(Kobe);
summary(KobeStatistic);
##       PTS           FGRatio            AST           Plus_Minus     
##  Min.   : 8.00   Min.   :0.2220   Min.   : 0.000   Min.   :-27.000  
##  1st Qu.:25.00   1st Qu.:0.3820   1st Qu.: 3.000   1st Qu.: -6.000  
##  Median :33.00   Median :0.4550   Median : 5.000   Median :  2.000  
##  Mean   :33.52   Mean   :0.4565   Mean   : 4.924   Mean   :  2.688  
##  3rd Qu.:40.00   3rd Qu.:0.5240   3rd Qu.: 6.000   3rd Qu.: 12.000  
##  Max.   :81.00   Max.   :0.7310   Max.   :13.000   Max.   : 35.000

2.Data Correlation and Hypothesis

Correlation

cor(KobeStatistic)
##                   PTS     FGRatio         AST Plus_Minus
## PTS         1.0000000  0.46326344 -0.31232978  0.2811119
## FGRatio     0.4632634  1.00000000 -0.07444296  0.2999217
## AST        -0.3123298 -0.07444296  1.00000000  0.1277786
## Plus_Minus  0.2811119  0.29992165  0.12777857  1.0000000
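To judge whether a correlation of this size is distinguishable from zero, the coefficient can be converted to a t statistic. The sketch below uses only the PTS and Plus_Minus correlation printed above; the sample size of 157 games is an assumption inferred from the 155 residual degrees of freedom reported for the single-variable model.

```r
# Correlation between PTS and Plus_Minus, copied from the matrix above.
r <- 0.2811119
# Assumed sample size: 155 residual df + 2 estimated coefficients.
n <- 157

# t statistic and two-sided p-value for H0: correlation = 0.
t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# The squared correlation is the R-squared of a one-predictor regression.
r_squared <- r^2

print(c(t = t_stat, p = p_value, r2 = r_squared))
```

With one predictor, this t statistic coincides with the slope's t value in the regression of Plus_Minus on PTS.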

Hypothesis

The research sets up three null hypotheses, one for each coefficient:
Hb0_FGA: Kobe's field goals attempted per game make no contribution to the team's win;
Hb0_AST: Kobe's assists per game make no contribution to the team's win;
Hb0_TRB: Kobe's total rebounds per game make no contribution to the team's win.

3.Models

(1)Entry-Wise

modelentry <- lm(KobeStatistic$Plus_Minus~KobeStatistic$PTS+KobeStatistic$AST+KobeStatistic$FGRatio)
summary(modelentry)
## 
## Call:
## lm(formula = KobeStatistic$Plus_Minus ~ KobeStatistic$PTS + KobeStatistic$AST + 
##     KobeStatistic$FGRatio)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -27.0319  -8.9834  -0.2078   8.3841  27.9506 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           -22.72239    4.88814  -4.648 7.17e-06 ***
## KobeStatistic$PTS       0.28632    0.09691   2.954  0.00363 ** 
## KobeStatistic$AST       1.07420    0.37588   2.858  0.00486 ** 
## KobeStatistic$FGRatio  23.05592    9.86348   2.338  0.02071 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.49 on 153 degrees of freedom
## Multiple R-squared:  0.1605, Adjusted R-squared:  0.144 
## F-statistic: 9.751 on 3 and 153 DF,  p-value: 6.319e-06
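Each t value and p-value in the coefficient table follows from the estimate and its standard error. The sketch below recomputes them from the rounded numbers printed above, so small differences in the last digit are expected; the variable names are illustrative.

```r
# Estimates and standard errors copied from the summary output above.
estimate  <- c(PTS = 0.28632, AST = 1.07420, FGRatio = 23.05592)
std_error <- c(PTS = 0.09691, AST = 0.37588, FGRatio = 9.86348)
resid_df  <- 153  # residual degrees of freedom reported by summary()

# t statistic for H0: coefficient = 0, with its two-sided p-value.
t_value <- estimate / std_error
p_value <- 2 * pt(-abs(t_value), df = resid_df)

print(round(cbind(t_value, p_value), 5))
```

A null hypothesis of "no contribution" is rejected at the 5% level whenever the corresponding p-value falls below 0.05.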

(2)Hierarchical

i Single variable model

modelsingle <- lm(KobeStatistic$Plus_Minus ~ KobeStatistic$PTS)
summary(modelsingle);
## 
## Call:
## lm(formula = KobeStatistic$Plus_Minus ~ KobeStatistic$PTS)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -24.8870  -9.4341   0.3776   9.3282  26.8529 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -7.68255    2.99955  -2.561 0.011385 *  
## KobeStatistic$PTS  0.30942    0.08484   3.647 0.000362 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.96 on 155 degrees of freedom
## Multiple R-squared:  0.07902,    Adjusted R-squared:  0.07308 
## F-statistic:  13.3 on 1 and 155 DF,  p-value: 0.000362

ii Two variables model

modeldouble <- lm(KobeStatistic$Plus_Minus ~ KobeStatistic$PTS+KobeStatistic$AST)
summary(modeldouble);
## 
## Call:
## lm(formula = KobeStatistic$Plus_Minus ~ KobeStatistic$PTS + KobeStatistic$AST)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -26.031  -9.348  -0.604   8.481  28.938 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -16.08481    4.03602  -3.985 0.000104 ***
## KobeStatistic$PTS   0.39154    0.08706   4.497 1.35e-05 ***
## KobeStatistic$AST   1.14751    0.37996   3.020 0.002959 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.66 on 154 degrees of freedom
## Multiple R-squared:  0.1305, Adjusted R-squared:  0.1192 
## F-statistic: 11.56 on 2 and 154 DF,  p-value: 2.103e-05

iii Three variables model

modeltriple <- lm(KobeStatistic$Plus_Minus ~ KobeStatistic$PTS+KobeStatistic$AST+KobeStatistic$FGRatio)
summary(modeltriple)
## 
## Call:
## lm(formula = KobeStatistic$Plus_Minus ~ KobeStatistic$PTS + KobeStatistic$AST + 
##     KobeStatistic$FGRatio)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -27.0319  -8.9834  -0.2078   8.3841  27.9506 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           -22.72239    4.88814  -4.648 7.17e-06 ***
## KobeStatistic$PTS       0.28632    0.09691   2.954  0.00363 ** 
## KobeStatistic$AST       1.07420    0.37588   2.858  0.00486 ** 
## KobeStatistic$FGRatio  23.05592    9.86348   2.338  0.02071 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.49 on 153 degrees of freedom
## Multiple R-squared:  0.1605, Adjusted R-squared:  0.144 
## F-statistic: 9.751 on 3 and 153 DF,  p-value: 6.319e-06
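The adjusted R-squared values reported for the three nested models can be reproduced from the multiple R-squared, the sample size, and the number of predictors. This is a small sketch using the rounded figures printed above; the sample size of 157 is inferred from the residual degrees of freedom, and `adj_r2` is an illustrative helper name.

```r
# Adjusted R-squared penalizes R-squared for the number of predictors p.
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

n  <- 157                            # assumed: residual df + coefficients
r2 <- c(0.07902, 0.1305, 0.1605)     # single, double, triple models
p  <- c(1, 2, 3)                     # predictors in each model

adjusted <- adj_r2(r2, n, p)
print(round(adjusted, 4))
```

Because the adjusted value keeps rising as AST and FGRatio are added, each added predictor improves the fit by more than the penalty for the extra coefficient.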

ANOVA Table

anova(modelsingle,modeldouble,modeltriple)
## Analysis of Variance Table
## 
## Model 1: KobeStatistic$Plus_Minus ~ KobeStatistic$PTS
## Model 2: KobeStatistic$Plus_Minus ~ KobeStatistic$PTS + KobeStatistic$AST
## Model 3: KobeStatistic$Plus_Minus ~ KobeStatistic$PTS + KobeStatistic$AST + 
##     KobeStatistic$FGRatio
##   Res.Df   RSS Df Sum of Sq      F   Pr(>F)   
## 1    155 22168                                
## 2    154 20928  1   1239.53 9.3855 0.002585 **
## 3    153 20206  1    721.61 5.4639 0.020709 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
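The F statistics in this table can be recomputed by hand. When several nested models are passed to `anova()`, each added term's sum of squares is compared with the residual mean square of the largest model; the sketch below follows that rule using the rounded values printed above.

```r
# Values copied from the ANOVA table above.
rss_full <- 20206                         # RSS of Model 3 (largest model)
df_full  <- 153                           # residual df of Model 3
sum_sq   <- c(AST = 1239.53, FGRatio = 721.61)
df_added <- c(AST = 1, FGRatio = 1)

# Residual mean square of the largest model is the common denominator.
ms_resid <- rss_full / df_full
f_stat   <- (sum_sq / df_added) / ms_resid
p_value  <- pf(f_stat, df_added, df_full, lower.tail = FALSE)

print(round(cbind(f_stat, p_value), 5))
```

For the last row, the F statistic is the square of FGRatio's t value in the three-variable model, which is why both p-values agree.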

(3)Sequential

library(MASS)
fit<- modelentry
step<-stepAIC(fit,direction='both')
## Start:  AIC=770.63
## KobeStatistic$Plus_Minus ~ KobeStatistic$PTS + KobeStatistic$AST + 
##     KobeStatistic$FGRatio
## 
##                         Df Sum of Sq   RSS    AIC
## <none>                               20206 770.63
## - KobeStatistic$FGRatio  1    721.61 20928 774.14
## - KobeStatistic$AST      1   1078.64 21285 776.79
## - KobeStatistic$PTS      1   1152.73 21359 777.34
step$anova
## Stepwise Model Path 
## Analysis of Deviance Table
## 
## Initial Model:
## KobeStatistic$Plus_Minus ~ KobeStatistic$PTS + KobeStatistic$AST + 
##     KobeStatistic$FGRatio
## 
## Final Model:
## KobeStatistic$Plus_Minus ~ KobeStatistic$PTS + KobeStatistic$AST + 
##     KobeStatistic$FGRatio
## 
## 
##   Step Df Deviance Resid. Df Resid. Dev      AIC
## 1                        153   20206.48 770.6295
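The AIC values in the stepwise trace can be checked against the residual sums of squares. For a linear model, `stepAIC` relies on `extractAIC`, which computes n * log(RSS / n) + 2 * k, where k is the number of estimated coefficients. The sketch below assumes n = 157 games (inferred from the residual degrees of freedom); `aic_lm` is an illustrative helper name.

```r
# AIC as computed by extractAIC() for a linear model (up to a constant).
aic_lm <- function(rss, n, k) {
  n * log(rss / n) + 2 * k
}

n <- 157  # assumed sample size: 153 residual df + 4 coefficients

# Full model: intercept + PTS + AST + FGRatio.
aic_full <- aic_lm(20206.48, n, k = 4)
# Candidate with FGRatio dropped: intercept + PTS + AST.
aic_drop_fgratio <- aic_lm(20928, n, k = 3)

print(round(c(full = aic_full, drop_FGRatio = aic_drop_fgratio), 2))
```

Every candidate deletion raises the AIC, so the procedure stops at the starting model and keeps all three predictors.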

4.Plot

(1).Scatter Plot

plot(PTS,Plus_Minus, pch=21,main="Team Contribution vs Scores")

plot(AST,Plus_Minus, pch=21,main="Team Contribution vs Assist")

plot(FGRatio,Plus_Minus, pch=21,main="Team Contribution vs Field Goals Ratio")

plot(KobeStatistic,pch=21, cex=1, main="Team victory contribution Vs. scores, Assist and Field Goal Ratio")

(2).Plot with Regression Line

plot(PTS,Plus_Minus, pch=21,main="Team Contribution vs Scores")
PTS.lm<-lm(Plus_Minus~PTS)
abline(PTS.lm$coef, lwd=2)

plot(AST,Plus_Minus, pch=21,main="Team Contribution vs Assist")
AST.lm<-lm(Plus_Minus~AST)
abline(AST.lm$coef,lwd=2)

plot(FGRatio,Plus_Minus, pch=21,main="Team Contribution vs Field Goals Ratio")
FGRatio.lm<-lm(Plus_Minus~FGRatio)
abline(FGRatio.lm$coef,lwd=2)

5.Interpret

finalmodel<- lm(Plus_Minus~FGRatio+AST+PTS)
summary(finalmodel)
## 
## Call:
## lm(formula = Plus_Minus ~ FGRatio + AST + PTS)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -27.0319  -8.9834  -0.2078   8.3841  27.9506 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -22.72239    4.88814  -4.648 7.17e-06 ***
## FGRatio      23.05592    9.86348   2.338  0.02071 *  
## AST           1.07420    0.37588   2.858  0.00486 ** 
## PTS           0.28632    0.09691   2.954  0.00363 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.49 on 153 degrees of freedom
## Multiple R-squared:  0.1605, Adjusted R-squared:  0.144 
## F-statistic: 9.751 on 3 and 153 DF,  p-value: 6.319e-06
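One way to read the fitted coefficients is to plug in a game's statistics and compute the predicted Plus_Minus. The sketch below is illustrative only: it copies the rounded coefficients from the output above and evaluates them at the sample means from the data summary; `predict_pm` is a hypothetical helper name.

```r
# Coefficients copied from summary(finalmodel) above.
coefs <- c(Intercept = -22.72239, FGRatio = 23.05592,
           AST = 1.07420, PTS = 0.28632)

# Predicted Plus_Minus for given field goal ratio, assists, and points.
predict_pm <- function(fgratio, ast, pts) {
  unname(coefs["Intercept"] + coefs["FGRatio"] * fgratio +
           coefs["AST"] * ast + coefs["PTS"] * pts)
}

# Evaluate at the sample means reported in the data summary.
pred_at_means <- predict_pm(fgratio = 0.4565, ast = 4.924, pts = 33.52)
# Effect of one extra assist, holding the other predictors fixed.
extra_assist <- predict_pm(0.4565, 5.924, 33.52) - pred_at_means

print(c(pred_at_means = pred_at_means, extra_assist = extra_assist))
```

A least-squares fit with an intercept passes through the point of means, so the prediction at the mean inputs should land close to the reported mean Plus_Minus of 2.688. Holding the other two predictors fixed, each additional assist is associated with about 1.07 more points of Plus_Minus, each additional point scored with about 0.29, and each 0.10 increase in field goal ratio with about 2.3.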