Answers are at the bottom.

# Read in data
 
NBA_dataset = read.csv("C:/Users/17862/Documents/SPORTS_ANALYTICS/Intro_to_R/NBA_train.csv")
str(NBA_dataset)
## 'data.frame':    835 obs. of  20 variables:
##  $ SeasonEnd: int  1980 1980 1980 1980 1980 1980 1980 1980 1980 1980 ...
##  $ Team     : chr  "Atlanta Hawks" "Boston Celtics" "Chicago Bulls" "Cleveland Cavaliers" ...
##  $ Playoffs : int  1 1 0 0 0 0 0 1 0 1 ...
##  $ W        : int  50 61 30 37 30 16 24 41 37 47 ...
##  $ PTS      : int  8573 9303 8813 9360 8878 8933 8493 9084 9119 8860 ...
##  $ oppPTS   : int  8334 8664 9035 9332 9240 9609 8853 9070 9176 8603 ...
##  $ FG       : int  3261 3617 3362 3811 3462 3643 3527 3599 3639 3582 ...
##  $ FGA      : int  7027 7387 6943 8041 7470 7596 7318 7496 7689 7489 ...
##  $ X2P      : int  3248 3455 3292 3775 3379 3586 3500 3495 3551 3557 ...
##  $ X2PA     : int  6952 6965 6668 7854 7215 7377 7197 7117 7375 7375 ...
##  $ X3P      : int  13 162 70 36 83 57 27 104 88 25 ...
##  $ X3PA     : int  75 422 275 187 255 219 121 379 314 114 ...
##  $ FT       : int  2038 1907 2019 1702 1871 1590 1412 1782 1753 1671 ...
##  $ FTA      : int  2645 2449 2592 2205 2539 2149 1914 2326 2333 2250 ...
##  $ ORB      : int  1369 1227 1115 1307 1311 1226 1155 1394 1398 1187 ...
##  $ DRB      : int  2406 2457 2465 2381 2524 2415 2437 2217 2326 2429 ...
##  $ AST      : int  1913 2198 2152 2108 2079 1950 2028 2149 2148 2123 ...
##  $ STL      : int  782 809 704 764 746 783 779 782 900 863 ...
##  $ BLK      : int  539 308 392 342 404 562 339 373 530 356 ...
##  $ TOV      : int  1495 1539 1684 1370 1533 1742 1492 1565 1517 1439 ...
# Compute Points Difference
NBA_dataset$PTSdiff = NBA_dataset$PTS - NBA_dataset$oppPTS
# How many wins to make the playoffs?
# The "W" column is the number of games a team won; the "Playoffs" column indicates whether the team made the playoffs (1) or not (0)
table(NBA_dataset$W, NBA_dataset$Playoffs)
##     
##       0  1
##   11  2  0
##   12  2  0
##   13  2  0
##   14  2  0
##   15 10  0
##   16  2  0
##   17 11  0
##   18  5  0
##   19 10  0
##   20 10  0
##   21 12  0
##   22 11  0
##   23 11  0
##   24 18  0
##   25 11  0
##   26 17  0
##   27 10  0
##   28 18  0
##   29 12  0
##   30 19  1
##   31 15  1
##   32 12  0
##   33 17  0
##   34 16  0
##   35 13  3
##   36 17  4
##   37 15  4
##   38  8  7
##   39 10 10
##   40  9 13
##   41 11 26
##   42  8 29
##   43  2 18
##   44  2 27
##   45  3 22
##   46  1 15
##   47  0 28
##   48  1 14
##   49  0 17
##   50  0 32
##   51  0 12
##   52  0 20
##   53  0 17
##   54  0 18
##   55  0 24
##   56  0 16
##   57  0 23
##   58  0 13
##   59  0 14
##   60  0  8
##   61  0 10
##   62  0 13
##   63  0  7
##   64  0  3
##   65  0  3
##   66  0  2
##   67  0  4
##   69  0  1
##   72  0  1
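The raw counts are easier to compare as proportions. Below is a minimal base-R sketch. The counts for 38 to 50 wins were copied by hand from the `table()` output above and were not read from NBA_train.csv. `prop.table(..., 1)` turns each row into the share of teams that missed (0) or made (1) the playoffs.

```r
# Counts copied from the table(W, Playoffs) output above, 38 to 50 wins only
wins   <- 38:50
missed <- c(8, 10, 9, 11, 8, 2, 2, 3, 1, 0, 1, 0, 0)
made   <- c(7, 10, 13, 26, 29, 18, 27, 22, 15, 28, 14, 17, 32)

counts <- cbind(`0` = missed, `1` = made)
rownames(counts) <- wins

# Row-wise proportions: each row (win total) sums to 1
playoff_share <- prop.table(counts, 1)
round(playoff_share, 3)
```

On the full data, `prop.table(table(NBA_dataset$W, NBA_dataset$Playoffs), 1)` gives the same shares for every win total.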
# Check for linear relationship
plot(NBA_dataset$PTSdiff, NBA_dataset$W)

# Linear regression model for wins
WinsReg = lm(W ~ PTSdiff, data=NBA_dataset)
summary(WinsReg)
## 
## Call:
## lm(formula = W ~ PTSdiff, data = NBA_dataset)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.7393 -2.1018 -0.0672  2.0265 10.6026 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 4.100e+01  1.059e-01   387.0   <2e-16 ***
## PTSdiff     3.259e-02  2.793e-04   116.7   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.061 on 833 degrees of freedom
## Multiple R-squared:  0.9423, Adjusted R-squared:  0.9423 
## F-statistic: 1.361e+04 on 1 and 833 DF,  p-value: < 2.2e-16
# Linear regression model for points scored
PointsReg = lm(PTS ~ X2PA + X3PA + FTA + AST + ORB + DRB + TOV + STL + BLK, data=NBA_dataset)
summary(PointsReg)
## 
## Call:
## lm(formula = PTS ~ X2PA + X3PA + FTA + AST + ORB + DRB + TOV + 
##     STL + BLK, data = NBA_dataset)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -527.40 -119.83    7.83  120.67  564.71 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.051e+03  2.035e+02 -10.078   <2e-16 ***
## X2PA         1.043e+00  2.957e-02  35.274   <2e-16 ***
## X3PA         1.259e+00  3.843e-02  32.747   <2e-16 ***
## FTA          1.128e+00  3.373e-02  33.440   <2e-16 ***
## AST          8.858e-01  4.396e-02  20.150   <2e-16 ***
## ORB         -9.554e-01  7.792e-02 -12.261   <2e-16 ***
## DRB          3.883e-02  6.157e-02   0.631   0.5285    
## TOV         -2.475e-02  6.118e-02  -0.405   0.6859    
## STL         -1.992e-01  9.181e-02  -2.169   0.0303 *  
## BLK         -5.576e-02  8.782e-02  -0.635   0.5256    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 185.5 on 825 degrees of freedom
## Multiple R-squared:  0.8992, Adjusted R-squared:  0.8981 
## F-statistic: 817.3 on 9 and 825 DF,  p-value: < 2.2e-16
# Maximum number of points scored in a season
max_numb_points <- max(NBA_dataset$PTS)
max_numb_points
## [1] 10371
# Sum of squared errors (SSE) of PointsReg on the training data
SSE = sum(PointsReg$residuals^2)
SSE
## [1] 28394314
# Root mean squared error
RMSE = sqrt(SSE/nrow(NBA_dataset))
RMSE
## [1] 184.4049
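RMSE divides the SSE by the number of observations. The "Residual standard error" in `summary(PointsReg)` divides it by the residual degrees of freedom instead: 835 observations minus 10 estimated coefficients, or 825. The sketch below reproduces both from the printed values and shows why they are so close:

```r
sse      <- 28394314   # SSE printed above
n        <- 835        # rows in NBA_dataset
df_resid <- 825        # n minus 10 coefficients (intercept + 9 predictors)

rmse <- sqrt(sse / n)         # matches the RMSE above
rse  <- sqrt(sse / df_resid)  # matches "Residual standard error: 185.5"
c(RMSE = rmse, RSE = rse)
```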
# Read in test set
NBA_test = read.csv("C:/Users/17862/Documents/SPORTS_ANALYTICS/NBA_test.csv")
NBA_test
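# Note: fitting on NBA_test means the R^2 computed below measures in-sample fit, not
# out-of-sample accuracy; a true test fits this formula on data=NBA_dataset and then predicts NBA_test
PointsReg4 = lm(PTS ~ X2PA + X3PA + FTA + AST + ORB + STL, data=NBA_test)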
summary(PointsReg4)
## 
## Call:
## lm(formula = PTS ~ X2PA + X3PA + FTA + AST + ORB + STL, data = NBA_test)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -338.56  -54.78   -7.23   76.22  359.24 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1588.8677  1639.0327   0.969 0.343382    
## X2PA           0.4617     0.2626   1.758 0.093254 .  
## X3PA           0.8920     0.2668   3.343 0.003086 ** 
## FTA            0.8768     0.2027   4.325 0.000299 ***
## AST            0.6997     0.3590   1.949 0.064752 .  
## ORB           -0.8902     0.4583  -1.943 0.065593 .  
## STL            0.9466     0.5983   1.582 0.128571    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 175.8 on 21 degrees of freedom
## Multiple R-squared:  0.7907, Adjusted R-squared:  0.7309 
## F-statistic: 13.22 on 6 and 21 DF,  p-value: 3.473e-06
# Make predictions on test set
PointsPredictions = predict(PointsReg4, newdata=NBA_test)
# Compute R^2 on the test set, using the training-set mean of PTS as the baseline
SSE = sum((PointsPredictions - NBA_test$PTS)^2)
SST = sum((mean(NBA_dataset$PTS) - NBA_test$PTS)^2)
R2 = 1 - SSE/SST
R2
## [1] 0.8873897

How many observations do we have in the training dataset?

Answer: There are 835 observations (team-seasons), as reported by str(): "835 obs. of 20 variables".


Is there any chance that a team winning 38 games can make it to the playoffs? Why?

Answer: Yes. Applying table() to W and Playoffs gives one row per win total. Column 0 counts the teams that missed the playoffs, and column 1 counts the teams that made them. Of the teams that won 38 games, 8 missed the playoffs and 7 made them. A 38-win team has therefore historically reached the playoffs almost half the time (7 of 15).

What is the number of wins that guarantees any team a place in the playoffs, based on historical data?

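Answer: 49 wins. No team with 49 or more wins has ever missed the playoffs. The highest win total among non-playoff teams is 48, reached by one team. 47 wins is not enough: no 47-win team missed the playoffs, but a 48-win team did. This is a historical threshold in the training data, not a league rule.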
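The same reasoning can be written as code. The threshold is one more than the highest win total of any team that missed the playoffs. The sketch below uses the counts copied from the table above, for 38 to 50 wins only:

```r
# Counts of non-playoff teams, copied from the table(W, Playoffs) output (38 to 50 wins)
wins   <- 38:50
missed <- c(8, 10, 9, 11, 8, 2, 2, 3, 1, 0, 1, 0, 0)

# Highest win total with at least one non-playoff team, plus one
threshold <- max(wins[missed > 0]) + 1
threshold
```

With the full training data loaded, `max(NBA_dataset$W[NBA_dataset$Playoffs == 0]) + 1` applies the same rule directly.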


Can you determine (visually) whether there is any relationship between the points difference (PTSdiff) and the number of wins (W)? Explain.

Answer: Yes. In the scatter plot produced by plot(NBA_dataset$PTSdiff, NBA_dataset$W), the points lie close to a straight, upward-sloping line. Teams that outscore their opponents over a season win more games, and the relationship is strongly linear.
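The visual impression can be quantified. For a regression with a single predictor, R² equals the squared Pearson correlation. The correlation between PTSdiff and W can therefore be recovered from the WinsReg summary, which reports R² = 0.9423 and a positive slope. A sketch using those rounded printed values:

```r
r2_wins    <- 0.9423     # Multiple R-squared from summary(WinsReg)
slope_wins <- 3.259e-02  # PTSdiff coefficient from summary(WinsReg)

# In simple linear regression, r = sign(slope) * sqrt(R^2)
r_wins <- sign(slope_wins) * sqrt(r2_wins)
round(r_wins, 3)
```

With the data loaded, `cor(NBA_dataset$PTSdiff, NBA_dataset$W)` should agree up to the rounding of the printed R².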


Here we want to determine which aspects of the game affect a team's number of wins (WinsReg model). Is the predictor variable points difference (PTSdiff) significant at the 5% significance level?

Answer: Yes. In the WinsReg summary, the p-value for PTSdiff is below 2e-16, far below 0.05, so PTSdiff is significant at the 5% level. The model's R-squared of 0.9423 means PTSdiff alone explains about 94% of the variance in season wins. It is therefore an excellent predictor of wins.
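The fitted line W = 41.00 + 0.03259 × PTSdiff can be inverted to ask how large a season points difference a team needs for a given number of wins. The sketch below uses the rounded coefficients printed by `summary(WinsReg)`. The helper name `ptsdiff_needed` is hypothetical.

```r
intercept <- 4.100e+01   # (Intercept) estimate from summary(WinsReg)
slope     <- 3.259e-02   # PTSdiff estimate from summary(WinsReg)

# Hypothetical helper: solve W = intercept + slope * PTSdiff for PTSdiff
ptsdiff_needed <- function(target_wins) (target_wins - intercept) / slope

round(ptsdiff_needed(49))   # points difference for the 49-win playoff threshold
round(1 / slope, 1)         # points of differential per additional win
```

With the fitted object available, `coef(WinsReg)` gives the unrounded coefficients. Under this model, a team with PTSdiff = 0 is predicted to win about 41 games.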


We also built a linear model to predict the number of points as a function of some aspects of the game. Is the number of blocks (BLK) significant at a 5% significance level?

Answer: No. The p-value for the number of blocks (BLK) is 0.5256, which is greater than 0.05, so BLK is not significant at the 5% level. DRB (0.5285) and TOV (0.6859) are not significant either.
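BLK, DRB and TOV all have p-values well above 0.05 in `summary(PointsReg)`, and PointsReg4 uses the six remaining predictors. The sketch below derives that reduced formula from the full one with base R's `update()`, so the predictor list does not have to be retyped by hand:

```r
full_formula <- PTS ~ X2PA + X3PA + FTA + AST + ORB + DRB + TOV + STL + BLK

# Drop the three predictors that were not significant at the 5% level
reduced_formula <- update(full_formula, . ~ . - DRB - TOV - BLK)
reduced_formula
```

In practice, predictors are usually removed one at a time, refitting after each removal, because p-values change when a variable is dropped. The reduced formula can then be fit on the training data, for example with `lm(reduced_formula, data = NBA_dataset)`.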


What has been the maximum number of points in a season?

Answer: Applying max() to the PTS column gives a maximum of 10371 points in a season.


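What is the meaning of the RMSE (root mean squared error) in the PointsReg model? Are you satisfied with this value?

Answer: The RMSE of 184.4049 is the square root of the average squared residual. The model's predictions of a team's season points are therefore typically off by about 184 points. It is not the mean difference between predicted and actual points, because the residuals of a least-squares model with an intercept average to zero. Season totals are in the thousands: the rows shown by str() range from 8493 to 9360 points. An error of about 184 points is roughly 2% of a season total, which is reasonably satisfactory.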


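How well did your predictions work on the testing dataset? Report the new R² and RMSE.

Answer: The R² on the testing dataset is 0.8874. Relative to predicting the training-set mean, the model explains about 88.7% of the variation in points. However, PointsReg4 was fit on NBA_test itself, so this figure measures in-sample fit rather than out-of-sample performance. A fair test would fit the model on NBA_dataset and then predict NBA_test. The printed residual standard error is 175.8 on 21 degrees of freedom, which implies 28 test observations. The corresponding RMSE is therefore about 175.8 × sqrt(21/28) ≈ 152 points.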
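The question also asks for the test-set RMSE, which the code above does not compute. The sketch below defines a helper, hypothetically named `test_metrics`, that returns both out-of-sample R² and RMSE. It uses the same baseline as the code above, the training-set mean of PTS. The check uses three made-up values, not NBA data.

```r
# Hypothetical helper: out-of-sample R^2 and RMSE, with the training-set mean as baseline
test_metrics <- function(pred, actual, train_mean) {
  sse <- sum((pred - actual)^2)
  sst <- sum((train_mean - actual)^2)
  c(R2 = 1 - sse / sst, RMSE = sqrt(sse / length(actual)))
}

# Toy check: SSE = 6, SST = 4, so R^2 = -0.5 and RMSE = sqrt(2)
m <- test_metrics(pred = c(9, 11, 10), actual = c(10, 10, 12), train_mean = 10)
m
```

Unlike in-sample R², the out-of-sample value can be negative, as in this toy case, when the model predicts worse than the training mean. Applied to this document, a call would look like `test_metrics(predict(model, newdata = NBA_test), NBA_test$PTS, mean(NBA_dataset$PTS))`. Here `model` would ideally be fit on NBA_dataset.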