NBA = read.csv("NBA_train.csv")
str(NBA)
'data.frame': 835 obs. of 20 variables:
$ SeasonEnd: int 1980 1980 1980 1980 1980 1980 1980 1980 1980 1980 ...
$ Team : chr "Atlanta Hawks" "Boston Celtics" "Chicago Bulls" "Cleveland Cavaliers" ...
$ Playoffs : int 1 1 0 0 0 0 0 1 0 1 ...
$ W : int 50 61 30 37 30 16 24 41 37 47 ...
$ PTS : int 8573 9303 8813 9360 8878 8933 8493 9084 9119 8860 ...
$ oppPTS : int 8334 8664 9035 9332 9240 9609 8853 9070 9176 8603 ...
$ FG : int 3261 3617 3362 3811 3462 3643 3527 3599 3639 3582 ...
$ FGA : int 7027 7387 6943 8041 7470 7596 7318 7496 7689 7489 ...
$ X2P : int 3248 3455 3292 3775 3379 3586 3500 3495 3551 3557 ...
$ X2PA : int 6952 6965 6668 7854 7215 7377 7197 7117 7375 7375 ...
$ X3P : int 13 162 70 36 83 57 27 104 88 25 ...
$ X3PA : int 75 422 275 187 255 219 121 379 314 114 ...
$ FT : int 2038 1907 2019 1702 1871 1590 1412 1782 1753 1671 ...
$ FTA : int 2645 2449 2592 2205 2539 2149 1914 2326 2333 2250 ...
$ ORB : int 1369 1227 1115 1307 1311 1226 1155 1394 1398 1187 ...
$ DRB : int 2406 2457 2465 2381 2524 2415 2437 2217 2326 2429 ...
$ AST : int 1913 2198 2152 2108 2079 1950 2028 2149 2148 2123 ...
$ STL : int 782 809 704 764 746 783 779 782 900 863 ...
$ BLK : int 539 308 392 342 404 562 339 373 530 356 ...
$ TOV : int 1495 1539 1684 1370 1533 1742 1492 1565 1517 1439 ...
We have 835 observations in the training dataset.
# How many wins to make the playoffs?
table(NBA$W, NBA$Playoffs)
0 1
11 2 0
12 2 0
13 2 0
14 2 0
15 10 0
16 2 0
17 11 0
18 5 0
19 10 0
20 10 0
21 12 0
22 11 0
23 11 0
24 18 0
25 11 0
26 17 0
27 10 0
28 18 0
29 12 0
30 19 1
31 15 1
32 12 0
33 17 0
34 16 0
35 13 3
36 17 4
37 15 4
38 8 7
39 10 10
40 9 13
41 11 26
42 8 29
43 2 18
44 2 27
45 3 22
46 1 15
47 0 28
48 1 14
49 0 17
50 0 32
51 0 12
52 0 20
53 0 17
54 0 18
55 0 24
56 0 16
57 0 23
58 0 13
59 0 14
60 0 8
61 0 10
62 0 13
63 0 7
64 0 3
65 0 3
66 0 2
67 0 4
69 0 1
72 0 1
Question 2: Is there any chance that a team winning 38 games can make it to the playoffs? Why?
Yes. In the historical data, 7 of the 15 teams that won 38 games made the playoffs, so the chance is a little under 50% (about 47%). With the new playoff format introduced in the last few years, a 38-win team probably has an even better chance of making the playoffs.
Question 3: How many wins guarantee a team a place in the playoffs, based on historical data?
Based on our results, no team with 49 or more wins has ever missed the post-season. Every 47-win team also qualified, but one 48-win team did not, so 49 is the safe threshold.
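The answers to Questions 2 and 3 can be checked numerically. The sketch below is only an illustration: it copies the counts for 35 to 50 wins from the table above into hypothetical vectors (`wins`, `missed`, `made`) instead of reading `NBA_train.csv`, and it relies on the table showing no missed playoffs above 50 wins.

```r
# Counts copied from the table above for 35 to 50 wins
# (above 50 wins, no team in the table missed the playoffs).
wins   <- 35:50
missed <- c(13, 17, 15, 8, 10, 9, 11, 8, 2, 2, 3, 1, 0, 1, 0, 0)
made   <- c(3, 4, 4, 7, 10, 13, 26, 29, 18, 27, 22, 15, 28, 14, 17, 32)

# Share of teams at each win total that made the playoffs
playoff_share <- made / (made + missed)
share_38 <- playoff_share[wins == 38]

# Smallest win total such that no team at or above it missed the playoffs
safe_wins <- max(wins[missed > 0]) + 1

round(share_38, 3)
safe_wins
```

With the full data frame loaded, `prop.table(table(NBA$W, NBA$Playoffs), 1)` should give the same row-wise shares for every win total.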
Question 4: Can you determine (visually) if there is any relationship between the points difference (PTSdiff) and the number of wins (W)? Explain.
Computing the points difference
# Compute Points Difference
NBA$PTSdiff = NBA$PTS - NBA$oppPTS
Looking at the scatter plot below, we can see a strong positive linear relationship between our dependent variable, the number of wins (W), and our independent variable, the points difference (PTSdiff).
# Check for linear relationship
plot(NBA$PTSdiff, NBA$W)
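The visual impression can be backed by a correlation coefficient. The sketch below is a small illustration, not the full analysis: it uses only the first ten rows shown in the `str(NBA)` output above, typed in by hand as the vectors `W`, `PTS`, and `oppPTS`, so the result describes that sample rather than all 835 observations.

```r
# First ten observations, copied from the str(NBA) output above
W      <- c(50, 61, 30, 37, 30, 16, 24, 41, 37, 47)
PTS    <- c(8573, 9303, 8813, 9360, 8878, 8933, 8493, 9084, 9119, 8860)
oppPTS <- c(8334, 8664, 9035, 9332, 9240, 9609, 8853, 9070, 9176, 8603)

# Points difference, computed the same way as NBA$PTSdiff
PTSdiff <- PTS - oppPTS

# Pearson correlation between points difference and wins (sample only)
r_sample <- cor(PTSdiff, W)
r_sample
```

On the full data frame the equivalent call would be `cor(NBA$PTSdiff, NBA$W)`; a value close to 1 indicates a strong positive linear relationship.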
Question 5: Here we want to determine which aspects of the game affect a team's number of wins (WinsReg model). Is the predictor variable points difference (PTSdiff) significant at a 5% significance level?
Linear regression model for wins
# Linear regression model for wins
WinsReg = lm(W ~ PTSdiff, data=NBA)
summary(WinsReg)
Call:
lm(formula = W ~ PTSdiff, data = NBA)
Residuals:
Min 1Q Median 3Q Max
-9.7393 -2.1018 -0.0672 2.0265 10.6026
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.100e+01 1.059e-01 387.0 <2e-16 ***
PTSdiff 3.259e-02 2.793e-04 116.7 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.061 on 833 degrees of freedom
Multiple R-squared: 0.9423, Adjusted R-squared: 0.9423
F-statistic: 1.361e+04 on 1 and 833 DF, p-value: < 2.2e-16
Based on the output, the predictor points difference (PTSdiff) is significant at the 5% significance level: its p-value (< 2e-16) is far below 0.05, and its t value (116.7) is far above 2 in absolute value.
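The t value and p-value in the summary can be reproduced by hand from the estimate and its standard error. The sketch below copies those two numbers and the residual degrees of freedom from the WinsReg output above. The variable names are illustrative, and because the inputs are rounded the result only matches the printed output approximately.

```r
# Values copied from the WinsReg summary above
estimate  <- 3.259e-02   # PTSdiff coefficient
std_error <- 2.793e-04   # its standard error
df_resid  <- 833         # residual degrees of freedom

# t statistic: estimate divided by its standard error
t_value <- estimate / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# Critical value for a two-sided test at the 5% level
t_crit <- qt(0.975, df = df_resid)

t_value
p_value < 0.05
abs(t_value) > t_crit
```

The critical value is close to 2, which is where the "|t| greater than 2" rule of thumb comes from.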
Question 6: We also built a linear model to predict the number of points as a function of some aspects of the game. Is the number of blocks (BLK) significant at a 5% significance level?
Linear regression model for points scored
PointsReg = lm(PTS ~ X2PA + X3PA + FTA + AST + ORB + DRB + TOV + STL + BLK, data=NBA)
summary(PointsReg)
Call:
lm(formula = PTS ~ X2PA + X3PA + FTA + AST + ORB + DRB + TOV +
STL + BLK, data = NBA)
Residuals:
Min 1Q Median 3Q Max
-527.40 -119.83 7.83 120.67 564.71
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.051e+03 2.035e+02 -10.078 <2e-16 ***
X2PA 1.043e+00 2.957e-02 35.274 <2e-16 ***
X3PA 1.259e+00 3.843e-02 32.747 <2e-16 ***
FTA 1.128e+00 3.373e-02 33.440 <2e-16 ***
AST 8.858e-01 4.396e-02 20.150 <2e-16 ***
ORB -9.554e-01 7.792e-02 -12.261 <2e-16 ***
DRB 3.883e-02 6.157e-02 0.631 0.5285
TOV -2.475e-02 6.118e-02 -0.405 0.6859
STL -1.992e-01 9.181e-02 -2.169 0.0303 *
BLK -5.576e-02 8.782e-02 -0.635 0.5256
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 185.5 on 825 degrees of freedom
Multiple R-squared: 0.8992, Adjusted R-squared: 0.8981
F-statistic: 817.3 on 9 and 825 DF, p-value: < 2.2e-16
Based on the model output, BLK (number of blocks) is not significant at the 5% significance level (p-value = 0.5256 > 0.05).
Question 7: What has been the maximum number of points in a season?
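Another way to see why BLK is not significant is to build an approximate 95% confidence interval for its coefficient and check whether it contains zero. The sketch below copies the estimate, standard error, and residual degrees of freedom from the PointsReg output above. The variable names are illustrative, and the interval is approximate because the inputs are rounded.

```r
# Values copied from the PointsReg summary above (BLK row)
estimate  <- -5.576e-02
std_error <- 8.782e-02
df_resid  <- 825

# Half-width of the 95% confidence interval
half_width <- qt(0.975, df = df_resid) * std_error

ci_lower <- estimate - half_width
ci_upper <- estimate + half_width

c(lower = ci_lower, upper = ci_upper)

# TRUE when zero lies inside the interval, i.e. not significant at 5%
contains_zero <- ci_lower < 0 && ci_upper > 0
contains_zero
```

With the fitted model available, `confint(PointsReg, "BLK", level = 0.95)` should give the same interval without rounding.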
The max number of points scored in a season is 10,371.
max(NBA$PTS)
[1] 10371
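Question 8: What is the meaning of the RMSE (root mean squared error) in the PointsReg model? Are you satisfied with this value?
The RMSE is the square root of the average squared residual, so it measures the typical distance between the actual points scored and the values predicted by the regression. The residual standard error reported above (185.5 points) is a closely related measure. Compared with season totals that run from about 8,500 to over 10,000 points in the data shown above, an error of this size is small, so the value is satisfactory.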
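The RMSE itself is not printed by `summary()`, but it can be recovered from the residual standard error. The sketch below uses only numbers from the PointsReg output above (residual standard error 185.5 on 825 degrees of freedom, 835 observations) and the maximum season total of 10,371 points. The variable names are illustrative, and the result is approximate because 185.5 is rounded.

```r
# Values copied from the output above
rse      <- 185.5   # residual standard error of PointsReg
df_resid <- 825     # residual degrees of freedom
n        <- 835     # number of observations
max_pts  <- 10371   # maximum points in a season

# Residual standard error is sqrt(SSE / df), so SSE = rse^2 * df
sse <- rse^2 * df_resid

# RMSE divides by n instead of the degrees of freedom (roughly 184 here)
rmse <- sqrt(sse / n)

# RMSE as a share of the highest season total
rmse_share <- rmse / max_pts

rmse
rmse_share
```

With the fitted model available, `sqrt(mean(residuals(PointsReg)^2))` computes the same quantity directly from the residuals.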
In-class activity 13: Our data show that a team with 49 wins has never missed the playoffs. What is the expected points difference for a team to make it to the postseason? Use the lecture solution file, and more specifically the WinsReg model.
# Based on Model 4: Wins = Intercept + coefficient * PtsDiff
# 49 = 41 + 0.03259 * PtsDiff, so solve for PtsDiff
IdealPtsDiff = (49 - 41) / 0.03259
IdealPtsDiff
[1] 245.4741
The expected points difference for a team to make the postseason is about 245 points over the season.