NBA dataset:
MyData_train <- read.csv(file = "/Users/GD/Desktop/national-basketball-associationnba-dataset/NBA_train.csv", header = TRUE, sep = ",")
summary(MyData_train)
## SeasonEnd Team Playoffs W
## Min. :1980 Atlanta Hawks : 31 Min. :0.0000 Min. :11.0
## 1st Qu.:1989 Boston Celtics : 31 1st Qu.:0.0000 1st Qu.:31.0
## Median :1996 Chicago Bulls : 31 Median :1.0000 Median :42.0
## Mean :1996 Cleveland Cavaliers: 31 Mean :0.5749 Mean :41.0
## 3rd Qu.:2005 Denver Nuggets : 31 3rd Qu.:1.0000 3rd Qu.:50.5
## Max. :2011 Detroit Pistons : 31 Max. :1.0000 Max. :72.0
## (Other) :649
## PTS oppPTS FG FGA
## Min. : 6901 Min. : 6909 Min. :2565 Min. :5972
## 1st Qu.: 7934 1st Qu.: 7934 1st Qu.:2974 1st Qu.:6564
## Median : 8312 Median : 8365 Median :3150 Median :6831
## Mean : 8370 Mean : 8370 Mean :3200 Mean :6873
## 3rd Qu.: 8784 3rd Qu.: 8768 3rd Qu.:3434 3rd Qu.:7157
## Max. :10371 Max. :10723 Max. :3980 Max. :8868
##
## X2P X2PA X3P X3PA
## Min. :1981 Min. :4153 Min. : 10.0 Min. : 75.0
## 1st Qu.:2510 1st Qu.:5269 1st Qu.:131.5 1st Qu.: 413.0
## Median :2718 Median :5706 Median :329.0 Median : 942.0
## Mean :2881 Mean :5956 Mean :319.0 Mean : 916.9
## 3rd Qu.:3296 3rd Qu.:6754 3rd Qu.:481.5 3rd Qu.:1347.5
## Max. :3954 Max. :7873 Max. :841.0 Max. :2284.0
##
## FT FTA ORB DRB
## Min. :1189 Min. :1475 Min. : 639.0 Min. :2044
## 1st Qu.:1502 1st Qu.:2008 1st Qu.: 953.5 1st Qu.:2346
## Median :1628 Median :2176 Median :1055.0 Median :2433
## Mean :1650 Mean :2190 Mean :1061.6 Mean :2427
## 3rd Qu.:1781 3rd Qu.:2352 3rd Qu.:1167.0 3rd Qu.:2516
## Max. :2388 Max. :3051 Max. :1520.0 Max. :2753
##
## AST STL BLK TOV
## Min. :1423 Min. : 455.0 Min. :204.0 Min. : 931
## 1st Qu.:1735 1st Qu.: 599.0 1st Qu.:359.0 1st Qu.:1192
## Median :1899 Median : 658.0 Median :410.0 Median :1289
## Mean :1912 Mean : 668.4 Mean :419.8 Mean :1303
## 3rd Qu.:2078 3rd Qu.: 729.0 3rd Qu.:469.5 3rd Qu.:1396
## Max. :2575 Max. :1053.0 Max. :716.0 Max. :1873
##
str(MyData_train)
## 'data.frame': 835 obs. of 20 variables:
## $ SeasonEnd: int 1980 1980 1980 1980 1980 1980 1980 1980 1980 1980 ...
## $ Team : Factor w/ 37 levels "Atlanta Hawks",..: 1 2 5 6 8 9 10 11 12 13 ...
## $ Playoffs : int 1 1 0 0 0 0 0 1 0 1 ...
## $ W : int 50 61 30 37 30 16 24 41 37 47 ...
## $ PTS : int 8573 9303 8813 9360 8878 8933 8493 9084 9119 8860 ...
## $ oppPTS : int 8334 8664 9035 9332 9240 9609 8853 9070 9176 8603 ...
## $ FG : int 3261 3617 3362 3811 3462 3643 3527 3599 3639 3582 ...
## $ FGA : int 7027 7387 6943 8041 7470 7596 7318 7496 7689 7489 ...
## $ X2P : int 3248 3455 3292 3775 3379 3586 3500 3495 3551 3557 ...
## $ X2PA : int 6952 6965 6668 7854 7215 7377 7197 7117 7375 7375 ...
## $ X3P : int 13 162 70 36 83 57 27 104 88 25 ...
## $ X3PA : int 75 422 275 187 255 219 121 379 314 114 ...
## $ FT : int 2038 1907 2019 1702 1871 1590 1412 1782 1753 1671 ...
## $ FTA : int 2645 2449 2592 2205 2539 2149 1914 2326 2333 2250 ...
## $ ORB : int 1369 1227 1115 1307 1311 1226 1155 1394 1398 1187 ...
## $ DRB : int 2406 2457 2465 2381 2524 2415 2437 2217 2326 2429 ...
## $ AST : int 1913 2198 2152 2108 2079 1950 2028 2149 2148 2123 ...
## $ STL : int 782 809 704 764 746 783 779 782 900 863 ...
## $ BLK : int 539 308 392 342 404 562 339 373 530 356 ...
## $ TOV : int 1495 1539 1684 1370 1533 1742 1492 1565 1517 1439 ...
table(MyData_train$W,MyData_train$Playoffs)
##
## 0 1
## 11 2 0
## 12 2 0
## 13 2 0
## 14 2 0
## 15 10 0
## 16 2 0
## 17 11 0
## 18 5 0
## 19 10 0
## 20 10 0
## 21 12 0
## 22 11 0
## 23 11 0
## 24 18 0
## 25 11 0
## 26 17 0
## 27 10 0
## 28 18 0
## 29 12 0
## 30 19 1
## 31 15 1
## 32 12 0
## 33 17 0
## 34 16 0
## 35 13 3
## 36 17 4
## 37 15 4
## 38 8 7
## 39 10 10
## 40 9 13
## 41 11 26
## 42 8 29
## 43 2 18
## 44 2 27
## 45 3 22
## 46 1 15
## 47 0 28
## 48 1 14
## 49 0 17
## 50 0 32
## 51 0 12
## 52 0 20
## 53 0 17
## 54 0 18
## 55 0 24
## 56 0 16
## 57 0 23
## 58 0 13
## 59 0 14
## 60 0 8
## 61 0 10
## 62 0 13
## 63 0 7
## 64 0 3
## 65 0 3
## 66 0 2
## 67 0 4
## 69 0 1
## 72 0 1
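The cross-tabulation is easier to read as a playoff rate per win total. The following is a minimal sketch that re-enters a few rows of the table by hand (38 to 42 wins) and converts them to row proportions; the names `counts` and `rates` are illustrative and not part of the original analysis.

```r
# Counts copied from the W-by-Playoffs table above (rows for 38 to 42 wins).
# Column "0" = missed the playoffs, column "1" = made the playoffs.
counts <- matrix(
  c( 8,  7,
    10, 10,
     9, 13,
    11, 26,
     8, 29),
  ncol = 2, byrow = TRUE,
  dimnames = list(W = 38:42, Playoffs = c("0", "1"))
)

# Row-wise proportions: share of teams at each win total that made the playoffs
rates <- prop.table(counts, margin = 1)[, "1"]
round(rates, 2)
```

On the full data frame, the same idea could be applied to the whole table with `prop.table(table(MyData_train$W, MyData_train$Playoffs), margin = 1)`.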
MyData_train$diff <- MyData_train$PTS - MyData_train$oppPTS
model <- lm(W ~ diff, data = MyData_train)
summary(model)
##
## Call:
## lm(formula = W ~ diff, data = MyData_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.7393 -2.1018 -0.0672 2.0265 10.6026
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.100e+01 1.059e-01 387.0 <2e-16 ***
## diff 3.259e-02 2.793e-04 116.7 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.061 on 833 degrees of freedom
## Multiple R-squared: 0.9423, Adjusted R-squared: 0.9423
## F-statistic: 1.361e+04 on 1 and 833 DF, p-value: < 2.2e-16
The R-squared value is 0.9423, which is close to 1: the point differential explains about 94% of the variance in wins.
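As a consistency check, R-squared can be recovered from the F-statistic printed in the summary. With a single predictor, R-squared equals F / (F + residual degrees of freedom). The sketch below uses the rounded values from the output above, so it only reproduces the reported figure approximately; the variable names are illustrative.

```r
# Values copied from the summary(model) output above
f_stat   <- 1.361e4  # F-statistic, rounded as printed
df_resid <- 833      # residual degrees of freedom

# For a regression with one predictor: R^2 = F / (F + residual df)
r_squared   <- f_stat / (f_stat + df_resid)
unexplained <- 1 - r_squared  # share of variance in W left in the residuals

round(c(r_squared = r_squared, unexplained = unexplained), 4)
```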
plot(model)
model$coefficients
## (Intercept) diff
## 41.00000000 0.03258633
The regression equation becomes

$$W = 41 + 0.0326 \times \text{diff}, \qquad \text{diff} = \text{PTS} - \text{oppPTS}$$
Because R-squared is close to 1, the residual variance is small relative to the total variance in wins, which indicates that the model is a good fit.
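The fitted coefficients can be used directly to translate between season point differential and expected wins. The sketch below hard-codes the coefficients printed by `model$coefficients`; the helper names `predict_wins` and `required_diff` are hypothetical and introduced only for illustration.

```r
# Coefficients copied from the model$coefficients output above
intercept <- 41.0
slope     <- 0.03258633

# Expected wins for a given season point differential (PTS - oppPTS)
predict_wins <- function(point_diff) intercept + slope * point_diff

# Point differential needed to reach a target number of wins
required_diff <- function(wins) (wins - intercept) / slope

predict_wins(0)    # a team that scores exactly what it allows
required_diff(42)  # differential associated with 42 expected wins
```

Under this fit, a team with a zero point differential is expected to win 41 games, and 42 expected wins correspond to outscoring opponents by about 31 points over the season. With the fitted object available, `predict(model, newdata = data.frame(diff = 0))` would give the first result without copying coefficients.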
Data source: www.kaggle.com