## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.8
## ✓ tidyr   1.2.0     ✓ stringr 1.4.0
## ✓ readr   2.1.2     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## Rows: 2455 Columns: 24
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (3): TEAM, CONF, POSTSEASON
## dbl (21): G, W, ADJOE, ADJDE, BARTHAG, EFG_O, EFG_D, TOR, TORD, ORB, DRB, FT...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Identifying the numeric response and explanatory variables

The response variable for this model is WAB (wins above bubble), and the explanatory variable is ADJOE (adjusted offensive efficiency).

Determine fitted model

mod <- lm(WAB ~ ADJOE, data = bball)
summary(mod)
## 
## Call:
## lm(formula = WAB ~ ADJOE, data = bball)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.1204  -2.5203  -0.0016   2.5731  13.3365 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -89.66782    1.07363  -83.52   <2e-16 ***
## ADJOE         0.79247    0.01037   76.44   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.788 on 2453 degrees of freedom
## Multiple R-squared:  0.7043, Adjusted R-squared:  0.7042 
## F-statistic:  5844 on 1 and 2453 DF,  p-value: < 2.2e-16
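
Reading the two coefficient estimates off the summary output, the fitted line can be written out explicitly. The worked prediction uses ADJOE = 110, which is an illustrative value chosen here and not a row taken from the data.

$$
\widehat{\mathrm{WAB}} = -89.66782 + 0.79247 \times \mathrm{ADJOE}
$$

$$
\widehat{\mathrm{WAB}}\,\big|_{\mathrm{ADJOE}=110} = -89.66782 + 0.79247 \times 110 \approx -2.50
$$

Each additional point of ADJOE is associated with an increase of about 0.79 in predicted WAB.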

Perform test for the slope

Residual standard error: 3.788 on 2453 degrees of freedom

Multiple R-squared: 0.7043, Adjusted R-squared: 0.7042

F-statistic: 5844 on 1 and 2453 DF, p-value: < 2.2e-16

The null hypothesis is that the slope is 0, which would mean there is no linear relationship between the two variables. The alternative hypothesis is that the slope is not equal to 0, which would mean there is a linear relationship between them. The estimated slope is 0.79247 with a t value of 76.44 on 2453 degrees of freedom and a p-value below 2e-16, so we reject the null hypothesis. This means that there is statistically significant evidence of a positive linear relationship between ADJOE and WAB.
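
The t value for the slope in the summary output can be reproduced by hand from the estimate and its standard error, with degrees of freedom equal to the 2455 rows minus the two estimated coefficients:

$$
t = \frac{\hat{\beta}_1 - 0}{\mathrm{SE}(\hat{\beta}_1)} = \frac{0.79247}{0.01037} \approx 76.4,
\qquad
\mathrm{df} = n - 2 = 2455 - 2 = 2453
$$

The small difference from the reported 76.44 comes from rounding in the displayed standard error.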

ggplot(data = bball, aes(x = ADJOE, y = WAB, color = WAB))+
  geom_point()+
  geom_abline(intercept = mod$coefficients[1], slope = mod$coefficients[2])

Create an ANOVA table, produce the F-statistic, and discuss the R-squared value

anova(mod)
## Analysis of Variance Table
## 
## Response: WAB
##             Df Sum Sq Mean Sq F value    Pr(>F)    
## ADJOE        1  83867   83867  5843.8 < 2.2e-16 ***
## Residuals 2453  35204      14                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

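R-squared is 0.7043, which means that ADJOE explains about 70% of the variance in WAB. The F-statistic of 5843.8 on 1 and 2453 degrees of freedom (p-value < 2.2e-16) agrees with the slope test: in simple linear regression F is the square of the slope's t value, and 76.44 squared is about 5843. Both results support the alternative hypothesis.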
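
Both R-squared and the F-statistic follow directly from the sums of squares in the ANOVA table. The symbols SSR (regression sum of squares, the ADJOE row) and SSE (residual sum of squares) are notation introduced here for the calculation.

$$
R^2 = \frac{\mathrm{SSR}}{\mathrm{SSR} + \mathrm{SSE}} = \frac{83867}{83867 + 35204} = \frac{83867}{119071} \approx 0.7043
$$

$$
F = \frac{\mathrm{SSR}/1}{\mathrm{SSE}/2453} = \frac{83867}{14.351} \approx 5844
$$

The residual mean square is printed as 14 in the table only because of rounding; its unrounded value is about 14.351, and its square root, about 3.788, is the residual standard error reported by summary(mod).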

Create diagnostic plots to assess model assumptions

# Histogram of residuals (base R)
hist(mod$residuals)

# Normal Q-Q plot of residuals
qqnorm(mod$residuals)
qqline(mod$residuals)

bball<-cbind(bball, 
            fit=mod$fitted.values,
            residual=mod$residuals)

ggplot(data=bball, aes(residual))+
  geom_histogram(bins=8)+
  ggtitle("Histogram of Residuals")+
  theme_bw()

# Residual plot: residuals against fitted values, not the observed response
ggplot(data=bball, aes(fit, residual))+
  geom_point()+
  ggtitle("Residual Plot")+
  xlab("Fitted WAB (wins above bubble)")+
  ylab("Residuals")+
  theme_bw()+
  geom_hline(yintercept = 0,
             color="blue", lty=2, lwd=1)
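
When reading a residual plot, it helps to know which trends appear by construction. In a least-squares fit with an intercept, residuals are uncorrelated with the fitted values, but their correlation with the observed response equals the square root of 1 minus R-squared, roughly 0.54 for the R-squared of 0.7043 reported above. The sketch below checks that identity on simulated data; the data frame `sim`, its column names, and its coefficients are made up for illustration and are not the bball data.

```r
# Simulated stand-in data (hypothetical values, not the bball data set).
set.seed(42)
n <- 500
sim <- data.frame(x = rnorm(n, mean = 100, sd = 7))
sim$y <- -90 + 0.8 * sim$x + rnorm(n, sd = 4)

fit_sim <- lm(y ~ x, data = sim)
res <- resid(fit_sim)
r2  <- summary(fit_sim)$r.squared

# Residuals vs fitted values: correlation is zero up to floating-point error.
cor_fitted <- cor(res, fitted(fit_sim))

# Residuals vs observed response: correlation equals sqrt(1 - R^2).
cor_response <- cor(res, sim$y)

print(c(cor_fitted = cor_fitted,
        cor_response = cor_response,
        sqrt_one_minus_r2 = sqrt(1 - r2)))
```

An upward trend in residuals plotted against the response is therefore expected even for a well-behaved model, which is why the constant-variance and linearity checks are usually read from residuals against fitted values or against the predictor.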

Summary of findings

Overall, offensive efficiency is strongly and positively associated with wins above bubble. To take this further, we can fit a multiple regression model that adds the defensive statistics, and possibly other variables, as long as each added predictor remains statistically significant. Such a model could then be used as an indicator of which teams reached the postseason and how far they advanced.
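
As a rough sketch of that next step, the code below adds ADJDE (a column listed in the data's column specification) as a second predictor and compares the two nested models. To stay self-contained it runs on simulated data: the data frame `sim`, the model names `mod_one` and `mod_two`, and the coefficients used to generate WAB are all assumptions for illustration, not estimates from bball.

```r
# Simulated stand-in data (hypothetical values, not the bball data set).
set.seed(1)
n <- 200
sim <- data.frame(
  ADJOE = rnorm(n, mean = 103, sd = 7),
  ADJDE = rnorm(n, mean = 103, sd = 6)
)
# Made-up coefficients, used only to generate a response for the sketch.
sim$WAB <- -8 + 0.8 * (sim$ADJOE - 103) - 0.7 * (sim$ADJDE - 103) +
  rnorm(n, sd = 2)

# Simple model (offense only) and the extended model (offense + defense).
mod_one <- lm(WAB ~ ADJOE, data = sim)
mod_two <- lm(WAB ~ ADJOE + ADJDE, data = sim)

# t-test for each predictor, given the other predictor is in the model.
print(summary(mod_two)$coefficients)

# Partial F-test: does adding ADJDE explain significantly more variance?
print(anova(mod_one, mod_two))
```

With the real data, the same two lm() calls would use data = bball; whether ADJDE or any other column earns its place in the model would have to be judged from that output.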