1. Pick any two quantitative variables from a data set that interests you.
    If you are at a loss, look at the internal R datasets in base R and choose any.
    You can continue to use the dataset from your last discussion, or pick up a new dataset.

A. Tell us what the dependent and independent variables are.

I used data from the hoopR package for this assignment, which includes a wide range of NBA and college basketball statistics. My dependent variable is team wins in the 2023-2024 NBA season, and my independent variable is total team three-point attempts. Here is my estimating equation: Y = β0 + β1X + ϵ. Y is the number of wins for a particular team, and X is that team’s total three-point attempts. β0 is the intercept, the expected number of wins when three-point attempts equal zero. β1 is the slope, the expected change in wins for each additional three-point attempt. ϵ is the error term, which captures the variation in wins that three-point attempts do not explain.

\[ Y_i = \beta_0 + \beta_1X_i + \epsilon_i \]

B. Estimate the linear regression in R using the lm() command.

model <- lm(wins~total_3pa,
            data = nba_summary)
summary(model)
## 
## Call:
## lm(formula = wins ~ total_3pa, data = nba_summary)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -26.339 -11.504   2.134  10.605  22.199 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -14.766311  11.700808  -1.262    0.217    
## total_3pa     0.019247   0.003857   4.991  2.6e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.36 on 29 degrees of freedom
## Multiple R-squared:  0.462,  Adjusted R-squared:  0.4435 
## F-statistic: 24.91 on 1 and 29 DF,  p-value: 2.603e-05

C. Interpret the slope and intercept parameters.

The estimated intercept in this model is -14.77, which is the predicted number of wins for a team that attempts no three-pointers. This value is unrealistic, since a team can’t have negative wins. It is also an extrapolation: every team’s three-point attempts are so far above zero that the intercept has little practical meaning here, and it is not statistically different from zero (p = 0.217). The slope is roughly 0.019, which means each additional three-point attempt is associated with about 0.019 more predicted wins, or roughly 1.9 wins per 100 additional attempts. The slope is statistically significant (p = 2.6e-05).
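To make the slope and intercept concrete, here is a minimal sketch that plugs the reported coefficients into the fitted line. The coefficients are copied from the summary(model) output; the 3,000-attempt team is a hypothetical value chosen only for illustration and is not taken from the data.

```r
# Coefficients copied from the summary(model) output above
b0 <- -14.766311   # intercept: predicted wins at zero three-point attempts
b1 <- 0.019247     # slope: change in predicted wins per additional attempt

# Predicted change in wins for 100 additional three-point attempts
extra_wins_per_100 <- b1 * 100
extra_wins_per_100

# Predicted wins for a hypothetical team with 3,000 attempts
# (3,000 is an illustrative assumption, not a value from nba_summary)
attempts <- 3000
predicted_wins <- b0 + b1 * attempts
predicted_wins
```

With the fitted model object available, predict(model, newdata = data.frame(total_3pa = 3000)) should give the same prediction up to rounding of the coefficients.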

2. Replicate the slope and intercept parameters using the covariance/variance formulas.
slope_covariance <- cov(nba_summary$total_3pa, nba_summary$wins)
slope_variance <- var(nba_summary$total_3pa)
slope <- slope_covariance / slope_variance
print(slope)
## [1] 0.01924713
mean_x <- mean(nba_summary$total_3pa)
mean_y <- mean(nba_summary$wins)

intercept <- mean_y - slope * mean_x
intercept
## [1] -14.76631
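The same identity can be checked on data that ships with R. The sketch below is only an illustration under an assumption: it swaps in the built-in mtcars dataset, with wt and mpg standing in for total_3pa and wins, so that it runs without nba_summary. It computes the slope as Cov(x, y) / Var(x) and the intercept as mean(y) - slope * mean(x), then compares both with lm().

```r
# mtcars ships with base R; wt and mpg stand in for total_3pa and wins
x <- mtcars$wt
y <- mtcars$mpg

# Slope = Cov(x, y) / Var(x); intercept = mean(y) - slope * mean(x)
slope_manual <- cov(x, y) / var(x)
intercept_manual <- mean(y) - slope_manual * mean(x)

# Fit the same regression with lm() and extract its coefficients
fit <- lm(mpg ~ wt, data = mtcars)
coefs <- unname(coef(fit))   # c(intercept, slope)

# TRUE if the manual values match lm() up to floating-point tolerance
all.equal(c(intercept_manual, slope_manual), coefs)
```

Both cov() and var() divide by n - 1, so that factor cancels in the ratio and the slope does not depend on which denominator is used.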
3. You will find this reading useful for understanding OLS assumptions (next lecture), and for your last Assignment. Please skim through chapter 8 of Open Statistics textbook. Do a few Google searches and in less than 20 lines, try to summarize your findings.

Ordinary least squares (OLS) regression estimates the relationship between one or more independent variables and a dependent variable by choosing the coefficients that minimize the sum of squared residuals. OLS can be used for many purposes, but it is most often used for predictive modeling, hypothesis testing, and relationship analysis. It rests on a few assumptions that are worth checking before relying on its results. First, the relationship between the dependent and independent variables is linear. Second, the residuals are independent of one another. Third, the residuals have constant variance across the range of fitted values. Fourth, the residuals are nearly normally distributed.
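These assumptions can be checked informally with base R diagnostics. The sketch below is one possible approach, not a complete procedure, and it again assumes the built-in mtcars data as a stand-in so that it runs on its own. The same calls should work on any lm() fit.

```r
# Stand-in model on built-in data (an assumption made for illustration only)
fit <- lm(mpg ~ wt, data = mtcars)
res <- residuals(fit)

# Linearity: residuals vs fitted values should show no clear curve
plot(fit, which = 1)

# Normality: points should roughly follow the reference line in the Q-Q plot
plot(fit, which = 2)

# Constant variance: the scale-location plot should show a roughly flat spread
plot(fit, which = 3)

# Formal normality check; a small p-value suggests non-normal residuals
normality <- shapiro.test(res)
normality
```

Independence of residuals is harder to check from plots alone. It mostly depends on how the data were collected, for example whether observations are repeated over time.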

## `geom_smooth()` using formula = 'y ~ x'

I was curious about the relationship between 3-point attempts and winning, since 3s have been taken at an unprecedented rate in the NBA over the last couple of years. I specifically looked at 3-point attempts rather than 3-point percentage because I think attempts do a better job of illustrating the potency of 3-pointers. Can just the act of taking 3-pointers lead to more wins, without even considering accuracy? Is it a coincidence that the two Finals teams (Celtics and Mavericks) were the teams that took the most 3-point attempts? Is it a coincidence that the worst team (Pistons) took the fewest? Based on this plot, it seems that volume alone is associated with more wins and that neither result is a coincidence.

However, there are some obvious limitations to this plot. First, the sample covers only one season. Second, other stats like rebounding, steals, and free throw percentage are also relevant when it comes to winning games. Third, 3-point percentage is still extremely relevant. It’s not like the Pistons could just take 5,000 3-point attempts and become a Finals team. Their roster is not conducive to 3-point success in the same way the Celtics’ roster is, because it lacks skilled shooters.

Even so, I think just about every team would benefit from taking more 3s. I think so based on this plot and the idea that taking 3s spreads the floor, leading to more opportunities for guards and wings to facilitate and score without being double-teamed. In summation, Go Celtics!