What is Simple Linear Regression?

  • A regression model that describes the linear relationship between two quantitative variables: one predictor (x) and one response (y)

    • Examples:
      • Points vs. Field Goal Attempts
      • Points vs. Minutes Played
      • Personal Fouls vs. Turnovers

  • A linear relationship is expressed by a best fit line

Plot of Points vs. Field Goal Attempts in NBA Playoffs



The graph suggests a positive correlation between field goal attempts and points: an increase in field goal attempts corresponds to an increase in points.

Simple Linear Regression Model Equations

  • Model: \(\displaystyle {y = \beta_0 + \beta_1 \cdot x + \varepsilon}\), where \(\varepsilon\) represents the noise of the data
  • Fitted Model: \(\displaystyle {\hat y = \beta_0 + \beta_1 \cdot x}\), where \(\hat y\) is the predicted value of \(y\) and the equation represents the best fit line
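
As a minimal sketch of how the model and the fitted model relate, the base R code below simulates data from a line with known coefficients plus noise and then estimates those coefficients with lm(). The true values (2 and 0.5), the noise level, and the variable names are assumptions chosen for illustration; none of them come from the NBA dataset.

```r
# Simulate data from y = beta_0 + beta_1 * x + epsilon (illustrative values only)
set.seed(42)
n      <- 200
beta_0 <- 2      # assumed true intercept
beta_1 <- 0.5    # assumed true slope
x      <- runif(n, min = 0, max = 40)
eps    <- rnorm(n, mean = 0, sd = 1)   # noise term
y      <- beta_0 + beta_1 * x + eps

# Fit the simple linear regression; coef() returns the estimated intercept and slope
fit <- lm(y ~ x)
est <- coef(fit)
print(est)

# Fitted values lie on the estimated line; residuals are what the line leaves unexplained
y_hat <- fitted(fit)
res   <- residuals(fit)
```

The estimates should land near the assumed true values but not exactly on them, because the noise term shifts each observation off the line.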

Equation of Fitted Model Line for Points vs. Field Goal Attempts Plot:

\(\small{\beta_1 = \text{slope} = \frac{y_{2}-y_{1}}{x_{2}-x_{1}} = \frac{22.47488-10.69017}{17.8-8.5} = \frac{11.78471}{9.3} = 1.26717}\)

\(\small{\beta_0 = \text{y-intercept} = y_{2} - \beta_1 \cdot x_{2} = 22.474885-1.2671731182\cdot17.8 = -0.08080}\)

\(\small{y = -0.08080 + 1.26717 \cdot x}\)
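
The same two-point calculation can be sketched in base R. The coordinates are the ones used in the equations above; the helper name predict_points is hypothetical and only illustrates how the fitted line turns field goal attempts into a predicted point total.

```r
# Two points on the fitted line (coordinates from the slope calculation)
x1 <- 8.5;  y1 <- 10.69017
x2 <- 17.8; y2 <- 22.47488

slope     <- (y2 - y1) / (x2 - x1)   # beta_1
intercept <- y2 - slope * x2         # beta_0

round(c(intercept = intercept, slope = slope), 5)

# Hypothetical helper: predicted points for a given number of field goal attempts
predict_points <- function(fga) intercept + slope * fga
predict_points(10)
```

Under this line, a player with 10 field goal attempts would be predicted to score about 12.6 points.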

Best Fit Line

  • The best fit line expresses the trend of the data
    • The line does not pass through every observation, but it can be used to predict values

  • The best fit line criteria:
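    1. The line passes through or near many of the points
    2. The points are, overall, as close to the line as possible
  • The most common method for determining the best fit line is least squares, which chooses the line that minimizes the sum of squared errors: \(\displaystyle {SSE = \sum_{i=1}^{n}(y_{i} - \hat y_{i})^2}\), where \(\hat y_{i}\) is the value the line predicts for observation \(i\)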
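
To make the SSE criterion concrete, here is a small base R sketch. The five data points are hypothetical, and the function name sse is an assumption for illustration. It computes the least-squares slope and intercept from their standard closed-form expressions, checks them against lm(), and compares the SSE of that line with the SSE of an arbitrary alternative line.

```r
# Hypothetical toy data, not taken from the NBA dataset
x <- c(2, 4, 6, 8, 10)
y <- c(3.1, 4.9, 7.2, 8.8, 11.1)

# Sum of squared errors for a candidate line with intercept b0 and slope b1
sse <- function(b0, b1) sum((y - (b0 + b1 * x))^2)

# Closed-form least-squares estimates
b1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0_hat <- mean(y) - b1_hat * mean(x)

# lm() should return the same coefficients
fit <- lm(y ~ x)
coef(fit)

sse_best  <- sse(b0_hat, b1_hat)
sse_other <- sse(b0_hat + 0.5, b1_hat - 0.1)   # an arbitrary alternative line
c(best = sse_best, other = sse_other)
```

Any line other than the least-squares line gives a larger SSE, which is what "best fit" means under this criterion.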

Code for Points vs. Minutes Played in NBA Playoffs by Position Plot

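library(ggplot2)

# df is assumed to hold the playoff player stats, with columns MP, PTS, and Pos
fig2 <- ggplot(df, aes(x=MP, y=PTS)) +
  geom_point(aes(color=Pos)) +                          # one point per player, colored by position
  geom_smooth(method="lm", color="black", se=FALSE) +   # least-squares line, no confidence band
  labs(x="Minutes Played", y="Points", color="Position",
       title="Points vs. Minutes Played in NBA Playoffs")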

Plot for Points vs. Minutes Played in NBA Playoffs by Position

Plot for Points vs. Minutes Played in NBA Playoffs by Position Summary

The Points vs. Minutes Played in NBA Playoffs plot suggests that there is a linear trend in the data. It seems to show that an increase in minutes played corresponds to an increase in points.
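
A visual trend like this can be checked numerically. The sketch below is illustrative only: df_demo is a hypothetical stand-in with made-up values, and only the column names MP and PTS follow the plotting code. With the real data, the same calls would be applied to df.

```r
# Hypothetical stand-in for the playoff data; the values are made up for illustration
df_demo <- data.frame(
  MP  = c(5, 10, 15, 20, 25, 30, 35, 40),
  PTS = c(1.8, 4.1, 6.5, 7.9, 11.2, 13.0, 17.4, 21.5)
)

fit_mp <- lm(PTS ~ MP, data = df_demo)

coef(fit_mp)                     # intercept and slope of the best fit line
r  <- cor(df_demo$MP, df_demo$PTS)   # Pearson correlation; near +1 for a strong positive trend
r2 <- summary(fit_mp)$r.squared      # share of the variance in PTS explained by MP
c(correlation = r, r_squared = r2)

# Predicted points for a hypothetical player averaging 28 minutes
predict(fit_mp, newdata = data.frame(MP = 28))
```

In simple linear regression the R-squared value equals the squared correlation, so either number summarizes how tightly the points follow the line.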

Code for Personal Fouls vs. Turnovers in NBA Playoffs

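# Same layout as the previous plot: turnovers on x, personal fouls on y
fig3 <- ggplot(df, aes(x=TOV, y=PF)) +
  geom_point(aes(color=Pos)) +                          # one point per player, colored by position
  geom_smooth(method="lm", color="black", se=FALSE) +   # least-squares line, no confidence band
  labs(x="Turnovers", y="Personal Fouls", 
       title="Personal Fouls vs. Turnovers in NBA Playoffs")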

Plot for Personal Fouls vs. Turnovers in NBA Playoffs

Plot for Personal Fouls vs. Turnovers in NBA Playoffs Summary

The Personal Fouls vs. Turnovers in NBA Playoffs plot suggests a slightly positive trend in the data. Personal fouls generally increase as turnovers increase, especially among players who committed fewer than 3 turnovers.

Simple Linear Regression and Best Fit Lines

  • A plot of the data does not always show a visible trend
  • The best fit is not always a straight line
  • Most real-life relationships between two or more variables are not linear but much more complex
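
As a rough illustration of the last two points, the base R sketch below generates hypothetical curved data and compares a straight-line fit with a fit that adds a squared term. The data-generating formula and all names are assumptions for the example, not anything drawn from the NBA dataset.

```r
# Hypothetical curved data: y grows with the square of x, plus a little noise
set.seed(1)
x <- seq(0, 10, length.out = 50)
y <- 1 + 0.3 * x^2 + rnorm(50, sd = 1)

fit_line <- lm(y ~ x)            # straight line
fit_quad <- lm(y ~ x + I(x^2))   # adds a squared term; still fitted by least squares

# Compare the sum of squared errors of the two fits
sse_line <- sum(residuals(fit_line)^2)
sse_quad <- sum(residuals(fit_quad)^2)
c(line = sse_line, quadratic = sse_quad)
```

The straight line leaves a much larger SSE here because it cannot follow the curvature, which is one sign that simple linear regression is the wrong tool for a given relationship.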

References

  • Bevans, R. (2023, June 22). Simple Linear Regression | An Easy Introduction & Examples. Scribbr. Retrieved April 3, 2024, from http://www.scribbr.com/statistics/simple-linear-regression/

  • Samara, M., personal communication - lecture, March-April 2024

  • Dataset found on Kaggle by Vinco, V. 2021-2022 NBA Player Stats - Playoffs.csv.