My Statistics Topic: Simple linear regression

Simple linear regression models the relationship between:

  • a predictor X (what we know)
  • a response Y (what we want to predict)

Example: Hours spent jump-roping (X) -> Vertical jump height in cm (Y)

SLR mathematical model / equation:

\[Y_i = \beta_0 + \beta_1 X_i + \epsilon_i\]
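
Here \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(\epsilon_i\) is the random error for observation \(i\); in the usual formulation the errors are assumed independent and normally distributed,

\[\epsilon_i \sim N(0, \sigma^2)\]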

Example data (Randomly Generated)

 x (hours)    y (cm)
  2.655087  59.63320
  3.721239  63.36479
  5.728534  77.09591
  9.082078  89.66829
  2.016819  53.93474
  8.983897  81.69062
  9.446753  89.97450
  6.607978  81.04311
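
The code that generated these values isn't shown; for the R snippets later in the deck, a minimal sketch that simply enters them into the data frame df that the later lm() call expects:

# the example data, stored as the data frame `df` used by the later R code
df <- data.frame(
  x = c(2.655087, 3.721239, 5.728534, 9.082078,
        2.016819, 8.983897, 9.446753, 6.607978),
  y = c(59.63320, 63.36479, 77.09591, 89.66829,
        53.93474, 81.69062, 89.97450, 81.04311)
)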

Example scatterplot with regression line
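
The figure itself isn't reproduced in this text version. A minimal base-R sketch that would draw it, assuming the df data frame above (the original slides may well have used another plotting package):

# scatterplot of the data with the least squares line overlaid
plot(df$x, df$y,
     xlab = "Hours spent jump-roping", ylab = "Vertical jump (cm)")
abline(lm(y ~ x, data = df), col = "blue")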

Interpreting the slope from the previous two slides

\[\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\]

  • Intercept: 50.65
  • Slope: 4.01
  • Each extra hour of jump-roping increases the predicted vertical jump by about 4.01 centimeters.
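
For example, plugging in x = 5 hours of jump-roping (using the rounded estimates above):

\[\hat{y} = 50.65 + 4.01 \times 5 = 70.70 \text{ cm}\]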

Residual plot

\[e_i = y_i - \hat{y}_i\]
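
The plot itself isn't shown here; a minimal base-R sketch, again assuming df, that plots each residual against its fitted value:

# residuals e_i = y_i - y_hat_i, plotted against the fitted values
fit <- lm(y ~ x, data = df)
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)   # a reasonable model scatters randomly around this line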

The classic least squares idea

\[SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2\]
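
Minimizing SSE over the intercept and slope gives the usual closed-form least squares estimates:

\[\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\]

A short R sketch of the same calculation, assuming the df data frame from earlier; it should reproduce the coefficients reported by lm():

# slope and intercept computed directly from the least squares formulas
b1 <- sum((df$x - mean(df$x)) * (df$y - mean(df$y))) / sum((df$x - mean(df$x))^2)
b0 <- mean(df$y) - b1 * mean(df$x)
c(intercept = b0, slope = b1)   # should match coef(lm(y ~ x, data = df))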

Plotly 3D example
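
The interactive figure isn't included in this text version. One plausible sketch of a 3D plotly figure for this topic (an assumption, since the original plot isn't shown) is the SSE surface over a grid of candidate intercept and slope values, which visualizes the minimum that least squares finds. The grid ranges below are illustrative, and df is the data frame from earlier:

library(plotly)

# grid of candidate intercepts and slopes (illustrative ranges around the fit)
b0_grid <- seq(40, 60, length.out = 60)
b1_grid <- seq(0, 8, length.out = 60)

# SSE for every (intercept, slope) pair; rows follow the slope grid
sse <- outer(b1_grid, b0_grid,
             Vectorize(function(slope, intercept) {
               sum((df$y - (intercept + slope * df$x))^2)
             }))

# surface plot: x = intercept, y = slope, z = SSE
plot_ly(x = b0_grid, y = b1_grid, z = sse, type = "surface") %>%
  layout(scene = list(xaxis = list(title = "intercept"),
                      yaxis = list(title = "slope"),
                      zaxis = list(title = "SSE")))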

R code example

# fit the simple linear regression of y on x
fit <- lm(y ~ x, data = df)
# coefficient estimates, standard errors, t-tests, and R-squared
summary(fit)
# predicted vertical jump at 2, 5, and 9 hours of jump-roping
predict(fit, newdata = data.frame(x = c(2, 5, 9)))
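
With the rounded coefficients from earlier (intercept 50.65, slope 4.01), those three predictions work out to roughly 58.7, 70.7, and 86.7 cm.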

Conclusion!!

  • Linear regression models a straight-line relationship.
  • The slope is the change in the predicted response for a one-unit increase in the predictor (here, about 4.01 cm per extra hour of jump-roping).
  • Residuals help check whether a straight-line model is reasonable and give a visual sense of how well the model fits.