Simple Linear Regression Presentation

What is Linear Regression?

A means of interpreting the relationships between variables (how do changes in X affect Y?)
Allows for the prediction of numeric outcomes based on observable features
Foundation for many machine learning models

Core Mathematical Concept

Given a predictor, \(x\), and its response, \(y\), their presumed linear relationship is modeled as \[ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,\quad i=1,\dots,n, \] where the errors are independent and normally distributed with constant variance: \[ \varepsilon_i \sim \mathcal{N}(0,\sigma^2). \]
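For reference, the least-squares estimates of the two coefficients in this model have a well-known closed form. The hat and bar notation (\(\hat\beta_0\), \(\hat\beta_1\), \(\bar x\), \(\bar y\)) is introduced here for illustration and does not appear elsewhere in the slides.

```latex
\[
\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2},
\qquad
\hat\beta_0 = \bar y - \hat\beta_1 \bar x,
\]
where \(\bar x\) and \(\bar y\) are the sample means of the predictor and the response.
```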

Data Creation and Modeling

library(ggplot2)
library(dplyr)
library(tibble)
library(plotly)

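# Simulate 100 observations from y = 3 + 2x + noise (sd = 2).
# No seed is set in this chunk, so the draws differ between runs;
# call set.seed() first if reproducible output is needed.
x <- runif(100, 0, 10)
y <- 3 + 2 * x + rnorm(100, 0, 2)
data <- tibble(x, y)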

# Fit linear regression
model <- lm(y ~ x, data = data)
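As a sanity check on what lm() computes, the slope and intercept can be reproduced by hand: the slope is the sample covariance of x and y divided by the sample variance of x, and the intercept makes the line pass through the point of means. The sketch below is self-contained and regenerates data of the same shape; the seed value and the names fit, by_hand, and from_lm are assumptions chosen for this illustration.

```r
# Self-contained check: closed-form least-squares estimates vs. lm()
set.seed(42)  # arbitrary seed, assumed here only for reproducibility
x <- runif(100, 0, 10)
y <- 3 + 2 * x + rnorm(100, 0, 2)

# Closed-form estimates for simple linear regression
slope_hat <- cov(x, y) / var(x)
intercept_hat <- mean(y) - slope_hat * mean(x)

# Estimates from lm() for comparison
fit <- lm(y ~ x)
by_hand <- c(intercept_hat, slope_hat)
from_lm <- unname(coef(fit))

# The two rows should agree up to floating-point error
print(rbind(by_hand, from_lm))
```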

Plot Data with Fitted Line

ggplot(data, aes(x, y)) + geom_point() + geom_smooth(method = "lm", se = TRUE) +
  labs(title = "Data with Fitted Line", x = "X", y = "Y")

Residuals vs Fitted

residuals <- resid(model)
fitted_vals <- fitted(model)
resid_data <- tibble(fitted = fitted_vals, resid = residuals)
ggplot(resid_data, aes(fitted, resid)) + geom_point() + geom_hline(yintercept = 0, linetype = "dashed") +
  labs(title = "Residuals vs Fitted", x = "Fitted", y = "Residuals")
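When reading a residual plot, it helps to know which features are guaranteed by the fitting procedure. For a least-squares fit with an intercept, the residuals sum to zero and are uncorrelated with the fitted values by construction, so the plot is useful for spotting curvature or changing spread rather than a shifted mean. A minimal sketch, using freshly simulated data and an assumed seed:

```r
# Residual properties that hold by construction for least squares with an intercept
set.seed(42)  # arbitrary seed, assumed here only for reproducibility
x <- runif(100, 0, 10)
y <- 3 + 2 * x + rnorm(100, 0, 2)
fit <- lm(y ~ x)

res <- resid(fit)
fit_vals <- fitted(fit)

resid_sum <- sum(res)                # zero up to floating-point error
resid_fit_cor <- cor(res, fit_vals)  # zero up to floating-point error

print(c(sum = resid_sum, cor_with_fitted = resid_fit_cor))
```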

Coefficients and R-squared

summary(model)$coefficients

##             Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 2.716664 0.46246530  5.874308 5.835836e-08
## x           2.078672 0.07760014 26.786965 6.944611e-47

summary(model)$r.squared

## [1] 0.8798344
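The R-squared value reported by summary() is the share of the variance in y that the fitted line accounts for, that is, one minus the ratio of the residual sum of squares to the total sum of squares. In simple regression it also equals the squared correlation between x and y. The sketch below checks both identities on freshly simulated data; the seed and the variable names are assumptions for this illustration, so the number it produces will differ from the output shown above.

```r
# Recompute R-squared by hand and compare with summary()
set.seed(42)  # arbitrary seed, assumed here only for reproducibility
x <- runif(100, 0, 10)
y <- 3 + 2 * x + rnorm(100, 0, 2)
fit <- lm(y ~ x)

ss_res <- sum(resid(fit)^2)      # residual sum of squares
ss_tot <- sum((y - mean(y))^2)   # total sum of squares

r2_by_hand <- 1 - ss_res / ss_tot
r2_from_summary <- summary(fit)$r.squared
r2_from_cor <- cor(x, y)^2       # holds for a single predictor only

print(c(by_hand = r2_by_hand, summary = r2_from_summary, cor_squared = r2_from_cor))
```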

Predictions with Bands

# Make predictions for a sequence of x values
new_x <- tibble(x = seq(0, 10, length.out = 100))

# Get prediction intervals
pred_mat <- predict(model, newdata = new_x, interval = "prediction")
pred <- cbind(new_x, as.data.frame(pred_mat))

Predictions with Bands Cont.

ggplot() + geom_point(data = data, aes(x, y)) + geom_line(data = pred, aes(x, fit)) +
  geom_ribbon(data = pred, aes(x, ymin = lwr, ymax = upr), alpha = 0.2) +
  labs(title = "Predictions with 95% Bands", x = "X", y = "Y")
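The band plotted here is a prediction interval, which covers where a new observation is likely to fall. It is wider than a confidence interval for the mean response, because it also includes the noise of an individual observation. The sketch below compares the two widths at a few x values; the seed, the data frame dat, and the other names are assumptions for this illustration.

```r
# Compare confidence and prediction interval widths from predict()
set.seed(42)  # arbitrary seed, assumed here only for reproducibility
dat <- data.frame(x = runif(100, 0, 10))
dat$y <- 3 + 2 * dat$x + rnorm(100, 0, 2)
fit <- lm(y ~ x, data = dat)

new_x <- data.frame(x = seq(0, 10, length.out = 5))

# Both calls use the default 95% level
conf_int <- predict(fit, newdata = new_x, interval = "confidence")
pred_int <- predict(fit, newdata = new_x, interval = "prediction")

conf_width <- conf_int[, "upr"] - conf_int[, "lwr"]
pred_width <- pred_int[, "upr"] - pred_int[, "lwr"]

# Same fitted values, different interval widths
print(cbind(new_x, fit = conf_int[, "fit"], conf_width, pred_width))
```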

Multiple Linear Regression

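Linear models can also be fitted with multiple predictors. For example, with two predictors \(x_1\) and \(x_2\), the model becomes \[ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i,\quad i=1,\dots,n. \]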

set.seed(2)
x1 <- runif(80, 0, 10)
x2 <- runif(80, -5, 5)
y2 <- 1 + 1.2*x1 - 0.7*x2 + rnorm(80, 0, 2)
data2 <- tibble(x1, x2, y2)
model2 <- lm(y2 ~ x1 + x2, data = data2)

# Build grid
g1 <- seq(min(x1), max(x1), length.out = 25)
g2 <- seq(min(x2), max(x2), length.out = 25)
grid <- expand.grid(x1 = g1, x2 = g2)
grid$yhat <- coef(model2)[1] + coef(model2)[2]*grid$x1 + coef(model2)[3]*grid$x2
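The fitted values on the grid are built directly from the coefficients. The same numbers can be obtained from predict(), which avoids indexing coefficients by position. A small sketch comparing the two approaches on a coarser grid; the names dat2, fit2, and small_grid are assumptions for this illustration and are kept separate from the objects above.

```r
# Manual plane evaluation vs. predict() for a two-predictor model
set.seed(2)
x1 <- runif(80, 0, 10)
x2 <- runif(80, -5, 5)
y2 <- 1 + 1.2 * x1 - 0.7 * x2 + rnorm(80, 0, 2)
dat2 <- data.frame(x1, x2, y2)
fit2 <- lm(y2 ~ x1 + x2, data = dat2)

# Coarse 5 x 5 grid over the nominal predictor ranges
small_grid <- expand.grid(x1 = seq(0, 10, length.out = 5),
                          x2 = seq(-5, 5, length.out = 5))

# Evaluate the fitted plane from the coefficients
b <- coef(fit2)
yhat_manual <- unname(b[1] + b[2] * small_grid$x1 + b[3] * small_grid$x2)

# Let predict() do the same work
yhat_predict <- unname(predict(fit2, newdata = small_grid))

print(max(abs(yhat_manual - yhat_predict)))
```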

Multiple Linear Regression Cont.

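# plotly expects z rows to follow y and z columns to follow x, so transpose the grid matrix
z_mat <- t(matrix(grid$yhat, nrow = length(g1), ncol = length(g2)))
plot_ly() |> add_markers(data = data2, x = ~x1, y = ~x2, z = ~y2, opacity = 0.6) |>
  add_surface(x = ~g1, y = ~g2, z = ~z_mat, showscale = FALSE, opacity = 0.5) |>
  layout(title = "3D Regression Plane")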

Conclusion

Simple linear regression fits a straight line to predict y from x
Use lm(y ~ x) to fit the model, and summary() to inspect the coefficients and R-squared
Plotly can display a plane when working with two predictors