2025-09-17

Introduction

  • Regression quantifies how a response variable changes with one or more predictors.
  • Simple linear regression models a single response \(y\) as a linear function of a single predictor \(x\).
  • Example: Predicting exam scores from hours studied.

The Model

Mathematical form:

\[ y = \beta_0 + \beta_1 x + \epsilon \]

  • \(y\): response variable
  • \(x\): predictor variable
  • \(\beta_0\): intercept
  • \(\beta_1\): slope
  • \(\epsilon\): error term
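
To make the symbols concrete with illustrative numbers (not estimates from the data below): if \(\beta_0 = 2\) and \(\beta_1 = 3\), a student who studies \(x = 5\) hours is predicted to score \(2 + 3 \cdot 5 = 17\) on average, with \(\epsilon\) capturing how individual scores scatter around that mean.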

Parameter Estimation (OLS)

The coefficients are estimated by ordinary least squares (OLS), which chooses \(\hat{\beta}_0\) and \(\hat{\beta}_1\) to minimize the residual sum of squares \(\sum_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2\):

\[ \hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]
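
As a quick check, the same estimates can be computed directly from these formulas in R (df is the example dataset introduced in the next section):

xbar <- mean(df$x)
ybar <- mean(df$y)
b1_hat <- sum((df$x - xbar) * (df$y - ybar)) / sum((df$x - xbar)^2)  # slope
b0_hat <- ybar - b1_hat * xbar                                       # intercept
c(intercept = b0_hat, slope = b1_hat)  # should match coef(lm(y ~ x, data = df))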

Example Dataset

head(df, 10)
##     x         y
## 1   1  2.197622
## 2   2  6.849113
## 3   3 18.793542
## 4   4 14.352542
## 5   5 17.646439
## 6   6 28.575325
## 7   7 25.304581
## 8   8 19.674694
## 9   9 25.565736
## 10 10 29.771690
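
The chunk that generated df is not shown. A minimal sketch that produces data of this shape; the seed and the true values (intercept 2, slope 3, error sd 5) are assumptions chosen to be consistent with the fitted model below:

set.seed(1)                               # assumed; the original seed is unknown
df <- data.frame(x = 1:20)                # 20 observations -> 18 residual df, matching summary(model)
df$y <- 2 + 3 * df$x + rnorm(20, sd = 5)  # assumed true beta0 = 2, beta1 = 3, sigma = 5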

Scatter Plot (ggplot)
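
The figure itself is not embedded in this export. A typical ggplot2 call for the plot this slide showed (labels and aesthetics are assumptions):

library(ggplot2)
ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  labs(title = "y versus x", x = "x", y = "y")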

Regression Line (ggplot)
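
Also missing from the export; the fitted OLS line is usually overlaid with geom_smooth:

library(ggplot2)
ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE)  # OLS fit with a 95% confidence band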

3D Surface (plotly)
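
This figure is missing as well. One plausible reading of the slide is a surface of the residual sum of squares over candidate \((\beta_0, \beta_1)\) pairs, which the OLS estimates minimize; a plotly sketch under that assumption (the grid ranges are also assumptions):

library(plotly)
b0 <- seq(-5, 12, length.out = 60)  # candidate intercepts
b1 <- seq(1, 5, length.out = 60)    # candidate slopes
sse <- outer(b0, b1, Vectorize(function(a, b) sum((df$y - a - b * df$x)^2)))
plot_ly(x = b1, y = b0, z = sse) %>% add_surface()  # surface bottoms out near the OLS estimates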

R Code for Regression

# Fit the simple linear regression of y on x
model <- lm(y ~ x, data = df)

# Coefficient estimates, standard errors, t tests, and fit statistics
summary(model)
## 
## Call:
## lm(formula = y ~ x, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.9395 -3.0140 -0.1884  2.5971  8.6677 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.5505     2.3100   1.537    0.142    
## x             2.9198     0.1928  15.141  1.1e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.973 on 18 degrees of freedom
## Multiple R-squared:  0.9272, Adjusted R-squared:  0.9232 
## F-statistic: 229.3 on 1 and 18 DF,  p-value: 1.102e-11
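
Rather than reading values off the printed table, the pieces can be extracted programmatically:

coef(model)               # estimated intercept and slope
confint(model)            # 95% confidence intervals for both coefficients
summary(model)$r.squared  # multiple R-squared, 0.9272 above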

Residuals vs Fitted
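
The diagnostic plot is not included in this export; in base R it is produced with:

plot(model, which = 1)  # residuals vs fitted values

A patternless horizontal band around zero supports the linearity and constant-variance assumptions; curvature or a funnel shape suggests they are violated.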