2026-01-23


Overview

  • What is simple linear regression
  • Model formulation
  • Example with data
  • Visualization and interpretation

The Linear Regression Model

Simple linear regression models the relationship between a response variable \(y\)
and a predictor variable \(x\) as:

\[ y = \beta_0 + \beta_1 x + \varepsilon \]

where:

  • \(\beta_0\) is the intercept
  • \(\beta_1\) is the slope
  • \(\varepsilon\) is the random error term with mean 0 and variance \(\sigma^2\)

Example Data Generation

To illustrate simple linear regression, we generate simulated data using the model:

\[ y = 2 + 1.5x + \varepsilon,\quad \varepsilon \sim N(0, 2^2) \]

Because we choose the true parameters (\(\beta_0 = 2\), \(\beta_1 = 1.5\), \(\sigma = 2\)), we can check how closely the fitted estimates recover them, while the error term supplies realistic random variation.
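The data-generation code is not shown in the slides, so here is a minimal sketch of the simulation. It assumes \(n = 50\) (inferred from the 48 residual degrees of freedom in the model summary), a hypothetical seed of 123, and \(x\) drawn uniformly on \([0, 10]\). The seed and the range of \(x\) are guesses, so the numbers will not match the summary output exactly.

```r
# Sketch: simulate y = 2 + 1.5 x + e, e ~ N(0, 2^2).
# Assumptions (not from the original): seed 123, x ~ Uniform(0, 10).
set.seed(123)                       # hypothetical seed
n <- 50                             # 48 residual df + 2 coefficients
x <- runif(n, min = 0, max = 10)    # assumed predictor distribution
eps <- rnorm(n, mean = 0, sd = 2)   # error term with sigma = 2
y <- 2 + 1.5 * x + eps              # true intercept 2, true slope 1.5
data <- data.frame(x = x, y = y)
head(data)
```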

Fitted Regression Line

We fit a simple linear regression model and overlay the fitted line on the data.
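The slide's figure appears to use ggplot2's `geom_smooth()`, which with `method = "lm"` draws the least-squares line. As a dependency-free sketch, the same overlay can be drawn with base graphics. The data are re-simulated under the same assumed seed and \(x\) range as before, and the dashed true line is an addition for comparison.

```r
# Sketch with base graphics (the original figure likely used ggplot2).
# Assumptions: seed 123, x ~ Uniform(0, 10), n = 50.
set.seed(123)
n <- 50
x <- runif(n, 0, 10)
y <- 2 + 1.5 * x + rnorm(n, sd = 2)
data <- data.frame(x = x, y = y)

fit <- lm(y ~ x, data = data)          # ordinary least squares fit
plot(y ~ x, data = data, pch = 19,
     main = "Simulated data with fitted regression line")
abline(fit, lwd = 2)                   # overlay the fitted line
abline(a = 2, b = 1.5, lty = 2)        # true line, for comparison
```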


Residuals vs Fitted Values

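To assess the model assumptions, we plot the residuals against the fitted values. A patternless scatter around zero with roughly constant spread is consistent with a linear mean and constant error variance; curvature suggests a missing nonlinear term, and a funnel shape suggests non-constant variance.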
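A minimal sketch of this diagnostic plot, using the same assumed simulation settings (seed 123, \(x\) uniform on \([0, 10]\)). Note that with an intercept in the model, the least-squares residuals average to zero and are uncorrelated with the fitted values by construction, so those two numbers cannot detect a poor fit. The visual pattern is what matters.

```r
# Sketch: residuals vs fitted values. Assumptions: seed 123, x ~ Uniform(0, 10).
set.seed(123)
n <- 50
x <- runif(n, 0, 10)
y <- 2 + 1.5 * x + rnorm(n, sd = 2)
fit <- lm(y ~ x)

fitted_vals <- fitted(fit)   # y-hat for each observation
res <- resid(fit)            # y - y-hat

plot(fitted_vals, res, pch = 19,
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)       # reference line at zero

# Both are zero up to rounding for any OLS fit with an intercept
mean(res)
cor(fitted_vals, res)
```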

Estimation of Coefficients

The estimates of \(\beta_0\) and \(\beta_1\) are obtained by minimizing the sum of squared residuals:

\[ \text{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]
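To make the minimization concrete, the sketch below writes RSS as a function of a candidate \((\beta_0, \beta_1)\) and checks that moving away from the `lm()` estimates increases it, and that a general-purpose optimizer (`optim()`, used here only for illustration) finds nothing lower. Simulation settings are the same assumptions as before.

```r
# Sketch: RSS as a function of candidate coefficients.
# Assumptions: seed 123, x ~ Uniform(0, 10), n = 50.
set.seed(123)
n <- 50
x <- runif(n, 0, 10)
y <- 2 + 1.5 * x + rnorm(n, sd = 2)
fit <- lm(y ~ x)

# Residual sum of squares for b = c(b0, b1)
rss <- function(b) sum((y - (b[1] + b[2] * x))^2)

b_hat <- unname(coef(fit))
rss(b_hat)                      # equals sum(resid(fit)^2)
rss(b_hat + c(0.1, 0))          # shifting the intercept increases RSS
rss(b_hat + c(0, 0.05))         # so does tilting the slope

# A numerical minimizer should not beat the least-squares solution
opt <- optim(c(0, 0), rss, method = "BFGS")
opt$par
```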

The estimated regression line is:

\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \]
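For simple linear regression the minimizer has a closed form: \(\hat{\beta}_1 = \sum_{i}(x_i - \bar{x})(y_i - \bar{y}) / \sum_{i}(x_i - \bar{x})^2\) and \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\). A short sketch computing these directly and comparing them with `lm()`, again on data simulated under the assumed settings:

```r
# Sketch: closed-form least-squares estimates.
# Assumptions: seed 123, x ~ Uniform(0, 10), n = 50.
set.seed(123)
n <- 50
x <- runif(n, 0, 10)
y <- 2 + 1.5 * x + rnorm(n, sd = 2)

b1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0_hat <- mean(y) - b1_hat * mean(x)

c(b0_hat, b1_hat)
coef(lm(y ~ x))          # should match up to floating-point rounding

# Prediction from the fitted line at x = 5
b0_hat + b1_hat * 5
```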

R Code for Linear Regression

The following R code fits a simple linear regression model to the data:

# Fit y on x by ordinary least squares; `data` holds the simulated x and y
model <- lm(y ~ x, data = data)
summary(model)
## 
## Call:
## lm(formula = y ~ x, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.0224 -1.2445 -0.1639  1.3318  4.3193 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.13532    0.52122   4.097  0.00016 ***
## x            1.48670    0.08982  16.552  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.871 on 48 degrees of freedom
## Multiple R-squared:  0.8509, Adjusted R-squared:  0.8478 
## F-statistic:   274 on 1 and 48 DF,  p-value: < 2.2e-16
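The estimated intercept (2.135) and slope (1.487) lie close to the true values 2 and 1.5 used in the simulation. To show where the other summary quantities come from, the sketch below recomputes them by hand: each t value is the estimate divided by its standard error, \(R^2 = 1 - \text{RSS}/\text{TSS}\), the residual standard error is \(\sqrt{\text{RSS}/(n-2)}\), and with a single predictor the F-statistic equals the squared slope t value (here \(16.552^2 \approx 274\)). The data are re-simulated under the same assumed seed and \(x\) range, so the printed values will differ somewhat from the output above.

```r
# Sketch: recomputing parts of summary(model) by hand.
# Assumptions: seed 123, x ~ Uniform(0, 10), n = 50.
set.seed(123)
n <- 50
x <- runif(n, 0, 10)
y <- 2 + 1.5 * x + rnorm(n, sd = 2)
data <- data.frame(x = x, y = y)
model <- lm(y ~ x, data = data)
s <- summary(model)

coef_table <- coef(s)  # Estimate, Std. Error, t value, Pr(>|t|)
t_manual <- coef_table[, "Estimate"] / coef_table[, "Std. Error"]

rss <- sum(resid(model)^2)
tss <- sum((data$y - mean(data$y))^2)
r2_manual <- 1 - rss / tss                        # Multiple R-squared
sigma_manual <- sqrt(rss / df.residual(model))    # Residual standard error
f_manual <- t_manual["x"]^2                       # F = t^2 for one predictor

confint(model, level = 0.95)  # 95% intervals for beta_0 and beta_1
```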

3D Visualization with Plotly

A three-dimensional view of the fitted values and residuals.