2026-02-09

Introduction

Simple Linear Regression models the relationship between:

  • One predictor variable (X)
  • One response variable (Y)

It describes how the expected value of Y changes as X changes.

Mathematical Model

The regression equation is:

\[ Y = \beta_0 + \beta_1 X + \varepsilon \]

Where:

  • \(\beta_0\) = intercept
  • \(\beta_1\) = slope
  • \(\varepsilon\) = random error, assumed to have mean zero

Least Squares Estimation

The slope estimator is:

\[ \hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})} {\sum (x_i - \bar{x})^2} \]

The intercept estimator is:

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]
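The two estimators can be computed directly from their formulas. The sketch below uses a small made-up dataset (the five `x` and `y` values are illustrative assumptions, not data from these slides) and cross-checks the result against `lm()`.

```r
# Illustrative data (assumed for this sketch, not taken from the slides)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

x_bar <- mean(x)
y_bar <- mean(y)

# Slope: sum of cross-deviations divided by sum of squared x-deviations
beta1_hat <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)

# Intercept: makes the fitted line pass through (x_bar, y_bar)
beta0_hat <- y_bar - beta1_hat * x_bar

# Cross-check against R's built-in least squares fit
fit <- lm(y ~ x)
agrees <- isTRUE(all.equal(unname(coef(fit)), c(beta0_hat, beta1_hat)))

print(c(intercept = beta0_hat, slope = beta1_hat))
print(agrees)
```

For this toy data the hand-computed slope and intercept should agree with `coef(fit)` up to floating-point tolerance.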

Setup

## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout

Example Dataset

We simulate data from a linear model. The first six rows are:

##             x         y
## 1 -0.56047565  2.608166
## 2 -0.23017749  4.566351
## 3  1.55870831  9.429433
## 4  0.07050839  4.863983
## 5  0.12928774  4.436245
## 6  1.71506499 10.100167
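The code that generated this data is not shown, so the following is only a sketch of how such a dataset might be simulated. The sample size of 100 follows from the 98 residual degrees of freedom reported later; the seed, the true intercept (5), the true slope (3), and the error standard deviation (1) are assumptions that merely look compatible with the printed values.

```r
set.seed(123)                  # assumed seed, not confirmed by the slides
n <- 100                       # 98 residual df + 2 estimated coefficients

x <- rnorm(n)                  # predictor drawn from a standard normal
epsilon <- rnorm(n, sd = 1)    # random error (assumed sd of 1)
y <- 5 + 3 * x + epsilon       # assumed true intercept 5 and slope 3

df <- data.frame(x = x, y = y)
print(head(df))                # first six rows
```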

Scatter Plot (ggplot)
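
The plotting code is not shown in the slides. A minimal sketch of a ggplot2 scatter plot follows; the simulated stand-in data (seed and coefficients), the point transparency, and the axis labels are all assumptions.

```r
library(ggplot2)

# Stand-in data; the seed and coefficients are assumptions
set.seed(123)
df <- data.frame(x = rnorm(100))
df$y <- 5 + 3 * df$x + rnorm(100)

# One point per observation, X on the horizontal axis and Y on the vertical
p_scatter <- ggplot(df, aes(x = x, y = y)) +
  geom_point(alpha = 0.7) +
  labs(title = "Scatter plot of Y against X", x = "X", y = "Y")

print(p_scatter)
```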

Regression Line (ggplot)

## `geom_smooth()` using formula = 'y ~ x'
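
The message above is what `geom_smooth()` prints when no formula is supplied. A sketch of a plot that overlays the least squares line is given below; the stand-in data and styling are assumptions. Passing `formula = y ~ x` explicitly suppresses the message.

```r
library(ggplot2)

# Stand-in data; the seed and coefficients are assumptions
set.seed(123)
df <- data.frame(x = rnorm(100))
df$y <- 5 + 3 * df$x + rnorm(100)

# method = "lm" draws the least squares line; se = TRUE adds a confidence band
p_line <- ggplot(df, aes(x = x, y = y)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(title = "Fitted regression line", x = "X", y = "Y")

print(p_line)
```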

Fitting the Model in R

## 
## Call:
## lm(formula = y ~ x, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9073 -0.6835 -0.0875  0.5806  3.2904 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.89720    0.09755   50.20   <2e-16 ***
## x            2.94753    0.10688   27.58   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9707 on 98 degrees of freedom
## Multiple R-squared:  0.8859, Adjusted R-squared:  0.8847 
## F-statistic: 760.6 on 1 and 98 DF,  p-value: < 2.2e-16

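The output provides:

  • Estimated intercept: 4.897
  • Estimated slope: 2.948
  • R-squared: 0.886, so the line explains about 89% of the variance in Y
  • p-values: below 2e-16 for both coefficients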
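
The call `lm(formula = y ~ x, data = df)` appears in the output, and the quantities listed can be pulled out of the fitted object programmatically. The sketch below assumes stand-in simulated data (the seed and coefficients are assumptions), so its numbers need not match the slide exactly.

```r
# Stand-in data; the seed and coefficients are assumptions
set.seed(123)
df <- data.frame(x = rnorm(100))
df$y <- 5 + 3 * df$x + rnorm(100)

fit <- lm(y ~ x, data = df)   # same call as shown in the output
s <- summary(fit)

coef_table <- s$coefficients  # Estimate, Std. Error, t value, Pr(>|t|)
r_squared  <- s$r.squared     # proportion of variance in y explained
ci_95      <- confint(fit, level = 0.95)  # 95% confidence intervals

print(coef_table)
print(r_squared)
print(ci_95)
```

The confidence intervals are an addition to what the slide lists; they convey the uncertainty in the estimates more directly than p-values alone.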

Hypothesis Testing

To test whether Y is linearly related to X, we test the slope:

\[ H_0: \beta_1 = 0 \]

\[ H_a: \beta_1 \ne 0 \]

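If the p-value is below the chosen significance level (commonly 0.05), we reject \(H_0\). In the fitted model, the p-value for the slope is below 2e-16, so \(H_0\) is rejected.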
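
The test can be reproduced by hand from the printed coefficient table. The sketch below copies the slope estimate, its standard error, and the residual degrees of freedom from the `summary()` output; the variable names and the 0.05 threshold are illustrative choices.

```r
# Values copied from the summary output
beta1_hat <- 2.94753   # slope estimate
se_beta1  <- 0.10688   # standard error of the slope
df_resid  <- 98        # residual degrees of freedom

# t statistic under H0: beta_1 = 0
t_stat <- beta1_hat / se_beta1

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

alpha <- 0.05
reject_h0 <- p_value < alpha

print(round(t_stat, 2))
print(reject_h0)
```

The t statistic agrees with the `t value` column for `x`. In simple linear regression the F-statistic equals the square of this t value, which matches the reported 760.6 up to rounding of the printed inputs.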

3D Interactive Plot (plotly)

We plot X, Y, and the fitted values \(\hat{Y}\) on three axes.
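
The interactive figure itself is not reproduced here. One way such a plot might be built with plotly is sketched below; the stand-in data, the column name `y_hat`, and the marker settings are assumptions.

```r
library(plotly)

# Stand-in data; the seed and coefficients are assumptions
set.seed(123)
df <- data.frame(x = rnorm(100))
df$y <- 5 + 3 * df$x + rnorm(100)

# Fitted values from the simple linear regression
fit <- lm(y ~ x, data = df)
df$y_hat <- fitted(fit)

# 3D scatter: X, observed Y, and predicted Y on the three axes
fig <- plot_ly(
  data = df,
  x = ~x, y = ~y, z = ~y_hat,
  type = "scatter3d", mode = "markers",
  marker = list(size = 3)
)
fig <- layout(fig, scene = list(
  xaxis = list(title = "X"),
  yaxis = list(title = "Y"),
  zaxis = list(title = "Predicted Y")
))

fig
```

Because the predicted value depends only on X, the points fall on a surface that is flat along the Y axis; the vertical spread in Y around that surface reflects the residuals.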

Interpretation

  • The scatter plot shows a positive linear relationship.
  • The regression line confirms the upward trend.
  • The slope estimate (2.948) is close to the true value used in the simulation.
  • The small p-value for the slope (< 2e-16) indicates a statistically significant linear relationship.

Conclusion

Simple Linear Regression:

  • Models linear relationships
  • Uses least squares estimation
  • Allows hypothesis testing
  • Is widely used in data science, engineering, and economics

It is one of the most fundamental tools in statistics.