Introduction

Linear regression predicts a dependent variable from one or more independent variables.

It fits a line of best fit that describes the relationship between the dependent and independent variables.

The simple linear regression model is:

\(Y = \beta_0 + \beta_1 X + \epsilon\)

Where:

  • \(Y\): dependent (response) variable
  • \(X\): independent (predictor) variable
  • \(\beta_0\): intercept
  • \(\beta_1\): slope
  • \(\epsilon\): random error

Assumptions

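  • Linearity: the mean of Y is a linear function of X
  • Independence: the errors are independent of one another
  • Homoscedasticity: the errors have constant variance at every value of X
  • Normality of errors: the errors are normally distributed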

When these assumptions hold, the standard errors, confidence intervals, and hypothesis tests for the coefficients are valid.
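
The assumptions are usually checked on the residuals of a fitted model. The following is a minimal sketch, assuming simulated data of the same form as this document's example dataset; the object names `fit`, `res`, and `sw` are chosen here for illustration. Independence is mostly a matter of how the data were collected, so it is not checked in this sketch.

```r
# Simulated data of the same form as the example dataset (repeated here
# so that this block runs on its own)
set.seed(123)
x <- 1:50
y <- 3 + 2 * x + rnorm(50, 0, 10)

fit <- lm(y ~ x)
res <- resid(fit)

# Linearity and homoscedasticity: the residuals should scatter evenly
# around zero, with no curve and no funnel shape
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normality: the points should fall close to the reference line
qqnorm(res)
qqline(res)

# Shapiro-Wilk test of normality; a large p-value gives no evidence
# against normally distributed errors
sw <- shapiro.test(res)
sw$p.value
```

Plots like these are a visual aid rather than a formal proof, so borderline patterns call for judgment.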

Estimation (Least Squares)

We estimate the coefficients by minimizing the sum of squared residuals:

\(\sum (Y_i - \hat{Y}_i)^2\)

The estimated line is:

\(\hat{Y} = b_0 + b_1 X\)

Where:

\(b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}\)

\(b_0 = \bar{Y} - b_1 \bar{X}\)
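
The slope formula can be checked by hand against R's built-in `lm()`. This is a small sketch on hypothetical toy data (the five values below are made up for illustration and are not from this document). The intercept is computed as the mean of y minus the slope times the mean of x, which follows from the least-squares line passing through the point of means.

```r
# Hypothetical toy data, invented for this illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Slope: sum of cross-deviations divided by sum of squared x-deviations
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: the fitted line passes through (mean(x), mean(y))
b0 <- mean(y) - b1 * mean(x)

# Compare with the coefficients that lm() reports
fit <- lm(y ~ x)
manual <- c(b0, b1)
builtin <- unname(coef(fit))
all.equal(manual, builtin)
```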

Example Dataset

# Sample data
set.seed(123)                       # fix the seed so the noise is reproducible
x <- 1:50
y <- 3 + 2*x + rnorm(50, 0, 10)     # true intercept 3, true slope 2, noise with SD 10

data <- data.frame(x, y)
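
Since the data are simulated with a known intercept of 3 and slope of 2, fitting the model shows how close least squares gets to those values. A minimal sketch using base R's `lm()`; the names `fit` and `est` are chosen here for illustration, and the data-generating lines are repeated so the block runs on its own.

```r
# Same sample data as above
set.seed(123)
x <- 1:50
y <- 3 + 2 * x + rnorm(50, 0, 10)
data <- data.frame(x, y)

# Fit the simple linear regression of y on x
fit <- lm(y ~ x, data = data)

# Estimated intercept and slope, as a named vector
est <- coef(fit)
est

# Standard errors, t-tests, and R-squared
summary(fit)

# 95% confidence intervals for both coefficients
confint(fit)
```

The estimates will not equal 3 and 2 exactly, because the added noise has a standard deviation of 10.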

Scatter Plot for Sample Data Set
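
A scatter plot of the sample data could be drawn along these lines. This sketch assumes the ggplot2 package is installed; the original plotting code is not shown in this document, so the axis labels and the object name `p_scatter` are illustrative choices.

```r
library(ggplot2)

# Same sample data as above
set.seed(123)
x <- 1:50
y <- 3 + 2 * x + rnorm(50, 0, 10)
data <- data.frame(x, y)

# One point per observation
p_scatter <- ggplot(data, aes(x = x, y = y)) +
  geom_point() +
  labs(title = "Scatter Plot for Sample Data Set", x = "x", y = "y")

print(p_scatter)
```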

Regression Line
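
The fitted line can be overlaid on the scatter plot with `geom_smooth()`. This is a sketch assuming ggplot2 and `method = "lm"`; the original plotting code is not shown, so these settings and the name `p_line` are assumptions. Leaving the `formula` argument unset makes ggplot2 print a message naming the default formula `y ~ x` when the plot is drawn.

```r
library(ggplot2)

# Same sample data as above
set.seed(123)
x <- 1:50
y <- 3 + 2 * x + rnorm(50, 0, 10)
data <- data.frame(x, y)

# Points plus the least-squares line; the shaded band is the default
# 95% confidence interval around the fitted line
p_line <- ggplot(data, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Regression Line", x = "x", y = "y")

print(p_line)
```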

## `geom_smooth()` using formula = 'y ~ x'

Interactive Plot
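
One way to make the regression plot interactive is to convert a ggplot object with `ggplotly()`. This sketch assumes the plotly package is installed; the document does not say which library produced the original interactive plot, so the use of plotly here is an assumption, as is the name `p_int`.

```r
library(ggplot2)
library(plotly)

# Same sample data as above
set.seed(123)
x <- 1:50
y <- 3 + 2 * x + rnorm(50, 0, 10)
data <- data.frame(x, y)

# Static plot: points plus the least-squares line
p <- ggplot(data, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Convert to an interactive widget with hover tooltips, zoom, and pan
p_int <- ggplotly(p)
p_int
```

In an R Markdown document rendered to HTML, printing `p_int` embeds the widget in the page.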