2025-10-26

Introduction

Linear regression is one of the most fundamental statistical techniques for:

  • Modeling the relationship between variables
  • Making predictions based on data
  • Understanding how one variable affects another

Real-world applications:

  • Predicting house prices based on size
  • Forecasting sales based on advertising spend
  • Estimating student performance based on study hours

What is Simple Linear Regression?

Simple linear regression models the relationship between:

  • One independent variable (predictor, X)
  • One dependent variable (response, Y)

The goal: Find the best-fitting straight line through the data points.

\[Y = \beta_0 + \beta_1 X + \epsilon\]

Where:

  • \(\beta_0\) is the intercept
  • \(\beta_1\) is the slope
  • \(\epsilon\) is the error term
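To make the roles of the three terms concrete, here is a minimal sketch that simulates data from this model and then fits a line to it. The intercept (2), slope (0.5), noise standard deviation, sample size, and seed are all assumptions chosen for illustration; none of them come from the presentation.

```r
# Simulate data from Y = beta0 + beta1 * X + epsilon.
# Every numeric value below is an illustrative assumption.
set.seed(42)                              # arbitrary seed, for reproducibility only
n     <- 100                              # assumed sample size
beta0 <- 2                                # assumed true intercept
beta1 <- 0.5                              # assumed true slope

x       <- runif(n, min = 0, max = 10)    # predictor values
epsilon <- rnorm(n, mean = 0, sd = 1)     # random error term
y       <- beta0 + beta1 * x + epsilon    # response

# The least squares estimates should land near the assumed true values
fit <- lm(y ~ x)
coef(fit)
```

Because the error term is random, the estimated coefficients will be close to, but not exactly equal to, the assumed true values.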

The Linear Regression Equation

Once the parameters have been estimated from data, the fitted regression line is expressed as:

\[\hat{Y} = b_0 + b_1 X\]

Where:

  • \(\hat{Y}\) = predicted value of Y
  • \(b_0\) = estimated intercept (Y-intercept)
  • \(b_1\) = estimated slope (change in Y per unit change in X)
  • \(X\) = value of the independent variable
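As a rough illustration, the equation can be applied by hand and compared with R's `predict()`. The sketch reuses the sample `x` and `y` values from this presentation's R code example; the names `b0`, `b1`, and `x_new` are introduced here for illustration only.

```r
# Sample data from this presentation's R code example
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(2.1, 4.3, 5.8, 8.2, 10.1, 12.5, 14.2, 16.8, 18.5, 20.9)

model <- lm(y ~ x)
b0 <- coef(model)[["(Intercept)"]]   # estimated intercept
b1 <- coef(model)[["x"]]             # estimated slope

# Apply Y-hat = b0 + b1 * X directly for two new X values
x_new <- c(11, 12)
y_hat_manual <- b0 + b1 * x_new

# predict() evaluates the same equation
y_hat_predict <- predict(model, data.frame(x = x_new))
all.equal(y_hat_manual, unname(y_hat_predict))
```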

Estimating the Parameters

The least squares method chooses \(b_0\) and \(b_1\) to minimize the sum of squared residuals (SSE):

\[\text{SSE} = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2\]

Formulas for slope and intercept:

\[b_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}\]

\[b_0 = \bar{Y} - b_1\bar{X}\]
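These formulas can be evaluated directly. The sketch below computes the slope, intercept, and SSE by hand for the sample `x` and `y` values from this presentation's R code example, then cross-checks them against `lm()`. Variable names such as `x_bar` and `sse` are illustrative.

```r
# Sample data from this presentation's R code example
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(2.1, 4.3, 5.8, 8.2, 10.1, 12.5, 14.2, 16.8, 18.5, 20.9)

x_bar <- mean(x)
y_bar <- mean(y)

# Slope: sum of cross-deviations divided by sum of squared X deviations
b1 <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)

# Intercept: forces the line through the point (x_bar, y_bar)
b0 <- y_bar - b1 * x_bar

# Sum of squared residuals at the least squares solution
y_hat <- b0 + b1 * x
sse   <- sum((y - y_hat)^2)

# Cross-check against R's built-in fit
fit <- lm(y ~ x)
all.equal(c(b0, b1), unname(coef(fit)))
all.equal(sse, sum(resid(fit)^2))
```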

R Code Example: Creating a Linear Model

# Create sample data
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(2.1, 4.3, 5.8, 8.2, 10.1, 12.5, 14.2, 16.8, 18.5, 20.9)

# Fit the linear regression model
model <- lm(y ~ x)

# View the results
summary(model)

# Make predictions
new_data <- data.frame(x = c(11, 12))
predictions <- predict(model, new_data)

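This code fits a simple linear regression with lm(), prints the coefficient estimates and fit statistics with summary(), and uses predict() to estimate y at x = 11 and x = 12. Both values lie outside the observed range of x, so these predictions are extrapolations.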

Practical Example: Study Hours vs. Test Scores

Let’s analyze the relationship between study hours and test scores:

Dataset: 10 students, their study hours and corresponding test scores.

Visualization: Scatter Plot with Regression Line

The blue line is the fitted regression line; the shaded band around it is the confidence interval.
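A plot of this kind could be produced roughly as follows. This is a base-graphics sketch, not the code behind the original figure. The study-hours dataset is not listed, so the sample `x` and `y` values from the earlier R code example stand in for it, and the 95% confidence level is an assumption.

```r
# Stand-in data: the sample values from this presentation's R code example
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(2.1, 4.3, 5.8, 8.2, 10.1, 12.5, 14.2, 16.8, 18.5, 20.9)
model <- lm(y ~ x)

# Evaluate the fitted line and its confidence interval on a fine grid
grid <- data.frame(x = seq(min(x), max(x), length.out = 100))
band <- predict(model, grid, interval = "confidence", level = 0.95)  # assumed level

plot(x, y, pch = 19, xlab = "x", ylab = "y",
     main = "Scatter plot with fitted regression line")

# Shade the region between the lower and upper confidence limits
polygon(c(grid$x, rev(grid$x)),
        c(band[, "lwr"], rev(band[, "upr"])),
        col = adjustcolor("blue", alpha.f = 0.2), border = NA)

lines(grid$x, band[, "fit"], col = "blue", lwd = 2)  # fitted line
points(x, y, pch = 19)                               # redraw points on top of the band
```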

Visualization: Residuals Plot

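Each residual is the difference between an observed value and its predicted value, \(e_i = Y_i - \hat{Y}_i\). A patternless scatter around zero supports the linearity assumption.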
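A residuals plot can be sketched in base R as shown here. As before, the sample `x` and `y` values from the earlier R code example stand in for the study-hours dataset, which is not listed.

```r
# Stand-in data: the sample values from this presentation's R code example
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(2.1, 4.3, 5.8, 8.2, 10.1, 12.5, 14.2, 16.8, 18.5, 20.9)
model <- lm(y ~ x)

res <- resid(model)   # observed minus predicted values

plot(fitted(model), res, pch = 19,
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. fitted values")
abline(h = 0, lty = 2)   # reference line at zero

# With an intercept in the model, least squares residuals sum to
# (numerically) zero
sum(res)
```

Curvature or a funnel shape in this plot would suggest that a straight line or the constant-variance assumption is not appropriate.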

Interactive 3D Visualization with Plotly

[Interactive 3D Plotly plot; supports rotation and zoom.]

Key Takeaways

Simple Linear Regression allows us to:

  1. Model linear relationships between two variables
  2. Make predictions for new observations
  3. Quantify the strength and direction of relationships

Important considerations:

  • Assumes a linear relationship exists
  • Sensitive to outliers
  • Check residuals to validate model assumptions
  • R² measures the proportion of the variance in Y that the model explains (0 = none, 1 = all)
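As a small illustration of the last point, R² can be computed by hand as 1 − SSE/SST and compared with the value reported by `summary()`. The sketch reuses the sample `x` and `y` values from the earlier R code example; `sst` (the total sum of squares) is a name introduced here.

```r
# Sample data from this presentation's R code example
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(2.1, 4.3, 5.8, 8.2, 10.1, 12.5, 14.2, 16.8, 18.5, 20.9)
model <- lm(y ~ x)

sse <- sum(resid(model)^2)        # variation left unexplained by the model
sst <- sum((y - mean(y))^2)       # total variation in y
r_squared <- 1 - sse / sst

# Matches the value reported by summary()
all.equal(r_squared, summary(model)$r.squared)

# In simple linear regression, R^2 is also the squared Pearson correlation
all.equal(r_squared, cor(x, y)^2)
```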

Applications are everywhere: economics, medicine, engineering, social sciences, and more!