2026-03-08

Introduction

Simple linear regression is a statistical method used to model the relationship between two quantitative variables.

One variable is the independent variable (\(x\)), which is used to explain or predict changes in the other, the dependent variable (\(y\)).

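For example, we may examine how the number of hours a student studies relates to their exam score. Linear regression estimates this relationship with the straight line that best fits the data, in the sense of minimizing the squared vertical distances between the points and the line.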

Regression Model

Simple linear regression models the relationship between two variables using a linear equation.

The regression model is:

\[ y = \beta_0 + \beta_1 x + \epsilon \]

Where:

  • \(y\) = dependent variable
  • \(x\) = independent variable
  • \(\beta_0\) = intercept
  • \(\beta_1\) = slope of the regression line
  • \(\epsilon\) = random error (the variation in \(y\) not explained by the line)
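
The coefficients \(\beta_0\) and \(\beta_1\) are unknown and have to be estimated from data. As a sketch, assuming ordinary least squares (the criterion R's lm() function uses), the estimates are the values that minimize the sum of squared residuals. The hat notation, the index \(i\), the sample size \(n\), and the sample means \(\bar{x}\) and \(\bar{y}\) are introduced here for illustration and are not part of the model statement above:

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]

The second formula implies that the fitted line always passes through the point \((\bar{x}, \bar{y})\).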

Interpreting the Slope

The slope of the regression line is the change in the predicted value of the dependent variable when the independent variable increases by one unit.

For example, if the slope is 4, each additional hour of studying is associated with a 4-point increase in the predicted exam score.

A positive slope indicates a positive relationship between the variables; a negative slope indicates that one variable tends to decrease as the other increases.
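
The one-unit interpretation can be checked with a little arithmetic. The sketch below pairs the slope of 4 from the example with a hypothetical intercept of 40; both numbers are illustrative and are not estimates from any data.

```r
# Hypothetical coefficients for illustration only (not fitted values)
b0 <- 40  # assumed intercept
b1 <- 4   # slope from the example in the text

# Predicted score on the line y = b0 + b1 * x
predict_score <- function(hours) b0 + b1 * hours

# One extra hour of study changes the prediction by exactly the slope
diff_one_hour <- predict_score(6) - predict_score(5)
print(diff_one_hour)
```

Whatever intercept is assumed, the difference between two predictions one hour apart equals the slope.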

Example Data

In this example, we examine the relationship between study hours and exam scores.

##    hours scores
## 1      2     52
## 2      3     55
## 3      4     60
## 4      5     64
## 5      6     68
## 6      7     72
## 7      8     78
## 8      9     81
## 9     10     85
## 10    11     90
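
The printed table can be reproduced with a small data frame. This is a minimal sketch: the object name data matches the one used in the model-fitting code in this document, but the way the table was originally constructed is an assumption.

```r
# Rebuild the example data shown in the table above
data <- data.frame(
  hours  = 2:11,
  scores = c(52, 55, 60, 64, 68, 72, 78, 81, 85, 90)
)

print(data)  # prints the 10 rows with columns hours and scores
```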

Scatter Plot of the Data
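
A scatter plot of these data could be drawn with base R graphics, as in the sketch below. The axis labels, title, and point style are illustrative choices, not taken from the original figure.

```r
# Example data from the table above
data <- data.frame(
  hours  = 2:11,
  scores = c(52, 55, 60, 64, 68, 72, 78, 81, 85, 90)
)

# One point per student: study hours on the x-axis, exam score on the y-axis
plot(
  data$hours, data$scores,
  xlab = "Study hours",
  ylab = "Exam score",
  main = "Exam scores by study hours",
  pch  = 19  # solid circles
)
```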

Regression Line

The regression line shows the best linear fit between study hours and exam scores.
It summarizes the overall trend in the data and helps predict exam scores based on study time.
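
One way to overlay the fitted line on the scatter plot is sketched below, assuming base R graphics and an lm() fit of scores on hours. The color and line width are illustrative.

```r
# Example data from the table above
data <- data.frame(
  hours  = 2:11,
  scores = c(52, 55, 60, 64, 68, 72, 78, 81, 85, 90)
)

# Fit the least-squares line
model <- lm(scores ~ hours, data = data)

# Scatter plot of the observations
plot(
  data$hours, data$scores,
  xlab = "Study hours", ylab = "Exam score", pch = 19
)

# abline() accepts a fitted lm object and draws its intercept and slope
abline(model, col = "blue", lwd = 2)
```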

Interactive Visualization

An interactive plot lets users hover over individual data points to see each observation's study hours and exam score, which makes single observations easier to inspect than on a static plot.
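
The text does not say which library produces the interactive plot. The sketch below assumes the plotly package, one common choice in R, and draws the plot only if that package is installed. The hover_text column is a hypothetical helper introduced for this example.

```r
# Example data from the table above
data <- data.frame(
  hours  = 2:11,
  scores = c(52, 55, 60, 64, 68, 72, 78, 81, 85, 90)
)

# Hypothetical helper column: the label shown when hovering over a point
data$hover_text <- paste0("Hours: ", data$hours, "<br>Score: ", data$scores)

# Assumption: plotly is the plotting library; skip quietly if it is missing
if (requireNamespace("plotly", quietly = TRUE)) {
  fig <- plotly::plot_ly(
    data,
    x = ~hours, y = ~scores,
    type = "scatter", mode = "markers",
    text = ~hover_text, hoverinfo = "text"
  )
  print(fig)
}
```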

R Code for the Regression Model

The following R code fits a simple linear regression model relating study hours to exam scores.

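# Same study-hours data as in the Example Data table
data <- data.frame(hours = 2:11, scores = c(52, 55, 60, 64, 68, 72, 78, 81, 85, 90))
model <- lm(scores ~ hours, data = data)  # regress scores on hours
summary(model)                            # coefficient table and fit statistics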

The output of summary(model) reports the estimated intercept and slope with their standard errors, t-statistics, and p-values, along with the residual standard error and R-squared, which describe how well the line fits the data.
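
Individual quantities can also be pulled out of the fitted model rather than read off the printed summary. The sketch below uses coef(), summary()$r.squared, and predict(); the 7.5-hour student is a hypothetical input chosen to lie inside the observed range of study hours.

```r
# Example data from the table above
data <- data.frame(
  hours  = 2:11,
  scores = c(52, 55, 60, 64, 68, 72, 78, 81, 85, 90)
)

model <- lm(scores ~ hours, data = data)

# Named vector with elements "(Intercept)" and "hours"
coefs <- coef(model)

# Proportion of variance in scores explained by hours
r_squared <- summary(model)$r.squared

# Predicted score for a hypothetical student who studies 7.5 hours
new_student <- data.frame(hours = 7.5)
predicted <- predict(model, newdata = new_student)

print(round(coefs, 2))
print(round(r_squared, 3))
print(round(predicted, 2))
```

For these ten observations the fitted slope is roughly 4.26 points per hour and the intercept is roughly 42.8, so the predicted score at 7.5 hours is about 74.8.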

Conclusion

Simple linear regression is used to model the relationship between two quantitative variables.

In this example, we examined how study hours relate to exam scores. The scatter plot and regression line showed a positive relationship, indicating that more study hours are associated with higher scores.

Linear regression helps summarize this relationship and allows us to make predictions based on the data.