2026-06-08

What is Simple Linear Regression?

Simple linear regression is a statistical method that models the relationship between two quantitative variables with a straight line: one explanatory variable and one response variable.

In this example, we study the relationship between:

  • Study hours
  • Exam score

The goal is to see whether studying more hours is associated with a higher exam score.

Regression Model

A simple linear regression model has the form:

\[ y = \beta_0 + \beta_1x + \epsilon \]

Where:

  • \(y\) is the response variable
  • \(x\) is the explanatory variable
  • \(\beta_0\) is the intercept
  • \(\beta_1\) is the slope
  • \(\epsilon\) is the random error
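To make the roles of \(\beta_0\), \(\beta_1\), and \(\epsilon\) concrete, the R sketch below simulates data from this model and then fits it with `lm()`. The parameter values (`beta0 = 45`, `beta1 = 5`, `sigma = 1`) and the normally distributed error are hypothetical choices for illustration only; they are not estimates from the exam data.

```r
# Hypothetical "true" parameters, chosen only for illustration
beta0 <- 45
beta1 <- 5
sigma <- 1  # assumed standard deviation of the normal error term

set.seed(1)
x <- 1:10
epsilon <- rnorm(length(x), mean = 0, sd = sigma)  # random error
y <- beta0 + beta1 * x + epsilon                   # y = beta0 + beta1 * x + epsilon

# Fitting the model to the simulated data should roughly recover beta0 and beta1
fit <- lm(y ~ x)
coef(fit)
```

With only 10 points the estimates will not equal the true values exactly; the gap is caused by the random error term.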

Variables in This Example

For this example: \[ \text{Exam Score} = \beta_0 + \beta_1(\text{Study Hours}) + \epsilon \]

The explanatory variable is: \[ x = \text{Study Hours} \]

The response variable is: \[ y = \text{Exam Score} \]

Data Table

data
##    study_hours exam_score predicted   residuals
## 1          1.0         52  51.43079  0.56920845
## 2          2.0         55  56.14070 -1.14069623
## 3          2.5         59  58.49565  0.50435143
## 4          3.0         61  60.85060  0.14939909
## 5          4.0         65  65.56051 -0.56050559
## 6          4.5         68  67.91546  0.08454206
## 7          5.0         70  70.27041 -0.27041028
## 8          6.0         74  74.98031 -0.98031496
## 9          6.5         78  77.33527  0.66473270
## 10         7.0         81  79.69022  1.30978036
## 11         8.0         85  84.40012  0.59987567
## 12         9.0         89  89.11003 -0.11002901
## 13        10.0         93  93.81993 -0.81993369

Scatterplot with Regression Line

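The figure itself is not reproduced in this text. As a minimal sketch, a similar plot can be drawn with base R graphics: the points are the observed data and the line is the least-squares fit. Using base R is an assumption here; the original figure may have been made with ggplot2.

```r
study_hours <- c(1, 2, 2.5, 3, 4, 4.5, 5, 6, 6.5, 7, 8, 9, 10)
exam_score <- c(52, 55, 59, 61, 65, 68, 70, 74, 78, 81, 85, 89, 93)
model <- lm(exam_score ~ study_hours)

# Scatterplot of the observed data
plot(study_hours, exam_score,
     xlab = "Study Hours", ylab = "Exam Score",
     main = "Scatterplot with Regression Line", pch = 19)

# Add the fitted least-squares line from the lm object
abline(model, col = "blue", lwd = 2)
```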

Fitted Regression Model

model
## 
## Call:
## lm(formula = exam_score ~ study_hours, data = data)
## 
## Coefficients:
## (Intercept)  study_hours  
##       46.72         4.71

The fitted regression equation, with coefficients rounded to two decimal places, is:

\[ \widehat{\text{Exam Score}} = 46.72 + 4.71(\text{Study Hours}) \]
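The coefficients can be checked by hand with the standard least-squares formulas \(b_1 = S_{xy}/S_{xx}\) and \(b_0 = \bar{y} - b_1\bar{x}\), where \(S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})\) and \(S_{xx} = \sum (x_i - \bar{x})^2\). A sketch in R using this example's data (the variable names `s_xy`, `s_xx`, `b0`, and `b1` are illustrative):

```r
study_hours <- c(1, 2, 2.5, 3, 4, 4.5, 5, 6, 6.5, 7, 8, 9, 10)
exam_score <- c(52, 55, 59, 61, 65, 68, 70, 74, 78, 81, 85, 89, 93)

x_bar <- mean(study_hours)
y_bar <- mean(exam_score)

# Least-squares estimates: b1 = Sxy / Sxx, b0 = y_bar - b1 * x_bar
s_xy <- sum((study_hours - x_bar) * (exam_score - y_bar))
s_xx <- sum((study_hours - x_bar)^2)
b1 <- s_xy / s_xx
b0 <- y_bar - b1 * x_bar

round(c(intercept = b0, slope = b1), 2)  # 46.72 and 4.71, matching lm()
```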

Interpretation of the Slope

The slope tells us how much the predicted exam score changes for each additional hour studied.

For each additional hour studied, the predicted exam score increases by about 4.71 points.

Because the slope is positive, there is a positive relationship between study hours and exam score: in this data, students who studied longer tended to score higher.
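One way to see the slope interpretation is to predict the score at two hour values one unit apart and take the difference. The sketch below uses `predict()` at 4 and 5 hours (chosen because they appear in the data table); the difference equals the estimated slope.

```r
study_hours <- c(1, 2, 2.5, 3, 4, 4.5, 5, 6, 6.5, 7, 8, 9, 10)
exam_score <- c(52, 55, 59, 61, 65, 68, 70, 74, 78, 81, 85, 89, 93)
data <- data.frame(study_hours, exam_score)
model <- lm(exam_score ~ study_hours, data = data)

# Predicted scores at 4 and 5 hours: about 65.56 and 70.27
new_hours <- data.frame(study_hours = c(4, 5))
pred <- predict(model, newdata = new_hours)

# One extra hour changes the prediction by the slope, about 4.71
diff(pred)
```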

Residuals

A residual is the difference between the actual value and the predicted value.

\[ e_i = y_i - \hat{y}_i \]

Where:

  • \(e_i\) is the residual
  • \(y_i\) is the actual exam score
  • \(\hat{y}_i\) is the predicted exam score
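The formula can be applied directly in R: subtract the fitted values from the observed scores and compare the result with `residuals()`. For a least-squares fit that includes an intercept, the residuals sum to zero up to floating-point rounding.

```r
study_hours <- c(1, 2, 2.5, 3, 4, 4.5, 5, 6, 6.5, 7, 8, 9, 10)
exam_score <- c(52, 55, 59, 61, 65, 68, 70, 74, 78, 81, 85, 89, 93)
model <- lm(exam_score ~ study_hours)

predicted <- fitted(model)      # y-hat_i
e <- exam_score - predicted     # e_i = y_i - y-hat_i

round(e, 2)
all.equal(unname(e), unname(residuals(model)))  # TRUE
sum(e)  # essentially 0 (floating-point rounding only)
```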

Residual Plot
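A residual plot shows each residual against the explanatory variable. In general, residuals scattered randomly around zero with no clear pattern suggest a straight line is a reasonable fit. A minimal base-R sketch (the plotting choices are assumptions, not necessarily how the original figure was made):

```r
study_hours <- c(1, 2, 2.5, 3, 4, 4.5, 5, 6, 6.5, 7, 8, 9, 10)
exam_score <- c(52, 55, 59, 61, 65, 68, 70, 74, 78, 81, 85, 89, 93)
model <- lm(exam_score ~ study_hours)

# Residuals versus study hours
plot(study_hours, residuals(model),
     xlab = "Study Hours", ylab = "Residual",
     main = "Residual Plot", pch = 19)

# Dashed reference line at zero
abline(h = 0, lty = 2)
```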

Interactive Plotly Plot


R Code Used to Create the Model

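# Study hours and exam scores for the 13 observations
study_hours <- c(1, 2, 2.5, 3, 4, 4.5, 5, 6, 6.5, 7, 8, 9, 10)
exam_score <- c(52, 55, 59, 61, 65, 68, 70, 74, 78, 81, 85, 89, 93)

data <- data.frame(study_hours, exam_score)

# Fit the simple linear regression of exam score on study hours
model <- lm(exam_score ~ study_hours, data = data)

# Add the predicted values and residuals shown in the data table
data$predicted <- fitted(model)
data$residuals <- residuals(model)

summary(model)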

Conclusion

Simple linear regression helps us understand and predict the relationship between two quantitative variables.

In this example, the regression model shows that students who study more hours tend to have higher exam scores.

The scatterplot and regression line show this relationship, the residual plot helps check whether a straight line is a reasonable fit, and the interactive plot lets readers explore the individual data points.