Introduction

Goal: Show how simple linear regression can be used to model and visualize the relationship between two variables.

Variables: Study hours (predictor) and exam score (response).

Dataset: A small made-up sample dataset used for illustration.

Definition: Simple linear regression is a statistical method that models the linear relationship between one predictor variable and one response variable.

Simple Linear Regression Model

The simple linear regression model is

\[ y = \beta_0 + \beta_1 x + \varepsilon \]

where

  • \(y\) is the response variable
  • \(\beta_0\) is the intercept
  • \(\beta_1\) is the slope
  • \(x\) is the predictor variable
  • \(\varepsilon\) is the error term

Estimated Regression Line

The estimated regression line is

\[ \hat{y} = b_0 + b_1 x \]

where

  • \(\hat{y}\) is the predicted response value
  • \(b_0\) is the estimated intercept
  • \(b_1\) is the estimated slope
  • \(x\) is the predictor variable
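
The section does not say how \(b_0\) and \(b_1\) are obtained. Assuming the usual ordinary least squares fit, which chooses the line that minimizes the sum of squared differences between the observed \(y_i\) and the predicted \(\hat{y}_i\), the estimates would be

\[ b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x} \]

where \(n\) is the number of observations and \(\bar{x}\) and \(\bar{y}\) are the sample means of the predictor and the response. The symbols \(n\), \(x_i\), \(y_i\), \(\bar{x}\), and \(\bar{y}\) are introduced here for illustration.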

R Code and Sample Dataset

study_hours <- c(0, 1, 1, 1, 2, 3, 3, 4, 5, 6)
exam_score  <- c(48, 61, 64, 65, 67, 77, 78, 84, 89, 93)

data <- data.frame(study_hours, exam_score)

data
   study_hours exam_score
1            0         48
2            1         61
3            1         64
4            1         65
5            2         67
6            3         77
7            3         78
8            4         84
9            5         89
10           6         93
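
The estimated regression line for this dataset can be obtained with base R's lm() function. The sketch below assumes an ordinary least squares fit; the names fit, b, b1_manual, and b0_manual are illustrative and do not come from the original code.

```r
# Sample data from the section above
study_hours <- c(0, 1, 1, 1, 2, 3, 3, 4, 5, 6)
exam_score  <- c(48, 61, 64, 65, 67, 77, 78, 84, 89, 93)
data <- data.frame(study_hours, exam_score)

# Fit exam_score = b0 + b1 * study_hours by ordinary least squares
fit <- lm(exam_score ~ study_hours, data = data)

# Estimated intercept (b0) and slope (b1)
b <- coef(fit)
round(b, 2)

# Cross-check the slope and intercept against the closed-form formulas
x <- data$study_hours
y <- data$exam_score
b1_manual <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0_manual <- mean(y) - b1_manual * mean(x)
```

For this sample the fit gives an intercept of about 54.43 and a slope of about 6.99, so each additional hour of study is associated with roughly 7 more points on the exam.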

Scatterplot with Regression Line
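
One way to draw such a plot is sketched below with base R graphics. The plotting approach, axis labels, and colors are assumptions, since the code behind this figure is not shown.

```r
# Sample data from the section above
study_hours <- c(0, 1, 1, 1, 2, 3, 3, 4, 5, 6)
exam_score  <- c(48, 61, 64, 65, 67, 77, 78, 84, 89, 93)

# Least-squares fit of exam score on study hours
fit <- lm(exam_score ~ study_hours)

# Scatterplot of the observed data
plot(study_hours, exam_score,
     xlab = "Study hours", ylab = "Exam score",
     main = "Exam score vs. study hours", pch = 19)

# Overlay the estimated regression line
abline(fit, col = "blue", lwd = 2)
```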

Residual Plot
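
A residual plot shows the differences between the observed and predicted exam scores against the predictor. A minimal sketch with base R graphics follows; the plotting choices and the name res are assumptions, not taken from the original.

```r
# Sample data from the section above
study_hours <- c(0, 1, 1, 1, 2, 3, 3, 4, 5, 6)
exam_score  <- c(48, 61, 64, 65, 67, 77, 78, 84, 89, 93)

# Least-squares fit and its residuals (observed minus predicted)
fit <- lm(exam_score ~ study_hours)
res <- resid(fit)

# Residuals against the predictor, with a reference line at zero
plot(study_hours, res,
     xlab = "Study hours", ylab = "Residual",
     main = "Residuals vs. study hours", pch = 19)
abline(h = 0, lty = 2)
```

If the linear model is adequate, the points should scatter around the zero line without a clear pattern.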

Interactive Scatterplot
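
The sketch below shows one possible way to build an interactive version of the scatterplot. It assumes the plotly package, which the original does not name, and only draws the plot if that package is installed. The column name fitted_score and the object p are illustrative.

```r
# Sample data from the section above
study_hours <- c(0, 1, 1, 1, 2, 3, 3, 4, 5, 6)
exam_score  <- c(48, 61, 64, 65, 67, 77, 78, 84, 89, 93)
data <- data.frame(study_hours, exam_score)

# Least-squares fit; store the predicted scores alongside the data
fit <- lm(exam_score ~ study_hours, data = data)
data$fitted_score <- fitted(fit)

# Assumption: plotly is the interactive plotting package in use
if (requireNamespace("plotly", quietly = TRUE)) {
  # Observed points; hovering shows the study hours and exam score
  p <- plotly::plot_ly(data, x = ~study_hours, y = ~exam_score,
                       type = "scatter", mode = "markers",
                       name = "Observed")
  # Estimated regression line drawn through the predicted scores
  p <- plotly::add_lines(p, y = ~fitted_score, name = "Fitted line")
  p <- plotly::layout(p,
                      xaxis = list(title = "Study hours"),
                      yaxis = list(title = "Exam score"))
  p
}
```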

Conclusion

Simple linear regression is a statistical method that models the linear relationship between one predictor variable and one response variable.

Fitting a simple linear regression to the sample data gave a clearer picture of the relationship between study hours and exam score.

In this made-up sample, exam score rises with study hours: the least-squares slope is positive, at about 7 points per additional hour of study.