What Is Simple Linear Regression?

Simple linear regression is a method for modeling the relationship between two quantitative variables with a straight line.

One variable is used to predict the other.

Example:
  • x = hours studied
  • y = exam score

We want to see whether students who study more tend to get higher scores.

The Variables

In simple linear regression:
  • x = explanatory (predictor) variable
  • y = response variable

In this example:
  • x = hours studied
  • y = exam score

We are trying to predict y using x.

The Regression Equation

The regression equation is:

\[ y = \beta_0 + \beta_1 x + \varepsilon \]

  • \(\beta_0\) = intercept
  • \(\beta_1\) = slope
  • \(\varepsilon\) = random error (the part of y that the line does not explain)

This equation models y as a straight-line function of x plus random scatter around that line.
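To make the role of \(\varepsilon\) concrete, the sketch below simulates data from this model using made-up true values \(\beta_0 = 45\) and \(\beta_1 = 5\) and normal errors with standard deviation 1. These parameter values are hypothetical choices for illustration, not estimates from the slides. Fitting the line with `lm()` gives estimates close to, but not exactly equal to, the true values, because the error term adds random scatter.

```r
# Simulate y = beta0 + beta1 * x + epsilon with HYPOTHETICAL true parameters
set.seed(42)                                   # reproducible random errors
beta0 <- 45                                    # hypothetical true intercept
beta1 <- 5                                     # hypothetical true slope
x <- 1:8
epsilon <- rnorm(length(x), mean = 0, sd = 1)  # random error term
y <- beta0 + beta1 * x + epsilon

# Least-squares fit; the estimates will be near 45 and 5 but not equal to them
sim_fit <- lm(y ~ x)
coef(sim_fit)
```

Rerunning with a different seed gives slightly different estimates each time, which is the sampling variability that the standard errors in `summary()` measure.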

What the Slope Means

The slope tells us how much y is expected to change, on average, when x increases by 1 unit.

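The fitted (estimated) line replaces the unknown \(\beta_0\) and \(\beta_1\) with estimates \(b_0\) and \(b_1\) computed from the data, where \(\hat{y}\) is the predicted value of y: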

\[ \hat{y} = b_0 + b_1 x \]

If the slope is positive, y increases as x increases. If the slope is negative, y decreases as x increases.
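A common way to compute \(b_0\) and \(b_1\) is least squares, which chooses the line that minimizes the sum of squared residuals \(\sum_{i=1}^{n} (y_i - \hat{y}_i)^2\); R's `lm()` fits linear models this way. With a single predictor the solution has a closed form:

\[ b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x} \]

The second formula means the fitted line always passes through the point \((\bar{x}, \bar{y})\).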

Example Data

The example data set has 8 observations of hours studied and exam score:

##   hours scores
## 1     1     52
## 2     2     55
## 3     3     61
## 4     4     65
## 5     5     71
## 6     6     76
## 7     7     82
## 8     8     88
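As a check on the least-squares formulas \(b_1 = S_{xy} / S_{xx}\) and \(b_0 = \bar{y} - b_1 \bar{x}\), the sketch below rebuilds the table as a data frame called `study_data` (the name the `lm()` call later in these slides uses) and computes the estimates by hand. The results should agree with the `lm()` coefficients.

```r
# Rebuild the example data (values copied from the table above)
study_data <- data.frame(
  hours  = 1:8,
  scores = c(52, 55, 61, 65, 71, 76, 82, 88)
)

x <- study_data$hours
y <- study_data$scores

# Least-squares slope: Sxy / Sxx
sxy <- sum((x - mean(x)) * (y - mean(y)))  # 219
sxx <- sum((x - mean(x))^2)                # 42
b1  <- sxy / sxx                           # about 5.2143

# Least-squares intercept: the line passes through (mean(x), mean(y))
b0 <- mean(y) - b1 * mean(x)               # 68.75 - 5.2143 * 4.5, about 45.2857

round(c(intercept = b0, slope = b1), 4)
```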

Scatterplot with Regression Line
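A minimal base-R sketch of this kind of plot (the data points plus the fitted least-squares line) is below. It is not necessarily how the original slide was produced; it writes to a temporary PDF so it also runs non-interactively, and the axis labels and line color are arbitrary choices.

```r
study_data <- data.frame(
  hours  = 1:8,
  scores = c(52, 55, 61, 65, 71, 76, 82, 88)
)
model <- lm(scores ~ hours, data = study_data)

plot_file <- tempfile(fileext = ".pdf")  # arbitrary temporary output file
pdf(plot_file)
plot(scores ~ hours, data = study_data,
     xlab = "Hours studied", ylab = "Exam score",
     main = "Scatterplot with Regression Line")
abline(model, col = "blue")              # draws the fitted line b0 + b1 * hours
invisible(dev.off())                     # close the device and write the file
```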

Residual Plot
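A residual plot shows each observed score minus its fitted score. Residuals scattered around zero with no clear pattern suggest a straight line is a reasonable fit. The sketch below plots residuals against hours studied; whether the original slide used hours or fitted values on the horizontal axis is not stated, so that choice is an assumption.

```r
study_data <- data.frame(
  hours  = 1:8,
  scores = c(52, 55, 61, 65, 71, 76, 82, 88)
)
model <- lm(scores ~ hours, data = study_data)

res <- resid(model)                      # observed score minus fitted score

plot_file <- tempfile(fileext = ".pdf")  # arbitrary temporary output file
pdf(plot_file)
plot(study_data$hours, res,
     xlab = "Hours studied", ylab = "Residual",
     main = "Residual Plot")
abline(h = 0, lty = 2)                   # dashed reference line at residual = 0
invisible(dev.off())
```

With an intercept in the model, least-squares residuals always sum to zero, so the points balance around the dashed line.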

Plotly Plot

This plot was created with plotly and saved as a static image for display in the slides.

R Code Example

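study_data <- data.frame(hours = 1:8,
                         scores = c(52, 55, 61, 65, 71, 76, 82, 88))
model <- lm(scores ~ hours, data = study_data)  # fit scores = b0 + b1 * hours
summary(model)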
## 
## Call:
## lm(formula = scores ~ hours, data = study_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.1429 -0.6071 -0.1429  0.4107  1.5000 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  45.2857     0.7508   60.31 1.40e-09 ***
## hours         5.2143     0.1487   35.07 3.58e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9636 on 6 degrees of freedom
## Multiple R-squared:  0.9951, Adjusted R-squared:  0.9943 
## F-statistic:  1230 on 1 and 6 DF,  p-value: 3.583e-08
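Once the model is fitted, the estimates can be extracted and used for prediction. The sketch below refits the same model so it runs on its own, reads the coefficients with `coef()`, and uses `predict()` for a hypothetical student who studies 5.5 hours; 5.5 is an illustrative value, not a row in the data.

```r
study_data <- data.frame(
  hours  = 1:8,
  scores = c(52, 55, 61, 65, 71, 76, 82, 88)
)
model <- lm(scores ~ hours, data = study_data)

b <- coef(model)
b[["hours"]]        # slope: about 5.21 more points per additional hour
b[["(Intercept)"]]  # predicted score at 0 hours (an extrapolation; data start at 1 hour)

# Predicted score at 5.5 hours: 45.2857 + 5.2143 * 5.5, about 73.96
predict(model, newdata = data.frame(hours = 5.5))

# Proportion of the variation in scores explained by hours (R-squared)
summary(model)$r.squared
```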

Conclusion

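Simple linear regression helps us describe and predict the relationship between two quantitative variables.

In this example, each additional hour studied was associated with about 5.21 more points on the exam, and hours studied explained about 99.5% of the variation in scores (R-squared = 0.9951).

This method is useful whenever one quantitative variable is used to predict another.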