2026-02-08

Simple Linear Regression

Simple Linear Regression is a statistical tool used to:

  • Summarize and study relationships between two quantitative variables.

  • Predict an outcome based on a single input.

  • Determine how much the dependent variable changes when the independent variable changes by one unit.

Best Fitting Line

  • A line drawn on a scatter plot to summarize the trend between two quantitative variables.

  • The equation of the best fitting line is \[\hat{y}_i = b_0 + b_1 x_i\]

-\(y_i\): denotes the observed response for experimental unit i

-\(x_i\): denotes the predictor value for experimental unit i

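-\(\hat{y}_i\): denotes the predicted response (or fitted value) for experimental unit i

-\(b_0\): the intercept, the predicted response when \(x_i = 0\)

-\(b_1\): the slope, the change in the predicted response for a one-unit increase in \(x_i\)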

citation: https://online.stat.psu.edu/stat462/node/92/
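The section above does not show how \(b_0\) and \(b_1\) are obtained. A common choice, the least-squares criterion, gives closed-form estimates: \(b_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2\) and \(b_0 = \bar{y} - b_1 \bar{x}\). The sketch below computes them by hand in R, assuming the built-in cars data (speed as x, dist as y), and compares them with lm(). The names x, y, b0 and b1 are illustrative only.

```r
# Least-squares estimates computed by hand for the built-in cars data
x <- cars$speed   # predictor: speed (mph)
y <- cars$dist    # response: stopping distance (ft)

# Slope: b1 = sum((x - xbar) * (y - ybar)) / sum((x - xbar)^2)
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: b0 = ybar - b1 * xbar
b0 <- mean(y) - b1 * mean(x)

# Compare with the coefficients reported by lm()
fit <- lm(dist ~ speed, data = cars)
print(c(b0 = b0, b1 = b1))
print(coef(fit))
```

Both printed lines should show an intercept of about -17.58 and a slope of about 3.93.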

Prediction error

  • Using \[\hat{y}_i = b_0 + b_1 x_i\] to predict the actual response \(y_i\), we make a prediction error (or residual) of size:

    \[e_i = y_i - \hat{y}_i\]
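As a quick check of the prediction error formula, the following R sketch (assuming the built-in cars data) computes the fitted values and the prediction errors directly from the fitted coefficients and confirms they match R's fitted() and residuals(). The variable names b, y_hat and e are illustrative.

```r
fit <- lm(dist ~ speed, data = cars)

# Fitted values: y_hat_i = b0 + b1 * x_i
b <- coef(fit)
y_hat <- b[1] + b[2] * cars$speed

# Prediction errors (residuals): y_i - y_hat_i
e <- cars$dist - y_hat

# These agree with R's own fitted() and residuals()
all.equal(unname(y_hat), unname(fitted(fit)))
all.equal(unname(e), unname(residuals(fit)))

# With an intercept in the model, least-squares residuals sum to (numerically) zero
sum(e)
```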

Application of simple linear regression

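-To illustrate the application of simple linear regression, the built-in R data set “cars” will be used. It contains 50 observations of two variables: speed (the speed of the car, in mph) and dist (the distance taken to stop, in ft).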

summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

Plotting the data set

-Plot the data to show the relationship between the two quantitative variables, speed and dist.
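The original plot is not reproduced here. A minimal base-graphics sketch that draws the same kind of scatter plot might look like this; the axis labels, title and point style are illustrative choices, with units taken from the cars documentation.

```r
# Scatter plot of stopping distance against speed for the built-in cars data
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)",
     main = "cars: stopping distance vs. speed",
     pch = 19)
```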

Identifying a linear regression function

  • If the regression function is not linear, the residuals depart from 0 in some systematic manner, for example:

  • Positive for small x values

  • Negative for medium x values

  • Positive again for large x values

  • Any systematic pattern is sufficient to suggest that the regression function is not linear.

citation: https://online.stat.psu.edu/stat462/node/120/
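One way to look for the systematic pattern described above is a residuals versus fitted values plot. The R sketch below, assuming the cars data and the model dist ~ speed, draws it by hand with a reference line at 0 and then with R's built-in lm diagnostic plot (which = 1 selects the "Residuals vs Fitted" panel). Points scattered randomly around 0 are consistent with a linear regression function; a curved pattern suggests it is not.

```r
fit <- lm(dist ~ speed, data = cars)

# Residuals vs. fitted values, with a dashed reference line at 0
plot(fitted(fit), residuals(fit),
     xlab = "Fitted stopping distance (ft)",
     ylab = "Residual (ft)",
     main = "Residuals vs. fitted values")
abline(h = 0, lty = 2)

# R's built-in diagnostic for the same check
plot(fit, which = 1)
```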

Verifying linearity

-Using lm() in R to fit the linear model
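A minimal sketch of the lm() step, assuming the response is dist and the predictor is speed. It also checks the slope interpretation from the introduction: the predicted stopping distance changes by \(b_1\) when speed increases by one unit (here from 10 to 11 mph, chosen arbitrarily).

```r
# Fit the simple linear regression of stopping distance (dist) on speed
fit <- lm(dist ~ speed, data = cars)

# Coefficient table, residual standard error, and R-squared
summary(fit)

# Slope interpretation: predicted distance changes by b1 per 1 mph of speed
pred <- predict(fit, newdata = data.frame(speed = c(10, 11)))
diff(pred)   # same value as coef(fit)["speed"]
```

For this data set, summary(fit) reports an intercept of about -17.58, a slope of about 3.93 ft per mph, and a multiple R-squared of about 0.65.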

Regression plot
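The regression plot itself is not reproduced here. A base-graphics sketch, assuming the cars data, overlays the fitted line \(\hat{y}_i = b_0 + b_1 x_i\) on the scatter plot; abline() accepts an lm object and draws the line from its coefficients. Colors and labels are illustrative.

```r
fit <- lm(dist ~ speed, data = cars)

# Scatter plot with the least-squares line overlaid
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     main = "Simple linear regression of dist on speed")
abline(fit, col = "blue", lwd = 2)
```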