Introduction to Simple Linear Regression

  • Simple linear regression models the relationship between two variables:
    • Independent variable (X)
    • Dependent variable (Y)
  • Purpose:
    • Predict the value of Y based on X
    • Understand the strength and direction of the relationship

The Simple Linear Regression Model

The mathematical representation of the model is:

\[ Y = \beta_0 + \beta_1 X + \epsilon \]

Where:

  • \(Y\) = Dependent variable
  • \(X\) = Independent variable
  • \(\beta_0\) = Intercept
  • \(\beta_1\) = Slope
  • \(\epsilon\) = Error term
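The slides do not show how \(\beta_0\) and \(\beta_1\) are estimated from data. R's `lm()` fits them by ordinary least squares. For a single predictor, least squares has a closed form: \(\hat\beta_1 = \sum (X_i - \bar X)(Y_i - \bar Y) / \sum (X_i - \bar X)^2\) and \(\hat\beta_0 = \bar Y - \hat\beta_1 \bar X\). Below is a minimal Python sketch of those two formulas. The function name `fit_simple_ols` is hypothetical and not part of any library.

```python
from statistics import fmean


def fit_simple_ols(x, y):
    """Return (beta0_hat, beta1_hat) for Y = beta0 + beta1 * X + error,
    using the closed-form ordinary least squares estimates.
    Hypothetical helper for illustration."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = fmean(x), fmean(y)
    # Sum of cross-deviations, and sum of squared deviations of X
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("X must vary to estimate a slope")
    beta1_hat = s_xy / s_xx
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat


# Points lying exactly on Y = 2 + 3X recover the true coefficients.
print(fit_simple_ols([1, 2, 3, 4], [5, 8, 11, 14]))  # (2.0, 3.0)
```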

Assumptions of the Model and Example Dataset

  1. Linearity: The relationship between X and Y is linear.
  2. Independence: The observations are independent of each other.
  3. Homoscedasticity: The variance of the error terms is constant.
  4. Normality: The error terms are normally distributed.
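Simulating a dataset that satisfies all four assumptions by construction shows what they look like in practice. The sketch below is an illustration only and is not the dataset used in these slides. The function name, the uniform range for X, and the parameter values are all assumptions.

```python
import random


def simulate_linear_data(n, beta0, beta1, sigma, seed=0):
    """Draw n (x, y) pairs from Y = beta0 + beta1 * X + error.

    Hypothetical helper for illustration. X is drawn uniformly from [0, 10]
    (an assumption); each error is an independent Normal(0, sigma) draw.
    """
    rng = random.Random(seed)  # seeded generator, so results are reproducible
    xs = [rng.uniform(0, 10) for _ in range(n)]
    # Linearity: the mean of Y is a straight line in X.
    # Independence + normality: a fresh Gaussian draw per observation.
    # Homoscedasticity: the same sigma is used for every observation.
    ys = [beta0 + beta1 * x + rng.gauss(0, sigma) for x in xs]
    return xs, ys


xs, ys = simulate_linear_data(n=50, beta0=1.0, beta1=2.0, sigma=1.5, seed=42)
print(len(xs), len(ys))  # 50 50
```

Letting the error spread grow with `x`, for example `rng.gauss(0, 0.3 * x)`, would violate assumption 3. That kind of violation typically shows up as a funnel shape in a residual plot.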

Scatter Plot with Linear Regression Line


Residual Plot

A residual plot helps check the assumptions of linear regression, especially the constant variance of the error terms (homoscedasticity).
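A residual plot typically places the fitted values \(\hat Y_i\) on the horizontal axis and the residuals \(Y_i - \hat Y_i\) on the vertical axis. When the variance is constant, the points scatter in a band of roughly even width around zero. A funnel shape instead suggests non-constant variance. The Python sketch below computes only those two series and leaves out the plotting, so it runs with the standard library alone. The function name and the small example data are hypothetical.

```python
from statistics import fmean


def fitted_and_residuals(x, y):
    """Fit Y = b0 + b1 * X by least squares and return (fitted, residuals),
    the two series a residual plot puts on its horizontal and vertical axes.
    Hypothetical helper for illustration."""
    x_bar, y_bar = fmean(x), fmean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, residuals


fitted, residuals = fitted_and_residuals([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print([round(r, 2) for r in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
```

When the model includes an intercept, least-squares residuals always sum to zero (up to rounding). For that reason, the plot is judged by its spread and pattern, not by its average.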

Linear Model Code Output

Here is the R summary output for the linear model regressing Scores on Hours:

## 
## Call:
## lm(formula = Scores ~ Hours, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.2695 -1.1248 -0.2785  0.7707  3.3628 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.0509     1.3346   2.286   0.0516 .  
## Hours         2.8361     0.2151  13.186 1.04e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.954 on 8 degrees of freedom
## Multiple R-squared:  0.956,  Adjusted R-squared:  0.9505 
## F-statistic: 173.9 on 1 and 8 DF,  p-value: 1.042e-06
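Most of the numbers in this summary follow from a handful of sums. The Python sketch below recomputes them for a single predictor: estimates, standard errors, t values, residual standard error, R², adjusted R², and the F-statistic. It is an illustration, not R's own implementation, and it omits p-values. The function name and example data are hypothetical.

```python
import math
from statistics import fmean


def simple_lm_summary(x, y):
    """Recompute the main numbers printed by R's summary(lm(y ~ x)) for a
    single predictor (p-values omitted). Hypothetical helper for illustration."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares
    df = n - 2  # residual degrees of freedom: n minus two estimated coefficients
    rse = math.sqrt(sse / df)  # "Residual standard error"
    se_b1 = rse / math.sqrt(s_xx)
    se_b0 = rse * math.sqrt(1 / n + x_bar ** 2 / s_xx)
    r2 = 1 - sse / sst
    return {
        "estimate": (b0, b1),
        "std_error": (se_b0, se_b1),
        "t_value": (b0 / se_b0, b1 / se_b1),
        "rse": rse,
        "df": df,
        "r_squared": r2,
        "adj_r_squared": 1 - (1 - r2) * (n - 1) / df,
        "f_statistic": (sst - sse) / (sse / df),
    }


s = simple_lm_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(s["r_squared"], 3), round(s["f_statistic"], 3))  # 0.6 4.5
```

With one predictor, the F-statistic equals the square of the slope's t value. The printed output agrees: 13.186² ≈ 173.9. Likewise, 2.8361 / 0.2151 ≈ 13.19. The output implies n = 10 observations (8 residual degrees of freedom plus 2 coefficients), so the adjusted R² is 1 − (1 − 0.956) × 9/8 ≈ 0.9505. Computing the p-values would also require the t distribution's tail probability, for example `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available.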

Interpretation:

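  • Intercept (\(\beta_0\)): The expected value of Y when X = 0 (estimated here as 3.0509).
  • Slope (\(\beta_1\)): The expected change in Y for a one-unit increase in X (here, each additional unit of Hours is associated with an expected increase of about 2.84 in Scores).
  • R-squared: The proportion of the variance in Y explained by X (0.956 here, i.e. about 95.6%).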

\[ R^2 = 1 - \frac{\sum (Y_i - \hat{Y_i})^2}{\sum (Y_i - \bar{Y})^2} \]
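The interpretations can be made concrete in code. The Python sketch below plugs the rounded coefficients from the R output into the fitted line \(\hat Y = 3.0509 + 2.8361 \cdot \text{Hours}\). It also implements the R² formula term by term. The function names are hypothetical, and the units of Hours and Scores are whatever the underlying dataset uses, which the slides do not state.

```python
def predict_score(hours, beta0=3.0509, beta1=2.8361):
    """Predicted Scores for a given Hours value, using the rounded
    estimates printed in the R summary output (hypothetical helper)."""
    return beta0 + beta1 * hours


def r_squared(y, y_hat):
    """R^2 = 1 - sum((Y_i - Yhat_i)^2) / sum((Y_i - Ybar)^2)."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)  # total variation around the mean
    return 1 - ss_res / ss_tot


print(predict_score(0))  # 3.0509, the intercept
print(round(predict_score(5) - predict_score(4), 4))  # 2.8361, the slope
```

If the predictions match the data exactly, `r_squared` returns 1. If every prediction is \(\bar Y\), it returns 0.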

Summary