Simple Linear Regression

Introduction

  • Simple linear regression is used to estimate the relationship between two quantitative variables.
  • It assumes a linear relationship between an independent variable \(x\) and a dependent variable \(y\).

The Linear Model

The simple linear regression model is:

\[ y = \beta_0 + \beta_1 x + \epsilon \]

  • \(y\): Dependent variable
  • \(x\): Independent variable
  • \(\beta_0\): Intercept
  • \(\beta_1\): Slope
  • \(\epsilon\): Error term
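To make the roles of the terms concrete, here is a minimal R sketch that simulates data from the model. The parameter values (`beta0 = 2`, `beta1 = 0.5`), the sample size, and the choice of normally distributed errors are illustrative assumptions, not part of the material above.

```r
# Simulate data from y = beta0 + beta1 * x + epsilon.
# All numeric values below are hypothetical and chosen for illustration only.
set.seed(1)                            # fixed seed so the sketch is reproducible

n     <- 100                           # number of observations (assumed)
beta0 <- 2                             # intercept (assumed)
beta1 <- 0.5                           # slope (assumed)

x   <- runif(n, min = 0, max = 10)     # independent variable
eps <- rnorm(n, mean = 0, sd = 1)      # error term; normality is an assumption here
y   <- beta0 + beta1 * x + eps         # dependent variable

sim <- data.frame(x = x, y = y)
head(sim)
```

Each simulated `y` is the point on the line `beta0 + beta1 * x` plus a random error, which is why the observations scatter around the line rather than lying on it.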

Estimating Parameters

The parameters \(\beta_0\) and \(\beta_1\) are estimated using the Least Squares Method:

\[ \min_{\beta_0, \beta_1} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 \]

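  • This minimizes the sum of squared residuals, i.e. the squared differences between each observed \(y_i\) and the value predicted by the line.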
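For reference, this minimization has a well-known closed-form solution. The hats (estimates) and bars (sample means) are notation introduced here for the sketch. Setting the partial derivatives of the sum with respect to \(\beta_0\) and \(\beta_1\) to zero gives the normal equations:

\[ \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) = 0, \qquad \sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i) = 0 \]

Solving them, provided the \(x_i\) are not all equal, yields:

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]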

Dataset

We will use the mtcars dataset to model the relationship between horsepower (hp) and weight (wt, in 1000 lbs), with weight as the dependent variable and horsepower as the independent variable.

Fitting the Model

# Load the built-in mtcars dataset and preview its first six rows
data(mtcars)
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
# Regress weight (wt) on horsepower (hp) by least squares and print the fit summary
model <- lm(wt ~ hp, data = mtcars)
summary(model)
## 
## Call:
## lm(formula = wt ~ hp, data = mtcars)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.41757 -0.53122 -0.02038  0.42536  1.56455 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.838247   0.316520   5.808 2.39e-06 ***
## hp          0.009401   0.001960   4.796 4.15e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7483 on 30 degrees of freedom
## Multiple R-squared:  0.4339, Adjusted R-squared:  0.4151 
## F-statistic:    23 on 1 and 30 DF,  p-value: 4.146e-05
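As a check on the output above, the coefficients can be reproduced by hand. This sketch uses the standard closed-form least squares solution for one predictor: the slope is the sum of cross-deviations divided by the sum of squared deviations of x, and the intercept follows from the sample means. The variable names `b0`, `b1`, and `r2` are introduced here for illustration.

```r
# Reproduce the lm() estimates for wt ~ hp from the closed-form formulas.
data(mtcars)
x <- mtcars$hp                         # independent variable: horsepower
y <- mtcars$wt                         # dependent variable: weight (1000 lbs)

# Slope: sum of cross-deviations over sum of squared deviations of x
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: the fitted line passes through the point of means
b0 <- mean(y) - b1 * mean(x)

# Compare with the estimates reported by lm()
model <- lm(wt ~ hp, data = mtcars)
rbind(by_hand = c(b0, b1), lm = coef(model))

# With a single predictor, R-squared equals the squared correlation of x and y
r2 <- cor(x, y)^2
r2
```

Read against the summary, the slope of about 0.0094 means that each additional unit of horsepower is associated with roughly 9.4 lbs more weight on average in this sample, since wt is recorded in thousands of pounds.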

Residuals vs Fitted plot
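
A plot of this kind could be produced along the following lines. This is a sketch in base R graphics; the axis labels and the dashed reference line are presentation choices made here, not taken from the original figure.

```r
# Plot residuals against fitted values for the wt ~ hp model.
data(mtcars)
model <- lm(wt ~ hp, data = mtcars)

plot(fitted(model), resid(model),
     xlab = "Fitted values",
     ylab = "Residuals",
     main = "Residuals vs Fitted")
abline(h = 0, lty = 2)                 # reference line at zero residual
```

Base R also offers `plot(model, which = 1)`, which draws a residuals-versus-fitted plot with a smoothed trend line added. In either version, a roughly even scatter around zero with no visible curve or funnel shape is what one would hope to see if the linear model is adequate.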