Introduction to Simple Linear Regression

  • Simple linear regression is a statistical method that allows us to summarize and study relationships between two continuous variables.
  • One variable, denoted \(x\), is regarded as the predictor, explanatory, or independent variable.
  • The other variable, denoted \(y\), is regarded as the response, outcome, or dependent variable.

The simple linear regression model is defined by the following equation:

\[y_i = \beta_0 + \beta_1 x_i + \epsilon_i\]

Where:

  • \(y_i\) is the response variable for the \(i^{th}\) observation.
  • \(x_i\) is the explanatory variable.
  • \(\beta_0\) is the y-intercept.
  • \(\beta_1\) is the slope of the regression line.
  • \(\epsilon_i\) is the random error term.
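To make the roles of the terms in the model concrete, the following sketch simulates data from it. The parameter values (`beta0`, `beta1`, `sigma`), the sample size, the range of `x`, and the choice of normally distributed errors are all assumptions made for illustration; the text above only says that \(\epsilon_i\) is a random error term.

```r
# Illustrative simulation of y_i = beta0 + beta1 * x_i + eps_i.
# All numeric values below are hypothetical, chosen only for the example.
set.seed(1)                            # make the random draws reproducible

n     <- 50                            # number of observations (assumed)
beta0 <- 2                             # intercept (assumed)
beta1 <- 0.5                           # slope (assumed)
sigma <- 1                             # error standard deviation (assumed)

x   <- runif(n, min = 0, max = 10)     # explanatory variable
eps <- rnorm(n, mean = 0, sd = sigma)  # random error, assumed normal here
y   <- beta0 + beta1 * x + eps         # response built from the model equation

head(data.frame(x, y))                 # first few simulated (x, y) pairs
```

Each simulated response is the point on the line \(\beta_0 + \beta_1 x_i\) plus a random deviation, which is exactly what the equation states.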

To find the line of best fit, we use the least squares method: we choose the intercept and slope that minimize the sum of squared residuals, \(\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2\). The resulting estimates are:

\[\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\]

\[\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\]

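These formulas give the estimated slope \(\hat{\beta}_1\) and intercept \(\hat{\beta}_0\), which define the fitted line \(\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\).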
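As a quick check of the formulas above, here is a minimal sketch that applies them by hand to a tiny made-up dataset. The vectors `x` and `y` are hypothetical values invented for this example, not data from the text.

```r
# Hand computation of the least squares estimates on hypothetical toy data.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

x_bar <- mean(x)                          # 3
y_bar <- mean(y)                          # 6.02

# Slope: sum of cross-deviations divided by sum of squared x-deviations
b1 <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)

# Intercept: forces the fitted line through the point (x_bar, y_bar)
b0 <- y_bar - b1 * x_bar

c(intercept = b0, slope = b1)             # approximately 0.05 and 1.99
```

Here the numerator of the slope is 19.9 and the denominator is 10, so the slope is 1.99 and the intercept is 6.02 − 1.99 × 3 = 0.05.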

The Dataset: mtcars

We will use the built-in mtcars dataset to demonstrate simple linear regression. We want to look at the relationship between the weight of a car and its fuel efficiency.

Variables of interest:

  • Weight (wt) of the car, in 1000 lbs. This is the predictor \(x\).
  • Miles per gallon (mpg). This is the response \(y\).
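The code that produced the data preview is not shown here. A call along the following lines would print the first six rows of the two columns; the exact call the author used is an assumption, and `preview` is a name introduced only for this sketch.

```r
# mtcars ships with base R (datasets package), so no download is needed.
# Select the two columns of interest and keep the first six rows.
preview <- head(mtcars[, c("wt", "mpg")])
print(preview)
```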

##                      wt  mpg
## Mazda RX4         2.620 21.0
## Mazda RX4 Wag     2.875 21.0
## Datsun 710        2.320 22.8
## Hornet 4 Drive    3.215 21.4
## Hornet Sportabout 3.440 18.7
## Valiant           3.460 18.1

## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
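The `Call:` line in the output above shows the model was fitted with `lm(formula = mpg ~ wt, data = mtcars)`. The sketch below refits it and checks that the closed-form least squares formulas reproduce the reported coefficients; the object names (`fit`, `b0`, `b1`) are chosen for this example and are not from the original.

```r
# Refit the model shown in the summary output.
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)

# Recompute the estimates by hand from the least squares formulas.
x <- mtcars$wt
y <- mtcars$mpg
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
b0 <- mean(y) - b1 * mean(x)                                     # intercept

# Compare the hand-computed values with lm()'s estimates.
rbind(by_hand = c(b0, b1), lm = unname(coef(fit)))
```

Both rows should agree with the `Estimate` column above: an intercept of about 37.2851 and a slope of about -5.3445, meaning each additional 1000 lbs of weight is associated with roughly 5.3 fewer miles per gallon.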