2024-10-31

Introduction to simple linear regression

Simple linear regression models the relationship between two variables by fitting a linear equation to the observed data.

General equation for simple linear regression

\[ Y = \beta_0 + \beta_1 x + \epsilon \] where \(Y\) is the dependent variable, \(x\) is the independent variable, \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(\epsilon\) is the error term. The coefficients are estimated by minimizing the sum of squared residuals.
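For a single predictor, the least-squares estimates have a closed form: \(\hat{\beta}_1 = \mathrm{cov}(x, Y) / \mathrm{var}(x)\) and \(\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{x}\). The sketch below checks this on a small made-up dataset (the five `x` and `y` values are illustrative assumptions, not data from this document) and compares the result with `lm()`.

```r
# Toy data, invented purely for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Closed-form least-squares estimates
b1 <- cov(x, y) / var(x)        # slope
b0 <- mean(y) - b1 * mean(x)    # intercept
c(intercept = b0, slope = b1)

# lm() should return the same two numbers
fit <- lm(y ~ x)
all.equal(unname(coef(fit)), c(b0, b1))
```

Both routes minimize the same sum of squared residuals, so they agree up to floating-point error.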

Example dataset

We will use the built-in mtcars dataset to predict mpg (miles per gallon) from wt (weight in 1000 lbs). We will also show an interactive simple linear regression plot of mpg against hp (horsepower).

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
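The preview above has the shape of `head()` output on the built-in data. A minimal sketch that reproduces it, assuming that is how it was generated:

```r
# mtcars ships with base R (datasets package), so nothing needs installing
data(mtcars)
head(mtcars)                           # first six rows, all 11 columns
head(mtcars[, c("mpg", "wt", "hp")])   # only the columns used here
```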

Scatter plot: Weight vs Miles per Gallon

Fitting a linear model

Now let’s fit a linear model to the data.

mod <- lm(mpg ~ wt, data = mtcars)
summary(mod)$coefficients
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 37.285126   1.877627 19.857575 8.241799e-19
## wt          -5.344472   0.559101 -9.559044 1.293959e-10

The fitted line has an intercept of 37.29 and a slope of -5.34, which can be written as: \[\hat{y} = 37.29 - 5.34x\] This indicates that each additional 1000 lbs of weight is associated with a decrease of 5.34 miles per gallon on average.
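As a rough illustration of using the fitted equation, the sketch below predicts mpg for a hypothetical car weighing 3000 lbs (`wt = 3`). The `new_car` data frame is an assumption made up for this example.

```r
# Refit the model so this snippet runs on its own
mod <- lm(mpg ~ wt, data = mtcars)

# Hypothetical car: wt is measured in 1000 lbs, so 3 means 3000 lbs
new_car <- data.frame(wt = 3)
pred <- predict(mod, newdata = new_car)
pred   # about 21.25 mpg

# Same calculation by hand from the unrounded coefficients
unname(coef(mod)[1] + coef(mod)[2] * 3)
```

Using the rounded equation instead gives 37.29 - 5.34 * 3 = 21.27, which differs slightly because of rounding.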

Linear model fit for Weight vs Miles per Gallon

Interactive linear model fit for Horsepower vs Miles per Gallon

Code used for the two plots

Weight vs Miles per Gallon
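library(ggplot2)

# Scatter plot of mpg against weight, with the fitted line and its default confidence band
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "Weight (1000 lbs)", y = "Miles per Gallon")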
Horsepower vs Miles per Gallon
library(plotly)

# Fit the model once, then overlay its fitted values on the scatter plot
mod_hp <- lm(mpg ~ hp, data = mtcars)
plot_ly(data = mtcars, x = ~hp, y = ~mpg, type = "scatter",
        mode = "markers", name = "Data") %>%
  add_lines(x = ~hp, y = fitted(mod_hp),
            name = "Fitted Line") %>%
  layout(
    xaxis = list(title = "Horsepower"),
    yaxis = list(title = "Miles per Gallon")
  )
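The horsepower plot draws its fitted values from `lm(mpg ~ hp, data = mtcars)` without showing the coefficients. A short sketch of how they could be inspected, in the same way as for the weight model:

```r
# Same model the interactive plot uses for its fitted line
mod_hp <- lm(mpg ~ hp, data = mtcars)

# Estimate, standard error, t value, and p-value for intercept and slope
summary(mod_hp)$coefficients
```

The sign of the `hp` estimate gives the direction of the fitted line in the plot.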

Conclusion

Simple linear regression is a powerful statistical tool for understanding the relationship between two variables.