2024-10-25

Linear Regression

Linear regression is a statistical model that estimates the linear relationship between a scalar response and one or more explanatory variables.

Scientists in many fields, including biology and the behavioral, environmental, and social sciences, use linear regression to conduct preliminary data analysis and predict future trends.

In linear regression, the independent variable is the variable that is used to predict the value of another variable, while the dependent variable is the variable that is being predicted.

Mathematical Representation

Equation of Simple Linear Regression: \[ y = {\beta}_0 + {\beta}_1x + \varepsilon \] where \(y\) is the dependent variable, \({\beta}_0\) is the intercept (the value of \(y\) on the line when \(x = 0\)), \({\beta}_1\) is the slope of the regression line, \(x\) is the independent variable, and \(\varepsilon\) is a random error term with mean zero.

Expected Value of \(y\): \[ E(y) = {\beta}_0 + {\beta}_1x \] where \(E(y)\) is the expected value of \(y\) for a given \(x\), \({\beta}_0\) is the intercept, \({\beta}_1\) is the slope, and \(x\) is the independent variable.
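In practice \({\beta}_0\) and \({\beta}_1\) are unknown and must be estimated from data. R's lm() function, used below, does this by ordinary least squares, which for a single predictor has a closed form (\(\bar{x}\) and \(\bar{y}\) are the sample means of \(x\) and \(y\) over \(n\) observations):

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} \]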

Data Set Used

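To illustrate linear regression, we use the mtcars dataset that ships with R. It records fuel consumption and ten aspects of design and performance for 32 automobiles (1973-74 models), taken from the 1974 Motor Trend US magazine.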

data("mtcars")
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
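Only three of the 11 columns are used in the rest of this section. A quick check like the sketch below (the name mt_sub is our own, not part of the original analysis) confirms the size of the data and that there are no missing values before fitting:

```r
# Keep only the columns analysed in this deck:
# mpg (miles per gallon), wt (weight, 1000 lbs) and hp (gross horsepower)
data("mtcars")
mt_sub <- mtcars[, c("mpg", "wt", "hp")]

dim(mt_sub)     # 32 rows, 3 columns
anyNA(mt_sub)   # FALSE: nothing to impute or drop
summary(mt_sub)
```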

Weight vs MPG

library(ggplot2)

# Scatter plot of car weight against fuel efficiency
ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  labs(x = "Weight (in 1000's of lbs)", y = "Miles Per Gallon")

Linear Model of Weight vs MPG

The fitted regression line slopes downward. This means that, on average, each one-unit increase in the independent variable (weight) is associated with a decrease in the dependent variable (MPG).
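The figure for this slide is not reproduced here. Assuming it was drawn the same way as the horsepower plot in the "Behind the scenes" slide, a sketch that overlays the least-squares line on the weight scatter plot is (p_wt is a name we introduce):

```r
library(ggplot2)

# Weight vs MPG with the fitted least-squares line (no confidence band)
p_wt <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Weight (in 1000's of lbs)", y = "Miles Per Gallon")
p_wt
```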

Linear Model Summary

model <- lm(mpg ~ wt, data = mtcars)
wt_coefficient <- summary(model)$coefficients["wt", "Estimate"]
wt_coefficient
## [1] -5.344472

The coefficient for wt is about -5.34: for every 1-unit (1000 lb) increase in weight, predicted MPG decreases by approximately 5.34 miles per gallon. This indicates a negative relationship between weight and fuel efficiency: heavier cars tend to get fewer miles per gallon.
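Because the model is linear, the drop of about 5.34 MPG per 1000 lbs is the same anywhere along the line. The sketch below refits the same model and checks this with predict(); the weights 3 and 4 are arbitrary illustration values.

```r
model <- lm(mpg ~ wt, data = mtcars)

# Predicted MPG for two cars that differ by exactly 1 unit (1000 lbs) of weight
new_cars <- data.frame(wt = c(3, 4))
pred <- predict(model, newdata = new_cars)
pred        # roughly 21.25 and 15.91

# The gap between the two predictions equals the wt coefficient
diff(pred)
```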

  • The coefficient for wt is the slope \({\beta}_1\) in the equation \(E(y) = {\beta}_0 + {\beta}_1x\).
  • The variable wt itself represents \(x\) in the equation.
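lm() estimates \({\beta}_0\) and \({\beta}_1\) by ordinary least squares. With a single predictor the estimates have a closed form, slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x), so they can be reproduced by hand as a sanity check (beta0_hat and beta1_hat are our own names):

```r
x <- mtcars$wt
y <- mtcars$mpg

# Closed-form least-squares estimates for one predictor
beta1_hat <- cov(x, y) / var(x)             # slope
beta0_hat <- mean(y) - beta1_hat * mean(x)  # intercept

c(intercept = beta0_hat, slope = beta1_hat)

# lm() returns the same two numbers
coef(lm(mpg ~ wt, data = mtcars))
```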

Horsepower vs MPG Linear Model

## [1] -0.06822828

The fitted line shows a negative linear relationship between horsepower and miles per gallon. The coefficient, about -0.068, is the expected change in miles per gallon for each one-unit increase in horsepower.

Mathematical Representation

\[ E(y) = {\beta}_0 + {\beta}_1x \] Applied to the horsepower model, this equation gives the expected miles per gallon (\(E(y)\)) for a car with horsepower \(x\).

  • \({\beta_0}\) is the intercept of the regression line, representing the expected MPG when horsepower is zero. A car with zero horsepower is not realistic, so \({\beta_0}\) is best read as an anchor for the line rather than a prediction for an actual vehicle.

  • \({\beta}_1\) is the slope of the regression line: expected MPG decreases by about 0.068 for each one-unit increase in horsepower.

Mathematical Representation Example

For example, if we round the fitted horsepower model to \({\beta_0}\) = 30 and \({\beta}_1\) = -0.068, we can say:

  • When horsepower is zero, the model predicts an MPG of 30.

  • For every additional horsepower, expected MPG decreases by 0.068 (about 6.8 MPG per 100 horsepower), suggesting that higher-horsepower vehicles are generally less fuel-efficient.
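The numbers in this example are rounded. A sketch that instead takes the exact fitted coefficients from the same horsepower model used in the "Behind the scenes" code and plugs them into \(E(y) = {\beta}_0 + {\beta}_1x\) (the horsepower values 0, 100 and 200 are arbitrary; b, hp_values and manual are our own names):

```r
model2 <- lm(mpg ~ hp, data = mtcars)
b <- coef(model2)  # named vector: "(Intercept)" = beta_0, "hp" = beta_1
b

# E(y) = beta_0 + beta_1 * x, evaluated by hand
hp_values <- c(0, 100, 200)
manual <- b[["(Intercept)"]] + b[["hp"]] * hp_values
manual

# predict() gives the same expected values
predict(model2, newdata = data.frame(hp = hp_values))
```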

By examining these coefficients and their meanings, we can better understand the relationship between vehicle characteristics and their performance in terms of fuel consumption.

Behind the Scenes

The code used to produce the horsepower graph and coefficient is shown below.

library(ggplot2)

# Horsepower vs MPG with the least-squares line (no confidence band)
ggplot(mtcars, aes(hp, mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE,
              color = "pink") +
  labs(x = "Horsepower", y = "Miles Per Gallon")

# Fit the model and extract the slope for hp
model2 <- lm(mpg ~ hp, data = mtcars)
hp_coeff <- summary(model2)$coefficients["hp", "Estimate"]
hp_coeff

3D Scatter Plot

Behind the Scenes

The code used to produce the 3D scatter plot in the previous slide is provided below.

library(plotly)  # also provides the %>% pipe

plot_ly(data = mtcars, x = ~hp, y = ~wt, z = ~mpg,
        type = "scatter3d", mode = "markers",
        marker = list(size = 5, color = "deeppink", opacity = 0.9)) %>%
  layout(
    scene = list(
      xaxis = list(title = "Horsepower"),
      yaxis = list(title = "Weight (in 1000's of lbs)"),
      zaxis = list(title = "Miles Per Gallon")
    ),
    title = "3D Scatter Plot of MPG, Horsepower, and Weight"
  )
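The 3D plot shows MPG against horsepower and weight at the same time. Since linear regression allows more than one explanatory variable, a natural follow-up (not part of the original analysis; model3 is our own name) is to fit both predictors in one model. Each coefficient is then the expected change in MPG for a one-unit change in that variable with the other held fixed, so the values generally differ from the single-predictor slopes above.

```r
# Both predictors together: E(mpg) = beta_0 + beta_1 * hp + beta_2 * wt
model3 <- lm(mpg ~ hp + wt, data = mtcars)
coef(model3)
```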

Conclusion

Key Takeaways:

  • The linear regression models show a negative relationship between horsepower and MPG, as well as between weight and MPG.
  • The coefficients quantify how much expected fuel efficiency changes with each one-unit change in the independent variables (horsepower and weight).

Implications:

  • Understanding these relationships can inform consumer choices when purchasing vehicles, emphasizing the trade-offs between performance (horsepower) and fuel efficiency.

Further Research:

  • Further exploration of other factors influencing MPG, such as engine type and vehicle design, could enhance our understanding of fuel efficiency.