2026-06-07

Intro: Linear Regression

Linear regression models the relationship between a continuous dependent variable and one or more independent variables. It fits a straight line through the data, which helps predict outcomes and shows how each factor influences the target variable.

Regression Equation

The model with a single predictor is

\[ Y = \beta_0 + \beta_1X + \epsilon \]

  • The first beta, \(\beta_0\), is the y-intercept, which represents the baseline value of Y when X is 0
  • The second beta, \(\beta_1\), is the slope, which represents the change in Y for every one-unit increase in X
  • X is the independent variable used to predict the target variable
  • Epsilon is the random error term, which accounts for the variation in Y that the line does not explain
  • Y is the target variable
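
As a minimal sketch of how the two betas are estimated, the code below computes the least-squares intercept and slope by hand for the built-in mtcars data (X = wt, Y = mpg) and compares them with what lm() returns. The names x, y, beta0_hat, beta1_hat, and fit are illustrative and not part of the original analysis.

```r
# Sketch: least-squares estimates for Y = beta_0 + beta_1 * X + epsilon,
# using the built-in mtcars data (X = wt, Y = mpg).
x <- mtcars$wt
y <- mtcars$mpg

# Closed-form least-squares estimates for a single predictor
beta1_hat <- cov(x, y) / var(x)              # slope
beta0_hat <- mean(y) - beta1_hat * mean(x)   # intercept

# The same estimates from lm(), for comparison
fit <- lm(mpg ~ wt, data = mtcars)

c(intercept = beta0_hat, slope = beta1_hat)
coef(fit)
```

Both lines of output should agree, since lm() solves the same least-squares problem.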

Assumption

We assume that

\[ \epsilon \sim N(0,\sigma^2) \]

  • Epsilon is the random error term
  • N is the normal distribution
  • 0 is the mean (expected value) of the errors
  • Sigma squared is the variance of the errors
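
The assumption can be checked informally on the residuals of a fitted model. The sketch below assumes the same mpg ~ wt fit on mtcars used elsewhere in this document; the names fit and res are illustrative, and the Shapiro-Wilk test is one possible normality check rather than the only one.

```r
# Sketch: informal checks of the assumption epsilon ~ N(0, sigma^2),
# using the residuals of a fit to the built-in mtcars data.
fit <- lm(mpg ~ wt, data = mtcars)
res <- resid(fit)

mean(res)      # essentially 0 by construction when the model has an intercept
sigma(fit)^2   # estimate of sigma^2 (the squared residual standard error)

# Shapiro-Wilk test: a small p-value would suggest the residuals are not normal
shapiro.test(res)
```

With only 32 observations, a test like this has limited power, so it is best read alongside a residual plot.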

Scatter Plot

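library(ggplot2)  # provides ggplot(), aes(), and the geoms used below

# Scatter plot of weight vs. mpg with a fitted least-squares line
ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  geom_smooth(method = "lm")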
## `geom_smooth()` using formula = 'y ~ x'

Residual Plot

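A residual plot shows how well the line produced by linear regression fits the data. In a good residual plot, the points are scattered around the zero line with no obvious pattern; a curve or funnel shape suggests that a straight line or the constant-variance assumption does not fit.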

model <- lm(mpg ~ wt, data = mtcars)

# Plot residuals against fitted values
ggplot(
  data.frame(fitted = fitted(model), residuals = resid(model)),
  aes(fitted, residuals)
) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed")  # horizontal reference line at residual = 0

Interactive Plot

Here is an interactive version of the scatter plot, built with Plotly, with points colored by number of cylinders.

library(plotly)  # provides plot_ly(), layout(), and the %>% pipe

plot_ly(
  data = mtcars,
  x = ~wt,
  y = ~mpg,
  type = "scatter",
  mode = "markers",
  color = ~factor(cyl)  # color points by number of cylinders
) %>%
  layout(
    title = "Miles per Gallon vs Weight",
    xaxis = list(title = "Weight"),
    yaxis = list(title = "Miles per Gallon")
  )
# I do not know why it doesn't show up in the HTML file

Summary of Data

model <- lm(mpg ~ wt, data = mtcars)
summary(model)
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
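
The numbers in the printed summary can also be pulled out programmatically. The sketch below refits the same model under the illustrative names fit and s; the accessors are standard for lm objects.

```r
# Sketch: extracting pieces of the summary for mpg ~ wt on mtcars
fit <- lm(mpg ~ wt, data = mtcars)
s <- summary(fit)

coef(s)        # matrix of estimates, standard errors, t values, and p-values
s$r.squared    # multiple R-squared
s$sigma        # residual standard error
confint(fit)   # 95% confidence intervals for the intercept and slope
```

Reading the summary this way: the slope estimate of about -5.34 is the change in mpg per one-unit increase in wt, and the R-squared of about 0.75 means that weight alone accounts for roughly three quarters of the variation in mpg in this data set.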

Conclusion

  • The data show that weight is negatively associated with miles per gallon: each one-unit increase in wt (1,000 lbs in mtcars) is associated with about 5.3 fewer miles per gallon.
  • Linear regression summarizes the relationship between these two variables in a form that is easier to interpret.
  • The fitted model can be used to predict the gas mileage of a car from its weight.
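
As a sketch of the prediction step, the code below uses predict() on a hypothetical car with wt = 3 (3,000 lbs, since mtcars records weight in units of 1,000 lbs). The names fit, new_car, and pred are illustrative. From the coefficients in the summary, the point prediction is about 37.29 - 5.34 × 3 ≈ 21.25 mpg.

```r
# Sketch: predicting mpg for a hypothetical car from its weight
fit <- lm(mpg ~ wt, data = mtcars)

new_car <- data.frame(wt = 3)   # hypothetical car; wt is in 1,000 lbs

# Point prediction plus a 95% prediction interval for a single new car
pred <- predict(fit, newdata = new_car, interval = "prediction")
pred
```

The prediction interval is wider than a confidence interval for the mean because it also includes the error term's variance. Predictions for weights far outside the range seen in mtcars should be treated with caution.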