dhubert2 Dat 301 HW3

2026-06-07

Intro: Linear Regression

Linear regression models the relationship between a continuous dependent variable and one or more independent variables. It fits a straight line through the data, which helps predict outcomes and shows how each factor influences the target variable.

Regression Equation

The model is

\[ Y = \beta_0 + \beta_1X + \epsilon \]

The first beta, \(\beta_0\), is the y-intercept, which represents the baseline value of Y when X is 0
The second beta, \(\beta_1\), is the slope of the line and represents the change in Y for every one-unit increase in X
X is the independent variable used to predict the target variable
Epsilon, \(\epsilon\), is the random error term, which accounts for the variation in Y that the line does not explain
Y is the target variable
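As a rough illustration of what the two betas mean, the sketch below estimates them by hand for the built-in mtcars data using the standard least-squares formulas: the slope is the covariance of X and Y divided by the variance of X, and the intercept is the mean of Y minus the slope times the mean of X. The names x, y, b0, b1, and fit are only illustrative.

```r
# Illustrative sketch: estimate the intercept and slope by hand
x <- mtcars$wt   # independent variable (weight, in 1000 lbs)
y <- mtcars$mpg  # target variable (miles per gallon)

b1 <- cov(x, y) / var(x)      # slope estimate
b0 <- mean(y) - b1 * mean(x)  # intercept estimate

# Compare with the estimates that lm() computes
fit <- lm(mpg ~ wt, data = mtcars)
print(round(c(intercept = b0, slope = b1), 4))
print(round(coef(fit), 4))
```

Both print calls should show the same two numbers, since lm() solves the same least-squares problem.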

Assumption

We assume that

\[ \epsilon \sim N(0,\sigma^2) \]

Epsilon is the random error term
N is the normal distribution
0 is the mean (expected value) of the errors
Sigma squared is the variance of the errors, assumed to be the same for every observation
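One informal way to check this assumption is to look at the residuals of a fitted model, which act as estimates of the errors. The sketch below does this for the mpg ~ wt model on mtcars. Using a Shapiro-Wilk test and a normal Q-Q plot is an assumption about what makes a reasonable check, not the only option.

```r
# Illustrative sketch: check the error assumption using the residuals
fit <- lm(mpg ~ wt, data = mtcars)
res <- resid(fit)

mean_res  <- mean(res)         # zero up to rounding when the model has an intercept
sigma_hat <- sigma(fit)        # estimate of sigma (the residual standard error)
sw        <- shapiro.test(res) # Shapiro-Wilk test of normality

print(mean_res)
print(sigma_hat)
print(sw$p.value)

# Normal Q-Q plot: points close to the line suggest roughly normal errors
qqnorm(res)
qqline(res)
```

A small p-value from the test, or points that bend away from the Q-Q line, would suggest the normality assumption is questionable.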

Scatter Plot

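library(ggplot2)  # provides ggplot(), geom_point(), geom_smooth()

# Scatter plot of mpg against weight with the fitted regression line
ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  geom_smooth(method = "lm")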

## `geom_smooth()` using formula = 'y ~ x'

Residual Plot

Residual plots show how well the line created by linear regression fits the data. A good residual plot shows points scattered randomly around the zero line with no obvious pattern.

model <- lm(mpg ~ wt, data = mtcars)

# Plot the residuals against the fitted values
ggplot(
  data.frame(fitted = fitted(model), residuals = resid(model)),
  aes(fitted, residuals)
) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed")  # horizontal reference line at residual = 0

Interactive Plot

Here is an interactive plot made with Plotly, which gives a better view of the data. Points are colored by number of cylinders.

library(plotly)  # provides plot_ly(), layout(), and the %>% pipe

plot_ly(
  data = mtcars,
  x = ~wt,
  y = ~mpg,
  type = "scatter",
  mode = "markers",
  color = ~factor(cyl)  # one color per cylinder count
) %>%
  layout(
    title = "Miles per Gallon vs Weight",
    xaxis = list(title = "Weight (1000 lbs)"),
    yaxis = list(title = "Miles per Gallon")
  )

  #I do not know why it doesn't show up in the html file

Summary of Data

model <- lm(mpg ~ wt, data = mtcars)
summary(model)

## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
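The numbers in this printout can also be pulled out in code, which is handy when reporting them. The sketch below uses base R accessors on the same mpg ~ wt model. The object names fit, s, coef_table, r2, and ci are only illustrative.

```r
# Illustrative sketch: extract values from the model summary
fit <- lm(mpg ~ wt, data = mtcars)
s   <- summary(fit)

coef_table <- coef(s)                  # estimates, standard errors, t values, p-values
r2         <- s$r.squared              # Multiple R-squared
ci         <- confint(fit, level = 0.95)  # 95% confidence intervals for the coefficients

print(round(coef_table, 4))
print(round(r2, 4))
print(ci)
```

The confidence interval for wt gives a range of plausible slopes rather than a single estimate.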

Conclusion

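The data shows that weight is negatively associated with miles per gallon: in the fitted model, each additional 1,000 lbs of weight lowers the predicted fuel economy by about 5.3 mpg.
Linear regression summarizes the relationship between these two variables in a way that is easier to interpret, and weight alone explains about 75% of the variation in mpg (R-squared = 0.7528).
The model created using linear regression can be used to predict the gas mileage of a car from its weight.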
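As an example of using the model for prediction, the sketch below predicts mpg for a hypothetical car weighing 3,000 lbs (wt = 3, since wt in mtcars is measured in 1000 lbs). The weight value and the name new_car are made up for illustration.

```r
# Illustrative sketch: predict mpg for a hypothetical car
fit <- lm(mpg ~ wt, data = mtcars)

new_car <- data.frame(wt = 3)  # hypothetical car weighing 3,000 lbs

# Point prediction with a 95% prediction interval
pred <- predict(fit, newdata = new_car, interval = "prediction", level = 0.95)
print(pred)
```

The fit column should be close to 37.29 - 5.34 * 3, or about 21.25 mpg. The lwr and upr columns give the range in which a single new car of that weight would be expected to fall.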