ADEC7310 Discussion 6

Part 1

I decided to use the mtcars dataset, specifically selecting mpg and weight (wt) as my two variables.

mpg (fuel efficiency in miles per gallon): Dependent (response) variable, because it is the outcome we expect to change as weight changes.

wt (weight in thousands of pounds): Independent (predictor) variable, because it is the presumed cause or predictor. Another way to look at it is that it is the input.

The question I am asking is: what impact does a car’s weight (the input) have on its fuel efficiency?

A. Estimation Equation: \[ \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i \]

(From the text)
Where:
\(\hat{y}_i\) = The predicted/estimated value of the response variable for the i-th observation, given its value \(x_i\).
\(\hat{\beta}_0\) = The estimated y-intercept: the predicted value of y when x = 0.
\(\hat{\beta}_1\) = The estimated slope: the average change in y for a one-unit increase in x.
\(x_i\) = The value of the explanatory (predictor) variable for the i-th observation.
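For reference, the least-squares estimates of the two coefficients have a standard closed form. The sums run over the n observations, and \(\bar{x}\) and \(\bar{y}\) are the sample means. Dividing the numerator and denominator of the slope by \(n - 1\) turns them into the sample covariance and variance, which is what the manual computation with cov() and var() relies on.

```latex
\[
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
             = \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
\]
```

The intercept formula says the fitted line always passes through the point of means \((\bar{x}, \bar{y})\).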

# Load the data
#mtcars <- read.csv("C:/Users/leonedo/Documents/ADEC7310 Discussion 6/mtcars.csv")
data("mtcars")

# Fit the linear regression model: mpg ~ wt
model <- lm(mpg ~ wt, data = mtcars)

# View the results
summary(model)

## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10

Observations:
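Intercept \((\hat{\beta}_0 = 37.285)\): Taken literally, a car weighing 0 pounds would get 37.285 mpg, which is good to know but not very useful. No car weighs 0 pounds, and x = 0 lies well outside the observed weights (about 1,500 to 5,400 lbs), so the intercept behaves more like an anchor point for the regression line than a value that can be meaningfully interpreted.
Slope \((\hat{\beta}_1 = -5.344)\): The negative sign confirms what I think we would all expect: the heavier the car, the less fuel efficient it is. Specifically, each additional 1,000 lbs of weight is associated with an average drop of about 5.34 mpg.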
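To make the slope concrete, the sketch below uses the fitted model to predict mpg for two hypothetical cars weighing 3,000 and 4,000 lbs (weights chosen purely for illustration). Because the cars differ by exactly one unit of wt, the gap between the two predictions should equal the slope estimate.

```r
# Refit the same simple regression so this block runs on its own
data("mtcars")
model <- lm(mpg ~ wt, data = mtcars)

# Two hypothetical cars: 3,000 lbs and 4,000 lbs (wt is measured in 1,000s of lbs)
new_cars <- data.frame(wt = c(3, 4))
pred <- predict(model, newdata = new_cars)
print(round(pred, 3))

# A one-unit increase in wt changes the prediction by exactly the slope
print(round(unname(diff(pred)), 4))
```

The predictions come out to roughly 21.25 and 15.91 mpg, a difference of about -5.34 mpg.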

# Manually compute slope and intercept

# Step 1: Compute means
mean_mpg <- mean(mtcars$mpg)
mean_wt  <- mean(mtcars$wt)

# Step 2: Compute covariance of wt and mpg, and variance of wt
cov_wt_mpg  <- cov(mtcars$wt, mtcars$mpg)
var_wt      <- var(mtcars$wt)

# Step 3: Compute slope
beta1_hat <- cov_wt_mpg / var_wt

# Step 4: Compute intercept
beta0_hat <- mean_mpg - beta1_hat * mean_wt

# Display results
cat("Intercept (beta0):", round(beta0_hat, 4), "\n")

## Intercept (beta0): 37.2851

cat("Slope (beta1)    :", round(beta1_hat, 4), "\n")

## Slope (beta1)    : -5.3445
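As a cross-check, the same estimates can be obtained from the sum-of-deviations form of the formula, without cov() or var(); the \(n - 1\) factors cancel in the ratio, so the result should match both the manual computation above and lm(). The variable names here (beta1_sums, beta0_sums, manual) are illustrative only.

```r
# Re-derive the estimates from sums of deviations and compare them with lm()
data("mtcars")
x <- mtcars$wt
y <- mtcars$mpg

# Slope: sum of cross-deviations divided by the sum of squared deviations of x
beta1_sums <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: the fitted line passes through the point of means
beta0_sums <- mean(y) - beta1_sums * mean(x)

model <- lm(mpg ~ wt, data = mtcars)
manual <- c(beta0_sums, beta1_sums)

# all.equal() compares up to floating-point tolerance; TRUE means the methods agree
print(all.equal(unname(coef(model)), manual))
```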

# Plot
plot(mtcars$wt, mtcars$mpg,
     main = "Vehicle Weight vs. Fuel Efficiency",
     xlab = "Weight (1,000 lbs)",
     ylab = "Fuel Efficiency (mpg)",
     pch  = 16,
     col  = "steelblue")

# Add regression line
abline(model, col = "firebrick", lwd = 2)

# Add legend
legend("topright",
       legend = c("Observed", "Regression Line"),
       pch    = c(16, NA),
       lty    = c(NA, 1),
       col    = c("steelblue", "firebrick"),
       lwd    = 2)

Part 2

For this reply, I happily made use of a few online resources. Below are the ones I found most useful:

https://www.statisticshowto.com/probability-and-statistics/statistics-definitions/least-squares-regression-line/#OLS

https://www.statlect.com/fundamentals-of-statistics/Gauss-Markov-theorem

https://utminers.utep.edu/crboehmer/Bivariate%20Regression.pdf

I used these in addition to one of the resources given to us.

Ordinary Least Squares (OLS) is a method used to determine the regression line (the “best fit”). It does this by choosing the intercept and slope that minimize the sum of squared residuals, that is, the squared vertical distances between the observed values and the line. The Gauss-Markov Theorem then tells us how good that line is: under a set of assumptions (listed below), OLS is the best linear unbiased estimator (BLUE). In other words, no other linear unbiased estimator has a smaller variance.
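To see the “least squares” part concretely, the sketch below writes the sum of squared residuals (SSR) as a function of a candidate intercept and slope, minimizes it numerically with base R’s optim(), and compares the result with lm(). The helper name ssr is hypothetical, and the numerical search is only a check: lm() solves the problem exactly using a QR decomposition.

```r
data("mtcars")
x <- mtcars$wt
y <- mtcars$mpg

# Sum of squared residuals for a candidate line with intercept b[1] and slope b[2]
ssr <- function(b) sum((y - (b[1] + b[2] * x))^2)

# Minimize the SSR numerically, starting from a flat line at zero
fit_num <- optim(par = c(0, 0), fn = ssr, method = "BFGS",
                 control = list(reltol = 1e-12))

model <- lm(mpg ~ wt, data = mtcars)
print(rbind(numerical = fit_num$par, lm = unname(coef(model))))

# Moving away from the OLS line raises the SSR
ols <- unname(coef(model))
print(c(ols = ssr(ols), nudged = ssr(ols + c(0, 0.5))))
```

Both rows of the first printout should agree to several decimal places, and the nudged line should have a visibly larger SSR.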

Gauss-Markov Assumptions/Conditions
- Linearity: The model is linear in its parameters: \(y = \beta_0 + \beta_1 x + \varepsilon\).
- Random Sampling: In order for results to be generalized, the sample must be drawn randomly from the population.
- No Perfect Collinearity: At its simplest, this means that x cannot be constant; it must vary across observations. More broadly, with several regressors, none of them can be an exact linear combination of the others.
- Zero Conditional Mean: The error term has an expected value of zero for any value of x: \(E(\varepsilon \mid x) = 0\).
- Homoscedasticity (my new favorite word): The variance of the errors is constant across all levels of the independent variable: \(\operatorname{Var}(\varepsilon \mid x) = \sigma^2\). The residuals are our estimates of these errors.
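Some of these conditions can be probed informally with the fitted model. The sketch below draws base R’s residuals-vs-fitted plot and computes, by hand, a Koenker-style Breusch-Pagan statistic: regress the squared residuals on wt and compare n × R² with a chi-squared distribution with 1 degree of freedom. Note that the residuals averaging to zero is guaranteed whenever the model includes an intercept, so that alone does not confirm the zero conditional mean assumption. These are rough checks, not proofs that the assumptions hold.

```r
# Refit the model so this block runs on its own
data("mtcars")
model <- lm(mpg ~ wt, data = mtcars)
res <- resid(model)

# Residuals vs fitted values: curvature suggests non-linearity,
# a funnel shape suggests heteroscedasticity
plot(model, which = 1)

# With an intercept in the model, OLS residuals average to zero by construction
print(mean(res))

# Koenker-style Breusch-Pagan check: regress squared residuals on wt,
# then compare n * R^2 with a chi-squared distribution (1 df)
aux_data <- data.frame(e2 = res^2, wt = mtcars$wt)
aux <- lm(e2 ~ wt, data = aux_data)
bp_stat <- nrow(mtcars) * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)
print(c(statistic = bp_stat, p_value = bp_p))
```

A small p-value would be evidence against constant error variance; a large one only means this particular check found no strong evidence of heteroscedasticity.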

I am not going to pretend I fully understand all of the assumptions above, but I do know that the more closely they are met, the more we can rely on the OLS estimates and the conclusions we draw from them.

I can’t wait to read everyone else’s responses here.

Dan Leone

2026-02-18