I decided to use the mtcars dataset, specifically selecting mpg and wt (weight) as my two variables.
mpg (fuel efficiency in miles per gallon): the dependent variable, because it depends on the weight variable.
wt (weight in thousands of pounds): the independent variable, because it is the presumed cause, or predictor. Another way to look at it is that it is the input.
The question I am asking is: what impact does the weight (input) of a car have on its fuel efficiency?
A. Estimation Equation: \[ \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i \]
(From the text)
Where:
- \(\hat{y}_i\) = the predicted (estimated) value of the response variable for a given value \(x_i\).
- \(\hat{\beta}_0\) = the estimated y-intercept: the predicted value of y when x = 0.
- \(\hat{\beta}_1\) = the estimated slope: the estimated average change in y for a one-unit increase in x.
- \(x_i\) = the value of the explanatory (predictor) variable for observation i.

Here, y is mpg and x is wt.
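With a single predictor, the OLS estimates of these two parameters have a standard closed form. In it, \(\bar{x}\) and \(\bar{y}\) are the sample means. This is the same calculation the manual computation below does with `cov()` and `var()`; the \(n - 1\) denominators of the sample covariance and variance cancel in the ratio:

\[
\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
\]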
# Load the built-in mtcars dataset (part of base R's datasets package)
data("mtcars")
# Fit the linear regression model: mpg ~ wt
model <- lm(mpg ~ wt, data = mtcars)
# View the results
summary(model)
##
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -4.5432 -2.3647 -0.1252  1.4096  6.8727
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared: 0.7528, Adjusted R-squared: 0.7446
## F-statistic: 91.38 on 1 and 30 DF, p-value: 1.294e-10
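The Multiple R-squared of 0.7528 means weight accounts for roughly 75% of the variation in mpg in this sample. The sketch below uses base R only and recomputes R-squared two ways. First, from the residual and total sums of squares. Second, because there is only one predictor, as the squared correlation between wt and mpg:

```r
# Recompute R-squared for mpg ~ wt (assumes the built-in mtcars data)
data("mtcars")
model <- lm(mpg ~ wt, data = mtcars)

ssr <- sum(residuals(model)^2)                 # residual sum of squares
sst <- sum((mtcars$mpg - mean(mtcars$mpg))^2)  # total sum of squares
r2_manual <- 1 - ssr / sst

# With a single predictor, R-squared also equals the squared correlation
r2_cor <- cor(mtcars$wt, mtcars$mpg)^2

round(c(manual = r2_manual, from_cor = r2_cor, lm = summary(model)$r.squared), 4)
```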
Observations:
Intercept \((\hat{\beta}_0 = 37.285)\):
Taken literally, a car weighing 0 pounds would get 37.285 mpg. Obviously, no car weighs 0 pounds, and x = 0 lies far outside the range of weights in the data. So this intercept behaves more like an anchor point for the regression line than a value that can be meaningfully interpreted.
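A quick way to see why the intercept shouldn't be read literally is to check the range of weights actually in the data. A weight of 0 is nowhere near it, so a prediction at wt = 0 is an extrapolation:

```r
# Observed range of car weights (in 1,000 lbs) in the built-in mtcars data
data("mtcars")
range(mtcars$wt)

# TRUE means x = 0 lies below every observed weight
0 < min(mtcars$wt)
```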
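Slope \((\hat{\beta}_1 = -5.344)\):
The negative value indicates what I think we would all expect: the heavier the car, the less fuel efficient it is. Specifically, each additional 1,000 lbs of weight is associated with an average decrease of about 5.34 mpg.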
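To make the slope concrete, here is a small sketch that uses `predict()` on two hypothetical weights, 3,000 and 4,000 lbs. These values are chosen only for illustration and fall inside the observed range. The gap between the two predictions is exactly the slope estimate:

```r
data("mtcars")
model <- lm(mpg ~ wt, data = mtcars)

# Hypothetical cars weighing 3,000 and 4,000 lbs (wt is measured in 1,000s of lbs)
new_cars <- data.frame(wt = c(3, 4))
preds <- predict(model, newdata = new_cars)
preds

# A one-unit (1,000 lb) increase in wt changes predicted mpg by the slope
unname(diff(preds))
unname(coef(model)["wt"])
```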
# Manually compute slope and intercept
# Step 1: Compute means
mean_mpg <- mean(mtcars$mpg)
mean_wt <- mean(mtcars$wt)
# Step 2: Compute covariance of wt and mpg, and variance of wt
cov_wt_mpg <- cov(mtcars$wt, mtcars$mpg)
var_wt <- var(mtcars$wt)
# Step 3: Compute slope
beta1_hat <- cov_wt_mpg / var_wt
# Step 4: Compute intercept
beta0_hat <- mean_mpg - beta1_hat * mean_wt
# Display results
cat("Intercept (beta0):", round(beta0_hat, 4), "\n")
## Intercept (beta0): 37.2851
cat("Slope (beta1) :", round(beta1_hat, 4), "\n")
## Slope (beta1) : -5.3445
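The manual results match the `lm()` output to four decimals. The sketch below checks that match programmatically. This time it computes the slope directly from the centered sums of products instead of `cov()` and `var()`:

```r
data("mtcars")
model <- lm(mpg ~ wt, data = mtcars)

x <- mtcars$wt
y <- mtcars$mpg

# Slope from centered cross-products (same value as cov/var; the n - 1 terms cancel)
beta1_sums <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0_sums <- mean(y) - beta1_sums * mean(x)

# Compare with lm(); all.equal() tolerates tiny floating-point differences
all.equal(unname(coef(model)), c(beta0_sums, beta1_sums))
```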
# Plot
plot(mtcars$wt, mtcars$mpg,
main = "Vehicle Weight vs. Fuel Efficiency",
xlab = "Weight (1,000 lbs)",
ylab = "Fuel Efficiency (mpg)",
pch = 16,
col = "steelblue")
# Add regression line
abline(model, col = "firebrick", lwd = 2)
# Add legend
legend("topright",
legend = c("Observed", "Regression Line"),
pch = c(16, NA),
lty = c(NA, 1),
col = c("steelblue", "firebrick"),
lwd = 2)
### Part 2
For this reply, I happily made use of a few online resources. The following are the ones I found most useful:
https://www.statlect.com/fundamentals-of-statistics/Gauss-Markov-theorem
https://utminers.utep.edu/crboehmer/Bivariate%20Regression.pdf
I used these in addition to one of the resources given to us.
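Ordinary Least Squares (OLS) is a method for finding the regression line of “best fit”. It chooses the intercept and slope that minimize the sum of squared residuals, that is, the squared vertical distances between the observed and predicted values. The Gauss-Markov theorem then says that OLS is the best linear unbiased estimator (BLUE) if certain assumptions hold:
- the model is linear in its parameters;
- the errors have a mean of zero given x;
- the errors have constant variance and are uncorrelated with each other;
- x is not constant.

In other words, no other linear unbiased estimator gives you a smaller variance.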
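The "least squares" part can be checked directly: the OLS intercept and slope give a smaller sum of squared residuals than nearby alternative lines. This is a rough sketch. The nudge sizes are arbitrary choices for illustration, not anything from the readings:

```r
data("mtcars")
model <- lm(mpg ~ wt, data = mtcars)

# Sum of squared residuals for a candidate intercept b[1] and slope b[2]
ssr <- function(b) sum((mtcars$mpg - (b[1] + b[2] * mtcars$wt))^2)

b_ols <- unname(coef(model))
ssr_ols <- ssr(b_ols)

# Nudge the intercept and/or slope away from the OLS values
nudges <- list(c(0.5, 0), c(-0.5, 0), c(0, 0.5), c(0, -0.5), c(0.5, -0.5))
ssr_alt <- sapply(nudges, function(d) ssr(b_ols + d))

ssr_ols
all(ssr_alt > ssr_ols)  # TRUE: every nudged line fits worse
```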
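I am not going to pretend I fully understood all of these assumptions, but I do know that OLS estimates are more reliable when they are met. For example, if the error variance is not constant, the OLS estimates are still unbiased, but they are no longer the most precise, and the usual standard errors can be misleading.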
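One informal way to start checking a couple of these assumptions is a residuals-vs-fitted plot. This is a generic diagnostic sketch in base R, not a formal test, and it can only suggest problems:

```r
data("mtcars")
model <- lm(mpg ~ wt, data = mtcars)

# A curved pattern hints at non-linearity;
# a funnel shape hints at non-constant error variance
plot(fitted(model), residuals(model),
     xlab = "Fitted mpg", ylab = "Residual",
     main = "Residuals vs. Fitted", pch = 16, col = "steelblue")
abline(h = 0, lty = 2)

# With an intercept, OLS residuals always average to (numerically) zero.
# This is a property of the fit, not evidence that any assumption holds.
mean(residuals(model))
```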
I can’t wait to read everyone else’s responses here.