mtcars
Linear_Model <- lm(mpg ~ wt + hp, data = mtcars)
summary(Linear_Model)
#Question 3:
par(mfrow = c(2,2))
plot(Linear_Model)
###Linear regression assumes linearity, independence, homoscedasticity, normality of errors, and no influential outliers. The Residuals vs Fitted plot shows slight curvature, suggesting minor deviation from strict linearity, though residuals are generally centered around zero. The Scale-Location plot shows a mild upward trend, indicating possible slight heteroscedasticity, but not severe. The Q-Q plot demonstrates that residuals approximately follow a normal distribution with minor deviation in the tails. The Residuals vs Leverage plot identifies a few moderately influential points, but none exceed Cook’s distance thresholds. Because the dataset contains different car models, independence of observations is reasonably assumed. Overall, the assumptions of linear regression are reasonably satisfied for this dataset.
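#The visual checks above can be backed with numbers. This is a minimal sketch, not part of the original answer: the Shapiro-Wilk test and the 4/n Cook's distance cutoff are assumptions chosen for illustration. The 4/n rule of thumb is stricter than the 0.5 and 1 contours drawn by plot(), so it may flag points the plot does not.

```r
# Refit the additive model so this block runs on its own
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Shapiro-Wilk test on the residuals: a small p-value would suggest non-normal errors
sw <- shapiro.test(residuals(fit))
print(sw)

# Cook's distance for every car; 4/n is a rule-of-thumb cutoff, not a formal test
cd <- cooks.distance(fit)
cutoff <- 4 / nrow(mtcars)
print(sort(cd[cd > cutoff], decreasing = TRUE))
```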
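###Question 4
mean(residuals(Linear_Model)^2)
#The Mean Squared Error (MSE) of the model is approximately 6.10. This indicates that, on average, the squared prediction error of mpg is about 6.10 mpg² (an RMSE of roughly 2.47 mpg). Because it is computed on the same data used to fit the model, it is an in-sample error.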
###Question 5
#R^2 = 0.8268 (82.68%), which means about 82.7% of the variation in mpg is explained by weight and horsepower in the model. That is considered a strong fit.
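#As a cross-check, MSE and R^2 can be computed by hand from the residual and total sums of squares. This is a minimal sketch; the names fit, rss, tss, mse and r2 are introduced here for illustration only.

```r
# Refit the additive model so this block runs on its own
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Residual and total sums of squares
rss <- sum(residuals(fit)^2)
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

# In-sample MSE divides by n; R^2 is the share of total variation explained
mse <- rss / nrow(mtcars)
r2 <- 1 - rss / tss

# Adjusted R^2 is taken from summary() for comparison
adj_r2 <- summary(fit)$adj.r.squared
print(c(MSE = mse, RMSE = sqrt(mse), R2 = r2, adj_R2 = adj_r2))
```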
###Question 6
Linear_Model_Int <- lm(mpg ~ wt * hp, data = mtcars)
summary(Linear_Model_Int)
#The new model includes an interaction between weight and horsepower, meaning the effect of weight on mpg depends on the level of horsepower. The positive interaction coefficient (0.02785) suggests that as horsepower increases, the negative impact of weight on fuel efficiency becomes less severe. R² never decreases when a term is added, so the rise in R² alone is not decisive; because the interaction term is also statistically significant, it meaningfully enhances the model’s explanatory power.
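#One way to check whether the interaction earns its place is a nested-model F-test alongside adjusted R^2. This is a minimal sketch under that assumption; fit_add, fit_int and cmp are illustrative names, not objects from the answer above.

```r
# Fit the additive and interaction models side by side
fit_add <- lm(mpg ~ wt + hp, data = mtcars)
fit_int <- lm(mpg ~ wt * hp, data = mtcars)

# F-test for the single extra term (wt:hp) in the larger model
cmp <- anova(fit_add, fit_int)
print(cmp)

# Adjusted R^2 penalizes the extra parameter, unlike plain R^2
print(c(additive = summary(fit_add)$adj.r.squared,
        interaction = summary(fit_int)$adj.r.squared))
```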
###Question 7
# Find percentiles
p5 <- quantile(mtcars$hp, 0.05)
p95 <- quantile(mtcars$hp, 0.95)
# Cap horsepower at the 5th and 95th percentiles (winsorization)
hp_win <- mtcars$hp
hp_win[hp_win < p5] <- p5
hp_win[hp_win > p95] <- p95
Linear_Model_Win <- lm(mpg ~ wt + hp_win, data = mtcars)
summary(Linear_Model_Win)
#After applying 5%/95% winsorization to horsepower, the R² increased slightly from 0.8268 to 0.8330. This indicates a small improvement in explanatory power, suggesting that extreme horsepower values had a minor influence on the model. The coefficient for weight changed from -3.8778 to -3.5828, indicating a slightly weaker negative effect. The horsepower coefficient changed from -0.03177 to -0.03952, becoming slightly more negative. Overall, winsorization did not dramatically change model performance, suggesting that the original model was relatively robust to outliers in horsepower.
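#The capping step can be wrapped in a small reusable function. This is a minimal sketch: winsorize() is a hypothetical helper written for this example, not a base R function, and it works on a copy of mtcars so the original data stay untouched.

```r
# Hypothetical helper: clamp x to its lower/upper quantiles
winsorize <- function(x, lower = 0.05, upper = 0.95) {
  bounds <- quantile(x, probs = c(lower, upper), na.rm = TRUE, names = FALSE)
  pmin(pmax(x, bounds[1]), bounds[2])
}

# Work on a copy and refit with the capped predictor
cars <- mtcars
cars$hp_win <- winsorize(cars$hp)
fit_win <- lm(mpg ~ wt + hp_win, data = cars)
print(summary(fit_win)$r.squared)

# Number of cars whose horsepower was actually changed by the capping
print(sum(cars$hp_win != cars$hp))
```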
#Question 8
# install.packages("car") # run once if the package is not installed
library(car)
vif(Linear_Model)
##The VIF values for both weight and horsepower are approximately 1.77. Since these values are well below the common threshold of 5, there is no evidence of serious multicollinearity. Although weight and horsepower are somewhat correlated, the level of multicollinearity is low and does not significantly inflate the variance of the estimated coefficients.
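#The VIF can also be reproduced without the car package, since VIF = 1 / (1 - R^2) from regressing one predictor on the others. This is a minimal sketch; with only two predictors the value is the same for weight and horsepower. The names aux and vif_manual are illustrative.

```r
# Auxiliary regression of one predictor on the other
aux <- lm(wt ~ hp, data = mtcars)

# VIF from the auxiliary R^2
vif_manual <- 1 / (1 - summary(aux)$r.squared)
print(vif_manual)

# The pairwise correlation that drives this VIF
print(cor(mtcars$wt, mtcars$hp))
```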