DATA605_Discussion12

We use the mtcars dataset to predict miles per gallon (mpg), the dependent variable.

The model includes horsepower (hp), weight (wt), and transmission type (am).
We add horsepower squared (hp^2) to capture non-linearity in the horsepower effect.
An interaction term (am:wt) tests whether transmission type changes how weight affects mpg.

Building the Model with Specified Terms:

   data(mtcars)

   # Create a quadratic term for hp
   mtcars$hp2 <- mtcars$hp^2
   
   model <- lm(mpg ~ hp + wt + am + hp2 + am:wt, data=mtcars)
   
   summary(model)

## 
## Call:
## lm(formula = mpg ~ hp + wt + am + hp2 + am:wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.6762 -1.5665 -0.1225  1.2939  4.1822 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.593e+01  2.962e+00  12.130  3.3e-12 ***
## hp          -1.113e-01  3.038e-02  -3.665  0.00111 ** 
## wt          -1.999e+00  7.691e-01  -2.598  0.01522 *  
## am           1.203e+01  3.568e+00   3.372  0.00235 ** 
## hp2          2.320e-04  8.003e-05   2.899  0.00752 ** 
## wt:am       -4.093e+00  1.290e+00  -3.172  0.00386 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.066 on 26 degrees of freedom
## Multiple R-squared:  0.9014, Adjusted R-squared:  0.8825 
## F-statistic: 47.56 on 5 and 26 DF,  p-value: 2.9e-12
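
One way to check whether the squared and interaction terms earn their place is a partial F-test against a main-effects-only model. The sketch below is a minimal illustration, not part of the original analysis; the names base_model, full_model and comparison are hypothetical, and the data are reloaded so the snippet runs on its own.

```r
data(mtcars)
mtcars$hp2 <- mtcars$hp^2

# Baseline: main effects only (no squared term, no interaction)
base_model <- lm(mpg ~ hp + wt + am, data = mtcars)

# Full model: same specification as the one fitted above
full_model <- lm(mpg ~ hp + wt + am + hp2 + am:wt, data = mtcars)

# Partial F-test of the two added terms (hp2 and am:wt) taken jointly
comparison <- anova(base_model, full_model)
print(comparison)
```

A small p-value in the second row would indicate that hp2 and am:wt together explain variance that the main effects alone do not.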

Interpretation of Coefficients:

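Intercept: Predicted mpg is 35.93 for an automatic car (am = 0) with hp = 0 and wt = 0. This is an extrapolation with no practical meaning.
hp: Holding the other terms fixed, the slope of mpg with respect to horsepower is -0.1113 + 2(0.000232)hp, so -0.1113 mpg per hp is the slope at hp = 0 and the decline flattens as horsepower rises.
wt: For automatic cars (am = 0), each additional 1,000 lbs of weight is associated with a 1.999 mpg decrease, holding horsepower constant.
am: 12.03 is the manual-minus-automatic difference in predicted mpg at wt = 0, not an average difference. Because of the interaction, the difference at a given weight is 12.03 - 4.093 * wt, which reaches zero near wt = 2.94 (about 2,940 lbs).
hp2: The positive coefficient (0.000232) makes the horsepower curve convex: the mpg penalty per additional hp shrinks as hp grows, and the fitted curve bottoms out near 240 hp.
am:wt: For manual cars, the weight slope is -1.999 - 4.093 = -6.092 mpg per 1,000 lbs, so weight reduces mpg about 4.093 mpg per 1,000 lbs more steeply than in automatics.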
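
Because the model contains a squared term and an interaction, no single coefficient gives the full effect of hp, wt, or am. The sketch below derives the conditional slopes from the fitted coefficients. It is a minimal illustration; the helper names (am_gap, hp_slope, and so on) are hypothetical, and the model is refitted so the snippet runs on its own.

```r
data(mtcars)
mtcars$hp2 <- mtcars$hp^2
model <- lm(mpg ~ hp + wt + am + hp2 + am:wt, data = mtcars)
b <- coef(model)   # R labels the interaction coefficient "wt:am"

# Weight slope (mpg per 1,000 lbs) by transmission type
wt_slope_auto   <- b[["wt"]]
wt_slope_manual <- b[["wt"]] + b[["wt:am"]]

# Manual-minus-automatic difference in predicted mpg at a given weight
am_gap <- function(wt) b[["am"]] + b[["wt:am"]] * wt

# Weight (in 1,000 lbs) at which that difference is zero
wt_crossover <- -b[["am"]] / b[["wt:am"]]

# Marginal effect of horsepower: d(mpg)/d(hp) = b_hp + 2 * b_hp2 * hp
hp_slope <- function(hp) b[["hp"]] + 2 * b[["hp2"]] * hp

# Horsepower at which the fitted curve stops decreasing
hp_turning_point <- -b[["hp"]] / (2 * b[["hp2"]])

print(round(c(wt_slope_auto = wt_slope_auto,
              wt_slope_manual = wt_slope_manual,
              wt_crossover = wt_crossover,
              hp_turning_point = hp_turning_point), 3))

# Transmission gap at light, median, and heavy weights in the data
print(round(am_gap(quantile(mtcars$wt, c(0.1, 0.5, 0.9))), 2))
```

Evaluating am_gap() at several weights shows how the sign of the transmission effect depends on how heavy the car is.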

Residual Analysis:

   par(mfrow = c(2, 2))
   plot(model)

The Residuals vs. Fitted plot suggests possible non-linearity or heteroscedasticity.
The Q-Q plot indicates that the residuals are approximately normal, with a few potential outliers.
The Scale-Location plot shows possible signs of heteroscedasticity.
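
The visual impressions above can be backed up with formal checks. The sketch below uses only base R and was not part of the original analysis: a Shapiro-Wilk test for normality of the residuals, a studentized Breusch-Pagan statistic computed by hand (an auxiliary regression of the squared residuals on the same predictors), and Cook's distances to flag influential cars. The names aux_data, aux, bp_stat and bp_p are hypothetical.

```r
data(mtcars)
mtcars$hp2 <- mtcars$hp^2
model <- lm(mpg ~ hp + wt + am + hp2 + am:wt, data = mtcars)

# Normality of residuals: Shapiro-Wilk test (null hypothesis: normal)
sw <- shapiro.test(residuals(model))

# Heteroscedasticity: regress squared residuals on the same predictors.
# The statistic n * R^2 is compared with a chi-squared distribution
# whose degrees of freedom equal the number of predictors.
aux_data <- transform(mtcars, sq_res = residuals(model)^2)
aux <- lm(sq_res ~ hp + wt + am + hp2 + am:wt, data = aux_data)
n <- nobs(model)
k <- length(coef(aux)) - 1   # predictors, excluding the intercept
bp_stat <- n * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = k, lower.tail = FALSE)

# Influence: cars with the largest Cook's distances
cooks <- sort(cooks.distance(model), decreasing = TRUE)

print(sw)
print(c(bp_stat = bp_stat, bp_p = bp_p))
print(head(cooks, 3))
```

Small p-values would count against normality or constant variance respectively. With only 32 observations both tests have limited power, so they complement the plots and do not replace them.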

Model Appropriateness:

The model fits the data well: R-squared is 0.9014 (adjusted 0.8825), so it explains about 90% of the variance in mpg. However, the residual plots suggest possible violations of the linearity and constant-variance assumptions, and with only 32 observations and six estimated coefficients the fit should be read with some caution. The model is therefore broadly appropriate, but these issues may need to be addressed, for example by transforming variables or by considering a different model form.
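
As one possible remedy, the sketch below refits the same specification with log(mpg) as the response and compares in-sample error on the original mpg scale. The log transform is an assumption chosen for illustration, not something the original analysis tested, and the names log_model and rmse are hypothetical. Back-transforming with exp() gives a median-type prediction, so the comparison is only approximate.

```r
data(mtcars)
mtcars$hp2 <- mtcars$hp^2

# Original specification and a log-response alternative
model     <- lm(mpg ~ hp + wt + am + hp2 + am:wt, data = mtcars)
log_model <- lm(log(mpg) ~ hp + wt + am + hp2 + am:wt, data = mtcars)

# Root mean squared error, measured on the original mpg scale
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
rmse_linear <- rmse(mtcars$mpg, fitted(model))
rmse_log    <- rmse(mtcars$mpg, exp(fitted(log_model)))
print(c(linear = rmse_linear, log = rmse_log))

# Re-check the diagnostic plots for the transformed model
par(mfrow = c(2, 2))
plot(log_model)
```

If the Scale-Location plot for log_model looks flatter than the original, the transformation is helping with the variance issue. If not, a different model form may be the better route.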

Haig Bedros

2024-04-10