This is the R portion of your mid-term exam. You will analyze the
Auto dataset, which contains information about various car models
(similar to mtcars). Follow the instructions carefully and
write your R code in the provided chunks. You will be graded on the
correctness of your code, the quality of your analysis, and your
interpretation of the results.
Total points: 10
Time allowed: 45 minutes
Good luck!
Auto, and display the first few rows. (1 point)
Auto <- read.csv("Auto.csv")
head(Auto)
## mpg cylinders displacement horsepower weight acceleration year origin
## 1 18 8 307 130 3504 12.0 70 1
## 2 15 8 350 165 3693 11.5 70 1
## 3 18 8 318 150 3436 11.0 70 1
## 4 16 8 304 150 3433 12.0 70 1
## 5 17 8 302 140 3449 10.5 70 1
## 6 15 8 429 198 4341 10.0 70 1
## name
## 1 chevrolet chevelle malibu
## 2 buick skylark 320
## 3 plymouth satellite
## 4 amc rebel sst
## 5 ford torino
## 6 ford galaxie 500
str(Auto)
## 'data.frame': 392 obs. of 9 variables:
## $ mpg : num 18 15 18 16 17 15 14 14 14 15 ...
## $ cylinders : int 8 8 8 8 8 8 8 8 8 8 ...
## $ displacement: num 307 350 318 304 302 429 454 440 455 390 ...
## $ horsepower : int 130 165 150 150 140 198 220 215 225 190 ...
## $ weight : int 3504 3693 3436 3433 3449 4341 4354 4312 4425 3850 ...
## $ acceleration: num 12 11.5 11 12 10.5 10 9 8.5 10 8.5 ...
## $ year : int 70 70 70 70 70 70 70 70 70 70 ...
## $ origin : int 1 1 1 1 1 1 1 1 1 1 ...
## $ name : chr "chevrolet chevelle malibu" "buick skylark 320" "plymouth satellite" "amc rebel sst" ...
dim(Auto)
## [1] 392 9
#There are 392 observations and 9 variables in the dataset.
Auto_Num_Only <- Auto[sapply(Auto, is.numeric)]
Auto_matrix <- cor(Auto_Num_Only)
print(Auto_matrix)
## mpg cylinders displacement horsepower weight
## mpg 1.0000000 -0.7776175 -0.8051269 -0.7784268 -0.8322442
## cylinders -0.7776175 1.0000000 0.9508233 0.8429834 0.8975273
## displacement -0.8051269 0.9508233 1.0000000 0.8972570 0.9329944
## horsepower -0.7784268 0.8429834 0.8972570 1.0000000 0.8645377
## weight -0.8322442 0.8975273 0.9329944 0.8645377 1.0000000
## acceleration 0.4233285 -0.5046834 -0.5438005 -0.6891955 -0.4168392
## year 0.5805410 -0.3456474 -0.3698552 -0.4163615 -0.3091199
## origin 0.5652088 -0.5689316 -0.6145351 -0.4551715 -0.5850054
## acceleration year origin
## mpg 0.4233285 0.5805410 0.5652088
## cylinders -0.5046834 -0.3456474 -0.5689316
## displacement -0.5438005 -0.3698552 -0.6145351
## horsepower -0.6891955 -0.4163615 -0.4551715
## weight -0.4168392 -0.3091199 -0.5850054
## acceleration 1.0000000 0.2903161 0.2127458
## year 0.2903161 1.0000000 0.1815277
## origin 0.2127458 0.1815277 1.0000000
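The full matrix above is hard to scan for the variables most related to mpg. As a minimal sketch, assuming the built-in mtcars data as a stand-in for Auto (so the block runs without Auto.csv), the correlations with mpg can be pulled out of the matrix and sorted by absolute strength. The object names (dat, num_only, mpg_cor_sorted) are illustrative only.

```r
# Stand-in data: built-in mtcars (an assumption; swap in Auto for the exam data).
dat <- mtcars

# Keep only the numeric columns, mirroring the sapply() call above.
num_only <- dat[sapply(dat, is.numeric)]

# Correlation matrix of all numeric variables.
cor_mat <- cor(num_only)

# Correlations of every other variable with mpg, strongest (in absolute value) first.
mpg_cor <- cor_mat["mpg", colnames(cor_mat) != "mpg"]
mpg_cor_sorted <- mpg_cor[order(abs(mpg_cor), decreasing = TRUE)]
print(round(mpg_cor_sorted, 3))
```

Sorting by absolute value keeps strong negative correlations (such as weight) at the top rather than at the bottom of the list.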
plot() or ggplot()). Add a title and proper
axis labels. You don’t need to interpret the result here but you should
know how. (1 point)
plot(Auto$mpg, Auto$weight,
     main = "Scatterplot of mpg vs weight",
     xlab = "mpg", ylab = "weight")
# Your code here. Again, this is optional, no credit. Maybe come back when you have finished all other questions.
m1 <- lm(mpg ~ weight + horsepower + year, data = Auto)
summary(m1)
##
## Call:
## lm(formula = mpg ~ weight + horsepower + year, data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.7911 -2.3220 -0.1753 2.0595 14.3527
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.372e+01 4.182e+00 -3.281 0.00113 **
## weight -6.448e-03 4.089e-04 -15.768 < 2e-16 ***
## horsepower -5.000e-03 9.439e-03 -0.530 0.59663
## year 7.487e-01 5.212e-02 14.365 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.43 on 388 degrees of freedom
## Multiple R-squared: 0.8083, Adjusted R-squared: 0.8068
## F-statistic: 545.4 on 3 and 388 DF, p-value: < 2.2e-16
weight. What
do they tell us about the relationship between the predictors and ‘mpg’?
(1 point)
# When all predictors are 0, the predicted mpg is -13.72 (the intercept). That has no real-world meaning, since no car has zero weight, but it is the mathematical intercept.
# Holding the other variables constant, each 1-unit increase in weight lowers predicted mpg by about 0.0064 (p < 2e-16).
# Each 1-unit increase in horsepower lowers predicted mpg by about 0.005, but this effect is not statistically significant (p = 0.597).
# Each 1-unit increase in year raises predicted mpg by about 0.749 (p < 2e-16), so newer cars are more fuel-efficient, as expected.
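Because the raw coefficients are per single unit of each predictor, rescaling them to larger steps can make the sizes easier to compare. A minimal sketch using the estimates printed in the summary(m1) output above; the step sizes (1000 units of weight, 10 horsepower, 1 year) are illustrative assumptions, not part of the exam.

```r
# Estimates copied from the summary(m1) output above.
est <- c(weight = -6.448e-03, horsepower = -5.000e-03, year = 7.487e-01)

# Illustrative step sizes (assumed for this example only).
step <- c(weight = 1000, horsepower = 10, year = 1)

# Predicted change in mpg for each step, holding the other predictors fixed.
delta_mpg <- est * step
print(delta_mpg)
```

Each result is just the coefficient multiplied by the step, so the sign of the effect is unchanged and only the scale differs.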
par(mfrow = c(2, 2))
plot(m1)
There is slight nonlinearity: the points in the residuals vs fitted plot do not scatter randomly around 0, and the smoothed line shows a slight curve. There are also potential issues at the tail ends of the residuals, specifically the upper tail.
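The residuals vs fitted panel can also be rebuilt by hand, which makes it clearer what the curved line is showing. A hedged sketch on the built-in mtcars data (a stand-in for Auto, with wt and hp playing the roles of weight and horsepower); the lowess() smooth here is an approximation of the trend line that plot() draws on an lm object, not necessarily the identical curve.

```r
# Stand-in model on mtcars (an assumption; the exam model uses Auto).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Extract residuals and fitted values from the model object.
res <- resid(fit)
fv <- fitted(fit)

# Residuals vs fitted values with a zero reference line and a smooth trend.
plot(fv, res,
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)
lines(lowess(fv, res), col = "red")
```

If the linearity assumption holds, the smooth should stay close to the dashed zero line; a systematic bend suggests a missing nonlinear term.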
m1 <- lm(mpg ~ weight + horsepower + year, data = Auto)
summary(m1)$r.squared
## [1] 0.8083189
str(m1)
## List of 12
## $ coefficients : Named num [1:4] -13.71936 -0.00645 -0.005 0.74871
## ..- attr(*, "names")= chr [1:4] "(Intercept)" "weight" "horsepower" "year"
## $ residuals : Named num [1:392] 2.553 0.946 2.214 0.195 1.248 ...
## ..- attr(*, "names")= chr [1:392] "1" "2" "3" "4" ...
## $ effects : Named num [1:392] -464.21 128.44 18.09 49.28 1.03 ...
## ..- attr(*, "names")= chr [1:392] "(Intercept)" "weight" "horsepower" "year" ...
## $ rank : int 4
## $ fitted.values: Named num [1:392] 15.4 14.1 15.8 15.8 15.8 ...
## ..- attr(*, "names")= chr [1:392] "1" "2" "3" "4" ...
## $ assign : int [1:4] 0 1 2 3
## $ qr :List of 5
## ..$ qr : num [1:392, 1:4] -19.799 0.0505 0.0505 0.0505 0.0505 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:392] "1" "2" "3" "4" ...
## .. .. ..$ : chr [1:4] "(Intercept)" "weight" "horsepower" "year"
## .. ..- attr(*, "assign")= int [1:4] 0 1 2 3
## ..$ qraux: num [1:4] 1.05 1.04 1.07 1.05
## ..$ pivot: int [1:4] 1 2 3 4
## ..$ tol : num 1e-07
## ..$ rank : int 4
## ..- attr(*, "class")= chr "qr"
## $ df.residual : int 388
## $ xlevels : Named list()
## $ call : language lm(formula = mpg ~ weight + horsepower + year, data = Auto)
## $ terms :Classes 'terms', 'formula' language mpg ~ weight + horsepower + year
## .. ..- attr(*, "variables")= language list(mpg, weight, horsepower, year)
## .. ..- attr(*, "factors")= int [1:4, 1:3] 0 1 0 0 0 0 1 0 0 0 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:4] "mpg" "weight" "horsepower" "year"
## .. .. .. ..$ : chr [1:3] "weight" "horsepower" "year"
## .. ..- attr(*, "term.labels")= chr [1:3] "weight" "horsepower" "year"
## .. ..- attr(*, "order")= int [1:3] 1 1 1
## .. ..- attr(*, "intercept")= int 1
## .. ..- attr(*, "response")= int 1
## .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## .. ..- attr(*, "predvars")= language list(mpg, weight, horsepower, year)
## .. ..- attr(*, "dataClasses")= Named chr [1:4] "numeric" "numeric" "numeric" "numeric"
## .. .. ..- attr(*, "names")= chr [1:4] "mpg" "weight" "horsepower" "year"
## $ model :'data.frame': 392 obs. of 4 variables:
## ..$ mpg : num [1:392] 18 15 18 16 17 15 14 14 14 15 ...
## ..$ weight : int [1:392] 3504 3693 3436 3433 3449 4341 4354 4312 4425 3850 ...
## ..$ horsepower: int [1:392] 130 165 150 150 140 198 220 215 225 190 ...
## ..$ year : int [1:392] 70 70 70 70 70 70 70 70 70 70 ...
## ..- attr(*, "terms")=Classes 'terms', 'formula' language mpg ~ weight + horsepower + year
## .. .. ..- attr(*, "variables")= language list(mpg, weight, horsepower, year)
## .. .. ..- attr(*, "factors")= int [1:4, 1:3] 0 1 0 0 0 0 1 0 0 0 ...
## .. .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. .. ..$ : chr [1:4] "mpg" "weight" "horsepower" "year"
## .. .. .. .. ..$ : chr [1:3] "weight" "horsepower" "year"
## .. .. ..- attr(*, "term.labels")= chr [1:3] "weight" "horsepower" "year"
## .. .. ..- attr(*, "order")= int [1:3] 1 1 1
## .. .. ..- attr(*, "intercept")= int 1
## .. .. ..- attr(*, "response")= int 1
## .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## .. .. ..- attr(*, "predvars")= language list(mpg, weight, horsepower, year)
## .. .. ..- attr(*, "dataClasses")= Named chr [1:4] "numeric" "numeric" "numeric" "numeric"
## .. .. .. ..- attr(*, "names")= chr [1:4] "mpg" "weight" "horsepower" "year"
## - attr(*, "class")= chr "lm"
The R-squared was 0.8083 (about 80.83%) and the adjusted R-squared was 0.8068 (about 80.68%), which means the three predictors account for roughly 81% of the variance in mpg. The adjusted R-squared penalizes additional predictors, but here the two values are very close.
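The penalty can be checked by hand with the standard formula, adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - p - 1). A small sketch using the values reported above (R-squared from summary(m1)$r.squared, n = 392 observations, p = 3 predictors); the variable names are illustrative.

```r
# Values taken from the output above.
r2 <- 0.8083189  # summary(m1)$r.squared
n <- 392         # number of observations
p <- 3           # number of predictors: weight, horsepower, year

# Adjusted R-squared: shrinks R-squared according to the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(adj_r2, 4))
```

With n this large relative to p, the factor (n - 1)/(n - p - 1) is close to 1, which is why the two values barely differ.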
weight and horsepower added to the ‘weight’,
‘horsepower’, and ‘year’ as predictors (X) and report the adjusted
R-squared. (1 point)
m2 <- lm(mpg ~ weight + horsepower + year + weight * horsepower, data = Auto)
summary(m2)
##
## Call:
## lm(formula = mpg ~ weight + horsepower + year + weight * horsepower,
## data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.9146 -1.8987 -0.0386 1.5536 12.6333
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.577e+00 3.911e+00 0.915 0.361
## weight -1.185e-02 5.868e-04 -20.198 <2e-16 ***
## horsepower -2.236e-01 2.063e-02 -10.837 <2e-16 ***
## year 7.749e-01 4.508e-02 17.190 <2e-16 ***
## weight:horsepower 5.790e-05 5.020e-06 11.534 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.963 on 387 degrees of freedom
## Multiple R-squared: 0.8574, Adjusted R-squared: 0.8559
## F-statistic: 581.5 on 4 and 387 DF, p-value: < 2.2e-16
The adjusted R-squared is 0.8559, or about 85.59%.
Yes. Including the interaction increased the R-squared from 0.8083 to 0.8574, but R-squared never decreases when a term is added, so that alone does not show the model improved. That is why it is important to look at the adjusted R-squared, which accounts for the extra term: it rose from 0.8068 to 0.8559. The interaction term is also highly significant (p < 2e-16), which suggests that the effect of weight on mpg depends on horsepower, and vice versa.
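Two points from this comparison can be sketched in code: in an R formula, a * b already expands to a + b + a:b, and nested models can be compared formally with anova(). The example below uses the built-in mtcars data as a stand-in for Auto (wt and hp in place of weight and horsepower); the model names are hypothetical.

```r
# Stand-in models on mtcars (an assumption; the exam models use Auto).
m_add <- lm(mpg ~ wt + hp, data = mtcars)            # main effects only
m_star <- lm(mpg ~ wt * hp, data = mtcars)           # main effects plus interaction
m_colon <- lm(mpg ~ wt + hp + wt:hp, data = mtcars)  # same model, written out

# The two interaction formulas give the same coefficients.
same_fit <- isTRUE(all.equal(coef(m_star), coef(m_colon)))
print(same_fit)

# F-test for whether the interaction term improves on the additive model.
cmp <- anova(m_add, m_star)
print(cmp)

# Adjusted R-squared for both models, side by side.
adj <- c(additive = summary(m_add)$adj.r.squared,
         interaction = summary(m_star)$adj.r.squared)
print(round(adj, 4))
```

The anova() comparison tests the single added term directly, which complements the adjusted R-squared comparison rather than replacing it.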
End of Exam. Please submit this RMD file along with a knitted HTML report. Failure to submit the HTML report will lead to a 1-point deduction.