This is the R portion of your mid-term exam. You will analyze the
Auto dataset, which contains information about various car models
(similar to mtcars). Follow the instructions carefully and
write your R code in the provided chunks. You will be graded on the
correctness of your code, the quality of your analysis, and your
interpretation of the results.
Total points: 10 Time allowed: 45 minutes
Good luck!
Auto, and display the first few rows. (1 point)
# Your code here
# Import the Auto dataset and name it "Auto"
library(readr)
Auto <- read_csv("Auto.csv")
## Rows: 392 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): name
## dbl (8): mpg, cylinders, displacement, horsepower, weight, acceleration, yea...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(Auto)
# Display the first few rows
head(Auto)
## # A tibble: 6 × 9
## mpg cylinders displacement horsepower weight acceleration year origin name
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 18 8 307 130 3504 12 70 1 chev…
## 2 15 8 350 165 3693 11.5 70 1 buic…
## 3 18 8 318 150 3436 11 70 1 plym…
## 4 16 8 304 150 3433 12 70 1 amc …
## 5 17 8 302 140 3449 10.5 70 1 ford…
## 6 15 8 429 198 4341 10 70 1 ford…
# Your code here
str(Auto)
## spc_tbl_ [392 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ mpg : num [1:392] 18 15 18 16 17 15 14 14 14 15 ...
## $ cylinders : num [1:392] 8 8 8 8 8 8 8 8 8 8 ...
## $ displacement: num [1:392] 307 350 318 304 302 429 454 440 455 390 ...
## $ horsepower : num [1:392] 130 165 150 150 140 198 220 215 225 190 ...
## $ weight : num [1:392] 3504 3693 3436 3433 3449 ...
## $ acceleration: num [1:392] 12 11.5 11 12 10.5 10 9 8.5 10 8.5 ...
## $ year : num [1:392] 70 70 70 70 70 70 70 70 70 70 ...
## $ origin : num [1:392] 1 1 1 1 1 1 1 1 1 1 ...
## $ name : chr [1:392] "chevrolet chevelle malibu" "buick skylark 320" "plymouth satellite" "amc rebel sst" ...
## - attr(*, "spec")=
## .. cols(
## .. mpg = col_double(),
## .. cylinders = col_double(),
## .. displacement = col_double(),
## .. horsepower = col_double(),
## .. weight = col_double(),
## .. acceleration = col_double(),
## .. year = col_double(),
## .. origin = col_double(),
## .. name = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
Num_observations <- nrow(Auto)
Num_variables <- ncol(Auto)
There are 392 observations and 9 variables in the dataset.
# Your code here
# Correlation matrix of the numeric columns only
cor_matrix <- cor(Auto[, sapply(Auto, is.numeric)])
library(ggcorrplot)
## Loading required package: ggplot2
ggcorrplot(cor_matrix, lab = TRUE)
plot() or ggplot()). Add a title and proper
axis labels. You don’t need to interpret the result here but you should
know how. (1 point)
# Your code here
# Create a scatter plot
library(ggplot2)
ggplot(Auto, aes(x = weight, y = mpg)) +
geom_point() +
labs(title = "Scatter Plot of MPG vs Weight",
x = "Weight",
y = "Miles Per Gallon (MPG)")
boxplot() or ggplot()). You don’t need to
interpret the result here but you should know how. (1 point)
# Your code here
# Create boxplots
ggplot(Auto, aes(x = factor(origin), y = mpg)) +
geom_boxplot() +
labs(title = "Boxplot of MPG by Origin",
x = "Origin",
y = "Miles Per Gallon (MPG)")
# Your code here
# Fitting the model
model <- lm(mpg ~ weight + horsepower + year, data = Auto)
# Displaying the summary
summary(model)
##
## Call:
## lm(formula = mpg ~ weight + horsepower + year, data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.7911 -2.3220 -0.1753 2.0595 14.3527
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.372e+01 4.182e+00 -3.281 0.00113 **
## weight -6.448e-03 4.089e-04 -15.768 < 2e-16 ***
## horsepower -5.000e-03 9.439e-03 -0.530 0.59663
## year 7.487e-01 5.212e-02 14.365 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.43 on 388 degrees of freedom
## Multiple R-squared: 0.8083, Adjusted R-squared: 0.8068
## F-statistic: 545.4 on 3 and 388 DF, p-value: < 2.2e-16
weight. What
do they tell us about the relationship between the predictors and ‘mpg’?
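(1 point)
‘Weight’ is statistically significant (p < 2e-16). Its coefficient, -6.448e-03, is negative, so mpg falls as weight rises: each one-unit increase in weight is associated with a decrease of about 0.0064 mpg, holding horsepower and year constant. The intercept (-13.72) is the predicted mpg when weight, horsepower, and year are all zero, which lies outside the range of the data and has no practical interpretation.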
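One way to check the reading of a slope is to predict at two values of the predictor that are one unit apart while holding the others fixed. The sketch below is only an illustration: it uses the built-in `mtcars` data as a stand-in for Auto (so the column names `wt` and `hp` and the units differ), and the names `fit`, `new_cars`, and `pred` are made up for this example.

```r
# Stand-in data: mtcars ships with base R; its columns differ from Auto.
fit <- lm(mpg ~ wt + hp, data = mtcars)
b <- coef(fit)

# Slope for wt: expected change in mpg per one-unit increase in wt,
# with hp held constant.
b["wt"]

# Predict for two hypothetical cars that differ only by one unit of wt.
new_cars <- data.frame(wt = c(3, 4), hp = c(110, 110))
pred <- predict(fit, newdata = new_cars)

# The difference between the two predictions equals the wt slope.
slope_from_predictions <- unname(diff(pred))
slope_from_predictions
```

A negative slope means the response decreases as the predictor increases, whatever the sign of the intercept.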
# Your code here
# Create diagnostic plots
# Arrange plots in 2x2
par(mfrow = c(2, 2))
plot(model)
# The Residuals vs Leverage plot looks abnormal, suggesting the model may not fit the data well.
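A visual impression of the Residuals vs Leverage panel can be backed up with numbers. The following is a minimal sketch, again on the built-in `mtcars` data rather than Auto; the `4 / n` cutoff is a common rule of thumb and is an assumption here, not part of the exam.

```r
# Stand-in model on mtcars (not the exam's Auto model).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Cook's distance measures how much each observation moves the fitted values.
cooks <- cooks.distance(fit)

# Leverage (hat values) measures how unusual each observation's predictors are.
lev <- hatvalues(fit)

# Assumed rule-of-thumb cutoff: flag observations with Cook's distance > 4 / n.
threshold <- 4 / nrow(mtcars)
influential <- names(cooks)[cooks > threshold]
influential
```

Calling `plot(fit, which = 5)` draws only the Residuals vs Leverage panel, which makes it easier to compare against the flagged rows.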
# Your code here
# Obtain R-squared and adjusted R-squared
r_squared <- summary(model)$r.squared
adj_r_squared <- summary(model)$adj.r.squared
cat("R-squared:", r_squared, "\n")
## R-squared: 0.8083189
cat("Adjusted R-squared:", adj_r_squared, "\n")
## Adjusted R-squared: 0.8068368
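R-squared is 0.8083, so the model explains about 81% of the variance in mpg. Adjusted R-squared (0.8068) is slightly lower because it penalizes the number of predictors; this does not mean the model became less accurate. The gap between the two is small, so the penalty for using three predictors is minor.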
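The adjusted value can be reproduced by hand from the figures reported above using the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. This is a small worked check; the variable names are chosen for illustration only.

```r
# Values taken from the model output above.
r_squared <- 0.8083189  # multiple R-squared
n <- 392                # observations
p <- 3                  # predictors: weight, horsepower, year

# Adjusted R-squared shrinks R-squared by a factor that grows with p.
adj_by_hand <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
round(adj_by_hand, 4)
```

Because (n - 1)/(n - p - 1) is always greater than 1 when p is at least 1, the adjusted value is always below R-squared, so a lower adjusted value by itself says nothing about a loss of accuracy.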
weight and
horsepower and report whether your model improved based on
adjusted R-squared. (1 point)
# Your code here
# Fit the model with interaction term
model_interaction <- lm(mpg ~ weight * horsepower + year, data = Auto)
# Compare adjusted R-squared
adj_r_squared_interaction <- summary(model_interaction)$adj.r.squared
cat("Adjusted R-squared with interaction term:", adj_r_squared_interaction, "\n")
## Adjusted R-squared with interaction term: 0.8558773
Adding the interaction term increased adjusted R-squared from 0.8068 to 0.8559, so the model improved: the interaction between weight and horsepower explains additional variance in mpg even after the penalty for the extra term.
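In an R formula, `a * b` expands to `a + b + a:b`, so the interaction model keeps both main effects and adds one product term. The sketch below shows this on the built-in `mtcars` data (a stand-in, not Auto) and compares adjusted R-squared for the two fits; all object names are illustrative.

```r
# Main-effects model and interaction model on stand-in data.
fit_main <- lm(mpg ~ wt + hp, data = mtcars)
fit_int  <- lm(mpg ~ wt * hp, data = mtcars)

# The * operator adds the wt:hp product term alongside both main effects.
term_names <- names(coef(fit_int))
term_names

# Compare adjusted R-squared, which accounts for the extra term.
adj_main <- summary(fit_main)$adj.r.squared
adj_int  <- summary(fit_int)$adj.r.squared
c(main = adj_main, interaction = adj_int)

# A formal test of the added term for these nested models.
anova(fit_main, fit_int)
```

Comparing adjusted R-squared, as the question asks, is a quick check; the `anova()` F-test on the nested models is a complementary way to judge whether the added term is worth keeping.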
End of Exam. Please submit this RMD file along with a knitted HTML report.