---
title: "Investigating Determinants of Car Performance Using the mtcars Dataset"
author: "Shen Hu"
date: "2024-06-17"
output:
  html_document:
    toc: true
    toc_depth: 2
---
data(mtcars)
colSums(is.na(mtcars))

 mpg  cyl disp   hp drat   wt qsec   vs   am gear carb
   0    0    0    0    0    0    0    0    0    0    0
There are no missing values in the data set.

## Step 2

Use mutate() to change/create at least 3 variables.

::: {.cell}
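library(dplyr)  # provides %>% and mutate()

mtcars <- mtcars %>%
  mutate(
    hp2 = hp^2,                                # squared horsepower
    lwt = log(wt),                             # natural log of weight
    high_mpg = factor(ifelse(mpg > 20, 1, 0))  # 1 if mpg > 20, otherwise 0
  )
:::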
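
As a small variation on the `high_mpg` step above, the sketch below attaches readable labels to the indicator instead of the codes 0 and 1. The object name `cars_labeled` and the label names `"low"` and `"high"` are assumptions made for illustration; they do not appear in the original analysis.

```r
library(dplyr)

data(mtcars)  # start from the untouched built-in data

cars_labeled <- mtcars %>%
  mutate(
    # Logical test first, then map FALSE/TRUE to hypothetical labels
    high_mpg = factor(mpg > 20,
                      levels = c(FALSE, TRUE),
                      labels = c("low", "high"))
  )

# Frequency of each label
table(cars_labeled$high_mpg)
```

Labelled levels tend to make later tables and plot legends easier to read than bare 0/1 codes.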
Convert at least one variable from numeric to factor by creating categories.

::: {.cell}
mtcars$cyl <- factor(mtcars$cyl)
:::
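
The step above turns each distinct `cyl` value into its own level. Where a continuous variable needs to be binned into categories, base R's `cut()` is one option. The sketch below bins horsepower; the break points (100 and 200) and the labels are assumptions chosen only for illustration.

```r
data(mtcars)  # untouched built-in data

# Bin horsepower into three hypothetical categories:
# (0, 100], (100, 200], (200, Inf]
hp_group <- cut(
  mtcars$hp,
  breaks = c(0, 100, 200, Inf),
  labels = c("low", "medium", "high")
)

# Number of cars in each bin
table(hp_group)
```

By default `cut()` uses right-closed intervals, so a car with exactly 100 hp would fall in the "low" bin.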
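Use at least two fct_ functions to manage your factor variables.

::: {.cell}
library(forcats)  # provides fct_infreq() and fct_collapse()
mtcars$gear <- fct_infreq(factor(mtcars$gear))                            # order levels by frequency
mtcars$carb <- fct_collapse(factor(mtcars$carb), "Other" = c("6", "8"))  # merge rare levels
:::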
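
To make the effect of the two `fct_` functions easier to see, here is a minimal sketch on toy vectors. The vectors `gear_toy` and `carb_toy` are made up for illustration and are not taken from the data.

```r
library(forcats)

# Toy factor: "3" occurs three times, "4" twice, "5" once
gear_toy <- factor(c("4", "3", "3", "5", "3", "4"))
levels(gear_toy)              # default: sorted level order

# fct_infreq() reorders the levels from most to least frequent
gear_by_freq <- fct_infreq(gear_toy)
levels(gear_by_freq)

# fct_collapse() merges the listed levels into one new level
carb_toy <- factor(c("1", "2", "6", "8", "2"))
carb_merged <- fct_collapse(carb_toy, Other = c("6", "8"))
levels(carb_merged)
```

Neither function changes the number of observations; only the level set or its order changes.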
Use select() and at least one select helper at least once.

mtcars_selected <- mtcars %>%
  select(starts_with("cyl"), mpg, hp, wt, gear, carb, high_mpg)

Use case_when() at least once.

::: {.cell}
mtcars <- mtcars %>%
  mutate(
    performance = case_when(
      mpg < 15 ~ "Low",
      mpg >= 15 & mpg < 25 ~ "Medium",
      mpg >= 25 ~ "High"
    )
  )
:::
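
Because `case_when()` evaluates its conditions in order and stops at the first match, the lower bounds in the step above can be dropped. The sketch below shows that shorter form and also stores the result as an ordered factor so that "Low" sorts before "Medium" and "High". The object name `cars_perf` is an assumption for illustration. Note that the catch-all `TRUE` branch would also label a missing `mpg` as "High"; that is harmless here only because `mtcars` has no missing values.

```r
library(dplyr)

data(mtcars)  # untouched built-in data

cars_perf <- mtcars %>%
  mutate(
    performance = case_when(
      mpg < 15 ~ "Low",
      mpg < 25 ~ "Medium",   # reached only when mpg >= 15
      TRUE     ~ "High"      # everything else, i.e. mpg >= 25
    ),
    # Ordered factor so tables and plots follow Low < Medium < High
    performance = factor(performance,
                         levels = c("Low", "Medium", "High"),
                         ordered = TRUE)
  )

count(cars_perf, performance)
```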
Apply some eligibility criteria (at least two criteria) using filter().

::: {.cell}
mtcars_filtered <- mtcars %>%
  filter(wt < 3.5, gear %in% c("4", "5"))
:::
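
Conditions separated by commas inside `filter()` are combined with a logical AND. The sketch below checks this on the untouched built-in data, where `gear` is still numeric (hence `c(4, 5)` rather than the character levels used above). The object names `by_commas` and `by_and` are assumptions for illustration.

```r
library(dplyr)

data(mtcars)  # untouched built-in data; gear is numeric here

# Two criteria passed as separate arguments
by_commas <- mtcars %>%
  filter(wt < 3.5, gear %in% c(4, 5))

# The same two criteria joined explicitly with &
by_and <- mtcars %>%
  filter(wt < 3.5 & gear %in% c(4, 5))

identical(by_commas, by_and)  # both forms keep the same rows
nrow(by_commas)               # number of cars meeting both criteria
```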
Use the pipe to link a few of these steps together.

mtcars_transformed <- mtcars %>%
  mutate(hp2 = hp^2, lwt = log(wt), high_mpg = factor(ifelse(mpg > 20, 1, 0))) %>%
  mutate(performance = case_when(
    mpg < 15 ~ "Low",
    mpg >= 15 & mpg < 25 ~ "Medium",
    mpg >= 25 ~ "High"
  )) %>%
  select(starts_with("cyl"), mpg, hp, wt, gear, carb, high_mpg) %>%
  filter(wt < 3.5, gear %in% c("4", "5"))

Create a Table 1 of descriptive statistics about your sample.

::: {.cell}
mtcars_transformed %>%
  group_by(gear) %>%
  summarise(
    mean_wt = mean(wt, na.rm = TRUE),
    sd_wt = sd(wt, na.rm = TRUE),
    mean_hp = mean(hp, na.rm = TRUE),
    sd_hp = sd(hp, na.rm = TRUE)
  ) %>%
  knitr::kable(digits = 3, caption = "Summary Descriptive Statistics")

| gear | mean_wt | sd_wt | mean_hp | sd_hp |
|---|---|---|---|---|
| 4 | 2.617 | 0.633 | 89.50 | 25.893 |
| 5 | 2.398 | 0.727 | 160.75 | 77.478 |
:::
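
When the same statistics are needed for several variables, `across()` can replace the repeated `mean()`/`sd()` lines. The sketch below rebuilds the table from the untouched data and adds a group size column `n`, which is an addition for illustration and not part of the original table. The object name `table1` is likewise an assumption.

```r
library(dplyr)

data(mtcars)  # untouched built-in data; gear is numeric here

table1 <- mtcars %>%
  filter(wt < 3.5, gear %in% c(4, 5)) %>%
  group_by(gear) %>%
  summarise(
    n = n(),  # cars per gear group (added for illustration)
    # mean and sd of wt and hp, named mean_wt, sd_wt, mean_hp, sd_hp
    across(c(wt, hp), list(mean = mean, sd = sd), .names = "{.fn}_{.col}"),
    .groups = "drop"
  )

table1
```

The result can be passed to `knitr::kable()` in the same way as the table above.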
Fit a regression and present well-formatted results.

::: {.cell}
model <- lm(mpg ~ hp + wt, data = mtcars_transformed)
summary(model)

Call:
lm(formula = mpg ~ hp + wt, data = mtcars_transformed)

Residuals:
    Min      1Q  Median      3Q     Max
-3.2778 -1.3654 -0.6425  1.4224  4.5052

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 42.70902    2.55720  16.701 3.64e-10 ***
hp          -0.04312    0.01342  -3.212 0.006809 **
wt          -5.44012    1.09272  -4.979 0.000252 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.398 on 13 degrees of freedom
Multiple R-squared:  0.8301,  Adjusted R-squared:  0.8039
F-statistic: 31.75 on 2 and 13 DF,  p-value: 9.919e-06
:::
Display well-formatted regression results.

::: {.cell}
library(stargazer)  # provides stargazer()
stargazer(model, type = "text", title = "Regression Results")

Regression Results
===============================================
                        Dependent variable:
                    ---------------------------
                                mpg
-----------------------------------------------
hp                           -0.043***
                              (0.013)

wt                           -5.440***
                              (1.093)

Constant                     42.709***
                              (2.557)

-----------------------------------------------
Observations                    16
R2                             0.830
Adjusted R2                    0.804
Residual Std. Error       2.398 (df = 13)
F Statistic           31.755*** (df = 2; 13)
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.01
:::
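
The tables above report point estimates and standard errors. As a complement, the sketch below refits the same model on the untouched data and appends 95% confidence intervals from base R's `confint()`. The object names `cars_sub` and `coef_table` are assumptions for illustration, and the filter is written against numeric `gear` because the sketch does not rely on the earlier factor conversion.

```r
library(dplyr)

data(mtcars)  # untouched built-in data; gear is numeric here

# Same eligibility criteria as in the filter() step
cars_sub <- mtcars %>%
  filter(wt < 3.5, gear %in% c(4, 5))

model <- lm(mpg ~ hp + wt, data = cars_sub)

# Estimates, standard errors, t values, p values, then 95% CI bounds
coef_table <- cbind(coef(summary(model)), confint(model, level = 0.95))
round(coef_table, 3)
```

An interval that excludes zero corresponds to a coefficient that is significant at the 5% level.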
library(ggplot2)  # provides ggplot() and the geom_ layers

ggplot(mtcars_transformed, aes(x = hp, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "red", formula = y ~ x) +
  labs(x = "Horsepower", y = "Miles per Gallon", title = "Scatter Plot of MPG vs HP")

mtcars_transformed %>%
  ggplot(aes(x = gear, y = mpg)) +
  geom_boxplot() +
  labs(x = "Gear", y = "Miles per Gallon", title = "Box Plot of MPG by Gear")

Use a _join() function.

::: {.cell}
df1 <- mtcars_transformed %>%
  group_by(gear) %>%
  summarise(mean_mpg = mean(mpg))
df2 <- mtcars_transformed %>%
  group_by(gear) %>%
  summarise(mean_wt = mean(wt))
df <- inner_join(df1, df2, by = "gear")
df

# A tibble: 2 × 3
gear mean_mpg mean_wt
<fct> <dbl> <dbl>
1 4 24.5 2.62
2 5 23.0 2.40
:::
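
In the join above both tables contain the same two gear groups, so every join type would return the same rows. The sketch below builds two summaries with different key sets to show how `inner_join()` and `left_join()` differ. The object names (`all_gears`, `light_gears`, `inner`, `left`) are assumptions for illustration.

```r
library(dplyr)

data(mtcars)  # untouched built-in data; gear is numeric here

# Mean mpg for every gear group (3, 4 and 5)
all_gears <- mtcars %>%
  group_by(gear) %>%
  summarise(mean_mpg = mean(mpg))

# Mean weight only for the filtered sample (gear 4 and 5)
light_gears <- mtcars %>%
  filter(wt < 3.5, gear %in% c(4, 5)) %>%
  group_by(gear) %>%
  summarise(mean_wt = mean(wt))

# inner_join keeps only keys present in both tables
inner <- inner_join(all_gears, light_gears, by = "gear")

# left_join keeps every row of the left table and fills gaps with NA
left <- left_join(all_gears, light_gears, by = "gear")

inner
left
```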
Write data to a different folder.

::: {.cell}
dir.create("data")

Warning in dir.create("data"): 'data' already exists

write.csv(mtcars_transformed, file = "data/mtcars_transformed.csv", row.names = FALSE)
:::
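
The warning above appears because the folder already existed when the document was rendered again. One way to make the step repeatable is sketched below. It writes to a temporary directory so that it can be run without touching the project folder; the paths `out_dir` and `out_file` are assumptions for illustration, and the untouched `mtcars` data stands in for `mtcars_transformed`.

```r
data(mtcars)  # stand-in for mtcars_transformed

# Hypothetical output location inside the session's temporary directory
out_dir <- file.path(tempdir(), "data")

# showWarnings = FALSE suppresses the "already exists" warning on reruns
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

out_file <- file.path(out_dir, "mtcars_transformed.csv")
write.csv(mtcars, file = out_file, row.names = FALSE)

# Read the file back to confirm the write succeeded
check <- read.csv(out_file)
dim(check)
```

Building paths with `file.path()` avoids hard-coding the path separator.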
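The linear regression shows that both horsepower and weight are statistically significant predictors of miles per gallon (mpg) in the filtered sample of 16 cars. Holding the other predictor fixed, each additional unit of horsepower is associated with about 0.04 fewer mpg, and each additional 1,000 lb of weight with about 5.4 fewer mpg. The model explains about 83% of the variability in mpg (adjusted R-squared 0.80). Further analysis and model refinement could provide deeper insight into the determinants of car performance.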
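
To illustrate how the fitted coefficients translate into a prediction, the sketch below refits the model on the untouched data and predicts mpg for a single car. The car in `new_car` (100 hp, 2,500 lb) is hypothetical and chosen only because it lies inside the range of the filtered sample; the object names are likewise assumptions.

```r
library(dplyr)

data(mtcars)  # untouched built-in data; gear is numeric here

# Same eligibility criteria and model as in the analysis
cars_sub <- mtcars %>%
  filter(wt < 3.5, gear %in% c(4, 5))
model <- lm(mpg ~ hp + wt, data = cars_sub)

# Hypothetical car: 100 hp, weight 2.5 (in units of 1,000 lb)
new_car <- data.frame(hp = 100, wt = 2.5)

# Point prediction with a 95% confidence interval for the mean response
pred <- predict(model, newdata = new_car, interval = "confidence")
pred
```

With the reported coefficients the point prediction works out to 42.709 - 0.043 × 100 - 5.440 × 2.5, or roughly 24.8 mpg.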