---
title: "Investigating Determinants of Car Performance Using the mtcars Dataset"
author: "Shen Hu"
date: "2024-06-17"
output:
  html_document:
    toc: true
    toc_depth: 2
---


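library(tidyverse)  # dplyr, forcats and ggplot2 are used in the steps below
library(stargazer)  # formatted regression tables in Step 10
data(mtcars)
colSums(is.na(mtcars))  # number of missing values per column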
 mpg  cyl disp   hp drat   wt qsec   vs   am gear carb 
   0    0    0    0    0    0    0    0    0    0    0 

There are no missing values in the data set.

## Step 2

Use mutate() to change/create at least 3 variables.

::: {.cell}

mtcars <- mtcars %>%  
  mutate(
    hp2 = hp^2, 
    lwt = log(wt), 
    high_mpg = factor(ifelse(mpg > 20, 1, 0))
  )

:::
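
To see what the three new variables look like, the following is a minimal sketch that repeats the `mutate()` call on an untouched copy of the data. The name `cars_step2` is hypothetical and used only here; `datasets::mtcars` is read directly so that the `mtcars` object modified in this report is left alone.

```r
library(dplyr)

# Work on a fresh copy so the report's own `mtcars` object is not changed
cars_step2 <- datasets::mtcars %>%
  mutate(
    hp2 = hp^2,                                 # squared horsepower
    lwt = log(wt),                              # natural log of weight
    high_mpg = factor(ifelse(mpg > 20, 1, 0))   # 1 if mpg is above 20
  )

# Show each derived column next to the column it was built from
cars_step2 %>%
  select(hp, hp2, wt, lwt, mpg, high_mpg) %>%
  head()

# Count the cars on each side of the 20 mpg cutoff
table(cars_step2$high_mpg)
```

Tabulating a newly created indicator like `high_mpg` is a quick way to confirm that the cutoff splits the sample as intended.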

## Step 3

Convert at least one variable from numeric to factor by creating categories.

::: {.cell}

mtcars$cyl <- factor(mtcars$cyl)

:::

## Step 4

Use at least two fct_ functions to manage your factor variables.

::: {.cell}

mtcars$gear <- fct_infreq(factor(mtcars$gear))
mtcars$carb <- fct_collapse(factor(mtcars$carb), "Other" = c("6", "8"))

:::
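
The effect of the two `fct_` functions is easiest to see by inspecting the factor levels. The sketch below applies them to a fresh copy of the data; `raw_cars`, `gear_f` and `carb_f` are hypothetical names introduced for illustration.

```r
library(forcats)

raw_cars <- datasets::mtcars  # untouched copy of the built-in data

# fct_infreq() reorders levels from most to least frequent
gear_f <- fct_infreq(factor(raw_cars$gear))
levels(gear_f)
table(gear_f)

# fct_collapse() merges the listed levels into one new level
carb_f <- fct_collapse(factor(raw_cars$carb), "Other" = c("6", "8"))
levels(carb_f)
table(carb_f)
```

Collapsing the two rare carburetor counts into "Other" avoids categories that contain a single car each.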

## Step 5

Use select() and at least one select helper at least once.

mtcars_selected <- mtcars %>%  
  select(starts_with("cyl"), mpg, hp, wt, gear, carb, high_mpg)
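
In the call above, `starts_with("cyl")` matches only the `cyl` column. As a rough illustration of how select helpers match by name pattern, the sketch below tries a few helpers on a fresh copy of the data; `raw_cars`, `c_columns` and `mixed` are hypothetical names.

```r
library(dplyr)

raw_cars <- datasets::mtcars  # untouched copy of the built-in data

# A shorter prefix matches every column whose name begins with "c"
c_columns <- raw_cars %>% select(starts_with("c"))
names(c_columns)

# Helpers can be combined with bare column names in one call
mixed <- raw_cars %>% select(mpg, ends_with("t"), contains("ar"))
names(mixed)
```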

## Step 6

Use case_when() at least once.

::: {.cell}

mtcars <- mtcars %>%  
  mutate(
    performance = case_when(
      mpg < 15 ~ "Low",
      mpg >= 15 & mpg < 25 ~ "Medium",
      mpg >= 25 ~ "High"
    )
  )

:::
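
Because `case_when()` evaluates its conditions in order and uses the first one that is true, the lower bound in the "Medium" rule can be left out. The sketch below is one possible variant, not the form used in this report; it also stores the result as a factor so the categories sort from low to high. `perf` and `raw_cars` are hypothetical names.

```r
library(dplyr)

raw_cars <- datasets::mtcars  # untouched copy of the built-in data

perf <- raw_cars %>%
  mutate(
    performance = case_when(
      mpg < 15  ~ "Low",
      mpg < 25  ~ "Medium",   # only reached when mpg >= 15
      mpg >= 25 ~ "High"
    ),
    # Fix the level order so tables and plots go Low, Medium, High
    performance = factor(performance, levels = c("Low", "Medium", "High"))
  )

# Number of cars in each category
count(perf, performance)
```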

## Step 7

Apply some eligibility criteria (at least two criteria) using filter().

::: {.cell}

mtcars_filtered <- mtcars %>%  
  filter(wt < 3.5, gear %in% c("4", "5"))

:::
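
Conditions separated by commas in `filter()` are combined with a logical AND. The sketch below checks this on a fresh copy of the data, where `gear` is still numeric (in this report it has been converted to a factor, which is why the levels are quoted above). `raw_cars`, `light_45` and `same` are hypothetical names.

```r
library(dplyr)

raw_cars <- datasets::mtcars  # untouched copy of the built-in data

# Comma-separated conditions: both must hold for a row to be kept
light_45 <- raw_cars %>% filter(wt < 3.5, gear %in% c(4, 5))

# The same criteria written as a single condition with &
same <- raw_cars %>% filter(wt < 3.5 & gear %in% c(4, 5))

nrow(raw_cars)   # rows before filtering
nrow(light_45)   # rows meeting both criteria
isTRUE(all.equal(light_45, same))
```

Comparing row counts before and after filtering is a simple check on how many observations the eligibility criteria remove.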

## Step 8

Use the pipe to link a few of these steps together.

mtcars_transformed <- mtcars %>%  
  mutate(hp2 = hp^2, lwt = log(wt), high_mpg = factor(ifelse(mpg > 20, 1, 0))) %>%
  mutate(performance = case_when(
    mpg < 15 ~ "Low",
    mpg >= 15 & mpg < 25 ~ "Medium",
    mpg >= 25 ~ "High"
  )) %>%
  select(starts_with("cyl"), mpg, hp, wt, gear, carb, high_mpg) %>%
  filter(wt < 3.5, gear %in% c("4", "5"))

## Step 9

Create a Table 1 of descriptive statistics about your sample.

::: {.cell}

mtcars_transformed %>%  
  group_by(gear) %>%  
  summarise(
    mean_wt = mean(wt, na.rm = TRUE), 
    sd_wt = sd(wt, na.rm = TRUE), 
    mean_hp = mean(hp, na.rm = TRUE), 
    sd_hp = sd(hp, na.rm = TRUE)
  ) %>% 
  knitr::kable(digits = 3, caption = "Summary Descriptive Statistics")

Table: Summary Descriptive Statistics

| gear | mean_wt | sd_wt | mean_hp |  sd_hp |
|:-----|--------:|------:|--------:|-------:|
| 4    |   2.617 | 0.633 |   89.50 | 25.893 |
| 5    |   2.398 | 0.727 |  160.75 | 77.478 |

:::
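
A Table 1 usually also reports the group sizes. The sketch below is one possible way to add them with `n()` and to compute the mean and standard deviation of several columns at once with `across()`. It rebuilds the filtered sample from a fresh copy of the data; `sample_cars` and `table1` are hypothetical names.

```r
library(dplyr)

# Rebuild the analysis sample: weight below 3.5 and 4 or 5 gears
sample_cars <- datasets::mtcars %>%
  filter(wt < 3.5, gear %in% c(4, 5))

table1 <- sample_cars %>%
  group_by(gear) %>%
  summarise(
    n = n(),                                          # cars per gear group
    across(c(wt, hp), list(mean = mean, sd = sd)),    # wt_mean, wt_sd, hp_mean, hp_sd
    .groups = "drop"
  )

table1
```

With `across()`, adding another variable to the table only requires extending the column list rather than writing two more `summarise()` lines.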

## Step 10

Fit a regression and present well-formatted results.

::: {.cell}

model <- lm(mpg ~ hp + wt, data = mtcars_transformed)
summary(model)

Call:
lm(formula = mpg ~ hp + wt, data = mtcars_transformed)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.2778 -1.3654 -0.6425  1.4224  4.5052 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 42.70902    2.55720  16.701 3.64e-10 ***
hp          -0.04312    0.01342  -3.212 0.006809 ** 
wt          -5.44012    1.09272  -4.979 0.000252 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.398 on 13 degrees of freedom
Multiple R-squared:  0.8301,    Adjusted R-squared:  0.8039 
F-statistic: 31.75 on 2 and 13 DF,  p-value: 9.919e-06

:::

Display well-formatted regression results.

::: {.cell}

stargazer(model, type = "text", title = "Regression Results")

Regression Results
===============================================
                        Dependent variable:    
                    ---------------------------
                                mpg            
-----------------------------------------------
hp                           -0.043***         
                              (0.013)          
                                               
wt                           -5.440***         
                              (1.093)          
                                               
Constant                     42.709***         
                              (2.557)          
                                               
-----------------------------------------------
Observations                    16             
R2                             0.830           
Adjusted R2                    0.804           
Residual Std. Error       2.398 (df = 13)      
F Statistic           31.755*** (df = 2; 13)   
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.01

:::
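
The regression table reports point estimates and standard errors. As a complement, the sketch below uses base R to add 95% confidence intervals to the coefficient table. It refits the same model on a sample rebuilt from a fresh copy of the data; `sample_cars`, `fit` and `coef_table` are hypothetical names.

```r
# Rebuild the analysis sample: weight below 3.5 and 4 or 5 gears
sample_cars <- subset(datasets::mtcars, wt < 3.5 & gear %in% c(4, 5))

# Same specification as the model above
fit <- lm(mpg ~ hp + wt, data = sample_cars)

# Estimates, standard errors, t values and p values,
# followed by the lower and upper 95% confidence limits
coef_table <- cbind(coef(summary(fit)), confint(fit, level = 0.95))
round(coef_table, 3)
```

Confidence intervals show the range of slopes compatible with the data, which is useful with a sample this small.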

## Step 11

Create at least 2 ggplots of different types.

ggplot(mtcars_transformed, aes(x = hp, y = mpg)) + 
  geom_point() + 
  geom_smooth(method = "lm", se = FALSE, color = "red", formula = y ~ x) + 
  labs(x = "Horsepower", y = "Miles per Gallon", title = "Scatter Plot of MPG vs HP")

mtcars_transformed %>%  
  ggplot(aes(x = gear, y = mpg)) + 
  geom_boxplot() + 
  labs(x = "Gear", y = "Miles per Gallon", title = "Box Plot of MPG by Gear")

## Step 12

Use a _join() function.

::: {.cell}

df1 <- mtcars_transformed %>%  
  group_by(gear) %>%  
  summarise(mean_mpg = mean(mpg))
df2 <- mtcars_transformed %>%  
  group_by(gear) %>%  
  summarise(mean_wt = mean(wt))
df <- inner_join(df1, df2, by = "gear")
df
# A tibble: 2 × 3
  gear  mean_mpg mean_wt
  <fct>    <dbl>   <dbl>
1 4         24.5    2.62
2 5         23.0    2.40

:::
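
In the join above both tables contain the same gear groups, so every row finds a match. The sketch below illustrates how `inner_join()` and `left_join()` differ when one table is missing a key. It uses a fresh copy of the data, and `mpg_by_gear`, `wt_by_gear`, `inner_result` and `left_result` are hypothetical names.

```r
library(dplyr)

raw_cars <- datasets::mtcars  # untouched copy of the built-in data

# Mean mpg for all three gear groups
mpg_by_gear <- raw_cars %>%
  group_by(gear) %>%
  summarise(mean_mpg = mean(mpg))

# Mean weight for the 4- and 5-gear groups only
wt_by_gear <- raw_cars %>%
  filter(gear %in% c(4, 5)) %>%
  group_by(gear) %>%
  summarise(mean_wt = mean(wt))

# inner_join() keeps only the keys present in both tables
inner_result <- inner_join(mpg_by_gear, wt_by_gear, by = "gear")

# left_join() keeps every row of the first table and fills gaps with NA
left_result <- left_join(mpg_by_gear, wt_by_gear, by = "gear")

inner_result
left_result
```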

## Step 13

Write data to a different folder.

::: {.cell}

dir.create("data", showWarnings = FALSE)  # no warning if the folder already exists
write.csv(mtcars_transformed, file = "data/mtcars_transformed.csv", row.names = FALSE)

:::
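
A quick way to confirm that the export worked is to read the file back and compare its dimensions with the data that was written. The sketch below does this in a temporary folder so that it does not touch the project directory. The folder, the file name `mtcars_sample.csv`, and the object names are assumptions made for illustration.

```r
# Build the output path inside R's per-session temporary directory
out_dir <- file.path(tempdir(), "data")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
out_file <- file.path(out_dir, "mtcars_sample.csv")

# Rebuild the analysis sample: weight below 3.5 and 4 or 5 gears
sample_cars <- subset(datasets::mtcars, wt < 3.5 & gear %in% c(4, 5))

# Write without row names, then read the file back in
write.csv(sample_cars, file = out_file, row.names = FALSE)
round_trip <- read.csv(out_file)

# The dimensions should match what was written
dim(sample_cars)
dim(round_trip)
```

Note that a CSV file stores plain text, so factor columns come back as character or numeric columns and need to be converted again after reading.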

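## Conclusion

In the filtered sample of 16 cars (weight below 3.5, i.e. 3,500 lb, and 4 or 5 forward gears), both horsepower and weight are statistically significant predictors of miles per gallon (mpg). Holding the other predictor constant, each additional unit of horsepower is associated with about 0.043 fewer mpg (p = 0.007), and each additional 1,000 lb of weight with about 5.44 fewer mpg (p < 0.001). The model explains about 83% of the variability in mpg (adjusted R-squared of 0.80). Further analysis and model refinement could provide deeper insights into the determinants of car performance.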