1 Introduction

This report explores two core datasets: one focused on educational attainment and income in the United States, and the other on global annual mean temperatures. Using R, I apply data wrangling, custom function creation, visualization, and classification techniques to uncover trends and insights across both domains.


2 Part I: Education and Income in the U.S.

2.1 Load Data

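# Packages used in Part I (all listed under "other attached packages" in Session Info)
library(tidyverse)
library(janitor)    # clean_names()
library(patchwork)  # combining plots with `/`

# Load the ACS data, standardize column names, and order the education levels
acs_df <- readRDS("acs.rds") %>%
  clean_names() %>%
  mutate(
    edu = factor(edu, levels = c("Less than HS", "HS", "Some College",
                                 "Associate", "Bachelor", "Master",
                                 "Professional", "Doctorate"))
  )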
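
One thing worth checking after a recode like this: `factor()` silently converts any value that is not listed in `levels` to `NA`. The following is a minimal sketch of that check in base R. The vector `edu_raw` and its values are made up for illustration and are not taken from `acs.rds`.

```r
# Hypothetical raw labels; "Bachelors" deliberately does not match any level
edu_raw <- c("HS", "Bachelor", "Bachelors", "Doctorate", "Some College")

edu_levels <- c("Less than HS", "HS", "Some College",
                "Associate", "Bachelor", "Master",
                "Professional", "Doctorate")

edu <- factor(edu_raw, levels = edu_levels)

# Labels present in the raw data that are missing from the level set
unmatched <- setdiff(unique(edu_raw), edu_levels)

# Rows that became NA only because of the recode
n_dropped <- sum(is.na(edu) & !is.na(edu_raw))

unmatched
n_dropped
```

If `unmatched` is non-empty on the real data, the raw labels would need to be harmonized before the `mutate()` step, otherwise those rows drop out of the grouped summary and the plots.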

2.2 Summary by Education Level

# Row count and median income per education level, highest median first
acs_df %>%
  group_by(edu) %>%
  summarise(
    count = n(),
    median_income = median(income, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(median_income))

2.3 Visualizations

# Bar chart: number of observations at each education level
plot_a <- ggplot(acs_df, aes(x = edu)) +
  geom_bar(fill = "steelblue") +
  labs(title = "Number of People by Education Level", x = "Education", y = "Count") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 60, hjust = 1))

# Boxplot: income distribution at each education level, y-axis in dollars
plot_b <- ggplot(acs_df, aes(x = edu, y = income)) +
  geom_boxplot(fill = "purple") +
  scale_y_continuous(labels = scales::label_dollar()) +
  labs(title = "Household Income by Education Level", x = "Education", y = "Income") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 60, hjust = 1))

# Stack the two plots vertically (patchwork's `/` operator)
plot_a / plot_b

Insight: Median household income increases consistently with higher education levels.
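
A claim like this can be checked numerically rather than read off the boxplot. Here is a minimal sketch, written in base R so it runs without extra packages, assuming a data frame with `edu` and `income` columns like those in `acs_df`. The `toy` data and the name `medians_non_decreasing` are made up for illustration and say nothing about the actual ACS results.

```r
# Hypothetical toy data with education stored as a factor in level order
toy <- data.frame(
  edu = factor(
    c("HS", "HS", "Bachelor", "Bachelor", "Master", "Master"),
    levels = c("HS", "Bachelor", "Master")
  ),
  income = c(30000, 40000, 60000, 70000, 65000, NA)
)

# Median income per level; tapply() returns results in factor-level order
med_by_edu <- tapply(toy$income, toy$edu, median, na.rm = TRUE)

# TRUE only if each level's median is at least the previous level's median
medians_non_decreasing <- all(diff(as.numeric(med_by_edu)) >= 0)

med_by_edu
medians_non_decreasing
```

Run against `acs_df`, a `FALSE` result would point to at least one adjacent pair of education levels where the median ordering reverses.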


4 Conclusion

This project showcases a range of data analysis skills, including data wrangling, custom function creation, visualization, and classification.

Both parts demonstrate how data storytelling and technical skills come together to derive meaningful conclusions.


5 Session Info

sessionInfo()
## R version 4.3.3 (2024-02-29)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Sonoma 14.2.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/Los_Angeles
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] janitor_2.2.1   patchwork_1.3.0 lubridate_1.9.3 forcats_1.0.0  
##  [5] stringr_1.5.1   dplyr_1.1.4     purrr_1.0.2     readr_2.1.5    
##  [9] tidyr_1.3.1     tibble_3.2.1    ggplot2_3.5.1   tidyverse_2.0.0
## 
## loaded via a namespace (and not attached):
##  [1] sass_0.4.9        utf8_1.2.4        generics_0.1.3    stringi_1.8.3    
##  [5] hms_1.1.3         digest_0.6.35     magrittr_2.0.3    evaluate_1.0.3   
##  [9] grid_4.3.3        timechange_0.3.0  fastmap_1.2.0     jsonlite_1.8.8   
## [13] fansi_1.0.6       scales_1.3.0      jquerylib_0.1.4   cli_3.6.2        
## [17] crayon_1.5.2      rlang_1.1.3       bit64_4.0.5       munsell_0.5.0    
## [21] withr_3.0.2       cachem_1.1.0      yaml_2.3.8        parallel_4.3.3   
## [25] tools_4.3.3       tzdb_0.4.0        colorspace_2.1-0  vctrs_0.6.5      
## [29] R6_2.5.1          lifecycle_1.0.4   snakecase_0.11.1  bit_4.0.5        
## [33] vroom_1.6.5       pkgconfig_2.0.3   pillar_1.9.0      bslib_0.6.2      
## [37] gtable_0.3.4      glue_1.7.0        xfun_0.43         tidyselect_1.2.1 
## [41] highr_0.10        rstudioapi_0.16.0 knitr_1.45        farver_2.1.1     
## [45] htmltools_0.5.8   rmarkdown_2.26    labeling_0.4.3    compiler_4.3.3