See below for information on the Public Use Microdata Sample (PUMS).
Public Use Microdata Sample (PUMS) Documentation: https://www.census.gov/programs-surveys/acs/technical-documentation/pums.html
2010 Census Public Use Microdata Area (PUMA) Reference Maps - New Hampshire: https://www.census.gov/geo/maps-data/maps/2010puma/st33_nh.html
# Load packages
library(tidyverse)
# Import data
PUMS_cleaned <- read.csv("~/R/busStat/Data/PUMS_cleaned.csv") %>% as_tibble()
PUMS_cleaned
## # A tibble: 67,248 x 6
## X PUMA age education field_of_degree income
## <int> <int> <int> <fct> <fct> <int>
## 1 1 1000 87 lessthanBA <NA> 11800
## 2 2 900 42 lessthanBA <NA> 8800
## 3 3 800 43 BAorhigher English Language 10000
## 4 4 800 43 lessthanBA <NA> 112000
## 5 5 800 14 lessthanBA <NA> NA
## 6 6 800 11 lessthanBA <NA> NA
## 7 7 900 63 lessthanBA <NA> 23900
## 8 8 900 59 BAorhigher Early Childhood Education 34600
## 9 9 900 65 lessthanBA <NA> 9400
## 10 10 300 50 lessthanBA <NA> 18000
## # ... with 67,238 more rows
Each row represents one person randomly selected into the sample.
Character.
PUMS_cleaned is a data frame because it can hold different types of variables.
Hint: Use View(). The first row describes an 87-year-old in PUMA 1000 (Rockingham County) with less than a bachelor’s degree, no field of degree, and an income of $11,800.
Hint: Use count(). There are 18,563 people who have a BA or higher.
PUMS_cleaned %>% count(education)
## # A tibble: 2 x 2
## education n
## <fct> <int>
## 1 BAorhigher 18563
## 2 lessthanBA 48685
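To read these counts as shares of the sample, one option is to add a proportion column after count(). The sketch below uses a small made-up tibble (toy_pums is a hypothetical stand-in, not the real data) so that it runs on its own. With the real data, the same pipeline would presumably start from PUMS_cleaned.

```r
library(dplyr)

# Hypothetical toy data standing in for PUMS_cleaned
toy_pums <- tibble(
  education = c("BAorhigher", "lessthanBA", "lessthanBA", "lessthanBA")
)

education_share <- toy_pums %>%
  count(education) %>%          # one row per education level
  mutate(share = n / sum(n))    # fraction of all rows in each level

education_share
```

Applied to the counts above, 18,563 of 67,248 rows works out to roughly 27.6% with a BA or higher.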
Hint: Take PUMS_cleaned, pipe it to dplyr::count, and pipe it to dplyr::filter. 185 people majored in finance.
PUMS_cleaned %>% count(field_of_degree) %>% filter(field_of_degree == "Finance")
## # A tibble: 1 x 2
## field_of_degree n
## <fct> <int>
## 1 Finance 185
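The same count can also be reached by filtering first and counting second, which avoids tabulating every field. This is only a sketch on made-up data (toy_pums is hypothetical). It also shows that filter() drops rows where field_of_degree is NA, because the comparison evaluates to NA for them.

```r
library(dplyr)

# Hypothetical toy data standing in for PUMS_cleaned
toy_pums <- tibble(
  field_of_degree = c("Finance", NA, "Finance", "English Language", NA)
)

finance_count <- toy_pums %>%
  filter(field_of_degree == "Finance") %>%  # NA rows are dropped here
  count(field_of_degree)

finance_count
```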
Hint: Add scale_x_log10() to the code to transform the data and reveal its structure. Refer to the ggplot2 cheatsheet. Google it.
The typical income is around $70,000 to $80,000, and most people make between $0 and $60,000.
ggplot(PUMS_cleaned, aes(income)) +
  geom_histogram() +
  scale_x_log10()
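A log10 axis cannot display missing or non-positive incomes, so ggplot2 removes those rows with a warning. The sketch below counts such rows before plotting and sets the number of bins explicitly. It runs on made-up data: toy_pums, n_dropped, and income_plot are hypothetical names, and bins = 30 simply matches the ggplot2 default rather than a tuned choice.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for PUMS_cleaned
toy_pums <- tibble(income = c(11800, 8800, 10000, 112000, NA, 0, 23900))

# Rows a log10 axis cannot show: missing or non-positive incomes
n_dropped <- toy_pums %>%
  filter(is.na(income) | income <= 0) %>%
  nrow()

# Plot only the rows that can appear on the log scale
income_plot <- toy_pums %>%
  filter(!is.na(income), income > 0) %>%
  ggplot(aes(income)) +
  geom_histogram(bins = 30) +   # explicit bin count (30 is the default)
  scale_x_log10()

n_dropped
```

Knowing how many rows were dropped helps when interpreting the histogram, since the plot describes only people with a positive reported income.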
Hint: Take PUMS_cleaned, pipe it to dplyr::group_by, pipe it to dplyr::summarise, and pipe it to dplyr::arrange.
The top field of degree in terms of median income is Petroleum Engineering, with a median income of $188,000.
PUMS_cleaned %>%
  group_by(field_of_degree) %>%
  summarise(median_income = median(income)) %>%
  arrange(desc(median_income))
## # A tibble: 169 x 2
## field_of_degree median_income
## <fct> <dbl>
## 1 Petroleum Engineering 188000
## 2 Materials Science 154800
## 3 Nuclear Engineering 148300
## 4 Physical Sciences 129000
## 5 Mechanical Engineering Related Technologies 111900
## 6 Pharmacy Pharmaceutical Sciences 106700
## 7 Biological Engineering 101800
## 8 Metallurgical Engineering 99700
## 9 Naval Architecture 97000
## 10 Electrical Engineering 94000
## # ... with 159 more rows
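One caveat about this ranking: median() returns NA for any group that contains a missing income, and arrange(desc()) sorts those groups last. Whether that affects the table above cannot be told from the output alone. A possible variant, sketched here on made-up data (toy_pums and n_people are hypothetical names), ignores missing incomes and reports the group size, since a median based on very few people can be extreme.

```r
library(dplyr)

# Hypothetical toy data standing in for PUMS_cleaned
toy_pums <- tibble(
  field_of_degree = c("Finance", "Finance", "Finance", "Nursing", "Nursing"),
  income          = c(50000, NA, 70000, 40000, 60000)
)

median_by_field <- toy_pums %>%
  group_by(field_of_degree) %>%
  summarise(
    n_people      = n(),                          # group size, NA incomes included
    median_income = median(income, na.rm = TRUE)  # ignore missing incomes
  ) %>%
  arrange(desc(median_income))

median_by_field
```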