See below for information on the Public Use Microdata Sample (PUMS).
Public Use Microdata Sample (PUMS) Documentation: https://www.census.gov/programs-surveys/acs/technical-documentation/pums.html
2010 Census Public Use Microdata Area (PUMA) Reference Maps - New Hampshire: https://www.census.gov/geo/maps-data/maps/2010puma/st33_nh.html
# Load packages
library(tidyverse)
# Import data
PUMS_cleaned <- read.csv("~/R/busStat/Data/PUMS_cleaned.csv") %>% as_tibble()
PUMS_cleaned
## # A tibble: 67,248 x 6
## X PUMA age education field_of_degree income
## <int> <int> <int> <fct> <fct> <int>
## 1 1 1000 87 lessthanBA <NA> 11800
## 2 2 900 42 lessthanBA <NA> 8800
## 3 3 800 43 BAorhigher English Language 10000
## 4 4 800 43 lessthanBA <NA> 112000
## 5 5 800 14 lessthanBA <NA> NA
## 6 6 800 11 lessthanBA <NA> NA
## 7 7 900 63 lessthanBA <NA> 23900
## 8 8 900 59 BAorhigher Early Childhood Education 34600
## 9 9 900 65 lessthanBA <NA> 9400
## 10 10 300 50 lessthanBA <NA> 18000
## # ... with 67,238 more rows
Each row represents an individual person or housing unit from New Hampshire.
The variable education is a factor (shown as <fct> in the output above), that is, categorical data with the two levels BAorhigher and lessthanBA.
The R object PUMS_cleaned is a data frame (a tibble). It is laid out like a table: each row is one person or housing unit from New Hampshire, each column is a variable, and different columns can hold different data types.
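The statements about object and column types can be checked directly in R. The sketch below is a minimal illustration and does not read the real file: `toy` is a hypothetical three-row stand-in built from the first three rows printed above.

```r
library(tibble)

# Hypothetical stand-in for PUMS_cleaned, copied from the first three printed rows
toy <- tibble(
  PUMA = c(1000L, 900L, 800L),
  age = c(87L, 42L, 43L),
  education = factor(c("lessthanBA", "lessthanBA", "BAorhigher")),
  field_of_degree = factor(c(NA, NA, "English Language")),
  income = c(11800L, 8800L, 10000L)
)

is.data.frame(toy)      # a tibble is still a data frame
class(toy$education)    # type of a single column
levels(toy$education)   # the distinct categories of a factor
sapply(toy, class)      # one type per column
```

Running the same four calls on PUMS_cleaned itself should confirm what the `<int>` and `<fct>` tags in the printed tibble already suggest.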
Hint: Use View(). The first observation is an 87-year-old from PUMA 1000, the Seacoast region of New Hampshire that includes Portsmouth. This person does not have a BA, so field_of_degree is NA, and has an income of $11,800 a year.
Hint: Use count().
PUMS_cleaned %>% count(education)
## # A tibble: 2 x 2
## education n
## <fct> <int>
## 1 BAorhigher 18563
## 2 lessthanBA 48685
In this sample, 18,563 people living in New Hampshire have a BA or higher.
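A raw count is easier to interpret as a share of the sample. As a small sketch, the two counts from the output above can be turned into proportions with `mutate()`. The tibble `education_counts` is retyped by hand here for illustration and is not read from the file.

```r
library(dplyr)
library(tibble)

# Counts copied from the count(education) output above
education_counts <- tibble(
  education = c("BAorhigher", "lessthanBA"),
  n = c(18563L, 48685L)
)

# Divide each count by the total to get its share of the sample
education_share <- education_counts %>%
  mutate(share = n / sum(n))

education_share
```

With these counts, the BAorhigher share works out to roughly 27.6% of the 67,248 rows. On the full data, the same `mutate()` step could presumably be appended to `PUMS_cleaned %>% count(education)`.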
Hint: Take PUMS_cleaned, pipe it to dplyr::count, and pipe it to dplyr::filter.
PUMS_cleaned %>% count(field_of_degree) %>% filter(field_of_degree == "Finance")
## # A tibble: 1 x 2
## field_of_degree n
## <fct> <int>
## 1 Finance 185
There are 185 people in this data set who majored in finance.
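The same number can be reached by filtering first and counting second. The sketch below compares the two orders on a hypothetical five-row tibble; the values in `toy` are made up and are not taken from the PUMS file.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data, for illustration only
toy <- tibble(
  field_of_degree = c("Finance", NA, "English Language", "Finance", NA)
)

# Count every field, then keep the Finance row (the order used in the hint)
by_count <- toy %>%
  count(field_of_degree) %>%
  filter(field_of_degree == "Finance")

# Keep only Finance rows, then count what is left
by_filter <- toy %>%
  filter(field_of_degree == "Finance") %>%
  count()

by_count
by_filter
```

Both versions report n = 2 for the toy data. Rows where field_of_degree is NA are dropped by `filter()` in either order, because a comparison with NA is never TRUE.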
Hint: Add scale_x_log10() to the code to transform the data and reveal its structure. Refer to the ggplot2 cheatsheet. Google it.
ggplot(PUMS_cleaned, aes(income)) + geom_histogram() + scale_x_log10()
The typical income is about $45,000 a year, and most people make less than $100,000 a year.
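A value read off a histogram can be cross-checked with summary statistics. The sketch below uses only the ten incomes printed at the top of this document, so its result describes those ten rows and not the full sample. The vector `income` is retyped by hand for illustration.

```r
# Incomes from the ten rows printed above (NA for the two children)
income <- c(11800, 8800, 10000, 112000, NA, NA, 23900, 34600, 9400, 18000)

median(income)                 # NA, because missing values propagate
median(income, na.rm = TRUE)   # 14900 for these ten rows only

# Middle half of the non-missing incomes
quantile(income, probs = c(0.25, 0.75), na.rm = TRUE)
```

On the full data, `median(PUMS_cleaned$income, na.rm = TRUE)` would give a number to compare against the estimate taken from the plot.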
Hint: Take PUMS_cleaned, pipe it to dplyr::group_by, pipe it to dplyr::summarise, and pipe it to dplyr::arrange.
PUMS_cleaned %>%
  group_by(field_of_degree) %>%
  summarise(median_income = median(income)) %>%
  arrange(desc(median_income))
## # A tibble: 169 x 2
## field_of_degree median_income
## <fct> <dbl>
## 1 Petroleum Engineering 188000
## 2 Materials Science 154800
## 3 Nuclear Engineering 148300
## 4 Physical Sciences 129000
## 5 Mechanical Engineering Related Technologies 111900
## 6 Pharmacy Pharmaceutical Sciences 106700
## 7 Biological Engineering 101800
## 8 Metallurgical Engineering 99700
## 9 Naval Architecture 97000
## 10 Electrical Engineering 94000
## # ... with 159 more rows
The top field of degree in terms of median income is Petroleum Engineering, with a median income of $188,000.
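Two things can affect a ranking like this: `median()` returns NA for any group that contains a missing income, and a field with only a handful of people can land at the top by chance. The sketch below shows one possible variant that handles both, using a hypothetical tibble `toy` whose fields and incomes are made up for illustration.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data, not taken from the PUMS file
toy <- tibble(
  field_of_degree = c("Finance", "Finance", "Finance", "Nursing", "Nursing", NA),
  income = c(60000, NA, 80000, 50000, 70000, 20000)
)

median_by_field <- toy %>%
  filter(!is.na(field_of_degree)) %>%              # drop people without a degree field
  group_by(field_of_degree) %>%
  summarise(
    n = n(),                                       # group size, to judge reliability
    median_income = median(income, na.rm = TRUE)   # ignore missing incomes
  ) %>%
  arrange(desc(median_income))

median_by_field
```

In the toy data, Finance comes first with a median of 70000 from three people, one of whom has a missing income. Adding the `n` column to the real summary would show how many people stand behind each of the top fields.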