See below for information on the Public Use Microdata Sample (PUMS).
Public Use Microdata Sample (PUMS) Documentation: https://www.census.gov/programs-surveys/acs/technical-documentation/pums.html
2010 Census Public Use Microdata Area (PUMA) Reference Maps - New Hampshire: https://www.census.gov/geo/maps-data/maps/2010puma/st33_nh.html
# Load packages
library(tidyverse)
# Import data
PUMS_cleaned <- read.csv("~/R/business sat/DATA/PUMS_cleaned.csv") %>% as_tibble()
PUMS_cleaned
## # A tibble: 67,248 x 6
## X PUMA age education field_of_degree income
## <int> <int> <int> <fct> <fct> <int>
## 1 1 1000 87 lessthanBA <NA> 11800
## 2 2 900 42 lessthanBA <NA> 8800
## 3 3 800 43 BAorhigher English Language 10000
## 4 4 800 43 lessthanBA <NA> 112000
## 5 5 800 14 lessthanBA <NA> NA
## 6 6 800 11 lessthanBA <NA> NA
## 7 7 900 63 lessthanBA <NA> 23900
## 8 8 900 59 BAorhigher Early Childhood Education 34600
## 9 9 900 65 lessthanBA <NA> 9400
## 10 10 300 50 lessthanBA <NA> 18000
## # ... with 67,238 more rows
education and field_of_degree
numeric
data frame, since there is more than one variable
X, PUMA, age, education, field_of_degree, income
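The answers above about column types and the data structure can be checked directly in R. The following is a minimal sketch, assuming a small stand-in tibble named `toy` (a hypothetical name) built from the first three rows printed above, since the original CSV lives on a local path.

```r
library(tibble)

# Toy stand-in for PUMS_cleaned: values copied from the first three printed rows.
# `toy` is a hypothetical name used only for illustration.
toy <- tibble(
  X = 1:3,
  PUMA = c(1000L, 900L, 800L),
  age = c(87L, 42L, 43L),
  education = factor(c("lessthanBA", "lessthanBA", "BAorhigher")),
  field_of_degree = factor(c(NA, NA, "English Language")),
  income = c(11800L, 8800L, 10000L)
)

# A tibble is a kind of data frame, so this returns TRUE
is.data.frame(toy)

# Storage class of each column (integer vs. factor)
col_classes <- sapply(toy, class)
col_classes

# Names of the categorical (factor) columns
factor_cols <- names(toy)[sapply(toy, is.factor)]
factor_cols
```

Running the same calls on `PUMS_cleaned` itself should report the types shown in the tibble header (`<int>` and `<fct>`).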
PUMS_cleaned %>% count(education)
## # A tibble: 2 x 2
## education n
## <fct> <int>
## 1 BAorhigher 18563
## 2 lessthanBA 48685
18,563 respondents are in the BAorhigher group.
Hint: Take PUMS_cleaned, pipe it to dplyr::count, and pipe it to dplyr::filter.
PUMS_cleaned %>% count(field_of_degree) %>% filter(field_of_degree == "Finance")
## # A tibble: 1 x 2
## field_of_degree n
## <fct> <int>
## 1 Finance 185
185 respondents have a degree in Finance.
Hint: Add scale_x_log10() to the code to transform the data and reveal its structure. Refer to the ggplot2 cheatsheet. Google it.
ggplot(PUMS_cleaned, aes(income)) +
  geom_histogram() +
  scale_x_log10()
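The printed rows show that income contains NA values, and a log10 axis cannot place missing or non-positive values. A possible variant, sketched here on a hypothetical toy tibble built from the ten incomes printed above, drops those rows explicitly before plotting. The `bins = 5` setting is an assumption chosen for the tiny sample, not a value from the original analysis.

```r
library(dplyr)
library(ggplot2)

# Toy incomes copied from the first ten printed rows (two are NA)
toy <- tibble::tibble(
  income = c(11800, 8800, 10000, 112000, NA, NA, 23900, 34600, 9400, 18000)
)

# Keep only values that can be drawn on a log10 axis
plot_data <- toy %>%
  filter(!is.na(income), income > 0)

# Same plot as above, with an explicit bin count for the small sample
p <- ggplot(plot_data, aes(income)) +
  geom_histogram(bins = 5) +
  scale_x_log10()

p
```

Filtering first makes the number of plotted rows explicit instead of relying on ggplot2 to drop values with a warning.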
Hint: Take PUMS_cleaned, pipe it to dplyr::group_by, pipe it to dplyr::summarise, and pipe it to dplyr::arrange.
PUMS_cleaned %>%
  group_by(field_of_degree) %>%
  summarise(median_income = median(income), n = n()) %>%
  arrange(desc(median_income))
## # A tibble: 169 x 3
## field_of_degree median_income n
## <fct> <dbl> <int>
## 1 Petroleum Engineering 188000 1
## 2 Materials Science 154800 4
## 3 Nuclear Engineering 148300 7
## 4 Physical Sciences 129000 7
## 5 Mechanical Engineering Related Technologies 111900 14
## 6 Pharmacy Pharmaceutical Sciences 106700 83
## 7 Biological Engineering 101800 8
## 8 Metallurgical Engineering 99700 7
## 9 Naval Architecture 97000 18
## 10 Electrical Engineering 94000 465
## # ... with 159 more rows
Petroleum Engineering has the highest median income, 188,000, although that figure is based on a single respondent (n = 1).
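Because several of the top fields have very few respondents, one way to make the ranking more robust is to ignore missing incomes and require a minimum group size. This is a sketch on hypothetical toy data, not the original analysis: the field labels "A", "B", "C" and the threshold `min_n` are assumptions for illustration.

```r
library(dplyr)

# Hypothetical toy data: field "B" has a single, very high earner
toy <- tibble::tibble(
  field_of_degree = c("A", "A", "A", "B", "C", "C", "C"),
  income = c(50000, 60000, NA, 188000, 40000, 45000, 70000)
)

min_n <- 3  # hypothetical minimum group size

result <- toy %>%
  filter(!is.na(field_of_degree)) %>%
  group_by(field_of_degree) %>%
  summarise(
    median_income = median(income, na.rm = TRUE),  # skip missing incomes
    n = n()                                        # counts all rows, including NA income
  ) %>%
  filter(n >= min_n) %>%       # drop fields with too few respondents
  arrange(desc(median_income))

result
```

With this filter the single-respondent field drops out of the ranking; the right threshold depends on how much sampling noise is acceptable.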