The dataset, PUMS_reduced.csv, comes from the US Census Bureau's 2012-2016 ACS PUMS data. The sample includes 67,248 New Hampshire residents.
See below for information on the Public Use Microdata Sample (PUMS).
# Load packages
library(tidyverse)
# Import data
PUMS_reduced <- read.csv("~/R/Business Stats/data/PUMS_reduced.csv") %>% as_tibble()
PUMS_reduced
## # A tibble: 67,248 x 7
## X PUMA age education field_of_degree income occupation
## <int> <int> <int> <fct> <fct> <int> <fct>
## 1 1 1000 87 lessthanBA <NA> 11800 <NA>
## 2 2 900 42 lessthanBA <NA> 8800 Cashiers
## 3 3 800 43 BAorhigher English Language 10000 Human Resources ~
## 4 4 800 43 lessthanBA <NA> 112000 Securities, Comm~
## 5 5 800 14 lessthanBA <NA> NA <NA>
## 6 6 800 11 lessthanBA <NA> NA <NA>
## 7 7 900 63 lessthanBA <NA> 23900 Driver/Sales Wor~
## 8 8 900 59 BAorhigher Early Childhood E~ 34600 Elementary And M~
## 9 9 900 65 lessthanBA <NA> 9400 Retail Salespers~
## 10 10 300 50 lessthanBA <NA> 18000 Retail Sales Wor~
## # ... with 67,238 more rows
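The variable occupation is stored as a factor (the <fct> tag under the column name), so it is categorical data rather than plain character data.
PUMS_reduced is a data frame (more specifically, a tibble): each row is one person and each column is one variable.
The first row shows a person who is 87 years old, has less than a bachelor's degree, and makes 11,800 a year. The occupation is missing (NA), so the data do not say whether this person is retired or unemployed.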
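The answers above can be checked directly in R. The following is a minimal sketch on a small made-up tibble (the name `toy` and its values are hypothetical, not taken from PUMS_reduced); the same calls should work on the real data.

```r
library(dplyr)

# Hypothetical toy data; the values are illustrative, not from PUMS_reduced
toy <- tibble(
  age        = c(87L, 42L, 43L),
  occupation = factor(c(NA, "Cashiers", "Cashiers"))
)

class(toy)             # includes "data.frame": a tibble is a kind of data frame
class(toy$occupation)  # "factor"
is.na(toy$occupation)  # TRUE marks a missing occupation, not a known job status
```

Checking `class()` on a column is a quick way to confirm what the `<fct>`, `<int>`, and `<dbl>` tags in the printed tibble mean.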
Hint: Use count() with the sort argument.
PUMS_reduced %>% count(occupation)
## # A tibble: 469 x 2
## occupation n
## <fct> <int>
## 1 Accountants And Auditors 496
## 2 Actors 4
## 3 Actuaries 9
## 4 Adhesive Bonding Machine Opera 6
## 5 Administrative Services Manage 53
## 6 Advertising And Promotions Man 8
## 7 Advertising Sales Agents 44
## 8 Aerospace Engineers 45
## 9 Agents And Business Managers O 15
## 10 "Agricultural And Food Science " 4
## # ... with 459 more rows
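Because count() was called without sort = TRUE, the output above is in alphabetical order, so it does not show which occupation is most common. Accountants And Auditors, with 496 people, is only the first occupation alphabetically.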
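To see what the sort argument from the hint changes, here is a small sketch on made-up data (the tibble `toy` is hypothetical). It also drops missing occupations first, which is an assumption about what "most common occupation" should mean; without that step the NA group can end up on top.

```r
library(dplyr)

# Hypothetical toy data with some missing occupations
toy <- tibble(
  occupation = c("Cashiers", NA, "Actors", "Cashiers", NA, NA, NA)
)

# Default: rows come back in alphabetical order of occupation
by_name <- toy %>% count(occupation)

# sort = TRUE: rows come back with the largest n first
by_count <- toy %>% count(occupation, sort = TRUE)

# Assumption: people with no recorded occupation should be excluded
by_count_known <- toy %>%
  filter(!is.na(occupation)) %>%
  count(occupation, sort = TRUE)
```

With `sort = TRUE`, the first row of the result is the most frequent value, so the answer can be read straight off the top of the table.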
Hint: Take PUMS_reduced, pipe it to dplyr::count, and pipe it to dplyr::filter. Remember you can enter more than one variable in the count() function.
PUMS_reduced %>% count(field_of_degree, occupation) %>% filter(field_of_degree == "Finance")
## # A tibble: 69 x 3
## field_of_degree occupation n
## <fct> <fct> <int>
## 1 Finance Accountants And Auditors 16
## 2 Finance Aircraft Pilots And Flight Eng 1
## 3 Finance Bill And Account Collectors 1
## 4 Finance Billing And Posting Clerks 1
## 5 Finance Bookkeeping, Accounting, And A 6
## 6 Finance Business Operations Specialist 1
## 7 Finance Cashiers 1
## 8 Finance Chefs And Head Cooks 1
## 9 Finance Chief Executives And Legislato 10
## 10 Finance "Claims Adjusters, Appraisers, " 1
## # ... with 59 more rows
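Among the ten rows shown, Accountants And Auditors is the most common occupation for Finance majors, with 16 people. The table is sorted alphabetically and 59 more rows are hidden, so a larger count could still appear further down.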
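The count-then-filter pattern from the hint can be combined with sort = TRUE so the most common occupation for one major appears first. This is a sketch on made-up data (the tibble `toy` and its values are hypothetical).

```r
library(dplyr)

# Hypothetical toy data; not taken from PUMS_reduced
toy <- tibble(
  field_of_degree = c("Finance", "Finance", "Finance", "English", "English"),
  occupation      = c("Financial Managers", "Cashiers", "Financial Managers",
                      "Cashiers", "Cashiers")
)

finance_jobs <- toy %>%
  count(field_of_degree, occupation, sort = TRUE) %>%  # count each pair, largest first
  filter(field_of_degree == "Finance")                 # keep only Finance majors

finance_jobs
```

Filtering after counting keeps the sorted order, so the top row of `finance_jobs` is the most frequent occupation among Finance majors in the toy data.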
Hint: Take PUMS_reduced, pipe it to dplyr::group_by, pipe it to dplyr::summarise, and pipe it to dplyr::arrange.
PUMS_reduced %>%
  group_by(field_of_degree) %>%
  summarise(median_income = median(income)) %>%
  arrange(desc(median_income))
## # A tibble: 169 x 2
## field_of_degree median_income
## <fct> <dbl>
## 1 Petroleum Engineering 188000
## 2 Materials Science 154800
## 3 Nuclear Engineering 148300
## 4 Physical Sciences 129000
## 5 Mechanical Engineering Related Technologies 111900
## 6 Pharmacy Pharmaceutical Sciences 106700
## 7 Biological Engineering 101800
## 8 Metallurgical Engineering 99700
## 9 Naval Architecture 97000
## 10 Electrical Engineering 94000
## # ... with 159 more rows
Petroleum Engineering is the field of degree with the highest median income in the data frame, at 188,000.
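One thing to watch with median() is missing income values: by default a single NA makes the whole group's median NA. The sketch below uses made-up data (the tibble `toy` is hypothetical) to compare the default with `na.rm = TRUE`; whether dropping missing incomes is appropriate for the real analysis is an assumption.

```r
library(dplyr)

# Hypothetical toy data with one missing income
toy <- tibble(
  field_of_degree = c("Finance", "Finance", "Finance", "English", "English"),
  income          = c(60000, 90000, NA, 40000, 50000)
)

medians <- toy %>%
  group_by(field_of_degree) %>%
  summarise(
    median_strict = median(income),               # NA if any income is missing
    median_income = median(income, na.rm = TRUE)  # ignores missing incomes
  ) %>%
  arrange(desc(median_income))                    # highest median first

medians
```

In the toy data, Finance has an NA under `median_strict` but a usable value under `median_income`, which is why fields with any missing income can drop out of a ranking built on the default.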
Hint: Take PUMS_reduced, pipe it to dplyr::group_by, pipe it to dplyr::summarise, pipe it to dplyr::arrange, and pipe it to data.frame().
PUMS_reduced %>%
  group_by(field_of_degree == "Finance") %>%
  summarise(median_income = median(income)) %>%
  arrange(desc(median_income)) %>%
  data.frame()
## field_of_degree.....Finance. median_income
## 1 TRUE 75000
## 2 FALSE 50000
## 3 NA NA
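Finance majors have a median income of 75,000, compared with 50,000 for people with any other field of degree. Finance is ranked 28th among fields of degree, although that rank does not appear in the output above, because the code only compares Finance with all other fields combined.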
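One way to get a field's rank directly is to add a rank column to the per-field medians before filtering. This is a sketch on made-up data (the tibble `toy` and the column name `rank` are hypothetical), using `na.rm = TRUE` as an assumption about how missing incomes should be handled.

```r
library(dplyr)

# Hypothetical toy data; not taken from PUMS_reduced
toy <- tibble(
  field_of_degree = c("Finance", "Finance", "English", "English",
                      "Nursing", "Nursing"),
  income          = c(70000, 80000, 40000, 50000, 85000, 95000)
)

finance_rank <- toy %>%
  group_by(field_of_degree) %>%
  summarise(median_income = median(income, na.rm = TRUE)) %>%
  mutate(rank = min_rank(desc(median_income))) %>%  # 1 = highest median income
  filter(field_of_degree == "Finance")

finance_rank
```

Grouping by `field_of_degree` itself, rather than by `field_of_degree == "Finance"`, keeps every field in the table, so the rank reflects Finance's position among all fields.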