Analyzing Census Data

The given dataset, PUMS_reduced.csv, comes from the US Census Bureau's 2012-2016 American Community Survey (ACS) PUMS data. The sample includes 67,248 New Hampshire residents.

See below for information on the Public Use Microdata Sample (PUMS):

Public Use Microdata Sample (PUMS) Documentation https://www.census.gov/programs-surveys/acs/technical-documentation/pums.html
PUMS Technical Documentation https://www.census.gov/programs-surveys/acs/technical-documentation/pums/documentation.2016.html
2010 Census Public Use Microdata Area (PUMA) Reference Maps - New Hampshire https://www.census.gov/geo/maps-data/maps/2010puma/st33_nh.html

# Load packages
library(tidyverse)

# Import data
PUMS_reduced <- read.csv("~/R/Business Stats/data/PUMS_reduced.csv") %>% as_tibble()

PUMS_reduced
## # A tibble: 67,248 x 7
##        X  PUMA   age education  field_of_degree    income occupation       
##    <int> <int> <int> <fct>      <fct>               <int> <fct>            
##  1     1  1000    87 lessthanBA <NA>                11800 <NA>             
##  2     2   900    42 lessthanBA <NA>                 8800 Cashiers         
##  3     3   800    43 BAorhigher English Language    10000 Human Resources ~
##  4     4   800    43 lessthanBA <NA>               112000 Securities, Comm~
##  5     5   800    14 lessthanBA <NA>                   NA <NA>             
##  6     6   800    11 lessthanBA <NA>                   NA <NA>             
##  7     7   900    63 lessthanBA <NA>                23900 Driver/Sales Wor~
##  8     8   900    59 BAorhigher Early Childhood E~  34600 Elementary And M~
##  9     9   900    65 lessthanBA <NA>                 9400 Retail Salespers~
## 10    10   300    50 lessthanBA <NA>                18000 Retail Sales Wor~
## # ... with 67,238 more rows

Q1. What type of data is the variable, occupation (i.e., numeric, character, logical)?

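The variable occupation holds text values, so conceptually it is character data. In this tibble R stores it as a factor, as the <fct> tag under the column name shows.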
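One way to check a column's type is to ask R directly. The sketch below uses a small made-up factor (occupation_toy is a hypothetical stand-in, not the PUMS column) to show how a factor differs from a plain character vector.

```r
# Toy vector standing in for the occupation column (values are made up)
occupation_toy <- factor(c("Cashiers", NA, "Cashiers", "Actors"))

class(occupation_toy)         # the object's class: "factor"
is.character(occupation_toy)  # FALSE, a factor is not a character vector
typeof(occupation_toy)        # "integer", factors store integer codes
levels(occupation_toy)        # the text labels behind those codes

# Converting to character recovers the plain text values
as.character(occupation_toy)
```

On the real data, the same check would be class(PUMS_reduced$occupation).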

Q2. What type of R object is PUMS_reduced (i.e., vector, matrix, data frame, list)? And why?

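PUMS_reduced is a data frame (more specifically a tibble) because it arranges the data in rows and columns, and its columns hold different types of data (integers and factors).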
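The object type can also be confirmed in code. This is a minimal sketch on a made-up two-row tibble (toy is a hypothetical name, and its values are only illustrative).

```r
library(tibble)

# Small made-up tibble with columns of different types
toy <- tibble(
  age       = c(87L, 42L),
  education = c("lessthanBA", "lessthanBA")
)

is.data.frame(toy)  # TRUE: a tibble is a kind of data frame
is.matrix(toy)      # FALSE: a matrix would need a single type for all cells
class(toy)          # "tbl_df" "tbl" "data.frame"
sapply(toy, class)  # one class per column
```

The mixed column types are what rule out a matrix, which holds a single type.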

Q3. Describe the first observation (first row) using all variables.

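The first row (X = 1) describes a person who lives in PUMA 1000, is 87 years old, has less than a bachelor's degree (so no field of degree is recorded), and has an income of 11,800 a year. The occupation is NA, which means no occupation is recorded; that is not necessarily the same as being unemployed.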

Q4. What is the most common occupation in New Hampshire?

Hint: Use count() with the sort argument.

PUMS_reduced %>% count(occupation)
## # A tibble: 469 x 2
##    occupation                           n
##    <fct>                            <int>
##  1 Accountants And Auditors           496
##  2 Actors                               4
##  3 Actuaries                            9
##  4 Adhesive Bonding Machine Opera       6
##  5 Administrative Services Manage      53
##  6 Advertising And Promotions Man       8
##  7 Advertising Sales Agents            44
##  8 Aerospace Engineers                 45
##  9 Agents And Business Managers O      15
## 10 "Agricultural And Food Science "     4
## # ... with 459 more rows

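The output above is ordered alphabetically, not by count, because the sort argument from the hint was not used. Accountants And Auditors (496 people) is simply the first occupation in alphabetical order, so this output alone does not show which occupation is most common. Rerunning the code as count(occupation, sort = TRUE) would put the most common occupation in the first row.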
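The effect of the sort argument mentioned in the hint can be sketched on a small made-up table. The data frame toy and its values are hypothetical and are not taken from the PUMS data.

```r
library(dplyr)

# Made-up occupations, NOT the PUMS data
toy <- data.frame(
  occupation = c("Cashiers", "Actors", "Cashiers", NA, "Cashiers", "Actors")
)

# Default: rows are ordered by occupation, not by count
toy %>% count(occupation)

# sort = TRUE orders rows by n, largest first
top <- toy %>% count(occupation, sort = TRUE)
top

# Optionally drop missing occupations so NA is not counted as a category
toy %>%
  filter(!is.na(occupation)) %>%
  count(occupation, sort = TRUE)
```

With sort = TRUE, the first row is the most common value. If many people have no recorded occupation, NA itself may come first, which is why the last step filters it out.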

Q5. What’s the most common occupation in New Hampshire among those with a finance degree?

Hint: Take PUMS_reduced, pipe it to dplyr::count, and pipe it to dplyr::filter. Remember you can enter more than one variable in the count() function.

PUMS_reduced %>% count(field_of_degree, occupation) %>% filter(field_of_degree == "Finance")
## # A tibble: 69 x 3
##    field_of_degree occupation                           n
##    <fct>           <fct>                            <int>
##  1 Finance         Accountants And Auditors            16
##  2 Finance         Aircraft Pilots And Flight Eng       1
##  3 Finance         Bill And Account Collectors          1
##  4 Finance         Billing And Posting Clerks           1
##  5 Finance         Bookkeeping, Accounting, And A       6
##  6 Finance         Business Operations Specialist       1
##  7 Finance         Cashiers                             1
##  8 Finance         Chefs And Head Cooks                 1
##  9 Finance         Chief Executives And Legislato      10
## 10 Finance         "Claims Adjusters, Appraisers, "     1
## # ... with 59 more rows

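Among the ten rows shown, Accountants And Auditors is the largest group, with 16 people. The output is ordered alphabetically and 59 more rows are hidden, so sorting by n (for example with arrange(desc(n))) is needed to confirm that it is the most common occupation overall.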
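A sketch of the count-then-filter pipeline from the hint, with a final sort added so the largest group comes first. The data frame toy is made up for illustration and does not reproduce the PUMS counts.

```r
library(dplyr)

# Made-up rows, NOT the PUMS data
toy <- data.frame(
  field_of_degree = c("Finance", "Finance", "Finance", "Nursing", NA),
  occupation      = c("Cashiers", "Accountants And Auditors",
                      "Accountants And Auditors", "Registered Nurses",
                      "Cashiers")
)

finance_top <- toy %>%
  count(field_of_degree, occupation) %>%   # one row per degree/occupation pair
  filter(field_of_degree == "Finance") %>% # keep Finance only (NA rows drop out)
  arrange(desc(n))                         # largest count first

finance_top
```

Sorting after the filter means the first row can be read off directly, without scanning hidden rows.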

Q6. What’s the top occupation in terms of median income? How much do they make?

Hint: Take PUMS_reduced, pipe it to dplyr::group_by, pipe it to dplyr::summarise, and pipe it to dplyr::arrange.

PUMS_reduced %>%
  group_by(field_of_degree) %>% 
  summarise(median_income = median(income)) %>%
  arrange(desc(median_income))
## # A tibble: 169 x 2
##    field_of_degree                             median_income
##    <fct>                                               <dbl>
##  1 Petroleum Engineering                              188000
##  2 Materials Science                                  154800
##  3 Nuclear Engineering                                148300
##  4 Physical Sciences                                  129000
##  5 Mechanical Engineering Related Technologies        111900
##  6 Pharmacy Pharmaceutical Sciences                   106700
##  7 Biological Engineering                             101800
##  8 Metallurgical Engineering                           99700
##  9 Naval Architecture                                  97000
## 10 Electrical Engineering                              94000
## # ... with 159 more rows

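The code above groups by field_of_degree rather than occupation, so it ranks fields of degree, not occupations. The top field of degree is Petroleum Engineering, with a median income of 188,000. Answering the question as asked requires group_by(occupation) instead.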
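Because the question asks about occupations, the same pipeline can be grouped by occupation instead. The following is a minimal sketch on made-up numbers (toy is hypothetical); it also passes na.rm = TRUE, on the assumption that missing incomes should be ignored rather than turning a group's median into NA.

```r
library(dplyr)

# Made-up incomes, NOT the PUMS data
toy <- data.frame(
  occupation = c("Cashiers", "Cashiers", "Actuaries", "Actuaries", "Actors"),
  income     = c(8800, 12000, 90000, NA, 30000)
)

# Without na.rm, a single missing income makes the median NA
median(c(90000, NA))

by_occupation <- toy %>%
  group_by(occupation) %>%
  summarise(median_income = median(income, na.rm = TRUE)) %>%
  arrange(desc(median_income))

by_occupation
```

The first row of by_occupation is the top occupation by median income in the toy data.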

Q7. Where does Finance rank in terms of median income (for example, 1st, 2nd or 3rd)? How much do they make?

Hint: Take PUMS_reduced, pipe it to dplyr::group_by, pipe it to dplyr::summarise, pipe it to dplyr::arrange, and pipe it to data.frame().

PUMS_reduced %>%
  group_by(field_of_degree == "Finance") %>% 
  summarise(median_income = median(income)) %>%
  arrange(desc(median_income)) %>%
  data.frame
##   field_of_degree.....Finance. median_income
## 1                         TRUE         75000
## 2                        FALSE         50000
## 3                           NA            NA

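Finance ranks 28th by median income, and people with a finance degree have a median income of 75,000. Note that the output above only compares Finance with all other fields combined, so it confirms the 75,000 figure but not the rank; the rank has to be read from the full ranking by field_of_degree.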
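One way to read off a rank directly is to number the rows after sorting and then filter for the field of interest. This sketch uses made-up incomes (toy and the rank column are hypothetical names) and assumes rows with a missing field of degree should be excluded from the ranking.

```r
library(dplyr)

# Made-up incomes, NOT the PUMS data
toy <- data.frame(
  field_of_degree = c("Finance", "Finance", "Nursing", "Nursing", "History", NA),
  income          = c(70000, 80000, 90000, 100000, 40000, 20000)
)

ranked <- toy %>%
  filter(!is.na(field_of_degree)) %>%                      # drop rows with no degree field
  group_by(field_of_degree) %>%
  summarise(median_income = median(income, na.rm = TRUE)) %>%
  arrange(desc(median_income)) %>%
  mutate(rank = row_number())                              # 1 = highest median income

finance_row <- ranked %>% filter(field_of_degree == "Finance")
finance_row
```

Grouping by field_of_degree itself, rather than by the logical test field_of_degree == "Finance", keeps one row per field so that a rank is meaningful.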

Tyler McDonald

March 18, 2019