See below for information on the Public Use Microdata Sample (PUMS).

Public Use Microdata Sample (PUMS) Documentation: https://www.census.gov/programs-surveys/acs/technical-documentation/pums.html

2010 Census Public Use Microdata Area (PUMA) Reference Maps - New Hampshire: https://www.census.gov/geo/maps-data/maps/2010puma/st33_nh.html

# Load packages
library(tidyverse)

# Import data
PUMS_cleaned <- read.csv("~/R/Business Stats/data/PUMS_cleaned.csv") %>% as_tibble()

PUMS_cleaned
## # A tibble: 67,248 x 6
##        X  PUMA   age education  field_of_degree           income
##    <int> <int> <int> <fct>      <fct>                      <int>
##  1     1  1000    87 lessthanBA <NA>                       11800
##  2     2   900    42 lessthanBA <NA>                        8800
##  3     3   800    43 BAorhigher English Language           10000
##  4     4   800    43 lessthanBA <NA>                      112000
##  5     5   800    14 lessthanBA <NA>                          NA
##  6     6   800    11 lessthanBA <NA>                          NA
##  7     7   900    63 lessthanBA <NA>                       23900
##  8     8   900    59 BAorhigher Early Childhood Education  34600
##  9     9   900    65 lessthanBA <NA>                        9400
## 10    10   300    50 lessthanBA <NA>                       18000
## # ... with 67,238 more rows

Q1. What does the row represent?

Each row represents a different person in New Hampshire.

Q2. What type of data is the variable, education (i.e., numeric, character, logical)?

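The variable education is a factor (shown as <fct> in the output above), which means it is categorical data with two levels, BAorhigher and lessthanBA. Of the three types listed, it is closest to character data; it is neither numeric nor logical.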
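One way to check a variable's type is to ask R directly. The sketch below uses a small stand-in vector built from the first few printed values rather than the real PUMS_cleaned column, so the name `education` here refers to toy data only.

```r
# Toy stand-in for the education column (values copied from the printed rows)
education <- factor(c("lessthanBA", "lessthanBA", "BAorhigher", "lessthanBA"))

class(education)         # "factor"
levels(education)        # "BAorhigher" "lessthanBA"
is.numeric(education)    # FALSE
is.character(education)  # FALSE: a factor stores integer codes plus text labels
```

On the real data, `class(PUMS_cleaned$education)` should report the same class that the `<fct>` header indicates.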

Q3. What type of R object is PUMS_cleaned (i.e., vector, matrix, data frame, list)? And why?

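PUMS_cleaned is a data frame (more precisely a tibble, since it was converted with as_tibble()). It stores data in rows and columns, and its columns hold different types (integers and factors), which a vector or matrix cannot do.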
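The "why" can be illustrated with a minimal sketch. It assumes a toy data frame named `toy`, built from the first three printed rows, and shows that a data frame keeps a separate type per column while a matrix forces everything into one type.

```r
# Toy data frame built from the first three printed rows (not the full data set)
toy <- data.frame(
  age       = c(87L, 42L, 43L),
  education = factor(c("lessthanBA", "lessthanBA", "BAorhigher")),
  income    = c(11800L, 8800L, 10000L)
)

is.data.frame(toy)  # TRUE
sapply(toy, class)  # one class per column: integer, factor, integer

# Coercing to a matrix forces a single type, so the numbers become text
m <- as.matrix(toy)
is.character(m)     # TRUE
```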

Q4. Describe the first observation (first row) using all variables.

The first observation is an 87-year-old New Hampshire resident living in PUMA 1000 who has less than a BA, so no field of degree is recorded (NA), and whose income is 11,800. Hint: Use View().
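Besides View(), the first row can be pulled out in the console. The following is a sketch on a toy data frame (`toy`, copied from the first three printed rows); on the real data the same indexing would be `PUMS_cleaned[1, ]`.

```r
# Toy copy of the first three printed rows (not the full data set)
toy <- data.frame(
  X               = 1:3,
  PUMA            = c(1000L, 900L, 800L),
  age             = c(87L, 42L, 43L),
  education       = factor(c("lessthanBA", "lessthanBA", "BAorhigher")),
  field_of_degree = factor(c(NA, NA, "English Language")),
  income          = c(11800L, 8800L, 10000L)
)

first_obs <- toy[1, ]  # row 1, all columns
first_obs
str(first_obs)         # shows each variable with its type and value
```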

Q5. How many people have BA or higher?

PUMS_cleaned %>% count(education)
## # A tibble: 2 x 2
##   education      n
##   <fct>      <int>
## 1 BAorhigher 18563
## 2 lessthanBA 48685

There are 18,563 people with a BA or higher in the data frame. Hint: Use count().
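To put the count in context, it can be converted to a share of all rows. This small sketch uses only the two counts printed above; the names `counts` and `shares` are introduced here for illustration.

```r
# Counts copied from the count(education) output
counts <- c(BAorhigher = 18563, lessthanBA = 48685)

sum(counts)                    # 67248, matching the number of rows in the tibble
shares <- counts / sum(counts)
round(100 * shares, 1)         # BAorhigher 27.6, lessthanBA 72.4
```

So roughly 28% of the people in the data have a BA or higher.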

Q6. How many people have majored in finance?

Hint: Take PUMS_cleaned, pipe it to dplyr::count, and pipe it to dplyr::filter.

PUMS_cleaned %>% count(field_of_degree) %>% filter(field_of_degree == "Finance")
## # A tibble: 1 x 2
##   field_of_degree     n
##   <fct>           <int>
## 1 Finance           185

There are 185 people who majored in Finance.
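A quick base R cross-check of this kind of count is to sum a logical comparison. The sketch below runs on a hypothetical five-element vector (not the real column) and shows why `na.rm = TRUE` matters when the field is missing for many people.

```r
# Hypothetical toy vector: NA stands for people with no recorded field of degree
field_of_degree <- factor(c(NA, "Finance", "English Language", "Finance", NA))

# Comparing NA to "Finance" gives NA, so the plain sum is NA
sum(field_of_degree == "Finance")

# Dropping the NA comparisons gives the actual count
n_finance <- sum(field_of_degree == "Finance", na.rm = TRUE)
n_finance  # 2
```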

Q7. Create a histogram for income. What’s the story (i.e., What’s the typical income, What’s the range of income most people make)?

Hint: Add scale_x_log10() to the code to transform the data and reveal its structure. Refer to the ggplot2 cheatsheet. Google it.

ggplot(PUMS_cleaned, aes(income)) + geom_histogram() +
  scale_x_log10()

Most people make between 60,000 and 100,000.
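A reading taken from a histogram can be checked against numeric summaries such as the median and quartiles. The sketch below uses only the ten income values printed at the top of this document, so its numbers do not describe the full data set; on the real data, `income` would be replaced by `PUMS_cleaned$income`.

```r
# Income values from the ten printed rows only (NA = no income recorded)
income <- c(11800, 8800, 10000, 112000, NA, NA, 23900, 34600, 9400, 18000)

# na.rm = TRUE is needed, otherwise any NA makes the result NA
median(income, na.rm = TRUE)  # 14900 for these ten rows

# The middle 50% of values lies between the first and third quartiles
quantile(income, probs = c(0.25, 0.75), na.rm = TRUE)
```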

Q8. What’s the top field of degree in terms of median income? How much do they make?

Hint: Take PUMS_cleaned, pipe it to dplyr::group_by, pipe it to dplyr::summarise, and pipe it to dplyr::arrange.

PUMS_cleaned %>% 
  group_by(field_of_degree) %>% 
  summarise(median_income = median(income), n = n()) %>% 
  arrange(desc(median_income))
## # A tibble: 169 x 3
##    field_of_degree                             median_income     n
##    <fct>                                               <dbl> <int>
##  1 Petroleum Engineering                              188000     1
##  2 Materials Science                                  154800     4
##  3 Nuclear Engineering                                148300     7
##  4 Physical Sciences                                  129000     7
##  5 Mechanical Engineering Related Technologies        111900    14
##  6 Pharmacy Pharmaceutical Sciences                   106700    83
##  7 Biological Engineering                             101800     8
##  8 Metallurgical Engineering                           99700     7
##  9 Naval Architecture                                  97000    18
## 10 Electrical Engineering                              94000   465
## # ... with 159 more rows

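Petroleum Engineering has the highest median income at 188,000, although this figure is based on a single person (n = 1), so it is not a reliable estimate for the field.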
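Because several of the top fields have very few people, one option is to rank only fields with a minimum number of observations. The following is a sketch under stated assumptions: the data in `toy` are invented for illustration, and the threshold `min_n` is an arbitrary choice rather than part of the original analysis.

```r
library(dplyr)

# Hypothetical toy data: field names and incomes are invented
toy <- tibble(
  field_of_degree = c("A", "B", "B", "B", "C", "C", "C"),
  income          = c(188000, 90000, 94000, NA, 50000, 60000, 70000)
)

min_n <- 3  # assumed cutoff for "enough" people in a field

ranked <- toy %>%
  filter(!is.na(field_of_degree)) %>%            # drop people with no degree field
  group_by(field_of_degree) %>%
  summarise(median_income = median(income, na.rm = TRUE), n = n()) %>%
  filter(n >= min_n) %>%                         # ignore fields with too few people
  arrange(desc(median_income))

ranked
```

In this toy example field A has the highest income but only one person, so it is dropped and field B ranks first.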