Analyzing US Census Data in R

Q1. What does the row represent?
Q2. What type of data is the variable, education (i.e., numeric, character, logical)?
Q3. What type of R object is PUMS_cleaned (i.e., vector, matrix, data frame, list)? And why?
Q4. Describe the first observation (first row) using all variables.
Q5. How many people have BA or higher?
Q6. How many people have majored in finance?
Q7. Create a histogram for income. What’s the story (i.e., what’s the typical income, and what’s the range of income most people make)?
Q8. What’s the top field of degree in terms of median income? How much do they make?

See below for information on the Public Use Microdata Sample (PUMS):

Public Use Microdata Sample (PUMS) Documentation https://www.census.gov/programs-surveys/acs/technical-documentation/pums.html

2010 Census Public Use Microdata Area (PUMA) Reference Maps - New Hampshire https://www.census.gov/geo/maps-data/maps/2010puma/st33_nh.html

# Load packages
library(tidyverse)

# Import data
PUMS_cleaned <- read.csv("~/R/business sat/DATA/PUMS_cleaned.csv") %>% as_tibble()

PUMS_cleaned
## # A tibble: 67,248 x 6
##        X  PUMA   age education  field_of_degree           income
##    <int> <int> <int> <fct>      <fct>                      <int>
##  1     1  1000    87 lessthanBA <NA>                       11800
##  2     2   900    42 lessthanBA <NA>                        8800
##  3     3   800    43 BAorhigher English Language           10000
##  4     4   800    43 lessthanBA <NA>                      112000
##  5     5   800    14 lessthanBA <NA>                          NA
##  6     6   800    11 lessthanBA <NA>                          NA
##  7     7   900    63 lessthanBA <NA>                       23900
##  8     8   900    59 BAorhigher Early Childhood Education  34600
##  9     9   900    65 lessthanBA <NA>                        9400
## 10    10   300    50 lessthanBA <NA>                       18000
## # ... with 67,238 more rows

Q1. What does the row represent?

Each row represents one person in the sample. The columns (PUMA, age, education, field_of_degree, income) describe that person.

Q2. What type of data is the variable, education (i.e., numeric, character, logical)?

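It is not numeric. The printout shows education as <fct>, a factor, which is R's type for categorical data. Of the options listed it is closest to character: its values are the labels lessthanBA and BAorhigher.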

Q3. What type of R object is PUMS_cleaned (i.e., vector, matrix, data frame, list)? And why?

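A data frame (specifically a tibble, as the first line of the printout shows). Its columns hold different types of data (integer and factor), which a vector or matrix cannot do, and every column has the same number of rows, which a plain list does not require.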
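The answers to Q2 and Q3 can be checked in the console. The sketch below is a minimal example, assuming a small stand-in tibble (the hypothetical name pums_sample) built from the first three rows printed above; with the real data, the same calls would be made on PUMS_cleaned.

```r
library(tidyverse)

# Stand-in for PUMS_cleaned (assumption: first three printed rows only)
pums_sample <- tibble(
  age       = c(87L, 42L, 43L),
  education = factor(c("lessthanBA", "lessthanBA", "BAorhigher"))
)

# Q2: type of the education column
class(pums_sample$education)

# Q3: classes of the whole object, and whether it is a data frame
class(pums_sample)
is.data.frame(pums_sample)
```

Note that read.csv in R 4.0 and later imports text columns as character unless stringsAsFactors = TRUE is set, so class() may report "character" instead of "factor" on a newer installation.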

Q4. Describe the first observation (first row) using all variables.

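The first person (X = 1) lives in PUMA 1000, is 87 years old, has less than a BA, has no field of degree recorded (NA), and has an income of 11800.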

Q5. How many people have BA or higher?

Q6. How many people have majored in finance?

Hint: Take PUMS_cleaned, pipe it to dplyr::count, and pipe it to dplyr::filter.
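The hint for Q5 and Q6 can be sketched as follows. This is a minimal example, not the answer itself: it assumes a stand-in tibble (the hypothetical name pums_sample) built from the ten rows printed above, and it assumes the finance major is labelled "Finance" in field_of_degree, which should be checked against the real data. The actual counts require running the same pipeline on PUMS_cleaned.

```r
library(tidyverse)

# Stand-in for PUMS_cleaned (assumption: the ten printed rows only)
pums_sample <- tibble(
  education = c("lessthanBA", "lessthanBA", "BAorhigher", "lessthanBA",
                "lessthanBA", "lessthanBA", "lessthanBA", "BAorhigher",
                "lessthanBA", "lessthanBA"),
  field_of_degree = c(NA, NA, "English Language", NA, NA,
                      NA, NA, "Early Childhood Education", NA, NA)
)

# Q5: count rows per education level, then keep the BA-or-higher row
ba_count <- pums_sample %>%
  count(education) %>%
  filter(education == "BAorhigher")

# Q6: same pattern on field_of_degree
# ("Finance" is an assumed label; inspect the counts to confirm the spelling)
finance_count <- pums_sample %>%
  count(field_of_degree) %>%
  filter(field_of_degree == "Finance")

ba_count
finance_count
```

On this ten-row sample nobody majored in finance, so finance_count has no rows; on the full data the n column holds the answer.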

Q7. Create a histogram for income. What’s the story (i.e., what’s the typical income, and what’s the range of income most people make)?

Hint: Add scale_x_log10() to the code to transform the data and reveal its structure. Refer to the ggplot2 cheatsheet. Google it.
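A minimal sketch of the histogram, assuming a stand-in tibble (the hypothetical name pums_sample) that holds only the ten incomes printed above. The bin count of 30 and the axis labels are illustrative choices. Rows with missing income are dropped, and so are non-positive incomes in case the full data contains any, because they cannot be placed on a log scale.

```r
library(tidyverse)

# Stand-in for PUMS_cleaned (assumption: the ten printed incomes only)
pums_sample <- tibble(
  income = c(11800, 8800, 10000, 112000, NA, NA, 23900, 34600, 9400, 18000)
)

# Keep only incomes that can be shown on a log10 axis
income_known <- pums_sample %>%
  filter(!is.na(income), income > 0)

# Histogram of income on a log10 x-axis
income_hist <- ggplot(income_known, aes(x = income)) +
  geom_histogram(bins = 30) +
  scale_x_log10() +
  labs(x = "Income (log scale)", y = "Number of people")

income_hist
```

With the full PUMS_cleaned data, the tallest bars indicate the typical income, and the span of the bulk of the bars indicates the range most people make.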

Q8. What’s the top field of degree in terms of median income? How much do they make?

Hint: Take PUMS_cleaned, pipe it to dplyr::group_by, pipe it to dplyr::summarise, and pipe it to dplyr::arrange.
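The pipeline in the hint can be sketched as below. This is a minimal example under stated assumptions: pums_sample is a hypothetical stand-in built from the ten rows printed above, people without a field of degree are dropped before grouping, and missing incomes are ignored when taking the median. The real answer requires running it on PUMS_cleaned.

```r
library(tidyverse)

# Stand-in for PUMS_cleaned (assumption: the ten printed rows only)
pums_sample <- tibble(
  field_of_degree = c(NA, NA, "English Language", NA, NA,
                      NA, NA, "Early Childhood Education", NA, NA),
  income = c(11800, 8800, 10000, 112000, NA, NA, 23900, 34600, 9400, 18000)
)

# Median income per field of degree, highest first
median_by_field <- pums_sample %>%
  filter(!is.na(field_of_degree)) %>%
  group_by(field_of_degree) %>%
  summarise(median_income = median(income, na.rm = TRUE)) %>%
  arrange(desc(median_income))

median_by_field
```

The first row of the result is the top field and its median income. On the full data it may be worth adding n = n() inside summarise, since a field with very few people can have an unrepresentative median.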