ACS Microdata

For this homework assignment, we first load and clean wage data from IPUMS. The survey data are hosted on my professor’s GitHub account.

#First, let's load some libraries we will need
library(haven)
library(plyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:plyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
#Now we load the ACS Microdata
ipums<-read_dta("https://github.com/coreysparks/data/blob/master/usa_00045.dta?raw=true")

Data Recoding

As pointed out in class, not all of the values of the “incwage” variable represent real wage information. The codes “999999” and “999998” mean “N/A” and “Missing”, respectively, so I recode both as NA and call the cleaned variable “mywage”.

#the original values give us a weird mean for $incwage
paste("original $incwage mean value:",round(mean(ipums$incwage,na.rm = TRUE),digits=2),sep=" ")
## [1] "original $incwage mean value: 205672.4"
#So, we recode 999999 and 999998 as "NA"
ipums<-ipums %>% mutate(mywage=ifelse(incwage %in% c(999998,999999),NA, incwage))
#much better!
paste("new $mywage mean value:", round(mean(ipums$mywage,na.rm = TRUE),digits=2),sep=" ")
## [1] "new $mywage mean value: 27489.69"
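
As a quick sanity check on the recode, here is a minimal sketch on a made-up vector (the values in toy_wage are hypothetical, not taken from the ACS file). It uses base R only and shows how two sentinel codes inflate the mean until they are set to NA.

```r
# Hypothetical wage values; 999999 and 999998 are the IPUMS sentinel codes
toy_wage <- c(0, 25000, 40000, 999999, 999998)

# Same rule as above: treat the two sentinel codes as missing
toy_clean <- ifelse(toy_wage %in% c(999998, 999999), NA, toy_wage)

raw_mean   <- mean(toy_wage)                  # sentinels counted as real wages
clean_mean <- mean(toy_clean, na.rm = TRUE)   # sentinels dropped
n_recoded  <- sum(is.na(toy_clean))           # how many values became NA

print(c(raw_mean = raw_mean, clean_mean = clean_mean, n_recoded = n_recoded))
```

Counting the recoded values this way is also a cheap check on the real data, since it shows how many rows the cleaning step actually touched.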

Mean, Median, Standard Deviation and Sample Size

Now, we will calculate the mean, median, standard deviation and sample size of the cleaned data. Note that sample_size=n() counts every row, including rows where “mywage” is NA.

ipums %>%
  summarise(newmedian = median(mywage, na.rm = TRUE),
            newmean = mean(mywage, na.rm = TRUE),
            newsd = sd(mywage, na.rm = TRUE),
            sample_size = n())
## # A tibble: 1 x 4
##   newmedian  newmean   newsd sample_size
##       <dbl>    <dbl>   <dbl>       <int>
## 1      7000 27489.69 50665.1      300552
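
One thing to keep in mind when reading this table is that n() counts rows, not non-missing wages. The sketch below uses a small hypothetical vector and base R (with length() standing in for n()) to show the gap between the two counts; toy_mywage and its values are made up for illustration.

```r
# Hypothetical cleaned wages, with NA where a sentinel code was recoded
toy_mywage <- c(12000, NA, 0, 55000, NA, 7000)

n_rows     <- length(toy_mywage)              # what n() would report
n_valid    <- sum(!is.na(toy_mywage))         # values actually used by the statistics
toy_median <- median(toy_mywage, na.rm = TRUE)

print(c(n_rows = n_rows, n_valid = n_valid, median = toy_median))
```

In the dplyr pipeline, the same idea would be to add something like n_valid = sum(!is.na(mywage)) inside summarise(), next to sample_size.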

Summary Statistics of Adults in the Labor Force, by Education and Sex

Here, we calculate summary statistics for adults (over age 25) in the labor force, by education and sex.

#according to the IPUMS codebook, "2" in the variable "$labforce" represents someone who is in the labor force, while "1" and "2" are "male" and "female", respectively, in the variable "$sex"
ipums %>%
  #recode the missing-value codes again so this chunk runs on its own
  mutate(mywage = ifelse(incwage %in% c(999998, 999999), NA, incwage)) %>%
  filter(labforce == 2, age > 25) %>%
  mutate(sexrecode = ifelse(sex == 1, "Male", "Female")) %>%
  #collapse the detailed education codes into four levels plus "missing"
  mutate(edurec = case_when(educd %in% c(0:61) ~ "nohs",
                            educd %in% c(62:64) ~ "hs",
                            educd %in% c(65:100) ~ "somecoll",
                            educd %in% c(101:116) ~ "collgrad",
                            educd == 999 ~ "missing")) %>%
  group_by(edurec, sexrecode) %>%
  summarise(newmedian = median(mywage, na.rm = TRUE),
            newmean = mean(mywage, na.rm = TRUE),
            newsd = sd(mywage, na.rm = TRUE),
            sample_size = n())
## # A tibble: 8 x 6
## # Groups:   edurec [?]
##     edurec sexrecode newmedian  newmean    newsd sample_size
##      <chr>     <chr>     <dbl>    <dbl>    <dbl>       <int>
## 1 collgrad    Female     49500 58349.99 57186.03       22975
## 2 collgrad      Male     70000 93302.39 98629.45       23391
## 3       hs    Female     21900 25729.43 26523.98       13199
## 4       hs      Male     33000 38268.80 37317.79       17005
## 5     nohs    Female     15000 18361.46 23312.47        3724
## 6     nohs      Male     23000 28374.60 33113.55        6261
## 7 somecoll    Female     28000 33058.59 31811.67       18914
## 8 somecoll      Male     40550 48672.07 47372.76       18670
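
To double-check the education cut points, here is a minimal base R sketch of the same recode as a standalone function, run on the boundary codes. The helper name recode_educd and the test vector boundary_codes are hypothetical; the ranges are copied from the case_when() call above, and the code 500 is included only to show what happens to a value outside every range.

```r
# Hypothetical helper; cut points copied from the case_when() call above
recode_educd <- function(educd) {
  out <- rep(NA_character_, length(educd))   # unmatched codes stay NA
  out[educd %in% 0:61]    <- "nohs"
  out[educd %in% 62:64]   <- "hs"
  out[educd %in% 65:100]  <- "somecoll"
  out[educd %in% 101:116] <- "collgrad"
  out[educd %in% 999]     <- "missing"
  out
}

# First and last code of each range, the missing code, and one unmatched code
boundary_codes <- c(0, 61, 62, 64, 65, 100, 101, 116, 999, 500)
print(data.frame(educd = boundary_codes, edurec = recode_educd(boundary_codes)))
```

Any code that falls outside all of the ranges comes back as NA, which matches how case_when() treats values that satisfy none of its conditions.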

This completes my homework for the week. Thank you!