In class, we used the 2017 American Community Survey (downloaded from https://usa.ipums.org/usa/) to study the relationship between education and income from wages and salary. In this exercise, we will revisit the data but
This dataset, acs2017.csv, is loaded on Canvas. A description of each variable you will need for this exercise is below.
| Name | Description |
|---|---|
| age | Age in 2017 |
| incwage | Income from wages and salary |
| educ | Education |
| sex | Gender |
| statefip | State FIPS Code (Oregon=41) |
Note that data are missing when a respondent chooses not to answer a question or when a question does not apply to them.
We will reconstruct the data as we did in class, with one difference in bold.
Load the data in R. Restrict the sample to 25- to 54-year-olds living in Oregon. Generate a numeric variable equal to each individual's years of education. Drop the top-coded income observations.
library(readr)
acs2017 <- read_csv("D:/R Work/In Class/acs2017.csv")
## Parsed with column specification:
## cols(
## year = col_double(),
## sample = col_character(),
## serial = col_double(),
## cbserial = col_double(),
## hhwt = col_double(),
## statefip = col_character(),
## gq = col_character(),
## pernum = col_double(),
## perwt = col_double(),
## sex = col_character(),
## age = col_character(),
## educ = col_character(),
## educd = col_character(),
## incwage = col_double()
## )
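# Note: I am still seeing observations with ages 4, 5 and 7 in the data set after this filter. One likely contributor: age was read in as character (see the column specification above), so the comparisons below are made on text rather than on numbers.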
acs<-subset(acs2017,((acs2017$age>=25) & (acs2017$age<=54) & (acs2017$statefip=="Oregon")))
# Start with a numeric column of missing values, then fill it in by education level.
acs$hgc <- NA_real_
acs$hgc[acs$educ=="Nursery school to grade 4"]<-4
acs$hgc[acs$educ=="Grade 5, 6, 7, or 8"]<-8
acs$hgc[acs$educ=="Grade 9"]<-9
acs$hgc[acs$educ=="Grade 10"]<-10
acs$hgc[acs$educ=="Grade 11"]<-11
acs$hgc[acs$educ=="Grade 12"]<-12
acs$hgc[acs$educ=="1 year of college"]<-13
acs$hgc[acs$educ=="2 years of college"]<-14
acs$hgc[acs$educ=="3 years of college"]<-15
acs$hgc[acs$educ=="4 years of college"]<-16
acs$hgc[acs$educ=="5+ years of college"]<-17
acs$incwage[acs$incwage==999999]<-NA
acs<-na.omit(acs)
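The stray ages mentioned in the note above are consistent with `age` being compared as text. The following is a minimal sketch on a toy data frame, not the real survey: the values in `toy`, including the non-numeric label, are assumptions made up for illustration, and `age_num` is a hypothetical helper column.

```r
# Toy data standing in for acs2017 (hypothetical values, not the real survey).
toy <- data.frame(
  age      = c("4", "5", "25", "40", "54", "60", "Less than 1 year old"),
  statefip = "Oregon",
  stringsAsFactors = FALSE
)

# Character comparison: 25 and 54 are coerced to "25" and "54",
# so the rows are compared as text and "4" and "5" pass the filter.
by_text <- subset(toy, age >= 25 & age <= 54)

# Numeric comparison: convert first; labels that are not numbers become NA
# (suppressWarnings hides the coercion warning for those labels).
toy$age_num <- suppressWarnings(as.numeric(toy$age))
by_number <- subset(toy, !is.na(age_num) & age_num >= 25 & age_num <= 54)

print(by_text$age)
print(by_number$age)
```

If the same conversion were applied to the real data, any age label that is not a plain number would turn into `NA`, so it would be worth checking how many rows that affects before dropping them.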
Plot the conditional mean of income from salary and wages as a function of years of education.
library(ggplot2)
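# Plot one point per education level: the mean of incwage at each value of hgc.
# `fun` replaces the deprecated `fun.y` argument (ggplot2 3.3.0 and later).
ggplot(acs, aes(x = hgc, y = incwage)) +
  geom_point(stat = "summary", fun = "mean") +
  xlab("Years of Education") + ylab("Income") +
  theme_bw(base_size = 24)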
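The points in this plot are conditional means: the average of `incwage` within each value of `hgc`. As a rough sketch, the same numbers can be computed as a table with base R's `aggregate()`. The data frame `toy` below holds made-up values used only for illustration.

```r
# Toy data (hypothetical values) with the same column names as acs.
toy <- data.frame(
  hgc     = c(12, 12, 16, 16, 16),
  incwage = c(20000, 30000, 50000, 60000, 70000)
)

# Mean of incwage within each level of hgc: one row per education level.
cond_mean <- aggregate(incwage ~ hgc, data = toy, FUN = mean)
print(cond_mean)
```

Replacing `toy` with `acs` should give the values that the plot draws, one row per level of `hgc`.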
Regress income from salary and wages on years of education. Interpret the coefficient.
lm(incwage~hgc,data=acs)
##
## Call:
## lm(formula = incwage ~ hgc, data = acs)
##
## Coefficients:
## (Intercept)          hgc
##      -55585         6991
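# Each additional year of education is associated with wage and salary income that is about $6,991 higher on average. This describes an association in the sample, not necessarily a causal effect.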
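To make the coefficients concrete, here is a small worked example that plugs the reported intercept and slope into the fitted line. The function name `predict_income` is a hypothetical helper, and the education levels chosen (12 and 16 years) are only for illustration.

```r
# Coefficients copied from the lm() output above.
intercept <- -55585
slope     <- 6991

# Fitted value from the regression line for a given number of years of education.
predict_income <- function(years) intercept + slope * years

pred_12 <- predict_income(12)   # 12 years of education
pred_16 <- predict_income(16)   # 16 years of education
gap     <- pred_16 - pred_12    # equals 4 * slope

print(c(pred_12 = pred_12, pred_16 = pred_16, gap = gap))
```

The fitted line gives 28,307 at 12 years and 56,271 at 16 years, a gap of 27,964. Because the intercept is negative, the line predicts negative income below roughly 8 years of education, so the linear fit should be read with caution at the low end of `hgc`.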