The New York Times published an article “Do Millenial Men want Stay-at-Home Wives”, with disturbing graph showing that about 50% of millenial men agreed or strongly agreed with “It is much better for everyone involved if the man is the achiever outside the home and the woman takes care of the home and family.”
It’s not surprising that more men than women agree with the statement: self-interest would explain that. It’s surprising that there’s such a clear worsening trend. So, I wanted to look at the confidence intervals around the curves, because the General Social Survey isn’t that big, especially after it’s broken down into small age groups (the usual definition of `millennials’ would stretch maybe a decade higher than the 18-25 used in the article)
The General Social Survey is a wonderful public resource. You can do simple analyses online, for free, or if you’re a card-carrying survey nerd you can download all the data. That’s what I did.
The data are in Stata format, so we’ll load them into R with the haven package
library(haven)
GSS<-read_dta("~/Downloads/GSS_Stata/GSS7216_R1a.dta")
The New York Times didn’t give the variable name, but that’s why we have Google, and once I found it was called fefam it was easy to get official documentation
First, how many observations of this variable do we have for people 25 and under:
## How much data?
with(subset(GSS,age>=18 & age <=25 & !is.na(fefam)), table(sex,year))
## year
## sex 1977 1985 1986 1988 1989 1990 1991 1993 1994 1996 1998 2000 2002 2004
## 1 105 92 74 65 64 52 60 46 92 129 85 92 62 52
## 2 117 103 101 73 69 58 58 69 105 170 96 110 61 57
## year
## sex 2006 2008 2010 2012 2014 2016
## 1 87 65 76 67 66 85
## 2 120 77 91 72 75 113
As I suspected, not a whole lot. We should really use the weights that come with the dataset: the GSS takes an equal-probability sample of households, but non-response and variation in household size both have an effect. Also, the variable wasn’t measured every year; it will be convenient to just drop the years when we don’t have data.
## survey analysis
library(survey)
## Loading required package: grid
## Loading required package: Matrix
## Loading required package: survival
##
## Attaching package: 'survey'
## The following object is masked from 'package:graphics':
##
## dotchart
options(survey.lonely.psu="adjust")
GSS<-subset(GSS, year>1974)
gss<-svydesign(id=~vpsu, strata=~interaction(year,vstrat), weights=~wtssall, data=GSS, nest=TRUE)
gssyng<-subset(gss,age>=18 & age <=25 & !is.na(fefam))
gssyng<-subset(gssyng, year %in% c(1977, 1985:1986, 1988:2016))
So, we’ve got a survey object with all the years of this question in people under 25. The variable is on a 4-point scale, so we need to recode it to binary:
gssyng<-update(gssyng, losers=(fefam<=2))
Now the analysis: the proportion with losers==TRUE for each year and sex
means<-svyby(~losers,~year+sex,FUN=svymean,na.rm=TRUE,design=gssyng)
round(means[,1:3],2)
## year sex losersFALSE
## 1977.1 1977 1 0.45
## 1985.1 1985 1 0.71
## 1986.1 1986 1 0.70
## 1988.1 1988 1 0.82
## 1989.1 1989 1 0.64
## 1990.1 1990 1 0.62
## 1991.1 1991 1 0.67
## 1993.1 1993 1 0.79
## 1994.1 1994 1 0.83
## 1996.1 1996 1 0.72
## 1998.1 1998 1 0.75
## 2000.1 2000 1 0.68
## 2002.1 2002 1 0.74
## 2004.1 2004 1 0.71
## 2006.1 2006 1 0.70
## 2008.1 2008 1 0.62
## 2010.1 2010 1 0.62
## 2012.1 2012 1 0.60
## 2014.1 2014 1 0.52
## 2016.1 2016 1 0.89
## 1977.2 1977 2 0.58
## 1985.2 1985 2 0.73
## 1986.2 1986 2 0.70
## 1988.2 1988 2 0.81
## 1989.2 1989 2 0.75
## 1990.2 1990 2 0.85
## 1991.2 1991 2 0.75
## 1993.2 1993 2 0.87
## 1994.2 1994 2 0.87
## 1996.2 1996 2 0.75
## 1998.2 1998 2 0.83
## 2000.2 2000 2 0.79
## 2002.2 2002 2 0.69
## 2004.2 2004 2 0.71
## 2006.2 2006 2 0.89
## 2008.2 2008 2 0.76
## 2010.2 2010 2 0.75
## 2012.2 2012 2 0.86
## 2014.2 2014 2 0.74
## 2016.2 2016 2 0.74
with(means,
plot(year[means$sex==1],losersFALSE[means$sex==1],col="darkred",pch=19,type="b",xlab="year",ylab="% disagree",ylim=c(0,1))
)
with(means,
points(year[means$sex==2],losersFALSE[means$sex==2],col="grey",pch=19,type="b")
)
The graph agrees with the NY Times graph, except it has more data. The last data point looks really strange. Is that what the raw data really say?
with(subset(GSS, year==2016 & age<=25),table(fefam, sex))
## sex
## fefam 1 2
## 1 1 6
## 2 9 23
## 3 43 50
## 4 32 34
Yes, it is. The percentages for 18–34 and all ages also agree with the online explorer, so it’s not just me.
Now, a graph with uncertainty intervals
props<-svyby(~losers,~year+sex,FUN=svyciprop, method="xlogit",na.rm=TRUE,design=gssyng,vartype="ci")
with(props,
plot(year[props$sex==1],1-losers[props$sex==1],col="darkred",pch=19,type="b",xlab="year",ylab="% disagree",ylim=c(0,1))
)
with(props,
points(year[props$sex==2],1-losers[means$sex==2],col="darkgrey",pch=19,type="b")
)
with(props,
polygon(c(year[props$sex==2],rev(year[props$sex==2])),c( 1-ci_l[props$sex==2], rev(1-ci_u[props$sex==2])),col="#aaaaaa50",border=NA)
)
with(props,
polygon(c(year[props$sex==1],rev(year[props$sex==1])),c( 1-ci_l[props$sex==1], rev(1-ci_u[props$sex==1])),col="#F0000030",border=NA)
)
So, what should we believe about this trend? Well, the GSS isn’t the only data set showing it, but in the GSS it’s less robust than it looks. Perhaps things are a little more complicated.
Also, it’s impressive what can be done quickly and easily with a commodity laptop and internet connection, free software, and open data.