All the variables in this dataset were compiled from a sample of the National Survey of College Graduates through the IPUMS-HigherEd site.
For this assignment, we analyze the relationship between gender, ethnicity, education level and membership to professional organizations.
The count of outcome selected for this assignment is the number of membership to organizations.
The research questions we attempt to respond are:
1. Are higher levels of education associated with a larger number of memberships to professional organizations?
2. Are males more likely to exceed in the number of professional organizations they belong to when compared to females?
3. Is the White group more likely to have a larger number of memberships to professional organizations when compared to other minorities and Asian groups?
The data used for this analysis consists of a sample from the National Survey of College Graduates gathered through IPUMS-Higher Ed.
library(car)
library(stargazer)
##
## Please cite as:
## Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.1. https://CRAN.R-project.org/package=stargazer
library(survey)
## Loading required package: grid
## Loading required package: Matrix
## Loading required package: survival
##
## Attaching package: 'survey'
## The following object is masked from 'package:graphics':
##
## dotchart
library(sjPlot)
library(ggplot2)
library(pander)
library(knitr)
library(haven)
cpsipums3<-read_dta("~/Statistics II/Data for project/highered/highered_00008.dta")
names(cpsipums3) #print the column names
## [1] "personid" "year" "weight" "sample" "surid" "gender"
## [7] "minrty" "raceth" "bthus" "chtot" "dgrdg" "hrswkgr"
## [13] "prmbr" "salary"
#income grouping
cpsipums3$salary3<-ifelse(cpsipums3$salary==9999998:9999999, NA, cpsipums3$salary)
summary (cpsipums3$salary3)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0 44000 71000 906935 108000 9999998 6519
#number of children living in the household
cpsipums3$chtot<-recode(cpsipums3$chtot, recodes="00='no children'; 01='one child'; 02='one to three children'; 03='two or more children'; 04='more than 3 children'; 98=NA", as.factor.result=T)
cpsipums3$chtot<-relevel(cpsipums3$chtot, ref = "one child")
table(cpsipums3$chtot)
##
## one child two or more children
## 14472 20517
#gender
cpsipums3$female<-recode(cpsipums3$gender, recodes=1)
cpsipums3$male<-recode(cpsipums3$gender, recodes=2)
cpsipums3$gender<-recode(cpsipums3$gender, recodes="1='female'; 2='male'", as.factor.result=T)
table(cpsipums3$gender)
##
## female male
## 38626 45830
cpsipums3$gender<-relevel(cpsipums3$gender, ref = "female")
#race/ethnicity
#There are no entries in this data set under "other"
cpsipums3$asian<-recode(cpsipums3$raceth, recodes=1)
cpsipums3$white<-recode(cpsipums3$raceth, recodes=2)
cpsipums3$minorities<-recode(cpsipums3$raceth, recodes=3)
cpsipums3$other<-recode(cpsipums3$raceth, recodes=4)
cpsipums3$raceth<-recode(cpsipums3$raceth, recodes="1='asian'; 2='white'; 3='minorities'", as.factor.result=T)
cpsipums3$raceth<-relevel(cpsipums3$raceth, ref = "white")
table(cpsipums3$raceth)
##
## white asian minorities
## 51846 13868 18742
#education level
cpsipums3$dgrdg<-recode(cpsipums3$dgrdg, recodes="1='0bachelors'; 2='1masters'; 3='2doctorate'; 4='3professional'", as.factor.result=T)
table(cpsipums3$dgrdg, cpsipums3$gender)
##
## female male
## 0bachelors 18930 25269
## 1masters 16756 16585
## 2doctorate 1103 1611
## 3professional 1837 2365
Our outcome here is the number of professional organizations each individual belongs to.
Q: To how many regional, national, or international professional societies or associations do you currently belong?
The max amount of organizations that participants are members of in this sample is 6 with a mean of .85.
cpsipums3$organizations<-recode(cpsipums3$prmbr, recodes = "0000=0; 96=NA; 99=NA; 98=NA")
hist(cpsipums3$organizations)
summary(cpsipums3$organizations)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0000 0.0000 0.0000 0.8502 1.0000 6.0000 488
Below is a subset of data to have complete cases for the variables in the model and the survey design object
##To keep complete cases on key variables,
cpsipums3<-cpsipums3[is.na(cpsipums3$organizations)==F&
is.na(cpsipums3$gender)==F&
is.na(cpsipums3$raceth)==F&
is.na(cpsipums3$dgrdg)==F,]
options(survey.lonely.psu = "adjust")
des<-svydesign(ids=~1, strata=~sample,
weights=~weight,
data =cpsipums3[is.na(cpsipums3$weight)==F,])
To fit a Poisson GLM to survey data in R, we use the svyglm function in the survey library.
We can identify in looking at the counts that the white population of our sample with a professional degree has a larger number of memberships to organizations when compared to the rest of the groups. The lowest number belongs to the minority group with a bachelors degree.
A comparison between genders, shows that males are slightly more likely to have a higher number of memberships to professional organizations when compared to females.
After calculating the risk ratios, we see that those with a professional degree are 3.2 times more likely than those with a bachelor to belong to a professional organization, followed by those with a doctoral degree and masters degree. Asians are 29% less likely and minorities 4% when compared to the white population to belong to a professional organization.
#Simple descriptives
svyhist(~organizations, des)
svyby(~organizations, ~raceth+dgrdg, des, svymean, na.rm=T)
## raceth dgrdg organizations se
## white.0bachelors white 0bachelors 0.6272220 0.01158311
## asian.0bachelors asian 0bachelors 0.4700259 0.02408137
## minorities.0bachelors minorities 0bachelors 0.5891335 0.02758589
## white.1masters white 1masters 0.9920589 0.01694046
## asian.1masters asian 1masters 0.6383439 0.03290856
## minorities.1masters minorities 1masters 1.0072562 0.03316601
## white.2doctorate white 2doctorate 1.8949060 0.07546039
## asian.2doctorate asian 2doctorate 1.3807363 0.07395621
## minorities.2doctorate minorities 2doctorate 1.6521542 0.16744618
## white.3professional white 3professional 2.0526469 0.04623289
## asian.3professional asian 3professional 1.5993105 0.10948006
## minorities.3professional minorities 3professional 1.8856489 0.08347133
svyby(~organizations, ~gender, des, svymean, na.rm=T)
## gender organizations se
## female female 0.8141294 0.01178230
## male male 0.8815456 0.01214697
#Poisson glm fit to survey data
fit1<-svyglm(organizations~factor(raceth)+factor(dgrdg)+factor(gender), design=des, family=poisson)
summary(fit1)
##
## Call:
## svyglm(formula = organizations ~ factor(raceth) + factor(dgrdg) +
## factor(gender), design = des, family = poisson)
##
## Survey design:
## svydesign(ids = ~1, strata = ~sample, weights = ~weight, data = cpsipums3[is.na(cpsipums3$weight) ==
## F, ])
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.49127 0.02008 -24.471 <2e-16 ***
## factor(raceth)asian -0.32997 0.03224 -10.236 <2e-16 ***
## factor(raceth)minorities -0.03972 0.02745 -1.447 0.1480
## factor(dgrdg)1masters 0.46077 0.02205 20.894 <2e-16 ***
## factor(dgrdg)2doctorate 1.09130 0.03663 29.796 <2e-16 ***
## factor(dgrdg)3professional 1.18491 0.02538 46.684 <2e-16 ***
## factor(gender)male 0.04532 0.01907 2.376 0.0175 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1.63174)
##
## Number of Fisher Scoring iterations: 6
#here are the poisson model "risk ratios", which just show the change in the mean
round(exp(summary(fit1)$coef[-1,1]), 3)
## factor(raceth)asian factor(raceth)minorities
## 0.719 0.961
## factor(dgrdg)1masters factor(dgrdg)2doctorate
## 1.585 2.978
## factor(dgrdg)3professional factor(gender)male
## 3.270 1.046
Measure of the residual deviance
fit2<-glm(organizations~factor(raceth)+factor(dgrdg)+factor(gender), data=cpsipums3, family=poisson)
summary(fit2)
##
## Call:
## glm(formula = organizations ~ factor(raceth) + factor(dgrdg) +
## factor(gender), family = poisson, data = cpsipums3)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0073 -1.1333 -0.9686 0.4342 4.4197
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.443961 0.008067 -55.037 <2e-16 ***
## factor(raceth)asian -0.314026 0.011428 -27.479 <2e-16 ***
## factor(raceth)minorities -0.025085 0.009200 -2.727 0.0064 **
## factor(dgrdg)1masters 0.458239 0.008307 55.165 <2e-16 ***
## factor(dgrdg)2doctorate 1.040561 0.016047 64.845 <2e-16 ***
## factor(dgrdg)3professional 1.143373 0.012710 89.958 <2e-16 ***
## factor(gender)male 0.001060 0.007565 0.140 0.8886
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 135176 on 83967 degrees of freedom
## Residual deviance: 124840 on 83961 degrees of freedom
## AIC: 217352
##
## Number of Fisher Scoring iterations: 6
scale<-sqrt(fit2$deviance/fit2$df.residual)
scale
## [1] 1.219375
The deviance can also be a test of model fit, using a chi square distribution, with degrees of freedom equal to the residual d.f. (n-p).
The results shows a value of Zero indicating that the model does not work.
1-pchisq(fit2$deviance, df = fit2$df.residual)
## [1] 0
The dispersion parameter is 1.6 which is not too high.
fit3<-glm(organizations~factor(raceth)+factor(dgrdg)+factor(gender), data=cpsipums3, family=quasipoisson)
summary(fit3)
##
## Call:
## glm(formula = organizations ~ factor(raceth) + factor(dgrdg) +
## factor(gender), family = quasipoisson, data = cpsipums3)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0073 -1.1333 -0.9686 0.4342 4.4197
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.443961 0.010217 -43.454 <2e-16 ***
## factor(raceth)asian -0.314026 0.014474 -21.696 <2e-16 ***
## factor(raceth)minorities -0.025085 0.011653 -2.153 0.0313 *
## factor(dgrdg)1masters 0.458239 0.010521 43.554 <2e-16 ***
## factor(dgrdg)2doctorate 1.040561 0.020324 51.197 <2e-16 ***
## factor(dgrdg)3professional 1.143373 0.016098 71.025 <2e-16 ***
## factor(gender)male 0.001060 0.009582 0.111 0.9119
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 1.604186)
##
## Null deviance: 135176 on 83967 degrees of freedom
## Residual deviance: 124840 on 83961 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 6
library(lmtest)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(sandwich)
A review of the samples in CPSIPUMS3 shows that there is no variation in the clustering.
The level of dispersion remains the same as previous models.
summary(cpsipums3$sample)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1001 1001 1001 1001 1001 1001
#Fit poisson, and compare it to the fit from the survey design
#Fit the Poisson GLM
cpsipums3$wts<-cpsipums3$weight/mean(cpsipums3$weight, na.rm=T)
fit.pois<-glm(organizations~factor(raceth)+factor(dgrdg)+factor(gender),data=cpsipums3[is.na(cpsipums3$organizations)==F,], weights=wts, family=poisson)
coeftest(fit.pois, vcov=vcovHC(fit.pois, type="HC1"))
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.491269 0.020076 -24.4706 < 2e-16 ***
## factor(raceth)asian -0.329966 0.032237 -10.2356 < 2e-16 ***
## factor(raceth)minorities -0.039720 0.027456 -1.4467 0.14798
## factor(dgrdg)1masters 0.460767 0.022054 20.8928 < 2e-16 ***
## factor(dgrdg)2doctorate 1.091299 0.036627 29.7953 < 2e-16 ***
## factor(dgrdg)3professional 1.184909 0.025382 46.6823 < 2e-16 ***
## factor(gender)male 0.045321 0.019071 2.3764 0.01748 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(fit1)
##
## Call:
## svyglm(formula = organizations ~ factor(raceth) + factor(dgrdg) +
## factor(gender), design = des, family = poisson)
##
## Survey design:
## svydesign(ids = ~1, strata = ~sample, weights = ~weight, data = cpsipums3[is.na(cpsipums3$weight) ==
## F, ])
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.49127 0.02008 -24.471 <2e-16 ***
## factor(raceth)asian -0.32997 0.03224 -10.236 <2e-16 ***
## factor(raceth)minorities -0.03972 0.02745 -1.447 0.1480
## factor(dgrdg)1masters 0.46077 0.02205 20.894 <2e-16 ***
## factor(dgrdg)2doctorate 1.09130 0.03663 29.796 <2e-16 ***
## factor(dgrdg)3professional 1.18491 0.02538 46.684 <2e-16 ***
## factor(gender)male 0.04532 0.01907 2.376 0.0175 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1.63174)
##
## Number of Fisher Scoring iterations: 6
#Fit the Negative Binomial GLM
library(MASS)
fit.nb2<-glm.nb(organizations~factor(raceth)+factor(dgrdg)+factor(gender),
data=cpsipums3[is.na(cpsipums3$organizations)==F,],
weights=wts)
coeftest(fit.nb2, vcov=vcovHC(fit.nb2, type="HC1"))
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.489866 0.020408 -24.0037 < 2e-16 ***
## factor(raceth)asian -0.333834 0.032901 -10.1467 < 2e-16 ***
## factor(raceth)minorities -0.038256 0.029608 -1.2921 0.19632
## factor(dgrdg)1masters 0.459917 0.022030 20.8770 < 2e-16 ***
## factor(dgrdg)2doctorate 1.093180 0.036244 30.1615 < 2e-16 ***
## factor(dgrdg)3professional 1.186027 0.025505 46.5017 < 2e-16 ***
## factor(gender)male 0.043071 0.020067 2.1463 0.03185 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
A review of the models, shows that the negative binomial may be a better choice for our analysis.
AIC(fit.pois)#Poisson
## [1] 215974.4
AIC(fit.nb2)#Negative binomial
## [1] 207041.1
The Poisson model does not seem to fit our analysis based on the result of the AIC and the chi square distribution. However, a negative binomial analysis seems to be a better choice.
We are now able to answer each of the following research questions:
1. Are higher levels of education associated with a larger number of memberships to professional organizations?No, the higher number of professional organizations is lead by those with a professional degree followed by those with the doctoral and then masters degree.
2. Are males more likely to exceed in the number of professional organizations they belong to when compared to females?Slightly at only .04 more likely than females.
3. Is the White group more likely to have a larger number of memberships to professional organizations when compared to other minorities and Asian groups?Yes, the White population is more likely to belong to more professional organization followed by the minorities and then the Asian group. It is interesting to see that in all models, the Asian group had a lower estimate than the minorities.