All the variables in this dataset were compiled from a sample of the National Survey of College Graduates through the IPUMS-HigherEd site.

For this assignment, we analyze the relationship between gender, ethnicity, education level and membership to professional organizations.

The count of outcome selected for this assignment is the number of membership to organizations.

The research questions we attempt to respond are:

1. Are higher levels of education associated with a larger number of memberships to professional organizations?

2. Are males more likely to exceed in the number of professional organizations they belong to when compared to females?

3. Is the White group more likely to have a larger number of memberships to professional organizations when compared to other minorities and Asian groups?

The data used for this analysis consists of a sample from the National Survey of College Graduates gathered through IPUMS-Higher Ed.

library(car)
library(stargazer)
## 
## Please cite as:
##  Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2.1. https://CRAN.R-project.org/package=stargazer
library(survey)
## Loading required package: grid
## Loading required package: Matrix
## Loading required package: survival
## 
## Attaching package: 'survey'
## The following object is masked from 'package:graphics':
## 
##     dotchart
library(sjPlot)
library(ggplot2)
library(pander)
library(knitr)
library(haven)
cpsipums3<-read_dta("~/Statistics II/Data for project/highered/highered_00008.dta")
names(cpsipums3) #print the column names
##  [1] "personid" "year"     "weight"   "sample"   "surid"    "gender"  
##  [7] "minrty"   "raceth"   "bthus"    "chtot"    "dgrdg"    "hrswkgr" 
## [13] "prmbr"    "salary"

Recoding of Variables

#income grouping
cpsipums3$salary3<-ifelse(cpsipums3$salary==9999998:9999999, NA, cpsipums3$salary)

summary (cpsipums3$salary3)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##       0   44000   71000  906935  108000 9999998    6519
#number of children living in the household
cpsipums3$chtot<-recode(cpsipums3$chtot, recodes="00='no children'; 01='one child'; 02='one to three children'; 03='two or more children'; 04='more than 3 children'; 98=NA", as.factor.result=T)
cpsipums3$chtot<-relevel(cpsipums3$chtot, ref = "one child")
table(cpsipums3$chtot)
## 
##            one child two or more children 
##                14472                20517
#gender
cpsipums3$female<-recode(cpsipums3$gender, recodes=1)
cpsipums3$male<-recode(cpsipums3$gender, recodes=2)
cpsipums3$gender<-recode(cpsipums3$gender, recodes="1='female'; 2='male'", as.factor.result=T)
table(cpsipums3$gender)
## 
## female   male 
##  38626  45830
cpsipums3$gender<-relevel(cpsipums3$gender, ref = "female")
#race/ethnicity 
#There are no entries in this data set under "other"
cpsipums3$asian<-recode(cpsipums3$raceth, recodes=1)
cpsipums3$white<-recode(cpsipums3$raceth, recodes=2)
cpsipums3$minorities<-recode(cpsipums3$raceth, recodes=3)
cpsipums3$other<-recode(cpsipums3$raceth, recodes=4)

cpsipums3$raceth<-recode(cpsipums3$raceth, recodes="1='asian'; 2='white'; 3='minorities'", as.factor.result=T)
cpsipums3$raceth<-relevel(cpsipums3$raceth, ref = "white")

table(cpsipums3$raceth)
## 
##      white      asian minorities 
##      51846      13868      18742
 #education level
cpsipums3$dgrdg<-recode(cpsipums3$dgrdg, recodes="1='0bachelors'; 2='1masters'; 3='2doctorate'; 4='3professional'", as.factor.result=T)
table(cpsipums3$dgrdg, cpsipums3$gender)
##                
##                 female  male
##   0bachelors     18930 25269
##   1masters       16756 16585
##   2doctorate      1103  1611
##   3professional   1837  2365

Our outcome here is the number of professional organizations each individual belongs to.

Q: To how many regional, national, or international professional societies or associations do you currently belong?

The max amount of organizations that participants are members of in this sample is 6 with a mean of .85.

cpsipums3$organizations<-recode(cpsipums3$prmbr, recodes = "0000=0; 96=NA; 99=NA; 98=NA")
hist(cpsipums3$organizations)

summary(cpsipums3$organizations)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##  0.0000  0.0000  0.0000  0.8502  1.0000  6.0000     488

Analysis

Below is a subset of data to have complete cases for the variables in the model and the survey design object

##To keep complete cases on key variables,


cpsipums3<-cpsipums3[is.na(cpsipums3$organizations)==F&
                     is.na(cpsipums3$gender)==F&
                     is.na(cpsipums3$raceth)==F&
                     is.na(cpsipums3$dgrdg)==F,]

options(survey.lonely.psu = "adjust")
des<-svydesign(ids=~1, strata=~sample, 
               weights=~weight,
               data =cpsipums3[is.na(cpsipums3$weight)==F,])

Poisson regression example

To fit a Poisson GLM to survey data in R, we use the svyglm function in the survey library.

We can identify in looking at the counts that the white population of our sample with a professional degree has a larger number of memberships to organizations when compared to the rest of the groups. The lowest number belongs to the minority group with a bachelors degree.

A comparison between genders, shows that males are slightly more likely to have a higher number of memberships to professional organizations when compared to females.

After calculating the risk ratios, we see that those with a professional degree are 3.2 times more likely than those with a bachelor to belong to a professional organization, followed by those with a doctoral degree and masters degree. Asians are 29% less likely and minorities 4% when compared to the white population to belong to a professional organization.

#Simple descriptives
svyhist(~organizations, des)

svyby(~organizations, ~raceth+dgrdg, des, svymean, na.rm=T)
##                              raceth         dgrdg organizations         se
## white.0bachelors              white    0bachelors     0.6272220 0.01158311
## asian.0bachelors              asian    0bachelors     0.4700259 0.02408137
## minorities.0bachelors    minorities    0bachelors     0.5891335 0.02758589
## white.1masters                white      1masters     0.9920589 0.01694046
## asian.1masters                asian      1masters     0.6383439 0.03290856
## minorities.1masters      minorities      1masters     1.0072562 0.03316601
## white.2doctorate              white    2doctorate     1.8949060 0.07546039
## asian.2doctorate              asian    2doctorate     1.3807363 0.07395621
## minorities.2doctorate    minorities    2doctorate     1.6521542 0.16744618
## white.3professional           white 3professional     2.0526469 0.04623289
## asian.3professional           asian 3professional     1.5993105 0.10948006
## minorities.3professional minorities 3professional     1.8856489 0.08347133
svyby(~organizations, ~gender, des, svymean, na.rm=T)
##        gender organizations         se
## female female     0.8141294 0.01178230
## male     male     0.8815456 0.01214697
#Poisson glm fit to survey data
fit1<-svyglm(organizations~factor(raceth)+factor(dgrdg)+factor(gender), design=des, family=poisson)
summary(fit1)
## 
## Call:
## svyglm(formula = organizations ~ factor(raceth) + factor(dgrdg) + 
##     factor(gender), design = des, family = poisson)
## 
## Survey design:
## svydesign(ids = ~1, strata = ~sample, weights = ~weight, data = cpsipums3[is.na(cpsipums3$weight) == 
##     F, ])
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                -0.49127    0.02008 -24.471   <2e-16 ***
## factor(raceth)asian        -0.32997    0.03224 -10.236   <2e-16 ***
## factor(raceth)minorities   -0.03972    0.02745  -1.447   0.1480    
## factor(dgrdg)1masters       0.46077    0.02205  20.894   <2e-16 ***
## factor(dgrdg)2doctorate     1.09130    0.03663  29.796   <2e-16 ***
## factor(dgrdg)3professional  1.18491    0.02538  46.684   <2e-16 ***
## factor(gender)male          0.04532    0.01907   2.376   0.0175 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1.63174)
## 
## Number of Fisher Scoring iterations: 6
#here are the poisson model "risk ratios", which just show the change in the mean
round(exp(summary(fit1)$coef[-1,1]), 3)
##        factor(raceth)asian   factor(raceth)minorities 
##                      0.719                      0.961 
##      factor(dgrdg)1masters    factor(dgrdg)2doctorate 
##                      1.585                      2.978 
## factor(dgrdg)3professional         factor(gender)male 
##                      3.270                      1.046

Overdispersion

Measure of the residual deviance

fit2<-glm(organizations~factor(raceth)+factor(dgrdg)+factor(gender), data=cpsipums3, family=poisson)
summary(fit2)
## 
## Call:
## glm(formula = organizations ~ factor(raceth) + factor(dgrdg) + 
##     factor(gender), family = poisson, data = cpsipums3)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0073  -1.1333  -0.9686   0.4342   4.4197  
## 
## Coefficients:
##                             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                -0.443961   0.008067 -55.037   <2e-16 ***
## factor(raceth)asian        -0.314026   0.011428 -27.479   <2e-16 ***
## factor(raceth)minorities   -0.025085   0.009200  -2.727   0.0064 ** 
## factor(dgrdg)1masters       0.458239   0.008307  55.165   <2e-16 ***
## factor(dgrdg)2doctorate     1.040561   0.016047  64.845   <2e-16 ***
## factor(dgrdg)3professional  1.143373   0.012710  89.958   <2e-16 ***
## factor(gender)male          0.001060   0.007565   0.140   0.8886    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 135176  on 83967  degrees of freedom
## Residual deviance: 124840  on 83961  degrees of freedom
## AIC: 217352
## 
## Number of Fisher Scoring iterations: 6
scale<-sqrt(fit2$deviance/fit2$df.residual)
scale
## [1] 1.219375

The deviance can also be a test of model fit, using a chi square distribution, with degrees of freedom equal to the residual d.f. (n-p).

The results shows a value of Zero indicating that the model does not work.

1-pchisq(fit2$deviance, df = fit2$df.residual)
## [1] 0

Overdispesion via a Quasi Distribution

The dispersion parameter is 1.6 which is not too high.

fit3<-glm(organizations~factor(raceth)+factor(dgrdg)+factor(gender), data=cpsipums3, family=quasipoisson)
summary(fit3)
## 
## Call:
## glm(formula = organizations ~ factor(raceth) + factor(dgrdg) + 
##     factor(gender), family = quasipoisson, data = cpsipums3)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0073  -1.1333  -0.9686   0.4342   4.4197  
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                -0.443961   0.010217 -43.454   <2e-16 ***
## factor(raceth)asian        -0.314026   0.014474 -21.696   <2e-16 ***
## factor(raceth)minorities   -0.025085   0.011653  -2.153   0.0313 *  
## factor(dgrdg)1masters       0.458239   0.010521  43.554   <2e-16 ***
## factor(dgrdg)2doctorate     1.040561   0.020324  51.197   <2e-16 ***
## factor(dgrdg)3professional  1.143373   0.016098  71.025   <2e-16 ***
## factor(gender)male          0.001060   0.009582   0.111   0.9119    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 1.604186)
## 
##     Null deviance: 135176  on 83967  degrees of freedom
## Residual deviance: 124840  on 83961  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 6

Other count models - Negative binomial

library(lmtest)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library(sandwich)

A review of the samples in CPSIPUMS3 shows that there is no variation in the clustering.

The level of dispersion remains the same as previous models.

summary(cpsipums3$sample)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1001    1001    1001    1001    1001    1001
#Fit poisson, and compare it to the fit from the survey design
#Fit the Poisson GLM
cpsipums3$wts<-cpsipums3$weight/mean(cpsipums3$weight, na.rm=T)
fit.pois<-glm(organizations~factor(raceth)+factor(dgrdg)+factor(gender),data=cpsipums3[is.na(cpsipums3$organizations)==F,], weights=wts, family=poisson)

coeftest(fit.pois, vcov=vcovHC(fit.pois, type="HC1"))
## 
## z test of coefficients:
## 
##                             Estimate Std. Error  z value Pr(>|z|)    
## (Intercept)                -0.491269   0.020076 -24.4706  < 2e-16 ***
## factor(raceth)asian        -0.329966   0.032237 -10.2356  < 2e-16 ***
## factor(raceth)minorities   -0.039720   0.027456  -1.4467  0.14798    
## factor(dgrdg)1masters       0.460767   0.022054  20.8928  < 2e-16 ***
## factor(dgrdg)2doctorate     1.091299   0.036627  29.7953  < 2e-16 ***
## factor(dgrdg)3professional  1.184909   0.025382  46.6823  < 2e-16 ***
## factor(gender)male          0.045321   0.019071   2.3764  0.01748 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(fit1)
## 
## Call:
## svyglm(formula = organizations ~ factor(raceth) + factor(dgrdg) + 
##     factor(gender), design = des, family = poisson)
## 
## Survey design:
## svydesign(ids = ~1, strata = ~sample, weights = ~weight, data = cpsipums3[is.na(cpsipums3$weight) == 
##     F, ])
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                -0.49127    0.02008 -24.471   <2e-16 ***
## factor(raceth)asian        -0.32997    0.03224 -10.236   <2e-16 ***
## factor(raceth)minorities   -0.03972    0.02745  -1.447   0.1480    
## factor(dgrdg)1masters       0.46077    0.02205  20.894   <2e-16 ***
## factor(dgrdg)2doctorate     1.09130    0.03663  29.796   <2e-16 ***
## factor(dgrdg)3professional  1.18491    0.02538  46.684   <2e-16 ***
## factor(gender)male          0.04532    0.01907   2.376   0.0175 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1.63174)
## 
## Number of Fisher Scoring iterations: 6
#Fit the Negative Binomial GLM
library(MASS)
fit.nb2<-glm.nb(organizations~factor(raceth)+factor(dgrdg)+factor(gender),
              data=cpsipums3[is.na(cpsipums3$organizations)==F,],
              weights=wts)
coeftest(fit.nb2, vcov=vcovHC(fit.nb2, type="HC1"))
## 
## z test of coefficients:
## 
##                             Estimate Std. Error  z value Pr(>|z|)    
## (Intercept)                -0.489866   0.020408 -24.0037  < 2e-16 ***
## factor(raceth)asian        -0.333834   0.032901 -10.1467  < 2e-16 ***
## factor(raceth)minorities   -0.038256   0.029608  -1.2921  0.19632    
## factor(dgrdg)1masters       0.459917   0.022030  20.8770  < 2e-16 ***
## factor(dgrdg)2doctorate     1.093180   0.036244  30.1615  < 2e-16 ***
## factor(dgrdg)3professional  1.186027   0.025505  46.5017  < 2e-16 ***
## factor(gender)male          0.043071   0.020067   2.1463  0.03185 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Review of the Relative Model Fits

A review of the models, shows that the negative binomial may be a better choice for our analysis.

AIC(fit.pois)#Poisson
## [1] 215974.4
AIC(fit.nb2)#Negative binomial
## [1] 207041.1

Conclusion

The Poisson model does not seem to fit our analysis based on the result of the AIC and the chi square distribution. However, a negative binomial analysis seems to be a better choice.

We are now able to answer each of the following research questions:

1. Are higher levels of education associated with a larger number of memberships to professional organizations?No, the higher number of professional organizations is lead by those with a professional degree followed by those with the doctoral and then masters degree.

2. Are males more likely to exceed in the number of professional organizations they belong to when compared to females?Slightly at only .04 more likely than females.

3. Is the White group more likely to have a larger number of memberships to professional organizations when compared to other minorities and Asian groups?Yes, the White population is more likely to belong to more professional organization followed by the minorities and then the Asian group. It is interesting to see that in all models, the Asian group had a lower estimate than the minorities.