Introduction

For this assignment I am using the General Social Survey of 2012. In the following analysis I examine how basic demographic characteristics of respondents affect whether they believe that racial disparities between Black and white Americans are due to discrimination.

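# Packages: Zelig (zelig, setx, sim), stargazer (regression tables),
# dplyr (select), ggplot2 and ggthemes (graphs), foreign (read.dta)
library(Zelig)
library(DescTools)
library(stargazer)
library(dplyr)
library(scatterplot3d)
library(tidyr)
library(memisc)
library(pander)
library(gmodels)
library(Hmisc)
library(car)
library(foreign)
library(readstata13)
library(ggplot2)
library(Rcpp)
library(ggthemes)
# Read the Stata extract of the GSS (the path is local to the author's machine)
gss<-read.dta("C:/Users/Xiomara/Desktop/R/GSS.dta")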
names(gss)
##  [1] "CASEID"   "WORKBLKS" "RACDIF1"  "RACMAR"   "RACDIF2"  "RACDIF3" 
##  [7] "HELPBLK"  "HELPPOOR" "YEAR"     "SEX"      "AGE"      "RACE"    
## [13] "REALINC"  "REALRINC" "EDUC"     "DEGREE"   "PRESTG80" "PAPRES80"
## [19] "MARITAL"  "DIVORCE"  "CHILDS"   "RELIG"    "WRKSLF"   "UNEMP"   
## [25] "REGION"   "SIZE"     "RACLIVE"  "FEAR"     "GUN"      "POLVIEWS"
## [31] "FECHLD"   "FEFAM"
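# Keep the race-attitude and demographic variables of interest
# (dplyr:: guards against other packages masking select())
Data1<-dplyr::select(gss, WORKBLKS, RACDIF1, RACDIF2, RACDIF3, RACMAR, HELPBLK, HELPPOOR, SEX, AGE, RACE, REALINC, EDUC, RELIG, CHILDS)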
names(Data1)
##  [1] "WORKBLKS" "RACDIF1"  "RACDIF2"  "RACDIF3"  "RACMAR"   "HELPBLK" 
##  [7] "HELPPOOR" "SEX"      "AGE"      "RACE"     "REALINC"  "EDUC"    
## [13] "RELIG"    "CHILDS"
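# Response labels for RACDIF1 (this helper data frame is not used below)
df<- data.frame(RACDIF1= c("NO", "YES", NA), stringsAsFactors=FALSE)
# Convert to numeric. If a column was read as a factor, as.numeric()
# returns level positions, so RACE, SEX and RELIG become category codes
Data1$AGE= as.numeric(Data1$AGE)
Data1$SEX= as.numeric(Data1$SEX)
Data1$EDUC= as.numeric(Data1$EDUC)
Data1$RELIG= as.numeric(Data1$RELIG)
Data1$RACE = as.numeric(Data1$RACE)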
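Because read.dta() turns labelled Stata variables into factors by default, it helps to be explicit about what the as.numeric() conversions above produce. The minimal sketch below uses a made-up factor; the placeholder first level "iap" is hypothetical, chosen so that white, black and other land on codes 2, 3 and 4, which is what the RACE values in the simulations later on suggest. It shows that as.numeric() returns level positions, that numbers stored as labels need as.character() first, and that a variable kept as a factor enters a model as separate dummy columns instead of one linear code.

```r
# Made-up factor mimicking a labelled Stata variable; the placeholder
# first level "iap" is hypothetical
race <- factor(c("white", "black", "other", "white"),
               levels = c("iap", "white", "black", "other"))

# as.numeric() returns level positions, not the labels themselves
codes <- as.numeric(race)
print(codes)

# Numbers stored as labels (e.g. ages) need a detour through character
age_f <- factor(c("45", "30", "45"))
ages <- as.numeric(as.character(age_f))
print(ages)

# Kept as a factor, race enters a model as one dummy column per
# non-reference level rather than a single linear code
race2 <- droplevels(race)
mm <- model.matrix(~ race2)
print(colnames(mm))
```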

Logit Model #1

The following three models test how these basic demographics affect whether a respondent believes that racial disparities exist because of discrimination. Model 1 (m1) tests whether age and sex affect the respondent's answer, model 2 (m2) adds education to see whether the significance of the earlier predictors changes, and model 3 (m3) adds religion.

m1<-zelig(RACDIF1 ~ AGE + SEX, data=Data1, model="logit", cite=F)
m2<-zelig(RACDIF1 ~ AGE + SEX + EDUC, data=Data1, model="logit", cite=F)
m3<-zelig(RACDIF1~ AGE + SEX + EDUC + RELIG, data=Data1, model="logit", cite=F)
stargazer(m1, m2, m3, type="html")
                    Dependent variable: RACDIF1
                    (1)            (2)            (3)
AGE                  0.002**        0.002**        0.001
                    (0.001)        (0.001)        (0.001)
SEX                 -0.264***      -0.263***      -0.273***
                    (0.027)        (0.027)        (0.028)
EDUC                                0.010**        0.011**
                                   (0.005)        (0.005)
RELIG                                             -0.044***
                                                  (0.008)
Constant             0.739***       0.600***       0.748***
                    (0.056)        (0.087)        (0.092)
Observations         22,773         22,724         22,645
Log Likelihood      -15,296.990    -15,261.920    -15,194.420
Akaike Inf. Crit.    30,599.990     30,531.850     30,398.850
Note: *p<0.1; **p<0.05; ***p<0.01

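To compare the models I use the Akaike Information Criterion (AIC); model 3 has the lowest value and so appears to fit best. The three models, however, are estimated on slightly different samples (22,773, 22,724 and 22,645 observations, because EDUC and RELIG have missing values), so their AIC values are not strictly comparable. In models 1 and 2, age is statistically significant at the 5% level and sex at the 1% level as predictors of the belief that racial differences are due primarily to discrimination; education is significant at the 5% level in models 2 and 3. When religion is added in model 3, the significance of age vanishes. Because RELIG is a set of unordered religious categories converted to numeric codes, its coefficient has no simple substantive interpretation.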
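AIC comparisons are only meaningful when every model is fit to the same observations. One option is to drop rows with missing values on all variables used by the largest model before fitting any of them. A minimal sketch, using base R's glm() rather than zelig() and simulated stand-in data; the column names mirror Data1, but the values and the outcome y are made up.

```r
# Simulated stand-in data (not the GSS); EDUC has some missing values
set.seed(42)
n <- 2000
sim_data <- data.frame(
  AGE  = sample(18:89, n, replace = TRUE),
  SEX  = sample(1:2, n, replace = TRUE),
  EDUC = sample(c(0:20, NA), n, replace = TRUE),
  y    = rbinom(n, 1, 0.5)
)

# Keep only rows observed on every variable in the largest model
cc <- na.omit(sim_data[, c("y", "AGE", "SEX", "EDUC")])

f1 <- glm(y ~ AGE + SEX, data = cc, family = binomial)
f2 <- glm(y ~ AGE + SEX + EDUC, data = cc, family = binomial)

# Both fits use the same rows, so their AIC values are comparable
print(c(nobs(f1), nobs(f2)))
aic_table <- AIC(f1, f2)
print(aic_table)
```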

Using ggplot2

Here I give some examples of using the ggplot2 package to show how age and education relate to the belief that racial differences are due to discrimination.

Data1$RACDIF1= as.numeric(Data1$RACDIF1)
g<-ggplot(Data1, mapping = aes(x= AGE, y = RACDIF1))
g1<- g + geom_smooth()
g1
## Warning in loop_apply(n, do.ply): Removed 32314 rows containing missing
## values (stat_smooth).

g2<-ggplot(Data1, mapping = aes(x=EDUC, y = RACDIF1))
g2<- g2 + geom_smooth()
g2
## Warning in loop_apply(n, do.ply): Removed 32301 rows containing missing
## values (stat_smooth).
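Because as.numeric() stores RACDIF1 as level positions (1 and 2, assuming the two substantive answers are its only levels), the smoothed curves above are on that code scale rather than a probability scale. Recoding to a 0/1 indicator makes any average or smooth read directly as a proportion. A base-R sketch with made-up data; the labels "yes" and "no" are assumptions and may not match the labels stored in GSS.dta.

```r
# Made-up responses stored as a factor; the labels "yes"/"no" are assumed
set.seed(3)
racdif1 <- factor(sample(c("yes", "no", NA), 500, replace = TRUE),
                  levels = c("yes", "no"))
age <- sample(18:89, 500, replace = TRUE)

# as.numeric() would give level positions (1 = "yes", 2 = "no");
# an explicit 0/1 indicator averages to the share answering "yes"
yes <- as.integer(racdif1 == "yes")

# Share answering "yes" within broad age bands, ignoring missing answers
age_band <- cut(age, breaks = c(17, 34, 49, 64, 89))
prop_yes <- tapply(yes, age_band, mean, na.rm = TRUE)
print(round(prop_yes, 3))
```

The same 0/1 indicator could be mapped to y in geom_smooth(), so the fitted curve is read as a share of respondents.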

Logit Model #2

The models below examine the relationship between race, age, and the belief that racial differences are due to discrimination. The first (m4) includes race and age as main effects; the second (m5) adds an interaction between race and age to test whether the relationship between age and this belief differs by race.

Data1$RACDIF1= as.factor(Data1$RACDIF1)
m4 <- zelig(RACDIF1 ~ RACE + AGE, data=Data1, model="logit", cite=F)

m5 <- zelig(RACDIF1 ~ RACE + AGE + RACE:AGE, data=Data1, model="logit", cite=F)
stargazer(m4, m5, type="html")
                    Dependent variable: RACDIF1
                    (1)            (2)
RACE                -0.618***      -0.235***
                    (0.026)        (0.072)
AGE                 -0.001          0.020***
                    (0.001)        (0.004)
RACE:AGE                           -0.010***
                                   (0.002)
Constant             1.840***       0.993***
                    (0.073)        (0.167)
Observations         22,773         22,773
Log Likelihood      -15,039.560    -15,023.580
Akaike Inf. Crit.    30,085.110     30,055.170
Note: *p<0.1; **p<0.05; ***p<0.01

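In m5, the RACE:AGE interaction is statistically significant at the 1% level (coefficient -0.010, standard error 0.002), which indicates that the relationship between age and the belief that racial differences are due to discrimination differs by race. Because m4 and m5 are estimated on the same 22,773 observations, their AIC values are comparable, and the lower AIC of m5 (30,055.17 versus 30,085.11) also favors including the interaction.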
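Because m4 and m5 are nested and fit on the same observations, the interaction can also be checked with a likelihood-ratio test computed from the reported log likelihoods, and the age slope implied for each race code can be read off the coefficients. A short worked sketch using the rounded values printed in the table; if the models were refit with glm(), anova(fit4, fit5, test = "Chisq") would report the same kind of test (fit4 and fit5 are hypothetical names).

```r
# Values copied from the m4/m5 table above (rounded as printed)
ll_m4 <- -15039.560   # log likelihood, main effects only
ll_m5 <- -15023.580   # log likelihood, with RACE:AGE

# Likelihood-ratio test: m5 adds one parameter
lr <- 2 * (ll_m5 - ll_m4)
p_value <- pchisq(lr, df = 1, lower.tail = FALSE)
print(c(LR = lr, p = p_value))

# Log-odds slope of AGE at each race code: b_AGE + b_RACE:AGE * RACE
b_age <- 0.020
b_race_age <- -0.010
race_codes <- c(2, 3, 4)
age_slope <- b_age + b_race_age * race_codes
print(setNames(age_slope, race_codes))
```

With these rounded values the likelihood-ratio statistic is about 32 on one degree of freedom (p far below 0.01), and the log-odds slope of age is about 0 at RACE = 2, -0.01 at RACE = 3 and -0.02 at RACE = 4.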

Simulation

The following simulations use a logit model of RACDIF1 on race, education and age (s1). For each race code, setx() fixes RACE and holds education and age at their sample means (13.06 years of schooling and 45.72 years of age), and sim() draws 1,000 simulations of the expected probability that the modeled outcome equals 1.

s1<-zelig(RACDIF1 ~ RACE + EDUC + AGE, data=Data1, model="logit", cite=F)
s2<-setx(s1, RACE="2")
s3<-sim(s1, x=s2)
summary(s3)
## 
##   Model: logit 
##   Number of simulations: 1000 
## 
## Values of X 
##      (Intercept) RACE     EDUC     AGE
## 7591           1    2 13.06139 45.7206
## 
## Expected Values: E(Y|X)
##        mean          sd      2.5%     97.5%
## 1 0.6334598 0.003368065 0.6272744 0.6399612
## 
## Predicted Values: Y|X
##       0     1
## 1 0.367 0.633
v1<-setx(s1, RACE ="3")
v2<-sim(s1, x=v1)
summary(v2)
## 
##   Model: logit 
##   Number of simulations: 1000 
## 
## Values of X 
##      (Intercept) RACE     EDUC     AGE
## 7591           1    3 13.06139 45.7206
## 
## Expected Values: E(Y|X)
##        mean          sd      2.5%     97.5%
## 1 0.4825936 0.005998436 0.4710358 0.4945634
## 
## Predicted Values: Y|X
##       0     1
## 1 0.497 0.503
n1<-setx(s1, RACE="4")
n2<-sim(s1, x=n1)
summary(n2)
## 
##   Model: logit 
##   Number of simulations: 1000 
## 
## Values of X 
##      (Intercept) RACE     EDUC     AGE
## 7591           1    4 13.06139 45.7206
## 
## Expected Values: E(Y|X)
##        mean         sd      2.5%     97.5%
## 1 0.3336588 0.01059002 0.3128387 0.3538362
## 
## Predicted Values: Y|X
##       0     1
## 1 0.654 0.346

Expected values from sim() are probabilities that the modeled outcome equals 1. Treating Y = 1 as the "no" answer, the simulations imply that whites (RACE = 2) have about a 37% probability (1 - 0.633) of believing that racial differences are due to discrimination, Blacks (RACE = 3) about 52% (1 - 0.483), and respondents classified as "Other" (RACE = 4) about 67% (1 - 0.334). Because RACE enters the model as a single numeric code, the model forces the log-odds to change by the same amount from white to Black as from Black to Other.
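Zelig's sim() broadly produces expected values by drawing coefficient vectors from their estimated sampling distribution and evaluating the inverse logit at the profile chosen by setx(). A minimal base-R sketch of that idea, assuming a glm() logit fitted to simulated stand-in data (not the GSS; the coefficients in the data-generating line are made up). It also shows the probability of the other category (Y = 0) as one minus the expected value.

```r
# Simulated stand-in data (not the GSS); coefficients are made up
set.seed(1)
n <- 3000
d <- data.frame(RACE = sample(2:4, n, replace = TRUE),
                EDUC = sample(8:20, n, replace = TRUE),
                AGE  = sample(18:89, n, replace = TRUE))
d$y <- rbinom(n, 1, plogis(1.5 - 0.6 * d$RACE + 0.01 * d$EDUC))
fit <- glm(y ~ RACE + EDUC + AGE, data = d, family = binomial)

# Draw coefficient vectors from N(coef, vcov) via a Cholesky factor
n_sims <- 1000
R <- chol(vcov(fit))
z <- matrix(rnorm(n_sims * length(coef(fit))), nrow = n_sims)
betas <- sweep(z %*% R, 2, coef(fit), "+")

# Profile: RACE fixed at 2, EDUC and AGE at their sample means
x <- c(1, 2, mean(d$EDUC), mean(d$AGE))   # order matches coef(fit)
ev <- as.vector(plogis(betas %*% x))      # simulated P(Y = 1 | x)
ev_summary <- c(mean = mean(ev), quantile(ev, c(0.025, 0.975)))
print(ev_summary)

# Probability of the other category, P(Y = 0 | x)
print(1 - ev_summary[["mean"]])
```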

Simulation #1 using ggplot2

d<-as.data.frame(s3$qi$ev)
colnames(d)<-("EVs")
g4<-ggplot(d, mapping = aes(x=EVs)) + geom_density()
g4

Difference in Differences

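# Dichotomize age at 46 (respondents aged exactly 46 or missing AGE stay NA)
Data1$age2[Data1$AGE>46]=1
Data1$age2[Data1$AGE<46]=2
# High/low profiles for RACE = 2 and RACE = 3. Note: age2 is not a term in
# m5 (RACDIF1 ~ RACE + AGE + RACE:AGE), and mean()/sd() return NA here
# because age2 contains NAs, so these settings do not shift the profile
xh1<-setx(m5, age2 = mean(Data1$age2)+ sd(Data1$age2), RACE=2)
xl1<-setx(m5, age2 = mean(Data1$age2), RACE=2)
xh0<-setx(m5, age2= mean(Data1$age2)+ sd(Data1$age2), RACE=3)
xl0<-setx(m5, age2=mean(Data1$age2), RACE=3)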
zh1 <- sim(m5, x=xh1)

zl1 <- sim(m5, x=xl1)

zh0 <- sim(m5, x=xh0)

zl0 <- sim(m5, x=xl0)

# Difference in differences: (high - low for RACE = 2) - (high - low for RACE = 3)
eff <- (zh1$qi$ev - zl1$qi$ev) -(zh0$qi$ev - zl0$qi$ev)


summary(eff)
##        V1           
##  Min.   :-0.031957  
##  1st Qu.:-0.008158  
##  Median :-0.000861  
##  Mean   :-0.001002  
##  3rd Qu.: 0.005876  
##  Max.   : 0.036489
hist(eff)
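The intended quantity here is a difference in differences of first differences: how much a one-standard-deviation increase in age changes the expected probability for one race code, minus the same change for another. That requires shifting a variable that is a term in the model (AGE in m5, through AGE and RACE:AGE). When the shifted variable is one m5 does not contain, such as age2, setx() has nothing to move, which is consistent with the summary above being centered on zero. A base-R sketch of the intended calculation, assuming a glm() logit on simulated stand-in data (not the GSS; coefficients made up) and a hypothetical helper ev_at().

```r
# Simulated stand-in data (not the GSS); coefficients are made up
set.seed(7)
n <- 3000
d <- data.frame(RACE = sample(2:3, n, replace = TRUE),
                AGE  = sample(18:89, n, replace = TRUE))
d$y <- rbinom(n, 1, plogis(-1 + 0.3 * d$RACE + 0.03 * d$AGE -
                             0.01 * d$RACE * d$AGE))
fit <- glm(y ~ RACE + AGE + RACE:AGE, data = d, family = binomial)

# Coefficient draws from N(coef, vcov) via a Cholesky factor
n_sims <- 1000
R <- chol(vcov(fit))
z <- matrix(rnorm(n_sims * length(coef(fit))), nrow = n_sims)
betas <- sweep(z %*% R, 2, coef(fit), "+")

# Hypothetical helper: simulated P(Y = 1) at one race code and age
ev_at <- function(race, age) {
  x <- c(1, race, age, race * age)   # order matches coef(fit)
  as.vector(plogis(betas %*% x))
}

# Shift AGE (a model term) from its mean to mean + 1 SD
lo <- mean(d$AGE)
hi <- lo + sd(d$AGE)
did <- (ev_at(2, hi) - ev_at(2, lo)) - (ev_at(3, hi) - ev_at(3, lo))
print(c(mean = mean(did), quantile(did, c(0.025, 0.975))))
```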

GLM using Count Data

In the following model I use the same GSS data. In this section I look at the variable CHILDS, a count variable that measures the number of children the respondent has. The Poisson model tests whether the expected number of children differs with the respondent's race and age, and whether each of these variables is statistically significant.

Data1$CHILDS= as.numeric(Data1$CHILDS)
model.c <- zelig(CHILDS~ RACE + AGE, data=Data1, model="poisson", cite=F)

stargazer(model.c, type="html")
                    Dependent variable: CHILDS
RACE                 0.147***
                    (0.006)
AGE                  0.018***
                    (0.0002)
Constant            -0.553***
                    (0.017)
Observations         54,726
Log Likelihood      -99,331.460
Akaike Inf. Crit.   198,668.900
Note: *p<0.1; **p<0.05; ***p<0.01

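The model shows that both age and race are statistically significant predictors of the number of children a respondent has, at the 1% level. Because Poisson coefficients are on the log scale, their effects are multiplicative rather than additive: each one-unit step in the RACE code (2 = white, 3 = Black, 4 = Other, as in the logit simulations) multiplies the expected number of children by exp(0.147), about 1.16 or roughly 16% more, and each additional year of age multiplies it by exp(0.018), about 1.018 or roughly 1.8% more.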
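Exponentiating Poisson coefficients gives multiplicative effects (incidence-rate ratios), and exponentiating the linear predictor gives an expected count. A worked check using the rounded coefficients from the table; with these values the expected number of children for RACE = 2 at age 45.61 comes out near 1.75, a little below the 1.79 that sim() reports for the same profile, most likely because the printed coefficients are rounded.

```r
# Rounded coefficients copied from the CHILDS table above
b <- c("(Intercept)" = -0.553, RACE = 0.147, AGE = 0.018)

# Incidence-rate ratios: multiplicative change in expected children
irr <- exp(b[c("RACE", "AGE")])
print(round(irr, 4))

# Expected count for RACE = 2 at the mean age used by setx() (45.60565)
x <- c(1, 2, 45.60565)
mu <- exp(sum(b * x))
print(round(mu, 3))
```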

model1<-zelig(CHILDS ~ RACE + AGE, data=Data1, model= "poisson", cite=F)

model2 <- setx(model1, RACE= "2")
model3 <- sim(model1, x=model2)
summary(model3)
## 
##   Model: poisson 
##   Number of simulations: 1000 
## 
## Values of X 
##   (Intercept) RACE      AGE
## 1           1    2 45.60565
## 
## Expected Values: E(Y|X)
##       mean          sd     2.5%    97.5%
## 1 1.790742 0.006556439 1.778024 1.803636
## 
## Predicted Values: Y|X
##    mean       sd 2.5% 97.5%
## 1 1.738 1.258149    0     5
plot(model3)

Simulation #2 Graph Using ggplot2

d1<-as.data.frame(model3$qi$ev)
colnames(d1)<-("EVs")
g5<-ggplot(d1, mapping=aes(x=EVs)) + geom_density()
g5<-g5 + theme_stata() + scale_colour_stata()
g5
