Safiya Stewart

Sociology 712

Professor Songe

3/29/19

For this assignment, I am still working with the NHIS (National Health Interview Survey). This week, I decided to focus on the geographic location of respondents and their racial background. I wanted to see whether any relationship exists between the two and, if so, what the probability is that a respondent of a certain racial background lives in a certain region of the United States. My question is as follows:

Do one’s race and sex affect their geographic location (USA only)?

The outcome that I will be focusing on is US Region. I chose to work with the file titled Persons for the 2017 calendar year. For your reference, you can download the data set and codebook here

VARIABLES USED

  • The dependent variable is categorical and named US_Region. The independent variables, Sex and Race, are also categorical.

  • The variables are coded as follows:

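  • Variable US_Region was coded as 1 = Northeast, 2 = Midwest, 3 = South and 4 = West
  • Variable Sex was coded as 1 = Male and 2 = Female
  • Variable Race was coded as 1 = White, 2 = Black and 3 = Asian. The data also contain a fourth level (4), which appears in the model output as Race4.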

HYPOTHESIS

  1. I believe that race and sex will have a strong relationship with the region of the US in which one lives.

I will be using the multinomial logit model because my dependent variable has 4 unordered categories. My independent variables are also categorical.
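For reference, the model can be written out as follows. This is a sketch of the standard multinomial logit form with region 4 (West) as the reference category, which is the baseline the model output reports. The symbols \alpha_j, \beta_j, \gamma_{jr} and \eta_j are my own notation and do not appear in the Zelig output.

```latex
% Linear predictor for region j = 1, 2, 3, relative to region 4 (West)
\eta_j \;=\; \log\frac{\Pr(Y = j)}{\Pr(Y = 4)}
      \;=\; \alpha_j + \beta_j \,\mathbf{1}[\text{Sex} = 2]
            + \sum_{r=2}^{4} \gamma_{jr}\,\mathbf{1}[\text{Race} = r]

% Predicted probabilities implied by the linear predictors
\Pr(Y = j) \;=\; \frac{\exp(\eta_j)}{1 + \sum_{k=1}^{3} \exp(\eta_k)}, \quad j = 1, 2, 3,
\qquad
\Pr(Y = 4) \;=\; \frac{1}{1 + \sum_{k=1}^{3} \exp(\eta_k)}
```

Each predictor therefore has three coefficients, one per non-reference region, which is why the output lists terms such as Sex2:1, Sex2:2 and Sex2:3.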

NOTE TO READER: THROUGHOUT MY INTERPRETATIONS, I USE THE TERMS GEOGRAPHIC LOCATION AND U.S. REGION INTERCHANGEABLY. IN THIS ANALYSIS, THEY REFER TO THE SAME THING: THE SPECIFIC REGION OF THE UNITED STATES THAT A STATE BELONGS TO.

#install.packages("ZeligChoice")
library(dplyr)
library(tidyr)
library(Zelig)
library(readr)
library(ZeligChoice)

InputStat<-read_csv("/Users/safiesaf/Downloads/personsx.csv")

# Rename the NHIS variables, keep only the ones needed, and treat them as factors
LivStat <- InputStat %>%
  rename("US_Region" = REGION,
         "Sex" = SEX,
         "Race" = RACRECI3,
         "Education" = EDUC1,
         "MaritalStat" = R_MARITL) %>%
  select(US_Region,
         Sex,
         Race,
         Education,
         MaritalStat) %>%
  mutate(US_Region = factor(US_Region),
         Race = factor(Race),
         Sex = factor(Sex),
         Education = factor(Education),
         MaritalStat = factor(MaritalStat))

head(LivStat)
## # A tibble: 6 x 5
##   US_Region Sex   Race  Education MaritalStat
##   <fct>     <fct> <fct> <fct>     <fct>      
## 1 3         2     1     15        4          
## 2 3         2     2     13        7          
## 3 3         1     2     3         0          
## 4 2         1     1     15        7          
## 5 2         2     1     14        1          
## 6 2         1     1     16        1

GENERATING MY MODEL WITH MLOGIT

MODEL 1:

This shows the effect of Gender (Sex) and Race on the US Region in which a person lives

Z.Area <- zelig(US_Region ~ Sex + Race, model = "mlogit", data = LivStat, cite = FALSE)
summary(Z.Area)
## Model: 
## 
## Call:
## z5$zelig(formula = US_Region ~ Sex + Race, data = LivStat)
## 
## Pearson residuals:
##                       Min      1Q  Median      3Q   Max
## log(mu[,1]/mu[,4]) -1.599 -0.3228 -0.2341 -0.2055 4.670
## log(mu[,2]/mu[,4]) -1.592 -0.3882 -0.3083 -0.1782 2.906
## log(mu[,3]/mu[,4]) -2.249 -0.7192 -0.4193  1.3277 1.688
## 
## Coefficients: 
##                Estimate Std. Error z value Pr(>|z|)
## (Intercept):1 -0.352853   0.017360 -20.325  < 2e-16
## (Intercept):2  0.054737   0.015697   3.487 0.000489
## (Intercept):3  0.340610   0.014531  23.440  < 2e-16
## Sex2:1         0.034797   0.023097   1.507 0.131922
## Sex2:2        -0.003384   0.021189  -0.160 0.873111
## Sex2:3         0.035101   0.019264   1.822 0.068434
## Race2:1        0.816048   0.045231  18.042  < 2e-16
## Race2:2        0.429805   0.044886   9.575  < 2e-16
## Race2:3        1.553525   0.038095  40.780  < 2e-16
## Race3:1       -0.757856   0.042533 -17.818  < 2e-16
## Race3:2       -1.572683   0.049354 -31.865  < 2e-16
## Race3:3       -0.972730   0.035953 -27.056  < 2e-16
## Race4:1       -2.158730   0.128705 -16.773  < 2e-16
## Race4:2       -1.247438   0.074267 -16.797  < 2e-16
## Race4:3       -1.053044   0.062233 -16.921  < 2e-16
## 
## Names of linear predictors: log(mu[,1]/mu[,4]), log(mu[,2]/mu[,4]), 
## log(mu[,3]/mu[,4])
## 
## Residual deviance: 204817.8 on 234381 degrees of freedom
## 
## Log-likelihood: -102408.9 on 234381 degrees of freedom
## 
## Number of Fisher scoring iterations: 5 
## 
## No Hauck-Donner effect found in any of the estimates
## 
## 
## Reference group is level  4  of the response
## Next step: Use 'setx' method

MODEL 2:

This shows the effect of Gender (Sex), Race and Marital Status on the US Region someone chooses

#Z.Area2 <- zelig(US_Region~ Sex + Race+ MaritalStat, model = "mlogit", data = LivStat, cite = F)
#summary(Z.Area2)

MODEL 3:

This shows the effect of Gender (Sex), Race and Education on the US Region someone chooses

#Z.Area3 <- zelig(US_Region~ Sex + Race + Education, model = "mlogit", data = LivStat, cite = F)
#summary(Z.Area3)

I will be using Model 1 based on its significance. The output from the model shows that race is statistically significant where US region is concerned: every Race coefficient has p < 2e-16. Sex, however, is not significant at the 0.05 level in any of the three region contrasts (p = 0.13, 0.87 and 0.07). Because Race and Sex are the two variables in my hypothesis, I will focus on them in my analysis.
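One way to read the Race coefficients in Model 1 is to exponentiate them into relative risk ratios. The sketch below is only an illustration in base R: the estimates are typed in by hand from the summary output, the object names race_coef and rrr are my own rather than anything Zelig provides, and the Race4 rows are left out because that level is not labeled in my coding list.

```r
# Race coefficients copied from the Model 1 summary.
# Reference region: 4 = West. Reference race: 1 = White.
race_coef <- rbind(
  Black = c(Northeast = 0.816048, Midwest = 0.429805, South = 1.553525),
  Asian = c(Northeast = -0.757856, Midwest = -1.572683, South = -0.972730)
)

# exp(coefficient) is the relative risk ratio: the multiplicative change in the
# odds of living in that region rather than the West, compared with White respondents.
rrr <- round(exp(race_coef), 2)
print(rrr)
```

A ratio above 1 means higher odds of living in that region rather than the West, relative to white respondents. For example, the Black/South entry is about 4.7, which lines up with the large positive Race2:3 coefficient.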

SETTING THE INDEPENDENT VARIABLE: RACE DIFFERENCE

After setting my counterfactual variable race, I want to see the probability of each race living in a particular region of the United States.

x <- setx(Z.Area, Race = 1) # 1= White
x1 <- setx(Z.Area, Race = 2) # 2= Black
x2 <- setx(Z.Area, Race = 3) # 3= Asian
s.race <- sim(Z.Area, x = x, x1 = x1) # sim() compares two scenarios (x and x1); x2 (Asian) would need its own sim() call
#summary(s.race)
  • The results show that, on average, white respondents have a predicted probability of about 0.34 of living in the South and about 0.17 of living in the Northeast.

  • On average, black respondents have a predicted probability of about 0.62 of living in the South and about 0.09 of living in the West.

THE FIRST DIFFERENCE IN THE RACES

  • The first differences compare black respondents with white respondents. On average, the predicted probability of living in the Northeast is 0.02 lower for black respondents than for white respondents. It is also 0.10 lower for the Midwest, 0.27 higher for the South and 0.15 lower for the West.
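The predicted probabilities and first differences for race can also be approximated by hand from the Model 1 coefficients. The following is a minimal base R sketch, not what sim() does internally: sim() averages over simulated coefficient draws, while this uses only the point estimates. It assumes Sex is held at 2 (female) in both scenarios, and the names coefs and predict_probs are hypothetical helpers of my own.

```r
# Model 1 point estimates, copied from the summary output.
# Columns are the non-reference regions; West (region 4) is the reference.
coefs <- rbind(
  Intercept = c(Northeast = -0.352853, Midwest = 0.054737, South = 0.340610),
  Sex2      = c(0.034797, -0.003384, 0.035101),
  Race2     = c(0.816048, 0.429805, 1.553525)
)

# Sum the intercept and the active terms, then convert to probabilities.
predict_probs <- function(terms) {
  eta  <- colSums(coefs[c("Intercept", terms), , drop = FALSE])
  odds <- c(exp(eta), West = 1)  # the reference region has linear predictor 0
  odds / sum(odds)
}

# Assumption: Sex = 2 (female) in both scenarios.
p_white <- predict_probs("Sex2")
p_black <- predict_probs(c("Sex2", "Race2"))

# First difference: Black minus White, region by region.
first_diff <- p_black - p_white

print(round(rbind(White = p_white, Black = p_black, Difference = first_diff), 2))
```

Under that assumption, the rounded values agree with the figures interpreted above: about 0.62 for black respondents in the South, and differences of about -0.02, -0.10, 0.27 and -0.15 across the four regions. Small discrepancies from the simulated values are expected.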

SETTING THE INDEPENDENT VARIABLE: SEX DIFFERENCE

After setting my counterfactual variable sex, I want to see the probability of male and female respondents living in a particular region of the United States.

x <- setx(Z.Area, Sex = 1) # 1= Male
x1 <- setx(Z.Area, Sex = 2) # 2= Female
s.sex <- sim(Z.Area, x = x, x1 = x1)
summary(s.sex)
## 
##  sim x :
##  -----
## ev
##              mean          sd       50%      2.5%     97.5%
## Pr(Y=1) 0.1686654 0.002085182 0.1688257 0.1641793 0.1723911
## Pr(Y=2) 0.2535879 0.002355645 0.2536125 0.2489357 0.2581446
## Pr(Y=3) 0.3375670 0.002577009 0.3376812 0.3325722 0.3428064
## Pr(Y=4) 0.2401796 0.002278230 0.2401480 0.2356630 0.2450298
## pv
##          1     2    3     4
## [1,] 0.173 0.238 0.35 0.239
## 
##  sim x1 :
##  -----
## ev
##              mean          sd       50%      2.5%     97.5%
## Pr(Y=1) 0.1718750 0.001964158 0.1718601 0.1680631 0.1758359
## Pr(Y=2) 0.2484385 0.002258674 0.2485161 0.2439903 0.2529004
## Pr(Y=3) 0.3436694 0.002545804 0.3437205 0.3387920 0.3485772
## Pr(Y=4) 0.2360172 0.002211953 0.2360280 0.2316529 0.2405419
## pv
##          1    2     3     4
## [1,] 0.163 0.24 0.335 0.262
## fd
##                 mean          sd          50%          2.5%       97.5%
## Pr(Y=1)  0.003209540 0.002654645  0.003216856 -0.0019475325 0.008568716
## Pr(Y=2) -0.005149446 0.003169480 -0.005119998 -0.0113456920 0.001329043
## Pr(Y=3)  0.006102359 0.003414478  0.006132603 -0.0008508194 0.012735104
## Pr(Y=4) -0.004162454 0.003023664 -0.004086313 -0.0104523649 0.001515673
  • On average, male respondents have a predicted probability of about 0.34 of living in the South and about 0.17 of living in the Northeast.

  • On average, female respondents have nearly identical predicted probabilities: about 0.34 for the South and about 0.17 for the Northeast.

  • The first differences (female minus male) are very small. Women's predicted probability is 0.003 higher than men's for the Northeast, 0.005 lower for the Midwest, 0.006 higher for the South and 0.004 lower for the West. Every 95% interval includes zero.
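A quick way to check whether any of the sex differences can be distinguished from zero is to see whether each 95% interval spans zero. The sketch below is an illustration in base R with the fd values typed in by hand from the summary(s.sex) output; the data frame and its column names are my own.

```r
# First differences (female minus male) and their 95% simulation intervals,
# copied from the fd table in the summary(s.sex) output.
fd <- data.frame(
  region = c("Northeast", "Midwest", "South", "West"),
  mean   = c(0.003209540, -0.005149446, 0.006102359, -0.004162454),
  lower  = c(-0.0019475325, -0.0113456920, -0.0008508194, -0.0104523649),
  upper  = c(0.008568716, 0.001329043, 0.012735104, 0.001515673)
)

# An interval that contains zero is consistent with no difference between the sexes.
fd$includes_zero <- fd$lower < 0 & fd$upper > 0
print(fd)
```

All four intervals contain zero, which is consistent with the non-significant Sex coefficients in Model 1.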

CONCLUSION

The results show that sex makes little difference to geographic location: men and women have the same or nearly the same probabilities of living in each region of the U.S. The differences are very small and not statistically significant, so we cannot rule out that they are simply due to random chance. Race, on the other hand, IS statistically significant where geographic location is concerned. Black respondents have a high probability of living in the South (about 0.62), almost double that of white respondents (about 0.34). Black respondents have a VERY low probability of living in the West (about 0.09). Outside the South, they are less likely than white respondents to live in every region: the gap is largest in the West (0.15), followed by the Midwest (0.10) and the Northeast (0.02). My hypothesis was partly right in that I suspected that race and US Region would have a strong relationship. However, gender did not seem to have much of a relationship with U.S. Region.