Alternative logit specifications a) Define an ordinal or multinomial outcome variable of your choosing and define how you will recode the original variable. b) State a research question about what factors you believe will affect your outcome variable. c) Fit either the ordinal or the multinomial logistic regression models to your outcome. a. Are the assumptions of the proportional odds model met? How did you evaluate the proportional odds assumption? d) For the best fitting model from part c, Describe the results of your model, and present output from the model in terms of odds ratios and confidence intervals for all model parameters in a table
options(repos="https://cran.rstudio.com" )
library(htmlTable)
library(car)
## Loading required package: carData
library(stargazer)
##
## Please cite as:
## Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.2. https://CRAN.R-project.org/package=stargazer
library(survey)
## Loading required package: grid
## Loading required package: Matrix
## Loading required package: survival
##
## Attaching package: 'survey'
## The following object is masked from 'package:graphics':
##
## dotchart
library(questionr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:car':
##
## recode
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
I pulled data for 2018 5 Year ACS, with hopes to enventually do some time-series analysis. This data is only gathered for Texas, Oklahoma, and Louisiana, which are three likely states to have high concentrations of oil and gas employment. This was also done to limit the file size for my computer’s sake.
Compared to HW2, II will focus more on the characteristics of oil and gas workers, not the areas where oil and gas employment has a high concentration. So I am using a subset of iPums data where I am only looking at those in oil and gas (4,567 observations out of 1,715,275 original observations).
For iPUMS in R, more details available here: https://cran.r-project.org/web/packages/ipumsr/vignettes/ipums.html
#
install.packages("ipumsr")
##
## The downloaded binary packages are in
## /var/folders/lr/thl33hwj2jdd01hth_whxlz00000gn/T//Rtmp70Jxol/downloaded_packages
library(ipumsr)
## Registered S3 methods overwritten by 'ipumsr':
## method from
## format.pillar_shaft_haven_labelled_chr haven
## format.pillar_shaft_haven_labelled_num haven
## pillar_shaft.haven_labelled haven
# Note to self: Must set wd to where all files are, then open ddi and feed into read_ipums_micro
setwd("~/Desktop/UTSA/DEM7283_StatsII/HW2")
ddi <- read_ipums_ddi("usa_00011.xml")
data_total <- read_ipums_micro(ddi)
## Use of data from IPUMS-USA is subject to conditions including that users should
## cite the data appropriately. Use command `ipums_conditions()` for more details.
names(data_total)
## [1] "YEAR" "MULTYEAR" "SAMPLE" "SERIAL" "CBSERIAL" "HHWT"
## [7] "CLUSTER" "STATEFIP" "COUNTYFIP" "MET2013" "PUMA" "STRATA"
## [13] "GQ" "OWNERSHP" "OWNERSHPD" "VALUEH" "PERNUM" "PERWT"
## [19] "SEX" "AGE" "RACE" "RACED" "HISPAN" "HISPAND"
## [25] "EMPSTAT" "EMPSTATD" "LABFORCE" "OCC" "IND" "CLASSWKR"
## [31] "CLASSWKRD" "WKSWORK2" "UHRSWORK" "INCWAGE" "POVERTY" "MIGRATE1"
## [37] "MIGRATE1D"
## Since I'm hoping to find the industry employment for oil and gas workers, I need to see what codes are in the data. The Census Industry code I want is 0370 - Oil and Gas Extraction.
data <- filter(data_total, IND==370)
Instead of looking at oil gas geographies, I’m going to focus on the characteristics of people who work in oil/gas, so I’m limiting my population to only those in oil and gas, then will be looking at the occupations of those in the sector (service, management etc). Predictor variables are race, hours worked, whether someone moved in the last year (maybe different occupations are more likley to have moved than others) as well as poverty status. * Population: Those employed in Oil and Gas (Census Ind Code = 0370) in TX, LA, OK in 2018 5 Year ACS * Multinomial Outcome Variable: Occupation Class * Predictor Variables: Age, Hours Worked, Moved in Last Year, Hispanic/Latino
And the stratification for occupation is as follows: 1. Mgmt, Business, Science, Arts (“Mgmt”) 2. Services (“Svcs”) 3. Sales/Office (“Sales/Off”) 4. Natural Resources, Construction, and Maintenance (“Res/Const/Maint”) 5. Production, Transportation, and Material Moving (“Prod/Transp”) 0. Unemployed (“Unemp”)
## Exploring Variables
hist(data$INCWAGE)
hist(data$OCC)
hist(data$EMPSTAT)
describe(data$UHRSWORK)
## [4567 obs.] Usual hours worked per week
## labelled integer: 60 40 84 37 40 40 40 42 40 84 ...
## min: 0 - max: 99 - NAs: 0 (0%) - 70 unique values
## 2 value labels: [0] N/A [99] 99 (Topcode)
describe(data$OCC)
## [4567 obs.] Occupation
## labelled double: 565 5600 6200 5120 440 5120 6410 8990 1520 440 ...
## min: 10 - max: 9760 - NAs: 0 (0%) - 227 unique values
## 10 value labels: [1880] Occupation Codes [URL omitted from DDI.] (used for 1850-1900 samples) [1920] Occupation Codes [URL omitted from DDI.] (used for 1910-1920 samples) [1930] Occupation Codes [URL omitted from DDI.] [1940] Occupation Codes [URL omitted from DDI.] [1950] Occupation Codes - see OCC1950 [1960] Occupation Codes [URL omitted from DDI.] [1970] Occupation Codes [URL omitted from DDI.] [1980] Occupation Codes [URL omitted from DDI.] [1990] Occupation Codes [URL omitted from DDI.] [2000] Occupation Codes [URL omitted from DDI.]
describe(data$INCWAGE)
## [4567 obs.] Wage and salary income
## labelled double: 332205 101566 74058 111088 118494 51629 10580 68769 105798 158697 ...
## min: 0 - max: 477388 - NAs: 0 (0%) - 937 unique values
## 2 value labels: [999998] Missing [999999] N/A
# Recoding Occupation into categories for outcome variable
data$occ_c <- Recode(data$OCC, recodes = "10:3550 = '1. Mgmt'; 3601:4655 = '2. Svcs' ; 4700:5940 = '3. Sales/Off' ; 6005:7640 = '4. Res/Const/Maint' ; 7700:9830 = '5. Prod/Transp' ; else ='0. Unemp'", as.factor=T)
# Recoding age into groups of working age people (min = 17, max = 93)
data$age_c <-Recode(data$AGE, recodes="17:29 ='1. Young Adult'; 30:55='2. Working Age'; 56:65='3. Pre-Retirement' ; 65:93 = '4. Retirement Age'", as.factor=T)
# Recoding into part time, full time, overtime
data$fulltime <- Recode(data$UHRSWORK, recodes="0=NA; 1:34='1. Part Time'; 35:49='2. Full Time' ; 50:99='3. Overtime'", as.factor=T)
# Recoding move within last year
data$move <- Recode(data$MIGRATE1, recodes="0=NA; 1='2. Did not Move'; 2:4='1. Moved within last year'; 9=NA", as.factor=T)
# Recoding Hispanic/Latino ethnicity
data$hisp <- Recode(data$HISPAN, recodes = "0='1. Not Hispanic'; 1:4='2. Hispanic/Latino'; else = NA", as.factor=T)
# Recoding Race
data$race_c <- Recode(data$RACE, recodes = "1='1. White'; 2='2. Black'; 3='3. Native American'; 4:6='4. Asian/PI'; else = '5. Other, or more than one race'", as.factor=T)
# Recoding poverty
data$pov_b <-Recode(data$POVERTY, recodes="000=NA; 1:100='Below Poverty'; else='Above Poverty'", as.factor=T)
Working in reverse order for the occupation categories, this is what was found from descriptive statistics. There are more young adults working in Production/Transportation than other occupations, and these are also the occupations where people are more likely to be working overtime (this is anecdotally heard from reports of truck drivers for fracking jobs driving extremely long hours).
The Resources/Construction/Maintenace occupations is where “roughnecks” are classified (those working on the rigs, described as “Derrick, rotary drill, and service unit operators, and roustabouts, oil, gas, and mining”). These jobs are also likely to have more people working overtime, which could make sense as folks are likely to be living on the rigs.
Both of the above occupations are also more likely to be Hispanic/Latino, in addition to service occupations. However, white workers are the majority of the workforce for all occupations in oil and gas.
#using survey design
library(tableone)
library(survey)
options(survey.lonely.psu = "adjust")
des <- svydesign(id=~CLUSTER, strata=~STRATA, weights=~PERWT, data=data)
st1<-svyCreateTableOne(vars = c("age_c","fulltime","move","hisp","race_c","pov_b"), strata = "occ_c", test = T, data = des)
#st1<-print(st1, format="p")
print(st1, format="p")
Stratified by occ_c
1. Mgmt 2. Svcs 3. Sales/Off
n 53580.0 1050.0 11067.0
age_c (%)
1. Young Adult 16.1 13.4 10.5
2. Working Age 58.5 56.9 56.5
3. Pre-Retirement 19.3 17.5 22.3
4. Retirement Age 6.1 12.2 10.7
fulltime (%)
1. Part Time 2.9 16.8 9.0
2. Full Time 69.7 67.3 74.6
3. Overtime 27.4 16.0 16.3
move = 2. Did not Move (%) 84.1 97.9 88.0
hisp = 2. Hispanic/Latino (%) 10.1 32.3 17.0
race_c (%)
1. White 82.2 77.2 86.2
2. Black 4.9 18.6 8.2
3. Native American 0.5 0.0 1.0
4. Asian/PI 9.6 2.1 0.9
5. Other, or more than one race 2.9 2.1 3.8
pov_b = Below Poverty (%) 1.8 11.6 2.8
Stratified by occ_c 4. Res/Const/Maint 5. Prod/Transp p
n 16619.0 10261.0
age_c (%) 0.014 1. Young Adult 12.5 18.5
2. Working Age 59.6 57.5
3. Pre-Retirement 22.2 18.5
4. Retirement Age 5.7 5.6
fulltime (%) <0.001 1. Part Time 3.6 4.2
2. Full Time 47.9 55.2
3. Overtime 48.5 40.7
move = 2. Did not Move (%) 87.3 89.1 0.008 hisp = 2. Hispanic/Latino (%) 22.9 22.6 <0.001 race_c (%) <0.001 1. White 85.9 81.4
2. Black 7.4 9.6
3. Native American 0.8 1.3
4. Asian/PI 1.6 1.8
5. Other, or more than one race 4.2 5.9
pov_b = Below Poverty (%) 5.5 3.4 <0.001 Stratified by occ_c test n
age_c (%)
1. Young Adult
2. Working Age
3. Pre-Retirement
4. Retirement Age
fulltime (%)
1. Part Time
2. Full Time
3. Overtime
move = 2. Did not Move (%)
hisp = 2. Hispanic/Latino (%)
race_c (%)
1. White
2. Black
3. Native American
4. Asian/PI
5. Other, or more than one race
pov_b = Below Poverty (%)
As noted in class, there is not a survey corrected multinomal model, so this will use vglm.
#fitting multinomial model
library(VGAM)
## Loading required package: stats4
## Loading required package: splines
##
## Attaching package: 'VGAM'
## The following object is masked from 'package:survey':
##
## calibrate
## The following object is masked from 'package:car':
##
## logit
mfit2<-vglm(occ_c~age_c+fulltime+move+hisp+race_c+pov_b,
family=multinomial(refLevel = 1),
data=data,
weights=PERWT/mean(PERWT, na.rm=T))
## Warning in checkwz(wz, M = M, trace = trace, wzepsilon = control$wzepsilon):
## 2 diagonal elements of the working weights variable 'wz' have been replaced by
## 1.819e-12
## Warning in checkwz(wz, M = M, trace = trace, wzepsilon = control$wzepsilon):
## 2 diagonal elements of the working weights variable 'wz' have been replaced by
## 1.819e-12
## Warning in checkwz(wz, M = M, trace = trace, wzepsilon = control$wzepsilon):
## 2 diagonal elements of the working weights variable 'wz' have been replaced by
## 1.819e-12
## Warning in checkwz(wz, M = M, trace = trace, wzepsilon = control$wzepsilon):
## 44 diagonal elements of the working weights variable 'wz' have been replaced by
## 1.819e-12
## Warning in checkwz(wz, M = M, trace = trace, wzepsilon = control$wzepsilon):
## 97 diagonal elements of the working weights variable 'wz' have been replaced by
## 1.819e-12
## Warning in checkwz(wz, M = M, trace = trace, wzepsilon = control$wzepsilon):
## 107 diagonal elements of the working weights variable 'wz' have been replaced by
## 1.819e-12
## Warning in checkwz(wz, M = M, trace = trace, wzepsilon = control$wzepsilon):
## 110 diagonal elements of the working weights variable 'wz' have been replaced by
## 1.819e-12
## Warning in checkwz(wz, M = M, trace = trace, wzepsilon = control$wzepsilon):
## 110 diagonal elements of the working weights variable 'wz' have been replaced by
## 1.819e-12
summary(mfit2)
##
## Call:
## vglm(formula = occ_c ~ age_c + fulltime + move + hisp + race_c +
## pov_b, family = multinomial(refLevel = 1), data = data, weights = PERWT/mean(PERWT,
## na.rm = T))
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## log(mu[,2]/mu[,1]) -0.9782 -0.1068 -0.02645 -2.841e-05 16.341
## log(mu[,3]/mu[,1]) -1.6042 -0.4156 -0.26897 -1.135e-01 17.511
## log(mu[,4]/mu[,1]) -2.0456 -0.4902 -0.32687 -1.513e-01 6.733
## log(mu[,5]/mu[,1]) -1.7602 -0.3987 -0.26681 -1.345e-01 13.946
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept):1 -18.38270 716.25930 NA NA
## (Intercept):2 -1.33245 0.27479 -4.849 1.24e-06
## (Intercept):3 -1.76282 0.28141 -6.264 3.74e-10
## (Intercept):4 -1.93071 0.31732 -6.084 1.17e-09
## age_c2. Working Age:1 0.14788 0.50684 0.292 0.770464
## age_c2. Working Age:2 0.54027 0.17078 3.164 0.001559
## age_c2. Working Age:3 0.22921 0.13232 1.732 0.083233
## age_c2. Working Age:4 -0.28985 0.14053 -2.063 0.039152
## age_c3. Pre-Retirement:1 0.22578 0.59148 0.382 0.702669
## age_c3. Pre-Retirement:2 0.68488 0.19577 3.498 0.000468
## age_c3. Pre-Retirement:3 0.37542 0.15722 2.388 0.016944
## age_c3. Pre-Retirement:4 -0.27051 0.17831 -1.517 0.129236
## age_c4. Retirement Age:1 -15.77341 1096.50335 NA NA
## age_c4. Retirement Age:2 0.81660 0.27165 3.006 0.002647
## age_c4. Retirement Age:3 0.33594 0.25862 1.299 0.193950
## age_c4. Retirement Age:4 -0.06226 0.28959 -0.215 0.829776
## fulltime2. Full Time:1 -2.13011 0.47544 -4.480 7.45e-06
## fulltime2. Full Time:2 -0.97812 0.21312 -4.589 4.44e-06
## fulltime2. Full Time:3 -0.42357 0.24562 -1.724 0.084618
## fulltime2. Full Time:4 -0.43499 0.27463 -1.584 0.113213
## fulltime3. Overtime:1 -2.70452 0.59285 -4.562 5.07e-06
## fulltime3. Overtime:2 -1.65464 0.23922 -6.917 4.62e-12
## fulltime3. Overtime:3 0.46955 0.24880 1.887 0.059127
## fulltime3. Overtime:4 0.13945 0.28050 0.497 0.619088
## move2. Did not Move:1 15.98028 716.25914 0.022 0.982200
## move2. Did not Move:2 0.23109 0.16134 1.432 0.152064
## move2. Did not Move:3 0.31493 0.13436 2.344 0.019083
## move2. Did not Move:4 0.61779 0.17128 3.607 0.000310
## hisp2. Hispanic/Latino:1 1.58397 0.39241 4.037 5.43e-05
## hisp2. Hispanic/Latino:2 0.54764 0.14719 3.720 0.000199
## hisp2. Hispanic/Latino:3 0.94881 0.11641 8.151 3.61e-16
## hisp2. Hispanic/Latino:4 0.88319 0.13754 6.421 1.35e-10
## race_c2. Black:1 2.01824 0.41929 4.813 1.48e-06
## race_c2. Black:2 0.65110 0.20000 3.256 0.001132
## race_c2. Black:3 0.53866 0.17927 3.005 0.002657
## race_c2. Black:4 0.67499 0.20330 3.320 0.000900
## race_c3. Native American:1 -14.50870 2438.78508 NA NA
## race_c3. Native American:2 1.06079 0.54002 1.964 0.049490
## race_c3. Native American:3 -0.19757 0.65518 -0.302 0.762992
## race_c3. Native American:4 1.05683 0.50797 2.081 0.037479
## race_c4. Asian/PI:1 -16.88339 1829.61548 NA NA
## race_c4. Asian/PI:2 -2.68843 0.57081 -4.710 2.48e-06
## race_c4. Asian/PI:3 -1.38825 0.28931 -4.798 1.60e-06
## race_c4. Asian/PI:4 -1.42768 0.36210 -3.943 8.05e-05
## race_c5. Other, or more than one race:1 -15.62461 1046.93480 NA NA
## race_c5. Other, or more than one race:2 0.11574 0.27205 0.425 0.670508
## race_c5. Other, or more than one race:3 -0.12996 0.23131 -0.562 0.574226
## race_c5. Other, or more than one race:4 0.32507 0.23812 1.365 0.172198
## pov_bBelow Poverty:1 -15.24562 1273.82917 NA NA
## pov_bBelow Poverty:2 0.33616 0.40948 0.821 0.411677
## pov_bBelow Poverty:3 1.42926 0.29064 4.918 8.76e-07
## pov_bBelow Poverty:4 1.08052 0.34229 3.157 0.001596
##
## (Intercept):1
## (Intercept):2 ***
## (Intercept):3 ***
## (Intercept):4 ***
## age_c2. Working Age:1
## age_c2. Working Age:2 **
## age_c2. Working Age:3 .
## age_c2. Working Age:4 *
## age_c3. Pre-Retirement:1
## age_c3. Pre-Retirement:2 ***
## age_c3. Pre-Retirement:3 *
## age_c3. Pre-Retirement:4
## age_c4. Retirement Age:1
## age_c4. Retirement Age:2 **
## age_c4. Retirement Age:3
## age_c4. Retirement Age:4
## fulltime2. Full Time:1 ***
## fulltime2. Full Time:2 ***
## fulltime2. Full Time:3 .
## fulltime2. Full Time:4
## fulltime3. Overtime:1 ***
## fulltime3. Overtime:2 ***
## fulltime3. Overtime:3 .
## fulltime3. Overtime:4
## move2. Did not Move:1
## move2. Did not Move:2
## move2. Did not Move:3 *
## move2. Did not Move:4 ***
## hisp2. Hispanic/Latino:1 ***
## hisp2. Hispanic/Latino:2 ***
## hisp2. Hispanic/Latino:3 ***
## hisp2. Hispanic/Latino:4 ***
## race_c2. Black:1 ***
## race_c2. Black:2 **
## race_c2. Black:3 **
## race_c2. Black:4 ***
## race_c3. Native American:1
## race_c3. Native American:2 *
## race_c3. Native American:3
## race_c3. Native American:4 *
## race_c4. Asian/PI:1
## race_c4. Asian/PI:2 ***
## race_c4. Asian/PI:3 ***
## race_c4. Asian/PI:4 ***
## race_c5. Other, or more than one race:1
## race_c5. Other, or more than one race:2
## race_c5. Other, or more than one race:3
## race_c5. Other, or more than one race:4
## pov_bBelow Poverty:1
## pov_bBelow Poverty:2
## pov_bBelow Poverty:3 ***
## pov_bBelow Poverty:4 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Names of linear predictors: log(mu[,2]/mu[,1]), log(mu[,3]/mu[,1]),
## log(mu[,4]/mu[,1]), log(mu[,5]/mu[,1])
##
## Residual deviance: 8976.504 on 16244 degrees of freedom
##
## Log-likelihood: -4488.252 on 16244 degrees of freedom
##
## Number of Fisher scoring iterations: 18
##
## Warning: Hauck-Donner effect detected in the following estimate(s):
## '(Intercept):1', 'age_c4. Retirement Age:1', 'race_c3. Native American:1', 'race_c4. Asian/PI:1', 'race_c4. Asian/PI:2', 'race_c5. Other, or more than one race:1', 'pov_bBelow Poverty:1'
##
##
## Reference group is level 1 of the response
#calculate odds ratios
round(exp(coef(mfit2)), 3)
## (Intercept):1 (Intercept):2
## 0.000 0.264
## (Intercept):3 (Intercept):4
## 0.172 0.145
## age_c2. Working Age:1 age_c2. Working Age:2
## 1.159 1.716
## age_c2. Working Age:3 age_c2. Working Age:4
## 1.258 0.748
## age_c3. Pre-Retirement:1 age_c3. Pre-Retirement:2
## 1.253 1.984
## age_c3. Pre-Retirement:3 age_c3. Pre-Retirement:4
## 1.456 0.763
## age_c4. Retirement Age:1 age_c4. Retirement Age:2
## 0.000 2.263
## age_c4. Retirement Age:3 age_c4. Retirement Age:4
## 1.399 0.940
## fulltime2. Full Time:1 fulltime2. Full Time:2
## 0.119 0.376
## fulltime2. Full Time:3 fulltime2. Full Time:4
## 0.655 0.647
## fulltime3. Overtime:1 fulltime3. Overtime:2
## 0.067 0.191
## fulltime3. Overtime:3 fulltime3. Overtime:4
## 1.599 1.150
## move2. Did not Move:1 move2. Did not Move:2
## 8712602.437 1.260
## move2. Did not Move:3 move2. Did not Move:4
## 1.370 1.855
## hisp2. Hispanic/Latino:1 hisp2. Hispanic/Latino:2
## 4.874 1.729
## hisp2. Hispanic/Latino:3 hisp2. Hispanic/Latino:4
## 2.583 2.419
## race_c2. Black:1 race_c2. Black:2
## 7.525 1.918
## race_c2. Black:3 race_c2. Black:4
## 1.714 1.964
## race_c3. Native American:1 race_c3. Native American:2
## 0.000 2.889
## race_c3. Native American:3 race_c3. Native American:4
## 0.821 2.877
## race_c4. Asian/PI:1 race_c4. Asian/PI:2
## 0.000 0.068
## race_c4. Asian/PI:3 race_c4. Asian/PI:4
## 0.250 0.240
## race_c5. Other, or more than one race:1 race_c5. Other, or more than one race:2
## 0.000 1.123
## race_c5. Other, or more than one race:3 race_c5. Other, or more than one race:4
## 0.878 1.384
## pov_bBelow Poverty:1 pov_bBelow Poverty:2
## 0.000 1.400
## pov_bBelow Poverty:3 pov_bBelow Poverty:4
## 4.176 2.946