Ordinal logistic regression
Following the previous homework where we explore the relation of violence in childhood and age of sexual initiation, now I will use the National Survey of the Determinants of Adolescent Pregnancy 2017 (ENFADEA)to extend beyond Baja California the similar idea using an ordinal logistic regression. The population for this survey where females between 20 and 24 years.
Outcome Variable
# Age of sexual initiation.
# This variable is define from the question about age of sexual initiation from the women who respondend. Since the median age of sexual initiation was 17, I decided to code Age of sexual initiation (eis) as 'before 17 years' (< 17), '17 years' (17) and 'after 17 years' (> 17).
enfadea2$eis<-car::recode(enfadea2$p204, recodes="11:16='1.< 17'; 17='2.17'; 18:24='3.> 17'; else=NA", as.factor.result = T)
enfadea2$eis<-relevel(enfadea2$eis, ref='1.< 17')
freq(enfadea2$eis)
## n % val%
## 1.< 17 922 27.3 33.4
## 2.17 505 14.9 18.3
## 3.> 17 1334 39.5 48.3
## NA 618 18.3 NA
enfadea2$afsr<-car::recode(enfadea2$eis, recodes = "'1.< 17' =1; '2.17'=2; '3.> 17'=3; else=NA", as.factor.result = F)
For the independent variables I’ll use similar variables as in the previous homework: if the respondent considers itself indigenous; education (as none/basic/secundaty/high school/professional); religion (as none/catholic/other); if the respondent grew in an urban context and if the respondent worked as a minor.
Independent Variables: Sociodemographics
#Indigenous
freq(enfadea2$p104)
## n % val%
## [1] Sí 1659 49.1 49.2
## [2] No 1713 50.7 50.8
## NA 7 0.2 NA
enfadea2$indi<-car::recode(enfadea2$p104, recodes="1='1.indigenous'; 2='2.no indigenous'; else=NA", as.factor.result=T)
freq(enfadea2$indi)
## n % val%
## 1.indigenous 1659 49.1 49.2
## 2.no indigenous 1713 50.7 50.8
## NA 7 0.2 NA
enfadea2$indi<-relevel(enfadea2$indi, ref = '1.indigenous')
#Education
freq(enfadea2$p105_n)
## n %
## [0] Ninguno 35 1.0
## [1] Preescolar 3 0.1
## [2] Primaria 362 10.7
## [3] Secundaria 1035 30.6
## [4] Preparatoria o bachillerato 1016 30.1
## [5] Estudios técnicos o comerciales con primaria terminada 8 0.2
## [6] Estudios técnicos o comerciales con secundaria terminada 26 0.8
## [7] Estudios técnicos o comerciales con preparatoria terminada 244 7.2
## [8] Normal con licenciatura 42 1.2
## [9] Licenciatura o profesional 598 17.7
## [10] Posgrado (Especialidad, Maestría o Doctorado) 10 0.3
## val%
## [0] Ninguno 1.0
## [1] Preescolar 0.1
## [2] Primaria 10.7
## [3] Secundaria 30.6
## [4] Preparatoria o bachillerato 30.1
## [5] Estudios técnicos o comerciales con primaria terminada 0.2
## [6] Estudios técnicos o comerciales con secundaria terminada 0.8
## [7] Estudios técnicos o comerciales con preparatoria terminada 7.2
## [8] Normal con licenciatura 1.2
## [9] Licenciatura o profesional 17.7
## [10] Posgrado (Especialidad, Maestría o Doctorado) 0.3
enfadea2$educ<-car::recode(enfadea2$p105_n, recodes="0='1.none'; 1:2='2.basic'; 3='3.secundary'; 4= '4.high school'; 5='2.basic'; 6='3.secundary'; 7='4.high school'; 8:10='5.professional'; else=NA", as.factor.result=T)
freq(enfadea2$educ)
## n % val%
## 1.none 35 1.0 1.0
## 2.basic 373 11.0 11.0
## 3.secundary 1061 31.4 31.4
## 4.high school 1260 37.3 37.3
## 5.professional 650 19.2 19.2
enfadea2$educ<-relevel(enfadea2$educ, ref = '3.secundary')
#Religion
freq(enfadea2$p106)
## n
## [0] Ninguna 226
## [1] Católica 2731
## [2] Cristiana, Presbiteriana, Luz del Mundo 266
## [3] Evangélica 40
## [4] Testigos de Jehová 36
## [5] Pentecostal 33
## [6] Creyente de la Santa Muerte, Palo Mayombe, Budista 22
## [7] Bíblica diferente de Evangélica (Adventista, Sabática, Mormón) 24
## NA 1
## %
## [0] Ninguna 6.7
## [1] Católica 80.8
## [2] Cristiana, Presbiteriana, Luz del Mundo 7.9
## [3] Evangélica 1.2
## [4] Testigos de Jehová 1.1
## [5] Pentecostal 1.0
## [6] Creyente de la Santa Muerte, Palo Mayombe, Budista 0.7
## [7] Bíblica diferente de Evangélica (Adventista, Sabática, Mormón) 0.7
## NA 0.0
## val%
## [0] Ninguna 6.7
## [1] Católica 80.8
## [2] Cristiana, Presbiteriana, Luz del Mundo 7.9
## [3] Evangélica 1.2
## [4] Testigos de Jehová 1.1
## [5] Pentecostal 1.0
## [6] Creyente de la Santa Muerte, Palo Mayombe, Budista 0.7
## [7] Bíblica diferente de Evangélica (Adventista, Sabática, Mormón) 0.7
## NA NA
enfadea2$relig<-car::recode(enfadea2$p106, recodes="0='3.none'; 1='1.catholic'; 2:7='2.other'; else=NA", as.factor.result=T)
freq(enfadea2$relig)
## n % val%
## 1.catholic 2731 80.8 80.8
## 2.other 421 12.5 12.5
## 3.none 226 6.7 6.7
## NA 1 0.0 NA
enfadea2$relig<-relevel(enfadea2$relig, ref = '1.catholic')
#Urban context
freq(enfadea2$p110)
## n % val%
## [1] En un rancho 1000 29.6 29.6
## [2] En un pueblo 1133 33.5 33.6
## [3] En una ciudad 1244 36.8 36.8
## NA 2 0.1 NA
enfadea2$urb<-car::recode(enfadea2$p110, recodes="1:2='2.no urban'; 3='1.urban'; else=NA", as.factor.result=T)
freq(enfadea2$urb)
## n % val%
## 1.urban 1244 36.8 36.8
## 2.no urban 2133 63.1 63.2
## NA 2 0.1 NA
enfadea2$urb<-relevel(enfadea2$urb, ref = '1.urban')
#Worked as a minor
freq(enfadea2$p144)
## n % val%
## 10 65 1.9 2.5
## 11 13 0.4 0.5
## 12 62 1.8 2.4
## 13 78 2.3 3.0
## 14 119 3.5 4.6
## 15 294 8.7 11.4
## 16 341 10.1 13.2
## 17 365 10.8 14.2
## 18 489 14.5 19.0
## 19 315 9.3 12.2
## 20 246 7.3 9.6
## 21 95 2.8 3.7
## 22 55 1.6 2.1
## 23 30 0.9 1.2
## 24 8 0.2 0.3
## NA 804 23.8 NA
summary(enfadea2$p144)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 10.00 16.00 17.00 17.12 19.00 24.00 804
enfadea2$work<-car::recode(enfadea2$p144, recodes="10:17='1.worked <17'; 18:24='2.worked >17'; else=NA", as.factor.result=T)
freq(enfadea2$work)
## n % val%
## 1.worked <17 1337 39.6 51.9
## 2.worked >17 1238 36.6 48.1
## NA 804 23.8 NA
enfadea2$work<-relevel(enfadea2$work, ref = '1.worked <17')
For this set of variables I’ll use the questions about insults, hits, inappropriate touching and sexual abuse for those respondents who suffered the previous in their adolescense or childhood.
Independent Variables: Violence in childhood
#Insults
freq(enfadea2$p601_a)
## n % val%
## [1] Muy frecuente 176 5.2 5.2
## [2] Poco frecuente 280 8.3 8.3
## [3] Una vez 99 2.9 2.9
## [4] No ocurrió 2819 83.4 83.6
## NA 5 0.1 NA
enfadea2$insults<-car::recode(enfadea2$p601_a, recodes="1:3='1.Yes_insults'; 4='2.No_insults'; else=NA", as.factor.result=T)
freq(enfadea2$insults)
## n % val%
## 1.Yes_insults 555 16.4 16.4
## 2.No_insults 2819 83.4 83.6
## NA 5 0.1 NA
enfadea2$insults<-relevel(enfadea2$insults, ref = '1.Yes_insults')
#Hits
freq(enfadea2$p601_b)
## n % val%
## [1] Muy frecuente 145 4.3 4.3
## [2] Poco frecuente 213 6.3 6.3
## [3] Una vez 105 3.1 3.1
## [4] No ocurrió 2912 86.2 86.3
## NA 4 0.1 NA
enfadea2$hits<-car::recode(enfadea2$p601_b, recodes="1:3='1.Yes_hits'; 4='2.No_hits'; else=NA", as.factor.result=T)
freq(enfadea2$hits)
## n % val%
## 1.Yes_hits 463 13.7 13.7
## 2.No_hits 2912 86.2 86.3
## NA 4 0.1 NA
enfadea2$hits<-relevel(enfadea2$hits, ref = '1.Yes_hits')
freq(enfadea2$p601_e)
## n % val%
## [1] Muy frecuente 29 0.9 0.9
## [2] Poco frecuente 18 0.5 0.5
## [3] Una vez 34 1.0 1.0
## [4] No ocurrió 3294 97.5 97.6
## NA 4 0.1 NA
enfadea2$touch<-car::recode(enfadea2$p601_e, recodes="1:3='1.Yes_touching'; 4='2.No_touching'; else=NA", as.factor.result=T)
freq(enfadea2$touch)
## n % val%
## 1.Yes_touching 81 2.4 2.4
## 2.No_touching 3294 97.5 97.6
## NA 4 0.1 NA
enfadea2$touch<-relevel(enfadea2$touch, ref = '1.Yes_touching')
freq(enfadea2$p601_f)
## n % val%
## [1] Muy frecuente 12 0.4 0.4
## [2] Poco frecuente 13 0.4 0.4
## [3] Una vez 19 0.6 0.6
## [4] No ocurrió 3330 98.5 98.7
## NA 5 0.1 NA
enfadea2$sexabuse<-car::recode(enfadea2$p601_f, recodes="1:3='1.Yes_abuse'; 4='2.No_abuse'; else=NA", as.factor.result=T)
freq(enfadea2$sexabuse)
## n % val%
## 1.Yes_abuse 44 1.3 1.3
## 2.No_abuse 3330 98.5 98.7
## NA 5 0.1 NA
enfadea2$sexabuse<-relevel(enfadea2$sexabuse, ref = '1.Yes_abuse')
I kept the variables of interest in a separate objec and created the survey design.
enfadea3<-enfadea2%>%
select(eis, afsr, indi, educ, relig, urb, work, insults, hits, touch, sexabuse, pon_muj)%>%
filter(complete.cases(.))
options(survey.lonely.psu = "adjust")
des<-svydesign(ids = ~1, weights = enfadea3$pon_muj, data = enfadea3)
The model ord1 uses the first set of independent variables. The AIC were calculated using what we saw in class. The results are from the proportional odds model. We see that compared with the indigenous women, no indigenous women have lower odds of have had their first sexual relation before 17. Women with basic, high school or professional education have higher odds of have had their first sexual relation before 11 compared with women with secundary education. Women with no religion, compared with catholic women, have lower odds of have had their first sexual relation before 17.
ord1<-svyolr(eis~indi+educ+relig+urb+work, des)
summary(ord1)
## Call:
## svyolr(eis ~ indi + educ + relig + urb + work, des)
##
## Coefficients:
## Value Std. Error t value
## indi2.no indigenous -0.1741332 0.1764248 -0.9870112
## educ1.none -0.8952040 0.5825528 -1.5366916
## educ2.basic 0.2567913 0.3045023 0.8433149
## educ4.high school 0.7682987 0.1856980 4.1373565
## educ5.professional 1.7717252 0.2335937 7.5846441
## relig2.other 0.1956024 0.2168599 0.9019758
## relig3.none -0.4435646 0.2307986 -1.9218687
## urb2.no urban 0.2784466 0.1770191 1.5729752
## work2.worked >17 0.5777754 0.1628626 3.5476247
##
## Intercepts:
## Value Std. Error t value
## 1.< 17|2.17 0.0989 0.2050 0.4822
## 2.17|3.> 17 0.9229 0.2084 4.4278
or.ord1<-exp(ord1$coefficients); or.ord1
## indi2.no indigenous educ1.none educ2.basic
## 0.8401850 0.4085242 1.2927753
## educ4.high school educ5.professional relig2.other
## 2.1560950 5.8809907 1.2160433
## relig3.none urb2.no urban work2.worked >17
## 0.6417448 1.3210761 1.7820696
#Examining the proportional odds assumption by fitting logits for each change
ex1<-svyglm(I(afsr>1)~indi+educ+relig+urb+work, des, family="binomial")
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
ex2<-svyglm(I(afsr>2)~indi+educ+relig+urb+work, des, family="binomial")
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
plot(coef(ex1)[-1], ylim=c(-3, 3), type="l",xaxt="n", ylab="Beta", main=c("Comparison of betas for proportional odds assumption"))
lines(coef(ex2)[-1], col=1, lty=2)
axis(side=1, at=1:9, labels=F)
text(x=1:9, y=-4, srt = 90, pos = 1, xpd = TRUE,cex=.7,
labels = c( "No indigenous", "None","Basic","High school","Professional","Other religion", "No religion", "Non urban", "Worked > 17"))
legend("topright",col=c(1,1),lty=c(1,2), legend=c("> 1", "> 2"))
lines(coef(ord1)[c(-11:-12)], col=4, lwd=3)

#Odds ratios
round(exp(rbind(coef(ex1)[-1], coef(ex2)[-1])),3)
## indi2.no indigenous educ1.none educ2.basic educ4.high school
## [1,] 0.918 0.463 1.367 2.257
## [2,] 0.790 0.281 1.196 2.080
## educ5.professional relig2.other relig3.none urb2.no urban
## [1,] 6.689 1.000 0.624 1.230
## [2,] 5.643 1.293 0.640 1.418
## work2.worked >17
## [1,] 1.530
## [2,] 1.976
The model ord2 incorporates the second set of independent variables. From the odds ratios we observed similar results to the ord1. Women who do not suffered from hits have higher odds of have had their first sexual relation before 17 compared with women who suffered from hits.
ord2<-svyolr(eis~indi+educ+relig+urb+work+insults+hits+touch+sexabuse, des)
summary(ord2)
## Call:
## svyolr(eis ~ indi + educ + relig + urb + work + insults + hits +
## touch + sexabuse, des)
##
## Coefficients:
## Value Std. Error t value
## indi2.no indigenous -0.15419013 0.1779343 -0.8665568
## educ1.none -0.88398748 0.6094230 -1.4505319
## educ2.basic 0.29726473 0.2980361 0.9974117
## educ4.high school 0.77760966 0.1878587 4.1393337
## educ5.professional 1.77797995 0.2384983 7.4548960
## relig2.other 0.22876240 0.2224274 1.0284813
## relig3.none -0.40437195 0.2189503 -1.8468662
## urb2.no urban 0.27755280 0.1768984 1.5689957
## work2.worked >17 0.54227729 0.1657857 3.2709542
## insults2.No_insults -0.01306457 0.2063189 -0.0633222
## hits2.No_hits 0.33822008 0.2257994 1.4978782
## touch2.No_touching -0.46349954 0.6001061 -0.7723627
## sexabuse2.No_abuse 1.63088580 0.8265460 1.9731339
##
## Intercepts:
## Value Std. Error t value
## 1.< 17|2.17 1.5143 0.7487 2.0225
## 2.17|3.> 17 2.3482 0.7579 3.0983
or.ord2<-exp(ord2$coefficients); or.ord2
## indi2.no indigenous educ1.none educ2.basic
## 0.8571090 0.4131323 1.3461716
## educ4.high school educ5.professional relig2.other
## 2.1762640 5.9178899 1.2570433
## relig3.none urb2.no urban work2.worked >17
## 0.6673958 1.3198958 1.7199192
## insults2.No_insults hits2.No_hits touch2.No_touching
## 0.9870204 1.4024491 0.6290783
## sexabuse2.No_abuse
## 5.1083977
#AIC for ord1 and ord2
ord1$deviance+2*length(ord1$coefficients)
## [1] 3993.275
ord2$deviance+2*length(ord2$coefficients)
## [1] 3971.915
#Examining the proportional odds assumption by fitting logits for each change
ex11<-svyglm(I(afsr>1)~indi+educ+relig+urb+work+insults+hits+touch+sexabuse, des, family="binomial")
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
ex22<-svyglm(I(afsr>2)~indi+educ+relig+urb+work+insults+hits+touch+sexabuse, des, family="binomial")
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
plot(coef(ex11)[-1], ylim=c(-3, 3), type="l",xaxt="n", ylab="Beta", main=c("Comparison of betas for proportional odds assumption"))
lines(coef(ex22)[-1], col=1, lty=2)
axis(side=1, at=1:13, labels=F)
text(x=1:13, y=-4, srt = 90, pos = 1, xpd = TRUE,cex=.7,
labels = c("No indigenous", "None","Basic","High school","Professional","Other religion", "No religion", "Non urban", "Worked > 17", "No insults", "No hits", "No inapp touching", "No sexual abuse"))
legend("topleft", col=c(1,1),lty=c(1,2), legend=c("> 1", "> 2"))
lines(coef(ord2)[c(-12:-13)], col=4, lwd=3)

#Odds ratios
round(exp(rbind(coef(ex11)[-1], coef(ex22)[-1])),3)
## indi2.no indigenous educ1.none educ2.basic educ4.high school
## [1,] 0.955 0.456 1.415 2.266
## [2,] 0.799 0.287 1.252 2.123
## educ5.professional relig2.other relig3.none urb2.no urban
## [1,] 6.697 1.037 0.678 1.247
## [2,] 5.796 1.343 0.648 1.405
## work2.worked >17 insults2.No_insults hits2.No_hits touch2.No_touching
## [1,] 1.466 1.057 1.365 0.782
## [2,] 1.928 0.996 1.436 0.479
## sexabuse2.No_abuse
## [1,] 5.080
## [2,] 3.197
It seems that when I included the variables of violence in childhood the proportional assumption doesn’t hold. I fitted the model with the non-proportional odds but the AIC was higher than the proportional assumption. I believe I am caught in a dilemma.
library(VGAM)
## Loading required package: stats4
## Loading required package: splines
##
## Attaching package: 'VGAM'
## The following object is masked from 'package:survey':
##
## calibrate
## The following object is masked from 'package:car':
##
## logit
#Non-proportional odds for ord2
cum.vgam.ord2<-vglm(as.ordered(eis)~indi+educ+relig+urb+work+insults+hits+touch+sexabuse, enfadea3, weights = pon_muj>0/mean(pon_muj, na.rm=T), family = cumulative(parallel =F, reverse = T))
summary(cum.vgam.ord2)
##
## Call:
## vglm(formula = as.ordered(eis) ~ indi + educ + relig + urb +
## work + insults + hits + touch + sexabuse, family = cumulative(parallel = F,
## reverse = T), data = enfadea3, weights = pon_muj > 0/mean(pon_muj,
## na.rm = T))
##
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## logit(P[Y>=2]) -699 -0.9208 0.2929 0.4640 4.554e+07
## logit(P[Y>=3]) -49286889 -0.5701 0.4367 0.8112 7.463e+02
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept):1 -1.308499 0.081687 -16.018 < 2e-16 ***
## (Intercept):2 -1.278678 0.080312 -15.921 < 2e-16 ***
## indi2.no indigenous:1 -0.214090 0.025295 -8.464 < 2e-16 ***
## indi2.no indigenous:2 -0.206605 0.025242 -8.185 2.73e-16 ***
## educ1.none:1 -1.276573 0.560298 -2.278 0.022704 *
## educ1.none:2 -0.739398 0.544732 NA NA
## educ2.basic:1 0.065431 0.007062 9.265 < 2e-16 ***
## educ2.basic:2 0.120935 0.004533 26.678 < 2e-16 ***
## educ4.high school:1 0.966234 0.092812 10.411 < 2e-16 ***
## educ4.high school:2 0.750416 0.091593 8.193 2.55e-16 ***
## educ5.professional:1 1.426947 0.091836 15.538 < 2e-16 ***
## educ5.professional:2 1.409451 0.091886 15.339 < 2e-16 ***
## relig2.other:1 -0.041143 0.031068 -1.324 0.185408
## relig2.other:2 0.072853 0.023243 3.134 0.001722 **
## relig3.none:1 -0.651858 0.082476 -7.904 2.71e-15 ***
## relig3.none:2 -0.392144 0.077642 -5.051 4.40e-07 ***
## urb2.no urban:1 0.256943 0.030042 8.553 < 2e-16 ***
## urb2.no urban:2 0.283061 0.028408 9.964 < 2e-16 ***
## work2.worked >17:1 0.224403 0.074132 NA NA
## work2.worked >17:2 0.485538 0.075169 6.459 1.05e-10 ***
## insults2.No_insults:1 0.071844 0.126629 NA NA
## insults2.No_insults:2 0.006314 0.124671 0.051 0.959611
## hits2.No_hits:1 0.486516 0.133182 3.653 0.000259 ***
## hits2.No_hits:2 0.381294 0.130819 2.915 0.003561 **
## touch2.No_touching:1 -0.212641 0.173450 -1.226 0.220216
## touch2.No_touching:2 -0.427436 0.157748 NA NA
## sexabuse2.No_abuse:1 1.146483 0.175105 6.547 5.85e-11 ***
## sexabuse2.No_abuse:2 0.585043 0.172906 3.384 0.000715 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Number of linear predictors: 2
##
## Names of linear predictors: logit(P[Y>=2]), logit(P[Y>=3])
##
## Residual deviance: 4752.131 on 4246 degrees of freedom
##
## Log-likelihood: NA on 4246 degrees of freedom
##
## Number of iterations: 2
##
## Warning: Hauck-Donner effect detected in the following estimate(s):
## '(Intercept):2', 'indi2.no indigenous:2', 'educ1.none:2', 'educ2.basic:1', 'educ4.high school:1', 'educ5.professional:1', 'relig3.none:2', 'urb2.no urban:1', 'work2.worked >17:1', 'insults2.No_insults:1', 'hits2.No_hits:1', 'touch2.No_touching:2', 'sexabuse2.No_abuse:1'
##
## Exponentiated coefficients:
## indi2.no indigenous:1 indi2.no indigenous:2 educ1.none:1
## 0.8072754 0.8133411 0.2789917
## educ1.none:2 educ2.basic:1 educ2.basic:2
## 0.4774014 1.0676187 1.1285515
## educ4.high school:1 educ4.high school:2 educ5.professional:1
## 2.6280281 2.1178804 4.1659599
## educ5.professional:2 relig2.other:1 relig2.other:2
## 4.0937069 0.9596922 1.0755725
## relig3.none:1 relig3.none:2 urb2.no urban:1
## 0.5210768 0.6756071 1.2929710
## urb2.no urban:2 work2.worked >17:1 work2.worked >17:2
## 1.3271863 1.2515749 1.6250483
## insults2.No_insults:1 insults2.No_insults:2 hits2.No_hits:1
## 1.0744878 1.0063335 1.6266393
## hits2.No_hits:2 touch2.No_touching:1 touch2.No_touching:2
## 1.4641776 0.8084463 0.6521790
## sexabuse2.No_abuse:1 sexabuse2.No_abuse:2
## 3.1471059 1.7950675