Problem: Replicate the analysis reported in the bullying example.
Data: Two measures of bullying behavior have been constructed. The first uses self-report on a 9-item Illinois Bully Scale. The second is peer nomination in which children list any person they perceive as a bully. The total number of nominations a person receives is a measure of bullying behavior.
Column 1: Score on the Illinois Bully Scale
Column 2: Total number of peer nominations
# Loading package
pacman::p_load(tidyr, tidyverse, nlme, car, afex, emmeans, lattice, faraway, boot, ggplot2, kableExtra, MASS, pscl, nnet, mgcv,VGAM)
# input data
dta1 <- read.table("bullying.txt", h=T)
## Score Nomination
## 1 1.56 0
## 2 1.56 0
## 3 1.11 0
## 4 1.56 0
## 5 1.22 4
## 6 1.33 0
## 'data.frame': 291 obs. of 2 variables:
## $ Score : num 1.56 1.56 1.11 1.56 1.22 1.33 1.11 1 1.11 1.22 ...
## $ Nomination: int 0 0 0 0 4 0 0 0 0 19 ...
## [1] 0.5635739
## Score Nomination
## 1.658110 2.487973
MOre than half without nominations (0.56).
The mean score of the Illinois Bully Scale is 1.66.
## Score Nomination
## 0.4842037 41.2645100
The variance of score is 0.48.
The variance of nomination is 41.26.
Histogram of number of nomination
# use the Freedman-Diaconis rule for bin width
bd <- 2*IQR(dta1$Nomination)/(dim(dta1)[1]^(1/3))
# theme set
ot <- theme_set(theme_bw())
# histogram of nomination
ggplot(dta1, aes(Nomination)) +
stat_bin(binwidth = bd) +
labs(x = "Number of nominations", y = "Count")
It seems most of the nomination number is located 0 ~ 10.
# Poisson fit
ggplot(dta1, aes(Score, Nomination)) +
geom_jitter(alpha = 0.2) +
stat_smooth(method = "glm", method.args = list(family = poisson)) +
labs(y = "Number of peer nomination", x = "Bully score")
## `geom_smooth()` using formula 'y ~ x'
Most of the subjects without nomination but within them, some of them is hidden bully (0 nomination with high bully score). Higher bully score more likely to be reported as bully by peer.
##
## Call:
## glm(formula = Nomination ~ Score, family = poisson, data = dta1)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -6.4995 -1.8246 -1.5952 -0.1552 11.6806
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.66308 0.08871 -7.475 7.75e-14 ***
## Score 0.81434 0.03504 23.239 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 2205.2 on 290 degrees of freedom
## Residual deviance: 1774.6 on 289 degrees of freedom
## AIC: 2160.3
##
## Number of Fisher Scoring iterations: 6
 | Nomination | ||
---|---|---|---|
Predictors | Incidence Rate Ratios | CI | p |
(Intercept) | 0.52 | 0.43 – 0.61 | <0.001 |
Score | 2.26 | 2.11 – 2.42 | <0.001 |
For those who with score increasing a unit, the count of nomination will be 2.26 times.
##
## Call:
## zeroinfl(formula = Nomination ~ Score | Score, data = dta1)
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -2.37068 -0.68873 -0.59685 -0.05357 9.87491
##
## Count model coefficients (poisson with log link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.49604 0.09987 4.967 6.81e-07 ***
## Score 0.59609 0.03942 15.120 < 2e-16 ***
##
## Zero-inflation model coefficients (binomial with logit link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.4967 0.3463 4.322 1.54e-05 ***
## Score -0.7716 0.1967 -3.924 8.72e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Number of iterations in BFGS optimization: 8
## Log-likelihood: -780.6 on 4 Df
 | Nomination | |||
---|---|---|---|---|
Predictors | Incidence Rate Ratios | CI | p | |
Count Model | ||||
(Intercept) | 1.64 | 1.35 – 2.00 | <0.001 | |
Score | 1.82 | 1.68 – 1.96 | <0.001 | |
Zero-Inflated Model | ||||
(Intercept) | 4.47 | 2.27 – 8.81 | <0.001 | |
Score | 0.46 | 0.31 – 0.68 | <0.001 |
The probability of nomination decreases with score by a rate of between 0.31 to 0.69 per score.
# zero-inflated negative binomial
m2 <- zeroinfl(Nomination ~ Score | Score,
data = dta1,
dist = "negbin")
#, EM = TRUE
summary(m2)
##
## Call:
## zeroinfl(formula = Nomination ~ Score | Score, data = dta1, dist = "negbin")
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -0.58338 -0.48698 -0.38705 0.02297 6.96982
##
## Count model coefficients (negbin with log link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.6737 0.4354 -1.547 0.122
## Score 0.8819 0.2052 4.297 1.73e-05 ***
## Log(theta) -1.0658 0.1792 -5.947 2.73e-09 ***
##
## Zero-inflation model coefficients (binomial with logit link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.954 2.484 1.189 0.234
## Score -3.419 2.260 -1.513 0.130
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Theta = 0.3445
## Number of iterations in BFGS optimization: 21
## Log-likelihood: -494.7 on 5 Df
 | Nomination | |||
---|---|---|---|---|
Predictors | Incidence Rate Ratios | CI | p | |
Count Model | ||||
(Intercept) | 0.51 | 0.22 – 1.20 | 0.122 | |
Score | 2.42 | 1.62 – 3.61 | <0.001 | |
Zero-Inflated Model | ||||
(Intercept) | 19.18 | 0.15 – 2494.37 | 0.234 | |
Score | 0.03 | 0.00 – 2.75 | 0.130 |
The probability of nomination decreases with score by a rate of between 0 to 2.75 per score. (?)
 | Nomination | Nomination | Nomination | ||||||
---|---|---|---|---|---|---|---|---|---|
Predictors | Incidence Rate Ratios | CI | p | Incidence Rate Ratios | CI | p | Incidence Rate Ratios | CI | p |
(Intercept) | 0.52 | 0.43 – 0.61 | <0.001 | ||||||
Score | 2.26 | 2.11 – 2.42 | <0.001 | ||||||
(Intercept) | 1.64 | 1.35 – 2.00 | <0.001 | 0.51 | 0.22 – 1.20 | 0.122 | |||
Score | 1.82 | 1.68 – 1.96 | <0.001 | 2.42 | 1.62 – 3.61 | <0.001 | |||
Zero-Inflated Model | |||||||||
(Intercept) | 4.47 | 2.27 – 8.81 | <0.001 | 19.18 | 0.15 – 2494.37 | 0.234 | |||
Score | 0.46 | 0.31 – 0.68 | <0.001 | 0.03 | 0.00 – 2.75 | 0.130 |
## Vuong Non-Nested Hypothesis Test-Statistic:
## (test-statistic is asymptotically distributed N(0,1) under the
## null that the models are indistinguishible)
## -------------------------------------------------------------
## Vuong z-statistic H_A p-value
## Raw -4.891816 model2 > model1 4.9955e-07
## AIC-corrected -4.858930 model2 > model1 5.9011e-07
## BIC-corrected -4.798531 model2 > model1 7.9917e-07
## Vuong Non-Nested Hypothesis Test-Statistic:
## (test-statistic is asymptotically distributed N(0,1) under the
## null that the models are indistinguishible)
## -------------------------------------------------------------
## Vuong z-statistic H_A p-value
## Raw -4.242754 model2 > model1 1.104e-05
## AIC-corrected -4.242754 model2 > model1 1.104e-05
## BIC-corrected -4.242754 model2 > model1 1.104e-05
The m2 (Zero-inflated negative binomial) tend to be the more suitable model
Poisson vs. Negative Binomial
# fortify data with fitted values and residuals
dta1_m <- data.frame(dta1, fit1 = predict(m1, type = "response"),
fit2 = predict(m2, type = "response"),
r1 = resid(m1, type="pearson"),
r2 = resid(m2, type="pearson"))
# residual plots
ggplot(dta1_m, aes(fit1, r1)) +
geom_point(pch=1) +
geom_point(aes(fit2, r2), pch=3) +
geom_hline(yintercept = 0) +
labs(x = "Fitted values", y = "Residuals",
title = "Poisson vs. Negative Binomial")
Symbol o: m1, +: m2
The residual value of m2 compare to m1 is more close to the 0.
# model fit over observations
ggplot(dta1_m2, aes(Score, Nomination)) +
geom_point(pch = 20, alpha = .2) +
geom_line(aes(Score, yhat), col = "magenta", lwd=rel(1)) +
stat_smooth(method = "glm", method.args = list(family = poisson)) +
labs(x = "Bully Score", y = "Predicted number of nomination")
## `geom_smooth()` using formula 'y ~ x'
The estimated value of magenta (negative binomial) line is higher than blue (Possion).
Source: Espelage, D.L., Holt, M.K., & Henkel, R.R. (2003). Examination of Peer-Group Contextual Effects on Aggression During Early Adolescence. Child Development, 74, 205-220.
Problem: Complete the analysis reported in the youth employment example by improving on the R script and adding comments and interpretation.
Data: A sample of 978 young men aged 20 to 22 years was extracted from the National Longitudinal Survey of Youth (NLSY). The outcome of interest is whether a youth’s reported major activity in 1985 was (1) working, (2) in school, or (3) inactive. Six additional variables were also recorded.
Column 1: Employment status: School = In school, Work = Working, Idle = Inactive
Column 2: Ethnicity, Black or Others
Column 3: Family structure, Non-intact or Otherwise
Column 4: Father’s education, College or Otherwise
Column 5: Family income in 1979 in thousands
Column 6: Local unemployment rate in 1980
Column 7: Standardized score on the Armed Services Aptitude Battery Test
## Employment Race Family DadEdu Income Local ASAB
## 1 Work Others Non-Intact Otherwise 0.4909 6.9 -0.110
## 2 School Others Non-Intact Otherwise 0.6940 4.8 0.452
## 3 Work Others Otherwise Otherwise 1.1000 6.5 0.967
## 4 Work Others Non-Intact College 1.5000 3.8 1.667
## 5 Work Others Non-Intact Otherwise 0.2544 6.9 0.000
## 6 School Others Non-Intact Otherwise 0.9391 5.4 0.000
## 'data.frame': 978 obs. of 7 variables:
## $ Employment: chr "Work" "School" "Work" "Work" ...
## $ Race : chr "Others" "Others" "Others" "Others" ...
## $ Family : chr "Non-Intact" "Non-Intact" "Otherwise" "Non-Intact" ...
## $ DadEdu : chr "Otherwise" "Otherwise" "Otherwise" "College" ...
## $ Income : num 0.491 0.694 1.1 1.5 0.254 ...
## $ Local : num 6.9 4.8 6.5 3.8 6.9 5.4 4.2 6.2 6.2 6.9 ...
## $ ASAB : num -0.11 0.452 0.967 1.667 0 ...
# construct a table of frequencies
# get totals for Idle+School and Idle+Work
dta2w <- data.frame(with(dta2, table(Employment, Race, Family, DadEdu))) %>%
pivot_wider(names_from=Employment, values_from=Freq) %>%
mutate(nsi=Idle+School, nwi=Idle+Work) %>%
as.data.frame
# wide to long format
dta2l <- dta2w %>% reshape(direction='long',
varying=c('School', 'nsi', 'Work', 'nwi'),
timevar='Status',
times=c('School', 'Work'),
v.names=c('freq', 'n'))
ggplot(dta2L, aes(DadEdu, mean, color=Status, group=Status)) +
geom_pointrange(aes(ymin=lower, ymax=upper), position=position_dodge(width=.2))+
geom_line(position = position_dodge(width=.2))+
facet_grid(Family ~ Race)+
geom_hline(yintercept = 0, linetype = 3) +
scale_y_continuous(limits=c(-.5,1), breaks=seq(-.5, 1, by=.5))+
labs(x="Father's education",
y="Probability of being at school or at work(Ref=Idle)")+
theme_minimal()+
theme(legend.position=c(.9,.1))
# Multinomial regression
## # weights: 24 (14 variable)
## initial value 1074.442818
## iter 10 value 1021.501931
## iter 20 value 1017.247769
## iter 20 value 1017.247768
## iter 20 value 1017.247768
## final value 1017.247768
## converged
## Call:
## nnet::multinom(formula = Employment ~ ., data = dta2)
##
## Coefficients:
## (Intercept) RaceBlack FamilyOtherwise DadEduOtherwise Income
## School 0.05285612 0.2289663 0.5474412 -0.2413852 0.2683452
## Work 0.77086583 -0.4440719 0.1343576 -0.1796897 0.4071225
## Local ASAB
## School 0.01247790 0.1765461
## Work -0.07114791 0.3078798
##
## Std. Errors:
## (Intercept) RaceBlack FamilyOtherwise DadEduOtherwise Income
## School 0.4104362 0.1963012 0.1861409 0.2354813 0.2090957
## Work 0.4221547 0.2185806 0.1922746 0.2411005 0.2109821
## Local ASAB
## School 0.03460183 0.1065025
## Work 0.03738945 0.1102139
##
## Residual Deviance: 2034.496
## AIC: 2062.496
dta2$ID <- paste(1:978)
dta2$Employment <- relevel(factor(dta2$Employment), ref="Idle")
dta2$Race <- relevel(factor(dta2$Race), ref="Others", levels=c("Black", "Others"))
dta2$Family <- relevel(factor(dta2$Family), ref="Otherwise")
dta2$DadEdu <- relevel(factor(dta2$DadEdu), ref="Otherwise")
dta2 <- dta2 %>%
mutate(Incf = factor(cut(dta2$Income,
breaks=c(quantile(dta2$Income,
probs=seq(0, 1, by=.33))),
labels=c("Low", "Middle","High"), ordered=T)),
ASABf = factor(cut(dta2$ASAB,
breaks=c(quantile(dta2$ASAB,
probs=seq(0, 1, by=.33))),
labels=c("Low", "Middle","High"), ordered=T)),
Localf = factor(cut(dta2$Local,
breaks=c(quantile(dta2$Local,
probs=seq(0, 1, by=.5))),
labels=c("Low","High"), ordered=T,
lowest.inclued=T)),
Work = if_else(dta2$Employment == "Work",1,0),
School = if_else(dta2$Employment == "School",1,0),
Idle = if_else(dta2$Employment == "Idle",1,0))
str(dta2)
## 'data.frame': 978 obs. of 14 variables:
## $ Employment: Factor w/ 3 levels "Idle","School",..: 3 2 3 3 3 2 3 2 2 3 ...
## $ Race : Factor w/ 2 levels "Others","Black": 1 1 1 1 1 1 1 1 1 1 ...
## $ Family : Factor w/ 2 levels "Otherwise","Non-Intact": 2 2 1 2 2 2 1 1 2 1 ...
## $ DadEdu : Factor w/ 2 levels "Otherwise","College": 1 1 1 2 1 1 1 2 2 2 ...
## $ Income : num 0.491 0.694 1.1 1.5 0.254 ...
## $ Local : num 6.9 4.8 6.5 3.8 6.9 5.4 4.2 6.2 6.2 6.9 ...
## $ ASAB : num -0.11 0.452 0.967 1.667 0 ...
## $ ID : chr "1" "2" "3" "4" ...
## $ Incf : Ord.factor w/ 3 levels "Low"<"Middle"<..: 2 2 3 3 1 3 3 3 3 3 ...
## $ ASABf : Ord.factor w/ 3 levels "Low"<"Middle"<..: 2 3 3 NA 2 2 2 3 3 3 ...
## $ Localf : Ord.factor w/ 2 levels "Low"<"High": 1 1 1 1 1 1 1 1 1 1 ...
## $ Work : num 1 0 1 1 1 0 1 0 0 1 ...
## $ School : num 0 1 0 0 0 1 0 1 1 0 ...
## $ Idle : num 0 0 0 0 0 0 0 0 0 0 ...
#aggregate(cbind(Idle, School, Work) ~ Race + Family + DadEdu, data=dta2, FUN = sum, drop = T)
with(dta2, ftable(Race, Family, DadEdu, Employment))
## Employment Idle School Work
## Race Family DadEdu
## Others Otherwise Otherwise 84 161 141
## College 23 66 72
## Non-Intact Otherwise 47 43 54
## College 5 12 18
## Black Otherwise Otherwise 28 60 26
## College 6 8 3
## Non-Intact Otherwise 39 40 27
## College 3 10 2
#aggregate(cbind(Idle, School, Work) ~ Incf + ASABf + Localf, data = dta2, FUN = sum, drop = T)
with(dta2, ftable(Incf, ASABf, Localf, Employment))
## Employment Idle School Work
## Incf ASABf Localf
## Low Low Low 26 39 19
## High 42 34 19
## Middle Low 21 22 15
## High 11 26 10
## High Low 3 13 8
## High 4 5 3
## Middle Low Low 13 20 24
## High 11 24 10
## Middle Low 22 28 36
## High 6 16 13
## High Low 9 22 29
## High 9 19 12
## High Low Low 6 9 10
## High 4 6 5
## Middle Low 18 23 16
## High 4 16 15
## High Low 15 40 59
## High 11 26 27
b <- as.data.frame(with(dta2, ftable(Incf, ASABf, Localf, Employment)))
dta2_fn <- subset(dta2, Employment!="School")%>%
mutate(resp = ifelse(Employment == "Work", 1,0))
dta2_pn <- subset(dta2, Employment!="Work") %>%
mutate(resp = ifelse(Employment == "School",1,0))
m_fn <- glm(resp ~ Race+Family+DadEdu+Income+Local+ASAB,
data=dta2_fn, family=binomial)
m_pn <- glm(resp ~ Race+Family+DadEdu+Income+Local+ASAB,
data=dta2_pn, family=binomial)
sjPlot::tab_model(m_fn, m_pn, show.obs = F, show.r2 = F)
 | resp | resp | ||||
---|---|---|---|---|---|---|
Predictors | Odds Ratios | CI | p | Odds Ratios | CI | p |
(Intercept) | 2.02 | 1.00 – 4.10 | 0.051 | 1.52 | 0.78 – 2.94 | 0.218 |
Race [Black] | 0.64 | 0.41 – 0.99 | 0.046 | 1.25 | 0.85 – 1.84 | 0.265 |
Family [Non-Intact] | 0.90 | 0.62 – 1.33 | 0.609 | 0.57 | 0.40 – 0.82 | 0.003 |
DadEdu [College] | 1.19 | 0.74 – 1.92 | 0.471 | 1.28 | 0.81 – 2.04 | 0.303 |
Income | 1.72 | 1.10 – 2.73 | 0.019 | 1.22 | 0.84 – 1.81 | 0.316 |
Local | 0.92 | 0.85 – 0.99 | 0.025 | 1.01 | 0.95 – 1.09 | 0.708 |
ASAB | 1.34 | 1.07 – 1.67 | 0.010 | 1.22 | 0.99 – 1.51 | 0.067 |
ggplot(subset(a, Employment!="Idle"),
aes(DadEdu, Freq/cumsum(Freq), color=Employment))+
geom_line()+
stat_smooth(aes(DadEdu, Freq/cumsum(Freq)),
method = "glm", method.args = list(family=binomial))+
geom_point()+
facet_grid(Family ~ Race)+
geom_hline(yintercept = 0, linetype = 3) +
scale_y_continuous(limits=c(-1,1), breaks=seq(-1, 1, by=.5))+
labs(x="Father's education", y="Log-odds(Inactive)")
## `geom_smooth()` using formula 'y ~ x'
## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?
## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?
## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?
## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?
ggplot(subset(b, Employment!="Idle"), aes(ASABf, Freq/cumsum(Freq), color=Employment))+
geom_line()+
stat_smooth(method = "glm", method.args = list(family=binomial))+
geom_point()+
facet_wrap(Incf~Localf, nrow=2)+
geom_hline(yintercept = 0, linetype = 3) +
scale_y_continuous(limits=c(-1,1), breaks=seq(-1, 1, by=.5))+
labs(x="Father's education", y="Log-odds(Inactive)")
## `geom_smooth()` using formula 'y ~ x'
## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?
## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?
## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?
## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?
## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?
## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?
Source: Powers, D.A., & Xie, Y. (2000). Statistical Methods for Categorical Data Analysis. p. 232.
Problem: Reproduce the results of analysis reported in the mental impairment and SES example by adapting the R script in the lecture note and from the rating experiment example.
Data: The mental health status of a sample of 1,660 young New York residents in midtown Manhattan was classified by their parents’ socioeconomic status (A=High, F=Low).
Column 1: Subject ID
Column 2: SES, numerical code A=6, … , F=1
Column 3: SES, reference code for A
Column 4: SES, reference code for B
Column 5: SES, reference code for C
Column 6: SES, reference code for D
Column 7: SES, reference code for E
Column 9: Mental impairment, well=1, mild=2, moderate=3, impaired=4
## Id ses A B C D E mental
## 1 1 6 1 0 0 0 0 1
## 2 2 6 1 0 0 0 0 1
## 3 3 6 1 0 0 0 0 1
## 4 4 6 1 0 0 0 0 1
## 5 5 6 1 0 0 0 0 1
## 6 6 6 1 0 0 0 0 1
## 'data.frame': 1660 obs. of 8 variables:
## $ Id : int 1 2 3 4 5 6 7 8 9 10 ...
## $ ses : int 6 6 6 6 6 6 6 6 6 6 ...
## $ A : int 1 1 1 1 1 1 1 1 1 1 ...
## $ B : int 0 0 0 0 0 0 0 0 0 0 ...
## $ C : int 0 0 0 0 0 0 0 0 0 0 ...
## $ D : int 0 0 0 0 0 0 0 0 0 0 ...
## $ E : int 0 0 0 0 0 0 0 0 0 0 ...
## $ mental: int 1 1 1 1 1 1 1 1 1 1 ...
dta3_f <- dta3 %>%
mutate(mental=dplyr::recode(mental, 'Well', 'Mild', 'Moderate', 'Impaired')) %>%
mutate(mental=factor(mental, levels=c('Well','Mild','Moderate','Impaired')),
ses=factor(ses, levels=c('1', '2', '3', '4', '5', '6'),
labels = c('F', 'E', 'D', 'C', 'B', 'A')))
##
## Call:
## vglm(formula = ordered(mental) ~ ses, family = cumulative(parallel = TRUE),
## data = dta3_f)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept):1 -2.0278 0.1346 -15.070 < 2e-16 ***
## (Intercept):2 -0.3285 0.1249 -2.630 0.008531 **
## (Intercept):3 0.6802 0.1260 5.398 6.72e-08 ***
## sesE 0.2571 0.1654 1.555 0.120024
## sesD 0.5248 0.1538 3.413 0.000643 ***
## sesC 0.6157 0.1630 3.776 0.000159 ***
## sesB 0.8408 0.1696 4.958 7.13e-07 ***
## sesA 0.8238 0.1669 4.935 8.03e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Names of linear predictors: logitlink(P[Y<=1]), logitlink(P[Y<=2]),
## logitlink(P[Y<=3])
##
## Residual deviance: 4449.382 on 4972 degrees of freedom
##
## Log-likelihood: -2224.691 on 4972 degrees of freedom
##
## Number of Fisher scoring iterations: 4
##
## No Hauck-Donner effect found in any of the estimates
##
##
## Exponentiated coefficients:
## sesE sesD sesC sesB sesA
## 1.293161 1.690184 1.850902 2.318250 2.279240
## Likelihood ratio test
##
## Model 1: ordered(mental) ~ ses
## Model 2: ordered(mental) ~ ses
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 4972 -2224.7
## 2 4962 -2220.8 -10 7.8272 0.6457
no different between m0 (parallel) and m1.
## mental Well Mild Moderate Impaired
## ses
## F 21 71 54 71
## E 36 97 54 78
## D 72 141 77 94
## C 57 105 65 60
## B 57 94 54 40
## A 64 94 58 46
dta3_f2 <- dta3_f %>% count(ses, mental, sort = F)
m <- as.data.frame(matrix(dta3_f2$n, 4, 6))
names(m) <- c('F', 'E', 'D', 'C', 'B', 'A')
rownames(m) <- c("Well", "Mild", "Moderate", "Impaired")
print(m)
## F E D C B A
## Well 21 36 72 57 57 64
## Mild 71 97 141 105 94 94
## Moderate 54 54 77 65 54 58
## Impaired 71 78 94 60 40 46
dta3_f2$ses <- relevel(factor(dta3_f2$ses), ref="F")
dta3_f2$mental <- relevel(factor(dta3_f2$mental), ref="Well")
str(dta3_f2)
## 'data.frame': 24 obs. of 3 variables:
## $ ses : Factor w/ 6 levels "F","E","D","C",..: 1 1 1 1 2 2 2 2 3 3 ...
## $ mental: Factor w/ 4 levels "Well","Mild",..: 1 2 3 4 1 2 3 4 1 2 ...
## $ n : int 21 71 54 71 36 97 54 78 72 141 ...
## Call:
## polr(formula = mental ~ ses, data = dta3_f2, weights = n, Hess = TRUE,
## method = "probit")
##
## Coefficients:
## Value Std. Error t value
## sesE -0.1542 0.09886 -1.560
## sesD -0.3187 0.09172 -3.475
## sesC -0.3783 0.09700 -3.901
## sesB -0.5143 0.10058 -5.113
## sesA -0.5050 0.09912 -5.095
##
## Intercepts:
## Value Std. Error t value
## Well|Mild -1.2289 0.0788 -15.5918
## Mild|Moderate -0.1984 0.0750 -2.6455
## Moderate|Impaired 0.4151 0.0755 5.4963
##
## Residual Deviance: 4447.146
## AIC: 4463.146
## Waiting for profiling to be done...
## 2.5 % 97.5 %
## sesE -0.3480252 0.03949501
## sesD -0.4986156 -0.13907814
## sesC -0.5685622 -0.18833252
## sesB -0.7115325 -0.31726724
## sesA -0.6993807 -0.31083041
Source: Srole, L. & Fisher, A.K. (1978). Mental health in the metropolis. The midtown Manhattan study. Reported in Agresti, A. (1989). Tutorial on modeling ordered categorical response data. Psychological Bulletin, 105, 290-301.