Problem: Fit a regression model for parents and another for teachers, separately, and compare the results with that from the multiple sources approach for the Connecticut child surveys example.
Data: The data are obtained from two surveys of children’s mental health. The data set consists of 1,428 children who had both parent and teacher responses, and an additional 1,073 children with only a parent response. In the psychiatric literature, these sources of respondents are often referred to as “informants”. A standardized measure of childhood psychopathology was used both by parents (Child Behavior Checklist, CBC) and teachers (Teacher Report Forms, TRF) to assess children in the study. The response variable is derived from the externalizing scale, which assesses delinquent and aggressive behavior. The scale has been dichotomized at the cutpoint for borderline/clinical psychopathology.
Column 1: Child ID
Column 2: Child’s physical health problems: Fair = Fair to bad , Good
Column 3: Single parent status: Single, Otherwise
Column 4: Externalizing behavior (Parent report)
Column 5: Externalizing behavior (Teacher report)
# Loading package
pacman::p_load(tidyr, tidyverse, nlme, car, afex, emmeans, lattice, faraway, boot, ggplot2, kableExtra, MASS, pscl, nnet, mgcv, VGAM, geepack)
# input data
dta1 <- read.table("ccs-data.txt", h=F, na.strings='.')
# rename
colnames(dta1) <- c("ID",
"Phyhealth",
"ParentSingle",
"Parent",
"Teacher")
dta1 <- dta1 %>%
mutate(Phyhealth = factor(ifelse(Phyhealth == 1, "Bad", "Good")),
ParentSingle = factor(ifelse(ParentSingle == 1, "Y", "N")),
Parent = as.numeric(Parent),
Teacher = as.numeric(Teacher))
head(dta1)
## ID Phyhealth ParentSingle Parent Teacher
## 1 1 Bad Y 0 NA
## 2 2 Good N 1 NA
## 3 3 Bad N 0 NA
## 4 4 Bad Y 1 NA
## 5 5 Bad N 0 NA
## 6 6 Bad N 0 NA
## 'data.frame': 2501 obs. of 5 variables:
## $ ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Phyhealth : Factor w/ 2 levels "Bad","Good": 1 2 1 1 1 1 1 2 2 2 ...
## $ ParentSingle: Factor w/ 2 levels "N","Y": 2 1 1 2 1 1 1 1 2 1 ...
## $ Parent : num 0 1 0 1 0 0 0 0 0 0 ...
## $ Teacher : num NA NA NA NA NA NA NA 0 NA 0 ...
## ID Phyhealth ParentSingle Informant Report
## 1 1 Bad Y Parent 0
## 2 2 Good N Parent 1
## 3 3 Bad N Parent 0
## 4 4 Bad Y Parent 1
## 5 5 Bad N Parent 0
## 6 6 Bad N Parent 0
## 'data.frame': 5002 obs. of 5 variables:
## $ ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Phyhealth : Factor w/ 2 levels "Bad","Good": 1 2 1 1 1 1 1 2 2 2 ...
## $ ParentSingle: Factor w/ 2 levels "N","Y": 2 1 1 2 1 1 1 1 2 1 ...
## $ Informant : Factor w/ 2 levels "Parent","Teacher": 1 1 1 1 1 1 1 1 1 1 ...
## $ Report : num 0 1 0 1 0 0 0 0 0 0 ...
## Teacher 0 1
## Parent
## 0 1030 165
## 1 129 104
Teacher more likely to report the student with delinquent and aggressive behavior.
## Parent Teacher
## Parent 1.0000000 0.2913296
## Teacher 0.2913296 1.0000000
Correlation = 0.29 (low level)
## [1] 0.4290284
Percent missing = 42.9%
sjPlot::tab_model(m1 <- geeglm(Report ~ Informant + Phyhealth + ParentSingle + Informant:Phyhealth, data=dta1L, id = ID, family = binomial, corstr = "exchangeable"), show.obs=F, show.ngroups = F)
Report | |||
---|---|---|---|
Predictors | Odds Ratios | CI | p |
(Intercept) | 0.21 | 0.18 – 0.25 | <0.001 |
Informant Teacher | 1.05 | 0.83 – 1.33 | 0.699 |
Phyhealth [Good] | 0.55 | 0.44 – 0.68 | <0.001 |
ParentSingle [Y] | 1.88 | 1.56 – 2.28 | <0.001 |
Informant Teacher * Phyhealth [Good] |
1.53 | 1.08 – 2.16 | 0.016 |
## Estimate Std.err
## alpha 0 0
Why zero ?
sjPlot::tab_model(mt <- geeglm(Report ~ Phyhealth + ParentSingle, subset = Informant=="Teacher", data=dta1L, id = ID, family = binomial, corstr = "exchangeable"), show.obs=F, show.ngroups = F)
Report | |||
---|---|---|---|
Predictors | Odds Ratios | CI | p |
(Intercept) | 0.22 | 0.18 – 0.27 | <0.001 |
Phyhealth [Good] | 0.84 | 0.64 – 1.10 | 0.201 |
ParentSingle [Y] | 1.92 | 1.42 – 2.62 | <0.001 |
sjPlot::tab_model(mp <- geeglm(Report ~ Phyhealth + ParentSingle, subset = Informant=="Parent", data=dta1L, id = ID, family = binomial, corstr = "exchangeable"), show.obs=F, show.ngroups = F)
Report | |||
---|---|---|---|
Predictors | Odds Ratios | CI | p |
(Intercept) | 0.21 | 0.18 – 0.25 | <0.001 |
Phyhealth [Good] | 0.55 | 0.44 – 0.68 | <0.001 |
ParentSingle [Y] | 1.86 | 1.46 – 2.37 | <0.001 |
Report | Report | Report | |||||||
---|---|---|---|---|---|---|---|---|---|
Predictors | Odds Ratios | CI | p | Odds Ratios | CI | p | Odds Ratios | CI | p |
(Intercept) | 0.21 | 0.18 – 0.25 | <0.001 | 0.22 | 0.18 – 0.27 | <0.001 | 0.21 | 0.18 – 0.25 | <0.001 |
Informant Teacher | 1.05 | 0.83 – 1.33 | 0.699 | ||||||
Phyhealth [Good] | 0.55 | 0.44 – 0.68 | <0.001 | 0.84 | 0.64 – 1.10 | 0.201 | 0.55 | 0.44 – 0.68 | <0.001 |
ParentSingle [Y] | 1.88 | 1.56 – 2.28 | <0.001 | 1.92 | 1.42 – 2.62 | <0.001 | 1.86 | 1.46 – 2.37 | <0.001 |
Informant Teacher * Phyhealth [Good] |
1.53 | 1.08 – 2.16 | 0.016 | ||||||
N | 2501 ID | 1428 ID | 2501 ID | ||||||
Observations | 3929 | 1428 | 2501 |
dta1L <- data.frame(dta1L, id=row.names(dta1L))
yhat_m1 <- data.frame(id=row.names(fitted(m1)), phat=fitted(m1))
dta1_m1 <- inner_join(dta1L, yhat_m1, by="id")
ggplot(dta1_m1, aes(Phyhealth, Report, color = Informant)) +
geom_point(aes(Phyhealth, phat),
position = position_dodge(width=.2),
size = rel(3), pch = 1) +
geom_hline(yintercept = c(mean(dta1$Parent, na.rm=T),
mean(dta1$Teacher, na.rm=T)),
linetype = "dotted")+
stat_summary(fun.data = "mean_cl_boot",
position = position_dodge(width=.2)) +
facet_wrap( ~ ParentSingle)+
labs(x = "Physical health",
y = "Probability of report") +
guides(color = guide_legend(reverse=T))+
theme_minimal()+
theme(legend.position = c(.1, .9))
The figure is same as lecture note (page 16), but not same as output figure in the Connecticut child surveys example.
Source: Zahner, G.E.P., Jacobs, J.H., Freeman, D.H. & Trainor, K.F. (1993). Rural-urban child psychopathology in a northeastern U.S. State: 1986-1989. Journal of the American Academy of Child and Adolescent Psychiatry, 32, 378-387. Reported in Fitzmaurice, G.M., Laird, N.M., & Ware, J.H. (2004). Applied Longitudinal Analysis. p. 434-439.
Problem: Replicate the results of analysis reported in the insomnia example.
Data: The data come from a randomized, double-blind clinical trial comparing active hypnotic drug with a placebo in patients who have insomnia problems. The response is the patient’s reported time (in minutes) to fall asleep after going to bed. Patients responded before and after a two-week treatment period. The patients receiving the two treatments were independent samples.
Column 1: Subject ID
Column 2: Treatment, Active = 1, Placebo = 0
Column 3: Time, Initial = 0, Follow-up = 1
Column 4: Time to sleep, < 20 min. = 1, 20-30 = 2, 30-60 = 3, > 60 min. = 4
# input data
dta2 <- read.table("insomnia.txt", h=T)
# rename
colnames(dta2) <- c("ID", "Treatment", "Time", "Sleeplatency")
head(dta2)
## ID Treatment Time Sleeplatency
## 1 1 1 0 1
## 2 1 1 1 1
## 3 2 1 0 1
## 4 2 1 1 1
## 5 3 1 0 1
## 6 3 1 1 1
## 'data.frame': 478 obs. of 4 variables:
## $ ID : int 1 1 2 2 3 3 4 4 5 5 ...
## $ Treatment : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Time : int 0 1 0 1 0 1 0 1 0 1 ...
## $ Sleeplatency: int 1 1 1 1 1 1 1 1 1 1 ...
# repolr [data = as.data.frame()]
pacman::p_load(repolr)
# repolr
summary(dta2_repolr <- repolr(Sleeplatency ~ Treatment + Time + Treatment:Time,
subjects = "ID",
data = as.data.frame(dta2),
times = c(1,2),
categories = 4,
corr.mod = "independence",
po.test=TRUE,))
##
## repolr: 2016-02-26 version 3.4
##
## Call:
## repolr(formula = Sleeplatency ~ Treatment + Time + Treatment:Time,
## subjects = "ID", data = as.data.frame(dta2), times = c(1,
## 2), categories = 4, corr.mod = "independence", po.test = TRUE)
##
## Coefficients:
## coeff se.robust z.robust p.value
## cuts1|2 -2.2671 0.2091 -10.8422 0.0000
## cuts2|3 -0.9515 0.1769 -5.3787 0.0000
## cuts3|4 0.3517 0.1751 2.0086 0.0446
## Treatment 0.0336 0.2369 0.1418 0.8872
## Time 1.0381 0.1564 6.6375 0.0000
## Treatment:Time 0.7077 0.2458 2.8792 0.0040
##
## Correlation Structure: independence
## Fixed Correlation: 0
## PO Score Test: 8.6394 (d.f. = 6 and p.value = 0.1949)
# polr
sjPlot::tab_model(m0 <- MASS::polr(ordered(Sleeplatency) ~ Treatment + Time +
Treatment:Time, data=dta2, Hess = T),
show.obs=F, show.r2=F, show.se=T, transform=NULL)
ordered(Sleeplatency) | ||||
---|---|---|---|---|
Predictors | Log-Odds | std. Error | CI | p |
1|2 | -2.27 | 0.20 | -2.73 – -1.80 | <0.001 |
2|3 | -0.95 | 0.18 | -1.43 – -0.48 | <0.001 |
3|4 | 0.35 | 0.17 | -0.30 – 1.01 | 0.045 |
Treatment | -0.03 | 0.24 | -0.50 – 0.43 | 0.888 |
Time | -1.04 | 0.24 | -1.51 – -0.57 | <0.001 |
Treatment * Time | -0.71 | 0.33 | -1.36 – -0.05 | 0.035 |
Time effect
1.04 for placebo, 1.04 + 0.71 = 1.75 for treatment.
Odds ratios: exp(1.04) = 2.83 for placebo, exp(1.75) = 5.75 for treatment
Treatment effect
exp(0.03) = 1.03 initially,
exp(0.03 + 0.71) = 2.1 at follow-up.
Model: logit(P{Yt <= j}) = αk + β1 time + β2 trt + β3 time × trt,
j = 1, 2, 3, 4; k = j - 1 for j > 1.
## Length Class Mode
## 0 NULL NULL
# hand-made cumulative prop. against time to sleep
dta2 <- dta2 %>%
mutate(Treatment = factor(Treatment,
levels=0:1,
labels=c("Placebo","Active")),
Time = factor(Time,
levels=0:1,
labels=c("Initial","Follow-up")),
Sleeplatency = factor(Sleeplatency))
dta2.tbl <- ftable(dta2$Treatment, dta2$Time, dta2$Sleeplatency)
dta2.tbl
## 1 2 3 4
##
## Placebo Initial 14 20 35 51
## Follow-up 31 29 35 25
## Active Initial 12 20 40 47
## Follow-up 40 49 19 11
The sleep latency time in both group are improved. However, the hypnotic drug group more likely better than Placebo group.
# prop.table: matrix 1 indicates rows
# t: matrix transpose()
obsp <- as.numeric(t(prop.table(dta2.tbl, m=1)))
# Cumulative Sums
clp <- c(cumsum(obsp[1:4]),
cumsum(obsp[5:8]),
cumsum(obsp[9:12]),
cumsum(obsp[13:16]))
yt <- rep(c(20,25,45,60),4)
y <- data.frame(time2sleep=yt, clprop=clp)
plot(y[,1], y[,2], type="n",
xlab="Time to Sleep (min.)", ylab="Cumulative proportion")
lines(y[1:4, 1], y[1:4,2], lty=1, col="steelblue")
lines(y[5:8, 1], y[5:8,2], lty=1, col="steelblue")
lines(y[9:12, 1], y[9:12,2], lty=2, col="indianred")
lines(y[13:16,1], y[13:16,2], lty=2, col="indianred")
text(22, .12, "T,I")
text(22, .47, "T,F")
text(22, .20, "P,I")
text(22, .33, "P,F")
grid()
The active drug group and the placebo groups initially fell asleep at about the same probability. Two weeks later, at the follow-up those with the active treatment tended to with shorter sleep latency time (fall asleep quickly).
Source: Agresti, A. (2002). Categorical Data Analysis. 2nd Ed. p. 462