1 Inclass 1

1.1 Info

Problem: Fit a regression model for parents and another for teachers, separately, and compare the results with that from the multiple sources approach for the Connecticut child surveys example.
Data: The data are obtained from two surveys of children’s mental health. The data set consists of 1,428 children who had both parent and teacher responses, and an additional 1,073 children with only a parent response. In the psychiatric literature, these sources of respondents are often referred to as “informants”. A standardized measure of childhood psychopathology was used both by parents (Child Behavior Checklist, CBC) and teachers (Teacher Report Forms, TRF) to assess children in the study. The response variable is derived from the externalizing scale, which assesses delinquent and aggressive behavior. The scale has been dichotomized at the cutpoint for borderline/clinical psychopathology.

Column 1: Child ID
Column 2: Child’s physical health problems: Fair = Fair to bad , Good
Column 3: Single parent status: Single, Otherwise
Column 4: Externalizing behavior (Parent report)
Column 5: Externalizing behavior (Teacher report)

1.2 Data management

# Loading package
pacman::p_load(tidyr, tidyverse, nlme, car, afex, emmeans, lattice, faraway, boot, ggplot2, kableExtra, MASS, pscl, nnet, mgcv, VGAM, geepack)

# input data
dta1 <- read.table("ccs-data.txt", h=F, na.strings='.')

# rename
colnames(dta1) <- c("ID", 
                    "Phyhealth", 
                    "ParentSingle", 
                    "Parent",
                    "Teacher")
dta1 <- dta1 %>% 
  mutate(Phyhealth = factor(ifelse(Phyhealth == 1, "Bad", "Good")),
         ParentSingle = factor(ifelse(ParentSingle == 1, "Y", "N")),
         Parent = as.numeric(Parent),
         Teacher = as.numeric(Teacher))

head(dta1)

##   ID Phyhealth ParentSingle Parent Teacher
## 1  1       Bad            Y      0      NA
## 2  2      Good            N      1      NA
## 3  3       Bad            N      0      NA
## 4  4       Bad            Y      1      NA
## 5  5       Bad            N      0      NA
## 6  6       Bad            N      0      NA

str(dta1)

## 'data.frame':    2501 obs. of  5 variables:
##  $ ID          : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Phyhealth   : Factor w/ 2 levels "Bad","Good": 1 2 1 1 1 1 1 2 2 2 ...
##  $ ParentSingle: Factor w/ 2 levels "N","Y": 2 1 1 2 1 1 1 1 2 1 ...
##  $ Parent      : num  0 1 0 1 0 0 0 0 0 0 ...
##  $ Teacher     : num  NA NA NA NA NA NA NA 0 NA 0 ...

1.2.1 Wide to Long

dta1L <- gather(dta1, key=Informant, value=Report,
                Parent:Teacher, factor_key=TRUE)

head(dta1L)

##   ID Phyhealth ParentSingle Informant Report
## 1  1       Bad            Y    Parent      0
## 2  2      Good            N    Parent      1
## 3  3       Bad            N    Parent      0
## 4  4       Bad            Y    Parent      1
## 5  5       Bad            N    Parent      0
## 6  6       Bad            N    Parent      0

str(dta1L)

## 'data.frame':    5002 obs. of  5 variables:
##  $ ID          : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Phyhealth   : Factor w/ 2 levels "Bad","Good": 1 2 1 1 1 1 1 2 2 2 ...
##  $ ParentSingle: Factor w/ 2 levels "N","Y": 2 1 1 2 1 1 1 1 2 1 ...
##  $ Informant   : Factor w/ 2 levels "Parent","Teacher": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Report      : num  0 1 0 1 0 0 0 0 0 0 ...

1.3 Descriptive info

with(dta1, ftable(Parent, Teacher))

##        Teacher    0    1
## Parent                  
## 0              1030  165
## 1               129  104

Teacher more likely to report the student with delinquent and aggressive behavior.

cor(dta1[,4:5], use="pair")

##            Parent   Teacher
## Parent  1.0000000 0.2913296
## Teacher 0.2913296 1.0000000

Correlation = 0.29 (low level)

sum(is.na(dta1[,4]) | is.na(dta1[,5]))/dim(dta1)[1]

## [1] 0.4290284

Percent missing = 42.9%

1.4 Parameter estimates

sjPlot::tab_model(m1 <- geeglm(Report ~ Informant + Phyhealth + ParentSingle + Informant:Phyhealth, data=dta1L, id = ID, family = binomial, corstr = "exchangeable"), show.obs=F, show.ngroups = F)

	Report
Predictors	Odds Ratios	CI	p
(Intercept)	0.21	0.18 – 0.25	<0.001
Informant Teacher	1.05	0.83 – 1.33	0.699
Phyhealth [Good]	0.55	0.44 – 0.68	<0.001
ParentSingle [Y]	1.88	1.56 – 2.28	<0.001
Informant Teacher * Phyhealth [Good]	1.53	1.08 – 2.16	0.016

The good physical health of children less likely to be report with delinquent and aggressive behavior
The single parent’s children more likely to be report with delinquent and aggressive behavior. (For kids of single parent status, both informants were more likely to report behavior problems.)
Even for the good physical health of children, teacher more likely to report the children with delinquent and aggressive behavior (The effect of children health problems on report of behavior problems is more apparent on parents than on teachers.)

summary(m1)$corr

##       Estimate Std.err
## alpha        0       0

Why zero ?

1.5 Comparison

1.5.1 Teacher

sjPlot::tab_model(mt <- geeglm(Report ~ Phyhealth + ParentSingle, subset = Informant=="Teacher", data=dta1L, id = ID, family = binomial, corstr = "exchangeable"), show.obs=F, show.ngroups = F)

	Report
Predictors	Odds Ratios	CI	p
(Intercept)	0.22	0.18 – 0.27	<0.001
Phyhealth [Good]	0.84	0.64 – 1.10	0.201
ParentSingle [Y]	1.92	1.42 – 2.62	<0.001

1.5.2 Parent

sjPlot::tab_model(mp <- geeglm(Report ~ Phyhealth + ParentSingle, subset = Informant=="Parent", data=dta1L, id = ID, family = binomial, corstr = "exchangeable"), show.obs=F, show.ngroups = F)

	Report
Predictors	Odds Ratios	CI	p
(Intercept)	0.21	0.18 – 0.25	<0.001
Phyhealth [Good]	0.55	0.44 – 0.68	<0.001
ParentSingle [Y]	1.86	1.46 – 2.37	<0.001

sjPlot::tab_model(m1, mt, mp)

	Report			Report			Report
Predictors	Odds Ratios	CI	p	Odds Ratios	CI	p	Odds Ratios	CI	p
(Intercept)	0.21	0.18 – 0.25	<0.001	0.22	0.18 – 0.27	<0.001	0.21	0.18 – 0.25	<0.001
Informant Teacher	1.05	0.83 – 1.33	0.699
Phyhealth [Good]	0.55	0.44 – 0.68	<0.001	0.84	0.64 – 1.10	0.201	0.55	0.44 – 0.68	<0.001
ParentSingle [Y]	1.88	1.56 – 2.28	<0.001	1.92	1.42 – 2.62	<0.001	1.86	1.46 – 2.37	<0.001
Informant Teacher * Phyhealth [Good]	1.53	1.08 – 2.16	0.016
N	2501 _ID			1428 _ID			2501 _ID
Observations	3929			1428			2501

The physical health in the model of only parent and all model is the same.
If consider the teacher effect, the physical health will not affect the report.

1.6 Fitted probabilities

dta1L <- data.frame(dta1L, id=row.names(dta1L))

yhat_m1 <- data.frame(id=row.names(fitted(m1)), phat=fitted(m1))

dta1_m1 <- inner_join(dta1L, yhat_m1, by="id")

ggplot(dta1_m1, aes(Phyhealth, Report, color = Informant)) +
 geom_point(aes(Phyhealth, phat),
            position = position_dodge(width=.2),
            size = rel(3), pch = 1) +
 geom_hline(yintercept = c(mean(dta1$Parent, na.rm=T),
                         mean(dta1$Teacher, na.rm=T)), 
            linetype = "dotted")+
 stat_summary(fun.data = "mean_cl_boot", 
              position = position_dodge(width=.2)) +
 facet_wrap( ~ ParentSingle)+
 labs(x = "Physical health", 
      y = "Probability of report") +
 guides(color = guide_legend(reverse=T))+
 theme_minimal()+
 theme(legend.position = c(.1, .9))

The figure is same as lecture note (page 16), but not same as output figure in the Connecticut child surveys example.

1.7 Reference

Source: Zahner, G.E.P., Jacobs, J.H., Freeman, D.H. & Trainor, K.F. (1993). Rural-urban child psychopathology in a northeastern U.S. State: 1986-1989. Journal of the American Academy of Child and Adolescent Psychiatry, 32, 378-387. Reported in Fitzmaurice, G.M., Laird, N.M., & Ware, J.H. (2004). Applied Longitudinal Analysis. p. 434-439.

2 Inclass 2

2.1 Info

Problem: Replicate the results of analysis reported in the insomnia example.
Data: The data come from a randomized, double-blind clinical trial comparing active hypnotic drug with a placebo in patients who have insomnia problems. The response is the patient’s reported time (in minutes) to fall asleep after going to bed. Patients responded before and after a two-week treatment period. The patients receiving the two treatments were independent samples.

Column 1: Subject ID
Column 2: Treatment, Active = 1, Placebo = 0
Column 3: Time, Initial = 0, Follow-up = 1
Column 4: Time to sleep, < 20 min. = 1, 20-30 = 2, 30-60 = 3, > 60 min. = 4

2.2 Data management

# input data
dta2 <- read.table("insomnia.txt", h=T)

# rename
colnames(dta2) <- c("ID", "Treatment", "Time", "Sleeplatency")

head(dta2)

##   ID Treatment Time Sleeplatency
## 1  1         1    0            1
## 2  1         1    1            1
## 3  2         1    0            1
## 4  2         1    1            1
## 5  3         1    0            1
## 6  3         1    1            1

str(dta2)

## 'data.frame':    478 obs. of  4 variables:
##  $ ID          : int  1 1 2 2 3 3 4 4 5 5 ...
##  $ Treatment   : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Time        : int  0 1 0 1 0 1 0 1 0 1 ...
##  $ Sleeplatency: int  1 1 1 1 1 1 1 1 1 1 ...

2.3 GEE

# repolr [data = as.data.frame()]
pacman::p_load(repolr)

# repolr
summary(dta2_repolr <- repolr(Sleeplatency ~ Treatment + Time + Treatment:Time,
                 subjects = "ID",
                 data = as.data.frame(dta2),
                 times = c(1,2),
                 categories = 4,
                 corr.mod = "independence",
                 po.test=TRUE,))

## 
## repolr: 2016-02-26 version 3.4 
## 
## Call:
## repolr(formula = Sleeplatency ~ Treatment + Time + Treatment:Time, 
##     subjects = "ID", data = as.data.frame(dta2), times = c(1, 
##         2), categories = 4, corr.mod = "independence", po.test = TRUE)
## 
## Coefficients: 
##                 coeff     se.robust  z.robust  p.value 
## cuts1|2          -2.2671    0.2091   -10.8422    0.0000
## cuts2|3          -0.9515    0.1769    -5.3787    0.0000
## cuts3|4           0.3517    0.1751     2.0086    0.0446
## Treatment         0.0336    0.2369     0.1418    0.8872
## Time              1.0381    0.1564     6.6375    0.0000
## Treatment:Time    0.7077    0.2458     2.8792    0.0040
## 
## Correlation Structure:  independence 
## Fixed Correlation:  0 
## PO Score Test: 8.6394 (d.f. = 6 and p.value = 0.1949)

# polr

sjPlot::tab_model(m0 <- MASS::polr(ordered(Sleeplatency) ~ Treatment + Time + 
                                   Treatment:Time, data=dta2, Hess = T),
                  show.obs=F, show.r2=F, show.se=T, transform=NULL)

	ordered(Sleeplatency)
Predictors	Log-Odds	std. Error	CI	p
1\|2	-2.27	0.20	-2.73 – -1.80	<0.001
2\|3	-0.95	0.18	-1.43 – -0.48	<0.001
3\|4	0.35	0.17	-0.30 – 1.01	0.045
Treatment	-0.03	0.24	-0.50 – 0.43	0.888
Time	-1.04	0.24	-1.51 – -0.57	<0.001
Treatment * Time	-0.71	0.33	-1.36 – -0.05	0.035

Time effect
1.04 for placebo, 1.04 + 0.71 = 1.75 for treatment.
Odds ratios: exp(1.04) = 2.83 for placebo, exp(1.75) = 5.75 for treatment
Treatment effect
exp(0.03) = 1.03 initially,
exp(0.03 + 0.71) = 2.1 at follow-up.

Model: logit(P{Y_t <= j}) = α_k + β₁ time + β₂ trt + β₃ time × trt,
j = 1, 2, 3, 4; k = j - 1 for j > 1.

#???
summary(dta2_repolr$gee)

## Length  Class   Mode 
##      0   NULL   NULL

# hand-made cumulative prop. against time to sleep

dta2 <- dta2 %>% 
  mutate(Treatment = factor(Treatment, 
                            levels=0:1, 
                            labels=c("Placebo","Active")),
         Time = factor(Time, 
                       levels=0:1, 
                       labels=c("Initial","Follow-up")),
         Sleeplatency = factor(Sleeplatency))

dta2.tbl <- ftable(dta2$Treatment, dta2$Time, dta2$Sleeplatency)

dta2.tbl

##                     1  2  3  4
##                               
## Placebo Initial    14 20 35 51
##         Follow-up  31 29 35 25
## Active  Initial    12 20 40 47
##         Follow-up  40 49 19 11

The sleep latency time in both group are improved. However, the hypnotic drug group more likely better than Placebo group.

2.4 Cumulative prop.plot

# prop.table: matrix 1 indicates rows
# t: matrix transpose()
obsp <- as.numeric(t(prop.table(dta2.tbl, m=1)))

# Cumulative Sums
clp <- c(cumsum(obsp[1:4]), 
         cumsum(obsp[5:8]),
         cumsum(obsp[9:12]), 
         cumsum(obsp[13:16]))

yt <- rep(c(20,25,45,60),4)

y <- data.frame(time2sleep=yt, clprop=clp)

plot(y[,1], y[,2], type="n",
     xlab="Time to Sleep (min.)", ylab="Cumulative proportion")

lines(y[1:4,  1],  y[1:4,2], lty=1, col="steelblue")
lines(y[5:8,  1],  y[5:8,2], lty=1, col="steelblue")
lines(y[9:12, 1],  y[9:12,2], lty=2, col="indianred")
lines(y[13:16,1],  y[13:16,2], lty=2, col="indianred")

text(22, .12, "T,I")
text(22, .47, "T,F")
text(22, .20, "P,I")
text(22, .33, "P,F")

grid()

The active drug group and the placebo groups initially fell asleep at about the same probability. Two weeks later, at the follow-up those with the active treatment tended to with shorter sleep latency time (fall asleep quickly).

2.5 Reference

Source: Agresti, A. (2002). Categorical Data Analysis. 2nd Ed. p. 462

Inclass: Generalized Estimating Equations

Chang-Chun Chen

2020-11-20