1 Introduction

The data are obtained from two surveys of children’s mental health. The data set consists of 1,428 children who had both parent and teacher responses, and an additional 1,073 children with only a parent response. In the psychiatric literature, these sources of respondents are often referred to as “informants”.

A standardized measure of childhood psychopathology was used both by parents (Child Behavior Checklist, CBC) and teachers (Teacher Report Forms, TRF) to assess children in the study.

The response variable is derived from the externalizing scale, which assesses delinquent and aggressive behavior. The scale has been dichotomized at the cutpoint for borderline/clinical psychopathology.

Source: Zahner, G.E.P., Jacobs, J.H., Freeman, D.H. & Trainor, K.F. (1993). Rural-urban child psychopathology in a northeastern U.S. State: 1986-1989. Journal of the American Academy of Child and Adolescent Psychiatry, 32, 378-387. Reported in Fitzmaurice, G.M., Laird, N.M., & Ware, J.H. (2004). Applied Longitudinal Analysis. p. 434-439.

Column 1: Child ID Column 2: Child’s physical health problems: Fair = Fair to bad , Good Column 3: Single parent status: Single, Otherwise Column 4: Externalizing behavior (Parent report) Column 5: Externalizing behavior (Teachert report)

2 Data management

#Loading package
require(pacman)

## Loading required package: pacman

pacman::p_load(MASS, gee, tidyverse, geepack)

# input data
dta <- read.table("ccs-data.txt", h=F, na.strings='.')

head(dta)

##   V1 V2 V3 V4 V5
## 1  1  1  1  0 NA
## 2  2  0  0  1 NA
## 3  3  1  0  0 NA
## 4  4  1  1  1 NA
## 5  5  1  0  0 NA
## 6  6  1  0  0 NA

#將變數重新命名
names(dta) <- c("ID", "Phyhealth", "ParentSingle", "Parent", "Teacher")

#
dta <- dta %>%
mutate(ID = factor(ID), 
Phyhealth = factor(ifelse(Phyhealth == 1, "Bad", "Good")),
ParentSingle = factor(ifelse(ParentSingle == 1, "Y", "N")))

#
head(dta)

##   ID Phyhealth ParentSingle Parent Teacher
## 1  1       Bad            Y      0      NA
## 2  2      Good            N      1      NA
## 3  3       Bad            N      0      NA
## 4  4       Bad            Y      1      NA
## 5  5       Bad            N      0      NA
## 6  6       Bad            N      0      NA

#
str(dta)

## 'data.frame':    2501 obs. of  5 variables:
##  $ ID          : Factor w/ 2501 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ Phyhealth   : Factor w/ 2 levels "Bad","Good": 1 2 1 1 1 1 1 2 2 2 ...
##  $ ParentSingle: Factor w/ 2 levels "N","Y": 2 1 1 2 1 1 1 1 2 1 ...
##  $ Parent      : int  0 1 0 1 0 0 0 0 0 0 ...
##  $ Teacher     : int  NA NA NA NA NA NA NA 0 NA 0 ...

#寬資料轉換成長資料
dtaL <- gather(dta, key=Informant, value=Report,
                Parent:Teacher, factor_key=TRUE)

head(dtaL,10)

##    ID Phyhealth ParentSingle Informant Report
## 1   1       Bad            Y    Parent      0
## 2   2      Good            N    Parent      1
## 3   3       Bad            N    Parent      0
## 4   4       Bad            Y    Parent      1
## 5   5       Bad            N    Parent      0
## 6   6       Bad            N    Parent      0
## 7   7       Bad            N    Parent      0
## 8   8      Good            N    Parent      0
## 9   9      Good            Y    Parent      0
## 10 10      Good            N    Parent      0

str(dtaL)

## 'data.frame':    5002 obs. of  5 variables:
##  $ ID          : Factor w/ 2501 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ Phyhealth   : Factor w/ 2 levels "Bad","Good": 1 2 1 1 1 1 1 2 2 2 ...
##  $ ParentSingle: Factor w/ 2 levels "N","Y": 2 1 1 2 1 1 1 1 2 1 ...
##  $ Informant   : Factor w/ 2 levels "Parent","Teacher": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Report      : int  0 1 0 1 0 0 0 0 0 0 ...

with(dta, ftable(Teacher,Parent))

##         Parent    0    1
## Teacher                 
## 0              1030  129
## 1               165  104

老師報告學生問題行為的可能性大於父母。

# correlation
cor(dta[,4:5], use="pair")

##            Parent   Teacher
## Parent  1.0000000 0.2913296
## Teacher 0.2913296 1.0000000

老師報告學生問題行為與父母報告孩子問題行為的相關係數為0.291

# proportion of missing responses
sum(is.na(dta[,4]) | is.na(dta[,5]))/dim(dta)[1]

## [1] 0.4290284

missing data比率約有42.9%

3 Parameter estimates

sjPlot::tab_model(m1 <- geeglm(Report ~ Informant + Phyhealth + ParentSingle + Informant:Phyhealth, data=dtaL, id = ID, family = binomial, corstr = "exchangeable"), show.obs=F, show.ngroups = F)

	Report
Predictors	Odds Ratios	CI	p
(Intercept)	0.21	0.18 – 0.25	<0.001
Informant [Teacher]	1.05	0.83 – 1.33	0.699
Phyhealth [Good]	0.55	0.44 – 0.68	<0.001
ParentSingle [Y]	1.88	1.56 – 2.28	<0.001
Informant [Teacher] * Phyhealth [Good]	1.53	1.08 – 2.16	0.016

1.身體健康好的學生，不太可能被通報有問題行為。 2.單親家庭的孩子，無論是老師或是家長，都更可能通報孩子有問題行為。

summary(m1)$corr

##       Estimate Std.err
## alpha        0       0

4 Comparison

##Teacher

sjPlot::tab_model(mt <- geeglm(Report ~ Phyhealth + ParentSingle, subset = Informant=="Teacher", data=dtaL, id = ID, family = binomial, corstr = "exchangeable"), show.obs=F, show.ngroups = F)

	Report
Predictors	Odds Ratios	CI	p
(Intercept)	0.22	0.18 – 0.27	<0.001
Phyhealth [Good]	0.84	0.64 – 1.10	0.201
ParentSingle [Y]	1.92	1.42 – 2.62	<0.001

##Parent

sjPlot::tab_model(mp <- geeglm(Report ~ Phyhealth + ParentSingle, subset = Informant=="Parent", data=dtaL, id = ID, family = binomial, corstr = "exchangeable"), show.obs=F, show.ngroups = F)

	Report
Predictors	Odds Ratios	CI	p
(Intercept)	0.21	0.18 – 0.25	<0.001
Phyhealth [Good]	0.55	0.44 – 0.68	<0.001
ParentSingle [Y]	1.86	1.46 – 2.37	<0.001

sjPlot::tab_model(m1, mt, mp)

	Report			Report			Report
Predictors	Odds Ratios	CI	p	Odds Ratios	CI	p	Odds Ratios	CI	p
(Intercept)	0.21	0.18 – 0.25	<0.001	0.22	0.18 – 0.27	<0.001	0.21	0.18 – 0.25	<0.001
Informant [Teacher]	1.05	0.83 – 1.33	0.699
Phyhealth [Good]	0.55	0.44 – 0.68	<0.001	0.84	0.64 – 1.10	0.201	0.55	0.44 – 0.68	<0.001
ParentSingle [Y]	1.88	1.56 – 2.28	<0.001	1.92	1.42 – 2.62	<0.001	1.86	1.46 – 2.37	<0.001
Informant [Teacher] * Phyhealth [Good]	1.53	1.08 – 2.16	0.016
N	2501 _ID			1428 _ID			2501 _ID
Observations	3929			1428			2501

比較m1與mp:身體健康程度對家長而言是不影響(數值相同) 比較m1與mt:身體健康程度對教師而言不能說會影響其通報(因未達顯著)

5 Fitted probabilities

dtaL <- data.frame(dtaL, id=row.names(dtaL))

yhat_m1 <- data.frame(id=row.names(fitted(m1)), phat=fitted(m1))

dta_m1 <- inner_join(dtaL, yhat_m1, by="id")

ggplot(dta_m1, aes(Phyhealth, Report, color = Informant)) +
 geom_point(aes(Phyhealth, phat),
            position = position_dodge(width=.2),
            size = rel(3), pch = 1) +
 geom_hline(yintercept = c(mean(dta$Parent, na.rm=T),
                         mean(dta$Teacher, na.rm=T)), 
            linetype = "dotted")+
 stat_summary(fun.data = "mean_cl_boot", 
              position = position_dodge(width=.2)) +
 facet_wrap( ~ ParentSingle)+
 labs(x = "Physical health", 
      y = "Probability of report") +
 guides(color = guide_legend(reverse=T))+
 theme_minimal()+
 theme(legend.position = c(.1, .9))

#The End

Week10 In-class exercise1-Gee model for multiple sources data

Ching-Fang Wu

Sun Nov 22 21:21:14 2020

1 Introduction

2 Data management

3 Parameter estimates

4 Comparison

5 Fitted probabilities