Data from here, inspiration from Hedeker’s web page
fn <- "schizx1.dat"
if (!file.exists(fn)) {
download.file("http://www.uic.edu/~hedeker/SCHIZX1.DAT.txt",dest=fn)
}
(This almost works: I found an extra Ctrl-Z character at the end of the file that I had to hand-edit out …)
dd0 <- read.table("schizx1.dat")
names(dd0) <- c("id","imps79","imps79b","imps79o","int","tx",
"week","sweek","txswk")
apply(dd0,2,function(x) sum(x==-9,na.rm=TRUE))
## id imps79 imps79b imps79o int tx week sweek txswk
## 0 1456 1456 1456 0 0 0 0 0
(I could have used na.strings=c("-9","NA") but that seems dangerous in general – what if there are legitimate -9 values in some columns?)
Utility functions/packages:
na.vals <- function(x,crit) {
x[crit] <- NA
return(x)
}
library(plyr) ## for mutate()
library(MASS) ## for glmmPQL
library(nlme)
library(lme4)
library(reshape2) ## melt()
library(MCMCglmm)
library(ggplot2); theme_set(theme_bw())
## get this with repos="http://www.math.mcmaster.ca/bolker/R":
library(coefplot2)
library(RColorBrewer)
cvec <- brewer.pal(7,"Dark2")
Post-process data a little bit (this follows the steps on Hedeker’s web page):
dd <- mutate(dd0,
imps79=na.vals(imps79,imps79<0),
imps79b=na.vals(imps79b,imps79b<0),
imps79b=factor(imps79b,labels=c("le mild","ge moderate")),
tx=factor(tx,labels=c("placebo","drug")))
The sweek variable is square-root of week (presumably done to improve linearity on the logit scale??)
ggplot(dd,aes(sweek,imps79b,colour=tx,group=id))+geom_line()+
stat_sum(aes(group=tx))+
geom_smooth(method="glm",family="binomial",se=FALSE)
Trying to facet on individuals is slow & silly (most of the space is taken up by the facet labels; we could strip the plots down to sparklines and remove the labels, but I’m not sure how much it would help).
(Not evaluated:)
ggplot(dd,aes(sweek,imps79b,colour=tx,group=id))+geom_line()+
facet_wrap(~id)
Try sparklines-style plots, possibly in batches? (473 individuals is a big challenge for anything other than spaghetti plots …)
Check experimental design:
with(dd,table(tx,sweek))
## sweek
## tx 0 1 1.4142 1.7321 2 2.2361 2.4495
## placebo 108 108 108 108 108 108 108
## drug 329 329 329 329 329 329 329
with(na.omit(dd),table(tx,sweek))
## sweek
## tx 0 1 1.4142 1.7321 2 2.2361 2.4495
## placebo 107 105 5 87 2 2 70
## drug 327 321 9 287 9 7 265
Try reducing the data to sequences: most data are observed at only 4 unique time periods
dd$rweek <- round(dd$sweek^2)
dd2 <- na.omit(subset(dd,rweek %in% c(0,1,3,6)))
nrow(na.omit(dd))-nrow(dd2) ## lose only 34 cases (out of 1600)
## [1] 34
dd3 <- ddply(dd2,"id",summarise,
tx=tx[1],
trans=paste(imps79b,collapse=""))
ttab <- with(dd3,table(trans,tx))
ttab0 <- ttab[rowSums(ttab)>10,]
fracrep <- c(sum(ttab0),length(unique(dd$id)))
After stripping out the unusual sequences, we still have 408 out of 437 individuals represented:
## tx
## trans placebo drug
## 10 2 9
## 100 3 10
## 1000 0 25
## 11 17 20
## 110 1 25
## 1100 4 42
## 111 18 13
## 1110 13 70
## 1111 41 95
Or normalize by column total (= number of individuals in the treatment) and sort by frequency in placebo arm:
It might be a special-case method, but I can imagine that we could come up with a statistical model (e.g. a Markov model/ finite-state machine?) for the likelihood of these sequences under an appropriate distribution of individual-level REs …
Fit regular GLM, PQL with random-intercept and random-slopes models, Laplace with random intercept:
m0 <- glm(imps79b~tx*sweek,dd,family=binomial)
m1 <- MASS::glmmPQL(imps79b~tx*sweek,random=~1|id,dd,family=binomial,
verbose=FALSE)
m2 <- MASS::glmmPQL(imps79b~tx*sweek,random=~1+sweek|id,dd,family=binomial,
verbose=FALSE)
m3 <- glmer(imps79b~tx*sweek+(1|id),dd,family=binomial)
Try Laplace with random slopes. Extremely slow: lots of ‘step-halving’ action, and eventually fails to converge with maxgrad=180.4 (!!)
m4 <- glmer(imps79b~tx*sweek+(1+sweek|id),dd,family=binomial)
## Warning: Model failed to converge with max|grad| = 180.415 (tol = 0.001,
## component 4)
Try both models with MCMCglmm:
## random intercept
m5 <- MCMCglmm(imps79b~tx*sweek,
random=~id,
data=na.omit(dd),
family="categorical",verbose=FALSE)
## random intercept+slope
m6 <- MCMCglmm(imps79b~tx*sweek,
random=~us(sweek):id,
data=na.omit(dd),
family="categorical",verbose=FALSE)
cbind(glm=coef(m0),glmmPQL_int=fixef(m1),
glmmPQL_slope=fixef(m2),lme4_int=fixef(m3),
lme4_slope=fixef(m4))
## glm glmmPQL_int glmmPQL_slope lme4_int lme4_slope
## (Intercept) 3.7027053 4.6846075 12.113539 5.29432838 41.553451
## txdrug -0.4054238 0.3149482 2.513707 -0.01603148 5.701931
## sweek -1.1125529 -1.3674879 -4.231725 -1.43552801 -10.942024
## txdrug:sweek -0.4179643 -1.0223856 -3.325614 -1.03903393 -11.860041
coefplot2(list(glm=m0,PQL_int=m1,PQL_slope=m2,
Lapl_int=m3,Lapl_slope=m4),
legend=TRUE)
Needs more work to combine m5 properly:
coefplot2(list(glm=m0,PQL_int=m1,PQL_slope=m2,
Lapl_int=m3,Lapl_slope=m4,
MCMC_int=m5,MCMC_slope=m6),
col=cvec,
legend=TRUE)