Research question: We measured 500 young adults’ overall physical health at 4 time points (i.e., September each year during their undergraduate studies), at each time point we also collected information about their health consciousness (time varying) and converted it into a single numeric score. Parental health status at first time point and average steps taken per day over their final year of high school (assuming all subjects owns wearables and sync data with their phones religiously) were used as time-invariant predictors. We examined whether health consciousness, parental health status and daily activity level affects the trajectories of university students’ overall physical health.
library(lme4)
library(afex)
library(lattice) #xyplot
library(data.table) #setnames
library(reshape2) #melt
library(ggplot2)
library(texreg)
df<-read.csv('Assign3.csv')
#change colnames to fit our scenerio
setnames(df, old = c('Y','A3','X1','X2','Person','Time'),
new = c('phy_health','heal_consc','parent_heal','steps','person','time'))
df['person']<-lapply(df['person'],factor)
head(df)
#View(df)
All predictor variables are standardized (centered around 0), and normally distributed. The measurement occasions time were coded as t=0,1,2,3, then the intercept can be interpreted as the expected outcome on the first occasion. Before we start the analysis, we examine the distribution of the outcome variable physical health in the long data file with 500 × 4 = 2000 observations.
summary(df)
## phy_health heal_consc time parent_heal
## Min. :-7.2030 Min. :-3.990000 Min. :0.00 Min. :-2.51800
## 1st Qu.: 0.2677 1st Qu.:-0.639500 1st Qu.:0.75 1st Qu.:-0.72350
## Median : 1.9015 Median :-0.036000 Median :1.50 Median :-0.04950
## Mean : 2.1945 Mean :-0.006729 Mean :1.50 Mean :-0.07299
## 3rd Qu.: 3.7923 3rd Qu.: 0.667000 3rd Qu.:2.25 3rd Qu.: 0.59175
## Max. :12.6860 Max. : 3.446000 Max. :3.00 Max. : 2.79700
##
## steps person
## Min. :-2.0120 1 : 4
## 1st Qu.:-0.5058 2 : 4
## Median : 0.0615 3 : 4
## Mean : 0.1265 4 : 4
## 3rd Qu.: 0.7798 5 : 4
## Max. : 3.1760 6 : 4
## (Other):1976
long <- melt(df,id.vars = 'person')
ggplot(long, aes(value)) +
facet_wrap(~variable, scales = 'free_x') +
geom_histogram(binwidth = function(x) 2 * IQR(x) / (length(x)^(1/3)))
All numerical predictors seem to normally distributed. Scores seem to be standardized and centered around 0.
#sample 8 persons to see phy_health regressed on time intercept and slope variations among subjects
xyplot(phy_health~time | person,
data=df[df$person %in% c(4, 14, 23, 32, 41, 56, 65, 82), ],
panel=function(x,y)
{panel.xyplot(x, y)
panel.lmline(x,y)},
ylim=c(-3, 10), as.table=T)
We do see some variations between persons in terms of intercept and slope.
m_null<-lmer(phy_health~1+(1|person),df,REML=F)
summary(m_null)
## Linear mixed model fit by maximum likelihood . t-tests use
## Satterthwaite's method [lmerModLmerTest]
## Formula: phy_health ~ 1 + (1 | person)
## Data: df
##
## AIC BIC logLik deviance df.resid
## 8943.8 8960.6 -4468.9 8937.8 1997
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.7111 -0.5862 -0.0811 0.5280 3.3443
##
## Random effects:
## Groups Name Variance Std.Dev.
## person (Intercept) 4.063 2.016
## Residual 3.268 1.808
## Number of obs: 2000, groups: person, 500
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 2.19455 0.09879 499.99998 22.21 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
4.063/(4.063+3.268)
## [1] 0.5542218
The null-model contains only an intercept term and variances at the occasion and the subject level. The intercept of 4.063 in this model is simply the average physical health across all individuals and occasions. The intercept-only model estimates the repeated measures variance as 4.06 and the subject-level variance as 3.27, this renders an intra-class correlation of .55. About half of the variance of the 4 physical health measures is variance between individuals, and about half is variance within individuals across time.
m<-lmer(phy_health~1+(time|person),df,REML=F)
summary(m)
## Linear mixed model fit by maximum likelihood . t-tests use
## Satterthwaite's method [lmerModLmerTest]
## Formula: phy_health ~ 1 + (time | person)
## Data: df
##
## AIC BIC logLik deviance df.resid
## 7375.9 7403.9 -3682.9 7365.9 1995
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.3181 -0.5072 0.0143 0.4922 3.0978
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## person (Intercept) 2.3533 1.5340
## time 1.5798 1.2569 0.65
## Residual 0.6349 0.7968
## Number of obs: 2000, groups: person, 500
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) -0.03019 0.06534 499.99750 -0.462 0.644
#model with random intercept and random slope
m1<-lmer(phy_health~time+heal_consc+
(time|person),df,REML=F)
#model without random slope
m2<-lmer(phy_health~time+heal_consc+
(1|person),df,REML=F)
#model without random intercept
m3<-lmer(phy_health~time+heal_consc+
(0+time|person),df,REML=F)
anova(m1, m2, m3)
Model with random intercept and random slope is the best model
m4<-lmer(phy_health~time+heal_consc+parent_heal+steps+
(time|person),df,REML=F)
anova(m3, m4)
summary(m4)
## Linear mixed model fit by maximum likelihood . t-tests use
## Satterthwaite's method [lmerModLmerTest]
## Formula: phy_health ~ time + heal_consc + parent_heal + steps + (time |
## person)
## Data: df
##
## AIC BIC logLik deviance df.resid
## 6505.2 6555.6 -3243.6 6487.2 1991
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.13650 -0.50443 -0.00628 0.47587 3.05359
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## person (Intercept) 1.1169 1.0568
## time 0.5010 0.7078 -0.07
## Residual 0.5415 0.7359
## Number of obs: 2000, groups: person, 500
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 5.559e-01 5.531e-02 4.595e+02 10.05 <2e-16 ***
## time 1.051e+00 3.491e-02 4.995e+02 30.11 <2e-16 ***
## heal_consc 3.002e-01 2.210e-02 1.327e+03 13.59 <2e-16 ***
## parent_heal 6.544e-01 5.366e-02 5.000e+02 12.20 <2e-16 ***
## steps 8.854e-01 5.493e-02 5.002e+02 16.12 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) time hl_cns prnt_h
## time -0.225
## heal_consc -0.001 0.007
## parent_heal 0.080 0.000 0.002
## steps -0.131 0.000 -0.017 -0.072
m5<-lmer(phy_health~time+heal_consc+parent_heal+steps+
time*parent_heal+
(time|person),df,REML=F)
anova(m4, m5)
m6<-lmer(phy_health~time+heal_consc+parent_heal+steps+
time*parent_heal+time*steps+
(time|person),df,REML=F)
anova(m5, m6)
summary(m6)
## Linear mixed model fit by maximum likelihood . t-tests use
## Satterthwaite's method [lmerModLmerTest]
## Formula: phy_health ~ time + heal_consc + parent_heal + steps + time *
## parent_heal + time * steps + (time | person)
## Data: df
##
## AIC BIC logLik deviance df.resid
## 6173.7 6235.3 -3075.9 6151.7 1989
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.12316 -0.52340 0.00693 0.48509 3.09613
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## person (Intercept) 1.0786 1.0386
## time 0.2031 0.4506 0.11
## Residual 0.5416 0.7359
## Number of obs: 2000, groups: person, 500
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 5.704e-01 5.463e-02 4.999e+02 10.44 <2e-16 ***
## time 1.010e+00 2.525e-02 4.992e+02 40.01 <2e-16 ***
## heal_consc 2.968e-01 2.136e-02 1.461e+03 13.89 <2e-16 ***
## parent_heal 5.605e-01 5.440e-02 4.999e+02 10.30 <2e-16 ***
## steps 7.166e-01 5.569e-02 5.002e+02 12.87 <2e-16 ***
## time:parent_heal 2.630e-01 2.514e-02 4.992e+02 10.46 <2e-16 ***
## time:steps 4.734e-01 2.574e-02 4.994e+02 18.39 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) time hl_cns prnt_h steps tm:pr_
## time -0.165
## heal_consc -0.001 0.007
## parent_heal 0.082 -0.013 0.002
## steps -0.134 0.022 -0.019 -0.072
## tim:prnt_hl -0.014 0.082 -0.001 -0.165 0.012
## time:steps 0.022 -0.134 0.017 0.012 -0.165 -0.072
A model-building approach is used to arrive at the final model. First, the time variable is added as a linear predictor with the same coefficient for all subjects; next, a series of models are tested with either or both of random intercept and slope of the measurement occasion variable, and both are included in the model (unconditional growth model); then the time-varying covariate health consciousness is added; after which, the subject-level (time-invariant) predictors parental health status and average steps taken per day were added; finally, the interaction of the time variable with the subject-level predictors parental health status and average steps taken per day are added to the model. All effects and interactions are significant at p < .001, likelihood ratio tests were done in each model-building step through comparing models with and without certain effect. See detailed model testing and comparison above.
In the final model, the regression coefficient for time 1.01 and significant, it means that with each scale points higher on time, physical health is expected to increase by 1.01 scale points; the regression coefficient for health consciousness .3 and significant, it means that with each scale points higher on health consciousness, physical health is expected to increase by .3 scale points. In our multilevel model, the regression coefficient for time varies across individuals, the varying regression slopes for time are assumed to follow a normal distribution. The variance of this distribution is estimated as .2. The regression coefficient for parental health is .56, it means that for each unit increase of parental health score, the physical health of the person goes up by .56. The variance of this distribution is estimated as .2. The regression coefficient for average daily steps is .47, it means that for each unit increase of steps, the physical health of the person goes up by .47. The cross-level interaction between time and parental health shows how parental health status moderate the effect of time on individual physical health. The regression coefficient for this cross-level interaction is .26, it can be interpreted that parental health significantly affects the correlation between time and individual physical health. The cross-level interaction between time and steps shows how steps moderate the effect of time on individual physical health. The regression coefficient for this cross-level interaction is .47, it can be interpreted that steps significantly affects the correlation between time and individual physical health. The covariance between the initial status and the growth rate in the final model is positive (.05); students who start with a relatively high value of their physical health score increase their physical health faster than the other students, it seems that physical health is self-perpetuating. Note that before the interaction terms were added, this correlation is slightly negative (-. 07), which means the subject-level predictors really influenced the correlation between intercept and slope of time. The deviation went down to 6151.7 from 8937.8 in the null model. The estimated of τ00 is interpreted as the variance of mean for each individual around the overall mean physical health score, in our case the subject level variance of mean is 1.08. The estimate of residual is the variance of each measurement of physical health score around the individual’s mean score, in our case the observation level variance is .54.
#write results to table
htmlreg(list(m_null, m6),
file='texreg.doc',
single.row = T,
stars = numeric(0),
caption = "",
custom.note = "null model = intercept only,
final model =
")
## The table was written to the file 'texreg.doc'.
Model summary table with parameter estimation and significance level