You will need the following packages to successfully complete this assignment: 'ez' (for the repeated measures ANOVA) and 'lavaan' (for the latent growth models).
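If you haven't already, install and load them first:

#load required packages
#install.packages(c("ez", "lavaan")) #run once if not yet installed
library(ez)     #ezANOVA() and ezPlot() for the RM ANOVA
library(lavaan) #growth() for the latent growth models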
Download the file “LGM.dat”. We use data on the self-reported well-being of individuals in treatment (N = 700) measured at four equally spaced time points, T1-T4. The data set contains well-being scores at the four time points plus SES for each participant.
Now let’s change the variable names in our dataset:
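A minimal sketch of the import and renaming step (this assumes “LGM.dat” is whitespace-delimited with no header row; adjust the read.table() arguments to match your copy of the file):

#read in the data and assign descriptive variable names
data <- read.table("LGM.dat")
names(data) <- c("T1WB", "T2WB", "T3WB", "T4WB", "SES")
head(data) #inspect the first six rows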
## T1WB T2WB T3WB T4WB SES
## 1 0.715726 0.782026 0.904950 -0.705383 -0.378137
## 2 1.908699 4.660801 8.724440 12.484368 2.390625
## 3 0.148017 2.102866 3.972971 6.013007 -0.547830
## 4 2.734165 2.936197 4.505138 7.053811 0.642068
## 5 -0.394722 -0.105100 0.713551 1.655679 0.092867
## 6 -1.825596 -1.295102 -3.278919 -2.920580 -0.798527
a.) Run a repeated measures ANOVA and see if there is an effect of time on well-being. Report and interpret all results (including plots).
Before we actually run our RM ANOVA, let’s first convert our data from wide format to long format. In R, there are a variety of ways to do this (I would recommend checking out the tidyr tutorial for some awesome ways to reshape your data!). For this assignment, however, we will just use the basic ‘reshape’ function that comes with base R.
Because long data requires some sort of participant identifier, I first went ahead and manually added one to the dataset: a column of numbers 1-700 (corresponding to our total number of participants).
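A one-line sketch of that step:

#add a participant ID column (numbers 1 through 700)
data$ID <- 1:nrow(data)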
Now if we look at our data, we can see that each individual has an assigned ID number in a new column called ‘ID.’
## T1WB T2WB T3WB T4WB SES ID
## 1 0.715726 0.782026 0.904950 -0.705383 -0.378137 1
## 2 1.908699 4.660801 8.724440 12.484368 2.390625 2
## 3 0.148017 2.102866 3.972971 6.013007 -0.547830 3
## 4 2.734165 2.936197 4.505138 7.053811 0.642068 4
## 5 -0.394722 -0.105100 0.713551 1.655679 0.092867 5
## 6 -1.825596 -1.295102 -3.278919 -2.920580 -0.798527 6
Now we can go ahead and actually change our data into long format fairly easily with the ‘reshape’ command.
long.data <- reshape(data,        #name of our current (wide) dataset
       direction = "long",        #going wide --> long
       varying = list(names(data)[1:4]), #the wide-format columns (T1WB-T4WB) that collapse into a single long-format variable (the 'time-varying' variable)
       v.names = "WellBeing",     #name we choose for the new long-format column; the scores from T1WB-T4WB will all be stacked into this one 'WellBeing' column
       idvar = "ID",              #participant-specific grouping ID variable
       timevar = "Time")          #name we choose for the new variable indexing time point (our IV)

Also, let’s go ahead and tell R that our newly created variable ‘Time’ is categorical. Remember to use our new dataset, ‘long.data.’
RM.model<-ezANOVA(data = long.data, dv = WellBeing, wid = ID,
within = Time, detailed = TRUE, type = 2, return_aov=T)
RM.model$ANOVA
## Effect DFn DFd SSn SSd F p p<.05
## 1 (Intercept) 1 699 11905.999 10845.385 767.3580 1.457118e-114 *
## 2 Time 3 2097 3421.241 2629.035 909.6293 0.000000e+00 *
## ges
## 1 0.4691017
## 2 0.2024923
If we want to extract more output from our above analyses, we can pull the sphericity corrections and the ‘aov’ object out of the model like so:
RM.model$`Sphericity Corrections`
## Effect GGe p[GG] p[GG]<.05 HFe p[HF]
## 2 Time 0.5875011 4.850716e-224 * 0.5888727 1.48275e-224
## p[HF]<.05
## 2         *

RM.model$aov
## Call:
## aov(formula = formula(aov_formula), data = data)
##
## Grand Mean: 2.062072
##
## Stratum 1: ID
##
## Terms:
## Residuals
## Sum of Squares 10845.39
## Deg. of Freedom 699
##
## Residual standard error: 3.938981
##
## Stratum 2: ID:Time
##
## Terms:
## Time Residuals
## Sum of Squares 3421.241 2629.035
## Deg. of Freedom 3 2097
##
## Residual standard error: 1.119693
## Estimated effects may be unbalanced
ezPlot(data = long.data, dv = WellBeing,
       within = Time,
       wid = ID,
       x = Time,
       y_lab = 'Mean of Well-Being') #title of y axis

## Warning: Converting "ID" to factor for ANOVA.
The numerator df (i.e., DFn in the output) for the effect of Time is simply k − 1, where k is the number of levels of the independent variable; here k = 4, so DFn = 3. The error df (DFd in the output) is (n − 1)(k − 1), where n is the number of participants; here (700 − 1)(4 − 1) = 2097, matching the output above.
Based on the results of the RM ANOVA, the F statistic is significant at p < .05, so we can conclude that mean well-being differed significantly across the four time points. However, this omnibus test does not tell us which time points differed from one another.
If we did want to find out which time points differ, we could run post-hoc analyses using something like Tukey’s test.
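The code for this step was not shown, but a sketch reconstructed from the ‘Fit:’ line in the output below would be (the object name ‘tukey.fit’ is mine):

#refit the RM ANOVA as a fixed-effects aov with Time and ID as factors,
#then run Tukey's HSD on the Time comparisons
tukey.fit <- aov(WellBeing ~ as.factor(Time) + as.factor(ID), data = long.data)
summary(tukey.fit)
TukeyHSD(tukey.fit, which = "as.factor(Time)")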
## Df Sum Sq Mean Sq F value Pr(>F)
## as.factor(Time) 3 3421 1140.4 909.63 <2e-16 ***
## as.factor(ID) 699 10845 15.5 12.38 <2e-16 ***
## Residuals 2097 2629 1.3
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = WellBeing ~ as.factor(Time) + as.factor(ID), data = long.data)
##
## $`as.factor(Time)`
## diff lwr upr p adj
## T2-T1 1.0138324 0.8599522 1.167713 0
## T3-T1 1.9667630 1.8128828 2.120643 0
## T4-T1 2.9777330 2.8238528 3.131613 0
## T3-T2 0.9529306 0.7990504 1.106811 0
## T4-T2 1.9639006 1.8100205 2.117781 0
## T4-T3 1.0109700 0.8570898 1.164850 0
Looks like all of our ‘time’ comparisons are significantly different from one another. This is not surprising if we examine the means of well-being over time, which appear to be increasing roughly linearly:
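One way to compute those four means from the wide-format data (each call prints one of the values shown below):

#mean well-being at each time point
mean(data$T1WB)
mean(data$T2WB)
mean(data$T3WB)
mean(data$T4WB)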
## [1] 0.5724903
## [1] 1.586323
## [1] 2.539253
## [1] 3.550223
Now let’s see how this can be implemented within a latent variable modeling framework using latent growth curve analysis.
b.) Run a latent growth curve model; show your syntax and your relevant output (results and graph) in your document.
Similar to previous assignments, we will be using ‘lavaan’ to fit our LGM, but this time with the ‘growth’ command. For the LGM, we will also be going back to our initial dataset, ‘data,’ which is in the wide format!
Remember that for the intercept LV that we manually define, all of our time-point variables (i.e., T1WB-T4WB) are multiplied by ‘1’ and set to load on our defined intercept factor.
#fitting intercept only LCM
int <- 'I =~ 1*T1WB + 1*T2WB + 1*T3WB + 1*T4WB'
fit.int <- growth(int, data=data, mimic='mplus')
summary(fit.int, fit.measures=TRUE, standardized=TRUE) #produces the output below
## lavaan 0.6-3 ended normally after 28 iterations
##
## Optimization method NLMINB
## Number of free parameters 6
##
## Number of observations 700
## Number of missing patterns 1
##
## Estimator ML
## Model Fit Test Statistic 2455.644
## Degrees of freedom 8
## P-value (Chi-square) 0.000
##
## Model test baseline model:
##
## Minimum Function Test Statistic 2984.372
## Degrees of freedom 6
## P-value 0.000
##
## User model versus baseline model:
##
## Comparative Fit Index (CFI) 0.178
## Tucker-Lewis Index (TLI) 0.384
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -5662.531
## Loglikelihood unrestricted model (H1) -4434.709
##
## Number of free parameters 6
## Akaike (AIC) 11337.061
## Bayesian (BIC) 11364.368
## Sample-size adjusted Bayesian (BIC) 11345.316
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.661
## 90 Percent Confidence Interval 0.639 0.683
## P-value RMSEA <= 0.05 0.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.678
##
## Parameter Estimates:
##
## Information Observed
## Observed information based on Hessian
## Standard Errors Standard
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## I =~
## T1WB 1.000 1.789 0.750
## T2WB 1.000 1.789 1.007
## T3WB 1.000 1.789 0.754
## T4WB 1.000 1.789 0.564
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .T1WB 0.000 0.000 0.000
## .T2WB 0.000 0.000 0.000
## .T3WB 0.000 0.000 0.000
## .T4WB 0.000 0.000 0.000
## I 1.574 0.069 22.930 0.000 0.880 0.880
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .T1WB 2.485 0.134 18.530 0.000 2.485 0.437
## .T2WB -0.043 0.049 -0.881 0.378 -0.043 -0.014
## .T3WB 2.424 0.138 17.514 0.000 2.424 0.431
## .T4WB 6.848 0.369 18.577 0.000 6.848 0.681
## I 3.202 0.185 17.320 0.000 1.000 1.000
The intercept-only model fits very poorly (e.g., CFI = 0.178, RMSEA = 0.661), and the negative residual variance for T2WB is another sign of misspecification. Now we will actually go ahead and add the ‘slope’ factor, which is specified similarly, except the first time point is multiplied by ‘0,’ with each subsequent time point increasing by ‘1.’ (This represents linear change.)
#fitting linear LCM
lin.LGM <- '
I =~ 1*T1WB + 1*T2WB + 1*T3WB + 1*T4WB
S =~ 0*T1WB + 1*T2WB + 2*T3WB + 3*T4WB'
fit.lin <- growth(lin.LGM, data=data, mimic='mplus')
summary(fit.lin, fit.measures=TRUE, standardized=TRUE, rsquare=TRUE) #produces the output below
## lavaan 0.6-3 ended normally after 28 iterations
##
## Optimization method NLMINB
## Number of free parameters 9
##
## Number of observations 700
## Number of missing patterns 1
##
## Estimator ML
## Model Fit Test Statistic 3.300
## Degrees of freedom 5
## P-value (Chi-square) 0.654
##
## Model test baseline model:
##
## Minimum Function Test Statistic 2984.372
## Degrees of freedom 6
## P-value 0.000
##
## User model versus baseline model:
##
## Comparative Fit Index (CFI) 1.000
## Tucker-Lewis Index (TLI) 1.001
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -4436.359
## Loglikelihood unrestricted model (H1) -4434.709
##
## Number of free parameters 9
## Akaike (AIC) 8890.717
## Bayesian (BIC) 8931.677
## Sample-size adjusted Bayesian (BIC) 8903.100
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000
## 90 Percent Confidence Interval 0.000 0.042
## P-value RMSEA <= 0.05 0.979
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.010
##
## Parameter Estimates:
##
## Information Observed
## Observed information based on Hessian
## Standard Errors Standard
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## I =~
## T1WB 1.000 1.092 0.835
## T2WB 1.000 1.092 0.609
## T3WB 1.000 1.092 0.466
## T4WB 1.000 1.092 0.367
## S =~
## T1WB 0.000 0.000 0.000
## T2WB 1.000 0.663 0.370
## T3WB 2.000 1.326 0.566
## T4WB 3.000 1.989 0.669
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## I ~~
## S 0.521 0.039 13.225 0.000 0.720 0.720
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .T1WB 0.000 0.000 0.000
## .T2WB 0.000 0.000 0.000
## .T3WB 0.000 0.000 0.000
## .T4WB 0.000 0.000 0.000
## I 0.579 0.047 12.271 0.000 0.530 0.530
## S 0.988 0.028 35.362 0.000 1.491 1.491
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .T1WB 0.517 0.049 10.472 0.000 0.517 0.302
## .T2WB 0.534 0.036 14.823 0.000 0.534 0.167
## .T3WB 0.463 0.042 11.008 0.000 0.463 0.084
## .T4WB 0.565 0.075 7.502 0.000 0.565 0.064
## I 1.192 0.088 13.617 0.000 1.000 1.000
## S 0.440 0.030 14.538 0.000 1.000 1.000
##
## R-Square:
## Estimate
## T1WB 0.698
## T2WB 0.833
## T3WB 0.916
## T4WB 0.936
Compare the intercept-only and linear models:
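The comparison itself can be run with lavaan’s anova() method; a minimal sketch that produces the table below:

#likelihood-ratio (chi-square difference) test: intercept-only vs. linear LGM
anova(fit.int, fit.lin)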
## Chi Square Difference Test
##
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## fit.lin 5 8890.7 8931.7 3.2997
## fit.int 8 11337.1 11364.4 2455.6436 2452.3 3 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The R2 values increase over time for well-being, with 93.6% of the variance at time 4 explained by the intercept and slope.
Now let’s see if there is a quadratic effect present in the data.
The syntax for this model is identical to the linear LGM, except we now add a quadratic effect by creating a third latent variable, defined by our four well-being time variables with the linear loadings squared (so 0, 1, 2, 3 becomes 0, 1, 4, 9).
#fitting quadratic LCM
quad <- '
I =~ 1*T1WB + 1*T2WB + 1*T3WB + 1*T4WB
Lin.S =~ 0*T1WB + 1*T2WB + 2*T3WB + 3*T4WB #linear slope
Q =~ 0*T1WB + 1*T2WB + 4*T3WB + 9*T4WB #quadratic effect
'
fit.quad <- growth(quad, data=data, mimic='mplus')
summary(fit.quad, fit.measures=TRUE, standardized=TRUE, rsquare=TRUE) #produces the output below
## lavaan 0.6-3 ended normally after 68 iterations
##
## Optimization method NLMINB
## Number of free parameters 13
##
## Number of observations 700
## Number of missing patterns 1
##
## Estimator ML
## Model Fit Test Statistic 1.015
## Degrees of freedom 1
## P-value (Chi-square) 0.314
##
## Model test baseline model:
##
## Minimum Function Test Statistic 2984.372
## Degrees of freedom 6
## P-value 0.000
##
## User model versus baseline model:
##
## Comparative Fit Index (CFI) 1.000
## Tucker-Lewis Index (TLI) 1.000
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -4435.216
## Loglikelihood unrestricted model (H1) -4434.709
##
## Number of free parameters 13
## Akaike (AIC) 8896.433
## Bayesian (BIC) 8955.597
## Sample-size adjusted Bayesian (BIC) 8914.319
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.005
## 90 Percent Confidence Interval 0.000 0.100
## P-value RMSEA <= 0.05 0.634
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.004
##
## Parameter Estimates:
##
## Information Observed
## Observed information based on Hessian
## Standard Errors Standard
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## I =~
## T1WB 1.000 1.094 0.832
## T2WB 1.000 1.094 0.618
## T3WB 1.000 1.094 0.465
## T4WB 1.000 1.094 0.368
## Lin.S =~
## T1WB 0.000 0.000 0.000
## T2WB 1.000 0.695 0.392
## T3WB 2.000 1.390 0.591
## T4WB 3.000 2.085 0.701
## Q =~
## T1WB 0.000 0.000 0.000
## T2WB 1.000 0.157 0.089
## T3WB 4.000 0.628 0.267
## T4WB 9.000 1.414 0.475
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## I ~~
## Lin.S 0.512 0.176 2.907 0.004 0.673 0.673
## Q 0.008 0.042 0.182 0.855 0.045 0.045
## Lin.S ~~
## Q -0.039 0.040 -0.989 0.323 -0.359 -0.359
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .T1WB 0.000 0.000 0.000
## .T2WB 0.000 0.000 0.000
## .T3WB 0.000 0.000 0.000
## .T4WB 0.000 0.000 0.000
## I 0.579 0.049 11.740 0.000 0.529 0.529
## Lin.S 0.990 0.049 20.123 0.000 1.425 1.425
## Q -0.000 0.014 -0.017 0.986 -0.002 -0.002
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .T1WB 0.531 0.157 3.389 0.001 0.531 0.307
## .T2WB 0.474 0.072 6.577 0.000 0.474 0.151
## .T3WB 0.528 0.095 5.573 0.000 0.528 0.095
## .T4WB 0.211 0.352 0.601 0.548 0.211 0.024
## I 1.198 0.167 7.180 0.000 1.000 1.000
## Lin.S 0.483 0.182 2.651 0.008 1.000 1.000
## Q 0.025 0.017 1.431 0.152 1.000 1.000
##
## R-Square:
## Estimate
## T1WB 0.693
## T2WB 0.849
## T3WB 0.905
## T4WB 0.976
Compare the linear-only LGM versus the quadratic LGM:
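As before, a minimal sketch of the comparison call:

#likelihood-ratio (chi-square difference) test: linear vs. quadratic LGM
anova(fit.lin, fit.quad)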
## Chi Square Difference Test
##
## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
## fit.quad 1 8896.4 8955.6 1.0151
## fit.lin 5 8890.7 8931.7 3.2997 2.2846 4 0.6836
The difference test is nonsignificant (p = .68), so the quadratic terms do not improve fit and we retain the more parsimonious linear model. If we want to add a covariate into the model, we simply modify our original code by regressing both the ‘intercept’ and ‘slope’ factors on SES.
#fitting linear LCM with covariate, SES
lin.LGMcov <- '
I =~ 1*T1WB + 1*T2WB + 1*T3WB + 1*T4WB
S =~ 0*T1WB + 1*T2WB + 2*T3WB + 3*T4WB
I ~ SES #intercept regressed on SES (covariate)
S ~ SES #slope regressed on SES (covariate)
'
fit.lincov <- growth(lin.LGMcov, data=data)
summary(fit.lincov, fit.measures=TRUE, standardized=TRUE, rsquare=TRUE) #produces the output below
## lavaan 0.6-3 ended normally after 27 iterations
##
## Optimization method NLMINB
## Number of free parameters 11
##
## Number of observations 700
##
## Estimator ML
## Model Fit Test Statistic 6.278
## Degrees of freedom 7
## P-value (Chi-square) 0.508
##
## Model test baseline model:
##
## Minimum Function Test Statistic 3393.629
## Degrees of freedom 10
## P-value 0.000
##
## User model versus baseline model:
##
## Comparative Fit Index (CFI) 1.000
## Tucker-Lewis Index (TLI) 1.000
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -4233.219
## Loglikelihood unrestricted model (H1) -4230.080
##
## Number of free parameters 11
## Akaike (AIC) 8488.438
## Bayesian (BIC) 8538.500
## Sample-size adjusted Bayesian (BIC) 8503.573
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000
## 90 Percent Confidence Interval 0.000 0.044
## P-value RMSEA <= 0.05 0.978
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.011
##
## Parameter Estimates:
##
## Information Expected
## Information saturated (h1) model Structured
## Standard Errors Standard
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## I =~
## T1WB 1.000 1.088 0.830
## T2WB 1.000 1.088 0.608
## T3WB 1.000 1.088 0.464
## T4WB 1.000 1.088 0.366
## S =~
## T1WB 0.000 0.000 0.000
## T2WB 1.000 0.662 0.370
## T3WB 2.000 1.325 0.565
## T4WB 3.000 1.987 0.668
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## I ~
## SES 0.439 0.045 9.715 0.000 0.404 0.395
## S ~
## SES 0.496 0.022 23.052 0.000 0.749 0.733
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .I ~~
## .S 0.316 0.028 11.171 0.000 0.701 0.701
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .T1WB 0.000 0.000 0.000
## .T2WB 0.000 0.000 0.000
## .T3WB 0.000 0.000 0.000
## .T4WB 0.000 0.000 0.000
## .I 0.601 0.044 13.558 0.000 0.552 0.552
## .S 1.013 0.021 48.010 0.000 1.529 1.529
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .T1WB 0.534 0.049 10.831 0.000 0.534 0.311
## .T2WB 0.530 0.036 14.858 0.000 0.530 0.166
## .T3WB 0.462 0.040 11.603 0.000 0.462 0.084
## .T4WB 0.559 0.068 8.220 0.000 0.559 0.063
## .I 0.998 0.078 12.852 0.000 0.844 0.844
## .S 0.203 0.018 10.979 0.000 0.462 0.462
##
## R-Square:
## Estimate
## T1WB 0.689
## T2WB 0.834
## T3WB 0.916
## T4WB 0.937
## I 0.156
## S 0.538
-SES has a greater impact on the overall rate of change (i.e., the slope factor), with a standardized coefficient of 0.733.
-SES also has a positive association with individuals’ starting level of well-being at T1 (i.e., the intercept factor), with a standardized coefficient of 0.395.
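If you would rather have these standardized estimates in a single table than read them off the summary output, lavaan’s standardizedSolution() function will return them (a usage note; this call is not part of the original output):

#standardized parameter estimates with standard errors and CIs
standardizedSolution(fit.lincov)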
The RM ANOVA tested whether there was a significant effect of time on well-being, and our post-hoc analyses revealed that well-being scores at all time points were significantly different from one another. This makes sense when we plot our results, because we can see the linear trajectory of average well-being over time. It also agrees with the LGM framework, where we found a linear effect for well-being over time, with the R2 values steadily increasing and no evidence for a quadratic effect. Thus, the results are consistent; however, an advantage of the LGM is that we can partition out measurement error and obtain more specific growth information that the RM ANOVA is unable to provide. The LGM also provides more information overall, including model fit indices, compared to the RM ANOVA.