While writing up the statistical details of multilevel LCA in PROC LCA, I found an important piece of information that makes me think all the mixed-effect LCA analyses we have conducted so far were somewhat misleading.
In the PROC LCA user’s guide (https://methodology.psu.edu/sites/default/files/software/proclcalta/proc_lca_lta_1-3-2-1_users_guide.pdf), the last sentence of the top paragraph on page 10 says: “Clustering is ignored for estimation purposes, but is taken into account in calculating standard errors by using a ‘robust’ or ‘sandwich’ style covariance estimate.” This contradicts the description of the CLUSTER option on page 21: “This statement tells PROC LCA that the subjects are not independent random draws, but are nested within clusters such as schools or classrooms…”
Therefore, I ran two SAS PROC LCA programs (one with the CLUSTER ID option, one without) and compared their outputs. If the data structure had been taken into account during estimation, the two programs should have given different estimates. However, they produced identical parameter estimates. This means that the clustered data structure was ignored in SAS: even with the CLUSTER ID statement, the observations were treated as independent in our previous calculations. I was really disappointed, and I am sincerely sorry about that. But the work was not in vain: digging into the methodology, I found that fitting an LCA that ignores the data structure should be the first step when performing a multilevel LCA.
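To see why a "sandwich" correction leaves the point estimates untouched, here is a toy sketch for the simplest possible estimator, a sample mean. It is not PROC LCA's actual computation; the function name `mean_with_ses`, the data, and the cluster labels are all made up for illustration, and no small-sample correction is applied.

```python
from math import sqrt

def mean_with_ses(values, cluster_ids):
    """Return (mean, naive SE, cluster-robust SE) for a sample mean.

    The point estimate never looks at cluster_ids; only the robust
    standard error does, which mirrors the behaviour described above.
    """
    n = len(values)
    mean = sum(values) / n
    residuals = [v - mean for v in values]

    # Naive variance: treat every observation as independent.
    naive_var = sum(r * r for r in residuals) / n**2

    # Sandwich variance: sum residuals within each cluster first,
    # then square the cluster totals.
    cluster_sums = {}
    for r, c in zip(residuals, cluster_ids):
        cluster_sums[c] = cluster_sums.get(c, 0.0) + r
    robust_var = sum(s * s for s in cluster_sums.values()) / n**2

    return mean, sqrt(naive_var), sqrt(robust_var)

# Hypothetical data: 4 clusters of 3 observations each.
values = [1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1]
clusters = ["A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D"]

mean, naive_se, robust_se = mean_with_ses(values, clusters)
print(f"mean={mean:.3f} naive SE={naive_se:.4f} robust SE={robust_se:.4f}")
```

Running the same function with a unique cluster label per observation gives a robust SE equal to the naive one, while the mean is the same in every case. That is the pattern seen in the two PROC LCA runs: identical parameter estimates, with clustering only able to affect the standard errors.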
According to an example paper on multilevel LCA (MLCA), “Multilevel latent class analysis: An application of adolescent smoking typologies with individual and contextual predictors”:
“MLCA accounts for the nested structure of the data by allowing latent class intercepts to vary across Level 2 units and thereby examining if and how Level 2 units influence the Level 1 latent classes. These random intercepts allow the probability of membership in a particular Level 1 latent class to vary across Level 2 units (e.g., communities).”
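One common way to write the random-intercept idea in this quotation is as a multinomial logit with a shift specific to each Level 2 unit. The notation below is an assumption made here for illustration (i indexes Level 1 units, j indexes Level 2 units, K is the number of Level 1 classes, and the last class is the reference); it is not copied from the paper or from any software manual.

```latex
P(C_{ij} = c \mid u_{1j}, \dots, u_{Kj})
  = \frac{\exp(\gamma_c + u_{cj})}{\sum_{k=1}^{K} \exp(\gamma_k + u_{kj})},
\qquad \gamma_K = 0, \quad u_{Kj} = 0 .
```

If every u_{cj} is zero, the class membership probabilities are the same in all Level 2 units and the model reduces to an ordinary single-level LCA.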
In their article, they wanted to identify smoking patterns for individuals who were nested within communities. So they applied MLCA to identify the smoking patterns in their study sample, and then, within the MLCA framework, communities were also classified by the patterns of smokers within each community.
This gave me an idea about how to deal with the “non-consistent eaters” problem that we were concerned about: we don’t need to, because a proper MLCA does it for us.
In MLCA, separate latent class models are specified for level 1 (in the NDNS data, the observations) and level 2 (in the NDNS data, the individuals). Our NDNS data therefore provide valuable information about how adults in the UK eat throughout the day over 4 days. We can use the data to capture the time patterns of eating carbohydrates (or whatever else we are interested in) from these observations, and then, based on the carbohydrate eating patterns estimated by the model, people who ate similarly across the four days are grouped together. So there are two steps of grouping.
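The two-step grouping can be sketched as a model with a latent class at each level. The symbols here are assumptions for illustration: i indexes observations (person-days), j indexes individuals, CB_j is the individual's level 2 class, CW_{ij} is the observation's level 1 class, and H_{0,ij}, ..., H_{23,ij} stand for 24 hourly indicators. The sketch also assumes that the hourly response probabilities depend only on the level 1 class, which is a modelling choice rather than a requirement.

```latex
\begin{aligned}
P(CB_j = b) &= \pi_b, \\
P(CW_{ij} = c \mid CB_j = b)
  &= \frac{\exp(\gamma_{cb})}{\sum_{k=1}^{K} \exp(\gamma_{kb})}, \\
P(H_{0,ij}, \dots, H_{23,ij} \mid CW_{ij} = c)
  &= \prod_{h=0}^{23} P(H_{h,ij} \mid CW_{ij} = c).
\end{aligned}
```

The second line is where the levels connect: each level 2 class has its own distribution over the level 1 eating patterns, so individuals are grouped by how their days are spread across those patterns.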
I have fitted an MLCA specifying three latent classes at level 1 (observations) and two latent classes at level 2 (individuals) using Mplus 7.4. The summary of the analysis and the results are shown below.
Level 1 latent classes are called within classes (CW), and level 2 latent classes are called between classes (CB).
Univariate proportions and counts for the responses on carbohydrate consumption (H0-H23 indicate the hour of the day):
UNIVARIATE PROPORTIONS AND COUNTS FOR CATEGORICAL VARIABLES
Category proportion count
H0
Not eating 0.974 23838.000
< 50% energy 0.015 364.000
>= 50% energy 0.011 281.000
H1
Not eating 0.988 24190.000
< 50% energy 0.006 150.000
>= 50% energy 0.006 143.000
H2
Not eating 0.992 24295.000
< 50% energy 0.004 89.000
>= 50% energy 0.004 99.000
H3
Not eating 0.993 24315.000
< 50% energy 0.003 77.000
>= 50% energy 0.004 91.000
H4
Not eating 0.992 24284.000
< 50% energy 0.003 73.000
>= 50% energy 0.005 126.000
H5
Not eating 0.986 24143.000
< 50% energy 0.005 125.000
>= 50% energy 0.009 215.000
H6
Not eating 0.869 21265.000
< 50% energy 0.051 1253.000
>= 50% energy 0.080 1965.000
H7
Not eating 0.678 16599.000
< 50% energy 0.095 2315.000
>= 50% energy 0.227 5569.000
H8
Not eating 0.623 15248.000
< 50% energy 0.118 2881.000
>= 50% energy 0.260 6354.000
H9
Not eating 0.660 16149.000
< 50% energy 0.117 2872.000
>= 50% energy 0.223 5462.000
H10
Not eating 0.630 15428.000
< 50% energy 0.154 3769.000
>= 50% energy 0.216 5286.000
H11
Not eating 0.723 17706.000
< 50% energy 0.121 2952.000
>= 50% energy 0.156 3825.000
H12
Not eating 0.600 14683.000
< 50% energy 0.222 5431.000
>= 50% energy 0.178 4369.000
H13
Not eating 0.501 12262.000
< 50% energy 0.277 6778.000
>= 50% energy 0.222 5443.000
H14
Not eating 0.738 18061.000
< 50% energy 0.129 3163.000
>= 50% energy 0.133 3259.000
H15
Not eating 0.676 16548.000
< 50% energy 0.144 3527.000
>= 50% energy 0.180 4408.000
H16
Not eating 0.696 17045.000
< 50% energy 0.146 3581.000
>= 50% energy 0.158 3857.000
H17
Not eating 0.685 16779.000
< 50% energy 0.189 4638.000
>= 50% energy 0.125 3066.000
H18
Not eating 0.577 14134.000
< 50% energy 0.276 6750.000
>= 50% energy 0.147 3599.000
H19
Not eating 0.621 15207.000
< 50% energy 0.242 5913.000
>= 50% energy 0.137 3363.000
H20
Not eating 0.619 15144.000
< 50% energy 0.227 5546.000
>= 50% energy 0.155 3793.000
H21
Not eating 0.641 15690.000
< 50% energy 0.199 4871.000
>= 50% energy 0.160 3922.000
H22
Not eating 0.752 18403.000
< 50% energy 0.132 3242.000
>= 50% energy 0.116 2838.000
H23
Not eating 0.921 22548.000
< 50% energy 0.043 1062.000
>= 50% energy 0.036 873.000
FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASS PATTERNS
BASED ON THEIR MOST LIKELY LATENT CLASS PATTERN
Class Counts and Proportions
Latent Class Pattern (CB CW)    count    proportion
1 1     4658    0.19025
1 2     4190    0.17114
1 3     6037    0.24658
2 1      473    0.01932
2 2     4761    0.19446
2 3     4364    0.17825
FINAL CLASS COUNTS AND PROPORTIONS FOR EACH LATENT CLASS VARIABLE
BASED ON THEIR MOST LIKELY LATENT CLASS PATTERN
Latent Class Variable    Class    count    proportion
CB    1    14885    0.60797
CB    2     9598    0.39203
CW    1     5131    0.20957
CW    2     8951    0.36560
CW    3    10401    0.42483
So, if the level 1 classes we have defined are correct, then at the individual level there are two types of people, class 1 and class 2. Individuals in class 1 have roughly evenly distributed probabilities across the carbohydrate eating patterns, while individuals in class 2 have a very low probability (4.9%) of eating a big breakfast.
Where:
Latent Class Pattern No.    CB Class    CW Class
1 1 1
2 1 2
3 1 3
4 2 1
5 2 2
6 2 3
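The 4.9% quoted above can be recovered from the pattern counts by conditioning on the level 2 class. The sketch below does that arithmetic in Python. The counts are copied from the Mplus output, while the function name `conditional_cw_given_cb` is made up for illustration. Note that these shares are based on most likely class assignment, so they need not equal the model-estimated conditional probabilities.

```python
# Most-likely-pattern counts from the Mplus output, keyed by (CB, CW).
pattern_counts = {
    (1, 1): 4658, (1, 2): 4190, (1, 3): 6037,
    (2, 1): 473,  (2, 2): 4761, (2, 3): 4364,
}

def conditional_cw_given_cb(counts):
    """Share of each level 1 class (CW) within each level 2 class (CB)."""
    cb_totals = {}
    for (cb, _cw), n in counts.items():
        cb_totals[cb] = cb_totals.get(cb, 0) + n
    return {(cb, cw): n / cb_totals[cb] for (cb, cw), n in counts.items()}

cond = conditional_cw_given_cb(pattern_counts)
for (cb, cw), p in sorted(cond.items()):
    print(f"P(CW={cw} | CB={cb}) = {p:.3f}")
```

For CB class 2, the share of CW class 1 is 473 / 9598, or about 0.049, which matches the 4.9% in the text. For CB class 1 the three shares are roughly 0.31, 0.28 and 0.41.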