1 Data description

1.1 Number of subjects per survey year

with(NDNS, tab1(SurveyYear, graph = FALSE, decimal = 2))
## SurveyYear : 
##             Frequency Percent Cum. percent
## NDNS Year 1       801   13.01        13.01
## NDNS Year 2       812   13.19        26.21
## NDNS Year 3       782   12.71        38.91
## NDNS Year 4      1055   17.14        56.05
## NDNS Year 5       625   10.15        66.21
## NDNS Year 6       663   10.77        76.98
## NDNS Year 7       703   11.42        88.40
## NDNS Year 8       714   11.60       100.00
##   Total          6155  100.00       100.00

1.2 Number of subjects per survey year by sex (1 = men, 2 = women)

with(NDNS, tabpct(SurveyYear, Sex,  graph = FALSE, decimal = 2))
## 
## Original table 
##              Sex
## SurveyYear        1     2  Total
##   NDNS Year 1   336   465    801
##   NDNS Year 2   350   462    812
##   NDNS Year 3   339   443    782
##   NDNS Year 4   418   637   1055
##   NDNS Year 5   249   376    625
##   NDNS Year 6   254   409    663
##   NDNS Year 7   321   382    703
##   NDNS Year 8   270   444    714
##   Total        2537  3618   6155
## 
## Row percent 
##              Sex
## SurveyYear          1       2  Total
##   NDNS Year 1     336     465    801
##                (41.9)  (58.1)  (100)
##   NDNS Year 2     350     462    812
##                (43.1)  (56.9)  (100)
##   NDNS Year 3     339     443    782
##                (43.4)  (56.6)  (100)
##   NDNS Year 4     418     637   1055
##                (39.6)  (60.4)  (100)
##   NDNS Year 5     249     376    625
##                (39.8)  (60.2)  (100)
##   NDNS Year 6     254     409    663
##                (38.3)  (61.7)  (100)
##   NDNS Year 7     321     382    703
##                (45.7)  (54.3)  (100)
##   NDNS Year 8     270     444    714
##                (37.8)  (62.2)  (100)
## 
## Column percent 
##              Sex
## SurveyYear        1        %     2        %
##   NDNS Year 1   336  (13.24)   465  (12.85)
##   NDNS Year 2   350  (13.80)   462  (12.77)
##   NDNS Year 3   339  (13.36)   443  (12.24)
##   NDNS Year 4   418  (16.48)   637  (17.61)
##   NDNS Year 5   249   (9.81)   376  (10.39)
##   NDNS Year 6   254  (10.01)   409  (11.30)
##   NDNS Year 7   321  (12.65)   382  (10.56)
##   NDNS Year 8   270  (10.64)   444  (12.27)
##   Total        2537    (100)  3618    (100)

1.3 Summary of age by survey year and sex

NDNS %>% 
  group_by(SurveyYear, Sex) %>% 
  summarise(N = n(), MeanAge = mean(Age), SDAge = sd(Age), minAge = min(Age), maxAge = max(Age))
## # A tibble: 16 x 7
## # Groups:   SurveyYear [?]
##    SurveyYear  Sex       N MeanAge SDAge minAge maxAge
##    <chr>       <chr> <int>   <dbl> <dbl>  <dbl>  <dbl>
##  1 NDNS Year 1 1       336    49.9  17.3     19     86
##  2 NDNS Year 1 2       465    49.2  17.8     19     94
##  3 NDNS Year 2 1       350    48.9  17.3     19     96
##  4 NDNS Year 2 2       462    50.2  17.9     19     92
##  5 NDNS Year 3 1       339    48.5  16.9     19     87
##  6 NDNS Year 3 2       443    49.6  18.0     19     93
##  7 NDNS Year 4 1       418    51.4  17.0     19     90
##  8 NDNS Year 4 2       637    48.8  17.0     19     94
##  9 NDNS Year 5 1       249    51.7  16.6     19     93
## 10 NDNS Year 5 2       376    49.4  17.8     19     92
## 11 NDNS Year 6 1       254    51.1  17.7     19     93
## 12 NDNS Year 6 2       409    49.0  18.1     19     95
## 13 NDNS Year 7 1       321    51.5  18.3     19     92
## 14 NDNS Year 7 2       382    49.8  18.3     19     89
## 15 NDNS Year 8 1       270    50.4  16.8     19     90
## 16 NDNS Year 8 2       444    49.9  17.7     19     94

1.4 Distribution of the dietary data (by Day of Week)

1.4.1 Day1

rm(list=ls(all=TRUE))
load("~/Documents/LSHTMproject/Rcode/NDNSday1_4.Rdata")
with(dta_d1_wide, tab1(DayofWeek, graph = FALSE, decimal = 2))
## DayofWeek : 
##           Frequency Percent Cum. percent
## Monday          726   11.80        11.80
## Tuesday         847   13.77        25.56
## Wednesday       814   13.23        38.79
## Thursday       1082   17.58        56.38
## Friday         1013   16.46        72.84
## Saturday        848   13.78        86.62
## Sunday          823   13.38       100.00
##   Total        6153  100.00       100.00

1.4.2 Day2

with(dta_d2_wide, tab1(DayofWeek, graph = FALSE, decimal = 2))
## DayofWeek : 
##           Frequency Percent Cum. percent
## Monday          822   13.36        13.36
## Tuesday         727   11.82        25.17
## Wednesday       848   13.78        38.96
## Thursday        812   13.20        52.15
## Friday         1081   17.57        69.72
## Saturday       1015   16.50        86.22
## Sunday          848   13.78       100.00
##   Total        6153  100.00       100.00

1.4.3 Day3

with(dta_d3_wide, tab1(DayofWeek, graph = FALSE, decimal = 2))
## DayofWeek : 
##           Frequency Percent Cum. percent
## Monday          846   13.75        13.75
## Tuesday         824   13.40        27.15
## Wednesday       725   11.79        38.94
## Thursday        850   13.82        52.76
## Friday          810   13.17        65.92
## Saturday       1080   17.56        83.48
## Sunday         1016   16.52       100.00
##   Total        6151  100.00       100.00

1.4.4 Day4

with(dta_d4_wide, tab1(DayofWeek, graph = FALSE, decimal = 2))
## DayofWeek : 
##           Frequency Percent Cum. percent
## Monday          994   16.50        16.50
## Tuesday         833   13.82        30.32
## Wednesday       811   13.46        43.78
## Thursday        705   11.70        55.48
## Friday          830   13.77        69.25
## Saturday        792   13.14        82.39
## Sunday         1061   17.61       100.00
##   Total        6026  100.00       100.00

1.5 Distribution of the dietary data (by DayNo)

1.5.1 Monday

rm(list=ls(all=TRUE))
load("~/Documents/LSHTMproject/Rcode/NDNSMon_Sun.Rdata")
with(dta_Mon_wide, tab1(DayNo, graph = FALSE, decimal = 2))
## DayNo : 
##         Frequency Percent Cum. percent
## 1             726   21.43        21.43
## 2             822   24.26        45.69
## 3             846   24.97        70.66
## 4             994   29.34       100.00
##   Total      3388  100.00       100.00

1.5.2 Tuesday

with(dta_Tue_wide, tab1(DayNo, graph = FALSE, decimal = 2))
## DayNo : 
##         Frequency Percent Cum. percent
## 1             847   26.21        26.21
## 2             727   22.50        48.72
## 3             824   25.50        74.22
## 4             833   25.78       100.00
##   Total      3231  100.00       100.00

1.5.3 Wednesday

with(dta_Wed_wide, tab1(DayNo, graph = FALSE, decimal = 2))
## DayNo : 
##         Frequency Percent Cum. percent
## 1             814   25.45        25.45
## 2             848   26.52        51.97
## 3             725   22.67        74.64
## 4             811   25.36       100.00
##   Total      3198  100.00       100.00

1.5.4 Thursday

with(dta_Thurs_wide, tab1(DayNo, graph = FALSE, decimal = 2))
## DayNo : 
##         Frequency Percent Cum. percent
## 1            1082   31.37        31.37
## 2             812   23.54        54.91
## 3             850   24.64        79.56
## 4             705   20.44       100.00
##   Total      3449  100.00       100.00

1.5.5 Friday

with(dta_Fri_wide, tab1(DayNo, graph = FALSE, decimal = 2))
## DayNo : 
##         Frequency Percent Cum. percent
## 1            1013   27.13        27.13
## 2            1081   28.95        56.08
## 3             810   21.69        77.77
## 4             830   22.23       100.00
##   Total      3734  100.00       100.00

1.5.6 Saturday

with(dta_Sat_wide, tab1(DayNo, graph = FALSE, decimal = 2))
## DayNo : 
##         Frequency Percent Cum. percent
## 1             848   22.70        22.70
## 2            1015   27.18        49.88
## 3            1080   28.92        78.80
## 4             792   21.20       100.00
##   Total      3735  100.00       100.00

1.5.7 Sunday

with(dta_Sun_wide, tab1(DayNo, graph = FALSE, decimal = 2))
## DayNo : 
##         Frequency Percent Cum. percent
## 1             823   21.96        21.96
## 2             848   22.63        44.58
## 3            1016   27.11        71.69
## 4            1061   28.31       100.00
##   Total      3748  100.00       100.00

The problem with analysing the data by day of the week is that each subject contributed only 2-4 days of data. We therefore cannot follow any one subject’s diet across a full Monday-to-Sunday week.

1.6 Dietary data by day

rm(list=ls(all=TRUE))
load("~/Documents/LSHTMproject/Rcode/NDNSday1_4.Rdata")

vecid <- unique(dfs3$id)

vecid1<-unique(dta_day1$id) # n = 6153
vecid2<-unique(dta_day2$id) # n = 6153
vecid3<-unique(dta_day3$id) # n = 6151
vecid4<-unique(dta_day4$id) # n = 6026


setdiff(vecid, vecid1) # two subjects did not have day 1 data
## [1] 50506161 70908241
setdiff(vecid, vecid2) # two subjects did not have day 2 data
## [1] 31012251 40714261
setdiff(vecid, vecid3) # four subjects did not have day 3 data
## [1] 10914251 11205071 80702191 81210131
setdiff(vecid, vecid4) # 129 subjects did not have day 4 data
##   [1] 10112011 10701161 10702161 10707261 10906181 10910111 10914251
##   [8] 20106041 20116171 20202081 20205081 20301211 20307041 20405101
##  [15] 20509211 20602011 20615041 21002101 21011041 21107031 21113041
##  [22] 21211041 21211101 30113231 30205131 30205201 30402131 30404081
##  [29] 30411081 30417081 30603071 30605131 30609131 30708201 30709031
##  [36] 30906071 30906201 30907251 30912021 31110201 40101011 40104021
##  [43] 40109221 40116011 40214081 40221221 40315101 40402221 40410251
##  [50] 40504211 40506221 40516021 40710081 40710101 40714251 40714261
##  [57] 40803081 40803221 40808081 40814131 40816011 40902051 40904021
##  [64] 41012081 41016131 41202051 50104191 50105161 50306241 50310271
##  [71] 50501271 50504271 50710161 51002141 51002191 51004011 51102241
##  [78] 51203191 51205141 51208041 51209071 60202081 60202261 60206161
##  [85] 60310131 60313021 60405161 60508071 60606271 60808161 60909271
##  [92] 61013261 61102251 61109081 70113191 70302241 70305031 70309241
##  [99] 70311181 70311251 70407251 70613181 70703181 70714181 70802241
## [106] 70812251 70815241 71101061 71206191 80108061 80301061 80301281
## [113] 80302191 80308251 80312241 80405181 80405281 80410131 80611131
## [120] 80713281 80805191 81002251 81004251 81005191 81007061 81101221
## [127] 81110061 81110131 81203221

2 Definition of carbohydrate intake groups

In this analysis, each subject has 2-4 days of dietary data, self-recorded during participation. From these records we calculated energy intake for each hour and, for every hour with recorded food consumption, the percentage of that hour’s energy that came from carbohydrates. Each hour was then assigned to one of four categories:

  1. Not eating;
  2. Eating low carbohydrate food (energy contribution less than or equal to 25%);
  3. Eating medium carbohydrate food (energy contribution higher than 25% and lower than 75%);
  4. Eating high carbohydrate food (energy contribution higher than or equal to 75%).
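The categorisation above can be sketched in a few lines of base R. This is only an illustration: the function name `carb_category` is hypothetical, `NA` is assumed to mark an hour with no recorded food, and the treatment of the exact boundaries (25% counts as low, 75% counts as high) is an assumption about how the cut-offs were applied.

```r
# Hypothetical helper: map the percentage of an hour's energy that comes
# from carbohydrate to the four categories listed above.
# Assumption: NA means that no food was recorded in that hour.
carb_category <- function(pct) {
  code <- ifelse(is.na(pct), 1L,          # 1 = not eating
          ifelse(pct <= 25, 2L,           # 2 = low carbohydrate
          ifelse(pct < 75, 3L, 4L)))      # 3 = medium, 4 = high
  factor(code, levels = 1:4,
         labels = c("Not eating", "Low", "Medium", "High"))
}

# Example hours: nothing eaten, then increasing carbohydrate shares
carb_category(c(NA, 10, 25, 26, 74.9, 75, 90))
```

Coding the result as a factor with levels 1 to 4 keeps the categories in the positive-integer form that latent class software usually expects for its manifest items.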

3 LCA analyses by day

3.1 Day 1

set.seed(01012)
max_II <- 100000
lc1<-poLCA(f, data=dta_d1_wide, nclass=1, na.rm = FALSE, nrep=20, maxiter=max_II) #Loglinear independence model.
lc2<-poLCA(f, data=dta_d1_wide, nclass=2, na.rm = FALSE, nrep=20, maxiter=max_II)
lc3<-poLCA(f, data=dta_d1_wide, nclass=3, na.rm = FALSE, nrep=20, maxiter=max_II)
lc4<-poLCA(f, data=dta_d1_wide, nclass=4, na.rm = FALSE, nrep=20, maxiter=max_II) 
lc5<-poLCA(f, data=dta_d1_wide, nclass=5, na.rm = FALSE, nrep=20, maxiter=max_II)
lc6<-poLCA(f, data=dta_d1_wide, nclass=6, na.rm = FALSE, nrep=20, maxiter=max_II)
lc7<-poLCA(f, data=dta_d1_wide, nclass=7, na.rm = FALSE, nrep=20, maxiter=max_II)
lc8<-poLCA(f, data=dta_d1_wide, nclass=8, na.rm = FALSE, nrep=20, maxiter=max_II)
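The model formula `f` is used above but not shown in this section. A plausible construction, assuming the wide data sets carry the 24 hourly items under the names H0 to H23 (the names used in the SAS code later in this report) and that no covariates enter the model, would be:

```r
# Assumption: the hourly items are called H0 ... H23 in the wide data sets.
item_names <- paste0("H", 0:23)

# Latent class model without covariates: cbind(items) ~ 1
f <- as.formula(paste0("cbind(", paste(item_names, collapse = ", "), ") ~ 1"))
f
```

Building the formula from a vector of names avoids typing all 24 items by hand and makes it easy to check that none is missing.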

3.1.1 Model comparison and selection

Model Comparison. (Day 1, n = 6153)
N of Class  log-likelihood  resid. df       AIC       BIC      aBIC      cAIC  likelihood-ratio  Entropy
         1       -98828.36       6081  197800.7  198284.9  198056.1  198356.9          91153.99
         2       -97963.22       6008  196216.4  197191.5  196730.7  197336.5          89423.72    0.788
         3       -97484.48       5935  195405.0  196870.9  196178.2  197088.9          88466.24    0.747
         4       -97120.79       5862  194823.6  196780.5  195855.8  197071.5          87738.86     0.72
         5       -96753.21       5789  194234.4  196682.2  195525.5  197046.2          87003.69    0.679
         6       -96474.32       5716  193822.6  196761.3  195372.7  197198.3          86445.91    0.679
         7       -96163.55       5643  193347.1  196776.7  195156.0  197286.7          85824.37    0.743
         8       -95938.79       5570  193043.6  196964.1  195111.5  197547.1          85374.85    0.741
Note:
Abbreviations: N, number; resid. df, residual degrees of freedom; AIC, Akaike information criterion; BIC, Bayesian information criterion; aBIC, adjusted BIC; cAIC, consistent AIC; Entropy, a pseudo-r-squared index.
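The information criteria in the table can be recomputed from the log-likelihood, the number of free parameters and the sample size. The sketch below uses the standard textbook definitions; the function name `fit_criteria` is hypothetical, and that the original helper code used exactly these formulas is an inference from the numbers. The parameter count for the one-class model (72) is taken from the residual degrees of freedom in the table (6153 - 6081).

```r
# Standard definitions of the criteria, written in terms of the deviance.
fit_criteria <- function(loglik, n_par, n_obs) {
  dev <- -2 * loglik
  c(AIC  = dev + 2 * n_par,
    BIC  = dev + n_par * log(n_obs),
    aBIC = dev + n_par * log((n_obs + 2) / 24),  # sample-size adjusted BIC
    cAIC = dev + n_par * (log(n_obs) + 1))       # consistent AIC
}

# One-class model, day 1: 24 hourly items with 4 categories each,
# i.e. 24 * (4 - 1) = 72 free parameters.
round(fit_criteria(loglik = -98828.36, n_par = 72, n_obs = 6153), 1)
```

The four values agree to one decimal place with the first row of the day 1 table, which supports reading the columns this way.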

3.1.2 Day 1 data (Fig. 7 classes)

3.2 Day 2

set.seed(01012)
max_II <- 100000
lc1<-poLCA(f, data=dta_d2_wide, nclass=1, na.rm = FALSE, nrep=20, maxiter=max_II) #Loglinear independence model.
lc2<-poLCA(f, data=dta_d2_wide, nclass=2, na.rm = FALSE, nrep=20, maxiter=max_II)
lc3<-poLCA(f, data=dta_d2_wide, nclass=3, na.rm = FALSE, nrep=20, maxiter=max_II)
lc4<-poLCA(f, data=dta_d2_wide, nclass=4, na.rm = FALSE, nrep=20, maxiter=max_II)
lc5<-poLCA(f, data=dta_d2_wide, nclass=5, na.rm = FALSE, nrep=20, maxiter=max_II)
lc6<-poLCA(f, data=dta_d2_wide, nclass=6, na.rm = FALSE, nrep=20, maxiter=max_II)
lc7<-poLCA(f, data=dta_d2_wide, nclass=7, na.rm = FALSE, nrep=20, maxiter=max_II)
lc8<-poLCA(f, data=dta_d2_wide, nclass=8, na.rm = FALSE, nrep=20, maxiter=max_II)

# running time:  31.60514 mins

3.2.1 Model comparison and selection

Model Comparison. (Day 2, n = 6153)
N of Class  log-likelihood  resid. df       AIC       BIC      aBIC      cAIC  likelihood-ratio  Entropy
         1       -97465.15       6081  195074.3  195558.5  195329.7  195630.5          88514.23
         2       -96552.25       6008  193394.5  194369.6  193908.8  194514.6          86688.42    0.816
         3       -96039.99       5935  192516.0  193982.0  193289.2  194200.0          85663.90    0.737
         4       -95629.78       5862  191841.6  193798.5  192873.7  194089.5          84843.50    0.694
         5       -95306.17       5789  191340.3  193788.1  192631.4  194152.1          84196.26    0.719
         6       -95032.54       5716  190939.1  193877.8  192489.1  194314.8          83649.01    0.683
         7       -94784.05       5643  190588.1  194017.7  192397.1  194527.7          83152.03    0.718
         8       -94546.10       5570  190258.2  194178.7  192326.1  194761.7          82676.12     0.71
Note:
Abbreviations: N, number; resid. df, residual degrees of freedom; AIC, Akaike information criterion; BIC, Bayesian information criterion; aBIC, adjusted BIC; cAIC, consistent AIC; Entropy, a pseudo-r-squared index.

3.2.2 Day 2 data (Fig. 7 classes)

3.3 Day 3

set.seed(01012)
max_II <- 100000
lc1 <- poLCA(f, data=dta_d3_wide, nclass=1, na.rm = FALSE, nrep=20, maxiter=max_II) #Loglinear independence model.
lc2 <- poLCA(f, data=dta_d3_wide, nclass=2, na.rm = FALSE, nrep=20, maxiter=max_II)
lc3 <- poLCA(f, data=dta_d3_wide, nclass=3, na.rm = FALSE, nrep=20, maxiter=max_II)
lc4 <- poLCA(f, data=dta_d3_wide, nclass=4, na.rm = FALSE, nrep=20, maxiter=max_II) 
lc5 <- poLCA(f, data=dta_d3_wide, nclass=5, na.rm = FALSE, nrep=20, maxiter=max_II)
lc6 <- poLCA(f, data=dta_d3_wide, nclass=6, na.rm = FALSE, nrep=20, maxiter=max_II)
lc7 <- poLCA(f, data=dta_d3_wide, nclass=7, na.rm = FALSE, nrep=20, maxiter=max_II)
lc8 <- poLCA(f, data=dta_d3_wide, nclass=8, na.rm = FALSE, nrep=20, maxiter=max_II)

# running time:  31.65244 mins

3.3.1 Model comparison and selection

Model Comparison. (Day 3, n = 6151)
N of Class  log-likelihood  resid. df       AIC       BIC      aBIC      cAIC  likelihood-ratio  Entropy
         1       -96133.21       6079  192410.4  192894.6  192665.8  192966.6          86105.74
         2       -95229.29       6006  190748.6  191723.6  191262.9  191868.6          84297.92    0.807
         3       -94708.16       5933  189852.3  191318.2  190625.5  191536.2          83255.65    0.759
         4       -94325.31       5860  189232.6  191189.4  190264.7  191480.4          82489.95    0.774
         5       -93983.02       5787  188694.0  191141.7  189985.0  191505.7          81805.37    0.722
         6       -93654.43       5714  188182.9  191121.4  189732.7  191558.4          81148.19    0.694
         7       -93385.94       5641  187791.9  191221.3  189600.7  191731.3          80611.21    0.715
         8       -93183.36       5568  187532.7  191453.0  189600.4  192036.0          80206.05    0.747
Note:
Abbreviations: N, number; resid. df, residual degrees of freedom; AIC, Akaike information criterion; BIC, Bayesian information criterion; aBIC, adjusted BIC; cAIC, consistent AIC; Entropy, a pseudo-r-squared index.

3.3.2 Day 3 data (Fig. 7 classes)

3.4 Day 4

set.seed(01012)
max_II <- 100000
lc1<-poLCA(f, data=dta_d4_wide, nclass=1, na.rm = FALSE, nrep=20, maxiter=max_II) #Loglinear independence model.
lc2<-poLCA(f, data=dta_d4_wide, nclass=2, na.rm = FALSE, nrep=20, maxiter=max_II)
lc3<-poLCA(f, data=dta_d4_wide, nclass=3, na.rm = FALSE, nrep=20, maxiter=max_II)
lc4<-poLCA(f, data=dta_d4_wide, nclass=4, na.rm = FALSE, nrep=20, maxiter=max_II) 
lc5<-poLCA(f, data=dta_d4_wide, nclass=5, na.rm = FALSE, nrep=20, maxiter=max_II)
lc6<-poLCA(f, data=dta_d4_wide, nclass=6, na.rm = FALSE, nrep=20, maxiter=max_II)
lc7<-poLCA(f, data=dta_d4_wide, nclass=7, na.rm = FALSE, nrep=20, maxiter=max_II)
lc8<-poLCA(f, data=dta_d4_wide, nclass=8, na.rm = FALSE, nrep=20, maxiter=max_II)

# running time:  27.12666 mins

3.4.1 Model comparison and selection

Model Comparison. (Day 4, n = 6026)
N of Class  log-likelihood  resid. df       AIC       BIC      aBIC      cAIC  likelihood-ratio  Entropy
         1       -92490.81       5954  185125.6  185608.3  185379.5  185680.3          81316.02
         2       -91693.48       5881  183677.0  184649.0  184188.2  184794.0          79721.37    0.789
         3       -91150.49       5808  182737.0  184198.4  183505.7  184416.4          78635.39    0.777
         4       -90739.65       5735  182061.3  184012.1  183087.4  184303.1          77813.71    0.717
         5       -90396.21       5662  181520.4  183960.6  182803.9  184324.6          77126.84    0.733
         6       -90104.35       5589  181082.7  184012.3  182623.6  184449.3          76543.11    0.719
         7       -89839.94       5516  180699.9  184118.8  182498.2  184628.8          76014.29    0.625
         8       -89634.70       5443  180435.4  184343.7  182491.1  184926.7          75603.81     0.76
Note:
Abbreviations: N, number; resid. df, residual degrees of freedom; AIC, Akaike information criterion; BIC, Bayesian information criterion; aBIC, adjusted BIC; cAIC, consistent AIC; Entropy, a pseudo-r-squared index.

3.4.2 Day 4 data (Fig. 8 classes)

4 How to deal with within-individual variation?

  1. LTA (latent transition analysis): this method estimates the probability of moving from one class to another, conditional on the class at the previous occasion. I don’t think we can apply it here, because the four diary days fall on different days of the week for different subjects in the NDNS data set:
    • For example, subject A provided dietary data for Monday (day 1), Tuesday (day 2), Wednesday (day 3), and Thursday (day 4);
    • Subject B, however, provided dietary data for Thursday (day 1), Friday (day 2), Saturday (day 3), and Sunday (day 4);
    • Using day 1 (or any single one of the four days) alone would not be appropriate either: people started on different days, so day 1 may be a weekday for some and a weekend day for others.
    • LTA would therefore be a good choice if all subjects had been measured on the same days of the week. In our case, it may not be a good option.
  2. LCA with random effect: this approach uses all of the data we have and takes within-person variation into account. By random effect we mean that some observations in the data set are not independent of each other: every 2-4 observations come from the same person, in other words they are nested within the person. We should therefore tell the software that it is the persons, rather than the observations, that are independent of each other. This is more reasonable than forcing a subject to stay in the same class on all four days, because a person may well belong to one class for a few days and move to another class on the other day(s) of the survey.

Therefore, we chose the second method for the analyses. It can be run by installing the PROC LCA add-on for SAS (https://methodology.psu.edu/downloads/proclcalta) and using the PROC LCA command.
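The nested layout that the second method relies on can be sketched as follows: the per-day data sets are stacked so that each row is one person-day, and the person identifier marks the cluster. The toy data frames, their values and the two stand-in items H0 and H1 are hypothetical; the real data have 24 hourly items and up to four days per person.

```r
# Toy stand-ins for the per-day wide data (hypothetical values).
day1 <- data.frame(id = c(101, 102, 103), H0 = c(1, 1, 2), H1 = c(3, 1, 4))
day2 <- data.frame(id = c(101, 102, 103), H0 = c(1, 2, 1), H1 = c(3, 3, 1))
day3 <- data.frame(id = c(101, 103),      H0 = c(1, 1),    H1 = c(2, 4))

# Stack the days, remembering which diary day each row came from.
days <- list(day1, day2, day3)
stacked <- do.call(rbind, Map(function(d, k) cbind(d, DayNo = k),
                              days, seq_along(days)))

# One unique key per person-day; id itself identifies the cluster.
stacked$id_day <- paste(stacked$id, stacked$DayNo, sep = "_")
stacked <- stacked[order(stacked$id, stacked$DayNo), ]

# Number of person-days contributed by each person
cluster_size <- table(stacked$id)
cluster_size
```

Each row is then classified separately, so a person is free to fall into different classes on different days, while the cluster variable tells the software that rows sharing an id are not independent.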

5 Mixed effect LCA (or shall we call it multilevel LCA?) results

5.1 Input and Output example when fitting a three class LCA model

5.1.1 SAS code for running the three-class mixed effect LCA on the NDNS data

PROC LCA DATA=store.NDNS OUTPOST=STORE.NDNS_PP3 OUTEST=STORE.NDNS_EST3
            OUTPARAM=STORE.NDNS_PARAM3; /* input data set, plus output data sets for posterior probabilities (OUTPOST), estimates (OUTEST) and parameters (OUTPARAM) */
NCLASS 3; /* number of latent classes to fit */
ID ID_DAY ID SEX DAYOFWEEK AGE; /* variables to keep in the output data sets */
ITEMS H0 H1 H2 H3 H4 H5 H6 H7 H8 H9 H10 H11 H12 H13 H14 H15 H16 H17 H18 H19 H20 H21 H22 H23;
CATEGORIES 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4;
CLUSTERS ID; /* key step: declare ID (the person) as the cluster variable */
MAXITER 100000;
SEED 100000;
RUN;

5.1.2 SAS output of running the mixed effect LCA analysis

The following output shows that there are 24483 observations (person-days) in the NDNS data set and 6155 clusters, which matches the sample size reported in section 1. The smallest and largest cluster sizes are 2 and 4, meaning that each individual contributed between 2 and 4 observations.
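The observation count can be cross-checked against the per-day sample sizes reported in sections 1.4 and 1.6. A small sketch, using only numbers already given in this report:

```r
# Subjects with dietary data on each diary day (from sections 1.4 and 1.6)
n_per_day <- c(day1 = 6153, day2 = 6153, day3 = 6151, day4 = 6026)

sum(n_per_day)          # total person-days entering the model
sum(n_per_day) / 6155   # average number of recorded days per person
```

The total equals the 24483 observations reported by PROC LCA, so every available person-day appears to have entered the analysis.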

Data Summary, Model Information, and Fit Statistics (EM Algorithm)



Number of subjects in dataset:       24483 
Number of subjects in analysis:      24483
Number of clusters in analysis:       6155 
Smallest cluster size:                   2 
Largest cluster size:                    4 

Number of measurement items:            24
Response categories per item:            4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
Number of groups in the data:            1
Number of latent classes:                3

The analysis includes clusters.
Cluster variable name:  id
Rho starting values were randomly generated (seed = 100000).

No parameter restrictions were specified (freely estimated).

The model converged in 298 iterations.

Maximum number of iterations: 100000
Convergence method: maximum absolute deviation (MAD)
Convergence criterion:  0.000001000

=============================================
Fit statistics:
=============================================

Log-likelihood:   -380275.38
G-squared:         275662.53
AIC:               276098.53
BIC:               277865.58
CAIC:              278083.58
Adjusted BIC:      277172.79
Entropy:                0.85
(Based on the pseudo-likelihood.)

5.1.3 Example of the posterior probabilities estimated by the model

Where:

  • POSTLC1: posterior probability of belonging to latent class 1 given the subject’s carbohydrate eating pattern on that day.
  • POSTLC2: posterior probability of belonging to latent class 2 given the subject’s carbohydrate eating pattern on that day.
  • POSTLC3: posterior probability of belonging to latent class 3 given the subject’s carbohydrate eating pattern on that day.
  • BEST: the latent class assigned by the model (the class with the highest probability among POSTLC1, POSTLC2, and POSTLC3).
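How BEST relates to the posterior probabilities can be illustrated in base R. The data frame below is made up for the example (it is not PROC LCA output); only the column names follow the definitions above.

```r
# Hypothetical posterior probabilities for three person-days.
post <- data.frame(
  ID_DAY  = c("a_1", "a_2", "b_1"),
  POSTLC1 = c(0.90, 0.10, 0.30),
  POSTLC2 = c(0.07, 0.85, 0.30),
  POSTLC3 = c(0.03, 0.05, 0.40)
)

# BEST is the index of the column holding the largest posterior probability.
prob_cols <- c("POSTLC1", "POSTLC2", "POSTLC3")
post$BEST <- max.col(as.matrix(post[prob_cols]), ties.method = "first")
post
```

This kind of modal assignment discards the uncertainty in the posterior probabilities, which is one reason the entropy statistic is reported alongside the fit indices.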

5.2 Model comparison and selection

Model Comparison. (All data, n = 6155)
N of classes  log-likelihood       G^2       AIC       BIC      cAIC      aBIC      Entropy
           1       -402141.9  319395.6  319539.6  320123.2  320195.2  319894.4
           2       -382002.7  279117.1  279407.1  280582.4  280727.4  280121.6  0.918601999
           3       -380275.4  275662.5  276098.5  277865.6  278083.6  277172.8  0.847683204
           4       -378255.6  271622.9  272204.9  274563.7  274854.7  273638.9  0.817010712
           5       -377170.5  269452.7  270180.7  273131.2  273495.2  271974.4  0.824733724
           6       -376353.9  267819.5  268693.5  272235.7  272672.7  270846.9  0.852703371
           7       -375440.1  265991.9  267011.9  271145.9  271655.9  269525.1   0.78651275
           8       -374751.6  264614.9  265780.9  270506.5  271089.5  268653.8   0.80932745
           9       -374596.3  264304.4  265616.4  270933.7  271589.7  268849.0  0.811534882
          10       -373730.0  262571.7  264029.7  269938.8  270667.8  267622.0   0.76856798
Note:
Abbreviations: N, number; AIC, Akaike information criterion; BIC, Bayesian information criterion; aBIC, adjusted BIC; cAIC, consistent AIC; Entropy, a pseudo-r-squared index.
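The entropy column is often computed as a relative entropy of the posterior class probabilities, scaled so that 1 means perfectly separated classes and 0 means no separation at all. The sketch below implements that common definition; the function name `relative_entropy` is hypothetical, and whether PROC LCA uses exactly this formula is an assumption.

```r
# Relative entropy of an n x K matrix of posterior class probabilities:
# 1 + sum(p * log(p)) / (n * log(K)), with 0 * log(0) treated as 0.
relative_entropy <- function(post) {
  post <- as.matrix(post)
  n <- nrow(post)
  k <- ncol(post)
  plogp <- ifelse(post > 0, post * log(post), 0)
  1 + sum(plogp) / (n * log(k))
}

relative_entropy(diag(3))               # every row assigned with certainty
relative_entropy(matrix(1 / 3, 4, 3))   # every row completely uncertain
```

The two calls show the extremes of the scale: the first is exactly 1 and the second is 0 up to floating-point error.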

Comparing the fit of the mixed effect LCA models in the table above, we think the 6-class model may be considered an appropriate choice among them.
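To make the comparison easier to follow, the BIC and entropy columns can be laid side by side with the change in BIC from one model to the next. The values are copied from the table (entropy rounded to three decimals). The report does not state which criterion drove the choice, so this is only one way of reading the numbers.

```r
# BIC and entropy for the 1- to 10-class models, copied from the table above.
bic <- c(320123.2, 280582.4, 277865.6, 274563.7, 273131.2,
         272235.7, 271145.9, 270506.5, 270933.7, 269938.8)
entropy <- c(NA, 0.919, 0.848, 0.817, 0.825, 0.853, 0.787, 0.809, 0.812, 0.769)

comparison <- data.frame(classes = 1:10,
                         BIC     = bic,
                         dBIC    = c(NA, diff(bic)),  # change vs. previous model
                         entropy = entropy)
comparison

which.min(bic)                  # model with the lowest BIC
which.max(entropy[3:10]) + 2    # best-separated model with 3 or more classes
```

BIC keeps falling up to 10 classes, although the gains shrink quickly after the first few models, while among the models with three or more classes the 6-class model has the highest entropy.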

5.3 Visualisation of the 6-class model

5.4 Visualisation of the classes by Carbohydrate eating types

From these results, we might consider combining the high-carbohydrate and medium-carbohydrate categories, since hardly any difference can be seen in the third figure above (“High carbohydrate consumption by latent classes”). We therefore repeated the analysis on the same data set with medium and high carbohydrate consumption combined:

6 Mixed effect LCA with carbohydrate intake groups redefined

  1. Not eating;
  2. Eating low carbohydrate food (energy contribution less than or equal to 25%);
  3. Eating medium-or-high carbohydrate food (energy contribution higher than 25%).

6.1 Model comparison and selection

Model Comparison. (All data, n = 6155)
N of classes  log-likelihood       G^2       AIC       BIC      cAIC      aBIC      Entropy
           1       -353347.6  244621.0  244717.0  245106.1  245154.1  244953.5
           2       -334011.3  205948.5  206142.5  206928.8  207025.8  206620.5  0.907998248
           3       -333020.9  203967.7  204259.7  205443.1  205589.1  204979.1  0.879225821
           4       -330768.5  199462.9  199852.9  201433.5  201628.5  200813.8   0.87025904
           5       -329939.7  197805.3  198293.3  200271.1  200515.1  199495.7  0.857307256
           6       -329189.1  196304.0  196890.0  199265.0  199558.0  198333.9  0.824370514
           7       -328687.2  195300.3  195984.3  198756.4  199098.4  197669.6  0.810165805
           8       -328029.5  193984.8  194766.8  197936.2  198327.2  196693.6  0.776409294
           9       -327704.8  193335.5  194215.5  197782.0  198222.0  196383.7  0.783451995
          10       -327665.3  193256.5  194234.5  198198.3  198687.3  196644.2  0.807738459
Note:
Abbreviations: N, number; AIC, Akaike information criterion; BIC, Bayesian information criterion; aBIC, adjusted BIC; cAIC, consistent AIC; Entropy, a pseudo-r-squared index.

6.2 Visualisation of the 3-class model

6.3 Visualisation of the 4-class model

6.4 Visualisation of the 5-class model

6.5 Visualisation of the 6-class model