class: center, middle, inverse, title-slide

# Tutorial
## Smoothing mixture model (SMM)
### Jenny Hsu & Chang-Chun Chen
### 2020-12-14

---

### Intro

> {lcmm} Latent class mixed models

> {gamm4} Generalized additive mixed models

---

### Time intensity curves

.pull-left[
```r
library(ggplot2)

dta1 <- read.table("time_intensity.txt", header = TRUE)

# Raw points plus one loess curve per participant (ID)
ggplot(dta1, aes(Time, Intensity)) +
  geom_point(alpha = .5, size = 1) +
  stat_smooth(aes(group = ID), method = "loess", se = FALSE, lwd = .7) +
  labs(x = "Time (sec)", y = "Intensity") +
  theme_minimal()
```
]

.pull-right[
*(Figure: intensity against time (sec), one loess curve per ID)*
]

---

### LCMM

```r
library(lcmm)

# Quadratic trajectories with class-specific Time effects (mixture),
# correlated random intercept and slope (idiag = FALSE),
# class-specific random-effect variance (nwg = TRUE) and a spline link
model_2 <- lcmm(fixed = Intensity ~ Time + I(Time^2),
                mixture = ~ Time + I(Time^2),
                random = ~ Time, ng = 2, nwg = TRUE, idiag = FALSE,
                link = "splines", data = dta1, subject = "ID")
```

```
## Be patient, lcmm is running ...
## The program took 3.59 seconds
```

```r
model_3 <- lcmm(fixed = Intensity ~ Time + I(Time^2),
                mixture = ~ Time + I(Time^2),
                random = ~ Time, ng = 3, nwg = TRUE, idiag = FALSE,
                link = "splines", data = dta1, subject = "ID")
```

```
## Be patient, lcmm is running ...
## The program took 13.2 seconds
```

```r
model_4 <- lcmm(fixed = Intensity ~ Time + I(Time^2),
                mixture = ~ Time + I(Time^2),
                random = ~ Time, ng = 4, nwg = TRUE, idiag = FALSE,
                link = "splines", data = dta1, subject = "ID")
```

```
## Be patient, lcmm is running ...
## The program took 21.86 seconds
```

---

### LCMM comparison

```r
summarytable(model_2, model_3, model_4,
             which = c("G", "loglik", "npm", "AIC", "BIC", "%class"))
```

```
##         G    loglik npm      AIC      BIC %class1 %class2 %class3 %class4
## model_2 2 -554.4949  17 1142.990 1156.124   56.25   43.75
## model_3 3 -549.0244  22 1142.049 1159.046   18.75   50.00   31.25
## model_4 4 -534.9250  27 1123.850 1144.710   18.75   12.50   37.50   31.25
```

---

### LCMM

*(Figures: LCMM results, two panels)*

---

### SMM (three groups)

```r
library(MASS)     # data generation
library(nlme)
library(gmodels)  # cross tables
library(mgcv)     # GAMM
library(gamm4)    # GAMM fitted with lme4
```

1. Initial group assignment: individuals are classified into three groups by their mean intensity.
2. EM algorithm:
    - M step: fit the group curves with a GAMM ({gamm4}).
    - E step: calculate each participant's Pearson residuals under each group's curve and assign the participant to the group with the lowest residual.
    - Criteria: maximum log-likelihood, minimum BIC.

---

### SMM

*(Figures: SMM results, two panels)*

---

### LCMM vs. SMM vs. REAL

*(Figure: LCMM vs. SMM vs. REAL)*

---

### Adjusted Rand index (ARI)

- LCMM

```r
funLBM::ari(intensity_guts$REAL, intensity_guts$LCMM)
```

```
## [1] 0.2858984
```

- SMM

```r
funLBM::ari(intensity_guts$REAL, intensity_guts$SMM)
```

```
## [1] 0.1905177
```

---

### ERP data {erp.easy}

.pull-left[
```r
# dta_erp_f: ERP data with columns Time, V6 and Subject (preparation not shown)
ggplot(dta_erp_f, aes(Time, V6, group = Subject)) +
  geom_point(alpha = 0.1, size = 0.5) +
  geom_smooth(se = FALSE, lwd = .5, alpha = .2, color = "lightcoral") +
  labs(x = "Time", y = "Amplitude") +
  theme_minimal()
```
]

.pull-right[
*(Figure: amplitude (V6) against time, one smooth curve per subject)*
]

---

### LCMM

```r
# Two-class model fitted to the first 13,530 rows of the ERP data
model_1 <- lcmm::lcmm(fixed = V6 ~ 1 + Time + I(Time^2),
                      mixture = ~ 1 + Time + I(Time^2),
                      random = ~ 1 + Time, ng = 2, nwg = TRUE,
                      link = "splines", idiag = FALSE,
                      data = data.frame(dta_erp_f[1:13530, ]),
                      subject = "Subject")
```

## The computer doesn't complain, but the wait is far too long

---

### SMM (two groups): faster than LCMM?

.pull-left[
```r
# out1_mean_avg / out2_mean_avg: mean curves of the two SMM groups (not shown)
par(mfrow = c(1, 1))
plot(out1_mean_avg$Time, out1_mean_avg$perp_new, type = "l", col = "black",
     xlab = "Time", ylab = "Mean Y", ylim = c(-15, 15), xlim = c(0, 1800))
lines(out2_mean_avg$Time, out2_mean_avg$perp_new, col = "red")
# Keep the legend inside the plot region (y = 40 lies outside ylim)
legend("topleft", legend = c("Group 1", "Group 2"),
       col = c("black", "red"), lty = 1, cex = 0.9)
```
]

.pull-right[
*(Figure: mean curves of Group 1 and Group 2 over time)*
]

---

### SMM ERP versus REAL

*(Figure: SMM ERP versus REAL)*

---

### SMM ERP versus REAL 2

*(Figure: SMM ERP versus REAL 2)*

---

### SMM limitations

- With BIC as the criterion, SMM tends to identify too many groups when the groups are poorly separated. It performs better when separation is medium to high and heterogeneity in trajectories is relatively low.
- Ignores the uncertainty in group membership, because each participant is assigned to a single group.
- Estimates can be asymptotically biased, particularly when one of the groups is rare.
- Classification maximum likelihood (Class ML): the initial group assignments may affect which local maximum is found and how fast the algorithm converges.

---

### Summary

1. Recovers hidden groups from observed data.
2. Clusters trajectories over time (longitudinal data).
3. Allows random components in the regression coefficients.
4. Models the response as an unknown smooth function of one or more continuous covariates.

{gamm4} (fast) versus {lcmm} (time-consuming)
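---

### LCMM comparison: recomputing AIC and BIC

This is a quick arithmetic check of the LCMM comparison table, using AIC = -2 loglik + 2 npm and BIC = -2 loglik + npm log(N). The printed values are reproduced with N = 16 subjects. That number is our inference: it is not stated on the slides, but it is consistent with the class proportions being multiples of 6.25% (1/16). The sketch only recomputes the criteria from the printed `loglik` and `npm` values. The names `fit_stats` and `n_subjects` are hypothetical.

```r
# Values copied from the summarytable() output on the "LCMM comparison" slide
fit_stats <- data.frame(
  model  = c("model_2", "model_3", "model_4"),
  loglik = c(-554.4949, -549.0244, -534.9250),
  npm    = c(17, 22, 27)
)
n_subjects <- 16  # assumption: inferred from the printed values, not stated in the slides

fit_stats$AIC <- -2 * fit_stats$loglik + 2 * fit_stats$npm
fit_stats$BIC <- -2 * fit_stats$loglik + fit_stats$npm * log(n_subjects)

print(fit_stats)
fit_stats$model[which.min(fit_stats$BIC)]  # "model_4"
```

Of the three fits, `model_4` has both the smallest AIC and the smallest BIC.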
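---

### SMM: a minimal reassignment-loop sketch

The SMM steps (initial grouping by mean intensity, an M step that fits group curves, and an E step that reassigns each participant to the group with the lowest residual) can be sketched as a hard-assignment, classification-EM loop. This is not the authors' code. It rests on several assumptions:

- `mgcv::gam()` with a single population-level smooth stands in for `gamm4::gamm4()`, so there are no subject random effects.
- The data are simulated.
- `smm_sketch()` and its arguments are hypothetical.
- The number of groups is fixed instead of being chosen by log-likelihood/BIC.

```r
library(mgcv)  # ships with R; gam() stands in for gamm4::gamm4() in this sketch

# Hypothetical simulated data: 16 subjects x 25 time points, two true curve shapes
set.seed(1)
sim <- expand.grid(Time = seq(0, 10, length.out = 25), ID = factor(1:16))
true_grp <- rep(1:2, each = 8)[as.integer(sim$ID)]
sim$Intensity <- 5 * ifelse(true_grp == 1, sin(sim$Time / 2), cos(sim$Time / 2)) +
  rnorm(nrow(sim), sd = 0.5)

# Hypothetical helper: classification EM with one smooth curve per group
smm_sketch <- function(dat, ng, max_iter = 20) {
  # Initial assignment: split subjects into ng groups by the rank of their mean response
  subj_mean <- tapply(dat$Intensity, dat$ID, mean)
  grp <- as.integer(cut(rank(subj_mean, ties.method = "first"), breaks = ng))
  names(grp) <- names(subj_mean)

  for (iter in seq_len(max_iter)) {
    if (any(tabulate(grp, nbins = ng) == 0)) stop("a group became empty")
    # M step: fit one smooth curve per group (no subject random effects here)
    fits <- lapply(seq_len(ng), function(g) {
      gam(Intensity ~ s(Time), data = dat[grp[as.character(dat$ID)] == g, ])
    })
    # E step: each subject's sum of squared residuals under every group curve
    # (Gaussian family, unit weights: Pearson residual = y - fitted value)
    sse <- sapply(fits, function(f) {
      r <- dat$Intensity - predict(f, newdata = dat)
      tapply(r^2, dat$ID, sum)
    })
    # Reassign every subject to the group with the smallest residual
    new_grp <- max.col(-sse, ties.method = "first")
    names(new_grp) <- rownames(sse)
    if (all(new_grp == grp)) break  # assignments are stable: stop
    grp <- new_grp
  }
  grp
}

groups <- smm_sketch(sim, ng = 2)
table(groups)
```

Adding subject random effects, for example with `gamm4::gamm4(Intensity ~ s(Time), random = ~ (1 | ID), data = ...)` in the M step, would bring the sketch closer to the GAMM-based step on the slides. Predictions for the E step would then come from the fitted `$gam` component.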
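---

### ARI: what the number measures

The adjusted Rand index compares two partitions and corrects for chance agreement. A value of 1 means the partitions are identical up to relabeling, and values near 0 mean chance-level agreement. Negative values are possible. Below is a base-R sketch of the standard Hubert and Arabie formula. `ari_sketch()` is a hypothetical helper. We assume, without having checked its source, that `funLBM::ari()` computes the same quantity.

```r
# Adjusted Rand index (Hubert & Arabie) from the contingency table of two labelings
ari_sketch <- function(x, y) {
  tab <- table(x, y)
  sum_cells <- sum(choose(tab, 2))           # pairs grouped together in both labelings
  sum_rows  <- sum(choose(rowSums(tab), 2))  # pairs grouped together in x
  sum_cols  <- sum(choose(colSums(tab), 2))  # pairs grouped together in y
  expected  <- sum_rows * sum_cols / choose(length(x), 2)
  max_index <- (sum_rows + sum_cols) / 2
  (sum_cells - expected) / (max_index - expected)  # NaN if both put everyone in one group
}

ari_sketch(c(1, 1, 2, 2), c(2, 2, 1, 1))  # 1: same partition, labels swapped
ari_sketch(c(1, 1, 2, 2), c(1, 2, 1, 2))  # -0.5: worse than chance
```

If both use the standard definition, `ari_sketch(intensity_guts$REAL, intensity_guts$LCMM)` should match the `funLBM::ari()` value reported on the ARI slide.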