class: center, middle, inverse, title-slide

# Tutorial
## Smoothing mixture model (SMM)
### Jenny Hsu & Chang-Chun Chen
### 2020-12-14

---

### Intro

> {lcmm} Latent class mixed-effect models  
  
> {gamm4} Generalized additive mixed models

---

### Time intensity curves

.pull-left[

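```r
library(ggplot2)

# One row per observation: subject ID, time point, and intensity rating
dta1 <- read.table("time_intensity.txt", header = TRUE)

# Raw points plus one loess curve per subject
ggplot(dta1, aes(Time, Intensity)) +
  geom_point(alpha = .5, size = 1) +
  stat_smooth(aes(group = ID), method = "loess", se = FALSE, lwd = .7) +
  labs(x = "Time (sec)", 
       y = "Intensity") +
  theme_minimal()
```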
]

.pull-right[
![](2020tutorial_files/figure-html/plot-label-out-1.png)
]

---

### LCMM

```r
library(lcmm)
model_2 <- lcmm(fixed = Intensity ~ Time + I(Time^2),
                mixture = ~ Time + I(Time^2),
                random = ~ Time, ng = 2, nwg = TRUE, idiag = FALSE,
                link = "splines", data = dta1, subject = "ID")
```

```
## Be patient, lcmm is running ... 
## The program took 3.59 seconds
```

```r
model_3 <- lcmm(fixed = Intensity ~ Time + I(Time^2),
                mixture = ~ Time + I(Time^2),
                random = ~ Time, ng = 3, nwg = TRUE, idiag = FALSE,
                link = "splines", data = dta1, subject = "ID")
```

```
## Be patient, lcmm is running ... 
## The program took 13.2 seconds
```

```r
model_4 <- lcmm(fixed = Intensity ~ Time + I(Time^2),
                mixture = ~ Time + I(Time^2),
                random = ~ Time, ng = 4, nwg = TRUE, idiag = FALSE,
                link = "splines", data = dta1, subject = "ID")
```

```
## Be patient, lcmm is running ... 
## The program took 21.86 seconds
```

---
### LCMM comparison

```r
summarytable(model_2, model_3, model_4, 
             which = c("G", "loglik", "npm", "AIC", "BIC",  "%class"))
```

```
##         G    loglik npm      AIC      BIC %class1 %class2 %class3 %class4
## model_2 2 -554.4949  17 1142.990 1156.124   56.25   43.75                
## model_3 3 -549.0244  22 1142.049 1159.046   18.75   50.00   31.25        
## model_4 4 -534.9250  27 1123.850 1144.710   18.75   12.50   37.50   31.25
```
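
The AIC and BIC columns can be checked by hand from `loglik` and `npm`. The sketch below assumes the usual definitions, AIC = -2 loglik + 2 npm and BIC = -2 loglik + npm log(N), with N the number of subjects. N = 16 is an assumption: it is the value that reproduces the reported BIC, and it is not stated on the slides.

```r
# Values copied from the summarytable() output above
loglik <- c(model_2 = -554.4949, model_3 = -549.0244, model_4 = -534.9250)
npm    <- c(model_2 = 17, model_3 = 22, model_4 = 27)
n_subj <- 16  # assumption: number of subjects, inferred from the reported BIC

aic <- -2 * loglik + 2 * npm
bic <- -2 * loglik + npm * log(n_subj)

round(cbind(AIC = aic, BIC = bic), 3)
names(which.min(bic))  # model with the smallest BIC
```

Under these definitions the four-class model has the smallest AIC and BIC of the three fits, matching the table.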

---

### LCMM

.pull-left[
![](2020tutorial_files/figure-html/unnamed-chunk-4-1.png)
]

.pull-right[
![](2020tutorial_files/figure-html/unnamed-chunk-5-1.png)
]

---

### SMM (three groups)

```r
library(MASS) ##generate data
library(nlme)
library(gmodels) ##cross table
library(mgcv) ##GAMM
library(gamm4) ##GAMM4
```

1. Initial assignment of groups: individuals are categorized into three groups by their mean intensity.
2. E-M algorithm
   - M step: fit a GAMM to each group with `gamm4`.
   - E step: calculate the Pearson residuals of each participant under each group's model, then assign the participant to the group with the lowest residual.
   - Criteria: maximize the log-likelihood (LLK), minimize the BIC.
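
The loop above can be sketched on toy data. This is a minimal illustration, not the authors' implementation. It makes four assumptions: the data are simulated (`toy`, `shape`, and `true_id_grp` are hypothetical); `mgcv::gam()` stands in for `gamm4()`, so there are no random effects; the per-subject mean squared residual is used as the distance (for a Gaussian response the Pearson residual reduces to the raw residual); and the loop stops when assignments no longer change. It also assumes no group becomes empty.

```r
library(mgcv)  # gam() stands in for gamm4() to keep the sketch light

set.seed(1)
# Toy data (hypothetical): 12 subjects, 20 time points, 3 true curve shapes
n_id <- 12; n_t <- 20; n_grp <- 3
toy <- expand.grid(Time = seq(0, 1, length.out = n_t), ID = seq_len(n_id))
true_id_grp <- rep(seq_len(n_grp), length.out = n_id)
shape <- list(function(t) 2 * sin(pi * t),
              function(t) 1 + 4 * t,
              function(t) -3 * t)
toy$Intensity <- mapply(function(g, t) shape[[g]](t),
                        true_id_grp[toy$ID], toy$Time) +
  rnorm(nrow(toy), sd = 0.3)

# Step 1: initial groups from each subject's mean intensity (rank terciles)
id_mean <- tapply(toy$Intensity, toy$ID, mean)
grp <- setNames(cut(rank(id_mean), breaks = n_grp, labels = FALSE),
                names(id_mean))

for (iter in 1:20) {
  # M step: one smooth curve per current group
  fits <- lapply(seq_len(n_grp), function(g) {
    gam(Intensity ~ s(Time, k = 5),
        data = toy[grp[as.character(toy$ID)] == g, ])
  })
  # E step: mean squared residual of every subject under every group's curve
  score <- sapply(fits, function(f) {
    res <- toy$Intensity - as.numeric(predict(f, newdata = toy))
    tapply(res^2, toy$ID, mean)
  })
  # Reassign each subject to the group whose curve fits it best
  new_grp <- setNames(max.col(-score, ties.method = "first"), names(grp))
  if (all(new_grp == grp)) break  # assumption: stop when nothing changes
  grp <- new_grp
}

table(estimated = grp, truth = true_id_grp)
```

Group labels are arbitrary, so the cross table should be read up to a relabelling of the estimated groups.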

---
### SMM

.pull-left[
![](2020tutorial_files/figure-html/unnamed-chunk-9-1.png)
]

.pull-right[
![](2020tutorial_files/figure-html/unnamed-chunk-10-1.png)
]

---
### LCMM vs. SMM vs. REAL

![](2020tutorial_files/figure-html/unnamed-chunk-11-1.png)

---

### Adjusted Rand index (ARI)

- LCMM

```r
funLBM::ari(intensity_guts$REAL, intensity_guts$LCMM)
```

```
## [1] 0.2858984
```

- SMM

```r
funLBM::ari(intensity_guts$REAL, intensity_guts$SMM)
```

```
## [1] 0.1905177
```
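
For reference, the adjusted Rand index can be computed from the contingency table of the two labelings. The sketch below uses the standard pair-counting form in base R. `ari_sketch` and the toy label vectors are hypothetical; this is not the code inside `funLBM::ari()`.

```r
# Adjusted Rand index from two label vectors (hypothetical helper)
ari_sketch <- function(x, y) {
  tab <- table(x, y)
  pairs <- function(n) n * (n - 1) / 2   # "n choose 2"
  sum_ij <- sum(pairs(tab))              # pairs that agree in both labelings
  sum_i  <- sum(pairs(rowSums(tab)))     # pairs together in x
  sum_j  <- sum(pairs(colSums(tab)))     # pairs together in y
  expected <- sum_i * sum_j / pairs(length(x))
  (sum_ij - expected) / ((sum_i + sum_j) / 2 - expected)
}

truth <- c(1, 1, 1, 2, 2, 2)
ari_sketch(truth, c(2, 2, 2, 1, 1, 1))  # same partition, relabelled: 1
ari_sketch(truth, c(1, 1, 2, 2, 3, 3))  # partial agreement
```

The index is 1 for identical partitions regardless of label names, and close to 0 when agreement is no better than chance.
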
---

### ERPdata {erp.easy}

.pull-left[

```r
ggplot(dta_erp_f, aes(Time, V6, group = Subject)) +
  geom_point(alpha = 0.1, size = 0.5) +
  geom_smooth(se = FALSE, lwd = .5, alpha = .2, color = "lightcoral") +
  labs(x = "Time", y = "Amplitude") +
  theme_minimal()
```
]

.pull-right[
![](2020tutorial_files/figure-html/2-out-1.png)
]

---
### LCMM

```r
model_1 <- lcmm::lcmm(fixed = V6 ~ 1 + Time + I(Time^2),
                      mixture = ~ 1 + Time + I(Time^2),
                      random = ~ 1 + Time,
                      ng = 2, nwg = TRUE,
                      link = "splines",
                      idiag = FALSE,
                      data = data.frame(dta_erp_f[1:13530, ]),
                      subject = "Subject")
```

## The computer does not complain, but the wait is far too long

---

### SMM (two groups): faster than LCMM?

.pull-left[

```r
par(mfrow = c(1, 1))
# Mean curve of group 1, then overlay group 2
plot(out1_mean_avg$Time, out1_mean_avg$perp_new, type = "l", col = "black",
     xlab = "Time", ylab = "Mean Y", ylim = c(-15, 15), xlim = c(0, 1800))
lines(out2_mean_avg$Time, out2_mean_avg$perp_new, type = "l", col = "red")
# Keyword position keeps the legend inside the plot region (y = 40 is above ylim)
legend("topleft", legend = c("Group 1", "Group 2"), col = c("black", "red"),
       lty = 1, cex = 0.9)
```
]

.pull-right[
![](2020tutorial_files/figure-html/SMM_erp-out-1.png)
]


---
### SMM ERP versus REAL

![](2020tutorial_files/figure-html/unnamed-chunk-17-1.png)

---
### SMM ERP versus REAL 2

![](2020tutorial_files/figure-html/unnamed-chunk-18-1.png)

---

### SMM Limitation

- With BIC as the criterion, SMM tends to identify too many groups when separation among groups is low. It does better with medium to high separation and relatively low heterogeneity in trajectories.  
- It ignores the uncertainty within the data among groups.  
- It has asymptotic biases, particularly when one of the groups is rare.  
- Class ML: the initial assignment of group membership may affect which local maximum is identified and the speed of convergence.

---

### Summary

1. Recovers hidden groups from observed data.  
2. Clusters over time (longitudinal).  
3. Allows random components in the regression coefficients.  
4. Treats the response as an unknown smooth function of a number of continuous covariates.

{gamm4} (fast) versus {lcmm} (time-consuming)