Plots

As you can see, each group shows its own pattern in the data. With repeated testing, the first group learns and its performance improves over time, while the second group becomes fatigued and its performance declines. If you looked only at the overall average performance plot on the right, you would conclude that performance is flat. That conclusion would be wrong, because the average hides two separate groups with opposite trends.
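A quick calculation with the coefficients that flexmix reports in the next section shows why the average looks flat. With equal mixing proportions, the average of two straight lines is itself a line whose slope is the mean of the two slopes. The snippet below only does this arithmetic in Python. It assumes the overall average plot weights both groups equally, which is plausible here because each cluster contains 10 students.

```python
# Fitted lines copied from the flexmix "Model parameters" output.
comp1 = (9.9607590, -0.4886075)  # (intercept, slope): declining (fatigued) group
comp2 = (5.1754053, 0.4633109)   # (intercept, slope): improving (learning) group
prior = (0.5, 0.5)               # equal mixing proportions ("prior" column)

# The expected value of the mixture at each time point is the prior-weighted
# average of the two component lines, which is again a straight line.
avg_intercept = prior[0] * comp1[0] + prior[1] * comp2[0]
avg_slope = prior[0] * comp1[1] + prior[1] * comp2[1]

print(f"average line: {avg_intercept:.3f} {avg_slope:+.4f} * time")
```

The averaged slope is about -0.013 per time unit, even though each group changes by roughly 0.46 to 0.49 per time unit in opposite directions.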

Expectation-Maximization (EM)

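If we fit a mixture of repeated-measures regressions (value ~ time) in which all observations from the same student are kept together (the "| as.factor(sub)" term in the call), each student is classified into one cluster or the other. Comp.1 has a negative time slope (-0.49, the fatigued group, students 11 to 20 in the Clusters table) and Comp.2 a positive one (+0.46, the learning group, students 1 to 10). The ratio column below divides a component's size by the number of observations with a non-negligible posterior probability for it (post>0). It is 1.000 for Comp.1 and 0.909 for Comp.2, so the two classifications are clearly distinct.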
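To make the mechanics of EM concrete, here is a minimal NumPy sketch of a two-component mixture of linear regressions in which every row of a student shares one component, mirroring the role of the grouping term in the flexmix call. It is not flexmix's implementation: the simulated data (simulate), the helper names, and the slope-based initialization are all hypothetical choices made for this illustration. The E-step sums each student's row log-likelihoods before computing posteriors, and the M-step runs weighted least squares per component.

```python
import numpy as np

def simulate(seed=1):
    """Hypothetical data shaped like the example: 20 students x 10 occasions.
    Students 0-9 improve and students 10-19 decline; the true coefficients
    are assumptions chosen to loosely resemble the flexmix estimates."""
    rng = np.random.default_rng(seed)
    n_sub, n_time = 20, 10
    sub = np.repeat(np.arange(n_sub), n_time)
    t = np.tile(np.arange(1, n_time + 1), n_sub).astype(float)
    mean = np.where(sub < 10, 5.0 + 0.45 * t, 10.0 - 0.5 * t)
    y = mean + rng.normal(0.0, 1.5, size=sub.size)
    return sub, t, y

def norm_logpdf(y, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at y."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((y - mu) / sigma) ** 2

def em_grouped_mixreg(sub, t, y, k=2, iters=50):
    """EM for a k-component mixture of regressions y ~ t where all rows of a
    student share one component (the idea behind "| as.factor(sub)")."""
    X = np.column_stack([np.ones_like(t), t])
    subjects, idx = np.unique(sub, return_inverse=True)  # idx maps row -> student
    # Initialization heuristic (an assumption, not flexmix's default):
    # sort students by their own OLS slope and split them into k blocks.
    slopes = np.array([np.polyfit(t[idx == s], y[idx == s], 1)[0]
                       for s in range(subjects.size)])
    post = np.zeros((subjects.size, k))
    for j, block in enumerate(np.array_split(np.argsort(slopes), k)):
        post[block, j] = 1.0

    for _ in range(iters):
        # M-step: mixing proportions and weighted least squares per component.
        prior = post.mean(axis=0)
        w_obs = post[idx]  # each row inherits its student's posterior weights
        beta = np.empty((k, 2))
        sigma = np.empty(k)
        for j in range(k):
            w = w_obs[:, j]
            XtW = X.T * w
            beta[j] = np.linalg.solve(XtW @ X, XtW @ y)
            resid = y - X @ beta[j]
            sigma[j] = np.sqrt(np.sum(w * resid**2) / np.sum(w))
        # E-step: per-student posteriors from summed row log-likelihoods.
        ll_obs = np.column_stack([norm_logpdf(y, X @ beta[j], sigma[j])
                                  for j in range(k)])
        ll_sub = np.zeros((subjects.size, k))
        np.add.at(ll_sub, idx, ll_obs)
        log_joint = ll_sub + np.log(prior)
        post = np.exp(log_joint - np.logaddexp.reduce(log_joint, axis=1, keepdims=True))
    return beta, sigma, prior, post

sub, t, y = simulate()
beta, sigma, prior, post = em_grouped_mixreg(sub, t, y)
clusters = post.argmax(axis=1)  # hard assignment per student
print("intercept, slope per component:\n", beta.round(3))
print("sigma:", sigma.round(3), "prior:", prior.round(3))
print("cluster per student:", clusters)
```

Summing log-likelihoods per student is what forces all of a student's observations into the same cluster; without it, each row would be classified independently.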


Call:
flexmix(formula = value ~ time | as.factor(sub), data = long, 
    k = 2, control = list(iter.max = 10))

       prior size post>0 ratio
Comp.1   0.5  100    100 1.000
Comp.2   0.5  100    110 0.909

'log Lik.' -382.1782 (df=7)
AIC: 778.3564   BIC: 801.4446 
Model parameters
                     Comp.1    Comp.2
coef.(Intercept)  9.9607590 5.1754053
coef.time        -0.4886075 0.4633109
sigma             1.5150882 1.5525637
Clusters
   
     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
  1  0  0  0  0  0  0  0  0  0  0 10 10 10 10 10 10 10 10 10 10
  2 10 10 10 10 10 10 10 10 10 10  0  0  0  0  0  0  0  0  0  0
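Our reading of the summary table is as follows. "size" counts observations whose highest posterior probability is for that component, "post>0" counts observations whose posterior for it exceeds a small threshold (assumed here to be 1e-4), and "ratio" is size divided by post>0. The hypothetical function component_summary below encodes that reading in NumPy. The toy posterior matrix is constructed only to reproduce the printed counts and is not taken from the real fit.

```python
import numpy as np

def component_summary(post, eps=1e-4):
    """Per-component size, post>0 and ratio from an (n_observations x k)
    matrix of posterior probabilities. eps is an assumed threshold."""
    post = np.asarray(post, dtype=float)
    k = post.shape[1]
    size = np.bincount(post.argmax(axis=1), minlength=k)  # MAP assignments
    post_pos = (post > eps).sum(axis=0)                   # non-negligible posteriors
    # Guard against 0/0 for an empty component.
    ratio = np.divide(size, post_pos, out=np.zeros(k), where=post_pos > 0)
    return size, post_pos, ratio

# Toy posteriors for 200 rows: 10 rows (e.g. one student) are assigned to
# Comp.1 but keep a small, non-negligible posterior for Comp.2.
post = np.vstack([
    np.tile([1.0, 0.0], (90, 1)),
    np.tile([0.999, 0.001], (10, 1)),
    np.tile([0.0, 1.0], (100, 1)),
])
size, post_pos, ratio = component_summary(post)
print(size, post_pos, ratio.round(3))
```

A ratio close to 1 means that almost every observation with any weight on a component is also assigned to it, which is what "well separated" means here.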

Model Fit Measures

[Figure: Model fit measures (AIC, BIC, ICL) plotted against the number of clusters]

The best trade-off between goodness of fit and model complexity can be found by plotting information criteria such as the Bayesian Information Criterion (BIC), the Akaike Information Criterion (AIC), and the Integrated Complete-data Likelihood (ICL) against the number of clusters. Lower values are better, and here BIC, AIC and ICL all reach their minimum at two clusters.
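The AIC and BIC values in the flexmix output can be reproduced by hand, which shows how each criterion trades fit against complexity. The sketch below uses the printed log-likelihood (-382.1782) and df = 7, which is consistent with 2 intercepts, 2 slopes, 2 sigmas and 1 free mixing proportion. It takes n = 200 observations, inferred from the Clusters table (20 students x 10 observations). The ICL function uses one common form (BIC plus twice the classification entropy of the posteriors); this is an assumption, and flexmix's exact ICL computation may differ in detail.

```python
import math
import numpy as np

def aic(loglik, df):
    """Akaike Information Criterion (lower is better)."""
    return -2.0 * loglik + 2.0 * df

def bic(loglik, df, n):
    """Bayesian Information Criterion: the per-parameter penalty grows with log(n)."""
    return -2.0 * loglik + df * math.log(n)

def icl(loglik, df, n, post):
    """One common form of ICL: BIC plus twice the entropy of the posterior
    matrix, so fuzzy classifications are penalized (assumed formula)."""
    p = np.asarray(post, dtype=float)
    p = p[p > 0]  # treat 0 * log(0) as 0
    entropy = -np.sum(p * np.log(p))
    return bic(loglik, df, n) + 2.0 * entropy

# Values printed by flexmix for the k = 2 model.
loglik, df, n = -382.1782, 7, 200
print(f"AIC = {aic(loglik, df):.4f}")     # matches the printed 778.3564
print(f"BIC = {bic(loglik, df, n):.4f}")  # matches the printed 801.4446
```

To build a plot like the one above, you would refit the model for each candidate number of clusters, compute the criteria for each fit, and look for the minimum.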

Rootograms

In a rootogram, the height of each bar corresponds to the square root of a count rather than the count itself, so that low counts are more visible and peaks less dominant. Here the rootogram shows, for each cluster, the distribution of the points' posterior probabilities of belonging to that cluster. A peak near probability 1 indicates that many points are very well represented by that cluster. A peak near 0 indicates that many points clearly do not belong to it, which is also usually good. Bars in the middle indicate a lack of separation: some points are only moderately well described by that cluster, and this should be weighed against the number of groups you think exist.
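The idea behind a rootogram of posteriors can be sketched in a few lines of NumPy: histogram one component's posterior probabilities on [0, 1] and take square roots of the counts. The function names, the 10-bin layout, the text rendering, and the 0.2 to 0.8 band used to measure "points in the middle" are all hypothetical choices for illustration. The posteriors in the usage example are made up rather than taken from the fitted model.

```python
import numpy as np

def rootogram_heights(post, component, bins=10):
    """Square-rooted histogram counts of one component's posteriors on [0, 1]."""
    p = np.asarray(post, dtype=float)[:, component]
    counts, edges = np.histogram(p, bins=bins, range=(0.0, 1.0))
    return np.sqrt(counts), edges

def text_rootogram(post, component, bins=10, width=40):
    """Crude text rendering: one row per bin, bar length proportional to sqrt(count)."""
    heights, edges = rootogram_heights(post, component, bins)
    scale = width / heights.max() if heights.max() > 0 else 0.0
    return "\n".join(f"{lo:.1f}-{hi:.1f} " + "#" * int(round(h * scale))
                     for h, lo, hi in zip(heights, edges[:-1], edges[1:]))

def ambiguous_fraction(post, component, lo=0.2, hi=0.8):
    """Share of points whose posterior falls strictly between lo and hi."""
    p = np.asarray(post, dtype=float)[:, component]
    return float(np.mean((p > lo) & (p < hi)))

# Made-up posteriors for a 2-cluster fit: mostly near 0 or 1, a few ambiguous.
rng = np.random.default_rng(0)
p1 = np.concatenate([rng.uniform(0.95, 1.0, 90),
                     rng.uniform(0.0, 0.05, 90),
                     rng.uniform(0.3, 0.7, 20)])
post = np.column_stack([p1, 1.0 - p1])
print(text_rootogram(post, component=0))
print("ambiguous share:", ambiguous_fraction(post, component=0))
```

The square-root scale keeps the 20 ambiguous points visible next to the two large peaks, which is exactly the region you would inspect when deciding whether the number of clusters is right.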