The Analysis
The following analysis pulled geochemistry data from the 2015 Reference Larvae data set and tested how larval source and the co-variance matrix prior affect the ability of the infinite mixture model (IMM) to correctly assign larvae to sources. During the analysis, each larva was removed from the data set in turn, and the remaining larvae were used as baseline data to train the IMM. The held-out larva was then assigned to a source by the IMM, which allowed for up to 7 extra, untrained sources. This was repeated with each of four different co-variance matrix priors:
- The Identity Matrix for all sources
- Source Specific Co-variance Matrices for baseline sources, and then the Identity Matrix for extra sources
- Source Specific Co-variance Matrices for baseline sources, and then the Universal Co-variance Matrix for extra sources
- The Universal Co-variance Matrix for all sources
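The four prior schemes can be sketched as dictionaries that map every source, baseline or extra, to a co-variance matrix. This is a minimal sketch under stated assumptions, not the original analysis code. The helper `build_cov_priors` is hypothetical. It assumes that the "universal" matrix is the sample covariance of all baseline larvae pooled, and that each "source specific" matrix is the sample covariance of that source's larvae. The scheme names follow the `Cov_Prior` levels in the model output later in this document.

```python
import numpy as np

SOURCES = ["CHR", "FBE", "FBW", "GB", "PHB"]  # baseline sources in the data set
EXTRA = [str(k) for k in range(1, 8)]         # 7 extra, untrained sources (numbered)

def build_cov_priors(X, labels):
    """Map each prior scheme to a {source: covariance matrix} dict.

    Hypothetical helper. Assumes the universal matrix is the covariance of
    all baseline larvae pooled, and each source-specific matrix is the
    covariance of that source's larvae.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    identity = np.eye(X.shape[1])
    universal = np.cov(X, rowvar=False)
    specific = {s: np.cov(X[labels == s], rowvar=False) for s in SOURCES}
    return {
        "ID": {s: identity for s in SOURCES + EXTRA},
        "Source_and_ID": {**specific, **{e: identity for e in EXTRA}},
        "Source_and_Universal": {**specific, **{e: universal for e in EXTRA}},
        "Universal": {s: universal for s in SOURCES + EXTRA},
    }

# Usage with simulated (not real) data: 8 larvae per source, 3 elements.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
labels = np.repeat(SOURCES, 8)
priors = build_cov_priors(X, labels)
print(priors["Source_and_Universal"]["3"].shape)  # (3, 3)
```

Keying every scheme by the same 12 source labels means the IMM code only has to look up a matrix. It never needs to branch on which scheme is in use.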
For each combination of larva and co-variance matrix prior, the IMM was run for an initial 1000 adaptation iterations and 2000 burn-in iterations. The posterior distribution of source assignments from a final 3000 iterations was retained for further analysis.
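The leave-one-out design and the iteration budget can be sketched as a loop over larvae and priors. This is a sketch only. `run_imm` is a hypothetical callable standing in for one IMM run. `toy_run` is a deliberately trivial nearest-neighbour stand-in that exists only so the loop executes; it is not the IMM.

```python
# Iteration budget from the text: adaptation and burn-in draws are discarded,
# and only the final N_KEEP draws form the retained posterior.
N_ADAPT, N_BURN, N_KEEP = 1000, 2000, 3000
PRIORS = ["ID", "Source_and_ID", "Source_and_Universal", "Universal"]

def leave_one_out(larvae, run_imm):
    """Hold out each larva in turn and keep its posterior source draws.

    `run_imm(baseline, held_out, prior, n_iter)` is a hypothetical stand-in
    for one IMM run; it must return one source label per MCMC iteration.
    """
    posterior = {}
    for i, held_out in enumerate(larvae):
        baseline = larvae[:i] + larvae[i + 1:]  # all other larvae train the IMM
        for prior in PRIORS:
            draws = run_imm(baseline, held_out, prior, N_ADAPT + N_BURN + N_KEEP)
            posterior[(held_out["id"], prior)] = draws[N_ADAPT + N_BURN:]
    return posterior

def toy_run(baseline, held_out, prior, n_iter):
    # Toy stand-in, NOT the IMM: every draw is the source of the nearest
    # baseline larva on a single made-up element value "x".
    nearest = min(baseline, key=lambda b: abs(b["x"] - held_out["x"]))
    return [nearest["source"]] * n_iter

larvae = [
    {"id": 1, "source": "CHR", "x": 0.1},
    {"id": 2, "source": "CHR", "x": 0.2},
    {"id": 3, "source": "GB", "x": 2.0},
    {"id": 4, "source": "GB", "x": 2.1},
]
posterior = leave_one_out(larvae, toy_run)
print(len(posterior), len(posterior[(1, "ID")]))  # 16 3000
```

With the 186 larvae and 4 priors in this data set, the same loop gives 186 × 4 = 744 runs. That matches the 744 observations in the mixed model fitted later.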
The Data Set
Data were \(\log(x+1)\) transformed, then centered (by elemental mean) and scaled (by elemental standard deviation) prior to analysis. Together, these transformations were meant to normalize the elemental data and then standardize its center and variance, so that unknown sources could be easily modeled in the IMM. Principal Component Analysis suggested substantial overlap in the multivariate geochemical signals among sources along the first 2 principal components (accounting for 36 and 22 percent of total variation, respectively). Of all elements, Mg accounted for the most variation among individual larvae, while La accounted for the least.
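The preprocessing described above can be sketched with NumPy. This sketch rests on two assumptions: a natural logarithm, and the sample standard deviation (`ddof=1`, which is what R's `scale()` uses). The original analysis may have used a different log base. The concentrations in the usage example are made up.

```python
import numpy as np

def preprocess(raw):
    """log(x + 1) transform, then center and scale each element (column).

    Assumes a natural log and the sample standard deviation (ddof=1).
    """
    logged = np.log1p(np.asarray(raw, dtype=float))  # log(x + 1), stable near 0
    centered = logged - logged.mean(axis=0)          # subtract elemental mean
    return centered / logged.std(axis=0, ddof=1)     # divide by elemental SD

# Made-up concentrations: rows are larvae, columns are elements.
raw = np.array([[120.0, 3.1], [95.0, 2.4], [150.0, 5.0], [110.0, 0.0]])
Z = preprocess(raw)
print(Z.mean(axis=0).round(6), Z.std(axis=0, ddof=1).round(6))
```

Using `log1p` rather than `log(x + 1)` keeps the transform accurate for concentrations at or near zero.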

Importance of components:
                          PC1    PC2    PC3     PC4     PC5     PC6     PC7     PC8
Standard deviation     0.8440 0.6633 0.4559 0.43804 0.38877 0.34044 0.32125 0.20066
Proportion of Variance 0.3629 0.2242 0.1059 0.09778 0.07702 0.05906 0.05259 0.02052
Cumulative Proportion  0.3629 0.5871 0.6930 0.79082 0.86784 0.92689 0.97948 1.00000
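The "Proportion of Variance" row follows directly from the "Standard deviation" row. Each component's variance is its standard deviation squared, divided by the sum of all component variances. The short check below recomputes the row from the printed values. Because the printed standard deviations are rounded, the results can differ from the table in the fourth decimal place.

```python
from itertools import accumulate

# Component standard deviations copied from the table above.
sdev = [0.8440, 0.6633, 0.4559, 0.43804, 0.38877, 0.34044, 0.32125, 0.20066]

variances = [s ** 2 for s in sdev]                    # variance = sd squared
proportion = [v / sum(variances) for v in variances]  # share of total variance
cumulative = list(accumulate(proportion))             # running total

for k, (p, c) in enumerate(zip(proportion, cumulative), start=1):
    print(f"PC{k}: proportion {p:.4f}, cumulative {c:.4f}")
```

PC1 and PC2 come out at roughly 0.363 and 0.224. These are the 36 and 22 percent quoted in the text.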
Results by Individual
Below are the posterior distributions of IMM source assignments for each individual larva (color) to each potential source (x-axis; numbers denote possible extra sources). Results are separated by the actual larval source (panel rows) and the co-variance matrix prior used in the IMM (panel columns).

Generally, larvae from a given source are assigned most frequently to a single source, which is usually the correct one. Assignments for larvae from CHR, FBE and GB, however, are more multimodal and diffuse.
Results by Source
To summarize assignment results by source, we take the modal (most frequent) posterior source assignment for each individual as its IMM-predicted source. Then, for each co-variance matrix prior (panels), we plot the percentage of individuals from each source (Actual Source) assigned to each possible source by the IMM (Predicted Source).
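A minimal sketch of this summary step follows. It assumes each individual's retained posterior is a list of source labels, with extra sources written as numbers. Ties in the mode are broken by first occurrence, because that is how `Counter.most_common` orders equal counts. This is an arbitrary choice; the original does not say how ties were handled. The helper names and example draws are hypothetical.

```python
from collections import Counter, defaultdict

def modal_assignment(draws):
    """Most frequent source label among retained posterior draws."""
    return Counter(draws).most_common(1)[0][0]

def confusion_percent(individuals):
    """Percent of individuals from each actual source assigned to each predicted source.

    `individuals` is a list of (actual_source, posterior_draws) pairs.
    """
    counts = defaultdict(Counter)
    for actual, draws in individuals:
        counts[actual][modal_assignment(draws)] += 1
    return {
        actual: {pred: 100 * n / sum(row.values()) for pred, n in row.items()}
        for actual, row in counts.items()
    }

inds = [
    ("CHR", ["CHR"] * 5 + ["PHB"] * 10),  # modal: PHB
    ("CHR", ["CHR"] * 12 + ["FBW"] * 3),  # modal: CHR
    ("GB", ["GB"] * 9 + ["3"] * 6),       # modal: GB ("3" is an extra source)
    ("GB", ["FBW"] * 8 + ["GB"] * 7),     # modal: FBW
]
print(confusion_percent(inds))
# {'CHR': {'PHB': 50.0, 'CHR': 50.0}, 'GB': {'GB': 50.0, 'FBW': 50.0}}
```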

As in the individual-level results, larvae are generally assigned to their correct sources. However, CHR is mostly confused with PHB and FBW, and GB and FBW are confused with each other. Even the best-classified source, FBE, had only 62% of individuals correctly assigned.
Effects of Source and Covariance Matrix Prior on Correct Assignment
To understand what drives misclassification by the IMM, we test the effects of Actual Source, Co-variance Prior, and their interaction on correct assignment. We use a generalized linear mixed model (GLMM; family=binomial, link=logit) with individual nested within actual source as a random effect.
Analysis of Deviance Table (Type III Wald chisquare tests)
Response: Correct
                          Chisq Df Pr(>Chisq)
(Intercept)             12.9177  1  0.0003255 ***
Actual_Source            7.9274  4  0.0942732 .
Cov_Prior                4.9148  3  0.1781406
Actual_Source:Cov_Prior 11.7227 12  0.4681974
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
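A larva's actual source has the strongest effect on correct classification, although even this effect is only marginally non-significant (\(\chi^2 = 7.93\), df = 4, p = 0.094). Neither the co-variance prior nor its interaction with actual source has a detectable effect.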
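Each p-value in the Type III table is the upper-tail probability of a chi-square distribution with the listed degrees of freedom. The sketch below recomputes these p-values in plain Python. It uses the power series for the regularized lower incomplete gamma function instead of a statistics library. The helper `chi2_sf` is written here for illustration and is adequate for moderate statistics like these. For general use, a library routine such as SciPy's `chi2.sf` would be the safer choice.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square(df) variable.

    Uses the power series for the regularized lower incomplete gamma
    function P(df/2, x/2); adequate for moderate x like the values here.
    """
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return 1.0 - lower

# (Chisq, Df) pairs copied from the Type III table above.
anova = {
    "(Intercept)": (12.9177, 1),
    "Actual_Source": (7.9274, 4),
    "Cov_Prior": (4.9148, 3),
    "Actual_Source:Cov_Prior": (11.7227, 12),
}
for term_name, (chisq, df) in anova.items():
    print(f"{term_name:24s} p = {chi2_sf(chisq, df):.7f}")
```

The interaction term has the largest \(\chi^2\), but its p-value is large because it is spread over 12 degrees of freedom. That is why Actual_Source, not the interaction, is the strongest effect.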
We can use the regression coefficient summaries for a more in-depth look.
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 Family: binomial  ( logit )
Formula: Correct ~ Actual_Source * Cov_Prior + (1 | Actual_Source/Individual_Code)
   Data: leave.one.out.results.byind.assignment.table
Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 20000))
     AIC      BIC   logLik deviance df.resid
   621.0    722.4   -288.5    577.0      722
Scaled residuals:
    Min      1Q  Median      3Q     Max
-3.2871 -0.1157 -0.0579  0.1652  2.4500
Random effects:
 Groups                        Name        Variance  Std.Dev.
 Individual_Code:Actual_Source (Intercept) 4.722e+01 6.872e+00
 Actual_Source                 (Intercept) 1.018e-15 3.191e-08
Number of obs: 744, groups: Individual_Code:Actual_Source, 186; Actual_Source, 5
Fixed effects:
                                               Estimate Std. Error z value Pr(>|z|)
(Intercept)                                     -6.5205     1.8142  -3.594 0.000325 ***
Actual_SourceFBE                                 6.3633     2.9343   2.169 0.030117 *
Actual_SourceFBW                                 4.9025     2.4853   1.973 0.048542 *
Actual_SourceGB                                  3.5646     2.1834   1.633 0.102548
Actual_SourcePHB                                 5.4763     2.2603   2.423 0.015399 *
Cov_PriorSource_and_ID                           1.3127     1.1820   1.111 0.266759
Cov_PriorSource_and_Universal                    1.8499     1.1897   1.555 0.119957
Cov_PriorUniversal                              -0.7606     1.2383  -0.614 0.539071
Actual_SourceFBE:Cov_PriorSource_and_ID          1.2209     1.7028   0.717 0.473394
Actual_SourceFBW:Cov_PriorSource_and_ID         -0.2013     1.5930  -0.126 0.899459
Actual_SourceGB:Cov_PriorSource_and_ID          -2.3905     1.4677  -1.629 0.103378
Actual_SourcePHB:Cov_PriorSource_and_ID         -0.1891     1.4697  -0.129 0.897644
Actual_SourceFBE:Cov_PriorSource_and_Universal   0.6837     1.7017   0.402 0.687865
Actual_SourceFBW:Cov_PriorSource_and_Universal  -0.7385     1.5954  -0.463 0.643451
Actual_SourceGB:Cov_PriorSource_and_Universal   -2.5551     1.4638  -1.746 0.080885 .
Actual_SourcePHB:Cov_PriorSource_and_Universal  -0.7263     1.4709  -0.494 0.621479
Actual_SourceFBE:Cov_PriorUniversal              1.9547     1.6763   1.166 0.243569
Actual_SourceFBW:Cov_PriorUniversal              0.7606     1.6338   0.466 0.641565
Actual_SourceGB:Cov_PriorUniversal               0.7606     1.4870   0.511 0.609020
Actual_SourcePHB:Cov_PriorUniversal              1.1372     1.5158   0.750 0.453104
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The intercept represents larvae from CHR under the ID co-variance prior. Relative to this baseline, GB was the only source whose frequency of correct assignment did not differ significantly (p = 0.10). Larvae from FBE, FBW and PHB were all correctly assigned significantly more often (p < 0.05). This is consistent with the earlier figures: CHR and GB had low frequencies of correct assignment, while the other sources fared better.
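The `z value` column in the coefficient table is a Wald statistic, the estimate divided by its standard error. `Pr(>|z|)` is the matching two-sided normal p-value. The sketch below recomputes both from the printed estimates and standard errors. It also shows a useful link between the two tables: for a single-degree-of-freedom term, the Type III Wald \(\chi^2\) is simply \(z^2\). For example, the intercept's \(z \approx -3.594\) squares to about 12.92. The helper name `wald_z` is illustrative.

```python
import math

def wald_z(estimate, std_error):
    """Wald z statistic and two-sided p-value for one GLMM coefficient."""
    z = estimate / std_error
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p

# (Estimate, Std. Error) pairs copied from the fixed-effects table above.
coefs = {
    "(Intercept)": (-6.5205, 1.8142),
    "Actual_SourceFBE": (6.3633, 2.9343),
    "Actual_SourceFBW": (4.9025, 2.4853),
    "Actual_SourceGB": (3.5646, 2.1834),
    "Actual_SourcePHB": (5.4763, 2.2603),
}
for name, (est, se) in coefs.items():
    z, p = wald_z(est, se)
    print(f"{name:18s} z = {z:6.3f}  p = {p:.6f}")
```

Only Actual_SourceGB stays above 0.05 among the source terms. This matches the reading of the table above.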
Results by Covariance Matrix Prior
Looking at the effect of the different co-variance priors, assignments are generally best when source-specific co-variance matrices are used for the baseline sources.

Taken together, the results suggest that assignment is best when the IMM is run with source-specific co-variance matrices for baseline sources and the universal co-variance matrix for extra sources. Projecting the correct and incorrect assignments from these runs onto the PCA from before shows no clear geochemical pattern explaining why individuals are misassigned.

Overall, combining source-specific and universal co-variance matrices produces marginally better IMM assignments of larvae. However, the greatest influence on IMM misclassification is source-specific geochemistry.
