The Analysis
The following analysis used geochemical data from the shells of juvenile mussels raised in separate bays in eastern Maine in 2015 (alongside the larvae from the previous analysis) to test how source (bay) and the covariance matrix prior affect the ability of the infinite mixture model (IMM) to assign juvenile chemistry to the correct source. In a leave-one-out procedure, each juvenile in turn was removed from the overall data set, and the remaining juveniles were used as baseline data to train the IMM. The held-out juvenile was then assigned to a source by the IMM, which allowed for 7 possible extra, untrained sources, under each of four covariance matrix priors:
- The Identity Matrix for all sources
- Source-Specific Covariance Matrices for baseline sources, and then the Identity Matrix for extra sources
- Source-Specific Covariance Matrices for baseline sources, and then the Universal Covariance Matrix for extra sources
- The Universal Covariance Matrix for all sources
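The four prior schemes can be pictured as one prior matrix per potential source: the baseline sources first, then the 7 extra sources. The NumPy sketch below is illustrative only and rests on two assumptions not stated above: each source-specific matrix is the sample covariance of that source's baseline juveniles, and the universal matrix is the sample covariance of all baseline juveniles pooled. The function name `covariance_priors` is hypothetical, the scheme labels follow the `Cov_Prior` levels in the GLMM output (with `ID` assumed for the reference level), and how the IMM actually consumes these matrices is not shown.

```python
import numpy as np

N_EXTRA = 7  # untrained sources the IMM may open beyond the baseline sources


def covariance_priors(X, labels, scheme, n_extra=N_EXTRA):
    """Return one d x d prior matrix per potential source (hypothetical helper).

    X      : (n, d) array of transformed, standardized baseline chemistry
    labels : length-n sequence of baseline source codes
    scheme : "ID", "Source_and_ID", "Source_and_Universal", or "Universal"
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    d = X.shape[1]
    sources = list(dict.fromkeys(labels.tolist()))  # baseline sources, first-seen order
    identity = np.eye(d)
    # Assumption: "universal" = covariance of all baseline rows pooled together.
    universal = np.cov(X, rowvar=False)
    # Assumption: "source specific" = covariance of each source's own baseline rows.
    source_specific = [np.cov(X[labels == s], rowvar=False) for s in sources]

    if scheme == "ID":
        baseline, extra = [identity] * len(sources), identity
    elif scheme == "Source_and_ID":
        baseline, extra = source_specific, identity
    elif scheme == "Source_and_Universal":
        baseline, extra = source_specific, universal
    elif scheme == "Universal":
        baseline, extra = [universal] * len(sources), universal
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    # Baseline sources first, then the same matrix for every extra source.
    return list(baseline) + [extra] * n_extra
```

With 5 baseline sources, each scheme yields 12 matrices; only the choice of matrix per slot differs between schemes.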
For each covariance matrix prior and each juvenile, the IMM was run for 1000 adaptation iterations followed by 2000 burn-in iterations. The posterior distribution of source assignments from a final 3000 iterations was retained for further analysis.
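The leave-one-out loop itself can be sketched as a small harness. This is a hypothetical Python outline, not the original analysis code: `sample_assignments` is a placeholder for whatever IMM sampler was actually used (its signature here is an assumption), and it is expected to return the 3000 retained source draws for the held-out juvenile after the 1000 adaptation and 2000 burn-in iterations.

```python
from collections import Counter

N_ADAPT, N_BURN, N_KEEP = 1000, 2000, 3000  # iteration budget described above


def leave_one_out(X, labels, schemes, sample_assignments):
    """Hold out each juvenile in turn and collect its posterior source assignments.

    sample_assignments is a caller-supplied stand-in for the IMM sampler
    (hypothetical signature): given the baseline rows, baseline labels, the
    held-out row, a prior scheme, and the iteration budget, it must return
    the n_keep retained source assignments for the held-out individual.
    """
    results = {}
    for i in range(len(X)):
        # Baseline = every juvenile except the one being assigned.
        baseline_X = [row for j, row in enumerate(X) if j != i]
        baseline_y = [lab for j, lab in enumerate(labels) if j != i]
        for scheme in schemes:
            draws = sample_assignments(baseline_X, baseline_y, X[i], scheme,
                                       n_adapt=N_ADAPT, n_burn=N_BURN, n_keep=N_KEEP)
            results[(i, scheme)] = Counter(draws)  # posterior counts per source
    return results
```

Any callable with that signature can be plugged in; for instance, a trivial nearest-neighbour stub is enough to exercise the loop, though it is of course not an IMM.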
The Data Set
Data were \(\log(x+1)\) transformed, then centered (by elemental mean) and scaled (by elemental standard deviation) prior to analysis. Together, these transformations were meant to normalize the elemental data and standardize its center and variance, so that unknown sources could be easily modeled in the IMM. Principal component analysis suggested substantial overlap in the multivariate geochemical signals among sources along the first two principal components (accounting for 72 and 12 percent of the total variation, respectively). Of all elements, Mn and Co accounted for the most variation among individuals, while La accounted for the least.

Importance of components:
                          PC1    PC2     PC3     PC4     PC5     PC6     PC7     PC8
Standard deviation     0.8952 0.3571 0.24539 0.19649 0.17896 0.15648 0.11961 0.08027
Proportion of Variance 0.7252 0.1154 0.05449 0.03494 0.02898 0.02216 0.01295 0.00583
Cumulative Proportion  0.7252 0.8407 0.89514 0.93008 0.95907 0.98122 0.99417 1.00000
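The preprocessing and the principal component summary above can be sketched in NumPy. This is an illustration under stated assumptions rather than the original code: `log1p` implements \(\log(x+1)\), scaling uses the sample standard deviation (`ddof=1`, as R's `scale()` does), and the component standard deviations come from a singular value decomposition of the centered data, which is how R's `prcomp()` computes them. The exact PCA call behind the table is not stated, and the function names are hypothetical.

```python
import numpy as np


def preprocess(raw):
    """log(x + 1), then center by column mean and scale by column standard deviation."""
    logged = np.log1p(np.asarray(raw, dtype=float))
    return (logged - logged.mean(axis=0)) / logged.std(axis=0, ddof=1)


def importance_of_components(Z):
    """Standard deviation, proportion, and cumulative proportion of variance per PC."""
    Z = np.asarray(Z, dtype=float)
    Zc = Z - Z.mean(axis=0)
    # Singular values of the centered data, divided by sqrt(n - 1), are the PC standard deviations.
    s = np.linalg.svd(Zc, compute_uv=False)
    sd = s / np.sqrt(Z.shape[0] - 1)
    prop = sd**2 / np.sum(sd**2)
    return sd, prop, np.cumsum(prop)
```

Because each proportion is \(\mathrm{sd}_k^2 / \sum_j \mathrm{sd}_j^2\), squaring the printed standard deviations and normalizing them recovers the printed proportions (about 0.725 for PC1 and 0.115 for PC2).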
Results by Individual
Below are the posterior distributions of IMM source assignments for each individual juvenile (color) to each potential source (x-axis; numbers denote possible extra sources). Results are separated by the juvenile's actual source (panel rows) and the covariance matrix prior used in the IMM (panel columns).

In general, each juvenile is assigned most frequently to a single source, which is usually the correct one. Depending on the covariance prior used, however, assignments for juveniles from MBR, GB, and DB are more multimodal and diffuse.
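The quantity plotted for each juvenile is the share of its retained draws that falls on each source. A minimal stdlib sketch, assuming the 3000 draws for one individual are available as a list of source labels (the function name and the extra-source label `extra_1` in the example are hypothetical):

```python
from collections import Counter


def posterior_proportions(draws):
    """Fraction of one individual's retained draws assigned to each source."""
    counts = Counter(draws)
    total = sum(counts.values())
    return {source: n / total for source, n in counts.items()}
```

For 3000 draws split 2400/450/150 among DB, GB, and `extra_1`, this returns 0.8, 0.15, and 0.05; flatter proportions correspond to the more diffuse assignments described above.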
Results by Source
To summarize assignment results by site, we take the modal source assignment for each individual as its IMM-predicted source. Then, for each covariance matrix prior (panels), we plot the percentage of individuals from each source (Actual Source) assigned to each possible source by the IMM (Predicted Source).

As with the individual-level results, juveniles are generally assigned to their correct sources. Depending on the covariance prior used, however, misassigned MBR and GB juveniles are mostly assigned to DB, and misassigned DB juveniles to GB. The highest rate of correct assignment, ~88%, was for FBE.
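The site-level summary can be sketched the same way: take each individual's modal draw as its predicted source, then convert the actual-by-predicted counts into row percentages. The function names are hypothetical, and ties between equally frequent sources are broken here by first appearance, a choice the original analysis may not share.

```python
from collections import Counter, defaultdict


def modal_assignment(draws):
    """Most frequent source among an individual's retained draws (ties: first seen)."""
    return Counter(draws).most_common(1)[0][0]


def percent_assigned(actual, predicted):
    """Row-percent table: % of each actual source assigned to each predicted source."""
    table = defaultdict(Counter)
    for a, p in zip(actual, predicted):
        table[a][p] += 1
    return {a: {p: 100.0 * n / sum(row.values()) for p, n in row.items()}
            for a, row in table.items()}
```

Each row sums to 100, so the diagonal entries are the percentages correctly assigned for each source.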
Effects of Source and Covariance Matrix Prior on Correct Assignment
To understand what drives misclassification by the IMM, we tested the effects of actual source, covariance prior, and their interaction on correct assignment with a generalized linear mixed model (GLMM; family = binomial, link = logit), including individual nested within actual source as a random effect.
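The response for this model is one 0/1 `Correct` value per juvenile and covariance prior; with 137 juveniles and 4 priors that gives 548 rows, matching the number of observations reported in the model summary. The model itself was fit in R with `glmer()`, and this sketch covers only the data preparation. It is a hypothetical helper that reuses the column names from the model formula and assumes the modal assignments are stored in a dictionary keyed by (individual, prior).

```python
def correct_table(modal, actual):
    """Long-format rows (individual, actual source, prior, correct 0/1) for the GLMM.

    modal  : dict mapping (individual, prior) -> modal predicted source
    actual : dict mapping individual -> actual source
    """
    rows = []
    for (ind, prior), predicted in sorted(modal.items()):
        rows.append({
            "Individual_Code": ind,
            "Actual_Source": actual[ind],
            "Cov_Prior": prior,
            # 1 if the modal assignment matches the true bay, else 0
            "Correct": int(predicted == actual[ind]),
        })
    return rows
```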
Analysis of Deviance Table (Type III Wald chisquare tests)

Response: Correct
                          Chisq Df Pr(>Chisq)
(Intercept)              0.9668  1   0.325467
Actual_Source           17.3475  4   0.001654 **
Cov_Prior               15.9057  3   0.001186 **
Actual_Source:Cov_Prior 44.2806 12  1.368e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
We find that a juvenile's actual source interacts with the covariance prior to affect correct classification (Actual_Source:Cov_Prior, Wald χ² = 44.28, df = 12, p < 0.001).
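The Wald p-values in the table above can be checked from each χ² statistic and its degrees of freedom. This is a stdlib-only sketch of the chi-square upper tail for integer degrees of freedom, built on the standard recurrence for the regularized upper incomplete gamma function. It is a check on the printed values, not part of the original analysis.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    half = x / 2.0
    if df % 2 == 0:
        # Even df: start from df = 2, whose upper tail is exp(-x/2).
        k, total = 2, math.exp(-half)
    else:
        # Odd df: start from df = 1, whose upper tail is erfc(sqrt(x/2)).
        k, total = 1, math.erfc(math.sqrt(half))
    # Step up two degrees of freedom at a time:
    # sf(x; k + 2) = sf(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        total += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return total


# Chi-square statistics and degrees of freedom from the Analysis of Deviance table.
TERMS = [("(Intercept)", 0.9668, 1), ("Actual_Source", 17.3475, 4),
         ("Cov_Prior", 15.9057, 3), ("Actual_Source:Cov_Prior", 44.2806, 12)]

for term, chisq, df in TERMS:
    print(f"{term:<24} p = {chi2_sf(chisq, df):.4g}")
```

The recomputed values agree with the Pr(>Chisq) column to the printed precision.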
We can use the regression coefficient summaries for a more in-depth look.
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: Correct ~ Actual_Source * Cov_Prior + (1 | Actual_Source/Individual_Code)
   Data: leave.one.out.results.byind.assignment.table
Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 20000))

     AIC      BIC   logLik deviance df.resid
   404.7    499.5   -180.4    360.7      526

Scaled residuals:
     Min       1Q   Median       3Q      Max
-28.9230  -0.0205   0.0143   0.0461  25.5696

Random effects:
 Groups                        Name        Variance  Std.Dev.
 Individual_Code:Actual_Source (Intercept) 3.751e+01 6.124e+00
 Actual_Source                 (Intercept) 3.435e-15 5.861e-08
Number of obs: 548, groups:  Individual_Code:Actual_Source, 137; Actual_Source, 5

Fixed effects:
                                               Estimate Std. Error z value Pr(>|z|)
(Intercept)                                     -1.4624     1.4873  -0.983 0.325467
Actual_SourceFBE                                 9.8616     3.0361   3.248 0.001162 **
Actual_SourceGB                                  4.4980     2.0893   2.153 0.031331 *
Actual_SourceMBR                                 8.6472     2.5131   3.441 0.000580 ***
Actual_SourcePHB                                 7.5958     2.1069   3.605 0.000312 ***
Cov_PriorSource_and_ID                           5.8002     1.6774   3.458 0.000544 ***
Cov_PriorSource_and_Universal                    5.8002     1.6775   3.458 0.000545 ***
Cov_PriorUniversal                              -0.8929     0.9713  -0.919 0.357924
Actual_SourceFBE:Cov_PriorSource_and_ID         -5.8002     3.0633  -1.893 0.058294 .
Actual_SourceGB:Cov_PriorSource_and_ID         -12.9372     2.5655  -5.043 4.59e-07 ***
Actual_SourceMBR:Cov_PriorSource_and_ID        -20.8495     4.0346  -5.168 2.37e-07 ***
Actual_SourcePHB:Cov_PriorSource_and_ID         -5.1599     2.0087  -2.569 0.010207 *
Actual_SourceFBE:Cov_PriorSource_and_Universal  -8.0393     3.1013  -2.592 0.009536 **
Actual_SourceGB:Cov_PriorSource_and_Universal  -12.5134     2.5307  -4.945 7.63e-07 ***
Actual_SourceMBR:Cov_PriorSource_and_Universal -20.8495     4.0349  -5.167 2.37e-07 ***
Actual_SourcePHB:Cov_PriorSource_and_Universal  -5.8002     1.9956  -2.906 0.003655 **
Actual_SourceFBE:Cov_PriorUniversal              0.8929     2.7411   0.326 0.744604
Actual_SourceGB:Cov_PriorUniversal               1.3311     1.3566   0.981 0.326494
Actual_SourceMBR:Cov_PriorUniversal             -0.3952     1.9531  -0.202 0.839635
Actual_SourcePHB:Cov_PriorUniversal              1.5333     1.5010   1.021 0.307028
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
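Relative to the intercept (the DB source under the ID covariance prior), adding source-specific covariance matrices (the Source_and_ID and Source_and_Universal priors) raises the odds of correct assignment for DB juveniles, but the large negative interaction terms show that it sharply lowers them for GB and MBR juveniles (interaction estimates from -12.5 to -20.8, all p < 0.001). None of the universal-prior terms is significant, so the universal prior performs much like the ID prior. In general, then, the ID or universal covariance priors give better assignments than priors that incorporate source-specific covariance matrices.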
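To see what the coefficients imply for each source and prior combination, the fixed effects can be summed into a log-odds of correct assignment and passed through the inverse logit. This sketch copies the estimates from the summary above and sets both random effects to zero, so the probabilities describe a typical individual rather than population-averaged rates. With an individual-level variance of about 37.5 on the logit scale, they should not be read as the percentages in the earlier figures. The dictionary layout and function names are hypothetical.

```python
import math

# Fixed-effect estimates from the glmer summary (logit scale).
INTERCEPT = -1.4624  # reference cell: DB source under the ID prior
SOURCE = {"DB": 0.0, "FBE": 9.8616, "GB": 4.4980, "MBR": 8.6472, "PHB": 7.5958}
PRIOR = {"ID": 0.0, "Source_and_ID": 5.8002,
         "Source_and_Universal": 5.8002, "Universal": -0.8929}
INTERACTION = {
    ("FBE", "Source_and_ID"): -5.8002, ("GB", "Source_and_ID"): -12.9372,
    ("MBR", "Source_and_ID"): -20.8495, ("PHB", "Source_and_ID"): -5.1599,
    ("FBE", "Source_and_Universal"): -8.0393, ("GB", "Source_and_Universal"): -12.5134,
    ("MBR", "Source_and_Universal"): -20.8495, ("PHB", "Source_and_Universal"): -5.8002,
    ("FBE", "Universal"): 0.8929, ("GB", "Universal"): 1.3311,
    ("MBR", "Universal"): -0.3952, ("PHB", "Universal"): 1.5333,
}


def log_odds(source, prior):
    """Linear predictor for one source x prior cell, random effects set to zero."""
    return (INTERCEPT + SOURCE[source] + PRIOR[prior]
            + INTERACTION.get((source, prior), 0.0))  # DB and ID cells have no interaction


def prob_correct(source, prior):
    """Inverse logit of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-log_odds(source, prior)))
```

For example, MBR juveniles have a log-odds of correct assignment of about 7.18 under the ID prior but about -7.86 under the Source_and_ID prior, which is the pattern the interaction terms describe.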
Results by Covariance Matrix Prior
Looking at the effect of the different covariance priors in more detail, we again see that, in general, the ID or universal covariance priors lead to the best assignments.

Taken together, the results suggest that the IMM should be run with the ID or universal covariance prior for the best assignments, which differs from what the larval data suggested (that source-specific covariance matrices are better). Projecting the correct and incorrect IMM assignments onto the PCA from before shows that misassigned individuals fall where the sources' multivariate geochemistry overlaps.
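The projection of correct and incorrect assignments onto the principal components can be sketched as follows: compute each individual's scores on the first two components of the standardized data, then split them by the `Correct` indicator for plotting. The function names are hypothetical, and the sign of each component is arbitrary, as in any SVD-based PCA.

```python
import numpy as np


def pc_scores(Z, n_components=2):
    """Scores of each row of a standardized data matrix on the leading principal components."""
    Z = np.asarray(Z, dtype=float)
    Zc = Z - Z.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ vt[:n_components].T


def split_by_correct(scores, correct):
    """Return (scores of correctly assigned, scores of misassigned) individuals."""
    mask = np.asarray(correct, dtype=bool)
    return scores[mask], scores[~mask]
```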

Overall, the ID or universal covariance priors produce better IMM results when assigning juveniles, and IMM misclassifications are mostly confined to regions of multivariate overlap in geochemistry.
