Statistical Analysis of Partially Observed Shapes in Two Dimensions with Applications in Biological Anthropology

Gregory J. Matthews
July 28, 2019

Acknowledgements

  • Juliet Brophy
  • Ofer Harel
  • George Thiruvathukal
  • Sebastian Kurtek
  • Karthik Bharath
  • Grady Flanary
  • Kajal Chokshi

Background

  • Biological anthropologists are interested in studying early human behavior.
  • One way to accomplish this is by reconstructing past environments.
  • This is done by identifying the fossil animals found with the early humans; in southern Africa, this fossil record is dominated by isolated bovid teeth.

Classification

  • Classification can be performed using the shape created by the outline of the occlusal (i.e. chewing) surface of the tooth.
  • In the past, we applied Elliptical Fourier Analysis (EFA) and used the resulting coefficients as features in machine learning models (Matthews et al., 2018).
  • In our EFA experiments, we observed classification rates of approximately 90% at the tribe level and 70% at the species level.
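
To make the EFA features concrete, here is a minimal sketch of the standard elliptical Fourier coefficients of a closed polygonal outline. It is not the code behind the cited experiments; the function name `efa_coefficients` and the input format (a list of `(x, y)` vertices) are assumptions made for illustration.

```python
import math

def efa_coefficients(points, n_harmonics):
    """Return [(a_n, b_n, c_n, d_n)] for n = 1..n_harmonics.

    points: (x, y) vertices of a closed outline; the last vertex
    is joined back to the first.
    """
    k = len(points)
    edges = []
    for i in range(k):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % k]
        dx, dy = x1 - x0, y1 - y0
        dt = math.hypot(dx, dy)
        if dt > 0:  # skip repeated vertices to avoid dividing by zero
            edges.append((dx, dy, dt))
    total = sum(dt for _, _, dt in edges)  # perimeter T
    coeffs = []
    for n in range(1, n_harmonics + 1):
        w = 2.0 * n * math.pi / total
        a = b = c = d = 0.0
        t_prev = 0.0
        for dx, dy, dt in edges:
            t = t_prev + dt  # cumulative arc length at the end of this edge
            dcos = math.cos(w * t) - math.cos(w * t_prev)
            dsin = math.sin(w * t) - math.sin(w * t_prev)
            a += dx / dt * dcos
            b += dx / dt * dsin
            c += dy / dt * dcos
            d += dy / dt * dsin
            t_prev = t
        scale = total / (2.0 * n * n * math.pi ** 2)
        coeffs.append((scale * a, scale * b, scale * c, scale * d))
    return coeffs

# Example: a circle of radius 2 traced counterclockwise from angle 0.
circle = [(2 * math.cos(2 * math.pi * i / 200), 2 * math.sin(2 * math.pi * i / 200))
          for i in range(200)]
first, second = efa_coefficients(circle, 2)
```

For the circle, the first harmonic is close to `(2, 0, 0, 2)` and the second is close to zero. The coefficients of the first several harmonics can be flattened into one feature vector per tooth.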

Teeth

  • Classifying fully observed teeth works well.

[Figure: fully observed teeth]

  • But what about teeth like this?

[Figure: partially observed teeth]

  • How can we classify teeth that are only partially observed?

Shapes as Functions

  • We have moved away from EFA and are now viewing the shapes as functions.
  • This allows us to use previously developed functional data analysis techniques for shapes such as the square root velocity framework and an elastic metric (Srivastava and Klassen, 2016).
  • Each outline is a closed planar curve: \( C: \mathbb{S} \rightarrow \mathbb{R}^2 \)
  • With scaling: \( C \mapsto \frac{C'}{\|C'\|_{\mathbb{L}^2}}=:Q_C \)
  • Without scaling: \( C \mapsto C'=:Q_C \)
  • \( d(C_1,C_2):=\inf_{(\gamma,O) \in \Gamma \times SO(2)} \|Q_{C_1} - OQ_{C_2}(\gamma)\sqrt{\gamma'}\|_{\mathbb{L}^2} \)
  • Using this distance, let's look at the proposed method.
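
The maps and the distance above can be sketched numerically. This is a simplified illustration, not the full elastic computation. It assumes both curves are sampled at the same number of points, stored as complex numbers, and it restricts \( \gamma \) to cyclic shifts of the starting point (so \( \sqrt{\gamma'} = 1 \)) while optimizing the rotation in closed form. The names `q_map` and `distance` are hypothetical.

```python
import cmath
import math

def q_map(curve, scale=True):
    """Approximate Q_C for a closed curve given as a list of complex points."""
    n = len(curve)
    # Forward differences on a uniform grid of step 1/n approximate C'.
    deriv = [(curve[(k + 1) % n] - curve[k]) * n for k in range(n)]
    if not scale:
        return deriv  # without scaling: Q_C = C'
    norm = math.sqrt(sum(abs(v) ** 2 for v in deriv) / n)  # discrete L2 norm
    return [v / norm for v in deriv]  # with scaling: Q_C = C' / ||C'||

def distance(curve1, curve2, scale=True):
    """Distance minimized over rotations and starting points only (no warping)."""
    q1, q2 = q_map(curve1, scale), q_map(curve2, scale)
    n = len(q1)
    sq1 = sum(abs(v) ** 2 for v in q1) / n
    sq2 = sum(abs(v) ** 2 for v in q2) / n
    best = math.inf
    for shift in range(n):
        inner = sum(q1[k] * q2[(k + shift) % n].conjugate() for k in range(n)) / n
        # For a fixed shift, the optimal planar rotation gives
        # ||q1||^2 + ||q2||^2 - 2 |<q1, q2>|.
        best = min(best, sq1 + sq2 - 2.0 * abs(inner))
    return math.sqrt(max(best, 0.0))

# Example: an ellipse, a rotated copy with a different starting point, and a doubled copy.
n_pts = 64
ellipse = [complex(2 * math.cos(2 * math.pi * k / n_pts), math.sin(2 * math.pi * k / n_pts))
           for k in range(n_pts)]
moved = [cmath.exp(0.7j) * ellipse[(k + 3) % n_pts] for k in range(n_pts)]
doubled = [2 * z for z in ellipse]
```

In this sketch, `distance(ellipse, moved)` is numerically zero, and `distance(ellipse, doubled)` is zero with scaling but positive without it, which is the difference between the two versions of \( Q_C \).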

Proposed Method

  • Collection of partial shapes: \( C^p_j \) for \( j = 1, 2, \dots, n_p \). Choose one partial shape \( C^p_j \).
  • Collection of full shapes: \( C^f_i \) for \( i = 1, 2, \dots, n_f \).
  • Measure \( d_{ij} = d(C^p_j, C^f_i) \) for all \( i \) (this is minimized across all starting points!).
  • Include \( C^f_i \) in the donor set if \( d_{ij} \le d_{(l)} \) where \( d_{(l)} \) is the \( l \)-th smallest distance across all \( i \). So the donor set will include \( l \) full shapes.
  • Randomly choose a shape from the donor set, \( C^{f\star}_i \), and use it to complete \( C^p_j \).
  • Repeat this \( M \) times to create \( M \) completed shapes.
  • Perform an analysis on the \( M \) completed shapes and combine across imputations. We are interested in classification and used k-nearest neighbors.
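
The steps above can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used for the results. The distance `dist` and the completion rule `complete` are passed in as functions because the slides do not spell them out in code. Ties are broken by index, so the donor set always has exactly \( l \) members. Donors are drawn uniformly with replacement. Averaging k-nearest-neighbor vote shares over the \( M \) imputations is one possible way to combine them and is an assumption here.

```python
import random
from collections import Counter

def donor_indices(partial, full_shapes, dist, l):
    """Indices of the l full shapes nearest to the partial shape."""
    order = sorted(range(len(full_shapes)), key=lambda i: dist(partial, full_shapes[i]))
    return order[:l]

def impute(partial, full_shapes, dist, complete, l, M, rng):
    """Draw M donors at random (with replacement) and return M completed shapes."""
    donors = donor_indices(partial, full_shapes, dist, l)
    return [complete(partial, full_shapes[rng.choice(donors)]) for _ in range(M)]

def knn_vote_shares(shape, full_shapes, labels, dist, k):
    """Share of the k nearest full shapes that carry each label."""
    order = sorted(range(len(full_shapes)), key=lambda i: dist(shape, full_shapes[i]))
    votes = Counter(labels[i] for i in order[:k])
    return {label: count / k for label, count in votes.items()}

def classify_partial(partial, full_shapes, labels, dist, complete, l=5, M=5, k=3, seed=0):
    """Classify one partial shape by pooling kNN vote shares over M imputations."""
    rng = random.Random(seed)
    pooled = Counter()
    for shape in impute(partial, full_shapes, dist, complete, l, M, rng):
        for label, share in knn_vote_shares(shape, full_shapes, labels, dist, k).items():
            pooled[label] += share / M
    return max(pooled, key=pooled.get), dict(pooled)

# Toy usage: "shapes" are plain numbers, and completion simply adopts the donor.
toy_full = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
toy_labels = ["a", "a", "a", "b", "b", "b"]
toy_label, toy_probs = classify_partial(
    0.05, toy_full, toy_labels,
    dist=lambda s, t: abs(s - t),
    complete=lambda partial, donor: donor,
    l=3, M=5, k=3,
)
```

With real outlines, `dist` would be the shape distance \( d \) and `complete` would graft the missing part of the donor onto the partial tooth.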

Completing the shape: Example

  • Gray tooth is the donor
  • Black part is the donee
  • Red part is the completion

[Figure: example completion]

Example Imputations - Lower Molar 1

  • Five imputations (\( M = 5 \)) with five potential donors (\( l = 5 \))

[Figure: five imputations of a lower molar 1]

  • Note that only three unique teeth from the set of potential donors were chosen.
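
Seeing only three unique donors is not surprising. Assuming each of the \( M \) draws picks one of the \( l \) donors uniformly at random and with replacement (an assumption; the slides say only that the choice is random), the expected number of distinct donors is \( l\,(1 - (1 - 1/l)^M) \). The helper name below is hypothetical.

```python
def expected_unique_donors(l, M):
    """Expected number of distinct donors in M uniform draws with replacement from l."""
    return l * (1.0 - (1.0 - 1.0 / l) ** M)

expected = expected_unique_donors(5, 5)  # about 3.36
```

With \( l = M = 5 \), the expectation is about 3.36 distinct donors, so three unique teeth is a typical outcome.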

Example Imputations - Lower Molar 3

[Figure: example imputations of a lower molar 3]

Results - Classification Accuracy Tribe

[Figure: classification accuracy, tribe level]

Results - Classification Accuracy Species

[Figure: classification accuracy, species level]

Results - Classification Accuracy Species (with scaling)

[Figure: classification accuracy, species level, with scaling]

Results - Classification Accuracy Tribe (with scaling)

[Figure: classification accuracy, tribe level, with scaling]

Results - Log Loss

[Figure: log loss]

Results - Log Loss

[Figure: log loss]
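
For reference, log loss as conventionally defined is the average negative log of the probability assigned to the true class, so confident wrong predictions are penalized heavily. The sketch below is a generic version, not the evaluation code behind these plots. The dictionary format for predicted probabilities and the clipping constant `eps` are assumptions; clipping is needed because vote-based probabilities can be exactly zero.

```python
import math

def log_loss(true_labels, predicted_probs, eps=1e-15):
    """Mean negative log probability of the true class.

    predicted_probs: one dict per observation, mapping label -> probability.
    """
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        # Clip to [eps, 1 - eps] so that log(0) never occurs.
        p = min(max(probs.get(label, 0.0), eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(true_labels)

uninformed = log_loss(["a", "b"], [{"a": 0.5, "b": 0.5}, {"a": 0.5, "b": 0.5}])
confident_wrong = log_loss(["a"], [{"a": 0.01, "b": 0.99}])
```

Here `uninformed` equals \( \log 2 \approx 0.693 \), while the confident wrong prediction costs \( -\log 0.01 \approx 4.6 \).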

Conclusions and Future Work

  • When size and shape are preserved, the proposed hot deck type imputation works well; however, it does not work as well as k-nearest neighbor matching of partial shapes.
  • When scaling is considered, the imputation method proposed here performs better than k-nearest neighbor matching of the partial shapes in terms of log loss, but the results are still worse in terms of accuracy.
  • We are going to explore other classification models.
  • Many analyses require the full shape, so some type of imputation is necessary (e.g. computing the area of a shape, computing the Karcher mean, PCA, etc.).
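
Area is a simple case of an analysis that needs the full outline. The sketch below uses the shoelace formula, a standard result that is not taken from the slides, and the function name `polygon_area` is hypothetical. It shows that closing a partial outline with a straight chord gives a different area than the complete outline.

```python
def polygon_area(points):
    """Area of a closed polygon from its (x, y) vertices, by the shoelace formula."""
    n = len(points)
    twice_area = sum(
        points[i][0] * points[(i + 1) % n][1] - points[(i + 1) % n][0] * points[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0

full_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
partial_square = [(0, 0), (1, 0), (1, 1)]  # one corner missing, closed by a chord

full_area = polygon_area(full_square)
partial_area = polygon_area(partial_square)
```

The full square has area 1, while the chord-closed partial outline has area 0.5, so the missing portion has to be imputed before the area is meaningful.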

Cheers!

  • This work was supported by the National Science Foundation (DMS-1812124).

Bibliography

  • Matthews, G.J., Brophy, J.K., Luetkemeier, M., Gu, H., Thiruvathukal, G.K. 2018. “A comparison of machine learning techniques for taxonomic classification of teeth from the Family Bovidae.” Journal of Applied Statistics, 45(15), 2773-2787.

  • Srivastava, A., Klassen, E.P. 2016. “Functional and Shape Data Analysis.” Springer.