-We assume learners have already converged on a first set of phones, which correspond to physical instantiations of phonemes in various linguistic contexts.
This phonetic variation is largely driven by the phenomenon of coarticulation, that is, when vocal tract gestures for one sound overlap with gestures for another.
-Experimental designs probing phoneme learning in infant studies compare young infants’ behavior (e.g., looking time) when presented with a pair of contrastive sounds (e.g., da-ta) to their behavior when presented with a pair of phones that are not contrastive in their native language (e.g., da-ɖa).
-We use a task similar in spirit to the laboratory experiment, in order to evaluate how the learning mechanisms fare in learning the phonemic status of phones:
-> For each corpus, we list all possible combinations of pairs of phones. Some of these pairs are instantiations of the same phoneme and are labeled “0” (non-contrastive), and others are instantiations of different phonemes and are labeled “1” (contrastive).
-> Each of these pairs is then assigned a score by each of the cues under investigation.
-> These scores should allow the learner to rank contrastive pairs higher than non-contrastive ones.
-> Since these scores are continuous, we compute the Receiver Operating Characteristic (ROC), which illustrates the performance of the binary classification that results from the cue at hand and a discrimination threshold (which we vary across values).
-> Finally, the overall performance of the mechanism is summarized through the Area Under the ROC Curve (AUC).
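-The evaluation steps above can be sketched as follows. This is a minimal illustration, not the original pipeline: the toy inventory, the cue scores, and the function names `label_pairs` and `auc` are all hypothetical. The AUC is computed here through its rank interpretation (the probability that a randomly drawn contrastive pair scores higher than a randomly drawn non-contrastive pair), which is equal to the area obtained by sweeping the threshold along the ROC curve.

```python
from itertools import combinations


def label_pairs(phone_to_phoneme):
    """Label every unordered pair of phones.

    1 = contrastive (the two phones realize different phonemes),
    0 = non-contrastive (both realize the same phoneme).
    """
    phones = sorted(phone_to_phoneme)
    return {
        (p, q): int(phone_to_phoneme[p] != phone_to_phoneme[q])
        for p, q in combinations(phones, 2)
    }


def auc(scores, labels):
    """Area under the ROC curve via pairwise ranking; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one pair of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy inventory: four phones realizing two phonemes.
phone_to_phoneme = {"d1": "d", "d2": "d", "t1": "t", "t2": "t"}
gold = label_pairs(phone_to_phoneme)

# Hypothetical cue scores (higher = more likely contrastive).
cue = {
    ("d1", "d2"): 0.10, ("d1", "t1"): 0.90, ("d1", "t2"): 0.80,
    ("d2", "t1"): 0.70, ("d2", "t2"): 0.05, ("t1", "t2"): 0.20,
}

pairs = sorted(gold)
result = auc([cue[p] for p in pairs], [gold[p] for p in pairs])
print(result)  # 0.75: one contrastive pair is ranked below both non-contrastive pairs
```

An AUC of 0.5 corresponds to chance-level ranking, and 1.0 to a cue that ranks every contrastive pair above every non-contrastive one.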
-Bottom-up cue:
-> The acoustic cue applies to all pairs, since each phone has its own HMM.
-Top-down cue:
-> The top-down cue does not apply to all pairs, only to those pairs that give rise to word-form variation. Since the variation in the data is due to coarticulation, word-form variation occurs at the first or last segment.
-> We also test how this subset generalizes to the full set of contrasts, using a basic matrix completion algorithm.
-For generalized pairs, “features” refers to the dimension to which the matrix was reduced (using SVD).
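-The notes do not specify which matrix completion algorithm was used. As one possible illustration, the sketch below fills the unobserved entries of a phone-by-phone score matrix by iterated truncated SVD, where `rank` plays the role of the reduced dimension. The function name `complete_low_rank`, the mean initialization, and the iteration count are assumptions made for this example only.

```python
import numpy as np


def complete_low_rank(scores, observed, rank, n_iter=100):
    """Fill unobserved entries using an iterated rank-`rank` SVD approximation.

    scores   : 2-D array of cue scores (values at unobserved cells are ignored)
    observed : boolean mask, True where the top-down cue applies
    rank     : number of singular dimensions kept
    """
    observed = observed.astype(bool)
    # Start by filling the gaps with the mean of the observed scores.
    filled = np.where(observed, scores, scores[observed].mean())
    for _ in range(n_iter):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
        # Keep observed scores fixed; update only the missing cells.
        filled = np.where(observed, scores, low_rank)
    return filled


# Hypothetical 3x3 score matrix with one unobserved cell.
full = np.outer([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mask = np.ones((3, 3), dtype=bool)
mask[0, 2] = False
partial = np.where(mask, full, 0.0)

completed = complete_low_rank(partial, mask, rank=1)
print(completed.round(2))
```

The completed scores can then be ranked and evaluated with the same ROC/AUC procedure as the directly observed pairs.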
-The bottom-up cue performs better and has a wider scope than the top-down cue.
-Using a basic matrix completion algorithm, we show that phonemic information can, in principle, be generalized to the totality of pairs, but only when variation is not too extreme.
-The previous results tested pair ranking with respect to the gold phonemic categorization. However, this leaves open the question of the hierarchy level.
-We test whether the acoustic and top-down cues encode a preference for the phonemic categorization. To this end, we evaluate how the cues fare under various hierarchical categorizations (hypotheses in the motor space).
-The task is the same as before, except that we vary the gold standard against which we compute the ROC score. In part 1, this gold standard was the phonemes (“1” for a true phonemic contrast, and “0” for a true allophonic contrast). In part 2, we apply a similar labeling at different levels of the hierarchy. For example, for a hierarchy level with only Consonants and Vowels, we label “1” pairs of any two consonants (C/C) or any two vowels (V/V), and “0” pairs of a consonant and a vowel (C/V).
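-The Consonant/Vowel example above can be sketched as a relabeling step. The code follows the labeling convention exactly as stated in the example (“1” for two phones of the same class, “0” otherwise); the function name `label_pairs_at_level` and the toy inventory are hypothetical.

```python
from itertools import combinations


def label_pairs_at_level(phone_to_class):
    """Gold labels for one hierarchy level, following the C/V example:
    1 if both phones fall in the same class (C/C or V/V), 0 otherwise (C/V).
    """
    phones = sorted(phone_to_class)
    return {
        (p, q): int(phone_to_class[p] == phone_to_class[q])
        for p, q in combinations(phones, 2)
    }


# Hypothetical toy inventory at the Consonant/Vowel level.
cv_level = {"p": "C", "t": "C", "a": "V", "i": "V"}
cv_gold = label_pairs_at_level(cv_level)

for pair in sorted(cv_gold):
    print(pair, cv_gold[pair])
```

Swapping in a different `phone_to_class` mapping (for instance, one class per place of articulation) yields the gold standard for another level, against which the same cue scores can be re-evaluated.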
-… as a function of various initial sizes of the phonetic inventory (2, 4, 8, 16).
-The idea is that each hierarchy level instantiates a hypothesis in the motor space. Here I used only one hierarchical representation, although the actual space of hypotheses can be larger.