Preschoolers’ production of L2 vowels is affected by input quality: A longitudinal study

M. Kučerová, Š. Šimáčková

L2 input

  • Learners initially process L2 speech using L1 representations, which affects how L2 words are represented
  • Learning: updating and forming L2 representations
  • Quality of L2 input impacts creation of new representations and their updating [1]
  • FLA: less L2 input, few L2 speakers, variable accents(?)
  • Variable native input affects vowel production [2], [3]

[1] Llompart, M. (2021). Phonetic categorization ability and vocabulary size contribute to the encoding of difficult second-language phonological contrasts into the lexicon.

[2] Bohn, O.-S., & Bundgaard-Nielsen, R. L. (2009). Second language speech learning with diverse inputs.

[3] Bosch, L., & Ramon-Casas, M. (2011). Variability in vowel production by bilingual speakers: Can input properties hinder the early stabilization of contrastive categories?

Assumptions

  1. Speakers store words both as strings of phonemes and as phonetically detailed tokens in exemplar clouds [4]. This also holds for L2 learners [6].
  2. Phonological categories emerge from sound exemplar clouds [5]

[4] Nijveld, A., Ten Bosch, L., & Ernestus, M. (2022). The use of exemplars differs between native and non-native listening.

[5] Wilder, R. J. (2018). Investigating hybrid models of speech perception.

Exemplar-based Phonetic Space

Schweitzer, A. (2019). Exemplar-theoretic integration of phonetics and phonology: Detecting prominence categories in phonetic space. Journal of Phonetics, 77, 100915.

Assumptions: exemplars and categories

  1. Perceived L2 sounds can be integrated into L1 clouds
  • compatibility with equivalence classification in SLM-r [7]

Language experience modulates categories:

  • shift of cloud centre: phonetic drift
  • creation of new cloud: category formation
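
The two outcomes listed above can be illustrated with a toy exemplar store. This is only a sketch for intuition, not a model used in the study: the `Cloud` class, the `perceive` function, the distance threshold, and all numeric values are hypothetical. A perceived token is stored in the nearest existing cloud if it is close enough (which shifts that cloud's centre, i.e. drift); otherwise it seeds a new cloud (category formation).

```python
from dataclasses import dataclass, field
from statistics import fmean


@dataclass
class Cloud:
    """A labelled set of exemplars, each a (height, retraction) pair."""
    label: str
    tokens: list = field(default_factory=list)

    def centre(self):
        # Cloud centre = mean of all stored exemplars.
        return (fmean(t[0] for t in self.tokens),
                fmean(t[1] for t in self.tokens))


def perceive(clouds, token, threshold, new_label):
    """Store a token in the nearest cloud, or create a new cloud.

    Hypothetical rule: tokens within `threshold` of the nearest centre
    are absorbed (the centre shifts); more distant tokens form a new cloud.
    """
    def dist(cloud):
        cx, cy = cloud.centre()
        return ((token[0] - cx) ** 2 + (token[1] - cy) ** 2) ** 0.5

    nearest = min(clouds, key=dist)
    if dist(nearest) <= threshold:
        nearest.tokens.append(token)      # drift of an existing category
        return nearest
    new_cloud = Cloud(new_label, [token])  # formation of a new category
    clouds.append(new_cloud)
    return new_cloud
```

For example, starting from a single cloud `Cloud("L1 /e/", [(7.0, 15.6)])`, a nearby token such as `(7.4, 15.4)` with `threshold=1.0` is absorbed and moves the centre, while a distant token such as `(9.5, 14.0)` creates a second cloud.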

[6] Goldinger, S. D. (2007). A complementary-systems approach to abstract and episodic speech perception.

[7] Flege, J. E., & Bohn, O.-S. (2021). The revised speech learning model (SLM-r).

Children’s non-native language phonology

  • Shared L1~L2 sound system, bidirectional influences (interference [8], drift [9])
  • Drift in L1 vowels has been observed after as little as one hour of articulatory feedback training [10]
  • Weaker compactness of L1 categories may lead to stronger drift [10]
  • This study: children exposed to L2 around age 3 → less developed L1 phonology, weaker crosslinguistic influences, increased likelihood of new category formation.
    • BUT without immersion, the creation of L2 categories may be delayed or prevented

[8] Lee, S. A. S., & Iverson, G. K. (2012). Stop consonant productions of Korean–English bilingual children.

[9] Yang, J., & Fox, R. A. (2017). L1–L2 interactions of vowel systems in young bilingual Mandarin-English children.

[10] Kartushina, N., Hervais-Adelman, A., Frauenfelder, U. H., & Golestani, N. (2016). Mutual influences between native and non-native vowels in production: Evidence from short-term visual articulatory feedback training.

This study: input during experiment

  • In-class input from a teacher with an SSBE-like accent who separated the target vowels in quality
  • L2 input from parents: reported their English to be Czech-accented
  • We assumed that input from parents included vowel mergers typical of Moravian Czech: /ɛ, æ/ as /e/, and possibly /i,ɪ/ as /i/

Questions

How does in-class input influence pre-literate FL learners’ production of L2 and L1 vowels in terms of quality?

  1. Are /i,ɪ,ɛ,æ,ʌ/ better separated in the learners’ pronunciation of words heard in class compared to words heard mainly from parents?

  2. Does the pronunciation of /i,ɪ,ɛ,æ,ʌ/ in words heard mainly from parents change over time, i.e. do the vowels become better separated?

  3. Does production of L2 vowels change over time?

Participants and L2 exposure

  • 7 girls, 3;9 to 5;9

  • L1 Moravian Czech monolinguals (+ 1 Czech-Slovak bilingual)

  • AOL between 1;9 and 3;4 yrs; low proficiency in English

  • Weekly 45 min EFL classes (role-playing, daily activities, dancing, drawing, problem solving); teacher: L1 Czech, L2 English with an SSBE-like accent

  • School-provided audio-visual materials featuring SSBE speakers, media exposure

Data collection

  • 8 recording sessions: 2 in Czech (10 weeks apart), 6 in English
  • Picture naming task, repetition after the teacher when word not recalled
  • Mono-/disyllabic words, initial stress
    • 39 English words: SSBE /i,ɪ,ɛ,æ,ʌ/ in stressed syllable
    • 16 Czech words: /iː,i,e,a/
    • CVC context
    • Fillers with back vowels (14 English, 8 Czech)
  • New words from teacher (n=20), Old words from parents (n=19)

Measurement and analysis

  • Manual segmentation in Praat [11]; F0, F1, and F2 measurements from segments of 25–50 ms inside vowel intervals
  • F1, F2 extracted using Praat built-in LPC with the Burg algorithm
  • Normalization: ERB in phonR [13]
  • Linear mixed effects models in R [14] using lme4 [15], fitted by REML using BOBYQA optimization [16]
  • Vowel height as F1e-F0e, vowel retraction as F2e-F0e
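
The two derived measures above can be sketched as follows. The conversion formula is an assumption: it is the ERB-rate formula of Glasberg and Moore (1990), which, to our knowledge, is the conversion applied by phonR's ERB normalization, but this was not verified against the version used in the study. The function names and the sample Hz values are hypothetical.

```python
import math


def hz_to_erb(f_hz: float) -> float:
    """Convert a frequency in Hz to the ERB-rate scale.

    Assumed formula (Glasberg & Moore, 1990):
    ERB = 21.4 * log10(1 + 0.00437 * f)
    """
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def height_and_retraction(f0_hz: float, f1_hz: float, f2_hz: float):
    """Return (F1e - F0e, F2e - F0e), where the suffix 'e' marks ERB values.

    A larger F1e - F0e means a lower (more open) vowel;
    a smaller F2e - F0e means a more retracted vowel.
    """
    f0e, f1e, f2e = (hz_to_erb(f) for f in (f0_hz, f1_hz, f2_hz))
    return f1e - f0e, f2e - f0e


# Hypothetical child token of an open front vowel (values invented).
height, retraction = height_and_retraction(280.0, 1000.0, 2300.0)
print(round(height, 2), round(retraction, 2))
```

Subtracting F0e from each formant gives a speaker-intrinsic reference, so the measures remain comparable across children with different pitch ranges.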

[11] Boersma, P., & Weenink, D. (2022). Praat: doing phonetics by computer.

[13] McCloy, D. R. (2016). phonR: Tools for phoneticians and phonologists.

[14] R Core Team (2023). R: A language and environment for statistical computing.

[15] Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4.

[16] Powell, M. J. (2009). The BOBYQA algorithm for bound constrained optimization without derivatives.

English vowels

  • F1e-F0e~Vowel*Time*Input+(1|Speaker)+(1|Word)
  • F2e-F0e~Vowel*Time*Input+(1|Speaker)+(1|Word)
  • Time (levels: T1, T2), Input (levels: New, Old), Vowel: factor predictor with levels /i,ɪ,ɛ,æ,ʌ/
  • Intercept: /æ/ in New words at Time 2
  • Each of the two models was fitted to 771 vowel tokens from 39 words spoken by 7 children

Input effects

  • Are the learners’ L2 vowels better separated acoustically in newly learned words?
  • /æ/ and /ʌ/, /i/ and /ɪ/ separated to a similar degree in all words (words from class and from home were pronounced similarly)
  • Better separation of /æ/ and /ɛ/ in height in New words (/ɛ/ was raised in New words)

Model summary, English vowels, vowel height (F1e - F0e):

Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: F1e - F0e ~ Vowel * Time * Input + (1 | Speaker) + (1 | Word)
   Data: efl_En
Control: lmerControl(optimizer = "bobyqa")

REML criterion at convergence: 2910.7

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-4.2344 -0.5884  0.0313  0.6212  4.6127 

Random effects:
 Groups   Name        Variance Std.Dev.
 Word     (Intercept) 0.007136 0.08447 
 Speaker  (Intercept) 0.547916 0.74021 
 Residual             2.506805 1.58329 
Number of obs: 771, groups:  Word, 39; Speaker, 7

Fixed effects:
                      Estimate Std. Error        df t value Pr(>|t|)    
(Intercept)            8.17908    0.42215  25.34662  19.375  < 2e-16 ***
Vowelɛ                -1.65913    0.42750  65.25649  -3.881 0.000245 ***
Voweli                -6.81817    0.42429  70.57411 -16.070  < 2e-16 ***
Vowelɪ                -5.83383    0.41642  63.21942 -14.010  < 2e-16 ***
Vowelʌ                -0.60507    0.44187  77.53509  -1.369 0.174854    
TimeT1                 0.18895    0.44501  97.81900   0.425 0.672060    
Input2                -0.67679    0.36795  56.71397  -1.839 0.071101 .  
Vowelɛ:TimeT1          0.07196    0.63277  64.55410   0.114 0.909812    
Voweli:TimeT1          1.25329    0.62458  93.20023   2.007 0.047687 *  
Vowelɪ:TimeT1          0.39512    0.62386  82.57810   0.633 0.528257    
Vowelʌ:TimeT1          0.65275    0.63346  87.78026   1.030 0.305625    
Vowelɛ:Input2          1.33467    0.50248  50.62762   2.656 0.010543 *  
Voweli:Input2          0.54214    0.60875  45.82272   0.891 0.377804    
Vowelɪ:Input2          0.68234    0.59754  37.39647   1.142 0.260749    
Vowelʌ:Input2          0.45141    0.59827  41.91904   0.755 0.454745    
TimeT1:Input2         -0.01154    0.50845 140.28707  -0.023 0.981925    
Vowelɛ:TimeT1:Input2  -0.09302    0.71815  94.29086  -0.130 0.897221    
Voweli:TimeT1:Input2  -0.70617    0.83835 240.29708  -0.842 0.400441    
Vowelɪ:TimeT1:Input2  -0.43778    0.81398 201.45291  -0.538 0.591288    
Vowelʌ:TimeT1:Input2   0.35259    0.82006 202.79438   0.430 0.667686    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
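
To see how the /ɛ/ result follows from the summary above, the fixed-effect estimates can be added up into predicted cell means. This is a sketch under two assumptions: the factors are treatment-coded with the reference cell stated on the slide (/æ/, New words, Time 2), and the level printed as `Input2` corresponds to Old words. Only the terms needed for /æ/ and /ɛ/ are copied; the helper names are hypothetical.

```python
# Fixed-effect estimates copied from the F1e - F0e summary.
# Each key is the set of dummy variables that must all be "on"
# for the term to contribute to a cell mean.
COEF = {
    frozenset(): 8.17908,                      # intercept: /æ/, Time 2, reference Input
    frozenset({"ɛ"}): -1.65913,
    frozenset({"T1"}): 0.18895,
    frozenset({"Input2"}): -0.67679,
    frozenset({"ɛ", "T1"}): 0.07196,
    frozenset({"ɛ", "Input2"}): 1.33467,
    frozenset({"T1", "Input2"}): -0.01154,
    frozenset({"ɛ", "T1", "Input2"}): -0.09302,
}


def cell_mean(*active):
    """Predicted F1e - F0e for the cell where the given dummies are on."""
    on = frozenset(active)
    return sum(est for term, est in COEF.items() if term <= on)


def ae_eh_separation(*condition):
    """Height distance between /æ/ and /ɛ/ in a Time x Input condition."""
    return cell_mean(*condition) - cell_mean("ɛ", *condition)


print(round(ae_eh_separation(), 3))          # reference Input level, Time 2
print(round(ae_eh_separation("Input2"), 3))  # Input2 (assumed: Old words), Time 2
```

Under these assumptions the predicted /æ/-/ɛ/ height distance at Time 2 is about 1.66 ERB in the reference Input level and about 0.32 ERB for `Input2`, which is the pattern described as better separation in New words. These are point estimates only and say nothing about significance beyond what the summary reports.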

Model summary, English vowels, vowel retraction (F2e - F0e):

Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: F2e - F0e ~ Vowel * Time * Input + (1 | Speaker) + (1 | Word)
   Data: efl_En
Control: lmerControl(optimizer = "bobyqa")

REML criterion at convergence: 2605.1

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-4.7869 -0.5532  0.0743  0.6249  2.7322 

Random effects:
 Groups   Name        Variance Std.Dev.
 Word     (Intercept) 0.05589  0.2364  
 Speaker  (Intercept) 0.12434  0.3526  
 Residual             1.66113  1.2888  
Number of obs: 771, groups:  Word, 39; Speaker, 7

Fixed effects:
                      Estimate Std. Error        df t value Pr(>|t|)    
(Intercept)           14.58216    0.31940  38.38354  45.655  < 2e-16 ***
Vowelɛ                 0.46909    0.39701  30.08963   1.182  0.24664    
Voweli                 3.00112    0.39195  32.52439   7.657 8.98e-09 ***
Vowelɪ                 1.80708    0.38756  29.14001   4.663 6.41e-05 ***
Vowelʌ                -1.27638    0.40621  34.54353  -3.142  0.00343 ** 
TimeT1                 0.27377    0.40103  55.62807   0.683  0.49764    
Input2                 0.09848    0.34311  30.00992   0.287  0.77607    
Vowelɛ:TimeT1          0.38320    0.58404  37.11399   0.656  0.51579    
Voweli:TimeT1         -0.23492    0.56383  54.85623  -0.417  0.67856    
Vowelɪ:TimeT1          0.18168    0.56722  47.83518   0.320  0.75014    
Vowelʌ:TimeT1         -0.25691    0.57428  49.75271  -0.447  0.65656    
Vowelɛ:Input2          0.35508    0.47224  26.98158   0.752  0.45862    
Voweli:Input2         -0.64233    0.57795  23.02039  -1.111  0.27787    
Vowelɪ:Input2         -0.49674    0.57554  19.71051  -0.863  0.39848    
Vowelʌ:Input2         -0.79904    0.57098  21.97747  -1.399  0.17565    
TimeT1:Input2         -0.01022    0.45007  79.40295  -0.023  0.98194    
Vowelɛ:TimeT1:Input2  -0.63259    0.64889  52.12673  -0.975  0.33413    
Voweli:TimeT1:Input2   0.21649    0.72600 135.83916   0.298  0.76601    
Vowelɪ:TimeT1:Input2  -0.58373    0.71009 109.80786  -0.822  0.41283    
Vowelʌ:TimeT1:Input2   0.76005    0.71520 110.14986   1.063  0.29024    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

L2 /ɛ, æ, ʌ/ produced by the learners in Old words. Ellipses include 68% of tokens.

L2 /ɛ, æ, ʌ/ produced by the learners in New words. Ellipses include 68% of tokens.
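
One common way to obtain an ellipse covering roughly 68% of tokens is sketched below. It assumes the tokens of a vowel are approximately bivariate normal in the (retraction, height) plane; whether the plotted ellipses were computed exactly this way is not stated on the slides. The function name and the sample points are hypothetical.

```python
import math
from statistics import fmean


def coverage_ellipse(points, coverage=0.68):
    """Centre, semi-axes, and rotation of a coverage ellipse.

    Assumes bivariate normality. The scale factor is the square root of
    the chi-square quantile with 2 df, which has the closed form
    -2 * ln(1 - coverage).
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    n = len(points)
    mx, my = fmean(xs), fmean(ys)
    # Sample variances and covariance.
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Eigenvalues of the 2x2 covariance matrix in closed form.
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    lam_major, lam_minor = half_trace + root, max(half_trace - root, 0.0)
    # Orientation of the major axis relative to the x-axis.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    k = math.sqrt(-2.0 * math.log(1.0 - coverage))
    return (mx, my), (k * math.sqrt(lam_major), k * math.sqrt(lam_minor)), angle


centre, (semi_major, semi_minor), angle = coverage_ellipse(
    [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0), (2.0, 1.0)]
)
```

With `coverage=0.68` the scale factor is about 1.51, so each semi-axis is roughly one and a half standard deviations along its principal direction.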

Input effects: low vowels

  • /ɛ/-raising reflects the teacher’s pronunciation: the children can attend to phonetic detail in the input and draw on recently perceived input in their own production
  • Lexical misrepresentation: some /æ/ tokens were [ɛ]-like, as also reported in [17]
  • This occurred even in New words (exemplar vs. abstract representations)

[17] Šimáčková, Š., & Podlipský, V. J. (2018). Production accuracy of L2 vowels: Phonological parsimony and phonetic flexibility.

Time effects: low vowels

Does production of L2 vowels change over time?

  • Insufficient evidence for change in /ɛ, æ, ʌ/

Low English vowels at T1 (blue) and T2 (red)

Time effects: low vowels

  • No evidence of /ɛ/ raising in Old words over time: no generalisation of new experience to old representations?
  • Spectral distinction in teacher’s input not substantial enough to change the learners’ production of /æ, ʌ/? (consistent separation in all words throughout the experiment)

Time effects: high vowels

  • /i/ raised at Time 2
  • /ɪ/ became more variable at Time 2

L2 /i, ɪ/ produced by the learners at Time 1 (blue) and Time 2 (pink) across all words. Ellipses include 68% of tokens.

Czech Vowels: phonetic drift

Does production of L1 vowels change over time?

  • Vowel height: F1e-F0e~Vowel*Time+(1|Speaker)+(1|Word)
  • Vowel retraction: F2e-F0e~Vowel*Time+(1|Speaker)+(1|Word)
  • 4 vowels, 19 word types, 201 tokens total
  • /i/ raised, /iː/ raised
  • /e/ retracted, /a/ retracted
  • the relative distance between /e/ and /a/ remained similar

Model summary, Czech vowels, vowel height (F1e - F0e):

Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: F1e - F0e ~ Vowel * Time + (1 | Speaker) + (1 | Word)
   Data: dataCz
Control: lmerControl(optimizer = "bobyqa")

REML criterion at convergence: 692.7

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.87716 -0.70084  0.05063  0.69690  2.13584 

Random effects:
 Groups   Name        Variance Std.Dev.
 Word     (Intercept) 0.1074   0.3277  
 Speaker  (Intercept) 0.2007   0.4480  
 Residual             1.7279   1.3145  
Number of obs: 201, groups:  Word, 19; Speaker, 7

Fixed effects:
               Estimate Std. Error      df t value Pr(>|t|)    
(Intercept)      6.9630     0.3548 16.7963  19.625 5.14e-13 ***
Vowela           1.7117     0.4621  9.9595   3.704  0.00411 ** 
Voweli          -3.9927     0.4565 10.9725  -8.746 2.82e-06 ***
Voweliː         -4.8652     0.4699 11.4315 -10.353 3.75e-07 ***
TimeT2           1.0453     0.4878 12.4126   2.143  0.05257 .  
Vowela:TimeT2   -1.1694     0.6879  9.8533  -1.700  0.12044    
Voweli:TimeT2   -2.6379     0.6753 11.1618  -3.907  0.00238 ** 
Voweliː:TimeT2  -1.8968     0.7096 11.4791  -2.673  0.02097 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) Vowela Voweli Vowelː TimeT2 Vowela:TmT2 Voweli:TmT2
Vowela      -0.593                                                    
Voweli      -0.597  0.457                                             
Voweliː     -0.580  0.445  0.466                                      
TimeT2      -0.557  0.425  0.431  0.420                               
Vowela:TmT2  0.396 -0.668 -0.305 -0.298 -0.707                        
Voweli:TmT2  0.403 -0.307 -0.673 -0.315 -0.722  0.510                 
Vowelː:TmT2  0.382 -0.291 -0.306 -0.662 -0.687  0.486       0.504     

Model summary, Czech vowels, vowel retraction (F2e - F0e):

Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: F2e - F0e ~ Vowel * Time + (1 | Speaker) + (1 | Word)
   Data: dataCz
Control: lmerControl(optimizer = "bobyqa")

REML criterion at convergence: 592.5

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.6424 -0.6003  0.0871  0.5569  1.9062 

Random effects:
 Groups   Name        Variance Std.Dev.
 Word     (Intercept) 0.02976  0.1725  
 Speaker  (Intercept) 0.24645  0.4964  
 Residual             1.02201  1.0109  
Number of obs: 201, groups:  Word, 19; Speaker, 7

Fixed effects:
               Estimate Std. Error      df t value Pr(>|t|)    
(Intercept)     15.6148     0.2864 16.2353  54.513  < 2e-16 ***
Vowela          -1.6422     0.3143 10.5500  -5.225 0.000325 ***
Voweli           1.0790     0.3122 11.4783   3.457 0.005051 ** 
Voweliː          2.4007     0.3220 12.0443   7.455 7.52e-06 ***
TimeT2          -0.9183     0.3366 13.9409  -2.728 0.016376 *  
Vowela:TimeT2   -0.2385     0.4675 10.5500  -0.510 0.620478    
Voweli:TimeT2    0.4483     0.4624 11.9056   0.969 0.351600    
Voweliː:TimeT2   0.8686     0.4871 12.5815   1.783 0.098677 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) Vowela Voweli Vowelː TimeT2 Vowela:TmT2 Voweli:TmT2
Vowela      -0.521                                                    
Voweli      -0.521  0.472                                             
Voweliː     -0.504  0.459  0.470                                      
TimeT2      -0.480  0.434  0.439  0.426                               
Vowela:TmT2  0.346 -0.667 -0.314 -0.307 -0.716                        
Voweli:TmT2  0.350 -0.315 -0.671 -0.316 -0.727  0.521                 
Vowelː:TmT2  0.331 -0.298 -0.307 -0.660 -0.690  0.495       0.506     
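
The direction of change for each Czech vowel can be read off the two summaries above by adding the `TimeT2` estimate to the relevant `Vowel:TimeT2` interaction. The sketch assumes treatment coding with /e/ at T1 as the reference cell, which is what the coefficient names suggest; the dictionary and function names are hypothetical. The results are point estimates only, not tests of each simple effect.

```python
# Time-related estimates copied from the two Czech model summaries.
# "base" is the TimeT2 main effect, i.e. the T1-to-T2 change for the
# reference vowel /e/; the other entries are Vowel:TimeT2 interactions.
F1_TIME = {"base": 1.0453, "a": -1.1694, "i": -2.6379, "iː": -1.8968}
F2_TIME = {"base": -0.9183, "a": -0.2385, "i": 0.4483, "iː": 0.8686}


def change_t1_to_t2(coefs, vowel):
    """Estimated T1-to-T2 change for one vowel (ERB difference units)."""
    interaction = 0.0 if vowel == "e" else coefs[vowel]
    return coefs["base"] + interaction


for v in ("e", "a", "i", "iː"):
    d_height = change_t1_to_t2(F1_TIME, v)      # negative = raising
    d_retraction = change_t1_to_t2(F2_TIME, v)  # negative = retraction
    print(v, round(d_height, 3), round(d_retraction, 3))
```

Read this way, the F1e - F0e estimates decrease for /i/ and /iː/ (raising), and the F2e - F0e estimates decrease most clearly for /e/ and /a/ (retraction), in line with the bullet points on this slide.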

Vowels produced in Czech words at T1 (green) and T2 (red). Ellipses use 68% CIs.

Phonetic drift: high L1 and L2 vowels

  • L1 /iː, i/ came closer to L2 /i/ in height (large overlap)
  • the distance between L1 /iː, i/ and L2 /ɪ/ increased (category centres)

English (blue) and Czech (red) high vowels produced by learners at T1.

English (blue) and Czech (red) high vowels produced by learners at T2.

Phonetic drift: low vowels

  • L1 /e/ assimilated to L2 /æ/
  • L1 /e/ dissimilated from L2 /ɛ/ in New words
  • L1 /a/ dissimilated from L2 /æ/
  • L1 /a/ assimilated to L2 /ʌ/

English /ɛ, æ, ʌ/ in New words at T1 and Czech /e/ at T1

English /ɛ, æ, ʌ/ in New words at T2 and Czech /e/ at T2.

Summary

  • In-class input influences pre-literate FL learners’ production of L1 and L2 vowels
    • Recent input affected the learners’ L2 output
    • New words: /ɛ/ and /æ/ were separated in quality due to the raising of /ɛ/

  • No evidence for a cumulative effect of classroom exposure on the production of Old words
    • /ɛ/-raising from New words not present in Old words

  • Production of L1 vowels shifted over time
    • Learners’ vowels undergo continual changes throughout L2 exposure, even with time-constrained FL classroom exposure

Limitations

  • Parental input not controlled
  • Amount of exposure necessary for updating representations of Old words?
  • Children were worse at remembering New words than Old words
  • Perception test
  • No tracking of New word development

References

[1] Llompart, M. (2021). Phonetic categorization ability and vocabulary size contribute to the encoding of difficult second-language phonological contrasts into the lexicon. Bilingualism: Language and Cognition, 24(3), 481-496. doi:10.1017/S1366728920000656

[2] Bohn, O.-S., & Bundgaard-Nielsen, R. L. (2009). Second language speech learning with diverse inputs. In T. Piske, & M. Young-Scholten (Eds.), Input matters in SLA (pp. 207-218). Multilingual Matters. https://pure.au.dk/portal/en/publications/second-language-speech-learning-with-diverse-inputs

[3] Bosch, L., & Ramon-Casas, M. (2011). Variability in vowel production by bilingual speakers: Can input properties hinder the early stabilization of contrastive categories? Journal of Phonetics, 39(4), 514–526. doi:10.1016/j.wocn.2011.02.001

[4] Nijveld, A., Ten Bosch, L., & Ernestus, M. (2022). The use of exemplars differs between native and non-native listening. Bilingualism: Language and Cognition, 25(5), 841-855. doi:10.1017/S1366728922000116

[5] Wilder, R. J. (2018). Investigating hybrid models of speech perception. Publicly Accessible Penn Dissertations. 3202. https://repository.upenn.edu/edissertations/3202

[6] Goldinger, S. D. (2007). A complementary-systems approach to abstract and episodic speech perception. In Proceedings of the 16th international congress of phonetic sciences (pp. 49-54).

[7] Flege, J. E., & Bohn, O.-S. (2021). The revised speech learning model (SLM-r). In R. Wayland (Ed.), Second language speech learning: Theoretical and empirical progress (pp. 3–83). Cambridge University Press. doi:10.1017/9781108886901

[8] Lee, S. A. S., & Iverson, G. K. (2012). Stop consonant productions of Korean–English bilingual children. Bilingualism: Language and Cognition, 15(2), 275-287.

[9] Yang, J., & Fox, R. A. (2017). L1–L2 interactions of vowel systems in young bilingual Mandarin-English children. Journal of Phonetics, 65, 60-76. doi:10.1515/phon-2021-2006

[10] Kartushina, N., Hervais-Adelman, A., Frauenfelder, U. H., & Golestani, N. (2016). Mutual influences between native and non-native vowels in production: Evidence from short-term visual articulatory feedback training. Journal of Phonetics, 57, 21-39. doi:10.1016/j.wocn.2016.05.001

[11] Boersma, P., & Weenink, D. (2022). Praat: doing phonetics by computer [Computer software]. Version 6.2.23. http://www.praat.org/

[13] McCloy, D. R. (2016). phonR: Tools for phoneticians and phonologists. R package version 1.0-7. Online: https://cran.r-project.org/web/packages/phonR/phonR.pdf

[14] R Core Team (2023). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/

[15] Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01

[16] Powell, M. J. (2009). The BOBYQA algorithm for bound constrained optimization without derivatives. Cambridge NA Report NA2009/06. University of Cambridge, Cambridge, 26. https://www.damtp.cam.ac.uk/user/na/NA_papers/NA2009_06.pdf

[17] Šimáčková, Š., & Podlipský, V. J. (2018). Production accuracy of L2 vowels: Phonological parsimony and phonetic flexibility. Research in Language, 16(2). doi:10.2478/rela-2018-0009

Thank you for your attention

kucerova@psu.cas.cz