Overview / Motivation

I wanted to work on something speech-related for this lab, which led me to the Speech Accent Archive - a database with lots of speech files of native and non-native English speakers reading the same passage[1]. Crucially, the native languages of the speakers vary, which allows us to compare English productions across different linguistic backgrounds.

For this lab, I decided to focus on Hong Kong English (HKE). One of the unique aspects of HKE is its blend of influences from both British and American English. While HKE has historically been based on British English (Hong Kong was a British colony until 1997), American English has had a growing influence in recent years[2], as younger Hong Kongers increasingly turn to American media to learn English. For the purposes of this lab, American and British English are simplified to General American (GA) and Received Pronunciation (RP), respectively (for a justification, see the notes at the end).

So nowadays, is HKE “more like” GA or RP? I found some research that compares HKE to one or the other, but not to both[3]. Of course, the answer is hard to pin down, since it depends on the individual speaker (e.g. whether they consume American media) and on the particular phonetic context (e.g. they may produce some sounds like RP, but others like GA). Using the Speech Accent Archive, though, we can look at specific cases and see if there are any patterns!

So, the goal for this lab is to look at some specific phonetic contexts and see, visually, if HKE tends to lean towards GA or RP for any of them.

Visualizations

I was particularly interested in comparing vowel data, so I looked at the different vowel productions that were elicited in the Speech Accent Archive. Here’s an excerpt from the passage that all speakers were asked to read:

“Please call Stella. Ask her to bring these things with her from the store…”

In the end, I decided on 3 vowels: the vowel in CALL, the vowel in ASK, and the (sometimes rhotacized) vowel in STORE. I gathered formant measurements (F1, F2, and F3) for each of these vowels from 8 male and 8 female speakers from each of US, UK, and HK (Hong Kong), for a total of 16 * 3 = 48 speakers.

# formant measurements (Hz) per vowel: V1 = CALL, V2 = ASK, V3 = STORE
V1_data <- read.csv('V1_formant_data.csv')
V2_data <- read.csv('V2_formant_data.csv')
V3_data <- read.csv('V3_formant_data.csv')

I then used the phonR package[4] to plot the vowels for visual comparison.

library(phonR) 

Below are the visualizations for each of these vowel plots, along with a bit of explanation/analyses. For more notes on my rationale/methodology, see the section “Notes on Methodology” at the end.

CALL

For the vowel in CALL, GA speakers tend to use the low back vowel [ɑ], whereas RP speakers tend to use the rounded version [ɒ]. To compare the HKE productions with the GA and RP ones, I plot them all on the same vowel chart. Before plotting the formant values, though, I normalize them using the Bark Difference Method (see the notes section at the end for an explanation).

# convert the Hz values to Bark
V1_bark <- data.frame(with(V1_data, normBark(cbind(F1, F2, F3))))

# add the Bark differences to the data frame:
# normed_F1 = Z3 - Z1 (height), normed_F2 = Z3 - Z2 (frontness)
V1_data$normed_F1 <- V1_bark$F3 - V1_bark$F1
V1_data$normed_F2 <- V1_bark$F3 - V1_bark$F2

Following the normalization scheme, I then plot the Bark differences F3 - F1 and F3 - F2 instead of the raw F1 and F2 you would typically see on a vowel chart. The result still reads like an ordinary vowel chart, with the axes oriented to show frontness and height, but with normalized values:

with(V1_data, plotVowels(normed_F1, normed_F2, country, 
                         var.col.by=country, 
                         plot.tokens=TRUE, alpha.tokens=0.45,
                         plot.means=TRUE, pch.means=country, cex.means=3,
                         legend.kwd='topleft', pretty=TRUE, 
                         family = "Gentium Plus Bold", 
                         ellipse.line = FALSE, ellipse.conf = 0.68,
                         ellipse.fill=TRUE, fill.opacity=0.1, 
                         xlim=c(9.5, 3.5), ylim=c(13, 5.5), 
                         xlab="F3 - F2 (Bark)", ylab="F3 - F1 (Bark)",
                         main="Normalized Formants for the Vowel in CALL", cex.main=1.3))

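Each of the lighter dots represents one token, and each color represents a different country. The bold labels “US”, “UK”, and “HK” mark each country's mean, and the shaded ellipses are drawn with ellipse.conf = 0.68, so each is meant to enclose about 68% of a country's tokens (for a bivariate normal distribution, that is roughly 1.5 standard deviations from the mean rather than one).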
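The plotting code sets ellipse.conf = 0.68. For a bivariate normal distribution, the share of points within k standard deviations (Mahalanobis distance k) of the mean is 1 - exp(-k²/2), so a 68% ellipse is wider than one standard deviation. A quick base-R check, assuming phonR scales its ellipses by the chi-squared quantile with 2 degrees of freedom (the usual construction for such ellipses):

```r
# Radius, in standard deviations, of a bivariate normal ellipse with the given coverage
# (assumes the ellipse is scaled by the chi-squared quantile with 2 degrees of freedom).
ellipse_radius_sd <- function(conf) sqrt(qchisq(conf, df = 2))

# Coverage of an ellipse whose radius is k standard deviations.
ellipse_coverage <- function(k) 1 - exp(-k^2 / 2)

ellipse_radius_sd(0.68)  # about 1.51 SDs
ellipse_coverage(1)      # about 0.393: a 1-SD ellipse holds roughly 39% of the data
```

Under that assumption, true one-standard-deviation ellipses would need ellipse.conf of about 0.3935 instead of 0.68.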

As can be seen, the US and UK data points occupy distinct regions of the plot, and, interestingly, the HK mean lies right between them! That’s not to say that all the HK speakers produced vowels “in between” the UK and US ones; rather, some produced US-like vowels and some produced UK-like vowels, which is really neat to see visually.

ASK

For the vowel in ASK, GA speakers tend to use the low front vowel [æ], whereas RP speakers tend to use the low back vowel [ɑ]. Once again, I use the Bark Difference Method to normalize the formant values.

# convert the Hz values to Bark
V2_bark <- data.frame(with(V2_data, normBark(cbind(F1, F2, F3))))

# add the Bark differences to the data frame:
# normed_F1 = Z3 - Z1 (height), normed_F2 = Z3 - Z2 (frontness)
V2_data$normed_F1 <- V2_bark$F3 - V2_bark$F1
V2_data$normed_F2 <- V2_bark$F3 - V2_bark$F2
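The conversion and difference steps are identical for CALL and ASK, so they could be folded into a small helper. This is a sketch: `add_bark_diffs` and `hz_to_bark` are hypothetical names, and `hz_to_bark` hand-codes Traunmüller's (1990) Bark formula as a stand-in that we assume matches phonR's normBark; passing `to_bark = phonR::normBark` uses the package function instead.

```r
# Traunmüller's (1990) Hz-to-Bark formula; assumed to match phonR::normBark.
hz_to_bark <- function(f) 26.81 * f / (1960 + f) - 0.53

# Hypothetical helper: append the Bark-difference columns used in the vowel plots.
# normed_F1 = Z3 - Z1 (height dimension), normed_F2 = Z3 - Z2 (frontness dimension).
add_bark_diffs <- function(df, to_bark = hz_to_bark) {
  z1 <- to_bark(df$F1)
  z2 <- to_bark(df$F2)
  z3 <- to_bark(df$F3)
  df$normed_F1 <- z3 - z1
  df$normed_F2 <- z3 - z2
  df
}

# Usage with the lab's data:
# V1_data <- add_bark_diffs(read.csv('V1_formant_data.csv'))
# V2_data <- add_bark_diffs(read.csv('V2_formant_data.csv'), to_bark = phonR::normBark)
```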

I then plot them:

with(V2_data, plotVowels(normed_F1, normed_F2, country, 
                         var.col.by = country, 
                         plot.tokens=TRUE, alpha.tokens=0.45, 
                         plot.means=TRUE, pch.means=country, cex.means=3, 
                         legend.kwd = 'topleft', pretty=TRUE,
                         family = "Gentium Plus Bold",
                         ellipse.line = FALSE, ellipse.conf = 0.68, 
                         ellipse.fill=TRUE, fill.opacity=0.1, 
                         xlab="F3 - F2 (Bark)", ylab="F3 - F1 (Bark)", 
                         main="Normalized Formants for the Vowel in ASK", cex.main=1.3))

Once again, the UK and US data points occupy distinct regions (consistent with the expected dialectal difference). This time, though, the HK mean is closer to the UK mean, and the HK data points also seem to match the UK ellipse more closely. In this context, then, the HKE vowel seems to be more like the RP one.

STORE

This vowel is different because it involves formant movement. In this word, GA tends to use a rhotacized, or “r-colored”, vowel (something like [ɔɹ]), whereas RP does not (instead using something like [ɔə]). Acoustically, r-coloring corresponds to a sharp drop in F3 during the rhotacized part of the vowel (in most cases, I also noticed F2 rising to “meet” F3). To capture this movement, I took F2 and F3 measurements at two points: one at the beginning of the vowel and one at the end. In the GA speech, F3 should drop sharply (and F2 may rise) between these two points, while in the RP speech the formants should stay roughly the same.
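A minimal sketch of the movement measure just described, assuming (as in the plotting code further down) that F2 and F3 hold the measurements at vowel onset and F2_glide and F3_glide those at vowel offset. `add_movement` is a hypothetical helper, and the toy values are made up for illustration:

```r
# Hypothetical helper: per-token change in F2 and F3 (offset minus onset), in Hz.
add_movement <- function(df) {
  df$dF3 <- df$F3_glide - df$F3  # expected to be strongly negative for r-colored (GA-like) tokens
  df$dF2 <- df$F2_glide - df$F2  # often positive if F2 rises to "meet" F3
  df
}

# Toy values for illustration only (not measurements from the archive).
toy <- data.frame(country  = c("US", "UK"),
                  F2       = c(900, 850),   F2_glide = c(1200, 880),
                  F3       = c(2500, 2600), F3_glide = c(1800, 2550))
add_movement(toy)[, c("country", "dF2", "dF3")]
```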

This plot will be noticeably different from the last two:

  1. Firstly, since it involves formant movement, each data point is displayed as a vector (an arrow from the first measurement to the second).
  2. Secondly, the data is not normalized, because I didn’t know how to apply the Bark Difference Method to F3 itself (the method uses F3 as the reference for normalizing F1 and F2, so it's unclear how F3 would be normalized). Still, visualizing the vectors might be insightful anyway.
  3. Lastly, since plotting all the vectors on a single graph looks pretty messy, I plot each speaker group separately first. I use tidyverse to isolate each country’s data:
library(tidyverse)

# one data frame per country
V3_US_data <- V3_data %>%
  filter(country == "US")
V3_UK_data <- V3_data %>%
  filter(country == "UK")
V3_HK_data <- V3_data %>%
  filter(country == "HK")
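Base R's split() does the same job in one call, returning a named list with one data frame per country. A sketch on toy data (for the lab's data this would be `split(V3_data, V3_data$country)`):

```r
# Toy data standing in for V3_data (values are made up).
toy <- data.frame(country = c("US", "UK", "HK", "HK"),
                  F3 = c(2500, 2600, 2400, 2550))

# One data frame per country; list names follow the sorted country labels.
by_country <- split(toy, toy$country)
names(by_country)    # "HK" "UK" "US"
nrow(by_country$HK)  # 2
```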

Then I plot the vectors for each, side-by-side:

par(mfrow=c(1,3))  # three panels side by side: US, UK, HK

with(V3_US_data, plotVowels(cbind(F3, F3_glide), cbind(F2, F2_glide),
                            diph.arrows = TRUE,
                            ylim=c(1600,3200), xlim=c(750,1525),
                            plot.tokens=TRUE, alpha.tokens=0.5, 
                            plot.means=TRUE, pch.means=country,
                            cex.means=3.0, pretty=FALSE, 
                            family="Gentium Plus Bold",
                            xlab="F2 (Hertz)", ylab="F3 (Hertz)",
                            cex.main=2.0, col="olive drab", 
                            diph.args.tokens=list(col='olive drab',length=0.1, angle=20),
                            diph.args.means=list(lwd=3, length=0.15)))

with(V3_UK_data, plotVowels(cbind(F3, F3_glide), cbind(F2, F2_glide),
                            diph.arrows = TRUE,
                            ylim=c(1600,3200), xlim=c(575,1400),
                            plot.tokens=TRUE, alpha.tokens=0.5, 
                            plot.means=TRUE, pch.means=country,
                            cex.means=3.0, pretty=FALSE, 
                            family="Gentium Plus Bold",
                            xlab="F2 (Hertz)", ylab="F3 (Hertz)",
                            main="F3 and F2 Movement for the Vowel in STORE",
                            cex.main=1.5, col="steel blue", 
                            diph.args.tokens=list(col='steel blue',length=0.1, angle=20),
                            diph.args.means=list(lwd=3, length=0.15)))

with(V3_HK_data, plotVowels(cbind(F3, F3_glide), cbind(F2, F2_glide),
                            diph.arrows = TRUE,
                            plot.tokens=TRUE, alpha.tokens=0.5, 
                            ylim=c(1600,3200), xlim=c(650,1650),
                            plot.means=TRUE, pch.means=country,
                            cex.means=3.0, pretty=FALSE, 
                            family="Gentium Plus Bold",
                            xlab="F2 (Hertz)", ylab="F3 (Hertz)", 
                            cex.main=2.0, col="coral4",
                            diph.args.tokens=list(col='darkred',length=0.1, angle=20),
                            diph.args.means=list(lwd=3, length=0.15)))

The US vectors confirm that F3 drops sharply in GA, while the UK vectors point in scattered directions but are short, confirming a lack of major movement. The HK vectors, meanwhile, mostly point in the same direction as the US vectors (indicating drops in F3), but they are shorter, and there are also some short, scattered vectors (more like the UK ones) that are hard to see.
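The three panels above are drawn with different F2 ranges (xlim = c(750,1525), c(575,1400), and c(650,1650)), so arrow lengths are only roughly comparable from panel to panel. One option is to compute shared limits from all countries' data and pass the same xlim and ylim to every plotVowels() call. A sketch, where `shared_limits` is a hypothetical helper and the 50 Hz padding is an arbitrary choice:

```r
# Hypothetical helper: one set of axis limits (Hz) covering all countries' onset and
# offset values, padded by `pad` Hz on each side. Returns c(low, high).
shared_limits <- function(df, cols, pad = 50) {
  r <- range(unlist(df[cols]), na.rm = TRUE)
  c(r[1] - pad, r[2] + pad)
}

# Usage with the lab's data (column names as in the plotting code above):
# f2_lim <- shared_limits(V3_data, c("F2", "F2_glide"))
# f3_lim <- shared_limits(V3_data, c("F3", "F3_glide"))
# then pass xlim = f2_lim and ylim = f3_lim to each of the three plotVowels() calls.
```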

Again, plotting all these vectors in the same space looks a bit messy. Below are some less cluttered alternatives, each comparing only a subset of the data:

Comparison of STORE

US vs HK

# Only take US and HK measurements
V3_US_HK_data <- V3_data %>%
  filter(country %in% c("US", "HK"))

with(V3_US_HK_data, plotVowels(cbind(F3, F3_glide), cbind(F2, F2_glide), country,
                            diph.arrows = TRUE, var.col.by = country,
                            plot.tokens=TRUE, alpha.tokens=0.5, 
                            ylim=c(1600,2800), xlim=c(650,1650),
                            plot.means=TRUE, pch.means=country,
                            cex.means=2.5, pretty=FALSE, 
                            family="Gentium Plus Bold",
                            xlab="F2 (Hertz)", ylab="F3 (Hertz)", cex.main=1.5,
                            main = "US vs HK Formant Movements for the Vowel in STORE",
                            diph.args.tokens=list(length=0.1, angle=20),
                            diph.args.means=list(lwd=3, length=0.15)))

UK vs HK

# Only take UK and HK measurements
V3_UK_HK_data <- V3_data %>%
  filter(country %in% c("UK", "HK"))

with(V3_UK_HK_data, plotVowels(cbind(F3, F3_glide), cbind(F2, F2_glide), country,
                            diph.arrows = TRUE, var.col.by = country,
                            plot.tokens=TRUE, alpha.tokens=0.5, 
                            ylim=c(1600,2800), xlim=c(650,1650),
                            plot.means=TRUE, pch.means=country,
                            cex.means=2.5, pretty=FALSE, 
                            family="Gentium Plus Bold",
                            xlab="F2 (Hertz)", ylab="F3 (Hertz)", cex.main=1.5,
                            main = "UK vs HK Formant Movements for the Vowel in STORE",
                            diph.args.tokens=list(length=0.1, angle=20),
                            diph.args.means=list(lwd=3, length=0.15)))

US vs UK vs HK - Means

with(V3_data, plotVowels(cbind(F3, F3_glide), cbind(F2, F2_glide), country,
                            diph.arrows = TRUE, var.col.by = country,
                            plot.tokens=FALSE, alpha.tokens=0.5, 
                            ylim=c(1600,2800), xlim=c(650,1650),
                            plot.means=TRUE, pch.means=country,
                            cex.means=3.0, pretty=FALSE, 
                            family="Gentium Plus Bold",
                            xlab="F2 (Hertz)", ylab="F3 (Hertz)", 
                            cex.main=1.5,
                            main = "Formant Movement Means for the Vowel in STORE",
                            diph.args.tokens=list(length=0.1, angle=20),
                            diph.args.means=list(lwd=3, length=0.15)))

It seems a simple majority of the HK vectors point in the same direction as the US ones (indicating similar formant movements), but many are also like the short, scattered UK vectors. The final plot of the mean movements, while neat-looking, is somewhat misleading: the HK mean vector points in the same direction as the US one, but only some of the HK tokens do. The mean's direction is driven by the larger, US-like movements, while the small, scattered UK-like movements largely cancel out.
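A toy calculation shows how this can happen. The numbers below are made up for illustration (not archive measurements): two GA-like tokens with large F3 drops and two RP-like tokens with small movements in opposite directions.

```r
# Offset minus onset, in Hz, for four hypothetical HK tokens.
dF3 <- c(-600, -500,  40, -30)
dF2 <- c( 250,  200, -60,  50)

mean(dF3)  # -272.5: the mean arrow still points clearly towards lower F3
mean(dF2)  #  110
```

Only two of the four tokens are GA-like, yet the mean vector looks GA-like, because the small RP-like movements contribute little to the average.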

Overall, it seems this is similar to the case for the vowel in CALL. Some HK speakers lean towards the rhotic GA, and some towards non-rhotic RP, which is a really cool (tentative) result. Of course, it’s hard to say for sure from this limited data - I only had 16 HK speakers.

Conclusions

I’m really pleased with how the visualizations turned out. Based on them, it seems that for the vowels in CALL and STORE, the HKE speakers are fairly evenly split between GA-like and RP-like pronunciations, which suggests that both GA and RP currently influence HKE. For the vowel in ASK, the HKE speakers are more aligned with RP.

Of course, all observations made in this lab are tentative, since I didn’t have a lot of speakers, or many tokens per vowel. Aside from having more of each, here are other potential future directions I can think of:

Notes on Methodology

Setup / Data Used

How did I decide on these 3 vowels?

I picked these three for the following reasons:

  1. They demonstrate known vowel differences between General American (GA) and Received Pronunciation (RP), the “standard” dialects of the US and UK[5].
  2. The vowels used in GA and RP for these sounds are not found in the Cantonese vowel inventory, so the HKE speakers are not simply transferring their native phonology[6].
  3. The phonetic contexts were reasonably clean (e.g., for ASK, the vowel comes after a pause and before [s]).

How were the recordings chosen?

To choose speakers based on location, I simplified “American English” to General American English (GA), which is often used as a baseline for comparing varieties of American English. I thought this was a good choice for a number of reasons: GA consistently demonstrates the rhotacized vowel I’m interested in, generally avoids the more complex vowel movements found in Southern US dialects[7], and is probably closest to the kind of speech that influences HKE.

GA most resembles English from the North Midland, Western New England, and Western US regions[8], so I picked speakers from these regions, but avoided cities like Boston and Brooklyn, whose local dialects are known to sometimes share features with RP[9].

Likewise, I simplified British English to Received Pronunciation (RP), which is most common in south-east England (e.g. London, Oxford, Cambridge)[10]. I tried to choose speakers from areas close to this region, and avoided speakers from elsewhere in the British Isles (e.g. Scotland, Ireland).

How many speakers?

Researchers typically collect 5-10 samples per vowel, but the Speech Accent Archive’s passage doesn’t contain that many instances of each vowel, and the archive has a limited number of female speakers from Hong Kong and from the parts of the UK I wanted to restrict to. Given these limitations, I decided on 8 male and 8 female speakers from each of US, UK, and HK (Hong Kong), for a total of 16 * 3 = 48 speakers.
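The intended design (8 male and 8 female speakers per country, 48 in total) can be checked directly on the data. A sketch, assuming hypothetical `speaker` and `sex` columns (the lab's CSVs may name or code these differently):

```r
# Hypothetical check of the speaker design: per_cell speakers per country x sex.
# Assumes one row per token with (hypothetical) columns speaker, country, and sex.
check_design <- function(df, per_cell = 8) {
  speakers <- unique(df[, c("speaker", "country", "sex")])  # one row per speaker
  counts <- table(speakers$country, speakers$sex)           # speakers per country x sex
  list(counts = counts,
       balanced = all(counts == per_cell),
       total = nrow(speakers))
}

# Usage (column names are assumptions): check_design(V1_data)
```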

Collecting/Plotting the Data

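Analyzing vowels is a common task in sociophonetics. Typically, researchers first collect formant data (the resonant frequencies of the vocal tract) for each vowel, and then plot them on a vowel chart whose axes are reversed, following phonetic convention, so that the layout mirrors the traditional vowel quadrilateral (front vowels on the left, high vowels at the top).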

How did I collect the data?

There are actually many ways to get the exact frequency values for each vowel. I ended up using a guideline from the book “Sociophonetics”[11]: select the portion of the vowel between 25% and 75% of its duration, and average all the measurements within that window. This accounts for some of the movement within the vowel (although it does smooth over any significant movement) while mostly excluding formant transitions from neighboring sounds.
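A sketch of that rule for a single formant track. The function name and its inputs are hypothetical (e.g. a per-frame track exported from Praat, with vowel boundaries in seconds), and every frame inside the window is weighted equally:

```r
# Hypothetical helper: mean of a formant track over 25%-75% of the vowel's duration.
# time, formant: per-frame values; onset, offset: vowel boundaries in the same time units.
mid_vowel_mean <- function(time, formant, onset, offset) {
  dur <- offset - onset
  in_window <- time >= onset + 0.25 * dur & time <= onset + 0.75 * dur
  mean(formant[in_window], na.rm = TRUE)
}

# Toy track: F2 rising linearly from 1000 to 1400 Hz over a 100 ms vowel.
t  <- seq(0, 0.1, by = 0.01)
f2 <- 1000 + 4000 * t
mid_vowel_mean(t, f2, onset = 0, offset = 0.1)  # ~1200
```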

I also annotated the exact region I took the averages from, for reproducibility.

Normalization

Since vowel formants vary across speakers for a variety of non-linguistic reasons, such as natural pitch or vocal tract length, formant data must be normalized across speakers before they are analyzed. This removes as much of the non-linguistic variation in frequencies as possible, so that differences in vowel measurements mainly reflect dialectal, rather than physiological, differences.

In many cases, researchers take samples from a speaker’s entire vowel space and use them to normalize formant measurements. Such techniques are called vowel-extrinsic, because they rely on each vowel’s relationship to the speaker’s other vowels. In my case, however, I only look at a select few vowels and don’t have measurements from the entire vowel space. So, I used a vowel-intrinsic method called the Bark Difference Method. In this method, each formant value is first converted to Bark (a frequency scale that more closely mimics human perception than Hertz). Then, the difference between the third and second formants is plotted in place of the second formant, and the difference between the third and first formants in place of the first. It has been found to be an effective normalization technique when only one vowel is available, though it does rely on good F3 measurements[12].
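Written out, the two steps look like this. The Bark conversion below is Traunmüller's (1990) formula, which we assume is the one phonR's normBark implements; the formant values in the worked example are made-up illustrative numbers, not measurements from the archive.

```latex
\begin{aligned}
Z_i &= \frac{26.81\,F_i}{1960 + F_i} - 0.53, \qquad i \in \{1, 2, 3\} \\
\text{height axis} &= Z_3 - Z_1, \qquad \text{frontness axis} = Z_3 - Z_2 \\[4pt]
(F_1, F_2, F_3) &= (700,\ 1200,\ 2500)\ \text{Hz} \;\Rightarrow\; (Z_1, Z_2, Z_3) \approx (6.53,\ 9.65,\ 14.50)\ \text{Bark} \\
Z_3 - Z_1 &\approx 7.97, \qquad Z_3 - Z_2 \approx 4.85
\end{aligned}
```

Because a higher F1 raises Z1, lower vowels get smaller Z3 - Z1 values; likewise, fronter vowels (higher F2) get smaller Z3 - Z2 values.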

Plotting the Data

There are a few different R packages designed for plotting vowels. I ended up choosing phonR because it has many customization options, both for the data and for the plot aesthetics[13]. Another is the “vowels” package, which is simpler to use but less customizable, and I couldn’t get it to work well for plotting F3. A third option is ggplot2, which isn’t designed specifically for vowels, but there are tutorials online on how to make it work well.
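For comparison, here is roughly what the STORE arrow plot could look like in ggplot2. This is a sketch, not the code used for the figures above; it assumes a data frame shaped like V3_data (columns country, F2, F2_glide, F3, F3_glide, in Hz), and `plot_store_vectors` is a hypothetical name:

```r
library(ggplot2)

# Hypothetical sketch: one arrow per token from (F2, F3) at onset to (F2_glide, F3_glide) at offset.
plot_store_vectors <- function(df) {
  ggplot(df, aes(x = F2, y = F3, xend = F2_glide, yend = F3_glide, colour = country)) +
    geom_segment(arrow = arrow(length = unit(0.1, "inches")), alpha = 0.6) +
    scale_x_reverse() +      # reversed axes, following the usual vowel-chart convention;
    scale_y_reverse() +      # drop these two lines for ordinary axes
    facet_wrap(~ country) +  # one panel per country, with shared axis limits
    labs(x = "F2 (Hertz)", y = "F3 (Hertz)",
         title = "F3 and F2 Movement for the Vowel in STORE")
}

# Usage with the lab's data: plot_store_vectors(V3_data)
```

Because facet_wrap() shares axis limits across panels by default, arrow lengths can be compared directly between countries.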

References


  1. Weinberger, S. (2015). Speech Accent Archive. George Mason University. Retrieved from http://accent.gmu.edu

  2. Poklad, D. (2013). Phonological Differences in Hong Kong English. Munich: GRIN Verlag. https://www.grin.com/document/310859

  3. English vowels produced by Cantonese–English bilingual speakers [Scientific figure]. ResearchGate. Retrieved December 16, 2021, from https://www.researchgate.net/figure/Comparison-of-F2-F1-vowel-spaces-of-11-English-vowels-produced-by-the-Cantonese-English_fig1_232007046

  4. McCloy, D. R. (2016). phonR: Tools for phoneticians and phonologists. R package version 1.0-7.

  5. Mirzaie, N., Kord-e Zafaranlu Kambuziya, A., & Shariati, M. (2015). British and American Phonetic Varieties. Journal of Language Teaching and Research, 6, 647. doi:10.17507/jltr.0603.23

  6. Cantonese IPA chart. How to Study Cantonese. https://www.howtostudycantonese.com/wp-content/uploads/Cantonese-IPA-chart.svg

  7. Chen, Y., Ng, M., & Li, T.-S. (2012). English vowels produced by Cantonese–English bilingual speakers. International Journal of Speech-Language Pathology, 14. doi:10.3109/17549507.2012.718360

  8. General American English. Wikipedia. https://en.wikipedia.org/wiki/General_American_English

  9. Gómez, P. (n.d.). British and American English Pronunciation Differences. Web de Paco Gómez. Retrieved December 16, 2021, from https://www.webpgomez.com/english/404-british-and-american-english-pronunciation-differences#x1-40003

  10. Received Pronunciation. Wikipedia. https://en.wikipedia.org/wiki/Received_Pronunciation

  11. Kendall, T., & Fridland, V. (2021). Sociophonetics (Key Topics in Sociolinguistics). Cambridge: Cambridge University Press. doi:10.1017/9781316809709

  12. Thomas, E. R., & Kendall, T. (2007). NORM: The vowel normalization and plotting suite. Online resource: http://ncslaap.lib.ncsu.edu/tools/norm/

  13. McCloy, D. R. (2016, August 26). Normalizing and plotting vowels with phonR 1.0.7. Retrieved December 16, 2021, from https://drammock.github.io/phonR/