---
title: "Tracking the Cockney Accent Across Time and Space"
output:
  html_notebook:
    code_folding: hide
link-citations: true
---

#### *D.J. Copland*

#### *12/18/2025*

What has happened to the Cockney dialect? British linguists have asked this question repeatedly since the start of the 21st century. Like many varieties of British English, the so-called "Cockney" dialect, traditionally spoken by the inhabitants of London's East End and immortalized in popular culture by works such as the film musical *My Fair Lady*, has changed significantly since the end of World War II. Much of this change has been attributed to demographic shifts in East London, particularly an influx of immigrants to the area in the latter half of the 20th century.

But is this something we can actually visualize? In fact, yes! A database of recorded speech called the SPADE corpus contains a set of speech samples from East Londoners, sorted by age. In this dataset, speakers born between 1914 and 1943 are categorized as "old" speakers, and those born between 1985 and 1987 as "young" speakers. We can visualize this data using a vowel chart, which maps the vowel sounds in a speaker's accent according to their *formant frequencies*. Simply put, formants are resonant frequencies of the vocal tract that show up as peaks in the speech signal, and the first two (F1 and F2) can be used to measure differences between vowel sounds. Points on the chart will be labeled with their corresponding vowel symbol from the International Phonetic Alphabet (IPA). This will let us see whether the vowel set of East London dialects really has changed over time. Let's take a look:

```{r}
library(tidyverse)  # loads dplyr, ggplot2, and readr
library(phonR)
library(Fonology)

spade_oldyoung <- read_csv("~/Montclair/Grad/Fall2025/Quantitative Linguistics/Final project/Cockney Datasets/spade_oldyoung.csv")

# Average F1 and F2 for each vowel within each age group, then plot the means.
spade_oldyoung %>% 
  group_by(age_group, ipa) %>% 
  summarise(across(c(F1, F2), mean), .groups = "drop") %>% 
  ggplot(aes(x = F2, y = F1, 
             color = age_group, label = ipa)) +
  geom_text() + 
  formants() +
  # theme_classic() is a complete theme, so it has to come before any theme() tweaks.
  # The legend stays visible because color is the only cue for age group.
  theme_classic() +
  theme(text = element_text(size = 13)) +
  labs(x = "F2", y = "F1", 
       color = "Age Group", 
       title = "SPADE East London: Old vs. Young Cockneys")
```

Interesting! There does certainly seem to have been some change in the vowel space. In general, you can see that many of the younger speakers' vowels seem to have been pulled inward toward the center and down relative to their older counterparts. One interesting exception to that trend seems to be the /eɪ/ diphthong, as heard in the word "play." In traditional Cockney, that vowel actually sounds closer to the /aɪ/ sound (as in "pie") in American English, but younger speakers seem to have reverted to a version much closer to the higher and more fronted /e/ vowel, as in the word "pet."

## Cockney Elsewhere?  

You may now be thinking, "Was this dialect really only spoken in a couple of neighborhoods in East London?" As a matter of fact, a 2019 study by Cole & Strycharczuk found what they claimed was a community of Cockney dialect speakers in an area called the Debden Estate in the town of Loughton, which lies just northeast of the Greater London border. Though the study claims that the dialect there is very close to the older version of Cockney, it unfortunately doesn't provide any data from East London speakers for comparison... But since we have both datasets, we can do the comparison ourselves!

This time, let's use the `geom_mark_hull()` function from the `ggforce` package, which will let us see the entire vowel space of each variant as a solid shape so it will be easier to see how the spaces overlap. Let's also split the East Londoners into Inner East London and Outer East London, since the accents in each area are also slightly different.

```{r}
library(ggforce)

# cockneydata_QL: the combined East London and Debden data (assumed to be loaded already).
# Average F1 and F2 per vowel within each region, then outline each region's vowel space.
cockneydata_QL %>% 
  group_by(ipa, InnerOuter) %>% 
  summarize(meanF1 = mean(F1), meanF2 = mean(F2), .groups = "drop") %>% 
  ggplot(aes(x = meanF2, y = meanF1, color = InnerOuter)) +
  geom_text(aes(label = ipa)) +
  ggforce::geom_mark_hull(aes(fill = InnerOuter)) +
  formants() +
  guides(fill = "none") +
  labs(x = "F2", y = "F1", 
       color = "Region", 
       title = "Cockney Formants By Region") 
```

Whoops! I expected the vowel spaces to be different, but not *this* different. It's actually kind of hard to compare them when they're this offset from each other. It looks like we'll need to *normalize* these formant values first, so they're easier to compare. Let's see how this looks once the Debden formants are normalized along with those from the other regions.
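The normalization step itself isn't shown in this notebook, so here is a minimal sketch of one common approach, Lobanov normalization, which z-scores each formant within each speaker. This is an assumption for illustration and not necessarily how `normcockney_QL` was built; `toy_formants` and all of its values are made up.

```{r}
library(dplyr)

# Toy data (hypothetical): two speakers whose formants sit in different overall ranges.
toy_formants <- tibble(
  speaker_id = rep(c("spk_a", "spk_b"), each = 3),
  ipa        = rep(c("i", "a", "u"), times = 2),
  F1         = c(300, 750, 320, 360, 900, 380),
  F2         = c(2300, 1300, 800, 2700, 1500, 950)
)

# Lobanov normalization: subtract the speaker's mean and divide by the
# speaker's standard deviation, separately for F1 and F2.
toy_normalized <- toy_formants %>%
  group_by(speaker_id) %>%
  mutate(across(c(F1, F2), ~ (.x - mean(.x)) / sd(.x))) %>%
  ungroup()

toy_normalized
```

Because each speaker's values are rescaled against that speaker's own mean and spread, differences in overall formant range between speakers drop out, and what remains is the relative position of each vowel. With real data the z-scores would be computed over all of a speaker's vowel tokens rather than three rows. With normalized values in place, the regional plot looks like this: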

```{r}
normcockney_QL %>% 
  group_by(ipa, InnerOuter) %>% 
  summarize(meanF1 = mean(F1), meanF2 = mean(F2), .groups = "drop") %>% 
  ggplot(aes(x = meanF2, y = meanF1, color = InnerOuter))+
  geom_text(aes(label = ipa))+
  ggforce::geom_mark_hull(aes(fill = InnerOuter))+
  formants() +
  guides(fill="none") +
  labs(x = "F2", y = "F1", 
       color = "Region") 
```

Huh. Well, it looks a *little* closer now, but the Debden vowel space is still pretty different-looking from the other ones. There might be some other factor that accounts for the difference here, or else it seems like Cole & Strycharczuk might need to get their ears checked.  


## Testing Other Variables

Let's take a closer look at our data. I can reframe our combined dataset to show just the demographic information for each speaker, then make a table that tallies the number of speakers per demographic. Let's start by looking at the makeup of each region by age group and see if there are any discrepancies there.

```{r}
# One row per speaker, keeping only the demographic columns.
cockney_perspeaker <- normcockney_QL %>% 
  distinct(speaker_id, InnerOuter, sex, ethnic_background, age_group)

# Number of speakers in each region-by-age-group cell.

part_age <- cockney_perspeaker %>% 
  count(InnerOuter, age_group)

part_age
```

It looks like the East London dataset had a way larger proportion of young speakers than the Debden dataset! That could certainly account for the difference we're seeing. Maybe that means Debden really is closer to an older variant of Cockney after all, and the younger speakers in East London are skewing the data.  
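Raw counts can be hard to compare when the regions have different numbers of speakers, so it may help to turn the tally into within-region proportions. The sketch below uses a made-up table shaped like `part_age`; the region labels, the "Old" label, and the counts are all hypothetical stand-ins, not the real values.

```{r}
library(dplyr)

# Hypothetical tallies, shaped like the output of count(InnerOuter, age_group).
toy_counts <- tibble(
  InnerOuter = c("Debden", "Debden", "Inner", "Inner", "Outer", "Outer"),
  age_group  = c("Old", "Young", "Old", "Young", "Old", "Young"),
  n          = c(8, 2, 5, 5, 4, 6)
)

# Share of each age group within its own region.
toy_props <- toy_counts %>%
  group_by(InnerOuter) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

toy_props
```

The same `group_by()` and `mutate()` steps should work on `part_age` directly, since it has the same three columns.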

With that in mind, let's try this again, but this time comparing only the older speakers from each region:

```{r}
normcockney_QL %>% 
  filter(age_group != "Young") %>% 
  group_by(ipa, InnerOuter) %>% 
  summarize(meanF1 = mean(F1), meanF2 = mean(F2), .groups = "drop") %>% 
  ggplot(aes(x = meanF2, y = meanF1, color = InnerOuter))+
  geom_text(aes(label = ipa))+
  ggforce::geom_mark_hull(aes(fill = InnerOuter))+
  formants() +
  guides(fill="none") +
  labs(x = "F2", y = "F1", 
       color = "Region", title = "Cockney Formants By Region: Older Speakers")
```

That certainly seems to have been part of the issue, but there's still a discrepancy here. Let's try *faceting* this plot by male and female speakers.

```{r}
# Same plot as before, but with means computed separately for each sex
# and one panel per sex.
normcockney_QL %>% 
  filter(age_group != "Young") %>% 
  group_by(ipa, InnerOuter, sex) %>% 
  summarize(meanF1 = mean(F1), meanF2 = mean(F2), .groups = "drop") %>% 
  ggplot(aes(x = meanF2, y = meanF1, color = InnerOuter)) +
  geom_text(aes(label = ipa)) +
  ggforce::geom_mark_hull(aes(fill = InnerOuter)) +
  formants() +
  guides(fill = "none") +
  labs(x = "F2", y = "F1", 
       color = "Region", title = "Cockney Formants By Region: Older Speakers") +
  facet_wrap(~sex)
```

Well, there's a surprise! It looks like it was the *female East Londoners* who were the real difference here all along. Apparently the older women in East London have moved with the times and started speaking a dialect much closer to what the youth are speaking, pulling their vowel space inward and down. In contrast, the Debden folks, *regardless of age or gender,* seem to have stuck much closer to the wider vowel space of old-fashioned Cockney, and maybe even widened it a little further in some places. Of course, the SPADE dataset comes from a pretty small sample, so we should probably take the results here with a grain of salt, particularly regarding the female speakers. But regardless, it does indeed seem like Debden might be your best bet if you want to hear a 20th-century Cockney accent.
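One way to put a number on "wider" is to measure the area enclosed by each group's vowel means. The sketch below computes the area of the convex hull around a set of (F2, F1) points using base R's `chull()` and the shoelace formula. The helper `hull_area()` and the two toy groups are hypothetical, and the hulls that `geom_mark_hull()` draws can be concave, so this is a related measure and not the exact area of the shapes in the plots.

```{r}
# Area of the convex hull around (F2, F1) points, via the shoelace formula.
hull_area <- function(f2, f1) {
  idx <- grDevices::chull(f2, f1)   # indices of the hull vertices, in order
  x <- f2[idx]
  y <- f1[idx]
  nxt <- c(seq_along(x)[-1], 1)     # index of the next vertex, wrapping around
  abs(sum(x * y[nxt] - x[nxt] * y)) / 2
}

# Made-up normalized vowel means for two hypothetical groups.
wide   <- data.frame(F2 = c(-1.5, 1.5, 1.5, -1.5, 0), F1 = c(-1, -1, 1, 1, 0))
narrow <- data.frame(F2 = c(-1, 1, 1, -1, 0),         F1 = c(-0.5, -0.5, 0.5, 0.5, 0))

hull_area(wide$F2, wide$F1)      # 6: a 3-by-2 rectangle; the interior point is ignored
hull_area(narrow$F2, narrow$F1)  # 2: a 2-by-1 rectangle
```

Applied to the per-region, per-sex vowel means, a larger area would correspond to a more spread-out vowel space, though with so few speakers the numbers would be as tentative as the plots.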