---
title: "Tracking the Cockney Accent Across Time and Space"
output:
  html_notebook:
    code_folding: hide
link-citations: true
---

#### *D.J. Copland*

#### *12/18/2025*

What has happened to the Cockney dialect? British linguists have asked this question repeatedly since the start of the 21st century. Like many varieties of British English, the so-called "Cockney" dialect, traditionally spoken by the inhabitants of London's East End and immortalized in popular culture by works such as the film musical *My Fair Lady*, has changed significantly since the end of World War II. Much of this change has been attributed to demographic shifts in East London, particularly an influx of immigrants to the area in the latter half of the 20th century.

But is this something we can actually visualize? In fact, yes! A database of recorded speech called the SPADE corpus contains a set of speech samples from East Londoners, sorted by age. In this dataset, speakers born between 1914 and 1943 are categorized as "old" speakers, and those born between 1985 and 1987 as "young" speakers. We can visualize this data using a vowel chart, which maps the vowel sounds in a speaker's accent according to their *formant frequencies*. Simply put, formants are resonant frequencies of the vocal tract that show up as peaks in the speech signal and can be used to measure differences between vowel sounds. The first formant (F1) roughly tracks vowel height (the higher the vowel, the lower the F1), and the second formant (F2) roughly tracks how far forward in the mouth the vowel is produced (the further forward, the higher the F2). Points on the chart will be labeled with their corresponding vowel symbol from the IPA, or International Phonetic Alphabet. This will let us see whether the vowel set of East London dialects really has changed over time. Let's take a look:

```{r}
library(tidyverse)  # dplyr, ggplot2, and readr (for read_csv)
library(phonR)
library(Fonology)   # source of formants() used below

spade_oldyoung <- read_csv("~/Montclair/Grad/Fall2025/Quantitative Linguistics/Final project/Cockney Datasets/spade_oldyoung.csv")

# Mean F1 and F2 for each vowel within each age group
spade_oldyoung %>% 
  group_by(age_group, ipa) %>% 
  summarise(across(c(F1, F2), mean), .groups = "drop") %>% 
  ggplot(aes(x = F2, y = F1, 
             color = age_group, label = ipa)) +
  geom_text() + 
  formants() +
  # Complete theme goes first: theme_classic() resets any earlier theme() tweaks
  theme_classic() +
  # Keep the legend so the two age groups can be told apart by color
  theme(text = element_text(size = 13)) +
  labs(x = "F2", y = "F1", 
       color = "Age Group", 
       title = "SPADE East London: Old vs. Young Cockneys")
```
Interesting! There certainly seems to have been some change in the vowel space. In general, many of the younger speakers' vowels have been pulled inward toward the center and downward relative to their older counterparts. One interesting exception to that trend is the /eɪ/ diphthong, as heard in the word "play." In traditional Cockney, that vowel actually sounds closer to the /aɪ/ sound of American English (as in "pie"), but younger speakers seem to have reverted to a version much closer to the higher and more fronted /e/ vowel, as in the word "pet."

## Cockney Elsewhere?  

You may now be thinking, "Was this dialect really only spoken in a couple of neighborhoods in East London?" As a matter of fact, a 2019 study by Cole & Strycharczuk found what they claimed was a community of Cockney dialect speakers in an area called the Debden Estate within the town of Loughton in Essex, which lies just northeast of the Greater London border. Though the study claims that the dialect there is very close to the older version of Cockney, it unfortunately doesn't provide any data from East London speakers for comparison. But since we have both datasets, we can do the comparison ourselves!

This time, let's use the `geom_mark_hull()` function from the ggforce package, which draws the entire vowel space of each variety as a filled shape so it's easier to see how the spaces overlap. Let's also split the East Londoners into Inner East London and Outer East London, since the accents in each area are also slightly different.

```{r}
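library(ggforce)  # geom_mark_hull() also needs the concaveman package installed

# cockneydata_QL is assumed to be loaded already: the combined East London and Debden data
cockneydata_QL %>% 
  group_by(ipa, InnerOuter) %>% 
  summarize(meanF1 = mean(F1), meanF2 = mean(F2), .groups = "drop") %>% 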
  ggplot(aes(x = meanF2, y = meanF1, color = InnerOuter))+
  geom_text(aes(label = ipa))+
  ggforce::geom_mark_hull(aes(fill = InnerOuter))+
  formants() +
  guides(fill="none") +
  labs(x = "F2", y = "F1", 
       color = "Region", 
       title = "Cockney Formants By Region") 
```
Whoops! I expected the vowel spaces to be different, but not *this* different. It's hard to compare them when they're this offset from each other. Raw formant values depend on each speaker's vocal tract as well as on their accent, so we'll need to *normalize* these values first to put all the speakers on a comparable scale. Let's see how this looks when Debden is normalized together with the other regions.
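
The normalization step itself isn't shown in this notebook, so the exact method behind `normcockney_QL` is unknown here. One common choice is Lobanov normalization, which converts each speaker's formant values to z-scores using that speaker's own mean and standard deviation. The chunk below is only a minimal base-R sketch of that idea: the `lobanov()` helper and the `toy` data frame are hypothetical, and the formant values are made up for illustration.

```{r}
# Hypothetical helper: Lobanov (z-score) normalization within each speaker
lobanov <- function(values, speaker) {
  # ave() applies FUN separately to the values belonging to each speaker
  ave(values, speaker, FUN = function(v) (v - mean(v)) / sd(v))
}

# Made-up formant values in Hz for two speakers (illustration only)
toy <- data.frame(
  speaker_id = rep(c("s1", "s2"), each = 3),
  ipa        = rep(c("i", "a", "u"), times = 2),
  F1         = c(300, 750, 320, 360, 900, 380),
  F2         = c(2300, 1300, 900, 2700, 1500, 1000)
)

toy$F1_norm <- lobanov(toy$F1, toy$speaker_id)
toy$F2_norm <- lobanov(toy$F2, toy$speaker_id)

# Per-speaker means of the normalized values (zero, up to rounding error)
aggregate(cbind(F1_norm, F2_norm) ~ speaker_id, data = toy, FUN = mean)
```

After this kind of rescaling, every speaker's values are centered on zero with a standard deviation of one, so differences in overall pitch range or vocal tract size no longer dominate the comparison.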

```{r}
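# normcockney_QL is assumed to be loaded already: the same data with normalized F1/F2
normcockney_QL %>% 
  group_by(ipa, InnerOuter) %>% 
  summarize(meanF1 = mean(F1), meanF2 = mean(F2), .groups = "drop") %>% 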
  ggplot(aes(x = meanF2, y = meanF1, color = InnerOuter))+
  geom_text(aes(label = ipa))+
  ggforce::geom_mark_hull(aes(fill = InnerOuter))+
  formants() +
  guides(fill="none") +
  labs(x = "F2", y = "F1", 
       color = "Region") 
```

Huh. Well, it looks a *little* closer now, but the Debden vowel space still looks pretty different from the others. There might be some other factor that accounts for the difference here, or else Cole & Strycharczuk might need to get their ears checked.


## Testing Other Variables

Let's take a closer look at our data. I can reduce our combined dataset to one row of demographic information per speaker, then make a table that tallies the number of speakers in each demographic group. Let's start by looking at the makeup of each region by age group and see if there are any discrepancies there.

```{r}
# One row per speaker, keeping only the demographic columns
cockney_perspeaker <- normcockney_QL %>% 
  distinct(speaker_id, InnerOuter, sex, ethnic_background, age_group)

# Number of speakers in each region and age group
part_age <- cockney_perspeaker %>% 
  count(InnerOuter, age_group)

part_age
```

It looks like the East London dataset has a much larger proportion of young speakers than the Debden dataset! That could certainly account for the difference we're seeing. Maybe that means Debden really is closer to an older variant of Cockney after all, and the younger speakers in East London are skewing the data.
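
Raw counts can be hard to compare when the regions have different numbers of speakers, so it can help to turn them into within-region shares. The chunk below is a small sketch of that step using base R's `xtabs()` and `prop.table()`. The `toy_counts` data frame only mimics the shape of `part_age`: its region labels and numbers are invented and are not the real SPADE or Debden figures.

```{r}
# Made-up speaker counts shaped like `part_age` (illustration only;
# these are NOT the real SPADE or Debden numbers)
toy_counts <- data.frame(
  InnerOuter = c("Region A", "Region A", "Region B", "Region B"),
  age_group  = c("Old", "Young", "Old", "Young"),
  n          = c(6, 18, 15, 5)
)

# Cross-tabulate region by age group, using n as the cell counts
tab <- xtabs(n ~ InnerOuter + age_group, data = toy_counts)

# margin = 1 rescales each row (region) so that it sums to 1
age_share <- prop.table(tab, margin = 1)
round(age_share, 2)
```

With these invented counts, young speakers make up 75% of "Region A" but only 25% of "Region B", which is the kind of imbalance that could pull one region's average vowel positions toward the younger pattern.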

With that in mind, let's try this again, but this time comparing only the older speakers from each region:

```{r}
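# Drop the young speakers so only the older speakers from each region are compared
normcockney_QL %>% 
  filter(age_group != "Young") %>% 
  group_by(ipa, InnerOuter) %>% 
  summarize(meanF1 = mean(F1), meanF2 = mean(F2), .groups = "drop") %>% 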
  ggplot(aes(x = meanF2, y = meanF1, color = InnerOuter))+
  geom_text(aes(label = ipa))+
  ggforce::geom_mark_hull(aes(fill = InnerOuter))+
  formants() +
  guides(fill="none") +
  labs(x = "F2", y = "F1", 
       color = "Region", title = "Cockney Formants By Region: Older Speakers")
```

That certainly seems to have been part of the issue, but there's still a discrepancy here. Let's try *faceting* this plot by speaker sex, so that male and female speakers each get their own panel.

```{r}
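# Same comparison as before, but with vowel means computed separately for each sex
normcockney_QL %>% 
  filter(age_group != "Young") %>% 
  group_by(ipa, InnerOuter, sex) %>% 
  summarize(meanF1 = mean(F1), meanF2 = mean(F2), .groups = "drop") %>% 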
  ggplot(aes(x = meanF2, y = meanF1, color = InnerOuter))+
  geom_text(aes(label = ipa))+
  ggforce::geom_mark_hull(aes(fill = InnerOuter))+
  formants() +
  guides(fill="none") +
  labs(x = "F2", y = "F1", 
       color = "Region", title = "Cockney Formants By Region: Older Speakers") +
  facet_wrap(~sex)
```
Well, there's a surprise! It looks like the *female East Londoners* were the real source of the difference all along. Apparently the older women in East London have moved with the times and started speaking a dialect much closer to what the youth are speaking, pulling their vowel space inward and down. In contrast, the Debden folks, *regardless of age or gender,* seem to have stuck much closer to the wider vowel space of old-fashioned Cockney, and maybe even widened it a little further in some places. Of course, the SPADE dataset is based on a pretty small sample, so we should probably take the results here with a grain of salt, particularly regarding the female speakers. But regardless, it does indeed seem like Debden might be your best bet if you want to hear a 20th-century Cockney accent.
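
One way to put a number on how "wide" a vowel space is would be to measure the area of the polygon that encloses a group's mean vowel positions. The chunk below is a rough sketch of that idea using base R's `chull()` and the shoelace formula. It is not part of the analysis above: the `hull_area()` helper is hypothetical, the points are made up, and a convex hull is a simplification of the concave hulls that `geom_mark_hull()` draws.

```{r}
# Hypothetical helper: area of the convex hull around a set of (F2, F1) points
hull_area <- function(f2, f1) {
  idx <- chull(f2, f1)          # indices of the hull vertices, in clockwise order
  x <- f2[idx]
  y <- f1[idx]
  nxt <- c(2:length(idx), 1)    # position of the next vertex, wrapping around
  # Shoelace formula; abs() makes the result independent of vertex direction
  abs(sum(x * y[nxt] - x[nxt] * y)) / 2
}

# Made-up normalized vowel means (illustration only): a wide and a narrow space
wide <- data.frame(
  F2 = c(-1.5, 1.5, 1.0, -1.0, 0),
  F1 = c(-1.2, -1.2, 1.4, 1.4, 0)
)
narrow <- data.frame(F2 = wide$F2 * 0.5, F1 = wide$F1 * 0.5)

hull_area(wide$F2, wide$F1)
hull_area(narrow$F2, narrow$F1)
```

Here the toy "narrow" space is the "wide" one shrunk to half its size in both dimensions, so its area comes out at a quarter of the original (1.625 versus 6.5). Applied to per-region vowel means, the same calculation would give a single figure for comparing vowel-space size across groups.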