Preamble

Last month saw the return of BTS after their lengthy hiatus. This meant both a concert at Gwanghwamun in Seoul that streamed internationally on Netflix and the release of the album Arirang. Both the album title and the location of the comeback concert are highly emblematic of Korean identity. Arirang is the title and refrain of a popular Korean folk song, sometimes characterised as an unofficial national anthem. Gwanghwamun is not only a key thoroughfare linking historic sites, such as the Joseon palace Gyeongbokgung, and monuments to such figures as King Sejong and Admiral Yi Sun-sin, but has also been a site for expressions of contemporary South Korean identity and political sentiment, for example the candlelight protests preceding Park Geun-hye’s impeachment or, more recently, those following the December 2024 declaration of martial law. That the album is inspired by or reflects the band’s connection to their home country has been emphasised in much of the discourse around its release from band members and commentators alike. This emphasis on ‘Korean-ness’ has provoked a great deal of discussion among fans and commentators about how it is expressed, especially with regard to language choice.

While it is entirely beyond the scope of this piece to make any determination about the expression of Korean identity, it is certainly possible to visualise aspects of the language choice on the album’s tracks and to contextualise them, particularly as we have previously made much of the changing prevalence of English in K-Pop lyrics over the early twenty-first century. Below, we present a simple quantitative summary of the language choice over BTS’ discography.

The Data

For the most relevant comparators to Arirang, we restrict our corpus to lyrics that have appeared on BTS’ other studio albums and EPs, with some principled restrictions, as follows. Dynamite and Butter are not included here. Although they are often considered EPs, the tracklists of their various releases tend to feature a single song accompanied by many remixes rather than the diversity of tracks that characterises true albums or even mini-albums. We further exclude re-releases and repackages, such as Skool Luv Affair Special Addition, as well as compilations such as Love Yourself: Answer. While some new material on these releases is missed, we make these sorts of exclusions to avoid counting the same songs multiple times. Finally, as our focus is the relationship between Korean and English, we exclude explicitly Japanese-language releases, such as Face Yourself.

Proportion of English

Before we get going in earnest, let’s be clear about the limitations of these visualisations and get some terminology straight. What we are actually counting when we count English ‘words’ in song lyrics are two distinct things. First, we count tokens. For our purposes here, these are sequences of Roman alphabet letters with whitespace on either side. Thus, a line of lyrics like “Yeah yeah yeah yeah yeah yeah yeah yeah” (hat-tip to The Flaming Lips) would be considered eight tokens. The number of English tokens for each song included in our dataset by year is visualised below.

Impressionistically, that is a large increase in English tokens. We are also able to count types. In this specific case, this refers to unique sequences of Roman alphabet letters delineated by surrounding whitespace. Counting types is especially desirable for pop songs as these texts tend to feature quite a lot of repetition. Considering once more the example of “Yeah yeah yeah yeah yeah yeah yeah yeah”, we can see two types, or just one if the case of the letters is ignored. While a large number of tokens could simply represent a much-repeated refrain, a larger number of types is more suggestive of more diverse language, for example a verse or rap break rather than just an attention-grabbing single utterance. The number of English types for each song included in our dataset by year is visualised below.
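For readers who want to try the token and type definitions above for themselves, here is a minimal sketch in Python. It is not the code used to produce our figures. The function names are hypothetical, and the handling of punctuation and apostrophes (stripped from the edges of each chunk, allowed inside a word) is an assumption on our part, since the definitions above only mention letters and whitespace.

```python
import re
import string

# Assumption: punctuation at the edges of a whitespace-delimited chunk is
# stripped, and word-internal apostrophes (don't, it’s) are allowed.
EDGE_PUNCTUATION = string.punctuation + "“”‘’"
ROMAN_WORD = re.compile(r"[A-Za-z]+(?:['’][A-Za-z]+)*")


def english_tokens(lyrics: str) -> list[str]:
    """Return every whitespace-delimited chunk written only in Roman letters."""
    tokens = []
    for chunk in lyrics.split():
        word = chunk.strip(EDGE_PUNCTUATION)
        # Chunks containing Hangul, digits, or mixed scripts are skipped.
        if ROMAN_WORD.fullmatch(word):
            tokens.append(word)
    return tokens


def english_types(lyrics: str, ignore_case: bool = False) -> set[str]:
    """Return the set of unique English tokens, optionally ignoring case."""
    tokens = english_tokens(lyrics)
    if ignore_case:
        tokens = [token.lower() for token in tokens]
    return set(tokens)


line = "Yeah yeah yeah yeah yeah yeah yeah yeah"
print(len(english_tokens(line)))                   # 8 tokens
print(len(english_types(line)))                    # 2 types: "Yeah" and "yeah"
print(len(english_types(line, ignore_case=True)))  # 1 type
```

Whether to fold case is a judgement call. Folding gives a stricter measure of lexical variety, while keeping case preserves distinctions that may matter in stylised lyrics.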

This, too, is suggestive of the increase in the amount of English in the lyrics of ARIRANG in comparison to earlier studio releases. Below, we present a comparison between the mean number of tokens of Korean per song per album and the mean number of tokens of English per song per album. This demonstrates that the increase in the prevalence of English tokens has been accompanied by a decrease in the prevalence of Korean tokens.
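The per-album comparison described above can be sketched along the following lines. This is an illustration under stated assumptions rather than our actual pipeline: the function names are hypothetical, a Korean token is assumed to be any whitespace-delimited chunk containing at least one Hangul syllable, and the toy lyrics and album labels are invented placeholders, not lines from any BTS song.

```python
import re
import string
from collections import defaultdict
from statistics import mean

EDGE_PUNCTUATION = string.punctuation + "“”‘’"
HANGUL = re.compile(r"[\uac00-\ud7a3]")  # precomposed Hangul syllables
ROMAN_WORD = re.compile(r"[A-Za-z]+(?:['’][A-Za-z]+)*")


def count_tokens(lyrics: str) -> tuple[int, int]:
    """Return (korean_tokens, english_tokens) for one song's lyrics."""
    korean = english = 0
    for chunk in lyrics.split():
        word = chunk.strip(EDGE_PUNCTUATION)
        if HANGUL.search(word):
            # Assumption: any chunk containing Hangul counts as Korean.
            korean += 1
        elif ROMAN_WORD.fullmatch(word):
            english += 1
    return korean, english


def mean_tokens_per_album(songs):
    """songs: iterable of (album, lyrics) pairs. Returns per-album means."""
    per_album = defaultdict(list)
    for album, lyrics in songs:
        per_album[album].append(count_tokens(lyrics))
    return {
        album: {
            "korean": mean(k for k, _ in counts),
            "english": mean(e for _, e in counts),
        }
        for album, counts in per_album.items()
    }


# Invented placeholder lyrics, purely for illustration.
toy_songs = [
    ("Album A", "사랑해 너를 oh yeah"),
    ("Album A", "우리 함께 가자 baby"),
    ("Album B", "run run run 달려"),
]
print(mean_tokens_per_album(toy_songs))
```

Splitting Korean on whitespace yields eojeol-like units rather than morphemes, which is one reason the two counts are not strictly comparable.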

Given that Korean and English are very different in terms of their linguistic characteristics and how they are best processed as text, a direct comparison must be taken as suggestive only, rather than definitive in terms of the proportions of each language in the lyrics of each song. What’s more, as pointed out in this paper, an increasing prevalence of English does not necessarily mean that Korean content in lyrics is being directly replaced, even in the case of translation equivalents.

Conclusion

This month’s visualisations reveal that ARIRANG features both a greater amount and larger proportion of English tokens in its lyrics than any other BTS studio-recorded release. More advanced statistical techniques (e.g., detrending) could be employed to test whether this merely conforms with the broader, long-term trend of increasing English usage in K-Pop lyrics, perhaps perceptually exacerbated for fans by the lack of releases over the band’s hiatus, or whether it is an increase specific to this album that goes beyond this trend. As visualised, though, the development is striking.
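To give a flavour of what such a test might involve, the sketch below fits a straight line to the English proportion of earlier releases and measures how far a new release sits from the extrapolated trend. This is only one simple form of detrending, and we have not run it on our data. The function names are hypothetical, and the numbers are invented placeholders chosen to make the arithmetic easy to follow, not measurements.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def detrended_residual(history, new_point):
    """Distance of new_point above (+) or below (-) the trend in history."""
    xs, ys = zip(*history)
    slope, intercept = fit_line(xs, ys)
    x, y = new_point
    return y - (slope * x + intercept)


# Placeholder values only: (release index, proportion of English tokens).
history = [(0, 0.10), (1, 0.20), (2, 0.30), (3, 0.40)]

# A release that simply continues the trend has a residual near zero.
print(round(detrended_residual(history, (4, 0.50)), 3))
# A release well above the trend has a clearly positive residual.
print(round(detrended_residual(history, (4, 0.70)), 3))
```

A real analysis would also need to compare the residual against the spread of residuals among earlier releases, and a linear trend is itself an assumption.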

This is not the first time that questions of language, ‘Korean-ness’, and ‘K-ness’ have come up here and it is certain not to be the last. The controversy that language choice on the album has caused speaks to the strongly-perceived and strongly-felt connection between Korean identity and the Korean language. We conclude by noting that the quantitative examination of language choice carried out in this report does not touch upon qualitative elements of the album, such as the themes expressed in the lyrics or musical choices. Arguably, these are much more relevant to the expression of identity than simply the language used to express it. The latter, language choice, is possible and instructive to visualise; the former, meaningful content, far less so. We do not rule out attempts to try doing just that in a future report, though.

Acknowledgement
This work was supported by the Core University Program for Korean Studies of the Ministry of Education of the Republic of Korea and Korean Studies Promotion Service at the Academy of Korean Studies (AKS-2021-OLU-2250004).