Introduction
English is a dynamic language that constantly borrows new vocabulary and phrases from foreign sources. Spanish may not be the first language people think of as an influence on English, but Spanish and English speakers have interacted closely at several points in history, so it is no surprise that the two languages have exchanged parts of their lexicons. English began importing Spanish words in the 15th century, when England and Spain were engaged in overseas trade, and it continues to borrow from Spanish today.
Some Spanish words borrowed by English between the 15th and 17th centuries
- galleon -> galeón
- cork -> alcorque
- cargo -> cargo
The main question this project tries to answer is which English-speaking country uses the most Spanish words in its lexicon. I am interested in this question because I want to see how the history and geography of different parts of the world shape the language people use, and specifically how they shape English. I hope that other people interested in language change and the effects of cultural contact will learn something from this notebook.
Dataset & Method
For this project, I chose the Corpus of Global Web-Based English (GloWbE) because it provides data from different varieties of English organized by country. This made it easy to compare how much Spanish vocabulary each variety of English uses. Because the corpus is web-based, its sources are relatively contemporary. This matters because language is constantly changing, so the data for this project should be as current as possible.
To analyze the data, I wrote a Python program that read through the data from the United States, Great Britain, Canada, and Australia. As it read, the program counted the words of Spanish origin and divided that count by the total word count of each country's data set to get a percentage. The program identified words of Spanish origin using a list of Spanish-to-English borrowings that I compiled from several sources.
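The counting step described above can be sketched roughly as follows. This is a minimal illustration, not the program used for this project: the function name `spanish_percentage`, the regex tokenizer, and the tiny `SPANISH_BORROWINGS` set (taken from example words in this notebook) are all assumptions made for the example. It matches exact word forms only, so a plural such as "canyons" would not be counted.

```python
import re

# Hypothetical stand-in for the compiled list of Spanish-to-English borrowings.
SPANISH_BORROWINGS = {
    "galleon", "cargo", "avocado", "armadillo",
    "barbecue", "lasso", "buckaroo", "canyon",
}


def spanish_percentage(text, borrowings):
    """Return the percentage of word tokens in `text` found in `borrowings`."""
    # Assumed tokenization: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in borrowings)
    return 100.0 * hits / len(tokens)


sample = "The cargo ship passed the canyon while we ate avocado toast"
print(round(spanish_percentage(sample, SPANISH_BORROWINGS), 2))  # prints 27.27
```

Running the same function over each country's text and collecting the results would give one percentage per variety, which is the shape of data the bar chart in the next section expects.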
Who Uses The Most Spanish Words
# ggplot2 must be loaded before the ggplot() call below
library("ggplot2")
df <- read.csv("fp_data.csv")
Warning in read.table(file = file, header = header, sep = sep, quote = quote, :
incomplete final line found by readTableHeader on 'fp_data.csv'
ggplot(df, aes(x = country, y = percentage, fill = country)) +
  geom_bar(stat = "identity") +
  scale_fill_brewer(palette = "Blues") +
  theme(legend.position = "none") +
  ggtitle("Percentage of Spanish Words Used In English by Variety") +
  xlab("English Variety") + ylab("Percentage")

The figure shows that American English contains the highest percentage of Spanish vocabulary among the four varieties. Several factors could explain this.
- Many names for plants and animals native to the Americas came into English either directly from Spanish (direct borrowing) or from Indigenous languages, with Spanish serving as the intermediary (indirect borrowing).
  - avocado -> aguacate -> ahuacatl (Nahuatl)
  - armadillo -> armadillo
  - barbecue -> barbacoa -> barbacoa (Taíno)
- Much of the western United States was colonized by Spain and later belonged to Mexico. This left behind a stratum of words that reflect the culture and environment of the time.
  - lasso -> lazo
  - buckaroo -> vaquero
  - canyon -> cañón
- This colonial history is also why many place names in the United States come from Spanish.
  - Colorado: Ruddy
  - Florida: Flowery
  - Nevada: Snowy
- The United States is much closer to Latin America than other English-speaking countries are to any Spanish-speaking part of the world. This proximity has led to cultural exchange between the two regions.
- Several parts of the United States have large Spanish-speaking populations.
Spanish Speakers in the United States
library("usmap")
library("ggplot2")
pop <- read.csv("spanish_pop.csv")
# Shade each state by the S1601_C02_004E column of spanish_pop.csv
plot_usmap(regions = "states", data = pop, values = "S1601_C02_004E") +
  scale_fill_continuous(low = "white", high = "blue", name = "Percentage", labels = scales::comma) +
  labs(title = "Percentage of Population that Speaks Spanish by State") +
  theme(legend.position = "right")

According to the US Census, over 41 million people in the United States speak Spanish at home, which is around 13.5% of the population. This map shows that Spanish speakers in the United States tend to live either in states with large cities (which makes sense, since larger cities have more diverse populations) or in states such as California, Texas, and Florida, which are close to Latin American countries. Like other languages that have had, or continue to have, many speakers in the United States, Spanish has made a significant impact on American English.
Future Direction
In the future, I would like to compare English data from different US states to see which parts of the country use the most Spanish words when speaking English. However, I am unsure whether such a data set currently exists. I would also like to look at the data from other countries in the GloWbE corpus. I had to limit the number of countries in this project because the Python program took about half an hour to run with just four countries. The GloWbE corpus includes a data set of English from the Philippines, which I think would be especially interesting to examine considering how commonly Spanish was once spoken there. It would also be worth replicating this experiment with borrowings from other languages in different varieties of English.
Works Cited
Davies, M. (n.d.). Corpus of Global Web-Based English (GloWbE) [Data set]. University of North Texas Libraries, UNT Digital Library. Retrieved December 15, 2021, from https://digital.library.unt.edu/ark:/67531/metadc1181238/
Jordan, J.-E. (n.d.). 111 English words that are actually Spanish. Retrieved December 15, 2021, from https://www.babbel.com/en/magazine/english-words-actually-spanish
Peknic, M. (2015). Spanish loanwords in contemporary English (dissertation).
U.S. Census Bureau. (2019). S1601: Language spoken at home. Retrieved from https://data.census.gov/cedsci/table?q=spanish&tid=ACSST1Y2019.S1601
