My investigation is inspired by Beekhuizen, Watson, & Stevenson (2017), "Semantic typology and parallel corpora: Something about indefinite pronouns" (*Proceedings of the Annual Meeting of the Cognitive Science Society*, Vol. 39). They studied the semantic functions of indefinite pronouns. I study the semantic functions of prepositions, specifically the Spanish prepositions ‘por’ and ‘para’ and their English equivalents.
Using the OpenSubtitles database, I collected instances of ‘por’ and
‘para’, as well as the corresponding subtitles in English, and coded
them by function. For example:
> Spanish subtitle: Las toallas sirven para secarse.
>
> English subtitle: The towels are for drying.

Here Spanish uses ‘para’ where English uses ‘for’: both mark the PURPOSE of the towels.
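The collection step can be sketched in R. The helper below, `preps_in_line`, is hypothetical and not part of the original analysis. It assumes each subtitle line is a plain string and flags which of ‘por’ and ‘para’ occur in it as whole words.

```r
# Hypothetical helper (not from the original analysis): which of 'por' / 'para'
# appear as whole words in one subtitle line?
# Splitting on whitespace and punctuation keeps words like 'parada' or 'porque'
# from being counted as the preposition.
preps_in_line <- function(line) {
  tokens <- strsplit(tolower(line), "[[:space:][:punct:]]+")[[1]]
  preps <- c("por", "para")
  preps[preps %in% tokens]
}

preps_in_line("Las toallas sirven para secarse.")  # "para"
preps_in_line("Vamos a la parada.")                 # character(0)
```

Tokenizing is the key choice here. A plain substring search for "para" would also match "parada". A whole-word match only locates candidate instances; each one still has to be coded for its function.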
The full set of semantic functions used in the coding:

| Function | Description |
|---|---|
| Purpose | The function or purpose of an object, e.g. 'Las toallas sirven para secarse' |
| Recipient | The recipient of something, such as a gift, e.g. '¿Para quién es este pastel?' |
| Destination | The final destination of a journey, e.g. 'Vamos para casa' |
| Time destination | A target time: the intention to use or do something at a certain time, e.g. 'La publicación es para otoño' |
| Perception | Marks a perception or judgment as belonging to a specific person, e.g. 'Esta vida es sagrada para mí' |
| Infinitive | Introduces an infinitive, like English 'to' before a bare verb, e.g. 'Fue demasiado débil para llevar el acero' |
| Indeterminate space | Movement around, near, or through a space ('by way of') rather than toward a destination, e.g. 'Me he dado una vuelta por el parque' |
| Indeterminate time | An approximate time period, such as the morning, e.g. 'Mañana por la mañana' |
| Cause | The cause of or reason for an action, e.g. 'Gracias por el regalo' |
| Distribution | The sense of English 'per' in 'one globe per child', e.g. 'He repartido dos globos por niño' |
| Interchange | Exchange of one thing for another in a barter or payment, e.g. 'Lo compramos solo por 20 euros' |
| Media | The medium through which people communicate, e.g. 'Hemos hablado por teléfono' |
| Attribution/agency | Attributes an action to an agent, e.g. 'Fue iniciado por el gobierno' |
| Feelings for | The 'for' in 'my feelings for her', e.g. 'Debería comprender mi amor por ella' |
| Topic | The topic or concern of the verb or adjective, e.g. 'despreocupada por los celos' |
| Adverbializer | Turns an adjective into an adverbial phrase, e.g. 'Jenny se calma por completo' |
| Address | The person being asked or called for, e.g. 'Pregunte por Mr. Barnes' |
| Identity | The 'for' in 'take her for your wife', e.g. 'darle por esposa' |
| Stock phrase | Part of a fixed expression, unlikely to be parsed as a separate function, e.g. 'por favor' or 'por Dios' |
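```r
library(dplyr)
library(ggplot2)

# Map an annotation code to its semantic-function label (see the table above).
# Unknown codes return NA so that coding errors stand out in the plots.
get_code_name <- function(code) {
  switch(code,
    "PU"  = "Purpose",
    "RE"  = "Recipient",
    "DE"  = "Destination",
    "TD"  = "Time destination",
    "PE"  = "Perception",
    "INF" = "Infinitive",
    "IS"  = "Indeterminate space",
    "IT"  = "Indeterminate time",
    "CA"  = "Cause",
    "DI"  = "Distribution",
    "IN"  = "Interchange",
    "ME"  = "Media",
    "AT"  = "Attribution/agency",
    "FE"  = "Feelings for",
    "TO"  = "Topic",
    "ADV" = "Adverbializer",
    "AD"  = "Address",
    "ID"  = "Identity",
    # Both stock-phrase codes fall through to the same label.
    "POR FAVOR" = ,
    "FOR DIOS"  = "Stock phrase",
    NA_character_
  )
}

# `t` is the coded data frame (one row per 'por'/'para' token).
# as.character() guards against a factor column, which switch() would treat as integers.
t <- t %>%
  mutate(semantic_func = vapply(as.character(code), get_code_name,
                                character(1), USE.NAMES = FALSE))
```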
The plot below shows the distribution of semantic functions for each preposition. Both prepositions are spread over a variety of functions. Some functions occur with both prepositions, while others occur with only one.
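```r
# Counts of each semantic function, one panel per Spanish preposition.
# Mapping x directly to the label column keeps every bar paired with its own name.
t %>%
  ggplot(aes(x = semantic_func)) +
  geom_bar() +
  facet_grid(~ spanish_prep_found, scales = "free") +
  labs(x = "Semantic Function", y = "Count") +
  guides(x = guide_axis(angle = 45))
```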
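The shared-versus-exclusive comparison can also be made numerically. Below is a minimal base-R sketch on invented toy data. The data frame `toy` and its rows are hypothetical, not results from this study. It reuses the `semantic_func` and `spanish_prep_found` column names from the analysis above.

```r
# Hypothetical toy rows in the same shape as `t`; NOT the study's data.
toy <- data.frame(
  semantic_func      = c("Purpose", "Purpose", "Cause", "Cause", "Destination", "Media"),
  spanish_prep_found = c("para", "para", "por", "para", "para", "por")
)

# Cross-tabulate: rows = semantic functions, columns = preposition.
counts <- table(toy$semantic_func, toy$spanish_prep_found)

# A function is shared if it has at least one token with each preposition.
shared    <- rownames(counts)[rowSums(counts > 0) == 2]
por_only  <- rownames(counts)[counts[, "por"] > 0 & counts[, "para"] == 0]
para_only <- rownames(counts)[counts[, "para"] > 0 & counts[, "por"] == 0]

shared     # "Cause"
por_only   # "Media"
para_only  # "Destination" "Purpose"
```

Running the same lines with `t` in place of `toy` would list the shared and exclusive functions explicitly. The `counts` cross-tab also doubles as a check on the bar heights.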
The plot below covers each English word or phrase that occurs more than once in the coded data. For each one, it shows how often it is realized with ‘por’ and how often with ‘para’. Several English prepositions can be realized as either, depending on the semantic function. This makes the choice a challenge for L2 learners of Spanish whose L1 is English.
```r
# Share of each English translation (occurring more than once) realized
# with 'por' vs. 'para'.
t %>%
  group_by(translated_to) %>%
  filter(n() > 1) %>%
  ungroup() %>%
  ggplot(aes(y = translated_to, fill = spanish_prep_found)) +
  geom_bar(position = "fill") +
  scale_x_continuous(labels = scales::percent) +
  labs(x = "Percent Realization", y = "English Translation",
       fill = "Spanish Preposition")
```
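The phrase "realized as either" can be quantified. For each English translation, compute the share of its tokens realized with ‘por’. Any share strictly between 0 and 1 means the translation maps to both prepositions. Below is a base-R sketch on hypothetical toy rows. The names `toy`, `por_share`, and `mixed` are illustrative and the data are invented.

```r
# Hypothetical toy rows; NOT the study's data.
toy <- data.frame(
  translated_to      = c("for", "for", "for", "by", "by", "to", "to", "through"),
  spanish_prep_found = c("para", "por", "por", "por", "por", "para", "para", "por")
)

# Keep translations with more than one token (mirrors filter(n() > 1)).
n_per <- table(toy$translated_to)
kept  <- toy[toy$translated_to %in% names(n_per)[n_per > 1], ]

# Share of each translation's tokens realized with 'por'.
por_share <- vapply(
  split(kept$spanish_prep_found == "por", kept$translated_to),
  mean, numeric(1)
)

# Realized with both prepositions: share strictly between 0 and 1.
mixed <- names(por_share)[por_share > 0 & por_share < 1]

por_share  # by = 1, for = 2/3, to = 0
mixed      # "for"
```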
The count plot below crosses semantic function with English translation for the same subset (translations occurring more than once). Point size shows the number of tokens. Semantic functions split much more cleanly between ‘por’ and ‘para’ (most columns show a single color) than English translations do (most rows show both colors). This holds even though the dataset contains many more semantic functions than English translations.
```r
# Semantic function vs. English translation; point size = number of tokens,
# color = Spanish preposition.
t %>%
  group_by(translated_to) %>%
  filter(n() > 1) %>%
  ungroup() %>%
  ggplot(aes(x = semantic_func, y = translated_to, color = spanish_prep_found)) +
  geom_count() +
  labs(x = "Semantic Function", y = "English Translation",
       color = "Spanish Preposition", size = "Number of Occurrences") +
  guides(x = guide_axis(angle = 45))
```
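The visual comparison can be summarized with one number per grouping. One option, assumed here rather than taken from the original analysis, is weighted purity. Weighted purity is the fraction of tokens whose preposition matches the majority preposition of their group; a value of 1 means every group uses a single preposition. The sketch computes it on invented toy rows for both groupings. The `purity` helper and `toy` data are hypothetical.

```r
# Hypothetical toy rows; NOT the study's data.
toy <- data.frame(
  code               = c("PU", "PU", "CA", "CA", "IS", "IS", "DE", "TD"),
  translated_to      = c("for", "to", "for", "because of", "through", "around", "for", "for"),
  spanish_prep_found = c("para", "para", "por", "por", "por", "por", "para", "para")
)

# Weighted purity: share of tokens carrying their group's majority preposition.
purity <- function(groups, prep) {
  counts <- table(groups, prep)
  sum(apply(counts, 1, max)) / sum(counts)
}

purity(toy$code, toy$spanish_prep_found)           # 1
purity(toy$translated_to, toy$spanish_prep_found)  # 0.875
```

Groups with a single token are trivially pure. On real data, it is therefore worth applying the same `n() > 1` filter before comparing the two scores.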