Patterns in Usage

 ggplot() +
    geom_bar(data = most_common_word_usa_low, aes(x = item_definition, y = num, fill = "English (American)"), stat = "identity", position = position_dodge(width = 0.8)) +
    
    geom_bar(data = most_common_word_esp_low, aes(x = item_definition, y = num, fill = "Spanish (Mexican)"), stat = "identity", position = position_dodge(width = 0.8)) +
    
    geom_bar(data = most_common_word_ita_low, aes(x = item_definition, y = num, fill = "Italian"), stat = "identity", position = position_dodge(width = 0.8)) +
    
    geom_bar(data = most_common_word_kor_low, aes(x = item_definition, y = num, fill = "Korean"), stat = "identity", position = position_dodge(width = 0.8)) +
    
    geom_bar(data = most_common_word_china_low, aes(x = item_definition, y = num, fill = "Mandarin"), stat = "identity", position = position_dodge(width = 0.8)) +
    
    labs(title = "Most Common Pronoun Produced at 19 Months Old", x = "Pronoun", y = "Frequency") +
    
    theme_minimal() +
    
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) + 
    
    scale_fill_manual(name = "Language", values = c("English (American)" = "coral", "Spanish (Mexican)" = "aquamarine3", "Italian" = "goldenrod", "Korean" = "palevioletred1", "Mandarin" = "darkorchid"))

In the early developmental stages, the most common pronoun is a first-person form in every language except Korean. For English and Italian it is the same word, “mine,” while for Spanish and Mandarin it is “I.” “Mine” is a possessive pronoun, while “I” is a personal pronoun. English-speaking children have the highest frequency, with over 200 instances. This high frequency might suggest a tendency for English-speaking children to discuss their possessions and personal affiliations often. However, such an interpretation could mistakenly paint them as more self-centered. That assumption lacks a basis, especially considering that Italian, a pro-drop language, shows the same word, and that children in nearly every language here primarily use the first person.

At later ages, the pattern remains. English speakers use first-person pronouns the most, followed by Spanish speakers. At 30 months, Korean speakers are also using first-person pronouns, but they use them the least. The only other change is in Spanish, where the most common pronoun is now “mío” instead of “yo,” joining English and Italian in favoring the possessive pronoun.

Conclusion

The distinction between pro-drop and non-pro-drop languages doesn’t appear to affect the age of acquisition. Even if children speaking pro-drop languages don’t use pronouns as frequently, they still learn them at a similar age. This may suggest a shared neurological structure in all children that plays a role in this process regardless of their native language. Usage, on the other hand, might be affected by the culture and society in which the children are immersed: even among pro-drop languages, there are large variations in the production of pronouns.

---
title: "Pronouns Playground: Exploring Pronoun Development In Pro-Drop and Non Pro-Drop
  Language-Speaking Children"
author: "Donaelle Benoit"
date: "December 15, 2023"
output:
  html_notebook:
    code_folding: hide
  html_document:
    df_print: paged
---

Pronouns are words that stand in for nouns referring to people, places, and things. They spare the speaker from repeating specific names over and over. Some languages, such as English, strictly require pronouns: in most situations a sentence needs an explicit subject, and leaving it out causes confusion. Other languages use pronouns less often and less explicitly. These are called pro-drop languages. In them, additional information in the utterance, such as verb conjugation, lets the listener identify the subject of the sentence without it being stated.

In this statistical analysis, I investigate whether speaking a pro-drop or non-pro-drop language affects the age at which children acquire pronouns and how they subsequently use them, based on data from Wordbank, a database of children's vocabulary.




### **Patterns in Age of Acquisition**

```{r}
library(wordbankr)
library(dplyr)
library(ggplot2)
library(tidyverse)

#Getting the Full Datasets

pronouns_usa <- get_item_data(language = "English (American)", form = "WS") %>%
  filter(category == "pronouns")


pronouns_usa_data <- get_instrument_data(language = "English (American)",
                                             form = "WS",
                                             items = pronouns_usa$item_id,
                                             administration_info = TRUE,
                                             item_info = TRUE)


pronouns_esp <- get_item_data(language = "Spanish (Mexican)", form = "WS") %>%
  filter(category == "pronouns")


pronouns_esp_data <- get_instrument_data(language = "Spanish (Mexican)",
                                         form = "WS",
                                         items = pronouns_esp$item_id,
                                         administration_info = TRUE,
                                         item_info = TRUE)

pronouns_kor <- get_item_data(language = "Korean", form = "WS") %>%
  filter(category == "pronouns")


pronouns_kor_data <- get_instrument_data(language = "Korean",
                                        form = "WS",
                                        items = pronouns_kor$item_id,
                                        administration_info = TRUE,
                                        item_info = TRUE)

pronouns_china <- get_item_data(language = "Mandarin (Beijing)", form = "WS") %>%
  filter(category == "pronouns")


pronouns_china_data <- get_instrument_data(language = "Mandarin (Beijing)",
                                         form = "WS",
                                         items = pronouns_china$item_id,
                                         administration_info = TRUE,
                                         item_info = TRUE)

pronouns_ita <- get_item_data(language = "Italian", form = "WS") %>%
  filter(category == "pronouns")


pronouns_ita_data <- get_instrument_data(language = "Italian",
                                           form = "WS",
                                           items = pronouns_ita$item_id,
                                           administration_info = TRUE,
                                           item_info = TRUE)

pronouns_ibe <- get_item_data(language = "Spanish (European)", form = "WS") %>%
  filter(category == "pronouns")


pronouns_ibe_data <- get_instrument_data(language = "Spanish (European)",
                                         form = "WS",
                                         items = pronouns_ibe$item_id,
                                         administration_info = TRUE,
                                         item_info = TRUE)
```
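
The download steps above repeat the same two calls for each language. A minimal sketch of how they could be wrapped in one function follows, assuming the same arguments the notebook already passes. The names `load_pronoun_data`, `fetch_items`, and `fetch_data` are hypothetical and introduced only for illustration. The two fetch arguments default to the wordbankr functions used above and exist only so the helper can be tried offline with stand-in functions. The block is shown with a plain `r` fence so it is displayed but not run when the notebook is knitted.

```r
library(dplyr)

# Hypothetical helper (not part of wordbankr or the original notebook).
# fetch_items / fetch_data default to the wordbankr functions used above.
load_pronoun_data <- function(language, form = "WS",
                              fetch_items = wordbankr::get_item_data,
                              fetch_data = wordbankr::get_instrument_data) {
  # Keep only the items in the "pronouns" category for this language
  pronoun_items <- fetch_items(language = language, form = form) %>%
    filter(category == "pronouns")

  # Download the responses for those items, with child and item metadata
  fetch_data(language = language,
             form = form,
             items = pronoun_items$item_id,
             administration_info = TRUE,
             item_info = TRUE)
}

# Possible usage (requires a connection to Wordbank):
# pronouns_kor_data <- load_pronoun_data("Korean")
```
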
```{r}

# Getting Age of Acquisition
aoa_eng <- fit_aoa(pronouns_usa_data)


aoa_eng_pronouns <- aoa_eng[!is.na(aoa_eng$aoa), ]


aoa_esp <- fit_aoa(pronouns_esp_data)


aoa_esp_pronouns <- aoa_esp[!is.na(aoa_esp$aoa), ]


aoa_kor <- fit_aoa(pronouns_kor_data)


aoa_kor_pronouns <- aoa_kor[!is.na(aoa_kor$aoa), ]


aoa_china <- fit_aoa(pronouns_china_data)


aoa_china_pronouns <- aoa_china[!is.na(aoa_china$aoa), ]


aoa_ita <- fit_aoa(pronouns_ita_data)


aoa_ita_pronouns <- aoa_ita[!is.na(aoa_ita$aoa), ]


aoa_ibe <- fit_aoa(pronouns_ibe_data)


aoa_ibe_pronouns <- aoa_ibe[!is.na(aoa_ibe$aoa), ]

```
```{r, message=FALSE}


  print(ggplot(aoa_eng_pronouns, aes(x = item_definition, y = aoa)) +
    geom_segment(aes(x = item_definition, xend = item_definition, y = 0, yend = aoa), 
                 color = "coral", size = 1) +
    geom_point(shape = 16, size = 3, color = "coral") +
    labs(title = "Age of Acquisition for English (American) Pronouns", x = "Pronouns", y = "Age (months)") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
)
```
```{r, message=FALSE}
print(ggplot(aoa_esp_pronouns, aes(x = item_definition, y = aoa)) +
    geom_segment(aes(x = item_definition, xend = item_definition, y = 0, yend = aoa), 
                 color = "aquamarine3", size = 1) +
    geom_point(shape = 16, size = 3, color = "aquamarine3") +
    labs(title = "Age of Acquisition for Spanish (Mexican) Pronouns", x = "Pronouns", y = "Age (months)") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
)
```
```{r, message=FALSE}

print(ggplot(aoa_ibe_pronouns, aes(x = item_definition, y = aoa)) +
    geom_segment(aes(x = item_definition, xend = item_definition, y = 0, yend = aoa), 
                 color = "chartreuse", size = 1) +
    geom_point(shape = 16, size = 3, color = "chartreuse") +
    labs(title = "Age of Acquisition for Spanish (European) Pronouns", x = "Pronouns", y = "Age (months)") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
)

```
```{r, message=FALSE}
print(ggplot(aoa_ita_pronouns, aes(x = item_definition, y = aoa)) +
    geom_segment(aes(x = item_definition, xend = item_definition, y = 0, yend = aoa), 
                 color = "goldenrod", size = 1) +
    geom_point(shape = 16, size = 3, color = "goldenrod") +
    labs(title = "Age of Acquisition for Italian Pronouns", x = "Pronouns", y = "Age (months)") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
)
```
```{r, message=FALSE}
print(ggplot(aoa_kor_pronouns, aes(x = item_definition, y = aoa)) +
    geom_segment(aes(x = item_definition, xend = item_definition, y = 0, yend = aoa), 
                 color = "palevioletred1", size = 1) +
    geom_point(shape = 16, size = 3, color = "palevioletred1") +
    labs(title = "Age of Acquisition for Korean Pronouns", x = "Pronouns", y = "Age (months)") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
)
```  
```{r, echo=FALSE, message=FALSE}
print(ggplot(aoa_china_pronouns, aes(x = item_definition, y = aoa)) +
    geom_segment(aes(x = item_definition, xend = item_definition, y = 0, yend = aoa), 
                 color = "darkorchid", size = 1) +
    geom_point(shape = 16, size = 3, color = "darkorchid") +
    labs(title = "Age of Acquisition for Mandarin Pronouns", x = "Pronouns", y = "Age (months)") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
)
  
```

English is a non-pro-drop language, while Spanish, Italian, Korean, and Mandarin are pro-drop languages. The earliest pronoun acquisition occurs at a similar age in English, Spanish, and Italian: 21 months for English and Italian, and 20 months for Spanish. Mandarin's earliest pronoun is acquired slightly sooner, at 19 months. Korean is the latest, with the first pronoun acquired at around 25 months.


Although English is not pro-drop and comes from a different language family, it differs little from Spanish, Italian, and Mandarin. Across these four languages, the first pronouns children acquire are consistently first-person pronouns. Remarkably, according to the data, the first pronoun acquired by Korean children is the demonstrative pronoun "this," and they do not acquire a first-person pronoun until 27 months. It is worth noting, however, that the age of acquisition for "this" in the other languages is the same as in Korean.
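
To make the "age of acquisition" figures easier to interpret, here is a rough sketch of the general idea on made-up data. As I understand it, `fit_aoa()` smooths the proportion of children producing an item with a regression on age and reports the earliest age at which that proportion exceeds a threshold (0.5 by default). The code below is an illustration of that idea under this assumption, not a description of wordbankr's internals. The toy proportions are invented, and the block uses a plain `r` fence so it is displayed but not evaluated.

```r
# Toy data (invented): proportion of children producing one pronoun at each
# age, following a logistic curve whose midpoint lies between 21 and 22 months.
toy <- data.frame(age = 16:30)
toy$prop_producing <- plogis(0.5 * (toy$age - 21.5))

# Smooth the proportions with a logistic regression on age
fit <- glm(prop_producing ~ age, family = quasibinomial(), data = toy)
toy$fitted <- predict(fit, type = "response")

# Age of acquisition: earliest age (in months) at which more than half of the
# children are predicted to produce the word
toy_aoa <- min(toy$age[toy$fitted > 0.5])
toy_aoa
```

Under this reading, an item whose fitted curve never passes the threshold within the sampled ages would have no estimate, which would explain why rows with a missing `aoa` are filtered out before plotting.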

### **Patterns in Production**

```{r, message=FALSE}


pronouns_usa_mean <- pronouns_usa_data %>%
  group_by(age, data_id) %>%
  summarise(num = sum(produces, na.rm = TRUE)) %>%
  group_by(age) %>%
  summarise(mean = mean(num, na.rm = TRUE))

pronouns_esp_mean <- pronouns_esp_data %>%
  group_by(age, data_id) %>%
  summarise(num = sum(produces, na.rm = TRUE)) %>%
  group_by(age) %>%
  summarise(mean = mean(num, na.rm = TRUE))


print(ggplot() +
  geom_line(data = pronouns_usa_mean, aes(x = age, y = mean, color = "English (American)")) +
  geom_line(data = pronouns_esp_mean, aes(x = age, y = mean, color = "Spanish (Mexican)")) +
  labs(title = "Pronoun Production by Age", x = "Age (months)", y = "Mean pronouns produced") +
  scale_color_manual(name = "Language", 
                     values = c("English (American)" = "coral", "Spanish (Mexican)" = "aquamarine3"))

)

```
This graph appears to show Spanish-speaking children producing more pronouns than English-speaking children, although I expected the opposite. The average number of pronouns produced fluctuates as Spanish speakers get older, whereas English speakers show a steadier increase with age. Even so, across the periods of increased production, older Spanish-speaking children clearly produce more pronouns than younger ones, just like English-speaking children. One possible explanation is that the fluctuation reflects how optional pronouns are in Spanish: sometimes children choose to use them, and at other times they do not. At 30 months, Spanish shows a marked dip in pronouns produced compared with about 27 months. English has about two periods in which production decreases.
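
The "mean pronouns produced" value in these graphs comes from a two-step aggregation: first count the pronouns each child produces, then average those counts within each age. The sketch below shows that calculation on a tiny invented data set. The function name `mean_pronouns_by_age` and the toy values are hypothetical. The column names `age`, `data_id`, and `produces` follow the ones used in the chunk above. The block uses a plain `r` fence so it is displayed but not evaluated.

```r
library(dplyr)

# Hypothetical helper wrapping the aggregation used for each language
mean_pronouns_by_age <- function(instrument_data) {
  instrument_data %>%
    # Step 1: number of pronouns each child (data_id) is reported to produce
    group_by(age, data_id) %>%
    summarise(num = sum(produces, na.rm = TRUE), .groups = "drop") %>%
    # Step 2: average of those per-child counts at each age
    group_by(age) %>%
    summarise(mean = mean(num), .groups = "drop")
}

# Invented responses: two children at 20 months, one child at 24 months
toy_admins <- tibble(
  age      = c(20, 20, 20, 20, 24, 24),
  data_id  = c(1, 1, 2, 2, 3, 3),
  produces = c(TRUE, FALSE, TRUE, TRUE, TRUE, NA)
)

toy_means <- mean_pronouns_by_age(toy_admins)
toy_means
```

Because the average is taken over children, an age with only a few administrations can swing sharply from one month to the next, which is worth keeping in mind when reading the fluctuations.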




















```{r, echo=FALSE, message=FALSE}

pronouns_esp_mean <- pronouns_esp_data %>%
  group_by(age, data_id) %>%
  summarise(num = sum(produces, na.rm = TRUE)) %>%
  group_by(age) %>%
  summarise(mean = mean(num, na.rm = TRUE))

pronouns_ibe_mean <- pronouns_ibe_data %>%
  group_by(age, data_id) %>%
  summarise(num = sum(produces, na.rm = TRUE)) %>%
  group_by(age) %>%
  summarise(mean = mean(num, na.rm = TRUE))

print(ggplot() +
  geom_line(data = pronouns_esp_mean, aes(x = age, y = mean, color = "Spanish (Mexican)")) +
  geom_line(data = pronouns_ibe_mean, aes(x = age, y = mean, color = "Spanish (European)")) +
  labs(title = "Pronoun Production by Age", x = "Age (months)", y = "Mean pronouns produced") +
  scale_color_manual(name = "Language", values = c("Spanish (Mexican)" = "aquamarine3", "Spanish (European)" = "chartreuse"))

)

```
Besides the language itself, I also wanted to explore the role culture may play in pronoun acquisition and usage by comparing two varieties of the same pro-drop language. The European and Mexican varieties of Spanish have very close averages, but Mexican Spanish is higher than its European counterpart overall. Both also have periods of fluctuation: Mexican Spanish features pronounced rises and falls, while the fluctuations in European Spanish are less dramatic. For instance, the change in production between about 27 and 30 months is small in European Spanish compared with Mexican Spanish. European Spanish-speaking children therefore seem to be more consistent in their production. Nevertheless, at close to 30 months, the two varieties seem to have the same average.



```{r, echo=FALSE, message=FALSE}

pronouns_esp_mean <- pronouns_esp_data %>%
  group_by(age, data_id) %>%
  summarise(num = sum(produces, na.rm = TRUE)) %>%
  group_by(age) %>%
  summarise(mean = mean(num, na.rm = TRUE))

pronouns_ita_mean <- pronouns_ita_data %>%
  group_by(age, data_id) %>%
  summarise(num = sum(produces, na.rm = TRUE)) %>%
  group_by(age) %>%
  summarise(mean = mean(num, na.rm = TRUE))

print(ggplot() +
  geom_line(data = pronouns_esp_mean, aes(x = age, y = mean, color = "Spanish (Mexican)")) +
  geom_line(data = pronouns_ita_mean, aes(x = age, y = mean, color = "Italian")) +
  labs(title = "Pronoun Production by Age", x = "Age (months)", y = "Mean pronouns produced") +
  scale_color_manual(name = "Language", values = c("Spanish (Mexican)" = "aquamarine3", "Italian" = "goldenrod"))

)

```

Compared with Italian-speaking children, Spanish-speaking children produce more pronouns on average. Since both are Romance languages, I wondered whether they would have similar production. Their figures are somewhat closer than those of English and Spanish, but not as close as those of Mexican and European Spanish. Like the two varieties of Spanish, Italian also fluctuates. Fluctuation seems to be a pattern among pro-drop languages.

















```{r, echo=FALSE, message=FALSE}

pronouns_esp_mean <- pronouns_esp_data %>%
  group_by(age, data_id) %>%
  summarise(num = sum(produces, na.rm = TRUE)) %>%
  group_by(age) %>%
  summarise(mean = mean(num, na.rm = TRUE))

pronouns_kor_mean <- pronouns_kor_data %>%
  group_by(age, data_id) %>%
  summarise(num = sum(produces, na.rm = TRUE)) %>%
  group_by(age) %>%
  summarise(mean = mean(num, na.rm = TRUE))

print(ggplot() +
  geom_line(data = pronouns_esp_mean, aes(x = age, y = mean, color = "Spanish (Mexican)")) +
  geom_line(data = pronouns_kor_mean, aes(x = age, y = mean, color = "Korean")) +
  labs(title = "Pronoun Production by Age", x = "Age (months)", y = "Mean pronouns produced") +
  # Map each legend label to its color by name instead of relying on ordering
  scale_color_manual(name = "Language",
                     values = c("Spanish (Mexican)" = "aquamarine3", "Korean" = "palevioletred1"))
)
```
```{r, echo=FALSE, message=FALSE}

pronouns_ita_mean <- pronouns_ita_data %>%
  group_by(age, data_id) %>%
  summarise(num = sum(produces, na.rm = TRUE)) %>%
  group_by(age) %>%
  summarise(mean = mean(num, na.rm = TRUE))

pronouns_kor_mean <- pronouns_kor_data %>%
  group_by(age, data_id) %>%
  summarise(num = sum(produces, na.rm = TRUE)) %>%
  group_by(age) %>%
  summarise(mean = mean(num, na.rm = TRUE))

print(ggplot() +
  geom_line(data = pronouns_ita_mean, aes(x = age, y = mean, color = "Italian")) +
  geom_line(data = pronouns_kor_mean, aes(x = age, y = mean, color = "Korean")) +
  labs(title = "Pronoun Production by Age", x = "Age (months)", y = "Mean pronouns produced") +
  # Map each legend label to its color by name instead of relying on ordering
  scale_color_manual(name = "Language",
                     values = c("Italian" = "goldenrod", "Korean" = "hotpink"))
)
```
Korean and Spanish differ greatly in average production: Korean-speaking children produce far fewer pronouns on average. Korean and Italian (the other Romance language in this comparison) are closer to each other, although Italian-speaking children still produce more pronouns on average. Korean children reach their highest average production at 31 months, relatively close in age to the Spanish peak at about 29 months. The disparity in level is notable, however: Korean children produce approximately 4 pronouns, whereas Spanish speakers produce nearly 20 by 31 months. Both groups use more pronouns as they get older, but the increase in Korean is not as discernible as in the other languages.























```{r, echo=FALSE, message=FALSE}

pronouns_kor_mean <- pronouns_kor_data %>%
  group_by(age, data_id) %>%
  summarise(num = sum(produces, na.rm = TRUE)) %>%
  group_by(age) %>%
  summarise(mean = mean(num, na.rm = TRUE))

pronouns_china_mean <- pronouns_china_data %>%
  group_by(age, data_id) %>%
  summarise(num = sum(produces, na.rm = TRUE)) %>%
  group_by(age) %>%
  summarise(mean = mean(num, na.rm = TRUE))

print(ggplot() +
  geom_line(data = pronouns_kor_mean, aes(x = age, y = mean, color = "Korean")) +
  geom_line(data = pronouns_china_mean, aes(x = age, y = mean, color = "Mandarin")) +
  labs(title = "Pronoun Production by Age", x = "Age (months)", y = "Mean pronouns produced") +
  scale_color_manual(name = "Language", values = c("Korean" = "hotpink", "Mandarin" = "darkorchid"))
  
)

```
In this set of languages, Korean seems to generate the smallest number of pronouns, and the contrast with Mandarin is striking. At about 17 months, the graph indicates that Mandarin-speaking children produce more pronouns than Korean-speaking children, although the averages are fairly close. As the children get older, the gap widens. Korean mean production rises to slightly over five pronouns only after 35 months, whereas Mandarin-speaking children reach an average of five pronouns at 20 months, a fifteen-month gap.
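
A gap like this can also be read from the summary tables instead of by eye from the graph. The sketch below stacks two per-age tables into one data frame and finds the first age at which each language reaches a mean of five pronouns. The numbers are invented for illustration and only loosely echo the pattern described here. The names `toy_curves` and `first_age_reaching` are hypothetical, and the `age` and `mean` columns mirror those in the `pronouns_*_mean` tables. The block uses a plain `r` fence so it is displayed but not evaluated.

```r
library(dplyr)

# Invented per-age means for two languages, stacked with a language label
toy_curves <- bind_rows(
  Korean   = tibble(age = c(17, 20, 30, 35), mean = c(0.5, 1, 3, 5.2)),
  Mandarin = tibble(age = c(17, 20, 30, 35), mean = c(1, 5, 9, 12)),
  .id = "language"
)

# First age at which each language's mean reaches five pronouns
first_age_reaching <- toy_curves %>%
  filter(mean >= 5) %>%
  group_by(language) %>%
  summarise(first_age = min(age), .groups = "drop")

first_age_reaching
```

Keeping all languages in one long data frame would also allow a single `geom_line()` call with `color = language`, rather than one layer per language.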












### **Patterns in Usage**
```{r, echo=FALSE, message=FALSE}
# For each language, count how many children at each age produce each pronoun.
# `num` is a raw count of children, not a proportion.
pronouns_usa_data_summary <- pronouns_usa_data %>%
  group_by(age, item_definition) %>%
  summarise(num = sum(produces, na.rm = TRUE))

pronouns_esp_data_summary <- pronouns_esp_data %>%
  group_by(age, item_definition) %>%
  summarise(num = sum(produces, na.rm = TRUE))

# For the remaining languages, keep the English gloss alongside each item.
pronouns_kor_data_summary <- pronouns_kor_data %>%
  group_by(age, item_definition, english_gloss) %>%
  summarise(num = sum(produces, na.rm = TRUE))

pronouns_china_data_summary <- pronouns_china_data %>%
  group_by(age, item_definition, english_gloss) %>%
  summarise(num = sum(produces, na.rm = TRUE))

pronouns_ita_data_summary <- pronouns_ita_data %>%
  group_by(age, item_definition, english_gloss) %>%
  summarise(num = sum(produces, na.rm = TRUE))
```
```{r, echo=FALSE, message=FALSE}
# Return the most frequently produced pronoun(s) at a given age.
# Selecting the maximum directly avoids hard-coding counts, which would
# silently return nothing if the underlying data changed.
# Ties are kept, as they would be when filtering on the maximum count.
most_common_at <- function(summary_df, target_age) {
  summary_df %>%
    ungroup() %>%
    filter(age == target_age) %>%
    slice_max(num, n = 1)
}

most_common_word_usa_low    <- most_common_at(pronouns_usa_data_summary, 19)
most_common_word_usa_high   <- most_common_at(pronouns_usa_data_summary, 30)

most_common_word_esp_low    <- most_common_at(pronouns_esp_data_summary, 19)
most_common_word_esp_high   <- most_common_at(pronouns_esp_data_summary, 30)

most_common_word_ita_low    <- most_common_at(pronouns_ita_data_summary, 19)
most_common_word_ita_high   <- most_common_at(pronouns_ita_data_summary, 30)

most_common_word_kor_low    <- most_common_at(pronouns_kor_data_summary, 19)
most_common_word_kor_high   <- most_common_at(pronouns_kor_data_summary, 30)

most_common_word_china_low  <- most_common_at(pronouns_china_data_summary, 19)
most_common_word_china_high <- most_common_at(pronouns_china_data_summary, 30)

```
```{r, echo=FALSE, message=FALSE}
 ggplot() +
    geom_bar(data = most_common_word_usa_low, aes(x = item_definition, y = num, fill = "English (American)"), stat = "identity", position = position_dodge(width = 0.8)) +
    
    geom_bar(data = most_common_word_esp_low, aes(x = item_definition, y = num, fill = "Spanish (Mexican)"), stat = "identity", position = position_dodge(width = 0.8)) +
    
    geom_bar(data = most_common_word_ita_low, aes(x = item_definition, y = num, fill = "Italian"), stat = "identity", position = position_dodge(width = 0.8)) +
    
    geom_bar(data = most_common_word_kor_low, aes(x = item_definition, y = num, fill = "Korean"), stat = "identity", position = position_dodge(width = 0.8)) +
    
    geom_bar(data = most_common_word_china_low, aes(x = item_definition, y = num, fill = "Mandarin"), stat = "identity", position = position_dodge(width = 0.8)) +
    
    labs(title = "Most Common Pronoun Produced at 19 Months Old", x = "Pronoun", y = "Frequency") +
    
    theme_minimal() +
    
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) + 
    
    scale_fill_manual(name = "Language", values = c("English (American)" = "coral", "Spanish (Mexican)" = "aquamarine3", "Italian" = "goldenrod", "Korean" = "palevioletred1", "Mandarin" = "darkorchid"))
```
```{r, echo=FALSE, message=FALSE}  
 print(ggplot() +
    geom_bar(data = most_common_word_usa_high, aes(x = item_definition, y = num, fill = "English (American)"), stat = "identity", position = position_dodge(width = 0.8)) +
    
    geom_bar(data = most_common_word_esp_high, aes(x = item_definition, y = num, fill = "Spanish (Mexican)"), stat = "identity", position = position_dodge(width = 0.8)) +
    
    geom_bar(data = most_common_word_ita_high, aes(x = item_definition, y = num, fill = "Italian"), stat = "identity", position = position_dodge(width = 0.8)) +
    
    geom_bar(data = most_common_word_kor_high, aes(x = item_definition, y = num, fill = "Korean"), stat = "identity", position = position_dodge(width = 0.8)) +
    
    geom_bar(data = most_common_word_china_high, aes(x = item_definition, y = num, fill = "Mandarin"), stat = "identity", position = position_dodge(width = 0.8)) +
    
    labs(title = "Most Common Pronoun Produced at 30 Months Old", x = "Pronoun", y = "Frequency") +
    
    theme_minimal() +
    
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) + 
    
    scale_fill_manual(name = "Language", values = c("English (American)" = "coral", "Spanish (Mexican)" = "aquamarine3", "Italian" = "goldenrod", "Korean" = "palevioletred1", "Mandarin" = "darkorchid"))
 ) 
  
```
In the early developmental stages, the most common pronoun is a first-person form in every language except Korean. English- and Italian-speaking children use the same word, "mine," while Spanish- and Mandarin-speaking children use "I." "Mine" is a possessive pronoun, while "I" is a personal pronoun. English-speaking children have the highest frequency, with over 200 instances. This high frequency might suggest a tendency for English-speaking children to frequently discuss their possessions and personal affiliations. However, such an interpretation could mistakenly paint them as more self-centered. That assumption lacks a basis, especially considering that Italian, a pro-drop language, shows the same most common word, and that children in nearly every language examined primarily use the first person.


At 30 months, the pattern remains. English speakers use first-person pronouns the most, followed by Spanish speakers. Korean speakers are now also using first-person pronouns, but they use them the least. The only other change is in Spanish, where the most common pronoun is now "mío" instead of "yo," joining English and Italian in favoring a possessive pronoun.
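
The frequencies compared above are raw counts of children, so they also reflect how many children were sampled in each language at each age. One way to put the languages on a common scale is to report the share of children producing each pronoun. The following is a minimal sketch, assuming long-format data with `data_id`, `age`, `item_definition`, and a logical `produces` column, as used in the summaries in this report. The `toy_data` values are made up for illustration and are not Wordbank results.

```r
library(dplyr)

# Toy long-format data: one row per child (data_id) per pronoun.
# Values are made up for illustration only.
toy_data <- data.frame(
  data_id         = c(1, 1, 2, 2, 3, 3, 4, 4),
  age             = 19,
  item_definition = rep(c("mine", "I"), times = 4),
  produces        = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)
)

toy_summary <- toy_data %>%
  group_by(age, item_definition) %>%
  summarise(
    num        = sum(produces, na.rm = TRUE),  # raw count of children producing the item
    n_children = n_distinct(data_id),          # children sampled at this age
    prop       = num / n_children,             # share of children producing the item
    .groups    = "drop"
  )

toy_summary
```

Plotting `prop` instead of `num` would let a language with few sampled children be compared directly with one that has many.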

### **Conclusion**
The distinction between pro-drop and non-pro-drop languages doesn't appear to impact the age of acquisition. Even if children who speak pro-drop languages don't use pronouns frequently, they still learn them at a similar age. This suggests that a shared neurological structure in all children may play a role in this process regardless of their native language. Usage, on the other hand, might be affected by the culture and society in which the children are immersed. Even among pro-drop languages, there are significant variations in the production of pronouns.
