
title: “Peru and its Endangered Languages”

output: html_notebook

Introduction: The 2007 Census of Peru records just four major languages, although over 72 Indigenous languages and dialects are spoken in the country. Around 84% of Peruvians speak Spanish, the official national language. Even so, over 26% of the population speaks a first language other than Spanish. Quechua is the second most commonly spoken language (13%), followed by Aymara (2%), and both have official status. This notebook visualizes Peru's endangered languages using the Extinct Languages dataset from Kaggle.

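library(tidyverse)  # attaches dplyr, ggplot2, and readr, and provides the %>% pipe
library(mapdata)    # high-resolution map outlines, used for the map of Peru
# Load the Kaggle dataset (this path is specific to the author's machine)
data <- read.csv("/Users/luzcordovavalladares/Documents/MSU/Quantitative Linguistics/extinctlanguages.csv")
length(data)    # number of columns
colnames(data)
str(data)
# Keep only the rows whose Countries field is exactly "Peru"
Peru <- data[data$Countries == "Peru", ]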
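
The exact comparison above keeps only rows whose Countries field is precisely "Peru". If that column can list several countries in one string (an assumption about the file, not something verified here), a substring match would also keep languages shared with neighboring countries. A minimal sketch on made-up rows; `sample_data` and its language names are hypothetical:

```r
# Hypothetical rows; the real Countries column may be formatted differently
sample_data <- data.frame(
  Name.in.English = c("Language A", "Language B", "Language C"),
  Countries = c("Peru", "Brazil, Peru", NA)
)

# Exact match: which() drops the NA comparison instead of returning an NA row
exact <- sample_data[which(sample_data$Countries == "Peru"), ]

# Substring match: grepl() returns FALSE for NA, so no NA rows appear
any_peru <- sample_data[grepl("Peru", sample_data$Countries, fixed = TRUE), ]

nrow(exact)
nrow(any_peru)
```

Which filter is appropriate depends on whether languages spoken across borders should count as languages of Peru.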

The dataset includes the name of each language, its number of speakers, the countries where it is still spoken, its geographic coordinates, and its degree of endangerment. The UNESCO endangerment classification is as follows:

Vulnerable: most children speak the language, but it may be restricted to certain domains (e.g., home).

Definitely endangered: children no longer learn the language as a ‘mother tongue’ in the home.

Severely endangered: the language is spoken by grandparents and older generations; while the parent generation may understand it, they do not speak it to children or among themselves.

Critically endangered: the youngest speakers are grandparents and older, and they speak the language partially and infrequently.

Extinct: there are no speakers left.
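
Because the five categories form a scale of severity, it can help to store them as an ordered factor so that tables and plots follow that order instead of alphabetical order. The following is only a sketch: `levels_unesco` and the sample vector `degree` are illustrative names, standing in for the dataset's Degree.of.endangerment column.

```r
# UNESCO levels from least to most endangered, in the order listed above
levels_unesco <- c("Vulnerable", "Definitely endangered", "Severely endangered",
                   "Critically endangered", "Extinct")

# Hypothetical sample values standing in for Peru$Degree.of.endangerment
degree <- c("Extinct", "Vulnerable", "Severely endangered", "Vulnerable")

# An ordered factor keeps the levels in severity order
degree_ordered <- factor(degree, levels = levels_unesco, ordered = TRUE)

# Counts are reported in severity order, including levels with zero entries
table(degree_ordered)

# Ordered factors support comparisons against a level name
at_least_severe <- degree_ordered >= "Severely endangered"
```

With this encoding, a bar chart of the factor would place Vulnerable first and Extinct last.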

The mapdata package supplies the outline of Peru, on which each language is plotted at its coordinates and colored by its degree of endangerment.

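# Map of Peru
library(mapdata)
# One color per endangerment level, assigned in alphabetical order of the levels
degree <- factor(Peru$Degree.of.endangerment)
cols <- c("Black", "Blue", "Purple", "Darkgreen", "Yellow")
dots <- cols[degree]
# maps::map is called explicitly because purrr (loaded by tidyverse) also exports map()
maps::map("worldHires", "Peru", col = "red", fill = TRUE)
points(Peru$Longitude, Peru$Latitude, col = dots, cex = 0.8)
legend("bottomleft", legend = levels(degree), col = cols[seq_along(levels(degree))], pch = 1, cex = 0.7)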

We can also visualize the number of languages at each degree of endangerment.

# Number of languages per degree of endangerment
Comparison <- Peru %>%
  count(Degree.of.endangerment)
print(Comparison)
ggplot(Comparison, aes(x = Degree.of.endangerment, y = n)) +
  geom_col(fill = "darkgreen", color = "black") +
  labs(title = "Languages of Peru by degree of endangerment", x = "Degree of Endangerment", y = "Number of languages")
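
The chart above counts languages. To compare the total number of speakers per degree instead, the speaker counts can be summed within each group. A minimal base R sketch on made-up rows; `sample_peru` is hypothetical, and only its column names are taken from the dataset:

```r
# Hypothetical rows with the same column names as the dataset
sample_peru <- data.frame(
  Degree.of.endangerment = c("Vulnerable", "Vulnerable", "Extinct", "Severely endangered"),
  Number.of.speakers = c(1000, 250, 0, NA)
)

# Sum speakers within each degree; na.rm = TRUE skips missing speaker counts
speakers <- tapply(sample_peru$Number.of.speakers,
                   sample_peru$Degree.of.endangerment,
                   sum, na.rm = TRUE)

print(speakers)
```

Note that with `na.rm = TRUE`, a group whose counts are all missing sums to 0, which is indistinguishable from a group with no speakers.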

Furthermore, we can plot the number of speakers of each language in Peru classified as Vulnerable.

# Speakers per language in the "Vulnerable" category
vulnerable <- Peru[Peru$Degree.of.endangerment == "Vulnerable", ]
ggplot(vulnerable, aes(x = Name.in.English, y = Number.of.speakers, group = Degree.of.endangerment)) +
  geom_line(linewidth = 1, color = "Purple") +  # linewidth replaces the deprecated size argument
  theme(axis.text.x = element_text(angle = -45, vjust = 0.5))

The same chart can be drawn for the languages in Peru classified as Definitely endangered.

# Speakers per language in the "Definitely endangered" category
data1 <- data[data$Countries == "Peru", ]
definitely <- data1[data1$Degree.of.endangerment == "Definitely endangered", ]
ggplot(definitely, aes(x = Name.in.English, y = Number.of.speakers, group = Degree.of.endangerment)) +
  geom_line(linewidth = 1, color = "Blue") +
  theme(axis.text.x = element_text(angle = -45, vjust = 0.5))

The following chart shows the languages in Peru classified as Critically endangered.

# Speakers per language in the "Critically endangered" category
data1 <- data[data$Countries == "Peru", ]
critically <- data1[data1$Degree.of.endangerment == "Critically endangered", ]
ggplot(critically, aes(x = Name.in.English, y = Number.of.speakers, group = Degree.of.endangerment)) +
  geom_line(linewidth = 1, color = "Black") +
  theme(axis.text.x = element_text(angle = -45, vjust = 0.5))

We can also plot the languages in Peru classified as Severely endangered.

# Speakers per language in the "Severely endangered" category
data1 <- data[data$Countries == "Peru", ]
severely <- data1[data1$Degree.of.endangerment == "Severely endangered", ]
ggplot(severely, aes(x = Name.in.English, y = Number.of.speakers, group = Degree.of.endangerment)) +
  geom_line(linewidth = 0.5, color = "Purple") +
  theme(axis.text.x = element_text(angle = -45, vjust = 0.5))

Finally, we can visualize the extinct languages of Peru. Setting theme(axis.text.x = element_text(angle = -45, vjust = 0.5)) rotates the x-axis labels so that the English names of the languages remain legible.

# Speakers per language in the "Extinct" category
data1 <- data[data$Countries == "Peru", ]
extinct <- data1[data1$Degree.of.endangerment == "Extinct", ]
ggplot(extinct, aes(x = Name.in.English, y = Number.of.speakers, group = Degree.of.endangerment)) +
  geom_line(linewidth = 0.5, color = "Purple") +
  theme(axis.text.x = element_text(angle = -45, vjust = 0.5))

Conclusion: With these data, we can identify which Peruvian languages are at risk and where preservation efforts should focus in order to prevent their extinction.