Introduction

With there being approximately seven-thousand known languages in the world, it is statistically impossible that they are all as widely-spoken and proliferated as Mandarin and English, for example. Some of these are known as endangered languages, or languages that are spoken by a limited number of people. Fewer still are known as extinct languages, those that have no fluent speakers left alive. This notebook will provide visualizations based on the number and geographic location of endangered/extinct languages.

There are five different words used to describe the status of an endangered language: vulnerable, definitely endangered, severely endangered, critically endangered, and extinct. Starting at the least severe, a vulnerable language is one that is still being learned by children, but is not spoken much outside of the home or other isolated spaces. A definitely endangered language is similar to a vulnerable language except for the fact that children are not learning it, but it is still spoken within the community to a certain extent. The majority of a severely endangered language’s speakers are elderly individuals, but it might be understood by the younger generations. Speakers of a critically endangered language are almost all elderly, and they also most likely speak another, more common language as well as the endangered one. Finally, a language with no living speakers is an extinct language.

Data

#bar graph showing count of all endangered languages by status
db %>%
  ggplot(aes(x = Degree.of.endangerment.ordered, fill = Degree.of.endangerment.ordered)) +
  geom_bar(color = "black") +
  coord_flip() +
  xlab("Status") +
  ylab("Number of Languages") +
  theme_bw() +
  theme(legend.position = "none") +
  ggtitle("Number of Endangered Languages, By Status")

This bar graph models the total number of languages in the data set as they correspond to the five statuses, ordered by amount from greatest to least. It is reassuring to know that the majority of languages in the data set are not extinct and that they have a chance to become more prevalent. Fortunately, the two least severe statuses an endangered language can have, vulnerable and definitely endangered, are also the most common.

db %>%
  ggplot(aes(x = Degree.of.endangerment.reordered, fill = Degree.of.endangerment.reordered)) +
  geom_bar(color = "black") +
  xlab("Country") +
  ylab("Number of Languages") +
  ggtitle("No. of Endangered Languages, USA Vs. Other Countries") +
  labs(fill = "Language Status") +
  theme_bw() +
  facet_wrap(~USA) +
  coord_flip()

This bar graph compares the number and status of the languages in the data set, much like the first. However, in this visualization, languages originating in what is now the United States of America are separated from the rest of the world. Looking at the lengths of the bars relative to each other gives an idea about which statuses are more common domestically versus globally. For example, the USA has more critically endangered languages than the rest of the world, which leads in definitely endangered languages. The USA, unfortunately, also has comparatively more extinct languages than severely endangered, definitely endangered, and vulnerable languages.

#scatterplot of latitudes and longitudes
db %>%
  ggplot(aes(x = Longitude, y = Latitude, color = Degree.of.endangerment.reordered)) +
  geom_point(alpha = 0.50) +
  labs(color = "Language Status") +
  theme(legend.key.size = unit(0.1, "cm")) +
  theme_bw()

This scatter plot maps the latitude and longitude of each language on the data set, with the color of the dot representing the status of the language, all together forming a shape that is quite familiar. With this visualization, the realization that the United States has proportionately more extinct languages becomes clearer; the west coast and the Midwest are colored a deep purple because many transparent extinct languages are layered on top of each other. Also of note is that Canada has a significant amount of vulnerable languages spread out throughout the northern part of the country. Parts of northern Europe with a similarly harsh and snowy environment also have their dots more spread out than in more heavily populated regions of Earth. The same is true for exceptionally hot parts of the world, such as the Saharan desert and central Australia. It is interesting to see how extreme climates slow down the spread of languages.

Even if someone is pressed for time, they can look at a bar graph or a scatter plot and quickly learn more about a certain topic. That is the beauty of visualizations. They are useful for any data set, but they particularly enhance this data about endangered languages.