Justification for Reshaping: The dataset is currently in an “untidy” long format. Each row holds a single measurement: the variable name is stored in the Measurement column and its value in the Value column. Reshaping the data into a wide format, with one row per country and one column per variable, makes each variable directly usable and simplifies the analysis.
Variable Renaming: Additionally, certain variables within the dataset could be renamed for enhanced readability and clarity.
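The following minimal base-R sketch makes the difference between the two layouts concrete. It uses `stats::reshape()` instead of the `tidyr::pivot_wider()` call used later in this report, so that it runs without extra packages. Its four rows reuse the Langs and Population values for Algeria and Angola shown in the output further down. The object names `long` and `wide` are illustrative only.

```r
# Long ("untidy") layout: one row per (country, measurement) pair.
# Values are copied from this report's output; object names are illustrative.
long <- data.frame(
  Continent   = c("Africa", "Africa", "Africa", "Africa"),
  Country     = c("Algeria", "Algeria", "Angola", "Angola"),
  Measurement = c("Langs", "Population", "Langs", "Population"),
  Value       = c(18, 25660, 42, 10303)
)
print(long)

# Wide layout: one row per country, one column per measurement.
# stats::reshape() is a base-R stand-in for tidyr::pivot_wider() here.
wide <- reshape(
  long,
  idvar     = c("Continent", "Country"),
  timevar   = "Measurement",
  v.names   = "Value",
  direction = "wide"
)

# reshape() names the new columns "Value.Langs", "Value.Population";
# strip the prefix so they match the measurement names.
names(wide) <- sub("^Value\\.", "", names(wide))
print(wide)
```

Each country collapses from several rows into a single row, and each measurement becomes its own column. The analysis below relies on this shape.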
Tidy Dataset Features
Variable Definitions
| Feature | Description | Data Type |
|---|---|---|
| Continent | Name of the continent | chr |
| Country | Name of the country | chr |
| Languages | Number of languages spoken | num |
| Area | Land area (km²) | num |
| Population | Population (thousands) | num |
| Weather Stations | Number of weather stations used to calculate Mean Growth | num |
| Mean Growth | Mean length of the growing season (months) | num |
| Growth Deviation | Standard deviation of the growing-season values across the country's weather stations | num |
Data Import
The dataset is loaded directly from the untidydata package (installed with devtools). It can also be found in the repo.
```r
library(tidyverse)
library(untidydata)

# Load the dataset and preview its first rows
data(language_diversity)
head(language_diversity)
```
```
# A tibble: 6 × 4
  Continent Country    Measurement Value
  <chr>     <chr>      <chr>       <dbl>
1 Africa    Algeria    Langs          18
2 Africa    Angola     Langs          42
3 Oceania   Australia  Langs         234
4 Asia      Bangladesh Langs          37
5 Africa    Benin      Langs          52
6 Americas  Bolivia    Langs          38
```
Data Tidying and Transformation
The data was reshaped into a wide format with the pivot_wider() function. The new column names come from the Measurement column and the cell values from the Value column. The abbreviated column names were then renamed with rename() for readability.
```r
# Spread the data out: one column per Measurement, filled from Value
wide_languages <- language_diversity |>
  pivot_wider(names_from = Measurement, values_from = Value)

# Give the abbreviated columns readable names (new name = old name)
wide_languages <- wide_languages |>
  rename(
    "Languages"        = "Langs",
    "Weather Stations" = "Stations",
    "Mean Growth"      = "MGS",
    "Growth Deviation" = "Std"
  )

head(wide_languages)
```
```
# A tibble: 6 × 8
  Continent Country Languages   Area Population `Weather Stations` `Mean Growth`
  <chr>     <chr>       <dbl>  <dbl>      <dbl>              <dbl>         <dbl>
1 Africa    Algeria        18 2.38e6      25660                102          6.6
2 Africa    Angola         42 1.25e6      10303                 50          6.22
3 Oceania   Austra…       234 7.71e6      17336                134          6
4 Asia      Bangla…        37 1.44e5     118745                 20          7.4
5 Africa    Benin          52 1.13e5       4889                  7          7.14
6 Americas  Bolivia        38 1.10e6       7612                 48          6.92
# ℹ 1 more variable: `Growth Deviation` <dbl>
```
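pivot_wider() assumes that each Country and Measurement pair occurs exactly once. If a pair were duplicated, it could not place a single value in that cell, and it would instead warn and return list-columns. The following minimal base-R check for this condition uses a hypothetical helper, `has_unique_keys()`, and a few values copied from the outputs above.

```r
# Hypothetical helper: TRUE when every combination of the key columns
# appears only once, i.e. the data can be widened without collisions.
has_unique_keys <- function(df, keys) {
  anyDuplicated(df[keys]) == 0L
}

long <- data.frame(
  Country     = c("Algeria", "Algeria", "Angola", "Angola"),
  Measurement = c("Langs", "Area", "Langs", "Area"),
  Value       = c(18, 2381741, 42, 1246700)
)
has_unique_keys(long, c("Country", "Measurement"))  # TRUE

# Appending a copy of the first row creates a duplicate (Algeria, Langs) pair.
dup <- rbind(long, long[1, ])
has_unique_keys(dup, c("Country", "Measurement"))   # FALSE
```

On the real data, `language_diversity |> count(Country, Measurement) |> filter(n > 1)` is a tidyverse equivalent that should return zero rows before widening.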
Analysis
This data will be analyzed to determine which countries have the most languages per capita. Some countries are home to a great diversity of cultures. To compare them on a common scale, we create a new column, Avg_Langs_Per_Person, holding the number of languages spoken divided by the population. Ranking countries by this value identifies those in the dataset with the most languages per resident.
```
# A tibble: 74 × 9
   Continent Country      Langs    Area Population Stations   MGS   Std
   <chr>     <chr>        <dbl>   <dbl>      <dbl>    <dbl> <dbl> <dbl>
 1 Africa    Algeria         18 2381741      25660      102  6.6   2.29
 2 Africa    Angola          42 1246700      10303       50  6.22  1.87
 3 Oceania   Australia      234 7713364      17336      134  6     4.17
 4 Asia      Bangladesh      37  143998     118745       20  7.4   0.73
 5 Africa    Benin           52  112622       4889        7  7.14  0.99
 6 Americas  Bolivia         38 1098581       7612       48  6.92  2.5
 7 Africa    Botswana        27  581730       1348       10  4.6   1.69
 8 Americas  Brazil         209 8511965     153322      245  9.71  5.87
 9 Africa    Burkina Faso    75  274000       9242        6  5.17  1.07
10 Africa    CAR             94  622984       3127       13  8.08  1.21
# ℹ 64 more rows
# ℹ 1 more variable: Avg_Langs_Per_Person <dbl>
```
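The code that creates Avg_Langs_Per_Person is not shown above. The following is a minimal sketch of such a calculation. It assumes the column is the number of languages divided by the population, and it uses a hypothetical column name, `Langs_Per_Person`, and only the six countries from the earlier output. Because Population is recorded in thousands, the sketch multiplies it by 1000 so the result is languages per person.

```r
# Six countries from the earlier output; Population is in thousands.
wide <- data.frame(
  Country    = c("Algeria", "Angola", "Australia", "Bangladesh", "Benin", "Bolivia"),
  Langs      = c(18, 42, 234, 37, 52, 38),
  Population = c(25660, 10303, 17336, 118745, 4889, 7612)
)

# Assumed formula: languages divided by the number of people
# (thousands converted to persons). Langs_Per_Person is a hypothetical name.
wide$Langs_Per_Person <- wide$Langs / (wide$Population * 1000)

# Rank countries from most to fewest languages per person.
ranked <- wide[order(-wide$Langs_Per_Person), ]
print(ranked[, c("Country", "Langs_Per_Person")])
```

Converting thousands to persons does not change the ranking, because every country is scaled by the same factor. It only matters when the values themselves are reported.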
Conclusion
Based on this analysis, the countries with the most languages per capita are Vanuatu, Papua New Guinea, French Guiana, Suriname and Gabon. This ranking comes from sorting the data by the Avg_Langs_Per_Person column in descending order, as in the code below, which also plots the top 10 countries.
```r
#| warning: false
# Sort countries by languages per person, highest first, and keep the top 10
sorted_data <- pivoted_data[order(-pivoted_data$Avg_Langs_Per_Person), ]
top_10 <- head(sorted_data, 10)

# Bar chart of the top 10, bars ordered by their value
ggplot(top_10, aes(x = reorder(Country, Avg_Langs_Per_Person), y = Avg_Langs_Per_Person)) +
  geom_col(fill = "skyblue") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(
    title = "Top 10 Countries with the Highest Average Languages per Capita",
    x = "Country",
    y = "Average Languages per Capita"
  )
```