ProjectDos

## Data Set Introduction

I chose a dataset that contains information on global education rates, including student enrollment rates, literacy rates, and rates of tertiary (college) education. All the data was collected from the UNESCO Institute for Statistics and global database. I mostly cleaned up the data by selecting the columns I wanted to focus on. There were almost 30 columns, and I narrowed them down to fewer than 10 by focusing on literacy rates and birth rates. I also created new columns to explore relationships, such as the difference in literacy rates between genders.

This data set caught my eye because I believe that bringing problems to light increases awareness. Whether it is gender disparity or general shortcomings in education, it should all be brought to light.

Importing Data/Showing Data

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(dplyr)
library(ggplot2)
library(leaflet)
library(sf)

## Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE

setwd("/Users/Briancaceres/Desktop/Data_110")

education_dataset <- read.csv("projectdosdata - Sheet1.csv")
head(education_dataset)

##   Countries.and.areas Latitude Longitude OOSR_Pre0Primary_Age_Male
## 1         Afghanistan  34.5289   69.1725                         0
## 2             Albania  41.3275   19.8189                         4
## 3             Algeria  36.7525    3.0420                         0
## 4             Andorra  42.5078    1.5211                         0
## 5              Angola  -8.8368   13.2343                        31
## 6            Anguilla  18.2170  -63.0578                        14
##   OOSR_Pre0Primary_Age_Female OOSR_Primary_Age_Male OOSR_Primary_Age_Female
## 1                           0                     0                       0
## 2                           2                     6                       3
## 3                           0                     0                       0
## 4                           0                     0                       0
## 5                          39                     0                       0
## 6                           0                     0                       0
##   OOSR_Lower_Secondary_Age_Male OOSR_Lower_Secondary_Age_Female
## 1                             0                               0
## 2                             6                               1
## 3                             0                               0
## 4                             0                               0
## 5                             0                               0
## 6                             0                               0
##   OOSR_Upper_Secondary_Age_Male OOSR_Upper_Secondary_Age_Female
## 1                            44                              69
## 2                            21                              15
## 3                             0                               0
## 4                             0                               0
## 5                             0                               0
## 6                             0                               0
##   Completion_Rate_Primary_Male Completion_Rate_Primary_Female
## 1                           67                             40
## 2                           94                             96
## 3                           93                             93
## 4                            0                              0
## 5                           63                             57
## 6                            0                              0
##   Completion_Rate_Lower_Secondary_Male Completion_Rate_Lower_Secondary_Female
## 1                                   49                                     26
## 2                                   98                                     97
## 3                                   49                                     65
## 4                                    0                                      0
## 5                                   42                                     32
## 6                                    0                                      0
##   Completion_Rate_Upper_Secondary_Male Completion_Rate_Upper_Secondary_Female
## 1                                   32                                     14
## 2                                   76                                     80
## 3                                   22                                     37
## 4                                    0                                      0
## 5                                   24                                     15
## 6                                    0                                      0
##   Grade_2_3_Proficiency_Reading Grade_2_3_Proficiency_Math
## 1                            22                         25
## 2                             0                          0
## 3                             0                          0
## 4                             0                          0
## 5                             0                          0
## 6                             0                          0
##   Primary_End_Proficiency_Reading Primary_End_Proficiency_Math
## 1                              13                           11
## 2                               0                            0
## 3                               0                            0
## 4                               0                            0
## 5                               0                            0
## 6                               0                            0
##   Lower_Secondary_End_Proficiency_Reading Lower_Secondary_End_Proficiency_Math
## 1                                       0                                    0
## 2                                      48                                   58
## 3                                      21                                   19
## 4                                       0                                    0
## 5                                       0                                    0
## 6                                       0                                    0
##   Youth_15_24_Literacy_Rate_Male Youth_15_24_Literacy_Rate_Female Birth_Rate
## 1                             74                               56      32.49
## 2                             99                              100      11.78
## 3                             98                               97      24.28
## 4                              0                                0       7.20
## 5                              0                                0      40.73
## 6                              0                                0       0.00
##   Gross_Primary_Education_Enrollment Gross_Tertiary_Education_Enrollment
## 1                              104.0                                 9.7
## 2                              107.0                                55.0
## 3                              109.9                                51.4
## 4                              106.4                                 0.0
## 5                              113.5                                 9.3
## 6                                0.0                                 0.0
##   Unemployment_Rate
## 1             11.12
## 2             12.33
## 3             11.70
## 4              0.00
## 5              6.89
## 6              0.00

I want to make all column names lowercase to keep later code consistent. I also see that many countries have missing data recorded as 0, so I want to convert these 0s to NA.

names(education_dataset) <- tolower(names(education_dataset))

education_dataset[education_dataset == 0] <- NA

education_dataset |>
  head()

##   countries.and.areas latitude longitude oosr_pre0primary_age_male
## 1         Afghanistan  34.5289   69.1725                        NA
## 2             Albania  41.3275   19.8189                         4
## 3             Algeria  36.7525    3.0420                        NA
## 4             Andorra  42.5078    1.5211                        NA
## 5              Angola  -8.8368   13.2343                        31
## 6            Anguilla  18.2170  -63.0578                        14
##   oosr_pre0primary_age_female oosr_primary_age_male oosr_primary_age_female
## 1                          NA                    NA                      NA
## 2                           2                     6                       3
## 3                          NA                    NA                      NA
## 4                          NA                    NA                      NA
## 5                          39                    NA                      NA
## 6                          NA                    NA                      NA
##   oosr_lower_secondary_age_male oosr_lower_secondary_age_female
## 1                            NA                              NA
## 2                             6                               1
## 3                            NA                              NA
## 4                            NA                              NA
## 5                            NA                              NA
## 6                            NA                              NA
##   oosr_upper_secondary_age_male oosr_upper_secondary_age_female
## 1                            44                              69
## 2                            21                              15
## 3                            NA                              NA
## 4                            NA                              NA
## 5                            NA                              NA
## 6                            NA                              NA
##   completion_rate_primary_male completion_rate_primary_female
## 1                           67                             40
## 2                           94                             96
## 3                           93                             93
## 4                           NA                             NA
## 5                           63                             57
## 6                           NA                             NA
##   completion_rate_lower_secondary_male completion_rate_lower_secondary_female
## 1                                   49                                     26
## 2                                   98                                     97
## 3                                   49                                     65
## 4                                   NA                                     NA
## 5                                   42                                     32
## 6                                   NA                                     NA
##   completion_rate_upper_secondary_male completion_rate_upper_secondary_female
## 1                                   32                                     14
## 2                                   76                                     80
## 3                                   22                                     37
## 4                                   NA                                     NA
## 5                                   24                                     15
## 6                                   NA                                     NA
##   grade_2_3_proficiency_reading grade_2_3_proficiency_math
## 1                            22                         25
## 2                            NA                         NA
## 3                            NA                         NA
## 4                            NA                         NA
## 5                            NA                         NA
## 6                            NA                         NA
##   primary_end_proficiency_reading primary_end_proficiency_math
## 1                              13                           11
## 2                              NA                           NA
## 3                              NA                           NA
## 4                              NA                           NA
## 5                              NA                           NA
## 6                              NA                           NA
##   lower_secondary_end_proficiency_reading lower_secondary_end_proficiency_math
## 1                                      NA                                   NA
## 2                                      48                                   58
## 3                                      21                                   19
## 4                                      NA                                   NA
## 5                                      NA                                   NA
## 6                                      NA                                   NA
##   youth_15_24_literacy_rate_male youth_15_24_literacy_rate_female birth_rate
## 1                             74                               56      32.49
## 2                             99                              100      11.78
## 3                             98                               97      24.28
## 4                             NA                               NA       7.20
## 5                             NA                               NA      40.73
## 6                             NA                               NA         NA
##   gross_primary_education_enrollment gross_tertiary_education_enrollment
## 1                              104.0                                 9.7
## 2                              107.0                                55.0
## 3                              109.9                                51.4
## 4                              106.4                                  NA
## 5                              113.5                                 9.3
## 6                                 NA                                  NA
##   unemployment_rate
## 1             11.12
## 2             12.33
## 3             11.70
## 4                NA
## 5              6.89
## 6                NA
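One caveat with the blanket replacement above: `education_dataset == 0` also turns any genuine zero (for example, a true out-of-school rate of 0) into NA. Below is a minimal sketch of a narrower approach, assuming that only the two youth literacy columns use 0 as a missing-value code. The three-row `toy` data frame is a hypothetical stand-in built from values printed above, not the full dataset.

```r
library(dplyr)

# Hypothetical three-row stand-in for education_dataset
# (values copied from the head() output above)
toy <- data.frame(
  countries.and.areas = c("Afghanistan", "Albania", "Andorra"),
  oosr_primary_age_male = c(0, 6, 0),
  youth_15_24_literacy_rate_male = c(74, 99, 0),
  youth_15_24_literacy_rate_female = c(56, 100, 0)
)

# Convert 0 to NA only in the columns assumed to use 0 as "missing";
# every other column keeps its zeros
cleaned <- toy |>
  mutate(across(starts_with("youth_15_24_literacy_rate"), ~ na_if(.x, 0)))

cleaned
```

Which columns really use 0 for "no data" is a judgment call that would need to be checked against the UNESCO documentation for each indicator.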

I want to see if there is a correlation between a country's birth rate and various other variables. First I have to simplify the data, so I will average the male and female youth literacy rates into a single column called mean_literacy_rate.

total_education_data <- education_dataset |>
  group_by(countries.and.areas) |>
  mutate(mean_literacy_rate = (youth_15_24_literacy_rate_female + youth_15_24_literacy_rate_male) / 2)

total_education_data |>
  head()

## # A tibble: 6 × 30
## # Groups:   countries.and.areas [6]
##   countries.and.areas latitude longitude oosr_pre0primary_age_male
##   <chr>                  <dbl>     <dbl>                     <int>
## 1 Afghanistan            34.5      69.2                         NA
## 2 Albania                41.3      19.8                          4
## 3 Algeria                36.8       3.04                        NA
## 4 Andorra                42.5       1.52                        NA
## 5 Angola                 -8.84     13.2                         31
## 6 Anguilla               18.2     -63.1                         14
## # ℹ 26 more variables: oosr_pre0primary_age_female <int>,
## #   oosr_primary_age_male <int>, oosr_primary_age_female <int>,
## #   oosr_lower_secondary_age_male <int>, oosr_lower_secondary_age_female <int>,
## #   oosr_upper_secondary_age_male <int>, oosr_upper_secondary_age_female <int>,
## #   completion_rate_primary_male <int>, completion_rate_primary_female <int>,
## #   completion_rate_lower_secondary_male <int>,
## #   completion_rate_lower_secondary_female <int>, …
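To make the averaging step concrete, here is a small sketch on a hypothetical four-row `toy` data frame, with values taken from the output above. Because `+` and `/` are vectorised, the calculation already works row by row, so with one row per country it gives the same values without `group_by()`. Note that this is an unweighted mean of the two rates, and a row with a missing rate yields NA.

```r
library(dplyr)

# Hypothetical stand-in for education_dataset after the 0 -> NA step
toy <- data.frame(
  countries.and.areas = c("Afghanistan", "Albania", "Algeria", "Andorra"),
  youth_15_24_literacy_rate_male = c(74, 99, 98, NA),
  youth_15_24_literacy_rate_female = c(56, 100, 97, NA)
)

# Unweighted mean of the female and male youth literacy rates
toy_means <- toy |>
  mutate(mean_literacy_rate =
           (youth_15_24_literacy_rate_female + youth_15_24_literacy_rate_male) / 2)

toy_means$mean_literacy_rate
# 65.0 99.5 97.5 NA
```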

Scatterplot:

ggplot(total_education_data, aes(x=birth_rate, y = mean_literacy_rate)) +
  labs(
    x = "Birth Rate per 1000",
    y = "Youth Literacy Rate", 
    caption = "Youth defined as ages between 15-24 years old", 
    title = "Comparing a Country's Birthrate to their Literacy Rate")+
  geom_point()+
  geom_smooth(method = "lm",
              se = FALSE)+
  theme_minimal()

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 125 rows containing non-finite values (`stat_smooth()`).

## Warning: Removed 125 rows containing missing values (`geom_point()`).

I can see a clear negative correlation between literacy rate and birth rate: the higher a country's birth rate, the lower its youth literacy rate tends to be. This may be an indicator of a lack of access to education.
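The visual impression from the scatterplot can be backed up with a number. The sketch below computes the Pearson correlation and the slope of a linear fit on a hypothetical four-row `toy` data frame whose values come from the output above. With so few rows it only illustrates the calls; the same two lines could be run on `total_education_data` to quantify the trend in the full dataset.

```r
# Hypothetical stand-in for total_education_data
toy <- data.frame(
  countries.and.areas = c("Afghanistan", "Albania", "Algeria", "Andorra"),
  birth_rate = c(32.49, 11.78, 24.28, 7.20),
  mean_literacy_rate = c(65, 99.5, 97.5, NA)
)

# Pearson correlation, using only rows where both values are present
r <- cor(toy$birth_rate, toy$mean_literacy_rate, use = "complete.obs")

# Slope of the same straight line that geom_smooth(method = "lm") draws;
# lm() drops rows with NA by default
fit <- lm(mean_literacy_rate ~ birth_rate, data = toy)
slope <- unname(coef(fit)["birth_rate"])

r
slope
```

A negative `r` and a negative slope both mean that literacy falls as birth rate rises; neither says anything about cause.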

I also want to explore how literacy rates compare between the sexes. I will do that with a bar graph below.

First I created a new data set to keep things organized. I selected a handful of columns after creating a new column using mutate. The new column “literacy_difference” is the female literacy rate minus the male literacy rate, so positive values mean the female rate is higher.

bar_plot <- total_education_data |>
  group_by(countries.and.areas) |>
  mutate(literacy_difference =
           (youth_15_24_literacy_rate_female - 
              youth_15_24_literacy_rate_male)) |> #this is how I calculate the difference in literacy rate between gender and store it in a new column
  select(
    countries.and.areas,
    literacy_difference, 
    longitude, 
    latitude, 
    mean_literacy_rate, 
    birth_rate
    ) |>
  filter(literacy_difference != 0) |>
  mutate(pos = literacy_difference >= 0)|>
  na.omit() |>
  arrange(desc(literacy_difference))

bar_plot |>
  head()

## # A tibble: 6 × 7
## # Groups:   countries.and.areas [6]
##   countries.and.areas literacy_difference longitude latitude mean_literacy_rate
##   <chr>                             <int>     <dbl>    <dbl>              <dbl>
## 1 Rwanda                                5     26.1     44.4                86.5
## 2 Gabon                                 3      2.35    48.9                89.5
## 3 Honduras                              3    -58.2      6.80               96.5
## 4 East Timor                            3     36.3     33.5                83.5
## 5 Bangladesh                            2     90.4     23.7                95  
## 6 Namibia                               2     -6.83    34.0                95  
## # ℹ 2 more variables: birth_rate <dbl>, pos <lgl>
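As a quick check of the sign convention, the sketch below repeats the difference calculation on a hypothetical four-row `toy` data frame, with values from the earlier output. A positive `literacy_difference` means the female rate is higher, and `pos` records that sign for the bar colours.

```r
library(dplyr)

# Hypothetical stand-in for total_education_data
toy <- data.frame(
  countries.and.areas = c("Afghanistan", "Albania", "Algeria", "Andorra"),
  youth_15_24_literacy_rate_male = c(74, 99, 98, NA),
  youth_15_24_literacy_rate_female = c(56, 100, 97, NA)
)

toy_diff <- toy |>
  mutate(
    # female minus male: positive means the female rate is higher
    literacy_difference =
      youth_15_24_literacy_rate_female - youth_15_24_literacy_rate_male,
    pos = literacy_difference >= 0
  ) |>
  filter(!is.na(literacy_difference)) |>  # drop countries with no data
  arrange(desc(literacy_difference))

toy_diff[, c("countries.and.areas", "literacy_difference", "pos")]
```

Afghanistan's value of -18 matches the figure shown for it later in the document.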

ggplot(bar_plot, aes(x = reorder(countries.and.areas, -literacy_difference), y = literacy_difference, fill = pos)) +
  geom_col(show.legend = FALSE) +  # geom_col() always plots the values as given, so no stat argument is needed
  labs(
    x = "Country", 
    y = "Literacy Rate Difference",
    title = "Youth Female Literacy Rate - Youth Male Literacy Rate", 
    caption = "Youth defined as age group between 15-24"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, size = 10))

Looking at the graphs above, I want to focus on the literacy disparity between the sexes in each country. I will also try to incorporate birth rate into my final visualization.

## Attempting to plot onto world map

I again created a new data set to keep things organized. I only took the data columns that I wanted to show in my final visualization.

finalviz <- total_education_data |>
  group_by(countries.and.areas) |>
  mutate(literacy_difference =
           (youth_15_24_literacy_rate_female - 
              youth_15_24_literacy_rate_male)) |> 
  select(
    countries.and.areas,
    literacy_difference, 
    longitude, 
    latitude, 
    mean_literacy_rate, 
    birth_rate
    ) 

finalviz |>
  head()

## # A tibble: 6 × 6
## # Groups:   countries.and.areas [6]
##   countries.and.areas literacy_difference longitude latitude mean_literacy_rate
##   <chr>                             <int>     <dbl>    <dbl>              <dbl>
## 1 Afghanistan                         -18     69.2     34.5                65  
## 2 Albania                               1     19.8     41.3                99.5
## 3 Algeria                              -1      3.04    36.8                97.5
## 4 Andorra                              NA      1.52    42.5                NA  
## 5 Angola                               NA     13.2     -8.84               NA  
## 6 Anguilla                             NA    -63.1     18.2                NA  
## # ℹ 1 more variable: birth_rate <dbl>

Creating a new data set for map interactivity:

literacy <- finalviz 

literacy$longitude <- as.numeric(literacy$longitude)
literacy$latitude <- as.numeric(literacy$latitude)

literacy |>
  head()

## # A tibble: 6 × 6
## # Groups:   countries.and.areas [6]
##   countries.and.areas literacy_difference longitude latitude mean_literacy_rate
##   <chr>                             <int>     <dbl>    <dbl>              <dbl>
## 1 Afghanistan                         -18     69.2     34.5                65  
## 2 Albania                               1     19.8     41.3                99.5
## 3 Algeria                              -1      3.04    36.8                97.5
## 4 Andorra                              NA      1.52    42.5                NA  
## 5 Angola                               NA     13.2     -8.84               NA  
## 6 Anguilla                             NA    -63.1     18.2                NA  
## # ℹ 1 more variable: birth_rate <dbl>

I use the paste0 function to build the popup text that appears when a point on the map is clicked. I then use the leaflet package to map the data points from the existing latitude and longitude columns.

# Build the popup text from data frames that share finalviz's row order,
# so each label lines up with the circle it is attached to
labels <- paste0(
  "Birth Rate: ", finalviz$birth_rate, "<br>",
  "Average Literacy Rate: ", finalviz$mean_literacy_rate, "<br>",
  "Female Literacy Rate: ", total_education_data$youth_15_24_literacy_rate_female, "<br>",
  "Male Literacy Rate: ", total_education_data$youth_15_24_literacy_rate_male, "<br>"
)

literacy <- leaflet() |>
  setView(lng = 0, lat = 0, zoom = 1.5) |>
  addProviderTiles("Esri.WorldStreetMap") |>
  addCircles(
    data = finalviz,
    radius = finalviz$birth_rate * 3000,  # circle size scales with birth rate
    color = "brown",
    popup = labels
  )

## Assuming "longitude" and "latitude" are longitude and latitude, respectively

literacy
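One thing to watch when building popup text: `paste0()` recycles shorter vectors, so labels only line up with the map points when every vector comes from the same rows in the same order. A minimal sketch of keeping the label and radius next to the coordinates they describe, using a hypothetical three-row `map_data` data frame (values from the earlier output):

```r
# Hypothetical stand-in for the data passed to addCircles()
map_data <- data.frame(
  countries.and.areas = c("Afghanistan", "Albania", "Algeria"),
  birth_rate = c(32.49, 11.78, 24.28),
  mean_literacy_rate = c(65, 99.5, 97.5)
)

# Label and radius are stored as columns, so they cannot drift out of
# step with the rows they belong to
map_data$label <- paste0(
  "Birth Rate: ", map_data$birth_rate, "<br>",
  "Average Literacy Rate: ", map_data$mean_literacy_rate
)
map_data$radius <- map_data$birth_rate * 3000

# Recycling in action: three country names pasted with only two values
# still returns three strings, and the third one reuses the first value
mismatched <- paste0(map_data$countries.and.areas, ": ", c(32.49, 11.78))

map_data$label[1]
mismatched[3]
```

Here `mismatched[3]` pairs Algeria with Afghanistan's birth rate, which is the kind of mislabelled popup that mixing data frames of different lengths can produce.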

## Final Essay

Sources: https://unstats.un.org/sdgs/report/2019/goal-04/

UNstats provided some background on possible reasons why some countries have lower literacy rates. The article shows that the sub-Saharan Africa region has the lowest percentage of trained teachers in pre-primary school (48%). My data visualization suggests that this correlates with the region's low literacy rates.

The final visualization shows a few things. First, it shows how regions and countries vary from each other in literacy rates and birth rates. It is powerful to see this on a map, as one can start making educated guesses as to whether the differences in the statistics are related to geography.

Brian Caceres

2023-11-15