# Required packages for our course. Do not delete.
library(tidyverse)
library(mosaic)

Research Question

How have literacy rates among young women in Liberia evolved over time, and what trends can be observed in gender parity in education during the same period?


Data

Description of data

The dataset focuses on gender equality metrics in Liberia, with indicators such as the proportion of firms with female top managers, the proportion of firms with female participation in ownership, and the youth literacy rate for females aged 15-24. It also includes the gender parity index (GPI) for youth literacy, which compares literacy rates between young women and young men. The dataset is sourced from the Humanitarian Data Exchange (HDX), an open platform for sharing data across crises and organizations, whose aim is to make humanitarian data easy to find and use for analysis (https://data.humdata.org/faq).

Load data into R

gender_lbr <- read.csv("gender_lbr.csv")
head(gender_lbr)
Country.Name   Country.ISO3   Year        Indicator.Name                                             Indicator.Code        Value
#country+name  #country+code  #date+year  #indicator+name                                            #indicator+code       #indicator+value+num
Liberia        LBR            2017        Firms with female top manager (% of firms)                 IC.FRM.FEMM.ZS        20.4
Liberia        LBR            2009        Firms with female top manager (% of firms)                 IC.FRM.FEMM.ZS        29.9
Liberia        LBR            2017        Firms with female participation in ownership (% of firms)  IC.FRM.FEMO.ZS        37.4
Liberia        LBR            2009        Firms with female participation in ownership (% of firms)  IC.FRM.FEMO.ZS        53
Liberia        LBR            2019        Literacy rate, youth female (% of females ages 15-24)      SE.ADT.1524.LT.FE.ZS  71.8499984741211
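The first row returned by head() is not data: it is a row of HXL hashtags (#country+name, #date+year, and so on) that HDX files carry under the header, which is why summary() below reports every column as character. A minimal base-R sketch of dropping that row at load time, assuming the tag row is the only one whose Country.Name starts with "#". An inline excerpt of the rows shown above stands in for gender_lbr.csv so the example runs on its own:

```r
# Inline excerpt of gender_lbr.csv (header, HXL tag row, three data rows)
csv_text <- 'Country.Name,Country.ISO3,Year,Indicator.Name,Indicator.Code,Value
#country+name,#country+code,#date+year,#indicator+name,#indicator+code,#indicator+value+num
Liberia,LBR,2017,Firms with female top manager (% of firms),IC.FRM.FEMM.ZS,20.4
Liberia,LBR,2009,Firms with female top manager (% of firms),IC.FRM.FEMM.ZS,29.9
Liberia,LBR,2019,"Literacy rate, youth female (% of females ages 15-24)",SE.ADT.1524.LT.FE.ZS,71.8499984741211'

# Read as text first; the tag row forces every column to character anyway
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Keep only rows whose Country.Name is not an HXL tag (tags start with "#")
gender_clean <- raw[!startsWith(raw$Country.Name, "#"), ]

# Let R re-guess each column's type now that the tag row is gone
gender_clean <- type.convert(gender_clean, as.is = TRUE)

str(gender_clean)
```

With the real file, replacing `text = csv_text` with `"gender_lbr.csv"` should give Year and Value as numeric columns directly, making the later as.numeric() conversion optional.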

Variables

names(gender_lbr)
## [1] "Country.Name"   "Country.ISO3"   "Year"           "Indicator.Name"
## [5] "Indicator.Code" "Value"

The variables I used in my infographic design are:

  1. Year
  2. Indicator.Name
  3. Value

Data Analysis

Summary Statistics

# Inspect the data
summary(gender_lbr)
##  Country.Name       Country.ISO3           Year           Indicator.Name    
##  Length:5644        Length:5644        Length:5644        Length:5644       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##  Indicator.Code        Value          
##  Length:5644        Length:5644       
##  Class :character   Class :character  
##  Mode  :character   Mode  :character
# Filter the data for literacy rates and gender parity index
literacy_data <- subset(gender_lbr, Indicator.Name == "Literacy rate, youth female (% of females ages 15-24)")
gender_parity_data <- subset(gender_lbr, Indicator.Name == "Literacy rate, youth (ages 15-24), gender parity index (GPI)")

# Convert Year and Value to numeric (the HXL tag row made read.csv() read every column as character)
literacy_data$Year <- as.numeric(literacy_data$Year)
literacy_data$Value <- as.numeric(gsub(",", "", literacy_data$Value)) 

gender_parity_data$Year <- as.numeric(gender_parity_data$Year)
gender_parity_data$Value <- as.numeric(gsub(",", "", gender_parity_data$Value)) 

# Check that the numeric conversion produced no NA values
sum(is.na(literacy_data$Year))
## [1] 0
sum(is.na(literacy_data$Value))
## [1] 0
sum(is.na(gender_parity_data$Year))
## [1] 0
sum(is.na(gender_parity_data$Value))
## [1] 0
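The conversion step strips commas before calling as.numeric() so that values written with thousands separators still parse, and the NA checks confirm that nothing failed to convert. A small illustration on a few illustrative strings (not a sample of the dataset) shows what the checks would catch: a leftover HXL tag becomes NA instead of a number.

```r
# Illustrative raw strings, as read.csv() would store them in a character column
raw_values <- c("71.8499984741211", "1,234.5", "#indicator+value+num")

# Same two-step conversion used above: drop commas, then parse as numbers.
# as.numeric() warns when a string cannot be parsed; the warning is suppressed
# here because the NA count below is the check being demonstrated.
converted <- suppressWarnings(as.numeric(gsub(",", "", raw_values)))

print(converted)
# The tag string could not be parsed, so exactly one NA appears
sum(is.na(converted))
```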

Summary Statistics Using the favstats() Function

# Per-year statistics for each indicator; with one observation per year, sd is NA
statistics1 <- favstats(Value ~ Year, data = literacy_data)
statistics2 <- favstats(Value ~ Year, data = gender_parity_data)
head(statistics1)
Year      min       Q1   median       Q3      max     mean sd n missing
1984 33.72456 33.72456 33.72456 33.72456 33.72456 33.72456 NA 1       0
2007 37.17031 37.17031 37.17031 37.17031 37.17031 37.17031 NA 1       0
2013 63.20000 63.20000 63.20000 63.20000 63.20000 63.20000 NA 1       0
2017 45.63871 45.63871 45.63871 45.63871 45.63871 45.63871 NA 1       0
2019 71.85000 71.85000 71.85000 71.85000 71.85000 71.85000 NA 1       0
head(statistics2)
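Because each year has a single observation, grouping by Year makes min, Q1, median, Q3, max, and mean identical and leaves sd undefined (NA). An ungrouped summary across all years is more informative; in mosaic that would be `favstats(~ Value, data = literacy_data)`. The sketch below computes comparable statistics in base R from the five literacy values printed above, so it runs without the CSV:

```r
# Female youth literacy rates (%) by year, copied from the favstats() output above
literacy <- data.frame(
  Year  = c(1984, 2007, 2013, 2017, 2019),
  Value = c(33.72456, 37.17031, 63.20000, 45.63871, 71.85000)
)

# Ungrouped statistics across all five observations
overall <- c(
  min    = min(literacy$Value),
  median = median(literacy$Value),
  max    = max(literacy$Value),
  mean   = mean(literacy$Value),
  sd     = sd(literacy$Value),
  n      = nrow(literacy)
)
round(overall, 2)
```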
ggplot(data = literacy_data, aes(x = Year, y = Value)) +
    geom_line(color = "red") + 
    theme_minimal() +
    labs(title = "Trend of Female Literacy Rates in Liberia (Ages 15-24)", x = "Year", y = "Literacy Rate (%)")

ggplot(data = gender_parity_data, aes(x = Year, y = Value)) +
    geom_line(color = "blue") + 
    theme_minimal() +
    labs(title = "Trend of Gender Parity Index in Liberia (Ages 15-24)", x = "Year", y = "Gender Parity Index (female/male ratio)")
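With only five observations per series, a plain line can hide where the measured values actually are. One option, sketched here with the literacy values from the favstats() output and the ggplot2 package (part of tidyverse), is to add geom_point() so each observed year is marked:

```r
library(ggplot2)

# Literacy values copied from the favstats() output above
literacy <- data.frame(
  Year  = c(1984, 2007, 2013, 2017, 2019),
  Value = c(33.72456, 37.17031, 63.20000, 45.63871, 71.85000)
)

p <- ggplot(data = literacy, aes(x = Year, y = Value)) +
  geom_line(color = "red") +
  geom_point(color = "red", size = 2) +  # mark each year that has an observation
  theme_minimal() +
  labs(title = "Trend of Female Literacy Rates in Liberia (Ages 15-24)",
       x = "Year", y = "Literacy Rate (%)")
print(p)
```

The same geom_point() layer could be added to the gender parity plot.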

The red line graph shows an overall upward trend in the literacy rate of young Liberian women (ages 15-24), with fluctuations along the way. The rate rose modestly from 33.7% in 1984 to 37.2% in 2007, jumped to 63.2% in 2013, fell to 45.6% in 2017, and then reached the highest value in the series, about 71.8%, in 2019. Because the series has only five observations spread over 35 years, the line connects widely spaced points and does not show what happened between them. The pattern may reflect ongoing efforts, with varying success, to improve educational access and quality for young women.
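The rise, decline, and recovery described above can be quantified from the five values in the favstats() output. A minimal base-R sketch using diff(): the change between consecutive observations in percentage points, and that change divided by the number of years between them, since the gaps range from 2 to 23 years:

```r
# Female youth literacy rates (%) from the favstats() output above
year <- c(1984, 2007, 2013, 2017, 2019)
rate <- c(33.72456, 37.17031, 63.20000, 45.63871, 71.85000)

changes <- data.frame(
  from        = head(year, -1),
  to          = tail(year, -1),
  change_pp   = diff(rate),               # percentage-point change between observations
  pp_per_year = diff(rate) / diff(year)   # average change per year within each gap
)
print(changes, digits = 3)
```

Only the 2013 to 2017 interval is negative, and the steepest average yearly change is the recovery between 2017 and 2019.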

The blue line graph tracks the gender parity index (GPI) for youth literacy, the ratio of the female literacy rate to the male literacy rate: a value of 1 indicates parity, and values below 1 mean young women are less literate than young men. The trend generally moves towards greater parity, indicating progress towards equal educational outcomes for both genders. However, the progress is gradual and includes setbacks, highlighting how difficult it has been to advance gender parity in education in Liberia consistently.
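Since the GPI is the female literacy rate divided by the male literacy rate, it can be sketched directly. The helpers gpi() and parity_label() below are hypothetical, and so are the rates (male literacy rates are not among the variables used here). The 0.97 to 1.03 band used to label parity is a convention commonly applied by UNESCO and is included as an assumption:

```r
# Hypothetical helper: GPI = female rate / male rate (both in %)
gpi <- function(female_rate, male_rate) female_rate / male_rate

# Hypothetical helper: label a GPI value, assuming the common 0.97-1.03 parity band
parity_label <- function(index) {
  ifelse(index < 0.97, "favours males",
         ifelse(index > 1.03, "favours females", "parity"))
}

# Hypothetical rates for illustration only, not values from the dataset
female <- c(60, 72, 81)
male   <- c(80, 75, 80)
index  <- gpi(female, male)
data.frame(female, male, index = round(index, 3), label = parity_label(index))
```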

Infographic

I entered the year and value data for both the female literacy rate and the gender parity index into a Google Sheet and downloaded it as a CSV file. I then found a line graph template in Canva and uploaded the CSV file, which populated the line graph directly. Next, I added the necessary text elements, such as the title, the legends, and a brief explanation of the findings. I also included illustrations: a girl walking to school, colored pink, for the female literacy graph, and a boy and a girl standing side by side to represent equality in the gender parity graph.
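Because Canva accepted a CSV upload, an alternative to retyping the values in a Google Sheet is to write them straight from R with write.csv(), which avoids transcription errors. A minimal sketch, using the literacy values from the favstats() output; the file name and the column name Female_Literacy are hypothetical, and the file goes to a temporary directory:

```r
# Literacy values from the favstats() output above; with the real data,
# literacy_data[, c("Year", "Value")] could be exported instead
literacy <- data.frame(
  Year            = c(1984, 2007, 2013, 2017, 2019),
  Female_Literacy = c(33.72456, 37.17031, 63.20000, 45.63871, 71.85000)
)

# Hypothetical output path; row.names = FALSE keeps R's row numbers out of the file
out_file <- file.path(tempdir(), "liberia_female_literacy.csv")
write.csv(literacy, out_file, row.names = FALSE)

# Show the raw lines that would be uploaded
readLines(out_file)
```

For a two-series chart, the literacy and gender parity data frames could first be combined into one table with `merge(..., by = "Year", all = TRUE)`.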

Blue Modern Line Chart Graph by Etta Brooks

(c)

Provide a brief description of your data visualization process.

Data Preparation:

Sourced and loaded data from the Humanitarian Data Exchange into R, focusing on gender equality metrics specific to Liberia.

Data Processing in R:

Conducted data cleaning, transformation, and statistical analysis using R packages such as tidyverse and mosaic.

Visualization:

Used ggplot2 in R for the initial data visualization, then transferred the key values to Canva (through a Google Sheet exported as CSV) for an enhanced visual representation.

Infographic Enhancement:

Integrated the data into a Canva line graph template, enriching the infographic with meaningful visuals and text annotations to ensure clarity and impact.

Narrative and Presentation:

Embedded the final infographic in the R Markdown document, ensuring a cohesive narrative and visual flow.


References

The creation of the infographic involved the use of various sources for data, graphics, and analysis techniques. Below are the references used in this project:

  1. Data Source: Humanitarian Data Exchange (HDX), gender equality indicators for Liberia. Available at data.humdata.org.

  2. R Packages:

    • tidyverse for data manipulation and visualization.
    • mosaic for statistical analysis.
  3. Visualization Tool:

    • Canva, for enhancing data visualizations and creating infographics.
  4. AI Assistance:

    • OpenAI’s ChatGPT was consulted for guidance on visualization techniques, as well as for proofreading and grammar suggestions.

