Female Higher Education Rate

Introduction

Increasing the attainment of higher education is a goal of many governments seeking to improve their countries' economies and development. In particular, sociologists have been interested in the rate of female higher education as an indicator of a nation's gender equality and its downstream impacts on population, productivity and various other societal issues. As such, in the following report, I will be looking into the rate of female higher education attainment across Asia and the Pacific.

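The data is downloaded from the World Bank website and includes countries in East Asia & Pacific as well as South Asian countries such as Afghanistan and India, as the first rows of the data show. The statistics shown are the cumulative percentage of all females aged 25 and above who have attained at least a Bachelor's degree or equivalent.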

#import libraries and data
library(readr)
library(ggplot2)
library(tidyr)
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ purrr     1.0.2
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(dplyr)
edu<-read.csv('Female Education Data.csv',check.names = FALSE)
#view the data
(head(edu, 10))

##                                                                                           Series Name
## 1  Educational attainment, at least Bachelor's or equivalent, population 25+, female (%) (cumulative)
## 2  Educational attainment, at least Bachelor's or equivalent, population 25+, female (%) (cumulative)
## 3  Educational attainment, at least Bachelor's or equivalent, population 25+, female (%) (cumulative)
## 4  Educational attainment, at least Bachelor's or equivalent, population 25+, female (%) (cumulative)
## 5  Educational attainment, at least Bachelor's or equivalent, population 25+, female (%) (cumulative)
## 6  Educational attainment, at least Bachelor's or equivalent, population 25+, female (%) (cumulative)
## 7  Educational attainment, at least Bachelor's or equivalent, population 25+, female (%) (cumulative)
## 8  Educational attainment, at least Bachelor's or equivalent, population 25+, female (%) (cumulative)
## 9  Educational attainment, at least Bachelor's or equivalent, population 25+, female (%) (cumulative)
## 10 Educational attainment, at least Bachelor's or equivalent, population 25+, female (%) (cumulative)
##             Series Code   Country Name Country Code    2004 [YR2004]
## 1  SE.TER.CUAT.BA.FE.ZS    Afghanistan          AFG               ..
## 2  SE.TER.CUAT.BA.FE.ZS     Bangladesh          BGD               ..
## 3  SE.TER.CUAT.BA.FE.ZS         Bhutan          BTN               ..
## 4  SE.TER.CUAT.BA.FE.ZS          India          IND               ..
## 5  SE.TER.CUAT.BA.FE.ZS       Maldives          MDV               ..
## 6  SE.TER.CUAT.BA.FE.ZS          Nepal          NPL               ..
## 7  SE.TER.CUAT.BA.FE.ZS       Pakistan          PAK               ..
## 8  SE.TER.CUAT.BA.FE.ZS      Sri Lanka          LKA               ..
## 9  SE.TER.CUAT.BA.FE.ZS American Samoa          ASM               ..
## 10 SE.TER.CUAT.BA.FE.ZS      Australia          AUS 21.0028247833252
##       2005 [YR2005]    2006 [YR2006]    2007 [YR2007]     2008 [YR2008]
## 1                ..               ..               ..                ..
## 2                ..               ..               ..                ..
## 3                ..               ..               ..                ..
## 4                ..               ..               ..                ..
## 5                ..               ..               ..                ..
## 6                ..               ..               .. 0.910622537136078
## 7  2.10417008399963 3.52049994468689 3.23870253562927  3.65621995925903
## 8                ..               ..               ..                ..
## 9                ..               ..               ..                ..
## 10 21.5452346801758 22.6497402191162 24.3102226257324   25.090461730957
##       2009 [YR2009]     2010 [YR2010]     2011 [YR2011]    2012 [YR2012]
## 1                ..                ..                ..               ..
## 2                ..                ..  3.05580997467041 3.20914006233215
## 3                ..                ..                .. 2.56905007362366
## 4                ..                ..  6.72489976882935               ..
## 5                ..                ..                ..               ..
## 6                ..                ..  2.23696994781494               ..
## 7  4.50910997390747  4.73928022384644  4.57596015930176 4.91368007659912
## 8                .. 0.383627027273178 0.365902453660965               ..
## 9                ..                ..                ..               ..
## 10               ..                ..                ..               ..
##        2013 [YR2013]     2014 [YR2014]     2015 [YR2015]     2016 [YR2016]
## 1                 .. 0.506414830684662                ..                ..
## 2   3.11523008346558  3.76830005645752  5.30487012863159  5.45023012161255
## 3                 ..  15.0053186416626  15.3042860031128                ..
## 4                 ..                ..                ..  7.28427982330322
## 5                 ..  1.04722499847412                ..  9.00475788116455
## 6                 ..                ..                ..                ..
## 7   5.22017002105713  6.08884000778198  6.03219413757324  16.7618560791016
## 8  0.660946011543274 0.573716342449188 0.781386315822601 0.715153634548187
## 9                 ..                ..                ..                ..
## 10  28.9809894561768  30.3131504058838  31.8225193023682  32.1725997924805
##        2017 [YR2017]     2018 [YR2018]    2019 [YR2019]    2020 [YR2020]
## 1   1.93327450752258                ..               .. 1.43156433105469
## 2   5.91620016098022  6.35262012481689  6.6717700958252               ..
## 3   7.38642978668213   4.5947847366333 5.99403810501099 6.76300430297852
## 4                 ..  7.78669023513794 9.24438953399658 9.88014030456543
## 5   5.71459007263184                .. 9.80011940002441               ..
## 6   3.51821613311768                ..               ..               ..
## 7   6.16782999038696  2.65594005584717 3.66774988174438               ..
## 8  0.592809021472931 0.615735113620758   0.641581594944 0.62948739528656
## 9                 ..                ..               ..               ..
## 10  34.3086013793945   34.782398223877 35.4619789123535 38.2943000793457
##       2021 [YR2021]    2022 [YR2022] 2023 [YR2023]
## 1  1.15299999713898 1.53609001636505            ..
## 2  6.92283010482788               ..            ..
## 3  17.8103160858154 18.5730381011963            ..
## 4                ..  9.7648401260376            ..
## 5                ..               ..            ..
## 6                ..               ..            ..
## 7  7.08136034011841               ..            ..
## 8                ..               ..            ..
## 9                ..               ..            ..
## 10 38.8049201965332 40.0237808227539            ..
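The `..` entries in the output above appear to mark missing values, which is why every year column is read as text. A minimal sketch of one alternative, assuming the same placeholder is used throughout the file: pass `na.strings = ".."` to `read.csv` so that the year columns arrive as numeric with `NA` for the gaps. The `csv_text` and `toy` names and all values below are made up for illustration.

```r
# Toy CSV text standing in for 'Female Education Data.csv' (values are made up)
csv_text <- 'Country Name,2021 [YR2021],2022 [YR2022]
Alpha,10.5,..
Beta,..,20.25
Gamma,..,..'

# na.strings = ".." converts the placeholder to NA while reading,
# so the year columns are parsed as numeric instead of character
toy <- read.csv(text = csv_text, check.names = FALSE, na.strings = "..")

str(toy)
colSums(is.na(toy))  # number of missing values in each column
```

With this approach the later `as.numeric()` conversion would no longer be needed, although rows with missing values would still have to be dropped or handled before summarising.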

Prepare the Data

#create a subset of the columns that are useful
edu<-edu[,c(3,5:23)]

#reshape the data in the long form
tidy_edu <- edu %>%
  gather('2004 [YR2004]':'2022 [YR2022]', key = "year", value = "edu_rate")

#trim the name of the years
tidy_edu[,2]<-substr(tidy_edu[,2],start=1,stop=4)

#Delete any rows with blanks/not useful data 
clean_edu <- tidy_edu[!apply(tidy_edu == ""|tidy_edu=="..", 1, any), ]   

#change education rate data to numeric data
clean_edu$edu_rate<-as.numeric(clean_edu$edu_rate)
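The tidyr documentation describes `gather()` as superseded by `pivot_longer()`. The sketch below shows how the same reshape, year trimming, and cleaning steps might look with `pivot_longer()`. It runs on a small made-up table (`toy_wide`, a hypothetical stand-in for `edu`), not on the report's data, and it only handles the `..` placeholder, not empty strings.

```r
library(tidyr)
library(dplyr)

# Toy wide table mimicking the structure of edu (values are made up)
toy_wide <- data.frame(
  `Country Name`  = c("Alpha", "Beta"),
  `2021 [YR2021]` = c("10.5", ".."),
  `2022 [YR2022]` = c("..", "20.25"),
  check.names = FALSE
)

toy_long <- toy_wide %>%
  # every column except the country name becomes a (year, edu_rate) pair
  pivot_longer(-`Country Name`, names_to = "year", values_to = "edu_rate") %>%
  mutate(
    year     = substr(year, 1, 4),                 # keep only the four-digit year
    edu_rate = as.numeric(na_if(edu_rate, ".."))   # ".." becomes NA, then numeric
  ) %>%
  filter(!is.na(edu_rate))                         # drop missing observations

toy_long
```

Selecting the year columns by excluding `Country Name` avoids hard-coding the first and last year labels, which may help if the download range changes.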

What is the average Female Higher Education Rate in East Asia & Pacific over the years?

clean_edu %>% 
  group_by(year) %>% #Use group-by to group the data by year
  summarise(mean = mean(edu_rate)) #Find the mean of education rate in the region by year

## # A tibble: 19 × 2
##    year   mean
##    <chr> <dbl>
##  1 2004  17.4 
##  2 2005  14.1 
##  3 2006  13.7 
##  4 2007  15.9 
##  5 2008  14.6 
##  6 2009  16.3 
##  7 2010  11.9 
##  8 2011   9.14
##  9 2012  10.4 
## 10 2013  15.7 
## 11 2014  14.2 
## 12 2015  15.9 
## 13 2016  15.7 
## 14 2017  13.5 
## 15 2018  14.7 
## 16 2019  14.7 
## 17 2020  16.2 
## 18 2021  15.8 
## 19 2022  19.4

From the table above, the regional mean rises towards the end of the period. However, a time-series analysis is unlikely to be reliable: each yearly mean is taken over whichever countries reported data in that year, and less developed countries have few observations in the earlier years. We can verify this by counting the number of observations by country below.

clean_edu %>% count(`Country Name`) #Use count by to check how many years of data we have for each country

##                 Country Name  n
## 1                Afghanistan  5
## 2                  Australia 15
## 3                 Bangladesh 10
## 4                     Bhutan  9
## 5          Brunei Darussalam  8
## 6                   Cambodia  8
## 7                      China  2
## 8                       Fiji  3
## 9                      India  6
## 10                 Indonesia 12
## 11                     Japan  2
## 12                  Kiribati  4
## 13 Korea, Dem. People's Rep. 19
## 14               Korea, Rep. 19
## 15                   Lao PDR  3
## 16          Macao SAR, China  1
## 17                  Malaysia  7
## 18                  Maldives  4
## 19          Marshall Islands  3
## 20     Micronesia, Fed. Sts.  1
## 21                  Mongolia 12
## 22                   Myanmar  4
## 23                     Nauru  1
## 24                     Nepal  3
## 25             New Caledonia  1
## 26               New Zealand  7
## 27                  Pakistan 16
## 28                     Palau  2
## 29          Papua New Guinea  1
## 30               Philippines  7
## 31                     Samoa  1
## 32                 Singapore 17
## 33           Solomon Islands  1
## 34                 Sri Lanka 10
## 35                  Thailand  5
## 36               Timor-Leste  2
## 37                     Tonga  3
## 38                    Tuvalu  3
## 39                   Vanuatu  4
## 40                  Viet Nam  4

From this table, it can be noted that most countries do not have a complete set of observations: only two of the 40 countries have data for all 19 years. As such, the remainder of this report will focus on findings in 2022.
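As a complementary check, one could count how many countries contribute to each yearly mean. The sketch below uses a made-up table (`toy_edu`, a hypothetical stand-in for `clean_edu`) to illustrate why a changing set of countries can distort the trend.

```r
library(dplyr)

# Toy long-format table shaped like clean_edu (values are made up)
toy_edu <- data.frame(
  `Country Name` = c("Alpha", "Beta", "Alpha", "Beta", "Gamma"),
  year           = c("2021", "2021", "2022", "2022", "2022"),
  edu_rate       = c(30, 10, 32, 12, 4),
  check.names = FALSE
)

yearly <- toy_edu %>%
  group_by(year) %>%
  summarise(
    n_countries = n_distinct(`Country Name`),  # how many countries report that year
    mean_rate   = mean(edu_rate)               # mean over the reporting countries only
  )

yearly
```

In this toy example both Alpha and Beta improve from 2021 to 2022, yet the yearly mean falls because a low-rate country (Gamma) only starts reporting in 2022.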

How are the countries in East Asia & Pacific distributed in terms of Female Higher Education Rate in 2022?

#create subset of data in 2022
edu_2022<-subset(clean_edu,year=='2022')

#plot education data of countries in 2022
ggplot(edu_2022, aes(x=edu_rate)) + 
  geom_histogram(binwidth = 10,color='black',fill="blue")+
  labs(x='Education Rate (Cumulative %)',y='Frequency of Country',title='Country Count by Female Higher Education %',caption = "Source: World Bank (2022)")

The histogram shows that female higher education rates varied widely across countries in East Asia & Pacific in 2022. However, it does not tell us whether a higher female education rate goes together with a higher level of development.
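A histogram shows how many countries fall in each band but not which ones. The sketch below, on a made-up table (`toy_2022`, a hypothetical stand-in for `edu_2022`), uses `cut()` to assign each country to a 10-point band. The breaks here are chosen for readability and are an assumption; they need not coincide with the bin edges that `geom_histogram(binwidth = 10)` picks by default.

```r
# Toy 2022 table shaped like edu_2022 (values are made up)
toy_2022 <- data.frame(
  `Country Name` = c("Alpha", "Beta", "Gamma", "Delta"),
  edu_rate       = c(3.2, 8.9, 17.5, 41.0),
  check.names = FALSE
)

# Assign each country to a 10-point band: [0,10), [10,20), ...
toy_2022$band <- cut(toy_2022$edu_rate, breaks = seq(0, 50, by = 10), right = FALSE)

table(toy_2022$band)                            # countries per band
split(toy_2022$`Country Name`, toy_2022$band)   # which countries are in each band
```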

Is there a relationship between GDP per capita and Female Higher Education Rate?

To view the relationship between the attainment of female higher education and development, GDP per capita is used as an indicator of each country's development. GDP per capita data is imported from the World Bank database for the same year, 2022.

#import gdp data
gdp_2022<-read.csv('gdp_2022.csv')
#view gdp data
head(gdp_2022)

##        Country.Name Country.Code                  Series.Name    Series.Code
## 1    American Samoa          ASM GDP per capita (current US$) NY.GDP.PCAP.CD
## 2         Australia          AUS GDP per capita (current US$) NY.GDP.PCAP.CD
## 3 Brunei Darussalam          BRN GDP per capita (current US$) NY.GDP.PCAP.CD
## 4          Cambodia          KHM GDP per capita (current US$) NY.GDP.PCAP.CD
## 5             China          CHN GDP per capita (current US$) NY.GDP.PCAP.CD
## 6              Fiji          FJI GDP per capita (current US$) NY.GDP.PCAP.CD
##     X2022..YR2022.
## 1 19673.3901023197
## 2  65077.676668821
## 3 37152.4769749763
## 4 1759.60802346044
## 5 12662.5831692254
## 6 5356.16438036291

#create subset of data
gdp_2022<-gdp_2022[,c(1,5)]
#rename column data
names(gdp_2022)[names(gdp_2022) == "X2022..YR2022."] <- "gdp"
names(gdp_2022)[names(gdp_2022) == "Country.Name"] <- "Country Name"
#merge education and gdp data
data<-merge(edu_2022,gdp_2022,by="Country Name")
#remove any rows with blanks/not useful data
data <- data[!apply(data == ""|data=="..", 1, any), ]  
#change gdp to numeric data
data$gdp<-as.numeric(data$gdp)
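By default `merge()` keeps only the countries that appear in both tables, so a country with a missing or differently spelled name in one file drops out without a message. A minimal sketch of how this could be checked, using made-up tables (`toy_edu` and `toy_gdp`, hypothetical stand-ins for `edu_2022` and `gdp_2022`):

```r
# Toy tables shaped like edu_2022 and gdp_2022 (values are made up)
toy_edu <- data.frame(`Country Name` = c("Alpha", "Beta", "Gamma"),
                      edu_rate = c(10, 20, 30), check.names = FALSE)
toy_gdp <- data.frame(`Country Name` = c("Alpha", "Beta", "Delta"),
                      gdp = c("1000", "..", "3000"), check.names = FALSE)

# Countries with an education rate but no matching GDP row
unmatched <- setdiff(toy_edu$`Country Name`, toy_gdp$`Country Name`)
unmatched

# all.x = TRUE keeps unmatched education rows, with NA in gdp
merged <- merge(toy_edu, toy_gdp, by = "Country Name", all.x = TRUE)
merged
```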

ggplot(data, aes(x=edu_rate,y=gdp)) + 
  geom_point(color='blue') +
  geom_smooth(method = "lm", se = FALSE,color='black')+
  labs(x='Education Rate (Cumulative %)',y='GDP per Capita',title='GDP per Capita against Female Higher Education %',caption = "Source: World Bank")

## `geom_smooth()` using formula = 'y ~ x'

From this chart, we can see a generally positive relationship between the female higher education rate and GDP per capita. This makes sense when we compare countries like Australia and Singapore, which are developed and have high gender equality, against less developed countries such as Laos. However, there are some outliers, such as Mongolia, where the female education rate is high despite a low GDP per capita. This could potentially be explained by cultural differences and the particular structure of the country's economy, but a deeper analysis would be required to draw accurate conclusions.
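The visual impression from the chart could be backed by numbers. The sketch below fits the same kind of linear model that `geom_smooth(method = "lm")` draws and ranks countries by their residuals, so that points far from the line stand out. It runs on a made-up table (`toy`, a hypothetical stand-in for `data`), so the figures say nothing about the real countries.

```r
# Toy merged table shaped like data (values are made up)
toy <- data.frame(
  country  = c("Alpha", "Beta", "Gamma", "Delta", "Epsilon"),
  edu_rate = c(5, 10, 20, 30, 35),
  gdp      = c(2000, 6000, 15000, 40000, 5000)
)

# Pearson correlation between education rate and GDP per capita
cor(toy$edu_rate, toy$gdp)

# Linear model of GDP per capita on education rate, as in the chart
fit <- lm(gdp ~ edu_rate, data = toy)

# A large negative residual means GDP is low relative to the education rate
toy$residual <- resid(fit)
toy[order(toy$residual), c("country", "edu_rate", "gdp", "residual")]
```

In this toy example Epsilon plays the role of a high-education, low-GDP outlier and appears at the top of the sorted table.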

Week 3 Challenge

Anthea Tay

2024-09-14