Increasing the attainment of higher education is a goal of many governments seeking to improve their countries’ economies and development. In particular, sociologists have been interested in the rate of female higher education as an indicator of a nation’s gender equality and its downstream impacts on population, productivity and various other societal issues. As such, in the following report, I will be looking into the rate of female higher education attainment in Asia and the Pacific.
The data was downloaded from the World Bank website and covers countries in the South Asia and East Asia & Pacific regions. Each figure is the cumulative percentage of females aged 25 and over who have attained at least a Bachelor’s degree or equivalent.
#import libraries and data
library(readr)
library(ggplot2)
library(tidyr)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ purrr 1.0.2
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ lubridate 1.9.3 ✔ tibble 3.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
edu<-read.csv('Female Education Data.csv',check.names = FALSE)
#view the data
(head(edu, 10))
## Series Name
## 1 Educational attainment, at least Bachelor's or equivalent, population 25+, female (%) (cumulative)
## 2 Educational attainment, at least Bachelor's or equivalent, population 25+, female (%) (cumulative)
## 3 Educational attainment, at least Bachelor's or equivalent, population 25+, female (%) (cumulative)
## 4 Educational attainment, at least Bachelor's or equivalent, population 25+, female (%) (cumulative)
## 5 Educational attainment, at least Bachelor's or equivalent, population 25+, female (%) (cumulative)
## 6 Educational attainment, at least Bachelor's or equivalent, population 25+, female (%) (cumulative)
## 7 Educational attainment, at least Bachelor's or equivalent, population 25+, female (%) (cumulative)
## 8 Educational attainment, at least Bachelor's or equivalent, population 25+, female (%) (cumulative)
## 9 Educational attainment, at least Bachelor's or equivalent, population 25+, female (%) (cumulative)
## 10 Educational attainment, at least Bachelor's or equivalent, population 25+, female (%) (cumulative)
## Series Code Country Name Country Code 2004 [YR2004]
## 1 SE.TER.CUAT.BA.FE.ZS Afghanistan AFG ..
## 2 SE.TER.CUAT.BA.FE.ZS Bangladesh BGD ..
## 3 SE.TER.CUAT.BA.FE.ZS Bhutan BTN ..
## 4 SE.TER.CUAT.BA.FE.ZS India IND ..
## 5 SE.TER.CUAT.BA.FE.ZS Maldives MDV ..
## 6 SE.TER.CUAT.BA.FE.ZS Nepal NPL ..
## 7 SE.TER.CUAT.BA.FE.ZS Pakistan PAK ..
## 8 SE.TER.CUAT.BA.FE.ZS Sri Lanka LKA ..
## 9 SE.TER.CUAT.BA.FE.ZS American Samoa ASM ..
## 10 SE.TER.CUAT.BA.FE.ZS Australia AUS 21.0028247833252
## 2005 [YR2005] 2006 [YR2006] 2007 [YR2007] 2008 [YR2008]
## 1 .. .. .. ..
## 2 .. .. .. ..
## 3 .. .. .. ..
## 4 .. .. .. ..
## 5 .. .. .. ..
## 6 .. .. .. 0.910622537136078
## 7 2.10417008399963 3.52049994468689 3.23870253562927 3.65621995925903
## 8 .. .. .. ..
## 9 .. .. .. ..
## 10 21.5452346801758 22.6497402191162 24.3102226257324 25.090461730957
## 2009 [YR2009] 2010 [YR2010] 2011 [YR2011] 2012 [YR2012]
## 1 .. .. .. ..
## 2 .. .. 3.05580997467041 3.20914006233215
## 3 .. .. .. 2.56905007362366
## 4 .. .. 6.72489976882935 ..
## 5 .. .. .. ..
## 6 .. .. 2.23696994781494 ..
## 7 4.50910997390747 4.73928022384644 4.57596015930176 4.91368007659912
## 8 .. 0.383627027273178 0.365902453660965 ..
## 9 .. .. .. ..
## 10 .. .. .. ..
## 2013 [YR2013] 2014 [YR2014] 2015 [YR2015] 2016 [YR2016]
## 1 .. 0.506414830684662 .. ..
## 2 3.11523008346558 3.76830005645752 5.30487012863159 5.45023012161255
## 3 .. 15.0053186416626 15.3042860031128 ..
## 4 .. .. .. 7.28427982330322
## 5 .. 1.04722499847412 .. 9.00475788116455
## 6 .. .. .. ..
## 7 5.22017002105713 6.08884000778198 6.03219413757324 16.7618560791016
## 8 0.660946011543274 0.573716342449188 0.781386315822601 0.715153634548187
## 9 .. .. .. ..
## 10 28.9809894561768 30.3131504058838 31.8225193023682 32.1725997924805
## 2017 [YR2017] 2018 [YR2018] 2019 [YR2019] 2020 [YR2020]
## 1 1.93327450752258 .. .. 1.43156433105469
## 2 5.91620016098022 6.35262012481689 6.6717700958252 ..
## 3 7.38642978668213 4.5947847366333 5.99403810501099 6.76300430297852
## 4 .. 7.78669023513794 9.24438953399658 9.88014030456543
## 5 5.71459007263184 .. 9.80011940002441 ..
## 6 3.51821613311768 .. .. ..
## 7 6.16782999038696 2.65594005584717 3.66774988174438 ..
## 8 0.592809021472931 0.615735113620758 0.641581594944 0.62948739528656
## 9 .. .. .. ..
## 10 34.3086013793945 34.782398223877 35.4619789123535 38.2943000793457
## 2021 [YR2021] 2022 [YR2022] 2023 [YR2023]
## 1 1.15299999713898 1.53609001636505 ..
## 2 6.92283010482788 .. ..
## 3 17.8103160858154 18.5730381011963 ..
## 4 .. 9.7648401260376 ..
## 5 .. .. ..
## 6 .. .. ..
## 7 7.08136034011841 .. ..
## 8 .. .. ..
## 9 .. .. ..
## 10 38.8049201965332 40.0237808227539 ..
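The `..` entries in the preview above appear to mark missing values in the export. One possible way to handle them is at import time, through the `na.strings` argument of `read.csv`. The sketch below is illustrative only: it reads a small inline CSV with made-up values (`csv_text` and `toy` are hypothetical names), not the real file.

```r
# Hypothetical three-row extract with the same header style as the real file
csv_text <- 'Country Name,2021 [YR2021],2022 [YR2022]
Australia,38.8,40.0
Nepal,..,..
India,..,9.76'

# na.strings = ".." turns the placeholder into NA, so the year columns
# are read as numeric instead of character
toy <- read.csv(text = csv_text, check.names = FALSE, na.strings = "..")

str(toy)
colSums(is.na(toy))  # number of missing values per column
```

With the placeholder read as `NA`, rows can later be dropped with `!is.na()` instead of string comparisons, and no `as.numeric()` conversion is needed.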
#create a subset of the columns that are useful
edu<-edu[,c(3,5:23)]
#reshape the data in the long form
tidy_edu <-
  edu %>%
  gather('2004 [YR2004]':'2022 [YR2022]', key="year", value="edu_rate")
#trim the name of the years
tidy_edu[,2]<-substr(tidy_edu[,2],start=1,stop=4)
#Delete any rows with blanks/not useful data
clean_edu <- tidy_edu[!apply(tidy_edu == ""|tidy_edu=="..", 1, any), ]
#change education rate data to numeric data
clean_edu$edu_rate<-as.numeric(clean_edu$edu_rate)
clean_edu %>%
  group_by(year) %>% #Use group_by to group the data by year
  summarise(mean = mean(edu_rate)) #Find the mean of education rate in the region by year
## # A tibble: 19 × 2
## year mean
## <chr> <dbl>
## 1 2004 17.4
## 2 2005 14.1
## 3 2006 13.7
## 4 2007 15.9
## 5 2008 14.6
## 6 2009 16.3
## 7 2010 11.9
## 8 2011 9.14
## 9 2012 10.4
## 10 2013 15.7
## 11 2014 14.2
## 12 2015 15.9
## 13 2016 15.7
## 14 2017 13.5
## 15 2018 14.7
## 16 2019 14.7
## 17 2020 16.2
## 18 2021 15.8
## 19 2022 19.4
From the table above, the mean rises to 19.4% in 2022, but the yearly means swing without a clear trend (for example, 17.4% in 2004 against 9.14% in 2011). A likely reason is that less developed countries lack data in the earlier years, so each yearly mean is taken over a different set of countries and a time-wise analysis will not be reliable. We can verify this by counting the number of observations by country below.
clean_edu %>% count(`Country Name`) #Use count() to check how many years of data we have for each country
## Country Name n
## 1 Afghanistan 5
## 2 Australia 15
## 3 Bangladesh 10
## 4 Bhutan 9
## 5 Brunei Darussalam 8
## 6 Cambodia 8
## 7 China 2
## 8 Fiji 3
## 9 India 6
## 10 Indonesia 12
## 11 Japan 2
## 12 Kiribati 4
## 13 Korea, Dem. People's Rep. 19
## 14 Korea, Rep. 19
## 15 Lao PDR 3
## 16 Macao SAR, China 1
## 17 Malaysia 7
## 18 Maldives 4
## 19 Marshall Islands 3
## 20 Micronesia, Fed. Sts. 1
## 21 Mongolia 12
## 22 Myanmar 4
## 23 Nauru 1
## 24 Nepal 3
## 25 New Caledonia 1
## 26 New Zealand 7
## 27 Pakistan 16
## 28 Palau 2
## 29 Papua New Guinea 1
## 30 Philippines 7
## 31 Samoa 1
## 32 Singapore 17
## 33 Solomon Islands 1
## 34 Sri Lanka 10
## 35 Thailand 5
## 36 Timor-Leste 2
## 37 Tonga 3
## 38 Tuvalu 3
## 39 Vanuatu 4
## 40 Viet Nam 4
From this table, only two of the 40 countries have observations for all 19 years, and several have just one. As such, the remainder of this report will focus on findings in 2022.
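The choice of a single year could also be cross-checked by counting how many countries report in each year, alongside the yearly mean. The following is a minimal sketch on a small made-up data frame (`toy_edu`, with hypothetical countries and values) that uses the same column names as `clean_edu`:

```r
library(dplyr)

# Hypothetical data for illustration only; not World Bank figures
toy_edu <- data.frame(
  `Country Name` = c("A", "A", "B", "A", "B", "C"),
  year           = c("2020", "2021", "2021", "2022", "2022", "2022"),
  edu_rate       = c(10, 12, 30, 14, 32, 50),
  check.names    = FALSE
)

# For each year: the mean rate and the number of countries behind that mean
coverage <- toy_edu %>%
  group_by(year) %>%
  summarise(mean = mean(edu_rate),
            n_countries = n_distinct(`Country Name`))

coverage
```

In this toy example the mean rises each year partly because more countries enter the sample, which is the same effect that makes the yearly means hard to compare.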
#create subset of data in 2022
edu_2022<-subset(clean_edu,year=='2022')
#plot education data of countries in 2022
ggplot(edu_2022, aes(x=edu_rate)) +
  geom_histogram(binwidth = 10,color='black',fill="blue")+
  labs(x='Education Rate (Cumulative %)',y='Number of Countries',title='Country Count by Female Higher Education %',caption = "Source: World Bank (2022)")
From the histogram, we can see that female higher education rates in 2022 vary widely across the countries in the region. However, this does not show us whether a higher rate of female higher education goes together with a higher level of development.
To examine the relationship between the attainment of female higher education and development, GDP per capita is used as an indicator of each country’s development. GDP per capita data for the same year, 2022, is imported from the World Bank database.
#import gdp data
gdp_2022<-read.csv('gdp_2022.csv')
#view gdp data
head(gdp_2022)
## Country.Name Country.Code Series.Name Series.Code
## 1 American Samoa ASM GDP per capita (current US$) NY.GDP.PCAP.CD
## 2 Australia AUS GDP per capita (current US$) NY.GDP.PCAP.CD
## 3 Brunei Darussalam BRN GDP per capita (current US$) NY.GDP.PCAP.CD
## 4 Cambodia KHM GDP per capita (current US$) NY.GDP.PCAP.CD
## 5 China CHN GDP per capita (current US$) NY.GDP.PCAP.CD
## 6 Fiji FJI GDP per capita (current US$) NY.GDP.PCAP.CD
## X2022..YR2022.
## 1 19673.3901023197
## 2 65077.676668821
## 3 37152.4769749763
## 4 1759.60802346044
## 5 12662.5831692254
## 6 5356.16438036291
#create subset of data
gdp_2022<-gdp_2022[,c(1,5)]
#rename column data
names(gdp_2022)[names(gdp_2022) == "X2022..YR2022."] <- "gdp"
names(gdp_2022)[names(gdp_2022) == "Country.Name"] <- "Country Name"
#merge education and gdp data
data<-merge(edu_2022,gdp_2022,by="Country Name")
#remove any rows with blanks/not useful data
data <- data[!apply(data == ""|data=="..", 1, any), ]
#change gdp to numeric data
data$gdp<-as.numeric(data$gdp)
ggplot(data, aes(x=edu_rate,y=gdp)) +
  geom_point(color='blue') +
  geom_smooth(method = "lm", se = FALSE,color='black')+
  labs(x='Education Rate (Cumulative %)',y='GDP per Capita (current US$)',title='GDP per Capita against Female Higher Education %',caption = "Source: World Bank")
## `geom_smooth()` using formula = 'y ~ x'
From this chart, we can see a generally positive relationship between the female education rate and GDP per capita. This makes sense when we compare countries like Australia and Singapore, which are developed and have high gender equality, against less developed countries such as Laos. However, there are some outliers such as Mongolia, where the female education rate is high despite low GDP per capita. This could potentially be explained by cultural differences and the unique structure of Mongolia’s economy, but a deeper analysis would be required to draw accurate conclusions.
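The strength of this relationship could be quantified rather than read off the chart by eye. The following is a minimal sketch, assuming a data frame with `edu_rate` and `gdp` columns like the merged `data` above. The countries and values in `toy` are made up for illustration and are not World Bank figures.

```r
# Hypothetical values for illustration only; not taken from the World Bank data
toy <- data.frame(
  country  = c("A", "B", "C", "D", "E"),
  edu_rate = c(5, 10, 20, 30, 40),
  gdp      = c(1500, 4000, 12000, 35000, 65000)
)

# Pearson measures linear association; Spearman works on ranks, so it is
# less sensitive to a few countries with very high GDP per capita
pearson  <- cor(toy$edu_rate, toy$gdp, method = "pearson")
spearman <- cor(toy$edu_rate, toy$gdp, method = "spearman")

# The same kind of linear model that geom_smooth(method = "lm") fits
fit <- lm(gdp ~ edu_rate, data = toy)

# Residuals show which countries sit furthest from the fitted line
toy$residual <- residuals(fit)
toy[order(-abs(toy$residual)), c("country", "residual")]
```

Applied to the merged data, a country with a high education rate but low GDP per capita, as described for Mongolia, would be expected to show a large negative residual.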