This project studies the Gender Inequality Index and the Gender Development Index of countries around the world. Together, the two indices measure the state of gender equality between men and women and show how much circumstances have improved or worsened over the years.
The data comes from “Gender Budgeting and Gender Equality macroeconomic and financial data”, published by the IMF at https://data.imf.org/?sk=AC81946B-43E4-4FF3-84C7-217A6BDE8191&sId=1472837511014. The table was downloaded from the tab “Query the data and create your own table”, with data taken for all countries and indicators for the years 2000-2013.
The original dataset has the following columns:
1. Country : The country name. All countries in the source table have been included.
2. Gender_Index : The name of the indicator, which takes one of two values: Gender_Development_Index or Gender_Inequality_Index.
3. 2000, 2001, ..., 2013 : One column per year, holding the indicator value recorded for that year.
Project coding
Loading the libraries required in the project
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.6 v dplyr 1.0.7
## v tidyr 1.1.4 v stringr 1.4.0
## v readr 2.1.2 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(kableExtra)
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
library(tidyr)
library(dplyr)
Reading the data from the CSV file and displaying the first five records as a kable table in R Markdown.
df_xl = read_csv("Assignment1-Misra-Gender-Inequality-Development-Index-of-Countries.csv")
## Rows: 294 Columns: 16
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (2): Country, Gender_Index
## dbl (14): 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, ...
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
kbl(head(df_xl, n=5))%>%
kable_paper(bootstrap_options = "striped", full_width=F)
| Country | Gender_Index | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Afghanistan, Islamic Republic of | Gender_Development_Index | 0.45 | 0.46 | 0.45 | 0.47 | 0.45 | 0.46 | 0.47 | 0.48 | 0.49 | 0.55 | 0.57 | 0.59 | 0.60 | 0.60 |
| Afghanistan, Islamic Republic of | Gender_Inequality_Index | 0.77 | 0.77 | 0.76 | 0.75 | 0.75 | 0.74 | 0.73 | 0.73 | 0.72 | 0.72 | 0.71 | 0.70 | 0.70 | 0.69 |
| Albania | Gender_Development_Index | 0.94 | 0.94 | 0.94 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 |
| Albania | Gender_Inequality_Index | 0.39 | 0.38 | 0.37 | 0.36 | 0.35 | 0.33 | 0.32 | 0.31 | 0.31 | 0.25 | 0.24 | 0.24 | 0.24 | 0.22 |
| Algeria | Gender_Development_Index | NA | NA | NA | NA | 0.80 | 0.81 | 0.81 | 0.82 | 0.83 | 0.84 | 0.84 | 0.84 | 0.85 | 0.85 |
Cleaning the data using the tidyr package, making sure that:
Every observation has its own row (one row per country and year)
Every variable has its own column (Gender_Development_Index and Gender_Inequality_Index)
Pivot longer : First, the fourteen year columns are gathered into a single Year column, with their values placed in a temporary column called Index_value. The table is now narrower and longer.
Pivot wider : Then the two values of the Gender_Index column, Gender_Development_Index and Gender_Inequality_Index, are spread into two separate columns filled from Index_value. The table is now wider and shorter, and every variable has its own column.
NA values do not violate tidy data principles, but rows containing NA values are removed anyway so that every remaining row has both indices.
df_long=df_xl %>% pivot_longer(c('2000','2001','2002','2003','2004','2005','2006','2007'
,'2008','2009','2010','2011','2012','2013'),
names_to = "Year", values_to = "Index_value")
df_wide=df_long %>% pivot_wider(names_from = "Gender_Index" ,
values_from = "Index_value")
df_wo_na=df_wide[complete.cases(df_wide),]
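The reshaping steps can be tried on a small table first. The sketch below uses a hypothetical four-row tibble (`toy`, with made-up countries "A" and "B" and made-up values) in the same wide layout as the IMF table. It also uses `tidyr::drop_na()` in place of `complete.cases()`, which is a swapped-in alternative and not the call used in this project.

```r
library(tidyr)
library(tibble)

# Hypothetical toy data in the same wide layout as the IMF table
toy <- tribble(
  ~Country, ~Gender_Index,              ~`2000`, ~`2001`,
  "A",      "Gender_Development_Index", 0.90,    0.91,
  "A",      "Gender_Inequality_Index",  0.30,    0.29,
  "B",      "Gender_Development_Index", NA,      0.80,
  "B",      "Gender_Inequality_Index",  0.50,    0.49
)

# Year columns -> one Year column plus one value column (longer, narrower)
toy_long <- pivot_longer(toy, cols = c(`2000`, `2001`),
                         names_to = "Year", values_to = "Index_value")

# One column per index (wider, shorter): one row per Country and Year
toy_wide <- pivot_wider(toy_long, names_from = "Gender_Index",
                        values_from = "Index_value")

# Drop Country-Year rows where either index is missing
toy_clean <- drop_na(toy_wide)
```

Here country "B" loses its 2000 row because the development index is missing for that year, while its 2001 row is kept.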
Creating a vector with a limited number of countries from different continents for better visualization.
vec= c('Afghanistan, Islamic Republic of','Canada','France','Ghana','Iceland','India',
'Iran, Islamic Republic of','Iraq','Israel','Japan', 'Kazakhstan','Kenya',
'Korea, Republic of','Kuwait','Malaysia', 'Mexico','New Zealand','South Africa',
'Thailand','Turkey','United States' )
df=df_wo_na[df_wo_na$Country %in% vec,]
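Because `%in%` silently skips any name that does not match the data exactly, it may be worth checking the spelling of the selected countries. The sketch below uses two hypothetical vectors, `available` (standing in for the country names in the cleaned data) and `wanted` (standing in for the selection vector); neither is taken from the project.

```r
# Hypothetical stand-ins for the names in the data and the selection vector
available <- c("Canada", "France", "Korea, Republic of")
wanted    <- c("Canada", "France", "South Korea")

# Names requested but absent from the data; %in% would drop these silently
missing_names <- setdiff(wanted, available)
missing_names
```

An empty result would mean every requested country is present under the spelling used.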
Loading libraries necessary for plotting
library(ggplot2)
library(scales)
##
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##
## discard
## The following object is masked from 'package:readr':
##
## col_factor
library(RColorBrewer)
To see how countries compare on the Gender Development Index, I first compute the mean of Gender_Development_Index over the data frame of all countries with NA rows removed.
Comparing each country's index value with this mean shows whether it was above or below average. Countries above average are shown in blue and those below average in red. Facets give a year-wise view.
Most countries have been consistent in their performance, although some, such as Mexico and the Republic of Korea, have improved over the years to rise above average.
(The year and index labels are clearer in the RStudio view than in the rendered R Markdown.)
above_avg_GDI=df$Gender_Development_Index>mean(df_wo_na$Gender_Development_Index)
ggplot(data = df, mapping = aes(
x=Country, y=Gender_Development_Index, color=above_avg_GDI)) +
geom_point(size=2)+
facet_grid(~Year) +
ggtitle("Gender Development of Countries 2000-2013")+
xlab("Country")+ylab("Gender Development Index")+
coord_flip()
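The above-average flag can also be stored as a column of the data frame, so that it stays aligned with its rows if the data is later filtered or sorted. This is a minimal sketch on hypothetical toy data (`all_countries`, with made-up countries and values), not the project's own data.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for the cleaned data of all countries
all_countries <- data.frame(
  Country = c("A", "A", "B", "B", "C", "C"),
  Year = rep(c("2000", "2001"), times = 3),
  Gender_Development_Index = c(0.95, 0.96, 0.70, 0.72, 0.85, 0.90)
)

# Mean over all countries and years, as described in the text
overall_mean <- mean(all_countries$Gender_Development_Index)

# Subset of countries to plot, with the flag stored as a column
subset_flagged <- all_countries %>%
  filter(Country %in% c("A", "B")) %>%
  mutate(above_avg_GDI = Gender_Development_Index > overall_mean)

# Same plot structure as above, reading the flag from the data frame
p <- ggplot(subset_flagged,
            aes(x = Country, y = Gender_Development_Index,
                color = above_avg_GDI)) +
  geom_point(size = 2) +
  facet_grid(~Year) +
  coord_flip()
```

Printing `p` draws the plot. The mean is still taken over all countries, not only the plotted subset.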
Next, plotting how countries have fared on the Gender Inequality Index. A scatter plot makes the distribution of the values easier to see.
Some countries, like France and Iceland, have performed consistently well, while Afghanistan has consistently fared poorly. Countries like Turkey, Thailand and Mexico have had intermediate levels of inequality over the years.
The graph is color coded: green denotes good equality between genders (low Gender Inequality Index) and red denotes poor equality between genders (high Gender Inequality Index).
ggplot(data = df, mapping = aes(
x=Country, y=Gender_Inequality_Index )) +
geom_point(stat="identity",aes(col=Gender_Inequality_Index), size=5)+
scale_color_gradient(low="green",
high = "tomato1",
guide="colourbar",
aesthetics = "colour")+
ggtitle("Gender Inequality in Countries 2000-2013")+
xlab("Country")+ylab("Gender Inequality Index")+
coord_flip()
Grouping by country and summing each index over the years gives one aggregate score per index for every country.
Both scores are then plotted together.
df_twin=df %>%
group_by(Country) %>%
summarise(Devlpm_Score = sum(Gender_Development_Index),
Ineqlty_Score=sum(Gender_Inequality_Index)) %>%
arrange(desc(Ineqlty_Score))
# Reshape to long form: one row per country and score type
df_twin_long=df_twin %>%
pivot_longer(c(Devlpm_Score, Ineqlty_Score),
names_to = "Index_Type", values_to = "Index")
ggplot(df_twin_long,
aes(x = Country,
y = Index,
fill = Index_Type)) +
geom_bar(stat = 'identity', position='dodge') + coord_flip() +
ggtitle("Indices of Countries")+
xlab("Country")+ylab("Index")+
scale_fill_brewer(palette = "Paired")
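One thing to keep in mind with summed scores: if a country has fewer complete years after NA removal, its sum is smaller for that reason alone. Whether this affects any of the selected countries is not checked here, so the following is only a sketch on hypothetical toy data (`toy`, with made-up values) showing how a per-year mean could be compared with the sum.

```r
library(dplyr)

# Hypothetical toy data: country B has one year fewer than country A
toy <- data.frame(
  Country = c("A", "A", "A", "B", "B"),
  Gender_Inequality_Index = c(0.30, 0.30, 0.30, 0.40, 0.40)
)

# Sum and mean per country, with the number of years behind each score
scores <- toy %>%
  group_by(Country) %>%
  summarise(
    n_years    = n(),
    sum_score  = sum(Gender_Inequality_Index),
    mean_score = mean(Gender_Inequality_Index)
  )
scores
```

In this toy case the sum ranks "A" above "B" even though "B" has the higher index in every year it reports, while the mean keeps the two comparable.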
Plotting both indices aggregated over the years, there is a clearer trend in the Inequality Index than in the Development Index.
p1=ggplot(df_twin, aes(x=reorder(Country, Devlpm_Score),
y= Devlpm_Score))+
geom_bar(stat='identity')+
ggtitle("Gender Development in Descending order")+
xlab("Country")+ylab("Gender Development Index")+
coord_flip()
p1
p2=ggplot(df_twin, aes(x=reorder(Country, Ineqlty_Score),
y= Ineqlty_Score))+
geom_bar(stat='identity',color="tan", size=8)+
ggtitle("Gender Inequality in Descending order")+
xlab("Country")+ylab("Gender Inequality Index")+
coord_flip()
p2
In conclusion:
1. Most countries have had consistent and comparable gender development over the years.
2. Despite development, gender inequality still exists in many countries.