Assignment1-Misra-Gender-Inequality-Development-Index-of-Countries.Rmd

Gender Inequality and Development Index of Countries

This project studies the average Gender Inequality and Gender Development Indices of countries around the world. The indices measure the state of gender equality between men and women, and show how circumstances have improved or worsened over the years.

The data has been sourced from “Gender Budgeting and Gender Equality macroeconomic and financial data”, published by the IMF at https://data.imf.org/?sk=AC81946B-43E4-4FF3-84C7-217A6BDE8191&sId=1472837511014 (table downloaded from the tab “Query the data and create your own table”; data taken for all countries and indicators for the years 2000-2013).

The original dataset has the following columns:

1. Country : The name of the country. Countries from all over the world are included in the data.

2. Gender_Index : A column with two possible values, each naming an indicator: Gender_Development_Index and Gender_Inequality_Index.

3. 2000, 2001, …, 2013 : One column per year, holding the index value collected for that year.

Project coding

Loading the libraries required in the project

library(tidyverse)

## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --

## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.6     v dplyr   1.0.7
## v tidyr   1.1.4     v stringr 1.4.0
## v readr   2.1.2     v forcats 0.5.1

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(kableExtra)

## 
## Attaching package: 'kableExtra'

## The following object is masked from 'package:dplyr':
## 
##     group_rows

library(tidyr)
library(dplyr)

Reading the data from the CSV file and displaying the first five records as a kable table in R Markdown.

df_xl = read_csv("Assignment1-Misra-Gender-Inequality-Development-Index-of-Countries.csv")

## Rows: 294 Columns: 16

## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr  (2): Country, Gender_Index
## dbl (14): 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, ...

## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.

kbl(head(df_xl, n=5))%>%
  kable_paper(lightable_options = "striped", full_width=FALSE)

Country	Gender_Index	2000	2001	2002	2003	2004	2005	2006	2007	2008	2009	2010	2011	2012	2013
Afghanistan, Islamic Republic of	Gender_Development_Index	0.45	0.46	0.45	0.47	0.45	0.46	0.47	0.48	0.49	0.55	0.57	0.59	0.60	0.60
Afghanistan, Islamic Republic of	Gender_Inequality_Index	0.77	0.77	0.76	0.75	0.75	0.74	0.73	0.73	0.72	0.72	0.71	0.70	0.70	0.69
Albania	Gender_Development_Index	0.94	0.94	0.94	0.95	0.95	0.95	0.95	0.95	0.95	0.95	0.95	0.95	0.95	0.95
Albania	Gender_Inequality_Index	0.39	0.38	0.37	0.36	0.35	0.33	0.32	0.31	0.31	0.25	0.24	0.24	0.24	0.22
Algeria	Gender_Development_Index	NA	NA	NA	NA	0.80	0.81	0.81	0.82	0.83	0.84	0.84	0.84	0.85	0.85

Data Cleaning and Wrangling

Cleaning the data with the tidyr package so that:

1. Every observation has its own row (one row per country and year).

2. Every variable has its own column (Gender_Development_Index and Gender_Inequality_Index).

This is done in two steps:

Pivot longer : First, the fourteen year columns are collected into a single Year column, and their values are placed in a temporary column called Index_value. The table is now narrower and longer.

Pivot wider : Then, the Gender_Index column is spread into two separate columns, Gender_Development_Index and Gender_Inequality_Index, filled with the values from Index_value. The table is now wider and shorter, and every variable has its own column.

Although NA values do not violate tidy data principles, rows containing NA values are removed as a final cleaning step.

df_long=df_xl %>% pivot_longer(c('2000','2001','2002','2003','2004','2005','2006','2007'
                                 ,'2008','2009','2010','2011','2012','2013'), 
                               names_to = "Year", values_to = "Index_value")                


df_wide=df_long %>% pivot_wider(names_from = "Gender_Index" , 
                          values_from = "Index_value")

df_wo_na=df_wide[complete.cases(df_wide),]
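
To make the two reshaping steps concrete, here is a minimal sketch on a made-up table. The country names A and B and all values are invented for illustration and are not taken from the IMF data. The sketch assumes only the tidyr and tibble packages, and uses drop_na() as one possible alternative to complete.cases().

```r
library(tidyr)
library(tibble)

# Hypothetical toy data in the same shape as the original table
toy <- tribble(
  ~Country, ~Gender_Index,              ~`2000`, ~`2001`,
  "A",      "Gender_Development_Index", 0.90,    0.91,
  "A",      "Gender_Inequality_Index",  0.30,    0.29,
  "B",      "Gender_Development_Index", NA,      0.70,
  "B",      "Gender_Inequality_Index",  0.60,    0.59
)

# Step 1 (pivot longer): one row per country, index and year
toy_long <- pivot_longer(toy, cols = c(`2000`, `2001`),
                         names_to = "Year", values_to = "Index_value")

# Step 2 (pivot wider): one column per index, one row per country and year
toy_wide <- pivot_wider(toy_long, names_from = "Gender_Index",
                        values_from = "Index_value")

# Step 3: drop every row in which at least one index is missing
toy_clean <- drop_na(toy_wide)
```

In this toy case the row for country B in 2000 is dropped, because its development index is missing even though its inequality index is present.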

Data Visualization

Creating a vector with a small selection of countries from different continents, so that the plots stay readable.

vec= c('Afghanistan, Islamic Republic of','Canada','France','Ghana','Iceland','India', 
       'Iran, Islamic Republic of','Iraq','Israel','Japan', 'Kazakhstan','Kenya',
       'Korea, Republic of','Kuwait','Malaysia', 'Mexico','New Zealand','South Africa',
       'Thailand','Turkey','United States' )

df=df_wo_na[df_wo_na$Country %in% vec,]

Loading libraries necessary for plotting

library(ggplot2)
library(scales)

## 
## Attaching package: 'scales'

## The following object is masked from 'package:purrr':
## 
##     discard

## The following object is masked from 'package:readr':
## 
##     col_factor

library(RColorBrewer)

Gender Development Index of Countries over the years

To measure how countries compare with one another on the Gender Development Index, I first compute the mean of the variable Gender_Development_Index over the data frame containing all countries without NA values.

Comparing each country's index values against this mean shows whether the country was above or below average. Above-average values are shown in blue and below-average values in red. Facets are used to show the comparison year by year.

Most countries have been consistent in their performance, although some, such as Mexico and the Republic of Korea, have improved over the years and moved above average.

(The labels for years and index values are clearer in the RStudio view than in the rendered R Markdown output.)

# Flag each row of the selected countries as above or below the mean GDI of all countries
df$above_avg_GDI=df$Gender_Development_Index>mean(df_wo_na$Gender_Development_Index)

ggplot(data = df, mapping = aes(
  x=Country, y=Gender_Development_Index, color=above_avg_GDI)) +
  geom_point(size=2)+
  facet_grid(~Year) +
  ggtitle("Gender Development of Countries 2000-2013")+
  xlab("Country")+ylab("Gender Development Index")+
  coord_flip()
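
The above-average comparison can also be sketched on made-up data. The countries A to D and their values are hypothetical, and the names toy_all, toy_subset and overall_mean are introduced only for this illustration. As in the analysis, the mean is taken over all countries, while the flag is computed only for the selected subset and stored as a column.

```r
library(dplyr)
library(tibble)

# Hypothetical index values for four countries
toy_all <- tibble(
  Country = c("A", "B", "C", "D"),
  Gender_Development_Index = c(0.95, 0.60, 0.85, 0.70)
)

# Reference mean computed over all countries, not only the selected ones
overall_mean <- mean(toy_all$Gender_Development_Index)

# Keep a subset of countries and flag each one against the overall mean
toy_subset <- toy_all %>%
  filter(Country %in% c("A", "B")) %>%
  mutate(above_avg_GDI = Gender_Development_Index > overall_mean)
```

Storing the flag as a column keeps it aligned with its rows if the data frame is later filtered or reordered, which a separate vector would not guarantee.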

Gender Inequality Index of Countries

Next, I plot how countries have fared on the Gender Inequality Index. A scatter plot makes it easier to see how the values are distributed.

Some countries, such as France and Iceland, have performed consistently well, while Afghanistan has consistently fared poorly. Countries such as Turkey, Thailand and Mexico have had intermediate levels of inequality over the years.

The graph is color coded: green denotes good equality among genders (low Gender Inequality Index) and red denotes poor equality among genders (high Gender Inequality Index).

ggplot(data = df, mapping = aes(
  x=Country, y=Gender_Inequality_Index )) +
  geom_point(stat="identity",aes(col=Gender_Inequality_Index), size=5)+
  scale_color_gradient(low="green",
                       high = "tomato1",
                       guide="colourbar",
                       aesthetics = "colour")+
  ggtitle("Gender Inequality in Countries 2000-2013")+
  xlab("Country")+ylab("Gender Inequality Index")+
  coord_flip()

Comparing Indices

Summing each index over the years for every country gives one aggregate score per index and country. The two aggregate scores are then plotted side by side for each country.

# Sum each index over the years to get one aggregate score per country
df_twin=df %>%
  group_by(Country) %>%
  summarise(Devlpm_Score = sum(Gender_Development_Index), 
            Ineqlty_Score=sum(Gender_Inequality_Index)) %>%
  arrange(desc(Ineqlty_Score))

# Reshape to long format: one row per country and score type
df_twin_long=df_twin %>%
  pivot_longer(c(Devlpm_Score, Ineqlty_Score),
               names_to = "Score_Type", values_to = "Score")

ggplot(df_twin_long,
       aes(x = Country,
           y = Score, 
           fill = Score_Type)) +
  geom_bar(stat = 'identity', position='dodge') + coord_flip() +
  ggtitle("Indices of Countries")+
  xlab("Country")+ylab("Index")+
  scale_fill_brewer(palette = "Paired")
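
The reshaping behind a grouped bar chart like this one can be sketched on made-up data. The countries, years and values below are hypothetical, and the names toy_scores, toy_scores_long, Score_Type and Score are chosen only for this illustration. The idea is to sum each index per country and then stack the two score columns into a single value column with a label column, which is the form that a fill aesthetic expects.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical cleaned data: one row per country and year
toy <- tibble(
  Country = c("A", "A", "B", "B"),
  Year = c("2000", "2001", "2000", "2001"),
  Gender_Development_Index = c(0.9, 0.9, 0.7, 0.8),
  Gender_Inequality_Index = c(0.3, 0.2, 0.6, 0.5)
)

# One aggregate score per index and country
toy_scores <- toy %>%
  group_by(Country) %>%
  summarise(Devlpm_Score = sum(Gender_Development_Index),
            Ineqlty_Score = sum(Gender_Inequality_Index))

# Long format: one row per country and score type
toy_scores_long <- toy_scores %>%
  pivot_longer(-Country, names_to = "Score_Type", values_to = "Score")
```

With the data in this shape, a plot can map Score to the bar height and Score_Type to the fill, so no separate value vector has to be kept in the same row order as the data frame.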

Plotting the two aggregate indices separately, with countries ordered by score, shows a clearer pattern across countries in the Inequality Index than in the Development Index.

# Use the per-country aggregate score so that bar height and ordering agree, as in p2
p1=ggplot(df_twin, aes(x=reorder(Country, Devlpm_Score),
                       y= Devlpm_Score))+ 
  geom_bar(stat='identity')+
  ggtitle("Gender Development in Descending order")+
  xlab("Country")+ylab("Gender Development Index")+
  coord_flip()

p1

p2=ggplot(df_twin, aes(x=reorder(Country, Ineqlty_Score),
                       y= Ineqlty_Score))+ 
  geom_bar(stat='identity',color="tan", size=8)+
  ggtitle("Gender Inequality in Descending order")+
  xlab("Country")+ylab("Gender Inequality Index")+
  coord_flip()

p2

In conclusion:

1. Most countries have had consistent and comparable gender development over the years.

2. Despite development, gender inequality still exists in many countries.