An R Package for Manipulating UNESCO Institute for Statistics Datasets

Learning Evidence in Math 204: Introduction to Computer Programming

Francis Dave Cabanting

December 28, 2020

Introduction

The datasets in the UNESCO Institute for Statistics (UIS) Data Browser are open to every researcher who wants to peruse them. However, the .csv file downloaded from the repository contains flat, unorganized data.

Disclaimer: This is a work in progress. Errors and warnings may occasionally appear. Please refer to the help files for details.

Objectives

This package aims to reduce the time and effort researchers spend manipulating UIS datasets. This package will:

  1. Filter and summarize the dataset by:
  • indicator, and
  • country; and
  2. Create a quick comparison plot between countries.

Installation

Install the package from GitHub.

library(devtools)
install_github("cabantingdave/cabanting",force=TRUE)

Load the package.

library(cabanting)

Suggested Use

For a streamlined process, follow these steps.

1. Download UIS Dataset

Download a .csv dataset from the UIS Data Browser by following these steps.

  1. Go to UIS Data Browser.
  2. On the left, navigate to your desired categories.
  3. Choose Export → Text file (CSV).
  4. Tick Default Format and Download.
  5. Read the .csv file using read.csv.

To keep this documentation simple, we use a UIS dataset that the author downloaded and stored on GitHub.

The sample dataset contains UIS data for Socioeconomic Indicators of Countries.

file <- url("https://raw.github.com/cabantingdave/cabanting/main/test_files/sample_dataset_socioeconomics.csv")
socioenonomics <- read.csv(file)
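Before filtering, it can help to confirm that the data frame has the columns listed in the uis_filter help file (see Documentation). The following is a minimal base-R sketch, not part of the package. The toy data frame, its values, and the first column name IND_ID are made up for illustration.

```r
# Toy stand-in for a downloaded UIS file (all values are invented).
toy <- data.frame(
  IND_ID    = c("NY_GDP_PCAP_CD", "NY_GDP_PCAP_CD"),  # hypothetical column name
  Indicator = c("GDP per capita (current US$)", "GDP per capita (current US$)"),
  LOCATION  = c("AAA", "BBB"),
  Country   = c("Country A", "Country B"),
  TIME      = c(2018, 2018),
  Value     = c(100, 200),
  stringsAsFactors = FALSE
)

# Columns that the uis_filter help file says the data.frame should contain.
required <- c("Indicator", "LOCATION", "Country", "TIME", "Value")

# Required names that are absent; character(0) means nothing is missing.
missing_cols <- setdiff(required, names(toy))
if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}
```

The same check can be run on the real data frame by replacing toy with the object returned by read.csv.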

2. Show the indicators available in the UIS dataset

To show the different indicators in this dataset, use the show_indicator_ids function from this package.

show_indicator_ids(socioenonomics)
##                                                                       
## NY_GDP_MKTP_CN                                       GDP (current LCU)
## NY_GDP_MKTP_CD                                       GDP (current US$)
## NY_GDP_DEFL_ZS              GDP deflator (base year varies by country)
## NY_GDP_MKTP_KD_ZG                                GDP growth (annual %)
## NY_GDP_PCAP_CD                            GDP per capita (current US$)
## NY_GDP_PCAP_PP_CD        GDP per capita, PPP (current international $)
## NY_GDP_MKTP_PP_CD                   GDP, PPP (current international $)
## XTGOV_IMF           General government total expenditure (current LCU)
## NY_GNP_PCAP_CD              GNI per capita, Atlas method (current US$)
## NY_GNP_PCAP_PP_CD        GNI per capita, PPP (current international $)
## PA_NUS_PPP        PPP conversion factor, GDP (LCU per international $)

3. Filter using uis_filter_indicator (or uis_filter_country).

The downloaded dataset is flat and unorganized. To filter a single indicator and arrange the years in columns, use uis_filter_indicator.

gdp_per_capita <- uis_filter_indicator(df = socioenonomics,
                                       indicator_id = "NY_GDP_PCAP_CD")
head(gdp_per_capita)
##         COUNTRY                    INDICATOR  2014  2015  2016  2017  2018
## ABW       Aruba GDP per capita (current US$) 26648 27981 28281 29008    NA
## AFG Afghanistan GDP per capita (current US$)   614   578   547   556   524
## AGO      Angola GDP per capita (current US$)  5408  4167  3506  4096  3290
## ALB     Albania GDP per capita (current US$)  4579  3953  4124  4531  5284
## AND     Andorra GDP per capita (current US$) 41304 35763 37475 38963 41793
## ARG   Argentina GDP per capita (current US$) 12335 13789 12790 14592 11684
##      2019
## ABW    NA
## AFG   502
## AGO  2974
## ALB  5353
## AND 40886
## ARG 10006
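To illustrate what arranging the years in columns means, here is a minimal base-R sketch on made-up data. It is not the package's implementation, which this document does not show. The column name IND_ID, the indicator IDs X1 and X2, the country codes, and the values are all hypothetical.

```r
# Flat ("long") data: one row per indicator, country, and year.
long <- data.frame(
  IND_ID   = c("X1", "X1", "X1", "X1", "X1", "X2"),
  LOCATION = c("AAA", "AAA", "BBB", "BBB", "CCC", "AAA"),
  TIME     = c(2018, 2019, 2018, 2019, 2018, 2018),
  Value    = c(10, 12, 20, 18, 5, 99),
  stringsAsFactors = FALSE
)

# Keep a single indicator.
one <- long[long$IND_ID == "X1", ]

# Cross-tabulate: rows are countries, columns are years.
# Each cell holds one value, so mean() returns it unchanged;
# country-year pairs with no data become NA.
wide <- tapply(one$Value, list(one$LOCATION, one$TIME), FUN = mean)
print(wide)
```

Country CCC has no 2019 row in the toy data, so its 2019 cell is NA, much like the NA entries for Aruba in the output above.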

4. Compare the values

To have a quick look at the data and compare the statistics of different countries, use plot_compare.

Note: The values are computed statistics (default is mean) of the available years per country in the data.frame. To isolate a single year, use the years = <year> parameter in uis_filter_indicator before feeding the result into plot_compare.

library(ggplot2)  # for theme() and element_blank(), in case ggplot2 is not already attached

plot_compare(df = gdp_per_capita,
             graph = "segment") +
  theme(axis.text.y = element_blank())
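As a rough illustration of the per-country statistic described in the note above, the sketch below computes a mean across year columns in base R. It assumes that missing years are skipped (na.rm = TRUE), which this document does not confirm for plot_compare. The country names, codes, and values are invented.

```r
# Toy wide table shaped like the uis_filter_indicator output (invented values).
wide <- data.frame(
  COUNTRY = c("Country A", "Country B", "Country C"),
  `2018`  = c(10, 20, NA),
  `2019`  = c(14, NA, NA),
  check.names = FALSE,                 # keep "2018" and "2019" as column names
  row.names = c("AAA", "BBB", "CCC")
)

# Year columns are the ones whose names are four digits.
year_cols <- grep("^[0-9]{4}$", names(wide))

# Mean over the available years of each country, skipping NA.
# A country with no data at all yields NaN.
country_mean <- rowMeans(wide[, year_cols], na.rm = TRUE)
print(country_mean)
```

Swapping rowMeans for apply(wide[, year_cols], 1, median, na.rm = TRUE) would give the median analogue.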

Highlight specific countries

To highlight specific countries and compare them with the world statistic (default is median), use the countries and world_stat parameters, respectively.

plot_compare(df = gdp_per_capita,
             graph="segment",
             countries=c("IDN","CHN"),
             use_code_in_label = FALSE
             ) + 
  theme(axis.text.y = element_blank())

Choose countries by rank.

To filter the top countries, use the parameter top. Negative values will yield the bottom countries.

If there are relatively few bars, the author suggests setting the graph parameter to 'bar'.

plot_compare(df = gdp_per_capita,
             graph="bar",
             countries=c("MCO","LIE","LUX"),
             top=20,
             use_code_in_label = FALSE,
             title = "Top 20 Nations With Highest GDP per Capita (US$)")
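The idea behind a positive or negative top value can be sketched in base R as sorting the per-country statistic and keeping either end of the ranking. This is an illustration under that assumption, not the package's code. The country codes and values are made up.

```r
# Invented per-country statistics, named by country code.
country_stat <- c(AAA = 12, BBB = 20, CCC = 7, DDD = 31, EEE = 15)

# Rank from highest to lowest.
ranked <- sort(country_stat, decreasing = TRUE)

# A positive count keeps the highest-ranked countries ...
top2 <- head(ranked, 2)

# ... and a negative count would correspond to the lowest-ranked ones.
bottom2 <- tail(ranked, 2)

print(names(top2))
print(names(bottom2))
```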

Exclude the world statistics from the plot

plot_compare(df = gdp_per_capita,
             graph="bar",
             countries=c("BDI","MWI","CAF","MDG","COD"),
             top=-20,
             use_code_in_label = FALSE,
             include_world_stat = FALSE,
             title = "20 Nations With Lowest GDP per Capita (US$)")

More parameters are included in the function. Use help('plot_compare') to explore the function for further modifications.

Documentation

1. show_indicator_ids

Before using the functions uis_filter_indicator and uis_filter_country, determine the indicators and their IDs present in your downloaded dataset. Run help("show_indicator_ids") for the help file.

help("show_indicator_ids")

Show UIS Indicators

Description

show_indicator_ids shows the indicators and their corresponding indicator IDs in the current data frame.

Usage
show_indicator_ids(df)
Arguments
df

The data.frame to be filtered. Leave the UIS dataset as is.

2. uis_filter_indicator() and uis_filter_country()

These functions will filter and summarize the downloaded dataset based on a single indicator or a single country. Refer to the following for the details.

help("uis_filter")

Filter UNESCO Institute of Statistics Dataset

Description

uis_filter_country filters UIS Dataset using a single country code.

uis_filter_indicator filters UIS dataset using a single Indicator ID.

This dataset must be downloaded from the UNESCO Institute of Statistics (UIS) Data Browser as a .csv file in default (,) format.

Usage
uis_filter_country(df, country_code, years = NULL, indicator_ids = NULL)

uis_filter_indicator(df, indicator_id, years = NULL, countries = NULL)
Arguments
df

The data.frame to be filtered. Leave the UIS dataset as is: it should have the Indicator ID as its first column, regardless of that column's name, and it should contain the following columns:

  • Indicator : Indicator names;

  • LOCATION: Three-letter ISO country codes;

  • Country : Official country names;

  • TIME : Year; and

  • Value : Numerical values.

country_code

The three-letter ISO code of the country to be filtered. It should be a character object with length 1.

years

The years to filter. It should be a numerical vector. Defaults to all available years in the dataset.

indicator_ids

The IDs of the indicators to be filtered, as a character vector. Defaults to all indicators available for the requested country.

indicator_id

The ID of the indicator to be filtered. Try df[match(unique(df[,1]),df[,1]),c(1,2)] to list the indicator IDs together with their names.

countries

The countries to be filtered. It should be a character vector in the country code format. Defaults to all countries (including the UIS Regional summaries, if available).
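The expression suggested under indicator_id, df[match(unique(df[,1]),df[,1]),c(1,2)], keeps the first row for each distinct value in column 1 and shows columns 1 and 2. A small sketch on made-up data shows what it returns. The column name IND_ID and the IDs X1 and X2 are hypothetical.

```r
# Toy data: column 1 is the indicator ID, column 2 the indicator name.
toy <- data.frame(
  IND_ID    = c("X1", "X1", "X2", "X2", "X1"),
  Indicator = c("First indicator", "First indicator",
                "Second indicator", "Second indicator",
                "First indicator"),
  LOCATION  = c("AAA", "BBB", "AAA", "BBB", "CCC"),
  stringsAsFactors = FALSE
)

# Position of the first occurrence of each distinct ID.
first_rows <- match(unique(toy[, 1]), toy[, 1])

# One row per indicator: its ID and its name.
lookup <- toy[first_rows, c(1, 2)]
print(lookup)
```

An equivalent base-R form is toy[!duplicated(toy[, 1]), c(1, 2)].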

Examples
df <- read.csv("NATMON_DS_04012021100059822.csv")

uis_filter_country(df,
                   "PHL")
uis_filter_country(df,
                   "PHL",
                   indicator_ids = c("FOSGP_5T8_F900", "FOSGP_5T8_F300", "FOSGP_5T8_FUK"))
uis_filter_country(df,
                   "PHL",
                   years = seq(2010, 2013),
                   indicator_ids = c("FOSGP_5T8_F900", "FOSGP_5T8_F300", "FOSGP_5T8_FUK"))
uis_filter_indicator(df,
                     "FOSGP_5T8_F500")
uis_filter_indicator(df,
                     "FOSGP_5T8_F500",
                     countries = c("PHL", "USA", "CHN"))
uis_filter_indicator(df,
                     "FOSGP_5T8_F500",
                     years = seq(2010, 2013),
                     countries = c("PHL", "USA", "CHN"))

3. plot_compare()

help("plot_compare")

Plot Comparison of Nations in UIS Dataset

Description

plot_compare draws a ggplot that compares the statistics of nations, using a dataset created with the function uis_filter_indicator.

Usage
plot_compare(
  df,
  countries = NULL,
  top = 0,
  stat = "mean",
  graph = "bar",
  include_world_stat = TRUE,
  world_stat = "median",
  remove_regions = TRUE,
  desc = TRUE,
  axis = 0,
  round_places = 0,
  labs = TRUE,
  use_code_in_axis = TRUE,
  title = NULL,
  use_code_in_label = TRUE,
  color_palette = NULL
)
Arguments
df

Dataframe. The data.frame to be filtered. This should be created using the uis_filter_indicator function.

countries

Character Vector. The three-letter codes of countries to compare. The function will highlight and label these countries.

top

Numerical vector or numerical variable. This will filter the top countries. Negative numbers will filter the bottom countries. Defaults to all countries. See examples for more details.

stat

‘mean’, ‘median’, ‘sd’, ‘min’, or ‘max’. Since the output of uis_filter_indicator contains multiple years, this is the statistic that will be calculated per country for all the available years. Defaults to ‘mean’.

graph

‘bar’ or ‘segment’. Type of graph to be plotted. Defaults to ‘bar’.

include_world_stat

Logical. Defaults to TRUE. If FALSE, the world statistic will not be highlighted and labeled.

world_stat

‘mean’ or ‘median’. The world statistics to calculate. Defaults to ‘median’. Will be ignored if include_world_stat is FALSE.

remove_regions

Logical. This will remove the regional summaries in the UIS dataset. Defaults to TRUE.

desc

Logical. Sorts the plot into descending order. Defaults to TRUE.

axis

0 or 1. Orientation of the plot. Defaults to 0.

round_places

Integer. The number of places to round in calculations. Defaults to 0.

labs

Logical. Determines whether to show the labels in the plot. Defaults to TRUE.

use_code_in_axis

Logical. Determines whether to show the code on the axis instead of the whole name of the country. Defaults to TRUE.

title

Character. The title to show on top.

use_code_in_label

Logical. Determines whether to show the code on the labels instead of the whole name of the country. Defaults to TRUE.

color_palette

RColorBrewer palettes. EXPERIMENTAL. Sets the color of the highlighted bars/segments.

Examples
library(ggplot2)  # for theme() and element_blank(), in case ggplot2 is not already attached

df_source <- read.csv("NATMON_DS_04012021100059822.csv")
df <- uis_filter_indicator(df_source, "FOSGP_5T8_F900")

plot_compare(df)
plot_compare(df,
             countries = c("PHL"),
             round_places = 3) +
  theme(axis.text.y = element_blank())
plot_compare(df,
             countries = c("PHL", "MMR", "ITA"),
             stat = "median",
             graph = "segment",
             top = 30,
             axis = 1,
             round_places = 3)
plot_compare(df,
             countries = c("PHL", "MMR", "ITA"),
             stat = "median",
             world_stat = "mean",
             top = seq(20, 40),
             axis = 1,
             round_places = 3)
plot_compare(df,
             countries = c("PHL", "MMR", "ITA"),
             stat = "median",
             graph = "segment",
             axis = 1,
             round_places = 3,
             top = -20,
             include_world_stat = FALSE,
             use_code_in_label = FALSE,
             color_palette = "BuGn")

Miscellaneous

Sources

Versions

v1.0.0. (Initial Release)

As the author continues to explore the UIS dataset, more functions will be introduced. This is a work in progress. The author will publish these developments in the same GitHub repository for open access.