Introduction

The datasets in the UNESCO Institute of Statistics Data Browser are open to every researcher who wants to peruse them. However, after downloading the .csv file from the repository, the researcher is faced with flat, unorganized data.

Disclaimer: This is a work-in-progress. Some errors and warnings might sometimes appear. Please refer to the help files for details.

Objectives

This package aims to reduce the time and effort researchers spend manipulating UIS datasets. This package will:

Filter and summarize the dataset by indicator or by country; and

Create a quick comparison plot between countries.

Installation

Install the package from GitHub.

library(devtools)
install_github("cabantingdave/cabanting",force=TRUE)

Load the package.

library(cabanting)

Suggested Use

For a streamlined process, follow these steps.

1. Download UIS Dataset

Download a .csv dataset from the UIS Data Browser (http://data.uis.unesco.org/) by following these steps.

Go to UIS Data Browser.
On the left, navigate to your desired categories.
Choose Export → Text file (CSV).
Tick Default Format and Download.
Read the .csv files using read.csv.
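As a minimal sketch of the last step, the snippet below reads a small inline CSV that stands in for a downloaded file and checks that the columns named in the package documentation (`Indicator`, `LOCATION`, `Country`, `TIME`, `Value`) are present. The `INDICATOR_ID` column name is a hypothetical placeholder, since the documentation only says the ID sits in the first column. The values are copied from the sample output shown later in this document.

```r
# Inline text standing in for a downloaded UIS export (assumption: real exports
# contain more columns and many more rows than this).
csv_text <- "INDICATOR_ID,Indicator,LOCATION,Country,TIME,Value
NY_GDP_PCAP_CD,GDP per capita (current US$),AFG,Afghanistan,2018,524
NY_GDP_PCAP_CD,GDP per capita (current US$),AFG,Afghanistan,2019,502
NY_GDP_PCAP_CD,GDP per capita (current US$),ALB,Albania,2019,5353"

uis <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Columns the filter functions are documented to expect.
required <- c("Indicator", "LOCATION", "Country", "TIME", "Value")
missing_cols <- setdiff(required, names(uis))
if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}

str(uis)
```

For a real file, replace `text = csv_text` with the path to the downloaded .csv.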

To keep this documentation simple, we use the author's downloaded UIS dataset, which is stored on GitHub.

The sample dataset contains UIS data for Socioeconomic Indicators of Countries.

file <- url("https://raw.github.com/cabantingdave/cabanting/main/test_files/sample_dataset_socioeconomics.csv")
socioenonomics <- read.csv(file)

2. Show the indicators available on UIS dataset

To show the different indicators in this dataset, use the show_indicator_ids function from this package.

show_indicator_ids(socioenonomics)

##                                                                       
## NY_GDP_MKTP_CN                                       GDP (current LCU)
## NY_GDP_MKTP_CD                                       GDP (current US$)
## NY_GDP_DEFL_ZS              GDP deflator (base year varies by country)
## NY_GDP_MKTP_KD_ZG                                GDP growth (annual %)
## NY_GDP_PCAP_CD                            GDP per capita (current US$)
## NY_GDP_PCAP_PP_CD        GDP per capita, PPP (current international $)
## NY_GDP_MKTP_PP_CD                   GDP, PPP (current international $)
## XTGOV_IMF           General government total expenditure (current LCU)
## NY_GNP_PCAP_CD              GNI per capita, Atlas method (current US$)
## NY_GNP_PCAP_PP_CD        GNI per capita, PPP (current international $)
## PA_NUS_PPP        PPP conversion factor, GDP (LCU per international $)
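For intuition, a lookup like the one above can be approximated in base R with the one-liner suggested in the help file for `uis_filter_indicator`. This is only a sketch on a tiny hand-made data.frame, not the package's actual implementation. The `INDICATOR_ID` column name is a hypothetical placeholder.

```r
# Tiny stand-in for a UIS export: first column holds the ID, second the name.
uis <- data.frame(
  INDICATOR_ID = c("NY_GDP_PCAP_CD", "NY_GDP_PCAP_CD", "NY_GDP_MKTP_KD_ZG"),
  Indicator    = c("GDP per capita (current US$)",
                   "GDP per capita (current US$)",
                   "GDP growth (annual %)"),
  stringsAsFactors = FALSE
)

# Keep the first row of each distinct ID, then only the ID and name columns.
ids    <- unique(uis[, 1])
lookup <- uis[match(ids, uis[, 1]), c(1, 2)]
rownames(lookup) <- NULL

print(lookup)
```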

3. Filter using `uis_filter_indicator` (or `uis_filter_country`).

The downloaded dataset is flat and unorganized. To filter a single indicator and arrange the years in columns, use uis_filter_indicator.

gdp_per_capita <- uis_filter_indicator(df = socioenonomics,
                                       indicator_id = "NY_GDP_PCAP_CD")
head(gdp_per_capita)

##         COUNTRY                    INDICATOR  2014  2015  2016  2017  2018
## ABW       Aruba GDP per capita (current US$) 26648 27981 28281 29008    NA
## AFG Afghanistan GDP per capita (current US$)   614   578   547   556   524
## AGO      Angola GDP per capita (current US$)  5408  4167  3506  4096  3290
## ALB     Albania GDP per capita (current US$)  4579  3953  4124  4531  5284
## AND     Andorra GDP per capita (current US$) 41304 35763 37475 38963 41793
## ARG   Argentina GDP per capita (current US$) 12335 13789 12790 14592 11684
##      2019
## ABW    NA
## AFG   502
## AGO  2974
## ALB  5353
## AND 40886
## ARG 10006
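The reshaping from one row per country and year to one column per year can be sketched in base R with `tapply`. This is an illustration under the assumption that each country and year pair occurs at most once. It is not necessarily how `uis_filter_indicator` works internally. The values are taken from the output above.

```r
# Long format: one row per country-year, as in the raw export.
long <- data.frame(
  LOCATION = c("ABW", "AFG", "AFG", "AFG"),
  TIME     = c(2017, 2017, 2018, 2019),
  Value    = c(29008, 556, 524, 502)
)

# Cross-tabulate country by year; each cell holds at most one value,
# and combinations without data become NA.
wide <- tapply(long$Value, list(long$LOCATION, long$TIME), function(v) v[1])

print(wide)
```

Years with no data for a country, such as Aruba in 2018 and 2019, appear as `NA`, matching the layout of the package output.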

4. Compare the values

To have a quick look at the data and compare the statistics of different countries, use plot_compare.

Note: The values are statistics (the mean by default) computed per country over the years available in the data.frame. To isolate a single year, pass years = <year> to uis_filter_indicator before feeding the result into plot_compare.

library(ggplot2)  # assumed to be needed for theme() and element_blank()

plot_compare(df = gdp_per_capita,
             graph = "segment") +
  theme(axis.text.y = element_blank())
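The note above can be illustrated in base R. The sketch below averages the year columns of a small wide table per country, ignoring missing years, and then contrasts that with picking a single year. The table is a hand-made stand-in for the output of `uis_filter_indicator`, with values copied from the earlier sample output.

```r
# Stand-in for a filtered table: one row per country, one column per year.
wide <- data.frame(
  COUNTRY = c("Aruba", "Afghanistan", "Albania"),
  `2017`  = c(29008, 556, 4531),
  `2018`  = c(NA, 524, 5284),
  row.names   = c("ABW", "AFG", "ALB"),
  check.names = FALSE
)

# Year columns are the ones whose names are four digits.
year_cols <- grep("^[0-9]{4}$", names(wide), value = TRUE)

# Mean over the available years; NA years are skipped.
mean_per_country <- rowMeans(wide[, year_cols], na.rm = TRUE)

# A single year, for comparison.
only_2018 <- wide[, "2018"]

print(mean_per_country)
print(only_2018)
```

Aruba's mean rests on 2017 alone because 2018 is missing, which is one reason to isolate a year when countries have uneven coverage.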

Highlight specific countries

To highlight specific countries against a world statistic (the median by default), use the countries and world_stat parameters, respectively.

plot_compare(df = gdp_per_capita,
             graph = "segment",
             countries = c("IDN", "CHN"),
             use_code_in_label = FALSE) +
  theme(axis.text.y = element_blank())
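To see what a world statistic adds to the comparison, the sketch below computes the median and the mean of a handful of 2019 values from the earlier sample output and flags two highlighted countries. Treating five countries as "the world" is purely illustrative, and the data.frame layout is an assumption rather than the structure `plot_compare` uses.

```r
# 2019 GDP per capita for the countries shown in the sample output.
values_2019 <- c(AFG = 502, AGO = 2974, ALB = 5353, AND = 40886, ARG = 10006)

world_median <- median(values_2019)
world_mean   <- mean(values_2019)

# Countries to highlight (illustrative choice).
highlight <- c("AGO", "ARG")

comparison <- data.frame(
  code         = names(values_2019),
  value        = unname(values_2019),
  highlighted  = names(values_2019) %in% highlight,
  above_median = unname(values_2019 > world_median)
)

print(comparison)
cat("median:", world_median, " mean:", world_mean, "\n")
```

In this small sample the mean is pulled well above the median by one very high value, which suggests why a median can be a sensible default reference line.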

Choose countries by rank.

To filter the top countries, use the parameter top. Negative values will yield the bottom countries.

If there are relatively few bars, the author suggests setting the graph parameter to 'bar'.

plot_compare(df = gdp_per_capita,
             graph="bar",
             countries=c("MCO","LIE","LUX"),
             top=20,
             use_code_in_label = FALSE,
             title = "Top 20 Nations With Highest GDP per Capita (US$)")
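The ranking behaviour described above (positive values keep the top countries, negative values keep the bottom ones) can be sketched with a small helper. `pick_ranked` is a hypothetical name used only for illustration. It handles a single number, not the vector form of `top` that the package also accepts.

```r
# Hypothetical helper: keep the top n (n > 0) or bottom n (n < 0) values.
pick_ranked <- function(x, top = 0) {
  ranked <- sort(x, decreasing = TRUE)
  if (top > 0) {
    head(ranked, top)
  } else if (top < 0) {
    tail(ranked, -top)
  } else {
    ranked  # 0 keeps every country
  }
}

values_2019 <- c(AFG = 502, AGO = 2974, ALB = 5353, AND = 40886, ARG = 10006)

print(pick_ranked(values_2019, top = 2))   # two highest
print(pick_ranked(values_2019, top = -2))  # two lowest
```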

Exclude the world statistics from the plot

plot_compare(df = gdp_per_capita,
             graph="bar",
             countries=c("BDI","MWI","CAF","MDG","COD"),
             top=-20,
             use_code_in_label = FALSE,
             include_world_stat = FALSE,
             title = "20 Nations With Lowest GDP per Capita (US$)")

More parameters are included in the function. Use help('plot_compare') to explore the function for further modifications.

Documentation

1. `show_indicator_ids`

Before using the functions uis_filter_indicator and uis_filter_country, determine the indicators and their IDs present in your downloaded dataset. Run help("show_indicator_ids") for the help file.

help("show_indicator_ids")

Show UIS Indicators

Description

show_indicator_ids shows the indicators and their corresponding indicator IDs in the given data.frame.

Usage

show_indicator_ids(df)

Arguments

`df`	The `data.frame` to be inspected. Leave the UIS dataset as downloaded.

2. `uis_filter_indicator()` and `uis_filter_country()`

These functions will filter and summarize the downloaded dataset based on a single indicator or a single country. Refer to the following for the details.

help("uis_filter")

Filter UNESCO Institute of Statistics Dataset

Description

uis_filter_country filters UIS Dataset using a single country code.

uis_filter_indicator filters UIS dataset using a single Indicator ID.

The dataset must be downloaded from the UNESCO Institute of Statistics (UIS) Data Browser as a .csv file in the default (comma-separated) format.

Usage

uis_filter_country(df, country_code, years = NULL, indicator_ids = NULL)

uis_filter_indicator(df, indicator_id, years = NULL, countries = NULL)

Arguments

`df`	The `data.frame` to be filtered. Leave the UIS dataset as downloaded: its first column must hold the Indicator ID (the column name does not matter), and it must contain the following columns: `Indicator`: indicator names; `LOCATION`: three-letter ISO country codes; `Country`: official country names; `TIME`: year; and `Value`: numerical values.
`country_code`	The three-letter ISO code of the country to be filtered. It should be a character object with length 1.
`years`	The years to filter. It should be a numeric vector. Defaults to all available years in the dataset.
`indicator_ids`	The IDs of the indicators to be filtered, as a character vector. Defaults to all indicators available for the requested country.
`indicator_id`	The ID of the indicator to be filtered. Try `df[match(unique(df[,1]),df[,1]),c(1,2)]` to know more about the indicator IDs.
`countries`	The countries to be filtered. It should be a character vector in the country code format. Defaults to all countries (including the UIS Regional summaries, if available).

Examples

df <- read.csv("NATMON_DS_04012021100059822.csv")

uis_filter_country(df,
                   "PHL")
uis_filter_country(df,
                   "PHL",
                   indicator_ids = c("FOSGP_5T8_F900", "FOSGP_5T8_F300", "FOSGP_5T8_FUK"))
uis_filter_country(df,
                   "PHL",
                   years = seq(2010, 2013),
                   indicator_ids = c("FOSGP_5T8_F900", "FOSGP_5T8_F300", "FOSGP_5T8_FUK"))
uis_filter_indicator(df,
                     "FOSGP_5T8_F500")
uis_filter_indicator(df,
                     "FOSGP_5T8_F500",
                     countries = c("PHL", "USA", "CHN"))
uis_filter_indicator(df,
                     "FOSGP_5T8_F500",
                     years = seq(2010, 2013),
                     countries = c("PHL", "USA", "CHN"))
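The effect of the optional `years` and `countries` arguments can be approximated in base R with `%in%`. The helper below, `filter_long`, is a hypothetical stand-in that treats `NULL` as "keep everything", mirroring the documented defaults. It is a sketch of the idea, not the package's code, and the values come from the sample output earlier in this document.

```r
# Long-format stand-in for a single indicator.
long <- data.frame(
  LOCATION = c("AFG", "AFG", "ALB", "ALB", "ARG"),
  TIME     = c(2018, 2019, 2018, 2019, 2019),
  Value    = c(524, 502, 5284, 5353, 10006)
)

# Hypothetical helper: NULL means "no restriction" for that argument.
filter_long <- function(df, years = NULL, countries = NULL) {
  keep <- rep(TRUE, nrow(df))
  if (!is.null(years))     keep <- keep & df$TIME %in% years
  if (!is.null(countries)) keep <- keep & df$LOCATION %in% countries
  df[keep, , drop = FALSE]
}

res <- filter_long(long, years = 2019, countries = c("AFG", "ALB"))
print(res)
```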

3. `plot_compare()`

help("plot_compare")

Plot Comparison of Nations in UIS Dataset

Description

plot_compare draws a ggplot that compares the statistics of nations, using a dataset created with the function uis_filter_indicator.

Usage

plot_compare(
  df,
  countries = NULL,
  top = 0,
  stat = "mean",
  graph = "bar",
  include_world_stat = TRUE,
  world_stat = "median",
  remove_regions = TRUE,
  desc = TRUE,
  axis = 0,
  round_places = 0,
  labs = TRUE,
  use_code_in_axis = TRUE,
  title = NULL,
  use_code_in_label = TRUE,
  color_palette = NULL
)

Arguments

`df`	Dataframe. The `data.frame` to be plotted. This should be created using the `uis_filter_indicator` function.
`countries`	Character vector. The three-letter codes of the countries to compare. The function will highlight and label these countries.
`top`	Numeric vector or single number. Filters the top countries by rank; negative numbers filter the bottom countries. Defaults to `0` (all countries). See the examples for more details.
`stat`	One of `"mean"`, `"median"`, `"sd"`, `"min"`, or `"max"`. Because the output of `uis_filter_indicator` contains multiple years, this is the statistic calculated per country across all available years. Defaults to `"mean"`.
`graph`	`"bar"` or `"segment"`. Type of graph to be plotted. Defaults to `"bar"`.
`include_world_stat`	Logical. Defaults to `TRUE`. If `FALSE`, the world statistic is excluded from the plot and is neither highlighted nor labeled.
`world_stat`	`"mean"` or `"median"`. The world statistic to calculate. Defaults to `"median"`. Ignored if `include_world_stat` is `FALSE`.
`remove_regions`	Logical. Removes the regional summaries in the UIS dataset. Defaults to `TRUE`.
`desc`	Logical. Sorts the plot in descending order. Defaults to `TRUE`.
`axis`	`0` or `1`. Orientation of the plot. Defaults to `0`.
`round_places`	Integer. The number of decimal places to round to in calculations. Defaults to `0`.
`labs`	Logical. Determines whether to show the labels in the plot. Defaults to `TRUE`.
`use_code_in_axis`	Logical. Determines whether to show the country code on the axis instead of the full country name. Defaults to `TRUE`.
`title`	Character. The title to show on top.
`use_code_in_label`	Logical. Determines whether to show the country code on the labels instead of the full country name. Defaults to `TRUE`.
`color_palette`	Character. Name of an RColorBrewer palette. EXPERIMENTAL. Sets the color of the highlighted bars/segments.
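One way to picture the `stat` argument is as the name of a summary function applied across each country's year columns. The sketch below uses `match.arg` and `match.fun` for that dispatch. `row_stat` is a hypothetical helper and the matrix is a hand-made stand-in with values from the earlier sample output. The package itself may do this differently.

```r
# Rows are countries, columns are years; NA marks a missing year.
values <- rbind(ABW = c(29008, NA, NA),
                AFG = c(556, 524, 502))
colnames(values) <- c("2017", "2018", "2019")

# Hypothetical helper: apply the named statistic to every row, skipping NAs.
row_stat <- function(values, stat = c("mean", "median", "sd", "min", "max")) {
  stat <- match.arg(stat)  # validates the name; the first choice is the default
  apply(values, 1, match.fun(stat), na.rm = TRUE)
}

print(row_stat(values))            # default: mean
print(row_stat(values, "median"))
print(row_stat(values, "max"))
```

Note that `"sd"` yields `NA` for a country with a single available year, since a standard deviation needs at least two values.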

Examples

library(ggplot2)  # assumed to be needed for theme() and element_blank()

df_source <- read.csv("NATMON_DS_04012021100059822.csv")
df <- uis_filter_indicator(df_source, "FOSGP_5T8_F900")

plot_compare(df)
plot_compare(df,
             countries = c("PHL"),
             round_places = 3) +
  theme(axis.text.y = element_blank())
plot_compare(df,
             countries = c("PHL", "MMR", "ITA"),
             stat = "median",
             graph = "segment",
             top = 30,
             axis = 1,
             round_places = 3)
plot_compare(df,
             countries = c("PHL", "MMR", "ITA"),
             stat = "median",
             world_stat = "mean",
             top = seq(20, 40),
             axis = 1,
             round_places = 3)
plot_compare(df,
             countries = c("PHL", "MMR", "ITA"),
             stat = "median",
             graph = "segment",
             axis = 1,
             round_places = 3,
             top = -20,
             include_world_stat = FALSE,
             use_code_in_label = FALSE,
             color_palette = "BuGn")

Miscellaneous

Sources

United Nations Educational, Scientific and Cultural Organization Institute of Statistics (UIS). UIS Data Browser. Retrieved December 28, 2020, from http://data.uis.unesco.org/
RStudio Support. (2020, September 21). Developing Packages with RStudio. Retrieved December 28, 2020, from https://support.rstudio.com/hc/en-us/articles/200486488-Developing-Packages-with-RStudio

Versions

v1.0.0. (Initial Release)

As the author continues to explore the UIS dataset, more functions will be introduced. This is a work-in-progress. The author will publish these developments in the same GitHub repository for open access.

An R Package For Manipulating UNESCO Institute of Statistics Datasets

Learning Evidence in Math 204: Introduction to Computer Programming

Francis Dave Cabanting

December 28, 2020
