Introduction
The datasets in the UNESCO Institute for Statistics (UIS) Data Browser are open to every researcher who wants to peruse them. However, the .csv file downloaded from the repository contains flat, unorganized data.
Disclaimer: This is a work in progress, so errors and warnings may occasionally appear. Please refer to the help files for details.
Objectives
This package aims to save researchers time and support their work when manipulating UIS datasets. This package will:
- Filter and summarize the dataset by:
  - indicator, and
  - country; and
- Create a quick comparison plot between countries.
Suggested Use
For a streamlined process, follow these steps.
1. Download UIS Dataset
Download a .csv dataset from the UIS Data Browser (http://data.uis.unesco.org/) by following these steps.
- Go to UIS Data Browser.
- On the left, navigate to your desired categories.
- Choose Export → Text file (CSV).
- Tick Default Format and Download.
- Read the .csv file using read.csv.
For ease in this documentation, we use the author’s downloaded UIS dataset, which is stored on GitHub.
The sample dataset contains UIS data for Socioeconomic Indicators of Countries.
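The reading step can be sketched as follows. This is only an illustration: the miniature file is written to a temporary path so the snippet runs on its own, and the column names (NATMON_IND, Indicator, LOCATION, Country, TIME, Value) are assumptions about a typical Data Browser export that may differ in your download. The values are taken from the sample output shown later in this document.

```r
# Hypothetical miniature of a UIS export; the column names are assumptions
# and should be checked against the header of your own .csv file.
csv_text <- c(
  "NATMON_IND,Indicator,LOCATION,Country,TIME,Value",
  "NY_GDP_PCAP_CD,GDP per capita (current US$),AFG,Afghanistan,2018,524",
  "NY_GDP_PCAP_CD,GDP per capita (current US$),AFG,Afghanistan,2019,502",
  "NY_GDP_PCAP_CD,GDP per capita (current US$),ALB,Albania,2018,5284",
  "NY_GDP_PCAP_CD,GDP per capita (current US$),ALB,Albania,2019,5353"
)

# Write the sample to a temporary file so that read.csv has something to read.
csv_path <- tempfile(fileext = ".csv")
writeLines(csv_text, csv_path)

# Read the flat file; each row is one indicator-country-year observation.
uis_raw <- read.csv(csv_path, stringsAsFactors = FALSE)
str(uis_raw)
```

With a real download, replace csv_path with the path to your own file.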
2. Show the indicators available in the UIS dataset
To show the different indicators in this dataset, use the show_indicator_ids function from this package.
##
## NY_GDP_MKTP_CN GDP (current LCU)
## NY_GDP_MKTP_CD GDP (current US$)
## NY_GDP_DEFL_ZS GDP deflator (base year varies by country)
## NY_GDP_MKTP_KD_ZG GDP growth (annual %)
## NY_GDP_PCAP_CD GDP per capita (current US$)
## NY_GDP_PCAP_PP_CD GDP per capita, PPP (current international $)
## NY_GDP_MKTP_PP_CD GDP, PPP (current international $)
## XTGOV_IMF General government total expenditure (current LCU)
## NY_GNP_PCAP_CD GNI per capita, Atlas method (current US$)
## NY_GNP_PCAP_PP_CD GNI per capita, PPP (current international $)
## PA_NUS_PPP PPP conversion factor, GDP (LCU per international $)
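A listing like the one above can be approximated in base R by keeping the unique pairs of indicator ID and indicator name. The sketch below is not the package's implementation; the column names NATMON_IND and Indicator are assumptions about the export layout, and the NA value is a placeholder because this document shows no GDP growth figures.

```r
# Small stand-in for a flat UIS export (column names are assumed).
uis_raw <- data.frame(
  NATMON_IND = c("NY_GDP_PCAP_CD", "NY_GDP_PCAP_CD", "NY_GDP_MKTP_KD_ZG"),
  Indicator  = c("GDP per capita (current US$)",
                 "GDP per capita (current US$)",
                 "GDP growth (annual %)"),
  LOCATION   = c("AFG", "ALB", "AFG"),
  TIME       = c(2019, 2019, 2019),
  Value      = c(502, 5353, NA),   # NA is a placeholder, not real data
  stringsAsFactors = FALSE
)

# Keep one row per distinct (ID, name) pair.
indicator_table <- unique(uis_raw[, c("NATMON_IND", "Indicator")])
rownames(indicator_table) <- NULL
print(indicator_table)
```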
3. Filter using uis_filter_indicator (or uis_filter_country).
The downloaded dataset is flat and unorganized. To filter a single indicator and arrange the years in columns, use uis_filter_indicator.
gdp_per_capita <- uis_filter_indicator(df = socioenonomics,
                                       indicator_id = "NY_GDP_PCAP_CD")
head(gdp_per_capita)
## COUNTRY INDICATOR 2014 2015 2016 2017 2018
## ABW Aruba GDP per capita (current US$) 26648 27981 28281 29008 NA
## AFG Afghanistan GDP per capita (current US$) 614 578 547 556 524
## AGO Angola GDP per capita (current US$) 5408 4167 3506 4096 3290
## ALB Albania GDP per capita (current US$) 4579 3953 4124 4531 5284
## AND Andorra GDP per capita (current US$) 41304 35763 37475 38963 41793
## ARG Argentina GDP per capita (current US$) 12335 13789 12790 14592 11684
## 2019
## ABW NA
## AFG 502
## AGO 2974
## ALB 5353
## AND 40886
## ARG 10006
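To illustrate what "arranging the years in columns" means, the sketch below pivots a few long-format rows into one row per country using base R's reshape. It is an illustration under assumed column names (LOCATION, TIME, Value), not the code that uis_filter_indicator actually runs; the values come from the output above.

```r
# Long (flat) layout: one row per country-year observation.
long <- data.frame(
  LOCATION = c("AFG", "AFG", "ALB", "ALB"),
  TIME     = c(2018, 2019, 2018, 2019),
  Value    = c(524, 502, 5284, 5353),
  stringsAsFactors = FALSE
)

# Pivot to wide: one row per country, one column per year.
wide <- reshape(long, idvar = "LOCATION", timevar = "TIME", direction = "wide")

# reshape names the new columns "Value.2018", "Value.2019"; strip the prefix.
names(wide) <- sub("^Value\\.", "", names(wide))

# Use the country code as the row name, as in the output shown above.
rownames(wide) <- wide$LOCATION
print(wide)
```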
4. Compare the values
To have a quick look at the data and compare the statistics of different countries, use plot_compare.
Note: The values are computed statistics (default is mean) of the available years per country in the data.frame. To isolate a single year, use years = <year> parameter in uis_filter_indicator before feeding the result into plot_compare.
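The per-country statistic described in the note can be sketched in base R as a row-wise summary over the year columns. This is an assumption about the general idea, not the package's internals; in particular, skipping missing years with na.rm = TRUE is this sketch's reading of "available years". The values are taken from the output shown earlier.

```r
# Wide layout similar to the result of uis_filter_indicator (values from above).
gdp_wide <- data.frame(
  COUNTRY = c("Afghanistan", "Albania", "Aruba"),
  `2017`  = c(556, 4531, 29008),
  `2018`  = c(524, 5284, NA),
  `2019`  = c(502, 5353, NA),
  row.names   = c("AFG", "ALB", "ABW"),
  check.names = FALSE
)

# Identify the year columns by their four-digit names.
year_cols <- grep("^[0-9]{4}$", names(gdp_wide), value = TRUE)

# Mean and median over the available (non-missing) years of each country.
country_mean   <- rowMeans(gdp_wide[, year_cols], na.rm = TRUE)
country_median <- apply(gdp_wide[, year_cols], 1, median, na.rm = TRUE)

# Isolating a single year leaves nothing to average over.
single_year <- gdp_wide[, "2018", drop = FALSE]

print(country_mean)
print(country_median)
```

Aruba has only one available year here, so its mean and median both equal that single value.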
Highlight specific countries
To compare countries and the world statistics (default is median), use the countries and world_stat parameters, respectively.
plot_compare(df = gdp_per_capita,
             graph = "segment",
             countries = c("IDN", "CHN"),
             use_code_in_label = FALSE
) +
  theme(axis.text.y = element_blank())
Choose countries by rank
To filter the top countries, use the parameter top. Negative values will yield the bottom countries.
If the bars are relatively few, the author suggests setting the graph parameter to "bar".
plot_compare(df = gdp_per_capita,
             graph = "bar",
             countries = c("MCO", "LIE", "LUX"),
             top = 20,
             use_code_in_label = FALSE,
             title = "Top 20 Nations With Highest GDP per Capita (US$)")
Exclude the world statistics from the plot
plot_compare(df = gdp_per_capita,
             graph = "bar",
             countries = c("BDI", "MWI", "CAF", "MDG", "COD"),
             top = -20,
             use_code_in_label = FALSE,
             include_world_stat = FALSE,
             title = "20 Nations With Lowest GDP per Capita (US$)")
More parameters are included in the function. Use help('plot_compare') to explore the function for further modifications.
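The ranking behaviour of top (positive for the highest values, negative for the lowest) and the idea of a summary statistic across countries can be sketched in base R. The helper pick_ranked below is hypothetical and is not part of the package; it handles only a single number, whereas the package also accepts a vector for top. The "world" median here is simply the median of the six sample countries, using the 2017 values from the output shown earlier.

```r
# 2017 GDP per capita for the six countries shown in the sample output.
gdp_2017 <- c(ABW = 29008, AFG = 556, AGO = 4096,
              ALB = 4531, AND = 38963, ARG = 14592)

# Hypothetical helper: positive `top` keeps the highest values,
# negative `top` keeps the lowest, and 0 keeps everything.
pick_ranked <- function(x, top = 0) {
  ranked <- sort(x, decreasing = TRUE)
  if (top > 0) {
    head(ranked, top)
  } else if (top < 0) {
    tail(ranked, -top)
  } else {
    ranked
  }
}

top_two    <- pick_ranked(gdp_2017, 2)    # two highest values
bottom_two <- pick_ranked(gdp_2017, -2)   # two lowest values

# A reference statistic across all countries in the sample.
sample_median <- median(gdp_2017)

print(top_two)
print(bottom_two)
print(sample_median)
```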
Documentation
1. show_indicator_ids
Before using the functions uis_filter_indicator and uis_filter_country, determine the indicators and their IDs present in your downloaded dataset. Run help("show_indicator_ids") for the help file.
Show UIS Indicators
Description
show_indicator_ids shows the indicators and their corresponding indicator IDs in the given data frame.
Usage
show_indicator_ids(df)
Arguments
df | The data frame of the downloaded UIS dataset.
2. uis_filter_indicator() and uis_filter_country()
These functions will filter and summarize the downloaded dataset based on a single indicator or a single country. Refer to the following for the details.
Filter UNESCO Institute of Statistics Dataset
Description
uis_filter_country filters UIS Dataset using a single country code.
uis_filter_indicator filters UIS dataset using a single Indicator ID.
The dataset must be downloaded from the UNESCO Institute for Statistics (UIS) Data Browser as a .csv file in the default (comma-separated) format.
Usage
uis_filter_country(df, country_code, years = NULL, indicator_ids = NULL)
uis_filter_indicator(df, indicator_id, years = NULL, countries = NULL)
Arguments
df | The data frame of the downloaded UIS dataset.
country_code | The three-letter ISO code of the country to be filtered. It should be a character object with length 1.
years | The years to filter. It should be a numerical vector. Defaults to all available years in the dataset.
indicator_ids | The IDs of the indicators to be filtered, as a character vector. Defaults to all indicators available for the requested country.
indicator_id | The ID of the indicator to be filtered. Try show_indicator_ids(df) to list the available IDs.
countries | The countries to be filtered. It should be a character vector in the country code format. Defaults to all countries (including the UIS regional summaries, if available).
Examples
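df <- read.csv("NATMON_DS_04012021100059822.csv")

# Filter by a single country: all indicators and all years
uis_filter_country(df,
                   "PHL")
# Restrict the result to selected indicators
uis_filter_country(df,
                   "PHL",
                   indicator_ids = c("FOSGP_5T8_F900", "FOSGP_5T8_F300", "FOSGP_5T8_FUK"))
# Restrict the result to selected indicators and years
uis_filter_country(df,
                   "PHL",
                   years = seq(2010, 2013),
                   indicator_ids = c("FOSGP_5T8_F900", "FOSGP_5T8_F300", "FOSGP_5T8_FUK"))

# Filter by a single indicator: all countries and all years
uis_filter_indicator(df,
                     "FOSGP_5T8_F500")
# Restrict the result to selected countries
uis_filter_indicator(df,
                     "FOSGP_5T8_F500",
                     countries = c("PHL", "USA", "CHN"))
# Restrict the result to selected countries and years
uis_filter_indicator(df,
                     "FOSGP_5T8_F500",
                     years = seq(2010, 2013),
                     countries = c("PHL", "USA", "CHN"))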
3. plot_compare()
Plot Comparison of Nations in UIS Dataset
Description
plot_compare draws a ggplot that compares the statistics of nations, using the dataset produced by the function uis_filter_indicator.
Usage
plot_compare(
  df,
  countries = NULL,
  top = 0,
  stat = "mean",
  graph = "bar",
  include_world_stat = TRUE,
  world_stat = "median",
  remove_regions = TRUE,
  desc = TRUE,
  axis = 0,
  round_places = 0,
  labs = TRUE,
  use_code_in_axis = TRUE,
  title = NULL,
  use_code_in_label = TRUE,
  color_palette = NULL
)
Arguments
df | Dataframe. The dataset produced by uis_filter_indicator.
countries | Character vector. The three-letter codes of the countries to compare. The function will highlight and label these countries.
top | Numerical vector or numerical variable. Filters the top countries; negative numbers filter the bottom countries. Defaults to all countries. See the examples for more details.
stat | The statistic computed per country over the available years, such as "mean" or "median". Defaults to "mean".
graph | The type of graph, such as "bar" or "segment". Defaults to "bar".
include_world_stat | Logical. Determines whether to include the world statistic in the plot. Defaults to TRUE.
world_stat | The world statistic to show alongside the countries, such as "mean" or "median". Defaults to "median".
remove_regions | Logical. Removes the regional summaries in the UIS dataset. Defaults to TRUE.
desc | Logical. Sorts the plot in descending order. Defaults to TRUE.
axis | Defaults to 0.
round_places | Integer. The number of places to round to in calculations. Defaults to 0.
labs | Logical. Determines whether to show the labels in the plot. Defaults to TRUE.
use_code_in_axis | Determines whether to show the code on the axis instead of the whole name of the country. Defaults to TRUE.
title | Character. The title to show on top.
use_code_in_label | Determines whether to show the code on the labels instead of the whole name of the country. Defaults to TRUE.
color_palette | RColorBrewer palettes. EXPERIMENTAL. Sets the color of the highlighted bars/segments.
Examples
library(ggplot2)  # provides theme() and element_blank() used below

df_source <- read.csv("NATMON_DS_04012021100059822.csv")
df <- uis_filter_indicator(df_source, "FOSGP_5T8_F900")

# Default plot: all countries
plot_compare(df)

# Highlight one country and hide the y-axis text
plot_compare(df,
             countries = c("PHL"),
             round_places = 3) +
  theme(axis.text.y = element_blank())

# Top 30 countries as a segment graph
plot_compare(df,
             countries = c("PHL", "MMR", "ITA"),
             stat = "median",
             graph = "segment",
             top = 30,
             axis = 1,
             round_places = 3)

# Countries ranked 20th to 40th, with the world mean
plot_compare(df,
             countries = c("PHL", "MMR", "ITA"),
             stat = "median",
             world_stat = "mean",
             top = seq(20, 40),
             axis = 1,
             round_places = 3)

# Bottom 20 countries, without the world statistic
plot_compare(df,
             countries = c("PHL", "MMR", "ITA"),
             stat = "median",
             graph = "segment",
             axis = 1,
             round_places = 3,
             top = -20,
             include_world_stat = FALSE,
             use_code_in_label = FALSE,
             color_palette = "BuGn")
Miscellaneous
Sources
UNESCO Institute for Statistics (UIS). (n.d.). UIS Data Browser. Retrieved December 28, 2020, from http://data.uis.unesco.org/
RStudio Support. (2020, September 21). Developing Packages with RStudio. Retrieved December 28, 2020, from https://support.rstudio.com/hc/en-us/articles/200486488-Developing-Packages-with-RStudio
Versions
v1.0.0. (Initial Release)
As the author continues to explore the UIS dataset, more functions will be introduced. This is a work in progress. The author will publish these developments in the same GitHub repository for open access.