Workshop name

Data Wrangling and Graphing World Bank Data in R with the WDI Package
The World Bank’s DataBank currently hosts over 80 databases that include “time series data on a multitude of topics for many countries around the world.” The World Development Indicators (WDI) database is probably the DataBank’s most popular database and a crucial resource for policy experts and academics.
The portal allows users to create different types of graphs, which can be downloaded and pasted into their documents. In addition, users can download the data associated with their queries in many different formats.
We will use R’s WDI package, developed by Vincent Arel-Bundock, to download, wrangle, analyze, and plot the World Bank’s data.

Working with the WDI Package

Open RStudio, create a new project, and, if you have not done so already, install the tidyverse, WDI, and scales packages. The scales package works with ggplot2 (part of the tidyverse) to format axis labels, for example by turning numbers shown in scientific notation into dollar amounts.

install.packages(c("WDI", "tidyverse", "scales"))  # straight quotes; one call installs all three

Loading libraries

Once the installations are complete, load the libraries.

library (WDI)
library (tidyverse)
library (scales)

One of the challenges of working with the World Bank’s WDI is its size: it “contains 1,400 time-series indicators for 217 economies and more than 40 country groups, with data for many indicators going back more than 50 years.” Luckily, Arel-Bundock’s WDI package includes an easy-to-use search function, WDIsearch(), that returns the ID and name of every indicator matching a keyword.

Example 1: No Data Wrangling

If you are doing research on military issues, you can use this search function to see if the WDI includes any associated indicators. Using the function is straightforward.

WDIsearch ("military")
##               indicator
## 10941    MS.MIL.XPND.CD
## 10942    MS.MIL.XPND.CN
## 10943 MS.MIL.XPND.GD.ZS
## 10944 MS.MIL.XPND.GN.ZS
## 10945    MS.MIL.XPND.ZS
## 20213    VC.PKP.TOTL.UN
##                                                                                          name
## 10941                                                      Military expenditure (current USD)
## 10942                                                      Military expenditure (current LCU)
## 10943                                                         Military expenditure (% of GDP)
## 10944                                                         Military expenditure (% of GNI)
## 10945                              Military expenditure (% of general government expenditure)
## 20213 Presence of peace keepers (number of troops, police, and military observers in mandate)
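The search mixes currency-denominated indicators with indicators expressed as a share of a total. If you only want the shares, you can filter the result by name. The sketch below rebuilds the six rows printed above as a plain data frame so it runs without a download; in a live session you would filter the object returned by WDIsearch() instead (the as.data.frame() call in the comment is there because, depending on the package version, that object may be a matrix rather than a data frame).

```r
# Rebuild the six indicators printed by WDIsearch("military") above.
# In a live session you could use: res <- as.data.frame(WDIsearch("military"))
res <- data.frame(
  indicator = c("MS.MIL.XPND.CD", "MS.MIL.XPND.CN", "MS.MIL.XPND.GD.ZS",
                "MS.MIL.XPND.GN.ZS", "MS.MIL.XPND.ZS", "VC.PKP.TOTL.UN"),
  name = c("Military expenditure (current USD)",
           "Military expenditure (current LCU)",
           "Military expenditure (% of GDP)",
           "Military expenditure (% of GNI)",
           "Military expenditure (% of general government expenditure)",
           "Presence of peace keepers (number of troops, police, and military observers in mandate)"),
  stringsAsFactors = FALSE
)

# Keep only indicators expressed as a share ("% of ...").
# fixed = TRUE matches the text literally, without regular-expression interpretation.
shares <- res[grepl("% of", res$name, fixed = TRUE), ]
shares$indicator
```

Any of the remaining IDs can then be passed to the indicator argument of WDI().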

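To download the data for the first indicator, military expenditure in current US dollars (MS.MIL.XPND.CD), we call the WDI() function. Setting country = "all" returns every economy as well as country groups such as “Africa Eastern and Southern”.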

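df_mil <- WDI(
  country = "all",               # every economy plus regional and income aggregates
  indicator = "MS.MIL.XPND.CD",  # military expenditure (current USD)
  start = 1960,                  # first year requested
  end = 2020,                    # last year requested
  extra = FALSE,                 # skip additional country metadata (region, income group, etc.)
  cache = NULL,                  # no custom cache from WDIcache(); use the package's built-in metadata
  latest = NULL,                 # if set to n, return only the n most recent non-missing values
  language = "en")               # language of the returned labels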

head(df_mil)
##                       country iso2c iso3c year MS.MIL.XPND.CD
## 1 Africa Eastern and Southern    ZH   AFE 2020    10576660181
## 2 Africa Eastern and Southern    ZH   AFE 2019    11708245732
## 3 Africa Eastern and Southern    ZH   AFE 2018    12265438298
## 4 Africa Eastern and Southern    ZH   AFE 2017    16158643503
## 5 Africa Eastern and Southern    ZH   AFE 2016    13659413872
## 6 Africa Eastern and Southern    ZH   AFE 2015    15383318098

If you only need military expenditure data for a single country, such as Azerbaijan (ISO-2 code “AZ”), pass its code to the country argument.

df_az_mil <- WDI(
  country = "AZ",
  indicator = "MS.MIL.XPND.CD",
  start = 1990,
  end = 2020,
  extra = FALSE,
  cache = NULL,
  latest = NULL,
  language = "en"
)

df_az_mil
##       country iso2c iso3c year MS.MIL.XPND.CD
## 1  Azerbaijan    AZ   AZE 2020     2237764706
## 2  Azerbaijan    AZ   AZE 2019     1854235294
## 3  Azerbaijan    AZ   AZE 2018     1672176471
## 4  Azerbaijan    AZ   AZE 2017     1528859592
## 5  Azerbaijan    AZ   AZE 2016     1396969108
## 6  Azerbaijan    AZ   AZE 2015     2900551382
## 7  Azerbaijan    AZ   AZE 2014     3427179917
## 8  Azerbaijan    AZ   AZE 2013     3367574161
## 9  Azerbaijan    AZ   AZE 2012     3246122613
## 10 Azerbaijan    AZ   AZE 2011     3080084996
## 11 Azerbaijan    AZ   AZE 2010     1476608734
## 12 Azerbaijan    AZ   AZE 2009     1472909977
## 13 Azerbaijan    AZ   AZE 2008     1607799226
## 14 Azerbaijan    AZ   AZE 2007      946599792
## 15 Azerbaijan    AZ   AZE 2006      717111854
## 16 Azerbaijan    AZ   AZE 2005      304521478
## 17 Azerbaijan    AZ   AZE 2004      228249632
## 18 Azerbaijan    AZ   AZE 2003      176552162
## 19 Azerbaijan    AZ   AZE 2002      139894092
## 20 Azerbaijan    AZ   AZE 2001      131963660
## 21 Azerbaijan    AZ   AZE 2000      119575652
## 22 Azerbaijan    AZ   AZE 1999      120262174
## 23 Azerbaijan    AZ   AZE 1998      107262859
## 24 Azerbaijan    AZ   AZE 1997       92086692
## 25 Azerbaijan    AZ   AZE 1996       71606841
## 26 Azerbaijan    AZ   AZE 1995       66159969
## 27 Azerbaijan    AZ   AZE 1994       43942747
## 28 Azerbaijan    AZ   AZE 1993       77519380
## 29 Azerbaijan    AZ   AZE 1992       11070111
## 30 Azerbaijan    AZ   AZE 1991             NA
## 31 Azerbaijan    AZ   AZE 1990             NA
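In the output above the years run from newest to oldest, so a small wrangling step (sorting ascending) is needed before computing changes over time. As a sketch, the code below copies the 2010-2020 Azerbaijan values printed above into a data frame (in a live session you would start from df_az_mil itself), finds the peak year, and computes the year-over-year percentage change. The pct_change column name is just an illustrative choice.

```r
# Azerbaijan's military expenditure (current USD), 2010-2020, copied from the output above.
# In a live session, start from df_az_mil instead.
az <- data.frame(
  year = 2020:2010,
  MS.MIL.XPND.CD = c(2237764706, 1854235294, 1672176471, 1528859592, 1396969108,
                     2900551382, 3427179917, 3367574161, 3246122613, 3080084996,
                     1476608734)
)

# Sort from oldest to newest so that differences run forward in time.
az <- az[order(az$year), ]

# Year with the highest expenditure.
peak_year <- az$year[which.max(az$MS.MIL.XPND.CD)]

# Year-over-year percentage change (NA for the first year, which has no predecessor).
az$pct_change <- c(NA, diff(az$MS.MIL.XPND.CD) / head(az$MS.MIL.XPND.CD, -1) * 100)

peak_year
round(az$pct_change, 1)
```

The 2016 entry, for example, shows spending falling by roughly 52 percent from 2015.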

Now let’s compare military expenditures across the three South Caucasus republics: Azerbaijan (AZ), Armenia (AM), and Georgia (GE).

df_caucas_mil <- WDI(
  country = c("AZ", "AM", "GE"), 
  indicator = "MS.MIL.XPND.CD",
  start = 1992,
  end = 2020,
  extra = FALSE,
  cache = NULL,
  latest = NULL,
  language = "en"
)

Let’s use ggplot2 to generate a quick line graph.

ggplot(df_caucas_mil, aes(x = year, y = MS.MIL.XPND.CD, color = country)) +
  geom_line(linewidth = 1) +  # ggplot2 3.4.0+ uses linewidth instead of size for lines
  theme_minimal()
## Warning: Removed 5 row(s) containing missing values (geom_path).
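The warning means that ggplot2 dropped five rows whose MS.MIL.XPND.CD value is NA, that is, country-years with no reported figure, before drawing the lines. It is harmless here, but you can find and drop such rows yourself to see exactly which country-years are missing. The sketch below uses the first five Azerbaijan rows printed earlier (1990-1994), two of which are NA; with the real data you would apply the same steps to df_caucas_mil. The names az_early, missing_rows, and az_complete are illustrative.

```r
# Azerbaijan, 1990-1994, copied from the df_az_mil output above (1990 and 1991 are NA).
az_early <- data.frame(
  country = "Azerbaijan",
  year = 1990:1994,
  MS.MIL.XPND.CD = c(NA, NA, 11070111, 77519380, 43942747)
)

# Which country-years lack an expenditure value?
missing_rows <- az_early[is.na(az_early$MS.MIL.XPND.CD), c("country", "year")]
nrow(missing_rows)

# Keep only the rows that have a value.
az_complete <- az_early[!is.na(az_early$MS.MIL.XPND.CD), ]
az_complete$year
```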

This is a good starting point, but the graph needs a few extra elements: readable dollar labels on the y axis, clearer axis titles, and a title.

ggplot(df_caucas_mil, aes(x = year, y = MS.MIL.XPND.CD, color = country)) +
  geom_line(linewidth = 1) +
  theme_minimal() +
  # multiply each value by 1e-9 (billions) and prefix it with "$"
  scale_y_continuous(labels = label_number(scale = 1e-9, prefix = "$")) +
  labs(title = "Military Expenditures of the Caucasus Republics (1992-2020)",
       x = "",
       y = "US Dollars (Billions)")
## Warning: Removed 5 row(s) containing missing values (geom_path).

What did we add? We changed the y-axis labels from scientific notation to dollar amounts in billions: the label function in scale_y_continuous() multiplies each value by 1e-9 and adds a “$” prefix, so a value of 8e+11, for example, would be labeled “$800”. Because the labels are now in billions, we replaced the default y-axis title, “MS.MIL.XPND.CD”, with “US Dollars (Billions)”. We also removed the x-axis title (“year”) by setting x = "" in the labs() function, and added a title. We could add further theme layers to change the font size, the font family, the color palette, and so forth, but this graph works for now.
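To see what the label function does to each value, here is the same arithmetic in base R: divide by one billion (equivalent to multiplying by the 1e-9 scale factor) and paste a “$” in front. This is only an approximation of the scales helper, which picks a precision based on the breaks, whereas this sketch always rounds to two decimals. The values are the 8e+11 example from the text plus Azerbaijan’s 2014 and 1992 expenditures from df_az_mil.

```r
# Raw values in current US dollars: the 8e+11 example from the text,
# plus Azerbaijan's 2014 and 1992 expenditures from df_az_mil.
x <- c(8e+11, 3427179917, 11070111)

# What R prints by default for a large number: scientific notation.
format(x[1])

# Scale to billions, round to two decimals, and add a dollar prefix.
labels <- paste0("$", round(x / 1e9, 2))
labels
```

In the plot itself, keep using the scales helper, since it applies this formatting to whatever breaks ggplot2 chooses.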

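What does this graph tell us? The Azerbaijan figures printed earlier give one part of the answer: its military expenditure rose from about $11 million in 1992 to a peak of about $3.4 billion in 2014, fell to about $1.4 billion in 2016, and recovered to about $2.2 billion in 2020. The graph places that trajectory alongside those of Armenia and Georgia.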

Source: https://worldpoliticsdatalab.org/resources/data-wrangling-and-graphing-world-bank-data-in-r-with-the-wdi-package/