Data Wrangling and Graphing World Bank Data in R with the WDI Package
The World Bank’s DataBank currently hosts over 80 databases that
include “time series data on a multitude of topics for many countries
around the world.” The World Development Indicators (WDI) database is
probably the DataBank’s most popular, and a crucial resource for policy
experts and academics.
The portal allows users to create different
types of graphs, which can be downloaded and pasted into their
documents. In addition, users can download the data associated with
their queries in many different formats.
We will use R’s WDI
package, developed by Vincent Arel-Bundock, to download, wrangle,
analyze and plot the World Bank’s data.
Open RStudio, create a new project, and, if you have not done so already, install the tidyverse, WDI, and scales packages. The last of these works with ggplot2 (which is part of the tidyverse) to convert numbers in scientific notation into other values or formats.
install.packages("WDI")
install.packages("tidyverse")
install.packages("scales")
Once the installations are complete, you will then load the libraries.
library(WDI)
library(tidyverse)
library(scales)
One of the challenges of working with the World Bank’s WDI is that it “contains 1,400 time-series indicators for 217 economies and more than 40 country groups, with data for many indicators going back more than 50 years.” Luckily, Arel-Bundock’s WDI package includes an easy-to-use search function that produces the indicator’s ID and its name.
If you are doing research on military issues, you can use this search function to see if the WDI includes any associated indicators. Using the function is straightforward.
WDIsearch("military")
## indicator
## 10941 MS.MIL.XPND.CD
## 10942 MS.MIL.XPND.CN
## 10943 MS.MIL.XPND.GD.ZS
## 10944 MS.MIL.XPND.GN.ZS
## 10945 MS.MIL.XPND.ZS
## 20213 VC.PKP.TOTL.UN
## name
## 10941 Military expenditure (current USD)
## 10942 Military expenditure (current LCU)
## 10943 Military expenditure (% of GDP)
## 10944 Military expenditure (% of GNI)
## 10945 Military expenditure (% of general government expenditure)
## 20213 Presence of peace keepers (number of troops, police, and military observers in mandate)
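When a search returns many rows, it can help to narrow the result further. The following is a minimal sketch, not part of the original walkthrough: the search string and the "%" filter are illustrative choices, and it assumes the result has the indicator and name columns shown above.

```r
library(WDI)

# Search the indicator names (field = "name" is the default) with a
# regular expression. The search string is an illustrative choice.
hits <- WDIsearch(string = "military expenditure", field = "name")

# Depending on the package version the result may be a matrix or a data
# frame; as.data.frame() gives a data frame in either case.
hits <- as.data.frame(hits)

# Keep only indicators whose name contains a literal "%" sign,
# i.e. those expressed as a share of some total.
hits[grepl("%", hits$name, fixed = TRUE), c("indicator", "name")]
```

The indicator column of the filtered result holds the IDs that can be passed on to the download step.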
To download the data for the first indicator in the list, MS.MIL.XPND.CD (military expenditure in current USD), we pass its ID to the WDI function.
df_mil <- WDI(
country = "all",
indicator = "MS.MIL.XPND.CD",
start = 1960,
end = 2020,
extra = FALSE,
cache = NULL,
latest = NULL,
language = "en")
head(df_mil)
## country iso2c iso3c year MS.MIL.XPND.CD
## 1 Africa Eastern and Southern ZH AFE 2020 10576660181
## 2 Africa Eastern and Southern ZH AFE 2019 11708245732
## 3 Africa Eastern and Southern ZH AFE 2018 12265438298
## 4 Africa Eastern and Southern ZH AFE 2017 16158643503
## 5 Africa Eastern and Southern ZH AFE 2016 13659413872
## 6 Africa Eastern and Southern ZH AFE 2015 15383318098
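Notice that the first rows belong to "Africa Eastern and Southern", a country group rather than a country. One common way to keep only individual countries is to download with extra = TRUE, which adds metadata columns, and then filter on the region column. The sketch below assumes that aggregate rows are labelled "Aggregates" in that column; drop_aggregates is a hypothetical helper name, not a WDI function.

```r
# Keep only individual countries from a WDI() result downloaded with
# extra = TRUE. Assumption: the metadata column `region` is present and
# aggregate rows carry the label "Aggregates". Rows with a missing region
# are dropped as well, which is a design choice of this sketch.
drop_aggregates <- function(df) {
  stopifnot("region" %in% names(df))
  df[!is.na(df$region) & df$region != "Aggregates", , drop = FALSE]
}

# Usage (needs an internet connection, so it is left commented out):
# library(WDI)
# df_mil_extra <- WDI(country = "all", indicator = "MS.MIL.XPND.CD",
#                     start = 1960, end = 2020, extra = TRUE)
# df_mil_countries <- drop_aggregates(df_mil_extra)
```

It is worth checking unique(df_mil_extra$region) after the download to confirm the labels before relying on the filter.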
If you are looking for military expenditure data for a single country, such as Azerbaijan, pass its two-letter ISO code (AZ) as the country argument.
df_az_mil <- WDI(
country = "AZ",
indicator = "MS.MIL.XPND.CD",
start = 1990,
end = 2020,
extra = FALSE,
cache = NULL,
latest = NULL,
language = "en")
df_az_mil
## country iso2c iso3c year MS.MIL.XPND.CD
## 1 Azerbaijan AZ AZE 2020 2237764706
## 2 Azerbaijan AZ AZE 2019 1854235294
## 3 Azerbaijan AZ AZE 2018 1672176471
## 4 Azerbaijan AZ AZE 2017 1528859592
## 5 Azerbaijan AZ AZE 2016 1396969108
## 6 Azerbaijan AZ AZE 2015 2900551382
## 7 Azerbaijan AZ AZE 2014 3427179917
## 8 Azerbaijan AZ AZE 2013 3367574161
## 9 Azerbaijan AZ AZE 2012 3246122613
## 10 Azerbaijan AZ AZE 2011 3080084996
## 11 Azerbaijan AZ AZE 2010 1476608734
## 12 Azerbaijan AZ AZE 2009 1472909977
## 13 Azerbaijan AZ AZE 2008 1607799226
## 14 Azerbaijan AZ AZE 2007 946599792
## 15 Azerbaijan AZ AZE 2006 717111854
## 16 Azerbaijan AZ AZE 2005 304521478
## 17 Azerbaijan AZ AZE 2004 228249632
## 18 Azerbaijan AZ AZE 2003 176552162
## 19 Azerbaijan AZ AZE 2002 139894092
## 20 Azerbaijan AZ AZE 2001 131963660
## 21 Azerbaijan AZ AZE 2000 119575652
## 22 Azerbaijan AZ AZE 1999 120262174
## 23 Azerbaijan AZ AZE 1998 107262859
## 24 Azerbaijan AZ AZE 1997 92086692
## 25 Azerbaijan AZ AZE 1996 71606841
## 26 Azerbaijan AZ AZE 1995 66159969
## 27 Azerbaijan AZ AZE 1994 43942747
## 28 Azerbaijan AZ AZE 1993 77519380
## 29 Azerbaijan AZ AZE 1992 11070111
## 30 Azerbaijan AZ AZE 1991 NA
## 31 Azerbaijan AZ AZE 1990 NA
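Raw dollar figures like these are easier to read once converted to billions and to year-over-year changes. The sketch below uses base R only and copies the 2016 to 2020 values from the output above, so it runs without a download; the column names pct_change and usd_billions are illustrative.

```r
# Values for 2016-2020 copied from the Azerbaijan output above.
az <- data.frame(
  year = 2016:2020,
  MS.MIL.XPND.CD = c(1396969108, 1528859592, 1672176471,
                     1854235294, 2237764706)
)

# WDI returns the newest year first, so sort by year before differencing.
az <- az[order(az$year), ]

# Year-over-year change in percent; the first year has no predecessor.
az$pct_change <- c(
  NA,
  diff(az$MS.MIL.XPND.CD) / head(az$MS.MIL.XPND.CD, -1) * 100
)

# The same expenditure expressed in billions of US dollars.
az$usd_billions <- az$MS.MIL.XPND.CD / 1e9

print(az)
```

The same steps can be applied to the full df_az_mil data frame; the NA values for 1990 and 1991 simply carry through to the computed columns.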
Now let’s compare three countries of the Caucasus: Azerbaijan, Armenia, and Georgia.
df_caucas_mil <- WDI(
country = c("AZ", "AM", "GE"),
indicator = "MS.MIL.XPND.CD",
start = 1992,
end = 2020,
extra = FALSE,
cache = NULL,
latest = NULL,
language = "en"
)
Let’s use ggplot2 to generate a quick line graph.
ggplot(df_caucas_mil, aes(x = year, y = MS.MIL.XPND.CD, color = country)) +
  geom_line(size = 1) +
  theme_minimal()
## Warning: Removed 5 row(s) containing missing values (geom_path).
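The warning means that some rows have no value for the indicator, so ggplot2 skipped them. To see where the gaps are, we can count the missing values per country. This is a sketch in base R; count_missing is a hypothetical helper, and the demonstration rows are copied from the Azerbaijan output above so that the example runs offline.

```r
# Count missing values of one column for each country in a WDI data frame.
count_missing <- function(df, value_col) {
  tapply(is.na(df[[value_col]]), df$country, sum)
}

# Demonstration with the 1990-1992 Azerbaijan rows printed earlier,
# where 1990 and 1991 are NA.
demo <- data.frame(
  country = "Azerbaijan",
  year = 1990:1992,
  MS.MIL.XPND.CD = c(NA, NA, 11070111)
)
count_missing(demo, "MS.MIL.XPND.CD")

# With the downloaded data:
# count_missing(df_caucas_mil, "MS.MIL.XPND.CD")
```

Apart from these gaps, the line graph is unaffected.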
This is a good starting point, but the graph needs some extra elements.
ggplot(df_caucas_mil, aes(x = year, y = MS.MIL.XPND.CD, color = country)) +
  geom_line(size = 1) +
  theme_minimal() +
  scale_y_continuous(labels = unit_format(unit = "", scale = 1e-9, prefix = "$")) +
  labs(title = "Military Expenditures of the Caucasus Republics (1992-2020)",
       x = "",
       y = "US Dollars (Billions)")
## Warning: Removed 5 row(s) containing missing values (geom_path).
What did we add? We gave the graph a title and changed the y-axis scale from scientific notation (labels such as “3e+09”) to dollar amounts in billions (“$3”). To make clear that the scale is in billions of US dollars, we changed the y-axis label from “MS.MIL.XPND.CD” to “US Dollars (Billions)”. In addition, we removed the x-axis label by setting x to “” in the labs function. We can add other theme layers to change the font size, the font family, the color palette, and so forth. But this graph works for now.
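The label formatter can also be built and tried on its own before it goes into the plot. As an alternative sketch, the scales package provides label_dollar, which accepts the same kind of scale argument; the suffix "B" and the name billions are illustrative choices, not part of the original graph.

```r
library(scales)

# Build a labelling function: multiply by 1e-9, prefix with "$"
# (the default for label_dollar) and append "B" for billions.
billions <- label_dollar(scale = 1e-9, suffix = "B")

# Try it on a few round axis values before using it in a plot.
billions(c(1e9, 2e9, 3e9))

# In the plot it would replace the unit_format() call:
# scale_y_continuous(labels = billions)
```

With the unit shown on every tick, the y-axis label could be shortened to “US Dollars”.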
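What does this graph tell us? Judging from the Azerbaijan figures printed earlier, that country’s military spending rose from about $11 million in 1992 to a peak of roughly $3.4 billion in 2014, dropped to about $1.4 billion in 2016, and climbed back to about $2.2 billion by 2020.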