# Load necessary libraries
library(tidyr) # For cleaning the data
library(dplyr) # For manipulating the data
library(rstatix) # For descriptive statistics and analysis
library(ggplot2) # For advanced graphics
library(ggfortify) # For quick graphics in support of ggplot2
library(patchwork) # For combining two or more graphs
library(WDI) # For retrieving data from the World Bank
library(imf.data) # For retrieving data from the International Monetary Fund (IMF)
Withdraw Data from the World Bank Using R
Hin Lyhour (Ph.D.)
Senior lecturer and researcher
Faculty of Agricultural Biosystems Engineering
Royal University of Agriculture, Cambodia
Date: May 30, 2025
1. Introduction
The World Bank provides rich sources of worldwide information covering many sectors, such as gross domestic product (GDP), employment, agriculture, and education. In the digital age, it also offers free access to these data through code. R is one of the programming languages that can make this access quicker. Nevertheless, a basic understanding of R is needed so that users can read and modify the code to suit their needs.
To access the World Bank data, we first need to install the WDI package with the code install.packages("WDI"). The installation is done only once. Then, we need to load the library by writing the code library(WDI) whenever the project is opened. Along with this library, we may load others to make the coding easier and faster. Among them, the tidyr package is used to tidy and clean the data, while the ggplot2 package is used to plot graphs. The packages loaded at the beginning of this document are listed for educational purposes; some of them may not be used in this tutorial.
2. Retrieve All Indicators in the World Bank
After loading the WDI library, we can use the code WDIsearch(" ") to look for the indicators that we need for the analysis. Inside the quotation marks "", we can type a keyword such as water, gdp, population, or education. Normally, we assign the result to a named object. For instance, we can use the name water_data for the water indicators found in the World Bank, or edu_data for the education indicators. We can choose any name we like, as long as we remember to use the same name subsequently. After typing the code, we can run it by pressing Ctrl + Shift + Enter; a list of indicators matching the keyword will then appear.
# Import water data from the World Bank
water_data <- WDIsearch("water")
# Show the first four rows of the result
head(water_data, 4)
    indicator   name
77  110400      110400:HOUSING, WATER, ELECTRICITY, GAS, AND OTHER FUELS
136 2.0.cov.Wat Coverage: Water
161 2.0.hoi.Wat HOI: Water
972 9060000     9060000:ACTUAL HOUSING, WATER, ELECTRICITY, GAS AND OTHER FUELS
# Import GDP data from the World Bank
gdp_data <- WDIsearch("gdp")
# Show the first three rows of the result
head(gdp_data, 3)
    indicator       name
689 6.0.GDP_current GDP (current $)
690 6.0.GDP_growth  GDP growth (annual %)
691 6.0.GDP_usd     GDP (constant 2005 $)
# Import education data from the World Bank
edu_data <- WDIsearch("education")
# Show the first six rows of the result
head(edu_data, 6)
    indicator
191 3.1_LOW.SEC.NEW.TEACHERS
192 3.1_PRI.NEW.ENTRANTS
194 3.11_LOW.SEC.CLASSROOMS
195 3.12_LOW.SEC.NEW.CLASSROOMS
196 3.13_PRI.MATH.BOOK.PER.PUPIL
197 3.14_PRI.LANGU.BOOK.PER.PUPIL
    name
191 Lower secondary education, new teachers, national source
192 Primary education, new entrants, national source
194 Lower secondary education, classrooms, national source
195 Lower secondary education, new classrooms, national source
196 Ratio of textbooks per pupil, primary education, mathematics
197 Ratio of textbooks per pupil, primary education, language
The three examples above may be frustrating to read because the indicator codes are written as acronyms or abbreviations. However, the name column describes each indicator, so the codes can be looked up and checked again later once we focus on a specific topic.
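Because a keyword search can return a long list, it may help to narrow the result by filtering on the name column. The following is only a minimal sketch: it rebuilds the three GDP rows printed above as a small data frame so that it runs without an internet connection, and the object names gdp_hits and growth_hits are hypothetical.

```r
# Toy copy of the first three rows returned by WDIsearch("gdp") above
# (rebuilt by hand so that no download is needed).
gdp_hits <- data.frame(
  indicator = c("6.0.GDP_current", "6.0.GDP_growth", "6.0.GDP_usd"),
  name      = c("GDP (current $)", "GDP growth (annual %)", "GDP (constant 2005 $)")
)

# Keep only the rows whose description mentions "growth" (case-insensitive)
growth_hits <- gdp_hits[grepl("growth", gdp_hits$name, ignore.case = TRUE), ]

# Show the matching indicator code and its description
print(growth_hits)
```

The same filter should work on a real search result such as gdp_data, since it only relies on the indicator and name columns shown in the output above.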
3. Retrieve Specific Data by Country
If we want to retrieve specific data by country from the World Bank, we can use the code WDI(country = " ", indicator = " ", start = , end = ). As before, we should assign the result to a named object for subsequent use.
For the argument country, we use the two-letter code of an individual country. For instance, KH stands for Cambodia, VN for Vietnam, and FR for France. If we need two countries, we should type country = c("KH", "VN").
For the argument indicator, the codes are hard to remember. Therefore, we may search for them with WDIsearch(), as in Section 2, or ask an artificial intelligence (AI) tool such as ChatGPT to suggest the code for a specific topic, checking any suggested code against the World Bank list. For example, we type NY.GDP.MKTP.CD for the total GDP, or NY.GDP.PCAP.CD for GDP per capita. If we want to retrieve both indicators, we can type indicator = c("NY.GDP.MKTP.CD", "NY.GDP.PCAP.CD").
The arguments start = and end = specify the first and last years of the data to be retrieved.
Example 1: Import GDP data
# Import the total GDP data (USD) for Cambodia
gdp_total <- WDI(country = "KH", indicator = "NY.GDP.MKTP.CD",
                 start = 1986, end = 2023)
# Check the result
head(gdp_total, 4)
   country iso2c iso3c year NY.GDP.MKTP.CD
1 Cambodia    KH   KHM 2023    42335646896
2 Cambodia    KH   KHM 2022    39994532960
3 Cambodia    KH   KHM 2021    36790163687
4 Cambodia    KH   KHM 2020    34818073901
With the code above, we successfully import the total GDP data for Cambodia between 1986 and 2023 from the World Bank. However, the column name for the total GDP is the indicator code, so we rename it to make it easier to remember.
# Rename column 5
names(gdp_total)[5] <- "gdp"
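Renaming by position works as long as the GDP column stays in fifth place. As an alternative, the column can be renamed by matching its current name, which does not depend on the column order. The sketch below is illustrative only: gdp_demo is a hypothetical object that rebuilds the four rows printed above so the code runs without a download.

```r
# Rebuild the four rows printed above (values copied from that output)
gdp_demo <- data.frame(
  country = "Cambodia",
  iso2c = "KH",
  iso3c = "KHM",
  year = 2023:2020,
  NY.GDP.MKTP.CD = c(42335646896, 39994532960, 36790163687, 34818073901)
)

# Rename the column by matching its current name instead of its position
names(gdp_demo)[names(gdp_demo) == "NY.GDP.MKTP.CD"] <- "gdp"

# Check the result
print(names(gdp_demo))
```

With dplyr loaded, rename(gdp_demo, gdp = NY.GDP.MKTP.CD) applied to the original column name would be an equivalent way to write the same step.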
Example 2: Import GDP data for two countries
# Import the total GDP data (USD) for Cambodia and Vietnam
gdp_two <- WDI(country = c("KH", "VN"),
               indicator = "NY.GDP.MKTP.CD",
               start = 1986, end = 2023)
# Check the country
gdp_two |> freq_table(country)
# A tibble: 2 × 3
  country      n  prop
  <chr>    <int> <dbl>
1 Cambodia    38    50
2 Viet Nam    38    50
After checking the country names, we find that both countries have been extracted, each with 38 yearly records. As before, the column name for the total GDP is the indicator code, so we rename it in the same way as for Cambodia.
# Rename column 5
names(gdp_two)[5] <- "gdp"
Example 3: Import GDP and population data
# Import GDP and population data for Cambodia and Vietnam from 1986 to 2023
pop_gdp <- WDI(country = c("KH", "VN"),
               indicator = c("NY.GDP.MKTP.CD", "SP.POP.TOTL"),
               start = 1986, end = 2023)
# Check the result
colnames(pop_gdp)
[1] "country" "iso2c" "iso3c" "year"
[5] "NY.GDP.MKTP.CD" "SP.POP.TOTL"
We can rename the total GDP and total population columns for the two countries in the same way.
# Rename columns 5 and 6
names(pop_gdp)[5:6] <- c("gdp", "pop")
4. Graphic Display
We have extracted the data on the total GDP and population. Specifically, we have gdp_total for the total GDP data for Cambodia, gdp_two for Cambodia and Vietnam, and pop_gdp, which includes both GDP and population for the two countries.
# Plot a line graph using ggplot2
## First, check column names
colnames(gdp_total)
[1] "country" "iso2c" "iso3c" "year" "gdp"
## Then, plot the graph
gdp_total |> ggplot(aes(year, gdp/10^9)) +
  geom_line(color="red") +
  labs(x= "Year", y="Total GDP (billion USD)")+
  theme(text = element_text(size=14))
Cambodia’s total GDP grows over time, and the growth becomes sharper from about 2005.
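The plotting code above divides gdp by 10^9 so that the axis is expressed in billions of USD rather than in single dollars. As a small worked sketch of that conversion, the two Cambodian values below are copied from the output shown in Example 1; the object names gdp_usd and gdp_billion are hypothetical.

```r
# Total GDP of Cambodia in USD, taken from the output of head(gdp_total, 4)
gdp_usd <- c(`2020` = 34818073901, `2023` = 42335646896)

# Convert USD to billions of USD, as done inside aes() in the plot
gdp_billion <- gdp_usd / 10^9

# Show the converted values rounded to two decimals
print(round(gdp_billion, 2))
```

Doing the division inside aes() keeps the original column untouched, which is convenient when the raw values are still needed for other calculations.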
# Plot a line graph using ggplot2
## First, check column names
colnames(gdp_two)
[1] "country" "iso2c" "iso3c" "year" "gdp"
## Then, plot the graph
gdp_two |> ggplot(aes(year, gdp/10^9,
                      color=country)) +
  geom_line(aes(color=country, linetype=country)) +
  labs(x= "Year", y="Total GDP (billion USD)")+
  theme(text = element_text(size=14),
        legend.title = element_blank())
# Plot a line graph using ggplot2
## First, check column names
colnames(gdp_two)
[1] "country" "iso2c" "iso3c" "year" "gdp"
## Then, plot the graph
gdp_two |> ggplot(aes(year, gdp/10^9)) +
  geom_line(color="red") +
  labs(x= "Year", y="Total GDP (billion USD)")+
  theme(text = element_text(size=14))+
  facet_grid(~country)
In Figures 2 and 3, both countries experience GDP increases over time, but Vietnam's growth is faster.
# Plot a line graph using ggplot2
## First, check column names
colnames(pop_gdp)
[1] "country" "iso2c" "iso3c" "year" "gdp" "pop"
## Then, plot the graph
pop_gdp |> ggplot(aes(pop/10^6, gdp/10^9)) +
  geom_point(shape=1, color="blue") +
  geom_smooth(color="red")+
  labs(x= "Population (million)", y="Total GDP (billion USD)")+
  theme(text = element_text(size=14))+
  facet_wrap(~country, scales = "free")+
  theme(legend.position ="none")
It can be seen that the total GDP rises at an accelerating rate as the population increases in both countries. Nevertheless, greater GDP and population are observed in Vietnam.
For further R learning, please leave a comment or contact me at hlyhour@rua.edu.kh.