Retrieving Data from the World Bank Using R

Hin Lyhour (Ph.D.)

Senior lecturer and researcher
Faculty of Agricultural Biosystems Engineering
Royal University of Agriculture, Cambodia
Date: May 30, 2025

1. Introduction

The World Bank provides a rich source of worldwide data covering many sectors, such as gross domestic product (GDP), employment, agriculture, and education. In the digital age, it also offers free access to these data through code. R is one of the most important programming languages for this purpose and makes access quicker. Nevertheless, a basic understanding of R is needed so that users can read and modify the code according to their preferences.

To access World Bank data, we first need to install the WDI package with install.packages("WDI"); installation is needed only once. Then we load the library with library(WDI) each time the project is opened. Along with this library, we may load others that make the coding easier and faster. Among them, the tidyr package is used to tidy and clean the data, while the ggplot2 package is used to plot attractive graphs. Below is a list of packages shown for educational purposes; some of them may not be used in this tutorial.

# Load necessary libraries
library(tidyr) # For cleaning the data
library(dplyr) # For manipulating the data
library(rstatix) # For descriptive statistics and analysis
library(ggplot2) # For advanced graphics
library(ggfortify) # For quick graphics in support of ggplot2
library(patchwork) # For combining two or more graphs
library(WDI) # For retrieving the data from the World Bank
library(imf.data) # For retrieving IMF data (not used in this tutorial)
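Because installation is needed only once, it can help to check which packages are already present before installing anything. The following is a minimal sketch in base R; the helper name missing_packages() is hypothetical and not part of any package, and the install step is left commented out because it needs an internet connection.

```r
# Packages loaded in this tutorial; adjust the vector to your own needs.
pkgs <- c("tidyr", "dplyr", "rstatix", "ggplot2",
          "ggfortify", "patchwork", "WDI", "imf.data")

# Hypothetical helper: return the packages that are not yet installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Names of the packages that still need to be installed (possibly none)
to_install <- missing_packages(pkgs)
print(to_install)

# Uncomment to install only what is missing (needs an internet connection).
# if (length(to_install) > 0) install.packages(to_install)
```

Running this once at the top of a new project avoids reinstalling packages that are already available.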

2. Search for Indicators in the World Bank

After loading the WDI library, we can use WDIsearch(" ") to look for the indicators that we need for the analysis. Inside the quotation marks, we can type keywords such as water, gdp, population, or education. Normally, we assign the result to a named object. For instance, we can use the name water_data for the water-related search results, or edu_data for the education results. We can choose any name we like, as long as we remember to use it consistently afterwards. After typing the code, we can run it by pressing Ctrl + Shift + Enter; a list of indicators matching the keyword will then appear.

# Search for water-related indicators in the World Bank
water_data <- WDIsearch("water") 

# Show the first four rows of the result
head(water_data, 4)

      indicator                                                            name
77       110400        110400:HOUSING, WATER, ELECTRICITY, GAS, AND OTHER FUELS
136 2.0.cov.Wat                                                 Coverage: Water
161 2.0.hoi.Wat                                                      HOI: Water
972     9060000 9060000:ACTUAL HOUSING, WATER, ELECTRICITY, GAS AND OTHER FUELS

# Search for GDP-related indicators in the World Bank
gdp_data <- WDIsearch("gdp")

# Show the first three rows of the result
head(gdp_data, 3)

          indicator                  name
689 6.0.GDP_current       GDP (current $)
690  6.0.GDP_growth GDP growth (annual %)
691     6.0.GDP_usd GDP (constant 2005 $)

# Search for education-related indicators in the World Bank
edu_data <- WDIsearch("education")

# Show the first six rows of the result
head(edu_data, 6)

                        indicator
191      3.1_LOW.SEC.NEW.TEACHERS
192          3.1_PRI.NEW.ENTRANTS
194       3.11_LOW.SEC.CLASSROOMS
195   3.12_LOW.SEC.NEW.CLASSROOMS
196  3.13_PRI.MATH.BOOK.PER.PUPIL
197 3.14_PRI.LANGU.BOOK.PER.PUPIL
                                                            name
191     Lower secondary education, new teachers, national source
192             Primary education, new entrants, national source
194       Lower secondary education, classrooms, national source
195   Lower secondary education, new classrooms, national source
196 Ratio of textbooks per pupil, primary education, mathematics
197    Ratio of textbooks per pupil, primary education, language

Looking at the three examples above, the reader may find the results frustrating because the indicators are identified by codes made up of acronyms or abbreviations. However, the codes can be looked up, checked, or noted down later once the focus is on a specific topic.
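When a keyword returns a long list, the results can be narrowed with ordinary subsetting. The sketch below assumes the search result is a data frame with the columns indicator and name, as in the printed output above. To keep it runnable offline, it re-types the three GDP rows shown earlier instead of calling WDIsearch().

```r
# The three GDP rows printed by head(gdp_data, 3), re-typed by hand
# so that this sketch runs without an internet connection.
gdp_hits <- data.frame(
  indicator = c("6.0.GDP_current", "6.0.GDP_growth", "6.0.GDP_usd"),
  name      = c("GDP (current $)", "GDP growth (annual %)",
                "GDP (constant 2005 $)"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose description mentions "growth" (case-insensitive).
growth_hits <- gdp_hits[grepl("growth", gdp_hits$name, ignore.case = TRUE), ]
print(growth_hits)

# The value in the 'indicator' column is the code to note down for later use.
growth_hits$indicator
```

The same subsetting can be applied to water_data or edu_data once they have been created.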

3. Retrieve Specific Data by Country

To retrieve specific data by country from the World Bank, we can use WDI(country = " ", indicator = " ", start = , end = ). As before, we should assign the result to a named object for subsequent use.

For the country argument, we use the two-letter ISO code of each country. For instance, KH stands for Cambodia, VN for Vietnam, and FR for France. If we need two countries, we type country = c("KH", "VN").

The indicator argument takes an indicator code, which is hard to remember. We may therefore use an artificial intelligence (AI) tool such as ChatGPT to suggest the code for a specific topic, and then confirm it with WDIsearch(). For example, we type NY.GDP.MKTP.CD for total GDP, or NY.GDP.PCAP.CD for GDP per capita. If we want to retrieve both types of information, we can type indicator = c("NY.GDP.MKTP.CD", "NY.GDP.PCAP.CD").

The start and end arguments specify the first and last years of the data to be retrieved.

Example 1: Import GDP data

# Import the total GDP data (USD) for Cambodia
gdp_total <- WDI(country = "KH", indicator = "NY.GDP.MKTP.CD",
                 start = 1986, end = 2023)

# Check the result
head(gdp_total, 4)

   country iso2c iso3c year NY.GDP.MKTP.CD
1 Cambodia    KH   KHM 2023    42335646896
2 Cambodia    KH   KHM 2022    39994532960
3 Cambodia    KH   KHM 2021    36790163687
4 Cambodia    KH   KHM 2020    34818073901

With the code above, we have imported the total GDP data for Cambodia between 1986 and 2023 from the World Bank. However, the column holding total GDP is named with the indicator code, so we rename it to make it easier to remember.

# Rename column 5
names(gdp_total)[5] <- "gdp"
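Renaming by position, as in names(gdp_total)[5], works as long as the indicator stays in column 5. A slightly more defensive variant is to match the column by its indicator code. The sketch below is one possible approach in base R; the object gdp_demo is a hand-typed copy of the four rows printed above, used so that the example runs offline.

```r
# The four rows printed by head(gdp_total, 4), re-typed by hand.
gdp_demo <- data.frame(
  country = "Cambodia",
  iso2c   = "KH",
  iso3c   = "KHM",
  year    = c(2023, 2022, 2021, 2020),
  NY.GDP.MKTP.CD = c(42335646896, 39994532960, 36790163687, 34818073901)
)

# Rename by matching the indicator code instead of the column position.
names(gdp_demo)[names(gdp_demo) == "NY.GDP.MKTP.CD"] <- "gdp"

# Check the result
colnames(gdp_demo)
```

With dplyr loaded, rename(gdp_total, gdp = NY.GDP.MKTP.CD) should give the same result. The WDI package documentation also describes supplying a named indicator vector, such as c(gdp = "NY.GDP.MKTP.CD"), to rename the column at download time; this is worth confirming against the installed version.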

Example 2: Import GDP data for two countries

# Import the total GDP data (USD) for Cambodia and Vietnam
gdp_two <- WDI(country = c("KH", "VN"),
               indicator = "NY.GDP.MKTP.CD",
               start = 1986, end = 2023)

# Check the country
gdp_two |> freq_table(country)

# A tibble: 2 × 3
  country      n  prop
  <chr>    <int> <dbl>
1 Cambodia    38    50
2 Viet Nam    38    50

After checking the country names, we find that both countries have been extracted, with 38 rows each. As before, the total GDP column is named with the indicator code, so we rename it in the same way as for Cambodia.

# Rename column 5
names(gdp_two)[5] <- "gdp"
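The 38 rows per country reported by freq_table() follow directly from the requested year range: 1986 to 2023 inclusive is 38 years, and two countries give 76 rows in total. The sketch below reproduces that arithmetic in base R with an empty country-year skeleton (no GDP values), assuming one row per country and year.

```r
# Country names as they appear in the freq_table() output
countries <- c("Cambodia", "Viet Nam")
years <- 1986:2023

# Number of years in the requested range
length(years)

# A skeleton with the same country-year layout as gdp_two (no GDP values).
skeleton <- expand.grid(country = countries, year = years,
                        stringsAsFactors = FALSE)
nrow(skeleton)

# Base R equivalent of the frequency check: counts and percentages
counts <- table(skeleton$country)
counts
round(100 * prop.table(counts))
```

A count other than 38 for a country would suggest that rows are missing or duplicated and that the download is worth inspecting.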

Example 3: Import GDP and population data

# Import GDP and population data for Cambodia and Vietnam from 1986 to 2023
pop_gdp <- WDI(country = c("KH", "VN"), 
               indicator =  c("NY.GDP.MKTP.CD", "SP.POP.TOTL"), 
               start = 1986, end = 2023)

# Check the result
colnames(pop_gdp)

[1] "country"        "iso2c"          "iso3c"          "year"          
[5] "NY.GDP.MKTP.CD" "SP.POP.TOTL"

We can rename the total GDP and total population columns for the two countries in the same way.

# Rename columns 5 and 6
names(pop_gdp)[5:6] <- c("gdp", "pop")

4. Graphic Display

We have now extracted the data on total GDP and population: gdp_total holds the total GDP for Cambodia, gdp_two holds the total GDP for Cambodia and Vietnam, and pop_gdp includes both GDP and population for the two countries. In this section, we plot them with ggplot2.

# Plot a line graph using ggplot2
## First, check column names
colnames(gdp_total)

[1] "country" "iso2c"   "iso3c"   "year"    "gdp"

## Then, plot the graph
gdp_total |> ggplot(aes(year, gdp/10^9)) +
  geom_line(color="red") +
  labs(x= "Year", y="Total GDP (billion USD)")+
  theme(text = element_text(size=14))

Figure 1: Total GDP growth (billion USD) in Cambodia from 1986 to 2023

Cambodia’s total GDP grows over time, and the growth becomes sharper from 2005 onward.
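The expression gdp/10^9 in the plot converts US dollars to billions of US dollars. As a worked example, the sketch below applies the same conversion to the four Cambodia values printed in Example 1 and also computes the year-on-year growth rate. The object names are illustrative only, and the values are re-typed by hand so that the code runs offline.

```r
# The four most recent Cambodia values printed in Example 1.
gdp_demo <- data.frame(
  year = c(2023, 2022, 2021, 2020),
  gdp  = c(42335646896, 39994532960, 36790163687, 34818073901)
)

# Convert USD to billion USD, as done inside aes() in the plot.
gdp_demo$gdp_billion <- gdp_demo$gdp / 10^9
round(gdp_demo$gdp_billion, 2)
# 42.34 39.99 36.79 34.82

# Sort oldest-first, then compute year-on-year growth in percent.
gdp_sorted <- gdp_demo[order(gdp_demo$year), ]
growth_pct <- 100 * (gdp_sorted$gdp[-1] / head(gdp_sorted$gdp, -1) - 1)

# One growth figure for each of 2021, 2022, and 2023
data.frame(year = gdp_sorted$year[-1], growth_pct = round(growth_pct, 1))
```

Sorting by year matters for the growth calculation because the printed rows start with the most recent year; geom_line() orders the points along the x-axis on its own.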

# Plot a line graph using ggplot2
## First, check column names
colnames(gdp_two)

[1] "country" "iso2c"   "iso3c"   "year"    "gdp"

## Then, plot the graph
gdp_two |> ggplot(aes(year, gdp/10^9)) +
  geom_line(aes(color=country, linetype=country)) +
  labs(x= "Year", y="Total GDP (billion USD)")+
  theme(text = element_text(size=14),
        legend.title = element_blank())

Figure 2: [Option 1] Total GDP growth (billion USD) for Cambodia and Vietnam from 1986 to 2023

# Plot a line graph using ggplot2
## First, check column names
colnames(gdp_two)

[1] "country" "iso2c"   "iso3c"   "year"    "gdp"

## Then, plot the graph
gdp_two |> ggplot(aes(year, gdp/10^9)) +
  geom_line(color="red") +
  labs(x= "Year", y="Total GDP (billion USD)")+
  theme(text = element_text(size=14))+
  facet_grid(~country)

Figure 3: [Option 2] Total GDP growth (billion USD) for Cambodia and Vietnam from 1986 to 2023

In Figures 2 and 3, both countries experience GDP increases over time, but Vietnam's growth is faster.

# Plot a line graph using ggplot2
## First, check column names
colnames(pop_gdp)

[1] "country" "iso2c"   "iso3c"   "year"    "gdp"     "pop"

## Then, plot the graph
pop_gdp |> ggplot(aes(pop/10^6, gdp/10^9)) +
  geom_point(shape=1, color="blue") +
  geom_smooth(color="red")+
  labs(x= "Population (million)", y="Total GDP (billion USD)")+
  theme(text = element_text(size=14),
        legend.position = "none")+
  facet_wrap(~country, scales = "free")

Figure 4: Correlation between the total GDP and population

It can be seen that total GDP appears to grow roughly exponentially as the population increases in both countries. Both GDP and population are greater in Vietnam.

For further R learning, please leave a comment or contact me at hlyhour@rua.edu.kh.