Getting Data from the World Bank Website using Python and Pandas-Datareader Module

Independent Data Analysis Project

Author: John Karuitha, PhD

Affiliations: Karatina University, Department of Business and Economics; University of the Witwatersrand, School of Construction Economics & Management

Published: May 18, 2025

Modified: May 18, 2025

Executive Summary

This article demonstrates how to access and analyze global development indicators from the World Bank using Python’s pandas-datareader module. We walk through package installation, querying available countries and indicators, and extracting time-series data for selected metrics. Using real-world examples such as GDP per capita, life expectancy, and access to electricity, we illustrate how to generate insightful visualizations that highlight stark regional disparities—particularly between Sub-Saharan Africa and Europe. The approach outlined provides a reproducible workflow for data analysts, researchers, and policy professionals seeking to work with high-quality international development data in Python. The article concludes with suggestions for deeper analysis and integration with geospatial and statistical tools.

Keywords

Data analysis, Python, Pandas, Pandas-datareader, Seaborn, Numpy, Descriptive Analysis, Data Science, Machine Learning, Scikit-learn, K-Nearest Neighbors (KNN)

Background

Accessing high-quality, reliable, and up-to-date economic and development data is critical for research, policy analysis, and informed decision-making. The World Bank is a key source of such data, offering thousands of indicators spanning multiple dimensions of development, from poverty to education, environment, trade, and infrastructure.

In this article, we explore how to programmatically access the World Bank databases using Python, with a specific focus on the pandas-datareader module. We will cover package installation, data querying, and visual representation of selected indicators—highlighting regional disparities, such as between Europe and Sub-Saharan Africa.

Installing the Package

Depending on your operating system and environment setup, you can install the required packages using pip. Open the terminal or command prompt and run the following commands:

pip install pandas
pip install numpy
pip install matplotlib
pip install seaborn
pip install pandas-datareader

If you are using the Anaconda distribution, the packages can also be installed via the Anaconda Navigator or using conda install commands in the terminal.
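After installation, it can help to confirm that each package is visible to the interpreter you plan to use. The following is a minimal sketch using the standard library's importlib.metadata; the helper name installed_versions is our own and is not part of any of the packages above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as published on PyPI (the same names passed to pip).
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn", "pandas-datareader"]


def installed_versions(packages):
    """Return {package: version string}, with None for packages not installed."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(REQUIRED)
for name, ver in versions.items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as not installed should be installed in the same environment that runs the notebook or script.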

Loading the Packages

Import the necessary libraries to facilitate data acquisition, manipulation, and visualization.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pandas_datareader import wb
import datetime

Getting the Data

Countries Represented in the Database

Let us begin by exploring all the countries and regions available in the World Bank database.

countries = wb.get_countries()
countries.head()

	iso3c	iso2c	name	region	adminregion	incomeLevel	lendingType	capitalCity	longitude	latitude
0	ABW	AW	Aruba	Latin America & Caribbean		High income	Not classified	Oranjestad	-70.0167	12.5167
1	AFE	ZH	Africa Eastern and Southern	Aggregates		Aggregates	Aggregates		NaN	NaN
2	AFG	AF	Afghanistan	South Asia	South Asia	Low income	IDA	Kabul	69.1761	34.5228
3	AFR	A9	Africa	Aggregates		Aggregates	Aggregates		NaN	NaN
4	AFW	ZI	Africa Western and Central	Aggregates		Aggregates	Aggregates		NaN	NaN

Variables in the Country Table

We examine the metadata provided for each country or region.

countries.columns

Index(['iso3c', 'iso2c', 'name', 'region', 'adminregion', 'incomeLevel',
       'lendingType', 'capitalCity', 'longitude', 'latitude'],
      dtype='object')
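The country table mixes individual countries with aggregate groupings (rows whose region is "Aggregates"). A minimal sketch of separating the two is given here, using a small hand-copied sample of the rows printed earlier so that it runs without a network connection. The helper name drop_aggregates is hypothetical; with a live connection the same function could be applied to the countries frame.

```python
import pandas as pd

# A few rows copied from the countries.head() output (subset of columns).
sample = pd.DataFrame({
    "iso3c": ["ABW", "AFE", "AFG", "AFR", "AFW"],
    "name": ["Aruba", "Africa Eastern and Southern", "Afghanistan",
             "Africa", "Africa Western and Central"],
    "region": ["Latin America & Caribbean", "Aggregates", "South Asia",
               "Aggregates", "Aggregates"],
})


def drop_aggregates(df):
    """Keep only rows whose region is not the 'Aggregates' placeholder."""
    # Strip whitespace first, since region labels may carry stray spaces.
    is_aggregate = df["region"].str.strip() == "Aggregates"
    return df.loc[~is_aggregate].reset_index(drop=True)


only_countries = drop_aggregates(sample)
print(only_countries[["iso3c", "name", "region"]])
```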

Focusing on Sub-Saharan Africa

To narrow our analysis, we extract data specifically for Sub-Saharan Africa. Note the region name has a trailing space, which must be preserved in the search.

africa = countries.loc[countries["region"] == "Sub-Saharan Africa ", :]
africa.head()

	iso3c	iso2c	name	region	adminregion	incomeLevel	lendingType	capitalCity	longitude	latitude
5	AGO	AO	Angola	Sub-Saharan Africa	Sub-Saharan Africa (excluding high income)	Lower middle income	IBRD	Luanda	13.24200	-8.81155
17	BDI	BI	Burundi	Sub-Saharan Africa	Sub-Saharan Africa (excluding high income)	Low income	IDA	Bujumbura	29.36390	-3.37840
21	BEN	BJ	Benin	Sub-Saharan Africa	Sub-Saharan Africa (excluding high income)	Lower middle income	IDA	Porto-Novo	2.63230	6.47790
22	BFA	BF	Burkina Faso	Sub-Saharan Africa	Sub-Saharan Africa (excluding high income)	Low income	IDA	Ouagadougou	-1.53395	12.36050
40	BWA	BW	Botswana	Sub-Saharan Africa	Sub-Saharan Africa (excluding high income)	Upper middle income	IBRD	Gaborone	25.92010	-24.65440
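Because an exact string comparison silently returns an empty frame if the whitespace ever changes, a more forgiving filter may be preferable. The sketch below strips whitespace and ignores case before comparing. The sample frame is hand-built to mimic the trailing space, and filter_region is a hypothetical helper, not a pandas-datareader function.

```python
import pandas as pd

# Hand-built sample; the trailing space on the African rows mimics the
# behaviour described above.
sample = pd.DataFrame({
    "iso3c": ["AGO", "BDI", "FRA"],
    "name": ["Angola", "Burundi", "France"],
    "region": ["Sub-Saharan Africa ", "Sub-Saharan Africa ", "Europe & Central Asia"],
})


def filter_region(df, region_name):
    """Return rows whose region matches region_name, ignoring case and outer spaces."""
    cleaned = df["region"].str.strip().str.casefold()
    return df.loc[cleaned == region_name.strip().casefold()]


africa_sample = filter_region(sample, "Sub-Saharan Africa")
print(africa_sample["name"].tolist())
```

The same call works whether or not the source data carries the trailing space.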

Indicators Available in the Database

The World Bank offers an extensive range of development indicators. Let’s inspect a few of them:

indicators = wb.get_indicators()
indicators.head()

	id	name	source	sourceNote	sourceOrganization	topics
0	1.0.HCount.1.90usd	Poverty Headcount ($1.90 a day)	LAC Equity Lab	The poverty headcount index measures the propo...	b'LAC Equity Lab tabulations of SEDLAC (CEDLAS...	Poverty
1	1.0.HCount.2.5usd	Poverty Headcount ($2.50 a day)	LAC Equity Lab	The poverty headcount index measures the propo...	b'LAC Equity Lab tabulations of SEDLAC (CEDLAS...	Poverty
2	1.0.HCount.Mid10to50	Middle Class ($10-50 a day) Headcount	LAC Equity Lab	The poverty headcount index measures the propo...	b'LAC Equity Lab tabulations of SEDLAC (CEDLAS...	Poverty
3	1.0.HCount.Ofcl	Official Moderate Poverty Rate-National	LAC Equity Lab	The poverty headcount index measures the propo...	b'LAC Equity Lab tabulations of data from Nati...	Poverty
4	1.0.HCount.Poor4uds	Poverty Headcount ($4 a day)	LAC Equity Lab	The poverty headcount index measures the propo...	b'LAC Equity Lab tabulations of SEDLAC (CEDLAS...	Poverty

Total Number of Unique Indicators

We find that over 26,000 indicators are available, opening up diverse avenues for data analysis.

indicators['name'].nunique()
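With this many indicators, searching by keyword is usually faster than scrolling. pandas-datareader provides a wb.search function for this; the sketch below shows the same idea with plain pandas so that it runs offline. The sample rows are taken from the table and the indicator mentioned in this article, and find_indicators is a hypothetical helper name.

```python
import pandas as pd

# Small offline sample of indicator ids and names that appear in this article.
sample = pd.DataFrame({
    "id": ["1.0.HCount.1.90usd", "1.0.HCount.Mid10to50", "NY.GDP.PCAP.CD"],
    "name": ["Poverty Headcount ($1.90 a day)",
             "Middle Class ($10-50 a day) Headcount",
             "GDP per capita (current US$)"],
})


def find_indicators(df, pattern):
    """Return id and name for indicators whose name matches a regex, ignoring case."""
    mask = df["name"].str.contains(pattern, case=False, regex=True, na=False)
    return df.loc[mask, ["id", "name"]]


hits = find_indicators(sample, r"gdp.*capita")
print(hits)
```

Applied to the full indicators frame, the same function should narrow the list to a manageable set of candidates.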

Downloading Specific Data

Let’s extract and compare GDP per capita (current US$) for a set of countries in Sub-Saharan Africa and Europe. We use the indicator code: NY.GDP.PCAP.CD.

indicator_code = 'NY.GDP.PCAP.CD'
countries_of_interest = ['KEN', 'NGA', 'ZAF', 'FRA', 'DEU', 'GBR']

mydata = wb.download(indicator=indicator_code, country=countries_of_interest,
                     start=2000, end=2022)
mydata.reset_index(inplace=True)
mydata.head()

	country	year	NY.GDP.PCAP.CD
0	Germany	2022	49686.115458
1	Germany	2021	52265.654162
2	Germany	2020	47379.765195
3	Germany	2019	47623.865607
4	Germany	2018	48874.859503
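The rows arrive newest first, so time-based calculations are easier after converting the year to an integer and sorting chronologically. The sketch below does this for the five Germany rows printed above and adds a year-over-year growth column. It assumes the year column holds strings, and add_annual_growth is a hypothetical helper.

```python
import pandas as pd

# The five Germany rows from the output above. Years are kept as strings to
# mimic the downloaded frame (an assumption about the returned dtype).
sample = pd.DataFrame({
    "country": ["Germany"] * 5,
    "year": ["2022", "2021", "2020", "2019", "2018"],
    "NY.GDP.PCAP.CD": [49686.115458, 52265.654162, 47379.765195,
                       47623.865607, 48874.859503],
})


def add_annual_growth(df, value_col):
    """Sort by country and year, then add percentage change from the previous year."""
    out = df.copy()
    out["year"] = out["year"].astype(int)
    out = out.sort_values(["country", "year"]).reset_index(drop=True)
    # pct_change is computed within each country so series do not bleed together.
    out["growth_pct"] = out.groupby("country")[value_col].pct_change() * 100
    return out


growth = add_annual_growth(sample, "NY.GDP.PCAP.CD")
print(growth.round(2))
```

The first year of each country has no predecessor, so its growth value is missing by construction.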

Visualizing Regional Disparities

GDP Per Capita Over Time

We plot GDP per capita trends to illustrate economic disparities between Europe and Sub-Saharan Africa.

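# Years are returned as strings; convert them so the x-axis runs chronologically
mydata['year'] = mydata['year'].astype(int)

plt.figure(figsize=(12, 6))
# A line plot (one line per country) shows the trend over time
sns.lineplot(data=mydata, x='year', y='NY.GDP.PCAP.CD', hue='country')
plt.title('GDP Per Capita (Current US$), 2000–2022')
plt.ylabel('GDP per capita (US$)')
plt.xlabel('Year')
plt.grid(True)
plt.tight_layout()
plt.show()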

Boxplot of GDP per Capita (Grouped by Region)

To better compare the economic spread, we enrich our dataset by adding a region label.

region_map = {'Kenya': 'Sub-Saharan Africa', 'Nigeria': 'Sub-Saharan Africa', 'South Africa': 'Sub-Saharan Africa',
              'France': 'Europe', 'Germany': 'Europe', 'United Kingdom': 'Europe'}
mydata['region'] = mydata['country'].map(region_map)
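Series.map leaves a missing value for any country name that is absent from the dictionary, which can quietly drop rows from later plots. A small check like the following sketch can catch that early. The helper unmapped_countries and the extra "Ghana" row are hypothetical and included only to trigger the check.

```python
import pandas as pd

region_map = {'Kenya': 'Sub-Saharan Africa', 'Nigeria': 'Sub-Saharan Africa',
              'South Africa': 'Sub-Saharan Africa',
              'France': 'Europe', 'Germany': 'Europe', 'United Kingdom': 'Europe'}


def unmapped_countries(df, mapping, column="country"):
    """Return a sorted list of country names that have no entry in mapping."""
    names = pd.Series(df[column].unique())
    return sorted(names[~names.isin(list(mapping))].tolist())


# "Ghana" is deliberately absent from region_map to show what the check reports.
sample = pd.DataFrame({"country": ["Kenya", "Germany", "Ghana", "Kenya"]})
missing = unmapped_countries(sample, region_map)
print(missing)
```

An empty list means every country in the frame received a region label.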

Now, we draw a boxplot to compare the GDP per capita distribution.

plt.figure(figsize=(10, 6))
sns.boxplot(data = mydata, x = 'region', y = 'NY.GDP.PCAP.CD', hue = "region")
plt.title('GDP Per Capita Distribution by Region (2000–2022)')
plt.ylabel('GDP per capita (US$)')
plt.xlabel('Region')
plt.yscale('log')  # To accommodate wide disparities
plt.grid(True)
plt.tight_layout()
plt.show()
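A numeric summary can accompany the boxplot. The sketch below groups by region and reports the minimum, median, and maximum. The values are synthetic round numbers chosen for illustration and are not World Bank data; summarize_by_region is a hypothetical helper that could be applied to mydata in the same way.

```python
import pandas as pd

# Synthetic round numbers for illustration only; these are NOT World Bank values.
sample = pd.DataFrame({
    "region": ["Europe", "Europe", "Europe",
               "Sub-Saharan Africa", "Sub-Saharan Africa", "Sub-Saharan Africa"],
    "NY.GDP.PCAP.CD": [30000.0, 40000.0, 50000.0, 1000.0, 2000.0, 6000.0],
})


def summarize_by_region(df, value_col):
    """Return min, median, and max of value_col for each region."""
    return df.groupby("region")[value_col].agg(["min", "median", "max"])


summary = summarize_by_region(sample, "NY.GDP.PCAP.CD")
# Ratio of regional medians, a single-number view of the gap seen in the plot.
ratio = summary.loc["Europe", "median"] / summary.loc["Sub-Saharan Africa", "median"]
print(summary)
print(f"Median ratio (Europe / Sub-Saharan Africa): {ratio:.1f}")
```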

Exploring Other Key Indicators

Let’s now download and visualize other indicators such as:

Life Expectancy at Birth (SP.DYN.LE00.IN)
Access to Electricity (% of population) (EG.ELC.ACCS.ZS)

We define a reusable function for this purpose:

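def plot_indicator(indicator_code, title, ylabel):
    """Download one indicator for countries_of_interest and plot it over time."""
    df = wb.download(indicator=indicator_code, country=countries_of_interest, start=2000, end=2022)
    df.reset_index(inplace=True)
    # Years are returned as strings; convert them so the x-axis runs chronologically
    df['year'] = df['year'].astype(int)
    plt.figure(figsize=(12, 6))
    sns.lineplot(data=df, x='year', y=indicator_code, hue='country')
    plt.title(title)
    plt.ylabel(ylabel)
    plt.xlabel('Year')
    plt.grid(True)
    plt.tight_layout()
    plt.show()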

Life Expectancy

plot_indicator('SP.DYN.LE00.IN', 'Life Expectancy at Birth (2000–2022)', 'Years')

Access to Electricity

plot_indicator('EG.ELC.ACCS.ZS', 'Access to Electricity (% of Population)', '%')
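Separating data preparation from plotting makes the downloaded data easier to reuse and test. The sketch below tidies a frame shaped like the download result, which we assume is indexed by country and year with years stored as strings. The stand-in frame uses placeholder values rather than real life-expectancy figures, and tidy_indicator is a hypothetical helper.

```python
import pandas as pd


def tidy_indicator(raw, indicator_code, value_name="value"):
    """Flatten the index, make year an integer, drop missing values, sort by time."""
    df = raw.reset_index()
    df["year"] = df["year"].astype(int)
    df = df.rename(columns={indicator_code: value_name})
    df = df.dropna(subset=[value_name])
    return df.sort_values(["country", "year"]).reset_index(drop=True)


# Stand-in for a downloaded frame: (country, year) MultiIndex, newest year first.
# The numbers are placeholders, not World Bank data.
index = pd.MultiIndex.from_product(
    [["Kenya", "France"], ["2002", "2001", "2000"]], names=["country", "year"]
)
raw = pd.DataFrame({"SP.DYN.LE00.IN": [3.0, None, 1.0, 6.0, 5.0, 4.0]}, index=index)

tidy = tidy_indicator(raw, "SP.DYN.LE00.IN")
print(tidy)
```

The tidy frame can then be passed to any plotting or modelling step without repeating the clean-up.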

Conclusion

The pandas-datareader module offers an efficient and reproducible way to access development indicators from the World Bank. This article demonstrated how to extract, manipulate, and visualize data to uncover meaningful insights—especially regional disparities such as those between Europe and Sub-Saharan Africa.

Researchers, analysts, and policymakers can leverage these tools to explore further indicators, enrich their understanding, and support evidence-based decision-making.

Next Steps:

Explore additional regions or income classifications.
Merge multiple indicators for multivariate analysis.
Integrate with geospatial visualization tools such as Folium or Plotly for map-based insights (Fisher 1936; James et al. 2013; Kodinariya, Makwana, et al. 2013; Muddana and Vinayakam 2024).
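As a starting point for the multivariate step, two indicator tables can be joined on country and year. The sketch below uses placeholder values, not World Bank data, and merge_indicators is a hypothetical helper. wb.download also accepts a list of indicator codes, which may make a manual merge unnecessary.

```python
import pandas as pd

# Placeholder values for illustration only; these are NOT World Bank data.
gdp = pd.DataFrame({
    "country": ["Kenya", "Kenya", "France"],
    "year": [2000, 2001, 2000],
    "NY.GDP.PCAP.CD": [1.0, 2.0, 3.0],
})
life = pd.DataFrame({
    "country": ["Kenya", "France", "France"],
    "year": [2000, 2000, 2001],
    "SP.DYN.LE00.IN": [10.0, 30.0, 40.0],
})


def merge_indicators(left, right, how="inner"):
    """Join two indicator tables on country and year, one row per pair."""
    # validate raises if either table has duplicate (country, year) keys.
    return pd.merge(left, right, on=["country", "year"], how=how,
                    validate="one_to_one")


merged = merge_indicators(gdp, life)
print(merged)
```

An inner join keeps only country-year pairs present in both tables; passing how="outer" keeps every pair and leaves gaps as missing values.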

References

Fisher, Ronald A. 1936. “The Use of Multiple Measurements in Taxonomic Problems.” Annals of Eugenics 7 (2): 179–88.

James, Gareth, Daniela Witten, Trevor Hastie, Robert Tibshirani, et al. 2013. An Introduction to Statistical Learning. Vol. 112. Springer.

Kodinariya, Trupti M, Prashant R Makwana, et al. 2013. “Review on Determining Number of Cluster in k-Means Clustering.” International Journal 1 (6): 90–95.

Muddana, A Lakshmi, and Sandhya Vinayakam. 2024. Python for Data Science. Springer.