Getting Data from the World Bank Website using Python and Pandas-Datareader Module

Independent Data Analysis Project

Published

May 18, 2025

Modified

May 18, 2025

Executive Summary

This article demonstrates how to access and analyze global development indicators from the World Bank using Python’s pandas-datareader module. We walk through package installation, querying available countries and indicators, and extracting time-series data for selected metrics. Using real-world examples such as GDP per capita, life expectancy, and access to electricity, we illustrate how to generate insightful visualizations that highlight stark regional disparities—particularly between Sub-Saharan Africa and Europe. The approach outlined provides a reproducible workflow for data analysts, researchers, and policy professionals seeking to work with high-quality international development data in Python. The article concludes with suggestions for deeper analysis and integration with geospatial and statistical tools.

Keywords

Data analysis, Python, Pandas, Pandas-datareader, Seaborn, NumPy, Descriptive Analysis, Data Science, Machine Learning, Scikit-learn, K-Nearest Neighbors (KNN)

Background

Accessing high-quality, reliable, and up-to-date economic and development data is critical for research, policy analysis, and informed decision-making. The World Bank is a key source of such data, offering thousands of indicators spanning multiple dimensions of development, from poverty to education, environment, trade, and infrastructure.

In this article, we explore how to programmatically access the World Bank databases using Python, with a specific focus on the pandas-datareader module. We will cover package installation, data querying, and visual representation of selected indicators—highlighting regional disparities, such as between Europe and Sub-Saharan Africa.

Installing the Package

Depending on your operating system and environment setup, you can install the required packages using pip. Open the terminal or command prompt and run the following commands:

pip install pandas
pip install numpy
pip install matplotlib
pip install seaborn
pip install pandas-datareader

If you are using the Anaconda distribution, the packages can also be installed via the Anaconda Navigator or using conda install commands in the terminal.
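
After installing with either pip or conda, it can help to confirm that each package is visible to the interpreter you plan to use. The following is a minimal sketch using only the standard library; the helper name installed_versions is hypothetical and chosen for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as passed to pip in the commands above.
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn", "pandas-datareader"]


def installed_versions(packages):
    """Return {package name: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(REQUIRED)
for name, ver in versions.items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED was probably installed into a different environment than the one running the script.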

Loading the Packages

Import the necessary libraries to facilitate data acquisition, manipulation, and visualization.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pandas_datareader import wb
import datetime

Getting the Data

Countries Represented in the Database

Let us begin by exploring all the countries and regions available in the World Bank database.

countries = wb.get_countries()
countries.head()
   iso3c  iso2c  name                         region                     adminregion  incomeLevel  lendingType     capitalCity  longitude  latitude
0  ABW    AW     Aruba                        Latin America & Caribbean               High income  Not classified  Oranjestad   -70.0167   12.5167
1  AFE    ZH     Africa Eastern and Southern  Aggregates                              Aggregates   Aggregates                   NaN        NaN
2  AFG    AF     Afghanistan                  South Asia                 South Asia   Low income   IDA             Kabul        69.1761    34.5228
3  AFR    A9     Africa                       Aggregates                              Aggregates   Aggregates                   NaN        NaN
4  AFW    ZI     Africa Western and Central   Aggregates                              Aggregates   Aggregates                   NaN        NaN

Variables in the Country Table

We examine the metadata provided for each country or region.

countries.columns
Index(['iso3c', 'iso2c', 'name', 'region', 'adminregion', 'incomeLevel',
       'lendingType', 'capitalCity', 'longitude', 'latitude'],
      dtype='object')

Focusing on Sub-Saharan Africa

To narrow our analysis, we extract the countries in Sub-Saharan Africa. Note that the region name in the table ends with a trailing space, which must be included in the comparison string for the filter to match.

africa = countries.loc[countries["region"] == "Sub-Saharan Africa ", :]
africa.head()
    iso3c  iso2c  name          region              adminregion                                 incomeLevel          lendingType  capitalCity  longitude  latitude
5   AGO    AO     Angola        Sub-Saharan Africa  Sub-Saharan Africa (excluding high income)  Lower middle income  IBRD         Luanda       13.24200   -8.81155
17  BDI    BI     Burundi       Sub-Saharan Africa  Sub-Saharan Africa (excluding high income)  Low income           IDA          Bujumbura    29.36390   -3.37840
21  BEN    BJ     Benin         Sub-Saharan Africa  Sub-Saharan Africa (excluding high income)  Lower middle income  IDA          Porto-Novo   2.63230    6.47790
22  BFA    BF     Burkina Faso  Sub-Saharan Africa  Sub-Saharan Africa (excluding high income)  Low income           IDA          Ouagadougou  -1.53395   12.36050
40  BWA    BW     Botswana      Sub-Saharan Africa  Sub-Saharan Africa (excluding high income)  Upper middle income  IBRD         Gaborone     25.92010   -24.65440
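
Matching on an exact string that includes a trailing space is fragile, because the filter silently returns nothing if the upstream label is ever cleaned up. One possible alternative, sketched here on a small hand-built stand-in for the countries table (countries_demo and africa_demo are illustrative names, not part of the article's workflow), is to strip whitespace before comparing.

```python
import pandas as pd

# Toy stand-in for the `countries` table; only the columns needed here.
# The trailing space in "Sub-Saharan Africa " mirrors the quirk noted above.
countries_demo = pd.DataFrame({
    "iso3c": ["AGO", "BDI", "AFG", "ABW"],
    "name": ["Angola", "Burundi", "Afghanistan", "Aruba"],
    "region": [
        "Sub-Saharan Africa ",
        "Sub-Saharan Africa ",
        "South Asia",
        "Latin America & Caribbean",
    ],
})

# Strip surrounding whitespace first, so the comparison works with or
# without the trailing space.
is_ssa = countries_demo["region"].str.strip() == "Sub-Saharan Africa"
africa_demo = countries_demo.loc[is_ssa]

print(africa_demo["name"].tolist())  # ['Angola', 'Burundi']
```

The same mask could be applied to the real countries table in place of the exact-match comparison.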

Indicators Available in the Database

The World Bank offers an extensive range of development indicators. Let’s inspect a few of them:

indicators = wb.get_indicators()
indicators.head()
   id                    name                                     unit  source          sourceNote                                         sourceOrganization                                 topics
0  1.0.HCount.1.90usd    Poverty Headcount ($1.90 a day)                LAC Equity Lab  The poverty headcount index measures the propo...  b'LAC Equity Lab tabulations of SEDLAC (CEDLAS...  Poverty
1  1.0.HCount.2.5usd     Poverty Headcount ($2.50 a day)                LAC Equity Lab  The poverty headcount index measures the propo...  b'LAC Equity Lab tabulations of SEDLAC (CEDLAS...  Poverty
2  1.0.HCount.Mid10to50  Middle Class ($10-50 a day) Headcount          LAC Equity Lab  The poverty headcount index measures the propo...  b'LAC Equity Lab tabulations of SEDLAC (CEDLAS...  Poverty
3  1.0.HCount.Ofcl       Official Moderate Poverty Rate-National        LAC Equity Lab  The poverty headcount index measures the propo...  b'LAC Equity Lab tabulations of data from Nati...  Poverty
4  1.0.HCount.Poor4uds   Poverty Headcount ($4 a day)                   LAC Equity Lab  The poverty headcount index measures the propo...  b'LAC Equity Lab tabulations of SEDLAC (CEDLAS...  Poverty

Total Number of Unique Indicators

The table contains over 26,000 uniquely named indicators (26,080 at the time of writing), opening up diverse avenues for data analysis.

indicators['name'].nunique()
26080
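
With this many indicators, scanning the table by eye is impractical, so a keyword filter on the name column is a natural next step. The sketch below runs on a small stand-in table built from indicator codes and names mentioned in this article; find_indicators and indicators_demo are hypothetical names. The pandas-datareader documentation also describes a wb.search function for the same purpose, which may be preferable against the live table.

```python
import pandas as pd

# Toy stand-in for the `indicators` table, using codes cited in this article.
indicators_demo = pd.DataFrame({
    "id": ["NY.GDP.PCAP.CD", "SP.DYN.LE00.IN", "EG.ELC.ACCS.ZS", "1.0.HCount.1.90usd"],
    "name": [
        "GDP per capita (current US$)",
        "Life Expectancy at Birth",
        "Access to Electricity (% of population)",
        "Poverty Headcount ($1.90 a day)",
    ],
})


def find_indicators(table, keyword):
    """Return the id and name of rows whose name contains `keyword` (case-insensitive)."""
    # regex=False treats the keyword literally, so characters like "$" or "(" are safe.
    mask = table["name"].str.contains(keyword, case=False, regex=False)
    return table.loc[mask, ["id", "name"]]


matches = find_indicators(indicators_demo, "gdp per capita")
print(matches)
```

Passing the real indicators table instead of indicators_demo should work the same way, provided its name column contains no missing values.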

Downloading Specific Data

Let’s extract and compare GDP per capita (current US$) for a set of countries in Sub-Saharan Africa and Europe. We use the indicator code: NY.GDP.PCAP.CD.

indicator_code = 'NY.GDP.PCAP.CD'
countries_of_interest = ['KEN', 'NGA', 'ZAF', 'FRA', 'DEU', 'GBR']

mydata = wb.download(indicator=indicator_code, country=countries_of_interest,
                   start=2000, end=2022)
mydata.reset_index(inplace=True)
mydata.head()
country year NY.GDP.PCAP.CD
0 Germany 2022 49686.115458
1 Germany 2021 52265.654162
2 Germany 2020 47379.765195
3 Germany 2019 47623.865607
4 Germany 2018 48874.859503
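
In the printed output the rows run from the most recent year backwards, and the year values may be stored as strings rather than integers (an assumption worth checking with mydata.dtypes). A small tidying step, sketched here on just the five Germany rows shown above, makes the table easier to plot and sort. The column name gdp_per_capita is an illustrative choice, not something the World Bank API returns.

```python
import pandas as pd

# The five rows printed above, with `year` held as strings (assumed dtype).
sample = pd.DataFrame({
    "country": ["Germany"] * 5,
    "year": ["2022", "2021", "2020", "2019", "2018"],
    "NY.GDP.PCAP.CD": [49686.115458, 52265.654162, 47379.765195,
                       47623.865607, 48874.859503],
})

tidy = (
    sample
    .assign(year=lambda d: d["year"].astype(int))              # numeric years
    .rename(columns={"NY.GDP.PCAP.CD": "gdp_per_capita"})      # friendlier column name
    .sort_values(["country", "year"])                          # chronological order
    .reset_index(drop=True)
)
print(tidy)
```

The same chain can be applied to the full mydata table; the rest of this article keeps the original column name, so the rename is optional.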

Visualizing Regional Disparities

GDP Per Capita Over Time

We plot GDP per capita over time for each country to illustrate the economic gap between Europe and Sub-Saharan Africa.

# Ensure years are numeric so the x-axis is ordered and evenly spaced
mydata['year'] = mydata['year'].astype(int)

plt.figure(figsize=(12, 6))
sns.lineplot(data=mydata, x='year', y='NY.GDP.PCAP.CD', hue='country')
plt.title('GDP Per Capita (Current US$), 2000–2022')
plt.ylabel('GDP per capita (US$)')
plt.xlabel('Year')
plt.grid(True)
plt.tight_layout()
plt.show()

Boxplot of GDP per Capita (Grouped by Region)

To better compare the economic spread, we enrich our dataset by adding a region label.

region_map = {'Kenya': 'Sub-Saharan Africa', 'Nigeria': 'Sub-Saharan Africa', 'South Africa': 'Sub-Saharan Africa',
              'France': 'Europe', 'Germany': 'Europe', 'United Kingdom': 'Europe'}
mydata['region'] = mydata['country'].map(region_map)
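
Because map returns NaN for any key missing from the dictionary, a country whose name is spelled differently in the data, or one added to the list later, would silently drop out of the regional plot. The check below is a minimal sketch of how to catch that. The series countries_seen stands in for mydata['country'], and 'Ghana' is added deliberately as an unmapped example.

```python
import pandas as pd

region_map = {
    "Kenya": "Sub-Saharan Africa",
    "Nigeria": "Sub-Saharan Africa",
    "South Africa": "Sub-Saharan Africa",
    "France": "Europe",
    "Germany": "Europe",
    "United Kingdom": "Europe",
}

# Stand-in for mydata['country']; "Ghana" is intentionally missing from region_map.
countries_seen = pd.Series(
    ["Kenya", "Nigeria", "South Africa", "France", "Germany", "United Kingdom", "Ghana"],
    name="country",
)

mapped = countries_seen.map(region_map)

# Country names that received no region label.
unmapped = sorted(countries_seen[mapped.isna()].unique())
print(unmapped)  # ['Ghana']
```

An empty list means every country received a region label.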

Now, we draw a boxplot to compare the GDP per capita distribution.

plt.figure(figsize=(10, 6))
sns.boxplot(data = mydata, x = 'region', y = 'NY.GDP.PCAP.CD', hue = "region")
plt.title('GDP Per Capita Distribution by Region (2000–2022)')
plt.ylabel('GDP per capita (US$)')
plt.xlabel('Region')
plt.yscale('log')  # To accommodate wide disparities
plt.grid(True)
plt.tight_layout()
plt.show()

Exploring Other Key Indicators

Let’s now download and visualize other indicators such as:

  • Life Expectancy at Birth (SP.DYN.LE00.IN)
  • Access to Electricity (% of population) (EG.ELC.ACCS.ZS)

We define a reusable function for this purpose:

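def plot_indicator(indicator_code, title, ylabel):
    """Download one indicator for countries_of_interest and plot it over time."""
    df = wb.download(indicator=indicator_code, country=countries_of_interest, start=2000, end=2022)
    df.reset_index(inplace=True)
    # Ensure years are numeric so the x-axis is ordered and evenly spaced
    df['year'] = df['year'].astype(int)
    plt.figure(figsize=(12, 6))
    sns.lineplot(data=df, x='year', y=indicator_code, hue='country')
    plt.title(title)
    plt.ylabel(ylabel)
    plt.xlabel('Year')
    plt.grid(True)
    plt.tight_layout()
    plt.show()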

Life Expectancy

plot_indicator('SP.DYN.LE00.IN', 'Life Expectancy at Birth (2000–2022)', 'Years')

Access to Electricity

plot_indicator('EG.ELC.ACCS.ZS', 'Access to Electricity (% of Population)', '%')

Conclusion

The pandas-datareader module offers an efficient and reproducible way to access development indicators from the World Bank. This article demonstrated how to extract, manipulate, and visualize data to uncover meaningful insights—especially regional disparities such as those between Europe and Sub-Saharan Africa.

Researchers, analysts, and policymakers can leverage these tools to explore further indicators, enrich their understanding, and support evidence-based decision-making.


Next Steps:
