Getting Data from the World Bank Website Using Python and the pandas-datareader Module
Independent Data Analysis Project
This article demonstrates how to access and analyze global development indicators from the World Bank using Python's pandas-datareader module. We walk through package installation, querying available countries and indicators, and extracting time-series data for selected metrics. Using real-world examples such as GDP per capita, life expectancy, and access to electricity, we show how to build visualizations that highlight stark regional disparities, particularly between Sub-Saharan Africa and Europe. The approach provides a reproducible workflow for data analysts, researchers, and policy professionals who work with high-quality international development data in Python. The article concludes with suggestions for deeper analysis and integration with geospatial and statistical tools.
Data analysis, Python, Pandas, Pandas-datareader, Seaborn, Numpy, Descriptive Analysis, Data Science, Machine Learning, Scikit-learn, K-Nearest Neighbors (KNN)
Background
Accessing high-quality, reliable, and up-to-date economic and development data is critical for research, policy analysis, and informed decision-making. The World Bank is a key source of such data, offering thousands of indicators spanning many dimensions of development, from poverty and education to the environment, trade, and infrastructure.
In this article, we explore how to access the World Bank databases programmatically using Python, with a specific focus on the pandas-datareader module. We cover package installation, data querying, and visual representation of selected indicators, highlighting regional disparities such as those between Europe and Sub-Saharan Africa.
Installing the Package
Depending on your operating system and environment setup, you can install the required packages with pip. Open a terminal or command prompt and run the following commands:
pip install pandas
pip install numpy
pip install matplotlib
pip install seaborn
pip install pandas-datareader
If you use the Anaconda distribution, the packages can also be installed through Anaconda Navigator or with conda install commands in the terminal.
Loading the Packages
Import the libraries used for data acquisition, manipulation, and visualization.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pandas_datareader import wb
Getting the Data
Countries Represented in the Database
Let us begin by exploring all the countries and regions available in the World Bank database.
countries = wb.get_countries()
countries.head()
| | iso3c | iso2c | name | region | adminregion | incomeLevel | lendingType | capitalCity | longitude | latitude |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ABW | AW | Aruba | Latin America & Caribbean | | High income | Not classified | Oranjestad | -70.0167 | 12.5167 |
| 1 | AFE | ZH | Africa Eastern and Southern | Aggregates | | Aggregates | Aggregates | | NaN | NaN |
| 2 | AFG | AF | Afghanistan | South Asia | South Asia | Low income | IDA | Kabul | 69.1761 | 34.5228 |
| 3 | AFR | A9 | Africa | Aggregates | | Aggregates | Aggregates | | NaN | NaN |
| 4 | AFW | ZI | Africa Western and Central | Aggregates | | Aggregates | Aggregates | | NaN | NaN |
Variables in the Country Table
We examine the metadata provided for each country or region.
countries.columns
Index(['iso3c', 'iso2c', 'name', 'region', 'adminregion', 'incomeLevel',
'lendingType', 'capitalCity', 'longitude', 'latitude'],
dtype='object')
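The preview above shows that the table mixes individual economies with regional and income aggregates (rows whose region is Aggregates). Splitting the two before analysis avoids counting "Africa" alongside Angola. The sketch below is illustrative: `split_aggregates` is a hypothetical helper, it assumes aggregates are exactly the rows labelled Aggregates in `region`, and a small hand-made frame stands in for the real `wb.get_countries()` result.

```python
import pandas as pd


def split_aggregates(countries: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Split a wb.get_countries()-style table into (economies, aggregates).

    Hypothetical helper: a row counts as an aggregate when its region,
    ignoring surrounding whitespace, equals 'Aggregates'.
    """
    is_aggregate = countries["region"].str.strip() == "Aggregates"
    return countries[~is_aggregate], countries[is_aggregate]


# Small stand-in for the real download (rows copied from the preview table)
sample = pd.DataFrame({
    "iso3c": ["ABW", "AFE", "AFG", "AFR", "AFW"],
    "name": ["Aruba", "Africa Eastern and Southern", "Afghanistan",
             "Africa", "Africa Western and Central"],
    "region": ["Latin America & Caribbean", "Aggregates", "South Asia",
               "Aggregates", "Aggregates"],
})

economies, aggregates = split_aggregates(sample)
print(economies["iso3c"].tolist())  # ['ABW', 'AFG']
print(len(aggregates))              # 3

# Number of economies per region
print(economies["region"].value_counts())
```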
Focusing on Sub-Saharan Africa
To narrow our analysis, we extract data specifically for Sub-Saharan Africa. Note that the region name has a trailing space, which must be preserved in the comparison.
africa = countries.loc[countries["region"] == "Sub-Saharan Africa ", :]
africa.head()
| | iso3c | iso2c | name | region | adminregion | incomeLevel | lendingType | capitalCity | longitude | latitude |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | AGO | AO | Angola | Sub-Saharan Africa | Sub-Saharan Africa (excluding high income) | Lower middle income | IBRD | Luanda | 13.24200 | -8.81155 |
| 17 | BDI | BI | Burundi | Sub-Saharan Africa | Sub-Saharan Africa (excluding high income) | Low income | IDA | Bujumbura | 29.36390 | -3.37840 |
| 21 | BEN | BJ | Benin | Sub-Saharan Africa | Sub-Saharan Africa (excluding high income) | Lower middle income | IDA | Porto-Novo | 2.63230 | 6.47790 |
| 22 | BFA | BF | Burkina Faso | Sub-Saharan Africa | Sub-Saharan Africa (excluding high income) | Low income | IDA | Ouagadougou | -1.53395 | 12.36050 |
| 40 | BWA | BW | Botswana | Sub-Saharan Africa | Sub-Saharan Africa (excluding high income) | Upper middle income | IBRD | Gaborone | 25.92010 | -24.65440 |
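Matching the exact label, trailing space included, works but is fragile: if the label were ever normalized, the filter would silently return an empty frame. A slightly more defensive variant strips whitespace before comparing and fails loudly when nothing matches. This is a sketch: `countries_in_region` is a hypothetical helper, and the toy frame stands in for the real `countries` table.

```python
import pandas as pd


def countries_in_region(countries: pd.DataFrame, region: str) -> pd.DataFrame:
    """Return rows whose region matches `region`, ignoring surrounding whitespace.

    Hypothetical helper; raises ValueError instead of returning an empty frame.
    """
    mask = countries["region"].str.strip() == region.strip()
    result = countries.loc[mask, :]
    if result.empty:
        raise ValueError(f"No rows found for region {region!r}")
    return result


# Toy stand-in: the real labels carry a trailing space
sample = pd.DataFrame({
    "iso3c": ["AGO", "BDI", "AFG"],
    "region": ["Sub-Saharan Africa ", "Sub-Saharan Africa ", "South Asia"],
})

africa = countries_in_region(sample, "Sub-Saharan Africa")
print(africa["iso3c"].tolist())  # ['AGO', 'BDI']
```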
Indicators Available in the Database
The World Bank offers an extensive range of development indicators. Let’s inspect a few of them:
indicators = wb.get_indicators()
indicators.head()
| | id | name | unit | source | sourceNote | sourceOrganization | topics |
|---|---|---|---|---|---|---|---|
| 0 | 1.0.HCount.1.90usd | Poverty Headcount ($1.90 a day) | | LAC Equity Lab | The poverty headcount index measures the propo... | b'LAC Equity Lab tabulations of SEDLAC (CEDLAS... | Poverty |
| 1 | 1.0.HCount.2.5usd | Poverty Headcount ($2.50 a day) | | LAC Equity Lab | The poverty headcount index measures the propo... | b'LAC Equity Lab tabulations of SEDLAC (CEDLAS... | Poverty |
| 2 | 1.0.HCount.Mid10to50 | Middle Class ($10-50 a day) Headcount | | LAC Equity Lab | The poverty headcount index measures the propo... | b'LAC Equity Lab tabulations of SEDLAC (CEDLAS... | Poverty |
| 3 | 1.0.HCount.Ofcl | Official Moderate Poverty Rate-National | | LAC Equity Lab | The poverty headcount index measures the propo... | b'LAC Equity Lab tabulations of data from Nati... | Poverty |
| 4 | 1.0.HCount.Poor4uds | Poverty Headcount ($4 a day) | | LAC Equity Lab | The poverty headcount index measures the propo... | b'LAC Equity Lab tabulations of SEDLAC (CEDLAS... | Poverty |
Total Number of Unique Indicators
Counting unique indicator names shows that more than 26,000 indicators are available (26,080 in the run shown here), opening up diverse avenues for data analysis.
indicators['name'].nunique()
26080
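With tens of thousands of entries, browsing by eye is impractical, so filtering the `indicators` table by keyword is a natural next step. pandas-datareader also ships a `wb.search` helper for this purpose; the sketch below instead does the filtering locally with `Series.str.contains`, so it can be tried without a network call. `find_indicators` is a hypothetical helper and the three-row frame is a stand-in for the real table.

```python
import pandas as pd


def find_indicators(indicators: pd.DataFrame, pattern: str, field: str = "name") -> pd.DataFrame:
    """Return the id and name of indicators whose `field` matches the regex `pattern`.

    Hypothetical helper; matching is case-insensitive and missing values
    count as non-matches.
    """
    mask = indicators[field].str.contains(pattern, case=False, regex=True, na=False)
    return indicators.loc[mask, ["id", "name"]]


# Stand-in for the real wb.get_indicators() result
sample = pd.DataFrame({
    "id": ["NY.GDP.PCAP.CD", "SP.DYN.LE00.IN", "EG.ELC.ACCS.ZS"],
    "name": ["GDP per capita (current US$)",
             "Life expectancy at birth, total (years)",
             "Access to electricity (% of population)"],
})

print(find_indicators(sample, r"gdp.*capita"))
print(find_indicators(sample, "electricity")["id"].tolist())  # ['EG.ELC.ACCS.ZS']
```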
Downloading Specific Data
Let’s extract and compare GDP per capita (current US$) for a set of countries in Sub-Saharan Africa and Europe, using the indicator code NY.GDP.PCAP.CD.
indicator_code = 'NY.GDP.PCAP.CD'
# ISO3 codes: Kenya, Nigeria, South Africa, France, Germany, United Kingdom
countries_of_interest = ['KEN', 'NGA', 'ZAF', 'FRA', 'DEU', 'GBR']

mydata = wb.download(indicator=indicator_code, country=countries_of_interest,
                     start=2000, end=2022)
mydata.reset_index(inplace=True)  # move country and year from the index into columns
mydata.head()
| | country | year | NY.GDP.PCAP.CD |
|---|---|---|---|
| 0 | Germany | 2022 | 49686.115458 |
| 1 | Germany | 2021 | 52265.654162 |
| 2 | Germany | 2020 | 47379.765195 |
| 3 | Germany | 2019 | 47623.865607 |
| 4 | Germany | 2018 | 48874.859503 |
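The downloaded frame is in long format (one row per country-year), which suits seaborn. For side-by-side comparisons it can be handier to pivot it wide, one column per country, and compute ratios directly. The sketch below assumes the column layout shown above; the numbers are placeholders rather than World Bank figures, and the ratio column name `DEU_to_KEN` is made up for illustration.

```python
import pandas as pd

# Toy long-format frame shaped like the wb.download() result after reset_index();
# the values are placeholders, not World Bank data.
long_df = pd.DataFrame({
    "country": ["Germany", "Germany", "Kenya", "Kenya"],
    "year": ["2022", "2021", "2022", "2021"],
    "NY.GDP.PCAP.CD": [50000.0, 52000.0, 2000.0, 2000.0],
})

# Years may arrive as strings; integers sort chronologically
long_df["year"] = long_df["year"].astype(int)

# One row per year, one column per country
wide = long_df.pivot(index="year", columns="country", values="NY.GDP.PCAP.CD").sort_index()

# How many times larger Germany's GDP per capita is than Kenya's, year by year
wide["DEU_to_KEN"] = wide["Germany"] / wide["Kenya"]
print(wide)
```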
Visualizing Regional Disparities
GDP Per Capita Over Time
We plot GDP per capita over time to illustrate the economic gap between Europe and Sub-Saharan Africa.
# Years may come back as strings; convert them so the x-axis is numeric and chronological
mydata['year'] = mydata['year'].astype(int)

plt.figure(figsize=(12, 6))
sns.lineplot(data=mydata, x='year', y='NY.GDP.PCAP.CD', hue='country')
plt.title('GDP Per Capita (Current US$), 2000–2022')
plt.ylabel('GDP per capita (US$)')
plt.xlabel('Year')
plt.grid(True)
plt.tight_layout()
plt.show()
Boxplot of GDP per Capita (Grouped by Region)
To better compare the economic spread, we enrich our dataset by adding a region label.
region_map = {'Kenya': 'Sub-Saharan Africa', 'Nigeria': 'Sub-Saharan Africa', 'South Africa': 'Sub-Saharan Africa',
              'France': 'Europe', 'Germany': 'Europe', 'United Kingdom': 'Europe'}
mydata['region'] = mydata['country'].map(region_map)
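`Series.map` returns `NaN` for any country name that is not a key in the dictionary, so a misspelled or differently worded name would leave those rows without a region, and they would not appear in a region-grouped plot. A quick check, sketched on a toy frame in which the name "UK" is deliberately absent from `region_map`:

```python
import pandas as pd

region_map = {'Kenya': 'Sub-Saharan Africa', 'Nigeria': 'Sub-Saharan Africa',
              'South Africa': 'Sub-Saharan Africa', 'France': 'Europe',
              'Germany': 'Europe', 'United Kingdom': 'Europe'}

# Toy frame; "UK" is intentionally not a key in region_map
sample = pd.DataFrame({"country": ["Kenya", "Germany", "UK"]})
sample["region"] = sample["country"].map(region_map)

# Country names that did not receive a region label
unmapped = sorted(sample.loc[sample["region"].isna(), "country"].unique())
if unmapped:
    print(f"No region assigned for: {unmapped}")
```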
Now, we draw a boxplot to compare the GDP per capita distribution.
plt.figure(figsize=(10, 6))
sns.boxplot(data=mydata, x='region', y='NY.GDP.PCAP.CD', hue='region')
plt.title('GDP Per Capita Distribution by Region (2000–2022)')
plt.ylabel('GDP per capita (US$)')
plt.xlabel('Region')
plt.yscale('log')  # log scale accommodates the wide disparities between regions
plt.grid(True)
plt.tight_layout()
plt.show()
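The boxplot can be backed up with numbers. Grouping by region gives summary statistics per group, and dividing the two medians expresses the gap as a single ratio; on a log axis, equal ratios appear as equal vertical distances. The sketch assumes the `region` and `NY.GDP.PCAP.CD` columns built above; the values are placeholders, not World Bank data.

```python
import pandas as pd

# Placeholder values, not World Bank data
sample = pd.DataFrame({
    "region": ["Europe", "Europe", "Europe",
               "Sub-Saharan Africa", "Sub-Saharan Africa", "Sub-Saharan Africa"],
    "NY.GDP.PCAP.CD": [40000.0, 45000.0, 50000.0, 1000.0, 1500.0, 6000.0],
})

grouped = sample.groupby("region")["NY.GDP.PCAP.CD"]
summary = grouped.describe()  # count, mean, std, min, quartiles, max per region
medians = grouped.median()

# How many times larger the European median is than the Sub-Saharan median
gap = medians["Europe"] / medians["Sub-Saharan Africa"]

print(summary[["min", "50%", "max"]])
print(f"Median ratio, Europe / Sub-Saharan Africa: {gap:.1f}")  # 30.0
```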
Exploring Other Key Indicators
Let’s now download and visualize other indicators such as:
- Life Expectancy at Birth (SP.DYN.LE00.IN)
- Access to Electricity (% of population) (EG.ELC.ACCS.ZS)
We define a reusable function for this purpose:
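def plot_indicator(indicator_code, title, ylabel):
    """Download one World Bank indicator for countries_of_interest and plot it over time."""
    df = wb.download(indicator=indicator_code, country=countries_of_interest, start=2000, end=2022)
    df.reset_index(inplace=True)
    # Years may come back as strings; convert them so the x-axis is numeric and chronological
    df['year'] = df['year'].astype(int)

    plt.figure(figsize=(12, 6))
    sns.lineplot(data=df, x='year', y=indicator_code, hue='country')
    plt.title(title)
    plt.ylabel(ylabel)
    plt.xlabel('Year')
    plt.grid(True)
    plt.tight_layout()
    plt.show()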
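`plot_indicator` downloads, reshapes, and plots in one step. If you also want to inspect or reuse the data, one option is to split the tidying step into its own function, which can then be checked without a network call. `tidy_wb_frame` is a hypothetical name, and the stand-in frame below only mimics the (country, year) MultiIndex that `wb.download()` returns; its values are placeholders.

```python
import pandas as pd


def tidy_wb_frame(raw: pd.DataFrame) -> pd.DataFrame:
    """Flatten a (country, year)-indexed frame into columns with integer years.

    Hypothetical helper mirroring the reset_index step in plot_indicator.
    """
    df = raw.reset_index()
    df["year"] = df["year"].astype(int)
    # Chronological order within each country makes the table easier to read
    return df.sort_values(["country", "year"]).reset_index(drop=True)


# Stand-in for wb.download(...) output: MultiIndex (country, year), newest year first
index = pd.MultiIndex.from_tuples(
    [("Kenya", "2022"), ("Kenya", "2021"), ("France", "2022"), ("France", "2021")],
    names=["country", "year"],
)
raw = pd.DataFrame({"SP.DYN.LE00.IN": [1.0, 2.0, 3.0, 4.0]}, index=index)  # placeholder values

tidy = tidy_wb_frame(raw)
print(tidy)
```

The plotting function could then call the helper on the downloaded frame and keep only the seaborn and matplotlib calls.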
Life Expectancy
plot_indicator('SP.DYN.LE00.IN', 'Life Expectancy at Birth (2000–2022)', 'Years')
Access to Electricity
plot_indicator('EG.ELC.ACCS.ZS', 'Access to Electricity (% of Population)', '%')
Conclusion
The pandas-datareader module offers an efficient and reproducible way to access development indicators from the World Bank. This article demonstrated how to extract, manipulate, and visualize data to uncover meaningful insights, especially regional disparities such as those between Europe and Sub-Saharan Africa.
Researchers, analysts, and policymakers can leverage these tools to explore further indicators, enrich their understanding, and support evidence-based decision-making.
Next Steps:
- Explore additional regions or income classifications.
- Merge multiple indicators for multivariate analysis.
- Integrate with geospatial visualization tools such as Folium or Plotly for map-based insights (Fisher 1936; James et al. 2013; Kodinariya, Makwana, et al. 2013; Muddana and Vinayakam 2024).
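For the multivariate step, two indicators downloaded separately can be joined on country and year, after which their relationship can be summarized, for example with a Pearson correlation. (`wb.download` also accepts a list of indicator codes, returning them side by side in one frame.) The sketch assumes both frames have already been tidied into country/year columns; the values are placeholders, linearly related by construction so the result is easy to verify.

```python
import pandas as pd

# Placeholder frames shaped like two tidied wb.download() results
gdp = pd.DataFrame({
    "country": ["Kenya", "Kenya", "France", "France"],
    "year": [2021, 2022, 2021, 2022],
    "NY.GDP.PCAP.CD": [1.0, 2.0, 3.0, 4.0],
})
life = pd.DataFrame({
    "country": ["Kenya", "Kenya", "France", "France"],
    "year": [2021, 2022, 2021, 2022],
    "SP.DYN.LE00.IN": [10.0, 20.0, 30.0, 40.0],
})

# Inner join keeps only country-years present in both indicators;
# validate raises if either side has duplicate (country, year) keys
merged = gdp.merge(life, on=["country", "year"], how="inner", validate="one_to_one")

# Pearson correlation between the two indicators across all country-years
r = merged["NY.GDP.PCAP.CD"].corr(merged["SP.DYN.LE00.IN"])
print(merged)
print(f"Pearson r = {r:.2f}")
```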