The dataset I am using is the World Development Indicators (WDI) dataset published by the World Bank. It contains time series data on various economic and development indicators for more than two hundred countries and regions around the world.
Various indicators that are listed in the original dataset can be found here
A list of countries for which data is collected can be found here
This WDI databook can be used to study the variation in different development indicators over time for any country and also across different countries.
The dataset contains multiple tables. I am using only the WDI_Data table for my project; its columns are:
Country Name: The name of the country or region
Country Code: Abbreviated code to identify a country or region
Indicator Name: Name of the measured development indicator
Indicator Code: Abbreviated code for the indicator
1960 to 2015: One column per year from 1960 to 2015, holding the value of the row's indicator for that year
Since the dataset is huge, with over a thousand indicators, I had trouble hosting it on a suitable platform and then importing it into R. Hence, I have selected a few indicators from the dataset for my project and have hosted only the data for these indicators on GitHub.
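The selection step described above can be sketched roughly as follows. This is only a minimal sketch on a toy data frame standing in for the full WDI_Data table, not the exact procedure used for this project. The names `wdi_full`, `selected_codes`, and `out_file` are illustrative assumptions, and the `PLACEHOLDER.*` codes are made up; only `AG.LND.FRST.ZS` is a code that appears in this report's output.

```r
# Toy stand-in for the full WDI_Data table (placeholder rows are hypothetical).
wdi_full <- data.frame(
  Country.Name   = c("Arab World", "Arab World", "Arab World"),
  Country.Code   = c("ARB", "ARB", "ARB"),
  Indicator.Name = c("Forest area",
                     "Placeholder indicator A",
                     "Placeholder indicator B"),
  Indicator.Code = c("AG.LND.FRST.ZS", "PLACEHOLDER.A", "PLACEHOLDER.B"),
  X1990 = c(3.68, 1.0, 2.0),
  X1991 = c(3.67, 1.1, 2.1),
  stringsAsFactors = FALSE
)

# Codes of the indicators to keep (assumed selection for illustration).
selected_codes <- c("AG.LND.FRST.ZS", "PLACEHOLDER.A")

# Keep only the rows whose indicator code is in the selection.
wdi_selected <- wdi_full[wdi_full$Indicator.Code %in% selected_codes, ]

# Write the smaller table to a CSV file that is light enough to host.
out_file <- file.path(tempdir(), "WDI_Data_Selected.csv")
write.csv(wdi_selected, out_file, row.names = FALSE)
```

Filtering on the indicator code rather than the indicator name avoids mismatches caused by small differences in how the names are written.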
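# URL of the CSV with the selected indicators, hosted on GitHub
wdi_url <- "https://raw.githubusercontent.com/suchith91/wdi/master/WDI_Data_Selected.csv"
# Read the CSV into a data frame and inspect its structure
wdi_data <- read.csv(wdi_url)
str(wdi_data)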
## 'data.frame': 2112 obs. of 60 variables:
## $ Country.Name : Factor w/ 264 levels "Afghanistan",..: 8 8 8 8 8 8 8 8 38 38 ...
## $ Country.Code : Factor w/ 264 levels "ABW","ADO","AFG",..: 6 6 6 6 6 6 6 6 47 47 ...
## $ Indicator.Name: Factor w/ 8 levels "CO2 emissions (metric tons per capita)",..: 1 2 3 4 5 6 7 8 1 2 ...
## $ Indicator.Code: Factor w/ 8 levels "AG.LND.FRST.ZS",..: 2 3 1 5 4 7 8 6 2 3 ...
## $ X1960 : num 0.644 NA NA NA NA NA NA NA 1.4 50.6 ...
## $ X1961 : num 0.685 NA NA NA NA NA NA NA 2.06 55.3 ...
## $ X1962 : num 0.761 NA NA NA NA NA NA NA 2.7 53.3 ...
## $ X1963 : num 0.875 NA NA NA NA NA NA NA 1.35 53.2 ...
## $ X1964 : num 0.999 NA NA NA NA NA NA NA 2.36 53.8 ...
## $ X1965 : num 1.17 NA NA NA NA NA NA NA 2.6 52.8 ...
## $ X1966 : num 1.27 NA NA NA NA NA NA NA 2.53 42.9 ...
## $ X1967 : num 1.33 NA NA NA NA NA NA NA 2.98 42.6 ...
## $ X1968 : num 1.55 NA NA NA NA NA NA NA 2.94 43.3 ...
## $ X1969 : num 1.79 NA NA NA NA NA NA NA 3.08 44.2 ...
## $ X1970 : num 1.81 37.2 NA NA 22 NA NA NA 4.29 42.4 ...
## $ X1971 : num 2 38.8 NA NA 20.7 NA NA NA 5.07 43 ...
## $ X1972 : num 2.12 41.2 NA NA 21.1 NA NA NA 5.12 40.7 ...
## $ X1973 : num 2.41 57.3 NA NA 23.9 NA NA NA 6.06 42.3 ...
## $ X1974 : num 2.29 49.9 NA NA 24.8 NA NA NA 5.76 52.7 ...
## $ X1975 : num 2.2 51 NA NA 31.9 NA NA NA 6.1 49.7 ...
## $ X1976 : num 2.58 45.7 NA 12.9 31.9 NA NA NA 6.64 48.6 ...
## $ X1977 : num 2.65 43 NA 8.65 36.4 NA NA NA 7.29 50.5 ...
## $ X1978 : num 2.76 40 NA 1.91 38.5 NA NA NA 6.93 51.3 ...
## $ X1979 : num 2.86 45.4 NA 10.6 35.9 NA NA NA 6.94 52.4 ...
## $ X1980 : num 3.09 50.5 NA 10.1 34.3 NA NA NA 7.1 56.6 ...
## $ X1981 : num 2.93 47.4 NA 3.98 40 NA NA NA 5.89 51.4 ...
## $ X1982 : num 2.72 39.5 NA -3.11 43.1 NA NA NA 5.65 44.4 ...
## $ X1983 : num 2.82 34.4 NA -2.79 40.6 NA NA NA 5.05 42.1 ...
## $ X1984 : num 2.98 32.5 NA 1.71 39.2 NA NA NA 5.04 46.8 ...
## $ X1985 : num 3.06 29.8 NA -0.532 35 NA NA NA 5.53 46.9 ...
## $ X1986 : num 3.29 24.5 NA 0.469 32.7 NA NA NA 4.78 45.7 ...
## $ X1987 : num 3.2 27.7 NA 0.605 31.2 NA NA NA 4.99 45.1 ...
## $ X1988 : num 3.3 28 NA 3.78 32.7 NA NA NA 4.61 43.7 ...
## $ X1989 : num 3.27 29.7 NA 2.43 36.2 NA NA NA 5.05 46.7 ...
## $ X1990 : num 3.18 32.1 3.68 10.8 33.6 NA NA 72.9 5.3 48.9 ...
## $ X1991 : num 3.26 28.8 3.67 -1.19 40.1 NA NA NA 6.03 48.5 ...
## $ X1992 : num 3.45 30.3 3.65 5.1 33.8 NA NA NA 5.93 51.2 ...
## $ X1993 : num 3.7 28.5 3.63 4.05 32 NA NA NA 5.28 47.3 ...
## $ X1994 : num 3.71 29.5 3.62 3.44 29.3 NA NA NA 5.77 47.8 ...
## $ X1995 : num 3.45 31.3 3.6 2.93 30.6 NA NA NA 6.1 51.4 ...
## $ X1996 : num 3.35 32.7 3.58 4.86 29.3 NA NA NA 6.27 48.4 ...
## $ X1997 : num 3.16 37.4 3.57 4.86 31.2 NA NA NA 5.87 47.8 ...
## $ X1998 : num 3.36 33.1 3.55 5.28 33.5 NA NA NA 5.87 48 ...
## $ X1999 : num 3.34 36.2 3.53 3.05 30.3 NA NA NA 6.43 48.9 ...
## $ X2000 : num 3.72 42.6 3.51 5.13 28.3 NA 12.2 81.6 6.61 52 ...
## $ X2001 : num 3.62 40.9 3.5 2.2 31.6 NA NA NA 6.83 49 ...
## $ X2002 : num 3.62 41.7 3.49 1.5 32.1 NA NA NA 6.93 43.3 ...
## $ X2003 : num 3.82 45.6 3.48 4.23 34.4 NA NA NA 7.08 45 ...
## $ X2004 : num 4.09 48.8 3.47 9.74 36.2 NA 14.8 NA 7.53 47.2 ...
## $ X2005 : num 4.21 52.6 3.45 6.15 36.4 NA NA NA 7.26 51.1 ...
## $ X2006 : num 4.3 53.8 3.45 7.31 35.9 NA 11 NA 9.89 58.7 ...
## $ X2007 : num 4.14 54 3.44 5.65 39.7 NA 9.94 NA 9.9 52 ...
## $ X2008 : num 4.43 57.2 3.43 6.31 40.8 NA 10.5 NA 9.7 54.6 ...
## $ X2009 : num 4.59 47.7 3.43 1.52 42.3 NA 9.56 NA 9.07 43.9 ...
## $ X2010 : num 4.67 50.2 3.42 4.62 39.6 NA NA 88.7 9.6 46.7 ...
## $ X2011 : num 4.58 55.1 2.87 3.33 38.3 NA 10.3 NA 9.33 49.8 ...
## $ X2012 : num 4.87 56.7 2.86 6.64 39.2 NA NA NA 9.07 47 ...
## $ X2013 : num 4.7 54.9 2.84 2.9 40.7 NA NA NA 9.35 43.6 ...
## $ X2014 : num NA 50.9 2.82 2.14 42.7 NA NA NA NA 41 ...
## $ X2015 : num NA NA 2.8 2.97 NA NA NA NA NA 36.1 ...
library(tidyverse)
# Convert the data frame to a tibble for more compact printing
wdi <- as_tibble(wdi_data)
wdi
## # A tibble: 2,112 x 60
## Country.Name Country.Code
## <fctr> <fctr>
## 1 Arab World ARB
## 2 Arab World ARB
## 3 Arab World ARB
## 4 Arab World ARB
## 5 Arab World ARB
## 6 Arab World ARB
## 7 Arab World ARB
## 8 Arab World ARB
## 9 Caribbean small states CSS
## 10 Caribbean small states CSS
## # ... with 2,102 more rows, and 58 more variables: Indicator.Name <fctr>,
## # Indicator.Code <fctr>, X1960 <dbl>, X1961 <dbl>, X1962 <dbl>,
## # X1963 <dbl>, X1964 <dbl>, X1965 <dbl>, X1966 <dbl>, X1967 <dbl>,
## # X1968 <dbl>, X1969 <dbl>, X1970 <dbl>, X1971 <dbl>, X1972 <dbl>,
## # X1973 <dbl>, X1974 <dbl>, X1975 <dbl>, X1976 <dbl>, X1977 <dbl>,
## # X1978 <dbl>, X1979 <dbl>, X1980 <dbl>, X1981 <dbl>, X1982 <dbl>,
## # X1983 <dbl>, X1984 <dbl>, X1985 <dbl>, X1986 <dbl>, X1987 <dbl>,
## # X1988 <dbl>, X1989 <dbl>, X1990 <dbl>, X1991 <dbl>, X1992 <dbl>,
## # X1993 <dbl>, X1994 <dbl>, X1995 <dbl>, X1996 <dbl>, X1997 <dbl>,
## # X1998 <dbl>, X1999 <dbl>, X2000 <dbl>, X2001 <dbl>, X2002 <dbl>,
## # X2003 <dbl>, X2004 <dbl>, X2005 <dbl>, X2006 <dbl>, X2007 <dbl>,
## # X2008 <dbl>, X2009 <dbl>, X2010 <dbl>, X2011 <dbl>, X2012 <dbl>,
## # X2013 <dbl>, X2014 <dbl>, X2015 <dbl>
The data is very untidy at the moment, with each year stored in its own column. I plan to do the following with it.
I am trying to get the trend across years, for different countries, for the following indicators:
Exports of goods and services
Imports of goods and services
GDP growth
Youth literacy rate
CO2 emissions
Poverty headcount ratio at national poverty line
Unemployment, total
Forest area
NB: Some of the indicators may be removed or replaced as the project progresses.
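One possible way to tidy the table for this kind of trend analysis is to reshape the year columns into a single year column. The sketch below assumes tidyr 1.0 or later (for `pivot_longer()`) and uses a small toy tibble in place of `wdi`. The names `wdi_toy`, `wdi_long`, and `co2_trend` are illustrative, and the values are the first few entries shown in the `str()` output above.

```r
library(dplyr)
library(tidyr)

# Toy extract with the same wide layout as wdi (first two indicators,
# first two regions, first three years of the str() output).
wdi_toy <- tibble(
  Country.Name   = c("Arab World", "Arab World",
                     "Caribbean small states", "Caribbean small states"),
  Country.Code   = c("ARB", "ARB", "CSS", "CSS"),
  Indicator.Name = rep(c("CO2 emissions (metric tons per capita)",
                         "Exports of goods and services"), 2),
  X1960 = c(0.644, NA, 1.4, 50.6),
  X1961 = c(0.685, NA, 2.06, 55.3),
  X1962 = c(0.761, NA, 2.7, 53.3)
)

# Turn the X1960..X1962 columns into year/value pairs: one row per
# country, indicator, and year. The "X" prefix that read.csv() put in
# front of the numeric column names is dropped along the way.
wdi_long <- wdi_toy %>%
  pivot_longer(
    cols = starts_with("X"),
    names_to = "year",
    names_prefix = "X",
    values_to = "value"
  ) %>%
  mutate(year = as.integer(year))

# Trend of one indicator across years, by country, skipping missing values.
co2_trend <- wdi_long %>%
  filter(Indicator.Name == "CO2 emissions (metric tons per capita)",
         !is.na(value)) %>%
  arrange(Country.Name, year)
```

From a long table like `co2_trend`, a line chart per country could then be drawn with ggplot2, for example `ggplot(co2_trend, aes(year, value, colour = Country.Name)) + geom_line()`.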
I originally planned to use the Kaggle dataset for WDI, but it was too large and I couldn’t host it online. I also couldn’t find out how to extract it directly from Kaggle. So I decided to use the raw dataset from the World Bank data repository.