Growth and Debt

This is an example of data retrieval and manipulation using Stata. We will try to assess the relationship between debt and economic growth along the lines of Reinhart and Rogoff (2010). However, instead of using Excel as they apparently did, we will use Stata. The advantage is that our program will precisely document what we do with the raw data and allow any researcher to replicate or build on our work. The disadvantage is that we will only use data that is easily accessible through Stata.

1. Retrieving data

Let’s start by setting a working directory.

clear 
cd "C:\Users\dvorakt\Google Drive\reproducibility"

We will use data available in the World Development Indicators database. This data can be accessed from Stata using the user-written command ‘wbopendata’. The command connects to the internet and retrieves the series listed in its indicator option. The names of the series can be found here. We only need to run the ‘wbopendata’ command once because the retrieval takes time. After the download we save the data in a local directory.

wbopendata, indicator(NY.GDP.MKTP.KD.ZG ; GC.DOD.TOTL.GD.ZS; NY.GDP.PCAP.KD) clear long 
save wdidataMay2016, replace
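Since the download only needs to happen once, one way to avoid repeating it is to check for the saved file first. The lines below are a minimal sketch of that idea, not part of the original program. They assume ‘wbopendata’ is installed from SSC and that the saved file keeps the name used above.

```stata
* Install wbopendata from SSC if Stata cannot find it
capture which wbopendata
if _rc ssc install wbopendata

* Download only when the local copy does not exist yet
capture confirm file "wdidataMay2016.dta"
if _rc {
    wbopendata, indicator(NY.GDP.MKTP.KD.ZG; GC.DOD.TOTL.GD.ZS; NY.GDP.PCAP.KD) clear long
    save wdidataMay2016, replace
}
```

To refresh the data later, delete the saved file or save under a new name.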

2. Selecting observations and variables

Now that we have the data in our working directory we can use it. Let’s give our variables more recognizable names.

use wdidataMay2016
rename gc_dod_totl_gd_zs debttogdp
rename ny_gdp_mktp_kd_zg gdpgrowth
rename ny_gdp_pcap_kd gdppc

Let’s drop data that is for groups of countries, i.e. “Aggregates”. This way we will only have data for individual countries rather than for countries and groups of countries. We also delete San Marino, a small city-state. Finally, we keep only observations for which none of the three variables is missing.

drop if region=="Aggregates" | countryname=="San Marino"
keep if debttogdp~=. & gdpgrowth~=. & gdppc~=.
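It can be worth confirming that the filters did what we intended. The following is a small sketch of such checks. The variable onepercountry is a hypothetical helper introduced only for counting and is dropped again afterwards.

```stata
* Every remaining row should have all three series
assert !missing(debttogdp, gdpgrowth, gdppc)

* No aggregate rows should be left
assert region != "Aggregates"

* How many observations and how many countries remain?
count
egen byte onepercountry = tag(countrycode)
count if onepercountry
drop onepercountry
```

If either assert fails, Stata stops with an error, which makes a problem in the earlier steps hard to miss.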

3. Transforming variables

Let’s create the log of GDP per capita, which we will need for the analysis. Also, let’s create a new variable that indicates which range of debt an observation belongs to. These are the ranges used in Reinhart and Rogoff.

g loggdppc = log(gdppc)
g str15 debt_cat="0-30%" if debttogdp<=30
replace debt_cat="30-60%" if debttogdp>30 & debttogdp<=60
replace debt_cat="60-90%" if debttogdp>60 & debttogdp<=90
replace debt_cat="Above 90%" if debttogdp>90 & debttogdp~=.
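A quick way to check the categories is to look at the smallest and largest debt ratio inside each one. This is only a sketch of one possible check.

```stata
* Every observation should fall into exactly one category
assert debt_cat != ""

* Number of observations, minimum and maximum debt ratio per category
tabstat debttogdp, by(debt_cat) statistics(n min max) format(%9.1f)
```

Because debt_cat is a string, Stata orders the categories alphabetically. With these particular labels that order happens to match the order of the debt ranges.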

4. Plotting data

Let’s create a bar chart showing average growth by debt category. By default, ‘graph bar’ plots the mean of the variable within each category.

graph bar gdpgrowth ,over(debt_cat)
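Since the mean can be sensitive to a few extreme observations, it may help to plot the median next to it. The following variant is a sketch. The axis title and legend labels are our own choices.

```stata
* Mean and median growth side by side within each debt category
graph bar (mean) gdpgrowth (median) gdpgrowth, over(debt_cat) ///
    ytitle("GDP growth (%)") legend(order(1 "Mean" 2 "Median"))
```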

5. Aggregating data by groups

The above graph shows us the contemporaneous relationship between debt and growth, i.e. whether high debt today is associated with slow growth today. However, what we are really interested in is whether high debt today predicts growth over the next, say, five years. This requires a bit more data manipulation. Specifically, we will need to calculate growth over non-overlapping five-year periods. We will then link that five-year growth to debt in the first year of that five-year period. For example, the data for Australia starts in 1999. We will therefore calculate average growth over the years 1999 through 2003, and line it up with debt in 1999.

First, let’s sort the data by country and year, and create a sequence of numbers (1, 2, 3, ...) for each country. The ‘egen’ command with the seq() function and the by(countrycode) option will do this.

sort countrycode year
egen countryyear = seq() ,by(countrycode)
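Note that the sequence counts observations, not calendar years. If a country is missing a year in the middle of its series, a block of five observations would span more than five calendar years. The sketch below checks for this. It assumes that year is stored as a number, and gap is a hypothetical helper variable that is not part of the original program.

```stata
* Flag observations that do not directly follow the previous calendar year
bysort countrycode (year): gen byte gap = (year - year[_n-1] > 1) if _n > 1

* A table showing only zeros means there are no gaps
tabulate gap
drop gap
```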

Let’s create an indicator that marks each five-year period. This new variable will be 1 for all the observations in the first five-year period, 2 for the second five-year period, etc.

g fivey = ceil(countryyear/5)
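To see how this works, note that ceil() rounds up: observations 1 through 5 give 1/5, ..., 5/5, which all round up to 1, while observations 6 through 10 give 6/5, ..., 10/5, which round up to 2. Listing one country makes the mapping visible. The sketch below assumes Australia (country code AUS) is in the data.

```stata
* Show how observations map to five-year periods for one country
list countrycode year countryyear fivey if countrycode == "AUS", sepby(fivey) noobs
```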

We have to count the number of observations in each five-year period, and drop observations that belong to a five-year period with fewer than five years.

egen nyearsin5y = count(countryyear) ,by(countrycode fivey)
drop if nyearsin5y < 5

Now we can calculate average GDP growth for each five-year period for each country. This average growth will be the same within each five-year period.

egen growth5y = mean(gdpgrowth) ,by(countrycode fivey)

We only need the observations in the first year of each five-year period.

keep if mod(countryyear, 5) == 1
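At this point each country should appear at most once per five-year period. A short sketch of how one might confirm that:

```stata
* Stops with an error if any country-period pair appears more than once
isid countrycode fivey

* Number of countries contributing to each five-year period
tabulate fivey
```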

Now we can redo our graph and plot average subsequent growth against debt category.

graph bar growth5y ,over(debt_cat)

6. Estimating regressions

Let’s create a table of descriptive statistics.

tabstat growth5y debttogdp gdppc ,statistics(mean median sd min max) columns(statistics) format(%9.1f)

Let’s estimate three regressions and collect them in one table. The ‘outreg2’ command used here is user-written and can be installed from SSC.

reg growth5y debttogdp
outreg2 using table2, replace
reg growth5y debttogdp gdppc
outreg2 using table2
reg growth5y debttogdp loggdppc
outreg2 using table2, word