This is an illustrative example of data retrieval and manipulation using R. Let’s first set the working directory and load the relevant packages.
setwd("C:/Users/dvorakt/Google Drive/reproducibility")
library(WDI)
library(dplyr)
library(ggplot2)
library(stargazer)
We are going to download the data from the World Bank’s World Development Indicators (WDI). There is an R package called WDI that accesses the internet and retrieves the series listed in the indicator argument. The names of series can be found here.
wdi <- WDI(country = "all", start = 1960, end = 2015, extra = TRUE,
           indicator = c("NY.GDP.MKTP.KD.ZG", "GC.DOD.TOTL.GD.ZS", "NY.GDP.PCAP.KD"))
Let’s do some basic data manipulation.
#rename the variables to more recognizable names
wdi <- rename(wdi, gdppc = NY.GDP.PCAP.KD, debttogdp = GC.DOD.TOTL.GD.ZS, gdpgrowth = NY.GDP.MKTP.KD.ZG)
#delete the 'Aggregates' so that we only have countries
wdi <- wdi[wdi$region != "Aggregates",]
#keep only the variables we're going to use
wdi <- select(wdi, debttogdp, gdpgrowth, gdppc, year, country)
#keep only observations for which we have no missing values
wdi <- wdi[!is.na(wdi$debttogdp), ]
wdi <- wdi[!is.na(wdi$gdpgrowth), ]
wdi <- wdi[!is.na(wdi$gdppc), ]
#create a log of GDP per capita in case we need it later in the analysis
wdi$loggdppc <- log(wdi$gdppc)
#create debt categories
wdi$debtcat <- ifelse(wdi$debttogdp <= 30, "0-30%",
ifelse(wdi$debttogdp <= 60, "30-60%",
ifelse(wdi$debttogdp <= 90 , "60-90%", "Above 90%")))
#plot mean growth for each debt category (the fun argument requires ggplot2 3.3.0 or later)
ggplot(wdi, aes(x = factor(debtcat), y = gdpgrowth)) + stat_summary(fun = mean, geom = "bar")
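The nested ifelse() calls above can also be written with base R's cut(), which may be easier to extend if more bins are needed. The following is a minimal sketch on made-up values (the vector debt is hypothetical, not WDI data). With right = TRUE each bin includes its upper bound, which matches the <= comparisons above.

```r
# Hypothetical debt-to-GDP values (in percent), chosen to hit each bin edge
debt <- c(10, 30, 30.1, 60, 75, 90, 120)

# Nested ifelse(), as in the chunk above
cat_ifelse <- ifelse(debt <= 30, "0-30%",
              ifelse(debt <= 60, "30-60%",
              ifelse(debt <= 90, "60-90%", "Above 90%")))

# Same bins with cut(); right = TRUE makes each interval closed on the right
cat_cut <- cut(debt,
               breaks = c(-Inf, 30, 60, 90, Inf),
               labels = c("0-30%", "30-60%", "60-90%", "Above 90%"),
               right = TRUE)

# Check that both approaches assign every value to the same category
identical(cat_ifelse, as.character(cat_cut))
```

One practical difference is that cut() returns a factor whose levels follow the order of the breaks, so the bars in a plot keep that order without relying on alphabetical sorting.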
Let’s create a dataset that looks at debt to GDP ratio and SUBSEQUENT growth over the next five years.
wdi <- arrange(wdi, country , year) #sort by country and year
#give each year within a country a number starting with 1
wdi <- wdi %>% group_by(country) %>% mutate(countryyear = row_number())
#create an indicator that marks each five-year period
wdi$fivey <- ceiling(wdi$countryyear/5)
#create the number of years in each five-year period
wdi <- wdi %>% group_by(country, fivey) %>% mutate(nyearsin5y = n())
#drop five-year periods that don't have five years
wdi <- filter(wdi, nyearsin5y == 5)
#keep only the first year of each of the first three five-year periods (country-years 1, 6 and 11)
wdi <- filter(wdi, countryyear == 1 | countryyear == 6 | countryyear == 11)
#wdi needs to be dataframe for stargazer to work
wdi <- data.frame(wdi)
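To see what the period construction above does, it can help to trace it on a tiny panel. The sketch below uses base R only and a made-up data frame (toy, with hypothetical countries A and B), so it is an illustration of the logic rather than the exact dplyr code. It also selects first years with countryyear %% 5 == 1, which is an assumption of this sketch that generalizes the explicit 1, 6, 11 list; on this toy panel the two rules give the same rows.

```r
# Hypothetical panel: country A has 12 consecutive years, country B has 7
toy <- data.frame(
  country = c(rep("A", 12), rep("B", 7)),
  year    = c(2000:2011, 2000:2006)
)
toy <- toy[order(toy$country, toy$year), ]

# Number the years within each country, starting at 1
toy$countryyear <- ave(seq_len(nrow(toy)), toy$country, FUN = seq_along)

# Years 1-5 fall in period 1, years 6-10 in period 2, and so on
toy$fivey <- ceiling(toy$countryyear / 5)

# Count how many years each country-period contains
toy$nyearsin5y <- ave(toy$countryyear, toy$country, toy$fivey, FUN = length)

# Keep complete periods only, then the first year of each period
complete    <- toy[toy$nyearsin5y == 5, ]
first_years <- complete[complete$countryyear %% 5 == 1, ]
first_years[, c("country", "year", "fivey")]
```

In this toy panel, country A contributes two rows (2000 and 2005) because its third period has only two years, and country B contributes one row (2000) because its second period is incomplete.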
Let’s produce a descriptive statistics table:
stargazer(wdi[c("gdpgrowth", "debttogdp", "gdppc")], type = "text" , digits=1)
##
## ==============================================
## Statistic  N    Mean   St. Dev.  Min    Max
## ----------------------------------------------
## gdpgrowth 173   3.7      4.2    -9.6    12.3
## debttogdp 173   50.1     39.0    0.6   244.4
## gdppc     173 14,332.4 19,033.8 182.9 99,626.1
## ----------------------------------------------
Let’s estimate some regressions.
r1 <- lm(gdpgrowth ~ debttogdp, data = wdi)
r2 <- lm(gdpgrowth ~ debttogdp + gdppc, data = wdi)
r3 <- lm(gdpgrowth ~ debttogdp + loggdppc, data = wdi)
And show the results in a nice table:
stargazer(r1, r2,r3, type = "html")
| | Dependent variable: | | |
| | gdpgrowth | | |
| | (1) | (2) | (3) |
| debttogdp | -0.016** | -0.018** | -0.018** |
| | (0.008) | (0.008) | (0.008) |
| gdppc | | -0.00002 | |
| | | (0.00002) | |
| loggdppc | | | -0.212 |
| | | | (0.227) |
| Constant | 4.560*** | 4.887*** | 6.493*** |
| | (0.519) | (0.623) | (2.137) |
| Observations | 173 | 173 | 173 |
| R² | 0.023 | 0.028 | 0.028 |
| Adjusted R² | 0.017 | 0.016 | 0.016 |
| Residual Std. Error | 4.181 (df = 171) | 4.182 (df = 170) | 4.183 (df = 170) |
| F Statistic | 3.956** (df = 1; 171) | 2.427* (df = 2; 170) | 2.411* (df = 2; 170) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 | | |
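To get a feel for the magnitudes in the table, the coefficients can be turned into implied differences in growth. The sketch below only does the arithmetic with the point estimates reported above. The 10-percentage-point and doubling scenarios are illustrative choices made here, and the results describe associations in this sample rather than causal effects.

```r
# Point estimates copied from the regression table above
b_debt_r1   <- -0.016   # debttogdp, column (1)
b_loggdp_r3 <- -0.212   # loggdppc, column (3)

# A debt-to-GDP ratio that is 10 percentage points higher
diff_debt <- 10 * b_debt_r1

# A doubling of GDP per capita raises loggdppc by log(2)
diff_double <- b_loggdp_r3 * log(2)

# Implied differences in GDP growth, in percentage points
round(c(debt_10pp = diff_debt, gdppc_doubling = diff_double), 3)
```

Under these assumptions, 10 extra percentage points of debt go with roughly 0.16 percentage points lower growth. The loggdppc estimate implies a similar-sized difference for a doubling of income, but its standard error (0.227) is larger than the coefficient itself, so that figure is very imprecise.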