This week’s comprehension homework is based on Acemoglu, Johnson and Robinson (2001), “The Colonial Origins of Comparative Development: An Empirical Investigation”. This task was posted late, so it is due on Tuesday, alongside the computational task.
Don’t spend too much time on it. Give the paper a read, and go over the parts you don’t understand a couple of times. If you have any conceptual issues, then please post them in the homework channel on Slack. Each question really only needs a couple of lines written.
This task asks you to implement a very simple Instrumental Variables model to help estimate the parameters of the Solow model. As a base, use the code snippet below, which constructs the data we used to estimate the model in class.
In last week’s homework, we were estimating the model (for each country \(i\))
\[ \frac{Y_{i}}{POP_{i}A_{i}} = \left(\frac{s_{i}}{n_{i} + g + \delta}\right)^{\frac{\alpha}{1-\alpha}} \]
where \(Y_{i}\) is country \(i\)’s GDP, \(POP_{i}\) is its population, \(A_{i}\) is the technology available to that country, \(s_{i}\) is its savings/investment rate, \(n_{i}\) is its population growth rate, \(g\) is the growth rate of population-augmenting technology (since we’ve divided GDP by population, not workforce), and \(\delta\) is the depreciation rate. We assume that \(g\) and \(\delta\) are the same in all countries.
Multiplying both sides by \(A_{i}\) and taking logs, we get
\[ \ln\left(\frac{Y_{i}}{POP_{i}}\right) = \ln(A_{i}) + \frac{\alpha}{1-\alpha}\ln(s_{i}) - \frac{\alpha}{1-\alpha}\ln(n_{i} + g + \delta) \]
This looks very similar to a regression equation. If we make the assumption that \(\ln(A_{i}) = \hat{A} + \epsilon_{i}\) and substitute this in, then we have our regression equation. The identifying restriction (that is, the assumption that lets a linear regression correctly estimate \(\hat{A}\) and \(\alpha\)) is that whatever is in \(\epsilon_{i}\) (you can think of this as being everything else) is not systematically correlated with \(s_{i}\) or \(n_{i}\).
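Because the regression slope is \(\frac{\alpha}{1-\alpha}\) rather than \(\alpha\) itself, it can help to see the conversion written out. The snippet below is a minimal sketch: `alpha_from_slope` is a hypothetical helper name, and the slope of 1.5 is an illustrative number, not an estimate from the data.

```r
# If b = alpha / (1 - alpha), then solving for alpha gives alpha = b / (1 + b).
# alpha_from_slope is a hypothetical helper, not part of the course code.
alpha_from_slope <- function(b) b / (1 + b)

# An illustrative slope of 1.5 on ln(s) corresponds to alpha = 0.6
alpha_from_slope(1.5)

# The coefficient on ln(n + g + delta) is -alpha / (1 - alpha),
# so flip its sign before converting (illustrative value again)
slope_on_ln_ngd <- -1.5
alpha_from_slope(-slope_on_ln_ngd)
```

In the unrestricted model the two slopes each imply their own \(\alpha\); the restricted model forces them to agree.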
In class, we estimated this model and ended up with an \(\alpha\) estimate of 0.6 or so, which seemed too high. This would happen if the sorts of things that push up savings or decrease population growth also push up GDP for reasons other than their impacts on savings and population growth. These “other things” are called confounders. In particular, we are concerned that unobserved institutions are causing an increase in GDP, a decrease in population growth, and an increase in savings all at once, so that some of the relationship that we observe between the variables is not the true causal relationship.
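The confounding story can be checked on simulated data. The sketch below is purely illustrative: every number in it (sample size, coefficients, noise levels) is an assumption chosen for the example, and `inst` stands in for unobserved institutional quality. It only shows the direction of the bias, not its size in the real data.

```r
set.seed(1)
n_obs <- 5000

# Unobserved confounder (stand-in for institutional quality); made-up scale
inst <- rnorm(n_obs)

# Log savings depends partly on institutions
ln_s <- 3 + 0.5 * inst + rnorm(n_obs, sd = 0.5)

# The true slope on ln_s is 0.5 (so alpha = 1/3),
# but institutions also raise output directly
true_slope <- 0.5
ln_y <- 1 + true_slope * ln_s + 0.5 * inst + rnorm(n_obs, sd = 0.5)

# OLS that leaves the confounder in the error term
naive_slope <- unname(coef(lm(ln_y ~ ln_s))["ln_s"])

# Compare the true alpha with the alpha implied by the biased slope
c(true_alpha  = true_slope / (1 + true_slope),
  naive_alpha = naive_slope / (1 + naive_slope))
```

The naive regression attributes part of the direct effect of `inst` to savings, so the implied \(\alpha\) comes out above the value used to generate the data.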
Let’s propose that instead of saying that
\[ \ln(A_{i}) = \hat{A} + \epsilon_{i} \]
we say that
\[ \ln(A_{i}) = \bar{A} + \gamma\,\mbox{aveexpr}_{i} + \eta_{i} \]
where aveexpr is a score of the average protection against expropriation risk over 1985–1995, \(\gamma\) is its coefficient (common to all countries), and \(\eta_{i}\) is the remaining error. We’ll use aveexpr as a proxy for institutional quality.
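To see what adding a proxy can and cannot do, here is a small simulation sketch. All numbers are assumptions made up for illustration; `inst` is the unobserved true institutional quality and `proxy` is a noisy measurement of it, playing the role of the expropriation-risk score.

```r
set.seed(2)
n_obs <- 5000

inst  <- rnorm(n_obs)            # true institutions (unobserved in practice)
proxy <- inst + rnorm(n_obs)     # noisy proxy for institutions

ln_s <- 3 + 0.5 * inst + rnorm(n_obs, sd = 0.5)
ln_y <- 1 + 0.5 * ln_s + 0.5 * inst + rnorm(n_obs, sd = 0.5)  # true slope on ln_s is 0.5

# Slope on ln_s without and with the proxy as a control
slope_without <- unname(coef(lm(ln_y ~ ln_s))["ln_s"])
slope_with    <- unname(coef(lm(ln_y ~ ln_s + proxy))["ln_s"])

c(without_proxy = slope_without, with_proxy = slope_with, truth = 0.5)
```

In this setup the proxy pulls the slope towards the truth but does not get all the way there, because the proxy measures institutions with error. That leftover problem is one motivation for the instrumental variables step later in the task.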
[...] load(ajr.RData). You don’t need to create a new variable.
[...] left_join() function. A note on this is below.
[...] alpha?
[...] aveexpr as in the model described above. What happens to our estimate of alpha?
[...] avexpr).
Often we have two datasets that have a column in common, and we want to “join” the datasets. That is, we want to keep all the observations in the “left” dataset, and match them to the corresponding rows of the “right” dataset. To do this, we use left_join(), which lives in the dplyr library. For example, if dataset1 contains a column called country and a column called GDP, and dataset2 contains a column called country and a column called Tax rate, then we can join the two using
dataset3 <- left_join(dataset1, dataset2)
where dataset3 will now contain three columns, one called country, one called GDP and one called Tax rate.
For a more detailed description, see https://stat545-ubc.github.io/bit001_dplyr-cheatsheet.html
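As a concrete version of the join described above, here is a minimal sketch on toy data. The country codes and numbers are invented for illustration; only the column names follow the example in the text.

```r
library(dplyr)

# Toy "left" dataset: one row per country (values are made up)
dataset1 <- data.frame(country = c("AUS", "NZL", "FJI"),
                       GDP = c(100, 80, 20),
                       stringsAsFactors = FALSE)

# Toy "right" dataset: note that FJI is missing
dataset2 <- data.frame(country = c("AUS", "NZL"),
                       `Tax rate` = c(30, 28),
                       check.names = FALSE,   # keep the space in "Tax rate"
                       stringsAsFactors = FALSE)

# Keep every row of dataset1 and attach matching columns from dataset2.
# Naming the key with `by` avoids relying on dplyr guessing the common column.
dataset3 <- left_join(dataset1, dataset2, by = "country")
dataset3
```

Rows of the left dataset with no match on the right are kept, with NA in the columns that came from the right dataset.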
Following the paper, use the log of settler mortality (logem4) to instrument for our institutional proxy. The ivreg() function in library(AER) can be used to estimate the unrestricted model with an instrument. Does this change our unrestricted estimates, and if so, how?
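The mechanics of instrumenting can be sketched on simulated data. Everything below is an assumption for illustration: `z` stands in for log settler mortality, `inst` for true institutions, and `proxy` for the measured score; none of the numbers come from the AJR data. The two-stage version uses only lm(), and the ivreg() call runs only if the AER package is installed.

```r
set.seed(3)
n_obs <- 5000

z     <- rnorm(n_obs)                    # instrument (stand-in for log settler mortality)
inst  <- -0.6 * z + rnorm(n_obs)         # true institutions, worse where z is high
proxy <- inst + rnorm(n_obs)             # observed proxy, measured with error
ln_y  <- 1 + 0.8 * inst + rnorm(n_obs, sd = 0.5)  # z affects ln_y only through inst
d <- data.frame(ln_y, proxy, z)

# OLS on the noisy proxy is pulled towards zero
b_ols <- unname(coef(lm(ln_y ~ proxy, data = d))["proxy"])

# Two-stage least squares by hand
# Stage 1: predict the proxy from the instrument
d$proxy_hat <- fitted(lm(proxy ~ z, data = d))
# Stage 2: regress the outcome on the predicted proxy
b_2sls <- unname(coef(lm(ln_y ~ proxy_hat, data = d))["proxy_hat"])

c(ols = b_ols, two_stage = b_2sls, truth = 0.8)

# Same point estimate via AER::ivreg, if the package is available.
# Formula layout: outcome ~ regressors | instruments
if (requireNamespace("AER", quietly = TRUE)) {
  iv_fit <- AER::ivreg(ln_y ~ proxy | z, data = d)
  print(coef(iv_fit))
}
```

The hand-rolled second stage gives the right coefficient but not the right standard errors, which is one reason to prefer ivreg() in practice. When the model has additional exogenous regressors, they appear on both sides of the `|`.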
# Load the libraries
library(ggplot2); library(dplyr)
# Read the data
pwt <- read.csv("pwt71.csv")
# Filter out the observations outside the period I'm interested in
pwt.ss <- pwt %>% filter(year<=2010 & year>=1985)
# Generate our data for the regression
pwt.2 <- pwt.ss %>% group_by(isocode) %>% # For each country
  summarise(s = mean(ki), # What was the average investment?
            y = last(y), # The last GDP per person relative to the US?
            n = 100*log(last(POP)/first(POP))/(n() - 1)) %>% # Population growth rate?
  filter(!is.na(s)) %>% # Get rid of missing rows of s
  mutate(ln_y = log(y), # Create new columns- log of y
         ln_s = log(s), # Log of s
         ln_ngd = log(n + 1.6 + 5)) # Log of n + g + delta
# Modelling! --------------------------------------------------------------
# Run the linear model (unrestricted)
mod1 <- lm(ln_y ~ ln_s + ln_ngd, data = pwt.2)
# Take a look at the parameter estimates
summary(mod1)
# Run the restricted parameter model
mod2 <- nls(ln_y ~ A + (alpha/(1-alpha))*ln_s - (alpha/(1-alpha))*ln_ngd,
            data = pwt.2,
            start = list(A = 11, alpha = 0.3))
# Take a look at the parameters
summary(mod2)
# Simulate a new country called Straya -----------------------------------------------
# Exogenous variables
s_current <- 26
n_current <- 1.6
# Parameters of the model
A <- coef(mod2)[1]
alpha <- coef(mod2)[2]
se <- 1.134 # From the summary command
# Simulate the new country 100,000 times (benchmark/baseline/BAU)
straya_1 <- rnorm(100000, # Generate 100k new observations
                  mean = A + (alpha/(1-alpha))*log(s_current) - (alpha/(1-alpha))*log(n_current + 1.6 + 5),
                  sd = se)
# Plot a histogram
hist(exp(straya_1), xlim = c(0, 200), breaks = 100)
# Simulate with new savings rate
s_new <- 27
straya_2 <- rnorm(100000,
                  mean = A + (alpha/(1-alpha))*log(s_new) - (alpha/(1-alpha))*log(n_current + 1.6 + 5),
                  sd = se)
hist(exp(straya_2), xlim = c(0, 200), breaks = 100)
# What is the difference in median simulated log output between the scenarios?
median(straya_2) - median(straya_1)
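Two details of the simulation above can be made less manual. The sketch below fits the same restricted model to simulated data (so it runs without pwt71.csv), reads the residual standard error from the fit instead of typing it in, and computes the shift in log output analytically. The data-generating numbers are assumptions chosen for illustration.

```r
set.seed(4)
n_obs <- 2000

# Simulated stand-in for pwt.2 (all values are made up)
sim <- data.frame(ln_s   = rnorm(n_obs, mean = 3, sd = 0.4),
                  ln_ngd = rnorm(n_obs, mean = 2, sd = 0.2))
sim$ln_y <- 2 + (0.3 / 0.7) * (sim$ln_s - sim$ln_ngd) + rnorm(n_obs, sd = 0.5)

# Same restricted specification as mod2
fit <- nls(ln_y ~ A + (alpha/(1-alpha))*ln_s - (alpha/(1-alpha))*ln_ngd,
           data = sim,
           start = list(A = 1, alpha = 0.3))

# Residual standard error taken from the fit rather than copied by hand
sigma_hat <- summary(fit)$sigma
alpha_hat <- unname(coef(fit)["alpha"])

# Raising s from 26 to 27 shifts the mean (and median) of log output by this amount
log_shift <- (alpha_hat / (1 - alpha_hat)) * log(27 / 26)

# Because exp() is monotone, the median of the level changes by this proportion
exp(log_shift) - 1
```

The analytic shift avoids the simulation noise in the difference of two sample medians, which is not negligible when the shift itself is small.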