Part 1: Comprehension

The Colonial Origins of Comparative Development: An Empirical Investigation by Acemoglu, Johnson and Robinson (2001)

1. What is the central point of the paper?

Acemoglu, Johnson and Robinson examine the relationship between institutions and income per capita. Countries with better institutions can invest more in physical and human capital and use them more efficiently, but the size of this effect is difficult to determine, so treating the raw correlation as the causal effect is a dangerous simplification. The paper considers how variables such as disease environment, climate and colonial origin relate to income per capita across regions and types of institution. The variable that most decisively shaped colonial settlement, however, was settler mortality. The main point of the paper is to estimate the size of the effect of institutions on income per capita by using historic settler mortality rates as an instrumental variable for institutions.

2. What variables do the authors use to proxy for current institutions?

Settler mortality rates, rule of law, property rights, expropriation risk and the nature and timing of settlement all appear in these models, but not all of them are proxies for current institutions: settler mortality, for example, serves as the instrument. The main proxy used by Acemoglu et al. is protection from expropriation risk, a variable related to all the institutional factors being investigated. They treat protection from expropriation risk as endogenous in the 2SLS model and find that it has a significant effect on GDP per capita.

3. Why can’t we just look at the simple relationship between these variables and today’s development? Isn’t that the causal relationship?

Based on this data it may seem that good institutions cause higher income per capita, but Acemoglu et al. warn that there are several reasons not to interpret the relationship as causal. One is reverse causality: richer countries may prefer better institutions or simply be wealthy enough to afford them. Another is omitted factors that affect both institutions and income. The authors also acknowledge the potential for ex post bias in the measures themselves, since judgements about what ‘good institutions’ are may be influenced by how rich a country already is.

4. What is the instrumental variable the authors describe?

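The instrument is the mortality rate of European settlers during colonisation (used in logs). Acemoglu et al. argue that where settler mortality was high, Europeans set up extractive institutions rather than settling, and that these early institutions persisted to the present. They use the instrument to isolate the variation in current institutions that comes from the historical disease environment faced by settlers, and then estimate the effect of that variation on income.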

5. What is their exclusion restriction? Do you find it plausible?

The instrumental variable approach implies an exclusion restriction: historic settler mortality rates have no effect on GDP per capita today other than through their effect on institutions. The main concern with this exclusion restriction is that historic mortality rates could be correlated with the current disease environment and so affect the economy today directly. Acemoglu et al. argue that the exclusion restriction is plausible because the majority of European settler deaths were caused by malaria and yellow fever, which had limited effects on indigenous people who had developed immunities. On that argument, it is unlikely that these diseases are the cause of the current poor economic performance of the areas of Africa and Asia where they are prevalent.
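
The setup behind this answer can be written compactly. The following is a sketch in notation close to the paper's, not a transcription of it: y_i is income per capita, R_i is protection against expropriation risk, M_i is settler mortality, and X_i is a vector of other covariates.

```latex
\begin{align}
\log y_i &= \mu + \alpha R_i + X_i'\gamma + \varepsilon_i
  && \text{(second stage: income on institutions)} \\
R_i &= \zeta + \beta \log M_i + X_i'\delta + v_i
  && \text{(first stage: institutions on settler mortality)} \\
\operatorname{Cov}(\log M_i,\ \varepsilon_i) &= 0
  && \text{(exclusion restriction, conditional on } X_i\text{)}
\end{align}
```

The exclusion restriction is the third line: settler mortality may shift income only through R_i. Any direct channel, such as the current disease environment, would put log M_i into the error term and violate it.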

Part 2: Using Acemoglu, Johnson and Robinson’s data to update our model from last week

Task 1

library(ggplot2); library(dplyr); library(AER)

Applying Jim’s code:

# Read the data
pwt <- read.csv("pwt71.csv")

# Filter out the observations outside the period I'm interested in
pwt.ss <- pwt %>% filter(year<=2010 & year>=1985)

# Generate our data for the regression
pwt.2 <- pwt.ss  %>% group_by(isocode) %>% # For each country
  summarise(s = mean(ki), # What was the average investment?
            y = last(y), # The last GDP per person relative to the US?
            n = 100*log(last(POP)/first(POP))/(n() - 1)) %>% # Population growth rate?
  filter(!is.na(s)) %>% # Get rid of missing rows of s
  mutate(ln_y = log(y), # Create new columns- log of y
         ln_s = log(s), # Log of s
         ln_ngd = log(n + 1.6 + 5)) # Log of n + g + delta
# Modelling!

# Run the linear model (unrestricted)
mod1 <- lm(ln_y ~ ln_s + ln_ngd, data = pwt.2)
# Take a look at the parameter estimates
summary(mod1)
## 
## Call:
## lm(formula = ln_y ~ ln_s + ln_ngd, data = pwt.2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.59258 -0.73868  0.00282  0.64383  3.06444 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  11.9528     1.6039   7.453 5.90e-12 ***
## ln_s          0.9987     0.1925   5.187 6.56e-07 ***
## ln_ngd       -5.8616     0.6588  -8.898 1.36e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.001 on 156 degrees of freedom
## Multiple R-squared:  0.4482, Adjusted R-squared:  0.4411 
## F-statistic: 63.35 on 2 and 156 DF,  p-value: < 2.2e-16
# Run the restricted parameter model
mod2 <- nls(ln_y ~ A + (alpha/(1-alpha))*ln_s - (alpha/(1-alpha))*ln_ngd, 
            data = pwt.2,
            start = list(A = 11, alpha = 0.3))
# Take a look at the parameters
summary(mod2)
## 
## Formula: ln_y ~ A + (alpha/(1 - alpha)) * ln_s - (alpha/(1 - alpha)) * 
##     ln_ngd
## 
## Parameters:
##       Estimate Std. Error t value Pr(>|t|)    
## A      1.17489    0.21144   5.557 1.15e-07 ***
## alpha  0.60966    0.02999  20.328  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.134 on 157 degrees of freedom
## 
## Number of iterations to convergence: 6 
## Achieved convergence tolerance: 8.969e-07
# Simulate a new country called Straya
# Exogenous variables
s_current <- 26
n_current <- 1.6

# Parameters of the model
A <- coef(mod2)[1]
alpha <- coef(mod2)[2]
se <- 1.134 # From the summary command


# Simulate new country 100,000 times (benchmark/baseline/BAU)

straya_1 <- rnorm(100000, # Generate 100k new observations
                  mean = A + (alpha/(1-alpha))*log(s_current) - (alpha/(1-alpha))*log(n_current + 1.6 + 5),
                  sd = se)

# Plot a histogram
hist(exp(straya_1), xlim = c(0, 200), breaks = 100)

# Simulate with new savings rate
s_new <- 27

straya_2 <- rnorm(100000,
                  mean = A + (alpha/(1-alpha))*log(s_new) - (alpha/(1-alpha))*log(n_current + 1.6 + 5),
                  sd = se)
hist(exp(straya_2), xlim = c(0, 200), breaks = 100)

# What is the difference in median simulations between the scenarios? 
median(straya_2) - median(straya_1)
## [1] 0.05547652
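
Both simulations use the same standard deviation and differ only in their mean, so the simulated difference in medians is a noisy estimate of a quantity that can be computed directly: (alpha/(1-alpha)) * (log(s_new) - log(s_current)). The sketch below is a minimal check. It assumes the alpha estimate and residual standard error copied by hand from the summary(mod2) output above, so that it runs without the data, and it uses the large-sample approximation for the standard error of a sample median of normal draws.

```r
# Values copied by hand from the summary(mod2) output (assumption: hard-coded
# here so the snippet runs standalone)
alpha_hat <- 0.60966
se_hat    <- 1.134
s_current <- 26
s_new     <- 27
n_draws   <- 100000

# Deterministic shift in the mean (and median) of log(y) implied by the model
log_shift <- (alpha_hat / (1 - alpha_hat)) * (log(s_new) - log(s_current))

# Approximate Monte Carlo standard error of the difference between two
# independent sample medians of normal draws: sqrt(2) * sqrt(pi/2) * sd / sqrt(n)
mc_se <- sqrt(2) * sqrt(pi / 2) * se_hat / sqrt(n_draws)

log_shift  # exact shift on the log scale
mc_se      # rough size of the simulation noise around it
```

Comparing the simulated difference with log_shift, in units of mc_se, gives a feel for how much of the reported number is simulation noise rather than model output.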

Now that we have applied Jim’s code, we are interested in how much the coefficients may be biased by factors the model leaves out. To examine this, we load the new data set (acemoglu) and join it to our existing data, so that a proxy for institutions can be added to the model.

load("ajr.RData")

dataset3 <- left_join(acemoglu, pwt.2)

We can now run a linear regression including our new proxy for institutions:

# Run the linear model (unrestricted)
mod3 <- lm(ln_y ~ ln_s + ln_ngd + avexpr, data = dataset3)
# Take a look at the parameter estimates
summary(mod3)
## 
## Call:
## lm(formula = ln_y ~ ln_s + ln_ngd + avexpr, data = dataset3)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.91042 -0.46412  0.03536  0.43753  2.07754 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.08682    1.74192   1.772   0.0792 .  
## ln_s         1.14396    0.19254   5.941 3.57e-08 ***
## ln_ngd      -3.11426    0.68606  -4.539 1.48e-05 ***
## avexpr       0.38437    0.04704   8.170 6.66e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7312 on 107 degrees of freedom
##   (52 observations deleted due to missingness)
## Multiple R-squared:  0.7339, Adjusted R-squared:  0.7264 
## F-statistic: 98.35 on 3 and 107 DF,  p-value: < 2.2e-16

Re-run the non-linear model, this time including average protection against expropriation risk:

# Run the restricted parameter model
mod4 <- nls(ln_y ~ A + delta*avexpr + (alpha/(1-alpha))*ln_s - (alpha/(1-alpha))*ln_ngd, 
            data = dataset3,
            start = list(A = 11, alpha = 0.3, delta = 5))
# Take a look at the parameters
summary(mod4)
## 
## Formula: ln_y ~ A + delta * avexpr + (alpha/(1 - alpha)) * ln_s - (alpha/(1 - 
##     alpha)) * ln_ngd
## 
## Parameters:
##       Estimate Std. Error t value Pr(>|t|)    
## A     -1.61506    0.28494  -5.668 1.21e-07 ***
## alpha  0.56647    0.03543  15.986  < 2e-16 ***
## delta  0.44143    0.04341  10.170  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7528 on 108 degrees of freedom
## 
## Number of iterations to convergence: 5 
## Achieved convergence tolerance: 5.485e-07
##   (52 observations deleted due to missingness)

Result: When we include our new proxy for institutions, the estimate of alpha falls from 0.6097 in the old model to 0.5665, a more realistic yet still high estimate of capital’s share of income. Average protection against expropriation risk was previously left in the error term, where it acted as a confounder. This suggests that alpha was overstated: part of the effect of institutions on GDP per capita was being attributed to savings and population growth. Note that the sample also shrinks from 159 to 111 countries, so some of the change may reflect the different sample.
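
The confounding argument can be illustrated with simulated data. The sketch below is not the AJR data: the variable names, coefficients and sample size are made up for illustration. Institutions (inst) raise both savings and income, so leaving inst out of the regression inflates the coefficient on ln_s.

```r
set.seed(1)
n <- 5000

inst <- rnorm(n)                                # hypothetical institutions index
ln_s <- 0.5 * inst + rnorm(n)                   # savings rise with institutions
ln_y <- 1 + 1.0 * ln_s + 0.8 * inst + rnorm(n)  # true effect of ln_s is 1.0

short <- lm(ln_y ~ ln_s)         # institutions left in the error term
long  <- lm(ln_y ~ ln_s + inst)  # institutions controlled for

b_short <- unname(coef(short)["ln_s"])
b_long  <- unname(coef(long)["ln_s"])

# Omitted variable bias formula: true effect + 0.8 * cov(ln_s, inst) / var(ln_s)
b_formula <- 1.0 + 0.8 * 0.5 / (0.5^2 + 1)

c(short = b_short, long = b_long, formula = b_formula)
```

In this toy setup the short regression overstates the effect of ln_s, and the long regression recovers a value close to the true one.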

Effect of an increase in savings rate

Simulating a new country (straya_3):

s_current <- 26
n_current <- 1.6
# Parameters of the model
A <- coef(mod4)[1]
alpha <- coef(mod4)[2]
se <- 0.7528 # From the summary of mod4
delta <- 0.44143  #From the summary of mod4

# Simulate new country 100,000 times (benchmark/baseline/BAU).
#AUS's avexpr is 9.318182
straya_3 <- rnorm(100000, # Generate 100k new observations
                  mean = A + delta*9.318182 + (alpha/(1-alpha))*log(s_current) - (alpha/(1-alpha))*log(n_current + 1.6 + 5),
                  sd = se)

# Plot a histogram
hist(exp(straya_3), xlim = c(0, 200), breaks = 100)

Observing the effect of a one-percentage-point increase in the savings rate, from 26 to 27 (straya_4):

# Simulate with new savings rate
s_new <- 27

straya_4<- rnorm(100000,
                  mean = A + delta*9.318182 + (alpha/(1-alpha))*log(s_new) - (alpha/(1-alpha))*log(n_current + 1.6 + 5),
                  sd = se)
hist(exp(straya_4), xlim = c(0, 200), breaks = 100)

What is the difference between the two countries?

median(straya_4) - median(straya_3)
## [1] 0.0401914

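Therefore this model predicts that a one-percentage-point increase in the savings rate (from 26 to 27) increases median GDP per capita by about 4.1%: the difference of 0.0402 is in log points, and exp(0.0402) - 1 ≈ 0.041.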
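
Because the simulated values are on the log scale, the difference in medians is a log difference and must be exponentiated to be read as a percentage change. The sketch below converts the simulated figure reported above and, for comparison, the value implied directly by the model. It assumes the alpha estimate copied by hand from the summary(mod4) output.

```r
# Simulated difference in median log(y), copied from the output above
log_diff_sim <- 0.0401914
pct_sim <- 100 * (exp(log_diff_sim) - 1)

# Model-implied shift, using alpha copied by hand from summary(mod4)
alpha_hat <- 0.56647
log_diff_model <- (alpha_hat / (1 - alpha_hat)) * (log(27) - log(26))
pct_model <- 100 * (exp(log_diff_model) - 1)

round(c(simulated = pct_sim, model_implied = pct_model), 2)
```

The two figures differ because the simulated one carries Monte Carlo noise from the random draws; re-running the simulation with a different seed would give a slightly different number.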

Task 2

Use the log of settler mortality (logem4) to instrument for our institutional proxy. Does this change our unrestricted estimates?

mod5 <- ivreg(ln_y ~ avexpr + ln_s + ln_ngd | logem4 + ln_s + ln_ngd, data = dataset3)

summary(mod5)
## 
## Call:
## ivreg(formula = ln_y ~ avexpr + ln_s + ln_ngd | logem4 + ln_s + 
##     ln_ngd, data = dataset3)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -2.067414 -0.584478 -0.005362  0.496341  1.962222 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.002849   4.889020  -0.001   0.9995    
## avexpr       0.833978   0.212263   3.929   0.0002 ***
## ln_s         0.436080   0.356334   1.224   0.2252    
## ln_ngd      -2.055080   1.801921  -1.140   0.2580    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8945 on 69 degrees of freedom
## Multiple R-Squared: 0.5396,  Adjusted R-squared: 0.5196 
## Wald test: 33.78 on 3 and 69 DF,  p-value: 1.501e-13

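Yes. The coefficient on ln_s has decreased from 1.14 to 0.44, the coefficient on ln_ngd has moved towards zero from -3.11 to -2.06, and the coefficient on avexpr has increased from 0.38 to 0.83. Only avexpr remains statistically significant; ln_s and ln_ngd are no longer significant at conventional levels (p = 0.23 and p = 0.26). The sample also falls to 73 countries, those with settler mortality data.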
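
To see what ivreg does in a just-identified model like this one, the two stages can be run by hand on simulated data. The sketch below is not the AJR data: z stands in for log settler mortality, x for avexpr, and u is an unobserved confounder, with all coefficients made up for illustration.

```r
set.seed(42)
n <- 5000

z <- rnorm(n)                     # instrument (stand-in for logem4)
u <- rnorm(n)                     # unobserved confounder
x <- -0.6 * z + u + rnorm(n)      # endogenous regressor (stand-in for avexpr)
y <- 1 + 0.9 * x - u + rnorm(n)   # true effect of x on y is 0.9

# OLS is biased because x is correlated with the error term (-u + noise)
b_ols <- unname(coef(lm(y ~ x))["x"])

# Stage 1: predict x from the instrument
x_hat <- fitted(lm(x ~ z))
# Stage 2: regress y on the predicted x
b_2sls <- unname(coef(lm(y ~ x_hat))["x_hat"])

# With one instrument and one endogenous regressor, 2SLS equals the IV ratio
b_ratio <- cov(z, y) / cov(z, x)

c(ols = b_ols, tsls = b_2sls, ratio = b_ratio)
```

The point estimate from the manual second stage matches the IV estimate, but its reported standard errors would not be valid, because they ignore that x_hat is itself estimated. For inference, a dedicated routine such as ivreg should be used, as in mod5 above.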