Part 1: Comprehension

The Colonial Origins of Comparative Development: An Empirical Investigation.

1. What is the central point of the paper?

Colonies where European settlers faced lower mortality were more likely to become ‘neo-Europes’ and adopt European-style institutions. Because institutions persist, these countries have stronger economic performance and higher GDP per capita today.

2. What variables do the authors use to proxy for current institutions?

Protection from expropriation. Constraints on the executive. Democracy. Property rights.

3. Why can’t we just look at the simple relationship between these variables and today’s development? Isn’t that the causal relationship?

4. What is the instrumental variable the authors describe?

Settler mortality: data on the mortality rates of soldiers, bishops, and sailors stationed in the colonies between the seventeenth and nineteenth centuries. The more hospitable a colony was to Europeans, the more likely it was to become a neo-Europe.

5. What is their exclusion restriction? Do you find it plausible?

The mortality rates of European settlers more than 100 years ago have no effect on GDP per capita today other than through their effect on institutional development. The paper acknowledges that settler mortality could be correlated with the current disease environment, which would also affect GDP per capita today, but argues that this is not the case. The authors check this by controlling for other potentially correlated variables. Using an overidentification test, they find no evidence that settler mortality has a direct impact on economic performance.
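Questions 3 to 5 can be illustrated with a small simulation. The sketch below uses made-up data, not the AJR dataset: `z` stands in for the instrument (settler mortality), `x` for institutions, `u` for an unobserved factor that affects both institutions and income, and `y` for log income. All coefficients are assumptions chosen for illustration. It shows how OLS is biased when `x` is correlated with the error, and how a manual two-stage procedure recovers the true effect when the exclusion restriction holds by construction.

```r
# Simulated illustration of OLS bias versus two-stage least squares (2SLS).
# Nothing here comes from the AJR data; every number is an assumption.
set.seed(42)
n_obs <- 5000

z <- rnorm(n_obs)                      # instrument (stand-in for settler mortality)
u <- rnorm(n_obs)                      # unobserved confounder
x <- -0.8 * z + u + rnorm(n_obs)       # institutions: depend on z and on u
y <- 1.0 * x + 2 * u + rnorm(n_obs)    # log income: true effect of x is 1

# Naive OLS: biased upwards because x and the error both contain u
ols <- unname(coef(lm(y ~ x))["x"])

# Stage 1: project x on the instrument; Stage 2: regress y on the fitted values
x_hat <- fitted(lm(x ~ z))
iv <- unname(coef(lm(y ~ x_hat))["x_hat"])

print(c(ols = ols, iv = iv))
```

The point estimate from the second stage is the 2SLS estimate, but the standard errors reported by that second `lm()` call are not valid, so a dedicated IV routine would be needed for inference.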

Part 2: Using Acemoglu, Johnson and Robinson’s data to update our model from last week.

Task 1

# Load the libraries
library(ggplot2); library(dplyr)
# Read the data
pwt <- read.csv("pwt71_wo_country_names_wo_g_vars.csv")

# Filter out the observations outside the period I'm interested in
pwt.ss <- pwt %>% filter(year<=2010 & year>=1985)

# Generate our data for the regression
pwt.2 <- pwt.ss  %>% group_by(isocode) %>% # For each country
  summarise(s = mean(ki), # What was the average investment?
            y = last(y), # The last GDP per person relative to the US?
            n = 100*log(last(POP)/first(POP))/(n() - 1)) %>% # Population growth rate?
  filter(!is.na(s)) %>% # Get rid of missing rows of s
  mutate(ln_y = log(y), # Create new columns- log of y
         ln_s = log(s), # Log of s
         ln_ngd = log(n + 1.6 + 5)) # Log of n + g + delta

# Modelling! --------------------------------------------------------------

# Run the linear model (unrestricted)
mod1 <- lm(ln_y ~ ln_s + ln_ngd, data = pwt.2)
# Take a look at the parameter estimates
summary(mod1)
## 
## Call:
## lm(formula = ln_y ~ ln_s + ln_ngd, data = pwt.2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.59258 -0.73868  0.00282  0.64383  3.06444 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  11.9528     1.6039   7.453 5.90e-12 ***
## ln_s          0.9987     0.1925   5.187 6.56e-07 ***
## ln_ngd       -5.8616     0.6588  -8.898 1.36e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.001 on 156 degrees of freedom
## Multiple R-squared:  0.4482, Adjusted R-squared:  0.4411 
## F-statistic: 63.35 on 2 and 156 DF,  p-value: < 2.2e-16
# Run the restricted parameter model
mod2 <- nls(ln_y ~ A + (alpha/(1-alpha))*ln_s - (alpha/(1-alpha))*ln_ngd, 
            data = pwt.2,
            start = list(A = 11, alpha = 0.3))
# Take a look at the parameters
summary(mod2)
## 
## Formula: ln_y ~ A + (alpha/(1 - alpha)) * ln_s - (alpha/(1 - alpha)) * 
##     ln_ngd
## 
## Parameters:
##       Estimate Std. Error t value Pr(>|t|)    
## A      1.17489    0.21144   5.557 1.15e-07 ***
## alpha  0.60966    0.02999  20.328  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.134 on 157 degrees of freedom
## 
## Number of iterations to convergence: 6 
## Achieved convergence tolerance: 8.969e-07
# Simulate a new country called Straya -----------------------------------------------

# Exogenous variables
s_current <- 26
n_current <- 1.6

# Parameters of the model
A <- coef(mod2)[1]
alpha <- coef(mod2)[2]
se <- 1.134 # From the summary command


# Simulate new country 100,000 times (benchmark/baseline/BAU)

straya_1 <- rnorm(100000, # Generate 100k new observations
                  mean = A + (alpha/(1-alpha))*log(s_current) - (alpha/(1-alpha))*log(n_current + 1.6 + 5),
                  sd = se)

# Plot a histogram
hist(exp(straya_1), xlim = c(0, 200), breaks = 100)

# Simulate with new savings rate
s_new <- 27

straya_2 <- rnorm(100000,
                  mean = A + (alpha/(1-alpha))*log(s_new) - (alpha/(1-alpha))*log(n_current + 1.6 + 5),
                  sd = se)
hist(exp(straya_2), xlim = c(0, 200), breaks = 100)

# What is the difference in median simulated ln(y) between the scenarios?
median(straya_2) - median(straya_1)
## [1] 0.06605358
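Because the two scenarios differ only in the savings rate, the difference in median ln(y) can also be worked out directly from the restricted model as (alpha/(1-alpha)) * ln(27/26). The sketch below does this with the estimates copied from summary(mod2), and compares the result with the simulated figure. The Monte Carlo standard error uses the large-sample formula for the median of a normal sample and assumes the two sets of draws are independent, as they are in the code above.

```r
# Analytic check on the simulated difference, using values from summary(mod2)
alpha <- 0.60966
se <- 1.134
n_sims <- 100000

slope <- alpha / (1 - alpha)            # coefficient on ln(s)
analytic_diff <- slope * log(27 / 26)   # exact shift in the mean (and median) of ln(y)

# Approximate sampling noise of a median of n_sims normal draws,
# and of the difference between two independent medians
se_median <- se * sqrt(pi / (2 * n_sims))
se_diff <- sqrt(2) * se_median

# How far is the simulated difference from the analytic one, in standard errors?
z_score <- (0.06605358 - analytic_diff) / se_diff

print(c(analytic_diff = analytic_diff, se_diff = se_diff, z_score = z_score))
```

The analytic difference is a little under 0.06, and the simulated 0.066 sits within ordinary simulation noise of it.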

Adding institutions to the model.

\[\ln(A_{i}) = \bar{A} + \delta\,\mbox{avexpr}_{i} + \eta_{i}\]
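
Substituting this expression for ln(A) into last week's steady-state equation gives the equation that the restricted model below estimates. This is a sketch that assumes the same Solow specification as in Task 1, with g + d = 1.6 + 5 as in the code. Here d is written for depreciation to avoid a clash with the institutions coefficient, and a single coefficient on avexpr is shared by all countries, as in the fitted model:

\[\ln(y_{i}) = \bar{A} + \delta\,\mbox{avexpr}_{i} + \frac{\alpha}{1-\alpha}\ln(s_{i}) - \frac{\alpha}{1-\alpha}\ln(n_{i} + g + d) + \eta_{i}\]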

# Load new data
load("ajr.RData")
# Join the datasets
dataset3 <- left_join(pwt.2, acemoglu)
## Joining by: "isocode"
# Run the linear model (unrestricted). avexpr is a score of the average
# protection against expropriation risk from 1985-1995. We'll use this
# variable as a proxy for institutional quality.
mod3 <- lm(ln_y ~ ln_s + ln_ngd + avexpr, data = dataset3)
# Take a look at the parameter estimates
summary(mod3)
## 
## Call:
## lm(formula = ln_y ~ ln_s + ln_ngd + avexpr, data = dataset3)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.91042 -0.46412  0.03536  0.43753  2.07754 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.08682    1.74192   1.772   0.0792 .  
## ln_s         1.14396    0.19254   5.941 3.57e-08 ***
## ln_ngd      -3.11426    0.68606  -4.539 1.48e-05 ***
## avexpr       0.38437    0.04704   8.170 6.66e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7312 on 107 degrees of freedom
##   (48 observations deleted due to missingness)
## Multiple R-squared:  0.7339, Adjusted R-squared:  0.7264 
## F-statistic: 98.35 on 3 and 107 DF,  p-value: < 2.2e-16
# Run the restricted parameter model
mod4 <- nls(ln_y ~ A + delta*avexpr + (alpha/(1-alpha))*ln_s - (alpha/(1-alpha))*ln_ngd, 
            data = dataset3,
            start = list(A = 11, alpha = 0.3, delta = 1))
# Take a look at the parameters
summary(mod4)
## 
## Formula: ln_y ~ A + delta * avexpr + (alpha/(1 - alpha)) * ln_s - (alpha/(1 - 
##     alpha)) * ln_ngd
## 
## Parameters:
##       Estimate Std. Error t value Pr(>|t|)    
## A     -1.61506    0.28494  -5.668 1.21e-07 ***
## alpha  0.56647    0.03543  15.986  < 2e-16 ***
## delta  0.44143    0.04341  10.170  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7528 on 108 degrees of freedom
## 
## Number of iterations to convergence: 5 
## Achieved convergence tolerance: 5.507e-07
##   (48 observations deleted due to missingness)

\(\alpha\) is 0.56647, down from 0.60966 in the initial model.
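
To see what this change in alpha means for the model, it helps to convert between alpha and the slope alpha/(1-alpha) that multiplies ln_s and ln_ngd. The sketch below uses only numbers copied from the four summaries above; the helper names `slope_from_alpha` and `alpha_from_slope` are made up for this example. It also backs out the alpha implied by each unrestricted coefficient, which shows what the restriction is forcing to be equal.

```r
# Convert between the capital share alpha and the slope alpha / (1 - alpha)
slope_from_alpha <- function(alpha) alpha / (1 - alpha)
alpha_from_slope <- function(b) b / (1 + b)

# Restricted estimates of alpha, copied from summary(mod2) and summary(mod4)
alpha_restricted <- c(mod2 = 0.60966, mod4 = 0.56647)
slopes <- slope_from_alpha(alpha_restricted)

# Unrestricted coefficients from summary(mod1) and summary(mod3);
# the ln_ngd coefficients are entered with their sign flipped
implied_alpha <- alpha_from_slope(c(mod1_ln_s = 0.9987, mod1_ln_ngd = 5.8616,
                                    mod3_ln_s = 1.14396, mod3_ln_ngd = 3.11426))

print(round(slopes, 3))
print(round(implied_alpha, 3))
```

In the unrestricted models the two coefficients imply quite different values of alpha (about 0.50 and 0.85 in mod1), and the restricted estimate lies between them. Adding avexpr narrows that gap.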

# Simulate a new country called Straya -----------------------------------------------

# Exogenous variables (same baseline as straya_1)
s_current <- 26
n_current <- 1.6
# Parameters of the model
A <- coef(mod4)[1]
alpha <- coef(mod4)[2]
se <- 0.7528 # From the summary of mod4
delta <- coef(mod4)["delta"] # Estimated coefficient on avexpr

# Simulate new country 100,000 times (benchmark/baseline/BAU). AUS's avexpr is 9.318182

straya_3 <- rnorm(100000, # Generate 100k new observations
                  mean = A + delta*9.318182 + (alpha/(1-alpha))*log(s_current) - (alpha/(1-alpha))*log(n_current + 1.6 + 5),
                  sd = se)

# Plot a histogram
hist(exp(straya_3), xlim = c(0, 200), breaks = 100)

# Simulate with new savings rate
s_new <- 27

straya_4 <- rnorm(100000,
                  mean = A + delta*9.318182 + (alpha/(1-alpha))*log(s_new) - (alpha/(1-alpha))*log(n_current + 1.6 + 5),
                  sd = se)
hist(exp(straya_4), xlim = c(0, 200), breaks = 100)

# What is the difference in median simulated ln(y) between the scenarios?
median(straya_4) - median(straya_3)
## [1] 0.04769185
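
The simulated differences above change from run to run because each scenario uses its own random draws and no seed is set. One way to remove that noise, sketched below, is to fix a seed and reuse the same shocks in both scenarios. The parameter values are copied from summary(mod4) rather than taken from the fitted object, and `mean_ln_y` is a hypothetical helper written for this example.

```r
# Reproducible comparison of the two scenarios using common random numbers
set.seed(1)

# Estimates copied from summary(mod4); avexpr value for AUS as used above
A <- -1.61506
alpha <- 0.56647
delta <- 0.44143
se <- 0.7528
avexpr_aus <- 9.318182
slope <- alpha / (1 - alpha)

# Predicted mean of ln(y) for a given savings rate and population growth rate
mean_ln_y <- function(s, n_rate = 1.6) {
  A + delta * avexpr_aus + slope * log(s) - slope * log(n_rate + 1.6 + 5)
}

# Draw the shocks once and apply them to both scenarios
eps <- rnorm(100000, mean = 0, sd = se)
ln_y_base <- mean_ln_y(26) + eps
ln_y_new <- mean_ln_y(27) + eps

diff_crn <- median(ln_y_new) - median(ln_y_base)
analytic <- slope * log(27 / 26)
pct_change <- 100 * (exp(analytic) - 1)  # implied percentage change in y

print(c(diff_crn = diff_crn, analytic = analytic, pct_change = pct_change))
```

With shared shocks the simulated difference matches the analytic one, a little under 0.05 in logs, which corresponds to y being roughly 5 percent higher. This is smaller than in the model without institutions because the estimated alpha is lower.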