Part 1: Comprehension

What is the central point of the paper?

The central point of the paper is to analyse whether a country's economic development is affected by whether, during colonisation, Europeans settled there or instead set up an extractive regime.
European mortality rates in the colonies are a good proxy for the kind of institutions that were set up. The authors hypothesise that colonies where European mortality was high received extractive institutions that have persisted to the present day. In contrast, in more favourable climates Europeans settled and brought with them a demand for strong institutions, property rights and checks on government power (Australia's Rum Rebellion in the early 1800s is a good example). European mortality was high in regions where malaria and yellow fever were prevalent: having had no exposure during infancy, Europeans were much more susceptible to these diseases than indigenous peoples. However, Europeans also brought new diseases that wiped out native peoples. During Pizarro's conquest of the Inca in South America in the 1500s, a small band of Spaniards brought diseases that wiped out around 50 to 80% of the Inca population at the time.

What variables do the authors use to proxy for current institutions?

The authors proxy current institutions with average protection against expropriation risk over 1985 to 1995 (avexpr in the data). European settler mortality is not itself a measure of institutions: it is used as a proxy for whether a location was suitable for settlement, and therefore whether strong institutions were likely to develop there. The authors then ask whether the institutions developed during colonisation have persisted, and whether they still affect GDP and the quality of institutions today.

The authors look at other variables that have been theorised as affecting development including:

Why can't we just look at the simple relationship between these variables and today's development? Isn't that the causal relationship?

The institutions set up during colonisation continued to be used after independence from the coloniser. The authors find that in extractive colonies, such as those in Latin America, slave labour and monopoly markets persisted long after independence: into the 20th century in Guatemala, until 1888 in Brazil and until 1910 in Mexico.

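A causal relationship is one in which an outcome is the result of an event or behaviour taking place. The simple correlation between institutions and development today cannot be read as causal: richer countries can afford better institutions (reverse causality), and omitted factors such as geography or culture may drive both income and institutions. The authors argue that these other variables are not independent of institutional development, and that what mattered historically was whether Europeans could safely settle in an area: where they could not settle, they did not develop strong institutions.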

European mortality is treated as an exogenous variable that is highly correlated with European settlement, with early institutional development and with institutions today.
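To see why the simple correlation can mislead, here is a small simulation with made-up data (not the AJR data). An unobserved factor, standing in for something like geography or culture, raises both institutions and income, while a hypothetical instrument shifts institutions but is unrelated to that factor. All variable names and coefficients are illustrative assumptions.

```r
# Simulated illustration only: all variables and coefficients are hypothetical
set.seed(42)
n_obs <- 10000

confounder   <- rnorm(n_obs)  # unobserved factor (e.g. geography or culture)
instrument   <- rnorm(n_obs)  # shifts institutions, unrelated to the confounder
institutions <- 0.8 * instrument + confounder + rnorm(n_obs)

# The true causal effect of institutions on log income is 1.0
log_income <- 1.0 * institutions + 2 * confounder + rnorm(n_obs)

# Simple regression slope: picks up the confounder as well as the causal effect
ols_slope <- coef(lm(log_income ~ institutions))[["institutions"]]

# Instrumental-variable (Wald) estimate: uses only instrument-driven variation
iv_slope <- cov(log_income, instrument) / cov(institutions, instrument)

round(c(ols = ols_slope, iv = iv_slope), 2)
```

In large samples the simple slope converges to about 1.76 in this setup, while the IV estimate converges to the true value of 1: the instrument plays the role that settler mortality plays in the paper.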

What is the instrumental variable the authors describe?

The instrumental variable the authors describe is the mortality rate of European settlers during colonisation (M); its logarithm appears as logem4 in the data used in Part 2.

What is their exclusion restriction? Do you find it plausible?

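The exclusion restriction is that, conditional on the controls \(X\), settler mortality \(M\) affects income per capita today only through its effect on institutions. Protection against expropriation \(R\) is treated as endogenous, and \(M\) does not appear in the second-stage equation \(\log y = \mu + \alpha R + X\gamma + \epsilon\).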

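In other words, mortality is excluded from the income equation: it may affect income today only through the institutions it helped to shape. I don't find this particularly plausible. European mortality clearly had a large effect on where settlement or extraction occurred (indeed, the British considered setting up a penal colony on an island in Africa, but the extremely high mortality rates dissuaded them), and that is what makes it a relevant instrument. However, the same disease environment that killed settlers, such as malaria and yellow fever, may still affect health and productivity today, which would give mortality a direct channel to income that bypasses institutions.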
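Written out as a two-stage system, using the second-stage equation above with \(X\) the vector of controls, the instrument enters only the first stage. The first-stage coefficients \(\zeta\), \(\beta\), \(\pi\) and error \(v\) are labels introduced here for illustration.

```latex
\begin{aligned}
R_i      &= \zeta + \beta \log M_i + X_i\pi + v_i        && \text{(first stage)}\\
\log y_i &= \mu + \alpha R_i + X_i\gamma + \epsilon_i    && \text{(second stage, no } \log M_i \text{ term)}\\
\operatorname{Cov}(\log M_i, \epsilon_i) &= 0, \qquad \beta \neq 0 && \text{(exclusion, relevance)}
\end{aligned}
```

Relevance (\(\beta \neq 0\)) can be checked in the data through the first stage; the exclusion condition cannot, because \(\epsilon_i\) is unobserved, so its plausibility has to be argued as above.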

Part 2: Using Acemoglu, Johnson and Robinson's data to update our model from last week

# Load the libraries
library(ggplot2); library(dplyr)
# Read the data
pwt <- read.csv("pwt71.csv")

# Filter out the observations outside the period I'm interested in
pwt.ss <- pwt %>% filter(year<=2010 & year>=1985)

# Generate our data for the regression
pwt.2 <- pwt.ss  %>% group_by(isocode) %>% # For each country
  summarise(s = mean(ki), # What was the average investment?
            y = last(y), # The last GDP per person relative to the US?
            n = 100*log(last(POP)/first(POP))/(n() - 1)) %>% # Population growth rate?
  filter(!is.na(s)) %>% # Get rid of missing rows of s
  mutate(ln_y = log(y), # Create new columns- log of y
         ln_s = log(s), # Log of s
         ln_ngd = log(n + 1.6 + 5)) # Log of n + g + delta
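The population growth line computes an average annual growth rate, in percent, from each country's first and last observations. A quick check on a hypothetical series growing continuously at 2% a year (made-up numbers, not PWT data) confirms the formula returns 2. It relies on each country's rows being in year order and on no years being missing, since n() - 1 is used as the number of elapsed years; sorting with arrange(isocode, year) before group_by() would guard the ordering.

```r
# Hypothetical population series: 26 annual observations (1985-2010)
years <- 1985:2010
pop <- 15 * exp(0.02 * (years - 1985))  # grows continuously at 2% a year

# Same formula as in the summarise() call: 100 * log(last / first) / (number of years - 1)
n_rate <- 100 * log(pop[length(pop)] / pop[1]) / (length(pop) - 1)
n_rate

# ln_ngd then adds g = 1.6 and delta = 5, all measured in percentage points
ln_ngd_example <- log(n_rate + 1.6 + 5)
ln_ngd_example
```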

# Modelling! --------------------------------------------------------------

# Run the linear model (unrestricted)
mod1 <- lm(ln_y ~ ln_s + ln_ngd, data = pwt.2)
# Take a look at the parameter estimates
summary(mod1)
## 
## Call:
## lm(formula = ln_y ~ ln_s + ln_ngd, data = pwt.2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.59258 -0.73868  0.00282  0.64383  3.06444 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  11.9528     1.6039   7.453 5.90e-12 ***
## ln_s          0.9987     0.1925   5.187 6.56e-07 ***
## ln_ngd       -5.8616     0.6588  -8.898 1.36e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.001 on 156 degrees of freedom
## Multiple R-squared:  0.4482, Adjusted R-squared:  0.4411 
## F-statistic: 63.35 on 2 and 156 DF,  p-value: < 2.2e-16
# Run the restricted parameter model
mod2 <- nls(ln_y ~ A + (alpha/(1-alpha))*ln_s - (alpha/(1-alpha))*ln_ngd, 
            data = pwt.2,
            start = list(A = 11, alpha = 0.3))

# Take a look at the mod2 parameter estimates
summary(mod2)
## 
## Formula: ln_y ~ A + (alpha/(1 - alpha)) * ln_s - (alpha/(1 - alpha)) * 
##     ln_ngd
## 
## Parameters:
##       Estimate Std. Error t value Pr(>|t|)    
## A      1.17489    0.21144   5.557 1.15e-07 ***
## alpha  0.60966    0.02999  20.328  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.134 on 157 degrees of freedom
## 
## Number of iterations to convergence: 6 
## Achieved convergence tolerance: 8.969e-07
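The restricted model is the unrestricted one with the constraint that the coefficients on ln_s and ln_ngd are equal and opposite, so the two can be compared with an F-test. The sketch below approximates it from the rounded residual standard errors printed above, so the statistic is approximate. An exact version on the data would be anova(lm(ln_y ~ I(ln_s - ln_ngd), data = pwt.2), mod1), which fits the same restriction as a linear model.

```r
# Residual standard errors and degrees of freedom as printed for mod1 and mod2
rse_unrestricted <- 1.001; df_unrestricted <- 156
rse_restricted   <- 1.134; df_restricted   <- 157

# Residual sums of squares recovered as RSE^2 * df
rss_u <- rse_unrestricted^2 * df_unrestricted
rss_r <- rse_restricted^2 * df_restricted

# One restriction: coefficient on ln_ngd = -(coefficient on ln_s)
q <- df_restricted - df_unrestricted
f_stat <- ((rss_r - rss_u) / q) / (rss_u / df_unrestricted)
p_value <- pf(f_stat, q, df_unrestricted, lower.tail = FALSE)

round(f_stat, 1)
p_value
```

An F statistic of about 45 on (1, 156) degrees of freedom suggests the data reject the equal-and-opposite restriction, which is consistent with the unrestricted estimates of 0.9987 and -5.8616 being far from mirror images.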
# Simulate a new country called Straya -----------------------------------------------

# Exogenous variables
s_current <- 26
n_current <- 1.6

# Parameters of the model
A <- coef(mod2)[1]
alpha <- coef(mod2)[2]
se <- summary(mod2)$sigma # Residual standard error (1.134 in the summary above)


# Simulate the new country 100,000 times (benchmark/baseline/BAU)

straya_1 <- rnorm(100000, # Generate 100k new observations
                  mean = A + (alpha/(1-alpha))*log(s_current) - (alpha/(1-alpha))*log(n_current + 1.6 + 5),
                  sd = se)

# Plot a histogram
hist(exp(straya_1), xlim = c(0, 200), breaks = 100)

# Simulate with new savings rate
s_new <- 27

straya_2 <- rnorm(100000,
                  mean = A + (alpha/(1-alpha))*log(s_new) - (alpha/(1-alpha))*log(n_current + 1.6 + 5),
                  sd = se)
hist(exp(straya_2), xlim = c(0, 200), breaks = 100)

# What is the difference between the scenarios' median simulated log incomes?
median(straya_2) - median(straya_1)
## [1] 0.06628292
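Because the savings rate only shifts the mean of the simulated log income, the difference in medians has an analytical counterpart, (alpha/(1 - alpha)) * (log(27) - log(26)). The sketch below plugs in the printed mod2 estimates and also estimates the Monte Carlo noise in the simulated difference. The common-random-numbers variant (reusing one set of draws for both scenarios) is a swapped-in technique, not what the code above does.

```r
# Point estimates printed by summary(mod2)
A_hat <- 1.17489
alpha_hat <- 0.60966
sd_hat <- 1.134
ngd <- log(1.6 + 1.6 + 5)  # log(n + g + delta) with n = 1.6

mean_ln_y <- function(s) A_hat + (alpha_hat / (1 - alpha_hat)) * (log(s) - ngd)

# Analytical difference in mean (= median) log income between s = 27 and s = 26
analytic_diff <- mean_ln_y(27) - mean_ln_y(26)

# SE of the difference of two independent sample medians, 100k draws each;
# a normal sample median has sd of roughly sqrt(pi / 2) * sigma / sqrt(n)
mc_se <- sqrt(2) * sqrt(pi / 2) * sd_hat / sqrt(1e5)

# Common random numbers: reuse one set of draws so the noise cancels
set.seed(123)
draws <- rnorm(1e5, sd = sd_hat)
crn_diff <- median(mean_ln_y(27) + draws) - median(mean_ln_y(26) + draws)

round(c(analytic = analytic_diff, mc_se = mc_se, crn = crn_diff), 4)
```

The analytical effect is about 0.059 log points (roughly 6%); the simulated 0.066 sits about one Monte Carlo standard error (about 0.006) above it, so the gap is consistent with simulation noise.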

Task 1

# Task 1, introducing ajr.RData -------------------------------------------

load("ajr.RData")

acemoglu$isocode <- as.character(acemoglu$isocode) # match pwt.2's character isocode to avoid a factor/character coercion warning
dataset3 <- left_join(pwt.2, acemoglu, by = "isocode") # join pwt.2 and acemoglu on the country code
dataset4 <- dataset3 %>% filter(!is.na(avexpr)) %>% # Get rid of missing rows of avexpr, in new dataset dataset4
  filter(!is.na(logem4))  #Get rid of missing rows of logem4

mod3 <- lm(ln_y ~ ln_s + ln_ngd + avexpr, data = dataset4) #creating a linear model including the variable of institutional quality

summary(mod3) #summary of mod3
## 
## Call:
## lm(formula = ln_y ~ ln_s + ln_ngd + avexpr, data = dataset4)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.20719 -0.42763 -0.02798  0.49676  1.32809 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  7.84434    2.46420   3.183  0.00218 ** 
## ln_s         0.76539    0.23815   3.214  0.00199 ** 
## ln_ngd      -4.85618    0.93961  -5.168 2.19e-06 ***
## avexpr       0.39884    0.05701   6.996 1.35e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6587 on 69 degrees of freedom
## Multiple R-squared:  0.7504, Adjusted R-squared:  0.7395 
## F-statistic: 69.13 on 3 and 69 DF,  p-value: < 2.2e-16
# Run the restricted parameter model, adding institutional quality as delta * avexpr
mod4 <- nls(ln_y ~ A + (alpha/(1-alpha))*ln_s + delta*avexpr - (alpha/(1-alpha))*ln_ngd,
            data = dataset4,
            start = list(A = 11, alpha = 0.3, delta = 1))

#summary of mod4
summary(mod4)
## 
## Formula: ln_y ~ A + (alpha/(1 - alpha)) * ln_s + delta * avexpr - (alpha/(1 - 
##     alpha)) * ln_ngd
## 
## Parameters:
##       Estimate Std. Error t value Pr(>|t|)    
## A     -1.74786    0.36606  -4.775 9.56e-06 ***
## alpha  0.55599    0.04403  12.626  < 2e-16 ***
## delta  0.46850    0.05951   7.873 3.13e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7234 on 70 degrees of freedom
## 
## Number of iterations to convergence: 5 
## Achieved convergence tolerance: 7.793e-08
straya1 <- dataset4 %>% filter(isocode == "AUS") # keep only Australia's row
s_straya <- straya1$s # Australia's average investment share, s
n_straya <- straya1$n # Australia's population growth rate, n
avexpr_aus <- straya1$avexpr # Australia's protection against expropriation, avexpr

#model parameters

A2 <- coef(mod4)[1] # estimate of A from mod4
alpha2 <- coef(mod4)[2] # estimate of alpha from mod4
delta <- coef(mod4)[3] # estimate of delta, the coefficient on avexpr, from mod4
se2 <- summary(mod4)$sigma # residual standard error (0.7234 in the summary above)

straya_3 <- rnorm(100000, # Generate 100k new observations
                  mean = A2 + delta * avexpr_aus + (alpha2/(1-alpha2))*log(s_straya) - (alpha2/(1-alpha2))*log(n_straya + 1.6 + 5),
                  sd = se2)

hist(exp(straya_3), xlim = c(0, 500), breaks = 150) # histogram of the straya_3 observations

s_straya1 <- s_straya + 1 # increase Australia's savings rate by 1 percentage point

straya_4 <- rnorm(100000, # Generate 100k new observations of australia with increased savings rate
                  mean = A2 + delta*avexpr_aus + (alpha2/(1-alpha2))*log(s_straya1) - (alpha2/(1-alpha2))*log(n_straya + 1.6 + 5),
                  sd = se2)

hist(exp(straya_4), xlim = c(0, 500), breaks = 150) # histogram of the straya_4 observations

median(straya_4)-median(straya_3)
## [1] 0.04151714
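In mod4 the institutional term delta * avexpr adds the same amount to both scenarios, so it shifts the level of predicted income but cancels out of the effect of a higher savings rate. The sketch below checks this with the printed mod4 estimates. The savings rate s = 26 and growth rate n = 1.6 are hypothetical values borrowed from the earlier Straya example, not Australia's actual data, and predict_ln_y() is a helper introduced here.

```r
# Point estimates printed by summary(mod4)
A2_hat <- -1.74786
alpha2_hat <- 0.55599
delta_hat <- 0.46850

# Hypothetical helper: mean log income under the restricted model with institutions
predict_ln_y <- function(s, n, avexpr, g = 1.6, dep = 5) {
  A2_hat + delta_hat * avexpr + (alpha2_hat / (1 - alpha2_hat)) * (log(s) - log(n + g + dep))
}

# Effect of raising s from 26 to 27 at two very different levels of avexpr
effect_low  <- predict_ln_y(27, 1.6, avexpr = 5)   - predict_ln_y(26, 1.6, avexpr = 5)
effect_high <- predict_ln_y(27, 1.6, avexpr = 9.5) - predict_ln_y(26, 1.6, avexpr = 9.5)

# Both equal (alpha / (1 - alpha)) * log(27 / 26)
round(c(low = effect_low, high = effect_high), 4)
```

The smaller alpha (0.556 versus 0.610 in mod2) also means a smaller response to saving, about 0.047 rather than 0.059 log points for the same one-point change, which helps explain why the simulated median difference here is smaller than in the Straya example.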

Task 2

# Task 2 ------------------------------------------------------------------


library(AER) # load AER, which provides ivreg() for task 2
# Run the linear model (unrestricted), adding log settler mortality (logem4) directly as a regressor

task2_mod1 <- lm(ln_y ~ ln_s + ln_ngd + logem4,
                 data = dataset4)

summary(task2_mod1)
## 
## Call:
## lm(formula = ln_y ~ ln_s + ln_ngd + logem4, data = dataset4)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.89277 -0.49977  0.09557  0.49210  1.52387 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 11.82426    2.64072   4.478 2.91e-05 ***
## ln_s         0.80430    0.27254   2.951 0.004323 ** 
## ln_ngd      -4.70480    1.14318  -4.116 0.000105 ***
## logem4      -0.39129    0.08351  -4.686 1.36e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.75 on 69 degrees of freedom
## Multiple R-squared:  0.6763, Adjusted R-squared:  0.6622 
## F-statistic: 48.05 on 3 and 69 DF,  p-value: < 2.2e-16
# ivreg as written: ln_s is treated as endogenous and instrumented by logem4; ln_ngd is exogenous and avexpr does not enter

task2_mod2 <- ivreg(ln_y ~ ln_s + ln_ngd | ln_ngd + logem4,
                    data = dataset4)

summary(task2_mod2)
## 
## Call:
## ivreg(formula = ln_y ~ ln_s + ln_ngd | ln_ngd + logem4, data = dataset4)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.4519 -1.5995  0.2763  1.2965  6.3358 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  -23.061     24.674  -0.935   0.3532  
## ln_s           7.007      3.751   1.868   0.0659 .
## ln_ngd         2.120      6.551   0.324   0.7472  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.172 on 70 degrees of freedom
## Multiple R-Squared: -1.754,  Adjusted R-squared: -1.832 
## Wald test: 8.076 on 2 and 70 DF,  p-value: 0.0006982
# ivreg as written: ln_s is treated as endogenous and instrumented by avexpr; ln_ngd is exogenous

task2_mod3 <- ivreg(ln_y ~ ln_s + ln_ngd | ln_ngd + avexpr,
                    data = dataset4)

summary(task2_mod3)
## 
## Call:
## ivreg(formula = ln_y ~ ln_s + ln_ngd | ln_ngd + avexpr, data = dataset4)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.2815 -2.0909  0.1943  1.7075  9.1022 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -42.015     39.925  -1.052    0.296
## ln_s           9.962      6.103   1.632    0.107
## ln_ngd         6.869     10.474   0.656    0.514
## 
## Residual standard error: 3.11 on 70 degrees of freedom
## Multiple R-Squared: -4.645,  Adjusted R-squared: -4.806 
## Wald test: 4.421 on 2 and 70 DF,  p-value: 0.01556
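Both ivreg calls above list logem4 or avexpr only to the right of the bar, so R treats ln_s as the endogenous regressor and never estimates an effect of institutions. In the AJR setup it is institutions that are endogenous: avexpr belongs among the regressors, logem4 replaces it among the instruments, and the exogenous controls appear on both sides, i.e. ivreg(ln_y ~ ln_s + ln_ngd + avexpr | ln_s + ln_ngd + logem4, data = dataset4). No output for that call is shown here. The sketch below illustrates the same two-stage logic by hand on simulated data (all values and coefficients are made up), using only base R. Running the stages by hand reproduces the 2SLS point estimate, but the second-stage standard errors from lm() are not valid; ivreg() corrects them.

```r
# Simulated data only: names mirror dataset4, but all values are hypothetical
set.seed(2001)
n_obs <- 5000
logem4  <- rnorm(n_obs, mean = 4.5, sd = 1.2)  # log settler mortality (instrument)
ln_s    <- rnorm(n_obs, mean = 3, sd = 0.4)    # exogenous control
ln_ngd  <- rnorm(n_obs, mean = 2, sd = 0.1)    # exogenous control
culture <- rnorm(n_obs)                        # unobserved confounder

# Institutions depend on mortality and on the confounder, so avexpr is endogenous
avexpr <- 9 - 0.6 * logem4 + 0.3 * ln_s + culture + rnorm(n_obs)
# The true effect of avexpr on log income is 0.8
ln_y <- 2 + 0.5 * ln_s - 1 * ln_ngd + 0.8 * avexpr + 0.7 * culture + rnorm(n_obs, sd = 0.5)

# Stage 1: endogenous regressor on all exogenous variables (controls + instrument)
stage1 <- lm(avexpr ~ ln_s + ln_ngd + logem4)
avexpr_hat <- fitted(stage1)

# Stage 2: replace avexpr with its first-stage fitted values
stage2 <- lm(ln_y ~ ln_s + ln_ngd + avexpr_hat)
beta_2sls <- coef(stage2)[["avexpr_hat"]]

# OLS for comparison: biased upward because avexpr is correlated with culture
beta_ols <- coef(lm(ln_y ~ ln_s + ln_ngd + avexpr))[["avexpr"]]

round(c(ols = beta_ols, two_sls = beta_2sls), 3)
```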