Week Six

Part One: Comprehension based on ‘The Colonial Origins of Comparative Development: An Empirical Investigation’ by Acemoglu, Johnson & Robinson (2001).

What is the central point of the paper?

The paper examines whether the quality of a state's institutions has a causal effect on its income per capita. Because institutions are shaped by many measurable and unmeasurable factors that cannot all be controlled for, the authors use the mortality rates faced by European settlers during colonisation, more than 100 years ago, as an instrument for current institutions. This allows the effect of institutions on income to be estimated by regression.

What variables do the authors use to proxy for current institutions?

The authors use the index of protection against “risk of expropriation” from Political Risk Services as a proxy for current institutions. The index measures how secure property rights are in each state. In theory, the risk of expropriation reflects differences in institutions arising from different types of states and state policies, so it should indicate the quality of the institutions in place.

Why can’t we just look at the simple relationship between these variables and today’s development? Isn’t that the causal relationship?

The problem with running a simple regression of income on protection against expropriation is that reverse causality may be present: the dependent variable may itself affect one of the independent variables. Here, wealthier states may be able to afford robust institutions whilst poorer countries cannot. This feedback loop between the dependent and independent variables makes the institutions measure endogenous, that is, correlated with the error term, so the OLS estimate is biased. The authors correct for this with an instrumental variable: the settler mortality rate is used to isolate the portion of the variation in expropriation risk that is uncorrelated with the error term.
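A small simulation may make the bias concrete. The sketch below uses made-up data, not the AJR dataset: `u` is a hypothetical unobserved factor that moves both the regressor `x` (standing in for institutions) and the outcome `y` (standing in for log income), `z` is a stand-in instrument, and the true effect of `x` is set to 0.5 by assumption. It illustrates the general problem of a regressor that is correlated with the error term, rather than the authors' own procedure.

```r
set.seed(1)
n <- 20000

z <- rnorm(n)                        # hypothetical instrument
u <- rnorm(n)                        # unobserved factor (ends up in the error term)
x <- -0.6 * z + 0.8 * u + rnorm(n)   # endogenous regressor: depends on u
y <- 1 + 0.5 * x + u + rnorm(n)      # outcome: true effect of x is 0.5

# OLS slope: contaminated by the correlation between x and u
ols <- coef(lm(y ~ x))["x"]

# Simple IV estimate: only uses the variation in x that comes from z
iv <- cov(z, y) / cov(z, x)

round(c(ols = unname(ols), iv = iv), 3)
```

Under these assumed parameters the OLS slope should settle near 0.9 rather than 0.5, while the IV ratio should sit close to the true value.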

What is the instrumental variable the authors describe?

The instrument is the mortality rate faced by the first European settlers in the colonies, which the authors use to instrument for current institutions as measured by the risk of expropriation. Because these mortality rates are arguably exogenous, they are useful for isolating the effect of institutions on economic performance. The argument is that settler mortality rates were a major determinant of settlements; settlements were a major determinant of early institutions; and there is a strong correlation between early institutions and institutions today.

What is their exclusion restriction? Do you find it plausible?

The exclusion restriction is that the mortality rates of European settlers more than 100 years ago have no effect on GDP per capita today, other than through their effect on institutional development. The major concern of Acemoglu, Johnson and Robinson is that settler mortality may be correlated with the current disease environment, which may have a direct effect on economic performance.

It is difficult to say whether this restriction is plausible. Besides the authors’ own misgivings, a high mortality rate could have had an adverse effect on the development of the economy over time by reducing population, leading to slower technological progress and therefore lower output. However, it could also be argued that settler mortality need not reflect the mortality of the native population, who may have developed methods to avoid disease that the settlers did not have. For example, settlers in places such as the Belgian Congo had an extremely high mortality rate because they inhabited disease-ridden sections of the country, whilst the native population did not.
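To see why the exclusion restriction matters, the following sketch simulates what happens to an IV estimate when the instrument also affects the outcome directly. The data are invented and every number is an assumption: `z` plays the role of settler mortality, `x` of institutions, the true effect of `x` is 0.5, and `direct` is a hypothetical direct effect of `z` on the outcome (for example, through the disease environment).

```r
set.seed(7)
n <- 20000

z <- rnorm(n)               # hypothetical instrument
x <- -0.6 * z + rnorm(n)    # institutions fall as z rises

# IV slope when z has a direct effect of size `direct` on the outcome
iv_slope <- function(direct) {
  y <- 0.5 * x + direct * z + rnorm(n)   # true effect of x is 0.5
  cov(z, y) / cov(z, x)
}

valid    <- iv_slope(0)      # exclusion restriction holds
violated <- iv_slope(-0.3)   # z also lowers the outcome directly

round(c(valid = valid, violated = violated), 3)
```

With these assumed values the IV estimate is close to 0.5 when the restriction holds and close to 1.0 when it fails, so a direct negative effect of the instrument would overstate the effect of institutions.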

Part Two: Using Acemoglu, Johnson and Robinson’s data to update our model from last week.

Task One:

Given Code:

library(dplyr); library(ggplot2)

## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

# Read the data
pwt <- read.csv("pwt71.csv")

# Filter out the observations outside the period I'm interested in
pwt.ss <- pwt %>% filter(year<=2010 & year>=1985)

# Generate our data for the regression
pwt.2 <- pwt.ss  %>% group_by(isocode) %>% # For each country
  summarise(s = mean(ki), # What was the average investment?
            y = last(y), # The last GDP per person relative to the US?
            n = 100*log(last(POP)/first(POP))/(n() - 1)) %>% # Population growth rate?
  filter(!is.na(s)) %>% # Get rid of missing rows of s
  mutate(ln_y = log(y), # Create new columns- log of y
         ln_s = log(s), # Log of s
         ln_ngd = log(n + 1.6 + 5)) # Log of n + g + delta

# Modelling! --------------------------------------------------------------

# Run the linear model (unrestricted)
mod1 <- lm(ln_y ~ ln_s + ln_ngd, data = pwt.2)
# Take a look at the parameter estimates
summary(mod1)

## 
## Call:
## lm(formula = ln_y ~ ln_s + ln_ngd, data = pwt.2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.59258 -0.73868  0.00282  0.64383  3.06444 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  11.9528     1.6039   7.453 5.90e-12 ***
## ln_s          0.9987     0.1925   5.187 6.56e-07 ***
## ln_ngd       -5.8616     0.6588  -8.898 1.36e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.001 on 156 degrees of freedom
## Multiple R-squared:  0.4482, Adjusted R-squared:  0.4411 
## F-statistic: 63.35 on 2 and 156 DF,  p-value: < 2.2e-16

# Run the restricted parameter model
mod2 <- nls(ln_y ~ A + (alpha/(1-alpha))*ln_s - (alpha/(1-alpha))*ln_ngd, 
            data = pwt.2,
            start = list(A = 11, alpha = 0.3))
# Take a look at the parameters
summary(mod2)

## 
## Formula: ln_y ~ A + (alpha/(1 - alpha)) * ln_s - (alpha/(1 - alpha)) * 
##     ln_ngd
## 
## Parameters:
##       Estimate Std. Error t value Pr(>|t|)    
## A      1.17489    0.21144   5.557 1.15e-07 ***
## alpha  0.60966    0.02999  20.328  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.134 on 157 degrees of freedom
## 
## Number of iterations to convergence: 6 
## Achieved convergence tolerance: 8.969e-07

# Simulate a new country called Straya -----------------------------------------------

# Exogenous variables
s_current <- 26
n_current <- 1.6

# Parameters of the model
A <- coef(mod2)[1]
alpha <- coef(mod2)[2]
se <- 1.134 # From the summary command


# Simulate new country 100,000 times (benchmark/baseline/BAU)

straya_1 <- rnorm(100000, # Generate 100k new observations
                  mean = A + (alpha/(1-alpha))*log(s_current) - (alpha/(1-alpha))*log(n_current + 1.6 + 5),
                  sd = se)

# Plot a histogram
hist(exp(straya_1), xlim = c(0, 200), breaks = 100)

# Simulate with new savings rate
s_new <- 27

straya_2 <- rnorm(100000,
                  mean = A + (alpha/(1-alpha))*log(s_new) - (alpha/(1-alpha))*log(n_current + 1.6 + 5),
                  sd = se)
hist(exp(straya_2), xlim = c(0, 200), breaks = 100)

# What is the difference in median simulations between the scenarios? 
median(straya_2) - median(straya_1)

## [1] 0.06918354

New Code:

# Loading given data 
load("ajr.RData")

# Joining the Dataset
pwt.ss3 <- left_join(pwt.2, acemoglu)

## Joining by: "isocode"

## Warning in left_join_impl(x, y, by$x, by$y): joining character vector and
## factor, coercing into character vector

pwt.ss3 <- pwt.ss3 %>% filter(!is.na(avexpr), !is.na(logem4), !is.na(n))

The linear regression with the institution proxy:

# The Linear Model
linmodprox <- lm(ln_y ~ ln_s + ln_ngd + avexpr, data = pwt.ss3)

#Summary Statistics
summary(linmodprox)

## 
## Call:
## lm(formula = ln_y ~ ln_s + ln_ngd + avexpr, data = pwt.ss3)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.20719 -0.42763 -0.02798  0.49676  1.32809 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  7.84434    2.46420   3.183  0.00218 ** 
## ln_s         0.76539    0.23815   3.214  0.00199 ** 
## ln_ngd      -4.85618    0.93961  -5.168 2.19e-06 ***
## avexpr       0.39884    0.05701   6.996 1.35e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6587 on 69 degrees of freedom
## Multiple R-squared:  0.7504, Adjusted R-squared:  0.7395 
## F-statistic: 69.13 on 3 and 69 DF,  p-value: < 2.2e-16

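The unrestricted estimates do not imply similar values of \(\alpha\): the restricted model requires the coefficients on ln_s and ln_ngd to be equal in size and opposite in sign, but the estimates are \(0.76539\) and \(-4.85618\).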
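As a quick check, the value of \(\alpha\) implied by each unrestricted coefficient can be backed out by hand. The sketch below assumes the restricted model's mapping, where the coefficient on ln_s equals \(\alpha/(1-\alpha)\) and the coefficient on ln_ngd equals \(-\alpha/(1-\alpha)\); the variable names are illustrative only.

```r
# Coefficients copied from summary(linmodprox) above
b_s   <- 0.76539    # coefficient on ln_s
b_ngd <- -4.85618   # coefficient on ln_ngd

# Invert b = alpha / (1 - alpha)  =>  alpha = b / (1 + b)
alpha_from_s   <- b_s / (1 + b_s)
# Invert b = -alpha / (1 - alpha) =>  alpha = -b / (1 - b)
alpha_from_ngd <- -b_ngd / (1 - b_ngd)

round(c(from_ln_s = alpha_from_s, from_ln_ngd = alpha_from_ngd), 3)
```

The two implied values come out at roughly 0.43 and 0.83, which is the sense in which the unrestricted estimates disagree about \(\alpha\).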

The non-linear model with institution proxy:

modstrict<- nls(ln_y ~ A + delta*avexpr + (alpha/(1-alpha))*ln_s - (alpha/(1-alpha))*ln_ngd, 
            data = pwt.ss3,
            start = list(A = 11, alpha = 0.3, delta = 1))
summary(modstrict)

## 
## Formula: ln_y ~ A + delta * avexpr + (alpha/(1 - alpha)) * ln_s - (alpha/(1 - 
##     alpha)) * ln_ngd
## 
## Parameters:
##       Estimate Std. Error t value Pr(>|t|)    
## A     -1.74786    0.36606  -4.775 9.56e-06 ***
## alpha  0.55599    0.04403  12.626  < 2e-16 ***
## delta  0.46850    0.05951   7.873 3.13e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7234 on 70 degrees of freedom
## 
## Number of iterations to convergence: 5 
## Achieved convergence tolerance: 6.866e-08

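Our estimate of \(\alpha\) has fallen from \(0.60966\) to \(0.55599\), although the two models are estimated on different samples (159 countries before the join, 73 after).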

Effect of changing the Savings Rate in Australia by 1 Percentage Point:

# Model taken from above, changed to show a 1 percentage point increase in the savings rate
s_current <- 26 
n_current <- 1.6

# Parameters of the model
A2 <- coef(modstrict)[1]
alpha2 <- coef(modstrict)[2]
se2 <- 0.7234 # From the summary command
delta2 <-coef(modstrict)[3]

# Simulate new country 100,000 times (benchmark/baseline/BAU)
# 9.318182 is the avexpr value plugged in for Australia (presumably its value in the AJR data)

aus_new <- rnorm(100000, # Generate 100k new observations
                  mean = A2 + delta2*9.318182 + (alpha2/(1-alpha2))*log(s_current) - (alpha2/(1-alpha2))*log(n_current + 1.6 + 5),
                  sd = se2)

# Plot a histogram
hist(exp(aus_new), xlim = c(0, 200), breaks = 100)

# Simulate with new savings rate
s_new <- 27

aus_sr <- rnorm(100000,
                  mean = A2 + delta2*9.318182 + (alpha2/(1-alpha2))*log(s_new) - (alpha2/(1-alpha2))*log(n_current + 1.6 + 5),
                  sd = se2) # residual SE of modstrict, not the se taken from mod2
hist(exp(aus_sr), xlim = c(0, 200), breaks = 100)

# What is the difference in median simulations between the scenarios? 

median(aus_sr) - median(aus_new)

## [1] 0.05379889
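Because the median of a normal distribution equals its mean, the difference between the two scenarios can also be computed without simulation: every term except the savings term cancels. The sketch below does this for both fitted models, using the point estimates of \(\alpha\) copied from the summaries above; `log_gap` is a helper name introduced here for illustration.

```r
# Point estimates of alpha copied from the two nls summaries above
alpha_base  <- 0.60966   # mod2 (no institutions term)
alpha_instn <- 0.55599   # modstrict (with avexpr)

# Gap in log(y) between scenarios: alpha/(1 - alpha) * (log(s_new) - log(s_current))
log_gap <- function(alpha, s_current = 26, s_new = 27) {
  alpha / (1 - alpha) * (log(s_new) - log(s_current))
}

gap_base  <- log_gap(alpha_base)
gap_instn <- log_gap(alpha_instn)

# Gaps in log points, and the approximate percentage change in y they imply
round(c(base = gap_base, institutions = gap_instn), 4)
round(100 * (exp(c(base = gap_base, institutions = gap_instn)) - 1), 2)
```

The analytic gaps are about 0.059 and 0.047 log points. The simulated differences reported above (0.069 and 0.054) differ from these only through Monte Carlo noise in the medians, and will change from run to run because no seed is set.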

Task Two: Using the log of settler mortality to instrument for the institutional proxy:

The original equation is:

\(\log(y) = \beta_0+\beta_1 avexpr+\beta_2\log(s)+\beta_3\log(n+g+\delta)+ e\)

However, the measure of protection against expropriation risk, \(avexpr\), is arguably endogenous (correlated with the error term \(e\)), so OLS gives a biased estimate of \(\beta_1\). By using a two-stage least squares regression with an instrumental variable we are able to correct for this.

First stage: regress the endogenous variable, \(avexpr\), on the instrument (log settler mortality, \(\log(em4)\)) and the other independent variables to obtain fitted values that are uncorrelated with the error term:

\(\widehat{avexpr} = \hat{\gamma_0} + \hat{\gamma_1}\log(em4)+\hat{\gamma_2}\log(s)+\hat{\gamma_3}\log(n+g+\delta)\)

Second stage: substitute the fitted values, \(\widehat{avexpr}\), into the regression:

\(\log(y) = \beta_0+\beta_1\widehat{avexpr}+\beta_2\log(s)+\beta_3\log(n+g+\delta)+ e\)
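The two stages can be run by hand with `lm()`. The sketch below does so on synthetic data, since the AJR file is not reproduced here: the data frame `sim` reuses the column names from this report, but every value and coefficient in it is invented for illustration. It also computes the just-identified IV estimator \((Z'X)^{-1}Z'y\) directly as a cross-check.

```r
set.seed(42)
n <- 500

# Synthetic data with the same column names as pwt.ss3 (all values are made up)
logem4 <- rnorm(n, mean = 4.6, sd = 1.2)
ln_s   <- rnorm(n, mean = 3.0, sd = 0.5)
ln_ngd <- rnorm(n, mean = 2.1, sd = 0.1)
u      <- rnorm(n)   # unobserved factor making avexpr endogenous
avexpr <- 9 - 0.6 * logem4 + 0.5 * ln_s + 0.7 * u + rnorm(n, sd = 0.8)
ln_y   <- 1 + 0.8 * avexpr + 0.5 * ln_s - 2 * ln_ngd + u
sim    <- data.frame(ln_y, avexpr, ln_s, ln_ngd, logem4)

# First stage: endogenous variable on the instrument and exogenous regressors
stage1 <- lm(avexpr ~ logem4 + ln_s + ln_ngd, data = sim)
sim$avexpr_hat <- fitted(stage1)

# Second stage: replace avexpr with its first-stage fitted values
stage2 <- lm(ln_y ~ avexpr_hat + ln_s + ln_ngd, data = sim)

# Cross-check: closed-form IV estimator for the just-identified case
X <- model.matrix(~ avexpr + ln_s + ln_ngd, data = sim)
Z <- model.matrix(~ logem4 + ln_s + ln_ngd, data = sim)
beta_iv <- solve(crossprod(Z, X), crossprod(Z, sim$ln_y))

cbind(two_stage = coef(stage2), closed_form = as.vector(beta_iv))
```

The point estimates from the manual second stage match the closed-form IV estimates, but the standard errors that `summary(stage2)` would report are not valid, because they are based on residuals computed with the fitted values rather than the original regressor. A dedicated routine such as `ivreg()` handles that correction.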

# ivreg() comes from the AER package, which is not loaded above
library(AER)

instrmod <- ivreg(ln_y ~ avexpr + ln_s + ln_ngd | ln_s +logem4 + ln_ngd, data = pwt.ss3)

summary(instrmod)

## 
## Call:
## ivreg(formula = ln_y ~ avexpr + ln_s + ln_ngd | ln_s + logem4 + 
##     ln_ngd, data = pwt.ss3)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -2.067414 -0.584478 -0.005362  0.496341  1.962222 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.002849   4.889020  -0.001   0.9995    
## avexpr       0.833978   0.212263   3.929   0.0002 ***
## ln_s         0.436080   0.356334   1.224   0.2252    
## ln_ngd      -2.055080   1.801921  -1.140   0.2580    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8945 on 69 degrees of freedom
## Multiple R-Squared: 0.5396,  Adjusted R-squared: 0.5196 
## Wald test: 33.78 on 3 and 69 DF,  p-value: 1.501e-13

Our estimates shift dramatically relative to the OLS regression. The coefficient on protection against expropriation risk rises from \(0.39884\) to \(0.833978\). The coefficient on the log of the savings rate falls from \(0.76539\) to \(0.436080\), and the coefficient on the log of (population growth rate \(n\) + growth rate of labour-augmenting technology \(g\) + depreciation rate \(\delta\)) shrinks in magnitude from \(-4.85618\) to \(-2.055080\). Neither of the latter two coefficients is statistically significant in the instrumented regression.