Question 1
The central point of the paper is to explore the hypothesis that mortality rates in European colonial settlements are correlated with the establishment of extractive institutions, and can therefore be used as a predictor of the quality of modern institutions.
Question 2
An index of protection against expropriation, taken from the Political Risk Services data, is used in the paper as the primary proxy for the quality of modern institutions. The dataset comprises a 0-10 rating of protection (10 being the most protection) for each country, averaged over the period 1985-1995. A dataset of constraints on the executive developed by Ted Gurr, for the year 1990, is included as an alternative proxy.
Two measures of institutional quality in the colonial period (circa 1900) are also provided: a measure of constraints on the executive and a democracy rating. Finally, the value of constraints on the executive in the first year of independence from colonialism is used as a further proxy for institutional quality.
Question 3
We cannot look at the simple relationship between these variables and today’s development, as this does not take into account the confounding variables that may affect both institutional quality and development. Because we have not taken these potential confounders into account, we cannot assume the relationship is causal.
Question 4
The authors use the mortality rates of early European settlers in each country as an instrumental variable for current institutional quality. They argue that settler mortality is associated with institutional quality but is otherwise unrelated to current development, which deals with the potential confounding and reverse causality between current development and institutional quality.
Question 5
The exclusion restriction required for the above instrumental variable is that the mortality rates of European colonial settlers have no relationship with GDP today except through their relationship with institutional quality. That is, settler mortality must not affect today’s GDP through any channel other than institutions, and there can be no confounding variable linking the instrument directly to GDP.
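As a rough sketch of how this restriction is put to work, the instrument can be used in a manual two-stage least squares. This assumes the Week6Data1 object built in step 3 below, with logem4 as log settler mortality and avexpr as the institutions proxy; the second-stage standard errors from this manual approach are not corrected and are shown only for illustration.
# First stage: predict institutional quality from log settler mortality
first_stage <- lm(avexpr ~ logem4, data = Week6Data1)
# Second stage: regress log income on the instrumented (fitted) values
Week6Data1$avexpr_hat <- fitted(first_stage)
second_stage <- lm(ln_y ~ avexpr_hat, data = Week6Data1)
summary(first_stage)  # the instrument should predict the proxy strongly
summary(second_stage) # note: standard errors here are not IV-corrected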
1) Re-run the code from last week. Make sure that you’re comfortable with what is happening in each line.
# Load the libraries
library(ggplot2); library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Read the data
pwt <- read.csv("/Users/ashleighsalmon/Documents/Homework 5/pwt71_wo_country_names_wo_g_vars.csv")
# Filter out the observations outside the period I'm interested in
pwt.ss <- pwt %>% filter(year<=2010 & year>=1985)
# Generate our data for the regression
pwt.2 <- pwt.ss %>% group_by(isocode) %>% # For each country
summarise(s = mean(ki), # What was the average investment?
y = last(y), # The last GDP per person relative to the US?
n = 100*log(last(POP)/first(POP))/(n() - 1)) %>% # Population growth rate?
filter(!is.na(s)) %>% # Get rid of missing rows of s
mutate(ln_y = log(y), # Create new columns- log of y
ln_s = log(s), # Log of s
ln_ngd = log(n + 1.6 + 5)) # Log of n + g + delta
# Modelling! --------------------------------------------------------------
# Run the linear model (unrestricted)
mod1 <- lm(ln_y ~ ln_s + ln_ngd, data = pwt.2)
# Take a look at the parameter estimates
summary(mod1)
##
## Call:
## lm(formula = ln_y ~ ln_s + ln_ngd, data = pwt.2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.59258 -0.73868 0.00282 0.64383 3.06444
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.9528 1.6039 7.453 5.90e-12 ***
## ln_s 0.9987 0.1925 5.187 6.56e-07 ***
## ln_ngd -5.8616 0.6588 -8.898 1.36e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.001 on 156 degrees of freedom
## Multiple R-squared: 0.4482, Adjusted R-squared: 0.4411
## F-statistic: 63.35 on 2 and 156 DF, p-value: < 2.2e-16
# Run the restricted parameter model
mod2 <- nls(ln_y ~ A + (alpha/(1-alpha))*ln_s - (alpha/(1-alpha))*ln_ngd,
data = pwt.2,
start = list(A = 11, alpha = 0.3))
# Take a look at the parameters
summary(mod2)
##
## Formula: ln_y ~ A + (alpha/(1 - alpha)) * ln_s - (alpha/(1 - alpha)) *
## ln_ngd
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## A 1.17489 0.21144 5.557 1.15e-07 ***
## alpha 0.60966 0.02999 20.328 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.134 on 157 degrees of freedom
##
## Number of iterations to convergence: 6
## Achieved convergence tolerance: 8.969e-07
# Simulate a new country called Straya -----------------------------------------------
# Exogenous variables
s_current <- 26
n_current <- 1.6
# Parameters of the model
A <- coef(mod2)[1]
alpha <- coef(mod2)[2]
se <- 1.134 # From the summary command
# Simulate the new country 100,000 times (benchmark/baseline/BAU)
straya_1 <- rnorm(100000, # Generate 100k new observations
mean = A + (alpha/(1-alpha))*log(s_current) - (alpha/(1-alpha))*log(n_current + 1.6 + 5),
sd = se)
# Plot a histogram
hist(exp(straya_1), xlim = c(0, 200), breaks = 100)
# Simulate with new savings rate
s_new <- 27
straya_2 <- rnorm(100000,
mean = A + (alpha/(1-alpha))*log(s_new) - (alpha/(1-alpha))*log(n_current + 1.6 + 5),
sd = se)
hist(exp(straya_2), xlim = c(0, 200), breaks = 100)
# What is the difference in median simulations between the scenarios?
median(straya_2) - median(straya_1)
## [1] 0.07040049
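As a rough check on this number (a sketch using the objects already defined above): because both scenarios share the same A and ln_ngd term and the noise has mean zero, the expected gap in ln_y between the two scenarios is simply (alpha/(1-alpha)) multiplied by the change in log savings, which the simulated difference in medians should approximate up to simulation noise.
# Analytic expectation of the gap implied by the restricted model
(alpha/(1 - alpha)) * (log(s_new) - log(s_current))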
2) Load the data from Slack. This is based on the data provided alongside the paper.
load("ajr.RData")
3) Join this data onto your main datafile using the left_join() function.
Week6Data <- left_join(pwt.2, acemoglu)
## Joining by: "isocode"
## Warning in left_join_impl(x, y, by$x, by$y): joining character vector and
## factor, coercing into character vector
Week6Data1 <- Week6Data %>% filter(!is.na(avexpr), !is.na(logem4), !is.na(n))
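A quick (optional) check that the join behaved as expected, before running any regressions:
# How many countries survive the join and the NA filter?
nrow(Week6Data1)
# What does the institutions proxy look like for those countries?
summary(Week6Data1$avexpr)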
4) Run a linear regression as in the code below, but this time including our institution proxy (avexpr from acemoglu).
# Run the linear model (unrestricted)
lin_mod <- lm(ln_y ~ avexpr + ln_s + ln_ngd, data = Week6Data1)
# Take a look at the parameter estimates
summary(lin_mod)
##
## Call:
## lm(formula = ln_y ~ avexpr + ln_s + ln_ngd, data = Week6Data1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.20719 -0.42763 -0.02798 0.49676 1.32809
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.84434 2.46420 3.183 0.00218 **
## avexpr 0.39884 0.05701 6.996 1.35e-09 ***
## ln_s 0.76539 0.23815 3.214 0.00199 **
## ln_ngd -4.85618 0.93961 -5.168 2.19e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6587 on 69 degrees of freedom
## Multiple R-squared: 0.7504, Adjusted R-squared: 0.7395
## F-statistic: 69.13 on 3 and 69 DF, p-value: < 2.2e-16
5) Do the unrestricted estimates imply similar values of \(\alpha\)? The introduction of avexpr as a proxy variable for institutional quality considerably alters the coefficients in the second unrestricted model. Because the coefficient on ln_s is an unrestricted estimate of \(\alpha/(1-\alpha)\), an implied \(\alpha\) can be backed out as coefficient/(1 + coefficient): roughly 0.50 from the first model (coefficient 0.999) and roughly 0.43 once avexpr is included (coefficient 0.765). The implied values of \(\alpha\) are therefore broadly similar, but somewhat lower once institutional quality is controlled for.
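A short sketch of that back-of-the-envelope calculation; implied_alpha is just an illustrative helper name, and it uses the coefficient on ln_s from each fitted model.
# If b = alpha/(1 - alpha), then alpha = b/(1 + b)
implied_alpha <- function(b) b / (1 + b)
implied_alpha(coef(mod1)["ln_s"])    # without the institutions proxy
implied_alpha(coef(lin_mod)["ln_s"]) # with avexpr included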
6) Re-run the non-linear model, this time including avexpr as in the model described above. What happens to our estimate of alpha?
# Run the restricted parameter model
lin_mod2 <- nls(ln_y ~ A + delta*avexpr + (alpha/(1-alpha))*ln_s - (alpha/(1-alpha))*ln_ngd,
data = Week6Data1,
start = list(A = 11, alpha = 0.3, delta = 1))
# Take a look at the parameters
summary(lin_mod2)
##
## Formula: ln_y ~ A + delta * avexpr + (alpha/(1 - alpha)) * ln_s - (alpha/(1 -
## alpha)) * ln_ngd
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## A -1.74786 0.36606 -4.775 9.56e-06 ***
## alpha 0.55599 0.04403 12.626 < 2e-16 ***
## delta 0.46850 0.05951 7.873 3.13e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7234 on 70 degrees of freedom
##
## Number of iterations to convergence: 5
## Achieved convergence tolerance: 6.866e-08
The estimate of \(\alpha\) drops slightly, from last week’s \(\alpha \approx 0.61\) to \(\alpha \approx 0.56\). The two restricted models therefore give similar estimates, with \(\alpha\) a little lower once institutional quality is controlled for.
7) What is the effect of increasing the savings rate in Australia by 1 percentage point?
#Simulate a new country called Straya
# Exogenous variables
s_current <- 26
n_current <- 1.6
avexpr_current <- 9.318182 # avexpr value used for Australia (Straya)
# Parameters of the model
A <- coef(lin_mod2)[1]
alpha <- coef(lin_mod2)[2]
delta <- coef(lin_mod2)[3] # delta comes from the restricted model lin_mod2
se <- 0.7234 # Residual standard error from the summary of lin_mod2
# Simulate the new country 100,000 times (benchmark/baseline/BAU)
straya_1a <- rnorm(100000, # Generate 100k new observations
mean = A + delta*avexpr_current + (alpha/(1-alpha))*log(s_current) - (alpha/(1-alpha))*log(n_current + 1.6 + 5),
sd = se)
# Plot a histogram
hist(exp(straya_1a), xlim = c(0, 2000), breaks = 100)
# Simulate with new savings rate
s_new <- 27
straya_2a <- rnorm(100000,
mean = A + delta*avexpr_current + (alpha/(1-alpha))*log(s_new) - (alpha/(1-alpha))*log(n_current + 1.6 + 5),
sd = se)
hist(exp(straya_2a), xlim = c(0, 2000), breaks = 100)
# What is the difference in median simulations between the scenarios?
median(straya_2a) - median(straya_1a)
## [1] 0.04508166
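Since the simulations are on the log scale, this difference in medians can be converted to an approximate percentage change in income per person; a small sketch of that conversion, using the objects above:
# Express the log-scale difference as an approximate percentage change
100 * (exp(median(straya_2a) - median(straya_1a)) - 1)
On the run shown above this works out to roughly a 4.6 per cent increase in income per person (relative to the US) from raising the savings rate by one percentage point, although the exact figure will vary from simulation to simulation.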