The data that we will use for the exercise comes from the Penn World Table http://www.rug.nl/research/ggdc/data/pwt/pwt-7.1. We will use it to estimate and simulate a basic Solow-Swan model.
Most of the useful features of R come in libraries; bits of code written by enthusiasts that are freely available. To install new libraries from within R Studio, go to Tools -> Install Packages, and then search for the library by name. You should install the following libraries: ggplot2, which we will use for plotting, and dplyr, a useful package for data summarisation.
First you will need to open a new R script. To do this, go to File -> New File -> R Script. In an R Script, you can write many lines of code—instructions—that stay put once you execute them. This is a useful way of arranging your analysis.
Now load the data by writing the following line. To execute the line, simply press ctrl-enter while your cursor is on the line. (If you want to execute multiple lines or just a part of a line, you can highlight that code and press ctrl-enter)
# Attach packages
library(dplyr); library(ggplot2); library(latex2exp)
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Load the Penn World Table data
pwt <- read.csv('/Users/ashleighsalmon/Documents/Homework 5/pwt71_wo_country_names_wo_g_vars.csv')
# Look at the data
head(pwt)
## isocode year POP XRAT Currency_Unit ppp tcgdp cgdp cgdp2 cda2 cc
## 1 AFG 1950 8150.368 NA NA NA NA NA NA NA
## 2 AFG 1951 8284.473 NA NA NA NA NA NA NA
## 3 AFG 1952 8425.333 NA NA NA NA NA NA NA
## 4 AFG 1953 8573.217 NA NA NA NA NA NA NA
## 5 AFG 1954 8728.408 NA NA NA NA NA NA NA
## 6 AFG 1955 8891.209 0.0168 Afghani NA NA NA NA NA NA
## cg ci p p2 pc pg pi openc cgnp y y2 rgdpl rgdpl2 rgdpch kc kg ki openk
## 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## 3 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## 4 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## 5 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## 6 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## rgdpeqa rgdpwok rgdpl2wok rgdpl2pe rgdpl2te rgdpl2th rgdptt
## 1 NA NA NA NA NA NA NA
## 2 NA NA NA NA NA NA NA
## 3 NA NA NA NA NA NA NA
## 4 NA NA NA NA NA NA NA
## 5 NA NA NA NA NA NA NA
## 6 NA NA NA NA NA NA NA
# Create new subset with years 1985-2010
pwt2 <- pwt %>% filter(year>=1985)
What we want here:
An observation for each country of their average investment rate 1985-2010. An observation for each country of their population growth rate 1985-2010. An observation for each country of their income per capita relative to the united states in 2010.
# Summarise the data
pwt.ss <- pwt2 %>% group_by(isocode) %>%
summarise(n = 100*log(last(POP)/first(POP))/n()-1,
s = mean(ki),
y = last(y)) %>% filter(!is.na(s))
summary(pwt.ss)
## isocode n s y
## AFG : 1 Min. :-1.8615 Min. : 3.625 Min. : 0.6063
## AGO : 1 1st Qu.:-0.1953 1st Qu.:17.647 1st Qu.: 5.4462
## ALB : 1 Median : 0.6610 Median :22.802 Median : 16.8836
## ARG : 1 Mean : 0.5942 Mean :23.451 Mean : 30.7630
## ATG : 1 3rd Qu.: 1.3679 3rd Qu.:26.728 3rd Qu.: 43.2961
## AUS : 1 Max. : 2.9440 Max. :64.964 Max. :200.7718
## (Other):153
# Plot the data
plot(pwt.ss[,-1])
pwt.ss %>% ggplot(aes(x = s)) +
geom_density()
Modelling using Solow’s model and the Augmented Solow model.
# Apply mutate to data
pwt3 <- pwt.ss %>%
mutate(lnY = log(y),
lnS = log(s),
lnN = log(n+1.6+5))
Modelling our data
Create a linear model of the following form
\[ln(y_i) = \beta_0 + \beta_1*ln(s_i) + \beta_2*ln(n_i + g + \sigma) + \epsilon_i \]
# Create linear model
model1 <- lm(lnY ~ lnS + lnN, data = pwt3)
summary(model1)
##
## Call:
## lm(formula = lnY ~ lnS + lnN, data = pwt3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.58818 -0.74098 -0.00319 0.64672 3.06503
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.9985 1.4042 7.121 3.7e-11 ***
## lnS 1.0003 0.1926 5.192 6.4e-07 ***
## lnN -5.2856 0.5952 -8.881 1.5e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.001 on 156 degrees of freedom
## Multiple R-squared: 0.4475, Adjusted R-squared: 0.4404
## F-statistic: 63.17 on 2 and 156 DF, p-value: < 2.2e-16
Now create two non-linear models (the unaugmented and then augmented Solow models).
First non-linear model (non-augmented Solow): \[ln(y_i) = \frac{\alpha}{1 - \alpha}ln(s_i) + \frac{\alpha}{1 - \alpha}ln(n_i + g + \sigma) + \epsilon_i \]
Second non-linear model (augmented Solow): \[ln(y_i) = ln (A_0) + \frac{\alpha}{1 - \alpha}ln(s_i) + \frac{\alpha}{1 - \alpha}ln(n_i + g + \sigma) + \epsilon_i \]
# Create first non-linear model
model2 <- nls(lnY ~ (alpha/(1-alpha)) * lnS - (alpha/(1-alpha)) * lnN,
data = pwt3,
start = list(alpha = 0.4),
control = nls.control(warnOnly = T))
summary(model2)
##
## Formula: lnY ~ (alpha/(1 - alpha)) * lnS - (alpha/(1 - alpha)) * lnN
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## alpha 0.69669 0.00715 97.44 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.18 on 158 degrees of freedom
##
## Number of iterations to convergence: 3
## Achieved convergence tolerance: 4.395e-08
# Create second non-linear model
model3 <- nls(lnY ~ A + (alpha/(1-alpha)) * lnS - (alpha/(1-alpha)) * lnN,
data = pwt3,
start = list(alpha = 0.4, A = 1),
control = nls.control(warnOnly = T))
summary(model3)
##
## Formula: lnY ~ A + (alpha/(1 - alpha)) * lnS - (alpha/(1 - alpha)) * lnN
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## alpha 0.61108 0.02918 20.942 < 2e-16 ***
## A 0.94653 0.23230 4.075 7.3e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.126 on 157 degrees of freedom
##
## Number of iterations to convergence: 4
## Achieved convergence tolerance: 1.552e-07
Often, we want to check how well a model is performing by using the model to generate some new random numbers, and compare these to the actuals. This is in the spirit of what Bayesians call posterior predictive checking, and what macroeconomists call moment checking. The model will be more “valid” if the new countries that it generates have a similar distribution of GDP per capita to what we observe in reality.
# Create predicted y
pwt3 <- pwt3 %>%
mutate(ypredicted = exp(lnY))
# Plot predicted y against lnY
pwt3 %>% ggplot(aes(x = lnY, y = ypredicted)) +
geom_point() + geom_smooth(method = "lm")
# The model does not appear to be fitting the data well
# Generate new column containing the errors from model
pwt3["errorcolumn"] <- NA;
pwt3$errorcolumn <- (pwt3$ypredicted - pwt3$y)
# Create a histogram of the error plots
hist(pwt3$errorcolumn,
main="Histogram for error column",
xlab="Errors",
border="Purple",
col="Blue")
# Generate a plot of fitted data against error terms
plot(pwt3$ypredicted, pwt3$errorcolumn,
main = "Fitted data against errors",
xlab = "Fitted data",
ylab = "Errors")
# Find the standard deviation of the errors
sdestimate <- sd(pwt3$errorcolumn)
# New countries
# Generate new column of countries
pwt3 <- pwt3 %>%
mutate(AWholeNewWorld = rnorm(n = n(), mean = ypredicted, sd = sdestimate))
Generate two histograms, one of the actual GDP per capital relative to the united states, and the second with our “new countries”. Would you say the model is a good fit?
# Generate two historgrams; one for actual GDP relative to US and one for the new countries
# Create a histogram of the actual GDP
hist(pwt3$y,
main="Histogram for Actual GDP relative to US",
xlab="GDP",
border="Purple",
col="Blue")
# Create a histogram of the hypothetical countries
hist(pwt3$AWholeNewWorld,
main = "Histogram for Predicted GDP",
xlab="Predicted GDP",
border="Purple",
col="Red")
The histograms suggest the model isn’t a good representation as the hypothesised model forms a bell-shaped curve while the real data is heavily skewed.