Group 12 - Joshua Dawe, Arthers Gao, Paul Niklaus, Ahn Phong Nong

Part 1 - Reading in and preparing some data

The data that we will use for the exercise comes from the Penn World Table http://www.rug.nl/research/ggdc/data/pwt/pwt-7.1. We will use it to estimate and simulate a basic Solow-Swan model.

Most of the useful features of R come in libraries; bits of code written by enthusiasts that are freely available. To install new libraries from within R Studio, go to Tools -> Install Packages, and then search for the library by name. You should install the following libraries: ggplot2, which we will use for plotting, and dplyr, a useful package for data summarisation.

First you will need to open a new R script. To do this, go to File -> New File -> R Script. In an R Script, you can write many lines of code—instructions—that stay put once you execute them. This is a useful way of arranging your analysis.

Now load the data by writing the following line. To execute the line, simply press ctrl-enter while your cursor is on the line. (If you want to execute multiple lines or just a part of a line, you can highlight that code and press ctrl-enter)

Setting up the data
# Attach packages

library(dplyr); library(ggplot2); library(latex2exp)
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# Load the Penn World Table data

pwt <- read.csv('/Users/ashleighsalmon/Documents/Homework 5/pwt71_wo_country_names_wo_g_vars.csv')

# Look at the data
head(pwt)
##   isocode year      POP   XRAT Currency_Unit ppp tcgdp cgdp cgdp2 cda2 cc
## 1     AFG 1950 8150.368     NA                NA    NA   NA    NA   NA NA
## 2     AFG 1951 8284.473     NA                NA    NA   NA    NA   NA NA
## 3     AFG 1952 8425.333     NA                NA    NA   NA    NA   NA NA
## 4     AFG 1953 8573.217     NA                NA    NA   NA    NA   NA NA
## 5     AFG 1954 8728.408     NA                NA    NA   NA    NA   NA NA
## 6     AFG 1955 8891.209 0.0168       Afghani  NA    NA   NA    NA   NA NA
##   cg ci  p p2 pc pg pi openc cgnp  y y2 rgdpl rgdpl2 rgdpch kc kg ki openk
## 1 NA NA NA NA NA NA NA    NA   NA NA NA    NA     NA     NA NA NA NA    NA
## 2 NA NA NA NA NA NA NA    NA   NA NA NA    NA     NA     NA NA NA NA    NA
## 3 NA NA NA NA NA NA NA    NA   NA NA NA    NA     NA     NA NA NA NA    NA
## 4 NA NA NA NA NA NA NA    NA   NA NA NA    NA     NA     NA NA NA NA    NA
## 5 NA NA NA NA NA NA NA    NA   NA NA NA    NA     NA     NA NA NA NA    NA
## 6 NA NA NA NA NA NA NA    NA   NA NA NA    NA     NA     NA NA NA NA    NA
##   rgdpeqa rgdpwok rgdpl2wok rgdpl2pe rgdpl2te rgdpl2th rgdptt
## 1      NA      NA        NA       NA       NA       NA     NA
## 2      NA      NA        NA       NA       NA       NA     NA
## 3      NA      NA        NA       NA       NA       NA     NA
## 4      NA      NA        NA       NA       NA       NA     NA
## 5      NA      NA        NA       NA       NA       NA     NA
## 6      NA      NA        NA       NA       NA       NA     NA
# Create new subset with years 1985-2010
pwt2 <- pwt %>% filter(year>=1985)
Summarising the Data

What we want here:

An observation for each country of their average investment rate 1985-2010. An observation for each country of their population growth rate 1985-2010. An observation for each country of their income per capita relative to the united states in 2010.

# Summarise the data
pwt.ss <- pwt2 %>%   group_by(isocode) %>%
  summarise(n = 100*log(last(POP)/first(POP))/n()-1,
            s = mean(ki),
            y = last(y)) %>% filter(!is.na(s))

summary(pwt.ss)
##     isocode          n                 s                y           
##  AFG    :  1   Min.   :-1.8615   Min.   : 3.625   Min.   :  0.6063  
##  AGO    :  1   1st Qu.:-0.1953   1st Qu.:17.647   1st Qu.:  5.4462  
##  ALB    :  1   Median : 0.6610   Median :22.802   Median : 16.8836  
##  ARG    :  1   Mean   : 0.5942   Mean   :23.451   Mean   : 30.7630  
##  ATG    :  1   3rd Qu.: 1.3679   3rd Qu.:26.728   3rd Qu.: 43.2961  
##  AUS    :  1   Max.   : 2.9440   Max.   :64.964   Max.   :200.7718  
##  (Other):153
# Plot the data
plot(pwt.ss[,-1])    

pwt.ss %>% ggplot(aes(x = s)) +
  geom_density()

Part 2 - Modelling

Modelling using Solow’s model and the Augmented Solow model.

# Apply mutate to data

pwt3 <- pwt.ss %>%
  mutate(lnY = log(y),
         lnS = log(s),
         lnN = log(n+1.6+5))

Modelling our data

Create a linear model of the following form

\[ln(y_i) = \beta_0 + \beta_1*ln(s_i) + \beta_2*ln(n_i + g + \sigma) + \epsilon_i \]

# Create linear model

model1 <- lm(lnY ~ lnS + lnN, data = pwt3)

summary(model1)
## 
## Call:
## lm(formula = lnY ~ lnS + lnN, data = pwt3)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.58818 -0.74098 -0.00319  0.64672  3.06503 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.9985     1.4042   7.121  3.7e-11 ***
## lnS           1.0003     0.1926   5.192  6.4e-07 ***
## lnN          -5.2856     0.5952  -8.881  1.5e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.001 on 156 degrees of freedom
## Multiple R-squared:  0.4475, Adjusted R-squared:  0.4404 
## F-statistic: 63.17 on 2 and 156 DF,  p-value: < 2.2e-16

Now create two non-linear models (the unaugmented and then augmented Solow models).

First non-linear model (non-augmented Solow): \[ln(y_i) = \frac{\alpha}{1 - \alpha}ln(s_i) + \frac{\alpha}{1 - \alpha}ln(n_i + g + \sigma) + \epsilon_i \]

Second non-linear model (augmented Solow): \[ln(y_i) = ln (A_0) + \frac{\alpha}{1 - \alpha}ln(s_i) + \frac{\alpha}{1 - \alpha}ln(n_i + g + \sigma) + \epsilon_i \]

# Create first non-linear model

model2 <- nls(lnY ~ (alpha/(1-alpha)) * lnS - (alpha/(1-alpha)) * lnN, 
              data = pwt3, 
              start = list(alpha = 0.4),
              control = nls.control(warnOnly = T))

summary(model2)
## 
## Formula: lnY ~ (alpha/(1 - alpha)) * lnS - (alpha/(1 - alpha)) * lnN
## 
## Parameters:
##       Estimate Std. Error t value Pr(>|t|)    
## alpha  0.69669    0.00715   97.44   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.18 on 158 degrees of freedom
## 
## Number of iterations to convergence: 3 
## Achieved convergence tolerance: 4.395e-08
# Create second non-linear model

model3 <- nls(lnY ~ A + (alpha/(1-alpha)) * lnS - (alpha/(1-alpha)) * lnN, 
              data = pwt3, 
              start = list(alpha = 0.4, A = 1),
              control = nls.control(warnOnly = T))

summary(model3)
## 
## Formula: lnY ~ A + (alpha/(1 - alpha)) * lnS - (alpha/(1 - alpha)) * lnN
## 
## Parameters:
##       Estimate Std. Error t value Pr(>|t|)    
## alpha  0.61108    0.02918  20.942  < 2e-16 ***
## A      0.94653    0.23230   4.075  7.3e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.126 on 157 degrees of freedom
## 
## Number of iterations to convergence: 4 
## Achieved convergence tolerance: 1.552e-07

Part 3 - Generate new countries

Often, we want to check how well a model is performing by using the model to generate some new random numbers, and compare these to the actuals. This is in the spirit of what Bayesians call posterior predictive checking, and what macroeconomists call moment checking. The model will be more “valid” if the new countries that it generates have a similar distribution of GDP per capita to what we observe in reality.

# Create predicted y

pwt3 <- pwt3 %>% 
  mutate(ypredicted = exp(lnY))

# Plot predicted y against lnY

pwt3 %>% ggplot(aes(x = lnY, y = ypredicted)) +
  geom_point() + geom_smooth(method = "lm") 

# The model does not appear to be fitting the data well

# Generate new column containing the errors from model
 pwt3["errorcolumn"] <- NA;
 pwt3$errorcolumn <- (pwt3$ypredicted - pwt3$y)

# Create a histogram of the error plots
 hist(pwt3$errorcolumn,
 main="Histogram for error column", 
 xlab="Errors", 
 border="Purple", 
 col="Blue")

# Generate a plot of fitted data against error terms
   plot(pwt3$ypredicted, pwt3$errorcolumn,
        main = "Fitted data against errors",
        xlab = "Fitted data",
        ylab = "Errors")

# Find the standard deviation of the errors
   sdestimate <- sd(pwt3$errorcolumn)
   
# New countries
# Generate new column of countries
   pwt3 <- pwt3 %>%
    mutate(AWholeNewWorld = rnorm(n = n(), mean = ypredicted, sd = sdestimate))

Generate two histograms, one of the actual GDP per capital relative to the united states, and the second with our “new countries”. Would you say the model is a good fit?

# Generate two historgrams; one for actual GDP relative to US and one for the new countries
   # Create a histogram of the actual GDP
   hist(pwt3$y,
        main="Histogram for Actual GDP relative to US", 
        xlab="GDP", 
        border="Purple", 
        col="Blue")

   # Create a histogram of the hypothetical countries
   hist(pwt3$AWholeNewWorld,
        main = "Histogram for Predicted GDP",
        xlab="Predicted GDP",
        border="Purple",
        col="Red")

The histograms suggest the model isn’t a good representation as the hypothesised model forms a bell-shaped curve while the real data is heavily skewed.