Part 1

Reading in and preparing the data

library(dplyr); library(ggplot2)
pwt <- read.csv("pwt71.csv")

What we want here:

  • An observation for each country of their average investment rate 1985-2010

  • An observation for each country of their population growth rate 1985-2010

  • An observation for each country of their income per capita relative to the United States in 2010.

Generating the data for each country

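pwt.ss <- pwt %>%
  filter(year >= 1985) %>%
  arrange(isocode, year) %>%  # so first()/last() pick each country's first and last years
  group_by(isocode) %>%
  summarise(n = 100*log(last(POP)/first(POP))/(n()-1),  # average annual population growth (%)
            I = mean(ki),                                # average investment rate (% of GDP)
            y = last(y)) %>%                             # income relative to the US in 2010
  filter(!is.na(I))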

pwt.2 <- pwt.ss %>%
  mutate(ln_y   = log(y),
         ln_I   = log(I),
         ln_ngd = log(n + 1.5 + 5))  # ln(n + g + delta), with g = 1.5 and delta = 5 (in %)

Here we have summarised the data from 1985 onwards into three key variables for each country: the average annual population growth rate (n), the average investment rate (I) and PPP-converted GDP per capita relative to the United States in 2010 (y). The next step takes logs so that the model can be estimated as a linear relationship in logs, using the Australian growth and depreciation rates of 1.5% and 5% respectively for \(g\) and \(\delta\) in the term \(\ln(n + g + \delta)\).
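To check the growth-rate formula used in pwt.ss, the sketch below applies the same per-country calculation to a toy panel of two hypothetical countries (AAA and BBB, made-up numbers). Their populations grow at exactly 2% and -0.5% a year in continuous-compounding terms. It uses base R rather than dplyr so that it runs on its own. `summarise_country` is a hypothetical helper, not part of any package.

```r
# Hypothetical toy panel: two made-up countries observed 1985-2010
years <- 1985:2010
toy <- data.frame(
  isocode = rep(c("AAA", "BBB"), each = length(years)),
  year    = rep(years, times = 2),
  POP     = c(10 * exp(0.02 * (years - 1985)),     # grows 2% a year (continuously)
              50 * exp(-0.005 * (years - 1985))),  # shrinks 0.5% a year
  ki      = c(rep(20, length(years)), rep(c(10, 30), length.out = length(years))),
  y       = c(seq(20, 40, length.out = length(years)), seq(8, 5, length.out = length(years)))
)

# Per-country summary mirroring pwt.ss: growth rate, mean investment, last-year income
summarise_country <- function(d) {
  d <- d[order(d$year), ]  # first and last rows must be the first and last years
  data.frame(isocode = d$isocode[1],
             n = 100 * log(d$POP[nrow(d)] / d$POP[1]) / (nrow(d) - 1),
             I = mean(d$ki),
             y = d$y[nrow(d)])
}

toy_ss <- do.call(rbind, lapply(split(toy, toy$isocode), summarise_country))
toy_ss$ln_ngd <- log(toy_ss$n + 1.5 + 5)  # ln(n + g + delta) with g = 1.5, delta = 5
print(toy_ss)
```

The formula recovers 2 and -0.5 because 100·ln(POP_2010/POP_1985)/25 is the average continuously compounded growth rate over the 25 annual steps.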

summary(pwt.2)
##     isocode          n                 I                y           
##  AFG    :  1   Min.   :-0.8960   Min.   : 3.625   Min.   :  0.6063  
##  AGO    :  1   1st Qu.: 0.8369   1st Qu.:17.647   1st Qu.:  5.4462  
##  ALB    :  1   Median : 1.7275   Median :22.802   Median : 16.8836  
##  ARG    :  1   Mean   : 1.6580   Mean   :23.451   Mean   : 30.7630  
##  ATG    :  1   3rd Qu.: 2.4626   3rd Qu.:26.728   3rd Qu.: 43.2961  
##  AUS    :  1   Max.   : 4.1017   Max.   :64.964   Max.   :200.7718  
##  (Other):153                                                        
##       ln_y              ln_I           ln_ngd     
##  Min.   :-0.5004   Min.   :1.288   Min.   :1.723  
##  1st Qu.: 1.6947   1st Qu.:2.871   1st Qu.:1.993  
##  Median : 2.8263   Median :3.127   Median :2.107  
##  Mean   : 2.6932   Mean   :3.076   Mean   :2.091  
##  3rd Qu.: 3.7664   3rd Qu.:3.286   3rd Qu.:2.193  
##  Max.   : 5.3022   Max.   :4.174   Max.   :2.361  
## 

Plotting the data

Effect of average investment rate on GDP

pwt.2 %>% ggplot(aes(x = ln_I, y = ln_y)) +
  geom_point()

Effect of Population growth rate on GDP

pwt.2 %>% ggplot(aes(x = n, y = ln_y)) +
  geom_point()

Income per capita relative to the US in 2010

pwt.ss %>% ggplot(aes(x = y)) +
  geom_histogram()

The scatter plots show how GDP per capita varies with the investment rate and population growth. The log of GDP per capita is positively correlated with the log of the investment rate and negatively correlated with the population growth rate. The histogram shows that most countries have GDP per capita well below that of the US: half of them are below about 17% of the US level.

Part 2: Modelling

To obtain an estimate of the capital share of income, \(\alpha\), we used a non-linear model of the form:

\(\ln(y_i) = \frac{\alpha}{1-\alpha}\ln(s_i) - \frac{\alpha}{1-\alpha} \ln(n_{i} + g + \delta) + \epsilon_i\)
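For reference, this equation comes from the steady state of the textbook Solow model. Assume Cobb-Douglas production \(Y = K^{\alpha}(AL)^{1-\alpha}\) and technology growing at rate \(g\), so that \(A_t = A_0 e^{gt}\). Capital per effective worker, \(\tilde{k} = K/(AL)\), then evolves as

\(\dot{\tilde{k}} = s\tilde{k}^{\alpha} - (n + g + \delta)\tilde{k}\)

Setting \(\dot{\tilde{k}} = 0\) gives the steady state

\(\tilde{k}^{*} = \left(\frac{s}{n + g + \delta}\right)^{\frac{1}{1-\alpha}}\)

Output per worker is \(y = A_t(\tilde{k}^{*})^{\alpha}\). Taking logs gives

\(\ln(y_i) = \ln(A_0) + gt + \frac{\alpha}{1-\alpha}\ln(s_i) - \frac{\alpha}{1-\alpha} \ln(n_{i} + g + \delta)\)

In a single cross-section, \(gt\) is the same for every country. The same is true of the US level used to express \(y\) in relative terms, so both only shift the intercept. The equation above also drops \(\ln(A_0)\), which forces the intercept to zero.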

model_2 <- nls(ln_y ~ (a/(1-a))*ln_I - (a/(1-a))*ln_ngd, data = pwt.2,
               start = list(a = 0.4), control = nls.control(warnOnly = TRUE))

summary(model_2)  
## 
## Formula: ln_y ~ (a/(1 - a)) * ln_I - (a/(1 - a)) * ln_ngd
## 
## Parameters:
##   Estimate Std. Error t value Pr(>|t|)    
## a 0.716495   0.007226   99.16   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.231 on 158 degrees of freedom
## 
## Number of iterations to convergence: 4 
## Achieved convergence tolerance: 1.387e-09

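Because the model is in logs, the coefficient \(\frac{\alpha}{1-\alpha}\) is an elasticity. With \(\alpha \approx 0.716\), \(\frac{\alpha}{1-\alpha} \approx 2.53\). The model therefore implies that a 1% increase in the savings rate would raise GDP per capita by about 2.5%. A 1% increase in \(n + g + \delta\) would lower it by about 2.5%.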
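The sketch below is a quick check of the elasticity arithmetic. It uses the point estimate of \(\alpha\) from model_2 and, for comparison, a capital share of one third as a benchmark assumption.

```r
# Elasticity of y with respect to s (and, with a minus sign, to n + g + delta)
elasticity <- function(alpha) alpha / (1 - alpha)

alpha_hat <- 0.716495            # point estimate of a from model_2
e_hat <- elasticity(alpha_hat)   # about 2.53

# Percentage change in y from a 1% rise in the savings rate
# (exact for the constant-elasticity form y proportional to s^e)
pct_change_y <- 100 * (1.01^e_hat - 1)  # about 2.5

# Benchmark assumption: a capital share of one third implies an elasticity of 0.5
e_benchmark <- elasticity(1/3)

print(round(c(e_hat = e_hat, pct_change_y = pct_change_y, e_benchmark = e_benchmark), 3))
```

The gap between roughly 2.5 and 0.5 is one way to see how far this estimate is from a conventional capital share.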

Our estimate shows that \(\alpha\) is approximately 0.72 in this model.

This estimate is too high: capital's share of income is usually found to be around one third. One possible reason is that we have omitted technology from the model.

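We can estimate the initial level of technology, \(A_0\), by adding an intercept to the model (the parameter A in the code below corresponds to \(\ln(A_0)\)):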

\(\ln(y_i) = \ln(A_0) + \frac{\alpha}{1-\alpha}\ln(s_i) - \frac{\alpha}{1-\alpha} \ln(n_{i} + g + \delta) + \epsilon_i\)

model_3 <- nls(ln_y ~ A + (a/(1-a))*ln_I - (a/(1-a))*ln_ngd, data = pwt.2,
               start = list(a = 0.4, A = 1), control = nls.control(warnOnly = TRUE))

summary(model_3) 
## 
## Formula: ln_y ~ A + (a/(1 - a)) * ln_I - (a/(1 - a)) * ln_ngd
## 
## Parameters:
##   Estimate Std. Error t value Pr(>|t|)    
## a  0.60985    0.02989  20.403  < 2e-16 ***
## A  1.15433    0.21320   5.414 2.26e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.133 on 157 degrees of freedom
## 
## Number of iterations to convergence: 4 
## Achieved convergence tolerance: 1.41e-07

When technology is included in the model, \(\alpha\) is approximately 0.61. This is still high, but it is an improvement.

We now have all the components needed to calculate an estimate of GDP per capita for each country.

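# Extract the estimated parameters from model_3 instead of hard-coding them
a_hat <- coef(model_3)[["a"]]
lnA_hat <- coef(model_3)[["A"]]

# Estimated log of GDP per capita relative to the US
ln_y_est <- lnA_hat + (a_hat/(1-a_hat))*pwt.2$ln_I - (a_hat/(1-a_hat))*pwt.2$ln_ngd

# Creating a new data column (converting the estimate back from logs):
pwt.2 <- pwt.2 %>%
  mutate(y_est = exp(ln_y_est))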
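Copying coefficients by hand makes it easy for the estimates to drift out of sync with the fitted model. The sketch below uses simulated data with a made-up true \(\alpha\) of 0.35 and a made-up intercept of 1. It checks that rebuilding the estimates from `coef()` gives the same values as `predict()` on the fitted `nls` object. The functional form is copied from model_3.

```r
# Simulated data (made-up values, not the Penn World Table)
set.seed(42)
n_obs <- 200
sim_data <- data.frame(ln_I   = runif(n_obs, min = 2, max = 4),
                       ln_ngd = runif(n_obs, min = 1.7, max = 2.4))
true_a <- 0.35  # made-up "true" capital share
true_A <- 1     # made-up "true" intercept, ln(A_0)
sim_data$ln_y <- true_A + (true_a/(1 - true_a))*(sim_data$ln_I - sim_data$ln_ngd) +
  rnorm(n_obs, sd = 0.5)

# Same functional form as model_3
fit <- nls(ln_y ~ A + (a/(1-a))*ln_I - (a/(1-a))*ln_ngd,
           data = sim_data, start = list(a = 0.4, A = 1))

# Rebuild the estimated log values from the fitted coefficients
fit_a <- coef(fit)[["a"]]
fit_A <- coef(fit)[["A"]]
manual_est <- fit_A + (fit_a/(1-fit_a))*sim_data$ln_I - (fit_a/(1-fit_a))*sim_data$ln_ngd

# predict() without newdata returns the fitted values, which should match
print(all.equal(manual_est, predict(fit)))
print(round(c(a = fit_a, A = fit_A), 3))
```

In practice `predict(model_3)` gives the estimated log values directly, so the manual formula is only needed to make the model's structure explicit.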

Looking at the data, it is clear that many of the estimates are far from the actual values.

Plotting the estimates against the actual values of y makes the comparison easier to see.

pwt.2 %>% ggplot(aes(x = y, y = y_est)) +
  geom_point()

We can compute residuals by subtracting the estimates of y from the actual values of y. A histogram then shows how the residuals are distributed.

# Creating a new data column of residuals (actual minus estimated y):
pwt.2 <- pwt.2 %>%
  mutate(residuals = y - y_est)

# Histogram of the residuals:
resid <- pwt.2$residuals
hist(resid)

We can plot these residuals against the estimates to show where the model is inaccurate.

pwt.2 %>% ggplot(aes(x = resid, y = y_est)) +
  geom_point()

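The plot shows that the smaller the estimates, the smaller the residuals, so the model is more accurate for countries with low estimated GDP per capita. This is partly expected. Because the model is estimated in logs, an error of a given percentage size produces a larger residual in levels when the estimate itself is larger.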
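The numeric illustration below shows why level residuals grow with the estimate when the model is in logs. If every country has the same log error \(\epsilon\), the level residual is \(\hat{y}(e^{\epsilon} - 1)\), which is proportional to the estimate \(\hat{y}\). The estimates and the log error are made-up values.

```r
# Made-up estimates of y (in % of US GDP per capita) and one common log error
toy_y_est <- c(2, 20, 80)
log_error <- 0.5

# Actual y implied by ln(y) = ln(y_est) + log_error
toy_y_actual <- toy_y_est * exp(log_error)

# Level residuals, computed as in this section: actual minus estimate
level_resid <- toy_y_actual - toy_y_est
print(round(level_resid, 2))

# Each residual is the same fraction of its estimate: exp(0.5) - 1, about 0.65
print(level_resid / toy_y_est)
```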

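Moment checking tests the model by creating a new column of made-up countries. For each country's savings rate and population growth, we take the model's estimate and add random noise with the same standard deviation as the residuals. We then compare the distribution of these simulated values with the real data.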

# Standard deviation of the residuals
resid_sd <- sd(resid)

# Creating a new data column of simulated ("made-up") countries:
pwt.2 <- pwt.2 %>%
  mutate(RandomCountries = exp(rnorm(n = n(), mean = y_est, sd = resid_sd)))
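The error term \(\epsilon_i\) in the model is defined on the log scale. An alternative way to generate the made-up countries is therefore to draw log income around the estimated log values and only then exponentiate. The noise would use the residual standard error of the log model, for example `summary(model_3)$sigma` (1.133 in the output above). The sketch below assumes normally distributed log errors. The function name `simulate_countries` and the toy inputs are hypothetical.

```r
# Hypothetical helper: simulate GDP per capita for made-up countries,
# assuming ln(y) = ln(y_est) + e with e ~ Normal(0, sigma) on the log scale
simulate_countries <- function(ln_y_est, sigma) {
  ln_y_sim <- rnorm(n = length(ln_y_est), mean = ln_y_est, sd = sigma)
  exp(ln_y_sim)  # convert back to % of US GDP per capita
}

set.seed(1)  # arbitrary seed, for reproducibility only

# Toy estimated log values (made-up numbers, not from the Penn World Table)
toy_ln_y_est <- c(0.5, 2.0, 2.8, 3.5, 4.2)

sim_countries <- simulate_countries(toy_ln_y_est, sigma = 1.133)
print(round(sim_countries, 2))
```

With the data in this section, you would pass the estimated log values (before they are exponentiated) and `summary(model_3)$sigma`.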

Using moment checking to compare the model's simulated countries with the real data, we can see that the model is flawed.

The histogram below shows the actual distribution of GDP per capita relative to the US.

y <- pwt.2$y

hist(y)

We can compare this with a histogram of the predicted GDP per capita of our random countries.

Random_Countries <- pwt.2$RandomCountries

hist(Random_Countries)

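This histogram shows that almost all of the 159 simulated countries fall between 0% and 5% of US GDP per capita. Clearly this does not represent reality: in the actual data, the median country's GDP per capita is about 17% of the US level.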
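Comparing a few summary moments side by side makes the histogram comparison more precise. The sketch below uses a hypothetical helper, `compare_moments`, and made-up vectors that stand in for `pwt.2$y` and `pwt.2$RandomCountries`.

```r
# Hypothetical helper: summary moments of actual vs simulated values
compare_moments <- function(actual, simulated) {
  moments <- function(x) c(mean = mean(x), sd = sd(x), median = median(x),
                           share_below_5 = mean(x < 5))  # share of countries below 5% of US
  rbind(actual = moments(actual), simulated = moments(simulated))
}

# Made-up example values (in % of US GDP per capita)
toy_actual    <- c(1, 4, 10, 17, 30, 45, 90)
toy_simulated <- c(0.5, 1, 2, 3, 4, 4.5, 200)

print(round(compare_moments(toy_actual, toy_simulated), 2))
```

A large gap in the median or in the share below 5% is the numerical counterpart of the mismatch between the two histograms.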