library(dplyr); library(ggplot2)
pwt <- read.csv("pwt71.csv")
What we want here:
An observation for each country of its average investment rate, 1985-2010
An observation for each country of its population growth rate, 1985-2010
An observation for each country of its income per capita relative to the United States in 2010.
pwt.ss <- pwt %>% filter(year>=1985)%>%
group_by(isocode) %>%
summarise(n = 100*log(last(POP)/first(POP))/(n()-1),
I = mean(ki),
y = last(y)) %>% filter(!is.na(I))
pwt.2 <- pwt.ss %>%
mutate(ln_y=log(y),
ln_I=log(I),
ln_ngd=log(n+1.5+5))
Here we have loaded the data and restricted the output to three key variables: population growth (n), the average investment rate (I), and PPP-converted GDP per capita relative to the United States (y), for all countries from 1985 onwards. The next step takes the log of these variables so that they enter the model linearly, using the Australian growth rate (g) of 1.5% and depreciation rate (δ) of 5%.
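As a small illustration of the growth-rate formula used inside summarise(), the sketch below applies it to a made-up population series. The numbers are hypothetical and not taken from the Penn World Table; the series is constructed to grow at exactly 2% per year in log terms, so the formula should return 2.

```r
# Hypothetical population series: 26 annual observations (1985-2010)
# growing at a constant log rate of 2% per year.
years <- 1985:2010
pop <- 1000 * exp(0.02 * (years - 1985))

# Average annual growth in percent: total log change divided by
# the number of yearly intervals (observations minus one).
n_growth <- 100 * log(pop[length(pop)] / pop[1]) / (length(pop) - 1)

# Term that enters the regression, with g = 1.5 and delta = 5 (in percent).
ln_ngd <- log(n_growth + 1.5 + 5)
print(c(n_growth = n_growth, ln_ngd = ln_ngd))
```

Dividing by the number of intervals rather than the number of observations is what makes the result an annual rate.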
summary(pwt.2)
## isocode n I y
## AFG : 1 Min. :-0.8960 Min. : 3.625 Min. : 0.6063
## AGO : 1 1st Qu.: 0.8369 1st Qu.:17.647 1st Qu.: 5.4462
## ALB : 1 Median : 1.7275 Median :22.802 Median : 16.8836
## ARG : 1 Mean : 1.6580 Mean :23.451 Mean : 30.7630
## ATG : 1 3rd Qu.: 2.4626 3rd Qu.:26.728 3rd Qu.: 43.2961
## AUS : 1 Max. : 4.1017 Max. :64.964 Max. :200.7718
## (Other):153
## ln_y ln_I ln_ngd
## Min. :-0.5004 Min. :1.288 Min. :1.723
## 1st Qu.: 1.6947 1st Qu.:2.871 1st Qu.:1.993
## Median : 2.8263 Median :3.127 Median :2.107
## Mean : 2.6932 Mean :3.076 Mean :2.091
## 3rd Qu.: 3.7664 3rd Qu.:3.286 3rd Qu.:2.193
## Max. : 5.3022 Max. :4.174 Max. :2.361
##
pwt.2 %>% ggplot(aes(x = ln_I, y = ln_y)) +
geom_point()
pwt.2 %>% ggplot(aes(x = n, y = ln_y)) +
geom_point()
pwt.ss %>% ggplot(aes(x = y)) +
geom_histogram()
The scatter plots show how population growth and the investment rate relate to GDP per capita. The investment rate is positively correlated with the log of GDP per capita, whereas the population growth rate is negatively correlated with it. The histogram shows that in the vast majority of countries GDP per capita is only a small fraction of the US level.
To obtain an estimate of the capital share of income, \(\alpha\), we used a non-linear model of the form:
\(\ln(y_i) = \frac{\alpha}{1-\alpha}\ln(s_i) - \frac{\alpha}{1-\alpha} \ln(n_{i} + g + \delta) + \epsilon_i\)
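The equation above can be motivated as follows, assuming the standard Solow model with Cobb-Douglas output per effective worker \(y = k^{\alpha}\) (an assumption, since the production function is not stated in the text). In steady state, actual investment equals break-even investment:
\(s_i k_i^{\alpha} = (n_i + g + \delta) k_i\)
Solving for capital and substituting into the production function gives
\(k_i^{*} = \left(\frac{s_i}{n_i + g + \delta}\right)^{\frac{1}{1-\alpha}}, \qquad y_i^{*} = \left(\frac{s_i}{n_i + g + \delta}\right)^{\frac{\alpha}{1-\alpha}}\)
Taking logs of \(y_i^{*}\) and adding an error term \(\epsilon_i\) yields the estimating equation, with the same coefficient \(\frac{\alpha}{1-\alpha}\) on both terms and opposite signs.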
model_2 <- nls(ln_y ~ (a/(1-a))*ln_I - (a/(1-a))*ln_ngd, data = pwt.2, start = list(a = 0.4), control = nls.control(warnOnly = T))
summary(model_2)
##
## Formula: ln_y ~ (a/(1 - a)) * ln_I - (a/(1 - a)) * ln_ngd
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## a 0.716495 0.007226 99.16 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.231 on 158 degrees of freedom
##
## Number of iterations to convergence: 4
## Achieved convergence tolerance: 1.387e-09
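The model restricts the coefficients on \(\ln(s_i)\) and \(\ln(n_i + g + \delta)\) to be equal in size and opposite in sign. With \(\alpha = 0.716\), the implied coefficient is \(\frac{\alpha}{1-\alpha} \approx 2.53\), so a 1% increase in the savings rate is associated with GDP per capita that is about 2.53% higher, whereas a 1% increase in \(n_i + g + \delta\) is associated with GDP per capita that is about 2.53% lower.
Our estimate of \(\alpha\) is therefore approximately 0.72 based on this model.
This estimate is too high for a capital share of income, which is conventionally taken to be around one third. This could be because we have omitted technology from the model.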
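To make the link between \(\alpha\) and the regression coefficient concrete, the short calculation below converts the estimate reported by summary(model_2) into the implied coefficient \(\frac{\alpha}{1-\alpha}\). The benchmark capital share of one third is an assumption used only for comparison.

```r
# Estimate of alpha reported in the summary of model_2.
alpha_hat <- 0.716495

# Implied coefficient on ln(s) and (with a minus sign) on ln(n + g + delta).
elasticity <- alpha_hat / (1 - alpha_hat)

# For comparison: the coefficient implied by a capital share of one third
# (a conventional benchmark, used here as an assumption).
benchmark <- (1/3) / (1 - 1/3)

print(round(c(elasticity = elasticity, benchmark = benchmark), 3))
```

Because \(\frac{\alpha}{1-\alpha}\) rises steeply as \(\alpha\) approaches 1, a moderately high estimate of \(\alpha\) implies a very large response of income to the savings rate.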
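We can estimate the level of technology, \(A_0\), using the formula below. Note that the parameter A in the code corresponds to \(\ln(A_0)\), not to \(A_0\) itself.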
\(\ln(y_i) = \ln(A_0) + \frac{\alpha}{1-\alpha}\ln(s_i) - \frac{\alpha}{1-\alpha} \ln(n_{i} + g + \delta) + \epsilon_i\)
model_3 <- nls(ln_y ~ A + (a/(1-a))*ln_I - (a/(1-a))*ln_ngd, data = pwt.2, start = list(a = 0.4, A = 1), control = nls.control(warnOnly = T))
summary(model_3)
##
## Formula: ln_y ~ A + (a/(1 - a)) * ln_I - (a/(1 - a)) * ln_ngd
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## a 0.60985 0.02989 20.403 < 2e-16 ***
## A 1.15433 0.21320 5.414 2.26e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.133 on 157 degrees of freedom
##
## Number of iterations to convergence: 4
## Achieved convergence tolerance: 1.41e-07
Once technology is included in the model, the estimate of \(\alpha\) falls to approximately 0.61, which is still high but is an improvement.
We now have all of the components needed to calculate an estimate of GDP per capita, using the coefficients fitted in model_3.
a_hat <- coef(model_3)[["a"]]; A_hat <- coef(model_3)[["A"]] #fitted coefficients rather than hand-typed numbers
y_est <- A_hat + (a_hat/(1-a_hat))*pwt.2$ln_I - (a_hat/(1-a_hat))*pwt.2$ln_ngd
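The specification with an intercept is linear in \(b = \frac{\alpha}{1-\alpha}\), so an alternative to nls() is to regress ln_y on the difference ln_I - ln_ngd with lm() and recover \(\alpha = \frac{b}{1+b}\). The sketch below checks this on simulated data; the data, the true value of \(\alpha\), and the variable names are all hypothetical and not taken from the Penn World Table.

```r
set.seed(1)

# Simulated data (hypothetical): x stands in for ln_I - ln_ngd.
alpha_true <- 0.33
x <- rnorm(200)
ln_y <- 1 + (alpha_true / (1 - alpha_true)) * x + rnorm(200, sd = 0.1)
sim <- data.frame(ln_y = ln_y, x = x)

# Linear fit: the slope b estimates alpha / (1 - alpha).
fit_lm <- lm(ln_y ~ x, data = sim)
b <- unname(coef(fit_lm)["x"])
alpha_lm <- b / (1 + b)

# Non-linear fit in the same form as model_3.
fit_nls <- nls(ln_y ~ A + (a / (1 - a)) * x, data = sim,
               start = list(a = 0.4, A = 1))
alpha_nls <- unname(coef(fit_nls)["a"])

print(c(alpha_lm = alpha_lm, alpha_nls = alpha_nls))
```

The two estimates should agree up to the convergence tolerance of nls(), since both minimise the same sum of squared residuals.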
#Creating a new data column:
pwt.2 <- pwt.2 %>%
mutate(y_est = exp(y_est))
Inspecting the data shows that the estimates are inaccurate.
Plotting the estimates of y against the actual values of y lets us compare the two.
pwt.2 %>% ggplot(aes(x = y, y = y_est)) +
geom_point()
We can create residuals by subtracting the estimates of y from the actual values of y. We can then plot a histogram to examine their distribution.
#a new data column:
pwt.2 <- pwt.2 %>%
mutate(residuals = (y - y_est))
#histogram:
resid <- pwt.2$residuals
hist(resid)
We can plot these residuals against the estimates to show where the model is inaccurate.
pwt.2 %>% ggplot(aes(x = residuals, y = y_est)) +
geom_point()
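The plot shows that the smaller the estimate, the smaller the residual, so in levels the model is more accurate for countries with low estimated GDP per capita.
Moment checking can be used to assess the model: we create a new column of made-up countries by simulating GDP per capita from the model, given each country's savings rate and population growth, and compare the simulated distribution with the actual one.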
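One possible reason for this pattern is that the model is fitted in logs, so a given error in logs translates into an error in levels that is proportional to the estimate. A minimal sketch with made-up numbers (the fitted values and the size of the log error are hypothetical):

```r
# Hypothetical fitted values (percent of US GDP per capita).
y_est <- c(1, 10, 100)

# Suppose every country has the same error in logs.
log_error <- 0.5
y_actual <- exp(log(y_est) + log_error)

# Residuals in levels scale with the fitted value ...
resid_level <- y_actual - y_est
# ... while residuals in logs are constant.
resid_log <- log(y_actual) - log(y_est)

print(data.frame(y_est, resid_level, resid_log))
```

Looking at residuals on the log scale, the scale on which the model was estimated, may therefore give a fairer picture of fit across rich and poor countries.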
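#Standard deviation of the residuals in logs, the scale on which the model was fitted
resid_sd <- sd(pwt.2$ln_y - log(pwt.2$y_est))
#Creating a new data column (y_est is in levels, so simulate around log(y_est) and then exponentiate):
pwt.2 <- pwt.2 %>%
mutate(RandomCountries = exp(rnorm(n = n(), mean = log(y_est), sd = resid_sd)))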
Using the moment checking method to compare the model's simulated values against the real data, we can see that the model is flawed.
This histogram shows the actual values of GDP per capita relative to the US.
y <- pwt.2$y
hist(y)
We can compare this to a histogram of the simulated values of GDP per capita for our random countries.
Random_Countries <- pwt.2$RandomCountries
hist(Random_Countries)
This histogram shows that almost all 159 countries fall between 0% and 5% of US GDP per capita. Clearly this does not represent reality.
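Histograms give a visual comparison; a moment check can also be made numerically by comparing summary statistics of the actual and simulated values. The sketch below does this on made-up data, drawing the simulated errors on the log scale and exponentiating afterwards. The inputs ln_fit and ln_act and the helper moments() are hypothetical stand-ins, not objects from the analysis above.

```r
set.seed(42)

# Hypothetical inputs: fitted and actual log income for 159 countries.
ln_fit <- rnorm(159, mean = 2.7, sd = 1)
ln_act <- ln_fit + rnorm(159, sd = 1.1)

# Simulate on the log scale, then convert back to levels.
sd_log <- sd(ln_act - ln_fit)
y_sim <- exp(rnorm(length(ln_fit), mean = ln_fit, sd = sd_log))
y_act <- exp(ln_act)

# Compare a few moments of the actual and simulated distributions.
moments <- function(v) c(mean = mean(v), median = median(v), sd = sd(v))
print(rbind(actual = moments(y_act), simulated = moments(y_sim)))
```

If the model were a good description of the data, the two rows would be of similar magnitude; large gaps point to features of the data the model does not capture.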