Solow Growth Model, Human Capital, and Convergence

Jaromír Baxa & Eva Hromádková

IES FSV UK

Introduction

The cross-country regression based on the Solow model led to interesting results: an R² of about 60% and coefficients with the correct signs, but an estimated α far above the capital share in income (roughly one third). The regression also left out all other variables potentially relevant for the growth rate of technology g, the key parameter of long-term growth.
The cross-country growth regression can be extended to account for the effects of other variables - within a framework that is consistent with the theory and that does not suffer from omitted variable bias.
This section focuses on the effects of education. We introduce human capital into the production function. This framework is then used to study economic convergence.

Outline:

Human capital in production function
Estimation of the extended model
Omitted variable bias
The conditional convergence hypothesis

1. Human capital in production function

Solow model: output determined by accumulation of physical capital and by technological progress.
However, human capital matters as well: schooling and investment in education (both private and public) and in health, including the opportunity costs of education.
Assume a production function with two types of capital:

\[ Y_t= K_t^{\alpha} H_t^{\beta} (A_t L_t)^{1-\alpha-\beta} \]

1. Human capital in production function

Define equations describing accumulation of physical and human capital:

\[ \dot{k} = s_k y_t - (n+g+\delta) k_t \]

\[ \dot{h} = s_h y_t - (n+g+\delta) h_t \]

Setting both equations to zero gives the conditions of the steady state:

\[ k^*= (\frac {s_k^{1-\beta} s_h^{\beta} } {n+g+\delta}) ^ { \frac {1} {(1-\alpha - \beta)}} \]

\[ h^*= (\frac {s_k^{\alpha} s_h^{1-\alpha} } {n+g+\delta}) ^ { \frac {1} {(1-\alpha - \beta)}} \]
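
The step from the accumulation equations to the steady state can be spelled out as follows. This is a sketch that assumes the intensive-form production function \(y = k^{\alpha} h^{\beta}\) and the same depreciation rate \(\delta\) for both types of capital. Setting \(\dot{k} = \dot{h} = 0\) gives

\[ s_k k^{\alpha} h^{\beta} = (n+g+\delta) k, \qquad s_h k^{\alpha} h^{\beta} = (n+g+\delta) h \]

Dividing the second condition by the first yields \(h/k = s_h/s_k\). Substituting \(h = (s_h/s_k) k\) into the first condition:

\[ s_k^{1-\beta} s_h^{\beta} k^{\alpha+\beta} = (n+g+\delta) k \quad \Rightarrow \quad k^{1-\alpha-\beta} = \frac {s_k^{1-\beta} s_h^{\beta}} {n+g+\delta} \]

Raising both sides to the power \(1/(1-\alpha-\beta)\) gives \(k^*\); the expression for \(h^*\) follows by symmetry.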

1. Human capital in production function

We may now insert both steady-state expressions into the production function \(y = k^{\alpha} h^{\beta}\) and take logs to linearize the equation. This way, we obtain the cross-country growth regression:

\[ log (\frac {Y} {L}) = log A_0 + g.t + \frac {\alpha} {(1-\alpha - \beta)} log s_k + \frac {\beta} {(1-\alpha - \beta)} log s_h \\ - \frac {\alpha + \beta } {(1-\alpha -\beta)} log (n+g+\delta) \]

Thus, we will estimate the model

\[ log (\frac {Y_i} {L_i}) = \beta_0 + \beta_1 log s_{k,i} + \beta_2 log s_{h,i} + \beta_3 log (n_i+g+\delta) + u_i \]

with the \(g+\delta\) calibrated to 0.05.
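
Comparing the two equations term by term links the regression coefficients to the structural parameters. This is a short worked step using only the symbols defined above, where \(\beta\) without a subscript is the exponent on human capital and \(\beta_1, \beta_2, \beta_3\) are regression coefficients:

\[ \beta_1 = \frac {\alpha} {1-\alpha-\beta}, \qquad \beta_2 = \frac {\beta} {1-\alpha-\beta}, \qquad \beta_3 = - \frac {\alpha+\beta} {1-\alpha-\beta} \]

The model therefore implies the testable restriction \(\beta_1 + \beta_2 + \beta_3 = 0\). Because \(1 + \beta_1 + \beta_2 = 1/(1-\alpha-\beta)\), the structural parameters can be recovered from the estimates as

\[ \alpha = \frac {\beta_1} {1+\beta_1+\beta_2}, \qquad \beta = \frac {\beta_2} {1+\beta_1+\beta_2} \]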

Measuring human capital

Since approaches to measuring human capital have long been debated, Mankiw-Romer-Weil (1992) proposed restricting human capital to education and approximated the saving rate for human capital by the share of the working-age population enrolled in secondary school.
This proxy reflects both the level of education and the returns to education.
Implicitly, they assume that people would not spend additional years in secondary education if the returns were low; in other words, investment in education pays off relative to the alternatives.

Measuring human capital

Dataset: the variable school is the secondary-school enrollment rate of the population aged 12-17 × the share of the working-age population aged 15-19.
Note that the recent editions of the Penn World Table datasets (starting from edition 8) contain a human capital index based on the average years of schooling from Barro and Lee (2013) and an assumed rate of return to education, based on Mincer equation estimates around the world.
For details on human capital in Penn World Tables, see https://www.rug.nl/ggdc/docs/human_capital_in_pwt_90.pdf.

2. Estimation of the extended model: Solow model first

Preliminaries

rm(list = ls(all = TRUE))
#install.packages("AER")
#install.packages("stargazer")

library(AER)
#loading library AER calls other packages needed for estimation, i.e. packages lmtest, car etc.

Loading the MRW dataset, restricting it to contain non-oil-producing countries:

data("GrowthDJ")
dj <- subset(GrowthDJ, oil == "no")

Transformations: Generate log of GDP in 1985 and logs of other variables. New variables created within the object “dj”

dj$l_gdp85 <- log(dj$gdp85)
dj$sk <- log(dj$invest/100)
dj$ngd <- log(dj$popgrowth/100+0.05)

2. Estimation of the extended model: Solow model first

Estimation of the Solow model:

#regression
mrwmodel <- lm(dj$l_gdp85 ~ dj$sk + dj$ngd)
#alternative
#mrwmodel <- lm(l_gdp85 ~ sk + ngd, data = dj)
summary(mrwmodel)


Call:
lm(formula = dj$l_gdp85 ~ dj$sk + dj$ngd)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.79144 -0.39367  0.04124  0.43368  1.58046 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   5.4299     1.5839   3.428 0.000900 ***
dj$sk         1.4240     0.1431   9.951  < 2e-16 ***
dj$ngd       -1.9898     0.5634  -3.532 0.000639 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6891 on 95 degrees of freedom
Multiple R-squared:  0.6009,    Adjusted R-squared:  0.5925 
F-statistic: 71.51 on 2 and 95 DF,  p-value: < 2.2e-16

Testing the linear restriction on the coefficients of sk and ngd

linearHypothesis(mrwmodel,"dj$sk + dj$ngd = 0")
mrwmodel2 <- lm(dj$l_gdp85 ~ I(dj$sk - dj$ngd))
summary(mrwmodel2)
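
As a rough check on how large the implied α is in the basic Solow regression: without human capital, the coefficient on log s_k equals α/(1-α), so α = b/(1+b). The sketch below uses the unrestricted estimate from the summary(mrwmodel) output; the names b_sk and alpha_solow are illustrative, and the slope of the restricted model mrwmodel2 would be the more natural input once it is estimated.

```r
# Coefficient on dj$sk, copied from the summary(mrwmodel) output
b_sk <- 1.4240

# In the Solow model without human capital: b_sk = alpha / (1 - alpha)
alpha_solow <- b_sk / (1 + b_sk)

round(alpha_solow, 3)  # 0.587
```

An implied capital share of almost 0.6 is well above the roughly one third usually found in national accounts, which is the motivation for adding human capital.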

2. Estimation of the extended model

Adding savings on human capital into the model

dj$sh <- log(dj$school/100)
#regression
mrwmodel_hc <- lm(dj$l_gdp85 ~ dj$sk + dj$sh + dj$ngd)
summary(mrwmodel_hc)


Call:
lm(formula = dj$l_gdp85 ~ dj$sk + dj$sh + dj$ngd)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.2875 -0.3208  0.0726  0.3321  1.0952 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  6.84441    1.17745   5.813 8.37e-08 ***
dj$sk        0.69671    0.13283   5.245 9.61e-07 ***
dj$sh        0.65446    0.07271   9.001 2.44e-14 ***
dj$ngd      -1.74525    0.41594  -4.196 6.16e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5077 on 94 degrees of freedom
Multiple R-squared:  0.7856,    Adjusted R-squared:  0.7788 
F-statistic: 114.8 on 3 and 94 DF,  p-value: < 2.2e-16

2. Estimation of the extended model: Specification tests


    studentized Breusch-Pagan test

data:  mrwmodel_hc
BP = 3.6847, df = 3, p-value = 0.2976

# A tibble: 1 × 5
  statistic p.value parameter method       alternative
      <dbl>   <dbl>     <dbl> <chr>        <chr>      
1      5.71   0.335         5 White's Test greater


    Durbin-Watson test

data:  mrwmodel_hc
DW = 2.1244, p-value = 0.7021
alternative hypothesis: true autocorrelation is greater than 0


    Shapiro-Wilk normality test

data:  mrwmodel_hc$residuals
W = 0.98457, p-value = 0.3091


    RESET test

data:  mrwmodel_hc
RESET = 8.653, df1 = 2, df2 = 92, p-value = 0.0003603
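
The code behind this output is not shown on the slide. The following is a sketch of calls that should give the Breusch-Pagan, Durbin-Watson, Shapiro-Wilk and RESET results, assuming the tests come from the lmtest package (attached together with AER) and base R with default settings. The model is re-estimated here with a data argument under the illustrative name fit_hc, which leaves the estimates unchanged. White's test is left out because the function used for it cannot be identified from the output.

```r
library(AER)  # attaches lmtest, which provides bptest(), dwtest(), resettest()

# Rebuild the data and the human capital model so the block is self-contained
data("GrowthDJ")
dj <- subset(GrowthDJ, oil == "no")
dj$l_gdp85 <- log(dj$gdp85)
dj$sk  <- log(dj$invest / 100)
dj$sh  <- log(dj$school / 100)
dj$ngd <- log(dj$popgrowth / 100 + 0.05)
fit_hc <- lm(l_gdp85 ~ sk + sh + ngd, data = dj)

bp <- bptest(fit_hc)                    # heteroskedasticity (studentized by default)
dw <- dwtest(fit_hc)                    # first-order autocorrelation
sw <- shapiro.test(residuals(fit_hc))   # normality of residuals
rs <- resettest(fit_hc)                 # functional form (powers 2 and 3 of fitted values)

print(bp); print(dw); print(sw); print(rs)
```

Only the RESET test rejects its null hypothesis at conventional levels, which suggests that the linear-in-logs functional form may be too restrictive.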

2. Estimation of the extended model: Linear restriction tests

linearHypothesis(mrwmodel_hc,"dj$sk + dj$sh + dj$ngd = 0")

Linear hypothesis test

Hypothesis:
dj$sk  + dj$sh  + dj$ngd = 0

Model 1: restricted model
Model 2: dj$l_gdp85 ~ dj$sk + dj$sh + dj$ngd

  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     95 24.418                           
2     94 24.226  1   0.19185 0.7444 0.3904
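
The F statistic in this table can be reproduced by hand from the two residual sums of squares. A minimal sketch using the rounded values printed above (the variable names are illustrative):

```r
# Values copied from the linearHypothesis output
rss_u <- 24.226    # residual sum of squares, unrestricted model
df_u  <- 94        # residual degrees of freedom, unrestricted model
ss_diff <- 0.19185 # increase in RSS when the restriction is imposed
q <- 1             # number of restrictions

# F = (increase in RSS per restriction) / (unrestricted residual variance)
F_stat <- (ss_diff / q) / (rss_u / df_u)
p_val  <- pf(F_stat, q, df_u, lower.tail = FALSE)

round(c(F = F_stat, p = p_val), 4)
```

Since the p-value is far above 0.05, the restriction that the three coefficients sum to zero is not rejected.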

2. Estimation of the extended model: Linear restriction tests

# simple approach 
# mrwmodel_hc2 <- lm(dj$l_gdp85 ~ I(dj$sk + dj$sh - dj$ngd))
# does not allow calculation of parameters alpha and beta
mrwmodel_hc2 <- lm(dj$l_gdp85 ~ I(dj$sk - dj$ngd) + I(dj$sh - dj$ngd))
summary(mrwmodel_hc2)


Call:
lm(formula = dj$l_gdp85 ~ I(dj$sk - dj$ngd) + I(dj$sh - dj$ngd))

Residuals:
     Min       1Q   Median       3Q      Max 
-1.26287 -0.35150  0.06389  0.31553  1.04428 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)        7.85309    0.14003  56.082  < 2e-16 ***
I(dj$sk - dj$ngd)  0.73828    0.12362   5.972 4.04e-08 ***
I(dj$sh - dj$ngd)  0.65708    0.07255   9.057 1.71e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.507 on 95 degrees of freedom
Multiple R-squared:  0.7839,    Adjusted R-squared:  0.7794 
F-statistic: 172.3 on 2 and 95 DF,  p-value: < 2.2e-16
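
The two slopes of the restricted model identify α and β. A minimal sketch, assuming the slopes equal α/(1-α-β) and β/(1-α-β) as in the regression equation derived earlier; the names b_k, b_h, alpha_hat and beta_hat are illustrative:

```r
# Coefficients copied from the summary(mrwmodel_hc2) output
b_k <- 0.73828  # on I(dj$sk - dj$ngd): alpha / (1 - alpha - beta)
b_h <- 0.65708  # on I(dj$sh - dj$ngd): beta  / (1 - alpha - beta)

# Since 1 + b_k + b_h = 1 / (1 - alpha - beta):
alpha_hat <- b_k / (1 + b_k + b_h)
beta_hat  <- b_h / (1 + b_k + b_h)

round(c(alpha = alpha_hat, beta = beta_hat), 3)  # alpha 0.308, beta 0.274
```

The implied share of physical capital is now close to one third, in contrast to the much larger value implied by the regression without human capital.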

2. Estimation of the extended model: Both models

Formatted table with all models is obtained with the help of library stargazer (note that the first line of the code needs to be {r results = "asis"} instead of {r} to produce a nicely formatted table in the html output):

library(stargazer)
stargazer(mrwmodel, mrwmodel_hc, type = "html") 
#just copy the output, save as html and open in web browser

#alternative
#model_results <- stargazer(mrwmodel, mrwmodel_hc, type = 'html') 
#capture.output(model_results, file = "MRW_results.html")

For help on stargazer syntax, see for example https://www.jakeruss.com/cheatsheets/stargazer/.

2. Estimation of the extended model: Both models


	Dependent variable: l_gdp85
	(1)	(2)

sk	1.424***	0.697***
	(0.143)	(0.133)

sh		0.654***
		(0.073)

ngd	-1.990***	-1.745***
	(0.563)	(0.416)

Constant	5.430***	6.844***
	(1.584)	(1.177)

Observations	98	98
R²	0.601	0.786
Adjusted R²	0.592	0.779
Residual Std. Error	0.689 (df = 95)	0.508 (df = 94)
F Statistic	71.507*** (df = 2; 95)	114.836*** (df = 3; 94)

Note:	*p<0.1; **p<0.05; ***p<0.01

Results - comments

The fit improves relative to the basic Solow model (R² rises from 0.60 to 0.79) and, importantly, human capital, proxied by secondary schooling, is significant.
The coefficients have the expected signs and the coefficient on the investment/output ratio is no longer implausibly high. The restricted model is not rejected either (F = 0.74, p = 0.39), although tests of this over-identifying restriction give differing results across authors.
The cross-country growth regression has since become a popular tool, perhaps because it is very close to the theory and because the estimates in different papers are surprisingly consistent, which points to a possible absence of omitted variable bias.
It can also be used to analyze convergence.

3. Omitted variable bias

The coefficients on savings on physical capital in Solow model with and without human capital differ. This is the consequence of the omitted variable bias.

Why does omitted variable bias arise?

If a relevant variable is not included in the model, its effect ends up in the error term. If the omitted variable is also correlated with the included explanatory variables (as schooling is with the investment rate here), the error term is correlated with the regressors. The OLS assumption that the error term is uncorrelated with the regressors, that is cov(u,X) = 0, is then violated and the estimates become biased.
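
The mechanism can be illustrated with simulated data. This is a minimal sketch with made-up parameter values (all names and numbers are illustrative, not taken from the MRW dataset): x2 is correlated with x1, and leaving x2 out shifts the coefficient on x1 by exactly the coefficient on x2 times the slope of x2 on x1.

```r
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)             # x2 is correlated with x1
y  <- 1 + 1 * x1 + 1 * x2 + rnorm(n)  # true coefficients are both 1

long  <- lm(y ~ x1 + x2)  # correctly specified model
short <- lm(y ~ x1)       # x2 omitted
aux   <- lm(x2 ~ x1)      # how the omitted variable moves with x1

b_short <- coef(short)[["x1"]]
b_long  <- coef(long)[["x1"]]
bias    <- coef(long)[["x2"]] * coef(aux)[["x1"]]

# The short-regression coefficient equals the long one plus the bias term
round(c(short = b_short, long = b_long, long_plus_bias = b_long + bias), 3)
```

The same logic applies to the Solow regressions: once sh is added, the coefficient on sk falls from 1.42 to 0.70.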

How to detect omitted variable bias?

There is no single test, but sensitivity analysis helps: include several other variables that might matter (either in one regression with all variables or in a set of regressions with one additional variable each). If the sign and size of the coefficient on the variable of interest remain stable and in line with economic intuition, the results are most probably reliable.

3. Omitted variable bias

How to prevent omitted variable bias?

Use economic theory to establish a set of alternative explanatory variables (Example: Solow model => Growth regression).
Follow existing research papers and check the results and variables included there (the authors usually tried many specifications, so why not use existing knowledge).
Do sensitivity analysis – re-estimate models also with other variables.

4. Convergence

The issue of omitted variable bias appears even more striking when the convergence is investigated.

Rationale for convergence of income:

Diminishing returns in neoclassical models imply that countries with lower initial income will grow faster.
Hypothesis of the relative backwardness advantage (A. Gerschenkron, 1952): poorer countries can take advantage of the technological advances of more developed countries.
Therefore, poor and rich countries should converge in terms of income levels per capita.

4. Convergence

Is the convergence hypothesis supported by the data?

Regress output growth over some period on a constant and initial income (Unconditional convergence):

\[ log (\frac {Y} {L})_{i,t} - log (\frac {Y} {L})_{i,t-1} = \alpha + \beta log (\frac {Y} {L})_{i,t-1} + u_i \]

dj$log_growth <- log(dj$gdp85) - log(dj$gdp60) 
dj$l_gdp60 <- log(dj$gdp60)

plot(dj$l_gdp60, dj$log_growth)
abline(lm(dj$log_growth ~ dj$l_gdp60))

conv_uncond <- lm(dj$log_growth ~ dj$l_gdp60)
summary(conv_uncond)

4. Convergence

Is the convergence hypothesis supported by the data?


Call:
lm(formula = dj$log_growth ~ dj$l_gdp60)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.09784 -0.27467 -0.02826  0.25975  1.17747 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept) -0.26658    0.37960  -0.702   0.4842  
dj$l_gdp60   0.09431    0.04962   1.901   0.0603 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4405 on 96 degrees of freedom
Multiple R-squared:  0.03627,   Adjusted R-squared:  0.02623 
F-statistic: 3.613 on 1 and 96 DF,  p-value: 0.06033

4. Convergence

The estimates of the slope of the regression of growth on initial income are sensitive to the sample: the years and the selection of countries matter.

Recent data suggest that unconditional convergence might be present now - see the figure in Patel, Sandefur, Subramanian: Everything You Know about Cross-Country Convergence Is (Now) Wrong

See https://www.piie.com/blogs/realtime-economic-issues-watch/everything-you-know-about-cross-country-convergence-now-wrong.

4. Convergence

Conditional convergence

Barro (1989): In neoclassical growth models with diminishing returns such as Solow (1956) and Koopmans (1965), a country’s per capita growth rate tends to be inversely related to its starting level of income per person. Therefore, in the absence of shocks, poor and rich countries would tend to converge in terms of levels of per capita income. However this convergence hypothesis seems to be inconsistent with the cross country evidence, which indicates that per capita growth rates are uncorrelated with the starting level of per capita output.

Mankiw-Romer-Weil (1992): Countries might have different steady states but if we control for the determinants of steady state (namely the saving rates), conditional convergence occurs.

4. Convergence

Formally:

\[ log (\frac {Y} {L})_{i,t} - log (\frac {Y} {L})_{i,t-1} = \alpha + \beta_0 log (\frac {Y} {L})_{i,t-1} + \beta_1 log s_{k,i} + \beta_3 log (n_i+g+\delta) + u_i \]

\[ log (\frac {Y} {L})_{i,t} - log (\frac {Y} {L})_{i,t-1} = \alpha + \beta_0 log (\frac {Y} {L})_{i,t-1} + \beta_1 log s_{k,i} + \beta_2 log s_{h,i} + \beta_3 log (n_i+g+\delta) + u_i \]

conv_cond1 <- lm(dj$log_growth ~ dj$l_gdp60 + dj$sk + dj$ngd)
summary(conv_cond1)

conv_cond2 <- lm(dj$log_growth ~ dj$l_gdp60 + dj$sk + dj$sh + dj$ngd)
summary(conv_cond2)

4. Convergence


Call:
lm(formula = dj$log_growth ~ dj$l_gdp60 + dj$sk + dj$ngd)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.07648 -0.15215  0.01185  0.19595  0.96056 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.91938    0.83367   2.302  0.02352 *  
dj$l_gdp60  -0.14090    0.05202  -2.709  0.00803 ** 
dj$sk        0.64724    0.08670   7.465 4.16e-11 ***
dj$ngd      -0.30235    0.30438  -0.993  0.32311    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3507 on 94 degrees of freedom
Multiple R-squared:  0.4019,    Adjusted R-squared:  0.3828 
F-statistic: 21.05 on 3 and 94 DF,  p-value: 1.622e-10


Call:
lm(formula = dj$log_growth ~ dj$l_gdp60 + dj$sk + dj$sh + dj$ngd)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.91041 -0.17599  0.01789  0.18439  0.93846 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.02152    0.82748   3.651 0.000431 ***
dj$l_gdp60  -0.28837    0.06158  -4.683 9.62e-06 ***
dj$sk        0.52374    0.08687   6.029 3.30e-08 ***
dj$sh        0.23112    0.05946   3.887 0.000190 ***
dj$ngd      -0.50566    0.28861  -1.752 0.083061 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.327 on 93 degrees of freedom
Multiple R-squared:  0.4855,    Adjusted R-squared:  0.4633 
F-statistic: 21.94 on 4 and 93 DF,  p-value: 8.987e-13

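Both results seem to be consistent with the conditional convergence hypothesis: once the determinants of the steady state are controlled for, the coefficient on initial income is negative and significant (-0.14 and -0.29).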
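
The size of the coefficient on initial income can be translated into a speed of convergence. This step is not derived on these slides; the sketch below assumes the log-linear approximation around the steady state used by Mankiw-Romer-Weil (1992), in which the coefficient on initial income equals -(1 - exp(-λT)), with T = 25 years between 1960 and 1985. The names T_years, lambda and half_life are illustrative.

```r
T_years <- 25  # 1960 to 1985

# Coefficients on dj$l_gdp60, copied from the two regression outputs
b0 <- c(cond1 = -0.14090, cond2 = -0.28837)

# Assumed relation: b0 = -(1 - exp(-lambda * T))
lambda    <- -log(1 + b0) / T_years  # implied convergence rate per year
half_life <- log(2) / lambda         # years needed to close half of the gap

round(lambda, 4)
round(half_life, 1)
```

Under this assumption the model with human capital implies convergence of roughly 1.4% per year, about twice as fast as the model without it.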

4. Convergence


	Dependent variable: log_growth
	(1)	(2)	(3)

l_gdp60	0.094*	-0.141***	-0.288***
	(0.050)	(0.052)	(0.062)

sk		0.647***	0.524***
		(0.087)	(0.087)

sh			0.231***
			(0.059)

ngd		-0.302	-0.506*
		(0.304)	(0.289)

Constant	-0.267	1.919**	3.022***
	(0.380)	(0.834)	(0.827)

Observations	98	98	98
R²	0.036	0.402	0.485
Adjusted R²	0.026	0.383	0.463
Residual Std. Error	0.440 (df = 96)	0.351 (df = 94)	0.327 (df = 93)
F Statistic	3.613* (df = 1; 96)	21.052*** (df = 3; 94)	21.936*** (df = 4; 93)

Note:	*p<0.1; **p<0.05; ***p<0.01

Wrapping up the main results

The cross-country growth regression was introduced - both in levels and in growth rates. It is the foundation of the empirical growth literature and also provides a framework for studies of growth determinants (Durlauf, Johnson and Temple in Handbook of Economic Growth, 2005, ch. 8, Growth Econometrics).
Their version

\[ log (\frac {Y} {L})_{i,1} - log (\frac {Y} {L})_{i,0} = \alpha + \beta_0 log (\frac {Y} {L})_{i,0} + \beta_1 X_{i} + \beta_2 log Z_i + u_i \]

considers the initial income \(y_{i,0}\) and the Solow variables \(X_i\), while all other exogenous variables are collected in \(Z_i\).

The framework presented here can easily be extended to panel data and time series data, and it can account for possible endogeneity by using instrumental variables.
Follow up seminars: what else do we know about economic growth.