Important: This file is written so you can knit to PDF (for Question 2) and also to HTML.
If PDF knitting fails, it is usually because LaTeX is not installed. Install TinyTeX in R:
install.packages("tinytex")
tinytex::install_tinytex()
NIPA115 from the BEA API
# install.packages("bea.R") # this is commented out because I have already installed this package
library(bea.R)
# save your private key
beaKey <- "3EF221CE-1553-48B1-8F71-97D335D3C9F8"
# Set up a list with the specifications to retrieve the appropriate table
beaSpecs <- list(
'UserID' = beaKey ,
'Method' = 'GetData',
'datasetname' = 'NIPA',
'TableName' = 'T10105',
'Frequency' = 'Q',
'Year' = 'ALL',
'ResultFormat' = 'json'
)
# Retrieve data and save it
NIPA115_raw <- beaGet(beaSpecs)
# Remove all info except quarterly values
# (you need to inspect the data to know which columns to eliminate)
NIPA115_clean <- NIPA115_raw[, -c(1:7)]
# Transpose so that quarters are rows and series are columns.
# t() returns a matrix, so wrap it in data.frame()
NIPA115_clean <- data.frame(t(NIPA115_clean))
# Name the series.
# Select the series you want by column number (these are vectors).
gdp<-NIPA115_clean[,1]
cons<-NIPA115_clean[,2]
goods<-NIPA115_clean[,3]
dur<-NIPA115_clean[,4]
ndur<-NIPA115_clean[,5]
serv<-NIPA115_clean[,6]
inv<-NIPA115_clean[,7]
fixinv<-NIPA115_clean[,8]
nres<-NIPA115_clean[,9]
struct<-NIPA115_clean[,10]
equip<-NIPA115_clean[,11]
ip<-NIPA115_clean[,12]
res<-NIPA115_clean[,13]
deltainv<-NIPA115_clean[,14]
nx<-NIPA115_clean[,15]
exp<-NIPA115_clean[,16]
expgoods<-NIPA115_clean[,17]
expserv<-NIPA115_clean[,18]
imp<-NIPA115_clean[,19]
impgoods<-NIPA115_clean[,20]
impserv<-NIPA115_clean[,21]
gov<-NIPA115_clean[,22]
fed<-NIPA115_clean[,23]
def<-NIPA115_clean[,24]
ndef<-NIPA115_clean[,25]
stateloc<-NIPA115_clean[,26]
# Find the total number of quarters in our data
t_q<-nrow(NIPA115_clean)
# Install the zoo package if you have not done so
# install.packages("zoo")
# Load the zoo package
library(zoo)
# Define your starting quarter
s_q <- "1947 Q1"
# Create sequence of quarters from first (s_q) with length t_q
quarters <- as.yearqtr(s_q) + seq(0, length = t_q) / 4
# Note that format of quarters is "yearqtr", a class used to represent quarterly data
# we need to transform this into a data frame with function fortify.zoo()
dfquarters<-fortify.zoo(quarters)
# remove the first column, which is just an index, and keep the quarters
quarters<-dfquarters[-c(1)]
# Create a data frame with the above vectors
NIPA115<-data.frame(quarters,gdp,cons,goods,dur,ndur,serv,inv,fixinv,nres,struct,equip,ip,
res,deltainv,nx,exp,expgoods,expserv,imp,impgoods,impserv,gov,fed,def,ndef,stateloc)
# Quick check
head(NIPA115)
tail(NIPA115)
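The positional steps above (dropping columns 1:7 and hardcoding the first quarter) depend on the layout of the returned table. A minimal base-R sketch of a name-based alternative follows, run on a toy data frame. It assumes the wide table has metadata columns followed by value columns named like `DataValue_1947Q1`; the column names and numbers below are illustrative assumptions, not real BEA data, so check `names(NIPA115_raw)` before relying on them.

```r
# Toy stand-in for the wide table returned by beaGet(); the column names
# and values here are assumptions for illustration, not real BEA data.
toy_raw <- data.frame(
  TableName        = "T10105",
  LineNumber       = c("1", "2"),
  LineDescription  = c("Gross domestic product",
                       "Personal consumption expenditures"),
  DataValue_1947Q1 = c(100, 60),
  DataValue_1947Q2 = c(102, 61),
  stringsAsFactors = FALSE
)

# Keep only the value columns, selected by name rather than by position
is_value <- grepl("^DataValue_", names(toy_raw))

# Transpose: quarters become rows, series become columns
toy_clean <- data.frame(t(toy_raw[, is_value]))
names(toy_clean) <- c("gdp", "cons")

# Recover the quarter labels from the row names instead of hardcoding them
labels <- sub("^DataValue_", "", rownames(toy_clean))
first_quarter <- labels[1]   # starting quarter, read from the data

print(toy_clean)
print(first_quarter)
```

Selecting by name keeps the cleaning step from silently dropping a data column if the number of metadata columns ever differs from seven.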
We are given the Cobb–Douglas production function: \[ Y_t = A K_t^{\alpha}(Z_t N_t)^{1-\alpha}, \] where \(A > 0\) and \(0 < \alpha < 1\). Technology and labor grow exogenously: \[ Z_{t+1} = (1+z) Z_t, \qquad N_{t+1} = (1+n) N_t. \]
Define output and capital per efficiency unit: \[ \hat{y}_t = \frac{Y_t}{Z_t N_t}, \qquad \hat{k}_t = \frac{K_t}{Z_t N_t}. \]
Divide the production function by \(Z_t N_t\): \[ \hat{y}_t = \frac{A K_t^\alpha (Z_t N_t)^{1-\alpha}}{Z_t N_t}. \]
Use exponent rules: \[ \frac{(Z_t N_t)^{1-\alpha}}{(Z_t N_t)^1} = (Z_t N_t)^{-\alpha}. \]
So, \[ \hat{y}_t = A K_t^\alpha (Z_t N_t)^{-\alpha} = A\left(\frac{K_t}{Z_t N_t}\right)^\alpha = A \hat{k}_t^\alpha. \]
Therefore, \[ \boxed{\hat{y}_t = A \hat{k}_t^\alpha.} \]
Aggregate capital evolves as: \[ K_{t+1} = s A K_t^\alpha (Z_t N_t)^{1-\alpha} + (1-\delta)K_t. \]
Divide both sides by \(Z_{t+1}N_{t+1}\): \[ \hat{k}_{t+1} = \frac{K_{t+1}}{Z_{t+1}N_{t+1}}. \]
Since \(Z_{t+1}=(1+z)Z_t\) and \(N_{t+1}=(1+n)N_t\), we have: \[ Z_{t+1}N_{t+1} = (1+z)(1+n)Z_t N_t. \]
Therefore, \[ \hat{k}_{t+1} = \frac{sA K_t^\alpha (Z_tN_t)^{1-\alpha} + (1-\delta)K_t}{(1+z)(1+n)Z_tN_t}. \]
Simplify:
First term: \[ \frac{K_t^\alpha (Z_tN_t)^{1-\alpha}}{Z_tN_t} = \left(\frac{K_t}{Z_tN_t}\right)^\alpha = \hat{k}_t^\alpha. \]
Second term: \[ \frac{K_t}{Z_tN_t} = \hat{k}_t. \]
So the law of motion becomes: \[ \boxed{ \hat{k}_{t+1}= \frac{sA\hat{k}_t^\alpha + (1-\delta)\hat{k}_t}{(1+z)(1+n)}. } \]
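As a quick numerical check of this derivation, the sketch below simulates the aggregate variables \(K_t\), \(Z_t\), \(N_t\) directly and compares \(K_t/(Z_tN_t)\) with the per-efficiency-unit recursion. The starting values \(Z_0=N_0=1\) and the `_chk` variable names are assumptions made for the illustration; the parameter values are the ones used in the numerical exercise later in this document.

```r
# Parameters (same values as in the numerical exercise)
alpha_chk <- 0.3; A_chk <- 10; delta_chk <- 0.07
s_chk <- 0.2; z_chk <- 0.02; n_chk <- 0.01

# Aggregate state; Z_0 = N_0 = 1 is an assumption for this check
K <- 40; Z <- 1; N <- 1
khat_rec <- K / (Z * N)   # capital per efficiency unit, from the recursion
max_gap <- 0

for (t in 1:50) {
  # Aggregate law of motion, using period-t values of Z and N
  K_next <- s_chk * A_chk * K^alpha_chk * (Z * N)^(1 - alpha_chk) +
    (1 - delta_chk) * K
  # Exogenous growth of technology and labor
  Z <- (1 + z_chk) * Z
  N <- (1 + n_chk) * N
  K <- K_next
  # Law of motion in efficiency units
  khat_rec <- (s_chk * A_chk * khat_rec^alpha_chk +
                 (1 - delta_chk) * khat_rec) /
    ((1 + z_chk) * (1 + n_chk))
  # Track the largest difference between the two ways of computing khat
  max_gap <- max(max_gap, abs(K / (Z * N) - khat_rec))
}

print(max_gap)   # should be at floating-point rounding level
```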
At steady state, \(\hat{k}_{t+1}=\hat{k}_t=\hat{k}^*\). Substitute: \[ \hat{k}^*= \frac{sA(\hat{k}^*)^\alpha + (1-\delta)\hat{k}^*}{(1+z)(1+n)}. \]
Multiply both sides by \((1+z)(1+n)\): \[ (1+z)(1+n)\hat{k}^* = sA(\hat{k}^*)^\alpha + (1-\delta)\hat{k}^*. \]
Bring the \((1-\delta)\hat{k}^*\) term to the left: \[ \left((1+z)(1+n)-(1-\delta)\right)\hat{k}^* = sA(\hat{k}^*)^\alpha. \]
Define: \[ g \equiv (1+z)(1+n)-(1-\delta). \]
Then: \[ g\hat{k}^* = sA(\hat{k}^*)^\alpha \quad\Rightarrow\quad g(\hat{k}^*)^{1-\alpha} = sA. \]
So: \[ \boxed{ \hat{k}^* = \left(\frac{sA}{g}\right)^{\frac{1}{1-\alpha}}. } \]
Substituting \(\hat{k}^*\) into \(\hat{y}_t = A\hat{k}_t^\alpha\): \[ \boxed{\hat{y}^* = A(\hat{k}^*)^\alpha = A\left(\frac{sA}{g}\right)^{\frac{\alpha}{1-\alpha}}.} \]
In Solow, \(C_t = (1-s)Y_t\), so in efficiency units: \[ \boxed{\hat{c}^* = (1-s)\hat{y}^* = (1-s)A(\hat{k}^*)^\alpha.} \]
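The steady-state formulas can be evaluated directly. The sketch below uses the parameter values from the numerical exercise (\(\alpha=0.3\), \(A=10\), \(\delta=0.07\), \(s=0.2\), \(z=0.02\), \(n=0.01\)) and confirms that \(\hat{k}^*\) is a fixed point of the law of motion; the `_ss` variable names are only illustrative.

```r
alpha_ss <- 0.3; A_ss <- 10; delta_ss <- 0.07
s_ss <- 0.2; z_ss <- 0.02; n_ss <- 0.01

# g = (1+z)(1+n) - (1-delta), as defined in the derivation
g_ss <- (1 + z_ss) * (1 + n_ss) - (1 - delta_ss)

# Closed-form steady state
k_star <- (s_ss * A_ss / g_ss)^(1 / (1 - alpha_ss))
y_star <- A_ss * k_star^alpha_ss
c_star <- (1 - s_ss) * y_star

# Apply the law of motion once at k_star: it should return k_star
k_next <- (s_ss * A_ss * k_star^alpha_ss + (1 - delta_ss) * k_star) /
  ((1 + z_ss) * (1 + n_ss))
fixed_point_gap <- abs(k_next - k_star)

print(c(k_star = k_star, y_star = y_star, c_star = c_star))
print(fixed_point_gap)
```

With these values \(\hat{k}^*\) is roughly 72, well above the starting value \(\hat{k}_0=40\), so the simulated path should rise toward the steady state.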
In steady state, investment required to keep \(\hat{k}\) constant equals \(g\hat{k}\). Hence steady-state consumption as a function of \(\hat{k}\) is: \[ \hat{c}(\hat{k}) = A\hat{k}^\alpha - g\hat{k}. \]
Differentiate and set equal to zero: \[ \frac{d\hat{c}}{d\hat{k}} = A\alpha \hat{k}^{\alpha-1} - g = 0 \quad\Rightarrow\quad A\alpha \hat{k}^{\alpha-1} = g. \]
Solve: \[ \hat{k}^{1-\alpha} = \frac{A\alpha}{g} \quad\Rightarrow\quad \boxed{ \hat{k}_{GR} = \left(\frac{\alpha A}{g}\right)^{\frac{1}{1-\alpha}}. } \]
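Golden-rule saving rate: the steady state \(\hat{k}^*=\left(\frac{sA}{g}\right)^{\frac{1}{1-\alpha}}\) coincides with \(\hat{k}_{GR}\) exactly when \(sA=\alpha A\), so \[ \boxed{s_{GR}=\alpha.} \]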
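The golden-rule result can also be checked numerically. This is a sketch, assuming the same parameter values as the numerical exercise: it maximizes \(\hat{c}(\hat{k}) = A\hat{k}^\alpha - g\hat{k}\) with base R's `optimize()` and then backs out the saving rate that sustains that capital level. The search interval `c(1, 500)` and the `_gr` names are choices made for the illustration.

```r
alpha_gr <- 0.3; A_gr <- 10; delta_gr <- 0.07
z_gr <- 0.02; n_gr <- 0.01
g_gr <- (1 + z_gr) * (1 + n_gr) - (1 - delta_gr)

# Steady-state consumption as a function of khat
c_of_k <- function(k) A_gr * k^alpha_gr - g_gr * k

# Closed-form golden-rule capital stock
k_gr_closed <- (alpha_gr * A_gr / g_gr)^(1 / (1 - alpha_gr))

# Numerical maximizer over an interval assumed to contain the optimum
opt <- optimize(c_of_k, interval = c(1, 500), maximum = TRUE, tol = 1e-8)
k_gr_numeric <- opt$maximum

# Saving rate that keeps khat constant at that level: s = g * k / y
s_implied <- g_gr * k_gr_numeric / (A_gr * k_gr_numeric^alpha_gr)

print(c(closed_form = k_gr_closed, numeric = k_gr_numeric))
print(s_implied)
```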
Parameter values: \[ \alpha=0.3,\; A=10,\; \delta=0.07,\; s=0.2,\; z=0.02,\; n=0.01,\; \hat{k}_0=40. \]
alpha <- 0.3
A <- 10
delta <- 0.07
s <- 0.2
z <- 0.02
n <- 0.01
g <- (1+z)*(1+n) - (1-delta)
T <- 100
khat <- numeric(T+1)
yhat <- numeric(T+1)
chat <- numeric(T+1)
khat[1] <- 40
for (t in 1:T) {
yhat[t] <- A * khat[t]^alpha
chat[t] <- (1 - s) * yhat[t]
khat[t+1] <- (s*A*khat[t]^alpha + (1-delta)*khat[t]) / ((1+z)*(1+n))
}
# last values
yhat[T+1] <- A * khat[T+1]^alpha
chat[T+1] <- (1 - s) * yhat[T+1]
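tt <- 0:T
# Set ylim from all three series; otherwise yhat and chat fall below the
# range of khat and are clipped from the plot
plot(tt, khat, type="l", lty=1, ylim=range(khat, yhat, chat),
xlab="t", ylab="Efficiency units",
main="Convergence paths: khat, yhat, chat")
lines(tt, yhat, lty=2)
lines(tt, chat, lty=3)
legend("topleft", legend=c("khat", "yhat", "chat"), lty=1:3, bty="n")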
s_vec <- seq(0.05, 0.65, by=0.01)
k_star_s <- ((s_vec*A)/g)^(1/(1-alpha))
y_star_s <- A * k_star_s^alpha
c_star_s <- (1 - s_vec) * y_star_s
plot(s_vec, c_star_s, type="l", xlab="Saving rate s",
ylab="Steady-state consumption c*",
main="Steady-state consumption vs saving rate")
s_hat_GR <- s_vec[which.max(c_star_s)]
s_hat_GR
## [1] 0.3
The saving rate that maximizes steady-state consumption on the grid is 0.3, which matches the analytical result \(s_{GR}=\alpha=0.3\).
We simulate \(t=0,\dots,200\) where \(s=0.2\) up to \(t=100\), and \(s=\alpha\) for \(t \ge 101\).
T2 <- 200
sGR <- alpha
khat2 <- numeric(T2+1)
yhat2 <- numeric(T2+1)
chat2 <- numeric(T2+1)
khat2[1] <- 40
for (t in 1:T2) {
# index t holds period t-1, so periods 0..100 are indices 1..101
s_now <- if (t <= 101) s else sGR
yhat2[t] <- A * khat2[t]^alpha
chat2[t] <- (1 - s_now) * yhat2[t]
khat2[t+1] <- (s_now*A*khat2[t]^alpha + (1-delta)*khat2[t]) /
((1+z)*(1+n))
}
yhat2[T2+1] <- A * khat2[T2+1]^alpha
chat2[T2+1] <- (1 - sGR) * yhat2[T2+1]
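tt2 <- 0:T2
# ylim covers all three series so yhat2 and chat2 are not clipped
plot(tt2, khat2, type="l", lty=1, ylim=range(khat2, yhat2, chat2),
xlab="t", ylab="Efficiency units",
main="Paths with saving-rate change at t=101")
lines(tt2, yhat2, lty=2)
lines(tt2, chat2, lty=3)
abline(v=100, lty=2)
legend("topleft", legend=c("khat", "yhat", "chat"), lty=1:3, bty="n")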
Growth rate of \(\hat{y}_t\): \[ \text{growth of }\hat{y}_t = \frac{\hat{y}_{t+1}}{\hat{y}_t}-1. \]
g_yhat <- (yhat2[2:(T2+1)] / yhat2[1:T2]) - 1
plot(0:(T2-1), g_yhat, type="l", xlab="t", ylab="Growth rate",
main="Growth rate of yhat: yhat[t+1]/yhat[t] - 1")
abline(v=100, lty=2)
Output per worker is: \[ y_t \equiv \frac{Y_t}{N_t}. \]
Since \(\hat{y}_t=\frac{Y_t}{Z_tN_t}\), we have: \[ \frac{Y_t}{N_t} = \hat{y}_t Z_t \quad \Rightarrow \quad y_t=\hat{y}_t Z_t. \]
If \(Z_0=1\), then \(Z_t=(1+z)^t\). Hence: \[ y_t = \hat{y}_t(1+z)^t. \]
The growth rate of output per worker is: \[ \frac{y_{t+1}}{y_t}-1 = \frac{\hat{y}_{t+1}(1+z)^{t+1}}{\hat{y}_t(1+z)^t}-1 = (1+z)\frac{\hat{y}_{t+1}}{\hat{y}_t}-1. \]
As the economy converges to a steady state in efficiency units, \(\hat{y}_t\) converges to a constant, so: \[ \frac{\hat{y}_{t+1}}{\hat{y}_t} \to 1. \]
Therefore, the long-run growth rate becomes: \[ \frac{y_{t+1}}{y_t}-1 \to (1+z)\cdot 1 - 1 = z. \]
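Conclusion: A one-time increase in the saving rate raises the level of output per worker, but it cannot keep growth above \(z\) forever. Growth is higher only during the transition to the new steady state and then returns to \(z\).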
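The limit above can be illustrated with a short simulation. This is a sketch under the same parameter values as the numerical exercise, with \(Z_0=1\) as in the text; the 300-period horizon and the `_lr` names are choices made for the illustration.

```r
alpha_lr <- 0.3; A_lr <- 10; delta_lr <- 0.07
s_lr <- 0.2; z_lr <- 0.02; n_lr <- 0.01

T_lr <- 300
khat_lr <- numeric(T_lr + 1)
khat_lr[1] <- 40

# Law of motion in efficiency units
for (t in 1:T_lr) {
  khat_lr[t + 1] <- (s_lr * A_lr * khat_lr[t]^alpha_lr +
                       (1 - delta_lr) * khat_lr[t]) /
    ((1 + z_lr) * (1 + n_lr))
}

yhat_lr <- A_lr * khat_lr^alpha_lr

# Output per worker: y_t = yhat_t * Z_t with Z_t = (1+z)^t (assumes Z_0 = 1)
periods <- 0:T_lr
y_pw_lr <- yhat_lr * (1 + z_lr)^periods

# Period-by-period growth rate of output per worker
growth_pw <- y_pw_lr[-1] / y_pw_lr[-length(y_pw_lr)] - 1

print(head(growth_pw, 3))   # early growth, during the transition
print(tail(growth_pw, 1))   # late growth, close to z
```

Because \(\hat{k}_0\) starts below the steady state, early growth exceeds \(z\); the last value should be indistinguishable from \(z = 0.02\).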
# install.packages(c("pwt10","dplyr","ggplot2"))
library(pwt10)
library(dplyr)
library(ggplot2)
data("pwt10.0")
pwt <- pwt10.0
# GDP per worker (as question defines it)
pwt <- pwt %>%
mutate(y_pw = rgdpe / emp) %>% # real GDP per worker
filter(!is.na(isocode), !is.na(year))
As Question 1 asks, we keep only the countries that have GDP per worker data available in both 1960 and 2019:
# keep countries with y_pw > 0 in BOTH 1960 and 2019
base60 <- pwt %>% filter(year == 1960, is.finite(y_pw), y_pw > 0) %>%
select(isocode, y1960 = y_pw)
base19 <- pwt %>% filter(year == 2019, is.finite(y_pw), y_pw > 0) %>%
select(isocode, y2019 = y_pw)
# combine and compute average annual growth rate (1960–2019)
df_conv <- inner_join(base60, base19, by = "isocode") %>%
mutate(g = (y2019 / y1960)^(1/59) - 1) # average annual growth rate
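To make the growth formula concrete, here is a small base-R sketch on two hypothetical countries (the codes `AAA` and `BBB` and their income levels are made up). It shows that the exponent is 1/59 because there are 59 annual steps between 1960 and 2019, and that compounding the computed rate recovers the 2019 level.

```r
toy_growth <- data.frame(
  isocode = c("AAA", "BBB"),   # hypothetical country codes
  y1960   = c(5000, 20000),    # made-up GDP per worker in 1960
  y2019   = c(10000, 30000)    # made-up GDP per worker in 2019
)

# Number of annual growth steps between the two endpoints
n_years <- 2019 - 1960

# Average annual (geometric) growth rate, as in the mutate() call above
toy_growth$g <- (toy_growth$y2019 / toy_growth$y1960)^(1 / n_years) - 1

# Equivalent form using logs
toy_growth$g_log <- exp(log(toy_growth$y2019 / toy_growth$y1960) / n_years) - 1

# Compounding g for 59 years should reproduce the 2019 level
toy_growth$y2019_check <- toy_growth$y1960 * (1 + toy_growth$g)^n_years

print(toy_growth)
```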
Run the linear regression:
m1 <- lm(g ~ y1960, data = df_conv)
summary(m1)
##
## Call:
## lm(formula = g ~ y1960, data = df_conv)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.079163 -0.006603 0.000965 0.008222 0.033567
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.412e-02 2.569e-03 9.389 5.88e-15 ***
## y1960 -2.569e-07 1.406e-07 -1.828 0.0709 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01562 on 89 degrees of freedom
## Multiple R-squared: 0.03619, Adjusted R-squared: 0.02536
## F-statistic: 3.341 on 1 and 89 DF, p-value: 0.07091
The regression estimates the relationship between average annual growth of GDP per worker (1960–2019) and initial income in 1960. Your estimated slope on y1960 is negative, meaning that, on average, countries with lower GDP per worker in 1960 tended to have higher average growth from 1960 to 2019. This negative sign is the direction predicted by unconditional convergence (poorer countries “catch up”). However, the p-value for y1960 is 0.0709, so the coefficient is not statistically significant at the 5% level (it is only marginally significant at the 10% level). That means we cannot reject the null hypothesis of a zero slope at the usual 5% standard. Also, the R-squared is about 0.036, which is quite small, indicating that initial income explains only a small portion of cross-country differences in long-run growth; many other factors (institutions, policies, shocks, geography, etc.) are driving growth outcomes.
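The size of the slope is easier to read after rescaling. The sketch below takes the estimate, standard error, and residual degrees of freedom from the printed output above (entered by hand, so they carry the rounding of the printout) and reproduces the t statistic and p-value, the implied effect of a 10,000 higher initial income, and a 95% confidence interval. The variable names are illustrative.

```r
# Values copied from the regression output (rounded as printed)
slope    <- -2.569e-07
se       <- 1.406e-07
df_resid <- 89

# Change in average annual growth for 10,000 more initial GDP per worker
effect_10k <- slope * 10000

# t statistic and two-sided p-value
t_stat  <- slope / se
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

# 95% confidence interval for the slope
ci_95 <- slope + c(-1, 1) * qt(0.975, df = df_resid) * se

print(effect_10k)
print(c(t_stat = t_stat, p_value = p_value))
print(ci_95)
```

A 10,000 higher starting level is associated with growth that is about 0.26 percentage points lower per year, and the 95% interval for the slope includes zero, which is the same message as the p-value of about 0.07.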
Draw the scatter plot:
ggplot(df_conv, aes(x = y1960, y = g)) +
geom_point(alpha = 0.7) +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "Unconditional convergence (1960–2019)",
x = "Real GDP per worker in 1960 (y1960)",
y = "Average annual growth rate, 1960–2019")
The scatter plot shows initial GDP per worker in 1960 on the x-axis and the average annual growth rate (1960–2019) on the y-axis. The fitted regression line slopes downward, which visually matches the negative coefficient from the regression and suggests a mild catch-up pattern: richer countries in 1960 tend to have slightly lower long-run growth rates. But the points are widely dispersed around the line (many countries with similar initial income grew at very different rates), which visually supports the low R-squared and the idea that the relationship is weak. The confidence band is also fairly wide, reflecting uncertainty about the true slope. Overall, the plot suggests some tendency toward convergence, but it is not strong or cleanly separated in the data.
No—not at the standard 5% significance level. The estimated slope is negative (consistent with convergence), but the p-value (~0.071) is above 0.05, so the evidence is not conclusive by typical standards. At best, you could say there is weak/marginal evidence at the 10% level, and the scatter shows lots of variation, meaning unconditional convergence alone does not explain much of the growth differences across countries.
oecd20 <- c("AUT","BEL","CAN","CHE","DEU","DNK","ESP","FRA","GBR","GRC",
"ISL","IRL","ITA","LUX","NLD","NOR","PRT","SWE","TUR","USA")
df_oecd <- df_conv %>% filter(isocode %in% oecd20)
m2 <- lm(g ~ y1960, data = df_oecd)
summary(m2)
##
## Call:
## lm(formula = g ~ y1960, data = df_oecd)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0040904 -0.0016989 -0.0009281 0.0004716 0.0142507
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.759e-02 2.397e-03 15.681 6.12e-12 ***
## y1960 -4.651e-07 8.126e-08 -5.723 2.00e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.004008 on 18 degrees of freedom
## Multiple R-squared: 0.6454, Adjusted R-squared: 0.6257
## F-statistic: 32.76 on 1 and 18 DF, p-value: 1.996e-05
The regression estimates the relationship between the average annual growth rate of GDP per worker from 1960 to 2019 and the initial level of GDP per worker in 1960 for OECD countries only. The estimated coefficient on initial income (y1960) is negative and highly statistically significant, with a p-value of 2.00 × 10⁻⁵, which is far below the 1% significance level. This provides strong statistical evidence of convergence among OECD countries. The negative coefficient means that OECD countries with lower initial GDP per worker in 1960 experienced higher growth rates over the period 1960–2019, while richer OECD countries grew more slowly. The R-squared value is 0.645, which is relatively high compared to the full sample in Part 1, indicating that initial income explains a large portion (about 65%) of the variation in growth rates across OECD countries. This suggests that convergence is much stronger and more systematic within OECD countries than across all countries.
Make the scatter plot:
ggplot(df_oecd, aes(x = y1960, y = g)) +
geom_point(size = 2) +
geom_smooth(method = "lm", se = TRUE) +
labs(title = "OECD convergence (1960–2019)",
x = "Real GDP per worker in 1960",
y = "Average annual growth rate")
The scatter plot shows a clear downward-sloping relationship between initial GDP per worker in 1960 and subsequent growth rates for OECD countries. Countries with lower initial income, such as Turkey, Portugal, and Greece, tend to have higher growth rates, while richer countries such as the United States, Switzerland, and Luxembourg show lower growth rates. The regression line slopes downward sharply, and the points are closely clustered around the line compared to Part 1. The relatively narrow confidence band also indicates that the relationship is estimated with high precision. This visual evidence strongly supports the regression result and confirms the presence of convergence among OECD countries.
Yes, there is strong and statistically conclusive evidence of convergence among OECD countries. The regression coefficient on initial income is negative and highly statistically significant at the 1% level, and the scatter plot shows a clear downward trend. Unlike the full sample in Part 1, where convergence evidence was weak, OECD countries show strong convergence. A plausible explanation is that OECD countries share similar institutional quality, education levels, and economic structures, which makes it easier for poorer OECD countries to catch up with richer ones; the regression itself does not test this explanation.
# Use the countries that have data for y_pw in 1960 (as the question says)
countries1960 <- base60$isocode
disp <- pwt %>%
filter(isocode %in% countries1960,
year >= 1960, year <= 2019,
is.finite(y_pw), y_pw > 0) %>%
group_by(year) %>%
summarise(
sd_logy = sd(log(y_pw), na.rm = TRUE),
# ratio of average y of richest 10% to poorest 10%
ratio_90_10 = {
y <- sort(y_pw)
n <- length(y)
k <- max(1, floor(0.10 * n))
mean(tail(y, k)) / mean(head(y, k))
},
.groups = "drop"
)
head(disp)
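To see what the two dispersion measures compute within a single year, here is a base-R sketch on a toy cross-section of 20 hypothetical countries (the income values are made up). It mirrors the logic inside `summarise()` above and also shows that the standard deviation of logs does not change when all incomes are scaled by the same factor.

```r
# Toy cross-section: 20 hypothetical countries, made-up GDP per worker
y_toy <- seq(1000, 20000, by = 1000)

# Measure 1: standard deviation of log income
sd_logy_toy <- sd(log(y_toy))

# Measure 2: mean of the richest 10% over mean of the poorest 10%
y_sorted <- sort(y_toy)
k_toy <- max(1, floor(0.10 * length(y_sorted)))   # countries per tail group
ratio_toy <- mean(tail(y_sorted, k_toy)) / mean(head(y_sorted, k_toy))

# Doubling every income leaves both measures unchanged
sd_scaled    <- sd(log(2 * y_toy))
ratio_scaled <- mean(tail(2 * y_sorted, k_toy)) / mean(head(2 * y_sorted, k_toy))

print(c(sd_logy = sd_logy_toy, ratio_90_10 = ratio_toy))
print(c(sd_scaled = sd_scaled, ratio_scaled = ratio_scaled))
```

Both measures are unit-free, so they capture relative gaps between countries rather than the general rise in income levels over time.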
You can plot both series (one after another):
plot(disp$year, disp$sd_logy, type="l",
xlab="Year", ylab="SD of log(GDP per worker)",
main="Income dispersion over time: SD of log y")
plot(disp$year, disp$ratio_90_10, type="l",
xlab="Year", ylab="Richest 10% / Poorest 10% (mean ratio)",
main="Income dispersion over time: 90/10 ratio")
(i) Standard deviation of log(GDP per worker) — interpretation
The first plot shows the standard deviation of log GDP per worker across countries each year (using the same set of countries that have data in 1960). This measure captures overall dispersion in incomes across the full distribution. From the graph, the standard deviation rises steadily from the 1960s through the late 1990s/around 2000, meaning cross-country incomes became more spread out over time. After roughly 2000, the standard deviation falls somewhat, but it does not return to the low levels of the 1960s. So the main message is: global income differences widened strongly for several decades, and then narrowed a bit in the 2000s/2010s, but remained relatively high overall.
(ii) Richest 10% / Poorest 10% ratio — interpretation
The second plot shows another dispersion measure: the ratio of the average GDP per worker of the richest 10% of countries to the average GDP per worker of the poorest 10%. This measure focuses on the gap between the top and bottom of the world income distribution. Your graph shows this ratio increasing strongly from around 20 in 1960 to above 40 by the late 1990s/around 2000. That means the richest group of countries produced roughly 40 times the GDP per worker of the poorest group at the peak. After 2000, the ratio declines, suggesting some catch-up by poorer countries (or slower growth in the richest group), but it remains very large and rises again toward the end of the sample. Overall, the “top vs bottom” gap becomes much larger than in 1960 and stays high.
(iii) Conclusion: what do we learn about cross-country income inequality?
Putting both measures together, the evidence suggests that cross-country income inequality increased substantially between 1960 and about the late 1990s/early 2000s. The world did not show broad, smooth convergence over the whole period—if anything, the gap widened for decades. There is some partial reduction in dispersion after 2000 (both the standard deviation and the rich/poor ratio drop), which is consistent with some catch-up among poorer countries during the 2000s and 2010s. However, inequality remains much higher than in 1960, so the overall conclusion is: income gaps across countries widened a lot historically, and only partially narrowed more recently—global convergence is at best limited and uneven.
You compute total factor productivity (TFP) using the production-function formula: \[ A_{it}=\frac{Y_{it}}{K_{it}^{\alpha}N_{it}^{1-\alpha}}, \] where \(Y_{it}\) is output, \(K_{it}\) is the capital stock, and \(N_{it}\) is effective labor input, built from the PWT variables for employment, average annual hours worked, and the human capital index: \[ N_{it}=emp\times avh \times hc. \]
TFP is basically the part of output that cannot be explained just by how much capital and labor a country uses. If two countries have similar capital and labor input, the one with higher \(A_{it}\) is producing more output “per unit of inputs,” which we interpret as higher technology/efficiency/institutions/organization (all the productivity factors bundled into \(A\)).
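A small worked example of this idea, as a sketch with made-up numbers: two hypothetical countries use identical capital and effective labor, but one produces 50% more output, so its measured TFP is 50% higher. The helper `tfp()` and all input values are assumptions for illustration only.

```r
alpha_tfp <- 1/3

# TFP as the residual of the production function: A = Y / (K^alpha * N^(1-alpha))
tfp <- function(Y, K, N, alpha = alpha_tfp) {
  Y / (K^alpha * N^(1 - alpha))
}

# Two hypothetical countries with the same inputs but different output
A_low  <- tfp(Y = 1000, K = 3000, N = 500)
A_high <- tfp(Y = 1500, K = 3000, N = 500)

# Plugging A back into the production function recovers output
Y_back <- A_low * 3000^alpha_tfp * 500^(1 - alpha_tfp)

print(c(A_low = A_low, A_high = A_high, ratio = A_high / A_low))
print(Y_back)
```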
alpha <- 1/3
pwt2 <- pwt %>%
mutate(
Y = cgdpe,
K = cn,
N = emp * avh * hc,
A = Y / (K^alpha * N^(1 - alpha))
) %>%
filter(is.finite(A), A > 0,
is.finite(y_pw), y_pw > 0)
# (ii) averages 2000–2019 for each country: Abar and ybar
avg_0019 <- pwt2 %>%
filter(year >= 2000, year <= 2019) %>%
group_by(isocode) %>%
summarise(
Abar = mean(A, na.rm = TRUE),
ybar = mean(y_pw, na.rm = TRUE),
.groups = "drop"
) %>%
filter(is.finite(Abar), is.finite(ybar), Abar > 0, ybar > 0)
m_tfp <- lm(ybar ~ Abar, data = avg_0019)
summary(m_tfp)
##
## Call:
## lm(formula = ybar ~ Abar, data = avg_0019)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27843.6 -6696.1 60.2 5663.1 28662.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -24818 3501 -7.089 1.05e-09 ***
## Abar 28820 1118 25.780 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10390 on 67 degrees of freedom
## Multiple R-squared: 0.9084, Adjusted R-squared: 0.9071
## F-statistic: 664.6 on 1 and 67 DF, p-value: < 2.2e-16
You then average TFP and GDP per worker over 2000–2019 for each country and estimate: \[ \bar{y}_i=\beta_0+\beta_1\bar{A}_i+u_i. \] Your regression shows a strong positive relationship: the coefficient on Abar is about 28,820 and it is extremely statistically significant (t ≈ 25.78, p < 2e−16). This means countries with higher average TFP tend to have much higher average GDP per worker. The R-squared is about 0.908, which is very high, implying that differences in average TFP explain about 91% of the cross-country variation in average GDP per worker in your sample (2000–2019). In short: the regression says productivity differences are strongly connected to income differences.
ggplot(avg_0019, aes(x = Abar, y = ybar)) +
geom_point(alpha = 0.7) +
geom_smooth(method = "lm", se = TRUE) +
labs(
title = "Average GDP per worker vs Average TFP (2000–2019)",
x = "Average TFP (Abar)",
y = "Average GDP per worker (ybar)"
)
The scatter plot confirms the regression result visually. The points lie close to an upward-sloping fitted line, and the confidence band around the line is relatively narrow, which indicates that the slope is estimated precisely. The tight fit across the whole range of TFP suggests that the positive relationship is not driven by just a few extreme countries; it looks systematic across the sample. You still see some scatter (countries with similar TFP can have somewhat different income levels), but overall the pattern is very tight, which matches the high R-squared. So the plot and regression are consistent: higher TFP is strongly associated with higher GDP per worker.
Yes — based on your results, there is a tight statistical relationship between average TFP and average income. There are two clear reasons: