Important: This file is written so you can knit to PDF (for Question 2) and also to HTML.
If PDF knitting fails, it is usually because LaTeX is not installed. Install TinyTeX in R:

install.packages("tinytex")
tinytex::install_tinytex()

1 Question 1 — BEA NIPA (T10105 / NIPA115) + Plots (Q2–Q5)

1.1 1A. Download and construct NIPA115 from the BEA API

# install.packages("bea.R") # this is commented out because I have already installed this package 
library(bea.R)

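# Read your private BEA API key from an environment variable (assumed here to be named
# BEA_API_KEY, e.g. set in ~/.Renviron) instead of hard-coding it in a file that gets shared
beaKey <- Sys.getenv("BEA_API_KEY")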

# Set up a list with the specifications to retrieve the appropriate table 
beaSpecs <- list(
  'UserID' = beaKey ,
  'Method' = 'GetData',
  'datasetname' = 'NIPA',
  'TableName' = 'T10105',
  'Frequency' = 'Q',
  'Year' = 'ALL',
  'ResultFormat' = 'json'
)

# Retrieve data and save it
NIPA115_raw <- beaGet(beaSpecs)

# Remove all info except quarterly values
# (inspect the data first to confirm which leading columns hold metadata rather than quarterly values)
NIPA115_clean <- NIPA115_raw[, -c(1:7)] 

# Transpose columns. 
#Here you will need the data.frame() function 
NIPA115_clean<- data.frame(t(NIPA115_clean))

# Extract the series.
# After transposing, each column is one line of the BEA table, so select each series by column number (these are vectors).
gdp<-NIPA115_clean[,1]
cons<-NIPA115_clean[,2]
goods<-NIPA115_clean[,3]
dur<-NIPA115_clean[,4]
ndur<-NIPA115_clean[,5]
serv<-NIPA115_clean[,6]
inv<-NIPA115_clean[,7]
fixinv<-NIPA115_clean[,8]
nres<-NIPA115_clean[,9]
struct<-NIPA115_clean[,10]
equip<-NIPA115_clean[,11]
ip<-NIPA115_clean[,12]
res<-NIPA115_clean[,13]
deltainv<-NIPA115_clean[,14]
nx<-NIPA115_clean[,15]
exp<-NIPA115_clean[,16]
expgoods<-NIPA115_clean[,17]
expserv<-NIPA115_clean[,18]
imp<-NIPA115_clean[,19]
impgoods<-NIPA115_clean[,20]
impserv<-NIPA115_clean[,21]
gov<-NIPA115_clean[,22]
fed<-NIPA115_clean[,23]
def<-NIPA115_clean[,24]
ndef<-NIPA115_clean[,25]
stateloc<-NIPA115_clean[,26]

# Find the total number of quarters in our data
t_q<-nrow(NIPA115_clean)

# Install the zoo package if you have not done so
# install.packages("zoo") 
# Load the zoo package
library(zoo)

# Define your starting quarter
s_q <- "1947 Q1"

# Create sequence of quarters from first (s_q) with length t_q
quarters <- as.yearqtr(s_q) + seq(0, length = t_q) / 4

# Note that format of quarters is "yearqtr", a class used to represent quarterly data 
# we need to transform this into a data frame with function fortify.zoo()
dfquarters<-fortify.zoo(quarters)
# remove the first column, which is just an index, and keep the quarters
quarters<-dfquarters[-c(1)]

# Create a data frame with the above vectors
NIPA115<-data.frame(quarters,gdp,cons,goods,dur,ndur,serv,inv,fixinv,nres,struct,equip,ip,
                    res,deltainv,nx,exp,expgoods,expserv,imp,impgoods,impserv,gov,fed,def,ndef,stateloc)

# Quick check
head(NIPA115)
tail(NIPA115)
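As an optional cross-check on the zoo-based quarter index, the labels can also be generated in base R. This is a minimal sketch: `quarter_labels()` is a hypothetical helper written for illustration, not a function from zoo or bea.R, and it assumes the data start in 1947 Q1 as above.

```r
# Base-R cross-check of the zoo quarter index.
# quarter_labels() is a hypothetical helper (not from zoo or bea.R) that builds
# "YYYY Qq" labels for n consecutive quarters starting at start_year, quarter start_q.
quarter_labels <- function(start_year, start_q, n) {
  idx  <- (start_q - 1) + seq_len(n) - 1   # quarters elapsed since Q1 of start_year
  year <- start_year + idx %/% 4           # whole years elapsed
  qtr  <- idx %% 4 + 1                     # quarter within the year (1-4)
  paste0(year, " Q", qtr)
}

q_labels <- quarter_labels(1947, 1, 6)
print(q_labels)
```

With the real data, `tail(quarter_labels(1947, 1, t_q), 1)` should name the same quarter as the most recent one in the BEA download; if it does not, the starting quarter or the number of columns removed is probably off.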

1.2 1B. PROBLEM 1 — Q2: Shares of GDP (C, I, G, NX)

library(areaplot)

df1 <- data.frame(
  quarters = NIPA115$quarters,
  C  = NIPA115$cons / NIPA115$gdp,
  I  = NIPA115$inv  / NIPA115$gdp,
  G  = NIPA115$gov  / NIPA115$gdp,
  NX = NIPA115$nx   / NIPA115$gdp
)

areaplot(df1$quarters, df1[,2:5],
         main="Shares of GDP: C, I, G, NX", xlab="Quarters",
         col=c("steelblue","gold","seagreen3","tomato"),
         legend=TRUE, border="white")

1.2.1 Interpretation

First, I filtered the NIPA115 dataset and kept only the required series: GDP (line 1), Consumption (line 2), Investment (line 7), Government Spending (line 22), and Net Exports (line 15). After that, I calculated the share of each component by dividing each variable by GDP for every quarter. Since GDP = C + I + G + NX, the shares sum to 1 (or 100%) at each point in time.
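The adding-up property can be checked directly in code. The sketch below uses a made-up two-quarter data frame `toy` (not BEA data) with the same column names as NIPA115, computes the four shares, and asserts that each row sums to 1.

```r
# Sketch: expenditure shares of GDP and an adding-up check.
# The numbers in `toy` are made up for illustration; they are NOT BEA data.
toy <- data.frame(
  gdp  = c(100, 110),
  cons = c(65, 70),
  inv  = c(17, 19),
  gov  = c(20, 22),
  nx   = c(-2, -1)     # net exports can be negative (trade deficit)
)

shares <- with(toy, data.frame(C  = cons / gdp,
                               I  = inv  / gdp,
                               G  = gov  / gdp,
                               NX = nx   / gdp))
print(round(shares, 3))

# GDP = C + I + G + NX, so every row of shares should sum to 1
# (unname() drops the row names that rowSums() attaches)
row_totals <- unname(rowSums(shares))
stopifnot(isTRUE(all.equal(row_totals, c(1, 1))))
```

Applied to the real data, `rowSums(df1[, 2:5])` should be very close to 1 in every quarter; a large deviation would suggest that a column was taken from the wrong line of the table.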

I then plotted these shares using a stacked area chart. This type of graph is appropriate because it clearly shows how much each component contributes to total GDP and how the composition changes over time. From the graph, consumption is the largest and most stable component. It stays around 60–70% of GDP throughout the sample. Investment is smaller and more volatile, usually around 15–20%, and it drops sharply during recessions such as the early 1980s, the 2008 financial crisis, and COVID-19. Government spending is relatively stable at about 15–18% of GDP, with small increases during crisis periods. Net exports are the smallest component, usually between −2% and 0%, meaning the U.S. often runs a trade deficit.

1.3 1C. PROBLEM 1 — Q3: Shares of Consumption (Durables, Non-durables, Services)

df2 <- data.frame(
  quarters = NIPA115$quarters,
  Durables       = NIPA115$dur  / NIPA115$cons,
  `Non-durables` = NIPA115$ndur / NIPA115$cons,
  Services       = NIPA115$serv / NIPA115$cons,
  check.names = FALSE   # keep "Non-durables" for the legend (otherwise it becomes "Non.durables")
)

areaplot(df2$quarters, df2[,2:4],
         main="Shares of Consumption", xlab="Quarters",
         col=c("steelblue","gold","seagreen3"),
         legend=TRUE, border="white")

1.3.1 Interpretation

First, I filtered the NIPA115 dataset and kept only consumption, durable goods, non-durable goods, and services. Then I calculated the share of each category by dividing it by total consumption for every quarter. Because total consumption equals durables + non-durables + services, the three shares add up to 1 in each period. The stacked area chart shows this clearly since the total height is always 100%.

From the figure, services clearly become the dominant component over time. In the early 1950s, services made up about 38–40% of consumption, but by 2020 they reach around 65–70%. Non-durables fall from roughly 45–50% to about 20–25%, while durables stay relatively small and stable at around 10–15%, with more short-run fluctuations during recessions.

The historical rise in services can be explained by simple economic arguments. As income increases, households spend proportionally less on basic goods like food and clothing and more on services such as healthcare, education, travel, housing services, and entertainment. This follows Engel’s Law, under which the share of necessities declines as income grows. In addition, many services are harder to automate and become relatively more expensive over time, which increases their spending share. Population aging also raises demand for healthcare and personal services. Together, these factors shift consumption away from goods and toward services.


1.4 1D. PROBLEM 1 — Q4: Shares of Private Investment

df3 <- data.frame(
  quarters = NIPA115$quarters,
  Structures    = NIPA115$struct   / NIPA115$inv,
  Equipment     = NIPA115$equip    / NIPA115$inv,
  `IP Products` = NIPA115$ip       / NIPA115$inv,
  Residential   = NIPA115$res      / NIPA115$inv,
  Inventories   = NIPA115$deltainv / NIPA115$inv,
  check.names = FALSE   # keep "IP Products" for the legend (otherwise it becomes "IP.Products")
)

areaplot(df3$quarters, df3[,2:6],
         main="Shares of Private Investment", xlab="Quarters",
         col=c("steelblue","gold","seagreen3","tomato","purple"),
         legend=TRUE, border="white")

1.4.1 Interpretation

First, I filtered the NIPA115 data and kept private investment and its components: structures, equipment, IP products, residential investment, and inventories. Then I calculated the share of each part by dividing it by total private investment for every quarter. Since private investment is the sum of these components, all shares add up to 1. The stacked area chart shows this because the total height is always 100%.

From the graph, equipment and residential investment account for a large share for most of the period, while structures slowly decline over time. Inventories are very small and volatile, and their share is sometimes negative, which is expected because the change in private inventories can be negative. The main trend is the strong rise of IP products: in the early years IP accounts for only around 5–10% of private investment, but in recent years it rises to roughly 25–30% or more. A simple reason for this rise is that the economy has become more technology- and knowledge-based. Firms now spend more on software, research, patents, and data systems instead of just buildings and machines, so investment shifts from physical capital toward intangible capital, which increases the share of IP.

1.5 1E. PROBLEM 1 — Q5: Shares of Government Spending

df4 <- data.frame(
  quarters = NIPA115$quarters,
  `Fed Defense`    = NIPA115$def      / NIPA115$gov,
  `Fed Nondefense` = NIPA115$ndef     / NIPA115$gov,
  `State & Local`  = NIPA115$stateloc / NIPA115$gov,
  check.names = FALSE   # keep the labels as written (otherwise e.g. "State...Local")
)

areaplot(df4$quarters, df4[,2:4],
         main="Shares of Government Spending", xlab="Quarters",
         col=c("steelblue","gold","seagreen3"),
         legend=TRUE, border="white")

1.5.1 Interpretation

First, I filtered the NIPA115 data and kept only the series the question asked for: Government Spending (line 22), Federal Defense (line 24), Federal Non-defense (line 25), and State & Local (line 26). Then I computed the shares by dividing each part by total government spending each quarter:

• Defense share = Fed Defense / Government Spending
• Non-defense share = Fed Non-defense / Government Spending
• State & Local share = State & Local / Government Spending

Because total government spending is the sum of these three parts, the shares add up to 1 (100%) in each period. The stacked area chart confirms this: its total height stays at 1 throughout.

From the graph, state and local spending is the biggest part most of the time. It rises from around 35–40% in the early years to roughly 60–65% later on. Federal defense starts very high (around 45–55% in the early period) but then falls over the long run and stays closer to about 20–25% in recent decades, with some ups and downs. Federal non-defense is smaller and more stable, usually around 10–20% over time.

2 Question 2 — Solow Model with Technological Growth

2.1 Problem 2: Solow Model with Technological Growth

We are given the Cobb–Douglas production function: \[ Y_t = A K_t^{\alpha}(Z_t N_t)^{1-\alpha}, \] where \(A > 0\) and \(0 < \alpha < 1\). Technology and labor grow exogenously: \[ Z_{t+1} = (1+z) Z_t, \qquad N_{t+1} = (1+n) N_t. \]

2.2 (1) Production in efficiency units

Define output and capital per efficiency unit: \[ \hat{y}_t = \frac{Y_t}{Z_t N_t}, \qquad \hat{k}_t = \frac{K_t}{Z_t N_t}. \]

Divide the production function by \(Z_t N_t\): \[ \hat{y}_t = \frac{A K_t^\alpha (Z_t N_t)^{1-\alpha}}{Z_t N_t}. \]

Use exponent rules: \[ \frac{(Z_t N_t)^{1-\alpha}}{(Z_t N_t)^1} = (Z_t N_t)^{-\alpha}. \]

So, \[ \hat{y}_t = A K_t^\alpha (Z_t N_t)^{-\alpha} = A\left(\frac{K_t}{Z_t N_t}\right)^\alpha = A \hat{k}_t^\alpha. \]

Therefore, \[ \boxed{\hat{y}_t = A \hat{k}_t^\alpha.} \]
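The algebra in (1) can be sanity-checked numerically: for any positive values of \(K_t\), \(Z_t\), and \(N_t\), dividing aggregate output by \(Z_tN_t\) should give the same number as \(A\hat{k}_t^\alpha\). The values of K, Z, and N below are arbitrary and only serve as an illustration.

```r
# Numerical check of y_hat = A * k_hat^alpha.
# A and alpha match part (5); K, Z, N are arbitrary illustrative values.
A <- 10; alpha <- 0.3
K <- 500; Z <- 2; N <- 30

Y     <- A * K^alpha * (Z * N)^(1 - alpha)   # aggregate output
y_hat <- Y / (Z * N)                         # output per efficiency unit
k_hat <- K / (Z * N)                         # capital per efficiency unit

# The two sides of the boxed equation should agree up to floating-point error
stopifnot(isTRUE(all.equal(y_hat, A * k_hat^alpha)))
c(y_hat = y_hat, A_k_hat_alpha = A * k_hat^alpha)
```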

2.3 (2) Law of motion for capital per efficiency unit

Aggregate capital evolves as: \[ K_{t+1} = s A K_t^\alpha (Z_t N_t)^{1-\alpha} + (1-\delta)K_t. \]

Divide both sides by \(Z_{t+1}N_{t+1}\): \[ \hat{k}_{t+1} = \frac{K_{t+1}}{Z_{t+1}N_{t+1}}. \]

Since \(Z_{t+1}=(1+z)Z_t\) and \(N_{t+1}=(1+n)N_t\), we have: \[ Z_{t+1}N_{t+1} = (1+z)(1+n)Z_t N_t. \]

Therefore, \[ \hat{k}_{t+1} = \frac{sA K_t^\alpha (Z_tN_t)^{1-\alpha} + (1-\delta)K_t}{(1+z)(1+n)Z_tN_t}. \]

Simplify:

  • First term: \[ \frac{K_t^\alpha (Z_tN_t)^{1-\alpha}}{Z_tN_t} = \left(\frac{K_t}{Z_tN_t}\right)^\alpha = \hat{k}_t^\alpha. \]

  • Second term: \[ \frac{K_t}{Z_tN_t} = \hat{k}_t. \]

So the law of motion becomes: \[ \boxed{ \hat{k}_{t+1}= \frac{sA\hat{k}_t^\alpha + (1-\delta)\hat{k}_t}{(1+z)(1+n)}. } \]
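Similarly, the boxed law of motion can be checked by computing \(K_{t+1}\) from the aggregate equation and dividing by \(Z_{t+1}N_{t+1}\) directly. The starting values K_t, Z_t, N_t are arbitrary illustrative numbers; the parameters match part (5).

```r
# Numerical check of the efficiency-units law of motion.
# Parameters match part (5); K_t, Z_t, N_t are arbitrary illustrative values.
A <- 10; alpha <- 0.3; s <- 0.2; delta <- 0.07; z <- 0.02; n <- 0.01
K_t <- 500; Z_t <- 2; N_t <- 30

# Aggregate accumulation: K_{t+1} = s * Y_t + (1 - delta) * K_t
K_next <- s * A * K_t^alpha * (Z_t * N_t)^(1 - alpha) + (1 - delta) * K_t
Z_next <- (1 + z) * Z_t
N_next <- (1 + n) * N_t

k_hat_t <- K_t / (Z_t * N_t)
direct  <- K_next / (Z_next * N_next)   # definition of k_hat_{t+1}
boxed   <- (s * A * k_hat_t^alpha + (1 - delta) * k_hat_t) / ((1 + z) * (1 + n))  # boxed formula

stopifnot(isTRUE(all.equal(direct, boxed)))
```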

2.4 (3) Steady state

At steady state, \(\hat{k}_{t+1}=\hat{k}_t=\hat{k}^*\). Substitute: \[ \hat{k}^*= \frac{sA(\hat{k}^*)^\alpha + (1-\delta)\hat{k}^*}{(1+z)(1+n)}. \]

Multiply both sides by \((1+z)(1+n)\): \[ (1+z)(1+n)\hat{k}^* = sA(\hat{k}^*)^\alpha + (1-\delta)\hat{k}^*. \]

Bring the \((1-\delta)\hat{k}^*\) term to the left: \[ \left((1+z)(1+n)-(1-\delta)\right)\hat{k}^* = sA(\hat{k}^*)^\alpha. \]

Define: \[ g \equiv (1+z)(1+n)-(1-\delta). \]

Then: \[ g\hat{k}^* = sA(\hat{k}^*)^\alpha \quad\Rightarrow\quad g(\hat{k}^*)^{1-\alpha} = sA. \]

So: \[ \boxed{ \hat{k}^* = \left(\frac{sA}{g}\right)^{\frac{1}{1-\alpha}}. } \]

Using the production function, steady-state output per efficiency unit is: \[ \boxed{\hat{y}^* = A(\hat{k}^*)^\alpha = A\left(\frac{sA}{g}\right)^{\frac{\alpha}{1-\alpha}}.} \]

In Solow, \(C_t = (1-s)Y_t\), so in efficiency units: \[ \boxed{\hat{c}^* = (1-s)\hat{y}^* = (1-s)A(\hat{k}^*)^\alpha.} \]
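Plugging in the parameter values used in part (5) gives concrete steady-state numbers. This sketch also checks that \(\hat{k}^*\) is a fixed point of the law of motion; `law_of_motion()` is just a helper name introduced here.

```r
# Steady state with the part (5) parameters, plus a fixed-point check.
alpha <- 0.3; A <- 10; delta <- 0.07; s <- 0.2; z <- 0.02; n <- 0.01

g      <- (1 + z) * (1 + n) - (1 - delta)   # about 0.1002
k_star <- (s * A / g)^(1 / (1 - alpha))
y_star <- A * k_star^alpha
c_star <- (1 - s) * y_star

# law_of_motion(): one step of the boxed equation in (2)
law_of_motion <- function(k) (s * A * k^alpha + (1 - delta) * k) / ((1 + z) * (1 + n))
stopifnot(isTRUE(all.equal(law_of_motion(k_star), k_star)))

round(c(g = g, k_star = k_star, y_star = y_star, c_star = c_star), 3)
```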

2.5 (4) Golden Rule

In steady state, investment required to keep \(\hat{k}\) constant equals \(g\hat{k}\). Hence steady-state consumption as a function of \(\hat{k}\) is: \[ \hat{c}(\hat{k}) = A\hat{k}^\alpha - g\hat{k}. \]

Differentiate and set equal to zero: \[ \frac{d\hat{c}}{d\hat{k}} = A\alpha \hat{k}^{\alpha-1} - g = 0 \quad\Rightarrow\quad A\alpha \hat{k}^{\alpha-1} = g. \]

Solve: \[ \hat{k}^{1-\alpha} = \frac{A\alpha}{g} \quad\Rightarrow\quad \boxed{ \hat{k}_{GR} = \left(\frac{\alpha A}{g}\right)^{\frac{1}{1-\alpha}}. } \]

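Golden-rule saving rate: in steady state, saving must cover break-even investment, \(sA\hat{k}^\alpha = g\hat{k}\), so \(s = g\hat{k}^{1-\alpha}/A\). Evaluating at \(\hat{k}_{GR}\), where \(\hat{k}_{GR}^{1-\alpha} = \alpha A/g\): \[ s_{GR} = \frac{g}{A}\cdot\frac{\alpha A}{g} \quad\Rightarrow\quad \boxed{s_{GR}=\alpha.} \]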
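The Golden Rule result can be confirmed numerically, assuming the part (5) parameters: base R's `optimize()` maximizes \(\hat{c}(\hat{k}) = A\hat{k}^\alpha - g\hat{k}\) over an interval that is assumed to contain the optimum, and the implied saving rate \(g\hat{k}_{GR}^{1-\alpha}/A\) should equal \(\alpha\).

```r
# Numerical check of the Golden Rule with the part (5) parameters.
alpha <- 0.3; A <- 10; delta <- 0.07; z <- 0.02; n <- 0.01
g <- (1 + z) * (1 + n) - (1 - delta)

# Steady-state consumption per efficiency unit as a function of k_hat
c_of_k <- function(k) A * k^alpha - g * k

# The interval (1, 1000) is assumed to contain the maximizer
opt <- optimize(c_of_k, interval = c(1, 1000), maximum = TRUE)

k_GR <- (alpha * A / g)^(1 / (1 - alpha))   # closed form
s_GR <- g * k_GR^(1 - alpha) / A            # saving rate that sustains k_GR

c(k_GR_numeric = opt$maximum, k_GR_closed = k_GR, s_GR = s_GR)
```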

2.6 (5) Simulation for 100 periods (R code)

Parameter values: \[ \alpha=0.3,\; A=10,\; \delta=0.07,\; s=0.2,\; z=0.02,\; n=0.01,\; \hat{k}_0=40. \]

alpha <- 0.3
A <- 10
delta <- 0.07
s <- 0.2
z <- 0.02
n <- 0.01

g <- (1+z)*(1+n) - (1-delta)

T <- 100
khat <- numeric(T+1)
yhat <- numeric(T+1)
chat <- numeric(T+1)

khat[1] <- 40

for (t in 1:T) {
  yhat[t] <- A * khat[t]^alpha
  chat[t] <- (1 - s) * yhat[t]
  khat[t+1] <- (s*A*khat[t]^alpha + (1-delta)*khat[t]) / ((1+z)*(1+n))
}

# last values
yhat[T+1] <- A * khat[T+1]^alpha
chat[T+1] <- (1 - s) * yhat[T+1]

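tt <- 0:T
# set ylim to cover all three series; otherwise the axis range comes from khat alone,
# and yhat and chat (which lie below khat here) fall outside the plot
plot(tt, khat, type="l", xlab="t", ylab="Efficiency units",
     ylim=range(khat, yhat, chat),
     main="Convergence paths: khat, yhat, chat")
lines(tt, yhat, col="blue")
lines(tt, chat, col="red")
legend("right", legend=c("khat","yhat","chat"), col=c("black","blue","red"), lty=1)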

2.7 (6) Grid search over saving rates and Golden Rule

s_vec <- seq(0.05, 0.65, by=0.01)

k_star_s <- ((s_vec*A)/g)^(1/(1-alpha))
y_star_s <- A * k_star_s^alpha
c_star_s <- (1 - s_vec) * y_star_s

plot(s_vec, c_star_s, type="l", xlab="Saving rate s",
     ylab="Steady-state consumption c*",
     main="Steady-state consumption vs saving rate")

s_hat_GR <- s_vec[which.max(c_star_s)]
s_hat_GR
## [1] 0.3

The grid search returns \(s = 0.3\), which matches the analytical result \(s_{GR}=\alpha=0.3\).
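Because the grid has a step of 0.01, it can only locate the optimum to within that step. A continuous alternative, sketched here with base R's `optimize()`, maximizes the closed-form steady-state consumption \(\hat{c}^*(s)\) directly; the search interval (0.01, 0.99) is an assumption.

```r
# Continuous counterpart of the grid search: maximize c*(s) with optimize().
alpha <- 0.3; A <- 10; delta <- 0.07; z <- 0.02; n <- 0.01
g <- (1 + z) * (1 + n) - (1 - delta)

c_star_of_s <- function(s) {
  k_star <- (s * A / g)^(1 / (1 - alpha))   # steady-state capital per efficiency unit
  (1 - s) * A * k_star^alpha                 # steady-state consumption per efficiency unit
}

opt_s <- optimize(c_star_of_s, interval = c(0.01, 0.99), maximum = TRUE)
opt_s$maximum
```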

2.8 (7) Saving rate changes at \(t=101\) to the Golden Rule

We simulate \(t=0,\dots,200\) where \(s=0.2\) up to \(t=100\), and \(s=\alpha\) for \(t \ge 101\).
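The simulation loops in (5) and (7) differ only in the saving-rate path, so one option is a single function that takes the path as a vector. `simulate_solow()` is a hypothetical helper, not part of the assignment; it treats `s_path[i]` as the saving rate in period \(i-1\), so the path described above is 101 periods at 0.2 (periods 0 to 100) followed by 99 periods at \(\alpha\).

```r
# Sketch of a reusable simulator: s_path[i] is the saving rate in period i - 1.
simulate_solow <- function(s_path, k0, A = 10, alpha = 0.3, delta = 0.07,
                           z = 0.02, n = 0.01) {
  T_len <- length(s_path)
  khat  <- numeric(T_len + 1)
  khat[1] <- k0                                   # khat[1] is period 0
  for (i in seq_len(T_len)) {
    khat[i + 1] <- (s_path[i] * A * khat[i]^alpha + (1 - delta) * khat[i]) /
      ((1 + z) * (1 + n))
  }
  yhat <- A * khat^alpha
  # consumption uses the saving rate in force each period; the last period reuses the final rate
  chat <- (1 - c(s_path, s_path[T_len])) * yhat
  data.frame(t = 0:T_len, khat = khat, yhat = yhat, chat = chat)
}

# Part (7): s = 0.2 in periods 0..100, s = alpha = 0.3 in periods 101..199
s_path <- c(rep(0.2, 101), rep(0.3, 99))
path   <- simulate_solow(s_path, k0 = 40)
tail(path, 3)
```

Keeping the saving-rate schedule in data rather than in a condition inside the loop makes off-by-one mistakes in the timing of the switch easier to spot.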

T2 <- 200
sGR <- alpha

khat2 <- numeric(T2+1)
yhat2 <- numeric(T2+1)
chat2 <- numeric(T2+1)

khat2[1] <- 40

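for (t in 1:T2) {

  # index t corresponds to period t-1 (khat2[1] is period 0), so s = 0.2 applies
  # in periods 0..100 (t <= 101) and s = alpha from period 101 onward
  s_now <- if (t <= 101) s else sGR

  yhat2[t] <- A * khat2[t]^alpha
  chat2[t] <- (1 - s_now) * yhat2[t]

  khat2[t+1] <- (s_now*A*khat2[t]^alpha + (1-delta)*khat2[t]) /
    ((1+z)*(1+n))
}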

yhat2[T2+1] <- A * khat2[T2+1]^alpha
chat2[T2+1] <- (1 - sGR) * yhat2[T2+1]

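tt2 <- 0:T2
plot(tt2, khat2, type="l", xlab="t", ylab="Efficiency units",
     ylim=range(khat2, yhat2, chat2),   # include all three series in the axis range
     main="Paths with saving-rate change at t=101")
lines(tt2, yhat2, col="blue")
lines(tt2, chat2, col="red")
abline(v=100, lty=2)
legend("topleft", legend=c("khat","yhat","chat"), col=c("black","blue","red"), lty=1)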

Growth rate of \(\hat{y}_t\): \[ \text{growth of }\hat{y}_t = \frac{\hat{y}_{t+1}}{\hat{y}_t}-1. \]

g_yhat <- (yhat2[2:(T2+1)] / yhat2[1:T2]) - 1
plot(0:(T2-1), g_yhat, type="l", xlab="t", ylab="Growth rate",
     main="Growth rate of yhat: yhat[t+1]/yhat[t] - 1")
abline(v=100, lty=2)

2.9 (8) Can output per worker grow faster than \(z\) forever after a one-time saving increase?

Output per worker is: \[ y_t \equiv \frac{Y_t}{N_t}. \]

Since \(\hat{y}_t=\frac{Y_t}{Z_tN_t}\), we have: \[ \frac{Y_t}{N_t} = \hat{y}_t Z_t \quad \Rightarrow \quad y_t=\hat{y}_t Z_t. \]

If \(Z_0=1\), then \(Z_t=(1+z)^t\). Hence: \[ y_t = \hat{y}_t(1+z)^t. \]

The growth rate of output per worker is: \[ \frac{y_{t+1}}{y_t}-1 = \frac{\hat{y}_{t+1}(1+z)^{t+1}}{\hat{y}_t(1+z)^t}-1 = (1+z)\frac{\hat{y}_{t+1}}{\hat{y}_t}-1. \]

As the economy converges to a steady state in efficiency units, \(\hat{y}_t\) converges to a constant, so: \[ \frac{\hat{y}_{t+1}}{\hat{y}_t} \to 1. \]

Therefore, the long-run growth rate becomes: \[ \frac{y_{t+1}}{y_t}-1 \to (1+z)\cdot 1 - 1 = z. \]

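Conclusion: A one-time increase in saving cannot keep the growth of output per worker above \(z\) forever. It raises growth only temporarily, while \(\hat{k}_t\) moves to its new, higher steady state; after that, output per worker again grows at rate \(z\).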
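The argument in (8) can be illustrated by simulation, assuming \(Z_0 = 1\) and using \(s = \alpha = 0.3\) from \(\hat{k}_0 = 40\), which lies below the new steady state. Output per worker is recovered as \(y_t = \hat{y}_t(1+z)^t\); its growth rate starts above \(z\) during the transition and then settles at \(z\).

```r
# Output per worker along a transition path (assumes Z_0 = 1).
A <- 10; alpha <- 0.3; delta <- 0.07; z <- 0.02; n <- 0.01
s_high <- 0.3                  # saving rate after the one-time increase
T_len  <- 300
k_path <- numeric(T_len + 1)
k_path[1] <- 40                # starts below the new steady state
for (i in seq_len(T_len)) {
  k_path[i + 1] <- (s_high * A * k_path[i]^alpha + (1 - delta) * k_path[i]) /
    ((1 + z) * (1 + n))
}
y_path   <- A * k_path^alpha                   # output per efficiency unit
y_worker <- y_path * (1 + z)^(0:T_len)         # output per worker: y_t = yhat_t * Z_t
growth_worker <- y_worker[-1] / y_worker[-(T_len + 1)] - 1

# Growth exceeds z early in the transition and is essentially z at the end
round(c(first = growth_worker[1], last = growth_worker[T_len], z = z), 4)
```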

3 Question 3 — PWT10

# install.packages(c("pwt10","dplyr","ggplot2"))
library(pwt10)
library(dplyr)
library(ggplot2)

data("pwt10.0")
pwt <- pwt10.0

# GDP per worker (as question defines it)
pwt <- pwt %>%
  mutate(y_pw = rgdpe / emp) %>%              # real GDP per worker
  filter(!is.na(isocode), !is.na(year))

3.1 Part 1 — Option 1: All countries with data in BOTH 1960 and 2019

As the question asks, keep only the countries with positive GDP per worker available in both 1960 and 2019:

# keep countries with y_pw > 0 in BOTH 1960 and 2019
base60 <- pwt %>% filter(year == 1960, is.finite(y_pw), y_pw > 0) %>%
  select(isocode, y1960 = y_pw)

base19 <- pwt %>% filter(year == 2019, is.finite(y_pw), y_pw > 0) %>%
  select(isocode, y2019 = y_pw)

# combine and compute average annual growth rate (1960–2019)
df_conv <- inner_join(base60, base19, by = "isocode") %>%
  mutate(g = (y2019 / y1960)^(1/59) - 1)      # average annual growth rate
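The average annual growth rate here is the compound rate that takes \(y_{1960}\) to \(y_{2019}\) over 59 years (2019 − 1960). A small worked example with made-up numbers; `avg_annual_growth()` is a helper name introduced for illustration:

```r
# Average annual (compound) growth rate between two levels observed `years` apart;
# avg_annual_growth() is a helper name introduced for illustration.
avg_annual_growth <- function(y_start, y_end, years) (y_end / y_start)^(1 / years) - 1

# Made-up example: GDP per worker doubles from 10,000 to 20,000 between 1960 and 2019
g_ex <- avg_annual_growth(10000, 20000, 59)
round(100 * g_ex, 2)   # percent per year, about 1.18

# Compounding g_ex over 59 years recovers the end level
stopifnot(isTRUE(all.equal(10000 * (1 + g_ex)^59, 20000)))
```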

Run the linear regression:

m1 <- lm(g ~ y1960, data = df_conv)
summary(m1)
## 
## Call:
## lm(formula = g ~ y1960, data = df_conv)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.079163 -0.006603  0.000965  0.008222  0.033567 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.412e-02  2.569e-03   9.389 5.88e-15 ***
## y1960       -2.569e-07  1.406e-07  -1.828   0.0709 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01562 on 89 degrees of freedom
## Multiple R-squared:  0.03619,    Adjusted R-squared:  0.02536 
## F-statistic: 3.341 on 1 and 89 DF,  p-value: 0.07091

The regression relates the average annual growth rate of GDP per worker over 1960–2019 to initial GDP per worker in 1960. The estimated slope on y1960 is negative (about −2.57 × 10⁻⁷), meaning that, on average, countries with lower GDP per worker in 1960 grew faster from 1960 to 2019: an extra $10,000 of GDP per worker in 1960 is associated with roughly 0.26 percentage points lower average annual growth. This negative sign is the direction predicted by unconditional convergence (poorer countries “catch up”). However, the p-value for y1960 is 0.0709, so the coefficient is not statistically significant at the 5% level; it is only marginally significant at the 10% level. We therefore cannot reject the null hypothesis of a zero slope at the usual 5% standard. The R-squared of about 0.036 is also very small, indicating that initial income explains only a small portion of cross-country differences in long-run growth, so many other factors (institutions, policies, shocks, geography, etc.) drive growth outcomes.
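Two quantities in the output above can be re-derived from the reported t value as a check: the two-sided p-value from the Student-t distribution with 89 degrees of freedom, and the F statistic, which equals \(t^2\) when there is a single regressor. The inputs are copied from the printed output, so the results match only up to rounding. The last line converts the slope into percentage points of annual growth per $10,000 of initial GDP per worker.

```r
# Re-derive the p-value and F statistic for y1960 from the printed t value.
t_slope  <- -1.828   # t value for y1960 in summary(m1)
df_resid <- 89       # residual degrees of freedom in summary(m1)

p_two_sided <- 2 * pt(-abs(t_slope), df = df_resid)   # two-sided p-value, Student-t
F_stat      <- t_slope^2                              # single regressor: F = t^2
round(c(p = p_two_sided, F = F_stat), 3)

# Economic size: 10,000 more GDP per worker in 1960, in percentage points of annual growth
slope <- -2.569e-07
100 * slope * 10000
```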

Draw the scatter plot:

ggplot(df_conv, aes(x = y1960, y = g)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "Unconditional convergence (1960–2019)",
       x = "Real GDP per worker in 1960 (y1960)",
       y = "Average annual growth rate, 1960–2019")

The scatter plot shows initial GDP per worker in 1960 on the x-axis and the average annual growth rate (1960–2019) on the y-axis. The fitted regression line slopes downward, which visually matches the negative coefficient from the regression and suggests a mild catch-up pattern: richer countries in 1960 tend to have slightly lower long-run growth rates. But the points are widely dispersed around the line (many countries with similar initial income grew at very different rates), which visually supports the low R-squared and the idea that the relationship is weak. The confidence band is also fairly wide, reflecting uncertainty about the true slope. Overall, the plot suggests some tendency toward convergence, but it is not strong or cleanly separated in the data.

So there is no conclusive evidence of unconditional convergence at the standard 5% significance level. The estimated slope is negative (consistent with convergence), but its p-value (about 0.071) is above 0.05. At best, there is weak, marginal evidence at the 10% level, and the wide scatter shows that unconditional convergence alone explains little of the growth differences across countries.


3.2 Option 2: Keep only the OECD countries

oecd20 <- c("AUT","BEL","CAN","CHE","DEU","DNK","ESP","FRA","GBR","GRC",
            "ISL","IRL","ITA","LUX","NLD","NOR","PRT","SWE","TUR","USA")

df_oecd <- df_conv %>% filter(isocode %in% oecd20)

m2 <- lm(g ~ y1960, data = df_oecd)
summary(m2)
## 
## Call:
## lm(formula = g ~ y1960, data = df_oecd)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0040904 -0.0016989 -0.0009281  0.0004716  0.0142507 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.759e-02  2.397e-03  15.681 6.12e-12 ***
## y1960       -4.651e-07  8.126e-08  -5.723 2.00e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.004008 on 18 degrees of freedom
## Multiple R-squared:  0.6454, Adjusted R-squared:  0.6257 
## F-statistic: 32.76 on 1 and 18 DF,  p-value: 1.996e-05

The regression estimates the relationship between the average annual growth rate of GDP per worker from 1960 to 2019 and the initial level of GDP per worker in 1960 for OECD countries only. The estimated coefficient on initial income (y1960) is negative and highly statistically significant, with a p-value of 2.00 × 10⁻⁵, which is far below the 1% significance level. This provides strong statistical evidence of convergence among OECD countries. The negative coefficient means that OECD countries with lower initial GDP per worker in 1960 experienced higher growth rates over the period 1960–2019, while richer OECD countries grew more slowly. The R-squared value is 0.645, which is relatively high compared to the full sample in Part 1, indicating that initial income explains a large portion (about 65%) of the variation in growth rates across OECD countries. This suggests that convergence is much stronger and more systematic within OECD countries than across all countries.

Make the scatter plot:

ggplot(df_oecd, aes(x = y1960, y = g)) +
  geom_point(size = 2) +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "OECD convergence (1960–2019)",
       x = "Real GDP per worker in 1960",
       y = "Average annual growth rate")

The scatter plot shows a clear downward-sloping relationship between initial GDP per worker in 1960 and subsequent growth rates for OECD countries. Countries with lower initial income, such as Turkey, Portugal, and Greece, tend to have higher growth rates, while richer countries such as the United States, Switzerland, and Luxembourg show lower growth rates. The regression line slopes downward sharply, and the points are closely clustered around the line compared to Part 1. The relatively narrow confidence band also indicates that the relationship is estimated with high precision. This visual evidence strongly supports the regression result and confirms the presence of convergence among OECD countries.

Yes, there is strong and statistically conclusive evidence of convergence among OECD countries. The regression coefficient on initial income is negative and statistically significant at the 1% level, and the scatter plot shows a clear downward trend. Unlike the full sample in Part 1, where the evidence of convergence was weak, OECD countries show strong convergence. A plausible explanation is that OECD countries share similar institutional quality, education levels, and economic structures, so they are likely converging toward similar steady states, which makes it easier for poorer members to catch up with richer ones.


3.3 Option 3: Cross-country income dispersion over time (1960–2019)

# Use the countries that have data for y_pw in 1960 (as the question says)
countries1960 <- base60$isocode

disp <- pwt %>%
  filter(isocode %in% countries1960,
         year >= 1960, year <= 2019,
         is.finite(y_pw), y_pw > 0) %>%
  group_by(year) %>%
  summarise(
    sd_logy = sd(log(y_pw), na.rm = TRUE),

    # ratio of average y of richest 10% to poorest 10%
    ratio_90_10 = {
      y <- sort(y_pw)
      n <- length(y)
      k <- max(1, floor(0.10 * n))
      mean(tail(y, k)) / mean(head(y, k))
    },
    .groups = "drop"
  )

head(disp)
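The two dispersion statistics can be written as standalone functions so they are easy to test. The sketch below reproduces the logic inside `summarise()` on a made-up vector of ten incomes (not PWT data); the function names are illustrative. It also checks a useful property: the standard deviation of logs does not change when every country's income is scaled by the same factor, so it measures relative rather than absolute gaps.

```r
# Dispersion statistics as standalone functions (names are illustrative).
sd_log <- function(y) sd(log(y))

top_bottom_ratio <- function(y, p = 0.10) {
  y <- sort(y)
  k <- max(1, floor(p * length(y)))   # countries in each tail group (at least one)
  mean(tail(y, k)) / mean(head(y, k))
}

# Made-up incomes for ten countries
y_toy <- c(1000, 2000, 4000, 8000, 16000, 32000, 64000, 128000, 256000, 512000)
top_bottom_ratio(y_toy)   # each tail has one country here: 512000 / 1000
sd_log(y_toy)

# sd_log is unchanged if every country's income is scaled by the same factor
stopifnot(isTRUE(all.equal(sd_log(y_toy), sd_log(3 * y_toy))))
```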

You can plot both series (one after another):

plot(disp$year, disp$sd_logy, type="l",
     xlab="Year", ylab="SD of log(GDP per worker)",
     main="Income dispersion over time: SD of log y")

plot(disp$year, disp$ratio_90_10, type="l",
     xlab="Year", ylab="Richest 10% / Poorest 10% (mean ratio)",
     main="Income dispersion over time: 90/10 ratio")

3.3.1 Interpretation

(i) Standard deviation of log(GDP per worker) — interpretation

The first plot shows the standard deviation of log GDP per worker across countries in each year (using the same set of countries that have data in 1960). This measure captures dispersion across the full income distribution. The standard deviation rises steadily from the 1960s through the late 1990s/around 2000, meaning cross-country incomes became more spread out over time. After roughly 2000 it falls somewhat, but it does not return to the low levels of the 1960s. So the main message is that global income differences widened strongly for several decades, then narrowed a bit in the 2000s and 2010s, but remained relatively high overall.

(ii) Richest 10% / Poorest 10% ratio — interpretation

The second plot shows another dispersion measure: the ratio of the average GDP per worker of the richest 10% of countries to that of the poorest 10%. This measure focuses on the gap between the top and bottom of the world income distribution. The ratio increases strongly from around 20 in 1960 to above 40 by the late 1990s/around 2000, meaning that at the peak the richest group of countries produced roughly 40 times the GDP per worker of the poorest group. After 2000 the ratio declines, suggesting some catch-up by poorer countries (or slower growth in the richest group), but it remains very large and rises again toward the end of the sample. Overall, the top-versus-bottom gap becomes much larger than in 1960 and stays high.

(iii) Conclusion: what do we learn about cross-country income inequality?

Putting both measures together, the evidence suggests that cross-country income inequality increased substantially between 1960 and about the late 1990s/early 2000s. The world did not show broad, smooth convergence over the whole period—if anything, the gap widened for decades. There is some partial reduction in dispersion after 2000 (both the standard deviation and the rich/poor ratio drop), which is consistent with some catch-up among poorer countries during the 2000s and 2010s. However, inequality remains much higher than in 1960, so the overall conclusion is: income gaps across countries widened a lot historically, and only partially narrowed more recently—global convergence is at best limited and uneven.


3.4 Part 4: TFP (A_it) and income (2000–2019)

3.4.1 Part 4 (i): Computing TFP (what it means)

We compute total factor productivity (TFP) from the production function: \[ A_{it}=\frac{Y_{it}}{K_{it}^{\alpha}N_{it}^{1-\alpha}}, \] where \(Y_{it}\) is output (cgdpe in the code), \(K_{it}\) is the capital stock (cn), and \(N_{it}\) is effective labor input: \[ N_{it}=\text{emp}_{it}\times \text{avh}_{it}\times \text{hc}_{it}, \] that is, employment times average annual hours worked times the human capital index. The code below uses \(\alpha = 1/3\).

TFP is the part of output that cannot be explained by how much capital and labor a country uses. If two countries have similar capital and labor inputs, the one with higher \(A_{it}\) produces more output per unit of input, which we interpret as better technology, efficiency, institutions, or organization (all the productivity factors bundled into \(A\)).
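The TFP formula can be wrapped in a small function and tested on a toy economy whose true \(A\) is known. The argument names mirror the PWT variables (emp, avh, hc); `tfp()` is a helper introduced for illustration, the numbers are made up, and \(\alpha = 1/3\) as in the code below.

```r
# TFP as a residual: A = Y / (K^alpha * N^(1 - alpha)), with N = emp * avh * hc.
tfp <- function(Y, K, emp, avh, hc, alpha = 1/3) {
  N <- emp * avh * hc                 # effective labor input
  Y / (K^alpha * N^(1 - alpha))
}

# Toy economy with made-up inputs and a known TFP level of 2
K_toy <- 3e6; emp_toy <- 10; avh_toy <- 1800; hc_toy <- 3
N_toy <- emp_toy * avh_toy * hc_toy
Y_toy <- 2 * K_toy^(1/3) * N_toy^(2/3)

tfp(Y_toy, K_toy, emp_toy, avh_toy, hc_toy)   # recovers 2 (up to floating-point error)
```

Doubling Y while holding the inputs fixed doubles the measured A, which is exactly the "output not explained by inputs" interpretation above.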

3.4.2 Part 4 (ii): R code for TFP, averages, and regression

alpha <- 1/3

pwt2 <- pwt %>%
  mutate(
    Y = cgdpe,
    K = cn,
    N = emp * avh * hc,
    A = Y / (K^alpha * N^(1 - alpha))
  ) %>%
  filter(is.finite(A), A > 0,
         is.finite(y_pw), y_pw > 0)

# (ii) averages 2000–2019 for each country: Abar and ybar
avg_0019 <- pwt2 %>%
  filter(year >= 2000, year <= 2019) %>%
  group_by(isocode) %>%
  summarise(
    Abar = mean(A, na.rm = TRUE),
    ybar = mean(y_pw, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(is.finite(Abar), is.finite(ybar), Abar > 0, ybar > 0)

m_tfp <- lm(ybar ~ Abar, data = avg_0019)
summary(m_tfp)
## 
## Call:
## lm(formula = ybar ~ Abar, data = avg_0019)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -27843.6  -6696.1     60.2   5663.1  28662.8 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -24818       3501  -7.089 1.05e-09 ***
## Abar           28820       1118  25.780  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10390 on 67 degrees of freedom
## Multiple R-squared:  0.9084, Adjusted R-squared:  0.9071 
## F-statistic: 664.6 on 1 and 67 DF,  p-value: < 2.2e-16

We then average TFP and GDP per worker over 2000–2019 for each country and estimate: \[ \bar{y}_i=\beta_0+\beta_1\bar{A}_i+u_i. \] The regression shows a strong positive relationship: the coefficient on Abar is about 28,820 and extremely statistically significant (t ≈ 25.78, p < 2e−16). Countries with higher average TFP tend to have much higher average GDP per worker; for example, an increase of 0.1 in average TFP is associated with about 2,882 more in average GDP per worker. The R-squared of about 0.908 is very high, so differences in average TFP account for about 91% of the cross-country variation in average GDP per worker in this sample of 69 countries (2000–2019). In short, the regression says productivity differences are strongly connected to income differences.

3.4.3 Part (ii): Scatter plot interpretation

ggplot(avg_0019, aes(x = Abar, y = ybar)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = TRUE) +
  labs(
    title = "Average GDP per worker vs Average TFP (2000–2019)",
    x = "Average TFP (Abar)",
    y = "Average GDP per worker (ybar)"
  )

The scatter plot confirms the regression result visually. The points lie close to an upward-sloping fitted line, and the confidence band around the line is relatively narrow. This suggests that the positive relationship is systematic across the sample rather than driven by a few extreme countries. There is still some scatter (countries with similar TFP can have somewhat different income levels), but overall the pattern is very tight, which matches the high R-squared. So the plot and regression are consistent: higher TFP is strongly associated with higher GDP per worker.

3.4.4 Part (iii): Is there a “tight” relationship? Why or why not?

Yes. Based on these results, there is a tight statistical relationship between average TFP and average income, for two reasons:

  1. Statistical evidence: the slope is highly significant and R² ≈ 0.91 is very large, meaning the model fits the data closely.
  2. Economic logic: in the production-function framework, output per worker is strongly influenced by productivity. Even a country with ample capital and labor cannot sustain high income per worker without high productivity (better technology, efficiency, institutions). That is exactly what the data show: countries with higher \(A\) tend to have much higher \(\bar{y}\).