Final Exam: Part I
In this exercise, we analyze a simple economic environment featuring production, investment and labor demand. Your job is to replicate my results. You should submit both the Rmd/Qmd source and the generated HTML file.
In replicating the results, your economic knowledge will be more important than your R knowledge. Indeed, in order to replicate the results, it is necessary to comprehend the fundamental economic scenario, and then tell the story using basic commands in the programming language R.
Setup
Consider the following production and investment process for \(j=1,...,1000\) firms across \(t=1,...,10\) periods.
The log production function for firm \(j\) in period \(t\): \[ y_{jt} = β_0 + β_l l_{jt} + β_k k_{jt} + ω_{jt} + η_{jt} \] where \(ω_{jt}\) is the anticipated shock and \(η_{jt}\) is the ex post shock. The anticipated shocks evolve as: \[ ω_{jt} = α ω_{j,t-1} + ν_{jt}. \] Assume \(ν_{jt}\) (\(η_{jt}\)) is an IID normal random variable with mean 0 and standard deviation \(σ_ν\) (\(σ_η\)).
The product price is the same across firms and normalized at 1. The wage \(w_t\) is constant at 0.5.
The capital accumulate according to \[ K_{j,t+1} = (1-δ) K_{j,t} + I_{jt}. \]
Our goal is to estimate the parameters \(β_l\) and \(β_k\) in the production function and the parameter \(α\). To do that, we first simulate the data set df_all, and then estimate the parameters.
Simulation
We set the parameters as follows:
| parameter | variable | value |
|---|---|---|
| \(β_0\) | beta_0 |
1 |
| \(β_l\) | beta_l |
0.2 |
| \(β_k\) | beta_k |
0.7 |
| \(α\) | alpha |
0.7 |
| \(σ_η\) | sigma_eta |
0.2 |
| \(σ_ν\) | sigma_nu |
0.5 |
| \(σ_ω\) | sigma_w |
0.1 |
| \(δ\) | delta |
0.05 |
- Define the parameter variables as above.
- Write a function that returns the log output given
l,k,ωandηunder the given parameter values and name itlog_production(l, k, omega, eta, beta_0, beta_l, beta_k).
log_production(
l = 1,
k = 1,
omega = 1,
eta = 1,
beta_0 = beta_0,
beta_l = beta_l,
beta_k = beta_k
) [1] 3.9
Suppose that the labor is determined after \(ω\) is observed and before \(η\) is observed, given the log capital level \(k_{jt}\).
- Derive the optimal log labor as a function of \(ω_{jt}, η_{jt}, k_{jt}\) and wage. Write a function to return the optimal log labor given the variables and parameters and name it
log_labor_choice(k, wage, omega, beta_0, beta_l, beta_k, sigma_eta).
Some knowledge of calculus (differentiation and first-order condition) is needed in understanding the economic scenario. If you do not know differentiation, there should be no problem as I have already derived the first-order condition for you below. You just need to translate the first-order condition into an R function.
The firm solves \[ \max_{l_{jt}} \mathbb E_{\eta} \exp (β_0 + β_l l_{jt} + β_k k_{jt} + ω_{jt} + η_{jt}) - w_t \cdot \exp(l_{jt}) \]
First-order condition yields: \[ \ln β_l + β_0 + β_k k + β_l l + ω_{jt} + σ_η^2/2 = \ln (w_t) + l \] where I use the fact that if \(\ln X \sim N(μ, σ^2)\), then \(\mathbb E X = \exp(μ + σ^2)\).
log_labor_choice(
k = 1,
wage = 1,
omega = 1,
beta_0 = beta_0,
beta_l = beta_l,
beta_k = beta_k,
sigma_eta = sigma_eta
)[1] 1.388203
If there is no additional variation in labor, the coefficient on the labor \(β_l\) is not identified. Thus, if we generate labor choice from the previous function, \(β_l\) will not be identified from the simulated data.
To see this, we write a modified version of the previous function in which
\(ω_{jt}\) is replaced with \(ω_{jt} + ι_{jt}\), where \(ι_{jt}\) is an optimization error that follows an i.i.d. normal distribution with mean 0 and standard deviation 0.05. That is, the manager of the firm perceives as if the shock is \(ω_{jt} + ι_{jt}\), even though the true shock is \(ω_{jt}\).
- Modify the previous function by including \(ι_{jt}\) as an additional input and name it
log_labor_choice_error(k, wage, omega, beta_0, beta_l, beta_k, iota, sigma_eta).
log_labor_choice_error(
k = 1,
wage = 1,
omega = 1,
beta_0 = beta_0,
beta_l = beta_l,
beta_k = beta_k,
iota = 1,
sigma_eta = sigma_eta
)[1] 2.638203
Consider an investment process such that: \[ I_{jt} = (δ+ γ ω_{jt}) K_{jt} \] where \(I_{jt}\) and \(K_{jt}\) are investment and capital in level. Set \(γ=0.1\), ie, the investment is strictly increasing in \(ω\). (The investment function should be derived by solving the dynamic problem of a firm. But here, we just specify it in a reduced-form.)
- Define
gamma = 0.1. Write a function that returns the investment given the log-capital \(k_{jt}\), the anticipated shock \(ω_{jt}\), and parameter values, according to the previous equation. Name itinvestment_choice(k, omega, gamma, delta).
gamma = 0.1
investment_choice(
k = 1,
omega = 1,
gamma = gamma,
delta = delta
)[1] 0.4077423
Simulate the data first using the labor choice without optimization error and second using the labor choice with optimization error. To do so, we specify the initial values for the state variables \(k_{jt}\) and \(ω_{jt}\) as follows.
- Draw \(k_{j1}\) from an i.i.d. normal distribution with mean 1 and standard deviation 0.5. Draw \(ω_{j1}\) from its stationary distribution (ie, \(ω_{j1} \sim N(0, σ_ν^2/(1-α^2))\)). Draw a wage. Before simulating the rest of the data, set the seed at 1.
df# A tibble: 1,000 × 5
j t k omega wage
<int> <dbl> <dbl> <dbl> <dbl>
1 1 1 0.687 0.795 0.5
2 2 1 1.09 0.779 0.5
3 3 1 0.582 -0.610 0.5
4 4 1 1.80 0.148 0.5
5 5 1 1.16 0.0486 0.5
6 6 1 0.590 -1.16 0.5
7 7 1 1.24 0.568 0.5
8 8 1 1.37 -1.34 0.5
9 9 1 1.29 -0.873 0.5
10 10 1 0.847 0.699 0.5
# … with 990 more rows
- Draw optimization error \(ι\) and compute the labor and investment choice of period 1. For labor choice, compute both types of labor choices.
df# A tibble: 1,000 × 9
j t k omega wage iota l l_error inv
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 0.687 0.795 0.5 -0.0443 1.72 1.67 0.257
2 2 1 1.09 0.779 0.5 -0.0961 2.06 1.94 0.381
3 3 1 0.582 -0.610 0.5 0.0810 -0.123 -0.0218 -0.0196
4 4 1 1.80 0.148 0.5 0.0260 1.89 1.92 0.391
5 5 1 1.16 0.0486 0.5 -0.00279 1.21 1.21 0.176
6 6 1 0.590 -1.16 0.5 0.0348 -0.809 -0.766 -0.120
7 7 1 1.24 0.568 0.5 0.00268 1.93 1.93 0.370
8 8 1 1.37 -1.34 0.5 -0.0655 -0.346 -0.428 -0.330
9 9 1 1.29 -0.873 0.5 -0.106 0.165 0.0327 -0.135
10 10 1 0.847 0.699 0.5 -0.0104 1.74 1.73 0.280
# … with 990 more rows
- Draw ex post shock and compute the output according to the production function for both labor without optimization error and with optimization error. Name the output without optimization error
yand the one with optimization errory_error.
print(df, width=Inf)# A tibble: 1,000 × 12
j t k omega wage iota l l_error inv eta y
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 0.687 0.795 0.5 -0.0443 1.72 1.67 0.257 0.148 2.77
2 2 1 1.09 0.779 0.5 -0.0961 2.06 1.94 0.381 0.0773 3.03
3 3 1 0.582 -0.610 0.5 0.0810 -0.123 -0.0218 -0.0196 0.259 1.03
4 4 1 1.80 0.148 0.5 0.0260 1.89 1.92 0.391 -0.161 2.62
5 5 1 1.16 0.0486 0.5 -0.00279 1.21 1.21 0.176 -0.321 1.79
6 6 1 0.590 -1.16 0.5 0.0348 -0.809 -0.766 -0.120 0.187 0.274
7 7 1 1.24 0.568 0.5 0.00268 1.93 1.93 0.370 0.361 3.19
8 8 1 1.37 -1.34 0.5 -0.0655 -0.346 -0.428 -0.330 -0.0113 0.539
9 9 1 1.29 -0.873 0.5 -0.106 0.165 0.0327 -0.135 0.377 1.44
10 10 1 0.847 0.699 0.5 -0.0104 1.74 1.73 0.280 0.316 2.96
y_error
<dbl>
1 2.76
2 3.01
3 1.05
4 2.63
5 1.78
6 0.282
7 3.19
8 0.523
9 1.41
10 2.95
# … with 990 more rows
- Repeat the procedures for \(t=1,...,10\) by updating the capital and anticipated shocks, and name the resulting data frame
df_all. Use the previously generateddfas the data fort = 1. In each period, generatenu,iota, andetain this order, if you want to match the realization of the random variables.
df_all# A tibble: 10,000 × 12
j t k omega wage iota l l_error inv eta y
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 0.687 0.795 0.5 -0.0443 1.72 1.67 0.257 0.148 2.77
2 2 1 1.09 0.779 0.5 -0.0961 2.06 1.94 0.381 0.0773 3.03
3 3 1 0.582 -0.610 0.5 0.0810 -0.123 -0.0218 -0.0196 0.259 1.03
4 4 1 1.80 0.148 0.5 0.0260 1.89 1.92 0.391 -0.161 2.62
5 5 1 1.16 0.0486 0.5 -0.00279 1.21 1.21 0.176 -0.321 1.79
6 6 1 0.590 -1.16 0.5 0.0348 -0.809 -0.766 -0.120 0.187 0.274
7 7 1 1.24 0.568 0.5 0.00268 1.93 1.93 0.370 0.361 3.19
8 8 1 1.37 -1.34 0.5 -0.0655 -0.346 -0.428 -0.330 -0.0113 0.539
9 9 1 1.29 -0.873 0.5 -0.106 0.165 0.0327 -0.135 0.377 1.44
10 10 1 0.847 0.699 0.5 -0.0104 1.74 1.73 0.280 0.316 2.96
# … with 9,990 more rows, and 1 more variable: y_error <dbl>
- Check the simulated data by making a summary table. I use the
psychpackage for making this table.
df_all |>
psych::describe() |>
select(mean, sd, median, min, max) mean sd median min max
j 500.50 288.69 500.50 1.00 1000.00
t 5.50 2.87 5.50 1.00 10.00
k 0.98 0.58 0.99 -1.28 3.23
omega -0.01 0.70 -0.01 -2.59 2.63
wage 0.50 0.00 0.50 0.50 0.50
iota 0.00 0.05 0.00 -0.18 0.17
l 0.98 1.10 0.99 -3.33 4.97
l_error 0.98 1.10 0.99 -3.38 4.95
inv 0.18 0.30 0.12 -1.27 3.30
eta 0.00 0.20 0.00 -0.77 0.75
y 1.88 1.12 1.89 -2.47 6.12
y_error 1.88 1.12 1.89 -2.48 6.12
Estimate the parameters
For now, use the labor choice with optimization error.
- Regress \(y_{jt}\) on \(l_{jt}\) and \(k_{jt}\) using the least square method.
Call:
lm(formula = y_error ~ l_error + k, data = df_all)
Residuals:
Min 1Q Median 3Q Max
-0.73002 -0.14117 -0.00071 0.13743 0.87983
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.892542 0.004058 219.966 <2e-16 ***
l_error 0.997913 0.002396 416.454 <2e-16 ***
k 0.007599 0.004503 1.688 0.0915 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.2068 on 9997 degrees of freedom
Multiple R-squared: 0.9657, Adjusted R-squared: 0.9657
F-statistic: 1.408e+05 on 2 and 9997 DF, p-value: < 2.2e-16
- As you can see, the LS results
1.0and0.0are very different from the true values,0.2and0.7. This is expected as both \(l\) and \(k\) are endogenous.
Your work ends here. Below, I show you how to estimate the parameters using Olley-Pakes method. Since we did not discuss Olley-Pakes method in class, you are not required to replicate the results below.
First, estimate the first-step model of Olley-Pakes method: \[ y_{jt} = β_1 l_{jt} + \phi(k_{jt}, I_{jt}) + \eta_{jt} \] by approximating \(\phi\) using a kernel function.
We use the npplreg function from the np package to estimate a partially linear model with a multivariate kernel.
- You first use
npplregbwto obtain the optimal band width, - and then use
npplregto estimate the model under the optimal bandwidth.
The computation of the optimal bandwidth is time consuming. On my mac, it takes about 4 hours.
library(np)
bw1 <- npplregbw(y ~ l | k + inv,
data = df_all)
pl <- npplreg(bws=bw1)
summary(pl)
Partially Linear Model
Regression data: 10000 training points, in 3 variable(s)
With 1 linear parametric regressor(s), 2 nonparametric regressor(s)
y(z)
Bandwidth(s): 0.07347226 0.01437256
x(z)
Bandwidth(s): 0.02960262 0.009987366
l
Coefficient(s): 1.187301
Kernel Regression Estimator: Local-Constant
Bandwidth Type: Fixed
Residual standard error: 0.1932366
R-squared: 0.9701184
Continuous Kernel Type: Second-Order Gaussian
No. Continuous Explanatory Vars.: 2
Then, we estimate the second stage model of Olley-Pakes method.
- Using the estimates in the first step, compute \(y_{jt} - \hat{\beta_l} l_{jt}\) and \(\hat{\phi}(k_{j, t - 1}, I_{j, t - 1})\), and save it as a data frame named
df_all_1st.
df_all_1st# A tibble: 10,000 × 4
j t y_error_tilde phi_t_1
<int> <dbl> <dbl> <dbl>
1 1 1 2.34 NA
2 1 2 1.37 2.21
3 1 3 0.621 1.49
4 1 4 0.447 0.880
5 1 5 0.878 0.611
6 1 6 1.62 0.926
7 1 7 0.558 1.40
8 1 8 0.684 0.439
9 1 9 0.939 0.525
10 1 10 1.49 0.838
# … with 9,990 more rows
We estimate \(α\), \(β_0\), and \(β_k\) by a GMM estimator. The moment is: \[ g_{JT}(\alpha, \beta_0, \beta_k) \equiv \frac{1}{JT}\sum_{j = 1}^J \sum_{t = 1}^T \{y_{jt} - \hat{\beta_l} l_{jt} - \beta_0 - \beta_k k_{jt} - \alpha[\hat{\phi}(k_{j, t - 1}, I_{j, t - 1}) - \beta_0 - \beta_k k_{j, t-1}]\} \begin{bmatrix} k_{jt} \\ k_{j, t - 1} \\ I_{j, t - 1} \end{bmatrix}. \]
Compute a function that returns the value of \(g_{JT}(\alpha, \beta_0, \beta_k)\) given parameter values, data, and
df_all_1st, and name itmoment_olleypakes_2nd.
moment_olleypakes_2nd(
alpha = alpha,
beta_0 = beta_0,
beta_k = beta_k,
df_all = df_all,
df_all_1st = df_all_1st
)[1] -0.018459481 -0.019010969 -0.003815687
Based on the moment, we can define the objective function of a generalized method of moments estimator with a weighting matrix \(W\) as \[ Q_{JT}(\alpha, \beta_0, \beta_k) \equiv g_{JT}(\alpha, \beta_0, \beta_k)' W g_{JT}(\alpha, \beta_0, \beta_k). \]
Write a function that returns the value of
\(Q_{JT}(\alpha, \beta_0, \beta_k)\) and name itobjective_olleypakes_2nd. Set \(W\) as the identity matrix, show the value of the objective function evaluated at the true parameters.
W <- diag(3)
theta <-
c(
alpha,
beta_0,
beta_k
)
objective_olleypakes_2nd(
theta = theta,
df_all = df_all,
df_all_1st = df_all_1st,
W = W
) [,1]
[1,] 0.0007167288
Finally, use optim() to find the parameters that minimize the objective function. I used the L-BFGS-B method here.
$par
[1] 0.7023647 0.9766275 0.6694232
$value
[1] 2.027934e-07
$counts
function gradient
10 10
$convergence
[1] 0
$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
The estimates are 0.7023647, 0.9766275 and 0.6694232 and the true values are alpha = 0.7, beta_0 = 1 and beta_k = 0.7.