CS2

Author

Shogo Maeda

Published

March 28, 2025

library(dplyr)
library(car)
library(stargazer)
library(ggplot2)
library(knitr)
library(kableExtra)

1 Empirical Study

The Fisher-Kaysen Specification with US Time series Data

eler <- read.csv("CS2_Eler.csv")

(a)

To estimate the parameters in Equation (5), the variables are generated as follows. The corresponding code is shown below.

\[ \begin{align} LNP1 &= \text{ln} (PELEC_t / PELEC_{t-1}) \\ LNG1 &= \text{ln} (GNP_t / GNP_{t-1}) \\ LNK1 &= \text{ln} (KWH_t / KWH_{t-1}) \end{align} \]

eler <- eler |>
  mutate(LNP1 = log(PELEC/lag(PELEC))) |>
  mutate(LNG1 = log(GNP/lag(GNP))) |>
  mutate(LNK1 = log(KWH/lag(KWH)))
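
As a quick sanity check on the construction above, the log of the ratio to the previous period is the same thing as the first difference of the logged series. The sketch below uses a made-up series `x` (the values are illustrative and not taken from CS2_Eler.csv) and base R only.

```r
# Toy series; values are illustrative only, not taken from the data set
x <- c(100, 104, 110, 121)

# Growth rate as the log of the ratio to the previous value.
# The leading NA mirrors what dplyr::lag() produces for the first row.
g_ratio <- c(NA, log(x[-1] / x[-length(x)]))

# Equivalent form: first difference of the logged series
g_diff <- c(NA, diff(log(x)))

print(round(g_ratio, 4))
print(all.equal(g_ratio, g_diff))
```

The leading `NA` is the reason the regression in (b) has one observation fewer than the number of rows in the data.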

(b)

Equation (5) is formulated as the following OLS model.

\[ \begin{align} LNK1_t = \gamma + \alpha LNP1_t + \beta LNG1_t + \varepsilon_t \end{align} \]

The estimation results obtained by OLS are presented below.

OLS <- lm(LNK1 ~ LNP1 + LNG1, data = eler)
stargazer(OLS, type = "text")

===============================================
                        Dependent variable:    
                    ---------------------------
                               LNK1            
-----------------------------------------------
LNP1                         -0.435***         
                              (0.095)          
                                               
LNG1                          0.412**          
                              (0.184)          
                                               
Constant                     0.045***          
                              (0.007)          
                                               
-----------------------------------------------
Observations                    33             
R2                             0.640           
Adjusted R2                    0.616           
Residual Std. Error       0.022 (df = 30)      
F Statistic           26.630*** (df = 2; 30)   
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.01

Based on these results, the parameters \( \gamma \) and \( \alpha \) are statistically significant at the 1% level, while \( \beta \) is significant at the 5% level.

The estimated value of \( \alpha \) is -0.435, indicating that a 1 percentage point increase in the growth rate of electricity prices reduces the growth rate of electricity consumption by approximately 0.435 percentage points. Similarly, the estimated value of \( \beta \) is 0.412, suggesting that a 1 percentage point increase in the growth rate of GNP raises the growth rate of electricity consumption by approximately 0.412 percentage points. Furthermore, the constant term \( \gamma \), estimated at 0.045, represents the average growth in electricity consumption attributable to changes in equipment stock.

In this model, the average growth rate of equipment stock is absorbed into the constant term. Therefore, the parameters on LNP1 and LNG1 represent the relationship between the rate of change in electricity consumption and the rate of change in price and income, conditional on the growth of equipment stock. For this reason, the estimated parameters can be interpreted as short-run elasticities.

(c)

Using the relationship \( \Delta \text{ln} Y = \text{ln}(1 + r) \approx r \) in a log-log model, the growth rate of electricity demand can be approximated by \( \gamma + \alpha * (-0.02) + \beta * (0.04) \). The result of the calculation is as follows:

coef_vals <- coef(OLS)
gamma <- coef_vals["(Intercept)"]
alpha <- coef_vals["LNP1"]
beta <- coef_vals["LNG1"]

# Define changes in PELEC and GNP
dPELEC <- -0.02
dGNP <- 0.04

# Calculate percent change in KWH
dKWH <- gamma + alpha * dPELEC + beta * dGNP

# Show the result
cat("Forecasted growth rate for electricity demand:", round(dKWH * 100, 2), "%\n")
Forecasted growth rate for electricity demand: 6.99 %
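
The quality of the approximation \( \text{ln}(1 + r) \approx r \) can be checked directly. The sketch below is only an illustration: it uses the rounded coefficients from the OLS table in (b), so its numbers differ slightly from the forecast computed from the unrounded estimates, and the names `gamma_hat`, `alpha_hat`, and `beta_hat` are introduced here for the example.

```r
# Rounded coefficients as reported in the OLS table in (b)
gamma_hat <- 0.045
alpha_hat <- -0.435
beta_hat  <- 0.412

# Approximate inputs: price falls by 2%, GNP rises by 4%
d_ln_kwh_approx <- gamma_hat + alpha_hat * (-0.02) + beta_hat * 0.04

# Exact log changes for the same scenario: ln(0.98) and ln(1.04)
d_ln_kwh_exact <- gamma_hat + alpha_hat * log(0.98) + beta_hat * log(1.04)

# Percentage change in KWH implied by a change in logs: exp(dlnY) - 1
growth_from_logs <- exp(d_ln_kwh_approx) - 1

cat("Log change, approximate inputs:", round(100 * d_ln_kwh_approx, 2), "%\n")
cat("Log change, exact inputs:      ", round(100 * d_ln_kwh_exact, 2), "%\n")
cat("Implied percentage growth:     ", round(100 * growth_from_logs, 2), "%\n")
```

For growth rates of this size the three figures are close, which is why the simple linear calculation is adequate here.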

(d)

Durbin-Watson test

To examine the presence of first-order autocorrelation in the model, we assume the following structure for the error term:

\[ \begin{align} \varepsilon_t = \rho \varepsilon_{t-1} + \nu_t, \quad \nu_t \sim \text{i.i.d. } (0, \sigma^2) \end{align} \]

The hypotheses for the Durbin-Watson test are formulated as follows:

\[ \begin{align} H_0: \rho = 0 \\ H_1: \rho \neq 0 \end{align} \]

The Durbin-Watson (DW) statistic is calculated using the residuals \( \hat{\varepsilon}_t \) from the OLS estimation, according to the following formula:

\[ \begin{align} DW = \frac{\sum_{t=2}^{T} (\hat{\varepsilon}_t - \hat{\varepsilon}_{t-1})^2}{\sum_{t=1}^{T} \hat{\varepsilon}_t^2} \end{align} \]

resid_OLS <- resid(OLS)
dwstat <- sum(diff(resid_OLS)^2) / sum(resid_OLS^2)
cat("Durbin-Watson statistic:", round(dwstat, 5), "\n")
Durbin-Watson statistic: 0.95744 

In this case, the DW statistic was calculated as 0.95744. According to the Durbin-Watson critical value table, the lower bound at the 1% significance level for a sample size of n = 30 and 2 explanatory variables is approximately 1.07.

Since the DW statistic is below this lower bound, the null hypothesis of no autocorrelation is rejected. This result indicates strong evidence of positive first-order autocorrelation in the residuals.
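
To see why a low DW statistic signals positive autocorrelation, note that DW is approximately \( 2(1 - \hat{\rho}) \), where \( \hat{\rho} \) is the first-order sample autocorrelation of the residuals. The sketch below illustrates this on simulated data; the AR(1) coefficient of 0.7, the sample size, and the helper name `dw_stat` are assumptions made for the example and have nothing to do with the electricity data.

```r
set.seed(1)  # arbitrary seed for reproducibility

# Durbin-Watson statistic from a vector of residuals
dw_stat <- function(e) sum(diff(e)^2) / sum(e^2)

# Simulate a regression whose errors follow an AR(1) process with rho = 0.7
n   <- 200
x   <- rnorm(n)
eps <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))
y   <- 1 + 2 * x + eps

e <- resid(lm(y ~ x))

# First-order sample autocorrelation of the residuals
rho_hat <- sum(e[-1] * e[-n]) / sum(e^2)

cat("DW statistic:     ", round(dw_stat(e), 3), "\n")
cat("2 * (1 - rho_hat):", round(2 * (1 - rho_hat), 3), "\n")
```

Values of DW near 2 correspond to \( \hat{\rho} \) near 0, while values well below 2 correspond to positive \( \hat{\rho} \). Packaged tests such as `car::durbinWatsonTest()` could be used instead of the hand calculation.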

The plot of the OLS residuals over time, shown below, points in the same direction as the Durbin-Watson test: the residuals do not fluctuate randomly around zero, and their magnitude decreases as time progresses. Such a systematic pattern is consistent with autocorrelation in the error terms.

resid_OLS <- resid(OLS)
plot(resid_OLS, type = "l", col = "blue",
     main = "Residuals over Time", ylab = "Residuals", xlab = "Time")
abline(h = 0, col = "red", lty = 2)

Cochrane-Orcutt estimation

Since first-order autocorrelation is suspected in the OLS model, we now re-estimate the model using the Cochrane-Orcutt procedure. Assuming first-order autocorrelation, the parameter \( \rho \) is estimated using the following equation:

\[ \begin{align} \hat{\varepsilon}_t = \rho \hat{\varepsilon}_{t-1} + \nu_t \end{align} \]

where \( \nu_t \sim N(0, \sigma_\nu^2) \).

Based on the estimated value \( \hat{\rho} \), a generalized differencing transformation is applied. The newly estimated parameters are substituted back into the original model to compute updated residuals, from which a new \( \hat{\rho} \) is re-estimated. This procedure is repeated until convergence. The convergence threshold is set at 0.005.

We conduct the procedure as follows:

threshold <- 0.005
max_iter <- 100
iter <- 0
converged <- FALSE

# Estimate the original model and obtain residuals
model_ols <- lm(LNK1 ~ LNP1 + LNG1, data = eler)
resid <- resid(model_ols)

# Estimate the parameter rho using residuals
model_rho <- lm(resid[-1] ~ resid[-length(resid)])
rho_hat <- coef(model_rho)[2]

while (iter < max_iter) {
  iter <- iter + 1
  
  # Formulate the generalized differences of variables
  eler_co <- eler[-1, ]
  eler_co$LNK1_gd <- eler$LNK1[-1] - rho_hat * eler$LNK1[-nrow(eler)]
  eler_co$LNP1_gd <- eler$LNP1[-1] - rho_hat * eler$LNP1[-nrow(eler)]
  eler_co$LNG1_gd <- eler$LNG1[-1] - rho_hat * eler$LNG1[-nrow(eler)]

  # Regress the generalized difference model and estimate new parameters
  model_gd <- lm(LNK1_gd ~ LNP1_gd + LNG1_gd, data = eler_co)
  coefs <- coef(model_gd) 

  # Substitute the parameters into the original model and calculate new residuals 
  resid <- eler$LNK1 - (coefs[1] + coefs[2] * eler$LNP1 + coefs[3] * eler$LNG1)

  # Re-estimate the parameter rho using the new residuals
  model_rho <- lm(resid[-1] ~ resid[-length(resid)])
  rho_hat_new <- coef(model_rho)[2]

  # Check convergence and iterate
  if (abs(rho_hat_new - rho_hat) < threshold) {
    converged <- TRUE
    break
  } else {
    rho_hat <- rho_hat_new
  }
}

# Results
if (converged) {
  cat("Cochrane-Orcutt procedure converged in", iter, "iterations.\n")
  cat("Estimated rho:", round(rho_hat_new, 4), "\n")
} else {
  cat("Cochrane-Orcutt did not converge within", max_iter, "iterations.\n")
}
Cochrane-Orcutt procedure converged in 4 iterations.
Estimated rho: 0.7582 
# Regression summary
stargazer(model_gd, type = "text")

===============================================
                        Dependent variable:    
                    ---------------------------
                              LNK1_gd          
-----------------------------------------------
LNP1_gd                       -0.170*          
                              (0.085)          
                                               
LNG1_gd                      0.795***          
                              (0.128)          
                                               
Constant                      0.007**          
                              (0.003)          
                                               
-----------------------------------------------
Observations                    32             
R2                             0.778           
Adjusted R2                    0.762           
Residual Std. Error       0.017 (df = 29)      
F Statistic           50.715*** (df = 2; 29)   
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.01

As a result, \( \hat{\rho} \) converged to 0.7582.
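
Because the same iteration is needed again for the subsample in (e), the steps could be wrapped in a small helper. The following is a minimal sketch under stated assumptions, not the code used for the results in this report: the function name `cochrane_orcutt`, its arguments, and the simulated data are all hypothetical. It estimates \( \rho \) in the same way as the loop above, but computes the updated residuals with the intercept converted back to the original scale.

```r
set.seed(42)  # arbitrary seed

# Hypothetical helper: iterated Cochrane-Orcutt for y = b0 + X b + e,
# with e_t = rho * e_{t-1} + v_t. X holds the regressors without an intercept.
cochrane_orcutt <- function(y, X, tol = 0.005, max_iter = 100) {
  X <- as.matrix(X)
  n <- length(y)
  # rho is the slope from regressing the residuals on their first lag
  est_rho <- function(e) unname(coef(lm(e[-1] ~ e[-length(e)]))[2])
  rho <- est_rho(resid(lm(y ~ X)))
  for (iter in seq_len(max_iter)) {
    # Generalized differences of the dependent variable and the regressors
    y_gd <- y[-1] - rho * y[-n]
    X_gd <- X[-1, , drop = FALSE] - rho * X[-n, , drop = FALSE]
    b <- coef(lm(y_gd ~ X_gd))
    # Transformed intercept estimates b0 * (1 - rho); convert it back
    b_orig <- unname(c(b[1] / (1 - rho), b[-1]))
    # Residuals of the original (untransformed) model
    e <- as.numeric(y - cbind(1, X) %*% b_orig)
    rho_new <- est_rho(e)
    converged <- abs(rho_new - rho) < tol
    rho <- rho_new
    if (converged) break
  }
  list(rho = rho, coef = b_orig, iterations = iter, converged = converged)
}

# Simulated example with AR(1) errors (true rho = 0.6, slopes -0.3 and 0.8)
n  <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
e  <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))
y  <- 0.5 - 0.3 * x1 + 0.8 * x2 + e

res <- cochrane_orcutt(y, cbind(x1, x2))
print(round(res$rho, 3))
print(round(res$coef, 3))
```

With such a helper, the full-sample and subsample estimations would differ only in the data passed in, which reduces the risk of copy-and-paste errors between the two loops.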

The estimation results of the model using the Cochrane-Orcutt procedure are shown above. However, it should be noted that the intercept reported for the transformed model is an estimate of \( \beta_0 (1 - \rho) \), where \( \beta_0 \) denotes the intercept of the original model. Dividing the reported intercept by \( 1 - \hat{\rho} \) gives an implied intercept of approximately 0.029.

Using this value, the Durbin-Watson statistic was calculated for the generalized differencing model, yielding the following result.

dwstat_co <- sum(diff(resid(model_gd))^2)/sum(resid(model_gd)^2)
cat("Durbin-Watson statistic:", round(dwstat_co, 5), "\n")
Durbin-Watson statistic: 1.68657 

The new Durbin-Watson statistic is approximately 1.687. Since this exceeds the upper bound (1.60) of the 5% significance level for the Durbin-Watson distribution when n = 40 and the number of explanatory variables is 2 (under conservative assumptions), the null hypothesis is not rejected.

Therefore, there is no longer evidence of first-order autocorrelation, indicating that the Cochrane-Orcutt correction worked effectively.

Furthermore, the absence of autocorrelation is also evident from the residual plot over time. Unlike the residuals obtained from the OLS estimation, the residuals here appear to be evenly distributed over time, showing no systematic pattern as time progresses.

resid_co <- resid(model_gd)
plot(resid_co, type = "l", col = "blue",
     main = "Residuals over Time", ylab = "Residuals", xlab = "Time")
abline(h = 0, col = "red", lty = 2)

The estimation results from the Cochrane-Orcutt model and the original OLS model are summarized below. Because the intercept of the transformed model is an estimate of \( \beta_0 (1 - \rho) \), the constant term reported for the Cochrane-Orcutt model is recovered by dividing the estimated intercept by \( 1 - \hat{\rho} \). Its standard error is rescaled by the same factor, which treats \( \hat{\rho} \) as fixed and leaves the t-value unchanged.

ols_summary <- summary(model_ols)
ols_coef <- coef(ols_summary)

gd_summary <- summary(model_gd)
gd_coef <- coef(gd_summary)

beta0_co <- gd_coef[1, "Estimate"] / (1 - rho_hat_new)
beta1_co <- gd_coef[2, "Estimate"]
beta2_co <- gd_coef[3, "Estimate"]

se_beta0 <- gd_coef[1, "Std. Error"] / (1 - rho_hat_new)
se_beta1 <- gd_coef[2, "Std. Error"]
se_beta2 <- gd_coef[3, "Std. Error"]

t_beta0 <- beta0_co / se_beta0
t_beta1 <- beta1_co / se_beta1
t_beta2 <- beta2_co / se_beta2

compare_full <- rbind(
  data.frame(
    Term = c("Intercept", "LNP1", "LNG1"),
    OLS_Estimate = round(ols_coef[, "Estimate"], 4),
    OLS_t_value = round(ols_coef[, "t value"], 2),
    CO_Estimate = round(c(beta0_co, beta1_co, beta2_co), 4),
    CO_t_value = round(c(t_beta0, t_beta1, t_beta2), 2),
    row.names = NULL
  ),
  data.frame(
    Term = c("R-squared", "DW statistic"),
    OLS_Estimate = c(round(ols_summary$r.squared, 4), round(dwstat, 4)),
    OLS_t_value = c(NA, NA),
    CO_Estimate = c(round(gd_summary$r.squared, 4), round(dwstat_co, 4)),
    CO_t_value = c(NA, NA),
    row.names = NULL
  )
)
kbl(compare_full, 
    caption = "Comparison of OLS and Cochrane-Orcutt Estimates", 
    format = "html", 
    booktabs = TRUE, 
    row.names = FALSE) |> 
  kable_styling(full_width = FALSE)
Comparison of OLS and Cochrane-Orcutt Estimates
Term           OLS_Estimate   OLS_t_value   CO_Estimate   CO_t_value
Intercept            0.0447          6.53        0.0289         2.24
LNP1                -0.4350         -4.58       -0.1699        -1.99
LNG1                 0.4122          2.24        0.7949         6.20
R-squared            0.6397            NA        0.7777           NA
DW statistic         0.9574            NA        1.6866           NA

In the Cochrane-Orcutt model, the estimated coefficient for price is −0.1699 and that for income is 0.7949, statistically significant at the 10% and 1% levels, respectively. Both coefficients differ substantially from the OLS estimates: the price coefficient is less than half as large in absolute value, and the income coefficient is nearly twice as large.

The negative sign of the price coefficient is consistent with the usual inverse relationship between demand and price. Likewise, the positive sign of the income coefficient aligns with the intuitive expectation that electricity consumption increases with higher income.

On the other hand, the Durbin-Watson test suggests that the OLS model suffers from autocorrelation, making its parameter estimates inefficient and potentially yielding misleading t-statistics. In contrast, the Cochrane-Orcutt model shows no evidence of autocorrelation, and its estimation results are statistically more credible.

In conclusion, the estimation results from the Cochrane-Orcutt procedure are considered both more plausible and more credible than those from the original OLS model.

(e)

eler_sub <- eler |> 
  filter(YEAR >= 1951 & YEAR <= 1973)

Repeating (b)

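# Note: recomputing the lags within the subsample sets the 1951 growth rates
# to NA, so the regression below uses 22 observations.
eler_sub <- eler_sub |>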
  mutate(LNP1 = log(PELEC/lag(PELEC))) |>
  mutate(LNG1 = log(GNP/lag(GNP))) |>
  mutate(LNK1 = log(KWH/lag(KWH)))
OLS_sub <- lm(LNK1 ~ LNP1 + LNG1, data = eler_sub)
stargazer(OLS_sub, type = "text")

===============================================
                        Dependent variable:    
                    ---------------------------
                               LNK1            
-----------------------------------------------
LNP1                         -0.410**          
                              (0.162)          
                                               
LNG1                          0.450*           
                              (0.216)          
                                               
Constant                     0.049***          
                              (0.008)          
                                               
-----------------------------------------------
Observations                    22             
R2                             0.474           
Adjusted R2                    0.419           
Residual Std. Error       0.020 (df = 19)      
F Statistic            8.560*** (df = 2; 19)   
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.01

Based on these results, the parameters \( \gamma \), \( \alpha \), and \( \beta \) are statistically significant at the 1%, 5%, and 10% levels, respectively.

The estimated value of \( \alpha \) is –0.410, and that of \( \beta \) is 0.450. As in (b), this means that a 1 percentage point increase in the growth rate of price lowers the growth rate of electricity consumption by about 0.41 percentage points, while a 1 percentage point increase in the growth rate of income raises it by about 0.45 percentage points. Similarly, the constant term \( \gamma \), estimated at 0.049, represents the average growth in electricity consumption attributable to changes in equipment stock.

As in (b), the estimated parameters can be interpreted as short-run elasticities.

Repeating (c)

coef_vals_sub <- coef(OLS_sub)
gamma_sub <- coef_vals_sub["(Intercept)"]
alpha_sub <- coef_vals_sub["LNP1"]
beta_sub <- coef_vals_sub["LNG1"]

# Define changes in PELEC and GNP
dPELEC_sub <- -0.02
dGNP_sub <- 0.04

# Calculate percent change in KWH
dKWH_sub <- gamma_sub + alpha_sub * dPELEC_sub + beta_sub * dGNP_sub

# Show the result
cat("Forecasted growth rate for electricity demand:", round(dKWH_sub * 100, 2), "%\n")
Forecasted growth rate for electricity demand: 7.55 %

Repeating (d)

resid_OLS_sub <- resid(OLS_sub)
dwstat_sub <- sum(diff(resid(OLS_sub))^2)/sum(resid(OLS_sub)^2)
cat("Durbin-Watson statistic:", round(dwstat_sub, 5), "\n")
Durbin-Watson statistic: 1.05106 

The DW statistic is 1.05106. This is below the lower bound of approximately 1.07 used in (d), and on that basis the null hypothesis of no autocorrelation is rejected. Because that bound refers to n = 30 while the subsample contains only 22 observations, the comparison is approximate. The result nonetheless points to positive first-order autocorrelation in the residuals.

threshold <- 0.005
max_iter <- 100
iter <- 0
converged <- FALSE

# Estimate the original model and obtain residuals
model_ols_sub <- lm(LNK1 ~ LNP1 + LNG1, data = eler_sub)
resid_sub <- resid(model_ols_sub)

# Estimate the parameter rho using residuals
model_rho_sub <- lm(resid_sub[-1] ~ resid_sub[-length(resid_sub)])
rho_hat_sub <- coef(model_rho_sub)[2]

while (iter < max_iter) {
  iter <- iter + 1
  
  # Formulate the generalized differences of variables
  eler_co_sub <- eler_sub[-1, ]
  eler_co_sub$LNK1_gd <- eler_sub$LNK1[-1] - rho_hat_sub * eler_sub$LNK1[-nrow(eler_sub)]
  eler_co_sub$LNP1_gd <- eler_sub$LNP1[-1] - rho_hat_sub * eler_sub$LNP1[-nrow(eler_sub)]
  eler_co_sub$LNG1_gd <- eler_sub$LNG1[-1] - rho_hat_sub * eler_sub$LNG1[-nrow(eler_sub)]

  # Regress the generalized difference model and estimate new parameters
  model_gd_sub <- lm(LNK1_gd ~ LNP1_gd + LNG1_gd, data = eler_co_sub)
  coefs_sub <- coef(model_gd_sub) 

  # Substitute the parameters into the original model and calculate new residuals 
  resid_sub <- eler_sub$LNK1 - (coefs_sub[1] + coefs_sub[2] * eler_sub$LNP1 + coefs_sub[3] * eler_sub$LNG1)

  # Re-estimate the parameter rho using the new residuals
  model_rho_sub <- lm(resid_sub[-1] ~ resid_sub[-length(resid_sub)])
  rho_hat_new_sub <- coef(model_rho_sub)[2]

  # Check convergence and iterate
  if (abs(rho_hat_new_sub - rho_hat_sub) < threshold) {
    converged <- TRUE
    break
  } else {
    rho_hat_sub <- rho_hat_new_sub
  }
}

# Results
if (converged) {
  cat("Cochrane-Orcutt procedure converged in", iter, "iterations.\n")
  cat("Estimated rho:", round(rho_hat_new_sub, 4), "\n")
} else {
  cat("Cochrane-Orcutt did not converge within", max_iter, "iterations.\n")
}
Cochrane-Orcutt procedure converged in 3 iterations.
Estimated rho: 0.6651 

As a result, \( \hat{\rho} \) converged to 0.6651.

Using this value, the Durbin-Watson statistic was calculated for the generalized differencing model, yielding the following result.

dwstat_co_sub <- sum(diff(resid(model_gd_sub))^2)/sum(resid(model_gd_sub)^2)
cat("Durbin-Watson statistic:", round(dwstat_co_sub, 5), "\n")
Durbin-Watson statistic: 1.52769 

The new Durbin-Watson statistic is approximately 1.53. Since this exceeds the upper bound of the 1% significance level for the Durbin-Watson distribution, the null hypothesis is not rejected.

Therefore, there is no longer evidence of first-order autocorrelation, indicating that the Cochrane-Orcutt correction worked effectively.

The table below summarizes the results estimated using the Cochrane-Orcutt procedure based on the full sample through 1984 (Full Estimate) and the subsample through 1973 (Sub Estimate).

gd_summary_sub <- summary(model_gd_sub)
gd_coef_sub <- coef(gd_summary_sub)

beta0_co_sub <- gd_coef_sub[1, "Estimate"] / (1 - rho_hat_new_sub)
beta1_co_sub <- gd_coef_sub[2, "Estimate"]
beta2_co_sub <- gd_coef_sub[3, "Estimate"]

se_beta0_sub <- gd_coef_sub[1, "Std. Error"] / (1 - rho_hat_new_sub)
se_beta1_sub <- gd_coef_sub[2, "Std. Error"]
se_beta2_sub <- gd_coef_sub[3, "Std. Error"]

t_beta0_sub <- beta0_co_sub / se_beta0_sub
t_beta1_sub <- beta1_co_sub / se_beta1_sub
t_beta2_sub <- beta2_co_sub / se_beta2_sub

compare_full_sub <- rbind(
  data.frame(
    Term = c("Intercept", "LNP1", "LNG1"),
    Full_Estimate = round(c(beta0_co, beta1_co, beta2_co), 4),
    Full_t_value = round(c(t_beta0, t_beta1, t_beta2), 2),
    Sub_Estimate = round(c(beta0_co_sub, beta1_co_sub, beta2_co_sub), 4),
    Sub_t_value = round(c(t_beta0_sub, t_beta1_sub, t_beta2_sub), 2),
    row.names = NULL
  ),
  data.frame(
    Term = c("R-squared", "DW statistic"),
    Full_Estimate = c(round(gd_summary$r.squared, 4), round(dwstat_co, 4)),
    Full_t_value = c(NA, NA),
    Sub_Estimate = c(round(gd_summary_sub$r.squared, 4), round(dwstat_co_sub, 4)),
    Sub_t_value = c(NA, NA),
    row.names = NULL
  )
)
kbl(compare_full_sub, 
    caption = "Comparison of Cochrane-Orcutt Estimates of Full and Sub Sample", 
    format = "html", 
    booktabs = TRUE, 
    row.names = FALSE) |> 
  kable_styling(full_width = FALSE)
Comparison of Cochrane-Orcutt Estimates of Full and Sub Sample
Term           Full_Estimate   Full_t_value   Sub_Estimate   Sub_t_value
Intercept             0.0289           2.24         0.0392          3.26
LNP1                 -0.1699          -1.99        -0.2565         -1.68
LNG1                  0.7949           6.20         0.8003          4.71
R-squared             0.7777             NA         0.7551            NA
DW statistic          1.6866             NA         1.5277            NA

Based on the Cochrane-Orcutt corrected estimates using data through 1973, the predicted growth rate of electricity consumption can be calculated as follows:

\[ \begin{align} \Delta \ln(KWH) &= 0.0392 + (-0.2565) \cdot (-0.024) + 0.8003 \cdot 0.035 \\ &\approx 0.0733 \end{align} \]

That is, assuming that GNP and PELEC continue to grow at their 1951–1973 average annual rates of 3.5% and –2.4%, respectively, the forecasted annual growth rate of electricity demand would have been approximately 7.3%. This figure is very close to the NERC forecast of 7.5% for the 1974–1983 period and therefore I agree with Peck and Nelson.

If the forecasters had anticipated a GNP growth rate of 2.5% and an electricity price increase of 4.2%, they would likely have significantly revised their forecast downward. Based on the estimated parameters, the predicted electricity demand growth rate would be:

\[ \begin{align} \Delta \ln(KWH) &= 0.0392 + (-0.2565) \cdot 0.042 + 0.8003 \cdot 0.025 \\ &\approx 0.0484 \end{align} \]

That is, the predicted average growth rate would be approximately 4.8%, which is substantially lower than the original forecast of 7.3%. This downward revision is consistent with the economic relationships captured in the model: the rise in electricity prices contributes to a decrease in consumption, and slower income growth leads to a lower rate of increase in electricity demand.
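
The two scenario forecasts can be reproduced with a few lines of R. This is a sketch using the rounded subsample coefficients from the table above; the function name `forecast_growth` and the variable names are introduced only for the example.

```r
# Rounded Cochrane-Orcutt coefficients for the 1951-1973 subsample
b0  <- 0.0392   # intercept on the original scale
b_p <- -0.2565  # coefficient on price growth
b_g <- 0.8003   # coefficient on GNP growth

# Predicted log growth of electricity demand for given growth rates
forecast_growth <- function(d_price, d_gnp) b0 + b_p * d_price + b_g * d_gnp

# Scenario 1: 1951-1973 average growth rates continue
base_case <- forecast_growth(d_price = -0.024, d_gnp = 0.035)

# Scenario 2: slower GNP growth and rising electricity prices
revised_case <- forecast_growth(d_price = 0.042, d_gnp = 0.025)

cat("Baseline forecast:", round(100 * base_case, 2), "%\n")
cat("Revised forecast: ", round(100 * revised_case, 2), "%\n")
cat("Difference:       ", round(100 * (base_case - revised_case), 2),
    "percentage points\n")
```

Splitting the difference into its components shows that both the price reversal and the slower income growth contribute to the downward revision.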

(f)

The Fisher-Kaysen specification is consistent with economic theory in terms of the signs of the estimated parameters, and the estimates remain stable even when the sample period is extended. This suggests that the model is relatively consistent.

However, the assumption of a constant growth rate of equipment stock may diverge from actual economic behavior. In reality, the saturation of household appliances—such as televisions, refrigerators, and washing machines—means that as household penetration rates approach 100%, the growth in equipment stock slows down. Since it is unlikely that households will own multiple units of the same appliance, failing to incorporate this deceleration into the model may lead to an overestimation of electricity demand over time.

Indeed, as shown in (e), while the model predicted a 4.8% annual growth rate based on actual price and income growth, the realized electricity demand growth was only 2.3%. This suggests a tendency of the model to overpredict in the long run, likely due to its treatment of equipment stock growth as constant.

2 Supplemental Problems

Problem 1

(a)

(i)

We test the null hypothesis against the alternative hypothesis as follows:

\[ \begin{align} H_0: \beta_K = \beta_L \\ H_1: \beta_K \neq \beta_L \end{align} \]

By defining \( d = \hat{\beta_K} - \hat{\beta_L} \) as the estimate of \( \beta_K - \beta_L \), we can rewrite these hypotheses as follows:

\[ \begin{align} H_0: \beta_K - \beta_L =0 \\ H_1: \beta_K - \beta_L \neq 0 \end{align} \]

The t-statistic for the hypothesis is

\[ \begin{align} t_d &= \frac{d}{S_d} \end{align} \]

Since

\[ \begin{align} S_d &= \sqrt{Var(\hat{\beta_K} - \hat{\beta_L})} = \sqrt{Var(\hat{\beta_K}) + Var(\hat{\beta_L})-2Cov(\hat{\beta_K},\hat{\beta_L})} \\ &=\sqrt{0.258^2+0.219^2-2 \cdot 0.055} \\ &\approx 0.0673 \end{align} \]

we can calculate the t-statistic as follows:

\[ \begin{align} t_d &= \frac{d}{S_d} \\ &= \frac{0.633-0.453}{0.0673} \\ &\approx 2.675 \end{align} \]

The calculated t-statistic is 2.675. Assuming a sufficiently large sample size, the critical value at the 5% significance level is 1.960. Therefore, the null hypothesis is rejected, indicating that the elasticities of capital and labor are not statistically identical.
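
The arithmetic can be verified numerically. The sketch below uses the estimates, standard errors, and covariance quoted above, together with the standard result that the variance of a difference is the sum of the variances minus twice the covariance; the variable names are introduced for the example, and the p-value relies on the large-sample normal approximation.

```r
# Estimates, standard errors, and covariance as given in the problem
b_K    <- 0.633
b_L    <- 0.453
se_K   <- 0.258
se_L   <- 0.219
cov_KL <- 0.055

# Var(b_K - b_L) = Var(b_K) + Var(b_L) - 2 * Cov(b_K, b_L)
se_diff <- sqrt(se_K^2 + se_L^2 - 2 * cov_KL)
t_diff  <- (b_K - b_L) / se_diff

# Two-sided p-value under the large-sample normal approximation
p_value <- 2 * pnorm(-abs(t_diff))

cat("Standard error of the difference:", round(se_diff, 4), "\n")
cat("t-statistic:                     ", round(t_diff, 3), "\n")
cat("Approximate p-value:             ", round(p_value, 4), "\n")
```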

(ii)

We test the null hypothesis against the alternative hypothesis as follows:

\[ \begin{align} H_0: \beta_K + \beta_L = 1 \\ H_1: \beta_K + \beta_L \neq 1 \end{align} \]

By defining \( c = \hat{\beta_K} + \hat{\beta_L} \) as the estimate of \( \beta_K + \beta_L \), the t-statistic for the hypothesis is

\[ \begin{align} t_c &= \frac{c-1}{S_c} \end{align} \]

Since

\[ \begin{align} S_c &= \sqrt{Var(\hat{\beta_K} + \hat{\beta_L})} = \sqrt{Var(\hat{\beta_K}) + Var(\hat{\beta_L})+2Cov(\beta_K,\beta_L)} \\ &=\sqrt{0.258^2+0.219^2+2 \cdot 0.055} \\ &\approx 0.4738 \end{align} \]

we can calculate the t-statistic as follows:

\[ \begin{align} t_c &= \frac{c-1}{S_c} \\ &= \frac{0.633+0.453-1}{0.4738} \\ &\approx 0.182 \end{align} \]

The calculated t-statistic is 0.182 (that is, 0.086 / 0.4738). Assuming a sufficiently large sample size, this is far below both the 10% critical value of 1.645 and the 5% critical value of 1.960. Therefore, the null hypothesis is not rejected, and the estimates are consistent with constant returns to scale.

(b)

The number of observations in the sample affects the above conclusions, because the critical values come from the t distribution with \( n - k \) degrees of freedom rather than from the standard normal distribution. When the sample size is smaller, the critical value becomes larger. Therefore, even if a null hypothesis is rejected under the assumption of a sufficiently large sample size, it may not be rejected when the sample size is limited.
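
To illustrate, the two-sided 5% critical value of the t distribution can be tabulated for several degrees of freedom and compared with the statistic of 2.675 from (a)(i). The degrees of freedom listed are arbitrary choices for the example, since the actual sample size is not given here.

```r
# t-statistic from (a)(i)
t_d <- 2.675

# Illustrative degrees of freedom (n - k); Inf gives the normal critical value
df <- c(3, 4, 5, 10, 30, 100, Inf)

# Two-sided 5% critical values
crit <- qt(0.975, df)

result <- data.frame(df = df,
                     critical_value = round(crit, 3),
                     reject_H0 = abs(t_d) > crit)
print(result)
```

The critical value falls toward 1.960 as the degrees of freedom grow, so a rejection based on the large-sample value can be overturned only when very few degrees of freedom are available.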

Problem 2

(a)

To test the stated hypothesis, the null and alternative hypotheses are:

\[ \begin{align} H_0 &: \beta_4 = 0 \\ H_1 &: \beta_4 \neq 0 \end{align} \]

To perform the test, we use the following test statistic:

\[ t_{\hat{\beta}_4} = \frac{\hat{\beta_4}}{S_{\hat{\beta}_4}} \]

The degrees of freedom are n - k = 650 - 4 = 646.

Since the critical value is 1.960 at the 5% significance level for a two-sided test,

\[ \begin{align} \left\{ \begin{array}{ll} \text{if } \left| t_{\hat{\beta}_4} \right| > 1.960 &: \text{we reject } H_0, \text{ and the stated hypothesis is not supported}. \\ \text{otherwise} &:\text{we do not reject } H_0, \text{ and the stated hypothesis is not rejected}. \end{array} \right. \end{align} \]

(b)

To test the stated hypothesis, the null and alternative hypotheses are:

\[ \begin{align} H_0 &: \beta_4 = 0 \quad \text{and} \quad \beta_5 = 0 \\ H_1 &: \beta_4 \neq 0 \quad \text{or} \quad \beta_5 \neq 0 \end{align} \]

To conduct this test, we estimate both the restricted and unrestricted models.

The two models are:

\[ \begin{align} \text{Restricted model: } &\ln(WAGE) = \beta_1 + \beta_2ED + \beta_3 EX + \varepsilon \\ \text{Unrestricted model: } &\ln(WAGE) = \beta_1 + \beta_2 ED + \beta_3 EX + \beta_4FE + \beta_5 NONWH + \varepsilon \end{align} \]

Then, compute the F-statistic as:

\[ F = \frac{(RSS_r - RSS_u) / q}{RSS_u / (n - k)} \]

Where:

  • \( RSS_r \): residual sum of squares from the restricted model

  • \( RSS_u \): residual sum of squares from the unrestricted model

  • \( q = 2 \), \( n = 650 \), \( k = 5 \)

The degrees of freedom: numerator = 2, denominator = 645

Since the critical value is approximately 3.00 at the 5% significance level,

\[ \begin{align} \left\{ \begin{array}{ll} \text{if } F > 3.00 &: \text{we reject } H_0, \text{ and the stated hypothesis is not supported.} \\ \text{otherwise}&: \text{we do not reject } H_0, \text{ and the stated hypothesis is not rejected.} \end{array} \right. \end{align} \]
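
The F-test above can be sketched in R as follows. The data are simulated purely for illustration: the variable names mirror those in the text, but the coefficient values, distributions, and seed are assumptions and not part of the problem.

```r
set.seed(123)  # arbitrary seed

# Simulated stand-in data; names mirror the text, values are made up
n     <- 650
ed    <- round(rnorm(n, mean = 12, sd = 2))
ex    <- runif(n, 0, 30)
fe    <- rbinom(n, 1, 0.5)
nonwh <- rbinom(n, 1, 0.2)
ln_wage <- 1 + 0.08 * ed + 0.02 * ex - 0.20 * fe - 0.10 * nonwh +
  rnorm(n, sd = 0.4)

restricted   <- lm(ln_wage ~ ed + ex)
unrestricted <- lm(ln_wage ~ ed + ex + fe + nonwh)

# F-statistic from the residual sums of squares
rss_r <- sum(resid(restricted)^2)
rss_u <- sum(resid(unrestricted)^2)
q <- 2  # number of restrictions
k <- 5  # parameters in the unrestricted model
F_stat <- ((rss_r - rss_u) / q) / (rss_u / (n - k))

# 5% critical value of the F(2, 645) distribution
F_crit <- qf(0.95, q, n - k)

cat("F statistic:", round(F_stat, 3), " critical value:", round(F_crit, 3), "\n")

# The same test via the built-in model comparison
print(anova(restricted, unrestricted))
```

The hand-computed statistic should match the `F` column reported by `anova()`, which is a convenient check on the degrees of freedom.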

(c)

To test the stated hypothesis, we introduce an interaction term \( EX \cdot FE \) into the model:

\[ \ln(WAGE) = \beta_1 + \beta_2 ED + \beta_3 EX + \beta_4 FE + \beta_5 NONWH + \beta_6 (EX \cdot FE) + \varepsilon \]

The null and alternative hypotheses are:

\[ \begin{align} H_0: \beta_6 = 0 \\ H_1: \beta_6 \neq 0 \end{align} \]

To perform the test, we use:

\[ t_{\hat{\beta_6}} = \frac{\hat{\beta_6}}{S_{\hat{\beta_6}}} \]

Degrees of freedom: n - k = 650 - 6 = 644, since the model with the interaction term has six parameters.

Since the critical value is approximately 1.960 at the 5% significance level for a two-sided test,

\[ \begin{align} \left\{ \begin{array}{ll} \text{if } \left| t_{\hat{\beta}_6} \right| > 1.960 &: \text{we reject } H_0, \text{ and the stated hypothesis is not supported}. \\ \text{otherwise} &:\text{we do not reject } H_0, \text{ and the stated hypothesis is not rejected}. \end{array} \right. \end{align} \]
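
A minimal sketch of the interaction-term test is given below. The data are simulated for illustration only, and the coefficient values are assumptions. In R's formula syntax, `ex:fe` adds the product of `ex` and `fe` as a regressor; with six estimated coefficients, `lm()` reports 650 - 6 = 644 residual degrees of freedom.

```r
set.seed(7)  # arbitrary seed

# Simulated stand-in data; names mirror the text, values are made up
n     <- 650
ed    <- round(rnorm(n, mean = 12, sd = 2))
ex    <- runif(n, 0, 30)
fe    <- rbinom(n, 1, 0.5)
nonwh <- rbinom(n, 1, 0.2)
ln_wage <- 1 + 0.08 * ed + 0.02 * ex - 0.10 * fe - 0.05 * nonwh -
  0.01 * ex * fe + rnorm(n, sd = 0.4)

# Model with the experience-by-female interaction
fit <- lm(ln_wage ~ ed + ex + fe + nonwh + ex:fe)

# t-statistic on the interaction coefficient and the 5% two-sided critical value
t_b6 <- coef(summary(fit))["ex:fe", "t value"]
crit <- qt(0.975, df.residual(fit))

cat("Residual degrees of freedom:", df.residual(fit), "\n")
cat("t-statistic on ex:fe:       ", round(t_b6, 3), "\n")
cat("Critical value:             ", round(crit, 3), "\n")
```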

(d)

To test the stated hypothesis, we include interaction terms between \( FE \) and all other regressors:

\[ \begin{align} \ln(WAGE) &= \beta_1 + \beta_2 ED + \beta_3 EX + \beta_4 NONWH + \beta_5 FE \\ &\quad + \beta_6 (ED \cdot FE) + \beta_7 (EX \cdot FE) + \beta_8 (NONWH \cdot FE) + \varepsilon \end{align} \]

The null and alternative hypotheses are:

\[ \begin{align} H_0 &: \beta_6 = \beta_7 = \beta_8 = 0 \\ H_1 &: \text{otherwise} \end{align} \]

We compute the F-statistic using the following models:

The two models are:

\[ \begin{align} \text{Restricted model: } \ln(WAGE) &= \beta_1 + \beta_2 ED + \beta_3 EX + \beta_4 NONWH + \beta_5FE + \varepsilon \\ \text{Unrestricted model: } \ln(WAGE) &= \beta_1 + \beta_2 ED + \beta_3 EX + \beta_4 NONWH + \beta_5 FE \\ &\quad+ \beta_6 (ED \cdot FE) + \beta_7 (EX \cdot FE) + \beta_8 (NONWH \cdot FE) + \varepsilon \end{align} \]

Then,

\[ F = \frac{(RSS_r - RSS_u) / q}{RSS_u / (n - k)} \]

Where:

  • \( RSS_r \): residual sum of squares from the restricted model

  • \( RSS_u \): residual sum of squares from the unrestricted model

  • \( q = 3 \), \( n = 650 \), \( k = 8 \)

Degrees of freedom: numerator = 3, denominator = 642.

Since the critical value is approximately 2.60 at the 5% significance level,

\[ \begin{align} \left\{ \begin{array}{ll} \text{If } F > 2.60 &: \text{we reject } H_0, \text{ and the stated hypothesis is not supported.} \\ \text{Otherwise}&: \text{we do not reject } H_0, \text{ and the stated hypothesis is not rejected.} \end{array} \right. \end{align} \]
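
The joint test on the three interaction terms can be sketched in the same way. Again the data are simulated and every numerical value is an assumption made for the example. The formula `(ed + ex + nonwh) * fe` expands to the main effects plus all three interactions with `fe`, giving the eight-parameter unrestricted model.

```r
set.seed(99)  # arbitrary seed

# Simulated stand-in data; names mirror the text, values are made up
n     <- 650
ed    <- round(rnorm(n, mean = 12, sd = 2))
ex    <- runif(n, 0, 30)
fe    <- rbinom(n, 1, 0.5)
nonwh <- rbinom(n, 1, 0.2)
ln_wage <- 1 + 0.08 * ed + 0.02 * ex - 0.05 * nonwh - 0.15 * fe +
  rnorm(n, sd = 0.4)

# Restricted model: only the intercept differs between men and women
restricted <- lm(ln_wage ~ ed + ex + nonwh + fe)

# Unrestricted model: every slope may differ as well
unrestricted <- lm(ln_wage ~ (ed + ex + nonwh) * fe)

test   <- anova(restricted, unrestricted)
F_stat <- test$F[2]
F_crit <- qf(0.95, 3, df.residual(unrestricted))

print(test)
cat("F statistic:", round(F_stat, 3), " critical value:", round(F_crit, 3), "\n")
```

Because the simulated wages contain no interaction effects, the null hypothesis holds by construction in this example; with real data the outcome depends on the sample.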

Problem 3

Yes, it will work.

We begin by formulating a multiple regression model without an intercept as follows:

\[ y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i \]

The sum of squared residuals \( SSR(\beta) \) for this model is given by:

\[ SSR(\beta) = \sum_{i=1}^{N} \left( y_i - \tilde{\beta_1} x_{i1} - \tilde{\beta_2} x_{i2} - \cdots - \tilde{\beta_k} x_{ik} \right)^2 \]

The first-order conditions for minimizing \( SSR(\beta) \) with respect to each parameter are:

\[ \begin{align} \frac{\partial SSR(\beta)}{\partial \tilde{\beta_1}} &= -2 \sum_{i=1}^{N} \left( y_i - \tilde{\beta_1} x_{i1} - \tilde{\beta_2} x_{i2} - \cdots - \tilde{\beta_k} x_{ik} \right) x_{i1} = 0 \tag{1}\\ \frac{\partial SSR(\beta)}{\partial \tilde{\beta_2}} &= -2 \sum_{i=1}^{N} \left( y_i - \tilde{\beta_1} x_{i1} - \tilde{\beta_2} x_{i2} - \cdots - \tilde{\beta_k} x_{ik} \right) x_{i2} = 0 \tag{2}\\ &\vdots \\ \frac{\partial SSR(\beta)}{\partial \tilde{\beta_k}} &= -2 \sum_{i=1}^{N} \left( y_i - \tilde{\beta_1} x_{i1} - \tilde{\beta_2} x_{i2} - \cdots - \tilde{\beta_k} x_{ik} \right) x_{ik} = 0 \tag{3} \end{align} \]

These are the normal equations that must be satisfied for the least squares estimates \( \tilde{\beta_1}, \ldots, \tilde{\beta_k} \) .

On the other hand, a standard multiple regression model with an intercept is formulated as follows:

\[ y_i = a + b_1 x_{i1} + b_2 x_{i2} + \cdots + b_k x_{ik} + e_i \]

The sum of squared residuals SSR(b) is generally given by:

\[ SSR(b) = \sum_{i=1}^{N} \left( y_i - \hat{a} - \hat{b_1} x_{i1} - \hat{b_2} x_{i2} - \cdots - \hat{b_k} x_{ik} \right)^2 \]

However, in the present case, we consider a data structure where for each original data point, an additional point is added with the opposite sign for both the dependent and independent variables. As a result, the residual sum of squares is modified as follows:

\[ SSR(b) = \sum_{i=1}^{N} \left( y_i - \hat{a} - \hat{b_1} x_{i1} - \hat{b_2} x_{i2} - \cdots - \hat{b_k} x_{ik} \right)^2 + \sum_{j=1}^{M} \left( y_j - \hat{a} - \hat{b_1} x_{j1} - \hat{b_2} x_{j2} - \cdots - \hat{b_k} x_{jk} \right)^2 \]

Here, each added observation \( j \) mirrors the original observation \( i = j \), so that:

\[ \begin{align} y_j &= -y_i \\ x_{jl} &= -x_{il} \quad (l = 1, \ldots, k) \\ M &= N \end{align} \]

The first-order conditions for minimizing \( SSR(b) \) with respect to each parameter are:

\[ \begin{align} \frac{\partial SSR(b)}{\partial \hat{a}} = -2 \sum_{i=1}^{N} &\left( y_i - \hat{a} - \hat{b_1} x_{i1} - \hat{b_2} x_{i2} - \cdots - \hat{b_k} x_{ik} \right) \\ &-2 \sum_{j=1}^{M} \left( y_j - \hat{a} - \hat{b_1} x_{j1} - \hat{b_2} x_{j2} - \cdots - \hat{b_k} x_{jk} \right) = 0 \tag{4} \\ \frac{\partial SSR(b)}{\partial \hat{b_1}} = -2 \sum_{i=1}^{N} &\left( y_i - \hat{a} - \hat{b_1} x_{i1} - \hat{b_2} x_{i2} - \cdots - \hat{b_k} x_{ik} \right) x_{i1} \\ &-2 \sum_{j=1}^{M} \left( y_j - \hat{a} - \hat{b_1} x_{j1} - \hat{b_2} x_{j2} - \cdots - \hat{b_k} x_{jk} \right)x_{j1}= 0 \tag{5} \\ \frac{\partial SSR(b)}{\partial \hat{b_2}} = -2 \sum_{i=1}^{N} &\left( y_i - \hat{a} - \hat{b_1} x_{i1} - \hat{b_2} x_{i2} - \cdots - \hat{b_k} x_{ik} \right) x_{i2} \\ &-2 \sum_{j=1}^{M} \left( y_j - \hat{a} - \hat{b_1} x_{j1} - \hat{b_2} x_{j2} - \cdots - \hat{b_k} x_{jk} \right)x_{j2}= 0 \tag{6} \\ &\vdots \\ \frac{\partial SSR(b)}{\partial \hat{b_k}} = -2 \sum_{i=1}^{N} &\left( y_i - \hat{a} - \hat{b_1} x_{i1} - \hat{b_2} x_{i2} - \cdots - \hat{b_k} x_{ik} \right) x_{ik} \\ &-2 \sum_{j=1}^{M} \left( y_j - \hat{a} - \hat{b_1} x_{j1} - \hat{b_2} x_{j2} - \cdots - \hat{b_k} x_{jk} \right)x_{jk}= 0 \tag{7} \end{align} \]

Dividing both sides of Equation (4) by \( -2 \) and substituting the relationships \( y_j = -y_i \), \( x_{jl} = -x_{il} \) for \( l = 1, \ldots, k \), and \( M = N \), we obtain:

\[ \begin{align} &\sum_{i=1}^{N} \left( y_i - \hat{a} - \hat{b_1} x_{i1} - \hat{b_2} x_{i2} - \cdots - \hat{b_k} x_{ik} \right) + \sum_{i=1}^{N} \left( (-y_i) - \hat{a} - \hat{b_1} (-x_{i1}) - \hat{b_2} (-x_{i2}) - \cdots - \hat{b_k} (-x_{ik}) \right) \\ &= -2\sum_{i=1}^{N} \hat{a} = -2N\hat{a} = 0 \end{align} \]

Therefore, we obtain \( \hat{a} = 0 \), indicating that the estimated intercept of this model is zero.

Dividing Equation (5) by \( -2 \), applying the same substitutions, and using \( \hat{a} = 0 \), we can similarly rewrite it as follows.

\[ \begin{align} (5) \quad &\sum_{i=1}^{N} \left( y_i - \hat{a} - \hat{b_1} x_{i1} - \hat{b_2} x_{i2} - \cdots - \hat{b_k} x_{ik} \right)x_{i1} + \sum_{i=1}^{N} \left( (-y_i) - \hat{a} - \hat{b_1} (-x_{i1}) - \hat{b_2} (-x_{i2}) - \cdots - \hat{b_k} (-x_{ik}) \right)(-x_{i1}) \\ &=\sum_{i=1}^{N} \left( y_i - \hat{b_1} x_{i1} - \hat{b_2} x_{i2} - \cdots - \hat{b_k} x_{ik} \right)x_{i1} + \sum_{i=1}^{N} \left( y_i - \hat{b_1} x_{i1} - \hat{b_2} x_{i2} - \cdots - \hat{b_k} x_{ik} \right)x_{i1} \\ &= 2\sum_{i=1}^{N} \left( y_i - \hat{b_1} x_{i1} - \hat{b_2} x_{i2} - \cdots - \hat{b_k} x_{ik} \right)x_{i1} =0 \end{align} \]

Therefore, we obtain:

\[ \begin{align} \sum_{i=1}^{N} \left( y_i - \hat{b_1} x_{i1} - \hat{b_2} x_{i2} - \cdots - \hat{b_k} x_{ik} \right)x_{i1} =0 \end{align} \]

Applying the same transformation to the remaining conditions, Equations (6) through (7), the first-order conditions for the slope coefficients become:

\[ \begin{align} \sum_{i=1}^{N} &\left( y_i - \hat{b_1} x_{i1} - \hat{b_2} x_{i2} - \cdots - \hat{b_k} x_{ik} \right)x_{i1} =0 \\ \sum_{i=1}^{N} &\left( y_i - \hat{b_1} x_{i1} - \hat{b_2} x_{i2} - \cdots - \hat{b_k} x_{ik} \right)x_{i2} =0 \\ &\vdots \\ \sum_{i=1}^{N} &\left( y_i - \hat{b_1} x_{i1} - \hat{b_2} x_{i2} - \cdots - \hat{b_k} x_{ik} \right)x_{ik} =0 \end{align} \]

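These have exactly the same form as Equations (1) through (3). Provided the regressors are linearly independent, so that the normal equations have a unique solution, it follows that \( \tilde{\beta_1} = \hat{b_1}, \ldots, \tilde{\beta_k} = \hat{b_k} \).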
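The result can also be checked numerically. The following is a minimal sketch on simulated data; the variable names, sample size, and coefficient values are illustrative assumptions and are not part of the problem.

```r
set.seed(1)

# Simulated original data (hypothetical values, with non-zero means on purpose)
n  <- 50
x1 <- rnorm(n, mean = 2)
x2 <- rnorm(n, mean = -1)
y  <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(n)
orig <- data.frame(y = y, x1 = x1, x2 = x2)

# Mirrored data: every variable of every observation with the opposite sign
mirror <- data.frame(y = -y, x1 = -x1, x2 = -x2)
augmented <- rbind(orig, mirror)

# Regression without an intercept on the original data
fit_no_int <- lm(y ~ 0 + x1 + x2, data = orig)

# Regression with an intercept on the augmented data
fit_mirror <- lm(y ~ x1 + x2, data = augmented)

# The slopes should agree, and the fitted intercept should be numerically zero
print(coef(fit_no_int))
print(coef(fit_mirror))
slopes_match <- isTRUE(all.equal(unname(coef(fit_no_int)),
                                 unname(coef(fit_mirror)[-1])))
print(slopes_match)
```

Only the point estimates are compared here. The augmented sample has \( 2N \) rows, so the reported standard errors of the two fits are not expected to coincide.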

Problem 4

1. Omitted Variable

Let the true model be:

\[ U_{in} = \alpha_i + \beta X_{in} + \gamma Z_{in} + \varepsilon_{in} \]

where \( \varepsilon_{in} \) is i.i.d. with \( \varepsilon_{in} \sim N(0, \sigma^2) \).

The observable regression model is:

\[ U_{in} = \alpha_i + \beta X_{in} + \tilde{\varepsilon}_{in} \]

so the composite error term is:

\[ \tilde{\varepsilon}_{in} = \gamma Z_{in} + \varepsilon_{in} \]

For \( \tilde{\varepsilon}_{in} \) to be i.i.d., the following two conditions must hold:

\[ \begin{align} \text{Homoskedasticity: } &Var(\tilde{\varepsilon}_{in}) = \sigma_{\tilde{\varepsilon}}^2 \quad \text{for all } i, n \\ \text{Independence: } &Cov(\tilde{\varepsilon}_{in}, \tilde{\varepsilon}_{jm}) = 0 \quad \text{for all } i \neq j \text{ or } n \neq m \end{align} \]

For homoskedasticity, we compute:

\[ Var(\tilde{\varepsilon}_{in}) = Var(\gamma Z_{in} + \varepsilon_{in}) = \gamma^2 Var(Z_{in}) + Var(\varepsilon_{in}) + 2\gamma \cdot Cov(Z_{in}, \varepsilon_{in}) \]

Here, we assume \( Var(\varepsilon_{in}) = \sigma^2 \) and \( Cov(Z_{in}, \varepsilon_{in}) = 0 \), so that \( Var(\tilde{\varepsilon}_{in}) = \gamma^2 Var(Z_{in}) + \sigma^2 \). Therefore, if \( Var(Z_{in}) \) is constant across all \( i \) and \( n \), then \( \tilde{\varepsilon}_{in} \) is also homoskedastic.

However, it is not reasonable to assume that \( Z_{in} \) is homoskedastic. It is more plausible that the variance of the omitted variable differs across individuals and alternatives.

Hence, it is not reasonable to assume that \( \tilde{\varepsilon}_{in} \) is homoskedastic.
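The variance formula above can be illustrated with a small simulation. This is only a sketch under assumed values: the coefficient `gamma`, the two alternatives "A" and "B", and their standard deviations for \( Z_{in} \) are hypothetical choices made to show the mechanism.

```r
set.seed(42)

n     <- 20000   # draws per alternative (hypothetical)
gamma <- 1.5     # hypothetical coefficient on the omitted variable
sigma <- 1       # standard deviation of the i.i.d. error

# Assumption: Z is more dispersed for alternative "B" than for alternative "A"
sd_z_by_alt <- c(A = 0.5, B = 2)
alt <- rep(c("A", "B"), each = n)

z         <- rnorm(2 * n, mean = 0, sd = sd_z_by_alt[alt])
eps       <- rnorm(2 * n, mean = 0, sd = sigma)
eps_tilde <- gamma * z + eps   # composite error

# Sample variance of the composite error within each alternative
var_by_alt <- tapply(eps_tilde, alt, var)

# Theoretical value: gamma^2 * Var(Z) + sigma^2
var_theory <- gamma^2 * sd_z_by_alt^2 + sigma^2

print(rbind(simulated = var_by_alt, theoretical = var_theory))
```

Although \( \varepsilon_{in} \) has the same variance everywhere, the composite error inherits the unequal variance of \( Z_{in} \).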

For independence, we compute:

\[ Cov(\tilde{\varepsilon}_{in}, \tilde{\varepsilon}_{jm}) = \gamma^2 Cov(Z_{in}, Z_{jm}) + \gamma Cov(Z_{in}, \varepsilon_{jm}) + \gamma Cov(Z_{jm}, \varepsilon_{in}) + Cov(\varepsilon_{in}, \varepsilon_{jm}) \]

Assuming \( Cov(Z_{in}, \varepsilon_{jm}) = Cov(Z_{jm}, \varepsilon_{in}) = Cov(\varepsilon_{in}, \varepsilon_{jm}) = 0 \), we have:

\[ Cov(\tilde{\varepsilon}_{in}, \tilde{\varepsilon}_{jm}) = \gamma^2 Cov(Z_{in}, Z_{jm}) \]

Thus, if \( Cov(Z_{in}, Z_{jm}) = 0 \), then \( \tilde{\varepsilon}_{in} \) and \( \tilde{\varepsilon}_{jm} \) are uncorrelated.

However, it is not reasonable to assume that \( Z_{in} \) and \( Z_{jm} \) are uncorrelated. For example, if the alternatives are car, bus, and train, then an individual who derives high utility from the bus is also likely to derive high utility from the train (i.e., prefer public transportation over driving). This suggests that the unobserved components of utility for different alternatives may be positively correlated.

Therefore, it is not reasonable to assume that \( \tilde{\varepsilon}_{in} \) is independent across alternatives and individuals.
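The covariance result can be illustrated in the same way. The sketch below assumes two alternatives (bus and train) whose omitted variables have unit variance and a hypothetical correlation `rho`; none of these values come from the problem.

```r
set.seed(7)

n     <- 20000   # number of individuals (hypothetical)
gamma <- 1.5     # hypothetical coefficient on the omitted variable
rho   <- 0.6     # assumed correlation between Z for bus and Z for train

# Omitted variable for the two alternatives, unit variance, correlation rho
z_bus   <- rnorm(n)
z_train <- rho * z_bus + sqrt(1 - rho^2) * rnorm(n)

# Original errors are independent across alternatives
eps_bus   <- rnorm(n)
eps_train <- rnorm(n)

# Composite errors when Z is omitted from the model
et_bus   <- gamma * z_bus + eps_bus
et_train <- gamma * z_train + eps_train

cov_composite <- cov(et_bus, et_train)    # should be near gamma^2 * rho
cov_original  <- cov(eps_bus, eps_train)  # should be near zero
cov_theory    <- gamma^2 * rho

print(c(composite = cov_composite,
        theoretical = cov_theory,
        original = cov_original))
```

The original errors remain uncorrelated, but the composite errors are correlated because they share the omitted component.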

Putting this together, for \( \tilde{\varepsilon}_{in} = \gamma Z_{in} + \varepsilon_{in} \) to be i.i.d., we would need:

  • \( Z_{in} \) to have constant variance (homoskedasticity), and
  • \( Z_{in} \) and \( Z_{jm} \) to be uncorrelated (independence)

However, both assumptions are unlikely to hold in practice. Therefore, it is not reasonable to assume that \( \tilde{\varepsilon}_{in} \) is i.i.d.