Describe, as formally as possible but in a single paragraph each, every key term in (Wooldridge, 2016, p. 517).
Source: Glossary of Wooldridge, J. M. (2016). Introductory Econometrics: A Modern Approach (6th ed.). Boston, MA: Cengage Learning.
Use SMOKE for this exercise.
library(wooldridge)
## Warning: package 'wooldridge' was built under R version 3.6.3
data('smoke')
A model to estimate the effects of smoking on annual income (perhaps through lost work days due to illness, or productivity effects) is
\(log(income)=\beta_0+\beta_1cigs+\beta_2educ+\beta_3age+\beta_4age^2+u_1,\)
where \(cigs\) is number of cigarettes smoked per day, on average. How do you interpret \(\beta_1\)?
m1_c1<-gmm::gmm(lincome~cigs+educ+age+agesq,~cigs+educ+age+agesq,data=smoke,type="iterative")
summary(m1_c1)
##
## Call:
## gmm::gmm(g = lincome ~ cigs + educ + age + agesq, x = ~cigs +
## educ + age + agesq, type = "iterative", data = smoke)
##
##
## Method: iterative
##
## Kernel: Quadratic Spectral
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.7954e+00 2.1817e-01 3.5731e+01 1.2880e-279
## cigs 1.7306e-03 1.5142e-03 1.1429e+00 2.5309e-01
## educ 6.0361e-02 7.2090e-03 8.3729e+00 5.6202e-17
## age 5.7691e-02 9.8861e-03 5.8356e+00 5.3605e-09
## agesq -6.3059e-04 1.0559e-04 -5.9721e+00 2.3426e-09
##
## J-Test: degrees of freedom is 0
## J-test P-value
## Test E(g)=0: 1.80437408079986e-20 *******
\(\beta_1\) is the approximate proportional change in income for each additional cigarette smoked per day (the dependent variable is in logs), so \(100\beta_1\) is the percentage change in income per cigarette. In this specification the variable is not statistically significant.
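Since \(100\beta_1\) is the approximate percentage effect, a quick way to read it off the GMM fit above is the following one-liner (a minimal check using the m1_c1 object already created; output not reproduced here):
100 * coef(m1_c1)["cigs"]  # approximate % change in income per extra cigarette per day, about 0.17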
To reflect the fact that cigarette consumption might be jointly determined with income, a demand for cigarettes equation is
\(cigs=\gamma_0+\gamma_1log(income)+\gamma_2educ+\gamma_3age+\gamma_4age^2+\gamma_5log(cigpric)+\gamma_6restaurn+u_2,\)
where \(cigpric\) is the price of a pack of cigarettes (in cents) and \(restaurn\) is a binary variable equal to unity if the person lives in a state with restaurant smoking restrictions. Assuming these are exogenous to the individual, what signs would you expect for \(\gamma_5\) and \(\gamma_6\)?
m2_c1<-gmm::gmm(cigs~lincome+educ+age+agesq+lcigpric+restaurn,~lincome+educ+age+agesq+lcigpric+restaurn, data=smoke,type="iterative")
summary(m2_c1)
##
## Call:
## gmm::gmm(g = cigs ~ lincome + educ + age + agesq + lcigpric +
## restaurn, x = ~lincome + educ + age + agesq + lcigpric +
## restaurn, type = "iterative", data = smoke)
##
##
## Method: iterative
##
## Kernel: Quadratic Spectral
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.6398e+00 2.4065e+01 -1.5125e-01 8.7978e-01
## lincome 8.8027e-01 6.4169e-01 1.3718e+00 1.7012e-01
## educ -5.0150e-01 1.6759e-01 -2.9924e+00 2.7681e-03
## age 7.7069e-01 1.3744e-01 5.6076e+00 2.0517e-08
## agesq -9.0228e-03 1.4616e-03 -6.1732e+00 6.6914e-10
## lcigpric -7.5086e-01 5.6459e+00 -1.3299e-01 8.9420e-01
## restaurn -2.8251e+00 1.0001e+00 -2.8249e+00 4.7295e-03
##
## J-Test: degrees of freedom is 0
## J-test P-value
## Test E(g)=0: 6.53942503758163e-20 *******
Both coefficients are expected to be negative: a higher price should reduce the demand for cigarettes, and so should restaurant smoking restrictions. Indeed, both estimates are negative, but the cigarette price is not significant.
Under what assumption is the income equation from part (i) identified?
Identification requires that at least one of the instrumental variables (the log of the cigarette price and the restaurant smoking restriction) be partially correlated with \(cigs\) — that is, statistically significant in the reduced form for \(cigs\) — while remaining excluded from the income equation.
Estimate the income equation by \(OLS\) and discuss the estimate of \(\beta_1\).
m3_c1<-lm(lincome~cigs+educ+age+agesq,data=smoke)
summary(m3_c1)
##
## Call:
## lm(formula = lincome ~ cigs + educ + age + agesq, data = smoke)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.6237 -0.2978 0.1314 0.4167 1.3542
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.795e+00 1.704e-01 45.741 < 2e-16 ***
## cigs 1.731e-03 1.714e-03 1.010 0.313
## educ 6.036e-02 7.898e-03 7.642 6.10e-14 ***
## age 5.769e-02 7.644e-03 7.548 1.21e-13 ***
## agesq -6.306e-04 8.338e-05 -7.563 1.08e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6529 on 802 degrees of freedom
## Multiple R-squared: 0.165, Adjusted R-squared: 0.1608
## F-statistic: 39.61 on 4 and 802 DF, p-value: < 2.2e-16
As already shown with GMM, \(\beta_1\) is not significant.
Estimate the reduced form for \(cigs\). (Recall that this entails regressing \(cigs\) on all exogenous variables.) Are \(log(cigpric)\) and \(restaurn\) significant in the reduced form?
m4_c1<-gmm::gmm(cigs~educ+age+agesq+lcigpric+restaurn,~educ+age+agesq+lcigpric+restaurn, data=smoke,type="iterative")
summary(m4_c1)
##
## Call:
## gmm::gmm(g = cigs ~ educ + age + agesq + lcigpric + restaurn,
## x = ~educ + age + agesq + lcigpric + restaurn, type = "iterative",
## data = smoke)
##
##
## Method: iterative
##
## Kernel: Quadratic Spectral
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.5801e+00 2.3734e+01 6.6576e-02 9.4692e-01
## educ -4.5015e-01 1.6025e-01 -2.8090e+00 4.9700e-03
## age 8.2254e-01 1.3427e-01 6.1262e+00 8.9980e-10
## agesq -9.5903e-03 1.4362e-03 -6.6777e+00 2.4276e-11
## lcigpric -3.5132e-01 5.6545e+00 -6.2130e-02 9.5046e-01
## restaurn -2.7364e+00 9.9204e-01 -2.7583e+00 5.8097e-03
##
## J-Test: degrees of freedom is 0
## J-test P-value
## Test E(g)=0: 2.99814665856364e-20 *******
In the reduced form, lcigpric is not significant but restaurn is.
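As a complementary check of instrument relevance, the reduced form can be refit by OLS and the two excluded instruments tested jointly with the same car::linearHypothesis call used later in this document (a sketch; rf_c1 is an auxiliary object introduced here, output not reproduced):
rf_c1 <- lm(cigs ~ educ + age + agesq + lcigpric + restaurn, data = smoke)
car::linearHypothesis(rf_c1, c("lcigpric = 0", "restaurn = 0"))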
Now, estimate the income equation by \(2SLS\). Discuss how the estimate of \(\beta_1\) compares with the \(OLS\) estimate.
m5_c1<-AER::ivreg(lincome~cigs+educ+age+agesq|.-cigs+lcigpric+restaurn,data=smoke)
summary(m5_c1)
##
## Call:
## AER::ivreg(formula = lincome ~ cigs + educ + age + agesq | . -
## cigs + lcigpric + restaurn, data = smoke)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.13055 -0.44952 -0.05524 0.52926 3.09278
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.7808934 0.2298673 33.850 < 2e-16 ***
## cigs -0.0421257 0.0262184 -1.607 0.108509
## educ 0.0396746 0.0162811 2.437 0.015032 *
## age 0.0938182 0.0238534 3.933 9.11e-05 ***
## agesq -0.0010508 0.0002743 -3.831 0.000138 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.88 on 802 degrees of freedom
## Multiple R-Squared: -0.5169, Adjusted R-squared: -0.5245
## Wald test: 22.31 on 4 and 802 DF, p-value: < 2.2e-16
Its significance improves somewhat relative to \(OLS\) — the point estimate turns negative and much larger in magnitude — but it remains insignificant even at the 10% level.
Do you think that cigarette prices and restaurant smoking restrictions are exogenous in the income equation?
cor(cbind(m1_c1$residuals,smoke$lincome),m4_c1$model)
## cigs educ age agesq lcigpric
## [1,] 1.487333e-15 1.366530e-14 3.085611e-15 2.783702e-15 0.06801587
## [2,] 6.599159e-02 3.137384e-01 -4.166856e-02 -9.959236e-02 0.07684559
## restaurn
## [1,] 0.07806637
## [2,] 0.08740495
Although they are not correlated with the error term of the OLS model, their correlation with the income level is also very low, so they would not be strong instruments. As for whether they are exogenous, consider what happens to the price of something that is restricted: it rises. If the price rises, one would expect consumption to be concentrated among higher incomes; however, since cigarettes are addictive, even the poorest would make sacrifices to consume them, so there does not seem to be any evidence that these variables help explain income directly, which is consistent with treating them as exogenous in the income equation.
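AER also reports built-in 2SLS diagnostics (weak-instruments F test, Wu-Hausman endogeneity test, and the Sargan overidentification test), which speak directly to this question; a sketch using the 2SLS fit above, output not reproduced here:
summary(m5_c1, diagnostics = TRUE)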
Use MROZ for this exercise.
data('mroz')
Reestimate the labor supply function in Example 16.5, using \(log(hours)\) as the dependent variable. Compare the estimated elasticity (which is now constant) to the estimate obtained from equation (16.24) at the average hours worked.
m1_c2<-AER::ivreg(log(hours)~lwage+educ+age+kidslt6+nwifeinc|.-lwage+exper+expersq,data=mroz) # it is not necessary to write out all the exogenous variables: the (-) drops the variable estimated in the first stage and the (+) adds the instruments z
summary(m1_c2)
##
## Call:
## AER::ivreg(formula = log(hours) ~ lwage + educ + age + kidslt6 +
## nwifeinc | . - lwage + exper + expersq, data = mroz)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.01985 -0.68471 0.02881 0.73494 7.31966
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.370233 0.689011 12.148 < 2e-16 ***
## lwage 1.994349 0.564309 3.534 0.000454 ***
## educ -0.235461 0.070872 -3.322 0.000970 ***
## age -0.013525 0.011246 -1.203 0.229795
## kidslt6 -0.465439 0.219367 -2.122 0.034442 *
## nwifeinc -0.013904 0.007932 -1.753 0.080350 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.624 on 422 degrees of freedom
## Multiple R-Squared: -1.776, Adjusted R-squared: -1.809
## Wald test: 4.811 on 5 and 422 DF, p-value: 0.0002738
The elasticity obtained from this model (about 1.99) is larger than the one implied by equation (16.24) at the average hours worked.
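One way to make the comparison explicit is to re-estimate the level specification by 2SLS and convert its lwage slope into an elasticity at average hours worked (a sketch assuming the Example 16.5 specification; m1_c2_lvl is an auxiliary object, output not reproduced here):
m1_c2_lvl <- AER::ivreg(hours ~ lwage + educ + age + kidslt6 + nwifeinc |
                          . - lwage + exper + expersq, data = mroz)
coef(m1_c2_lvl)["lwage"] / mean(mroz$hours[mroz$hours > 0])  # elasticity at average hours of working women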
In the labor supply equation from part (i), allow \(educ\) to be endogenous because of omitted ability. Use \(motheduc\) and \(fatheduc\) as IVs for \(educ\). Remember, you now have two endogenous variables in the equation.
m2_c2<-AER::ivreg(log(hours)~lwage+educ+age+kidslt6+nwifeinc|.-lwage+exper+expersq-educ+motheduc+fatheduc,data=mroz)
summary(m2_c2)
##
## Call:
## AER::ivreg(formula = log(hours) ~ lwage + educ + age + kidslt6 +
## nwifeinc | . - lwage + exper + expersq - educ + motheduc +
## fatheduc, data = mroz)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.60141 -0.62622 0.03879 0.68524 6.83474
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.260764 1.019392 7.123 4.6e-12 ***
## lwage 1.810915 0.497769 3.638 0.000309 ***
## educ -0.128606 0.087433 -1.471 0.142063
## age -0.011601 0.010586 -1.096 0.273729
## kidslt6 -0.543186 0.211334 -2.570 0.010504 *
## nwifeinc -0.018906 0.008783 -2.152 0.031930 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.536 on 422 degrees of freedom
## Multiple R-Squared: -1.482, Adjusted R-squared: -1.511
## Wald test: 5.183 on 5 and 422 DF, p-value: 0.0001253
Test the overidentifying restrictions in the \(2SLS\) estimation from part (ii). Do the IVs pass the test?
First we add the residuals from model (ii) to the data set and regress them on the exogenous variables:
mroz2<-m2_c2$model
mroz2$u1<-m2_c2$residuals
test<-lm(u1~age+kidslt6+nwifeinc+exper+expersq+motheduc+fatheduc, data=mroz2)
summary(test)
##
## Call:
## lm(formula = u1 ~ age + kidslt6 + nwifeinc + exper + expersq +
## motheduc + fatheduc, data = mroz2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.5879 -0.6505 0.0549 0.6906 6.8446
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0041712 0.6062841 0.007 0.995
## age -0.0005731 0.0122421 -0.047 0.963
## kidslt6 0.0031535 0.2027493 0.016 0.988
## nwifeinc 0.0005676 0.0073721 0.077 0.939
## exper -0.0043184 0.0304801 -0.142 0.887
## expersq 0.0001953 0.0009253 0.211 0.833
## motheduc 0.0144205 0.0276602 0.521 0.602
## fatheduc -0.0131371 0.0256754 -0.512 0.609
##
## Residual standard error: 1.538 on 420 degrees of freedom
## Multiple R-squared: 0.001043, Adjusted R-squared: -0.01561
## F-statistic: 0.06264 on 7 and 420 DF, p-value: 0.9996
Next we test the null hypothesis that the estimated coefficients are all zero. To do so we compute the J statistic as \(J=mF\), where m is the number of parameters in the residual regression. Under the null, this J statistic behaves as a chi-squared with \(m-k\) degrees of freedom (k being the number of endogenous variables).
Ft<-summary(test)$fstatistic[[1]]
J<-8*Ft
pchisq(J, df=(8-2), lower.tail=FALSE)
## [1] 0.9978247
We fail to reject the null hypothesis that the coefficients are zero; that is, all the instruments appear to be exogenous.
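As a cross-check, the standard \(nR^2\) (Sargan) form of the test from Section 15-5 can be computed as follows (a sketch, assuming q = 4 excluded instruments minus 2 endogenous regressors = 2 degrees of freedom; output not reproduced here):
LM_c2 <- nrow(test$model) * summary(test)$r.squared
pchisq(LM_c2, df = 2, lower.tail = FALSE)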
Use the data in OPENNESS for this exercise.
data('openness')
Because \(log(pcinc)\) is insignificant in both (16.22) and the reduced form for \(open\), drop it from the analysis. Estimate (16.22) by \(OLS\) and \(IV\) without \(log(pcinc)\). Do any important conclusions change?
# OLS
m1_c3<-lm(inf~open,data=openness)
summary(m1_c3)
##
## Call:
## lm(formula = inf ~ open, data = openness)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.161 -8.867 -5.836 0.426 186.453
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 25.23421 4.10212 6.152 1.22e-08 ***
## open -0.21495 0.09328 -2.304 0.023 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 23.55 on 112 degrees of freedom
## Multiple R-squared: 0.04527, Adjusted R-squared: 0.03675
## F-statistic: 5.311 on 1 and 112 DF, p-value: 0.02304
# IV Estimators
m2_c3<-AER::ivreg(inf~open|.-open+lland,data=openness)
summary(m2_c3)
##
## Call:
## AER::ivreg(formula = inf ~ open | . - open + lland, data = openness)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.144 -9.918 -6.125 2.571 184.816
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 29.6066 5.6583 5.232 7.88e-07 ***
## open -0.3329 0.1403 -2.372 0.0194 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 23.72 on 112 degrees of freedom
## Multiple R-Squared: 0.03165, Adjusted R-squared: 0.023
## Wald test: 5.625 on 1 and 112 DF, p-value: 0.01941
The estimated coefficients are larger in magnitude when the instrument is used, but the \(R^2\) is lower and the standard error slightly higher. Compared with the results that include \(log(pcinc)\) on pages 509-510, the coefficient on open is practically the same.
Still leaving \(log(pcinc)\) out of the analysis, is \(land\) or \(log(land)\) a better instrument for \(open\)? (Hint: Regress \(open\) on each of these separately and jointly.)
m3_c3<-lm(open~land+lland,data=openness)
print("modelo con ambos instrumentos")
## [1] "modelo con ambos instrumentos"
summary(m3_c3)
##
## Call:
## lm(formula = open ~ land + lland, data = openness)
##
## Residuals:
## Min 1Q Median 3Q Max
## -33.174 -8.815 -2.580 5.295 79.877
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.292e+02 1.047e+01 12.341 < 2e-16 ***
## land 4.334e-06 3.136e-06 1.382 0.17
## lland -8.398e+00 9.755e-01 -8.609 5.55e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.66 on 111 degrees of freedom
## Multiple R-squared: 0.4573, Adjusted R-squared: 0.4476
## F-statistic: 46.77 on 2 and 111 DF, p-value: 1.846e-15
m4_c3<-lm(open~land,data=openness)
print("modelo solo con land")
## [1] "modelo solo con land"
summary(m4_c3)
##
## Call:
## lm(formula = open ~ land, data = openness)
##
## Residuals:
## Min 1Q Median 3Q Max
## -30.220 -14.393 -5.277 8.796 123.353
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.045e+01 2.342e+00 17.271 < 2e-16 ***
## land -1.128e-05 3.289e-06 -3.429 0.000848 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.7 on 112 degrees of freedom
## Multiple R-squared: 0.09502, Adjusted R-squared: 0.08694
## F-statistic: 11.76 on 1 and 112 DF, p-value: 0.000848
m5_c3<-lm(open~lland,data=openness)
print("modelo solo con log(land)")
## [1] "modelo solo con log(land)"
summary(m5_c3)
##
## Call:
## lm(formula = open ~ lland, data = openness)
##
## Residuals:
## Min 1Q Median 3Q Max
## -32.945 -8.848 -3.025 6.285 83.051
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 121.838 9.044 13.472 < 2e-16 ***
## lland -7.618 0.799 -9.534 3.93e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.73 on 112 degrees of freedom
## Multiple R-squared: 0.448, Adjusted R-squared: 0.4431
## F-statistic: 90.9 on 1 and 112 DF, p-value: 3.932e-16
When both are used, only \(log(land)\) is significant, whereas each is significant when evaluated separately. In this case \(log(land)\) is the better instrument because the \(R^2\) is higher, i.e., it is more strongly correlated with open.
Now, return to (16.22). Add the dummy variable \(oil\) to the equation and treat it as exogenous. Estimate the equation by \(IV\). Does being an oil producer have a ceteris paribus effect on inflation?
m6_c3<-AER::ivreg(inf~open+lpcinc+oil|.-open+lland,data=openness)
summary(m6_c3)
##
## Call:
## AER::ivreg(formula = inf ~ open + lpcinc + oil | . - open + lland,
## data = openness)
##
## Residuals:
## Min 1Q Median 3Q Max
## -22.715 -10.408 -5.132 2.651 184.619
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.0089 16.0355 1.497 0.1372
## open -0.3370 0.1445 -2.332 0.0215 *
## lpcinc 0.8033 2.1179 0.379 0.7052
## oil -6.5557 9.8014 -0.669 0.5050
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 23.89 on 110 degrees of freedom
## Multiple R-Squared: 0.03492, Adjusted R-squared: 0.008603
## Wald test: 2.006 on 3 and 110 DF, p-value: 0.1173
Since the oil dummy is not significant, there is no evidence that being an oil producer has a ceteris paribus effect on inflation.
Use the data in CONSUMP for this exercise.
data('consump')
In Example 16.7, use the method from Section 15-5 to test the single overidentifying restriction in estimating (16.35). What do you conclude?
We estimate the model by 2SLS:
m1_c4<-AER::ivreg(gc~gy+r3|gc_1+gy_1+r3_1,data=consump)
summary(m1_c4)
##
## Call:
## AER::ivreg(formula = gc ~ gy + r3 | gc_1 + gy_1 + r3_1, data = consump)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0135627 -0.0035412 -0.0006202 0.0036514 0.0128639
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0080597 0.0032327 2.493 0.018026 *
## gy 0.5861880 0.1345737 4.356 0.000128 ***
## r3 -0.0002694 0.0007640 -0.353 0.726698
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.007471 on 32 degrees of freedom
## Multiple R-Squared: 0.6779, Adjusted R-squared: 0.6578
## Wald test: 9.586 on 2 and 32 DF, p-value: 0.0005468
We then regress the residuals on the exogenous variables:
consump2<-m1_c4$model
consump2$u1<-m1_c4$residuals
m2_c4<-lm(u1~gc_1+gy_1+r3_1, data=consump2)
summary(m2_c4)
##
## Call:
## lm(formula = u1 ~ gc_1 + gy_1 + r3_1, data = consump2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0124071 -0.0040265 -0.0008372 0.0049171 0.0142163
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.038e-03 2.470e-03 -0.420 0.677
## gc_1 -1.074e-01 1.785e-01 -0.602 0.552
## gy_1 1.474e-01 1.255e-01 1.175 0.249
## r3_1 5.419e-05 6.295e-04 0.086 0.932
##
## Residual standard error: 0.007354 on 31 degrees of freedom
## Multiple R-squared: 0.06132, Adjusted R-squared: -0.02952
## F-statistic: 0.6751 on 3 and 31 DF, p-value: 0.5739
With these results we carry out the test:
Chi_c4<-nrow(m2_c4$model)*summary(m2_c4)$r.squared
pchisq(Chi_c4, df=(3-2), lower.tail=FALSE)
## [1] 0.1429138
With this result we fail to reject the null hypothesis that the overidentifying restriction is valid; that is, the instruments appear to be exogenous.
Campbell and Mankiw (1990) use \(second\) lags of all variables as IVs because of potential data measurement problems and informational lags. Reestimate (16.35), using only \(gc_{t-2}\), \(gy_{t-2}\), and \(r3_{t-2}\) as IVs. How do the estimates compare with those in (16.36)?
m3_c4<-AER::ivreg(gc~gy+r3|gc_2+gy_2+r3_2,data=consump)
summary(m3_c4)
##
## Call:
## AER::ivreg(formula = gc ~ gy + r3 | gc_2 + gy_2 + r3_2, data = consump)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.037548 -0.006751 0.002353 0.008976 0.021476
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.0054347 0.0273609 -0.199 0.844
## gy 1.2042047 1.2717876 0.947 0.351
## r3 -0.0004262 0.0019588 -0.218 0.829
##
## Residual standard error: 0.01406 on 31 degrees of freedom
## Multiple R-Squared: -0.1161, Adjusted R-squared: -0.1881
## Wald test: 0.4799 on 2 and 31 DF, p-value: 0.6233
In this case the coefficient on income growth is no longer statistically significant.
Regress \(gy_t\) on the IVs from part (ii) and test whether \(gy_t\) is sufficiently correlated with them. Why is this important?
m4_c4<-lm(gy~gc_2+gy_2+r3_2,data=consump)
summary(m4_c4)
##
## Call:
## lm(formula = gy ~ gc_2 + gy_2 + r3_2, data = consump)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.038905 -0.010402 -0.000693 0.011047 0.038265
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0207718 0.0065528 3.170 0.0035 **
## gc_2 -0.0702152 0.4693316 -0.150 0.8821
## gy_2 0.0937446 0.3301229 0.284 0.7784
## r3_2 0.0007375 0.0016570 0.445 0.6594
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01933 on 30 degrees of freedom
## (3 observations deleted due to missingness)
## Multiple R-squared: 0.01371, Adjusted R-squared: -0.08492
## F-statistic: 0.139 on 3 and 30 DF, p-value: 0.9359
None of the instruments is individually significant, and the joint F statistic is only 0.14 (p-value 0.94), so \(gy_t\) is not sufficiently correlated with them. This is important because with such a weak first stage the instruments are of little use for estimating \(gy_t\), and the 2SLS estimates in part (ii) are unreliable.
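The same point can be checked with AER's built-in diagnostics, which include a weak-instruments F test for the first stage (a sketch using the fit from part (ii); output not reproduced here):
summary(m3_c4, diagnostics = TRUE)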
Use the Economic Report of the President (2005 or later) to update the data in CONSUMP, at least through 2003. Reestimate equation (16.35). Do any important conclusions change?
m1_c5<-AER::ivreg(gc~gy+r3|gc_1+gy_1+r3_1,data=consump_act_2005)
summary(m1_c5)
##
## Call:
## AER::ivreg(formula = gc ~ gy + r3 | gc_1 + gy_1 + r3_1, data = consump_act_2005)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0311647 -0.0067907 -0.0004172 0.0068834 0.0367990
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.921e-03 6.807e-03 0.576 0.56774
## gy 9.164e-01 3.154e-01 2.906 0.00589 **
## r3 -2.834e-05 1.178e-03 -0.024 0.98093
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01252 on 41 degrees of freedom
## Multiple R-Squared: 0.08408, Adjusted R-squared: 0.0394
## Wald test: 4.318 on 2 and 41 DF, p-value: 0.01987
The coefficient on \(gy_t\) increases from about 0.59 to 0.92, and the remaining variables are still not significant.
Use the data in CEMENT for this exercise.
data('cement')
A static (inverse) supply function for the monthly growth in cement price (\(gprc\)) as a function of growth in quantity (\(gcem\)) is
\(gprc_t=\alpha_1gcem_t+\beta_0+\beta_1gprcpet+\beta_2feb_t+...+\beta_{12}dec_t+u^s_t,\)
where \(gprcpet\) (growth in the price of petroleum) is assumed to be exogenous and feb, …, dec are monthly dummy variables. What signs do you expect for \(\alpha_1\) and \(\beta_1\)? Estimate the equation by \(OLS\). Does the supply function slope upward?
m1_c6<-lm(gprc~gcem+gprcpet+feb+mar+apr+may+jun+jul+aug+sep+oct+nov+dec,data=cement)
summary(m1_c6)
##
## Call:
## lm(formula = gprc ~ gcem + gprcpet + feb + mar + apr + may +
## jun + jul + aug + sep + oct + nov + dec, data = cement)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.028565 -0.004630 -0.000498 0.002900 0.068610
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0144058 0.0031901 4.516 9.26e-06 ***
## gcem -0.0442538 0.0090941 -4.866 1.89e-06 ***
## gprcpet 0.0627563 0.0152639 4.111 5.15e-05 ***
## feb -0.0034284 0.0048033 -0.714 0.475962
## mar 0.0008645 0.0054721 0.158 0.874575
## apr 0.0054941 0.0052582 1.045 0.296972
## may -0.0086713 0.0044149 -1.964 0.050494 .
## jun -0.0109393 0.0045376 -2.411 0.016552 *
## jul -0.0111308 0.0036664 -3.036 0.002621 **
## aug -0.0098232 0.0042707 -2.300 0.022164 *
## sep -0.0165426 0.0036542 -4.527 8.81e-06 ***
## oct -0.0146566 0.0039724 -3.690 0.000269 ***
## nov -0.0264983 0.0031947 -8.295 4.43e-15 ***
## dec -0.0302025 0.0032270 -9.359 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01113 on 284 degrees of freedom
## (14 observations deleted due to missingness)
## Multiple R-squared: 0.3857, Adjusted R-squared: 0.3576
## F-statistic: 13.72 on 13 and 284 DF, p-value: < 2.2e-16
Since \(\alpha_1\) multiplies the quantity of cement in an inverse supply function, an upward-sloping supply curve implies \(\alpha_1>0\): along a supply curve, larger quantities go with higher prices. If the price of petroleum is taken as an indicator of energy and transportation costs, an increase in \(gprcpet\) should shift supply up, so \(\beta_1>0\) is expected. In the OLS estimates \(\beta_1\) has the expected positive sign and is significant, but the coefficient on gcem is negative (about -0.044) and significant, so the estimated static supply function slopes downward rather than upward — a result consistent with simultaneity between price and quantity.
The variable \(gdefs\) is the monthly growth in real defense spending in the United States. What do you need to assume about \(gdefs\) for it to be a good IV for \(gcem\)? Test whether \(gcem\) is partially correlated with \(gdefs\). (Do not worry about possible serial correlation in the reduced form.) Can you use \(gdefs\) as an \(IV\) in estimating the supply function?
# First we look at the correlation test
cor.test(cement$gdefs,cement$gcem, method = "pearson")
##
## Pearson's product-moment correlation
##
## data: cement$gdefs and cement$gcem
## t = 0.31706, df = 304, p-value = 0.7514
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.09413354 0.13004037
## sample estimates:
## cor
## 0.01818192
For \(gdefs\) to be a good IV we must assume that it is uncorrelated with the supply error \(u^s_t\) and partially correlated with \(gcem\). The variables appear to be uncorrelated: the p-value is 0.75, so we fail to reject the null hypothesis that the correlation is zero. If we estimate the reduced form for \(gcem\), the coefficient on \(gdefs\) is not significantly different from zero, so it is not a good instrument.
m2_c6<-lm(gcem~gdefs+gprcpet+feb+mar+apr+may+jun+jul+aug+sep+oct+nov+dec,data=cement)
summary(m2_c6)
##
## Call:
## lm(formula = gcem ~ gdefs + gprcpet + feb + mar + apr + may +
## jun + jul + aug + sep + oct + nov + dec, data = cement)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.36908 -0.03638 0.00063 0.03300 0.31740
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.24817 0.01471 -16.873 < 2e-16 ***
## gdefs -1.05409 3.57969 -0.294 0.76861
## gprcpet 0.06702 0.09978 0.672 0.50232
## feb 0.39723 0.02065 19.233 < 2e-16 ***
## mar 0.48698 0.02065 23.586 < 2e-16 ***
## apr 0.46307 0.02046 22.635 < 2e-16 ***
## may 0.34460 0.02051 16.804 < 2e-16 ***
## jun 0.35679 0.02040 17.494 < 2e-16 ***
## jul 0.20508 0.02047 10.021 < 2e-16 ***
## aug 0.31333 0.02048 15.296 < 2e-16 ***
## sep 0.20370 0.02059 9.894 < 2e-16 ***
## oct 0.26355 0.02066 12.759 < 2e-16 ***
## nov 0.01003 0.02063 0.487 0.62697
## dec -0.06065 0.02058 -2.947 0.00346 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.07261 on 292 degrees of freedom
## (6 observations deleted due to missingness)
## Multiple R-squared: 0.858, Adjusted R-squared: 0.8517
## F-statistic: 135.7 on 13 and 292 DF, p-value: < 2.2e-16
Shea (1993) argues that the growth in output of residential (\(gres\)) and nonresidential (\(gnon\)) construction are valid instruments for \(gcem\). The idea is that these are demand shifters that should be roughly uncorrelated with the supply error \(u^s_t\). Test whether \(gcem\) is partially correlated with \(gres\) and \(gnon\); again, do not worry about serial correlation in the reduced form.
# First the test with gres
cor.test(cement$gres,cement$gcem, method = "pearson")
##
## Pearson's product-moment correlation
##
## data: cement$gres and cement$gcem
## t = 1.4264, df = 307, p-value = 0.1548
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.03071636 0.19098700
## sample estimates:
## cor
## 0.08113889
# then the test with gnon
cor.test(cement$gnon,cement$gcem, method = "pearson")
##
## Pearson's product-moment correlation
##
## data: cement$gnon and cement$gcem
## t = 1.5973, df = 307, p-value = 0.1112
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.02100299 0.20033476
## sample estimates:
## cor
## 0.09078693
In both cases the correlation does not appear to be statistically different from zero, so we turn to the reduced form for \(gcem\):
m3_c6<-lm(gcem~gres+gnon+gprcpet+feb+mar+apr+may+jun+jul+aug+sep+oct+nov+dec,data=cement)
summary(m3_c6)
##
## Call:
## lm(formula = gcem ~ gres + gnon + gprcpet + feb + mar + apr +
## may + jun + jul + aug + sep + oct + nov + dec, data = cement)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.306088 -0.033561 0.003657 0.034204 0.284548
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.243719 0.013940 -17.484 < 2e-16 ***
## gres 0.136148 0.138419 0.984 0.326124
## gnon 1.145457 0.209563 5.466 9.84e-08 ***
## gprcpet 0.036935 0.094542 0.391 0.696319
## feb 0.397565 0.019389 20.504 < 2e-16 ***
## mar 0.470851 0.019854 23.716 < 2e-16 ***
## apr 0.457552 0.019428 23.551 < 2e-16 ***
## may 0.340218 0.019379 17.556 < 2e-16 ***
## jun 0.355545 0.019314 18.409 < 2e-16 ***
## jul 0.197661 0.019426 10.175 < 2e-16 ***
## aug 0.308444 0.019420 15.883 < 2e-16 ***
## sep 0.187809 0.019469 9.646 < 2e-16 ***
## oct 0.270452 0.019391 13.948 < 2e-16 ***
## nov 0.003087 0.019544 0.158 0.874586
## dec -0.071295 0.019565 -3.644 0.000317 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.06871 on 294 degrees of freedom
## (3 observations deleted due to missingness)
## Multiple R-squared: 0.8722, Adjusted R-squared: 0.8661
## F-statistic: 143.3 on 14 and 294 DF, p-value: < 2.2e-16
In this case, once the other exogenous variables are included, gnon is significant and could be used as an instrument, whereas gres is not.
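Their joint relevance in the reduced form can be tested with the same car::linearHypothesis approach used elsewhere in this document (a sketch; output not reproduced here):
car::linearHypothesis(m3_c6, c("gres = 0", "gnon = 0"))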
Estimate the supply function, using \(gres\) and \(gnon\) as \(IVs\) for \(gcem\). What do you conclude about the static supply function for cement? [The dynamic supply function is, apparently, upward sloping; see Shea (1993).]
m4_c6<-AER::ivreg(gprc~gcem+gprcpet+feb+mar+apr+may+jun+jul+aug+sep+oct+nov+dec|.-gcem+gres+gnon,data=cement)
summary(m4_c6)
##
## Call:
## AER::ivreg(formula = gprc ~ gcem + gprcpet + feb + mar + apr +
## may + jun + jul + aug + sep + oct + nov + dec | . - gcem +
## gres + gnon, data = cement)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0304061 -0.0041144 -0.0002182 0.0023404 0.0713813
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.022770 0.007256 3.138 0.001878 **
## gcem -0.010551 0.027717 -0.381 0.703747
## gprcpet 0.060495 0.015726 3.847 0.000148 ***
## feb -0.016815 0.011476 -1.465 0.143978
## mar -0.015603 0.013932 -1.120 0.263682
## apr -0.010064 0.013199 -0.762 0.446406
## may -0.020100 0.009940 -2.022 0.044100 *
## jun -0.023021 0.010448 -2.203 0.028372 *
## jul -0.017974 0.006496 -2.767 0.006026 **
## aug -0.020429 0.009306 -2.195 0.028961 *
## sep -0.023383 0.006486 -3.605 0.000369 ***
## oct -0.023555 0.008003 -2.943 0.003516 **
## nov -0.026941 0.003289 -8.191 8.88e-15 ***
## dec -0.028309 0.003615 -7.831 9.66e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0114 on 284 degrees of freedom
## Multiple R-Squared: 0.356, Adjusted R-squared: 0.3265
## Wald test: 11.36 on 13 and 284 DF, p-value: < 2.2e-16
The gcem variable becomes insignificant, while gprcpet keeps a coefficient fairly similar to the one obtained by \(OLS\). One could argue that demand does not move supply: although growth in nonresidential construction may be a reasonable instrument in the reduced form for \(gcem\), the correlation is not very strong, so leaving \(gcem\) as exogenous may be preferable. On the other hand, the coefficients on the monthly dummies grow in magnitude, especially in the last quarter, so it might be better to include year-to-date inflation or to estimate the price equation with time lags.
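As a cross-check of the instrument strength and exogeneity just discussed, AER's summary offers first-stage (weak instruments), Wu-Hausman and Sargan diagnostics for this 2SLS fit (a sketch; output not reproduced here):
summary(m4_c6, diagnostics = TRUE)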
Refer to Example 13.9 and the data in CRIME4.
data('crime4')
Suppose that, after differencing to remove the unobserved effect, you think \(\Delta log(polpc)\) is simultaneously determined with \(\Delta log(crmrte)\); in particular, increases in crime are associated with increases in police officers. How does this help to explain the positive coefficient on \(\Delta log(polpc)\) in equation (13.33)?
m1_c7<-lm(clcrmrte~d83+d84+d85+d86+d87+clprbarr+clprbcon+clprbpri+clavgsen+clpolpc,data=crime4)
summary(m1_c7)
##
## Call:
## lm(formula = clcrmrte ~ d83 + d84 + d85 + d86 + d87 + clprbarr +
## clprbcon + clprbpri + clavgsen + clpolpc, data = crime4)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.65936 -0.07838 0.00296 0.07504 0.68307
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.007713 0.017058 0.452 0.6513
## d83 -0.099866 0.023895 -4.179 3.42e-05 ***
## d84 -0.047937 0.023502 -2.040 0.0419 *
## d85 -0.004611 0.023500 -0.196 0.8445
## d86 0.027514 0.024149 1.139 0.2551
## d87 0.040827 0.024415 1.672 0.0951 .
## clprbarr -0.327494 0.029980 -10.924 < 2e-16 ***
## clprbcon -0.238107 0.018234 -13.058 < 2e-16 ***
## clprbpri -0.165046 0.025969 -6.356 4.49e-10 ***
## clavgsen -0.021761 0.022091 -0.985 0.3250
## clpolpc 0.398426 0.026882 14.821 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1543 on 529 degrees of freedom
## (90 observations deleted due to missingness)
## Multiple R-squared: 0.4325, Adjusted R-squared: 0.4218
## F-statistic: 40.32 on 10 and 529 DF, p-value: < 2.2e-16
The finding that crime rises with the number of police would require closer study. It could be that some police are involved in crime, or that increases in the number of police occur precisely in high-crime places without reducing crime, so that the two series are positively correlated and a positive coefficient is obtained; hiring more police would then look ineffective.
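In simultaneous-equations terms (a standard sketch, not the textbook's exact notation): if \(\Delta log(crmrte)=\gamma\Delta log(polpc)+\ldots+u\) and \(\Delta log(polpc)=\delta\Delta log(crmrte)+\ldots+v\) with \(\delta>0\) (more crime leads to more police), then \(\Delta log(polpc)\) is positively correlated with \(u\), so the OLS estimate of \(\gamma\) in (13.33) is biased upward and can come out positive even if the true effect of police on crime is negative.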
The variable \(taxpc\) is the taxes collected per person in the county. Does it seem reasonable to exclude this from the crime equation?
One might think that areas with higher tax collection have higher incomes and less crime; however, it makes sense to exclude the variable, since within the same state there can be both dangerous and very safe areas, so the proposed relationship would not necessarily show up.
Estimate the reduced form for \(\Delta log(polpc)\) using pooled OLS, including the potential \(IV\), \(\Delta log(taxpc)\). Does it look like \(\Delta log(taxpc)\) is a good IV candidate? Explain.
m2_c7<-lm(clpolpc~d83+d84+d85+d86+d87+clprbarr+clprbcon+clprbpri+clavgsen+cltaxpc,data=crime4)
summary(m2_c7)
##
## Call:
## lm(formula = clpolpc ~ d83 + d84 + d85 + d86 + d87 + clprbarr +
## clprbcon + clprbpri + clavgsen + cltaxpc, data = crime4)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.71032 -0.07463 -0.00598 0.07114 2.18424
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.038957 0.027817 -1.400 0.1620
## d83 0.097160 0.038460 2.526 0.0118 *
## d84 0.052943 0.038014 1.393 0.1643
## d85 0.034982 0.038144 0.917 0.3595
## d86 0.036938 0.039032 0.946 0.3444
## d87 0.090924 0.039320 2.312 0.0211 *
## clprbarr 0.114858 0.048253 2.380 0.0177 *
## clprbcon 0.248010 0.027496 9.020 <2e-16 ***
## clprbpri 0.080269 0.041860 1.918 0.0557 .
## clavgsen -0.068226 0.035646 -1.914 0.0562 .
## cltaxpc 0.005179 0.064848 0.080 0.9364
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2496 on 529 degrees of freedom
## (90 observations deleted due to missingness)
## Multiple R-squared: 0.1629, Adjusted R-squared: 0.1471
## F-statistic: 10.29 on 10 and 529 DF, p-value: 6.099e-16
"Pooled OLS" refers to estimating by OLS under the assumption that there are no fixed or random effects in the panel, so the lm command can be used. The estimation shows that \(\Delta log(taxpc_{it})\) is not statistically significant in the reduced form for \(\Delta log(polpc_{it})\), so it is not considered a good instrument candidate.
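For reference, an equivalent pooled-OLS fit can be obtained with the plm package (a sketch; it assumes the panel identifiers in crime4 are named county and year, as in the wooldridge data set; output not reproduced here):
library(plm)
plm(clpolpc ~ d83 + d84 + d85 + d86 + d87 + clprbarr + clprbcon +
      clprbpri + clavgsen + cltaxpc,
    data = crime4, index = c("county", "year"), model = "pooling")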
Suppose that, in several of the years, the state of North Carolina awarded grants to some counties to increase the size of their county police force. How could you use this information to estimate the effect of additional police officers on the crime rate?
In this case we can include the dummies "west" and "central", which flag the observations from Western North Carolina and Central North Carolina, in the estimation of \(\Delta log(polpc_{it})\):
m3_c7<-lm(clpolpc~d83+d84+d85+d86+d87+clprbarr+clprbcon+clprbpri+clavgsen+west+central,data=crime4)
summary(m3_c7)
##
## Call:
## lm(formula = clpolpc ~ d83 + d84 + d85 + d86 + d87 + clprbarr +
## clprbcon + clprbpri + clavgsen + west + central, data = crime4)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.72184 -0.07269 -0.00611 0.07024 2.17237
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.02751 0.03053 -0.901 0.3681
## d83 0.09736 0.03843 2.534 0.0116 *
## d84 0.05328 0.03795 1.404 0.1610
## d85 0.03556 0.03799 0.936 0.3497
## d86 0.03741 0.03904 0.958 0.3383
## d87 0.09141 0.03930 2.326 0.0204 *
## clprbarr 0.11587 0.04832 2.398 0.0168 *
## clprbcon 0.24814 0.02746 9.038 <2e-16 ***
## clprbpri 0.08122 0.04188 1.939 0.0530 .
## clavgsen -0.06943 0.03564 -1.948 0.0519 .
## west -0.01755 0.02818 -0.623 0.5338
## central -0.01934 0.02457 -0.787 0.4314
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2496 on 528 degrees of freedom
## (90 observations deleted due to missingness)
## Multiple R-squared: 0.164, Adjusted R-squared: 0.1466
## F-statistic: 9.417 on 11 and 528 DF, p-value: 1.464e-15
We find that neither dummy is significant, so the grant awards (as proxied here) are not related to the increase in the number of police. Additionally, looking at their relationship with the error term of the first model (to check whether they could generate some unobserved effect left in the error), we find that they are not related either:
crime4_2<-m3_c7$model
crime4_2$u2<-m1_c7$residuals
m4_c7<-lm(u2~d83+d84+d85+d86+d87+clprbarr+clprbcon+clprbpri+clavgsen+west+central,data=crime4_2)
summary(m4_c7)
##
## Call:
## lm(formula = u2 ~ d83 + d84 + d85 + d86 + d87 + clprbarr + clprbcon +
## clprbpri + clavgsen + west + central, data = crime4_2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.65755 -0.07712 0.00260 0.07562 0.67546
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.135e-03 1.888e-02 -0.060 0.952
## d83 2.956e-06 2.377e-02 0.000 1.000
## d84 2.375e-05 2.347e-02 0.001 0.999
## d85 -2.937e-05 2.349e-02 -0.001 0.999
## d86 -4.160e-05 2.414e-02 -0.002 0.999
## d87 6.456e-06 2.431e-02 0.000 1.000
## clprbarr -1.032e-03 2.988e-02 -0.035 0.972
## clprbcon -9.878e-05 1.698e-02 -0.006 0.995
## clprbpri -1.218e-04 2.590e-02 -0.005 0.996
## clavgsen -1.648e-04 2.204e-02 -0.007 0.994
## west 8.567e-03 1.743e-02 0.491 0.623
## central -2.287e-03 1.519e-02 -0.150 0.880
##
## Residual standard error: 0.1544 on 528 degrees of freedom
## Multiple R-squared: 0.0007606, Adjusted R-squared: -0.02006
## F-statistic: 0.03654 on 11 and 528 DF, p-value: 1
Use the data set in FISH, which comes from Graddy (1995), to do this exercise. The data set is also used in Computer Exercise C9 in Chapter 12. Now, we will use it to estimate a demand function for fish.
data('fish')
Assume that the demand equation can be written, in equilibrium for each time period, as
\(log(totqty_t)=\alpha_1log(avgprc_t)+\beta_{10}+\beta_{11}mon_t+\beta_{12}tues_t+\beta_{13}wed_t+\beta_{14}thurs_t+u_{t1},\)
so that demand is allowed to differ across days of the week. Treating the price variable as endogenous, what additional information do we need to estimate the demand-equation parameters consistently?
We still need at least one exogenous variable that appears in the supply equation but is excluded from the demand equation (and is uncorrelated with \(u_{t1}\)), to serve as an instrument for \(log(avgprc_t)\); so far no such variable has been brought in.
The variables \(wave2_t\) and \(wave3_t\) are measures of ocean wave heights over the past several days. What two assumptions do we need to make in order to use \(wave2_t\) and \(wave3_t\) as IVs for \(log(avgprc_t)\) in estimating the demand equation?
m1_c8<-lm(lavgprc~mon+tues+wed+thurs+wave2+wave3, data=fish)
summary(m1_c8)
##
## Call:
## lm(formula = lavgprc ~ mon + tues + wed + thurs + wave2 + wave3,
## data = fish)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.11284 -0.19723 0.04479 0.21870 0.76716
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.022801 0.144134 -7.096 2.84e-10 ***
## mon -0.012080 0.113641 -0.106 0.91558
## tues -0.008976 0.111928 -0.080 0.93626
## wed 0.050547 0.111548 0.453 0.65154
## thurs 0.124191 0.110770 1.121 0.26520
## wave2 0.094481 0.021285 4.439 2.56e-05 ***
## wave3 0.052566 0.019820 2.652 0.00945 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3486 on 90 degrees of freedom
## Multiple R-squared: 0.3041, Adjusted R-squared: 0.2577
## F-statistic: 6.555 on 6 and 90 DF, p-value: 9.06e-06
We must assume that these variables belong to the supply equation and are excluded from demand, i.e., uncorrelated with \(u_{t1}\). Indeed, the supply of fish does not depend only on the effort of fishing; it is affected by the availability of fish in the water, which varies with wave conditions. We also need them to be partially correlated with \(log(avgprc_t)\), which is checked by including them in its reduced form.
Regress \(log(avgprc_t)\) on the day-of-the-week dummies and the two wave measures. Are \(wave2_t\) and \(wave3_t\) jointly significant? What is the p-value of the test?
This regression was already run in the previous step. Both variables are individually significant, even at the 1% level. To check them jointly we can run the following F test:
car::linearHypothesis(m1_c8,c("wave2=0","wave3=0"), white.adjust="hc1")
## Linear hypothesis test
##
## Hypothesis:
## wave2 = 0
## wave3 = 0
##
## Model 1: restricted model
## Model 2: lavgprc ~ mon + tues + wed + thurs + wave2 + wave3
##
## Note: Coefficient covariance matrix supplied.
##
## Res.Df Df F Pr(>F)
## 1 92
## 2 90 2 20.773 3.824e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The variables are jointly significant, with a p-value of about 3.8e-08.
Now, estimate the demand equation by \(2SLS\). What is the 95% confidence interval for the price elasticity of demand? Is the estimated elasticity reasonable?
m2_c8<-AER::ivreg(ltotqty~lavgprc+mon+tues+wed+thurs|.-lavgprc+wave2+wave3,data=fish)
summary(m2_c8)
##
## Call:
## AER::ivreg(formula = ltotqty ~ lavgprc + mon + tues + wed + thurs |
## . - lavgprc + wave2 + wave3, data = fish)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.65188 -0.40018 0.07281 0.50147 1.29351
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.16410 0.18171 44.930 < 2e-16 ***
## lavgprc -0.81582 0.32744 -2.492 0.01453 *
## mon -0.30744 0.22921 -1.341 0.18317
## tues -0.68473 0.22599 -3.030 0.00318 **
## wed -0.52061 0.22357 -2.329 0.02209 *
## thurs 0.09476 0.22521 0.421 0.67492
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7054 on 91 degrees of freedom
## Multiple R-Squared: 0.1933, Adjusted R-squared: 0.149
## Wald test: 4.411 on 5 and 91 DF, p-value: 0.001215
The Wald (F) test is significant with a p-value of 0.0012, which suggests the estimation is reasonable, although the \(R^2\) is somewhat low. In any case, \(log(avgprc)\) is significant, so the instruments seem to work. The estimated price elasticity of demand is about \(-0.82\), so a 1% increase in price reduces quantity demanded by roughly 0.82%; the 95% confidence interval is approximately \(-0.816 \pm 1.96(0.327)\), i.e., \((-1.46, -0.17)\). The negative sign agrees with theory, and the magnitude is plausible, although the omission of income and of the prices of substitutes for fish means the estimate should be read with some caution.
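The confidence interval can also be obtained directly from the 2SLS fit; confint() falls back on the asymptotic normal approximation built from the coefficient and its standard error (a sketch; output not reproduced here):
confint(m2_c8, "lavgprc", level = 0.95)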
Obtain the \(2SLS\) residuals, \(\hat{u}_{t1}\). Add a single lag, \(\hat{u}_{t-1,1}\) in estimating the demand equation by \(2SLS\). Remember, use \(\hat{u}_{t-1,1}\) as its own instrument. Is there evidence of AR(1) serial correlation in the demand equation errors?
fish2<-m2_c8$model
fish2$u_1<-dplyr::lag(m2_c8$residuals,1)
m3_c8<-AER::ivreg(ltotqty~lavgprc+mon+tues+wed+thurs+u_1|.-lavgprc+wave2+wave3,data=fish2)
summary(m3_c8)
##
## Call:
## AER::ivreg(formula = ltotqty ~ lavgprc + mon + tues + wed + thurs +
## u_1 | . - lavgprc + wave2 + wave3, data = fish2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.35657 -0.35834 0.02487 0.41374 1.44248
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.1480 0.1758 46.340 < 2e-16 ***
## lavgprc -0.8741 0.3111 -2.809 0.00610 **
## mon -0.2930 0.2267 -1.292 0.19963
## tues -0.6953 0.2200 -3.161 0.00215 **
## wed -0.5204 0.2176 -2.392 0.01886 *
## thurs 0.1003 0.2191 0.458 0.64831
## u_1 0.2942 0.1033 2.849 0.00545 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6866 on 89 degrees of freedom
## Multiple R-Squared: 0.2519, Adjusted R-squared: 0.2015
## Wald test: 5.168 on 6 and 89 DF, p-value: 0.0001338
Since the coefficient on the lagged residual is statistically significant at the 1% level, there is evidence of AR(1) serial correlation in the demand equation errors.
Given that the supply equation evidently depends on the wave variables, what two assumptions would we need to make in order to estimate the price elasticity of supply?
Since the day-of-the-week dummies belong to the demand equation, the two assumptions we need are (i) that they can be excluded from the supply equation, and (ii) that at least one of them is partially correlated with \(log(avgprc_t)\). Apart from quantities, prices and the wave variables, no further variables are available to estimate an inverse supply function, which would amount to assuming that the catch depends entirely on the fish availability generated by the waves; and indeed, since the waves are related to quantity, quantity is not significant in the following regression:
m4_c8<-lm(lavgprc~ltotqty+wave2+wave3, data=fish)
summary(m4_c8)
##
## Call:
## lm(formula = lavgprc ~ ltotqty + wave2 + wave3, data = fish)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.20265 -0.16953 0.01872 0.20830 0.70397
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.34693 0.42989 -0.807 0.42171
## ltotqty -0.07274 0.04717 -1.542 0.12650
## wave2 0.08457 0.02077 4.071 9.81e-05 ***
## wave3 0.05152 0.01925 2.676 0.00881 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3426 on 93 degrees of freedom
## Multiple R-squared: 0.3055, Adjusted R-squared: 0.283
## F-statistic: 13.63 on 3 and 93 DF, p-value: 1.911e-07
In the reduced form equation for \(log(avgprc_t)\), are the day-of-the-week dummies jointly significant? What do you conclude about being able to estimate the supply elasticity?
car::linearHypothesis(m1_c8,c("mon=0","tues=0","wed=0","thurs=0"), white.adjust="hc1")
## Linear hypothesis test
##
## Hypothesis:
## mon = 0
## tues = 0
## wed = 0
## thurs = 0
##
## Model 1: restricted model
## Model 2: lavgprc ~ mon + tues + wed + thurs + wave2 + wave3
##
## Note: Coefficient covariance matrix supplied.
##
## Res.Df Df F Pr(>F)
## 1 94
## 2 90 4 0.6638 0.6188
Running the joint F test, the day-of-the-week dummies are not jointly significant in the reduced form for \(log(avgprc_t)\) (p-value 0.62), so there is no usable instrument and the supply elasticity cannot be estimated with these data.