1 はじめに

Cunningham (2021) の第7章に従って、Card (1993) のデータを用いて、教育年数educが所得lwageに与える影響を推定することを試みる。ここで、教育年数は、内生変数であると考えられ、撹乱項と相関するおそれがある。そのため、操作変数として、自宅がある群に大学があるかどうかのダミー変数nearc4を用いる。

第1段階の推定では、内生変数である教育年数educを従属変数とし、その他の説明変数と操作変数であるnearc4を説明変数として回帰する。

第2段階の推定では、所得lwageを従属変数、内生変数である教育年数educ及びその他の説明変数を説明変数とする。

DiagrammeRパッケージを用いて、Directed Acyclic Graphs (DAG)を描く。

install.packages("DiagrammeR")

library(DiagrammeR)

mermaid("
            graph LR
            E(内生変数E <br> 教育年数)-->Y(所得Y)
            Z(操作変数Z <br> 自宅のある郡に <br> 大学があるか)-->E
            U(観測できない変数 <br> 能力U)-->E
            U-->Y
            ")

能力は観測できない変数である。
自宅のある郡に大学があるかどうかという操作変数は、教育年数と相関するが、従属変数とは相関しないという「除外制約」を満たす必要がある。

DAGについては、Elwert (2021) や“DAGitty — draw and analyze causal diagrams”が詳しい。

オープンパス（Open paths）とは、コライダー（colliders）を含まないパスのことである。
また、コライダー変数とは、パスに沿った2つの矢印が内側を向いている変数のことである。
教育年数はパスU->E<-Z上のコライダーであるが、パスZ->E<-Y上ではコライダーではない。
2つの変数の間に少なくとも1つのオープン・パスがある場合、2つの変数はマージナルに関連する（marginally associated）。

以下のコードでも同じようなDAGを作れる。

library(DiagrammeR)

grViz("
digraph {
  graph []
  node [shape = box]
    E [label = '教育年数E']
    Y [label = '所得Y']
    Z [label = '自宅のある郡に大学があるかZ']
    U [label = '観測できない能力U']
  edge []
    E->Y
    U->Y
    U->E
    Z->E
  { rank = same; E; Y; Z}
}
")

2 操作変数の要件

「“関連性”または “第一段階の存在”」(“instrument relevance” or “existence of the first stage”): IVは処置（内生変数：教育年数）と相関（関連）している必要がある。
「除外制約」(“exclusion restriction”): IVは構造方程式の誤差項と無相関である必要がある。
“instrument relevance”をより強調したものと考えることができるが、第３の要件として、「強いIV」（“strong first stage” or “strong instrument”）が挙げられることがある。これは、IVと処置（内生変数：教育年数）との相関が弱いと、IVが構造方程式の誤差項と僅かにでも相関があった場合に、深刻なバイアスを生むからである。

「除外制約」を満たさない代表例として３つがある。

Yに対するZの直接効果（Direct effect of Z on Y）: IV（大学の有無）がアウトカム（所得Y）に直接影響を与える。
ZとYの交絡 (Confounding of Z and Y): IVとアウトカム両方に影響を与える観測できない要因が存在する。
選択的脱落（Selective attrition）: 例えば、兵役が事後の所得に与える影響を分析する場合を考える。くじ（IV）により兵役義務（処置）を負うとする。事後の所得の分析は、戦争を生き延びることを条件付けとしている。兵役が死亡率（D）を増加させるのであれば、データ収集段階で生存に条件付けすることは、内生的選択バイアスにつながる。Elwert and Segarra (2022) が参考になる。

3 パッケージ

IV推定を行うためのivreg関数を含むAERパッケージをインストールする。

install.packages("AER")

library(AER)

4 データ

データは、Wooldridge (2016) の教科書のサイト（以下）からダウンロードしたCARD.DTAである。

https://www.cengage.com/aise/economics/wooldridge_3e_datasets/

library(haven)
card <- read_dta("CARD.DTA")
head(card)

## # A tibble: 6 × 34
##      id nearc2 nearc4  educ   age fatheduc motheduc weight momdad14 sinmom14
##   <dbl>  <dbl>  <dbl> <dbl> <dbl>    <dbl>    <dbl>  <dbl>    <dbl>    <dbl>
## 1     2      0      0     7    29       NA       NA 158413        1        0
## 2     3      0      0    12    27        8        8 380166        1        0
## 3     4      0      0    12    34       14       12 367470        1        0
## 4     5      1      1    11    27       11       12 380166        1        0
## 5     6      1      1    12    34        8        7 367470        1        0
## 6     7      1      1    12    26        9       12 380166        1        0
## # ℹ 24 more variables: step14 <dbl>, reg661 <dbl>, reg662 <dbl>, reg663 <dbl>,
## #   reg664 <dbl>, reg665 <dbl>, reg666 <dbl>, reg667 <dbl>, reg668 <dbl>,
## #   reg669 <dbl>, south66 <dbl>, black <dbl>, smsa <dbl>, south <dbl>,
## #   smsa66 <dbl>, wage <dbl>, enroll <dbl>, KWW <dbl>, IQ <dbl>, married <dbl>,
## #   libcrd14 <dbl>, exper <dbl>, lwage <dbl>, expersq <dbl>

なお、Rのwooldridgeパッケージをインストールして、データを読み込むこともできる。その場合は、以下のコードでデータcardを読み込める。

install.packages("wooldridge")
library(wooldridge)
data('card')

データの変数の概要は以下の通りである。

id	person identifier
nearc2	=1 if near 2 yr college, 1966
nearc4	=1 if near 4 yr college, 1966
educ	years of schooling, 1976
age	in years
fatheduc	father’s schooling
motheduc	mother’s schooling
weight	NLS sampling weight, 1976
momdad14	=1 if live with mom, dad at 14
sinmom14	=1 if with single mom at 14
step14	=1 if with step parent at 14
reg661	=1 for region 1, 1966
reg662	=1 for region 2, 1966
reg663	=1 for region 3, 1966
reg664	=1 for region 4, 1966
reg665	=1 for region 5, 1966
reg666	=1 for region 6, 1966
reg667	=1 for region 7, 1966
reg668	=1 for region 8, 1966
reg669	=1 for region 9, 1966
south66	=1 if in south in 1966
black	=1 if black
smsa	=1 in in SMSA, 1976
south	=1 if in south, 1976
smsa66	=1 if in SMSA, 1966
wage	hourly wage in cents, 1976
enroll	=1 if enrolled in school, 1976
KWW	knowledge world of work score
IQ	IQ score
married	=1 if married, 1976
libcrd14	=1 if lib. card in home at 14
exper	age - educ - 6
lwage	log(wage)
expersq	exper^2

attach()を使って、以下の分析で用いるデータを指定する。

attach(card)

教育年数の分布を確認する。

hist(educ)

操作変数の分布を確認する。

barplot(table(nearc4), names = c("大学なし", "大学あり"))

操作変数と教育年数の関係を相関係数と箱ひげ図で確認する。

cor.test(educ, nearc4)

## 
##  Pearson's product-moment correlation
## 
## data:  educ and nearc4
## t = 7.9945, df = 3008, p-value = 1.838e-15
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1090753 0.1790445
## sample estimates:
##       cor 
## 0.1442402

boxplot(educ, nearc4, names = c("大学あり", "大学なし"))

5 OLS

まず、OLSで推定を行う。

ols <- lm(lwage ~ educ + exper + black 
          + south + married + smsa)

summary(ols)

## 
## Call:
## lm(formula = lwage ~ educ + exper + black + south + married + 
##     smsa)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.59924 -0.23035  0.01812  0.23046  1.36797 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.063316   0.063740  79.437   <2e-16 ***
## educ         0.071173   0.003482  20.438   <2e-16 ***
## exper        0.034152   0.002214  15.422   <2e-16 ***
## black       -0.166027   0.017614  -9.426   <2e-16 ***
## south       -0.131552   0.014969  -8.788   <2e-16 ***
## married     -0.035871   0.003401 -10.547   <2e-16 ***
## smsa         0.175787   0.015458  11.372   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3702 on 2996 degrees of freedom
##   (7 observations deleted due to missingness)
## Multiple R-squared:  0.305,  Adjusted R-squared:  0.3036 
## F-statistic: 219.2 on 6 and 2996 DF,  p-value: < 2.2e-16

6 IV (2SLS)

AERパッケージのivregを使って、IV推定を行う。 |以下に、第1段階の説明変数を置く。nearc4（実家のある群に大学があるかどうかのダミー変数）は第1段階で含まれるが、第2段階で含まれない操作変数である。

iv <- ivreg(lwage ~ educ + exper + black 
          + south + married + smsa
          | exper + black 
          + south + married + smsa
          + nearc4)
summary(iv)

## 
## Call:
## ivreg(formula = lwage ~ educ + exper + black + south + married + 
##     smsa | exper + black + south + married + smsa + nearc4)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.81301 -0.23805  0.01766  0.24727  1.32278 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.162475   0.849591   4.899 1.01e-06 ***
## educ         0.124164   0.049956   2.485  0.01299 *  
## exper        0.055588   0.020286   2.740  0.00618 ** 
## black       -0.115685   0.050741  -2.280  0.02268 *  
## south       -0.113165   0.023244  -4.869 1.18e-06 ***
## married     -0.031975   0.005087  -6.286 3.73e-10 ***
## smsa         0.147706   0.030895   4.781 1.83e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3843 on 2996 degrees of freedom
## Multiple R-Squared: 0.2513,  Adjusted R-squared: 0.2498 
## Wald test: 139.8 on 6 and 2996 DF,  p-value: < 2.2e-16

第1段階の推定結果が表示されないので、別途第１段階を推定しておく。第１段階の従属変数は、教育年数(educ)である。

first <- lm(educ ~ exper + black 
          + south + married + smsa
          + nearc4)
summary(first)

## 
## Call:
## lm(formula = educ ~ exper + black + south + married + smsa + 
##     nearc4)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.6308 -1.4454 -0.0526  1.2986  6.3449 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 16.83070    0.13075 128.727  < 2e-16 ***
## exper       -0.40443    0.00894 -45.238  < 2e-16 ***
## black       -0.94753    0.09053 -10.467  < 2e-16 ***
## south       -0.29735    0.07906  -3.761 0.000173 ***
## married     -0.07269    0.01775  -4.096 4.31e-05 ***
## smsa         0.42090    0.08487   4.959 7.47e-07 ***
## nearc4       0.32728    0.08242   3.971 7.33e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.937 on 2996 degrees of freedom
##   (7 observations deleted due to missingness)
## Multiple R-squared:  0.4774, Adjusted R-squared:  0.4764 
## F-statistic: 456.1 on 6 and 2996 DF,  p-value: < 2.2e-16

7 推定結果の比較

modelsummaryパッケージを使って、推定結果の比較を行う。modelsummaryパッケージをインストールする。

install.packages("modelsummary")

modelsummaryパッケージを使って、推定結果表を作成する。

models <- list("OLS" = ols, "1st stage" = first, "2nd stage" = iv)

library(modelsummary)
modelsummary(models, stars = TRUE,
                coef_omit = "Intercept",
                gof_omit = 'DF|Deviance|AIC|BIC|RMSE|R2 Adj.')

tinytable_u3hnd48dh16l8abeeqr9

	OLS	1st stage	2nd stage
+ p < 0.1, * p < 0.05, p < 0.01, * p < 0.001
educ	0.071***		0.124*
	(0.003)		(0.050)
exper	0.034***	-0.404***	0.056**
	(0.002)	(0.009)	(0.020)
black	-0.166***	-0.948***	-0.116*
	(0.018)	(0.091)	(0.051)
south	-0.132***	-0.297***	-0.113***
	(0.015)	(0.079)	(0.023)
married	-0.036***	-0.073***	-0.032***
	(0.003)	(0.018)	(0.005)
smsa	0.176***	0.421***	0.148***
	(0.015)	(0.085)	(0.031)
nearc4		0.327***
		(0.082)
Num.Obs.	3003	3003	3003
R2	0.305	0.477	0.251
Log.Lik.	-1273.854	-6243.492
F	219.153	456.140

係数プロットで、推定値を比較する。

modelplot(models, coef_omit = 'Interc')

教育年数のみを係数プロットにする。教育年数は、２番目の推定係数（定数項の次）なので、coef_omit = c(-2)を指定している。

modelplot(models, coef_omit = c(-2) )

IV推定値の方がOLS推定値よりも標準誤差が大きい。

参考文献

Card, David. 1993. “Using Geographic Variation in College Proximity to Estimate the Return to Schooling.” National Bureau of Economic Research Cambridge, Mass., USA.

Cunningham, Scott. 2021. Causal Inference: The Mixtape. Yale university press. https://mixtape.scunning.com/.

Elwert, Felix. 2021. “Introduction to Directed Acyclic Graphs (DAGs) for Causal Inference.” SAMSI Program on Data Science in Social and Behavioural Sciences. https://www.samsi.info/wp-content/uploads/2021/01/Elwert-SAMSI_Final-Short.pdf.

Elwert, Felix, and Elan Segarra. 2022. “Instrumental Variables with Treatment-Induced Selection: Exact Bias Results.” In Probabilistic and Causal Inference: The Works of Judea Pearl, 575–92.

Wooldridge, Jeffrey M. 2016. Introductory Econometrics: A Modern Approach. Cengage.

操作変数法の基本

田中鮎夢

2024-04-03

1 はじめに

2 操作変数の要件

3 パッケージ

4 データ

5 OLS

6 IV (2SLS)

7 推定結果の比較

参考文献

操作変数法の基本

田中 鮎夢

2024-04-03

1 はじめに

2 操作変数の要件

3 パッケージ

4 データ

5 OLS

6 IV (2SLS)

7 推定結果の比較

参考文献

田中鮎夢