1 什麼是時間序列分析？

如果在時間點\(t\)觀察到依變數\(Y\)以及自變數\(X\)，其他時間點觀察到的\(Y\)有沒有影響時間點\(t\)的\(Y\)？

\[ Y_{it}=\beta_{0}+\beta_{1}X_{it}+\delta_{2}T_{i,t=2}+\delta_{3}T_{i,t=3}+\ldots+\delta_{t}T_{i,t=t}+u_{it}\]

\(\delta_{2}T_{i,t=2},\delta_{3}T_{i,t=3},\ldots,\delta_{t}T_{i,t=t}\)這些變數可以解釋不同時間點之間的異質性。
假設調查1,000位民眾，\(Y_{it}\)是選後對於民主的支持度，\(T_{i,t=1}\)是選前一個月對於民主的支持度，\(T_{i,t=2}\)是選前三個月對於民主的支持度，\(X_{it}\)是選後的民主價值，可以用以下的固定效果(fixed-effect)模型表示這這幾個變數之間的關係：（出處）

\[ Y_{it}=\beta_{0}+\beta_{1}X_{it}+\delta_{2}T_{i,t=2}+u_{it}\]

資料中有三個時間點\(t,t=1,t=2\)，方程式的右邊中只需要1避免共線性。
也可以寫成

\[\begin{align*} Y_{it}=\alpha_{i}+\beta_{t}X_{it}+\epsilon_{it} \tag{1.1} \end{align*}\]

\(\alpha_{i}\)代表所有未知的常數項，而且假設\(\text{E}[\epsilon\mid X,\alpha]=0\)。如果假設成立，\(\beta\)要最小化以下的模型:

\[\sum_{i=1}^{N}\sum_{t=1}^{T} \{(\{Y_{i}-\bar{Y}) - \beta(X_{it}-\bar{X_{i}})\}^2\]

這是因為(1.1)的左右兩邊取平均數，例如有10年的資料除以10，會得到：

\[\begin{align*} \bar{Y_{i}}=\alpha_{i}+\beta_{t}\bar{X_{i}}+ \bar{\epsilon_{i}} \tag{1.2} \end{align*}\]

\(\alpha_{i}\)是個體的特徵，所以不會被時間所平均。如果用(1.1)-(1.2)，會得到：

\[Y_{it}-\bar{Y}_{i}=\beta(X_{it}-\bar{X}_{i})+(\epsilon_{it}-\bar{\epsilon}_{i})\]

時間序列的資料也被稱為longitidunal data，可以是觀察一個個人或一個群體在不同時間的變化，例如每天中共戰機來擾台的飛機數，也可以是一群人在不同時間的變化，例如一些國家在不同時間點的GDP成長率。如果加上不同的地點，例如不同國家在不同年度的資料，稱為panel data。
用矩陣表示如下：

\[\begin{align*} \left[ \begin{array}{c} y_{i1} \\ y_{i2} \\ \vdots\\ y_{iT}\\ \end{array} \right] = \left[ \begin{array}{ccc} x_{i11} & x_{i12}&\cdots & x_{i1k} \\ x_{i21} & x_{i22}&\cdots & x_{i2k} \\ \vdots & \vdots & \ddots & \vdots\\ x_{nT1}& x_{nT2}&\cdots & x_{nTk} \\ \end{array} \right] \left[ \begin{array}{c} \beta_{i1} \\ \beta_{i2} \\ \vdots\\ \beta_{ik}\\ \end{array} \right] + \left[ \begin{array}{c} \epsilon_{i1} \\ \epsilon_{i2} \\ \vdots\\ \epsilon_{ik}\\ \end{array} \right] \end{align*}\]

我們假設\(\hat{\delta_{k}}\sim N(\delta_{k}, \sigma^2)\)，也就是每個時間的影響應該來自於平均數\(\delta_{k}\)、變異數一致是\(\sigma^2\)的常態分佈。
實務上，可以單純地納入時間點在迴歸模型中，然後估計係數。
Imai and Kim (2019)證明上述的\(\hat{\beta}_{FE}\)有可能不是最佳估計。如果\(X_{i}\in \{0,1\}\)，也就是二元變數，類似因果關係裡面的刺激(treatment)，\(\hat{\beta}_{FE}\)會與\(X_{i}\)的變異數成比例，也就不是固定不變，違反最小平方法的迴歸模型假設。

2 什麼是因果關係的時間序列分析？

假設在兩個以上的時間點觀察到Y、刺激D、特徵X，我們想估計不同時間點的Y的差異，跟刺激D有關的程度為何？
過去分析個體層次的連續資料(panel data)或者總體層次的時間序列(time-series)資料的方法是固定效果(fixed effect, FE)以及固定差異(first difference, FD)，後者的依變項是\(y_{it}-\bar{y}_{it}\)，前者是\(y_{it}-y_{it-1}\)>。後者比較容易理解。相關網頁
現在可以應用因果推論到時間序列分析，讓估計具有因果關係的基礎。

3 方法：Bayesian structural time-series model (BSTS)

3.1 理論架構

第一種方法是在觀察的時間序列之外，收集資料構成對照的時間序列，如果干預或者刺激有效果，應該只會影響觀察到的時間序列，不會影響對照的時間序列。建立因果關係的時間序列分析的模型是為了預測反事實的結果。做法是使用控制組的時間序列預測如果沒有干預的情形下，時間序列的結果是什麼？
例如某家公司的股價可能因為某天宣布裁員而上漲（因為減少人事成本），但是其他公司有可能都因為其他利多都上漲了，所以必須建立一個綜合許多公司的時間序列，才能估計裁員消息與股價的因果關係。這套方法也稱為synthetic control。
已經有CausalImpact這個套件可以執行以上的估計。
Brodersen et al.有完整的說明
實務上，需要收集三個資料。第一是依變數受到刺激之前的時間序列，第二個是同一時間其他變數受到刺激之前的時間序列，第三則是有關依變數的參數的資訊。這三個資訊用state space model加以結合。
在這張圖3.1中，\(x_{t}\)是觀察不到的潛在變數，\(y_{t}\)則是\(x_{t}\)的函數：

Figure 3.1: State Space Model

\[y_{t}=A_{t} x_{t}+v_{t}\]

\[x_{t}=\Phi x_{t-1}+w_{t}\]

用依變數受到刺激之前的時間序列，以及其他變數受到刺激之後的時間序列，計算後驗分佈的時間序列。

3.2 CausalImpact套件實作一

模擬資料，用\(\texttt{CausalImpact}\)套件實作：
準備資料：

rm(list = ls())

library(ggplot2)
library(patchwork)
library(CausalImpact)
library(assertthat)
library(dplyr)
#as_patchwork
patchwork <- here::here('source', 'add_plot.R')
#source(patchwork)
#plot_annotation
annotation <- here::here('source', 'plot_annotation.R')
source(annotation)


CreateImpactPlot <- here::here('source', 'CreateImpactPlot.R')
source(CreateImpactPlot)

#create data
set.seed(1)
x1 <- 100 + arima.sim(model = list(ar = 0.999), n = 100)
y <- 1.2 * x1 + rnorm(100)
y[71:100] <- y[71:100] + 10
data <- cbind(y, x1)

1#causal impact analysis

## [1] 1

pre.period <- c(1, 70)
post.period <- c(71, 100)
impact <- CausalImpact(data, pre.period, post.period)

畫圖3.2：

plot(impact, c("original", "cumulative")) + 
    plot_annotation(title = "Analysis of click behavior after intervention"
                    , theme = theme(plot.title = element_text(hjust = 0.5))) &
    theme(
        panel.background = element_rect(fill = "transparent"), # panel bg
      plot.background = element_rect(fill = "transparent", color = NA), # plot bg
        panel.grid.major = element_blank(), # get rid of major grid
        panel.grid.minor = element_blank())

Figure 3.2: Analysis of click behavior after intervention

3.3 CausalImpact套件實作二

參考Nurkholis的說明。
模擬每日交易的消費者數目(DTU)，分析交易界面的更新提高交易的程度。
納入5個變數解釋DTU，並且考慮季節的作用，以及先驗的分佈。
首先準備資料：

require(CausalImpact) 
require(ggplot2) # Optional, only for visualization purposes
require(timeDate) # Optional, only for simulating the data

# Set iteration counts
niter = 3000

# Set seed to ensure simulation reproducibility, entirely optional
addTaskCallback(function(...) {set.seed(1212);TRUE})

## 1 
## 1

# Generate workdays from June 2021 until May 2022, totalling 261 days
time_idx = as.timeDate(seq(from = as.Date('2021-06-01'), to = as.Date('2022-05-31'), by = 'day'))
time_idx = as.Date(time_idx[isBizday(time_idx, holidays = holidayNYSE(), wday = 1:5)])
length(time_idx)

## [1] 261

# Generate response: daily transacting user
dtu = arima.sim(n = 261, list(ar = c(0.5677, -0.132), ma = c(-0.2456)), sd = 180) + seq(from = 2540, to = 2400, length.out = 50) + 1000
dtu[1:220] = dtu[1:220] + (seq(from = 50000, to = 1, length.out = 220))^0.6
dtu[121:145] = dtu[121:145] + (seq(from = 1, to = 20000, length.out = 25))^0.5
dtu[146:170] = dtu[146:170] + (seq(from = 20000, to = 1, length.out = 25))^0.5

# Generate predictors: daily active user
dau = dtu*20 + rnorm(261, sd = 60)

# Generate predictors: push notification ctr
pn = (dtu-mean(dtu))/sd(dtu) + rgamma(n = 261, shape = 2, scale = 0.5)
pn = (pn-min(pn))/(max(pn)-min(pn)) * 0.03

# Generate predictors: email ctr
email = (dtu-mean(dtu))/sd(dtu) + rgamma(n = 261, shape = 1.5, scale = 0.9)
email = (email-min(email))/(max(email)-min(email)) * 0.021

# Generate predictors: Google ads ctr
gads = (dtu-mean(dtu))/sd(dtu) + rgamma(n = 261, shape = 15, scale = 10)
gads = (gads-min(gads))/(max(gads)-min(gads)) * 0.08

# Generate predictors: daily new users
nu = sqrt(dtu*40) - sd(dtu) + rgamma(n = 261, shape = 6, scale = 29)

# Add intervention effect on April 1st 2022. We also keep the true counterfactual.
dtu_counterfactual = dtu
dtu[219:261] = dtu[219:261] + (seq(from = 1, to = 50000, length.out = 43))^0.5

# Finalization
data = zoo((cbind(round(dtu), round(dau), pn, email, gads, round(nu))), time_idx)
colnames(data) = c('DTU', 'DAU', 'PN', 'EMAIL', 'G-ADS', 'NU')

定義刺激的時間：

# Define Pre and Post Intervention Periods
pre_intervention = as.Date(c('2021-06-01', '2022-03-31'))
post_intervention = as.Date(c('2022-04-01', '2022-05-31'))

# Compute the standard deviation of response (DTU) in the pre-intervention period
# useful for specifying prior later
sd_dtu = sd(data$DTU[index(data) <= pre_intervention[2]])

# Set the response (DTU) in post-intervention to NA
# We keep the observed post-intervention in a separate object (post_period_dtu)
post_period_dtu = as.vector(data$DTU[index(data) >= post_intervention[1]])
data$DTU[index(data) >= post_intervention[1]] = NA

tail(data) # DTU should be NA

##            DTU   DAU       PN    EMAIL   G-ADS  NU
## 2022-05-24  NA 64916 0.008519 0.009574 0.04661 304
## 2022-05-25  NA 73064 0.008548 0.003735 0.03506 277
## 2022-05-26  NA 68495 0.004606 0.005792 0.02570 141
## 2022-05-27  NA 70408 0.007215 0.005454 0.03054 278
## 2022-05-30  NA 63408 0.002184 0.002497 0.01565 258
## 2022-05-31  NA 61576 0.001674 0.000000 0.03303 182

cat('Intervention of April 1st 2022 is the', which(time(data) == '2022-04-01'),'th data')

## Intervention of April 1st 2022 is the 219 th data

time(data) = seq(1, NROW(data), length.out = NROW(data))
# > Output: Intervention of April 1st 2022 is the 219 th data

ss = list() # Initial
ss = AddLocalLevel(ss, y = data$DTU) # Add Local Level, pass the empty list
#ss = AddSeasonal(ss, y = data$DTU, nseasons = 50) # Add Seasonality, pass the #local level list
ss = AddAr(ss, y = data$DTU) # Add Autoregressive, pass the local level and seasonal

summary(lm(DTU ~. , data = data))

## 
## Call:
## lm(formula = DTU ~ ., data = data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -7.552 -1.641 -0.101  1.860  6.940 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.82e+00   4.62e+00    0.61     0.54    
## DAU          5.00e-02   7.25e-05  688.76   <2e-16 ***
## PN           3.51e+01   8.35e+01    0.42     0.67    
## EMAIL       -4.76e+01   8.24e+01   -0.58     0.56    
## `G-ADS`      9.46e+00   1.35e+01    0.70     0.48    
## NU          -4.45e-04   2.92e-03   -0.15     0.88    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3 on 212 degrees of freedom
##   (43 observations deleted due to missingness)
## Multiple R-squared:     1,   Adjusted R-squared:     1 
## F-statistic: 2.94e+05 on 5 and 212 DF,  p-value: <2e-16

加入線性模型以及季節變動，然後估計模型：

ss = list()

# Local Linear Trend
ss = AddLocalLinearTrend(ss, y = data$DTU, level.sigma.prior = SdPrior(sigma.guess = sd_dtu, sample.size = 0.01))

# Seasonality
ss = AddSeasonal(ss, y = data$DTU, nseasons = 50, season.duration = 1)

# Static Regression, we only specify the prior
reg_prior = SpikeSlabPrior(x = cbind(1, as.matrix(data[1:218, -1])), 
                           y = as.matrix(data[1:218, 1]), 
                           expected.r2 = 0.9, 
                           prior.df = 25,
                           expected.model.size = 2)
bsts_model = bsts(DTU~., data = data, 
                  state.specification = ss,
                  niter = 10,
                  prior = reg_prior)

## =-=-=-=-= Iteration 0 Fri Jun  7 08:39:45 2024 =-=-=-=-=
## =-=-=-=-= Iteration 1 Fri Jun  7 08:39:45 2024 =-=-=-=-=
## =-=-=-=-= Iteration 2 Fri Jun  7 08:39:45 2024 =-=-=-=-=
## =-=-=-=-= Iteration 3 Fri Jun  7 08:39:45 2024 =-=-=-=-=
## =-=-=-=-= Iteration 4 Fri Jun  7 08:39:45 2024 =-=-=-=-=
## =-=-=-=-= Iteration 5 Fri Jun  7 08:39:45 2024 =-=-=-=-=
## =-=-=-=-= Iteration 6 Fri Jun  7 08:39:45 2024 =-=-=-=-=
## =-=-=-=-= Iteration 7 Fri Jun  7 08:39:45 2024 =-=-=-=-=
## =-=-=-=-= Iteration 8 Fri Jun  7 08:39:45 2024 =-=-=-=-=
## =-=-=-=-= Iteration 9 Fri Jun  7 08:39:45 2024 =-=-=-=-=

PlotBstsCoefficients(bsts_model)

畫圖：

result = CausalImpact(bsts.model = bsts_model, 
                      post.period.response = post_period_dtu)
summary(result)

## Posterior inference {CausalImpact}
## 
##                          Average        Cumulative      
## Actual                   3642           156610          
## Prediction (s.d.)        3492 (NA)      150141 (NA)     
## 95% CI                   [3495, 3495]   [150267, 150267]
##                                                         
## Absolute effect (s.d.)   150 (NA)       6469 (NA)       
## 95% CI                   [148, 148]     [6343, 6343]    
##                                                         
## Relative effect (s.d.)   4.2% (NA%)     4.2% (NA%)      
## 95% CI                   [4.2%, 4.2%]   [4.2%, 4.2%]    
## 
## Posterior tail-area probability p:   0.5
## Posterior prob. of a causal effect:  50%
## 
## For more details, type: summary(impact, "report")

plot(result)

Figure 3.3: Causal Impact

3.4 小結

BSTS 需要三種資料、先驗資訊，可以很快用套件估計。
另外一個實例。

4 Matching

第二種方法是配對。如果在兩個時間點配對兩個觀察值，就可以估計兩個時間點之間的因果關係(DiD)。
已經有PanelMatch這個套件執行以上的方法。
詳細見Erik Wang演講。

4.1 PanelMatch套件

例子：民主與經濟成長的關係(Acemoglu et al., 2019)
- 估計年度以及鄉鎮的固定效果(FE)模型。
- 刺激變數是民主與否。
- 用PanelMatch套件顯示資料結構，紅色代表民主的狀態，藍色代表非民主的狀態：

library(PanelMatch)
DisplayTreatment(unit.id = "wbcode2",
                     time.id = "year", legend.position = "none",
                     xlab = "year", ylab = "Country Code",
                     treatment = "dem", data = dem)

Figure 4.1: 民主與否的資料

圖4.1顯示，有些國家很早民主化，有些國家很晚。有些國家從非民主化轉型為民主化，有些國家則從民主化回到威權政體。
可以考慮 within-unit over-time variation, within-time across-units variation, variation in the timing of treatment.
FE模型的係數跟difference-in-difference(DiD)的估計結果不對等，除非只有兩個時間點，但是我們的資料可能橫跨許多時間點。

4.2 估計

在這個模型中，我們允許\(L=2,\ldots\)，代表接收刺激後效果延遲出現的時間點可以至少有2。同時允許\(F=0,1,\ldots\)，也就是效果發生後，對於往後幾年的影響。
當政策\(X\)從0變成1時的平均效果ATT的表示方式：

\[ \delta(F, L)=\left\{Y_{i,t+F}\left(X_{it}=1,X_{i,t+1}=0,\{X_{i,t-\ell}\}_{\ell=2}^L\right)-Y_{i,t+F}\left(X_{it}=0,X_{i,t-1}=0,\{X_{i,t-\ell}\}_{\ell=2}^L\right)\mid X_{it}=1,X_{i,t-1}=0\right\} \]

假設沒有外溢效果(spillover effect)、在L時期有延遲效果(carryover effect)、有條件的平行趨勢(parallel trend after conditioning)

\[ \mathbb{E}[Y_{i,t+F}\left(X_{it}=X_{i,t-1}=1,\{X_{i,t-\ell}\}_{\ell=2}^L\right)-Y_{i,t-1}\mid X_{it}=0,X_{i,t-1}=1,\{X_{i,t-\ell},Y_{i,t-\ell}\}_{\ell=2}^L,\{\textrm{Z}_{i,t-\ell}\}_{\ell=0}^L]\\ =\mathbb{E}[Y_{i,t+F}\left(X_{it}=X_{i,t-1}=1,\{X_{i,t-\ell}\}_{\ell=2}^L\right)-Y_{i,t-1}\mid X_{it}=1,X_{i,t-1}=0,\{X_{i,t-\ell},Y_{i,t-\ell}\}_{\ell=2}^L,\{\textrm{Z}_{i,t-\ell}\}_{\ell=0}^L] \]

過去的結果以及相關特徵成立的條件下，不論過去X是1或0，現在X變成1之後，Y的平均差異相同。
ART表示從刺激的反轉，也就是從民主轉為非民主所造成的平均差異。

4.3 配對

配對在不同階段\(x_{it}=1,x_{i,t-1}=0\)的觀察值，建立一個實驗觀察值的集合(matched set)。
找出從\(t-L\)到\(t-1\)時間點有相同刺激歷史(treatment history)的控制觀察值(control units)，寫成：

\[ \mathcal{M}_{it}=\{\textsf{i}^{\prime}:\textsf{i}^{\prime}\neq \textsf{i}, X_{\textsf{i}^{\prime}t^{\prime}}=0, X_{\textsf{i}^{\prime}t^{\prime}}=X_{it^{\prime}}\hspace{0.5em}\text{for}\hspace{0.5em}\text{all}\hspace{0.5em}t^{\prime}=t-1,\ldots t-L\} \]

在這張圖4.2中，紅色或灰色圈圈代表受到刺激的觀察值，三角形代表控制的觀察值，長方形代表有同樣的實驗歷程。

Figure 4.2: 配對觀察值(Imai, Kim, and Wang, 2023)

如果找不到相對應的控制觀察值，就剔除實驗的觀察值。
多階段的差異效果估計式(Multi-period DiD estimator):

\[ \hat{\delta}_{att}(F,L)=\frac{1}{\sum_{t=t+1}^{T-F}D_{it}}\sum_{i=1}^{N}\sum_{t=L+1}^{T-F}D_{it}\Big\{(Y_{i,t+F}-Y_{i,t-1})-\sum_{i\in\mathcal{M}_{it}}w_{it}^{\textsf{i}^{\prime}}(Y_{\textsf{i}^{\prime},t+F}-Y_{\textsf{i}^{\prime},t-1})\Big\} \]

其中，\((Y_{i,t+F}-Y_{i,t-1})\)是first difference，代表被刺激的觀察值在刺激前後的差異。
\(Y_{\textsf{i}^{\prime},t+F}-Y_{\textsf{i}^{\prime},t-1}\)是second difference，代表在配對集合中，被刺激的觀察值在刺激前後的差異。\(w_{it}^{\textsf{i}^{\prime}}\)是改善配對的加權。所以這兩者合起來是加權平均的差異。

4.4 操作步驟

首先建立配對觀察值：

# get a matched set without refinement
sets0 <- PanelMatch(lag = 4, time.id = "year", unit.id = "wbcode2",
                    treatment = "dem", refinement.method = "none",
                    data = dem, match.missing = FALSE,
                    covs.formula = ~ I(lag(y, 1:4)) + I(lag(tradewb, 1:4)),
                        size.match = 5, qoi = "att",
                        outcome.var = "y",
                        lead = 0:4, forbid.treatment.reversal = FALSE)

修正後的配對觀察值：

# get a matched set with refinement using CBPS.match, setting the
    # size of matched set to 5
    sets1 <- PanelMatch(lag = 4, time.id = "year", unit.id = "wbcode2",
                        treatment = "dem", refinement.method = "mahalanobis",
                        data = dem, match.missing = FALSE,
                     covs.formula = ~ I(lag(y, 1:4)) + I(lag(tradewb, 1:4)),
                        size.match = 5, qoi = "att",
                        outcome.var = "y",
                        lead = 0:4, forbid.treatment.reversal = FALSE)


# get another matched set with refinement using CBPS.weight
sets2 <- PanelMatch(lag = 4, time.id = "year", unit.id = "wbcode2",
                    treatment = "dem", refinement.method = "ps.weight",
                    data = dem, match.missing = FALSE,
                    covs.formula = ~ I(lag(y, 1:4)) + I(lag(tradewb, 1:4)),
                    size.match = 10, qoi = "att",
                    outcome.var = "y",
                    lead = 0:4, forbid.treatment.reversal = FALSE)

散佈圖顯示修正前後的差別：

# use the function to produce the scatter plot
balance_scatter(non_refined_set = sets0$att,
              matched_set_list = list(sets1$att, sets2$att),
              data = dem,
              covariates = c("y", "tradewb"))
# add legend
legend(x = 0, y = 0.8,
legend = c("mahalanobis",
           "PS weighting"),
y.intersp = 0.65,
x.intersp = 0.3,
xjust = 0,
pch = c(1, 3), pt.cex = 1,
bty = "n", ncol = 1, cex = 1, bg = "white")

加上一些模擬資料，計算特徵變項在配對前後的差異：

#add some additional data to data set
dem$rdata <- runif(runif(nrow(dem)))
pm.obj <- PanelMatch(lead = 0:3, lag = 4, time.id = "year", unit.id = "wbcode2", treatment = "dem",
                        outcome.var ="y", refinement.method = "mahalanobis",
data = dem, match.missing = TRUE,
covs.formula = ~ tradewb + rdata + I(lag(tradewb, 1:4)) + I(lag(y, 1:4)),
                        size.match = 5, qoi = "att")
    get_covariate_balance(pm.obj$att, dem, covariates = c("tradewb", "rdata"),
                             ylim = c(-2,2))

##      tradewb      rdata
## t_4  0.03667 -0.0009159
## t_3 -0.03548  0.1417359
## t_2 -0.02535  0.0104848
## t_1  0.07292  0.0487455

    get_covariate_balance(pm.obj$att, dem, covariates = c("tradewb", "rdata"),
                             plot = TRUE, ylim = c(-2,2))

建立待分析的資料並且加權：

PM.results <- PanelMatch(lag = 4, time.id = "year", unit.id = "wbcode2",
                             treatment = "dem", refinement.method = "mahalanobis",
                             data = dem, match.missing = TRUE,
                             covs.formula = ~ I(lag(tradewb, 1:4)) + I(lag(y, 1:4)),
                             size.match = 5, qoi = "att",
outcome.var = "y", lead = 0:4, forbid.treatment.reversal = FALSE)
#print(PM.results$att)

畫圖：

plot(PM.results$att)

畫圖：

plot(PM.results$att, include.empty.sets = TRUE)

估計並且畫圖4.3表示不同時間點的平均差異:

PM.results <- PanelMatch(lag = 4, time.id = "year", unit.id = "wbcode2",
                   treatment = "dem", refinement.method = "mahalanobis",
                             data = dem, match.missing = TRUE,
                   covs.formula = ~ I(lag(tradewb, 1:4)) + I(lag(y, 1:4)),
                             size.match = 5, qoi = "att",
outcome.var = "y", lead = 0:4, forbid.treatment.reversal = TRUE) 
PE.results <- PanelEstimate(sets = PM.results, data = dem, 
                            number.iterations = 100)
summary(PE.results)

## Weighted Difference-in-Differences with Mahalanobis Distance
## Matches created with 4 lags
## 
## Standard errors computed with 100 Weighted bootstrap samples
## 
## Estimate of Average Treatment Effect on the Treated (ATT) by Period:

## $summary
##     estimate std.error   2.5%  97.5%
## t+0  -0.9685    0.9545 -2.695 0.9333
## t+1  -1.2142    1.7868 -4.761 2.5191
## t+2  -0.6130    2.2185 -5.102 4.0725
## t+3   0.9730    2.4839 -3.875 6.3687
## t+4   1.4794    2.6956 -3.483 6.5741
## 
## $lag
## [1] 4
## 
## $iterations
## [1] 100
## 
## $qoi
## [1] "att"

plot(PE.results)

Figure 4.3: ATT估計

估計並且畫圖??表示不同時間點的平均差異:

PM.results <- PanelMatch(lag = 4, time.id = "year", unit.id = "wbcode2",
                   treatment = "dem", refinement.method = "mahalanobis",
                             data = dem, match.missing = TRUE,
                   covs.formula = ~ I(lag(tradewb, 1:4)) + I(lag(y, 1:4)),
                             size.match = 10, qoi = "att",
outcome.var = "y", lead = 0:4, forbid.treatment.reversal = TRUE) 
PE.results <- PanelEstimate(sets = PM.results, data = dem, 
                            number.iterations = 100)
summary(PE.results)

## Weighted Difference-in-Differences with Mahalanobis Distance
## Matches created with 4 lags
## 
## Standard errors computed with 100 Weighted bootstrap samples
## 
## Estimate of Average Treatment Effect on the Treated (ATT) by Period:

## $summary
##     estimate std.error   2.5%  97.5%
## t+0  -1.3886     1.014 -3.202 0.5747
## t+1  -1.6657     1.640 -4.330 1.6831
## t+2  -0.9682     2.107 -4.255 3.1408
## t+3   0.2769     2.358 -3.727 4.8725
## t+4   0.8332     2.569 -4.237 5.5699
## 
## $lag
## [1] 4
## 
## $iterations
## [1] 100
## 
## $qoi
## [1] "att"

plot(PE.results)

4.5 小結

透過具有因果推論基礎的配對，不需要貝氏統計的假設，可以估計時間序列中的因果關係。
\(\texttt{PanelMatch}\)套件操作容易，但是要花時間配對。

5 更新講義時間

## 最後更新時間: 2024-06-07 08:39:56

社會科學統計方法

時間序列與因果關係

蔡佳泓

5/31/2024