R语言进行断点回归分析常用的包:rdrobust

vote:表示某次选举民主党在州参议院的席位占比,值从0到100,是RD分析中结局变量 margin:表示上次选举中获得相同参议院席位的边际收益,其中大于0表示民主党胜出,反之则为失败。 利用RDD估计民主党赢得参议院席位对其在下次选举中获得该席位的选票份额的影响

install.packages("rdrobust",repos = "http://cran.us.r-project.org")
## 程序包'rdrobust'打开成功,MD5和检查也通过
## 
## 下载的二进制程序包在
##  C:\Users\zhuoshen\AppData\Local\Temp\Rtmpk3GLs4\downloaded_packages里
install.packages("rddensity",repos = "http://cran.us.r-project.org")
## 程序包'rddensity'打开成功,MD5和检查也通过
## 
## 下载的二进制程序包在
##  C:\Users\zhuoshen\AppData\Local\Temp\Rtmpk3GLs4\downloaded_packages里
install.packages("tidyverse",repos = "http://cran.us.r-project.org")
## 程序包'tidyverse'打开成功,MD5和检查也通过
## 
## 下载的二进制程序包在
##  C:\Users\zhuoshen\AppData\Local\Temp\Rtmpk3GLs4\downloaded_packages里
install.packages("stargazer",repos = "http://cran.us.r-project.org")
## 程序包'stargazer'打开成功,MD5和检查也通过
## 
## 下载的二进制程序包在
##  C:\Users\zhuoshen\AppData\Local\Temp\Rtmpk3GLs4\downloaded_packages里
library(rdrobust)
library(rddensity)
library(magrittr)
library(ggplot2)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ tidyr::extract()   masks magrittr::extract()
## ✖ dplyr::filter()    masks stats::filter()
## ✖ dplyr::lag()       masks stats::lag()
## ✖ purrr::set_names() masks magrittr::set_names()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(stargazer)
## 
## Please cite as: 
## 
##  Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
data(rdrobust_RDsenate)
head(rdrobust_RDsenate)
##        margin     vote
## 1  -7.6885610 36.09757
## 2  -3.9237082 45.46875
## 3  -6.8686604 45.59821
## 4 -27.6680565 48.47606
## 5  -8.2569685 51.74687
## 6   0.7324815 39.80264

检验驱动变量是否受到操控

驱动变量的直方图,若大致呈现光滑曲线,无明显的断点,说明不受到操控。:

或者使用也可以利用rddenstiy包中的rddenstiy函数画密度图,

输出结果中,最下面一行robust的p值大于0.05,就说明驱动变量未受到操纵。

## $Estl
## Call: lpdensity
## 
## Sample size                                      640
## Polynomial order for point estimation    (p=)    2
## Order of derivative estimated            (v=)    1
## Polynomial order for confidence interval (q=)    3
## Kernel function                                  triangular
## Scaling factor                                   0.460763138948884
## Bandwidth method                                 user provided
## 
## Use summary(...) to show estimates.
## 
## $Estr
## Call: lpdensity
## 
## Sample size                                      750
## Polynomial order for point estimation    (p=)    2
## Order of derivative estimated            (v=)    1
## Polynomial order for confidence interval (q=)    3
## Kernel function                                  triangular
## Scaling factor                                   0.539956803455723
## Bandwidth method                                 user provided
## 
## Use summary(...) to show estimates.
## 
## $Estplot

画驱动变量与干预的散点图,判断为sharp还是fuzzy类型

可见本例断点前后干预率0-1,基本可判断为sharp类型

rdrobust_RDsenate$treatment <- ifelse(rdrobust_RDsenate$margin < 0, 0, 1)
rdrobust_RDsenate$color <- ifelse(rdrobust_RDsenate$margin < 0, 'blue', 'red')

画驱动变量与结果变量的关系图

散点图初步判定关系

拟合图

将驱动变量分成若干宽度(width)相等的区间(bin),但不能存在骑跨断点,断点两侧的区间数量可不相等 选择合适的区间和带宽。可以采用rdrobust包中的rdplot函数进行判断

## Call: rdplot
## 
## Number of Obs.                 1297
## Kernel                      Uniform
## 
## Number of Obs.                  595             702
## Eff. Number of Obs.             595             702
## Order poly. fit (p)               4               4
## BW poly. fit (h)            100.000         100.000
## Number of bins scale              1               1
## 
## Bins Selected                    20              20
## Average Bin Length            5.000           5.000
## Median Bin Length             5.000           5.000
## 
## IMSE-optimal bins                 8               9
## Mimicking Variance bins          15              35
## 
## Relative to IMSE-optimal:
## Implied scale                 2.500           2.222
## WIMSE variance weight         0.060           0.084
## WIMSE bias weight             0.940           0.916

进一步缩小样本范围(-50,50),提升拟合效果

全局参数估计

rdrobust_RDsenate$margin_del <- rdrobust_RDsenate$margin - 0

#linear

fit_1 <- lm(vote ~ margin_del + treatment, data = rdrobust_RDsenate) 

# linear interaction

fit_2 <- lm(vote ~ margin_del * treatment, data = rdrobust_RDsenate) 

# quadratic

fit_3 <- lm(vote ~ margin_del + I(margin_del ^ 2) + treatment, data = rdrobust_RDsenate) 

# quadratic interaction

fit_4 <- lm(vote ~ (margin_del + I(margin_del ^ 2)) * treatment, data = rdrobust_RDsenate) 

#cubic

fit_5 <- lm(vote ~ margin_del + I(margin_del ^ 2) + I(margin_del ^ 3) + treatment, data = rdrobust_RDsenate)

#cubic interaction

fit_6 <- lm(vote ~ (margin_del + I(margin_del ^ 2) + I(margin_del ^ 3)) * treatment, data = rdrobust_RDsenate) 

stargazer::stargazer(fit_1, fit_2, fit_3, fit_4, fit_5, fit_6, type = 'html', style = 'all')#输出表格形式的结果
## 
## <table style="text-align:center"><tr><td colspan="7" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td colspan="6"><em>Dependent variable:</em></td></tr>
## <tr><td></td><td colspan="6" style="border-bottom: 1px solid black"></td></tr>
## <tr><td style="text-align:left"></td><td colspan="6">vote</td></tr>
## <tr><td style="text-align:left"></td><td>(1)</td><td>(2)</td><td>(3)</td><td>(4)</td><td>(5)</td><td>(6)</td></tr>
## <tr><td colspan="7" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">margin_del</td><td>0.348<sup>***</sup></td><td>0.216<sup>***</sup></td><td>0.285<sup>***</sup></td><td>0.421<sup>***</sup></td><td>0.337<sup>***</sup></td><td>0.285<sup>*</sup></td></tr>
## <tr><td style="text-align:left"></td><td>(0.013)</td><td>(0.028)</td><td>(0.017)</td><td>(0.069)</td><td>(0.028)</td><td>(0.151)</td></tr>
## <tr><td style="text-align:left"></td><td>t = 26.078</td><td>t = 7.803</td><td>t = 16.873</td><td>t = 6.133</td><td>t = 12.175</td><td>t = 1.884</td></tr>
## <tr><td style="text-align:left"></td><td>p = 0.000</td><td>p = 0.000</td><td>p = 0.000</td><td>p = 0.000</td><td>p = 0.000</td><td>p = 0.060</td></tr>
## <tr><td style="text-align:left">I(margin_del2)</td><td></td><td></td><td>0.001<sup>***</sup></td><td>0.003<sup>***</sup></td><td>0.001<sup>***</sup></td><td>-0.002</td></tr>
## <tr><td style="text-align:left"></td><td></td><td></td><td>(0.0002)</td><td>(0.001)</td><td>(0.0002)</td><td>(0.005)</td></tr>
## <tr><td style="text-align:left"></td><td></td><td></td><td>t = 6.016</td><td>t = 3.254</td><td>t = 6.325</td><td>t = -0.349</td></tr>
## <tr><td style="text-align:left"></td><td></td><td></td><td>p = 0.000</td><td>p = 0.002</td><td>p = 0.000</td><td>p = 0.727</td></tr>
## <tr><td style="text-align:left">I(margin_del3)</td><td></td><td></td><td></td><td></td><td>-0.00001<sup>**</sup></td><td>-0.00004</td></tr>
## <tr><td style="text-align:left"></td><td></td><td></td><td></td><td></td><td>(0.00000)</td><td>(0.00004)</td></tr>
## <tr><td style="text-align:left"></td><td></td><td></td><td></td><td></td><td>t = -2.386</td><td>t = -1.011</td></tr>
## <tr><td style="text-align:left"></td><td></td><td></td><td></td><td></td><td>p = 0.018</td><td>p = 0.313</td></tr>
## <tr><td style="text-align:left">treatment</td><td>4.785<sup>***</sup></td><td>6.044<sup>***</sup></td><td>6.663<sup>***</sup></td><td>4.935<sup>***</sup></td><td>5.238<sup>***</sup></td><td>7.319<sup>***</sup></td></tr>
## <tr><td style="text-align:left"></td><td>(0.923)</td><td>(0.942)</td><td>(0.963)</td><td>(1.256)</td><td>(1.131)</td><td>(1.593)</td></tr>
## <tr><td style="text-align:left"></td><td>t = 5.184</td><td>t = 6.414</td><td>t = 6.922</td><td>t = 3.929</td><td>t = 4.630</td><td>t = 4.595</td></tr>
## <tr><td style="text-align:left"></td><td>p = 0.00000</td><td>p = 0.000</td><td>p = 0.000</td><td>p = 0.0001</td><td>p = 0.00001</td><td>p = 0.00001</td></tr>
## <tr><td style="text-align:left">margin_del:treatment</td><td></td><td>0.170<sup>***</sup></td><td></td><td>-0.095</td><td></td><td>-0.229</td></tr>
## <tr><td style="text-align:left"></td><td></td><td>(0.032)</td><td></td><td>(0.087)</td><td></td><td>(0.195)</td></tr>
## <tr><td style="text-align:left"></td><td></td><td>t = 5.406</td><td></td><td>t = -1.088</td><td></td><td>t = -1.175</td></tr>
## <tr><td style="text-align:left"></td><td></td><td>p = 0.00000</td><td></td><td>p = 0.277</td><td></td><td>p = 0.241</td></tr>
## <tr><td style="text-align:left">I(margin_del2):treatment</td><td></td><td></td><td></td><td>-0.002<sup>**</sup></td><td></td><td>0.010<sup>*</sup></td></tr>
## <tr><td style="text-align:left"></td><td></td><td></td><td></td><td>(0.001)</td><td></td><td>(0.006)</td></tr>
## <tr><td style="text-align:left"></td><td></td><td></td><td></td><td>t = -2.242</td><td></td><td>t = 1.770</td></tr>
## <tr><td style="text-align:left"></td><td></td><td></td><td></td><td>p = 0.026</td><td></td><td>p = 0.077</td></tr>
## <tr><td style="text-align:left">I(margin_del3):treatment</td><td></td><td></td><td></td><td></td><td></td><td>-0.00002</td></tr>
## <tr><td style="text-align:left"></td><td></td><td></td><td></td><td></td><td></td><td>(0.00004)</td></tr>
## <tr><td style="text-align:left"></td><td></td><td></td><td></td><td></td><td></td><td>t = -0.411</td></tr>
## <tr><td style="text-align:left"></td><td></td><td></td><td></td><td></td><td></td><td>p = 0.682</td></tr>
## <tr><td style="text-align:left">Constant</td><td>47.331<sup>***</sup></td><td>44.904<sup>***</sup></td><td>45.486<sup>***</sup></td><td>46.740<sup>***</sup></td><td>45.997<sup>***</sup></td><td>46.029<sup>***</sup></td></tr>
## <tr><td style="text-align:left"></td><td>(0.542)</td><td>(0.699)</td><td>(0.616)</td><td>(0.896)</td><td>(0.651)</td><td>(1.138)</td></tr>
## <tr><td style="text-align:left"></td><td>t = 87.340</td><td>t = 64.221</td><td>t = 73.798</td><td>t = 52.153</td><td>t = 70.614</td><td>t = 40.457</td></tr>
## <tr><td style="text-align:left"></td><td>p = 0.000</td><td>p = 0.000</td><td>p = 0.000</td><td>p = 0.000</td><td>p = 0.000</td><td>p = 0.000</td></tr>
## <tr><td colspan="7" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>1,297</td><td>1,297</td><td>1,297</td><td>1,297</td><td>1,297</td><td>1,297</td></tr>
## <tr><td style="text-align:left">R<sup>2</sup></td><td>0.578</td><td>0.587</td><td>0.590</td><td>0.591</td><td>0.591</td><td>0.593</td></tr>
## <tr><td style="text-align:left">Adjusted R<sup>2</sup></td><td>0.577</td><td>0.586</td><td>0.589</td><td>0.590</td><td>0.590</td><td>0.591</td></tr>
## <tr><td style="text-align:left">Residual Std. Error</td><td>11.781 (df = 1294)</td><td>11.654 (df = 1293)</td><td>11.624 (df = 1293)</td><td>11.609 (df = 1291)</td><td>11.603 (df = 1292)</td><td>11.587 (df = 1289)</td></tr>
## <tr><td style="text-align:left">F Statistic</td><td>886.433<sup>***</sup> (df = 2; 1294) (p = 0.000)</td><td>613.586<sup>***</sup> (df = 3; 1293) (p = 0.000)</td><td>619.091<sup>***</sup> (df = 3; 1293) (p = 0.000)</td><td>373.392<sup>***</sup> (df = 5; 1291) (p = 0.000)</td><td>467.426<sup>***</sup> (df = 4; 1292) (p = 0.000)</td><td>268.726<sup>***</sup> (df = 7; 1289) (p = 0.000)</td></tr>
## <tr><td colspan="7" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td colspan="6" style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr>
## </table>
#比较AIC或者回归残差,较小者模型较优。

AIC(fit_1, fit_2, fit_3, fit_4, fit_5, fit_6)
##       df      AIC
## fit_1  4 10083.70
## fit_2  5 10056.71
## fit_3  5 10049.89
## fit_4  7 10048.72
## fit_5  6 10046.19
## fit_6  9 10045.74

局部非参数估计

利用rdrobust包的rdrobust函数进行拟合。 局部估计可直接利用rdrobust函数中的bdselect参数指定选择最优带宽的方法。 带宽选择还可以先利用rdbwselect函数计算出最优带宽,然后用rdrobust函数中的h参数手动指定左右的带宽。

rdbwselect方法

结果中前缀mse更适用于进行点估计的带宽选择,而cer更适合区间估计的带宽选择。后缀rd指两侧区间相等,而two则不相等结果中h即为选择的左右最优带宽,而b给出的是用来进行敏感性分析时应该考虑的带宽。

rdbwselect(y = rdrobust_RDsenate$vote, x = rdrobust_RDsenate$margin, all = TRUE) %>% summary()
## Call: rdbwselect
## 
## Number of Obs.                 1297
## BW type                         All
## Kernel                   Triangular
## VCE method                       NN
## 
## Number of Obs.                  595          702
## Order est. (p)                    1            1
## Order bias  (q)                   2            2
## Unique Obs.                     595          665
## 
## =======================================================
##                   BW est. (h)    BW bias (b)
##             Left of c Right of c  Left of c Right of c
## =======================================================
##      mserd    17.754     17.754     28.028     28.028
##     msetwo    16.170     18.126     27.104     29.344
##     msesum    18.365     18.365     31.319     31.319
##   msecomb1    17.754     17.754     28.028     28.028
##   msecomb2    17.754     18.126     28.028     29.344
##      cerrd    12.407     12.407     28.028     28.028
##     certwo    11.299     12.667     27.104     29.344
##     cersum    12.834     12.834     31.319     31.319
##   cercomb1    12.407     12.407     28.028     28.028
##   cercomb2    12.407     12.667     28.028     29.344
## =======================================================
#拟合,c用来指定cut-point,p用来指定局部加权回归的多项式幂次,kernel为选择的核函数

loc_fit_1 <- rdrobust(rdrobust_RDsenate$vote, rdrobust_RDsenate$margin, c = 0, p = 1,
                      kernel = 'triangular', bwselect = 'mserd',h=17.754)# 
#如果有混杂因素需要控制,可以用covs = c('var1', 'var2')

loc_fit_2 <- rdrobust(rdrobust_RDsenate$vote, rdrobust_RDsenate$margin, c = 0, p = 2,
                      kernel = 'triangular', bwselect = 'msetwo',h=17.754)

summary(loc_fit_1)
## Sharp RD estimates using local polynomial regression.
## 
## Number of Obs.                 1297
## BW type                      Manual
## Kernel                   Triangular
## VCE method                       NN
## 
## Number of Obs.                  595          702
## Eff. Number of Obs.             360          323
## Order est. (p)                    1            1
## Order bias  (q)                   2            2
## BW est. (h)                  17.754       17.754
## BW bias (b)                  17.754       17.754
## rho (h/b)                     1.000        1.000
## Unique Obs.                     595          702
## 
## =============================================================================
##         Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]       
## =============================================================================
##   Conventional     7.414     1.459     5.083     0.000     [4.555 , 10.273]    
##         Robust         -         -     4.030     0.000     [4.274 , 12.368]    
## =============================================================================
summary(loc_fit_2)#结果中coef即为效应值
## Sharp RD estimates using local polynomial regression.
## 
## Number of Obs.                 1297
## BW type                      Manual
## Kernel                   Triangular
## VCE method                       NN
## 
## Number of Obs.                  595          702
## Eff. Number of Obs.             360          323
## Order est. (p)                    2            2
## Order bias  (q)                   3            3
## BW est. (h)                  17.754       17.754
## BW bias (b)                  17.754       17.754
## rho (h/b)                     1.000        1.000
## Unique Obs.                     595          702
## 
## =============================================================================
##         Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]       
## =============================================================================
##   Conventional     8.321     2.065     4.030     0.000     [4.274 , 12.368]    
##         Robust         -         -     4.167     0.000     [6.017 , 16.704]    
## =============================================================================

对cut-point附近个体的敏感性分析

对称剔除真实断点左右一定较小范围内的个体。 关于范围的大小同样没有明确的标准,通常选取多个范围比较即可。

rdrobust_RD_hole_1 <- subset(rdrobust_RDsenate, abs(margin) > 0.3)
rdrobust_RD_hole_2 <- subset(rdrobust_RDsenate, abs(margin) > 0.4)
sen_hole_1 <- rdrobust(rdrobust_RD_hole_1$vote, rdrobust_RD_hole_1$margin, c = 0, p = 2,
                       kernel = 'triangular', bwselect = 'msetwo') # #除c意外的其他参数应该与前面分析保持一致
sen_hole_2 <- rdrobust(rdrobust_RD_hole_2$vote, rdrobust_RD_hole_2$margin, c = 0, p = 2,
                       kernel = 'triangular', bwselect = 'msetwo') # #除c意外的其他参数应该与前面分析保持一致
summary(sen_hole_1)
## Sharp RD estimates using local polynomial regression.
## 
## Number of Obs.                 1288
## BW type                      msetwo
## Kernel                   Triangular
## VCE method                       NN
## 
## Number of Obs.                  592          696
## Eff. Number of Obs.             408          360
## Order est. (p)                    2            2
## Order bias  (q)                   3            3
## BW est. (h)                  22.386       21.767
## BW bias (b)                  34.724       33.145
## rho (h/b)                     0.645        0.657
## Unique Obs.                     592          659
## 
## =============================================================================
##         Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]       
## =============================================================================
##   Conventional     7.247     1.984     3.653     0.000     [3.358 , 11.135]    
##         Robust         -         -     3.314     0.001     [2.981 , 11.614]    
## =============================================================================
summary(sen_hole_2)
## Sharp RD estimates using local polynomial regression.
## 
## Number of Obs.                 1282
## BW type                      msetwo
## Kernel                   Triangular
## VCE method                       NN
## 
## Number of Obs.                  590          692
## Eff. Number of Obs.             396          360
## Order est. (p)                    2            2
## Order bias  (q)                   3            3
## BW est. (h)                  21.721       22.272
## BW bias (b)                  33.862       34.014
## rho (h/b)                     0.641        0.655
## Unique Obs.                     590          655
## 
## =============================================================================
##         Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]       
## =============================================================================
##   Conventional     6.988     2.093     3.338     0.001     [2.885 , 11.091]    
##         Robust         -         -     3.009     0.003     [2.450 , 11.607]    
## =============================================================================

对带宽bandwidth的敏感性分析

#更换不同的带宽即可
loc_fit_3 <- rdrobust(rdrobust_RDsenate$vote, rdrobust_RDsenate$margin, c = 0, p = 2,
                      kernel = 'triangular', bwselect = 'mserd',h=28.028)# #如果有混杂因素需要控制,可以用covs = c('var1', 'var2')
loc_fit_4 <- rdrobust(rdrobust_RDsenate$vote, rdrobust_RDsenate$margin, c = 0, p = 2,
                      kernel = 'triangular', bwselect = 'msetwo',h=20.086)

summary(loc_fit_3)
## Sharp RD estimates using local polynomial regression.
## 
## Number of Obs.                 1297
## BW type                      Manual
## Kernel                   Triangular
## VCE method                       NN
## 
## Number of Obs.                  595          702
## Eff. Number of Obs.             465          437
## Order est. (p)                    2            2
## Order bias  (q)                   3            3
## BW est. (h)                  28.028       28.028
## BW bias (b)                  28.028       28.028
## rho (h/b)                     1.000        1.000
## Unique Obs.                     595          702
## 
## =============================================================================
##         Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]       
## =============================================================================
##   Conventional     7.418     1.717     4.320     0.000     [4.052 , 10.783]    
##         Robust         -         -     4.113     0.000     [4.739 , 13.369]    
## =============================================================================
summary(loc_fit_4)
## Sharp RD estimates using local polynomial regression.
## 
## Number of Obs.                 1297
## BW type                      Manual
## Kernel                   Triangular
## VCE method                       NN
## 
## Number of Obs.                  595          702
## Eff. Number of Obs.             391          346
## Order est. (p)                    2            2
## Order bias  (q)                   3            3
## BW est. (h)                  20.086       20.086
## BW bias (b)                  20.086       20.086
## rho (h/b)                     1.000        1.000
## Unique Obs.                     595          702
## 
## =============================================================================
##         Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]       
## =============================================================================
##   Conventional     8.157     1.954     4.175     0.000     [4.327 , 11.986]    
##         Robust         -         -     3.965     0.000     [5.189 , 15.332]    
## =============================================================================

原文出处:https://blog.csdn.net/2301_79584199/article/details/139248209