Introduction

Regression Discontinuity Design (RDD) has exploded onto the causal inference scene. Despite being around for decades, it wasn’t until the early 2000s that its use proliferated. As the chart above shows, the use of RDD in published studies took off after 1999, and not surprisingly Angrist played a role. The main reason for RDD’s popularity is its ability to handle selection bias.

RDD is used in situations where a cutoff or threshold separates treatment from control. Fortunately, society has lots of arbitrary thresholds, such as GPA or test scores for school admission, blood alcohol limits, or body mass index cutoffs that determine medical treatment for COVID.

The main underpinning of RDD is the continuity assumption. To satisfy it, the expected outcomes must follow a continuous trend through the cutoff. Below we simulate data and then place a threshold at 50 to visualize the data on either side of it. As shown in the resulting plot, the trend is continuous ABSENT treatment. Throughout this example, Y can be thought of as a student’s peak career salary and X as an important test score. Under this assumption, the data just above and just below the cutoff are as good as randomized.
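The continuity idea can be sketched in a few lines of base R. This is a minimal illustration, not the original simulation: the sample size, slope, noise level, and the treatment effect of 40 are all assumptions picked for the example. It builds an outcome that trends smoothly in the test score, then adds a jump for everyone above 50 and estimates the discontinuity in both cases.

```r
# Sketch: continuity absent treatment, then a jump once treatment is added.
# All names and numbers are illustrative assumptions.
set.seed(1234)

n <- 2000
x <- rnorm(n, mean = 50, sd = 25)   # test score (running variable)
x <- x[x > 0 & x < 100]             # keep scores on a 0-100 scale
d <- as.numeric(x > 50)             # 1 if the score clears the cutoff at 50

# Outcome without treatment: a smooth, continuous trend in x
y0 <- 25 + 1.5 * x + rnorm(length(x), sd = 20)
# Outcome when clearing the cutoff raises peak salary by 40 units (assumed)
y1 <- y0 + 40 * d

# Centre the score at the cutoff so the coefficient on d is the jump at x = 50,
# and let the slope differ on each side of the cutoff
xc <- x - 50
fit_no_treatment   <- lm(y0 ~ d * xc)
fit_with_treatment <- lm(y1 ~ d * xc)

jump_no_treatment   <- unname(coef(fit_no_treatment)["d"])
jump_with_treatment <- unname(coef(fit_with_treatment)["d"])
print(c(no_treatment = jump_no_treatment, with_treatment = jump_with_treatment))

# Optional picture of the continuous trend (only drawn in an interactive session)
if (interactive()) {
  plot(x, y0, pch = 20, col = ifelse(d == 1, "steelblue", "grey40"),
       xlab = "Test score (X)", ylab = "Peak salary (Y), no treatment")
  abline(v = 50, lty = 2)
}
```

With these made-up numbers, the first estimate should land close to zero and the second close to 40, up to sampling noise. That is the continuity assumption at work: any jump at the cutoff is attributed to treatment because nothing else changes abruptly there.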

This walkthrough uses several packages, many of which may need to be installed first.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.0      ✔ stringr 1.4.1 
## ✔ readr   2.1.2      ✔ forcats 0.5.1 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(haven)
library(estimatr)
library(stats)
library(rdrobust)
library(rddensity)
library(rdd)
## Loading required package: sandwich
## Loading required package: lmtest
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## Loading required package: AER
## Loading required package: car
## Loading required package: carData
## 
## Attaching package: 'car'
## 
## The following object is masked from 'package:dplyr':
## 
##     recode
## 
## The following object is masked from 'package:purrr':
## 
##     some
## 
## Loading required package: survival
## Loading required package: Formula

Data and Framing the Question

This post replicates an analysis in political economy by Lee, Moretti, and Butler from their 2004 piece “Do Voters Affect or Elect Policies? Evidence from the U.S. House”. Their research addressed a major question about the role of elections in candidates’ legislative voting behavior. Does competition for votes in an election drive candidates towards the center, forcing policy compromises (Convergence Theory)? Or are voters simply electing policies, with the election being the process by which a policy option is chosen (Divergence Theory)?

Their findings support the latter. To examine this, they analyzed data from US House races from 1946 to 1995. Using this data, they estimate the effect of a Democratic candidate’s electoral strength on subsequent roll-call voting records, as measured by scores from the Americans for Democratic Action organization. Simply put: if candidates perform well in an election, how does that affect their legislative votes?

Selection bias creates serious hurdles for this analysis. The winners of the seat are endogenously determined by things like voter demographics, candidate quality, and candidate resources. Enter RDD. By focusing on the subset of elections around the threshold of winning, 50% of the vote, the variation in electoral strength can be treated as exogenous. Selection bias is circumvented because, around the cutoff, selection is as good as random: the 50% margin is where voter preferences are most similar.
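To make the selection-bias argument concrete, here is a small simulated sketch in base R. It is not the LMB data: the variable names (`quality`, `voteshare`, `score`) and every coefficient are hypothetical. An unobserved factor drives both the vote share and the outcome, so comparing all winners with all losers overstates the effect, while comparing only close elections gets much nearer to the assumed true effect.

```r
# Sketch on simulated data; names are hypothetical and numbers are made up.
set.seed(2004)

n <- 20000
quality   <- rnorm(n)                                     # unobserved confounder
voteshare <- plogis(0.8 * quality + rnorm(n, sd = 0.6))   # Democratic vote share in (0, 1)
democrat  <- as.numeric(voteshare > 0.5)                  # Democrat wins above 50%

true_effect <- 20
# The outcome depends on who wins AND on the unobserved factor
score <- 30 + true_effect * democrat + 10 * quality + rnorm(n, sd = 5)

# Naive comparison of all winners and losers absorbs the confounder
naive <- mean(score[democrat == 1]) - mean(score[democrat == 0])

# Close elections only: both sides of 50% look alike on the confounder
close <- voteshare > 0.48 & voteshare < 0.52
local <- mean(score[close & democrat == 1]) - mean(score[close & democrat == 0])

# How different are winners and losers on the unobserved factor?
quality_gap_all   <- mean(quality[democrat == 1]) - mean(quality[democrat == 0])
quality_gap_close <- mean(quality[close & democrat == 1]) -
  mean(quality[close & democrat == 0])

print(c(naive = naive, local = local,
        quality_gap_all = quality_gap_all, quality_gap_close = quality_gap_close))
```

In this toy setup the naive gap should come out well above 20, because winners are stronger on the unobserved factor, while the close-election gap should sit near 20 and the gap in `quality` should nearly vanish inside the window.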

Choosing the range for this subset of data is important. Here the subset is defined as a lagged Democratic vote share between 48% and 52%.

read_data <- function(df)
{
  full_path <- paste("https://raw.github.com/scunning1975/mixtape/master/", 
                     df, sep = "")
  df <- read_dta(full_path)
  return(df)
}

lmb_data <- read_data("lmb-data.dta")

lmb_subset <- lmb_data %>% 
  filter(lagdemvoteshare>.48 & lagdemvoteshare<.52)
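
Why the width of that window matters can be sketched on simulated data. This is a hedged illustration with made-up numbers, and `window_estimate()` is a hypothetical helper, not part of any package. A narrow window keeps the two sides comparable but leaves few observations, while a wide window adds observations but lets the underlying trend in vote share leak into a simple difference in means.

```r
# Sketch: how the window half-width trades bias against precision.
# Simulated data; the slope, noise, and true effect are assumptions.
set.seed(48)

n <- 20000
voteshare   <- runif(n)
democrat    <- as.numeric(voteshare > 0.5)
true_effect <- 10
score <- 20 + true_effect * democrat + 30 * (voteshare - 0.5) + rnorm(n, sd = 8)

# Hypothetical helper: difference in means within +/- h of the 0.5 cutoff
window_estimate <- function(h) {
  inside  <- abs(voteshare - 0.5) < h
  treated <- score[inside & democrat == 1]
  control <- score[inside & democrat == 0]
  data.frame(
    half_width = h,
    n_obs      = sum(inside),
    estimate   = mean(treated) - mean(control),
    std_error  = sqrt(var(treated) / length(treated) +
                      var(control) / length(control))
  )
}

results <- do.call(rbind, lapply(c(0.01, 0.02, 0.05, 0.10, 0.25), window_estimate))
print(results)
```

As the half-width grows, `n_obs` rises and `std_error` falls, but the estimate drifts away from the assumed true effect of 10. The 0.48 to 0.52 window used above corresponds to a half-width of 0.02.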

Comparing Local and Global Regressions

While the code below for each is pretty similar, notice the difference in the data used. The local regressions use the subset within 0.02 of the 0.50 cutoff (lmb_subset), while the global regressions use the entire dataset (lmb_data).

local_lm_1 <- lm_robust(score ~ lagdemocrat, data = lmb_subset, clusters = id)
local_lm_2 <- lm_robust(score ~ democrat, data = lmb_subset, clusters = id)
local_lm_3 <- lm_robust(democrat ~ lagdemocrat, data = lmb_subset, clusters = id)

summary(local_lm_1)
## 
## Call:
## lm_robust(formula = score ~ lagdemocrat, data = lmb_subset, clusters = id)
## 
## Standard error type:  CR2 
## 
## Coefficients:
##             Estimate Std. Error t value  Pr(>|t|) CI Lower CI Upper    DF
## (Intercept)    31.20      1.334   23.39 5.880e-80    28.57    33.82 454.0
## lagdemocrat    21.28      1.951   10.91 3.988e-26    17.45    25.11 912.9
## 
## Multiple R-squared:  0.1152 ,    Adjusted R-squared:  0.1142 
## F-statistic:   119 on 1 and 914 DF,  p-value: < 2.2e-16
summary(local_lm_2)
## 
## Call:
## lm_robust(formula = score ~ democrat, data = lmb_subset, clusters = id)
## 
## Standard error type:  CR2 
## 
## Coefficients:
##             Estimate Std. Error t value   Pr(>|t|) CI Lower CI Upper    DF
## (Intercept)    18.75     0.8432   22.23  2.313e-75    17.09    20.40 470.0
## democrat       47.71     1.3560   35.18 7.434e-172    45.04    50.37 909.8
## 
## Multiple R-squared:  0.5783 ,    Adjusted R-squared:  0.5779 
## F-statistic:  1238 on 1 and 914 DF,  p-value: < 2.2e-16
summary(local_lm_3)
## 
## Call:
## lm_robust(formula = democrat ~ lagdemocrat, data = lmb_subset, 
##     clusters = id)
## 
## Standard error type:  CR2 
## 
## Coefficients:
##             Estimate Std. Error t value  Pr(>|t|) CI Lower CI Upper    DF
## (Intercept)   0.2418    0.02009   12.03 3.935e-29   0.2023   0.2812 454.0
## lagdemocrat   0.4843    0.02893   16.74 4.627e-55   0.4275   0.5411 912.9
## 
## Multiple R-squared:  0.2348 ,    Adjusted R-squared:  0.2339 
## F-statistic: 280.2 on 1 and 914 DF,  p-value: < 2.2e-16
global_lm_1 <- lm_robust(score ~ lagdemocrat, data = lmb_data, clusters = id)
global_lm_2 <- lm_robust(score ~ democrat, data = lmb_data, clusters = id)
global_lm_3 <- lm_robust(democrat ~ lagdemocrat, data = lmb_data, clusters = id)

summary(global_lm_1)
## 
## Call:
## lm_robust(formula = score ~ lagdemocrat, data = lmb_data, clusters = id)
## 
## Standard error type:  CR2 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper    DF
## (Intercept)    23.54     0.3375   69.75        0    22.88    24.20  5669
## lagdemocrat    31.51     0.4837   65.14        0    30.56    32.45 12211
## 
## Multiple R-squared:  0.2267 ,    Adjusted R-squared:  0.2267 
## F-statistic:  4243 on 1 and 13587 DF,  p-value: < 2.2e-16
summary(global_lm_2)
## 
## Call:
## lm_robust(formula = score ~ democrat, data = lmb_data, clusters = id)
## 
## Standard error type:  CR2 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper    DF
## (Intercept)    17.58     0.2626   66.94        0    17.06    18.09  5479
## democrat       40.76     0.4182   97.48        0    39.94    41.58 11758
## 
## Multiple R-squared:  0.3756 ,    Adjusted R-squared:  0.3755 
## F-statistic:  9502 on 1 and 13587 DF,  p-value: < 2.2e-16
summary(global_lm_3)
## 
## Call:
## lm_robust(formula = democrat ~ lagdemocrat, data = lmb_data, 
##     clusters = id)
## 
## Standard error type:  CR2 
## 
## Coefficients:
##             Estimate Std. Error t value   Pr(>|t|) CI Lower CI Upper    DF
## (Intercept)   0.1201   0.004318   27.82 9.396e-160   0.1116   0.1286  5669
## lagdemocrat   0.8179   0.005098  160.44  0.000e+00   0.8079   0.8279 12211
## 
## Multiple R-squared:  0.6759 ,    Adjusted R-squared:  0.6759 
## F-statistic: 2.574e+04 on 1 and 13587 DF,  p-value: < 2.2e-16
lmb_data <- lmb_data %>% 
  mutate(demvoteshare_c = demvoteshare - 0.5)

lm_1 <- lm_robust(score ~ lagdemocrat + demvoteshare_c, data = lmb_data, clusters = id)
lm_2 <- lm_robust(score ~ democrat + demvoteshare_c, data = lmb_data, clusters = id)
lm_3 <- lm_robust(democrat ~ lagdemocrat + demvoteshare_c, data = lmb_data, clusters = id)

summary(lm_1)
## 
## Call:
## lm_robust(formula = score ~ lagdemocrat + demvoteshare_c, data = lmb_data, 
##     clusters = id)
## 
## Standard error type:  CR2 
## 
## Coefficients:
##                Estimate Std. Error t value   Pr(>|t|) CI Lower CI Upper   DF
## (Intercept)      22.883     0.4433  51.616  0.000e+00   22.014   23.753 6255
## lagdemocrat      33.451     0.8482  39.436 4.502e-307   31.788   35.114 6936
## demvoteshare_c   -5.626     1.8982  -2.964  3.056e-03   -9.347   -1.904 4626
## 
## Multiple R-squared:  0.2274 ,    Adjusted R-squared:  0.2273 
## F-statistic:  2116 on 2 and 13576 DF,  p-value: < 2.2e-16
summary(lm_2)
## 
## Call:
## lm_robust(formula = score ~ democrat + demvoteshare_c, data = lmb_data, 
##     clusters = id)
## 
## Standard error type:  CR2 
## 
## Coefficients:
##                Estimate Std. Error t value   Pr(>|t|) CI Lower CI Upper   DF
## (Intercept)       11.03     0.3363   32.81 2.384e-219    10.37    11.69 6820
## democrat          58.50     0.6564   89.13  0.000e+00    57.22    59.79 8717
## demvoteshare_c   -48.94     1.6416  -29.81 9.262e-180   -52.16   -45.72 4955
## 
## Multiple R-squared:  0.4242 ,    Adjusted R-squared:  0.4241 
## F-statistic:  6192 on 2 and 13576 DF,  p-value: < 2.2e-16
summary(lm_3)
## 
## Call:
## lm_robust(formula = democrat ~ lagdemocrat + demvoteshare_c, 
##     data = lmb_data, clusters = id)
## 
## Standard error type:  CR2 
## 
## Coefficients:
##                Estimate Std. Error t value   Pr(>|t|) CI Lower CI Upper   DF
## (Intercept)      0.2117   0.005275   40.13 1.512e-313   0.2013   0.2220 6255
## lagdemocrat      0.5516   0.010324   53.43  0.000e+00   0.5314   0.5718 6936
## demvoteshare_c   0.7725   0.018758   41.18 3.721e-316   0.7358   0.8093 4626
## 
## Multiple R-squared:  0.7354 ,    Adjusted R-squared:  0.7353 
## F-statistic: 4.896e+04 on 2 and 13576 DF,  p-value: < 2.2e-16
lm_1 <- lm_robust(score ~ lagdemocrat*demvoteshare_c, 
                  data = lmb_data, clusters = id)
lm_2 <- lm_robust(score ~ democrat*demvoteshare_c, 
                  data = lmb_data, clusters = id)
lm_3 <- lm_robust(democrat ~ lagdemocrat*demvoteshare_c, 
                  data = lmb_data, clusters = id)

summary(lm_1)
## 
## Call:
## lm_robust(formula = score ~ lagdemocrat * demvoteshare_c, data = lmb_data, 
##     clusters = id)
## 
## Standard error type:  CR2 
## 
## Coefficients:
##                            Estimate Std. Error t value   Pr(>|t|) CI Lower
## (Intercept)                   31.44     0.5411   58.10  0.000e+00    30.37
## lagdemocrat                   30.51     0.8173   37.33 2.081e-273    28.91
## demvoteshare_c                66.04     3.1610   20.89  7.355e-80    59.84
## lagdemocrat:demvoteshare_c   -96.47     3.8530  -25.04 1.623e-117  -104.03
##                            CI Upper     DF
## (Intercept)                   32.50 2241.8
## lagdemocrat                   32.11 5781.1
## demvoteshare_c                72.25  935.8
## lagdemocrat:demvoteshare_c   -88.92 1647.0
## 
## Multiple R-squared:  0.2669 ,    Adjusted R-squared:  0.2668 
## F-statistic:  1863 on 3 and 13576 DF,  p-value: < 2.2e-16
summary(lm_2)
## 
## Call:
## lm_robust(formula = score ~ democrat * demvoteshare_c, data = lmb_data, 
##     clusters = id)
## 
## Standard error type:  CR2 
## 
## Coefficients:
##                         Estimate Std. Error t value   Pr(>|t|) CI Lower
## (Intercept)               16.816     0.4186  40.174 1.076e-277    16.00
## democrat                  55.431     0.6374  86.960  0.000e+00    54.18
## demvoteshare_c            -5.683     2.6106  -2.177  2.976e-02   -10.81
## democrat:demvoteshare_c  -55.152     3.2189 -17.134  5.886e-60   -61.47
##                         CI Upper     DF
## (Intercept)              17.6367 2730.7
## democrat                 56.6809 6390.3
## demvoteshare_c           -0.5592  886.2
## democrat:demvoteshare_c -48.8376 1417.7
## 
## Multiple R-squared:  0.4344 ,    Adjusted R-squared:  0.4343 
## F-statistic:  4161 on 3 and 13576 DF,  p-value: < 2.2e-16
summary(lm_3)
## 
## Call:
## lm_robust(formula = democrat ~ lagdemocrat * demvoteshare_c, 
##     data = lmb_data, clusters = id)
## 
## Standard error type:  CR2 
## 
## Coefficients:
##                            Estimate Std. Error t value   Pr(>|t|) CI Lower
## (Intercept)                  0.2869   0.007752   37.01 2.051e-234   0.2717
## lagdemocrat                  0.5257   0.010453   50.30  0.000e+00   0.5052
## demvoteshare_c               1.4029   0.044367   31.62 7.355e-150   1.3159
## lagdemocrat:demvoteshare_c  -0.8486   0.049019  -17.31  8.075e-62  -0.9448
##                            CI Upper     DF
## (Intercept)                  0.3021 2241.8
## lagdemocrat                  0.5462 5781.1
## demvoteshare_c               1.4900  935.8
## lagdemocrat:demvoteshare_c  -0.7525 1647.0
## 
## Multiple R-squared:  0.7489 ,    Adjusted R-squared:  0.7488 
## F-statistic: 2.519e+04 on 3 and 13576 DF,  p-value: < 2.2e-16
lmb_data <- lmb_data %>% 
  mutate(demvoteshare_sq = demvoteshare_c^2)

lm_1 <- lm_robust(score ~ lagdemocrat*demvoteshare_c + lagdemocrat*demvoteshare_sq, 
                  data = lmb_data, clusters = id)
lm_2 <- lm_robust(score ~ democrat*demvoteshare_c + democrat*demvoteshare_sq, 
                  data = lmb_data, clusters = id)
lm_3 <- lm_robust(democrat ~ lagdemocrat*demvoteshare_c + lagdemocrat*demvoteshare_sq, 
                  data = lmb_data, clusters = id)

summary(lm_1)
## 
## Call:
## lm_robust(formula = score ~ lagdemocrat * demvoteshare_c + lagdemocrat * 
##     demvoteshare_sq, data = lmb_data, clusters = id)
## 
## Standard error type:  CR2 
## 
## Coefficients:
##                             Estimate Std. Error t value   Pr(>|t|) CI Lower
## (Intercept)                    33.55     0.7135  47.016 3.892e-226    32.15
## lagdemocrat                    13.03     1.2856  10.135  4.366e-23    10.51
## demvoteshare_c                134.98     9.7861  13.793  7.598e-23   115.50
## demvoteshare_sq               212.13    22.7626   9.319  1.334e-14   166.86
## lagdemocrat:demvoteshare_c     57.05    15.4123   3.702  2.617e-04    26.70
## lagdemocrat:demvoteshare_sq  -641.85    31.3309 -20.486  5.818e-53  -703.60
##                             CI Upper      DF
## (Intercept)                    34.95  752.38
## lagdemocrat                    15.55 1040.83
## demvoteshare_c                154.45   79.97
## demvoteshare_sq               257.39   84.13
## lagdemocrat:demvoteshare_c     87.40  257.66
## lagdemocrat:demvoteshare_sq  -580.10  220.74
## 
## Multiple R-squared:  0.3707 ,    Adjusted R-squared:  0.3705 
## F-statistic:  1526 on 5 and 13576 DF,  p-value: < 2.2e-16
summary(lm_2)
## 
## Call:
## lm_robust(formula = score ~ democrat * demvoteshare_c + democrat * 
##     demvoteshare_sq, data = lmb_data, clusters = id)
## 
## Standard error type:  CR2 
## 
## Coefficients:
##                          Estimate Std. Error t value   Pr(>|t|) CI Lower
## (Intercept)                 15.61     0.5748  27.152 6.275e-140    14.48
## democrat                    44.40     0.9087  48.865  0.000e+00    42.62
## demvoteshare_c             -23.85     6.7132  -3.553  3.873e-04   -37.01
## demvoteshare_sq            -41.73    14.6858  -2.841  4.578e-03   -70.55
## democrat:demvoteshare_c    111.90     9.7809  11.440  5.295e-30    92.72
## democrat:demvoteshare_sq  -229.95    19.5462 -11.765  5.378e-31  -268.29
##                          CI Upper   DF
## (Intercept)                 16.73 2160
## democrat                    46.18 4444
## demvoteshare_c             -10.69 2961
## demvoteshare_sq            -12.91 1040
## democrat:demvoteshare_c    131.07 6122
## democrat:demvoteshare_sq  -191.62 2113
## 
## Multiple R-squared:  0.4559 ,    Adjusted R-squared:  0.4557 
## F-statistic:  2589 on 5 and 13576 DF,  p-value: < 2.2e-16
summary(lm_3)
## 
## Call:
## lm_robust(formula = democrat ~ lagdemocrat * demvoteshare_c + 
##     lagdemocrat * demvoteshare_sq, data = lmb_data, clusters = id)
## 
## Standard error type:  CR2 
## 
## Coefficients:
##                             Estimate Std. Error  t value   Pr(>|t|) CI Lower
## (Intercept)                  0.32965    0.01272  25.9114 2.627e-106   0.3047
## lagdemocrat                  0.32168    0.01844  17.4459  5.475e-60   0.2855
## demvoteshare_c               2.79834    0.19629  14.2563  1.149e-23   2.4077
## demvoteshare_sq              4.29401    0.45554   9.4262  8.122e-15   3.3881
## lagdemocrat:demvoteshare_c   0.09094    0.24142   0.3767  7.067e-01  -0.3845
## lagdemocrat:demvoteshare_sq -8.80437    0.51723 -17.0223  4.614e-42  -9.8237
##                             CI Upper      DF
## (Intercept)                   0.3546  752.38
## lagdemocrat                   0.3579 1040.83
## demvoteshare_c                3.1890   79.97
## demvoteshare_sq               5.1999   84.13
## lagdemocrat:demvoteshare_c    0.5663  257.66
## lagdemocrat:demvoteshare_sq  -7.7850  220.74
## 
## Multiple R-squared:  0.822 , Adjusted R-squared:  0.8219 
## F-statistic: 8.973e+04 on 5 and 13576 DF,  p-value: < 2.2e-16
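# Note: this overwrites lmb_data with the 0.45-0.55 window, so every model, plot,
# and test from here on uses the restricted sample rather than the full dataset.
lmb_data <- lmb_data %>% 
  filter(demvoteshare > .45 & demvoteshare < .55) %>%
  mutate(demvoteshare_sq = demvoteshare_c^2)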

lm_1 <- lm_robust(score ~ lagdemocrat*demvoteshare_c + lagdemocrat*demvoteshare_sq, 
                  data = lmb_data, clusters = id)
lm_2 <- lm_robust(score ~ democrat*demvoteshare_c + democrat*demvoteshare_sq, 
                  data = lmb_data, clusters = id)
lm_3 <- lm_robust(democrat ~ lagdemocrat*demvoteshare_c + lagdemocrat*demvoteshare_sq, 
                  data = lmb_data, clusters = id)

summary(lm_1)
## 
## Call:
## lm_robust(formula = score ~ lagdemocrat * demvoteshare_c + lagdemocrat * 
##     demvoteshare_sq, data = lmb_data, clusters = id)
## 
## Standard error type:  CR2 
## 
## Coefficients:
##                               Estimate Std. Error t value   Pr(>|t|)   CI Lower
## (Intercept)                     37.121     0.9689  38.312 4.125e-185     35.219
## lagdemocrat                      7.347     1.5872   4.629  4.024e-06      4.233
## demvoteshare_c                 830.925    20.9558  39.651 2.272e-139    789.725
## demvoteshare_sq               5333.335   838.3250   6.362  4.513e-10   3686.254
## lagdemocrat:demvoteshare_c    -156.876    35.7396  -4.389  1.343e-05   -227.067
## lagdemocrat:demvoteshare_sq -10116.678  1435.1301  -7.049  3.858e-12 -12933.700
##                             CI Upper     DF
## (Intercept)                    39.02  822.7
## lagdemocrat                    10.46 1385.7
## demvoteshare_c                872.12  392.7
## demvoteshare_sq              6980.42  499.1
## lagdemocrat:demvoteshare_c    -86.69  598.9
## lagdemocrat:demvoteshare_sq -7299.66  808.3
## 
## Multiple R-squared:  0.4447 ,    Adjusted R-squared:  0.4435 
## F-statistic:   469 on 5 and 2386 DF,  p-value: < 2.2e-16
summary(lm_2)
## 
## Call:
## lm_robust(formula = score ~ democrat * demvoteshare_c + democrat * 
##     demvoteshare_sq, data = lmb_data, clusters = id)
## 
## Standard error type:  CR2 
## 
## Coefficients:
##                          Estimate Std. Error t value  Pr(>|t|)  CI Lower
## (Intercept)                 21.44      1.819 11.7874 4.848e-26     17.86
## democrat                    45.19      2.679 16.8688 7.128e-51     39.93
## demvoteshare_c             450.85    161.352  2.7942 5.416e-03    133.79
## demvoteshare_sq           7878.90   2995.192  2.6305 8.763e-03   1995.55
## democrat:demvoteshare_c   -688.34    247.711 -2.7788 5.570e-03  -1174.51
## democrat:demvoteshare_sq -3887.82   4802.371 -0.8096 4.184e-01 -13311.05
##                          CI Upper     DF
## (Intercept)                 25.02  263.6
## democrat                    50.45  500.2
## demvoteshare_c             767.91  469.4
## demvoteshare_sq          13762.26  552.5
## democrat:demvoteshare_c   -202.18  894.8
## democrat:demvoteshare_sq  5535.41 1060.8
## 
## Multiple R-squared:  0.5626 ,    Adjusted R-squared:  0.5617 
## F-statistic: 617.6 on 5 and 2386 DF,  p-value: < 2.2e-16
summary(lm_3)
## 
## Call:
## lm_robust(formula = democrat ~ lagdemocrat * demvoteshare_c + 
##     lagdemocrat * demvoteshare_sq, data = lmb_data, clusters = id)
## 
## Standard error type:  CR2 
## 
## Coefficients:
##                              Estimate Std. Error  t value   Pr(>|t|)  CI Lower
## (Intercept)                    0.4181    0.01316  31.7648 3.807e-145    0.3923
## lagdemocrat                    0.1674    0.01955   8.5660  2.810e-17    0.1291
## demvoteshare_c                15.6990    0.22762  68.9697 1.503e-221   15.2515
## demvoteshare_sq               91.6069   10.89337   8.4094  4.351e-16   70.2044
## lagdemocrat:demvoteshare_c     0.1245    0.35711   0.3487  7.275e-01   -0.5768
## lagdemocrat:demvoteshare_sq -188.3286   16.35131 -11.5176  1.576e-28 -220.4247
##                              CI Upper     DF
## (Intercept)                    0.4439  822.7
## lagdemocrat                    0.2058 1385.7
## demvoteshare_c                16.1465  392.7
## demvoteshare_sq              113.0094  499.1
## lagdemocrat:demvoteshare_c     0.8258  598.9
## lagdemocrat:demvoteshare_sq -156.2326  808.3
## 
## Multiple R-squared:  0.7743 ,    Adjusted R-squared:  0.7738 
## F-statistic:  6704 on 5 and 2386 DF,  p-value: < 2.2e-16
#aggregating the data
categories <- lmb_data$lagdemvoteshare

demmeans <- split(lmb_data$score, cut(lmb_data$lagdemvoteshare, 100)) %>% 
  lapply(mean) %>% 
  unlist()

agg_lmb_data <- data.frame(score = demmeans, lagdemvoteshare = seq(0.01,1, by = 0.01))

#plotting
lmb_data <- lmb_data %>% 
  mutate(gg_group = case_when(lagdemvoteshare > 0.5 ~ 1, TRUE ~ 0))
         
ggplot(lmb_data, aes(lagdemvoteshare, score)) +
  geom_point(aes(x = lagdemvoteshare, y = score), data = agg_lmb_data) +
  stat_smooth(aes(lagdemvoteshare, score, group = gg_group), method = "lm", 
              formula = y ~ x + I(x^2)) +
  xlim(0,1) + ylim(0,100) +
  geom_vline(xintercept = 0.5)
## Warning: Removed 195 rows containing non-finite values (stat_smooth).
## Warning: Removed 39 rows containing missing values (geom_point).

ggplot(lmb_data, aes(lagdemvoteshare, score)) +
  geom_point(aes(x = lagdemvoteshare, y = score), data = agg_lmb_data) +
  stat_smooth(aes(lagdemvoteshare, score, group = gg_group), method = "loess") +
  xlim(0,1) + ylim(0,100) +
  geom_vline(xintercept = 0.5)
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 195 rows containing non-finite values (stat_smooth).
## Removed 39 rows containing missing values (geom_point).

ggplot(lmb_data, aes(lagdemvoteshare, score)) +
  geom_point(aes(x = lagdemvoteshare, y = score), data = agg_lmb_data) +
  stat_smooth(aes(lagdemvoteshare, score, group = gg_group), method = "lm") +
  xlim(0,1) + ylim(0,100) +
  geom_vline(xintercept = 0.5)
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 195 rows containing non-finite values (stat_smooth).
## Removed 39 rows containing missing values (geom_point).

smooth_dem0 <- lmb_data %>% 
  filter(democrat == 0) %>% 
  select(score, demvoteshare) %>% 
  na.omit()
smooth_dem0 <- as_tibble(ksmooth(smooth_dem0$demvoteshare, smooth_dem0$score, 
                                 kernel = "box", bandwidth = 0.1))


smooth_dem1 <- lmb_data %>% 
  filter(democrat == 1) %>% 
  select(score, demvoteshare) %>% 
  na.omit()
smooth_dem1 <- as_tibble(ksmooth(smooth_dem1$demvoteshare, smooth_dem1$score, 
                                 kernel = "box", bandwidth = 0.1))

# ksmooth() already returns the smoothed curve, so draw it directly with
# geom_line() instead of smoothing it a second time with geom_smooth()
ggplot() + 
  geom_line(aes(x, y), data = smooth_dem0) +
  geom_line(aes(x, y), data = smooth_dem1) +
  geom_vline(xintercept = 0.5)

rdr <- rdrobust(y = lmb_data$score,
                x = lmb_data$demvoteshare, c = 0.5)
## [1] "Mass points detected in the running variable."
summary(rdr)
## Sharp RD estimates using local polynomial regression.
## 
## Number of Obs.                 2387
## BW type                       mserd
## Kernel                   Triangular
## VCE method                       NN
## 
## Number of Obs.                 1206         1181
## Eff. Number of Obs.             369          338
## Order est. (p)                    1            1
## Order bias  (q)                   2            2
## BW est. (h)                   0.015        0.015
## BW bias (b)                   0.022        0.022
## rho (h/b)                     0.678        0.678
## Unique Obs.                     636          623
## 
## =============================================================================
##         Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]       
## =============================================================================
##   Conventional    43.536     3.119    13.961     0.000    [37.424 , 49.649]    
##         Robust         -         -    11.546     0.000    [35.879 , 50.550]    
## =============================================================================
DCdensity(lmb_data$demvoteshare, cutpoint = 0.5)

## [1] 0.2666621
density <- rddensity(lmb_data$demvoteshare, c = 0.5)
rdplotdensity(density, lmb_data$demvoteshare)

## $Estl
## Call: lpdensity
## 
## Sample size                                      1206
## Polynomial order for point estimation    (p=)    2
## Order of derivative estimated            (v=)    1
## Polynomial order for confidence interval (q=)    3
## Kernel function                                  triangular
## Scaling factor                                   0.505029337803856
## Bandwidth method                                 user provided
## 
## Use summary(...) to show estimates.
## 
## $Estr
## Call: lpdensity
## 
## Sample size                                      1181
## Polynomial order for point estimation    (p=)    2
## Order of derivative estimated            (v=)    1
## Polynomial order for confidence interval (q=)    3
## Kernel function                                  triangular
## Scaling factor                                   0.49455155071249
## Bandwidth method                                 user provided
## 
## Use summary(...) to show estimates.
## 
## $Estplot