class: title-slide, left, bottom

# Regression discontinuity design

----

### James Malloy

### March 2nd, 2022

---
class: inverse, middle, center

# AGENDA

## 1. Overview

## 2. Practice example

## 3. Recap

<!-- 1. Arbitrary cutoffs and causal inference -->
<!-- 2. Drawing lines and measuring gaps -->
<!-- 3. Main RDD Concerns -->

<!-- A general description of the design, including general benefits and drawbacks -->
<!-- •A brief overview of the example used (from Mixtape) -->
<!-- •Walkthrough of general data pre-checks -->
<!-- •Implementation of the design, walking through the Stata code -->
<!-- •Interpretation of Results -->
<!-- •Standard (and/or updated!) robustness checks -->

---

# Overview

<!-- start of panel set -->

.panelset[

.panel[.panel-name[Quasi-experiment]

#### Regression discontinuity

* a quasi-experimental design (i.e., it relies on context rather than randomization)
* compares groups on either side of a numerical cutoff (e.g., a minimum score)

] <!-- end of panel -->

.panel[.panel-name[Rules!]

* RDD is rule-based
* assignment is based on a threshold
* ethical allocation of resources

]<!-- end of panel -->

.panel[.panel-name[Key terms]

### Running/forcing variable

* index or measure that determines eligibility

### Cutoff/Cutpoint/Threshold

* value that formally assigns access to the program

] <!-- end of panel -->

.panel[.panel-name[Discontinuities everywhere]

### Poverty Line 2020

.pull-left[

<table>
 <thead>
  <tr>
   <th style="text-align:center;"> Size </th>
   <th style="text-align:center;"> Annual </th>
   <th style="text-align:center;"> Monthly </th>
   <th style="text-align:center;"> 138% </th>
   <th style="text-align:center;"> 150% </th>
   <th style="text-align:center;"> 200% </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> 1 </td>
   <td style="text-align:center;"> $12,760 </td>
   <td style="text-align:center;"> $1,063 </td>
   <td style="text-align:center;"> $17,609 </td>
   <td style="text-align:center;"> $19,140 </td>
   <td style="text-align:center;"> $25,520 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 2 </td>
   <td style="text-align:center;"> $17,240 </td>
   <td style="text-align:center;"> $1,437 </td>
   <td style="text-align:center;"> $23,791 </td>
   <td style="text-align:center;"> $25,860 </td>
   <td style="text-align:center;"> $34,480 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 3 </td>
   <td style="text-align:center;"> $21,720 </td>
   <td style="text-align:center;"> $1,810 </td>
   <td style="text-align:center;"> $29,974 </td>
   <td style="text-align:center;"> $32,580 </td>
   <td style="text-align:center;"> $43,440 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 4 </td>
   <td style="text-align:center;"> $26,200 </td>
   <td style="text-align:center;"> $2,183 </td>
   <td style="text-align:center;"> $36,156 </td>
   <td style="text-align:center;"> $39,300 </td>
   <td style="text-align:center;"> $52,400 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 5 </td>
   <td style="text-align:center;"> $30,680 </td>
   <td style="text-align:center;"> $2,557 </td>
   <td style="text-align:center;"> $42,338 </td>
   <td style="text-align:center;"> $46,020 </td>
   <td style="text-align:center;"> $61,360 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 6 </td>
   <td style="text-align:center;"> $35,160 </td>
   <td style="text-align:center;"> $2,930 </td>
   <td style="text-align:center;"> $48,521 </td>
   <td style="text-align:center;"> $52,740 </td>
   <td style="text-align:center;"> $70,320 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 7 </td>
   <td style="text-align:center;"> $39,640 </td>
   <td style="text-align:center;"> $3,303 </td>
   <td style="text-align:center;"> $54,703 </td>
   <td style="text-align:center;"> $59,460 </td>
   <td style="text-align:center;"> $79,280 </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 8 </td>
   <td style="text-align:center;"> $44,120 </td>
   <td style="text-align:center;"> $3,677 </td>
   <td style="text-align:center;"> $60,886 </td>
   <td style="text-align:center;"> $66,180 </td>
   <td style="text-align:center;"> $88,240 </td>
  </tr>
</tbody>
</table>

]

.pull-right[
.box-inv-2.smaller[**Medicaid**<br>138%*]

.box-inv-2.smaller[**ACA subsidies**<br>138–400%*]

.box-inv-2.smaller[**CHIP**<br>200%]

.box-inv-2.smaller[**SNAP/Free lunch**<br>130%]

.box-inv-2.smaller[**Reduced lunch**<br>130–185%]

]

] <!-- end of panel -->

.panel[.panel-name[Intuition]

.center[
#### **The people right before and right after the threshold are essentially the same.**
]

<img src="rdd_files/figure-html/tutoring-running-1.png" width="80%" />

]

.panel[.panel-name[Sharp vs. Fuzzy]

]

]

---
class: inverse, center, middle

# RDD Steps

### 1. Find a rule

### 2. Sharp or Fuzzy?

### 3. Check for manipulation around cut point.

### 4. Check for discontinuity in outcome across running variable.

### 5. Measure size of the effect

---

# Step 1: Find a Rule

.center[
<img src="https://media3.giphy.com/media/E6wVXmnHJfk88/200.gif" alt="Learn Your Rules GIFs - Get the best GIF on GIPHY" jsaction="load:XAeZkd;" jsname="HiaYvf" class="n3VNCb" data-noaft="1" style="width: 650px; height: 350px; margin: 0px;">
]

---

# Step 2: Sharp or Fuzzy?

.panelset[

.panel[.panel-name[Plot]

<img src="rdd_files/figure-html/unnamed-chunk-1-1.png" width="80%" />

] <!-- end of panel -->

.panel[.panel-name[R Code]

```r
set.seed(1234)

ggplot(data = tutoring, aes(x = entrance_exam, y = tutoring, color = tutoring)) +
  geom_jitter(size = .5, alpha = .5, width = .5, height = .15) +
  geom_vline(xintercept = 70)
```

]<!-- end of panel -->

]

---

### Step 3: Check for manipulation at cutpoint.

<!-- We want to make sure nobody is manipulating entrance to the program. If we saw a giant clump of people at 69, that could indicate that people are purposely missing questions in order to gain access into the program which would undermine our all-else equal assumption. So what we have to do is check to see if there is a discontinuity at that cutpoint. Ideally we DON'T want there to be.
-->

.panelset[

.panel[.panel-name[Histogram]

<!-- -->

] <!-- end of panel -->

.panel[.panel-name[Histogram R Code]

```r
ggplot(tutoring, aes(x = entrance_exam, fill = tutoring)) +
  geom_histogram(binwidth = 3, boundary = 70, color = "white") +
  geom_vline(xintercept = 70)
```

] <!-- end of panel -->

.panel[.panel-name[McCrary Density Test]

<!-- -->

```
## $Estl
## Call: lpdensity
##
## Sample size                                      238
## Polynomial order for point estimation    (p=)    2
## Order of derivative estimated            (v=)    1
## Polynomial order for confidence interval (q=)    3
## Kernel function                                  triangular
## Scaling factor                                   0.237237237237237
## Bandwidth method                                 user provided
##
## Use summary(...) to show estimates.
##
## $Estr
## Call: lpdensity
##
## Sample size                                      762
## Polynomial order for point estimation    (p=)    2
## Order of derivative estimated            (v=)    1
## Polynomial order for confidence interval (q=)    3
## Kernel function                                  triangular
## Scaling factor                                   0.761761761761762
## Bandwidth method                                 user provided
##
## Use summary(...) to show estimates.
##
## $Estplot
```

<!-- -->

] <!-- end of panel -->

.panel[.panel-name[McCrary R Code]

```r
rdplotdensity(rdd = rddensity(tutoring$entrance_exam, c = 70),
              X = tutoring$entrance_exam,
              type = "both")
```

]<!-- end of panel -->

]

---

### Step 4: Check for discontinuity in outcome across running variable.

.panelset[

.panel[.panel-name[Plot]

<!-- -->

] <!-- end of panel -->

.panel[.panel-name[R code]

```r
ggplot(tutoring, aes(x = entrance_exam, y = exit_exam, color = tutoring)) +
  geom_vline(xintercept = 70) +
  geom_point(size = .75, alpha = .5) +
  geom_smooth(data = filter(tutoring, entrance_exam <= 70), method = "lm") +
  geom_smooth(data = filter(tutoring, entrance_exam > 70), method = "lm")
```

] <!-- end of panel -->

]

---

### Step 5: Measure effect.

.panelset[

### Parametric

<!-- The point of centering your variable is that it lets your y-axis also be your centered line and now you can run a regression.
-->

.panel[.panel-name[Parametric (No bandwidth)]

.pull-left[

```r
tutoring_centered <- tutoring %>%
  mutate(entrance_centered = entrance_exam - 70)

model_simple <- lm(exit_exam ~ entrance_centered + tutoring,
                   data = tutoring_centered)
```

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> term </th>
   <th style="text-align:right;"> estimate </th>
   <th style="text-align:right;"> std.error </th>
   <th style="text-align:right;"> statistic </th>
   <th style="text-align:right;"> p.value </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> (Intercept) </td>
   <td style="text-align:right;"> 59.3387171 </td>
   <td style="text-align:right;"> 0.4396235 </td>
   <td style="text-align:right;"> 134.97623 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> entrance_centered </td>
   <td style="text-align:right;"> 0.5140333 </td>
   <td style="text-align:right;"> 0.0268423 </td>
   <td style="text-align:right;"> 19.15012 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> tutoringTRUE </td>
   <td style="text-align:right;"> 10.9675612 </td>
   <td style="text-align:right;"> 0.8018465 </td>
   <td style="text-align:right;"> 13.67788 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
</tbody>
</table>

]

.pull-right[

<!-- -->

]

] <!-- end of panel -->

.panel[.panel-name[Parametric (with bandwidth)]

.pull-left[

<!-- -->

]

.pull-right[

```r
model_bw10 <- lm(exit_exam ~ entrance_centered + tutoring,
                 data = filter(tutoring_centered,
                               entrance_centered <= 10,
                               entrance_centered >= -10))
```

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> term </th>
   <th style="text-align:right;"> estimate </th>
   <th style="text-align:right;"> std.error </th>
   <th style="text-align:right;"> statistic </th>
   <th style="text-align:right;"> p.value </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> (Intercept) </td>
   <td style="text-align:right;"> 60.1070515 </td>
   <td style="text-align:right;"> 0.7419761 </td>
   <td style="text-align:right;"> 81.009415 </td>
   <td style="text-align:right;"> 0.0000000 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> entrance_centered </td>
   <td style="text-align:right;"> 0.4336777 </td>
   <td style="text-align:right;"> 0.1140631 </td>
   <td style="text-align:right;"> 3.802086 </td>
   <td style="text-align:right;"> 0.0001659 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> tutoringTRUE </td>
   <td style="text-align:right;"> 9.8120393 </td>
   <td style="text-align:right;"> 1.3094135 </td>
   <td style="text-align:right;"> 7.493461 </td>
   <td style="text-align:right;"> 0.0000000 </td>
  </tr>
</tbody>
</table>

]

]<!-- end of panel -->

.panel[.panel-name[Nonparametric]

.pull-left[

```r
rdrobust(y = tutoring$exit_exam, x = tutoring$entrance_exam, c = 70) %>%
  summary()
```

```
## Call: rdrobust
##
## Number of Obs.                 1000
## BW type                       mserd
## Kernel                   Triangular
## VCE method                       NN
##
## Number of Obs.                  238          762
## Eff. Number of Obs.             121          184
## Order est. (p)                    1            1
## Order bias  (q)                   2            2
## BW est. (h)                   7.616        7.616
## BW bias (b)                  11.669       11.669
## rho (h/b)                     0.653        0.653
## Unique Obs.                     238          762
##
## =============================================================================
##         Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]
## =============================================================================
##   Conventional    -9.992     1.708    -5.852     0.000   [-13.339 , -6.646]
##         Robust         -         -    -4.992     0.000   [-14.244 , -6.212]
## =============================================================================
```

]

.pull-right[

<!-- -->

]

]

]

---
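
### Step 5 (sketch): How sensitive is the estimate to bandwidth?

The parametric estimate of `tutoringTRUE` moves from about 11.0 with no bandwidth to about 9.8 within 10 points of the cutoff. The sketch below shows one way to refit the same model over several bandwidths. It runs on simulated data, not the `tutoring` data: the data frame `sim`, the assumed jump `true_effect`, and the bandwidth grid are all hypothetical choices made for illustration.

```r
# Simulated stand-in for the tutoring data (every value here is made up)
set.seed(1234)
n <- 1000
entrance_exam <- round(runif(n, 30, 100), 1)
tutoring <- entrance_exam <= 70   # sharp rule: 70 or below gets tutoring
true_effect <- 10                 # assumed jump at the cutoff
exit_exam <- 40 + 0.5 * entrance_exam + true_effect * tutoring +
  rnorm(n, sd = 5)

sim <- data.frame(entrance_exam, tutoring, exit_exam)
sim$entrance_centered <- sim$entrance_exam - 70

# Refit the same linear model inside windows of different widths
bandwidths <- c(5, 10, 20, Inf)   # Inf keeps every observation
estimates <- sapply(bandwidths, function(bw) {
  window <- sim[abs(sim$entrance_centered) <= bw, ]
  fit <- lm(exit_exam ~ entrance_centered + tutoring, data = window)
  unname(coef(fit)["tutoringTRUE"])
})

data.frame(bandwidth = bandwidths, estimate = estimates)
```

If the estimates stay close to each other across bandwidths, the result is less likely to be an artifact of one particular window. Narrower windows use fewer observations, so expect noisier estimates there.

---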