library(clubSandwich)
library(car)
library(AER)
library(tidyverse)
library(texreg)
1. The individual causal effect or treatment effect for individual i is defined as the difference between two states of the world:
\[\delta_i = Y_{1i} - Y_{0i}\]
The definition depends on both potential outcomes, since it compares the outcomes of the same individual i, at the same point in time after treatment, with and without the treatment.
In this particular case, we are interested in the effect of hydroxychloroquine as a treatment for adults with a confirmed COVID-19 infection.
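A minimal R sketch of this definition, using made-up potential outcomes for five hypothetical individuals. The vectors `y0`, `y1` and `d` are illustrative assumptions, not data from the study. The sketch computes \(\delta_i\) and shows that, once a treatment status is assigned, only one potential outcome per individual is observed.

```r
# Hypothetical potential outcomes for five individuals (illustrative only)
y0 <- c(0, 0, 1, 0, 1)   # outcome without treatment, Y_0i
y1 <- c(1, 0, 1, 1, 1)   # outcome with treatment, Y_1i

# Individual treatment effects: delta_i = Y_1i - Y_0i
delta <- y1 - y0
delta   # [1] 1 0 0 1 0

# Only one potential outcome is realised: Y_i = D_i * Y_1i + (1 - D_i) * Y_0i
d     <- c(1, 0, 0, 1, 1)   # hypothetical treatment assignment
y_obs <- d * y1 + (1 - d) * y0
y_obs   # [1] 1 0 1 1 1
```

In real data, `delta` can never be computed, because for each individual either `y1` or `y0` is missing.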
Thus, the potential outcomes are:
# What are the potential outcomes? Each cell reads "Y,D": outcome Y under treatment status D
x = c("0,0", "0,1")
y = c("1,0", "1,1")
potential.outcomes = rbind(x, y)
colnames(potential.outcomes) = c("No Treatment (D=0)", "Treatment (D=1)")
rownames(potential.outcomes) = c("Positive results (Y=0)", "Negative results (Y=1)")
potential.outcomes
## No Treatment (D=0) Treatment (D=1)
## Positive results (Y=0) "0,0" "0,1"
## Negative results (Y=1) "1,0" "1,1"
2. The fundamental problem is that we only observe realised outcomes: for each individual, only one of the two potential outcomes is ever observed.
A causal effect is present if the actual outcome differs from the counterfactual outcome. Thus, our ability to make causal claims depends on the assumptions we make about the counterfactual state of the world.
In our case, the fundamental problem arises because we cannot observe the same COVID-19 patient both treated and untreated at the same time.
We can address this problem by means of randomization, targeting the average treatment effect rather than the individual effect.
The average treatment effect is obtained by comparing observed outcomes across groups. Because only one potential outcome is realised per individual, we need multiple units, randomly split into treatment and control groups, so that individual differences average out.
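The role of randomization can be illustrated with a small simulation; all numbers below are assumptions chosen for illustration. Because treatment is assigned at random, the simple difference in observed group means should be close to the average of the (unobservable) individual effects.

```r
set.seed(1)
n  <- 100000
y0 <- rnorm(n, mean = 10, sd = 2)          # hypothetical untreated outcomes
y1 <- y0 + rnorm(n, mean = 3, sd = 1)      # heterogeneous individual effects
ate_true <- mean(y1 - y0)                  # requires both potential outcomes

d     <- rbinom(n, size = 1, prob = 0.5)   # random assignment
y_obs <- ifelse(d == 1, y1, y0)            # only one outcome is realised
ate_hat <- mean(y_obs[d == 1]) - mean(y_obs[d == 0])

c(true = ate_true, difference_in_means = ate_hat)
```

Without random assignment (for example, if sicker patients were more likely to be treated), the same difference in means would mix the treatment effect with selection bias.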
load("opentable.Rdata")
1.
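# DiD: regression with state fixed effects and a linear date trend
formula1 = reservations ~ treated_post + state + date
mod1 = lm(formula1, data = data)
# Standard errors clustered by state (CR0: no small-sample correction)
test1 = coeftest(mod1, vcov. = vcovCR(mod1, cluster = data$state, type = "CR0"))
test1 %>% screenreg(digits = 2, omit.coef = "state")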
##
## ==========================
## Model 1
## --------------------------
## (Intercept) 52506.62 ***
## (3085.12)
## treated_post -38.24 ***
## (2.50)
## date -2.87 ***
## (0.17)
## ==========================
## *** p < 0.001, ** p < 0.01, * p < 0.05
The above result is significant at all conventional levels. The coefficient of interest implies that the early lockdown reduced average OpenTable reservations, measured relative to the same period last year, by 38.24 percentage points across states.
Note further that the regression above uses standard errors clustered at the state level, which allows for arbitrary autocorrelation (and heteroskedasticity) within states. The estimate remains highly significant under this correction.
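The regression logic can be sketched on simulated data. As a simplification, this sketch uses a balanced two-period panel with period fixed effects instead of the linear date trend in `formula1`; all variable names and effect sizes are hypothetical. In that setting the coefficient on `treated_post` equals the 2x2 difference in group means exactly. The sketch also computes CR0 cluster-robust standard errors by hand, \(V = (X'X)^{-1} \left(\sum_g X_g' e_g e_g' X_g\right) (X'X)^{-1}\), which we assume is what `vcovCR(..., type = "CR0")` computes (no small-sample adjustment).

```r
set.seed(42)
n_states <- 40
# Balanced panel: each hypothetical state observed before (post = 0) and after (post = 1)
panel <- expand.grid(state = 1:n_states, post = c(0, 1))
panel$treated      <- as.integer(panel$state <= 20)   # first 20 states treated
panel$treated_post <- panel$treated * panel$post
state_effect <- rnorm(n_states, sd = 5)
panel$y <- state_effect[panel$state] - 10 * panel$post -
  30 * panel$treated_post + rnorm(nrow(panel))

# Two-way fixed effects regression (state and period dummies)
fit    <- lm(y ~ treated_post + factor(state) + factor(post), data = panel)
b_twfe <- unname(coef(fit)["treated_post"])

# 2x2 difference in differences of group means
m     <- tapply(panel$y, list(panel$treated, panel$post), mean)
b_did <- (m["1", "1"] - m["1", "0"]) - (m["0", "1"] - m["0", "0"])

# CR0 cluster-robust variance, clustering by state
X       <- model.matrix(fit)
e       <- residuals(fit)
XtX_inv <- solve(crossprod(X))
meat    <- matrix(0, ncol(X), ncol(X))
for (g in unique(panel$state)) {
  idx  <- panel$state == g
  s_g  <- crossprod(X[idx, , drop = FALSE], e[idx])   # X_g' e_g
  meat <- meat + tcrossprod(s_g)                      # (X_g' e_g)(X_g' e_g)'
}
V_cr0  <- XtX_inv %*% meat %*% XtX_inv
j      <- which(colnames(X) == "treated_post")
se_cr0 <- sqrt(V_cr0[j, j])

c(twfe = b_twfe, did_means = b_did, se_cr0 = se_cr0)
```

Summing scores within each state before forming the outer product is what lets errors be correlated within a state over time.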
2.
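# LPM: linear probability model for the probability that a restaurant is closed
formula2 = closed ~ treated_post + state + date
mod2 = lm(formula2, data = data)
# Standard errors clustered by state (CR0)
test2 = coeftest(mod2, vcov. = vcovCR(mod2, cluster = data$state, type = "CR0"))
test2 %>% screenreg(digits = 2, omit.coef = "state")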
##
## ========================
## Model 1
## ------------------------
## (Intercept) -70.66 ***
## (14.66)
## treated_post 0.42 ***
## (0.04)
## date 0.00 ***
## (0.00)
## ========================
## *** p < 0.001, ** p < 0.01, * p < 0.05
The estimate of our coefficient of interest is significant at all conventional levels. The errors in an LPM are heteroskedastic by construction: since the outcome is Bernoulli distributed, \(\text{Var}(u \mid x) = p(x)\,(1 - p(x))\), which depends on x. The cluster-robust standard errors used here are, however, also robust to heteroskedasticity.
The coefficient of interest implies that if a restaurant is in a state under lockdown, the probability of being closed (having 100% fewer customers than last year) increases by 42 percentage points.
However, the LPM does not restrict predicted probabilities to the interval [0, 1]. The large negative intercept is not in itself evidence of this, since date enters as a number and the intercept is the prediction extrapolated to date = 0; the relevant check is whether the fitted values for the sample fall outside [0, 1]. If they do, we should be careful in interpreting the causal effect from the LPM in this case.
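To see why the intercept alone says little about out-of-range predictions, the following sketch simulates a hypothetical closure indicator whose true probability is S-shaped in time, with dates stored as days since 1970-01-01 (as R's `Date` class does internally). All values are assumptions for illustration. The LPM intercept is an extrapolation to 1 January 1970, while the share of fitted values outside [0, 1] is the more relevant diagnostic.

```r
set.seed(7)
n <- 500
# Hypothetical dates stored as days since 1970-01-01
date   <- 18300 + sample(0:60, n, replace = TRUE)
p_true <- plogis(-6 + 0.2 * (date - 18300))   # S-shaped true probability
closed <- rbinom(n, size = 1, prob = p_true)

lpm <- lm(closed ~ date)
coef(lpm)                  # intercept = prediction for date = 0 (1970-01-01)

fitted_p <- fitted(lpm)
share_outside <- mean(fitted_p < 0 | fitted_p > 1)
share_outside              # share of in-sample predictions outside [0, 1]

# Centring the date changes the intercept but not the slope
lpm_c <- lm(closed ~ I(date - 18330))
```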
3.a
formula3 = reservations ~ pre_trend_data + pre_day + pre_2 + pre_3 + pre_4 + pre_5 + pre_6 + pre_7
mod3 = lm(formula3, data = data)
# Joint test of the pre-period coefficients (H0: no differential pre-trends)
hypothesis = c("pre_2=0", "pre_3=0", "pre_4=0", "pre_5=0", "pre_6=0", "pre_7=0")
linearHypothesis(mod3, hypothesis)
## Linear hypothesis test
##
## Hypothesis:
## pre_2 = 0
## pre_3 = 0
## pre_4 = 0
## pre_5 = 0
## pre_6 = 0
## pre_7 = 0
##
## Model 1: restricted model
## Model 2: reservations ~ pre_trend_data + pre_day + pre_2 + pre_3 + pre_4 +
## pre_5 + pre_6 + pre_7
##
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 959 257458
## 2 953 254027 6 3430.2 2.1448 0.04617 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
3.b With a p-value of 0.0462, we can reject the joint null hypothesis of parallel pre-trends at the 5% significance level (though not at the 1% level). Thus, we should be careful in proceeding with the DiD estimate.
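The F-statistic in the `linearHypothesis()` output can be reproduced from the two residual sums of squares, \(F = \frac{(RSS_r - RSS_u)/q}{RSS_u/(n - k)}\), with q = 6 restrictions and 953 residual degrees of freedom. The RSS values below are copied (rounded) from the printed table, so the result matches only up to rounding.

```r
# Values copied (rounded) from the linearHypothesis() output
rss_r <- 257458   # restricted model: pre_2 to pre_7 set to zero
rss_u <- 254027   # unrestricted model
q     <- 6        # number of restrictions
df_u  <- 953      # residual degrees of freedom of the unrestricted model

F_stat <- ((rss_r - rss_u) / q) / (rss_u / df_u)
p_val  <- pf(F_stat, df1 = q, df2 = df_u, lower.tail = FALSE)
c(F = F_stat, p = p_val)
```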
4. The figure shows that the median of OpenTable reservations declines from 20 days before the lockdown date. Therefore, the DiD estimate does not capture the causal effect, as a declining trend was already present before treatment.
Without the common trends assumption, we cannot establish a valid control group and therefore cannot obtain the average treatment effect.
Instead, an alternative “treatment” could be used to study what drove the decline in table reservations, rather than the lockdown itself.
1. Intention-to-treat problem: to avoid the virus, people might already have kept some degree of social distance before the mandatory lockdown was imposed.
Non-compliance may also be present: some people may prefer to be outside even in the rain.
In general, heterogeneous treatment effects are always a threat in IV estimation.
2.a First, we want our instrument, rainfall, to be correlated with social distancing (relevance). It also has to comply with the assumptions of:
Further:
If this is the case, the variation in rainfall is as good as random and uncorrelated with any other possible confounders. Rainfall then affects the outcome only through the treatment, social distancing, which rules out any other causal channel from rainfall to the outcome.
2.b The exclusion restriction requires that rainfall affects COVID-19 cases only through social distancing: rainfall (bad weather) kept people inside and thereby at a natural distance from each other, and this natural distance reduced COVID-19 cases.
3.a The results are significant and in line with the identifying assumption of relevance.
First-stage F-statistic: 11.68, just above the rule-of-thumb threshold of 10 for a weak instrument.
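With a single instrument and no other regressors in the first stage, the first-stage F-statistic is simply the squared t-statistic of the instrument. A minimal sketch on simulated data, where `rain` and `leaving_home` are hypothetical stand-ins for the variables in the assignment:

```r
set.seed(3)
n <- 300
rain <- rexp(n, rate = 1)                           # hypothetical rainfall
leaving_home <- 60 - 1.5 * rain + rnorm(n, sd = 5)  # hypothetical % leaving home

first_stage <- lm(leaving_home ~ rain)
t_rain <- coef(summary(first_stage))["rain", "t value"]
F_fs   <- unname(summary(first_stage)$fstatistic["value"])

c(F = F_fs, t_squared = t_rain^2)
F_fs > 10   # rule-of-thumb check for a weak instrument
```

With control variables in the first stage, the relevant statistic is instead the F-test of the excluded instrument(s) only.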
3.b The reduced form tells us that more rain keeps us from gathering, which shows up directly in the outcome. Its sign is aligned with the sign of the coefficient from the first-stage regression.
Assuming exogeneity, there is no reverse causality: COVID-19 cases do not affect the weather (setting aside human-caused global warming).
3.c The 2SLS estimate is highly significant. Significance alone does not establish causality, but provided the IV assumptions hold, the coefficient on % Leaving Home has a causal interpretation as its effect on the spread of COVID-19.
The model predicts that the number of COVID-19 cases rises by approximately 15 (per 100,000 population) for every one-percentage-point increase in the share of people leaving their home.
However, the magnitude might be biased downwards. Note that the negative correlation between rain and % Leaving Home does not by itself bias 2SLS; a more relevant concern is that, with a first-stage F-statistic only just above 10, 2SLS is biased towards OLS in finite samples.
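In the just-identified case, the 2SLS estimate equals the reduced-form coefficient divided by the first-stage coefficient. The sketch below checks this on simulated data with a hypothetical unobserved confounder; all coefficients are assumptions. It also shows how OLS can be biased when the confounder moves the treatment and the outcome in opposite directions.

```r
set.seed(11)
n <- 5000
rain <- rexp(n)                 # hypothetical instrument
confounder <- rnorm(n)          # hypothetical unobserved confounder
leaving_home <- 60 - 2 * rain + 3 * confounder + rnorm(n)
cases <- 100 + 15 * leaving_home - 20 * confounder + rnorm(n, sd = 10)

# Indirect least squares: reduced form / first stage
first_stage  <- coef(lm(leaving_home ~ rain))["rain"]
reduced_form <- coef(lm(cases ~ rain))["rain"]
beta_iv <- unname(reduced_form / first_stage)

# Manual 2SLS: same point estimate (but its printed SEs would be wrong)
x_hat <- fitted(lm(leaving_home ~ rain))
beta_2sls <- unname(coef(lm(cases ~ x_hat))["x_hat"])

# Naive OLS, biased by the confounder
beta_ols <- unname(coef(lm(cases ~ leaving_home))["leaving_home"])

c(iv = beta_iv, two_sls = beta_2sls, ols = beta_ols)
```

For inference, a dedicated IV routine such as `AER::ivreg()` should be used rather than the manual second stage.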
4.a Not really, although the relevance condition is only just fulfilled (F = 11.68).
A Hausman test could be conducted to test whether % Leaving Home is in fact endogenous. In general, the variance of the IV estimator is larger than that of OLS, so if the regressor is not endogenous, OLS is preferable: it is then consistent and more efficient.
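One common way to run this test is the regression-based (control-function) form of the Durbin-Wu-Hausman test: add the first-stage residuals to the structural equation and test whether their coefficient is zero. A minimal sketch, where the helper `dwh_pvalue()` and the simulated data are hypothetical:

```r
# Regression-based Durbin-Wu-Hausman test (control-function form)
dwh_pvalue <- function(y, x, z) {
  v_hat <- residuals(lm(x ~ z))            # first-stage residuals
  aux   <- lm(y ~ x + v_hat)               # add them to the structural equation
  coef(summary(aux))["v_hat", "Pr(>|t|)"]  # H0: x is exogenous
}

set.seed(5)
n <- 2000
z <- rnorm(n)            # hypothetical instrument
confounder <- rnorm(n)

# Case 1: x is endogenous (shares the confounder with y)
x_endog <- z + confounder + rnorm(n)
y_endog <- 2 * x_endog + 3 * confounder + rnorm(n)

# Case 2: x is exogenous (the confounder only affects y)
x_exog <- z + rnorm(n)
y_exog <- 2 * x_exog + 3 * confounder + rnorm(n)

p_endog <- dwh_pvalue(y_endog, x_endog, z)
p_exog  <- dwh_pvalue(y_exog, x_exog, z)
c(endogenous = p_endog, exogenous = p_exog)
```

A small p-value favours IV; otherwise OLS is preferred for its smaller variance.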
4.b Since only one instrument is used (other weather conditions are not), no Sargan test has been conducted: with one instrument and one endogenous regressor the model is exactly identified, so there are no overidentifying restrictions to test.
The Sargan test tests instrument exogeneity, \(\mathbb{E}[z_i \varepsilon_i] = 0\), and requires more instruments than endogenous regressors.
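If a second weather variable were available, the Sargan statistic could be computed as \(n R^2\) from regressing the 2SLS residuals on all instruments, compared with a \(\chi^2\) distribution with (number of instruments minus number of endogenous regressors) degrees of freedom. The sketch below assumes a hypothetical second instrument `temp` and simulated data; the helper `sargan_test()` is illustrative, not a library function.

```r
# Sargan overidentification test for one endogenous regressor x and instruments Z
sargan_test <- function(y, x, Z) {
  x_hat <- fitted(lm(x ~ Z))                 # first stage on all instruments
  b     <- coef(lm(y ~ x_hat))               # second stage (2SLS point estimates)
  u     <- y - b[1] - b[2] * x               # structural residuals use the actual x
  stat  <- length(y) * summary(lm(u ~ Z))$r.squared
  df    <- ncol(Z) - 1                       # number of overidentifying restrictions
  c(stat = stat, p = pchisq(stat, df = df, lower.tail = FALSE))
}

set.seed(9)
n    <- 3000
rain <- rnorm(n)
temp <- rnorm(n)              # hypothetical second weather instrument
confounder <- rnorm(n)
x <- 50 - 2 * rain + 1.5 * temp + confounder + rnorm(n)
Z <- cbind(rain, temp)

y_valid <- 10 + 1 * x - 2 * confounder + rnorm(n)   # both instruments excluded
y_bad   <- y_valid + 0.5 * temp                      # temp violates exclusion

res_valid <- sargan_test(y_valid, x, Z)
res_bad   <- sargan_test(y_bad, x, Z)
rbind(valid = res_valid, invalid = res_bad)
```

The test only has power when the instruments disagree; it cannot detect a violation shared by all instruments.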