Developed by Brodersen KH and colleagues in 2015, the CausalImpact package implements an approach to estimating the causal effect of a designed intervention on a time series.
Given a response time series (e.g., clicks) and a set of control time series (e.g., clicks in non-affected markets or clicks on other sites), the package constructs a Bayesian structural time-series model. This model is then used to predict the counterfactual, i.e., how the response metric would have evolved after the intervention if the intervention had never occurred.
Reference: Brodersen KH, Gallusser F, Koehler J, Remy N, Scott SL. Inferring causal impact using Bayesian structural time-series models. Annals of Applied Statistics, 2015, Vol. 9, No. 1, 247-274. http://research.google.com/pubs/pub41854.html
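For readers who want the formal picture, the reference above writes a structural time-series model in state-space form. The block below is only a sketch of that general form, not the exact specification that CausalImpact fits by default: y_t is the observed response, alpha_t is the latent state vector, Z_t, T_t and R_t are the output vector, transition matrix and control matrix, and the two error terms are assumed Gaussian and independent.

```latex
\begin{aligned}
y_t          &= Z_t^{\mathsf{T}} \alpha_t + \varepsilon_t, & \varepsilon_t &\sim \mathcal{N}(0, \sigma_t^2), \\
\alpha_{t+1} &= T_t \alpha_t + R_t \eta_t,                 & \eta_t        &\sim \mathcal{N}(0, Q_t).
\end{aligned}
```

The control time series enter through a regression component of the state, which is how the model carries information from unaffected series into the counterfactual prediction.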
For demonstration purposes, we apply the analysis to a simulated clinical trial in which the efficacy of a preventer inhaler is evaluated in a patient with asthma. The main outcome (Peakflow) was monitored daily for 2 months (60 days) before and 60 days after the beginning of treatment.
The study question is whether the use of the preventer inhaler from day 61 onward has significantly improved Peakflow during the treatment period. In other words, we want to make a causal inference about the beneficial effect of the treatment on Peakflow.
We can generate and visualize the simulated data as follows:
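library(tidyverse)
set.seed(123)
# Simulated daily peak flow: 60 pre-treatment days (mean 160) followed by
# 60 treatment days (mean 350), each with added AR(1) noise
peakflow = c(round(rnorm(60, 160, 25) + arima.sim(model = list(ar = 0.3), n = 60), 0),
             round(rnorm(60, 350, 40) + arima.sim(model = list(ar = 0.3), n = 60), 0))
# SBP is simulated independently of the treatment and later serves as the control series.
# Treatment is only a colour label for the plot: a path segment takes the colour of its
# starting point, so switching the label at day 60 colours the jump to day 61 as "After".
df = as.data.frame(peakflow) %>%
  mutate(SBP = round(rnorm(120, 100, 2) + arima.sim(model = list(ar = 0.3), n = 120), 0),
         Treatment = c(rep("Before", 59), rep("After", 61)))
# Shaded rectangles mark the two periods; the dashed line marks the last pre-treatment day
df %>% ggplot(aes(x = 1:120, color = Treatment, group = 1)) +
  geom_rect(xmin = 0, xmax = 60, ymin = min(df$peakflow[1:60]),
            ymax = max(df$peakflow[1:60]), fill = "red", alpha = 0.005, color = NA) +
  geom_rect(xmin = 61, xmax = 120, ymin = max(df$peakflow[1:60]),
            ymax = max(df$peakflow[61:120]), fill = "skyblue", alpha = 0.01, color = NA) +
  geom_path(aes(y = peakflow), linetype = 1, size = 1) +
  scale_x_continuous("Follow-up time (Day)") +
  scale_y_continuous("Value", breaks = c(0, 100, 150, 200, 250, 300, 350, 400, 450, 500)) +
  geom_vline(xintercept = 60, size = 1, linetype = 2) +
  theme_bw() + scale_color_manual(values = c("blue3", "red3"))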
Package documentation: https://google.github.io/CausalImpact/
To estimate a causal effect, we begin by specifying which period in the data should be used for training the model (pre-intervention period) and which period for computing a counterfactual prediction (post-intervention period).
As the treatment began on day 61, these two periods are (1 to 60) and (61 to 120), respectively: the first 60 time points are used for model training, and time points 61 to 120 are used for computing predictions.
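As a minimal sketch of how these two periods could be written down in R before fitting the model: the names n_days, treatment_start, pre_idx and post_idx are hypothetical helpers introduced here for illustration; only pre.period and post.period correspond to arguments of CausalImpact().

```r
# Hypothetical helper values (not part of the original analysis)
n_days <- 120          # total number of daily observations
treatment_start <- 61  # first day on the preventer inhaler

# Each period is a pair of time-point indices: start and end
pre.period  <- c(1, treatment_start - 1)   # training period: days 1 to 60
post.period <- c(treatment_start, n_days)  # prediction period: days 61 to 120

# Sanity checks: the periods must not overlap and must stay inside the data
stopifnot(pre.period[2] < post.period[1], post.period[2] <= n_days)

# Row indices covered by each period
pre_idx  <- seq(pre.period[1], pre.period[2])
post_idx <- seq(post.period[1], post.period[2])
```

Deriving both periods from a single treatment_start value is one way to avoid the two ranges drifting apart when the intervention day changes.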
By default, the result plot contains three panels. The first panel shows the data and a counterfactual prediction for the post-treatment period. The second panel shows the difference between observed data and counterfactual predictions. This is the pointwise causal effect, as estimated by the model. The third panel adds up the pointwise contributions from the second panel, resulting in a plot of the cumulative effect of the intervention.
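The relationship between the second and third panels can be sketched with a few made-up numbers. The vectors below are purely illustrative (they are not output of the model), and the variable names are assumptions chosen for this example.

```r
# Made-up values for four post-treatment days (illustration only)
observed       <- c(340, 355, 350, 362)  # what was actually measured
counterfactual <- c(160, 165, 158, 163)  # what the model would predict without treatment

# Second panel: pointwise effect = observed minus counterfactual prediction
pointwise_effect <- observed - counterfactual
pointwise_effect    # 180 190 192 199

# Third panel: running sum of the pointwise effects
cumulative_effect <- cumsum(pointwise_effect)
cumulative_effect   # 180 370 562 761
```

In the real analysis each of these quantities also carries a posterior interval, which the plot draws as a shaded band.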
You can imagine that we could travel into the future twice and observe the consequences of either using or not using the preventer inhaler on Peakflow. (Such an observation is impossible in any conventional experimental design, simply because one patient can never be in the placebo group and the treatment group at the same time.)
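library(CausalImpact)
# Drop the Treatment label (column 3). The first remaining column (peakflow) is the
# response; SBP is used as the control series.
model1 = CausalImpact(df[, -3], pre.period = c(1, 60), post.period = c(61, 120))
summary(model1)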
## Posterior inference {CausalImpact}
##
## Average Cumulative
## Actual 349 20915
## Prediction (s.d.) 162 (3.7) 9701 (219.6)
## 95% CI [155, 169] [9279, 10143]
##
## Absolute effect (s.d.) 187 (3.7) 11214 (219.6)
## 95% CI [180, 194] [10772, 11636]
##
## Relative effect (s.d.) 116% (2.3%) 116% (2.3%)
## 95% CI [111%, 120%] [111%, 120%]
##
## Posterior tail-area probability p: 0.001
## Posterior prob. of a causal effect: 99.9%
##
## For more details, type: summary(impact, "report")
plot(model1)
The result could be interpreted as follows:
summary(model1, "report")
## Analysis report {CausalImpact}
##
##
## During the post-intervention period, the response variable had an average value of approx. 348.58. By contrast, in the absence of an intervention, we would have expected an average response of 161.68. The 95% interval of this counterfactual prediction is [154.64, 169.05]. Subtracting this prediction from the observed response yields an estimate of the causal effect the intervention had on the response variable. This effect is 186.90 with a 95% interval of [179.53, 193.94]. For a discussion of the significance of this effect, see below.
##
## Summing up the individual data points during the post-intervention period (which can only sometimes be meaningfully interpreted), the response variable had an overall value of 20.91K. By contrast, had the intervention not taken place, we would have expected a sum of 9.70K. The 95% interval of this prediction is [9.28K, 10.14K].
##
## The above results are given in terms of absolute numbers. In relative terms, the response variable showed an increase of +116%. The 95% interval of this percentage is [+111%, +120%].
##
## This means that the positive effect observed during the intervention period is statistically significant and unlikely to be due to random fluctuations. It should be noted, however, that the question of whether this increase also bears substantive significance can only be answered by comparing the absolute effect (186.90) to the original goal of the underlying intervention.
##
## The probability of obtaining this effect by chance is very small (Bayesian one-sided tail-area probability p = 0.001). This means the causal effect can be considered statistically significant.
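The figures in the report are linked by simple arithmetic, which can be checked by hand. The sketch below copies the averages from the report above and assumes 60 post-treatment days. Taking the relative effect as the ratio of the two point estimates is a simplification of how the package summarizes the posterior, but it reproduces the rounded values here.

```r
# Values copied from the report above; n_post is the length of the post period
n_post     <- 60
actual_avg <- 348.58   # average observed response
pred_avg   <- 161.68   # average counterfactual prediction

abs_effect <- actual_avg - pred_avg   # absolute effect, 186.90
rel_effect <- abs_effect / pred_avg   # relative effect, about 1.16 (+116%)
cum_actual <- actual_avg * n_post     # cumulative observed, about 20.91K
cum_pred   <- pred_avg * n_post       # cumulative predicted, about 9.70K

round(c(abs_effect = abs_effect,
        rel_effect_pct = 100 * rel_effect,
        cum_actual = cum_actual,
        cum_pred = cum_pred), 1)
```

The difference between the two cumulative values matches the cumulative absolute effect of about 11214 shown in the summary table.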
The Bayesian structural time-series model is a useful method for causal inference on time series. It could be used in clinical trials where the outcome is monitored over a long period, or where the outcome is a physiological signal recorded as a time series (Holter ECG, blood pressure, peak flow or glycemia). It could also be considered as an alternative to survival analysis, since we can also make causal inferences about cumulative events (COPD exacerbations or symptoms).