CausalImpact is an R package developed by Google.
library(CausalImpact)
set.seed(1234)
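CausalImpact uses a predictor (a control series) to estimate the actual impact of a campaign (or any other event). It learns the relationship between the response and the predictor before the event. It then predicts what the response would have been had the event not occurred, and compares that prediction with what was actually observed. For this to hold, the predictor must not itself be affected by the campaign.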
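To make the idea concrete, here is a deliberately simplified sketch on simulated data. It assumes an ordinary least-squares regression (`lm`) in place of the Bayesian structural time-series model that CausalImpact actually fits. The steps are:

- learn the response/predictor relationship on the pre-period;
- predict the post-period counterfactual;
- take the difference.

The variables `x`, `y`, `pre` and `post` are hypothetical.

```r
# Simplified illustration of the counterfactual idea behind CausalImpact.
# ASSUMPTION: plain OLS (lm) on simulated data, not the Bayesian
# structural time-series model that CausalImpact really uses.
set.seed(42)
n <- 100
x <- 100 + rnorm(n, sd = 5)   # hypothetical predictor (control series)
y <- 1.2 * x + rnorm(n)       # response follows the predictor
y[71:n] <- y[71:n] + 10       # simulated intervention: +10 from t = 71

pre  <- 1:70                  # pre-intervention period
post <- 71:n                  # post-intervention period

# 1. Learn the response ~ predictor relationship on the pre-period only
fit <- lm(y ~ x, data = data.frame(x = x[pre], y = y[pre]))

# 2. Predict the counterfactual: the response without the intervention
counterfactual <- predict(fit, newdata = data.frame(x = x[post]))

# 3. Estimated effect = observed response - counterfactual
point.effect <- y[post] - counterfactual
mean(point.effect)            # should be close to the simulated +10
```

Because the simulated predictor is untouched by the intervention, the post-period gap between `y` and its prediction recovers the injected effect. If the predictor were itself lifted by the campaign, the counterfactual would be biased upward and the effect underestimated.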
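# read.delim() reads tab-separated values; if the file is actually comma-separated, use read.csv() instead.
data.raw <- read.delim("C:/sb-waq/R/causalImpact/data.csv")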
head(data.raw, 5)
##   Pageviews Conv       Date
## 1      1578    7 2016-01-01
## 2      2595   10 2016-01-02
## 3      2774   15 2016-01-03
## 4      4420   18 2016-01-04
## 5      4922   12 2016-01-05
tail(data.raw, 5)
##     Pageviews Conv       Date
## 362      2908   17 2016-12-27
## 363      3608   15 2016-12-28
## 364      3638   20 2016-12-29
## 365      3080   13 2016-12-30
## 366      2026   10 2016-12-31
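The next step builds the date index with `seq.Date()` instead of reading it from the file. It is therefore worth checking that the file contains exactly one row per day, with no gaps. The sketch below uses a hypothetical stand-in, `data.sample`, rebuilt from the five rows printed by `head(data.raw, 5)`. On the real data, the same checks would be `nrow(data.raw) == 366` and `all(diff(as.Date(data.raw$Date)) == 1)`.

```r
# Hypothetical stand-in for data.raw: the five rows printed by head(),
# written as tab-separated text (read.delim's default separator is "\t").
data.sample <- read.delim(text = paste(
  "Pageviews\tConv\tDate",
  "1578\t7\t2016-01-01",
  "2595\t10\t2016-01-02",
  "2774\t15\t2016-01-03",
  "4420\t18\t2016-01-04",
  "4922\t12\t2016-01-05",
  sep = "\n"
))

dates <- as.Date(data.sample$Date)   # convert the Date column to class Date

# One row per day: consecutive dates differ by exactly one day
consecutive <- all(diff(dates) == 1)
no.missing  <- !anyNA(data.sample)

consecutive && no.missing            # TRUE
```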
Now, let's prepare the data for the CausalImpact package.
pred <- data.raw[, 'Pageviews'] # Number of visits to the HUB page (predictor)
conv <- data.raw[, 'Conv'] # Number of completed conversions (response)
# The response must be the first column. 2016 is a leap year, so the daily index has 366 dates, one per row.
data.conv <- zoo(cbind(conv, pred), seq.Date(from=as.Date("2016-01-01"), to=as.Date("2016-12-31"), by=1))
head(data.conv)
##            conv pred
## 2016-01-01    7 1578
## 2016-01-02   10 2595
## 2016-01-03   15 2774
## 2016-01-04   18 4420
## 2016-01-05   12 4922
## 2016-01-06   15 4275
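An alternative is to index the `zoo` series by the file's own dates, so that the index can never drift from the rows. This sketch assumes the `Date` column is in ISO `YYYY-MM-DD` format, as the printed rows suggest. `data.sample` is a hypothetical stand-in built from the six rows printed above.

```r
library(zoo)

# Hypothetical stand-in for data.raw: the six rows printed by head(data.conv)
data.sample <- data.frame(
  Pageviews = c(1578, 2595, 2774, 4420, 4922, 4275),
  Conv      = c(7, 10, 15, 18, 12, 15),
  Date      = c("2016-01-01", "2016-01-02", "2016-01-03",
                "2016-01-04", "2016-01-05", "2016-01-06")
)

conv <- data.sample$Conv        # response: must be the first column
pred <- data.sample$Pageviews   # predictor (control series)

# Index by the dates stored in the file instead of a regenerated sequence;
# zoo() also sorts the rows by date if the file is out of order.
data.conv <- zoo(cbind(conv, pred), as.Date(data.sample$Date))
head(data.conv)
```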
Define the pre-campaign period and the campaign (post) period.
pre.period <- as.Date(c("2016-01-01", "2016-10-15")) # Historical period before the campaign starts.
post.period <- as.Date(c("2016-10-16", "2016-12-31")) # Campaign period.
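A quick arithmetic check on the two periods, assuming the same dates as above: they must be contiguous and together cover the 366 days of 2016. The post-period length also explains the summary further down. There, a cumulative actual value of 4440 over 77 days corresponds to the reported average of about 57.66.

```r
pre.period  <- as.Date(c("2016-01-01", "2016-10-15"))
post.period <- as.Date(c("2016-10-16", "2016-12-31"))

# Inclusive lengths in days (diff() of two Dates is a difftime in days)
n.pre  <- as.numeric(diff(pre.period)) + 1    # 289
n.post <- as.numeric(diff(post.period)) + 1   # 77

# The campaign starts the day after the pre-period ends
stopifnot(post.period[1] == pre.period[2] + 1)
# Together the periods cover the whole (leap) year
stopifnot(n.pre + n.post == 366)
```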
Run the analysis.
# niter: number of MCMC samples; nseasons = 7: weekly seasonal component for daily data
impact <- CausalImpact(data.conv, pre.period, post.period, model.args = list(niter = 2000, nseasons = 7))
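To see the same call pattern on data whose true effect is known, here is a sketch on simulated daily data. The names `sim`, `dates`, `pre.sim`, `post.sim` and `impact.sim` are hypothetical, and the +10 effect is injected by construction. The sketch reuses `model.args` with `niter` (number of MCMC samples) and `nseasons = 7`. It then reads the estimates from the `summary` data frame stored in the result.

```r
library(CausalImpact)
library(zoo)

set.seed(1)
n <- 120
dates <- seq.Date(from = as.Date("2016-01-01"), by = "day", length.out = n)

# Simulated predictor and response; the response gets +10 after day 90
x <- 100 + as.numeric(arima.sim(model = list(ar = 0.999), n = n))
y <- 1.2 * x + rnorm(n)
y[91:n] <- y[91:n] + 10

sim <- zoo(cbind(y, x), dates)          # response first, then predictor
pre.sim  <- c(dates[1], dates[90])
post.sim <- c(dates[91], dates[n])

impact.sim <- CausalImpact(sim, pre.sim, post.sim,
                           model.args = list(niter = 1000, nseasons = 7))

# Estimates for the post-period average (AbsEffect should be near +10)
impact.sim$summary["Average", c("Actual", "Pred", "AbsEffect", "p")]
```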
Display the results of the CausalImpact analysis.
plot(impact)

summary(impact)
## Posterior inference {CausalImpact}
##
##                          Average        Cumulative
## Actual                   58             4440
## Prediction (s.d.)        17 (0.94)      1313 (72.20)
## 95% CI                   [15, 19]       [1172, 1459]
##
## Absolute effect (s.d.)   41 (0.94)      3127 (72.20)
## 95% CI                   [39, 42]       [2981, 3268]
##
## Relative effect (s.d.)   238% (5.5%)    238% (5.5%)
## 95% CI                   [227%, 249%]   [227%, 249%]
##
## Posterior tail-area probability p:   5e-04
## Posterior prob. of a causal effect:  99.94962%
##
## For more details, type: summary(impact, "report")
summary(impact, "report")
## Analysis report {CausalImpact}
##
##
## During the post-intervention period, the response variable had an average value of approx. 57.66. By contrast, in the absence of an intervention, we would have expected an average response of 17.05. The 95% interval of this counterfactual prediction is [15.23, 18.95]. Subtracting this prediction from the observed response yields an estimate of the causal effect the intervention had on the response variable. This effect is 40.61 with a 95% interval of [38.72, 42.44]. For a discussion of the significance of this effect, see below.
##
## Summing up the individual data points during the post-intervention period (which can only sometimes be meaningfully interpreted), the response variable had an overall value of 4.44K. By contrast, had the intervention not taken place, we would have expected a sum of 1.31K. The 95% interval of this prediction is [1.17K, 1.46K].
##
## The above results are given in terms of absolute numbers. In relative terms, the response variable showed an increase of +238%. The 95% interval of this percentage is [+227%, +249%].
##
## This means that the positive effect observed during the intervention period is statistically significant and unlikely to be due to random fluctuations. It should be noted, however, that the question of whether this increase also bears substantive significance can only be answered by comparing the absolute effect (40.61) to the original goal of the underlying intervention.
##
## The probability of obtaining this effect by chance is very small (Bayesian one-sided tail-area probability p = 0.001). This means the causal effect can be considered statistically significant.
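The figures in the summary and in the report are tied together by simple arithmetic, which is a useful sanity check when reading them. The values below are copied from the output above; only the arithmetic is added.

```r
# Post-period averages copied from the report above
actual.avg <- 57.66
pred.avg   <- 17.05
n.post     <- 77      # days from 2016-10-16 to 2016-12-31

abs.effect <- actual.avg - pred.avg    # absolute effect: 40.61
rel.effect <- abs.effect / pred.avg    # relative effect: about 2.38, i.e. +238%

# Cumulative figures: 4440 actual vs 1313 predicted
cum.effect   <- 4440 - 1313            # 3127, as in the summary table
avg.from.cum <- 4440 / n.post          # about 57.66, the reported average

round(c(abs = abs.effect, rel.pct = 100 * rel.effect,
        cum = cum.effect, avg = avg.from.cum), 2)
```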