CausalImpact - Measuring the impact (lift) of a campaign

CausalImpact is an R package developed by Google.

library(CausalImpact)
set.seed(1234)

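CausalImpact uses a predictor, a time series that the campaign itself does not affect, to estimate what the response would have been without the campaign. The gap between the observed response and this counterfactual prediction is the actual impact of the campaign (or of any other event).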
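The counterfactual idea can be sketched on simulated data. This is a toy illustration only: it uses an ordinary linear regression (`lm`) in place of the Bayesian structural time-series model that CausalImpact fits, and every name and number in it (`pred.toy`, `conv.toy`, `true.lift`, the 70/30 day split) is hypothetical.

```r
# Toy illustration only: simulated data, with lm() standing in for the
# Bayesian structural time-series model that CausalImpact actually fits.
# Note: this resets the random seed; re-run set.seed(1234) before the analysis.
set.seed(1)
n <- 100
pre <- 1:70      # hypothetical pre-campaign days
post <- 71:100   # hypothetical campaign days

# Predictor with a weekly pattern, and a response that follows it.
pred.toy <- 3000 + 500 * sin(2 * pi * (1:n) / 7) + rnorm(n, sd = 100)
conv.toy <- 0.005 * pred.toy + rnorm(n, sd = 1)

# Assumed effect: the campaign adds 10 conversions per day.
true.lift <- 10
conv.toy[post] <- conv.toy[post] + true.lift

toy <- data.frame(conv = conv.toy, pred = pred.toy)

# Learn the conv ~ pred relationship on the pre-campaign days only.
fit <- lm(conv ~ pred, data = toy[pre, ])

# Counterfactual: expected conversions during the campaign, given the predictor.
counterfactual <- predict(fit, newdata = toy[post, ])

# Estimated lift: observed minus counterfactual, averaged over the campaign.
estimated.lift <- mean(toy$conv[post] - counterfactual)
estimated.lift
```

The estimate should land close to `true.lift` because the predictor keeps tracking the baseline during the campaign. The same reasoning breaks down if the campaign also changes the predictor.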

# read.delim() expects tab-separated values; use read.csv() if the file is comma-separated.
data.raw <- read.delim("C:/sb-waq/R/causalImpact/data.csv")
head(data.raw, 5)

##   Pageviews Conv       Date
## 1      1578    7 2016-01-01
## 2      2595   10 2016-01-02
## 3      2774   15 2016-01-03
## 4      4420   18 2016-01-04
## 5      4922   12 2016-01-05

tail(data.raw, 5)

##     Pageviews Conv       Date
## 362      2908   17 2016-12-27
## 363      3608   15 2016-12-28
## 364      3638   20 2016-12-29
## 365      3080   13 2016-12-30
## 366      2026   10 2016-12-31

Next, we prepare the data for the CausalImpact package.

pred <- data.raw[,'Pageviews'] # Number of visits to the HUB page (predictor)
conv <- data.raw[,'Conv'] # Number of completed conversions (response)
# CausalImpact treats the first column as the response, so conv must come first.
data.conv <- zoo(cbind(conv, pred), seq.Date(from=as.Date("2016-01-01"), to=as.Date("2016-12-31"), by=1))
head(data.conv)

##            conv pred
## 2016-01-01    7 1578
## 2016-01-02   10 2595
## 2016-01-03   15 2774
## 2016-01-04   18 4420
## 2016-01-05   12 4922
## 2016-01-06   15 4275
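The zoo index above comes from a fixed date sequence rather than from the Date column, so it silently assumes one row per calendar day, in order. A minimal sketch of a check for that assumption follows; `check.daily` is a hypothetical helper, not part of CausalImpact or zoo, and it is shown here on a three-date demo vector.

```r
# Hypothetical helper: report missing or duplicated days in a date column.
check.daily <- function(dates) {
  d <- as.Date(dates)
  expected <- seq(min(d), max(d), by = "day")
  list(
    n.rows = length(d),                      # rows actually present
    n.expected = length(expected),           # days between first and last date
    missing = expected[!(expected %in% d)],  # days with no row
    duplicated = d[duplicated(d)]            # days with more than one row
  )
}

# Demo on a short vector with a gap on 2016-01-03.
demo <- check.daily(c("2016-01-01", "2016-01-02", "2016-01-04"))
demo$missing
```

On the real data the call would be `check.daily(data.raw$Date)`. If `missing` and `duplicated` are both empty and the rows are sorted by date, the fixed sequence lines up with the rows.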

Define the pre-campaign period and the campaign period.

pre.period <- as.Date(c("2016-01-01", "2016-10-15")) # Historical period before the campaign starts.
post.period <- as.Date(c("2016-10-16", "2016-12-31")) # Campaign period.
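Before running the model, it can help to confirm that the two periods sit inside the data, follow each other without a gap, and have the expected lengths. The sketch below repeats the period definitions so that it runs on its own; `index.range`, `n.pre` and `n.post` are names introduced here for illustration.

```r
pre.period <- as.Date(c("2016-01-01", "2016-10-15"))
post.period <- as.Date(c("2016-10-16", "2016-12-31"))
index.range <- as.Date(c("2016-01-01", "2016-12-31"))  # first and last date of the data

# Both periods must fall inside the data, and the campaign must start
# the day after the historical period ends.
stopifnot(
  pre.period[1] >= index.range[1],
  post.period[2] <= index.range[2],
  post.period[1] == pre.period[2] + 1
)

# Number of days in each period (both end dates are inclusive).
n.pre <- as.integer(diff(pre.period)) + 1L    # 289 days
n.post <- as.integer(diff(post.period)) + 1L  # 77 days
c(pre = n.pre, post = n.post)
```

The 77-day campaign length is the divisor that links the Average and Cumulative columns of the summary.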

Run the analysis

# niter: number of MCMC samples to draw; nseasons = 7: day-of-week seasonality for daily data.
impact <- CausalImpact(data.conv, pre.period, post.period, model.args = list(niter = 2000, nseasons = 7))

Display the results of the CausalImpact analysis.

plot(impact)

summary(impact)

## Posterior inference {CausalImpact}
## 
##                          Average        Cumulative  
## Actual                   58             4440        
## Prediction (s.d.)        17 (0.94)      1313 (72.20)
## 95% CI                   [15, 19]       [1172, 1459]
##                                                     
## Absolute effect (s.d.)   41 (0.94)      3127 (72.20)
## 95% CI                   [39, 42]       [2981, 3268]
##                                                     
## Relative effect (s.d.)   238% (5.5%)    238% (5.5%) 
## 95% CI                   [227%, 249%]   [227%, 249%]
## 
## Posterior tail-area probability p:   5e-04
## Posterior prob. of a causal effect:  99.94962%
## 
## For more details, type: summary(impact, "report")
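The rows of the summary table are tied together by simple arithmetic, which can be reproduced from the rounded values it prints. This is only a reading aid: the package derives its figures from posterior draws, so a recomputation from rounded point estimates may differ slightly in other analyses or package versions. The variable names are introduced here for illustration.

```r
# Values copied from the Cumulative column of the summary table.
actual.cum <- 4440   # observed conversions during the campaign
pred.cum <- 1313     # predicted conversions without the campaign
n.post <- 77         # days from 2016-10-16 to 2016-12-31

# Absolute effect: observed minus counterfactual.
abs.cum <- actual.cum - pred.cum   # 3127

# Relative effect: absolute effect as a share of the counterfactual.
rel <- abs.cum / pred.cum          # about 2.38, i.e. 238%

# The Average column is the Cumulative column divided by the number of days.
actual.avg <- actual.cum / n.post  # about 57.66
pred.avg <- pred.cum / n.post      # about 17.05
abs.avg <- abs.cum / n.post        # about 40.61

round(c(abs.cum = abs.cum, rel.pct = 100 * rel, abs.avg = abs.avg), 2)
```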

summary(impact, "report")

## Analysis report {CausalImpact}
## 
## 
## During the post-intervention period, the response variable had an average value of approx. 57.66. By contrast, in the absence of an intervention, we would have expected an average response of 17.05. The 95% interval of this counterfactual prediction is [15.23, 18.95]. Subtracting this prediction from the observed response yields an estimate of the causal effect the intervention had on the response variable. This effect is 40.61 with a 95% interval of [38.72, 42.44]. For a discussion of the significance of this effect, see below.
## 
## Summing up the individual data points during the post-intervention period (which can only sometimes be meaningfully interpreted), the response variable had an overall value of 4.44K. By contrast, had the intervention not taken place, we would have expected a sum of 1.31K. The 95% interval of this prediction is [1.17K, 1.46K].
## 
## The above results are given in terms of absolute numbers. In relative terms, the response variable showed an increase of +238%. The 95% interval of this percentage is [+227%, +249%].
## 
## This means that the positive effect observed during the intervention period is statistically significant and unlikely to be due to random fluctuations. It should be noted, however, that the question of whether this increase also bears substantive significance can only be answered by comparing the absolute effect (40.61) to the original goal of the underlying intervention.
## 
## The probability of obtaining this effect by chance is very small (Bayesian one-sided tail-area probability p = 0.001). This means the causal effect can be considered statistically significant.