Causal Impact Analysis

Load data

library(dplyr)         # %>%, filter(), transmute()
library(CausalImpact)  # CausalImpact()

data <- read.csv("ecom_9.csv")

Defining coalesce.na()

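# Replace NA entries of x with values from the fallback vectors in ...,
# applied in order; each fallback is recycled along x.
coalesce.na <- function(x, ...) {
  x.len <- length(x)
  ly <- list(...)
  for (y in ly) {
    y.len <- length(y)
    if (y.len == 1) {
      # Scalar fallback: fill every remaining NA with it
      x[is.na(x)] <- y
    } else {
      if (x.len %% y.len != 0)
        warning('length of x is not a multiple of fallback length')
      # Vector fallback: take the element aligned with each NA position
      pos <- which(is.na(x))
      x[pos] <- y[(pos - 1) %% y.len + 1]
    }
  }
  x
}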
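
To see how the recycling branch of coalesce.na() picks replacements, the index arithmetic can be traced on a toy vector. This is only a sketch: the values of x and y below are invented for illustration and do not come from ecom_9.csv.

```r
# Toy inputs (illustrative values only)
x <- c(1, NA, 3, NA, NA, 6)
y <- c(10, 20, 30)

pos <- which(is.na(x))               # positions of the missing values
idx <- (pos - 1) %% length(y) + 1    # element of y aligned with each position
x[pos] <- y[idx]                     # fill the gaps
print(x)
```

Each NA receives the value y would have at that position if y were repeated along the full length of x, so the result here is 1 20 3 10 20 6.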

Summarizing the table into useful time series

summary <- data %>%
  filter(REGION_NAME == 'PHOENIX') %>%
  # group_by(REGION_NAME, .groups) %>%
  transmute(
    bopus_o = coalesce.na(BOPUS_Orders, 0) + coalesce.na(rest_BOPUS_Orders, 0),
    bopus_r = coalesce.na(BOPUS_Retail, 0) + coalesce.na(rest_BOPUS_Retail, 0),
    ndd_o   = coalesce.na(NDD_Orders, 0)   + coalesce.na(rest_NDD_Orders, 0),
    ndd_r   = coalesce.na(NDD_Retail, 0)   + coalesce.na(rest_NDD_Retail, 0),
    sth_o   = coalesce.na(STH_Orders, 0)   + coalesce.na(rest_STH_Orders, 0),
    sth_r   = coalesce.na(STH_Retail, 0)   + coalesce.na(rest_STH_Retail, 0)
  )
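
Each column above is an NA-safe sum of a pair of source columns. As a minimal base-R sketch of the same idea, rowSums() with na.rm = TRUE gives the same totals on a toy data frame. The values are invented; only the two column names are borrowed from the code above.

```r
# Toy stand-in for two of the source columns (values are made up)
toy <- data.frame(
  BOPUS_Orders      = c(5, NA, 2),
  rest_BOPUS_Orders = c(NA, NA, 4)
)

# Sum across the pair, counting a missing entry as 0
bopus_o <- unname(rowSums(toy[, c("BOPUS_Orders", "rest_BOPUS_Orders")],
                          na.rm = TRUE))
print(bopus_o)
```

A row where both entries are missing sums to 0, which matches what coalesce.na(..., 0) + coalesce.na(..., 0) would return.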

Show the data

par(cex = 1.3)
matplot(summary$bopus_r, type = "l", lwd = 3)
abline(v = 153)  # first post-period observation for PHOENIX

# The corresponding index is 55 for LAS VEGAS
# and 51 for Bakersfield

Specify pre- and post-period

pre.period <- c(1, 152)
post.period <- c(153, 162)
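
A quick sanity check on the two windows can catch off-by-one mistakes before fitting. The sketch below only verifies that the windows are adjacent and counts the observations in each; it does not check them against the length of the series.

```r
pre.period  <- c(1, 152)
post.period <- c(153, 162)

# The post-period should start right after the pre-period ends
stopifnot(pre.period[2] + 1 == post.period[1])

# Number of observations in each window
pre.len  <- diff(pre.period) + 1
post.len <- diff(post.period) + 1
cat("pre:", pre.len, "post:", post.len, "\n")
```

The post-period holds 10 observations, which is why each cumulative figure in the printed results is ten times the corresponding average.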

Causal impact analysis

impact_br <- CausalImpact(summary$bopus_r, pre.period, post.period)
impact_bo <- CausalImpact(summary$bopus_o, pre.period, post.period)
impact_nr <- CausalImpact(summary$ndd_r, pre.period, post.period)
impact_no <- CausalImpact(summary$ndd_o, pre.period, post.period)
impact_sr <- CausalImpact(summary$sth_r, pre.period, post.period)
impact_so <- CausalImpact(summary$sth_o, pre.period, post.period)
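
The six calls differ only in the column they use, so one possible alternative is to fit them in a loop. The sketch below runs on simulated data (the toy data frame, its column names a and b, and the window indices are all hypothetical). It also fixes the random seed, since CausalImpact estimates the model by posterior sampling and results can vary slightly between runs.

```r
library(CausalImpact)

set.seed(1)  # make the sampled results repeatable

# Simulated stand-in for the summarized series (not the real data)
toy <- data.frame(
  a = 100 + cumsum(rnorm(60)),
  b = 50 + cumsum(rnorm(60))
)
pre  <- c(1, 50)
post <- c(51, 60)

# One model per column, returned as a named list
impacts <- lapply(toy, function(y) CausalImpact(y, pre, post))
print(names(impacts))
```

Individual fits are then available by name, for example plot(impacts$a) or print(impacts$b).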

plot(impact_br)

plot(impact_bo)

plot(impact_nr)

plot(impact_no)

plot(impact_sr)

plot(impact_so)

Printing results

# print(impact)
print(impact_br)

## Posterior inference {CausalImpact}
## 
##                          Average          Cumulative      
## Actual                   79041            790413          
## Prediction (s.d.)        8e+04 (4587)     8e+05 (45870)   
## 95% CI                   [71634, 9e+04]   [716336, 9e+05] 
##                                                           
## Absolute effect (s.d.)   -1348 (4587)     -13482 (45870)  
## 95% CI                   [-11209, 7408]   [-112091, 74076]
##                                                           
## Relative effect (s.d.)   -1.7% (5.7%)     -1.7% (5.7%)    
## 95% CI                   [-14%, 9.2%]     [-14%, 9.2%]    
## 
## Posterior tail-area probability p:   0.37667
## Posterior prob. of a causal effect:  62%
## 
## For more details, type: summary(impact, "report")

print(impact_bo)

## Posterior inference {CausalImpact}
## 
##                          Average        Cumulative   
## Actual                   881            8812         
## Prediction (s.d.)        904 (80)       9043 (797)   
## 95% CI                   [749, 1062]    [7489, 10617]
##                                                      
## Absolute effect (s.d.)   -23 (80)       -231 (797)   
## 95% CI                   [-180, 132]    [-1805, 1323]
##                                                      
## Relative effect (s.d.)   -2.6% (8.8%)   -2.6% (8.8%) 
## 95% CI                   [-20%, 15%]    [-20%, 15%]  
## 
## Posterior tail-area probability p:   0.36777
## Posterior prob. of a causal effect:  63%
## 
## For more details, type: summary(impact, "report")

print(impact_nr)

## Posterior inference {CausalImpact}
## 
##                          Average          Cumulative     
## Actual                   27326            273261         
## Prediction (s.d.)        26838 (1736)     268378 (17356) 
## 95% CI                   [23352, 3e+04]   [233515, 3e+05]
##                                                          
## Absolute effect (s.d.)   488 (1736)       4882 (17356)   
## 95% CI                   [-2909, 3975]    [-29094, 39746]
##                                                          
## Relative effect (s.d.)   1.8% (6.5%)      1.8% (6.5%)    
## 95% CI                   [-11%, 15%]      [-11%, 15%]    
## 
## Posterior tail-area probability p:   0.39868
## Posterior prob. of a causal effect:  60%
## 
## For more details, type: summary(impact, "report")

print(impact_no)

## Posterior inference {CausalImpact}
## 
##                          Average        Cumulative  
## Actual                   289            2887        
## Prediction (s.d.)        295 (28)       2955 (280)  
## 95% CI                   [240, 353]     [2396, 3535]
##                                                     
## Absolute effect (s.d.)   -6.8 (28)      -67.6 (280) 
## 95% CI                   [-65, 49]      [-648, 491] 
##                                                     
## Relative effect (s.d.)   -2.3% (9.5%)   -2.3% (9.5%)
## 95% CI                   [-22%, 17%]    [-22%, 17%] 
## 
## Posterior tail-area probability p:   0.39291
## Posterior prob. of a causal effect:  61%
## 
## For more details, type: summary(impact, "report")

print(impact_sr)

## Posterior inference {CausalImpact}
## 
##                          Average          Cumulative      
## Actual                   45591            455915          
## Prediction (s.d.)        43284 (4071)     432845 (40710)  
## 95% CI                   [35003, 51182]   [350027, 511819]
##                                                           
## Absolute effect (s.d.)   2307 (4071)      23070 (40710)   
## 95% CI                   [-5590, 10589]   [-55904, 105888]
##                                                           
## Relative effect (s.d.)   5.3% (9.4%)      5.3% (9.4%)     
## 95% CI                   [-13%, 24%]      [-13%, 24%]     
## 
## Posterior tail-area probability p:   0.29157
## Posterior prob. of a causal effect:  71%
## 
## For more details, type: summary(impact, "report")

print(impact_so)

## Posterior inference {CausalImpact}
## 
##                          Average         Cumulative   
## Actual                   359             3594         
## Prediction (s.d.)        363 (34)        3629 (341)   
## 95% CI                   [294, 428]      [2943, 4276] 
##                                                       
## Absolute effect (s.d.)   -3.5 (34)       -34.5 (341)  
## 95% CI                   [-68, 65]       [-682, 651]  
##                                                       
## Relative effect (s.d.)   -0.95% (9.4%)   -0.95% (9.4%)
## 95% CI                   [-19%, 18%]     [-19%, 18%]  
## 
## Posterior tail-area probability p:   0.44481
## Posterior prob. of a causal effect:  56%
## 
## For more details, type: summary(impact, "report")
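
The six printouts can be compared side by side by collecting the reported tail-area probabilities. In the sketch below the p-values are copied from the output above; the variable names are illustrative, and the 0.05 threshold is assumed from the 95% intervals shown.

```r
# Posterior tail-area probabilities as printed above
p <- c(bopus_r = 0.37667, bopus_o = 0.36777,
       ndd_r   = 0.39868, ndd_o   = 0.39291,
       sth_r   = 0.29157, sth_o   = 0.44481)

# "Posterior prob. of a causal effect" is 1 - p, as a percentage
prob.effect <- round(100 * (1 - p))

alpha <- 0.05  # assumed threshold, matching the 95% intervals
result <- data.frame(p = p,
                     prob_effect_pct = prob.effect,
                     significant = p < alpha)
print(result)
```

None of the six series falls below the threshold, in line with every 95% interval for the effect spanning zero.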

Summarizing the results in prose

summary(impact_bo, "report")

## Analysis report {CausalImpact}
## 
## 
## During the post-intervention period, the response variable had an average value of approx. 881.20. In the absence of an intervention, we would have expected an average response of 904.34. The 95% interval of this counterfactual prediction is [748.87, 1061.68]. Subtracting this prediction from the observed response yields an estimate of the causal effect the intervention had on the response variable. This effect is -23.14 with a 95% interval of [-180.48, 132.33]. For a discussion of the significance of this effect, see below.
## 
## Summing up the individual data points during the post-intervention period (which can only sometimes be meaningfully interpreted), the response variable had an overall value of 8.81K. Had the intervention not taken place, we would have expected a sum of 9.04K. The 95% interval of this prediction is [7.49K, 10.62K].
## 
## The above results are given in terms of absolute numbers. In relative terms, the response variable showed a decrease of -3%. The 95% interval of this percentage is [-20%, +15%].
## 
## This means that, although it may look as though the intervention has exerted a negative effect on the response variable when considering the intervention period as a whole, this effect is not statistically significant, and so cannot be meaningfully interpreted. The apparent effect could be the result of random fluctuations that are unrelated to the intervention. This is often the case when the intervention period is very long and includes much of the time when the effect has already worn off. It can also be the case when the intervention period is too short to distinguish the signal from the noise. Finally, failing to find a significant effect can happen when there are not enough control variables or when these variables do not correlate well with the response variable during the learning period.
## 
## The probability of obtaining this effect by chance is p = 0.368. This means the effect may be spurious and would generally not be considered statistically significant.
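
The effect sizes in the report follow directly from the observed and predicted averages. The sketch below redoes the arithmetic for the BOPUS orders series, using the two averages quoted in the report above. It assumes the relative effect is the absolute effect divided by the prediction, which reproduces the figures here, though other package versions may compute it differently.

```r
actual <- 881.20   # observed post-period average (from the report)
pred   <- 904.34   # counterfactual prediction (from the report)

abs.effect <- actual - pred       # absolute effect
rel.effect <- abs.effect / pred   # relative effect (assumed definition)

cat(sprintf("absolute: %.2f, relative: %.1f%%\n",
            abs.effect, 100 * rel.effect))
```

This gives an absolute effect of -23.14 and a relative effect of -2.6%, matching the printed table for impact_bo.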

summary(impact_br, "report")

## Analysis report {CausalImpact}
## 
## 
## During the post-intervention period, the response variable had an average value of approx. 79.04K. In the absence of an intervention, we would have expected an average response of 80.39K. The 95% interval of this counterfactual prediction is [71.63K, 90.25K]. Subtracting this prediction from the observed response yields an estimate of the causal effect the intervention had on the response variable. This effect is -1.35K with a 95% interval of [-11.21K, 7.41K]. For a discussion of the significance of this effect, see below.
## 
## Summing up the individual data points during the post-intervention period (which can only sometimes be meaningfully interpreted), the response variable had an overall value of 790.41K. Had the intervention not taken place, we would have expected a sum of 803.89K. The 95% interval of this prediction is [716.34K, 902.50K].
## 
## The above results are given in terms of absolute numbers. In relative terms, the response variable showed a decrease of -2%. The 95% interval of this percentage is [-14%, +9%].
## 
## This means that, although it may look as though the intervention has exerted a negative effect on the response variable when considering the intervention period as a whole, this effect is not statistically significant, and so cannot be meaningfully interpreted. The apparent effect could be the result of random fluctuations that are unrelated to the intervention. This is often the case when the intervention period is very long and includes much of the time when the effect has already worn off. It can also be the case when the intervention period is too short to distinguish the signal from the noise. Finally, failing to find a significant effect can happen when there are not enough control variables or when these variables do not correlate well with the response variable during the learning period.
## 
## The probability of obtaining this effect by chance is p = 0.377. This means the effect may be spurious and would generally not be considered statistically significant.

summary(impact_no, "report")

## Analysis report {CausalImpact}
## 
## 
## During the post-intervention period, the response variable had an average value of approx. 288.70. In the absence of an intervention, we would have expected an average response of 295.46. The 95% interval of this counterfactual prediction is [239.57, 353.47]. Subtracting this prediction from the observed response yields an estimate of the causal effect the intervention had on the response variable. This effect is -6.76 with a 95% interval of [-64.77, 49.13]. For a discussion of the significance of this effect, see below.
## 
## Summing up the individual data points during the post-intervention period (which can only sometimes be meaningfully interpreted), the response variable had an overall value of 2.89K. Had the intervention not taken place, we would have expected a sum of 2.95K. The 95% interval of this prediction is [2.40K, 3.53K].
## 
## The above results are given in terms of absolute numbers. In relative terms, the response variable showed a decrease of -2%. The 95% interval of this percentage is [-22%, +17%].
## 
## This means that, although it may look as though the intervention has exerted a negative effect on the response variable when considering the intervention period as a whole, this effect is not statistically significant, and so cannot be meaningfully interpreted. The apparent effect could be the result of random fluctuations that are unrelated to the intervention. This is often the case when the intervention period is very long and includes much of the time when the effect has already worn off. It can also be the case when the intervention period is too short to distinguish the signal from the noise. Finally, failing to find a significant effect can happen when there are not enough control variables or when these variables do not correlate well with the response variable during the learning period.
## 
## The probability of obtaining this effect by chance is p = 0.393. This means the effect may be spurious and would generally not be considered statistically significant.

summary(impact_nr, "report")

## Analysis report {CausalImpact}
## 
## 
## During the post-intervention period, the response variable had an average value of approx. 27.33K. In the absence of an intervention, we would have expected an average response of 26.84K. The 95% interval of this counterfactual prediction is [23.35K, 30.24K]. Subtracting this prediction from the observed response yields an estimate of the causal effect the intervention had on the response variable. This effect is 0.49K with a 95% interval of [-2.91K, 3.97K]. For a discussion of the significance of this effect, see below.
## 
## Summing up the individual data points during the post-intervention period (which can only sometimes be meaningfully interpreted), the response variable had an overall value of 273.26K. Had the intervention not taken place, we would have expected a sum of 268.38K. The 95% interval of this prediction is [233.52K, 302.35K].
## 
## The above results are given in terms of absolute numbers. In relative terms, the response variable showed an increase of +2%. The 95% interval of this percentage is [-11%, +15%].
## 
## This means that, although the intervention appears to have caused a positive effect, this effect is not statistically significant when considering the entire post-intervention period as a whole. Individual days or shorter stretches within the intervention period may of course still have had a significant effect, as indicated whenever the lower limit of the impact time series (lower plot) was above zero. The apparent effect could be the result of random fluctuations that are unrelated to the intervention. This is often the case when the intervention period is very long and includes much of the time when the effect has already worn off. It can also be the case when the intervention period is too short to distinguish the signal from the noise. Finally, failing to find a significant effect can happen when there are not enough control variables or when these variables do not correlate well with the response variable during the learning period.
## 
## The probability of obtaining this effect by chance is p = 0.399. This means the effect may be spurious and would generally not be considered statistically significant.

summary(impact_so, "report")

## Analysis report {CausalImpact}
## 
## 
## During the post-intervention period, the response variable had an average value of approx. 359.40. In the absence of an intervention, we would have expected an average response of 362.85. The 95% interval of this counterfactual prediction is [294.28, 427.56]. Subtracting this prediction from the observed response yields an estimate of the causal effect the intervention had on the response variable. This effect is -3.45 with a 95% interval of [-68.16, 65.12]. For a discussion of the significance of this effect, see below.
## 
## Summing up the individual data points during the post-intervention period (which can only sometimes be meaningfully interpreted), the response variable had an overall value of 3.59K. Had the intervention not taken place, we would have expected a sum of 3.63K. The 95% interval of this prediction is [2.94K, 4.28K].
## 
## The above results are given in terms of absolute numbers. In relative terms, the response variable showed a decrease of -1%. The 95% interval of this percentage is [-19%, +18%].
## 
## This means that, although it may look as though the intervention has exerted a negative effect on the response variable when considering the intervention period as a whole, this effect is not statistically significant, and so cannot be meaningfully interpreted. The apparent effect could be the result of random fluctuations that are unrelated to the intervention. This is often the case when the intervention period is very long and includes much of the time when the effect has already worn off. It can also be the case when the intervention period is too short to distinguish the signal from the noise. Finally, failing to find a significant effect can happen when there are not enough control variables or when these variables do not correlate well with the response variable during the learning period.
## 
## The probability of obtaining this effect by chance is p = 0.445. This means the effect may be spurious and would generally not be considered statistically significant.

summary(impact_sr, "report")

## Analysis report {CausalImpact}
## 
## 
## During the post-intervention period, the response variable had an average value of approx. 45.59K. In the absence of an intervention, we would have expected an average response of 43.28K. The 95% interval of this counterfactual prediction is [35.00K, 51.18K]. Subtracting this prediction from the observed response yields an estimate of the causal effect the intervention had on the response variable. This effect is 2.31K with a 95% interval of [-5.59K, 10.59K]. For a discussion of the significance of this effect, see below.
## 
## Summing up the individual data points during the post-intervention period (which can only sometimes be meaningfully interpreted), the response variable had an overall value of 455.91K. Had the intervention not taken place, we would have expected a sum of 432.84K. The 95% interval of this prediction is [350.03K, 511.82K].
## 
## The above results are given in terms of absolute numbers. In relative terms, the response variable showed an increase of +5%. The 95% interval of this percentage is [-13%, +24%].
## 
## This means that, although the intervention appears to have caused a positive effect, this effect is not statistically significant when considering the entire post-intervention period as a whole. Individual days or shorter stretches within the intervention period may of course still have had a significant effect, as indicated whenever the lower limit of the impact time series (lower plot) was above zero. The apparent effect could be the result of random fluctuations that are unrelated to the intervention. This is often the case when the intervention period is very long and includes much of the time when the effect has already worn off. It can also be the case when the intervention period is too short to distinguish the signal from the noise. Finally, failing to find a significant effect can happen when there are not enough control variables or when these variables do not correlate well with the response variable during the learning period.
## 
## The probability of obtaining this effect by chance is p = 0.292. This means the effect may be spurious and would generally not be considered statistically significant.