Causal Impact using Bayesian Structural Time-Series Models

Suppose we like to investigate if an intervention in some process or experiment made an impact,causal impact which relies on bayesian structural models provide the tool to accomplish this task. We have a series which starts in january with record of the number of vehicles which are tested at a vehicle testing site. Suppose the recording of these vehicles in a database was not accurately recorded until September 25,2017. We would like to investigate whether the intervention in September made any significant impact on forecasting future numbers vehicles which will be tested.

We load the required packages at this step.

pacman::p_load(CausalImpact,bsts,rio,imputeTS)

Load the data at this step.

Data<- rio::import("/Users/nanaakwasiabayieboateng/Documents/memphisclassesbooks/Time series/Manpower/actvsfcdatajuly311.xlsx",sheet="Sheet4")

Data1<-Data%>%filter(Date<"2017-09-25")
Data2<-Data%>%filter(Date>"2017-09-25")

Impute the missing values.

Data_ts=Data1%>%tk_zoo()
plotNA.gapsize(Data_ts)

statsNA(Data_ts)

[1] "Length of time series:"
[1] 91
[1] "-------------------------"
[1] "Number of Missing Values:"
[1] 4
[1] "-------------------------"
[1] "Percentage of Missing Values:"
[1] "4.4%"
[1] "-------------------------"
[1] "Stats for Bins"
[1] "  Bin 1 (23 values from 1 to 23) :      1 NAs (4.35%)"
[1] "  Bin 2 (23 values from 24 to 46) :      0 NAs (0%)"
[1] "  Bin 3 (23 values from 47 to 69) :      3 NAs (13%)"
[1] "  Bin 4 (22 values from 70 to 91) :      0 NAs (0%)"
[1] "-------------------------"
[1] "Longest NA gap (series of consecutive NAs)"
[1] "2 in a row"
[1] "-------------------------"
[1] "Most frequent gap size (series of consecutive NA series)"
[1] "1 NA in a row (occuring 2 times)"
[1] "-------------------------"
[1] "Gap size accounting for most NAs"
[1] "2 NA in a row (occuring 1 times, making up for overall 2 NAs)"
[1] "-------------------------"
[1] "Overview NA series"
[1] "  1 NA in a row: 2 times"
[1] "  2 NA in a row: 1 times"

plotNA.distributionBar(Data_ts)

plotNA.distribution(Data_ts)

Data$Actual[which(is.na(Data$Actual))]

[1] NA NA NA NA

#
#Uses Kalman Smoothing on structural time series models (or on the state space representation of an arima model) for imputation
Data_ts=na.kalman(Data_ts)
#plotNA.imputations(Data_ts)

Data1_ts=Data1%>%tk_zoo()
Data2_ts=Data2%>%tk_zoo()
Data_ts<-rbind(Data1_ts,Data2_ts)
Data1_ts=na.kalman(Data1_ts)

# fitting weekly data. number of seasons is 52 
ss <- AddSemilocalLinearTrend(list(), Data1_ts)
ss <- AddSeasonal(ss, Data1_ts, nseasons = 52)
modeld <- bsts(Data1_ts, state.specification = ss, niter = 1000)

=-=-=-=-= Iteration 0 Tue Jan  9 15:26:17 2018 =-=-=-=-=
=-=-=-=-= Iteration 100 Tue Jan  9 15:26:18 2018 =-=-=-=-=
=-=-=-=-= Iteration 200 Tue Jan  9 15:26:18 2018 =-=-=-=-=
=-=-=-=-= Iteration 300 Tue Jan  9 15:26:19 2018 =-=-=-=-=
=-=-=-=-= Iteration 400 Tue Jan  9 15:26:19 2018 =-=-=-=-=
=-=-=-=-= Iteration 500 Tue Jan  9 15:26:20 2018 =-=-=-=-=
=-=-=-=-= Iteration 600 Tue Jan  9 15:26:20 2018 =-=-=-=-=
=-=-=-=-= Iteration 700 Tue Jan  9 15:26:21 2018 =-=-=-=-=
=-=-=-=-= Iteration 800 Tue Jan  9 15:26:21 2018 =-=-=-=-=
=-=-=-=-= Iteration 900 Tue Jan  9 15:26:22 2018 =-=-=-=-=

burn <- SuggestBurn(0.1, modeld)
predd <- predict(modeld, horizon = 15, burn = burn, quantiles = c(.025, .975))

pred1 <- predict(modeld, horizon = 15,quantiles = c(.025, .975))
plot(predd, plot.original = 156)

pre.period <- c(1, 91)
post.period <- c(92, 106)
impact <- CausalImpact(Data$Actual,pre.period,
                       post.period)

# Print and plot results
summary(impact)

Posterior inference {CausalImpact}

                         Average      Cumulative
Actual                   52           775       
Prediction (s.d.)        30 (1.2)     446 (18.4)
95% CI                   [27, 32]     [409, 482]
                                                
Absolute effect (s.d.)   22 (1.2)     329 (18.4)
95% CI                   [20, 24]     [293, 366]
                                                
Relative effect (s.d.)   74% (4.1%)   74% (4.1%)
95% CI                   [66%, 82%]   [66%, 82%]

Posterior tail-area probability p:   0.001
Posterior prob. of a causal effect:  99.8999%

For more details, type: summary(impact, "report")

summary(impact, "report")

Analysis report {CausalImpact}


During the post-intervention period, the response variable had an average value of approx. 51.67. By contrast, in the absence of an intervention, we would have expected an average response of 29.70. The 95% interval of this counterfactual prediction is [27.29, 32.13]. Subtracting this prediction from the observed response yields an estimate of the causal effect the intervention had on the response variable. This effect is 21.96 with a 95% interval of [19.53, 24.38]. For a discussion of the significance of this effect, see below.

Summing up the individual data points during the post-intervention period (which can only sometimes be meaningfully interpreted), the response variable had an overall value of 775.00. By contrast, had the intervention not taken place, we would have expected a sum of 445.57. The 95% interval of this prediction is [409.34, 482.00].

The above results are given in terms of absolute numbers. In relative terms, the response variable showed an increase of +74%. The 95% interval of this percentage is [+66%, +82%].

This means that the positive effect observed during the intervention period is statistically significant and unlikely to be due to random fluctuations. It should be noted, however, that the question of whether this increase also bears substantive significance can only be answered by comparing the absolute effect (21.96) to the original goal of the underlying intervention.

The probability of obtaining this effect by chance is very small (Bayesian one-sided tail-area probability p = 0.001). This means the causal effect can be considered statistically significant.

plot(impact)

plot(impact, "original")

#plot(impact$model$bsts.model, "coefficients")

# Example 3
  #
  # For full flexibility, specify a custom model and pass the
  # fitted model to CausalImpact(). To run this example, run
  # the code for Example 1 first.
Data=Data%>%drop_na()
pre.period <- c(1, 87)
post.period <- c(88, 102)
y=Data$Actual
x1=Data$Date
  post.period.response <- y[post.period[1] : post.period[2]]
  y[post.period[1] : post.period[2]] <- NA
  ss <- AddLocalLevel(list(), y)
  bsts.model <- bsts(y ~ x1, ss, niter = 1000)

=-=-=-=-= Iteration 0 Tue Jan  9 15:26:25 2018 =-=-=-=-=
=-=-=-=-= Iteration 100 Tue Jan  9 15:26:25 2018 =-=-=-=-=
=-=-=-=-= Iteration 200 Tue Jan  9 15:26:25 2018 =-=-=-=-=
=-=-=-=-= Iteration 300 Tue Jan  9 15:26:25 2018 =-=-=-=-=
=-=-=-=-= Iteration 400 Tue Jan  9 15:26:25 2018 =-=-=-=-=
=-=-=-=-= Iteration 500 Tue Jan  9 15:26:25 2018 =-=-=-=-=
=-=-=-=-= Iteration 600 Tue Jan  9 15:26:25 2018 =-=-=-=-=
=-=-=-=-= Iteration 700 Tue Jan  9 15:26:26 2018 =-=-=-=-=
=-=-=-=-= Iteration 800 Tue Jan  9 15:26:26 2018 =-=-=-=-=
=-=-=-=-= Iteration 900 Tue Jan  9 15:26:26 2018 =-=-=-=-=

  impact <- CausalImpact(bsts.model = bsts.model,
                         post.period.response = post.period.response)
plot(impact)

LS0tCnRpdGxlOiAiQ2F1c2FsIEltcGFjdCB1c2luZyBCYXllc2lhbiBTdHJ1Y3R1cmFsIFRpbWUtU2VyaWVzIE1vZGVscyIKb3V0cHV0OiBodG1sX25vdGVib29rCmF1dGhvcjogTmFuYSBCb2F0ZW5nCmRmX3ByaW50OiBwYWdlZApUaW1lOiAnYHIgU3lzLnRpbWUoKWAnCmRhdGU6ICJgciBmb3JtYXQoU3lzLnRpbWUoKSwgJyVCICVkLCAlWScpYCIKLS0tCgpTdXBwb3NlIHdlIGxpa2UgdG8gaW52ZXN0aWdhdGUgaWYgYW4gaW50ZXJ2ZW50aW9uIGluIHNvbWUgcHJvY2VzcyBvciBleHBlcmltZW50IG1hZGUgYW4gIGltcGFjdCxjYXVzYWwgaW1wYWN0IHdoaWNoIHJlbGllcyBvbiBiYXllc2lhbiBzdHJ1Y3R1cmFsIG1vZGVscyBwcm92aWRlIHRoZSB0b29sIHRvIGFjY29tcGxpc2ggdGhpcyB0YXNrLiBXZSBoYXZlIGEgc2VyaWVzIHdoaWNoIHN0YXJ0cyBpbiBqYW51YXJ5IHdpdGggcmVjb3JkIG9mIHRoZSBudW1iZXIgb2YgdmVoaWNsZXMgd2hpY2ggYXJlIHRlc3RlZCBhdCBhIHZlaGljbGUgdGVzdGluZyBzaXRlLiBTdXBwb3NlIHRoZSByZWNvcmRpbmcgb2YgdGhlc2UgdmVoaWNsZXMgaW4gYSBkYXRhYmFzZSB3YXMgbm90IGFjY3VyYXRlbHkgcmVjb3JkZWQgdW50aWwgU2VwdGVtYmVyIDI1LDIwMTcuIFdlIHdvdWxkIGxpa2UgdG8gaW52ZXN0aWdhdGUgd2hldGhlciB0aGUgaW50ZXJ2ZW50aW9uIGluIFNlcHRlbWJlciBtYWRlIGFueSBzaWduaWZpY2FudCBpbXBhY3Qgb24gZm9yZWNhc3RpbmcgZnV0dXJlIG51bWJlcnMgdmVoaWNsZXMgd2hpY2ggd2lsbCBiZSB0ZXN0ZWQuCgpgYGB7ciBzZXR1cCxpbmNsdWRlPUZBTFNFfQoKa25pdHI6Om9wdHNfY2h1bmskc2V0KGVjaG8gPSBUUlVFLAogICAgICAgICAgICAgICAgICAgICAgd2FybmluZyA9IEZBTFNFLAogICAgICAgICAgICAgICAgICAgICAgb3V0LndpZHRoID0iMTAwJSIsCiAgICAgICAgICAgICAgICAgICAgICBtZXNzYWdlID0gRkFMU0UsCiAgICAgICAgICAgICAgICAgICAgICBmaWcuYWxpZ24gPSAnZGVmYXVsdCcsIAogICAgICAgICAgICAgICAgICAgICAgd2FybmluZyA9IEZBTFNFLCAKICAgICAgICAgICAgICAgICAgICAgIGZpZy5jYXAgPSJGaWcuIDMwIiwgCiAgICAgICAgICAgICAgICAgICAgICBvdXQud2lkdGg9IjEwMCUiKQoKYGBgCgoKCldlIGxvYWQgdGhlIHJlcXVpcmVkIHBhY2thZ2VzIGF0IHRoaXMgc3RlcC4KYGBge3J9CnBhY21hbjo6cF9sb2FkKENhdXNhbEltcGFjdCxic3RzLHJpbyxpbXB1dGVUUykKYGBgCgpMb2FkIHRoZSBkYXRhIGF0IHRoaXMgc3RlcC4KYGBge3J9CkRhdGE8LSByaW86OmltcG9ydCgiL1VzZXJzL25hbmFha3dhc2lhYmF5aWVib2F0ZW5nL0RvY3VtZW50cy9tZW1waGlzY2xhc3Nlc2Jvb2tzL1RpbWUgc2VyaWVzL01hbnBvd2VyL2FjdHZzZmNkYXRhanVseTMxMS54bHN4IixzaGVldD0iU2hlZXQ0IikKCmBgYAoKCgpgYGB7cn0KCgpEYXRhMTwtRGF0YSU+JWZpbHRlcihEYXRlPCIyMDE3LTA5LTI1IikKCgpEYXRhMjwtRGF0YSU+JWZpbHRlcihEYXRlPiIyMDE3LTA5LTI1IikKCmBgYAoKCgpJbXB1dGUgdGhlIG1pc3NpbmcgdmFsdWVzLgpgYGB7cn0KRGF0YV90cz1EYXRhMSU+JXRrX3pvbygpCgoKcGxvdE5BLmdhcHNpemUoRGF0YV90cykKCgpzdGF0c05BKERhdGFfdHMpCgoKcGxvdE5BLmRpc3RyaWJ1dGlvbkJhcihEYXRhX3RzKQoKcGxvdE5BLmRpc3RyaWJ1dGlvbihEYXRhX3RzKQoKRGF0YSRBY3R1YWxbd2hpY2goaXMubmEoRGF0YSRBY3R1YWwpKV0KCiMKI1VzZXMgS2FsbWFuIFNtb290aGluZyBvbiBzdHJ1Y3R1cmFsIHRpbWUgc2VyaWVzIG1vZGVscyAob3Igb24gdGhlIHN0YXRlIHNwYWNlIHJlcHJlc2VudGF0aW9uIG9mIGFuIGFyaW1hIG1vZGVsKSBmb3IgaW1wdXRhdGlvbgoKRGF0YV90cz1uYS5rYWxtYW4oRGF0YV90cykKCgojcGxvdE5BLmltcHV0YXRpb25zKERhdGFfdHMpCmBgYAoKCgpgYGB7cn0KRGF0YTFfdHM9RGF0YTElPiV0a196b28oKQoKRGF0YTJfdHM9RGF0YTIlPiV0a196b28oKQoKRGF0YV90czwtcmJpbmQoRGF0YTFfdHMsRGF0YTJfdHMpCgpEYXRhMV90cz1uYS5rYWxtYW4oRGF0YTFfdHMpCgoKYGBgCgoKCgoKYGBge3J9CiMgZml0dGluZyB3ZWVrbHkgZGF0YS4gbnVtYmVyIG9mIHNlYXNvbnMgaXMgNTIgCgoKc3MgPC0gQWRkU2VtaWxvY2FsTGluZWFyVHJlbmQobGlzdCgpLCBEYXRhMV90cykKc3MgPC0gQWRkU2Vhc29uYWwoc3MsIERhdGExX3RzLCBuc2Vhc29ucyA9IDUyKQptb2RlbGQgPC0gYnN0cyhEYXRhMV90cywgc3RhdGUuc3BlY2lmaWNhdGlvbiA9IHNzLCBuaXRlciA9IDEwMDApCgpidXJuIDwtIFN1Z2dlc3RCdXJuKDAuMSwgbW9kZWxkKQpwcmVkZCA8LSBwcmVkaWN0KG1vZGVsZCwgaG9yaXpvbiA9IDE1LCBidXJuID0gYnVybiwgcXVhbnRpbGVzID0gYyguMDI1LCAuOTc1KSkKCgpgYGAKCgpgYGB7cn0KcHJlZDEgPC0gcHJlZGljdChtb2RlbGQsIGhvcml6b24gPSAxNSxxdWFudGlsZXMgPSBjKC4wMjUsIC45NzUpKQpwbG90KHByZWRkLCBwbG90Lm9yaWdpbmFsID0gMTU2KQpgYGAKCgoKCmBgYHtyfQpwcmUucGVyaW9kIDwtIGMoMSwgOTEpCnBvc3QucGVyaW9kIDwtIGMoOTIsIDEwNikKaW1wYWN0IDwtIENhdXNhbEltcGFjdChEYXRhJEFjdHVhbCxwcmUucGVyaW9kLAogICAgICAgICAgICAgICAgICAgICAgIHBvc3QucGVyaW9kKQpgYGAKCgoKYGBge3J9CgojIFByaW50IGFuZCBwbG90IHJlc3VsdHMKc3VtbWFyeShpbXBhY3QpCnN1bW1hcnkoaW1wYWN0LCAicmVwb3J0IikKcGxvdChpbXBhY3QpCnBsb3QoaW1wYWN0LCAib3JpZ2luYWwiKQojcGxvdChpbXBhY3QkbW9kZWwkYnN0cy5tb2RlbCwgImNvZWZmaWNpZW50cyIpCmBgYAoKCgoKYGBge3J9CiMgRXhhbXBsZSAzCiAgIwogICMgRm9yIGZ1bGwgZmxleGliaWxpdHksIHNwZWNpZnkgYSBjdXN0b20gbW9kZWwgYW5kIHBhc3MgdGhlCiAgIyBmaXR0ZWQgbW9kZWwgdG8gQ2F1c2FsSW1wYWN0KCkuIFRvIHJ1biB0aGlzIGV4YW1wbGUsIHJ1bgogICMgdGhlIGNvZGUgZm9yIEV4YW1wbGUgMSBmaXJzdC4KRGF0YT1EYXRhJT4lZHJvcF9uYSgpCnByZS5wZXJpb2QgPC0gYygxLCA4NykKcG9zdC5wZXJpb2QgPC0gYyg4OCwgMTAyKQoKeT1EYXRhJEFjdHVhbAp4MT1EYXRhJERhdGUKICBwb3N0LnBlcmlvZC5yZXNwb25zZSA8LSB5W3Bvc3QucGVyaW9kWzFdIDogcG9zdC5wZXJpb2RbMl1dCiAgeVtwb3N0LnBlcmlvZFsxXSA6IHBvc3QucGVyaW9kWzJdXSA8LSBOQQogIHNzIDwtIEFkZExvY2FsTGV2ZWwobGlzdCgpLCB5KQogIGJzdHMubW9kZWwgPC0gYnN0cyh5IH4geDEsIHNzLCBuaXRlciA9IDEwMDApCiAgaW1wYWN0IDwtIENhdXNhbEltcGFjdChic3RzLm1vZGVsID0gYnN0cy5tb2RlbCwKICAgICAgICAgICAgICAgICAgICAgICAgIHBvc3QucGVyaW9kLnJlc3BvbnNlID0gcG9zdC5wZXJpb2QucmVzcG9uc2UpCnBsb3QoaW1wYWN0KQpgYGAK

Causal Impact using Bayesian Structural Time-Series Models

Nana Boateng

January 09, 2018