R Markdown


## Registered S3 method overwritten by 'xts':
##   method     from
##   as.zoo.xts zoo
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## Registered S3 methods overwritten by 'forecast':
##   method             from    
##   fitted.fracdiff    fracdiff
##   residuals.fracdiff fracdiff
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: bsts
## Loading required package: BoomSpikeSlab
## Loading required package: Boom
## Loading required package: MASS
## 
## Attaching package: 'Boom'
## The following object is masked from 'package:stats':
## 
##     rWishart
## Loading required package: xts
## 
## Attaching package: 'bsts'
## The following object is masked from 'package:BoomSpikeSlab':
## 
##     SuggestBurn
## 
## Attaching package: 'tidyr'
## The following object is masked from 'package:reshape2':
## 
##     smiths

Import Scan Data Sets

##   Order.Date                DMA Scans
## 1     7/1/19 ABILENE-SWEETWATER     2
## 2     7/2/19 ABILENE-SWEETWATER     3
## 3     7/5/19 ABILENE-SWEETWATER     1
## 4     7/6/19 ABILENE-SWEETWATER     2
## 5     7/9/19 ABILENE-SWEETWATER    10
## 6    7/10/19 ABILENE-SWEETWATER     4

To find control markets, you can use the MarketMatching package.

The data must be in long format, with all markets stacked in one column. Dates must be in %Y-%m-%d format.
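A minimal sketch of that preparation, assuming the raw file looks like the Scans preview above (dates stored as m/d/yy strings). The `raw` data frame, its random counts, and the matching window are made up for illustration. The `best_matches()` call is guarded so the sketch also runs where MarketMatching is not installed, and its argument values are assumptions, not the settings used for this report.

```r
# Toy long-format data: one row per market per day (counts are invented).
set.seed(42)
days <- seq(as.Date("2019-07-01"), by = "day", length.out = 47)
markets <- c("CHICAGO", "ATLANTA", "PHILADELPHIA", "BALTIMORE")
raw <- expand.grid(
  Order.Date = format(days, "%m/%d/%y"),
  DMA = markets,
  stringsAsFactors = FALSE
)
raw$Scans <- rpois(nrow(raw), lambda = 80)

# Convert the m/d/yy strings to Date, which prints in %Y-%m-%d form.
raw$Order.Date <- as.Date(raw$Order.Date, format = "%m/%d/%y")

if (requireNamespace("MarketMatching", quietly = TRUE)) {
  mm <- MarketMatching::best_matches(
    data = raw,
    id_variable = "DMA",            # column holding the stacked markets
    date_variable = "Order.Date",
    matching_variable = "Scans",
    parallel = FALSE,
    warping_limit = 1,
    dtw_emphasis = 1,
    matches = 3,                    # controls to return per market
    start_match_period = "2019-07-01",
    end_match_period = "2019-08-16"
  )
  print(head(mm$BestMatches))
}
```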

## 22 markets were not matched with CHICAGO due to insufficient data or no variance.
##       DMA                BestControl RelativeDistance Correlation Length
## 1 CHICAGO                    ATLANTA        0.2709584   0.5705663     47
## 2 CHICAGO           DALLAS-FT. WORTH        0.2880135   0.5494348     47
## 3 CHICAGO               PHILADELPHIA        0.3033870   0.4353674     47
## 4 CHICAGO       MIAMI-FT. LAUDERDALE        0.3355753   0.3908819     47
## 5 CHICAGO SAN FRANCISCO-OAK-SAN JOSE        0.3677636   0.4682648     47
##   MatchingStartDate MatchingEndDate rank
## 1        2019-07-01      2019-08-16    1
## 2        2019-07-01      2019-08-16    2
## 3        2019-07-01      2019-08-16    3
## 4        2019-07-01      2019-08-16    4
## 5        2019-07-01      2019-08-16    5

The table above lists the best control markets for CHICAGO, ranked by relative distance (lower is a closer match). I will use the top two or three, but will experiment with the selection to see which combination gives the best read.
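Picking the top-ranked controls can be done programmatically rather than by eye. The sketch below retypes the five rows from the Scans table; `n_controls` is a hypothetical knob, not something the original analysis names.

```r
# BestMatches rows copied from the Scans output above.
best <- data.frame(
  DMA = "CHICAGO",
  BestControl = c("ATLANTA", "DALLAS-FT. WORTH", "PHILADELPHIA",
                  "MIAMI-FT. LAUDERDALE", "SAN FRANCISCO-OAK-SAN JOSE"),
  RelativeDistance = c(0.2709584, 0.2880135, 0.3033870, 0.3355753, 0.3677636),
  Correlation = c(0.5705663, 0.5494348, 0.4353674, 0.3908819, 0.4682648),
  rank = 1:5,
  stringsAsFactors = FALSE
)

n_controls <- 3  # hypothetical choice; try 2 or 3 and compare the fits
controls <- best$BestControl[order(best$rank)][seq_len(n_controls)]
print(controls)
```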

Causal Impact Study

Note that the CausalImpact package requires the market you want a read on (the response) to be in the first (leftmost) column of your data frame, with the control markets in the columns after it. I typically do this somewhat manually: I create a data frame that is a subset of my original wide data set, with the test market on the left and all the control markets identified by market matching to its right.

Subset the wide data to just the market you want to test and its control markets.
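One way to get from long to wide data with the test market in the first column, using base R only. The counts are the first three rows of the preview printed in this section; the object names (`long`, `wide`, `model_data`) are illustrative, and the original analysis may have used reshape2 or tidyr instead.

```r
# Long-format data for one test market and two controls.
long <- data.frame(
  Order.Date = rep(as.Date("2019-06-01") + 0:2, times = 3),
  DMA = rep(c("ATLANTA", "CHICAGO", "DALLAS-FT. WORTH"), each = 3),
  Scans = c(56, 49, 83, 58, 57, 99, 113, 75, 84)
)

# Long to wide: one row per date, one column per market.
wide <- as.data.frame(tapply(long$Scans, list(long$Order.Date, long$DMA), sum))

# Make the column names syntactic, e.g. "DALLAS-FT. WORTH" -> "DALLAS.FT..WORTH".
names(wide) <- make.names(names(wide))

# Put the test market first, followed by the chosen controls.
test_market <- "CHICAGO"
controls <- c("ATLANTA", "DALLAS.FT..WORTH")
model_data <- wide[, c(test_market, controls)]
print(model_data)
```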

##   CHICAGO ATLANTA DALLAS.FT..WORTH PHILADELPHIA
## 1      58      56              113           41
## 2      57      49               75           41
## 3      99      83               84           58
## 4     101      83              152           78
## 5      86      73               62           67
## 6     101      85              126           81

Append the dates and define your pre-period and post-period.

##            CHICAGO ATLANTA DALLAS.FT..WORTH PHILADELPHIA
## 2019-06-01      58      56              113           41
## 2019-06-02      57      49               75           41
## 2019-06-03      99      83               84           58
## 2019-06-04     101      83              152           78
## 2019-06-05      86      73               62           67
## 2019-06-06     101      85              126           81

The pre.period argument is the time window, from start to finish, before the intervention happened.

The post.period argument is the time window after the intervention, in which you check whether there was an uplift relative to the expected (counterfactual) value.
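The two windows can be derived from a single intervention date so that they never overlap. The dates below are hypothetical; the report does not state when the intervention started.

```r
# Daily index covering the whole study window (dates are hypothetical).
dates <- seq(as.Date("2019-06-01"), as.Date("2019-08-11"), by = "day")

# Hypothetical intervention date: the first day the campaign was live.
intervention <- as.Date("2019-07-01")

# pre.period: first observation up to the day before the intervention.
pre.period <- c(min(dates), intervention - 1)
# post.period: intervention day through the last observation.
post.period <- c(intervention, max(dates))

# The windows must not overlap, and both must lie inside the data.
stopifnot(pre.period[2] < post.period[1],
          pre.period[1] >= min(dates),
          post.period[2] <= max(dates))

print(pre.period)
print(post.period)
```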

Run the causal impact analysis.
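For reference, a minimal sketch of the call, run on simulated data rather than the Scans data. The series, the size of the lift, and the dates are all invented; the model call is guarded so the sketch still runs where CausalImpact is not installed.

```r
# Simulated data: a test market driven by one control market,
# with a lift of 10 added from day 71 onward.
set.seed(1)
n <- 100
control <- 100 + as.numeric(arima.sim(model = list(ar = 0.9), n = n))
test_market <- 1.2 * control + rnorm(n)
test_market[71:n] <- test_market[71:n] + 10

dates <- seq(as.Date("2019-06-01"), by = "day", length.out = n)
pre.period <- c(dates[1], dates[70])
post.period <- c(dates[71], dates[n])

if (requireNamespace("CausalImpact", quietly = TRUE) &&
    requireNamespace("zoo", quietly = TRUE)) {
  # The response (test market) goes in the first column; controls follow.
  model_data <- zoo::zoo(cbind(CHICAGO = test_market, CONTROL = control), dates)
  impact <- CausalImpact::CausalImpact(model_data, pre.period, post.period)
  summary(impact)            # table of average and cumulative effects
  summary(impact, "report")  # plain-language report
  print(plot(impact))        # observed vs. counterfactual, pointwise, cumulative
}
```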

## Warning: Removed 136 rows containing missing values (geom_path).
## Warning: Removed 22 rows containing missing values (geom_path).
## Warning: Removed 272 rows containing missing values (geom_path).

## Posterior inference {CausalImpact}
## 
##                          Average            Cumulative      
## Actual                   81                 3391            
## Prediction (s.d.)        88 (4.2)           3693 (174.9)    
## 95% CI                   [81, 96]           [3391, 4032]    
##                                                             
## Absolute effect (s.d.)   -7.2 (4.2)         -301.8 (174.9)  
## 95% CI                   [-15, -0.0045]     [-641, -0.1881] 
##                                                             
## Relative effect (s.d.)   -8.2% (4.7%)       -8.2% (4.7%)    
## 95% CI                   [-17%, -0.0051%]   [-17%, -0.0051%]
## 
## Posterior tail-area probability p:   0.026
## Posterior prob. of a causal effect:  97.4%
## 
## For more details, type: summary(impact, "report")
## Analysis report {CausalImpact}
## 
## 
## During the post-intervention period, the response variable had an average value of approx. 80.74. By contrast, in the absence of an intervention, we would have expected an average response of 87.92. The 95% interval of this counterfactual prediction is [80.74, 96.00]. Subtracting this prediction from the observed response yields an estimate of the causal effect the intervention had on the response variable. This effect is -7.19 with a 95% interval of [-15.26, -0.0045]. For a discussion of the significance of this effect, see below.
## 
## Summing up the individual data points during the post-intervention period (which can only sometimes be meaningfully interpreted), the response variable had an overall value of 3.39K. By contrast, had the intervention not taken place, we would have expected a sum of 3.69K. The 95% interval of this prediction is [3.39K, 4.03K].
## 
## The above results are given in terms of absolute numbers. In relative terms, the response variable showed a decrease of-8%. The 95% interval of this percentage is [-17%, -0%].
## 
## This means that the negative effect observed during the intervention period is statistically significant. If the experimenter had expected a positive effect, it is recommended to double-check whether anomalies in the control variables may have caused an overly optimistic expectation of what should have happened in the response variable in the absence of the intervention.
## 
## The probability of obtaining this effect by chance is very small (Bayesian one-sided tail-area probability p = 0.026). This means the causal effect can be considered statistically significant.
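The headline numbers in the summary follow from simple arithmetic on the actual and predicted averages. The sketch below recomputes them from the rounded figures in the report, so the absolute effect comes out slightly different from the reported -7.19.

```r
# Figures copied from the Scans report above.
actual_avg <- 80.74
predicted_avg <- 87.92
p_tail <- 0.026

abs_effect <- actual_avg - predicted_avg  # absolute effect per day
rel_effect <- abs_effect / predicted_avg  # effect relative to the prediction
prob_effect <- 1 - p_tail                 # posterior prob. of a causal effect

cat(sprintf("Absolute effect: %.2f\n", abs_effect))
cat(sprintf("Relative effect: %.1f%%\n", 100 * rel_effect))
cat(sprintf("Posterior prob. of a causal effect: %.1f%%\n", 100 * prob_effect))
```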

Import Kits Data Sets

##   Order.Date                DMA Kits
## 1     6/1/19 ABILENE-SWEETWATER    1
## 2     6/4/19 ABILENE-SWEETWATER    0
## 3     6/5/19 ABILENE-SWEETWATER    0
## 4     6/7/19 ABILENE-SWEETWATER    2
## 5    6/10/19 ABILENE-SWEETWATER    0
## 6    6/12/19 ABILENE-SWEETWATER    3

To find control markets, you can use the MarketMatching package.

The data must be in long format, with all markets stacked in one column. Dates must be in %Y-%m-%d format.

## 6 markets were not matched with CHICAGO due to insufficient data or no variance.
##       DMA                BestControl RelativeDistance Correlation Length
## 1 CHICAGO                  BALTIMORE        0.5571429   0.2915523     42
## 2 CHICAGO        BOSTON (MANCHESTER)        0.5785714   0.3222589     42
## 3 CHICAGO               PHILADELPHIA        0.5857143   0.3616040     42
## 4 CHICAGO                    ATLANTA        0.6071429   0.1854190     42
## 5 CHICAGO SAN FRANCISCO-OAK-SAN JOSE        0.6214286   0.3609103     42
##   MatchingStartDate MatchingEndDate rank
## 1        2019-07-01      2019-08-11    1
## 2        2019-07-01      2019-08-11    2
## 3        2019-07-01      2019-08-11    3
## 4        2019-07-01      2019-08-11    4
## 5        2019-07-01      2019-08-11    5

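The table above lists the best control markets for CHICAGO. I will use the top two or three, but will experiment with the selection to see which combination gives the best read. These matches are much weaker than the Scans matches: relative distances run from 0.56 to 0.62 (versus 0.27 to 0.37 for Scans), and correlations run from 0.19 to 0.36 (versus 0.39 to 0.57).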
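The gap in match quality can be summarised by averaging the two columns for each data set. The values are retyped from the two BestMatches tables in this report; averaging over the top five matches is just one convenient summary, not a rule from the MarketMatching package.

```r
# RelativeDistance and Correlation for the top five matches in each table.
scans <- data.frame(
  RelativeDistance = c(0.2709584, 0.2880135, 0.3033870, 0.3355753, 0.3677636),
  Correlation = c(0.5705663, 0.5494348, 0.4353674, 0.3908819, 0.4682648)
)
kits <- data.frame(
  RelativeDistance = c(0.5571429, 0.5785714, 0.5857143, 0.6071429, 0.6214286),
  Correlation = c(0.2915523, 0.3222589, 0.3616040, 0.1854190, 0.3609103)
)

# Lower distance and higher correlation both indicate a better control.
comparison <- rbind(Scans = colMeans(scans), Kits = colMeans(kits))
print(round(comparison, 3))
```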

Causal Impact Study

Note that the CausalImpact package requires the market you want a read on (the response) to be in the first (leftmost) column of your data frame, with the control markets in the columns after it. I typically do this somewhat manually: I create a data frame that is a subset of my original wide data set, with the test market on the left and all the control markets identified by market matching to its right.

Subset the wide data to just the market you want to test and its control markets.

##   CHICAGO BALTIMORE BOSTON..MANCHESTER. PHILADELPHIA ATLANTA
## 1       3         3                   7            3       1
## 2       1         2                   5            2       2
## 3       4         2                   3            7       1
## 4      10         1                   7            2       5
## 5       2         0                   1            0       7
## 6       6         2                   5            8       3
##   SAN.FRANCISCO.OAK.SAN.JOSE
## 1                          1
## 2                          3
## 3                          4
## 4                          2
## 5                          0
## 6                          7

Append the dates and define your pre-period and post-period.

##            CHICAGO BALTIMORE BOSTON..MANCHESTER. PHILADELPHIA ATLANTA
## 2019-06-01       3         3                   7            3       1
## 2019-06-02       1         2                   5            2       2
## 2019-06-03       4         2                   3            7       1
## 2019-06-04      10         1                   7            2       5
## 2019-06-05       2         0                   1            0       7
## 2019-06-06       6         2                   5            8       3
##            SAN.FRANCISCO.OAK.SAN.JOSE
## 2019-06-01                          1
## 2019-06-02                          3
## 2019-06-03                          4
## 2019-06-04                          2
## 2019-06-05                          0
## 2019-06-06                          7

The pre.period argument is the time window, from start to finish, before the intervention happened.

The post.period argument is the time window after the intervention, in which you check whether there was an uplift relative to the expected (counterfactual) value.

Run the causal impact analysis.

## Warning: Removed 136 rows containing missing values (geom_path).
## Warning: Removed 52 rows containing missing values (geom_path).
## Warning: Removed 272 rows containing missing values (geom_path).

## Posterior inference {CausalImpact}
## 
##                          Average        Cumulative    
## Actual                   3.1            131.0         
## Prediction (s.d.)        3.5 (0.35)     146.2 (14.69) 
## 95% CI                   [2.8, 4.1]     [116.7, 173.4]
##                                                       
## Absolute effect (s.d.)   -0.36 (0.35)   -15.20 (14.69)
## 95% CI                   [-1, 0.34]     [-42, 14.29]  
##                                                       
## Relative effect (s.d.)   -10% (10%)     -10% (10%)    
## 95% CI                   [-29%, 9.8%]   [-29%, 9.8%]  
## 
## Posterior tail-area probability p:   0.15015
## Posterior prob. of a causal effect:  85%
## 
## For more details, type: summary(impact, "report")
## Analysis report {CausalImpact}
## 
## 
## During the post-intervention period, the response variable had an average value of approx. 3.12. In the absence of an intervention, we would have expected an average response of 3.48. The 95% interval of this counterfactual prediction is [2.78, 4.13]. Subtracting this prediction from the observed response yields an estimate of the causal effect the intervention had on the response variable. This effect is -0.36 with a 95% interval of [-1.01, 0.34]. For a discussion of the significance of this effect, see below.
## 
## Summing up the individual data points during the post-intervention period (which can only sometimes be meaningfully interpreted), the response variable had an overall value of 131.00. Had the intervention not taken place, we would have expected a sum of 146.20. The 95% interval of this prediction is [116.71, 173.41].
## 
## The above results are given in terms of absolute numbers. In relative terms, the response variable showed a decrease of-10%. The 95% interval of this percentage is [-29%, +10%].
## 
## This means that, although it may look as though the intervention has exerted a negative effect on the response variable when considering the intervention period as a whole, this effect is not statistically significant, and so cannot be meaningfully interpreted. The apparent effect could be the result of random fluctuations that are unrelated to the intervention. This is often the case when the intervention period is very long and includes much of the time when the effect has already worn off. It can also be the case when the intervention period is too short to distinguish the signal from the noise. Finally, failing to find a significant effect can happen when there are not enough control variables or when these variables do not correlate well with the response variable during the learning period.
## 
## The probability of obtaining this effect by chance is p = 0.15. This means the effect may be spurious and would generally not be considered statistically significant.
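The two reports reach different conclusions because only one of the 95% intervals excludes zero. The sketch below lines up the absolute-effect intervals and tail-area probabilities reported for Scans and Kits; the 0.05 threshold is an assumption matching the 95% intervals shown.

```r
# Absolute-effect intervals and p-values copied from the two reports.
results <- data.frame(
  metric = c("Scans", "Kits"),
  lower = c(-15.26, -1.01),
  upper = c(-0.0045, 0.34),
  p = c(0.026, 0.15015)
)

# An effect is flagged when the 95% interval does not contain zero.
results$excludes_zero <- results$lower > 0 | results$upper < 0
results$p_below_0.05 <- results$p < 0.05
print(results)
```

The Scans interval only just excludes zero (its upper bound is -0.0045), so that result is borderline even though it is flagged as significant.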