This R package implements an approach to estimating the causal effect of a designed intervention on a time series. For example, how many additional daily clicks were generated by an advertising campaign? Answering a question like this can be difficult when a randomized experiment is not available.
Given a response time series (e.g., clicks) and a set of control time series (e.g., clicks in non-affected markets or clicks on other sites), the package constructs a Bayesian structural time-series model. This model is then used to predict the counterfactual, i.e., how the response metric would have evolved after the intervention if the intervention had never occurred. For a quick overview, watch the tutorial video. For details, see: Brodersen et al., Annals of Applied Statistics (2015).
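To make the idea concrete, here is a minimal sketch on simulated data, not the data analysed in this report. The series `x1` and `y`, the injected lift of 10 units, and the 70/30 split are all assumptions chosen for illustration; only `CausalImpact()` and `summary()` come from the package.

```r
library(CausalImpact)

set.seed(1)
# Simulated control series and a response that tracks it closely.
x1 <- 100 + arima.sim(model = list(ar = 0.999), n = 100)
y <- 1.2 * x1 + rnorm(100)
# Hypothetical intervention: lift the response by 10 units from point 71 on.
y[71:100] <- y[71:100] + 10
sim_data <- cbind(y, x1)  # the response must be the first column

sim_pre_period <- c(1, 70)
sim_post_period <- c(71, 100)

impact <- CausalImpact(sim_data, sim_pre_period, sim_post_period)
summary(impact)  # average and cumulative effect with credible intervals
```

The model is fitted on the pre-period only, so the estimated effect is the gap between the observed post-period response and the forecast built from the control series.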
We apply LIWC only to the following categories:
## [1] "i" "we" "you" "shehe" "they"
## [6] "ipron" "negate" "compare" "posemo" "negemo"
## [11] "anx" "anger" "sad" "social" "family"
## [16] "friend" "female" "male" "insight" "cause"
## [21] "discrep" "tentat" "certain" "differ" "see"
## [26] "hear" "feel" "body" "health" "sexual"
## [31] "ingest" "affiliation" "achiev" "power" "reward"
## [36] "risk" "focuspast" "focuspresent" "focusfuture" "relativ"
## [41] "work" "leisure" "home" "money" "relig"
## [46] "death" "informal" "swear" "assent" "nonflu"
## [51] "filler"
Example: our example data cover two days (-1 and 1) and two users (A and B). On day -1, user A tweeted 3 times and user B once; on day 1, both users tweeted twice. On day -1, A used happy words twice in the first tweet and once in the third tweet, and B used one impersonal pronoun (ipron). On day 1, A and B each used one happy word, and A used one impersonal pronoun.
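library(tidyverse)   # tibble, dplyr, tidyr, purrr, ggplot2, stringr
library(knitr)       # kable()
library(kableExtra)  # kable_styling(), cell_spec(), spec_color()
# Toy data: one row per tweet.
days_tweet <- c(rep(-1, 4), rep(1, 4))
id <- c(rep("A", 3), "B", rep("A", 2), rep("B", 2))
# Random tweet identifiers (no seed is set, so they change on every run).
id_tweet <- sample(30000:40000, 8, replace = FALSE)
happy <- c(2, 0, 1, 0, 1, 0, 1, 0)
ipron <- c(0, 0, 0, 1, 1, 0, 0, 0)
d <- tibble(days_tweet, id, id_tweet, happy, ipron)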
d %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE)
| days_tweet | id | id_tweet | happy | ipron |
|---|---|---|---|---|
| -1 | A | 31949 | 2 | 0 |
| -1 | A | 36706 | 0 | 0 |
| -1 | A | 38442 | 1 | 0 |
| -1 | B | 34539 | 0 | 1 |
| 1 | A | 32656 | 1 | 1 |
| 1 | A | 30282 | 0 | 0 |
| 1 | B | 37801 | 1 | 0 |
| 1 | B | 36020 | 0 | 0 |
First approach: we computed the mean value per day over all tweets.
d %>%
group_by(days_tweet) %>%
summarise_at(vars(happy:ipron), mean) %>%
ungroup() %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE)
| days_tweet | happy | ipron |
|---|---|---|
| -1 | 0.75 | 0.25 |
| 1 | 0.50 | 0.25 |
Second approach, steps:
We first computed the mean value per day and user.
Then we computed the mean of those user-level values per day.
d %>%
group_by(days_tweet, id) %>%
summarise_at(vars(happy:ipron), mean) %>%
ungroup() %>%
group_by(days_tweet) %>%
summarise_at(vars(happy:ipron), mean) %>%
ungroup() %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE)
| days_tweet | happy | ipron |
|---|---|---|
| -1 | 0.5 | 0.50 |
| 1 | 0.5 | 0.25 |
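The two approaches disagree on day -1 because the first one weights every tweet equally, so a user who tweets more counts for more, while the second one weights every user equally. Working through the `happy` column for day -1 with the example data (A's three tweets score 2, 0, 1; B's single tweet scores 0):

$$
\text{first approach: } \frac{2 + 0 + 1 + 0}{4} = 0.75
$$

$$
\text{second approach: } \frac{1}{2}\left(\frac{2 + 0 + 1}{3} + \frac{0}{1}\right) = \frac{1 + 0}{2} = 0.5
$$

On day 1 both users tweeted the same number of times, so the two approaches give the same result.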
These are the values for NEDA and baseline under the first approach:
baseline_liwc %>%
select(-text, -created_at, -id_tweet, -id) %>%
group_by(days_tweet) %>%
summarise_all(mean) %>%
ungroup() %>%
pivot_longer(cols = fun:filler, names_to = "categ", values_to = "values_baseline") -> ci_baseline
neda_liwc %>%
select(-id_tweet, -id) %>%
group_by(days_tweet) %>%
summarise_all(mean) %>%
ungroup() %>%
pivot_longer(-days_tweet, names_to = "categ", values_to = "values_neda") -> ci_neda_liwc
pre_period <- c(1, 16)
post_period <- c(17, 31)
ci_baseline %>%
inner_join(ci_neda_liwc, by = c("days_tweet", "categ")) -> d_first
d_first %>%
pivot_longer(cols = values_baseline:values_neda,
names_to = "names", values_to = "values") %>%
mutate(names = case_when(names == "values_baseline" ~ "Baseline",
names == "values_neda" ~ "NEDA")) %>%
filter(categ %in% sel_cat[1:24]) %>%
ggplot(aes(x = days_tweet, y = values, color = names)) +
geom_line() +
facet_wrap(categ ~., scales = "free", ncol = 6) +
labs(color = "", title = "First Approach") +
theme(legend.position="top")
These are the values for NEDA and baseline under the second approach:
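The construction of `d_second`, which is used further down, is not shown in this report. A minimal sketch of how such a long-format table could be built, assuming the input has a `days_tweet` column, a user `id` column, and one numeric column per LIWC category, is given here. The helper `mean_per_user_then_day()` and the toy input are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: average each category per day and user,
# then average those user-level means per day, and reshape to long format.
mean_per_user_then_day <- function(liwc, value_name) {
  liwc %>%
    group_by(days_tweet, id) %>%
    summarise(across(where(is.numeric), mean), .groups = "drop") %>%
    group_by(days_tweet) %>%
    summarise(across(where(is.numeric), mean), .groups = "drop") %>%
    pivot_longer(-days_tweet, names_to = "categ", values_to = value_name)
}

# Toy input: the example data from above, without the tweet identifier.
toy <- tibble(
  days_tweet = c(rep(-1, 4), rep(1, 4)),
  id = c(rep("A", 3), "B", rep("A", 2), rep("B", 2)),
  happy = c(2, 0, 1, 0, 1, 0, 1, 0),
  ipron = c(0, 0, 0, 1, 1, 0, 0, 0)
)

toy_long <- mean_per_user_then_day(toy, "values_neda")
toy_long
```

Applying the same helper to the baseline and NEDA tables and joining the results on `days_tweet` and `categ` would give a table with the same shape as `d_first`. Numeric identifier columns such as `id_tweet` would need to be dropped first, otherwise they would be averaged too.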
In the table below we compare the results of running the CausalImpact package with these two approaches. Only categories with a p-value < 0.05 are included. The columns contain:
Word Category: word category in LIWC.
1st App Relative Eff.: cumulative relative effect, in percent, with the first approach.
1st P Value: p-value with the first approach.
2nd App Relative Eff.: cumulative relative effect, in percent, with the second approach.
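The p-value and the cumulative relative effect listed above are read from the `summary` data frame of a fitted model. The sketch below shows that extraction for a single fit. The two simulated series and the shift of 0.5 are assumptions for illustration; the column names mirror the ones used in this report.

```r
library(CausalImpact)

set.seed(42)
# Hypothetical daily series over 31 days, matching the pre/post split used here.
values_baseline <- 5 + cumsum(rnorm(31, sd = 0.1))
values_neda <- values_baseline + rnorm(31, sd = 0.05)
values_neda[17:31] <- values_neda[17:31] + 0.5  # simulated post-period shift

# The response (NEDA) must be the first column; the baseline acts as control.
toy_series <- data.frame(values_neda, values_baseline)
fit <- CausalImpact(toy_series, pre.period = c(1, 16), post.period = c(17, 31))

# fit$summary has two rows, "Average" and "Cumulative".
p_value <- fit$summary$p[1]
rel_effect <- fit$summary["Cumulative", "RelEffect"]  # a proportion, not a percentage
```

`RelEffect` is a proportion, which is why it is multiplied by 100 before being displayed. If a fit fails, `summary` is `NULL`, so such fits need to be filtered out before extraction.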
d_first %>%
select(categ, values_neda, values_baseline) %>%
nest(data = - categ) %>%
mutate(mod = map(data, ~CausalImpact::CausalImpact(., pre_period, post_period))) -> ci
ci %>%
mutate(summary_mod = map(mod, "summary")) %>%
filter(!map_lgl(summary_mod, is.null)) -> ci_resul
ci_resul %>%
mutate(p = map(summary_mod, "p")) %>%
mutate(p = map_dbl(p, 1)) %>%
filter(p < 0.05) %>%
mutate(relative_effect = map(summary_mod, "RelEffect")) %>%
mutate(relative_effect = map_dbl(relative_effect, 2))-> sig_cat
sig_cat %>%
filter(categ %in% sel_cat) -> sig_cat
sig_cat %>%
arrange(desc(relative_effect)) %>%
select(categ, first_relative_effect = relative_effect, p_first = p) -> first_ap
d_second %>%
select(categ, values_neda, values_baseline) %>%
nest(data = - categ) %>%
mutate(mod = map(data, ~CausalImpact::CausalImpact(., pre_period, post_period))) -> ci
ci %>%
mutate(summary_mod = map(mod, "summary")) %>%
filter(!map_lgl(summary_mod, is.null)) -> ci_resul
ci_resul %>%
mutate(p = map(summary_mod, "p")) %>%
mutate(p = map_dbl(p, 1)) %>%
filter(p < 0.05) %>%
mutate(relative_effect = map(summary_mod, "RelEffect")) %>%
mutate(relative_effect = map_dbl(relative_effect, 2))-> sig_cat
sig_cat %>%
filter(categ %in% sel_cat) -> sig_cat
sig_cat %>%
arrange(desc(relative_effect)) %>%
select(categ, second_relative_effect = relative_effect, p_second = p) -> second_ap
second_ap %>%
mutate(categ = str_to_title(categ)) %>%
mutate_at(vars(second_relative_effect), ~.*100) %>%
mutate_if(is.numeric, ~round(., digits = 3)) %>%
mutate_at(vars(second_relative_effect), function(x){
cell_spec(x, "html", color = spec_color(x), bold = T)
}) %>%
kable("html", escape = F,
align = "lrr",
col.names = c("Word Category", "Relative Eff.(%)",
"P Value")) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE)
| Word Category | Relative Eff.(%) | P Value |
|---|---|---|
| Female | 17.374 | 0.001 |
| Family | 7.842 | 0.002 |
| Anx | 7.495 | 0.001 |
| Relig | 5.527 | 0.029 |
| Money | 5.339 | 0.003 |
| They | 4.088 | 0.010 |
| Achiev | 3.877 | 0.005 |
| Negate | 3.315 | 0.001 |
| Health | 2.737 | 0.005 |
| Power | 2.26 | 0.016 |
| Negemo | 2.135 | 0.041 |
| Informal | 1.17 | 0.032 |
| Ipron | -1.493 | 0.013 |
| You | -2.027 | 0.041 |
| Discrep | -2.234 | 0.030 |
| Differ | -2.6 | 0.010 |
| Tentat | -3.404 | 0.002 |
| Posemo | -3.493 | 0.001 |
| Shehe | -6.757 | 0.027 |
| Affiliation | -7.352 | 0.002 |
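The table above reports the second approach only. A side-by-side comparison with the columns described earlier could be assembled by joining `first_ap` and `second_ap` on the category name. The sketch below uses two small stand-in tables whose values are made up for illustration and are not results from this analysis.

```r
library(dplyr)

# Stand-in tables with the same column names as first_ap and second_ap.
# All numbers are hypothetical.
first_ap_toy <- tibble(
  categ = c("female", "anx"),
  first_relative_effect = c(0.10, 0.05),
  p_first = c(0.010, 0.020)
)
second_ap_toy <- tibble(
  categ = c("female", "family"),
  second_relative_effect = c(0.17, 0.08),
  p_second = c(0.001, 0.002)
)

# Keep every category that is significant under at least one approach;
# NA marks an approach in which the category did not pass p < 0.05.
comparison <- full_join(first_ap_toy, second_ap_toy, by = "categ") %>%
  mutate(across(ends_with("relative_effect"), ~ .x * 100)) %>%
  arrange(categ)
comparison
```

A `full_join()` is used instead of an `inner_join()` so that categories significant under only one approach stay visible.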