Goal

The goal of this memo is to provide an analysis for both Phase 1 and Phase 2 of the FB-Misinformation experiment. This memo provides a documentation of the data cleaning and analysis process.

Experimental Design

The experiment was conducted in two separate phases (Phase 1 and Phase 2). Phase 1 respondents are assigned to one of four groups: (1) Bad News game, (2) Inoculation Science video, (3) Emotions text message course, and (4) Baseline 1. Phase 2 respondents are assigned to one of three groups: (1) Emotions text message course, (2) Long Baseline and (3) Short Baseline 2. Payment amount in Phase 1 was a fixed amount while payment amount in Phase 2 was randomly assigned to be either low, medium or high amount.

The chatbot will start by requesting consent, then randomize participants into one of the intervention groups. The placebo group will be shown a text message course adapted to the chatbot format, with timers that allow sufficient reading time in between messages, that gives users facts about misinformation, but contains no educational content on identifying or avoiding sharing misinformation. We will observe treatment compliance for the placebo group at five points in the intervention, based on responses to quiz questions that are part of the intervention, as well as with a question at the end that asks about what they learned.

The Bad News game group will be provided a link to a copy of the online game. Meta pixels on most buttons on the game website will allow us to observe a participant’s progress in the game. Note that we will not observe pixel activity for the subset of iOS users who have opted out of tracking (approximately 85% of iOS users); however, market penetration for Apple phones ranges from just 4% in Kenya to 23% in Ghana.} We will also ask a compliance question when a participant returns to the chatbot about how many followers they gained (the key metric in the game), as well as a question about what they learned.
The Inoculation Science video group will be shown the videos embedded in the chatbot, with timers that will prevent the participant from continuing the study before sufficient time has passed to watch the video. Participants must engage with the chatbot in order to proceed to each of the five videos. We will also observe treatment compliance through a question after the last video about what they learned.
Lastly, the Emotions text message course group will be shown the text message course adapted to the chatbot format, with timers that allow sufficient reading time in between messages. As in the placebo group’s text message course, we will observe treatment compliance for the Emotions course group at five points in the intervention, based on responses to quiz questions that are part of the intervention, as well as with a question at the end that asks about what they learned.
In Phase 2, the Long Baseline group will be shown a longer version of the Short Baseline.

After participants are exposed to their assigned intervention, they will be shown six posts and asked to respond to three questions for each post:

-Would you share this post? (yes/no)

-How manipulative do you find this post? (5-point Likert scale; “Not at all” to “Very”)

-How reliable do you find this post? (5-point Likert scale; “Not at all” to “Very”)

The sharing question will always come first, but the order of the manipulativeness and reliability questions will be randomized at the user-level. The six posts will have the following composition:

-2 non-misinformation posts

-1 misinformation post using the “false dichotomies” technique

-1 misinformation post using the “discrediting technique”

-1 misinformation post using the “emotional language” technique

-1 attention check

Each post will be randomly drawn from a set of (at least two) posts of that type, with the order of the six posts randomized.

Approximately 22 hours after a participant completes the chatbot (to allow enough time for payment to be made and received), we will send a message within the 24-hour window allowed by Facebook to recontact a user who has engaged with our chat to ask if the participant would like to be notified when additional paid surveys are available. Six weeks after the participant completes the chatbot, we will notify these users that they can participate in another study for $1 (in local currency) of mobile airtime. In addition, we will send sponsored messages (paid messages sent through Facebook Messenger) to participants who did not opt-in for notifications with the same recruitment message.

As in the first survey, the follow-up survey in the chatbot will start by requesting consent. Then, participants will be shown six posts and asked to respond to three questions for each post, using the same structure. The six posts will be drawn from the same set of posts as the posts in the first chatbot, without replacement for the individual participant (i.e., no participant will see the same post in both the first and follow-up chatbots). The user-level randomization for the ordering of the manipulativeness and reliability questions will be preserved in the follow-up survey.

Data

The raw data for was downloaded from Chatfuel and undergone an anonymization process before being stored in Sherlock. The anynomized data was then downloaded from Sherlock and stored in the data folder. The cleaning process involved dropping observations who are not in Phase 1 or Phase 2 and dropping actual repeaters. We identified actual repeaters by looking at whether they have a timestamp in more than one treatment arms and dropped them. Attempted repeaters are not actual repeaters, and these respondents were turned away from the chatbot when they tried to enter the chatbot for the second time after completing the first time. Since they did not repeat the chatbot, we don’t drop them from our dataset. We also checked to make sure all observations are unique by comparing the number of unique id with the number of observations in the dataset.

The data cleaning script is located in FB_Misinfo_cleaning
The descriptive script is located in FB_Misinfo_descriptive

Treatments

Library

library(tidyverse)
library(here)
library(RColorBrewer)
library(janitor)
library(psych)
library(ggtext)
library(knitr)
library(kableExtra)
library(forcats)
library(gtools)
library(ggrepel)
library(DT)
#library(papeR)
library(compareGroups)
library(ggcorrplot)
library(ggplot2)
library(cobalt)
library(ggthemes)
library(nnet)

library(data.table)
library(broom)
library(kableExtra)
library(texreg)
library(patchwork)
library(readr)
library(lubridate)
library(boot)
library(wrappedtools)
library(stargazer)
library(fixest)
library(pander)

# Functions
mean_barplot <- function(data, y, ylab, limits = c(0, 1)) {
  if (limits == c(0, 1)) { labels = scales::percent } else labels = { waiver() }
             
  ggplot(data %>% 
          filter(!is.na(!!sym(y))),
       aes(x = condition, y = !!sym(y), fill = condition)) +
  geom_bar(stat = "summary", fun = "mean") +
  stat_summary(fun.data = mean_se, geom = "errorbar", fun.args = list(mult = 1), width = 0.1) +
  coord_cartesian(ylim = limits, expand = FALSE) +
  scale_y_continuous(labels = labels) +
  labs(y = ylab,
       x = "Condition") +
  theme(legend.position = "none") 
}

# Computes standard errors of a given continuous variable or one-hot binary variable, respectively #
se_cont = function(x, na.rm=FALSE) {
  if (na.rm) x <- na.omit(x)
  sqrt(var(x)/length(x))}

se_binary = function(x, na.rm=FALSE) {
  if (na.rm) x <- na.omit(x)
  sqrt(mean(x)*(1-mean(x))/length(x))}

# add parentheses
add_parentheses <- function(x) {
  if (!is.na(x) && x!="" && !is.na(as.numeric(x))) {
    formatted_x <- formatC(as.numeric(x), format = "f", digits = 4)
    formatted_x[formatted_x != "NA"] <- paste0("(", formatted_x[formatted_x != "NA"], ")")
    formatted_x
  } else {
    x
  }
}

Loading dataset

The anonymized dataset is stored in Github directory ~fb_misinfo_interventions/data/
The current working branch is 184-survey-data-analysis-phase-ii

df <- read_csv(here("data", "misinfo_clean.csv"))

Balance Table

Phase 1 Balance Table

Covariate Balance of Completers in Phase 1

df_filter <- df %>% 
  filter(is.na(arm_coded)==FALSE) %>%
  filter(country_coded == "Ghana" | country_coded == "Nigeria" | country_coded == "South Africa" | country_coded == "Kenya")

df_filter$baseline <- ifelse(df_filter$arm_coded == "Baseline 1" | df_filter$arm_coded == "Short Baseline 2", 1, 0)
df_filter$sms <- ifelse(df_filter$arm_coded == "SMS", 1, 0)
df_filter$game <- ifelse(df_filter$arm_coded == "Game", 1, 0)
df_filter$video <- ifelse(df_filter$arm_coded == "Video", 1, 0)
df_filter$longbaseline <- ifelse(df_filter$arm_coded == "Long Baseline", 1, 0)

df_filter_phase1 <- df_filter %>%
  filter(phase_coded=="Phase 1")

phase_1_a <- df_filter_phase1 %>%
  filter(baseline==1 | sms ==1)

phase_1_b <- df_filter_phase1 %>%
  filter(baseline==1 | game ==1)

phase_1_c <- df_filter_phase1 %>%
  filter(baseline==1 | video ==1)

df_filter_phase2 <- df_filter %>%
  filter(phase_coded=="Phase 2")

phase_2_a <- df_filter_phase2 %>% 
  filter(baseline==1 | sms ==1)

phase_2_b <- df_filter_phase2 %>%
  filter(baseline==1 | longbaseline ==1)

# Compute SMD by stratum
love.plot(sms ~ country_coded + MisinfoChat_start_time,
                 data=filter(phase_1_a, completed=="TRUE"),
          groups = "arm_coded",
          binary = "std", # display standardized mean differences
          thresholds = c(m = .1), 
          colors =c(colorblind_pal()(8)[c(2)]))+ 
  scale_x_continuous(breaks = seq(-0.1, 0.1, .1)) +
  # specify theme aesthetics
  theme_minimal() + 
  # specify text size and face
  theme(strip.text = element_text(size = 12), 
      axis.text.x = element_text(size = 12, angle = 90, vjust = 0.5, hjust=1),
      axis.text.y = element_text(size = 8), 
      axis.title = element_text(size = 12, face = "bold"), 
      legend.position = "none", 
      plot.title = element_text(size = 16, face = "bold", hjust = 0.5)) +
  # specify labels
  labs(title = "Covariate Balance", 
       x = 'Standardized Mean Difference', 
       y = "Covariate", 
       caption  = "For observations completed in Phase 1, sms - baseline.")

# Compute SMD by stratum
love.plot(game ~ country_coded+ MisinfoChat_start_time,
                 data=filter(phase_1_b, completed=="TRUE"),
          groups = "arm_coded",
          binary = "std", # display standardized mean differences
          thresholds = c(m = .1), 
          colors =c(colorblind_pal()(8)[c(2)]))+ 
  scale_x_continuous(breaks = seq(-0.1, 0.1, .1)) +
  # specify theme aesthetics
  theme_minimal() + 
  # specify text size and face
  theme(strip.text = element_text(size = 12), 
      axis.text.x = element_text(size = 12, angle = 90, vjust = 0.5, hjust=1),
      axis.text.y = element_text(size = 8), 
      axis.title = element_text(size = 12, face = "bold"), 
      legend.position = "none", 
      plot.title = element_text(size = 16, face = "bold", hjust = 0.5)) +
  # specify labels
  labs(title = "Covariate Balance", 
       x = 'Standardized Mean Difference', 
       y = "Covariate", 
       caption  = "For observations completed in Phase 1, game - baseline.")

# Compute SMD by stratum
love.plot(video ~ country_coded+ MisinfoChat_start_time,
                 data=filter(phase_1_c, completed=="TRUE"),
          groups = "arm_coded",
          binary = "std", # display standardized mean differences
          thresholds = c(m = .1), 
          colors =c(colorblind_pal()(8)[c(2)]))+ 
  scale_x_continuous(breaks = seq(-0.1, 0.1, .1)) +
  # specify theme aesthetics
  theme_minimal() + 
  # specify text size and face
  theme(strip.text = element_text(size = 12), 
      axis.text.x = element_text(size = 12, angle = 90, vjust = 0.5, hjust=1),
      axis.text.y = element_text(size = 8), 
      axis.title = element_text(size = 12, face = "bold"), 
      legend.position = "none", 
      plot.title = element_text(size = 16, face = "bold", hjust = 0.5)) +
  # specify labels
  labs(title = "Covariate Balance", 
       x = 'Standardized Mean Difference', 
       y = "Covariate", 
       caption  = "For observations completed in Phase 1, video - baseline.")

Phase 2 Balance Table

Covariate Balance for Phase 2

# Compute SMD by stratum
love.plot(sms ~ country_coded + MisinfoChat_start_time,
                 data=filter(phase_2_a, completed=="TRUE"),
          groups = "arm_coded",
          binary = "std", # display standardized mean differences
          thresholds = c(m = .1), 
          colors =c(colorblind_pal()(8)[c(2)]))+ 
  scale_x_continuous(breaks = seq(-0.1, 0.1, .1)) +
  # specify theme aesthetics
  theme_minimal() + 
  # specify text size and face
  theme(strip.text = element_text(size = 12), 
      axis.text.x = element_text(size = 12, angle = 90, vjust = 0.5, hjust=1),
      axis.text.y = element_text(size = 8), 
      axis.title = element_text(size = 12, face = "bold"), 
      legend.position = "none", 
      plot.title = element_text(size = 16, face = "bold", hjust = 0.5)) +
  # specify labels
  labs(title = "Covariate Balance", 
       x = 'Standardized Mean Difference', 
       y = "Covariate", 
       caption  = "For observations completed in Phase 2, sms - baseline.")

# Compute SMD by stratum
love.plot(longbaseline ~ country_coded + MisinfoChat_start_time,
                 data=filter(phase_2_b, completed=="TRUE"),
          groups = "arm_coded",
          binary = "std", # display standardized mean differences
          thresholds = c(m = .1), 
          colors =c(colorblind_pal()(8)[c(2)]))+ 
  scale_x_continuous(breaks = seq(-0.1, 0.1, .1)) +
  # specify theme aesthetics
  theme_minimal() + 
  # specify text size and face
  theme(strip.text = element_text(size = 12), 
      axis.text.x = element_text(size = 12, angle = 90, vjust = 0.5, hjust=1),
      axis.text.y = element_text(size = 8), 
      axis.title = element_text(size = 12, face = "bold"), 
      legend.position = "none", 
      plot.title = element_text(size = 16, face = "bold", hjust = 0.5)) +
  # specify labels
  labs(title = "Covariate Balance", 
       x = 'Standardized Mean Difference', 
       y = "Covariate", 
       caption  = "For observations completed in Phase 2, long baseline - baseline.")

Funnel Analysis statistics

Overall

Participants who started the chatbot received a timestamp for the ‘MisinfoChat_start_time’ variable.
This table provides the location where participants dropped off from the chatbot, the number of drop off at each location, and the percentage of started dropped off.

df_dropoff <- df %>%
  select(phase_coded, entry_start_time, consent_coded, completed, arm_coded, quiz_consent_coded,
                                    entry_end_time,
                                    MisinfoChat_start_time,
                                    InterArms_start_time,
                                    InterArmsVideo_end_time,
                                    InterArmsGame_start_time,
                                    interArmsGame_end_time,
                                    InterArmsSMS_start_time,
                                    InterArmsSMS_end_time, 
                                    InterArmsBase_start_time,
                                    InterArmsBase_end_time, 
                                    InterArmsLongBase_start_time,
                                    InterArmsLongBase_end_time,
                                    MisinfoQuiz_start_time,
                                    misinfoQuiz_end_time,
                                    EndPay_start_time,
                                    EndPay_end_time)


df_dropoff$started_coded_num <- ifelse(!is.na(df_dropoff$MisinfoChat_start_time),1,0)
df_dropoff$consent_coded_num <- ifelse(!is.na(df_dropoff$consent_coded),1,0)
df_dropoff$started_arm_num <- ifelse(!is.na(df_dropoff$InterArms_start_time) | !is.na(df_dropoff$InterArmsSMS_start_time) | !is.na(df_dropoff$InterArmsGame_start_time) | !is.na(df_dropoff$InterArmsBase_start_time) | !is.na(df_dropoff$InterArmsLongBase_start_time),1,0)
df_dropoff$quiz_started_num <- ifelse(!is.na(df_dropoff$MisinfoQuiz_start_time),1,0)
df_dropoff$quiz_consent_coded_num <- ifelse(!is.na(df_dropoff$quiz_consent_coded),1,0)
df_dropoff$quiz_completed_num <- ifelse(!is.na(df_dropoff$misinfoQuiz_end_time),1,0)
df_dropoff$complete_all_coded_num <- ifelse(!is.na(df_dropoff$completed),1,0)

df_dropoff <- df_dropoff %>%
  mutate(score= rowSums(select(., started_coded_num, consent_coded_num, started_arm_num, quiz_consent_coded_num, quiz_completed_num, complete_all_coded_num, quiz_started_num))) %>%
  mutate(score = ifelse(score==7, "7-Completed all", ifelse(score == 6, "6-Dropped off during quiz reveal", ifelse(score == 5, "5-Dropped off during quiz", ifelse(score == 4, "4-Dropped off before quiz consent", ifelse(score == 4, "3-Dropped off during intervention", ifelse(score == 3, "3-Dropped off during intervention", ifelse(score == 2, "2-Dropped off before intervention", ifelse(score == 1, "1-Dropped off before consent", "Did not start")))))))))


df_dropoff %>%
  group_by(phase_coded) %>%
  count(score) %>%
  mutate(percentage = n*100/sum(n)) %>%
  mutate(percentage = round(percentage, 2)) %>%
  kable("html", col.names=(c("Phase", "Dropped off location", "N",
                             "% of Participants Started"))) %>% 
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover")) %>% collapse_rows() %>% scroll_box(width = "100%")

Phase	Dropped off location	N	% of Participants Started
Phase 1	1-Dropped off before consent	5936	16.04
	2-Dropped off before intervention	1001	2.70
	3-Dropped off during intervention	9106	24.61
	4-Dropped off before quiz consent	220	0.59
	5-Dropped off during quiz	3727	10.07
	6-Dropped off during quiz reveal	301	0.81
	7-Completed all	16717	45.17
Phase 2	1-Dropped off before consent	8649	17.02
	2-Dropped off before intervention	1875	3.69
	3-Dropped off during intervention	8905	17.52
	4-Dropped off before quiz consent	294	0.58
	5-Dropped off during quiz	3922	7.72
	6-Dropped off during quiz reveal	331	0.65
	7-Completed all	26847	52.82

df_dropoff_table <- df_dropoff %>%
  group_by(score, phase_coded) %>%
  summarise(count = n(), .groups = 'drop') # Calculate count

# Ensure 'score' is a factor and ordered as you want
df_dropoff$score <- factor(df_dropoff$score, levels = unique(df_dropoff$score))

ggplot(df_dropoff_table, mapping = aes(x = score, y = count, group = phase_coded)) +
  geom_point(aes(color = phase_coded)) +  # Color points by phase
  geom_line(aes(color = phase_coded), alpha = 0.5) +  # Color lines by phase
  geom_text(aes(label = scales::comma(count)), vjust = -0.5, size = 3) +
  coord_cartesian(ylim = c(0, NA)) +
  scale_y_continuous(labels = scales::comma) +
  scale_color_manual(values = c("Phase 1" = "purple", "Phase 2" = "green")) +
  labs(title = "Chatbot Funnel: Drop off at Different Stages",
       x = "Chatbot stage",
       y = "Number of observations",
       color = "Phase") +  # Label for color legend
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

started participants are those have a timestamp for the MisinfoChat_start_time variable.
consent participants are those who responded ‘I consent, start now’ in the consent variable.
completed_intervention participants are those who completed has a timestamp for either the baseline_learn, game_learn, sms_learn, or Video_learn variables (applicable to both Phase 1 and Phase 2)
quiz consent participants are those who responded ‘I acknowledge’ in the quiz_consent variable.
completed quiz participants are those who completed the misinformation quiz and have a timestamp for the misinfoQuiz_end_time variable.
completed all participants are those who completed the chatbot and have a stamp for the completed variable.

df$consent_binary_num = ifelse(df$consent_coded == "I consent, start now",1,0)
df<- df %>%
  mutate(started_coded_num=ifelse(!is.na(`signed up`),1,0)) %>%  
  mutate(completed_intervention = ifelse(!is.na(baseline_learn),baseline_learn,
                            ifelse(!is.na(game_learn),game_learn,
                                   ifelse(!is.na(sms_learn),sms_learn,
                                          ifelse(!is.na(Video_learn),Video_learn,NA))))) %>%
  mutate(completed_intervention_num =ifelse(!is.na(completed_intervention),1,0)) %>% 
  mutate(time_completion_survey = misinfoQuiz_end_time - `signed up`) %>% # calculating the time between signed up and completing the misinformation quiz. 
  mutate(completed_survey_num=ifelse(!is.na(time_completion_survey),1,0)) %>%
  mutate(completed_all_num=ifelse(!is.na(completed), 1, 0)) %>%
  mutate(quiz_consent_coded_num = ifelse(quiz_consent_coded == "I acknowledge",1,0))


df_a <- df %>% 
  group_by(phase_coded) %>%
  summarise(`started`=sum(started_coded_num==1, na.rm=TRUE),
            `consented`=sum(consent_binary_num==1, na.rm=TRUE),
            `completed intervention`=sum(completed_intervention_num==1, na.rm=TRUE),
            `consented quiz`=sum(quiz_consent_coded_num==1, na.rm=TRUE),
            `completed quiz` = sum(completed_survey_num==1, na.rm=TRUE),
            `completed all` = sum(completed_all_num==1, na.rm=TRUE), 
            ) %>%
  pivot_longer(
    -phase_coded, # Columns to keep as is
    names_to = "stage", # Name of new column
    values_to = "n" # Name of new column
  ) %>%
  data.frame()

df_b <- df %>%
  group_by(phase_coded) %>%
  summarise(frac_started = sum(started_coded_num==1, na.rm=TRUE) / sum(started_coded_num==1, na.rm=TRUE),
            frac_consented_started  = sum(consent_binary_num==1, na.rm=TRUE) / sum(started_coded_num==1, na.rm=TRUE), 
            frac_completed_intervention_started  = sum(completed_intervention_num==1, na.rm=TRUE) / sum(started_coded_num==1, na.rm=TRUE),
            frac_consented_quiz_started  = sum(quiz_consent_coded_num==1, na.rm=TRUE) / sum(started_coded_num==1, na.rm=TRUE),
            frac_completed_quiz_started  = sum(completed_survey_num==1, na.rm=TRUE) / sum(started_coded_num==1, na.rm=TRUE),
            frac_completed_all_started  = sum(completed_all_num==1, na.rm=TRUE) / sum(started_coded_num==1, na.rm=TRUE)) %>%
  pivot_longer(
    -phase_coded, # Columns to keep as is
    names_to = "stage", # Name of new column
    values_to = "percentage of started" # Name of new column
  ) %>%
  select(`percentage of started`) %>%
  data.frame()

df_c <- df %>%
  group_by(phase_coded) %>%
  summarise(frac__started_consented = NA, 
            frac_conseted_consented = sum(consent_binary_num==1, na.rm=TRUE)/sum(consent_binary_num==1, na.rm=TRUE),
            frac_completed_intervention_consented = sum(completed_intervention_num==1, na.rm=TRUE)/sum(consent_binary_num==1, na.rm=TRUE),
            frac_consented_quiz_consented = sum(quiz_consent_coded_num==1, na.rm=TRUE)/sum(consent_binary_num==1, na.rm=TRUE),
            frac_completed_quiz_consented = sum(completed_survey_num==1, na.rm=TRUE)/sum(consent_binary_num==1, na.rm=TRUE),
            frac_completed_all_consented = sum(completed_all_num==1, na.rm=TRUE)/sum(consent_binary_num==1, na.rm=TRUE)) %>%
  pivot_longer(
    -phase_coded, # Columns to keep as is
    names_to = "stage", # Name of new column
    values_to = "percentage of consent" # Name of new column
  ) %>%
  select(`percentage of consent`) %>%
  data.frame()

df_d <- df %>%
  group_by(phase_coded) %>%
  summarise(frac_started_completed_intervention = NA, 
            frac_consented_completed_intervention = NA,
            frac_completed_intervention_completed_intervention = sum(completed_intervention_num==1, na.rm=TRUE)/sum(completed_intervention_num==1, na.rm=TRUE),
            frac_consented_quiz_completed_intervention = sum(quiz_consent_coded_num==1, na.rm=TRUE)/sum(completed_intervention_num==1, na.rm=TRUE),
            frac_completed_quiz_completed_intervention = sum(completed_survey_num==1, na.rm=TRUE)/sum(completed_intervention_num==1, na.rm=TRUE),
            frac_completed_all_completed_intervention = sum(completed_all_num==1, na.rm=TRUE)/sum(completed_intervention_num==1, na.rm=TRUE)) %>%
  pivot_longer(
    -phase_coded, # Columns to keep as is
    names_to = "stage", # Name of new column
    values_to = "percentage of completed intervention" # Name of new column
  ) %>%
  select(`percentage of completed intervention`) %>%
  data.frame()

df_e <- df %>% 
  group_by(phase_coded) %>%
  summarise(frac_started_completed_quiz = NA, 
            frac_consented_completed_quiz = NA,
            frac_completed_intervention_completed_quiz = NA,
            frac_consented_quiz_completed_quiz = sum(quiz_consent_coded_num==1, na.rm=TRUE)/sum(quiz_consent_coded_num==1, na.rm=TRUE), 
            frac_completed_quiz_completed_quiz = sum(completed_survey_num==1, na.rm=TRUE)/sum(quiz_consent_coded_num==1, na.rm=TRUE),
            frac_completed_all_completed_quiz = sum(completed_all_num==1, na.rm=TRUE)/sum(quiz_consent_coded_num==1, na.rm=TRUE)) %>%
  pivot_longer(
    -phase_coded, # Columns to keep as is
    names_to = "stage", # Name of new column
    values_to = "percentage of quiz consent" # Name of new column
  ) %>%
  select(`percentage of quiz consent`) %>%
  data.frame()

df_f <- df %>% 
  group_by(phase_coded) %>%
  summarise(frac_started_completed_all = NA, 
            frac_consented_completed_all = NA,
            frac_completed_intervention_completed_all = NA,
            frac_consented_quiz_completed_all = NA,
            frac_completed_quiz_completed_all = sum(completed_survey_num==1, na.rm=TRUE)/sum(completed_survey_num==1, na.rm=TRUE),
            frac_completed_all_completed_all = sum(completed_all_num==1, na.rm=TRUE)/sum(completed_survey_num==1, na.rm=TRUE)) %>%
  pivot_longer(
    -phase_coded, # Columns to keep as is
    names_to = "stage", # Name of new column
    values_to = "percentage of completed quiz" # Name of new column
  ) %>%
  select(`percentage of completed quiz`) %>%
  data.frame()

df_g <- df %>%
  group_by(phase_coded) %>%
  summarise(frac_started_completed_all = NA, 
            frac_consented_completed_all = NA,
            frac_completed_intervention_completed_all = NA,
            frac_consented_quiz_completed_all = NA,
            frac_completed_quiz_completed_all = NA,
            frac_completed_all_completed_all = sum(completed_all_num==1, na.rm=TRUE)/sum(completed_all_num==1, na.rm=TRUE)) %>%
  pivot_longer(
    -phase_coded, # Columns to keep as is
    names_to = "stage", # Name of new column
    values_to = "percentage of completed all" # Name of new column
  ) %>%
  select(`percentage of completed all`) %>%
  data.frame()

funnel_df <- cbind(df_a, df_b, df_c, df_d, df_e, df_f, df_g) %>%
  mutate(stage = factor(stage, labels = unique(stage), levels = unique(stage))) %>% data.frame()

funnel_df %>%
  kable(digits = 3,caption = "Funnel Statistics", 
        col.names = c("Phase", "Stage", "N", "Prop. of Started", "Prop. of Consent", "Prop. of Completed Intervetion", "Prop. of Quiz Consent", "Prop. of Completed Quiz", "Prop. of Completed All" )) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>% collapse_rows() %>%
  scroll_box( height = "500px")

Funnel Statistics
Phase	Stage	N	Prop. of Started	Prop. of Consent	Prop. of Completed Intervetion	Prop. of Quiz Consent	Prop. of Completed Quiz	Prop. of Completed All
Phase 1	started	37008	1.000	NA	NA	NA	NA	NA
	consented	29993	0.810	1.000	NA	NA	NA	NA
	completed intervention	20965	0.566	0.699	1.000	NA	NA	NA
	consented quiz	20662	0.558	0.689	0.986	1.000	NA	NA
	completed quiz	17016	0.460	0.567	0.812	0.824	1.000	NA
	completed all	16724	0.452	0.558	0.798	0.809	0.983	1
Phase 2	started	50823	1.000	NA	NA	NA	NA	NA
	consented	40681	0.800	1.000	NA	NA	NA	NA
	completed intervention	31392	0.618	0.772	1.000	NA	NA	NA
	consented quiz	31020	0.610	0.763	0.988	1.000	NA	NA
	completed quiz	27178	0.535	0.668	0.866	0.876	1.000	NA
	completed all	26862	0.529	0.660	0.856	0.866	0.988	1

ggplot(funnel_df, mapping = aes(x = stage, y = n, group = phase_coded)) +
  geom_point(aes(color = phase_coded)) +  # Color points by phase
  geom_line(aes(color = phase_coded), alpha = 0.5) +  # Color lines by phase
  geom_text(aes(label = scales::comma(n)), vjust = -0.5, size = 3) +
  coord_cartesian(ylim = c(0, NA)) +
  scale_y_continuous(labels = scales::comma) +
  scale_color_manual(values = c("Phase 1" = "purple", "Phase 2" = "green")) +
  labs(title = "Chatbot Funnel: Number of Participants at Different Stages",
       x = "Funnel stage",
       y = "Number of observations",
       color = "Phase") +  # Label for color legend
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Completion over time

df_summary <- df %>% 
  select(phase_coded, MisinfoChat_start_time, completed_survey_num) %>%
  mutate(startdate = as.Date(MisinfoQuiz_start_time))

daily_summary <- df_summary %>%
  group_by(startdate) %>%
  count(completed_survey_num==1) %>%
  mutate(pct = n/sum(n))


ggplot(daily_summary,
       aes(x = startdate, y = pct)) +
  geom_bar(position = "identity", stat = "identity") +
  scale_y_continuous(limits = c(0, 1), labels = scales::percent, expand = c(0, 0, 0, 0)) +
  scale_x_date(breaks = "day") +
  labs(x = "Day",
       y = "% complete") +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 90))

Phase 1

Only participants who consented are considered.
Zooming into the dropout during the intervention. Dropout is much higher for game, video and SMS than for facts baseline.

dfphase1 <- df %>%
  filter(phase_coded == "Phase 1") %>%
  filter(consent_binary_num == 1) %>%
  filter(arm_coded == "Baseline 1" | arm_coded == "SMS" | arm_coded == "Video" | arm_coded == "Game" | is.na("arm_coded")) 

dfphase1 %>%
  mutate(n_started_intervention = if_else(!is.na(InterArmsBase_start_time) | !is.na(InterArms_start_time) | !is.na(InterArmsGame_start_time) | !is.na(InterArmsSMS_start_time), 1, 0),
         n_finished_intervention = if_else(!is.na(InterArmsBase_end_time) | !is.na(InterArmsVideo_end_time) | !is.na(interArmsGame_end_time) | !is.na(InterArmsSMS_end_time), 1, 0)) %>%
  filter(!is.na(arm)) %>% 
  group_by(arm_coded) %>%
  summarize(n_started_intervention = sum(n_started_intervention==1),
            n_finished_intervention = sum(n_finished_intervention==1),
            completed_intervention_after_starting = n_finished_intervention/n_started_intervention) %>%
  kable(digits = 3,caption = "Intervention Funnel for Phase 1", 
        col.names = c("Treatment Arm", "Started Intervention", "Completed Intervention", "Completed/Started" )) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>% collapse_rows() %>%
  scroll_box( height = "500px")

Intervention Funnel for Phase 1
Treatment Arm	Started Intervention	Completed Intervention	Completed/Started
Baseline 1	7441	6138	0.825
Game	7458	4667	0.626
SMS	7543	5143	0.682
Video	7525	4962	0.659

Baseline fact course

df_base_funnel <- dfphase1 %>% 
  filter(!is.na(InterArmsBase_start_time)) %>% 
  summarize(n_started_baseline_intervention = sum(!is.na(InterArmsBase_start_time)),
            n_base2_ready = sum(!is.na(base2_ready)),
            n_base3_ready = sum(!is.na(base3_ready)),
            n_base4_ready = sum(!is.na(base4_ready)),
            n_base5_ready = sum(!is.na(base5_ready)),
            n_finished_baseline_intervention = sum(!is.na(InterArmsBase_end_time)),
            # quiz
            n_quiz_consent = sum(!is.na(quiz_consent)),
            n_completed = sum(completed, na.rm = TRUE)) %>%
  pivot_longer(cols = everything(),
               names_pattern = "n_(.*)",
               names_to = "stage",
               values_to = "n") %>%
  mutate(stage = janitor::make_clean_names(stage, case = "sentence")) %>%
  mutate(stage = factor(stage, labels = unique(stage), levels = unique(stage)))

ggplot(df_base_funnel,
       mapping = aes(x = stage, y = n, group = NA)) +
  geom_point() +
  geom_line(alpha = 0.5) +
  geom_text(aes(label = scales::comma(n)), vjust = -0.5, size = 3) +
  geom_text(aes(label = scales::percent(n/n[stage == "Started baseline intervention"])), vjust = 2, size = 2) +
  coord_cartesian(ylim = c(0, NA)) +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Chatbot Funnel Facts Baseline",
       subtitle = "Number of Participants at Different Stages",
       x = "Funnel stage",
       y = "Number of observations") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Game

df_game_funnel <- dfphase1 %>% 
  filter(!is.na(InterArmsGame_start_time)) %>% 
  summarize(n_started_game_intervention = sum(!is.na(InterArmsGame_start_time)),
            n_game_done = sum(!is.na(game_done)),
            n_finished_game_intervention = sum(!is.na(interArmsGame_end_time)),
            # quiz
            n_quiz_consent = sum(!is.na(quiz_consent)),
            n_completed = sum(completed, na.rm = TRUE)) %>%
  pivot_longer(cols = everything(),
               names_pattern = "n_(.*)",
               names_to = "stage",
               values_to = "n") %>%
  mutate(stage = janitor::make_clean_names(stage, case = "sentence")) %>%
  mutate(stage = factor(stage, labels = unique(stage), levels = unique(stage)))

ggplot(df_game_funnel,
       mapping = aes(x = stage, y = n, group = NA)) +
  geom_point() +
  geom_line(alpha = 0.5) +
  geom_text(aes(label = scales::comma(n)), vjust = -0.5, size = 3) +
  geom_text(aes(label = scales::percent(n/n[stage == "Started game intervention"])), vjust = 2, size = 2) +
  coord_cartesian(ylim = c(0, NA)) +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Chatbot Funnel Game",
       subtitle = "Number of Participants at Different Stages",
       x = "Funnel stage",
       y = "Number of observations") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Video

df_video_funnel <- dfphase1 %>% 
  filter(!is.na(InterArms_start_time)) %>% 
  summarize(n_started_video_intervention = sum(!is.na(InterArms_start_time)),
            n_video2_ready = sum(!is.na(video2_ready)),
            n_video2_done = sum(!is.na(video2_done)),
            n_video3_ready = sum(!is.na(video3_ready)),
            n_video3_done = sum(!is.na(video3_done)),
            n_video4_ready = sum(!is.na(video4_ready)),
            n_video4_done = sum(!is.na(video4_done)),
            n_video5_done = sum(!is.na(video5_done)),
            n_finished_video_intervention = sum(!is.na(InterArmsVideo_end_time)),
            # quiz
            n_quiz_consent = sum(!is.na(quiz_consent)),
            n_completed = sum(completed, na.rm = TRUE)) %>%
  pivot_longer(cols = everything(),
               names_pattern = "n_(.*)",
               names_to = "stage",
               values_to = "n") %>%
  mutate(stage = janitor::make_clean_names(stage, case = "sentence")) %>%
  mutate(stage = factor(stage, labels = unique(stage), levels = unique(stage)))

ggplot(df_video_funnel,
       mapping = aes(x = stage, y = n, group = NA)) +
  geom_point() +
  geom_line(alpha = 0.5) +
  geom_text(aes(label = scales::comma(n)), vjust = -0.5, size = 3) +
  geom_text(aes(label = scales::percent(n/n[stage == "Started video intervention"])), vjust = 2, size = 2) +
  coord_cartesian(ylim = c(0, NA)) +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Chatbot Funnel Video",
       subtitle = "Number of Participants at Different Stages",
       x = "Funnel stage",
       y = "Number of observations") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

SMS course

df_sms_funnel <- dfphase1 %>% 
  filter(!is.na(InterArmsSMS_start_time)) %>% 
  summarize(n_started_sms_intervention = sum(!is.na(InterArmsSMS_start_time)),
            n_sms2_ready = sum(!is.na(sms2_ready)),
            n_sms3_ready = sum(!is.na(sms3_ready)),
            n_sms4_ready = sum(!is.na(sms4_ready)),
            n_sms5_ready = sum(!is.na(sms5_ready)),
            n_finished_sms_intervention = sum(!is.na(InterArmsSMS_end_time)),
            # quiz
            n_quiz_consent = sum(!is.na(quiz_consent)),
            n_completed = sum(completed, na.rm = TRUE)) %>%
  pivot_longer(cols = everything(),
               names_pattern = "n_(.*)",
               names_to = "stage",
               values_to = "n") %>%
  mutate(stage = janitor::make_clean_names(stage, case = "sentence")) %>%
  mutate(stage = factor(stage, labels = unique(stage), levels = unique(stage)))

ggplot(df_sms_funnel,
       mapping = aes(x = stage, y = n, group = NA)) +
  geom_point() +
  geom_line(alpha = 0.5) +
  geom_text(aes(label = scales::comma(n)), vjust = -0.5, size = 3) +
  geom_text(aes(label = scales::percent(n/n[stage == "Started sms intervention"])), vjust = 2, size = 2) +
  coord_cartesian(ylim = c(0, NA)) +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Chatbot Funnel SMS",
       subtitle = "Number of Participants at Different Stages",
       x = "Funnel stage",
       y = "Number of observations") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Check completion times

# payment end time is almost always missing, so we don't analyze payment time
# misinfo chat duration is always 0, so we don't show it
df_duration1 <- dfphase1 %>% 
  mutate(baseline_duration = InterArmsBase_end_time - InterArmsBase_start_time,
         video_duration = InterArmsVideo_end_time - InterArms_start_time,
         game_duration = interArmsGame_end_time - InterArmsGame_start_time,
         sms_duration = InterArmsSMS_end_time - InterArmsSMS_start_time,
         misinfo_quiz_duration = misinfoQuiz_end_time - MisinfoQuiz_start_time) %>%
  select(contains("duration")) %>% 
  pivot_longer(cols = everything(),
               names_pattern = "(.*)_duration",
               names_to = "stage",
               values_to = "time") %>%
  mutate(stage = factor(stage, labels = unique(stage), levels = unique(stage))) %>%
  filter(!is.na(time))

# strangely, there are some negative durations, we filter those
df_duration1 <- df_duration1 %>%
  filter(time >= 0)

ggplot(df_duration1,
       mapping = aes(x = time, group = stage, color = stage)) +
  geom_histogram() +
  facet_wrap(~ stage, scales = "free") +
  labs(x = "time in minutes")

ggplot(df_duration1,
       mapping = aes(y = time, x = stage, color = stage)) +
  geom_boxplot() +
  # restrict scale to focus on observations that are not outliers
  scale_y_continuous(limits = c(0, 150)) +
  labs(y = "time in minutes")

df_duration1 %>%
  group_by(stage) %>% 
  summarize(median_duration = median(time),
            avg_duration = mean(time),
            sd_duration = sd(time),
            min_duration = min(time),
            max_duration = max(time)) %>%
  kable(digits = 3,caption = "Phase 1: Duration of Interventions", 
        col.names = c("Stage", "Median", "Average", "SD", "Min","Max")) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>% collapse_rows() %>%
  scroll_box( height = "500px")

Phase 1: Duration of Interventions
Stage	Median	Average	SD	Min	Max
baseline	10.783 mins	175.426 mins	2036.561	4.617 mins	68946.00 mins
video	15.500 mins	337.589 mins	2889.449	9.000 mins	69023.57 mins
game	17.383 mins	225.703 mins	1824.336	5.767 mins	48474.42 mins
sms	17.467 mins	254.841 mins	2452.954	8.150 mins	64620.12 mins
misinfo_quiz	6.817 mins	177.845 mins	2295.861	1.517 mins	73517.35 mins

Phase 2

Only participants who consented are considered.
Zooming into the dropout during the intervention. Dropout is higher for long baseline and SMS. The differences in dropout rates are much lower in Phase 2 than Phase 1.

dfphase2 <- df %>%
  filter(phase_coded == "Phase 2") %>%
  filter(consent_binary_num == 1) %>%
  filter(arm_coded == "Short Baseline 2" | arm_coded == "SMS" | arm_coded == "Long Baseline" | is.na("arm_coded")) 

dfphase2 %>%
  mutate(n_started_intervention = if_else(!is.na(InterArmsBase_start_time) | !is.na(InterArmsLongBase_start_time) | !is.na(InterArmsSMS_start_time), 1, 0),
         n_finished_intervention = if_else(!is.na(InterArmsBase_end_time) | !is.na(InterArmsLongBase_end_time) | !is.na(InterArmsSMS_end_time), 1, 0)) %>%
  filter(!is.na(arm)) %>% 
  group_by(arm_coded) %>%
  summarize(n_started_intervention = sum(n_started_intervention==1),
            n_finished_intervention = sum(n_finished_intervention==1),
            completed_intervention_after_starting = n_finished_intervention/n_started_intervention) %>%
  kable(digits = 3,caption = "Intervention Funnel for Phase 2", 
        col.names = c("Treatment Arm", "Started Intervention", "Completed Intervention", "Completed/Started" )) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>% collapse_rows() %>%
  scroll_box( height = "500px")

Intervention Funnel for Phase 2
Treatment Arm	Started Intervention	Completed Intervention	Completed/Started
Long Baseline	14024	10740	0.766
SMS	20077	15497	0.772
Short Baseline 2	6185	5148	0.832

Short Baseline

df_base_funnel2 <- dfphase2 %>% 
  filter(!is.na(InterArmsBase_start_time)) %>% 
  summarize(n_started_baseline_intervention = sum(!is.na(InterArmsBase_start_time)),
            n_base2_ready = sum(!is.na(base2_ready)),
            n_base3_ready = sum(!is.na(base3_ready)),
            n_base4_ready = sum(!is.na(base4_ready)),
            n_base5_ready = sum(!is.na(base5_ready)),
            n_finished_baseline_intervention = sum(!is.na(InterArmsBase_end_time)),
            # quiz
            n_quiz_consent = sum(!is.na(quiz_consent)),
            n_completed = sum(completed, na.rm = TRUE)) %>%
  pivot_longer(cols = everything(),
               names_pattern = "n_(.*)",
               names_to = "stage",
               values_to = "n") %>%
  mutate(stage = janitor::make_clean_names(stage, case = "sentence")) %>%
  mutate(stage = factor(stage, labels = unique(stage), levels = unique(stage)))

ggplot(df_base_funnel2,
       mapping = aes(x = stage, y = n, group = NA)) +
  geom_point() +
  geom_line(alpha = 0.5) +
  geom_text(aes(label = scales::comma(n)), vjust = -0.5, size = 3) +
  geom_text(aes(label = scales::percent(n/n[stage == "Started baseline intervention"])), vjust = 2, size = 2) +
  coord_cartesian(ylim = c(0, NA)) +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Chatbot Funnel Facts Baseline",
       subtitle = "Number of Participants at Different Stages",
       x = "Funnel stage",
       y = "Number of observations") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

SMS course

df_sms_funnel2 <- dfphase2 %>% 
  filter(!is.na(InterArmsSMS_start_time)) %>% 
  summarize(n_started_sms_intervention = sum(!is.na(InterArmsSMS_start_time)),
            n_sms2_ready = sum(!is.na(sms2_ready)),
            n_sms3_ready = sum(!is.na(sms3_ready)),
            n_sms4_ready = sum(!is.na(sms4_ready)),
            n_sms5_ready = sum(!is.na(sms5_ready)),
            n_finished_sms_intervention = sum(!is.na(InterArmsSMS_end_time)),
            # quiz
            n_quiz_consent = sum(!is.na(quiz_consent)),
            n_completed = sum(completed, na.rm = TRUE)) %>%
  pivot_longer(cols = everything(),
               names_pattern = "n_(.*)",
               names_to = "stage",
               values_to = "n") %>%
  mutate(stage = janitor::make_clean_names(stage, case = "sentence")) %>%
  mutate(stage = factor(stage, labels = unique(stage), levels = unique(stage)))

ggplot(df_sms_funnel2,
       mapping = aes(x = stage, y = n, group = NA)) +
  geom_point() +
  geom_line(alpha = 0.5) +
  geom_text(aes(label = scales::comma(n)), vjust = -0.5, size = 3) +
  geom_text(aes(label = scales::percent(n/n[stage == "Started sms intervention"])), vjust = 2, size = 2) +
  coord_cartesian(ylim = c(0, NA)) +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Chatbot Funnel SMS",
       subtitle = "Number of Participants at Different Stages",
       x = "Funnel stage",
       y = "Number of observations") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Long Baseline

df_longbase_funnel2 <- dfphase2 %>% 
  filter(!is.na(InterArmsLongBase_start_time)) %>% 
  summarize(n_started_longbase_intervention = sum(!is.na(InterArmsLongBase_start_time)),
             n_base2_ready = sum(!is.na(base2_ready)),
            n_base3_ready = sum(!is.na(base3_ready)),
            n_base4_ready = sum(!is.na(base4_ready)),
            n_base5_ready = sum(!is.na(base5_ready)),
            n_base_experts_reason = sum(!is.na(base_experts_reason)),
            n_finished_longbase_intervention = sum(!is.na(InterArmsLongBase_end_time)),
            # quiz
            n_quiz_consent = sum(!is.na(quiz_consent)),
            n_completed = sum(completed, na.rm = TRUE)) %>%
  pivot_longer(cols = everything(),
               names_pattern = "n_(.*)",
               names_to = "stage",
               values_to = "n") %>%
  mutate(stage = janitor::make_clean_names(stage, case = "sentence")) %>%
  mutate(stage = factor(stage, labels = unique(stage), levels = unique(stage)))

ggplot(df_longbase_funnel2,
       mapping = aes(x = stage, y = n, group = NA)) +
  geom_point() +
  geom_line(alpha = 0.5) +
  geom_text(aes(label = scales::comma(n)), vjust = -0.5, size = 3) +
  geom_text(aes(label = scales::percent(n/n[stage == "Started longbase intervention"])), vjust = 2, size = 2) +
  coord_cartesian(ylim = c(0, NA)) +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Chatbot Funnel SMS",
       subtitle = "Number of Participants at Different Stages",
       x = "Funnel stage",
       y = "Number of observations") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Check completion times

# payment end time is almost always missing, so we don't analyze payment time
# misinfo chat duration is always 0, so we don't show it
df_duration2 <- dfphase2 %>% 
  mutate(shortbase_duration = InterArmsBase_end_time - InterArmsBase_start_time,
         longbase_duration = InterArmsLongBase_end_time - InterArmsLongBase_start_time,
         sms_duration = InterArmsSMS_end_time - InterArmsSMS_start_time,
         misinfo_quiz_duration = misinfoQuiz_end_time - MisinfoQuiz_start_time) %>%
  select(contains("duration")) %>% 
  pivot_longer(cols = everything(),
               names_pattern = "(.*)_duration",
               names_to = "stage",
               values_to = "time") %>%
  mutate(stage = factor(stage, labels = unique(stage), levels = unique(stage))) %>%
  filter(!is.na(time))

# strangely, there are some negative durations, we filter those
df_duration2 <- df_duration2 %>%
  filter(time >= 0)

ggplot(df_duration2,
       mapping = aes(x = time, group = stage, color = stage)) +
  geom_histogram() +
  facet_wrap(~ stage, scales = "free") +
  labs(x = "time in minutes")

ggplot(df_duration2,
       mapping = aes(y = time, x = stage, color = stage)) +
  geom_boxplot() +
  # restrict scale to focus on observations that are not outliers
  scale_y_continuous(limits = c(0, 150)) +
  labs(y = "time in minutes")

df_duration2 %>%
  group_by(stage) %>% 
  summarize(median_duration = median(time),
            avg_duration = mean(time),
            sd_duration = sd(time),
            min_duration = min(time),
            max_duration = max(time)) %>%
  kable(digits = 3,caption = "Phase 2: Duration of Interventions", 
        col.names = c("Stage", "Median", "Average", "SD", "Min","Max")) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>% collapse_rows() %>%
  scroll_box( height = "500px")

Phase 2: Duration of Interventions
Stage	Median	Average	SD	Min	Max
shortbase	10.550 mins	122.126 mins	1377.030	4.217 mins	39524.40 mins
longbase	17.533 mins	153.456 mins	1189.853	3.967 mins	41738.68 mins
sms	16.650 mins	167.512 mins	1469.652	5.683 mins	47502.13 mins
misinfo_quiz	6.150 mins	108.910 mins	1242.914	1.200 mins	44377.45 mins

Mean Survey Outcomes

Sharing discernment is defined as the difference in proportion between non-misinformation posts and misinformation posts shared. For example, if user shared 2 out of 2 non-misinformation posts (i.e. $1$) and 1 out of 3 misinformation posts (i.e. $\frac{1}{3}$), discernment takes a value of $\frac{2}{3}$.

Reliability and Manipulation discernment are constructed in such a way that misinformation and non-misinformation are assigned an equal weight. Since each user encounters 3 misinformation posts and 2 non-misinformation posts, non-misinformation is reweighted by $\frac{3}{2}$

\[ Sharing Discernemnt = \frac{\textit{# Non-Misinformation posts Shared}}{3} - \frac{\textit{# Misinformation posts Shared}}{2} \]

\[ \text{Reliability Discernment:} = \begin{cases} \textit{Reliability score} \times \frac{3}{2}, & \text{if non-misinformation} \\ \textit{Reliability score} \times -1, & \text{if misinformation} \\ 0, & \text{otherwise} \end{cases} \]

\[ \text{Manipulation Discernment:} = \begin{cases} \textit{Manipulation score}\times \frac{3}{2}, & \text{if non-misinformation} \\ \textit{Manipulation score} \times -1, & \text{if misinformation} \\ 0, & \text{otherwise} \end{cases} \]

Phase 1

df_chat <- df %>%
  select(
    -matches("manipulative_coded"),
    -matches("reliable_coded"),
    -matches("manipulative_coded_num"),
    -matches("reliable_coded_num"),
    -matches("share_coded")
  ) %>%
   rename(condition = arm_coded) %>%
  filter(phase_coded=="Phase 1") 


df_chat <- df_chat %>% 
  mutate(treatments = ifelse(condition != "Baseline 1",1,0),
         completed = ifelse(!is.na(baseline_learn),baseline_learn,
                            ifelse(!is.na(game_learn),game_learn,
                                   ifelse(!is.na(sms_learn),sms_learn,
                                          ifelse(!is.na(Video_learn),Video_learn,NA))))) %>%
  mutate(completed_coded = ifelse(!is.na(completed),1,0)) %>%
  mutate(time_completion_survey = misinfoQuiz_end_time - `signed up`) %>%
  mutate(completed_survey = ifelse(!is.na(time_completion_survey),1,0)) %>%
  mutate(completed_coded =  ifelse(!is.na(baseline_learn),baseline_learn,
                                   ifelse(!is.na(game_learn),game_learn,
                                          ifelse(!is.na(sms_learn),sms_learn,
                                                 ifelse(!is.na(Video_learn),Video_learn,NA))))) %>% 
  mutate(completed_binary = ifelse(!is.na(completed_coded),1,0))

# 12 questions asking about manipulative or reliable content 
unique_responses_manip_reliable <- df_chat %>% 
  select(contains("_manipulative"), ends_with("reliable")) %>% 
  unlist() %>% 
  unique()

unique_responses_share <- df %>% 
  select(contains("_share")) %>% 
  unlist() %>% 
  unique()

valid_responses_5point <- c("1 - Not at all", "2", "3", "4", "5 - Very", 1, 2, 3, 4, 5, "5- Very", "5-Very", "Very5", "5 very", "1-Not at all")
valid_responses_share <- c("Yes", "No", "yes", "no", "YES")

df_chat <- df_chat %>% 
  mutate_at(.vars = vars(contains(c("_manipulative"))),
            .funs = funs(case_when(. %in% valid_responses_5point ~ .,
                                   TRUE ~ NA_character_))) %>% 
  mutate_at(.vars = vars(ends_with(c("_reliable"))), # ends_with to exclude manipulative_reliable_order
            .funs = funs(case_when(. %in% valid_responses_5point ~ .,
                                   TRUE ~ NA_character_))) %>% 
  mutate_at(.vars = vars(contains(c("_share"))),
            .funs = funs(case_when(. %in% valid_responses_share ~ .,
                                   TRUE ~ NA_character_)))

df_particpants_posts <- df_chat %>% 
  pivot_longer(cols = contains("post_id"),
               names_pattern = "post_id_(.*)",
               names_to = "post_id",
               values_to = "post") %>% 
  select(anon_id, post_id, post)

df_outcomes <- df_chat %>% 
  pivot_longer(cols = c(contains(c("_manipulative", "_share")), ends_with("_reliable")),
               names_to = c("post_id", "variable"),
               names_pattern = "post_(.*)_(.*)") %>% 
  pivot_wider(names_from = variable,
              values_from = value)

df_chat_long <- left_join(df_outcomes %>% select(-contains("post_id_")), df_particpants_posts, by = c("anon_id", "post_id"))


# recode outcomes to numeric
df_chat_long <- df_chat_long %>% 
  mutate_at(.vars = vars(manipulative, reliable),
            .funs = funs(case_when(. == "1 - Not at all" | . == "1-Not at all" ~ 1,
                                   . == "5 - Very" | . == "5- Very" | . == "5-Very" | . ==  "Very5" | . == "5 very" ~ 5,
                                   TRUE ~ as.numeric(.)))) %>% 
  mutate(share = case_when(share == "Yes" | share == "yes" | share == "YES" ~ 1, 
                           share == "No" | share == "no" ~ 0))

# Table 2
#Table with columns ("# of Obs", "Misinfo Sharing", "Non-misinfo Sharing", "Discernment", "Misinfo Reliability", "Non-misinfo Reliability", "Discernment", "Misinfo Manip", "Non-misinfo Manip", "Discernment") that report the mean of survey outcome for completers above and the standard error in parentheses, and rows ("All", "Placebo", "Treatments", "Text Course", "Videos", "Game")

# restrict to completers
df_completed <- df_chat %>% 
  filter(completed_binary == 1 & consent == "I consent, start now")


df_completed_long <- df_completed %>% 
  pivot_longer(cols = c(contains(c("_manipulative", "_share")), ends_with("_reliable")),
               names_to = c("post_id", "variable"),
               names_pattern = "post_(.*)_(.*)") %>%
 pivot_wider(names_from = variable,
              values_from = value) 

df_particpants_posts <- df_completed %>% 
  pivot_longer(cols = contains("post_id"),
               names_pattern = "post_id_(.*)",
               names_to = "post_id",
               values_to = "post") %>% 
  select(anon_id, post_id, post)

df_completed_long <- left_join(df_completed_long %>% select(-contains("post_id_")), df_particpants_posts, by = c("anon_id", "post_id"))


# recode outcomes to numeric
df_completed_long <- df_completed_long %>% 
  mutate_at(.vars = vars(manipulative, reliable),
            .funs = funs(case_when(. == "1 - Not at all" | . == "1-Not at all" ~ 1,
                                   . == "5 - Very" | . == "5- Very" | . == "5-Very" | . ==  "Very5" | . == "5 very" ~ 5,
                                   TRUE ~ as.numeric(.)))) %>% 
  mutate(share = case_when(share == "Yes" | share == "yes" | share == "YES" ~ 1, 
                           share == "No" | share == "no" ~ 0))
# shares can be codified to include other responses e.g. "Yeah" or "No I will not share"

# Create all and substitute placebo 
df_completed_long$condition[df_completed_long$condition == "Baseline 1"] <- "placebo"
df_completed_long$treatments <- ifelse(df_completed_long$condition != "placebo",1,0)


# Post contains misinformation
df_completed_long$post_contains_misinfo <- ifelse(!grepl("^attention_check|^non_misinfo", df_completed_long$post),1,0)
df_completed_long$post_contains_misinfo[grepl("^attention_check",df_completed_long$post)==TRUE] <- NA
df_completed_long$post_contains_misinfo[is.na(df_completed_long$post)] <- NA

# Post contains non-misinformation
df_completed_long$post_contains_nonmisinfo <- ifelse(grepl("non_misinfo", df_completed_long$post),1,0)
df_completed_long$post_contains_nonmisinfo[grepl("^attention_check",df_completed_long$post)==TRUE] <- NA
df_completed_long$post_contains_nonmisinfo[is.na(df_completed_long$post)] <- NA


# Equal weighting throughout
# number of misinformation posts user encountered
completed_coded <- df_completed_long %>%
  group_by(anon_id,post_contains_misinfo,post_contains_nonmisinfo) %>%
  summarise(posts_misinfo = ifelse(post_contains_misinfo==1,n(),NA),
            posts_nonmisinfo = ifelse(post_contains_nonmisinfo==1,n(),NA)) %>%
  ungroup() %>%
  select(anon_id,posts_misinfo,posts_nonmisinfo) %>%
  distinct() %>%
  filter(!is.na(posts_misinfo) | !is.na(posts_nonmisinfo))

# Discernment: sharing
# every user sees 3 misinfo posts and 2 non-misinfo
df_completed_long$disc_sharing = 
  ifelse(df_completed_long$post_contains_nonmisinfo==1 & df_completed_long$share==1,1*(3/2),
         ifelse(df_completed_long$post_contains_misinfo==1 & df_completed_long$share==1,-1,0))

# Discernment: reliability
df_completed_long$disc_reliability = 
  ifelse(df_completed_long$post_contains_nonmisinfo==1,df_completed_long$reliable*(3/2),
         ifelse(df_completed_long$post_contains_misinfo==1,-1*df_completed_long$reliable,0))

# Discernment: manipulation
df_completed_long$disc_manipulation = 
  ifelse(df_completed_long$post_contains_nonmisinfo==1,-1*df_completed_long$manipulative,
         ifelse(df_completed_long$post_contains_misinfo==1,df_completed_long$manipulative*(3/2),0))


# "Misinfo Sharing"
misinfo_sharing_ind <- df_completed_long %>%
  filter(post_contains_misinfo == 1) %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(share_rate = mean(share, na.rm = TRUE), nobs = n(), 
            se = se_binary(share, na.rm = TRUE))

table2_misinfo_sharing <- misinfo_sharing_ind %>%
  group_by(condition) %>%
  summarise(mean = mean(share_rate, na.rm = TRUE), nobs = n(), 
            se = se_cont(share_rate, na.rm = TRUE)) %>%
  filter(!is.na(condition))

aux_all <- misinfo_sharing_ind %>%
  summarise(condition = "All",
            mean = mean(share_rate, na.rm = TRUE),nobs = n(),
            se = se_cont(share_rate, na.rm = TRUE))

aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(misinfo_sharing_ind$share_rate, na.rm = TRUE),
                      "nobs" = nrow(misinfo_sharing_ind[misinfo_sharing_ind$condition %in% c("placebo", "Game", "Video", "SMS"), ]),
                      "se" = se_cont(misinfo_sharing_ind$share_rate, na.rm = TRUE))

aux_treatments <- misinfo_sharing_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(share_rate, na.rm = TRUE),
            nobs = n(),
            se = se_cont(share_rate, na.rm = TRUE))
table2_misinfo_sharing <- bind_rows(aux_all,aux_treatments,table2_misinfo_sharing)

# "Non-misinfo Sharing"
nonmisinfo_sharing_ind <- df_completed_long %>%
  filter(post_contains_nonmisinfo == 1) %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(share_rate = mean(share, na.rm = TRUE), nobs = n(), 
            se = se_binary(share, na.rm = TRUE))

table2_nonmisinfo_sharing <- nonmisinfo_sharing_ind %>%
  group_by(condition) %>%
  summarise(mean = mean(share_rate, na.rm = TRUE), nobs = n(), 
            se = se_cont(share_rate, na.rm = TRUE)) %>%
  filter(!is.na(condition))

aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(nonmisinfo_sharing_ind$share_rate, na.rm = TRUE),
                      "nobs" = nrow(nonmisinfo_sharing_ind[nonmisinfo_sharing_ind$condition %in%c("placebo", "Game", "Video", "SMS"), ]),
                      "se" = se_cont(nonmisinfo_sharing_ind$share_rate, na.rm = TRUE))

aux_treatments <- nonmisinfo_sharing_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(share_rate, na.rm = TRUE),nobs = n(),
            se = se_cont(share_rate, na.rm = TRUE))

table2_nonmisinfo_sharing <- bind_rows(aux_all,aux_treatments,table2_nonmisinfo_sharing)

# "Discernment"

# Discernment defined as Prop. Non-Misinfo Shared - Prop. Misinfo shared (i.e., equal weighting).
sharing_ind <- df_completed_long %>%
  select(anon_id, condition, treatments, post_contains_misinfo, post_contains_nonmisinfo, share, post) %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(
    posts_misinfo_share = sum(share == 1 & post_contains_misinfo == 1, na.rm = TRUE)/3,
    posts_nonmisinfo_share = sum(share == 1 & post_contains_nonmisinfo == 1, na.rm = TRUE)/2) %>%
  ungroup()

sharing_ind$disc_diff_prop = sharing_ind$posts_nonmisinfo_share - sharing_ind$posts_misinfo_share

sharing_disc_ind <- sharing_ind %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(disc_rate = mean(disc_diff_prop, na.rm = TRUE), nobs = n(), 
            se = se_cont(disc_diff_prop, na.rm = TRUE))

table2_disc_sharing <- aggregate(disc_rate~condition,sharing_disc_ind, mean)
names(table2_disc_sharing) <- c("condition","mean")
table2_disc_sharing$nobs <- c(seq(nrow(sharing_disc_ind),length(sharing_disc_ind$condition)))
table2_disc_sharing$se <- aggregate(disc_rate~condition,sharing_disc_ind, se_cont)[,2]

# all 
aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(sharing_disc_ind$disc_rate, na.rm = TRUE),
                      "nobs" = nrow(sharing_disc_ind[sharing_disc_ind$condition %in% c("placebo", "Game", "Video", "SMS"), ]),
                      "se" = se_cont(sharing_disc_ind$disc_rate, na.rm = TRUE))

# treatments 
aux_treatments <- sharing_disc_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(disc_rate, na.rm = TRUE), 
            nobs = n(),
            se = se_cont(disc_rate, na.rm = TRUE))

table2_disc_sharing <- bind_rows(aux_all,aux_treatments,table2_disc_sharing)


# "Misinfo Reliability"
misinfo_reliability_ind <- df_completed_long %>%
  filter(post_contains_misinfo == 1) %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(reliable_rate = mean(reliable, na.rm = TRUE), nobs = n(), 
            se = se_cont(reliable, na.rm = TRUE))

table2_misinfo_reliability <- misinfo_reliability_ind %>%
  group_by(condition) %>%
  summarise(mean = mean(reliable_rate, na.rm = TRUE), nobs = n(), 
            se = se_cont(reliable_rate, na.rm = TRUE)) %>%
  filter(!is.na(condition))

aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(misinfo_reliability_ind$reliable_rate, na.rm = TRUE),
                      "nobs" = nrow(misinfo_reliability_ind[misinfo_reliability_ind$condition %in% c("placebo", "Game", "Video", "SMS"), ]),
                      "se" = se_cont(misinfo_reliability_ind$reliable_rate, na.rm = TRUE))

aux_treatments <- misinfo_reliability_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(reliable_rate, na.rm = TRUE),
            nobs = n(),
            se = se_cont(reliable_rate, na.rm = TRUE))
table2_misinfo_reliability <- bind_rows(aux_all,aux_treatments,table2_misinfo_reliability)

# "NonMisinfo Reliability"

nonmisinfo_reliability_ind <- df_completed_long %>%
  filter(post_contains_nonmisinfo == 1) %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(reliable_rate = mean(reliable, na.rm = TRUE), nobs = n(), 
            se = se_cont(reliable, na.rm = TRUE))

table2_nonmisinfo_reliability <- nonmisinfo_reliability_ind %>%
  group_by(condition) %>%
  summarise(mean = mean(reliable_rate, na.rm = TRUE), nobs = n(), 
            se = se_cont(reliable_rate, na.rm = TRUE)) %>%
  filter(!is.na(condition))

aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(nonmisinfo_reliability_ind$reliable_rate, na.rm = TRUE),
                      "nobs" = nrow(nonmisinfo_reliability_ind[nonmisinfo_reliability_ind$condition %in% c("placebo", "Game", "Video", "SMS"), ]),
                      "se" = se_cont(nonmisinfo_reliability_ind$reliable_rate, na.rm = TRUE))

aux_treatments <- nonmisinfo_reliability_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(reliable_rate, na.rm = TRUE),nobs = n(),
            se = se_cont(reliable_rate, na.rm = TRUE))
table2_nonmisinfo_reliability <- bind_rows(aux_all,aux_treatments,table2_nonmisinfo_reliability)

# "Discernment"
reliability_disc_ind <- df_completed_long %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(disc_rate = mean(disc_reliability, na.rm = TRUE), nobs = n(), 
            se = se_cont(disc_reliability, na.rm = TRUE))

table2_disc_reliability <- reliability_disc_ind %>%
  group_by(condition) %>%
  summarise(mean = mean(disc_rate, na.rm = TRUE), nobs = n(), 
            se = se_cont(disc_rate, na.rm = TRUE)) %>%
  filter(!is.na(condition))

aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(reliability_disc_ind$disc_rate, na.rm = TRUE),
                      "nobs" = nrow(reliability_disc_ind[reliability_disc_ind$condition %in% c("placebo", "Game", "Video", "SMS"), ]),
                      "se" = se_cont(reliability_disc_ind$disc_rate, na.rm = TRUE))

aux_treatments <- reliability_disc_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(disc_rate, na.rm = TRUE),nobs = n(),
            se = se_cont(disc_rate, na.rm = TRUE))

table2_disc_reliability <- bind_rows(aux_all,aux_treatments,table2_disc_reliability)

# "Misinfo Manipulation"
misinfo_manipulation_ind <- df_completed_long %>%
  filter(post_contains_misinfo == 1) %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(manipulation_rate = mean(manipulative, na.rm = TRUE), nobs = n(), 
            se = se_cont(manipulative, na.rm = TRUE))

table2_misinfo_manipulation <- misinfo_manipulation_ind %>%
  group_by(condition) %>%
  summarise(mean = mean(manipulation_rate, na.rm = TRUE), nobs = n(), 
            se = se_cont(manipulation_rate, na.rm = TRUE)) %>%
  filter(!is.na(condition))

aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(misinfo_manipulation_ind$manipulation_rate, na.rm = TRUE),
                      "nobs" = nrow(misinfo_manipulation_ind[misinfo_manipulation_ind$condition %in% c("placebo", "Game", "Video", "SMS"), ]),
                      "se" = se_cont(misinfo_manipulation_ind$manipulation_rate, na.rm = TRUE))

aux_treatments <- misinfo_manipulation_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(manipulation_rate, na.rm = TRUE),
            nobs = n(),
            se = se_cont(manipulation_rate, na.rm = TRUE))
table2_misinfo_manipulation <- bind_rows(aux_all,aux_treatments,table2_misinfo_manipulation)

# "NonMisinfo Manipulation"

nonmisinfo_manipulation_ind <- df_completed_long %>%
  filter(post_contains_nonmisinfo == 1) %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(manipulation_rate = mean(manipulative, na.rm = TRUE), nobs = n(), 
            se = se_cont(manipulative, na.rm = TRUE))

table2_nonmisinfo_manipulation <- nonmisinfo_manipulation_ind %>%
  group_by(condition) %>%
  summarise(mean = mean(manipulation_rate, na.rm = TRUE), nobs = n(), 
            se = se_cont(manipulation_rate, na.rm = TRUE)) %>%
  filter(!is.na(condition))

aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(nonmisinfo_manipulation_ind$manipulation_rate, na.rm = TRUE),
                      "nobs" = nrow(nonmisinfo_manipulation_ind[nonmisinfo_manipulation_ind$condition %in% c("placebo", "Game", "Video", "SMS"), ]),
                      "se" = se_cont(nonmisinfo_manipulation_ind$manipulation_rate, na.rm = TRUE))

aux_treatments <- nonmisinfo_manipulation_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(manipulation_rate, na.rm = TRUE),
            nobs = n(),
            se = se_cont(manipulation_rate, na.rm = TRUE))
table2_nonmisinfo_manipulation <- bind_rows(aux_all,aux_treatments,table2_nonmisinfo_manipulation)


# "Discernment"
manipulation_disc_ind <- df_completed_long %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(disc_rate = mean(disc_manipulation, na.rm = TRUE), nobs = n(), 
            se = se_cont(disc_manipulation, na.rm = TRUE))

table2_disc_manipulation <- manipulation_disc_ind %>%
  group_by(condition) %>%
  summarise(mean = mean(disc_rate, na.rm = TRUE), nobs = n(), 
            se = se_cont(disc_rate, na.rm = TRUE)) %>%
  filter(!is.na(condition))

aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(manipulation_disc_ind$disc_rate, na.rm = TRUE),
                      "nobs" = nrow(manipulation_disc_ind[manipulation_disc_ind$condition %in% c("placebo", "Game", "Video", "SMS"), ]),
                      "se" = se_cont(manipulation_disc_ind$disc_rate, na.rm = TRUE))

aux_treatments <- manipulation_disc_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(disc_rate, na.rm = TRUE),nobs = n(),
            se = se_cont(disc_rate, na.rm = TRUE))
table2_disc_manipulation <- bind_rows(aux_all,aux_treatments,table2_disc_manipulation)



# Final table
table2 <- data.frame("condition" = table2_misinfo_sharing$condition,
                     "# of Obs" = table2_misinfo_sharing$nobs,
                     "Misinfo Sharing" = table2_misinfo_sharing$mean,
                     "Non-misinfo Sharing" = table2_nonmisinfo_sharing$mean,
                     "Discernment Sharing" = table2_disc_sharing$mean,
                     "Misinfo Reliability" = table2_misinfo_reliability$mean,
                     "Non-misinfo Reliability" = table2_nonmisinfo_reliability$mean,
                     "Discernment Reliability" = table2_disc_reliability$mean,
                     "Misinfo Manipulation" = table2_misinfo_manipulation$mean,
                     "Non-misinfo Manipulation" = table2_nonmisinfo_manipulation$mean,
                     "Discernment Manipulation" = table2_disc_manipulation$mean)
table2_se <- data.frame("condition" = table2_misinfo_sharing$condition,
                      "# of Obs" = table2_misinfo_sharing$nobs,
                     "Misinfo Sharing" = table2_misinfo_sharing$se,
                     "Non-misinfo Sharing" = table2_nonmisinfo_sharing$se,
                     "Discernment Sharing" = table2_disc_sharing$se,
                     "Misinfo Reliability" = table2_misinfo_reliability$se,
                     "Non-misinfo Reliability" = table2_nonmisinfo_reliability$se,
                     "Discernment Reliability" = table2_disc_reliability$se,
                     "Misinfo Manipulation" = table2_misinfo_manipulation$se,
                     "Non-misinfo Manipulation" = table2_nonmisinfo_manipulation$se,
                     "Discernment Manipulation" = table2_disc_manipulation$se)


# add parentheses to s.e.
#order_vector <- c("20,689", " 6,078", "14,611", " 5,083", " 4,904"," 4,624") 
#table2_se <- table2_se[order(match(table2_se$X..of.Obs, order_vector)), ]
#table2_se$order <- seq(nrow(table2_se))
table2_se <- table2_se %>% mutate(order = ifelse(grepl("All",condition), 1,
                  ifelse(grepl("placebo",condition), 2,
                  ifelse(grepl("Treatments",condition), 3,
                  ifelse(grepl("SMS",condition), 4,
                  ifelse(grepl("Video",condition), 5,
                  ifelse(grepl("Game",condition), 6,NA)))))))
table2_se$priority <- 2
table2_se$X..of.Obs <- NA
table2_se[,1:11] <- data.frame(lapply(table2_se[,1:11], function(col) sapply(col, add_parentheses)))

# format table 2
table2 <- data.frame(lapply(table2, function(x) format(x, big.mark = ",", digits = 3)))
#table2 <- table2[order(match(table2$X..of.Obs, order_vector)), ]
#table2$order <- seq(nrow(table2))
table2 <- table2 %>% mutate(order = ifelse(grepl("All",condition), 1,
                  ifelse(grepl("placebo",condition), 2,
                  ifelse(grepl("Treatments",condition), 3,
                  ifelse(grepl("SMS",condition), 4,
                  ifelse(grepl("Video",condition), 5,
                  ifelse(grepl("Game",condition), 6,NA)))))))
table2$priority <- 1


# bind rows
names(table2) <- c("condition","# of Obs", "Misinfo Sharing", "Non-misinfo Sharing","Discernment Sharing", "Misinfo Reliability", "Non-misinfo Reliability","Discernment Reliability","Misinfo Manipulation", "Non-misinfo Manipulation","Discernment Manipulation","order","priority")
names(table2_se) <- names(table2)
table <- rbind(table2,table2_se)
table <- table[order(table$order, table$priority), ]
table <- table %>% select(-c(order,priority))
table <- table %>% select(-c(condition))
# remove NA
table <- data.frame(lapply(table, function(x) {
  x <- as.character(x)
  x[is.na(x)] <- ""
  gsub("NA", "", x)
}))

names(table) <- names(table2)[2:11]
rownames(table) <- c("All","\u200B","Placebo","\u200B\u200B","Treatments","\u200B\u200B\u200B","Text Course","\u200B\u200B\u200B\u200B","Videos","\u200B\u200B\u200B\u200B\u200B","Game","\u200B\u200B\u200B\u200B\u200B\u200B")

kable(table,format = "markdown", digits = 3) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%", height = "500px")

	# of Obs	Misinfo Sharing	Non-misinfo Sharing	Discernment Sharing	Misinfo Reliability	Non-misinfo Reliability	Discernment Reliability	Misinfo Manipulation	Non-misinfo Manipulation	Discernment Manipulation
All	20,412	0.604	0.750	0.1378	2.64	3.05	0.244	2.86	2.48	1.56
		(0.0027)	(0.0025)	(0.0028)	(0.0092)	(0.0100)	(0.0082)	(0.0095)	(0.0098)	(0.0094)
Placebo	5,997	0.640	0.764	0.1300	2.81	3.17	0.218	2.86	2.55	1.54
		(0.0049)	(0.0045)	(0.0054)	(0.0174)	(0.0185)	(0.0154)	(0.0174)	(0.0185)	(0.0176)
Treatments	14,415	0.589	0.744	0.1464	2.58	3.00	0.255	2.86	2.45	1.57
		(0.0033)	(0.0030)	(0.0033)	(0.0108)	(0.0118)	(0.0098)	(0.0113)	(0.0115)	(0.0111)
Text Course	5,009	0.381	0.640	0.1172	2.21	2.87	0.397	3.06	2.48	1.74
		(0.0052)	(0.0055)	(0.0050)	(0.0165)	(0.0195)	(0.0160)	(0.0192)	(0.0194)	(0.0192)
Videos	4,855	0.671	0.806	0.2443	2.76	3.16	0.244	2.80	2.43	1.54
		(0.0053)	(0.0047)	(0.0059)	(0.0189)	(0.0204)	(0.0161)	(0.0191)	(0.0199)	(0.0179)
Game	4,551	0.736	0.794	0.0559	2.79	2.98	0.107	2.68	2.44	1.41
		(0.0050)	(0.0050)	(0.0055)	(0.0198)	(0.0213)	(0.0185)	(0.0196)	(0.0206)	(0.0202)

# make it tex code
table$col_1 <- c("All","\u200B","Placebo","\u200B\u200B","Treatments","\u200B\u200B\u200B","Text Course","\u200B\u200B\u200B\u200B","Videos","\u200B\u200B\u200B\u200B\u200B","Game","\u200B\u200B\u200B\u200B\u200B\u200B")
table <- table[, c("col_1", setdiff(names(table), "col_1"))]
latex_code <- paste0(apply(table, 1, function(row) paste(row, collapse = " & ")), " \\\\\n")
latex_code <- c(paste("","\\# of Obs", "Misinfo Sharing", "Non-misinfo Sharing","Discernment Sharing", "Misinfo Reliability", "Non-misinfo Reliability","Discernment Reliability","Misinfo Manipulation", "Non-misinfo Manipulation","Discernment Manipulation \\\\ \\hline ", sep = " & "),latex_code)
latex_code <- gsub("\\[\\d+\\]", "", latex_code)
# save .tex
write.table(latex_code, file = "table2.tex", sep = " & ", quote = FALSE, row.names = FALSE, col.names = FALSE)

Phase 2

df_chat <- df %>%
  select(
    -matches("manipulative_coded"),
    -matches("reliable_coded"),
    -matches("manipulative_coded_num"),
    -matches("reliable_coded_num"),
    -matches("share_coded")
  ) %>%
   rename(condition = arm_coded) %>%
  filter(phase_coded=="Phase 2") 


df_chat <- df_chat %>% 
  mutate(treatments = ifelse(condition != "Short Baseline 2",1,0),
         completed = ifelse(!is.na(baseline_learn),baseline_learn,
                                   ifelse(!is.na(sms_learn),sms_learn,
                                          NA))) %>%
  mutate(completed_coded = ifelse(!is.na(completed),1,0)) %>%
  mutate(time_completion_survey = misinfoQuiz_end_time - `signed up`) %>%
  mutate(completed_survey = ifelse(!is.na(time_completion_survey),1,0)) %>%
  mutate(completed_coded =  ifelse(!is.na(baseline_learn),baseline_learn,
                                          ifelse(!is.na(sms_learn),sms_learn,
                                                 NA))) %>% 
  mutate(completed_binary = ifelse(!is.na(completed_coded),1,0))

# 12 questions asking about manipulative or reliable content 
unique_responses_manip_reliable <- df_chat %>% 
  select(contains("_manipulative"), ends_with("reliable")) %>% 
  unlist() %>% 
  unique()

unique_responses_share <- df %>% 
  select(contains("_share")) %>% 
  unlist() %>% 
  unique()

valid_responses_5point <- c("1 - Not at all", "2", "3", "4", "5 - Very", 1, 2, 3, 4, 5, "5- Very", "5-Very", "Very5", "5 very", "1-Not at all")
valid_responses_share <- c("Yes", "No", "yes", "no", "YES")

df_chat <- df_chat %>% 
  mutate_at(.vars = vars(contains(c("_manipulative"))),
            .funs = funs(case_when(. %in% valid_responses_5point ~ .,
                                   TRUE ~ NA_character_))) %>% 
  mutate_at(.vars = vars(ends_with(c("_reliable"))), # ends_with to exclude manipulative_reliable_order
            .funs = funs(case_when(. %in% valid_responses_5point ~ .,
                                   TRUE ~ NA_character_))) %>% 
  mutate_at(.vars = vars(contains(c("_share"))),
            .funs = funs(case_when(. %in% valid_responses_share ~ .,
                                   TRUE ~ NA_character_)))

df_particpants_posts <- df_chat %>% 
  pivot_longer(cols = contains("post_id"),
               names_pattern = "post_id_(.*)",
               names_to = "post_id",
               values_to = "post") %>% 
  select(anon_id, post_id, post)

df_outcomes <- df_chat %>% 
  pivot_longer(cols = c(contains(c("_manipulative", "_share")), ends_with("_reliable")),
               names_to = c("post_id", "variable"),
               names_pattern = "post_(.*)_(.*)") %>% 
  pivot_wider(names_from = variable,
              values_from = value)

df_chat_long <- left_join(df_outcomes %>% select(-contains("post_id_")), df_particpants_posts, by = c("anon_id", "post_id"))


# recode outcomes to numeric
df_chat_long <- df_chat_long %>% 
  mutate_at(.vars = vars(manipulative, reliable),
            .funs = funs(case_when(. == "1 - Not at all" | . == "1-Not at all" ~ 1,
                                   . == "5 - Very" | . == "5- Very" | . == "5-Very" | . ==  "Very5" | . == "5 very" ~ 5,
                                   TRUE ~ as.numeric(.)))) %>% 
  mutate(share = case_when(share == "Yes" | share == "yes" | share == "YES" ~ 1, 
                           share == "No" | share == "no" ~ 0))

# Table 2
#Table with columns ("# of Obs", "Misinfo Sharing", "Non-misinfo Sharing", "Discernment", "Misinfo Reliability", "Non-misinfo Reliability", "Discernment", "Misinfo Manip", "Non-misinfo Manip", "Discernment") that report the mean of survey outcome for completers above and the standard error in parentheses, and rows ("All", "Placebo", "Treatments", "Text Course", "Videos", "Game")

# restrict to completers
df_completed <- df_chat %>% 
  filter(completed_binary == 1 & consent == "I consent, start now")


df_completed_long <- df_completed %>% 
  pivot_longer(cols = c(contains(c("_manipulative", "_share")), ends_with("_reliable")),
               names_to = c("post_id", "variable"),
               names_pattern = "post_(.*)_(.*)") %>%
 pivot_wider(names_from = variable,
              values_from = value) 

df_particpants_posts <- df_completed %>% 
  pivot_longer(cols = contains("post_id"),
               names_pattern = "post_id_(.*)",
               names_to = "post_id",
               values_to = "post") %>% 
  select(anon_id, post_id, post)

df_completed_long <- left_join(df_completed_long %>% select(-contains("post_id_")), df_particpants_posts, by = c("anon_id", "post_id"))


# recode outcomes to numeric
df_completed_long <- df_completed_long %>% 
  mutate_at(.vars = vars(manipulative, reliable),
            .funs = funs(case_when(. == "1 - Not at all" | . == "1-Not at all" ~ 1,
                                   . == "5 - Very" | . == "5- Very" | . == "5-Very" | . ==  "Very5" | . == "5 very" ~ 5,
                                   TRUE ~ as.numeric(.)))) %>% 
  mutate(share = case_when(share == "Yes" | share == "yes" | share == "YES" ~ 1, 
                           share == "No" | share == "no" ~ 0))
# shares can be codified to include other responses e.g. "Yeah" or "No I will not share"

# Create all and substitute placebo 
df_completed_long$condition[df_completed_long$condition == "Short Baseline 2"] <- "placebo"
df_completed_long$treatments <- ifelse(df_completed_long$condition != "placebo",1,0)


# Post contains misinformation
df_completed_long$post_contains_misinfo <- ifelse(!grepl("^attention_check|^non_misinfo", df_completed_long$post),1,0)
df_completed_long$post_contains_misinfo[grepl("^attention_check",df_completed_long$post)==TRUE] <- NA
df_completed_long$post_contains_misinfo[is.na(df_completed_long$post)] <- NA

# Post contains non-misinformation
df_completed_long$post_contains_nonmisinfo <- ifelse(grepl("non_misinfo", df_completed_long$post),1,0)
df_completed_long$post_contains_nonmisinfo[grepl("^attention_check",df_completed_long$post)==TRUE] <- NA
df_completed_long$post_contains_nonmisinfo[is.na(df_completed_long$post)] <- NA


# Equal weighting throughout
# number of misinformation posts user encountered
completed_coded <- df_completed_long %>%
  group_by(anon_id,post_contains_misinfo,post_contains_nonmisinfo) %>%
  summarise(posts_misinfo = ifelse(post_contains_misinfo==1,n(),NA),
            posts_nonmisinfo = ifelse(post_contains_nonmisinfo==1,n(),NA)) %>%
  ungroup() %>%
  select(anon_id,posts_misinfo,posts_nonmisinfo) %>%
  distinct() %>%
  filter(!is.na(posts_misinfo) | !is.na(posts_nonmisinfo))

# Discernment: sharing
# every user sees 3 misinfo posts and 2 non-misinfo
df_completed_long$disc_sharing = 
  ifelse(df_completed_long$post_contains_nonmisinfo==1 & df_completed_long$share==1,1*(3/2),
         ifelse(df_completed_long$post_contains_misinfo==1 & df_completed_long$share==1,-1,0))

# Discernment: reliability
df_completed_long$disc_reliability = 
  ifelse(df_completed_long$post_contains_nonmisinfo==1,df_completed_long$reliable*(3/2),
         ifelse(df_completed_long$post_contains_misinfo==1,-1*df_completed_long$reliable,0))

# Discernment: manipulation
df_completed_long$disc_manipulation = 
  ifelse(df_completed_long$post_contains_nonmisinfo==1,-1*df_completed_long$manipulative,
         ifelse(df_completed_long$post_contains_misinfo==1,df_completed_long$manipulative*(3/2),0))


# "Misinfo Sharing"
misinfo_sharing_ind <- df_completed_long %>%
  filter(post_contains_misinfo == 1) %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(share_rate = mean(share, na.rm = TRUE), nobs = n(), 
            se = se_binary(share, na.rm = TRUE))

table2_misinfo_sharing <- misinfo_sharing_ind %>%
  group_by(condition) %>%
  summarise(mean = mean(share_rate, na.rm = TRUE), nobs = n(), 
            se = se_cont(share_rate, na.rm = TRUE)) %>%
  filter(!is.na(condition))

aux_all <- misinfo_sharing_ind %>%
  summarise(condition = "All",
            mean = mean(share_rate, na.rm = TRUE),nobs = n(),
            se = se_cont(share_rate, na.rm = TRUE))

aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(misinfo_sharing_ind$share_rate, na.rm = TRUE),
                      "nobs" = nrow(misinfo_sharing_ind[misinfo_sharing_ind$condition %in% c("placebo", "Long Baseline", "SMS"), ]),
                      "se" = se_cont(misinfo_sharing_ind$share_rate, na.rm = TRUE))

aux_treatments <- misinfo_sharing_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(share_rate, na.rm = TRUE),
            nobs = n(),
            se = se_cont(share_rate, na.rm = TRUE))
table2_misinfo_sharing <- bind_rows(aux_all,aux_treatments,table2_misinfo_sharing)

# "Non-misinfo Sharing"
nonmisinfo_sharing_ind <- df_completed_long %>%
  filter(post_contains_nonmisinfo == 1) %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(share_rate = mean(share, na.rm = TRUE), nobs = n(), 
            se = se_binary(share, na.rm = TRUE))

table2_nonmisinfo_sharing <- nonmisinfo_sharing_ind %>%
  group_by(condition) %>%
  summarise(mean = mean(share_rate, na.rm = TRUE), nobs = n(), 
            se = se_cont(share_rate, na.rm = TRUE)) %>%
  filter(!is.na(condition))

aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(nonmisinfo_sharing_ind$share_rate, na.rm = TRUE),
                      "nobs" = nrow(nonmisinfo_sharing_ind[nonmisinfo_sharing_ind$condition %in% c("placebo", "Long Baseline", "SMS"), ]),
                      "se" = se_cont(nonmisinfo_sharing_ind$share_rate, na.rm = TRUE))

aux_treatments <- nonmisinfo_sharing_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(share_rate, na.rm = TRUE),nobs = n(),
            se = se_cont(share_rate, na.rm = TRUE))

table2_nonmisinfo_sharing <- bind_rows(aux_all,aux_treatments,table2_nonmisinfo_sharing)

# "Discernment"

# Discernment defined as Prop. Non-Misinfo Shared - Prop. Misinfo shared (i.e., equal weighting).
sharing_ind <- df_completed_long %>%
  select(anon_id, condition, treatments, post_contains_misinfo, post_contains_nonmisinfo, share, post) %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(
    posts_misinfo_share = sum(share == 1 & post_contains_misinfo == 1, na.rm = TRUE)/3,
    posts_nonmisinfo_share = sum(share == 1 & post_contains_nonmisinfo == 1, na.rm = TRUE)/2) %>%
  ungroup()

sharing_ind$disc_diff_prop = sharing_ind$posts_nonmisinfo_share - sharing_ind$posts_misinfo_share

sharing_disc_ind <- sharing_ind %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(disc_rate = mean(disc_diff_prop, na.rm = TRUE), nobs = n(), 
            se = se_cont(disc_diff_prop, na.rm = TRUE))

table2_disc_sharing <- aggregate(disc_rate~condition,sharing_disc_ind, mean)
names(table2_disc_sharing) <- c("condition","mean")
table2_disc_sharing$nobs <- c(seq(nrow(sharing_disc_ind),length(sharing_disc_ind$condition)))
table2_disc_sharing$se <- aggregate(disc_rate~condition,sharing_disc_ind, se_cont)[,2]

# all 
aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(sharing_disc_ind$disc_rate, na.rm = TRUE),
                      "nobs" = nrow(sharing_disc_ind[sharing_disc_ind$condition %in% c("placebo", "Long Baseline", "SMS"), ]),
                      "se" = se_cont(sharing_disc_ind$disc_rate, na.rm = TRUE))

# treatments 
aux_treatments <- sharing_disc_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(disc_rate, na.rm = TRUE), 
            nobs = n(),
            se = se_cont(disc_rate, na.rm = TRUE))

table2_disc_sharing <- bind_rows(aux_all,aux_treatments,table2_disc_sharing)


# "Misinfo Reliability"
misinfo_reliability_ind <- df_completed_long %>%
  filter(post_contains_misinfo == 1) %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(reliable_rate = mean(reliable, na.rm = TRUE), nobs = n(), 
            se = se_cont(reliable, na.rm = TRUE))

table2_misinfo_reliability <- misinfo_reliability_ind %>%
  group_by(condition) %>%
  summarise(mean = mean(reliable_rate, na.rm = TRUE), nobs = n(), 
            se = se_cont(reliable_rate, na.rm = TRUE)) %>%
  filter(!is.na(condition))

aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(misinfo_reliability_ind$reliable_rate, na.rm = TRUE),
                      "nobs" = nrow(misinfo_reliability_ind[misinfo_reliability_ind$condition %in% c("placebo", "Long Baseline", "SMS"), ]),
                      "se" = se_cont(misinfo_reliability_ind$reliable_rate, na.rm = TRUE))

aux_treatments <- misinfo_reliability_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(reliable_rate, na.rm = TRUE),
            nobs = n(),
            se = se_cont(reliable_rate, na.rm = TRUE))
table2_misinfo_reliability <- bind_rows(aux_all,aux_treatments,table2_misinfo_reliability)

# "NonMisinfo Reliability"

nonmisinfo_reliability_ind <- df_completed_long %>%
  filter(post_contains_nonmisinfo == 1) %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(reliable_rate = mean(reliable, na.rm = TRUE), nobs = n(), 
            se = se_cont(reliable, na.rm = TRUE))

table2_nonmisinfo_reliability <- nonmisinfo_reliability_ind %>%
  group_by(condition) %>%
  summarise(mean = mean(reliable_rate, na.rm = TRUE), nobs = n(), 
            se = se_cont(reliable_rate, na.rm = TRUE)) %>%
  filter(!is.na(condition))

aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(nonmisinfo_reliability_ind$reliable_rate, na.rm = TRUE),
                      "nobs" = nrow(nonmisinfo_reliability_ind[nonmisinfo_reliability_ind$condition %in% c("placebo", "Long Baseline", "SMS"), ]),
                      "se" = se_cont(nonmisinfo_reliability_ind$reliable_rate, na.rm = TRUE))

aux_treatments <- nonmisinfo_reliability_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(reliable_rate, na.rm = TRUE),nobs = n(),
            se = se_cont(reliable_rate, na.rm = TRUE))
table2_nonmisinfo_reliability <- bind_rows(aux_all,aux_treatments,table2_nonmisinfo_reliability)

# "Discernment"
reliability_disc_ind <- df_completed_long %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(disc_rate = mean(disc_reliability, na.rm = TRUE), nobs = n(), 
            se = se_cont(disc_reliability, na.rm = TRUE))

table2_disc_reliability <- reliability_disc_ind %>%
  group_by(condition) %>%
  summarise(mean = mean(disc_rate, na.rm = TRUE), nobs = n(), 
            se = se_cont(disc_rate, na.rm = TRUE)) %>%
  filter(!is.na(condition))

aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(reliability_disc_ind$disc_rate, na.rm = TRUE),
                      "nobs" = nrow(reliability_disc_ind[reliability_disc_ind$condition %in% c("placebo", "Game", "Video", "SMS"), ]),
                      "se" = se_cont(reliability_disc_ind$disc_rate, na.rm = TRUE))

aux_treatments <- reliability_disc_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(disc_rate, na.rm = TRUE),nobs = n(),
            se = se_cont(disc_rate, na.rm = TRUE))

table2_disc_reliability <- bind_rows(aux_all,aux_treatments,table2_disc_reliability)

# "Misinfo Manipulation"
misinfo_manipulation_ind <- df_completed_long %>%
  filter(post_contains_misinfo == 1) %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(manipulation_rate = mean(manipulative, na.rm = TRUE), nobs = n(), 
            se = se_cont(manipulative, na.rm = TRUE))

table2_misinfo_manipulation <- misinfo_manipulation_ind %>%
  group_by(condition) %>%
  summarise(mean = mean(manipulation_rate, na.rm = TRUE), nobs = n(), 
            se = se_cont(manipulation_rate, na.rm = TRUE)) %>%
  filter(!is.na(condition))

aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(misinfo_manipulation_ind$manipulation_rate, na.rm = TRUE),
                      "nobs" = nrow(misinfo_manipulation_ind[misinfo_manipulation_ind$condition %in% c("placebo", "Long Baseline", "SMS"), ]),
                      "se" = se_cont(misinfo_manipulation_ind$manipulation_rate, na.rm = TRUE))

aux_treatments <- misinfo_manipulation_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(manipulation_rate, na.rm = TRUE),
            nobs = n(),
            se = se_cont(manipulation_rate, na.rm = TRUE))
table2_misinfo_manipulation <- bind_rows(aux_all,aux_treatments,table2_misinfo_manipulation)

# "NonMisinfo Manipulation"

nonmisinfo_manipulation_ind <- df_completed_long %>%
  filter(post_contains_nonmisinfo == 1) %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(manipulation_rate = mean(manipulative, na.rm = TRUE), nobs = n(), 
            se = se_cont(manipulative, na.rm = TRUE))

table2_nonmisinfo_manipulation <- nonmisinfo_manipulation_ind %>%
  group_by(condition) %>%
  summarise(mean = mean(manipulation_rate, na.rm = TRUE), nobs = n(), 
            se = se_cont(manipulation_rate, na.rm = TRUE)) %>%
  filter(!is.na(condition))

aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(nonmisinfo_manipulation_ind$manipulation_rate, na.rm = TRUE),
                      "nobs" = nrow(nonmisinfo_manipulation_ind[nonmisinfo_manipulation_ind$condition %in% c("placebo", "Long Baseline", "SMS"), ]),
                      "se" = se_cont(nonmisinfo_manipulation_ind$manipulation_rate, na.rm = TRUE))

aux_treatments <- nonmisinfo_manipulation_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(manipulation_rate, na.rm = TRUE),
            nobs = n(),
            se = se_cont(manipulation_rate, na.rm = TRUE))
table2_nonmisinfo_manipulation <- bind_rows(aux_all,aux_treatments,table2_nonmisinfo_manipulation)


# "Discernment"
manipulation_disc_ind <- df_completed_long %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(disc_rate = mean(disc_manipulation, na.rm = TRUE), nobs = n(), 
            se = se_cont(disc_manipulation, na.rm = TRUE))

table2_disc_manipulation <- manipulation_disc_ind %>%
  group_by(condition) %>%
  summarise(mean = mean(disc_rate, na.rm = TRUE), nobs = n(), 
            se = se_cont(disc_rate, na.rm = TRUE)) %>%
  filter(!is.na(condition))

aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(manipulation_disc_ind$disc_rate, na.rm = TRUE),
                      "nobs" = nrow(manipulation_disc_ind[manipulation_disc_ind$condition %in% c("placebo", "Long Baseline", "SMS"), ]),
                      "se" = se_cont(manipulation_disc_ind$disc_rate, na.rm = TRUE))

aux_treatments <- manipulation_disc_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(disc_rate, na.rm = TRUE),nobs = n(),
            se = se_cont(disc_rate, na.rm = TRUE))
table2_disc_manipulation <- bind_rows(aux_all,aux_treatments,table2_disc_manipulation)



# Final table
table2 <- data.frame("condition" = table2_misinfo_sharing$condition,
                     "# of Obs" = table2_misinfo_sharing$nobs,
                     "Misinfo Sharing" = table2_misinfo_sharing$mean,
                     "Non-misinfo Sharing" = table2_nonmisinfo_sharing$mean,
                     "Discernment Sharing" = table2_disc_sharing$mean,
                     "Misinfo Reliability" = table2_misinfo_reliability$mean,
                     "Non-misinfo Reliability" = table2_nonmisinfo_reliability$mean,
                     "Discernment Reliability" = table2_disc_reliability$mean,
                     "Misinfo Manipulation" = table2_misinfo_manipulation$mean,
                     "Non-misinfo Manipulation" = table2_nonmisinfo_manipulation$mean,
                     "Discernment Manipulation" = table2_disc_manipulation$mean)
table2_se <- data.frame("condition" = table2_misinfo_sharing$condition,
                      "# of Obs" = table2_misinfo_sharing$nobs,
                     "Misinfo Sharing" = table2_misinfo_sharing$se,
                     "Non-misinfo Sharing" = table2_nonmisinfo_sharing$se,
                     "Discernment Sharing" = table2_disc_sharing$se,
                     "Misinfo Reliability" = table2_misinfo_reliability$se,
                     "Non-misinfo Reliability" = table2_nonmisinfo_reliability$se,
                     "Discernment Reliability" = table2_disc_reliability$se,
                     "Misinfo Manipulation" = table2_misinfo_manipulation$se,
                     "Non-misinfo Manipulation" = table2_nonmisinfo_manipulation$se,
                     "Discernment Manipulation" = table2_disc_manipulation$se)


# add parentheses to s.e.
#order_vector <- c("20,689", " 6,078", "14,611", " 5,083", " 4,904"," 4,624") 
#table2_se <- table2_se[order(match(table2_se$X..of.Obs, order_vector)), ]
#table2_se$order <- seq(nrow(table2_se))
table2_se <- table2_se %>% mutate(order = ifelse(grepl("All",condition), 1,
                  ifelse(grepl("placebo",condition), 2,
                  ifelse(grepl("Treatments",condition), 3,
                  ifelse(grepl("Long Baseline",condition), 4,
                  ifelse(grepl("SMS",condition), 5,
                  NA))))))
table2_se$priority <- 2
table2_se$X..of.Obs <- NA
table2_se[,1:11] <- data.frame(lapply(table2_se[,1:11], function(col) sapply(col, add_parentheses)))

# format table 2
table2 <- data.frame(lapply(table2, function(x) format(x, big.mark = ",", digits = 3)))
#table2 <- table2[order(match(table2$X..of.Obs, order_vector)), ]
#table2$order <- seq(nrow(table2))
table2 <- table2 %>% mutate(order = ifelse(grepl("All",condition), 1,
                  ifelse(grepl("placebo",condition), 2,
                  ifelse(grepl("Treatments",condition), 3,
                  ifelse(grepl("Long Baseline",condition), 4,
                  ifelse(grepl("SMS",condition), 5,
                  NA))))))
table2$priority <- 1


# bind rows
names(table2) <- c("condition","# of Obs", "Misinfo Sharing", "Non-misinfo Sharing","Discernment Sharing", "Misinfo Reliability", "Non-misinfo Reliability","Discernment Reliability","Misinfo Manipulation", "Non-misinfo Manipulation","Discernment Manipulation","order","priority")
names(table2_se) <- names(table2)
table <- rbind(table2,table2_se)
table <- table[order(table$order, table$priority), ]
table <- table %>% select(-c(order,priority))
table <- table %>% select(-c(condition))
# remove NA
table <- data.frame(lapply(table, function(x) {
  x <- as.character(x)
  x[is.na(x)] <- ""
  gsub("NA", "", x)
}))

names(table) <- names(table2)[2:11]
rownames(table) <- c("All","\u200B","Placebo","\u200B\u200B","Treatments","\u200B\u200B\u200B","Long Baseline","\u200B\u200B\u200B\u200B","Text Course","\u200B\u200B\u200B\u200B\u200B\u200B")

kable(table,format = "markdown", digits = 3) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%", height = "500px")

	# of Obs	Misinfo Sharing	Non-misinfo Sharing	Discernment Sharing	Misinfo Reliability	Non-misinfo Reliability	Discernment Reliability	Misinfo Manipulation	Non-misinfo Manipulation	Discernment Manipulation
All	30,974	0.516	0.707	0.183	2.54	3.06	0.306	3.03	2.55	1.70
		(0.0022)	(0.0021)	(0.0023)	(0.0072)	(0.0078)	(0.0062)	(0.0075)	(0.0078)	(0.0072)
Placebo	5,072	0.637	0.759	0.246	2.77	3.12	0.198	2.91	2.53	1.60
		(0.0053)	(0.0050)	(0.0034)	(0.0184)	(0.0200)	(0.0162)	(0.0188)	(0.0195)	(0.0184)
Treatments	25,902	0.493	0.697	0.197	2.50	3.04	0.327	3.06	2.56	1.72
		(0.0024)	(0.0023)	(0.0026)	(0.0078)	(0.0085)	(0.0066)	(0.0082)	(0.0085)	(0.0078)
Long Baseline	10,579	0.639	0.771	0.127	2.86	3.25	0.243	2.97	2.60	1.61
		(0.0037)	(0.0033)	(0.0037)	(0.0127)	(0.0134)	(0.0111)	(0.0129)	(0.0136)	(0.0123)
Text Course	15,323	0.392	0.646	0.114	2.26	2.90	0.384	3.12	2.52	1.79
		(0.0030)	(0.0031)	(0.0054)	(0.0093)	(0.0108)	(0.0082)	(0.0106)	(0.0108)	(0.0101)

# make it tex code
table$col_1 <- c("All","\u200B","Placebo","\u200B\u200B","Treatments","\u200B\u200B\u200B","Long Baseline","\u200B\u200B\u200B\u200B","Text Course","\u200B\u200B\u200B\u200B\u200B\u200B")
table <- table[, c("col_1", setdiff(names(table), "col_1"))]
latex_code <- paste0(apply(table, 1, function(row) paste(row, collapse = " & ")), " \\\\\n")
latex_code <- c(paste("","\\# of Obs", "Misinfo Sharing", "Non-misinfo Sharing","Discernment Sharing", "Misinfo Reliability", "Non-misinfo Reliability","Discernment Reliability","Misinfo Manipulation", "Non-misinfo Manipulation","Discernment Manipulation \\\\ \\hline ", sep = " & "),latex_code)
latex_code <- gsub("\\[\\d+\\]", "", latex_code)
#save.tex
write.table(latex_code, file = "table2.tex", sep = " & ", quote = FALSE, row.names = FALSE, col.names = FALSE)

FB-Misinformation Experiment Memo

Kiet Le

2024-02-05

Goal

Experimental Design

Data

Treatments

Library

Loading dataset

Balance Table

Phase 1 Balance Table

Phase 2 Balance Table

Funnel Analysis statistics

Overall

Completion over time

Phase 1

Baseline fact course

Game

Video

SMS course

Check completion times

Phase 2

Short Baseline

SMS course

Long Baseline

Check completion times

Mean Survey Outcomes

Phase 1

Phase 2