Goal

The goal of this memo is to provide an analysis for both Phase 1 and Phase 2 of the FB-Misinformation experiment. This memo provides a documentation of the data cleaning and analysis process.

Experimental Design

The experiment was conducted in two separate phases (Phase 1 and Phase 2). Phase 1 respondents are assigned to one of four groups: (1) Bad News game, (2) Inoculation Science video, (3) Emotions text message course, and (4) Baseline 1. Phase 2 respondents are assigned to one of three groups: (1) Emotions text message course, (2) Long Baseline and (3) Short Baseline 2. Payment amount in Phase 1 was a fixed amount while payment amount in Phase 2 was randomly assigned to be either low, medium or high amount.

The chatbot will start by requesting consent, then randomize participants into one of the intervention groups. The placebo group will be shown a text message course adapted to the chatbot format, with timers that allow sufficient reading time in between messages, that gives users facts about misinformation, but contains no educational content on identifying or avoiding sharing misinformation. We will observe treatment compliance for the placebo group at five points in the intervention, based on responses to quiz questions that are part of the intervention, as well as with a question at the end that asks about what they learned.

  • The Bad News game group will be provided a link to a copy of the online game. Meta pixels on most buttons on the game website will allow us to observe a participant’s progress in the game. Note that we will not observe pixel activity for the subset of iOS users who have opted out of tracking (approximately 85% of iOS users); however, market penetration for Apple phones ranges from just 4% in Kenya to 23% in Ghana.} We will also ask a compliance question when a participant returns to the chatbot about how many followers they gained (the key metric in the game), as well as a question about what they learned.

  • The Inoculation Science video group will be shown the videos embedded in the chatbot, with timers that will prevent the participant from continuing the study before sufficient time has passed to watch the video. Participants must engage with the chatbot in order to proceed to each of the five videos. We will also observe treatment compliance through a question after the last video about what they learned.

  • Lastly, the Emotions text message course group will be shown the text message course adapted to the chatbot format, with timers that allow sufficient reading time in between messages. As in the placebo group’s text message course, we will observe treatment compliance for the Emotions course group at five points in the intervention, based on responses to quiz questions that are part of the intervention, as well as with a question at the end that asks about what they learned.

  • In Phase 2, the Long Baseline group will be shown a longer version of the Short Baseline.

After participants are exposed to their assigned intervention, they will be shown six posts and asked to respond to three questions for each post:

-Would you share this post? (yes/no)

-How manipulative do you find this post? (5-point Likert scale; “Not at all” to “Very”)

-How reliable do you find this post? (5-point Likert scale; “Not at all” to “Very”)

The sharing question will always come first, but the order of the manipulativeness and reliability questions will be randomized at the user-level. The six posts will have the following composition:

-2 non-misinformation posts

-1 misinformation post using the “false dichotomies” technique

-1 misinformation post using the “discrediting technique”

-1 misinformation post using the “emotional language” technique

-1 attention check

Each post will be randomly drawn from a set of (at least two) posts of that type, with the order of the six posts randomized.

Approximately 22 hours after a participant completes the chatbot (to allow enough time for payment to be made and received), we will send a message within the 24-hour window allowed by Facebook to recontact a user who has engaged with our chat to ask if the participant would like to be notified when additional paid surveys are available. Six weeks after the participant completes the chatbot, we will notify these users that they can participate in another study for $1 (in local currency) of mobile airtime. In addition, we will send sponsored messages (paid messages sent through Facebook Messenger) to participants who did not opt-in for notifications with the same recruitment message.

As in the first survey, the follow-up survey in the chatbot will start by requesting consent. Then, participants will be shown six posts and asked to respond to three questions for each post, using the same structure. The six posts will be drawn from the same set of posts as the posts in the first chatbot, without replacement for the individual participant (i.e., no participant will see the same post in both the first and follow-up chatbots). The user-level randomization for the ordering of the manipulativeness and reliability questions will be preserved in the follow-up survey.

Data

The raw data for was downloaded from Chatfuel and undergone an anonymization process before being stored in Sherlock. The anynomized data was then downloaded from Sherlock and stored in the data folder. The cleaning process involved dropping observations who are not in Phase 1 or Phase 2 and dropping actual repeaters. We identified actual repeaters by looking at whether they have a timestamp in more than one treatment arms and dropped them. Attempted repeaters are not actual repeaters, and these respondents were turned away from the chatbot when they tried to enter the chatbot for the second time after completing the first time. Since they did not repeat the chatbot, we don’t drop them from our dataset. We also checked to make sure all observations are unique by comparing the number of unique id with the number of observations in the dataset.

Treatments

Library

library(tidyverse)
library(here)
library(RColorBrewer)
library(janitor)
library(psych)
library(ggtext)
library(knitr)
library(kableExtra)
library(forcats)
library(gtools)
library(ggrepel)
library(DT)
#library(papeR)
library(compareGroups)
library(ggcorrplot)
library(ggplot2)
library(cobalt)
library(ggthemes)
library(nnet)

library(data.table)
library(broom)
library(kableExtra)
library(texreg)
library(patchwork)
library(readr)
library(lubridate)
library(boot)
library(wrappedtools)
library(stargazer)
library(fixest)
library(pander)
# Functions
mean_barplot <- function(data, y, ylab, limits = c(0, 1)) {
  if (limits == c(0, 1)) { labels = scales::percent } else labels = { waiver() }
             
  ggplot(data %>% 
          filter(!is.na(!!sym(y))),
       aes(x = condition, y = !!sym(y), fill = condition)) +
  geom_bar(stat = "summary", fun = "mean") +
  stat_summary(fun.data = mean_se, geom = "errorbar", fun.args = list(mult = 1), width = 0.1) +
  coord_cartesian(ylim = limits, expand = FALSE) +
  scale_y_continuous(labels = labels) +
  labs(y = ylab,
       x = "Condition") +
  theme(legend.position = "none") 
}

# Computes standard errors of a given continuous variable or one-hot binary variable, respectively #
se_cont = function(x, na.rm=FALSE) {
  if (na.rm) x <- na.omit(x)
  sqrt(var(x)/length(x))}

se_binary = function(x, na.rm=FALSE) {
  if (na.rm) x <- na.omit(x)
  sqrt(mean(x)*(1-mean(x))/length(x))}

# add parentheses
add_parentheses <- function(x) {
  if (!is.na(x) && x!="" && !is.na(as.numeric(x))) {
    formatted_x <- formatC(as.numeric(x), format = "f", digits = 4)
    formatted_x[formatted_x != "NA"] <- paste0("(", formatted_x[formatted_x != "NA"], ")")
    formatted_x
  } else {
    x
  }
}

Loading dataset

  • The anonymized dataset is stored in Github directory ~fb_misinfo_interventions/data/

  • The current working branch is 184-survey-data-analysis-phase-ii

df <- read_csv(here("data", "misinfo_clean.csv"))

Balance Table

Phase 1 Balance Table

  • Covariate Balance of Completers in Phase 1
df_filter <- df %>% 
  filter(is.na(arm_coded)==FALSE) %>%
  filter(country_coded == "Ghana" | country_coded == "Nigeria" | country_coded == "South Africa" | country_coded == "Kenya")

df_filter$baseline <- ifelse(df_filter$arm_coded == "Baseline 1" | df_filter$arm_coded == "Short Baseline 2", 1, 0)
df_filter$sms <- ifelse(df_filter$arm_coded == "SMS", 1, 0)
df_filter$game <- ifelse(df_filter$arm_coded == "Game", 1, 0)
df_filter$video <- ifelse(df_filter$arm_coded == "Video", 1, 0)
df_filter$longbaseline <- ifelse(df_filter$arm_coded == "Long Baseline", 1, 0)

df_filter_phase1 <- df_filter %>%
  filter(phase_coded=="Phase 1")

phase_1_a <- df_filter_phase1 %>%
  filter(baseline==1 | sms ==1)

phase_1_b <- df_filter_phase1 %>%
  filter(baseline==1 | game ==1)

phase_1_c <- df_filter_phase1 %>%
  filter(baseline==1 | video ==1)

df_filter_phase2 <- df_filter %>%
  filter(phase_coded=="Phase 2")

phase_2_a <- df_filter_phase2 %>% 
  filter(baseline==1 | sms ==1)

phase_2_b <- df_filter_phase2 %>%
  filter(baseline==1 | longbaseline ==1)

# Compute SMD by stratum
love.plot(sms ~ country_coded + MisinfoChat_start_time,
                 data=filter(phase_1_a, completed=="TRUE"),
          groups = "arm_coded",
          binary = "std", # display standardized mean differences
          thresholds = c(m = .1), 
          colors =c(colorblind_pal()(8)[c(2)]))+ 
  scale_x_continuous(breaks = seq(-0.1, 0.1, .1)) +
  # specify theme aesthetics
  theme_minimal() + 
  # specify text size and face
  theme(strip.text = element_text(size = 12), 
      axis.text.x = element_text(size = 12, angle = 90, vjust = 0.5, hjust=1),
      axis.text.y = element_text(size = 8), 
      axis.title = element_text(size = 12, face = "bold"), 
      legend.position = "none", 
      plot.title = element_text(size = 16, face = "bold", hjust = 0.5)) +
  # specify labels
  labs(title = "Covariate Balance", 
       x = 'Standardized Mean Difference', 
       y = "Covariate", 
       caption  = "For observations completed in Phase 1, sms - baseline.")

# Compute SMD by stratum
love.plot(game ~ country_coded+ MisinfoChat_start_time,
                 data=filter(phase_1_b, completed=="TRUE"),
          groups = "arm_coded",
          binary = "std", # display standardized mean differences
          thresholds = c(m = .1), 
          colors =c(colorblind_pal()(8)[c(2)]))+ 
  scale_x_continuous(breaks = seq(-0.1, 0.1, .1)) +
  # specify theme aesthetics
  theme_minimal() + 
  # specify text size and face
  theme(strip.text = element_text(size = 12), 
      axis.text.x = element_text(size = 12, angle = 90, vjust = 0.5, hjust=1),
      axis.text.y = element_text(size = 8), 
      axis.title = element_text(size = 12, face = "bold"), 
      legend.position = "none", 
      plot.title = element_text(size = 16, face = "bold", hjust = 0.5)) +
  # specify labels
  labs(title = "Covariate Balance", 
       x = 'Standardized Mean Difference', 
       y = "Covariate", 
       caption  = "For observations completed in Phase 1, game - baseline.")

# Compute SMD by stratum
love.plot(video ~ country_coded+ MisinfoChat_start_time,
                 data=filter(phase_1_c, completed=="TRUE"),
          groups = "arm_coded",
          binary = "std", # display standardized mean differences
          thresholds = c(m = .1), 
          colors =c(colorblind_pal()(8)[c(2)]))+ 
  scale_x_continuous(breaks = seq(-0.1, 0.1, .1)) +
  # specify theme aesthetics
  theme_minimal() + 
  # specify text size and face
  theme(strip.text = element_text(size = 12), 
      axis.text.x = element_text(size = 12, angle = 90, vjust = 0.5, hjust=1),
      axis.text.y = element_text(size = 8), 
      axis.title = element_text(size = 12, face = "bold"), 
      legend.position = "none", 
      plot.title = element_text(size = 16, face = "bold", hjust = 0.5)) +
  # specify labels
  labs(title = "Covariate Balance", 
       x = 'Standardized Mean Difference', 
       y = "Covariate", 
       caption  = "For observations completed in Phase 1, video - baseline.")

Phase 2 Balance Table

  • Covariate Balance for Phase 2
# Compute SMD by stratum
love.plot(sms ~ country_coded + MisinfoChat_start_time,
                 data=filter(phase_2_a, completed=="TRUE"),
          groups = "arm_coded",
          binary = "std", # display standardized mean differences
          thresholds = c(m = .1), 
          colors =c(colorblind_pal()(8)[c(2)]))+ 
  scale_x_continuous(breaks = seq(-0.1, 0.1, .1)) +
  # specify theme aesthetics
  theme_minimal() + 
  # specify text size and face
  theme(strip.text = element_text(size = 12), 
      axis.text.x = element_text(size = 12, angle = 90, vjust = 0.5, hjust=1),
      axis.text.y = element_text(size = 8), 
      axis.title = element_text(size = 12, face = "bold"), 
      legend.position = "none", 
      plot.title = element_text(size = 16, face = "bold", hjust = 0.5)) +
  # specify labels
  labs(title = "Covariate Balance", 
       x = 'Standardized Mean Difference', 
       y = "Covariate", 
       caption  = "For observations completed in Phase 2, sms - baseline.")

# Compute SMD by stratum
love.plot(longbaseline ~ country_coded + MisinfoChat_start_time,
                 data=filter(phase_2_b, completed=="TRUE"),
          groups = "arm_coded",
          binary = "std", # display standardized mean differences
          thresholds = c(m = .1), 
          colors =c(colorblind_pal()(8)[c(2)]))+ 
  scale_x_continuous(breaks = seq(-0.1, 0.1, .1)) +
  # specify theme aesthetics
  theme_minimal() + 
  # specify text size and face
  theme(strip.text = element_text(size = 12), 
      axis.text.x = element_text(size = 12, angle = 90, vjust = 0.5, hjust=1),
      axis.text.y = element_text(size = 8), 
      axis.title = element_text(size = 12, face = "bold"), 
      legend.position = "none", 
      plot.title = element_text(size = 16, face = "bold", hjust = 0.5)) +
  # specify labels
  labs(title = "Covariate Balance", 
       x = 'Standardized Mean Difference', 
       y = "Covariate", 
       caption  = "For observations completed in Phase 2, long baseline - baseline.")

Funnel Analysis statistics

Overall

  • Participants who started the chatbot received a timestamp for the ‘MisinfoChat_start_time’ variable.

  • This table provides the location where participants dropped off from the chatbot, the number of drop off at each location, and the percentage of started dropped off.

df_dropoff <- df %>%
  select(phase_coded, entry_start_time, consent_coded, completed, arm_coded, quiz_consent_coded,
                                    entry_end_time,
                                    MisinfoChat_start_time,
                                    InterArms_start_time,
                                    InterArmsVideo_end_time,
                                    InterArmsGame_start_time,
                                    interArmsGame_end_time,
                                    InterArmsSMS_start_time,
                                    InterArmsSMS_end_time, 
                                    InterArmsBase_start_time,
                                    InterArmsBase_end_time, 
                                    InterArmsLongBase_start_time,
                                    InterArmsLongBase_end_time,
                                    MisinfoQuiz_start_time,
                                    misinfoQuiz_end_time,
                                    EndPay_start_time,
                                    EndPay_end_time)


df_dropoff$started_coded_num <- ifelse(!is.na(df_dropoff$MisinfoChat_start_time),1,0)
df_dropoff$consent_coded_num <- ifelse(!is.na(df_dropoff$consent_coded),1,0)
df_dropoff$started_arm_num <- ifelse(!is.na(df_dropoff$InterArms_start_time) | !is.na(df_dropoff$InterArmsSMS_start_time) | !is.na(df_dropoff$InterArmsGame_start_time) | !is.na(df_dropoff$InterArmsBase_start_time) | !is.na(df_dropoff$InterArmsLongBase_start_time),1,0)
df_dropoff$quiz_started_num <- ifelse(!is.na(df_dropoff$MisinfoQuiz_start_time),1,0)
df_dropoff$quiz_consent_coded_num <- ifelse(!is.na(df_dropoff$quiz_consent_coded),1,0)
df_dropoff$quiz_completed_num <- ifelse(!is.na(df_dropoff$misinfoQuiz_end_time),1,0)
df_dropoff$complete_all_coded_num <- ifelse(!is.na(df_dropoff$completed),1,0)

df_dropoff <- df_dropoff %>%
  mutate(score= rowSums(select(., started_coded_num, consent_coded_num, started_arm_num, quiz_consent_coded_num, quiz_completed_num, complete_all_coded_num, quiz_started_num))) %>%
  mutate(score = ifelse(score==7, "7-Completed all", ifelse(score == 6, "6-Dropped off during quiz reveal", ifelse(score == 5, "5-Dropped off during quiz", ifelse(score == 4, "4-Dropped off before quiz consent", ifelse(score == 4, "3-Dropped off during intervention", ifelse(score == 3, "3-Dropped off during intervention", ifelse(score == 2, "2-Dropped off before intervention", ifelse(score == 1, "1-Dropped off before consent", "Did not start")))))))))


df_dropoff %>%
  group_by(phase_coded) %>%
  count(score) %>%
  mutate(percentage = n*100/sum(n)) %>%
  mutate(percentage = round(percentage, 2)) %>%
  kable("html", col.names=(c("Phase", "Dropped off location", "N",
                             "% of Participants Started"))) %>% 
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover")) %>% collapse_rows() %>% scroll_box(width = "100%")
Phase Dropped off location N % of Participants Started
Phase 1 1-Dropped off before consent 5936 16.04
2-Dropped off before intervention 1001 2.70
3-Dropped off during intervention 9106 24.61
4-Dropped off before quiz consent 220 0.59
5-Dropped off during quiz 3727 10.07
6-Dropped off during quiz reveal 301 0.81
7-Completed all 16717 45.17
Phase 2 1-Dropped off before consent 8649 17.02
2-Dropped off before intervention 1875 3.69
3-Dropped off during intervention 8905 17.52
4-Dropped off before quiz consent 294 0.58
5-Dropped off during quiz 3922 7.72
6-Dropped off during quiz reveal 331 0.65
7-Completed all 26847 52.82
df_dropoff_table <- df_dropoff %>%
  group_by(score, phase_coded) %>%
  summarise(count = n(), .groups = 'drop') # Calculate count

# Ensure 'score' is a factor and ordered as you want
df_dropoff$score <- factor(df_dropoff$score, levels = unique(df_dropoff$score))
ggplot(df_dropoff_table, mapping = aes(x = score, y = count, group = phase_coded)) +
  geom_point(aes(color = phase_coded)) +  # Color points by phase
  geom_line(aes(color = phase_coded), alpha = 0.5) +  # Color lines by phase
  geom_text(aes(label = scales::comma(count)), vjust = -0.5, size = 3) +
  coord_cartesian(ylim = c(0, NA)) +
  scale_y_continuous(labels = scales::comma) +
  scale_color_manual(values = c("Phase 1" = "purple", "Phase 2" = "green")) +
  labs(title = "Chatbot Funnel: Drop off at Different Stages",
       x = "Chatbot stage",
       y = "Number of observations",
       color = "Phase") +  # Label for color legend
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

  • started participants are those have a timestamp for the MisinfoChat_start_time variable.

  • consent participants are those who responded ‘I consent, start now’ in the consent variable.

  • completed_intervention participants are those who completed has a timestamp for either the baseline_learn, game_learn, sms_learn, or Video_learn variables (applicable to both Phase 1 and Phase 2)

  • quiz consent participants are those who responded ‘I acknowledge’ in the quiz_consent variable.

  • completed quiz participants are those who completed the misinformation quiz and have a timestamp for the misinfoQuiz_end_time variable.

  • completed all participants are those who completed the chatbot and have a stamp for the completed variable.

df$consent_binary_num = ifelse(df$consent_coded == "I consent, start now",1,0)
df<- df %>%
  mutate(started_coded_num=ifelse(!is.na(`signed up`),1,0)) %>%  
  mutate(completed_intervention = ifelse(!is.na(baseline_learn),baseline_learn,
                            ifelse(!is.na(game_learn),game_learn,
                                   ifelse(!is.na(sms_learn),sms_learn,
                                          ifelse(!is.na(Video_learn),Video_learn,NA))))) %>%
  mutate(completed_intervention_num =ifelse(!is.na(completed_intervention),1,0)) %>% 
  mutate(time_completion_survey = misinfoQuiz_end_time - `signed up`) %>% # calculating the time between signed up and completing the misinformation quiz. 
  mutate(completed_survey_num=ifelse(!is.na(time_completion_survey),1,0)) %>%
  mutate(completed_all_num=ifelse(!is.na(completed), 1, 0)) %>%
  mutate(quiz_consent_coded_num = ifelse(quiz_consent_coded == "I acknowledge",1,0))


df_a <- df %>% 
  group_by(phase_coded) %>%
  summarise(`started`=sum(started_coded_num==1, na.rm=TRUE),
            `consented`=sum(consent_binary_num==1, na.rm=TRUE),
            `completed intervention`=sum(completed_intervention_num==1, na.rm=TRUE),
            `consented quiz`=sum(quiz_consent_coded_num==1, na.rm=TRUE),
            `completed quiz` = sum(completed_survey_num==1, na.rm=TRUE),
            `completed all` = sum(completed_all_num==1, na.rm=TRUE), 
            ) %>%
  pivot_longer(
    -phase_coded, # Columns to keep as is
    names_to = "stage", # Name of new column
    values_to = "n" # Name of new column
  ) %>%
  data.frame()

df_b <- df %>%
  group_by(phase_coded) %>%
  summarise(frac_started = sum(started_coded_num==1, na.rm=TRUE) / sum(started_coded_num==1, na.rm=TRUE),
            frac_consented_started  = sum(consent_binary_num==1, na.rm=TRUE) / sum(started_coded_num==1, na.rm=TRUE), 
            frac_completed_intervention_started  = sum(completed_intervention_num==1, na.rm=TRUE) / sum(started_coded_num==1, na.rm=TRUE),
            frac_consented_quiz_started  = sum(quiz_consent_coded_num==1, na.rm=TRUE) / sum(started_coded_num==1, na.rm=TRUE),
            frac_completed_quiz_started  = sum(completed_survey_num==1, na.rm=TRUE) / sum(started_coded_num==1, na.rm=TRUE),
            frac_completed_all_started  = sum(completed_all_num==1, na.rm=TRUE) / sum(started_coded_num==1, na.rm=TRUE)) %>%
  pivot_longer(
    -phase_coded, # Columns to keep as is
    names_to = "stage", # Name of new column
    values_to = "percentage of started" # Name of new column
  ) %>%
  select(`percentage of started`) %>%
  data.frame()

df_c <- df %>%
  group_by(phase_coded) %>%
  summarise(frac__started_consented = NA, 
            frac_conseted_consented = sum(consent_binary_num==1, na.rm=TRUE)/sum(consent_binary_num==1, na.rm=TRUE),
            frac_completed_intervention_consented = sum(completed_intervention_num==1, na.rm=TRUE)/sum(consent_binary_num==1, na.rm=TRUE),
            frac_consented_quiz_consented = sum(quiz_consent_coded_num==1, na.rm=TRUE)/sum(consent_binary_num==1, na.rm=TRUE),
            frac_completed_quiz_consented = sum(completed_survey_num==1, na.rm=TRUE)/sum(consent_binary_num==1, na.rm=TRUE),
            frac_completed_all_consented = sum(completed_all_num==1, na.rm=TRUE)/sum(consent_binary_num==1, na.rm=TRUE)) %>%
  pivot_longer(
    -phase_coded, # Columns to keep as is
    names_to = "stage", # Name of new column
    values_to = "percentage of consent" # Name of new column
  ) %>%
  select(`percentage of consent`) %>%
  data.frame()

df_d <- df %>%
  group_by(phase_coded) %>%
  summarise(frac_started_completed_intervention = NA, 
            frac_consented_completed_intervention = NA,
            frac_completed_intervention_completed_intervention = sum(completed_intervention_num==1, na.rm=TRUE)/sum(completed_intervention_num==1, na.rm=TRUE),
            frac_consented_quiz_completed_intervention = sum(quiz_consent_coded_num==1, na.rm=TRUE)/sum(completed_intervention_num==1, na.rm=TRUE),
            frac_completed_quiz_completed_intervention = sum(completed_survey_num==1, na.rm=TRUE)/sum(completed_intervention_num==1, na.rm=TRUE),
            frac_completed_all_completed_intervention = sum(completed_all_num==1, na.rm=TRUE)/sum(completed_intervention_num==1, na.rm=TRUE)) %>%
  pivot_longer(
    -phase_coded, # Columns to keep as is
    names_to = "stage", # Name of new column
    values_to = "percentage of completed intervention" # Name of new column
  ) %>%
  select(`percentage of completed intervention`) %>%
  data.frame()

df_e <- df %>% 
  group_by(phase_coded) %>%
  summarise(frac_started_completed_quiz = NA, 
            frac_consented_completed_quiz = NA,
            frac_completed_intervention_completed_quiz = NA,
            frac_consented_quiz_completed_quiz = sum(quiz_consent_coded_num==1, na.rm=TRUE)/sum(quiz_consent_coded_num==1, na.rm=TRUE), 
            frac_completed_quiz_completed_quiz = sum(completed_survey_num==1, na.rm=TRUE)/sum(quiz_consent_coded_num==1, na.rm=TRUE),
            frac_completed_all_completed_quiz = sum(completed_all_num==1, na.rm=TRUE)/sum(quiz_consent_coded_num==1, na.rm=TRUE)) %>%
  pivot_longer(
    -phase_coded, # Columns to keep as is
    names_to = "stage", # Name of new column
    values_to = "percentage of quiz consent" # Name of new column
  ) %>%
  select(`percentage of quiz consent`) %>%
  data.frame()

df_f <- df %>% 
  group_by(phase_coded) %>%
  summarise(frac_started_completed_all = NA, 
            frac_consented_completed_all = NA,
            frac_completed_intervention_completed_all = NA,
            frac_consented_quiz_completed_all = NA,
            frac_completed_quiz_completed_all = sum(completed_survey_num==1, na.rm=TRUE)/sum(completed_survey_num==1, na.rm=TRUE),
            frac_completed_all_completed_all = sum(completed_all_num==1, na.rm=TRUE)/sum(completed_survey_num==1, na.rm=TRUE)) %>%
  pivot_longer(
    -phase_coded, # Columns to keep as is
    names_to = "stage", # Name of new column
    values_to = "percentage of completed quiz" # Name of new column
  ) %>%
  select(`percentage of completed quiz`) %>%
  data.frame()

df_g <- df %>%
  group_by(phase_coded) %>%
  summarise(frac_started_completed_all = NA, 
            frac_consented_completed_all = NA,
            frac_completed_intervention_completed_all = NA,
            frac_consented_quiz_completed_all = NA,
            frac_completed_quiz_completed_all = NA,
            frac_completed_all_completed_all = sum(completed_all_num==1, na.rm=TRUE)/sum(completed_all_num==1, na.rm=TRUE)) %>%
  pivot_longer(
    -phase_coded, # Columns to keep as is
    names_to = "stage", # Name of new column
    values_to = "percentage of completed all" # Name of new column
  ) %>%
  select(`percentage of completed all`) %>%
  data.frame()

funnel_df <- cbind(df_a, df_b, df_c, df_d, df_e, df_f, df_g) %>%
  mutate(stage = factor(stage, labels = unique(stage), levels = unique(stage))) %>% data.frame()

funnel_df %>%
  kable(digits = 3,caption = "Funnel Statistics", 
        col.names = c("Phase", "Stage", "N", "Prop. of Started", "Prop. of Consent", "Prop. of Completed Intervetion", "Prop. of Quiz Consent", "Prop. of Completed Quiz", "Prop. of Completed All" )) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>% collapse_rows() %>%
  scroll_box( height = "500px")  
Funnel Statistics
Phase Stage N Prop. of Started Prop. of Consent Prop. of Completed Intervetion Prop. of Quiz Consent Prop. of Completed Quiz Prop. of Completed All
Phase 1 started 37008 1.000 NA NA NA NA NA
consented 29993 0.810 1.000 NA NA NA NA
completed intervention 20965 0.566 0.699 1.000 NA NA NA
consented quiz 20662 0.558 0.689 0.986 1.000 NA NA
completed quiz 17016 0.460 0.567 0.812 0.824 1.000 NA
completed all 16724 0.452 0.558 0.798 0.809 0.983 1
Phase 2 started 50823 1.000 NA NA NA NA NA
consented 40681 0.800 1.000 NA NA NA NA
completed intervention 31392 0.618 0.772 1.000 NA NA NA
consented quiz 31020 0.610 0.763 0.988 1.000 NA NA
completed quiz 27178 0.535 0.668 0.866 0.876 1.000 NA
completed all 26862 0.529 0.660 0.856 0.866 0.988 1
ggplot(funnel_df, mapping = aes(x = stage, y = n, group = phase_coded)) +
  geom_point(aes(color = phase_coded)) +  # Color points by phase
  geom_line(aes(color = phase_coded), alpha = 0.5) +  # Color lines by phase
  geom_text(aes(label = scales::comma(n)), vjust = -0.5, size = 3) +
  coord_cartesian(ylim = c(0, NA)) +
  scale_y_continuous(labels = scales::comma) +
  scale_color_manual(values = c("Phase 1" = "purple", "Phase 2" = "green")) +
  labs(title = "Chatbot Funnel: Number of Participants at Different Stages",
       x = "Funnel stage",
       y = "Number of observations",
       color = "Phase") +  # Label for color legend
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Completion over time

df_summary <- df %>% 
  select(phase_coded, MisinfoChat_start_time, completed_survey_num) %>%
  mutate(startdate = as.Date(MisinfoQuiz_start_time))

daily_summary <- df_summary %>%
  group_by(startdate) %>%
  count(completed_survey_num==1) %>%
  mutate(pct = n/sum(n))


ggplot(daily_summary,
       aes(x = startdate, y = pct)) +
  geom_bar(position = "identity", stat = "identity") +
  scale_y_continuous(limits = c(0, 1), labels = scales::percent, expand = c(0, 0, 0, 0)) +
  scale_x_date(breaks = "day") +
  labs(x = "Day",
       y = "% complete") +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 90))

Phase 1

  • Only participants who consented are considered.

  • Zooming into the dropout during the intervention. Dropout is much higher for game, video and SMS than for facts baseline.

dfphase1 <- df %>%
  filter(phase_coded == "Phase 1") %>%
  filter(consent_binary_num == 1) %>%
  filter(arm_coded == "Baseline 1" | arm_coded == "SMS" | arm_coded == "Video" | arm_coded == "Game" | is.na("arm_coded")) 

dfphase1 %>%
  mutate(n_started_intervention = if_else(!is.na(InterArmsBase_start_time) | !is.na(InterArms_start_time) | !is.na(InterArmsGame_start_time) | !is.na(InterArmsSMS_start_time), 1, 0),
         n_finished_intervention = if_else(!is.na(InterArmsBase_end_time) | !is.na(InterArmsVideo_end_time) | !is.na(interArmsGame_end_time) | !is.na(InterArmsSMS_end_time), 1, 0)) %>%
  filter(!is.na(arm)) %>% 
  group_by(arm_coded) %>%
  summarize(n_started_intervention = sum(n_started_intervention==1),
            n_finished_intervention = sum(n_finished_intervention==1),
            completed_intervention_after_starting = n_finished_intervention/n_started_intervention) %>%
  kable(digits = 3,caption = "Intervention Funnel for Phase 1", 
        col.names = c("Treatment Arm", "Started Intervention", "Completed Intervention", "Completed/Started" )) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>% collapse_rows() %>%
  scroll_box( height = "500px")  
Intervention Funnel for Phase 1
Treatment Arm Started Intervention Completed Intervention Completed/Started
Baseline 1 7441 6138 0.825
Game 7458 4667 0.626
SMS 7543 5143 0.682
Video 7525 4962 0.659

Baseline fact course

df_base_funnel <- dfphase1 %>% 
  filter(!is.na(InterArmsBase_start_time)) %>% 
  summarize(n_started_baseline_intervention = sum(!is.na(InterArmsBase_start_time)),
            n_base2_ready = sum(!is.na(base2_ready)),
            n_base3_ready = sum(!is.na(base3_ready)),
            n_base4_ready = sum(!is.na(base4_ready)),
            n_base5_ready = sum(!is.na(base5_ready)),
            n_finished_baseline_intervention = sum(!is.na(InterArmsBase_end_time)),
            # quiz
            n_quiz_consent = sum(!is.na(quiz_consent)),
            n_completed = sum(completed, na.rm = TRUE)) %>%
  pivot_longer(cols = everything(),
               names_pattern = "n_(.*)",
               names_to = "stage",
               values_to = "n") %>%
  mutate(stage = janitor::make_clean_names(stage, case = "sentence")) %>%
  mutate(stage = factor(stage, labels = unique(stage), levels = unique(stage)))
ggplot(df_base_funnel,
       mapping = aes(x = stage, y = n, group = NA)) +
  geom_point() +
  geom_line(alpha = 0.5) +
  geom_text(aes(label = scales::comma(n)), vjust = -0.5, size = 3) +
  geom_text(aes(label = scales::percent(n/n[stage == "Started baseline intervention"])), vjust = 2, size = 2) +
  coord_cartesian(ylim = c(0, NA)) +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Chatbot Funnel Facts Baseline",
       subtitle = "Number of Participants at Different Stages",
       x = "Funnel stage",
       y = "Number of observations") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Game

df_game_funnel <- dfphase1 %>% 
  filter(!is.na(InterArmsGame_start_time)) %>% 
  summarize(n_started_game_intervention = sum(!is.na(InterArmsGame_start_time)),
            n_game_done = sum(!is.na(game_done)),
            n_finished_game_intervention = sum(!is.na(interArmsGame_end_time)),
            # quiz
            n_quiz_consent = sum(!is.na(quiz_consent)),
            n_completed = sum(completed, na.rm = TRUE)) %>%
  pivot_longer(cols = everything(),
               names_pattern = "n_(.*)",
               names_to = "stage",
               values_to = "n") %>%
  mutate(stage = janitor::make_clean_names(stage, case = "sentence")) %>%
  mutate(stage = factor(stage, labels = unique(stage), levels = unique(stage)))
ggplot(df_game_funnel,
       mapping = aes(x = stage, y = n, group = NA)) +
  geom_point() +
  geom_line(alpha = 0.5) +
  geom_text(aes(label = scales::comma(n)), vjust = -0.5, size = 3) +
  geom_text(aes(label = scales::percent(n/n[stage == "Started game intervention"])), vjust = 2, size = 2) +
  coord_cartesian(ylim = c(0, NA)) +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Chatbot Funnel Game",
       subtitle = "Number of Participants at Different Stages",
       x = "Funnel stage",
       y = "Number of observations") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Video

df_video_funnel <- dfphase1 %>% 
  filter(!is.na(InterArms_start_time)) %>% 
  summarize(n_started_video_intervention = sum(!is.na(InterArms_start_time)),
            n_video2_ready = sum(!is.na(video2_ready)),
            n_video2_done = sum(!is.na(video2_done)),
            n_video3_ready = sum(!is.na(video3_ready)),
            n_video3_done = sum(!is.na(video3_done)),
            n_video4_ready = sum(!is.na(video4_ready)),
            n_video4_done = sum(!is.na(video4_done)),
            n_video5_done = sum(!is.na(video5_done)),
            n_finished_video_intervention = sum(!is.na(InterArmsVideo_end_time)),
            # quiz
            n_quiz_consent = sum(!is.na(quiz_consent)),
            n_completed = sum(completed, na.rm = TRUE)) %>%
  pivot_longer(cols = everything(),
               names_pattern = "n_(.*)",
               names_to = "stage",
               values_to = "n") %>%
  mutate(stage = janitor::make_clean_names(stage, case = "sentence")) %>%
  mutate(stage = factor(stage, labels = unique(stage), levels = unique(stage)))
ggplot(df_video_funnel,
       mapping = aes(x = stage, y = n, group = NA)) +
  geom_point() +
  geom_line(alpha = 0.5) +
  geom_text(aes(label = scales::comma(n)), vjust = -0.5, size = 3) +
  geom_text(aes(label = scales::percent(n/n[stage == "Started video intervention"])), vjust = 2, size = 2) +
  coord_cartesian(ylim = c(0, NA)) +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Chatbot Funnel Video",
       subtitle = "Number of Participants at Different Stages",
       x = "Funnel stage",
       y = "Number of observations") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

SMS course

df_sms_funnel <- dfphase1 %>% 
  filter(!is.na(InterArmsSMS_start_time)) %>% 
  summarize(n_started_sms_intervention = sum(!is.na(InterArmsSMS_start_time)),
            n_sms2_ready = sum(!is.na(sms2_ready)),
            n_sms3_ready = sum(!is.na(sms3_ready)),
            n_sms4_ready = sum(!is.na(sms4_ready)),
            n_sms5_ready = sum(!is.na(sms5_ready)),
            n_finished_sms_intervention = sum(!is.na(InterArmsSMS_end_time)),
            # quiz
            n_quiz_consent = sum(!is.na(quiz_consent)),
            n_completed = sum(completed, na.rm = TRUE)) %>%
  pivot_longer(cols = everything(),
               names_pattern = "n_(.*)",
               names_to = "stage",
               values_to = "n") %>%
  mutate(stage = janitor::make_clean_names(stage, case = "sentence")) %>%
  mutate(stage = factor(stage, labels = unique(stage), levels = unique(stage)))
ggplot(df_sms_funnel,
       mapping = aes(x = stage, y = n, group = NA)) +
  geom_point() +
  geom_line(alpha = 0.5) +
  geom_text(aes(label = scales::comma(n)), vjust = -0.5, size = 3) +
  geom_text(aes(label = scales::percent(n/n[stage == "Started sms intervention"])), vjust = 2, size = 2) +
  coord_cartesian(ylim = c(0, NA)) +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Chatbot Funnel SMS",
       subtitle = "Number of Participants at Different Stages",
       x = "Funnel stage",
       y = "Number of observations") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Check completion times

# payment end time is almost always missing, so we don't analyze payment time
# misinfo chat duration is always 0, so we don't show it
df_duration1 <- dfphase1 %>% 
  mutate(baseline_duration = InterArmsBase_end_time - InterArmsBase_start_time,
         video_duration = InterArmsVideo_end_time - InterArms_start_time,
         game_duration = interArmsGame_end_time - InterArmsGame_start_time,
         sms_duration = InterArmsSMS_end_time - InterArmsSMS_start_time,
         misinfo_quiz_duration = misinfoQuiz_end_time - MisinfoQuiz_start_time) %>%
  select(contains("duration")) %>% 
  pivot_longer(cols = everything(),
               names_pattern = "(.*)_duration",
               names_to = "stage",
               values_to = "time") %>%
  mutate(stage = factor(stage, labels = unique(stage), levels = unique(stage))) %>%
  filter(!is.na(time))
# strangely, there are some negative durations, we filter those
df_duration1 <- df_duration1 %>%
  filter(time >= 0)
ggplot(df_duration1,
       mapping = aes(x = time, group = stage, color = stage)) +
  geom_histogram() +
  facet_wrap(~ stage, scales = "free") +
  labs(x = "time in minutes")

ggplot(df_duration1,
       mapping = aes(y = time, x = stage, color = stage)) +
  geom_boxplot() +
  # restrict scale to focus on observations that are not outliers
  scale_y_continuous(limits = c(0, 150)) +
  labs(y = "time in minutes")

df_duration1 %>%
  group_by(stage) %>% 
  summarize(median_duration = median(time),
            avg_duration = mean(time),
            sd_duration = sd(time),
            min_duration = min(time),
            max_duration = max(time)) %>%
  kable(digits = 3,caption = "Phase 1: Duration of Interventions", 
        col.names = c("Stage", "Median", "Average", "SD", "Min","Max")) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>% collapse_rows() %>%
  scroll_box( height = "500px")  
Phase 1: Duration of Interventions
Stage Median Average SD Min Max
baseline 10.783 mins 175.426 mins 2036.561 4.617 mins 68946.00 mins
video 15.500 mins 337.589 mins 2889.449 9.000 mins 69023.57 mins
game 17.383 mins 225.703 mins 1824.336 5.767 mins 48474.42 mins
sms 17.467 mins 254.841 mins 2452.954 8.150 mins 64620.12 mins
misinfo_quiz 6.817 mins 177.845 mins 2295.861 1.517 mins 73517.35 mins

Phase 2

  • Only participants who consented are considered.

  • Zooming into the dropout during the intervention. Dropout is higher for long baseline and SMS. The differences in dropout rates are much lower in Phase 2 than Phase 1.

dfphase2 <- df %>%
  filter(phase_coded == "Phase 2") %>%
  filter(consent_binary_num == 1) %>%
  filter(arm_coded == "Short Baseline 2" | arm_coded == "SMS" | arm_coded == "Long Baseline" | is.na("arm_coded")) 

dfphase2 %>%
  mutate(n_started_intervention = if_else(!is.na(InterArmsBase_start_time) | !is.na(InterArmsLongBase_start_time) | !is.na(InterArmsSMS_start_time), 1, 0),
         n_finished_intervention = if_else(!is.na(InterArmsBase_end_time) | !is.na(InterArmsLongBase_end_time) | !is.na(InterArmsSMS_end_time), 1, 0)) %>%
  filter(!is.na(arm)) %>% 
  group_by(arm_coded) %>%
  summarize(n_started_intervention = sum(n_started_intervention==1),
            n_finished_intervention = sum(n_finished_intervention==1),
            completed_intervention_after_starting = n_finished_intervention/n_started_intervention) %>%
  kable(digits = 3,caption = "Intervention Funnel for Phase 2", 
        col.names = c("Treatment Arm", "Started Intervention", "Completed Intervention", "Completed/Started" )) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>% collapse_rows() %>%
  scroll_box( height = "500px")  
Intervention Funnel for Phase 2
Treatment Arm Started Intervention Completed Intervention Completed/Started
Long Baseline 14024 10740 0.766
SMS 20077 15497 0.772
Short Baseline 2 6185 5148 0.832

Short Baseline

df_base_funnel2 <- dfphase2 %>% 
  filter(!is.na(InterArmsBase_start_time)) %>% 
  summarize(n_started_baseline_intervention = sum(!is.na(InterArmsBase_start_time)),
            n_base2_ready = sum(!is.na(base2_ready)),
            n_base3_ready = sum(!is.na(base3_ready)),
            n_base4_ready = sum(!is.na(base4_ready)),
            n_base5_ready = sum(!is.na(base5_ready)),
            n_finished_baseline_intervention = sum(!is.na(InterArmsBase_end_time)),
            # quiz
            n_quiz_consent = sum(!is.na(quiz_consent)),
            n_completed = sum(completed, na.rm = TRUE)) %>%
  pivot_longer(cols = everything(),
               names_pattern = "n_(.*)",
               names_to = "stage",
               values_to = "n") %>%
  mutate(stage = janitor::make_clean_names(stage, case = "sentence")) %>%
  mutate(stage = factor(stage, labels = unique(stage), levels = unique(stage)))
ggplot(df_base_funnel2,
       mapping = aes(x = stage, y = n, group = NA)) +
  geom_point() +
  geom_line(alpha = 0.5) +
  geom_text(aes(label = scales::comma(n)), vjust = -0.5, size = 3) +
  geom_text(aes(label = scales::percent(n/n[stage == "Started baseline intervention"])), vjust = 2, size = 2) +
  coord_cartesian(ylim = c(0, NA)) +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Chatbot Funnel Facts Baseline",
       subtitle = "Number of Participants at Different Stages",
       x = "Funnel stage",
       y = "Number of observations") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

SMS course

df_sms_funnel2 <- dfphase2 %>% 
  filter(!is.na(InterArmsSMS_start_time)) %>% 
  summarize(n_started_sms_intervention = sum(!is.na(InterArmsSMS_start_time)),
            n_sms2_ready = sum(!is.na(sms2_ready)),
            n_sms3_ready = sum(!is.na(sms3_ready)),
            n_sms4_ready = sum(!is.na(sms4_ready)),
            n_sms5_ready = sum(!is.na(sms5_ready)),
            n_finished_sms_intervention = sum(!is.na(InterArmsSMS_end_time)),
            # quiz
            n_quiz_consent = sum(!is.na(quiz_consent)),
            n_completed = sum(completed, na.rm = TRUE)) %>%
  pivot_longer(cols = everything(),
               names_pattern = "n_(.*)",
               names_to = "stage",
               values_to = "n") %>%
  mutate(stage = janitor::make_clean_names(stage, case = "sentence")) %>%
  mutate(stage = factor(stage, labels = unique(stage), levels = unique(stage)))
ggplot(df_sms_funnel2,
       mapping = aes(x = stage, y = n, group = NA)) +
  geom_point() +
  geom_line(alpha = 0.5) +
  geom_text(aes(label = scales::comma(n)), vjust = -0.5, size = 3) +
  geom_text(aes(label = scales::percent(n/n[stage == "Started sms intervention"])), vjust = 2, size = 2) +
  coord_cartesian(ylim = c(0, NA)) +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Chatbot Funnel SMS",
       subtitle = "Number of Participants at Different Stages",
       x = "Funnel stage",
       y = "Number of observations") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Long Baseline

df_longbase_funnel2 <- dfphase2 %>% 
  filter(!is.na(InterArmsLongBase_start_time)) %>% 
  summarize(n_started_longbase_intervention = sum(!is.na(InterArmsLongBase_start_time)),
             n_base2_ready = sum(!is.na(base2_ready)),
            n_base3_ready = sum(!is.na(base3_ready)),
            n_base4_ready = sum(!is.na(base4_ready)),
            n_base5_ready = sum(!is.na(base5_ready)),
            n_base_experts_reason = sum(!is.na(base_experts_reason)),
            n_finished_longbase_intervention = sum(!is.na(InterArmsLongBase_end_time)),
            # quiz
            n_quiz_consent = sum(!is.na(quiz_consent)),
            n_completed = sum(completed, na.rm = TRUE)) %>%
  pivot_longer(cols = everything(),
               names_pattern = "n_(.*)",
               names_to = "stage",
               values_to = "n") %>%
  mutate(stage = janitor::make_clean_names(stage, case = "sentence")) %>%
  mutate(stage = factor(stage, labels = unique(stage), levels = unique(stage)))
ggplot(df_longbase_funnel2,
       mapping = aes(x = stage, y = n, group = NA)) +
  geom_point() +
  geom_line(alpha = 0.5) +
  geom_text(aes(label = scales::comma(n)), vjust = -0.5, size = 3) +
  geom_text(aes(label = scales::percent(n/n[stage == "Started longbase intervention"])), vjust = 2, size = 2) +
  coord_cartesian(ylim = c(0, NA)) +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Chatbot Funnel SMS",
       subtitle = "Number of Participants at Different Stages",
       x = "Funnel stage",
       y = "Number of observations") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Check completion times

# payment end time is almost always missing, so we don't analyze payment time
# misinfo chat duration is always 0, so we don't show it
df_duration2 <- dfphase2 %>% 
  mutate(shortbase_duration = InterArmsBase_end_time - InterArmsBase_start_time,
         longbase_duration = InterArmsLongBase_end_time - InterArmsLongBase_start_time,
         sms_duration = InterArmsSMS_end_time - InterArmsSMS_start_time,
         misinfo_quiz_duration = misinfoQuiz_end_time - MisinfoQuiz_start_time) %>%
  select(contains("duration")) %>% 
  pivot_longer(cols = everything(),
               names_pattern = "(.*)_duration",
               names_to = "stage",
               values_to = "time") %>%
  mutate(stage = factor(stage, labels = unique(stage), levels = unique(stage))) %>%
  filter(!is.na(time))
# strangely, there are some negative durations, we filter those
df_duration2 <- df_duration2 %>%
  filter(time >= 0)
ggplot(df_duration2,
       mapping = aes(x = time, group = stage, color = stage)) +
  geom_histogram() +
  facet_wrap(~ stage, scales = "free") +
  labs(x = "time in minutes")

ggplot(df_duration2,
       mapping = aes(y = time, x = stage, color = stage)) +
  geom_boxplot() +
  # restrict scale to focus on observations that are not outliers
  scale_y_continuous(limits = c(0, 150)) +
  labs(y = "time in minutes")

df_duration2 %>%
  group_by(stage) %>% 
  summarize(median_duration = median(time),
            avg_duration = mean(time),
            sd_duration = sd(time),
            min_duration = min(time),
            max_duration = max(time)) %>%
  kable(digits = 3,caption = "Phase 2: Duration of Interventions", 
        col.names = c("Stage", "Median", "Average", "SD", "Min","Max")) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>% collapse_rows() %>%
  scroll_box( height = "500px")  
Phase 2: Duration of Interventions
Stage Median Average SD Min Max
shortbase 10.550 mins 122.126 mins 1377.030 4.217 mins 39524.40 mins
longbase 17.533 mins 153.456 mins 1189.853 3.967 mins 41738.68 mins
sms 16.650 mins 167.512 mins 1469.652 5.683 mins 47502.13 mins
misinfo_quiz 6.150 mins 108.910 mins 1242.914 1.200 mins 44377.45 mins

Mean Survey Outcomes

Sharing discernment is defined as the difference in proportion between non-misinformation posts and misinformation posts shared. For example, if user shared 2 out of 2 non-misinformation posts (i.e. \(1\)) and 1 out of 3 misinformation posts (i.e. \(\frac{1}{3}\)), discernment takes a value of \(\frac{2}{3}\).

Reliability and Manipulation discernment are constructed in such a way that misinformation and non-misinformation are assigned an equal weight. Since each user encounters 3 misinformation posts and 2 non-misinformation posts, non-misinformation is reweighted by \(\frac{3}{2}\)


\[ Sharing Discernemnt = \frac{\textit{# Non-Misinformation posts Shared}}{3} - \frac{\textit{# Misinformation posts Shared}}{2} \]

\[ \text{Reliability Discernment:} = \begin{cases} \textit{Reliability score} \times \frac{3}{2}, & \text{if non-misinformation} \\ \textit{Reliability score} \times -1, & \text{if misinformation} \\ 0, & \text{otherwise} \end{cases} \]


\[ \text{Manipulation Discernment:} = \begin{cases} \textit{Manipulation score}\times \frac{3}{2}, & \text{if non-misinformation} \\ \textit{Manipulation score} \times -1, & \text{if misinformation} \\ 0, & \text{otherwise} \end{cases} \]

Phase 1

df_chat <- df %>%
  select(
    -matches("manipulative_coded"),
    -matches("reliable_coded"),
    -matches("manipulative_coded_num"),
    -matches("reliable_coded_num"),
    -matches("share_coded")
  ) %>%
   rename(condition = arm_coded) %>%
  filter(phase_coded=="Phase 1") 


df_chat <- df_chat %>% 
  mutate(treatments = ifelse(condition != "Baseline 1",1,0),
         completed = ifelse(!is.na(baseline_learn),baseline_learn,
                            ifelse(!is.na(game_learn),game_learn,
                                   ifelse(!is.na(sms_learn),sms_learn,
                                          ifelse(!is.na(Video_learn),Video_learn,NA))))) %>%
  mutate(completed_coded = ifelse(!is.na(completed),1,0)) %>%
  mutate(time_completion_survey = misinfoQuiz_end_time - `signed up`) %>%
  mutate(completed_survey = ifelse(!is.na(time_completion_survey),1,0)) %>%
  mutate(completed_coded =  ifelse(!is.na(baseline_learn),baseline_learn,
                                   ifelse(!is.na(game_learn),game_learn,
                                          ifelse(!is.na(sms_learn),sms_learn,
                                                 ifelse(!is.na(Video_learn),Video_learn,NA))))) %>% 
  mutate(completed_binary = ifelse(!is.na(completed_coded),1,0))

# 12 questions asking about manipulative or reliable content 
unique_responses_manip_reliable <- df_chat %>% 
  select(contains("_manipulative"), ends_with("reliable")) %>% 
  unlist() %>% 
  unique()

unique_responses_share <- df %>% 
  select(contains("_share")) %>% 
  unlist() %>% 
  unique()

valid_responses_5point <- c("1 - Not at all", "2", "3", "4", "5 - Very", 1, 2, 3, 4, 5, "5- Very", "5-Very", "Very5", "5 very", "1-Not at all")
valid_responses_share <- c("Yes", "No", "yes", "no", "YES")

df_chat <- df_chat %>% 
  mutate_at(.vars = vars(contains(c("_manipulative"))),
            .funs = funs(case_when(. %in% valid_responses_5point ~ .,
                                   TRUE ~ NA_character_))) %>% 
  mutate_at(.vars = vars(ends_with(c("_reliable"))), # ends_with to exclude manipulative_reliable_order
            .funs = funs(case_when(. %in% valid_responses_5point ~ .,
                                   TRUE ~ NA_character_))) %>% 
  mutate_at(.vars = vars(contains(c("_share"))),
            .funs = funs(case_when(. %in% valid_responses_share ~ .,
                                   TRUE ~ NA_character_)))

df_particpants_posts <- df_chat %>% 
  pivot_longer(cols = contains("post_id"),
               names_pattern = "post_id_(.*)",
               names_to = "post_id",
               values_to = "post") %>% 
  select(anon_id, post_id, post)

df_outcomes <- df_chat %>% 
  pivot_longer(cols = c(contains(c("_manipulative", "_share")), ends_with("_reliable")),
               names_to = c("post_id", "variable"),
               names_pattern = "post_(.*)_(.*)") %>% 
  pivot_wider(names_from = variable,
              values_from = value)

df_chat_long <- left_join(df_outcomes %>% select(-contains("post_id_")), df_particpants_posts, by = c("anon_id", "post_id"))


# recode outcomes to numeric
df_chat_long <- df_chat_long %>% 
  mutate_at(.vars = vars(manipulative, reliable),
            .funs = funs(case_when(. == "1 - Not at all" | . == "1-Not at all" ~ 1,
                                   . == "5 - Very" | . == "5- Very" | . == "5-Very" | . ==  "Very5" | . == "5 very" ~ 5,
                                   TRUE ~ as.numeric(.)))) %>% 
  mutate(share = case_when(share == "Yes" | share == "yes" | share == "YES" ~ 1, 
                           share == "No" | share == "no" ~ 0))
# Table 2
#Table with columns ("# of Obs", "Misinfo Sharing", "Non-misinfo Sharing", "Discernment", "Misinfo Reliability", "Non-misinfo Reliability", "Discernment", "Misinfo Manip", "Non-misinfo Manip", "Discernment") that report the mean of survey outcome for completers above and the standard error in parentheses, and rows ("All", "Placebo", "Treatments", "Text Course", "Videos", "Game")

# restrict to completers
df_completed <- df_chat %>% 
  filter(completed_binary == 1 & consent == "I consent, start now")


df_completed_long <- df_completed %>% 
  pivot_longer(cols = c(contains(c("_manipulative", "_share")), ends_with("_reliable")),
               names_to = c("post_id", "variable"),
               names_pattern = "post_(.*)_(.*)") %>%
 pivot_wider(names_from = variable,
              values_from = value) 

df_particpants_posts <- df_completed %>% 
  pivot_longer(cols = contains("post_id"),
               names_pattern = "post_id_(.*)",
               names_to = "post_id",
               values_to = "post") %>% 
  select(anon_id, post_id, post)

df_completed_long <- left_join(df_completed_long %>% select(-contains("post_id_")), df_particpants_posts, by = c("anon_id", "post_id"))


# recode outcomes to numeric
df_completed_long <- df_completed_long %>% 
  mutate_at(.vars = vars(manipulative, reliable),
            .funs = funs(case_when(. == "1 - Not at all" | . == "1-Not at all" ~ 1,
                                   . == "5 - Very" | . == "5- Very" | . == "5-Very" | . ==  "Very5" | . == "5 very" ~ 5,
                                   TRUE ~ as.numeric(.)))) %>% 
  mutate(share = case_when(share == "Yes" | share == "yes" | share == "YES" ~ 1, 
                           share == "No" | share == "no" ~ 0))
# shares can be codified to include other responses e.g. "Yeah" or "No I will not share"

# Create all and substitute placebo 
df_completed_long$condition[df_completed_long$condition == "Baseline 1"] <- "placebo"
df_completed_long$treatments <- ifelse(df_completed_long$condition != "placebo",1,0)


# Post contains misinformation
df_completed_long$post_contains_misinfo <- ifelse(!grepl("^attention_check|^non_misinfo", df_completed_long$post),1,0)
df_completed_long$post_contains_misinfo[grepl("^attention_check",df_completed_long$post)==TRUE] <- NA
df_completed_long$post_contains_misinfo[is.na(df_completed_long$post)] <- NA

# Post contains non-misinformation
df_completed_long$post_contains_nonmisinfo <- ifelse(grepl("non_misinfo", df_completed_long$post),1,0)
df_completed_long$post_contains_nonmisinfo[grepl("^attention_check",df_completed_long$post)==TRUE] <- NA
df_completed_long$post_contains_nonmisinfo[is.na(df_completed_long$post)] <- NA


# Equal weighting throughout
# number of misinformation posts user encountered
completed_coded <- df_completed_long %>%
  group_by(anon_id,post_contains_misinfo,post_contains_nonmisinfo) %>%
  summarise(posts_misinfo = ifelse(post_contains_misinfo==1,n(),NA),
            posts_nonmisinfo = ifelse(post_contains_nonmisinfo==1,n(),NA)) %>%
  ungroup() %>%
  select(anon_id,posts_misinfo,posts_nonmisinfo) %>%
  distinct() %>%
  filter(!is.na(posts_misinfo) | !is.na(posts_nonmisinfo))

# Discernment: sharing
# every user sees 3 misinfo posts and 2 non-misinfo
df_completed_long$disc_sharing = 
  ifelse(df_completed_long$post_contains_nonmisinfo==1 & df_completed_long$share==1,1*(3/2),
         ifelse(df_completed_long$post_contains_misinfo==1 & df_completed_long$share==1,-1,0))

# Discernment: reliability
df_completed_long$disc_reliability = 
  ifelse(df_completed_long$post_contains_nonmisinfo==1,df_completed_long$reliable*(3/2),
         ifelse(df_completed_long$post_contains_misinfo==1,-1*df_completed_long$reliable,0))

# Discernment: manipulation
df_completed_long$disc_manipulation = 
  ifelse(df_completed_long$post_contains_nonmisinfo==1,-1*df_completed_long$manipulative,
         ifelse(df_completed_long$post_contains_misinfo==1,df_completed_long$manipulative*(3/2),0))


# "Misinfo Sharing"
misinfo_sharing_ind <- df_completed_long %>%
  filter(post_contains_misinfo == 1) %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(share_rate = mean(share, na.rm = TRUE), nobs = n(), 
            se = se_binary(share, na.rm = TRUE))

table2_misinfo_sharing <- misinfo_sharing_ind %>%
  group_by(condition) %>%
  summarise(mean = mean(share_rate, na.rm = TRUE), nobs = n(), 
            se = se_cont(share_rate, na.rm = TRUE)) %>%
  filter(!is.na(condition))

aux_all <- misinfo_sharing_ind %>%
  summarise(condition = "All",
            mean = mean(share_rate, na.rm = TRUE),nobs = n(),
            se = se_cont(share_rate, na.rm = TRUE))

aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(misinfo_sharing_ind$share_rate, na.rm = TRUE),
                      "nobs" = nrow(misinfo_sharing_ind[misinfo_sharing_ind$condition %in% c("placebo", "Game", "Video", "SMS"), ]),
                      "se" = se_cont(misinfo_sharing_ind$share_rate, na.rm = TRUE))

aux_treatments <- misinfo_sharing_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(share_rate, na.rm = TRUE),
            nobs = n(),
            se = se_cont(share_rate, na.rm = TRUE))
table2_misinfo_sharing <- bind_rows(aux_all,aux_treatments,table2_misinfo_sharing)

# "Non-misinfo Sharing"
nonmisinfo_sharing_ind <- df_completed_long %>%
  filter(post_contains_nonmisinfo == 1) %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(share_rate = mean(share, na.rm = TRUE), nobs = n(), 
            se = se_binary(share, na.rm = TRUE))

table2_nonmisinfo_sharing <- nonmisinfo_sharing_ind %>%
  group_by(condition) %>%
  summarise(mean = mean(share_rate, na.rm = TRUE), nobs = n(), 
            se = se_cont(share_rate, na.rm = TRUE)) %>%
  filter(!is.na(condition))

aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(nonmisinfo_sharing_ind$share_rate, na.rm = TRUE),
                      "nobs" = nrow(nonmisinfo_sharing_ind[nonmisinfo_sharing_ind$condition %in%c("placebo", "Game", "Video", "SMS"), ]),
                      "se" = se_cont(nonmisinfo_sharing_ind$share_rate, na.rm = TRUE))

aux_treatments <- nonmisinfo_sharing_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(share_rate, na.rm = TRUE),nobs = n(),
            se = se_cont(share_rate, na.rm = TRUE))

table2_nonmisinfo_sharing <- bind_rows(aux_all,aux_treatments,table2_nonmisinfo_sharing)

# "Discernment"

# Discernment defined as Prop. Non-Misinfo Shared - Prop. Misinfo shared (i.e., equal weighting).
sharing_ind <- df_completed_long %>%
  select(anon_id, condition, treatments, post_contains_misinfo, post_contains_nonmisinfo, share, post) %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(
    posts_misinfo_share = sum(share == 1 & post_contains_misinfo == 1, na.rm = TRUE)/3,
    posts_nonmisinfo_share = sum(share == 1 & post_contains_nonmisinfo == 1, na.rm = TRUE)/2) %>%
  ungroup()

sharing_ind$disc_diff_prop = sharing_ind$posts_nonmisinfo_share - sharing_ind$posts_misinfo_share

sharing_disc_ind <- sharing_ind %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(disc_rate = mean(disc_diff_prop, na.rm = TRUE), nobs = n(), 
            se = se_cont(disc_diff_prop, na.rm = TRUE))

table2_disc_sharing <- aggregate(disc_rate~condition,sharing_disc_ind, mean)
names(table2_disc_sharing) <- c("condition","mean")
table2_disc_sharing$nobs <- c(seq(nrow(sharing_disc_ind),length(sharing_disc_ind$condition)))
table2_disc_sharing$se <- aggregate(disc_rate~condition,sharing_disc_ind, se_cont)[,2]

# all 
aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(sharing_disc_ind$disc_rate, na.rm = TRUE),
                      "nobs" = nrow(sharing_disc_ind[sharing_disc_ind$condition %in% c("placebo", "Game", "Video", "SMS"), ]),
                      "se" = se_cont(sharing_disc_ind$disc_rate, na.rm = TRUE))

# treatments 
aux_treatments <- sharing_disc_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(disc_rate, na.rm = TRUE), 
            nobs = n(),
            se = se_cont(disc_rate, na.rm = TRUE))

table2_disc_sharing <- bind_rows(aux_all,aux_treatments,table2_disc_sharing)


# "Misinfo Reliability"
misinfo_reliability_ind <- df_completed_long %>%
  filter(post_contains_misinfo == 1) %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(reliable_rate = mean(reliable, na.rm = TRUE), nobs = n(), 
            se = se_cont(reliable, na.rm = TRUE))

table2_misinfo_reliability <- misinfo_reliability_ind %>%
  group_by(condition) %>%
  summarise(mean = mean(reliable_rate, na.rm = TRUE), nobs = n(), 
            se = se_cont(reliable_rate, na.rm = TRUE)) %>%
  filter(!is.na(condition))

aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(misinfo_reliability_ind$reliable_rate, na.rm = TRUE),
                      "nobs" = nrow(misinfo_reliability_ind[misinfo_reliability_ind$condition %in% c("placebo", "Game", "Video", "SMS"), ]),
                      "se" = se_cont(misinfo_reliability_ind$reliable_rate, na.rm = TRUE))

aux_treatments <- misinfo_reliability_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(reliable_rate, na.rm = TRUE),
            nobs = n(),
            se = se_cont(reliable_rate, na.rm = TRUE))
table2_misinfo_reliability <- bind_rows(aux_all,aux_treatments,table2_misinfo_reliability)

# "NonMisinfo Reliability"

nonmisinfo_reliability_ind <- df_completed_long %>%
  filter(post_contains_nonmisinfo == 1) %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(reliable_rate = mean(reliable, na.rm = TRUE), nobs = n(), 
            se = se_cont(reliable, na.rm = TRUE))

table2_nonmisinfo_reliability <- nonmisinfo_reliability_ind %>%
  group_by(condition) %>%
  summarise(mean = mean(reliable_rate, na.rm = TRUE), nobs = n(), 
            se = se_cont(reliable_rate, na.rm = TRUE)) %>%
  filter(!is.na(condition))

aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(nonmisinfo_reliability_ind$reliable_rate, na.rm = TRUE),
                      "nobs" = nrow(nonmisinfo_reliability_ind[nonmisinfo_reliability_ind$condition %in% c("placebo", "Game", "Video", "SMS"), ]),
                      "se" = se_cont(nonmisinfo_reliability_ind$reliable_rate, na.rm = TRUE))

aux_treatments <- nonmisinfo_reliability_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(reliable_rate, na.rm = TRUE),nobs = n(),
            se = se_cont(reliable_rate, na.rm = TRUE))
table2_nonmisinfo_reliability <- bind_rows(aux_all,aux_treatments,table2_nonmisinfo_reliability)

# "Discernment"
reliability_disc_ind <- df_completed_long %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(disc_rate = mean(disc_reliability, na.rm = TRUE), nobs = n(), 
            se = se_cont(disc_reliability, na.rm = TRUE))

table2_disc_reliability <- reliability_disc_ind %>%
  group_by(condition) %>%
  summarise(mean = mean(disc_rate, na.rm = TRUE), nobs = n(), 
            se = se_cont(disc_rate, na.rm = TRUE)) %>%
  filter(!is.na(condition))

aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(reliability_disc_ind$disc_rate, na.rm = TRUE),
                      "nobs" = nrow(reliability_disc_ind[reliability_disc_ind$condition %in% c("placebo", "Game", "Video", "SMS"), ]),
                      "se" = se_cont(reliability_disc_ind$disc_rate, na.rm = TRUE))

aux_treatments <- reliability_disc_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(disc_rate, na.rm = TRUE),nobs = n(),
            se = se_cont(disc_rate, na.rm = TRUE))

table2_disc_reliability <- bind_rows(aux_all,aux_treatments,table2_disc_reliability)

# "Misinfo Manipulation"
misinfo_manipulation_ind <- df_completed_long %>%
  filter(post_contains_misinfo == 1) %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(manipulation_rate = mean(manipulative, na.rm = TRUE), nobs = n(), 
            se = se_cont(manipulative, na.rm = TRUE))

table2_misinfo_manipulation <- misinfo_manipulation_ind %>%
  group_by(condition) %>%
  summarise(mean = mean(manipulation_rate, na.rm = TRUE), nobs = n(), 
            se = se_cont(manipulation_rate, na.rm = TRUE)) %>%
  filter(!is.na(condition))

aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(misinfo_manipulation_ind$manipulation_rate, na.rm = TRUE),
                      "nobs" = nrow(misinfo_manipulation_ind[misinfo_manipulation_ind$condition %in% c("placebo", "Game", "Video", "SMS"), ]),
                      "se" = se_cont(misinfo_manipulation_ind$manipulation_rate, na.rm = TRUE))

aux_treatments <- misinfo_manipulation_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(manipulation_rate, na.rm = TRUE),
            nobs = n(),
            se = se_cont(manipulation_rate, na.rm = TRUE))
table2_misinfo_manipulation <- bind_rows(aux_all,aux_treatments,table2_misinfo_manipulation)

# "NonMisinfo Manipulation"

nonmisinfo_manipulation_ind <- df_completed_long %>%
  filter(post_contains_nonmisinfo == 1) %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(manipulation_rate = mean(manipulative, na.rm = TRUE), nobs = n(), 
            se = se_cont(manipulative, na.rm = TRUE))

table2_nonmisinfo_manipulation <- nonmisinfo_manipulation_ind %>%
  group_by(condition) %>%
  summarise(mean = mean(manipulation_rate, na.rm = TRUE), nobs = n(), 
            se = se_cont(manipulation_rate, na.rm = TRUE)) %>%
  filter(!is.na(condition))

aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(nonmisinfo_manipulation_ind$manipulation_rate, na.rm = TRUE),
                      "nobs" = nrow(nonmisinfo_manipulation_ind[nonmisinfo_manipulation_ind$condition %in% c("placebo", "Game", "Video", "SMS"), ]),
                      "se" = se_cont(nonmisinfo_manipulation_ind$manipulation_rate, na.rm = TRUE))

aux_treatments <- nonmisinfo_manipulation_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(manipulation_rate, na.rm = TRUE),
            nobs = n(),
            se = se_cont(manipulation_rate, na.rm = TRUE))
table2_nonmisinfo_manipulation <- bind_rows(aux_all,aux_treatments,table2_nonmisinfo_manipulation)


# "Discernment"
manipulation_disc_ind <- df_completed_long %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(disc_rate = mean(disc_manipulation, na.rm = TRUE), nobs = n(), 
            se = se_cont(disc_manipulation, na.rm = TRUE))

table2_disc_manipulation <- manipulation_disc_ind %>%
  group_by(condition) %>%
  summarise(mean = mean(disc_rate, na.rm = TRUE), nobs = n(), 
            se = se_cont(disc_rate, na.rm = TRUE)) %>%
  filter(!is.na(condition))

aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(manipulation_disc_ind$disc_rate, na.rm = TRUE),
                      "nobs" = nrow(manipulation_disc_ind[manipulation_disc_ind$condition %in% c("placebo", "Game", "Video", "SMS"), ]),
                      "se" = se_cont(manipulation_disc_ind$disc_rate, na.rm = TRUE))

aux_treatments <- manipulation_disc_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(disc_rate, na.rm = TRUE),nobs = n(),
            se = se_cont(disc_rate, na.rm = TRUE))
table2_disc_manipulation <- bind_rows(aux_all,aux_treatments,table2_disc_manipulation)



# Final table
table2 <- data.frame("condition" = table2_misinfo_sharing$condition,
                     "# of Obs" = table2_misinfo_sharing$nobs,
                     "Misinfo Sharing" = table2_misinfo_sharing$mean,
                     "Non-misinfo Sharing" = table2_nonmisinfo_sharing$mean,
                     "Discernment Sharing" = table2_disc_sharing$mean,
                     "Misinfo Reliability" = table2_misinfo_reliability$mean,
                     "Non-misinfo Reliability" = table2_nonmisinfo_reliability$mean,
                     "Discernment Reliability" = table2_disc_reliability$mean,
                     "Misinfo Manipulation" = table2_misinfo_manipulation$mean,
                     "Non-misinfo Manipulation" = table2_nonmisinfo_manipulation$mean,
                     "Discernment Manipulation" = table2_disc_manipulation$mean)
table2_se <- data.frame("condition" = table2_misinfo_sharing$condition,
                      "# of Obs" = table2_misinfo_sharing$nobs,
                     "Misinfo Sharing" = table2_misinfo_sharing$se,
                     "Non-misinfo Sharing" = table2_nonmisinfo_sharing$se,
                     "Discernment Sharing" = table2_disc_sharing$se,
                     "Misinfo Reliability" = table2_misinfo_reliability$se,
                     "Non-misinfo Reliability" = table2_nonmisinfo_reliability$se,
                     "Discernment Reliability" = table2_disc_reliability$se,
                     "Misinfo Manipulation" = table2_misinfo_manipulation$se,
                     "Non-misinfo Manipulation" = table2_nonmisinfo_manipulation$se,
                     "Discernment Manipulation" = table2_disc_manipulation$se)


# add parentheses to s.e.
#order_vector <- c("20,689", " 6,078", "14,611", " 5,083", " 4,904"," 4,624") 
#table2_se <- table2_se[order(match(table2_se$X..of.Obs, order_vector)), ]
#table2_se$order <- seq(nrow(table2_se))
table2_se <- table2_se %>% mutate(order = ifelse(grepl("All",condition), 1,
                  ifelse(grepl("placebo",condition), 2,
                  ifelse(grepl("Treatments",condition), 3,
                  ifelse(grepl("SMS",condition), 4,
                  ifelse(grepl("Video",condition), 5,
                  ifelse(grepl("Game",condition), 6,NA)))))))
table2_se$priority <- 2
table2_se$X..of.Obs <- NA
table2_se[,1:11] <- data.frame(lapply(table2_se[,1:11], function(col) sapply(col, add_parentheses)))

# format table 2
table2 <- data.frame(lapply(table2, function(x) format(x, big.mark = ",", digits = 3)))
#table2 <- table2[order(match(table2$X..of.Obs, order_vector)), ]
#table2$order <- seq(nrow(table2))
table2 <- table2 %>% mutate(order = ifelse(grepl("All",condition), 1,
                  ifelse(grepl("placebo",condition), 2,
                  ifelse(grepl("Treatments",condition), 3,
                  ifelse(grepl("SMS",condition), 4,
                  ifelse(grepl("Video",condition), 5,
                  ifelse(grepl("Game",condition), 6,NA)))))))
table2$priority <- 1


# bind rows
names(table2) <- c("condition","# of Obs", "Misinfo Sharing", "Non-misinfo Sharing","Discernment Sharing", "Misinfo Reliability", "Non-misinfo Reliability","Discernment Reliability","Misinfo Manipulation", "Non-misinfo Manipulation","Discernment Manipulation","order","priority")
names(table2_se) <- names(table2)
table <- rbind(table2,table2_se)
table <- table[order(table$order, table$priority), ]
table <- table %>% select(-c(order,priority))
table <- table %>% select(-c(condition))
# remove NA
table <- data.frame(lapply(table, function(x) {
  x <- as.character(x)
  x[is.na(x)] <- ""
  gsub("NA", "", x)
}))

names(table) <- names(table2)[2:11]
rownames(table) <- c("All","\u200B","Placebo","\u200B\u200B","Treatments","\u200B\u200B\u200B","Text Course","\u200B\u200B\u200B\u200B","Videos","\u200B\u200B\u200B\u200B\u200B","Game","\u200B\u200B\u200B\u200B\u200B\u200B")

kable(table,format = "markdown", digits = 3) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%", height = "500px")
# of Obs Misinfo Sharing Non-misinfo Sharing Discernment Sharing Misinfo Reliability Non-misinfo Reliability Discernment Reliability Misinfo Manipulation Non-misinfo Manipulation Discernment Manipulation
All 20,412 0.604 0.750 0.1378 2.64 3.05 0.244 2.86 2.48 1.56
(0.0027) (0.0025) (0.0028) (0.0092) (0.0100) (0.0082) (0.0095) (0.0098) (0.0094)
Placebo 5,997 0.640 0.764 0.1300 2.81 3.17 0.218 2.86 2.55 1.54
​​ (0.0049) (0.0045) (0.0054) (0.0174) (0.0185) (0.0154) (0.0174) (0.0185) (0.0176)
Treatments 14,415 0.589 0.744 0.1464 2.58 3.00 0.255 2.86 2.45 1.57
​​​ (0.0033) (0.0030) (0.0033) (0.0108) (0.0118) (0.0098) (0.0113) (0.0115) (0.0111)
Text Course 5,009 0.381 0.640 0.1172 2.21 2.87 0.397 3.06 2.48 1.74
​​​​ (0.0052) (0.0055) (0.0050) (0.0165) (0.0195) (0.0160) (0.0192) (0.0194) (0.0192)
Videos 4,855 0.671 0.806 0.2443 2.76 3.16 0.244 2.80 2.43 1.54
​​​​​ (0.0053) (0.0047) (0.0059) (0.0189) (0.0204) (0.0161) (0.0191) (0.0199) (0.0179)
Game 4,551 0.736 0.794 0.0559 2.79 2.98 0.107 2.68 2.44 1.41
​​​​​​ (0.0050) (0.0050) (0.0055) (0.0198) (0.0213) (0.0185) (0.0196) (0.0206) (0.0202)
# make it tex code
table$col_1 <- c("All","\u200B","Placebo","\u200B\u200B","Treatments","\u200B\u200B\u200B","Text Course","\u200B\u200B\u200B\u200B","Videos","\u200B\u200B\u200B\u200B\u200B","Game","\u200B\u200B\u200B\u200B\u200B\u200B")
table <- table[, c("col_1", setdiff(names(table), "col_1"))]
latex_code <- paste0(apply(table, 1, function(row) paste(row, collapse = " & ")), " \\\\\n")
latex_code <- c(paste("","\\# of Obs", "Misinfo Sharing", "Non-misinfo Sharing","Discernment Sharing", "Misinfo Reliability", "Non-misinfo Reliability","Discernment Reliability","Misinfo Manipulation", "Non-misinfo Manipulation","Discernment Manipulation \\\\ \\hline ", sep = " & "),latex_code)
latex_code <- gsub("\\[\\d+\\]", "", latex_code)
# save .tex
write.table(latex_code, file = "table2.tex", sep = " & ", quote = FALSE, row.names = FALSE, col.names = FALSE)

Phase 2

df_chat <- df %>%
  select(
    -matches("manipulative_coded"),
    -matches("reliable_coded"),
    -matches("manipulative_coded_num"),
    -matches("reliable_coded_num"),
    -matches("share_coded")
  ) %>%
   rename(condition = arm_coded) %>%
  filter(phase_coded=="Phase 2") 


df_chat <- df_chat %>% 
  mutate(treatments = ifelse(condition != "Short Baseline 2",1,0),
         completed = ifelse(!is.na(baseline_learn),baseline_learn,
                                   ifelse(!is.na(sms_learn),sms_learn,
                                          NA))) %>%
  mutate(completed_coded = ifelse(!is.na(completed),1,0)) %>%
  mutate(time_completion_survey = misinfoQuiz_end_time - `signed up`) %>%
  mutate(completed_survey = ifelse(!is.na(time_completion_survey),1,0)) %>%
  mutate(completed_coded =  ifelse(!is.na(baseline_learn),baseline_learn,
                                          ifelse(!is.na(sms_learn),sms_learn,
                                                 NA))) %>% 
  mutate(completed_binary = ifelse(!is.na(completed_coded),1,0))

# 12 questions asking about manipulative or reliable content 
unique_responses_manip_reliable <- df_chat %>% 
  select(contains("_manipulative"), ends_with("reliable")) %>% 
  unlist() %>% 
  unique()

unique_responses_share <- df %>% 
  select(contains("_share")) %>% 
  unlist() %>% 
  unique()

valid_responses_5point <- c("1 - Not at all", "2", "3", "4", "5 - Very", 1, 2, 3, 4, 5, "5- Very", "5-Very", "Very5", "5 very", "1-Not at all")
valid_responses_share <- c("Yes", "No", "yes", "no", "YES")

df_chat <- df_chat %>% 
  mutate_at(.vars = vars(contains(c("_manipulative"))),
            .funs = funs(case_when(. %in% valid_responses_5point ~ .,
                                   TRUE ~ NA_character_))) %>% 
  mutate_at(.vars = vars(ends_with(c("_reliable"))), # ends_with to exclude manipulative_reliable_order
            .funs = funs(case_when(. %in% valid_responses_5point ~ .,
                                   TRUE ~ NA_character_))) %>% 
  mutate_at(.vars = vars(contains(c("_share"))),
            .funs = funs(case_when(. %in% valid_responses_share ~ .,
                                   TRUE ~ NA_character_)))

df_particpants_posts <- df_chat %>% 
  pivot_longer(cols = contains("post_id"),
               names_pattern = "post_id_(.*)",
               names_to = "post_id",
               values_to = "post") %>% 
  select(anon_id, post_id, post)

df_outcomes <- df_chat %>% 
  pivot_longer(cols = c(contains(c("_manipulative", "_share")), ends_with("_reliable")),
               names_to = c("post_id", "variable"),
               names_pattern = "post_(.*)_(.*)") %>% 
  pivot_wider(names_from = variable,
              values_from = value)

df_chat_long <- left_join(df_outcomes %>% select(-contains("post_id_")), df_particpants_posts, by = c("anon_id", "post_id"))


# recode outcomes to numeric
df_chat_long <- df_chat_long %>% 
  mutate_at(.vars = vars(manipulative, reliable),
            .funs = funs(case_when(. == "1 - Not at all" | . == "1-Not at all" ~ 1,
                                   . == "5 - Very" | . == "5- Very" | . == "5-Very" | . ==  "Very5" | . == "5 very" ~ 5,
                                   TRUE ~ as.numeric(.)))) %>% 
  mutate(share = case_when(share == "Yes" | share == "yes" | share == "YES" ~ 1, 
                           share == "No" | share == "no" ~ 0))
# Table 2
#Table with columns ("# of Obs", "Misinfo Sharing", "Non-misinfo Sharing", "Discernment", "Misinfo Reliability", "Non-misinfo Reliability", "Discernment", "Misinfo Manip", "Non-misinfo Manip", "Discernment") that report the mean of survey outcome for completers above and the standard error in parentheses, and rows ("All", "Placebo", "Treatments", "Text Course", "Videos", "Game")

# restrict to completers
df_completed <- df_chat %>% 
  filter(completed_binary == 1 & consent == "I consent, start now")


df_completed_long <- df_completed %>% 
  pivot_longer(cols = c(contains(c("_manipulative", "_share")), ends_with("_reliable")),
               names_to = c("post_id", "variable"),
               names_pattern = "post_(.*)_(.*)") %>%
 pivot_wider(names_from = variable,
              values_from = value) 

df_particpants_posts <- df_completed %>% 
  pivot_longer(cols = contains("post_id"),
               names_pattern = "post_id_(.*)",
               names_to = "post_id",
               values_to = "post") %>% 
  select(anon_id, post_id, post)

df_completed_long <- left_join(df_completed_long %>% select(-contains("post_id_")), df_particpants_posts, by = c("anon_id", "post_id"))


# recode outcomes to numeric
df_completed_long <- df_completed_long %>% 
  mutate_at(.vars = vars(manipulative, reliable),
            .funs = funs(case_when(. == "1 - Not at all" | . == "1-Not at all" ~ 1,
                                   . == "5 - Very" | . == "5- Very" | . == "5-Very" | . ==  "Very5" | . == "5 very" ~ 5,
                                   TRUE ~ as.numeric(.)))) %>% 
  mutate(share = case_when(share == "Yes" | share == "yes" | share == "YES" ~ 1, 
                           share == "No" | share == "no" ~ 0))
# shares can be codified to include other responses e.g. "Yeah" or "No I will not share"

# Create all and substitute placebo 
df_completed_long$condition[df_completed_long$condition == "Short Baseline 2"] <- "placebo"
df_completed_long$treatments <- ifelse(df_completed_long$condition != "placebo",1,0)


# Post contains misinformation
df_completed_long$post_contains_misinfo <- ifelse(!grepl("^attention_check|^non_misinfo", df_completed_long$post),1,0)
df_completed_long$post_contains_misinfo[grepl("^attention_check",df_completed_long$post)==TRUE] <- NA
df_completed_long$post_contains_misinfo[is.na(df_completed_long$post)] <- NA

# Post contains non-misinformation
df_completed_long$post_contains_nonmisinfo <- ifelse(grepl("non_misinfo", df_completed_long$post),1,0)
df_completed_long$post_contains_nonmisinfo[grepl("^attention_check",df_completed_long$post)==TRUE] <- NA
df_completed_long$post_contains_nonmisinfo[is.na(df_completed_long$post)] <- NA


# Equal weighting throughout
# number of misinformation posts user encountered
completed_coded <- df_completed_long %>%
  group_by(anon_id,post_contains_misinfo,post_contains_nonmisinfo) %>%
  summarise(posts_misinfo = ifelse(post_contains_misinfo==1,n(),NA),
            posts_nonmisinfo = ifelse(post_contains_nonmisinfo==1,n(),NA)) %>%
  ungroup() %>%
  select(anon_id,posts_misinfo,posts_nonmisinfo) %>%
  distinct() %>%
  filter(!is.na(posts_misinfo) | !is.na(posts_nonmisinfo))

# Discernment: sharing
# every user sees 3 misinfo posts and 2 non-misinfo
df_completed_long$disc_sharing = 
  ifelse(df_completed_long$post_contains_nonmisinfo==1 & df_completed_long$share==1,1*(3/2),
         ifelse(df_completed_long$post_contains_misinfo==1 & df_completed_long$share==1,-1,0))

# Discernment: reliability
df_completed_long$disc_reliability = 
  ifelse(df_completed_long$post_contains_nonmisinfo==1,df_completed_long$reliable*(3/2),
         ifelse(df_completed_long$post_contains_misinfo==1,-1*df_completed_long$reliable,0))

# Discernment: manipulation
df_completed_long$disc_manipulation = 
  ifelse(df_completed_long$post_contains_nonmisinfo==1,-1*df_completed_long$manipulative,
         ifelse(df_completed_long$post_contains_misinfo==1,df_completed_long$manipulative*(3/2),0))


# "Misinfo Sharing"
misinfo_sharing_ind <- df_completed_long %>%
  filter(post_contains_misinfo == 1) %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(share_rate = mean(share, na.rm = TRUE), nobs = n(), 
            se = se_binary(share, na.rm = TRUE))

table2_misinfo_sharing <- misinfo_sharing_ind %>%
  group_by(condition) %>%
  summarise(mean = mean(share_rate, na.rm = TRUE), nobs = n(), 
            se = se_cont(share_rate, na.rm = TRUE)) %>%
  filter(!is.na(condition))

aux_all <- misinfo_sharing_ind %>%
  summarise(condition = "All",
            mean = mean(share_rate, na.rm = TRUE),nobs = n(),
            se = se_cont(share_rate, na.rm = TRUE))

aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(misinfo_sharing_ind$share_rate, na.rm = TRUE),
                      "nobs" = nrow(misinfo_sharing_ind[misinfo_sharing_ind$condition %in% c("placebo", "Long Baseline", "SMS"), ]),
                      "se" = se_cont(misinfo_sharing_ind$share_rate, na.rm = TRUE))

aux_treatments <- misinfo_sharing_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(share_rate, na.rm = TRUE),
            nobs = n(),
            se = se_cont(share_rate, na.rm = TRUE))
table2_misinfo_sharing <- bind_rows(aux_all,aux_treatments,table2_misinfo_sharing)

# "Non-misinfo Sharing"
nonmisinfo_sharing_ind <- df_completed_long %>%
  filter(post_contains_nonmisinfo == 1) %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(share_rate = mean(share, na.rm = TRUE), nobs = n(), 
            se = se_binary(share, na.rm = TRUE))

table2_nonmisinfo_sharing <- nonmisinfo_sharing_ind %>%
  group_by(condition) %>%
  summarise(mean = mean(share_rate, na.rm = TRUE), nobs = n(), 
            se = se_cont(share_rate, na.rm = TRUE)) %>%
  filter(!is.na(condition))

aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(nonmisinfo_sharing_ind$share_rate, na.rm = TRUE),
                      "nobs" = nrow(nonmisinfo_sharing_ind[nonmisinfo_sharing_ind$condition %in% c("placebo", "Long Baseline", "SMS"), ]),
                      "se" = se_cont(nonmisinfo_sharing_ind$share_rate, na.rm = TRUE))

aux_treatments <- nonmisinfo_sharing_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(share_rate, na.rm = TRUE),nobs = n(),
            se = se_cont(share_rate, na.rm = TRUE))

table2_nonmisinfo_sharing <- bind_rows(aux_all,aux_treatments,table2_nonmisinfo_sharing)

# "Discernment"

# Discernment defined as Prop. Non-Misinfo Shared - Prop. Misinfo shared (i.e., equal weighting).
sharing_ind <- df_completed_long %>%
  select(anon_id, condition, treatments, post_contains_misinfo, post_contains_nonmisinfo, share, post) %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(
    posts_misinfo_share = sum(share == 1 & post_contains_misinfo == 1, na.rm = TRUE)/3,
    posts_nonmisinfo_share = sum(share == 1 & post_contains_nonmisinfo == 1, na.rm = TRUE)/2) %>%
  ungroup()

sharing_ind$disc_diff_prop = sharing_ind$posts_nonmisinfo_share - sharing_ind$posts_misinfo_share

sharing_disc_ind <- sharing_ind %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(disc_rate = mean(disc_diff_prop, na.rm = TRUE), nobs = n(), 
            se = se_cont(disc_diff_prop, na.rm = TRUE))

table2_disc_sharing <- aggregate(disc_rate~condition,sharing_disc_ind, mean)
names(table2_disc_sharing) <- c("condition","mean")
table2_disc_sharing$nobs <- c(seq(nrow(sharing_disc_ind),length(sharing_disc_ind$condition)))
table2_disc_sharing$se <- aggregate(disc_rate~condition,sharing_disc_ind, se_cont)[,2]

# all 
aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(sharing_disc_ind$disc_rate, na.rm = TRUE),
                      "nobs" = nrow(sharing_disc_ind[sharing_disc_ind$condition %in% c("placebo", "Long Baseline", "SMS"), ]),
                      "se" = se_cont(sharing_disc_ind$disc_rate, na.rm = TRUE))

# treatments 
aux_treatments <- sharing_disc_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(disc_rate, na.rm = TRUE), 
            nobs = n(),
            se = se_cont(disc_rate, na.rm = TRUE))

table2_disc_sharing <- bind_rows(aux_all,aux_treatments,table2_disc_sharing)


# "Misinfo Reliability"
misinfo_reliability_ind <- df_completed_long %>%
  filter(post_contains_misinfo == 1) %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(reliable_rate = mean(reliable, na.rm = TRUE), nobs = n(), 
            se = se_cont(reliable, na.rm = TRUE))

table2_misinfo_reliability <- misinfo_reliability_ind %>%
  group_by(condition) %>%
  summarise(mean = mean(reliable_rate, na.rm = TRUE), nobs = n(), 
            se = se_cont(reliable_rate, na.rm = TRUE)) %>%
  filter(!is.na(condition))

aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(misinfo_reliability_ind$reliable_rate, na.rm = TRUE),
                      "nobs" = nrow(misinfo_reliability_ind[misinfo_reliability_ind$condition %in% c("placebo", "Long Baseline", "SMS"), ]),
                      "se" = se_cont(misinfo_reliability_ind$reliable_rate, na.rm = TRUE))

aux_treatments <- misinfo_reliability_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(reliable_rate, na.rm = TRUE),
            nobs = n(),
            se = se_cont(reliable_rate, na.rm = TRUE))
table2_misinfo_reliability <- bind_rows(aux_all,aux_treatments,table2_misinfo_reliability)

# "NonMisinfo Reliability"

nonmisinfo_reliability_ind <- df_completed_long %>%
  filter(post_contains_nonmisinfo == 1) %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(reliable_rate = mean(reliable, na.rm = TRUE), nobs = n(), 
            se = se_cont(reliable, na.rm = TRUE))

table2_nonmisinfo_reliability <- nonmisinfo_reliability_ind %>%
  group_by(condition) %>%
  summarise(mean = mean(reliable_rate, na.rm = TRUE), nobs = n(), 
            se = se_cont(reliable_rate, na.rm = TRUE)) %>%
  filter(!is.na(condition))

aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(nonmisinfo_reliability_ind$reliable_rate, na.rm = TRUE),
                      "nobs" = nrow(nonmisinfo_reliability_ind[nonmisinfo_reliability_ind$condition %in% c("placebo", "Long Baseline", "SMS"), ]),
                      "se" = se_cont(nonmisinfo_reliability_ind$reliable_rate, na.rm = TRUE))

aux_treatments <- nonmisinfo_reliability_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(reliable_rate, na.rm = TRUE),nobs = n(),
            se = se_cont(reliable_rate, na.rm = TRUE))
table2_nonmisinfo_reliability <- bind_rows(aux_all,aux_treatments,table2_nonmisinfo_reliability)

# "Discernment"
reliability_disc_ind <- df_completed_long %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(disc_rate = mean(disc_reliability, na.rm = TRUE), nobs = n(), 
            se = se_cont(disc_reliability, na.rm = TRUE))

table2_disc_reliability <- reliability_disc_ind %>%
  group_by(condition) %>%
  summarise(mean = mean(disc_rate, na.rm = TRUE), nobs = n(), 
            se = se_cont(disc_rate, na.rm = TRUE)) %>%
  filter(!is.na(condition))

aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(reliability_disc_ind$disc_rate, na.rm = TRUE),
                      "nobs" = nrow(reliability_disc_ind[reliability_disc_ind$condition %in% c("placebo", "Game", "Video", "SMS"), ]),
                      "se" = se_cont(reliability_disc_ind$disc_rate, na.rm = TRUE))

aux_treatments <- reliability_disc_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(disc_rate, na.rm = TRUE),nobs = n(),
            se = se_cont(disc_rate, na.rm = TRUE))

table2_disc_reliability <- bind_rows(aux_all,aux_treatments,table2_disc_reliability)

# "Misinfo Manipulation"
misinfo_manipulation_ind <- df_completed_long %>%
  filter(post_contains_misinfo == 1) %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(manipulation_rate = mean(manipulative, na.rm = TRUE), nobs = n(), 
            se = se_cont(manipulative, na.rm = TRUE))

table2_misinfo_manipulation <- misinfo_manipulation_ind %>%
  group_by(condition) %>%
  summarise(mean = mean(manipulation_rate, na.rm = TRUE), nobs = n(), 
            se = se_cont(manipulation_rate, na.rm = TRUE)) %>%
  filter(!is.na(condition))

aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(misinfo_manipulation_ind$manipulation_rate, na.rm = TRUE),
                      "nobs" = nrow(misinfo_manipulation_ind[misinfo_manipulation_ind$condition %in% c("placebo", "Long Baseline", "SMS"), ]),
                      "se" = se_cont(misinfo_manipulation_ind$manipulation_rate, na.rm = TRUE))

aux_treatments <- misinfo_manipulation_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(manipulation_rate, na.rm = TRUE),
            nobs = n(),
            se = se_cont(manipulation_rate, na.rm = TRUE))
table2_misinfo_manipulation <- bind_rows(aux_all,aux_treatments,table2_misinfo_manipulation)

# "NonMisinfo Manipulation"

nonmisinfo_manipulation_ind <- df_completed_long %>%
  filter(post_contains_nonmisinfo == 1) %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(manipulation_rate = mean(manipulative, na.rm = TRUE), nobs = n(), 
            se = se_cont(manipulative, na.rm = TRUE))

table2_nonmisinfo_manipulation <- nonmisinfo_manipulation_ind %>%
  group_by(condition) %>%
  summarise(mean = mean(manipulation_rate, na.rm = TRUE), nobs = n(), 
            se = se_cont(manipulation_rate, na.rm = TRUE)) %>%
  filter(!is.na(condition))

aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(nonmisinfo_manipulation_ind$manipulation_rate, na.rm = TRUE),
                      "nobs" = nrow(nonmisinfo_manipulation_ind[nonmisinfo_manipulation_ind$condition %in% c("placebo", "Long Baseline", "SMS"), ]),
                      "se" = se_cont(nonmisinfo_manipulation_ind$manipulation_rate, na.rm = TRUE))

aux_treatments <- nonmisinfo_manipulation_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(manipulation_rate, na.rm = TRUE),
            nobs = n(),
            se = se_cont(manipulation_rate, na.rm = TRUE))
table2_nonmisinfo_manipulation <- bind_rows(aux_all,aux_treatments,table2_nonmisinfo_manipulation)


# "Discernment"
manipulation_disc_ind <- df_completed_long %>%
  group_by(anon_id, condition, treatments) %>%
  summarise(disc_rate = mean(disc_manipulation, na.rm = TRUE), nobs = n(), 
            se = se_cont(disc_manipulation, na.rm = TRUE))

table2_disc_manipulation <- manipulation_disc_ind %>%
  group_by(condition) %>%
  summarise(mean = mean(disc_rate, na.rm = TRUE), nobs = n(), 
            se = se_cont(disc_rate, na.rm = TRUE)) %>%
  filter(!is.na(condition))

aux_all <- data.frame("condition" = "All", 
                      "mean" = mean(manipulation_disc_ind$disc_rate, na.rm = TRUE),
                      "nobs" = nrow(manipulation_disc_ind[manipulation_disc_ind$condition %in% c("placebo", "Long Baseline", "SMS"), ]),
                      "se" = se_cont(manipulation_disc_ind$disc_rate, na.rm = TRUE))

aux_treatments <- manipulation_disc_ind %>% 
  filter(treatments == 1) %>%
  group_by(treatments) %>%
  summarise(condition = "Treatments",
            mean = mean(disc_rate, na.rm = TRUE),nobs = n(),
            se = se_cont(disc_rate, na.rm = TRUE))
table2_disc_manipulation <- bind_rows(aux_all,aux_treatments,table2_disc_manipulation)



# Final table
table2 <- data.frame("condition" = table2_misinfo_sharing$condition,
                     "# of Obs" = table2_misinfo_sharing$nobs,
                     "Misinfo Sharing" = table2_misinfo_sharing$mean,
                     "Non-misinfo Sharing" = table2_nonmisinfo_sharing$mean,
                     "Discernment Sharing" = table2_disc_sharing$mean,
                     "Misinfo Reliability" = table2_misinfo_reliability$mean,
                     "Non-misinfo Reliability" = table2_nonmisinfo_reliability$mean,
                     "Discernment Reliability" = table2_disc_reliability$mean,
                     "Misinfo Manipulation" = table2_misinfo_manipulation$mean,
                     "Non-misinfo Manipulation" = table2_nonmisinfo_manipulation$mean,
                     "Discernment Manipulation" = table2_disc_manipulation$mean)
table2_se <- data.frame("condition" = table2_misinfo_sharing$condition,
                      "# of Obs" = table2_misinfo_sharing$nobs,
                     "Misinfo Sharing" = table2_misinfo_sharing$se,
                     "Non-misinfo Sharing" = table2_nonmisinfo_sharing$se,
                     "Discernment Sharing" = table2_disc_sharing$se,
                     "Misinfo Reliability" = table2_misinfo_reliability$se,
                     "Non-misinfo Reliability" = table2_nonmisinfo_reliability$se,
                     "Discernment Reliability" = table2_disc_reliability$se,
                     "Misinfo Manipulation" = table2_misinfo_manipulation$se,
                     "Non-misinfo Manipulation" = table2_nonmisinfo_manipulation$se,
                     "Discernment Manipulation" = table2_disc_manipulation$se)


# add parentheses to s.e.
#order_vector <- c("20,689", " 6,078", "14,611", " 5,083", " 4,904"," 4,624") 
#table2_se <- table2_se[order(match(table2_se$X..of.Obs, order_vector)), ]
#table2_se$order <- seq(nrow(table2_se))
table2_se <- table2_se %>% mutate(order = ifelse(grepl("All",condition), 1,
                  ifelse(grepl("placebo",condition), 2,
                  ifelse(grepl("Treatments",condition), 3,
                  ifelse(grepl("Long Baseline",condition), 4,
                  ifelse(grepl("SMS",condition), 5,
                  NA))))))
table2_se$priority <- 2
table2_se$X..of.Obs <- NA
table2_se[,1:11] <- data.frame(lapply(table2_se[,1:11], function(col) sapply(col, add_parentheses)))

# format table 2
table2 <- data.frame(lapply(table2, function(x) format(x, big.mark = ",", digits = 3)))
#table2 <- table2[order(match(table2$X..of.Obs, order_vector)), ]
#table2$order <- seq(nrow(table2))
table2 <- table2 %>% mutate(order = ifelse(grepl("All",condition), 1,
                  ifelse(grepl("placebo",condition), 2,
                  ifelse(grepl("Treatments",condition), 3,
                  ifelse(grepl("Long Baseline",condition), 4,
                  ifelse(grepl("SMS",condition), 5,
                  NA))))))
table2$priority <- 1


# bind rows
names(table2) <- c("condition","# of Obs", "Misinfo Sharing", "Non-misinfo Sharing","Discernment Sharing", "Misinfo Reliability", "Non-misinfo Reliability","Discernment Reliability","Misinfo Manipulation", "Non-misinfo Manipulation","Discernment Manipulation","order","priority")
names(table2_se) <- names(table2)
table <- rbind(table2,table2_se)
table <- table[order(table$order, table$priority), ]
table <- table %>% select(-c(order,priority))
table <- table %>% select(-c(condition))
# remove NA
table <- data.frame(lapply(table, function(x) {
  x <- as.character(x)
  x[is.na(x)] <- ""
  gsub("NA", "", x)
}))

names(table) <- names(table2)[2:11]
rownames(table) <- c("All","\u200B","Placebo","\u200B\u200B","Treatments","\u200B\u200B\u200B","Long Baseline","\u200B\u200B\u200B\u200B","Text Course","\u200B\u200B\u200B\u200B\u200B\u200B")

kable(table,format = "markdown", digits = 3) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%", height = "500px")
# of Obs Misinfo Sharing Non-misinfo Sharing Discernment Sharing Misinfo Reliability Non-misinfo Reliability Discernment Reliability Misinfo Manipulation Non-misinfo Manipulation Discernment Manipulation
All 30,974 0.516 0.707 0.183 2.54 3.06 0.306 3.03 2.55 1.70
(0.0022) (0.0021) (0.0023) (0.0072) (0.0078) (0.0062) (0.0075) (0.0078) (0.0072)
Placebo 5,072 0.637 0.759 0.246 2.77 3.12 0.198 2.91 2.53 1.60
​​ (0.0053) (0.0050) (0.0034) (0.0184) (0.0200) (0.0162) (0.0188) (0.0195) (0.0184)
Treatments 25,902 0.493 0.697 0.197 2.50 3.04 0.327 3.06 2.56 1.72
​​​ (0.0024) (0.0023) (0.0026) (0.0078) (0.0085) (0.0066) (0.0082) (0.0085) (0.0078)
Long Baseline 10,579 0.639 0.771 0.127 2.86 3.25 0.243 2.97 2.60 1.61
​​​​ (0.0037) (0.0033) (0.0037) (0.0127) (0.0134) (0.0111) (0.0129) (0.0136) (0.0123)
Text Course 15,323 0.392 0.646 0.114 2.26 2.90 0.384 3.12 2.52 1.79
​​​​​​ (0.0030) (0.0031) (0.0054) (0.0093) (0.0108) (0.0082) (0.0106) (0.0108) (0.0101)
# make it tex code
table$col_1 <- c("All","\u200B","Placebo","\u200B\u200B","Treatments","\u200B\u200B\u200B","Long Baseline","\u200B\u200B\u200B\u200B","Text Course","\u200B\u200B\u200B\u200B\u200B\u200B")
table <- table[, c("col_1", setdiff(names(table), "col_1"))]
latex_code <- paste0(apply(table, 1, function(row) paste(row, collapse = " & ")), " \\\\\n")
latex_code <- c(paste("","\\# of Obs", "Misinfo Sharing", "Non-misinfo Sharing","Discernment Sharing", "Misinfo Reliability", "Non-misinfo Reliability","Discernment Reliability","Misinfo Manipulation", "Non-misinfo Manipulation","Discernment Manipulation \\\\ \\hline ", sep = " & "),latex_code)
latex_code <- gsub("\\[\\d+\\]", "", latex_code)
#save.tex
write.table(latex_code, file = "table2.tex", sep = " & ", quote = FALSE, row.names = FALSE, col.names = FALSE)