The goal of this memo is to provide an analysis for both Phase 1 and Phase 2 of the FB-Misinformation experiment. This memo provides a documentation of the data cleaning and analysis process.
The experiment was conducted in two separate phases (Phase 1 and Phase 2). Phase 1 respondents are assigned to one of four groups: (1) Bad News game, (2) Inoculation Science video, (3) Emotions text message course, and (4) Baseline 1. Phase 2 respondents are assigned to one of three groups: (1) Emotions text message course, (2) Long Baseline and (3) Short Baseline 2. Payment amount in Phase 1 was a fixed amount while payment amount in Phase 2 was randomly assigned to be either low, medium or high amount.
The chatbot will start by requesting consent, then randomize participants into one of the intervention groups. The placebo group will be shown a text message course adapted to the chatbot format, with timers that allow sufficient reading time in between messages, that gives users facts about misinformation, but contains no educational content on identifying or avoiding sharing misinformation. We will observe treatment compliance for the placebo group at five points in the intervention, based on responses to quiz questions that are part of the intervention, as well as with a question at the end that asks about what they learned.
The Bad News game group will be provided a link to a copy of the online game. Meta pixels on most buttons on the game website will allow us to observe a participant’s progress in the game. Note that we will not observe pixel activity for the subset of iOS users who have opted out of tracking (approximately 85% of iOS users); however, market penetration for Apple phones ranges from just 4% in Kenya to 23% in Ghana.} We will also ask a compliance question when a participant returns to the chatbot about how many followers they gained (the key metric in the game), as well as a question about what they learned.
The Inoculation Science video group will be shown the videos embedded in the chatbot, with timers that will prevent the participant from continuing the study before sufficient time has passed to watch the video. Participants must engage with the chatbot in order to proceed to each of the five videos. We will also observe treatment compliance through a question after the last video about what they learned.
Lastly, the Emotions text message course group will be shown the text message course adapted to the chatbot format, with timers that allow sufficient reading time in between messages. As in the placebo group’s text message course, we will observe treatment compliance for the Emotions course group at five points in the intervention, based on responses to quiz questions that are part of the intervention, as well as with a question at the end that asks about what they learned.
In Phase 2, the Long Baseline group will be shown a longer version of the Short Baseline.
After participants are exposed to their assigned intervention, they will be shown six posts and asked to respond to three questions for each post:
-Would you share this post? (yes/no)
-How manipulative do you find this post? (5-point Likert scale; “Not at all” to “Very”)
-How reliable do you find this post? (5-point Likert scale; “Not at all” to “Very”)
The sharing question will always come first, but the order of the manipulativeness and reliability questions will be randomized at the user-level. The six posts will have the following composition:
-2 non-misinformation posts
-1 misinformation post using the “false dichotomies” technique
-1 misinformation post using the “discrediting technique”
-1 misinformation post using the “emotional language” technique
-1 attention check
Each post will be randomly drawn from a set of (at least two) posts of that type, with the order of the six posts randomized.
Approximately 22 hours after a participant completes the chatbot (to allow enough time for payment to be made and received), we will send a message within the 24-hour window allowed by Facebook to recontact a user who has engaged with our chat to ask if the participant would like to be notified when additional paid surveys are available. Six weeks after the participant completes the chatbot, we will notify these users that they can participate in another study for $1 (in local currency) of mobile airtime. In addition, we will send sponsored messages (paid messages sent through Facebook Messenger) to participants who did not opt-in for notifications with the same recruitment message.
As in the first survey, the follow-up survey in the chatbot will start by requesting consent. Then, participants will be shown six posts and asked to respond to three questions for each post, using the same structure. The six posts will be drawn from the same set of posts as the posts in the first chatbot, without replacement for the individual participant (i.e., no participant will see the same post in both the first and follow-up chatbots). The user-level randomization for the ordering of the manipulativeness and reliability questions will be preserved in the follow-up survey.
The raw data for was downloaded from Chatfuel and undergone an
anonymization process before being stored in Sherlock. The anynomized
data was then downloaded from Sherlock and stored in the
data
folder. The cleaning process involved dropping
observations who are not in Phase 1 or Phase 2 and dropping actual
repeaters. We identified actual repeaters by looking at whether they
have a timestamp in more than one treatment arms and dropped them.
Attempted repeaters are not actual repeaters, and these respondents were
turned away from the chatbot when they tried to enter the chatbot for
the second time after completing the first time. Since they did not
repeat the chatbot, we don’t drop them from our dataset. We also checked
to make sure all observations are unique by comparing the number of
unique id with the number of observations in the dataset.
The data cleaning script is located in FB_Misinfo_cleaning
The descriptive script is located in FB_Misinfo_descriptive
library(tidyverse)
library(here)
library(RColorBrewer)
library(janitor)
library(psych)
library(ggtext)
library(knitr)
library(kableExtra)
library(forcats)
library(gtools)
library(ggrepel)
library(DT)
#library(papeR)
library(compareGroups)
library(ggcorrplot)
library(ggplot2)
library(cobalt)
library(ggthemes)
library(nnet)
library(data.table)
library(broom)
library(kableExtra)
library(texreg)
library(patchwork)
library(readr)
library(lubridate)
library(boot)
library(wrappedtools)
library(stargazer)
library(fixest)
library(pander)
# Functions
mean_barplot <- function(data, y, ylab, limits = c(0, 1)) {
if (limits == c(0, 1)) { labels = scales::percent } else labels = { waiver() }
ggplot(data %>%
filter(!is.na(!!sym(y))),
aes(x = condition, y = !!sym(y), fill = condition)) +
geom_bar(stat = "summary", fun = "mean") +
stat_summary(fun.data = mean_se, geom = "errorbar", fun.args = list(mult = 1), width = 0.1) +
coord_cartesian(ylim = limits, expand = FALSE) +
scale_y_continuous(labels = labels) +
labs(y = ylab,
x = "Condition") +
theme(legend.position = "none")
}
# Computes standard errors of a given continuous variable or one-hot binary variable, respectively #
se_cont = function(x, na.rm=FALSE) {
if (na.rm) x <- na.omit(x)
sqrt(var(x)/length(x))}
se_binary = function(x, na.rm=FALSE) {
if (na.rm) x <- na.omit(x)
sqrt(mean(x)*(1-mean(x))/length(x))}
# add parentheses
add_parentheses <- function(x) {
if (!is.na(x) && x!="" && !is.na(as.numeric(x))) {
formatted_x <- formatC(as.numeric(x), format = "f", digits = 4)
formatted_x[formatted_x != "NA"] <- paste0("(", formatted_x[formatted_x != "NA"], ")")
formatted_x
} else {
x
}
}
The anonymized dataset is stored in Github directory
~fb_misinfo_interventions/data/
The current working branch is
184-survey-data-analysis-phase-ii
df_filter <- df %>%
filter(is.na(arm_coded)==FALSE) %>%
filter(country_coded == "Ghana" | country_coded == "Nigeria" | country_coded == "South Africa" | country_coded == "Kenya")
df_filter$baseline <- ifelse(df_filter$arm_coded == "Baseline 1" | df_filter$arm_coded == "Short Baseline 2", 1, 0)
df_filter$sms <- ifelse(df_filter$arm_coded == "SMS", 1, 0)
df_filter$game <- ifelse(df_filter$arm_coded == "Game", 1, 0)
df_filter$video <- ifelse(df_filter$arm_coded == "Video", 1, 0)
df_filter$longbaseline <- ifelse(df_filter$arm_coded == "Long Baseline", 1, 0)
df_filter_phase1 <- df_filter %>%
filter(phase_coded=="Phase 1")
phase_1_a <- df_filter_phase1 %>%
filter(baseline==1 | sms ==1)
phase_1_b <- df_filter_phase1 %>%
filter(baseline==1 | game ==1)
phase_1_c <- df_filter_phase1 %>%
filter(baseline==1 | video ==1)
df_filter_phase2 <- df_filter %>%
filter(phase_coded=="Phase 2")
phase_2_a <- df_filter_phase2 %>%
filter(baseline==1 | sms ==1)
phase_2_b <- df_filter_phase2 %>%
filter(baseline==1 | longbaseline ==1)
# Compute SMD by stratum
love.plot(sms ~ country_coded + MisinfoChat_start_time,
data=filter(phase_1_a, completed=="TRUE"),
groups = "arm_coded",
binary = "std", # display standardized mean differences
thresholds = c(m = .1),
colors =c(colorblind_pal()(8)[c(2)]))+
scale_x_continuous(breaks = seq(-0.1, 0.1, .1)) +
# specify theme aesthetics
theme_minimal() +
# specify text size and face
theme(strip.text = element_text(size = 12),
axis.text.x = element_text(size = 12, angle = 90, vjust = 0.5, hjust=1),
axis.text.y = element_text(size = 8),
axis.title = element_text(size = 12, face = "bold"),
legend.position = "none",
plot.title = element_text(size = 16, face = "bold", hjust = 0.5)) +
# specify labels
labs(title = "Covariate Balance",
x = 'Standardized Mean Difference',
y = "Covariate",
caption = "For observations completed in Phase 1, sms - baseline.")
# Compute SMD by stratum
love.plot(game ~ country_coded+ MisinfoChat_start_time,
data=filter(phase_1_b, completed=="TRUE"),
groups = "arm_coded",
binary = "std", # display standardized mean differences
thresholds = c(m = .1),
colors =c(colorblind_pal()(8)[c(2)]))+
scale_x_continuous(breaks = seq(-0.1, 0.1, .1)) +
# specify theme aesthetics
theme_minimal() +
# specify text size and face
theme(strip.text = element_text(size = 12),
axis.text.x = element_text(size = 12, angle = 90, vjust = 0.5, hjust=1),
axis.text.y = element_text(size = 8),
axis.title = element_text(size = 12, face = "bold"),
legend.position = "none",
plot.title = element_text(size = 16, face = "bold", hjust = 0.5)) +
# specify labels
labs(title = "Covariate Balance",
x = 'Standardized Mean Difference',
y = "Covariate",
caption = "For observations completed in Phase 1, game - baseline.")
# Compute SMD by stratum
love.plot(video ~ country_coded+ MisinfoChat_start_time,
data=filter(phase_1_c, completed=="TRUE"),
groups = "arm_coded",
binary = "std", # display standardized mean differences
thresholds = c(m = .1),
colors =c(colorblind_pal()(8)[c(2)]))+
scale_x_continuous(breaks = seq(-0.1, 0.1, .1)) +
# specify theme aesthetics
theme_minimal() +
# specify text size and face
theme(strip.text = element_text(size = 12),
axis.text.x = element_text(size = 12, angle = 90, vjust = 0.5, hjust=1),
axis.text.y = element_text(size = 8),
axis.title = element_text(size = 12, face = "bold"),
legend.position = "none",
plot.title = element_text(size = 16, face = "bold", hjust = 0.5)) +
# specify labels
labs(title = "Covariate Balance",
x = 'Standardized Mean Difference',
y = "Covariate",
caption = "For observations completed in Phase 1, video - baseline.")
# Compute SMD by stratum
love.plot(sms ~ country_coded + MisinfoChat_start_time,
data=filter(phase_2_a, completed=="TRUE"),
groups = "arm_coded",
binary = "std", # display standardized mean differences
thresholds = c(m = .1),
colors =c(colorblind_pal()(8)[c(2)]))+
scale_x_continuous(breaks = seq(-0.1, 0.1, .1)) +
# specify theme aesthetics
theme_minimal() +
# specify text size and face
theme(strip.text = element_text(size = 12),
axis.text.x = element_text(size = 12, angle = 90, vjust = 0.5, hjust=1),
axis.text.y = element_text(size = 8),
axis.title = element_text(size = 12, face = "bold"),
legend.position = "none",
plot.title = element_text(size = 16, face = "bold", hjust = 0.5)) +
# specify labels
labs(title = "Covariate Balance",
x = 'Standardized Mean Difference',
y = "Covariate",
caption = "For observations completed in Phase 2, sms - baseline.")
# Compute SMD by stratum
love.plot(longbaseline ~ country_coded + MisinfoChat_start_time,
data=filter(phase_2_b, completed=="TRUE"),
groups = "arm_coded",
binary = "std", # display standardized mean differences
thresholds = c(m = .1),
colors =c(colorblind_pal()(8)[c(2)]))+
scale_x_continuous(breaks = seq(-0.1, 0.1, .1)) +
# specify theme aesthetics
theme_minimal() +
# specify text size and face
theme(strip.text = element_text(size = 12),
axis.text.x = element_text(size = 12, angle = 90, vjust = 0.5, hjust=1),
axis.text.y = element_text(size = 8),
axis.title = element_text(size = 12, face = "bold"),
legend.position = "none",
plot.title = element_text(size = 16, face = "bold", hjust = 0.5)) +
# specify labels
labs(title = "Covariate Balance",
x = 'Standardized Mean Difference',
y = "Covariate",
caption = "For observations completed in Phase 2, long baseline - baseline.")
Participants who started the chatbot received a timestamp for the ‘MisinfoChat_start_time’ variable.
This table provides the location where participants dropped off from the chatbot, the number of drop off at each location, and the percentage of started dropped off.
df_dropoff <- df %>%
select(phase_coded, entry_start_time, consent_coded, completed, arm_coded, quiz_consent_coded,
entry_end_time,
MisinfoChat_start_time,
InterArms_start_time,
InterArmsVideo_end_time,
InterArmsGame_start_time,
interArmsGame_end_time,
InterArmsSMS_start_time,
InterArmsSMS_end_time,
InterArmsBase_start_time,
InterArmsBase_end_time,
InterArmsLongBase_start_time,
InterArmsLongBase_end_time,
MisinfoQuiz_start_time,
misinfoQuiz_end_time,
EndPay_start_time,
EndPay_end_time)
df_dropoff$started_coded_num <- ifelse(!is.na(df_dropoff$MisinfoChat_start_time),1,0)
df_dropoff$consent_coded_num <- ifelse(!is.na(df_dropoff$consent_coded),1,0)
df_dropoff$started_arm_num <- ifelse(!is.na(df_dropoff$InterArms_start_time) | !is.na(df_dropoff$InterArmsSMS_start_time) | !is.na(df_dropoff$InterArmsGame_start_time) | !is.na(df_dropoff$InterArmsBase_start_time) | !is.na(df_dropoff$InterArmsLongBase_start_time),1,0)
df_dropoff$quiz_started_num <- ifelse(!is.na(df_dropoff$MisinfoQuiz_start_time),1,0)
df_dropoff$quiz_consent_coded_num <- ifelse(!is.na(df_dropoff$quiz_consent_coded),1,0)
df_dropoff$quiz_completed_num <- ifelse(!is.na(df_dropoff$misinfoQuiz_end_time),1,0)
df_dropoff$complete_all_coded_num <- ifelse(!is.na(df_dropoff$completed),1,0)
df_dropoff <- df_dropoff %>%
mutate(score= rowSums(select(., started_coded_num, consent_coded_num, started_arm_num, quiz_consent_coded_num, quiz_completed_num, complete_all_coded_num, quiz_started_num))) %>%
mutate(score = ifelse(score==7, "7-Completed all", ifelse(score == 6, "6-Dropped off during quiz reveal", ifelse(score == 5, "5-Dropped off during quiz", ifelse(score == 4, "4-Dropped off before quiz consent", ifelse(score == 4, "3-Dropped off during intervention", ifelse(score == 3, "3-Dropped off during intervention", ifelse(score == 2, "2-Dropped off before intervention", ifelse(score == 1, "1-Dropped off before consent", "Did not start")))))))))
df_dropoff %>%
group_by(phase_coded) %>%
count(score) %>%
mutate(percentage = n*100/sum(n)) %>%
mutate(percentage = round(percentage, 2)) %>%
kable("html", col.names=(c("Phase", "Dropped off location", "N",
"% of Participants Started"))) %>%
kableExtra::kable_styling(bootstrap_options = c("striped", "hover")) %>% collapse_rows() %>% scroll_box(width = "100%")
Phase | Dropped off location | N | % of Participants Started |
---|---|---|---|
Phase 1 | 1-Dropped off before consent | 5936 | 16.04 |
2-Dropped off before intervention | 1001 | 2.70 | |
3-Dropped off during intervention | 9106 | 24.61 | |
4-Dropped off before quiz consent | 220 | 0.59 | |
5-Dropped off during quiz | 3727 | 10.07 | |
6-Dropped off during quiz reveal | 301 | 0.81 | |
7-Completed all | 16717 | 45.17 | |
Phase 2 | 1-Dropped off before consent | 8649 | 17.02 |
2-Dropped off before intervention | 1875 | 3.69 | |
3-Dropped off during intervention | 8905 | 17.52 | |
4-Dropped off before quiz consent | 294 | 0.58 | |
5-Dropped off during quiz | 3922 | 7.72 | |
6-Dropped off during quiz reveal | 331 | 0.65 | |
7-Completed all | 26847 | 52.82 |
df_dropoff_table <- df_dropoff %>%
group_by(score, phase_coded) %>%
summarise(count = n(), .groups = 'drop') # Calculate count
# Ensure 'score' is a factor and ordered as you want
df_dropoff$score <- factor(df_dropoff$score, levels = unique(df_dropoff$score))
ggplot(df_dropoff_table, mapping = aes(x = score, y = count, group = phase_coded)) +
geom_point(aes(color = phase_coded)) + # Color points by phase
geom_line(aes(color = phase_coded), alpha = 0.5) + # Color lines by phase
geom_text(aes(label = scales::comma(count)), vjust = -0.5, size = 3) +
coord_cartesian(ylim = c(0, NA)) +
scale_y_continuous(labels = scales::comma) +
scale_color_manual(values = c("Phase 1" = "purple", "Phase 2" = "green")) +
labs(title = "Chatbot Funnel: Drop off at Different Stages",
x = "Chatbot stage",
y = "Number of observations",
color = "Phase") + # Label for color legend
theme(axis.text.x = element_text(angle = 90, hjust = 1))
started
participants are those have a timestamp for
the MisinfoChat_start_time
variable.
consent
participants are those who responded ‘I
consent, start now’ in the consent
variable.
completed_intervention
participants are those who
completed has a timestamp for either the baseline_learn
,
game_learn
, sms_learn
, or
Video_learn
variables (applicable to both Phase 1 and Phase
2)
quiz consent
participants are those who responded ‘I
acknowledge’ in the quiz_consent
variable.
completed quiz
participants are those who completed
the misinformation quiz and have a timestamp for the
misinfoQuiz_end_time
variable.
completed all
participants are those who completed
the chatbot and have a stamp for the completed
variable.
df$consent_binary_num = ifelse(df$consent_coded == "I consent, start now",1,0)
df<- df %>%
mutate(started_coded_num=ifelse(!is.na(`signed up`),1,0)) %>%
mutate(completed_intervention = ifelse(!is.na(baseline_learn),baseline_learn,
ifelse(!is.na(game_learn),game_learn,
ifelse(!is.na(sms_learn),sms_learn,
ifelse(!is.na(Video_learn),Video_learn,NA))))) %>%
mutate(completed_intervention_num =ifelse(!is.na(completed_intervention),1,0)) %>%
mutate(time_completion_survey = misinfoQuiz_end_time - `signed up`) %>% # calculating the time between signed up and completing the misinformation quiz.
mutate(completed_survey_num=ifelse(!is.na(time_completion_survey),1,0)) %>%
mutate(completed_all_num=ifelse(!is.na(completed), 1, 0)) %>%
mutate(quiz_consent_coded_num = ifelse(quiz_consent_coded == "I acknowledge",1,0))
df_a <- df %>%
group_by(phase_coded) %>%
summarise(`started`=sum(started_coded_num==1, na.rm=TRUE),
`consented`=sum(consent_binary_num==1, na.rm=TRUE),
`completed intervention`=sum(completed_intervention_num==1, na.rm=TRUE),
`consented quiz`=sum(quiz_consent_coded_num==1, na.rm=TRUE),
`completed quiz` = sum(completed_survey_num==1, na.rm=TRUE),
`completed all` = sum(completed_all_num==1, na.rm=TRUE),
) %>%
pivot_longer(
-phase_coded, # Columns to keep as is
names_to = "stage", # Name of new column
values_to = "n" # Name of new column
) %>%
data.frame()
df_b <- df %>%
group_by(phase_coded) %>%
summarise(frac_started = sum(started_coded_num==1, na.rm=TRUE) / sum(started_coded_num==1, na.rm=TRUE),
frac_consented_started = sum(consent_binary_num==1, na.rm=TRUE) / sum(started_coded_num==1, na.rm=TRUE),
frac_completed_intervention_started = sum(completed_intervention_num==1, na.rm=TRUE) / sum(started_coded_num==1, na.rm=TRUE),
frac_consented_quiz_started = sum(quiz_consent_coded_num==1, na.rm=TRUE) / sum(started_coded_num==1, na.rm=TRUE),
frac_completed_quiz_started = sum(completed_survey_num==1, na.rm=TRUE) / sum(started_coded_num==1, na.rm=TRUE),
frac_completed_all_started = sum(completed_all_num==1, na.rm=TRUE) / sum(started_coded_num==1, na.rm=TRUE)) %>%
pivot_longer(
-phase_coded, # Columns to keep as is
names_to = "stage", # Name of new column
values_to = "percentage of started" # Name of new column
) %>%
select(`percentage of started`) %>%
data.frame()
df_c <- df %>%
group_by(phase_coded) %>%
summarise(frac__started_consented = NA,
frac_conseted_consented = sum(consent_binary_num==1, na.rm=TRUE)/sum(consent_binary_num==1, na.rm=TRUE),
frac_completed_intervention_consented = sum(completed_intervention_num==1, na.rm=TRUE)/sum(consent_binary_num==1, na.rm=TRUE),
frac_consented_quiz_consented = sum(quiz_consent_coded_num==1, na.rm=TRUE)/sum(consent_binary_num==1, na.rm=TRUE),
frac_completed_quiz_consented = sum(completed_survey_num==1, na.rm=TRUE)/sum(consent_binary_num==1, na.rm=TRUE),
frac_completed_all_consented = sum(completed_all_num==1, na.rm=TRUE)/sum(consent_binary_num==1, na.rm=TRUE)) %>%
pivot_longer(
-phase_coded, # Columns to keep as is
names_to = "stage", # Name of new column
values_to = "percentage of consent" # Name of new column
) %>%
select(`percentage of consent`) %>%
data.frame()
df_d <- df %>%
group_by(phase_coded) %>%
summarise(frac_started_completed_intervention = NA,
frac_consented_completed_intervention = NA,
frac_completed_intervention_completed_intervention = sum(completed_intervention_num==1, na.rm=TRUE)/sum(completed_intervention_num==1, na.rm=TRUE),
frac_consented_quiz_completed_intervention = sum(quiz_consent_coded_num==1, na.rm=TRUE)/sum(completed_intervention_num==1, na.rm=TRUE),
frac_completed_quiz_completed_intervention = sum(completed_survey_num==1, na.rm=TRUE)/sum(completed_intervention_num==1, na.rm=TRUE),
frac_completed_all_completed_intervention = sum(completed_all_num==1, na.rm=TRUE)/sum(completed_intervention_num==1, na.rm=TRUE)) %>%
pivot_longer(
-phase_coded, # Columns to keep as is
names_to = "stage", # Name of new column
values_to = "percentage of completed intervention" # Name of new column
) %>%
select(`percentage of completed intervention`) %>%
data.frame()
df_e <- df %>%
group_by(phase_coded) %>%
summarise(frac_started_completed_quiz = NA,
frac_consented_completed_quiz = NA,
frac_completed_intervention_completed_quiz = NA,
frac_consented_quiz_completed_quiz = sum(quiz_consent_coded_num==1, na.rm=TRUE)/sum(quiz_consent_coded_num==1, na.rm=TRUE),
frac_completed_quiz_completed_quiz = sum(completed_survey_num==1, na.rm=TRUE)/sum(quiz_consent_coded_num==1, na.rm=TRUE),
frac_completed_all_completed_quiz = sum(completed_all_num==1, na.rm=TRUE)/sum(quiz_consent_coded_num==1, na.rm=TRUE)) %>%
pivot_longer(
-phase_coded, # Columns to keep as is
names_to = "stage", # Name of new column
values_to = "percentage of quiz consent" # Name of new column
) %>%
select(`percentage of quiz consent`) %>%
data.frame()
df_f <- df %>%
group_by(phase_coded) %>%
summarise(frac_started_completed_all = NA,
frac_consented_completed_all = NA,
frac_completed_intervention_completed_all = NA,
frac_consented_quiz_completed_all = NA,
frac_completed_quiz_completed_all = sum(completed_survey_num==1, na.rm=TRUE)/sum(completed_survey_num==1, na.rm=TRUE),
frac_completed_all_completed_all = sum(completed_all_num==1, na.rm=TRUE)/sum(completed_survey_num==1, na.rm=TRUE)) %>%
pivot_longer(
-phase_coded, # Columns to keep as is
names_to = "stage", # Name of new column
values_to = "percentage of completed quiz" # Name of new column
) %>%
select(`percentage of completed quiz`) %>%
data.frame()
df_g <- df %>%
group_by(phase_coded) %>%
summarise(frac_started_completed_all = NA,
frac_consented_completed_all = NA,
frac_completed_intervention_completed_all = NA,
frac_consented_quiz_completed_all = NA,
frac_completed_quiz_completed_all = NA,
frac_completed_all_completed_all = sum(completed_all_num==1, na.rm=TRUE)/sum(completed_all_num==1, na.rm=TRUE)) %>%
pivot_longer(
-phase_coded, # Columns to keep as is
names_to = "stage", # Name of new column
values_to = "percentage of completed all" # Name of new column
) %>%
select(`percentage of completed all`) %>%
data.frame()
funnel_df <- cbind(df_a, df_b, df_c, df_d, df_e, df_f, df_g) %>%
mutate(stage = factor(stage, labels = unique(stage), levels = unique(stage))) %>% data.frame()
funnel_df %>%
kable(digits = 3,caption = "Funnel Statistics",
col.names = c("Phase", "Stage", "N", "Prop. of Started", "Prop. of Consent", "Prop. of Completed Intervetion", "Prop. of Quiz Consent", "Prop. of Completed Quiz", "Prop. of Completed All" )) %>%
kable_styling(bootstrap_options = c("striped", "hover")) %>% collapse_rows() %>%
scroll_box( height = "500px")
Phase | Stage | N | Prop. of Started | Prop. of Consent | Prop. of Completed Intervetion | Prop. of Quiz Consent | Prop. of Completed Quiz | Prop. of Completed All |
---|---|---|---|---|---|---|---|---|
Phase 1 | started | 37008 | 1.000 | NA | NA | NA | NA | NA |
consented | 29993 | 0.810 | 1.000 | NA | NA | NA | NA | |
completed intervention | 20965 | 0.566 | 0.699 | 1.000 | NA | NA | NA | |
consented quiz | 20662 | 0.558 | 0.689 | 0.986 | 1.000 | NA | NA | |
completed quiz | 17016 | 0.460 | 0.567 | 0.812 | 0.824 | 1.000 | NA | |
completed all | 16724 | 0.452 | 0.558 | 0.798 | 0.809 | 0.983 | 1 | |
Phase 2 | started | 50823 | 1.000 | NA | NA | NA | NA | NA |
consented | 40681 | 0.800 | 1.000 | NA | NA | NA | NA | |
completed intervention | 31392 | 0.618 | 0.772 | 1.000 | NA | NA | NA | |
consented quiz | 31020 | 0.610 | 0.763 | 0.988 | 1.000 | NA | NA | |
completed quiz | 27178 | 0.535 | 0.668 | 0.866 | 0.876 | 1.000 | NA | |
completed all | 26862 | 0.529 | 0.660 | 0.856 | 0.866 | 0.988 | 1 |
ggplot(funnel_df, mapping = aes(x = stage, y = n, group = phase_coded)) +
geom_point(aes(color = phase_coded)) + # Color points by phase
geom_line(aes(color = phase_coded), alpha = 0.5) + # Color lines by phase
geom_text(aes(label = scales::comma(n)), vjust = -0.5, size = 3) +
coord_cartesian(ylim = c(0, NA)) +
scale_y_continuous(labels = scales::comma) +
scale_color_manual(values = c("Phase 1" = "purple", "Phase 2" = "green")) +
labs(title = "Chatbot Funnel: Number of Participants at Different Stages",
x = "Funnel stage",
y = "Number of observations",
color = "Phase") + # Label for color legend
theme(axis.text.x = element_text(angle = 90, hjust = 1))
df_summary <- df %>%
select(phase_coded, MisinfoChat_start_time, completed_survey_num) %>%
mutate(startdate = as.Date(MisinfoQuiz_start_time))
daily_summary <- df_summary %>%
group_by(startdate) %>%
count(completed_survey_num==1) %>%
mutate(pct = n/sum(n))
ggplot(daily_summary,
aes(x = startdate, y = pct)) +
geom_bar(position = "identity", stat = "identity") +
scale_y_continuous(limits = c(0, 1), labels = scales::percent, expand = c(0, 0, 0, 0)) +
scale_x_date(breaks = "day") +
labs(x = "Day",
y = "% complete") +
theme(legend.position = "none",
axis.text.x = element_text(angle = 90))
Only participants who consented are considered.
Zooming into the dropout during the intervention. Dropout is much higher for game, video and SMS than for facts baseline.
dfphase1 <- df %>%
filter(phase_coded == "Phase 1") %>%
filter(consent_binary_num == 1) %>%
filter(arm_coded == "Baseline 1" | arm_coded == "SMS" | arm_coded == "Video" | arm_coded == "Game" | is.na("arm_coded"))
dfphase1 %>%
mutate(n_started_intervention = if_else(!is.na(InterArmsBase_start_time) | !is.na(InterArms_start_time) | !is.na(InterArmsGame_start_time) | !is.na(InterArmsSMS_start_time), 1, 0),
n_finished_intervention = if_else(!is.na(InterArmsBase_end_time) | !is.na(InterArmsVideo_end_time) | !is.na(interArmsGame_end_time) | !is.na(InterArmsSMS_end_time), 1, 0)) %>%
filter(!is.na(arm)) %>%
group_by(arm_coded) %>%
summarize(n_started_intervention = sum(n_started_intervention==1),
n_finished_intervention = sum(n_finished_intervention==1),
completed_intervention_after_starting = n_finished_intervention/n_started_intervention) %>%
kable(digits = 3,caption = "Intervention Funnel for Phase 1",
col.names = c("Treatment Arm", "Started Intervention", "Completed Intervention", "Completed/Started" )) %>%
kable_styling(bootstrap_options = c("striped", "hover")) %>% collapse_rows() %>%
scroll_box( height = "500px")
Treatment Arm | Started Intervention | Completed Intervention | Completed/Started |
---|---|---|---|
Baseline 1 | 7441 | 6138 | 0.825 |
Game | 7458 | 4667 | 0.626 |
SMS | 7543 | 5143 | 0.682 |
Video | 7525 | 4962 | 0.659 |
df_base_funnel <- dfphase1 %>%
filter(!is.na(InterArmsBase_start_time)) %>%
summarize(n_started_baseline_intervention = sum(!is.na(InterArmsBase_start_time)),
n_base2_ready = sum(!is.na(base2_ready)),
n_base3_ready = sum(!is.na(base3_ready)),
n_base4_ready = sum(!is.na(base4_ready)),
n_base5_ready = sum(!is.na(base5_ready)),
n_finished_baseline_intervention = sum(!is.na(InterArmsBase_end_time)),
# quiz
n_quiz_consent = sum(!is.na(quiz_consent)),
n_completed = sum(completed, na.rm = TRUE)) %>%
pivot_longer(cols = everything(),
names_pattern = "n_(.*)",
names_to = "stage",
values_to = "n") %>%
mutate(stage = janitor::make_clean_names(stage, case = "sentence")) %>%
mutate(stage = factor(stage, labels = unique(stage), levels = unique(stage)))
ggplot(df_base_funnel,
mapping = aes(x = stage, y = n, group = NA)) +
geom_point() +
geom_line(alpha = 0.5) +
geom_text(aes(label = scales::comma(n)), vjust = -0.5, size = 3) +
geom_text(aes(label = scales::percent(n/n[stage == "Started baseline intervention"])), vjust = 2, size = 2) +
coord_cartesian(ylim = c(0, NA)) +
scale_y_continuous(labels = scales::comma) +
labs(title = "Chatbot Funnel Facts Baseline",
subtitle = "Number of Participants at Different Stages",
x = "Funnel stage",
y = "Number of observations") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
df_game_funnel <- dfphase1 %>%
filter(!is.na(InterArmsGame_start_time)) %>%
summarize(n_started_game_intervention = sum(!is.na(InterArmsGame_start_time)),
n_game_done = sum(!is.na(game_done)),
n_finished_game_intervention = sum(!is.na(interArmsGame_end_time)),
# quiz
n_quiz_consent = sum(!is.na(quiz_consent)),
n_completed = sum(completed, na.rm = TRUE)) %>%
pivot_longer(cols = everything(),
names_pattern = "n_(.*)",
names_to = "stage",
values_to = "n") %>%
mutate(stage = janitor::make_clean_names(stage, case = "sentence")) %>%
mutate(stage = factor(stage, labels = unique(stage), levels = unique(stage)))
ggplot(df_game_funnel,
mapping = aes(x = stage, y = n, group = NA)) +
geom_point() +
geom_line(alpha = 0.5) +
geom_text(aes(label = scales::comma(n)), vjust = -0.5, size = 3) +
geom_text(aes(label = scales::percent(n/n[stage == "Started game intervention"])), vjust = 2, size = 2) +
coord_cartesian(ylim = c(0, NA)) +
scale_y_continuous(labels = scales::comma) +
labs(title = "Chatbot Funnel Game",
subtitle = "Number of Participants at Different Stages",
x = "Funnel stage",
y = "Number of observations") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
df_video_funnel <- dfphase1 %>%
filter(!is.na(InterArms_start_time)) %>%
summarize(n_started_video_intervention = sum(!is.na(InterArms_start_time)),
n_video2_ready = sum(!is.na(video2_ready)),
n_video2_done = sum(!is.na(video2_done)),
n_video3_ready = sum(!is.na(video3_ready)),
n_video3_done = sum(!is.na(video3_done)),
n_video4_ready = sum(!is.na(video4_ready)),
n_video4_done = sum(!is.na(video4_done)),
n_video5_done = sum(!is.na(video5_done)),
n_finished_video_intervention = sum(!is.na(InterArmsVideo_end_time)),
# quiz
n_quiz_consent = sum(!is.na(quiz_consent)),
n_completed = sum(completed, na.rm = TRUE)) %>%
pivot_longer(cols = everything(),
names_pattern = "n_(.*)",
names_to = "stage",
values_to = "n") %>%
mutate(stage = janitor::make_clean_names(stage, case = "sentence")) %>%
mutate(stage = factor(stage, labels = unique(stage), levels = unique(stage)))
ggplot(df_video_funnel,
mapping = aes(x = stage, y = n, group = NA)) +
geom_point() +
geom_line(alpha = 0.5) +
geom_text(aes(label = scales::comma(n)), vjust = -0.5, size = 3) +
geom_text(aes(label = scales::percent(n/n[stage == "Started video intervention"])), vjust = 2, size = 2) +
coord_cartesian(ylim = c(0, NA)) +
scale_y_continuous(labels = scales::comma) +
labs(title = "Chatbot Funnel Video",
subtitle = "Number of Participants at Different Stages",
x = "Funnel stage",
y = "Number of observations") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
df_sms_funnel <- dfphase1 %>%
filter(!is.na(InterArmsSMS_start_time)) %>%
summarize(n_started_sms_intervention = sum(!is.na(InterArmsSMS_start_time)),
n_sms2_ready = sum(!is.na(sms2_ready)),
n_sms3_ready = sum(!is.na(sms3_ready)),
n_sms4_ready = sum(!is.na(sms4_ready)),
n_sms5_ready = sum(!is.na(sms5_ready)),
n_finished_sms_intervention = sum(!is.na(InterArmsSMS_end_time)),
# quiz
n_quiz_consent = sum(!is.na(quiz_consent)),
n_completed = sum(completed, na.rm = TRUE)) %>%
pivot_longer(cols = everything(),
names_pattern = "n_(.*)",
names_to = "stage",
values_to = "n") %>%
mutate(stage = janitor::make_clean_names(stage, case = "sentence")) %>%
mutate(stage = factor(stage, labels = unique(stage), levels = unique(stage)))
ggplot(df_sms_funnel,
mapping = aes(x = stage, y = n, group = NA)) +
geom_point() +
geom_line(alpha = 0.5) +
geom_text(aes(label = scales::comma(n)), vjust = -0.5, size = 3) +
geom_text(aes(label = scales::percent(n/n[stage == "Started sms intervention"])), vjust = 2, size = 2) +
coord_cartesian(ylim = c(0, NA)) +
scale_y_continuous(labels = scales::comma) +
labs(title = "Chatbot Funnel SMS",
subtitle = "Number of Participants at Different Stages",
x = "Funnel stage",
y = "Number of observations") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
# payment end time is almost always missing, so we don't analyze payment time
# misinfo chat duration is always 0, so we don't show it
df_duration1 <- dfphase1 %>%
mutate(baseline_duration = InterArmsBase_end_time - InterArmsBase_start_time,
video_duration = InterArmsVideo_end_time - InterArms_start_time,
game_duration = interArmsGame_end_time - InterArmsGame_start_time,
sms_duration = InterArmsSMS_end_time - InterArmsSMS_start_time,
misinfo_quiz_duration = misinfoQuiz_end_time - MisinfoQuiz_start_time) %>%
select(contains("duration")) %>%
pivot_longer(cols = everything(),
names_pattern = "(.*)_duration",
names_to = "stage",
values_to = "time") %>%
mutate(stage = factor(stage, labels = unique(stage), levels = unique(stage))) %>%
filter(!is.na(time))
# strangely, there are some negative durations, we filter those
df_duration1 <- df_duration1 %>%
filter(time >= 0)
ggplot(df_duration1,
mapping = aes(x = time, group = stage, color = stage)) +
geom_histogram() +
facet_wrap(~ stage, scales = "free") +
labs(x = "time in minutes")
ggplot(df_duration1,
mapping = aes(y = time, x = stage, color = stage)) +
geom_boxplot() +
# restrict scale to focus on observations that are not outliers
scale_y_continuous(limits = c(0, 150)) +
labs(y = "time in minutes")
df_duration1 %>%
group_by(stage) %>%
summarize(median_duration = median(time),
avg_duration = mean(time),
sd_duration = sd(time),
min_duration = min(time),
max_duration = max(time)) %>%
kable(digits = 3,caption = "Phase 1: Duration of Interventions",
col.names = c("Stage", "Median", "Average", "SD", "Min","Max")) %>%
kable_styling(bootstrap_options = c("striped", "hover")) %>% collapse_rows() %>%
scroll_box( height = "500px")
Stage | Median | Average | SD | Min | Max |
---|---|---|---|---|---|
baseline | 10.783 mins | 175.426 mins | 2036.561 | 4.617 mins | 68946.00 mins |
video | 15.500 mins | 337.589 mins | 2889.449 | 9.000 mins | 69023.57 mins |
game | 17.383 mins | 225.703 mins | 1824.336 | 5.767 mins | 48474.42 mins |
sms | 17.467 mins | 254.841 mins | 2452.954 | 8.150 mins | 64620.12 mins |
misinfo_quiz | 6.817 mins | 177.845 mins | 2295.861 | 1.517 mins | 73517.35 mins |
Only participants who consented are considered.
Zooming into the dropout during the intervention. Dropout is higher for long baseline and SMS. The differences in dropout rates are much lower in Phase 2 than Phase 1.
dfphase2 <- df %>%
filter(phase_coded == "Phase 2") %>%
filter(consent_binary_num == 1) %>%
filter(arm_coded == "Short Baseline 2" | arm_coded == "SMS" | arm_coded == "Long Baseline" | is.na("arm_coded"))
dfphase2 %>%
mutate(n_started_intervention = if_else(!is.na(InterArmsBase_start_time) | !is.na(InterArmsLongBase_start_time) | !is.na(InterArmsSMS_start_time), 1, 0),
n_finished_intervention = if_else(!is.na(InterArmsBase_end_time) | !is.na(InterArmsLongBase_end_time) | !is.na(InterArmsSMS_end_time), 1, 0)) %>%
filter(!is.na(arm)) %>%
group_by(arm_coded) %>%
summarize(n_started_intervention = sum(n_started_intervention==1),
n_finished_intervention = sum(n_finished_intervention==1),
completed_intervention_after_starting = n_finished_intervention/n_started_intervention) %>%
kable(digits = 3,caption = "Intervention Funnel for Phase 2",
col.names = c("Treatment Arm", "Started Intervention", "Completed Intervention", "Completed/Started" )) %>%
kable_styling(bootstrap_options = c("striped", "hover")) %>% collapse_rows() %>%
scroll_box( height = "500px")
Treatment Arm | Started Intervention | Completed Intervention | Completed/Started |
---|---|---|---|
Long Baseline | 14024 | 10740 | 0.766 |
SMS | 20077 | 15497 | 0.772 |
Short Baseline 2 | 6185 | 5148 | 0.832 |
df_base_funnel2 <- dfphase2 %>%
filter(!is.na(InterArmsBase_start_time)) %>%
summarize(n_started_baseline_intervention = sum(!is.na(InterArmsBase_start_time)),
n_base2_ready = sum(!is.na(base2_ready)),
n_base3_ready = sum(!is.na(base3_ready)),
n_base4_ready = sum(!is.na(base4_ready)),
n_base5_ready = sum(!is.na(base5_ready)),
n_finished_baseline_intervention = sum(!is.na(InterArmsBase_end_time)),
# quiz
n_quiz_consent = sum(!is.na(quiz_consent)),
n_completed = sum(completed, na.rm = TRUE)) %>%
pivot_longer(cols = everything(),
names_pattern = "n_(.*)",
names_to = "stage",
values_to = "n") %>%
mutate(stage = janitor::make_clean_names(stage, case = "sentence")) %>%
mutate(stage = factor(stage, labels = unique(stage), levels = unique(stage)))
ggplot(df_base_funnel2,
mapping = aes(x = stage, y = n, group = NA)) +
geom_point() +
geom_line(alpha = 0.5) +
geom_text(aes(label = scales::comma(n)), vjust = -0.5, size = 3) +
geom_text(aes(label = scales::percent(n/n[stage == "Started baseline intervention"])), vjust = 2, size = 2) +
coord_cartesian(ylim = c(0, NA)) +
scale_y_continuous(labels = scales::comma) +
labs(title = "Chatbot Funnel Facts Baseline",
subtitle = "Number of Participants at Different Stages",
x = "Funnel stage",
y = "Number of observations") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
df_sms_funnel2 <- dfphase2 %>%
filter(!is.na(InterArmsSMS_start_time)) %>%
summarize(n_started_sms_intervention = sum(!is.na(InterArmsSMS_start_time)),
n_sms2_ready = sum(!is.na(sms2_ready)),
n_sms3_ready = sum(!is.na(sms3_ready)),
n_sms4_ready = sum(!is.na(sms4_ready)),
n_sms5_ready = sum(!is.na(sms5_ready)),
n_finished_sms_intervention = sum(!is.na(InterArmsSMS_end_time)),
# quiz
n_quiz_consent = sum(!is.na(quiz_consent)),
n_completed = sum(completed, na.rm = TRUE)) %>%
pivot_longer(cols = everything(),
names_pattern = "n_(.*)",
names_to = "stage",
values_to = "n") %>%
mutate(stage = janitor::make_clean_names(stage, case = "sentence")) %>%
mutate(stage = factor(stage, labels = unique(stage), levels = unique(stage)))
ggplot(df_sms_funnel2,
mapping = aes(x = stage, y = n, group = NA)) +
geom_point() +
geom_line(alpha = 0.5) +
geom_text(aes(label = scales::comma(n)), vjust = -0.5, size = 3) +
geom_text(aes(label = scales::percent(n/n[stage == "Started sms intervention"])), vjust = 2, size = 2) +
coord_cartesian(ylim = c(0, NA)) +
scale_y_continuous(labels = scales::comma) +
labs(title = "Chatbot Funnel SMS",
subtitle = "Number of Participants at Different Stages",
x = "Funnel stage",
y = "Number of observations") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
df_longbase_funnel2 <- dfphase2 %>%
filter(!is.na(InterArmsLongBase_start_time)) %>%
summarize(n_started_longbase_intervention = sum(!is.na(InterArmsLongBase_start_time)),
n_base2_ready = sum(!is.na(base2_ready)),
n_base3_ready = sum(!is.na(base3_ready)),
n_base4_ready = sum(!is.na(base4_ready)),
n_base5_ready = sum(!is.na(base5_ready)),
n_base_experts_reason = sum(!is.na(base_experts_reason)),
n_finished_longbase_intervention = sum(!is.na(InterArmsLongBase_end_time)),
# quiz
n_quiz_consent = sum(!is.na(quiz_consent)),
n_completed = sum(completed, na.rm = TRUE)) %>%
pivot_longer(cols = everything(),
names_pattern = "n_(.*)",
names_to = "stage",
values_to = "n") %>%
mutate(stage = janitor::make_clean_names(stage, case = "sentence")) %>%
mutate(stage = factor(stage, labels = unique(stage), levels = unique(stage)))
ggplot(df_longbase_funnel2,
mapping = aes(x = stage, y = n, group = NA)) +
geom_point() +
geom_line(alpha = 0.5) +
geom_text(aes(label = scales::comma(n)), vjust = -0.5, size = 3) +
geom_text(aes(label = scales::percent(n/n[stage == "Started longbase intervention"])), vjust = 2, size = 2) +
coord_cartesian(ylim = c(0, NA)) +
scale_y_continuous(labels = scales::comma) +
labs(title = "Chatbot Funnel SMS",
subtitle = "Number of Participants at Different Stages",
x = "Funnel stage",
y = "Number of observations") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
# payment end time is almost always missing, so we don't analyze payment time
# misinfo chat duration is always 0, so we don't show it
df_duration2 <- dfphase2 %>%
mutate(shortbase_duration = InterArmsBase_end_time - InterArmsBase_start_time,
longbase_duration = InterArmsLongBase_end_time - InterArmsLongBase_start_time,
sms_duration = InterArmsSMS_end_time - InterArmsSMS_start_time,
misinfo_quiz_duration = misinfoQuiz_end_time - MisinfoQuiz_start_time) %>%
select(contains("duration")) %>%
pivot_longer(cols = everything(),
names_pattern = "(.*)_duration",
names_to = "stage",
values_to = "time") %>%
mutate(stage = factor(stage, labels = unique(stage), levels = unique(stage))) %>%
filter(!is.na(time))
# strangely, there are some negative durations, we filter those
df_duration2 <- df_duration2 %>%
filter(time >= 0)
ggplot(df_duration2,
mapping = aes(x = time, group = stage, color = stage)) +
geom_histogram() +
facet_wrap(~ stage, scales = "free") +
labs(x = "time in minutes")
ggplot(df_duration2,
mapping = aes(y = time, x = stage, color = stage)) +
geom_boxplot() +
# restrict scale to focus on observations that are not outliers
scale_y_continuous(limits = c(0, 150)) +
labs(y = "time in minutes")
df_duration2 %>%
group_by(stage) %>%
summarize(median_duration = median(time),
avg_duration = mean(time),
sd_duration = sd(time),
min_duration = min(time),
max_duration = max(time)) %>%
kable(digits = 3,caption = "Phase 2: Duration of Interventions",
col.names = c("Stage", "Median", "Average", "SD", "Min","Max")) %>%
kable_styling(bootstrap_options = c("striped", "hover")) %>% collapse_rows() %>%
scroll_box( height = "500px")
Stage | Median | Average | SD | Min | Max |
---|---|---|---|---|---|
shortbase | 10.550 mins | 122.126 mins | 1377.030 | 4.217 mins | 39524.40 mins |
longbase | 17.533 mins | 153.456 mins | 1189.853 | 3.967 mins | 41738.68 mins |
sms | 16.650 mins | 167.512 mins | 1469.652 | 5.683 mins | 47502.13 mins |
misinfo_quiz | 6.150 mins | 108.910 mins | 1242.914 | 1.200 mins | 44377.45 mins |
Sharing discernment is defined as the difference in proportion between non-misinformation posts and misinformation posts shared. For example, if user shared 2 out of 2 non-misinformation posts (i.e. \(1\)) and 1 out of 3 misinformation posts (i.e. \(\frac{1}{3}\)), discernment takes a value of \(\frac{2}{3}\).
Reliability and Manipulation discernment are constructed in such a way that misinformation and non-misinformation are assigned an equal weight. Since each user encounters 3 misinformation posts and 2 non-misinformation posts, non-misinformation is reweighted by \(\frac{3}{2}\)
\[ Sharing Discernemnt = \frac{\textit{# Non-Misinformation posts Shared}}{3} - \frac{\textit{# Misinformation posts Shared}}{2} \]
\[ \text{Reliability Discernment:} = \begin{cases} \textit{Reliability score} \times \frac{3}{2}, & \text{if non-misinformation} \\ \textit{Reliability score} \times -1, & \text{if misinformation} \\ 0, & \text{otherwise} \end{cases} \]
\[ \text{Manipulation Discernment:} = \begin{cases} \textit{Manipulation score}\times \frac{3}{2}, & \text{if non-misinformation} \\ \textit{Manipulation score} \times -1, & \text{if misinformation} \\ 0, & \text{otherwise} \end{cases} \]
df_chat <- df %>%
select(
-matches("manipulative_coded"),
-matches("reliable_coded"),
-matches("manipulative_coded_num"),
-matches("reliable_coded_num"),
-matches("share_coded")
) %>%
rename(condition = arm_coded) %>%
filter(phase_coded=="Phase 1")
df_chat <- df_chat %>%
mutate(treatments = ifelse(condition != "Baseline 1",1,0),
completed = ifelse(!is.na(baseline_learn),baseline_learn,
ifelse(!is.na(game_learn),game_learn,
ifelse(!is.na(sms_learn),sms_learn,
ifelse(!is.na(Video_learn),Video_learn,NA))))) %>%
mutate(completed_coded = ifelse(!is.na(completed),1,0)) %>%
mutate(time_completion_survey = misinfoQuiz_end_time - `signed up`) %>%
mutate(completed_survey = ifelse(!is.na(time_completion_survey),1,0)) %>%
mutate(completed_coded = ifelse(!is.na(baseline_learn),baseline_learn,
ifelse(!is.na(game_learn),game_learn,
ifelse(!is.na(sms_learn),sms_learn,
ifelse(!is.na(Video_learn),Video_learn,NA))))) %>%
mutate(completed_binary = ifelse(!is.na(completed_coded),1,0))
# 12 questions asking about manipulative or reliable content
unique_responses_manip_reliable <- df_chat %>%
select(contains("_manipulative"), ends_with("reliable")) %>%
unlist() %>%
unique()
unique_responses_share <- df %>%
select(contains("_share")) %>%
unlist() %>%
unique()
valid_responses_5point <- c("1 - Not at all", "2", "3", "4", "5 - Very", 1, 2, 3, 4, 5, "5- Very", "5-Very", "Very5", "5 very", "1-Not at all")
valid_responses_share <- c("Yes", "No", "yes", "no", "YES")
df_chat <- df_chat %>%
mutate_at(.vars = vars(contains(c("_manipulative"))),
.funs = funs(case_when(. %in% valid_responses_5point ~ .,
TRUE ~ NA_character_))) %>%
mutate_at(.vars = vars(ends_with(c("_reliable"))), # ends_with to exclude manipulative_reliable_order
.funs = funs(case_when(. %in% valid_responses_5point ~ .,
TRUE ~ NA_character_))) %>%
mutate_at(.vars = vars(contains(c("_share"))),
.funs = funs(case_when(. %in% valid_responses_share ~ .,
TRUE ~ NA_character_)))
df_particpants_posts <- df_chat %>%
pivot_longer(cols = contains("post_id"),
names_pattern = "post_id_(.*)",
names_to = "post_id",
values_to = "post") %>%
select(anon_id, post_id, post)
df_outcomes <- df_chat %>%
pivot_longer(cols = c(contains(c("_manipulative", "_share")), ends_with("_reliable")),
names_to = c("post_id", "variable"),
names_pattern = "post_(.*)_(.*)") %>%
pivot_wider(names_from = variable,
values_from = value)
df_chat_long <- left_join(df_outcomes %>% select(-contains("post_id_")), df_particpants_posts, by = c("anon_id", "post_id"))
# recode outcomes to numeric
df_chat_long <- df_chat_long %>%
mutate_at(.vars = vars(manipulative, reliable),
.funs = funs(case_when(. == "1 - Not at all" | . == "1-Not at all" ~ 1,
. == "5 - Very" | . == "5- Very" | . == "5-Very" | . == "Very5" | . == "5 very" ~ 5,
TRUE ~ as.numeric(.)))) %>%
mutate(share = case_when(share == "Yes" | share == "yes" | share == "YES" ~ 1,
share == "No" | share == "no" ~ 0))
# Table 2
#Table with columns ("# of Obs", "Misinfo Sharing", "Non-misinfo Sharing", "Discernment", "Misinfo Reliability", "Non-misinfo Reliability", "Discernment", "Misinfo Manip", "Non-misinfo Manip", "Discernment") that report the mean of survey outcome for completers above and the standard error in parentheses, and rows ("All", "Placebo", "Treatments", "Text Course", "Videos", "Game")
# restrict to completers
df_completed <- df_chat %>%
filter(completed_binary == 1 & consent == "I consent, start now")
df_completed_long <- df_completed %>%
pivot_longer(cols = c(contains(c("_manipulative", "_share")), ends_with("_reliable")),
names_to = c("post_id", "variable"),
names_pattern = "post_(.*)_(.*)") %>%
pivot_wider(names_from = variable,
values_from = value)
df_particpants_posts <- df_completed %>%
pivot_longer(cols = contains("post_id"),
names_pattern = "post_id_(.*)",
names_to = "post_id",
values_to = "post") %>%
select(anon_id, post_id, post)
df_completed_long <- left_join(df_completed_long %>% select(-contains("post_id_")), df_particpants_posts, by = c("anon_id", "post_id"))
# recode outcomes to numeric
df_completed_long <- df_completed_long %>%
mutate_at(.vars = vars(manipulative, reliable),
.funs = funs(case_when(. == "1 - Not at all" | . == "1-Not at all" ~ 1,
. == "5 - Very" | . == "5- Very" | . == "5-Very" | . == "Very5" | . == "5 very" ~ 5,
TRUE ~ as.numeric(.)))) %>%
mutate(share = case_when(share == "Yes" | share == "yes" | share == "YES" ~ 1,
share == "No" | share == "no" ~ 0))
# shares can be codified to include other responses e.g. "Yeah" or "No I will not share"
# Create all and substitute placebo
df_completed_long$condition[df_completed_long$condition == "Baseline 1"] <- "placebo"
df_completed_long$treatments <- ifelse(df_completed_long$condition != "placebo",1,0)
# Post contains misinformation
df_completed_long$post_contains_misinfo <- ifelse(!grepl("^attention_check|^non_misinfo", df_completed_long$post),1,0)
df_completed_long$post_contains_misinfo[grepl("^attention_check",df_completed_long$post)==TRUE] <- NA
df_completed_long$post_contains_misinfo[is.na(df_completed_long$post)] <- NA
# Post contains non-misinformation
df_completed_long$post_contains_nonmisinfo <- ifelse(grepl("non_misinfo", df_completed_long$post),1,0)
df_completed_long$post_contains_nonmisinfo[grepl("^attention_check",df_completed_long$post)==TRUE] <- NA
df_completed_long$post_contains_nonmisinfo[is.na(df_completed_long$post)] <- NA
# Equal weighting throughout
# number of misinformation posts user encountered
completed_coded <- df_completed_long %>%
group_by(anon_id,post_contains_misinfo,post_contains_nonmisinfo) %>%
summarise(posts_misinfo = ifelse(post_contains_misinfo==1,n(),NA),
posts_nonmisinfo = ifelse(post_contains_nonmisinfo==1,n(),NA)) %>%
ungroup() %>%
select(anon_id,posts_misinfo,posts_nonmisinfo) %>%
distinct() %>%
filter(!is.na(posts_misinfo) | !is.na(posts_nonmisinfo))
# Discernment: sharing
# every user sees 3 misinfo posts and 2 non-misinfo
df_completed_long$disc_sharing =
ifelse(df_completed_long$post_contains_nonmisinfo==1 & df_completed_long$share==1,1*(3/2),
ifelse(df_completed_long$post_contains_misinfo==1 & df_completed_long$share==1,-1,0))
# Discernment: reliability
df_completed_long$disc_reliability =
ifelse(df_completed_long$post_contains_nonmisinfo==1,df_completed_long$reliable*(3/2),
ifelse(df_completed_long$post_contains_misinfo==1,-1*df_completed_long$reliable,0))
# Discernment: manipulation
df_completed_long$disc_manipulation =
ifelse(df_completed_long$post_contains_nonmisinfo==1,-1*df_completed_long$manipulative,
ifelse(df_completed_long$post_contains_misinfo==1,df_completed_long$manipulative*(3/2),0))
# "Misinfo Sharing"
misinfo_sharing_ind <- df_completed_long %>%
filter(post_contains_misinfo == 1) %>%
group_by(anon_id, condition, treatments) %>%
summarise(share_rate = mean(share, na.rm = TRUE), nobs = n(),
se = se_binary(share, na.rm = TRUE))
table2_misinfo_sharing <- misinfo_sharing_ind %>%
group_by(condition) %>%
summarise(mean = mean(share_rate, na.rm = TRUE), nobs = n(),
se = se_cont(share_rate, na.rm = TRUE)) %>%
filter(!is.na(condition))
aux_all <- misinfo_sharing_ind %>%
summarise(condition = "All",
mean = mean(share_rate, na.rm = TRUE),nobs = n(),
se = se_cont(share_rate, na.rm = TRUE))
aux_all <- data.frame("condition" = "All",
"mean" = mean(misinfo_sharing_ind$share_rate, na.rm = TRUE),
"nobs" = nrow(misinfo_sharing_ind[misinfo_sharing_ind$condition %in% c("placebo", "Game", "Video", "SMS"), ]),
"se" = se_cont(misinfo_sharing_ind$share_rate, na.rm = TRUE))
aux_treatments <- misinfo_sharing_ind %>%
filter(treatments == 1) %>%
group_by(treatments) %>%
summarise(condition = "Treatments",
mean = mean(share_rate, na.rm = TRUE),
nobs = n(),
se = se_cont(share_rate, na.rm = TRUE))
table2_misinfo_sharing <- bind_rows(aux_all,aux_treatments,table2_misinfo_sharing)
# "Non-misinfo Sharing"
nonmisinfo_sharing_ind <- df_completed_long %>%
filter(post_contains_nonmisinfo == 1) %>%
group_by(anon_id, condition, treatments) %>%
summarise(share_rate = mean(share, na.rm = TRUE), nobs = n(),
se = se_binary(share, na.rm = TRUE))
table2_nonmisinfo_sharing <- nonmisinfo_sharing_ind %>%
group_by(condition) %>%
summarise(mean = mean(share_rate, na.rm = TRUE), nobs = n(),
se = se_cont(share_rate, na.rm = TRUE)) %>%
filter(!is.na(condition))
aux_all <- data.frame("condition" = "All",
"mean" = mean(nonmisinfo_sharing_ind$share_rate, na.rm = TRUE),
"nobs" = nrow(nonmisinfo_sharing_ind[nonmisinfo_sharing_ind$condition %in%c("placebo", "Game", "Video", "SMS"), ]),
"se" = se_cont(nonmisinfo_sharing_ind$share_rate, na.rm = TRUE))
aux_treatments <- nonmisinfo_sharing_ind %>%
filter(treatments == 1) %>%
group_by(treatments) %>%
summarise(condition = "Treatments",
mean = mean(share_rate, na.rm = TRUE),nobs = n(),
se = se_cont(share_rate, na.rm = TRUE))
table2_nonmisinfo_sharing <- bind_rows(aux_all,aux_treatments,table2_nonmisinfo_sharing)
# "Discernment"
# Discernment defined as Prop. Non-Misinfo Shared - Prop. Misinfo shared (i.e., equal weighting).
sharing_ind <- df_completed_long %>%
select(anon_id, condition, treatments, post_contains_misinfo, post_contains_nonmisinfo, share, post) %>%
group_by(anon_id, condition, treatments) %>%
summarise(
posts_misinfo_share = sum(share == 1 & post_contains_misinfo == 1, na.rm = TRUE)/3,
posts_nonmisinfo_share = sum(share == 1 & post_contains_nonmisinfo == 1, na.rm = TRUE)/2) %>%
ungroup()
sharing_ind$disc_diff_prop = sharing_ind$posts_nonmisinfo_share - sharing_ind$posts_misinfo_share
sharing_disc_ind <- sharing_ind %>%
group_by(anon_id, condition, treatments) %>%
summarise(disc_rate = mean(disc_diff_prop, na.rm = TRUE), nobs = n(),
se = se_cont(disc_diff_prop, na.rm = TRUE))
table2_disc_sharing <- aggregate(disc_rate~condition,sharing_disc_ind, mean)
names(table2_disc_sharing) <- c("condition","mean")
table2_disc_sharing$nobs <- c(seq(nrow(sharing_disc_ind),length(sharing_disc_ind$condition)))
table2_disc_sharing$se <- aggregate(disc_rate~condition,sharing_disc_ind, se_cont)[,2]
# all
aux_all <- data.frame("condition" = "All",
"mean" = mean(sharing_disc_ind$disc_rate, na.rm = TRUE),
"nobs" = nrow(sharing_disc_ind[sharing_disc_ind$condition %in% c("placebo", "Game", "Video", "SMS"), ]),
"se" = se_cont(sharing_disc_ind$disc_rate, na.rm = TRUE))
# treatments
aux_treatments <- sharing_disc_ind %>%
filter(treatments == 1) %>%
group_by(treatments) %>%
summarise(condition = "Treatments",
mean = mean(disc_rate, na.rm = TRUE),
nobs = n(),
se = se_cont(disc_rate, na.rm = TRUE))
table2_disc_sharing <- bind_rows(aux_all,aux_treatments,table2_disc_sharing)
# "Misinfo Reliability"
misinfo_reliability_ind <- df_completed_long %>%
filter(post_contains_misinfo == 1) %>%
group_by(anon_id, condition, treatments) %>%
summarise(reliable_rate = mean(reliable, na.rm = TRUE), nobs = n(),
se = se_cont(reliable, na.rm = TRUE))
table2_misinfo_reliability <- misinfo_reliability_ind %>%
group_by(condition) %>%
summarise(mean = mean(reliable_rate, na.rm = TRUE), nobs = n(),
se = se_cont(reliable_rate, na.rm = TRUE)) %>%
filter(!is.na(condition))
aux_all <- data.frame("condition" = "All",
"mean" = mean(misinfo_reliability_ind$reliable_rate, na.rm = TRUE),
"nobs" = nrow(misinfo_reliability_ind[misinfo_reliability_ind$condition %in% c("placebo", "Game", "Video", "SMS"), ]),
"se" = se_cont(misinfo_reliability_ind$reliable_rate, na.rm = TRUE))
aux_treatments <- misinfo_reliability_ind %>%
filter(treatments == 1) %>%
group_by(treatments) %>%
summarise(condition = "Treatments",
mean = mean(reliable_rate, na.rm = TRUE),
nobs = n(),
se = se_cont(reliable_rate, na.rm = TRUE))
table2_misinfo_reliability <- bind_rows(aux_all,aux_treatments,table2_misinfo_reliability)
# "NonMisinfo Reliability"
nonmisinfo_reliability_ind <- df_completed_long %>%
filter(post_contains_nonmisinfo == 1) %>%
group_by(anon_id, condition, treatments) %>%
summarise(reliable_rate = mean(reliable, na.rm = TRUE), nobs = n(),
se = se_cont(reliable, na.rm = TRUE))
table2_nonmisinfo_reliability <- nonmisinfo_reliability_ind %>%
group_by(condition) %>%
summarise(mean = mean(reliable_rate, na.rm = TRUE), nobs = n(),
se = se_cont(reliable_rate, na.rm = TRUE)) %>%
filter(!is.na(condition))
aux_all <- data.frame("condition" = "All",
"mean" = mean(nonmisinfo_reliability_ind$reliable_rate, na.rm = TRUE),
"nobs" = nrow(nonmisinfo_reliability_ind[nonmisinfo_reliability_ind$condition %in% c("placebo", "Game", "Video", "SMS"), ]),
"se" = se_cont(nonmisinfo_reliability_ind$reliable_rate, na.rm = TRUE))
aux_treatments <- nonmisinfo_reliability_ind %>%
filter(treatments == 1) %>%
group_by(treatments) %>%
summarise(condition = "Treatments",
mean = mean(reliable_rate, na.rm = TRUE),nobs = n(),
se = se_cont(reliable_rate, na.rm = TRUE))
table2_nonmisinfo_reliability <- bind_rows(aux_all,aux_treatments,table2_nonmisinfo_reliability)
# "Discernment"
reliability_disc_ind <- df_completed_long %>%
group_by(anon_id, condition, treatments) %>%
summarise(disc_rate = mean(disc_reliability, na.rm = TRUE), nobs = n(),
se = se_cont(disc_reliability, na.rm = TRUE))
table2_disc_reliability <- reliability_disc_ind %>%
group_by(condition) %>%
summarise(mean = mean(disc_rate, na.rm = TRUE), nobs = n(),
se = se_cont(disc_rate, na.rm = TRUE)) %>%
filter(!is.na(condition))
aux_all <- data.frame("condition" = "All",
"mean" = mean(reliability_disc_ind$disc_rate, na.rm = TRUE),
"nobs" = nrow(reliability_disc_ind[reliability_disc_ind$condition %in% c("placebo", "Game", "Video", "SMS"), ]),
"se" = se_cont(reliability_disc_ind$disc_rate, na.rm = TRUE))
aux_treatments <- reliability_disc_ind %>%
filter(treatments == 1) %>%
group_by(treatments) %>%
summarise(condition = "Treatments",
mean = mean(disc_rate, na.rm = TRUE),nobs = n(),
se = se_cont(disc_rate, na.rm = TRUE))
table2_disc_reliability <- bind_rows(aux_all,aux_treatments,table2_disc_reliability)
# "Misinfo Manipulation"
misinfo_manipulation_ind <- df_completed_long %>%
filter(post_contains_misinfo == 1) %>%
group_by(anon_id, condition, treatments) %>%
summarise(manipulation_rate = mean(manipulative, na.rm = TRUE), nobs = n(),
se = se_cont(manipulative, na.rm = TRUE))
table2_misinfo_manipulation <- misinfo_manipulation_ind %>%
group_by(condition) %>%
summarise(mean = mean(manipulation_rate, na.rm = TRUE), nobs = n(),
se = se_cont(manipulation_rate, na.rm = TRUE)) %>%
filter(!is.na(condition))
aux_all <- data.frame("condition" = "All",
"mean" = mean(misinfo_manipulation_ind$manipulation_rate, na.rm = TRUE),
"nobs" = nrow(misinfo_manipulation_ind[misinfo_manipulation_ind$condition %in% c("placebo", "Game", "Video", "SMS"), ]),
"se" = se_cont(misinfo_manipulation_ind$manipulation_rate, na.rm = TRUE))
aux_treatments <- misinfo_manipulation_ind %>%
filter(treatments == 1) %>%
group_by(treatments) %>%
summarise(condition = "Treatments",
mean = mean(manipulation_rate, na.rm = TRUE),
nobs = n(),
se = se_cont(manipulation_rate, na.rm = TRUE))
table2_misinfo_manipulation <- bind_rows(aux_all,aux_treatments,table2_misinfo_manipulation)
# "NonMisinfo Manipulation"
nonmisinfo_manipulation_ind <- df_completed_long %>%
filter(post_contains_nonmisinfo == 1) %>%
group_by(anon_id, condition, treatments) %>%
summarise(manipulation_rate = mean(manipulative, na.rm = TRUE), nobs = n(),
se = se_cont(manipulative, na.rm = TRUE))
table2_nonmisinfo_manipulation <- nonmisinfo_manipulation_ind %>%
group_by(condition) %>%
summarise(mean = mean(manipulation_rate, na.rm = TRUE), nobs = n(),
se = se_cont(manipulation_rate, na.rm = TRUE)) %>%
filter(!is.na(condition))
aux_all <- data.frame("condition" = "All",
"mean" = mean(nonmisinfo_manipulation_ind$manipulation_rate, na.rm = TRUE),
"nobs" = nrow(nonmisinfo_manipulation_ind[nonmisinfo_manipulation_ind$condition %in% c("placebo", "Game", "Video", "SMS"), ]),
"se" = se_cont(nonmisinfo_manipulation_ind$manipulation_rate, na.rm = TRUE))
aux_treatments <- nonmisinfo_manipulation_ind %>%
filter(treatments == 1) %>%
group_by(treatments) %>%
summarise(condition = "Treatments",
mean = mean(manipulation_rate, na.rm = TRUE),
nobs = n(),
se = se_cont(manipulation_rate, na.rm = TRUE))
table2_nonmisinfo_manipulation <- bind_rows(aux_all,aux_treatments,table2_nonmisinfo_manipulation)
# "Discernment"
manipulation_disc_ind <- df_completed_long %>%
group_by(anon_id, condition, treatments) %>%
summarise(disc_rate = mean(disc_manipulation, na.rm = TRUE), nobs = n(),
se = se_cont(disc_manipulation, na.rm = TRUE))
table2_disc_manipulation <- manipulation_disc_ind %>%
group_by(condition) %>%
summarise(mean = mean(disc_rate, na.rm = TRUE), nobs = n(),
se = se_cont(disc_rate, na.rm = TRUE)) %>%
filter(!is.na(condition))
aux_all <- data.frame("condition" = "All",
"mean" = mean(manipulation_disc_ind$disc_rate, na.rm = TRUE),
"nobs" = nrow(manipulation_disc_ind[manipulation_disc_ind$condition %in% c("placebo", "Game", "Video", "SMS"), ]),
"se" = se_cont(manipulation_disc_ind$disc_rate, na.rm = TRUE))
aux_treatments <- manipulation_disc_ind %>%
filter(treatments == 1) %>%
group_by(treatments) %>%
summarise(condition = "Treatments",
mean = mean(disc_rate, na.rm = TRUE),nobs = n(),
se = se_cont(disc_rate, na.rm = TRUE))
table2_disc_manipulation <- bind_rows(aux_all,aux_treatments,table2_disc_manipulation)
# Final table
table2 <- data.frame("condition" = table2_misinfo_sharing$condition,
"# of Obs" = table2_misinfo_sharing$nobs,
"Misinfo Sharing" = table2_misinfo_sharing$mean,
"Non-misinfo Sharing" = table2_nonmisinfo_sharing$mean,
"Discernment Sharing" = table2_disc_sharing$mean,
"Misinfo Reliability" = table2_misinfo_reliability$mean,
"Non-misinfo Reliability" = table2_nonmisinfo_reliability$mean,
"Discernment Reliability" = table2_disc_reliability$mean,
"Misinfo Manipulation" = table2_misinfo_manipulation$mean,
"Non-misinfo Manipulation" = table2_nonmisinfo_manipulation$mean,
"Discernment Manipulation" = table2_disc_manipulation$mean)
table2_se <- data.frame("condition" = table2_misinfo_sharing$condition,
"# of Obs" = table2_misinfo_sharing$nobs,
"Misinfo Sharing" = table2_misinfo_sharing$se,
"Non-misinfo Sharing" = table2_nonmisinfo_sharing$se,
"Discernment Sharing" = table2_disc_sharing$se,
"Misinfo Reliability" = table2_misinfo_reliability$se,
"Non-misinfo Reliability" = table2_nonmisinfo_reliability$se,
"Discernment Reliability" = table2_disc_reliability$se,
"Misinfo Manipulation" = table2_misinfo_manipulation$se,
"Non-misinfo Manipulation" = table2_nonmisinfo_manipulation$se,
"Discernment Manipulation" = table2_disc_manipulation$se)
# add parentheses to s.e.
#order_vector <- c("20,689", " 6,078", "14,611", " 5,083", " 4,904"," 4,624")
#table2_se <- table2_se[order(match(table2_se$X..of.Obs, order_vector)), ]
#table2_se$order <- seq(nrow(table2_se))
table2_se <- table2_se %>% mutate(order = ifelse(grepl("All",condition), 1,
ifelse(grepl("placebo",condition), 2,
ifelse(grepl("Treatments",condition), 3,
ifelse(grepl("SMS",condition), 4,
ifelse(grepl("Video",condition), 5,
ifelse(grepl("Game",condition), 6,NA)))))))
table2_se$priority <- 2
table2_se$X..of.Obs <- NA
table2_se[,1:11] <- data.frame(lapply(table2_se[,1:11], function(col) sapply(col, add_parentheses)))
# format table 2
table2 <- data.frame(lapply(table2, function(x) format(x, big.mark = ",", digits = 3)))
#table2 <- table2[order(match(table2$X..of.Obs, order_vector)), ]
#table2$order <- seq(nrow(table2))
table2 <- table2 %>% mutate(order = ifelse(grepl("All",condition), 1,
ifelse(grepl("placebo",condition), 2,
ifelse(grepl("Treatments",condition), 3,
ifelse(grepl("SMS",condition), 4,
ifelse(grepl("Video",condition), 5,
ifelse(grepl("Game",condition), 6,NA)))))))
table2$priority <- 1
# bind rows
names(table2) <- c("condition","# of Obs", "Misinfo Sharing", "Non-misinfo Sharing","Discernment Sharing", "Misinfo Reliability", "Non-misinfo Reliability","Discernment Reliability","Misinfo Manipulation", "Non-misinfo Manipulation","Discernment Manipulation","order","priority")
names(table2_se) <- names(table2)
table <- rbind(table2,table2_se)
table <- table[order(table$order, table$priority), ]
table <- table %>% select(-c(order,priority))
table <- table %>% select(-c(condition))
# remove NA
table <- data.frame(lapply(table, function(x) {
x <- as.character(x)
x[is.na(x)] <- ""
gsub("NA", "", x)
}))
names(table) <- names(table2)[2:11]
rownames(table) <- c("All","\u200B","Placebo","\u200B\u200B","Treatments","\u200B\u200B\u200B","Text Course","\u200B\u200B\u200B\u200B","Videos","\u200B\u200B\u200B\u200B\u200B","Game","\u200B\u200B\u200B\u200B\u200B\u200B")
kable(table,format = "markdown", digits = 3) %>%
kable_styling(bootstrap_options = c("striped", "hover")) %>%
scroll_box(width = "100%", height = "500px")
# of Obs | Misinfo Sharing | Non-misinfo Sharing | Discernment Sharing | Misinfo Reliability | Non-misinfo Reliability | Discernment Reliability | Misinfo Manipulation | Non-misinfo Manipulation | Discernment Manipulation | |
---|---|---|---|---|---|---|---|---|---|---|
All | 20,412 | 0.604 | 0.750 | 0.1378 | 2.64 | 3.05 | 0.244 | 2.86 | 2.48 | 1.56 |
| (0.0027) | (0.0025) | (0.0028) | (0.0092) | (0.0100) | (0.0082) | (0.0095) | (0.0098) | (0.0094) | |
Placebo | 5,997 | 0.640 | 0.764 | 0.1300 | 2.81 | 3.17 | 0.218 | 2.86 | 2.55 | 1.54 |
| (0.0049) | (0.0045) | (0.0054) | (0.0174) | (0.0185) | (0.0154) | (0.0174) | (0.0185) | (0.0176) | |
Treatments | 14,415 | 0.589 | 0.744 | 0.1464 | 2.58 | 3.00 | 0.255 | 2.86 | 2.45 | 1.57 |
| (0.0033) | (0.0030) | (0.0033) | (0.0108) | (0.0118) | (0.0098) | (0.0113) | (0.0115) | (0.0111) | |
Text Course | 5,009 | 0.381 | 0.640 | 0.1172 | 2.21 | 2.87 | 0.397 | 3.06 | 2.48 | 1.74 |
| (0.0052) | (0.0055) | (0.0050) | (0.0165) | (0.0195) | (0.0160) | (0.0192) | (0.0194) | (0.0192) | |
Videos | 4,855 | 0.671 | 0.806 | 0.2443 | 2.76 | 3.16 | 0.244 | 2.80 | 2.43 | 1.54 |
| (0.0053) | (0.0047) | (0.0059) | (0.0189) | (0.0204) | (0.0161) | (0.0191) | (0.0199) | (0.0179) | |
Game | 4,551 | 0.736 | 0.794 | 0.0559 | 2.79 | 2.98 | 0.107 | 2.68 | 2.44 | 1.41 |
| (0.0050) | (0.0050) | (0.0055) | (0.0198) | (0.0213) | (0.0185) | (0.0196) | (0.0206) | (0.0202) |
# make it tex code
table$col_1 <- c("All","\u200B","Placebo","\u200B\u200B","Treatments","\u200B\u200B\u200B","Text Course","\u200B\u200B\u200B\u200B","Videos","\u200B\u200B\u200B\u200B\u200B","Game","\u200B\u200B\u200B\u200B\u200B\u200B")
table <- table[, c("col_1", setdiff(names(table), "col_1"))]
latex_code <- paste0(apply(table, 1, function(row) paste(row, collapse = " & ")), " \\\\\n")
latex_code <- c(paste("","\\# of Obs", "Misinfo Sharing", "Non-misinfo Sharing","Discernment Sharing", "Misinfo Reliability", "Non-misinfo Reliability","Discernment Reliability","Misinfo Manipulation", "Non-misinfo Manipulation","Discernment Manipulation \\\\ \\hline ", sep = " & "),latex_code)
latex_code <- gsub("\\[\\d+\\]", "", latex_code)
# save .tex
write.table(latex_code, file = "table2.tex", sep = " & ", quote = FALSE, row.names = FALSE, col.names = FALSE)
df_chat <- df %>%
select(
-matches("manipulative_coded"),
-matches("reliable_coded"),
-matches("manipulative_coded_num"),
-matches("reliable_coded_num"),
-matches("share_coded")
) %>%
rename(condition = arm_coded) %>%
filter(phase_coded=="Phase 2")
df_chat <- df_chat %>%
mutate(treatments = ifelse(condition != "Short Baseline 2",1,0),
completed = ifelse(!is.na(baseline_learn),baseline_learn,
ifelse(!is.na(sms_learn),sms_learn,
NA))) %>%
mutate(completed_coded = ifelse(!is.na(completed),1,0)) %>%
mutate(time_completion_survey = misinfoQuiz_end_time - `signed up`) %>%
mutate(completed_survey = ifelse(!is.na(time_completion_survey),1,0)) %>%
mutate(completed_coded = ifelse(!is.na(baseline_learn),baseline_learn,
ifelse(!is.na(sms_learn),sms_learn,
NA))) %>%
mutate(completed_binary = ifelse(!is.na(completed_coded),1,0))
# 12 questions asking about manipulative or reliable content
unique_responses_manip_reliable <- df_chat %>%
select(contains("_manipulative"), ends_with("reliable")) %>%
unlist() %>%
unique()
unique_responses_share <- df %>%
select(contains("_share")) %>%
unlist() %>%
unique()
valid_responses_5point <- c("1 - Not at all", "2", "3", "4", "5 - Very", 1, 2, 3, 4, 5, "5- Very", "5-Very", "Very5", "5 very", "1-Not at all")
valid_responses_share <- c("Yes", "No", "yes", "no", "YES")
df_chat <- df_chat %>%
mutate_at(.vars = vars(contains(c("_manipulative"))),
.funs = funs(case_when(. %in% valid_responses_5point ~ .,
TRUE ~ NA_character_))) %>%
mutate_at(.vars = vars(ends_with(c("_reliable"))), # ends_with to exclude manipulative_reliable_order
.funs = funs(case_when(. %in% valid_responses_5point ~ .,
TRUE ~ NA_character_))) %>%
mutate_at(.vars = vars(contains(c("_share"))),
.funs = funs(case_when(. %in% valid_responses_share ~ .,
TRUE ~ NA_character_)))
df_particpants_posts <- df_chat %>%
pivot_longer(cols = contains("post_id"),
names_pattern = "post_id_(.*)",
names_to = "post_id",
values_to = "post") %>%
select(anon_id, post_id, post)
df_outcomes <- df_chat %>%
pivot_longer(cols = c(contains(c("_manipulative", "_share")), ends_with("_reliable")),
names_to = c("post_id", "variable"),
names_pattern = "post_(.*)_(.*)") %>%
pivot_wider(names_from = variable,
values_from = value)
df_chat_long <- left_join(df_outcomes %>% select(-contains("post_id_")), df_particpants_posts, by = c("anon_id", "post_id"))
# recode outcomes to numeric
df_chat_long <- df_chat_long %>%
mutate_at(.vars = vars(manipulative, reliable),
.funs = funs(case_when(. == "1 - Not at all" | . == "1-Not at all" ~ 1,
. == "5 - Very" | . == "5- Very" | . == "5-Very" | . == "Very5" | . == "5 very" ~ 5,
TRUE ~ as.numeric(.)))) %>%
mutate(share = case_when(share == "Yes" | share == "yes" | share == "YES" ~ 1,
share == "No" | share == "no" ~ 0))
# Table 2
#Table with columns ("# of Obs", "Misinfo Sharing", "Non-misinfo Sharing", "Discernment", "Misinfo Reliability", "Non-misinfo Reliability", "Discernment", "Misinfo Manip", "Non-misinfo Manip", "Discernment") that report the mean of survey outcome for completers above and the standard error in parentheses, and rows ("All", "Placebo", "Treatments", "Text Course", "Videos", "Game")
# restrict to completers
df_completed <- df_chat %>%
filter(completed_binary == 1 & consent == "I consent, start now")
df_completed_long <- df_completed %>%
pivot_longer(cols = c(contains(c("_manipulative", "_share")), ends_with("_reliable")),
names_to = c("post_id", "variable"),
names_pattern = "post_(.*)_(.*)") %>%
pivot_wider(names_from = variable,
values_from = value)
df_particpants_posts <- df_completed %>%
pivot_longer(cols = contains("post_id"),
names_pattern = "post_id_(.*)",
names_to = "post_id",
values_to = "post") %>%
select(anon_id, post_id, post)
df_completed_long <- left_join(df_completed_long %>% select(-contains("post_id_")), df_particpants_posts, by = c("anon_id", "post_id"))
# recode outcomes to numeric
df_completed_long <- df_completed_long %>%
mutate_at(.vars = vars(manipulative, reliable),
.funs = funs(case_when(. == "1 - Not at all" | . == "1-Not at all" ~ 1,
. == "5 - Very" | . == "5- Very" | . == "5-Very" | . == "Very5" | . == "5 very" ~ 5,
TRUE ~ as.numeric(.)))) %>%
mutate(share = case_when(share == "Yes" | share == "yes" | share == "YES" ~ 1,
share == "No" | share == "no" ~ 0))
# shares can be codified to include other responses e.g. "Yeah" or "No I will not share"
# Create all and substitute placebo
df_completed_long$condition[df_completed_long$condition == "Short Baseline 2"] <- "placebo"
df_completed_long$treatments <- ifelse(df_completed_long$condition != "placebo",1,0)
# Post contains misinformation
df_completed_long$post_contains_misinfo <- ifelse(!grepl("^attention_check|^non_misinfo", df_completed_long$post),1,0)
df_completed_long$post_contains_misinfo[grepl("^attention_check",df_completed_long$post)==TRUE] <- NA
df_completed_long$post_contains_misinfo[is.na(df_completed_long$post)] <- NA
# Post contains non-misinformation
df_completed_long$post_contains_nonmisinfo <- ifelse(grepl("non_misinfo", df_completed_long$post),1,0)
df_completed_long$post_contains_nonmisinfo[grepl("^attention_check",df_completed_long$post)==TRUE] <- NA
df_completed_long$post_contains_nonmisinfo[is.na(df_completed_long$post)] <- NA
# Equal weighting throughout
# number of misinformation posts user encountered
completed_coded <- df_completed_long %>%
group_by(anon_id,post_contains_misinfo,post_contains_nonmisinfo) %>%
summarise(posts_misinfo = ifelse(post_contains_misinfo==1,n(),NA),
posts_nonmisinfo = ifelse(post_contains_nonmisinfo==1,n(),NA)) %>%
ungroup() %>%
select(anon_id,posts_misinfo,posts_nonmisinfo) %>%
distinct() %>%
filter(!is.na(posts_misinfo) | !is.na(posts_nonmisinfo))
# Discernment: sharing
# every user sees 3 misinfo posts and 2 non-misinfo
df_completed_long$disc_sharing =
ifelse(df_completed_long$post_contains_nonmisinfo==1 & df_completed_long$share==1,1*(3/2),
ifelse(df_completed_long$post_contains_misinfo==1 & df_completed_long$share==1,-1,0))
# Discernment: reliability
df_completed_long$disc_reliability =
ifelse(df_completed_long$post_contains_nonmisinfo==1,df_completed_long$reliable*(3/2),
ifelse(df_completed_long$post_contains_misinfo==1,-1*df_completed_long$reliable,0))
# Discernment: manipulation
df_completed_long$disc_manipulation =
ifelse(df_completed_long$post_contains_nonmisinfo==1,-1*df_completed_long$manipulative,
ifelse(df_completed_long$post_contains_misinfo==1,df_completed_long$manipulative*(3/2),0))
# "Misinfo Sharing"
misinfo_sharing_ind <- df_completed_long %>%
filter(post_contains_misinfo == 1) %>%
group_by(anon_id, condition, treatments) %>%
summarise(share_rate = mean(share, na.rm = TRUE), nobs = n(),
se = se_binary(share, na.rm = TRUE))
table2_misinfo_sharing <- misinfo_sharing_ind %>%
group_by(condition) %>%
summarise(mean = mean(share_rate, na.rm = TRUE), nobs = n(),
se = se_cont(share_rate, na.rm = TRUE)) %>%
filter(!is.na(condition))
aux_all <- misinfo_sharing_ind %>%
summarise(condition = "All",
mean = mean(share_rate, na.rm = TRUE),nobs = n(),
se = se_cont(share_rate, na.rm = TRUE))
aux_all <- data.frame("condition" = "All",
"mean" = mean(misinfo_sharing_ind$share_rate, na.rm = TRUE),
"nobs" = nrow(misinfo_sharing_ind[misinfo_sharing_ind$condition %in% c("placebo", "Long Baseline", "SMS"), ]),
"se" = se_cont(misinfo_sharing_ind$share_rate, na.rm = TRUE))
aux_treatments <- misinfo_sharing_ind %>%
filter(treatments == 1) %>%
group_by(treatments) %>%
summarise(condition = "Treatments",
mean = mean(share_rate, na.rm = TRUE),
nobs = n(),
se = se_cont(share_rate, na.rm = TRUE))
table2_misinfo_sharing <- bind_rows(aux_all,aux_treatments,table2_misinfo_sharing)
# "Non-misinfo Sharing"
nonmisinfo_sharing_ind <- df_completed_long %>%
filter(post_contains_nonmisinfo == 1) %>%
group_by(anon_id, condition, treatments) %>%
summarise(share_rate = mean(share, na.rm = TRUE), nobs = n(),
se = se_binary(share, na.rm = TRUE))
table2_nonmisinfo_sharing <- nonmisinfo_sharing_ind %>%
group_by(condition) %>%
summarise(mean = mean(share_rate, na.rm = TRUE), nobs = n(),
se = se_cont(share_rate, na.rm = TRUE)) %>%
filter(!is.na(condition))
aux_all <- data.frame("condition" = "All",
"mean" = mean(nonmisinfo_sharing_ind$share_rate, na.rm = TRUE),
"nobs" = nrow(nonmisinfo_sharing_ind[nonmisinfo_sharing_ind$condition %in% c("placebo", "Long Baseline", "SMS"), ]),
"se" = se_cont(nonmisinfo_sharing_ind$share_rate, na.rm = TRUE))
aux_treatments <- nonmisinfo_sharing_ind %>%
filter(treatments == 1) %>%
group_by(treatments) %>%
summarise(condition = "Treatments",
mean = mean(share_rate, na.rm = TRUE),nobs = n(),
se = se_cont(share_rate, na.rm = TRUE))
table2_nonmisinfo_sharing <- bind_rows(aux_all,aux_treatments,table2_nonmisinfo_sharing)
# "Discernment"
# Discernment defined as Prop. Non-Misinfo Shared - Prop. Misinfo shared (i.e., equal weighting).
sharing_ind <- df_completed_long %>%
select(anon_id, condition, treatments, post_contains_misinfo, post_contains_nonmisinfo, share, post) %>%
group_by(anon_id, condition, treatments) %>%
summarise(
posts_misinfo_share = sum(share == 1 & post_contains_misinfo == 1, na.rm = TRUE)/3,
posts_nonmisinfo_share = sum(share == 1 & post_contains_nonmisinfo == 1, na.rm = TRUE)/2) %>%
ungroup()
sharing_ind$disc_diff_prop = sharing_ind$posts_nonmisinfo_share - sharing_ind$posts_misinfo_share
sharing_disc_ind <- sharing_ind %>%
group_by(anon_id, condition, treatments) %>%
summarise(disc_rate = mean(disc_diff_prop, na.rm = TRUE), nobs = n(),
se = se_cont(disc_diff_prop, na.rm = TRUE))
table2_disc_sharing <- aggregate(disc_rate~condition,sharing_disc_ind, mean)
names(table2_disc_sharing) <- c("condition","mean")
table2_disc_sharing$nobs <- c(seq(nrow(sharing_disc_ind),length(sharing_disc_ind$condition)))
table2_disc_sharing$se <- aggregate(disc_rate~condition,sharing_disc_ind, se_cont)[,2]
# all
aux_all <- data.frame("condition" = "All",
"mean" = mean(sharing_disc_ind$disc_rate, na.rm = TRUE),
"nobs" = nrow(sharing_disc_ind[sharing_disc_ind$condition %in% c("placebo", "Long Baseline", "SMS"), ]),
"se" = se_cont(sharing_disc_ind$disc_rate, na.rm = TRUE))
# treatments
aux_treatments <- sharing_disc_ind %>%
filter(treatments == 1) %>%
group_by(treatments) %>%
summarise(condition = "Treatments",
mean = mean(disc_rate, na.rm = TRUE),
nobs = n(),
se = se_cont(disc_rate, na.rm = TRUE))
table2_disc_sharing <- bind_rows(aux_all,aux_treatments,table2_disc_sharing)
# "Misinfo Reliability"
misinfo_reliability_ind <- df_completed_long %>%
filter(post_contains_misinfo == 1) %>%
group_by(anon_id, condition, treatments) %>%
summarise(reliable_rate = mean(reliable, na.rm = TRUE), nobs = n(),
se = se_cont(reliable, na.rm = TRUE))
table2_misinfo_reliability <- misinfo_reliability_ind %>%
group_by(condition) %>%
summarise(mean = mean(reliable_rate, na.rm = TRUE), nobs = n(),
se = se_cont(reliable_rate, na.rm = TRUE)) %>%
filter(!is.na(condition))
aux_all <- data.frame("condition" = "All",
"mean" = mean(misinfo_reliability_ind$reliable_rate, na.rm = TRUE),
"nobs" = nrow(misinfo_reliability_ind[misinfo_reliability_ind$condition %in% c("placebo", "Long Baseline", "SMS"), ]),
"se" = se_cont(misinfo_reliability_ind$reliable_rate, na.rm = TRUE))
aux_treatments <- misinfo_reliability_ind %>%
filter(treatments == 1) %>%
group_by(treatments) %>%
summarise(condition = "Treatments",
mean = mean(reliable_rate, na.rm = TRUE),
nobs = n(),
se = se_cont(reliable_rate, na.rm = TRUE))
table2_misinfo_reliability <- bind_rows(aux_all,aux_treatments,table2_misinfo_reliability)
# "NonMisinfo Reliability"
nonmisinfo_reliability_ind <- df_completed_long %>%
filter(post_contains_nonmisinfo == 1) %>%
group_by(anon_id, condition, treatments) %>%
summarise(reliable_rate = mean(reliable, na.rm = TRUE), nobs = n(),
se = se_cont(reliable, na.rm = TRUE))
table2_nonmisinfo_reliability <- nonmisinfo_reliability_ind %>%
group_by(condition) %>%
summarise(mean = mean(reliable_rate, na.rm = TRUE), nobs = n(),
se = se_cont(reliable_rate, na.rm = TRUE)) %>%
filter(!is.na(condition))
aux_all <- data.frame("condition" = "All",
"mean" = mean(nonmisinfo_reliability_ind$reliable_rate, na.rm = TRUE),
"nobs" = nrow(nonmisinfo_reliability_ind[nonmisinfo_reliability_ind$condition %in% c("placebo", "Long Baseline", "SMS"), ]),
"se" = se_cont(nonmisinfo_reliability_ind$reliable_rate, na.rm = TRUE))
aux_treatments <- nonmisinfo_reliability_ind %>%
filter(treatments == 1) %>%
group_by(treatments) %>%
summarise(condition = "Treatments",
mean = mean(reliable_rate, na.rm = TRUE),nobs = n(),
se = se_cont(reliable_rate, na.rm = TRUE))
table2_nonmisinfo_reliability <- bind_rows(aux_all,aux_treatments,table2_nonmisinfo_reliability)
# "Discernment"
reliability_disc_ind <- df_completed_long %>%
group_by(anon_id, condition, treatments) %>%
summarise(disc_rate = mean(disc_reliability, na.rm = TRUE), nobs = n(),
se = se_cont(disc_reliability, na.rm = TRUE))
table2_disc_reliability <- reliability_disc_ind %>%
group_by(condition) %>%
summarise(mean = mean(disc_rate, na.rm = TRUE), nobs = n(),
se = se_cont(disc_rate, na.rm = TRUE)) %>%
filter(!is.na(condition))
aux_all <- data.frame("condition" = "All",
"mean" = mean(reliability_disc_ind$disc_rate, na.rm = TRUE),
"nobs" = nrow(reliability_disc_ind[reliability_disc_ind$condition %in% c("placebo", "Game", "Video", "SMS"), ]),
"se" = se_cont(reliability_disc_ind$disc_rate, na.rm = TRUE))
aux_treatments <- reliability_disc_ind %>%
filter(treatments == 1) %>%
group_by(treatments) %>%
summarise(condition = "Treatments",
mean = mean(disc_rate, na.rm = TRUE),nobs = n(),
se = se_cont(disc_rate, na.rm = TRUE))
table2_disc_reliability <- bind_rows(aux_all,aux_treatments,table2_disc_reliability)
# "Misinfo Manipulation"
misinfo_manipulation_ind <- df_completed_long %>%
filter(post_contains_misinfo == 1) %>%
group_by(anon_id, condition, treatments) %>%
summarise(manipulation_rate = mean(manipulative, na.rm = TRUE), nobs = n(),
se = se_cont(manipulative, na.rm = TRUE))
table2_misinfo_manipulation <- misinfo_manipulation_ind %>%
group_by(condition) %>%
summarise(mean = mean(manipulation_rate, na.rm = TRUE), nobs = n(),
se = se_cont(manipulation_rate, na.rm = TRUE)) %>%
filter(!is.na(condition))
aux_all <- data.frame("condition" = "All",
"mean" = mean(misinfo_manipulation_ind$manipulation_rate, na.rm = TRUE),
"nobs" = nrow(misinfo_manipulation_ind[misinfo_manipulation_ind$condition %in% c("placebo", "Long Baseline", "SMS"), ]),
"se" = se_cont(misinfo_manipulation_ind$manipulation_rate, na.rm = TRUE))
aux_treatments <- misinfo_manipulation_ind %>%
filter(treatments == 1) %>%
group_by(treatments) %>%
summarise(condition = "Treatments",
mean = mean(manipulation_rate, na.rm = TRUE),
nobs = n(),
se = se_cont(manipulation_rate, na.rm = TRUE))
table2_misinfo_manipulation <- bind_rows(aux_all,aux_treatments,table2_misinfo_manipulation)
# "NonMisinfo Manipulation"
nonmisinfo_manipulation_ind <- df_completed_long %>%
filter(post_contains_nonmisinfo == 1) %>%
group_by(anon_id, condition, treatments) %>%
summarise(manipulation_rate = mean(manipulative, na.rm = TRUE), nobs = n(),
se = se_cont(manipulative, na.rm = TRUE))
table2_nonmisinfo_manipulation <- nonmisinfo_manipulation_ind %>%
group_by(condition) %>%
summarise(mean = mean(manipulation_rate, na.rm = TRUE), nobs = n(),
se = se_cont(manipulation_rate, na.rm = TRUE)) %>%
filter(!is.na(condition))
aux_all <- data.frame("condition" = "All",
"mean" = mean(nonmisinfo_manipulation_ind$manipulation_rate, na.rm = TRUE),
"nobs" = nrow(nonmisinfo_manipulation_ind[nonmisinfo_manipulation_ind$condition %in% c("placebo", "Long Baseline", "SMS"), ]),
"se" = se_cont(nonmisinfo_manipulation_ind$manipulation_rate, na.rm = TRUE))
aux_treatments <- nonmisinfo_manipulation_ind %>%
filter(treatments == 1) %>%
group_by(treatments) %>%
summarise(condition = "Treatments",
mean = mean(manipulation_rate, na.rm = TRUE),
nobs = n(),
se = se_cont(manipulation_rate, na.rm = TRUE))
table2_nonmisinfo_manipulation <- bind_rows(aux_all,aux_treatments,table2_nonmisinfo_manipulation)
# "Discernment"
manipulation_disc_ind <- df_completed_long %>%
group_by(anon_id, condition, treatments) %>%
summarise(disc_rate = mean(disc_manipulation, na.rm = TRUE), nobs = n(),
se = se_cont(disc_manipulation, na.rm = TRUE))
table2_disc_manipulation <- manipulation_disc_ind %>%
group_by(condition) %>%
summarise(mean = mean(disc_rate, na.rm = TRUE), nobs = n(),
se = se_cont(disc_rate, na.rm = TRUE)) %>%
filter(!is.na(condition))
aux_all <- data.frame("condition" = "All",
"mean" = mean(manipulation_disc_ind$disc_rate, na.rm = TRUE),
"nobs" = nrow(manipulation_disc_ind[manipulation_disc_ind$condition %in% c("placebo", "Long Baseline", "SMS"), ]),
"se" = se_cont(manipulation_disc_ind$disc_rate, na.rm = TRUE))
aux_treatments <- manipulation_disc_ind %>%
filter(treatments == 1) %>%
group_by(treatments) %>%
summarise(condition = "Treatments",
mean = mean(disc_rate, na.rm = TRUE),nobs = n(),
se = se_cont(disc_rate, na.rm = TRUE))
table2_disc_manipulation <- bind_rows(aux_all,aux_treatments,table2_disc_manipulation)
# Final table
table2 <- data.frame("condition" = table2_misinfo_sharing$condition,
"# of Obs" = table2_misinfo_sharing$nobs,
"Misinfo Sharing" = table2_misinfo_sharing$mean,
"Non-misinfo Sharing" = table2_nonmisinfo_sharing$mean,
"Discernment Sharing" = table2_disc_sharing$mean,
"Misinfo Reliability" = table2_misinfo_reliability$mean,
"Non-misinfo Reliability" = table2_nonmisinfo_reliability$mean,
"Discernment Reliability" = table2_disc_reliability$mean,
"Misinfo Manipulation" = table2_misinfo_manipulation$mean,
"Non-misinfo Manipulation" = table2_nonmisinfo_manipulation$mean,
"Discernment Manipulation" = table2_disc_manipulation$mean)
table2_se <- data.frame("condition" = table2_misinfo_sharing$condition,
"# of Obs" = table2_misinfo_sharing$nobs,
"Misinfo Sharing" = table2_misinfo_sharing$se,
"Non-misinfo Sharing" = table2_nonmisinfo_sharing$se,
"Discernment Sharing" = table2_disc_sharing$se,
"Misinfo Reliability" = table2_misinfo_reliability$se,
"Non-misinfo Reliability" = table2_nonmisinfo_reliability$se,
"Discernment Reliability" = table2_disc_reliability$se,
"Misinfo Manipulation" = table2_misinfo_manipulation$se,
"Non-misinfo Manipulation" = table2_nonmisinfo_manipulation$se,
"Discernment Manipulation" = table2_disc_manipulation$se)
# add parentheses to s.e.
#order_vector <- c("20,689", " 6,078", "14,611", " 5,083", " 4,904"," 4,624")
#table2_se <- table2_se[order(match(table2_se$X..of.Obs, order_vector)), ]
#table2_se$order <- seq(nrow(table2_se))
table2_se <- table2_se %>% mutate(order = ifelse(grepl("All",condition), 1,
ifelse(grepl("placebo",condition), 2,
ifelse(grepl("Treatments",condition), 3,
ifelse(grepl("Long Baseline",condition), 4,
ifelse(grepl("SMS",condition), 5,
NA))))))
table2_se$priority <- 2
table2_se$X..of.Obs <- NA
table2_se[,1:11] <- data.frame(lapply(table2_se[,1:11], function(col) sapply(col, add_parentheses)))
# format table 2
table2 <- data.frame(lapply(table2, function(x) format(x, big.mark = ",", digits = 3)))
#table2 <- table2[order(match(table2$X..of.Obs, order_vector)), ]
#table2$order <- seq(nrow(table2))
table2 <- table2 %>% mutate(order = ifelse(grepl("All",condition), 1,
ifelse(grepl("placebo",condition), 2,
ifelse(grepl("Treatments",condition), 3,
ifelse(grepl("Long Baseline",condition), 4,
ifelse(grepl("SMS",condition), 5,
NA))))))
table2$priority <- 1
# bind rows
names(table2) <- c("condition","# of Obs", "Misinfo Sharing", "Non-misinfo Sharing","Discernment Sharing", "Misinfo Reliability", "Non-misinfo Reliability","Discernment Reliability","Misinfo Manipulation", "Non-misinfo Manipulation","Discernment Manipulation","order","priority")
names(table2_se) <- names(table2)
table <- rbind(table2,table2_se)
table <- table[order(table$order, table$priority), ]
table <- table %>% select(-c(order,priority))
table <- table %>% select(-c(condition))
# remove NA
table <- data.frame(lapply(table, function(x) {
x <- as.character(x)
x[is.na(x)] <- ""
gsub("NA", "", x)
}))
names(table) <- names(table2)[2:11]
rownames(table) <- c("All","\u200B","Placebo","\u200B\u200B","Treatments","\u200B\u200B\u200B","Long Baseline","\u200B\u200B\u200B\u200B","Text Course","\u200B\u200B\u200B\u200B\u200B\u200B")
kable(table,format = "markdown", digits = 3) %>%
kable_styling(bootstrap_options = c("striped", "hover")) %>%
scroll_box(width = "100%", height = "500px")
# of Obs | Misinfo Sharing | Non-misinfo Sharing | Discernment Sharing | Misinfo Reliability | Non-misinfo Reliability | Discernment Reliability | Misinfo Manipulation | Non-misinfo Manipulation | Discernment Manipulation | |
---|---|---|---|---|---|---|---|---|---|---|
All | 30,974 | 0.516 | 0.707 | 0.183 | 2.54 | 3.06 | 0.306 | 3.03 | 2.55 | 1.70 |
| (0.0022) | (0.0021) | (0.0023) | (0.0072) | (0.0078) | (0.0062) | (0.0075) | (0.0078) | (0.0072) | |
Placebo | 5,072 | 0.637 | 0.759 | 0.246 | 2.77 | 3.12 | 0.198 | 2.91 | 2.53 | 1.60 |
| (0.0053) | (0.0050) | (0.0034) | (0.0184) | (0.0200) | (0.0162) | (0.0188) | (0.0195) | (0.0184) | |
Treatments | 25,902 | 0.493 | 0.697 | 0.197 | 2.50 | 3.04 | 0.327 | 3.06 | 2.56 | 1.72 |
| (0.0024) | (0.0023) | (0.0026) | (0.0078) | (0.0085) | (0.0066) | (0.0082) | (0.0085) | (0.0078) | |
Long Baseline | 10,579 | 0.639 | 0.771 | 0.127 | 2.86 | 3.25 | 0.243 | 2.97 | 2.60 | 1.61 |
| (0.0037) | (0.0033) | (0.0037) | (0.0127) | (0.0134) | (0.0111) | (0.0129) | (0.0136) | (0.0123) | |
Text Course | 15,323 | 0.392 | 0.646 | 0.114 | 2.26 | 2.90 | 0.384 | 3.12 | 2.52 | 1.79 |
| (0.0030) | (0.0031) | (0.0054) | (0.0093) | (0.0108) | (0.0082) | (0.0106) | (0.0108) | (0.0101) |
# make it tex code
table$col_1 <- c("All","\u200B","Placebo","\u200B\u200B","Treatments","\u200B\u200B\u200B","Long Baseline","\u200B\u200B\u200B\u200B","Text Course","\u200B\u200B\u200B\u200B\u200B\u200B")
table <- table[, c("col_1", setdiff(names(table), "col_1"))]
latex_code <- paste0(apply(table, 1, function(row) paste(row, collapse = " & ")), " \\\\\n")
latex_code <- c(paste("","\\# of Obs", "Misinfo Sharing", "Non-misinfo Sharing","Discernment Sharing", "Misinfo Reliability", "Non-misinfo Reliability","Discernment Reliability","Misinfo Manipulation", "Non-misinfo Manipulation","Discernment Manipulation \\\\ \\hline ", sep = " & "),latex_code)
latex_code <- gsub("\\[\\d+\\]", "", latex_code)
#save.tex
write.table(latex_code, file = "table2.tex", sep = " & ", quote = FALSE, row.names = FALSE, col.names = FALSE)