The goal of this document is to explore the performance of the conditional misinfo outcome for the inoculation against misinformation project.
In this script, I take as input the intermediate data set and output a table of standard errors, each averaged over 100 bootstrap samples at a given sample size.
I test sample sizes from 7,500 to 10,000 (our full sample is 8,684, and the subset of users who shared at least one non-misinfo post in the pre-survey is 7,688). In no case are the standard errors smaller for the conditional misinfo sharing rate than for the (standard) post-misinfo sharing rate, either in levels or as a proportion of the Facts Baseline mean.
After reflecting, I believe the inherent issue is that we observe behavior on 3 posts for the conditional outcome and 6 posts for the standard outcome: doubling the number of observations per user reduces the variance by more than eliminating a source of noise (topic interest) does.
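To see the intuition, here is a minimal simulation sketch. All parameters (the 0.45 base rate and the 0.3 topic-noise scale) are made up for illustration, not estimated from our data: with moderate topic-interest noise on the logit scale, the mean of a 6-post rate can still have a smaller standard error than the mean of a noise-free 3-post rate.
# Illustrative only: made-up parameters, not estimates from the study data
set.seed(1)
n <- 7688  # size of the conditional sample
sim_se <- function(k, topic_sd) {
  # per-user sharing probability, with optional topic-interest noise (logit scale)
  p_i <- plogis(qlogis(0.45) + rnorm(n, sd = topic_sd))
  rates <- rbinom(n, size = k, prob = p_i) / k  # per-user rate over k posts
  sd(rates) / sqrt(n)                           # SE of the sample mean rate
}
sim_se(k = 3, topic_sd = 0)    # conditional outcome: topic noise removed, 3 posts
sim_se(k = 6, topic_sd = 0.3)  # standard outcome: topic noise kept, 6 posts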
The goal of this experiment is to study the effect of different text message courses on misinformation sharing and discernment. The figure below diagrams the experimental design. We recruited participants through Facebook ads to complete a five-day text message course, plus a pre- and post-survey, for a mobile airtime payment. Participants who clicked an ad were directed to the pre-survey on Qualtrics, where they were first randomized on whether they would see an accuracy nudge in the pre- and post-survey. Conditional on completing the pre-survey, participants were randomized into one of the text message course interventions and enrolled in the course. Participants in all course interventions, except the No-course baseline, received one text message a day for five days starting on the day they completed the pre-survey. Participants in the No-course baseline received the Combo course after the post-survey to ensure we fulfilled our recruitment promise of a text message course. On the last day of the course, participants received a link to complete the post-survey on Qualtrics. Participants who completed the post-survey were paid KSH 500 (about US$4) in mobile airtime. Seven to eleven weeks later, participants who completed the post-survey were randomized on whether they would see a prime in the text message recruiting them to a follow-up survey. Participants who completed the follow-up survey were paid KSH 350 (about US$3).
knitr::include_graphics("./figures/Experimental Design.png")
Experimental Design
The pre- and post-surveys each contained 10 posts: 6 misinfo posts, 3 non-misinfo posts, and 1 attention check post.
These posts were created by taking 15 facts/domains and creating a non-misinfo version and 3 misinfo versions (emotions, reasoning, combo) of each, for a total of 60 posts (plus 2 attention check posts). Participants saw each of the 15 facts at least once.
Whichever 3 facts/domains participants saw in the form of non-misinfo in the pre-survey, they saw in the form of misinfo in the post-survey, so these 3 facts/domains were repeated. Which facts/domains were repeated varied by participant.
For each post, participants were asked 2 questions: whether they would share it and how they rated its accuracy (the sharing and accuracy outcomes in the table below).
This script uses:

- `intermediate_data_wide.csv`: all data at the user level. Includes treatment conditions, covariates, and user-level aggregated outcomes. Sample of 8,684 participants who completed the pre- and post-survey.
For more information about how this dataset is generated, and for a data dictionary, see the memo.
This script primarily uses the sample of 7,688 participants who completed the post-survey and shared at least one non-misinfo post in the pre-survey, meaning that the conditional misinfo outcome is defined.
I do not use covariates in this memo, and focus only on the conditional misinfo sharing rate and the post-misinfo sharing rate.
| Research Question: Do the treatment interventions… | Outcome Measures | Estimands | Notes |
|---|---|---|---|
| **Sharing Outcomes** | | | |
| change misinfo sharing? | conditional misinfo sharing rate | See equation below. | Defined only in post. |
| | count of misinfo posts shared \(Y^M\) | \(\frac{E[Y^{M, post}(1)]}{6} - \frac{E[Y^{M, post}(0)]}{6}\), \((\frac{E[Y^{M, post}(1)]}{6} - \frac{E[Y^{M, pre}(1)]}{6}) - (\frac{E[Y^{M, post}(0)]}{6} - \frac{E[Y^{M, pre}(0)]}{6})\) | post and post-pre misinfo sharing rate |
| change non-misinfo sharing? | count of non-misinfo posts shared \(Y^N\) | \(\frac{E[Y^{N, post}(1)]}{3} - \frac{E[Y^{N, post}(0)]}{3}\), \((\frac{E[Y^{N, post}(1)]}{3} - \frac{E[Y^{N, pre}(1)]}{3}) - (\frac{E[Y^{N, post}(0)]}{3} - \frac{E[Y^{N, pre}(0)]}{3})\) | post and post-pre non-misinfo sharing rate |
| change sharing discernment? | count of misinfo posts shared \(Y^M\), count of non-misinfo posts shared \(Y^N\) | \((\frac{E[Y^{N, post}(1)]}{3} - \frac{E[Y^{M, post}(1)]}{6}) - (\frac{E[Y^{N, post}(0)]}{3} - \frac{E[Y^{M, post}(0)]}{6})\) | post sharing discernment |
| **Accuracy Outcomes** | | | |
| change accuracy scoring of misinfo posts? | total score on misinfo posts \(A^{M_1}\) | \(\frac{E[A^{M_1, post}(1)]}{6} - \frac{E[A^{M_1, post}(0)]}{6}\), \((\frac{E[A^{M_1, post}(1)]}{6} - \frac{E[A^{M_1, pre}(1)]}{6}) - (\frac{E[A^{M_1, post}(0)]}{6} - \frac{E[A^{M_1, pre}(0)]}{6})\) | post and post-pre misinfo accuracy score |
| | count of misinfo posts with accuracy score > 0 \(A^{M_2}\) | \(\frac{E[A^{M_2, post}(1)]}{6} - \frac{E[A^{M_2, post}(0)]}{6}\), \((\frac{E[A^{M_2, post}(1)]}{6} - \frac{E[A^{M_2, pre}(1)]}{6}) - (\frac{E[A^{M_2, post}(0)]}{6} - \frac{E[A^{M_2, pre}(0)]}{6})\) | post and post-pre misinfo binarized accuracy score |
| change accuracy scoring of non-misinfo posts? | total score on non-misinfo posts \(A^{N_1}\) | \(\frac{E[A^{N_1, post}(1)]}{3} - \frac{E[A^{N_1, post}(0)]}{3}\), \((\frac{E[A^{N_1, post}(1)]}{3} - \frac{E[A^{N_1, pre}(1)]}{3}) - (\frac{E[A^{N_1, post}(0)]}{3} - \frac{E[A^{N_1, pre}(0)]}{3})\) | post and post-pre non-misinfo accuracy score |
| | count of non-misinfo posts with accuracy score > 0 \(A^{N_2}\) | \(\frac{E[A^{N_2, post}(1)]}{3} - \frac{E[A^{N_2, post}(0)]}{3}\), \((\frac{E[A^{N_2, post}(1)]}{3} - \frac{E[A^{N_2, pre}(1)]}{3}) - (\frac{E[A^{N_2, post}(0)]}{3} - \frac{E[A^{N_2, pre}(0)]}{3})\) | post and post-pre non-misinfo binarized accuracy score |
| change accuracy discernment? | total score on misinfo posts \(A^{M_1}\), total score on non-misinfo posts \(A^{N_1}\) | \((\frac{E[A^{N_1, post}(1)]}{3} - \frac{E[A^{M_1, post}(1)]}{6}) - (\frac{E[A^{N_1, post}(0)]}{3} - \frac{E[A^{M_1, post}(0)]}{6})\) | post accuracy discernment |
knitr::include_graphics("./figures/primary_sharing_equation.png")
# Set CRAN mirror
options(repos = c(CRAN = "https://cran.rstudio.com"))
packages <- c(
  "tidyverse", "data.table", "dtplyr", "rlang", "kableExtra", "haven",
  "ggcorrplot", "visdat", "VIM", "corrplot", "fastDummies", "causalTree",
  "grf", "rpart", "glmnet", "splines", "MASS", "lmtest", "sandwich",
  "ggplot2", "stringr", "estimatr", "gridExtra", "repr"
)
# Load the packages, install if necessary
# (note: causalTree is not on CRAN; if missing, install it with
# remotes::install_github("susanathey/causalTree"))
new_packages <- packages[!(packages %in% installed.packages()[, "Package"])]
if (length(new_packages)) install.packages(new_packages, dependencies = TRUE)
lapply(packages, require, character.only = TRUE) |> invisible()
Here I define a function to compute the standard error of a variable after dropping missing values, a bootstrap sampling function, and a function that computes standard errors for every bootstrap sample in a list.
# Define the function to calculate standard error after removing missing values
calculate_standard_error <- function(df, variable) {
# Calculate the standard error
std_error <- sd(df[[variable]], na.rm = TRUE) / sqrt(sum(!is.na(df[[variable]])))
std_error <- round(std_error, 5)
return(std_error)
}
# Define the bootstrap sampling function
bootstrap_sample <- function(df, size) {
sample_df <- df %>% sample_n(size, replace = TRUE)
return(sample_df)
}
# Create a function to get standard errors for all bootstrap samples in a list
get_standard_errors <- function(samples, variable) {
# Calculate the standard error for each sample in the list
std_errors <- map_dbl(samples, ~ calculate_standard_error(.x, variable))
return(std_errors)
}
# Define bootstrap sizes
bootstrap_sizes <- c(7500, 7774, 8000, 8200, 8684, 9000, 10000)
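# Quick toy check of the helpers above on a built-in dataset (illustrative only;
# mtcars has 32 rows, so bootstrap_sample(mtcars, 32) resamples at full size)
set.seed(3)
calculate_standard_error(mtcars, "mpg")  # sd(mpg) / sqrt(32), rounded to 5 dp
toy_samples <- replicate(20, bootstrap_sample(mtcars, 32), simplify = FALSE)
mean(get_standard_errors(toy_samples, "mpg"))  # average SE across the resamples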
I also report sample sizes before and after restricting to the subset of participants who shared at least one non-misinfo post in the pre-survey (the sample on which I bootstrap), along with the standard errors of each outcome of interest: the conditional misinfo sharing rate and the post-misinfo sharing rate.
df_wide <- read.csv("../intermediate_outcomes/intermediate_data_wide.csv")
# compute the number of rows in each dataset
nrow_wide <- nrow(df_wide)
# restrict df_wide to participants with base_rate_pre > 0
df_conditional <- df_wide %>%
filter(base_rate_pre > 0)
# compute the number of rows in new dataset
nrow_conditional <- nrow(df_conditional)
# calculate standard error on each data set and outcome of interest
se_full_cond_share <- calculate_standard_error(df_wide, "pre_spec_outcome_rates")
se_full_misinfo <- calculate_standard_error(df_wide, "misinfo_post")
se_cond_cond_share <- calculate_standard_error(df_conditional, "pre_spec_outcome_rates")
se_cond_misinfo <- calculate_standard_error(df_conditional, "misinfo_post")
# print the standard error of pre_spec_outcome_rates and misinfo_post for each of df_wide and df_conditional
paste("The standard error of the conditional misinfo sharing rate on the full dataset of ", nrow_wide, "users is ", se_full_cond_share, ", which matches the 0.0046 reported in the paper.")
## [1] "The standard error of the conditional misinfo sharing rate on the full dataset of 8684 users is 0.00464 , which matches the 0.0046 reported in the paper."
paste("The standard error of the misinfo sharing rate on the full dataset of ", nrow_wide, "users is ", se_full_misinfo, ", which matches the 0.0037 reported in the paper.")
## [1] "The standard error of the misinfo sharing rate on the full dataset of 8684 users is 0.00367 , which matches the 0.0037 reported in the paper."
paste("The standard error of the conditional misinfo sharing rate on the subset of ", nrow_conditional, "users who shared at least one non-misinfo post in the pre-survey is ", se_cond_cond_share, ".")
## [1] "The standard error of the conditional misinfo sharing rate on the subset of 7688 users who shared at least one non-misinfo post in the pre-survey is 0.00464 ."
paste("The standard error of the misinfo sharing rate on the subset of ", nrow_conditional, "users who shared at least one non-misinfo post in the pre-survey is ", se_cond_misinfo, ".")
## [1] "The standard error of the misinfo sharing rate on the subset of 7688 users who shared at least one non-misinfo post in the pre-survey is 0.00382 ."
# Create a table with rows for df_wide and df_conditional, and columns that show the mean and standard error for each outcome (pre_spec_outcome_rates and misinfo_post)
summary_stats <- data.frame(
dataset = c("df_wide", "df_conditional"),
pre_spec_outcome_rates_mean = c(mean(df_wide$pre_spec_outcome_rates, na.rm = TRUE), mean(df_conditional$pre_spec_outcome_rates, na.rm = TRUE)),
pre_spec_outcome_rates_se = c(calculate_standard_error(df_wide, "pre_spec_outcome_rates"), calculate_standard_error(df_conditional, "pre_spec_outcome_rates")),
misinfo_post_mean = c(mean(df_wide$misinfo_post, na.rm = TRUE), mean(df_conditional$misinfo_post, na.rm = TRUE)),
misinfo_post_se = c(calculate_standard_error(df_wide, "misinfo_post"), calculate_standard_error(df_conditional, "misinfo_post"))
)
print(summary_stats)
## dataset pre_spec_outcome_rates_mean pre_spec_outcome_rates_se
## 1 df_wide 0.51485 0.00464
## 2 df_conditional 0.51485 0.00464
## misinfo_post_mean misinfo_post_se
## 1 0.4259174 0.00367
## 2 0.4614117 0.00382
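Since kableExtra is already loaded, the same summary can also be rendered as a formatted table in the knitted document (an optional sketch):
# Optional: formatted rendering of summary_stats in the knitted output
summary_stats %>%
  kbl(digits = 5) %>%
  kable_styling(full_width = FALSE)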
Take 100 samples for each bootstrap size from the subset of participants who shared at least one non-misinfo post in the pre-survey. Calculate the average standard error for each bootstrap size and each outcome.
| bootstrap size | avg. SE: conditional misinfo rate (pre-spec) | avg. SE: misinfo rate | avg. SE ÷ Facts Baseline mean: conditional | avg. SE ÷ Facts Baseline mean: misinfo |
|---|---|---|---|---|
| 7500 | 0.0046931 | 0.0038621 | 0.0083308 | 0.0076517 |
| 7774 | 0.0046107 | 0.0037947 | 0.0081846 | 0.0075182 |
| 8000 | 0.0045484 | 0.0037446 | 0.0080740 | 0.0074189 |
| 8200 | 0.0044906 | 0.0036940 | 0.0079714 | 0.0073187 |
| 8684 | 0.0043609 | 0.0035885 | 0.0077411 | 0.0071097 |
| 9000 | 0.0042838 | 0.0035276 | 0.0076043 | 0.0069890 |
| 10000 | 0.0040664 | 0.0033460 | 0.0072184 | 0.0066292 |
# Reproduce the procedure above: draw 100 bootstrap samples (with replacement)
# at each size from df_conditional, compute the standard error of each outcome
# in every sample, and average over the 100 samples per size.
set.seed(123)  # for reproducibility
bootstrap_samples <- lapply(bootstrap_sizes, function(size) {
  replicate(100, bootstrap_sample(df_conditional, size), simplify = FALSE)
})
# Average standard errors by bootstrap size for both outcomes. The
# "proportion to baseline" columns in the table divide these averages by the
# Facts Baseline arm's outcome means (not recomputed here, since this memo
# does not use the treatment assignment columns).
table_standard_errors <- map_dfr(seq_along(bootstrap_sizes), function(i) {
  samples <- bootstrap_samples[[i]]
  data.frame(
    bootstrap_size = bootstrap_sizes[i],
    average_standard_error_pre_spec = mean(get_standard_errors(samples, "pre_spec_outcome_rates")),
    average_standard_error_misinfo  = mean(get_standard_errors(samples, "misinfo_post"))
  )
})
# Print the result
print(table_standard_errors)