Goal

This document provides comprehensive summary statistics for the inoculation-against-misinformation project, superseding analyses conducted before April 2024.

Experimental Design

The goal of this experiment is to study the effect of different text message courses on misinformation sharing and discernment. The figure below diagrams the experimental design. We recruited participants through Facebook ads to complete a five-day text message course, plus a pre- and post-survey, in exchange for a mobile airtime payment. Participants who clicked an ad were directed to the pre-survey on Qualtrics, where they were first randomized on whether they would see an accuracy nudge in the pre- and post-survey. Conditional on completing the pre-survey, participants were randomized into one of the text message course interventions and enrolled in the course. Participants in all course interventions except the No-course baseline received one text message a day for five days, starting on the day they completed the pre-survey. Participants in the No-course baseline received the Combo course after the post-survey, so that we still fulfilled our recruitment promise of a text message course. On the last day of the course, participants received a link to the post-survey on Qualtrics; those who completed it were paid KSH 500 (about USD 4) in mobile airtime. Seven to eleven weeks later, post-survey completers were randomized on whether the text message recruiting them to a follow-up survey contained a prime. Participants who completed the follow-up survey were paid KSH 350 (about USD 3).

knitr::include_graphics("./figures/Experimental Design.png")
Experimental Design

Experimental Design
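
As a compact reference for the cross-randomization in the figure, the sketch below enumerates the design cells. The course and accuracy-nudge labels match the output tables later in this document; the prime arm labels are placeholders, since this section never names them.

# a minimal sketch of the cross-randomized cells; prime labels are assumed
library(tidyr)

arms <- crossing(
  accuracy  = c("Accuracy After", "Accuracy Inter"),      # accuracy nudge placement
  treatment = c("Combo", "Emotions", "Reasoning",
                "Facts Baseline", "No-course Baseline"),  # course interventions
  prime     = c("Prime", "No Prime")                      # follow-up recruiting text (assumed labels)
)
nrow(arms) # 2 x 5 x 2 = 20 cells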

The pre- and post-surveys each contained 10 posts: 6 misinfo posts, 3 non-misinfo posts, and 1 attention check post.

These posts were created by taking 15 facts/domains and writing, for each, a non-misinfo version and 3 misinfo versions (emotions, reasoning, combo), for a total of 60 posts (plus the 2 attention check posts). Participants saw each of the 15 facts at least once.

Whichever 3 facts/domains a participant saw in non-misinfo form in the pre-survey, they saw in misinfo form in the post-survey, so these 3 facts/domains were repeated. Which facts/domains were repeated varied by participant.

For each post, participants were asked 2 questions: whether they would share the post, and how accurate they judged it to be.
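
The counting above can be verified with a few lines of arithmetic (a quick sketch, not project code):

# consistency check of the post counts described above
n_facts <- 15
n_versions <- 4                       # non-misinfo + emotions + reasoning + combo
n_facts * n_versions                  # 60 posts, plus 2 attention check posts
posts_per_survey <- 6 + 3             # 6 misinfo + 3 non-misinfo (attention check excluded)
2 * posts_per_survey                  # 18 scored user-post observations per participant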

Data

In this script, we use two datasets.

intermediate_data_wide.csv: all data at the user level. Includes treatment conditions, covariates, and individual-aggregated outcomes. Sample of 8,684 participants who completed the pre- and post-survey.

intermediate_data_long.csv: all data at the user-post level. Includes treatment conditions, covariates, and post-level outcomes. Same sample of 8,684 participants who completed the pre- and post-survey, with 18 user-post rows per participant (follow-up rows are dropped in this script).

For more information about how these datasets are generated, and for a data dictionary, see the memo.

Samples

This script primarily uses the sample of 8,684 participants who completed the post-survey.

For analyses involving the outcome referred to as the “Primary Misinfo Sharing Rate,” we drop participants who did not share any non-misinfo posts in the pre-survey. This leaves a sample of 7,688 post-survey completers who intended to share at least 1 non-misinfo post in the pre-survey.
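
A minimal sketch of this restriction, assuming base_rate_pre (defined in the variable lists below) holds the pre-survey non-misinfo sharing rate:

# drop participants who shared no non-misinfo posts in the pre-survey
# (a sketch; assumes base_rate_pre is the pre non-misinfo sharing rate)
df_primary <- df_wide |> filter(base_rate_pre > 0)
nrow(df_primary) # should equal 7,688 if that assumption holds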

Outcomes and Covariates

All Outcomes

| Research Question: Do the treatment interventions… | Outcome Measures | Estimands | Notes |
|---|---|---|---|
| Sharing Outcomes | | | |
| change misinfo sharing? | primary misinfo sharing rate | See equation below. | Defined only in post. |
| | count of misinfo posts shared \(Y^M\) | \(\frac{E[Y^{M, post}(1)]}{6} - \frac{E[Y^{M, post}(0)]}{6}\), \(\left(\frac{E[Y^{M, post}(1)]}{6} - \frac{E[Y^{M, pre}(1)]}{6}\right) - \left(\frac{E[Y^{M, post}(0)]}{6} - \frac{E[Y^{M, pre}(0)]}{6}\right)\) | post and post-pre misinfo sharing rate |
| change non-misinfo sharing? | count of non-misinfo posts shared \(Y^N\) | \(\frac{E[Y^{N, post}(1)]}{3} - \frac{E[Y^{N, post}(0)]}{3}\), \(\left(\frac{E[Y^{N, post}(1)]}{3} - \frac{E[Y^{N, pre}(1)]}{3}\right) - \left(\frac{E[Y^{N, post}(0)]}{3} - \frac{E[Y^{N, pre}(0)]}{3}\right)\) | post and post-pre non-misinfo sharing rate |
| change sharing discernment? | count of misinfo posts shared \(Y^M\), count of non-misinfo posts shared \(Y^N\) | \(\left(\frac{E[Y^{N, post}(1)]}{3} - \frac{E[Y^{M, post}(1)]}{6}\right) - \left(\frac{E[Y^{N, post}(0)]}{3} - \frac{E[Y^{M, post}(0)]}{6}\right)\) | post sharing discernment |
| Accuracy Outcomes | | | |
| change accuracy scoring of misinfo posts? | total score on misinfo posts \(A^{M_1}\) | \(\frac{E[A^{M_1, post}(1)]}{6} - \frac{E[A^{M_1, post}(0)]}{6}\), \(\left(\frac{E[A^{M_1, post}(1)]}{6} - \frac{E[A^{M_1, pre}(1)]}{6}\right) - \left(\frac{E[A^{M_1, post}(0)]}{6} - \frac{E[A^{M_1, pre}(0)]}{6}\right)\) | post and post-pre misinfo accuracy score |
| | count of misinfo posts with accuracy score > 0 \(A^{M_2}\) | \(\frac{E[A^{M_2, post}(1)]}{6} - \frac{E[A^{M_2, post}(0)]}{6}\), \(\left(\frac{E[A^{M_2, post}(1)]}{6} - \frac{E[A^{M_2, pre}(1)]}{6}\right) - \left(\frac{E[A^{M_2, post}(0)]}{6} - \frac{E[A^{M_2, pre}(0)]}{6}\right)\) | post and post-pre misinfo binarized accuracy score |
| change accuracy scoring of non-misinfo posts? | total score on non-misinfo posts \(A^{N_1}\) | \(\frac{E[A^{N_1, post}(1)]}{3} - \frac{E[A^{N_1, post}(0)]}{3}\), \(\left(\frac{E[A^{N_1, post}(1)]}{3} - \frac{E[A^{N_1, pre}(1)]}{3}\right) - \left(\frac{E[A^{N_1, post}(0)]}{3} - \frac{E[A^{N_1, pre}(0)]}{3}\right)\) | post and post-pre non-misinfo accuracy score |
| | count of non-misinfo posts with accuracy score > 0 \(A^{N_2}\) | \(\frac{E[A^{N_2, post}(1)]}{3} - \frac{E[A^{N_2, post}(0)]}{3}\), \(\left(\frac{E[A^{N_2, post}(1)]}{3} - \frac{E[A^{N_2, pre}(1)]}{3}\right) - \left(\frac{E[A^{N_2, post}(0)]}{3} - \frac{E[A^{N_2, pre}(0)]}{3}\right)\) | post and post-pre non-misinfo binarized accuracy score |
| change accuracy discernment? | total score on misinfo posts \(A^{M_1}\), total score on non-misinfo posts \(A^{N_1}\) | \(\left(\frac{E[A^{N_1, post}(1)]}{3} - \frac{E[A^{M_1, post}(1)]}{6}\right) - \left(\frac{E[A^{N_1, post}(0)]}{3} - \frac{E[A^{M_1, post}(0)]}{6}\right)\) | post accuracy discernment |
knitr::include_graphics("./figures/primary_sharing_equation.png")

All Covariates

Demographics

knitr::include_graphics("./figures/demographics_table.png")
Demographics Definitions

Demographics Definitions

Pre-treatment variables

| Description | Variable Construction |
|---|---|
| Pre misinfo sharing rate | \(\frac{Y^{M, pre}}{6}\) |
| Pre non-misinfo sharing rate | \(\frac{Y^{N, pre}}{3}\) |
| Pre sharing discernment | \(\frac{Y^{N, pre}}{3} - \frac{Y^{M, pre}}{6}\) |
| Pre misinfo accuracy score | \(\frac{A^{M_1, pre}}{6}\) |
| Pre non-misinfo accuracy score | \(\frac{A^{N_1, pre}}{3}\) |
| Pre accuracy discernment | \(\frac{A^{N_1, pre}}{3} - \frac{A^{M_1, pre}}{6}\) |
| Pre attention check passing | \(1[\text{passed attention check}]\) |
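
These constructions imply an identity worth checking: each discernment variable should equal the non-misinfo rate minus the misinfo rate. A sketch, assuming the base_rate_*, misinfo_*, and new_disc_* columns used below hold those quantities:

# sanity check (a sketch): discernment = non-misinfo rate - misinfo rate
stopifnot(all.equal(df_wide$new_disc_pre,  df_wide$base_rate_pre  - df_wide$misinfo_pre))
stopifnot(all.equal(df_wide$new_disc_post, df_wide$base_rate_post - df_wide$misinfo_post))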

Setup

Load packages

packages = c(
  "tidyverse", "data.table", "dtplyr", "rlang", "kableExtra", "haven", "ggcorrplot", "visdat", "VIM", "corrplot", "kableExtra", "fastDummies"
)

# Load the packages, install if necessary
new_packages = packages[!(packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages, dependencies = TRUE)
lapply(packages, require, character.only = TRUE) |> invisible()

Load data

df_wide <- read.csv("../intermediate_outcomes/intermediate_data_wide.csv")
df_long <- read.csv("../intermediate_outcomes/intermediate_data_long.csv") %>% filter(pre_post != "followup") # keep pre and post rows only

# compute the number of rows in each dataset
nrow_wide <- nrow(df_wide)
nrow_long <- nrow(df_long)

# report the row counts and check the long data against the expected 18 user-post rows per participant
paste("There are", nrow_wide, "rows in the wide dataset. There are", nrow_long, "rows in the long dataset, which is equal to the", nrow_wide*18, "expected for 18 user-post observations for each", nrow_wide, "user in the wide dataset.")
## [1] "There are 8684 rows in the wide dataset. There are 156312 rows in the long dataset, which is equal to the 156312 expected for 18 user-post observations for each 8684 user in the wide dataset."

Create functions

Standard error functions

# function to compute se for continuous variables
se_cont = function(x, na.rm=FALSE) {
  if (na.rm) x = na.omit(x)
  sqrt(var(x)/length(x))}

# function to compute se for binary variables
se_binary = function(x, na.rm=FALSE) {
  if (na.rm) x = na.omit(x)
  sqrt(mean(x)*(1-mean(x))/length(x))}
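
Note the two formulas differ slightly: se_cont uses the unbiased sample variance (n − 1 denominator via var()), while se_binary uses the plug-in \(\sqrt{\hat{p}(1-\hat{p})/n}\). Illustrative usage:

# illustrative calls: any numeric column works for se_cont, any 0/1 column for se_binary
se_cont(df_wide$age, na.rm = TRUE)
se_binary(df_wide$gender_Man, na.rm = TRUE)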

Summary statistics for continuous variables function

# function to generate summary statistics for continuous variable
summary_stats_continous <- function(dt, variable, group = NULL){
  # Get descriptive statistics for a single continuous variable
  variable_name = variable 
  variable = sym(variable)

  if(!is.null(group)){
    group_name = group
    group = sym(group)
    dt <- dt |> group_by({{group}})
  } # grouping if group variable is selected

  tab = dt |> lazy_dt() |>
    mutate(
      `Number_Missing` = sum(is.na({{variable}})),  # calculate missing values
    ) |>
    filter(!is.na({{variable}})) |>  # filtering out the missing values for the variable
    summarise(
        Total_Observations = n(),
        Number_Missing = first(`Number_Missing`),
        Mean = as.numeric(mean({{variable}})),
        SD = as.numeric(sd({{variable}})),
        SE = as.numeric(sd({{variable}}) / sqrt(n())),
        Min = as.numeric(min({{variable}})),
        Max =  as.numeric(max({{variable}})),
        Decile_10 = as.numeric(quantile({{variable}}, probs = 0.1)),
        Decile_20 = as.numeric(quantile({{variable}}, probs = 0.2)),
        Decile_30 = as.numeric(quantile({{variable}}, probs = 0.3)),
        Decile_40 = as.numeric(quantile({{variable}}, probs = 0.4)),
        Decile_50 = as.numeric(quantile({{variable}}, probs = 0.5)),
        Decile_60 = as.numeric(quantile({{variable}}, probs = 0.6)),
        Decile_70 = as.numeric(quantile({{variable}}, probs = 0.7)),
        Decile_80 = as.numeric(quantile({{variable}}, probs = 0.8)),
        Decile_90 = as.numeric(quantile({{variable}}, probs = 0.9))
      ) |> # compute summary statistics 
      mutate(
        Proportion_Missing = Number_Missing / Total_Observations,
        Variable = variable_name
      ) |> # adds a column indicating the proportion of missing values relative to the total observations.
      collect()

  if (is.null(group)){
    tab <- tab |> select(Variable, everything())
  } else {
    tab <- tab |> mutate(Group = paste(group_name, ":", {{group}})) |> select(Variable, Group, everything()) |> select(-{{group}})
  }

  return(tab)
}

summary_stats_table_continous <- function(dt, variables, group = NULL,
  filename = NULL, print_html = TRUE, caption = NULL
  ){
  # Get descriptive statistics for a list of continuous variables and return a combined table
  results <- lapply(variables, function(x) summary_stats_continous(dt, x, group)) # apply the summary_stats_continous function to each variable in the list
  results <- bind_rows(results) # bind the list of dataframes into a single dataframe

  if(!is.null(filename)){
    write.csv(results, paste0("./tables/", filename)) # save as csv file
  }

  if (print_html){
    results = results |> 
      kable(digits = 3,caption = caption) |>
      kable_styling(bootstrap_options = c("striped", "hover")) |>
      scroll_box( height = "500px") # generate html table 
  }
  
  return(results)
}

Summary statistics for binary variables function

# Define the function to create summary statistics for binary variables
summary_stats_binary <- function(data, variables, group = NULL) {
  combined_results <- data.frame(
    Variable = character(),
    Group = character(),
    Frequency = integer(),
    Proportion = numeric(),
    stringsAsFactors = FALSE
  ) # preset a dataframe
  
  for (variable in variables) {
    if (!is.null(group)) {
      # Grouped calculations
      summary_df <- data %>%
        group_by_at(vars(c(group, variable))) %>%
        summarise(Frequency = sum(!!sym(variable)), .groups = 'drop') %>%
        mutate(Proportion = Frequency / sum(Frequency)) %>%
        ungroup() %>%
        mutate(Variable = variable,
               Group = ifelse(is.na(.data[[group]]), "Missing", as.character(.data[[group]]))) %>% 
        select(Variable, Group, Frequency, Proportion)
    } else {
      # Ungrouped calculations
      summary_df <- data %>%
        summarise(
          Frequency = sum(!!sym(variable)),
          Proportion = sum(!!sym(variable)) / nrow(data),
          .groups = 'drop' # Avoid regrouping warning
        ) %>%
        ungroup() %>%
        mutate(Variable = variable) %>% # Add variable name
        select(Variable, everything())
    }
    
    # Combine the result with the overall results
    combined_results <- bind_rows(combined_results, summary_df)
  }
  
  return(combined_results)
}

summary_stats_tab_binary <- function(dt, variables, group = NULL,
                                    filename = NULL, print_html = TRUE, caption = NULL
) {
  
  results <- lapply(variables, function(x) summary_stats_binary(dt, x, group))
  results <- bind_rows(results) 
  
  if(!is.null(filename)){
    write.csv(results, paste0("./tables/", filename)) # save as csv file
  }
  
  if (print_html){
    results = results |> 
      kable(digits = 3,caption = caption) |>
      kable_styling(bootstrap_options = c("striped", "hover")) |>
      scroll_box( height = "500px") # generate html table 
  }
  
  return(results)
}

Function for generating probability distribution

Density plot

create_density_plot <- function(data, x_var, group_var = NULL, x_min = NULL, x_max = NULL,
                                title, x_label, y_label, file_basename) {
  # aes_string() is deprecated in current ggplot2; map columns via the .data
  # pronoun instead, handling the ungrouped case (group_var = NULL) explicitly
  mapping <- if (is.null(group_var)) {
    aes(x = .data[[x_var]])
  } else {
    aes(x = .data[[x_var]], group = .data[[group_var]], fill = .data[[group_var]])
  }

  plt <- ggplot(data, mapping) +
    geom_density(alpha = 0.7, color = "black") +
    labs(title = title, x = x_label, y = y_label) +
    scale_x_continuous(limits = c(x_min, x_max)) +
    theme_minimal() +
    guides(fill = guide_legend(title = NULL))

  print(plt)

  # save the plot to JPG
  ggsave(paste0("./generated_figures/", file_basename, ".jpg"), plt, device = "jpg", width = 10, height = 8)
}

Histogram

create_histogram_grouped <- function(data, x_var, group_var = NULL, bin_width = NULL,
                                     title, x_label, y_label, line_color = "black", file_basename) {
  # aes_string() is deprecated in current ggplot2; use the .data pronoun,
  # handling the ungrouped case (group_var = NULL) explicitly
  mapping <- if (is.null(group_var)) {
    aes(x = .data[[x_var]])
  } else {
    aes(x = .data[[x_var]], fill = .data[[group_var]])
  }

  # generate the histogram
  plt <- ggplot(data, mapping) +
    geom_histogram(binwidth = bin_width, color = line_color, position = "dodge", alpha = 0.5) +
    labs(title = title, x = x_label, y = y_label) +
    theme_minimal() +
    guides(fill = guide_legend(title = NULL))

  # print the plot
  print(plt)

  # save the plot to JPG
  ggsave(paste0("./generated_figures/", file_basename, ".jpg"), plt, device = "jpg", width = 10, height = 8)
}

Create variables lists

# list of continuous covariates and outcomes
# (base_rate_* = non-misinfo sharing rate, misinfo_* = misinfo sharing rate,
#  new_disc_* = sharing discernment, *_diff = post minus pre)
continuous_var_list <- c(
  "age", "social_media_hours", "base_rate_pre", "misinfo_pre", "new_disc_pre", "base_rate_post", "base_rate_diff", "misinfo_post", "misinfo_diff", "new_disc_post"
)

# list of binary variables
binary_var_list <- c(
  "att_check_pre", "att_check_post", "gender_Man", "education_High_school_or_less", "education_Some_college", "education_Bachelor_degree", "education_Graduate_degree", "marital_Married_or_in_a_domestic_partnership", "employment_Employed", "employment_Unemployed", "employment_Student", "location_Mostly_urban", "location_Suburban", "location_Mostly_rural", "religion_Christian", "religiosity_Attends", "social_media_bin_Yes", "social_media_share_80_100", "social_media_share_60_80", "social_media_share_40_60", "social_media_share_20_40", "social_media_share_0_20"
)
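
The binary demographic columns follow the variable_Value naming that fastDummies (loaded above) produces. A sketch of how such dummies are typically generated upstream, using a hypothetical toy education column whose categories are already snake_cased:

# illustrative only: expand a categorical column into 0/1 dummy columns
toy <- data.frame(education = c("High_school_or_less", "Some_college", "Bachelor_degree"))
fastDummies::dummy_cols(toy, select_columns = "education")
# yields education_High_school_or_less, education_Some_college, education_Bachelor_degree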

# list of variables that need histograms
histogram_var_list <- c(
  "base_rate_pre", "misinfo_pre", "new_disc_pre", "base_rate_post", "base_rate_diff", "misinfo_post", "misinfo_diff", "new_disc_post"
)

# list of variables that need density plots
pdf_var_list <- c(
  "age", "social_media_hours"
)

# list of demographic variables
demographic_var_list <- c(
  "gender_Man", "education_High_school_or_less", "education_Some_college", "education_Bachelor_degree", "education_Graduate_degree", "marital_Married_or_in_a_domestic_partnership", "employment_Employed", "employment_Unemployed", "employment_Student", "location_Mostly_urban", "location_Surburban", "location_Mostly_rural", "religion_Christian", "religiosity_Attends", "social_media_bin_Yes", "social_media_share_80_100", "social_media_share_60_80", "social_media_share_40_60", "social_media_share_20_40", "social_media_share_0_20"
)

#list of pre-treatment variables
pre_treat_var_list <- c(
  "base_rate_pre", "misinfo_pre", "new_disc_pre", "att_check_pre"
)


# list of outcomes
outcome_list <- c(
  "base_rate_post", "base_rate_diff", "misinfo_post", "misinfo_diff", "new_disc_post"
)

Full Sample Summary Statistics

Continuous variables

summary_stats_table_continous(df_wide,
  variables = continuous_var_list,
  caption = "Summary Statistics for Continuous Variables",
  print_html = TRUE
  )
Summary Statistics for Continuous Variables
| Variable | Total_Observations | Number_Missing | Mean | SD | SE | Min | Max | Decile_10 | Decile_20 | Decile_30 | Decile_40 | Decile_50 | Decile_60 | Decile_70 | Decile_80 | Decile_90 | Proportion_Missing |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| age | 8684 | 0 | 26.397 | 7.592 | 0.081 | 1 | 150 | 20.000 | 21.000 | 22.000 | 23.000 | 25.000 | 26.000 | 28.000 | 30.000 | 35.000 | 0 |
| social_media_hours | 8684 | 0 | 5.779 | 3.713 | 0.040 | 0 | 24 | 2.000 | 3.000 | 4.000 | 4.000 | 5.000 | 6.000 | 6.000 | 8.000 | 10.000 | 0 |
| base_rate_pre | 8684 | 0 | 0.693 | 0.343 | 0.004 | 0 | 1 | 0.000 | 0.333 | 0.667 | 0.667 | 0.667 | 1.000 | 1.000 | 1.000 | 1.000 | 0 |
| misinfo_pre | 8684 | 0 | 0.546 | 0.331 | 0.004 | 0 | 1 | 0.000 | 0.167 | 0.333 | 0.500 | 0.500 | 0.667 | 0.833 | 0.833 | 1.000 | 0 |
| new_disc_pre | 8684 | 0 | 0.147 | 0.300 | 0.003 | -1 | 1 | -0.167 | 0.000 | 0.000 | 0.000 | 0.167 | 0.167 | 0.333 | 0.333 | 0.500 | 0 |
| base_rate_post | 8684 | 0 | 0.574 | 0.377 | 0.004 | 0 | 1 | 0.000 | 0.000 | 0.333 | 0.333 | 0.667 | 0.667 | 1.000 | 1.000 | 1.000 | 0 |
| base_rate_diff | 8684 | 0 | -0.119 | 0.394 | 0.004 | -1 | 1 | -0.667 | -0.333 | -0.333 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.333 | 0 |
| misinfo_post | 8684 | 0 | 0.426 | 0.342 | 0.004 | 0 | 1 | 0.000 | 0.000 | 0.167 | 0.333 | 0.333 | 0.500 | 0.667 | 0.833 | 1.000 | 0 |
| misinfo_diff | 8684 | 0 | -0.120 | 0.332 | 0.004 | -1 | 1 | -0.500 | -0.333 | -0.333 | -0.167 | 0.000 | 0.000 | 0.000 | 0.167 | 0.333 | 0 |
| new_disc_post | 8684 | 0 | 0.148 | 0.302 | 0.003 | -1 | 1 | -0.167 | 0.000 | 0.000 | 0.000 | 0.000 | 0.167 | 0.333 | 0.333 | 0.500 | 0 |
# loop over a set of variables to generate density plots
for (var in pdf_var_list) {
  create_density_plot(df_wide, var, NULL, NULL, NULL, paste("Density Plot of", var), var, "Density", paste0("density_plot_", var))
}

# loop over a set of variables to generate histograms
for (var in histogram_var_list) {
  create_histogram_grouped(df_wide, var, NULL, NULL, paste("Histogram of", var), var, "Frequency", line_color = "black", paste0("histogram_", var))
}

Binary variables

summary_stats_tab_binary(df_wide,  
  variables = binary_var_list,
  caption = "Summary Statistics for Binary Variables",
  print_html = TRUE
  )
Summary Statistics for Binary Variables

| Variable | Group | Frequency | Proportion |
|---|---|---|---|
| att_check_pre | NA | 4187 | 0.482 |
| att_check_post | NA | 4809 | 0.554 |
| gender_Man | NA | 5679 | 0.654 |
| education_High_school_or_less | NA | 2356 | 0.271 |
| education_Some_college | NA | 3087 | 0.355 |
| education_Bachelor_degree | NA | 3131 | 0.361 |
| education_Graduate_degree | NA | 110 | 0.013 |
| marital_Married_or_in_a_domestic_partnership | NA | 2827 | 0.326 |
| employment_Employed | NA | 2823 | 0.325 |
| employment_Unemployed | NA | 3483 | 0.401 |
| employment_Student | NA | 2378 | 0.274 |
| location_Mostly_urban | NA | 2475 | 0.285 |
| location_Suburban | NA | 3346 | 0.385 |
| location_Mostly_rural | NA | 2863 | 0.330 |
| religion_Christian | NA | 8179 | 0.942 |
| religiosity_Attends | NA | 8217 | 0.946 |
| social_media_bin_Yes | NA | 8626 | 0.993 |
| social_media_share_80_100 | NA | 733 | 0.084 |
| social_media_share_60_80 | NA | 2023 | 0.233 |
| social_media_share_40_60 | NA | 2660 | 0.306 |
| social_media_share_20_40 | NA | 1737 | 0.200 |
| social_media_share_0_20 | NA | 1531 | 0.176 |

Outcomes by Groups & Time

By Treatment Group

summary_stats_table_continous(df_wide,
  variables = outcome_list,
  group = "treatment",
  caption = "Summary Statistics for Outcomes by Treatment Group",
  print_html = TRUE)
Summary Statistics for Outcomes by Treatment Group
| Variable | Group | Total_Observations | Number_Missing | Mean | SD | SE | Min | Max | Decile_10 | Decile_20 | Decile_30 | Decile_40 | Decile_50 | Decile_60 | Decile_70 | Decile_80 | Decile_90 | Proportion_Missing |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| base_rate_post | treatment : Combo | 1740 | 0 | 0.522 | 0.379 | 0.009 | 0.000 | 1 | 0.000 | 0.000 | 0.333 | 0.333 | 0.667 | 0.667 | 0.667 | 1.000 | 1.000 | 0 |
| base_rate_post | treatment : Emotions | 1836 | 0 | 0.519 | 0.375 | 0.009 | 0.000 | 1 | 0.000 | 0.000 | 0.333 | 0.333 | 0.667 | 0.667 | 0.667 | 1.000 | 1.000 | 0 |
| base_rate_post | treatment : Facts Baseline | 1790 | 0 | 0.607 | 0.379 | 0.009 | 0.000 | 1 | 0.000 | 0.333 | 0.333 | 0.667 | 0.667 | 0.667 | 1.000 | 1.000 | 1.000 | 0 |
| base_rate_post | treatment : No-course Baseline | 1628 | 0 | 0.680 | 0.354 | 0.009 | 0.000 | 1 | 0.000 | 0.333 | 0.667 | 0.667 | 0.667 | 1.000 | 1.000 | 1.000 | 1.000 | 0 |
| base_rate_post | treatment : Reasoning | 1690 | 0 | 0.550 | 0.374 | 0.009 | 0.000 | 1 | 0.000 | 0.000 | 0.333 | 0.333 | 0.667 | 0.667 | 0.667 | 1.000 | 1.000 | 0 |
| base_rate_diff | treatment : Combo | 1740 | 0 | -0.164 | 0.398 | 0.010 | -1.000 | 1 | -0.667 | -0.333 | -0.333 | -0.333 | 0.000 | 0.000 | 0.000 | 0.000 | 0.333 | 0 |
| base_rate_diff | treatment : Emotions | 1836 | 0 | -0.170 | 0.418 | 0.010 | -1.000 | 1 | -0.667 | -0.667 | -0.333 | -0.333 | 0.000 | 0.000 | 0.000 | 0.000 | 0.333 | 0 |
| base_rate_diff | treatment : Facts Baseline | 1790 | 0 | -0.076 | 0.378 | 0.009 | -1.000 | 1 | -0.667 | -0.333 | -0.333 | 0.000 | 0.000 | 0.000 | 0.000 | 0.333 | 0.333 | 0 |
| base_rate_diff | treatment : No-course Baseline | 1628 | 0 | -0.024 | 0.359 | 0.009 | -1.000 | 1 | -0.333 | -0.333 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.333 | 0.333 | 0 |
| base_rate_diff | treatment : Reasoning | 1690 | 0 | -0.153 | 0.390 | 0.009 | -1.000 | 1 | -0.667 | -0.333 | -0.333 | -0.333 | 0.000 | 0.000 | 0.000 | 0.000 | 0.333 | 0 |
| misinfo_post | treatment : Combo | 1740 | 0 | 0.371 | 0.336 | 0.008 | 0.000 | 1 | 0.000 | 0.000 | 0.167 | 0.167 | 0.333 | 0.333 | 0.500 | 0.667 | 1.000 | 0 |
| misinfo_post | treatment : Emotions | 1836 | 0 | 0.353 | 0.318 | 0.007 | 0.000 | 1 | 0.000 | 0.000 | 0.167 | 0.167 | 0.333 | 0.333 | 0.500 | 0.667 | 0.833 | 0 |
| misinfo_post | treatment : Facts Baseline | 1790 | 0 | 0.465 | 0.352 | 0.008 | 0.000 | 1 | 0.000 | 0.167 | 0.167 | 0.333 | 0.500 | 0.500 | 0.667 | 0.833 | 1.000 | 0 |
| misinfo_post | treatment : No-course Baseline | 1628 | 0 | 0.546 | 0.334 | 0.008 | 0.000 | 1 | 0.000 | 0.167 | 0.333 | 0.500 | 0.500 | 0.667 | 0.833 | 0.833 | 1.000 | 0 |
| misinfo_post | treatment : Reasoning | 1690 | 0 | 0.404 | 0.335 | 0.008 | 0.000 | 1 | 0.000 | 0.000 | 0.167 | 0.167 | 0.333 | 0.500 | 0.667 | 0.667 | 1.000 | 0 |
| misinfo_diff | treatment : Combo | 1740 | 0 | -0.175 | 0.335 | 0.008 | -1.000 | 1 | -0.667 | -0.500 | -0.333 | -0.167 | -0.167 | 0.000 | 0.000 | 0.000 | 0.167 | 0 |
| misinfo_diff | treatment : Emotions | 1836 | 0 | -0.188 | 0.345 | 0.008 | -1.000 | 1 | -0.667 | -0.500 | -0.333 | -0.167 | -0.167 | 0.000 | 0.000 | 0.000 | 0.167 | 0 |
| misinfo_diff | treatment : Facts Baseline | 1790 | 0 | -0.074 | 0.316 | 0.007 | -1.000 | 1 | -0.500 | -0.333 | -0.167 | -0.167 | 0.000 | 0.000 | 0.000 | 0.167 | 0.333 | 0 |
| misinfo_diff | treatment : No-course Baseline | 1628 | 0 | -0.019 | 0.301 | 0.007 | -1.000 | 1 | -0.333 | -0.167 | -0.167 | 0.000 | 0.000 | 0.000 | 0.167 | 0.167 | 0.333 | 0 |
| misinfo_diff | treatment : Reasoning | 1690 | 0 | -0.136 | 0.328 | 0.008 | -1.000 | 1 | -0.667 | -0.333 | -0.333 | -0.167 | -0.167 | 0.000 | 0.000 | 0.167 | 0.167 | 0 |
| new_disc_post | treatment : Combo | 1740 | 0 | 0.151 | 0.304 | 0.007 | -1.000 | 1 | -0.167 | 0.000 | 0.000 | 0.000 | 0.000 | 0.167 | 0.333 | 0.333 | 0.500 | 0 |
| new_disc_post | treatment : Emotions | 1836 | 0 | 0.166 | 0.304 | 0.007 | -0.833 | 1 | -0.167 | 0.000 | 0.000 | 0.000 | 0.167 | 0.167 | 0.333 | 0.500 | 0.667 | 0 |
| new_disc_post | treatment : Facts Baseline | 1790 | 0 | 0.142 | 0.297 | 0.007 | -1.000 | 1 | -0.167 | 0.000 | 0.000 | 0.000 | 0.000 | 0.167 | 0.333 | 0.333 | 0.500 | 0 |
| new_disc_post | treatment : No-course Baseline | 1628 | 0 | 0.134 | 0.291 | 0.007 | -0.833 | 1 | -0.167 | 0.000 | 0.000 | 0.000 | 0.000 | 0.167 | 0.333 | 0.333 | 0.500 | 0 |
| new_disc_post | treatment : Reasoning | 1690 | 0 | 0.146 | 0.310 | 0.008 | -0.667 | 1 | -0.167 | 0.000 | 0.000 | 0.000 | 0.083 | 0.167 | 0.333 | 0.333 | 0.500 | 0 |
# loop over the outcomes to generate histograms by treatment group;
# the filename suffix keeps these from overwriting the full-sample histograms

for (var in outcome_list) {
    create_histogram_grouped(df_wide, var, "treatment", NULL, paste("Histogram of", var, "by Treatment Group"), var, "Frequency", line_color = "black", paste0("histogram_", var, "_by_treatment"))
}

By Accuracy Group

summary_stats_table_continous(df_wide,
  variables = outcome_list,
  group = "accuracy",
  caption = "Summary Statistics for Outcomes by Accuracy Group",
  print_html = TRUE)
Summary Statistics for Outcomes by Accuracy Group
| Variable | Group | Total_Observations | Number_Missing | Mean | SD | SE | Min | Max | Decile_10 | Decile_20 | Decile_30 | Decile_40 | Decile_50 | Decile_60 | Decile_70 | Decile_80 | Decile_90 | Proportion_Missing |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| base_rate_post | accuracy : Accuracy After | 4322 | 0 | 0.615 | 0.368 | 0.006 | 0 | 1 | 0.000 | 0.333 | 0.333 | 0.667 | 0.667 | 0.667 | 1.000 | 1.000 | 1.000 | 0 |
| base_rate_post | accuracy : Accuracy Inter | 4362 | 0 | 0.533 | 0.383 | 0.006 | 0 | 1 | 0.000 | 0.000 | 0.333 | 0.333 | 0.667 | 0.667 | 0.667 | 1.000 | 1.000 | 0 |
| base_rate_diff | accuracy : Accuracy After | 4322 | 0 | -0.127 | 0.393 | 0.006 | -1 | 1 | -0.667 | -0.333 | -0.333 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.333 | 0 |
| base_rate_diff | accuracy : Accuracy Inter | 4362 | 0 | -0.111 | 0.395 | 0.006 | -1 | 1 | -0.667 | -0.333 | -0.333 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.333 | 0 |
| misinfo_post | accuracy : Accuracy After | 4322 | 0 | 0.454 | 0.337 | 0.005 | 0 | 1 | 0.000 | 0.167 | 0.167 | 0.333 | 0.500 | 0.500 | 0.667 | 0.833 | 1.000 | 0 |
| misinfo_post | accuracy : Accuracy Inter | 4362 | 0 | 0.398 | 0.344 | 0.005 | 0 | 1 | 0.000 | 0.000 | 0.167 | 0.167 | 0.333 | 0.500 | 0.500 | 0.833 | 1.000 | 0 |
| misinfo_diff | accuracy : Accuracy After | 4322 | 0 | -0.125 | 0.328 | 0.005 | -1 | 1 | -0.500 | -0.333 | -0.333 | -0.167 | -0.167 | 0.000 | 0.000 | 0.167 | 0.333 | 0 |
| misinfo_diff | accuracy : Accuracy Inter | 4362 | 0 | -0.115 | 0.335 | 0.005 | -1 | 1 | -0.500 | -0.333 | -0.167 | -0.167 | 0.000 | 0.000 | 0.000 | 0.167 | 0.333 | 0 |
| new_disc_post | accuracy : Accuracy After | 4322 | 0 | 0.161 | 0.307 | 0.005 | -1 | 1 | -0.167 | 0.000 | 0.000 | 0.000 | 0.167 | 0.167 | 0.333 | 0.500 | 0.500 | 0 |
| new_disc_post | accuracy : Accuracy Inter | 4362 | 0 | 0.135 | 0.296 | 0.004 | -1 | 1 | -0.167 | 0.000 | 0.000 | 0.000 | 0.000 | 0.167 | 0.333 | 0.333 | 0.500 | 0 |
# loop over the outcomes to generate histograms by accuracy group;
# the filename suffix keeps these from overwriting earlier histograms

for (var in outcome_list) {
    create_histogram_grouped(df_wide, var, "accuracy", NULL, paste("Histogram of", var, "by Accuracy Group"), var, "Frequency", line_color = "black", paste0("histogram_", var, "_by_accuracy"))
}

By Accuracy Group and Treatment Group

## Create a treatment accuracy group variable

df_wide <- df_wide |> mutate(treat_accuracy = paste(treatment, accuracy, sep = "_"))
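
A quick look at the resulting cell sizes (illustrative):

# cross-tab of course arm by accuracy-nudge arm
table(df_wide$treatment, df_wide$accuracy)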

summary_stats_table_continous(df_wide,
  variables = outcome_list,
  group = "treat_accuracy",
  caption = "Summary Statistics for Outcomes by Accuracy Group and Treatment Group",
  print_html = TRUE)
Summary Statistics for Outcomes by Accuracy Group and Treatment Group
| Variable | Group | Total_Observations | Number_Missing | Mean | SD | SE | Min | Max | Decile_10 | Decile_20 | Decile_30 | Decile_40 | Decile_50 | Decile_60 | Decile_70 | Decile_80 | Decile_90 | Proportion_Missing |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| base_rate_post | treat_accuracy : Combo_Accuracy After | 859 | 0 | 0.559 | 0.377 | 0.013 | 0.000 | 1.000 | 0.000 | 0.000 | 0.333 | 0.333 | 0.667 | 0.667 | 1.000 | 1.000 | 1.000 | 0 |
| base_rate_post | treat_accuracy : Combo_Accuracy Inter | 881 | 0 | 0.486 | 0.379 | 0.013 | 0.000 | 1.000 | 0.000 | 0.000 | 0.333 | 0.333 | 0.333 | 0.667 | 0.667 | 1.000 | 1.000 | 0 |
| base_rate_post | treat_accuracy : Emotions_Accuracy After | 894 | 0 | 0.563 | 0.369 | 0.012 | 0.000 | 1.000 | 0.000 | 0.333 | 0.333 | 0.333 | 0.667 | 0.667 | 1.000 | 1.000 | 1.000 | 0 |
| base_rate_post | treat_accuracy : Emotions_Accuracy Inter | 942 | 0 | 0.477 | 0.375 | 0.012 | 0.000 | 1.000 | 0.000 | 0.000 | 0.333 | 0.333 | 0.333 | 0.667 | 0.667 | 1.000 | 1.000 | 0 |
| base_rate_post | treat_accuracy : Facts Baseline_Accuracy After | 886 | 0 | 0.657 | 0.364 | 0.012 | 0.000 | 1.000 | 0.000 | 0.333 | 0.333 | 0.667 | 0.667 | 1.000 | 1.000 | 1.000 | 1.000 | 0 |
| base_rate_post | treat_accuracy : Facts Baseline_Accuracy Inter | 904 | 0 | 0.558 | 0.387 | 0.013 | 0.000 | 1.000 | 0.000 | 0.000 | 0.333 | 0.333 | 0.667 | 0.667 | 1.000 | 1.000 | 1.000 | 0 |
| base_rate_post | treat_accuracy : No-course Baseline_Accuracy After | 827 | 0 | 0.719 | 0.332 | 0.012 | 0.000 | 1.000 | 0.333 | 0.333 | 0.667 | 0.667 | 0.667 | 1.000 | 1.000 | 1.000 | 1.000 | 0 |
| base_rate_post | treat_accuracy : No-course Baseline_Accuracy Inter | 801 | 0 | 0.639 | 0.371 | 0.013 | 0.000 | 1.000 | 0.000 | 0.333 | 0.333 | 0.667 | 0.667 | 1.000 | 1.000 | 1.000 | 1.000 | 0 |
| base_rate_post | treat_accuracy : Reasoning_Accuracy After | 856 | 0 | 0.582 | 0.367 | 0.013 | 0.000 | 1.000 | 0.000 | 0.333 | 0.333 | 0.667 | 0.667 | 0.667 | 1.000 | 1.000 | 1.000 | 0 |
| base_rate_post | treat_accuracy : Reasoning_Accuracy Inter | 834 | 0 | 0.518 | 0.379 | 0.013 | 0.000 | 1.000 | 0.000 | 0.000 | 0.333 | 0.333 | 0.667 | 0.667 | 0.667 | 1.000 | 1.000 | 0 |
| base_rate_diff | treat_accuracy : Combo_Accuracy After | 859 | 0 | -0.166 | 0.401 | 0.014 | -1.000 | 1.000 | -0.667 | -0.667 | -0.333 | -0.333 | 0.000 | 0.000 | 0.000 | 0.000 | 0.333 | 0 |
| base_rate_diff | treat_accuracy : Combo_Accuracy Inter | 881 | 0 | -0.162 | 0.395 | 0.013 | -1.000 | 1.000 | -0.667 | -0.333 | -0.333 | -0.333 | 0.000 | 0.000 | 0.000 | 0.000 | 0.333 | 0 |
| base_rate_diff | treat_accuracy : Emotions_Accuracy After | 894 | 0 | -0.175 | 0.423 | 0.014 | -1.000 | 1.000 | -0.667 | -0.667 | -0.333 | -0.333 | 0.000 | 0.000 | 0.000 | 0.000 | 0.333 | 0 |
| base_rate_diff | treat_accuracy : Emotions_Accuracy Inter | 942 | 0 | -0.165 | 0.413 | 0.013 | -1.000 | 1.000 | -0.667 | -0.667 | -0.333 | -0.333 | 0.000 | 0.000 | 0.000 | 0.000 | 0.333 | 0 |
| base_rate_diff | treat_accuracy : Facts Baseline_Accuracy After | 886 | 0 | -0.082 | 0.364 | 0.012 | -1.000 | 1.000 | -0.667 | -0.333 | -0.333 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.333 | 0 |
| base_rate_diff | treat_accuracy : Facts Baseline_Accuracy Inter | 904 | 0 | -0.070 | 0.391 | 0.013 | -1.000 | 1.000 | -0.667 | -0.333 | -0.333 | 0.000 | 0.000 | 0.000 | 0.000 | 0.333 | 0.333 | 0 |
| base_rate_diff | treat_accuracy : No-course Baseline_Accuracy After | 827 | 0 | -0.037 | 0.354 | 0.012 | -1.000 | 1.000 | -0.333 | -0.333 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.333 | 0.333 | 0 |
| base_rate_diff | treat_accuracy : No-course Baseline_Accuracy Inter | 801 | 0 | -0.012 | 0.364 | 0.013 | -1.000 | 1.000 | -0.333 | -0.333 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.333 | 0.333 | 0 |
| base_rate_diff | treat_accuracy : Reasoning_Accuracy After | 856 | 0 | -0.170 | 0.396 | 0.014 | -1.000 | 1.000 | -0.667 | -0.333 | -0.333 | -0.333 | 0.000 | 0.000 | 0.000 | 0.000 | 0.333 | 0 |
| base_rate_diff | treat_accuracy : Reasoning_Accuracy Inter | 834 | 0 | -0.136 | 0.384 | 0.013 | -1.000 | 1.000 | -0.667 | -0.333 | -0.333 | -0.333 | 0.000 | 0.000 | 0.000 | 0.000 | 0.333 | 0 |
| misinfo_post | treat_accuracy : Combo_Accuracy After | 859 | 0 | 0.387 | 0.327 | 0.011 | 0.000 | 1.000 | 0.000 | 0.000 | 0.167 | 0.167 | 0.333 | 0.500 | 0.500 | 0.667 | 0.867 | 0 |
| misinfo_post | treat_accuracy : Combo_Accuracy Inter | 881 | 0 | 0.356 | 0.344 | 0.012 | 0.000 | 1.000 | 0.000 | 0.000 | 0.167 | 0.167 | 0.333 | 0.333 | 0.500 | 0.667 | 1.000 | 0 |
| misinfo_post | treat_accuracy : Emotions_Accuracy After | 894 | 0 | 0.381 | 0.318 | 0.011 | 0.000 | 1.000 | 0.000 | 0.000 | 0.167 | 0.167 | 0.333 | 0.500 | 0.500 | 0.667 | 0.833 | 0 |
| misinfo_post | treat_accuracy : Emotions_Accuracy Inter | 942 | 0 | 0.326 | 0.316 | 0.010 | 0.000 | 1.000 | 0.000 | 0.000 | 0.167 | 0.167 | 0.333 | 0.333 | 0.500 | 0.667 | 0.833 | 0 |
| misinfo_post | treat_accuracy : Facts Baseline_Accuracy After | 886 | 0 | 0.504 | 0.345 | 0.012 | 0.000 | 1.000 | 0.000 | 0.167 | 0.333 | 0.333 | 0.500 | 0.667 | 0.667 | 0.833 | 1.000 | 0 |
| misinfo_post | treat_accuracy : Facts Baseline_Accuracy Inter | 904 | 0 | 0.427 | 0.354 | 0.012 | 0.000 | 1.000 | 0.000 | 0.000 | 0.167 | 0.333 | 0.333 | 0.500 | 0.667 | 0.833 | 1.000 | 0 |
| misinfo_post | treat_accuracy : No-course Baseline_Accuracy After | 827 | 0 | 0.578 | 0.320 | 0.011 | 0.000 | 1.000 | 0.167 | 0.333 | 0.333 | 0.500 | 0.667 | 0.667 | 0.833 | 0.833 | 1.000 | 0 |
| misinfo_post | treat_accuracy : No-course Baseline_Accuracy Inter | 801 | 0 | 0.513 | 0.345 | 0.012 | 0.000 | 1.000 | 0.000 | 0.167 | 0.333 | 0.333 | 0.500 | 0.667 | 0.667 | 1.000 | 1.000 | 0 |
| misinfo_post | treat_accuracy : Reasoning_Accuracy After | 856 | 0 | 0.427 | 0.334 | 0.011 | 0.000 | 1.000 | 0.000 | 0.000 | 0.167 | 0.333 | 0.333 | 0.500 | 0.667 | 0.833 | 1.000 | 0 |
| misinfo_post | treat_accuracy : Reasoning_Accuracy Inter | 834 | 0 | 0.380 | 0.335 | 0.012 | 0.000 | 1.000 | 0.000 | 0.000 | 0.167 | 0.167 | 0.333 | 0.333 | 0.500 | 0.667 | 1.000 | 0 |
| misinfo_diff | treat_accuracy : Combo_Accuracy After | 859 | 0 | -0.187 | 0.331 | 0.011 | -1.000 | 0.833 | -0.667 | -0.500 | -0.333 | -0.167 | -0.167 | 0.000 | 0.000 | 0.000 | 0.167 | 0 |
| misinfo_diff | treat_accuracy : Combo_Accuracy Inter | 881 | 0 | -0.163 | 0.339 | 0.011 | -1.000 | 1.000 | -0.667 | -0.500 | -0.333 | -0.167 | -0.167 | 0.000 | 0.000 | 0.000 | 0.167 | 0 |
| misinfo_diff | treat_accuracy : Emotions_Accuracy After | 894 | 0 | -0.182 | 0.344 | 0.011 | -1.000 | 0.833 | -0.667 | -0.500 | -0.333 | -0.167 | -0.167 | 0.000 | 0.000 | 0.000 | 0.167 | 0 |
| misinfo_diff | treat_accuracy : Emotions_Accuracy Inter | 942 | 0 | -0.193 | 0.347 | 0.011 | -1.000 | 1.000 | -0.667 | -0.500 | -0.333 | -0.167 | -0.167 | 0.000 | 0.000 | 0.000 | 0.167 | 0 |
| misinfo_diff | treat_accuracy : Facts Baseline_Accuracy After | 886 | 0 | -0.078 | 0.309 | 0.010 | -1.000 | 1.000 | -0.500 | -0.333 | -0.167 | -0.167 | 0.000 | 0.000 | 0.000 | 0.167 | 0.333 | 0 |
| misinfo_diff | treat_accuracy : Facts Baseline_Accuracy Inter | 904 | 0 | -0.070 | 0.322 | 0.011 | -1.000 | 1.000 | -0.500 | -0.333 | -0.167 | -0.167 | 0.000 | 0.000 | 0.000 | 0.167 | 0.333 | 0 |
| misinfo_diff | treat_accuracy : No-course Baseline_Accuracy After | 827 | 0 | -0.026 | 0.298 | 0.010 | -1.000 | 1.000 | -0.333 | -0.167 | -0.167 | 0.000 | 0.000 | 0.000 | 0.167 | 0.167 | 0.333 | 0 |
| misinfo_diff | treat_accuracy : No-course Baseline_Accuracy Inter | 801 | 0 | -0.011 | 0.303 | 0.011 | -1.000 | 1.000 | -0.333 | -0.167 | -0.167 | 0.000 | 0.000 | 0.000 | 0.167 | 0.167 | 0.333 | 0 |
| misinfo_diff | treat_accuracy : Reasoning_Accuracy After | 856 | 0 | -0.149 | 0.328 | 0.011 | -1.000 | 0.833 | -0.667 | -0.333 | -0.333 | -0.167 | -0.167 | 0.000 | 0.000 | 0.167 | 0.167 | 0 |
| misinfo_diff | treat_accuracy : Reasoning_Accuracy Inter | 834 | 0 | -0.123 | 0.327 | 0.011 | -1.000 | 1.000 | -0.500 | -0.333 | -0.167 | -0.167 | 0.000 | 0.000 | 0.000 | 0.000 | 0.167 | 0 |
| new_disc_post | treat_accuracy : Combo_Accuracy After | 859 | 0 | 0.172 | 0.315 | 0.011 | -1.000 | 1.000 | -0.167 | 0.000 | 0.000 | 0.000 | 0.167 | 0.167 | 0.333 | 0.500 | 0.667 | 0 |
| new_disc_post | treat_accuracy : Combo_Accuracy Inter | 881 | 0 | 0.130 | 0.292 | 0.010 | -0.833 | 1.000 | -0.167 | 0.000 | 0.000 | 0.000 | 0.000 | 0.167 | 0.333 | 0.333 | 0.500 | 0 |
| new_disc_post | treat_accuracy : Emotions_Accuracy After | 894 | 0 | 0.182 | 0.315 | 0.011 | -0.833 | 1.000 | -0.167 | 0.000 | 0.000 | 0.000 | 0.167 | 0.167 | 0.333 | 0.500 | 0.667 | 0 |
| new_disc_post | treat_accuracy : Emotions_Accuracy Inter | 942 | 0 | 0.151 | 0.293 | 0.010 | -0.667 | 1.000 | -0.167 | 0.000 | 0.000 | 0.000 | 0.000 | 0.167 | 0.333 | 0.333 | 0.500 | 0 |
| new_disc_post | treat_accuracy : Facts Baseline_Accuracy After | 886 | 0 | 0.154 | 0.293 | 0.010 | -1.000 | 1.000 | -0.167 | 0.000 | 0.000 | 0.000 | 0.167 | 0.167 | 0.333 | 0.333 | 0.500 | 0 |
| new_disc_post | treat_accuracy : Facts Baseline_Accuracy Inter | 904 | 0 | 0.131 | 0.300 | 0.010 | -1.000 | 1.000 | -0.167 | 0.000 | 0.000 | 0.000 | 0.000 | 0.167 | 0.333 | 0.333 | 0.500 | 0 |
| new_disc_post | treat_accuracy : No-course Baseline_Accuracy After | 827 | 0 | 0.142 | 0.287 | 0.010 | -0.833 | 1.000 | -0.167 | 0.000 | 0.000 | 0.000 | 0.167 | 0.167 | 0.333 | 0.333 | 0.500 | 0 |
| new_disc_post | treat_accuracy : No-course Baseline_Accuracy Inter | 801 | 0 | 0.125 | 0.295 | 0.010 | -0.833 | 1.000 | -0.167 | 0.000 | 0.000 | 0.000 | 0.000 | 0.167 | 0.167 | 0.333 | 0.500 | 0 |
| new_disc_post | treat_accuracy : Reasoning_Accuracy After | 856 | 0 | 0.155 | 0.321 | 0.011 | -0.667 | 1.000 | -0.167 | 0.000 | 0.000 | 0.000 | 0.167 | 0.167 | 0.333 | 0.500 | 0.667 | 0 |
| new_disc_post | treat_accuracy : Reasoning_Accuracy Inter | 834 | 0 | 0.137 | 0.298 | 0.010 | -0.667 | 1.000 | -0.167 | 0.000 | 0.000 | 0.000 | 0.000 | 0.167 | 0.333 | 0.333 | 0.500 | 0 |
# loop over the outcomes to generate histograms by accuracy x treatment group;
# the filename suffix keeps these from overwriting earlier histograms

for (var in outcome_list) {
    create_histogram_grouped(df_wide, var, "treat_accuracy", NULL, paste("Histogram of", var, "by Accuracy and Treatment Group"), var, "Frequency", line_color = "black", paste0("histogram_", var, "_by_treat_accuracy"))
}

By Time