This memo presents summary statistics for the Inoculation against misinformation (IAM) experiment data (see Wiki; paper).
The experiment had 3 phases:

- pre-survey
- post-survey
- follow-up survey
As reported in Table 2 of the paper (page 18), of the 25,287 participants who started the pre-survey, only 22,526 completed it, 8,684 completed the post-survey, and 5,316 completed the follow-up survey.
The sample considered below consists of 8,684 participants who completed the post-survey. For the primary outcome, “Misinfo Sharing rate”, 7,688 participants are considered, as they completed the post-survey and shared at least one non-misinformation post in the pre-survey.
Information from the follow-up survey is not currently considered.
Each participant was shown 9 posts in the pre-survey and 9 posts in the post-survey; each time, 3 of the 9 posts were non-misinformation and the remaining 6 were misinformation.
We consider 12 different outcomes. Outcomes with equal weighting (7, 8, 11, 12) are not currently included in the paper. Six of the outcomes are reported separately for the pre-survey and the post-survey: Misinfo posts, Non-misinfo posts, Sharing discernment score, Sharing discernment score with equal weighting, Accuracy discernment score, and Accuracy discernment score with equal weighting.
1. Primary outcome: Misinfo Sharing rate
\[ \textit{Misinfo Sharing rate}_i = \frac{\sum^3_{j = 1} [S^{M,post}_i(j)|S^{N,pre}_i(j) = 1]}{\sum^3_{j = 1} S^{N,pre}_i(j)} \]
The denominator is the number of non-misinformation posts participant i shares in the pre-survey, out of the three they are shown. The numerator is the number of the corresponding misinformation posts that participant i shares in the post-survey, counting only posts whose non-misinformation counterpart was shared in the pre-survey.
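To make the computation concrete, here is a minimal sketch for one hypothetical participant; the two indicator vectors and their pairing are made up for illustration and are not the actual data layout.

# Hypothetical illustration of the Misinfo Sharing rate for one participant.
# s_non_pre[j]  = 1 if the j-th non-misinformation post was shared in the pre-survey
# s_mis_post[j] = 1 if the corresponding misinformation post was shared in the post-survey
s_non_pre  <- c(1, 0, 1)
s_mis_post <- c(1, 0, 0)
# Numerator: corresponding misinformation posts shared in the post-survey,
# counted only where the non-misinformation post was shared in the pre-survey.
# Denominator: non-misinformation posts shared in the pre-survey.
misinfo_sharing_rate <- sum(s_mis_post[s_non_pre == 1]) / sum(s_non_pre)
misinfo_sharing_rate  # 0.5 for these hypothetical choices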
2. Sharing rate
Proportion of posts shared out of the 9 posts shown. Values range from 0 to 1.
3. Misinfo Posts
Proportion of misinformation posts shared out of the 6 posts shown. Values range from 0 to 1.
4. Non-misinfo posts
Proportion of non-misinformation posts shared out of the 3 posts shown. Values range from 0 to 1.
5. Sharing Discernment Score
Difference between the number of non-misinformation posts and the number of misinformation posts a participant decides to share. Values range from -6 to 3.
6. Post - Pre difference in Sharing Discernment Score
Difference in the Sharing Discernment Score between the post- and pre-survey. Values range from -9 to 9.
7. Sharing Discernment Score, equal weighting
Difference between the number of non-misinformation posts and the number of misinformation posts a participant decides to share, weighting non-misinformation posts double. Values range from -6 to 6.
8. Post - Pre difference in Sharing Discernment Score, equal weighting
Difference in the Sharing Discernment Score with equal weighting between the post- and pre-survey. Values range from -12 to 12.
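As a concrete illustration of outcomes 5–8, the following sketch computes the sharing discernment score under both weightings from hypothetical 0/1 sharing indicators (the vectors are made up for illustration).

# Hypothetical sharing decisions for one participant in one survey wave:
# 3 non-misinformation posts and 6 misinformation posts (1 = shared).
shared_non <- c(1, 1, 0)
shared_mis <- c(1, 0, 0, 1, 0, 0)
# Outcome 5: non-misinfo shares minus misinfo shares (range -6 to 3).
disc_score <- sum(shared_non) - sum(shared_mis)          # 2 - 2 = 0
# Outcome 7: equal weighting, non-misinfo shares count double (range -6 to 6).
disc_score_eq <- 2 * sum(shared_non) - sum(shared_mis)   # 2*2 - 2 = 2
# Outcomes 6 and 8 are the post-survey score minus the pre-survey score.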
9. Accuracy Discernment Score
The score is based on responses to the question “To the best of your knowledge, how accurate is the claim in the above post?”. It is a weighted sum across the 9 posts shown to the participant, where the responses are weighted as follows:
\[ \text{If post is misinformation:} \begin{cases} -3 & \text{if responded "Very accurate"} \\ -1 & \text{if responded "Somewhat accurate"} \\ 1 & \text{if responded "Not very accurate"} \\ 3 & \text{if responded "Not at all accurate"} \end{cases} \]
\[ \text{If post is non-misinformation:} \begin{cases} 3 & \text{if responded "Very accurate"} \\ 1 & \text{if responded "Somewhat accurate"} \\ -1 & \text{if responded "Not very accurate"} \\ -3 & \text{if responded "Not at all accurate"} \end{cases} \]
Values range from -27 to 27.
10. Post - Pre difference in Accuracy Discernment Score
Difference in the Accuracy Discernment Score between the post- and pre-survey. Values range from -54 to 54.
11. Accuracy Discernment Score, equal weighting
The score is based on responses to the question “To the best of your knowledge, how accurate is the claim in the above post?”. It is a weighted sum across the 9 posts shown to the participant, where the responses are weighted as follows:
\[ \text{If post is misinformation:} \begin{cases} -3 & \text{if responded "Very accurate"} \\ -1 & \text{if responded "Somewhat accurate"} \\ 1 & \text{if responded "Not very accurate"} \\ 3 & \text{if responded "Not at all accurate"} \end{cases} \]
\[ \text{If post is non-misinformation:} \begin{cases} 6 & \text{if responded "Very accurate"} \\ 2 & \text{if responded "Somewhat accurate"} \\ -2 & \text{if responded "Not very accurate"} \\ -6 & \text{if responded "Not at all accurate"} \end{cases} \]
Values range from -36 to 36.
12. Post - Pre difference in Accuracy Discernment Score, equal weighting
Difference in the Accuracy Discernment Score with equal weighting between the post- and pre-survey. Values range from -72 to 72.
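To illustrate outcomes 9–12, here is a minimal sketch of the accuracy discernment scoring for one participant; the numeric coding of the response scale (1 = “Very accurate”, …, 4 = “Not at all accurate”) and the response vectors are assumptions for illustration only.

# Hypothetical accuracy ratings (1 = "Very accurate", 2 = "Somewhat accurate",
# 3 = "Not very accurate", 4 = "Not at all accurate") for the 9 posts shown.
resp_mis <- c(4, 3, 2, 4, 1, 3)   # the 6 misinformation posts
resp_non <- c(1, 2, 1)            # the 3 non-misinformation posts
# Response weights, as defined above.
w_mis    <- c(-3, -1, 1, 3)       # misinformation posts
w_non    <- c( 3,  1, -1, -3)     # non-misinformation posts
w_non_eq <- c( 6,  2, -2, -6)     # non-misinformation posts, equal weighting
# Outcome 9: Accuracy Discernment Score (range -27 to 27).
acc_disc    <- sum(w_mis[resp_mis]) + sum(w_non[resp_non])
# Outcome 11: equal weighting (range -36 to 36).
acc_disc_eq <- sum(w_mis[resp_mis]) + sum(w_non_eq[resp_non])
# Outcomes 10 and 12 are the post-survey score minus the pre-survey score.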
In this script, we use two datasets:

intermediate_data_wide.csv
: Contains sharing and perceived accuracy aggregated at the individual level. Includes treatment conditions and covariates.

intermediate_data_long_treatments.csv
: Contains sharing and perceived accuracy at the individual-post level. Includes treatment conditions, covariates, and individual-aggregated outcomes.
For more information about how these datasets are generated and a data dictionary, see memo.
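The chunks below call functions from several packages without attaching them in this excerpt; the setup chunk is not shown, so the following is an assumed minimal setup rather than the script's actual one.

# Assumed package setup (the original setup chunk is not shown in this memo excerpt).
library(dplyr)      # %>%, filter(), summarise(), mutate(), count()
library(ggplot2)    # histograms and density plots
library(scales)     # percent-formatted axis labels
library(DT)         # datatable()
library(shiny)      # tabsetPanel(), tabPanel()
library(htmltools)  # tagList()
library(plm)        # pdata.frame()
library(xtsum)      # within_sd(), between_sd() (called via xtsum:: below)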
Standard error functions
# Standard error of the mean for a continuous variable
se_cont <- function(x, na.rm = FALSE) {
  if (na.rm) x <- na.omit(x)
  sqrt(var(x) / length(x))
}
# Standard error of a proportion for a binary (0/1) variable
se_binary <- function(x, na.rm = FALSE) {
  if (na.rm) x <- na.omit(x)
  sqrt(mean(x) * (1 - mean(x)) / length(x))
}
Load data
# Individual-level (wide) and individual-post-level (long) data; follow-up responses are dropped.
wide <- read.csv("intermediate_data_wide.csv")
long <- read.csv("intermediate_data_long.csv") %>% filter(pre_post != "followup")
long_treatments <- read.csv("../../main_analysis/intermediate_outcomes/intermediate_data_long_treatments.csv") %>% filter(pre_post != "followup")
# Summary table
summary_table <- function(data, continuous_vars, var_names, possible_range) {
# Empty dataframe to store summary data frames
summary_dataframe <- data.frame(Variable = character(),
n = numeric(),
Mean = numeric(),
SD = numeric(), Min = numeric(),
Q1 = numeric(), Median = numeric(),
Q3 = numeric(), Max = numeric(),
Possible_Range = character(),
stringsAsFactors = FALSE)
# Loop through each continuous variable
for (i in seq_along(continuous_vars)) {
var <- continuous_vars[i]
var_name <- var_names[i]
range <- possible_range[i]
# Summary data frame for continuous variable
summary_data <- data %>%
summarise(
Variable = var_name,
n = n(),
Mean = formatC(mean(data[[var]], na.rm = TRUE), format = "f", digits = 3),
SD = formatC(sd(data[[var]], na.rm = TRUE), format = "f", digits = 3),
Min = formatC(min(data[[var]], na.rm = TRUE), format = "f", digits = 3),
Q1 = formatC(quantile(data[[var]], 0.25, na.rm = TRUE), format = "f", digits = 3),
Median = formatC(median(data[[var]], na.rm = TRUE), format = "f", digits = 3),
Q3 = formatC(quantile(data[[var]], 0.75, na.rm = TRUE), format = "f", digits = 3),
Max = formatC(max(data[[var]], na.rm = TRUE), format = "f", digits = 3),
Possible_Range = range
)
# Append the summary data frame to the main dataframe
summary_dataframe <- rbind(summary_dataframe, summary_data)
}
# Return the main dataframe with all summary data frames
return(summary_dataframe)
}
summary_table_categorical <- function(data, categorical_vars, var_names, possible_range) {
# Empty dataframe to store summary data frames
summary_dataframe <- data.frame(Variable = character(),
n = numeric(),
Share = numeric(),
Possible_Range = character(),
stringsAsFactors = FALSE)
# Loop through each categorical variable
for (i in seq_along(categorical_vars)) {
var <- categorical_vars[i]
var_name <- var_names[i]
range <- possible_range[i]
# Summary data frame for categorical variable
summary_data <- data %>%
filter(as.character(.data[[var]]) == "1") %>%
summarise(
Variable = var_name,
n = n(),
Share = formatC(n() / nrow(data), format = "f", digits = 3),
Possible_Range = range
)
# Append the summary data frame to the main dataframe
summary_dataframe <- rbind(summary_dataframe, summary_data)
}
# Return the main dataframe with all summary data frames
return(summary_dataframe)
}
# Specify the continuous variables
{
continuous_vars_pre_post <- c("share_diff",
"misinfo_diff",
"base_rate_diff",
"disc_diff",
"disc_rates_diff_eq",
"acc_disc_diff",
"acc_disc_diff_eq")
continuous_vars_names_pre_post <- c("Post - Pre Sharing rate",
"Post - Pre Misinfo sharing rate",
"Post - Pre Non-Misinfo sharing rate",
"Post - Pre Sharing Discernment score",
"Post - Pre Sharing Discernment score, equal weighting",
"Post - Pre Accuracy Discernment score",
"Post - Pre Accuracy Discernment score, equal weighting")
possible_range_prepost <- c("[-1,1]","[-1,1]","[-1,1]","[-9,9]","[-12,12]","[-54,54]","[-72,72]")
continuous_vars_pre <- c("share_pre",
"misinfo_pre",
"base_rate_pre",
"disc_pre",
"disc_score_pre_eq",
"acc_disc_pre",
"acc_disc_pre_eq")
continuous_vars_names_pre <- c("Pre Sharing rate",
"Pre Misinfo Sharing rate",
"Pre Non-Misinfo sharing rate",
"Pre Sharing Discernment score",
"Pre Sharing Discernment score, equal weighting",
"Pre Accuracy Discernment score",
"Pre Accuracy Discernment score, equal weighting")
possible_range_pre <- c("[0,1]","[0,1]","[0,1]","[-6,3]","[-6,6]","[-27,27]","[-36,36]")
continuous_vars_post <- c("share_post",
"misinfo_post",
"base_rate_post",
"disc_post",
"disc_score_post_eq",
"acc_disc_post",
"acc_disc_post_eq")
continuous_vars_names_post <- c("Post Sharing rate",
"Post Misinfo Sharing rate",
"Post Non-Misinfo sharing rate",
"Post Sharing Discernment score",
"Post Sharing Discernment score, equal weighting",
"Post Accuracy Discernment score",
"Post Accuracy Discernment score, equal weighting")
possible_range_post <- c("[0,1]","[0,1]","[0,1]","[-6,3]","[-6,6]","[-27,27]","[-36,36]")
}
# Create tables for pre-survey, post-survey, and post - pre differences
pre_post <- summary_table(wide, continuous_vars_pre_post,continuous_vars_names_pre_post,possible_range_prepost)
pre <- summary_table(wide, continuous_vars_pre,continuous_vars_names_pre,possible_range_pre)
post <- summary_table(wide, continuous_vars_post,continuous_vars_names_post,possible_range_post)
# Create the tabset with the tables as tab content
tabset <- tabsetPanel(
tabPanel("Pre", DT::datatable(pre, options = list(pageLength = 7), rownames = FALSE)),
tabPanel("Post", DT::datatable(post, options = list(pageLength = 7), rownames = FALSE)),
tabPanel("Pre - Post", DT::datatable(pre_post, options = list(pageLength = 7), rownames = FALSE))
)
# Render the tabset
htmltools::tagList(tabset)
Pre-Survey
# Plot histograms and density plots
variables <- c("share_pre", "misinfo_pre", "base_rate_pre",
"disc_pre", "disc_score_pre_eq",
"acc_disc_pre", "acc_disc_pre_eq")
labels <- c("Pre Sharing rate",
"Pre Misinfo sharing rate",
"Pre Non-Misinfo sharing rate",
"Pre Sharing Discernment score",
"Pre Sharing Discernment score, equal weighting",
"Pre Accuracy Discernment score",
"Pre Accuracy Discernment score, equal weighting")
# Create a list to store the plots
plots <- list()
# Loop through each variable and create histograms and density plots
for (i in seq_along(variables)) {
# Histogram plot
hist_plot <- ggplot(wide, aes(x = .data[[variables[i]]])) +
geom_histogram(aes(y = ..count..), bins = 30, fill = "skyblue", color = "black") +
labs(title = paste("Histogram\n", labels[i]),
x = labels[i], y = "Count") +
theme_minimal() +
theme(plot.title = element_text(size = 9.5),
axis.text.x = element_text(size = 6))
# Density plot
density_plot <- ggplot(wide, aes(x = .data[[variables[i]]])) +
geom_density(aes(y = ..scaled..), alpha = 0.5, fill = "orange", adjust = 2) +
labs(title = paste("Density Plot\n", labels[i]),
x = labels[i], y = "Density") +
theme_minimal() +
theme(plot.title = element_text(size = 9.5),
axis.text.x = element_text(size = 6))
# Add plots to the list
plots[[length(plots) + 1]] <- hist_plot
plots[[length(plots) + 1]] <- density_plot
}
# Print each plot
invisible(lapply(plots, print))
Post-Survey
# Plot histograms and density plots
variables <- c("share_post", "misinfo_post", "base_rate_post",
"disc_post", "disc_score_post_eq",
"acc_disc_post", "acc_disc_post_eq")
labels <- c("Post Sharing rate",
"Post Misinfo sharing rate",
"Post Non-Misinfo sharing rate",
"Post Sharing Discernment score",
"Post Sharing Discernment score, equal weighting",
"Post Accuracy Discernment score",
"Post Accuracy Discernment score, equal weighting")
# Create a list to store the plots
plots <- list()
# Loop through each variable and create histograms and density plots
for (i in seq_along(variables)) {
# Histogram plot
hist_plot <- ggplot(wide, aes(x = .data[[variables[i]]])) +
geom_histogram(aes(y = ..count..), bins = 30, fill = "skyblue", color = "black") +
labs(title = paste("Histogram\n", labels[i]),
x = labels[i], y = "Count") +
theme_minimal() +
theme(plot.title = element_text(size = 9.5),
axis.text.x = element_text(size = 6))
# Density plot
density_plot <- ggplot(wide, aes(x = .data[[variables[i]]])) +
geom_density(aes(y = ..scaled..), alpha = 0.5, fill = "orange", adjust = 2) +
labs(title = paste("Density Plot\n", labels[i]),
x = labels[i], y = "Density") +
theme_minimal() +
theme(plot.title = element_text(size = 9.5),
axis.text.x = element_text(size = 6))
# Add plots to the list
plots[[length(plots) + 1]] <- hist_plot
plots[[length(plots) + 1]] <- density_plot
}
# Print each plot
invisible(lapply(plots, print))
Pre-Post
# Plot histograms and density plots
variables <- c("share_diff", "misinfo_diff", "base_rate_diff",
"disc_diff", "disc_rates_diff_eq",
"acc_disc_diff", "acc_disc_diff_eq")
labels <- c("Post - Pre Sharing rate",
"Post - Pre Misinfo sharing rate",
"Post - Pre Non-Misinfo sharing rate",
"Post - Pre Sharing Discernment score",
"Post - Pre Sharing Discernment score, equal weighting",
"Post - Pre Accuracy Discernment score",
"Post - Pre Accuracy Discernment score, equal weighting")
# Create a list to store the plots
plots <- list()
# Loop through each variable and create histograms and density plots
for (i in seq_along(variables)) {
# Histogram plot
hist_plot <- ggplot(wide, aes(x = .data[[variables[i]]])) +
geom_histogram(aes(y = ..count..), bins = 30, fill = "skyblue", color = "black") +
labs(title = paste("Histogram\n", labels[i]),
x = labels[i], y = "Count") +
theme_minimal() +
theme(plot.title = element_text(size = 9.5),
axis.text.x = element_text(size = 6))
# Density plot
density_plot <- ggplot(wide, aes(x = .data[[variables[i]]])) +
geom_density(aes(y = ..scaled..), alpha = 0.5, fill = "orange", adjust = 2) +
labs(title = paste("Density Plot\n", labels[i]),
x = labels[i], y = "Density") +
theme_minimal() +
theme(plot.title = element_text(size = 9.5),
axis.text.x = element_text(size = 6))
# Add plots to the list
plots[[length(plots) + 1]] <- hist_plot
plots[[length(plots) + 1]] <- density_plot
}
# Print each plot
invisible(lapply(plots, print))
# Function to create a summary table
summary_table_treatment <- function(data, continuous_vars, var_names, possible_range, period) {
# Empty dataframe to store summary data frames
summary_dataframe <- data.frame(Variable = character(),
Period = character(),
n = numeric(),
Mean = numeric(),
SD = numeric(), Min = numeric(),
Q1 = numeric(), Median = numeric(),
Q3 = numeric(), Max = numeric(),
Possible_Range = character(),
stringsAsFactors = FALSE)
# Loop through each continuous variable
for (i in seq_along(continuous_vars)) {
var <- continuous_vars[i]
var_name <- var_names[i]
range <- possible_range[i]
# Summary data frame for continuous variable
summary_data <- data %>%
summarise(
Variable = var_name,
Period = period,
n = n(),
Mean = formatC(mean(data[[var]], na.rm = TRUE), format = "f", digits = 3),
SD = formatC(sd(data[[var]], na.rm = TRUE), format = "f", digits = 3),
Min = formatC(min(data[[var]], na.rm = TRUE), format = "f", digits = 3),
Q1 = formatC(quantile(data[[var]], 0.25, na.rm = TRUE), format = "f", digits = 3),
Median = formatC(median(data[[var]], na.rm = TRUE), format = "f", digits = 3),
Q3 = formatC(quantile(data[[var]], 0.75, na.rm = TRUE), format = "f", digits = 3),
Max = formatC(max(data[[var]], na.rm = TRUE), format = "f", digits = 3),
Possible_Range = range
)
# Append the summary data frame to the main dataframe
summary_dataframe <- rbind(summary_dataframe, summary_data)
}
# Return the main dataframe with all summary data frames
return(summary_dataframe)
}
continuous_vars_names <- c("Sharing rate",
"Misinfo sharing rate",
"Non-Misinfo sharing rate",
"Sharing Discernment score",
"Sharing Discernment score, equal weighting",
"Accuracy Discernment score",
"Accuracy Discernment score, equal weighting")
summary_tables <- list()
# Loop over each treatment value
for (t in c(unique(wide$treatment),"All")) {
# Filter the data for the current treatment
if (t != "All") {
filtered_data <- wide %>% dplyr::filter(treatment == t)
} else {
filtered_data <- wide
}
# Create a combined summary table for pre, post, and pre_post
combined_summary <- rbind(
summary_table_treatment(filtered_data, continuous_vars_pre, continuous_vars_names, possible_range_pre, "Pre"),
summary_table_treatment(filtered_data, continuous_vars_post, continuous_vars_names, possible_range_post, "Post"),
summary_table_treatment(filtered_data, continuous_vars_pre_post, continuous_vars_names, possible_range_prepost, "Pre-Post")
)
# Store the combined summary table in the list
summary_tables[[t]] <- combined_summary
}
# Initialize an empty list to store tab panels
tab_panels <- list()
# Loop through each treatment in summary_tables
for (treatment in names(summary_tables)) {
tab_name <- treatment
tab_content <- DT::datatable(summary_tables[[treatment]], options = list(pageLength = 7), rownames = FALSE)
tab_panels[[length(tab_panels) + 1]] <- tabPanel(tab_name, tab_content)
}
# Create the tabset with the tab panels
tabset <- tabsetPanel(!!!tab_panels)
# Render the tabset
htmltools::tagList(tabset)
No-course Baseline
# Plot histograms and density plots
variables <- c("share_diff", "misinfo_diff", "base_rate_diff",
"disc_diff", "disc_rates_diff_eq",
"acc_disc_diff", "acc_disc_diff_eq")
labels <- c("Post - Pre Sharing rate",
"Post - Pre Misinfo sharing rate",
"Post - Pre Non-Misinfo sharing rate",
"Post - Pre Sharing Discernment score",
"Post - Pre Sharing Discernment score, equal weighting",
"Post - Pre Accuracy Discernment score",
"Post - Pre Accuracy Discernment score, equal weighting")
# Create a list to store the plots
plots <- list()
wide_control <- wide %>% filter(treatment == ("No-course Baseline"))
# Loop through each variable and create histograms and density plots
for (i in seq_along(variables)) {
# Histogram plot
hist_plot <- ggplot(wide_control, aes(x = .data[[variables[i]]])) +
geom_histogram(aes(y = ..count..), bins = 30, fill = "skyblue", color = "black") +
labs(title = paste("Histogram\n", labels[i]),
x = labels[i], y = "Count") +
theme_minimal() +
theme(plot.title = element_text(size = 9.5),
axis.text.x = element_text(size = 6))
# Density plot
density_plot <- ggplot(wide_control, aes(x = .data[[variables[i]]])) +
geom_density(aes(y = ..scaled..), alpha = 0.5, fill = "orange", adjust = 2) +
labs(title = paste("Density Plot\n", labels[i]),
x = labels[i], y = "Density") +
theme_minimal() +
theme(plot.title = element_text(size = 9.5),
axis.text.x = element_text(size = 6))
# Add plots to the list
plots[[length(plots) + 1]] <- hist_plot
plots[[length(plots) + 1]] <- density_plot
}
# Print each plot
invisible(lapply(plots, print))
Emotions
# Plot histograms and density plots
variables <- c("share_diff", "misinfo_diff", "base_rate_diff",
"disc_diff", "disc_rates_diff_eq",
"acc_disc_diff", "acc_disc_diff_eq")
labels <- c("Post - Pre Sharing rate",
"Post - Pre Misinfo sharing rate",
"Post - Pre Non-Misinfo sharing rate",
"Post - Pre Sharing Discernment score",
"Post - Pre Sharing Discernment score, equal weighting",
"Post - Pre Accuracy Discernment score",
"Post - Pre Accuracy Discernment score, equal weighting")
# Create a list to store the plots
plots <- list()
wide_emotions <- wide %>% filter(treatment == ("Emotions"))
# Loop through each variable and create histograms and density plots
for (i in seq_along(variables)) {
# Histogram plot
hist_plot <- ggplot(wide_emotions, aes(x = .data[[variables[i]]])) +
geom_histogram(aes(y = ..count..), bins = 30, fill = "skyblue", color = "black") +
labs(title = paste("Histogram\n", labels[i]),
x = labels[i], y = "Count") +
theme_minimal() +
theme(plot.title = element_text(size = 9.5),
axis.text.x = element_text(size = 6))
# Density plot
density_plot <- ggplot(wide_emotions, aes(x = .data[[variables[i]]])) +
geom_density(aes(y = ..scaled..), alpha = 0.5, fill = "orange", adjust = 2) +
labs(title = paste("Density Plot\n", labels[i]),
x = labels[i], y = "Density") +
theme_minimal() +
theme(plot.title = element_text(size = 9.5),
axis.text.x = element_text(size = 6))
# Add plots to the list
plots[[length(plots) + 1]] <- hist_plot
plots[[length(plots) + 1]] <- density_plot
}
# Print each plot
invisible(lapply(plots, print))
Covariate | Description |
---|---|
Age | Integer denoting age of participants |
Man | Complement to ‘woman’ and ‘other’ |
High school or less | including ‘Less than a high school diploma’ and ‘High school degree or equivalent’ (reference) |
Some college | including ‘Some college, no degree’ and ‘Associate degree’ |
Bachelor’s degree | Bachelor’s degree |
Graduate degree | including ‘Master’s degree’ and ‘Doctorate or professional degree’ |
Married | Complement to ‘Single,’ ‘Widowed,’ ‘Divorced,’ and ‘Separated’ |
Employed | including ‘Employed full time,’ ‘Employed part time,’ and ‘Self-employed’ |
Unemployed | including ‘Unemployed and currently looking for work,’ ‘Unemployed not currently looking for work,’ ‘Retired,’ ‘Homemaker,’ and ‘Unable to work’ (reference) |
Student | Student |
Mostly urban | Live in mostly urban area |
Suburban | Live in suburban area |
Mostly rural | Live in mostly rural area (reference) |
Christian | Complement to ‘None,’ ‘Hinduism,’ ‘Muslim,’ ‘Traditionalist,’ and ‘Other’ |
Attends religious services | Frequency of attending religious services including ‘Less than once a month,’ ‘One to three times per month,’ ‘Once a week,’ ‘More than once a week but less than daily,’ and ‘Daily’; Complement to ‘Never’ (reference) |
Uses social media | 1 if ‘Yes’ and 0 if ‘No’ (reference) |
Passed att. check Pre | 1 if the user passed the attention check and 0 otherwise |
Hours per day spent in social media | Integer denoting how many hours are spent on average each day on social media |
Social media share 0-20% | Proportion of the content the participant sees on social media that they choose to share is between 0 and 20% (reference) |
Social media share 20-40% | Proportion of the content the participant sees on social media that they choose to share is between 20 and 40% |
Social media share 40-60% | Proportion of the content the participant sees on social media that they choose to share is between 40 and 60% |
Social media share 60-80% | Proportion of the content the participant sees on social media that they choose to share is between 60 and 80% |
Social media share 80-100% | Proportion of the content the participant sees on social media that they choose to share is between 80 and 100% |
All posts | Pre-survey sharing: proportion of all posts shared |
Non-misinformation posts | Pre-survey sharing: proportion of non-misinformation posts shared |
Misinformation posts | Pre-survey sharing: proportion of misinformation posts shared |
# Specify the continuous variables
{
continuous_vars <- c("age",
"social_media_hours")
continuous_vars_names <- c("Age",
"Hours spent on social media")
continuous_possible_range <- c("[18,99]","[0,24]")
categorical_vars <- c("gender_Man",
"education_High_school_or_less","education_Some_college","education_Bachelor_degree","education_Graduate_degree",
"marital_Married_or_in_a_domestic_partnership",
"employment_Employed","employment_Unemployed","employment_Student",
"location_Mostly_urban","location_Suburban","location_Mostly_rural",
"religion_Christian","religiosity_Attends",
"social_media_bin_Yes",
"att_check_pre",
"social_media_share_80_100","social_media_share_60_80","social_media_share_40_60","social_media_share_20_40","social_media_share_0_20")
categorical_vars_names <- c("Man",
"High school or less","Some college","Bachelor degree","Graduate degree",
"Married",
"Employed","Unemployed","Student",
"Mostly urban","Suburban","Mostly rural",
"Christian","Attends religious services",
"Uses social media",
"Passed attention check in pre",
"Shares 80-100% posts on social media","Shares 60-80% posts on social media","Shares 40-60% posts on social media","Shares 20-40% posts on social media","Shares 0-20% posts on social media")
categorical_possible_range <- c("[0,1]",
"[0,1]","[0,1]","[0,1]","[0,1]",
"[0,1]",
"[0,1]","[0,1]","[0,1]",
"[0,1]","[0,1]","[0,1]",
"[0,1]","[0,1]",
"[0,1]",
"[0,1]",
"[0,1]","[0,1]","[0,1]","[0,1]","[0,1]")
}
# Create tables for categorical and continuous
continuous <- summary_table(wide, continuous_vars,continuous_vars_names,continuous_possible_range)
categorical <- summary_table_categorical(wide, categorical_vars, categorical_vars_names, categorical_possible_range)
# Create the tabset with the tables as tab content
tabset <- tabsetPanel(
tabPanel("Categorical", DT::datatable(categorical, options = list(pageLength = 7), rownames = FALSE)),
tabPanel("Continuous", DT::datatable(continuous, options = list(pageLength = 2), rownames = FALSE))
)
# Render the tabset
htmltools::tagList(tabset)
# Calculate PDFs
age_density <- density(as.numeric(wide$age))
hours_density <- density(as.numeric(wide$social_media_hours))
# Create a data frame for plotting
age_pdf <- data.frame(x = age_density$x, y = age_density$y)
hours_pdf <- data.frame(x = hours_density$x, y = hours_density$y)
# Plot PDFs
# Age
pdf_plot <- ggplot() +
geom_line(data = age_pdf, aes(x = x, y = y), color = "blue", linetype = "solid") +
labs(x = "Value", y = "Density", title = "Age") +
theme_minimal() +
scale_color_manual(values = c("Age" = "blue")) +
scale_linetype_manual(values = c("Age" = "solid"))
# Display the plot
print(pdf_plot)
# Hours spent on social media
pdf_plot <- ggplot() +
geom_line(data = hours_pdf, aes(x = x, y = y), color = "red", linetype = "solid") +
labs(x = "Value", y = "Density", title = "Hours spent on social media") +
theme_minimal() +
scale_color_manual(values = c("Hours spent on social media" = "red")) +
scale_linetype_manual(values = c("Hours spent on social media" = "solid"))
# render the plot
print(pdf_plot)
# Simulated binary draws matching each covariate's observed share of 1s (used for the PMF plots below)
data <- data.frame(
gender_Man = rbinom(1000,1,length(wide$gender_Man[wide$gender_Man == 1])/length(wide$gender_Man)),
education_High_school_or_less = rbinom(1000,1,length(wide$education_High_school_or_less[wide$education_High_school_or_less == 1])/length(wide$education_High_school_or_less)),
education_Some_college = rbinom(1000,1,length(wide$education_Some_college[wide$education_Some_college == 1])/length(wide$education_Some_college)),
education_Bachelor_degree = rbinom(1000,1,length(wide$education_Bachelor_degree[wide$education_Bachelor_degree == 1])/length(wide$education_Bachelor_degree)),
education_Graduate_degree = rbinom(1000,1,length(wide$education_Graduate_degree[wide$education_Graduate_degree == 1])/length(wide$education_Graduate_degree)),
marital_Married_or_in_a_domestic_partnership = rbinom(1000,1,length(wide$marital_Married_or_in_a_domestic_partnership[wide$marital_Married_or_in_a_domestic_partnership == 1])/length(wide$marital_Married_or_in_a_domestic_partnership)),
employment_Employed = rbinom(1000,1,length(wide$employment_Employed[wide$employment_Employed == 1])/length(wide$employment_Employed)),
employment_Unemployed = rbinom(1000,1,length(wide$employment_Unemployed[wide$employment_Unemployed == 1])/length(wide$employment_Unemployed)),
employment_Student = rbinom(1000,1,length(wide$employment_Student[wide$employment_Student == 1])/length(wide$employment_Student)),
location_Mostly_urban = rbinom(1000,1,length(wide$location_Mostly_urban[wide$location_Mostly_urban == 1])/length(wide$location_Mostly_urban)),
location_Suburban = rbinom(1000,1,length(wide$location_Suburban[wide$location_Suburban == 1])/length(wide$location_Suburban)),
location_Mostly_rural = rbinom(1000,1,length(wide$location_Mostly_rural[wide$location_Mostly_rural == 1])/length(wide$location_Mostly_rural)),
religion_Christian = rbinom(1000,1,length(wide$religion_Christian[wide$religion_Christian == 1])/length(wide$religion_Christian)),
religiosity_Attends = rbinom(1000,1,length(wide$religiosity_Attends[wide$religiosity_Attends == 1])/length(wide$religiosity_Attends)),
social_media_bin_Yes = rbinom(1000,1,length(wide$social_media_bin_Yes[wide$social_media_bin_Yes == 1])/length(wide$social_media_bin_Yes)),
att_check_pre = rbinom(1000,1,length(wide$att_check_pre[wide$att_check_pre == 1])/length(wide$att_check_pre)),
social_media_share_80_100 = rbinom(1000,1,length(wide$social_media_share_80_100[wide$social_media_share_80_100 == 1])/length(wide$social_media_share_80_100)),
social_media_share_60_80 = rbinom(1000,1,length(wide$social_media_share_60_80[wide$social_media_share_60_80 == 1])/length(wide$social_media_share_60_80)),
social_media_share_40_60 = rbinom(1000,1,length(wide$social_media_share_40_60[wide$social_media_share_40_60 == 1])/length(wide$social_media_share_40_60)),
social_media_share_20_40 = rbinom(1000,1,length(wide$social_media_share_20_40[wide$social_media_share_20_40 == 1])/length(wide$social_media_share_20_40)),
social_media_share_0_20 = rbinom(1000,1,length(wide$social_media_share_0_20[wide$social_media_share_0_20 == 1])/length(wide$social_media_share_0_20))
)
# Loop through each categorical variable
for (i in seq_along(categorical_vars)) {
var <- categorical_vars[i]
var_name <- categorical_vars_names[i]
# Calculate PMF
pmf_data <- data %>%
count(!!sym(var)) %>%
mutate(proportion = n / sum(n))
# Plot PMF
pmf_plot <- ggplot(pmf_data, aes(x = !!sym(var), y = proportion)) +
geom_bar(stat = "identity", fill = "navy") +
labs(title = paste("", var_name), x = var_name, y = "Proportion") +
theme_minimal() +
theme(axis.title = element_text(size = 10)) +
scale_y_continuous(labels = scales::percent) +
scale_x_continuous(breaks = c(0,1))
print(pmf_plot)
}
The heatmap below shows that correlations between covariates are not extreme; the largest non-trivial correlation is between Age and Married (0.40).
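The chunk that draws the heatmap is not included in this excerpt. A minimal sketch of how a covariate correlation heatmap could be built from the covariate lists defined above (the original chunk may differ):

# Sketch: correlation heatmap for the covariates specified above (not the original chunk).
cov_vars <- c(continuous_vars, categorical_vars)
cor_mat  <- cor(sapply(wide[cov_vars], as.numeric), use = "pairwise.complete.obs")
# Reshape the correlation matrix to long format for ggplot.
cor_long <- data.frame(
  row   = rep(rownames(cor_mat), times = ncol(cor_mat)),
  col   = rep(colnames(cor_mat), each  = nrow(cor_mat)),
  value = as.vector(cor_mat)
)
ggplot(cor_long, aes(x = row, y = col, fill = value)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", limits = c(-1, 1)) +
  labs(x = NULL, y = NULL, fill = "Correlation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 6),
        axis.text.y = element_text(size = 6))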
prepost_correlation <- function(data, pre_vars, post_vars, var_names, possible_range) {
# Empty dataframe to store summary data frames
prepost_correlation_dataframe <- data.frame(Variable = character(),
Correlation = numeric(),
stringsAsFactors = FALSE)
# Loop through each pair of pre and post variables
for (i in seq_along(pre_vars)) {
pre_var <- pre_vars[i]
post_var <- post_vars[i]
var_name <- var_names[i]
range <- possible_range[i]
# Calculate correlation between pre and post variables
correlation <- formatC(cor(data[[pre_var]], data[[post_var]], use = "complete.obs"), format = "f", digits = 3)
# Data frame for the pre-post correlation
prepost_correlation_data <- data.frame(Variable = var_name, Correlation = correlation)
# Append the pre-post correlation data frame to the main dataframe
prepost_correlation_dataframe <- rbind(prepost_correlation_dataframe, prepost_correlation_data)
}
# Return the main dataframe with all pre-post correlation data frames
return(prepost_correlation_dataframe)
}
continuous_vars_pre <- c("share_pre",
"misinfo_pre",
"base_rate_pre",
"disc_pre",
"disc_score_pre_eq",
"acc_disc_pre",
"acc_disc_pre_eq")
continuous_vars_post <- c("share_post",
"misinfo_post",
"base_rate_post",
"disc_post",
"disc_score_post_eq",
"acc_disc_post",
"acc_disc_post_eq")
var_names <- c("Sharing rate",
"Misinfo Sharing rate",
"Non-Misinfo sharing rate",
"Sharing Discernment score",
"Sharing Discernment score, equal weighting",
"Accuracy Discernment score",
"Accuracy Discernment score, equal weighting")
possible_range <- rep("[-1,1]", 7)
correlation_prepost_correlation_control <- prepost_correlation(wide %>% filter(treatment == "No-course Baseline"), continuous_vars_pre, continuous_vars_post, var_names, possible_range)
correlation_prepost_correlation_facts <- prepost_correlation(wide %>% filter(treatment == "Facts Baseline"), continuous_vars_pre, continuous_vars_post, var_names, possible_range)
correlation_prepost_correlation_emotions <- prepost_correlation(wide %>% filter(treatment == "Emotions"), continuous_vars_pre, continuous_vars_post, var_names, possible_range)
correlation_prepost_correlation_reasoning <- prepost_correlation(wide %>% filter(treatment == "Reasoning"), continuous_vars_pre, continuous_vars_post, var_names, possible_range)
correlation_prepost_correlation_combo <- prepost_correlation(wide %>% filter(treatment == "Combo"), continuous_vars_pre, continuous_vars_post, var_names, possible_range)
correlation_prepost_correlation <- prepost_correlation(wide, continuous_vars_pre, continuous_vars_post, var_names, possible_range)
# Create the tabset with the tables as tab content
tabset <- tabsetPanel(
tabPanel("No-Course Baseline", DT::datatable(correlation_prepost_correlation_control, options = list(pageLength = 7), rownames = FALSE)),
tabPanel("Facts Baseline", DT::datatable(correlation_prepost_correlation_facts, options = list(pageLength = 7), rownames = FALSE)),
tabPanel("Emotions", DT::datatable(correlation_prepost_correlation_emotions, options = list(pageLength = 7), rownames = FALSE)),
tabPanel("Reasoning", DT::datatable(correlation_prepost_correlation_reasoning, options = list(pageLength = 7), rownames = FALSE)),
tabPanel("Combo", DT::datatable(correlation_prepost_correlation_combo, options = list(pageLength = 7), rownames = FALSE)),
tabPanel("All", DT::datatable(correlation_prepost_correlation, options = list(pageLength = 7), rownames = FALSE))
)
# Render the tabset
htmltools::tagList(tabset)
Within- and between-user standard deviation and variance.
Within: standard deviation and variance of \(x_{ip} - \bar{x}_i + \bar{\bar{x}}\).
Between: standard deviation and variance of \(\bar{x}_i\).
Here \(i\) indexes the individual, \(p\) indexes a post, \(\bar{x}_i\) is individual \(i\)'s mean, and \(\bar{\bar{x}}\) is the global mean of the variable (share).
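For reference, here is a minimal base-R sketch of these two quantities for the share variable, following the definitions above; it assumes the long_treatments data and the user and share columns used in the chunk below, and its degrees-of-freedom conventions may differ slightly from xtsum's.

# Sketch: manual within/between decomposition of `share` across all treatments and periods.
grand_mean <- mean(long_treatments$share, na.rm = TRUE)
# Each observation's user-level mean of share.
user_mean <- ave(long_treatments$share, long_treatments$user,
                 FUN = function(x) mean(x, na.rm = TRUE))
# Within-user SD: spread of share around each user's own mean, recentred at the grand mean.
within_sd_manual <- sd(long_treatments$share - user_mean + grand_mean, na.rm = TRUE)
# Between-user SD: spread of the user-level means (one value per user).
between_sd_manual <- sd(tapply(long_treatments$share, long_treatments$user, mean, na.rm = TRUE),
                        na.rm = TRUE)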
# For interactive debugging of the function below: data <- filtered_data; treatment <- t; period <- p; share_type <- s
# Function to create a variance summary table
variance_summary_table <- function(data, treatment, period, share_type) {
# Filter data based on the period and share type
if (period != "pre and post") {
data <- data %>% filter(pre_post == period)
}
if (share_type == "misinfo share") {
data <- data %>% filter(post_is_misinfo == 1)
} else if (share_type == "non-misinfo share") {
data <- data %>% filter(post_is_misinfo == 0)
}
# Define data as panel
data <- pdata.frame(data, index = c('user'))
within_sd <- formatC(xtsum::within_sd(data, 'share', 'user'), format = "f", digits = 4)
within_variance <- formatC(xtsum::within_sd(data, 'share', 'user')^2, format = "f", digits = 4)
between_sd <- formatC(xtsum::between_sd(data, 'share', 'user'), format = "f", digits = 4)
between_variance <- formatC(xtsum::between_sd(data, 'share', 'user')^2, format = "f", digits = 4)
summary_table <- data.frame(
Outcome = share_type,
Treatment = treatment,
Period = period,
Within_user_Standard_Deviation = within_sd,
Within_user_Variance = within_variance,
Between_user_Standard_Deviation = between_sd,
Between_user_Variance = between_variance
)
return(summary_table)
}
# Initialize an empty list to store the variance summary tables
variance_summary_tables <- list()
treatments <- c("No-course Baseline", "Facts Baseline", "Emotions", "Reasoning", "Combo", "All")
periods <- c("pre", "post", "pre and post")
share_types <- c("share", "misinfo share", "non-misinfo share")
# Loop over each combination of treatment, period, and share type
for (t in treatments) {
# Empty dataframe to store all combinations for the current treatment
all_combinations <- NULL
for (p in periods) {
for (s in share_types) {
if (t != "All") {
# Filter the data for the current treatment
filtered_data <- long_treatments %>% filter(treatment == t)
} else {
filtered_data <- long_treatments
}
# Create a variance summary table for the current combination
variance_summary <- variance_summary_table(filtered_data, t, p, s)
# Store the variance summary table in all_combinations dataframe
all_combinations <- rbind(all_combinations, variance_summary)
}
}
# Store the all_combinations dataframe in the list with treatment name as key
variance_summary_tables[[t]] <- all_combinations
}
# Initialize an empty list to store tab panels
variance_tab_panels <- list()
# Loop through each key in variance_summary_tables
for (treatment in names(variance_summary_tables)) {
tab_name <- paste0(treatment)
tab_content <- DT::datatable(variance_summary_tables[[treatment]], options = list(pageLength = 9), rownames = FALSE)
variance_tab_panels[[length(variance_tab_panels) + 1]] <- tabPanel(tab_name, tab_content)
}
# Create the tabset with the variance tab panels
variance_tabset <- tabsetPanel(!!!variance_tab_panels)
# Render the tabset
htmltools::tagList(variance_tabset)