Purpose

The purpose of this script is to 1) anonymize the raw Chatfuel dataset and 2) document the data handling process. The raw Chatfuel dataset is manually downloaded from Chatfuel and uses chatfuel user id as the main identifier of participants. The data does not contain personal private information about respondents, such as name or profile picture URL. The raw Chatfuel dataset is named Project_Don_t_Get_Duped_....csv and is uploaded to the Sherlock server at ~ssh://sherlock/oak/stanford/groups/athey/fb_misinfo_interventions/data/chatfuel/raw.
The anonymization process will take place after the raw dataset is uploaded to Sherlock server. The anonymization process will include the following steps:
1. Load the raw dataset Project_Don_t_Get_Duped_....csv and drop participants from the GSB Golub Capital Social Lab. Steps 2 to 4 then generate a unique analytic_id for each respondent.
2. Order the data by MisinfoChat_start_time, using signed up and chatfuel user id as tiebreakers (in that order).
3. Add a new variable rn which is simply the row number for each entry.
4. Create a new unique id, analytic_id, by splicing together rn, signed up, and MisinfoChat_start_time into one string.
5. Delete the chatfuel user id variable.
6. Save the anonymized dataset as fbmisinfo_anon.csv.gz to the same directory as the raw dataset, ~ssh://sherlock/oak/stanford/groups/athey/fb_misinfo_interventions/data/chatfuel/raw
For cleaning purposes, the data is then manually downloaded from the Sherlock server to the local machine and uploaded to the GitHub directory ~fb_misinfo_interventions/data/.
Lab and project members are removed from the dataset during anonymization (step 1 above).
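
Steps 2 to 5 can be sketched on a toy data frame. This is a minimal illustration in base R, not the script's actual chunk: the three rows and all their values are made up, and the columns are assumed to be plain strings (the real export may parse timestamps differently).

```r
# Toy stand-in for the raw Chatfuel export (all values are invented).
toy <- data.frame(
  `chatfuel user id` = c("303", "101", "202"),
  `signed up` = c("2024-01-02", "2024-01-01", "2024-01-01"),
  MisinfoChat_start_time = c("2024-01-05 10:00", "2024-01-03 09:00", "2024-01-03 09:00"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Step 2: order by start time, then sign-up date, then user id.
toy <- toy[order(toy$MisinfoChat_start_time, toy$`signed up`, toy$`chatfuel user id`), ]

# Step 3: row number in the sorted order.
toy$rn <- seq_len(nrow(toy))

# Step 4: splice rn, sign-up date and start time into one string.
toy$analytic_id <- paste(toy$rn, toy$`signed up`, toy$MisinfoChat_start_time, sep = "_")

# Step 5: drop the original identifier.
toy$`chatfuel user id` <- NULL

print(toy$analytic_id)
```

Users 101 and 202 share both a start time and a sign-up date, so only the user id tiebreaker decides their order, and only the rn prefix keeps their analytic_id values distinct.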

Library

library(tidyverse)
library(kableExtra)
library(broom)
library(here)
library(data.table)

Anonymization Process

Loading the raw dataset

The raw dataset was last pulled on 02/23/2024.

data <- read_csv("./data/chatfuel/raw/Project_Don_t_Get_Duped_2024_02_23_06_43_57.csv") #loading the raw Chatfuel dataset

Checking for unique `chatfuel user id`

To check for duplicates, we compare the number of unique chatfuel user id values with the number of rows in the dataset. The script stops if the two numbers differ.

# Number of unique ids, stored as a string
a <- as.character(length(unique(data$`chatfuel user id`)))

if (nrow(data) != length(unique(data$`chatfuel user id`))) {
  stop("The number of rows does not match the number of unique `chatfuel user id` values.")
} else {
  # Continue
  print("Unique observations check passed.")
}

## [1] "Unique observations check passed."
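
If the check ever fails, it helps to see which ids are repeated and how often. The following is a small sketch on a made-up vector of ids; `ids`, `dup_ids`, and `dup_counts` are hypothetical names, not part of the script.

```r
# Made-up ids standing in for data$`chatfuel user id`.
ids <- c("101", "202", "202", "303", "303", "303")

# Ids that occur more than once, each listed a single time.
dup_ids <- unique(ids[duplicated(ids)])

# Number of rows per duplicated id.
dup_counts <- table(ids)[dup_ids]

print(dup_counts)
```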

Dropping participants from the GSB Golub Capital Social Lab

Ruth Appel: 5668883889871868
Szymon Sacher: 6536224449774331
Susan Athey: 6093789924066999
Kristine Koutout: 6340332249327821
Kiet Le: 6454866794609331
Mike Luca: 6410378582382260

chatfuel_user_id_to_drop <- c("5668883889871868", "6536224449774331", "6093789924066999", "6340332249327821", "6454866794609331", "6410378582382260")
data <- data %>% filter(!(`chatfuel user id` %in% chatfuel_user_id_to_drop))

Generating `analytic_id`

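# Order by start time, with sign-up time and user id as tiebreakers
ordered_data <- data[order(data$MisinfoChat_start_time, data$"signed up", data$"chatfuel user id"), ]
# Row number in the sorted order; set explicitly so it does not rely on row names
ordered_data$rn <- as.character(seq_len(nrow(ordered_data)))
# Splice rn, sign-up time and start time into a single id string
ordered_data$analytic_id <- paste(ordered_data$rn, ordered_data$"signed up", ordered_data$MisinfoChat_start_time, sep = "_")
# Remove the original identifier
ordered_data$"chatfuel user id" <- NULL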

count <- as.character(length(unique(ordered_data$analytic_id)))

The anonymized raw dataset contains 160283 unique analytic_id values and 160283 rows, so every row has a distinct analytic_id.
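
The same comparison can be turned into a hard check, in the spirit of the earlier uniqueness check on chatfuel user id. The helper below is a hypothetical sketch, not part of the script; `check_anonymized` and its argument names are assumptions.

```r
# Hypothetical helper: stops with an error unless the data frame looks anonymized.
check_anonymized <- function(df, id_col = "analytic_id", raw_col = "chatfuel user id") {
  stopifnot(
    id_col %in% names(df),              # the new id exists
    !(raw_col %in% names(df)),          # the original id is gone
    anyDuplicated(df[[id_col]]) == 0    # every row has a distinct id
  )
  invisible(TRUE)
}

# Usage on a made-up two-row data frame.
ok <- check_anonymized(data.frame(analytic_id = c("1_a_b", "2_a_b")))
```

In the script this could be called as check_anonymized(ordered_data) before exporting.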

Exporting the anonymized dataset

The anonymized dataset is named fbmisinfo_anon.csv.gz and is written to ./data/chatfuel/raw/, the same directory as the raw dataset. For cleaning, a copy is later placed in ~fb_misinfo_interventions/data/.

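# Add a sequential id in the sorted order
df_chat_anon <- ordered_data %>%
  mutate(anon_id = seq_len(n()))

# fwrite() returns NULL invisibly, so write in a separate step
fwrite(df_chat_anon, "./data/chatfuel/raw/fbmisinfo_anon.csv.gz")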
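
After exporting, a quick round trip confirms that the compressed file reads back unchanged. The sketch below uses base R and a temporary file as a stand-in for the real export, so it runs without data.table; the data frame and its values are made up.

```r
# Made-up anonymized rows; the real file has many more columns.
df <- data.frame(
  `signed up` = c("2024-01-01", "2024-01-02"),
  analytic_id = c("1_2024-01-01_2024-01-03 09:00", "2_2024-01-02_2024-01-05 10:00"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Write a gzip-compressed CSV to a temporary path.
path <- tempfile(fileext = ".csv.gz")
con <- gzfile(path, "w")
write.csv(df, con, row.names = FALSE)
close(con)

# read.csv() decompresses gzip files transparently.
back <- read.csv(path, check.names = FALSE, stringsAsFactors = FALSE)

# The ids and column names should survive the round trip.
stopifnot(identical(back$analytic_id, df$analytic_id))
stopifnot(identical(names(back), names(df)))
```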

FB-Misinformation Anonymization

Kiet Le

2024-03-12
