The purpose of this script is to 1) anonymize the raw Chatfuel
dataset and 2) document the data handling process. The raw Chatfuel
dataset is manually downloaded from Chatfuel and uses
chatfuel user id as the main identifier of
participants. The data does not contain personal private information about
respondents, such as name or profile picture URL. This raw Chatfuel
dataset is named Project_Don_t_Get_Duped_....csv and is
uploaded to the Sherlock server at the path
~ssh://sherlock/oak/stanford/groups/athey/fb_misinfo_interventions/data/chatfuel/raw.
The anonymization process takes place after the raw dataset is uploaded to the Sherlock server. It includes the following steps:

1. Read Project_Don_t_Get_Duped_....csv, drop participants from the GSB Golub Capital Social Lab, and generate a unique analytic_id for each respondent:
   - Order the data by MisinfoChat_start_time, using signed up and chatfuel user id as tiebreakers (in that order).
   - Create rn, which is simply the row number for each entry.
   - Create analytic_id by splicing together rn, signed up, and MisinfoChat_start_time into one string.
2. Drop the chatfuel user id variable.
3. Save fb_misinfo_anon.csv to the same directory as the raw dataset, ~ssh://sherlock/oak/stanford/groups/athey/fb_misinfo_interventions/data/chatfuel/raw.

For cleaning purposes, the data is then manually downloaded from the
Sherlock server to the local machine and uploaded to the GitHub
directory ~fb_misinfo_interventions/data/.
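The anonymization steps listed above can be sketched end to end on toy data. This is a minimal sketch, not the project's script: the data values, the is_lab_member flag (the source does not say how lab participants are identified), the anonymize helper, and the temporary output path are all assumptions made for illustration. Here rn is taken as the position of each row in the ordered data.

```r
# Toy stand-in for the raw Chatfuel export; all values are made up.
raw <- data.frame(
  `chatfuel user id` = c("u3", "u1", "u2", "u4"),
  `signed up` = c("2021-01-02", "2021-01-01", "2021-01-01", "2021-01-03"),
  MisinfoChat_start_time = c("2021-02-01 10:00", "2021-02-01 09:00",
                             "2021-02-01 09:00", "2021-02-02 08:00"),
  is_lab_member = c(FALSE, FALSE, FALSE, TRUE),  # hypothetical flag column
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical helper that mirrors the listed steps.
anonymize <- function(df) {
  # Step 1: drop lab participants (identified here by the hypothetical flag).
  df <- df[!df$is_lab_member, , drop = FALSE]
  df$is_lab_member <- NULL
  # Order by start time, then by the two tiebreakers in the stated order.
  ord <- order(df$MisinfoChat_start_time, df$`signed up`, df$`chatfuel user id`)
  df <- df[ord, , drop = FALSE]
  # Row number within the ordered data (an assumption about what rn means).
  df$rn <- seq_len(nrow(df))
  # Splice rn, signed up, and MisinfoChat_start_time into one string.
  df$analytic_id <- paste(df$rn, df$`signed up`, df$MisinfoChat_start_time, sep = "_")
  # Step 2: drop the raw identifier.
  df$`chatfuel user id` <- NULL
  rownames(df) <- NULL
  df
}

anon <- anonymize(raw)

# Step 3: save the anonymized file (a temporary directory stands in for the real path).
out_path <- file.path(tempdir(), "fb_misinfo_anon.csv")
write.csv(anon, out_path, row.names = FALSE)
```

The two rows that share a start time and sign-up time are separated by chatfuel user id during ordering, so they receive different rn values and therefore different analytic_id strings.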
Lab and project members are removed from the dataset at this stage.
Check chatfuel user id to see if there are
any duplicates by comparing the number of unique
chatfuel user id values with the number of rows in the
dataset.

    # Number of unique ids, stored as a string
    a <- as.character(length(unique(data$`chatfuel user id`)))
    if (nrow(data) != length(unique(data$`chatfuel user id`))) {
      stop("The number of rows does not match the number of unique `chatfuel user id` values.")
    } else {
      # Continue
      print("Unique observations check passed.")
    }

    ## [1] "Unique observations check passed."
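If the uniqueness check were to fail, it helps to see which identifiers repeat before deciding how to handle them. The following is a minimal sketch on made-up data, using base R's duplicated() and anyDuplicated(); the toy values and the names is_dup, dup_rows, and dup_ids are assumptions for illustration only.

```r
# Toy data with one repeated id; all values are made up.
data <- data.frame(
  `chatfuel user id` = c("u1", "u2", "u2", "u3"),
  `signed up` = c("2021-01-01", "2021-01-02", "2021-01-02", "2021-01-03"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

ids <- data$`chatfuel user id`

# duplicated() flags the second and later occurrences of a value;
# adding fromLast = TRUE also flags the earlier occurrences.
is_dup <- duplicated(ids) | duplicated(ids, fromLast = TRUE)

dup_rows <- data[is_dup, , drop = FALSE]  # every row involved in a duplicate
dup_ids <- unique(ids[is_dup])            # the repeated identifiers themselves

print(dup_rows)
```

anyDuplicated(ids) returns 0 when all values are unique, so it can serve as a compact alternative to comparing nrow(data) with the number of unique values.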
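Generate analytic_id:

    # Order by start time, with signed up and chatfuel user id as tiebreakers
    ordered_data <- data[order(data$MisinfoChat_start_time, data$`signed up`, data$`chatfuel user id`), ]
    # Note: for a base data.frame, row.names() returns the row labels carried over from data
    ordered_data$rn <- row.names(ordered_data)
    # Splice rn, signed up, and MisinfoChat_start_time into one string
    ordered_data$analytic_id <- paste(ordered_data$rn, ordered_data$`signed up`, ordered_data$MisinfoChat_start_time, sep = "_")
    # Drop the raw identifier
    ordered_data$`chatfuel user id` <- NULL
    count <- as.character(length(unique(ordered_data$analytic_id)))

The resulting dataset has 160283 rows; count holds the number of unique analytic_id values.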
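A few checks on the anonymized output can confirm that the identifier was built as intended. The sketch below is an assumption-laden illustration, not part of the original script: check_anonymized is a hypothetical helper, and the toy data frame stands in for ordered_data.

```r
# Toy stand-in for the anonymized data; all values are made up.
ordered_data <- data.frame(
  `signed up` = c("2021-01-01", "2021-01-01", "2021-01-02"),
  MisinfoChat_start_time = c("2021-02-01 09:00", "2021-02-01 09:00",
                             "2021-02-01 10:00"),
  rn = c("2", "3", "1"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
ordered_data$analytic_id <- paste(ordered_data$rn, ordered_data$`signed up`,
                                  ordered_data$MisinfoChat_start_time, sep = "_")

# Hypothetical helper: returns a character vector of problems (empty if none).
check_anonymized <- function(df, id_col = "analytic_id",
                             raw_id_col = "chatfuel user id") {
  problems <- character(0)
  if (raw_id_col %in% names(df)) {
    problems <- c(problems, "raw identifier column is still present")
  }
  if (!(id_col %in% names(df))) {
    problems <- c(problems, "anonymized identifier column is missing")
  } else {
    if (anyNA(df[[id_col]])) {
      problems <- c(problems, "anonymized identifier has missing values")
    }
    if (anyDuplicated(df[[id_col]]) > 0) {
      problems <- c(problems, "anonymized identifier is not unique")
    }
  }
  problems
}

problems <- check_anonymized(ordered_data)
if (length(problems) > 0) {
  stop(paste(problems, collapse = "; "))
}
```

Returning a vector of problems rather than stopping at the first one makes it possible to report every failed check in a single message.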