First, download the AllData.csv master file from Jun’s OneDrive and store it in your project folder. Once the master file is available locally, you can use this code to load it into a data frame called AllData, create a VariableList data frame containing all available variable names, create a KeptData data frame containing only some variables of interest, and complete some formatting and sorting tasks.
See Adding fresh data from Brandwatch for code that will let us add fresh Brandwatch data to the master file.
if(!require(tidyverse)) install.packages("tidyverse")
library(tidyverse)
AllData <- read.csv("AllData.csv")
# Getting all variable names and putting them
# in a data frame for easier review
VariableList <- as.data.frame(colnames(AllData))
view(VariableList)
#Selecting columns needed for the analysis
KeptData <- select(AllData,
                   Date,
                   Author,
                   full_name,
                   party,
                   type,
                   state,
                   Full.Text,
                   Url,
                   Thread.Id,
                   Twitter.Reply.Count,
                   Twitter.Retweets,
                   Twitter.Likes)
KeptData$Thread.Id <- as.character(KeptData$Thread.Id)
#Formatting "Date" as POSIXct object
KeptData$Date <- as.POSIXct(KeptData$Date, tz = "America/Chicago")
#Sorting by Date
KeptData <- arrange(KeptData,Date)
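#Re-expressing "Date" as "WeekOf," the Monday of the week containing "Date."
#week_start = 1 makes weeks begin on Monday; lubridate's default is Sunday.
KeptData <- KeptData %>%
  mutate(WeekOf = lubridate::floor_date(Date,
                                        unit = "week",
                                        week_start = 1))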
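The week_start argument of lubridate's floor_date() controls which day a week begins on; the package default of 7 means Sunday. The following is a minimal sketch, using made-up timestamps rather than values from the master file, that shows how the two settings differ for a Wednesday and a Sunday.

```r
library(lubridate)

# Hypothetical timestamps for illustration only:
# 2024-01-03 is a Wednesday and 2024-01-07 is a Sunday
ExampleDates <- as.POSIXct(c("2024-01-03 15:30:00", "2024-01-07 09:00:00"),
                           tz = "America/Chicago")

# week_start = 7 (lubridate's default): weeks begin on Sunday
SundayStart <- floor_date(ExampleDates, unit = "week", week_start = 7)

# week_start = 1: weeks begin on Monday
MondayStart <- floor_date(ExampleDates, unit = "week", week_start = 1)

# Compare the two results side by side
data.frame(Date        = format(ExampleDates, "%Y-%m-%d"),
           SundayStart = format(SundayStart, "%Y-%m-%d"),
           MondayStart = format(MondayStart, "%Y-%m-%d"))
```

With a Monday start, a Sunday post is grouped with the six days before it instead of opening a new week.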
Use this code to read fresh Brandwatch data into an AddData data frame, convert the fresh data’s Date variable to Central time in yyyy-mm-dd hh:mm:ss format, bind the fresh data to the AllData data frame, and list any columns that appear in one data frame but not in the other.
We should communicate with one another before running this code. Otherwise, the master file could end up with missing or duplicate data.
Note: Brandwatch has replaced “Twitter” with “X” on some fields. Future data imports will need code that aligns the new field names with the old ones.
# Code is untested. Monitor for errors on first use.
# Required packages
if(!require(tidyverse)) install.packages("tidyverse")
library(tidyverse)
# Read existing data
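AllData <- read.csv("AllData.csv")
# Read & format data to be added from Brandwatch
BWAddData <- read.csv("freshdatafilename.csv", skip = 6)
# Read Date as Central time, then store it as yyyy-mm-dd hh:mm:ss text so it
# matches the character Date column that read.csv() produces for AllData
BWAddData$Date <- format(as.POSIXct(BWAddData$Date, tz = "America/Chicago"),
                         "%Y-%m-%d %H:%M:%S")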
# Add author information to new data
Addresses <- read.csv("TwitterAddresses118thCongress.csv")
AddData <- merge(BWAddData, Addresses,
                 by = "Author",
                 all.x = TRUE)
# Display any unmatched columns in each data frame
setdiff(colnames(AllData), colnames(AddData))
setdiff(colnames(AddData), colnames(AllData))
# Add new data to existing data
AllData <- bind_rows(AllData,AddData)
# Deduping the new AllData master file
AllData <- AllData %>%
  distinct(Url, .keep_all = TRUE)
# Saving the new AllData master file
# (row.names = FALSE keeps write.csv from adding a row-number column)
# write.csv(AllData, "AllData.csv", row.names = FALSE)
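For the Brandwatch fields renamed from “Twitter” to “X,” one possible approach is to rename the fresh data's columns before binding them to the master file. The sketch below is only an illustration: the X.-prefixed column names are assumptions, not confirmed Brandwatch field names, so check the header of an actual export before adapting it.

```r
# Hypothetical fresh data; the "X." column names are assumed for illustration
FreshData <- data.frame(Author = "example_author",
                        X.Reply.Count = 2,
                        X.Retweets = 5,
                        X.Likes = 9)

# Replace a leading "X." with "Twitter." so the names match the master file.
# The pattern requires the dot, so a column named just "X" is left alone.
names(FreshData) <- sub("^X\\.", "Twitter.", names(FreshData))

names(FreshData)
```

Running the two setdiff() checks after a rename like this should confirm whether any column names still differ between the two data frames.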