Loading, filtering and cleaning the data

First, download the AllData.csv master file from Jun’s OneDrive and store it in your project folder. Once the file is available locally, use this code to load it into a data frame called AllData, create a VariableList data frame listing all available variable names, create a KeptData data frame containing only the variables of interest, format the Thread.Id and Date columns, sort by date, and add a WeekOf column.

See Adding fresh data from Brandwatch for code that will let us add fresh Brandwatch data to the master file.

if(!require(tidyverse)) install.packages("tidyverse")
library(tidyverse)

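# Reading Thread.Id as text so long ID numbers are not rounded on import
AllData <- read.csv("AllData.csv",
                    colClasses = c(Thread.Id = "character"))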

# Getting all variable names and putting them
# in a data frame for easier review

VariableList <- as.data.frame(colnames(AllData))
view(VariableList)

# Selecting columns needed for the analysis

KeptData <- select(AllData,
                   Date,
                   Author,
                   full_name,
                   party,
                   type,
                   state,
                   Full.Text,
                   Url,
                   Thread.Id,
                   Twitter.Reply.Count,
                   Twitter.Retweets,
                   Twitter.Likes)
KeptData$Thread.Id <- as.character(KeptData$Thread.Id)

# Formatting "Date" as POSIXct object

KeptData$Date <- as.POSIXct(KeptData$Date, tz = "America/Chicago")

# Sorting by Date

KeptData <- arrange(KeptData, Date)

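# Re-expressing "Date" as "WeekOf," the Monday of the week containing "Date."
# week_start = 1 is needed because floor_date() starts weeks on Sunday by default.

KeptData <- KeptData %>% 
  mutate(WeekOf = lubridate::floor_date(Date,
                                        unit = "week",
                                        week_start = 1))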
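
To see why the week start matters, here is a small check on a single made-up timestamp (SampleDate is an illustration, not a value from the master file). It assumes the lubridate package is installed, which it is whenever tidyverse is installed.

```r
library(lubridate)

# A made-up timestamp: January 10, 2024 was a Wednesday
SampleDate <- as.POSIXct("2024-01-10 15:30:00", tz = "America/Chicago")

# Weeks starting on Sunday (lubridate's default behavior, spelled out here)
SundayStart <- floor_date(SampleDate, unit = "week", week_start = 7)

# Weeks starting on Monday, which is what WeekOf is meant to hold
MondayStart <- floor_date(SampleDate, unit = "week", week_start = 1)

format(SundayStart, "%Y-%m-%d")  # "2024-01-07"
format(MondayStart, "%Y-%m-%d")  # "2024-01-08"
```

If WeekOf values in an existing chart or table fall on Sundays, they were most likely computed without week_start = 1.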

Adding fresh data from Brandwatch

Use this code to read fresh Brandwatch data, convert the fresh data’s Date variable to Central time in yyyy-mm-dd hh:mm:ss format, merge in author information to create an AddData data frame, list any columns that appear in one data frame but not in the other, bind the fresh data to the AllData data frame, and remove duplicate rows by Url.

We should communicate with one another before running this code. Otherwise, the master file could end up with missing or duplicate data.

Note: Brandwatch has replaced “Twitter” with “X” on some fields. Future data imports will need code that aligns the new field names with the old ones.
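
One possible way to handle the renamed fields is a lookup table that maps each new name back to the name the master file already uses. This is only a sketch: the "X" field names below are hypothetical guesses, not confirmed Brandwatch names, and the AddData data frame is a toy stand-in. Replace the left-hand names with whatever the setdiff() calls in the code below actually report.

```r
# Hypothetical mapping: new Brandwatch name = existing master-file name.
# The left-hand names are assumptions; check them against a real export.
FieldMap <- c(X.Reply.Count = "Twitter.Reply.Count",
              X.Reposts     = "Twitter.Retweets",
              X.Likes       = "Twitter.Likes")

# Toy stand-in for a fresh export that uses the new names
AddData <- data.frame(Author        = "example",
                      X.Reply.Count = 1,
                      X.Reposts     = 2,
                      X.Likes       = 3)

# Rename only the columns that appear in the mapping; leave the rest alone
Renamed <- names(AddData) %in% names(FieldMap)
names(AddData)[Renamed] <- unname(FieldMap[names(AddData)[Renamed]])

names(AddData)
```

Running the rename before bind_rows() would keep the old and new engagement counts in the same columns instead of splitting them across two sets.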

# Code is untested. Monitor for errors on first use.
# Required packages

if(!require(tidyverse)) install.packages("tidyverse")
library(tidyverse)

# Read existing data

AllData <- read.csv("AllData.csv")

# Read & format data to be added from Brandwatch

BWAddData <- read.csv("freshdatafilename.csv", skip = 6)
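# Date is stored as yyyy-mm-dd hh:mm:ss text so it matches the Date column
# that read.csv() produces for AllData; bind_rows() cannot combine text and POSIXct
BWAddData$Date <- format(as.POSIXct(BWAddData$Date, tz = "America/Chicago"),
                         "%Y-%m-%d %H:%M:%S")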

# Add author information to new data

Addresses <- read.csv("TwitterAddresses118thCongress.csv")
AddData <- merge(BWAddData, Addresses,
                 by = "Author",
                 all.x = TRUE)

# Display any unmatched columns in each data frame

setdiff(colnames(AllData), colnames(AddData))
setdiff(colnames(AddData), colnames(AllData))

# Add new data to existing data

AllData <- bind_rows(AllData, AddData)

# Deduping the new AllData master file

AllData <- AllData %>% 
  distinct(Url, .keep_all = TRUE)

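# Saving the new AllData master file
# row.names = FALSE keeps read.csv() from picking up an extra "X" column next time

# write.csv(AllData, "AllData.csv", row.names = FALSE)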
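
Before saving, it may help to count what the merge and the dedupe step are doing. The sketch below uses a toy AllData data frame with made-up values. It assumes that full_name comes from the address file, so a missing full_name means the Author had no match there.

```r
# Toy stand-in for the combined master file; all values are made up
AllData <- data.frame(
  Url       = c("u1", "u2", "u2", "u3"),
  Author    = c("a", "b", "b", "c"),
  full_name = c("A Person", "B Person", "B Person", NA),
  stringsAsFactors = FALSE
)

# Rows whose Url already appeared earlier; distinct(Url, .keep_all = TRUE) drops these
DuplicateRows <- sum(duplicated(AllData$Url))

# Authors with no full_name, meaning no match in the address file (assumption)
UnmatchedAuthors <- unique(AllData$Author[is.na(AllData$full_name)])

DuplicateRows     # 1
UnmatchedAuthors  # "c"
```

A large DuplicateRows count suggests the same export was added twice, and a long UnmatchedAuthors list suggests the address file needs updating.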