Investigating Political Elections Using Social Media

Social media is a powerful platform that lets individuals express their opinions, thoughts, and concerns. In doing so, the general population makes data and sentiments about real-world events available. Platforms like Twitter are powerful because we can search a topic and pull the sentiments of hundreds of thousands of people to get an overall idea of what may happen, for example in an election. This gives us a sense of what the population is going to decide (to the extent that people choose to express themselves through social media), especially among the younger population, who will soon make up most of our voters.

Data

The data set used in this project comes from the following GitHub repository: https://github.com/fivethirtyeight/data/blob/master/governors-forecast-2018/README.md. It lists the candidates running for governor in each state, their party, and their predicted probability of winning the election. I then ordered the candidates by win probability, largest first, and removed duplicate candidates, since each candidate’s win probability was reforecast throughout the month.
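The code for the ordering and de-duplication step is not shown in this write-up. A minimal sketch of one way to do it in base R, assuming made-up forecast rows that reuse the `forecastdate`, `candidate`, and `win_probability` column names from the real file:

```r
# Toy forecast rows (made-up values; only the column names come from the real file)
forecasts <- data.frame(
  forecastdate    = c("10/11/2018", "10/18/2018", "10/11/2018", "10/18/2018"),
  candidate       = c("Candidate A", "Candidate A", "Candidate B", "Candidate B"),
  win_probability = c(0.60, 0.72, 0.40, 0.28),
  stringsAsFactors = FALSE
)

# Sort by win probability, largest first
ordered <- forecasts[order(-forecasts$win_probability), ]

# duplicated() flags every repeat after the first occurrence, so dropping
# the flagged rows keeps each candidate's highest-probability row
best <- ordered[!duplicated(ordered$candidate), ]
rownames(best) <- NULL
print(best)
```

Sorting before de-duplicating matters here: `duplicated()` keeps whichever row it sees first, so the sort order decides which forecast survives for each candidate.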

Loading the data into R

predictions <- as.data.frame(read.csv('https://raw.githubusercontent.com/hvasquez81/DATA607-Final-Project/master/governor_state_forecast.csv', stringsAsFactors = FALSE))
head(predictions)
##   forecastdate state district special     candidate party incumbent
## 1   10/11/2018    AK       NA      NA Mike Dunleavy     R     FALSE
## 2   10/11/2018    AK       NA      NA   Mark Begich     D     FALSE
## 3   10/11/2018    AK       NA      NA   Bill Walker     U      TRUE
## 4   10/11/2018    AK       NA      NA        Others           FALSE
## 5   10/11/2018    AL       NA      NA      Kay Ivey     R      TRUE
## 6   10/11/2018    AL       NA      NA   Walt Maddox     D     FALSE
##     model win_probability voteshare p10_voteshare p90_voteshare
## 1 classic          0.7413     39.55         32.10         46.93
## 2 classic          0.2211     31.28         22.43         40.15
## 3 classic          0.0376     25.74         19.22         32.54
## 4 classic          0.0000      3.42          1.12          6.26
## 5 classic          0.9866     58.59         53.95         63.21
## 6 classic          0.0134     41.41         36.79         46.05

This leaves a data set of 82 unique candidates, each with their highest win probability for the month.

Clean the dataset

#Load the dplyr package to manipulate and transform data
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
#create a subset called candidates that excludes 'Others' and rows with win probability = 0
candidates <- subset(predictions, (win_probability !=0 &  candidate != 'Others'))
candidates <- select(candidates, state,candidate,party,incumbent,win_probability)

head(candidates)
##   state      candidate party incumbent win_probability
## 1    AK  Mike Dunleavy     R     FALSE          0.7413
## 2    AK    Mark Begich     D     FALSE          0.2211
## 3    AK    Bill Walker     U      TRUE          0.0376
## 5    AL       Kay Ivey     R      TRUE          0.9866
## 6    AL    Walt Maddox     D     FALSE          0.0134
## 7    AR Asa Hutchinson     R      TRUE          0.9990
#this leaves us with 74 candidates.

Our next step is to connect to Twitter and mine tweets relating to each candidate.

## Loading required package: NLP
## 
## Attaching package: 'twitteR'
## The following objects are masked from 'package:dplyr':
## 
##     id, location

Connect to Twitter
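
The `setup_twitter_oauth()` call in this section takes four credentials, and the write-up does not show how they are defined. A minimal sketch of one option is to read them from environment variables so that they never appear in the script. The environment variable names and both helper functions below are hypothetical; only the four R variable names come from the call itself.

```r
# Read one credential from an environment variable and fail loudly if it is missing
read_credential <- function(var) {
  value <- Sys.getenv(var, unset = NA)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable ", var, " is not set")
  }
  value
}

# Collect the four values under the names used by the setup_twitter_oauth() call.
# The TWITTER_* names are hypothetical; use whatever names you exported.
load_twitter_credentials <- function() {
  list(
    api_key             = read_credential("TWITTER_API_KEY"),
    api_secretkey       = read_credential("TWITTER_API_SECRET"),
    access_token        = read_credential("TWITTER_ACCESS_TOKEN"),
    access_secret_token = read_credential("TWITTER_ACCESS_SECRET")
  )
}

# Usage, once the variables are set (for example in ~/.Renviron):
# creds <- load_twitter_credentials()
# api_key <- creds$api_key
```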

setup_twitter_oauth(api_key,api_secretkey,access_token,access_secret_token)
## [1] "Using direct authentication"
total_candidates <- nrow(candidates)

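#Start with the first candidate; its tweets define the columns of the combined data frame

can <- candidates[1,2]
#str_replace_all removes every space, so names with more than two words also collapse to a single handle
search_list = c(can,paste0("@",tolower(str_replace_all(can," ", ""))),paste0("@",str_replace_all(can," ", "")),paste0("#",str_replace_all(can," ", "")),paste0("#",tolower(str_replace_all(can," ", ""))))


#request up to 5000 English tweets for this candidate (mined before 11/6/2018)
can_tweets <- searchTwitteR(search_list, lang = 'en', n = 5000)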
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 566
#Turn the pulled list of tweets into a df
tweets_df <- twListToDF(can_tweets)

#add candidate column (the single name is recycled to every row)
tweets_df$Candidate <- can
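
The warning `the condition has length > 1 and only the first element will be used` in the output suggests that the search function checks a single search string, while the code passes it a vector of five terms. A minimal sketch of one alternative, assuming the standard Twitter search operators (`OR` between terms, double quotes for an exact phrase), joins the variants into one string. `build_query` is a hypothetical helper, not part of twitteR.

```r
# Build one search string for a candidate: the quoted name plus @ and #
# variants with every space removed, joined with OR
build_query <- function(name) {
  handle <- gsub(" ", "", name, fixed = TRUE)  # removes all spaces in the name
  terms <- unique(c(
    paste0('"', name, '"'),
    paste0("@", tolower(handle)), paste0("@", handle),
    paste0("#", handle), paste0("#", tolower(handle))
  ))
  paste(terms, collapse = " OR ")
}

query <- build_query("Michelle Lujan Grisham")
print(query)

# The single string could then be passed in place of the vector, e.g.
# can_tweets <- searchTwitteR(query, lang = 'en', n = 5000)
```

Whether this returns more or fewer tweets than the vector form would have to be checked against the API; the sketch only shows how to build a query of length one.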



#create a loop to iterate through each candidate
for (i in 2:total_candidates) {
  #now rerun the process for the rest of the candidates using the schema provided by the first can
  
  #choose the next candidate in the list
  can <- candidates[i,2]
  
  print(paste0("Now mining: ",can))
  
  #create the search list (str_replace_all removes every space in the name)
  search_list = c(can,paste0("@",tolower(str_replace_all(can," ", ""))),paste0("@",str_replace_all(can," ", "")),paste0("#",str_replace_all(can," ", "")),paste0("#",tolower(str_replace_all(can," ", ""))))
    
  #request up to 5000 English tweets (mined before 11/6/2018), if applicable
  can_tweets <- searchTwitter(search_list, lang = 'en',n = 5000)

  #make a temp df of the current candidate's tweets
  temp_tweet_df <- twListToDF(can_tweets)
  
  #add candidate column (the single name is recycled to every row)
  temp_tweet_df$Candidate <- can

  
  #rbind it to the total tweets df
  tweets_df <- rbind(tweets_df,temp_tweet_df)
  
  print(paste0(can," has been Twitter mined"))
}
## [1] "Now mining: Mark Begich"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 18
## [1] "Mark Begich has been Twitter mined"
## [1] "Now mining: Bill Walker"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## [1] "Bill Walker has been Twitter mined"
## [1] "Now mining: Kay Ivey"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 196
## [1] "Kay Ivey has been Twitter mined"
## [1] "Now mining: Walt Maddox"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 63
## [1] "Walt Maddox has been Twitter mined"
## [1] "Now mining: Asa Hutchinson"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 61
## [1] "Asa Hutchinson has been Twitter mined"
## [1] "Now mining: Jared Henderson"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 23
## [1] "Jared Henderson has been Twitter mined"
## [1] "Now mining: Doug Ducey"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 1500
## [1] "Doug Ducey has been Twitter mined"
## [1] "Now mining: David Garcia"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 650
## [1] "David Garcia has been Twitter mined"
## [1] "Now mining: Gavin Newsom"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## [1] "Gavin Newsom has been Twitter mined"
## [1] "Now mining: John Cox"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## [1] "Rate limited .... blocking for a minute and retrying up to 119 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 118 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 117 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 116 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 115 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 114 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 113 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 112 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 111 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 110 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 109 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 108 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 107 times ..."
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 2056
## [1] "John Cox has been Twitter mined"
## [1] "Now mining: Jared Polis"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 600
## [1] "Jared Polis has been Twitter mined"
## [1] "Now mining: Walker Stapleton"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 104
## [1] "Walker Stapleton has been Twitter mined"
## [1] "Now mining: Ned Lamont"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 200
## [1] "Ned Lamont has been Twitter mined"
## [1] "Now mining: Bob Stefanowski"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 49
## [1] "Bob Stefanowski has been Twitter mined"
## [1] "Now mining: Andrew Gillum"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## [1] "Andrew Gillum has been Twitter mined"
## [1] "Now mining: Ron DeSantis"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 4899
## [1] "Ron DeSantis has been Twitter mined"
## [1] "Now mining: Brian Kemp"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## [1] "Brian Kemp has been Twitter mined"
## [1] "Now mining: Stacey Abrams"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## [1] "Rate limited .... blocking for a minute and retrying up to 119 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 118 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 117 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 116 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 115 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 114 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 113 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 112 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 111 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 110 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 109 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 108 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 107 times ..."
## [1] "Stacey Abrams has been Twitter mined"
## [1] "Now mining: David Ige"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 169
## [1] "David Ige has been Twitter mined"
## [1] "Now mining: Andria Tupola"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 11
## [1] "Andria Tupola has been Twitter mined"
## [1] "Now mining: Fred Hubbell"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 18
## [1] "Fred Hubbell has been Twitter mined"
## [1] "Now mining: Kim Reynolds"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 400
## [1] "Kim Reynolds has been Twitter mined"
## [1] "Now mining: Brad Little"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 1461
## [1] "Brad Little has been Twitter mined"
## [1] "Now mining: Paulette Jordan"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 12
## [1] "Paulette Jordan has been Twitter mined"
## [1] "Now mining: JB Pritzker"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 300
## [1] "JB Pritzker has been Twitter mined"
## [1] "Now mining: Bruce Rauner"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 761
## [1] "Bruce Rauner has been Twitter mined"
## [1] "Now mining: Kris Kobach"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## [1] "Kris Kobach has been Twitter mined"
## [1] "Now mining: Laura Kelly"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 1990
## [1] "Laura Kelly has been Twitter mined"
## [1] "Now mining: Greg Orman"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 4
## [1] "Greg Orman has been Twitter mined"
## [1] "Now mining: Charlie Baker"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## [1] "Rate limited .... blocking for a minute and retrying up to 119 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 118 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 117 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 116 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 115 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 114 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 113 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 112 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 111 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 110 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 109 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 108 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 107 times ..."
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 2524
## [1] "Charlie Baker has been Twitter mined"
## [1] "Now mining: Jay Gonzalez"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 100
## [1] "Jay Gonzalez has been Twitter mined"
## [1] "Now mining: Larry Hogan"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 700
## [1] "Larry Hogan has been Twitter mined"
## [1] "Now mining: Ben Jealous"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 600
## [1] "Ben Jealous has been Twitter mined"
## [1] "Now mining: Janet Mills"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 298
## [1] "Janet Mills has been Twitter mined"
## [1] "Now mining: Shawn Moody"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 18
## [1] "Shawn Moody has been Twitter mined"
## [1] "Now mining: Gretchen Whitmer"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## [1] "Gretchen Whitmer has been Twitter mined"
## [1] "Now mining: Bill Schuette"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 160
## [1] "Bill Schuette has been Twitter mined"
## [1] "Now mining: Tim Walz"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 2000
## [1] "Tim Walz has been Twitter mined"
## [1] "Now mining: Jeff Johnson"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 1900
## [1] "Jeff Johnson has been Twitter mined"
## [1] "Now mining: Pete Ricketts"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 168
## [1] "Pete Ricketts has been Twitter mined"
## [1] "Now mining: Bob Krist"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 9
## [1] "Bob Krist has been Twitter mined"
## [1] "Now mining: Chris Sununu"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 54
## [1] "Chris Sununu has been Twitter mined"
## [1] "Now mining: Molly Kelly"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 1600
## [1] "Molly Kelly has been Twitter mined"
## [1] "Now mining: Michelle Lujan Grisham"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 70
## [1] "Michelle Lujan Grisham has been Twitter mined"
## [1] "Now mining: Steve Pearce"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 400
## [1] "Steve Pearce has been Twitter mined"
## [1] "Now mining: Adam Laxalt"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 42
## [1] "Adam Laxalt has been Twitter mined"
## [1] "Now mining: Steve Sisolak"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 141
## [1] "Steve Sisolak has been Twitter mined"
## [1] "Now mining: Andrew Cuomo"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## [1] "Rate limited .... blocking for a minute and retrying up to 119 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 118 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 117 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 116 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 115 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 114 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 113 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 112 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 111 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 110 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 109 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 108 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 107 times ..."
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 2892
## [1] "Andrew Cuomo has been Twitter mined"
## [1] "Now mining: Marc Molinaro"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 35
## [1] "Marc Molinaro has been Twitter mined"
## [1] "Now mining: Mike DeWine"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 278
## [1] "Mike DeWine has been Twitter mined"
## [1] "Now mining: Richard Cordray"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 64
## [1] "Richard Cordray has been Twitter mined"
## [1] "Now mining: Kevin Stitt"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 246
## [1] "Kevin Stitt has been Twitter mined"
## [1] "Now mining: Drew Edmondson"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 5
## [1] "Drew Edmondson has been Twitter mined"
## [1] "Now mining: Kate Brown"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 1461
## [1] "Kate Brown has been Twitter mined"
## [1] "Now mining: Knute Buehler"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 25
## [1] "Knute Buehler has been Twitter mined"
## [1] "Now mining: Tom Wolf"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 1893
## [1] "Tom Wolf has been Twitter mined"
## [1] "Now mining: Scott Wagner"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 157
## [1] "Scott Wagner has been Twitter mined"
## [1] "Now mining: Gina Raimondo"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 204
## [1] "Gina Raimondo has been Twitter mined"
## [1] "Now mining: Allan Fung"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 8
## [1] "Allan Fung has been Twitter mined"
## [1] "Now mining: Henry McMaster"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 121
## [1] "Henry McMaster has been Twitter mined"
## [1] "Now mining: James Smith"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 3999
## [1] "James Smith has been Twitter mined"
## [1] "Now mining: Kristi Noem"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 170
## [1] "Kristi Noem has been Twitter mined"
## [1] "Now mining: Billie Sutton"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 21
## [1] "Billie Sutton has been Twitter mined"
## [1] "Now mining: Bill Lee"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## [1] "Bill Lee has been Twitter mined"
## [1] "Now mining: Karl Dean"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## [1] "Rate limited .... blocking for a minute and retrying up to 119 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 118 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 117 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 116 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 115 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 114 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 113 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 112 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 111 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 110 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 109 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 108 times ..."
## [1] "Rate limited .... blocking for a minute and retrying up to 107 times ..."
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 69
## [1] "Karl Dean has been Twitter mined"
## [1] "Now mining: Greg Abbott"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 1199
## [1] "Greg Abbott has been Twitter mined"
## [1] "Now mining: Lupe Valdez"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 21
## [1] "Lupe Valdez has been Twitter mined"
## [1] "Now mining: Phil Scott"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 800
## [1] "Phil Scott has been Twitter mined"
## [1] "Now mining: Christine Hallquist"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 35
## [1] "Christine Hallquist has been Twitter mined"
## [1] "Now mining: Tony Evers"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## [1] "Tony Evers has been Twitter mined"
## [1] "Now mining: Scott Walker"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## [1] "Scott Walker has been Twitter mined"
## [1] "Now mining: Mark Gordon"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 675
## [1] "Mark Gordon has been Twitter mined"
## [1] "Now mining: Mary Throne"
## Warning in if (nchar(searchString) > 1000) {: the condition has length > 1
## and only the first element will be used
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 5000 tweets were requested but the
## API can only return 400
## [1] "Mary Throne has been Twitter mined"

Transform the tweets_df

head(tweets_df)
##                                                                                                                                            text
## 1                                                                                                        @benjiwittig WE WANT MIKE DUNLEAVY JR.
## 2   RT @DermotMCole: Gov. Mike Dunleavy's new budget director, who doesn't know anything about Alaska, will assume far more power over the stat…
## 3 Unit-Asrc Construction, LLC\nARCTIC SLOPE REGIONAL CORPORATION ENDORSES MIKE DUNLEAVY FOR GOVERNOR\n(Utqiagvik, AK) -… https://t.co/NNxdOerCQp
## 4   Wanna hear something odd? The Cavs acquired the draft rights of Albert Miralles for Matthew Dellavedova, traded the… https://t.co/RWVzNUcpfi
## 5   RT @DermotMCole: Gov. Mike Dunleavy's new budget director, who doesn't know anything about Alaska, will assume far more power over the stat…
## 6   RT @ktva: On Thursday, a dozen Native organizations sent a letter to newly elected Gov. Mike Dunleavy, asking that he make funding availabl…
##   favorited favoriteCount   replyToSN             created truncated
## 1     FALSE             1 benjiwittig 2018-12-08 04:49:48     FALSE
## 2     FALSE             0        <NA> 2018-12-08 01:13:10     FALSE
## 3     FALSE             0        <NA> 2018-12-07 23:44:03      TRUE
## 4     FALSE             0        <NA> 2018-12-07 23:25:44      TRUE
## 5     FALSE             0        <NA> 2018-12-07 22:31:29     FALSE
## 6     FALSE             0        <NA> 2018-12-07 22:19:31     FALSE
##            replyToSID                  id replyToUID
## 1 1071264901937668096 1071265564721586176  909464131
## 2                <NA> 1071211048462479360       <NA>
## 3                <NA> 1071188622060118016       <NA>
## 4                <NA> 1071184009177300992       <NA>
## 5                <NA> 1071170359112609792       <NA>
## 6                <NA> 1071167348990590976       <NA>
##                                                                           statusSource
## 1   <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
## 2   <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
## 3                   <a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>
## 4 <a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>
## 5                   <a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>
## 6   <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
##        screenName retweetCount isRetweet retweeted longitude latitude
## 1       BenMaki24            0     FALSE     FALSE      <NA>     <NA>
## 2      Andyinak49           18      TRUE     FALSE      <NA>     <NA>
## 3        sailu071            0     FALSE     FALSE      <NA>     <NA>
## 4    LOUisButIsnt            0     FALSE     FALSE      <NA>     <NA>
## 5 alaskatravelgrm           18      TRUE     FALSE      <NA>     <NA>
## 6     ShanahStone            8      TRUE     FALSE      <NA>     <NA>
##       Candidate
## 1 Mike Dunleavy
## 2 Mike Dunleavy
## 3 Mike Dunleavy
## 4 Mike Dunleavy
## 5 Mike Dunleavy
## 6 Mike Dunleavy
tweets_mined = nrow(tweets_df)
tweets <- select(tweets_df, text, Candidate)
str(tweets)
## 'data.frame':    91673 obs. of  2 variables:
##  $ text     : chr  "@benjiwittig WE WANT MIKE DUNLEAVY JR." "RT @DermotMCole: Gov. Mike Dunleavy's new budget director, who doesn't know anything about Alaska, will assume "| __truncated__ "Unit-Asrc Construction, LLC\nARCTIC SLOPE REGIONAL CORPORATION ENDORSES MIKE DUNLEAVY FOR GOVERNOR\n(Utqiagvik,"| __truncated__ "Wanna hear something odd? The Cavs acquired the draft rights of Albert Miralles for Matthew Dellavedova, traded"| __truncated__ ...
##  $ Candidate: chr  "Mike Dunleavy" "Mike Dunleavy" "Mike Dunleavy" "Mike Dunleavy" ...
#create sentiment columns
pos_sentiment = as.data.frame(matrix(nrow = tweets_mined,ncol = 1))
neg_sentiment = as.data.frame(matrix(nrow = tweets_mined,ncol = 1))
total_sentiment = as.data.frame(matrix(nrow = tweets_mined,ncol = 1))

tweets <- cbind(tweets,pos_sentiment,neg_sentiment,total_sentiment)
names(tweets) <- c("Tweet","Candidate","PositiveScore", "NegativeScore","SentimentScore")

head(tweets)
##                                                                                                                                           Tweet
## 1                                                                                                        @benjiwittig WE WANT MIKE DUNLEAVY JR.
## 2   RT @DermotMCole: Gov. Mike Dunleavy's new budget director, who doesn't know anything about Alaska, will assume far more power over the stat…
## 3 Unit-Asrc Construction, LLC\nARCTIC SLOPE REGIONAL CORPORATION ENDORSES MIKE DUNLEAVY FOR GOVERNOR\n(Utqiagvik, AK) -… https://t.co/NNxdOerCQp
## 4   Wanna hear something odd? The Cavs acquired the draft rights of Albert Miralles for Matthew Dellavedova, traded the… https://t.co/RWVzNUcpfi
## 5   RT @DermotMCole: Gov. Mike Dunleavy's new budget director, who doesn't know anything about Alaska, will assume far more power over the stat…
## 6   RT @ktva: On Thursday, a dozen Native organizations sent a letter to newly elected Gov. Mike Dunleavy, asking that he make funding availabl…
##       Candidate PositiveScore NegativeScore SentimentScore
## 1 Mike Dunleavy            NA            NA             NA
## 2 Mike Dunleavy            NA            NA             NA
## 3 Mike Dunleavy            NA            NA             NA
## 4 Mike Dunleavy            NA            NA             NA
## 5 Mike Dunleavy            NA            NA             NA
## 6 Mike Dunleavy            NA            NA             NA

Sentiment Analysis

#import positive and negative words
negwords <- scan('https://raw.githubusercontent.com/hvasquez81/DATA607-Final-Project/master/negwords.txt', what = 'character', comment.char = ';')
poswords <- scan('https://raw.githubusercontent.com/hvasquez81/DATA607-Final-Project/master/poswords.txt', what = 'character', comment.char = ';')

####Credit########################################################################################
# https://www.youtube.com/watch?v=WfoVINuxIJA&list=PLjPbBibKHH18I0mDb_H4uP3egypHIsvMn&index=34 #
##################################################################################################

for (i in 1:tweets_mined){
  pos_sent = 0
  neg_sent = 0
  total_sent = 0

  if (i%%10000 == 0){
    print(paste0('Tweet number ', i, ' has analyzed'))
  }
  
  #pull the current tweet
  current_tweet <- tweets[i,1]

  #remove punctuation, change to lower case, and split into words on spaces
  current_tweet <- gsub("[[:punct:]]", "",current_tweet)
  current_tweet <- tolower(current_tweet)
  current_tweet_vector <- unlist(strsplit(current_tweet," "))

  #Check the sentiments on the current tweet
  pos_sent <- sum(!is.na(match(current_tweet_vector,poswords)))
  neg_sent <- sum(!is.na(match(current_tweet_vector,negwords)))
  total_sent <- (pos_sent - neg_sent)
  
  #set sentiment scores into dataframe
  tweets[i,3] <- pos_sent
  tweets[i,4] <- neg_sent
  tweets[i,5] <- total_sent
  
}
## [1] "Tweet number 10000 has analyzed"
## [1] "Tweet number 20000 has analyzed"
## [1] "Tweet number 30000 has analyzed"
## [1] "Tweet number 40000 has analyzed"
## [1] "Tweet number 50000 has analyzed"
## [1] "Tweet number 60000 has analyzed"
## [1] "Tweet number 70000 has analyzed"
## [1] "Tweet number 80000 has analyzed"
## [1] "Tweet number 90000 has analyzed"
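
The word-matching loop above can also be sketched as a vectorized helper. This is only an illustrative alternative, not the code that produced the results in this report: `score_tweets` is a hypothetical name, the tiny word lists below stand in for `poswords` and `negwords`, and, unlike the loop, it splits on any whitespace, so words that follow a line break are matched as well.

```r
# Hypothetical helper (not part of the original analysis): scores a character
# vector of tweets by counting matches against positive and negative word lists.
score_tweets <- function(texts, pos_words, neg_words) {
  # strip punctuation and lower-case, as in the loop above
  cleaned <- tolower(gsub("[[:punct:]]", "", texts))
  # assumption: split on any run of whitespace rather than on single spaces
  words <- strsplit(cleaned, "[[:space:]]+")
  pos <- vapply(words, function(w) sum(w %in% pos_words), integer(1))
  neg <- vapply(words, function(w) sum(w %in% neg_words), integer(1))
  data.frame(PositiveScore = pos,
             NegativeScore = neg,
             SentimentScore = pos - neg)
}

# Toy inputs, made up purely to exercise the function
demo <- score_tweets(
  c("Great win, great team!", "A terrible, corrupt plan", "Nothing here"),
  pos_words = c("great", "win"),
  neg_words = c("terrible", "corrupt")
)
print(demo)
```

Like the loop, this counts every occurrence of a listed word, so a repeated word adds to the score each time it appears.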

Let’s manipulate the data and see what the average sentiment per tweet is for each candidate.

Data Manipulation

head(tweets)
##                                                                                                                                           Tweet
## 1                                                                                                        @benjiwittig WE WANT MIKE DUNLEAVY JR.
## 2   RT @DermotMCole: Gov. Mike Dunleavy's new budget director, who doesn't know anything about Alaska, will assume far more power over the stat…
## 3 Unit-Asrc Construction, LLC\nARCTIC SLOPE REGIONAL CORPORATION ENDORSES MIKE DUNLEAVY FOR GOVERNOR\n(Utqiagvik, AK) -… https://t.co/NNxdOerCQp
## 4   Wanna hear something odd? The Cavs acquired the draft rights of Albert Miralles for Matthew Dellavedova, traded the… https://t.co/RWVzNUcpfi
## 5   RT @DermotMCole: Gov. Mike Dunleavy's new budget director, who doesn't know anything about Alaska, will assume far more power over the stat…
## 6   RT @ktva: On Thursday, a dozen Native organizations sent a letter to newly elected Gov. Mike Dunleavy, asking that he make funding availabl…
##       Candidate PositiveScore NegativeScore SentimentScore
## 1 Mike Dunleavy             0             0              0
## 2 Mike Dunleavy             0             0              0
## 3 Mike Dunleavy             1             0              1
## 4 Mike Dunleavy             0             1             -1
## 5 Mike Dunleavy             0             0              0
## 6 Mike Dunleavy             0             0              0
#We're interested in the tweets that have sentiment scores, so remove the ones that have neither a positive nor a negative score.
tweets_manip <- filter(tweets,!(PositiveScore == 0 & NegativeScore == 0))
head(tweets_manip)
##                                                                                                                                           Tweet
## 1 Unit-Asrc Construction, LLC\nARCTIC SLOPE REGIONAL CORPORATION ENDORSES MIKE DUNLEAVY FOR GOVERNOR\n(Utqiagvik, AK) -… https://t.co/NNxdOerCQp
## 2   Wanna hear something odd? The Cavs acquired the draft rights of Albert Miralles for Matthew Dellavedova, traded the… https://t.co/RWVzNUcpfi
## 3   On his first day in office, Republican Mike Dunleavy pledged to work for rural Alaskans and began rolling back spen… https://t.co/XTSeerF72V
## 4   RT @ktva: "Alaska cannot be proud of its statistics. We have got to flip the chart on this, and all I can tell Alaskans is this, the primar…
## 5   @Dan_Dunleavy Just listened to your interview on Toronto Mike'd, Dan! I'm a big fan, and just having a little fun w… https://t.co/iTsjo8HMwJ
## 6                                                                 Governor Mike Dunleavy you know donald trump refused to rent to black people.
##       Candidate PositiveScore NegativeScore SentimentScore
## 1 Mike Dunleavy             1             0              1
## 2 Mike Dunleavy             0             1             -1
## 3 Mike Dunleavy             1             0              1
## 4 Mike Dunleavy             1             0              1
## 5 Mike Dunleavy             1             0              1
## 6 Mike Dunleavy             1             1              0
#Get the average positive score, average negative score, and average total sentiment per candidate
tweet_summary <- select(tweets_manip, Candidate:SentimentScore) %>%
  group_by(Candidate) %>%
  summarize(positive_average = mean(PositiveScore),
            negative_average = mean(NegativeScore),
            average_sentiment = mean(SentimentScore)) %>%
  arrange(desc(average_sentiment))
tweet_summary
## # A tibble: 74 x 4
##    Candidate       positive_average negative_average average_sentiment
##    <chr>                      <dbl>            <dbl>             <dbl>
##  1 Andria Tupola               2.6             0                 2.6  
##  2 Mary Throne                 1.63            0.104             1.52 
##  3 Walt Maddox                 1.5             0.075             1.42 
##  4 Karl Dean                   1.47            0.344             1.12 
##  5 David Ige                   1.30            0.181             1.12 
##  6 Fred Hubbell                1.54            0.538             1    
##  7 Lupe Valdez                 1.11            0.111             1    
##  8 Paulette Jordan             1.25            0.25              1    
##  9 Jared Polis                 1.39            0.408             0.986
## 10 Janet Mills                 1.40            0.449             0.956
## # ... with 64 more rows

Plotting

library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following object is masked from 'package:NLP':
## 
##     annotate
library(reshape2)

df <- tweet_summary
dfm <- melt(df[,c('Candidate','negative_average','positive_average','average_sentiment')], id.vars = 1)

ggplot(dfm, aes(x = Candidate, y = value)) +
  geom_bar(aes(fill = variable), stat = "identity", position = "dodge") +
  theme(axis.text.x = element_text(angle = 90, size = 5))

Let’s check how many tweets per candidate have a net negative or a net positive sentiment score.

sentiments_sum <- tweets %>%
  group_by(Candidate) %>%
  summarize(negative_sentiments = sum(SentimentScore < 0),
            positive_sentiments = sum(SentimentScore > 0)) %>%
  mutate(neg_percentage = round(negative_sentiments / (negative_sentiments + positive_sentiments), 4)) %>%
  mutate(pos_percentage = round(positive_sentiments / (negative_sentiments + positive_sentiments), 4)) %>%
  mutate(total_tweets = negative_sentiments + positive_sentiments)
sentiments_sum
## # A tibble: 74 x 6
##    Candidate negative_sentim~ positive_sentim~ neg_percentage
##    <chr>                <int>            <int>          <dbl>
##  1 Adam Lax~               30               11          0.732
##  2 Allan Fu~                3                1          0.75 
##  3 Andrew C~              690             1430          0.326
##  4 Andrew G~              341              517          0.397
##  5 Andria T~                0                5          0    
##  6 Asa Hutc~                6               26          0.188
##  7 Ben Jeal~              225               55          0.804
##  8 Bill Lee               618             2264          0.214
##  9 Bill Sch~               34               28          0.548
## 10 Bill Wal~             1151             1934          0.373
## # ... with 64 more rows, and 2 more variables: pos_percentage <dbl>,
## #   total_tweets <int>
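
As a quick check of how these percentages are formed, the first three rows of the output above can be recomputed by hand. This is a minimal base R sketch: the counts are copied from the printed tibble, the candidate labels are kept truncated exactly as printed, and `scored` is a name introduced here for illustration. Note that, as in the pipeline above, `total_tweets` counts only tweets with a non-zero sentiment score, so neutral tweets are excluded from the denominator.

```r
# Counts copied from the first three rows of the printed sentiments_sum output
counts <- data.frame(
  Candidate = c("Adam Lax~", "Allan Fu~", "Andrew C~"),
  negative_sentiments = c(30, 3, 690),
  positive_sentiments = c(11, 1, 1430)
)

# Tweets with a net negative or net positive score (neutral tweets excluded)
scored <- counts$negative_sentiments + counts$positive_sentiments

counts$neg_percentage <- round(counts$negative_sentiments / scored, 4)
counts$pos_percentage <- round(counts$positive_sentiments / scored, 4)
counts$total_tweets <- scored
print(counts)
```

The recomputed `neg_percentage` values agree with the printed column once the tibble display rounds them to three significant digits.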

We can see from this that some candidates don’t have nearly as many tweets as others. For the most part, the candidates with the fewest tweets seem to have a higher proportion of positive tweets relative to their overall tweets.

## Warning: Removed 1 rows containing missing values (geom_point).

We can see a weak relationship between the number of tweets and the average sentiment of those tweets. Among this group of candidates, the more tweets that were mined, the lower the average sentiment tended to be. This might suggest that candidates with bad publicity are more often trending on social media.
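
One way to put a number on a relationship like this is a rank correlation. The sketch below is only an assumption about how that could be done and is not part of the original analysis: the data frame `toy` and its values are made up purely to show the call, and the `NA` row mimics the single row with a missing value that the plot dropped.

```r
# Made-up illustration data (NOT the study's values): one row per candidate
toy <- data.frame(
  total_tweets = c(10, 50, 400, 2000, 3500, NA),
  average_sentiment = c(1.5, 1.1, 0.8, 0.4, 0.2, 0.6)
)

# Spearman rank correlation; rows with missing values are dropped first
rho <- cor(toy$total_tweets, toy$average_sentiment,
           use = "complete.obs", method = "spearman")
print(rho)
```

A value near -1 would indicate that sentiment falls steadily as tweet volume rises, while a value near 0 would indicate little monotonic relationship.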

## 
## Attaching package: 'cowplot'
## The following object is masked from 'package:ggplot2':
## 
##     ggsave
## Warning: Removed 1 rows containing missing values (geom_point).

From the four graphs above, it appears that candidates who won differ slightly from those who lost. A majority of the winning candidates have average sentiments between 0 and 1, while those who lost seem to have slightly higher average sentiment scores. In terms of tweet volume, the winners were not trending as much, with the exception of a few candidates who had over 2000 tweets mined; for a majority of the winners, fewer than 2000 tweets were mined each. The percentages of tweets rated positive and negative seem to be balanced: most winning candidates have positive percentages between 50% and 75% and negative percentages between 25% and 50%. For the candidates who lost, the spreads in positive and negative percentages seem to be even across the whole range.

## # A tibble: 2 x 5
##     Win AvergePositivePe~ AvergeNegativePe~ AverageTweets AverageSentiment
##   <int>             <dbl>             <dbl>         <dbl>            <dbl>
## 1     0             0.565             0.435          528.            0.247
## 2     1             0.609             0.391          840.            0.246
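
The group differences in the table above can be checked with a little arithmetic. This is a minimal sketch that uses the rounded values as printed, so the results are approximate; `summary_tbl`, `tweet_ratio`, and `sentiment_change` are names introduced here for illustration.

```r
# Values copied from the printed summary table (rounded as displayed)
summary_tbl <- data.frame(
  Win = c(0, 1),
  AverageTweets = c(528, 840),
  AverageSentiment = c(0.247, 0.246)
)

lost <- summary_tbl[summary_tbl$Win == 0, ]
won <- summary_tbl[summary_tbl$Win == 1, ]

# How many times more tweets were mined, on average, for winners
tweet_ratio <- won$AverageTweets / lost$AverageTweets

# Relative change in average sentiment, winners versus losers
sentiment_change <- (won$AverageSentiment - lost$AverageSentiment) / lost$AverageSentiment

print(round(tweet_ratio, 2))
print(round(100 * sentiment_change, 2))
```

With these rounded inputs, winners have about 1.6 times as many mined tweets on average, and their average sentiment differs from that of losing candidates by well under one percent.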

Conclusion

The statistics for winning and losing candidates are quite similar, and it would be difficult to tell the two groups apart solely from Twitter data. It does appear, however, that winning candidates have a slightly higher share of positive tweets on average (60.9% versus 56.5%) and a correspondingly lower share of negative tweets (39.1% versus 43.5%). Winning candidates also had roughly 60% more mined tweets on average (about 840 versus 528), while their average sentiment was essentially the same as that of losing candidates (0.246 versus 0.247).

Challenges

I do expect the number of younger voters to increase, and with it the use of social media as a platform for expressing opinions and views. This would increase the availability of data and provide a better understanding of how the general population feels. In some instances, I was not able to mine the full 5000 tweets per candidate that I had requested, simply because some candidates do not trend as much as I had hoped. However, younger generations that are much more familiar with social media platforms like Twitter could provide the data needed to reproduce a similar study and possibly obtain better results.
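
To see how far individual searches fell short of the 5000 tweets requested, the counts reported in the API warnings can be tabulated. This is a minimal sketch that uses four counts copied from the mining log above; the data frame `mined` and the columns `shortfall` and `share_of_request` are names made up for illustration.

```r
requested <- 5000

# "API can only return N" counts copied from the mining log for four candidates
mined <- data.frame(
  Candidate = c("Gina Raimondo", "Allan Fung", "Henry McMaster", "James Smith"),
  returned = c(204, 8, 121, 3999)
)

# Gap between what was requested and what the API returned
mined$shortfall <- requested - mined$returned
mined$share_of_request <- round(mined$returned / requested, 4)

# Candidates with the fewest returned tweets first
print(mined[order(mined$returned), ])
```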

Another challenge was the sentiment analysis itself. Initially I planned to use MonkeyLearn, but the free tier limited me to 300 tweets per month, and I had more than 90,000 tweets (91,673). I worked around this by finding a list of positive and negative sentiment words, which let me perform a simple word-count sentiment scoring.