4/28/2021
We are finding ways to use the standard Twitter API to predict shifts in approval ratings for US Senators over time.
The standard Twitter API allows users to collect tweets from individual timelines and provides data on each tweet, such as whether or not it was a retweet, how many people liked or “favorited” it, how many times it was retweeted, etc.
Our goal is to use this data to predict shifts in approval ratings over 30 day periods by pairing Twitter data with polling data.
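For the sketches later in this write-up we assume the rtweet R package as our way into the standard API; none of the code is our exact pipeline, just an illustration of each step. A quick peek at that per-tweet metadata looks roughly like this:

    library(rtweet)

    # Peek at the per-tweet metadata the standard API returns for one timeline.
    # Assumes you have already authenticated rtweet with a Twitter token;
    # the handle is just an example and the column names are those exposed by rtweet.
    sample_tweets <- get_timeline("SenSherrodBrown", n = 10)
    sample_tweets[, c("created_at", "text", "is_retweet", "favorite_count", "retweet_count")]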
Initially, we hoped to build a sentiment profile for each tweet based on the sentiment of the responses it received. However, the standard API does not give access to those responses without a premium account, so we shifted our focus to the sentiment and metadata of the senators’ own tweets.
We found a data set of approval ratings broken down into 30 day periods, covering almost a year up to September 2020, that we could use to gauge shifts in approval.
We then selected a set of US senators and pulled a few thousand tweets from each of their timelines.
Another problem we faced is that there is no up-to-date data set of Twitter handles for US senators. There is a Harvard data set that is commonly cited for this purpose, but upon inspection it contains numerous errors (wrong handles, press-office handles, etc.). In response, we built a new CSV file containing names and Twitter handles for the 116th US Congress (minus a couple of members, such as Kamala Harris, who no longer have Senate Twitter accounts).
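The CSV itself is just two columns; the header and rows below are only meant to show the layout assumed by the sketches that follow, not the full file.

    name,handle
    Sherrod Brown,SenSherrodBrown
    Susan Collins,SenatorCollins
    ...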
We selected a subset of senators (all of the senators holding office during the period we had approval ratings for) and modified this data frame to include each senator’s Twitter handle from the CSV we made, so that we could pull tweets. We then used this to generate a larger data frame containing a few thousand tweets for each senator.
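Continuing the rtweet sketch, this step looked roughly like the following (the file name and the 3000-tweet cap are illustrative):

    library(dplyr)
    library(purrr)

    # Read our hand-made handle CSV (columns: name, handle).
    handles <- read.csv("senator_handles_116.csv", stringsAsFactors = FALSE)

    # Pull up to a few thousand of the most recent tweets from each senator's
    # timeline and stack them into one large data frame.
    tweets <- map_dfr(handles$handle, function(h) {
      get_timeline(h, n = 3000) %>%
        mutate(handle = h)
    })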
We tokenized the tweets, removed stopwords and other noise, and used the Bing sentiment lexicon to flag positive and negative words.
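A minimal sketch of that step with the tidytext package, continuing from the tweets data frame above (status_id is the tweet ID column rtweet exposes):

    library(tidytext)
    library(tidyr)

    # One row per word, with stopwords removed.
    tweet_words <- tweets %>%
      select(handle, status_id, created_at, text) %>%
      unnest_tokens(word, text) %>%
      anti_join(stop_words, by = "word")

    # Score each tweet as (number of positive words) - (number of negative words).
    sentiment_scores <- tweet_words %>%
      inner_join(get_sentiments("bing"), by = "word") %>%
      count(handle, status_id, created_at, sentiment) %>%
      pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
      mutate(score = positive - negative)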
Then we used the tweet metadata to group the tweets into the same 30 day windows used in the approval data, so we could relate the two data sets.
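One way to do that grouping, continuing the sketch (in practice the cut points have to be aligned with the polling windows, which is glossed over here):

    # Attach the metadata we care about and bucket tweets into 30 day windows.
    tweet_windows <- sentiment_scores %>%
      left_join(tweets %>% select(status_id, retweet_count, favorite_count),
                by = "status_id") %>%
      mutate(window = as.Date(cut(as.Date(created_at), breaks = "30 days"))) %>%
      group_by(handle, window) %>%
      summarise(
        mean_score      = mean(score),
        total_retweets  = sum(retweet_count),
        total_favorites = sum(favorite_count),
        n_tweets        = n(),
        .groups = "drop"
      )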
We decided to use a linear regression model to relate the tweets’ metadata and sentiment scores to the approval ratings of the various senators. We then ran predictions of the shift in approval over each of the 30 day periods covered by our data set.
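A sketch of that model, assuming the approval data has been reshaped into a data frame called approval keyed by handle and window with an approval_shift column (the train/test split date is purely illustrative):

    # Join tweet features to approval shifts, fit a linear model, and measure error.
    model_data <- tweet_windows %>%
      inner_join(approval, by = c("handle", "window"))

    train <- filter(model_data, window <  as.Date("2020-06-01"))
    test  <- filter(model_data, window >= as.Date("2020-06-01"))

    fit <- lm(approval_shift ~ mean_score + total_retweets + total_favorites + n_tweets,
              data = train)

    preds <- predict(fit, newdata = test)
    mae   <- mean(abs(preds - test$approval_shift))  # mean absolute error of the predictions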
For approval ratings on a scale of 1 to 100, our model predicted these 30 day shifts with a mean absolute error (MAE) of 3.88.
While the MAE is good, we worry that our model may be biased and could be improved by increasing the amount of data.
Our model can use Twitter metadata paired with sentiment analysis to reasonably predict shifts in approval rating.
Our new (hand-made) data set of Twitter handles for the 116th Congress solves problems faced by previous data sets (like the Harvard data set).
The data we generated by combining approval data with Twitter data could be expanded for future use.
We learned a lot about the Twitter API and think our project could be expanded to analyze replies in an enterprise context.