Package loading:

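library(tidyverse)            # Data wrangling (dplyr, tidyr) and the %>% pipe
library(DT)                   # This package makes interactive tables
library(plotly)               # This package does interactive graphs
library(rtweet)               # This package accesses Twitter data
library(lubridate)            # This package handles dates and times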

For this notebook, I am using rtweet, a package that accesses data from Twitter. I am also using lubridate, DT, and plotly to help with formatting dates and presenting the results as interactive tables and graphs.

get_token()        # print the stored token; its key should match the consumer_key used to create it
<Token>
<oauth_endpoint>
 request:   https://api.twitter.com/oauth/request_token
 authorize: https://api.twitter.com/oauth/authenticate
 access:    https://api.twitter.com/oauth/access_token
<oauth_app> RTweets for PSYC 541
  key:    9z2JeDsn0047F1W96VWxZCEVV
  secret: <hidden>
<credentials> oauth_token, oauth_token_secret
---

The token above lets me access Twitter API data through my developer account.

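  1. I want to examine the Twitter habits of Donald Trump on his personal account, @realDonaldTrump. The R code below requests his 10,000 most recent tweets, although the Twitter API only returns roughly the 3,200 most recent tweets from a user’s timeline, so the dataset may be smaller than requested.
trump_tweets <- get_timeline("realDonaldTrump", n = 10000)   # the API caps user timelines at ~3,200 tweets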
  1. One interesting question is: which hashtags does Donald Trump use the most? The following R code creates a table of every hashtag Donald Trump has used, along with the number of times he has used it, ranked from most to least used.
trump_tweets %>% 
  select(hashtags) %>%                          # focus on the hashtags list-column
  unnest(cols = hashtags) %>%                   # one row per hashtag (tweets with several get several rows)
  mutate(hashtags = tolower(hashtags)) %>%      # lowercase so "MAGA" and "maga" are counted together
  count(hashtags, sort = TRUE) %>%              # count how often each one appears
  datatable()                                   # create an interactive table


As this table shows, the most common entry is NA, meaning the great majority of Donald Trump’s tweets do not include a hashtag at all. The most popular hashtags he does use are maga, usmca, kag2020, 2a, and kag. Apparently, Donald Trump is very fond of using his “Make America Great Again” slogan as a hashtag.
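The hashtag pipeline relies on rtweet storing hashtags as a list-column: one character vector per tweet, with NA when a tweet has none. The base-R sketch below uses a small hypothetical list in place of `trump_tweets$hashtags` to show what the unnest, lowercase, and count steps do, and why NA tops the table. Here `unlist()` stands in for `unnest()`, `tolower()` merges "MAGA" and "maga", and `table(useNA = "ifany")` keeps the no-hashtag tweets as an NA row, much as `count()` does.

```r
# Hypothetical hashtags column: one character vector (or NA) per tweet,
# mimicking the list-column that rtweet returns
hashtags <- list(c("MAGA", "KAG2020"), NA_character_, "maga",
                 NA_character_, NA_character_)

# Flatten to one element per hashtag and lowercase (NA stays NA)
all_tags <- tolower(unlist(hashtags))

# Count each hashtag, keeping tweets without hashtags as an NA entry
tag_counts <- sort(table(all_tags, useNA = "ifany"), decreasing = TRUE)

print(tag_counts)
```

With this toy data the NA entry (three tweets without hashtags) comes first, followed by the merged "maga" count of two.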

  1. Now, I would like to find out how many tweets Donald Trump posts per day. The R code below counts how many tweets he sent on each day in the dataset.
trump_tweets %>% 
  group_by(day = date(created_at)) %>%    # extract the date, group by it
  summarize(tweets_per_day = n())         # count the number of tweets each day

A quick glance shows that Donald Trump is a frequent tweeter, sometimes sending several dozen tweets in a single day.
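One detail worth knowing: `date(created_at)` takes the calendar date in the timestamp’s own time zone. Assuming rtweet stores `created_at` in UTC (which the later `with_tz()` conversion suggests), these are UTC dates rather than Washington dates. The base-R sketch below uses four hypothetical timestamps in place of `trump_tweets$created_at` and counts tweets per day both ways with `format()`; an evening tweet in New York lands on the next day in UTC.

```r
# Hypothetical tweet times, stored in UTC
created_at <- as.POSIXct(c("2019-07-01 13:00:00", "2019-07-01 20:30:00",
                           "2019-07-02 02:30:00", "2019-07-03 15:45:00"),
                         tz = "UTC")

# Tweets per calendar day, using UTC dates
utc_days <- format(created_at, "%Y-%m-%d", tz = "UTC")
per_day_utc <- table(utc_days)

# Tweets per calendar day, using New York dates
# (02:30 UTC on 2 July is 22:30 on 1 July in New York)
ny_days <- format(created_at, "%Y-%m-%d", tz = "America/New_York")
per_day_ny <- table(ny_days)

print(per_day_utc)
print(per_day_ny)
```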

Next, I would like to find out how many tweets Donald Trump sends per day on average. The R code below computes this average.

trump_tweets %>% 
  group_by(day = date(created_at)) %>%    # extract the date, group by it
  summarize(tweets_per_day = n()) %>%     # count the number of tweets each day
  summarize(avg_tweets_per_day = mean(tweets_per_day))   # average across those days

Wow! It appears that Donald Trump averages over thirty tweets per day. That’s a lot of tweeting!
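The average above is taken only over days on which Donald Trump tweeted at least once, because `group_by()` only creates groups for dates that actually appear in the data. The base-R sketch below, using hypothetical timestamps, compares that average with one that also counts the quiet days in between as zero; the second is lower whenever the account has days without tweets.

```r
# Hypothetical tweet times, stored in UTC:
# three on 1 July, none on 2 or 3 July, one on 4 July
created_at <- as.POSIXct(c("2019-07-01 13:00:00", "2019-07-01 17:30:00",
                           "2019-07-01 19:10:00", "2019-07-04 15:45:00"),
                         tz = "UTC")
days <- as.Date(format(created_at, "%Y-%m-%d", tz = "UTC"))

# Average over days with at least one tweet (what group_by() + summarize() gives)
active_counts <- as.vector(table(days))
mean_active <- mean(active_counts)

# Average over every calendar day in the range, counting quiet days as zero
all_days <- seq(min(days), max(days), by = "day")
all_counts <- as.vector(table(factor(as.character(days),
                                     levels = as.character(all_days))))
mean_all <- mean(all_counts)

print(c(active_days_only = mean_active, all_days = mean_all))
```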

  1. I am now curious about what the number of Donald Trump’s tweets per day would look like on a histogram. plotly is handy here because it produces an interactive graph, with tools such as zoom, pan, and hover built into the interface. The R code below creates this interactive plotly graphic.
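trump_tweets %>%
  mutate(day = date(created_at)) %>%                    # extract the date of each tweet
  plot_ly(x = ~day) %>%                                 # create plotly graph with dates on the x-axis
  add_histogram() %>%                                   # bar heights count tweets per date bin
  layout(title = "Number of tweets from @realDonaldTrump")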

The histogram shows that Trump tweets noticeably more on some days than on others.

  1. So, we’ve seen Trump’s tweet counts by day, but now I am curious about whether Donald Trump does most of his tweeting at a certain time of day. The R code below creates a table showing how many tweets Trump has sent in each hour of the day (0 is midnight, 1 is 1 AM, and so on) in Eastern Time, that is, the America/New_York time zone, which switches between EST and EDT with daylight saving time. I chose Eastern Time because Washington, D.C. uses it, and that is where Donald Trump lives and possibly where he sent most of his tweets from.
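trump_tweets %>% 
  mutate(time = with_tz(created_at, "America/New_York")) %>%   # convert to Eastern time zone
  mutate(time = hour(time)) %>%                                 # extract the hour (0-23)
  count(time) %>%                                               # count tweets in each hour
  datatable(options = list(pageLength = 24), rownames = FALSE)  # show all 24 hours on one page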


Surprisingly, Donald Trump has sent a good number of tweets in the midnight hour (between midnight and 1 AM), although most of his tweeting occurs between 7 AM and 10 AM.
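Two details of the hourly count are easy to miss. First, the America/New_York zone is UTC-5 in winter but UTC-4 during daylight saving time, and `with_tz()` accounts for this automatically. Second, `count()` only lists hours that actually contain tweets, so an hour with zero tweets would simply be missing from the table. The base-R sketch below uses three hypothetical UTC timestamps, with `as.POSIXlt()` standing in for lubridate’s `with_tz()` and `hour()`, to show both: the same 13:30 UTC time falls in hour 9 in July but hour 8 in January, and turning the hours into a factor with levels 0 to 23 keeps empty hours as explicit zeros.

```r
# Hypothetical tweet times, stored in UTC
created_at <- as.POSIXct(c("2019-07-01 13:30:00",   # summer: New York is on EDT (UTC-4)
                           "2019-01-15 13:30:00",   # winter: New York is on EST (UTC-5)
                           "2019-07-02 04:10:00"),  # 00:10 in New York, the midnight hour
                         tz = "UTC")

# Hour of day in New York local time (0 = midnight)
ny_hour <- as.POSIXlt(created_at, tz = "America/New_York")$hour

# Count per hour, keeping all 24 hours so empty hours show up as zeros
hour_counts <- table(factor(ny_hour, levels = 0:23))

print(ny_hour)
print(hour_counts)
```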

Once again, I would like to display this data with plotly because it’s more fun and more intuitive that way. This graph shows just how quiet Donald Trump’s Twitter account has been at 3 and 4 AM, and how prolific he is at 9 AM.

trump_tweets %>% 
  mutate(time = with_tz(created_at, "America/New_York")) %>%    # convert to Eastern time zone
  mutate(time = hour(time)) %>%                                 # extract the hour
  plot_ly(x = ~time) %>%                                        # create plotly graph
  add_histogram() %>%                                           # make histogram
  layout(title = "When Does @realDonaldTrump Tweet?", 
         xaxis = list(title = "Time of Day (0 = midnight)"),
         yaxis = list(title = "Number of Tweets"))

  1. Does Donald Trump send more tweets on some days of the week than on others? Let’s take a look. The R code below creates a table showing how many tweets Donald Trump has sent on each day of the week.
trump_tweets %>% 
  mutate(Day = wday(created_at,           # find the weekday that the tweet was created
                    label = TRUE)) %>%    # use labels (Sun, Mon, etc.) rather than numbers
  count(Day) %>%                          # count the number of tweets on each weekday
  datatable(rownames = FALSE)


We can see that Donald Trump sends a ton of tweets on Wednesdays especially, and slightly fewer on Mondays, but overall he sends plenty of tweets no matter what day of the week it is.

Now, let’s use R to create another plotly graphic to get a better look at this pattern.

trump_tweets %>% 
  mutate(Day = wday(created_at,           # find the weekday that the tweet was created
                    label = TRUE)) %>%    # use labels (Sun, Mon, etc.) rather than numbers
  plot_ly(x = ~Day) %>%                   # create plotly graph with weekdays on the x-axis
  add_histogram()                         # one bar per weekday

This graph illustrates the same pattern: more tweets on Wednesdays, fewer on Mondays, but plenty of tweets no matter what day of the week.
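The weekday code above uses `created_at` as stored, which (assuming rtweet’s usual UTC timestamps) is not the Eastern Time used in the hourly analysis. The base-R sketch below uses a single hypothetical timestamp to show why that can matter: a tweet sent late on a Tuesday evening in New York already falls on Wednesday in UTC. It uses the `wday` field of `as.POSIXlt()` (0 = Sunday) in place of lubridate’s `wday()`, and builds an ordered factor so the days sort Sun to Sat rather than alphabetically, mirroring the order `wday(label = TRUE)` gives with its default Sunday week start.

```r
# Hypothetical tweet time, stored in UTC: 02:30 on Wednesday, 3 July 2019
created_at <- as.POSIXct("2019-07-03 02:30:00", tz = "UTC")

day_labels <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")

# POSIXlt's wday field counts from 0 = Sunday, so add 1 to index the labels
utc_day <- day_labels[as.POSIXlt(created_at, tz = "UTC")$wday + 1]
ny_day  <- day_labels[as.POSIXlt(created_at, tz = "America/New_York")$wday + 1]

# Ordered factor so tables and plots list the days Sun..Sat
ny_day_factor <- factor(ny_day, levels = day_labels, ordered = TRUE)

print(c(utc = utc_day, new_york = ny_day))
```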

  1. A heatmap is a graphic that uses a color gradient (for instance, from dark blue to bright yellow) to show how large a value is in each cell of a grid. In this case, I would like to use plotly to create a heatmap of how often Donald Trump tweets by both the hour of the day and the day of the week. This gives a quick visual answer to the question “When does Donald Trump tweet?”: the heatmap shows brighter, yellower colors in the days and hours when Donald Trump tweets more, and darker, bluer colors when he tweets less.
trump_tweets %>% 
  mutate(local_time = with_tz(created_at, "America/New_York")) %>%   # convert to Eastern time once
  mutate(day = wday(local_time, label = TRUE),                       # weekday in Eastern time
         hour = hour(local_time)) %>%                                # hour in Eastern time
  plot_ly(x = ~day, y = ~hour) %>% 
  add_histogram2d(nbinsx = 7, nbinsy = 24) %>%                       # 7 weekdays by 24 hours
  layout(title = "When Does @realDonaldTrump Tweet?", 
         xaxis = list(title = "Day of the Week"),
         yaxis = list(title = "Time of Day (0 = Midnight)"))

This heatmap shows us that Donald Trump does the most tweeting on Wednesdays before ten in the morning.
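Under the hood, a two-dimensional histogram like the one `add_histogram2d()` draws counts how many tweets fall into each (weekday, hour) cell and colors each cell by that count. The base-R sketch below builds that 7 by 24 grid of counts directly with `table()` from four hypothetical UTC timestamps, converting to New York time first, and then finds the busiest cell, which would be the brightest square on the heatmap. It is an illustration of the counting, not of plotly’s own binning code.

```r
# Hypothetical tweet times, stored in UTC
created_at <- as.POSIXct(c("2019-07-03 12:15:00", "2019-07-03 13:40:00",
                           "2019-07-10 13:05:00", "2019-07-06 01:00:00"),
                         tz = "UTC")

# Convert to New York local time before extracting weekday and hour
local_time <- as.POSIXlt(created_at, tz = "America/New_York")
day_labels <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
day  <- factor(day_labels[local_time$wday + 1], levels = day_labels)
hour <- factor(local_time$hour, levels = 0:23)

# 7 x 24 table of counts: the values a heatmap would color
grid <- table(day, hour)

# Locate the busiest (weekday, hour) cell
peak <- arrayInd(which.max(grid), dim(grid))
peak_day  <- rownames(grid)[peak[1, 1]]
peak_hour <- colnames(grid)[peak[1, 2]]

print(c(day = peak_day, hour = peak_hour))
```

In this toy data, two Wednesday tweets land in the 9 AM hour, so that cell is the peak; the Saturday 01:00 UTC tweet moves to Friday at 9 PM once converted to New York time.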

