For this homework assignment, you will use data from Twitter that include tweets (2011 to 2017) from Colorado senators, which can be downloaded from Canvas. Just FYI—some tweets were cut off before Twitter’s character limit; just work with the data you have. The original data are from FiveThirtyEight.
When a question asks you to make a plot, remember to set a theme, title, subtitle, labels, colors, etc. It is up to you how to personalize your plots, but put in some effort and think about making the plotting approach consistent throughout the document. For example, you could use the same theme for all plots. I also like to use the subtitle as a place for the main summary for the viewer.
Colorado is on fire right now and has experienced many wildfires over the years. Let’s examine senators’ tweet activity related to wildfires based on hashtags. Using the character vector of hashtags you extracted in Question 1, search for the hashtags that include “fire” or “wildfire”. How many hashtags included “fire”? How many included “wildfire”?
## [1] 16
## [1] 8
Of the extracted hashtags, 16 included the word “fire” and 8 included the word “wildfire”.
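Because “wildfire” contains “fire”, every hashtag matched by the second search is also matched by the first. The sketch below illustrates this with `stringr::str_subset()`; the `toy_hashtags` vector is invented for illustration and is not taken from the senators' data.

```r
library(stringr)

# Hypothetical hashtags, invented for illustration only
toy_hashtags <- c("#wildfire", "#COfire", "#firefighters", "#wildfires", "#drought")

# str_subset() keeps only the elements that match the pattern
fire_tags <- str_subset(toy_hashtags, "fire")
wildfire_tags <- str_subset(toy_hashtags, "wildfire")

length(fire_tags)      # number of hashtags containing "fire"
length(wildfire_tags)  # number of hashtags containing "wildfire"

# Every "wildfire" match is also a "fire" match
all(wildfire_tags %in% fire_tags)
```

Note that both searches are case-sensitive, so a hashtag such as `#Wildfire` would not be counted by either pattern as written.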
Now, let’s look at general tweets concerning wildfires. First, subset
the data to a dataframe that includes tweets containing the word
“wildfire” and their corresponding timestamp and user. Specifically, (a)
select text, date, and user and
(b) filter to text strings that include the word “wildfire” using
dplyr::filter() and stringr::str_detect().
## # A tibble: 33 × 3
## text created_at user
## <chr> <chr> <chr>
## 1 "Intro'd bill to help #wildfire recovery & prevention e… 10/6/17 1… SenB…
## 2 "Tune in to watch @USDA @forestservice briefing on fixing #… 9/26/17 1… SenB…
## 3 "As #opioid addiction rips through Colorado & the count… 7/5/17 19… SenB…
## 4 "Our thoughts and sympathies are with all of the families a… 7/12/16 2… SenB…
## 5 "Glad to see our wildfire mitigation provision, the Good Ne… 8/6/15 22… SenB…
## 6 "#TBT to speaking with the brave firefighters from @forests… 6/11/15 2… SenB…
## 7 "CO already experiencing climate change - wildfire, drought… 6/2/14 19… SenB…
## 8 "RT @WildernessNow: Great work by @SenBennetCO on linking d… 11/6/13 1… SenB…
## 9 "Time to work in Congress and with @forestservice to invest… 11/5/13 2… SenB…
## 10 "RT @SenateAg: Since 1980, wildfires have caused over $28 b… 11/5/13 2… SenB…
## # ℹ 23 more rows
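One thing worth checking with this kind of filter is case sensitivity: `str_detect(text, "wildfire")` will not match “Wildfire” at the start of a sentence. The following is a minimal sketch on made-up tweets (the `toy_tweets` rows and the user names `SenA` and `SenB` are hypothetical) comparing the default match with `stringr::regex(ignore_case = TRUE)`.

```r
library(dplyr)
library(stringr)

# Hypothetical tweets with the same three columns as the homework data
toy_tweets <- tibble(
  text = c("Wildfire season is here",
           "Funding for wildfire mitigation",
           "Town hall tonight"),
  created_at = c("6/1/13 10:00", "6/2/13 11:30", "6/3/13 18:15"),
  user = c("SenA", "SenB", "SenA")
)

# Default: case-sensitive match, misses "Wildfire"
case_sensitive <- toy_tweets %>%
  select(text, created_at, user) %>%
  filter(str_detect(text, "wildfire"))

# regex(ignore_case = TRUE) matches both spellings
case_insensitive <- toy_tweets %>%
  select(text, created_at, user) %>%
  filter(str_detect(text, regex("wildfire", ignore_case = TRUE)))

nrow(case_sensitive)
nrow(case_insensitive)
```

Whether the case-insensitive version changes the count of 33 for the real data is not known from this report; it would need to be run against `senators_co.csv`.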
Which Colorado senator tweets more about wildfires?
## # A tibble: 2 × 2
## # Groups: user [2]
## user n
## <chr> <int>
## 1 SenBennetCO 20
## 2 SenCoryGardner 13
Senator Bennet tweets more about wildfires: 20 tweets, compared with 13 from Senator Gardner.
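As a side note, `dplyr::count(user)` is a shorthand for grouping and counting in one step, and it returns an ungrouped tibble (so the `# Groups:` line would not appear in the printout). A small sketch with invented user names:

```r
library(dplyr)

# Hypothetical data: one row per tweet
toy <- tibble(user = c("SenA", "SenB", "SenA", "SenA", "SenB"))

# count() groups, tallies, and ungroups; sort = TRUE puts the largest n first
by_user <- toy %>%
  count(user, sort = TRUE)

by_user

# The most active user is then the first row
top_user <- by_user$user[1]
```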
Using the same wildfires dataframe, create a summary
table that shows the number of tweets containing the word “wildfire” by
year (2011-2017). Which year has the most tweets about wildfires? Why
might this be the case? (Hint: Think about what happened in the previous
year.)
## # A tibble: 7 × 2
## # Groups: year [7]
## year n
## <dbl> <int>
## 1 2011 2
## 2 2012 3
## 3 2013 13
## 4 2014 1
## 5 2015 6
## 6 2016 5
## 7 2017 3
The senators tweeted the most about wildfires in 2013 (13 tweets). This is most likely due to the Black Forest Fire, which at the time was the most destructive fire in Colorado history. There were also three other wildfires that year, making it the worst year for wildfires in state history up to that point (that record was broken in 2020, but the data only cover tweets from 2011 to 2017).
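The yearly summary depends on parsing the `created_at` strings, which appear in the output above in a month/day/two-digit-year format. The sketch below shows that step on invented timestamps, using `lubridate::mdy_hm()` and `year()`, and then picks the busiest year programmatically; the values in `toy` are hypothetical.

```r
library(dplyr)
library(lubridate)

# Hypothetical timestamps in the same "m/d/yy h:mm" format as the data
toy <- tibble(
  created_at = c("10/6/17 14:05", "6/11/13 9:30", "6/12/13 21:45", "7/5/12 19:00")
)

by_year <- toy %>%
  mutate(date = mdy_hm(created_at),  # parse string to date-time (UTC by default)
         year = year(date)) %>%      # extract the four-digit year
  count(year)

by_year

# Year with the largest tweet count
busiest_year <- by_year$year[which.max(by_year$n)]
```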
Create a bar chart that answers the question: Are Colorado senators
more active at a certain time of year? Hints: Convert month
to a factor. Fill by user.
Overall, Senator Cory Gardner is more active on Twitter than
Senator Bennet. Gardner appears to have been most active from
late summer to early fall. Neither senator was as active
during November and December.
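The hint about converting month to a factor can be followed with `lubridate::month(label = TRUE)`, which returns an ordered factor with twelve levels. This is one possible approach, sketched on invented data (the rows in `toy` and the user names are hypothetical), not necessarily the one used for the chart described above.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Hypothetical tweets
toy <- tibble(
  created_at = c("1/15/16 10:00", "8/2/16 14:30", "8/20/16 9:15", "9/3/16 16:45"),
  user = c("SenA", "SenB", "SenB", "SenA")
)

# label = TRUE yields an ordered factor (month abbreviations, locale-dependent)
toy_months <- toy %>%
  mutate(month = month(mdy_hm(created_at), label = TRUE))

# drop = FALSE keeps months with no tweets on the axis
p <- ggplot(toy_months, aes(x = month, fill = user)) +
  geom_bar(position = "dodge") +
  scale_x_discrete(drop = FALSE) +
  labs(x = "Month", y = "Tweet count", fill = "Senator")
```

With a factor on the x-axis, the month labels come from the factor levels, so no manual `breaks` or `labels` are needed.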
Create a histogram of tweets by hour of day to visualize when our senators are tweeting.
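When reading an hour-of-day histogram, the time zone matters: `lubridate::mdy_hm()` treats the strings as UTC unless told otherwise. The report does not say which time zone `created_at` is recorded in, so the sketch below assumes UTC purely for illustration and converts to Colorado local time with `with_tz()`. The data in `toy` are invented.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Hypothetical tweets; timestamps ASSUMED to be UTC for this illustration
toy <- tibble(
  created_at = c("6/11/13 9:30", "6/11/13 21:45", "7/5/12 21:10"),
  user = c("SenA", "SenB", "SenA")
)

toy_hours <- toy %>%
  mutate(
    date_utc = mdy_hm(created_at),                     # parsed as UTC by default
    date_local = with_tz(date_utc, "America/Denver"),  # same instant, local clock
    hour_utc = hour(date_utc),
    hour_local = hour(date_local)
  )

# binwidth = 1 with center = 0 gives one bar per whole hour
p <- ggplot(toy_hours, aes(x = hour_local, fill = user)) +
  geom_histogram(binwidth = 1, center = 0, color = "black") +
  scale_x_continuous(breaks = 0:23) +
  labs(x = "Hour of the day (Mountain Time)", y = "Tweet count", fill = "Senator")
```

If the timestamps are already in local time, the `with_tz()` step should be skipped.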
# set global options for figures, code, warnings, and messages
knitr::opts_chunk$set(fig.width=6, fig.height=4, fig.path="../figs/",
echo=FALSE, warning=FALSE, message=FALSE)
# load in packages
library(tidyverse)
library(dplyr)
library(ggplot2)
library(stringr)
library(lubridate)
# load in data
senator_tweets <- readr::read_csv(file = "senators_co.csv")
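# extract hashtags from the text variable (\\w already matches digits)
hashtags <- stringr::str_extract_all(senator_tweets$text, pattern = "#\\w+")
# total number of hashtags across all tweets;
# lengths() gives the per-tweet count, whereas length() only returns the number of tweets
num_hashtags <- sum(lengths(hashtags))
print(num_hashtags)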
# hashtags that include "fire"
hashtags_fire <- stringr::str_subset(unlist(hashtags), "fire")
print(length(hashtags_fire))
# hashtags that include "wildfire"
hashtags_wildfire <- stringr::str_subset(unlist(hashtags), "wildfire")
print(length(hashtags_wildfire))
# filter to tweets concerning wildfires
wildfire <- senator_tweets %>%
  dplyr::select(text, created_at, user) %>%
  dplyr::filter(stringr::str_detect(text, "wildfire"))
print(wildfire)
# number of wildfire tweets by senator
senator <- wildfire %>%
  group_by(user) %>%
  count()
print(senator)
# number of wildfire tweets by year
timing <- wildfire %>%
  mutate(date = mdy_hm(created_at),
         year = year(date)) %>%
  group_by(year) %>%
  count()
print(timing)
# create plot of tweets by month and user
# (no group_by() needed: geom_bar() counts rows per month and fill group itself)
monthly_tweets <- senator_tweets %>%
  mutate(date = mdy_hm(created_at),
         month = month(date))
ggplot(data = monthly_tweets,
       aes(x = month,
           fill = user)) +
  geom_bar(position = "dodge",
           color = "black") +
  scale_x_continuous(breaks = seq(from = 1,
                                  to = 12,
                                  by = 1),
                     labels = month.abb) +
  labs(x = "Month",
       y = "Tweet Count",
       title = "Tweet Count of Colorado Senators by Month") +
  scale_fill_manual(values = c("SenBennetCO" = "blue", "SenCoryGardner" = "red")) +
  theme_minimal()
# create histogram of tweets by hour of day, stacked by senator
# (no group_by() needed: geom_histogram() bins rows per fill group itself)
hourly_tweets <- senator_tweets %>%
  mutate(date = mdy_hm(created_at),
         hour = hour(date))
ggplot(data = hourly_tweets,
       aes(x = hour,
           fill = user)) +
  geom_histogram(bins = 24,
                 color = "black") +
  scale_x_continuous(breaks = seq(from = 0,
                                  to = 23,
                                  by = 1)) +
  labs(x = "Hour of the Day",
       y = "Tweet Count",
       title = "Tweet Count of Colorado Senators by Hour of the Day") +
  scale_fill_manual(values = c("SenBennetCO" = "blue", "SenCoryGardner" = "red")) +
  theme_minimal()