library(tidyverse)
library(dslabs)
data("trump_tweets")
# Prepare the trump_tweets data:
# - reduce created_at to a Date (year-month-day) and extract the year
# - use case_when() with grepl() to sort tweets into topic categories by keyword;
#   a tweet that matches several patterns gets the first matching category
# - keep only 2014-2017, drop the single outlier above 50,000 retweets,
#   and drop tweets that match no category
tweets <- trump_tweets %>%
  mutate(
    # as.Date() drops the time of day. No format string is needed here; the
    # codes for hours, minutes and seconds would be %H:%M:%S, not %h:%m:%s.
    created_at = as.Date(created_at),
    year = as.numeric(format(created_at, "%Y")),
    Category = case_when(
      grepl("maga|make america great again", text, ignore.case = TRUE) ~ "MAGA",
      grepl("immigrant|illegal|wall|border", text, ignore.case = TRUE) ~ "Immigration",
      grepl("Hillary|Crooked Hillary", text) ~ "Hillary Clinton",
      grepl("Obama|Obamacare", text) ~ "Obama",
      grepl("republican", text, ignore.case = TRUE) ~ "Republicans",
      grepl("nytimes|new york times|cnn|nbc|fake news", text, ignore.case = TRUE) ~ "Media/News",
      grepl("tax|economy|job|trade|stock", text, ignore.case = TRUE) ~ "Economy",
      grepl("China|Russia|Iran|North Korea|Middle East", text) ~ "Foreign Policy"
    )
  ) %>%
  filter(year >= 2014 & year < 2018, retweet_count < 50000, !is.na(Category))
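The keyword rules above depend on two behaviors that are easy to miss: case_when() keeps only the first condition that matches, and grepl() matches substrings rather than whole words. The following is a minimal sketch of both, assuming dplyr is installed. The example tweets in `toy` are invented for illustration and are not taken from trump_tweets, and only four of the eight rules are repeated.

```r
library(dplyr)

# Hypothetical example tweets, invented for illustration (not from trump_tweets)
toy <- data.frame(
  text = c(
    "Make America Great Again! Build the wall.",  # matches MAGA and Immigration
    "Obama should secure the border.",            # matches Immigration and Obama
    "Wall Street stocks are up.",                 # "wall" matches as a substring
    "The economy is adding jobs.",                # matches Economy only
    "Good morning, everyone."                     # matches nothing
  )
)

toy <- toy %>%
  mutate(
    # Conditions are checked top to bottom; the first TRUE one sets the category
    Category = case_when(
      grepl("maga|make america great again", text, ignore.case = TRUE) ~ "MAGA",
      grepl("immigrant|illegal|wall|border", text, ignore.case = TRUE) ~ "Immigration",
      grepl("Obama|Obamacare", text) ~ "Obama",
      grepl("tax|economy|job|trade|stock", text, ignore.case = TRUE) ~ "Economy"
    )
  )

# Tweets that match no rule get NA, which is why the pipeline filters !is.na(Category)
print(toy)
```

In this sketch the second tweet lands in Immigration rather than Obama, and the Wall Street tweet also lands in Immigration, so the order of the rules and the choice of keywords both shape the categories shown in the plot.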
Scatterplot Graph
# - points start at size 1 and grow by one size unit per 10,000 retweets above
#   10,000, so they stay just under size 5 after the 50,000-retweet filter
# - manual color scale chosen to match the tone of each category
ggplot(tweets, aes(x = created_at, y = retweet_count, color = Category)) +
  geom_point(
    alpha = 0.75,
    size = ifelse(tweets$retweet_count > 10000, tweets$retweet_count / 10000, 1)
  ) +
  labs(
    title = "Donald Trump's Tweet History By Topic in its Political Context (2014-2017)",
    x = "Year",
    y = "Retweet Count"
  ) +
  # Named values tie each color to its category explicitly
  scale_color_manual(
    name = "Tweet Topic",
    values = c(
      "Economy" = "#7fc460",
      "Foreign Policy" = "#ea3aa2",
      "Hillary Clinton" = "#3275ab",
      "Immigration" = "#fd6d03",
      "MAGA" = "#ff3d1f",
      "Media/News" = "#f2be3f",
      "Obama" = "#504a83",
      "Republicans" = "#e24f3b"
    )
  ) +
  # Candidacy announcement (June 16, 2015) and Election Day (November 8, 2016)
  geom_vline(xintercept = as.Date("2015-06-16"), color = "#eb9e94", linetype = "dotdash", linewidth = 0.80) +
  geom_vline(xintercept = as.Date("2016-11-08"), color = "#e24f3b", linetype = "dotdash", linewidth = 0.80) +
  theme_bw(base_family = "Times", base_size = 12)
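The point-size rule in the plot code can be checked on its own. This is a small sketch in base R with made-up retweet counts (the `retweets` vector is hypothetical). It shows that tweets at or below 10,000 retweets are drawn at size 1, that larger ones scale linearly, and that pmax() is an equivalent way to write the same rule.

```r
# Hypothetical retweet counts, invented for illustration
retweets <- c(500, 10000, 10001, 25000, 49999)

# Same rule as in the plot: size 1 up to 10,000 retweets,
# then one size unit per 10,000 retweets
point_size <- ifelse(retweets > 10000, retweets / 10000, 1)

# pmax() expresses the same floor-at-1 rule without an explicit condition
point_size_alt <- pmax(retweets / 10000, 1)

print(data.frame(retweets, point_size, point_size_alt))
```

Because the data are filtered to fewer than 50,000 retweets, the largest possible size under this rule stays just below 5.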
My graph is a scatterplot of Donald Trump’s tweet topics from 2014 through 2017. I restricted the data to that period because my focus is on how political his tweets became and how varied their topics were. To do so, I used the grepl() function to search the text column of each tweet for keywords and assigned each matching tweet a category, alongside a separate column for the year it was tweeted. Then I manually assigned colors to the points in geom_point, with blue hues for the Democrats (Clinton/Obama), reddish ones for MAGA/Republicans, and other colors for the remaining topics.

The scatterplot includes two vertical lines drawn with geom_vline. The first marks June 16, 2015, when Trump announced his candidacy for POTUS. The second, redder line marks November 8, 2016 (Election Day). Together they divide the scatterplot into three parts (Pre-Candidacy, Candidacy Run, Post-Election/Presidency). The colors show a trend: before his candidacy, he tweeted mostly about Obama and a few other topics, and those tweets drew few retweets. After he announced his candidacy, his retweet counts rose steadily and he tweeted more about the media and news, and as Election Day approached, his tweets about Hillary Clinton spiked in popularity. After Election Day and into his presidency, his tweets stayed consistently popular and covered a variety of his known talking points, such as immigration, the economy, and foreign policy. With the legend to match colors to topics, the visualization shows how his retweet counts and popularity rose around these key election dates.
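The three-part reading of the plot can also be made explicit as a column and summarized. The sketch below assumes dplyr is installed and uses a handful of invented rows in place of the real `tweets` data frame. The `Period` column, the `announcement` and `election_day` variables, and the choice to count each boundary date in the later period are all assumptions made for illustration.

```r
library(dplyr)

# Hypothetical rows standing in for the filtered tweets data (values invented)
toy <- data.frame(
  created_at = as.Date(c("2014-03-01", "2015-06-16", "2016-10-01",
                         "2016-11-08", "2017-05-20")),
  retweet_count = c(120, 800, 9000, 30000, 15000)
)

announcement <- as.Date("2015-06-16")  # candidacy announcement
election_day <- as.Date("2016-11-08")  # Election Day

toy <- toy %>%
  mutate(
    # Assumption: a tweet sent on a boundary date belongs to the later period
    Period = case_when(
      created_at < announcement ~ "Pre-Candidacy",
      created_at < election_day ~ "Candidacy Run",
      TRUE ~ "Post-Election/Presidency"
    )
  )

# Tweet count and median retweets per period
period_summary <- toy %>%
  group_by(Period) %>%
  summarise(
    n_tweets = n(),
    median_retweets = median(retweet_count),
    .groups = "drop"
  )

print(period_summary)
```

Run on the real data, a summary like this would let the trend described above be checked with numbers instead of being read off the plot alone.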