The data was obtained from Kaggle: https://www.kaggle.com/datasets/datadrivendecision/trump-tweets-2009-2025/data
Complete archive of 90,000+ posts from Donald Trump on X (Twitter) and Truth Social spanning from 2009 to present. Includes timestamps, engagement metrics, and full text content.
For this project I will limit myself to original posts Trump writes himself, not his responses to others.
I will filter exclusively to quote_flag == "False" (posts with text).
knitr::opts_knit$set(root.dir = "/Users/isaiahmireles/Desktop")
df <-
read.csv("Trump folder/trump_tweets_dataset.csv")
Q1) What rhetorical patterns characterize Donald Trump’s public discourse?
Q2) How does the frequency of “hate speech” in Donald Trump’s rhetoric vary over time?
Q3) What is the prevailing sentiment (positive, negative, or neutral) expressed toward immigration and toward groups defined by ethnicity, religion, or sexual orientation in Donald Trump’s rhetoric?
Q4) What are Trump’s beliefs regarding education?
colnames(df)
## [1] "id" "date" "platform" "handle"
## [5] "text" "favorite_count" "repost_count" "quote_flag"
## [9] "repost_flag" "deleted_flag" "word_count" "hashtags"
## [13] "urls" "user_mentions" "media_count" "media_urls"
## [17] "post_url" "in_reply_to"
library(tidyverse)
library(lubridate)
df <- df %>%
mutate(date = ymd_hms(date))
lst <- as.list(df)
# Number of unique values :
cardinality <- lapply(lst, function(x){length(unique(x))})
cardinality$platform
## [1] 2
unique(df$platform)
## [1] "Truth Social" "Twitter"
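The cardinality check above can also be done for every column at once. The following is a minimal base R sketch on a made-up data frame (`toy` and its values are assumptions for illustration, not rows from the dataset); it returns a named vector instead of a list and also tabulates posts per platform.

```r
# Toy stand-in for the real data; the values are invented for illustration.
toy <- data.frame(
  platform   = c("Twitter", "Twitter", "Truth Social", "Twitter"),
  quote_flag = c("False", "True", "False", "False"),
  text       = c("a", "b", "c", "a"),
  stringsAsFactors = FALSE
)

# Number of unique values per column, as a named integer vector.
cardinality_vec <- vapply(toy, function(x) length(unique(x)), integer(1))
print(cardinality_vec)

# Number of posts on each platform.
platform_counts <- table(toy$platform)
print(platform_counts)
```

A named vector is easier to sort or filter than a list, which can help when scanning for columns with very low or very high cardinality.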
Elon Musk on Buying Twitter and Turning It Into X
Elon Musk carries actual kitchen sink into Twitter HQ amid $44bn purchase
before <- nrow(df)
library(stringr)
df <-
  df |>
  filter(
    quote_flag == "False",
    # drop posts that are only a media placeholder
    !str_detect(text, "^\\[Video\\]|^\\[Image\\]")
  ) |>
  # drop retweets
  filter(!str_detect(text, "^RT")) |>
  # drop posts that start with a link; "^https" already covers
  # truthsocial.com, justthenews.com, and x.com
  filter(!str_detect(text, "^https"))
after <- nrow(df)
paste0("We reduced the data by ", before - after, " observations (about 29k).")
## [1] "We reduced the data by 28885 observations (about 29k)."
Notice that I am not looking at links to other websites (e.g. Truth Social, YouTube, etc.) that he has posted. Again, I want what he says, not what he reposts.
“[Video]” denotes a video
“[Image]” denotes an image
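To see what the prefix filters remove, here is a small base R sketch on invented example strings (`toy_text` is hypothetical and none of these strings come from the dataset). It uses `grepl()` rather than `str_detect()` only to keep the example free of package dependencies.

```r
# Invented example posts, one per case handled by the filters.
toy_text <- c(
  "[Video] rally highlights",
  "[Image]",
  "RT @someone: quoted words",
  "https://example.com/article",
  "Thank you to everyone who came out today!"
)

# One pattern covering the dropped prefixes:
# media placeholders, retweets, and posts that start with a link.
drop_pattern <- "^\\[Video\\]|^\\[Image\\]|^RT|^https"

keep <- !grepl(drop_pattern, toy_text)
kept_text <- toy_text[keep]
print(kept_text)
```

Note that `^RT` matches any post whose first two characters are "RT", so a post that merely begins with a word starting with those letters would also be dropped.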
replies <- unique(df$in_reply_to) |> as.data.frame()
replies
before <- nrow(df)
df <- df[df$in_reply_to %in% c("", "nan"), ]
df <- df |> filter(repost_flag != "True")
after <- nrow(df)
paste0("We reduced the data by ", before - after, " observations (about 4k).")
## [1] "We reduced the data by 3649 observations (about 4k)."
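The reply and repost filter can be illustrated on a few made-up rows. This sketch assumes, as the code above does, that an empty string or the string "nan" in `in_reply_to` means the post is not a reply; the data frame `toy` is hypothetical.

```r
# Invented rows covering the cases: original, reply, repost, original.
toy <- data.frame(
  text        = c("original post", "a reply", "a repost", "another original"),
  in_reply_to = c("", "12345", "nan", "nan"),
  repost_flag = c("False", "False", "True", "False"),
  stringsAsFactors = FALSE
)

# Keep rows that are neither replies nor reposts.
is_original <- toy$in_reply_to %in% c("", "nan") & toy$repost_flag != "True"
originals <- toy[is_original, ]

# Count removed rows as a positive number (before minus after).
removed <- nrow(toy) - nrow(originals)
print(originals$text)
print(removed)
```

Computing the count as before minus after keeps the reported reduction positive.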
cardinality_func <- function(x){length(unique(x))}
df <-
df |>
select(-c(in_reply_to, id, quote_flag, repost_flag)) |>
arrange(date)
df
Overall, I have removed 32,534 observations (about 32k).
The filtered data is about 64 percent of the original size (a proportion of 0.64).
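The totals can be checked with a little arithmetic. In this sketch, `retained_share()` is a hypothetical helper, and the original row count of 90000 is only a placeholder taken from the "90,000+" description, since the exact count is not printed in this notebook.

```r
removed_step1 <- 28885  # dropped by the quote_flag and text-prefix filters
removed_step2 <- 3649   # dropped by the reply and repost filters
total_removed <- removed_step1 + removed_step2
print(total_removed)

# Hypothetical helper: share of rows kept, given the original row count.
retained_share <- function(n_original, n_removed) {
  (n_original - n_removed) / n_original
}

# 90000 is a placeholder, not the dataset's exact row count.
share_example <- retained_share(90000, total_removed)
print(round(share_example, 2))
```

With the real data, replacing the placeholder with the row count taken before any filtering gives the exact share.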
Q2 requires a definition of “hate speech” and parameters for modeling it.
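One way to make such a definition explicit is to keep its parameters in a single object. The sketch below is only an illustration of that idea under stated assumptions: the term list is a placeholder, the matching rule (simple case-insensitive substring matching) is a stand-in rather than a validated definition, and `params`, `flag_posts()`, and `toy` are all hypothetical names.

```r
# Hypothetical parameters; the terms are placeholders, not a real lexicon.
params <- list(
  terms       = c("placeholder_term_a", "placeholder_term_b"),
  ignore_case = TRUE
)

# Invented posts with dates, for illustration only.
toy <- data.frame(
  date = as.Date(c("2016-01-05", "2016-01-20", "2016-02-03", "2016-02-10")),
  text = c(
    "contains placeholder_term_a here",
    "nothing flagged",
    "PLACEHOLDER_TERM_B appears",
    "also nothing"
  ),
  stringsAsFactors = FALSE
)

# Flag a post if it contains any listed term.
flag_posts <- function(text, params) {
  pattern <- paste(params$terms, collapse = "|")
  grepl(pattern, text, ignore.case = params$ignore_case)
}

toy$flagged <- flag_posts(toy$text, params)
toy$month   <- format(toy$date, "%Y-%m")

# Share of flagged posts per month.
monthly_rate <- tapply(toy$flagged, toy$month, mean)
print(monthly_rate)
```

Keeping the terms and matching options in one list makes it easy to rerun the over-time analysis under a different definition and compare the results.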
Trump is 79 years old
Twitter is 19 years old