Data was obtained here : https://www.kaggle.com/datasets/datadrivendecision/trump-tweets-2009-2025/data

Context :

Complete archive of 90,000+ posts from Donald Trump on X (Twitter) and Truth Social spanning from 2009 to present. Includes timestamps, engagement metrics, and full text content.

For this project I will limit myself to posts Trump makes himself, not replies.
I will filter exclusively to : quote_flag == “False” (posts with text)

knitr::opts_knit$set(root.dir = "/Users/isaiahmireles/Desktop")

df <- 
  read.csv("Trump folder/trump_tweets_dataset.csv")

Research Q :

Q1) What rhetorical patterns characterize Donald Trump’s public discourse?
Q2) How does the frequency of “hate speech” in Donald Trump’s rhetoric vary over time?
Q3) What is the prevailing sentiment (positive, negative, or neutral) expressed toward groups defined by ethnicity, religion, or sexual orientation, and toward immigration, in Donald Trump’s rhetoric?
Q4) What are Trump’s beliefs regarding education?

Data Basics :

colnames :

colnames(df)

##  [1] "id"             "date"           "platform"       "handle"        
##  [5] "text"           "favorite_count" "repost_count"   "quote_flag"    
##  [9] "repost_flag"    "deleted_flag"   "word_count"     "hashtags"      
## [13] "urls"           "user_mentions"  "media_count"    "media_urls"    
## [17] "post_url"       "in_reply_to"

Change Date format

library(tidyverse)
library(lubridate)

df <- df %>%
  mutate(date = ymd_hms(date))
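`ymd_hms()` turns any entry it cannot parse into `NA`, so it can be worth counting failures after the conversion. A minimal sketch on made-up timestamps (the `toy_dates` values are hypothetical, and the exact format of the real `date` column is an assumption):

```r
library(lubridate)

# Made-up timestamps; the real column's exact format is an assumption here
toy_dates <- c("2020-01-15 08:30:00", "2021-07-04 12:00:00", "not a date")

# quiet = TRUE suppresses the parse warning so failures can be counted explicitly
parsed <- ymd_hms(toy_dates, quiet = TRUE)

# Number of entries that could not be parsed (they become NA)
n_failed <- sum(is.na(parsed))
n_failed
```

The same check on the real data would be `sum(is.na(df$date))` right after the `mutate()` call.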

lst <- as.list(df)

# Number of unique values : 
cardinality <- lapply(lst, function(x){length(unique(x))})

cardinality$platform

## [1] 2

unique(df$platform)

## [1] "Truth Social" "Twitter"
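Since a data frame is already a list of columns, the per-column cardinality can also be computed without the intermediate `as.list()` step. A minimal sketch on a toy data frame (`toy` and its values are made up for illustration, not taken from the dataset):

```r
# Toy data frame standing in for df (values are made up for illustration)
toy <- data.frame(
  platform = c("Twitter", "Twitter", "Truth Social"),
  handle   = c("a", "a", "a"),
  stringsAsFactors = FALSE
)

# A data frame is a list of columns, so sapply can iterate over it directly
cardinality_toy <- sapply(toy, function(x) length(unique(x)))

# Number of unique values in one column
cardinality_toy[["platform"]]
```

`sapply()` returns a named vector here, so a single column is looked up by name with `[[`.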

Turns out he made his own social media?

Elon Musk on Buying Twitter and Turning It Into X

Elon Musk carries actual kitchen sink into Twitter HQ amid $44bn purchase

Filtering Obs. :

DELETE : Websites, Images & Videos

before <- nrow(df)

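library(stringr)
df <- 
  df |>
  filter(
    quote_flag == "False",
    # Drop posts that start with a video or image placeholder
    !str_detect(text, "^\\[Video\\]|^\\[Image\\]")
  ) |>
  # Drop retweets
  filter(!str_detect(text, "^RT")) |>
  # Drop posts that start with a link; the final "^https" pattern
  # also covers the site-specific patterns before it
  filter(!str_detect(text, "^https://truthsocial.com")) |>
  filter(!str_detect(text, "^https://justthenews.com")) |>
  filter(!str_detect(text, "^https://x.com")) |>
  filter(!str_detect(text, "^https")) 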

after <- nrow(df)
paste0("We reduced the data by ", before - after, " observations -- 30k obs.")

## [1] "We reduced the data by 28885 observations -- 30k obs."

Notice that I am excluding links to other websites (e.g. Truth Social, YouTube, etc.) that he has posted. Again, I just want what he says, not what he reposts.
“[Video]” denotes videos
“[Image]” denotes images
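To see which kinds of posts the leading-pattern filter removes, here is a minimal sketch on made-up strings (`toy_text` is hypothetical, and combining the patterns into one regex is a simplification for illustration, not the code used above):

```r
library(stringr)

# Made-up example posts (not taken from the dataset)
toy_text <- c(
  "[Video] rally highlights",
  "[Image]",
  "RT @someone: a reposted message",
  "https://example.com/article",
  "Thank you, Ohio!"
)

# Same leading-pattern idea as the filter, combined into one regex
drop_pattern <- "^\\[Video\\]|^\\[Image\\]|^RT|^https"
keep <- !str_detect(toy_text, drop_pattern)

toy_text[keep]
```

Only the last string survives. Note that `^RT` also matches any post that merely begins with the letters "RT", and that a link appearing later in a post does not trigger the `^https` pattern.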

KEEP : Non-replies

Notice the NULL values : a missing in_reply_to appears as "" or "nan"

replies <- unique(df$in_reply_to) |> as.data.frame()
replies

before <- nrow(df)

df <- df[df$in_reply_to %in% c("", "nan"), ] 

df <- df |> filter(repost_flag != "True") 
after <- nrow(df)
paste0("We reduced the data by ", before - after, " observations -- 4k")

## [1] "We reduced the data by 3649 observations -- 4k"
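The reply and repost filter can be illustrated on a toy data frame. This is a minimal sketch with made-up rows (`toy` and `toy_kept` are hypothetical names), assuming missing `in_reply_to` values are stored as the strings `""` or `"nan"` as in the filter above:

```r
# Made-up rows (not from the dataset); "" and "nan" mark a missing in_reply_to
toy <- data.frame(
  text        = c("original post", "a reply", "another original", "a repost"),
  in_reply_to = c("", "12345", "nan", ""),
  repost_flag = c("False", "False", "False", "True"),
  stringsAsFactors = FALSE
)

# Keep rows that are not replies and are not reposts
toy_kept <- toy[toy$in_reply_to %in% c("", "nan") & toy$repost_flag != "True", ]

toy_kept$text
```

If missing values had been read in as real `NA` instead of these strings, `%in%` would return `FALSE` for them and those rows would be dropped as well.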

Relevant Col

cardinality_func <- function(x){length(unique(x))}
df <-
  df |> 
  select(-c(in_reply_to, id, quote_flag, repost_flag)) |> 
  arrange(date) 

df

Overall, I have taken away 32,534 observations (about 32k).
The remaining data is about 64 percent of the original size (a proportion of 0.64).
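The total can be checked from the two step counts reported earlier. A minimal sketch; `share_kept` is a hypothetical helper, and the original row count is left as an argument because it is not printed in this document:

```r
# Rows removed in each filtering step (counts reported earlier)
removed_step1 <- 28885  # media, links, quotes, and "RT" posts
removed_step2 <- 3649   # replies and reposts

removed_total <- removed_step1 + removed_step2
removed_total

# Hypothetical helper: proportion of rows kept, given the original row count
share_kept <- function(n_original, n_removed) {
  (n_original - n_removed) / n_original
}
```

In the notebook this could be called as `share_kept(n_original, removed_total)`, where `n_original` is an assumed variable holding `nrow(df)` captured immediately after `read.csv()`.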

Notes :

Q2) How does the frequency of “hate speech” in Donald Trump’s rhetoric vary over time?

^ Requires a definition of “hate speech” and parameters to model it.

Facts :

Trump is 79 years old
- Donald Trump was 60 when Twitter came out
Twitter is 19 years old

Trump Tweets

Isaiah C. Mireles

2026-02-19