Data was obtained here : https://www.kaggle.com/datasets/datadrivendecision/trump-tweets-2009-2025/data

Context :

Complete archive of 90,000+ posts from Donald Trump on X (Twitter) and Truth Social spanning from 2009 to present. Includes timestamps, engagement metrics, and full text content.

  • For this project I will limit myself to posts Trump writes himself, not his replies to others.

  • I will also keep only rows where quote_flag == "False" (posts with text)

knitr::opts_knit$set(root.dir = "/Users/isaiahmireles/Desktop")
df <- 
  read.csv("Trump folder/trump_tweets_dataset.csv")

Research Q :

  • Q1) What rhetorical patterns characterize Donald Trump’s public discourse?

  • Q2) How does the frequency of “hate speech” in Donald Trump’s rhetoric vary over time?

  • Q3) What is the prevailing sentiment (positive, negative, or neutral) expressed toward immigration and toward groups defined by ethnicity, religion, or sexual orientation in Donald Trump’s rhetoric?

  • Q4) What are Trump’s beliefs regarding education?

Data Basics :

colnames :

colnames(df)
##  [1] "id"             "date"           "platform"       "handle"        
##  [5] "text"           "favorite_count" "repost_count"   "quote_flag"    
##  [9] "repost_flag"    "deleted_flag"   "word_count"     "hashtags"      
## [13] "urls"           "user_mentions"  "media_count"    "media_urls"    
## [17] "post_url"       "in_reply_to"

Change Date format

library(tidyverse)
library(lubridate)

df <- df %>%
  mutate(date = ymd_hms(date))
lst <- as.list(df)

# Number of unique values : 
cardinality <- lapply(lst, function(x){length(unique(x))})
cardinality$platform
## [1] 2
unique(df$platform)
## [1] "Truth Social" "Twitter"
  • Turns out he made his own social media platform, Truth Social.

Elon Musk on Buying Twitter and Turning It Into X

Elon Musk carries actual kitchen sink into Twitter HQ amid $44bn purchase

Filtering Obs. :

DELETE : Websites, Images & Videos

before <- nrow(df)

library(stringr)

# Keep text-only posts; drop media placeholders, retweets, and bare links
df <- 
  df |>
  filter(
    quote_flag == "False",
    !str_detect(text, "^\\[Video\\]|^\\[Image\\]")
  ) |>
  filter(!str_detect(text, "^RT")) |>
  # "^https" already covers the truthsocial.com, justthenews.com, and x.com links
  filter(!str_detect(text, "^https"))

after <- nrow(df)
paste0("We reduced the data by ", before - after, " observations (about 29k).")
## [1] "We reduced the data by 28885 observations (about 29k)."
  • Notice I am not looking at links to other websites (e.g. Truth Social, YouTube, etc.) that he has posted. Again, I just want what he says, not what he reposts

  • “[Video]” denotes videos

  • “[Image]” denotes images
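
The prefix filters above can be checked on a handful of made-up posts. This is only a minimal sketch in base R, using grepl() in place of str_detect(); the sample strings and the helper name keep_original_text are hypothetical and do not come from the dataset.

```r
# Hypothetical sample posts (invented for illustration, not from the dataset)
posts <- c(
  "[Video] rally highlights",
  "[Image]",
  "RT @someone: quoted text",
  "https://example.com/article",
  "Thank you, Ohio!"
)

# Hypothetical helper: TRUE for posts that do NOT start with a media
# placeholder, a retweet marker, or a link
keep_original_text <- function(text) {
  !grepl("^\\[Video\\]|^\\[Image\\]|^RT|^https", text)
}

kept <- posts[keep_original_text(posts)]
kept
```

Only the last sample post survives, which is the behavior the filters are meant to have. Note that "^RT" would also drop any original post that happens to begin with the letters "RT".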

DELETE : Replies & Reposts

  • Notice the missing values: in_reply_to is either empty ("") or "nan" when a post is not a reply
replies <- unique(df$in_reply_to) |> as.data.frame()
replies
before <- nrow(df)

df <- df[df$in_reply_to %in% c("", "nan"), ] 

df <- df |> filter(repost_flag != "True") 
after <- nrow(df)
paste0("We reduced the data by ", before - after, " observations (about 4k).")
## [1] "We reduced the data by 3649 observations (about 4k)."
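
The reply and repost filter above can be illustrated on a toy data frame. This is a sketch under assumptions: the rows are invented, and only the column names in_reply_to and repost_flag and the values "", "nan", and "True" are taken from the code above.

```r
# Hypothetical toy data frame (values invented for illustration)
toy <- data.frame(
  text        = c("original post", "a reply", "another original", "a repost"),
  in_reply_to = c("", "12345", "nan", ""),
  repost_flag = c("False", "False", "False", "True"),
  stringsAsFactors = FALSE
)

# Keep rows that are not replies (in_reply_to is empty or "nan") ...
toy <- toy[toy$in_reply_to %in% c("", "nan"), ]
# ... and that are not reposts
toy <- toy[toy$repost_flag != "True", ]

toy$text
```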

Relevant Col

cardinality_func <- function(x){length(unique(x))}
df <-
  df |> 
  select(-c(in_reply_to, id, quote_flag, repost_flag)) |> 
  arrange(date) 

df
  • Overall, I have taken away 32,534 observations (about 32.5k)

  • What remains is about 64 percent of the original size (a proportion of 0.64)
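
The total above follows from the two printed reductions. A small sketch of the arithmetic: share_kept is a hypothetical helper, and the 100,000-row figure is a made-up placeholder, since the original row count is not printed in these notes.

```r
# Row counts removed at each step, taken from the printed output above
removed_media_links     <- 28885
removed_replies_reposts <- 3649

total_removed <- removed_media_links + removed_replies_reposts
total_removed

# Hypothetical helper: share of rows kept, given the original row count.
# The original row count is passed in rather than hard-coded because it
# is not shown in these notes.
share_kept <- function(n_original, n_removed) {
  (n_original - n_removed) / n_original
}

# Illustration only, with a made-up original size of 100,000 rows
share_kept(100000, total_removed)
```

In the real notebook, n_original would be nrow(df) captured right after read.csv(), before any filtering.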

Notes :

  • Q2) How does the frequency of “hate speech” in Donald Trump’s rhetoric vary over time?

This requires a definition of “hate speech” and parameters to model it.
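
One possible way to make Q2 concrete is a lexicon-based flag counted per month. The sketch below is only an illustration in base R and not a settled method: the lexicon entries are placeholders, the toy posts are invented, and the function name monthly_flag_rate is hypothetical. It assumes only the date and text columns that df already has.

```r
# Placeholder lexicon; a real one must come from a documented definition
lexicon <- c("placeholder_term_a", "placeholder_term_b")

# Toy posts (invented), with the same `date` and `text` columns as df
toy <- data.frame(
  date = as.POSIXct(c("2016-01-05 10:00:00", "2016-01-20 12:00:00",
                      "2016-02-03 09:30:00"), tz = "UTC"),
  text = c("contains placeholder_term_a here", "nothing flagged",
           "PLACEHOLDER_TERM_B in caps"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: share of posts per month matching any lexicon term.
# Terms are joined into one regular expression, so terms containing regex
# special characters would need escaping first.
monthly_flag_rate <- function(data, terms) {
  pattern <- paste(terms, collapse = "|")
  flagged <- grepl(pattern, data$text, ignore.case = TRUE)
  month   <- format(data$date, "%Y-%m")
  tapply(flagged, month, mean)
}

rates <- monthly_flag_rate(toy, lexicon)
rates
```

A keyword match is a crude proxy, so whatever definition is chosen should be stated up front and checked against a hand-labeled sample before reading trends into the monthly rates.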

Facts :

  • Trump is 79 years old

    • Donald Trump was 60 when Twitter came out
  • Twitter is 19 years old