Polling and news coverage sentimental analysis

#Abstract: This project will analyze the relationship between presidential polling and news coverage from CNN and Fox news. I would like to understand how news coverage influences a cadidates favorability. My objective is to demonstrate the current trend for the candidates polling, the changes overtime, and check if there is a correlation between news coverage and polling surveyy

sources:

#I will use the presidential polling survey data set from https://projects.fivethirtyeight.com/polls/president-general/ and scrape articles containing Biden and Trump from Foxnews.com and CNN.com, for the past 10 months.

load libraries

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## 
## Attaching package: 'rvest'
## 
## 
## The following object is masked from 'package:readr':
## 
##     guess_encoding
## 
## 
## 
## Attaching package: 'jsonlite'
## 
## 
## The following object is masked from 'package:purrr':
## 
##     flatten
## 
## 
## Loading required package: RColorBrewer
## 
## 
## Attaching package: 'textdata'
## 
## 
## The following object is masked from 'package:httr':
## 
##     cache_info

Read the president polls

Polls <- read.csv("https://raw.githubusercontent.com/Angelogallardo05/DATA607-final/main/president_polls%20(1).csv", header = TRUE)

clean data remove NA and just keep polls contaiing Trump and Biden

here we see that the latest sharpe percentage decrease occured in late 2023, when Joe biden polling fell by about 7%

# Convert 'start_date' to Date format if needed
candidate_subset$start_date <- as.Date(candidate_subset$start_date)

# Extract month and year from 'start_date'
candidate_subset <- candidate_subset %>%
  mutate(month_year = floor_date(start_date, unit = "month"))  # Round down to the nearest month

# Calculate average percentage for each candidate within each month
monthly_avg <- candidate_subset %>%
  group_by(candidate_name, month_year) %>%
  summarise(avg_pct = mean(pct, na.rm = TRUE)) %>%
  ungroup()

## `summarise()` has grouped output by 'candidate_name'. You can override using
## the `.groups` argument.

# Calculate percentage change based on monthly averages
monthly_avg <- monthly_avg %>%
  arrange(candidate_name, month_year) %>%
  group_by(candidate_name) %>%
  mutate(percentage_change = (avg_pct - lag(avg_pct)) / lag(avg_pct) * 100) %>%
  ungroup()

# Plotting using ggplot2
ggplot(monthly_avg, aes(x = month_year, y = percentage_change, color = candidate_name)) +
  geom_line() +
  geom_point() +
  labs(x = "Month", y = "Percentage Change", title = "Monthly Percentage Change of Candidates") +
  theme_minimal()

## Warning: Removed 2 rows containing missing values (`geom_line()`).

## Warning: Removed 2 rows containing missing values (`geom_point()`).

now lets see the polling overtime for the swing states. So far, Tump is leading in all of the swing states by a good margin

swing_states <- c("Arizona", "Georgia", "Pennsylvania","Michigan","Nevada", "North Carolina", "Wisconsin")  # List of swing states
# Filter and aggregate data
filtered_subset <- candidate_subset %>%
  filter(year(start_date) >= 2023, state %in% swing_states) %>%
  mutate(month_year = floor_date(start_date, "month")) %>%  # Round down to month
  group_by(state, month_year, candidate_name) %>%
  summarize(avg_pct = mean(pct, na.rm = TRUE))

## `summarise()` has grouped output by 'state', 'month_year'. You can override
## using the `.groups` argument.

# Determine the number of unique states
unique_states <- unique(filtered_subset$state)
num_states <- length(unique_states)

# Calculate an appropriate plot height based on the number of states
plot_height <- min(10 * num_states, 120)  # Adjust the multiplier (10) based on your preference

options(repr.plot.width = 10, repr.plot.height = plot_height)  # Adjust width and height

# Plotting with ggplot
ggplot(filtered_subset, aes(x = month_year, y = avg_pct, color = candidate_name)) +
  geom_line() +
  labs(x = "Start Date", y = "Percentage (%)", color = "Candidate") +
  scale_x_date(date_breaks = "2 months", date_labels = "%b %Y") +  # Display every 2 months
  theme(
    axis.text.x = element_text(size = 8, angle = 45, hjust = 1)  # Adjust size for better readability
  ) +
  facet_wrap(~ state, nrow = ceiling(num_states / 3), scales = "free_y")  # Adjust nrow for better layout

Now we will scrape cnn.com with articles caontaining trump and biden

#Trump cnn articles

trump_cnn <- c(
  "https://www.cnn.com/2024/04/30/politics/trump-political-violence-2024-time-interview/index.html",
  "https://www.cnn.com/2024/03/23/politics/trump-campaign-courthouse-bond/index.html",
  "https://www.cnn.com/2024/02/28/politics/supreme-court-immunity-trump-biskupic-analysis/index.html",
  "https://www.cnn.com/2024/01/29/politics/trump-border-middle-east-election/index.html",
  "https://www.cnn.com/2023/12/19/politics/donald-trump-extreme-rhetoric-2024-election/index.html",
  "https://www.cnn.com/2023/11/21/politics/trump-gag-order-appeal-election-year-stress-test/index.html",
  "https://www.cnn.com/2023/10/26/politics/trump-legal-system-rules-2024/index.html",
  "https://www.cnn.com/2023/09/26/politics/trump-organization-business-fraud/index.html",
  "https://www.cnn.com/2023/08/10/politics/trump-legal-drama-grows/index.html",
  "https://www.cnn.com/2023/07/28/politics/trump-classified-documents-case-2024/index.html",
  "https://www.cnn.com/2023/06/10/politics/trump-campaign-indictment-georgia-north-carolina/index.html"
)

# Create DataFrame
trump_cnn_df <- data.frame(URL = trump_cnn)

scrape the trump cnn articles

# Create empty lists to store extracted data
titles <- c()
dates <- c()
texts <- c()
presidents <- c()  # Create an empty list for the President column

# Loop through each URL in the dataframe
for (i in 1:nrow(trump_cnn_df)) {
  url <- trump_cnn_df$URL[i]
  
  # Read HTML content from the URL
  page <- read_html(url)
  
  # Extract title
  title <- page %>%
    html_nodes("title") %>%
    html_text() %>%
    first()  # Take the first element (assuming there's only one title)
  titles <- c(titles, title)
  
  # Extract date
  date <- page %>%
    html_node(xpath = "//*[contains(@class, 'timestamp')]") %>%
    html_text() %>%
    trimws()  # Trim extra spaces
  date <- gsub("^[^,]+,", "", date)
  dates <- c(dates, date)
  
  # Extract text content from JSON-LD script
  json_ld_script <- page %>%
    html_nodes("script[type='application/ld+json']") %>%
    html_text() %>%
    paste(collapse = "\n")
  
  # Parse JSON-LD data
  if (nzchar(json_ld_script)) {  # Check if JSON-LD script is not empty
    json_data <- jsonlite::fromJSON(json_ld_script)  # Use jsonlite:: prefix
    article_text <- json_data$articleBody
  } else {
    article_text <- NA  
  }
  
  texts <- c(texts, article_text)
  
  # Add "Donald Trump" to the presidents list (for each URL)
  presidents <- c(presidents, "Donald Trump")
}

# Create a dataframe from extracted data
trump_cnn_data <- data.frame(
  URL = trump_cnn_df$URL,
  Title = titles,
  Date = dates,
  Text = texts,
  President = presidents,  # Include the new "President" column
  stringsAsFactors = FALSE  # Ensure strings are treated as characters, not factors
)

#Biden cnn articles

biden_cnn <- c(
  "https://www.cnn.com/2024/04/28/politics/biden-trump-nostalgia/index.html",
  "https://www.cnn.com/2024/03/19/politics/joe-biden-western-swing/index.html",
  "https://www.cnn.com/2024/02/22/politics/biden-bully-pulpit-trump-russia/index.html",
  "https://www.cnn.com/2024/01/24/politics/joe-biden-campaign-donald-trump/index.html",
  "https://www.cnn.com/2023/12/07/politics/biden-compromise-border-security/index.html",
  "https://www.cnn.com/2023/11/24/politics/biden-remarks-initial-hostage-release-israel-gaza/index.html",
  "https://www.cnn.com/2023/10/19/politics/biden-oval-office-speech-israel-ukraine/index.html",
  "https://www.cnn.com/2023/09/18/politics/biden-iran-americans-politics/index.html",
  "https://www.cnn.com/2023/08/27/politics/biden-summer-vacation-maui-ukraine-2024-election/index.html",
  "https://www.cnn.com/2023/07/28/politics/biden-economy-maine-trump/index.html",
  "https://www.cnn.com/2023/06/26/politics/joe-biden-russia/index.html"
)

# Create DataFrame
biden_cnn_df <- data.frame(URL = biden_cnn)

scrape Biden articles

# Create empty lists to store extracted data
titles <- c()
dates <- c()
texts <- c()
presidents <- c()  # Create an empty list for the President column

# Loop through each URL in the dataframe
for (i in 1:nrow(biden_cnn_df)) {
  url <- biden_cnn_df$URL[i]
  
  # Read HTML content from the URL
  page1 <- read_html(url)
  
  # Extract title
  title <- page1 %>%
    html_nodes("title") %>%
    html_text() %>%
    first()  # Take the first element (assuming there's only one title)
  titles <- c(titles, title)
  
  # Extract date
  date <- page1 %>%
    html_node(xpath = "//*[contains(@class, 'timestamp')]") %>%
    html_text() %>%
    trimws()  # Trim extra spaces
  date <- gsub("^[^,]+,", "", date)
  dates <- c(dates, date)
  
  # Extract text content from JSON-LD script
  json_ld_script <- page1 %>%
    html_nodes("script[type='application/ld+json']") %>%
    html_text() %>%
    paste(collapse = "\n")
  
  # Parse JSON-LD data
  if (nzchar(json_ld_script)) {  # Check if JSON-LD script is not empty
    json_data <- jsonlite::fromJSON(json_ld_script)  # Use jsonlite:: prefix
    article_text <- json_data$articleBody
  } else {
    article_text <- NA  # Set to NA if JSON-LD script is empty or not found
  }
  
  texts <- c(texts, article_text)
  
  # Add "Joe Biden" to the presidents list (for each URL)
  presidents <- c(presidents, "Joe Biden")
}

# Create a dataframe from extracted data
biden_cnn_data <- data.frame(
  URL = biden_cnn_df$URL,
  Title = titles,
  Date = dates,
  Text = texts,
  President = presidents,  # Include the new "President" column
  stringsAsFactors = FALSE  # Ensure strings are treated as characters, not factors
)

combine the cnn data

combined_cnn_data <- rbind(trump_cnn_data, biden_cnn_data)

# Optional: Reset row names/index of the combined dataframe
rownames(combined_cnn_data) <- NULL

## # A tibble: 6 × 2
##   word       sentiment
##   <chr>      <chr>    
## 1 2-faces    negative 
## 2 abnormal   negative 
## 3 abolish    negative 
## 4 abominable negative 
## 5 abominably negative 
## 6 abominate  negative

cnn positive and negative coverage percentages

combined_cnn_data <- combined_cnn_data %>%
  mutate(Date = lubridate::mdy(Date))

## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `Date = lubridate::mdy(Date)`.
## Caused by warning:
## ! All formats failed to parse. No formats found.

# Convert 'Date' to month-year format for aggregation
data_with_sentiment <- data_with_sentiment %>%
  mutate(month_year = floor_date(Date, unit = "month"))

# Aggregate sentiment scores by month, president, and sentiment
sentiment_summary <- data_with_sentiment %>%
  group_by(President, month_year, sentiment) %>%
  summarize(count = n()) %>%
  spread(sentiment, count, fill = 0) %>%
  mutate(total_words = positive + negative,
         positive_pct = positive / total_words * 100,
         negative_pct = negative / total_words * 100) %>%
  select(President, month_year, positive_pct, negative_pct)

## `summarise()` has grouped output by 'President', 'month_year'. You can override
## using the `.groups` argument.

#Sentiment scores from cnn

sentiment_scores <- tokenized_data %>%
  inner_join(bing_lexicon, by = "word") %>%
  count(President, sentiment) %>%
  spread(sentiment, n, fill = 0)

# Calculate sentiment percentages
sentiment_scores <- sentiment_scores %>%
  group_by(President) %>%
  mutate(total_words = sum(positive, negative),
         positive_percent = (positive / total_words) * 100,
         negative_percent = (negative / total_words) * 100) %>%
  select(President, positive_percent, negative_percent)


print(sentiment_scores)

## # A tibble: 2 × 3
## # Groups:   President [2]
##   President    positive_percent negative_percent
##   <chr>                   <dbl>            <dbl>
## 1 Donald Trump             45.5             54.5
## 2 Joe Biden                50.5             49.5

cnn coverage over time

presidents <- unique(sentiment_summary$President)

for (president in presidents) {
  # Filter data for the current president
  president_data <- sentiment_summary %>%
    filter(President == president)
  
  # Create line plot for positive and negative sentiment
  plot <- ggplot(president_data, aes(x = month_year)) +
    geom_line(aes(y = positive_pct, color = "Positive")) +
    geom_line(aes(y = negative_pct, color = "Negative")) +
    labs(title = paste("Sentiment Analysis for", president),
         x = 'Month-Year',
         y = 'Percentage',
         color = 'Sentiment') +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    scale_color_manual(values = c("red", "blue"),
                       labels = c("Negative", "Positive"))
  
  
  print(plot)
}

Articles containing Trump on Foxnews

trump_fox <- c(
  "https://www.foxnews.com/politics/trump-leads-biden-important-issues-voters-2024-election-polls-show",
  "https://www.foxnews.com/opinion/issue-cost-trump-2024-campaign",
  "https://www.foxnews.com/politics/challenging-biden-debates-trumps-tells-ingraham-ill-do-it-right-now-your-show",
  "https://www.foxnews.com/politics/trump-starts-2024-in-strongest-possible-position-republican-presidential-primary-race",
  "https://www.foxnews.com/media/2023-year-liberal-outlets-attacked-platforming-trump",
  "https://www.foxnews.com/media/trump-vs-biden-dramatic-difference-media-treat-each-campaign",
  "https://www.foxnews.com/politics/trump-accept-speakership-for-short-period-republicans-decide-permanent-replacement",
  "https://www.foxnews.com/politics/trumps-race-lose-four-half-months-ahead-first-votes-gop-presidential-nomination",
  "https://www.foxnews.com/politics/trump-indicted-fourth-time-he-leads-2024-primary-field",
  "https://www.foxnews.com/politics/trump-reacts-to-abuse-by-doj-this-is-what-you-get-for-leading-the-2024-polls",
  "https://www.foxnews.com/media/trump-calls-nara-radical-left-group-couldve-stuffed-boxes-during-mar-a-lago-document-raid"
)

# Create DataFrame
trump_fox_df <- data.frame(URL = trump_fox)

extract Trump news articles on Foxnews

## Processed URL: https://www.foxnews.com/politics/trump-leads-biden-important-issues-voters-2024-election-polls-show 
## Title: New poll reveals voters prefer Trump on these key issues | Fox News 
## Date: April 22, 2024 7:03pm EDT 
## Text Length: 2 
## 
## Processed URL: https://www.foxnews.com/opinion/issue-cost-trump-2024-campaign 
## Title: The issue that could cost Trump the 2024 campaign | Fox News 
## Date: March 13, 2024 5:00am EDT 
## Text Length: 2 
## 
## Processed URL: https://www.foxnews.com/politics/challenging-biden-debates-trumps-tells-ingraham-ill-do-it-right-now-your-show 
## Title: On challenging Biden to debates, Trumps tells Ingraham 'I’ll do it right now on your show' | Fox News 
## Date: February 20, 2024 8:00pm EST 
## Text Length: 2 
## 
## Processed URL: https://www.foxnews.com/politics/trump-starts-2024-in-strongest-possible-position-republican-presidential-primary-race 
## Title: Trump starts 2024 in ‘strongest possible position’ since 2016 in Republican presidential primary race | Fox News 
## Date: January 2, 2024 11:56am EST 
## Text Length: 2 
## 
## Processed URL: https://www.foxnews.com/media/2023-year-liberal-outlets-attacked-platforming-trump 
## Title: 2023: The year news outlets were shamed for ‘platforming’ Trump | Fox News 
## Date: December 26, 2023 5:00am EST 
## Text Length: 2 
## 
## Processed URL: https://www.foxnews.com/media/trump-vs-biden-dramatic-difference-media-treat-each-campaign 
## Title: Trump vs. Biden: A dramatic difference in how the media treat each campaign | Fox News 
## Date: November 15, 2023 3:00am EST 
## Text Length: 2 
## 
## Processed URL: https://www.foxnews.com/politics/trump-accept-speakership-for-short-period-republicans-decide-permanent-replacement 
## Title: Trump would accept House speakership for a ‘short period’ while Republicans decide on a permanent replacement | Fox News 
## Date: October 5, 2023 4:59pm EDT 
## Text Length: 2 
## 
## Processed URL: https://www.foxnews.com/politics/trumps-race-lose-four-half-months-ahead-first-votes-gop-presidential-nomination 
## Title: It's Trump's race to lose four and a half months before the first votes for the GOP presidential nomination | Fox News 
## Date: September 4, 2023 12:01pm EDT 
## Text Length: 2 
## 
## Processed URL: https://www.foxnews.com/politics/trump-indicted-fourth-time-he-leads-2024-primary-field 
## Title: With the charges in Georgia, Trump now faces four criminal indictments | Fox News 
## Date: August 15, 2023 7:00am EDT 
## Text Length: 2 
## 
## Processed URL: https://www.foxnews.com/politics/trump-reacts-to-abuse-by-doj-this-is-what-you-get-for-leading-the-2024-polls 
## Title: Trump reacts to 'abuse' by DOJ: 'This is what you get' for leading the polls | Fox News 
## Date: July 27, 2023 6:55pm EDT 
## Text Length: 2 
## 
## Processed URL: https://www.foxnews.com/media/trump-calls-nara-radical-left-group-couldve-stuffed-boxes-during-mar-a-lago-document-raid 
## Title: Trump calls NARA 'radical left group,' says they could've 'stuffed' the boxes during Mar-a-Lago document raid | Fox News 
## Date: June 19, 2023 8:54pm EDT 
## Text Length: 2

biden_fox <- c(
  "https://www.foxnews.com/politics/new-poll-biden-2024-lead-vanishing-trump-trial",
  "https://www.foxnews.com/politics/locking-it-up-biden-clinches-2024-democrat-presidential-nomination-during-tuesdays-primaries",
  "https://www.foxnews.com/politics/biden-making-unannounced-visit-walter-reed-physical-exam",
  "https://www.foxnews.com/us/biden-brought-chaos-trump-says-hell-bring-success-hunter-biden-heads-court-more-top-headlines",
  "https://www.foxnews.com/politics/bidens-polling-problem-running-re-election-2024-president-ends-2023-underwater",
  "https://www.foxnews.com/politics/white-house-quietly-walks-back-bidens-comment-adding-conditions-sending-assistance-israel",
  "https://www.foxnews.com/politics/biden-shuns-calls-de-escalate-vows-us-israel-back-prepares-ground-war-hamas",
  "https://www.foxnews.com/politics/fox-news-politics-biden-impeachment",
  "https://www.foxnews.com/politics/biden-family-huddles-in-secrecy-exclusive-lake-tahoe-home-special-counsel-investigation-hunter",
  "https://www.foxnews.com/politics/white-house-ignores-republicans-demanding-answers-bidens-knowledge-hunter-bidens-business-dealings",
  "https://www.foxnews.com/politics/biden-admits-not-big-abortion-terminations-last-three-months-negotiated"
)

# Create DataFrame
biden_fox_df <- data.frame(URL = biden_fox)

## Processed URL: https://www.foxnews.com/politics/new-poll-biden-2024-lead-vanishing-trump-trial 
## Title: New poll shows Biden’s 2024 lead vanishing with Trump on trial | Fox News 
## Date: April 24, 2024 7:50pm EDT 
## Text Length: 2 
## 
## Processed URL: https://www.foxnews.com/politics/locking-it-up-biden-clinches-2024-democrat-presidential-nomination-during-tuesdays-primaries 
## Title: Locking it up: Biden clinches 2024 Democrat presidential nomination during Tuesday's primaries | Fox News 
## Date: March 12, 2024 7:23pm EDT 
## Text Length: 2 
## 
## Processed URL: https://www.foxnews.com/politics/biden-making-unannounced-visit-walter-reed-physical-exam 
## Title: Biden says he's making unannounced visit to Walter Reed for physical exam | Fox News 
## Date: February 28, 2024 9:24am EST 
## Text Length: 2 
## 
## Processed URL: https://www.foxnews.com/us/biden-brought-chaos-trump-says-hell-bring-success-hunter-biden-heads-court-more-top-headlines 
## Title: FOX NEWS FIRST NEWSLETTER FOR THURSDAY, JAN 11 2024 | Fox News 
## Date: January 11, 2024 6:24am EST 
## Text Length: 2 
## 
## Processed URL: https://www.foxnews.com/politics/bidens-polling-problem-running-re-election-2024-president-ends-2023-underwater 
## Title: Biden’s polling problem: Running for re-election in 2024, the President ends 2023 underwater | Fox News 
## Date: December 31, 2023 11:36am EST 
## Text Length: 2 
## 
## Processed URL: https://www.foxnews.com/politics/white-house-quietly-walks-back-bidens-comment-adding-conditions-sending-assistance-israel 
## Title: White House quietly walks back Biden’s comment on adding conditions for sending assistance to Israel | Fox News 
## Date: November 30, 2023 12:35pm EST 
## Text Length: 2 
## 
## Processed URL: https://www.foxnews.com/politics/biden-shuns-calls-de-escalate-vows-us-israel-back-prepares-ground-war-hamas 
## Title: Biden shuns calls to de-escalate, vows US ‘has Israel’s back’ as it prepares for ground war with Hamas | Fox News 
## Date: October 10, 2023 3:21pm EDT 
## Text Length: 2 
## 
## Processed URL: https://www.foxnews.com/politics/fox-news-politics-biden-impeachment 
## Title: Fox News Politics: Biden impeachment takes off | Fox News 
## Date: September 12, 2023 1:44pm EDT 
## Text Length: 2 
## 
## Processed URL: https://www.foxnews.com/politics/biden-family-huddles-in-secrecy-exclusive-lake-tahoe-home-special-counsel-investigation-hunter 
## Title: Biden family huddles in secrecy in exclusive Lake Tahoe home amid special counsel investigation into Hunter | Fox News 
## Date: August 23, 2023 12:16am EDT 
## Text Length: 2 
## 
## Processed URL: https://www.foxnews.com/politics/white-house-ignores-republicans-demanding-answers-bidens-knowledge-hunter-bidens-business-dealings 
## Title: White House ignores Republicans demanding answers on Biden's knowledge of Hunter Biden's business dealings | Fox News 
## Date: July 28, 2023 4:35pm EDT 
## Text Length: 2 
## 
## Processed URL: https://www.foxnews.com/politics/biden-admits-not-big-abortion-terminations-last-three-months-negotiated 
## Title: Biden admits he’s ‘not big on abortion,’ says terminations in ‘last 3 months have to be negotiated’ | Fox News 
## Date: June 29, 2023 8:49am EDT 
## Text Length: 2

## Rows: 44
## Columns: 5
## $ URL       <chr> "https://www.foxnews.com/politics/trump-leads-biden-importan…
## $ Title     <chr> "New poll reveals voters prefer Trump on these key issues | …
## $ Date      <chr> "April 22, 2024 7:03pm EDT", "March 13, 2024 5:00am EDT", "F…
## $ Text      <chr> "Most voters prefer former President Donald Trump to handle …
## $ President <chr> "Donald Trump", "Donald Trump", "Donald Trump", "Donald Trum…

## Rows: 44
## Columns: 5
## $ URL       <chr> "https://www.foxnews.com/politics/trump-leads-biden-importan…
## $ Title     <chr> "New poll reveals voters prefer Trump on these key issues | …
## $ Date      <dttm> 2024-04-22 19:03:00, 2024-03-13 05:00:00, 2024-02-20 20:00:…
## $ Text      <chr> "Most voters prefer former President Donald Trump to handle …
## $ President <chr> "Donald Trump", "Donald Trump", "Donald Trump", "Donald Trum…

fox_tokenized_data <- combined_fox_data %>%
  unnest_tokens(word, Text)



# Join the tokenized data with the bing lexicon to assign sentiment scores
fox_data_with_sentiment <- fox_tokenized_data %>%
  inner_join(bing_lexicon, by = c(word = "word"))

#Fox news sentiment nalysis

fox positive and negative coverage percentages

# Convert 'Date' to month-year format for aggregation
fox_data_with_sentiment <- fox_data_with_sentiment %>%
  mutate(month_year = floor_date(Date, unit = "month"))

# Aggregate sentiment scores by month, president, and sentiment
fox_sentiment_summary <- fox_data_with_sentiment %>%
  group_by(President, month_year, sentiment) %>%
  summarize(count = n()) %>%
  spread(sentiment, count, fill = 0) %>%
  mutate(total_words = positive + negative,
         positive_pct = positive / total_words * 100,
         negative_pct = negative / total_words * 100) %>%
  select(President, month_year, positive_pct, negative_pct)

## `summarise()` has grouped output by 'President', 'month_year'. You can override
## using the `.groups` argument.

# Check the structure of sentiment_summary
print(fox_sentiment_summary)

## # A tibble: 20 × 4
## # Groups:   President, month_year [20]
##    President    month_year          positive_pct negative_pct
##    <chr>        <dttm>                     <dbl>        <dbl>
##  1 Donald Trump 2023-06-01 00:00:00         50           50  
##  2 Donald Trump 2023-07-01 00:00:00         50           50  
##  3 Donald Trump 2023-08-01 00:00:00         33.3         66.7
##  4 Donald Trump 2023-09-01 00:00:00         75           25  
##  5 Donald Trump 2023-10-01 00:00:00        100            0  
##  6 Donald Trump 2023-11-01 00:00:00        100            0  
##  7 Donald Trump 2023-12-01 00:00:00        100            0  
##  8 Donald Trump 2024-01-01 00:00:00         71.4         28.6
##  9 Donald Trump 2024-02-01 00:00:00         54.5         45.5
## 10 Donald Trump 2024-03-01 00:00:00         60           40  
## 11 Donald Trump 2024-04-01 00:00:00         83.3         16.7
## 12 Joe Biden    2023-06-01 00:00:00         50           50  
## 13 Joe Biden    2023-07-01 00:00:00         80           20  
## 14 Joe Biden    2023-08-01 00:00:00         60           40  
## 15 Joe Biden    2023-09-01 00:00:00          0          100  
## 16 Joe Biden    2023-10-01 00:00:00         50           50  
## 17 Joe Biden    2023-12-01 00:00:00         33.3         66.7
## 18 Joe Biden    2024-02-01 00:00:00          0          100  
## 19 Joe Biden    2024-03-01 00:00:00         66.7         33.3
## 20 Joe Biden    2024-04-01 00:00:00         40           60

#fox sentiment scores. Here we see that Fox news is indeed very favorable towards Trump while impartial towards Biden.

fox_sentiment_scores <- fox_tokenized_data %>%
  inner_join(bing_lexicon, by = "word") %>%
  count(President, sentiment) %>%
  spread(sentiment, n, fill = 0)

# Calculate sentiment percentages
fox_sentiment_scores <- fox_sentiment_scores %>%
  group_by(President) %>%
  mutate(total_words = sum(positive, negative),
         positive_percent = (positive / total_words) * 100,
         negative_percent = (negative / total_words) * 100) %>%
  select(President, positive_percent, negative_percent)


print(fox_sentiment_scores)

## # A tibble: 2 × 3
## # Groups:   President [2]
##   President    positive_percent negative_percent
##   <chr>                   <dbl>            <dbl>
## 1 Donald Trump             65.7             34.3
## 2 Joe Biden                50               50

fox coverage over time

f_presidents <- unique(fox_sentiment_summary$President)

for (president in f_presidents) {
  # Filter data for the current president
  president_data <- fox_sentiment_summary %>%
    filter(President == president)
  
  # Create line plot for positive and negative sentiment
  plot <- ggplot(president_data, aes(x = month_year)) +
    geom_line(aes(y = positive_pct, color = "Positive")) +
    geom_line(aes(y = negative_pct, color = "Negative")) +
    labs(title = paste("Sentiment Analysis for", president),
         x = 'Month-Year',
         y = 'Percentage',
         color = 'Sentiment') +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    scale_color_manual(values = c("red", "blue"),
                       labels = c("negative", "positive"))
  
  # Print the plot
  print(plot)
}

glimpse(fox_sentiment_summary)

## Rows: 20
## Columns: 4
## Groups: President, month_year [20]
## $ President    <chr> "Donald Trump", "Donald Trump", "Donald Trump", "Donald T…
## $ month_year   <dttm> 2023-06-01, 2023-07-01, 2023-08-01, 2023-09-01, 2023-10-…
## $ positive_pct <dbl> 50.00000, 50.00000, 33.33333, 75.00000, 100.00000, 100.00…
## $ negative_pct <dbl> 50.00000, 50.00000, 66.66667, 25.00000, 0.00000, 0.00000,…

glimpse(sentiment_summary)

## Rows: 22
## Columns: 4
## Groups: President, month_year [22]
## $ President    <chr> "Donald Trump", "Donald Trump", "Donald Trump", "Donald T…
## $ month_year   <date> 2023-06-01, 2023-07-01, 2023-08-01, 2023-09-01, 2023-10-…
## $ positive_pct <dbl> 43.33333, 45.54455, 42.85714, 46.25000, 47.89916, 42.5000…
## $ negative_pct <dbl> 56.66667, 54.45545, 57.14286, 53.75000, 52.10084, 57.5000…

glimpse(monthly_avg)

## Rows: 74
## Columns: 4
## $ candidate_name    <chr> "Donald Trump", "Donald Trump", "Donald Trump", "Don…
## $ month_year        <date> 2021-04-01, 2021-05-01, 2021-06-01, 2021-07-01, 202…
## $ avg_pct           <dbl> 41.06667, 41.00000, 41.60000, 43.10000, 44.75000, 45…
## $ percentage_change <dbl> NA, -0.1623377, 1.4634146, 3.6057692, 3.8283063, 1.6…

combined_sentiments <- bind_rows(data_with_sentiment, fox_data_with_sentiment)
glimpse(combined_sentiments)

## Rows: 2,222
## Columns: 7
## $ URL        <chr> "https://www.cnn.com/2024/04/30/politics/trump-political-vi…
## $ Title      <chr> "Trump doesn’t rule out political violence if he loses, and…
## $ Date       <dttm> 2024-04-30, 2024-04-30, 2024-04-30, 2024-04-30, 2024-04-30…
## $ President  <chr> "Donald Trump", "Donald Trump", "Donald Trump", "Donald Tru…
## $ word       <chr> "trump", "win", "win", "fairness", "contentious", "baseless…
## $ sentiment  <chr> "positive", "positive", "positive", "positive", "negative",…
## $ month_year <dttm> 2024-04-01, 2024-04-01, 2024-04-01, 2024-04-01, 2024-04-01…

negative_words <- combined_sentiments %>%
  filter(sentiment == "negative")

# Group by President and concatenate negative words
negative_words <- negative_words %>%
  group_by(President) %>%
  summarise(negative_text = paste(word, collapse = " "))


# Generate word cloud for each President
for (i in 1:nrow(negative_words)) {
  president <- negative_words$President[i]
  text <- negative_words$negative_text[i]
  
  # Clean the text (optional): remove punctuation, numbers, and extra spaces
  clean_text <- str_replace_all(text, "[^[:alpha:][:space:]]", "")
  
  # Create word cloud
  wordcloud(words = unlist(str_split(clean_text, "\\s+")), 
            min.freq = 1,
            max.words = 50,
            random.order = FALSE,
            colors = brewer.pal(8, "Dark2"),
            main = paste("Negative Word Cloud for", president))
}

## Loading required namespace: tm

## Warning in tm_map.SimpleCorpus(corpus, tm::removePunctuation): transformation
## drops documents

## Warning in tm_map.SimpleCorpus(corpus, function(x) tm::removeWords(x,
## tm::stopwords())): transformation drops documents

## Warning in tm_map.SimpleCorpus(corpus, tm::removePunctuation): transformation
## drops documents

## Warning in tm_map.SimpleCorpus(corpus, function(x) tm::removeWords(x,
## tm::stopwords())): transformation drops documents

Conclusion:

#It looks like there a positive correlation between positive news coverage and polling surveys. The swing states seem to be at partially influence by the news coverage, whether they view Fox news or Cnn. In addition, there does seem to be some bias in coverage from both medias. The negative sentiments can provide clues as to what voters may seem as an important issue

DATA607 Final

angel

2024-05-03