BAIS-462 Assignment 7

Yahoo Finance Sentiment Analysis

Author

Sadie Liptak

Sentiment Analysis: Ford vs. Tesla

In this assignment, I perform a sentiment analysis comparing two competing brands, Ford and Tesla, using unstructured text from a Yahoo Finance business information dataset. The goal is to answer three questions with sentiment analysis tools; at least one question has a chronological component and at least one uses the NRC emotion lexicon. I load the data, aggregate it as needed, and communicate the results through visualizations that help interpret the findings. All processing is done in R, and the analysis looks at how sentiment varies over time and across emotion categories for Ford and Tesla. The results are published as an HTML document on RPubs, which includes the R scripts used to load and analyze the data.

# Load necessary libraries
library(tidyverse)
library(tidytext)
library(lubridate)
library(ggplot2)
library(textdata)
library(wordcloud)

Load Data

# Load the data from the CSV file
data <- read.csv("Yahoo Finance business information.csv")

Data Cleaning

# Tokenize the summary text into one word per row
# (unnest_tokens() lowercases the text and strips punctuation by default)
data_cleaned <- data %>%
  unnest_tokens(word, summary) %>%
  anti_join(stop_words, by = "word")  # Remove stop words
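To make the cleaning step concrete, here is a minimal sketch on a made-up two-row table. The `toy` data frame and its sentences are hypothetical stand-ins for the real CSV; only `unnest_tokens()`, `anti_join()`, and the `stop_words` dataset come from the packages used in this report.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for the Yahoo Finance data (not the real file)
toy <- tibble(
  company_id = c("F", "TSLA"),
  summary    = c("Ford builds trucks and cars.", "Tesla designs electric vehicles.")
)

toy_tokens <- toy %>%
  unnest_tokens(word, summary) %>%      # one lowercase word per row, punctuation removed
  anti_join(stop_words, by = "word")    # drop common words such as "and"

toy_tokens
```

Each remaining row is one word tied to its company, which is the shape the lexicon join needs.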

Sentiment Analysis Using NRC Lexicon

# Load the NRC sentiment lexicon
nrc_sentiments <- get_sentiments("nrc")

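# Match each word to its NRC categories and count the matches per company
# (a word can belong to several NRC categories, so it can be counted more than once)
sentiment_analysis <- data_cleaned %>%
  inner_join(nrc_sentiments, by = "word") %>%
  count(company_id, sentiment, sort = TRUE, name = "sentiment_score")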
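The join-and-count step can be sketched without downloading the lexicon. The `toy_lexicon` below is a hypothetical mini-lexicon with the same two columns as `get_sentiments("nrc")`, and its word-to-category pairs are invented for illustration. The `relationship` argument assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical mini-lexicon: one row per word-category pair,
# so a single word can appear under several categories
toy_lexicon <- tibble(
  word      = c("growth", "growth", "risk", "risk"),
  sentiment = c("positive", "anticipation", "negative", "fear")
)

# Hypothetical token-level data (one word per row)
toy_words <- tibble(
  company_id = c("F", "F", "TSLA", "TSLA", "TSLA"),
  word       = c("growth", "risk", "growth", "growth", "battery")
)

toy_counts <- toy_words %>%
  # words missing from the lexicon ("battery") are dropped by the inner join
  inner_join(toy_lexicon, by = "word", relationship = "many-to-many") %>%
  count(company_id, sentiment, name = "sentiment_score")

toy_counts
```

Because "growth" sits in two categories, each occurrence adds one to both, which is why the category totals can exceed the number of matched words.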

Questions!

Question 1: Sentiment Change Around Earnings Reports

Here we’ll look at how sentiment differs across earnings dates for Tesla and Ford. This will show whether any earnings date stands out with a spike or shift in sentiment.

# Count NRC matches per company, earnings date, and sentiment.
# Counting starts from the token-level data so each date gets its own counts;
# joining dates onto the company-level totals would copy the same total onto every date.
ford_tesla_sentiment <- data_cleaned %>%
  filter(company_id %in% c("F", "TSLA")) %>%
  inner_join(nrc_sentiments, by = "word") %>%
  count(company_id, earnings_date, sentiment, name = "sentiment_score")

# Stacked bar chart for Ford and Tesla
ggplot(ford_tesla_sentiment, aes(x = as.Date(earnings_date), y = sentiment_score, fill = sentiment)) +
  geom_col(position = "stack") +
  facet_wrap(~ company_id, scales = "free_x") +
  labs(title = "Sentiment by Earnings Date (Ford vs Tesla)",
       x = "Earnings Date",
       y = "Sentiment Score",
       fill = "Sentiment") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels for readability

This analysis examines sentiment by earnings date for Ford and Tesla. The token-level data is filtered to Ford (“F”) and Tesla (“TSLA”), each word is matched to the NRC lexicon, and the matches are counted for each combination of company, earnings date, and sentiment type. A stacked bar chart shows the result, with the earnings date on the x-axis, the sentiment score (the number of matched words) on the y-axis, and one color per sentiment type. The chart is faceted by company so the two can be compared side by side, and the x-axis labels are rotated for readability. Since the only date used here is earnings_date, the chart shows how sentiment is distributed across earnings dates rather than day-by-day movement before and after each report.
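One thing worth checking in a by-date chart is where the date enters the calculation. The sketch below uses made-up data (the `toy_hits` table and its dates are hypothetical) to compare counting at the token level with attaching dates to totals that were already aggregated by company.

```r
library(dplyr)

# Hypothetical lexicon matches: one row per matched word
toy_hits <- tibble(
  company_id    = c("F", "F", "F"),
  earnings_date = c("2024-01-30", "2024-01-30", "2024-04-24"),
  sentiment     = c("positive", "positive", "positive")
)

# Counting at the token level gives each date its own count
by_date <- toy_hits %>%
  count(company_id, earnings_date, sentiment, name = "sentiment_score")

# Aggregating by company first and joining the dates back on afterwards
# repeats the company total once for every matching date
company_total <- toy_hits %>%
  count(company_id, sentiment, name = "sentiment_score")

inflated <- company_total %>%
  left_join(distinct(toy_hits, company_id, earnings_date), by = "company_id") %>%
  group_by(company_id, earnings_date, sentiment) %>%
  summarize(sentiment_score = sum(sentiment_score), .groups = "drop")

by_date
inflated
```

In `by_date` the two dates get different counts, while in `inflated` every date carries the full company total, so any apparent trend in the second table would be an artifact of the join.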

Question 2: Sentiment Breakdown Using the NRC Emotive Lexicon

In this question, we’ll analyze the sentiment breakdown (positive, negative, joy, anger, etc.) for both Tesla and Ford. This will help us see the types of emotions associated with each company.

# Filter for Ford and Tesla
ford_tesla_emotion <- sentiment_analysis %>%
  filter(company_id %in% c("F", "TSLA")) %>%
  group_by(company_id, sentiment) %>%
  summarize(sentiment_score = sum(sentiment_score), .groups = "drop")

# Visualize emotion breakdown for Ford and Tesla
ggplot(ford_tesla_emotion, aes(x = sentiment, y = sentiment_score, fill = company_id)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Emotion Breakdown for Ford vs Tesla",
       x = "Emotion",
       y = "Sentiment Score",
       fill = "Company") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels for readability

This analysis compares the emotional profile of Ford and Tesla. The data is filtered to Ford (“F”) and Tesla (“TSLA”), and the sentiment scores are totaled for each company and NRC category, which covers both the positive and negative labels and specific emotions such as joy and anger. A dodged bar plot places the two companies' bars side by side for each category, with the x-axis labels rotated for readability. The scores are raw word counts, so a company with more text will tend to have taller bars in every category.
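If the two companies have different amounts of text, one optional way to put them on the same scale is to convert counts to within-company shares. This is a sketch on invented numbers, not a result from the actual data; `toy_emotions` and the `share` column are hypothetical names.

```r
library(dplyr)

# Hypothetical counts for two NRC categories
toy_emotions <- tibble(
  company_id      = c("F", "F", "TSLA", "TSLA"),
  sentiment       = c("trust", "fear", "trust", "fear"),
  sentiment_score = c(30, 10, 120, 80)
)

toy_share <- toy_emotions %>%
  group_by(company_id) %>%
  mutate(share = sentiment_score / sum(sentiment_score)) %>%  # fraction of that company's matches
  ungroup()

toy_share
```

Here TSLA has larger raw counts in both categories, yet a smaller share of trust (0.60 versus 0.75), which the raw bars alone would not show.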

Question 3: General Sentiment Comparison

For this question, we’ll aggregate sentiment for each company (Tesla vs Ford) and compare their overall sentiment.

# Filter for Ford and Tesla, aggregate positive vs. negative sentiment by company
general_sentiment <- sentiment_analysis %>%
  filter(sentiment %in% c("positive", "negative"),
         company_id %in% c("F", "TSLA")) %>%
  group_by(company_id, sentiment) %>%
  summarize(sentiment_score = sum(sentiment_score), .groups = "drop")

# Visualize general sentiment comparison for Ford and Tesla
ggplot(general_sentiment, aes(x = company_id, y = sentiment_score, fill = sentiment)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "General Sentiment Comparison: Ford vs. Tesla",
       x = "Company",
       y = "Sentiment Score") +
  theme_minimal()

This analysis compares positive and negative sentiment for Ford and Tesla. The company-level sentiment counts are filtered to the positive and negative categories for these two companies and totaled by company and sentiment type. A bar plot shows the totals, with separate bars for positive and negative sentiment within each company, making it easy to see which company has more positive or negative language overall in the text analyzed.
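The positive and negative totals can also be condensed into a single number per company. The sketch below is one possible summary on made-up counts; `toy_general`, `net`, and `positive_share` are hypothetical names, not part of the original analysis.

```r
library(dplyr)

# Hypothetical positive/negative counts per company
toy_general <- tibble(
  company_id      = c("F", "F", "TSLA", "TSLA"),
  sentiment       = c("positive", "negative", "positive", "negative"),
  sentiment_score = c(40, 10, 90, 60)
)

toy_net <- toy_general %>%
  group_by(company_id) %>%
  summarize(
    positive = sum(sentiment_score[sentiment == "positive"]),
    negative = sum(sentiment_score[sentiment == "negative"]),
    .groups  = "drop"
  ) %>%
  mutate(
    net            = positive - negative,               # difference in counts
    positive_share = positive / (positive + negative)   # scale-free comparison
  )

toy_net
```

In this toy case both companies have the same net score (30), but the positive share differs (0.8 versus 0.6), so the share is the safer figure when the amount of text differs between companies.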
