Assignment 8: Text Analytics

Author

Noah Garczewski

Introduction

For this assignment, I wanted to perform sentiment analysis (using the NRC lexicon) on transcripts of press conferences given by Jimmy and Dee Haslam (the owners of the Cleveland Browns) and Kevin Stefanski (the team's head coach).

I used https://browns.1rmg.com (a website of the Cleveland Browns press office) to scrape six transcripts, three from the Haslams and three from Stefanski.

For the Haslams, I chose to use their preseason press conferences, given in late July of 2023, 2024, and 2025. Since NFL owners typically only give a few press conferences each year, I thought this would be most representative of their sentiments about the team.

NFL coaches give press conferences much more frequently (often several per week during the season), and their press conferences are usually much shorter than those of NFL owners. However, for consistency, I chose to use press conferences given by Stefanski on the same day as the Haslams’ press conferences, or as close to it as possible.

Before performing my analysis, I tokenized all of the scraped transcripts and removed stop words. I also removed the names of frequently mentioned individuals and places.

Notably, I did not distinguish between the questions asked by reporters and the answers given, nor between opening statements and responses to questions. This is partly because the way the transcripts are presented makes such a separation very difficult, and partly because both the questions and the answers are informative for sentiment analysis.

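# Packages used throughout this analysis
library(tidyverse)   # dplyr, tidyr, ggplot2
library(tidytext)

# NRC lexicon as a word/sentiment table (assumed loaded via tidytext;
# downloading it requires the textdata package)
nrc <- get_sentiments("nrc")

# Various frequently occurring names/places to remove from the data
remove_words <-
  c("haslam", "jimmy", "dee", "berry", "andrew", "stefanski", "kevin", "cleveland", "browns", "brown", "deshaun", "watson", "brook", "park", "depodesta", "paul")

# Tokenize the transcripts (all_transcripts: one row per transcript with
# columns name, date, and text), then remove stop words and the words above
tidy_transcripts <-
  all_transcripts %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  filter(!word %in% remove_words)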
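To make the preprocessing concrete, here is a minimal sketch of the same three steps on a single hypothetical sentence (the `toy` tibble and its text are invented for illustration, not taken from the scraped transcripts). It assumes the dplyr and tidytext packages are installed; `stop_words` is the stop-word table bundled with tidytext.

```r
library(dplyr)
library(tidytext)

# Hypothetical one-sentence "transcript" (invented for illustration only)
toy <- tibble(
  name = "toy_2025",
  text = "Kevin Stefanski and the Browns are excited about the season."
)

# Same steps as above: one lowercase word per row with punctuation stripped,
# drop stop words, then drop a few hand-picked names
toy_tokens <- toy %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  filter(!word %in% c("kevin", "stefanski", "browns"))

toy_tokens$word
# "excited" "season"
```

Only the content words survive; in the real pipeline, these are the rows that are later matched against the NRC lexicon.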

Question 1

What is the dominant sentiment of Haslam and Stefanski press conferences?

The first thing I wanted to analyze is the dominant sentiment of the press conferences. I wanted to gain a general sense of the primary “vibe” of Haslam and Stefanski press conferences by finding the top 5 sentiments in each transcript.

All of the transcripts I gathered are dominated by positive sentiment, with related favorable sentiments such as “anticipation” and “trust” also appearing very frequently.

This is somewhat surprising given the team’s (extremely) poor performance. However, it makes more sense when considering that NFL press conferences are more about expressing confidence in the team (however misguided that confidence may be) than about giving frank analyses of the team and its performance.

tidy_transcripts %>%
  inner_join(nrc, by = "word") %>%
  count(name, sentiment, sort = TRUE) %>%
  group_by(name) %>%
  slice_max(n, n = 5) %>%
  ungroup() %>%
  # reorder_within() sorts the bars within each facet rather than across all facets
  ggplot(aes(x = n, y = reorder_within(sentiment, n, name), fill = sentiment)) +
  geom_col() +
  scale_y_reordered() +
  facet_wrap(~ name, scales = "free_y") +
  labs(title = "Top 5 Sentiments per Transcript",
       x = "Raw Count",
       y = "Sentiment") +
  theme(legend.position = "none")
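Because the NRC lexicon can assign several sentiments to the same word, `inner_join()` can return more than one row per token, so the bars count sentiment tags rather than distinct words. Also, `slice_max()` keeps ties by default, so a "top 5" can contain more than five bars. The sketch below illustrates both points with made-up tokens and a tiny hypothetical lexicon (`mini_lex`, invented for illustration and not an excerpt of NRC), so it runs without downloading the real lexicon. It assumes dplyr 1.1.0 or later for the `relationship` argument.

```r
library(dplyr)

# Hypothetical tokens from two transcripts, "a" and "b" (invented for illustration)
demo_tokens <- tibble(
  name = c("a", "a", "a", "b", "b"),
  word = c("confident", "excited", "loss", "excited", "hope")
)

# Hypothetical mini-lexicon in the same word/sentiment shape as NRC
# (not an excerpt of NRC); several words carry more than one sentiment
mini_lex <- tibble(
  word      = c("confident", "confident", "excited", "excited", "loss", "hope", "hope"),
  sentiment = c("positive", "trust", "positive", "anticipation", "negative", "positive", "anticipation")
)

# Each token gets one row per sentiment tag
demo_tagged <- demo_tokens %>%
  inner_join(mini_lex, by = "word", relationship = "many-to-many")

# Tag counts per transcript, then the most frequent sentiment per transcript;
# slice_max() keeps tied rows by default
demo_counts <- demo_tagged %>% count(name, sentiment, sort = TRUE)
demo_top <- demo_counts %>%
  group_by(name) %>%
  slice_max(n, n = 1) %>%
  ungroup()

demo_top
```

In this toy case, five tokens produce nine sentiment tags, and transcript "b" returns two rows for its "top 1" because "positive" and "anticipation" are tied.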

Question 2

How does the ratio of positive to negative sentiment words differ across Stefanski's three press conferences?

Next, I wanted to know the ratio of positive to negative sentiments for Kevin Stefanski’s 2023, 2024, and 2025 press conferences.

Treating 2023 as a kind of “baseline”, the positive-to-negative ratio increased substantially from 2023 to 2024, indicating more positive sentiment ahead of the 2024 season. This makes sense, as the Browns made an unexpected playoff appearance in the 2023 season, so expectations were high heading into 2024.

Conversely, sentiments became much more negative in the pre-2025 season press conference. This also makes perfect sense. The 2024 season was extremely disappointing and the Browns performed considerably worse than expected. Additionally, a number of head-scratching personnel decisions made the 2025 outlook very poor.

tidy_transcripts %>%
  inner_join(nrc, by = "word") %>%
  filter(name %in% c("stefanski_2023", "stefanski_2024", "stefanski_2025"),
         sentiment %in% c("positive", "negative")) %>%
  count(name, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(ratio = positive / negative) %>%
  ggplot(aes(name, ratio)) +
  geom_col() +
  geom_text(aes(label = round(ratio, 2)), vjust = -0.4) +
  labs(title = "Positive:Negative Sentiment Ratio",
       x = "Transcript", 
       y = "Ratio")
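The ratio step reduces each transcript to a single number: positive tags divided by negative tags. A minimal sketch with made-up counts (not the actual Stefanski results) shows one edge case of `values_fill = 0`: a transcript with no negative tags gets a ratio of `Inf`, so it is worth checking the negative counts before comparing ratios. It assumes the dplyr and tidyr packages are installed.

```r
library(dplyr)
library(tidyr)

# Hypothetical positive/negative tag counts (made up; not the Stefanski results).
# Transcript "t2" has no negative tags at all.
demo_pn_counts <- tibble(
  name      = c("t1", "t1", "t2"),
  sentiment = c("positive", "negative", "positive"),
  n         = c(12, 4, 9)
)

# One row per transcript; missing name/sentiment combinations are filled with 0
demo_ratios <- demo_pn_counts %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(ratio = positive / negative)

demo_ratios
# t1: 12 / 4 = 3;  t2: 9 / 0 = Inf
```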

Question 3

Do the Haslams trust their football personnel?

The final question I was interested in answering was the degree to which the Haslams trust their football personnel. Since “trust” is one of the sentiments in the NRC lexicon, I simply calculated the proportion of words in each of the three Haslam transcripts tagged with this sentiment.

The proportion of “trust” words declined substantially from 2023 to 2024 and only slightly from 2024 to 2025. On the one hand, this could reflect declining trust in the coaching and management staff. Such a conclusion would be well warranted, given the declining performance of the team.

However, I suspect that there are other factors at play. It would be convenient if press conferences given by ownership focused primarily on football. In reality, ownership often spends much of these press conferences discussing business matters relating to the team or league. In this case, the Haslams’ recent press conferences focused heavily on the new stadium project that has been a subject of discussion in recent years.

In particular, in 2024 and 2025, there were negotiations between the team and various government entities about the project. Since politicians are known to be serially untrustworthy, it is really no surprise that fewer “trust” words appeared in these recent transcripts.

library(lubridate)   # for year()

# Words the NRC lexicon tags with "trust" (computed once, not per group)
trust_words <- nrc %>% filter(sentiment == "trust") %>% pull(word)

tidy_transcripts %>%
  filter(name %in% c("haslam_2023", "haslam_2024", "haslam_2025")) %>%
  mutate(year = year(date)) %>%
  group_by(year) %>%
  # Share of each year's tokens that are "trust" words (same as sum / n())
  summarize(trust_freq = mean(word %in% trust_words), .groups = "drop") %>%
  ggplot(aes(x = factor(year), y = trust_freq)) +
  geom_col(fill = "red") +
  labs(title = "Trust Words in Haslam Press Conferences",
       x = "Year",
       y = "Proportion of 'Trust' Words")
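The trust measure is a proportion: trust-tagged tokens divided by all remaining tokens in the transcript. Because the denominator counts every word, a press conference that spends more time on non-football topics such as the stadium can show a lower proportion even if the number of trust words is unchanged, which is consistent with the explanation above. A minimal base-R sketch with invented words and a hypothetical trust set (`toy_trust_words`, not taken from NRC) shows this dilution effect.

```r
# Hypothetical tokens (invented for illustration, not from the transcripts)
football_words <- c("team", "confident", "faith", "season")
business_words <- c("stadium", "lease", "county", "financing")

# Hypothetical "trust" set standing in for the NRC trust words
toy_trust_words <- c("confident", "faith")

# Proportion of tokens that are trust words (mean of a logical vector)
trust_share <- function(words) mean(words %in% toy_trust_words)

trust_share(football_words)                     # 0.5  (2 of 4)
trust_share(c(football_words, business_words))  # 0.25 (2 of 8)
```

The count of trust words stays at two in both cases; only the number of other words grows, which halves the proportion.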