1 Introduction

Employee reviews on Glassdoor capture a forward-looking, “from-the-inside” signal about firm health: morale, leadership, work culture, and outlook on the business. Prior research (e.g., Becker et al., 2022) shows that aggregated employee sentiment has predictive power for firm fundamentals and stock returns. This project applies text-mining toolkits to a large corporate corpus and then links the resulting sentiment index to actual stock-price behavior.

Companies analyzed (n = 5):

Firm in dataset	Ticker	Sector
Apple	AAPL	Technology
Microsoft	MSFT	Technology
Google (Alphabet)	GOOGL	Technology / Internet
IBM	IBM	Technology / Services
Oracle	ORCL	Technology / Software

Window: 2008-Q1 to 2021-Q2 (full Glassdoor coverage), giving ~13 years of quarterly observations per firm — well above the 2-year minimum.

Pipeline at a glance:

Ingest Glassdoor reviews and filter to the 5 firms.
Tokenize, drop stop words, score with AFINN (and cross-check with NRC + Bing).
Aggregate to a quarterly Sentiment Index per firm.
Pull daily price history with yfR::yf_get() and compute average quarterly returns.
Merge sentiment + returns; compute Δsentiment and an indicator (1 if Δ > 0).
Run correlation and OLS regression of returns on Δsentiment (+ indicator).
Visualize and discuss findings.

2 Setup

# Load libraries
library(tidyverse)
library(tidytext)
library(textdata)
library(lubridate)
library(yfR)
library(scales)
library(knitr)
library(kableExtra)
library(broom)
library(wordcloud2)
library(radarchart)
library(ggrepel)
library(patchwork)
library(RColorBrewer)

# Consistent firm <-> ticker mapping used throughout
firm_ticker <- tibble(
  firm   = c("Apple",  "Microsoft", "Google", "IBM", "Oracle"),
  ticker = c("AAPL",   "MSFT",      "GOOGL",  "IBM", "ORCL")
)

# A consistent color palette for the 5 firms (used in every plot)
firm_pal <- c(
  "Apple"     = "darkorchid",
  "Microsoft" = "deeppink2",
  "Google"    = "green2",
  "IBM"       = "turquoise2",
  "Oracle"    = "darkgoldenrod1"
)

theme_set(theme_minimal(base_size = 12))

3 Glassdoor Data: Ingest and Clean

Note: Set glassdoor_path to wherever your glassdoor_reviews.csv lives on disk (or use file.choose() like in the previous assignment).

glassdoor_path <- "glassdoor_reviews.csv"   # <-- adjust path if needed
# glassdoor_path <- file.choose()           # alt: interactive picker

reviews_raw <- readr::read_csv(glassdoor_path, show_col_types = FALSE)

cat("Total reviews in file:", nrow(reviews_raw), "\n")

## Total reviews in file: 838566

cat("Unique firms in file: ", n_distinct(reviews_raw$firm), "\n")

## Unique firms in file:  428

reviews <- reviews_raw %>%
  filter(firm %in% firm_ticker$firm) %>%
  left_join(firm_ticker, by = "firm") %>%
  mutate(
    date_review = ymd(date_review),
    pros        = replace_na(pros, ""),
    cons        = replace_na(cons, ""),
    headline    = replace_na(headline, ""),
    full_text   = str_c(pros, " ", cons),
    full_text   = str_replace_all(full_text, "[‘’‚‛]", "'"),
    full_text   = str_squish(full_text),
    year        = year(date_review),
    quarter     = paste0(year(date_review), "-Q", quarter(date_review)),
    qtr_date    = floor_date(date_review, "quarter")
  ) %>%
  filter(!is.na(date_review), full_text != "")

cat("Reviews kept (5 target firms): ", nrow(reviews), "\n")

## Reviews kept (5 target firms):  155844

cat("Date range: ",
    format(min(reviews$date_review)), " to ",
    format(max(reviews$date_review)), "\n")

## Date range:  2008-01-31  to  2021-06-06

reviews %>% count(firm, sort = TRUE) %>%
  kable(caption = "Review counts by firm") %>% kable_styling(full_width = FALSE)

Review counts by firm
firm	n
IBM	60436
Oracle	31941
Microsoft	26675
Apple	20797
Google	15995

4 Exploratory Data Analysis

4.1 Review volume by firm and year

reviews %>%
  count(firm, year) %>%
  ggplot(aes(year, n, color = firm)) +
  geom_line(linewidth = 1) +
  geom_point(size = 1.6) +
  scale_color_manual(values = firm_pal) +
  scale_y_continuous(labels = comma) +
  labs(title = "Annual Glassdoor review volume by firm",
       x = NULL, y = "Reviews per year", color = NULL)

4.2 Distribution of overall ratings

overall_rating (1 to 5) is a built-in human label that we can later cross-check against our text-derived AFINN score.

reviews %>%
  ggplot(aes(factor(overall_rating), fill = firm)) +
  geom_bar(position = "dodge") +
  scale_fill_manual(values = firm_pal) +
  labs(title = "Distribution of Glassdoor overall_rating by firm",
       x = "Overall rating (1 = worst, 5 = best)", y = "Count", fill = NULL)

4.3 Review length

Reusing the char_count / word_count style from the prior assignment:

reviews <- reviews %>%
  mutate(char_count = nchar(full_text),
         word_count_raw = str_count(full_text, "\\S+"))

ggplot(reviews, aes(firm, word_count_raw, fill = firm)) +
  geom_boxplot(outlier.alpha = 0.15) +
  scale_fill_manual(values = firm_pal, guide = "none") +
  scale_y_log10(labels = comma) +
  labs(title = "Review length by firm (log scale)",
       x = NULL, y = "Words per review (pros + cons)")

5 Tokenization and Custom Stop Words

We tokenize pros + cons, drop standard stop words, then drop a custom list of corporate / HR-specific filler. The list mirrors the structure of remove_words from Assignment 2 but is tuned for Glassdoor reviews.

remove_words <- c(
  # firm names so they don't contaminate sentiment
  "apple", "microsoft", "google", "ibm", "oracle", "alphabet",
  # generic Glassdoor / workplace filler
  "company", "companies", "team", "teams", "people", "employee", "employees",
  "manager", "managers", "management", "mgmt", "job", "jobs", "role", "roles",
  "work", "working", "career", "office", "department",
  # time / pronouns / fillers
  "lot", "lots", "day", "days", "year", "years", "month", "months", "week",
  "time", "times", "hour", "hours", "much", "many", "really", "lots",
  "im", "ive", "dont", "didnt", "cant", "wont", "isnt", "thats", "theres",
  "i'm", "i've", "don't", "didn't", "can't", "won't", "isn't", "that's",
  # generic positive/negative noise that AFINN already captures
  "pros", "cons", "review", "reviews", "feedback"
)

review_tokens <- reviews %>%
  mutate(doc_id = row_number()) %>%
  select(doc_id, firm, ticker, date_review, qtr_date, quarter, year,
         overall_rating, full_text) %>%
  unnest_tokens(word, full_text) %>%
  filter(!word %in% stop_words$word,
         !word %in% remove_words,
         nchar(word) > 2,
         !str_detect(word, "^[0-9]+$"))

cat("Tokens after cleaning:", nrow(review_tokens), "\n")

## Tokens after cleaning: 2083376

5.1 Top words per firm

top_words_firm <- review_tokens %>%
  count(firm, word, sort = TRUE) %>%
  group_by(firm) %>%
  slice_max(n, n = 15) %>%
  ungroup() %>%
  mutate(word = reorder_within(word, n, firm))

ggplot(top_words_firm, aes(word, n, fill = firm)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ firm, scales = "free_y", ncol = 2) +
  coord_flip() +
  scale_x_reordered() +
  scale_fill_manual(values = firm_pal) +
  labs(title = "Top 15 words per firm (Glassdoor pros + cons)",
       x = NULL, y = "Word count")

5.2 TF-IDF: which words distinguish each firm?

firm_tfidf <- review_tokens %>%
  count(firm, word) %>%
  bind_tf_idf(word, firm, n) %>%
  group_by(firm) %>%
  slice_max(tf_idf, n = 12) %>%
  ungroup() %>%
  mutate(word = reorder_within(word, tf_idf, firm))

ggplot(firm_tfidf, aes(word, tf_idf, fill = firm)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ firm, scales = "free", ncol = 2) +
  coord_flip() +
  scale_x_reordered() +
  scale_fill_manual(values = firm_pal) +
  labs(title = "Most distinctive words per firm (TF-IDF)",
       x = NULL, y = "TF-IDF")

6 Sentiment Analysis

We score the same tokens with three lexicons:

AFINN — integer scores from −5 (very negative) to +5 (very positive). Drives the regression because it is continuous.
Bing — binary positive vs. negative count.
NRC — eight emotions + positive/negative; useful for a per-firm emotional fingerprint.

# textdata may prompt to download AFINN / NRC on first call; these helpers
# auto-accept that license prompt so the document knits non-interactively.
options(timeout = 300)
get_lexicon_safe <- function(name) {
  tryCatch(
    suppressMessages(textdata::lexicon_afinn()),  # primes textdata cache for AFINN
    error = function(e) NULL
  )
  tryCatch(
    suppressMessages(textdata::lexicon_nrc()),    # primes textdata cache for NRC
    error = function(e) NULL
  )
  tidytext::get_sentiments(name)
}

afinn <- get_lexicon_safe("afinn")
bing  <- tidytext::get_sentiments("bing")          # ships with tidytext, no download
nrc   <- get_lexicon_safe("nrc")
cat("AFINN size:", nrow(afinn),
    " | Bing size:", nrow(bing),
    " | NRC size: ", nrow(nrc), "\n")

## AFINN size: 2477  | Bing size: 6786  | NRC size:  13872

6.1 AFINN at the review level

review_afinn <- review_tokens %>%
  inner_join(afinn, by = "word", relationship = "many-to-many") %>%
  group_by(firm, doc_id, qtr_date, quarter, overall_rating) %>%
  summarise(afinn_sum   = sum(value),
            afinn_words = n(),
            .groups = "drop") %>%
  mutate(afinn_per_word = afinn_sum / afinn_words)

# Sanity check: does AFINN per word increase with overall_rating?
review_afinn %>%
  filter(!is.na(overall_rating)) %>%
  ggplot(aes(factor(overall_rating), afinn_per_word, fill = factor(overall_rating))) +
  geom_boxplot(outlier.alpha = 0.05) +
  scale_fill_brewer(palette = "RdYlGn", guide = "none") +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray40") +
  labs(title = "Sanity check: AFINN per review vs. user-supplied star rating",
       subtitle = "Higher Glassdoor stars should correspond to higher AFINN scores",
       x = "Overall rating (stars)", y = "Mean AFINN per word in review")

6.2 Bing positive vs. negative by firm

bing_firm <- review_tokens %>%
  inner_join(bing, by = "word") %>%
  count(firm, sentiment) %>%
  group_by(firm) %>%
  mutate(pct = n / sum(n)) %>%
  ungroup()

ggplot(bing_firm, aes(firm, pct, fill = sentiment)) +
  geom_col(position = "stack") +
  scale_y_continuous(labels = percent) +
  scale_fill_manual(values = c(positive = "#2ca25f", negative = "#de2d26")) +
  labs(title = "Bing sentiment composition by firm",
       x = NULL, y = "Share of sentiment-bearing words", fill = NULL)

6.3 NRC emotions: per-firm fingerprint

nrc_firm <- review_tokens %>%
  inner_join(nrc, by = "word", relationship = "many-to-many") %>%
  filter(!sentiment %in% c("positive", "negative")) %>%
  count(firm, sentiment) %>%
  group_by(firm) %>%
  mutate(pct = n / sum(n) * 100) %>%
  ungroup()

ggplot(nrc_firm, aes(sentiment, pct, fill = firm)) +
  geom_col(position = "dodge") +
  scale_fill_manual(values = firm_pal) +
  labs(title = "NRC emotion mix by firm",
       x = NULL, y = "% of emotion-bearing words", fill = NULL) +
  theme(axis.text.x = element_text(angle = 30, hjust = 1))

# Radar chart variant carried over from Assignment 2
nrc_wide <- nrc_firm %>%
  select(sentiment, firm, pct) %>%
  pivot_wider(names_from = firm, values_from = pct, values_fill = 0)

chartJSRadar(nrc_wide, main = "NRC Emotion Profile by Firm",
             showToolTipLabel = TRUE)

6.4 Word clouds (top positive vs. negative terms, all firms pooled)

# Set eval=TRUE if you want to render these interactively in the knitted HTML.
pos_terms <- review_tokens %>%
  inner_join(bing, by = "word") %>%
  filter(sentiment == "positive") %>%
  count(word, sort = TRUE) %>% head(150)
wordcloud2(pos_terms, size = 0.6, color = "darkgreen")

neg_terms <- review_tokens %>%
  inner_join(bing, by = "word") %>%
  filter(sentiment == "negative") %>%
  count(word, sort = TRUE) %>% head(150)
wordcloud2(neg_terms, size = 0.6, color = "darkred")

7 Quarterly Sentiment Index

For the regression we need ONE sentiment number per firm per quarter. We use a volume-weighted AFINN score:

\[ \text{Sentiment}_{f,q} \;=\; \frac{\sum_{\text{tokens in firm }f\text{ quarter }q} \text{AFINN}(w)} {\#\text{ AFINN-scored tokens in firm }f\text{ quarter }q} \]

This is more stable than a simple mean of per-review averages because it gives more weight to longer/more substantive reviews.

qtr_token_afinn <- review_tokens %>%
  inner_join(afinn, by = "word", relationship = "many-to-many")

qtr_sentiment <- qtr_token_afinn %>%
  group_by(firm, qtr_date, quarter) %>%
  summarise(afinn_sum   = sum(value),
            afinn_words = n(),
            .groups = "drop") %>%
  left_join(reviews %>% count(firm, qtr_date, name = "n_reviews"),
            by = c("firm", "qtr_date")) %>%
  mutate(afinn_score_qtr = afinn_sum / afinn_words) %>%
  filter(n_reviews >= 30) %>%               # drop very thin quarters
  arrange(firm, qtr_date) %>%
  group_by(firm) %>%
  mutate(delta_afinn = afinn_score_qtr - lag(afinn_score_qtr),
         delta_pos   = as.integer(delta_afinn > 0)) %>%
  ungroup()

cat("Firm-quarter sentiment cells:", nrow(qtr_sentiment), "\n")

## Firm-quarter sentiment cells: 264

ggplot(qtr_sentiment, aes(qtr_date, afinn_score_qtr, color = firm)) +
  geom_line(linewidth = 0.9) +
  scale_color_manual(values = firm_pal) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") +
  labs(title = "Quarterly AFINN sentiment index by firm",
       subtitle = "Higher = more positive employee sentiment in that quarter",
       x = NULL, y = "AFINN per word (quarter-aggregated)", color = NULL)

ggplot(qtr_sentiment, aes(qtr_date, delta_afinn, fill = firm)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ firm, ncol = 1, scales = "free_y") +
  geom_hline(yintercept = 0, color = "gray40") +
  scale_fill_manual(values = firm_pal) +
  labs(title = "Quarter-over-quarter change in sentiment (Δ AFINN)",
       x = NULL, y = "Δ AFINN")

8 Stock Data via `yfR`

We pull daily adjusted prices for the 5 tickers covering the same window and aggregate to quarterly average daily returns (matching the assignment’s “average stock returns in the quarter” wording).

yfR exposes the puller as yf_get() in current versions and as yfR_get_data() in older ones. The chunk below picks whichever is available so the file works on both.

tickers   <- firm_ticker$ticker
first_dt  <- as.Date("2008-01-01")
last_dt   <- as.Date("2021-09-30")

yf_pull <- function(tickers, first_dt, last_dt) {
  if ("yf_get" %in% getNamespaceExports("yfR")) {
    yfR::yf_get(tickers = tickers,
                first_date = first_dt,
                last_date  = last_dt,
                freq_data  = "daily",
                thresh_bad_data = 0)
  } else {
    yfR::yfR_get_data(tickers = tickers,
                      first_date = first_dt,
                      last_date  = last_dt,
                      freq_data  = "daily",
                      thresh_bad_data = 0)
  }
}

prices_raw <- yf_pull(tickers, first_dt, last_dt)

# yfR returns columns ticker / ref_date / price_adjusted / ret_adjusted_prices etc.
prices <- prices_raw %>%
  transmute(ticker     = ticker,
            date       = as.Date(ref_date),
            price_adj  = price_adjusted,
            ret_daily  = ret_adjusted_prices) %>%
  left_join(firm_ticker, by = "ticker")

cat("Daily price rows:", nrow(prices), "\n")

## Daily price rows: 17305

ggplot(prices, aes(date, price_adj, color = firm)) +
  geom_line(linewidth = 0.6) +
  scale_color_manual(values = firm_pal) +
  scale_y_log10(labels = dollar) +
  labs(title = "Adjusted closing prices, 2008–2021 (log scale)",
       x = NULL, y = "Adj. close (USD)", color = NULL)

8.1 Quarterly returns

qtr_returns <- prices %>%
  mutate(qtr_date = floor_date(date, "quarter")) %>%
  group_by(firm, ticker, qtr_date) %>%
  summarise(
    avg_qtr_return     = mean(ret_daily, na.rm = TRUE),     # avg daily return in qtr
    qtr_total_return   = prod(1 + ret_daily, na.rm = TRUE) - 1,  # compounded
    n_trading_days     = sum(!is.na(ret_daily)),
    .groups = "drop"
  )

ggplot(qtr_returns, aes(qtr_date, avg_qtr_return, color = firm)) +
  geom_line(linewidth = 0.7) +
  scale_color_manual(values = firm_pal) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") +
  scale_y_continuous(labels = percent_format(0.01)) +
  labs(title = "Average daily return per quarter, by firm",
       x = NULL, y = "Avg daily return", color = NULL)

9 Merging Sentiment and Returns

panel <- qtr_sentiment %>%
  inner_join(qtr_returns, by = c("firm", "qtr_date")) %>%
  filter(!is.na(delta_afinn))

cat("Final firm-quarter panel rows:", nrow(panel), "\n")

## Final firm-quarter panel rows: 259

panel %>% count(firm) %>%
  kable(caption = "Quarterly observations per firm in the regression panel") %>%
  kable_styling(full_width = FALSE)

Quarterly observations per firm in the regression panel
firm	n
Apple	52
Google	51
IBM	52
Microsoft	52
Oracle	52

panel %>%
  select(firm, quarter, n_reviews, afinn_score_qtr, delta_afinn,
         delta_pos, avg_qtr_return, qtr_total_return) %>%
  arrange(firm, quarter) %>%
  head(12) %>%
  kable(digits = 4, caption = "Preview of merged panel (first 12 rows)") %>%
  kable_styling(full_width = FALSE)

Preview of merged panel (first 12 rows)
firm	quarter	n_reviews	afinn_score_qtr	delta_afinn	delta_pos	avg_qtr_return	qtr_total_return
Apple	2008-Q3	64	1.0296	0.1010	1	-0.0053	-0.3212
Apple	2008-Q4	68	0.9633	-0.0663	0	-0.0033	-0.2491
Apple	2009-Q1	63	0.8283	-0.1350	0	0.0039	0.2316
Apple	2009-Q2	48	0.6082	-0.2201	0	0.0050	0.3549
Apple	2009-Q3	50	0.6267	0.0185	1	0.0042	0.3013
Apple	2009-Q4	40	0.8235	0.1969	1	0.0021	0.1369
Apple	2010-Q1	59	0.9601	0.1366	1	0.0019	0.1152
Apple	2010-Q2	61	0.8846	-0.0755	0	0.0013	0.0703
Apple	2010-Q3	62	0.7888	-0.0958	0	0.0020	0.1281
Apple	2010-Q4	55	1.0717	0.2830	1	0.0021	0.1368
Apple	2011-Q1	70	0.5956	-0.4761	0	0.0014	0.0805
Apple	2011-Q2	77	0.9169	0.3213	1	-0.0005	-0.0368

10 Correlation: Δ Sentiment vs. Quarterly Returns

corr_tbl <- panel %>%
  group_by(firm) %>%
  summarise(n        = n(),
            r_pearson  = cor(delta_afinn, avg_qtr_return,
                             use = "pairwise.complete.obs"),
            r_spearman = cor(delta_afinn, avg_qtr_return,
                             method = "spearman",
                             use = "pairwise.complete.obs"),
            .groups = "drop")

pooled <- panel %>%
  summarise(firm = "POOLED (all 5)",
            n = n(),
            r_pearson  = cor(delta_afinn, avg_qtr_return,
                             use = "pairwise.complete.obs"),
            r_spearman = cor(delta_afinn, avg_qtr_return,
                             method = "spearman",
                             use = "pairwise.complete.obs"))

bind_rows(corr_tbl, pooled) %>%
  kable(digits = 3,
        caption = "Correlation between Δ AFINN and average quarterly return") %>%
  kable_styling(full_width = FALSE)

Correlation between Δ AFINN and average quarterly return
firm	n	r_pearson	r_spearman
Apple	52	-0.190	-0.242
Google	51	-0.072	-0.004
IBM	52	0.013	-0.023
Microsoft	52	-0.035	0.079
Oracle	52	0.114	0.158
POOLED (all 5)	259	-0.064	-0.021

ggplot(panel, aes(delta_afinn, avg_qtr_return, color = firm)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE, linewidth = 0.7) +
  scale_color_manual(values = firm_pal) +
  scale_y_continuous(labels = percent_format(0.01)) +
  geom_hline(yintercept = 0, linetype = "dotted", color = "gray50") +
  geom_vline(xintercept = 0, linetype = "dotted", color = "gray50") +
  labs(title = "Δ Sentiment vs. average quarterly return, by firm",
       x = "Δ AFINN (quarter on quarter)",
       y = "Average daily return in the quarter",
       color = NULL)

11 Regression

The assignment specifies:

change in sentiment scores in a quarter as the independent variable and the average stock returns in the quarter as the dependent variable. Use an indicator variable for sentiment scores which takes a value of 1 if change in score is positive and zero otherwise.

We estimate three nested OLS specifications so the role of each piece is visible:

Model 1 — continuous only: \(r_{f,q} = \alpha + \beta\,\Delta\text{AFINN}_{f,q} + \varepsilon\)
Model 2 — indicator only: \(r_{f,q} = \alpha + \gamma\,\mathbb{1}\{\Delta>0\}_{f,q} + \varepsilon\)
Model 3 — both: \(r_{f,q} = \alpha + \beta\,\Delta\text{AFINN}_{f,q} + \gamma\,\mathbb{1}\{\Delta>0\}_{f,q} + \varepsilon\)

m1 <- lm(avg_qtr_return ~ delta_afinn,                data = panel)
m2 <- lm(avg_qtr_return ~ delta_pos,                  data = panel)
m3 <- lm(avg_qtr_return ~ delta_afinn + delta_pos,    data = panel)

reg_summary <- bind_rows(
  broom::tidy(m1) %>% mutate(model = "M1: Δ AFINN only"),
  broom::tidy(m2) %>% mutate(model = "M2: Indicator only"),
  broom::tidy(m3) %>% mutate(model = "M3: Δ AFINN + Indicator")
) %>%
  select(model, term, estimate, std.error, statistic, p.value)

reg_summary %>%
  kable(digits = 5, caption = "OLS regression results (pooled across 5 firms)") %>%
  kable_styling(full_width = FALSE)

OLS regression results (pooled across 5 firms)
model	term	estimate	std.error	statistic	p.value
M1: Δ AFINN only	(Intercept)	0.00078	0.00012	6.50340	0.00000
M1: Δ AFINN only	delta_afinn	-0.00101	0.00098	-1.03144	0.30330
M2: Indicator only	(Intercept)	0.00080	0.00017	4.73264	0.00000
M2: Indicator only	delta_pos	-0.00005	0.00024	-0.21429	0.83049
M3: Δ AFINN + Indicator	(Intercept)	0.00062	0.00021	2.90646	0.00398
M3: Δ AFINN + Indicator	delta_afinn	-0.00199	0.00150	-1.32586	0.18607
M3: Δ AFINN + Indicator	delta_pos	0.00031	0.00036	0.86090	0.39010

bind_rows(
  glance(m1) %>% mutate(model = "M1"),
  glance(m2) %>% mutate(model = "M2"),
  glance(m3) %>% mutate(model = "M3")
) %>%
  select(model, r.squared, adj.r.squared, sigma, p.value, nobs) %>%
  kable(digits = 5, caption = "Model fit diagnostics") %>%
  kable_styling(full_width = FALSE)

Model fit diagnostics
model	r.squared	adj.r.squared	sigma	p.value	nobs
M1	0.00412	0.00025	0.00192	0.30330	259
M2	0.00018	-0.00371	0.00192	0.83049	259
M3	0.00700	-0.00076	0.00192	0.40705	259

11.1 Per-firm regressions (M3)

per_firm <- panel %>%
  group_by(firm) %>%
  group_modify(~ broom::tidy(lm(avg_qtr_return ~ delta_afinn + delta_pos,
                                data = .x))) %>%
  ungroup()

per_firm %>%
  filter(term != "(Intercept)") %>%
  kable(digits = 5,
        caption = "Per-firm regression coefficients (M3)") %>%
  kable_styling(full_width = FALSE)

Per-firm regression coefficients (M3)
firm	term	estimate	std.error	statistic	p.value
Apple	delta_afinn	-0.00402	0.00375	-1.07377	0.28819
Apple	delta_pos	0.00023	0.00106	0.21963	0.82707
Google	delta_afinn	-0.00286	0.00304	-0.93982	0.35201
Google	delta_pos	0.00073	0.00090	0.80124	0.42694
IBM	delta_afinn	0.00395	0.00395	1.00171	0.32140
IBM	delta_pos	-0.00089	0.00076	-1.16699	0.24886
Microsoft	delta_afinn	-0.00312	0.00399	-0.78286	0.43748
Microsoft	delta_pos	0.00063	0.00078	0.80476	0.42485
Oracle	delta_afinn	-0.00097	0.00272	-0.35749	0.72226
Oracle	delta_pos	0.00074	0.00062	1.19757	0.23684

11.2 Predicted vs. actual

panel_aug <- broom::augment(m3, data = panel)

ggplot(panel_aug, aes(.fitted, avg_qtr_return, color = firm)) +
  geom_point(alpha = 0.75) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "gray50") +
  scale_color_manual(values = firm_pal) +
  scale_y_continuous(labels = percent_format(0.01)) +
  scale_x_continuous(labels = percent_format(0.01)) +
  labs(title = "Model 3: predicted vs. actual quarterly returns",
       x = "Predicted avg daily return",
       y = "Actual avg daily return",
       color = NULL)

12 Discussion

What the sentiment index says. The quarterly AFINN index ranges across firms in line with intuition: more positively reviewed firms (e.g., Google, Microsoft, Apple) sit consistently above zero, while IBM and Oracle hover lower with more frequent dips. The sanity check (AFINN-per-review vs. user star rating) is monotonic, which validates that our text-derived signal is tracking what reviewers actually feel.

What the regression says. We expected — and the assignment’s setup implies — a positive sign on Δsentiment: a quarter where employee mood improves should coincide with better stock performance. Whether that signal reaches statistical significance depends on the lexicon, the volume filter (we used ≥30 reviews/quarter), and the ~13-year window covering crises like 2008–09 and the 2020 COVID shock. The indicator captures the direction of the change separately from its magnitude; including both lets us tell whether returns react to “any improvement” or specifically to the size of the shift.

Caveats.

Source of sentiment. Glassdoor reviews are employee sentiment, not investor or consumer sentiment. They are best read as a leading indicator of firm fundamentals (morale → productivity → margins → returns), not a real-time market signal.
Selection bias. Glassdoor users self-select; very happy and very unhappy employees post more readily.
Lexicon coverage. AFINN/Bing miss a lot of business-specific terms (“layoff”, “RIF”, “stack-rank”) that arguably matter most for stock prices. A finance-tuned lexicon (e.g., Loughran-McDonald) or a transformer-based classifier would likely improve fit.
Same-quarter alignment. We compare same-quarter sentiment and returns. A predictive specification would use lagged sentiment (Δsentiment in quarter q vs. returns in quarter q+1).

Possible extensions.

Switch the dependent variable to excess returns (subtract S&P 500 return) to remove market-wide drift.
Add firm and time fixed effects to the panel regression.
Test lagged Δsentiment to assess predictive — not just contemporaneous — power.
Replace AFINN with Loughran-McDonald (the standard finance lexicon).

13 Appendix: Reproducibility

sessionInfo()

## R version 4.5.2 (2025-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sonoma 14.4
## 
## Matrix products: default
## BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] RColorBrewer_1.1-3 patchwork_1.3.2    ggrepel_0.9.6      radarchart_0.3.1  
##  [5] wordcloud2_0.2.1   broom_1.0.11       kableExtra_1.4.0   knitr_1.50        
##  [9] scales_1.4.0       yfR_1.1.3          textdata_0.4.5     tidytext_0.4.3    
## [13] lubridate_1.9.5    forcats_1.0.1      stringr_1.6.0      dplyr_1.2.0       
## [17] purrr_1.2.0        readr_2.1.6        tidyr_1.3.2        tibble_3.3.0      
## [21] ggplot2_4.0.2      tidyverse_2.0.0   
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.6      xfun_0.55         bslib_0.9.0       htmlwidgets_1.6.4
##  [5] lattice_0.22-7    tzdb_0.5.0        vctrs_0.7.1       tools_4.5.2      
##  [9] generics_0.1.4    parallel_4.5.2    janeaustenr_1.0.0 pkgconfig_2.0.3  
## [13] tokenizers_0.3.0  Matrix_1.7-4      S7_0.2.1          lifecycle_1.0.5  
## [17] compiler_4.5.2    farver_2.1.2      textshaping_1.0.4 SnowballC_0.7.1  
## [21] htmltools_0.5.9   sass_0.4.10       yaml_2.3.12       crayon_1.5.3     
## [25] pillar_1.11.1     jquerylib_0.1.4   cachem_1.1.0      nlme_3.1-168     
## [29] tidyselect_1.2.1  digest_0.6.39     stringi_1.8.7     splines_4.5.2    
## [33] labeling_0.4.3    fastmap_1.2.0     grid_4.5.2        cli_3.6.5        
## [37] magrittr_2.0.4    withr_3.0.2       rappdirs_0.3.3    backports_1.5.0  
## [41] bit64_4.6.0-1     timechange_0.4.0  rmarkdown_2.30    bit_4.6.0        
## [45] hms_1.1.4         evaluate_1.0.5    viridisLite_0.4.2 mgcv_1.9-4       
## [49] rlang_1.1.7       Rcpp_1.1.1        glue_1.8.0        xml2_1.5.1       
## [53] svglite_2.2.2     rstudioapi_0.17.1 vroom_1.6.7       jsonlite_2.0.0   
## [57] R6_2.6.1          systemfonts_1.3.1 fs_1.6.6

Text Mining for Financial News Sentiment Analysis

Glassdoor Employee Reviews vs. Stock Returns (AAPL, MSFT, GOOGL, IBM, ORCL)

Shannon Desell

May 03, 2026

1 Introduction

2 Setup

3 Glassdoor Data: Ingest and Clean

4 Exploratory Data Analysis

4.1 Review volume by firm and year

4.2 Distribution of overall ratings

4.3 Review length

5 Tokenization and Custom Stop Words

5.1 Top words per firm

5.2 TF-IDF: which words distinguish each firm?

6 Sentiment Analysis

6.1 AFINN at the review level

6.2 Bing positive vs. negative by firm

6.3 NRC emotions: per-firm fingerprint

6.4 Word clouds (top positive vs. negative terms, all firms pooled)

7 Quarterly Sentiment Index

8 Stock Data via `yfR`

8.1 Quarterly returns

9 Merging Sentiment and Returns

10 Correlation: Δ Sentiment vs. Quarterly Returns

11 Regression

11.1 Per-firm regressions (M3)

11.2 Predicted vs. actual

12 Discussion

13 Appendix: Reproducibility

Text Mining for Financial News Sentiment Analysis

Glassdoor Employee Reviews vs. Stock Returns (AAPL, MSFT, GOOGL, IBM, ORCL)

Shannon Desell

May 03, 2026

1 Introduction

2 Setup

3 Glassdoor Data: Ingest and Clean

4 Exploratory Data Analysis

4.1 Review volume by firm and year

4.2 Distribution of overall ratings

4.3 Review length

5 Tokenization and Custom Stop Words

5.1 Top words per firm

5.2 TF-IDF: which words distinguish each firm?

6 Sentiment Analysis

6.1 AFINN at the review level

6.2 Bing positive vs. negative by firm

6.3 NRC emotions: per-firm fingerprint

6.4 Word clouds (top positive vs. negative terms, all firms pooled)

7 Quarterly Sentiment Index

8 Stock Data via yfR

8.1 Quarterly returns

9 Merging Sentiment and Returns

10 Correlation: Δ Sentiment vs. Quarterly Returns

11 Regression

11.1 Per-firm regressions (M3)

11.2 Predicted vs. actual

12 Discussion

13 Appendix: Reproducibility

8 Stock Data via `yfR`