Final Project ECI 588

Author

Delaney Burns

Understanding Student Perceptions: A Text Mining Analysis of RateMyProfessors Reviews

Prepare

Introduction

Instructors and institutions often rely on student feedback to improve course design and teaching practices. RateMyProfessors.com (RMP) offers a unique source of informal, publicly available student reviews that capture unfiltered perceptions of teaching effectiveness, course rigor, and overall satisfaction. These reviews are rich in natural language, making them ideal for text mining and sentiment analysis.

The goal of this project is to explore how sentiment and thematic content differ between high- and low-rated professors. By combining sentiment analysis using the Bing lexicon and topic modeling using Structural Topic Modeling (STM), this study aims to uncover not only whether students feel positively or negatively about their instructors, but also what they talk about when describing their classroom experiences. These insights can inform instructional design, course evaluation practices, and student support strategies.

Research Questions

How does sentiment differ between high- and low-rated professors on RateMyProfessors.com?
What topics or themes do students emphasize when reviewing high- versus low-rated instructors?

Load Libraries

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(tidytext)
library(readr)
library(stringr)
library(ggplot2)
library(wordcloud)

Loading required package: RColorBrewer

library(textdata)
library(knitr)
library(dplyr)
library(tm)

Loading required package: NLP

Attaching package: 'NLP'

The following object is masked from 'package:ggplot2':

    annotate

library(topicmodels)
library(SnowballC)
library(stm)

stm v1.3.7 successfully loaded. See ?stm for help. 
 Papers, resources, and other materials at structuraltopicmodel.com

library(ldatuning)
library(LDAvis)

Load, Combine, and Preview RMP Data from Multiple CSV Files

Multiple CSV files were combined into one dataset. Columns relevant to the analysis such as professor name, star rating, department, and free-text comments were selected for further processing.

folder_path <- "data"
file_list <- list.files(path = folder_path, pattern = "\\.csv$", full.names = TRUE)


combined_data <- file_list %>%
  lapply(function(file) read_csv(file, col_types = cols(.default = "c"))) %>%
  bind_rows()

combined_data <- combined_data %>%
  select(professor_name, school_name, department_name, star_rating, diff_index, comments)

glimpse(combined_data)

Rows: 19,685
Columns: 6
$ professor_name  <chr> "Robert  Olshansky", "Marshall  Levett", "Marshall  Le…
$ school_name     <chr> "University Of Illinois at Urbana-Champaign", "Austin …
$ department_name <chr> "Urban & Regional Planning department", "Counseling de…
$ star_rating     <chr> "3.5", "5.0", "5.0", "3.6", "3.6", "3.6", "3.6", "3.0"…
$ diff_index      <chr> "2.0", "1.0", "1.0", "4.5", "4.5", "4.5", "4.5", "2.0"…
$ comments        <chr> "Good guy, laid back and interested in his field. Clas…
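
The loading step above can be sketched on throwaway files. This is a minimal illustration, not the project's data: the temporary folder, the two tiny CSV files, and the "N/A" entry are all hypothetical. It shows one reason to read every column as character first: files whose columns would otherwise be guessed as different types can still be stacked with bind_rows(), and conversion happens once afterwards.

```r
library(readr)
library(dplyr)

# Hypothetical demo folder with two small CSV files (not the project data)
tmp_dir <- tempfile("rmp_demo")
dir.create(tmp_dir)
write_csv(tibble(professor_name = "A", star_rating = "4.5"),
          file.path(tmp_dir, "a.csv"))
write_csv(tibble(professor_name = c("B", "C"), star_rating = c("N/A", "3.0")),
          file.path(tmp_dir, "b.csv"))

files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every column as character so the files always bind cleanly
demo <- files %>%
  lapply(read_csv, col_types = cols(.default = "c")) %>%
  bind_rows()

# Convert once after binding; unparseable entries such as "N/A" become NA
demo <- demo %>%
  mutate(star_rating = suppressWarnings(as.numeric(star_rating)))

print(demo)
```

Rows with an NA rating are then easy to drop in the wrangling step with filter(!is.na(star_rating)).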

Wrangle

Clean & Transform Data

Ratings were converted to numeric values, and reviews were labeled “High” (star rating of 4.0 or above) or “Low” (below 4.0). Reviews missing text or a rating were removed to ensure clean input for text analysis.

combined_data <- combined_data %>%
  mutate(
    star_rating = as.numeric(star_rating),
    diff_index = as.numeric(diff_index)
  ) %>%
  filter(!is.na(comments), !is.na(star_rating))


combined_data <- combined_data %>%
  mutate(rating_group = if_else(star_rating >= 4, "High", "Low"))
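
Because star ratings are fractional (for example 3.5 or 3.6), it may help to see how the cutoff behaves at the boundary. The sketch below applies the same conversion, filter, and labeling to a few made-up ratings; the toy_ratings values are hypothetical.

```r
library(dplyr)

# Hypothetical ratings stored as text, including one missing value
toy_ratings <- tibble(star_rating = c("5.0", "4.0", "3.9", "3.5", NA)) %>%
  mutate(star_rating = as.numeric(star_rating)) %>%
  filter(!is.na(star_rating)) %>%
  # Same rule as above: 4.0 and up is "High", everything else is "Low"
  mutate(rating_group = if_else(star_rating >= 4, "High", "Low"))

print(toy_ratings)
```

A rating of exactly 4.0 falls in the "High" group, while 3.9 falls in "Low".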

Tokenization and Word Frequency Visualization

Review comments were broken into individual words using tokenization. Stop words were removed, and the most common words were visualized using a word cloud to highlight frequent student language patterns.

comment_words <- combined_data %>%
  select(professor_name, rating_group, comments) %>%
  unnest_tokens(word, comments) %>%
  anti_join(stop_words, by = "word")

word_counts <- comment_words %>%
  count(word, sort = TRUE)

set.seed(123)

wordcloud(words = word_counts$word,
          freq = word_counts$n,
          max.words = 50,
          random.order = FALSE,
          scale = c(3, 0.5),
          colors = brewer.pal(7, "Dark2"))

Word Cloud Interpretation: Words like “class,” “professor,” and “teacher” appeared most frequently, reflecting their central role in student feedback on RateMyProfessors.com.
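
The tokenization and counting steps behind the word cloud can be illustrated on two made-up reviews. The toy_reviews text is hypothetical, and which words survive depends on the stop_words list bundled with tidytext, so only the general pattern should be read from this sketch.

```r
library(dplyr)
library(tidytext)

# Two hypothetical reviews
toy_reviews <- tibble(
  rating_group = c("High", "Low"),
  comments = c("Great professor, the lectures were clear and fun.",
               "The exams were confusing and the lectures were boring.")
)

# One row per word, lowercased, with punctuation stripped
toy_words <- toy_reviews %>%
  unnest_tokens(word, comments) %>%
  anti_join(stop_words, by = "word")

# Word frequencies, most common first
toy_counts <- toy_words %>%
  count(word, sort = TRUE)

print(toy_counts)
```

Function words such as "the" and "were" are dropped, and "lectures" rises to the top because it appears in both reviews.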

Prepare Documents for STM

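Review text from a saved random sample of 10,000 reviews was prepared for STM in three steps. First, based on the word cloud, generic words such as “class” and “professor” were removed to improve the clarity of the topics. Next, the remaining text was lowercased and stripped of stop words, numbers, and punctuation. Finally, the reviews were converted into the documents, vocabulary, and metadata objects that STM requires, keeping only terms that appear in more than 10 documents (lower.thresh = 10).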

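# The sample was drawn once and saved to disk. The code is left commented out
# so that every render reloads the same reviews and the results stay reproducible.
# sample_data <- combined_data %>%
#   filter(!is.na(comments)) %>%
#   sample_n(10000)
#
# saveRDS(sample_data, "sample_data.rds")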

sample_data <- readRDS("sample_data.rds")

#Removing unhelpful words from data 
custom_stopwords <- c("class", "professor", "teacher", "course", "student", "lecture", "subject", "material", "things", "really", "students", "professors")

pattern <- paste0("\\b(", paste(custom_stopwords, collapse = "|"), ")\\b")

sample_data$comments <- gsub(pattern, "", sample_data$comments, ignore.case = TRUE)

processed <- textProcessor(documents = sample_data$comments,
                           metadata = sample_data,
                           lowercase = TRUE,
                           removestopwords = TRUE,
                           removenumbers = TRUE,
                           removepunctuation = TRUE,
                           stem = FALSE,
                           wordLengths = c(3, Inf))

Building corpus... 
Converting to Lower Case... 
Removing punctuation... 
Removing stopwords... 
Removing numbers... 
Creating Output...

prep <- prepDocuments(processed$documents,
                      processed$vocab,
                      processed$meta,
                      lower.thresh = 10)

Removing 11164 of 12786 terms (22721 of 175236 tokens) due to frequency 
Removing 39 Documents with No Words 
Your corpus now has 9949 documents, 1622 terms and 152515 tokens.
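
The custom stop-word removal used above relies on word boundaries (\\b), so only whole words are deleted. A minimal sketch with a shortened word list and two made-up comments (both hypothetical) shows the effect; the extra whitespace cleanup is added here only to make the printed result easier to read.

```r
# Shortened, hypothetical version of the custom stop-word list
custom_stopwords <- c("class", "professor", "student", "students")

# \\b anchors the match to whole words, so "classy" and "classic" are kept
pattern <- paste0("\\b(", paste(custom_stopwords, collapse = "|"), ")\\b")

toy_comments <- c("The class was great and the Professor helps students.",
                  "A classy take on a classic text.")

cleaned <- gsub(pattern, "", toy_comments, ignore.case = TRUE)

# Collapse the gaps left behind by the removed words
cleaned <- trimws(gsub("\\s+", " ", cleaned))

print(cleaned)
```

The first comment loses "class", "Professor", and "students" regardless of capitalization, while the second is left unchanged.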

Analyze

Sentiment Analysis

Using the Bing sentiment lexicon, words were categorized as positive or negative. The number of positive and negative words in each review group (High vs. Low) was compared to understand the overall emotional tone.

bing_sentiment <- comment_words %>%
  inner_join(get_sentiments("bing")) %>%
  count(rating_group, sentiment)

Joining with `by = join_by(word)`

ggplot(bing_sentiment, aes(x = rating_group, y = n, fill = sentiment)) +
  geom_col(position = "dodge") +
  labs(title = "Positive vs Negative Word Count (Bing Sentiment)",
       x = "Rating Group", y = "Word Count") +
  scale_fill_manual(values = c("positive" = "#2ca02c", "negative" = "#d62728")) +
  theme_minimal()

bing_sentiment

# A tibble: 4 × 3
  rating_group sentiment     n
  <chr>        <chr>     <int>
1 High         negative   7372
2 High         positive  20734
3 Low          negative  13479
4 Low          positive  14074

This bar chart compares the number of positive and negative words in RateMyProfessors.com reviews, split by rating group using the Bing sentiment lexicon. Reviews of high-rated professors contain far more positive words than negative ones (20,734 vs. 7,372, or about 74% positive), suggesting a generally favorable emotional tone. Low-rated reviews are almost evenly split (14,074 positive vs. 13,479 negative, or about 51% positive), and they contain roughly 1.8 times as many negative words as high-rated reviews.

These results address the first research question: How does sentiment differ between high- and low-rated professors? Students use more positive language when describing well-regarded instructors, while reviews of poorly rated professors are more negative overall and mix praise and critique in roughly equal measure.
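
The raw counts can be turned into shares and checked with a simple test of association. The sketch below re-enters the four counts from the table above and runs a chi-squared test in base R. This test is an addition for illustration, not part of the original analysis, and it treats each word as an independent observation, which is a strong assumption for words that come from the same review.

```r
# Counts copied from the bing_sentiment table above
counts <- matrix(
  c(20734,  7372,
    14074, 13479),
  nrow = 2, byrow = TRUE,
  dimnames = list(rating_group = c("High", "Low"),
                  sentiment = c("positive", "negative"))
)

# Share of sentiment-bearing words that are positive, by rating group
positive_share <- counts[, "positive"] / rowSums(counts)
print(round(positive_share, 3))

# Chi-squared test of association between rating group and sentiment
# (assumes independent words, so read the p-value as a rough guide only)
test_result <- chisq.test(counts)
print(test_result)
```

The positive share is about 0.74 for the High group and about 0.51 for the Low group, which matches the gap visible in the bar chart.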

Structural Topic Modeling (STM)

stm_model <- stm(documents = prep$documents,
                 vocab = prep$vocab,
                 K = 3,
                 data = prep$meta,
                 prevalence = ~ rating_group,
                 seed = 123,
                 verbose = FALSE)
stm_model

A topic model with 3 topics, 9949 documents and a 1622 word dictionary.

custom_labels <- c("Challenging Course Content",
                   "Instructor Demeanor",
                   "Effective Teaching")


plot(stm_model,
     type = "summary",
     custom.labels = custom_labels,
     main = "Top Topics",
     xlab = "Expected Topic Proportions")

labelTopics(stm_model, n = 10)

Topic 1 Top Words:
     Highest Prob: tests, dont, one, help, lectures, like, makes, helpful, questions, interesting 
     FREX: need, difficult, online, lab, one, interesting, isnt, dont, credit, questions 
     Lift: est, view, paced, econ, expectations, tas, difficulty, leaving, costs, practice 
     Score: dont, est, tests, one, questions, help, makes, interesting, need, lectures 
Topic 2 Top Words:
     Highest Prob: easy, hard, just, can, lot, get, nice, shes, much, always 
     FREX: reading, shes, nice, stuff, still, math, even, every, pretty, talking 
     Lift: confuses, videos, midterms, reading, technology, annoying, decent, stuff, night, handwriting 
     Score: easy, can, nice, shes, get, just, hard, confuses, make, much 
Topic 3 Top Words:
     Highest Prob: will, great, take, good, work, best, time, hes, ever, recommend 
     FREX: will, teach, awesome, learned, week, fun, far, comments, hes, teaching 
     Lift: among, proffesor, environment, style, participate, reasonable, weekly, couldnt, progress, average 
     Score: will, great, best, hes, gives, among, good, fun, teaching, know

This plot shows the expected proportion of each topic across all student reviews. Topic 3, which highlights effective and enjoyable instruction, is the most prevalent, suggesting that teaching quality and satisfaction dominate student feedback. Topic 1, which covers concerns about difficulty and online components, is less prevalent overall but more prominent in low-rated reviews.

Topic Interpretation Summary

Topic	Top FREX Words	Label	Interpretation Summary
1	need, difficult, online, lab, questions	Challenging Course Content	Focuses on difficulty, online structure, labs, and unclear expectations. Students describe challenges with material, support, and assessments.
2	reading, shes, nice, stuff, still	Instructor Demeanor	Captures informal impressions of the professor’s personality and tone, along with quick reactions such as “easy” or “hard”.
3	will, teach, awesome, learned, fun	Effective Teaching	Highlights strong teaching quality, enjoyment, and positive learning outcomes. Students use praise like “great”, “fun”, “best”, “recommend”.

# Regress topic proportions on rating group; metadata is the full argument name
effects <- estimateEffect(1:3 ~ rating_group, stm_model, metadata = prep$meta, uncertainty = "Global")



plot.estimateEffect(effects, covariate = "rating_group", model = stm_model,
                    method = "difference", 
                    cov.value1 = "High", 
                    cov.value2 = "Low",
                    topics = 1:3,
                    xlab = "Topic Proportion (High - Low)",
                    main = "Topic Usage by Rating Group")

This figure compares topic usage between high- and low-rated professor reviews. Topics 1 and 2 are used more frequently in low-rated reviews, pointing to common student frustrations around course difficulty and inconsistent instructor behavior. In contrast, Topic 3, which is centered on teaching excellence and student satisfaction, appears significantly more in high-rated reviews. This pattern aligns with expressions of praise and positive experiences.
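
The quantity on the x-axis, topic proportion for High minus Low, can be illustrated with a simplified descriptive version. The sketch below uses a made-up document-topic matrix (toy_theta) and group labels, both hypothetical, and compares plain group means. It is not what estimateEffect() computes, since that function fits a regression and propagates uncertainty from the topic model, but the sign of the difference reads the same way.

```r
# Hypothetical topic proportions for four reviews (each row sums to 1)
toy_theta <- matrix(
  c(0.6, 0.3, 0.1,
    0.5, 0.3, 0.2,
    0.1, 0.3, 0.6,
    0.2, 0.2, 0.6),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("Topic 1", "Topic 2", "Topic 3"))
)
toy_group <- c("Low", "Low", "High", "High")

# Mean topic proportion within each rating group
high_mean <- colMeans(toy_theta[toy_group == "High", ])
low_mean  <- colMeans(toy_theta[toy_group == "Low", ])

# Positive values: more common in High; negative values: more common in Low
difference <- high_mean - low_mean
print(round(difference, 2))
```

In this toy example Topic 1 has a negative difference (more common in the Low group) and Topic 3 a positive one, mirroring how the figure above is read.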

Communicate

This analysis examined RateMyProfessors.com reviews to explore how students describe their experiences with college instructors and how those descriptions differ between high- and low-rated professors. Two research questions guided the project:

How does sentiment differ between high- and low-rated professors?
What topics or themes do students emphasize when reviewing high- versus low-rated instructors?

To address these questions, the project used sentiment analysis and structural topic modeling (STM). A word cloud provided an initial view of frequently used terms, helping identify common patterns in student language. Generic terms like “class” and “professor” were removed to reveal more meaningful content.

Sentiment analysis using the Bing lexicon showed that reviews of high-rated professors contained far more positive than negative words (e.g., “great,” “helpful,” “fun,” “recommend”), while low-rated reviews were split almost evenly and used much more negative language (e.g., “confusing,” “boring,” “difficult,” “unclear”). This answers the first research question: emotional tone tracks the rating group.

Structural Topic Modeling (STM) was used to address the second research question. Three topics were identified:

Topic 1: Challenging Course Content - focused on test difficulty, unclear assignments, online delivery, and expectations. This topic was more common in low-rated reviews, suggesting that structural course challenges contribute to student dissatisfaction.
Topic 2: Instructor Demeanor - captured interpersonal impressions through terms like “nice,” “easy,” and “hard.” It reflected emotional reactions and was more evenly split between rating groups, although the difference plot placed it on the low-rated side.
Topic 3: Effective Teaching - described professors as “great,” “awesome,” “fun,” and “the best.” This topic appeared most often in high-rated reviews, indicating that clear and engaging instruction drives student satisfaction.

Two visualizations further supported these results:

A topic prevalence plot showed that Effective Teaching was the most discussed theme overall.
A topic usage difference plot revealed that Challenging Course Content was far more common in low-rated reviews, while Effective Teaching was strongly associated with high-rated reviews.

Key Insights & Implications

These findings suggest that students value clarity, support, and enthusiasm from instructors. Positive reviews focus on enjoyable and effective teaching, while negative reviews often stem from confusion, poor course structure, or disengagement. Faculty and instructional designers can use this insight to improve student experience by prioritizing communication, accessibility, and thoughtful course design.

Limitations & Ethical Considerations

This study uses publicly available, anonymous student reviews, which may be biased toward extreme opinions. Because the data are self-reported, they may not fully reflect classroom realities or the views of all students. No personal or private information was used, and ethical use of data was maintained throughout.