Unit 2 Independent Study: Analyzing Sentiment

Nancy

Research Context:

The data for this project comes from r/Teachers, a subreddit on Reddit. I used the RedditExtractoR package to collect posts containing the keywords “AI literacy,” “AI ethics,” “AI in education,” and “AI teaching.” Given that AI integration has become a highly debated topic among educators in recent years, this study aims to explore teachers’ opinions on the subject. Specifically, I seek to understand the optimism and concerns they express about implementing AI in their classrooms.

Questions:

  • What are the trends in discussions and posts about AI in education among this group of teachers over the past years?

  • Which sentiment analysis method—AFINN, BING, or VADER—captures sentiment most accurately in this context?

  • What are the general sentiments expressed by teachers toward AI in education?

Wrangling data

Before tokenizing the data for sentiment analysis, I first cleaned the dataset by removing duplicates, punctuation, URLs, and numbers, eliminating unnecessary elements that could interfere with analysis. After preprocessing, I created a timeline to visualize how discussions on AI in education have evolved over the past years.
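
The cleaning and tokenizing steps described above can be sketched roughly as follows. This is a minimal illustration rather than the exact pipeline used: the `posts` rows are made-up stand-ins, and the `custom_stop_words` list is an assumption (stripping apostrophes turns "I'm" into "im", which the default stop word list does not cover).

```r
library(dplyr)
library(stringr)
library(tidytext)

# Hypothetical stand-in rows, not the collected Reddit data
posts <- tibble(
  date_utc = c("2023-09-11", "2023-09-11", "2024-01-05"),
  text = c(
    "AI grading tools saved me 3 hours! https://example.com",
    "AI grading tools saved me 3 hours! https://example.com",
    "I'm worried about plagiarism, honestly."
  )
)

clean_posts <- posts |>
  distinct() |>                                    # drop duplicate posts
  mutate(
    text = str_remove_all(text, "https?://\\S+"),  # remove URLs first
    text = str_remove_all(text, "[[:punct:]]"),    # then punctuation
    text = str_remove_all(text, "[[:digit:]]+"),   # then numbers
    text = str_squish(text)                        # collapse leftover spaces
  )

# Assumed custom stop words for debris created by the cleaning step
custom_stop_words <- tibble(word = c("im"))

# One row per word (unnest_tokens also lowercases), minus stop words
tidy_words <- clean_posts |>
  unnest_tokens(word, text) |>
  anti_join(stop_words, by = "word") |>
  anti_join(custom_stop_words, by = "word")

count(tidy_words, word, sort = TRUE)
```

Removing URLs before punctuation matters here: once the slashes and dots are gone, a URL can no longer be matched as a single unit.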

Question 1:

The bar chart illustrates the yearly trends in discussions about AI-related topics in r/Teachers. Before 2020, posts mentioning AI in education were scarce, with only a few each year. However, the data reveals a sharp rise in discussions starting in 2022, reflecting increasing interest and engagement among teachers.
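
The yearly counts behind a chart like this can be produced along these lines. It is a sketch under assumptions: the `posts` rows are invented for illustration, and `date_utc` is assumed to be a `YYYY-MM-DD` string, as it appears in the printed output.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in rows; the real data has one row per collected post
posts <- tibble(
  date_utc = c("2019-03-02", "2022-11-30", "2023-01-15", "2023-09-11")
)

# Take the year from the date string and count posts per year
posts_per_year <- posts |>
  mutate(year = as.integer(substr(date_utc, 1, 4))) |>
  count(year, name = "n_posts")

# Bar chart of posts per year
year_plot <- ggplot(posts_per_year, aes(x = factor(year), y = n_posts)) +
  geom_col() +
  labs(x = "Year", y = "Number of posts")

year_plot
```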

# A tibble: 6,858 × 2
   word         n
   <chr>    <int>
 1 ai         725
 2 students   565
 3 school     331
 4 im         330
 5 teacher    301
 6 teachers   285
 7 teaching   265
 8 student    241
 9 kids       231
10 class      224
# ℹ 6,848 more rows
# A tibble: 31,610 × 4
   date_utc   timestamp  comments word        
   <chr>      <chr>         <dbl> <chr>       
 1 2023-09-11 1694431464       25 current     
 2 2023-09-11 1694431464       25 ai          
 3 2023-09-11 1694431464       25 edtech      
 4 2023-09-11 1694431464       25 landscape   
 5 2023-09-11 1694431464       25 automated   
 6 2023-09-11 1694431464       25 grading     
 7 2023-09-11 1694431464       25 systems     
 8 2023-09-11 1694431464       25 personalized
 9 2023-09-11 1694431464       25 learning    
10 2023-09-11 1694431464       25 platforms   
# ℹ 31,600 more rows
# A tibble: 6,854 × 2
   word         n
   <chr>    <int>
 1 ai         725
 2 students   565
 3 school     331
 4 teacher    301
 5 teachers   285
 6 teaching   265
 7 student    241
 8 kids       231
 9 class      224
10 time       211
# ℹ 6,844 more rows
# A tibble: 2 × 5
  sentiment     n method total percent
  <chr>     <int> <chr>  <int>   <dbl>
1 positive   1694 AFINN   3229    52.5
2 negative   1535 AFINN   3229    47.5
# A tibble: 2 × 5
  sentiment     n method total percent
  <chr>     <int> <chr>  <int>   <dbl>
1 negative   1895 BING    3654    51.9
2 positive   1759 BING    3654    48.1

Sentiment Analysis Using AFINN and BING

The AFINN and BING results are fairly balanced. AFINN classifies slightly more of the matched words as positive (52.5%) than BING does (48.1%). This aligns with my initial impressions from reviewing the posts: while there are concerns about plagiarism, bias, and the potential threat AI poses to teaching jobs, many educators also express optimism about AI’s ability to support and enhance their work, particularly by reducing workload and improving efficiency.
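
Summary tables with `sentiment`, `n`, `method`, `total`, and `percent` columns can be built roughly like this. The sketch uses tiny toy lexicons with illustrative entries so that it runs on its own; in a real analysis the lexicons would come from `get_sentiments("afinn")` and `get_sentiments("bing")` in tidytext. The helper name `summarise_polarity` is hypothetical.

```r
library(dplyr)

# Toy tokens and toy lexicons (illustrative entries, not the full word lists)
tokens <- tibble(word = c("fun", "amazing", "hell", "plagiarism", "fun"))
afinn_toy <- tibble(word = c("fun", "amazing", "hell"), value = c(4, 4, -4))
bing_toy <- tibble(
  word = c("fun", "amazing", "hell", "plagiarism"),
  sentiment = c("positive", "positive", "negative", "negative")
)

# Count words per polarity and express each count as a share of all matches
summarise_polarity <- function(scored, method) {
  scored |>
    count(sentiment) |>
    mutate(method = method, total = sum(n), percent = round(100 * n / total, 1))
}

# AFINN gives numeric scores, so the sign is converted to a polarity label
afinn_summary <- tokens |>
  inner_join(afinn_toy, by = "word") |>
  mutate(sentiment = if_else(value > 0, "positive", "negative")) |>
  summarise_polarity("AFINN")

# BING already labels each word as positive or negative
bing_summary <- tokens |>
  inner_join(bing_toy, by = "word") |>
  summarise_polarity("BING")

bind_rows(afinn_summary, bing_summary)
```

Because each lexicon covers a different vocabulary, the `total` of matched words differs between methods, which is one reason the two percentages are not directly interchangeable.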

Comparison of AFINN and BING

Top negative words

The chart presents the most frequently mentioned negative words identified using the AFINN lexicon in the sentiment analysis of teacher discussions on AI in education. Words such as “shit,” “hell,” and “fuck” have been grouped, counted, and ranked based on their sentiment values (negative scores) and frequency of occurrence.
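
One possible reading of "ranked by sentiment value and frequency" is to rank each word by its total contribution, that is, its count multiplied by its AFINN score. The sketch below follows that assumption with a toy lexicon and made-up tokens; the actual ranking used for the chart may differ.

```r
library(dplyr)

# Toy tokens and an illustrative lexicon (not the full AFINN list)
tokens <- tibble(word = c("hell", "hell", "hell", "bad", "bad", "fun", "awful"))
afinn_toy <- tibble(
  word = c("hell", "bad", "fun", "awful"),
  value = c(-4, -3, 4, -3)
)

top_negative <- tokens |>
  inner_join(afinn_toy, by = "word") |>
  filter(value < 0) |>                  # keep negative words only
  count(word, value, name = "n") |>     # frequency of each word
  mutate(contribution = n * value) |>   # frequency weighted by score
  arrange(contribution) |>              # most negative contribution first
  slice_head(n = 10)

top_negative
```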

Top positive words

The bar chart shows the 10 most frequent positive words identified with AFINN. “Fun” and “amazing” appear more often than the top negative words, which suggests that positive sentiment may dominate. I return to this question later with VADER, which takes context into account.

Accuracy Analysis: AFINN

After summing the sentiment values of all words within each post, I extracted the posts with the highest positive and the most negative totals. The most positive post is: “ChatGPT is literally my favourite thing ever. I cannot express how much it has helped me every day for the past few months. The more I use it, the more I realize it can do, the more confidence I have to apply my ideas, because I feel like I can accomplish more… It’s like having a personal research assistant … more efficient, and ….I am in a new era of my career.” This statement clearly expresses a highly positive sentiment and strong enthusiasm for AI.

The most negative post is:

“I got this sub recommended to me on Reddit a little while ago and then I read through this sub’s stories and well… where the fuck do I even start?
Horror story .., abusive work environments, shitty admin that fails to a toothpick, horrible parents and students alike …. The recent trends with AI and technology causing students to not give two fucks about the world around them is befuddling to me. I’m a “Gen Z” student….. I had my own screw ups but I was interested in learning shit about the world around me. To see that curiosity gone from students pisses me off.”
This comment demonstrates extreme frustration and a strongly negative view on the impact of AI and technology on students.
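
The per-post totals can be computed roughly as follows. This is a sketch with made-up scores; it assumes that `timestamp` serves as the post identifier and that the summed score is stored as `id_value`, matching the column names in the printed output.

```r
library(dplyr)

# Hypothetical word-level scores: one row per scored word, keyed by post
scored_words <- tibble(
  timestamp = c("t1", "t1", "t1", "t2", "t2", "t3"),
  value     = c(4, 3, 2, -4, -3, 1)
)

# Sum the word scores within each post, highest total first
post_scores <- scored_words |>
  group_by(timestamp) |>
  summarise(id_value = sum(value), .groups = "drop") |>
  arrange(desc(id_value))

# Pull out the extremes at both ends of the scale
most_positive <- slice_max(post_scores, id_value, n = 1)
most_negative <- slice_min(post_scores, id_value, n = 1)

post_scores
```

One caveat with summed scores is that they grow with post length, so long posts are more likely to land at either extreme. Averaging the score per scored word is a possible alternative when posts vary widely in length.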

# A tibble: 6 × 2
  timestamp  id_value
  <chr>         <dbl>
1 1684554438       69
2 1618542557       41
3 1702723897       36
4 1691710330       35
5 1707946323       32
6 1718184186       28
# A tibble: 1 × 6
  text                               date_utc timestamp title subreddit comments
  <chr>                              <chr>    <chr>     <chr> <chr>        <dbl>
1 "ChatGPT is literally my favourit… 2023-05… 16845544… Chat… Teachers        25
# A tibble: 1 × 6
  text                               date_utc timestamp title subreddit comments
  <chr>                              <chr>    <chr>     <chr> <chr>        <dbl>
1 "I got this sub recommended to me… 2024-06… 17183585… Gen … Teachers       570

VADER sentiment analysis

VADER classifies far more posts as positive (261) than as negative (62) or neutral (2). This is a notable departure from the AFINN and BING results, which suggested a roughly balanced split (slightly positive for AFINN, slightly negative for BING).

However, the most positive and most negative posts identified by VADER are the same as those calculated using AFINN, suggesting some level of consistency between methods.


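```{r}
library(dplyr)
library(tidyr)
library(vader)

# Score every post with VADER; the result includes a compound score per text
vader_texts <- vader_df(tidy_texts$text)

vader_texts
mean(vader_texts$compound)

# Label posts with the +/-0.05 compound thresholds, then count each label
vader_texts_summary <- vader_texts |>
  mutate(sentiment = ifelse(compound >= 0.05, "positive",
                     ifelse(compound <= -0.05, "negative", "neutral"))) |>
  count(sentiment, sort = TRUE) |>
  spread(sentiment, n) |>
  relocate(positive) |>
  mutate(ratio = negative / positive)

vader_texts_summary
```

```{r}
#| eval: false
# Export the text and compound score of each post for manual review
discussion <- vader_texts |> select(text, compound)

write.csv(discussion, "discussion.csv", row.names = FALSE)
```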

Analysis

To evaluate the accuracy of VADER, which is designed to be more context-sensitive, I manually reviewed the top-ranked posts to determine their sentiment. The results were quite interesting:

  • Negative posts were generally accurate, as they contained clear negative language and strong critical tones.

  • Some positive posts, however, were misclassified—I would actually consider them concerned or even negative.
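
A manual review like this can be summarized by cross-tabulating the hand-assigned labels against the VADER labels. The sketch below uses invented scores and labels purely for illustration; the `manual` column stands for my own coding of each post, and the thresholds are the same ±0.05 cut-offs used in the VADER summary.

```r
library(dplyr)

# Hypothetical review sample: VADER compound score plus a manual label
reviewed <- tibble(
  compound = c(0.92, 0.75, 0.40, -0.88, -0.63),
  manual   = c("positive", "negative", "negative", "negative", "negative")
) |>
  mutate(vader = case_when(
    compound >= 0.05  ~ "positive",
    compound <= -0.05 ~ "negative",
    TRUE              ~ "neutral"
  ))

# Rows are manual labels, columns are VADER labels
confusion <- table(manual = reviewed$manual, vader = reviewed$vader)

# Share of posts where the two labels agree
agreement <- mean(reviewed$manual == reviewed$vader)

confusion
agreement
```

Off-diagonal cells in `confusion` show where the disagreement sits. In this toy sample, the mismatches are posts that VADER labels positive but the manual review labels negative, which is the pattern described above.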

Discussion 1:

One possible reason for this misclassification is that teachers often share long, narrative-style posts that contain both positive and negative language. Many posts describe challenges as well as resolutions, which makes it difficult for sentiment analysis to capture the overall tone accurately.

For example, one post detailed a teacher catching a student using ChatGPT for plagiarism and later discussing the issue with parents. While the post contained words related to communication and resolution, which VADER classified as positive, the core issue—AI plagiarism—should be categorized as a negative sentiment toward AI use in education.
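
Scoring a post sentence by sentence is one way to make this kind of mixed tone visible. The sketch below is illustrative only: the post text is invented, and `toy_lexicon` is a made-up word list standing in for a real lexicon or for VADER scores.

```r
library(dplyr)
library(tidytext)

# Invented narrative post that moves from a problem to a resolution
post <- tibble(
  post_id = 1,
  text = paste(
    "I caught a student cheating with ChatGPT.",
    "The plagiarism was obvious and upsetting.",
    "The parents were supportive and helpful.",
    "We reached a good agreement."
  )
)

# Made-up lexicon with illustrative scores
toy_lexicon <- tibble(
  word  = c("cheating", "plagiarism", "upsetting", "supportive", "helpful", "good"),
  value = c(-2, -2, -2, 2, 2, 3)
)

# Split into sentences, then into words, and sum the scores per sentence
sentence_scores <- post |>
  unnest_tokens(sentence, text, token = "sentences") |>
  mutate(sentence_id = row_number()) |>
  unnest_tokens(word, sentence) |>
  inner_join(toy_lexicon, by = "word") |>
  group_by(sentence_id) |>
  summarise(score = sum(value), .groups = "drop")

sentence_scores

# Whole-post total: the resolution language outweighs the problem language
post_total <- sum(sentence_scores$score)
post_total
```

A post whose sentences carry both negative and positive scores can still end up with a positive overall total, even when the topic itself (here, AI plagiarism) is a concern. Flagging posts with mixed sentence-level signs could help select candidates for manual coding.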

Discussion 2:

VADER effectively identifies strong negative sentiment but tends to overestimate positivity in nuanced or context-heavy discussions, particularly when posts contain a mix of emotions.

Sentiment analysis may not be the most appropriate method for analyzing long-form narrative discussion posts. It may be better suited to shorter posts, such as tweets, like those in the analysis I conducted last year. Alternatively, additional manual processing (content analysis coding) or filtering may be necessary before applying sentiment analysis to ensure more accurate results.