James Speckart ECI 588 Spring 2023
This analysis looks at teacher reactions to the release of generative AI tools such as ChatGPT in education, covering general reactions that range from building lessons to worries about the future of learning. Using comments from the popular free discussion forum Reddit.com, this analysis applies text mining, sentiment analysis, and topic modeling to discover whether teacher discussions tend to revolve around particular hopes, worries, and applications.
Reddit is a free-to-use link sharing and discussion platform that is one of the more popular sites on the internet. As of 2022, it is the fifth most visited website in the United States, with 430 million active users each month and over 2.8 million separate discussion forums called “subreddits”, where users can post and discuss news stories, images, videos, or personal posts. Each subreddit is moderated by selected users and has its own rules about what can be posted or discussed, although the Reddit administration has occasionally stepped in to moderate subreddits when they break Reddit rules about content or behavior. Roughly 41% of the Reddit user base is in the United States, making it disproportionately American in demographic makeup.
The Teachers subreddit (at https://www.reddit.com/r/teachers) was created in December 2008, and currently has 426,000 “members”, who are Reddit users who have subscribed to the subreddit to receive announcements or news updates. While Reddit is free to use and anyone can join the Teachers subreddit, this analysis assumes that most users in the Teachers subreddit have chosen to join the discussions because they are teachers themselves who wish to communicate with fellow teachers. Most Reddit users prefer the anonymity of screen names and share little to no identifiable data, so the validity of this assumption is very difficult to confirm, but the author believes it is reasonable that anyone who seeks out the Teachers subreddit and posts there has a high likelihood of being a current or former teacher.
Data was collected through the website http://camas.unddit.com/, which is a web scraper
specifically designed to gather Reddit posts using the Pushshift.io API. A
search for the term “ChatGPT” in the Teachers subreddit
(www.reddit.com/r/teachers) returned 751 matches; these are individual
comments that include the term “ChatGPT” somewhere in the comment text.
This was downloaded as a JSON file, and imported into R using the
The text of each comment was tokenized using the
We can start our analysis with a wordcloud of the word tokens to get an impression of what the highest frequency words are. Unsurprisingly, it is dominated by the term “ChatGPT”:
wordcloud2(allposts_counts)
Digging deeper, an initial analysis of the cleaned word tokens gives a raw count of the top words used in the dataset. Not surprisingly, “ChatGPT” is the most used word and “AI” is the third most used word. However, with “student”, “essay”, “writing”, and “lesson” all coming before the word “teacher” in the top ten words, it appears that instructors may be worried about the impact of ChatGPT on student essays before its impact on themselves or their profession.
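As a rough illustration of how such raw counts are produced, the sketch below tokenizes a few made-up comments, drops a handful of stop words, and tallies what remains. It uses base R only; the comment text and the tiny stop-word list are hypothetical stand-ins and not the data or lexicon used in this analysis.

```r
# Hypothetical comments standing in for the Reddit data
comments <- c(
  "ChatGPT wrote the essay for my student",
  "I used ChatGPT to draft a lesson plan",
  "The student essay was clearly written by AI"
)

# A tiny stop-word list for illustration only; real stop-word lists are far larger
stop_words_small <- c("the", "for", "my", "i", "to", "a", "was", "by")

# Lowercase, split on anything that is not a letter, and drop empty strings
tokens <- unlist(strsplit(tolower(comments), "[^a-z]+"))
tokens <- tokens[tokens != "" & !(tokens %in% stop_words_small)]

# Count each remaining token and sort from most to least frequent
word_counts <- sort(table(tokens), decreasing = TRUE)
print(head(word_counts, 5))
```

Even in this toy corpus, “chatgpt”, “essay”, and “student” rise to the top, which mirrors the pattern in the real counts.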
If we remove the term “ChatGPT”, the wordcloud looks like the visualization below. The dominant word becomes “students”, with “writing/write”, “AI”, “lesson”, “teacher”, “essay”, and “time” also being prominent. Interestingly, one term that does not figure prominently is “cheat”, nor do its variants. While some concerns about cheating can be found in deeper analysis, it does not appear to be a major concern when looking at these raw word counts.
wordcloud2(allposts_counts_no_chatgpt)
Once we exclude the term “ChatGPT”, a number of words show up more than 10 times across all posts. Counts of these words are shown in the bar chart below.
# Exclude ChatGPT
top_tf_idf_no_chatgpt %>%
filter(n>10) %>%
mutate(word = reorder(word, n)) %>% #reorder the word variable by n and replace with new variable called word
ggplot(aes(n, word)) + # create a plot with n on x axis and word on y axis
geom_col() # make it a bar plot
The Term Frequency-Inverse Document Frequency (TF-IDF) statistic is not particularly illuminating if each post is viewed as a separate document in the corpus. It is not surprising that the term with the lowest TF-IDF value is “ChatGPT”, with a value as low as 0.00003. If we remove “chatgpt” from the results, the most common and least informative words can be seen in the table below. There are some differences from the initial raw counts of most common words. In particular, once “ChatGPT” has been removed, the most common words (defined as having the lowest TF-IDF scores) include “student”, “teaching”, and “kids”, indicating the posters’ focus on the impact of ChatGPT on students; “prompt”, “plans”, and “lesson” are also very common terms, indicating discussion of the positive or negative impact of ChatGPT on the instructional design process.
tail(top_tf_idf_no_chatgpt, 10)
##         docID     word n total           tf      idf      tf_idf
## 3400  105qnvn teaching 1   978 0.0010224949 2.132686 0.002180661
## 6967  11dnfzm     kids 1   864 0.0011574074 1.837222 0.002126414
## 11160  zgxkpj   prompt 1  1012 0.0009881423 2.132686 0.002107397
## 11124  zgxkpj    paper 1  1012 0.0009881423 2.101914 0.002076991
## 11175  zgxkpj     read 1  1012 0.0009881423 1.987504 0.001963937
## 3357  105qnvn   school 1   978 0.0010224949 1.769781 0.001809592
## 3328  105qnvn    plans 1   978 0.0010224949 1.727221 0.001766075
## 3399  105qnvn  teacher 1   978 0.0010224949 1.686399 0.001724334
## 3285  105qnvn   lesson 1   978 0.0010224949 1.537979 0.001572576
## 3388  105qnvn  student 1   978 0.0010224949 1.520885 0.001555097
At the opposite end of TF-IDF scores, the words with the highest TF-IDF are not informative and are idiosyncratic to each post, as seen below, which is a by-product of treating each comment as its own document:
head(top_tf_idf_no_chatgpt, 10)
##         docID     word n total        tf      idf   tf_idf
## 3487  109lteb governor 1     2 0.5000000 5.598422 2.799211
## 3744  10eup5y    shout 1     2 0.5000000 5.598422 2.799211
## 5613  113rjqj     mfer 1     2 0.5000000 5.598422 2.799211
## 6004  115qp64   rescue 1     2 0.5000000 5.598422 2.799211
## 10354 125wj28   worded 1     2 0.5000000 4.905275 2.452637
## 7318  11e46vl    heard 1     2 0.5000000 3.806662 1.903331
## 2964  103fi63     gods 1     3 0.3333333 5.598422 1.866141
## 3054  103ocwi      wow 1     3 0.3333333 5.598422 1.866141
## 12960  zpclj9   clutch 1     3 0.3333333 5.598422 1.866141
## 2963  103fi63     gift 1     3 0.3333333 4.905275 1.635092
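To see why a one-off word in a two-word comment scores so highly, it may help to rebuild one row by hand. The sketch below recomputes the “governor” row using the usual definitions, tf = n / total and idf = ln(number of documents / number of documents containing the word). The corpus size of 270 documents is an assumption: it is not stated in the text and is inferred here from the idf value of 5.598422 for a word that appears in a single document.

```r
# Values read from the "governor" row of the output above
n_in_doc   <- 1   # times the word appears in the document
doc_length <- 2   # total tokens counted in that document

# Assumption: 270 documents in the corpus, inferred from idf = log(270)
# for a word found in only one document; the source does not state this.
n_docs         <- 270
docs_with_word <- 1

tf     <- n_in_doc / doc_length          # term frequency
idf    <- log(n_docs / docs_with_word)   # inverse document frequency (natural log)
tf_idf <- tf * idf

print(round(c(tf = tf, idf = idf, tf_idf = tf_idf), 6))
```

Because tf is the share of a document taken up by one word, any word in a very short comment gets a large tf, and a rare word gets a large idf, so short comments with unusual words dominate the top of the list.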
Nonetheless, the basic metrics of our text analysis indicate that commenters are discussing ChatGPT in a rather general way in the data. They are discussing the technology largely in terms of students, essays, and lesson plans, without the more emotional or political words that we might expect to see given the implications of the technology in learning. In order to get a better sense of the content of these discussions, we should turn to sentiment analysis and topic modeling.
A basic sentiment analysis was run on the data, utilizing several sentiment lexicons included in the tidytext package for R. Each lexicon has a list of words with sentiment values for each, and each lexicon measures sentiment using different linguistic theories and for different but overlapping sets of words. Comparing the results of sentiment analysis using several lexicons provides insights into the psychological affect of the comments in the dataset.
This analysis uses the bing, afinn, NRC, and loughran lexicons primarily to identify the ratio of positive to negative sentiments in the data. Because each lexicon is based on different theoretical underpinnings about how language expresses sentiment, each lexicon found different raw counts of positive and negative words. For example, the NRC lexicon found 1957 words expressing positivity in the data, but the afinn lexicon found only 97, as seen in the table below.
kable(summary_totals)
| lexicon | negative | positive | sentiment | ratio |
|---|---|---|---|---|
| bing | 969 | 889 | -80 | 0.9174407 |
| afinn | 90 | 97 | 7 | 1.0777778 |
| NRC | 823 | 1957 | 1134 | 2.3778858 |
| loughran | 565 | 287 | -278 | 0.5079646 |
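The sentiment and ratio columns follow directly from the raw counts: sentiment is positive minus negative, and ratio is positive divided by negative. A minimal base R check, using the counts from the table above (the data frame name `summary_check` is introduced here for illustration only):

```r
# Counts copied from the table above
summary_check <- data.frame(
  lexicon  = c("bing", "afinn", "NRC", "loughran"),
  negative = c(969, 90, 823, 565),
  positive = c(889, 97, 1957, 287)
)

# Net sentiment: positive words minus negative words
summary_check$sentiment <- summary_check$positive - summary_check$negative

# Ratio: positive words per negative word (1 means perfectly balanced)
summary_check$ratio <- summary_check$positive / summary_check$negative

print(summary_check)
```

A ratio above 1 means more positive than negative words were matched by that lexicon, and a ratio below 1 means the reverse.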
Interestingly, the ratios of positive to negative words varied greatly between the lexicons and were evenly split on either side of 1, with the afinn and NRC lexicons showing net positive (but different) ratios and the bing and loughran lexicons showing net negative ratios. The chart below shows the differences in ratios, with a horizontal line indicating a perfectly level ratio of 1 positive word per 1 negative word.
summary_totals %>%
ggplot(aes(x = lexicon, y = as.numeric(ratio), fill=lexicon)) +
geom_bar(width = .8, stat = "identity") +
geom_hline(yintercept = 1) +
labs(title = "Sentiments of ChatGPT on r/teachers",
subtitle = "November 2022 to April 2023",
x = "Lexicon",
y = "Ratio of Positive Words / Negative Words")
The NRC lexicon had a very large positive-to-negative ratio, with more than twice as many positive sentiment words as negative sentiment words. The afinn lexicon results had slightly more positive than negative sentiment, while the bing lexicon results were slightly more negative than positive. The loughran lexicon results had roughly twice as many negative sentiment words as positive sentiment words. Since the NRC totals seem so different from the others, let’s look at a treemap of the NRC lexicon sentiment analysis to see the proportions of the different emotions in the threads. The NRC lexicon is unique among these lexicons in that it assigns a larger range of emotional states to particular words, so we can see a richer depiction of sentiment in its results below:
ggplot(nrc_totals, aes(area=sumtotal, fill = sentiment, label = sentiment)) +
geom_treemap() +
scale_fill_hue(c=80, l=40) +
geom_treemap_text(colour="white", place = "centre", size = 15)
As we can see in the treemap, there is a complex mix of emotions in the comments. Consistent with the bar chart above, the sentiment “positive” is significantly larger than the sentiment “negative”, but we can also see other emotions are all in notable proportions, including such powerful emotions as joy, fear, trust, and anticipation.
Topic modeling is a method for measuring how often individual words appear together in documents. When a set of words tends to co-occur across a number of documents, a topic modeling algorithm can identify that set of words as a “topic” that may be underlying the comments posted by many different posters. Two of the major ways of topic modeling are Latent Dirichlet Allocation and Structural Topic Modeling, both of which treat documents as a random “bag of words.” We will use these two methods to examine our data for hidden “topics” of concern to authors in our data.
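The “bag of words” idea can be made concrete with a small document-term matrix, in which each row is a document, each column is a word, and each cell is a count. The sketch below builds one in base R from three made-up documents (hypothetical text, not drawn from the Reddit data) and shows that word order is discarded: two documents containing the same words in a different order produce identical rows.

```r
# Hypothetical toy documents; d3 is d1 with the words reordered
docs <- c(
  d1 = "students write essay",
  d2 = "lesson plans save time",
  d3 = "essay write students"
)

tokens_by_doc <- strsplit(docs, " ", fixed = TRUE)
vocab <- sort(unique(unlist(tokens_by_doc)))

# Count how often each vocabulary word occurs in one document
count_words <- function(tok) {
  counts <- as.integer(table(factor(tok, levels = vocab)))
  names(counts) <- vocab
  counts
}

# Rows = documents, columns = words, cells = counts
dtm <- t(vapply(tokens_by_doc, count_words, integer(length(vocab))))
print(dtm)

# Word order does not matter in a bag-of-words representation
print(identical(dtm["d1", ], dtm["d3", ]))
```

Topic models such as LDA and STM take a matrix of this kind as input, which is why they can find words that tend to occur in the same documents but cannot see phrasing or word order.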
We will first look at Latent Dirichlet Allocation (LDA), picking a fixed number of topics to look for and then running the LDA functions of R’s tidytext package to see which words can be fit into that number of topics.
Picking the right number of topics to start with is a tricky proposition beset with tradeoffs between simplicity and completeness. Fortunately, we can compare multiple statistical measures to see where there are crossover points of common fit. The chart below shows four measures of statistical fit for different values of K, the number of topics in the model. There are crossover points in the line graphs at 5 and 6 topics in the two graphs, indicating that selecting 5 or 6 topics is likely to provide a good balance between simplicity and completeness in our LDA model.
FindTopicsNumber_plot(k_metrics_no_chatgpt)
An LDA model was run using the tidytext package in R with 6 as our topic count, chosen over the alternative of 5 to identify a slightly larger variety of potential topics in the data. The term “chatgpt” was stripped from the data because its prevalence was large enough to overwhelm other words in the model, yet it was not quite common enough to be automatically dismissed as a common “stop” word.
The 6 topics of the LDA model show little to no differentiation in their top 5 most common words:
top_terms_no_chatgpt %>%
mutate(term = reorder_within(term, beta, topic)) %>%
group_by(topic, term) %>%
arrange(desc(beta)) %>%
ungroup() %>%
ggplot(aes(beta, term, fill = as.factor(topic))) +
geom_col(show.legend = FALSE) +
scale_y_reordered() +
labs(title = "Top 5 terms in each LDA topic",
x = expression(beta), y = NULL) +
facet_wrap(~ topic, ncol = 3, scales = "free")
There is almost complete overlap in the top 5 terms of each topic identified by the LDA 6-topic model, all revolving around student writing for a lesson with AI, making it impossible to distinguish them. Let’s look at longer lists of the terms in each topic to see more differentiation:
terms(text_lda_6_no_chatgpt, 15)
##       Topic 1       Topic 2    Topic 3    Topic 4      Topic 5    Topic 6
##  [1,] "write"       "students" "lesson"   "students"   "students" "students"
##  [2,] "ai"          "write"    "students" "writing"    "ai"       "student"
##  [3,] "students"    "lesson"   "ai"       "paper"      "write"    "write"
##  [4,] "student"     "plans"    "write"    "written"    "writing"  "time"
##  [5,] "writing"     "teachers" "time"     "essays"     "time"     "ai"
##  [6,] "essay"       "ai"       "plans"    "write"      "essay"    "lesson"
##  [7,] "time"        "teacher"  "teachers" "grade"      "learning" "essay"
##  [8,] "wrote"       "time"     "lot"      "kids"       "lot"      "plans"
##  [9,] "assignments" "class"    "read"     "essay"      "people"   "generate"
## [10,] "grade"       "student"  "writing"  "student"    "school"   "cheating"
## [11,] "teacher"     "school"   "plan"     "ai"         "answer"   "essays"
## [12,] "day"         "create"   "teacher"  "tool"       "tool"     "class"
## [13,] "kid"         "people"   "job"      "level"      "grade"    "word"
## [14,] "yeah"        "written"  "honestly" "people"     "student"  "learned"
## [15,] "essays"      "content"  "kids"     "understand" "generate" "kids"
An interpretation of differences in the words attached to each topic can result in the following labels.
Topic 1: Essay writing
Topic 2: Creating lesson plans
Topic 3: The role of reading and writing in education
Topic 4: How ChatGPT may affect cognition
Topic 5: Testing and Grading
Topic 6: Cheating and its effect on learning
We can get a sense of the relative popularity of each topic by looking at the expected proportion of each topic in each Reddit post, as measured by the statistic gamma. The following table shows that the topics all appear in roughly similar proportions, varying between 11% and 27% of the text each: we should expect Topic 2 to appear about 27% of the time, Topics 1, 3, and 4 each about 16-17% of the time, and Topics 5 and 6 each about 11-12% of the time.
gamma_terms_no_chatgpt %>%
select(topic, gamma, terms) %>%
kable(digits = 3,
col.names = c("Topic", "Expected topic proportion", "Top 10 terms"))
| Topic | Expected topic proportion | Top 10 terms |
|---|---|---|
| Topic 2 | 0.268 | chatgpt, students, ai, write, writing, student, grade, written, time, text, prompt, essays, class, assignments, paper, kids, assignment, explain, essay, tool |
| Topic 1 | 0.171 | chatgpt, ai, people, writing, write, time, kids, teacher, job, essays, student, model, answer, students, wrote, pretty, essay, lot, written, education, skills, teachers, tool, understand, gpt, teaching, post, computer, research |
| Topic 4 | 0.167 | chatgpt, write, students, time, kids, admin, generate, ai, writing, ideas, student, lesson, lot, essays, teaching, reading, plans, essay, letters, classroom |
| Topic 3 | 0.160 | chatgpt, lesson, plans, write, ai, teachers, students, lot, school, time, teacher, job, student, people, read, grade, teach, person, start, plan |
| Topic 6 | 0.120 | chatgpt, students, grade, essay, write, time, teacher, feel, writing, people, teachers, class, lesson, ai, kids, school, teach, classroom, generate, paper |
| Topic 5 | 0.113 | chatgpt, students, questions, ai, teaching, answers, student, learning, essay, specific, generate, information, research, topic, level, potential, 5, tool, follow, education |
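An expected topic proportion of this kind is commonly computed as the average of gamma over all documents, where gamma is the share of a document attributed to a topic. The sketch below shows that calculation on a hypothetical gamma matrix of 4 documents and 3 topics; the numbers are invented for illustration, and this is an assumption about how the table was built, not a reproduction of the code behind it.

```r
# Hypothetical gamma matrix: 4 documents x 3 topics; each row sums to 1
gamma <- matrix(
  c(0.70, 0.20, 0.10,
    0.10, 0.60, 0.30,
    0.50, 0.25, 0.25,
    0.30, 0.30, 0.40),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4), paste0("Topic ", 1:3))
)

# Sanity check: every document's topic shares add up to 1
stopifnot(all(abs(rowSums(gamma) - 1) < 1e-9))

# Expected topic proportion = mean gamma across documents, largest first
expected_proportion <- sort(colMeans(gamma), decreasing = TRUE)
print(round(expected_proportion, 3))
```

Because every document's gammas sum to 1, the expected proportions across topics also sum to 1, which is why they can be read as shares of the whole corpus.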
Converting this back to our topic labels, the LDA analysis indicates that the most common topic for teachers is to discuss the impact of ChatGPT in their lesson plans, followed by discussing essay writing and its role in education in light of ChatGPT, with grading and cheating being the least common of the 6 topics.
A Structural Topic Modeling (STM) method provides different results than our LDA model. As with LDA, we should first decide how many topics will be in our model before we begin describing the character of those topics. For our STM model, we “stemmed” the text in R to avoid the issue of duplication of similar words. After stemming, many words are reduced to a common base; for example, “like”, “liked”, and “likeable” all share a common stem of “lik.” Using stems should avoid having many near-duplicate words such as “student/students” that can obscure more unique terms in our topics.
We will first look at some charts of K, our ideal topic number for our STM model, using a slightly different method than with our LDA model but with the same goal.
First, we look for K with “chatgpt” included as a term. The results are rather unclear, as seen below:
plot(findingk)
When we exclude the term “chatgpt”, the charts show a clear peak at 6 topics in the Held-Out Likelihood graph, with relatively low Residuals at 6 topics:
plot(findingk_no_chatgpt)
Continuing with 6 topics in our STM model results in the following chart that includes both expected topic proportions and top words in each topic:
par(cex=0.7)
plot(text_stm_no_chatgpt, n = 10)
We will provide a label for each topic in order of their expected proportion in the data.
~35% of comments are Topic 2: Essays when using ChatGPT in lesson plans
~25% of comments are Topic 6: Students using ChatGPT as self-help tool
~15% of comments are Topic 3: Using ChatGPT to judge past lesson plans
~12% of comments are Topic 5: Using ChatGPT to brainstorm lesson ideas
~12% of comments are Topic 4: Trying ChatGPT for the first time
<5% of comments are Topic 1: ChatGPT in political context
These topics are different from the ones identified by the LDA model, and are dominated by variations of using ChatGPT in lesson planning, particularly whether it will change the role of essays in lessons (Topic 2), whether students can use it to help themselves write better (Topic 6), or whether to use it to judge past lessons or create new ones (Topics 3 and 5).
We can see examples of posts that are particularly representative of each topic below, excluding Topic 1 due to its rarity:
findThoughts(text_stm_no_chatgpt,
texts = teachers_data_clean$body,
topics = 2:6,
n = 1,
thresh = 0.4)
##
## Topic 2:
## I had two students this past week turn in a DBQ paper on revolutions. They both happened to have done the Haitian Revolution (there was 4 different revolutions on the list, so many students did the same ones). As I'm grading the first students essay, I notice a specific paragraph in the essay talking a lot about how the USA benefitted from abolishing slavery. I start thinking I'm like wouldn't you be writing about how Haiti benefited from abolishing slavery? Whatever, grade that paper and I move on to the next one. I ended up seeing the same exact paragraph written word for word about how the USA benefited from abolishing slavery in the next essay…
##
## Luckily, the very specific prompt and documents used for the DBQ was able to flush these ChatGPT out papers right away. But I fear this will be a lot more common as the years go on. Student teacher btw, I'm gonna be dealing with this stuff my whole career it looks like.
## Topic 3:
## I just started playing around with this. I have chatGPT generate a slide deck with speaker's notes and image suggestions. Next, I went to Canva and selected a YouTube video template. I copy pasted the slide deck info from chatGPT to Canva and used Canva's text to image tool to generate an AI image. Just formatted the text and transitions and downloaded the finished movie file.
##
## I tried this today with 30 seconds to one minute intervals between transitions. That way, I could teach without having to manually move the slide. I can do that in PowerPoint but Canva was faster.
## Topic 4:
## 9th grade English here. We have a 1:1 district with Chromebooks, and I've headed my school's ipad pilot program, as we were investigating whether adoption of the newer tablets would be better for students by “forcing analog” encoding. Long story short, it was a steamy hot fuck of an expensive trainwreck.
##
## Progress was erratic, achievement was consistent or worse than students learning on Chromebooks prior to the pandemic, and there was a distressing level of apathy about maintaining the devices.
##
## Students who owned and used their own devices, though, showed promise. This is almost exclusively attributable to the retention of the devices for outside of school. In other words, they performed better because they used their devices for study outside of school. Even compared to students who signed up for the loaner program, student owned devices correlated with higher performance. Bottom line I've taken from this experience: students who do more of the work do better. Almost always. For what it's worth, all students had access to a PDF folder via Google Drive containing all their notes, assignments, and work while outside the classroom - they could study and review it, but they couldn't edit it.
##
## Interestingly, some anecdotes that warrant further investigation might be these:
##
## - negative impact on reading comprehension and stamina from scrolling versus flipping through pages; scrolling appears to diminish span of recognition and retard capture and retention in working memory
##
## - tech/app-rich environments are in almost all cases a magnet for distraction; I set the restrictions on these devices as tight as I know how, and students are STILL mispurposing the tool. Example: I disabled the camera app through screen time restriction; but almost all the education apps I use (Canvas, GoodNotes, etc) have some sort of document scan tool embedded. The kids just use that for a camera and snap photos - in violation of policy - doodle on them, and use this distraction as an off ramp for learning.
##
##
## I suppose its not a total failure, as I now have a robust body of evidence to advocate - as I currently do - for strict paper/pencil environments in my classroom. It's not as easy, but it's CERTAINLY better for the students. I've even had some interesting results BANNING work outside the classroom. These kids are so immersed in online environments, that when ChatGPT emerged, it was, like, a day before I started getting plagiarized assignments. Luckily our diagnostic had been done in class and on paper, and that's been a lifesaver in terms of comparing student work to determine cheating. I've seen a student ask for an extension after writing four sentences, come back the next day with three pages of high-quality (mechanically - content was nonsense and didn't address the topic) writing. Asked to explain what “deploys an artful web of rhetorical devices” meant, the kid sort of stammered out “uh…I used a dictionary to find that inspiration.”
##
## Fuck you…you cheated. Mom and dad emailed (did not swear, btw - but also did not apologize or offer explanation: your student cheated; it's a zero; paper and pencil from now on), and since then I've not had to worry because now ALL the work has to be done in a composition book which stays in the classroom.
##
## It's a stupid and frivolous way to have discovered what many teachers appear to have done with their students many years gone, and it definitely sucks having to sort through and scan all the student's individual work - I still digitize all of it, mark it up, and redistribute it with feedback. But that's mostly CYA and for record keeping and data analysis.
##
## Paper and Pencil. ALL. THE. TIME.
## Topic 5:
## It's okay. I asked the AI if the development of AI would lead to the future for humanity as described by H.G. Wells' novel The Time Machine.
##
## The AI said "Not necessarily."
##
## >**Me**: Is the development of A.I. leading us to the future of society as described in H.G. Wells' The Time Machine?
##
## >**ChatGPT:** The development of artificial intelligence (AI) is not necessarily leading us to the future of society as described in H.G. Wells' The Time Machine. In the novel, Wells portrays a future society that is divided into two distinct classes: the Eloi, who are a race of beautiful, elegant people who live a life of leisure, and the Morlocks, who are a race of grotesque, subterranean creatures who labor to support the Eloi. This future society is the result of centuries of social and technological development, and is not necessarily indicative of where the development of AI might lead us.
##
## >While it is possible that the development of AI could lead to significant changes in society, it is not inevitable that these changes will resemble the future society described in The Time Machine. In fact, the future of society is likely to be shaped by a wide range of factors, including technological developments, economic forces, and political and social trends. As such, it is impossible to predict with certainty what the future of society will look like, and it is not necessarily the case that the development of AI will lead us to a future resembling that described in The Time Machine.
## Topic 6:
## I think you need to separate your argument for ChatGPT from your argument against off-site academic study.
##
## With regards to the former, likening Wolfram Alpha's influence on calculation to ChatGPT's influence on reading and writing is tenuous. Sure, both are computer programs that take an input, produce an output, and take a large cognitive load off of students as a byproduct....
##
## ...but compare the actual impact, on a societal level, of a generation of citizens who have had a computer help them skirt around trig problems for a year to a generation of citizens who have had a computer help them skirt around reading informational texts or writing... well, anything... from grade six up.
##
## One of the easiest metrics for a tool is how many ways it can be used to the detriment of the user. A hammer, applied to a nail, is fine. Applied to a thumb, no.
##
## With ChatGPT, we are going to caution students not to have it simplify content. Not to have it make inferences on their behalf. Not to have it generate outlines for them whole cloth. Not to have it tell them the important meanings of things. Not to have it write the fundamentals of paragraphs. Not to have it take their writing and make it more impressive or formal stylistically.
##
## And note here, I didn't call these "misuses" because they aren't. A hammer hitting a thumb is an attempt at proper usage. None of the things I listed about ChatGPT are improper usages by default: they simply have long term problematic consequences given enough use over enough time, like sucking on cigarettes.
##
## Which brings me to my biggest issue:
##
## As an English teacher, isn't rooted in a fear that students will cheat. Rather, my problem is that I look at the **atrophy** already evident in the expressive, intellectual capacity of the students I have who are simply alliterate, and can pretty clearly see down the road at what will happen people embrace these tools as corner-cutting tools as they communicate to our students that writing either a banal step in other more engaging and authentic exercises (the dull report after exciting chemical labs! The soupy review of the killer live debate!) or that writing is purely a hobbyist's activity.
##
## Someone will invariably trot out a reference to a slippery slope, even as they swing from the bandwagon passing me by. That's unfortunate. Because the truth of the matter is that there's no research out there that suggests that students get better at reading or writing by not reading and not writing. And arguments that this is hyperbolic seem weak in the face of what ChatGPT will allow students to do quite freely through correct, unfortunate, usage: trim out a lot of thinking.
##
## As for me... I began doing most of my writing in class about eleven years ago, and the only time students write outside of class is if they don't hit interim deadlines through days-long workshops. I fundamentally agree with you that there are problems with at-home work, but I do not believe ChatGPT is a good vehicle for the promotion of moving academic thinking into the classroom.
This analysis of discussions about ChatGPT in the reddit.com/r/teachers subreddit provides nuanced insights into the reactions of the subreddit’s contributors in the first 5 months after the release of ChatGPT (November 2022 - March 2023). This user base is likely to be populated largely by current K-12 teachers in the U.S., and their reactions to the potential impact of ChatGPT in education are complex. This analysis used text mining techniques, including sentiment analysis and topic modeling, to gain insights into the themes of these written discussions. While some users posted concerns about students cheating on essay assignments, that was not an overwhelming fear among teachers in this data. Instead, the teachers saw multiple roles for ChatGPT in their instructional future, from grading to lesson brainstorming, and their emotional reactions were split between positive and negative.
In addition, two separate algorithmic topic modeling methods provided distinct insights into the content of the discussions in the data. Our LDA and STM models identified 6 topics each, with some duplication between them. When we try to combine similar topics from both models to create a master list of identified topics in the discussions, we get the following 9 topics (topics identified in both models are separated by a /):
This analysis shows that discussions by these Reddit users about ChatGPT’s use in education are as varied as the capabilities of ChatGPT itself, both for positive uses and negative ones. These topics are likely to expand as the instructional community begins to form consensus on the key functions and possibilities of generative AI such as ChatGPT, and these data are an insight into the rudimentary first few months of ChatGPT’s existence, before clear theories and empirical evidence have been developed. Future research will likely show shifts over time as generative AI is implemented by different actors in different contexts around the world, and this analysis can remain as a reference point to the earliest days of teacher discussion after the release of ChatGPT.