School Psychology Sentiments

Author

Gagan Shergill

Published

December 9, 2024

Exploring the Sentiments of r/schoolpsychology

School psychologists serve a pivotal role within U.S. public schools, providing students, staff, and school systems with social-emotional, behavioral, and academic support. A critical function of school psychologists is to conduct individualized psychoeducational evaluations for students suspected of having a disability that may impact their education. Despite this essential role, there is currently an acute national shortage of practicing school psychologists. The National Association of School Psychologists (NASP; NASP, n.d.) recommends a ratio of 1 school psychologist for every 500 students; however, the current ratio is 1:1,119, with large variations across states. Shortages within the field stem from multiple factors, including a lack of qualified faculty to teach in graduate training programs, burnout that leads practitioners to leave the field, and a lack of awareness of the field among high school and college-age students (APA, n.d.; Schilling and Randolph 2021). Several initiatives aim to boost the number of school psychologists, including the NASP Exposure Project, which enlists local school psychologists to give presentations about the field to high school and college-age students and to present school psychology as a viable career pathway (“NASP Exposure Project (NASP-EP),” n.d.). Although underexplored in the school psychology literature, online communities designed for school psychologists and those interested in the field may also be an important avenue for providing accurate information and recruiting students interested in the field.

Online communities for professionals on Facebook, X, Reddit, and other social media websites serve as hubs for knowledge, promote belonging, and can increase satisfaction with work (Oksanen et al. 2024). However, these communities can also serve as spaces to express negative sentiments and seek support from peers. While potentially beneficial at the individual level, research indicates that negative sentiments may receive more engagement than positive perspectives (Davis and Graham 2021). By extension, this may paint a negative picture of the field and dissuade prospective applicants from joining. To date, however, no research has directly examined the sentiments expressed in online school psychology spaces. In this report, I examine the sentiments expressed on r/schoolpsychology, a subreddit on Reddit. Subreddits are online communities dedicated to a specific topic or interest. Users primarily engage with one another through posts made by individual users, comments by other users on those posts, and points assigned to those posts through upvotes (adding points) and downvotes (removing points). r/schoolpsychology has over 13,000 members, including people interested in school psychology, graduate students, practitioners, and trainers of school psychologists (“Reddit for School Psychologists!” n.d.). Using data sourced from Reddit, I answer the following questions:

What are the sentiments expressed on r/schoolpsychology?
Is there a difference between sentiments expressed on posts compared to comments?
How have sentiments changed over time?
Do negative sentiments get more engagement than positive sentiments (comments, upvotes, downvotes)?

Before going further, let’s install the necessary packages.

tidytext - provides functions for handling and analyzing text data using tidy data principles (Silge et al. 2024).
VADER (Valence Aware Dictionary and sEntiment Reasoner) - a lexicon- and rule-based sentiment analysis tool tuned for social media text. VADER returns the proportions of positive, neutral, and negative content along with a compound score ranging from -1 (most negative) to +1 (most positive) (Hutto and Gilbert 2014).
tidyverse - a collection of R packages for data manipulation that share an underlying design and grammar (Wickham 2007).
wordcloud2 - creates interactive word clouds (Lang and Chien 2018).
wordcloud - creates static word clouds (Fellows 2018).
RedditExtractoR - uses Reddit’s API to scrape and extract data from Reddit, including posts, comments, and metadata (Rivera 2023).
modeest - provides tools for estimating the mode of a dataset (Paul and Clarke 2016).
reshape2 - transforms data between wide and long formats (Wickham 2007).
multcomp - provides simultaneous inference procedures, including post hoc comparisons useful for ANOVA (Hothorn et al. 2024).

# install all required packages from CRAN in a single call
options(repos = c(CRAN = "https://cloud.r-project.org"))
install.packages(c("tidytext", "vader", "tidyverse", "wordcloud2",
                   "RedditExtractoR", "modeest", "reshape2",
                   "wordcloud", "multcomp"))

Next, I load these packages into the current session with library(), which prints the following messages:

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Attaching package: 'reshape2'


The following object is masked from 'package:tidyr':

    smiths


Loading required package: RColorBrewer

Loading required package: stats4

Loading required package: splines

Step 1: Wrangle

The first step is the most exciting one: gathering Reddit data! The RedditExtractoR package allows us to scrape any subreddit for post data, up to 999 posts per call. Since r/schoolpsychology does not receive posts very often, that should give us a few years’ worth of posts.

To begin scraping data, I first use the find_thread_urls() function, which requires that you specify the subreddit(s) to scrape and how to sort the results. In this case, I sort by top posts (sort_by = "top") to get a sample of the most popular topics and posts. Since I want the largest sample possible, I set the time period to "all" (period = "all"), which returns 999 posts.

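# collect the URLs of the top posts across the subreddit's full history
reddschoolpsych <- find_thread_urls(subreddit = "schoolpsychology",
                                    sort_by = "top", period = "all")
# scrape each post's comments and thread metadata using its URL
commschoolpsych <- get_thread_content(reddschoolpsych$url)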

Let’s see what we’ve got!

glimpse(reddschoolpsych)

Rows: 999
Columns: 7
$ date_utc  <chr> "2022-08-16", "2022-07-28", "2022-06-21", "2022-06-16", "202…
$ timestamp <dbl> 1660675004, 1658977235, 1655845096, 1655417025, 1648994396, …
$ title     <chr> "Calming/wellness room for high school", "report writing in …
$ text      <chr> "Hi,\nHas anyone successfully established a calming or welln…
$ subreddit <chr> "schoolpsychology", "schoolpsychology", "schoolpsychology", …
$ comments  <dbl> 4, 13, 28, 12, 73, 80, 27, 4, 22, 20, 6, 4, 8, 4, 6, 7, 7, 9…
$ url       <chr> "https://www.reddit.com/r/schoolpsychology/comments/wq1xgx/c…

glimpse(commschoolpsych)

List of 2
 $ threads :'data.frame':   260 obs. of  15 variables:
  ..$ url                  : chr [1:260] "https://www.reddit.com/r/schoolpsychology/comments/wq1xgx/calmingwellness_room_for_high_school/" "https://www.reddit.com/r/schoolpsychology/comments/w9y61b/report_writing_in_different_states/" "https://www.reddit.com/r/schoolpsychology/comments/vhnez1/masters_level_school_psychs_and_iq_tests/" "https://www.reddit.com/r/schoolpsychology/comments/vdx8q1/looking_for_good_sel_curriculum/" ...
  ..$ author               : chr [1:260] "ana_banana_obp" "bageltechnician" "msolorio79" "feedthebite" ...
  ..$ date                 : chr [1:260] "2022-08-16" "2022-07-28" "2022-06-21" "2022-06-16" ...
  ..$ timestamp            : num [1:260] 1.66e+09 1.66e+09 1.66e+09 1.66e+09 1.65e+09 ...
  ..$ title                : chr [1:260] "Calming/wellness room for high school" "report writing in different states" "Masters level School Psychs and IQ tests" "Looking for good SEL curriculum" ...
  ..$ text                 : chr [1:260] "Hi,\nHas anyone successfully established a calming or wellness room that was manageable to monitor along with a"| __truncated__ "Is report writing required in your state? I\031m in Georgia and it\031s required here.\n\nI heard Washington an"| __truncated__ "I sat through a presentation today by a BCBA and one of the things he said caught my attention.  He said that s"| __truncated__ "Hi everyone, my district is getting a grant for SEL MTSS. Do you know of good researched based SEL that could b"| __truncated__ ...
  ..$ subreddit            : chr [1:260] "schoolpsychology" "schoolpsychology" "schoolpsychology" "schoolpsychology" ...
  ..$ score                : num [1:260] 10 10 9 11 10 10 10 11 10 10 ...
  ..$ upvotes              : num [1:260] 10 10 9 11 10 10 10 11 10 10 ...
  ..$ downvotes            : num [1:260] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ up_ratio             : num [1:260] 1 1 0.92 1 1 1 1 1 1 0.92 ...
  ..$ total_awards_received: num [1:260] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ golds                : num [1:260] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ cross_posts          : num [1:260] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ comments             : num [1:260] 4 13 28 12 73 80 27 4 22 20 ...
 $ comments:'data.frame':   4133 obs. of  10 variables:
  ..$ url       : chr [1:4133] "https://www.reddit.com/r/schoolpsychology/comments/wq1xgx/calmingwellness_room_for_high_school/" "https://www.reddit.com/r/schoolpsychology/comments/wq1xgx/calmingwellness_room_for_high_school/" "https://www.reddit.com/r/schoolpsychology/comments/wq1xgx/calmingwellness_room_for_high_school/" "https://www.reddit.com/r/schoolpsychology/comments/wq1xgx/calmingwellness_room_for_high_school/" ...
  ..$ author    : chr [1:4133] "sendapicofyourkitty" "odd-42" "VaginaPirate" "lmidor" ...
  ..$ date      : chr [1:4133] "2022-08-16" "2022-08-17" "2022-08-17" "2022-08-17" ...
  ..$ timestamp : num [1:4133] 1.66e+09 1.66e+09 1.66e+09 1.66e+09 1.66e+09 ...
  ..$ score     : num [1:4133] 10 9 3 2 17 4 6 4 3 2 ...
  ..$ upvotes   : num [1:4133] 10 9 3 2 17 4 6 4 3 2 ...
  ..$ downvotes : num [1:4133] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ golds     : num [1:4133] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ comment   : chr [1:4133] "I tried one and it failed fairly spectacularly- it just wasn\031t a safe space. Certain groups of kids would co"| __truncated__ "Our social worker tried this.  It blew up similarly.  \034Cool kids\035 claimed it as \034theirs.\035  The othe"| __truncated__ "yes, has to be associated to specific needs of individuals who will be using it.  behavior accommodations and/o"| __truncated__ "My high school set one up successfully last year.  The students had to get a pass from either the mental health"| __truncated__ ...
  ..$ comment_id: chr [1:4133] "1" "1_1" "2" "3" ...

Next, since I am interested in both posts and comments, I use the get_thread_content() function, which uses each post’s URL to scrape its comments along with other useful metadata. However, Reddit’s API limits how much content can be retrieved this way: thread content came back for only 260 of the 999 posts, leaving the remaining posts without comments or score information. We will need to address this missing data.

To get a more compact sense of how these data are structured, I use glimpse() on each of the two data frames in this list, threads and comments:

Rows: 260
Columns: 15
$ url                   <chr> "https://www.reddit.com/r/schoolpsychology/comme…
$ author                <chr> "ana_banana_obp", "bageltechnician", "msolorio79…
$ date                  <chr> "2022-08-16", "2022-07-28", "2022-06-21", "2022-…
$ timestamp             <dbl> 1660675004, 1658977235, 1655845096, 1655417025, …
$ title                 <chr> "Calming/wellness room for high school", "report…
$ text                  <chr> "Hi,\nHas anyone successfully established a calm…
$ subreddit             <chr> "schoolpsychology", "schoolpsychology", "schoolp…
$ score                 <dbl> 10, 10, 9, 11, 10, 10, 10, 11, 10, 10, 9, 10, 9,…
$ upvotes               <dbl> 10, 10, 9, 11, 10, 10, 10, 11, 10, 10, 9, 10, 9,…
$ downvotes             <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ up_ratio              <dbl> 1.00, 1.00, 0.92, 1.00, 1.00, 1.00, 1.00, 1.00, …
$ total_awards_received <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ golds                 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ cross_posts           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ comments              <dbl> 4, 13, 28, 12, 73, 80, 27, 4, 22, 20, 6, 4, 8, 4…

Rows: 4,133
Columns: 10
$ url        <chr> "https://www.reddit.com/r/schoolpsychology/comments/wq1xgx/…
$ author     <chr> "sendapicofyourkitty", "odd-42", "VaginaPirate", "lmidor", …
$ date       <chr> "2022-08-16", "2022-08-17", "2022-08-17", "2022-08-17", "20…
$ timestamp  <dbl> 1660691647, 1660697989, 1660700124, 1660751725, 1658978871,…
$ score      <dbl> 10, 9, 3, 2, 17, 4, 6, 4, 3, 2, 3, 3, 5, 3, 5, 2, 3, 70, 13…
$ upvotes    <dbl> 10, 9, 3, 2, 17, 4, 6, 4, 3, 2, 3, 3, 5, 3, 5, 2, 3, 70, 13…
$ downvotes  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ golds      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ comment    <chr> "I tried one and it failed fairly spectacularly- it just wa…
$ comment_id <chr> "1", "1_1", "2", "3", "1", "1_1", "2", "2_1", "3", "3_1", "…

Examining the scraped data, we can see that the reddschoolpsych data frame has 999 posts (observations) and 7 columns:

date_utc - the date of the post, in year-month-day format
timestamp - the time of each post as a Unix timestamp (seconds since January 1, 1970, UTC)
title - the title of each post
text - the main text of each post
subreddit - in this case, r/schoolpsychology
comments - the number of comments each post received, including hidden, deleted, or blocked comments
url - a direct link to each post
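Because timestamp stores seconds since the Unix epoch, it can be converted back to a calendar date in base R as a quick consistency check against date_utc. A minimal sketch using the first post’s timestamp from the glimpse() output above:

```r
# Convert a Unix timestamp (seconds since 1970-01-01 UTC) to a date-time
ts <- 1660675004  # first post's timestamp, taken from the glimpse() output above
post_time <- as.POSIXct(ts, origin = "1970-01-01", tz = "UTC")

format(post_time, "%Y-%m-%d")  # "2022-08-16", matching date_utc
format(post_time, "%H:%M:%S")  # time of day in UTC
```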

Using the URL for each post, get_thread_content() scrapes the comments for each individual post and returns a list of two data frames. The first, threads, has 260 observations and 15 columns. While this data frame has extensive information, such as the author of each post, upvotes, downvotes, and awards received, we primarily need its upvotes, downvotes, score, and url columns.

The second data frame, comments, is much larger than the first and contains 4,133 comments (observations) from the 260 threads that were retrieved. Aside from the comments themselves, there are 9 other variables; for our analysis, we primarily need the comment and url columns.

Because the data are spread across several tables and contain several extraneous columns, I need to do some cleaning and data processing, which I turn to next.

Step 2: Pre-Processing

In Step 2, I begin cleaning the data and creating useful new variables. I will revisit this process throughout the analysis, but this initial cleaning gives a good starting point.

To begin, I break the list containing the two data frames into two separate data frames. This is necessary because, although lists are useful for storing multiple types of data, it is difficult to perform calculations on them directly.

After creating the new data frames, we can extract useful columns from them and add them to the main data frame that contains post information. One quirk of the data is that the two data frames are different sizes: comm_sepschoolpsych has 4,133 observations (one per comment), whereas reddschoolpsych has only 999 observations (one per post). If you join these data frames as is, R will “helpfully” repeat each post’s information once for every comment it received. To avoid this, I first combine all of the comments associated with each post by their shared URL using summarize(), separating individual comments with a semicolon. When I then join the result to the reddschoolpsych data frame with full_join(), each post with available comments has all of its comments stored in a single cell.
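The row-duplication problem and the collapse-first fix can be illustrated on toy data. This is a base-R sketch (merge() and aggregate() stand in for the dplyr joins and summarize() used in the actual code), and the posts, URLs, and comments are hypothetical:

```r
# Toy data (hypothetical): two posts, three comments
posts <- data.frame(url = c("u1", "u2"), title = c("Post A", "Post B"),
                    stringsAsFactors = FALSE)
comments <- data.frame(url = c("u1", "u1", "u2"),
                       comment = c("first", "second", "third"),
                       stringsAsFactors = FALSE)

# A naive join repeats each post once per comment: 3 rows, not 2
naive <- merge(posts, comments, by = "url")
nrow(naive)

# Collapsing comments per url first keeps one row per post
collapsed <- aggregate(comment ~ url, data = comments,
                       FUN = function(x) paste(x, collapse = "; "))
joined <- merge(posts, collapsed, by = "url", all = TRUE)
joined$comment  # "first; second" "third"
```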

Next, I combine information from the thread_sepschoolpsych data frame with our posts and comments. I use right_join() for this, matching each post to its score by its associated URL while keeping every post from the combined posts-and-comments data.

After doing this, just a few more steps remain to get a clean dataset: renaming columns to be more descriptive, separating each post’s date so that its year, month, and day are stored in their own columns, and choosing which columns to keep.

Finally, to address missing data, I use listwise deletion via na.omit(), which removes any post with a missing value in any retained column (for example, a missing score or no retrieved comments). While this reduces the number of observations, it leaves us with a complete dataset.
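Because na.omit() drops a row if any column is missing, a post is excluded whether it lacks a score, its comments, or both. A small base-R sketch on hypothetical data shows which rows survive:

```r
# Toy data (hypothetical): na.omit() drops a row if ANY column is NA
df <- data.frame(url = c("u1", "u2", "u3"),
                 score = c(10, NA, 12),
                 comb_comments = c("ok; thanks", "fine", NA),
                 stringsAsFactors = FALSE)

complete.cases(df)   # TRUE FALSE FALSE: rows 2 and 3 each have one NA
kept <- na.omit(df)
kept$url             # "u1"
```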

#Separating comments and threads 
comm_sepschoolpsych <- commschoolpsych$comments 

thread_sepschoolpsych <- commschoolpsych$threads

#joining the data so that comments are collapsed into one cell and are associated with each post. 
comments_combined <- comm_sepschoolpsych |>
  group_by(url) |>
  summarize(comments = paste(comment, collapse = "; "))

posts_comm_comb <- reddschoolpsych |>
  full_join(comments_combined, by = "url")

#joining threads data so that we can get post rating information 
complete_reddit_data <- thread_sepschoolpsych |> 
  select(url, score) |>
  group_by(url) |> 
  right_join(posts_comm_comb, by = "url") |>
  ungroup(url)

#renaming columns 
complete_reddit_data <- complete_reddit_data |> 
  rename(num_comments = comments.x,
         comb_comments = comments.y)

# separating date 
complete_reddit_data <- complete_reddit_data |>
  separate(date_utc, c("Year", "Month", "Day"), sep = "-")

# selecting columns to retain
complete_reddit_data <- complete_reddit_data |>
  select(title, text, comb_comments, score, num_comments, Year, Month, url)

# addressing NA values 
complete_reddit_data <- na.omit(complete_reddit_data)

Now, let’s see if our hard work paid off.

glimpse(complete_reddit_data)

Rows: 251
Columns: 8
$ title         <chr> "Calming/wellness room for high school", "report writing…
$ text          <chr> "Hi,\nHas anyone successfully established a calming or w…
$ comb_comments <chr> "I tried one and it failed fairly spectacularly- it just…
$ score         <dbl> 10, 10, 9, 11, 10, 10, 10, 11, 10, 10, 9, 10, 9, 10, 9, …
$ num_comments  <dbl> 4, 13, 28, 12, 73, 80, 27, 4, 22, 20, 6, 4, 8, 4, 6, 7, …
$ Year          <chr> "2022", "2022", "2022", "2022", "2022", "2022", "2022", …
$ Month         <chr> "08", "07", "06", "06", "04", "03", "03", "03", "03", "0…
$ url           <chr> "https://www.reddit.com/r/schoolpsychology/comments/wq1x…

Excellent! It looks like we have a decent starting point for analysis. The new data frame complete_reddit_data has 251 observations and 8 columns: title, text, comb_comments (each post’s combined comments), score, num_comments (the number of comments each post received), Year, Month, and url. The large reduction in observations is due to Reddit’s API limiting the amount of thread content that could be gathered. As a result, 748 of the 999 posts were missing comment or score information and were excluded from analysis.

Now that I have a clean dataset, I can begin to do some initial analysis and explore the data further.

Step 3: Analyze

The first thing we can do with our data is get a sense of its shape and distribution. Using ggplot2, I create histograms of each post’s score and number of comments.

#visualizing numeric data 
complete_reddit_data |> 
  ggplot(aes(score)) +
  geom_histogram() +
  ggtitle("Histogram of Post Scores") +
  theme_minimal()

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

complete_reddit_data |> 
  ggplot(aes(num_comments)) +
  geom_histogram() +
  labs(title = "Histogram of Number of Comments", 
       x = "Number of Comments") + 
  theme_minimal()

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Looking at the histogram for the number of comments, we can see that the data have a strong positive skew and appear leptokurtic. Most posts have only a handful of comments, while a few outliers exceed 100 comments (the maximum is 171).

Similarly, post scores show a right skew: scores range only from 8 to 24, with most posts clustered at the low end and 10 being the most common score.
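Skew and kurtosis can also be quantified rather than judged from the histograms alone. The sketch below uses the simple moment-based formulas (sample skewness g1 and excess kurtosis g2, without small-sample bias correction); skewness() and excess_kurtosis() are hypothetical helpers, not functions from the packages loaded above. Positive skewness indicates a longer right tail, and positive excess kurtosis a heavier-tailed, more peaked distribution than the normal.

```r
# Moment-based sample skewness (g1): third central moment over sd^3
skewness <- function(x) {
  x <- x[!is.na(x)]          # drop missing values
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3 / 2)
}

# Moment-based excess kurtosis (g2): fourth central moment over var^2, minus 3
excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

skewness(c(1, 2, 3))             # 0: a symmetric sample
skewness(c(1, 1, 1, 2, 20)) > 0  # TRUE: one long right tail
```

With the data loaded, calling these helpers on complete_reddit_data$num_comments and complete_reddit_data$score would give numeric counterparts to the visual impressions above.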

We can also quantify these distributions using summary() and modeest’s mfv() function, which returns the most frequent value(s), that is, the mode.

# summary of numeric data 
summary(select(complete_reddit_data, 
               score,
               num_comments))

     score        num_comments  
 Min.   : 8.00   Min.   :  1.0  
 1st Qu.:11.00   1st Qu.:  6.0  
 Median :12.00   Median : 11.0  
 Mean   :14.29   Mean   : 16.3  
 3rd Qu.:19.00   3rd Qu.: 16.5  
 Max.   :24.00   Max.   :171.0

mfv(complete_reddit_data$score, na_rm = TRUE)

[1] 10

mfv(complete_reddit_data$num_comments, na_rm = TRUE)

[1] 6 9

These statistics again show evidence of positive skew: for both variables, the mean exceeds the median, which in turn exceeds the mode. Scores have a mean of 14.29, a median of 12, and a mode of 10. Similarly, the number of comments has a mean of 16.3 and a median of 11, and the data are bimodal, with modes at 6 and 9.
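To see how mfv() can return two values for the number of comments, here is a base-R sketch of the same idea: tabulate the values and return every value tied for the highest count. most_frequent() is a hypothetical helper; modeest’s own implementation may differ in details.

```r
# Return every value tied for the highest frequency (the mode or modes)
most_frequent <- function(x) {
  x <- x[!is.na(x)]          # ignore missing values
  counts <- table(x)         # frequency of each distinct value
  as.numeric(names(counts)[counts == max(counts)])
}

most_frequent(c(6, 9, 6, 9, 11, 171))  # 6 9: two values tie, so the data are bimodal
most_frequent(c(10, 10, 12, NA))       # 10
```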

Next, we can get a sense of what is being discussed in the data by looking at posts directly.

#viewing top posts 
complete_reddit_data |>
  arrange(desc(score)) |>
  slice(1:10)

# A tibble: 10 × 8
   title                text  comb_comments score num_comments Year  Month url  
   <chr>                <chr> <chr>         <dbl>        <dbl> <chr> <chr> <chr>
 1 Considering school … "I a… "1. Right no…    24           10 2021  02    http…
 2 Struggling with con… "Hi,… "One great p…    23            7 2021  04    http…
 3 School psychologist… "Sch… "Honestly? I…    23           29 2020  11    http…
 4 Advice for Internsh… "Hey… "Be flexible…    23            4 2019  07    http…
 5 How to emphasize im… "Hi … "Friendly re…    22            6 2024  02    http…
 6 Academic testing fo… "Aft… "I think it …    22           21 2023  09    http…
 7 How many evaluation… "In … "I do about …    22           15 2022  07    http…
 8 Can I ask to shadow… "Hel… "My district…    22           16 2021  07    http…
 9 When does the anxie… "So … "[deleted]; …    22           14 2021  05    http…
10 Best moment as a sc… "Wha… "Years ago I…    22           11 2021  01    http…

#viewing bottom posts
complete_reddit_data |> 
  arrange(score) |>
  slice(1:10)

# A tibble: 10 × 8
   title                text  comb_comments score num_comments Year  Month url  
   <chr>                <chr> <chr>         <dbl>        <dbl> <chr> <chr> <chr>
 1 Anyone here work in… "Hel… "Looks like …     8            1 2021  07    http…
 2 Graduate School, Ad… "Hel… "Hi everyone…     8           44 2021  06    http…
 3 Masters level Schoo… "I s… "BCBAs, seco…     9           28 2022  06    http…
 4 Counseling outside … "I w… "I'm a bot. …     9            6 2022  02    http…
 5 Automatic referral … "Yes… "\"What is t…     9            8 2022  01    http…
 6 Contemplating my ca… "Hel… "Pros and co…     9            6 2022  01    http…
 7 Giving the BASC-3 t… "If … "I would att…     9            9 2021  12    http…
 8 Salary Schedules     "Whe… "I'm a bot. …     9            4 2021  09    http…
 9 On Pedagogy, reform… "Do … "Following. …     9            3 2021  08    http…
10 How to Email School… "I a… "Hello, 2nd …     9           13 2021  07    http…

Looking at the top and bottom posts, we can see that the highest-rated posts are generally about people seeking advice on how to become school psychologists, credentialing standards, and evaluation considerations. However, three of the top posts, “Struggling with consultation”, “School psychologist Frustration”, and “When does the anxiety stop?” seem to be expressing negative sentiments about the field and experiences within it.

The lowest-rated posts pertain to educational programs, jobs and compensation, research, and district-level procedures. On the surface, at least, none of these posts appears negative.

Now that we have a general sense of what the posts discuss, I can begin preparing the data for sentiment analysis. Sentiment analysis examines subjective elements of text, such as words, for their emotional quality (Taboada 2016). It compares words to dictionaries (lexicons) that researchers have pre-rated for emotional quality, either by assigning a qualitative label (e.g., positive, negative) or a quantitative value (e.g., 1 [negative] to 5 [positive]) (Silge and Robinson, n.d.). For the present study, I used the Valence Aware Dictionary and sEntiment Reasoner (VADER; Hutto and Gilbert 2014), as it is designed to analyze the sentiment of short social media posts.
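To make the dictionary idea concrete, here is a toy base-R sketch of lexicon lookup. The four-word lexicon and its ratings are hypothetical, not VADER’s actual values, and VADER itself layers rules (for negation, intensifiers, capitalization, punctuation, and so on) on top of a lookup like this. A word missing from the lexicon, such as “district”, ends up neutral by default.

```r
# Hypothetical mini-lexicon: word -> valence rating (values invented for illustration)
lexicon <- c(love = 3, helpful = 2, hard = -1.5, anxiety = -2)

words <- c("love", "district", "anxiety", "helpful")

# Look up each word; words absent from the lexicon return NA and are treated as 0
valence <- unname(lexicon[words])
valence[is.na(valence)] <- 0

# Label by the sign of the rating
label <- ifelse(valence > 0, "positive",
                ifelse(valence < 0, "negative", "neutral"))
data.frame(word = words, valence = valence, label = label)
```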

posts_unested <- select(complete_reddit_data, title, text, Year, score, num_comments)
comments_unested <- select(complete_reddit_data, comb_comments, Year)

posts_unested <- posts_unested |> 
  unnest_tokens(output = word, 
                input = title) |> 
  unnest_tokens(output = word, 
                input = text) 

comments_unested <- comments_unested |>
  unnest_tokens(output = cword, 
                input = comb_comments) 

# Remove stop words from both posts and comments 
posts_unested <- posts_unested |>
  anti_join(stop_words, by = "word")

comments_unested <-  comments_unested |>
  anti_join(stop_words, by = c("cword" = "word"))

glimpse(posts_unested)

Rows: 55,746
Columns: 4
$ Year         <chr> "2022", "2022", "2022", "2022", "2022", "2022", "2022", "…
$ score        <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 1…
$ num_comments <dbl> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, …
$ word         <chr> "successfully", "established", "calming", "wellness", "ma…

glimpse(comments_unested)

Rows: 71,682
Columns: 2
$ Year  <chr> "2022", "2022", "2022", "2022", "2022", "2022", "2022", "2022", …
$ cword <chr> "failed", "fairly", "spectacularly", "wasn", "safe", "space", "k…

For this analysis, words from each post’s title and text have been unnested into a new object, posts_unested, with one word per row. Similarly, comments have been unnested into comments_unested. I also performed some initial cleaning by removing “stop words”, common words such as “the”, “and”, and “if” that convey little information on their own. After unnesting, we end up with 55,746 word tokens (rows) for posts and 71,682 for comments; these are total occurrences, not unique words. Let’s now take a closer look at the words. Using ggplot(), we can create a column chart of the most frequent words (those occurring more than 200 times) in each data frame.
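Because unnest_tokens() expands each row into one row per token and drops its input column by default, tokenizing title and then text in two sequential calls repeats each post’s text once per title token, and the second call overwrites the word column that held the title tokens. The base-R toy sketch below shows the resulting row counts; tokens() is a hypothetical stand-in for unnest_tokens(), which uses a more sophisticated tokenizer.

```r
# Toy post (hypothetical): a 3-word title and a 4-word body
post <- data.frame(title = "calming wellness room",
                   text = "has anyone tried this",
                   stringsAsFactors = FALSE)

# Hypothetical stand-in for unnest_tokens(): lowercase, split on non-letters
tokens <- function(x) {
  out <- unlist(strsplit(tolower(x), "[^a-z']+"))
  out[out != ""]
}

# Sequential approach: one row per title token, each row still carrying the
# body text; tokenizing that column then repeats the body once per title token
rows_after_title <- rep(post$text, length(tokens(post$title)))
sequential <- unlist(lapply(rows_after_title, tokens))
length(sequential)  # 12: 4 body words x 3 title tokens; title words are gone

# Combined approach: paste title and body together, then tokenize once
combined <- tokens(paste(post$title, post$text))
length(combined)    # 7: each title and body word appears exactly once
```

In the tidytext pipeline, the analogous combined step would be to build a single column with paste(title, text) before one unnest_tokens() call.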

#identifying top words 
posts_unested|> 
  count(word, sort = TRUE) |>
  filter(n > 200) |>
  mutate(word = reorder(word, n)) |>
  ggplot(aes(n, word)) + 
  geom_col() +
  labs(y = NULL)

comments_unested |> 
  count(cword, sort = TRUE) |>
  filter(n > 200) |>
  mutate(cword = reorder(cword, n)) |>
  ggplot(aes(n, cword)) + 
  geom_col() +
  labs(y = NULL)

Looking through the bar charts, we can see many undesirable words, such as “NA”, word fragments such as “ve” and “don”, and words that are obvious given the context the data came from, such as “school”, “psych”, and “psychology”. To get more usable data, I need to remove these words. Luckily, I can use a procedure similar to the one above and specify my own custom stop words.
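One detail to keep in mind with the filtering step that follows: filter(!word %in% restop) relies on exact, case-sensitive matching, and unnest_tokens() lowercases tokens by default (to_lower = TRUE). An entry such as “NA” in restop therefore matches a true missing value but not the lowercase token “na” (which is what a missing comment pasted into comb_comments would become), so a lowercase entry may be needed as well. A base-R sketch of the matching behavior, with hypothetical tokens:

```r
# A few entries from restop, plus example tokens as produced by unnest_tokens()
restop_demo <- c("school", "NA", NA)
toks <- c("school", "na", NA, "district")

# %in% matches exactly (case-sensitive), and an NA entry matches NA values
keep <- !toks %in% restop_demo
toks[keep]  # "na" "district": lowercase "na" survives; the missing value is dropped
```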

# custom stopwords 
restop <- c("school", "schools", "schoolpsychology", "psychology", "psych", "psychs", 
            "psychologist", "psychologists","grad", "graduate", "graduates", "NA", "N/A", "ve", "reddit",
            "old.reddit.com", "don", "thread", "https", ".com", "faq", "wiki", 
            "2", "1", "3", "4", "5", "6", "7", "8", "9", "10", "deleted", "ll", "11", 
            "amp", NA)

posts_unested <- 
  posts_unested |> 
  filter(!word %in% restop)

comments_unested <-
  comments_unested |> 
  filter(!cword %in% restop)

# reviewing top words 
posts_unested |> 
  count(word, sort = TRUE) |>
  filter(n > 200) |>
  mutate(word = reorder(word, n)) |>
  ggplot(aes(n, word)) + 
  geom_col() +
  labs(y = NULL, x = "Number of Occurrences", title = "Top Post Words")

comments_unested |> 
  count(cword, sort = TRUE) |>
  filter(n > 200) |>
  mutate(cword = reorder(cword, n)) |>
  ggplot(aes(n, cword)) + 
  geom_col() +
  labs(y = NULL, x = "Number of Occurrences", title = "Top Comment Words")

Looking at the top post words, we can see that they mostly pertain to time, jobs, districts, students, and feelings. Comments refer to programs, time, districts, jobs, and students. There are some clear overlaps between what is commonly discussed in posts and comments. Another way to visualize words is through word clouds.

# creating word clouds
posts_unested |> 
  select(word) |> 
  count(word, sort = TRUE) |> 
  slice_max(order_by = n, n = 50) |>
  wordcloud2(size = 0.5)

This view shows the 50 most frequent post words, with each word’s size scaled by its frequency. It largely echoes the previous chart but gives a quicker visual impression of how common each word is relative to the others.

Now that the data are prepared, we’re ready to begin answering some questions! The first is: what are the sentiments expressed on r/schoolpsychology? The second is: is there a difference between the sentiments expressed in posts and in comments? To answer these questions, I conduct a sentiment analysis on the words. The vader package compares each word against a lexicon of words that have already been rated for valence and returns a compound score ranging from -1 (most negative) to +1 (most positive), which I then classify as positive, neutral, or negative.
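Under the hood, VADER sums the valence ratings of the words it recognizes (after applying its rules) and squashes that sum onto the -1 to +1 compound scale. In the reference Python implementation (vaderSentiment), this normalization is s / sqrt(s^2 + alpha) with alpha = 15. The base-R sketch below reproduces that formula, on the assumption that the R vader package mirrors it, together with the ±0.05 cut-offs applied in the code that follows; the input scores are hypothetical.

```r
# Normalize a summed valence score s into the open interval (-1, 1),
# following the form used in the reference vaderSentiment code (alpha = 15)
normalize_vader <- function(s, alpha = 15) s / sqrt(s^2 + alpha)

# Classify compound scores with the same +/-0.05 cut-offs used below
classify_compound <- function(compound) {
  ifelse(compound >= 0.05, "positive",
         ifelse(compound <= -0.05, "negative", "neutral"))
}

scores <- normalize_vader(c(3.2, 0, -1.5))  # hypothetical summed valences
round(scores, 3)
classify_compound(scores)  # "positive" "neutral" "negative"
```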

# Sentiment Analysis 
sentiments_posts <- vader_df(posts_unested$word)
sentiments_comments <- vader_df(comments_unested$cword)

vader_posts_summary <- sentiments_posts |>
  mutate(sentiment = ifelse(compound >= 0.05, "positive",
                            ifelse(compound <= -0.05, "negative", "neutral"))) |>
  count(sentiment, sort = TRUE) |> 
  spread(sentiment, n) |>
  relocate(positive) |>
  mutate(ratio = positive/negative)


vader_comments_summary <- sentiments_comments |>
  mutate(sentiment = ifelse(compound >= 0.05, "positive",
                            ifelse(compound <= -0.05, "negative", "neutral"))) |>
  count(sentiment, sort = TRUE) |> 
  spread(sentiment, n) |>
  relocate(positive) |>
  mutate(ratio = positive/negative)

vader_posts_summary

  positive negative neutral   ratio
1     4047     2436   42061 1.66133

vader_comments_summary

  positive negative neutral    ratio
1     6108     2993   56557 2.040762

A few things become apparent from the vader_df() results. First, comments have a higher ratio of positive to negative words than posts: about 2.04 for comments (roughly 204 positive words for every 100 negative words) versus 1.66 for posts (166 positive words for every 100 negative words). Taken together, the overall sentiment expressed on r/schoolpsychology appears to be neutral to positive.
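Both the ratios above and the neutral percentages discussed next follow directly from the counts in the two summaries. A base-R check using those reported counts:

```r
# Word counts reported in vader_posts_summary and vader_comments_summary above
counts <- rbind(
  posts    = c(positive = 4047, negative = 2436, neutral = 42061),
  comments = c(positive = 6108, negative = 2993, neutral = 56557)
)

# Positive-to-negative ratio for each source
ratio <- counts[, "positive"] / counts[, "negative"]

# Percentage of all classified words labelled neutral
pct_neutral <- 100 * counts[, "neutral"] / rowSums(counts)

round(ratio, 2)        # posts 1.66, comments 2.04
round(pct_neutral, 1)  # posts 86.6, comments 86.1
```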

The second observation is that vader_df() classified a large proportion of words as neutral: 86.6% of post words and 86.1% of comment words. These words may be truly neutral (having no negative or positive connotation), or they may simply be absent from VADER’s dictionary, in which case it cannot assign a valence rating. Next, we can look at which words were rated negative, positive, and neutral.

# most common positive and negative sentiments 
sentiments_posts <- sentiments_posts |> 
  mutate(sentiment = ifelse(compound >= 0.05, "positive",
                            ifelse(compound <= -0.05, "negative", "neutral"))) 

sentiments_comments <- sentiments_comments |>
  mutate(sentiment = ifelse(compound >= 0.05, "positive",
                            ifelse(compound <= -0.05, "negative", "neutral"))) 
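The nested ifelse() calls above encode the ±0.05 compound-score cutoffs, which are the conventional VADER thresholds. Scores of 0.05 or higher are labeled positive, scores of -0.05 or lower are labeled negative, and everything in between is neutral. Below is a minimal sketch of the same rule written with dplyr::case_when(), which some readers may find easier to scan. `label_vader()` is a hypothetical helper name and is not part of the vader package.

```r
library(dplyr)

# Hypothetical helper: same cutoffs as the nested ifelse() calls above.
# Both boundaries (0.05 and -0.05) fall outside the neutral band, and NA stays NA,
# matching ifelse()'s behavior on missing scores.
label_vader <- function(compound) {
  case_when(
    is.na(compound)   ~ NA_character_,
    compound >= 0.05  ~ "positive",
    compound <= -0.05 ~ "negative",
    TRUE              ~ "neutral"
  )
}

label_vader(c(0.5, 0.05, 0.049, 0, -0.05, -0.3, NA))
```

Inside the pipeline, this would read `mutate(sentiment = label_vader(compound))`.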

sentiments_posts |>
  count(text, sentiment, sort = TRUE) |>
  filter(sentiment == "positive") |>
  group_by(sentiment) |>
  slice_max(n, n = 10) |>
  ungroup()

# A tibble: 11 × 3
   text        sentiment     n
   <chr>       <chr>     <int>
 1 love        positive    154
 2 curious     positive    133
 3 support     positive    123
 4 helpful     positive    109
 5 special     positive    107
 6 pretty      positive     90
 7 honestly    positive     84
 8 hope        positive     80
 9 appreciated positive     76
10 emotional   positive     73
11 feeling     positive     73

sentiments_posts |>
  count(text, sentiment, sort = TRUE) |>
  filter(sentiment == "negative") |>
  group_by(sentiment) |>
  slice_max(n, n = 10) |>
  ungroup()

# A tibble: 11 × 3
   text      sentiment     n
   <chr>     <chr>     <int>
 1 hard      negative    124
 2 anxiety   negative    107
 3 low       negative     86
 4 pay       negative     80
 5 bad       negative     57
 6 difficult negative     57
 7 tough     negative     55
 8 nervous   negative     45
 9 worried   negative     45
10 crazy     negative     40
11 leave     negative     40

sentiments_posts |>
  count(text, sentiment, sort = TRUE) |>
  filter(sentiment == "neutral") |>
  group_by(sentiment) |>
  slice_max(n, n = 10) |>
  ungroup()

# A tibble: 10 × 3
   text       sentiment     n
   <chr>      <chr>     <int>
 1 district   neutral     520
 2 job        neutral     449
 3 student    neutral     435
 4 time       neutral     420
 5 students   neutral     407
 6 program    neutral     383
 7 feel       neutral     368
 8 programs   neutral     291
 9 experience neutral     284
10 testing    neutral     281

sentiments_comments |>
  count(text, sentiment, sort = TRUE) |>
  filter(sentiment == "positive") |>
  group_by(sentiment) |>
  slice_max(n, n = 10) |>
  ungroup()

# A tibble: 10 × 3
   text     sentiment     n
   <chr>    <chr>     <int>
 1 love     positive    167
 2 pretty   positive    156
 3 luck     positive    154
 4 accepted positive    124
 5 hope     positive    115
 6 lol      positive    109
 7 helpful  positive    105
 8 special  positive    103
 9 support  positive     97
10 honestly positive     75

sentiments_comments |>
  count(text, sentiment, sort = TRUE) |>
  filter(sentiment == "negative") |>
  group_by(sentiment) |>
  slice_max(n, n = 10) |>
  ungroup()

# A tibble: 11 × 3
   text      sentiment     n
   <chr>     <chr>     <int>
 1 pay       negative    185
 2 hard      negative    125
 3 leave     negative     76
 4 bad       negative     75
 5 anxiety   negative     64
 6 difficult negative     61
 7 stress    negative     50
 8 rejected  negative     40
 9 tough     negative     37
10 low       negative     35
11 nervous   negative     35

sentiments_comments |>
  count(text, sentiment, sort = TRUE) |>
  filter(sentiment == "neutral") |>
  group_by(sentiment) |>
  slice_max(n, n = 10) |>
  ungroup()

# A tibble: 10 × 3
   text       sentiment     n
   <chr>      <chr>     <int>
 1 program    neutral     727
 2 time       neutral     671
 3 district   neutral     542
 4 job        neutral     480
 5 student    neutral     371
 6 students   neutral     371
 7 lot        neutral     360
 8 interview  neutral     345
 9 people     neutral     342
10 experience neutral     328

The tables above display the 10 most frequent words in each category for posts and comments. Some tables show 11 rows because two words tie for 10th place. The top five words in each category are:

Posts

Valence	Top Words
Positive	love, curious, support, helpful, special
Neutral	district, job, student, time, students
Negative	hard, anxiety, low, pay, bad

Comments

Valence	Top Words
Positive	love, pretty, luck, accepted, hope
Neutral	program, time, district, job, student
Negative	pay, hard, leave, bad, anxiety
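slice_max() keeps ties by default (with_ties = TRUE), which is why several of the tables above contain 11 rows rather than 10. The six near-identical chunks could also be collapsed into one function. The sketch below uses a hypothetical helper, `top_words()`, and made-up toy data. Setting with_ties = FALSE makes it return exactly `n_words` rows, so one of the words tied for the last slot is dropped.

```r
library(dplyr)

# Hypothetical helper: the n_words most frequent words carrying one sentiment label.
# with_ties = FALSE returns exactly n_words rows even if words tie for the last slot.
top_words <- function(data, label, n_words = 10) {
  data |>
    filter(sentiment == .env$label) |>
    count(text, sort = TRUE) |>
    slice_max(n, n = n_words, with_ties = FALSE)
}

# Toy word-level data (invented): "emotional" and "feeling" tie for third place
toy <- tibble(
  text      = c("love", "love", "love", "hope", "hope", "emotional", "feeling", "hard"),
  sentiment = c(rep("positive", 7), "negative")
)

top_words(toy, "positive", n_words = 3)  # 3 rows; the default with_ties = TRUE would give 4
```

In the report, this could be called as, for example, `top_words(sentiments_posts, "negative")`.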

We can also compare positive and negative words visually with a “comparison cloud,” which plots frequent positive and negative words side by side. Because max.words = 50, the cloud for posts shows up to 50 of the most frequent positive and negative words. This view lets us directly compare which words were classified as positive and negative, as well as their relative frequency.

suppressWarnings({sentiments_posts |>
  filter(sentiment %in% c("positive", "negative")) |>
  count(text, sentiment, sort = TRUE) |>
  acast(text ~ sentiment, value.var = "n", fill = 0) |>
  comparison.cloud(colors = c("darkblue", "goldenrod3"),
                   max.words = 50)
})
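acast() from reshape2 turns the long table of word counts into the term matrix that comparison.cloud() expects. That matrix has one row per word and one column per sentiment label, with zeros where a word never received that label. To illustrate only this reshaping step, here is a base-R equivalent using tapply() on a few counts copied from the posts tables. It is not part of the original pipeline.

```r
# A few word counts copied from the posts tables above
word_counts <- data.frame(
  text      = c("love", "curious", "hard", "anxiety"),
  sentiment = c("positive", "positive", "negative", "negative"),
  n         = c(154, 133, 124, 107)
)

# Rows = words, columns = sentiment labels, empty cells filled with 0,
# mirroring acast(text ~ sentiment, value.var = "n", fill = 0)
term_matrix <- tapply(word_counts$n,
                      list(word_counts$text, word_counts$sentiment),
                      sum, default = 0)
term_matrix
```

Both approaches order the columns alphabetically, with negative first and positive second. In the comparison.cloud() call above, the first color (darkblue) should therefore correspond to negative words and goldenrod3 to positive words.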

Looking through these words, vader appears to have done a reasonable job of separating positive, negative, and neutral words. The positive words across posts and comments mostly describe positive emotions, as well as being accepted. Here, “accepted” likely comes from a weekly graduate student thread on r/schoolpsychology, where prospective and current students ask questions and share news of their acceptance into graduate programs. The neutral words also appear to be genuinely neutral, such as time, district, and students. Context still matters, however, and examining words individually means we lose it. For example, if “pay” is preceded by “high,” we might consider that a positive sentiment (receiving high pay). If “pay” is preceded by “low,” we would more likely read it as a complaint about low pay. To get more context, I also generate bigrams, or two-word sequences, rather than examining single words alone. This requires some changes to the code above, but the process is largely the same.

#creating bigrams 
posts_unested_big <- select(complete_reddit_data, title, text, Year, score, num_comments)
comments_unested_big <- select(complete_reddit_data, comb_comments, Year)

post_bigram <- posts_unested_big |> 
  unnest_tokens(output = word, 
                input = title,
                token = "ngrams", 
                n = 2) |> 
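  # note: this call also writes to `word`, so it appears to replace the title
  # bigrams and repeat each body bigram once per title bigram in the same post
  unnest_tokens(output = word, 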
                input = text,
                token = "ngrams", 
                n = 2)
comments_bigram <- comments_unested_big |> 
  unnest_tokens(output = cword, 
                input = comb_comments,
                token = "ngrams",
                n = 2)

posts_bigrams_separated <- post_bigram |>
  separate(word, c("word1", "word2"), sep = " ") 

comm_bigrams_separated <- comments_bigram |>
  separate(cword, c("word1", "word2"), sep = " ") 

post_clean_bigrams <- posts_bigrams_separated |>
  filter(!word1 %in% stop_words$word) |>
  filter(!word2 %in% stop_words$word) |>
  filter(!word1 %in% restop) |>
  filter(!word2 %in% restop) |>
  unite(word, word1, word2, sep = " ")

comments_clean_bigrams <- comm_bigrams_separated |>
  filter(!word1 %in% stop_words$word) |>
  filter(!word2 %in% stop_words$word) |>
  filter(!word1 %in% restop) |>
  filter(!word2 %in% restop) |>
  unite(cword, word1, word2, sep = " ")
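In the post_bigram chunk above, the second unnest_tokens() call writes to the same `word` column as the first. One way to keep title and body bigrams as separate rows, so that neither overwrites the other, is to tokenize each field on its own and stack the results with dplyr::bind_rows(). The sketch below does this. The two toy posts and the `post_id` and `field` columns are hypothetical.

```r
library(dplyr)
library(tidytext)

# Two invented posts standing in for posts_unested_big
toy_posts <- tibble(
  post_id = 1:2,
  title   = c("low pay again", "love my district"),
  text    = c("the pay is low", "great team this year")
)

# Tokenize titles and bodies separately into bigrams
title_bigrams <- toy_posts |>
  unnest_tokens(output = word, input = title, token = "ngrams", n = 2) |>
  select(post_id, word)

body_bigrams <- toy_posts |>
  unnest_tokens(output = word, input = text, token = "ngrams", n = 2) |>
  select(post_id, word)

# Stack them; `field` records whether each bigram came from a title or a body
post_bigrams <- bind_rows(title = title_bigrams, body = body_bigrams, .id = "field")
post_bigrams
```

Each title and body bigram now appears exactly once: 4 title bigrams and 6 body bigrams here. The separate() and stop-word filtering steps above can then be applied to post_bigrams unchanged.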

post_clean_bigrams |> count(word, sort=TRUE)

# A tibble: 1,527 × 2
   word                      n
   <chr>                 <int>
 1 questions related        96
 2 programs admissions      85
 3 training programs        85
 4 special education        69
 5 post questions           53
 6 iep meetings             48
 7 social emotional         48
 8 hard time                42
 9 mental health            42
10 assignments practicum    36
# ℹ 1,517 more rows

comments_clean_bigrams |> count(cword, sort = TRUE)

# A tibble: 13,979 × 2
   cword                 n
   <chr>             <int>
 1 mental health        78
 2 special education    57
 3 report writing       47
 4 private practice     39
 5 phd program          36
 6 feel free            33
 7 nasp approved        33
 8 eds program          30
 9 special ed           27
10 message compose      26
# ℹ 13,969 more rows

Bigrams give somewhat more context. The most frequent bigrams in posts range from neutral topics and questions (e.g., “programs admissions,” “special education”) to positive phrases (e.g., “greatly appreciated”). Comments deal with similar topics, such as “mental health” and “report writing.” The most frequent bigrams are largely neutral descriptions of topics rather than phrases that reverse a word’s sentiment. The single-word (unigram) analysis therefore appears to capture the sentiments expressed on r/schoolpsychology reasonably well.
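The bigrams can also be used to check the “pay” example directly, by counting which words immediately precede “pay.” Below is a minimal sketch with invented bigrams. On the report’s data it would presumably run on posts_bigrams_separated or comm_bigrams_separated, skipping the separate() step. Using these tables before stop-word removal guards against modifiers such as “low” or “high” appearing on a stop-word list.

```r
library(dplyr)
library(tidyr)

# Invented bigrams standing in for the report's bigram tables
toy_bigrams <- tibble(word = c("low pay", "low pay", "high pay", "pay scale", "hard time"))

pay_context <- toy_bigrams |>
  separate(word, c("word1", "word2"), sep = " ") |>  # split each bigram into two words
  filter(word2 == "pay") |>                          # keep bigrams ending in "pay"
  count(word1, sort = TRUE)                          # tally the preceding word
pay_context
```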

Now that we have looked at the general sentiments expressed on r/schoolpsychology, we can turn to the third question: How have sentiments changed over time? This can be answered fairly easily with ggplot() and variables already in the dataset. I generate bar charts that plot the ratio of positive to negative words by year, separately for posts and for comments.

#sentiments over time 
posts_unested<- bind_cols(posts_unested, select(sentiments_posts, compound, sentiment))
comments_unested<- bind_cols(comments_unested, select(sentiments_comments, compound, sentiment))


posts_time <- posts_unested |>
  group_by(Year, sentiment) |>
  count(sentiment, sort = TRUE) |> 
  spread(sentiment, n) |>
  relocate(Year) |>
  mutate(ratio = positive/negative)

comments_time <- comments_unested |>
  group_by(Year, sentiment) |>
  count(sentiment, sort = TRUE) |> 
  spread(sentiment, n) |>
  relocate(Year) |>
  mutate(ratio = positive/negative)

posts_time |> ggplot(aes(Year, ratio)) +
  geom_col() +
  ggtitle("Ratio of Positive to Negative Words in Posts by Year") +
  theme_minimal()

comments_time |> ggplot(aes(Year, ratio)) +
  geom_col() +
  ggtitle("Ratio of Positive to Negative Words in Comments by Year") +
  theme_minimal()
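The same per-year ratio can be computed a little more compactly. The sketch below counts Year and sentiment together and widens the result with tidyr::pivot_wider(), the current replacement for spread(). The data are invented.

```r
library(dplyr)
library(tidyr)

# Invented word-level data: one row per word, with its year and sentiment label
toy_words <- tibble(
  Year      = c(2018, 2018, 2018, 2019, 2019, 2019, 2019),
  sentiment = c("positive", "negative", "neutral",
                "positive", "positive", "negative", "neutral")
)

ratio_by_year <- toy_words |>
  count(Year, sentiment) |>                       # one row per Year x sentiment
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) |>
  mutate(ratio = positive / negative)             # positive words per negative word
ratio_by_year
```

The result has the same Year and ratio columns as posts_time and comments_time, so it could be passed to the same ggplot() calls.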

The bar charts show that, for posts, the year with the lowest ratio of positive to negative words was 2022. The ratio for posts also appears to have decreased somewhat over time, although posts have still contained more positive than negative words every year. For comments, in contrast, 2018 had the lowest ratio of positive to negative words. The ratio then increased until 2021 and has remained consistently high since. In short, posts have become somewhat less positive over time and comments have become more positive, but positive sentiments have consistently outnumbered negative sentiments since 2018.

Now that we have examined how sentiments have changed over time, we are ready to answer the final question: Do negative sentiments get more engagement than positive sentiments (comments, upvotes, downvotes)? One option is simply to compare the average score (points) and number of comments for positive, negative, and neutral words and see which is larger. A more robust approach is to use aov() to conduct a one-way ANOVA, which tests whether the mean scores and numbers of comments differ significantly between positive, negative, and neutral words. An ANOVA only indicates whether a significant difference exists somewhere among the groups. I therefore use Tukey’s HSD as a post-hoc test to identify which pairs of groups differ significantly.

# Do posts with negative sentiments get more engagement? 
posts_unested <- posts_unested |>
  mutate(across(c(score, num_comments), as.numeric))

posts_unested |> 
  group_by(sentiment) |>
  summarize(mean_num_comments = mean(num_comments, na.rm = TRUE),
            mean_score = mean(score, na.rm = TRUE))

# A tibble: 3 × 3
  sentiment mean_num_comments mean_score
  <chr>                 <dbl>      <dbl>
1 negative               12.8       16.1
2 neutral                14.2       15.0
3 positive               12.1       15.0

posts_unested$sentiment <- factor(posts_unested$sentiment, 
                                  levels = c("negative", "neutral", "positive"))

scoredif <- aov(score ~ sentiment, data = posts_unested)
summary(scoredif)

               Df  Sum Sq Mean Sq F value Pr(>F)    
sentiment       2    2844  1421.9   68.64 <2e-16 ***
Residuals   48541 1005606    20.7                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

commdif <- aov(num_comments ~ sentiment, data = posts_unested)
summary(commdif)

               Df   Sum Sq Mean Sq F value Pr(>F)    
sentiment       2    19878    9939   41.24 <2e-16 ***
Residuals   48541 11697126     241                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

TukeyHSD(scoredif)

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = score ~ sentiment, data = posts_unested)

$sentiment
                          diff        lwr        upr     p adj
neutral-negative  -1.109385866 -1.3316905 -0.8870812 0.0000000
positive-negative -1.100359264 -1.3739141 -0.8268044 0.0000000
positive-neutral   0.009026602 -0.1665406  0.1845938 0.9920269

TukeyHSD(commdif)

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = num_comments ~ sentiment, data = posts_unested)

$sentiment
                        diff        lwr       upr     p adj
neutral-negative   1.4221443  0.6639609  2.180328 0.0000328
positive-negative -0.6813163 -1.6142915  0.251659 0.2008148
positive-neutral  -2.1034606 -2.7022431 -1.504678 0.0000000

Let’s first look at the mean number of comments and the mean score for positive, negative, and neutral posts. Posts containing negative sentiments receive 12.79 comments on average, neutral posts receive 14.22 comments, and positive posts receive only 12.11 comments. Positive posts thus appear to receive marginally fewer comments than negative or neutral posts. Scores, however, tell a slightly different story. Posts containing negative sentiments receive 16.09 points on average, compared with 14.98 points for neutral posts and 14.99 points for positive posts. This suggests that posts expressing negative sentiments receive more points. Next, we can test whether these differences are statistically significant.

For scores, a one-way ANOVA indicated a significant difference between sentiment groups (F(2, 48541) = 68.64, p < .001). Tukey’s HSD indicated that negative sentiments (M = 16.09) had significantly higher scores than neutral (M = 14.98) and positive (M = 14.99) sentiments. Neutral and positive sentiments did not differ significantly from each other (p = .99). In other words, negative sentiments received higher scores than positive or neutral sentiments overall.

For the number of comments, a one-way ANOVA similarly indicated a significant difference between sentiment groups (F(2, 48541) = 41.24, p < .001). However, Tukey’s HSD indicated that neutral sentiments (M = 14.22) received significantly more comments than positive (M = 12.11) and negative (M = 12.79) sentiments. Posts with positive and negative sentiments did not differ significantly from each other (p = .20).

Thus, for the fourth question, how “engagement” is defined appears to matter. Negative sentiments receive more points than positive sentiments, but posts with negative and positive sentiments receive similar numbers of comments.
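One caution applies when reading these tests. Each row of posts_unested is a word, so a post’s score and comment count are repeated once for every word it contains, and long posts carry more weight. The sketch below shows a post-level alternative on invented data. It assumes a hypothetical `post_id` column, which is not in the data shown. It also uses each post’s mean word-level compound score as its overall sentiment, which is only one of several reasonable choices; scoring each post’s full text would be another.

```r
# Hypothetical word-level data: one row per word, with each post's score repeated
words <- data.frame(
  post_id  = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6),
  compound = c(0.6, 0, 0.4, -0.5, -0.3, 0, 0, 0.1, -0.6, 0, 0.5, 0.2, 0, 0),
  score    = c(10, 10, 10, 25, 25, 12, 12, 12, 30, 30, 8, 8, 15, 15)
)

# Collapse to one row per post: mean compound across its words, and its score
posts <- aggregate(cbind(compound, score) ~ post_id, data = words, FUN = mean)

# Label each post with the same +/-0.05 cutoffs used for words above
posts$sentiment <- factor(
  ifelse(posts$compound >= 0.05, "positive",
         ifelse(posts$compound <= -0.05, "negative", "neutral")),
  levels = c("negative", "neutral", "positive")
)

# One-way ANOVA and Tukey's HSD with posts, not words, as the unit of analysis
post_fit <- aov(score ~ sentiment, data = posts)
summary(post_fit)
TukeyHSD(post_fit)
```

With real data, the same aov() and TukeyHSD() calls would compare posts rather than words. The residual degrees of freedom would then reflect the number of posts instead of the number of words.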

We can speculate further about why these results occurred. Posts that express negative sentiments may seem more relatable to other school psychologists and are therefore more likely to be upvoted. The finding that posts with negative and positive sentiments receive similar numbers of comments suggests that people may engage with and support others when positive or celebratory posts are made. Negative posts may likewise elicit empathy and support. For example, the post titled “When does the anxiety stop?” received 22 upvotes and 14 comments, and most of the comments appear to be supportive of the original poster. One commenter wrote:

“For real though, I finally got to the point of not having panic attacks when I learned to take the onus off of myself. I am not the decision maker. I am not responsible for what the numbers look like (unless I made a mistake.) I am not the teacher. I am a part of a team and the team, including the parent, is responsible for decisions made. Honestly, I am probably one of the least important at the table, but I’m sure if you’re like me, you didn’t feel like that at the beginning of your career. And again, remember its your career, your job, not your life.”

Step 4: Communicate

Let’s briefly recap and summarize our four research questions:

What are the sentiments expressed on r/schoolpsychology?
- In general, sentiments appear to be neutral to positive.
Is there a difference between sentiments expressed on posts compared to comments?
- Yes, comments have a higher ratio of positive to negative words than posts (2.04 vs. 1.66).
How have sentiments changed over time?
- Posts have remained more positive than negative every year, although somewhat less so over time, while comments have become increasingly positive.
Do negative sentiments get more engagement than positive sentiments (comments, upvotes, downvotes)?
- Partly. Negative sentiments appear to receive more points (score) than positive sentiments, but a similar number of comments.

What does this mean for online school psychology communities?

The current study suggests that r/schoolpsychology is generally a positive place for current and prospective school psychologists. Still, most words were classified as neutral, and a fairly sizable share were negative. Negative sentiments also tended to receive more points. Posts with negative sentiments may prompt discussion of, or sympathy for, the feelings expressed. School psychologists may use r/schoolpsychology to vent or express frustration, and when they do, they appear to receive support from others. This is consistent with prior research indicating that online professional communities can offer a space for professionals to vent and receive support from colleagues (Oksanen et al. 2024).

Posts expressing negative sentiments appear to be proportionally fewer than those expressing positive sentiments. However, research indicates that negative information may capture a disproportionate amount of our attention (Ito et al. 1998). These posts may therefore still shape what prospective applicants think of when they think about the field of school psychology. This may be especially likely when prospective applicants sort r/schoolpsychology by “top” or “hot” posts. Both sort orders rank posts by their score (upvotes minus downvotes), and “hot” also favors more recently created posts. “Hot” is also Reddit’s default sort order. Because negative sentiments tend to receive higher scores, users browsing the subreddit may be more likely to see posts containing these sentiments. They may also be less likely to see the proportionally larger number of positive or neutral posts.
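To illustrate how “hot” combines score and recency, here is an R port of the hot-ranking formula from Reddit’s formerly open-source codebase. Reddit’s current production ranking is not public and may differ. Treat this as an illustration of the general idea rather than the algorithm in use today. `reddit_hot()` is a hypothetical name.

```r
# Port of the "hot" formula from Reddit's legacy open-source code (illustrative only)
reddit_hot <- function(ups, downs, posted) {
  s <- ups - downs                             # net score
  order <- log10(pmax(abs(s), 1))              # every 10x increase in net score adds 1
  seconds <- as.numeric(posted) - 1134028003   # seconds since a fixed reference time
  round(sign(s) * order + seconds / 45000, 7)  # 45,000 seconds = 12.5 hours
}

posted <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC")

# Same age, ten times the net score: the hot value rises by 1
reddit_hot(100, 0, posted) - reddit_hot(10, 0, posted)

# A post 12.5 hours newer with one tenth the net score ranks about the same
reddit_hot(10, 0, posted + 45000) - reddit_hot(100, 0, posted)
```

Under this formula, net score counts on a log scale, and each 12.5 hours of age offsets a tenfold difference in net score. Higher-scoring posts therefore stay near the top of the listing longer.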

Limitations

A key limitation of this study is that sentiment analysis can provide a general understanding of the sentiments expressed across a large corpus of text, but it may miss subtle nuances in language or context that change a message’s meaning. For this reason, qualitative or mixed-methods approaches that examine posts directly, alongside sentiment analysis, may provide more accurate insights. In addition, the engagement analyses were conducted at the word level, so each post’s score and number of comments were counted once for every word it contained. Because these observations are not independent, the ANOVA results may overstate the precision of the group differences. There was also a significant amount of missing data, which may have biased the present results. This limitation might be addressed by obtaining research access to Reddit data through services such as Pushshift.io (“NCRI Reddit Access,” n.d.). Another limitation is that the views expressed on Reddit may not be representative of all school psychologists. However, research on the perceptions of school psychologists has rarely accounted for the views of school psychologists online. Given that r/schoolpsychology has over 13,000 members, the views of this community likely represent a significant and under-researched perspective in the field. Future research may supplement the perspectives of school psychologists online with other methods, such as in-person interviews or questionnaires.

There are also several ethical considerations for the present study. First, the data were collected from Reddit users who did not necessarily consent to being part of a study. This concern may be mitigated by the fact that Reddit is an anonymous social networking website. The data therefore cannot be traced to a user’s real name or other personally identifiable information unless the user has chosen to disclose it. In addition, usernames were not used in this analysis. A related potential legal concern is the use of Reddit data for analysis. At this time, using the Reddit API to pull post information for research may still be allowed under Reddit’s terms of service, and packages like RedditExtractoR still function. However, the issue is unclear, because Reddit recently updated its terms of service to ban the use of its API for machine learning specifically (“Data API Terms,” n.d.). Since this is a research project and does not seek to build a commercial product, the use of Reddit data in the present analysis is likely legally defensible.

Next Steps

It is heartening to see that most of the non-neutral sentiments expressed on r/schoolpsychology are positive, but a sizable share of negativity remains. These negative sentiments may point to avenues for future efforts to improve working conditions for current and future practitioners. Many of the negative sentiments deal with stress, anxiety, workload, and pay, among other concerns. These issues are not new to the field of school psychology and are major contributors to burnout and attrition (Schilling, Randolph, and Boan-Lenzo 2018). Addressing them requires advocacy at the national, state, and regional levels through organizations such as NASP. Such advocacy can promote better working conditions, pay, and support for new and seasoned school psychologists alike. For instance, NASP could advocate for policies that limit the number of evaluations school psychologists must complete each year. This would help reduce burnout and ensure that students receive high-quality services. Investing in recruitment to address workforce shortages is also crucial. This could involve raising awareness of the field among undergraduate students. It could also involve establishing structured pathways that offer research and practice-oriented experiences, which are often necessary for admission to school psychology graduate programs. Doing so may lead to a more organic reduction in the negative sentiments expressed online and make school psychology a more attractive field for current and future practitioners.

References

APA. n.d. “There’s a Strong Push for More School Psychologists.” https://www.apa.org/monitor/2024/01/trends-more-school-psychologists-needed.

“Data API Terms.” n.d. https://redditinc.com/policies/data-api-terms.

Davis, Jenny L., and Timothy Graham. 2021. “Emotional Consequences and Attention Rewards: The Social Effects of Ratings on Reddit.” Information, Communication & Society 24 (5): 649–66. https://doi.org/10.1080/1369118X.2021.1874476.

Fellows, Ian. 2018. Wordcloud: Word Clouds. https://cran.r-project.org/web/packages/wordcloud/index.html.

Hothorn, Torsten, Frank Bretz, Peter Westfall, Richard M. Heiberger, Andre Schuetzenmeister, and Susan Scheibe. 2024. Multcomp: Simultaneous Inference in General Parametric Models. https://cran.r-project.org/web/packages/multcomp/index.html.

Hutto, C., and Eric Gilbert. 2014. “VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text.” Proceedings of the International AAAI Conference on Web and Social Media 8 (1): 216–25. https://doi.org/10.1609/icwsm.v8i1.14550.

Ito, Tiffany, Jeff Larsen, Kyle Smith, and John Cacioppo. 1998. “Negative Information Weighs More Heavily on the Brain.” Journal of Personality and Social Psychology 75 (October): 887–900. https://doi.org/10.1037/0022-3514.75.4.887.

Lang, Dawei, and Guan-tin Chien. 2018. Wordcloud2: Create Word Cloud by ‘Htmlwidget’. https://cran.r-project.org/web/packages/wordcloud2/index.html.

NASP. n.d. “State Shortages Data Dashboard.” https://www.nasponline.org/about-school-psychology/state-shortages-data-dashboard.

“NASP Exposure Project (NASP-EP).” n.d. https://www.nasponline.org/resources-and-publications/resources-and-podcasts/diversity-and-social-justice/cultural-responsiveness/multicultural-affairs-committee/nasp-exposure-project-(nasp-ep).

“NCRI Reddit Access.” n.d. https://pushshift.io/signup.

Oksanen, Atte, Magdalena Celuch, Reetta Oksa, and Iina Savolainen. 2024. “Online Communities Come with Real-World Consequences for Individuals and Societies.” Communications Psychology 2 (1): 1–9. https://doi.org/10.1038/s44271-024-00112-6.

Paul, Shirley-Anne S., and Paula J. Clarke. 2016. “A Systematic Review of Reading Interventions for Secondary School Students.” International Journal of Educational Research 79 (January): 116–27. https://doi.org/10.1016/j.ijer.2016.05.011.

“Reddit for School Psychologists!” n.d. https://www.reddit.com/r/schoolpsychology/.

Rivera, Ivan. 2023. RedditExtractoR: Reddit Data Extraction Toolkit. https://cran.r-project.org/web/packages/RedditExtractoR/index.html.

Schilling, Ethan J., and Mickey Randolph. 2021. “Voices from the Field: Addressing Job Burnout in School Psychology Training Programs.” Contemporary School Psychology 25 (4): 572–81. https://doi.org/10.1007/s40688-020-00283-z.

Schilling, Ethan J., Mickey Randolph, and Candace Boan-Lenzo. 2018. “Job Burnout in School Psychology: How Big Is the Problem?” Contemporary School Psychology 22 (3): 324–31. https://doi.org/10.1007/s40688-017-0138-x.

Silge, Julia, and David Robinson. n.d. Text Mining with R: A Tidy Approach. 1st ed. O’Reilly Media. https://www.tidytextmining.com/.

Silge, Julia, David Robinson, Gabriela De Queiroz, Colin Fay, Emil Hvitfeldt, Os Keyes, Kanishka Misra, Tim Mastny, and Jeff Erickson. 2024. Tidytext: Text Mining Using ‘Dplyr’, ‘Ggplot2’, and Other Tidy Tools. https://cran.r-project.org/web/packages/tidytext/index.html.

Taboada, Maite. 2016. “Sentiment Analysis: An Overview from Linguistics.” Annual Review of Linguistics 2: 325–47. https://doi.org/10.1146/annurev-linguistics-011415-040518.

Wickham, Hadley. 2007. “Reshaping Data with the Reshape Package.” Journal of Statistical Software 21 (November): 1–20. https://doi.org/10.18637/jss.v021.i12.