Census at School: A MOOC Sentiment Analysis

INTRODUCTION

This case study presents a sentiment analysis of educators’ opinions on the Census at School program, mined from discussion forum posts of a MOOC-Ed titled Teaching Statistics through a Data investigation. The MOOC is a professional development program targeted toward educators in K-12 and postsecondary education in the United States. Census at School, in turn, is a program that supports student learning through statistical problem-solving. It is designed for students in grades 4-12 and fosters project-based learning through international collaboration.

The target audience for this study includes educators and other education stakeholders who are deciding whether to integrate Census at School into their classrooms. This information may also be useful to the Census at School organization itself and can contribute to improving the program.

This study follows the Data-Intensive Research Workflow presented by Krumm et al. (2018) to guide the analysis and communicate the findings.

Prepare

Background

Recent advancements in text mining approaches have led to a rise in studies that seek to understand learners’ emotions, opinions, and attitudes toward specific topics in learning communities. Prior studies have indicated that sentiment analysis can glean insightful information from educational environments, including Massive Open Online Courses (MOOCs) (Wen et al., 2014; Yan et al., 2021). For instance, Moreno-Marcos et al. (2018) and Onan (2021) used sentiment analysis algorithms to assess feedback on course evaluations and used the findings to aid decision-making and to plan future course improvements.

From the literature review, I noticed that empirical studies have focused more on applying these techniques to public social media networks such as Twitter and Facebook. Part of my motivation for this study is therefore the need to experiment with how these tools can be applied in MOOC contexts. In this study, a mix of sentiment analysis and social network analysis algorithms has been used to mine the perspectives and opinions of participants in a MOOC-Ed. As a researcher, I have taken advantage of the backgrounds and profiles of the MOOC participants, who are positioned as educators in K-12 and postsecondary education.

Research Questions

The main research questions guiding this study of educators’ sentiments toward Census at School are:

What are the most frequently used words by educators regarding the Census at School program?
What are educators’ sentiments regarding the Census at School program?
Who are the active participants that engaged in Census at School discussions in the forum, and what is their interaction pattern?

The Dataset

The source dataset that I am using in this study is a collection of 5788 discussion forum posts from the MOOC Teaching Statistics through a Data investigation, covering course offerings from Fall 2015 to Fall 2017. Due to the scope of this project, my sentiment analysis focuses on a pared-down data frame containing posts from the Fall 2017 offering whose discussions target Census at School.

Wrangle

The wrangling process involved several steps, starting with paring down the dataset, which originally had 5788 observations from eight course offerings between Fall 2015 and Fall 2017. I have included comments in the code chunk to describe the manipulations performed.

#loading libraries
library(tidytext)
library(vader)
library(tidyverse)
library(here)
library(wordcloud2)
library(tidygraph)
library(ggraph)
library(igraph)

The observations of interest for this analysis are discussion forum posts from the course offered in Fall 2017, specifically those on topics that included “Census at School”.

Table 1: Variables of Interest
S/N	Variable	Description
1	post_content	Primary variable of interest containing discussion forum posts
2	discussion_id	Unique reference of the discussion thread
3	forum_id	Unique reference of the forum
4	discussion_creator	User who created the discussion thread
5	discussion_poster	User who posted in the discussion forum
6	discussion_reference	User being replied to (used as the receiver in the network analysis)
7	course_id	Unique identification of the course
8	post_title	Title of the post, used to validate the selected posts

# Importing the dataset and reading ID variables (otherwise parsed as "double") as character
mooc_forum <- read_csv(here("Data", "mooc_forum.csv"),
                       col_types = cols(course_id = col_character(),
                                        forum_id = col_character(),
                                        discussion_id = col_character(),
                                        discussion_creator = col_character(),
                                        discussion_poster = col_character(),
                                        discussion_reference = col_character(),
                                        parent_id = col_character(),
                                        post_id = col_character()))

# Selecting variables of interest for analysis
mooc_forum_1 <- mooc_forum %>%
  select(post_content, discussion_id, forum_id, discussion_creator, discussion_poster,
         discussion_reference, post_title, course_id) %>%
  # omitting entries that are missing a course ID or a discussion ID
  filter(!is.na(course_id) & !is.na(discussion_id))

Skimming through the dataset showed that discussion forum posts focused on Census at School appeared in the Fall 2017 offering, which has a course_id of 73.

# Selecting the course of interest with ID 73, offered in Fall 2017
mooc_forum_2 <- mooc_forum_1 %>% filter(course_id == "73")

# Keeping only the five Census at School discussion threads
mooc_forum_3 <- mooc_forum_2 %>%
  filter(discussion_id %in% c("18582", "19132", "23801", "18555", "22624"))
mooc_forum_3

## # A tibble: 53 × 8
##    post_content         discussion_id forum_id discussion_crea… discussion_post…
##    <chr>                <chr>         <chr>    <chr>            <chr>           
##  1 If my students were… 18582         866      14612            14612           
##  2 I agree with you. C… 18582         866      14612            14730           
##  3 I agree that using … 18582         866      14612            14611           
##  4 I agree this is mor… 18582         866      14612            14132           
##  5 I agree with you on… 18582         866      14612            14232           
##  6 I agree with starti… 18582         866      14612            17316           
##  7 I think it would be… 18582         866      14612            14572           
##  8 Thanks so much for … 18582         866      14612            14612           
##  9 Thank you for shari… 18582         866      14612            14715           
## 10 I'm also in agreeme… 18582         866      14612            13639           
## # … with 43 more rows, and 3 more variables: discussion_reference <chr>,
## #   post_title <chr>, course_id <chr>

The tibble above is the trimmed data frame containing the 53 discussion forum posts about Census at School. These posts were selected by their discussion IDs: 18582, 19132, 23801, 18555 and 22624.

Explore and Model (Phase I: Sentiment Analysis)

The explore step involved tokenization and the computation of descriptive statistics such as word counts and the top tokens in the forum posts. Tokenization initially produced a data frame of 372 words; after removing stopwords and custom words such as “census” and “school”, 340 words remained. To create an appealing and informative word cloud and frequency graph, I selected the top 50 words for further analysis.
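As a rough illustration of what the tokenize, stopword-removal and counting steps do, the sketch below reproduces them in base R on three made-up posts. It is a simplified stand-in for `unnest_tokens()`, `anti_join(stop_words)` and `count()`: the regular-expression tokenizer and the short stopword list are assumptions chosen for the example, not tidytext's tokenizer or its `stop_words` lexicon.

```r
# Toy illustration of tokenize -> drop stopwords -> count, in base R.
# The posts and the stopword list are made-up examples, not the MOOC data.
posts <- c("I agree that students should clean the data.",
           "Students love messy data and real questions!",
           "Census at School data can be overwhelming for students.")

stop_list   <- c("i", "that", "should", "the", "and", "at", "can", "be", "for")
custom_stop <- c("census", "school")

# Lowercase, replace everything except letters/apostrophes with spaces, split on whitespace
tokens <- unlist(strsplit(gsub("[^a-z' ]", " ", tolower(posts)), "\\s+"))
tokens <- tokens[tokens != ""]

# Remove generic and custom stopwords, then count and sort by frequency
kept <- tokens[!tokens %in% c(stop_list, custom_stop)]
word_counts <- sort(table(kept), decreasing = TRUE)
head(word_counts, 3)
```

The custom list plays the same role as `my_stopwords`: words like “census” that appear in nearly every post would otherwise dominate the counts without adding information.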

# allocating tokens to the forum posts 
tidy_mooc <- mooc_forum_3 %>% unnest_tokens(output = word, input = post_content) %>%
  relocate(word)

#removing stopwords and doing a count of common words 

tidy_mooc3 <- anti_join(tidy_mooc, stop_words,
                        by = "word") %>% count(word, sort = TRUE)

#removing customized words and saving in new dataframe 

my_stopwords <- c("census", "school")
tidy_mooc4 <-
  tidy_mooc3 %>%
  filter(!word %in% my_stopwords)

# Saving the new data frame as a CSV file

final_mooc <- tidy_mooc4
write_csv(final_mooc, here("Data", "final_mooc.csv"))

# Selecting the top 50 tokens by count for the frequency graph and word cloud

mooc_top_tokens <- final_mooc %>%
  slice_max(n, n = 50)

wordcloud2(mooc_top_tokens)

The word cloud suggests that educators were mainly discussing “students”, “data” and “questions”, and that they seemed to “agree” with the arguments their peers were making about Census at School. To get a more detailed representation of the data, I plotted a bar graph that shows the actual frequency of each word appearing more than four times.

# Frequent words from posts on Census at School
mooc_top_tokens %>%
  filter(n > 4) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n)) +
  geom_col() +
  coord_flip() +
  labs(x = "Word \n", y = "\n Count ", title = "Frequent Words on Census at School \n") +
  geom_text(aes(label = n), hjust = 1.2, colour = "white", fontface = "bold") +
  theme(plot.title = element_text(hjust = 0.5),
        axis.title.x = element_text(face = "bold", colour = "darkblue", size = 12),
        axis.title.y = element_text(face = "bold", colour = "darkblue", size = 12))

The frequency graph provides further insight, especially by highlighting words that carry deeper meaning about how educators view Census at School. Words such as “overwhelming” may indicate that the nature and workload of the Census at School projects can be too much for students’ learning experience. The words “clean” and “cleaning” point to the importance of cleaning the data before students use it. To validate these interpretations, I checked instances where these words were used, and the interpretations matched my expectations.
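The validation step (finding the posts in which a word was used) can be sketched with base R's `grepl()`. The helper `find_instances()` below is hypothetical, written only for illustration; it matches whole words, so searching for “clean” does not return posts that only contain “cleaning”.

```r
# Hypothetical helper: return the posts that contain `word` as a whole word
find_instances <- function(posts, word) {
  pattern <- paste0("\\b", word, "\\b")  # \b marks a word boundary
  posts[grepl(pattern, posts, ignore.case = TRUE, perl = TRUE)]
}

# Made-up posts for illustration
example_posts <- c("Students need to clean the data first.",
                   "Cleaning messy data is a valuable skill.",
                   "Forty questions could feel overwhelming.")

find_instances(example_posts, "clean")         # first post only
find_instances(example_posts, "overwhelming")  # third post only
```

On the real data the call would be along the lines of `find_instances(mooc_forum_3$post_content, "overwhelming")`, after which the returned posts can be read to confirm the interpretation.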

Table 2: Random Validated Instances

Word	discussion_poster	post_content
Agree	14135	I agree with all of the above. I love that the kids can discover the need to clean data without it just being me telling them.
Cleaning	13313	I agree that allowing students to practice cleaning up data is valuable. Census at School would definitely allow students to “stumble onto” situations where data would need to be cleaned up a bit..
Overwhelming	14141	Sounds like Census at Schools would be beneficial in the classroom. 40 questions seem a little much but there is nothing students like better than to be heard or to have ownership in the learning and it sounds like this does just that! The only thing is the data would be overwhelming however this would give students a true picture of what data in the real world looks like. It would open up for discussions on How to clean up data. Several statistical questions can be posed.. It would allow through its simulation to teach students to become statistically literate.

To conduct sentiment analysis on the discussion forum posts, I deployed the VADER algorithm and computed the mean compound score of the post content. I selected this approach to obtain a single collective sentiment value that helps characterize educators’ perceptions of Census at School.

vader_mooc <- vader_df(mooc_forum_3$post_content)
## head() is commented out because its output is long
## head(vader_mooc)

#Computing the mean compound score of the forum posts
mean(vader_mooc$compound)

## [1] 0.7236226

The computed mean compound score is 0.724, which leans substantially toward positive sentiment: VADER compound scores range from -1 (most negative) to +1 (most positive). This answers the second research question: overall, educators’ sentiments toward the Census at School program were supportive and positive.
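The bounded -1 to +1 range comes from how VADER normalizes a post's summed word valences. In the reference Python implementation (vaderSentiment), the sum x is mapped to x / sqrt(x^2 + alpha) with alpha = 15 and clipped to [-1, 1]; the R vader package is a port, so I assume it applies the same formula. The sketch below reimplements only that normalization step, not the lexicon lookup or VADER's heuristic rules.

```r
# Normalization step assumed from the reference VADER implementation:
# compound = x / sqrt(x^2 + alpha), clipped to [-1, 1]
normalize_compound <- function(score, alpha = 15) {
  norm <- score / sqrt(score^2 + alpha)
  pmin(pmax(norm, -1), 1)
}

# Summed valences of increasing strength, negative to positive
normalize_compound(c(-4, 0, 1, 4, 20))
```

The curve flattens as the absolute sum grows, so strongly worded posts approach but never exceed +1 or -1.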

Explore and Model (Phase II: Social Network Analysis)

Given the nature of the dataset, I also applied social network analysis to investigate learner interaction in the discussion forums pertaining to Census at School topics.

I started by creating an edge list, which involved preliminary wrangling: renaming variables and removing entries whose target contained “NULL”.

# Using mooc_forum_3, the wrangled data frame from the Wrangle step
head(mooc_forum_3)

## # A tibble: 6 × 8
##   post_content          discussion_id forum_id discussion_crea… discussion_post…
##   <chr>                 <chr>         <chr>    <chr>            <chr>           
## 1 If my students were … 18582         866      14612            14612           
## 2 I agree with you. Cl… 18582         866      14612            14730           
## 3 I agree that using l… 18582         866      14612            14611           
## 4 I agree this is more… 18582         866      14612            14132           
## 5 I agree with you on … 18582         866      14612            14232           
## 6 I agree with startin… 18582         866      14612            17316           
## # … with 3 more variables: discussion_reference <chr>, post_title <chr>,
## #   course_id <chr>

The dataset contains two variables that qualify as “senders” and “receivers”: discussion_poster, the ID of the user who wrote a post, and discussion_reference, the ID of the user being replied to.
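The idea behind the edge list can be sketched in base R: each reply becomes one row pairing its poster (sender) with the user it references (receiver), and rows whose reference is “NULL” or missing are dropped because they point to no one. The toy data frame and the IDs “A” to “D” are made up for illustration.

```r
# Made-up forum posts: who posted, and whom the post replies to
posts <- data.frame(
  discussion_poster    = c("A", "B", "C", "A", "D"),
  discussion_reference = c("NULL", "A", "A", "B", NA),
  post_content         = c("Opening post", "Reply to A", "Reply to A",
                           "Reply to B", "Post with missing reference"),
  stringsAsFactors = FALSE
)

# One edge per reply: sender = poster, receiver = referenced user
edges <- data.frame(sender   = posts$discussion_poster,
                    receiver = posts$discussion_reference,
                    stringsAsFactors = FALSE)

# Drop posts that reply to no one ("NULL") or whose reference is missing
edges <- edges[!is.na(edges$receiver) & edges$receiver != "NULL", ]
edges
```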

# Renaming discussion_poster to sender and discussion_reference to target,
# then dropping posts whose target is "NULL" (posts without a reply target)
ties_1 <- mooc_forum_3 %>%
  relocate(sender = discussion_poster,
           target = discussion_reference) %>%
  select(sender,
         target,
         post_content) %>%
  filter(!target %in% "NULL")

ties_1

## # A tibble: 48 × 3
##    sender target post_content                                                   
##    <chr>  <chr>  <chr>                                                          
##  1 14730  14612  I agree with you. Cleaning up data is an important skill to ma…
##  2 14611  14612  I agree that using large sets of messy data would be confusing…
##  3 14132  14612  I agree this is more for high school student setting  but I al…
##  4 14232  14612  I agree with you on how this could be confusing for students. …
##  5 17316  14612  I agree with starting with smaller messy data so the students …
##  6 14572  14612  I think it would be to complicated for the culture of my stude…
##  7 14612  14730  Thanks so much for that insight which is very helpful.         
##  8 14715  14730  Thank you for sharing this resource. It looks like a great (fr…
##  9 13639  14611  I'm also in agreement. The data set is too large for elementar…
## 10 13313  14611  I think it would be great at the high school level to have stu…
## # … with 38 more rows

The ties_1 data frame lists the sender and target of each reply in the forum.

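# unnest_tokens() splits each target into its individual IDs (one row per
# receiver) and renames the column to receiver; to_lower = FALSE keeps IDs as-is
ties_2 <- ties_1 %>%
  unnest_tokens(input = target,
                output = receiver,
                to_lower = FALSE) %>%
  relocate(sender, receiver)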

ties <- ties_2 %>% drop_na(receiver)
ties

## # A tibble: 48 × 3
##    sender receiver post_content                                                 
##    <chr>  <chr>    <chr>                                                        
##  1 14730  14612    I agree with you. Cleaning up data is an important skill to …
##  2 14611  14612    I agree that using large sets of messy data would be confusi…
##  3 14132  14612    I agree this is more for high school student setting  but I …
##  4 14232  14612    I agree with you on how this could be confusing for students…
##  5 17316  14612    I agree with starting with smaller messy data so the student…
##  6 14572  14612    I think it would be to complicated for the culture of my stu…
##  7 14612  14730    Thanks so much for that insight which is very helpful.       
##  8 14715  14730    Thank you for sharing this resource. It looks like a great (…
##  9 13639  14611    I'm also in agreement. The data set is too large for element…
## 10 13313  14611    I think it would be great at the high school level to have s…
## # … with 38 more rows

With blank entries removed, the senders and receivers are combined into a single list of distinct actors.

#transforming to one list of senders and receivers 
actors_1 <- ties %>%
  select(sender, receiver) %>%
  pivot_longer(cols = c(sender,receiver))

actors_1

## # A tibble: 96 × 2
##    name     value
##    <chr>    <chr>
##  1 sender   14730
##  2 receiver 14612
##  3 sender   14611
##  4 receiver 14612
##  5 sender   14132
##  6 receiver 14612
##  7 sender   14232
##  8 receiver 14612
##  9 sender   17316
## 10 receiver 14612
## # … with 86 more rows

#creating a list of actors 
mooc_actors <- actors_1 %>%
  select(value) %>%
  rename(mooc_actors = value) %>% 
  distinct()
mooc_actors

## # A tibble: 34 × 1
##    mooc_actors
##    <chr>      
##  1 14730      
##  2 14612      
##  3 14611      
##  4 14132      
##  5 14232      
##  6 17316      
##  7 14572      
##  8 14715      
##  9 13639      
## 10 13313      
## # … with 24 more rows

mooc_network_1 <- tbl_graph(edges = ties, 
                            nodes = mooc_actors)

mooc_network_1

## # A tbl_graph: 34 nodes and 48 edges
## #
## # A directed multigraph with 1 component
## #
## # Node Data: 34 × 1 (active)
##   mooc_actors
##   <chr>      
## 1 14730      
## 2 14612      
## 3 14611      
## 4 14132      
## 5 14232      
## 6 17316      
## # … with 28 more rows
## #
## # Edge Data: 48 × 3
##    from    to post_content                                                      
##   <int> <int> <chr>                                                             
## 1     1     2 I agree with you. Cleaning up data is an important skill to maste…
## 2     3     2 I agree that using large sets of messy data would be confusing to…
## 3     4     2 I agree this is more for high school student setting  but I also …
## # … with 45 more rows

The output shows that 34 actors participated in these discussion threads; in this case, they are the educators and possibly a facilitator. Next, I computed degree centrality measures to summarize how connected each actor is in the forum.
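Degree centrality in a directed network counts edges per node: in-degree is the number of replies an actor receives, out-degree the number they send, and total degree (mode = "all") the sum of the two. Because the network is a multigraph, repeated replies between the same pair count separately. The base-R sketch below computes the three measures from a made-up edge list; for a graph without self-loops it should agree with what `centrality_degree()` reports, but it is not tidygraph's implementation.

```r
# Made-up directed edge list (B replies to A twice)
edges <- data.frame(sender   = c("B", "C", "D", "A", "B"),
                    receiver = c("A", "A", "A", "B", "A"),
                    stringsAsFactors = FALSE)

actors <- sort(unique(c(edges$sender, edges$receiver)))

# Count edges leaving and entering each actor; factor levels keep zero counts
out_degree <- as.vector(table(factor(edges$sender,   levels = actors)))
in_degree  <- as.vector(table(factor(edges$receiver, levels = actors)))

degrees <- data.frame(actor      = actors,
                      degree     = in_degree + out_degree,
                      in_degree  = in_degree,
                      out_degree = out_degree)
degrees
```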

mooc_network <- mooc_network_1 %>%
  activate(nodes) %>%
  mutate(degree = centrality_degree(mode = "all")) %>%
  mutate(in_degree = centrality_degree(mode = "in"))%>%
  mutate(out_degree = centrality_degree(mode = "out"))

mooc_network

## # A tbl_graph: 34 nodes and 48 edges
## #
## # A directed multigraph with 1 component
## #
## # Node Data: 34 × 4 (active)
##   mooc_actors degree in_degree out_degree
##   <chr>        <dbl>     <dbl>      <dbl>
## 1 14730            5         3          2
## 2 14612           12        10          2
## 3 14611            5         3          2
## 4 14132            3         1          2
## 5 14232            1         0          1
## 6 17316            1         0          1
## # … with 28 more rows
## #
## # Edge Data: 48 × 3
##    from    to post_content                                                      
##   <int> <int> <chr>                                                             
## 1     1     2 I agree with you. Cleaning up data is an important skill to maste…
## 2     3     2 I agree that using large sets of messy data would be confusing to…
## 3     4     2 I agree this is more for high school student setting  but I also …
## # … with 45 more rows

The results indicate that the network is a directed multigraph with 34 nodes (actors) and 48 edges representing replies between actors. To identify specific actors, I extracted the node measures into a data frame and summarized them.
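The averages in the summary can be checked by hand. Every directed edge adds one to some actor's out-degree and one to some actor's in-degree, so with n = 34 actors and m = 48 edges:

$$\bar{k}_{\mathrm{in}} = \bar{k}_{\mathrm{out}} = \frac{m}{n} = \frac{48}{34} \approx 1.412, \qquad \bar{k} = \frac{2m}{n} = \frac{96}{34} \approx 2.824$$

These agree with the mean in-degree, out-degree and degree reported by summary().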

node_measures <- mooc_network %>% 
  activate(nodes) %>%
  data.frame()

summary(node_measures)

##  mooc_actors            degree         in_degree        out_degree   
##  Length:34          Min.   : 1.000   Min.   : 0.000   Min.   :0.000  
##  Class :character   1st Qu.: 1.000   1st Qu.: 0.000   1st Qu.:1.000  
##  Mode  :character   Median : 2.000   Median : 0.000   Median :1.000  
##                     Mean   : 2.824   Mean   : 1.412   Mean   :1.412  
##                     3rd Qu.: 3.000   3rd Qu.: 1.000   3rd Qu.:2.000  
##                     Max.   :12.000   Max.   :11.000   Max.   :3.000

node_measures

##    mooc_actors degree in_degree out_degree
## 1        14730      5         3          2
## 2        14612     12        10          2
## 3        14611      5         3          2
## 4        14132      3         1          2
## 5        14232      1         0          1
## 6        17316      1         0          1
## 7        14572      2         0          2
## 8        14715      3         0          3
## 9        13639      7         6          1
## 10       13313      3         1          2
## 11       17384      3         1          2
## 12         296      1         0          1
## 13       14205      2         1          1
## 14       14141     11        11          0
## 15       17312      1         0          1
## 16       17657      2         1          1
## 17       17318      2         0          2
## 18       15620      5         3          2
## 19       15990      3         1          2
## 20       17641      1         0          1
## 21       17315      1         0          1
## 22       15475      1         0          1
## 23       16600      5         2          3
## 24       17317      2         0          2
## 25       17840      1         0          1
## 26       14163      1         0          1
## 27       17310      1         0          1
## 28       14627      1         0          1
## 29       14153      1         0          1
## 30       17309      1         0          1
## 31       17308      1         0          1
## 32       13592      1         0          1
## 33       14135      2         1          1
## 34       14217      4         3          1

From the node measures, the actors with the highest degree values are 14612 (12), 14141 (11) and 13639 (7), followed by 14730, 14611, 15620 and 16600 (5 each). Of these, the actors whose posts received the most replies (highest in-degree) are 14141, 14612 and 13639. Given their in-degrees, I presume that some of these actors could be instructors, although more context is needed to confirm this. To analyze this learning community further, I modeled a sociogram that visualizes the interactions and connections between actors, which also helps reveal subgroups.
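The degree ranking can be reproduced from the node_measures table. The sketch below copies a subset of its rows (the actors with degree of at least 4, plus 14232 as a low-degree example) into a small data frame and sorts them; in the real pipeline the same filtering could be applied directly to node_measures.

```r
# Subset of rows copied from the node_measures output
nodes <- data.frame(
  mooc_actors = c("14612", "14141", "13639", "14730", "14611",
                  "15620", "16600", "14217", "14232"),
  degree      = c(12, 11, 7, 5, 5, 5, 5, 4, 1),
  in_degree   = c(10, 11, 6, 3, 3, 3, 2, 3, 0),
  out_degree  = c(2, 0, 1, 2, 2, 2, 3, 1, 1),
  stringsAsFactors = FALSE
)

# Actors with a degree of at least 5, highest first
by_degree <- nodes[nodes$degree >= 5, ]
by_degree <- by_degree[order(-by_degree$degree), ]
by_degree$mooc_actors

# Three actors whose posts drew the most replies (highest in-degree)
top_in <- head(nodes[order(-nodes$in_degree), "mooc_actors"], 3)
top_in
```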

ggraph(mooc_network, layout = "fr") + 
  geom_node_point(aes(size = out_degree,
                      color = out_degree),
                  show.legend = FALSE) +
  geom_node_text(aes(label = mooc_actors,
                     size = out_degree/2,
                     color = out_degree),
                 repel=TRUE,
                 show.legend = FALSE) +
  geom_edge_link(arrow = arrow(length = unit(2, 'mm')), 
                 end_cap = circle(5, 'mm'),
                 alpha = .2) + 
  theme_graph()

I then redrew the network with the geom_node_voronoi() function to see whether shading node groups makes the visualization clearer.

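# group_components() labels connected components; this network has a
# single component, so every node receives the same group value
mooc_network_groups <- mooc_network %>%
  activate(nodes) %>%
  mutate(group = group_components())

mooc_network_groups %>%
  ggraph(layout = "fr") +
  # Voronoi tiles are drawn first so nodes, labels and edges sit on top;
  # alpha is a fixed setting rather than a mapping, so it goes outside aes()
  geom_node_voronoi(aes(fill = factor(group)),
                    alpha = .05,
                    max.radius = .5,
                    show.legend = FALSE) +
  geom_node_point(aes(size = out_degree,
                      color = out_degree),
                  show.legend = FALSE) +
  geom_node_text(aes(label = mooc_actors,
                     color = out_degree,
                     size = out_degree),
                 repel = TRUE,
                 show.legend = FALSE) +
  geom_edge_link(arrow = arrow(length = unit(1, 'mm')),
                 end_cap = circle(3, 'mm'),
                 alpha = .2) +
  theme_graph()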
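The group column used for the Voronoi fill comes from group_components(), which by default labels weakly connected components: sets of actors linked by some chain of replies when edge direction is ignored. Since the tbl_graph output reported a single component, every actor in this forum falls in the same group. A breadth-first-search sketch of that labeling is shown below on a made-up edge list; the function name is hypothetical, and its group numbering may differ from tidygraph's.

```r
# Hypothetical helper: label weakly connected components of a directed
# edge list by breadth-first search, ignoring edge direction
component_labels <- function(senders, receivers) {
  actors <- sort(unique(c(senders, receivers)))
  label <- setNames(rep(NA_integer_, length(actors)), actors)
  current <- 0L
  for (start in actors) {
    if (!is.na(label[[start]])) next      # already assigned to a component
    current <- current + 1L
    queue <- start
    while (length(queue) > 0) {
      node <- queue[1]
      queue <- queue[-1]
      if (!is.na(label[[node]])) next
      label[[node]] <- current
      # neighbours in either direction (replies sent and received)
      nbrs <- c(receivers[senders == node], senders[receivers == node])
      queue <- c(queue, nbrs[is.na(label[nbrs])])
    }
  }
  label
}

# Made-up edges: A, B and C form one component; D and E another
component_labels(senders = c("A", "B", "D"), receivers = c("B", "C", "E"))
```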

Communicate

Discussions

The analysis provided baseline information on how educators perceive the Census at School program and on the interactions behind those conversations. The discussion below is organized around the research questions.

What are the most frequently used words by educators regarding the Census at School program?

The word counts and word cloud show that educators were discussing issues pertaining to their students and how they learn about data in the Census at School program. The five most frequent words in the discussions were “students”, “data”, “agree”, “questions” and “clean”. Checking the instances of “agree” indicated that educators’ perceptions and contributions were in unison and that they were validating each other’s thoughts and experiences. Additionally, in this small dataset, the words “clean” and “cleaning” collectively suggest that working on Census at School questions and activities involves cleaning data. Furthermore, words such as “overwhelming” suggest that some educators thought the activities were overwhelming for students.

What are educators’ sentiments regarding the Census at School program?

The VADER compound score revealed that educators’ sentiments in the forum leaned positive. The mean compound score of 0.724 is well within the positive range, given that +1 is the most positive possible score. This was within my expectations: when I cross-checked the instances, the majority of the posts were positive and supportive of using Census at School for teaching and learning about data.
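The observation that most posts were positive can be checked post by post rather than through the mean alone. A common convention, from the vaderSentiment documentation, labels a compound score of at least 0.05 as positive, at most -0.05 as negative, and anything in between as neutral. The sketch below applies those thresholds to made-up scores; with the real data the input would be vader_mooc$compound.

```r
# Label compound scores with the +/-0.05 thresholds from the vaderSentiment docs
label_sentiment <- function(compound, threshold = 0.05) {
  ifelse(compound >= threshold, "positive",
         ifelse(compound <= -threshold, "negative", "neutral"))
}

example_scores <- c(0.92, 0.64, 0.00, -0.30, 0.88)  # made-up scores
labels <- label_sentiment(example_scores)

# Share of posts in each category (fixed level order so empty classes show 0)
prop.table(table(factor(labels, levels = c("negative", "neutral", "positive"))))
```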

Who are the active participants that engaged in Census at School discussions in the forum, and what is their interaction pattern?

Degree centrality identified the actors with the highest degree values as 14612 (12), 14141 (11) and 13639 (7), followed by 14730, 14611, 15620 and 16600 (5 each). Of these, the actors whose posts received the most replies are 14141, 14612 and 13639. Given their in-degrees, I presume that some of these actors could be instructors, although more context is needed to confirm this. The network is directed, and the sociogram shows how the posts were distributed among actors. One subgroup is made up mostly of replies to a single actor: 14141, whose post received 11 replies (in-degree of 11) and who did not reply to others (out-degree of 0). Another subgroup, formed by actors 14612 and 14730, demonstrates mutual engagement.

Limitations

The main limitation is the scope of the study, which was constrained by the time frame and available resources. The sample was drawn from a single course offering (Fall 2017), and the observations were filtered to the precise topic of interest. This also means the findings cannot be generalized to broader contexts.

Furthermore, since this course is my first experience with sentiment analysis (SA) and social network analysis (SNA) algorithms, I acknowledge that the approaches used are those learned in this class. Other approaches might yield better results and more informative findings.

This project stands as a pilot study that can be developed into a full study in the future. Furthermore, while searching for relevant literature, I realized that there is still a gap in the number of studies focusing on text mining in MOOC forums.

Implications

Although the findings are limited, they can still provide foundational information for policy, research and practice. These educators’ opinions reveal positivity toward Census at School, which can be valuable information for peer educators and policy makers deciding whether to integrate the program into their schools and classrooms. The Census at School organization can also benefit from these findings by learning which elements of the program may need redesign and improvement. For example, words such as “clean data”, “messy” and “overwhelming” can send a message about the type of activities designed for students.

Ethical Considerations

The dataset was intentionally provided by the instructor, Dr. Kellogg, for use in the class, which gave me consent to use it for my analysis. However, demonstrating ethical conduct as a researcher is fundamental for integrity and trustworthiness. I have therefore treated the data confidentially, especially in instances where the identities and names of users were exposed. This report will be used within the scope of this class, and I do not intend to share these findings publicly (at least for now).

Further Research

There are ample ways to expand on this study. The topic need not be Census at School per se; other topics discussed in MOOC forums would also be worth studying. Discussions in professional development programs can be mined to inform stakeholders on various issues pertaining to practice. These sentiments could also be tracked over time, within the context of the course and beyond.

Another potential expansion is to conduct the same study using Twitter or Facebook data with hashtags on the topic instead of data from the MOOC-Ed. I am planning to experiment with this, using these findings as the pilot study.

Conclusion

Sentiment analysis and social network analysis can be used to study opinions and learning patterns in MOOCs. This study showed how word clouds, lexicon-based algorithms such as VADER, and network measures can be used to assess educators’ opinions and interactions in Census at School discussions. Overall, educators were positive toward the program, and the most active actors in the discussions were identified. As MOOCs become more prevalent, more research is needed on how text mining techniques can be applied to assess participants’ opinions, behaviors and patterns and to design appropriate interventions.

References

Krumm, A., Means, B., & Bienkowski, M. (2018). Learning analytics goes to school: A collaborative approach to improving education. Routledge.

Moreno-Marcos, P. M., Alario-Hoyos, C., Muñoz-Merino, P. J., Estévez-Ayres, I., & Kloos, C. D. (2018, April). Sentiment analysis in MOOCs: A case study. In 2018 IEEE Global Engineering Education Conference (EDUCON) (pp. 1489-1496). IEEE.

Onan, A. (2021). Sentiment analysis on massive open online course evaluations: A text mining and deep learning approach. Computer Applications in Engineering Education, 29(3), 572-589.

Wen, M., Yang, D., & Rose, C. (2014, July). Sentiment analysis in MOOC discussion forums: What does it tell us? In Educational Data Mining 2014.

Yan, X., Li, G., Li, Q., Chen, J., Chen, W., & Xia, F. (2021, October). Sentiment analysis on massive open online course evaluation. In 2021 International Conference on Neuromorphic Computing (ICNC) (pp. 245-249). IEEE.