This project will go through the following steps to conduct topic modeling on the Friday Institute for Educational Innovation at North Carolina State University’s Massive Open Online Courses for Educators (MOOC-Ed) and Online Professional Learning programs.
tidytext package but are also introduced to the stm package. This package makes use of the tm text mining package to preprocess text and will also be our first introduction to word stemming.
topicmodels and stm packages, including the findThoughts function for viewing documents assigned to a given topic and the toLDAvis function for exploring topic and word distributions.
Data Source & Analysis
All peer interaction, including peer discussion, takes place within the discussion forums of MOOC-Eds, which are hosted using the Moodle Learning Management System. To build the dataset, the research team wrote a query for Moodle’s MySQL database, which records participants’ user logs of activity in the online forums. This SQL query combines separate database tables containing postings and comments, including participant IDs, timestamps, discussion text, and other attributes or “metadata.”
Summary of Key Findings
The following highlight some key findings related to the discussion forums in the papers cited above:
What are the similarities and differences between how PLT members and Non-PLT online participants engage and meet course goals in a MOOC-Ed designed for educators in secondary and collegiate settings?
What ideas or issues emerged in the discussion forums this past week?
How do we quantify what a document or collection of documents is about?
The following packages were loaded:
library(tidyverse)
library(tidytext)
library(SnowballC)
library(topicmodels)
library(stm)
library(ldatuning)
library(knitr)
library(LDAvis)
Topic modeling, an unsupervised learning approach for automatically identifying topics in a collection of documents, was conducted.
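As a rough illustration of what fitting such a model can look like, here is a minimal sketch using tidytext, SnowballC, and topicmodels. The toy posts, the column names post_id and post_content, the choice of k = 2, and the seed are all assumptions made for this example; they are not the actual forum data or the settings used in this analysis.

```r
library(dplyr)
library(tidytext)
library(SnowballC)
library(topicmodels)

# Hypothetical toy posts standing in for the real forum data
posts <- tibble(
  post_id = as.character(1:6),
  post_content = c(
    "Students learn statistics best when teachers use real data",
    "Teaching statistics to students with real data sets",
    "The videos and lesson resources were very useful",
    "Useful resources: videos, tools, and classroom activities",
    "My students enjoyed the statistics activities",
    "These lesson videos are great resources for teachers"
  )
)

# Tokenize into one word per row, drop stop words, and stem each word
tidy_posts <- posts %>%
  unnest_tokens(output = word, input = post_content) %>%
  anti_join(stop_words, by = "word") %>%
  mutate(stem = wordStem(word))

# Count stems per post and cast the counts to a document-term matrix
posts_dtm <- tidy_posts %>%
  count(post_id, stem) %>%
  cast_dtm(document = post_id, term = stem, value = n)

# Fit a two-topic LDA model (k = 2 is an arbitrary choice for the toy data)
toy_lda <- LDA(posts_dtm, k = 2, control = list(seed = 1234))

# Pull the five highest-probability terms for each topic
top_terms <- tidy(toy_lda, matrix = "beta") %>%
  group_by(topic) %>%
  slice_max(beta, n = 5, with_ties = FALSE) %>%
  ungroup()

print(top_terms)
```

With a corpus this small the topics themselves are not meaningful; the point is only the sequence of steps from raw text to per-topic term probabilities.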
forums_stm and forums_lda models contains the terms “teach”, “students”, and “statistics”. This could be an “overarching theme” but more likely is simply residue of the course title being sprinkled throughout the forums, and it deserves some follow-up. Topic 8 from the LDA model may overlap with this topic as well.
lda and stm models respectively, seem to potentially be about the usefulness of course “resources” like lessons, tools, videos, and activities. I’m wagering this might be a forum dedicated to course feedback. Topic 15 from the STM model also suggests this may be a broader theme.
To serve as a check on my tea leaf reading, I’m going to follow
Bail’s recommendation to examine some of these topics qualitatively. The
stm package has another useful, though exceptionally fussy, function called findThoughts, which extracts passages from documents within the corpus associated with topics that you specify.
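Much of the fussiness comes from findThoughts requiring the texts you pass in to line up exactly with the documents kept in the fitted model. The sketch below shows one way to keep them aligned, by carrying the original text through preprocessing as metadata. The toy posts, the column name post_content, K = 2, lower.thresh = 0, and the LDA initialization are assumptions chosen so the example runs on a tiny corpus; they are not the settings used for forums_stm.

```r
library(stm)

# Hypothetical toy posts standing in for the real forum data
posts <- data.frame(
  post_content = c(
    "Students learn statistics best when teachers use real data",
    "Teaching statistics to students with real data sets",
    "The videos and lesson resources were very useful",
    "Useful resources include videos, tools, and classroom activities",
    "My students enjoyed the statistics activities in class",
    "These lesson videos are great resources for teachers",
    "Real data helps students understand statistics concepts",
    "The tools and activities are useful resources for lessons"
  ),
  stringsAsFactors = FALSE
)

# Preprocess with stm's tm-based pipeline (lowercasing, stop words, stemming).
# Passing the data frame as metadata keeps the raw text attached to each document.
processed <- textProcessor(
  documents = posts$post_content,
  metadata = posts,
  verbose = FALSE
)

# lower.thresh = 0 keeps every term, which only makes sense for a toy corpus
out <- prepDocuments(
  processed$documents,
  processed$vocab,
  processed$meta,
  lower.thresh = 0,
  verbose = FALSE
)

# Fit a small structural topic model; K = 2 and init.type = "LDA" are
# illustrative assumptions for this tiny example
toy_stm <- stm(
  documents = out$documents,
  vocab = out$vocab,
  K = 2,
  init.type = "LDA",
  seed = 1234,
  max.em.its = 20,
  verbose = FALSE
)

# out$meta has had any dropped documents removed, so its text column
# matches the rows of the model's document-topic matrix
thoughts <- findThoughts(
  toy_stm,
  texts = out$meta$post_content,
  topics = 1,
  n = 2
)

print(thoughts$docs[[1]])
```

Using out$meta rather than the original data frame matters whenever preprocessing drops documents, since a length mismatch between texts and the model causes findThoughts to stop with an error.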