CUED 7540: Learning Analytics IV
In this final module, we’ll explore three advanced visualization techniques: word clouds, Social Network Analysis (SNA), and heatmaps. These tools are highly customizable, and today we’ll focus on the foundational steps to get you started.
By the end of this module, you’ll be able to create these visualizations and uncover patterns in your data. Remember, all the code is provided, but I encourage you to modify it and try it with your own data for a deeper understanding.
We’ll begin by creating a word cloud from student text data, such as survey responses or discussion board posts. A word cloud visually represents the frequency of words in a text, with more frequent words appearing larger.
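To make the idea of word frequency concrete, here is a minimal base R sketch that counts words in a few made-up responses. The responses vector is hypothetical and only stands in for real student text; it is not part of the module's data.

```r
# Hypothetical student responses (made up for illustration)
responses <- c(
  "The feedback helped my learning",
  "Learning with data was fun",
  "More feedback on learning please"
)

# Lowercase the text and split it into words on whitespace
words <- unlist(strsplit(tolower(responses), "\\s+"))

# Count each word and sort from most to least frequent
word_freq <- sort(table(words), decreasing = TRUE)
print(head(word_freq, 3))
```

In a word cloud built from these counts, "learning" would be drawn largest because it occurs most often.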
library(tm) # For text mining
Loading required package: NLP
library(wordcloud) # For word cloud visualization
Loading required package: RColorBrewer
library(RColorBrewer) # For color palettes
The data for a word cloud is typically a plain text file. For this example, we’ll use the Handbook of Learning Analytics (2022), which was converted to a .txt file for our analysis. You can find it in the data folder.
# Load the data
la <- read.delim("data/Handbook of LA.txt", header = FALSE, stringsAsFactors = FALSE)

# Create a text corpus, which is a collection of text documents
doc <- Corpus(VectorSource(la))
head(doc)
<<SimpleCorpus>>
Metadata: corpus specific: 1, document level (indexed): 0
Content: documents: 1
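Note: The read.delim() function reads the text file into a data frame holding the lines of the file. The header = FALSE argument tells R that the first line is text rather than column names, and stringsAsFactors = FALSE keeps the text as character strings instead of converting it to factors. Because the resulting data frame has a single column, the corpus built from it contains one document, as the output above shows.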
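To see what read.delim() returns for a plain text file, the sketch below writes two made-up lines to a temporary file and reads them back. The file contents are hypothetical, and readLines() is shown only as a base R alternative for comparison; the module itself uses read.delim().

```r
# Write a small, made-up text file to a temporary location
tmp <- tempfile(fileext = ".txt")
writeLines(c("Learning analytics is growing.", "Data informs teaching."), tmp)

# read.delim() returns a data frame: one row per line, one column here
as_table <- read.delim(tmp, header = FALSE, stringsAsFactors = FALSE)
print(dim(as_table))  # number of rows, number of columns

# readLines() returns a character vector with one element per line
as_lines <- readLines(tmp)
print(length(as_lines))
```

Either object can be passed on to the corpus step; the difference is whether each line or each column ends up as a separate document.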
Before creating the word cloud, we need to “clean” the text by removing common words, punctuation, and numbers. This ensures our word cloud is meaningful.
# Define a function to replace specific patterns with a space
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))

# Apply text transformations
doc <- tm_map(doc, toSpace, "/")
Warning in tm_map.SimpleCorpus(doc, toSpace, "/"): transformation drops
documents
doc <- tm_map(doc, toSpace, "@")
Warning in tm_map.SimpleCorpus(doc, toSpace, "@"): transformation drops
documents
doc <- tm_map(doc, toSpace, "\\|")
Warning in tm_map.SimpleCorpus(doc, toSpace, "\\|"): transformation drops
documents
doc <- tm_map(doc, content_transformer(tolower)) # Convert to lowercase
Warning in tm_map.SimpleCorpus(doc, content_transformer(tolower)):
transformation drops documents
doc <- tm_map(doc, removeWords, c(stopwords("english"), "https", "can", "doi")) # Remove common stopwords
Warning in tm_map.SimpleCorpus(doc, removeWords, c(stopwords("english"), :
transformation drops documents
doc <- tm_map(doc, removeNumbers) # Remove numbers
Warning in tm_map.SimpleCorpus(doc, removeNumbers): transformation drops
documents
doc <- tm_map(doc, removePunctuation) # Remove punctuation
Warning in tm_map.SimpleCorpus(doc, removePunctuation): transformation drops
documents
doc <- tm_map(doc, stripWhitespace) # Remove extra whitespace
Warning in tm_map.SimpleCorpus(doc, stripWhitespace): transformation drops
documents
# You can add more words to remove by updating the stopwords list.
Note: You might get a lengthy list of warning messages when you run the code above. As long as it’s not an error message, we can continue.
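As a sketch of how the stopword list can be extended, the example below cleans one made-up sentence. Two things here are assumptions rather than part of the module: the extra words "url" and "see" are hypothetical choices, and VCorpus() is swapped in for Corpus() (it is tm's standard in-memory corpus, and the same tm_map() calls apply to it).

```r
library(tm)

# Toy corpus with one made-up sentence (stands in for the handbook text)
toy <- VCorpus(VectorSource(
  "Learning analytics can use data, see the url and doi for details"
))

# Hypothetical extended stopword list: the module's words plus two extras
my_stopwords <- c(stopwords("english"), "https", "can", "doi", "url", "see")

toy <- tm_map(toy, content_transformer(tolower))  # lowercase first
toy <- tm_map(toy, removeWords, my_stopwords)     # drop the listed words
toy <- tm_map(toy, stripWhitespace)               # collapse leftover spaces

# Extract the cleaned text of the single document
cleaned <- content(toy[[1]])
print(cleaned)
```

Lowercasing before removeWords matters because the stopword list is lowercase, so "Can" would otherwise be kept.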
Now, we’ll create the word cloud by first counting the frequency of each word.
# Create a term-document matrix to count word frequencies
dtm <- TermDocumentMatrix(doc)

# Convert the matrix to a data frame of word frequencies
m <- as.matrix(dtm)
v <- sort(rowSums(m), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v)
# Display the top 10 words
head(d, 10)
word freq
learning learning 3037
analytics analytics 1443
data data 1112
– – 1084
url url 638
education education 466
educational educational 441
research research 417
knowledge knowledge 360
analysis analysis 358
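The top-10 list includes an en dash (–) and "url", which survived the cleaning step and are probably residue from converting the book to text rather than meaningful terms. One simple option, sketched here in base R, is to filter such tokens out of the frequency table before plotting. The d_top data frame rebuilds the first rows of the output above so the example runs on its own, and drop_words is a hypothetical list you would adapt.

```r
# Rebuild the first rows of the frequency table shown above
d_top <- data.frame(
  word = c("learning", "analytics", "data", "\u2013", "url", "education"),
  freq = c(3037, 1443, 1112, 1084, 638, 466),
  stringsAsFactors = FALSE
)

# Hypothetical list of tokens to exclude: the en dash and "url"
drop_words <- c("\u2013", "url")

# Keep only the rows whose word is not in the exclusion list
d_clean <- d_top[!d_top$word %in% drop_words, ]
print(d_clean)
```

In the module's workflow, the same filter would be applied to d before calling wordcloud().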
# Create the word cloud
set.seed(1234)
wordcloud(words = d$word, freq = d$freq, min.freq = 2,
          max.words = 150, random.order = FALSE, rot.per = 0.35,
          colors = brewer.pal(8, "Dark2"))
Task: Try modifying the max.words or min.freq parameters to see how the word cloud changes.
Question: How might customizing the word cloud (e.g., colors, word frequency) enhance the insights you gain from text data?
A heatmap visualizes data in a grid where values are represented by colors. We’ll use a dataset of student assignment scores to create a heatmap that shows student progress.
We’ll use ggplot2 for plotting and reshape2 to transform the data into the correct format for the heatmap.
#We need various libraries this time.
library(ggplot2)
Attaching package: 'ggplot2'
The following object is masked from 'package:NLP':
annotate
library(reshape2)
# Load the dataset
data_hm <- read.csv("data/student_assignment_scores.csv")

# Inspect your data
head(data_hm)
Student_ID Assignment_1 Assignment_2 Assignment_3 Assignment_4 Assignment_5
1 Student_1 98 77 52 98 54
2 Student_2 86 89 70 67 58
3 Student_3 50 52 51 54 65
4 Student_4 74 82 87 95 75
5 Student_5 59 91 59 89 92
6 Student_6 85 73 89 89 91
Assignment_6 Assignment_7 Assignment_8 Assignment_9 Assignment_10
1 87 54 80 82 79
2 80 84 57 87 54
3 55 63 65 92 71
4 69 63 52 90 69
5 64 77 86 71 99
6 83 82 62 75 90
# Reshape the data from "wide" to "long" format for ggplot
data_long <- melt(data_hm, id.vars = "Student_ID", variable.name = "Assignment", value.name = "Score")
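To see what the wide-to-long reshape does, here is a minimal sketch on a two-student, two-assignment subset. The values are copied from the first two rows of the output above; the wide and long object names are hypothetical.

```r
library(reshape2)

# Small "wide" table: one row per student, one column per assignment
wide <- data.frame(
  Student_ID   = c("Student_1", "Student_2"),
  Assignment_1 = c(98, 86),
  Assignment_2 = c(77, 89)
)

# "Long" format: one row per student-assignment pair
long <- melt(wide, id.vars = "Student_ID",
             variable.name = "Assignment", value.name = "Score")
print(long)
```

Each of the four scores now sits in its own row, which is the shape ggplot2 needs to map Assignment to the x-axis, Student_ID to the y-axis, and Score to the fill color.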
# Generate the heatmap to visualize student progress across assignments
ggplot(data_long, aes(x = Assignment, y = Student_ID, fill = Score)) +
  geom_tile() +
  scale_fill_gradient(low = "green", high = "blue") +
  labs(title = "Heatmap of Student Progress Across Assignments", x = "Assignment", y = "Student ID") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
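One detail to watch for: ggplot2 orders a character axis alphabetically, so if the dataset has ten or more students, Student_10 would appear before Student_2 on the y-axis. The base R sketch below shows one way to put the IDs in numeric order by setting factor levels. The ids vector is hypothetical and assumes every ID follows the Student_<number> pattern seen in the output above.

```r
# Hypothetical IDs following the Student_<number> pattern
ids <- c("Student_1", "Student_2", "Student_10", "Student_3")

# Default alphabetical ordering, for comparison
print(sort(ids))

# Extract the numeric part and use it to define the factor levels
id_num <- as.numeric(sub("Student_", "", ids))
ids_ordered <- factor(ids, levels = ids[order(id_num)])
print(levels(ids_ordered))
```

Applied to the module's data, the same idea would convert data_long$Student_ID to a factor with numerically ordered levels before calling ggplot().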
Task: Try modifying the color gradient by changing the low and high colors (e.g., low = "white", high = "red") or updating the axis labels. How does this change the interpretation of the data?
- [Modifying the color gradient and axis labels of a data visualization can noticeably alter how the data is viewed and interpreted. Even without changing the data itself, different choices in color and labeling can change the visual picture, highlight different patterns, or even misrepresent the information.]
Task: How can these advanced visualization techniques enhance your understanding and decision-making processes as an educator, instructional designer, or policymaker? Consider specific scenarios where these tools could provide deeper insights or drive more informed decisions.
Task: What challenges might you face in implementing these visualizations in real-world educational settings? How can you overcome these challenges to effectively utilize these tools?
Task: Reflect on how these techniques could evolve in your field. What future possibilities do you see for the use of advanced visualizations in learning analytics?
Congratulations, you’ve completed the final module!
To receive a full score, you will need to render this document and publish it via a method such as Quarto Pub, Posit Cloud, RPubs, GitHub Pages, or another method. Once you have shared a link to your published document with me and I have reviewed your work, you will be officially done with the current module.
Complete the following steps to submit your work for review:
First, change the name of the author: in the YAML header at the very top of this document to your name. The YAML header controls the look and feel of the rendered document but doesn’t actually display in the final output.
Next, click the “Render” button in the toolbar above to render your R Markdown document to an HTML file that will be saved in your R Project folder. You should see a formatted webpage appear in the Viewer tab in the lower-right pane or in a new browser window. Let me know if you run into any issues with rendering.
Finally, publish your document. To publish, follow the steps from the link.
If you have any questions about this module, or run into any technical issues, don’t hesitate to contact me.
Once I have checked your link, you will be notified!