# Text mining packages
# install.packages(c("tm", "wordcloud")) if you get errors here
library(tm)
library(wordcloud)

Advanced Visualization Techniques
Learning Analytics — Week 4 (Required)
Learning objectives
By the end of this file, you will be able to:
- Create a word cloud from text data and interpret frequency patterns
- Build a heatmap to reveal patterns across two dimensions
- Visualize a social network of student interactions using SNA
- Reflect on when each visualization type is the most appropriate choice
When to use which advanced visualization
| Visualization | Best for | Typical data source |
|---|---|---|
| Word cloud | Frequency of words in text | Survey responses, discussion posts, feedback |
| Heatmap | Patterns across two categories | Scores by subject + week, engagement by module + cohort |
| SNA plot | Relationships between individuals | Forum reply data, collaboration records |
Before you run any code, read through the whole file once. Notice which dataset each part uses and make sure the file is in your data folder.
Part 1 · Word cloud
A word cloud visualizes how often words appear in a text — larger words appear more frequently. We will use the Handbook of Learning Analytics (2022), which has been converted to a plain text file.
Packages for this section
Load the text data
# read.delim() reads a plain text file
# header = FALSE and stringsAsFactors = FALSE read it as raw text
la_text <- read.delim("data/Handbook of LA.txt",
                      header = FALSE,
                      stringsAsFactors = FALSE)
# Create a text corpus — a collection of text documents the tm package can process
doc <- Corpus(VectorSource(la_text))
head(doc)
<<SimpleCorpus>>
Metadata: corpus specific: 1, document level (indexed): 0
Content: documents: 1
Clean the text
Before generating the word cloud, we remove common words, punctuation, numbers, and other noise. This is called text preprocessing.
# Helper function: replace specific characters with a space
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
doc <- doc |>
  (\(d) {
    d <- tm_map(d, toSpace, "/")
    d <- tm_map(d, toSpace, "@")
    d <- tm_map(d, toSpace, "\\|")
    d <- tm_map(d, content_transformer(tolower))
    d <- tm_map(d, removeWords,
                c(stopwords("english"), "https", "can", "doi", "also", "use"))
    d <- tm_map(d, removeNumbers)
    d <- tm_map(d, removePunctuation)
    d <- tm_map(d, stripWhitespace)
    d
  })()

You may see warning messages when running the preprocessing chunk. These are expected from the tm package — as long as you see no error messages, you can continue.
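To see what each cleaning step does, here is a minimal base-R sketch on a single toy sentence. It is not the tm pipeline itself: the sentence, the three-word `stops` list, and the order of the steps are assumptions made for illustration only.

```r
# Toy sentence (hypothetical, not taken from the Handbook)
raw <- "Learning Analytics can use data/traces from 2022 courses!"

# Hypothetical mini stopword list; tm's stopwords("english") is much longer
stops <- c("can", "use", "from")

clean <- tolower(raw)                            # lower-case everything
clean <- gsub("/", " ", clean, fixed = TRUE)     # same idea as toSpace()
clean <- gsub("[[:digit:]]+", "", clean)         # drop numbers
clean <- gsub("[[:punct:]]+", "", clean)         # drop punctuation
tokens <- strsplit(clean, "[[:space:]]+")[[1]]   # split on runs of whitespace
tokens <- tokens[nzchar(tokens) & !tokens %in% stops]  # drop stopwords

tokens
# "learning" "analytics" "data" "traces" "courses"
```

Only the content words survive, which is the goal of preprocessing before counting frequencies.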
Generate the word cloud
# Build a term-document matrix — a table of word frequencies
dtm <- TermDocumentMatrix(doc)
m <- as.matrix(dtm)
v <- sort(rowSums(m), decreasing = TRUE)
word_freq <- data.frame(word = names(v), freq = v)
# Top 10 most frequent words
head(word_freq, 10)

Word cloud from the Handbook of Learning Analytics (2022)
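The frequency table above boils down to counting how often each word occurs and sorting the counts. A small base-R sketch of that idea, using a made-up vector of already-cleaned words (`tokens` and `toy_freq` are illustrative names, not objects from this file):

```r
# Hypothetical cleaned tokens, standing in for the words in the corpus
tokens <- c("learning", "data", "learning", "students", "data", "learning")

# table() counts each distinct word; sort() puts the most frequent first
counts <- sort(table(tokens), decreasing = TRUE)

# Same two-column layout as word_freq: one row per word
toy_freq <- data.frame(word = names(counts), freq = as.vector(counts))
toy_freq
```

In the word cloud, the `freq` column is what controls the size of each word.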
# Generate the word cloud
set.seed(1234)
wordcloud(
  words = word_freq$word,
  freq = word_freq$freq,
  min.freq = 5,
  max.words = 100,
  random.order = FALSE,
  rot.per = 0.35,
  colors = brewer.pal(8, "Dark2")
)

Task 1 — Parameter exploration: Try changing max.words to 50 or 150 and min.freq to 10. How does the word cloud change? Which version is more readable?
- [I changed min.freq to 10 and max.words to 50. Some words are now cut off; for example, “analytics” appears as “analy”. The initial version is more readable, as no words are cut off.]
Task 2 — Add your own stopwords: Look at the word cloud above. Find 2–3 words that appear large but feel too generic to be meaningful — words that are frequent only because they appear in every chapter, not because they signal a specific theme.
Add those words to the removeWords() line in the text-preprocessing chunk above, then re-run that chunk and the wordcloud-generate chunk. The line to edit looks like this:
d <- tm_map(d, removeWords,
            c(stopwords("english"), "https", "can", "doi", "also", "use"))

Add your words inside the c(...) vector, then re-run both chunks.
Reflection: Which words did you remove? Why did you choose them? Does the revised cloud reveal any themes the original missed?
- [I removed the words “URL,” “text,” and “com” because they appeared frequently in the dataset but were not meaningful in terms of the actual content of the learning analytics handbook. These words likely came from formatting, citations, or web references rather than representing key ideas in the field. After removing these additional stopwords, the word cloud became more focused on academic and conceptual terms instead of technical or structural noise. This helped make the main themes of learning analytics easier to identify and interpret, such as learning, analytics, and general educational research terms.]
Question: Looking at the most frequent words, what does this tell you about the themes the learning analytics field focuses on? Does anything surprise you?
- [The most frequent words show that learning analytics focuses on using data to understand and improve student learning. Common themes include education, research, and analytics, which suggests the field is very data-driven and focused on measuring learning outcomes. What stands out is how often technical terms like “research” and “analytics” appear compared to more human-centered ideas like teaching or relationships, showing the field leans more toward measurement than instruction.]
Part 2 · Heatmap
A heatmap uses color intensity to show values across two categorical dimensions — useful for spotting patterns that would be invisible in a table or bar chart.
Package for this section
# reshape2 gives us melt() for converting wide to long format
# install.packages("reshape2") if needed
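library(reshape2)
# ggplot() and glimpse() below also need ggplot2 and dplyr;
# load them here if no earlier chunk has done so
library(ggplot2)
library(dplyr)

Load the data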
data_hm <- read.csv("data/student_assignment_scores.csv")
head(data_hm)
glimpse(data_hm)
Rows: 30
Columns: 11
$ Student_ID <chr> "Student_1", "Student_2", "Student_3", "Student_4", "Stu…
$ Assignment_1 <int> 98, 86, 50, 74, 59, 85, 67, 98, 96, 73, 56, 85, 74, 86, …
$ Assignment_2 <int> 77, 89, 52, 82, 91, 73, 79, 92, 64, 71, 57, 85, 53, 71, …
$ Assignment_3 <int> 52, 70, 51, 87, 59, 89, 54, 82, 98, 88, 85, 94, 91, 58, …
$ Assignment_4 <int> 98, 67, 54, 95, 89, 89, 70, 85, 85, 88, 91, 67, 76, 62, …
$ Assignment_5 <int> 54, 58, 65, 75, 92, 91, 91, 82, 78, 74, 100, 81, 66, 95,…
$ Assignment_6 <int> 87, 80, 55, 69, 64, 83, 91, 72, 61, 75, 90, 50, 51, 73, …
$ Assignment_7 <int> 54, 84, 63, 63, 77, 82, 86, 84, 75, 80, 94, 92, 81, 66, …
$ Assignment_8 <int> 80, 57, 65, 52, 86, 62, 61, 50, 79, 100, 80, 86, 96, 55,…
$ Assignment_9 <int> 82, 87, 92, 90, 71, 75, 69, 94, 86, 92, 64, 70, 83, 74, …
$ Assignment_10 <int> 79, 54, 71, 69, 99, 90, 60, 77, 87, 91, 58, 72, 75, 100,…
Reshape and plot
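The glimpse() output above shows one column per assignment (wide format), while a ggplot heatmap needs one row per student-assignment pair (long format). A minimal base-R sketch of that reshape, using only the first two students and first two assignments from the output above; `wide`, `long`, and `score_cols` are illustrative names, and the chunk in this file uses melt() on the full dataset instead.

```r
# Small wide table: values copied from the glimpse() output above
wide <- data.frame(
  Student_ID   = c("Student_1", "Student_2"),
  Assignment_1 = c(98, 86),
  Assignment_2 = c(77, 89)
)

# Every column except the ID holds scores
score_cols <- setdiff(names(wide), "Student_ID")

# Stack the score columns on top of each other, repeating the IDs
long <- data.frame(
  Student_ID = rep(wide$Student_ID, times = length(score_cols)),
  Assignment = rep(score_cols, each = nrow(wide)),
  Score      = unlist(wide[score_cols], use.names = FALSE)
)
long
```

Two students and two assignments give four rows, one per tile in the heatmap.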
# melt() converts wide format (one column per assignment) to long format
# (one row per student-assignment combination) — required for ggplot heatmaps
data_hm_long <- melt(data_hm,
id.vars = "Student_ID",
variable.name = "Assignment",
value.name = "Score")
ggplot(data_hm_long,
       aes(x = Assignment, y = Student_ID, fill = Score)) +
  geom_tile(color = "white", linewidth = 0.3) +
  scale_fill_gradient(low = "#F0FAF6", high = "#0F6E56") +
  labs(
    title = "Heatmap of Student Progress Across Assignments",
    x = "Assignment",
    y = "Student ID",
    fill = "Score"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.text.y = element_text(size = 8)
  )

Task: Copy the ggplot() call from the heatmap-plot chunk above into the blank chunk below. Make two changes:
- Change the low and high colors in scale_fill_gradient() — for example: low = "white", high = "#993C1D" (red gradient).
- Update the x and y labels in labs() to something more descriptive than "Assignment" and "Student ID".
Copy the entire ggplot(data_hm_long, ...) block from above. You only need to change two things: scale_fill_gradient() and labs(). Everything else stays the same.
# YOUR CODE HERE — copy the ggplot() call above and apply your two changes
data_hm_long <- melt(data_hm,
                     id.vars = "Student_ID",
                     variable.name = "Assignment",
                     value.name = "Score")
ggplot(data_hm_long,
       aes(x = Assignment, y = Student_ID, fill = Score)) +
  geom_tile(color = "white", linewidth = 0.3) +
  scale_fill_gradient(low = "red", high = "green") +
  labs(
    title = "Heatmap of Student Progress Across Assignments",
    x = "Assignment from Week #",
    y = "Student Number in CUED7540",
    fill = "Score"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.text.y = element_text(size = 8)
  )

Reflection: How does the color change affect how you interpret the data? Does the red gradient feel different from the green one — and why might that matter when presenting to a principal or department chair?
- [The color change affects how I interpret the data because the red-to-green gradient feels more negative or urgent, while the white-to-green gradient feels more positive and is easier to read. Even with the same information, color changes how the data is interpreted emotionally. This matters when presenting to a principal or department chair because color can influence first impressions. The same data can look more concerning when it contains red, or more encouraging when it is shown only in green, so color choice can shape how the message is received.]
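One way to compare gradients before plotting is to list the colors each one passes through. The sketch below uses colorRampPalette() from base R's grDevices; the variable names are illustrative. It is only an approximation of what the heatmap shows, since scale_fill_gradient() may interpolate in a different color space, so the in-between shades can differ slightly.

```r
# Five evenly spaced colors along each gradient (lowest score first)
original_pal  <- colorRampPalette(c("#F0FAF6", "#0F6E56"))(5)
red_green_pal <- colorRampPalette(c("red", "green"))(5)

print(original_pal)    # pale green to dark green
print(red_green_pal)   # red, through olive tones, to green
```

The single-hue palette only gets darker as scores rise, while the red-to-green palette changes hue, which is part of why it reads as "bad versus good".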
Question: Looking at the heatmap, identify one student and one assignment that stand out. What would you do as an instructor based on what you see?
- [Student 8 stands out because their performance is generally consistent across most assignments, except for Assignment 8, where there is a noticeable drop compared to their usual scores. This suggests the issue is likely specific to that skill or concept rather than a general learning difficulty. As an instructor, I would look closely at Assignment 8 to identify what made it challenging, such as a new concept, unclear instructions, or a gap in prior knowledge. I would then reteach or review that specific skill and give Student 8 (and possibly the whole class, because the colors are closer to red for Assignment 8) an opportunity for practice and reassessment.]
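A visual impression like this can be checked against the numbers. The sketch below uses Student_8's ten scores as they appear in the glimpse() output earlier in this file (the eighth value in each column); the object names `student_8`, `lowest`, and `gap` are illustrative.

```r
# Student_8's scores, read off the glimpse() output (8th value per column)
student_8 <- c(
  Assignment_1 = 98, Assignment_2 = 92, Assignment_3 = 82,
  Assignment_4 = 85, Assignment_5 = 82, Assignment_6 = 72,
  Assignment_7 = 84, Assignment_8 = 50, Assignment_9 = 94,
  Assignment_10 = 77
)

# Which assignment has this student's lowest score?
lowest <- names(which.min(student_8))
lowest
# "Assignment_8"

# How far below their average on the other nine assignments is it?
gap <- mean(student_8[names(student_8) != lowest]) - student_8[[lowest]]
round(gap, 1)
```

A large gap on a single assignment supports reading the dip as assignment-specific rather than a general difficulty.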
Part 4 · Visualization choice reflection
You have now used seven visualization types: scatter plot, bar plot, line plot, histogram, word cloud, heatmap, and SNA plot (this week).
Question 1: For each of the three advanced techniques, describe one specific scenario from your track (K–12 or ID/higher ed) where it would be the most useful choice:
- Word cloud: [A word cloud would be useful in an ESL classroom to quickly show which vocabulary words students are using most often in writing samples or discussions. It helps identify language patterns and overused or underdeveloped academic vocabulary. It could also pinpoint grammatical errors like ‘runned’ instead of ‘ran’.]
- Heatmap: [A heatmap would be especially useful for analyzing ELPA scores across domains for my ESL students in grades K–4. It would help me quickly identify which students need support in specific language areas (like writing or speaking) and guide small-group instruction and intervention planning.]
- SNA plot: [An SNA plot would be useful to analyze classroom collaboration patterns, showing which students frequently interact during group work and which students may be more isolated or less engaged socially. This would help build Speaking scores in preparation for ELPA21 testing.]
Question 2: What challenges might you face implementing these visualizations in a real school or institutional setting? Think about data availability, privacy, and stakeholder interpretation.
- [In a real school setting, one challenge is inconsistent or incomplete data collection, especially across multiple teachers or assessment systems. Privacy is also a major concern, particularly when visualizing individual student performance in small ESL groups where students may be easily identifiable. In addition, some stakeholders may not have strong data literacy skills, which can lead to misinterpretation of visualizations or oversimplified conclusions about student language ability.]
Question 3: How could these techniques evolve in your field? What would be possible if these tools were integrated into an LMS dashboard that teachers or designers could access in real time?
- [If these visualization tools were embedded into an LMS dashboard, ESL teachers could use real-time data to make more responsive instructional decisions. For example, teachers could quickly identify gaps in language domains and adjust small-group instruction, scaffolding, or vocabulary support immediately. Instructional designers could also use this data to refine curriculum supports for multilingual learners based on ongoing performance trends rather than end-of-unit assessments. This would allow for more personalized, equitable, and timely support for English learners.]
Render & submit
Step 1 — Add your name
Change the author: field in the YAML header at the top to your name.
Step 2 — Render
Click Render in the toolbar. This file uses several packages (tm, wordcloud, igraph) that produce warnings during preprocessing — that is expected. As long as the final HTML page appears, the render was successful. If you see a true error (red text that stops the render), check that all packages are installed:
install.packages(c("tm", "wordcloud", "RColorBrewer", "reshape2", "igraph"))

Step 3 — Publish
| Option | Best for | Link |
|---|---|---|
| Posit Cloud | Quickest — one click from your workspace | Guide |
| RPubs | Free, public, easy to share a link | rpubs.com |
| Quarto Pub | Clean public portfolio pages | Guide |
| GitHub Pages | Best for a professional portfolio | Guide |
This is the most visually impressive of the four files — word clouds, heatmaps, and network graphs are immediately recognizable as advanced data work to anyone reviewing your portfolio. If you are sharing one document from this course with a hiring committee or graduate school application, this is the one to lead with. Pair it with your capstone analysis for the full picture.
Share your published link with your instructor once you have rendered and published. Post in the course discussion board if you run into any technical issues.