Advanced Visualization Techniques

Learning Analytics — Week 4 (Required)

Author

Coy Johnson

Published

June 19, 2026


Learning objectives

By the end of this file, you will be able to:

  • Create a word cloud from text data and interpret frequency patterns
  • Build a heatmap to reveal patterns across two dimensions
  • Visualize a social network of student interactions using SNA
  • Reflect on when each visualization type is the most appropriate choice

When to use which advanced visualization

Visualization | Best for                          | Typical data source
Word cloud    | Frequency of words in text        | Survey responses, discussion posts, feedback
Heatmap       | Patterns across two categories    | Scores by subject + week, engagement by module + cohort
SNA plot      | Relationships between individuals | Forum reply data, collaboration records

Before you run any code, read through the whole file once. Notice which dataset each part uses and make sure the file is in your data folder.


Part 1 · Word cloud

A word cloud visualizes how often words appear in a text — larger words appear more frequently. We will use the Handbook of Learning Analytics (2022), which has been converted to a plain text file.

Packages for this section

# Text mining packages
# install.packages(c("tm", "wordcloud")) if you get errors here
library(tm)
library(wordcloud)

Load the text data

# read.delim() reads a plain text file
# header = FALSE and stringsAsFactors = FALSE read it as raw text
la_text <- read.delim("data/Handbook of LA.txt",
                      header = FALSE,
                      stringsAsFactors = FALSE)

# Create a text corpus — a collection of text documents the tm package can process
doc <- Corpus(VectorSource(la_text))

head(doc)
<<SimpleCorpus>>
Metadata:  corpus specific: 1, document level (indexed): 0
Content:  documents: 1

Clean the text

Before generating the word cloud, we remove common words, punctuation, numbers, and other noise. This is called text preprocessing.

# Helper function: replace specific characters with a space
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))

doc <- doc |>
  (\(d) {
    d <- tm_map(d, toSpace, "/")
    d <- tm_map(d, toSpace, "@")
    d <- tm_map(d, toSpace, "\\|")
    d <- tm_map(d, content_transformer(tolower))
    d <- tm_map(d, removeWords,
                c(stopwords("english"), "https", "can", "doi", "also", "use"))
    d <- tm_map(d, removeNumbers)
    d <- tm_map(d, removePunctuation)
    d <- tm_map(d, stripWhitespace)
    d
  })()

Note

You may see warning messages when running the preprocessing chunk. These are expected from the tm package — as long as you see no error messages, you can continue.
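
To see what each preprocessing step does, here is a minimal base-R sketch on one made-up sentence. It only approximates the tm transformations with gsub() and tolower(); the sentence and the tiny stopword list are assumptions for illustration and do not come from the Handbook text.

```r
# Illustrative sentence (made up for this sketch)
x <- "Learning Analytics can use data/evidence from 2022 @ scale!"

x <- gsub("[/@|]", " ", x)            # like toSpace: swap /, @ and | for spaces
x <- tolower(x)                       # like content_transformer(tolower)

# Tiny hypothetical stopword list; stopwords("english") in tm is much longer
stop_small <- c("can", "use", "from")
stop_pattern <- paste0("\\b(", paste(stop_small, collapse = "|"), ")\\b")
x <- gsub(stop_pattern, "", x, perl = TRUE)   # like removeWords

x <- gsub("[0-9]+", "", x)            # like removeNumbers
x <- gsub("[[:punct:]]+", "", x)      # like removePunctuation
x <- trimws(gsub("\\s+", " ", x))     # like stripWhitespace, plus trimming the ends

cleaned <- x
print(cleaned)
```

The tm functions handle more edge cases than these one-line patterns, so treat this only as a mental model of the pipeline order: separators, case, stopwords, numbers, punctuation, whitespace.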

Generate the word cloud

# Build a term-document matrix (a table of word frequencies)
dtm <- TermDocumentMatrix(doc)
m   <- as.matrix(dtm)
v   <- sort(rowSums(m), decreasing = TRUE)
word_freq <- data.frame(word = names(v), freq = v)

# Top 10 most frequent words
head(word_freq, 10)

# Extra stopwords chosen for Task 2. Removing them here does not change
# word_freq above; they take effect when the matrix is rebuilt further down.
doc <- tm_map(doc, removeWords,
              c(stopwords("english"),
                "may", "one", "writing", "learn", "work", "isbn", "http", "url"))
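
As a rough picture of what the term-document matrix is counting, the sketch below builds the same kind of word/frequency table by hand in base R. The two short "documents" are made up, and the sketch ignores tm's own defaults (such as its minimum word length), so it is an approximation rather than a replacement for TermDocumentMatrix().

```r
# Two made-up, already-cleaned "documents"
docs <- c("data students learning data",
          "students feedback data")

# Split on spaces and count every word across all documents
words  <- unlist(strsplit(docs, " ", fixed = TRUE))
counts <- table(words)

# Same shape as word_freq: one row per word, most frequent first
v_toy <- sort(counts, decreasing = TRUE)
toy_freq <- data.frame(word = names(v_toy), freq = as.integer(v_toy))
print(toy_freq)
```

Summing across documents here plays the role of rowSums(m): each word gets one total, no matter which document it came from.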

# Generate the word cloud
set.seed(1234)
wordcloud(
  words       = word_freq$word,
  freq        = word_freq$freq,
  min.freq    = 5,
  max.words   = 100,
  random.order = FALSE,
  rot.per     = 0.35,
  colors      = brewer.pal(8, "Dark2")
)

Word cloud from the Handbook of Learning Analytics (2022)

# Rebuild the frequency table so the stopwords added for Task 2 are excluded
dtm <- TermDocumentMatrix(doc)
m   <- as.matrix(dtm)
v   <- sort(rowSums(m), decreasing = TRUE)
word_freq <- data.frame(word = names(v), freq = v)

# Top 10 most frequent words
head(word_freq, 10)

# Generate the word cloud
set.seed(1234)
wordcloud(
  words       = word_freq$word,
  freq        = word_freq$freq,
  min.freq    = 10,
  max.words   = 50,
  random.order = FALSE,
  rot.per     = 0.35,
  colors      = brewer.pal(8, "Dark2")
)

Word cloud from the Handbook of Learning Analytics (2022)

Task 1 — Parameter exploration: Try changing max.words to 50 or 150 and min.freq to 10. How does the word cloud change? Which version is more readable?

  • [Changing max.words from 100 to 50 makes the cloud smaller, and changing it from 100 to 150 makes the cloud bigger. I think the version with min.freq = 10 and max.words = 50 was the most readable because it contained the fewest words, which makes it easier to see the words you are looking for.]
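
To reason about Task 1 without re-rendering the plot each time, the following base-R sketch approximates how min.freq and max.words narrow down the words. The frequency table and the helper name words_shown() are hypothetical; wordcloud() may also drop words that do not fit on the page, which this sketch does not model.

```r
# Made-up frequency table, already sorted from most to least frequent
toy_freq <- data.frame(
  word = c("learning", "data", "students", "analytics", "feedback", "model"),
  freq = c(40, 32, 18, 12, 7, 3)
)

# Hypothetical helper: which words are eligible for plotting?
words_shown <- function(df, min_freq, max_words) {
  kept <- df[df$freq >= min_freq, ]     # drop words rarer than min_freq
  kept <- kept[order(-kept$freq), ]     # most frequent first
  head(kept$word, max_words)            # cap the total at max_words
}

loose  <- words_shown(toy_freq, min_freq = 5,  max_words = 100)
strict <- words_shown(toy_freq, min_freq = 10, max_words = 3)
print(loose)
print(strict)
```

Raising min.freq trims the rare words from the bottom of the table, while lowering max.words caps the total, so the two settings can be tuned separately.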

Task 2 — Add your own stopwords: Look at the word cloud above. Find 2–3 words that appear large but feel too generic to be meaningful — words that are frequent only because they appear in every chapter, not because they signal a specific theme.

Add those words to the removeWords() line in the text-preprocessing chunk above, then re-run that chunk and the wordcloud-generate chunk. The line to edit looks like this:

d <- tm_map(d, removeWords,
            c(stopwords("english"), "https", "can", "doi", "also", "use"))

Add your words inside the c(...) vector, then re-run both chunks.

Reflection: Which words did you remove? Why did you choose them? Does the revised cloud reveal any themes the original missed?

  • [may, one, writing, learn, work, isbn, url, and http. I chose them because I did not feel like they really had any value to the research. I felt like they were just common words that got used repeatedly. Yes, the revised cloud helped me see everything a lot better. It also helped me spot more words that needed to be excluded but that I had not noticed originally.]

Question: Looking at the most frequent words, what does this tell you about the themes the learning analytics field focuses on? Does anything surprise you?

  • [Looking at the cloud, I can definitely see that some of the common themes are using data and research. I can also see that the field relies on students, feedback, modeling, and collaboration.]

Part 2 · Heatmap

A heatmap uses color intensity to show values across two categorical dimensions — useful for spotting patterns that would be invisible in a table or bar chart.

Package for this section

# reshape2 gives us melt() for converting wide to long format
# install.packages("reshape2") if needed
library(reshape2)

Load the data

data_hm <- read.csv("data/student_assignment_scores.csv")

head(data_hm)
glimpse(data_hm)
Rows: 30
Columns: 11
$ Student_ID    <chr> "Student_1", "Student_2", "Student_3", "Student_4", "Stu…
$ Assignment_1  <int> 98, 86, 50, 74, 59, 85, 67, 98, 96, 73, 56, 85, 74, 86, …
$ Assignment_2  <int> 77, 89, 52, 82, 91, 73, 79, 92, 64, 71, 57, 85, 53, 71, …
$ Assignment_3  <int> 52, 70, 51, 87, 59, 89, 54, 82, 98, 88, 85, 94, 91, 58, …
$ Assignment_4  <int> 98, 67, 54, 95, 89, 89, 70, 85, 85, 88, 91, 67, 76, 62, …
$ Assignment_5  <int> 54, 58, 65, 75, 92, 91, 91, 82, 78, 74, 100, 81, 66, 95,…
$ Assignment_6  <int> 87, 80, 55, 69, 64, 83, 91, 72, 61, 75, 90, 50, 51, 73, …
$ Assignment_7  <int> 54, 84, 63, 63, 77, 82, 86, 84, 75, 80, 94, 92, 81, 66, …
$ Assignment_8  <int> 80, 57, 65, 52, 86, 62, 61, 50, 79, 100, 80, 86, 96, 55,…
$ Assignment_9  <int> 82, 87, 92, 90, 71, 75, 69, 94, 86, 92, 64, 70, 83, 74, …
$ Assignment_10 <int> 79, 54, 71, 69, 99, 90, 60, 77, 87, 91, 58, 72, 75, 100,…

Reshape and plot

# melt() converts wide format (one column per assignment) to long format
# (one row per student-assignment combination) — required for ggplot heatmaps
data_hm_long <- melt(data_hm,
                     id.vars      = "Student_ID",
                     variable.name = "Assignment",
                     value.name   = "Score")

ggplot(data_hm_long,
       aes(x = Assignment, y = Student_ID, fill = Score)) +
  geom_tile(color = "white", linewidth = 0.3) +
  scale_fill_gradient(low = "#F0FAF6", high = "#0F6E56") +
  labs(
    title = "Heatmap of Student Progress Across Assignments",
    x     = "Assignment",
    y     = "Student ID",
    fill  = "Score"
  ) +
  theme_minimal() +
  theme(
    plot.title  = element_text(size = 14, face = "bold", hjust = 0.5),
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.text.y = element_text(size = 8)
  )

Student scores across assignments — darker = higher score
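
For readers who want to see exactly what the wide-to-long step produces, here is a small base-R sketch of the same reshaping done by hand. It uses the first two students and assignments from the glimpse() output above; unlike melt(), it leaves Assignment as plain text rather than a factor, so it is an illustration rather than a drop-in replacement.

```r
# Wide table: one row per student, one column per assignment
wide <- data.frame(
  Student_ID   = c("Student_1", "Student_2"),
  Assignment_1 = c(98, 86),
  Assignment_2 = c(77, 89)
)

score_cols <- setdiff(names(wide), "Student_ID")

# Long format by hand: repeat the IDs once per assignment column and
# stack the score columns on top of each other
long <- data.frame(
  Student_ID = rep(wide$Student_ID, times = length(score_cols)),
  Assignment = rep(score_cols, each = nrow(wide)),
  Score      = unlist(wide[score_cols], use.names = FALSE)
)
print(long)
```

Each row of the long table is one tile in the heatmap, which is why ggplot can map Assignment to x, Student_ID to y, and Score to fill.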

Task: Copy the ggplot() call from the heatmap-plot chunk above into the blank chunk below. Make two changes:

  1. Change the low and high colors in scale_fill_gradient() — for example: low = "white", high = "#993C1D" (red gradient).
  2. Update the x and y labels in labs() to something more descriptive than "Assignment" and "Student ID".

Hint

Copy the entire ggplot(data_hm_long, ...) block from above. You only need to change two things: scale_fill_gradient() and labs(). Everything else stays the same.

# YOUR CODE HERE — copy the ggplot() call above and apply your two changes
data_hm_long <- melt(data_hm,
                     id.vars      = "Student_ID",
                     variable.name = "Assignment",
                     value.name   = "Score")

ggplot(data_hm_long,
       aes(x = Assignment, y = Student_ID, fill = Score)) +
  geom_tile(color = "white", linewidth = 0.3) +
  scale_fill_gradient(low = "red", high = "green") +
  labs(
    title = "Heatmap of Student Progress Across Assignments",
    x     = "Assignment grade",
    y     = "Student ID #",
    fill  = "Score"
  ) +
  theme_minimal() +
  theme(
    plot.title  = element_text(size = 14, face = "bold", hjust = 0.5),
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.text.y = element_text(size = 8)
  )

Heatmap with custom colors and labels

Reflection: How does the color change affect how you interpret the data? Does the red gradient feel different from the green one — and why might that matter when presenting to a principal or department chair?

  • [I like the new colors better. To me they make the pattern more obvious and easier to see. I think this would matter to a principal or department chair because it makes the data seem more important. Having the lower scores in red really brings home the idea that this is a serious problem that needs attention.]

Question: Looking at the heatmap, identify one student and one assignment that stand out. What would you do as an instructor based on what you see?

  • [One that stands out to me is Student 17 on Assignment 9. This student is in dark red on my heatmap. As an instructor, I would use this information to talk to the student and ask what might have caused the grade to be so low. From there I would give advice and try to help the student improve future grades.]
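
A visual scan of the heatmap can be double-checked with a few lines of base R. The sketch below uses a made-up 3 x 3 score matrix, not the course data, to locate the single lowest cell and to compare it with per-assignment and per-student averages.

```r
# Made-up scores: rows are students, columns are assignments
scores <- matrix(
  c(90, 85, 88,
    72, 35, 80,
    95, 91, 60),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Student_1", "Student_2", "Student_3"),
                  c("Assignment_1", "Assignment_2", "Assignment_3"))
)

# Locate the single lowest cell (row = student, col = assignment)
pos <- which(scores == min(scores), arr.ind = TRUE)
lowest_student    <- rownames(scores)[pos[1, "row"]]
lowest_assignment <- colnames(scores)[pos[1, "col"]]

# Averages help separate one bad result from a generally hard assignment
assignment_means   <- colMeans(scores)
student_means      <- rowMeans(scores)
hardest_assignment <- names(which.min(assignment_means))

print(c(lowest_student, lowest_assignment, hardest_assignment))
```

If the lowest cell sits in a column whose average is also low, the assignment itself may be the issue; if the column average is fine, the conversation is more likely one to have with the individual student.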

Part 3 · Social network analysis (SNA)

Social network analysis maps relationships and interactions between people. In education, this reveals who is connected, who is isolated, and who acts as a hub or bridge in a learning community.

Package for this section

# igraph is the main package for network analysis in R
# install.packages("igraph") if needed
library(igraph)

Load the interaction data

# This dataset has From and To columns, plus a weight column
# (it shows up as the edge attribute "weight" in summary(g) below)
# Each row is one interaction (e.g., Student_1 replied to Student_5)
data_sna <- read.csv("data/student_interactions.csv")

head(data_sna)

Build the network

# Convert the data frame to a graph object
# directed = FALSE means interactions go both ways
g <- graph_from_data_frame(d = data_sna, directed = FALSE)

# Summary of the network
summary(g)
IGRAPH d078d37 UNW- 30 100 -- 
+ attr: name (v/c), weight (e/n)
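
The summary line packs several facts into a few characters: U for undirected, N for named vertices, W for weighted edges, then the vertex and edge counts (30 and 100 here). The sketch below checks the same properties on a small made-up edge list; the student names and weights are assumptions for illustration only.

```r
library(igraph)

# Made-up edge list with a weight column, mirroring the From/To layout
edges <- data.frame(
  From   = c("A", "A", "B"),
  To     = c("B", "C", "C"),
  weight = c(2, 1, 3)
)

# Columns after the first two become edge attributes (here: weight)
g_toy <- graph_from_data_frame(d = edges, directed = FALSE)

n_vertices <- vcount(g_toy)        # number of students
n_edges    <- ecount(g_toy)        # number of interaction records
directed   <- is_directed(g_toy)   # FALSE for an undirected graph
weighted   <- is_weighted(g_toy)   # TRUE because a weight attribute exists

# degree() counts edges; strength() sums the edge weights instead
deg_toy <- degree(g_toy)
str_toy <- strength(g_toy)
print(deg_toy)
print(str_toy)
```

In a weighted network the two measures can rank students differently: degree reflects how many partners someone has, while strength reflects how much interaction they have in total.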

Basic network plot

plot(
  g,
  vertex.label = V(g)$name,
  vertex.size  = 30,
  vertex.color = "#E1F5EE",
  edge.color   = "gray60",
  main         = "Student Interaction Network"
)

Basic social network of student interactions

Enhanced network — node size by connections

# Compute degree — how many connections each student has
V(g)$degree <- degree(g)

# Color palette from RColorBrewer (loaded in the main setup chunk)
n_nodes       <- length(V(g))
node_colors   <- brewer.pal(min(n_nodes, 12), "Set3")

plot(
  g,
  vertex.label       = V(g)$name,
  vertex.size        = V(g)$degree * 1.5 + 5,   # bigger = more connected
  vertex.color       = node_colors[seq_len(n_nodes) %% length(node_colors) + 1],
  edge.color         = "gray60",
  vertex.label.cex   = 0.6,
  vertex.label.color = "#2C2C2A",
  layout             = layout_with_fr,
  main               = "Student Interaction Network — Node Size = Degree"
)

Network with node size scaled by number of connections (degree)

Tip: Click “Show in New Window” for a better view

Network plots can be hard to read in the small viewer. Click the expand icon in the top right of the plot pane to open it full size.

Task: The enhanced plot above shows node size by degree, but a visual ranking is hard to read precisely when many nodes are close in size. The chunk below has already built a data frame called degree_df with each student’s name and degree count. Your job is to sort it and find the top 3 most-connected students.

# This data frame is already built for you from the sna-enhanced chunk above.
# V(g)$name = student names, V(g)$degree = number of connections each has
degree_df <- data.frame(
  student = V(g)$name,
  degree  = V(g)$degree
)

# YOUR CODE HERE
# Use arrange() and head() to show the top 3 students by degree, descending.
# Hint: arrange(desc(degree)) |> head(3)
degree_df |>
  arrange(desc(degree)) |>
  head(3)
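
To make the meaning of degree concrete, the sketch below computes it without igraph or dplyr, directly from a made-up edge list. The names are hypothetical; the point is only that, in an undirected network, degree is the number of rows in which a student appears in either column.

```r
# Made-up edge list with the same From/To layout as data_sna
edges <- data.frame(
  From = c("Ana", "Ana", "Ben", "Cy",  "Ana"),
  To   = c("Ben", "Cy",  "Cy",  "Dee", "Dee")
)

# Count how often each student appears as either endpoint
deg <- table(c(edges$From, edges$To))

toy_degree_df <- data.frame(
  student = names(deg),
  degree  = as.integer(deg)
)

# Base-R counterpart of arrange(desc(degree)) |> head(3)
top3 <- head(toy_degree_df[order(-toy_degree_df$degree), ], 3)
print(top3)
```

Sorting the table removes the guesswork of comparing circle sizes by eye, which is the reason the task asks for a ranked data frame.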

Reflection: Does the result match what you expected from the visual? Which student is most connected and what does that mean for an instructor?

  • [No, I was picking mine based on which circles were closest together, not on their size. The 3 closest are David, Mia, and Emily. For an instructor, this shows some similarity between these students. These are probably the 3 most talkative students, and they should be paired with the least talkative ones.]

Question: Looking at the enhanced network plot, identify:

  1. Which students appear most central (largest nodes)? What does that mean?
  2. Are any students isolated (small nodes with few or no connections)?
  3. What would you do as an instructor based on what you see?
  • [1. David, Sarah, and Christopher.
     2. James and Joshua.
     3. I would partner the most and least talkative students so that they could benefit each other.]
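
One caveat when looking for isolated students: a network built only from an interaction log contains just the students who appear in at least one row, so anyone with zero interactions is missing from the plot entirely. The sketch below shows one way to catch them, assuming a class roster is available; the roster and names are hypothetical.

```r
# Hypothetical class roster; in practice this would come from enrollment data
roster <- c("Ana", "Ben", "Cy", "Dee", "Eli")

# Made-up interaction log in the From/To layout
edges <- data.frame(
  From = c("Ana", "Ana", "Ben"),
  To   = c("Ben", "Cy",  "Cy")
)

# Students on the roster who never appear in any interaction
active   <- unique(c(edges$From, edges$To))
isolated <- setdiff(roster, active)

# Degree for everyone, including zeros for the isolated students
deg_all <- table(factor(c(edges$From, edges$To), levels = roster))

print(isolated)
print(deg_all)
```

graph_from_data_frame() also accepts a vertices argument, so passing the full roster there is one way to make such students appear in the plot as unconnected nodes.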

Part 4 · Visualization choice reflection

You have now used seven visualization types: scatter plot, bar plot, line plot, histogram, and, this week, word cloud, heatmap, and SNA plot.

Question 1: For each of the three advanced techniques, describe one specific scenario from your track (K–12 or ID/higher ed) where it would be the most useful choice:

  • Word cloud: [I could see this being useful when planning lessons, to help identify important words in the standards. You could take the state standards for the class you are teaching and find which words keep reappearing to know which are the most important.]
  • Heatmap: [High and low scores. A heatmap is super useful for seeing which students are underperforming and which ones are not.]
  • SNA plot: [I think this would be super useful for summarizing student class surveys. It would make it easy to see which students gave the lowest scores. From there you could see what grade they had, whether they caused much trouble, or just what kind of relationship they had with the teacher.]

Question 2: What challenges might you face implementing these visualizations in a real school or institutional setting? Think about data availability, privacy, and stakeholder interpretation.

  • [I think to some degree it would be hard to get the exact data setup we need for every single visualization. I feel like sometimes we would want to make a certain visualization but we wouldn’t have the right data to accomplish it. Also, I think there could be a privacy problem with singling out individual students in certain visualizations, depending on who you were sharing them with.]

Question 3: How could these techniques evolve in your field? What would be possible if these tools were integrated into an LMS dashboard that teachers or designers could access in real time?

  • [I feel like the tools could definitely evolve into part of teachers' grade keeping. The tools would make it possible for teachers to see which students are underscoring and how many there are. They would also make these students very easy to spot. I think this would help teachers make sure every single student is staying on task and caught up.]

Render & submit

Step 1 — Add your name

Change the author: field in the YAML header at the top to your name.

Step 2 — Render

Click Render in the toolbar. This file uses several packages (tm, wordcloud, igraph) that produce warnings during preprocessing — that is expected. As long as the final HTML page appears, the render was successful. If you see a true error (red text that stops the render), check that all packages are installed:

install.packages(c("tm", "wordcloud", "RColorBrewer", "reshape2", "igraph"))
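
If reinstalling everything feels heavy-handed, a small check can report only the packages that are actually missing. This is a sketch; missing_packages() is a hypothetical helper name, and the package list is copied from the install.packages() line above.

```r
# Packages this file relies on (from the install.packages() line above)
needed <- c("tm", "wordcloud", "RColorBrewer", "reshape2", "igraph")

# Hypothetical helper: return the packages that are not installed yet
missing_packages <- function(pkgs,
                             installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

to_install <- missing_packages(needed)

if (length(to_install) > 0) {
  message("Not installed yet: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)   # uncomment to install only what is missing
} else {
  message("All required packages are installed.")
}
```

Running this in the console before clicking Render can save a failed render when only one package is absent.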

Step 3 — Publish

Option       | Best for                                 | Link
Posit Cloud  | Quickest: one click from your workspace  | Guide
RPubs        | Free, public, easy to share a link       | rpubs.com
Quarto Pub   | Clean public portfolio pages             | Guide
GitHub Pages | Best for a professional portfolio        | Guide

Tip: E-portfolio tip

This is the most visually impressive of the four files — word clouds, heatmaps, and network graphs are immediately recognizable as advanced data work to anyone reviewing your portfolio. If you are sharing one document from this course with a hiring committee or graduate school application, this is the one to lead with. Pair it with your capstone analysis for the full picture.

Share your published link with your instructor once you have rendered and published. Post in the course discussion board if you run into any technical issues.