Advanced Visualization Techniques

Learning Analytics — Week 4 (Required)

Author

Jennifer Rutherford

Published

June 19, 2026

Learning objectives

By the end of this file, you will be able to:

Create a word cloud from text data and interpret frequency patterns
Build a heatmap to reveal patterns across two dimensions
Visualize a social network of student interactions using SNA
Reflect on when each visualization type is the most appropriate choice

When to use which advanced visualization

Visualization	Best for	Typical data source
Word cloud	Frequency of words in text	Survey responses, discussion posts, feedback
Heatmap	Patterns across two categories	Scores by subject + week, engagement by module + cohort
SNA plot	Relationships between individuals	Forum reply data, collaboration records

Before you run any code, read through the whole file once. Notice which dataset each part uses and make sure the file is in your data folder.

Part 1 · Word cloud

A word cloud visualizes how often words appear in a text — larger words appear more frequently. We will use the Handbook of Learning Analytics (2022), which has been converted to a plain text file.

Packages for this section

# Text mining packages
# install.packages(c("tm", "wordcloud")) if you get errors here
library(tm)
library(wordcloud)

Load the text data

# read.delim() reads a plain text file
# header = FALSE and stringsAsFactors = FALSE read it as raw text
la_text <- read.delim("data/Handbook of LA.txt",
                      header = FALSE,
                      stringsAsFactors = FALSE)

# Create a text corpus — a collection of text documents the tm package can process
doc <- Corpus(VectorSource(la_text))

head(doc)

<<SimpleCorpus>>
Metadata:  corpus specific: 1, document level (indexed): 0
Content:  documents: 1

Clean the text

Before generating the word cloud, we remove common words, punctuation, numbers, and other noise. This is called text preprocessing.

# Helper function: replace specific characters with a space
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))

doc <- doc |>
  (\(d) {
    d <- tm_map(d, toSpace, "/")
    d <- tm_map(d, toSpace, "@")
    d <- tm_map(d, toSpace, "\\|")
    d <- tm_map(d, content_transformer(tolower))
    d <- tm_map(d, removeWords,
                c(stopwords("english"), "https", "can", "doi", "also", "use", "analytics", "url", "educational"))
    d <- tm_map(d, removeNumbers)
    d <- tm_map(d, removePunctuation)
    d <- tm_map(d, stripWhitespace)
    d
  })()
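To see what the cleaning steps do, the sketch below applies rough base-R equivalents to a single made-up sentence. It does not use tm: the sample sentence and the short `stop_words` vector are assumptions for illustration, and the step order is simplified. Punctuation is replaced with a space here, whereas tm's removePunctuation() deletes it, so results on real text can differ slightly.

```r
# Made-up sample text (not from the Handbook)
x <- "Learning Analytics can use 2022 data/logs | see https://example.org!"

# Hypothetical, tiny stopword list; tm's stopwords("english") is far longer
stop_words <- c("can", "use", "see", "https")

x <- gsub("[/@|]", " ", x)                 # replace /, @ and | with spaces
x <- tolower(x)                            # lower-case everything
x <- gsub("[[:digit:]]+", "", x)           # drop numbers
x <- gsub("[[:punct:]]+", " ", x)          # punctuation becomes a space
words <- strsplit(trimws(x), "\\s+")[[1]]  # split on runs of whitespace
words <- words[!words %in% stop_words]     # drop stopwords

print(words)
```

Only the content-bearing words remain, which is the input the frequency table needs.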

Note

You may see warning messages when running the preprocessing chunk. These are expected from the tm package — as long as you see no error messages, you can continue.

Generate the word cloud

# Build a term-document matrix — a table of word frequencies
dtm <- TermDocumentMatrix(doc)
m   <- as.matrix(dtm)
v   <- sort(rowSums(m), decreasing = TRUE)
word_freq <- data.frame(word = names(v), freq = v)

# Top 10 most frequent words
head(word_freq, 10)
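The frequency table built above is, at heart, a sorted word count. A minimal base-R sketch of the same idea follows, using a made-up token vector rather than the Handbook text; `tokens` and `toy_freq` are illustrative names only.

```r
# Made-up tokens standing in for a cleaned document
tokens <- c("data", "learning", "students", "data", "learning", "data")

# Count each word, then sort from most to least frequent
counts <- sort(table(tokens), decreasing = TRUE)

# Same shape as word_freq: one row per word with its frequency
toy_freq <- data.frame(
  word = names(counts),
  freq = as.integer(counts)
)

print(toy_freq)
```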

Word cloud from the Handbook of Learning Analytics (2022)

# Generate the word cloud
set.seed(1234)
wordcloud(
  words       = word_freq$word,
  freq        = word_freq$freq,
  min.freq    = 10,
  max.words   = 150,
  random.order = FALSE,
  rot.per     = 0.35,
  colors      = brewer.pal(8, "Dark2")
)

Task 1 — Parameter exploration: Try changing max.words to 50 or 150 and min.freq to 10. How does the word cloud change? Which version is more readable?

[When I increased the max.words parameter to 150, the word cloud became more readable: the words sat closer together and the image looked fuller and more balanced. Setting max.words to 50 made the cloud look empty.]
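One way to reason about these two parameters before re-plotting is to count which words would survive each setting. The sketch below mimics the filtering on a made-up frequency table. `words_kept()` is a hypothetical helper, not part of the wordcloud package. It assumes, following the wordcloud() documentation, that words below min.freq are dropped and that only the max.words most frequent words are plotted.

```r
# Made-up frequency table (not the Handbook counts)
toy_freq <- data.frame(
  word = c("learning", "data", "students", "model", "course"),
  freq = c(40, 25, 12, 9, 3)
)

# Hypothetical helper: which words survive min.freq and max.words?
words_kept <- function(freq_df, min.freq, max.words) {
  kept <- freq_df[freq_df$freq >= min.freq, ]          # drop rare words
  kept <- kept[order(kept$freq, decreasing = TRUE), ]  # most frequent first
  head(as.character(kept$word), max.words)             # cap the word count
}

print(words_kept(toy_freq, min.freq = 10, max.words = 150))
print(words_kept(toy_freq, min.freq = 10, max.words = 2))
```

With a high max.words the min.freq threshold decides what appears. With a low max.words the cap takes over.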

Task 2 — Add your own stopwords: Look at the word cloud above. Find 2–3 words that appear large but feel too generic to be meaningful — words that are frequent only because they appear in every chapter, not because they signal a specific theme.

Add those words to the removeWords() line in the text-preprocessing chunk above, then re-run that chunk and the wordcloud-generate chunk. The line to edit looks like this:

d <- tm_map(d, removeWords,
            c(stopwords("english"), "https", "can", "doi", "also", "use"))

Add your words inside the c(...) vector, then re-run both chunks.

Reflection: Which words did you remove? Why did you choose them? Does the revised cloud reveal any themes the original missed?

[I decided to take out the words analytics, url, and educational. When I looked at the original word cloud, they felt too general and were getting in the way of seeing the real themes in the text. A few replacement words I would have added in smaller fonts are academic, website or link, and growth. After I removed them, the word cloud looked a lot better: without those big words sticking out, the main ideas stood out more clearly.]

Question: Looking at the most frequent words, what does this tell you about the themes the learning analytics field focuses on? Does anything surprise you?

[When I looked at the most common words in the word cloud, I could see that the learning analytics field really focuses on things like data, students, learning, and technology. It shows that people in this field care a lot about how data can be used to understand students better and help improve teaching and learning. It’s a mix of technology and education, and it shows how the two work together.

What surprised me the most was how general some of the frequent words were. I expected to see more specific ones like engagement or assessment, but instead, it was broader words like educational and url. It made me realize that learning analytics covers a really wide range of topics instead of focusing on just one small part of education.]

Part 2 · Heatmap

A heatmap uses color intensity to show values across two categorical dimensions — useful for spotting patterns that would be invisible in a table or bar chart.

Package for this section

# reshape2 gives us melt() for converting wide to long format
# install.packages("reshape2") if needed
library(reshape2)
# ggplot2 draws the heatmap; dplyr provides glimpse()
library(ggplot2)
library(dplyr)

Load the data

data_hm <- read.csv("data/student_assignment_scores.csv")

head(data_hm)

glimpse(data_hm)

Rows: 30
Columns: 11
$ Student_ID    <chr> "Student_1", "Student_2", "Student_3", "Student_4", "Stu…
$ Assignment_1  <int> 98, 86, 50, 74, 59, 85, 67, 98, 96, 73, 56, 85, 74, 86, …
$ Assignment_2  <int> 77, 89, 52, 82, 91, 73, 79, 92, 64, 71, 57, 85, 53, 71, …
$ Assignment_3  <int> 52, 70, 51, 87, 59, 89, 54, 82, 98, 88, 85, 94, 91, 58, …
$ Assignment_4  <int> 98, 67, 54, 95, 89, 89, 70, 85, 85, 88, 91, 67, 76, 62, …
$ Assignment_5  <int> 54, 58, 65, 75, 92, 91, 91, 82, 78, 74, 100, 81, 66, 95,…
$ Assignment_6  <int> 87, 80, 55, 69, 64, 83, 91, 72, 61, 75, 90, 50, 51, 73, …
$ Assignment_7  <int> 54, 84, 63, 63, 77, 82, 86, 84, 75, 80, 94, 92, 81, 66, …
$ Assignment_8  <int> 80, 57, 65, 52, 86, 62, 61, 50, 79, 100, 80, 86, 96, 55,…
$ Assignment_9  <int> 82, 87, 92, 90, 71, 75, 69, 94, 86, 92, 64, 70, 83, 74, …
$ Assignment_10 <int> 79, 54, 71, 69, 99, 90, 60, 77, 87, 91, 58, 72, 75, 100,…

Reshape and plot

# melt() converts wide format (one column per assignment) to long format
# (one row per student-assignment combination) — required for ggplot heatmaps
data_hm_long <- melt(data_hm,
                     id.vars      = "Student_ID",
                     variable.name = "Assignment",
                     value.name   = "Score")

ggplot(data_hm_long,
       aes(x = Assignment, y = Student_ID, fill = Score)) +
  geom_tile(color = "white", linewidth = 0.3) +
  scale_fill_gradient(low = "#F0FAF6", high = "#0F6E56") +
  labs(
    title = "Heatmap of Student Progress Across Assignments",
    x     = "Assignment",
    y     = "Student ID",
    fill  = "Score"
  ) +
  theme_minimal() +
  theme(
    plot.title  = element_text(size = 14, face = "bold", hjust = 0.5),
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.text.y = element_text(size = 8)
  )

Student scores across assignments — darker = higher score
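The wide-to-long reshape is the step that most often trips people up, so here is a small sketch of what it produces. It uses base R in place of melt() so it runs without reshape2. The two-student data frame is made up, and `wide`, `long`, and `score_cols` are illustrative names.

```r
# Made-up wide data: one row per student, one column per assignment
wide <- data.frame(
  Student_ID   = c("S1", "S2"),
  Assignment_1 = c(90, 70),
  Assignment_2 = c(80, 60)
)

score_cols <- setdiff(names(wide), "Student_ID")

# Long format: one row per student-assignment combination
long <- data.frame(
  Student_ID = rep(as.character(wide$Student_ID), times = length(score_cols)),
  Assignment = rep(score_cols, each = nrow(wide)),
  Score      = unlist(wide[score_cols], use.names = FALSE)
)

print(long)
```

Two students and two assignments give four rows, one per tile in the heatmap.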

Task: Copy the ggplot() call from the heatmap-plot chunk above into the blank chunk below. Make two changes:

Change the low and high colors in scale_fill_gradient() — for example: low = "white", high = "#993C1D" (red gradient).
Update the x and y labels in labs() to something more descriptive than "Assignment" and "Student ID".

Hint

Copy the entire ggplot(data_hm_long, ...) block from above. You only need to change two things: scale_fill_gradient() and labs(). Everything else stays the same.

# YOUR CODE HERE — copy the ggplot() call above and apply your two changes

data_hm_long <- melt(
  data_hm,
  id.vars = "Student_ID",
  variable.name = "Assignment",
  value.name = "Score"
)

# heatmap
ggplot(data_hm_long, aes(x = Assignment, y = Student_ID, fill = Score)) +
  geom_tile(color = "white", linewidth = 0.3) +

  # custom color gradient (white to dark red)
  scale_fill_gradient(low = "white", high = "#993C1D") +

  # updated labels
  labs(
    title = "Student Performance Across Assignments",
    x = "Assignment Title",
    y = "Student ID Number",
    fill = "Score Value"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.text.y = element_text(size = 8)
  )

Reflection: How does the color change affect how you interpret the data? Does the red gradient feel different from the green one — and why might that matter when presenting to a principal or department chair?

[Changing the color gradient affected how I interpreted the data. With the red gradient, the higher scores stood out more because the darker red caught my eye. But red is also commonly used as an indicator of low scores or failing. The green gradient from before had a calmer tone and a more positive effect. I think color does matter when presenting data because red can sometimes be seen as a warning or a problem area, while green tends to communicate progress and success. The color choice can influence how the audience feels about student performance, so it’s important to pick colors that match the story you want to tell visually.]

Question: Looking at the heatmap, identify one student and one assignment that stand out. What would you do as an instructor based on what you see?

[Looking at the heatmap, one student who stands out is the one with the lightest color pattern across several assignments, which means their scores were lower than their classmates'. On the flip side, one assignment appears darker across almost all students, suggesting that most of the class performed well on it. As an instructor, I’d want to follow up with that lower-performing student to find out if they’re struggling with a specific concept or need extra help. I’d also look more closely at the assignments they did better on to identify patterns. It could be that certain topics, formats, or timing helped them succeed. For future planning, I’d take what’s working well in the high-performing assignments and use those strategies to help lift the lower scores.]
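A visual impression like this can be cross-checked numerically with row and column averages. The sketch below does so on a made-up three-student table in the same wide layout as data_hm. The scores and the names `scores`, `student_avg`, and `assignment_avg` are assumptions for illustration.

```r
# Made-up scores in the same wide layout as data_hm
scores <- data.frame(
  Student_ID   = c("Student_A", "Student_B", "Student_C"),
  Assignment_1 = c(90, 55, 80),
  Assignment_2 = c(85, 60, 95),
  Assignment_3 = c(70, 50, 75)
)

score_cols <- setdiff(names(scores), "Student_ID")

# Average per student (rows) and per assignment (columns)
student_avg    <- setNames(rowMeans(scores[score_cols]),
                           as.character(scores$Student_ID))
assignment_avg <- colMeans(scores[score_cols])

# The lightest row and the darkest column in the heatmap
lowest_student     <- names(which.min(student_avg))
highest_assignment <- names(which.max(assignment_avg))

print(lowest_student)
print(highest_assignment)
```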

Part 3 · Social network analysis (SNA)

Social network analysis maps relationships and interactions between people. In education, this reveals who is connected, who is isolated, and who acts as a hub or bridge in a learning community.

Package for this section

# igraph is the main package for network analysis in R
# install.packages("igraph") if needed
library(igraph)

Load the interaction data

# This dataset has From and To columns, plus an edge weight (see the summary below)
# Each row is one interaction (e.g., Student_1 replied to Student_5)
data_sna <- read.csv("data/student_interactions.csv")

head(data_sna)

Build the network

# Convert the data frame to a graph object
# directed = FALSE means interactions go both ways
g <- graph_from_data_frame(d = data_sna, directed = FALSE)

# Summary of the network
summary(g)

IGRAPH b9f1289 UNW- 30 100 -- 
+ attr: name (v/c), weight (e/n)

Basic network plot

#| label: sna-basic
#| fig-cap: "Basic social network of student interactions"
#| fig-width: 8
#| fig-height: 7

# Use the interaction data loaded above as the edge list
edges <- data_sna

# Check that `edges` has valid data before plotting
if (exists("edges") && is.data.frame(edges) && nrow(edges) > 0 && ncol(edges) >= 2) {
  
  # Build graph from edge list
  g <- graph_from_data_frame(d = edges, directed = FALSE)

  # Plot the network
  plot(
    g,
    vertex.label = V(g)$name,
    vertex.size  = 30,
    vertex.color = "#E1F5EE",
    edge.color   = "gray60",
    main         = "Student Interaction Network"
  )
  
} else {
  # Friendly message if edges are empty or missing
  message("⚠️ No valid 'edges' data found — skipping SNA plot.")
}

Enhanced network — node size by connections

# Load required packages
library(igraph)
library(RColorBrewer)
library(dplyr)

# Example: create edges if missing (for testing or demo)
if (!exists("edges")) {
  edges <- data.frame(
    from = c("Alice", "Ben", "Alice", "Carla", "Ben", "Dana"),
    to   = c("Ben", "Carla", "Dana", "Dana", "Ella", "Ella")
  )
}

# --- BASIC NETWORK PLOT ---
if (exists("edges") && is.data.frame(edges) && nrow(edges) > 0 && ncol(edges) >= 2) {

  # Build graph from edge list
  g <- graph_from_data_frame(d = edges, directed = FALSE)

  # Plot the basic network
  plot(
    g,
    vertex.label = V(g)$name,
    vertex.size  = 30,
    vertex.color = "#E1F5EE",
    edge.color   = "gray60",
    main         = "Student Interaction Network"
  )

} else {
  message("⚠️ No valid 'edges' data found — skipping SNA plot.")
}

Click “Show in New Window” for a better view

Network plots can be hard to read in the small viewer. Click the expand icon in the top right of the plot pane to open it full size.

Task: The enhanced plot above shows node size by degree, but a visual ranking is hard to read precisely when many nodes are close in size. The chunk below has already built a data frame called degree_df with each student’s name and degree count. Your job is to sort it and find the top 3 most-connected students.

# --- ENHANCED NETWORK PLOT ---
if (exists("g") && igraph::gsize(g) > 0) {

  # Compute degree for each node (how many connections)
  V(g)$degree <- degree(g)

  # Color palette based on total nodes
  n_nodes     <- length(V(g))
  node_colors <- brewer.pal(min(n_nodes, 12), "Set3")

  # Enhanced network plot (node size = degree)
  plot(
    g,
    vertex.label       = V(g)$name,
    vertex.size        = V(g)$degree * 2 + 5,  # emphasize connectivity
    vertex.color       = node_colors[seq_len(n_nodes) %% length(node_colors) + 1],
    edge.color         = "gray60",
    vertex.label.cex   = 0.8,
    vertex.label.color = "#2C2C2A",
    layout             = layout_with_fr,
    main               = "Student Interaction Network — Node Size = Degree"
  )

} else {
  message("⚠️ Graph object 'g' not found or empty — skipping enhanced plot.")
}

Network with node size scaled by number of connections (degree)

#| label: sna-top3
#| fig-cap: "Top 3 most connected students in the network"
#| fig-width: 7
#| fig-height: 5

# --- TOP 3 MOST CONNECTED STUDENTS ---
if (exists("g") && igraph::vcount(g) > 0 && "degree" %in% vertex_attr_names(g)) {

  degree_df <- data.frame(
    student = V(g)$name,
    degree  = V(g)$degree
  )

  # Sort and display top 3
  top3 <- degree_df |> arrange(desc(degree)) |> head(3)
  print(top3)

  # --- Simple bar plot for clarity ---
  barplot(
    height = top3$degree,
    names.arg = top3$student,
    col = "#FFC300",
    ylim = c(0, max(degree_df$degree) + 1),
    main = "Top 3 Most Connected Students",
    ylab = "Number of Connections"
  )

} else {
  message("⚠️ No degree data found — cannot show top 3 students.")
}

  student degree
1   David     11
2     Mia     10
3   Emily     10
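Degree can also be checked by hand, which helps when interpreting the table above. In an undirected network, a node's degree is the number of edge endpoints it occupies. The base-R sketch below counts those endpoints for a made-up edge list; the names and `toy_edges` are illustrative and not taken from the course data.

```r
# Made-up undirected edge list (not the course data)
toy_edges <- data.frame(
  from = c("Ana", "Ana", "Ana", "Bo"),
  to   = c("Bo", "Cy", "Di", "Cy")
)

# Stack both columns so every endpoint is counted once per edge
endpoints  <- c(as.character(toy_edges$from), as.character(toy_edges$to))
toy_degree <- sort(table(endpoints), decreasing = TRUE)

print(toy_degree)
```

A student with no interactions at all never appears in an edge list, so a count like this cannot reveal fully isolated students unless the class roster is added separately.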

Reflection: Does the result match what you expected from the visual? Which student is most connected and what does that mean for an instructor?

[Yes, the results from the data matched what I noticed in the network. David, Mia, and Emily came out as the top three most-connected students. Looking back at the plot, I noticed their nodes were larger, which shows they interacted or collaborated with more classmates than other students did. For an instructor, that’s helpful to know because these students might serve as natural leaders who can lead discussion groups or bridge smaller groups. I think identifying them early could help with grouping or peer tutoring in the classroom.]

Question: Looking at the enhanced network plot, identify:

Which students appear most central (largest nodes)? What does that mean?
Are any students isolated (small nodes with few or no connections)?
What would you do as an instructor based on what you see?

[1. The most central students in the network plot were David, Mia, and Emily. Their larger nodes show that they have stronger connections and are involved with multiple peers. This suggests they are engaged and could take on a leadership role or mentor other students during group activities.

2. James, Joshua, and Samatha appear to be more isolated. Their nodes were smaller and on the outer edge of the network. This shows they have fewer connections or may not be as involved in collaborative tasks.

3. As an instructor, I’d use this to help balance student participation. I could pair the isolated students with more central ones like David, Mia, or Emily to increase student engagement or confidence. From my own experience, I see that not every student is a collaborator. I have some who are really smart but sit and listen, and when it comes to answering nonverbal questions, they are correct. Then I have some who are very verbal, but when it comes to answering nonverbal questions they get them wrong. So I try to pair them based on their personalities and outcomes. Sometimes the quieter, smart kids need to be paired with outgoing, talkative students to help them open up. When they do open up, they work well with the ones who struggle. It’s a good reminder to me that social interactions in learning can influence participation either way, so building better group structure can help everyone feel more included. Also, groups may change on a weekly basis depending on how well students work together.]

Part 4 · Visualization choice reflection

You have now used seven visualization types: scatter plot, bar plot, line plot, histogram, word cloud, heatmap, and SNA plot (the last three this week).

Question 1: For each of the three advanced techniques, describe one specific scenario from your track (K–12 or ID/higher ed) where it would be the most useful choice:

Word cloud: [When a 4th-grade math teacher collects responses after a fractions unit and the word cloud shows terms like confused, why, and fractions, this could mean that students are unsure about the conceptual reasoning behind the steps they are following: the why behind the steps. The content needs to be broken down step by step so students can make those visual connections. The teacher can adjust instruction and use visuals and manipulatives to rebuild understanding.]
Heatmap: [In a science class, a heatmap of student grades across all quarters shows a consistent drop in Quarter 2 scores. This could mean that instruction or lesson pacing during that period needs a review. The heatmap shows where intervention is most needed, whether that is clearer explanations, more hands-on labs, or reteaching key standards.]
SNA plot: [In a writing class, an SNA plot of peer feedback shows three students are isolated: they neither give nor receive comments. This data helps the teacher group those students with more active peers, which will foster inclusion. Sometimes students have to be regrouped more than once before they come out of isolation.]

Question 2: What challenges might you face implementing these visualizations in a real school or institutional setting? Think about data availability, privacy, and stakeholder interpretation.

[I think these visualizations can provide important information, but there are some challenges when trying to use them in a school setting. One problem is the availability of data. Not all schools have a system in place that collects detailed data, so teachers might try to gather it manually, which takes a lot of time and may not always be accurate.

Privacy could be another concern. I think when working with student information, it’s important to protect their identity and follow privacy guidelines like FERPA because data should remain anonymous.

Another issue is making sure the data is interpreted correctly. A graph or chart can show it visually, but without context teachers may draw the wrong conclusion. For example, if a student’s engagement drops, it might be viewed as a problem with instruction when several other factors could be involved.

There could also be technical barriers. Not every teacher feels comfortable creating or analyzing data, and some schools may not have access to the right tools. I think providing professional development and easy-to-use technology tools can help teachers feel more comfortable using data to guide their instruction.]
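On the privacy point above, one common first step is to replace student names with neutral codes before plotting or sharing. The sketch below is a minimal illustration under stated assumptions: the roster is made up, the `S01`-style codes are a hypothetical scheme, and coding names alone does not satisfy any particular policy, so institutional guidance still applies.

```r
# Made-up roster (names are illustrative only)
roster <- data.frame(
  student = c("Dana", "Omar", "Dana", "Lee"),
  score   = c(88, 72, 91, 65)
)

# Map each distinct name to a neutral code such as S01, S02, ...
unique_names <- unique(as.character(roster$student))
codes <- setNames(sprintf("S%02d", seq_along(unique_names)), unique_names)

# Replace names with codes before plotting or sharing
roster$student <- unname(codes[as.character(roster$student)])

print(roster)
```

The same student always receives the same code, so heatmap rows and network nodes stay consistent across plots.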

Question 3: How could these techniques evolve in your field? What would be possible if these tools were integrated into an LMS dashboard that teachers or designers could access in real time?

[I think if these types of visualization tools were built into an LMS and updated in real time, they could be very helpful for administrators, teachers, and students. Instead of waiting until the end of the grading period to look at data, teachers could monitor it during the semester, identify struggling students, and provide support before the problem grows.

I also think features like automated summaries of student reflections could save teachers a lot of time while helping them spot common misconceptions. Collaboration data could be useful as well by showing when students are becoming isolated during group work so adjustments can be made early.

What stands out to me the most is the potential for personalization. I think over time, these tools could help teachers better understand individual learning patterns and make decisions based on student data rather than assuming where the student is struggling. Integrating these visualization tools into an LMS would make data much more accurate and allow teachers to help students in real time and not wait until the end to try to make adjustments.]

Render & submit

Step 1 — Add your name

Change the author: field in the YAML header at the top to your name.

Step 2 — Render

Click Render in the toolbar. This file uses several packages (tm, wordcloud, igraph) that produce warnings during preprocessing — that is expected. As long as the final HTML page appears, the render was successful. If you see a true error (red text that stops the render), check that all packages are installed:

install.packages(c("tm", "wordcloud", "RColorBrewer", "reshape2", "igraph"))

Step 3 — Publish

Option	Best for	Link
Posit Cloud	Quickest — one click from your workspace	Guide
RPubs	Free, public, easy to share a link	rpubs.com
Quarto Pub	Clean public portfolio pages	Guide
GitHub Pages	Best for a professional portfolio	Guide

E-portfolio tip

This is the most visually impressive of the four files — word clouds, heatmaps, and network graphs are immediately recognizable as advanced data work to anyone reviewing your portfolio. If you are sharing one document from this course with a hiring committee or graduate school application, this is the one to lead with. Pair it with your capstone analysis for the full picture.

Share your published link with your instructor once you have rendered and published. Post in the course discussion board if you run into any technical issues.