Advanced Techniques & Visualization with Various Datasets

CUED 7540: Learning Analytics IV

Author

Caroline Wendt, cghuntley42@tntech.edu

Published

September 30, 2025

Final Analytics Practice Module

In this final module, we’ll explore three advanced visualization techniques: word clouds, social network analysis (SNA), and heatmaps. These tools are highly customizable, and today we’ll focus on the foundational steps to get you started.

Module Goal

By the end of this module, you’ll be able to create these visualizations and uncover patterns in your data. Remember, all the code is provided, but I encourage you to modify it and try using your own data for a deeper understanding.

Part 1: Word Cloud with Text Data

We’ll begin by creating a word cloud from student text data, such as survey responses or discussion board posts. A word cloud visually represents the frequency of words in a text, with more frequent words appearing larger.

1. Setup

library(tm)        # For text mining
Loading required package: NLP
library(wordcloud) # For word cloud visualization
Loading required package: RColorBrewer
library(RColorBrewer) # For color palettes

2. Loading Data

The data for a word cloud is typically a plain text file. For this example, we’ll use the Handbook of Learning Analytics (2022), which was converted to a .txt file for our analysis. Check our data folder.

# Load the data
la <- read.delim("data/Handbook of LA.txt", header = FALSE, stringsAsFactors = FALSE)

# Create a text corpus, which is a collection of text documents
doc <- Corpus(VectorSource(la))

head(doc)
<<SimpleCorpus>>
Metadata:  corpus specific: 1, document level (indexed): 0
Content:  documents: 1

Note: The read.delim() function is used to read a plain text file. The header = FALSE and stringsAsFactors = FALSE arguments ensure the file is read as plain text rather than as a table with column headers.
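If read.delim() behaves unexpectedly on your file, readLines() is a common alternative that reads each line as a separate string; a minimal sketch, assuming the same file path:

# Alternative: read the file line by line, then collapse into one text block
la_lines <- readLines("data/Handbook of LA.txt", warn = FALSE)
la_text <- paste(la_lines, collapse = " ")
doc <- Corpus(VectorSource(la_text))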

3. Text Preprocessing

Before creating the word cloud, we need to “clean” the text by removing common words, punctuation, and numbers. This ensures our word cloud is meaningful.

# Define a function to replace specific patterns with a space
toSpace <- content_transformer(function (x, pattern) gsub(pattern, " ", x))

# Apply text transformations
doc <- tm_map(doc, toSpace, "/")
Warning in tm_map.SimpleCorpus(doc, toSpace, "/"): transformation drops
documents
doc <- tm_map(doc, toSpace, "@")
Warning in tm_map.SimpleCorpus(doc, toSpace, "@"): transformation drops
documents
doc <- tm_map(doc, toSpace, "\\|")
Warning in tm_map.SimpleCorpus(doc, toSpace, "\\|"): transformation drops
documents
doc <- tm_map(doc, content_transformer(tolower))   # Convert to lowercase
Warning in tm_map.SimpleCorpus(doc, content_transformer(tolower)):
transformation drops documents
doc <- tm_map(doc, removeWords, c(stopwords("english"), "https","can","doi")) # Remove common stopwords
Warning in tm_map.SimpleCorpus(doc, removeWords, c(stopwords("english"), :
transformation drops documents
doc <- tm_map(doc, removeNumbers)                    # Remove numbers
Warning in tm_map.SimpleCorpus(doc, removeNumbers): transformation drops
documents
doc <- tm_map(doc, removePunctuation)          # Remove punctuation
Warning in tm_map.SimpleCorpus(doc, removePunctuation): transformation drops
documents
doc <- tm_map(doc, stripWhitespace)           # Remove extra whitespace
Warning in tm_map.SimpleCorpus(doc, stripWhitespace): transformation drops
documents
# You can add more words to remove by updating the stopwords list.

Note: You might get a lengthy list of warning messages when you run the code above. As long as they’re not error messages, we can continue.
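For example, if filler terms end up dominating your cloud, you can strip them with an extra removeWords pass. A minimal sketch; the words below are placeholders, so swap in whatever terms crowd your own cloud:

# Remove additional custom stopwords (placeholder words; choose your own)
doc <- tm_map(doc, removeWords, c("also", "use", "using"))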

4. Creating the Word Cloud

Now, we’ll create the word cloud by first counting the frequency of each word.

# Create a term-document matrix to count word frequencies
dtm <- TermDocumentMatrix(doc)

# Convert the matrix to a data frame of word frequencies
m <- as.matrix(dtm)
v <- sort(rowSums(m), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v)

# Display the top 10 words
head(d, 10)
                   word freq
learning       learning 3037
analytics     analytics 1443
data               data 1112
–                     – 1084
url                 url  638
education     education  466
educational educational  441
research       research  417
knowledge     knowledge  360
analysis       analysis  358
# Create the word cloud
set.seed(1234)
wordcloud(words = d$word, freq = d$freq, min.freq = 5,
          max.words = 50, random.order = FALSE, rot.per = 0.35, 
          colors = brewer.pal(8, "Dark2"))

Task: Try modifying the max.words or min.freq parameters to see how the word cloud changes.
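As one possible variation, this sketch keeps only words appearing at least 50 times and allows up to 100 words (values chosen purely for illustration):

# A variation: raise the frequency floor and the word cap
set.seed(1234)
wordcloud(words = d$word, freq = d$freq, min.freq = 50,
          max.words = 100, random.order = FALSE, rot.per = 0.35,
          colors = brewer.pal(8, "Dark2"))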

Question: How might customizing the word cloud (e.g., colors, word frequency) enhance the insights you gain from text data?

  • [By changing the minimum word frequency, we can display words that occur more or less often, depending on what we are trying to learn from the data. If I wanted to see only words mentioned at least 50 times, I could make that adjustment to surface the words used far more than others. This helps sift through the data more efficiently. Customizing the word colors allows the viewer to more easily see how frequently words appear: the colors indicate whether a word is used more or less, and make it easy to spot which words appear at the same frequency as others. Together, these customizations show which words are used most and how their usage relates across the text.]

Part 2: Heatmap

A heatmap visualizes data in a grid where values are represented by colors. We’ll use a dataset of student assignment scores to create a heatmap that shows student progress.

1. Setup and Load/Inspect Data

We’ll use ggplot2 for plotting and reshape2 to transform our dataset of student assignment scores into the correct format for the heatmap.

# We need a few additional libraries this time.
library(ggplot2)

Attaching package: 'ggplot2'
The following object is masked from 'package:NLP':

    annotate
library(reshape2)  

# Load the dataset
data_hm <- read.csv("data/student_assignment_scores.csv")

# Inspect your data
head(data_hm)
  Student_ID Assignment_1 Assignment_2 Assignment_3 Assignment_4 Assignment_5
1  Student_1           98           77           52           98           54
2  Student_2           86           89           70           67           58
3  Student_3           50           52           51           54           65
4  Student_4           74           82           87           95           75
5  Student_5           59           91           59           89           92
6  Student_6           85           73           89           89           91
  Assignment_6 Assignment_7 Assignment_8 Assignment_9 Assignment_10
1           87           54           80           82            79
2           80           84           57           87            54
3           55           63           65           92            71
4           69           63           52           90            69
5           64           77           86           71            99
6           83           82           62           75            90
# Reshape the data from "wide" to "long" format for ggplot
data_long <- melt(data_hm, id.vars = "Student_ID", variable.name = "Assignment", value.name = "Score")
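To confirm the reshape worked, you can peek at the long-format data; each row should now pair one Student_ID with one Assignment and its Score:

# Inspect the reshaped data (each row is one student-assignment pair)
head(data_long)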

2. Generate the Heatmap

# Generate the heatmap to visualize student progress across assignments
ggplot(data_long, aes(x = Assignment, y = Student_ID, fill = Score)) +
  geom_tile() +
  scale_fill_gradient(low = "lightblue", high = "darkblue") +
  labs(title = "Heatmap of Student Progress Across Assignments", x = "Assignment", y = "Student ID") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Task: Try modifying the color gradient by changing the low and high colors (e.g., low = "green", high = "blue") or updating the axis labels. How does this change the interpretation of the data? A sketch of one variation follows the response below. - [Updating the gradient colors can help with the readability of the heatmap. When you choose colors that work together, you get a better visualization of low to high scores based on the color coding. For instance, I made the low color light blue and the high color dark blue. This made the map much more viewable, as there was a good, clear contrast from low to high scores. Changing the axis labels can make the graph more specific and understandable to the viewer, explaining exactly what the assignments are, and so on.]
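Here is a quick sketch of one such variation, using the alternate colors from the task prompt (the relabeled axis text is illustrative only):

# Variation: green-to-blue gradient with relabeled axes
ggplot(data_long, aes(x = Assignment, y = Student_ID, fill = Score)) +
  geom_tile() +
  scale_fill_gradient(low = "green", high = "blue") +
  labs(title = "Heatmap of Student Progress Across Assignments", x = "Weekly Assignment", y = "Student") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))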

Part 3: Social Network Analysis (SNA)

Social network analysis visualizes the connections (interactions) between individuals in a group. We’ll use this to map student interactions.

1. Setup and Load Data

The data for SNA needs to be in a specific format, typically two columns representing a From and To relationship, often with a weight column recording how many times each pair interacted.
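The load step isn’t shown in this rendered copy, so here is a minimal sketch of what it likely looks like. The file name and column names (From, To, weight) are assumptions for illustration; extra columns such as weight become edge attributes, which matches the weight (e/n) attribute in the graph summary below.

# Load igraph for building and plotting network graphs
library(igraph)

# Load the edge list: one row per interaction between two students
# NOTE: file name and columns (From, To, weight) are assumed for illustration
data4 <- read.csv("data/student_interactions.csv")

# Inspect the From/To structure
head(data4)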

2. Prepare the Data

We need to convert our data frame into a special graph object that the igraph package can understand.

# Create a graph from the data
# The 'd' argument takes the data frame, and 'directed = FALSE' means the interactions are not one-way.
g <- graph_from_data_frame(d = data4, directed = FALSE)

# Display a summary of the graph
summary(g)
IGRAPH f720c6e UNW- 30 100 -- 
+ attr: name (v/c), weight (e/n)

3. Visualize the Network

The basic plot() function gives a good overview, but it can be difficult to read.

# Plot the network graph
plot(g, 
     vertex.label = V(g)$name,         # Show student IDs as labels
     vertex.size = 30,                  # Size of the nodes
     vertex.color = "lightblue",        # Color of the nodes
     edge.width = E(g)$weight / 2,      # Width of the edges based on interaction count
     edge.color = "gray",               # Color of the edges
     main = "Social Network of Student Interactions")

4. Customize the Network

We’ll enhance the plot to make it more insightful. For example, we’ll size the nodes based on the number of connections they have, which is called their degree.

# Compute node degree (number of connections)
V(g)$degree <- degree(g)

# Set the color palette
color_palette <- brewer.pal(8, "Set3")
# Plot the network graph with enhanced visuals
plot(g, 
     vertex.label = V(g)$name,            # Show student IDs as labels
     vertex.size = V(g)$degree * 1.5 + 2,    # Size of nodes based on degree
     vertex.color = rep(color_palette, length.out = vcount(g)), # Recycle the 8-color palette across all nodes
     edge.width = E(g)$weight / 2,        # Width of edges based on interaction count
     edge.color = "gray",                  # Color of edges
     vertex.label.cex = 0.4,               # Size of labels
     vertex.label.color = "blue",         # Color of labels
     vertex.label.dist = 0.1,              # Distance of labels from nodes
     layout = layout_with_fr,              # Layout algorithm
     main = "Social Network of Student Interactions"
)

Note: Click the ‘Show in New Window’ icon on the top right of the preview to see the full image.
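Since node size now encodes degree, a quick follow-up is to sort the degree scores to surface the most-connected students; a small sketch:

# List the five students with the highest degree (most connections)
head(sort(degree(g), decreasing = TRUE), 5)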

Part 4. Final Reflection

Application and Impact

Task: How can these advanced visualization techniques enhance your understanding and decision-making processes as an educator, instructional designer, or policymaker? Consider specific scenarios where these tools could provide deeper insights or drive more informed decisions.

  • [As an educator, I loved looking at the visuals we created. It was amazing to take a data set and create a word cloud, heat map, and social interaction map. These maps can have many practical uses in the classroom. Each of these helps give an easily understandable, viewable description of student analytics. They help you determine the most common understandings among students, which topics or assignments they succeeded with or need more help on, and how to group students within the classroom based on personalities and/or behaviors. This, in turn, allows you to make informed adjustments to teaching methods, types of instruction, assignments and assessments, and strategic groupings of students.

One specific scenario in which the heat map could drive informed decisions is when an educator is looking for areas that need re-teaching before moving on to an assessment or a new unit. Looking at the heat map, you can quickly identify assignments on which students are scoring lower. If I saw that scores were lower on assignment 8, I could work to identify what was causing those lower scores and potentially find that the material covered on assignment 8 needed further teaching and clarity for students. An additional scenario could be using the word cloud to identify areas where students need clarity. If students were asked on an online questionnaire to describe what they need the most help with, the teacher could upload those responses and build a word cloud highlighting the most concentrated words students used, pointing to potential areas of confusion across the entire class.]

Challenges and Opportunities

Task: What challenges might you face in implementing these visualizations in real-world educational settings? How can you overcome these challenges to effectively utilize these tools?

  • [One of the biggest challenges that I foresee in an educational setting is time. These visualizations are useful, but it does take time to put data in the proper format, upload the data, and then analyze it. Educators simply do not have the time to build these visualizations from scratch. However, one way to overcome this challenge would be accessible online dashboards that provide useful analytics, which could then be more easily used for analyses such as these. If educators are given the tools to quickly run analyses, then it is manageable. Educators would also need to be taught how to run these types of analyses; if they learn how useful the visualizations are and how to create their own, they would be far more likely to use them.

Another challenge I see, specifically in implementing the social network map, is actually collecting the data to create it. It would be difficult to record interactions in the classroom. However, one way to overcome this could be collecting data on online interactions between students, which can be traced digitally.]

Future Considerations

Task: Reflect on how these techniques could evolve in your field. What future possibilities do you see for the use of advanced visualizations in learning analytics?

  • [As learning analytics advances, I am sure we will see more and more applications for educators. I think the current visualizations will be more widely used as educators learn about them and they become integrated into data analyses for students. Thinking ahead, I believe we will also see visualizations that can predict future performance and show what students need in order to raise that predicted performance. I can see this becoming a streamlined process in which data is uploaded immediately and analyses are run automatically for educators to access as needed.]

Render & Submit

Congratulations, you’ve completed the final module!

To receive a full score, you will need to render this document and publish it via a method such as Quarto Pub, Posit Cloud, RPubs, GitHub Pages, or another method. Once you have shared a link to your published document with me and I have reviewed your work, you will be officially done with the current module.

Complete the following steps to submit your work for review:

  1. First, change the author: field in the YAML header at the very top of this document to your name. The YAML header controls the style and feel of the rendered document but doesn’t actually display in the final output.

  2. Next, click the “Render” button in the toolbar above to render your R Markdown document to an HTML file that will be saved in your R Project folder. You should see a formatted webpage appear in your Viewer tab in the lower right pane or in a new browser window. Let me know if you run into any issues with rendering.

  3. Finally, publish. To publish, follow the steps from the link.

If you have any questions about this module, or run into any technical issues, don’t hesitate to contact me.

Once I have checked your link, you will be notified!