Advanced Techniques & Visualization with Various Datasets

CUED 7540: Learning Analytics IV

Author

Abby Shoulders, afshoulder42@tntech.edu

Published

October 10, 2025

Final Analytics Practice Module

In this final module, we’ll explore word clouds, social network analysis (SNA), and heatmaps, three advanced visualization techniques. These tools are highly customizable, and today, we’ll focus on the foundational steps to get you started.

Module Goal

By the end of this module, you’ll be able to create these visualizations and uncover patterns in your data. Remember, all the code is provided, but I encourage you to modify it and try using your own data for a deeper understanding.

Part 1: Word Cloud with Text Data

We’ll begin by creating a word cloud from student text data, such as survey responses or discussion board posts. A word cloud visually represents the frequency of words in a text, with more frequent words appearing larger.

1. Setup

library(tm)        # For text mining
Loading required package: NLP
library(wordcloud) # For word cloud visualization
Loading required package: RColorBrewer
library(RColorBrewer) # For color palettes

2. Loading Data

The data for a word cloud is typically a plain text file. We’ll use a text file from the Handbook of Learning Analytics (2022) for this example; the book was converted to a .txt file for our analysis. Check our data folder.

# Load the data
la <- read.delim("data/Handbook of LA.txt", header = FALSE, stringsAsFactors = FALSE)

# Create a text corpus, which is a collection of text documents
doc <- Corpus(VectorSource(la))

head(doc)
<<SimpleCorpus>>
Metadata:  corpus specific: 1, document level (indexed): 0
Content:  documents: 1

Note: The read.delim() function reads a plain text file into a data frame. The header = FALSE argument treats the first line as data rather than column names, and stringsAsFactors = FALSE keeps the text as character strings instead of factors.
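
If you prefer to work with a plain character vector instead of a data frame, readLines() is an equivalent option for the steps that follow; a minimal sketch, assuming the same file path:

# Alternative: read the file as a character vector, one element per line
la_lines <- readLines("data/Handbook of LA.txt")
doc <- Corpus(VectorSource(la_lines))   # Each line becomes its own document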

3. Text Preprocessing

Before creating the word cloud, we need to “clean” the text by removing common words, punctuation, and numbers. This ensures our word cloud is meaningful.

# Define a function to replace specific patterns with a space
toSpace <- content_transformer(function (x, pattern) gsub(pattern, " ", x))

# Apply text transformations
doc <- tm_map(doc, toSpace, "/")
Warning in tm_map.SimpleCorpus(doc, toSpace, "/"): transformation drops
documents
doc <- tm_map(doc, toSpace, "@")
Warning in tm_map.SimpleCorpus(doc, toSpace, "@"): transformation drops
documents
doc <- tm_map(doc, toSpace, "\\|")
Warning in tm_map.SimpleCorpus(doc, toSpace, "\\|"): transformation drops
documents
doc <- tm_map(doc, content_transformer(tolower))   # Convert to lowercase
Warning in tm_map.SimpleCorpus(doc, content_transformer(tolower)):
transformation drops documents
doc <- tm_map(doc, removeWords, c(stopwords("english"), "https","can","doi")) # Remove common stopwords
Warning in tm_map.SimpleCorpus(doc, removeWords, c(stopwords("english"), :
transformation drops documents
doc <- tm_map(doc, removeNumbers)                    # Remove numbers
Warning in tm_map.SimpleCorpus(doc, removeNumbers): transformation drops
documents
doc <- tm_map(doc, removePunctuation)          # Remove punctuation
Warning in tm_map.SimpleCorpus(doc, removePunctuation): transformation drops
documents
doc <- tm_map(doc, stripWhitespace)           # Remove extra whitespace
Warning in tm_map.SimpleCorpus(doc, stripWhitespace): transformation drops
documents
# You can add more words to remove by updating the stopwords list.

Note: You might get a lengthy list of warning messages when you run the code above. As long as it’s not an error message, we can continue.
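
As the comment in the chunk above suggests, you can extend the stopword list with terms specific to your text. A minimal sketch (the words here are placeholders, so substitute terms from your own data):

# Remove additional project-specific words (placeholder terms)
doc <- tm_map(doc, removeWords, c("figure", "table", "chapter"))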

4. Creating the Word Cloud

Now, we’ll create the word cloud by first counting the frequency of each word.

# Create a term-document matrix to count word frequencies
dtm <- TermDocumentMatrix(doc)

# Convert the matrix to a data frame of word frequencies
m <- as.matrix(dtm)
v <- sort(rowSums(m), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v)

# Display the top 10 words
head(d, 10)
                   word freq
learning       learning 3037
analytics     analytics 1443
data               data 1112
–                     – 1084
url                 url  638
education     education  466
educational educational  441
research       research  417
knowledge     knowledge  360
analysis       analysis  358
# Create the word cloud
set.seed(1234)
wordcloud(words = d$word, freq = d$freq, min.freq = 2,
          max.words = 150, random.order = FALSE, rot.per = 0.35, 
          colors = brewer.pal(8, "Dark2"))
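
Notice in the frequency table above that an en dash (–) still ranks among the top “words.” By default, removePunctuation() strips only ASCII punctuation, so Unicode dashes survive. One way to handle this, reusing the toSpace helper we defined earlier, is:

# Replace the en dash, then rebuild the term-document matrix before re-plotting
# (alternatively, call removePunctuation with ucp = TRUE)
doc <- tm_map(doc, toSpace, "–")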

Task: Try modifying the max.words or min.freq parameters to see how the word cloud changes.
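
For instance, here is a sketch of a sparser cloud that keeps only frequent terms (the parameter values are arbitrary, so experiment with your own):

# A sparser cloud: fewer words, higher frequency threshold
wordcloud(words = d$word, freq = d$freq, min.freq = 10,
          max.words = 50, random.order = FALSE, rot.per = 0.35,
          colors = brewer.pal(8, "Dark2"))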

Question: How might customizing the word cloud (e.g., colors, word frequency) enhance the insights you gain from text data?

  • [By customizing the word cloud, I gain a clearer and more detailed understanding of the text data and its key themes, and it reveals patterns that might otherwise be missed. This makes the visualization more informative and engaging.]

Part 2: Heatmap

A heatmap visualizes data in a grid where values are represented by colors. We’ll use a dataset of student assignment scores to create a heatmap that shows student progress.

1. Setup and Load/Inspect Data

We’ll use ggplot2 for plotting and reshape2 to transform the student assignment scores into the “long” format that ggplot needs for a heatmap.

#We need various libraries this time.
library(ggplot2)

Attaching package: 'ggplot2'
The following object is masked from 'package:NLP':

    annotate
library(reshape2)  

# Load the dataset
data_hm <- read.csv("data/student_assignment_scores.csv")

# Inspect your data
head(data_hm)
  Student_ID Assignment_1 Assignment_2 Assignment_3 Assignment_4 Assignment_5
1  Student_1           98           77           52           98           54
2  Student_2           86           89           70           67           58
3  Student_3           50           52           51           54           65
4  Student_4           74           82           87           95           75
5  Student_5           59           91           59           89           92
6  Student_6           85           73           89           89           91
  Assignment_6 Assignment_7 Assignment_8 Assignment_9 Assignment_10
1           87           54           80           82            79
2           80           84           57           87            54
3           55           63           65           92            71
4           69           63           52           90            69
5           64           77           86           71            99
6           83           82           62           75            90
# Reshape the data from "wide" to "long" format for ggplot
data_long <- melt(data_hm, id.vars = "Student_ID", variable.name = "Assignment", value.name = "Score")
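
As a side note, reshape2 is no longer actively developed; if you already use the tidyverse, tidyr::pivot_longer() produces the same long format. A minimal sketch, assuming the Assignment_* column naming shown above:

# Equivalent reshape with tidyr (optional alternative to reshape2)
library(tidyr)
data_long <- pivot_longer(data_hm, cols = starts_with("Assignment"),
                          names_to = "Assignment", values_to = "Score")
# pivot_longer returns character names, which would sort alphabetically on the
# axis, so restore the original column order
data_long$Assignment <- factor(data_long$Assignment,
                               levels = setdiff(names(data_hm), "Student_ID"))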

2. Generate the Heatmap

# Generate the heatmap to visualize student progress across assignments
ggplot(data_long, aes(x = Assignment, y = Student_ID, fill = Score)) +
  geom_tile() +
  scale_fill_gradient(low = "green", high = "blue") +
  labs(title = "Heatmap of Student Progress Across Assignments", x = "Assignment", y = "Student ID") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Task: Try modifying the color gradient by changing the low and high colors (currently low = "green", high = "blue") or updating the axis labels. How does this change the interpretation of the data?

  • [Modifying the color gradient and axis labels of a data visualization can notably alter how the data is viewed and interpreted. Even without changing the data itself, different choices in color and labeling can change the visual picture, highlight different patterns, or even misrepresent the information.]
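
One variation worth trying is a perceptually uniform palette; scale_fill_viridis_c() ships with ggplot2, so no extra package is needed:

# Same heatmap with the viridis continuous palette
ggplot(data_long, aes(x = Assignment, y = Student_ID, fill = Score)) +
  geom_tile() +
  scale_fill_viridis_c() +
  labs(title = "Heatmap of Student Progress Across Assignments", x = "Assignment", y = "Student ID") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))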

Part 3: Social Network Analysis (SNA)

Social network analysis visualizes the connections (interactions) between individuals in a group. We’ll use this to map student interactions.

1. Setup and Load Data

The data for SNA needs to be in a specific format, typically two columns representing a From and To relationship.
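
The chunk that loads igraph and the interaction data is not shown above, so here is a minimal sketch; the file name student_interactions.csv is a placeholder for whichever edge-list file is in your data folder, with From and To columns plus a weight column (the graph summarized below is weighted):

library(igraph)  # For building and plotting network graphs

# Load the edge list (placeholder file name -- substitute the file in your data folder)
data4 <- read.csv("data/student_interactions.csv")

# Inspect the first few interactions
head(data4)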

2. Prepare the Data

We need to convert our data frame into a special graph object that the igraph package can understand.

# Create a graph from the data
# The 'd' argument takes the data frame, and 'directed = FALSE' means the interactions are not one-way.
g <- graph_from_data_frame(d = data4, directed = FALSE)

# Display a summary of the graph
summary(g)
IGRAPH 97d39c8 UNW- 30 100 -- 
+ attr: name (v/c), weight (e/n)
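
In the summary line, UNW- means the graph is undirected (U), named (N), and weighted (W), and the two numbers are the counts of vertices (30 students) and edges (100 interactions).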

3. Visualize the Network

The basic plot() function gives a good overview, but it can be difficult to read.

# Plot the network graph
plot(g, 
     vertex.label = V(g)$name,         # Show student IDs as labels
     vertex.size = 30,                  # Size of the nodes
     vertex.color = "lightblue",        # Color of the nodes
     edge.width = E(g)$weight / 2,      # Width of the edges based on interaction count
     edge.color = "gray",               # Color of the edges
     main = "Social Network of Student Interactions")

4. Customize the Network

We’ll enhance the plot to make it more insightful. For example, we’ll size the nodes based on the number of connections they have, which is called their degree.

# Compute node degree (number of connections)
V(g)$degree <- degree(g)

# Set the color palette
color_palette <- brewer.pal(8, "Set3")
# Plot the network graph with enhanced visuals
plot(g, 
     vertex.label = V(g)$name,            # Show student IDs as labels
     vertex.size = V(g)$degree * 1.5 + 2,    # Size of nodes based on degree
     vertex.color = rep(color_palette, length.out = vcount(g)), # Recycle the 8-color palette across all nodes
     edge.width = E(g)$weight / 2,        # Width of edges based on interaction count
     edge.color = "gray",                  # Color of edges
     vertex.label.cex = 0.4,               # Size of labels
     vertex.label.color = "blue",         # Color of labels
     vertex.label.dist = 0.1,              # Distance of labels from nodes
     layout = layout_with_fr,              # Layout algorithm
     main = "Social Network of Student Interactions"
)
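
Since node size now encodes degree, it can also help to list the best-connected students numerically; a quick sketch using the degree values computed above:

# Top five students by number of connections
head(sort(degree(g), decreasing = TRUE), 5)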

Note: Click the ‘Show in New Window’ icon on the top right of the preview to see the full image.

Part 4. Final Reflection

Application and Impact

Task: How can these advanced visualization techniques enhance your understanding and decision-making processes as an educator, instructional designer, or policymaker? Consider specific scenarios where these tools could provide deeper insights or drive more informed decisions.

  • [These advanced visualization techniques could enhance my understanding and decision-making as a physical education teacher because I could track and understand student performance, design targeted interventions, and make more informed instructional decisions. These tools enable faster identification of trends and patterns, improved decision-making, simplified views of complex data, and highlighted key insights. A PE teacher could use these visuals by scoring students on the PACER test. The teacher could easily and quickly notice whether students’ scores are increasing or decreasing, have all of the students’ data in one place, and possibly make decisions on how to improve the test.]

Challenges and Opportunities

Task: What challenges might you face in implementing these visualizations in real-world educational settings? How can you overcome these challenges to effectively utilize these tools?

  • [Challenges in implementing visualizations in education include technical issues such as slow performance, unfamiliarity with the software, data handling, making sure students know which visuals to use, and designing effective graphs. These challenges could be overcome by providing teacher training, designing clear, user-friendly visualizations, and fostering a supportive learning environment.]

Future Considerations

Task: Reflect on how these techniques could evolve in your field. What future possibilities do you see for the use of advanced visualizations in learning analytics?

  • [Advanced visualization techniques could transform physical education by creating data-rich, interactive learning experiences that lead to personalized instruction, enhanced safety, and greater student engagement. PE teachers can use visual dashboards and engaging technologies to provide deeper insights into performance, health, and skill development.]

Render & Submit

Congratulations, you’ve completed the final module!

To receive full credit, you will need to render this document and publish it via a method such as Quarto Pub, Posit Cloud, RPubs, GitHub Pages, or another service. Once you have shared a link to your published document with me and I have reviewed your work, you will be officially done with the current module.

Complete the following steps to submit your work for review:

  1. First, change the author: field in the YAML header at the very top of this document to your name. The YAML header controls the style and feel of the rendered document but doesn’t actually display in the final output.

  2. Next, click the “Render” button in the toolbar above to render your R Markdown document to an HTML file that will be saved in your R Project folder. You should see a formatted webpage appear in your Viewer tab in the lower right pane or in a new browser window. Let me know if you run into any issues with rendering.

  3. Finally, publish. To publish, follow the steps from the link.

If you have any questions about this module, or run into any technical issues, don’t hesitate to contact me.

Once I have checked your link, you will be notified!