Advanced Techniques & Visualization with Various Datasets

CUED 7540: Learning Analytics IV

Author

Nicole Jayne (nojayne42@tntech.edu)

Published

October 18, 2024

Final Analytics Practice Module

In this final module, we’ll explore word clouds, Social Network Analysis (SNA), and heatmaps, three advanced visualization techniques. These tools are highly customizable; today, we’ll focus on the foundational steps to get you started.

Module Goal

By the end of this module, you’ll be able to create these visualizations and uncover patterns in your data. Remember, all the code is provided, but I encourage you to revise it and try using your own dataset for a deeper understanding.

WordCloud with Text Dataset

We’ll begin by creating a basic word cloud from students’ text data, such as survey responses or discussion board posts.

1. Setup

Step 1: Loading data

We will use a different command, read.delim(), to read a text file. In this example, we’ll load the Handbook of Learning Analytics (2022), which was converted to a .txt file for our analysis. You can find it in the ‘data’ folder.

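# Load the packages used in this section
library(tm)            # Corpus(), tm_map(), TermDocumentMatrix()
library(wordcloud)     # wordcloud()
library(RColorBrewer)  # brewer.pal()

# Load the data
sh <- read.delim("data/Handbook of LA.txt")

# Create a text corpus
doc <- Corpus(VectorSource(sh))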

head(doc)

<<SimpleCorpus>>
Metadata:  corpus specific: 1, document level (indexed): 0
Content:  documents: 1
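
A side note on the loading step: read.delim() is designed for tab-delimited tables, so it treats the first line of the file as a column header rather than as text. If that matters for your own data, readLines() is a base-R alternative that keeps every line. The sketch below compares the two on a small temporary file; the file contents are made up for illustration.

```r
# Write a small made-up text file to a temporary location
tmp <- tempfile(fileext = ".txt")
writeLines(c("First line of text",
             "Second line of text",
             "Third line of text"), tmp)

# read.delim() uses the first line as the column header
as_table <- read.delim(tmp)
nrow(as_table)     # number of rows left after the header is taken out

# readLines() returns one character string per line
as_lines <- readLines(tmp)
length(as_lines)   # every line is kept

# Clean up the temporary file
unlink(tmp)
```

Either result can be passed to VectorSource(). With readLines(), each line becomes its own document unless you first combine the lines, for example with paste(as_lines, collapse = " ").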

Step 2: Text Preprocessing

Clean the text data to improve the quality of the word cloud.

# Define a function to replace specific patterns with a space
toSpace <- content_transformer(function (x, pattern) gsub(pattern, " ", x))

# Apply text transformations
doc <- tm_map(doc, toSpace, "/")

Warning in tm_map.SimpleCorpus(doc, toSpace, "/"): transformation drops
documents

doc <- tm_map(doc, toSpace, "@")

Warning in tm_map.SimpleCorpus(doc, toSpace, "@"): transformation drops
documents

doc <- tm_map(doc, toSpace, "\\|")

Warning in tm_map.SimpleCorpus(doc, toSpace, "\\|"): transformation drops
documents

doc <- tm_map(doc, content_transformer(tolower))   # Convert to lowercase

Warning in tm_map.SimpleCorpus(doc, content_transformer(tolower)):
transformation drops documents

doc <- tm_map(doc, removeWords, c(stopwords("english"), "https","can","doi")) # Remove common stopwords

Warning in tm_map.SimpleCorpus(doc, removeWords, c(stopwords("english"), :
transformation drops documents

doc <- tm_map(doc, removeNumbers)                    # Remove numbers

Warning in tm_map.SimpleCorpus(doc, removeNumbers): transformation drops
documents

doc <- tm_map(doc, removePunctuation)          # Remove punctuation

Warning in tm_map.SimpleCorpus(doc, removePunctuation): transformation drops
documents

doc <- tm_map(doc, stripWhitespace)           # Remove extra whitespace

Warning in tm_map.SimpleCorpus(doc, stripWhitespace): transformation drops
documents

# You can add more words to remove by updating the stopwords list.

When you run the code section above, you might get a lengthy list of warning messages. As long as they are warnings and not error messages, we can continue.
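
To see what the cleaning steps do, here is a minimal base-R sketch that applies roughly the same sequence to a single made-up sentence. clean_text() and its three-word stopword list are hypothetical helpers for illustration only; they are not part of tm. The punctuation step uses the Unicode class \p{P}, which also matches the en dash (–). That is one possible way to handle dash characters that survive removePunctuation, as happens in the frequency table in Step 3.

```r
# Hypothetical helper: a base-R version of the cleaning steps, for illustration
clean_text <- function(x, stop_words = c("the", "and", "of")) {
  x <- gsub("[/@|]", " ", x)                    # replace /, @ and | with spaces
  x <- tolower(x)                               # convert to lowercase
  pattern <- paste0("\\b(", paste(stop_words, collapse = "|"), ")\\b")
  x <- gsub(pattern, " ", x, perl = TRUE)       # drop the (toy) stopwords
  x <- gsub("[0-9]+", " ", x)                   # remove numbers
  x <- gsub("\\p{P}+", " ", x, perl = TRUE)     # remove punctuation, incl. en dashes
  x <- gsub("[[:space:]]+", " ", x)             # collapse repeated whitespace
  trimws(x)                                     # trim leading/trailing spaces
}

# Made-up sentence containing the kinds of characters the module removes
sample_text <- "The Handbook of Learning Analytics (2022) \u2013 data/analysis @ scale"
cleaned <- clean_text(sample_text)
print(cleaned)
```

Running the steps on one short string like this makes it easier to check each transformation before applying it to the whole corpus.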

Step 3: Creating the Wordcloud

Now, let’s create the Wordcloud.

# Create a term-document matrix
dtm <- TermDocumentMatrix(doc)

# Convert the matrix to a data frame of word frequencies
m <- as.matrix(dtm)
v <- sort(rowSums(m), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v)

# Display the top words
head(d, 10)

                   word freq
learning       learning 3037
analytics     analytics 1443
data               data 1112
–                     – 1084
url                 url  638
education     education  466
educational educational  441
research       research  417
knowledge     knowledge  360
analysis       analysis  358
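
The frequency table above is essentially a sorted word count. As a rough base-R sketch of the same idea, the code below counts words in a made-up string and builds a data frame with the same word and freq columns; toy_text and d_toy are illustrative names, not objects from the module.

```r
# Toy text; the real module uses the cleaned corpus instead
toy_text <- "learning analytics data learning data learning"

# Split into words and count how often each one appears
words  <- strsplit(toy_text, " ", fixed = TRUE)[[1]]
counts <- table(words)

# Sort from most to least frequent and store as a data frame
v_toy <- sort(counts, decreasing = TRUE)
d_toy <- data.frame(word = names(v_toy),
                    freq = as.integer(v_toy),
                    stringsAsFactors = FALSE)
print(d_toy)
```

A data frame in this shape can be passed to wordcloud() in the same way as d, via words = d_toy$word and freq = d_toy$freq.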

# Create the word cloud
set.seed(1234)
wordcloud(words = d$word, freq = d$freq, min.freq = 1,
          max.words = 100, random.order = FALSE, rot.per = 0.35, 
          colors = brewer.pal(8, "Dark2"))

Question: How might customizing the word cloud (e.g., colors, word frequency) enhance the insights you gain from text data?

[Making the word cloud a pleasant, easy-to-read color helps the viewer look at it with ease and comfort. To enhance the insight gained from the text, we could adjust the word-frequency settings so that the words most significant to our analytical approach are emphasized, making them larger and changing their color so that they stand out.]

Heatmap

Step 1: Setup and Load/Inspect Data

We’ll use a dataset of student assignment scores to create a heatmap.

#We need various libraries this time.
library(ggplot2)


Attaching package: 'ggplot2'

The following object is masked from 'package:NLP':

    annotate

library(reshape2)  

#import/load the dataset
data_hm <- read.csv("data/student_assignment_scores.csv")

#inspect your data
head(data_hm)

  Student_ID Assignment_1 Assignment_2 Assignment_3 Assignment_4 Assignment_5
1  Student_1           98           77           52           98           54
2  Student_2           86           89           70           67           58
3  Student_3           50           52           51           54           65
4  Student_4           74           82           87           95           75
5  Student_5           59           91           59           89           92
6  Student_6           85           73           89           89           91
  Assignment_6 Assignment_7 Assignment_8 Assignment_9 Assignment_10
1           87           54           80           82            79
2           80           84           57           87            54
3           55           63           65           92            71
4           69           63           52           90            69
5           64           77           86           71            99
6           83           82           62           75            90

# Reshape the data for the heatmap
data_long <- melt(data_hm, id.vars = "Student_ID", variable.name = "Assignment", value.name = "Score")
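
If the reshaping step feels opaque, the following base-R sketch shows what going from wide to long means on a tiny made-up table with two students and two assignments. It is only meant to illustrate the resulting shape; it is not how melt() is implemented.

```r
# Tiny made-up wide table: one row per student, one column per assignment
wide <- data.frame(Student_ID   = c("Student_1", "Student_2"),
                   Assignment_1 = c(98, 86),
                   Assignment_2 = c(77, 89),
                   stringsAsFactors = FALSE)

# Every column except the ID holds scores
score_cols <- setdiff(names(wide), "Student_ID")

# Long format: one row per student-assignment pair
long <- data.frame(
  Student_ID = rep(wide$Student_ID, times = length(score_cols)),
  Assignment = rep(score_cols, each = nrow(wide)),
  Score      = unlist(wide[score_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)
print(long)
```

The long table has one Score per row, which is the layout ggplot2 needs to map Assignment to the x-axis, Student_ID to the y-axis, and Score to the fill color.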

Step 2: Generate the heatmap

# Generate the heatmap to visualize student progress across assignments
ggplot(data_long, aes(x = Assignment, y = Student_ID, fill = Score)) +
  geom_tile() +
  scale_fill_gradient(low = "black", high = "blue") +
  labs(title = "Heatmap of Student Progress Across Assignments", x = "Assignment", y = "Student ID") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
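
One detail to watch on the y-axis: ggplot2 sorts a character variable alphabetically, so an ID such as Student_10 is placed before Student_2. A possible fix, sketched below with made-up IDs, is to turn the IDs into a factor whose levels follow the numeric part. The "Student_" prefix is taken from the data preview above; the rest is an illustrative assumption.

```r
# Made-up IDs in the order an alphabetical sort would give
ids <- c("Student_1", "Student_10", "Student_2")

# Extract the numeric suffix and order the IDs by it
id_num      <- as.integer(sub("Student_", "", ids, fixed = TRUE))
ordered_ids <- factor(ids, levels = ids[order(id_num)])

levels(ordered_ids)   # levels now follow the numeric order
```

Applied to the module's data, the same idea would be to build the levels from unique(data_long$Student_ID) and assign the resulting factor back to data_long$Student_ID before plotting.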

Question: Try modifying the color gradient or axis labels. How does it change the interpretation of the data?

[Modifying the color gradient and axis labels can change the dynamic of the heatmap. Making the colors lighter or darker can either enhance the graph or make it hard to read. We need to make sure that the colors of the heatmap and the labels are easily discernible so that we can successfully interpret the information.]

Social Network Analysis with Interaction Dataset

Step 1: Setup and Load data

Step 2: Prepare the Data

Convert the data into a format suitable for network analysis.

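# Load igraph for network analysis
library(igraph)

# Create an undirected graph from the interaction data (data4) loaded in Step 1
g <- graph_from_data_frame(d = data4, directed = FALSE)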

# Display a summary of the graph
summary(g)

IGRAPH 83e55ac UNW- 30 100 -- 
+ attr: name (v/c), weight (e/n)
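
In the summary above, the code UNW- means the graph is undirected (U), has named vertices (N), and has weighted edges (W); the two numbers are the counts of vertices (30) and edges (100). To make the underlying data format concrete, here is a small base-R sketch with a made-up edge list. The column names from and to are assumptions, since the loading step is not shown; only the weight attribute is confirmed by the summary.

```r
# Made-up edge list: each row is one pair of students and their interaction count
edges <- data.frame(from   = c("S1", "S1", "S2", "S3"),
                    to     = c("S2", "S3", "S3", "S4"),
                    weight = c(3, 1, 2, 5),
                    stringsAsFactors = FALSE)

# Vertices are the unique student IDs across both columns
nodes <- sort(unique(c(edges$from, edges$to)))

# In an undirected graph, degree is the number of edges touching each vertex
deg <- table(factor(c(edges$from, edges$to), levels = nodes))
print(deg)
```

For a graph without self-loops, this count corresponds to what degree(g) returns in igraph, which Step 4 uses to size the nodes.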

Step 3: Visualize the network

# Plot the network graph
plot(g, 
     vertex.label = V(g)$name,         # Show student IDs as labels
     vertex.size = 30,                  # Size of the nodes
     vertex.color = "lightblue",        # Color of the nodes
     edge.width = E(g)$weight / 2,      # Width of the edges based on interaction count
     edge.color = "gray",               # Color of the edges
     main = "Social Network of Student Interactions")

The interaction patterns are hard to see, so we will customize the network to make it easier to read.

Step 4: Customize the network

# Compute node degree (number of connections)
V(g)$degree <- degree(g)

# Set the color palette
color_palette <- brewer.pal(8, "Set3")
# Plot the network graph with enhanced visuals
plot(g, 
     vertex.label = V(g)$name,            # Show student IDs as labels
     vertex.size = V(g)$degree * 1.5 + 2,    # Size of nodes based on degree
     vertex.color = rep(color_palette, length.out = vcount(g)), # Recycle the 8 palette colors across all nodes
     edge.width = E(g)$weight / 2,        # Width of edges based on interaction count
     edge.color = "gray",                  # Color of edges
     vertex.label.cex = 0.4,               # Size of labels
     vertex.label.color = "blue",         # Color of labels
     vertex.label.dist = 0.1,              # Distance of labels from nodes
     layout = layout_with_fr,              # Layout algorithm
     main = "Social Network of Student Interactions"
)
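
One thing to watch when coloring nodes from a fixed palette: an 8-color palette indexed past position 8 returns NA, so a 30-node graph would leave most nodes without a color. The base-R sketch below shows the difference between indexing and recycling; rainbow(8) is only a stand-in for a palette such as brewer.pal(8, "Set3").

```r
# Stand-in for an 8-color palette
palette8 <- rainbow(8)
n_nodes  <- 30

# Indexing past the end of the palette gives NA for the extra positions
by_index <- palette8[1:n_nodes]
sum(is.na(by_index))    # positions 9 to 30 have no color

# Recycling repeats the palette until every node has a color
recycled <- rep(palette8, length.out = n_nodes)
anyNA(recycled)         # no missing colors
```

Recycling means several nodes share a color, so the colors here are decorative rather than meaningful. Mapping color to an attribute such as degree would be a natural next customization.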

*Click the ‘Show in New Window’ icon on the top right of the preview.

Reflection Activity: How can we use LA in instructional design and decision-making?

Application and Impact: How can these advanced visualization techniques enhance your understanding and decision-making processes as an educator, instructional designer, or policymaker? Consider specific scenarios where these tools could provide deeper insights or drive more informed decisions.

[Using advanced visualization techniques can enhance understanding and decision-making processes as an educator by providing accurate, precise, and descriptive visuals on which teachers can base their instructional decisions to advance students’ learning and progress. A specific scenario where this would apply is progress monitoring and benchmark standards. Using advanced visualization techniques will provide an accurate picture of student mastery of the content and grade-level placement. This will provide teachers with valuable information on the next instructional steps they need to take, whether that involves reteaching, introducing a new topic, or challenging learners.]

Challenges and Opportunities: What challenges might you face in implementing these visualizations in real-world educational settings? How can you overcome these challenges to effectively utilize these tools?

[One challenge I may face in implementing these visualizations is that they can be time-consuming, and teachers do not always have time to create visuals in this sort of in-depth manner. Another challenge may be deciding which assignments or assessments to include in a graph to examine student growth and progress. For example, should we include every assignment or just major assignments? This can be a challenge because students complete so many assignments in a day. Is it necessary to track every detail? A teacher has time to check a worksheet or assignment, but once again, it can become very time-consuming to track and enter every assignment in a dataset. This is where we would have to decide what is necessary to track and what is not. I would treat large assessments like quizzes or tests as the priority to track; adding an occasional worksheet that shows mastery of a unit’s topics would then improve a teacher’s understanding of how well their students are doing.]

Future Considerations: Reflect on how these techniques could evolve in your field. What future possibilities do you see for the use of advanced visualizations in learning analytics?

[A future possibility for advanced visualizations in learning analytics would be an app or website where you can easily input and track student data throughout a child’s school career. Making this information easily accessible would help an educator visualize multiple dynamics in a child’s learning career. Incorporating factors like student habits and attendance rate to track the reasoning behind student work and participation would help an educator better understand a student’s habits and dedication to learning. The interface should also be user-friendly and fast for entering variables and values, something that would not take a teacher long to do. As I think about the advancement of technology in education, we could also consider an application that records student data from an assessment that is distributed electronically. A student can complete an assignment or assessment online, and the application will automatically grade it and place students in a domain according to their score or performance level. If there is a required question that a teacher has to grade, the teacher could grade it based on keywords or criteria, which are then entered into the system to place the student and populate a graphical (visual) representation of their assessment outcomes. I really think visuals in education are key to understanding student achievement and growth. Visual representation of student progress also makes it easy to compare students across a grade level and track their accomplishments throughout the school year.]

Render & Submit

Congratulations, you’ve completed the final module!

To receive a full score, you will need to render this document and publish it via a method such as Quarto Pub, Posit Cloud, RPubs, GitHub Pages, or another method. Once you have shared a link to your published document with me and I have reviewed your work, you will be officially done with the current module.

Complete the following steps to submit your work for review:

First, change the name after author: in the YAML header at the very top of this document to your name. The YAML header controls the look and feel of the rendered document but doesn’t actually display in the final output.
Next, click the “Render” button in the toolbar above to “render” your R Markdown document to an HTML file that will be saved in your R Project folder. You should see a formatted webpage appear in your Viewer tab in the lower right pane or in a new browser window. Let me know if you run into any issues with rendering.
Finally, publish. To publish, follow the steps from the link.

If you have any questions about this module, or run into any technical issues, don’t hesitate to contact me.

Once I have checked your link, you will be notified!