Visualization Practices

CUED 7540: Learning Analytics III

Author

Caroline Wendt, cghuntley42@tntech.edu

Published

September 30, 2025

Learning Objectives

By the end of this lesson, you will be able to:

- Generate basic analytics plots using ggplot2.

- Customize plots for better data visualization.

- Interpret different types of visualizations to gain insights.

- Understand how to apply visualization to different data analysis tasks.

Part 1: Loading and Exploring Data

Before we start creating plots, we need to load and inspect the datasets we’ll be working with. We’ll use a new dataset called educational_data.csv.

Task 1: Load the educational_data.csv dataset and inspect its structure. This will help you understand what variables you’re working with.
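One way Task 1 might look is sketched below. The path data/educational_data.csv is an assumption (it mirrors the data folder used later in this lesson), and the fallback data frame holds made-up rows so that the sketch still runs when the file is not present. The name data3 matches the object used in the plotting code that follows.

```r
library(readr)

# Assumed location: the lesson later reads from a "data" folder, so the same folder is guessed here.
path <- "data/educational_data.csv"

if (file.exists(path)) {
  data3 <- read_csv(path, show_col_types = FALSE)
} else {
  # Hypothetical stand-in rows (NOT the course data) so the sketch runs without the file.
  data3 <- data.frame(
    Study_Hours = c(2, 5, 8),
    Quiz_Score = c(70, 85, 78),
    Gender = c("F", "M", "F")
  )
}

str(data3)      # column names and types
head(data3)     # first few rows
summary(data3)  # ranges, means, and missing-value counts per column
```

Running str() and summary() before plotting is a quick way to confirm which columns are numeric and which are categorical.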

Reflect & Respond

Question 1: What Catches Your Eye? As you browse through the dataset, what stands out to you? Is there anything that piques your curiosity? Maybe a surprising trend or a pattern you didn’t expect?

[One trend that I seemed to notice was that as the study hours went up, the final grades went up as well. Another thing that was surprising was that the quiz scores seemed to be higher than the final exam scores. Also, the student who performed the best on the quiz had a lower score than most of the other students in the first part of the data.]

Question 2: What Questions Do You Have? Is there something specific you’d like to dig deeper into? Think about what you might want to learn more about. Are there any relationships between variables you’re curious about?

[I’m curious to explore whether there is a relationship between homework completion and the final grade. It seems that there might be some positive relationship between the two, but I’d like to analyze it further.]

Question 3: What’s Your Analytics Game Plan? How would you approach analyzing this dataset? What steps would you take to uncover the insights you’re interested in?

[When approaching this data set, I would want to analyze most of the variables in conjunction with the final grade. This is because I’d be interested to see what might have the most significant relationship to the final performance of students. To uncover these insights, I would create scatter plots with trend lines between each variable and final grades separately. I would then compute the correlation between each variable and final grades to see which variables might predict or have the strongest relationship with final grades. This would indicate areas of importance to boost the final grades of students.]

Part 2: Visualizing Relationships with Scatter Plots

Scatter plots are useful for visualizing the relationship between two continuous variables. The gg in ggplot stands for “Grammar of Graphics,” which means we build plots in layers.
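To make the idea of layers concrete, here is a minimal sketch that builds a plot one step at a time. The toy data frame and its column names (hours, score) are made up for illustration and are not part of the course dataset.

```r
library(ggplot2)

# Made-up toy data, used only to show how layers stack.
toy <- data.frame(hours = c(1, 2, 3, 4, 5), score = c(60, 65, 72, 70, 80))

p <- ggplot(toy, aes(x = hours, y = score))                        # data and axis mapping
p <- p + geom_point()                                              # layer 1: the points
p <- p + geom_smooth(method = "lm", formula = y ~ x, se = FALSE)   # layer 2: a linear trend line
p <- p + labs(title = "Toy example", x = "Hours", y = "Score")     # labels are settings, not a layer

length(p$layers)  # number of geom layers added so far
p                 # printing the object draws the plot
```

Each + adds one piece to the same plot object, which is why the longer code chunks in this lesson can be read top to bottom as a stack of layers and settings.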

Scatter Plot

Task 2: Create a scatter plot to explore the relationship between Study_Hours and Quiz_Score. This plot will help you visualize if there’s a correlation.

# Create a scatter plot of Study_Hours vs. Quiz_Score with a regression line

ggplot(data3, aes(x = Study_Hours, y = Quiz_Score)) + # TYPE YOUR CODE. TWO VARIABLES HERE
  geom_point(color = "blue", size = 3, alpha = 0.6) +
  geom_smooth(method = "lm", color = "red", se = TRUE) +  # This line will add a linear regression line
  labs(title = "Scatter Plot of Study Hours vs. Quiz Score", #UPDATE YOUR PLOT TITLES
       x = "Study Hours (Hours)",
       y = "Quiz Score") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
    axis.title.x = element_text(size = 12, face = "bold"),
    axis.title.y = element_text(size = 12, face = "bold")
  )

`geom_smooth()` using formula = 'y ~ x'

Task 3: Now that we’ve visualized the relationship, let’s compute the correlation between the two variables to get a numerical value for their relationship. The use = "complete.obs" argument handles any missing values by only using the rows that have data for both variables.

# Compute the correlation
# COMPLETE YOUR CODE BELOW
correlation <- cor(data3$Study_Hours, data3$Quiz_Score, use = "complete.obs")

# Display the correlation
correlation

[1] -0.04308071
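The effect of use = "complete.obs" can be seen in a small sketch with made-up vectors (they are not taken from the course data). Without the argument, a single missing value makes the result NA; with it, only the rows where both values are present are used.

```r
# Made-up vectors, each with one missing value.
hours  <- c(1, 2, 3, NA, 5)
scores <- c(2, 4, 6, 8, NA)

r_default  <- cor(hours, scores)                        # NA, because of the missing values
r_complete <- cor(hours, scores, use = "complete.obs")  # uses only rows 1 to 3

n_used <- sum(complete.cases(hours, scores))  # how many rows had both values

r_default
r_complete
n_used
```

Checking how many rows were actually used is worthwhile, since a correlation based on only a few complete rows can be misleading.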

Reflect & Respond

Question: What does the correlation value tell you about the relationship between study hours and quiz scores?

[The correlation is negative and very weak. A negative sign means that as one variable goes up, the other tends to go down. However, at about -0.043 the value is so close to zero that there is probably no meaningful linear relationship between study hours and quiz scores.]

Activity: Customize the Scatterplot

# Create a scatterplot of 'Quiz_Score' vs 'Final_Exam_Score'
ggplot(data3, aes(x = Quiz_Score, y = Final_Exam_Score)) + #COMPLETE THE CODE
  geom_point(color = "red") +
  geom_smooth(method = "lm", color = "blue", se = TRUE) +
  labs(title = "Scatter plot of Quiz Score vs. Final Exam Score", x = "Quiz Score", y = "Final Exam Score")

`geom_smooth()` using formula = 'y ~ x'

Let’s customize the scatter plot of Quiz_Score and Final_Exam_Score.

Change the size of the points to make them more prominent. You can also experiment with different shapes (e.g., circles, triangles, squares). Hint: add size = 3, shape = 16 inside geom_point(). The numbers can change.
Add a linear regression line to your scatter plot to see the trend between Quiz_Score and Final_Exam_Score. Hint: Use geom_smooth().
Update the title and the axis labels. Make the title bold and center it.
Use facet_wrap() to create separate scatter plots based on Gender. Hint: facet_wrap(~Gender).
Exercise with the {scatterplot-activity} chunk below.

# Create a scatterplot of 'Quiz_Score' vs 'Final_Exam_Score'
# COMPLETE THE CODE
ggplot(data3, aes(x = Quiz_Score, y = Final_Exam_Score)) +
  geom_point(color = "lightpink", size = 5, shape = 2) +  # shape 2 = open triangle
  geom_smooth(method = "lm", color = "lightblue", se = FALSE) +  # linear trend, no confidence band
  labs(title = "Scatterplot of Quiz Score vs. Final Exam Score", x = "Quiz Score", y = "Final Exam Score") +
  theme(plot.title = element_text(size = 16, face = "bold", hjust = 0.5)) +
  facet_wrap(~Gender)  # one panel per Gender value

`geom_smooth()` using formula = 'y ~ x'

Part 3: Histogram

Histograms are used to visualize the distribution of a single continuous variable. Let’s create a histogram of Homework_Completion.

# TYPE YOUR CODE FOR THE X VARIABLE BELOW
ggplot(data3, aes(x = Homework_Completion)) +
  geom_histogram(binwidth = 5, fill = "lightblue", color = "black") +
  labs(title = "Histogram of Homework Completion", x = "Homework Completion", y = "Frequency") #update your titles

Activity: Customize the Histogram

Change the fill and color of the bars to something else. You can choose any colors you like! Use the color names link provided earlier for options.
Change the binwidth and observe how the histogram changes. What happens if you set it to different numbers? The binwidth sets how wide each bin is in x-axis units: a smaller number makes the bars narrower and produces more of them, and a larger number makes the bars wider and produces fewer.
Update the title and the x-axis to something that is relevant to your analytics.
Apply theme_minimal() and see how it changes the look of your plot. Try out other themes like theme_classic() or theme_dark(). Hint: You can add +theme_minimal() at the end of the code. Put any theme() tweaks after it, because a complete theme such as theme_minimal() resets earlier theme() settings.
Add a vertical line at the mean of the variable to highlight the average value. Use geom_vline(). The mean() function with na.rm = TRUE will ignore missing values.
Use facet_wrap() to create separate histograms for a categorical variable like Gender. For example, to get one histogram per value of the Gender variable, the syntax is +facet_wrap(~Gender).
Use the histogram-activity chunk below to practice.
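Two of the steps above, the effect of binwidth and the role of na.rm = TRUE, can be checked numerically with a small sketch. The completion values are made up for illustration, and count_bars() is a hypothetical helper written only for this example.

```r
library(ggplot2)

# Made-up completion percentages, including one missing value.
toy <- data.frame(completion = c(52, 58, 61, 67, 70, 74, 79, 83, 88, 95, NA))

# Hypothetical helper: build a histogram and count how many bins it contains.
count_bars <- function(bw) {
  p <- ggplot(toy, aes(x = completion)) +
    geom_histogram(binwidth = bw, na.rm = TRUE)
  nrow(layer_data(p, 1))  # one row per bin in the computed layer data
}

bars_narrow <- count_bars(5)   # narrow bins
bars_wide   <- count_bars(20)  # wide bins

# mean() returns NA if any value is missing, unless na.rm = TRUE is set.
avg_with_na <- mean(toy$completion)
avg <- mean(toy$completion, na.rm = TRUE)

bars_narrow
bars_wide
avg
```

The narrow binwidth yields more bins than the wide one, and only the na.rm = TRUE version of mean() returns a usable number for geom_vline().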

# Customize the histogram of the 'Homework_Completion' variable
# COMPLETE THE CODE
ggplot(data3, aes(x = Homework_Completion)) +
  geom_histogram(binwidth = 4, fill = "lightyellow", color = "lightgreen") +
  # Dashed red line at the mean; na.rm = TRUE skips missing values.
  # linewidth replaces size for lines as of ggplot2 3.4.0.
  geom_vline(aes(xintercept = mean(Homework_Completion, na.rm = TRUE)),
             color = "red", linetype = "dashed", linewidth = 1) +
  labs(title = "Histogram of Homework Completion", x = "Homework Completion", y = "Frequency") +
  facet_wrap(~Gender) +
  # Apply the complete theme first; the theme() call after it then adjusts the title.
  theme_dark() +
  theme(plot.title = element_text(size = 16, face = "bold", hjust = 0.5))

Part 4: Exploring Grouped Data with Box Plots

Boxplots are useful for visualizing the distribution of a variable across different categories and identifying potential outliers.
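As a rough illustration of how outliers are identified: by default, geom_boxplot() draws a point as an outlier when it lies more than 1.5 times the interquartile range beyond the edge of the box. The sketch below applies that rule by hand to made-up values; the exact quartile method ggplot2 uses may differ slightly from quantile() in edge cases.

```r
# Made-up homework completion values with one unusually low score.
x <- c(70, 72, 75, 78, 80, 82, 85, 20)

q1  <- unname(quantile(x, 0.25))  # lower edge of the box
q3  <- unname(quantile(x, 0.75))  # upper edge of the box
iqr <- q3 - q1                    # height of the box

lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Values outside the fences would be drawn as individual outlier points.
outliers <- x[x < lower_fence | x > upper_fence]
outliers
```

Here only the value 20 falls outside the fences, so it is the one point a box plot of these values would mark separately.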

Box Plot

Let’s create a box plot of Homework_Completion by Gender.

# Create a boxplot of 'Homework_Completion' by 'Gender'
# COMPLETE THE CODE
ggplot(data3, aes(x = Gender, y = Homework_Completion, fill = Gender)) +
  geom_boxplot() +
  labs(title = "Boxplot of Homework Completion by Gender", x = "Gender", y = "Homework Completion")

Activity: Customize the Boxplot

Change the fill colors for the boxes.
Add color = "black" (or some other color) to geom_boxplot() to set the color of the box outlines.
Update the title and the y-axis label properly. Make the title bold and center it. Hint: Add +theme(plot.title = element_text(size = 16, face = "bold", hjust = 0.5))
Customize the outliers by changing their shape and color. For example, make outliers larger and red by setting outlier.colour = "red", outlier.shape = 16, outlier.size = 3 inside the existing geom_boxplot() call. Adding a second geom_boxplot() layer would draw the boxes twice.
Use the boxplot-activity chunk below to practice.

# Create a boxplot of 'Homework_Completion' by 'Gender'
# COMPLETE THE CODE
ggplot(data3, aes(x = Gender, y = Homework_Completion)) +
  # A constant fill overrides any fill mapping, so fill = Gender and the legend setting are not needed.
  geom_boxplot(fill = "lightblue", color = "black", outlier.colour = "purple", outlier.shape = 15, outlier.size = 4) +
  labs(title = "Boxplot of Homework Completion by Gender", x = "Gender", y = "Homework Completion") +
  theme(plot.title = element_text(size = 17, face = "bold", hjust = 0.5))

Part 5: Counting Categories with Bar Plots

Bar plots can display the counts of different categories in your data.

Task 5: Visualize the count of students by Gender.

# Create a bar plot of counts of 'Gender'
# COMPLETE THE CODE BELOW
ggplot(data3, aes(x = Gender)) +
  geom_bar(fill = "green", color = "black") +
  labs(title = "Bar Plot", x = "Gender", y = "Count of Students")
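To see where the bar heights come from, the sketch below compares table() with the counts that geom_bar() computes when only an x variable is mapped. The five Gender values are made up for illustration.

```r
library(ggplot2)

# Made-up categories, used only to show how the counting works.
toy <- data.frame(Gender = c("F", "M", "F", "F", "M"))

# Counting by hand
manual_counts <- table(toy$Gender)

# geom_bar() counts the rows in each category to set the bar heights.
p <- ggplot(toy, aes(x = Gender)) + geom_bar()
bar_heights <- layer_data(p, 1)$count

manual_counts
bar_heights
```

Both approaches give 3 for F and 2 for M, which is why no y variable is needed in aes() for a count bar plot.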

Activity: Customize the Bar Plot

Change the fill color of the bars.
Change the width of the bars by using the width parameter inside geom_bar(). (i.e., width = 0.5)
Update the title and the y-axis labels to be descriptive. Make the title bold and center it.
Use the barplot-activity chunk below to practice.

# Create a bar plot of counts of 'Gender'
# COMPLETE THE CODE
ggplot(data3, aes(x = Gender)) +
  geom_bar(aes(fill = Gender), width = 0.4, color = "black") +
  scale_fill_manual(values = c("F" = "lightgreen", "M" = "lightblue")) +  # one color per Gender value
  labs(title = "Bar Plot of the Count of Students by Gender", x = "Gender", y = "Count of Students") +
  theme(
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
    axis.text.x = element_text(angle = 90, hjust = 1)  # rotate the category labels
  )

Part 6: Tracking Trends with Line Plots

Line plots are useful for showing trends over time or across ordered categories.

For this, we will use a new dataset, student_quiz_scores.csv from our data folder.

Line Plot

Task 6: Load the student_quiz_scores.csv file and create a line plot to visualize each student’s score trend across quizzes.

# Import/load the dataset
data3_2 <- read_csv("data/student_quiz_scores.csv") # COMPLETE YOUR CODE

Rows: 400 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Student_ID, Quiz
dbl (1): Score

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Create a line plot for each student's quiz scores
ggplot(data3_2, aes(x = Quiz, y = Score, group = Student_ID, color = Student_ID)) +
  geom_line(linewidth = 1, alpha = 0.6) +  # linewidth replaces size for lines as of ggplot2 3.4.0
  geom_point(size = 2) +
  labs(title = "Student Quiz Scores", x = "Quiz", y = "Quiz Scores") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
    axis.title.x = element_text(size = 12, face = "bold"),
    axis.title.y = element_text(size = 12, face = "bold"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "none"
  )
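Because Quiz is stored as text, the x-axis is ordered alphabetically, which may not match the order in which the quizzes were taken. The sketch below assumes labels of the form "Quiz_1", "Quiz_2", "Quiz_10"; that format is a guess, so check the actual values in student_quiz_scores.csv before relying on it.

```r
# Hypothetical quiz labels; the real values in the dataset may look different.
quiz <- c("Quiz_10", "Quiz_2", "Quiz_1", "Quiz_2")

# Alphabetical sorting places "Quiz_10" before "Quiz_2".
alphabetical <- sort(unique(quiz))

# Order the labels by the number embedded in each one instead.
quiz_number    <- as.numeric(gsub("\\D", "", quiz))  # strip everything except digits
ordered_levels <- unique(quiz[order(quiz_number)])
quiz_factor    <- factor(quiz, levels = ordered_levels)

alphabetical
levels(quiz_factor)
```

If the labels do follow this pattern, converting the Quiz column to a factor with these levels before plotting would put the quizzes in numeric order along the x-axis.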

Activity: Customize the Line Plot

Select a subset of 5 specific students to create a more focused line plot.
Make the line thicker and the points larger to improve readability.
Update the plot title and axis labels to be more descriptive. Make the title bold and center it.
Use the lineplot-activity chunk below to practice.

# COMPLETE YOUR CODE
selected_students <- data3_2 %>%
  filter(Student_ID %in% c("Student_1", "Student_5", "Student_10", "Student_15", "Student_7"))

ggplot(selected_students, aes(x = Quiz, y = Score, group = Student_ID, color = Student_ID)) +
  geom_line(linewidth = 0.7, alpha = 0.6) +  # linewidth replaces size for lines as of ggplot2 3.4.0
  geom_point(size = 5) +
  labs(title = "Student Quiz Scores", x = "Quiz", y = "Quiz Score") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 17, face = "bold", hjust = 0.5),
    axis.title.x = element_text(size = 12, face = "bold"),
    axis.title.y = element_text(size = 12, face = "bold"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "none"
  )

Final Reflection: How can we use LA in instructional design and decision-making?

After practicing the basic analysis and visualization techniques for the past couple of weeks, take some time to reflect on how these skills can be applied in the real world.

Consider the role of an educator, instructional designer, curriculum developer, policymaker, etc. How might the ability to analyze and visualize data help you make informed decisions, improve learning outcomes, or design more effective educational experiences? Think broadly about the implications of these skills in your current or future professional context, and share your thoughts on how data-driven insights could enhance your work.

[Coming from the standpoint of an educator and aspiring instructional designer, I found that learning analytics was extremely beneficial in making informed decisions. After working with data sets, I got to dive into the usefulness of analyzing data efficiently to visualize trends that might be occurring. Specifically, I got to look at trends in student performance compared to different variables. Alongside trends, I got to see the strength of relationships between particular variables. It helped me answer questions such as: Does one variable correlate with another? And if so, how strong is that correlation? With this, the ability to analyze and visualize data can allow me as an educator to improve learning outcomes by implementing practices or strategies that seemed to increase student outputs positively. For instance, I could upload student data from our learning management system and specifically look at the variables of number of lessons completed on a personalized pathway each week and student mid-term grades in that specific subject. With this, I could then make graphs with trend lines and complete data analyses to gain descriptive information on the correlation between those two variables. If I found that students did the best with 5 lessons a week but more than 5 started to negatively affect grades, I could use this to inform how many lessons a week are the goal for students in order to improve learning outcomes.

As an instructional designer, I could take this same data set and analyze the effectiveness of different online learning components. If I found correlations in student grades and number of lessons completed, I could then redesign the online platform to cap off the number of lessons on a pathway in a week for students to maximize their learning achievements.

All in all, learning analytics is powerful in informing the best ways to educate and design as it provides analyses of real-life data efficiently that can help us understand the variables we are working with and the relationships present.]

Render & Submit

Congratulations, you’ve completed the module!

To receive a full score, you will need to render this document and publish it via a method such as Quarto Pub, Posit Cloud, RPubs, GitHub Pages, or another method. Once you have shared a link to your published document with me and I have reviewed your work, you will be officially done with the current module.

Complete the following steps to submit your work for review:

First, change the name of the author: in the YAML header at the very top of this document to your name. The YAML header controls the style and feel of the knitted document but doesn’t actually display in the final output.
Next, click the “Render” button in the toolbar above to “render” your R Markdown document to an HTML file that will be saved in your R Project folder. You should see a formatted webpage appear in your Viewer tab in the lower right pane or in a new browser window. Let me know if you run into any issues with rendering.
Finally, publish. To publish, follow the steps from the link.

If you have any questions about this module, or run into any technical issues, don’t hesitate to contact me.

Once I have checked your link, you will be notified!