Visualization Practices

CUED 7540: Learning Analytics III

Author

Nicole Jayne (nojayne42@tntech.edu)

Published

October 11, 2024

Learning Objectives

By the end of this lesson, you will be able to:

- Generate basic analytics plots.

- Customize plots for better data visualization.

Load data

Before we start creating plots, we need to load and inspect the dataset we’ll be working with.

Task 1: Load the dataset you have or our new dataset called ‘educational_data.csv.’ You will inspect its structure. This will help you understand what data you’re working with.
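
As a rough sketch of Task 1, the chunk below loads a CSV with read_csv() and inspects it. The toy data frame and the temporary file are assumptions that stand in for educational_data.csv so the example runs on its own. The column names follow the variables used later in this lesson, and data3 is the object name the later chunks expect.

```r
library(readr)

# Hypothetical stand-in for educational_data.csv (values are made up).
toy <- data.frame(
  Gender = c("F", "M", "F", "M"),
  Study_Hours = c(5, 8, 13, 7),
  Homework_Completion = c(80, 85, 95, 78),
  Quiz_Score = c(75, 90, 80, 70),
  Final_Exam_Score = c(70, 78, 82, 65)
)
csv_path <- tempfile(fileext = ".csv")
write_csv(toy, csv_path)

# With the course files, csv_path would point to educational_data.csv instead.
data3 <- read_csv(csv_path, show_col_types = FALSE)

# Inspect the structure before plotting anything.
str(data3)                # column names and types
head(data3)               # first few rows
summary(data3)            # ranges and averages for numeric columns
colSums(is.na(data3))     # number of missing values per column
```

Checking column types and missing values first makes it easier to pick variables that suit each plot type.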

Reflect & Respond

Question 1: What Catches Your Eye? As you browse through the dataset, what stands out to you? Is there anything that piques your curiosity? Maybe a surprising trend or a pattern you didn’t expect?

[One thing that really caught my eye when examining the data was that only one student scored a 90 on the quiz. The rest of the students scored below that, and no student scored above an 83 on the exam; the highest exam score was an 82. This raises an interesting question about how these scores relate to the number of hours students spent studying the content. Another interesting thing I noticed is that student four had the lowest attendance rate (78.0) but the highest study hours (13.0). They completed the most homework, did average on the quiz, and earned the top grade on the final exam.]

Question 2: What Questions Do You Have? Is there something specific you’d like to dig deeper into? Think about what you might want to learn more about. Are there any relationships between variables you’re curious about?

[A question I would ask is, “Do the students with the highest attendance rates and study hours achieve the highest grades on the quiz and exam?” Also, is a high homework completion score a reliable indication that a student will do well on the quiz and exam? Relating variables such as study hours and attendance rate to homework completion and quiz/final exam scores will help me recognize strategies that worked for students and gaps that may still need to be filled.]

Question 3: What’s Your Analytics Game Plan? How would you approach analyzing this dataset? What steps would you take to uncover the insights you’re interested in?

[I would begin by analyzing the correlation between study hours, attendance rate, homework completion, quiz scores, and final exam scores for each individual student. After examining students’ individual data, I would compare these variables across students to see whether similar students show a shared trend in their study habits, attendance rate, and homework completed. This would help me narrow down possible reasons why some students did well and others did poorly.]

Correlation

Scatter Plot

Task 2: Scatter plots are useful for visualizing the relationship between two continuous variables (e.g., Study hours & Quiz Score). Let’s create a scatter plot to explore the relationship between two variables of your choice from the loaded data frame.

# Create a scatter plot of two variables with a regression line

ggplot(data3, aes(x = Study_Hours , y = Quiz_Score )) + # TYPE YOUR CODE. TWO VARIABLES HERE
  geom_point(color = "blue", size = 3, alpha = 0.6) +
  geom_smooth(method = "lm", color = "red", se = TRUE) +  # This line will add a linear regression line
  labs(title = "Scatter Plot of Study Hours vs. Quiz_Score", #UPDATE YOUR PLOT TITLES
       x = "Study Hours (Hours)",
       y = "Quiz_Score") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
    axis.title.x = element_text(size = 12, face = "bold"),
    axis.title.y = element_text(size = 12, face = "bold")
  )

`geom_smooth()` using formula = 'y ~ x'

Task 3: Now that we’ve visualized the relationship, let’s compute the correlation between two variables.

# Compute the correlation
# TYPE YOUR CODE
correlation <- cor(data3$Study_Hours, data3$Quiz_Score , use = "complete.obs")

# Display the correlation
correlation

[1] -0.04308071

Reflect & Respond

Question: What does the correlation value tell you about the relationship between time spent and final grades?

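[The correlation value displayed is negative (about -0.04). A negative sign indicates that as one variable increases (study hours), the other variable (quiz score) tends to decrease. However, the value is so close to zero that there is essentially no linear relationship between the two variables in this dataset.]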
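
When reading a correlation, the sign gives the direction and the size gives the strength. As a hedged sketch, cor.test() from base R reports a confidence interval and p-value alongside the coefficient, which helps judge whether a small value is distinguishable from zero. The numbers below are hypothetical and are not taken from educational_data.csv.

```r
# Hypothetical values for illustration only.
study_hours <- c(5, 8, 10, 13, 7, 9, 6, 11)
quiz_score  <- c(78, 74, 80, 76, 79, 75, 77, 78)

# Pearson correlation coefficient, as in Task 3.
r <- cor(study_hours, quiz_score, use = "complete.obs")

# The same coefficient with a significance test and confidence interval.
test <- cor.test(study_hours, quiz_score)

r              # direction (sign) and strength (distance from 0)
test$conf.int  # an interval that spans 0 suggests no clear linear relationship
test$p.value
```

With the lesson data, study_hours and quiz_score would be replaced by data3$Study_Hours and data3$Quiz_Score.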

Histogram

Histograms are used to visualize the distribution of a single continuous variable. Let’s create a histogram of Homework_Completion or a variable of your choice.

# TYPE YOUR CODE FOR THE X VARIABLE BELOW
ggplot(data3, aes(x = Homework_Completion)) +
  geom_histogram(binwidth = 2, fill = "magenta", color = "black") +
  labs(title = "Histogram of Homework Completion", x = "Homework Completion Score ", y = "Frequency") #update your titles
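
The binwidth argument controls how many bars the histogram has. As a small sketch (the scores are hypothetical and count_bins() is a helper written only for this illustration), ggplot_build() can be used to count the bins ggplot2 computes for two different widths.

```r
library(ggplot2)

# Hypothetical homework scores; the real values live in data3.
toy <- data.frame(
  Homework_Completion = c(62, 68, 71, 75, 78, 80, 83, 85, 88, 90, 94, 97)
)

# Illustrative helper: build the histogram and count the bins it contains.
count_bins <- function(width) {
  p <- ggplot(toy, aes(x = Homework_Completion)) +
    geom_histogram(binwidth = width)
  nrow(ggplot_build(p)$data[[1]])
}

bins_narrow <- count_bins(2)   # many thin bars, more detail and more noise
bins_wide   <- count_bins(10)  # a few wide bars, smoother overall shape

bins_narrow
bins_wide
```

A narrow binwidth shows more detail but can look jagged with small samples, while a wide binwidth can hide gaps or clusters.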

Activity: Customize the Histogram

- Change the fill color AND the outline color (the color argument) to something else. You can choose any colors you like! Use the link provided earlier for color options (color names).

- Change the binwidth, and observe how the histogram changes. What happens if you set it to 2?

- Update the title and the x-axis label to something that is relevant to your analytics.

- Apply theme_minimal() and see how it changes the look of your plot. Try out other themes like theme_classic() or theme_dark(). *Hint - You can add +theme_minimal() at the end of the code line.

- Try adding a vertical line at the mean of the variable to highlight the average value of Homework_Completion. *Hint - use geom_vline()

- Use facet_wrap() to create separate histograms for each level of a categorical variable. For example, if you want one histogram per value of the ‘Gender’ variable, add +facet_wrap(~Gender)

Exercise with the {histogram-activity} chunk below.

# Customize the histogram of the 'Homework_Completion' variable
# COMPLETE THE CODE
ggplot(data3, aes(x = Homework_Completion)) +
  geom_histogram(binwidth = 2, fill = "magenta", color = "black") +
  geom_vline(aes(xintercept = mean(Homework_Completion, na.rm = TRUE)),
             color = "red", linetype = "dashed", linewidth = 1) +  # linewidth replaces the deprecated size
  labs(title = "Histogram of Homework Completed", x = "Homework Completion Grade", y = "Frequency") +
  facet_wrap(~Gender) +
  theme_dark() +  # apply the complete theme first so it does not reset the title styling
  theme(plot.title = element_text(size = 16, face = "bold", hjust = 0.5))

Scatterplot

Scatter plots show the relationship between two continuous variables. For example, we can visualize the relationship between Quiz_Score and Final_Exam_Score.

# Create a scatterplot of 'Quiz_Score' vs 'Final_Exam_Score'
ggplot(data3, aes(x = Quiz_Score , y = Final_Exam_Score )) + #COMPLETE THE CODE
  geom_point(color = "red") +
  labs(title = "Scatter plot ", x = "x-Quiz_Score", y = "y-Final_Exam_Score")

Activity: Customize the Scatterplot

- Change the size of the points to make them more prominent. You can also experiment with different shapes (e.g., circles, triangles, squares). *Hint: add ‘size = 3, shape = 16’ inside geom_point(). Numbers can change.

- Add a linear regression line to your scatter plot to see the trend between variable 1 and variable 2. *Hint: Use geom_smooth()

- Update the title and the axis labels. Make the title bold and center it.

- Use facet_wrap() to create separate scatter plots based on Gender. *Hint: facet_wrap()

Exercise with the {scatterplot-activity} chunk below.

# Create a scatterplot of 'Quiz_Score' vs 'Final_Exam_Score'
# COMPLETE THE CODE
ggplot(data3, aes(x = Quiz_Score, y = Final_Exam_Score)) +
  geom_point(color = "purple", size = 3, shape = 16) +
  geom_smooth(method = "lm", color = "lightpink", se = FALSE) +
  labs(title = "Scatterplot of Quiz Score vs. Final Exam Score", x = "Quiz Score", y = "Final Exam Score") +
  facet_wrap(~Gender) +
  theme(plot.title = element_text(size = 16, face = "bold", hjust = 0.5))

`geom_smooth()` using formula = 'y ~ x'

Boxplot

Boxplots are useful for visualizing the distribution of a variable and identifying potential outliers. Let’s create a boxplot of Homework_Completion by Gender.

# Create a boxplot of 'Homework_Completion' by 'Gender'
# COMPLETE THE CODE
ggplot(data3, aes(x = Gender, y = Homework_Completion , fill = Gender)) +
  geom_boxplot() +
  labs(title = "Boxplot of Homework Completion and Gender", x = "Gender", y = "Homework Completion")
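
To make "potential outliers" concrete: by default, a boxplot flags points that lie more than 1.5 times the interquartile range (IQR) beyond the quartiles. The sketch below applies that rule by hand to hypothetical scores. It is an illustration of the rule, not a copy of ggplot2's internal code.

```r
# Hypothetical homework scores with one unusually low value.
scores <- c(40, 70, 72, 75, 78, 80, 82, 85)

# Quartiles and interquartile range.
q1  <- unname(quantile(scores, 0.25))
q3  <- unname(quantile(scores, 0.75))
iqr <- q3 - q1

# Fences: points outside this range are drawn as individual outlier dots.
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

outliers <- scores[scores < lower_fence | scores > upper_fence]
outliers  # 40
```

Here the quartiles are 71.5 and 80.5, so the fences sit at 58 and 94, and only the score of 40 falls outside them.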

Activity: Customize the Boxplot

- Change the fill color.

- Add color = "black" to geom_boxplot() to set the color of the box outlines.

- Update the title and the y-axis label properly. Make the title bold and center it. *Hint: Add +theme(plot.title = element_text(size = 16, face = "bold", hjust = 0.5))

- Customize the outliers by changing their shape and color. For example, make outliers larger and red by adding outlier.colour = "red", outlier.shape = 16, outlier.size = 3 inside geom_boxplot().

Exercise with the {boxplot-activity} chunk below.

# Create a boxplot of 'Homework_Completion' by 'Gender'
# COMPLETE THE CODE
ggplot(data3, aes(x = Gender, y = Homework_Completion, fill = Gender)) +
  # One boxplot layer: black outlines plus larger red outlier points
  geom_boxplot(color = "black", outlier.colour = "red", outlier.shape = 16, outlier.size = 3) +
  labs(title = "Boxplot of Homework Completion by Gender", x = "Gender", y = "Homework Completion") +
  theme(
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
    legend.position = "none"
  )

Bar Plot

Bar plots can display counts of categorical data. We’ll visualize the count of students by Gender.

# Create a bar plot of counts of 'Gender'
# COMPLETE THE CODE
ggplot(data3, aes(x = Gender )) +
  geom_bar(fill = "green", color = "black") +
  labs(title = "Bar Plot", x = "x", y = "y")
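
geom_bar() counts the rows in each category for you. As a quick check on what the bar heights represent, the same counts can be computed with table() from base R. The gender vector below is hypothetical.

```r
# Hypothetical Gender column; with the lesson data this would be data3$Gender.
gender <- c("F", "M", "F", "F", "M")

# Count rows per category: these are the heights geom_bar() would draw.
counts <- table(gender)
counts

# The same counts as a data frame with columns 'gender' and 'Freq'.
count_df <- as.data.frame(counts)
count_df
```

If the data are already summarized like count_df, geom_col() with an explicit y aesthetic (here Freq) is the usual alternative to geom_bar().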

Activity: Customize the Bar Plot

Change the fill color.
Change the width of the bars by using the width parameter inside geom_bar(). (i.e., width = 0.5)
Update the title and the y-axis label. Make the title bold and center it.
Exercise with the {barplot-activity} chunk below.

# Create a bar plot of counts of 'Gender'
# COMPLETE THE CODE
ggplot(data3, aes(x = Gender, fill = Gender)) +
  geom_bar(width = 0.5, color = "black") +  # one layer, so the narrower bar width is visible
  scale_fill_manual(values = c("F" = "blue", "M" = "red")) +
  labs(title = "Bar Plot of Genders in Test Group", x = "Gender", y = "Number of Students") +
  theme(
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
    axis.text.x = element_text(angle = 90, hjust = 1)
  )

Line Plot

Line plots are useful for showing trends over time or ordered categories. Load student_quiz_scores.csv from our data folder to create a line plot.

# Import/load the dataset
data3_2 <- read_csv("data/student_quiz_scores.csv") # TYPE YOUR CODE

Rows: 400 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Student_ID, Quiz
dbl (1): Score

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

ggplot(data3_2, aes(x = Quiz, y = Score, group = Student_ID, color = Student_ID)) +
  geom_line(linewidth = 1, alpha = 0.6) +
  geom_point(size = 2) +
  labs(title = "Students' Quiz Scores", x = "Quizzes", y = "Scores") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
    axis.title.x = element_text(size = 12, face = "bold"),
    axis.title.y = element_text(size = 12, face = "bold"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "none"
  )
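
Because Quiz is read as a character column, ggplot2 places its values on the x-axis in alphabetical order, which may not match the order in which the quizzes were taken. The sketch below assumes hypothetical labels of the form Quiz_1 to Quiz_10 (the actual labels in student_quiz_scores.csv may differ) and shows how an explicit factor order fixes the sequence.

```r
# Hypothetical quiz labels; check unique(data3_2$Quiz) for the real ones.
quiz_labels <- paste0("Quiz_", 1:10)

# Default ordering is alphabetical, so "Quiz_10" sorts before "Quiz_2".
alphabetical <- levels(factor(quiz_labels))
alphabetical

# Setting the levels explicitly keeps the quizzes in chronological order.
quiz_ordered <- factor(quiz_labels, levels = paste0("Quiz_", 1:10))
levels(quiz_ordered)
```

With the lesson data, the same idea would be applied to the Quiz column before plotting, so that each line moves from the first quiz to the last.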

Activity: Customize the Line Plot

- Choose 5 specific students to create the line plot (check the previous module).

- Change the code to make the lines thicker and the points larger.

- Update the title and the y-axis label properly. Make the title bold and center it.

Exercise with the {lineplot-activity} chunk below.

# COMPLETE YOUR CODE
selected_students <- data3_2 %>%
  filter(Student_ID %in% c("Student_1", "Student_3", "Student_5", "Student_7", "Student_9"))

ggplot(selected_students, aes(x = Quiz, y = Score, group = Student_ID, color = Student_ID)) +
  geom_line(linewidth = 2, alpha = 0.6) +
  geom_point(size = 3) +
  labs(title = "Students' Quiz Scores", x = "Quizzes", y = "Scores") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 17, face = "bold", hjust = 0.5),
    axis.title.x = element_text(size = 12, face = "bold"),
    axis.title.y = element_text(size = 12, face = "bold"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "none"
  )

Reflection Activity: How can we use LA in instructional design and decision-making?

After practicing the basic analysis and visualization techniques for the past couple of weeks, take some time to reflect on how these skills can be applied in the real world. Consider the role of an educator, instructional designer, or policymaker. How might the ability to analyze and visualize data help you make informed decisions, improve learning outcomes, or design more effective educational experiences? Think broadly about the implications of these skills in your current or future professional context, and share your thoughts on how data-driven insights could enhance your work.

[The skills I have learned over the last few weeks will help me gather, consolidate, compare, and analyze data so that I can make informed decisions about how well my students are doing, or have done, in class or on an assignment. With these analytical skills, I can base my decisions on data that represent each individual student's scores, accomplishments, and struggles. With this information, I can develop strategies that help my students succeed and excel, and I can adapt my teaching methods to my students' individual learning styles and needs.]

Render & Submit

Congratulations, you’ve completed the module!

To receive a full score, you will need to render this document and publish it via a method such as Quarto Pub, Posit Cloud, RPubs, GitHub Pages, or another method. Once you have shared a link to your published document with me and I have reviewed your work, you will be officially done with the current module.

Complete the following steps to submit your work for review:

First, change the name in the author: field of the YAML header at the very top of this document to your name. The YAML header controls the style and feel of the knitted document but doesn’t actually display in the final output.
Next, click the “Render” button in the toolbar above to “render” your R Markdown document to an HTML file that will be saved in your R Project folder. You should see a formatted webpage appear in your Viewer tab in the lower right pane or in a new browser window. Let me know if you run into any issues with rendering.
Finally, publish. To publish, follow the steps from the link

If you have any questions about this module, or run into any technical issues, don’t hesitate to contact me.

Once I have checked your link, you will be notified!