Lab 3 Case Study: Unsupervised Learning in Learning Analytics

Step-by-step process:

• First, we simulated student data with two features, student_engagement and student_performance, which measure each student's engagement and academic performance.

• Next, we applied principal component analysis (PCA) to the scaled features as the dimensionality-reduction step.

• PCA identifies the principal components that capture the most variation in the data.

• We then applied the KMeans clustering algorithm to the principal component scores to identify clusters of students with similar learning patterns.

• Finally, based on their principal component scores, each student was assigned to one of the three clusters.
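
As a minimal sketch of what the PCA step reports, the proportion of variance captured by each component can be computed from the `sdev` element of a `prcomp` result. The data frame `demo_features` and its seed are hypothetical stand-ins, not the lab's dataset. Note that with only two input features PCA returns two components, so keeping both rotates the data rather than reducing its dimension.

```r
# Hypothetical stand-in data with the same shape as the lab's simulated features
set.seed(1)
demo_features <- data.frame(
  student_engagement = rnorm(100, mean = 50, sd = 10),
  student_performance = rnorm(100, mean = 60, sd = 15)
)

# Scale the features so that both contribute equally, then run PCA
demo_pca <- prcomp(demo_features, scale. = TRUE)

# Proportion of total variance captured by each principal component
variance_explained <- demo_pca$sdev^2 / sum(demo_pca$sdev^2)
print(round(variance_explained, 3))
```

If the first component captured nearly all of the variance, clustering on PC1 alone would be a genuine reduction; otherwise both components carry information.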

Interpretation of the results:

Number of clusters:

KMeans clustering was run with three centers (centers = 3), so the students were partitioned into three clusters.
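
Because `kmeans` takes the number of clusters as an input, one common way to check whether three is a reasonable choice is to compare the total within-cluster sum of squares across several values of k (the elbow method). The sketch below is not part of the original analysis; the stand-in data, the range of k, and `nstart = 20` are all assumptions made for illustration.

```r
# Hypothetical stand-in data with the same shape as the lab's simulated features
set.seed(1)
demo_features <- data.frame(
  student_engagement = rnorm(100, mean = 50, sd = 10),
  student_performance = rnorm(100, mean = 60, sd = 15)
)
demo_scores <- prcomp(demo_features, scale. = TRUE)$x

# Total within-cluster sum of squares for k = 1..6 (assumed range)
k_values <- 1:6
wss <- sapply(k_values, function(k) {
  kmeans(demo_scores, centers = k, nstart = 20)$tot.withinss
})

# Look for the k after which the decrease in wss levels off
print(data.frame(k = k_values, wss = round(wss, 2)))
```

Plotting `wss` against `k_values` gives the usual elbow plot; a clear bend at k = 3 would support the choice made in this lab.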

Characteristics of these clusters:

Cluster 1:

This cluster represents students with high engagement and high performance. They tend to be actively involved in their academics and show strong achievement.

Cluster 2:

This cluster represents students with medium engagement and medium performance. They tend to be moderately involved in the curriculum and achieve average academic results.

Cluster 3:

This cluster represents students with low class engagement and low performance. They show minimal participation in class activities and struggle academically.
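
The cluster numbers returned by `kmeans` are arbitrary labels, so descriptions such as "high engagement and high performance" are best checked against per-cluster averages. The following is a minimal sketch on hypothetical stand-in data (`demo_data`, the seeds, and `nstart = 20` are assumptions, not the lab's values) showing how such a profile could be tabulated.

```r
# Hypothetical stand-in data with the same shape as the lab's simulated features
set.seed(1)
demo_data <- data.frame(
  student_engagement = rnorm(100, mean = 50, sd = 10),
  student_performance = rnorm(100, mean = 60, sd = 15)
)

# Cluster the PCA scores into three groups, as in the lab
demo_scores <- prcomp(demo_data, scale. = TRUE)$x
set.seed(2)
demo_data$cluster <- kmeans(demo_scores, centers = 3, nstart = 20)$cluster

# Mean engagement and performance within each cluster
cluster_profile <- aggregate(
  cbind(student_engagement, student_performance) ~ cluster,
  data = demo_data,
  FUN = mean
)

# Number of students assigned to each cluster
cluster_sizes <- table(demo_data$cluster)

print(cluster_profile)
print(cluster_sizes)
```

Reading the rows of `cluster_profile` side by side shows which label corresponds to the high, medium, and low groups in a given run.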

Insights:

“Learning analytics is an emerging field in which sophisticated analytic tools are used to improve learning and education. It draws from, and is closely tied to, a series of other fields of study including business intelligence, web analytics, academic analytics, educational data mining, and action analytics” [1]. This clustering analysis separates students into distinct groups with different levels of engagement and performance. Studying these clusters helps us identify students who need additional academic support and curriculum assistance. Cluster 1 represents highly engaged, high-performing students, whereas Cluster 3 represents students who are not performing well and are at risk of academic failure.
The scatter plot shows the distribution of students by engagement and performance. Each point represents one student, and the three clusters are color-coded for easy identification. The clusters can be distinguished by their positions in the plot: Cluster 1 students concentrate in the top middle and toward the top-right corner, Cluster 2 students appear in the middle and bottom-left corner, and Cluster 3 students appear in the bottom-right corner. The plot also highlights the separation between the clusters, which indicates distinct groups of students with varying engagement and performance.

Implications:

The observations highlight the importance of a personalized learning approach, which helps students and also enables academic advisors to support students who need faculty intervention and additional help. By identifying student clusters, faculty and academic advisors can plan targeted interventions to improve academic performance. Learning analytics plays a crucial role in optimizing teaching strategies and improving student success rates. Overall, this clustering analysis provides valuable insights into student learning patterns. It helps not only students but also faculty, who can more effectively identify students needing support and intervention.

References: [1] Elias, T. (2011). Learning Analytics: Definitions, Processes and Potential.

# Define function to simulate student features
simulate_student_features <- function(n = 100) {
  # Set the random seed
  set.seed(260923)
  
  # Generate unique student IDs
  student_ids <- seq(1, n)
  
  # Simulate student engagement
  student_engagement <- rnorm(n, mean = 50, sd = 10)
  
  # Simulate student performance
  student_performance <- rnorm(n, mean = 60, sd = 15)
  
  # Combine the data into a data frame
  student_features <- data.frame(
    student_id = student_ids,
    student_engagement = student_engagement,
    student_performance = student_performance
  )
  
  # Return the data frame
  return(student_features)
}

# Generate the dataset
student_data <- simulate_student_features()

# View the dataset
head(student_data)
##   student_id student_engagement student_performance
## 1          1           35.47855            50.52231
## 2          2           51.79512            58.88396
## 3          3           62.41012            40.56755
## 4          4           35.20679            62.46033
## 5          5           59.37552            54.69326
## 6          6           57.00109            54.09745
# Perform dimensionality reduction using Principal Component Analysis (PCA)
pca_result <- prcomp(student_data[, -1], scale. = TRUE)

# Extract principal components
pca_components <- pca_result$x

# Summarize principal components
summary(pca_components)
##       PC1                PC2           
##  Min.   :-2.32997   Min.   :-2.323430  
##  1st Qu.:-0.91441   1st Qu.:-0.685371  
##  Median :-0.06904   Median : 0.005658  
##  Mean   : 0.00000   Mean   : 0.000000  
##  3rd Qu.: 0.77925   3rd Qu.: 0.757692  
##  Max.   : 2.12058   Max.   : 2.571604
# Apply KMeans clustering
set.seed(123)
kmeans_result <- kmeans(pca_components, centers = 3)

# Extract cluster assignments
cluster_assignments <- kmeans_result$cluster

# Add cluster assignments to the dataset
student_data$cluster <- cluster_assignments

# Summarize cluster assignments
summary(cluster_assignments)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    1.00    2.00    2.01    3.00    3.00
# View the dataset with cluster assignments
head(student_data)
##   student_id student_engagement student_performance cluster
## 1          1           35.47855            50.52231       2
## 2          2           51.79512            58.88396       3
## 3          3           62.41012            40.56755       3
## 4          4           35.20679            62.46033       2
## 5          5           59.37552            54.69326       3
## 6          6           57.00109            54.09745       3
# Plot the clusters
library(ggplot2)

# Plot student engagement vs student performance with clusters colored
ggplot(student_data, aes(x = student_engagement, y = student_performance, color = factor(cluster))) +
  geom_point() +
  scale_color_manual(values = c("blue", "red", "green")) +
  labs(title = "Student Clusters based on Engagement and Performance",
       x = "Student Engagement",
       y = "Student Performance",
       color = "Cluster") +
  theme_minimal()
