Step-by-step process:
• First, we simulated student data with two features, student_engagement and student_performance, which measure each student's engagement and academic performance.
• Next, we applied principal component analysis (PCA) to the standardized features as the dimensionality-reduction step.
• PCA identifies the principal components that capture the most variation in the data.
• We then applied the K-means clustering algorithm to the principal component scores to identify clusters of students with similar learning patterns.
• Finally, based on their principal component scores, each student was assigned to one of the three clusters.
Interpretation of the results:
Number of clusters:
K-means clustering was run with three clusters (centers = 3), so every student was assigned to one of three groups.
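The number of clusters is an input to kmeans() rather than something the algorithm discovers; the code in this report passes centers = 3. One common way to sanity-check that choice is an elbow comparison of the total within-cluster sum of squares across several values of k. The following is a minimal sketch under stated assumptions: the candidate range 1:6, nstart = 25, and the names scores, candidate_k and elbow are illustrative choices, not part of the original analysis.

```r
# Recreate the simulated features so this sketch runs on its own
set.seed(260923)
n <- 100
student_engagement <- rnorm(n, mean = 50, sd = 10)
student_performance <- rnorm(n, mean = 60, sd = 15)

# Principal component scores of the standardized features
scores <- prcomp(cbind(student_engagement, student_performance), scale. = TRUE)$x

# Total within-cluster sum of squares for each candidate k (assumed range)
candidate_k <- 1:6
set.seed(123)
total_withinss <- sapply(candidate_k, function(k) {
  kmeans(scores, centers = k, nstart = 25)$tot.withinss
})

elbow <- data.frame(k = candidate_k, total_withinss = total_withinss)
print(elbow)
```

A visible bend in total_withinss as k grows would support a particular k. With independently simulated features, a clear bend may not appear.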
Characteristics of these clusters:
Cluster 1:
This cluster represents the students who have high engagement and high performance. They tend to be actively involved in academics and show strong achievement.
Cluster 2:
This cluster represents the students who have medium engagement and medium performance. They tend to be moderately involved in the curriculum and achieve average academic performance.
Cluster 3:
This cluster represents the students who have low engagement in class and low performance. They show minimal participation in class activities and struggle academically.
Insights:
“Learning analytics is an emerging field in which sophisticated
analytic tools are used to improve learning and education. It draws
from, and is closely tied to, a series of other fields of study
including business intelligence, web analytics, academic analytics,
educational data mining, and action analytics” [1]. This clustering
analysis identifies distinct groups of students with different levels of
engagement and performance. Studying these clusters helps us identify
students who need additional academic support and curriculum
assistance. Cluster 1 represents highly engaged, high-performing
students, whereas Cluster 3 represents students who are not performing
well and are at risk of academic failure.
The scatter plot visualizes the distribution of students by engagement
and performance. Each point represents one student, and the three
clusters are color-coded for easy identification. The clusters can be
distinguished by their positions in the plot: Cluster 1 students
concentrate in the top middle and toward the top-right corner, Cluster 2
students appear in the middle and bottom-left corner, and Cluster 3
students appear in the bottom-right corner. The plot also shows the
separation between clusters, which indicates distinct groups of students
with differing engagement and performance.
Implications:
The observations highlight the importance of a personalized learning approach, which helps both students and the academic advisors who support students needing intervention from the faculty. By identifying student clusters, faculty and academic advisors can plan targeted interventions to improve academic performance. Learning analytics plays a crucial role in optimizing teaching strategies and improving student success rates. Overall, this clustering analysis provides valuable insight into student learning patterns, helping faculty and advisors identify the students who need support and intervention.
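To make the targeting step concrete, the following sketch shows one possible way to list the students in the cluster with the lowest average performance. Cluster numbers from kmeans() are arbitrary labels, so the cluster is chosen by its mean rather than by its number. The names cluster_performance, lowest_cluster and flagged are hypothetical, and treating the lowest-mean cluster as the intervention group is an assumption, not a rule from the analysis.

```r
# Recreate the data and clustering so this sketch runs on its own
set.seed(260923)
n <- 100
student_engagement <- rnorm(n, mean = 50, sd = 10)
student_performance <- rnorm(n, mean = 60, sd = 15)
student_data <- data.frame(
  student_id = seq_len(n),
  student_engagement = student_engagement,
  student_performance = student_performance
)
pca_result <- prcomp(student_data[, -1], scale. = TRUE)
set.seed(123)
student_data$cluster <- kmeans(pca_result$x, centers = 3)$cluster

# Mean performance per cluster; pick the cluster with the lowest mean
cluster_performance <- tapply(
  student_data$student_performance, student_data$cluster, mean
)
lowest_cluster <- as.integer(names(which.min(cluster_performance)))

# Students in that cluster, lowest performers first
flagged <- student_data[student_data$cluster == lowest_cluster, ]
flagged <- flagged[order(flagged$student_performance), ]
print(head(flagged))
```

In practice, an advisor would likely combine such a list with other information before deciding on an intervention.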
References: [1] Elias, T. (2011). Learning Analytics: Definitions, Processes and Potential.
# Define function to simulate student features
simulate_student_features <- function(n = 100) {
  # Set the random seed
  set.seed(260923)
  # Generate unique student IDs
  student_ids <- seq(1, n)
  # Simulate student engagement
  student_engagement <- rnorm(n, mean = 50, sd = 10)
  # Simulate student performance
  student_performance <- rnorm(n, mean = 60, sd = 15)
  # Combine the data into a data frame
  student_features <- data.frame(
    student_id = student_ids,
    student_engagement = student_engagement,
    student_performance = student_performance
  )
  # Return the data frame
  return(student_features)
}
# Generate the dataset
student_data <- simulate_student_features()
# View the dataset
head(student_data)
## student_id student_engagement student_performance
## 1 1 35.47855 50.52231
## 2 2 51.79512 58.88396
## 3 3 62.41012 40.56755
## 4 4 35.20679 62.46033
## 5 5 59.37552 54.69326
## 6 6 57.00109 54.09745
# Perform dimensionality reduction using Principal Component Analysis (PCA)
pca_result <- prcomp(student_data[, -1], scale. = TRUE)
# Extract principal components
pca_components <- pca_result$x
# Summarize principal components
summary(pca_components)
## PC1 PC2
## Min. :-2.32997 Min. :-2.323430
## 1st Qu.:-0.91441 1st Qu.:-0.685371
## Median :-0.06904 Median : 0.005658
## Mean : 0.00000 Mean : 0.000000
## 3rd Qu.: 0.77925 3rd Qu.: 0.757692
## Max. : 2.12058 Max. : 2.571604
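The summary above describes the distribution of the component scores, not how much variation each component captures. A minimal sketch of how the variance share could be inspected is shown below; the name variance_explained is illustrative. Note that with only two input features and both components retained, PCA rotates the standardized data rather than reducing the number of dimensions.

```r
# Recreate the simulated data so this sketch runs on its own
set.seed(260923)
n <- 100
student_engagement <- rnorm(n, mean = 50, sd = 10)
student_performance <- rnorm(n, mean = 60, sd = 15)
student_data <- data.frame(
  student_id = seq_len(n),
  student_engagement = student_engagement,
  student_performance = student_performance
)

pca_result <- prcomp(student_data[, -1], scale. = TRUE)

# Proportion of total variance carried by each principal component
variance_explained <- pca_result$sdev^2 / sum(pca_result$sdev^2)
names(variance_explained) <- colnames(pca_result$x)
print(round(variance_explained, 3))

# The same information as reported by the built-in summary method
print(summary(pca_result)$importance)
```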
# Apply KMeans clustering
set.seed(123)
kmeans_result <- kmeans(pca_components, centers = 3)
# Extract cluster assignments
cluster_assignments <- kmeans_result$cluster
# Add cluster assignments to the dataset
student_data$cluster <- cluster_assignments
# Summarize cluster assignments
summary(cluster_assignments)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 1.00 2.00 2.01 3.00 3.00
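Because cluster labels are categories, the numeric summary above (for example, a mean of 2.01) has no direct interpretation. A sketch of a more informative check follows: it counts students per cluster and computes the mean engagement and performance within each cluster, which is one way to verify the cluster descriptions. The names cluster_sizes and cluster_means are illustrative.

```r
# Recreate the data and clustering so this sketch runs on its own
set.seed(260923)
n <- 100
student_engagement <- rnorm(n, mean = 50, sd = 10)
student_performance <- rnorm(n, mean = 60, sd = 15)
student_data <- data.frame(
  student_id = seq_len(n),
  student_engagement = student_engagement,
  student_performance = student_performance
)
pca_result <- prcomp(student_data[, -1], scale. = TRUE)
set.seed(123)
kmeans_result <- kmeans(pca_result$x, centers = 3)
student_data$cluster <- kmeans_result$cluster

# Number of students in each cluster
cluster_sizes <- table(student_data$cluster)
print(cluster_sizes)

# Mean engagement and performance within each cluster
cluster_means <- aggregate(
  cbind(student_engagement, student_performance) ~ cluster,
  data = student_data,
  FUN = mean
)
print(cluster_means)
```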
# View the dataset with cluster assignments
head(student_data)
## student_id student_engagement student_performance cluster
## 1 1 35.47855 50.52231 2
## 2 2 51.79512 58.88396 3
## 3 3 62.41012 40.56755 3
## 4 4 35.20679 62.46033 2
## 5 5 59.37552 54.69326 3
## 6 6 57.00109 54.09745 3
# Plot the clusters
library(ggplot2)
# Plot student engagement vs student performance with clusters colored
ggplot(student_data, aes(x = student_engagement, y = student_performance, color = factor(cluster))) +
  geom_point() +
  scale_color_manual(values = c("blue", "red", "green")) +
  labs(title = "Student Clusters based on Engagement and Performance",
       x = "Student Engagement",
       y = "Student Performance",
       color = "Cluster") +
  theme_minimal()