Learning analytics is the use of data to understand and improve learning. Unsupervised learning is a type of machine learning that can be used to identify patterns and relationships in data without the need for labeled data.
In this case study, you will use unsupervised learning to analyze learning data from a Simulated School course. You will use dimensionality reduction to reduce the number of features in the data, and then use clustering to identify groups of students with similar learning patterns.
The data for this case study is generated with the simulation function below. The data contains the following features:
Student ID: A unique identifier for each student
Feature 1: A measure of student engagement
Feature 2: A measure of student performance
simulate_student_features <- function(n = 100) {
# Set the random seed
set.seed(260923)
# Generate unique student IDs
student_ids <- seq(1, n)
# Simulate student engagement
student_engagement <- rnorm(n, mean = 50, sd = 10)
# Simulate student performance
student_performance <- rnorm(n, mean = 60, sd = 15)
# Combine the data into a data frame
student_features <- data.frame(
student_id = student_ids,
student_engagement = student_engagement,
student_performance = student_performance
)
# Return the data frame
return(student_features)
}
This function takes the number of students to simulate as an input and returns a data frame with three columns: student_id, student_engagement, and student_performance. The student_engagement and student_performance features are simulated using normal distributions with mean values of 50 and 60, respectively, and standard deviations of 10 and 15, respectively.
To use the simulate_student_features() function, we can simply pass the desired number of students to simulate as the argument:
student_features <- simulate_student_features(n = 100)
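Because the function calls set.seed() internally, every call with the same n returns the same values. A minimal sketch for checking this, and for confirming that the sample means and standard deviations land near the values passed to rnorm(), is shown below. The `seed` argument is a hypothetical addition for illustration and is not part of the original function.

```r
# Variant of the simulator with the seed exposed as an argument.
# NOTE: the `seed` parameter is an assumption added for this sketch.
simulate_student_features <- function(n = 100, seed = 260923) {
  set.seed(seed)
  data.frame(
    student_id = seq_len(n),
    student_engagement = rnorm(n, mean = 50, sd = 10),
    student_performance = rnorm(n, mean = 60, sd = 15)
  )
}

a <- simulate_student_features(n = 100)
b <- simulate_student_features(n = 100)

# Same seed, same draws: the two data frames are identical.
same_draws <- identical(a, b)

# Sample means and standard deviations should be near (not equal to)
# the population values 50/60 and 10/15.
feature_summary <- sapply(a[, -1], function(x) c(mean = mean(x), sd = sd(x)))

print(same_draws)
print(round(feature_summary, 2))
```

Exposing the seed makes it possible to draw a different sample when needed while keeping the default behavior repeatable.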
We can then use this data frame for unsupervised learning to identify groups of students with similar learning patterns.
library(stats)
library(dbscan)
##
## Attaching package: 'dbscan'
## The following object is masked from 'package:stats':
##
## as.dendrogram
library(cluster)
library(clustMixType)
library(mclust)
## Package 'mclust' version 6.0.0
## Type 'citation("mclust")' for citing this R package in publications.
library(ggplot2)
n <-100
student_ids <- seq(1,n)
student_ids
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
## [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
## [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
## [91] 91 92 93 94 95 96 97 98 99 100
sample_size <-10
sample_size
## [1] 10
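# Draw a random sample of student IDs
sample_students_ids <- sample(student_ids, size = sample_size)
# Overwrite the random sample with a fixed set of five IDs
sample_students_ids <- c(1, 2, 3, 4, 5)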
sample_students_ids
## [1] 1 2 3 4 5
print(sample_students_ids)
## [1] 1 2 3 4 5
Simulate the data
library(stats)
library(graphics)
library(datasets)
library(methods)
library(utils)
library(grDevices)
simulate_student_features <- function(n = 100) {
# Set the random seed
set.seed(260923)
#Generate unique student IDs
student_ids <- seq(1, n)
# Simulate student engagement
student_engagement <- rnorm(n, mean = 50, sd = 10)
# Simulate student performance
student_performance <- rnorm(n, mean = 60, sd = 15)
# Combine the data into a data frame
student_features <- data.frame(
student_id = student_ids,
student_engagement = student_engagement,
student_performance = student_performance
)
# Return the data frame
return(student_features)
}
student_data <- simulate_student_features()
student_data
## student_id student_engagement student_performance
## 1 1 35.47855 50.52231
## 2 2 51.79512 58.88396
## 3 3 62.41012 40.56755
## 4 4 35.20679 62.46033
## 5 5 59.37552 54.69326
## 6 6 57.00109 54.09745
## 7 7 34.81908 51.59185
## 8 8 66.43009 71.66933
## 9 9 53.12224 53.38812
## 10 10 58.66933 77.51403
## 11 11 64.61152 66.60144
## 12 12 64.73720 42.54982
## 13 13 61.38786 81.60053
## 14 14 43.46833 71.08870
## 15 15 60.90687 65.14477
## 16 16 62.31512 96.92023
## 17 17 30.19917 59.56755
## 18 18 62.71153 45.28098
## 19 19 53.46864 54.77840
## 20 20 43.16863 73.15011
## 21 21 47.79988 57.94230
## 22 22 32.86718 64.51376
## 23 23 48.43080 44.12915
## 24 24 44.64360 81.96037
## 25 25 59.37263 67.10083
## 26 26 40.14190 57.07313
## 27 27 52.05290 33.93715
## 28 28 58.53192 66.30623
## 29 29 46.85526 60.08155
## 30 30 40.57498 83.21472
## 31 31 48.27392 68.24131
## 32 32 63.40857 64.74623
## 33 33 42.63505 74.19713
## 34 34 49.50695 36.55820
## 35 35 39.64492 63.26139
## 36 36 58.16632 42.53282
## 37 37 46.85609 77.93879
## 38 38 53.21176 71.84144
## 39 39 24.56077 72.83616
## 40 40 47.94104 72.70324
## 41 41 54.54657 57.30181
## 42 42 65.36895 64.50104
## 43 43 63.35530 43.93461
## 44 44 38.06179 40.12883
## 45 45 52.79802 70.87511
## 46 46 53.09701 57.98622
## 47 47 52.26786 75.90067
## 48 48 60.27817 65.06789
## 49 49 49.96451 35.62894
## 50 50 62.85455 53.86589
## 51 51 61.18422 35.28333
## 52 52 73.07623 72.12643
## 53 53 62.47065 81.81560
## 54 54 55.20634 46.58676
## 55 55 50.93514 36.66812
## 56 56 55.74387 72.98336
## 57 57 56.41645 80.03891
## 58 58 50.67616 64.95174
## 59 59 42.12485 72.70400
## 60 60 33.46542 63.30804
## 61 61 45.10251 73.36602
## 62 62 47.66047 64.39439
## 63 63 53.12859 46.91366
## 64 64 49.11046 61.00457
## 65 65 45.98572 66.17874
## 66 66 53.55687 70.29459
## 67 67 59.53438 68.26805
## 68 68 71.76380 50.57808
## 69 69 42.81577 70.02416
## 70 70 34.62981 60.46338
## 71 71 43.46502 43.93160
## 72 72 59.06670 56.56050
## 73 73 53.00539 60.76549
## 74 74 43.51405 67.50167
## 75 75 54.59768 71.64025
## 76 76 43.57840 79.52546
## 77 77 29.80201 57.75304
## 78 78 45.01510 55.98704
## 79 79 48.30114 37.44258
## 80 80 45.19776 97.40030
## 81 81 54.44051 52.26339
## 82 82 56.73032 66.93843
## 83 83 46.66086 54.04959
## 84 84 33.72874 51.90589
## 85 85 32.07322 74.45711
## 86 86 53.97599 57.86733
## 87 87 33.15555 40.26646
## 88 88 50.55680 43.10695
## 89 89 33.68189 71.96778
## 90 90 45.95642 72.64398
## 91 91 51.90072 83.01396
## 92 92 50.25869 46.90754
## 93 93 62.41476 62.63873
## 94 94 44.93745 80.00130
## 95 95 61.23368 69.66618
## 96 96 50.22648 82.94298
## 97 97 62.92548 67.67482
## 98 98 57.15390 90.19502
## 99 99 39.31421 65.78667
## 100 100 52.15625 62.67658
student_data_standardized <- scale(student_data[, -1]) # Exclude student_id column
student_data_standardized
## student_engagement student_performance
## [1,] -1.47425931 -0.849831176
## [2,] 0.13464313 -0.252476997
## [3,] 1.18133983 -1.560996157
## [4,] -1.50105654 0.003018249
## [5,] 0.88211165 -0.551859344
## [6,] 0.64797986 -0.594423533
## [7,] -1.53928691 -0.773423312
## [8,] 1.57773046 0.660906516
## [9,] 0.26550394 -0.645098114
## [10,] 0.81247718 1.078450376
## [11,] 1.39840997 0.298858127
## [12,] 1.41080201 -1.419383310
## [13,] 1.08053955 1.370389384
## [14,] -0.68642397 0.619426886
## [15,] 1.03311144 0.194793823
## [16,] 1.17197240 2.464824285
## [17,] -1.99483508 -0.203641496
## [18,] 1.21105999 -1.224270011
## [19,] 0.29966080 -0.545776646
## [20,] -0.71597640 0.766693595
## [21,] -0.25930970 -0.319749034
## [22,] -1.73175475 0.149714787
## [23,] -0.19709689 -1.306556322
## [24,] -0.57053564 1.396096300
## [25,] 0.88182636 0.334533981
## [26,] -1.01442756 -0.381842118
## [27,] 0.16006166 -2.034670666
## [28,] 0.79892787 0.277768015
## [29,] -0.35245440 -0.166921164
## [30,] -0.97172364 1.485706629
## [31,] -0.21256683 0.416009564
## [32,] 1.27979268 0.166321991
## [33,] -0.76858934 0.841492217
## [34,] -0.09098263 -1.847423089
## [35,] -1.06343289 0.060245402
## [36,] 0.76287783 -1.420598051
## [37,] -0.35237271 1.108795511
## [38,] 0.27433123 0.673202175
## [39,] -2.55081177 0.744265117
## [40,] -0.24539004 0.734768973
## [41,] 0.40595045 -0.365505402
## [42,] 1.47309629 0.148806309
## [43,] 1.27453992 -1.320454071
## [44,] -1.21953806 -1.592338093
## [45,] 0.23353413 0.604168134
## [46,] 0.26301645 -0.316610723
## [47,] 0.18125794 0.963192313
## [48,] 0.97111751 0.189301395
## [49,] -0.04586503 -1.913809004
## [50,] 1.22516290 -0.610966175
## [51,] 1.06045948 -1.938499819
## [52,] 2.23307640 0.693561990
## [53,] 1.18730833 1.385753547
## [54,] 0.47100810 -1.130985333
## [55,] 0.04984461 -1.839570529
## [56,] 0.52401063 0.754780464
## [57,] 0.59033159 1.258827546
## [58,] 0.02430732 0.181004080
## [59,] -0.81889862 0.734823225
## [60,] -1.67276538 0.063578391
## [61,] -0.52528454 0.782117572
## [62,] -0.27305573 0.141187296
## [63,] 0.26613016 -1.107631910
## [64,] -0.13007961 -0.100981158
## [65,] -0.43819568 0.268660576
## [66,] 0.30836076 0.562695822
## [67,] 0.89777599 0.417919911
## [68,] 2.10366361 -0.845846336
## [69,] -0.75076981 0.543376438
## [70,] -1.55794968 -0.139643490
## [71,] -0.68675028 -1.320669188
## [72,] 0.85165988 -0.418464161
## [73,] 0.25398184 -0.118060647
## [74,] -0.68191562 0.363170472
## [75,] 0.41099077 0.658829192
## [76,] -0.67557031 1.222146562
## [77,] -2.03399782 -0.333269290
## [78,] -0.53390387 -0.459431926
## [79,] -0.20988241 -1.784243059
## [80,] -0.51589301 2.499120433
## [81,] 0.39549277 -0.725448637
## [82,] 0.62128058 0.322932703
## [83,] -0.37162344 -0.597843011
## [84,] -1.64680050 -0.750987946
## [85,] -1.81004321 0.860065292
## [86,] 0.34968872 -0.325104590
## [87,] -1.70332045 -1.582505853
## [88,] 0.01253756 -1.379582049
## [89,] -1.65142004 0.682227645
## [90,] -0.44108469 0.730535252
## [91,] 0.14505533 1.471364012
## [92,] -0.01685771 -1.108068799
## [93,] 1.18179703 0.015762727
## [94,] -0.54156089 1.256140312
## [95,] 1.06533591 0.517802237
## [96,] -0.02003365 1.466293146
## [97,] 1.23215739 0.375539677
## [98,] 0.66304733 1.984376991
## [99,] -1.09604310 0.240650937
## [100,] 0.17025257 0.018467234
## attr(,"scaled:center")
## student_engagement student_performance
## 50.42965 62.41808
## attr(,"scaled:scale")
## student_engagement student_performance
## 10.14143 13.99781
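# prcomp()'s scaling argument is `scale.` (with a trailing dot). The input is
# already standardized, so centering and scaling again leave it unchanged.
pca_result <- prcomp(student_data_standardized, center = TRUE, scale. = TRUE)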
pca_result
## Standard deviations (1, .., p=2):
## [1] 1.0103856 0.9895054
##
## Rotation (n x k) = (2 x 2):
## PC1 PC2
## student_engagement 0.7071068 0.7071068
## student_performance -0.7071068 0.7071068
summary(pca_result)
## Importance of components:
## PC1 PC2
## Standard deviation 1.0104 0.9895
## Proportion of Variance 0.5104 0.4896
## Cumulative Proportion 0.5104 1.0000
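The proportions in this summary follow directly from the component standard deviations: each proportion is the squared standard deviation divided by the sum of the squared standard deviations. For two standardized features with sample correlation r, the two component variances are 1 + |r| and 1 - |r|, so weakly correlated features split the variance almost evenly, as in the 51.0% / 49.0% result above. A short sketch on freshly simulated data follows; the seed and variable names are illustrative assumptions, not the values used in this report.

```r
set.seed(1)  # arbitrary seed, for illustration only
x <- cbind(
  engagement = rnorm(100, mean = 50, sd = 10),
  performance = rnorm(100, mean = 60, sd = 15)
)

pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Variance of each component and its share of the total
eigenvalues <- pca$sdev^2
prop_var <- eigenvalues / sum(eigenvalues)

# With two standardized variables the eigenvalues are 1 + |r| and 1 - |r|
r <- cor(x)[1, 2]
expected <- c(1 + abs(r), 1 - abs(r))

print(round(prop_var, 4))
print(all.equal(eigenvalues, expected))
```

When the share of PC1 is close to 50%, dropping PC2 would discard about half of the information, which is why PCA gives little reduction on two nearly uncorrelated features.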
# Scree plot
plot(pca_result, type = "l")
plot(pca_result)
biplot(pca_result)
Clustering the data using K-Means and other clustering algorithms
library(cluster)
library(factoextra)
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
library(dendextend)
##
## Attaching package: 'dendextend'
## The following object is masked from 'package:stats':
##
## cutree
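# Note: this draws a fresh sample instead of reusing simulate_student_features(),
# so the values below differ from the student_data used for PCA above.
student_data <- data.frame(
  student_id = 1:100,
  student_engagement = rnorm(100, mean = 50, sd = 10),
  student_performance = rnorm(100, mean = 60, sd = 15)
)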
features_for_clustering <- student_data[, c("student_engagement", "student_performance")]
features_for_clustering
## student_engagement student_performance
## 1 50.34141 50.87050
## 2 58.72042 73.12289
## 3 43.20279 84.44236
## 4 64.41171 68.40365
## 5 50.77767 41.25902
## 6 31.28675 45.02724
## 7 41.27575 72.30414
## 8 59.86722 62.27657
## 9 69.23595 83.63904
## 10 60.58023 67.34476
## 11 47.76827 73.75417
## 12 43.13707 53.56535
## 13 34.71940 70.69030
## 14 35.23064 42.72979
## 15 48.84513 106.85308
## 16 51.49570 45.37659
## 17 51.14283 56.88448
## 18 45.91954 67.37794
## 19 65.12505 50.21417
## 20 23.71014 77.40492
## 21 26.52903 55.94977
## 22 43.69522 42.71431
## 23 39.42058 41.04327
## 24 37.63265 45.29856
## 25 59.36463 60.75324
## 26 43.49067 49.15702
## 27 44.92786 83.98754
## 28 62.42045 49.83971
## 29 46.87671 51.89013
## 30 55.27773 57.85043
## 31 33.13091 80.73897
## 32 49.62973 66.57664
## 33 44.18231 53.28314
## 34 41.85276 86.08857
## 35 52.53889 64.85828
## 36 39.01903 84.57562
## 37 44.52350 51.35114
## 38 72.07040 44.45029
## 39 51.31941 93.99482
## 40 45.16741 58.35955
## 41 48.62702 56.28116
## 42 38.02379 69.67018
## 43 58.86010 76.71115
## 44 46.93476 53.61647
## 45 44.24499 42.26734
## 46 70.05082 52.74531
## 47 54.76850 42.22711
## 48 35.86835 51.70694
## 49 45.40187 70.08449
## 50 32.45119 69.52485
## 51 46.64593 62.56839
## 52 37.65238 82.59639
## 53 43.94986 57.54832
## 54 40.29002 71.22464
## 55 41.94026 53.32983
## 56 46.22579 72.02040
## 57 42.12285 69.53595
## 58 50.48794 41.17798
## 59 66.39980 54.33527
## 60 45.31590 61.73160
## 61 50.80898 38.50146
## 62 47.08932 72.81314
## 63 36.56073 67.21069
## 64 48.50858 62.17706
## 65 62.25178 79.81829
## 66 57.09046 63.61160
## 67 62.18861 92.80451
## 68 55.32697 57.21496
## 69 46.58509 84.21974
## 70 56.99658 34.24256
## 71 47.06972 80.99892
## 72 56.75350 47.98020
## 73 30.94684 51.25061
## 74 43.72396 58.74016
## 75 52.90853 49.67723
## 76 46.81471 71.47170
## 77 39.35296 59.00359
## 78 48.91035 65.59541
## 79 59.31772 32.66590
## 80 48.19276 53.97321
## 81 51.19977 46.36008
## 82 63.78180 69.04974
## 83 35.18343 83.85464
## 84 60.28445 45.78735
## 85 47.34873 69.03376
## 86 62.09446 55.45128
## 87 55.65226 68.95471
## 88 54.16094 37.21200
## 89 55.14687 65.02472
## 90 61.02509 90.14921
## 91 58.62450 48.63667
## 92 64.11730 50.52493
## 93 39.53840 55.47703
## 94 44.40291 54.40743
## 95 39.49910 75.42115
## 96 50.19683 57.89378
## 97 37.50572 77.89252
## 98 51.57948 49.24915
## 99 32.04550 49.26757
## 100 64.99983 64.46045
scaled_data <- scale(features_for_clustering)
scaled_data
## student_engagement student_performance
## [1,] 0.159597023 -0.720708827
## [2,] 0.983898026 0.766675554
## [3,] -0.542678660 1.523286434
## [4,] 1.543789565 0.451234234
## [5,] 0.202514306 -1.363155182
## [6,] -1.714942526 -1.111281142
## [7,] -0.732254677 0.711948974
## [8,] 1.096716945 0.041690657
## [9,] 2.018383503 1.469591343
## [10,] 1.166860522 0.380456188
## [11,] -0.093541222 0.808870819
## [12,] -0.549143751 -0.540580675
## [13,] -1.377249183 0.604077206
## [14,] -1.326954048 -1.264846458
## [15,] 0.012397271 3.021253376
## [16,] 0.273151917 -1.087930225
## [17,] 0.238437965 -0.318725021
## [18,] -0.275413121 0.382673965
## [19,] 1.613966118 -0.764578909
## [20,] -2.460305927 1.052892504
## [21,] -2.182992432 -0.381202466
## [22,] -0.494235577 -1.265881104
## [23,] -0.914761286 -1.377576046
## [24,] -1.090651752 -1.093146110
## [25,] 1.047273509 -0.060131088
## [26,] -0.514357964 -0.835240420
## [27,] -0.372971507 1.492885625
## [28,] 1.347895458 -0.789608206
## [29,] -0.181250233 -0.652555529
## [30,] 0.645216993 -0.254159295
## [31,] -1.533519839 1.275745976
## [32,] 0.089584037 0.329113838
## [33,] -0.446316273 -0.559444510
## [34,] -0.675490746 1.633321213
## [35,] 0.375777789 0.214255802
## [36,] -0.954264366 1.532193815
## [37,] -0.412751915 -0.688582400
## [38,] 2.297228235 -1.149845517
## [39,] 0.255809462 2.161787087
## [40,] -0.349405163 -0.220129446
## [41,] -0.009060072 -0.359052231
## [42,] -1.052173308 0.535891007
## [43,] 0.997638762 1.006519976
## [44,] -0.175538769 -0.537164067
## [45,] -0.440150207 -1.295757324
## [46,] 2.098547852 -0.595393814
## [47,] 0.595120394 -1.298446729
## [48,] -1.264218210 -0.664799759
## [49,] -0.326340172 0.563584010
## [50,] -1.600388480 0.526176840
## [51,] -0.203953227 0.061196163
## [52,] -1.088711201 1.399898747
## [53,] -0.469184455 -0.274352856
## [54,] -0.829228095 0.639793181
## [55,] -0.666882401 -0.556323619
## [56,] -0.245285782 0.692982864
## [57,] -0.648920296 0.526918754
## [58,] 0.174011906 -1.368571944
## [59,] 1.739372198 -0.489118481
## [60,] -0.334797239 0.005263829
## [61,] 0.205594452 -1.547474583
## [62,] -0.160334354 0.745971179
## [63,] -1.196104013 0.371494667
## [64,] -0.020711281 0.035039384
## [65,] 1.331302248 1.214206439
## [66,] 0.823547763 0.130926013
## [67,] 1.325087702 2.082225139
## [68,] 0.650060334 -0.296635130
## [69,] -0.209938242 1.508405638
## [70,] 0.814311884 -1.832145921
## [71,] -0.162262255 1.293121259
## [72,] 0.790398059 -0.913901102
## [73,] -1.748381209 -0.695301507
## [74,] -0.491407604 -0.194688774
## [75,] 0.412141882 -0.800468967
## [76,] -0.187349336 0.656307371
## [77,] -0.921413159 -0.177080281
## [78,] 0.018813416 0.263527092
## [79,] 1.042658895 -1.937532291
## [80,] -0.051781262 -0.513319030
## [81,] 0.244039861 -1.022192074
## [82,] 1.481821051 0.494419423
## [83,] -1.331598410 1.484002350
## [84,] 1.137762662 -1.060474348
## [85,] -0.134813586 0.493351331
## [86,] 1.315825535 -0.414522513
## [87,] 0.682061771 0.488067720
## [88,] 0.535350442 -1.633664129
## [89,] 0.632342956 0.225381174
## [90,] 1.210623851 1.904740845
## [91,] 0.974461999 -0.870021407
## [92,] 1.514826473 -0.743807073
## [93,] -0.903170638 -0.412801554
## [94,] -0.424614324 -0.484295152
## [95,] -0.907036117 0.920294512
## [96,] 0.145373203 -0.251261639
## [97,] -1.103138902 1.085484601
## [98,] 0.281393801 -0.829082605
## [99,] -1.640298483 -0.827851370
## [100,] 1.601647264 0.187664731
## attr(,"scaled:center")
## student_engagement student_performance
## 48.71911 61.65285
## attr(,"scaled:scale")
## student_engagement student_performance
## 10.16499 14.96075
kmeans_result <- kmeans(scaled_data, centers = 3)
kmeans_result
## K-means clustering with 3 clusters of sizes 42, 28, 30
##
## Cluster means:
## student_engagement student_performance
## 1 -0.3480864 -0.78316939
## 2 1.2301548 0.09485208
## 3 -0.6608236 1.00790854
##
## Clustering vector:
## [1] 1 2 3 2 1 1 3 2 2 2 3 1 3 1 3 1 1 3 2 3 1 1 1 1 2 1 3 2 1 2 3 3 1 3 2 3 1
## [38] 2 3 1 1 3 2 1 1 2 1 1 3 3 1 3 1 3 1 3 3 1 2 1 1 3 3 1 2 2 2 2 3 1 3 2 1 1
## [75] 1 3 1 3 1 1 1 2 3 2 3 2 2 1 2 2 2 2 1 1 3 1 3 1 1 2
##
## Within cluster sum of squares by cluster:
## [1] 32.04178 26.66170 22.24595
## (between_SS / total_SS = 59.1 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss" "tot.withinss"
## [6] "betweenss" "size" "iter" "ifault"
fviz_cluster(kmeans_result, data = scaled_data)
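The call above fixes k = 3 and uses a single random start. One common way to sanity-check both choices is to run K-Means with several random starts (the `nstart` argument of kmeans()) and compare the total within-cluster sum of squares across a range of k. The sketch below does this on freshly simulated data; the seed, the range of k, and nstart = 25 are illustrative assumptions rather than settings used in this report.

```r
set.seed(42)  # arbitrary seed so the sketch is repeatable
x <- scale(cbind(
  engagement = rnorm(100, mean = 50, sd = 10),
  performance = rnorm(100, mean = 60, sd = 15)
))

# Total within-cluster sum of squares for k = 1..6, using 25 random starts
# per k so that a single unlucky initialization does not drive the result.
ks <- 1:6
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 25)$tot.withinss)

# For k = 1 this equals the total sum of squares, (n - 1) * p = 198 here.
print(round(wss, 2))

# Elbow plot: look for the k after which the curve flattens.
plot(ks, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

If the curve declines smoothly with no clear bend, the data may not contain well-separated groups, and the choice of k rests mainly on the analytical objective.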
hclust_result <- hclust(dist(scaled_data), method = "ward.D2")
hclust_result
##
## Call:
## hclust(d = dist(scaled_data), method = "ward.D2")
##
## Cluster method : ward.D2
## Distance : euclidean
## Number of objects: 100
cluster_assignments <- cutree(hclust_result, k = 3)
cluster_assignments
## [1] 1 1 2 1 1 3 2 1 1 1 3 3 2 3 2 1 3 3 1 2 3 3 3 3 1 3 2 1 3 1 2 3 3 2 3 2 3
## [38] 1 2 3 3 2 1 3 3 1 1 3 3 2 3 2 3 2 3 3 2 1 1 3 1 3 2 3 1 1 1 1 2 1 2 1 3 3
## [75] 1 3 3 3 1 3 1 1 2 1 3 1 1 1 1 1 1 1 3 3 2 3 2 1 3 1
plot(hclust_result)
plot(hclust_result)
rect.hclust(hclust_result, k = 3, border = "red")
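To compare the K-Means and hierarchical results, the two sets of cluster labels can be cross-tabulated. The sketch below uses freshly simulated data with an arbitrary seed (both are assumptions for illustration), so its counts will not match the output above.

```r
set.seed(7)  # arbitrary seed for a repeatable sketch
x <- scale(cbind(
  engagement = rnorm(100, mean = 50, sd = 10),
  performance = rnorm(100, mean = 60, sd = 15)
))

km <- kmeans(x, centers = 3, nstart = 25)
hc <- stats::cutree(hclust(dist(x), method = "ward.D2"), k = 3)

# Rows: K-Means clusters; columns: hierarchical clusters.
# Cluster numbers are arbitrary labels, so look for rows dominated by a
# single column rather than for matching numbers on the diagonal.
agreement <- table(kmeans = km$cluster, hclust = hc)
print(agreement)
```

A table in which each row has one large cell indicates that the two methods recover similar groups; counts spread across a row indicate disagreement.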
Submit a report containing the following:
Dimensionality Reduction: The dimensionality reduction technique employed was Principal Component Analysis (PCA), which reduces the number of variables in a dataset while preserving as much of the original variance as possible. PCA was applied to the two standardized features, student_engagement and student_performance. It finds linear combinations of the features (principal components) and ranks them by the amount of variance they explain. Here PC1 explains 51.0% of the variance and PC2 explains 49.0%. Because the two simulated features are nearly uncorrelated, neither component dominates, and dropping one would discard about half of the variance, so both dimensions were kept for the clustering step.
Clustering (including K-Means and Hierarchical Clustering): Two clustering methods were applied to the standardized engagement and performance features. K-Means splits the data into a predetermined number of clusters (in this example, three) based on similarity in the feature space. Hierarchical clustering with Ward’s method produces a tree-like structure (a dendrogram) that lets us experiment with different levels of grouping granularity; cutting the tree at k = 3 gives three clusters. The number of clusters was chosen based on the analytical objective or domain expertise. K-Means and hierarchical clustering were both used on the same dataset in order to compare the outcomes and spot any trends in the data.
Interpretation: Based on the students’ engagement and performance, the analysis grouped the students into three clusters. These clusters serve as a foundation for segmenting the student population and customizing support or educational interventions. K-Means and hierarchical clustering were chosen for clustering, and PCA for dimensionality reduction, because they suit the dataset and the aims of the analysis. Further analysis and validation may be carried out to improve the clustering results and extract useful information from the data.
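One way to carry out the validation mentioned above is the average silhouette width, which ranges from -1 to 1; values near 0 suggest overlapping clusters and values near 1 suggest well-separated ones. The sketch below uses silhouette() from the cluster package on freshly simulated data. The seed and data are assumptions for illustration, not the values analyzed in this report.

```r
library(cluster)  # provides silhouette()

set.seed(3)  # arbitrary seed for a repeatable sketch
x <- scale(cbind(
  engagement = rnorm(100, mean = 50, sd = 10),
  performance = rnorm(100, mean = 60, sd = 15)
))

km <- kmeans(x, centers = 3, nstart = 25)

# silhouette() takes integer cluster labels and a distance object and
# returns one row per observation with a "sil_width" column.
sil <- silhouette(km$cluster, dist(x))
avg_width <- mean(sil[, "sil_width"])

print(round(avg_width, 3))
```

Repeating this for several values of k and for the hierarchical labels gives a simple basis for comparing the candidate partitions.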
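Number of Clusters Identified: Three student clusters were identified from the learning data (k = 3 was specified for both methods). The profiles below are idealized descriptions. In standardized units (engagement, performance), the K-Means centers reported above are (-0.35, -0.78) for cluster 1, (1.23, 0.09) for cluster 2, and (-0.66, 1.01) for cluster 3, so the fitted clusters only partly match these profiles.
Characteristics of Each Cluster: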
Cluster 1 - Highly Engaged and High-Performing Students:
This cluster may consist of students who exhibit both high engagement and high performance in the course. These students may participate actively, complete assignments on time, and consistently achieve high grades.
Educational Implication: Recognize these students as potential academic role models and consider advanced learning opportunities to further challenge and engage them.
Cluster 2 - Moderately Engaged and Moderate-Performing Students:
This cluster may include students with moderate engagement levels and average performance. Students here might engage with the course materials adequately but may not consistently excel.
Educational Implication: Provide additional support or resources to help these students improve their performance and motivation.
Cluster 3 - Low Engagement and Low-Performing Students:
This cluster may comprise students who demonstrate low engagement and struggle with their performance. Students in this cluster might be disengaged, miss assignments, or perform poorly in assessments.
Educational Implication: Implement targeted interventions, such as additional tutoring or personalized learning plans, to help these students catch up and increase their engagement.
Personalized Learning Paths:
The identification of different student clusters based on engagement and performance can inform the development of personalized learning paths. By recognizing students’ unique needs and characteristics within each cluster, educators and learning platforms can tailor content and interventions. For example, highly engaged and high-performing students may benefit from advanced coursework, while those in lower-performing clusters may need additional support.
Early Intervention:
Learning analytics can enable early intervention strategies. When students are categorized into clusters, educators can identify students at risk of falling behind (e.g., those in the low engagement and low-performing cluster) and provide timely support. This might include tutoring, mentorship, or targeted resources to address specific needs.
Resource Allocation:
Learning analytics can optimize resource allocation. By understanding the distribution of students across clusters, educational institutions can allocate resources more efficiently, for instance by directing more support to students in struggling clusters while streamlining resources for high-performing clusters.
Feedback and Improvement:
Feedback loops are crucial for continuous improvement. Learning analytics can provide insights into the effectiveness of interventions and instructional methods. Institutions can use this data to refine their teaching strategies and make data-driven decisions to enhance overall learning outcomes.
Predictive Analytics:
Clustering students based on their engagement and performance can serve as a foundation for predictive analytics. Machine learning models can be trained to predict which cluster a new student is likely to belong to based on their initial behavior. This can help educators identify potential issues early and intervene proactively.
Ethical Considerations:
It’s important to consider ethical implications. Learning analytics should be used responsibly, with a focus on student well-being and privacy. Ensure that the data collected and the actions taken based on the analysis align with ethical guidelines and regulations.
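The predictive analytics idea described above, assigning a new student to an existing cluster, can be sketched with a nearest-centroid rule: standardize the new observation with the training means and standard deviations, then pick the closest K-Means center. The helper `assign_cluster()`, the seed, and the example students below are hypothetical and serve only as an illustration; a trained classifier could be used instead.

```r
set.seed(11)  # arbitrary seed for a repeatable sketch
train <- data.frame(
  student_engagement = rnorm(100, mean = 50, sd = 10),
  student_performance = rnorm(100, mean = 60, sd = 15)
)
scaled <- scale(train)
km <- kmeans(scaled, centers = 3, nstart = 25)

# Hypothetical helper: assign new students to the nearest K-Means centroid,
# reusing the centering and scaling learned from the training data.
assign_cluster <- function(new_data, scaled_train, km_fit) {
  z <- scale(new_data,
             center = attr(scaled_train, "scaled:center"),
             scale = attr(scaled_train, "scaled:scale"))
  centers_t <- t(km_fit$centers)  # one column per centroid
  apply(z, 1, function(row) which.min(colSums((centers_t - row)^2)))
}

# Made-up new students, in the original (unscaled) units
new_students <- data.frame(
  student_engagement = c(30, 50, 70),
  student_performance = c(40, 60, 80)
)
predicted <- assign_cluster(new_students, scaled, km)
print(predicted)
```

Reusing the training center and scale matters: standardizing new students on their own would shift them relative to the centroids.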
Continuous Monitoring and Adaptation:
Learning analytics is an ongoing process. Institutions should continuously collect data and monitor the changing patterns in student behavior. As students progress through their academic journey, their needs and behaviors may evolve, requiring adaptive strategies.
In conclusion, the findings from this learning analytics study, which clustered students based on engagement and performance, have implications for optimizing educational practices. By leveraging these insights, educational institutions can enhance the learning experience, promote student success, and continually improve their teaching methods. However, it’s essential to approach learning analytics with sensitivity to student privacy and ethical considerations while focusing on the ultimate goal of improving learning outcomes.
Your report should include your code. Submit the published RPubs link to Blackboard.