Learning analytics is the use of data to understand and improve learning. Unsupervised learning is a type of machine learning that can be used to identify patterns and relationships in data without the need for labeled data.
The data for this case study are generated with the simulation function below. The data contain the following features:
Student ID: A unique identifier for each student
Feature 1: A measure of student engagement
Feature 2: A measure of student performance
simulate_student_features <- function(n = 100) {
  # Set the random seed
  set.seed(260923)
  # Generate unique student IDs
  student_ids <- seq(1, n)
  # Simulate student engagement
  student_engagement <- rnorm(n, mean = 50, sd = 10)
  # Simulate student performance
  student_performance <- rnorm(n, mean = 60, sd = 15)
  # Combine the data into a data frame
  student_features <- data.frame(
    student_id = student_ids,
    student_engagement = student_engagement,
    student_performance = student_performance
  )
  # Return the data frame
  return(student_features)
}
student_features <- simulate_student_features(n = 100)
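Before clustering, it can help to inspect what the simulation produces. The sketch below is a minimal check, not part of the original analysis: it re-creates the same draws (same seed, order, and parameters as simulate_student_features()) under the assumed name check_data so that the block runs on its own, then looks at the summary statistics and the correlation between the two features.

```r
# Re-create the simulation so this block is self-contained
# (check_data and feature_cor are illustrative names, not objects from the report).
set.seed(260923)
check_data <- data.frame(
  student_id = seq_len(100),
  student_engagement = rnorm(100, mean = 50, sd = 10),
  student_performance = rnorm(100, mean = 60, sd = 15)
)

# Structure and summary statistics of the simulated features
str(check_data)
summary(check_data[, c("student_engagement", "student_performance")])

# The two features are drawn independently, so their sample correlation
# should be close to zero (though not exactly zero in a finite sample).
feature_cor <- cor(check_data$student_engagement, check_data$student_performance)
feature_cor
```

A correlation near zero matters for what follows: with nearly uncorrelated features, neither PCA nor clustering has much shared structure to find.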
We can then use this data frame to perform unsupervised learning and identify groups of students with similar learning patterns. The following packages are loaded for the analysis:
library(stats)
library(dbscan)
library(cluster)
library(clustMixType)
library(mclust)
library(ggplot2)
n <- 100
student_ids <- seq(1, n)
student_ids
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
## [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
## [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
## [91] 91 92 93 94 95 96 97 98 99 100
# Draw a random sample of student IDs
sample_size <- 10
sample_size
## [1] 10
sample_students_ids <- sample(student_ids, size = sample_size)
# Note: the next line replaces the random sample with the fixed IDs 1 to 5
sample_students_ids <- c(1, 2, 3, 4, 5)
sample_students_ids
## [1] 1 2 3 4 5
# Print the sample
print(sample_students_ids)
## [1] 1 2 3 4 5
Simulate the data
library(stats)
library(graphics)
library(datasets)
library(methods)
library(utils)
library(grDevices)
simulate_student_features <- function(n = 100) {
  # Set the random seed
  set.seed(260923)
  # Generate unique student IDs
  student_ids <- seq(1, n)
  # Simulate student engagement
  student_engagement <- rnorm(n, mean = 50, sd = 10)
  # Simulate student performance
  student_performance <- rnorm(n, mean = 60, sd = 15)
  # Combine the data into a data frame
  student_features <- data.frame(
    student_id = student_ids,
    student_engagement = student_engagement,
    student_performance = student_performance
  )
  # Return the data frame
  return(student_features)
}
student_data <- simulate_student_features()
student_data
## student_id student_engagement student_performance
## 1 1 35.47855 50.52231
## 2 2 51.79512 58.88396
## 3 3 62.41012 40.56755
## 4 4 35.20679 62.46033
## 5 5 59.37552 54.69326
## 6 6 57.00109 54.09745
## 7 7 34.81908 51.59185
## 8 8 66.43009 71.66933
## 9 9 53.12224 53.38812
## 10 10 58.66933 77.51403
## 11 11 64.61152 66.60144
## 12 12 64.73720 42.54982
## 13 13 61.38786 81.60053
## 14 14 43.46833 71.08870
## 15 15 60.90687 65.14477
## 16 16 62.31512 96.92023
## 17 17 30.19917 59.56755
## 18 18 62.71153 45.28098
## 19 19 53.46864 54.77840
## 20 20 43.16863 73.15011
## 21 21 47.79988 57.94230
## 22 22 32.86718 64.51376
## 23 23 48.43080 44.12915
## 24 24 44.64360 81.96037
## 25 25 59.37263 67.10083
## 26 26 40.14190 57.07313
## 27 27 52.05290 33.93715
## 28 28 58.53192 66.30623
## 29 29 46.85526 60.08155
## 30 30 40.57498 83.21472
## 31 31 48.27392 68.24131
## 32 32 63.40857 64.74623
## 33 33 42.63505 74.19713
## 34 34 49.50695 36.55820
## 35 35 39.64492 63.26139
## 36 36 58.16632 42.53282
## 37 37 46.85609 77.93879
## 38 38 53.21176 71.84144
## 39 39 24.56077 72.83616
## 40 40 47.94104 72.70324
## 41 41 54.54657 57.30181
## 42 42 65.36895 64.50104
## 43 43 63.35530 43.93461
## 44 44 38.06179 40.12883
## 45 45 52.79802 70.87511
## 46 46 53.09701 57.98622
## 47 47 52.26786 75.90067
## 48 48 60.27817 65.06789
## 49 49 49.96451 35.62894
## 50 50 62.85455 53.86589
## 51 51 61.18422 35.28333
## 52 52 73.07623 72.12643
## 53 53 62.47065 81.81560
## 54 54 55.20634 46.58676
## 55 55 50.93514 36.66812
## 56 56 55.74387 72.98336
## 57 57 56.41645 80.03891
## 58 58 50.67616 64.95174
## 59 59 42.12485 72.70400
## 60 60 33.46542 63.30804
## 61 61 45.10251 73.36602
## 62 62 47.66047 64.39439
## 63 63 53.12859 46.91366
## 64 64 49.11046 61.00457
## 65 65 45.98572 66.17874
## 66 66 53.55687 70.29459
## 67 67 59.53438 68.26805
## 68 68 71.76380 50.57808
## 69 69 42.81577 70.02416
## 70 70 34.62981 60.46338
## 71 71 43.46502 43.93160
## 72 72 59.06670 56.56050
## 73 73 53.00539 60.76549
## 74 74 43.51405 67.50167
## 75 75 54.59768 71.64025
## 76 76 43.57840 79.52546
## 77 77 29.80201 57.75304
## 78 78 45.01510 55.98704
## 79 79 48.30114 37.44258
## 80 80 45.19776 97.40030
## 81 81 54.44051 52.26339
## 82 82 56.73032 66.93843
## 83 83 46.66086 54.04959
## 84 84 33.72874 51.90589
## 85 85 32.07322 74.45711
## 86 86 53.97599 57.86733
## 87 87 33.15555 40.26646
## 88 88 50.55680 43.10695
## 89 89 33.68189 71.96778
## 90 90 45.95642 72.64398
## 91 91 51.90072 83.01396
## 92 92 50.25869 46.90754
## 93 93 62.41476 62.63873
## 94 94 44.93745 80.00130
## 95 95 61.23368 69.66618
## 96 96 50.22648 82.94298
## 97 97 62.92548 67.67482
## 98 98 57.15390 90.19502
## 99 99 39.31421 65.78667
## 100 100 52.15625 62.67658
Data standardization
student_data_standardized <- scale(student_data[, -1]) # Exclude student_id column
student_data_standardized
## student_engagement student_performance
## [1,] -1.47425931 -0.849831176
## [2,] 0.13464313 -0.252476997
## [3,] 1.18133983 -1.560996157
## [4,] -1.50105654 0.003018249
## [5,] 0.88211165 -0.551859344
## [6,] 0.64797986 -0.594423533
## [7,] -1.53928691 -0.773423312
## [8,] 1.57773046 0.660906516
## [9,] 0.26550394 -0.645098114
## [10,] 0.81247718 1.078450376
## [11,] 1.39840997 0.298858127
## [12,] 1.41080201 -1.419383310
## [13,] 1.08053955 1.370389384
## [14,] -0.68642397 0.619426886
## [15,] 1.03311144 0.194793823
## [16,] 1.17197240 2.464824285
## [17,] -1.99483508 -0.203641496
## [18,] 1.21105999 -1.224270011
## [19,] 0.29966080 -0.545776646
## [20,] -0.71597640 0.766693595
## [21,] -0.25930970 -0.319749034
## [22,] -1.73175475 0.149714787
## [23,] -0.19709689 -1.306556322
## [24,] -0.57053564 1.396096300
## [25,] 0.88182636 0.334533981
## [26,] -1.01442756 -0.381842118
## [27,] 0.16006166 -2.034670666
## [28,] 0.79892787 0.277768015
## [29,] -0.35245440 -0.166921164
## [30,] -0.97172364 1.485706629
## [31,] -0.21256683 0.416009564
## [32,] 1.27979268 0.166321991
## [33,] -0.76858934 0.841492217
## [34,] -0.09098263 -1.847423089
## [35,] -1.06343289 0.060245402
## [36,] 0.76287783 -1.420598051
## [37,] -0.35237271 1.108795511
## [38,] 0.27433123 0.673202175
## [39,] -2.55081177 0.744265117
## [40,] -0.24539004 0.734768973
## [41,] 0.40595045 -0.365505402
## [42,] 1.47309629 0.148806309
## [43,] 1.27453992 -1.320454071
## [44,] -1.21953806 -1.592338093
## [45,] 0.23353413 0.604168134
## [46,] 0.26301645 -0.316610723
## [47,] 0.18125794 0.963192313
## [48,] 0.97111751 0.189301395
## [49,] -0.04586503 -1.913809004
## [50,] 1.22516290 -0.610966175
## [51,] 1.06045948 -1.938499819
## [52,] 2.23307640 0.693561990
## [53,] 1.18730833 1.385753547
## [54,] 0.47100810 -1.130985333
## [55,] 0.04984461 -1.839570529
## [56,] 0.52401063 0.754780464
## [57,] 0.59033159 1.258827546
## [58,] 0.02430732 0.181004080
## [59,] -0.81889862 0.734823225
## [60,] -1.67276538 0.063578391
## [61,] -0.52528454 0.782117572
## [62,] -0.27305573 0.141187296
## [63,] 0.26613016 -1.107631910
## [64,] -0.13007961 -0.100981158
## [65,] -0.43819568 0.268660576
## [66,] 0.30836076 0.562695822
## [67,] 0.89777599 0.417919911
## [68,] 2.10366361 -0.845846336
## [69,] -0.75076981 0.543376438
## [70,] -1.55794968 -0.139643490
## [71,] -0.68675028 -1.320669188
## [72,] 0.85165988 -0.418464161
## [73,] 0.25398184 -0.118060647
## [74,] -0.68191562 0.363170472
## [75,] 0.41099077 0.658829192
## [76,] -0.67557031 1.222146562
## [77,] -2.03399782 -0.333269290
## [78,] -0.53390387 -0.459431926
## [79,] -0.20988241 -1.784243059
## [80,] -0.51589301 2.499120433
## [81,] 0.39549277 -0.725448637
## [82,] 0.62128058 0.322932703
## [83,] -0.37162344 -0.597843011
## [84,] -1.64680050 -0.750987946
## [85,] -1.81004321 0.860065292
## [86,] 0.34968872 -0.325104590
## [87,] -1.70332045 -1.582505853
## [88,] 0.01253756 -1.379582049
## [89,] -1.65142004 0.682227645
## [90,] -0.44108469 0.730535252
## [91,] 0.14505533 1.471364012
## [92,] -0.01685771 -1.108068799
## [93,] 1.18179703 0.015762727
## [94,] -0.54156089 1.256140312
## [95,] 1.06533591 0.517802237
## [96,] -0.02003365 1.466293146
## [97,] 1.23215739 0.375539677
## [98,] 0.66304733 1.984376991
## [99,] -1.09604310 0.240650937
## [100,] 0.17025257 0.018467234
## attr(,"scaled:center")
## student_engagement student_performance
## 50.42965 62.41808
## attr(,"scaled:scale")
## student_engagement student_performance
## 10.14143 13.99781
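The scaled:center and scaled:scale attributes printed above are the column means and sample standard deviations that scale() used. As a minimal sketch of what standardization does, the block below computes z-scores by hand and compares them with scale(). It uses illustrative data simulated inside the block (the seed and the names x, scaled_auto, and scaled_manual are assumptions, not objects from the report).

```r
set.seed(1)  # arbitrary seed, for this illustration only
x <- data.frame(
  student_engagement = rnorm(100, mean = 50, sd = 10),
  student_performance = rnorm(100, mean = 60, sd = 15)
)

# Standardize with scale()
scaled_auto <- scale(x)

# Manual z-scores: subtract the column mean, then divide by the sample standard deviation
centered <- sweep(as.matrix(x), 2, colMeans(x), "-")
scaled_manual <- sweep(centered, 2, apply(x, 2, sd), "/")

# The two agree once the attributes added by scale() are ignored
all.equal(scaled_manual, scaled_auto, check.attributes = FALSE)

# Each standardized column has mean 0 and standard deviation 1
round(colMeans(scaled_auto), 10)
apply(scaled_auto, 2, sd)
```

Standardizing puts both features on the same scale, so that performance (sd = 15) does not dominate engagement (sd = 10) in distance-based methods such as k-means.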
Perform PCA
# The input is already standardized, so center and scale. leave it unchanged
pca_result <- prcomp(student_data_standardized, center = TRUE, scale. = TRUE)
pca_result
## Standard deviations (1, .., p=2):
## [1] 1.0103856 0.9895054
##
## Rotation (n x k) = (2 x 2):
## PC1 PC2
## student_engagement 0.7071068 0.7071068
## student_performance -0.7071068 0.7071068
Explore PCA results
summary(pca_result)
## Importance of components:
## PC1 PC2
## Standard deviation 1.0104 0.9895
## Proportion of Variance 0.5104 0.4896
## Cumulative Proportion 0.5104 1.0000
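The proportions in this summary follow directly from the component standard deviations: each proportion is a squared standard deviation divided by the sum of the squared standard deviations. For the values above, 1.0104^2 is roughly 1.021 and 1.021 / 2 is roughly 0.510. The sketch below reproduces that calculation on illustrative data simulated inside the block (seed and object names are assumptions). It also shows two facts that hold for any PCA of exactly two standardized variables: the eigenvalues are 1 + |r| and 1 - |r|, where r is the correlation between the variables, and every loading has magnitude 1/sqrt(2).

```r
set.seed(1)  # arbitrary seed, for this illustration only
x <- scale(cbind(
  student_engagement = rnorm(100, mean = 50, sd = 10),
  student_performance = rnorm(100, mean = 60, sd = 15)
))
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Proportion of variance = squared standard deviation of each component / total
eigenvalues <- pca$sdev^2
prop_var <- eigenvalues / sum(eigenvalues)
prop_var

# For two standardized variables the eigenvalues are 1 + |r| and 1 - |r|
r <- cor(x[, 1], x[, 2])
c(1 + abs(r), 1 - abs(r))

# The loadings all have magnitude 1 / sqrt(2), regardless of the data
abs(pca$rotation)
```

This is why the rotation matrix above contains only 0.7071068 in absolute value: with two standardized features the component directions are fixed, and only the split of variance between them depends on the data.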
Visualize PCA results
# Scree plot (line version)
plot(pca_result, type = "l")
# Scree plot (default bar version)
plot(pca_result)
# Biplot of student scores and feature loadings
biplot(pca_result)
Clustering the data using K-means and other clustering algorithms
library(cluster)
library(factoextra)
library(dendextend)
Create a sample student dataset
# Note: this draws a fresh sample and overwrites the student_data simulated earlier
student_data <- data.frame(
student_id = 1:100,
student_engagement = rnorm(100, mean = 50, sd = 10),
student_performance = rnorm(100, mean = 60, sd = 15)
)
features_for_clustering <- student_data[, c("student_engagement", "student_performance")]
features_for_clustering
## student_engagement student_performance
## 1 50.34141 50.87050
## 2 58.72042 73.12289
## 3 43.20279 84.44236
## 4 64.41171 68.40365
## 5 50.77767 41.25902
## 6 31.28675 45.02724
## 7 41.27575 72.30414
## 8 59.86722 62.27657
## 9 69.23595 83.63904
## 10 60.58023 67.34476
## 11 47.76827 73.75417
## 12 43.13707 53.56535
## 13 34.71940 70.69030
## 14 35.23064 42.72979
## 15 48.84513 106.85308
## 16 51.49570 45.37659
## 17 51.14283 56.88448
## 18 45.91954 67.37794
## 19 65.12505 50.21417
## 20 23.71014 77.40492
## 21 26.52903 55.94977
## 22 43.69522 42.71431
## 23 39.42058 41.04327
## 24 37.63265 45.29856
## 25 59.36463 60.75324
## 26 43.49067 49.15702
## 27 44.92786 83.98754
## 28 62.42045 49.83971
## 29 46.87671 51.89013
## 30 55.27773 57.85043
## 31 33.13091 80.73897
## 32 49.62973 66.57664
## 33 44.18231 53.28314
## 34 41.85276 86.08857
## 35 52.53889 64.85828
## 36 39.01903 84.57562
## 37 44.52350 51.35114
## 38 72.07040 44.45029
## 39 51.31941 93.99482
## 40 45.16741 58.35955
## 41 48.62702 56.28116
## 42 38.02379 69.67018
## 43 58.86010 76.71115
## 44 46.93476 53.61647
## 45 44.24499 42.26734
## 46 70.05082 52.74531
## 47 54.76850 42.22711
## 48 35.86835 51.70694
## 49 45.40187 70.08449
## 50 32.45119 69.52485
## 51 46.64593 62.56839
## 52 37.65238 82.59639
## 53 43.94986 57.54832
## 54 40.29002 71.22464
## 55 41.94026 53.32983
## 56 46.22579 72.02040
## 57 42.12285 69.53595
## 58 50.48794 41.17798
## 59 66.39980 54.33527
## 60 45.31590 61.73160
## 61 50.80898 38.50146
## 62 47.08932 72.81314
## 63 36.56073 67.21069
## 64 48.50858 62.17706
## 65 62.25178 79.81829
## 66 57.09046 63.61160
## 67 62.18861 92.80451
## 68 55.32697 57.21496
## 69 46.58509 84.21974
## 70 56.99658 34.24256
## 71 47.06972 80.99892
## 72 56.75350 47.98020
## 73 30.94684 51.25061
## 74 43.72396 58.74016
## 75 52.90853 49.67723
## 76 46.81471 71.47170
## 77 39.35296 59.00359
## 78 48.91035 65.59541
## 79 59.31772 32.66590
## 80 48.19276 53.97321
## 81 51.19977 46.36008
## 82 63.78180 69.04974
## 83 35.18343 83.85464
## 84 60.28445 45.78735
## 85 47.34873 69.03376
## 86 62.09446 55.45128
## 87 55.65226 68.95471
## 88 54.16094 37.21200
## 89 55.14687 65.02472
## 90 61.02509 90.14921
## 91 58.62450 48.63667
## 92 64.11730 50.52493
## 93 39.53840 55.47703
## 94 44.40291 54.40743
## 95 39.49910 75.42115
## 96 50.19683 57.89378
## 97 37.50572 77.89252
## 98 51.57948 49.24915
## 99 32.04550 49.26757
## 100 64.99983 64.46045
scaled_data <- scale(features_for_clustering)
scaled_data
## student_engagement student_performance
## [1,] 0.159597023 -0.720708827
## [2,] 0.983898026 0.766675554
## [3,] -0.542678660 1.523286434
## [4,] 1.543789565 0.451234234
## [5,] 0.202514306 -1.363155182
## [6,] -1.714942526 -1.111281142
## [7,] -0.732254677 0.711948974
## [8,] 1.096716945 0.041690657
## [9,] 2.018383503 1.469591343
## [10,] 1.166860522 0.380456188
## [11,] -0.093541222 0.808870819
## [12,] -0.549143751 -0.540580675
## [13,] -1.377249183 0.604077206
## [14,] -1.326954048 -1.264846458
## [15,] 0.012397271 3.021253376
## [16,] 0.273151917 -1.087930225
## [17,] 0.238437965 -0.318725021
## [18,] -0.275413121 0.382673965
## [19,] 1.613966118 -0.764578909
## [20,] -2.460305927 1.052892504
## [21,] -2.182992432 -0.381202466
## [22,] -0.494235577 -1.265881104
## [23,] -0.914761286 -1.377576046
## [24,] -1.090651752 -1.093146110
## [25,] 1.047273509 -0.060131088
## [26,] -0.514357964 -0.835240420
## [27,] -0.372971507 1.492885625
## [28,] 1.347895458 -0.789608206
## [29,] -0.181250233 -0.652555529
## [30,] 0.645216993 -0.254159295
## [31,] -1.533519839 1.275745976
## [32,] 0.089584037 0.329113838
## [33,] -0.446316273 -0.559444510
## [34,] -0.675490746 1.633321213
## [35,] 0.375777789 0.214255802
## [36,] -0.954264366 1.532193815
## [37,] -0.412751915 -0.688582400
## [38,] 2.297228235 -1.149845517
## [39,] 0.255809462 2.161787087
## [40,] -0.349405163 -0.220129446
## [41,] -0.009060072 -0.359052231
## [42,] -1.052173308 0.535891007
## [43,] 0.997638762 1.006519976
## [44,] -0.175538769 -0.537164067
## [45,] -0.440150207 -1.295757324
## [46,] 2.098547852 -0.595393814
## [47,] 0.595120394 -1.298446729
## [48,] -1.264218210 -0.664799759
## [49,] -0.326340172 0.563584010
## [50,] -1.600388480 0.526176840
## [51,] -0.203953227 0.061196163
## [52,] -1.088711201 1.399898747
## [53,] -0.469184455 -0.274352856
## [54,] -0.829228095 0.639793181
## [55,] -0.666882401 -0.556323619
## [56,] -0.245285782 0.692982864
## [57,] -0.648920296 0.526918754
## [58,] 0.174011906 -1.368571944
## [59,] 1.739372198 -0.489118481
## [60,] -0.334797239 0.005263829
## [61,] 0.205594452 -1.547474583
## [62,] -0.160334354 0.745971179
## [63,] -1.196104013 0.371494667
## [64,] -0.020711281 0.035039384
## [65,] 1.331302248 1.214206439
## [66,] 0.823547763 0.130926013
## [67,] 1.325087702 2.082225139
## [68,] 0.650060334 -0.296635130
## [69,] -0.209938242 1.508405638
## [70,] 0.814311884 -1.832145921
## [71,] -0.162262255 1.293121259
## [72,] 0.790398059 -0.913901102
## [73,] -1.748381209 -0.695301507
## [74,] -0.491407604 -0.194688774
## [75,] 0.412141882 -0.800468967
## [76,] -0.187349336 0.656307371
## [77,] -0.921413159 -0.177080281
## [78,] 0.018813416 0.263527092
## [79,] 1.042658895 -1.937532291
## [80,] -0.051781262 -0.513319030
## [81,] 0.244039861 -1.022192074
## [82,] 1.481821051 0.494419423
## [83,] -1.331598410 1.484002350
## [84,] 1.137762662 -1.060474348
## [85,] -0.134813586 0.493351331
## [86,] 1.315825535 -0.414522513
## [87,] 0.682061771 0.488067720
## [88,] 0.535350442 -1.633664129
## [89,] 0.632342956 0.225381174
## [90,] 1.210623851 1.904740845
## [91,] 0.974461999 -0.870021407
## [92,] 1.514826473 -0.743807073
## [93,] -0.903170638 -0.412801554
## [94,] -0.424614324 -0.484295152
## [95,] -0.907036117 0.920294512
## [96,] 0.145373203 -0.251261639
## [97,] -1.103138902 1.085484601
## [98,] 0.281393801 -0.829082605
## [99,] -1.640298483 -0.827851370
## [100,] 1.601647264 0.187664731
## attr(,"scaled:center")
## student_engagement student_performance
## 48.71911 61.65285
## attr(,"scaled:scale")
## student_engagement student_performance
## 10.16499 14.96075
# Perform K-means clustering with k = 3
kmeans_result <- kmeans(scaled_data, centers = 3)
kmeans_result
## K-means clustering with 3 clusters of sizes 42, 28, 30
##
## Cluster means:
## student_engagement student_performance
## 1 -0.3480864 -0.78316939
## 2 1.2301548 0.09485208
## 3 -0.6608236 1.00790854
##
## Clustering vector:
## [1] 1 2 3 2 1 1 3 2 2 2 3 1 3 1 3 1 1 3 2 3 1 1 1 1 2 1 3 2 1 2 3 3 1 3 2 3 1
## [38] 2 3 1 1 3 2 1 1 2 1 1 3 3 1 3 1 3 1 3 3 1 2 1 1 3 3 1 2 2 2 2 3 1 3 2 1 1
## [75] 1 3 1 3 1 1 1 2 3 2 3 2 2 1 2 2 2 2 1 1 3 1 3 1 1 2
##
## Within cluster sum of squares by cluster:
## [1] 32.04178 26.66170 22.24595
## (between_SS / total_SS = 59.1 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss" "tot.withinss"
## [6] "betweenss" "size" "iter" "ifault"
# Visualize K-means clusters
fviz_cluster(kmeans_result, data = scaled_data)
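kmeans() starts from randomly chosen centroids, so the partition it returns can depend on the state of the random number generator. A minimal sketch of one way to make the result reproducible and less sensitive to a poor start is to set a seed immediately before the call and request several random starts with nstart. The data, seeds, and object names below are illustrative assumptions, not the values used in the report.

```r
set.seed(1)  # arbitrary seed for the illustrative data
x <- scale(cbind(
  student_engagement = rnorm(100, mean = 50, sd = 10),
  student_performance = rnorm(100, mean = 60, sd = 15)
))

# Fix the seed right before kmeans() and try 25 random starts;
# kmeans() keeps the start with the lowest total within-cluster sum of squares.
set.seed(42)
km_a <- kmeans(x, centers = 3, nstart = 25)

set.seed(42)
km_b <- kmeans(x, centers = 3, nstart = 25)

# The same seed and settings give the same partition
identical(km_a$cluster, km_b$cluster)

# Cluster sizes and the share of the total sum of squares between clusters
km_a$size
km_a$betweenss / km_a$totss
```

The last ratio is the same quantity that the printed k-means output reports as between_SS / total_SS.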
# Hierarchical clustering with Ward's method on Euclidean distances
hclust_result <- hclust(dist(scaled_data), method = "ward.D2")
hclust_result
##
## Call:
## hclust(d = dist(scaled_data), method = "ward.D2")
##
## Cluster method : ward.D2
## Distance : euclidean
## Number of objects: 100
# Cut the tree into three clusters
cluster_assignments <- cutree(hclust_result, k = 3)
cluster_assignments
## [1] 1 1 2 1 1 3 2 1 1 1 3 3 2 3 2 1 3 3 1 2 3 3 3 3 1 3 2 1 3 1 2 3 3 2 3 2 3
## [38] 1 2 3 3 2 1 3 3 1 1 3 3 2 3 2 3 2 3 3 2 1 1 3 1 3 2 3 1 1 1 1 2 1 2 1 3 3
## [75] 1 3 3 3 1 3 1 1 2 1 3 1 1 1 1 1 1 1 3 3 2 3 2 1 3 1
# Dendrogram with the three-cluster cut outlined in red
plot(hclust_result)
rect.hclust(hclust_result, k = 3, border = "red")
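Cluster numbers are arbitrary labels, so cluster 1 from k-means need not match cluster 1 from the dendrogram cut. One way to compare the two partitions is a cross-tabulation together with the Rand index, the share of student pairs that both methods treat the same way (both together or both apart). The sketch below uses base R only, on illustrative data simulated inside the block. The seeds and the names km, hc, agreement, and rand_index are assumptions, not objects from the report.

```r
set.seed(1)  # arbitrary seed for the illustrative data
x <- scale(cbind(
  student_engagement = rnorm(100, mean = 50, sd = 10),
  student_performance = rnorm(100, mean = 60, sd = 15)
))

# Two partitions of the same students
set.seed(42)
km <- kmeans(x, centers = 3, nstart = 25)
hc <- stats::cutree(hclust(dist(x), method = "ward.D2"), k = 3)

# Cross-tabulation: rows are k-means clusters, columns are hierarchical clusters
agreement <- table(kmeans = km$cluster, hierarchical = hc)
agreement

# Rand index: proportion of student pairs on which the two partitions agree
same_km <- outer(km$cluster, km$cluster, "==")
same_hc <- outer(hc, hc, "==")
pairs <- upper.tri(same_km)
rand_index <- mean(same_km[pairs] == same_hc[pairs])
rand_index
```

A cross-tabulation with one large count per row indicates that the methods found similar groups under different labels. For a chance-corrected version of the index, adjustedRandIndex() from the mclust package loaded earlier is an alternative.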
Submit a report containing the following:
A brief description of the approach to dimensionality reduction and clustering applied in the analysis
Dimensionality Reduction
Principal Component Analysis (PCA) was used as the dimensionality reduction method. PCA reduces the number of variables in a dataset while retaining as much of the original variance as possible. In this analysis it was applied to the two standardized features, student_engagement and student_performance. PCA identifies linear combinations of these features (principal components) and orders them by the amount of variance they explain. Here PC1 explains 51.0% of the variance and PC2 explains 49.0%, so the two components are almost equally informative and dropping either one would discard about half of the variance. With only two nearly uncorrelated features, PCA therefore serves as a check on the structure of the data rather than as a way to reduce it.
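A property worth knowing when PCA is followed by clustering: if every component is kept, the scores are only a rotation of the standardized data, so pairwise Euclidean distances do not change. Distance-based clustering should then give the same groups on either representation. The sketch below illustrates this on data simulated inside the block (the seed and the names x, pca, and scores are assumptions, not the report's objects).

```r
set.seed(1)  # arbitrary seed for the illustrative data
x <- scale(cbind(
  student_engagement = rnorm(100, mean = 50, sd = 10),
  student_performance = rnorm(100, mean = 60, sd = 15)
))
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Scores on both principal components (nothing is dropped)
scores <- pca$x

# A rotation leaves pairwise Euclidean distances unchanged
distances_match <- all.equal(as.vector(dist(x)), as.vector(dist(scores)))
distances_match

# Ward clustering on either representation should therefore agree
hc_raw <- stats::cutree(hclust(dist(x), method = "ward.D2"), k = 3)
hc_pca <- stats::cutree(hclust(dist(scores), method = "ward.D2"), k = 3)
table(standardized = hc_raw, pca_scores = hc_pca)
```

Clustering on PCA scores only changes the result when some components are discarded, which is the situation where dimensionality reduction actually takes place.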
Clustering (including K-Means and Hierarchical Clustering):
Two clustering algorithms were applied to the standardized features (scaled_data) of a freshly simulated dataset of 100 students. The clustering was run on these standardized features, not on the PCA scores.
K-Means Clustering: divides the data into a preset number of clusters (in this case, three) based on similarity in the standardized feature space.
Hierarchical Clustering (Ward’s Method): builds a tree-like structure of clusters that can be cut at different heights to explore coarser or finer groupings.
The number of clusters for K-Means was set to three based on the analytical goal or domain knowledge. Both K-Means and hierarchical clustering were applied to the same dataset so that their results could be compared.
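The choice of three clusters can also be checked against the data. A minimal sketch, assuming the average silhouette width from the cluster package as the criterion (one of several possible criteria, and not the method used in this report): values closer to 1 indicate compact, well-separated clusters, and values near 0 indicate overlapping ones. The data, seeds, and candidate range 2 to 6 are illustrative assumptions.

```r
library(cluster)  # provides silhouette()

set.seed(1)  # arbitrary seed for the illustrative data
x <- scale(cbind(
  student_engagement = rnorm(100, mean = 50, sd = 10),
  student_performance = rnorm(100, mean = 60, sd = 15)
))
d <- dist(x)

# Average silhouette width for each candidate number of clusters
candidate_k <- 2:6
avg_sil <- sapply(candidate_k, function(k) {
  set.seed(42)  # same starting state for every k
  km <- kmeans(x, centers = k, nstart = 25)
  mean(silhouette(km$cluster, d)[, "sil_width"])
})
names(avg_sil) <- candidate_k
avg_sil

# Candidate with the highest average silhouette width
candidate_k[which.max(avg_sil)]
```

On data simulated from a single pair of independent normal distributions, low silhouette widths for every k would be expected, because no natural groups were built into the data.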
Interpretation
The analysis partitioned the students into three clusters based on their engagement and performance. These clusters provide a basis for segmenting the student population and tailoring educational interventions or support strategies. Because the number of clusters was fixed in advance and the data were simulated from two independent normal distributions, the clusters describe a convenient partition rather than evidence of naturally separate groups. PCA, K-Means, and hierarchical clustering were chosen for their suitability to the dataset and the goals of the analysis. Further analysis and validation would be needed to refine the clustering results and derive actionable insights from the data.
The K-means and hierarchical clustering results provide information about the structure of the student data. They were interpreted as follows.
K-Means Clustering: Three clusters of 42, 28, and 30 students were identified, which together account for 59.1% of the total sum of squares (between_SS / total_SS). Each cluster has its own centroid, the mean of the standardized student_engagement and student_performance values of its members.
Cluster 1: Both engagement (-0.35) and performance (-0.78) are below average.
Cluster 2: Engagement is high (1.23) and performance is about average (0.09).
Cluster 3: Engagement is below average (-0.66) and performance is above average (1.01).
Each student is assigned to one of these clusters by the clustering vector.
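The centroids reported by kmeans() are in standardized units, which can be hard to read as engagement or performance scores. They can be converted back by multiplying by the scaled:scale attribute and adding the scaled:center attribute. For example, a standardized engagement centroid of 1.23 with center 48.72 and scale 10.16 corresponds to roughly 61.2 on the original scale. The sketch below shows the conversion on illustrative data simulated inside the block (seeds and object names are assumptions) and confirms it against the per-cluster means of the raw values.

```r
set.seed(1)  # arbitrary seed for the illustrative data
raw <- cbind(
  student_engagement = rnorm(100, mean = 50, sd = 10),
  student_performance = rnorm(100, mean = 60, sd = 15)
)
x <- scale(raw)

set.seed(42)
km <- kmeans(x, centers = 3, nstart = 25)

# Undo the standardization: multiply by the column sd, then add the column mean
centers_original <- sweep(
  sweep(km$centers, 2, attr(x, "scaled:scale"), "*"),
  2, attr(x, "scaled:center"), "+"
)
centers_original

# Cross-check: mean of the raw values within each cluster
direct <- apply(raw, 2, function(col) tapply(col, km$cluster, mean))
direct
```

Reporting centroids in original units makes statements such as "above-average performance" concrete for readers who do not work with z-scores.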
Ward's Method of Hierarchical Clustering
Using Ward’s method and Euclidean distance, hierarchical clustering grouped the students according to their similarity. The dendrogram shows the order in which the clusters merged. Cutting the tree at k = 3 gives three groups, the same number as in K-means. The cluster numbers assigned by the two methods are arbitrary, so cluster 1 in one method need not correspond to cluster 1 in the other.
Interpretation
Based on engagement and performance, the K-means results divide the dataset into three groups of students.
Students in Cluster 1 are neither highly engaged nor high performers. Cluster 2 comprises students who are highly engaged but perform only about average. Students in Cluster 3 have below-average engagement but above-average performance. These clusters may be useful for targeted interventions or customized support. Cluster 2 may benefit from additional resources to boost performance, whereas Cluster 3 may need strategies to increase participation. Cluster 1 may need extensive support with both engagement and performance. The choice between K-means and hierarchical clustering depends on the aims of the analysis and the properties of the data.
The findings of the scoping review on theoretical influences in learning analytics (Khalil, Prinsloo, & Slade, 2022) have several important implications for the field:
Theoretical Pluralism is Beneficial: The review highlights the wide spectrum of theoretical influences in learning analytics, with no single theory prevailing. This theoretical variety should be valued and preserved. It implies that learning analytics is not bound to a single theoretical perspective, which allows academics and practitioners to draw on a diverse range of ideas and methodologies.
Theory is Critical in Data-Driven Methodologies: Despite the abundance of data available in learning analytics, the role of theory remains critical. According to the review, larger datasets increase rather than reduce the need for theory in analysis. Theory helps researchers make sense of data, identify relevant factors, interpret outcomes, and turn data-driven findings into actionable insights.
Learning Analytics is Intrinsically Multidisciplinary: Learning analytics draws on subjects such as computer science, educational psychology, neuroscience, and anthropology. The paper underlines the need for learning analytics academics and practitioners to embrace this multidisciplinary character and to explore diverse theories from these disciplines.
Challenges in Bridging Theory and Practice: The review identifies a potential gap between the academic community’s emphasis on theory and the practical, data-driven orientation of learning analytics programs. Bridging this gap is critical for the progress of the field, as it ensures that theoretical findings are converted into successful educational interventions and advances.
Continuous Learning Theory Exploration: Learning analytics should continue to investigate multiple learning theories and adapt them to its context. While traditional theories such as behaviorism, cognitivism, and constructivism remain relevant, newer theories such as connectivism have also found a home in the field. Researchers should keep an open mind to new theories that reflect changing settings and technologies.
Greater Integration: The review points to a need for greater integration across the groups within learning analytics, particularly between those focused on data-driven, practical approaches and those prioritizing theoretical advancement. Closer integration can lead to more holistic and effective learning analytics research and practice.
Importance of Ethical Considerations: As learning analytics evolves, ethical considerations, particularly those related to data privacy and algorithmic bias, should be informed by appropriate ethical theories. The review does not address ethical considerations explicitly, yet they are an important component of responsible learning analytics practice.
In conclusion, the scoping review findings indicate that learning analytics is an interdisciplinary field with a rich theoretical terrain. This diversity should be used to spur innovation and improve educational outcomes. For the benefit of students and educators, theoretical perspectives should guide the ethical use of data and help bridge the gap between theory and practice.
Khalil, M., Prinsloo, P., & Slade, S. (2022). The use and application of learning theory in learning analytics: A scoping review. Journal of Computing in Higher Education. Advance online publication. https://doi.org/10.1007/s12528-022-09340-3
Your report should include your code. Submit the published RPubs link to Blackboard.