Introduction

Learning analytics is the use of data to understand and improve learning. Unsupervised learning is a family of machine learning methods that identifies patterns and relationships in data without requiring labeled examples.

In this case study, you will use unsupervised learning to analyze learning data from a simulated school course. You will first apply dimensionality reduction to reduce the number of features in the data, and then use clustering to identify groups of students with similar learning patterns.

Data

The data for this case study is generated with the simulation function below. The data contains the following features:

Student ID: A unique identifier for each student
Feature 1: A measure of student engagement
Feature 2: A measure of student performance

simulate_student_features <- function(n = 100) {
  # Set the random seed
  set.seed(260923)
  
  # Generate unique student IDs
  student_ids <- seq(1, n)

  # Simulate student engagement
  student_engagement <- rnorm(n, mean = 50, sd = 10)

  # Simulate student performance
  student_performance <- rnorm(n, mean = 60, sd = 15)

  # Combine the data into a data frame
  student_features <- data.frame(
    student_id = student_ids,
    student_engagement = student_engagement,
    student_performance = student_performance
  )

  # Return the data frame
  return(student_features)
}

This function takes the number of students to simulate as input and returns a data frame with three columns: student_id, student_engagement, and student_performance. Engagement is drawn from a normal distribution with mean 50 and standard deviation 10; performance is drawn from a normal distribution with mean 60 and standard deviation 15.

To use the simulate_student_features() function, we can simply pass the desired number of students to simulate as the argument:

student_features <- simulate_student_features(n = 100)

We can then use this data frame to perform unsupervised learning and identify groups of students with similar learning patterns:

library(stats)
library(dbscan)
## 
## Attaching package: 'dbscan'
## The following object is masked from 'package:stats':
## 
##     as.dendrogram
library(cluster)
library(clustMixType)
library(mclust)
## Package 'mclust' version 6.0.0
## Type 'citation("mclust")' for citing this R package in publications.
library(ggplot2)
n <- 100

student_ids <- seq(1, n)
student_ids
##   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
##  [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
##  [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
##  [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
##  [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
##  [91]  91  92  93  94  95  96  97  98  99 100
sample_size <- 10
sample_size
## [1] 10
# Draw a random sample of student IDs, then overwrite it with a fixed
# vector so the example below is reproducible without setting a seed
sample_student_ids <- sample(student_ids, size = sample_size)
sample_student_ids <- c(1, 2, 3, 4, 5)
sample_student_ids
## [1] 1 2 3 4 5

Simulate the data

# stats, graphics, datasets, methods, utils, and grDevices are attached
# by default in every R session, so no library() calls are needed for them.
simulate_student_features <- function(n = 100) {
  # Set the random seed
  set.seed(260923)

  # Generate unique student IDs
  student_ids <- seq(1, n)

  # Simulate student engagement
  student_engagement <- rnorm(n, mean = 50, sd = 10)

  # Simulate student performance
  student_performance <- rnorm(n, mean = 60, sd = 15)

  # Combine the data into a data frame
  student_features <- data.frame(
    student_id = student_ids,
    student_engagement = student_engagement,
    student_performance = student_performance
  )

  # Return the data frame
  return(student_features)
}
student_data <- simulate_student_features()
head(student_data)
##   student_id student_engagement student_performance
## 1          1           35.47855            50.52231
## 2          2           51.79512            58.88396
## 3          3           62.41012            40.56755
## 4          4           35.20679            62.46033
## 5          5           59.37552            54.69326
## 6          6           57.00109            54.09745
student_data_standardized <- scale(student_data[, -1])  # Exclude student_id column
head(student_data_standardized)
##      student_engagement student_performance
## [1,]        -1.47425931        -0.849831176
## [2,]         0.13464313        -0.252476997
## [3,]         1.18133983        -1.560996157
## [4,]        -1.50105654         0.003018249
## [5,]         0.88211165        -0.551859344
## [6,]         0.64797986        -0.594423533
attr(student_data_standardized, "scaled:center")
##  student_engagement student_performance 
##            50.42965            62.41808
attr(student_data_standardized, "scaled:scale")
##  student_engagement student_performance 
##            10.14143            13.99781
# The data are already standardized above, so centering and scaling here
# are redundant (but harmless); note prcomp's scaling argument is `scale.`
pca_result <- prcomp(student_data_standardized, center = TRUE, scale. = TRUE)
pca_result
## Standard deviations (1, .., p=2):
## [1] 1.0103856 0.9895054
## 
## Rotation (n x k) = (2 x 2):
##                            PC1       PC2
## student_engagement   0.7071068 0.7071068
## student_performance -0.7071068 0.7071068
summary(pca_result)
## Importance of components:
##                           PC1    PC2
## Standard deviation     1.0104 0.9895
## Proportion of Variance 0.5104 0.4896
## Cumulative Proportion  0.5104 1.0000
# Scree plot of the principal components
plot(pca_result, type = "l")

# Biplot of observations and variable loadings
biplot(pca_result)

Clustering the data using k-means and hierarchical clustering

library(cluster)
library(factoextra)
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
library(dendextend)
## 
## ---------------------
## Welcome to dendextend version 1.17.1
## Type citation('dendextend') for how to cite the package.
## 
## Type browseVignettes(package = 'dendextend') for the package vignette.
## The github page is: https://github.com/talgalili/dendextend/
## 
## Suggestions and bug-reports can be submitted at: https://github.com/talgalili/dendextend/issues
## You may ask questions at stackoverflow, use the r and dendextend tags: 
##   https://stackoverflow.com/questions/tagged/dendextend
## 
##  To suppress this message use:  suppressPackageStartupMessages(library(dendextend))
## ---------------------
## 
## Attaching package: 'dendextend'
## The following object is masked from 'package:stats':
## 
##     cutree
# Note: this re-simulates the data without a fixed seed, so these values
# differ from the seeded simulate_student_features() output above
student_data <- data.frame(
  student_id = 1:100,
  student_engagement = rnorm(100, mean = 50, sd = 10),
  student_performance = rnorm(100, mean = 60, sd = 15)
)
features_for_clustering <- student_data[, c("student_engagement", "student_performance")]
head(features_for_clustering)
##   student_engagement student_performance
## 1           50.34141            50.87050
## 2           58.72042            73.12289
## 3           43.20279            84.44236
## 4           64.41171            68.40365
## 5           50.77767            41.25902
## 6           31.28675            45.02724
scaled_data <- scale(features_for_clustering)
head(scaled_data)
##      student_engagement student_performance
## [1,]         0.159597023        -0.720708827
## [2,]         0.983898026         0.766675554
## [3,]        -0.542678660         1.523286434
## [4,]         1.543789565         0.451234234
## [5,]         0.202514306        -1.363155182
## [6,]        -1.714942526        -1.111281142
attr(scaled_data, "scaled:center")
##  student_engagement student_performance 
##            48.71911            61.65285
attr(scaled_data, "scaled:scale")
##  student_engagement student_performance 
##            10.16499            14.96075
# k-means depends on its random initialization; for stable results one
# would set a seed and use multiple starts, e.g. kmeans(..., nstart = 25)
kmeans_result <- kmeans(scaled_data, centers = 3)
kmeans_result
## K-means clustering with 3 clusters of sizes 42, 28, 30
## 
## Cluster means:
##   student_engagement student_performance
## 1         -0.3480864         -0.78316939
## 2          1.2301548          0.09485208
## 3         -0.6608236          1.00790854
## 
## Clustering vector:
##   [1] 1 2 3 2 1 1 3 2 2 2 3 1 3 1 3 1 1 3 2 3 1 1 1 1 2 1 3 2 1 2 3 3 1 3 2 3 1
##  [38] 2 3 1 1 3 2 1 1 2 1 1 3 3 1 3 1 3 1 3 3 1 2 1 1 3 3 1 2 2 2 2 3 1 3 2 1 1
##  [75] 1 3 1 3 1 1 1 2 3 2 3 2 2 1 2 2 2 2 1 1 3 1 3 1 1 2
## 
## Within cluster sum of squares by cluster:
## [1] 32.04178 26.66170 22.24595
##  (between_SS / total_SS =  59.1 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"
fviz_cluster(kmeans_result, data = scaled_data)
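
The number of clusters (three) was fixed in advance here. As a sanity check, factoextra's fviz_nbclust() can plot the usual elbow and average-silhouette heuristics; a minimal sketch, assuming the scaled_data object above:

# Heuristics for choosing the number of clusters k
fviz_nbclust(scaled_data, kmeans, method = "wss")         # elbow method
fviz_nbclust(scaled_data, kmeans, method = "silhouette")  # average silhouette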

hclust_result <- hclust(dist(scaled_data), method = "ward.D2")
hclust_result
## 
## Call:
## hclust(d = dist(scaled_data), method = "ward.D2")
## 
## Cluster method   : ward.D2 
## Distance         : euclidean 
## Number of objects: 100
cluster_assignments <- cutree(hclust_result, k = 3)
cluster_assignments
##   [1] 1 1 2 1 1 3 2 1 1 1 3 3 2 3 2 1 3 3 1 2 3 3 3 3 1 3 2 1 3 1 2 3 3 2 3 2 3
##  [38] 1 2 3 3 2 1 3 3 1 1 3 3 2 3 2 3 2 3 3 2 1 1 3 1 3 2 3 1 1 1 1 2 1 2 1 3 3
##  [75] 1 3 3 3 1 3 1 1 2 1 3 1 1 1 1 1 1 1 3 3 2 3 2 1 3 1
# Plot the dendrogram and outline the three-cluster solution
plot(hclust_result)
rect.hclust(hclust_result, k = 3, border = "red")
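
For a more readable dendrogram, dendextend (attached above) can color the branches according to the same three-cluster cut; a minimal sketch:

# Color the dendrogram branches by the k = 3 cut using dendextend
dend <- color_branches(hclust_result, k = 3)
plot(dend)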

Tasks

Submission

Submit a report containing the following:

Dimensionality Reduction: The dimensionality reduction technique employed was Principal Component Analysis (PCA). PCA reduces the number of variables in a dataset while preserving as much of the original variance as possible. In this study, PCA was applied to the two original features, student_engagement and student_performance, finding the linear combinations of these features (the principal components) that account for the majority of the variation in the data. The components were ranked by the amount of variance they explain, and a subset was selected for further analysis according to the desired degree of dimensionality reduction.
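
As a concrete illustration of that selection step, the sketch below retains enough components to explain 90% of the variance. The 90% threshold is an assumption; with only two input features, both components end up retained, as the PCA summary above shows.

# Proportion of variance explained by each principal component
var_explained <- pca_result$sdev^2 / sum(pca_result$sdev^2)
# Smallest number of components reaching the (assumed) 90% threshold
n_components <- which(cumsum(var_explained) >= 0.90)[1]
# Scores on the retained components, for downstream analysis
reduced_scores <- pca_result$x[, 1:n_components, drop = FALSE]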

Clustering (including K-Means and Hierarchical Clustering): Two clustering methods were applied to the reduced-dimensional data. K-means clustering splits the data into a predetermined number of clusters (three in this case) based on similarity in the reduced feature space; the choice of three clusters was guided by the analytical objective and domain expertise. Hierarchical clustering with Ward's method produces a tree-like structure (a dendrogram) that lets us explore different levels of grouping granularity. Both methods were applied to the same dataset so that their results could be compared and common patterns identified.
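
Given the kmeans_result and cluster_assignments objects computed above, one direct way to make that comparison is to cross-tabulate the two label vectors; a minimal sketch:

# Agreement between the k-means and hierarchical three-cluster solutions
table(kmeans = kmeans_result$cluster, hierarchical = cluster_assignments)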

Interpretation: Based on students' engagement and performance characteristics, the analysis identified three distinct groups of students. These clusters provide a basis for segmenting the student population and tailoring support or educational interventions. PCA was chosen for dimensionality reduction, and k-means and hierarchical clustering for grouping, based on their suitability for the dataset and the goals of the analysis. Further analysis and validation could refine the clustering results and extract additional insight from the data.
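
One such validation is the average silhouette width of the k-means solution, sketched here with the cluster package (loaded above):

# Internal validation: silhouette widths of the k-means clusters
sil <- silhouette(kmeans_result$cluster, dist(scaled_data))
mean(sil[, "sil_width"])  # values near 1 indicate well-separated clusters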

Number of Clusters Identified: The analysis identified three distinct student clusters based on the learning data.

Characteristics of Each Cluster:

Cluster 1 - Highly Engaged and High-Performing Students:

This cluster may consist of students who exhibit both high engagement and high performance in the course. These students may participate actively, complete assignments on time, and consistently achieve high grades.

Educational Implication: Recognize these students as potential academic role models and consider advanced learning opportunities to further challenge and engage them.

Cluster 2 - Moderately Engaged and Moderate-Performing Students:

This cluster may include students with moderate engagement levels and average performance. Students here might engage with the course materials adequately but may not consistently excel.

Educational Implication: Provide additional support or resources to help these students improve their performance and motivation.

Cluster 3 - Low Engagement and Low-Performing Students:

This cluster may comprise students who demonstrate low engagement and struggle with their performance. Students in this cluster might be disengaged, miss assignments, or perform poorly in assessments.

Educational Implication: Implement targeted interventions, such as additional tutoring or personalized learning plans, to help these students catch up and increase their engagement.
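
These profiles are illustrative; they can be checked against the data by summarizing each feature per cluster, as in this sketch (assuming the kmeans_result and student_data objects from the clustering section above):

# Mean engagement and performance within each k-means cluster
student_data$cluster <- factor(kmeans_result$cluster)
aggregate(cbind(student_engagement, student_performance) ~ cluster,
          data = student_data, FUN = mean)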

Personalized Learning Paths:

The identification of different student clusters based on engagement and performance can inform the development of personalized learning paths. By recognizing students' unique needs and characteristics within each cluster, educators and learning platforms can tailor content and interventions. For example, highly engaged and high-performing students may benefit from advanced coursework, while those in lower-performing clusters may need additional support.

Early Intervention:

Learning analytics can enable early intervention strategies. When students are categorized into clusters, educators can identify students at risk of falling behind (e.g., those in the low engagement and low-performing cluster) and provide timely support. This might include tutoring, mentorship, or targeted resources to address specific needs.
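
A minimal sketch of such a flag, assuming cluster 1 is the low-engagement, low-performance group (in the k-means output above it has the lowest mean on both features; verify against kmeans_result$centers before acting on the label):

# IDs of students in the low-engagement / low-performance cluster
at_risk_ids <- student_data$student_id[kmeans_result$cluster == 1]
head(at_risk_ids)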

Resource Allocation:

Learning analytics can optimize resource allocation. By understanding the distribution of students across clusters, educational institutions can allocate resources more efficiently: for instance, directing more resources to support students in struggling clusters while streamlining resources for high-performing clusters.
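
The cluster sizes are a natural first input to that allocation decision and are available directly from the fitted model:

# Number of students per cluster, a starting point for apportioning support
table(kmeans_result$cluster)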

Feedback and Improvement:

Feedback loops are crucial for continuous improvement. Learning analytics can provide insights into the effectiveness of interventions and instructional methods, and institutions can use this data to refine their teaching strategies and make data-driven decisions that enhance overall learning outcomes.

Predictive Analytics:

Clustering students based on their engagement and performance can serve as a foundation for predictive analytics. Machine learning models can be trained to predict which cluster a new student is likely to belong to based on their initial behavior, helping educators identify potential issues early and intervene proactively.
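
A minimal version of this idea needs no separate model: assign a new student to the nearest k-means centroid after applying the same scaling. The new student's feature values below are hypothetical.

# Hypothetical new student (raw feature values)
new_student <- c(student_engagement = 55, student_performance = 70)
# Reuse the scaling parameters stored as attributes on scaled_data
new_scaled <- (new_student - attr(scaled_data, "scaled:center")) /
  attr(scaled_data, "scaled:scale")
# Squared Euclidean distance from the new student to each centroid
dists <- apply(kmeans_result$centers, 1, function(ctr) sum((new_scaled - ctr)^2))
which.min(dists)  # predicted cluster for the new student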

Ethical Considerations:

It is important to consider ethical implications. Learning analytics should be used responsibly, with a focus on student well-being and privacy. Ensure that the data collected and the actions taken based on the analysis align with ethical guidelines and regulations.

Continuous Monitoring and Adaptation:

Learning analytics is an ongoing process. Institutions should continuously collect data and monitor changing patterns in student behavior; as students progress through their academic journey, their needs and behaviors may evolve, requiring adaptive strategies.

In conclusion, the findings of this learning analytics study, which clustered students based on engagement and performance, have significant implications for optimizing educational practice. By leveraging these insights, educational institutions can enhance the learning experience, promote student success, and continually improve their teaching methods. It is essential, however, to approach learning analytics with sensitivity to student privacy and ethical considerations while keeping the ultimate goal of improving learning outcomes in focus.


Your report should include your code. Submit the published RPubs link to Blackboard.