Introduction

Learning analytics is the use of data to understand and improve learning. Unsupervised learning is a type of machine learning that identifies patterns and relationships in data without requiring labeled examples.

Data

The data for this case study is generated with the simulation function below. The data contains the following features:

Student ID: A unique identifier for each student
Feature 1: A measure of student engagement
Feature 2: A measure of student performance

simulate_student_features <- function(n = 100) {
  # Set the random seed
  set.seed(260923)
  
  # Generate unique student IDs
  student_ids <- seq(1, n)
  
  
  # Simulate student engagement
  student_engagement <- rnorm(n, mean = 50, sd = 10)

  # Simulate student performance
  student_performance <- rnorm(n, mean = 60, sd = 15)

  # Combine the data into a data frame
  student_features <- data.frame(
    student_id = student_ids,
    student_engagement = student_engagement,
    student_performance = student_performance
  )

  # Return the data frame
  return(student_features)
}

student_features <- simulate_student_features(n = 100)

We can then use this data frame to perform unsupervised learning to identify groups of students with similar learning patterns.

Tasks

Simulate the data.
Perform dimensionality reduction on the data using PCA.
Cluster the data using K-means and other clustering algorithms.
Interpret the results of your analysis.

library(stats)
library(dbscan)

## Warning: package 'dbscan' was built under R version 4.3.1

## 
## Attaching package: 'dbscan'

## The following object is masked from 'package:stats':
## 
##     as.dendrogram

library(cluster)

## Warning: package 'cluster' was built under R version 4.3.1

library(clustMixType)

## Warning: package 'clustMixType' was built under R version 4.3.1

library(mclust)

## Warning: package 'mclust' was built under R version 4.3.1

## Package 'mclust' version 6.0.0
## Type 'citation("mclust")' for citing this R package in publications.

library(ggplot2)

## Warning: package 'ggplot2' was built under R version 4.3.1

n <-100

student_ids <- seq(1,n)
student_ids

##   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
##  [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
##  [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
##  [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
##  [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
##  [91]  91  92  93  94  95  96  97  98  99 100

#show a random sample of student ids

sample_size <-10
sample_size

## [1] 10

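sample_students_ids <- sample(student_ids, size = sample_size)

# Note: this assignment overwrites the random sample drawn above,
# so the output below shows five fixed IDs, not a random sample of ten.
sample_students_ids <- c(1, 2, 3, 4, 5)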
sample_students_ids

## [1] 1 2 3 4 5

#print the sample

print(sample_students_ids)

## [1] 1 2 3 4 5

Simulate the data

library(stats)
library(graphics)
library(datasets)
library(methods)
library(utils)
library(grDevices)

simulate_student_features <- function(n = 100) {
  # Set the random seed
  set.seed(260923)

  # Generate unique student IDs
  student_ids <- seq(1, n)

  # Simulate student engagement
  student_engagement <- rnorm(n, mean = 50, sd = 10)

  # Simulate student performance
  student_performance <- rnorm(n, mean = 60, sd = 15)

  # Combine the data into a data frame
  student_features <- data.frame(
    student_id = student_ids,
    student_engagement = student_engagement,
    student_performance = student_performance
  )

  # Return the data frame
  return(student_features)
}

student_data <- simulate_student_features()
student_data

##     student_id student_engagement student_performance
## 1            1           35.47855            50.52231
## 2            2           51.79512            58.88396
## 3            3           62.41012            40.56755
## 4            4           35.20679            62.46033
## 5            5           59.37552            54.69326
## 6            6           57.00109            54.09745
## 7            7           34.81908            51.59185
## 8            8           66.43009            71.66933
## 9            9           53.12224            53.38812
## 10          10           58.66933            77.51403
## 11          11           64.61152            66.60144
## 12          12           64.73720            42.54982
## 13          13           61.38786            81.60053
## 14          14           43.46833            71.08870
## 15          15           60.90687            65.14477
## 16          16           62.31512            96.92023
## 17          17           30.19917            59.56755
## 18          18           62.71153            45.28098
## 19          19           53.46864            54.77840
## 20          20           43.16863            73.15011
## 21          21           47.79988            57.94230
## 22          22           32.86718            64.51376
## 23          23           48.43080            44.12915
## 24          24           44.64360            81.96037
## 25          25           59.37263            67.10083
## 26          26           40.14190            57.07313
## 27          27           52.05290            33.93715
## 28          28           58.53192            66.30623
## 29          29           46.85526            60.08155
## 30          30           40.57498            83.21472
## 31          31           48.27392            68.24131
## 32          32           63.40857            64.74623
## 33          33           42.63505            74.19713
## 34          34           49.50695            36.55820
## 35          35           39.64492            63.26139
## 36          36           58.16632            42.53282
## 37          37           46.85609            77.93879
## 38          38           53.21176            71.84144
## 39          39           24.56077            72.83616
## 40          40           47.94104            72.70324
## 41          41           54.54657            57.30181
## 42          42           65.36895            64.50104
## 43          43           63.35530            43.93461
## 44          44           38.06179            40.12883
## 45          45           52.79802            70.87511
## 46          46           53.09701            57.98622
## 47          47           52.26786            75.90067
## 48          48           60.27817            65.06789
## 49          49           49.96451            35.62894
## 50          50           62.85455            53.86589
## 51          51           61.18422            35.28333
## 52          52           73.07623            72.12643
## 53          53           62.47065            81.81560
## 54          54           55.20634            46.58676
## 55          55           50.93514            36.66812
## 56          56           55.74387            72.98336
## 57          57           56.41645            80.03891
## 58          58           50.67616            64.95174
## 59          59           42.12485            72.70400
## 60          60           33.46542            63.30804
## 61          61           45.10251            73.36602
## 62          62           47.66047            64.39439
## 63          63           53.12859            46.91366
## 64          64           49.11046            61.00457
## 65          65           45.98572            66.17874
## 66          66           53.55687            70.29459
## 67          67           59.53438            68.26805
## 68          68           71.76380            50.57808
## 69          69           42.81577            70.02416
## 70          70           34.62981            60.46338
## 71          71           43.46502            43.93160
## 72          72           59.06670            56.56050
## 73          73           53.00539            60.76549
## 74          74           43.51405            67.50167
## 75          75           54.59768            71.64025
## 76          76           43.57840            79.52546
## 77          77           29.80201            57.75304
## 78          78           45.01510            55.98704
## 79          79           48.30114            37.44258
## 80          80           45.19776            97.40030
## 81          81           54.44051            52.26339
## 82          82           56.73032            66.93843
## 83          83           46.66086            54.04959
## 84          84           33.72874            51.90589
## 85          85           32.07322            74.45711
## 86          86           53.97599            57.86733
## 87          87           33.15555            40.26646
## 88          88           50.55680            43.10695
## 89          89           33.68189            71.96778
## 90          90           45.95642            72.64398
## 91          91           51.90072            83.01396
## 92          92           50.25869            46.90754
## 93          93           62.41476            62.63873
## 94          94           44.93745            80.00130
## 95          95           61.23368            69.66618
## 96          96           50.22648            82.94298
## 97          97           62.92548            67.67482
## 98          98           57.15390            90.19502
## 99          99           39.31421            65.78667
## 100        100           52.15625            62.67658

Data standardization

student_data_standardized <- scale(student_data[, -1])  # Exclude student_id column
student_data_standardized

##        student_engagement student_performance
##   [1,]        -1.47425931        -0.849831176
##   [2,]         0.13464313        -0.252476997
##   [3,]         1.18133983        -1.560996157
##   [4,]        -1.50105654         0.003018249
##   [5,]         0.88211165        -0.551859344
##   [6,]         0.64797986        -0.594423533
##   [7,]        -1.53928691        -0.773423312
##   [8,]         1.57773046         0.660906516
##   [9,]         0.26550394        -0.645098114
##  [10,]         0.81247718         1.078450376
##  [11,]         1.39840997         0.298858127
##  [12,]         1.41080201        -1.419383310
##  [13,]         1.08053955         1.370389384
##  [14,]        -0.68642397         0.619426886
##  [15,]         1.03311144         0.194793823
##  [16,]         1.17197240         2.464824285
##  [17,]        -1.99483508        -0.203641496
##  [18,]         1.21105999        -1.224270011
##  [19,]         0.29966080        -0.545776646
##  [20,]        -0.71597640         0.766693595
##  [21,]        -0.25930970        -0.319749034
##  [22,]        -1.73175475         0.149714787
##  [23,]        -0.19709689        -1.306556322
##  [24,]        -0.57053564         1.396096300
##  [25,]         0.88182636         0.334533981
##  [26,]        -1.01442756        -0.381842118
##  [27,]         0.16006166        -2.034670666
##  [28,]         0.79892787         0.277768015
##  [29,]        -0.35245440        -0.166921164
##  [30,]        -0.97172364         1.485706629
##  [31,]        -0.21256683         0.416009564
##  [32,]         1.27979268         0.166321991
##  [33,]        -0.76858934         0.841492217
##  [34,]        -0.09098263        -1.847423089
##  [35,]        -1.06343289         0.060245402
##  [36,]         0.76287783        -1.420598051
##  [37,]        -0.35237271         1.108795511
##  [38,]         0.27433123         0.673202175
##  [39,]        -2.55081177         0.744265117
##  [40,]        -0.24539004         0.734768973
##  [41,]         0.40595045        -0.365505402
##  [42,]         1.47309629         0.148806309
##  [43,]         1.27453992        -1.320454071
##  [44,]        -1.21953806        -1.592338093
##  [45,]         0.23353413         0.604168134
##  [46,]         0.26301645        -0.316610723
##  [47,]         0.18125794         0.963192313
##  [48,]         0.97111751         0.189301395
##  [49,]        -0.04586503        -1.913809004
##  [50,]         1.22516290        -0.610966175
##  [51,]         1.06045948        -1.938499819
##  [52,]         2.23307640         0.693561990
##  [53,]         1.18730833         1.385753547
##  [54,]         0.47100810        -1.130985333
##  [55,]         0.04984461        -1.839570529
##  [56,]         0.52401063         0.754780464
##  [57,]         0.59033159         1.258827546
##  [58,]         0.02430732         0.181004080
##  [59,]        -0.81889862         0.734823225
##  [60,]        -1.67276538         0.063578391
##  [61,]        -0.52528454         0.782117572
##  [62,]        -0.27305573         0.141187296
##  [63,]         0.26613016        -1.107631910
##  [64,]        -0.13007961        -0.100981158
##  [65,]        -0.43819568         0.268660576
##  [66,]         0.30836076         0.562695822
##  [67,]         0.89777599         0.417919911
##  [68,]         2.10366361        -0.845846336
##  [69,]        -0.75076981         0.543376438
##  [70,]        -1.55794968        -0.139643490
##  [71,]        -0.68675028        -1.320669188
##  [72,]         0.85165988        -0.418464161
##  [73,]         0.25398184        -0.118060647
##  [74,]        -0.68191562         0.363170472
##  [75,]         0.41099077         0.658829192
##  [76,]        -0.67557031         1.222146562
##  [77,]        -2.03399782        -0.333269290
##  [78,]        -0.53390387        -0.459431926
##  [79,]        -0.20988241        -1.784243059
##  [80,]        -0.51589301         2.499120433
##  [81,]         0.39549277        -0.725448637
##  [82,]         0.62128058         0.322932703
##  [83,]        -0.37162344        -0.597843011
##  [84,]        -1.64680050        -0.750987946
##  [85,]        -1.81004321         0.860065292
##  [86,]         0.34968872        -0.325104590
##  [87,]        -1.70332045        -1.582505853
##  [88,]         0.01253756        -1.379582049
##  [89,]        -1.65142004         0.682227645
##  [90,]        -0.44108469         0.730535252
##  [91,]         0.14505533         1.471364012
##  [92,]        -0.01685771        -1.108068799
##  [93,]         1.18179703         0.015762727
##  [94,]        -0.54156089         1.256140312
##  [95,]         1.06533591         0.517802237
##  [96,]        -0.02003365         1.466293146
##  [97,]         1.23215739         0.375539677
##  [98,]         0.66304733         1.984376991
##  [99,]        -1.09604310         0.240650937
## [100,]         0.17025257         0.018467234
## attr(,"scaled:center")
##  student_engagement student_performance 
##            50.42965            62.41808 
## attr(,"scaled:scale")
##  student_engagement student_performance 
##            10.14143            13.99781
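The standardization step can be checked by hand. The sketch below is a minimal illustration, assuming the same seed and distribution parameters as the simulation function above; the names `scaled_auto`, `scaled_manual`, `centers`, and `spreads` are introduced here for illustration only. It shows that `scale()` subtracts each column mean and divides by the column standard deviation.

```r
# Simulated data, mirroring the document's generator (same seed and parameters).
set.seed(260923)
student_data <- data.frame(
  student_id = seq_len(100),
  student_engagement = rnorm(100, mean = 50, sd = 10),
  student_performance = rnorm(100, mean = 60, sd = 15)
)

features <- student_data[, c("student_engagement", "student_performance")]

# scale() centers each column on its mean and divides by its standard deviation.
scaled_auto <- scale(features)

# The same computation written out explicitly with sweep().
centers <- colMeans(features)
spreads <- apply(features, 2, sd)
scaled_manual <- sweep(sweep(as.matrix(features), 2, centers, "-"), 2, spreads, "/")

# After standardization every column has mean 0 and standard deviation 1.
round(colMeans(scaled_auto), 10)
apply(scaled_auto, 2, sd)
```

Standardizing matters for the later steps because both PCA and distance-based clustering are sensitive to the units of each feature; without it, the feature with the larger spread (performance, sd = 15) would dominate.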

Perform PCA

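# The data are already standardized, so center and scale. are redundant here but harmless.
# prcomp() names the argument scale. (with a trailing dot).
pca_result <- prcomp(student_data_standardized, center = TRUE, scale. = TRUE)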
pca_result

## Standard deviations (1, .., p=2):
## [1] 1.0103856 0.9895054
## 
## Rotation (n x k) = (2 x 2):
##                            PC1       PC2
## student_engagement   0.7071068 0.7071068
## student_performance -0.7071068 0.7071068

Explore the PCA results

summary(pca_result)

## Importance of components:
##                           PC1    PC2
## Standard deviation     1.0104 0.9895
## Proportion of Variance 0.5104 0.4896
## Cumulative Proportion  0.5104 1.0000
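The near-even split between the two components (51.04% and 48.96%) and the loadings of ±0.7071 (that is, 1/√2) are what one would expect for two standardized variables that are almost uncorrelated. For a 2 x 2 correlation matrix with off-diagonal entry r, the eigenvalues are 1 + |r| and 1 - |r|; the reported standard deviation of 1.0104 for PC1 therefore implies |r| of roughly 0.02. The sketch below illustrates this relationship, assuming the same seed and parameters as the simulation function; the variable names are illustrative.

```r
# Simulated features, mirroring the document's generator (same seed and parameters).
set.seed(260923)
features <- data.frame(
  student_engagement = rnorm(100, mean = 50, sd = 10),
  student_performance = rnorm(100, mean = 60, sd = 15)
)

pca <- prcomp(features, center = TRUE, scale. = TRUE)

# Proportion of variance per component: squared standard deviations, normalized.
variance_explained <- pca$sdev^2 / sum(pca$sdev^2)

# For two standardized variables, the component variances are the eigenvalues
# of the correlation matrix, which are 1 + |r| and 1 - |r|.
r <- cor(features$student_engagement, features$student_performance)
eigenvalues_from_r <- c(1 + abs(r), 1 - abs(r))

variance_explained
eigenvalues_from_r
pca$sdev^2
```

Because the two features are drawn independently, there is little shared variance for PCA to compress, which is why neither component dominates.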

Visualize the PCA results

# Scree plot
plot(pca_result, type = "l")

plot(pca_result)

Biplot

biplot(pca_result)
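The points drawn in a biplot are the component scores, and the arrows are the loadings. As a minimal sketch (assuming the same seed and parameters as the simulation function; `scores_manual` is an illustrative name), the scores can be reproduced by multiplying the standardized data by the rotation matrix:

```r
# Simulated features, mirroring the document's generator (same seed and parameters).
set.seed(260923)
features <- data.frame(
  student_engagement = rnorm(100, mean = 50, sd = 10),
  student_performance = rnorm(100, mean = 60, sd = 15)
)
scaled <- scale(features)
pca <- prcomp(scaled)

# Scores are the standardized observations projected onto the loadings.
scores_manual <- scaled %*% pca$rotation
head(scores_manual)
head(pca$x)

# The loading vectors are orthonormal, so this product is the identity matrix.
round(crossprod(pca$rotation), 10)
```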

Clustering the data using K-means and other clustering algorithms

library(cluster)
library(factoextra)

## Warning: package 'factoextra' was built under R version 4.3.1

## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa

library(dendextend)

## Warning: package 'dendextend' was built under R version 4.3.1

## 
## ---------------------
## Welcome to dendextend version 1.17.1
## Type citation('dendextend') for how to cite the package.
## 
## Type browseVignettes(package = 'dendextend') for the package vignette.
## The github page is: https://github.com/talgalili/dendextend/
## 
## Suggestions and bug-reports can be submitted at: https://github.com/talgalili/dendextend/issues
## You may ask questions at stackoverflow, use the r and dendextend tags: 
##   https://stackoverflow.com/questions/tagged/dendextend
## 
##  To suppress this message use:  suppressPackageStartupMessages(library(dendextend))
## ---------------------

## 
## Attaching package: 'dendextend'

## The following object is masked from 'package:stats':
## 
##     cutree

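Create a sample student dataset

# Note: no seed is set here, so these draws differ from the simulated data
# used for the PCA above and will change on every run.
student_data <- data.frame(
  student_id = 1:100,
  student_engagement = rnorm(100, mean = 50, sd = 10),
  student_performance = rnorm(100, mean = 60, sd = 15)
)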

Extract the relevant features for clustering

features_for_clustering <- student_data[, c("student_engagement", "student_performance")]
features_for_clustering

##     student_engagement student_performance
## 1             50.34141            50.87050
## 2             58.72042            73.12289
## 3             43.20279            84.44236
## 4             64.41171            68.40365
## 5             50.77767            41.25902
## 6             31.28675            45.02724
## 7             41.27575            72.30414
## 8             59.86722            62.27657
## 9             69.23595            83.63904
## 10            60.58023            67.34476
## 11            47.76827            73.75417
## 12            43.13707            53.56535
## 13            34.71940            70.69030
## 14            35.23064            42.72979
## 15            48.84513           106.85308
## 16            51.49570            45.37659
## 17            51.14283            56.88448
## 18            45.91954            67.37794
## 19            65.12505            50.21417
## 20            23.71014            77.40492
## 21            26.52903            55.94977
## 22            43.69522            42.71431
## 23            39.42058            41.04327
## 24            37.63265            45.29856
## 25            59.36463            60.75324
## 26            43.49067            49.15702
## 27            44.92786            83.98754
## 28            62.42045            49.83971
## 29            46.87671            51.89013
## 30            55.27773            57.85043
## 31            33.13091            80.73897
## 32            49.62973            66.57664
## 33            44.18231            53.28314
## 34            41.85276            86.08857
## 35            52.53889            64.85828
## 36            39.01903            84.57562
## 37            44.52350            51.35114
## 38            72.07040            44.45029
## 39            51.31941            93.99482
## 40            45.16741            58.35955
## 41            48.62702            56.28116
## 42            38.02379            69.67018
## 43            58.86010            76.71115
## 44            46.93476            53.61647
## 45            44.24499            42.26734
## 46            70.05082            52.74531
## 47            54.76850            42.22711
## 48            35.86835            51.70694
## 49            45.40187            70.08449
## 50            32.45119            69.52485
## 51            46.64593            62.56839
## 52            37.65238            82.59639
## 53            43.94986            57.54832
## 54            40.29002            71.22464
## 55            41.94026            53.32983
## 56            46.22579            72.02040
## 57            42.12285            69.53595
## 58            50.48794            41.17798
## 59            66.39980            54.33527
## 60            45.31590            61.73160
## 61            50.80898            38.50146
## 62            47.08932            72.81314
## 63            36.56073            67.21069
## 64            48.50858            62.17706
## 65            62.25178            79.81829
## 66            57.09046            63.61160
## 67            62.18861            92.80451
## 68            55.32697            57.21496
## 69            46.58509            84.21974
## 70            56.99658            34.24256
## 71            47.06972            80.99892
## 72            56.75350            47.98020
## 73            30.94684            51.25061
## 74            43.72396            58.74016
## 75            52.90853            49.67723
## 76            46.81471            71.47170
## 77            39.35296            59.00359
## 78            48.91035            65.59541
## 79            59.31772            32.66590
## 80            48.19276            53.97321
## 81            51.19977            46.36008
## 82            63.78180            69.04974
## 83            35.18343            83.85464
## 84            60.28445            45.78735
## 85            47.34873            69.03376
## 86            62.09446            55.45128
## 87            55.65226            68.95471
## 88            54.16094            37.21200
## 89            55.14687            65.02472
## 90            61.02509            90.14921
## 91            58.62450            48.63667
## 92            64.11730            50.52493
## 93            39.53840            55.47703
## 94            44.40291            54.40743
## 95            39.49910            75.42115
## 96            50.19683            57.89378
## 97            37.50572            77.89252
## 98            51.57948            49.24915
## 99            32.04550            49.26757
## 100           64.99983            64.46045

Standardize the data

scaled_data <- scale(features_for_clustering)
scaled_data

##        student_engagement student_performance
##   [1,]        0.159597023        -0.720708827
##   [2,]        0.983898026         0.766675554
##   [3,]       -0.542678660         1.523286434
##   [4,]        1.543789565         0.451234234
##   [5,]        0.202514306        -1.363155182
##   [6,]       -1.714942526        -1.111281142
##   [7,]       -0.732254677         0.711948974
##   [8,]        1.096716945         0.041690657
##   [9,]        2.018383503         1.469591343
##  [10,]        1.166860522         0.380456188
##  [11,]       -0.093541222         0.808870819
##  [12,]       -0.549143751        -0.540580675
##  [13,]       -1.377249183         0.604077206
##  [14,]       -1.326954048        -1.264846458
##  [15,]        0.012397271         3.021253376
##  [16,]        0.273151917        -1.087930225
##  [17,]        0.238437965        -0.318725021
##  [18,]       -0.275413121         0.382673965
##  [19,]        1.613966118        -0.764578909
##  [20,]       -2.460305927         1.052892504
##  [21,]       -2.182992432        -0.381202466
##  [22,]       -0.494235577        -1.265881104
##  [23,]       -0.914761286        -1.377576046
##  [24,]       -1.090651752        -1.093146110
##  [25,]        1.047273509        -0.060131088
##  [26,]       -0.514357964        -0.835240420
##  [27,]       -0.372971507         1.492885625
##  [28,]        1.347895458        -0.789608206
##  [29,]       -0.181250233        -0.652555529
##  [30,]        0.645216993        -0.254159295
##  [31,]       -1.533519839         1.275745976
##  [32,]        0.089584037         0.329113838
##  [33,]       -0.446316273        -0.559444510
##  [34,]       -0.675490746         1.633321213
##  [35,]        0.375777789         0.214255802
##  [36,]       -0.954264366         1.532193815
##  [37,]       -0.412751915        -0.688582400
##  [38,]        2.297228235        -1.149845517
##  [39,]        0.255809462         2.161787087
##  [40,]       -0.349405163        -0.220129446
##  [41,]       -0.009060072        -0.359052231
##  [42,]       -1.052173308         0.535891007
##  [43,]        0.997638762         1.006519976
##  [44,]       -0.175538769        -0.537164067
##  [45,]       -0.440150207        -1.295757324
##  [46,]        2.098547852        -0.595393814
##  [47,]        0.595120394        -1.298446729
##  [48,]       -1.264218210        -0.664799759
##  [49,]       -0.326340172         0.563584010
##  [50,]       -1.600388480         0.526176840
##  [51,]       -0.203953227         0.061196163
##  [52,]       -1.088711201         1.399898747
##  [53,]       -0.469184455        -0.274352856
##  [54,]       -0.829228095         0.639793181
##  [55,]       -0.666882401        -0.556323619
##  [56,]       -0.245285782         0.692982864
##  [57,]       -0.648920296         0.526918754
##  [58,]        0.174011906        -1.368571944
##  [59,]        1.739372198        -0.489118481
##  [60,]       -0.334797239         0.005263829
##  [61,]        0.205594452        -1.547474583
##  [62,]       -0.160334354         0.745971179
##  [63,]       -1.196104013         0.371494667
##  [64,]       -0.020711281         0.035039384
##  [65,]        1.331302248         1.214206439
##  [66,]        0.823547763         0.130926013
##  [67,]        1.325087702         2.082225139
##  [68,]        0.650060334        -0.296635130
##  [69,]       -0.209938242         1.508405638
##  [70,]        0.814311884        -1.832145921
##  [71,]       -0.162262255         1.293121259
##  [72,]        0.790398059        -0.913901102
##  [73,]       -1.748381209        -0.695301507
##  [74,]       -0.491407604        -0.194688774
##  [75,]        0.412141882        -0.800468967
##  [76,]       -0.187349336         0.656307371
##  [77,]       -0.921413159        -0.177080281
##  [78,]        0.018813416         0.263527092
##  [79,]        1.042658895        -1.937532291
##  [80,]       -0.051781262        -0.513319030
##  [81,]        0.244039861        -1.022192074
##  [82,]        1.481821051         0.494419423
##  [83,]       -1.331598410         1.484002350
##  [84,]        1.137762662        -1.060474348
##  [85,]       -0.134813586         0.493351331
##  [86,]        1.315825535        -0.414522513
##  [87,]        0.682061771         0.488067720
##  [88,]        0.535350442        -1.633664129
##  [89,]        0.632342956         0.225381174
##  [90,]        1.210623851         1.904740845
##  [91,]        0.974461999        -0.870021407
##  [92,]        1.514826473        -0.743807073
##  [93,]       -0.903170638        -0.412801554
##  [94,]       -0.424614324        -0.484295152
##  [95,]       -0.907036117         0.920294512
##  [96,]        0.145373203        -0.251261639
##  [97,]       -1.103138902         1.085484601
##  [98,]        0.281393801        -0.829082605
##  [99,]       -1.640298483        -0.827851370
## [100,]        1.601647264         0.187664731
## attr(,"scaled:center")
##  student_engagement student_performance 
##            48.71911            61.65285 
## attr(,"scaled:scale")
##  student_engagement student_performance 
##            10.16499            14.96075

Perform K-means clustering

kmeans_result <- kmeans(scaled_data, centers = 3)
kmeans_result

## K-means clustering with 3 clusters of sizes 42, 28, 30
## 
## Cluster means:
##   student_engagement student_performance
## 1         -0.3480864         -0.78316939
## 2          1.2301548          0.09485208
## 3         -0.6608236          1.00790854
## 
## Clustering vector:
##   [1] 1 2 3 2 1 1 3 2 2 2 3 1 3 1 3 1 1 3 2 3 1 1 1 1 2 1 3 2 1 2 3 3 1 3 2 3 1
##  [38] 2 3 1 1 3 2 1 1 2 1 1 3 3 1 3 1 3 1 3 3 1 2 1 1 3 3 1 2 2 2 2 3 1 3 2 1 1
##  [75] 1 3 1 3 1 1 1 2 3 2 3 2 2 1 2 2 2 2 1 1 3 1 3 1 1 2
## 
## Within cluster sum of squares by cluster:
## [1] 32.04178 26.66170 22.24595
##  (between_SS / total_SS =  59.1 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"
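K-means starts from randomly chosen centers, so a single run without a seed is not repeatable, and the number of clusters has to be supplied by the analyst. The sketch below is one hedged way to address both points: it fixes a seed, uses several random starts (`nstart = 25`), and compares candidate values of k by total within-cluster sum of squares and mean silhouette width. The seed value `1`, the range `2:6`, and the names `fits` and `diagnostics` are assumptions made for illustration. The data are regenerated with the simulation function's seed, so the numbers will not match the unseeded run reported above.

```r
library(cluster)  # provides silhouette()

# Simulated features, mirroring the document's generator (same seed and parameters).
set.seed(260923)
features <- data.frame(
  student_engagement = rnorm(100, mean = 50, sd = 10),
  student_performance = rnorm(100, mean = 60, sd = 15)
)
scaled <- scale(features)
pairwise_distances <- dist(scaled)

# A fixed seed plus several random starts makes the K-means result repeatable.
set.seed(1)
candidate_k <- 2:6
fits <- lapply(candidate_k, function(k) kmeans(scaled, centers = k, nstart = 25))

# Lower within-cluster sum of squares and higher silhouette width both indicate
# tighter, better-separated clusters.
diagnostics <- data.frame(
  k = candidate_k,
  total_within_ss = sapply(fits, function(fit) fit$tot.withinss),
  mean_silhouette = sapply(fits, function(fit) {
    mean(silhouette(fit$cluster, pairwise_distances)[, "sil_width"])
  })
)
diagnostics
```

On data simulated from two independent normal distributions, no value of k should be expected to stand out strongly; a flat silhouette profile would itself be informative.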

Visualize the K-means clusters

fviz_cluster(kmeans_result, data = scaled_data)

Perform hierarchical clustering (using Ward’s method)

hclust_result <- hclust(dist(scaled_data), method = "ward.D2")
hclust_result

## 
## Call:
## hclust(d = dist(scaled_data), method = "ward.D2")
## 
## Cluster method   : ward.D2 
## Distance         : euclidean 
## Number of objects: 100

cluster_assignments <- cutree(hclust_result, k = 3)
cluster_assignments

##   [1] 1 1 2 1 1 3 2 1 1 1 3 3 2 3 2 1 3 3 1 2 3 3 3 3 1 3 2 1 3 1 2 3 3 2 3 2 3
##  [38] 1 2 3 3 2 1 3 3 1 1 3 3 2 3 2 3 2 3 3 2 1 1 3 1 3 2 3 1 1 1 1 2 1 2 1 3 3
##  [75] 1 3 3 3 1 3 1 1 2 1 3 1 1 1 1 1 1 1 3 3 2 3 2 1 3 1
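Cluster numbers are arbitrary labels, so cluster 1 from K-means need not correspond to cluster 1 from the dendrogram cut. A cross-tabulation is a simple way to compare the two partitions. The sketch below assumes the simulation function's seed and parameters, a K-means seed of `1`, and illustrative names (`agreement`, `matched_share`); the optional adjusted Rand index uses `mclust::adjustedRandIndex` only if that package is installed.

```r
# Simulated features, mirroring the document's generator (same seed and parameters).
set.seed(260923)
features <- data.frame(
  student_engagement = rnorm(100, mean = 50, sd = 10),
  student_performance = rnorm(100, mean = 60, sd = 15)
)
scaled <- scale(features)

set.seed(1)
kmeans_labels <- kmeans(scaled, centers = 3, nstart = 25)$cluster
ward_tree <- hclust(dist(scaled), method = "ward.D2")
ward_labels <- stats::cutree(ward_tree, k = 3)

# Rows are K-means clusters, columns are Ward clusters.
agreement <- table(kmeans = kmeans_labels, ward = ward_labels)
agreement

# Share of students grouped consistently under the best one-to-one label matching.
permutations <- list(c(1, 2, 3), c(1, 3, 2), c(2, 1, 3),
                     c(2, 3, 1), c(3, 1, 2), c(3, 2, 1))
matched_counts <- sapply(permutations, function(p) sum(agreement[cbind(1:3, p)]))
matched_share <- max(matched_counts) / sum(agreement)
matched_share

# Optional: adjusted Rand index, if the mclust package is available.
if (requireNamespace("mclust", quietly = TRUE)) {
  print(mclust::adjustedRandIndex(kmeans_labels, ward_labels))
}
```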

plot(hclust_result)
rect.hclust(hclust_result, k = 3, border = "red")

Submission

Submit a report containing the following:

A brief description of your approach to dimensionality reduction and clustering.

A brief description of the approach to dimensionality reduction and clustering applied in the analysis

Dimensionality Reduction

Principal Component Analysis (PCA) was used as the dimensionality reduction method. PCA replaces the original variables with uncorrelated linear combinations of them (principal components), ordered by the share of variance each one captures. Here it was applied to the two standardized features, student_engagement and student_performance. The first component explains 51.04% of the variance and the second 48.96%, so dropping either one would discard about half of the information. In practice no component was dropped: with only two nearly uncorrelated features, PCA acts as a rotation of the data and does not reduce its dimensionality.

Clustering (K-Means and Hierarchical Clustering)

Two clustering algorithms were applied to the standardized engagement and performance features (scaled_data), not to the PCA scores:

K-Means Clustering: divides the data into a preset number of clusters (here, three) by assigning each student to the nearest cluster centroid.
Hierarchical Clustering (Ward’s Method): builds a tree of nested clusters that can be cut at different heights to explore coarser or finer groupings; here the tree was cut into three clusters.

The number of clusters was fixed at three in advance, based on the analytical goal or domain knowledge, and was not estimated from the data. Both methods were applied to the same dataset so that their results could be compared.
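For distance-based methods, clustering the standardized features and clustering the full set of principal component scores are equivalent, because the PCA rotation preserves Euclidean distances. Only discarding a component changes the input to the clustering. The sketch below illustrates this under the same seed and parameters as the simulation function; the variable names are illustrative.

```r
# Simulated features, mirroring the document's generator (same seed and parameters).
set.seed(260923)
features <- data.frame(
  student_engagement = rnorm(100, mean = 50, sd = 10),
  student_performance = rnorm(100, mean = 60, sd = 15)
)
scaled <- scale(features)
pca <- prcomp(scaled)

# Pairwise Euclidean distances before and after the PCA rotation are the same.
distances_features <- as.vector(dist(scaled))
distances_scores <- as.vector(dist(pca$x))
isTRUE(all.equal(distances_features, distances_scores))

# Keeping only PC1 is a genuine reduction: distances shrink, so clusters can change.
distances_pc1 <- as.vector(dist(pca$x[, 1, drop = FALSE]))
max(distances_features - distances_pc1)
```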

Interpretation

The analysis partitioned the students into three clusters based on their engagement and performance. Because the number of clusters was specified in advance, the result shows how the students divide into three groups, not that exactly three groups exist in the data; the K-means solution accounts for 59.1% of the total sum of squares (between_SS / total_SS). These clusters provide a basis for segmenting the student population and tailoring educational interventions or support strategies. PCA, K-Means, and hierarchical clustering were chosen for their suitability for the dataset and the goals of the analysis. Further analysis and validation are needed to refine the clustering results and derive actionable insights from the data.

The results of your analysis, including the number of clusters identified, the characteristics of each cluster, and any other insights you gained from the data.

The K-Means and hierarchical clustering analyses provide information about the structure of the student data. The results were interpreted as follows.

K-Means Clustering: Three clusters were identified, containing 42, 28, and 30 students. Each cluster has its own centroid (mean value) for the two features, student_engagement and student_performance.

Cluster 1: Compared with the other clusters, this cluster has lower values for both engagement and performance. These students show below-average engagement and performance.

Cluster 2: Students in this cluster are highly engaged but show only average performance.

Cluster 3: Students in this cluster have below-average engagement but above-average performance.

The clustering vector assigns each student to one of these clusters.
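Cluster sizes and centroids of the kind reported above can be tabulated along these lines. This is a sketch under assumptions (K-Means on scaled features, an arbitrary seed for the random starts), so the sizes and label order it prints need not match the 42, 28, and 30 reported here.

```r
# Recreate the simulated data (same seed and parameters as the Data section)
set.seed(260923)
student_features <- data.frame(
  student_id = seq_len(100),
  student_engagement = rnorm(100, mean = 50, sd = 10),
  student_performance = rnorm(100, mean = 60, sd = 15)
)
feature_cols <- c("student_engagement", "student_performance")

# Assumption: K-Means is run on scaled features with three clusters
set.seed(1)
km_fit <- kmeans(scale(student_features[, feature_cols]), centers = 3, nstart = 25)
student_features$cluster <- km_fit$cluster  # the clustering vector

# Number of students per cluster
cluster_sizes <- table(student_features$cluster)
print(cluster_sizes)

# Centroids on the original measurement scale, for easier interpretation
centroids <- aggregate(
  cbind(student_engagement, student_performance) ~ cluster,
  data = student_features,
  FUN = mean
)
print(centroids)

# Overall means, as a reference for "above" or "below" average
overall_means <- colMeans(student_features[, feature_cols])
print(overall_means)
```

Comparing each centroid with the overall means is what supports labels such as "below-average engagement" for a cluster.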

Ward’s Method of Hierarchical Clustering

Using Ward’s method and Euclidean distance, hierarchical clustering grouped the students based on their similarity. The dendrogram shows how the clusters were formed. Like K-Means, hierarchical clustering revealed three major groupings.
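A dendrogram like the one described above, and a comparison with the K-Means assignments, could be produced roughly as follows. The sketch assumes scaled features and the "ward.D2" variant of Ward's method in hclust(). The original analysis may have used different settings.

```r
# Recreate the simulated features (same seed and parameters as the Data section)
set.seed(260923)
features <- data.frame(
  student_engagement = rnorm(100, mean = 50, sd = 10),
  student_performance = rnorm(100, mean = 60, sd = 15)
)
scaled <- scale(features)

# Ward's method on Euclidean distances (assumption: the "ward.D2" variant)
hc_fit <- hclust(dist(scaled, method = "euclidean"), method = "ward.D2")

# Dendrogram with the three-cluster cut outlined
plot(hc_fit, labels = FALSE, main = "Ward dendrogram (simulated students)",
     xlab = "", sub = "")
rect.hclust(hc_fit, k = 3, border = "red")

# Cut the tree into three groups
hc_clusters <- cutree(hc_fit, k = 3)

# Cross-tabulate against K-Means to see how far the two methods agree
set.seed(1)
km_fit <- kmeans(scaled, centers = 3, nstart = 25)
agreement <- table(hierarchical = hc_clusters, kmeans = km_fit$cluster)
print(agreement)
```

If the two methods agree closely, each row of the cross-tabulation has most of its count in a single column, whatever the label numbering.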

Interpretation

Based on their engagement and performance, the results indicate that the dataset contains three separate groups of students.

Students in Cluster 1 are neither highly engaged nor high performers. Cluster 2 comprises students who are very engaged yet perform below average. Students in Cluster 3 have below-average engagement but above-average performance. These clusters may be useful for targeted interventions or customized support measures. Cluster 2 may benefit from more resources to boost performance, whereas Cluster 3 may need strategies to increase engagement. Cluster 1 may require extensive support in both engagement and performance. The choice between K-Means and hierarchical clustering depends on the specific aims of the analysis and the properties of the data.

A discussion of the implications of your findings for learning analytics.
Provide at least one scholarly reference.

The findings of the scoping review by Khalil, Prinsloo, and Slade (2022) on the theoretical influences on learning analytics have several important implications for the field:

Theoretical Pluralism is Beneficial: The review emphasizes the wide spectrum of theoretical influences in learning analytics, with no single theory prevailing. This theoretical variety should be treasured and safeguarded. It implies that learning analytics is not bound by a single theoretical perspective, allowing academics and practitioners to draw from a diverse range of ideas and methodologies.

Theory is Critical in Data-Driven Methodologies: Despite the abundance of data available in learning analytics, the role of theory remains critical. According to the review, larger datasets underline the need for theory in analysis. Theory helps researchers make sense of data, identify relevant factors, interpret outcomes, and turn data-driven findings into actionable insights.

Learning Analytics is Intrinsically Multidisciplinary: Learning analytics draws on subjects such as computer science, educational psychology, neuroscience, and anthropology. The review underlines the need for learning analytics academics and practitioners to embrace this multidisciplinary character and explore diverse theories from these disciplines.

Bridging Theory and Practice Challenges: The review identifies a potential gap between the academic community’s emphasis on theories and the practical, data-driven orientation of learning analytics programs. Bridging this gap is critical for the progress of the subject, as it ensures that theoretical findings are converted into successful educational interventions and advances.

Continuous Learning Theory Exploration: Learning analytics should continue to investigate multiple learning theories and adapt them to its context. While certain traditional theories, such as behaviorism, cognitivism, and constructivism, continue to be relevant, newer theories, such as connectivism, have also found a home in the field. Researchers must keep an open mind to new theories that reflect changing settings and technologies.

Greater Integration: The review identifies a need for greater integration across diverse groups within learning analytics, particularly between those focused on data-driven, practical approaches and those prioritizing theoretical advancement. Closer integration can lead to more holistic and effective learning analytics research and practice.

Importance of Ethical Considerations: As learning analytics evolves, ethical considerations, particularly those related to data privacy and algorithmic bias, should be informed by and integrated with appropriate ethical theories. The review does not address ethical considerations explicitly, yet they are an important component of responsible learning analytics practice.

In conclusion, the scoping review findings indicate that learning analytics is an interdisciplinary field with a rich theoretical terrain. This diversity should be capitalized on to spur innovation and improve educational outcomes. For the benefit of students and educators, theoretical viewpoints should guide the ethical use of data, bridging the gap between theory and practice.

Khalil, M., Prinsloo, P., & Slade, S. (2022). The use and application of learning theory in learning analytics: A scoping review. Journal of Computing in Higher Education. Advance online publication. doi:10.1007/s12528-022-09340-3

Your report should include your code. Submit the published RPubs link to Blackboard.

Lab 3 Case Study: Unsupervised Learning in Learning Analytics

Renu Mutha

2023-09-29