LA Assignment 26

Shyam Kumar M, Saurav K S

Use dendrograms and hierarchical clustering to analyze similarity between universities based on ranking metrics.

Step 1: Load required packages

  1. tidyverse: For data manipulation and visualization
  2. cluster: For clustering algorithms
  3. factoextra: For enhanced visualization of clustering
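
A minimal sketch of loading the three packages. The check-before-load pattern and the variable names (`pkgs`, `installed`, `missing_pkgs`) are illustrative assumptions, not taken from the assignment:

```r
# Packages named in Step 1
pkgs <- c("tidyverse", "cluster", "factoextra")

# TRUE for each package whose namespace can be loaded
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  # Report missing packages rather than installing them automatically
  message("Install first: ", paste(missing_pkgs, collapse = ", "))
} else {
  # Attach each package, as library(tidyverse) etc. would
  for (p in pkgs) library(p, character.only = TRUE)
}
```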

Step 2: Import the CSV File

# A tibble: 6 × 6
  University                    TLR    RP    GO    OI    PR
  <chr>                       <dbl> <dbl> <dbl> <dbl> <dbl>
1 Indian Institute of Science  89.8  98.6  92.3  85    90.1
2 Jawaharlal Nehru University  78.5  85.2  86.4  82.1  88.7
3 Banaras Hindu University     75.4  82.1  85.7  80    85.9
4 University of Delhi          70.3  75.5  80.2  79.5  84.5
5 Amrita Vishwa Vidyapeetham   77.6  70.3  79.8  81.3  83  
6 Jadavpur University          72.1  68.7  78.1  77.6  81.2
  • Reads the CSV file into a tibble.

  • head() previews the first 6 rows of the data.
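
The import step might look like the sketch below. The real file name is not shown in this report, so the sketch writes the six previewed rows to a temporary file (`csv_path`, a hypothetical stand-in) and reads them back with base R's `read.csv()`. With the tidyverse loaded, `read_csv()` takes the path the same way and returns a tibble.

```r
# Hypothetical stand-in for the assignment's CSV file
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "University,TLR,RP,GO,OI,PR",
  "Indian Institute of Science,89.8,98.6,92.3,85,90.1",
  "Jawaharlal Nehru University,78.5,85.2,86.4,82.1,88.7",
  "Banaras Hindu University,75.4,82.1,85.7,80,85.9",
  "University of Delhi,70.3,75.5,80.2,79.5,84.5",
  "Amrita Vishwa Vidyapeetham,77.6,70.3,79.8,81.3,83",
  "Jadavpur University,72.1,68.7,78.1,77.6,81.2"
), csv_path)

# Read the file; the first column stays character, the rest become numeric
universities <- read.csv(csv_path, stringsAsFactors = FALSE)

# Preview the first rows (head() shows 6 by default)
head(universities)
```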

Step 3: Preprocess the Data

  • na.omit() removes rows with missing values.

  • Sets the university names as row identifiers for use in the dendrogram.

  • Keeps only the numeric columns (the ranking metrics) for clustering.
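
The three preprocessing steps can be sketched in base R as follows. The data frame holds only the six rows previewed in Step 2, and the names `universities`, `clean`, and `metrics` are assumptions made for illustration:

```r
# The six rows shown in the Step 2 preview
universities <- data.frame(
  University = c("Indian Institute of Science", "Jawaharlal Nehru University",
                 "Banaras Hindu University", "University of Delhi",
                 "Amrita Vishwa Vidyapeetham", "Jadavpur University"),
  TLR = c(89.8, 78.5, 75.4, 70.3, 77.6, 72.1),
  RP  = c(98.6, 85.2, 82.1, 75.5, 70.3, 68.7),
  GO  = c(92.3, 86.4, 85.7, 80.2, 79.8, 78.1),
  OI  = c(85.0, 82.1, 80.0, 79.5, 81.3, 77.6),
  PR  = c(90.1, 88.7, 85.9, 84.5, 83.0, 81.2)
)

# Drop rows containing any NA (none are missing in this sample)
clean <- na.omit(universities)

# University names become the row identifiers
rownames(clean) <- clean$University

# Keep only the numeric metric columns
metrics <- clean[, sapply(clean, is.numeric), drop = FALSE]
```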

Step 4: Scale the Data

              TLR         RP          GO         OI          PR
 [1,]  2.45162099  2.0973214  1.94256682  1.7366701  1.62919733
 [2,]  0.54367904  0.9785273  1.00924889  0.9606542  1.31802730
 [3,]  0.02026133  0.7197018  0.89851625  0.3987116  0.69568726
 [4,] -0.84084522  0.1686540  0.02847411  0.2649158  0.38451724
 [5,]  0.39171906 -0.2655049 -0.03480169  0.7465808  0.05112079
 [6,] -0.53692526 -0.3990922 -0.30372380 -0.2435084 -0.34895495
 [7,] -0.06416088 -0.6913145 -0.58846487 -0.8322101 -0.54899282
 [8,] -0.68888524 -1.0837274 -0.80993014 -1.4209119 -0.81570998
 [9,] -0.28365863 -0.5994732 -0.99975752 -0.4843410 -0.97129500
[10,] -0.99280520 -0.9250924 -1.14212805 -1.1265610 -1.39359717
attr(,"scaled:center")
  TLR    RP    GO    OI    PR 
75.28 73.48 80.02 78.51 82.77 
attr(,"scaled:scale")
      TLR        RP        GO        OI        PR 
 5.922612 11.977182  6.321533  3.737037  4.499148 
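  • Standardizes each column to a mean of 0 and a standard deviation of 1, so that a metric with a wider spread (RP, with a standard deviation of about 12) does not outweigh one with a narrower spread (OI, about 3.7) in the distance calculation.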
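
To show what the scaling does to a single column, the sketch below standardizes the ten TLR values by hand and compares the result with `scale()`. The variable names are illustrative:

```r
# TLR values for the ten universities
tlr <- c(89.8, 78.5, 75.4, 70.3, 77.6, 72.1, 74.9, 71.2, 73.6, 69.4)

# Manual z-score: subtract the mean, divide by the sample standard deviation
z_manual <- (tlr - mean(tlr)) / sd(tlr)

# scale() applies the same transformation column by column
z_scaled <- scale(tlr)

mean(tlr)  # 75.28, the "scaled:center" value for TLR
sd(tlr)    # about 5.9226, the "scaled:scale" value for TLR

# Check that both approaches agree
all.equal(z_manual, as.numeric(z_scaled))
```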

Step 5: Compute Distance Matrix

           1         2         3         4         5         6         7         8         9
2  2.5420567
3  3.4001413 1.0277670
4  4.6839227 2.2113829 1.3845070
5  4.1480998 2.0764463 1.5862454 1.4334807
6  5.2956118 3.0030444 2.1239732 1.1495805 1.4466797
7  5.6440613 3.5228762 2.6969620 1.9489804 1.8837199 0.8812278
8  6.5959067 4.3938836 3.5027683 2.5645727 2.8084583 1.5336887 1.0054964
9  5.9251311 3.8130942 2.9991074 2.0874264 2.0147735 1.0168908 0.7246570 1.1559052
10 6.9173534 4.7246533 3.8193517 2.7726487 3.0288620 1.7488521 1.4225239 0.8052850 1.1045233
  • Calculates the pairwise Euclidean distance between universities on the scaled metrics.

  • This distance matrix is the input for the clustering algorithm.
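
As a worked check of one entry, the sketch below recomputes the distance between universities 1 and 2 from their scaled values in the Step 4 output. The vector names `u1` and `u2` are illustrative:

```r
# Scaled metrics for rows 1 and 2, copied from the Step 4 output
u1 <- c(TLR = 2.45162099, RP = 2.0973214, GO = 1.94256682,
        OI = 1.7366701, PR = 1.62919733)
u2 <- c(TLR = 0.54367904, RP = 0.9785273, GO = 1.00924889,
        OI = 0.9606542, PR = 1.31802730)

# Euclidean distance: square root of the summed squared differences
d_manual <- sqrt(sum((u1 - u2)^2))

# dist() computes the same quantity for every pair of rows
d_builtin <- dist(rbind(u1, u2), method = "euclidean")

d_manual  # about 2.542, the entry in row 2, column 1 of the distance matrix
```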

Step 6: Apply Hierarchical Clustering


Call:
hclust(d = dist_matrix, method = "ward.D2")

Cluster method   : ward.D2 
Distance         : euclidean 
Number of objects: 10 
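  • Performs agglomerative hierarchical clustering on the distance matrix.

  • Ward’s method (ward.D2) merges, at each step, the two clusters whose union gives the smallest increase in total within-cluster variance.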
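
A self-contained sketch of the clustering call, using the ten rows of metrics listed in Step 8. The `hclust()` call mirrors the one printed above. The names `metrics` and `hc` are assumptions:

```r
# Ranking metrics for the ten universities
metrics <- data.frame(
  TLR = c(89.8, 78.5, 75.4, 70.3, 77.6, 72.1, 74.9, 71.2, 73.6, 69.4),
  RP  = c(98.6, 85.2, 82.1, 75.5, 70.3, 68.7, 65.2, 60.5, 66.3, 62.4),
  GO  = c(92.3, 86.4, 85.7, 80.2, 79.8, 78.1, 76.3, 74.9, 73.7, 72.8),
  OI  = c(85.0, 82.1, 80.0, 79.5, 81.3, 77.6, 75.4, 73.2, 76.7, 74.3),
  PR  = c(90.1, 88.7, 85.9, 84.5, 83.0, 81.2, 80.3, 79.1, 78.4, 76.5)
)

# Scale, then compute pairwise Euclidean distances
dist_matrix <- dist(scale(metrics), method = "euclidean")

# Agglomerative clustering with Ward's criterion
hc <- hclust(d = dist_matrix, method = "ward.D2")

hc$merge[1, ]  # first merge; negative entries denote single observations
hc$height[1]   # height at which that merge happens
```

For this data the first merge joins rows 7 and 9, the closest pair in the distance matrix (0.7246570).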

Step 7: Plot Dendrogram

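  • Visualizes the clustering result as a tree, in which the height of each join reflects the dissimilarity between the clusters being merged.

  • The argument k sets the number of clusters into which the tree is cut.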
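
The plotting call is not shown in this report. The sketch below uses base R (`plot()` and `rect.hclust()`) rather than factoextra, and assumes k = 3 to match the three cluster labels in Step 8. With factoextra loaded, `fviz_dend(hc, k = 3)` would be the equivalent call. Leaves are labelled by row number because the names of rows 7 to 10 do not appear in the preview.

```r
# Ranking metrics for the ten universities
metrics <- data.frame(
  TLR = c(89.8, 78.5, 75.4, 70.3, 77.6, 72.1, 74.9, 71.2, 73.6, 69.4),
  RP  = c(98.6, 85.2, 82.1, 75.5, 70.3, 68.7, 65.2, 60.5, 66.3, 62.4),
  GO  = c(92.3, 86.4, 85.7, 80.2, 79.8, 78.1, 76.3, 74.9, 73.7, 72.8),
  OI  = c(85.0, 82.1, 80.0, 79.5, 81.3, 77.6, 75.4, 73.2, 76.7, 74.3),
  PR  = c(90.1, 88.7, 85.9, 84.5, 83.0, 81.2, 80.3, 79.1, 78.4, 76.5)
)
hc <- hclust(dist(scale(metrics)), method = "ward.D2")

# Draw the tree; hang = -1 aligns all leaf labels at the baseline
plot(hc, hang = -1, main = "University clustering (ward.D2)",
     xlab = "", sub = "")

# Outline k clusters; returns the members of each box
groups <- rect.hclust(hc, k = 3, border = 2:4)
```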

Step 8: Add Cluster Assignments

# A tibble: 10 × 6
     TLR    RP    GO    OI    PR Cluster
   <dbl> <dbl> <dbl> <dbl> <dbl>   <int>
 1  89.8  98.6  92.3  85    90.1       1
 2  78.5  85.2  86.4  82.1  88.7       2
 3  75.4  82.1  85.7  80    85.9       2
 4  70.3  75.5  80.2  79.5  84.5       2
 5  77.6  70.3  79.8  81.3  83         2
 6  72.1  68.7  78.1  77.6  81.2       3
 7  74.9  65.2  76.3  75.4  80.3       3
 8  71.2  60.5  74.9  73.2  79.1       3
 9  73.6  66.3  73.7  76.7  78.4       3
10  69.4  62.4  72.8  74.3  76.5       3
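  • Assigns a cluster number to each university. With three clusters, Indian Institute of Science (row 1) forms a cluster of its own, rows 2 to 5 form the second cluster, and rows 6 to 10 form the third.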
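
One way to produce these assignments is `cutree()`, sketched below under the assumption that the tree was cut at k = 3 (the report shows three labels but not the call itself). The names `metrics` and `hc` are illustrative:

```r
# Ranking metrics for the ten universities
metrics <- data.frame(
  TLR = c(89.8, 78.5, 75.4, 70.3, 77.6, 72.1, 74.9, 71.2, 73.6, 69.4),
  RP  = c(98.6, 85.2, 82.1, 75.5, 70.3, 68.7, 65.2, 60.5, 66.3, 62.4),
  GO  = c(92.3, 86.4, 85.7, 80.2, 79.8, 78.1, 76.3, 74.9, 73.7, 72.8),
  OI  = c(85.0, 82.1, 80.0, 79.5, 81.3, 77.6, 75.4, 73.2, 76.7, 74.3),
  PR  = c(90.1, 88.7, 85.9, 84.5, 83.0, 81.2, 80.3, 79.1, 78.4, 76.5)
)
hc <- hclust(dist(scale(metrics)), method = "ward.D2")

# Cut the tree into three groups and attach the labels to the data
metrics$Cluster <- cutree(hc, k = 3)

# Number of universities in each cluster
table(metrics$Cluster)
```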