GROUP 4 MEMBERS
1. Vinod G Moovankara
2. Pahar Singh
3. Nitin Sharma
4. Ambuj Singh
5. Shilpee Srivastava Saxena
6. Kaushal Falwaria


Q. Mitra decides to form homogeneous subgroups of players, which would better help him to express the nuances of T20 cricket. How would you go about implementing this? Apply a relevant data analysis technique and generate useful insights.

DATA SET GIVEN:

The Indian Premier League Gauging Player Performance excel.XLS

Reading the data

library(readxl)
#setwd()
ipl <- read_excel("The Indian Premier League Gauging Player Performance excel.xlsx", 
    sheet = "Sheet1")
## New names:
## • `` -> `...1`
## • `` -> `...2`
View(ipl)

The second column of the data is blank, so we remove it.

ipl <- ipl[, -2]
# Keep ipl unchanged for the final results table and work on a copy
ipl1 <- ipl
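Dropping column 2 by position works for this file. A slightly more defensive alternative, sketched here on a small hypothetical data frame (the names `df`, `blank`, `non_empty` and `df_clean` are illustrative only), removes every column that contains nothing but missing values, so the code still works if the blank column moves:

```r
# Hypothetical example: a data frame whose second column is entirely blank,
# similar to what read_excel() returns for an empty spreadsheet column
df <- data.frame(id    = 1:3,
                 blank = c(NA, NA, NA),
                 Runs  = c(442, 393, 61))

# Keep only the columns that have at least one non-missing value
non_empty <- colSums(!is.na(df)) > 0
df_clean  <- df[, non_empty, drop = FALSE]
names(df_clean)
```

Applied to the report's data this would read `ipl <- ipl[, colSums(!is.na(ipl)) > 0]`, assuming the blank column is read in as all `NA`, which is how `read_excel()` represents empty cells.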

We decided to generate a Batting Index (BI) to compare players.

We compute it as the product of batting average (Avg) and strike rate (SR). In T20 we want batsmen who perform consistently (shown by their average) and score quickly (shown by a high strike rate). Since SR is runs per 100 balls, we divide it by 100 to get runs per ball.

# BI = average x runs per ball (SR is runs per 100 balls)
ipl1$BI <- ipl1$Avg * ipl1$SR / 100
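As a worked example of the formula, the sketch below computes BI for five players using the Avg and SR values listed in the final results table. For AB de Villiers, for instance, BI = 44.20 x 154.00 / 100 = 68.068.

```r
# Batting Index (BI) = batting average x (strike rate / 100)
# Avg and SR values are copied from the results table in this report
players <- data.frame(
  Player = c("AB de Villiers", "Andre Russell", "David Warner", "MS Dhoni", "Rashid Khan"),
  Avg    = c(44.20, 56.66, 69.20, 83.20, 6.80),
  SR     = c(154.00, 204.81, 143.86, 134.62, 147.82)
)

# SR / 100 converts runs per 100 balls into runs per ball
players$BI <- players$Avg * players$SR / 100

# Rank the players from highest to lowest BI
players[order(-players$BI), ]
```

Rashid Khan's strike rate (147.82) is close to de Villiers', but his average of 6.80 pulls his BI down to about 10.05, while Andre Russell's mix of a high average and a very high strike rate gives the largest BI of the five (about 116.05). This is the trade-off the index is meant to capture.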

We then drop columns 5 and 6 (Avg and SR), since BI combines them.

ipl1 <- ipl1[, -(5:6)]

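Scaling the clustering variables, columns 4 to 9 (Runs, Hundreds, Fifties, Fours, Sixes and Salary), to mean 0 and standard deviation 1. Note that this range does not include BI, which is column 10.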

ipls <- scale(ipl1[, 4:9])
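k-means works with Euclidean distances, so unscaled Runs (hundreds of runs) would swamp Hundreds (0 or 1). `scale()` turns each column into z-scores. A small sketch on the Runs and Hundreds values of the first five players in the results table shows that it is the same as subtracting each column's mean and dividing by its standard deviation:

```r
# Runs and Hundreds for the first five players in the results table
m <- cbind(Runs     = c(442, 393, 61, 282, 510),
           Hundreds = c(0, 1, 0, 0, 0))

z <- scale(m)  # centre each column on its mean, then divide by its standard deviation

# The same transformation written out by hand
manual <- sweep(sweep(m, 2, colMeans(m), "-"), 2, apply(m, 2, sd), "/")

all.equal(as.vector(z), as.vector(manual))  # TRUE
round(colMeans(z), 10)                      # both columns now have mean 0
apply(z, 2, sd)                             # and standard deviation 1
```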

We used two plots to choose the best number of clusters (k).

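The silhouette method compares, for each player, a = the mean distance to the other players in the same cluster with b = the mean distance to the players in the nearest other cluster. The silhouette width is s = (b - a) / max(a, b); it is positive when b is greater than a and approaches 1 when the clusters are both compact and well separated. The best k is the one with the highest average silhouette width.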
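To make the definition concrete, the sketch below computes silhouette widths by hand on simulated two-group data (the data, the seed and the variable names are assumptions for illustration) and cross-checks them against `silhouette()` from the cluster package. `fviz_nbclust(..., method = "silhouette")` plots the average of these widths for each candidate k.

```r
library(cluster)  # silhouette() is used only to cross-check the manual version

set.seed(1)
# Simulated data (assumption): two well-separated groups of 10 points in 2-D
x  <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
            matrix(rnorm(20, mean = 5), ncol = 2))
cl <- kmeans(x, centers = 2, nstart = 10)$cluster
d  <- as.matrix(dist(x))  # Euclidean distances between all pairs of points

sil_manual <- sapply(seq_len(nrow(x)), function(i) {
  own <- cl == cl[i]
  own[i] <- FALSE                          # exclude the point itself
  if (!any(own)) return(0)                 # convention for single-point clusters
  a <- mean(d[i, own])                     # mean distance within own cluster
  others <- setdiff(unique(cl), cl[i])
  b <- min(sapply(others, function(k) mean(d[i, cl == k])))  # nearest other cluster
  (b - a) / max(a, b)
})

sil_pkg <- silhouette(cl, dist(x))[, "sil_width"]
all.equal(sil_manual, unname(sil_pkg))  # should be TRUE
mean(sil_manual)                        # average silhouette width for k = 2
```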

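We also drew the standard elbow plot of the total within-cluster sum of squares (WSS) against k. The best k is at the "elbow", after which adding more clusters reduces WSS only slightly.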

#install.packages("factoextra")
library(factoextra)
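# Elbow (WSS) plot; ipls[, 1:5] is Runs, Hundreds, Fifties, Fours and Sixes (Salary is column 6)
fviz_nbclust(ipls[, 1:5], kmeans, method = "wss")

# Average silhouette width for each k, on the same five columns
fviz_nbclust(ipls[, 1:5], kmeans, method = "silhouette")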
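The elbow plot from `fviz_nbclust(..., method = "wss")` shows the total within-cluster sum of squares for a range of k. The sketch below builds the same kind of curve by hand with base R's `kmeans()` on simulated data with three groups (an assumption for illustration), so the elbow should appear at k = 3:

```r
set.seed(42)
# Simulated data (assumption): three groups centred at 0, 4 and 8, then standardised
x <- scale(rbind(matrix(rnorm(60, mean = 0), ncol = 2),
                 matrix(rnorm(60, mean = 4), ncol = 2),
                 matrix(rnorm(60, mean = 8), ncol = 2)))

ks  <- 1:6
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 25)$tot.withinss)

# Elbow plot: look for the k after which WSS stops dropping sharply
plot(ks, wss, type = "b",
     xlab = "Number of clusters k", ylab = "Total within-cluster sum of squares")
wss
```

Using `nstart = 25` runs each k from 25 random starts and keeps the best, which makes the curve less sensitive to unlucky initial centres.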

Since both plots indicate that three clusters is optimal, we use k = 3.

set.seed(1234)
cluster_1<-kmeans(ipls,3)
cluster_1
## K-means clustering with 3 clusters of sizes 6, 38, 26
## 
## Cluster means:
##         Runs   Hundreds    Fifties      Fours      Sixes     Salary
## 1  1.4157758  3.2425739  0.9693617  1.3713348  0.6253612  0.7560000
## 2 -0.7970102 -0.3039913 -0.6898410 -0.7720593 -0.6145994 -0.2806928
## 3  0.8381436 -0.3039913  0.7845303  0.8119325  0.7539466  0.2357817
## 
## Clustering vector:
##  [1] 3 1 2 2 3 2 2 2 3 3 2 2 1 2 2 3 2 3 3 2 2 1 3 2 2 2 3 1 2 2 3 2 3 2 2 3 2 3
## [39] 3 2 3 3 2 2 2 2 3 2 3 3 2 1 2 3 2 3 2 2 3 3 3 2 2 3 3 2 2 1 2 2
## 
## Within cluster sum of squares by cluster:
## [1] 31.16488 43.71399 85.54479
##  (between_SS / total_SS =  61.3 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"
cluster_1$totss
## [1] 414
cluster_1$betweenss
## [1] 253.5763
cluster_1$tot.withinss
## [1] 160.4237
cluster_1$betweenss/cluster_1$totss*100
## [1] 61.25032
cluster_1$tot.withinss/cluster_1$totss*100
## [1] 38.74968
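Because the cluster means above are in standardised units (0 is the average player), each value reads as standard deviations above or below average. A quick way to see which group leads on each variable is sketched below, using the means printed by `kmeans()`:

```r
# Cluster means (standardised units) copied from the kmeans() output above
centers <- rbind(
  c( 1.4157758,  3.2425739,  0.9693617,  1.3713348,  0.6253612,  0.7560000),
  c(-0.7970102, -0.3039913, -0.6898410, -0.7720593, -0.6145994, -0.2806928),
  c( 0.8381436, -0.3039913,  0.7845303,  0.8119325,  0.7539466,  0.2357817))
colnames(centers) <- c("Runs", "Hundreds", "Fifties", "Fours", "Sixes", "Salary")

# Which cluster has the highest mean on each variable?
apply(centers, 2, which.max)
# Which cluster has the lowest? (which.min returns the first cluster on a tie)
apply(centers, 2, which.min)
```

Cluster 1 leads on every variable except Sixes, where cluster 3 is highest, and cluster 2 is lowest throughout. Clusters 2 and 3 share the same Hundreds mean (-0.30), which matches the results table: the six players with a century are exactly the six players in cluster 1.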

Total sum of squares = 414; between-cluster sum of squares = 253.58 (61.25% of the total); total within-cluster sum of squares = 160.42 (38.75% of the total).

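So about 61% of the total variation is explained by differences between the clusters, while 39% remains within them: players are fairly similar inside each cluster, and the clusters are clearly distinct from one another.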
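The total of 414 follows from the scaling: each standardised column has variance 1, so it contributes n - 1 to the total sum of squares, and 70 players x 6 variables gives 69 x 6 = 414. The sketch below checks this on simulated data of the same shape (an assumption standing in for `ipls`), together with the identity totss = betweenss + tot.withinss:

```r
set.seed(7)
# Simulated stand-in for ipls (assumption): 70 rows x 6 standardised columns
x <- scale(matrix(rnorm(70 * 6), ncol = 6))

fit <- kmeans(x, centers = 3, nstart = 10)

fit$totss                          # (70 - 1) * 6 = 414 for any scaled 70 x 6 matrix
fit$betweenss + fit$tot.withinss   # the two parts add back up to totss
100 * fit$betweenss / fit$totss    # % of the variation that lies between clusters
```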

Now we will add the cluster number to the table

# Step 1 - check the class: read_excel() returns a tibble
class(ipl)
## [1] "tbl_df"     "tbl"        "data.frame"
# Step 2 - convert it to a plain data frame
ipl <- data.frame(ipl)
# Step 3 - add the cluster column
ipl$cluster_1 <- cluster_1$cluster

Now we relabel cluster 1 as “Extremely Valuable”, cluster 3 as “Valuable” and cluster 2 as “Under performer”.

# Cluster numbers 1, 2, 3 index into this vector of labels
ipl$cluster_1 <- c("Extremely Valuable", "Under performer", "Valuable")[ipl$cluster_1]

Now we can show the final result

ipl
##    ...1              Player                        Team Runs   Avg     SR
## 1     1      AB de Villiers Royal Challengers Bangalore  442 44.20 154.00
## 2     2      Ajinkya Rahane            Rajasthan Royals  393 32.75 137.89
## 3     3       Akshdeep Nath Royal Challengers Bangalore   61 12.20 107.01
## 4     4       Ambati Rayudu         Chennai Super Kings  282 23.50  93.06
## 5     5       Andre Russell       Kolkata Knight Riders  510 56.66 204.81
## 6     6          Axar Patel Royal Challengers Bangalore  110 18.33 125.00
## 7     7          Ben Stokes            Rajasthan Royals  123 20.50 124.24
## 8     8   Bhuvneshwar Kumar         Sunrisers Hyderabad   12  4.00  63.15
## 9     9         Chris Gayle             Kings XI Punjab  490 40.83 153.60
## 10   10          Chris Lynn       Kolkata Knight Riders  405 31.15 139.65
## 11   11        Chris Morris              Delhi Capitals   32  5.33  86.48
## 12   12        Colin Ingram              Delhi Capitals  184 18.40 119.48
## 13   13        David Warner         Sunrisers Hyderabad  692 69.20 143.86
## 14   14        David Miller             Kings XI Punjab  213 26.62 129.87
## 15   15        Deepak Hooda         Sunrisers Hyderabad   64 10.66 101.58
## 16   16      Dinesh Karthik       Kolkata Knight Riders  253 31.62 146.24
## 17   17        Dwayne Bravo         Chennai Super Kings   80 16.00 121.21
## 18   18      Faf du Plessis         Chennai Super Kings  396 36.00 123.36
## 19   19       Hardik Pandya              Mumbai Indians  402 44.66 191.42
## 20   20        Ishan Kishan              Mumbai Indians  101 16.83 101.00
## 21   21        Jofra Archer            Rajasthan Royals   67 33.50 167.50
## 22   22      Jonny Bairstow         Sunrisers Hyderabad  445 55.62 157.24
## 23   23         Jos Buttler            Rajasthan Royals  311 38.87 151.70
## 24   24     Kane Williamson         Sunrisers Hyderabad  156 22.28 120.00
## 25   25        Kedar Jadhav         Chennai Super Kings  162 18.00  95.85
## 26   26          Keemo Paul            Rajasthan Royals   18  3.60  75.00
## 27   27      Kieron Pollard              Mumbai Indians  279 34.87 156.74
## 28   28            KL Rahul             Kings XI Punjab  593 53.90 135.38
## 29   29       Krunal Pandya              Mumbai Indians  183 16.63 122.00
## 30   30       Mandeep Singh             Kings XI Punjab  165 41.25 137.50
## 31   31       Manish Pandey         Sunrisers Hyderabad  344 43.00 130.79
## 32   32      Marcus Stoinis Royal Challengers Bangalore  211 52.75 135.25
## 33   33      Mayank Agarwal             Kings XI Punjab  332 25.53 141.88
## 34   34           Moeen Ali Royal Challengers Bangalore  220 27.50 165.41
## 35   35       Mohammad Nabi         Sunrisers Hyderabad  115 19.16 151.31
## 36   36            MS Dhoni         Chennai Super Kings  416 83.20 134.62
## 37   37     Nicholas Pooran             Kings XI Punjab  168 28.00 157.00
## 38   38         Nitish Rana       Kolkata Knight Riders  344 34.40 146.38
## 39   39       Parthiv Patel Royal Challengers Bangalore  373 26.64 139.17
## 40   40       Piyush Chawla       Kolkata Knight Riders   42 14.00 113.51
## 41   41        Prithvi Shaw              Delhi Capitals  353 22.06 133.71
## 42   42     Quinton de Kock              Mumbai Indians  529 35.26 132.91
## 43   43      Rahul Tripathi            Rajasthan Royals  141 23.50 119.49
## 44   44         Rashid Khan         Sunrisers Hyderabad   34  6.80 147.82
## 45   45 Ravichandran Ashwin             Kings XI Punjab   42  8.40 150.00
## 46   46     Ravindra Jadeja         Chennai Super Kings  106 35.33 120.45
## 47   47        Rishabh Pant              Delhi Capitals  488 37.53 162.66
## 48   48         Riyan Parag            Rajasthan Royals  160 32.00 126.98
## 49   49       Robin Uthappa       Kolkata Knight Riders  282 31.33 115.10
## 50   50        Rohit Sharma              Mumbai Indians  405 28.92 128.57
## 51   51          Sam Curran             Kings XI Punjab   95 23.75 172.72
## 52   52        Sanju Samson            Rajasthan Royals  342 34.20 148.69
## 53   55       Sarfaraz Khan             Kings XI Punjab  180 45.00 125.87
## 54   56        Shane Watson         Chennai Super Kings  398 23.41 127.56
## 55   57 Sherfane Rutherford              Delhi Capitals   73 14.60 135.18
## 56   60      Shikhar Dhawan              Delhi Capitals  521 34.73 135.67
## 57   61     Shimron Hetmyer Royal Challengers Bangalore   90 18.00 123.28
## 58   63       Shreyas Gopal            Rajasthan Royals   63 15.75 136.95
## 59   64        Shreyas Iyer              Delhi Capitals  463 30.86 119.94
## 60   65        Shubman Gill       Kolkata Knight Riders  296 32.88 124.36
## 61   67         Steve Smith            Rajasthan Royals  319 39.87 116.00
## 62   68        Stuart Binny            Rajasthan Royals   70 23.33 175.00
## 63   71        Sunil Narine       Kolkata Knight Riders  143 17.87 166.27
## 64   72        Suresh Raina         Chennai Super Kings  383 23.93 121.97
## 65   74    Suryakumar Yadav              Mumbai Indians  424 32.61 130.86
## 66   76         Umesh Yadav       Kolkata Knight Riders   25 12.50 100.00
## 67   77       Vijay Shankar         Sunrisers Hyderabad  244 20.33 126.42
## 68   80         Virat Kohli Royal Challengers Bangalore  464 33.14 141.46
## 69   84     Wriddhiman Saha         Sunrisers Hyderabad   86 17.20 162.26
## 70   92        Yusuf Pathan         Sunrisers Hyderabad   40 13.33  88.88
##    Hundreds Fifties Fours Sixes  Salary          cluster_1
## 1         0       5    31    26 1.71875           Valuable
## 2         1       1    45     9 0.62500 Extremely Valuable
## 3         0       0     5     2 0.51430    Under performer
## 4         0       1    20     7 0.34375    Under performer
## 5         0       4    31    52 1.32813           Valuable
## 6         0       0    10     3 0.71430    Under performer
## 7         0       0     8     4 1.95313    Under performer
## 8         0       0     1     0 1.32813    Under performer
## 9         0       4    45    34 0.31250           Valuable
## 10        0       4    41    22 1.50000           Valuable
## 11        0       0     1     2 1.71875    Under performer
## 12        0       0    20     5 0.91430    Under performer
## 13        1       8    57    21 1.95313 Extremely Valuable
## 14        0       1    19     7 0.46875    Under performer
## 15        0       0     5     1 0.56250    Under performer
## 16        0       2    22    14 1.15625           Valuable
## 17        0       0     6     3 1.00000    Under performer
## 18        0       3    36    15 0.25000           Valuable
## 19        0       1    28    29 1.71875           Valuable
## 20        0       0     8     4 0.96875    Under performer
## 21        0       0     4     4 1.12500    Under performer
## 22        1       2    48    18 0.31430 Extremely Valuable
## 23        0       3    38    14 0.68750           Valuable
## 24        0       1    12     5 0.46875    Under performer
## 25        0       1    19     3 1.21875    Under performer
## 26        0       0     1     1 0.07140    Under performer
## 27        0       1    14    22 0.84375           Valuable
## 28        1       6    49    25 1.71875 Extremely Valuable
## 29        0       0    18     5 1.37500    Under performer
## 30        0       0    10     4 0.21875    Under performer
## 31        0       3    34     6 1.71875           Valuable
## 32        0       0    14    10 0.96875    Under performer
## 33        0       2    26    14 0.15625           Valuable
## 34        0       2    16    17 0.26563    Under performer
## 35        0       0     8     7 0.15625    Under performer
## 36        0       3    22    23 2.34375           Valuable
## 37        0       0    10    14 0.60000    Under performer
## 38        0       3    27    21 0.53125           Valuable
## 39        0       2    48    10 0.26563           Valuable
## 40        0       0     4     2 0.65625    Under performer
## 41        0       2    45     9 0.18750           Valuable
## 42        0       4    45    25 0.43750           Valuable
## 43        0       1    13     2 0.53125    Under performer
## 44        0       0     2     2 1.40625    Under performer
## 45        0       0     3     3 1.18750    Under performer
## 46        0       0     7     4 1.09375    Under performer
## 47        0       3    37    27 2.34375           Valuable
## 48        0       1    17     5 0.02860    Under performer
## 49        0       1    28    10 1.00000           Valuable
## 50        0       2    52    10 2.34375           Valuable
## 51        0       1    13     3 1.02860    Under performer
## 52        1       0    28    13 1.25000 Extremely Valuable
## 53        0       1    19     4 0.03570    Under performer
## 54        0       3    42    20 0.62500           Valuable
## 55        0       0     2     7 0.28570    Under performer
## 56        0       5    64    11 0.81250           Valuable
## 57        0       1     4     7 0.60000    Under performer
## 58        0       0     8     1 0.03125    Under performer
## 59        0       3    41    14 1.09375           Valuable
## 60        0       3    21    10 0.28125           Valuable
## 61        0       3    30     4 1.95313           Valuable
## 62        0       0     5     4 0.07813    Under performer
## 63        0       0    17     9 1.95313    Under performer
## 64        0       3    45     9 1.71875           Valuable
## 65        0       2    45    10 0.50000           Valuable
## 66        0       0     3     1 0.65625    Under performer
## 67        0       0    11    12 0.50000    Under performer
## 68        1       2    46    13 2.65625 Extremely Valuable
## 69        0       0    13     1 0.17140    Under performer
## 70        0       0     1     1 0.29688    Under performer
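To describe the labelled groups in cricketing terms it helps to average the original (unscaled) columns within each label. The sketch below does this with base R's `aggregate()` on a six-row excerpt of the results table above (the excerpt and the names `excerpt` and `cluster_profile` are for illustration only):

```r
# Six rows copied from the results table above
excerpt <- data.frame(
  Player    = c("AB de Villiers", "Ajinkya Rahane", "Akshdeep Nath",
                "Ambati Rayudu", "Andre Russell", "David Warner"),
  Runs      = c(442, 393, 61, 282, 510, 692),
  Sixes     = c(26, 9, 2, 7, 52, 21),
  Salary    = c(1.71875, 0.62500, 0.51430, 0.34375, 1.32813, 1.95313),
  cluster_1 = c("Valuable", "Extremely Valuable", "Under performer",
                "Under performer", "Valuable", "Extremely Valuable")
)

# Mean of each variable within each labelled cluster, in the original units
cluster_profile <- aggregate(cbind(Runs, Sixes, Salary) ~ cluster_1,
                             data = excerpt, FUN = mean)
cluster_profile
```

On the full table the same call would be `aggregate(cbind(Runs, Avg, SR, Hundreds, Fifties, Fours, Sixes, Salary) ~ cluster_1, data = ipl, FUN = mean)`; its output is not reproduced here.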

Plotting the results graphically

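#install.packages("cluster")
library(cluster)
# Plot the scaled clustering variables (ipls) coloured by k-means cluster number,
# using player names as point labels
rownames(ipls) <- ipl$Player
clusplot(ipls, cluster_1$cluster,
         color = TRUE, shade = TRUE,
         labels = 2, lines = 0)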
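`clusplot()` draws the observations on their first two principal components and reports how much of the point variability those two components explain. A roughly equivalent picture can be sketched with base R's `prcomp()`; the data here is simulated (an assumption standing in for `ipls`), and the exact percentages `clusplot()` prints may be computed slightly differently:

```r
set.seed(3)
# Simulated stand-in for ipls (assumption): 70 rows x 6 standardised columns
x  <- scale(matrix(rnorm(70 * 6), ncol = 6))
cl <- kmeans(x, centers = 3, nstart = 10)$cluster

pc <- prcomp(x)                                 # principal components of the scaled data
explained <- 100 * pc$sdev^2 / sum(pc$sdev^2)   # % of variance per component

# Scatter of the first two components, coloured by cluster
plot(pc$x[, 1:2], col = cl, pch = 19,
     xlab = sprintf("PC1 (%.1f%%)", explained[1]),
     ylab = sprintf("PC2 (%.1f%%)", explained[2]))
sum(explained[1:2])  # share of variability shown in the 2-D picture
```

If the first two components explain only a modest share of the variance, clusters that look overlapping in the plot may still be well separated in the full six-variable space.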

The main factors that prompted the grouping were:

In Cluster 1 (Extremely Valuable), the batsmen have:
- scored the most runs
- scored centuries
- hit the most 4s
- hit a reasonable number of 6s
- a high BI (Batting Index)

In Cluster 3 (Valuable), the batsmen have:
- scored a high number of runs
- scored a few 50s
- hit the most 6s
- hit a reasonable number of 4s
- the second-highest BI (Batting Index)

In Cluster 2 (Under performer)