1. Defining the question and context.

Kira Plastinina is a Russian brand that was sold through a now-defunct chain of retail stores in Russia, Ukraine, Kazakhstan, Belarus, China, the Philippines, and Armenia. The brand's Sales and Marketing team would like to understand their customers' behavior from data collected over the past year. More specifically, they would like to learn the characteristics of customer groups.

Perform clustering, stating the insights drawn from the analysis and visualizations. After implementation, compare the approaches covered this week, i.e. K-Means clustering vs. Hierarchical clustering, highlighting the strengths and limitations of each approach in the context of this analysis.

2. Defining the metrics of success

The analysis will be considered a success once every record has been assigned to a cluster and the resulting customer groups can be described by their characteristics.

3. Experimental Design.

  1. Data Understanding
  2. Data Cleaning
  3. Perform Exploratory Data Analysis (Univariate, Bivariate & Multivariate)
  4. Implement the Solution
  5. Conclusion
  6. Recommendation

4. Appropriateness of the available data.

The dataset can be found at http://bit.ly/EcommerceCustomersDataset.

5. Reading the data

# Loading the required packages
library(tidyverse)
library(ggplot2)
library(caret)
library(caretEnsemble)
library(psych)
library(GGally)
library(rpart)
library(randomForest)
library(superml) # for label encoding
library(e1071) # Holds the Naive Bayes function.
library(grid)
library(gridExtra)
library(heatmaply)
library(ggcorrplot)
library(cluster)
library(purrr)
library(CatEncoders, warn.conflicts = FALSE)
library(devtools)
library(magrittr)
library(factoextra)
library(NbClust)


data <- read.csv("http://bit.ly/EcommerceCustomersDataset")

6. Data Understanding.

# 6.1 Previewing the head of the dataset
head(data)
##   Administrative Administrative_Duration Informational Informational_Duration
## 1              0                       0             0                      0
## 2              0                       0             0                      0
## 3              0                      -1             0                     -1
## 4              0                       0             0                      0
## 5              0                       0             0                      0
## 6              0                       0             0                      0
##   ProductRelated ProductRelated_Duration BounceRates ExitRates PageValues
## 1              1                0.000000  0.20000000 0.2000000          0
## 2              2               64.000000  0.00000000 0.1000000          0
## 3              1               -1.000000  0.20000000 0.2000000          0
## 4              2                2.666667  0.05000000 0.1400000          0
## 5             10              627.500000  0.02000000 0.0500000          0
## 6             19              154.216667  0.01578947 0.0245614          0
##   SpecialDay Month OperatingSystems Browser Region TrafficType
## 1          0   Feb                1       1      1           1
## 2          0   Feb                2       2      1           2
## 3          0   Feb                4       1      9           3
## 4          0   Feb                3       2      2           4
## 5          0   Feb                3       3      1           4
## 6          0   Feb                2       2      1           3
##         VisitorType Weekend Revenue
## 1 Returning_Visitor   FALSE   FALSE
## 2 Returning_Visitor   FALSE   FALSE
## 3 Returning_Visitor   FALSE   FALSE
## 4 Returning_Visitor   FALSE   FALSE
## 5 Returning_Visitor    TRUE   FALSE
## 6 Returning_Visitor   FALSE   FALSE
# 6.2 Previewing the tail of the dataset
tail(data)
##       Administrative Administrative_Duration Informational
## 12325              0                       0             1
## 12326              3                     145             0
## 12327              0                       0             0
## 12328              0                       0             0
## 12329              4                      75             0
## 12330              0                       0             0
##       Informational_Duration ProductRelated ProductRelated_Duration BounceRates
## 12325                      0             16                 503.000 0.000000000
## 12326                      0             53                1783.792 0.007142857
## 12327                      0              5                 465.750 0.000000000
## 12328                      0              6                 184.250 0.083333333
## 12329                      0             15                 346.000 0.000000000
## 12330                      0              3                  21.250 0.000000000
##        ExitRates PageValues SpecialDay Month OperatingSystems Browser Region
## 12325 0.03764706    0.00000          0   Nov                2       2      1
## 12326 0.02903061   12.24172          0   Dec                4       6      1
## 12327 0.02133333    0.00000          0   Nov                3       2      1
## 12328 0.08666667    0.00000          0   Nov                3       2      1
## 12329 0.02105263    0.00000          0   Nov                2       2      3
## 12330 0.06666667    0.00000          0   Nov                3       2      1
##       TrafficType       VisitorType Weekend Revenue
## 12325           1 Returning_Visitor   FALSE   FALSE
## 12326           1 Returning_Visitor    TRUE   FALSE
## 12327           8 Returning_Visitor    TRUE   FALSE
## 12328          13 Returning_Visitor    TRUE   FALSE
## 12329          11 Returning_Visitor   FALSE   FALSE
## 12330           2       New_Visitor    TRUE   FALSE
dim(data)
## [1] 12330    18
# 6.3 Checking the data types of the variables
str(data)
## 'data.frame':    12330 obs. of  18 variables:
##  $ Administrative         : int  0 0 0 0 0 0 0 1 0 0 ...
##  $ Administrative_Duration: num  0 0 -1 0 0 0 -1 -1 0 0 ...
##  $ Informational          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Informational_Duration : num  0 0 -1 0 0 0 -1 -1 0 0 ...
##  $ ProductRelated         : int  1 2 1 2 10 19 1 1 2 3 ...
##  $ ProductRelated_Duration: num  0 64 -1 2.67 627.5 ...
##  $ BounceRates            : num  0.2 0 0.2 0.05 0.02 ...
##  $ ExitRates              : num  0.2 0.1 0.2 0.14 0.05 ...
##  $ PageValues             : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ SpecialDay             : num  0 0 0 0 0 0 0.4 0 0.8 0.4 ...
##  $ Month                  : chr  "Feb" "Feb" "Feb" "Feb" ...
##  $ OperatingSystems       : int  1 2 4 3 3 2 2 1 2 2 ...
##  $ Browser                : int  1 2 1 2 3 2 4 2 2 4 ...
##  $ Region                 : int  1 1 9 2 1 1 3 1 2 1 ...
##  $ TrafficType            : int  1 2 3 4 4 3 3 5 3 2 ...
##  $ VisitorType            : chr  "Returning_Visitor" "Returning_Visitor" "Returning_Visitor" "Returning_Visitor" ...
##  $ Weekend                : logi  FALSE FALSE FALSE FALSE TRUE FALSE ...
##  $ Revenue                : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
summary(data)
##  Administrative   Administrative_Duration Informational   
##  Min.   : 0.000   Min.   :  -1.00         Min.   : 0.000  
##  1st Qu.: 0.000   1st Qu.:   0.00         1st Qu.: 0.000  
##  Median : 1.000   Median :   8.00         Median : 0.000  
##  Mean   : 2.318   Mean   :  80.91         Mean   : 0.504  
##  3rd Qu.: 4.000   3rd Qu.:  93.50         3rd Qu.: 0.000  
##  Max.   :27.000   Max.   :3398.75         Max.   :24.000  
##  NA's   :14       NA's   :14              NA's   :14      
##  Informational_Duration ProductRelated   ProductRelated_Duration
##  Min.   :  -1.00        Min.   :  0.00   Min.   :   -1.0        
##  1st Qu.:   0.00        1st Qu.:  7.00   1st Qu.:  185.0        
##  Median :   0.00        Median : 18.00   Median :  599.8        
##  Mean   :  34.51        Mean   : 31.76   Mean   : 1196.0        
##  3rd Qu.:   0.00        3rd Qu.: 38.00   3rd Qu.: 1466.5        
##  Max.   :2549.38        Max.   :705.00   Max.   :63973.5        
##  NA's   :14             NA's   :14       NA's   :14             
##   BounceRates         ExitRates         PageValues        SpecialDay     
##  Min.   :0.000000   Min.   :0.00000   Min.   :  0.000   Min.   :0.00000  
##  1st Qu.:0.000000   1st Qu.:0.01429   1st Qu.:  0.000   1st Qu.:0.00000  
##  Median :0.003119   Median :0.02512   Median :  0.000   Median :0.00000  
##  Mean   :0.022152   Mean   :0.04300   Mean   :  5.889   Mean   :0.06143  
##  3rd Qu.:0.016684   3rd Qu.:0.05000   3rd Qu.:  0.000   3rd Qu.:0.00000  
##  Max.   :0.200000   Max.   :0.20000   Max.   :361.764   Max.   :1.00000  
##  NA's   :14         NA's   :14                                           
##     Month           OperatingSystems    Browser           Region     
##  Length:12330       Min.   :1.000    Min.   : 1.000   Min.   :1.000  
##  Class :character   1st Qu.:2.000    1st Qu.: 2.000   1st Qu.:1.000  
##  Mode  :character   Median :2.000    Median : 2.000   Median :3.000  
##                     Mean   :2.124    Mean   : 2.357   Mean   :3.147  
##                     3rd Qu.:3.000    3rd Qu.: 2.000   3rd Qu.:4.000  
##                     Max.   :8.000    Max.   :13.000   Max.   :9.000  
##                                                                      
##   TrafficType    VisitorType         Weekend         Revenue       
##  Min.   : 1.00   Length:12330       Mode :logical   Mode :logical  
##  1st Qu.: 2.00   Class :character   FALSE:9462      FALSE:10422    
##  Median : 2.00   Mode  :character   TRUE :2868      TRUE :1908     
##  Mean   : 4.07                                                     
##  3rd Qu.: 4.00                                                     
##  Max.   :20.00                                                     
## 

7. Data Cleaning

7.1 Dealing with duplicates

duplicated_rows <- data[duplicated(data),]
count(duplicated_rows)
##     n
## 1 119

The dataset has 119 duplicated rows.

# Dealing with duplicates by dropping them.
new_data <- data[!duplicated(data),]
# Let's confirm the changes made 
sum(duplicated(new_data))
## [1] 0
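The following toy example uses hypothetical values, not taken from the dataset. It shows how this step behaves: `duplicated()` flags only the second and later copies of an exact row match. Filtering on `!duplicated()` therefore keeps the first occurrence of every distinct row.

```r
# Hypothetical toy data: row 3 is an exact copy of row 1
toy <- data.frame(
  visitortype = c("Returning_Visitor", "New_Visitor", "Returning_Visitor"),
  productrelated = c(1, 2, 1)
)

# FALSE for the first occurrence of a row, TRUE for every later exact copy
dup_flags <- duplicated(toy)
print(dup_flags)

# Keeping the unflagged rows retains exactly one copy of each distinct row
deduped <- toy[!dup_flags, ]
print(nrow(deduped))
```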

7.2 Dealing with missing data.

# Checking for missing values
sum(is.na(data))
## [1] 112
# Dropping our missing values 
clean_data <- new_data[complete.cases(new_data),]
# Confirm changes made
colSums(is.na(clean_data))
##          Administrative Administrative_Duration           Informational 
##                       0                       0                       0 
##  Informational_Duration          ProductRelated ProductRelated_Duration 
##                       0                       0                       0 
##             BounceRates               ExitRates              PageValues 
##                       0                       0                       0 
##              SpecialDay                   Month        OperatingSystems 
##                       0                       0                       0 
##                 Browser                  Region             TrafficType 
##                       0                       0                       0 
##             VisitorType                 Weekend                 Revenue 
##                       0                       0                       0
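Counting `NA` cells is not the same as counting incomplete rows. In this minimal sketch with made-up values, one row contains two missing cells. `sum(is.na())` reports 2, while `complete.cases()` drops a single row. The same reasoning applies above: the 112 missing cells need not correspond to 112 dropped rows.

```r
# Hypothetical toy data: row 2 is missing two values
toy <- data.frame(
  administrative = c(0, NA, 3),
  month = c("Feb", NA, "Nov")
)

# Number of missing cells across the whole data frame
n_missing_cells <- sum(is.na(toy))

# complete.cases() is TRUE only for rows with no missing value in any column
keep <- complete.cases(toy)
toy_clean <- toy[keep, ]

print(n_missing_cells)              # missing cells
print(nrow(toy) - nrow(toy_clean))  # rows actually removed
```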
# changing column names to lowercase
colnames(clean_data) = tolower(colnames(clean_data))
print(colnames(clean_data))
##  [1] "administrative"          "administrative_duration"
##  [3] "informational"           "informational_duration" 
##  [5] "productrelated"          "productrelated_duration"
##  [7] "bouncerates"             "exitrates"              
##  [9] "pagevalues"              "specialday"             
## [11] "month"                   "operatingsystems"       
## [13] "browser"                 "region"                 
## [15] "traffictype"             "visitortype"            
## [17] "weekend"                 "revenue"
# Changing the datatypes of some of the columns into factors
# Making a list of the columns

fact_cols = c('month', 'operatingsystems', 'browser', 'region', 'traffictype', 'visitortype')
print(fact_cols)
## [1] "month"            "operatingsystems" "browser"          "region"          
## [5] "traffictype"      "visitortype"
#Changing columns to factors
clean_data[ ,fact_cols] %<>% lapply(function(x) as.factor(as.character(x)))

# Checking whether the data types have changed
str(clean_data)
## 'data.frame':    12199 obs. of  18 variables:
##  $ administrative         : int  0 0 0 0 0 0 0 1 0 0 ...
##  $ administrative_duration: num  0 0 -1 0 0 0 -1 -1 0 0 ...
##  $ informational          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ informational_duration : num  0 0 -1 0 0 0 -1 -1 0 0 ...
##  $ productrelated         : int  1 2 1 2 10 19 1 1 2 3 ...
##  $ productrelated_duration: num  0 64 -1 2.67 627.5 ...
##  $ bouncerates            : num  0.2 0 0.2 0.05 0.02 ...
##  $ exitrates              : num  0.2 0.1 0.2 0.14 0.05 ...
##  $ pagevalues             : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ specialday             : num  0 0 0 0 0 0 0.4 0 0.8 0.4 ...
##  $ month                  : Factor w/ 10 levels "Aug","Dec","Feb",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ operatingsystems       : Factor w/ 8 levels "1","2","3","4",..: 1 2 4 3 3 2 2 1 2 2 ...
##  $ browser                : Factor w/ 13 levels "1","10","11",..: 1 6 1 6 7 6 8 6 6 8 ...
##  $ region                 : Factor w/ 9 levels "1","2","3","4",..: 1 1 9 2 1 1 3 1 2 1 ...
##  $ traffictype            : Factor w/ 20 levels "1","10","11",..: 1 12 14 15 15 14 14 16 14 12 ...
##  $ visitortype            : Factor w/ 3 levels "New_Visitor",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ weekend                : logi  FALSE FALSE FALSE FALSE TRUE FALSE ...
##  $ revenue                : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
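Converting integer codes through `as.character()` produces factor levels in string order. This is why the `browser` and `traffictype` levels above run "1", "10", "11", and so on. It is also why a browser value of 2 is stored as level code 6. The sketch below uses hypothetical codes to show the effect. It also shows one way to keep numeric order, by passing `levels` explicitly, if that order is preferred.

```r
# Hypothetical integer codes, similar to the browser column
browser_codes <- c(1, 2, 10, 13, 2)

# Going through character sorts the levels as strings: "10" and "13" precede "2"
lexical <- as.factor(as.character(browser_codes))
print(levels(lexical))
print(as.integer(lexical))  # internal level codes, not the original values

# Passing the levels explicitly keeps the numeric order
numeric_order <- factor(browser_codes, levels = sort(unique(browser_codes)))
print(levels(numeric_order))
```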

8. Exploratory Data Analysis.

8.1 Univariate Analysis.

8.1.1 Numerical Variables

# describe() from the psych package gives a fuller statistical summary, including the mean, median, standard deviation, skew, kurtosis, min and max; factor columns (marked with *) are summarised by their integer level codes.

describe(clean_data)
## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
##                         vars     n    mean      sd median trimmed    mad min
## administrative             1 12199    2.34    3.33   1.00    1.66   1.48   0
## administrative_duration    2 12199   81.68  177.53   9.00   42.87  13.34  -1
## informational              3 12199    0.51    1.28   0.00    0.18   0.00   0
## informational_duration     4 12199   34.84  141.46   0.00    3.73   0.00  -1
## productrelated             5 12199   32.06   44.60  18.00   23.06  19.27   0
## productrelated_duration    6 12199 1207.51 1919.93 609.54  832.36 745.12  -1
## bouncerates                7 12199    0.02    0.05   0.00    0.01   0.00   0
## exitrates                  8 12199    0.04    0.05   0.03    0.03   0.02   0
## pagevalues                 9 12199    5.95   18.66   0.00    1.33   0.00   0
## specialday                10 12199    0.06    0.20   0.00    0.00   0.00   0
## month*                    11 12199    6.17    2.37   7.00    6.36   1.48   1
## operatingsystems*         12 12199    2.12    0.91   2.00    2.06   0.00   1
## browser*                  13 12199    5.33    2.46   6.00    5.38   0.00   1
## region*                   14 12199    3.15    2.40   3.00    2.79   2.97   1
## traffictype*              15 12199    9.98    5.69  12.00   10.18   2.97   1
## visitortype*              16 12199    2.72    0.69   3.00    2.89   0.00   1
## weekend                   17 12199     NaN      NA     NA     NaN     NA Inf
## revenue                   18 12199     NaN      NA     NA     NaN     NA Inf
##                              max    range  skew kurtosis    se
## administrative             27.00    27.00  1.95     4.63  0.03
## administrative_duration  3398.75  3399.75  5.59    50.09  1.61
## informational              24.00    24.00  4.01    26.64  0.01
## informational_duration   2549.38  2550.38  7.54    75.45  1.28
## productrelated            705.00   705.00  4.33    31.04  0.40
## productrelated_duration 63973.52 63974.52  7.25   136.57 17.38
## bouncerates                 0.20     0.20  3.15     9.25  0.00
## exitrates                   0.20     0.20  2.23     4.62  0.00
## pagevalues                361.76   361.76  6.35    64.93  0.17
## specialday                  1.00     1.00  3.28     9.78  0.00
## month*                     10.00     9.00 -0.83    -0.37  0.02
## operatingsystems*           8.00     7.00  2.03    10.27  0.01
## browser*                   13.00    12.00 -0.53     0.11  0.02
## region*                     9.00     8.00  0.98    -0.16  0.02
## traffictype*               20.00    19.00 -0.58    -1.13  0.05
## visitortype*                3.00     2.00 -2.05     2.23  0.01
## weekend                     -Inf     -Inf    NA       NA    NA
## revenue                     -Inf     -Inf    NA       NA    NA
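The `skew` and `kurtosis` columns summarise the shape of each distribution. The large positive skews of the duration and page-value columns indicate long right tails. A minimal sketch of the moment-based definitions follows, applied to made-up vectors. `psych::describe()` chooses between several variants through its `type` argument, so its figures may differ slightly from these plain moment estimates.

```r
# Moment-based skewness and excess kurtosis of a numeric vector
moment_shape <- function(x) {
  x  <- x[!is.na(x)]
  d  <- x - mean(x)
  m2 <- mean(d^2)
  m3 <- mean(d^3)
  m4 <- mean(d^4)
  c(skew = m3 / m2^1.5, kurtosis = m4 / m2^2 - 3)
}

# A symmetric vector has zero skew
symmetric <- moment_shape(c(1, 2, 3, 4, 5))
print(symmetric)

# A long right tail, like the duration columns, gives a large positive skew
durations <- c(0, 5, 8, 10, 12, 15, 20, 900)
skewed <- moment_shape(durations)
print(skewed)
```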

8.1.2 Categorical Variables

# Getting the modes
# Creating a function to get the modes
getmode <- function(v) {
   uniqv <- unique(v)
   uniqv[which.max(tabulate(match(v, uniqv)))]
}
month.mode <- getmode(clean_data$month)
month.mode
## [1] May
## Levels: Aug Dec Feb Jul June Mar May Nov Oct Sep
operatingsystems.mode <- getmode(clean_data$operatingsystems)
operatingsystems.mode
## [1] 2
## Levels: 1 2 3 4 5 6 7 8
browser.mode <- getmode(clean_data$browser)
browser.mode
## [1] 2
## Levels: 1 10 11 12 13 2 3 4 5 6 7 8 9
region.mode <- getmode(clean_data$region)
region.mode
## [1] 1
## Levels: 1 2 3 4 5 6 7 8 9
traffictype.mode <- getmode(clean_data$traffictype)
traffictype.mode
## [1] 2
## Levels: 1 10 11 12 13 14 15 16 17 18 19 2 20 3 4 5 6 7 8 9
visitortype.mode <- getmode(clean_data$visitortype)
visitortype.mode
## [1] Returning_Visitor
## Levels: New_Visitor Other Returning_Visitor
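The `getmode()` helper works in three steps:

1. It matches every value to its position in `unique(v)`.
2. It counts those positions with `tabulate()`.
3. It returns the value with the largest count.

The sketch repeats the helper with comments and runs it on hypothetical vectors. On a tie, `which.max()` returns the first maximum, so the value that appears first in the data wins.

```r
getmode <- function(v) {
  uniqv <- unique(v)                   # distinct values, in order of first appearance
  counts <- tabulate(match(v, uniqv))  # occurrences of each distinct value
  uniqv[which.max(counts)]             # first value with the highest count
}

# Hypothetical visitor types: the most frequent value is returned
visits <- c("Returning_Visitor", "New_Visitor", "Returning_Visitor", "Other")
print(getmode(visits))

# Tie between "Nov" and "May": which.max() keeps the first maximum, so "Nov" wins
print(getmode(c("Nov", "May", "May", "Nov")))
```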

8.1.3 Graphical Analysis

a) Boxplots

options(repr.plot.width = 7, repr.plot.height = 5)

clean_data %>%
  ggplot(aes(visitortype, productrelated, col = revenue)) + 
  geom_boxplot() + 
  labs(x = 'Visitor Type', y = 'Product Related', title = 'Box plot of product related feature per visitor type') +
  scale_color_brewer(palette = 'Set1') +
  theme(legend.position = 'top')

b) Bar charts

# Plotting bar charts of the categorical variables
fac_cols = c('month', 'operatingsystems', 'browser', 'region')

columns = colnames(select(clean_data, all_of(fac_cols)))
p = list()
options(repr.plot.width = 10, repr.plot.height = 6)
for (i in 1:4){
  p[[i]] = clean_data %>%
    ggplot(aes_string(columns[i])) + 
    geom_bar(color = 'blue') +
    labs(y = 'Frequency', x = '', title = toupper(columns[i])) +
    theme(plot.title = element_text(size = 10), axis.title.y = element_text(size = 10))
}

do.call(grid.arrange,p)

8.2 Bivariate and Multivariate Analysis

8.2.1 Correlations

# Plotting a correlogram to check for correlations
options(repr.plot.width = 6, repr.plot.height = 5)

corr = round(cor(select_if(clean_data, is.numeric)), 2)
ggcorrplot(corr, hc.order = T, ggtheme = ggplot2::theme_gray,
   colors = c("red", "white", "blue"), lab = F)

8.2.2 Scatterplot

# Plotting scatter plots to check for correlations
options(repr.plot.width = 11, repr.plot.height = 5)

sc1 = ggplot(clean_data, aes(productrelated, productrelated_duration, col = revenue)) + 
    geom_point() + theme(legend.position = 'none') + 
    labs(x='Product related', y ='Product related duration')

sc2 = ggplot(clean_data, aes(administrative, administrative_duration, col = revenue)) +
    geom_point() + theme(legend.position = 'none') +
    labs(x = 'Administrative', y = 'Administrative duration')

sc3 = ggplot(clean_data, aes(informational, informational_duration, col = revenue)) + 
    geom_point() + theme(legend.position = 'none') + 
    labs(x = 'Informational', y = 'Informational duration')

sc4 = ggplot(clean_data, aes(pagevalues, specialday, col = revenue)) + 
    geom_point() + theme(legend.position = 'none') +
    labs(x = 'Page values', y = 'Special day')

sc5 = ggplot(clean_data, aes(exitrates, bouncerates)) + 
    geom_point(aes(col = weekend)) + theme(legend.position = 'none') +
    labs(x = 'Exit Rates', y = 'Bounce Rates')

grid.arrange(sc1, sc2, sc3, sc4, sc5, ncol = 3, nrow = 2, 
             top = textGrob("Scatter plots",gp=gpar(fontsize=14,font=3)))

9. Implementing the solution - Unsupervised Learning

Encoding categorical columns

# Creating a copy of the cleaned dataframe
original_cleandata = clean_data  # R copies on modify, so later changes to clean_data leave this copy untouched

# One-hot encoding the categorical and logical columns: model.matrix(~0 + x) creates one 0/1 column per level
month = data.frame(model.matrix(~0+clean_data$month))
os = data.frame(model.matrix(~0+clean_data$operatingsystems))
brws = data.frame(model.matrix(~0+clean_data$browser))
rgn = data.frame(model.matrix(~0+clean_data$region))
traf = data.frame(model.matrix(~0+clean_data$traffictype))
vt = data.frame(model.matrix(~0+clean_data$visitortype))
wknd = data.frame(model.matrix(~0+clean_data$weekend))
rev = data.frame(model.matrix(~0+clean_data$revenue))
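Each `model.matrix(~0 + ...)` call above builds one-hot (dummy) columns rather than label codes. Removing the intercept with `0 +` gives every level its own 0/1 indicator column. A minimal sketch on a hypothetical `month` factor shows the resulting matrix next to the single column that label encoding would produce.

```r
# Hypothetical month values for four sessions
month <- factor(c("Feb", "May", "Nov", "May"))

# "~ 0 +" removes the intercept, so every level gets its own 0/1 column
dummies <- model.matrix(~ 0 + month)
print(dummies)

# For comparison, label encoding would keep a single column of level codes
print(as.integer(month))
```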

# Dropping the original columns, which have already been encoded
drop_cols = c('month', 'operatingsystems', 'browser', 'region', 'traffictype', 'visitortype', 'weekend', 'revenue')
clean_data = select(data.frame(cbind(clean_data, month, os, brws, rgn, traf, vt, wknd, rev)), -all_of(drop_cols))

9.1 K-means

# Normalising the data with min-max scaling so that every column lies in [0, 1]
clean_data = as.data.frame(apply(clean_data, 2, function(x) (x - min(x)) / (max(x) - min(x))))
# Using the elbow method to find the optimal number of clusters

fviz_nbclust(x = clean_data,FUNcluster = kmeans, method = 'wss' )
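Min-max scaling maps each column onto [0, 1] with (x - min) / (max - min). The parentheses around the range matter because R evaluates `/` before `-`. The sketch below compares the two forms on made-up values.

Without the parentheses, columns whose minimum is -1 (the three duration columns) shift to values just above 1. This matches the duration centres of roughly 1.01 to 1.03 in the cluster output below. Columns with a minimum of 0, including the 0/1 indicators, come out the same either way. The unbracketed form only adds a constant and changes the divisor slightly, so distances and hence the clustering are barely affected, but the scaled values fall outside [0, 1].

```r
# Hypothetical duration-like column whose minimum is -1
x <- c(-1, 0, 9)

# Min-max scaling: subtract the minimum and divide by the range (max - min).
# A constant column would give a zero range and therefore NaN values.
min_max <- function(v) (v - min(v)) / (max(v) - min(v))
print(min_max(x))        # 0.0 0.1 1.0

# Without parentheses, "/" binds tighter than "-", giving ((v - min) / max) - min
unbracketed <- function(v) (v - min(v)) / max(v) - min(v)
print(unbracketed(x))    # about 1.00 1.11 2.11, outside [0, 1]

# Applying the scaler to every column of a data frame
df <- data.frame(a = x, b = c(0, 1, 1))
scaled <- as.data.frame(lapply(df, min_max))
print(scaled)
```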

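# Performing clustering with the number of clusters chosen from the elbow plot (k = 4)
# Fixing the random seed so that the cluster assignments are reproducible
set.seed(123)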
kmeans_res = kmeans(clean_data, 4)
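The elbow plot drawn by `fviz_nbclust(..., method = 'wss')` shows the total within-cluster sum of squares across a range of k values. A rough base-R equivalent computes the same quantity from `kmeans()$tot.withinss`. It is sketched here on simulated data with three well-separated groups, not on the customer data. On this data the curve should drop steeply up to k = 3 and flatten afterwards.

```r
set.seed(1)

# Simulated 2-D data with three well-separated groups of 50 points each
pts <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 6, sd = 0.3), ncol = 2)
)

# Total within-cluster sum of squares for k = 1, ..., 6;
# nstart = 10 keeps the best of ten random initialisations for each k
wss <- sapply(1:6, function(k) kmeans(pts, centers = k, nstart = 10)$tot.withinss)
print(round(wss, 1))
```

Calling `plot(1:6, wss, type = "b")` draws the elbow curve. Using several starts per k makes the curve less sensitive to an unlucky initialisation.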

# Checking the cluster centers of each variable
kmeans_res$centers
##   administrative administrative_duration informational informational_duration
## 1     0.07270233                1.020702    0.01725440               1.011265
## 2     0.07995737                1.024129    0.01959335               1.011295
## 3     0.08290721                1.022533    0.01901496               1.012762
## 4     0.11212134                1.030820    0.02994840               1.021359
##   productrelated productrelated_duration bouncerates exitrates  pagevalues
## 1     0.04010250                1.016415  0.12579521 0.2401570 0.011538823
## 2     0.03412530                1.013790  0.12227867 0.2269629 0.014650105
## 3     0.04540390                1.019068  0.10500293 0.2177918 0.005709942
## 4     0.06122086                1.025689  0.05694488 0.1419438 0.040207990
##    specialday clean_data.monthAug clean_data.monthDec clean_data.monthFeb
## 1 0.230102443          0.00000000           0.0000000         0.000000000
## 2 0.052739456          0.03744580           0.1474182         0.026409145
## 3 0.004715484          0.04806166           0.2106098         0.021990478
## 4 0.006413564          0.04644305           0.1485440         0.006634722
##   clean_data.monthJul clean_data.monthJune clean_data.monthMar
## 1          0.00000000           0.00000000           0.0000000
## 2          0.03547497           0.02286165           0.1454474
## 3          0.04896849           0.04012696           0.2326003
## 4          0.04644305           0.01842978           0.1688168
##   clean_data.monthMay clean_data.monthNov clean_data.monthOct
## 1          1.00000000           0.0000000          0.00000000
## 2          0.24477730           0.2510840          0.05873078
## 3          0.00000000           0.2885967          0.05486284
## 4          0.06229266           0.3955031          0.05823811
##   clean_data.monthSep clean_data.operatingsystems1 clean_data.operatingsystems2
## 1          0.00000000                   0.02127660                    0.6970055
## 2          0.03035081                   0.88884509                    0.0000000
## 3          0.05418273                   0.02947178                    0.6880526
## 4          0.04865463                   0.04017693                    0.6384077
##   clean_data.operatingsystems3 clean_data.operatingsystems4
## 1                    0.2592593                   0.01930654
## 2                    0.0000000                   0.10642491
## 3                    0.2482430                   0.02131036
## 4                    0.2863988                   0.02395872
##   clean_data.operatingsystems5 clean_data.operatingsystems6
## 1                 0.0000000000                  0.002758077
## 2                 0.0000000000                  0.000000000
## 3                 0.0009068238                  0.001813648
## 4                 0.0007371913                  0.001474383
##   clean_data.operatingsystems7 clean_data.operatingsystems8 clean_data.browser1
## 1                 0.0000000000                  0.000394011        0.0011820331
## 2                 0.0023649980                  0.002364998        0.9456050453
## 3                 0.0000000000                  0.010201768        0.0004534119
## 4                 0.0003685957                  0.008477700        0.0081091043
##   clean_data.browser10 clean_data.browser11 clean_data.browser12
## 1          0.019306541         0.0000000000          0.000394011
## 2          0.001182499         0.0000000000          0.000000000
## 3          0.015416005         0.0009068238          0.001133530
## 4          0.015849613         0.0007371913          0.001474383
##   clean_data.browser13 clean_data.browser2 clean_data.browser3
## 1         0.0000000000           0.8006304          0.01339638
## 2         0.0003941663           0.0000000          0.00000000
## 3         0.0086148266           0.8213557          0.00952165
## 4         0.0062661261           0.8193881          0.01068927
##   clean_data.browser4 clean_data.browser5 clean_data.browser6
## 1        0.0839243499         0.051615445         0.023640662
## 2        0.0007883327         0.001970832         0.001182499
## 3        0.0766266153         0.043754251         0.017456359
## 4        0.0652414302         0.050497604         0.012532252
##   clean_data.browser7 clean_data.browser8 clean_data.browser9
## 1         0.005516154         0.000394011         0.000000000
## 2         0.000000000         0.048876626         0.000000000
## 3         0.004534119         0.000000000         0.000226706
## 4         0.005528935         0.003685957         0.000000000
##   clean_data.region1 clean_data.region2 clean_data.region3 clean_data.region4
## 1          0.3451537         0.10874704          0.2001576         0.10598897
## 2          0.4174222         0.08356326          0.2104848         0.10642491
## 3          0.3840399         0.08864203          0.1897529         0.08682838
## 4          0.3988205         0.09141172          0.1854036         0.09067453
##   clean_data.region5 clean_data.region6 clean_data.region7 clean_data.region8
## 1         0.03230890         0.07919622         0.06422380         0.03703704
## 2         0.01497832         0.06109578         0.03862830         0.03941663
## 3         0.02697801         0.06461120         0.07073226         0.03377919
## 4         0.02875046         0.05860671         0.06819020         0.03243642
##   clean_data.region9 clean_data.traffictype1 clean_data.traffictype10
## 1         0.02718676               0.1256895               0.00000000
## 2         0.02798581               0.1545132               0.04454080
## 3         0.05463614               0.2666062               0.04828837
## 4         0.04570586               0.1828234               0.04570586
##   clean_data.traffictype11 clean_data.traffictype12 clean_data.traffictype13
## 1               0.02482270              0.000000000              0.118991332
## 2               0.01773749              0.000000000              0.001182499
## 3               0.01314895              0.000226706              0.070958966
## 4               0.02985625              0.000000000              0.040545522
##   clean_data.traffictype14 clean_data.traffictype15 clean_data.traffictype16
## 1             0.0031520883             0.0035460993             0.0007880221
## 2             0.0000000000             0.0094599921             0.0000000000
## 3             0.0004534119             0.0002267060             0.0000000000
## 4             0.0011057870             0.0007371913             0.0003685957
##   clean_data.traffictype17 clean_data.traffictype18 clean_data.traffictype19
## 1             0.0000000000               0.00394011             0.0047281324
## 2             0.0003941663               0.00000000             0.0011824990
## 3             0.0000000000               0.00000000             0.0000000000
## 4             0.0000000000               0.00000000             0.0007371913
##   clean_data.traffictype2 clean_data.traffictype20 clean_data.traffictype3
## 1               0.1863672              0.002758077              0.19897557
## 2               0.3370122              0.010248325              0.24674813
## 3               0.3205622              0.023577420              0.14169123
## 4               0.4294139              0.020641356              0.09620346
##   clean_data.traffictype4 clean_data.traffictype5 clean_data.traffictype6
## 1              0.21197794              0.02482270              0.08707644
## 2              0.09026409              0.02404415              0.01694915
## 3              0.03922013              0.01790977              0.02448424
## 4              0.04644305              0.02100995              0.02617029
##   clean_data.traffictype7 clean_data.traffictype8 clean_data.traffictype9
## 1             0.002364066              0.00000000             0.000000000
## 2             0.001970832              0.03586914             0.007883327
## 3             0.002947178              0.02947178             0.000226706
## 4             0.005897530              0.04496867             0.007371913
##   clean_data.visitortypeNew_Visitor clean_data.visitortypeOther
## 1                        0.07052797                 0.000000000
## 2                        0.15727237                 0.002364998
## 3                        0.12446157                 0.012015416
## 4                        0.20862514                 0.008109104
##   clean_data.visitortypeReturning_Visitor clean_data.weekendFALSE
## 1                               0.9294720               0.8494878
## 2                               0.8403626               0.7185652
## 3                               0.8635230               1.0000000
## 4                               0.7832658               0.3512717
##   clean_data.weekendTRUE clean_data.revenueFALSE clean_data.revenueTRUE
## 1              0.1505122               0.9208038             0.07919622
## 2              0.2814348               0.8588885             0.14111155
## 3              0.0000000               1.0000000             0.00000000
## 4              0.6487283               0.5027645             0.49723553
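Every column in the table above is a one-hot dummy: in each row, `weekendFALSE` and `weekendTRUE` sum to 1, as do `revenueFALSE` and `revenueTRUE`. Each cluster mean is therefore the share of that cluster's sessions that take that value. For example, about 50% of cluster 4's sessions ended in revenue (`revenueTRUE` ≈ 0.497), while cluster 3 has no revenue sessions and no weekend sessions (`weekendFALSE` = 1). The sketch below illustrates this with a small hypothetical dataset; the toy columns and values are invented for the example and are not taken from `clean_data`.

```r
# Hypothetical mini-dataset: 8 sessions with 0/1 dummies and an assigned cluster
toy <- data.frame(
  weekendTRUE = c(1, 0, 0, 1, 1, 0, 1, 1),
  revenueTRUE = c(0, 0, 1, 1, 1, 0, 1, 0),
  cluster     = c(1, 1, 1, 1, 2, 2, 2, 2)
)

# The mean of a 0/1 column within a cluster is the proportion of 1s in that cluster
centers <- aggregate(cbind(weekendTRUE, revenueTRUE) ~ cluster, data = toy, FUN = mean)
print(centers)
```

Reading the centers as proportions makes cross-cluster comparisons direct. A cluster whose `revenueTRUE` share is well above the overall purchase rate is a candidate high-value segment.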
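# Visualise the K-Means clusters on the whole dataset.
# With more than two variables, fviz_cluster() plots the observations on the first two principal components.
# Note: the repr.plot.* options only take effect in Jupyter (IRkernel); they are ignored when knitting R Markdown.
options(repr.plot.width = 11, repr.plot.height = 6)
fviz_cluster(kmeans_res, clean_data)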
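With many dummy columns, the cluster plot is a 2-D projection and not the full clustering space. The following sketch is a rough base-R equivalent of that projection. It assumes `fviz_cluster()`'s default standardisation (`stand = TRUE`) followed by PCA, and it uses a synthetic `toy` matrix in place of `clean_data`. It is not factoextra's actual implementation.

```r
set.seed(42)
# Synthetic stand-in for clean_data: three 4-D blobs of 50 points each
toy <- rbind(
  matrix(rnorm(200, mean = 0), ncol = 4),
  matrix(rnorm(200, mean = 4), ncol = 4),
  matrix(rnorm(200, mean = 8), ncol = 4)
)
km <- kmeans(toy, centers = 3, nstart = 25)

# Standardise the columns, then rotate onto principal components
pca <- prcomp(toy, center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:2]                      # coordinates on Dim1 and Dim2
var_expl <- pca$sdev^2 / sum(pca$sdev^2)    # share of variance per component

plot(scores, col = km$cluster, pch = 19,
     xlab = sprintf("Dim1 (%.1f%%)", 100 * var_expl[1]),
     ylab = sprintf("Dim2 (%.1f%%)", 100 * var_expl[2]),
     main = "K-Means clusters on the first two principal components")
```

If the two components explain only a small share of the variance, clusters that overlap in the plot may still be well separated in the full space.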

# Determine the optimal k using the average silhouette width
fviz_nbclust(x = clean_data, FUNcluster = kmeans, method = "silhouette")
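The silhouette method runs K-Means for each candidate k and computes the mean silhouette width, a value between -1 and 1 where higher means points are closer to their own cluster than to the nearest other cluster. It then picks the k with the highest mean. The sketch below does the same by hand with `cluster::silhouette()` on synthetic data (`toy` is a stand-in for `clean_data`). It is assumed to be roughly what `fviz_nbclust(method = "silhouette")` computes, not a copy of it.

```r
library(cluster)

set.seed(42)
# Synthetic stand-in for clean_data: three well-separated 2-D blobs
toy <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 5), ncol = 2),
  matrix(rnorm(100, mean = 10), ncol = 2)
)

d <- dist(toy)  # Euclidean distances, computed once and reused for every k

avg_sil <- sapply(2:6, function(k) {
  km <- kmeans(toy, centers = k, nstart = 25)
  sil <- silhouette(km$cluster, d)   # one row per point: cluster, neighbor, sil_width
  mean(sil[, "sil_width"])
})
names(avg_sil) <- 2:6

best_k <- as.integer(names(which.max(avg_sil)))
print(round(avg_sil, 3))
cat("k with highest average silhouette width:", best_k, "\n")
```

On the full dataset, `dist()` builds an n × n distance matrix, so this approach gets memory-hungry as the number of sessions grows. This is one reason silhouette-based selection is often run on a sample.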

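# Determine the optimal k using the gap statistic
set.seed(42)
# nstart is forwarded to kmeans(). kmeans() keeps its default iter.max = 10, which
# produces the "did not converge" warnings below; the "Quick-TRANSfer" warnings come
# from the default Hartigan-Wong algorithm. B = 5 reference datasets is far below
# clusGap()'s default of B = 100, so the gap estimates are noisy.
clust_gap <- clusGap(x = clean_data, FUNcluster = kmeans, K.max = 15, nstart = 25,
                     B = 5)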
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)

## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)

## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)

## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)

## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)

## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)

## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)

## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations

## Warning: did not converge in 10 iterations
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 609950)
fviz_gap_stat(clust_gap)
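The warnings above come from kmeans() being called many times inside the gap-statistic computation with its defaults: at most 10 iterations (iter.max = 10) and the Hartigan-Wong algorithm, whose Quick-TRANSfer stage has a step cap. Below is a minimal sketch of one way to address this. It assumes clust_gap was built with cluster::clusGap(), which forwards extra arguments to its FUNcluster. That means more random starts, more iterations and a different update rule can be passed through. MacQueen is used here because it has no Quick-TRANSfer stage. The simulated matrix x stands in for clean_data. The values K.max = 8, B = 50, nstart = 25 and iter.max = 50 are illustrative assumptions, not the settings used above.

```r
library(cluster)

set.seed(42)
# Simulated stand-in for clean_data: three well-separated Gaussian blobs
x <- rbind(cbind(rnorm(50, 0),  rnorm(50, 0)),
           cbind(rnorm(50, 10), rnorm(50, 0)),
           cbind(rnorm(50, 0),  rnorm(50, 10)))

# nstart, iter.max and algorithm are not clusGap() arguments, so they are
# passed on to kmeans(): more random starts, a higher iteration cap, and the
# MacQueen update rule (no Hartigan-Wong Quick-TRANSfer stage)
gap <- clusGap(x, FUNcluster = kmeans, K.max = 8, B = 50, verbose = FALSE,
               nstart = 25, iter.max = 50, algorithm = "MacQueen")

# First k whose gap is within one standard error of the first local maximum
k_best <- maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"], method = "firstSEmax")
print(k_best)
```

On the three simulated blobs, maxSE() with method = "firstSEmax" should point to k = 3. With the real data, the same extra arguments would go into the clusGap() call that produced clust_gap.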

9.2 Hierarchical Clustering

# Euclidean distance matrix between all records
d <- dist(clean_data, method="euclidean")
# Clustering algorithm deployment (Ward's minimum-variance linkage)
model <- hclust(d, method="ward.D2")
# viewing the dendrogram
plot(model, cex=0.6, hang=-1)
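Hierarchical clustering has to hold every pairwise distance in memory. A dist object stores n(n-1)/2 double-precision values at 8 bytes each. The short calculation below estimates the size of d alone. It assumes n = 12,199, the total of the four cutree group sizes reported below. The helper n_pairs is hypothetical, and a tiny matrix is used to confirm the n(n-1)/2 count.

```r
# Number of pairwise distances a dist object holds for n rows (hypothetical helper)
n_pairs <- function(n) n * (n - 1) / 2

# Sanity check on a tiny matrix: 10 rows should give 45 distances
small <- dist(matrix(rnorm(30), nrow = 10))
print(length(small))

# Assumed n: the four cutree groups sum to 12,199 records
n  <- 12199
mb <- n_pairs(n) * 8 / 1e6   # 8 bytes per double, expressed in megabytes
print(c(distances = n_pairs(n), megabytes = mb))
```

For 12,199 records this gives 74,401,701 distances, roughly 595 MB for d alone. K-means never builds this matrix, which is one reason it scales better to larger customer tables.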

# Ward's method: the same linkage as `model`, so reuse it instead of recomputing
hc <- model
# cut the tree into 4 groups
sub_grp <- cutree(hc, k=4)
table(sub_grp)
## sub_grp
##    1    2    3    4 
## 2899 4100 2844 2356
# redraw the dendrogram and outline the 4 groups
plot(hc, cex=2, hang=-1)
rect.hclust(hc, k=4, border=2:5)
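The business goal is to describe customer groups. To get there from the cut tree, the group labels from cutree() can be attached back to the features and summarised per group. Below is a minimal sketch on simulated data. The data frame customers and its columns page_views and minutes are hypothetical stand-ins for clean_data. The features are standardised with scale() before computing distances so that neither dominates the Euclidean metric; this step is an assumption of the sketch.

```r
set.seed(11)
# Hypothetical stand-in for clean_data: two behavioural features, two obvious groups
customers <- data.frame(
  page_views = c(rpois(30, 5), rpois(30, 40)),
  minutes    = c(rnorm(30, mean = 2, sd = 0.5), rnorm(30, mean = 20, sd = 3))
)

# Ward linkage on standardised features, then cut into 2 groups
hc_toy  <- hclust(dist(scale(customers)), method = "ward.D2")
grp_toy <- cutree(hc_toy, k = 2)

# Mean of each original (unscaled) feature per group, plus group sizes
profile <- aggregate(customers, by = list(cluster = grp_toy), FUN = mean)
profile$size <- as.vector(table(grp_toy))
print(profile)
```

Applied to the real data, the same aggregate() call with clean_data and sub_grp in place of customers and grp_toy would give one row of feature means per customer group.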

10. Conclusion.

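K-means produced more clearly defined clusters than hierarchical clustering on this dataset, so it should be the first approach considered when segmenting these customers. Its limitations are that the number of clusters has to be fixed in advance (here guided by the gap statistic) and that results depend on the random starting centroids. The convergence warnings above also show that the default limit of 10 iterations was not enough for this data. Hierarchical clustering does not need k up front, and its dendrogram shows how the groups merge. However, it requires the full pairwise distance matrix, whose size grows quadratically with the number of records (12,199 here), and it cannot reassign a record once that record has been merged into a group.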
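One way to make the K-means versus hierarchical comparison concrete is to run both on the same data and cross-tabulate the labels. Their average silhouette widths from cluster::silhouette() can then be compared; values closer to 1 indicate tighter, better-separated clusters. This is a sketch on three simulated blobs standing in for clean_data. The choices k = 3 and nstart = 25 are illustrative assumptions. Silhouette width is a swapped-in criterion, not the one used in this analysis.

```r
library(cluster)

set.seed(7)
# Simulated stand-in for clean_data: three Gaussian blobs in two dimensions
x <- rbind(cbind(rnorm(50, 0), rnorm(50, 0)),
           cbind(rnorm(50, 6), rnorm(50, 6)),
           cbind(rnorm(50, 0), rnorm(50, 6)))
d_toy <- dist(x)

# K-means with several random starts, and Ward hierarchical clustering cut at the same k
km     <- kmeans(x, centers = 3, nstart = 25)
hc_grp <- cutree(hclust(d_toy, method = "ward.D2"), k = 3)

# Rows: K-means labels; columns: hierarchical labels (label numbers are arbitrary)
agreement <- table(kmeans = km$cluster, hierarchical = hc_grp)
print(agreement)

# Average silhouette width for each labelling
sil_km <- mean(silhouette(km$cluster, d_toy)[, "sil_width"])
sil_hc <- mean(silhouette(hc_grp, d_toy)[, "sil_width"])
print(c(kmeans = sil_km, hierarchical = sil_hc))
```

If each row of the cross-tabulation has a single non-zero cell, the two methods found the same groups up to relabelling. On the real data, d can be reused for both silhouette calls, although it carries the memory cost discussed for hierarchical clustering.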