1. Research question

Kira Plastinina is a Russian brand that was sold through a now-defunct chain of retail stores in Russia, Ukraine, Kazakhstan, Belarus, China, the Philippines and Armenia. The brand's Sales and Marketing team would like to understand their customers' behaviour from data collected over the past year. More specifically, they would like to learn the characteristics of customer groups.

2. Success criteria

Identify customer behaviour and, more specifically, learn the characteristics of customer groups. This shall be achieved by:

  1. Performing clustering and stating the insights drawn from the analysis and visualizations.

  2. Comparing K-Means clustering with hierarchical clustering, highlighting the strengths and limitations of each approach in the context of this analysis.

3. Research Methodology

4. Understanding the data provided

5.

Loading libraries

# Loading the relevant libraries for this study
library(stringr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(reshape2)
library(ggplot2)
library(countrycode)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ tibble  3.1.7     ✔ purrr   0.3.4
## ✔ tidyr   1.2.0     ✔ forcats 0.5.1
## ✔ readr   2.1.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(Hmisc)
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## 
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:dplyr':
## 
##     src, summarize
## The following objects are masked from 'package:base':
## 
##     format.pval, units
library(moments)
library(paletteer)
library(Amelia) # missmap() for the missing-data plot; loads Rcpp as a dependency
## Loading required package: Rcpp
## ## 
## ## Amelia II: Multiple Imputation
## ## (Version 1.8.0, built: 2021-05-26)
## ## Copyright (C) 2005-2022 James Honaker, Gary King and Matthew Blackwell
## ## Refer to http://gking.harvard.edu/amelia/ for more information
## ##
library(GGally)
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
library(animation)
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
library(cluster) # clustering algorithms 
library(factoextra) # clustering algorithms & visualization
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
library(caret)
## 
## Attaching package: 'caret'
## The following object is masked from 'package:survival':
## 
##     cluster
## The following object is masked from 'package:purrr':
## 
##     lift
library(ISLR) # for college dataset
library(Rtsne) # for t-SNE plot

Loading dataset

# loading customer dataset
#
customer_dataset <- read.csv("http://bit.ly/EcommerceCustomersDataset")

Dataset verification

In the absence of comparable data, this dataset is assumed to be valid and relevant for this study.

Previewing dataset

# previewing first six records of customer dataset
#
head(customer_dataset)
##   Administrative Administrative_Duration Informational Informational_Duration
## 1              0                       0             0                      0
## 2              0                       0             0                      0
## 3              0                      -1             0                     -1
## 4              0                       0             0                      0
## 5              0                       0             0                      0
## 6              0                       0             0                      0
##   ProductRelated ProductRelated_Duration BounceRates ExitRates PageValues
## 1              1                0.000000  0.20000000 0.2000000          0
## 2              2               64.000000  0.00000000 0.1000000          0
## 3              1               -1.000000  0.20000000 0.2000000          0
## 4              2                2.666667  0.05000000 0.1400000          0
## 5             10              627.500000  0.02000000 0.0500000          0
## 6             19              154.216667  0.01578947 0.0245614          0
##   SpecialDay Month OperatingSystems Browser Region TrafficType
## 1          0   Feb                1       1      1           1
## 2          0   Feb                2       2      1           2
## 3          0   Feb                4       1      9           3
## 4          0   Feb                3       2      2           4
## 5          0   Feb                3       3      1           4
## 6          0   Feb                2       2      1           3
##         VisitorType Weekend Revenue
## 1 Returning_Visitor   FALSE   FALSE
## 2 Returning_Visitor   FALSE   FALSE
## 3 Returning_Visitor   FALSE   FALSE
## 4 Returning_Visitor   FALSE   FALSE
## 5 Returning_Visitor    TRUE   FALSE
## 6 Returning_Visitor   FALSE   FALSE
# previewing the characteristics of the data set
#
str(customer_dataset)
## 'data.frame':    12330 obs. of  18 variables:
##  $ Administrative         : int  0 0 0 0 0 0 0 1 0 0 ...
##  $ Administrative_Duration: num  0 0 -1 0 0 0 -1 -1 0 0 ...
##  $ Informational          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Informational_Duration : num  0 0 -1 0 0 0 -1 -1 0 0 ...
##  $ ProductRelated         : int  1 2 1 2 10 19 1 1 2 3 ...
##  $ ProductRelated_Duration: num  0 64 -1 2.67 627.5 ...
##  $ BounceRates            : num  0.2 0 0.2 0.05 0.02 ...
##  $ ExitRates              : num  0.2 0.1 0.2 0.14 0.05 ...
##  $ PageValues             : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ SpecialDay             : num  0 0 0 0 0 0 0.4 0 0.8 0.4 ...
##  $ Month                  : chr  "Feb" "Feb" "Feb" "Feb" ...
##  $ OperatingSystems       : int  1 2 4 3 3 2 2 1 2 2 ...
##  $ Browser                : int  1 2 1 2 3 2 4 2 2 4 ...
##  $ Region                 : int  1 1 9 2 1 1 3 1 2 1 ...
##  $ TrafficType            : int  1 2 3 4 4 3 3 5 3 2 ...
##  $ VisitorType            : chr  "Returning_Visitor" "Returning_Visitor" "Returning_Visitor" "Returning_Visitor" ...
##  $ Weekend                : logi  FALSE FALSE FALSE FALSE TRUE FALSE ...
##  $ Revenue                : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...

The data set has 10 numeric variables and 8 categorical variables. The categorical variables are: “Month”, “OperatingSystems”, “Browser”, “Region”, “TrafficType”, “VisitorType”, “Weekend” and “Revenue”. Note that OperatingSystems, Browser, Region and TrafficType are stored as integers but are category codes, not quantities.

These columns shall be converted to factors so that R treats them as categorical variables during analysis.

# Establish the data set class
#
class(customer_dataset)
## [1] "data.frame"

The data set is a data frame.

# view the number of rows and columns in the dataset
#
dim(customer_dataset)
## [1] 12330    18

The dataset has 12,330 records and 18 columns.

Cleaning dataset

Data validity

As stated above, the categorical variables (“Month”, “OperatingSystems”, “Browser”, “Region”, “TrafficType”, “VisitorType”, “Weekend” and “Revenue”) shall be converted to factors.

# Converting categorical variables to factors
# Specifying columns
#
cols <- c("Month", "OperatingSystems", 
"Browser", "Region", "TrafficType", "VisitorType", "Weekend", "Revenue")

# Conversion
#
customer_dataset[cols] <- lapply(customer_dataset[cols], factor)

# Previewing the characteristics of the converted variables
#
str(customer_dataset)
## 'data.frame':    12330 obs. of  18 variables:
##  $ Administrative         : int  0 0 0 0 0 0 0 1 0 0 ...
##  $ Administrative_Duration: num  0 0 -1 0 0 0 -1 -1 0 0 ...
##  $ Informational          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Informational_Duration : num  0 0 -1 0 0 0 -1 -1 0 0 ...
##  $ ProductRelated         : int  1 2 1 2 10 19 1 1 2 3 ...
##  $ ProductRelated_Duration: num  0 64 -1 2.67 627.5 ...
##  $ BounceRates            : num  0.2 0 0.2 0.05 0.02 ...
##  $ ExitRates              : num  0.2 0.1 0.2 0.14 0.05 ...
##  $ PageValues             : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ SpecialDay             : num  0 0 0 0 0 0 0.4 0 0.8 0.4 ...
##  $ Month                  : Factor w/ 10 levels "Aug","Dec","Feb",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ OperatingSystems       : Factor w/ 8 levels "1","2","3","4",..: 1 2 4 3 3 2 2 1 2 2 ...
##  $ Browser                : Factor w/ 13 levels "1","2","3","4",..: 1 2 1 2 3 2 4 2 2 4 ...
##  $ Region                 : Factor w/ 9 levels "1","2","3","4",..: 1 1 9 2 1 1 3 1 2 1 ...
##  $ TrafficType            : Factor w/ 20 levels "1","2","3","4",..: 1 2 3 4 4 3 3 5 3 2 ...
##  $ VisitorType            : Factor w/ 3 levels "New_Visitor",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ Weekend                : Factor w/ 2 levels "FALSE","TRUE": 1 1 1 1 2 1 1 2 1 1 ...
##  $ Revenue                : Factor w/ 2 levels "FALSE","TRUE": 1 1 1 1 1 1 1 1 1 1 ...

All columns now have appropriate data types. All variables shall be renamed to lowercase.

# Renaming the variables of the customer dataset
#
names(customer_dataset) <- c("administrative", "administrative_duration", 
                             "informational",   "informational_duration",   
                             "productrelated", "productrelated_duration",   
                             "bouncerates", "exitrates",    "pagevalues",   
                             "SpecialDay", "month", "operatingsystems", 
                             "browser", "region", "traffictype", "visitortype",
                             "weekend", "revenue")

# Preview column names of data set variables
#
colnames(customer_dataset)
##  [1] "administrative"          "administrative_duration"
##  [3] "informational"           "informational_duration" 
##  [5] "productrelated"          "productrelated_duration"
##  [7] "bouncerates"             "exitrates"              
##  [9] "pagevalues"              "SpecialDay"             
## [11] "month"                   "operatingsystems"       
## [13] "browser"                 "region"                 
## [15] "traffictype"             "visitortype"            
## [17] "weekend"                 "revenue"

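All variables have been renamed to lowercase, except “SpecialDay”, which keeps its original casing and is referred to by that name in the rest of the analysis.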
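Assigning a full vector of names by position silently mislabels columns if the column order ever changes. A minimal base-R alternative, sketched below on a hypothetical toy data frame `toy`, derives the new names from the existing ones with `tolower()`. Note that it would also lowercase `SpecialDay` to `specialday`, so later references would have to change accordingly.

```r
# Hypothetical toy data frame with CamelCase column names like the CSV
toy <- data.frame(Administrative = c(0, 1),
                  SpecialDay = c(0, 0.4),
                  VisitorType = c("Returning_Visitor", "New_Visitor"))

# Lowercase every existing name instead of retyping the whole vector,
# so the mapping cannot drift out of column order
names(toy) <- tolower(names(toy))

print(names(toy))  # "administrative" "specialday" "visitortype"
```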

Missing values

# Checking the number of missing values per column in the data set
#
colSums(is.na(customer_dataset))
##          administrative administrative_duration           informational 
##                      14                      14                      14 
##  informational_duration          productrelated productrelated_duration 
##                      14                      14                      14 
##             bouncerates               exitrates              pagevalues 
##                      14                      14                       0 
##              SpecialDay                   month        operatingsystems 
##                       0                       0                       0 
##                 browser                  region             traffictype 
##                       0                       0                       0 
##             visitortype                 weekend                 revenue 
##                       0                       0                       0

The administrative, administrative_duration, informational, informational_duration, productrelated, productrelated_duration, bouncerates and exitrates variables each have 14 missing values. These missing values are explored first.

# checking missing data visualization
# 
missmap(customer_dataset)

The missing-data plot shows that the missing values are negligible relative to the 12,330 records in the data set (at most 14 rows, about 0.1% of the records), so the incomplete rows shall be dropped.
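The share of rows affected can also be computed directly rather than read off the plot. The sketch below uses `complete.cases()` on a hypothetical data frame `df`; applied to `customer_dataset`, the same expression would give the fraction of records that `na.omit()` removes.

```r
# Hypothetical data frame: rows 2 and 4 contain at least one NA
df <- data.frame(a = c(1, NA, 3, 4, 5),
                 b = c(10, NA, 30, NA, 50))

# complete.cases() is TRUE for rows with no NA in any column,
# so the mean of its negation is the share of incomplete rows
incomplete_share <- mean(!complete.cases(df))
print(incomplete_share)  # 0.4
```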

# Dropping missing values in customer dataset.
#
customer_dataset <- na.omit(customer_dataset)

# Checking for any remaining null values
#
colSums(is.na(customer_dataset))
##          administrative administrative_duration           informational 
##                       0                       0                       0 
##  informational_duration          productrelated productrelated_duration 
##                       0                       0                       0 
##             bouncerates               exitrates              pagevalues 
##                       0                       0                       0 
##              SpecialDay                   month        operatingsystems 
##                       0                       0                       0 
##                 browser                  region             traffictype 
##                       0                       0                       0 
##             visitortype                 weekend                 revenue 
##                       0                       0                       0

All missing values have been dropped

Duplicate values

# Checking for duplicate values in the customer data set
#
#
duplicated_rows = customer_dataset[duplicated(customer_dataset),]

# Printing out the duplicated rows
head(duplicated_rows)
##     administrative administrative_duration informational informational_duration
## 159              0                       0             0                      0
## 179              0                       0             0                      0
## 419              0                       0             0                      0
## 457              0                       0             0                      0
## 484              0                       0             0                      0
## 513              0                       0             0                      0
##     productrelated productrelated_duration bouncerates exitrates pagevalues
## 159              1                       0         0.2       0.2          0
## 179              1                       0         0.2       0.2          0
## 419              1                       0         0.2       0.2          0
## 457              1                       0         0.2       0.2          0
## 484              1                       0         0.2       0.2          0
## 513              1                       0         0.2       0.2          0
##     SpecialDay month operatingsystems browser region traffictype
## 159          0   Feb                1       1      1           3
## 179          0   Feb                3       2      3           3
## 419          0   Mar                1       1      1           1
## 457          0   Mar                2       2      4           1
## 484          0   Mar                3       2      3           1
## 513          0   Mar                2       2      1           1
##           visitortype weekend revenue
## 159 Returning_Visitor   FALSE   FALSE
## 179 Returning_Visitor   FALSE   FALSE
## 419 Returning_Visitor    TRUE   FALSE
## 457 Returning_Visitor   FALSE   FALSE
## 484 Returning_Visitor   FALSE   FALSE
## 513 Returning_Visitor   FALSE   FALSE

The data set has 117 duplicate records, which shall be dropped.
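The `head()` preview above shows only the first six duplicates; the total can be obtained directly with `sum(duplicated(...))`. A sketch on a hypothetical data frame `df`: `duplicated()` flags every repeat after the first occurrence of a row, so the count equals the number of rows that deduplication removes.

```r
# Hypothetical data: the row (1, "a") appears three times
df <- data.frame(x = c(1, 1, 2, 1, 3),
                 y = c("a", "a", "b", "a", "c"))

# duplicated() marks repeats after the first occurrence of each row
n_dupes <- sum(duplicated(df))
print(n_dupes)  # 2

# Dropping them keeps exactly one copy of each distinct row
deduped <- df[!duplicated(df), ]
print(nrow(deduped))  # 3
```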

# Dropping duplicate records
#
customer_dataset1 <- customer_dataset[!duplicated(customer_dataset), ]

# Checking for any remaining duplicate records
#
customer_dataset1[duplicated(customer_dataset1),]
##  [1] administrative          administrative_duration informational          
##  [4] informational_duration  productrelated          productrelated_duration
##  [7] bouncerates             exitrates               pagevalues             
## [10] SpecialDay              month                   operatingsystems       
## [13] browser                 region                  traffictype            
## [16] visitortype             weekend                 revenue                
## <0 rows> (or 0-length row.names)

All duplicate records have been successfully dropped

Checking for outliers in the numeric variables

# number of rows in data frame
#
num_rows = nrow(customer_dataset1)
  
# creating ID column vector
#
ID <- c(1:num_rows)
 
# binding id column to the data frame
#
customer_dataset2 <- cbind(ID , customer_dataset1)
# Getting the names of the numeric columns in the dataset as a character
# vector (stored as num_colnames to avoid masking base R's colnames())
#
num_colnames <- names(select_if(customer_dataset2, is.numeric))

# Print vector of column names
#
num_colnames
##  [1] "ID"                      "administrative"         
##  [3] "administrative_duration" "informational"          
##  [5] "informational_duration"  "productrelated"         
##  [7] "productrelated_duration" "bouncerates"            
##  [9] "exitrates"               "pagevalues"             
## [11] "SpecialDay"
# creating the modified data frame
#
data_mod1 <- melt(customer_dataset2, id.vars='ID',
                  measure.vars=c("administrative_duration" ,
                                 "informational_duration",
                                 "pagevalues"))

# creating a boxplot per variable to inspect outliers
#
p <- ggplot(data_mod1) +
  geom_boxplot(aes(x = variable, y = value, color = variable))
  
# printing the plot
#
print(p)

# creating the modified data frame
#
data_mod1 <- melt(customer_dataset2, id.vars='ID',
                  measure.vars=c("productrelated_duration"))

# creating a boxplot per variable to inspect outliers
#
p <- ggplot(data_mod1) +
  geom_boxplot(aes(x = variable, y = value, color = variable))
  
# printing the plot
#
print(p)

# creating the modified data frame
#
data_mod1 <- melt(customer_dataset2, id.vars='ID',
                  measure.vars=c("administrative", "informational"))

# creating a boxplot per variable to inspect outliers
#
p <- ggplot(data_mod1) +
  geom_boxplot(aes(x = variable, y = value, color = variable))
  
# printing the plot
#
print(p)

# creating the modified data frame
#
data_mod1 <- melt(customer_dataset2, id.vars='ID',
                  measure.vars=c("productrelated"))

# creating a boxplot per variable to inspect outliers
#
p <- ggplot(data_mod1) +
  geom_boxplot(aes(x = variable, y = value, color = variable))
  
# printing the plot
#
print(p)

# creating the modified data frame
#
data_mod1 <- melt(customer_dataset2, id.vars='ID',
                  measure.vars=c("bouncerates","SpecialDay"))

# creating a boxplot per variable to inspect outliers
#
p <- ggplot(data_mod1) +
  geom_boxplot(aes(x = variable, y = value, color = variable))
  
# printing the plot
#
print(p)

# creating the modified data frame
#
data_mod1 <- melt(customer_dataset2, id.vars='ID',
                  measure.vars=c("exitrates"))

# creating a boxplot per variable to inspect outliers
#
p <- ggplot(data_mod1) +
  geom_boxplot(aes(x = variable, y = value, color = variable))
  
# printing the plot
#
print(p)

All numeric columns appear to have outliers and skewed, non-normal distributions. The outliers shall be retained and investigated further in subsequent steps.
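The boxplots mark as outliers the points lying more than 1.5 times the interquartile range beyond the quartiles, which is the default whisker rule of `geom_boxplot()`. To put numbers on the visual impression, the sketch below counts such points per vector; the helper name `count_iqr_outliers` is hypothetical.

```r
# Count values lying more than coef * IQR below Q1 or above Q3,
# the same rule ggplot2's geom_boxplot() uses for its whiskers by default
count_iqr_outliers <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[2] - q[1]
  lower <- q[1] - coef * iqr
  upper <- q[2] + coef * iqr
  sum(x < lower | x > upper, na.rm = TRUE)
}

# Hypothetical example: one extreme value among small counts
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 50)
print(count_iqr_outliers(x))  # 1
```

Applied column-wise, for example `sapply(Filter(is.numeric, customer_dataset1), count_iqr_outliers)`, it would return one outlier count per numeric variable.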

Feature engineering

No additional variables that can aid the analysis can be derived from this data set.

Sampling

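The data set has over 12,000 records, which makes the k-means and, in particular, the hierarchical clustering computationally lengthy: hierarchical clustering needs the full matrix of pairwise distances, whose size grows with the square of the number of records. A random sample of 3,000 records, giving roughly 16 times fewer pairwise distances, will therefore be selected. This is sufficient to meet the objectives of this study.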

# selecting 3000 records randomly from the customer dataset
#
customer_dataset1 <- customer_dataset1[sample(nrow(customer_dataset1), 3000), ]
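`sample()` draws a different subset on every run, so the sampled records, and every statistic computed from them below, change each time the document is knitted. A minimal sketch of making the draw reproducible with `set.seed()`. The seed value 42 and the stand-in data frame `df` are arbitrary assumptions, not what was used to produce the outputs shown here.

```r
# Hypothetical stand-in for customer_dataset1
df <- data.frame(id = 1:100)

# Setting the same seed before each draw yields the same sample
set.seed(42)
s1 <- df[sample(nrow(df), 10), , drop = FALSE]

set.seed(42)
s2 <- df[sample(nrow(df), 10), , drop = FALSE]

print(identical(s1, s2))  # TRUE
```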

Univariate analysis

Non-numeric data

# Selecting non-numeric columns in the customer data set
#
non_num <- customer_dataset1 %>% select_if(negate(is.numeric))

# Previewing the first six records of the non-numeric columns
#
head(non_num)
##       month operatingsystems browser region traffictype       visitortype
## 12046   Dec                1       8      1          10 Returning_Visitor
## 1969    Mar                1       1      1           1 Returning_Visitor
## 9329    Dec                2       2      1           1 Returning_Visitor
## 10233   Nov                3       2      1          10 Returning_Visitor
## 11320   Dec                3       2      9          13 Returning_Visitor
## 3863    May                2       2      4           3 Returning_Visitor
##       weekend revenue
## 12046   FALSE   FALSE
## 1969    FALSE    TRUE
## 9329    FALSE   FALSE
## 10233    TRUE   FALSE
## 11320   FALSE   FALSE
## 3863    FALSE   FALSE
# Counting the distinct values in each non-numeric column
#
rapply(non_num,function(x)length(unique(x)))
##            month operatingsystems          browser           region 
##               10                8               12                9 
##      traffictype      visitortype          weekend          revenue 
##               17                3                2                2

The data was collected over 10 months. The 3,000-record sample contains all 8 operating systems, 12 of the 13 browser types and 17 of the 20 traffic types present in the full data set, with visitors from all 9 regions. The visitor type has 3 classes, while weekend and revenue each have 2.
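The counts printed above come from `length(unique(x))`, which counts only the levels that actually occur in the sample, whereas `nlevels()` reports every level the factor was created with. A small sketch of the difference on a hypothetical factor `f`, with `droplevels()` removing the unused level:

```r
# Factor declared with three levels, but level "c" never occurs
f <- factor(c("a", "b", "a"), levels = c("a", "b", "c"))

print(nlevels(f))              # 3: all declared levels
print(length(unique(f)))       # 2: levels present in the data
print(nlevels(droplevels(f)))  # 2: unused level "c" dropped
```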

Numerical Data

Descriptive statistics (mode, median, mean, standard deviation, variance, minimum, maximum, skewness and kurtosis) shall be computed for the numeric variables.

# Creating data set with numeric variables only

# Identifying which columns in the data are numeric
#
num_cols <- unlist(lapply(customer_dataset1, is.numeric)) 

# Subset numeric columns of data
#
num_dataset <- customer_dataset1[ , num_cols]

# Previewing the first six records of the numeric subset
#
head(num_dataset)
##       administrative administrative_duration informational
## 12046              5                200.8333             0
## 1969               0                  0.0000             0
## 9329               0                  0.0000             0
## 10233              1                  3.0000             0
## 11320              5                106.0000             0
## 3863               0                  0.0000             0
##       informational_duration productrelated productrelated_duration bouncerates
## 12046                      0             44                1909.081 0.004761905
## 1969                       0             32                 999.000 0.000000000
## 9329                       0              9                  82.500 0.000000000
## 10233                      0             11                1046.667 0.005555556
## 11320                      0             44                 646.825 0.007407407
## 3863                       0            141                2886.524 0.004316547
##        exitrates pagevalues SpecialDay
## 12046 0.01825397   0.000000        0.0
## 1969  0.01344086   9.053082        0.0
## 9329  0.03333333   0.000000        0.0
## 10233 0.02500000   0.000000        0.0
## 11320 0.02885185   0.000000        0.0
## 3863  0.01879809   0.000000        0.8
# Creating the mode function that will perform our mode operation for us
# (returns the most frequent value; ties go to the value that appears first)
# ---
getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}
# Computing some descriptive statistics
# ---
# 
desc_stats <- data.frame(
  Mode = apply(num_dataset, 2, getmode), # Mode
  Med = apply(num_dataset, 2, median), # median
  Mean = apply(num_dataset, 2, mean),  # mean
  SD = apply(num_dataset, 2, sd),      # Standard deviation
  Var = apply(num_dataset, 2, var),     # Variance
  Min = apply(num_dataset, 2, min),     # minimum
  Max = apply(num_dataset, 2, max),      # Maximum
  skewness = skewness(num_dataset),      # skewness
  kurtosis = kurtosis(num_dataset)      # kurtosis
)
desc_stats <- round(desc_stats, 2)
desc_stats
##                         Mode    Med    Mean      SD        Var Min      Max
## administrative           0.0   1.00    2.40    3.37      11.36   0    24.00
## administrative_duration  0.0  11.00   85.70  176.51   31156.09  -1  1922.00
## informational            0.0   0.00    0.52    1.25       1.56   0    12.00
## informational_duration   0.0   0.00   34.59  135.98   18491.59  -1  2195.30
## productrelated           1.0  19.00   31.53   40.52    1641.79   0   397.00
## productrelated_duration  0.0 620.85 1182.63 1664.59 2770844.67  -1 16093.31
## bouncerates              0.0   0.00    0.02    0.04       0.00   0     0.20
## exitrates                0.2   0.03    0.04    0.04       0.00   0     0.20
## pagevalues               0.0   0.00    6.15   18.41     339.06   0   255.57
## SpecialDay               0.0   0.00    0.06    0.19       0.04   0     1.00
##                         skewness kurtosis
## administrative              1.83     6.80
## administrative_duration     4.58    32.28
## informational               3.39    18.15
## informational_duration      7.18    70.84
## productrelated              3.38    19.41
## productrelated_duration     3.26    17.87
## bouncerates                 3.27    13.23
## exitrates                   2.27     7.90
## pagevalues                  5.34    45.25
## SpecialDay                  3.51    14.44

Most variables have a mode of zero. All the variables are positively skewed (skewed to the right). The standard deviations are large relative to the means, indicating high variation within each variable. The kurtosis values, computed with moments::kurtosis() for which a normal distribution scores 3, range from 6.80 to 70.84, so all the variables are heavy-tailed relative to a normal distribution. The three duration variables also have a minimum of -1, which is not a valid duration.
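`moments::kurtosis()` returns Pearson's (non-excess) kurtosis, n * sum((x - mean(x))^4) / (sum((x - mean(x))^2))^2, for which normally distributed data scores about 3. A base-R sketch of that formula (the function name `pearson_kurtosis` is hypothetical), checked on simulated normal data and on heavier-tailed Student t data:

```r
# Pearson (non-excess) kurtosis:
# n * sum((x - mean)^4) / (sum((x - mean)^2))^2
pearson_kurtosis <- function(x) {
  d <- x - mean(x)
  length(x) * sum(d^4) / sum(d^2)^2
}

set.seed(1)
normal_draws <- rnorm(1e5)       # normal data: kurtosis close to 3
heavy_draws  <- rt(1e5, df = 5)  # Student t with 5 df: heavier tails

print(pearson_kurtosis(normal_draws))
print(pearson_kurtosis(heavy_draws))
```

Values well above 3, as in the table above, therefore indicate heavier tails than a normal distribution.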

Graphical analysis

# Histogram plots of numeric data in the customer dataset
hist.data.frame(num_dataset)

The histogram plots confirm the observation from the kurtosis values that the variables have heavy tails. The variables are strongly skewed to the right, with most values concentrated at the low end of each range.

# Bar chart of sessions that did and did not generate revenue
ggplot(customer_dataset1, aes(x = revenue)) +
    geom_bar(fill = "coral") +
    theme_classic()

# Bar chart of weekend vs weekday sessions
ggplot(customer_dataset1, aes(x = weekend)) +
    geom_bar(fill = "coral") +
    theme_classic()

# Bar chart of visitor types
ggplot(customer_dataset1, aes(x = visitortype)) +
    geom_bar(fill = "coral") +
    theme_classic()

# Bar chart of traffic types
ggplot(customer_dataset1, aes(x = traffictype)) +
    geom_bar(fill = "coral") +
    theme_classic()

# Bar chart of visitor regions
ggplot(customer_dataset1, aes(x = region)) +
    geom_bar(fill = "coral") +
    theme_classic()

# Bar chart of browsers
ggplot(customer_dataset1, aes(x = browser)) +
    geom_bar(fill = "coral") +
    theme_classic()

# Bar chart of operating systems
ggplot(customer_dataset1, aes(x = operatingsystems)) +
    geom_bar(fill = "coral") +
    theme_classic()

# Bar chart of the months the data was collected
ggplot(customer_dataset1, aes(x = month)) +
    geom_bar(fill = "coral") +
    theme_classic()

From the bar plots:

The site was visited most in May and November. Visitors most often used operating system 2 and browser 2, and most originated from region 1. Traffic type 2 was the most frequent traffic type. Most of the traffic consisted of returning visitors and occurred on weekdays rather than weekends. Most sessions did not end in a purchase (revenue is FALSE).
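The bar charts show counts; the shares behind statements such as "most sessions did not end in a purchase" can be read off with `prop.table(table(...))`. A sketch on a hypothetical `revenue` factor; with the real data one would pass a column such as `customer_dataset1$revenue`.

```r
# Hypothetical purchase outcome for ten sessions
revenue <- factor(c(FALSE, FALSE, TRUE, FALSE, FALSE,
                    FALSE, TRUE, FALSE, FALSE, FALSE))

# table() counts each level; prop.table() turns counts into shares
shares <- prop.table(table(revenue))
print(round(shares, 2))  # FALSE 0.8, TRUE 0.2
```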

Bivariate analysis

Covariance

Covariance measures the degree to which two variables vary together. Here, the covariance between each pair of numeric variables in the data frame shall be calculated.
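For two variables x and y with n paired observations, the sample covariance returned by `cov()` is sum((x_i - mean(x)) * (y_i - mean(y))) / (n - 1). The sketch below computes it by hand for two hypothetical vectors and checks it against `cov()`. A positive value means the variables tend to move in the same direction, a negative value that they move in opposite directions.

```r
x <- c(2, 4, 6, 8)
y <- c(1, 3, 3, 5)

# Sample covariance with the n - 1 denominator, as used by cov()
manual_cov <- sum((x - mean(x)) * (y - mean(y))) / (length(x) - 1)

print(manual_cov)  # 4
print(cov(x, y))   # 4
```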

# Create Covariance matrix of the numerical data in dataset
#
cov(num_dataset)
##                         administrative administrative_duration informational
## administrative             11.35650539              351.765339   1.450904190
## administrative_duration   351.76533853            31156.088527  56.320491489
## informational               1.45090419               56.320491   1.558049906
## informational_duration    115.64431522             3912.172799 105.385490236
## productrelated             54.86522885             1625.309888  18.303052128
## productrelated_duration  2023.75434407            70208.972496 787.573179512
## bouncerates                -0.03129744               -1.090037  -0.006156577
## exitrates                  -0.04713444               -1.663237  -0.009349887
## pagevalues                  6.16978898              193.234729   1.234833054
## SpecialDay                 -0.05497228               -2.420281  -0.011962210
##                         informational_duration productrelated
## administrative                     115.6443152     54.8652289
## administrative_duration           3912.1727985   1625.3098876
## informational                      105.3854902     18.3030521
## informational_duration           18491.5945284   1283.3141078
## productrelated                    1283.3141078   1641.7892880
## productrelated_duration          59147.7470705  58746.2312691
## bouncerates                         -0.4182875     -0.3610840
## exitrates                           -0.6577976     -0.5465468
## pagevalues                          59.3351531     44.0049514
## SpecialDay                          -0.3912316     -0.2272413
##                         productrelated_duration   bouncerates     exitrates
## administrative                       2023.75434 -3.129744e-02 -4.713444e-02
## administrative_duration             70208.97250 -1.090037e+00 -1.663237e+00
## informational                         787.57318 -6.156577e-03 -9.349887e-03
## informational_duration              59147.74707 -4.182875e-01 -6.577976e-01
## productrelated                      58746.23127 -3.610840e-01 -5.465468e-01
## productrelated_duration           2770844.66658 -1.372450e+01 -1.983652e+01
## bouncerates                           -13.72450  1.897513e-03  1.757263e-03
## exitrates                             -19.83652  1.757263e-03  2.015653e-03
## pagevalues                           1743.35272 -9.927698e-02 -1.501428e-01
## SpecialDay                            -11.58330  6.542691e-04  8.931951e-04
##                            pagevalues    SpecialDay
## administrative             6.16978898 -5.497228e-02
## administrative_duration  193.23472891 -2.420281e+00
## informational              1.23483305 -1.196221e-02
## informational_duration    59.33515314 -3.912316e-01
## productrelated            44.00495143 -2.272413e-01
## productrelated_duration 1743.35271539 -1.158330e+01
## bouncerates               -0.09927698  6.542691e-04
## exitrates                 -0.15014278  8.931951e-04
## pagevalues               339.05661266 -1.930041e-01
## SpecialDay                -0.19300406  3.692296e-02

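Variables other than bouncerates, exitrates and SpecialDay covary positively with each other. bouncerates, exitrates and SpecialDay covary negatively with all the other variables but positively among themselves.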
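Covariance magnitudes depend on the units and scale of each variable, which is why the duration columns dominate the matrix above and its entries cannot be compared directly. Dividing each covariance by the product of the two standard deviations gives the unit-free correlation; base R's `cov2cor()` does this for a whole covariance matrix. A sketch confirming that `cov2cor(cov(m))` matches `cor(m)` on hypothetical data `m`:

```r
set.seed(7)
# Hypothetical data: two related variables on very different scales
m <- cbind(small = rnorm(50),
           large = rnorm(50, sd = 1000))
m[, "large"] <- m[, "large"] + 500 * m[, "small"]

cv <- cov(m)
print(cv)  # entries dominated by the "large" column

# Correlation = covariance / (sd_x * sd_y), for every pair at once
print(all.equal(cov2cor(cv), cor(m)))  # TRUE
```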

Correlation

# Correlation matrix of numerical data in the customer data set
#
cor(num_dataset)
##                         administrative administrative_duration informational
## administrative              1.00000000              0.59136991    0.34492581
## administrative_duration     0.59136991              1.00000000    0.25562557
## informational               0.34492581              0.25562557    1.00000000
## informational_duration      0.25235665              0.16298941    0.62087326
## productrelated              0.40180573              0.22725111    0.36188795
## productrelated_duration     0.36076911              0.23895442    0.37904794
## bouncerates                -0.21320337             -0.14176775   -0.11322864
## exitrates                  -0.31153595             -0.20988184   -0.16684293
## pagevalues                  0.09942872              0.05945352    0.05372562
## SpecialDay                 -0.08489325             -0.07135855   -0.04987380
##                         informational_duration productrelated
## administrative                      0.25235665     0.40180573
## administrative_duration             0.16298941     0.22725111
## informational                       0.62087326     0.36188795
## informational_duration              1.00000000     0.23290943
## productrelated                      0.23290943     1.00000000
## productrelated_duration             0.26130333     0.87099410
## bouncerates                        -0.07061474    -0.20457720
## exitrates                          -0.10774504    -0.30044210
## pagevalues                          0.02369675     0.05898027
## SpecialDay                         -0.01497264    -0.02918638
##                         productrelated_duration bouncerates  exitrates
## administrative                       0.36076911 -0.21320337 -0.3115360
## administrative_duration              0.23895442 -0.14176775 -0.2098818
## informational                        0.37904794 -0.11322864 -0.1668429
## informational_duration               0.26130333 -0.07061474 -0.1077450
## productrelated                       0.87099410 -0.20457720 -0.3004421
## productrelated_duration              1.00000000 -0.18927716 -0.2654309
## bouncerates                         -0.18927716  1.00000000  0.8985384
## exitrates                           -0.26543094  0.89853843  1.0000000
## pagevalues                           0.05687784 -0.12377134 -0.1816187
## SpecialDay                          -0.03621412  0.07816563  0.1035357
##                          pagevalues  SpecialDay
## administrative           0.09942872 -0.08489325
## administrative_duration  0.05945352 -0.07135855
## informational            0.05372562 -0.04987380
## informational_duration   0.02369675 -0.01497264
## productrelated           0.05898027 -0.02918638
## productrelated_duration  0.05687784 -0.03621412
## bouncerates             -0.12377134  0.07816563
## exitrates               -0.18161868  0.10353573
## pagevalues               1.00000000 -0.05454841
## SpecialDay              -0.05454841  1.00000000

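From the correlation matrix, the strongest relationships are between bounce rates and exit rates (0.90) and between product-related pages and product-related duration (0.87). Informational pages and informational duration are also correlated above 0.6 (0.62). Administrative pages and administrative duration fall just below that threshold (0.59).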
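Reading a threshold off a printed matrix is error-prone. The sketch below is a hypothetical helper, `high_cor_pairs()`, which is not part of the original analysis. It uses only base R to list each pair of variables whose absolute Pearson correlation exceeds a threshold. The matrix above implies that `high_cor_pairs(num_dataset, 0.6)` would return three pairs: bounce rates/exit rates, product-related/product-related duration, and informational/informational duration.

```r
# Hypothetical helper: list variable pairs whose |Pearson r| exceeds a threshold.
high_cor_pairs <- function(df, threshold = 0.6) {
  r <- cor(df)
  # upper.tri() keeps each pair once and skips the diagonal (r = 1)
  idx <- which(abs(r) > threshold & upper.tri(r), arr.ind = TRUE)
  out <- data.frame(var1 = rownames(r)[idx[, 1]],
                    var2 = colnames(r)[idx[, 2]],
                    r    = r[idx])
  # strongest relationships first
  out[order(-abs(out$r)), , drop = FALSE]
}

# Usage on toy data: y is almost a linear function of x, z is unrelated noise
set.seed(42)
x <- rnorm(100)
toy <- data.frame(x = x, y = 2 * x + rnorm(100, sd = 0.1), z = rnorm(100))
res <- high_cor_pairs(toy)
res
```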

Bivariate graphics

# pair plot of variables with numeric data
#
pairs(num_dataset,                     # Data frame of variables
      col = 'blue',                    # Modify color
      labels = colnames(num_dataset),  # Variable names
      pch = 21,                 # Pch symbol
      main = "Customer dataset",    # Title of the plot
      row1attop = TRUE,         # If FALSE, changes the direction of the diagonal
      gap = 1,                  # Distance between subplots
      cex.labels = NULL,        # Size of the diagonal text
      font.labels = 1)          # Font style of the diagonal text

The pair plots above show no discernible relationship between the special day variable and the other numeric variables, so it will not be investigated further in the multivariate analysis section.

Exit rates and bounce rates show a positive linear relationship. However, no discernible relationship was observed between these two variables and the other numeric variables.

# A bar plot of weekend data labelled with revenue data
#
ggplot(customer_dataset1, aes(x = weekend, fill = revenue)) +
    geom_bar(position = position_dodge()) +
    theme_classic()

On both weekends and weekdays, most site visits ended without generating revenue.

# Bar chart side by side of visitor type to whether revenue was 
# received or not
#
ggplot(customer_dataset1, aes(x = visitortype, fill = revenue)) +
    geom_bar(position = position_dodge()) +
    theme_classic()

Most returning visitors, new visitors and other visitor types did not make a purchase after visiting the site.

# Bar chart side by side of browser and revenue
#
ggplot(customer_dataset1, aes(x = browser, fill = revenue)) +
    geom_bar(position = position_dodge()) +
    theme_classic()

Visitors who accessed the site via browser type 2 produced the largest number of revenue-generating visits of all the browsers.

# Bar chart side by side of region comparing revenue
#
ggplot(customer_dataset1, aes(x = region, fill = revenue)) +
    geom_bar(position = position_dodge()) +
    theme_classic()

Visitors from region 1 produced revenue-generating visits far more often than visitors from any other region.

# Bar chart side by side of month comparing revenue
#
ggplot(customer_dataset1, aes(x = month, fill = revenue)) +
    geom_bar(position = position_dodge()) +
    theme_classic()

The company recorded its highest revenue returns from site visits in November. In February, almost no site visit ended with the customer making a purchase.
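The dodged bar charts compare raw counts. A category with many visits can therefore dominate through volume alone. A complementary check computes the share of sessions in each category that ended with revenue. The sketch below does this with a hypothetical helper, `conversion_rate()`, and assumes `revenue` is stored as TRUE/FALSE, as in the encoded preview. A call such as `conversion_rate(customer_dataset1, "month")` likewise assumes those column names in `customer_dataset1`.

```r
# Hypothetical helper: per-level share of sessions that produced revenue.
# Assumes the outcome column holds TRUE/FALSE values.
conversion_rate <- function(df, group, outcome = "revenue") {
  tab <- table(df[[group]], df[[outcome]])
  props <- prop.table(tab, margin = 1)   # each row (category level) sums to 1
  data.frame(level        = rownames(tab),
             sessions     = as.vector(rowSums(tab)),
             revenue_rate = as.vector(props[, "TRUE"]))
}

# Usage on toy data: 2 February visits (no purchases), 4 November visits (2 purchases)
toy <- data.frame(month   = c("Feb", "Feb", "Nov", "Nov", "Nov", "Nov"),
                  revenue = c(FALSE, FALSE, TRUE, TRUE, FALSE, FALSE))
res <- conversion_rate(toy, "month")
res
```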

Encoding

Here the categorical variables are one-hot encoded.

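# dummyVars() comes from the caret package
library(caret)

# define the one-hot encoder on all predictors
# (column 18, revenue, is excluded here and added back later)
#
dummy <- dummyVars(" ~ .", data = customer_dataset1[, -18])

# perform one-hot encoding on the data frame
customer_encoded <- data.frame(predict(dummy, 
                                       newdata = customer_dataset1[, -18]))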

# view final data frame
#
head(customer_encoded)
##       administrative administrative_duration informational
## 12046              5                200.8333             0
## 1969               0                  0.0000             0
## 9329               0                  0.0000             0
## 10233              1                  3.0000             0
## 11320              5                106.0000             0
## 3863               0                  0.0000             0
##       informational_duration productrelated productrelated_duration bouncerates
## 12046                      0             44                1909.081 0.004761905
## 1969                       0             32                 999.000 0.000000000
## 9329                       0              9                  82.500 0.000000000
## 10233                      0             11                1046.667 0.005555556
## 11320                      0             44                 646.825 0.007407407
## 3863                       0            141                2886.524 0.004316547
##        exitrates pagevalues SpecialDay month.Aug month.Dec month.Feb month.Jul
## 12046 0.01825397   0.000000        0.0         0         1         0         0
## 1969  0.01344086   9.053082        0.0         0         0         0         0
## 9329  0.03333333   0.000000        0.0         0         1         0         0
## 10233 0.02500000   0.000000        0.0         0         0         0         0
## 11320 0.02885185   0.000000        0.0         0         1         0         0
## 3863  0.01879809   0.000000        0.8         0         0         0         0
##       month.June month.Mar month.May month.Nov month.Oct month.Sep
## 12046          0         0         0         0         0         0
## 1969           0         1         0         0         0         0
## 9329           0         0         0         0         0         0
## 10233          0         0         0         1         0         0
## 11320          0         0         0         0         0         0
## 3863           0         0         1         0         0         0
##       operatingsystems.1 operatingsystems.2 operatingsystems.3
## 12046                  1                  0                  0
## 1969                   1                  0                  0
## 9329                   0                  1                  0
## 10233                  0                  0                  1
## 11320                  0                  0                  1
## 3863                   0                  1                  0
##       operatingsystems.4 operatingsystems.5 operatingsystems.6
## 12046                  0                  0                  0
## 1969                   0                  0                  0
## 9329                   0                  0                  0
## 10233                  0                  0                  0
## 11320                  0                  0                  0
## 3863                   0                  0                  0
##       operatingsystems.7 operatingsystems.8 browser.1 browser.2 browser.3
## 12046                  0                  0         0         0         0
## 1969                   0                  0         1         0         0
## 9329                   0                  0         0         1         0
## 10233                  0                  0         0         1         0
## 11320                  0                  0         0         1         0
## 3863                   0                  0         0         1         0
##       browser.4 browser.5 browser.6 browser.7 browser.8 browser.9 browser.10
## 12046         0         0         0         0         1         0          0
## 1969          0         0         0         0         0         0          0
## 9329          0         0         0         0         0         0          0
## 10233         0         0         0         0         0         0          0
## 11320         0         0         0         0         0         0          0
## 3863          0         0         0         0         0         0          0
##       browser.11 browser.12 browser.13 region.1 region.2 region.3 region.4
## 12046          0          0          0        1        0        0        0
## 1969           0          0          0        1        0        0        0
## 9329           0          0          0        1        0        0        0
## 10233          0          0          0        1        0        0        0
## 11320          0          0          0        0        0        0        0
## 3863           0          0          0        0        0        0        1
##       region.5 region.6 region.7 region.8 region.9 traffictype.1 traffictype.2
## 12046        0        0        0        0        0             0             0
## 1969         0        0        0        0        0             1             0
## 9329         0        0        0        0        0             1             0
## 10233        0        0        0        0        0             0             0
## 11320        0        0        0        0        1             0             0
## 3863         0        0        0        0        0             0             0
##       traffictype.3 traffictype.4 traffictype.5 traffictype.6 traffictype.7
## 12046             0             0             0             0             0
## 1969              0             0             0             0             0
## 9329              0             0             0             0             0
## 10233             0             0             0             0             0
## 11320             0             0             0             0             0
## 3863              1             0             0             0             0
##       traffictype.8 traffictype.9 traffictype.10 traffictype.11 traffictype.12
## 12046             0             0              1              0              0
## 1969              0             0              0              0              0
## 9329              0             0              0              0              0
## 10233             0             0              1              0              0
## 11320             0             0              0              0              0
## 3863              0             0              0              0              0
##       traffictype.13 traffictype.14 traffictype.15 traffictype.16
## 12046              0              0              0              0
## 1969               0              0              0              0
## 9329               0              0              0              0
## 10233              0              0              0              0
## 11320              1              0              0              0
## 3863               0              0              0              0
##       traffictype.17 traffictype.18 traffictype.19 traffictype.20
## 12046              0              0              0              0
## 1969               0              0              0              0
## 9329               0              0              0              0
## 10233              0              0              0              0
## 11320              0              0              0              0
## 3863               0              0              0              0
##       visitortype.New_Visitor visitortype.Other visitortype.Returning_Visitor
## 12046                       0                 0                             1
## 1969                        0                 0                             1
## 9329                        0                 0                             1
## 10233                       0                 0                             1
## 11320                       0                 0                             1
## 3863                        0                 0                             1
##       weekend.FALSE weekend.TRUE
## 12046             1            0
## 1969              1            0
## 9329              1            0
## 10233             0            1
## 11320             1            0
## 3863              1            0
# add the revenue variable to the encoded dataset
#
customer_encoded$revenue <- customer_dataset1$revenue

# preview the first six rows
head(customer_encoded)
##       administrative administrative_duration informational
## 12046              5                200.8333             0
## 1969               0                  0.0000             0
## 9329               0                  0.0000             0
## 10233              1                  3.0000             0
## 11320              5                106.0000             0
## 3863               0                  0.0000             0
##       informational_duration productrelated productrelated_duration bouncerates
## 12046                      0             44                1909.081 0.004761905
## 1969                       0             32                 999.000 0.000000000
## 9329                       0              9                  82.500 0.000000000
## 10233                      0             11                1046.667 0.005555556
## 11320                      0             44                 646.825 0.007407407
## 3863                       0            141                2886.524 0.004316547
##        exitrates pagevalues SpecialDay month.Aug month.Dec month.Feb month.Jul
## 12046 0.01825397   0.000000        0.0         0         1         0         0
## 1969  0.01344086   9.053082        0.0         0         0         0         0
## 9329  0.03333333   0.000000        0.0         0         1         0         0
## 10233 0.02500000   0.000000        0.0         0         0         0         0
## 11320 0.02885185   0.000000        0.0         0         1         0         0
## 3863  0.01879809   0.000000        0.8         0         0         0         0
##       month.June month.Mar month.May month.Nov month.Oct month.Sep
## 12046          0         0         0         0         0         0
## 1969           0         1         0         0         0         0
## 9329           0         0         0         0         0         0
## 10233          0         0         0         1         0         0
## 11320          0         0         0         0         0         0
## 3863           0         0         1         0         0         0
##       operatingsystems.1 operatingsystems.2 operatingsystems.3
## 12046                  1                  0                  0
## 1969                   1                  0                  0
## 9329                   0                  1                  0
## 10233                  0                  0                  1
## 11320                  0                  0                  1
## 3863                   0                  1                  0
##       operatingsystems.4 operatingsystems.5 operatingsystems.6
## 12046                  0                  0                  0
## 1969                   0                  0                  0
## 9329                   0                  0                  0
## 10233                  0                  0                  0
## 11320                  0                  0                  0
## 3863                   0                  0                  0
##       operatingsystems.7 operatingsystems.8 browser.1 browser.2 browser.3
## 12046                  0                  0         0         0         0
## 1969                   0                  0         1         0         0
## 9329                   0                  0         0         1         0
## 10233                  0                  0         0         1         0
## 11320                  0                  0         0         1         0
## 3863                   0                  0         0         1         0
##       browser.4 browser.5 browser.6 browser.7 browser.8 browser.9 browser.10
## 12046         0         0         0         0         1         0          0
## 1969          0         0         0         0         0         0          0
## 9329          0         0         0         0         0         0          0
## 10233         0         0         0         0         0         0          0
## 11320         0         0         0         0         0         0          0
## 3863          0         0         0         0         0         0          0
##       browser.11 browser.12 browser.13 region.1 region.2 region.3 region.4
## 12046          0          0          0        1        0        0        0
## 1969           0          0          0        1        0        0        0
## 9329           0          0          0        1        0        0        0
## 10233          0          0          0        1        0        0        0
## 11320          0          0          0        0        0        0        0
## 3863           0          0          0        0        0        0        1
##       region.5 region.6 region.7 region.8 region.9 traffictype.1 traffictype.2
## 12046        0        0        0        0        0             0             0
## 1969         0        0        0        0        0             1             0
## 9329         0        0        0        0        0             1             0
## 10233        0        0        0        0        0             0             0
## 11320        0        0        0        0        1             0             0
## 3863         0        0        0        0        0             0             0
##       traffictype.3 traffictype.4 traffictype.5 traffictype.6 traffictype.7
## 12046             0             0             0             0             0
## 1969              0             0             0             0             0
## 9329              0             0             0             0             0
## 10233             0             0             0             0             0
## 11320             0             0             0             0             0
## 3863              1             0             0             0             0
##       traffictype.8 traffictype.9 traffictype.10 traffictype.11 traffictype.12
## 12046             0             0              1              0              0
## 1969              0             0              0              0              0
## 9329              0             0              0              0              0
## 10233             0             0              1              0              0
## 11320             0             0              0              0              0
## 3863              0             0              0              0              0
##       traffictype.13 traffictype.14 traffictype.15 traffictype.16
## 12046              0              0              0              0
## 1969               0              0              0              0
## 9329               0              0              0              0
## 10233              0              0              0              0
## 11320              1              0              0              0
## 3863               0              0              0              0
##       traffictype.17 traffictype.18 traffictype.19 traffictype.20
## 12046              0              0              0              0
## 1969               0              0              0              0
## 9329               0              0              0              0
## 10233              0              0              0              0
## 11320              0              0              0              0
## 3863               0              0              0              0
##       visitortype.New_Visitor visitortype.Other visitortype.Returning_Visitor
## 12046                       0                 0                             1
## 1969                        0                 0                             1
## 9329                        0                 0                             1
## 10233                       0                 0                             1
## 11320                       0                 0                             1
## 3863                        0                 0                             1
##       weekend.FALSE weekend.TRUE revenue
## 12046             1            0   FALSE
## 1969              1            0    TRUE
## 9329              1            0   FALSE
## 10233             0            1   FALSE
## 11320             1            0   FALSE
## 3863              1            0   FALSE
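The encoding above relies on caret's `dummyVars()`. A cross-check that needs no extra package is sketched below as a hypothetical `onehot_encode()` helper, which is not the author's code. It builds the same kind of full indicator encoding with base R's `model.matrix()`. Passing `contrasts = FALSE` for every factor keeps one 0/1 column per level instead of dropping a reference level. The column names come out without the dot separator, for example `visitortypeOther` rather than `visitortype.Other`.

```r
# Hypothetical helper: full one-hot encoding of every factor column in base R.
# contrasts = FALSE keeps an indicator column for every level of each factor;
# numeric columns pass through unchanged.
onehot_encode <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))
  mm <- model.matrix(~ . - 1, data = df,
                     contrasts.arg = lapply(df[is_fac], contrasts,
                                            contrasts = FALSE))
  as.data.frame(mm)
}

# Usage on toy data shaped like the customer dataset
toy <- data.frame(x           = c(1.5, 2, 3),
                  visitortype = factor(c("New_Visitor", "Other", "Returning_Visitor")),
                  weekend     = factor(c(TRUE, FALSE, TRUE)))
enc <- onehot_encode(toy)
enc
```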

Data normalization

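K-means is suitable only for continuous data, because it relies on Euclidean distances and cluster means. The categorical variables shall therefore be dropped and the continuous variables normalized for modeling.

The skewness and kurtosis analysis showed that the numeric variables do not follow a Gaussian distribution. They are also measured on very different scales: product-related duration runs into the thousands, while bounce rates are fractions between 0 and 1. The variables are therefore standardized with scale() to zero mean and unit variance, so that no single variable dominates the distance calculations. Note that standardization rescales the variables but does not remove their skew.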
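With its defaults, `scale()` applies a column-by-column z-score transformation, z = (x - mean(x)) / sd(x). The hypothetical `zscore()` helper below spells this out with `sweep()` to make the computation explicit. It is an illustration only; the analysis itself uses `scale()`.

```r
# Hypothetical helper: column-wise z-scores, equivalent to scale() defaults.
zscore <- function(df) {
  m <- as.matrix(df)
  centred <- sweep(m, 2, colMeans(m))          # subtract each column's mean
  sds <- apply(m, 2, sd)                       # sample SD (n - 1 denominator)
  as.data.frame(sweep(centred, 2, sds, "/"))   # divide by each column's SD
}

# Usage: two columns on very different scales, like duration vs. bounce rate
toy <- data.frame(duration = c(82.5, 999, 1046.7, 1909.1),
                  bounce   = c(0, 0.0048, 0.0056, 0.0074))
z <- zscore(toy)
round(colMeans(z), 10)   # column means, effectively 0
apply(z, 2, sd)          # column SDs, 1
```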

# normalizing the customer data set
#
customerNorm <- as.data.frame(scale(num_dataset))

# Visualize first six records of normalized dataset
#
head(customerNorm)
##       administrative administrative_duration informational
## 12046      0.7704387               0.6522770    -0.4179289
## 1969      -0.7132666              -0.4855189    -0.4179289
## 9329      -0.7132666              -0.4855189    -0.4179289
## 10233     -0.4165255              -0.4685227    -0.4179289
## 11320      0.7704387               0.1150107    -0.4179289
## 3863      -0.7132666              -0.4855189    -0.4179289
##       informational_duration productrelated productrelated_duration bouncerates
## 12046             -0.2543652     0.30787203              0.43641435  -0.3398862
## 1969              -0.2543652     0.01171467             -0.11031671  -0.4492033
## 9329              -0.2543652    -0.55592028             -0.66090425  -0.4492033
## 10233             -0.2543652    -0.50656072             -0.08168095  -0.3216666
## 11320             -0.2543652     0.30787203             -0.32188591  -0.2791544
## 3863              -0.2543652     2.70181073              1.02361338  -0.3501101
##        exitrates pagevalues SpecialDay
## 12046 -0.4954884 -0.3337685 -0.2942093
## 1969  -0.6026941  0.1578864 -0.2942093
## 9329  -0.1596153 -0.3337685 -0.2942093
## 10233 -0.3452294 -0.3337685 -0.2942093
## 11320 -0.2594344 -0.3337685 -0.2942093
## 3863  -0.4833687 -0.3337685  3.8691295

Implementing the solution: K-means clustering

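K-means clustering partitions a data set into k groups (clusters), where k is specified in advance. The algorithm alternates two steps. Each observation is assigned to the cluster with the nearest centroid, and each centroid is then recomputed as the mean of its assigned observations. The steps repeat until the assignments stop changing, and the total within-cluster sum of squares never increases along the way. Cluster analysis is widely used in the biological and behavioral sciences, marketing, and medical research.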
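The iterative procedure described above can be sketched in a few lines of base R. The hypothetical `lloyd_kmeans()` helper below implements Lloyd's algorithm and is for illustration only. By default, `stats::kmeans()`, which the analysis uses, runs the Hartigan-Wong algorithm instead. That algorithm minimizes the same within-cluster sum of squares but reassigns points differently.

```r
# Hypothetical helper: Lloyd's algorithm for k-means (illustration only).
lloyd_kmeans <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  # initialise the centers at k distinct, randomly chosen observations
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  cluster <- integer(nrow(x))
  for (iter in seq_len(max_iter)) {
    # assignment step: squared Euclidean distance to every center (n x k)
    d <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    new_cluster <- max.col(-d, ties.method = "first")
    if (all(new_cluster == cluster)) break        # no change: converged
    cluster <- new_cluster
    # update step: move each center to the mean of its assigned points
    for (j in seq_len(k)) {
      members <- x[cluster == j, , drop = FALSE]
      if (nrow(members) > 0) centers[j, ] <- colMeans(members)
    }
  }
  wss <- sum((x - centers[cluster, , drop = FALSE])^2)
  list(cluster = cluster, centers = centers, tot.withinss = wss)
}

# Usage: two well-separated blobs of 20 points each
set.seed(1)
toy <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 10), ncol = 2))
fit <- lloyd_kmeans(toy, k = 2)
table(fit$cluster)
```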

Finding optimum k value

K-means is computed in R with the kmeans() function. Here the data are grouped into 2 to 6 clusters (centers = 2, 3, 4, 5 and 6). Setting nstart = 25 makes kmeans() try 25 random initial configurations and keep the one with the lowest total within-cluster sum of squares.
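The effect of `nstart = 25` can be made explicit. `kmeans()` runs the algorithm from each random start and returns the solution with the smallest total within-cluster sum of squares. The hypothetical `best_of_starts()` helper below reproduces that selection by hand from single-start runs. It is a sketch to show the idea, not a replacement for the built-in argument.

```r
# Hypothetical helper: run k-means from n_starts random starts and keep the best.
best_of_starts <- function(x, k, n_starts = 25) {
  fits <- lapply(seq_len(n_starts),
                 function(i) kmeans(x, centers = k, nstart = 1))
  wss  <- vapply(fits, function(f) f$tot.withinss, numeric(1))
  fits[[which.min(wss)]]   # lowest total within-cluster sum of squares wins
}

# Usage on 100 random 2-D points
set.seed(123)
toy <- matrix(rnorm(200), ncol = 2)
fit <- best_of_starts(toy, k = 4, n_starts = 25)
fit$tot.withinss
```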

# setting seed
#
set.seed(123)

# kmeans clusters ranging from 2-6 clusters
#
customer_k2 <- kmeans(customerNorm, centers = 2, nstart = 25)
customer_k3 <- kmeans(customerNorm, centers = 3, nstart = 25)
customer_k4 <- kmeans(customerNorm, centers = 4, nstart = 25)
customer_k5 <- kmeans(customerNorm, centers = 5, nstart = 25)
customer_k6 <- kmeans(customerNorm, centers = 6, nstart = 25)
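# factoextra provides fviz_cluster(); gridExtra provides grid.arrange()
library(factoextra)
library(gridExtra)

# Plot the clusters for each value of k to compare them.
#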
p1 <- fviz_cluster(customer_k2, geom = "point", data = customerNorm) + ggtitle(" K = 2")
p2 <- fviz_cluster(customer_k3, geom = "point", data = customerNorm) + ggtitle(" K = 3")
p3 <- fviz_cluster(customer_k4, geom = "point", data = customerNorm) + ggtitle(" K = 4")
p4 <- fviz_cluster(customer_k5, geom = "point", data = customerNorm) + ggtitle(" K = 5")
p5 <- fviz_cluster(customer_k6, geom = "point", data = customerNorm) + ggtitle(" K = 6")

grid.arrange(p1, p2, p3, p4, p5, nrow = 2)

From the plots, k = 2 appears to separate the data most cleanly.

Determining Optimal Clusters

K-means clustering requires the number of clusters to be specified in advance. A plot of the total within-cluster sum of squares (WSS) against the number of clusters can help here: a bend, or "elbow", in the curve suggests an appropriate number of clusters.

We shall use the elbow method, the silhouette method and the gap statistic.
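The first two criteria can also be computed directly instead of being read off the `fviz_nbclust()` plots. For each k, the hypothetical `k_diagnostics()` helper below returns the total within-cluster sum of squares (the elbow curve) and the average silhouette width, using `stats::kmeans()` and `cluster::silhouette()`. It assumes a full distance matrix fits in memory. For the 3,000 rows of `customerNorm` that is about 4.5 million pairwise distances, which is manageable.

```r
library(cluster)  # silhouette()

# Hypothetical helper: elbow (WSS) and average silhouette width for k = 1..k_max.
k_diagnostics <- function(x, k_max = 6, nstart = 25) {
  d <- dist(x)                      # Euclidean distances, computed once
  res <- data.frame(k = seq_len(k_max), wss = NA_real_, avg_sil = NA_real_)
  for (k in seq_len(k_max)) {
    fit <- kmeans(x, centers = k, nstart = nstart)
    res$wss[k] <- fit$tot.withinss
    # the silhouette is undefined for a single cluster, so start at k = 2
    if (k >= 2) {
      sil <- silhouette(fit$cluster, d)
      res$avg_sil[k] <- mean(sil[, "sil_width"])
    }
  }
  res
}

# Usage: two well-separated blobs of 30 points each
set.seed(1)
toy <- rbind(matrix(rnorm(60, mean = 0), ncol = 2),
             matrix(rnorm(60, mean = 8), ncol = 2))
diag_tab <- k_diagnostics(toy, k_max = 5)
diag_tab
diag_tab$k[which.max(diag_tab$avg_sil)]   # k with the highest average silhouette
```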

# Determining Optimal clusters (k) Using Elbow 
#
fviz_nbclust(x = customerNorm,FUNcluster = kmeans, method = 'wss' )

# Determining Optimal clusters (k) Using Average Silhouette Method
#
fviz_nbclust(x = customerNorm,FUNcluster = kmeans, method = 'silhouette' )

library(cluster)  # clusGap()

# compute the gap statistic
set.seed(123)
gap_stat <- clusGap(x = customerNorm, FUN = kmeans, K.max = 15, nstart = 25, B = 50, iter.max=30)
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 150000)

## Warning: Quick-TRANSfer stage steps exceeded maximum (= 150000)

## Warning: Quick-TRANSfer stage steps exceeded maximum (= 150000)
# Print the result
print(gap_stat, method = "firstmax")
## Clustering Gap statistic ["clusGap"] from call:
## clusGap(x = customerNorm, FUNcluster = kmeans, K.max = 15, B = 50,     nstart = 25, iter.max = 30)
## B=50 simulated reference sets, k = 1..15; spaceH0="scaledPCA"
##  --> Number of clusters (method 'firstmax'): 15
##           logW   E.logW      gap      SE.sim
##  [1,] 7.943042 9.346686 1.403644 0.003049904
##  [2,] 7.825745 9.259997 1.434252 0.003181899
##  [3,] 7.703457 9.209317 1.505861 0.002987526
##  [4,] 7.621512 9.172116 1.550604 0.003098483
##  [5,] 7.549524 9.146627 1.597103 0.003072211
##  [6,] 7.491998 9.124017 1.632019 0.003090704
##  [7,] 7.449260 9.107396 1.658136 0.003043800
##  [8,] 7.412774 9.091432 1.678658 0.003105021
##  [9,] 7.379764 9.078670 1.698906 0.003080937
## [10,] 7.324016 9.066652 1.742636 0.003129988
## [11,] 7.285475 9.055486 1.770011 0.003083274
## [12,] 7.252109 9.044897 1.792788 0.003127841
## [13,] 7.225391 9.034961 1.809569 0.003108134
## [14,] 7.203641 9.025603 1.821963 0.003114827
## [15,] 7.178958 9.016678 1.837720 0.003144910
# plot the result to determine the optimal number of clusters.
#
fviz_gap_stat(gap_stat)

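The elbow and silhouette methods both indicated an optimum of k = 2. The gap statistic did not: it increases steadily all the way to K.max = 15, and the 'firstmax' rule printed above returns 15 clusters. The choice of k = 2 therefore rests on the elbow and silhouette results and on the cluster plots.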
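To check which k the gap statistic itself favours, the gap values printed above can be passed to `cluster::maxSE()`, which implements the standard selection rules. The sketch below applies Tibshirani et al.'s (2001) rule: choose the smallest k with gap(k) >= gap(k + 1) - SE(k + 1). The gap grows by much more than one standard error at every step up to K.max = 15, so the rule returns 15. By itself, the gap statistic does not single out a small number of clusters here.

```r
library(cluster)  # maxSE()

# gap and SE.sim columns copied from the clusGap() output above (k = 1..15)
gap <- c(1.403644, 1.434252, 1.505861, 1.550604, 1.597103,
         1.632019, 1.658136, 1.678658, 1.698906, 1.742636,
         1.770011, 1.792788, 1.809569, 1.821963, 1.837720)
se  <- c(0.003049904, 0.003181899, 0.002987526, 0.003098483, 0.003072211,
         0.003090704, 0.003043800, 0.003105021, 0.003080937, 0.003129988,
         0.003083274, 0.003127841, 0.003108134, 0.003114827, 0.003144910)

# With the live object the equivalent call is:
# maxSE(gap_stat$Tab[, "gap"], gap_stat$Tab[, "SE.sim"], method = "Tibs2001SEmax")
k_tibs <- maxSE(gap, se, method = "Tibs2001SEmax")
k_tibs   # 15: every gap(k) is below gap(k + 1) - SE(k + 1)
```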

Modeling with optimal cluster value

# setting seed
#
set.seed(123)

# Compute k-means clustering with k = 2
#
cl <- kmeans(customerNorm, centers = 2, nstart = 25)
# Visualization of results
#
fviz_cluster(cl, data = customerNorm)

# Let’s check out the centers and size of each cluster.
#
cl$centers
##   administrative administrative_duration informational informational_duration
## 1     -0.2737953              -0.2338145    -0.2685363             -0.2095138
## 2      1.3656977               1.1662725     1.3394657              1.0450600
##   productrelated productrelated_duration bouncerates   exitrates  pagevalues
## 1     -0.2546176              -0.2513869  0.06240071  0.09600273 -0.05511301
## 2      1.2700386               1.2539237 -0.31125621 -0.47886390  0.27490504
##    SpecialDay
## 1  0.03732511
## 2 -0.18617853
# cluster sizes
#
cl$size
## [1] 2499  501
# Attach the cluster labels to the normalized data and compute
# per-cluster medians (values are in z-score units)
#
customerNorm %>% 
  mutate(Cluster = cl$cluster) %>%
  group_by(Cluster) %>%
  summarize_all('median')
## # A tibble: 2 × 11
##   Cluster administrative administrative_duration informational informational_du…
##     <int>          <dbl>                   <dbl>         <dbl>             <dbl>
## 1       1         -0.713                  -0.486        -0.418            -0.254
## 2       2          1.36                    0.631         1.18              0.265
## # … with 6 more variables: productrelated <dbl>, productrelated_duration <dbl>,
## #   bouncerates <dbl>, exitrates <dbl>, pagevalues <dbl>, SpecialDay <dbl>
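The centers and medians above are in z-score units, which makes them hard to read as customer characteristics. Standardization is a linear transformation, so each center can be mapped back to the original units as mean + z * sd of the unstandardized column. The hypothetical `unscale_centers()` helper below sketches this. With the objects in this analysis, the call would presumably be `unscale_centers(cl$centers, num_dataset)`. This assumes `num_dataset` holds the unscaled columns that were passed to `scale()`.

```r
# Hypothetical helper: convert k-means centers fitted on scale()d data back
# to the original units of the unscaled data frame `original`.
unscale_centers <- function(centers, original) {
  mu  <- colMeans(original)
  sds <- vapply(original, sd, numeric(1))
  # z * sd + mean, column by column
  sweep(sweep(centers, 2, sds, "*"), 2, mu, "+")
}

# Usage on toy data with one large-scale and one small-scale variable
set.seed(2)
orig <- data.frame(duration = c(rnorm(20, 200, 30), rnorm(20, 2000, 300)),
                   bounce   = c(rnorm(20, 0.05, 0.01), rnorm(20, 0.01, 0.005)))
fit <- kmeans(scale(orig), centers = 2, nstart = 10)
unscale_centers(fit$centers, orig)
```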

Evaluation of the model

Finally, we print a summary of the model.
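Besides the printed summary, a common single-number measure of a k-means fit is the share of the total sum of squares that lies between clusters, `betweenss / totss`. `print()` on a `kmeans` object normally reports this as "between_SS / total_SS". The short sketch below computes it and checks the decomposition totss = tot.withinss + betweenss. For the fitted model, the equivalent expression is `cl$betweenss / cl$totss`. The ratio tends to rise as k increases, so it is best used to compare fits with the same k.

```r
# Hypothetical helper: share of total variance captured between clusters (0 to 1).
between_share <- function(fit) fit$betweenss / fit$totss

# Usage: two blobs of 20 points each
set.seed(3)
toy <- rbind(matrix(rnorm(40, 0), ncol = 2), matrix(rnorm(40, 5), ncol = 2))
fit <- kmeans(toy, centers = 2, nstart = 10)
between_share(fit)
# the decomposition that makes the ratio meaningful
all.equal(fit$totss, fit$tot.withinss + fit$betweenss)
```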

# Print the model summary (cluster sizes, cluster means and the clustering vector)
#
print(cl)
## K-means clustering with 2 clusters of sizes 2499, 501
## 
## Cluster means:
##   administrative administrative_duration informational informational_duration
## 1     -0.2737953              -0.2338145    -0.2685363             -0.2095138
## 2      1.3656977               1.1662725     1.3394657              1.0450600
##   productrelated productrelated_duration bouncerates   exitrates  pagevalues
## 1     -0.2546176              -0.2513869  0.06240071  0.09600273 -0.05511301
## 2      1.2700386               1.2539237 -0.31125621 -0.47886390  0.27490504
##    SpecialDay
## 1  0.03732511
## 2 -0.18617853
## 
## Clustering vector:
## 12046  1969  9329 10233 11320  3863   928  8622  4888 10545  3679  2669  5174 
##     1     1     1     1     1     1     1     1     1     1     1     2     1 
##   137  9001  4422  7051  4313  8477  9857  5535  1989 11753 12222 11680 11051 
##     1     1     1     1     1     2     1     1     2     1     2     1     1 
##  7434   381  2767 10280  1005  6503 11275  8525  2786  9506  6924  4476 11025 
##     1     1     1     1     1     1     2     1     1     1     1     1     2 
##  4063    69   950   154  7571  3880  6045  1686  7480  2456  8933  6770  7027 
##     2     1     1     1     1     1     1     2     1     1     2     2     1 
##   845   796   245  5574  4858  5597  3915  8017 11098 10444  4807 11620  9994 
##     2     1     1     1     1     2     1     2     1     1     1     1     1 
## 11460  7579 11923   351  1034  8119  5699  8244  5902  8504  5583 10876  5470 
##     1     1     2     1     1     1     2     1     1     1     1     2     1 
##  7744  8939  5960 11468 11919  1701   316  8383  3418  1646  6569  9408 10404 
##     2     1     1     1     1     1     2     1     1     1     2     1     1 
##  2591  6438 10373 10000  9625  3921 11293  4465  1205   982  2906  3060  6248 
##     2     1     2     1     1     1     1     1     1     1     2     2     1 
##  1608 10727  3013  9057  5987  6419 10695 12206  5124  7256  6149 11770  9640 
##     1     1     1     1     1     1     1     1     1     1     1     1     2 
## 11873  3527  1360  3565  2005  3279  3029 10285  5792   453  1953  8615  1980 
##     1     2     1     1     1     1     1     2     1     1     1     1     1 
##   820  5274  8619  6483  9935 11503  5750  5344 11219 11027  9469  2256  4363 
##     1     1     2     1     1     1     1     1     2     2     1     2     1 
## 10917  5336  4570  8929  7869  3577  1330  5120  3604 10420  3959  3742  5799 
##     2     1     1     1     1     1     1     1     1     2     1     1     1 
##  5081 10356   222 10979 10388 11250  8323  9411  8841  9806  8056 11371 10268 
##     1     1     1     1     2     2     1     1     1     1     1     1     1 
##  4703  6113  4523  9264  7146  7313 10196  4822  6827  3597 11555  9970  2635 
##     1     1     1     2     2     1     2     2     1     1     2     2     1 
##  1949  7971   397  1366   436  7906 11709  6241 11554  6525  5893   868  9726 
##     1     1     1     1     1     1     1     2     1     1     1     1     1 
##  7688  4880 11273  5234  8572  2687  1522  7568  2068  4892  4581  2893  4571 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
##  7095  3030  5914 12215  6898  1175 11257  6473  2617  4319  2693 11111  9704 
##     2     2     1     1     1     1     2     1     1     1     1     1     1 
##  3177 11462  3919 11704  3317   787  5188  1403  8345 10322  8613  6940 10085 
##     1     1     1     1     1     1     1     1     2     2     2     2     1 
## 10555  6939  1163  4307  8835  4200  4968  6306  6351  9312 10949  7824   860 
##     1     1     1     1     1     2     1     1     1     2     2     1     1 
##  8513  1051 12217  9306 10642  8865    70   619  6604 10931  3969  6771  6949 
##     1     1     2     1     2     1     1     2     2     2     1     2     2 
##  6983  2574  5607  4014  8559  9200   985  4901  2569  6047  7308  9122 10799 
##     1     1     1     1     1     1     1     1     2     2     2     1     1 
##  3061 10733 10887 11213  7749  3854 11469  3278  6970  8607 10035 10329 10161 
##     1     2     1     1     1     1     1     1     2     1     2     1     1 
##  1018  4262  2880 10251  7285  6163  6356   437  3672  5538 11781 10132  5985 
##     1     1     1     1     1     2     1     1     1     1     1     1     1 
##  1365  6302   966  5088 11242  4109   689  5898   444  4944  7539  9124  5453 
##     1     2     1     1     1     2     1     1     1     1     2     1     1 
##  2529  2757  3168  6275  3523 12238  8707  2055 11236   486  6750   526  9758 
##     1     1     1     1     1     2     1     1     1     1     2     1     1 
##  1565 10174   435  4861  9801 11790 10433  6715  1407   810   864    97 10429 
##     2     1     1     1     1     1     1     1     1     1     1     1     1 
##  5222  6416  9072  9871  1623  5974  4342  6009  3512  9044  4728  6901   945 
##     1     1     1     1     1     1     1     1     1     2     1     1     2 
##  5859  6784  2316  6557  7783  7261 11313  7021  7936  8240  7243   254 11129 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
##  8245  7179   266   105  1320  6343  4626  9507  7658  9601 10133  3887  7613 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
##  5804  5283  2178   935 11708  7690  5198 11595  1286  6048   462  5474  2648 
##     1     1     1     1     1     2     1     2     1     1     1     2     1 
##  2760  5037 11490  3751  9731 10606  9402 11807  8509  6987  5320 12294  9788 
##     2     1     1     1     1     1     1     2     1     1     1     1     1 
##  8121 11186   632  5980  6911   681  3773  3111  6657   162 11962  7482   639 
##     2     1     1     1     1     1     1     1     1     1     2     1     1 
##  9113   455   947  8127  1270  1152  3759  2568  6352 11271 10525  1507  7169 
##     1     1     1     1     1     1     1     1     1     2     1     1     2 
##  1782 11149  8616 11953  4707  9208  2375  8192  7852  6017  8033  5191  6157 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
##  3227  5564  8920  4195  9892  6536  7122  9680  6134  6614  3629 10777  6378 
##     1     1     2     1     1     1     2     1     2     2     2     1     1 
## 11081  2585  9634  9664  3136  1155  9228  8784  7981   687 11463   624 10868 
##     1     1     2     1     1     1     1     1     1     1     1     1     1 
##  1470 11662  8975   148  2677  6662 11832 10275 10614  1822  1946 10986  5512 
##     1     1     1     1     1     1     1     2     1     1     1     2     1 
##  8232  7947  2013   367  5824  5544  7755  2955  2232  9223  1101 10678  3746 
##     1     1     2     1     2     1     1     1     1     1     1     1     1 
##  7980  7564  7108    47  1178 10245  5040 10460  6663  1125 11854   500  6829 
##     1     2     1     1     1     2     2     2     1     1     2     2     1 
##  2847  1662  3803  9346  9135  1512   554 12014   745  2868  8103  5260  8680 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
##   958 12276  9309   770  3731  6271  3098  4503  8884 10707 10217  6702  4012 
##     1     1     1     2     1     1     1     1     2     1     1     1     1 
##   198  7259  1140  1042   951  8952  8896  3172   178  5847  2991  6370  2714 
##     1     1     1     2     1     2     1     2     1     1     1     1     1 
## 12150  4335 11683  8903  1998 10865  3457  5953     6 10645  9530  1253  7385 
##     1     1     1     1     1     1     1     2     1     2     1     1     2 
##  7220   993 11055 10215  3065  9693  7500  6040  7163   961   541  1943  1484 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
##  8658  4805  3789  5530  8484 11303  5814   751  9799   743  1422  6927 11604 
##     1     1     1     1     1     1     1     1     1     1     1     1     2 
##  1187 11109  3145   350  9276  5744 10810    96   715  9512  1815   466  9930 
##     1     2     1     1     1     1     1     1     1     1     1     1     1 
##   344 12067 11506  1489  1479  2484  8312  5182  8227  7291  6762  1417   969 
##     1     1     1     1     1     1     1     1     1     2     1     1     1 
##  2386  4667  1670  2252  2053  7933   718  1900  2195   901  6035  2864  2793 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
## 10671  5192 10326  5349    36  5820  6130  6485 12229  7012  3757  6851 11325 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
##  2547  6916  9686 11710  3071  7485  4039  5706 10315 11729  8597 12003   176 
##     1     1     1     2     1     2     1     1     1     1     2     1     1 
##  5952  4181  1531 10870   528  1361 10925 10719  7518  3405 11602  9398  1404 
##     1     2     1     2     1     1     2     1     1     1     2     1     1 
## 10838 11668 12180  8858  4326  2570  8205  2168 11742  5628   345  1184  8957 
##     1     1     2     1     1     1     1     1     1     1     1     1     1 
##  5514  1780 12205  8560  2420  8406  5746 11412  9932   451  4952  3540  9455 
##     2     1     2     1     1     1     1     1     1     1     1     1     1 
## 10028  8931  7595  1307  8992   922  7245  2517  4845  8836  4848  7089  1260 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
##  3315  6971 10874  7230 11610 10964   134   494 11315  3558  4371  5054 11194 
##     1     1     1     1     1     1     1     1     1     1     1     1     2 
##   575  6706  2548 10447   420  6943  4192  3460  8586  1332  3547  8167  2579 
##     2     1     1     1     1     1     1     1     2     1     1     1     1 
##  1542  5394 10802  3705  7816  4068  4448  3218  5762 11868  3157  5655 12033 
##     1     1     2     1     1     1     1     1     1     1     2     1     1 
##  7173  1899  2403  5377  7297 12247 10350   754  2512  3997 11395   439  2126 
##     1     1     1     1     1     1     1     1     1     1     2     1     1 
##  7150  1040  1378  6908 11537   359     8  9478  8942  4925   264  7566  8182 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
##    38  9143  2919  6101  5742  3230  8795  3035  4261  4242 10520  6632  6963 
##     1     1     1     1     1     1     2     1     2     1     1     1     1 
##  3217   241   139    17  8649  9617  6925  4436  7511  7988 11135  1935  8446 
##     1     1     1     1     1     2     2     1     1     1     2     1     1 
## 11227  6146 10204  7782  3835  8990  3579  8639  9913  6768  5846  2249  2231 
##     1     1     1     1     1     1     1     1     1     1     2     1     2 
##  2037  3845 11106  4377  2444  6671 11965  9144  4435  5110  4640  5636  7846 
##     1     1     1     1     1     1     2     1     1     1     1     1     1 
##   336  6331  8861  4597  8696  1554  4949   738  6098 11088  6099  5381  5195 
##     1     1     1     1     1     1     1     1     1     2     2     1     2 
##  1387  7203  5539  8152  2390  5701  1804 10419 10005   291  9573  1261  7638 
##     1     1     1     1     2     1     1     1     1     1     2     1     1 
## 11305  1461  2941 10221   261 12136  4015  3131 12001  3344  9709  2114  7302 
##     1     1     1     1     1     1     1     1     1     1     1     1     2 
##  1150  5936   196  3347 10070  8358 11038 12115  8965 11533  3602 10619  4127 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
##  7546  3675  1636  8098 11898  5329  6759  9738  6811  6262 10361   520 11974 
##     1     1     1     1     1     1     1     2     1     2     2     1     1 
##  7662 10323  5522  7812  6170  3348  5288  2372  8396  5380  4251 10194  1795 
##     1     1     1     1     1     1     1     2     1     1     1     1     1 
##  8612  8941  7490  9335  2865  4223  6171 11252  9315  1818 10836  9303 11822 
##     1     1     1     2     1     1     2     1     1     1     2     1     1 
##  4382  9136  1059  9249  3711  7907  2253  4163  2498 10061 11033  2663  6926 
##     2     1     1     2     1     1     1     2     1     1     1     1     1 
##  4425  8475 10968  9963  5541  2452 10023 11013   834  2234  8863  6125 11638 
##     1     1     1     1     1     2     1     1     1     1     1     2     1 
##  3439  2152  7601 12007  5078  5513 11120  2567  2644 10055 10334  2429  6929 
##     1     1     1     1     1     2     1     1     1     1     2     2     1 
##   405  7801  8678 12323  5791  1727  4115  4074 11432  2182  9400  6246  7019 
##     1     1     2     1     1     1     2     1     1     1     2     1     1 
##  6803  8356  8986  6612  1576  8948  2653  1720  7162  8731  5944  9196  7399 
##     2     2     1     1     1     2     1     1     1     1     1     2     2 
##  5107  6875 11641  6855  4033  3839  8538  5491  4009  2289  5948  9837  5224 
##     1     1     1     2     1     1     1     1     1     1     1     1     1 
## 10539 11722  1056 11356 11197   683  3637   652  9527  2321 10909    88   932 
##     2     1     1     1     1     1     2     1     1     1     2     1     2 
##  9690  6959 12201  9846  4430 11538  7242   695  4303  6155   991  4991  9192 
##     2     2     1     2     1     2     1     2     1     1     1     1     1 
##  7205 11054  3697  5579 10378  8079  5768  8155 10670  9104  3494 12158  9701 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
##  2758  2937  5912  6088  2745  8258  5202 10687  8792  2931 11684  4036  3939 
##     2     1     1     1     1     1     1     1     1     1     1     1     1 
##  4453  6886  1381  7631  6572  3582  1637  7990 10097   323  3932  8463   914 
##     1     2     1     1     1     1     1     1     1     1     1     1     1 
## 10237 10742  4423  8185  3015  4388 11529  9132  1083  5278  5480 11716   693 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
##  7268  4988  2925 10732  2076 12261   534  4967  1337  3206 11660  1710  3908 
##     1     1     1     1     1     2     1     1     1     1     1     1     1 
## 10769   558  1837  6383  6204  7681  4616  3980    63  9940 11382  5993 10230 
##     1     1     1     1     1     1     1     1     2     2     1     1     1 
##  5216  2495  1157  7558  7190  9551  7244  2597  7920  5225  9752  9958  6783 
##     2     1     1     1     1     1     1     1     1     1     1     1     1 
## 10804  3489  8235  5621  7171 11108 10541  3292 11339  4394  6800  2260  1960 
##     1     1     2     1     1     1     2     1     1     1     1     2     1 
##   450  7217  9674  8853 11895  8350  3998 11052 11780  8688 11341  7573 11586 
##     1     1     1     1     1     2     1     1     2     2     1     2     1 
## 10189  8010  6222 11163  6780 12134  7157 11408  1106 11744  1622  2211  6309 
##     1     2     1     1     2     2     1     2     2     2     1     1     1 
##  5543  3078  9250  5245  1885 10813 10129 11228  2792   389  4193  2606  5362 
##     1     1     2     1     1     1     1     2     1     1     1     1     1 
##  7113  9043  9611 10150  4152  6073 10616 10811  9872  2158  6985  8888    34 
##     1     1     1     1     1     2     1     1     1     1     1     1     1 
##  7321 11887  8685 12174 10648  4840  5611 10181  6754     4  3101   510   858 
##     1     1     1     1     1     1     1     2     1     1     1     1     1 
##  6686 11880 12162   111  3906 12125  3524 12098  8320 11439 10546 10518   628 
##     2     1     1     1     1     1     1     1     1     1     1     2     1 
## 10252 10428  7837 11064 10083  6745  6775  3388  6713 10224 11674  2973  1068 
##     2     1     1     1     2     1     1     1     1     1     1     1     1 
##  9678  4577  3607  3178  1495 10911  6141 10040  2854 11943 12227 11087  1846 
##     1     2     1     1     1     1     1     1     1     1     1     1     1 
##  5693  7029 11658  4351  4627 10790  1105  4434 11706   946  3495  5332  9026 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
## 10193   726  8264  6335  6615  3367 12207  8880  3954   921  7628  3739 11201 
##     1     1     1     1     1     1     1     1     1     1     1     1     2 
##  5767   529 10158   416  2636  3093  9147 12060  5560  5848  4386  2712 11122 
##     1     1     1     1     1     2     2     1     1     1     1     1     1 
##  6645   611  3319  9946  7426  1715  2371   235  8055  1442  7652  2319  7389 
##     2     2     1     1     1     1     1     1     1     1     1     1     1 
##  8495 10091  7815  3655  7408  2913 11053  5475  2778 11212  6043  3740  9923 
##     1     1     1     2     1     1     1     1     1     1     1     1     2 
## 12219 10487  7882  3156 10741  4638  1888 10117  4488  2824   696   827  3532 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
## 11401  7923  7969  8704  6998   257  5728  4379  1728  6180  7411  3764  3633 
##     2     1     1     1     2     1     1     2     1     1     2     1     1 
## 11223  5158  7368  1423  5803  7472   540  6355  5776  2706 12101 11014   430 
##     1     1     1     1     1     1     1     1     1     1     1     2     1 
##  3257  6185  7168  5724  5449  9471  3894 11989   702  4295  5293  7916 11992 
##     1     1     1     1     1     1     1     2     1     1     1     1     1 
##  2177  1930 10943  7645 11007   195 11848 10620  2313  9269  9845 10033  1091 
##     1     1     1     1     1     1     1     2     2     2     1     2     1 
##  9591  3726 11984  1345  8638  3844  6404  1800   414 11302   647   731   626 
##     1     1     2     1     1     1     2     1     1     2     1     1     1 
##  6822  1400 10024  6465  3750   262  4030  4710  8533   395  1316  6462  5145 
##     1     1     1     1     2     1     1     1     1     1     1     1     1 
##  7294 10652  6507  6986  2036  7857   881  7264  9548   882  7632 11628  8518 
##     1     1     1     1     1     2     1     2     1     1     1     1     1 
## 10627  5031 11635  8137  9423  7379   180  3149  2002  1063  7550  6590  6885 
##     1     2     1     1     1     1     1     2     1     1     2     1     1 
## 10849   109 10020  1239 11585  2506 10018  5520  4302  5505  6322  6177  8972 
##     1     1     1     1     1     1     1     2     2     1     1     1     1 
##  1336 10871  9275  7867  6110 11820  9295   706  4531  1288  5563 11159  2884 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
##  2171  6773  8347  7395  8983 10248  9736  8859  7069  9383  3895 12289  5058 
##     1     1     2     1     1     1     1     1     1     2     1     1     1 
##  8847  5891  8921  9277  4542  7667  9934  5266  9019  5632  6816  3704  8962 
##     2     1     1     1     1     1     2     1     2     1     1     1     1 
##  8479  4722   588  4419 10928  4123  3938  2532  3768  2655  6669  1879  4789 
##     1     1     1     1     2     1     1     1     1     1     1     1     2 
##  8369 11640   429  9549  2603  3379  9242  6175  7396  7768  5933   295 10534 
##     2     2     1     2     1     1     1     1     1     1     1     1     1 
##  2794   268  8593  5832  2609  2212  5138  1446  9656 10676 10965 11190  1596 
##     1     1     1     1     1     1     1     1     1     1     1     2     1 
##  6379   844  6942 10990  7189  3865  1444  5823  7272  7286  3244  6025 12248 
##     1     1     2     1     1     2     1     1     1     1     1     1     2 
## 10101  3202 11233  7137  8004  6894 10916  2840  3440  5589  2907  7471 10239 
##     1     1     2     1     1     2     2     2     2     1     1     1     2 
##  8300 11738  4035  2946 11835  8129  9238  3031  2837 11948  9489  1238 10348 
##     1     1     1     1     1     1     1     1     1     1     2     1     1 
##  5154 12314 10491   711  4906  8359  5270  6589  3222   385  6792  6703  5261 
##     2     2     1     1     1     1     1     1     1     2     2     1     1 
## 10095  8100  8860  5469  5375  8999  8738 11619 11017   655  4962 12024  9405 
##     1     1     1     2     2     1     2     1     1     1     1     1     1 
##   949  6071  1882  5315  3871  5094  4739  7940  1678  2282 10853  4493 10321 
##     2     1     1     1     1     1     1     1     1     1     1     1     1 
##  2414  6849   742  4834  4782  2607 11755  7148  3720  6167  6078  4982  6202 
##     1     2     1     1     1     2     1     1     2     2     1     1     1 
##  7549  4162  1667  1406  1162  4689  7702  6372  9521 11237  6190   253 11327 
##     1     1     2     1     1     1     1     2     1     1     1     1     1 
##  2254  2466 11199  4247  6637  4345  6861  5429  5576   659  7222  6281 10956 
##     1     1     1     1     1     1     2     1     2     1     1     1     1 
##  3567   173  2781  1313  6972  1099  3012  7492 12216  9331 11137 12105  1692 
##     1     1     1     1     1     1     2     1     1     2     1     1     1 
##  4520  5087 11045  5561  2856  9866  7995  7034  7898  6187  4624  6858  6809 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
## 12041  1833  2128  8837  2900 10007  1763  2191  8623  3322  5111  3190  3295 
##     1     1     1     2     1     1     1     1     1     1     1     1     1 
##  8052 10660  8132   452  1235    41  1549   862  6636 10343  4214  4160  9496 
##     1     1     2     1     1     1     1     1     1     1     1     1     1 
##  4072  6874   125 11860  2619  6140  4700 12093  7274 11142  6194 11175  7391 
##     1     1     1     1     1     1     1     1     1     1     1     1     2 
##  1088  4920  9017  4165  2358  2329  7131  2150 10103  4695  4529  4280   756 
##     1     1     1     1     1     1     1     1     2     1     1     2     1 
##  9596  2845  5268  3353  2702   498  1302  9374  3935  1875  1275   641   883 
##     2     2     1     1     1     1     1     1     1     1     1     1     1 
## 10967  3761   296 12198  1115  2308  9170 10241  6080  8270  5786 10260  4578 
##     1     1     1     1     1     1     1     1     1     2     1     1     1 
##  5074  3363 10797  4433 10547   448 10489  4470  4931 12183  4040  5895  7024 
##     1     1     1     2     1     1     1     1     1     2     1     1     1 
##  9284  3070  3185  4090 11359  6680  7780  3973  2882  2132  7590  3126  4926 
##     1     1     1     1     1     1     1     1     1     1     2     1     1 
##  7737 10743  8540  8928 11569 10169 12009  5775  1660  8494  2003  5313  2780 
##     2     1     1     2     1     2     1     1     1     1     1     1     1 
##  9773   919   461  9999  7827  5978 11830  6687 10570  3766  7279  8451  9462 
##     1     2     1     2     1     1     2     1     1     2     1     1     1 
##  2918  1509  2956 11678  2708  7068  2139 10325 11372  5913  5868  5545  3941 
##     1     1     1     1     1     1     1     2     2     1     1     1     1 
##  2009  9990  9219  7053  4783  7081  7606  9244  5061  2504  1668  7775  8917 
##     2     1     2     1     1     2     1     1     1     1     1     1     1 
##  1249  6701  1566  3488  6596  1496  9613  4755 11485  3594  4357 11367 10653 
##     1     1     1     1     1     2     1     1     2     1     1     1     1 
##  1615 12267  5289  3961  1905  9407  9205  2595   292  6511  2804  9579  1944 
##     1     2     1     1     2     1     1     1     1     1     1     1     1 
## 10831  8081  8682  7327  8364  9330  2025 11929  2012   116  1168  3475  4681 
##     1     1     2     1     1     1     1     1     2     1     1     1     2 
##  3338  6673  2493  5755  6011  7397  4081  3239  5006  9002  5567  3182 12036 
##     1     1     1     2     1     1     1     1     1     2     1     1     1 
##  8037  3681 12317  2790  7994 11626  6565  4723  8995  9811  8566  5232  1844 
##     1     2     1     1     1     1     1     1     1     2     1     1     1 
##  9338  4662    53  9911  4604 12165  3505  4147  4718 10822  1835  3787  9741 
##     1     1     1     1     1     1     1     2     1     2     1     1     1 
##  7870  3813  5233  8374 11010  9995  7941  1343 10470  5276  3689  1649 12064 
##     1     2     1     1     1     1     1     2     1     2     1     1     1 
##  4198   816 10435 11072  2998   836  2828  1228   642  5447 11666  9571  4256 
##     1     1     1     2     1     1     1     1     2     1     1     1     1 
##  5569  1689 11556  6653  4173  6635  6257  8307  9475  6128  5479  1252  6390 
##     1     1     2     1     1     1     1     1     1     2     1     2     2 
##  4630  3294  2445  8524  7455  1389  1931 12116  9157  1613  9483   240 11667 
##     2     1     1     2     1     1     1     2     2     1     1     1     1 
##  8879 11612 11301  9979  7008 10699 10895 11956  6544  2855  3956  4530  6466 
##     2     2     1     2     2     1     2     1     1     1     2     1     1 
##  8925   679   680  7998  6692  4137  3725 10086 11393  8838  3024  1902  8520 
##     1     1     1     1     2     1     1     1     1     1     2     2     2 
##  1185   151 11456  8053   976  9784 10552  2279  2327  6354  2496  6266  1380 
##     1     1     1     1     1     1     1     1     1     1     1     1     2 
##  4644   570  7678  4366  4589  2201 11280  6575  7808  6808  2021  8916 10443 
##     1     1     1     1     1     1     1     1     2     1     1     1     1 
##  1329  1714  1928 10171  3263  7915 12018   238  8527  4049  5384  7698  6658 
##     1     1     1     2     1     1     2     1     1     1     1     1     1 
## 10188  9245  8070 11767 11290   778  6212    60  1089  4854  6069  8249  4923 
##     1     1     1     1     2     2     1     1     1     1     1     2     1 
##  8478  3590  6111  9165  8019  4473  9520 11200 11884  4469  2678  9787 10292 
##     1     1     2     2     1     1     1     2     1     1     1     1     1 
##  7192  5114  4138 10649   522  4139  8466  3144    84  4599   354  5059  6232 
##     1     1     1     2     1     1     1     2     1     1     2     1     1 
##  6323 12094  3735  2300  3000  8432  8901  7170  2873  1774  8183  6454  3478 
##     2     2     1     1     1     1     1     1     1     2     1     1     1 
##  5316  5557  5324 12063   146  8715  9468  2091   672  7004  4554  2643  5053 
##     1     1     1     1     1     1     2     1     1     1     1     1     1 
##  3072  5926  4051  5438  2737  8340  5976  8868  6537 11573  2960 11726  5067 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
##  5448  3224  3609  4240  1098  3974  3125  9735  3623  5828  1768  8266  9253 
##     2     1     1     1     1     1     1     1     1     2     1     1     2 
##  3554  7792 10297  5103  1505  2903  7372  6215  5924  1864  1257 10212  6659 
##     1     1     2     2     1     2     2     2     1     1     1     1     1 
## 11272  7796  4343  9231  8895  9343 10581  7647  9061 11927  6066  1721  7729 
##     1     1     1     1     1     1     1     1     1     1     1     1     2 
##  3057 10084    55  5572  6243  3255  9651  9052 11409  9925  7754  4360  6543 
##     1     1     1     2     1     1     1     1     2     1     1     2     1 
##  2497 11040  1213  5181  9880  9907  6420  7084   140  7865 10725  9587 11068 
##     1     1     1     1     1     1     1     1     1     2     1     1     1 
##  1991  9525   963  6739 10165  8705  5998   794 11633  6057   651  6458    27 
##     1     1     1     1     1     1     1     1     1     1     1     2     1 
##  4546  2116  2481  2031  4097  5618  2142 10238  3882  2369  5640 12023  8943 
##     1     1     1     2     1     2     1     1     1     1     2     1     1 
##  1526  1074  8069  1677 10126  9151    19  9781  3154  1695  8371  5092  1676 
##     1     1     1     1     1     1     1     2     1     1     1     1     1 
##  6288   579  7403  6022  4712  1124  8833  4569 10302  5108  8231  9920  2170 
##     1     1     2     1     1     1     1     2     2     1     2     2     1 
##  1873  9387 10749  1374  2558  4815 11578  2596 11364  3656 10665  2439 10718 
##     1     2     1     1     2     1     1     1     1     1     1     1     1 
## 11411 12044 10664  5157  7254 11789  2661  4489  6909  2598  7392  5496  4918 
##     1     1     2     1     1     1     1     1     1     1     2     1     1 
##  2501  4070 10794  7665  9181  9184   133  7343  8329  5822 11059  4642  2422 
##     1     1     1     1     1     1     1     1     1     1     2     1     1 
##  9024  6196  2247  9823  1682  7174 11011  5008  5011  4715   373  4773  8201 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
##  6282  3369   917  9906  2924 10313  9030  8447  5478 11259  6482 10880  1794 
##     1     1     2     1     1     1     1     1     1     2     1     1     2 
##  5219  5492  4873  5416  6678 11603 11577 10407  6369  3552  3390  9706  2292 
##     1     1     1     1     1     2     2     1     1     2     1     2     1 
##  2842 10082  9377  6197  8905  3334  1130 11985  1811  9035  5821 10883  1750 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
##  7118  8634  4646  4136  2968  4285  3336  5850  8302  7504  9567  4000  5036 
##     1     1     2     1     2     1     1     1     2     1     1     1     1 
##     2  5531 12096  3184 11161  4935   964  9422 10064  3800  3504  5421   631 
##     1     1     2     1     1     1     1     2     1     1     2     1     2 
##  5294  5318  5622 11095  3966 10773  2174  4842  4963  1603  7640  8324  7604 
##     1     1     1     1     1     1     1     1     1     1     1     1     2 
##  8826 10746  7059    72  3836  3054  1090  8253 10910  9305  6294  5537 10635 
##     1     1     1     1     1     1     1     2     1     1     1     2     1 
##  2240 10785  9286  8690  9185  9989  5450  3907 12274 11464  8016  8534  8313 
##     1     1     1     1     1     1     2     1     1     1     1     1     1 
##  7673  5303  1004 12074 11360  2406  1057  8343  8379 10617  9354  9361  2063 
##     1     1     1     1     1     1     1     1     1     2     2     1     1 
##  8497  9676  3936 10765  6631  2857  8301  6868 11527  3005  9370  7364  9618 
##     2     1     1     1     1     2     1     1     2     1     1     2     1 
##  5371  7165  3141 11481  2071 11255  2978   598 12309  7565  7452  2600   876 
##     1     2     1     1     2     2     1     1     1     1     1     2     1 
##  4235   615   723  1355 11333  2826  3194  2061 12277  1755  6192  6296 12249 
##     2     1     2     1     2     2     2     1     1     1     2     1     1 
##  1424  4460   803  5430  3266  1362 10938  8241  7600  8174 11480 10395   348 
##     1     1     1     1     1     1     2     1     1     2     1     1     1 
##  1322  4330 11184  6072  8757  4161  1733  8195   225  9503  3548 11771  9539 
##     1     1     1     1     2     1     2     1     2     1     1     1     1 
##  2962  8516  4086  4102  4656  1901  2086  5093  1915   205   320 12127  5571 
##     1     1     1     1     2     1     2     1     1     1     1     1     1 
##  3799  1910  4140  8143   177  8878 11376  7196  1401  5945  9201  7440    42 
##     1     1     1     1     1     2     2     1     1     1     1     1     1 
##  3897  2764  6769  4760  4771  7147  4080  3904   375  8442 11166  9614  9007 
##     2     1     1     1     1     1     1     1     1     1     1     1     1 
##  9862  5050  6010  3223  4899  2883  1664 11276  2087  7561  4234  2849  4495 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
##  9037  4770  3216  1011   132  8114 11922  2870  8392  1752 11226  3174  2255 
##     1     2     2     1     1     1     1     1     1     1     1     1     2 
##  4128  5719   978  2166  3970  4732 10479  8848  8656  5171  6211 11536  2196 
##     1     1     1     1     2     1     1     1     2     1     1     2     1 
## 10530  7211  3105   563  1848  8526  3171  4297  8591  8782  5836  1396  7956 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
##  2264 10473  7649  4654  2853  2000 10281  4089  1085  9803  9631  4563 12045 
##     1     2     1     1     1     1     2     1     1     2     1     1     1 
## 10411  9048  5082  4467  5239 10116  3639 10608  6883  5325  3847  2844  8729 
##     1     1     1     1     1     1     1     1     2     1     1     1     1 
##  2534  7119  4612  3364 11178 11441  3779  7448  5736  4943  5244  2577  7892 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
## 11653  8474 12123  6290 11487 10354 11284  9195  2995   279  6342  3062  6722 
##     1     1     2     1     1     1     1     1     1     1     1     1     1 
##  7777  8457  3595 11596  4761  4264  2218   386  1429  6502 11346  8702  9427 
##     1     1     1     1     1     2     1     1     1     1     2     2     1 
##  7011  4414 10809  2233  6106  5105  2660  9101   905  5424   234  1580  2146 
##     2     1     1     1     1     1     1     1     1     1     2     2     1 
##  7080  7718  7278  8411  9962  1563 11043  8761 12061  2065  2350 11793  3259 
##     2     1     1     1     1     1     1     1     1     2     1     1     1 
##  3778   470  6582 10032  4318 12281  8065 12166 11550  2219  4131  2141  3118 
##     1     1     1     2     1     1     1     1     2     1     2     1     1 
## 10222  2654     5  1691  6501  2118  3059  1642  4941  9154  5852 10272  4890 
##     2     1     1     1     1     1     1     1     1     1     1     1     2 
##  3097  6585  7741  9213  2976  3023 10102 12159  8213   249  3715 11042  4855 
##     1     1     1     1     1     1     2     1     2     2     1     1     2 
##  5068  5765  1203 11671  8400   378  1985 12055 10852  2427  7669  3301  1743 
##     2     1     2     1     1     1     1     1     1     1     1     1     1 
##  9302  5809  1907  4596  8994  5510  6996  7176  5311 11357 10357  5112 11888 
##     1     1     1     1     2     2     1     1     1     1     1     2     1 
##   736  1113  6496   708  8225    68 11489  1524  9273 10763  5866  8336  5624 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
##  3050  2676  4688 12135  2474  3550  8898  9116  4224  8720  9121  2119  6233 
##     1     1     1     1     2     1     1     2     1     1     1     1     1 
##  9419   126 10157  7484 11083 11217  3574  9972   403  4519  4585  4745  2779 
##     1     1     1     1     1     1     1     2     2     1     1     1     1 
##  7625  4965 11021   487 12218   661  3825 11614  8652  6270  8198  9110  1747 
##     1     1     1     1     1     1     1     1     1     1     2     1     1 
##  5906 12152  2947  2405  2820  5863  8821 10712  4871    11  5467  5498  9576 
##     1     1     1     1     1     1     2     1     2     1     1     2     1 
##  9183  8735  1693  8472 11249  3591  9359  3732  9186 11289  7284  5552 12161 
##     2     1     1     1     1     1     1     1     1     1     1     1     1 
##  5342  9149  8012 11794  4188  7079  1605 10014  6751  8160  8001  1427   634 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
##  6562  3360  6989 10240 11655  5674 12185  5243  4729  7430  3598  9459  7825 
##     1     2     1     1     1     1     2     1     1     1     1     2     2 
## 11465  1258  4016  7370  9103  6135  1420  9509 12097  7159  7819 10999  1601 
##     1     1     1     1     2     1     1     1     2     1     1     1     1 
##  8796 12310 10062 10236 10060   587  5189 10914  6718  4886  4359  4808 11982 
##     1     1     1     1     1     1     2     1     1     1     2     1     1 
##   855  1052  5737  4753  4653 10793  2761  3406  5950  2633  4664  2601  3420 
##     1     2     1     1     1     1     1     1     1     2     1     1     1 
##  9125  5920  5745  6159  2097  6767  4804  9021  3272  6457  7087  4003  2905 
##     2     1     1     1     1     1     1     1     2     1     2     1     1 
##  1999  6406  9819  6429  8501  8434   130  7961  3018  1661  4887  6008  5877 
##     1     1     2     1     1     1     1     1     1     1     1     1     1 
##  4044  3208 11681  1839  1816  5962  4134  8554  9725  2904  3173  4354  6469 
##     1     1     1     1     2     1     1     1     1     1     2     2     1 
##  2202  9385  5358  1309  1195  1306   584  5536  9384  2216 11933  7498  5127 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
## 10057   792  5937  6831  4672 10972  9538 12004  4227  1673  7522  4919  3926 
##     2     1     1     1     1     1     2     1     2     1     1     1     1 
##  9180  3090 10589  2910   106  8886  7320   559  5132  7497  5677  8547  6633 
##     1     1     1     1     1     2     2     1     1     1     1     1     1 
##  9697  8247  5230 10962  7214  9188 10289 11567  1586   856   815  8458  8149 
##     1     1     1     1     1     1     2     1     1     2     1     1     1 
##  2825 10792 10464  6427  5465 11304 11256 11934  4561  9430  1556    79   920 
##     2     1     2     1     2     1     2     1     1     1     1     1     1 
##  2629  1450  8584  2964 10550 10517   307 10734 11436   108  8133  8060  7424 
##     1     1     1     1     1     1     1     1     1     1     1     1     1 
##  6278  9230  9434  1384 11251  4903 11903  9529  6182  2752  8094  5629  1014 
##     2     2     1     1     2     2     1     1     1     1     2     1     1 
##  1232  8355  2613 10377  3993  5739  3089  2729  6054   729  5711  8927  2376 
##     2     1     1     1     1     1     1     1     2     1     1     1     1 
##  1414  9038 10469  2078  8487   592  1541  5645 10041 11092  2513 10172 11375 
##     1     1     1     1     2     1     1     1     1     1     1     1     1 
##  2454  5975  2659  8407  8778  2226  1784  6091  4896 12301 11758 12071  3780 
##     1     1     1     2     2     1     1     1     1     1     1     1     1 
##  6023 11070  3438  9759  7739  8876  7048 10595  9390   258  5007  7548 11986 
##     2     1     1     1     1     1     1     1     2     2     1     1     1 
##  5723  6070 12085  5789  6274  8771  9199   460 12173  1501 
##     1     1     1     1     2     2     1     1     2     1 
## 
## Within cluster sum of squares by cluster:
## [1] 14094.32 10041.62
##  (between_SS / total_SS =  19.5 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"

Challenging solution - Hierarchical Clustering

The k-means solution is challenged here with a second approach. K-means only works with continuous data, so the 8 categorical variables in the data set could not be used. Hierarchical clustering, by contrast, can handle both continuous and categorical variables when it is given a suitable dissimilarity measure. Here, however, we use Gower distance with partitioning around medoids (PAM), and choose the number of clusters by average silhouette width.
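The sketch below illustrates that hierarchical clustering can work with mixed data. It uses a small hypothetical data frame `toy` whose columns are invented for illustration. Gower dissimilarities from `cluster::daisy()` are passed straight to base R's `hclust()`, which k-means cannot do with factor columns. Average linkage is an arbitrary choice for the example.

```r
library(cluster)

# Hypothetical sessions mixing numeric and categorical variables
toy <- data.frame(
  pages    = c(2, 3, 40, 45, 5),
  duration = c(30, 45, 900, 1100, 60),
  visitor  = factor(c("New", "New", "Returning", "Returning", "New")),
  weekend  = factor(c("TRUE", "FALSE", "FALSE", "FALSE", "TRUE"))
)

# Gower dissimilarity handles the mixed column types
d <- daisy(toy, metric = "gower")

# Agglomerative hierarchical clustering with average linkage
hc <- hclust(d, method = "average")

# Cut the dendrogram into two groups
groups <- cutree(hc, k = 2)
print(groups)
```

Sessions 1, 2 and 5 (short sessions by new visitors) fall into one group, and sessions 3 and 4 fall into the other. `plot(hc)` would draw the corresponding dendrogram.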

Calculating Distance - Gower distance

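A popular choice for clustering is Euclidean distance. However, Euclidean distance is only valid for continuous variables, so it does not suit this mixed data set. For a clustering algorithm to give sensible results, we need a distance metric that can handle mixed data types. We therefore use Gower distance. It first scores each variable on a 0 to 1 scale: numeric variables use the absolute difference divided by the variable's range, and categorical variables score 0 for a match and 1 for a mismatch. It then averages these scores across variables.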
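The sketch below computes Gower distance by hand for one pair of rows and checks the result against `cluster::daisy()`. The three-row data frame and the helper `gower_pair()` are hypothetical and used only for illustration. The helper assumes equal variable weights and no missing values.

```r
library(cluster)

# Hypothetical sessions: two numeric and two categorical variables
toy <- data.frame(
  pages    = c(2, 40, 10),
  duration = c(30, 900, 200),
  month    = factor(c("May", "Nov", "May")),
  visitor  = factor(c("New", "Returning", "Returning"))
)

# Manual Gower distance between rows i and j:
#   numeric -> |x_i - x_j| / range(x)
#   factor  -> 0 if the categories match, 1 otherwise
# and the per-variable scores are averaged
gower_pair <- function(df, i, j) {
  parts <- vapply(df, function(col) {
    if (is.numeric(col)) {
      abs(col[i] - col[j]) / diff(range(col))
    } else {
      as.numeric(col[i] != col[j])
    }
  }, numeric(1))
  mean(parts)
}

manual <- gower_pair(toy, 1, 3)
from_daisy <- unname(as.matrix(daisy(toy, metric = "gower"))[1, 3])
print(c(manual = manual, daisy = from_daisy))
```

For rows 1 and 3, the score is (8/38 + 170/870 + 0 + 1) / 4, which is about 0.35.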

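library(cluster)

# Compute Gower dissimilarities on the first 17 columns, i.e. excluding
# the revenue column (column 18) before clustering.
# Column 3 (informational) is passed as a log-ratio variable.
gower_dist <- daisy(customer_dataset1[1:17],
                    metric = "gower",
                    type = list(logratio = 3))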

# Checking attributes to ensure the correct methods are being used
# (I = interval, N = nominal)
#
summary(gower_dist)
## 4498500 dissimilarities, summarized :
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.0000252 0.2621000 0.3222500 0.3185400 0.3804800 0.7105000 
## Metric :  mixed ;  Types = I, I, I, I, I, I, I, I, I, I, N, N, N, N, N, N, N 
## Number of objects : 3000
# Sanity check: inspect the most similar and the most dissimilar pairs
# to see whether the distances make sense
#
gower_mat <- as.matrix(gower_dist)

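# Output the most similar pair. The smallest value overall is the zero
# self-distance on the diagonal, so it is excluded first
#
customer_dataset1[
  which(gower_mat == min(gower_mat[gower_mat != min(gower_mat)]),
        arr.ind = TRUE)[1, ], ]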
##      administrative administrative_duration informational
## 395               0                       0             0
## 1320              0                       0             0
##      informational_duration productrelated productrelated_duration bouncerates
## 395                       0              3                   103.0           0
## 1320                      0              3                    96.5           0
##       exitrates pagevalues SpecialDay month operatingsystems browser region
## 395  0.02222222          0          0   Mar                1       1      1
## 1320 0.02222222          0          0   Mar                1       1      1
##      traffictype       visitortype weekend revenue
## 395            3 Returning_Visitor    TRUE   FALSE
## 1320           3 Returning_Visitor    TRUE   FALSE

Records 395 and 1320 are the most similar pair. This is plausible: they are identical on every variable except productrelated_duration (103.0 vs 96.5).

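# Output a highly dissimilar pair. Because the maximum itself is excluded,
# this returns the pair with the second-largest distance; use
# which(gower_mat == max(gower_mat), arr.ind = TRUE) for the largest
#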
customer_dataset1[
  which(gower_mat == max(gower_mat[gower_mat != max(gower_mat)]),
        arr.ind = TRUE)[1, ], ]
##      administrative administrative_duration informational
## 3871              0                   0.000             0
## 1106             15                1011.361             2
##      informational_duration productrelated productrelated_duration bouncerates
## 3871                    0.0              1                   0.000         0.2
## 1106                  171.5             54                1405.131         0.0
##        exitrates pagevalues SpecialDay month operatingsystems browser region
## 3871 0.200000000    0.00000        0.8   May                4       1      1
## 1106 0.006127451   37.41814        0.0   Mar                3       2      3
##      traffictype       visitortype weekend revenue
## 3871           4 Returning_Visitor   FALSE   FALSE
## 1106           2       New_Visitor    TRUE   FALSE

Records 3871 and 1106 are highly dissimilar. This is plausible: they differ on almost every variable, including month, visitor type, weekend, page values and bounce rate.

Choosing a clustering algorithm

We use partitioning around medoids (PAM) in this step for three reasons. It is easy to understand. It is more robust to noise and outliers than k-means. Each cluster is also represented by an actual observation, the medoid, which serves as the cluster's exemplar.
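The sketch below makes the medoid idea concrete, using a small hypothetical vector with one outlier. The medoid is the observation with the smallest total dissimilarity to all the others. With k = 1, `cluster::pam()` should select that same observation. The mean, by contrast, is pulled towards the outlier.

```r
library(cluster)

# Hypothetical 1-D data with one outlier
x <- c(1, 2, 3, 4, 100)
d <- dist(x)

# Medoid = observation with the smallest total dissimilarity to the others
total_diss <- rowSums(as.matrix(d))
manual_medoid <- unname(which.min(total_diss))

# PAM with a single cluster picks the same observation
fit <- pam(d, k = 1, diss = TRUE)
print(c(manual = manual_medoid, pam = fit$id.med))

# The medoid is an actual data point (3), whereas the mean is dragged
# towards the outlier
print(c(medoid = x[fit$id.med], mean = mean(x)))
```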

# Calculate silhouette width for many k using PAM
#
sil_width <- c(NA)

# Fit PAM for k = 2 to 10 and record the average silhouette width
for(i in 2:10){
  pam_fit <- pam(gower_dist,
                 diss = TRUE,
                 k = i)
  sil_width[i] <- pam_fit$silinfo$avg.width
  
}

# Plot silhouette width (higher is better)

plot(1:10, sil_width,
     xlab = "Number of clusters",
     ylab = "Silhouette Width")
lines(1:10, sil_width)

The average silhouette width is highest at k = 2, so two clusters is the optimal choice.
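The quantity plotted above is the average of the per-observation silhouettes, s(i) = (b(i) - a(i)) / max(a(i), b(i)). Here a(i) is the mean distance from i to the other members of its own cluster, and b(i) is the mean distance from i to the members of the nearest other cluster. The sketch below computes s(i) by hand for hypothetical 1-D data and compares it with `cluster::silhouette()`.

```r
library(cluster)

# Hypothetical 1-D points forming two well-separated groups
x  <- c(1, 2, 3, 10, 11, 12)
cl <- c(1L, 1L, 1L, 2L, 2L, 2L)
d  <- as.matrix(dist(x))

manual_sil <- sapply(seq_along(x), function(i) {
  # a(i): mean distance to the other members of i's own cluster
  own <- cl == cl[i] & seq_along(x) != i
  a <- mean(d[i, own])
  # b(i): smallest mean distance to the members of another cluster
  b <- min(sapply(setdiff(unique(cl), cl[i]),
                  function(k) mean(d[i, cl == k])))
  (b - a) / max(a, b)
})

sil <- silhouette(cl, dist(x))
print(round(manual_sil, 3))
print(mean(manual_sil))  # average silhouette width for this clustering
```

Values close to 1 mean observations sit well inside their own cluster. That is why the k with the highest average width is preferred.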

Model interpretation

Descriptive statistics

With k = 2 selected, we fit PAM with two clusters. We then interpret the clusters by summarising the variables within each one.

# Fit PAM with two clusters on the Gower distances
#
cust_cl <- pam(gower_dist, diss = TRUE, k = 2)

# Results of clustering
#
customer_results <- customer_dataset1[1:17] %>%
  mutate(cluster = cust_cl$clustering) %>%
  group_by(cluster) %>%
  do(the_summary = summary(.))

# Summary of each cluster
#
customer_results$the_summary
## [[1]]
##  administrative   administrative_duration informational   
##  Min.   : 0.000   Min.   :  -1.0          Min.   :0.0000  
##  1st Qu.: 0.000   1st Qu.:   0.0          1st Qu.:0.0000  
##  Median : 2.000   Median :  45.0          Median :0.0000  
##  Mean   : 2.894   Mean   : 115.2          Mean   :0.6675  
##  3rd Qu.: 5.000   3rd Qu.: 158.0          3rd Qu.:1.0000  
##  Max.   :22.000   Max.   :1922.0          Max.   :9.0000  
##                                                           
##  informational_duration productrelated   productrelated_duration
##  Min.   :  -1.0         Min.   :  0.00   Min.   :   -1          
##  1st Qu.:   0.0         1st Qu.:  8.00   1st Qu.:  227          
##  Median :   0.0         Median : 18.00   Median :  630          
##  Mean   :  45.5         Mean   : 30.67   Mean   : 1149          
##  3rd Qu.:   0.0         3rd Qu.: 39.00   3rd Qu.: 1459          
##  Max.   :2195.3         Max.   :397.00   Max.   :11940          
##                                                                 
##   bouncerates         exitrates         pagevalues        SpecialDay     
##  Min.   :0.000000   Min.   :0.00000   Min.   :  0.000   Min.   :0.00000  
##  1st Qu.:0.000000   1st Qu.:0.01339   1st Qu.:  0.000   1st Qu.:0.00000  
##  Median :0.004396   Median :0.02302   Median :  0.000   Median :0.00000  
##  Mean   :0.021542   Mean   :0.03891   Mean   :  6.392   Mean   :0.03088  
##  3rd Qu.:0.016667   3rd Qu.:0.04000   3rd Qu.:  0.000   3rd Qu.:0.00000  
##  Max.   :0.200000   Max.   :0.20000   Max.   :255.569   Max.   :1.00000  
##                                                                          
##      month     operatingsystems    browser        region     traffictype 
##  Nov    :310   1      :608      1      :576   1      :305   2      :290  
##  May    :134   3      : 98      2      :146   3      :186   3      :167  
##  Mar    :107   4      : 51      8      : 28   4      : 71   1      :131  
##  Dec    :104   2      : 34      4      : 19   2      : 68   4      : 55  
##  Oct    : 41   8      : 10      5      : 12   6      : 54   10     : 45  
##  Jul    : 32   5      :  1      6      :  7   7      : 35   8      : 35  
##  (Other): 75   (Other):  1      (Other): 15   (Other): 84   (Other): 80  
##             visitortype   weekend       cluster 
##  New_Visitor      :138   FALSE:576   Min.   :1  
##  Other            :  9   TRUE :227   1st Qu.:1  
##  Returning_Visitor:656               Median :1  
##                                      Mean   :1  
##                                      3rd Qu.:1  
##                                      Max.   :1  
##                                                 
## 
## [[2]]
##  administrative   administrative_duration informational    
##  Min.   : 0.000   Min.   :  -1.0          Min.   : 0.0000  
##  1st Qu.: 0.000   1st Qu.:   0.0          1st Qu.: 0.0000  
##  Median : 1.000   Median :   0.0          Median : 0.0000  
##  Mean   : 2.224   Mean   :  74.9          Mean   : 0.4684  
##  3rd Qu.: 3.000   3rd Qu.:  81.0          3rd Qu.: 0.0000  
##  Max.   :24.000   Max.   :1764.0          Max.   :12.0000  
##                                                            
##  informational_duration productrelated   productrelated_duration
##  Min.   :  -1.0         Min.   :  0.00   Min.   :   -1.0        
##  1st Qu.:   0.0         1st Qu.:  8.00   1st Qu.:  192.9        
##  Median :   0.0         Median : 19.00   Median :  618.2        
##  Mean   :  30.6         Mean   : 31.84   Mean   : 1194.9        
##  3rd Qu.:   0.0         3rd Qu.: 38.00   3rd Qu.: 1436.5        
##  Max.   :1830.5         Max.   :391.00   Max.   :16093.3        
##                                                                 
##   bouncerates         exitrates         pagevalues        SpecialDay     
##  Min.   :0.000000   Min.   :0.00000   Min.   :  0.000   Min.   :0.00000  
##  1st Qu.:0.000000   1st Qu.:0.01417   1st Qu.:  0.000   1st Qu.:0.00000  
##  Median :0.002597   Median :0.02500   Median :  0.000   Median :0.00000  
##  Mean   :0.018846   Mean   :0.04108   Mean   :  6.056   Mean   :0.06591  
##  3rd Qu.:0.015541   3rd Qu.:0.05000   3rd Qu.:  0.000   3rd Qu.:0.00000  
##  Max.   :0.200000   Max.   :0.20000   Max.   :239.980   Max.   :1.00000  
##                                                                          
##      month     operatingsystems    browser         region     traffictype 
##  May    :682   2      :1562     2      :1822   1      :843   2      :673  
##  Nov    :424   3      : 544     4      : 148   3      :421   1      :452  
##  Mar    :356   4      :  48     5      :  92   4      :208   3      :339  
##  Dec    :309   1      :  27     10     :  40   2      :203   4      :214  
##  Sep    :100   8      :  11     6      :  31   7      :146   13     :161  
##  Jul    : 86   6      :   4     3      :  26   6      :144   6      : 98  
##  (Other):240   (Other):   1     (Other):  38   (Other):232   (Other):260  
##             visitortype    weekend        cluster 
##  New_Visitor      : 273   FALSE:1711   Min.   :2  
##  Other            :   8   TRUE : 486   1st Qu.:2  
##  Returning_Visitor:1916                Median :2  
##                                        Mean   :2  
##                                        3rd Qu.:2  
##                                        Max.   :2  
## 

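The data set has been split into two clusters of 803 and 2,197 sessions. Revenue was excluded from the clustering, so the clusters are not defined by whether a visit earned the company money. According to the summaries, the clusters differ mainly in technology and timing:

- Cluster 1 is dominated by operating system 1 (608 of 803 sessions) and browser 1 (576), and November is its most common month.
- Cluster 2 is dominated by operating system 2 (1,562 of 2,197) and browser 2 (1,822), and May is its most common month.

Product-related activity and page values are similar in both clusters. Cluster 1 does spend more time on administrative pages (median administrative_duration of 45 vs 0).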
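The operating-system split can be quantified from the counts printed in the two cluster summaries above. In the sketch below, the counts are copied from that output. The smallest listed category and "(Other)" are merged into `other`: for cluster 1, OS 5 = 1 and Other = 1; for cluster 2, OS 6 = 4 and Other = 1. `prop.table()` then converts the counts into within-cluster proportions.

```r
# Operating-system counts per cluster, taken from the cluster summaries
os_counts <- rbind(
  cluster1 = c(`1` = 608, `2` = 34,   `3` = 98,  `4` = 51, `8` = 10, other = 2),
  cluster2 = c(`1` = 27,  `2` = 1562, `3` = 544, `4` = 48, `8` = 11, other = 5)
)

# Row proportions: share of each operating system within a cluster
os_props <- prop.table(os_counts, margin = 1)
print(round(os_props, 3))
```

About 76% of cluster 1 sessions use operating system 1, and about 71% of cluster 2 sessions use operating system 2.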

# Medoids (most representative observations) of the two clusters
#
customer_dataset1[cust_cl$medoids, ]
##      administrative administrative_duration informational
## 6101              2                   38.04             0
## 4165              0                    0.00             0
##      informational_duration productrelated productrelated_duration bouncerates
## 6101                      0              8                  886.57           0
## 4165                      0              7                   70.00           0
##       exitrates pagevalues SpecialDay month operatingsystems browser region
## 6101 0.03076923          0          0   Nov                1       1      1
## 4165 0.02857143          0          0   May                2       2      1
##      traffictype       visitortype weekend revenue
## 6101           2 Returning_Visitor   FALSE   FALSE
## 4165           2 Returning_Visitor   FALSE   FALSE

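The medoids are the most representative sessions of each cluster, and they confirm this split. The cluster 1 medoid (row 6101) is a returning visitor in November using operating system 1 and browser 1. The cluster 2 medoid (row 4165) is a returning visitor in May using operating system 2 and browser 2. Neither medoid generated revenue.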

Visualization

Visualization is performed with t-distributed stochastic neighbor embedding (t-SNE). This dimension reduction technique tries to preserve local structure so that clusters become visible in a 2D or 3D plot.

t-SNE can also work directly from a custom distance metric, such as the Gower distance matrix created above.
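The sketch below shows the same `Rtsne()` call on a small hypothetical data set. Its column names and values are invented for illustration. One practical constraint is that `Rtsne()` stops with an error when 3 * perplexity exceeds the number of observations minus one. The default perplexity of 30 therefore needs at least 91 observations, and smaller data sets need a smaller value.

```r
library(cluster)
library(Rtsne)

set.seed(123)  # t-SNE is stochastic; fix the seed for a repeatable layout

# Hypothetical mixed-type data: 60 sessions from two visitor types
toy <- data.frame(
  pages   = c(rnorm(30, mean = 5, sd = 1), rnorm(30, mean = 40, sd = 5)),
  visitor = factor(rep(c("New", "Returning"), each = 30))
)
d <- daisy(toy, metric = "gower")

# Embed the precomputed Gower distances in 2-D; perplexity 5 satisfies
# 3 * 5 <= 60 - 1
emb <- Rtsne(d, is_distance = TRUE, perplexity = 5)
print(dim(emb$Y))
```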

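library(Rtsne)

# Embed the Gower distance matrix in two dimensions with t-SNE
# (t-SNE is stochastic, so fix the seed for a reproducible layout)
#
set.seed(42)
tsne_1 <- Rtsne(gower_dist, is_distance = TRUE)

# Collect the 2-D coordinates with the PAM cluster labels and row names
#
tsne_data <- tsne_1$Y %>%
  data.frame() %>%
  setNames(c("X", "Y")) %>%
  mutate(cluster = factor(cust_cl$clustering),
         name = rownames(customer_dataset1))

# Scatter plot coloured by cluster
#
ggplot(aes(x = X, y = Y), data = tsne_data) +
  geom_point(aes(color = cluster))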

The plot shows two clusters, although they are not very distinct.

Summary of the model

# printing the summary of the model
#
print(cust_cl)
## Medoids:
##      ID           
## [1,] "771"  "6101"
## [2,] "1681" "4165"
## Clustering vector:
## 12046  1969  9329 10233 11320  3863   928  8622  4888 10545  3679  2669  5174 
##     1     1     2     1     2     2     2     2     2     2     2     1     2 
##   137  9001  4422  7051  4313  8477  9857  5535  1989 11753 12222 11680 11051 
##     2     2     2     1     2     2     2     2     1     2     2     2     1 
##  7434   381  2767 10280  1005  6503 11275  8525  2786  9506  6924  4476 11025 
##     1     1     2     1     2     2     2     2     2     1     2     2     1 
##  4063    69   950   154  7571  3880  6045  1686  7480  2456  8933  6770  7027 
##     2     2     1     2     2     2     1     2     2     2     1     1     2 
##   845   796   245  5574  4858  5597  3915  8017 11098 10444  4807 11620  9994 
##     2     2     2     1     2     2     2     2     2     2     2     1     2 
## 11460  7579 11923   351  1034  8119  5699  8244  5902  8504  5583 10876  5470 
##     1     2     2     2     2     2     2     1     2     2     2     1     2 
##  7744  8939  5960 11468 11919  1701   316  8383  3418  1646  6569  9408 10404 
##     2     1     2     1     2     2     1     2     2     2     1     2     2 
##  2591  6438 10373 10000  9625  3921 11293  4465  1205   982  2906  3060  6248 
##     2     2     2     1     1     2     2     2     2     1     2     2     1 
##  1608 10727  3013  9057  5987  6419 10695 12206  5124  7256  6149 11770  9640 
##     2     1     2     2     1     1     2     1     2     2     2     1     1 
## 11873  3527  1360  3565  2005  3279  3029 10285  5792   453  1953  8615  1980 
##     2     2     2     1     2     2     2     2     2     2     2     2     1 
##   820  5274  8619  6483  9935 11503  5750  5344 11219 11027  9469  2256  4363 
##     2     1     2     1     2     2     2     2     2     1     2     2     2 
## 10917  5336  4570  8929  7869  3577  1330  5120  3604 10420  3959  3742  5799 
##     1     2     2     2     2     2     1     1     2     1     2     2     1 
##  5081 10356   222 10979 10388 11250  8323  9411  8841  9806  8056 11371 10268 
##     2     2     2     2     2     2     2     2     1     2     2     2     2 
##  4703  6113  4523  9264  7146  7313 10196  4822  6827  3597 11555  9970  2635 
##     1     2     1     2     2     2     2     2     2     2     2     1     2 
##  1949  7971   397  1366   436  7906 11709  6241 11554  6525  5893   868  9726 
##     2     1     2     2     2     2     1     2     2     2     2     1     2 
##  7688  4880 11273  5234  8572  2687  1522  7568  2068  4892  4581  2893  4571 
##     2     2     2     2     1     2     2     1     2     2     2     2     2 
##  7095  3030  5914 12215  6898  1175 11257  6473  2617  4319  2693 11111  9704 
##     2     2     2     2     2     2     2     1     2     2     2     1     2 
##  3177 11462  3919 11704  3317   787  5188  1403  8345 10322  8613  6940 10085 
##     2     2     1     2     2     2     2     2     2     1     2     2     2 
## 10555  6939  1163  4307  8835  4200  4968  6306  6351  9312 10949  7824   860 
##     1     2     1     1     2     2     2     1     2     1     2     1     2 
##  8513  1051 12217  9306 10642  8865    70   619  6604 10931  3969  6771  6949 
##     2     2     2     2     2     1     2     1     1     1     1     2     2 
##  6983  2574  5607  4014  8559  9200   985  4901  2569  6047  7308  9122 10799 
##     2     1     2     1     2     1     2     1     2     2     2     2     1 
##  3061 10733 10887 11213  7749  3854 11469  3278  6970  8607 10035 10329 10161 
##     2     1     1     2     2     2     2     2     2     2     2     1     2 
##  1018  4262  2880 10251  7285  6163  6356   437  3672  5538 11781 10132  5985 
##     2     2     1     2     2     1     2     2     2     1     1     1     2 
##  1365  6302   966  5088 11242  4109   689  5898   444  4944  7539  9124  5453 
##     2     2     1     2     2     2     2     1     1     2     2     2     2 
##  2529  2757  3168  6275  3523 12238  8707  2055 11236   486  6750   526  9758 
##     1     1     2     2     2     2     2     2     1     2     1     1     2 
##  1565 10174   435  4861  9801 11790 10433  6715  1407   810   864    97 10429 
##     1     2     2     2     2     2     2     2     2     2     2     1     1 
##  5222  6416  9072  9871  1623  5974  4342  6009  3512  9044  4728  6901   945 
##     2     2     2     2     2     1     2     2     2     1     2     2     2 
##  5859  6784  2316  6557  7783  7261 11313  7021  7936  8240  7243   254 11129 
##     2     1     2     2     1     1     2     2     1     1     2     2     1 
##  8245  7179   266   105  1320  6343  4626  9507  7658  9601 10133  3887  7613 
##     1     2     2     1     1     2     2     2     2     2     2     1     1 
##  5804  5283  2178   935 11708  7690  5198 11595  1286  6048   462  5474  2648 
##     2     2     2     1     1     2     1     2     1     1     2     1     1 
##  2760  5037 11490  3751  9731 10606  9402 11807  8509  6987  5320 12294  9788 
##     1     2     2     2     1     2     2     2     2     2     2     2     2 
##  8121 11186   632  5980  6911   681  3773  3111  6657   162 11962  7482   639 
##     1     1     2     2     2     2     2     2     2     1     1     2     2 
##  9113   455   947  8127  1270  1152  3759  2568  6352 11271 10525  1507  7169 
##     1     2     2     1     2     1     2     2     1     1     2     2     1 
##  1782 11149  8616 11953  4707  9208  2375  8192  7852  6017  8033  5191  6157 
##     2     1     2     2     2     2     2     2     1     1     1     2     1 
##  3227  5564  8920  4195  9892  6536  7122  9680  6134  6614  3629 10777  6378 
##     2     2     2     2     2     1     2     2     2     1     2     2     2 
## 11081  2585  9634  9664  3136  1155  9228  8784  7981   687 11463   624 10868 
##     1     2     2     2     2     1     2     2     1     2     2     2     2 
##  1470 11662  8975   148  2677  6662 11832 10275 10614  1822  1946 10986  5512 
##     2     2     1     2     2     2     2     1     2     2     2     2     1 
##  8232  7947  2013   367  5824  5544  7755  2955  2232  9223  1101 10678  3746 
##     2     1     2     2     2     1     2     2     2     2     2     1     2 
##  7980  7564  7108    47  1178 10245  5040 10460  6663  1125 11854   500  6829 
##     1     2     2     1     2     1     1     2     2     2     1     2     2 
##  2847  1662  3803  9346  9135  1512   554 12014   745  2868  8103  5260  8680 
##     2     2     2     1     1     2     1     2     2     2     1     2     2 
##   958 12276  9309   770  3731  6271  3098  4503  8884 10707 10217  6702  4012 
##     1     2     2     1     2     1     2     2     2     1     2     1     2 
##   198  7259  1140  1042   951  8952  8896  3172   178  5847  2991  6370  2714 
##     2     2     1     1     2     2     1     1     1     2     2     2     2 
## 12150  4335 11683  8903  1998 10865  3457  5953     6 10645  9530  1253  7385 
##     2     2     1     2     2     1     2     2     2     2     2     2     2 
##  7220   993 11055 10215  3065  9693  7500  6040  7163   961   541  1943  1484 
##     1     2     1     2     2     1     2     1     1     2     2     2     2 
##  8658  4805  3789  5530  8484 11303  5814   751  9799   743  1422  6927 11604 
##     2     2     2     1     2     2     1     2     1     2     2     2     1 
##  1187 11109  3145   350  9276  5744 10810    96   715  9512  1815   466  9930 
##     1     1     2     2     1     1     2     1     2     2     2     2     2 
##   344 12067 11506  1489  1479  2484  8312  5182  8227  7291  6762  1417   969 
##     1     2     2     2     2     2     2     2     1     1     1     2     1 
##  2386  4667  1670  2252  2053  7933   718  1900  2195   901  6035  2864  2793 
##     2     2     2     2     2     1     2     1     2     2     1     2     1 
## 10671  5192 10326  5349    36  5820  6130  6485 12229  7012  3757  6851 11325 
##     2     2     2     2     2     2     1     2     2     1     2     1     1 
##  2547  6916  9686 11710  3071  7485  4039  5706 10315 11729  8597 12003   176 
##     2     2     1     2     2     1     2     2     1     1     1     2     1 
##  5952  4181  1531 10870   528  1361 10925 10719  7518  3405 11602  9398  1404 
##     2     2     2     2     2     2     2     2     2     2     1     2     1 
## 10838 11668 12180  8858  4326  2570  8205  2168 11742  5628   345  1184  8957 
##     2     2     2     2     2     2     2     2     2     1     2     1     1 
##  5514  1780 12205  8560  2420  8406  5746 11412  9932   451  4952  3540  9455 
##     1     2     2     1     2     1     2     1     2     2     2     2     2 
## 10028  8931  7595  1307  8992   922  7245  2517  4845  8836  4848  7089  1260 
##     2     2     1     2     1     1     2     1     2     2     2     2     2 
##  3315  6971 10874  7230 11610 10964   134   494 11315  3558  4371  5054 11194 
##     2     2     1     2     2     1     2     2     2     2     2     1     1 
##   575  6706  2548 10447   420  6943  4192  3460  8586  1332  3547  8167  2579 
##     2     2     1     1     2     2     2     2     1     1     1     2     1 
##  1542  5394 10802  3705  7816  4068  4448  3218  5762 11868  3157  5655 12033 
##     2     2     1     1     1     2     2     2     2     2     2     2     2 
##  7173  1899  2403  5377  7297 12247 10350   754  2512  3997 11395   439  2126 
##     2     2     2     2     2     2     1     1     2     2     2     2     2 
##  7150  1040  1378  6908 11537   359     8  9478  8942  4925   264  7566  8182 
##     1     2     2     2     2     2     2     2     2     1     2     2     2 
##    38  9143  2919  6101  5742  3230  8795  3035  4261  4242 10520  6632  6963 
##     2     2     1     1     2     2     1     2     2     2     2     2     2 
##  3217   241   139    17  8649  9617  6925  4436  7511  7988 11135  1935  8446 
##     2     2     2     1     1     2     1     2     2     1     2     2     2 
## 11227  6146 10204  7782  3835  8990  3579  8639  9913  6768  5846  2249  2231 
##     1     2     2     2     2     2     2     2     1     2     2     2     1 
##  2037  3845 11106  4377  2444  6671 11965  9144  4435  5110  4640  5636  7846 
##     1     2     1     1     1     2     1     2     2     2     2     2     2 
##   336  6331  8861  4597  8696  1554  4949   738  6098 11088  6099  5381  5195 
##     2     2     2     2     1     2     2     1     2     2     2     2     2 
##  1387  7203  5539  8152  2390  5701  1804 10419 10005   291  9573  1261  7638 
##     2     2     2     1     2     2     2     2     1     2     2     2     1 
## 11305  1461  2941 10221   261 12136  4015  3131 12001  3344  9709  2114  7302 
##     2     2     2     1     1     2     2     2     2     2     1     2     2 
##  1150  5936   196  3347 10070  8358 11038 12115  8965 11533  3602 10619  4127 
##     2     2     1     1     2     2     2     1     2     1     2     2     2 
##  7546  3675  1636  8098 11898  5329  6759  9738  6811  6262 10361   520 11974 
##     2     2     2     2     1     2     1     1     2     2     1     2     2 
##  7662 10323  5522  7812  6170  3348  5288  2372  8396  5380  4251 10194  1795 
##     2     2     2     2     2     2     2     2     2     2     2     2     2 
##  8612  8941  7490  9335  2865  4223  6171 11252  9315  1818 10836  9303 11822 
##     2     2     2     1     2     2     2     2     1     2     2     2     1 
##  4382  9136  1059  9249  3711  7907  2253  4163  2498 10061 11033  2663  6926 
##     2     2     1     1     2     2     2     2     2     2     2     2     1 
##  4425  8475 10968  9963  5541  2452 10023 11013   834  2234  8863  6125 11638 
##     2     2     1     1     1     1     2     2     2     2     1     1     2 
##  3439  2152  7601 12007  5078  5513 11120  2567  2644 10055 10334  2429  6929 
##     2     2     2     1     2     1     1     2     2     2     2     2     2 
##   405  7801  8678 12323  5791  1727  4115  4074 11432  2182  9400  6246  7019 
##     2     2     2     2     1     1     2     2     2     1     2     2     2 
##  6803  8356  8986  6612  1576  8948  2653  1720  7162  8731  5944  9196  7399 
##     2     2     2     1     2     2     1     2     1     2     2     1     1 
##  5107  6875 11641  6855  4033  3839  8538  5491  4009  2289  5948  9837  5224 
##     1     2     2     1     2     2     2     2     2     2     2     1     2 
## 10539 11722  1056 11356 11197   683  3637   652  9527  2321 10909    88   932 
##     2     2     2     2     2     2     2     2     1     2     2     1     2 
##  9690  6959 12201  9846  4430 11538  7242   695  4303  6155   991  4991  9192 
##     2     2     2     2     2     2     2     2     2     2     2     2     1 
##  7205 11054  3697  5579 10378  8079  5768  8155 10670  9104  3494 12158  9701 
##     2     1     2     2     2     1     1     2     2     1     2     1     2 
##  2758  2937  5912  6088  2745  8258  5202 10687  8792  2931 11684  4036  3939 
##     1     2     2     2     2     2     2     2     2     2     2     2     2 
##  4453  6886  1381  7631  6572  3582  1637  7990 10097   323  3932  8463   914 
##     2     2     2     1     2     2     2     2     2     2     2     2     2 
## 10237 10742  4423  8185  3015  4388 11529  9132  1083  5278  5480 11716   693 
##     2     2     2     2     2     2     1     2     2     2     1     2     2 
##  7268  4988  2925 10732  2076 12261   534  4967  1337  3206 11660  1710  3908 
##     1     2     2     2     2     1     2     2     2     2     2     2     2 
## 10769   558  1837  6383  6204  7681  4616  3980    63  9940 11382  5993 10230 
##     1     2     1     2     2     1     2     2     1     1     1     2     2 
##  5216  2495  1157  7558  7190  9551  7244  2597  7920  5225  9752  9958  6783 
##     2     2     1     1     1     2     2     2     2     2     1     1     1 
## 10804  3489  8235  5621  7171 11108 10541  3292 11339  4394  6800  2260  1960 
##     1     2     2     2     2     2     2     2     2     2     1     2     2 
##   450  7217  9674  8853 11895  8350  3998 11052 11780  8688 11341  7573 11586 
##     1     1     1     1     2     2     2     2     1     2     2     2     2 
## 10189  8010  6222 11163  6780 12134  7157 11408  1106 11744  1622  2211  6309 
##     1     1     1     2     2     2     1     2     2     2     2     1     1 
##  5543  3078  9250  5245  1885 10813 10129 11228  2792   389  4193  2606  5362 
##     2     2     1     2     2     1     1     2     2     2     1     2     2 
##  7113  9043  9611 10150  4152  6073 10616 10811  9872  2158  6985  8888    34 
##     1     2     2     2     2     2     2     1     1     2     2     1     1 
##  7321 11887  8685 12174 10648  4840  5611 10181  6754     4  3101   510   858 
##     1     2     2     2     2     1     1     1     1     2     2     2     2 
##  6686 11880 12162   111  3906 12125  3524 12098  8320 11439 10546 10518   628 
##     1     2     2     2     1     2     2     2     1     1     2     1     2 
## 10252 10428  7837 11064 10083  6745  6775  3388  6713 10224 11674  2973  1068 
##     1     2     2     2     1     2     2     2     2     2     2     2     2 
##  9678  4577  3607  3178  1495 10911  6141 10040  2854 11943 12227 11087  1846 
##     2     1     2     1     2     2     2     1     2     2     2     2     2 
##  5693  7029 11658  4351  4627 10790  1105  4434 11706   946  3495  5332  9026 
##     1     1     2     2     2     2     2     2     2     2     2     2     2 
## 10193   726  8264  6335  6615  3367 12207  8880  3954   921  7628  3739 11201 
##     1     2     1     2     2     2     2     1     1     2     2     1     2 
##  5767   529 10158   416  2636  3093  9147 12060  5560  5848  4386  2712 11122 
##     2     2     2     2     2     2     2     1     2     2     2     1     2 
##  6645   611  3319  9946  7426  1715  2371   235  8055  1442  7652  2319  7389 
##     2     2     2     2     1     2     2     2     2     2     2     2     2 
##  8495 10091  7815  3655  7408  2913 11053  5475  2778 11212  6043  3740  9923 
##     2     2     2     2     2     1     2     2     2     1     2     2     1 
## 12219 10487  7882  3156 10741  4638  1888 10117  4488  2824   696   827  3532 
##     1     2     2     1     2     2     2     2     2     2     2     2     2 
## 11401  7923  7969  8704  6998   257  5728  4379  1728  6180  7411  3764  3633 
##     2     1     1     2     2     2     2     2     2     2     2     2     2 
## 11223  5158  7368  1423  5803  7472   540  6355  5776  2706 12101 11014   430 
##     2     2     2     2     2     1     2     2     1     2     1     2     2 
##  3257  6185  7168  5724  5449  9471  3894 11989   702  4295  5293  7916 11992 
##     2     2     2     2     2     2     2     2     2     2     2     1     2 
##  2177  1930 10943  7645 11007   195 11848 10620  2313  9269  9845 10033  1091 
##     2     2     2     2     1     1     2     1     2     2     2     2     2 
##  9591  3726 11984  1345  8638  3844  6404  1800   414 11302   647   731   626 
##     2     1     2     1     2     1     2     2     2     1     2     2     2 
##  6822  1400 10024  6465  3750   262  4030  4710  8533   395  1316  6462  5145 
##     2     2     2     2     2     2     2     2     2     1     2     2     2 
##  7294 10652  6507  6986  2036  7857   881  7264  9548   882  7632 11628  8518 
##     2     2     2     2     2     2     2     1     2     2     1     2     2 
## 10627  5031 11635  8137  9423  7379   180  3149  2002  1063  7550  6590  6885 
##     2     1     1     2     2     1     1     2     2     2     2     2     1 
## 10849   109 10020  1239 11585  2506 10018  5520  4302  5505  6322  6177  8972 
##     2     2     2     1     2     2     1     1     2     1     2     2     2 
##  1336 10871  9275  7867  6110 11820  9295   706  4531  1288  5563 11159  2884 
##     2     2     2     2     1     1     1     2     2     2     2     2     2 
##  2171  6773  8347  7395  8983 10248  9736  8859  7069  9383  3895 12289  5058 
##     2     1     1     1     2     1     2     2     2     1     2     1     2 
##  8847  5891  8921  9277  4542  7667  9934  5266  9019  5632  6816  3704  8962 
##     2     2     2     2     2     1     1     2     1     2     2     2     2 
##  8479  4722   588  4419 10928  4123  3938  2532  3768  2655  6669  1879  4789 
##     2     2     2     2     2     1     2     2     2     2     1     1     2 
##  8369 11640   429  9549  2603  3379  9242  6175  7396  7768  5933   295 10534 
##     1     2     2     1     2     2     2     2     1     2     2     2     1 
##  2794   268  8593  5832  2609  2212  5138  1446  9656 10676 10965 11190  1596 
##     2     2     1     1     1     2     2     1     2     2     2     2     2 
##  6379   844  6942 10990  7189  3865  1444  5823  7272  7286  3244  6025 12248 
##     1     2     2     2     2     2     2     2     2     2     2     2     2 
## 10101  3202 11233  7137  8004  6894 10916  2840  3440  5589  2907  7471 10239 
##     2     2     1     1     2     2     1     2     2     2     2     2     2 
##  8300 11738  4035  2946 11835  8129  9238  3031  2837 11948  9489  1238 10348 
##     2     2     1     2     1     2     2     2     1     1     1     2     2 
##  5154 12314 10491   711  4906  8359  5270  6589  3222   385  6792  6703  5261 
##     2     2     2     1     2     2     2     1     2     1     2     1     2 
## 10095  8100  8860  5469  5375  8999  8738 11619 11017   655  4962 12024  9405 
##     2     2     2     2     2     1     2     2     2     2     2     2     2 
##   949  6071  1882  5315  3871  5094  4739  7940  1678  2282 10853  4493 10321 
##     2     1     1     2     2     2     1     2     1     2     2     2     1 
##  2414  6849   742  4834  4782  2607 11755  7148  3720  6167  6078  4982  6202 
##     2     1     2     2     2     1     1     2     2     2     2     2     2 
##  7549  4162  1667  1406  1162  4689  7702  6372  9521 11237  6190   253 11327 
##     2     2     1     2     2     2     1     1     2     2     1     1     2 
##  2254  2466 11199  4247  6637  4345  6861  5429  5576   659  7222  6281 10956 
##     2     2     2     2     2     2     2     2     2     2     2     2     1 
##  3567   173  2781  1313  6972  1099  3012  7492 12216  9331 11137 12105  1692 
##     1     1     2     2     2     1     2     2     1     2     1     1     2 
##  4520  5087 11045  5561  2856  9866  7995  7034  7898  6187  4624  6858  6809 
##     2     2     1     2     1     2     2     1     2     2     2     2     2 
## 12041  1833  2128  8837  2900 10007  1763  2191  8623  3322  5111  3190  3295 
##     1     2     1     1     2     2     2     2     2     2     2     2     2 
##  8052 10660  8132   452  1235    41  1549   862  6636 10343  4214  4160  9496 
##     2     2     2     1     1     2     1     2     2     2     2     2     1 
##  4072  6874   125 11860  2619  6140  4700 12093  7274 11142  6194 11175  7391 
##     2     2     2     2     2     2     2     2     2     2     2     2     2 
##  1088  4920  9017  4165  2358  2329  7131  2150 10103  4695  4529  4280   756 
##     2     2     2     2     2     1     2     2     1     1     2     1     1 
##  9596  2845  5268  3353  2702   498  1302  9374  3935  1875  1275   641   883 
##     2     2     2     2     2     2     1     1     2     2     2     2     2 
## 10967  3761   296 12198  1115  2308  9170 10241  6080  8270  5786 10260  4578 
##     2     2     2     2     2     2     1     1     2     1     2     2     2 
##  5074  3363 10797  4433 10547   448 10489  4470  4931 12183  4040  5895  7024 
##     2     1     2     2     1     1     2     2     2     2     2     2     2 
##  9284  3070  3185  4090 11359  6680  7780  3973  2882  2132  7590  3126  4926 
##     2     2     2     2     2     2     1     2     2     2     2     2     2 
##  7737 10743  8540  8928 11569 10169 12009  5775  1660  8494  2003  5313  2780 
##     2     2     2     2     1     2     2     2     2     2     2     1     2 
##  9773   919   461  9999  7827  5978 11830  6687 10570  3766  7279  8451  9462 
##     1     1     2     1     2     2     2     2     2     1     2     2     2 
##  2918  1509  2956 11678  2708  7068  2139 10325 11372  5913  5868  5545  3941 
##     2     2     2     1     2     2     2     1     2     2     1     2     2 
##  2009  9990  9219  7053  4783  7081  7606  9244  5061  2504  1668  7775  8917 
##     2     1     2     2     2     2     1     2     2     2     2     2     1 
##  1249  6701  1566  3488  6596  1496  9613  4755 11485  3594  4357 11367 10653 
##     2     2     2     2     2     2     2     2     1     1     2     1     2 
##  1615 12267  5289  3961  1905  9407  9205  2595   292  6511  2804  9579  1944 
##     2     1     2     2     2     2     1     1     2     2     2     2     2 
## 10831  8081  8682  7327  8364  9330  2025 11929  2012   116  1168  3475  4681 
##     2     2     1     2     2     2     2     1     2     1     2     2     2 
##  3338  6673  2493  5755  6011  7397  4081  3239  5006  9002  5567  3182 12036 
##     2     2     2     2     1     2     2     1     2     1     2     1     2 
##  8037  3681 12317  2790  7994 11626  6565  4723  8995  9811  8566  5232  1844 
##     2     2     1     1     2     1     2     2     2     1     1     2     2 
##  9338  4662    53  9911  4604 12165  3505  4147  4718 10822  1835  3787  9741 
##     2     2     2     2     2     2     2     1     1     1     2     2     1 
##  7870  3813  5233  8374 11010  9995  7941  1343 10470  5276  3689  1649 12064 
##     2     2     2     2     2     1     2     2     2     2     1     1     1 
##  4198   816 10435 11072  2998   836  2828  1228   642  5447 11666  9571  4256 
##     2     2     1     2     1     2     2     2     1     2     2     2     2 
##  5569  1689 11556  6653  4173  6635  6257  8307  9475  6128  5479  1252  6390 
##     2     2     2     2     1     2     1     2     2     2     1     2     2 
##  4630  3294  2445  8524  7455  1389  1931 12116  9157  1613  9483   240 11667 
##     2     2     2     2     1     1     2     2     2     1     2     2     2 
##  8879 11612 11301  9979  7008 10699 10895 11956  6544  2855  3956  4530  6466 
##     2     2     2     1     2     2     2     2     1     2     1     2     1 
##  8925   679   680  7998  6692  4137  3725 10086 11393  8838  3024  1902  8520 
##     2     2     2     1     2     1     2     2     2     2     1     1     2 
##  1185   151 11456  8053   976  9784 10552  2279  2327  6354  2496  6266  1380 
##     1     2     2     1     1     2     2     1     2     1     1     1     1 
##  4644   570  7678  4366  4589  2201 11280  6575  7808  6808  2021  8916 10443 
##     2     2     2     2     1     2     2     2     2     2     2     2     1 
##  1329  1714  1928 10171  3263  7915 12018   238  8527  4049  5384  7698  6658 
##     2     1     2     2     2     1     1     2     2     2     2     2     2 
## 10188  9245  8070 11767 11290   778  6212    60  1089  4854  6069  8249  4923 
##     2     2     2     1     1     2     1     2     1     2     2     2     2 
##  8478  3590  6111  9165  8019  4473  9520 11200 11884  4469  2678  9787 10292 
##     2     2     2     2     1     2     2     2     2     2     2     1     2 
##  7192  5114  4138 10649   522  4139  8466  3144    84  4599   354  5059  6232 
##     2     2     2     2     2     1     2     2     2     2     2     2     2 
##  6323 12094  3735  2300  3000  8432  8901  7170  2873  1774  8183  6454  3478 
##     2     2     2     2     2     2     2     1     2     2     2     2     2 
##  5316  5557  5324 12063   146  8715  9468  2091   672  7004  4554  2643  5053 
##     2     1     2     2     2     1     2     2     2     2     2     2     2 
##  3072  5926  4051  5438  2737  8340  5976  8868  6537 11573  2960 11726  5067 
##     2     2     1     2     2     1     2     2     1     2     2     2     2 
##  5448  3224  3609  4240  1098  3974  3125  9735  3623  5828  1768  8266  9253 
##     2     2     2     2     1     2     2     2     1     2     2     2     2 
##  3554  7792 10297  5103  1505  2903  7372  6215  5924  1864  1257 10212  6659 
##     2     2     2     2     2     2     2     1     2     2     2     2     2 
## 11272  7796  4343  9231  8895  9343 10581  7647  9061 11927  6066  1721  7729 
##     1     2     2     2     1     2     2     2     2     2     1     2     1 
##  3057 10084    55  5572  6243  3255  9651  9052 11409  9925  7754  4360  6543 
##     2     2     2     2     2     2     1     2     2     2     1     2     2 
##  2497 11040  1213  5181  9880  9907  6420  7084   140  7865 10725  9587 11068 
##     2     2     1     2     2     1     2     2     1     1     2     2     2 
##  1991  9525   963  6739 10165  8705  5998   794 11633  6057   651  6458    27 
##     1     2     2     2     1     2     2     2     2     2     2     2     2 
##  4546  2116  2481  2031  4097  5618  2142 10238  3882  2369  5640 12023  8943 
##     2     2     1     2     2     2     2     1     2     2     2     2     1 
##  1526  1074  8069  1677 10126  9151    19  9781  3154  1695  8371  5092  1676 
##     2     2     2     2     1     2     2     2     2     2     2     2     1 
##  6288   579  7403  6022  4712  1124  8833  4569 10302  5108  8231  9920  2170 
##     2     2     2     1     2     2     1     2     1     2     2     1     2 
##  1873  9387 10749  1374  2558  4815 11578  2596 11364  3656 10665  2439 10718 
##     2     1     1     2     2     2     2     2     2     2     2     2     2 
## 11411 12044 10664  5157  7254 11789  2661  4489  6909  2598  7392  5496  4918 
##     2     2     2     2     2     2     2     2     2     2     2     1     2 
##  2501  4070 10794  7665  9181  9184   133  7343  8329  5822 11059  4642  2422 
##     1     2     1     1     2     2     2     2     1     1     2     1     1 
##  9024  6196  2247  9823  1682  7174 11011  5008  5011  4715   373  4773  8201 
##     2     2     2     2     2     2     2     2     2     2     2     1     1 
##  6282  3369   917  9906  2924 10313  9030  8447  5478 11259  6482 10880  1794 
##     2     2     2     1     2     1     2     1     2     2     1     2     2 
##  5219  5492  4873  5416  6678 11603 11577 10407  6369  3552  3390  9706  2292 
##     2     2     1     2     2     2     1     2     1     2     2     1     2 
##  2842 10082  9377  6197  8905  3334  1130 11985  1811  9035  5821 10883  1750 
##     2     2     2     2     2     2     2     1     2     2     2     2     2 
##  7118  8634  4646  4136  2968  4285  3336  5850  8302  7504  9567  4000  5036 
##     2     1     2     1     2     2     2     2     1     2     2     1     2 
##     2  5531 12096  3184 11161  4935   964  9422 10064  3800  3504  5421   631 
##     2     2     2     1     2     2     2     1     2     2     2     2     2 
##  5294  5318  5622 11095  3966 10773  2174  4842  4963  1603  7640  8324  7604 
##     2     2     2     2     2     2     2     2     2     2     2     1     1 
##  8826 10746  7059    72  3836  3054  1090  8253 10910  9305  6294  5537 10635 
##     2     1     2     2     2     2     2     1     1     2     2     2     2 
##  2240 10785  9286  8690  9185  9989  5450  3907 12274 11464  8016  8534  8313 
##     2     2     1     2     2     2     2     2     2     2     2     1     1 
##  7673  5303  1004 12074 11360  2406  1057  8343  8379 10617  9354  9361  2063 
##     2     2     1     2     1     1     1     2     2     2     1     1     2 
##  8497  9676  3936 10765  6631  2857  8301  6868 11527  3005  9370  7364  9618 
##     2     2     2     2     2     2     2     2     1     2     2     2     1 
##  5371  7165  3141 11481  2071 11255  2978   598 12309  7565  7452  2600   876 
##     1     1     2     2     1     2     2     2     2     2     2     2     1 
##  4235   615   723  1355 11333  2826  3194  2061 12277  1755  6192  6296 12249 
##     2     2     1     2     1     2     2     2     2     2     2     2     2 
##  1424  4460   803  5430  3266  1362 10938  8241  7600  8174 11480 10395   348 
##     1     2     1     1     2     2     2     2     2     2     1     1     1 
##  1322  4330 11184  6072  8757  4161  1733  8195   225  9503  3548 11771  9539 
##     2     2     1     2     2     2     2     2     2     2     2     1     1 
##  2962  8516  4086  4102  4656  1901  2086  5093  1915   205   320 12127  5571 
##     2     2     2     2     2     2     2     2     2     2     1     2     2 
##  3799  1910  4140  8143   177  8878 11376  7196  1401  5945  9201  7440    42 
##     2     2     2     2     1     1     2     2     2     1     2     2     1 
##  3897  2764  6769  4760  4771  7147  4080  3904   375  8442 11166  9614  9007 
##     2     2     2     2     2     2     2     1     2     2     2     2     2 
##  9862  5050  6010  3223  4899  2883  1664 11276  2087  7561  4234  2849  4495 
##     2     2     2     2     2     2     1     2     1     2     2     2     2 
##  9037  4770  3216  1011   132  8114 11922  2870  8392  1752 11226  3174  2255 
##     1     1     2     2     2     1     2     2     1     2     2     1     2 
##  4128  5719   978  2166  3970  4732 10479  8848  8656  5171  6211 11536  2196 
##     2     1     2     2     1     2     2     1     2     2     2     1     2 
## 10530  7211  3105   563  1848  8526  3171  4297  8591  8782  5836  1396  7956 
##     2     1     2     2     2     2     2     2     2     1     2     2     2 
##  2264 10473  7649  4654  2853  2000 10281  4089  1085  9803  9631  4563 12045 
##     1     2     2     2     2     2     1     2     1     1     2     2     1 
## 10411  9048  5082  4467  5239 10116  3639 10608  6883  5325  3847  2844  8729 
##     2     1     2     2     2     2     2     2     2     2     2     2     2 
##  2534  7119  4612  3364 11178 11441  3779  7448  5736  4943  5244  2577  7892 
##     1     2     2     1     2     1     2     2     2     2     1     1     2 
## 11653  8474 12123  6290 11487 10354 11284  9195  2995   279  6342  3062  6722 
##     2     1     1     1     1     1     2     2     2     2     2     2     2 
##  7777  8457  3595 11596  4761  4264  2218   386  1429  6502 11346  8702  9427 
##     2     2     2     2     1     2     2     2     2     1     2     2     2 
##  7011  4414 10809  2233  6106  5105  2660  9101   905  5424   234  1580  2146 
##     2     2     2     2     1     2     2     2     2     2     2     2     2 
##  7080  7718  7278  8411  9962  1563 11043  8761 12061  2065  2350 11793  3259 
##     2     2     2     2     1     2     1     1     2     2     1     2     2 
##  3778   470  6582 10032  4318 12281  8065 12166 11550  2219  4131  2141  3118 
##     2     1     1     1     2     1     2     2     2     2     2     2     2 
## 10222  2654     5  1691  6501  2118  3059  1642  4941  9154  5852 10272  4890 
##     1     2     2     2     2     2     1     2     2     2     2     2     2 
##  3097  6585  7741  9213  2976  3023 10102 12159  8213   249  3715 11042  4855 
##     1     2     2     1     2     2     1     1     1     2     2     2     2 
##  5068  5765  1203 11671  8400   378  1985 12055 10852  2427  7669  3301  1743 
##     2     2     1     2     2     2     2     2     2     2     2     1     1 
##  9302  5809  1907  4596  8994  5510  6996  7176  5311 11357 10357  5112 11888 
##     1     2     2     1     1     2     1     1     1     2     2     1     2 
##   736  1113  6496   708  8225    68 11489  1524  9273 10763  5866  8336  5624 
##     2     2     1     2     1     2     1     2     1     1     1     2     2 
##  3050  2676  4688 12135  2474  3550  8898  9116  4224  8720  9121  2119  6233 
##     2     2     2     1     2     1     2     1     2     2     2     2     2 
##  9419   126 10157  7484 11083 11217  3574  9972   403  4519  4585  4745  2779 
##     1     1     2     2     2     2     1     2     1     2     2     2     2 
##  7625  4965 11021   487 12218   661  3825 11614  8652  6270  8198  9110  1747 
##     1     2     2     2     2     2     2     1     2     1     2     1     2 
##  5906 12152  2947  2405  2820  5863  8821 10712  4871    11  5467  5498  9576 
##     1     2     2     2     2     1     2     2     1     1     2     1     1 
##  9183  8735  1693  8472 11249  3591  9359  3732  9186 11289  7284  5552 12161 
##     2     2     2     2     2     2     2     2     2     1     1     1     1 
##  5342  9149  8012 11794  4188  7079  1605 10014  6751  8160  8001  1427   634 
##     2     1     2     2     2     2     2     2     2     1     2     2     2 
##  6562  3360  6989 10240 11655  5674 12185  5243  4729  7430  3598  9459  7825 
##     2     2     1     1     1     1     2     2     1     2     1     2     2 
## 11465  1258  4016  7370  9103  6135  1420  9509 12097  7159  7819 10999  1601 
##     1     2     2     2     1     2     2     2     1     2     1     2     2 
##  8796 12310 10062 10236 10060   587  5189 10914  6718  4886  4359  4808 11982 
##     2     2     2     1     1     1     2     2     1     2     2     2     2 
##   855  1052  5737  4753  4653 10793  2761  3406  5950  2633  4664  2601  3420 
##     2     2     1     1     1     1     2     2     2     2     2     2     2 
##  9125  5920  5745  6159  2097  6767  4804  9021  3272  6457  7087  4003  2905 
##     2     2     2     2     2     2     2     2     2     2     2     2     2 
##  1999  6406  9819  6429  8501  8434   130  7961  3018  1661  4887  6008  5877 
##     1     2     1     2     1     1     2     2     2     1     1     2     2 
##  4044  3208 11681  1839  1816  5962  4134  8554  9725  2904  3173  4354  6469 
##     2     2     2     2     2     2     1     2     2     2     2     2     2 
##  2202  9385  5358  1309  1195  1306   584  5536  9384  2216 11933  7498  5127 
##     2     1     1     2     1     1     1     2     1     1     1     1     2 
## 10057   792  5937  6831  4672 10972  9538 12004  4227  1673  7522  4919  3926 
##     1     1     2     2     2     2     2     2     2     2     1     2     2 
##  9180  3090 10589  2910   106  8886  7320   559  5132  7497  5677  8547  6633 
##     1     2     1     2     1     1     2     1     1     2     2     2     2 
##  9697  8247  5230 10962  7214  9188 10289 11567  1586   856   815  8458  8149 
##     2     1     2     2     2     1     1     1     2     2     2     2     2 
##  2825 10792 10464  6427  5465 11304 11256 11934  4561  9430  1556    79   920 
##     2     2     2     2     2     2     1     2     2     2     2     1     1 
##  2629  1450  8584  2964 10550 10517   307 10734 11436   108  8133  8060  7424 
##     2     2     2     2     2     2     2     1     2     1     1     2     1 
##  6278  9230  9434  1384 11251  4903 11903  9529  6182  2752  8094  5629  1014 
##     1     1     1     2     2     2     2     1     2     1     1     2     2 
##  1232  8355  2613 10377  3993  5739  3089  2729  6054   729  5711  8927  2376 
##     1     1     1     2     1     2     2     2     2     1     2     2     2 
##  1414  9038 10469  2078  8487   592  1541  5645 10041 11092  2513 10172 11375 
##     2     2     2     1     2     1     2     2     1     2     2     1     2 
##  2454  5975  2659  8407  8778  2226  1784  6091  4896 12301 11758 12071  3780 
##     1     2     2     2     1     2     2     2     1     2     1     2     2 
##  6023 11070  3438  9759  7739  8876  7048 10595  9390   258  5007  7548 11986 
##     2     1     2     1     2     2     2     2     1     1     1     1     2 
##  5723  6070 12085  5789  6274  8771  9199   460 12173  1501 
##     2     2     2     2     2     1     1     2     1     2 
## Objective function:
##     build      swap 
## 0.2141276 0.2141220 
## 
## Available components:
## [1] "medoids"    "id.med"     "clustering" "objective"  "isolation" 
## [6] "clusinfo"   "silinfo"    "diss"       "call"
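The printout above has the shape of a `cluster::pam()` result: it has a build/swap objective and the components `medoids`, `id.med`, `clustering` and `silinfo`. Printing the whole object dumps one cluster label per row. The sketch below shows a more compact way to inspect such a fit. It assumes a hypothetical stand-in data frame (`shoppers`, with illustrative column names) and a Gower dissimilarity, not the study's actual data or settings.

```r
library(cluster)  # pam() and daisy(); cluster ships with R as a recommended package

# Hypothetical stand-in data: the column names are illustrative only
set.seed(3)
n <- 50
shoppers <- data.frame(
  page_views   = c(rpois(n / 2, 5), rpois(n / 2, 30)),
  visitor_type = factor(sample(c("New", "Returning"), n, replace = TRUE))
)
fit <- pam(daisy(shoppers, metric = "gower"), k = 2, diss = TRUE)

table(fit$clustering)   # cluster sizes instead of one label per row
fit$clusinfo            # one row per cluster: size, max_diss, av_diss, diameter, separation
shoppers[fit$id.med, ]  # medoids are real rows, usable as "typical" customers
fit$objective           # build and swap values, as in the printout above
```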

Conclusion

K-Means model

The K-Means model was computationally fast and its results were easy to interpret. It also handled the large data set without difficulty.

However, it can only cluster continuous data, so it could not use the 7 categorical feature variables in the data set. Furthermore, the algorithm is sensitive to outliers and to differences in scale between features, so the data had to be scaled before clustering.
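The following is a minimal sketch of the preprocessing described above. It keeps only the numeric columns, standardises them with `scale()`, and runs `kmeans()` with two centres. The data frame and its column names are hypothetical. Scaling stops a column measured in seconds from dominating one measured in page counts, but it does not remove outliers, which can still pull the centroids.

```r
# Hypothetical stand-in data; not the study's actual data set
set.seed(42)
n <- 200
shoppers <- data.frame(
  page_views   = c(rpois(n / 2, 5), rpois(n / 2, 30)),
  duration     = c(rexp(n / 2, 1 / 60), rexp(n / 2, 1 / 600)),
  visitor_type = factor(sample(c("New", "Returning"), n, replace = TRUE))
)

# kmeans() needs numeric input, so the categorical column is dropped
numeric_cols <- shoppers[sapply(shoppers, is.numeric)]

# Centre each column and divide by its standard deviation
scaled <- scale(numeric_cols)

# nstart = 25 tries several random starts and keeps the best solution
km <- kmeans(scaled, centers = 2, nstart = 25)
print(km$size)
```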

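Performance of the K-Means model:

Within-cluster sum of squares by cluster: 14093.503 and 9872.964 (between_SS / total_SS = 20.1 %)

This means that the separation between the two clusters accounts for only 20.1% of the total variation in the scaled data. The remaining 79.9% is variation within the clusters, so the two clusters are not strongly separated.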
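To make the ratio concrete, the sketch below uses hypothetical two-dimensional data. It shows that the total sum of squares splits into a within-cluster part and a between-cluster part. The percentage that `print()` shows for a `kmeans` fit is `betweenss / totss`.

```r
set.seed(1)
# Two hypothetical groups of 50 points each in two dimensions
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2))
km <- kmeans(x, centers = 2, nstart = 10)

# Total sum of squares: squared distance of every point to the overall mean
total_ss <- sum(sweep(x, 2, colMeans(x))^2)

# totss = tot.withinss + betweenss; the printed ratio is betweenss / totss
ratio <- km$betweenss / km$totss
cat(sprintf("between_SS / total_SS = %.1f %%\n", 100 * ratio))
```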

Hierarchical model

The hierarchical model could handle both continuous and categorical variables (given a suitable dissimilarity measure), and it was more robust to outliers and to the distribution of the data.
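Mixed continuous and categorical data become usable for clustering once each pair of rows is given a dissimilarity that understands both types. The report does not state which measure was used. The sketch below assumes Gower dissimilarity from `cluster::daisy()` and average-linkage `hclust()`, cut into two groups. The data and column names are hypothetical.

```r
library(cluster)  # daisy(); ships with R as a recommended package

# Hypothetical stand-in data mixing a count with two factors
set.seed(42)
n <- 60
shoppers <- data.frame(
  page_views   = c(rpois(n / 2, 5), rpois(n / 2, 30)),
  visitor_type = factor(rep(c("New", "Returning"), each = n / 2)),
  weekend      = factor(sample(c("no", "yes"), n, replace = TRUE))
)

# Gower: range-normalised differences for numeric columns, 0/1 mismatch for
# factors, averaged over columns, giving dissimilarities between 0 and 1
gower <- daisy(shoppers, metric = "gower")

# Agglomerative clustering on the precomputed dissimilarities
hc <- hclust(as.dist(gower), method = "average")
groups <- cutree(hc, k = 2)
table(groups, shoppers$visitor_type)
```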

However, it was slower to run on this data set, and its results were harder to interpret.
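One likely reason for the slowdown is that dissimilarity-based methods work from the full set of pairwise dissimilarities. That set grows with the square of the number of rows, while K-Means only compares each row with k centres. The hypothetical helper below does the arithmetic, assuming 8-byte doubles as stored in R `dist` objects. The row counts are illustrative and are not taken from the study.

```r
# Pairwise dissimilarities for n rows, and approximate memory at 8 bytes each
dist_footprint <- function(n) {
  pairs <- n * (n - 1) / 2
  c(pairs = pairs, megabytes = pairs * 8 / 1e6)
}

dist_footprint(1000)   # 499500 pairs, about 4 MB
dist_footprint(12000)  # 71994000 pairs, about 576 MB
```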

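Performance metric (objective function): build = 0.2131155, swap = 0.2131155

The objective function is the average dissimilarity between each observation and the medoid of its cluster, not a percentage of the data. Observations were on average about 0.213 units of dissimilarity from their medoid, and lower values indicate tighter clusters. This value is therefore not directly comparable with the K-Means between_SS / total_SS ratio.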
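The sketch below checks that reading of the objective. It fits `pam()` on a Gower dissimilarity of hypothetical data, takes each row's dissimilarity to its nearest medoid, and compares the mean with the reported "swap" value. The data, column names and choice of Gower are assumptions for illustration.

```r
library(cluster)

# Hypothetical stand-in data
set.seed(7)
n <- 80
shoppers <- data.frame(
  page_views   = c(rpois(n / 2, 5), rpois(n / 2, 30)),
  visitor_type = factor(sample(c("New", "Returning"), n, replace = TRUE))
)
gower <- daisy(shoppers, metric = "gower")
fit <- pam(gower, k = 2, diss = TRUE)

# Dissimilarity from every observation to each medoid; keep the nearest one
dm <- as.matrix(gower)
nearest <- apply(dm[, fit$id.med, drop = FALSE], 1, min)

# The "swap" objective should equal the mean nearest-medoid dissimilarity
c(objective = unname(fit$objective["swap"]), manual = mean(nearest))
```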

From this comparison, the hierarchical model is the better choice for segmenting the retailer's customers, mainly because it can use the categorical variables and is less affected by outliers.
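To compare the two models on one numeric scale, one option is the average silhouette width. `pam()` already stores this in `silinfo`, which is listed among the available components above. The silhouette is computed for both partitions against the same dissimilarities and ranges from -1 to 1, with higher values meaning better-separated clusters. The sketch below is illustrative only. It uses hypothetical numeric data and Euclidean distances on scaled columns, so that K-Means and PAM see the same input.

```r
library(cluster)  # pam(), silhouette()

# Hypothetical stand-in data with two numeric columns
set.seed(11)
n <- 100
shoppers <- data.frame(
  page_views = c(rpois(n / 2, 5), rpois(n / 2, 30)),
  duration   = c(rexp(n / 2, 1 / 60), rexp(n / 2, 1 / 600))
)
scaled <- scale(shoppers)
d <- dist(scaled)  # Euclidean distances on the scaled data

km <- kmeans(scaled, centers = 2, nstart = 25)
pm <- pam(d, k = 2, diss = TRUE)

# Silhouette of each partition, measured against the same distances
sil_km  <- silhouette(km$cluster, d)
sil_pam <- silhouette(pm$clustering, d)
c(kmeans = mean(sil_km[, "sil_width"]), pam = mean(sil_pam[, "sil_width"]))
```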

Recommendation

The Kira Plastinina marketing team is advised to use the hierarchical model to develop better insights into its customer base.

Further Questions

A) Do we have the right data?

Yes. For this study, the data provides the information needed to meet the objectives set by the brand's Sales and Marketing team.

B) Do we have the right question?

Yes. A machine learning model that segments the business's clients will help the business decide where to focus its effort to get the maximum return.