Kira Plastinina is a Russian brand that is sold through a defunct chain of retail stores in Russia, Ukraine, Kazakhstan, Belarus, China, the Philippines, and Armenia. The brand’s Sales and Marketing team would like to understand their customers’ behavior from data collected over the past year. More specifically, they would like to learn the characteristics of customer groups.
Identify customer behaviour and, more specifically, learn the characteristics of customer groups. This shall be achieved by:
Performing clustering and stating the insights drawn from the analysis and visualizations.
Comparing K-Means clustering with hierarchical clustering, highlighting the strengths and limitations of each approach in the context of the analysis.
Defining the research questions and work plan
Loading the dataset
Previewing the dataset
Cleaning the dataset which will entail dealing with outliers, duplicates and missing values appropriately
Feature engineering
Performing univariate, bivariate and multivariate analysis on the data set
Building the unsupervised learning models
Challenging the solution
Concluding based on the findings of the research
Providing recommendations based on the conclusions arrived at
Further questions
The dataset consists of 10 numerical and 8 categorical attributes. The ‘Revenue’ attribute can be used as the class label.
“Administrative”, “Administrative Duration”, “Informational”, “Informational Duration”, “Product Related” and “Product Related Duration” represent the number of different types of pages visited by the visitor in that session and the total time spent in each of these page categories. The values of these features are derived from the URL information of the pages visited by the user and updated in real time when a user takes an action, e.g. moving from one page to another.
The “Bounce Rate”, “Exit Rate” and “Page Value” features represent the metrics measured by “Google Analytics” for each page in the e-commerce site.
The value of the “Bounce Rate” feature for a web page refers to the percentage of visitors who enter the site from that page and then leave (“bounce”) without triggering any other requests to the analytics server during that session.
The value of the “Exit Rate” feature for a specific web page is the percentage of all pageviews of that page that were the last in their session.
The “Page Value” feature represents the average value for a web page that a user visited before completing an e-commerce transaction.
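To make the bounce rate and exit rate definitions above concrete, the following is a minimal sketch that computes both per page from a tiny page-view log. The `views` data frame and its values are hypothetical and are not part of the study data; Google Analytics computes these metrics itself, so this only illustrates the definitions.

```r
# Hypothetical page-view log: one row per page view, ordered within each session
views <- data.frame(
  session = c(1, 1, 1, 2, 3, 3, 4),
  page    = c("home", "shop", "cart", "home", "shop", "home", "shop"),
  stringsAsFactors = FALSE
)

# Position of each view within its session, and the length of that session
views$position <- ave(seq_along(views$session), views$session, FUN = seq_along)
views$n_views  <- ave(views$position, views$session, FUN = length)

is_entry  <- views$position == 1               # first view of a session
is_exit   <- views$position == views$n_views   # last view of a session
is_bounce <- views$n_views == 1                # session with a single view

# Bounce rate: share of sessions entering on a page that viewed nothing else
bounce_rate <- tapply(is_bounce[is_entry], views$page[is_entry], mean)

# Exit rate: share of all views of a page that were the last in their session
exit_rate <- tapply(is_exit, views$page, mean)

bounce_rate
exit_rate
```

In this toy log the "home" page is an entry page twice and bounces once, while it is viewed three times and is the last page twice, so its bounce rate and exit rate differ.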
The “Special Day” feature indicates the closeness of the site visiting time to a specific special day (e.g. Mother’s Day, Valentine’s Day) in which the sessions are more likely to be finalized with a transaction. The value of this attribute is determined by considering the dynamics of e-commerce such as the duration between the order date and delivery date. For example, for Valentine’s Day, this value takes a nonzero value between February 2 and February 12, zero before and after this date unless it is close to another special day, and its maximum value of 1 on February 8.
The dataset also includes the operating system, browser, region, traffic type, visitor type as returning or new visitor, a Boolean value indicating whether the date of the visit is weekend, and month of the year.
# Loading the relevant libraries for this study
library(stringr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(reshape2)
library(ggplot2)
library(countrycode)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ tibble 3.1.7 ✔ purrr 0.3.4
## ✔ tidyr 1.2.0 ✔ forcats 0.5.1
## ✔ readr 2.1.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(Hmisc)
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:dplyr':
##
## src, summarize
## The following objects are masked from 'package:base':
##
## format.pval, units
library(moments)
library(paletteer)
library(Amelia) # Rcpp is loaded automatically as a dependency
## Loading required package: Rcpp
## ##
## ## Amelia II: Multiple Imputation
## ## (Version 1.8.0, built: 2021-05-26)
## ## Copyright (C) 2005-2022 James Honaker, Gary King and Matthew Blackwell
## ## Refer to http://gking.harvard.edu/amelia/ for more information
## ##
library(GGally)
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
library(animation)
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
library(cluster) # clustering algorithms
library(factoextra) # clustering algorithms & visualization
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
library(caret)
##
## Attaching package: 'caret'
## The following object is masked from 'package:survival':
##
## cluster
## The following object is masked from 'package:purrr':
##
## lift
library(ISLR) # for college dataset
library(Rtsne) # for t-SNE plot
# loading customer dataset
#
customer_dataset <- read.csv("http://bit.ly/EcommerceCustomersDataset")
Due to a lack of comparable data, this dataset shall be assumed to be valid and relevant for this study.
# previewing first six records of customer dataset
#
head(customer_dataset)
## Administrative Administrative_Duration Informational Informational_Duration
## 1 0 0 0 0
## 2 0 0 0 0
## 3 0 -1 0 -1
## 4 0 0 0 0
## 5 0 0 0 0
## 6 0 0 0 0
## ProductRelated ProductRelated_Duration BounceRates ExitRates PageValues
## 1 1 0.000000 0.20000000 0.2000000 0
## 2 2 64.000000 0.00000000 0.1000000 0
## 3 1 -1.000000 0.20000000 0.2000000 0
## 4 2 2.666667 0.05000000 0.1400000 0
## 5 10 627.500000 0.02000000 0.0500000 0
## 6 19 154.216667 0.01578947 0.0245614 0
## SpecialDay Month OperatingSystems Browser Region TrafficType
## 1 0 Feb 1 1 1 1
## 2 0 Feb 2 2 1 2
## 3 0 Feb 4 1 9 3
## 4 0 Feb 3 2 2 4
## 5 0 Feb 3 3 1 4
## 6 0 Feb 2 2 1 3
## VisitorType Weekend Revenue
## 1 Returning_Visitor FALSE FALSE
## 2 Returning_Visitor FALSE FALSE
## 3 Returning_Visitor FALSE FALSE
## 4 Returning_Visitor FALSE FALSE
## 5 Returning_Visitor TRUE FALSE
## 6 Returning_Visitor FALSE FALSE
# previewing the characteristics of the data set
#
str(customer_dataset)
## 'data.frame': 12330 obs. of 18 variables:
## $ Administrative : int 0 0 0 0 0 0 0 1 0 0 ...
## $ Administrative_Duration: num 0 0 -1 0 0 0 -1 -1 0 0 ...
## $ Informational : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Informational_Duration : num 0 0 -1 0 0 0 -1 -1 0 0 ...
## $ ProductRelated : int 1 2 1 2 10 19 1 1 2 3 ...
## $ ProductRelated_Duration: num 0 64 -1 2.67 627.5 ...
## $ BounceRates : num 0.2 0 0.2 0.05 0.02 ...
## $ ExitRates : num 0.2 0.1 0.2 0.14 0.05 ...
## $ PageValues : num 0 0 0 0 0 0 0 0 0 0 ...
## $ SpecialDay : num 0 0 0 0 0 0 0.4 0 0.8 0.4 ...
## $ Month : chr "Feb" "Feb" "Feb" "Feb" ...
## $ OperatingSystems : int 1 2 4 3 3 2 2 1 2 2 ...
## $ Browser : int 1 2 1 2 3 2 4 2 2 4 ...
## $ Region : int 1 1 9 2 1 1 3 1 2 1 ...
## $ TrafficType : int 1 2 3 4 4 3 3 5 3 2 ...
## $ VisitorType : chr "Returning_Visitor" "Returning_Visitor" "Returning_Visitor" "Returning_Visitor" ...
## $ Weekend : logi FALSE FALSE FALSE FALSE TRUE FALSE ...
## $ Revenue : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
The data set has 10 numerical variables and 8 categorical variables. The categorical variables are “Month”, “OperatingSystems”, “Browser”, “Region”, “TrafficType”, “VisitorType”, “Weekend”, and “Revenue”.
These columns shall be converted to factors for analysis, since factors are how R stores categorical variables.
# Establish the data set class
#
class(customer_dataset)
## [1] "data.frame"
The data set is a data frame
# view the number of rows and columns in the dataset
#
dim(customer_dataset)
## [1] 12330 18
The dataset has 18 columns and 12330 records
As initially stated, the categorical variables (“Month”, “OperatingSystems”, “Browser”, “Region”, “TrafficType”, “VisitorType”, “Weekend”, and “Revenue”) shall be converted to factors.
# Converting categorical variables to factors
# Specifying columns
#
cols <- c("Month", "OperatingSystems",
"Browser", "Region", "TrafficType", "VisitorType", "Weekend", "Revenue")
# Conversion
#
customer_dataset[cols] <- lapply(customer_dataset[cols], factor)
# Previewing the characteristics of the converted variables
#
str(customer_dataset)
## 'data.frame': 12330 obs. of 18 variables:
## $ Administrative : int 0 0 0 0 0 0 0 1 0 0 ...
## $ Administrative_Duration: num 0 0 -1 0 0 0 -1 -1 0 0 ...
## $ Informational : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Informational_Duration : num 0 0 -1 0 0 0 -1 -1 0 0 ...
## $ ProductRelated : int 1 2 1 2 10 19 1 1 2 3 ...
## $ ProductRelated_Duration: num 0 64 -1 2.67 627.5 ...
## $ BounceRates : num 0.2 0 0.2 0.05 0.02 ...
## $ ExitRates : num 0.2 0.1 0.2 0.14 0.05 ...
## $ PageValues : num 0 0 0 0 0 0 0 0 0 0 ...
## $ SpecialDay : num 0 0 0 0 0 0 0.4 0 0.8 0.4 ...
## $ Month : Factor w/ 10 levels "Aug","Dec","Feb",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ OperatingSystems : Factor w/ 8 levels "1","2","3","4",..: 1 2 4 3 3 2 2 1 2 2 ...
## $ Browser : Factor w/ 13 levels "1","2","3","4",..: 1 2 1 2 3 2 4 2 2 4 ...
## $ Region : Factor w/ 9 levels "1","2","3","4",..: 1 1 9 2 1 1 3 1 2 1 ...
## $ TrafficType : Factor w/ 20 levels "1","2","3","4",..: 1 2 3 4 4 3 3 5 3 2 ...
## $ VisitorType : Factor w/ 3 levels "New_Visitor",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ Weekend : Factor w/ 2 levels "FALSE","TRUE": 1 1 1 1 2 1 1 2 1 1 ...
## $ Revenue : Factor w/ 2 levels "FALSE","TRUE": 1 1 1 1 1 1 1 1 1 1 ...
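As a brief illustration of why integer-coded columns such as the browser or region identifiers are better stored as factors, the sketch below uses a small hypothetical vector (`browser_int`, not taken from the data set). It also shows one pitfall to keep in mind when converting a factor back to numbers.

```r
# Hypothetical browser identifiers stored as integers
browser_int <- c(1L, 2L, 2L, 4L, 2L)

# Treated as numbers, R will happily compute an average of the identifiers,
# which has no meaning for a categorical code
mean(browser_int)

# Stored as a factor, the values are treated as labels
browser_fct <- factor(browser_int)
levels(browser_fct)   # the distinct categories, as character labels
table(browser_fct)    # frequency of each category

# Pitfall: as.integer() on a factor returns the internal level codes,
# not the original identifiers
codes    <- as.integer(browser_fct)
original <- as.integer(as.character(browser_fct))
codes
original
```

Here browser 4 is the third level, so its internal code is 3; going through `as.character()` first recovers the original identifier.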
All columns have appropriate data types. The variable names shall be converted to lowercase (the code below leaves SpecialDay in its original case).
# Renaming the variables of the customer dataset
#
names(customer_dataset) <- c("administrative", "administrative_duration",
"informational", "informational_duration",
"productrelated", "productrelated_duration",
"bouncerates", "exitrates", "pagevalues",
"SpecialDay", "month", "operatingsystems",
"browser", "region", "traffictype", "visitortype",
"weekend", "revenue")
# Preview column names of data set variables
#
colnames(customer_dataset)
## [1] "administrative" "administrative_duration"
## [3] "informational" "informational_duration"
## [5] "productrelated" "productrelated_duration"
## [7] "bouncerates" "exitrates"
## [9] "pagevalues" "SpecialDay"
## [11] "month" "operatingsystems"
## [13] "browser" "region"
## [15] "traffictype" "visitortype"
## [17] "weekend" "revenue"
The variables have been renamed; SpecialDay keeps its original capitalisation.
# Checking the number of missing values per column in the data set
#
colSums(is.na(customer_dataset))
## administrative administrative_duration informational
## 14 14 14
## informational_duration productrelated productrelated_duration
## 14 14 14
## bouncerates exitrates pagevalues
## 14 14 0
## SpecialDay month operatingsystems
## 0 0 0
## browser region traffictype
## 0 0 0
## visitortype weekend revenue
## 0 0 0
The administrative, administrative_duration, informational, informational_duration, productrelated, productrelated_duration, bouncerates and exitrates variables have 14 missing values each. These missing values are explored first.
# checking missing data visualization
#
missmap(customer_dataset)
From the missing data plot, the number of missing values seems insignificant relative to the number of records in the data set, hence the affected rows shall be omitted.
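The visual judgement above can be backed by a number. The following is a minimal sketch on a hypothetical toy data frame (`toy`, values invented for illustration) that computes the share of rows containing at least one missing value; the same expression could be applied to `customer_dataset`. If the 14 missing values per column fall in the same 14 rows, that would be 14 of 12,330 records, or about 0.11%, but this is an assumption that the code would need to confirm.

```r
# Hypothetical toy data with a few missing values
toy <- data.frame(
  administrative = c(0, 1, NA, 3, 2),
  bouncerates    = c(0.2, NA, NA, 0.0, 0.05),
  pagevalues     = c(0, 0, 0, 12.5, 0)
)

# Rows with at least one missing value
n_incomplete     <- sum(!complete.cases(toy))
share_incomplete <- n_incomplete / nrow(toy)

n_incomplete
share_incomplete

# na.omit() drops exactly those rows
nrow(na.omit(toy))
```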
# Dropping missing values in customer dataset.
#
customer_dataset <- na.omit(customer_dataset)
# Checking for any remaining null values
#
colSums(is.na(customer_dataset))
## administrative administrative_duration informational
## 0 0 0
## informational_duration productrelated productrelated_duration
## 0 0 0
## bouncerates exitrates pagevalues
## 0 0 0
## SpecialDay month operatingsystems
## 0 0 0
## browser region traffictype
## 0 0 0
## visitortype weekend revenue
## 0 0 0
All missing values have been dropped
# Checking for duplicate records in the customer data set
#
duplicated_rows <- customer_dataset[duplicated(customer_dataset), ]
# Printing out the duplicated rows
head(duplicated_rows)
## administrative administrative_duration informational informational_duration
## 159 0 0 0 0
## 179 0 0 0 0
## 419 0 0 0 0
## 457 0 0 0 0
## 484 0 0 0 0
## 513 0 0 0 0
## productrelated productrelated_duration bouncerates exitrates pagevalues
## 159 1 0 0.2 0.2 0
## 179 1 0 0.2 0.2 0
## 419 1 0 0.2 0.2 0
## 457 1 0 0.2 0.2 0
## 484 1 0 0.2 0.2 0
## 513 1 0 0.2 0.2 0
## SpecialDay month operatingsystems browser region traffictype
## 159 0 Feb 1 1 1 3
## 179 0 Feb 3 2 3 3
## 419 0 Mar 1 1 1 1
## 457 0 Mar 2 2 4 1
## 484 0 Mar 3 2 3 1
## 513 0 Mar 2 2 1 1
## visitortype weekend revenue
## 159 Returning_Visitor FALSE FALSE
## 179 Returning_Visitor FALSE FALSE
## 419 Returning_Visitor TRUE FALSE
## 457 Returning_Visitor FALSE FALSE
## 484 Returning_Visitor FALSE FALSE
## 513 Returning_Visitor FALSE FALSE
The data set has 117 duplicate records. The duplicate records shall be dropped
# Dropping duplicate records
#
customer_dataset1 <- customer_dataset[!duplicated(customer_dataset), ]
# Checking for any remaining duplicate records
#
customer_dataset1[duplicated(customer_dataset1),]
## [1] administrative administrative_duration informational
## [4] informational_duration productrelated productrelated_duration
## [7] bouncerates exitrates pagevalues
## [10] SpecialDay month operatingsystems
## [13] browser region traffictype
## [16] visitortype weekend revenue
## <0 rows> (or 0-length row.names)
All duplicate records have been successfully dropped
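The number of duplicate records can be obtained directly rather than read off a preview. The sketch below uses a hypothetical toy data frame (`toy`) to show how `duplicated()` counts repeats and how the row count changes after they are removed; on the study data the same call would be `sum(duplicated(customer_dataset))`.

```r
# Hypothetical toy data: rows 2 and 4 repeat row 1, row 5 repeats row 3
toy <- data.frame(
  productrelated = c(1, 1, 2, 1, 2),
  exitrates      = c(0.2, 0.2, 0.1, 0.2, 0.1),
  month          = c("Feb", "Feb", "Mar", "Feb", "Mar")
)

# duplicated() flags every repeat of an earlier row (the first copy is kept)
n_dup <- sum(duplicated(toy))
n_dup

# Removing the flagged rows is equivalent to unique()
deduped <- toy[!duplicated(toy), ]
nrow(deduped)
```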
# number of rows in data frame
#
num_rows = nrow(customer_dataset1)
# creating ID column vector
#
ID <- c(1:num_rows)
# binding id column to the data frame
#
customer_dataset2 <- cbind(ID , customer_dataset1)
# Getting the names of the numeric columns in the data set
# as a character vector
#
numeric_colnames <- names(select_if(customer_dataset2, is.numeric))
# Print vector of column names
#
numeric_colnames
## [1] "ID" "administrative"
## [3] "administrative_duration" "informational"
## [5] "informational_duration" "productrelated"
## [7] "productrelated_duration" "bouncerates"
## [9] "exitrates" "pagevalues"
## [11] "SpecialDay"
# creating the modified data frame
#
data_mod1 <- melt(customer_dataset2, id.vars='ID',
measure.vars=c("administrative_duration" ,
"informational_duration",
"pagevalues"))
# creating boxplots of the duration and page value variables
#
p <- ggplot(data_mod1) +
geom_boxplot(aes(x=ID, y=value, color=variable))
# printing the plot
#
print(p)
# creating the modified data frame
#
data_mod1 <- melt(customer_dataset2, id.vars='ID',
measure.vars=c("productrelated_duration"))
# creating a boxplot of product related duration
#
p <- ggplot(data_mod1) +
geom_boxplot(aes(x=ID, y=value, color=variable))
# printing the plot
#
print(p)
# creating the modified data frame
#
data_mod1 <- melt(customer_dataset2, id.vars='ID',
measure.vars=c("administrative", "informational"))
# creating boxplots of the administrative and informational page counts
#
p <- ggplot(data_mod1) +
geom_boxplot(aes(x=ID, y=value, color=variable))
# printing the plot
#
print(p)
# creating the modified data frame
#
data_mod1 <- melt(customer_dataset2, id.vars='ID',
measure.vars=c("productrelated"))
# creating a boxplot of the product related page count
#
p <- ggplot(data_mod1) +
geom_boxplot(aes(x=ID, y=value, color=variable))
# printing the plot
#
print(p)
# creating the modified data frame
#
data_mod1 <- melt(customer_dataset2, id.vars='ID',
measure.vars=c("bouncerates","SpecialDay"))
# creating boxplots of bounce rates and special day
#
p <- ggplot(data_mod1) +
geom_boxplot(aes(x=ID, y=value, color=variable))
# printing the plot
#
print(p)
# creating the modified data frame
#
data_mod1 <- melt(customer_dataset2, id.vars='ID',
measure.vars=c("exitrates"))
# creating a boxplot of exit rates
#
p <- ggplot(data_mod1) +
geom_boxplot(aes(x=ID, y=value, color=variable))
# printing the plot
#
print(p)
All numeric columns appear to contain outliers and are not normally distributed. The outliers shall be retained and investigated further in later sections.
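The boxplot reading can be complemented with a count of outliers per column. The following is a minimal sketch assuming the common 1.5 × IQR rule; `count_outliers` is a hypothetical helper and `toy` is invented data. Note that `quantile()` and the hinges used by boxplots can differ slightly, so counts may not match the plots exactly.

```r
# Hypothetical helper: number of values outside the 1.5 * IQR fences
count_outliers <- function(x) {
  q     <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- 1.5 * diff(q)
  sum(x < q[1] - fence | x > q[2] + fence, na.rm = TRUE)
}

# Toy data: each column has one extreme value
toy <- data.frame(
  a = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100),
  b = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 5)
)

outlier_counts <- sapply(toy, count_outliers)
outlier_counts
```

Column `b` shows why zero-heavy variables such as those in this data set flag so many points: when most values are zero the IQR is zero, so every nonzero value falls outside the fences.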
No additional variables that can aid the analysis can be derived from this data set.
The data set has over 12000 records. This would make the K-Means and hierarchical clustering algorithms computationally lengthy. A random sample of 3000 records will therefore be selected. This is sufficient to meet the objectives of the study.
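A rough calculation shows where the cost comes from for hierarchical clustering, which works on a matrix of pairwise distances. The sketch below assumes round figures of 12,000 and 3,000 records and 8 bytes per stored distance (a double); `pairwise_distances` is a hypothetical helper used only for this arithmetic.

```r
# Number of distinct pairs among n records: n * (n - 1) / 2
pairwise_distances <- function(n) n * (n - 1) / 2

n_full   <- 12000   # assumed round figure for the full data set
n_sample <- 3000    # size of the random sample

pairs_full   <- pairwise_distances(n_full)     # 71,994,000 distances
pairs_sample <- pairwise_distances(n_sample)   # 4,498,500 distances

# Approximate memory for the distances alone, in megabytes
mb_full   <- pairs_full * 8 / 1024^2
mb_sample <- pairs_sample * 8 / 1024^2

c(full = mb_full, sample = mb_sample)
pairs_full / pairs_sample
```

Cutting the number of records to a quarter reduces the number of pairwise distances by a factor of about 16, since the count grows with the square of the number of records.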
# selecting 3000 records randomly from the customer dataset
# (no seed is set, so the selected rows differ between runs)
#
customer_dataset1 <- customer_dataset1[sample(nrow(customer_dataset1), 3000), ]
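Because the sampling step draws rows at random, repeated runs produce different samples and slightly different statistics. A minimal sketch of making such a draw repeatable with `set.seed()` is shown below; the seed value 123 and the `toy` data frame are arbitrary choices for illustration and were not used in this study.

```r
# Hypothetical toy data standing in for the customer data set
toy <- data.frame(id = 1:20, value = (1:20)^2)

# Fixing the seed before sampling makes the draw repeatable
set.seed(123)   # arbitrary seed, chosen only for this example
sample_a <- toy[sample(nrow(toy), 5), ]

set.seed(123)   # same seed again
sample_b <- toy[sample(nrow(toy), 5), ]

# Both draws select exactly the same rows
identical(sample_a, sample_b)
```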
# Selecting non numeric columns in the customer data set
#
non_num <- customer_dataset1 %>% select_if(negate(is.numeric))
# Previewing first six records of non_numeric columns in data frame
#
head(non_num)
## month operatingsystems browser region traffictype visitortype
## 12046 Dec 1 8 1 10 Returning_Visitor
## 1969 Mar 1 1 1 1 Returning_Visitor
## 9329 Dec 2 2 1 1 Returning_Visitor
## 10233 Nov 3 2 1 10 Returning_Visitor
## 11320 Dec 3 2 9 13 Returning_Visitor
## 3863 May 2 2 4 3 Returning_Visitor
## weekend revenue
## 12046 FALSE FALSE
## 1969 FALSE TRUE
## 9329 FALSE FALSE
## 10233 TRUE FALSE
## 11320 FALSE FALSE
## 3863 FALSE FALSE
# Finding unique values of the non_numeric columns
#
rapply(non_num,function(x)length(unique(x)))
## month operatingsystems browser region
## 10 8 12 9
## traffictype visitortype weekend revenue
## 17 3 2 2
The sample covers 10 months. It contains 8 operating systems, 12 browser types and 17 traffic types; the full data set has 13 browser levels and 20 traffic type levels, so some of the rarer levels were not drawn into the sample. The visitors come from a total of 9 regions. The visitor type has 3 classes, and weekend and revenue each have 2 unique values.
The distribution of the data (mean, mode, median, skewness) shall be computed for the numeric variables.
# Creating data set with numeric variables only
# Identifying the numeric class in the data and evaluating if there are any
# outliers
#
num_cols <- unlist(lapply(customer_dataset1, is.numeric))
# Subset numeric columns of data
#
num_dataset <- customer_dataset1[ , num_cols]
# Printing the subset to RStudio console
#
head(num_dataset)
## administrative administrative_duration informational
## 12046 5 200.8333 0
## 1969 0 0.0000 0
## 9329 0 0.0000 0
## 10233 1 3.0000 0
## 11320 5 106.0000 0
## 3863 0 0.0000 0
## informational_duration productrelated productrelated_duration bouncerates
## 12046 0 44 1909.081 0.004761905
## 1969 0 32 999.000 0.000000000
## 9329 0 9 82.500 0.000000000
## 10233 0 11 1046.667 0.005555556
## 11320 0 44 646.825 0.007407407
## 3863 0 141 2886.524 0.004316547
## exitrates pagevalues SpecialDay
## 12046 0.01825397 0.000000 0.0
## 1969 0.01344086 9.053082 0.0
## 9329 0.03333333 0.000000 0.0
## 10233 0.02500000 0.000000 0.0
## 11320 0.02885185 0.000000 0.0
## 3863 0.01879809 0.000000 0.8
# Creating the mode function that will perform our mode operation for us
# ---
getmode <- function(v) {
uniqv <- unique(v)
uniqv[which.max(tabulate(match(v, uniqv)))]
}
# Computing some descriptive statistics
# ---
#
desc_stats <- data.frame(
Mode = apply(num_dataset, 2, getmode), # Mode
Med = apply(num_dataset, 2, median), # median
Mean = apply(num_dataset, 2, mean), # mean
SD = apply(num_dataset, 2, sd), # Standard deviation
Var = apply(num_dataset, 2, var), # Variance
Min = apply(num_dataset, 2, min), # minimum
Max = apply(num_dataset, 2, max), # Maximum
skewness = skewness(num_dataset), # skewness
kurtosis = kurtosis(num_dataset) # kurtosis
)
desc_stats <- round(desc_stats, 2)
desc_stats
## Mode Med Mean SD Var Min Max
## administrative 0.0 1.00 2.40 3.37 11.36 0 24.00
## administrative_duration 0.0 11.00 85.70 176.51 31156.09 -1 1922.00
## informational 0.0 0.00 0.52 1.25 1.56 0 12.00
## informational_duration 0.0 0.00 34.59 135.98 18491.59 -1 2195.30
## productrelated 1.0 19.00 31.53 40.52 1641.79 0 397.00
## productrelated_duration 0.0 620.85 1182.63 1664.59 2770844.67 -1 16093.31
## bouncerates 0.0 0.00 0.02 0.04 0.00 0 0.20
## exitrates 0.2 0.03 0.04 0.04 0.00 0 0.20
## pagevalues 0.0 0.00 6.15 18.41 339.06 0 255.57
## SpecialDay 0.0 0.00 0.06 0.19 0.04 0 1.00
## skewness kurtosis
## administrative 1.83 6.80
## administrative_duration 4.58 32.28
## informational 3.39 18.15
## informational_duration 7.18 70.84
## productrelated 3.38 19.41
## productrelated_duration 3.26 17.87
## bouncerates 3.27 13.23
## exitrates 2.27 7.90
## pagevalues 5.34 45.25
## SpecialDay 3.51 14.44
Most variables have a mode of zero. All the variables have a positive skew (skewed to the right). The standard deviations are as large as or larger than the means, which indicates high variation in every numeric variable. The kurtosis values are all well above 3, the value for a normal distribution, so the variables are heavy-tailed relative to a normal distribution.
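The skewness and kurtosis figures in the table can be read against the moment-based definitions, which is what the `moments` package reports (its kurtosis is Pearson's measure, equal to 3 for a normal distribution, not excess kurtosis). The sketch below writes the two formulas out by hand; `moment_skewness`, `moment_kurtosis` and the toy vector `x` are hypothetical names and data used only for illustration.

```r
# Skewness: third central moment divided by the 1.5 power of the second
moment_skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Pearson kurtosis: fourth central moment divided by the squared second
moment_kurtosis <- function(x) {
  m <- mean(x)
  mean((x - m)^4) / mean((x - m)^2)^2
}

# Toy zero-heavy vector with one large value, similar in shape to pagevalues
x <- c(rep(0, 8), 1, 10)

moment_skewness(x)   # positive: long tail to the right
moment_kurtosis(x)   # above 3: heavier tails than a normal distribution

# A symmetric vector has zero skewness
moment_skewness(c(-1, 0, 1))
```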
# Histogram plots of numeric data in the customer data set
hist.data.frame(num_dataset)
The histogram plots affirm the initial observation made from the kurtosis values that the variables have heavy tails. The variables are strongly skewed to the right.
# Bar chart of revenue (whether the visit ended in a purchase)
ggplot(customer_dataset1, aes(x = revenue)) +
geom_bar(fill = "coral") +
theme_classic()
# Bar chart of weekend versus weekday visits
ggplot(customer_dataset1, aes(x = weekend)) +
geom_bar(fill = "coral") +
theme_classic()
# Bar chart of visitor types
ggplot(customer_dataset1, aes(x = visitortype)) +
geom_bar(fill = "coral") +
theme_classic()
# Bar chart of traffic types
ggplot(customer_dataset1, aes(x = traffictype)) +
geom_bar(fill = "coral") +
theme_classic()
# Bar chart of regions
ggplot(customer_dataset1, aes(x = region)) +
geom_bar(fill = "coral") +
theme_classic()
# Bar chart of browsers
ggplot(customer_dataset1, aes(x = browser)) +
geom_bar(fill = "coral") +
theme_classic()
# Bar chart of operating systems
ggplot(customer_dataset1, aes(x = operatingsystems)) +
geom_bar(fill = "coral") +
theme_classic()
# Bar chart of the months the data was collected
ggplot(customer_dataset1, aes(x = month)) +
geom_bar(fill = "coral") +
theme_classic()
From the bar plots:
The site was visited the most in May and November. Most users accessed the site with operating system 2 and browser 2. Most users originated from region 1. Traffic type 2 was the most frequent traffic type in the data set. Most of the traffic consisted of returning visitors. Most of the traffic was recorded on weekdays. Most visits to the site did not end in a purchase (no revenue).
Covariance is a statistical measure of the degree to which two variables vary together. Here the covariance between each pair of numerical variables in the data frame is calculated.
# Create Covariance matrix of the numerical data in dataset
#
cov(num_dataset)
## administrative administrative_duration informational
## administrative 11.35650539 351.765339 1.450904190
## administrative_duration 351.76533853 31156.088527 56.320491489
## informational 1.45090419 56.320491 1.558049906
## informational_duration 115.64431522 3912.172799 105.385490236
## productrelated 54.86522885 1625.309888 18.303052128
## productrelated_duration 2023.75434407 70208.972496 787.573179512
## bouncerates -0.03129744 -1.090037 -0.006156577
## exitrates -0.04713444 -1.663237 -0.009349887
## pagevalues 6.16978898 193.234729 1.234833054
## SpecialDay -0.05497228 -2.420281 -0.011962210
## informational_duration productrelated
## administrative 115.6443152 54.8652289
## administrative_duration 3912.1727985 1625.3098876
## informational 105.3854902 18.3030521
## informational_duration 18491.5945284 1283.3141078
## productrelated 1283.3141078 1641.7892880
## productrelated_duration 59147.7470705 58746.2312691
## bouncerates -0.4182875 -0.3610840
## exitrates -0.6577976 -0.5465468
## pagevalues 59.3351531 44.0049514
## SpecialDay -0.3912316 -0.2272413
## productrelated_duration bouncerates exitrates
## administrative 2023.75434 -3.129744e-02 -4.713444e-02
## administrative_duration 70208.97250 -1.090037e+00 -1.663237e+00
## informational 787.57318 -6.156577e-03 -9.349887e-03
## informational_duration 59147.74707 -4.182875e-01 -6.577976e-01
## productrelated 58746.23127 -3.610840e-01 -5.465468e-01
## productrelated_duration 2770844.66658 -1.372450e+01 -1.983652e+01
## bouncerates -13.72450 1.897513e-03 1.757263e-03
## exitrates -19.83652 1.757263e-03 2.015653e-03
## pagevalues 1743.35272 -9.927698e-02 -1.501428e-01
## SpecialDay -11.58330 6.542691e-04 8.931951e-04
## pagevalues SpecialDay
## administrative 6.16978898 -5.497228e-02
## administrative_duration 193.23472891 -2.420281e+00
## informational 1.23483305 -1.196221e-02
## informational_duration 59.33515314 -3.912316e-01
## productrelated 44.00495143 -2.272413e-01
## productrelated_duration 1743.35271539 -1.158330e+01
## bouncerates -0.09927698 6.542691e-04
## exitrates -0.15014278 8.931951e-04
## pagevalues 339.05661266 -1.930041e-01
## SpecialDay -0.19300406 3.692296e-02
Bounce rates, exit rates and special day covary positively with one another and negatively with every other variable. All the remaining variables have positive covariance with each other.
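The covariance values above span many orders of magnitude because covariance depends on the units of each variable, which is why only the signs are interpreted here. A minimal sketch on hypothetical toy data (`toy`, invented values) illustrates this: rescaling a duration from seconds to minutes changes the covariance by a factor of 60 but leaves the correlation unchanged, and `cov2cor()` converts a covariance matrix into the corresponding correlation matrix.

```r
# Hypothetical toy data: session duration in seconds and pages viewed
toy <- data.frame(
  duration_sec = c(10, 60, 120, 300, 600),
  pages        = c(1, 3, 5, 9, 20)
)
toy$duration_min <- toy$duration_sec / 60

# Covariance changes with the unit of measurement
cov_sec <- cov(toy$duration_sec, toy$pages)
cov_min <- cov(toy$duration_min, toy$pages)
cov_sec / cov_min   # ratio equals the rescaling factor

# Correlation is unit free
cor_sec <- cor(toy$duration_sec, toy$pages)
cor_min <- cor(toy$duration_min, toy$pages)
all.equal(cor_sec, cor_min)

# cov2cor() rescales a covariance matrix into a correlation matrix
all.equal(cov2cor(cov(toy)), cor(toy))
```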
# Correlation matrix of numerical data in the customer data set
#
cor(num_dataset)
## administrative administrative_duration informational
## administrative 1.00000000 0.59136991 0.34492581
## administrative_duration 0.59136991 1.00000000 0.25562557
## informational 0.34492581 0.25562557 1.00000000
## informational_duration 0.25235665 0.16298941 0.62087326
## productrelated 0.40180573 0.22725111 0.36188795
## productrelated_duration 0.36076911 0.23895442 0.37904794
## bouncerates -0.21320337 -0.14176775 -0.11322864
## exitrates -0.31153595 -0.20988184 -0.16684293
## pagevalues 0.09942872 0.05945352 0.05372562
## SpecialDay -0.08489325 -0.07135855 -0.04987380
## informational_duration productrelated
## administrative 0.25235665 0.40180573
## administrative_duration 0.16298941 0.22725111
## informational 0.62087326 0.36188795
## informational_duration 1.00000000 0.23290943
## productrelated 0.23290943 1.00000000
## productrelated_duration 0.26130333 0.87099410
## bouncerates -0.07061474 -0.20457720
## exitrates -0.10774504 -0.30044210
## pagevalues 0.02369675 0.05898027
## SpecialDay -0.01497264 -0.02918638
## productrelated_duration bouncerates exitrates
## administrative 0.36076911 -0.21320337 -0.3115360
## administrative_duration 0.23895442 -0.14176775 -0.2098818
## informational 0.37904794 -0.11322864 -0.1668429
## informational_duration 0.26130333 -0.07061474 -0.1077450
## productrelated 0.87099410 -0.20457720 -0.3004421
## productrelated_duration 1.00000000 -0.18927716 -0.2654309
## bouncerates -0.18927716 1.00000000 0.8985384
## exitrates -0.26543094 0.89853843 1.0000000
## pagevalues 0.05687784 -0.12377134 -0.1816187
## SpecialDay -0.03621412 0.07816563 0.1035357
## pagevalues SpecialDay
## administrative 0.09942872 -0.08489325
## administrative_duration 0.05945352 -0.07135855
## informational 0.05372562 -0.04987380
## informational_duration 0.02369675 -0.01497264
## productrelated 0.05898027 -0.02918638
## productrelated_duration 0.05687784 -0.03621412
## bouncerates -0.12377134 0.07816563
## exitrates -0.18161868 0.10353573
## pagevalues 1.00000000 -0.05454841
## SpecialDay -0.05454841 1.00000000
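From the correlation values, product related and product related duration (0.87) and bounce rates and exit rates (0.90) are strongly correlated, informational and informational duration are moderately correlated (0.62), and administrative and administrative duration fall just below 0.6 (0.59).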
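Reading pairs off a large correlation matrix by eye is error-prone, so the strongly correlated pairs can be listed programmatically. The following is a minimal sketch; `strong_pairs` is a hypothetical helper, and the small matrix reuses four variables with values rounded from the correlation output above. On the study data the function could be called as `strong_pairs(cor(num_dataset))`.

```r
# Hypothetical helper: list variable pairs whose |r| exceeds a threshold
strong_pairs <- function(cor_mat, threshold = 0.6) {
  # upper.tri() keeps each pair once and drops the diagonal
  idx <- which(abs(cor_mat) > threshold & upper.tri(cor_mat), arr.ind = TRUE)
  data.frame(
    var1 = rownames(cor_mat)[idx[, "row"]],
    var2 = colnames(cor_mat)[idx[, "col"]],
    r    = cor_mat[idx],
    row.names = NULL
  )
}

# Four variables, with correlations rounded from the matrix above
vars <- c("administrative", "administrative_duration",
          "productrelated", "productrelated_duration")
cor_mat <- matrix(c(
  1.0000, 0.5914, 0.4018, 0.3608,
  0.5914, 1.0000, 0.2273, 0.2390,
  0.4018, 0.2273, 1.0000, 0.8710,
  0.3608, 0.2390, 0.8710, 1.0000
), nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

res <- strong_pairs(cor_mat)
res
```

With the 0.6 threshold only the product related pair is returned from this subset; the administrative pair at 0.59 falls just short.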
# pair plot of variables with numeric data
#
pairs(num_dataset, # Data frame of variables
col = 'blue', # Modify color
labels = colnames(num_dataset), # Variable names
pch = 21, # Pch symbol
main = "Customer dataset", # Title of the plot
row1attop = TRUE, # If FALSE, changes the direction of the diagonal
gap = 1, # Distance between subplots
cex.labels = NULL, # Size of the diagonal text
font.labels = 1) # Font style of the diagonal text
From the pair plots above, the special day variable shows no clear relationship with the other numeric variables. It shall not be investigated any further in the multivariate analysis section.
The exit rates and bounce rates had a positive linear correlation. However, no discernible relationship was observed between these two and the other numeric variables.
# A bar plot of weekend data labelled with revenue data
#
ggplot(customer_dataset1, aes(x = weekend, fill = revenue)) +
geom_bar(position = position_dodge()) +
theme_classic()
On both weekends and weekdays, most visits to the site ended with no revenue being received.
# Bar chart side by side of visitor type to whether revenue was
# received or not
#
ggplot(customer_dataset1, aes(x = visitortype, fill = revenue)) +
geom_bar(position = position_dodge()) +
theme_classic()
For returning visitors, new visitors and other visitors alike, most visits did not end in a purchase.
# Bar chart side by side of browser and revenue
#
ggplot(customer_dataset1, aes(x = browser, fill = revenue)) +
geom_bar(position = position_dodge()) +
theme_classic()
Visits made through browser type 2 accounted for more revenue-generating sessions than visits made through any other browser.
# Bar chart side by side of region comparing revenue
#
ggplot(customer_dataset1, aes(x = region, fill = revenue)) +
geom_bar(position = position_dodge()) +
theme_classic()
Visits from region 1 generated revenue on many more occasions than visits from the other regions.
# Bar chart side by side of month comparing revenue
#
ggplot(customer_dataset1, aes(x = month, fill = revenue)) +
geom_bar(position = position_dodge()) +
theme_classic()
The company recorded the highest revenue return per site visit during the month of November. During the month of February, almost no site visit ended with the customer spending.
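Side-by-side bar charts show counts, so groups with more traffic also tend to show more purchases. Comparing the share of visits that ended in revenue within each group separates the two effects. The sketch below uses a hypothetical toy data frame (`toy`, invented values that do not reflect the study results); on the study data the same table could be built from `customer_dataset1$month` and `customer_dataset1$revenue`.

```r
# Hypothetical toy data: month of visit and whether it ended in revenue
toy <- data.frame(
  month   = c("Feb", "Feb", "Feb", "Nov", "Nov", "Nov", "Nov",
              "May", "May", "May"),
  revenue = c(FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE,
              FALSE, TRUE, FALSE)
)

# Counts of visits by month and outcome
counts <- table(toy$month, toy$revenue)
counts

# Row-wise proportions: share of each month's visits that ended in revenue
conversion <- prop.table(counts, margin = 1)[, "TRUE"]
conversion
```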
Here the categorical variables are one-hot encoded.
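As a small illustration of what one-hot encoding produces, the sketch below encodes a single factor with base R's `model.matrix()`. This is a stand-in chosen to keep the example free of package dependencies, not the `caret::dummyVars` call used in this analysis, and the `toy` data frame is hypothetical.

```r
# Hypothetical toy data with one categorical and one numeric column
toy <- data.frame(
  visitortype = factor(c("New_Visitor", "Returning_Visitor",
                         "Other", "Returning_Visitor")),
  pagevalues  = c(0, 9.05, 0, 3.2)
)

# Removing the intercept (-1) yields one indicator column per level
# of the first factor in the formula; numeric columns pass through
encoded <- model.matrix(~ visitortype + pagevalues - 1, data = toy)
encoded

# Each row has exactly one indicator set to 1
rowSums(encoded[, 1:3])
```

One caveat of this stand-in: with several factors in the formula, `model.matrix()` gives full indicator columns only for the first factor and drops one level from each of the others.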
# define the one-hot encoder, excluding the revenue column (column 18)
#
dummy <- dummyVars(" ~ .", data=customer_dataset1[, -18])
#perform one-hot encoding on data frame
customer_encoded <- data.frame(predict(dummy,
newdata=customer_dataset1[, -18]))
# view final data frame
#
head(customer_encoded)
## administrative administrative_duration informational
## 12046 5 200.8333 0
## 1969 0 0.0000 0
## 9329 0 0.0000 0
## 10233 1 3.0000 0
## 11320 5 106.0000 0
## 3863 0 0.0000 0
## informational_duration productrelated productrelated_duration bouncerates
## 12046 0 44 1909.081 0.004761905
## 1969 0 32 999.000 0.000000000
## 9329 0 9 82.500 0.000000000
## 10233 0 11 1046.667 0.005555556
## 11320 0 44 646.825 0.007407407
## 3863 0 141 2886.524 0.004316547
## exitrates pagevalues SpecialDay month.Aug month.Dec month.Feb month.Jul
## 12046 0.01825397 0.000000 0.0 0 1 0 0
## 1969 0.01344086 9.053082 0.0 0 0 0 0
## 9329 0.03333333 0.000000 0.0 0 1 0 0
## 10233 0.02500000 0.000000 0.0 0 0 0 0
## 11320 0.02885185 0.000000 0.0 0 1 0 0
## 3863 0.01879809 0.000000 0.8 0 0 0 0
## month.June month.Mar month.May month.Nov month.Oct month.Sep
## 12046 0 0 0 0 0 0
## 1969 0 1 0 0 0 0
## 9329 0 0 0 0 0 0
## 10233 0 0 0 1 0 0
## 11320 0 0 0 0 0 0
## 3863 0 0 1 0 0 0
## operatingsystems.1 operatingsystems.2 operatingsystems.3
## 12046 1 0 0
## 1969 1 0 0
## 9329 0 1 0
## 10233 0 0 1
## 11320 0 0 1
## 3863 0 1 0
## operatingsystems.4 operatingsystems.5 operatingsystems.6
## 12046 0 0 0
## 1969 0 0 0
## 9329 0 0 0
## 10233 0 0 0
## 11320 0 0 0
## 3863 0 0 0
## operatingsystems.7 operatingsystems.8 browser.1 browser.2 browser.3
## 12046 0 0 0 0 0
## 1969 0 0 1 0 0
## 9329 0 0 0 1 0
## 10233 0 0 0 1 0
## 11320 0 0 0 1 0
## 3863 0 0 0 1 0
## browser.4 browser.5 browser.6 browser.7 browser.8 browser.9 browser.10
## 12046 0 0 0 0 1 0 0
## 1969 0 0 0 0 0 0 0
## 9329 0 0 0 0 0 0 0
## 10233 0 0 0 0 0 0 0
## 11320 0 0 0 0 0 0 0
## 3863 0 0 0 0 0 0 0
## browser.11 browser.12 browser.13 region.1 region.2 region.3 region.4
## 12046 0 0 0 1 0 0 0
## 1969 0 0 0 1 0 0 0
## 9329 0 0 0 1 0 0 0
## 10233 0 0 0 1 0 0 0
## 11320 0 0 0 0 0 0 0
## 3863 0 0 0 0 0 0 1
## region.5 region.6 region.7 region.8 region.9 traffictype.1 traffictype.2
## 12046 0 0 0 0 0 0 0
## 1969 0 0 0 0 0 1 0
## 9329 0 0 0 0 0 1 0
## 10233 0 0 0 0 0 0 0
## 11320 0 0 0 0 1 0 0
## 3863 0 0 0 0 0 0 0
## traffictype.3 traffictype.4 traffictype.5 traffictype.6 traffictype.7
## 12046 0 0 0 0 0
## 1969 0 0 0 0 0
## 9329 0 0 0 0 0
## 10233 0 0 0 0 0
## 11320 0 0 0 0 0
## 3863 1 0 0 0 0
## traffictype.8 traffictype.9 traffictype.10 traffictype.11 traffictype.12
## 12046 0 0 1 0 0
## 1969 0 0 0 0 0
## 9329 0 0 0 0 0
## 10233 0 0 1 0 0
## 11320 0 0 0 0 0
## 3863 0 0 0 0 0
## traffictype.13 traffictype.14 traffictype.15 traffictype.16
## 12046 0 0 0 0
## 1969 0 0 0 0
## 9329 0 0 0 0
## 10233 0 0 0 0
## 11320 1 0 0 0
## 3863 0 0 0 0
## traffictype.17 traffictype.18 traffictype.19 traffictype.20
## 12046 0 0 0 0
## 1969 0 0 0 0
## 9329 0 0 0 0
## 10233 0 0 0 0
## 11320 0 0 0 0
## 3863 0 0 0 0
## visitortype.New_Visitor visitortype.Other visitortype.Returning_Visitor
## 12046 0 0 1
## 1969 0 0 1
## 9329 0 0 1
## 10233 0 0 1
## 11320 0 0 1
## 3863 0 0 1
## weekend.FALSE weekend.TRUE
## 12046 1 0
## 1969 1 0
## 9329 1 0
## 10233 0 1
## 11320 1 0
## 3863 1 0
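To make the encoding step concrete, the following is a minimal base-R sketch of one-hot encoding on a made-up data frame. The `toy` data and the `one_hot()` helper are hypothetical and only illustrate the idea; they are not the `dummyVars` implementation. Like the output above, the sketch keeps one indicator column for every level of each factor.

```r
# Toy data frame; the values are made up for illustration only.
toy <- data.frame(
  pagevalues  = c(0, 9.05, 0, 0),
  month       = factor(c("Dec", "Mar", "Dec", "Nov")),
  visitortype = factor(c("Returning_Visitor", "New_Visitor",
                         "Returning_Visitor", "Returning_Visitor"))
)

# Hypothetical helper: one 0/1 indicator column per factor level.
# Non-factor columns are passed through unchanged.
one_hot <- function(df) {
  parts <- lapply(names(df), function(nm) {
    x <- df[[nm]]
    if (!is.factor(x)) {
      return(setNames(data.frame(x), nm))
    }
    # Compare every value against every level, then turn TRUE/FALSE into 1/0.
    m <- outer(as.character(x), levels(x), "==") + 0
    colnames(m) <- paste(nm, levels(x), sep = ".")
    as.data.frame(m)
  })
  do.call(cbind, parts)
}

encoded <- one_hot(toy)
print(encoded)
```

Each row has exactly one 1 among the indicator columns of a given factor, which is a quick sanity check on any one-hot encoded table.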
# add revenue variable to encoded dataset
#
customer_encoded$revenue <- customer_dataset1$revenue
# preview first six rows
head(customer_encoded)
## administrative administrative_duration informational
## 12046 5 200.8333 0
## 1969 0 0.0000 0
## 9329 0 0.0000 0
## 10233 1 3.0000 0
## 11320 5 106.0000 0
## 3863 0 0.0000 0
## informational_duration productrelated productrelated_duration bouncerates
## 12046 0 44 1909.081 0.004761905
## 1969 0 32 999.000 0.000000000
## 9329 0 9 82.500 0.000000000
## 10233 0 11 1046.667 0.005555556
## 11320 0 44 646.825 0.007407407
## 3863 0 141 2886.524 0.004316547
## exitrates pagevalues SpecialDay month.Aug month.Dec month.Feb month.Jul
## 12046 0.01825397 0.000000 0.0 0 1 0 0
## 1969 0.01344086 9.053082 0.0 0 0 0 0
## 9329 0.03333333 0.000000 0.0 0 1 0 0
## 10233 0.02500000 0.000000 0.0 0 0 0 0
## 11320 0.02885185 0.000000 0.0 0 1 0 0
## 3863 0.01879809 0.000000 0.8 0 0 0 0
## month.June month.Mar month.May month.Nov month.Oct month.Sep
## 12046 0 0 0 0 0 0
## 1969 0 1 0 0 0 0
## 9329 0 0 0 0 0 0
## 10233 0 0 0 1 0 0
## 11320 0 0 0 0 0 0
## 3863 0 0 1 0 0 0
## operatingsystems.1 operatingsystems.2 operatingsystems.3
## 12046 1 0 0
## 1969 1 0 0
## 9329 0 1 0
## 10233 0 0 1
## 11320 0 0 1
## 3863 0 1 0
## operatingsystems.4 operatingsystems.5 operatingsystems.6
## 12046 0 0 0
## 1969 0 0 0
## 9329 0 0 0
## 10233 0 0 0
## 11320 0 0 0
## 3863 0 0 0
## operatingsystems.7 operatingsystems.8 browser.1 browser.2 browser.3
## 12046 0 0 0 0 0
## 1969 0 0 1 0 0
## 9329 0 0 0 1 0
## 10233 0 0 0 1 0
## 11320 0 0 0 1 0
## 3863 0 0 0 1 0
## browser.4 browser.5 browser.6 browser.7 browser.8 browser.9 browser.10
## 12046 0 0 0 0 1 0 0
## 1969 0 0 0 0 0 0 0
## 9329 0 0 0 0 0 0 0
## 10233 0 0 0 0 0 0 0
## 11320 0 0 0 0 0 0 0
## 3863 0 0 0 0 0 0 0
## browser.11 browser.12 browser.13 region.1 region.2 region.3 region.4
## 12046 0 0 0 1 0 0 0
## 1969 0 0 0 1 0 0 0
## 9329 0 0 0 1 0 0 0
## 10233 0 0 0 1 0 0 0
## 11320 0 0 0 0 0 0 0
## 3863 0 0 0 0 0 0 1
## region.5 region.6 region.7 region.8 region.9 traffictype.1 traffictype.2
## 12046 0 0 0 0 0 0 0
## 1969 0 0 0 0 0 1 0
## 9329 0 0 0 0 0 1 0
## 10233 0 0 0 0 0 0 0
## 11320 0 0 0 0 1 0 0
## 3863 0 0 0 0 0 0 0
## traffictype.3 traffictype.4 traffictype.5 traffictype.6 traffictype.7
## 12046 0 0 0 0 0
## 1969 0 0 0 0 0
## 9329 0 0 0 0 0
## 10233 0 0 0 0 0
## 11320 0 0 0 0 0
## 3863 1 0 0 0 0
## traffictype.8 traffictype.9 traffictype.10 traffictype.11 traffictype.12
## 12046 0 0 1 0 0
## 1969 0 0 0 0 0
## 9329 0 0 0 0 0
## 10233 0 0 1 0 0
## 11320 0 0 0 0 0
## 3863 0 0 0 0 0
## traffictype.13 traffictype.14 traffictype.15 traffictype.16
## 12046 0 0 0 0
## 1969 0 0 0 0
## 9329 0 0 0 0
## 10233 0 0 0 0
## 11320 1 0 0 0
## 3863 0 0 0 0
## traffictype.17 traffictype.18 traffictype.19 traffictype.20
## 12046 0 0 0 0
## 1969 0 0 0 0
## 9329 0 0 0 0
## 10233 0 0 0 0
## 11320 0 0 0 0
## 3863 0 0 0 0
## visitortype.New_Visitor visitortype.Other visitortype.Returning_Visitor
## 12046 0 0 1
## 1969 0 0 1
## 9329 0 0 1
## 10233 0 0 1
## 11320 0 0 1
## 3863 0 0 1
## weekend.FALSE weekend.TRUE revenue
## 12046 1 0 FALSE
## 1969 1 0 TRUE
## 9329 1 0 FALSE
## 10233 0 1 FALSE
## 11320 1 0 FALSE
## 3863 1 0 FALSE
K-means is only suitable for clustering continuous data. The categorical variables shall be dropped and the continuous variables normalized for modeling.
The skewness and kurtosis analysis showed that the numeric variables do not follow a Gaussian distribution. The data is first standardized (centered to mean 0 and scaled to unit standard deviation) so that every variable contributes on a comparable scale to the distance calculation.
# normalizing the customer data set
#
customerNorm <- as.data.frame(scale(num_dataset))
# Visualize first six records of normalized dataset
#
head(customerNorm)
## administrative administrative_duration informational
## 12046 0.7704387 0.6522770 -0.4179289
## 1969 -0.7132666 -0.4855189 -0.4179289
## 9329 -0.7132666 -0.4855189 -0.4179289
## 10233 -0.4165255 -0.4685227 -0.4179289
## 11320 0.7704387 0.1150107 -0.4179289
## 3863 -0.7132666 -0.4855189 -0.4179289
## informational_duration productrelated productrelated_duration bouncerates
## 12046 -0.2543652 0.30787203 0.43641435 -0.3398862
## 1969 -0.2543652 0.01171467 -0.11031671 -0.4492033
## 9329 -0.2543652 -0.55592028 -0.66090425 -0.4492033
## 10233 -0.2543652 -0.50656072 -0.08168095 -0.3216666
## 11320 -0.2543652 0.30787203 -0.32188591 -0.2791544
## 3863 -0.2543652 2.70181073 1.02361338 -0.3501101
## exitrates pagevalues SpecialDay
## 12046 -0.4954884 -0.3337685 -0.2942093
## 1969 -0.6026941 0.1578864 -0.2942093
## 9329 -0.1596153 -0.3337685 -0.2942093
## 10233 -0.3452294 -0.3337685 -0.2942093
## 11320 -0.2594344 -0.3337685 -0.2942093
## 3863 -0.4833687 -0.3337685 3.8691295
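As a small check on what `scale()` does with its default arguments, the sketch below standardizes a made-up data frame by hand and compares the result. The `toy` values are illustrative and are not taken from the full data set, so the z-scores differ from those printed above.

```r
# Toy numeric data; values are illustrative only.
toy <- data.frame(
  administrative = c(5, 0, 0, 1, 5, 0),
  pagevalues     = c(0, 9.053082, 0, 0, 0, 0)
)

# Manual z-score: subtract the column mean, divide by the column sd.
manual <- as.data.frame(lapply(toy, function(v) (v - mean(v)) / sd(v)))

# Same transformation via scale() with its defaults (center = TRUE, scale = TRUE).
scaled <- as.data.frame(scale(toy))

print(all.equal(manual, scaled, check.attributes = FALSE))
print(colMeans(scaled))       # numerically zero
print(apply(scaled, 2, sd))   # exactly one, up to rounding
```

Standardizing is a linear transformation, so it changes the location and spread of each variable but leaves its skewness and kurtosis unchanged.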
K-means clustering is a clustering algorithm that is commonly used for partitioning a given data set into a set of k groups (i.e. k clusters), where k represents the number of groups pre-specified. Cluster analysis is widely used in the biological and behavioral sciences, marketing, and medical research.
We can compute k-means in R with the kmeans function. Here we group the data into 2 to 6 clusters (centers = 2, ..., 6). Setting nstart = 25 makes kmeans try 25 random initial configurations and keep the best one.
# setting seed
#
set.seed(123)
# kmeans clusters ranging from 2-6 clusters
#
customer_k2 <- kmeans(customerNorm, centers = 2, nstart = 25)
customer_k3 <- kmeans(customerNorm, centers = 3, nstart = 25)
customer_k4 <- kmeans(customerNorm, centers = 4, nstart = 25)
customer_k5 <- kmeans(customerNorm, centers = 5, nstart = 25)
customer_k6 <- kmeans(customerNorm, centers = 6, nstart = 25)
# We can plot these clusters for different K value to compare.
#
p1 <- fviz_cluster(customer_k2, geom = "point", data = customerNorm) + ggtitle(" K = 2")
p2 <- fviz_cluster(customer_k3, geom = "point", data = customerNorm) + ggtitle(" K = 3")
p3 <- fviz_cluster(customer_k4, geom = "point", data = customerNorm) + ggtitle(" K = 4")
p4 <- fviz_cluster(customer_k5, geom = "point", data = customerNorm) + ggtitle(" K = 5")
p5 <- fviz_cluster(customer_k6, geom = "point", data = customerNorm) + ggtitle(" K = 6")
grid.arrange(p1, p2, p3, p4, p5, nrow = 2)
From the plots, a K value of two seems to separate the data best.
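A numeric complement to the visual comparison is the sum-of-squares breakdown that `kmeans` returns. The sketch below runs on synthetic data (not the customer data) and shows that the total sum of squares splits into a within-cluster part and a between-cluster part. The `toy` matrix and its two groups are assumptions made for illustration.

```r
set.seed(123)

# Synthetic data with two well-separated groups, standardized like customerNorm.
toy <- scale(rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 4), ncol = 2)
))

# nstart = 25 runs 25 random initialisations and keeps the solution
# with the lowest total within-cluster sum of squares.
fit <- kmeans(toy, centers = 2, nstart = 25)

print(fit$size)
print(fit$totss)                          # total sum of squares
print(fit$tot.withinss + fit$betweenss)   # same value, split in two parts
print(fit$betweenss / fit$totss)          # share of variation between clusters
```

The between/total ratio always grows as k increases, so it is best read alongside the elbow and silhouette diagnostics rather than on its own.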
K-means clustering requires that you specify in advance the number of clusters to extract. A plot of the total within-groups sums of squares against the number of clusters in a k-means solution can be helpful. A bend in the graph can suggest the appropriate number of clusters.
We shall employ the elbow method, the silhouette method and the gap statistic.
# Determining Optimal clusters (k) Using Elbow
#
fviz_nbclust(x = customerNorm,FUNcluster = kmeans, method = 'wss' )
# Determining Optimal clusters (k) Using Average Silhouette Method
#
fviz_nbclust(x = customerNorm,FUNcluster = kmeans, method = 'silhouette' )
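The quantity behind the elbow plot can also be computed by hand, which makes it easier to inspect the numbers rather than only the picture. This is a minimal sketch on synthetic data; the `toy` matrix is an assumption and does not reproduce the customer data.

```r
set.seed(123)

# Synthetic, standardized data with two groups (illustration only).
toy <- scale(rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 6), ncol = 2)
))

# Total within-cluster sum of squares for k = 1..6.
ks  <- 1:6
wss <- sapply(ks, function(k) {
  kmeans(toy, centers = k, nstart = 25)$tot.withinss
})

elbow <- data.frame(k = ks, wss = wss, drop = c(NA, -diff(wss)))
print(elbow)
# plot(ks, wss, type = "b") would draw the same curve as the 'wss' method.
```

The "elbow" is the value of k after which the `drop` column becomes small compared with the earlier drops.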
# compute gap statistic
set.seed(123)
gap_stat <- clusGap(x = customerNorm, FUN = kmeans, K.max = 15, nstart = 25, B = 50, iter.max=30)
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 150000)
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 150000)
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 150000)
# Print the result
print(gap_stat, method = "firstmax")
## Clustering Gap statistic ["clusGap"] from call:
## clusGap(x = customerNorm, FUNcluster = kmeans, K.max = 15, B = 50, nstart = 25, iter.max = 30)
## B=50 simulated reference sets, k = 1..15; spaceH0="scaledPCA"
## --> Number of clusters (method 'firstmax'): 15
## logW E.logW gap SE.sim
## [1,] 7.943042 9.346686 1.403644 0.003049904
## [2,] 7.825745 9.259997 1.434252 0.003181899
## [3,] 7.703457 9.209317 1.505861 0.002987526
## [4,] 7.621512 9.172116 1.550604 0.003098483
## [5,] 7.549524 9.146627 1.597103 0.003072211
## [6,] 7.491998 9.124017 1.632019 0.003090704
## [7,] 7.449260 9.107396 1.658136 0.003043800
## [8,] 7.412774 9.091432 1.678658 0.003105021
## [9,] 7.379764 9.078670 1.698906 0.003080937
## [10,] 7.324016 9.066652 1.742636 0.003129988
## [11,] 7.285475 9.055486 1.770011 0.003083274
## [12,] 7.252109 9.044897 1.792788 0.003127841
## [13,] 7.225391 9.034961 1.809569 0.003108134
## [14,] 7.203641 9.025603 1.821963 0.003114827
## [15,] 7.178958 9.016678 1.837720 0.003144910
# plot the result to determine the optimal number of clusters.
#
fviz_gap_stat(gap_stat)
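The elbow and silhouette methods pointed to k = 2. The gap statistic did not: its value kept increasing up to K.max = 15, so the 'firstmax' rule returned 15 rather than a clear optimum. k = 2 is used for the final model.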
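The gap table printed above can be checked against two common selection rules. The sketch below copies the `gap` and `SE.sim` columns from that output and applies (a) a first-local-maximum rule and (b) the rule "smallest k with gap(k) >= gap(k+1) - SE(k+1)". Both helper functions are hypothetical re-implementations written for illustration, and returning the largest k when no candidate qualifies is an assumption of this sketch.

```r
# gap and SE.sim columns copied from the printed clusGap output (k = 1..15).
gap <- c(1.403644, 1.434252, 1.505861, 1.550604, 1.597103,
         1.632019, 1.658136, 1.678658, 1.698906, 1.742636,
         1.770011, 1.792788, 1.809569, 1.821963, 1.837720)
se  <- c(0.003049904, 0.003181899, 0.002987526, 0.003098483, 0.003072211,
         0.003090704, 0.003043800, 0.003105021, 0.003080937, 0.003129988,
         0.003083274, 0.003127841, 0.003108134, 0.003114827, 0.003144910)

# (a) First k at which the gap stops increasing; largest k if it never does.
first_local_max <- function(gap) {
  stops <- which(diff(gap) <= 0)
  if (length(stops) > 0) stops[1] else length(gap)
}

# (b) Smallest k with gap(k) >= gap(k + 1) - se(k + 1); largest k if none.
one_se_rule <- function(gap, se) {
  k  <- seq_len(length(gap) - 1)
  ok <- gap[k] >= gap[k + 1] - se[k + 1]
  if (any(ok)) which(ok)[1] else length(gap)
}

print(first_local_max(gap))
print(one_se_rule(gap, se))
print(all(diff(gap) > 0))   # TRUE: the gap curve never turns down
```

Because the gap curve rises at every step by more than one standard error, both rules run to the upper bound of the search range, which signals that the gap statistic gives no usable answer for this data.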
# setting seed
#
set.seed(123)
# Compute k-means clustering with k = 2
#
cl <- kmeans(customerNorm, centers = 2, nstart = 25)
# Visualization of results
#
fviz_cluster(cl, data = customerNorm)
# Let’s check out the centers and size of each cluster.
#
cl$centers
## administrative administrative_duration informational informational_duration
## 1 -0.2737953 -0.2338145 -0.2685363 -0.2095138
## 2 1.3656977 1.1662725 1.3394657 1.0450600
## productrelated productrelated_duration bouncerates exitrates pagevalues
## 1 -0.2546176 -0.2513869 0.06240071 0.09600273 -0.05511301
## 2 1.2700386 1.2539237 -0.31125621 -0.47886390 0.27490504
## SpecialDay
## 1 0.03732511
## 2 -0.18617853
# cluster sizes
#
cl$size
## [1] 2499 501
# We can extract the clusters and add to our initial data to do
# some descriptive statistics at the cluster level
#
customerNorm %>%
mutate(Cluster = cl$cluster) %>%
group_by(Cluster) %>%
summarize_all('median')
## # A tibble: 2 × 11
## Cluster administrative administrative_duration informational informational_du…
## <int> <dbl> <dbl> <dbl> <dbl>
## 1 1 -0.713 -0.486 -0.418 -0.254
## 2 2 1.36 0.631 1.18 0.265
## # … with 6 more variables: productrelated <dbl>, productrelated_duration <dbl>,
## # bouncerates <dbl>, exitrates <dbl>, pagevalues <dbl>, SpecialDay <dbl>
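The cluster centers and medians above are expressed in standard deviations, which are hard to read as page counts or seconds. One option is to map the centers back to the original units using the centering and scaling values stored by `scale()`. The sketch below uses synthetic data; `raw`, `pages` and `duration` are made-up names. It also assumes the matrix returned by `scale()` is kept, since converting it with `as.data.frame()` drops the attributes used here.

```r
set.seed(123)

# Synthetic raw data: a light-browsing group and a heavy-browsing group.
raw <- data.frame(
  pages    = c(rpois(50, 5), rpois(50, 40)),
  duration = c(rexp(50, 1 / 100), rexp(50, 1 / 1500))
)

z   <- scale(raw)                      # keep the matrix and its attributes
fit <- kmeans(z, centers = 2, nstart = 25)

ctr <- attr(z, "scaled:center")        # column means of the raw data
scl <- attr(z, "scaled:scale")         # column standard deviations

# Undo the standardization: raw = z * sd + mean, column by column.
centers_raw <- sweep(sweep(fit$centers, 2, scl, "*"), 2, ctr, "+")
print(centers_raw)

# Cross-check: per-cluster means computed directly on the raw data.
direct <- aggregate(raw, by = list(Cluster = fit$cluster), FUN = mean)
print(direct)
```

The two printed tables agree, so the back-transformed centers can be read directly as average page counts and durations per cluster.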
Finally, summarize our model.
# Print the model summary (includes the full clustering vector)
#
print(cl)
## K-means clustering with 2 clusters of sizes 2499, 501
##
## Cluster means:
## administrative administrative_duration informational informational_duration
## 1 -0.2737953 -0.2338145 -0.2685363 -0.2095138
## 2 1.3656977 1.1662725 1.3394657 1.0450600
## productrelated productrelated_duration bouncerates exitrates pagevalues
## 1 -0.2546176 -0.2513869 0.06240071 0.09600273 -0.05511301
## 2 1.2700386 1.2539237 -0.31125621 -0.47886390 0.27490504
## SpecialDay
## 1 0.03732511
## 2 -0.18617853
##
## Clustering vector:
## 12046 1969 9329 10233 11320 3863 928 8622 4888 10545 3679 2669 5174
## 1 1 1 1 1 1 1 1 1 1 1 2 1
## 137 9001 4422 7051 4313 8477 9857 5535 1989 11753 12222 11680 11051
## 1 1 1 1 1 2 1 1 2 1 2 1 1
## 7434 381 2767 10280 1005 6503 11275 8525 2786 9506 6924 4476 11025
## 1 1 1 1 1 1 2 1 1 1 1 1 2
## 4063 69 950 154 7571 3880 6045 1686 7480 2456 8933 6770 7027
## 2 1 1 1 1 1 1 2 1 1 2 2 1
## 845 796 245 5574 4858 5597 3915 8017 11098 10444 4807 11620 9994
## 2 1 1 1 1 2 1 2 1 1 1 1 1
## 11460 7579 11923 351 1034 8119 5699 8244 5902 8504 5583 10876 5470
## 1 1 2 1 1 1 2 1 1 1 1 2 1
## 7744 8939 5960 11468 11919 1701 316 8383 3418 1646 6569 9408 10404
## 2 1 1 1 1 1 2 1 1 1 2 1 1
## 2591 6438 10373 10000 9625 3921 11293 4465 1205 982 2906 3060 6248
## 2 1 2 1 1 1 1 1 1 1 2 2 1
## 1608 10727 3013 9057 5987 6419 10695 12206 5124 7256 6149 11770 9640
## 1 1 1 1 1 1 1 1 1 1 1 1 2
## 11873 3527 1360 3565 2005 3279 3029 10285 5792 453 1953 8615 1980
## 1 2 1 1 1 1 1 2 1 1 1 1 1
## 820 5274 8619 6483 9935 11503 5750 5344 11219 11027 9469 2256 4363
## 1 1 2 1 1 1 1 1 2 2 1 2 1
## 10917 5336 4570 8929 7869 3577 1330 5120 3604 10420 3959 3742 5799
## 2 1 1 1 1 1 1 1 1 2 1 1 1
## 5081 10356 222 10979 10388 11250 8323 9411 8841 9806 8056 11371 10268
## 1 1 1 1 2 2 1 1 1 1 1 1 1
## 4703 6113 4523 9264 7146 7313 10196 4822 6827 3597 11555 9970 2635
## 1 1 1 2 2 1 2 2 1 1 2 2 1
## 1949 7971 397 1366 436 7906 11709 6241 11554 6525 5893 868 9726
## 1 1 1 1 1 1 1 2 1 1 1 1 1
## 7688 4880 11273 5234 8572 2687 1522 7568 2068 4892 4581 2893 4571
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 7095 3030 5914 12215 6898 1175 11257 6473 2617 4319 2693 11111 9704
## 2 2 1 1 1 1 2 1 1 1 1 1 1
## 3177 11462 3919 11704 3317 787 5188 1403 8345 10322 8613 6940 10085
## 1 1 1 1 1 1 1 1 2 2 2 2 1
## 10555 6939 1163 4307 8835 4200 4968 6306 6351 9312 10949 7824 860
## 1 1 1 1 1 2 1 1 1 2 2 1 1
## 8513 1051 12217 9306 10642 8865 70 619 6604 10931 3969 6771 6949
## 1 1 2 1 2 1 1 2 2 2 1 2 2
## 6983 2574 5607 4014 8559 9200 985 4901 2569 6047 7308 9122 10799
## 1 1 1 1 1 1 1 1 2 2 2 1 1
## 3061 10733 10887 11213 7749 3854 11469 3278 6970 8607 10035 10329 10161
## 1 2 1 1 1 1 1 1 2 1 2 1 1
## 1018 4262 2880 10251 7285 6163 6356 437 3672 5538 11781 10132 5985
## 1 1 1 1 1 2 1 1 1 1 1 1 1
## 1365 6302 966 5088 11242 4109 689 5898 444 4944 7539 9124 5453
## 1 2 1 1 1 2 1 1 1 1 2 1 1
## 2529 2757 3168 6275 3523 12238 8707 2055 11236 486 6750 526 9758
## 1 1 1 1 1 2 1 1 1 1 2 1 1
## 1565 10174 435 4861 9801 11790 10433 6715 1407 810 864 97 10429
## 2 1 1 1 1 1 1 1 1 1 1 1 1
## 5222 6416 9072 9871 1623 5974 4342 6009 3512 9044 4728 6901 945
## 1 1 1 1 1 1 1 1 1 2 1 1 2
## 5859 6784 2316 6557 7783 7261 11313 7021 7936 8240 7243 254 11129
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 8245 7179 266 105 1320 6343 4626 9507 7658 9601 10133 3887 7613
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 5804 5283 2178 935 11708 7690 5198 11595 1286 6048 462 5474 2648
## 1 1 1 1 1 2 1 2 1 1 1 2 1
## 2760 5037 11490 3751 9731 10606 9402 11807 8509 6987 5320 12294 9788
## 2 1 1 1 1 1 1 2 1 1 1 1 1
## 8121 11186 632 5980 6911 681 3773 3111 6657 162 11962 7482 639
## 2 1 1 1 1 1 1 1 1 1 2 1 1
## 9113 455 947 8127 1270 1152 3759 2568 6352 11271 10525 1507 7169
## 1 1 1 1 1 1 1 1 1 2 1 1 2
## 1782 11149 8616 11953 4707 9208 2375 8192 7852 6017 8033 5191 6157
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 3227 5564 8920 4195 9892 6536 7122 9680 6134 6614 3629 10777 6378
## 1 1 2 1 1 1 2 1 2 2 2 1 1
## 11081 2585 9634 9664 3136 1155 9228 8784 7981 687 11463 624 10868
## 1 1 2 1 1 1 1 1 1 1 1 1 1
## 1470 11662 8975 148 2677 6662 11832 10275 10614 1822 1946 10986 5512
## 1 1 1 1 1 1 1 2 1 1 1 2 1
## 8232 7947 2013 367 5824 5544 7755 2955 2232 9223 1101 10678 3746
## 1 1 2 1 2 1 1 1 1 1 1 1 1
## 7980 7564 7108 47 1178 10245 5040 10460 6663 1125 11854 500 6829
## 1 2 1 1 1 2 2 2 1 1 2 2 1
## 2847 1662 3803 9346 9135 1512 554 12014 745 2868 8103 5260 8680
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 958 12276 9309 770 3731 6271 3098 4503 8884 10707 10217 6702 4012
## 1 1 1 2 1 1 1 1 2 1 1 1 1
## 198 7259 1140 1042 951 8952 8896 3172 178 5847 2991 6370 2714
## 1 1 1 2 1 2 1 2 1 1 1 1 1
## 12150 4335 11683 8903 1998 10865 3457 5953 6 10645 9530 1253 7385
## 1 1 1 1 1 1 1 2 1 2 1 1 2
## 7220 993 11055 10215 3065 9693 7500 6040 7163 961 541 1943 1484
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 8658 4805 3789 5530 8484 11303 5814 751 9799 743 1422 6927 11604
## 1 1 1 1 1 1 1 1 1 1 1 1 2
## 1187 11109 3145 350 9276 5744 10810 96 715 9512 1815 466 9930
## 1 2 1 1 1 1 1 1 1 1 1 1 1
## 344 12067 11506 1489 1479 2484 8312 5182 8227 7291 6762 1417 969
## 1 1 1 1 1 1 1 1 1 2 1 1 1
## 2386 4667 1670 2252 2053 7933 718 1900 2195 901 6035 2864 2793
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 10671 5192 10326 5349 36 5820 6130 6485 12229 7012 3757 6851 11325
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 2547 6916 9686 11710 3071 7485 4039 5706 10315 11729 8597 12003 176
## 1 1 1 2 1 2 1 1 1 1 2 1 1
## 5952 4181 1531 10870 528 1361 10925 10719 7518 3405 11602 9398 1404
## 1 2 1 2 1 1 2 1 1 1 2 1 1
## 10838 11668 12180 8858 4326 2570 8205 2168 11742 5628 345 1184 8957
## 1 1 2 1 1 1 1 1 1 1 1 1 1
## 5514 1780 12205 8560 2420 8406 5746 11412 9932 451 4952 3540 9455
## 2 1 2 1 1 1 1 1 1 1 1 1 1
## 10028 8931 7595 1307 8992 922 7245 2517 4845 8836 4848 7089 1260
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 3315 6971 10874 7230 11610 10964 134 494 11315 3558 4371 5054 11194
## 1 1 1 1 1 1 1 1 1 1 1 1 2
## 575 6706 2548 10447 420 6943 4192 3460 8586 1332 3547 8167 2579
## 2 1 1 1 1 1 1 1 2 1 1 1 1
## 1542 5394 10802 3705 7816 4068 4448 3218 5762 11868 3157 5655 12033
## 1 1 2 1 1 1 1 1 1 1 2 1 1
## 7173 1899 2403 5377 7297 12247 10350 754 2512 3997 11395 439 2126
## 1 1 1 1 1 1 1 1 1 1 2 1 1
## 7150 1040 1378 6908 11537 359 8 9478 8942 4925 264 7566 8182
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 38 9143 2919 6101 5742 3230 8795 3035 4261 4242 10520 6632 6963
## 1 1 1 1 1 1 2 1 2 1 1 1 1
## 3217 241 139 17 8649 9617 6925 4436 7511 7988 11135 1935 8446
## 1 1 1 1 1 2 2 1 1 1 2 1 1
## 11227 6146 10204 7782 3835 8990 3579 8639 9913 6768 5846 2249 2231
## 1 1 1 1 1 1 1 1 1 1 2 1 2
## 2037 3845 11106 4377 2444 6671 11965 9144 4435 5110 4640 5636 7846
## 1 1 1 1 1 1 2 1 1 1 1 1 1
## 336 6331 8861 4597 8696 1554 4949 738 6098 11088 6099 5381 5195
## 1 1 1 1 1 1 1 1 1 2 2 1 2
## 1387 7203 5539 8152 2390 5701 1804 10419 10005 291 9573 1261 7638
## 1 1 1 1 2 1 1 1 1 1 2 1 1
## 11305 1461 2941 10221 261 12136 4015 3131 12001 3344 9709 2114 7302
## 1 1 1 1 1 1 1 1 1 1 1 1 2
## 1150 5936 196 3347 10070 8358 11038 12115 8965 11533 3602 10619 4127
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 7546 3675 1636 8098 11898 5329 6759 9738 6811 6262 10361 520 11974
## 1 1 1 1 1 1 1 2 1 2 2 1 1
## 7662 10323 5522 7812 6170 3348 5288 2372 8396 5380 4251 10194 1795
## 1 1 1 1 1 1 1 2 1 1 1 1 1
## 8612 8941 7490 9335 2865 4223 6171 11252 9315 1818 10836 9303 11822
## 1 1 1 2 1 1 2 1 1 1 2 1 1
## 4382 9136 1059 9249 3711 7907 2253 4163 2498 10061 11033 2663 6926
## 2 1 1 2 1 1 1 2 1 1 1 1 1
## 4425 8475 10968 9963 5541 2452 10023 11013 834 2234 8863 6125 11638
## 1 1 1 1 1 2 1 1 1 1 1 2 1
## 3439 2152 7601 12007 5078 5513 11120 2567 2644 10055 10334 2429 6929
## 1 1 1 1 1 2 1 1 1 1 2 2 1
## 405 7801 8678 12323 5791 1727 4115 4074 11432 2182 9400 6246 7019
## 1 1 2 1 1 1 2 1 1 1 2 1 1
## 6803 8356 8986 6612 1576 8948 2653 1720 7162 8731 5944 9196 7399
## 2 2 1 1 1 2 1 1 1 1 1 2 2
## 5107 6875 11641 6855 4033 3839 8538 5491 4009 2289 5948 9837 5224
## 1 1 1 2 1 1 1 1 1 1 1 1 1
## 10539 11722 1056 11356 11197 683 3637 652 9527 2321 10909 88 932
## 2 1 1 1 1 1 2 1 1 1 2 1 2
## 9690 6959 12201 9846 4430 11538 7242 695 4303 6155 991 4991 9192
## 2 2 1 2 1 2 1 2 1 1 1 1 1
## 7205 11054 3697 5579 10378 8079 5768 8155 10670 9104 3494 12158 9701
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 2758 2937 5912 6088 2745 8258 5202 10687 8792 2931 11684 4036 3939
## 2 1 1 1 1 1 1 1 1 1 1 1 1
## 4453 6886 1381 7631 6572 3582 1637 7990 10097 323 3932 8463 914
## 1 2 1 1 1 1 1 1 1 1 1 1 1
## 10237 10742 4423 8185 3015 4388 11529 9132 1083 5278 5480 11716 693
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 7268 4988 2925 10732 2076 12261 534 4967 1337 3206 11660 1710 3908
## 1 1 1 1 1 2 1 1 1 1 1 1 1
## 10769 558 1837 6383 6204 7681 4616 3980 63 9940 11382 5993 10230
## 1 1 1 1 1 1 1 1 2 2 1 1 1
## 5216 2495 1157 7558 7190 9551 7244 2597 7920 5225 9752 9958 6783
## 2 1 1 1 1 1 1 1 1 1 1 1 1
## 10804 3489 8235 5621 7171 11108 10541 3292 11339 4394 6800 2260 1960
## 1 1 2 1 1 1 2 1 1 1 1 2 1
## 450 7217 9674 8853 11895 8350 3998 11052 11780 8688 11341 7573 11586
## 1 1 1 1 1 2 1 1 2 2 1 2 1
## 10189 8010 6222 11163 6780 12134 7157 11408 1106 11744 1622 2211 6309
## 1 2 1 1 2 2 1 2 2 2 1 1 1
## 5543 3078 9250 5245 1885 10813 10129 11228 2792 389 4193 2606 5362
## 1 1 2 1 1 1 1 2 1 1 1 1 1
## 7113 9043 9611 10150 4152 6073 10616 10811 9872 2158 6985 8888 34
## 1 1 1 1 1 2 1 1 1 1 1 1 1
## 7321 11887 8685 12174 10648 4840 5611 10181 6754 4 3101 510 858
## 1 1 1 1 1 1 1 2 1 1 1 1 1
## 6686 11880 12162 111 3906 12125 3524 12098 8320 11439 10546 10518 628
## 2 1 1 1 1 1 1 1 1 1 1 2 1
## 10252 10428 7837 11064 10083 6745 6775 3388 6713 10224 11674 2973 1068
## 2 1 1 1 2 1 1 1 1 1 1 1 1
## 9678 4577 3607 3178 1495 10911 6141 10040 2854 11943 12227 11087 1846
## 1 2 1 1 1 1 1 1 1 1 1 1 1
## 5693 7029 11658 4351 4627 10790 1105 4434 11706 946 3495 5332 9026
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 10193 726 8264 6335 6615 3367 12207 8880 3954 921 7628 3739 11201
## 1 1 1 1 1 1 1 1 1 1 1 1 2
## 5767 529 10158 416 2636 3093 9147 12060 5560 5848 4386 2712 11122
## 1 1 1 1 1 2 2 1 1 1 1 1 1
## 6645 611 3319 9946 7426 1715 2371 235 8055 1442 7652 2319 7389
## 2 2 1 1 1 1 1 1 1 1 1 1 1
## 8495 10091 7815 3655 7408 2913 11053 5475 2778 11212 6043 3740 9923
## 1 1 1 2 1 1 1 1 1 1 1 1 2
## 12219 10487 7882 3156 10741 4638 1888 10117 4488 2824 696 827 3532
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 11401 7923 7969 8704 6998 257 5728 4379 1728 6180 7411 3764 3633
## 2 1 1 1 2 1 1 2 1 1 2 1 1
## 11223 5158 7368 1423 5803 7472 540 6355 5776 2706 12101 11014 430
## 1 1 1 1 1 1 1 1 1 1 1 2 1
## 3257 6185 7168 5724 5449 9471 3894 11989 702 4295 5293 7916 11992
## 1 1 1 1 1 1 1 2 1 1 1 1 1
## 2177 1930 10943 7645 11007 195 11848 10620 2313 9269 9845 10033 1091
## 1 1 1 1 1 1 1 2 2 2 1 2 1
## 9591 3726 11984 1345 8638 3844 6404 1800 414 11302 647 731 626
## 1 1 2 1 1 1 2 1 1 2 1 1 1
## 6822 1400 10024 6465 3750 262 4030 4710 8533 395 1316 6462 5145
## 1 1 1 1 2 1 1 1 1 1 1 1 1
## 7294 10652 6507 6986 2036 7857 881 7264 9548 882 7632 11628 8518
## 1 1 1 1 1 2 1 2 1 1 1 1 1
## 10627 5031 11635 8137 9423 7379 180 3149 2002 1063 7550 6590 6885
## 1 2 1 1 1 1 1 2 1 1 2 1 1
## 10849 109 10020 1239 11585 2506 10018 5520 4302 5505 6322 6177 8972
## 1 1 1 1 1 1 1 2 2 1 1 1 1
## 1336 10871 9275 7867 6110 11820 9295 706 4531 1288 5563 11159 2884
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 2171 6773 8347 7395 8983 10248 9736 8859 7069 9383 3895 12289 5058
## 1 1 2 1 1 1 1 1 1 2 1 1 1
## 8847 5891 8921 9277 4542 7667 9934 5266 9019 5632 6816 3704 8962
## 2 1 1 1 1 1 2 1 2 1 1 1 1
## 8479 4722 588 4419 10928 4123 3938 2532 3768 2655 6669 1879 4789
## 1 1 1 1 2 1 1 1 1 1 1 1 2
## 8369 11640 429 9549 2603 3379 9242 6175 7396 7768 5933 295 10534
## 2 2 1 2 1 1 1 1 1 1 1 1 1
## 2794 268 8593 5832 2609 2212 5138 1446 9656 10676 10965 11190 1596
## 1 1 1 1 1 1 1 1 1 1 1 2 1
## 6379 844 6942 10990 7189 3865 1444 5823 7272 7286 3244 6025 12248
## 1 1 2 1 1 2 1 1 1 1 1 1 2
## 10101 3202 11233 7137 8004 6894 10916 2840 3440 5589 2907 7471 10239
## 1 1 2 1 1 2 2 2 2 1 1 1 2
## 8300 11738 4035 2946 11835 8129 9238 3031 2837 11948 9489 1238 10348
## 1 1 1 1 1 1 1 1 1 1 2 1 1
## 5154 12314 10491 711 4906 8359 5270 6589 3222 385 6792 6703 5261
## 2 2 1 1 1 1 1 1 1 2 2 1 1
## 10095 8100 8860 5469 5375 8999 8738 11619 11017 655 4962 12024 9405
## 1 1 1 2 2 1 2 1 1 1 1 1 1
## 949 6071 1882 5315 3871 5094 4739 7940 1678 2282 10853 4493 10321
## 2 1 1 1 1 1 1 1 1 1 1 1 1
## 2414 6849 742 4834 4782 2607 11755 7148 3720 6167 6078 4982 6202
## 1 2 1 1 1 2 1 1 2 2 1 1 1
## 7549 4162 1667 1406 1162 4689 7702 6372 9521 11237 6190 253 11327
## 1 1 2 1 1 1 1 2 1 1 1 1 1
## 2254 2466 11199 4247 6637 4345 6861 5429 5576 659 7222 6281 10956
## 1 1 1 1 1 1 2 1 2 1 1 1 1
## 3567 173 2781 1313 6972 1099 3012 7492 12216 9331 11137 12105 1692
## 1 1 1 1 1 1 2 1 1 2 1 1 1
## 4520 5087 11045 5561 2856 9866 7995 7034 7898 6187 4624 6858 6809
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 12041 1833 2128 8837 2900 10007 1763 2191 8623 3322 5111 3190 3295
## 1 1 1 2 1 1 1 1 1 1 1 1 1
## 8052 10660 8132 452 1235 41 1549 862 6636 10343 4214 4160 9496
## 1 1 2 1 1 1 1 1 1 1 1 1 1
## 4072 6874 125 11860 2619 6140 4700 12093 7274 11142 6194 11175 7391
## 1 1 1 1 1 1 1 1 1 1 1 1 2
## 1088 4920 9017 4165 2358 2329 7131 2150 10103 4695 4529 4280 756
## 1 1 1 1 1 1 1 1 2 1 1 2 1
## 9596 2845 5268 3353 2702 498 1302 9374 3935 1875 1275 641 883
## 2 2 1 1 1 1 1 1 1 1 1 1 1
## 10967 3761 296 12198 1115 2308 9170 10241 6080 8270 5786 10260 4578
## 1 1 1 1 1 1 1 1 1 2 1 1 1
## 5074 3363 10797 4433 10547 448 10489 4470 4931 12183 4040 5895 7024
## 1 1 1 2 1 1 1 1 1 2 1 1 1
## 9284 3070 3185 4090 11359 6680 7780 3973 2882 2132 7590 3126 4926
## 1 1 1 1 1 1 1 1 1 1 2 1 1
## 7737 10743 8540 8928 11569 10169 12009 5775 1660 8494 2003 5313 2780
## 2 1 1 2 1 2 1 1 1 1 1 1 1
## 9773 919 461 9999 7827 5978 11830 6687 10570 3766 7279 8451 9462
## 1 2 1 2 1 1 2 1 1 2 1 1 1
## 2918 1509 2956 11678 2708 7068 2139 10325 11372 5913 5868 5545 3941
## 1 1 1 1 1 1 1 2 2 1 1 1 1
## 2009 9990 9219 7053 4783 7081 7606 9244 5061 2504 1668 7775 8917
## 2 1 2 1 1 2 1 1 1 1 1 1 1
## 1249 6701 1566 3488 6596 1496 9613 4755 11485 3594 4357 11367 10653
## 1 1 1 1 1 2 1 1 2 1 1 1 1
## 1615 12267 5289 3961 1905 9407 9205 2595 292 6511 2804 9579 1944
## 1 2 1 1 2 1 1 1 1 1 1 1 1
## 10831 8081 8682 7327 8364 9330 2025 11929 2012 116 1168 3475 4681
## 1 1 2 1 1 1 1 1 2 1 1 1 2
## 3338 6673 2493 5755 6011 7397 4081 3239 5006 9002 5567 3182 12036
## 1 1 1 2 1 1 1 1 1 2 1 1 1
## 8037 3681 12317 2790 7994 11626 6565 4723 8995 9811 8566 5232 1844
## 1 2 1 1 1 1 1 1 1 2 1 1 1
## 9338 4662 53 9911 4604 12165 3505 4147 4718 10822 1835 3787 9741
## 1 1 1 1 1 1 1 2 1 2 1 1 1
## 7870 3813 5233 8374 11010 9995 7941 1343 10470 5276 3689 1649 12064
## 1 2 1 1 1 1 1 2 1 2 1 1 1
## 4198 816 10435 11072 2998 836 2828 1228 642 5447 11666 9571 4256
## 1 1 1 2 1 1 1 1 2 1 1 1 1
## 5569 1689 11556 6653 4173 6635 6257 8307 9475 6128 5479 1252 6390
## 1 1 2 1 1 1 1 1 1 2 1 2 2
## 4630 3294 2445 8524 7455 1389 1931 12116 9157 1613 9483 240 11667
## 2 1 1 2 1 1 1 2 2 1 1 1 1
## 8879 11612 11301 9979 7008 10699 10895 11956 6544 2855 3956 4530 6466
## 2 2 1 2 2 1 2 1 1 1 2 1 1
## 8925 679 680 7998 6692 4137 3725 10086 11393 8838 3024 1902 8520
## 1 1 1 1 2 1 1 1 1 1 2 2 2
## 1185 151 11456 8053 976 9784 10552 2279 2327 6354 2496 6266 1380
## 1 1 1 1 1 1 1 1 1 1 1 1 2
## 4644 570 7678 4366 4589 2201 11280 6575 7808 6808 2021 8916 10443
## 1 1 1 1 1 1 1 1 2 1 1 1 1
## 1329 1714 1928 10171 3263 7915 12018 238 8527 4049 5384 7698 6658
## 1 1 1 2 1 1 2 1 1 1 1 1 1
## 10188 9245 8070 11767 11290 778 6212 60 1089 4854 6069 8249 4923
## 1 1 1 1 2 2 1 1 1 1 1 2 1
## 8478 3590 6111 9165 8019 4473 9520 11200 11884 4469 2678 9787 10292
## 1 1 2 2 1 1 1 2 1 1 1 1 1
## 7192 5114 4138 10649 522 4139 8466 3144 84 4599 354 5059 6232
## 1 1 1 2 1 1 1 2 1 1 2 1 1
## 6323 12094 3735 2300 3000 8432 8901 7170 2873 1774 8183 6454 3478
## 2 2 1 1 1 1 1 1 1 2 1 1 1
## 5316 5557 5324 12063 146 8715 9468 2091 672 7004 4554 2643 5053
## 1 1 1 1 1 1 2 1 1 1 1 1 1
## 3072 5926 4051 5438 2737 8340 5976 8868 6537 11573 2960 11726 5067
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 5448 3224 3609 4240 1098 3974 3125 9735 3623 5828 1768 8266 9253
## 2 1 1 1 1 1 1 1 1 2 1 1 2
## 3554 7792 10297 5103 1505 2903 7372 6215 5924 1864 1257 10212 6659
## 1 1 2 2 1 2 2 2 1 1 1 1 1
## 11272 7796 4343 9231 8895 9343 10581 7647 9061 11927 6066 1721 7729
## 1 1 1 1 1 1 1 1 1 1 1 1 2
## 3057 10084 55 5572 6243 3255 9651 9052 11409 9925 7754 4360 6543
## 1 1 1 2 1 1 1 1 2 1 1 2 1
## 2497 11040 1213 5181 9880 9907 6420 7084 140 7865 10725 9587 11068
## 1 1 1 1 1 1 1 1 1 2 1 1 1
## 1991 9525 963 6739 10165 8705 5998 794 11633 6057 651 6458 27
## 1 1 1 1 1 1 1 1 1 1 1 2 1
## 4546 2116 2481 2031 4097 5618 2142 10238 3882 2369 5640 12023 8943
## 1 1 1 2 1 2 1 1 1 1 2 1 1
## 1526 1074 8069 1677 10126 9151 19 9781 3154 1695 8371 5092 1676
## 1 1 1 1 1 1 1 2 1 1 1 1 1
## 6288 579 7403 6022 4712 1124 8833 4569 10302 5108 8231 9920 2170
## 1 1 2 1 1 1 1 2 2 1 2 2 1
## 1873 9387 10749 1374 2558 4815 11578 2596 11364 3656 10665 2439 10718
## 1 2 1 1 2 1 1 1 1 1 1 1 1
## 11411 12044 10664 5157 7254 11789 2661 4489 6909 2598 7392 5496 4918
## 1 1 2 1 1 1 1 1 1 1 2 1 1
## 2501 4070 10794 7665 9181 9184 133 7343 8329 5822 11059 4642 2422
## 1 1 1 1 1 1 1 1 1 1 2 1 1
## 9024 6196 2247 9823 1682 7174 11011 5008 5011 4715 373 4773 8201
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 6282 3369 917 9906 2924 10313 9030 8447 5478 11259 6482 10880 1794
## 1 1 2 1 1 1 1 1 1 2 1 1 2
## 5219 5492 4873 5416 6678 11603 11577 10407 6369 3552 3390 9706 2292
## 1 1 1 1 1 2 2 1 1 2 1 2 1
## 2842 10082 9377 6197 8905 3334 1130 11985 1811 9035 5821 10883 1750
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 7118 8634 4646 4136 2968 4285 3336 5850 8302 7504 9567 4000 5036
## 1 1 2 1 2 1 1 1 2 1 1 1 1
## 2 5531 12096 3184 11161 4935 964 9422 10064 3800 3504 5421 631
## 1 1 2 1 1 1 1 2 1 1 2 1 2
## 5294 5318 5622 11095 3966 10773 2174 4842 4963 1603 7640 8324 7604
## 1 1 1 1 1 1 1 1 1 1 1 1 2
## 8826 10746 7059 72 3836 3054 1090 8253 10910 9305 6294 5537 10635
## 1 1 1 1 1 1 1 2 1 1 1 2 1
## 2240 10785 9286 8690 9185 9989 5450 3907 12274 11464 8016 8534 8313
## 1 1 1 1 1 1 2 1 1 1 1 1 1
## 7673 5303 1004 12074 11360 2406 1057 8343 8379 10617 9354 9361 2063
## 1 1 1 1 1 1 1 1 1 2 2 1 1
## 8497 9676 3936 10765 6631 2857 8301 6868 11527 3005 9370 7364 9618
## 2 1 1 1 1 2 1 1 2 1 1 2 1
## 5371 7165 3141 11481 2071 11255 2978 598 12309 7565 7452 2600 876
## 1 2 1 1 2 2 1 1 1 1 1 2 1
## 4235 615 723 1355 11333 2826 3194 2061 12277 1755 6192 6296 12249
## 2 1 2 1 2 2 2 1 1 1 2 1 1
## 1424 4460 803 5430 3266 1362 10938 8241 7600 8174 11480 10395 348
## 1 1 1 1 1 1 2 1 1 2 1 1 1
## 1322 4330 11184 6072 8757 4161 1733 8195 225 9503 3548 11771 9539
## 1 1 1 1 2 1 2 1 2 1 1 1 1
## 2962 8516 4086 4102 4656 1901 2086 5093 1915 205 320 12127 5571
## 1 1 1 1 2 1 2 1 1 1 1 1 1
## 3799 1910 4140 8143 177 8878 11376 7196 1401 5945 9201 7440 42
## 1 1 1 1 1 2 2 1 1 1 1 1 1
## 3897 2764 6769 4760 4771 7147 4080 3904 375 8442 11166 9614 9007
## 2 1 1 1 1 1 1 1 1 1 1 1 1
## 9862 5050 6010 3223 4899 2883 1664 11276 2087 7561 4234 2849 4495
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 9037 4770 3216 1011 132 8114 11922 2870 8392 1752 11226 3174 2255
## 1 2 2 1 1 1 1 1 1 1 1 1 2
## 4128 5719 978 2166 3970 4732 10479 8848 8656 5171 6211 11536 2196
## 1 1 1 1 2 1 1 1 2 1 1 2 1
## 10530 7211 3105 563 1848 8526 3171 4297 8591 8782 5836 1396 7956
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 2264 10473 7649 4654 2853 2000 10281 4089 1085 9803 9631 4563 12045
## 1 2 1 1 1 1 2 1 1 2 1 1 1
## 10411 9048 5082 4467 5239 10116 3639 10608 6883 5325 3847 2844 8729
## 1 1 1 1 1 1 1 1 2 1 1 1 1
## 2534 7119 4612 3364 11178 11441 3779 7448 5736 4943 5244 2577 7892
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 11653 8474 12123 6290 11487 10354 11284 9195 2995 279 6342 3062 6722
## 1 1 2 1 1 1 1 1 1 1 1 1 1
## 7777 8457 3595 11596 4761 4264 2218 386 1429 6502 11346 8702 9427
## 1 1 1 1 1 2 1 1 1 1 2 2 1
## 7011 4414 10809 2233 6106 5105 2660 9101 905 5424 234 1580 2146
## 2 1 1 1 1 1 1 1 1 1 2 2 1
## 7080 7718 7278 8411 9962 1563 11043 8761 12061 2065 2350 11793 3259
## 2 1 1 1 1 1 1 1 1 2 1 1 1
## 3778 470 6582 10032 4318 12281 8065 12166 11550 2219 4131 2141 3118
## 1 1 1 2 1 1 1 1 2 1 2 1 1
## 10222 2654 5 1691 6501 2118 3059 1642 4941 9154 5852 10272 4890
## 2 1 1 1 1 1 1 1 1 1 1 1 2
## 3097 6585 7741 9213 2976 3023 10102 12159 8213 249 3715 11042 4855
## 1 1 1 1 1 1 2 1 2 2 1 1 2
## 5068 5765 1203 11671 8400 378 1985 12055 10852 2427 7669 3301 1743
## 2 1 2 1 1 1 1 1 1 1 1 1 1
## 9302 5809 1907 4596 8994 5510 6996 7176 5311 11357 10357 5112 11888
## 1 1 1 1 2 2 1 1 1 1 1 2 1
## 736 1113 6496 708 8225 68 11489 1524 9273 10763 5866 8336 5624
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 3050 2676 4688 12135 2474 3550 8898 9116 4224 8720 9121 2119 6233
## 1 1 1 1 2 1 1 2 1 1 1 1 1
## 9419 126 10157 7484 11083 11217 3574 9972 403 4519 4585 4745 2779
## 1 1 1 1 1 1 1 2 2 1 1 1 1
## 7625 4965 11021 487 12218 661 3825 11614 8652 6270 8198 9110 1747
## 1 1 1 1 1 1 1 1 1 1 2 1 1
## 5906 12152 2947 2405 2820 5863 8821 10712 4871 11 5467 5498 9576
## 1 1 1 1 1 1 2 1 2 1 1 2 1
## 9183 8735 1693 8472 11249 3591 9359 3732 9186 11289 7284 5552 12161
## 2 1 1 1 1 1 1 1 1 1 1 1 1
## 5342 9149 8012 11794 4188 7079 1605 10014 6751 8160 8001 1427 634
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 6562 3360 6989 10240 11655 5674 12185 5243 4729 7430 3598 9459 7825
## 1 2 1 1 1 1 2 1 1 1 1 2 2
## 11465 1258 4016 7370 9103 6135 1420 9509 12097 7159 7819 10999 1601
## 1 1 1 1 2 1 1 1 2 1 1 1 1
## 8796 12310 10062 10236 10060 587 5189 10914 6718 4886 4359 4808 11982
## 1 1 1 1 1 1 2 1 1 1 2 1 1
## 855 1052 5737 4753 4653 10793 2761 3406 5950 2633 4664 2601 3420
## 1 2 1 1 1 1 1 1 1 2 1 1 1
## 9125 5920 5745 6159 2097 6767 4804 9021 3272 6457 7087 4003 2905
## 2 1 1 1 1 1 1 1 2 1 2 1 1
## 1999 6406 9819 6429 8501 8434 130 7961 3018 1661 4887 6008 5877
## 1 1 2 1 1 1 1 1 1 1 1 1 1
## 4044 3208 11681 1839 1816 5962 4134 8554 9725 2904 3173 4354 6469
## 1 1 1 1 2 1 1 1 1 1 2 2 1
## 2202 9385 5358 1309 1195 1306 584 5536 9384 2216 11933 7498 5127
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 10057 792 5937 6831 4672 10972 9538 12004 4227 1673 7522 4919 3926
## 2 1 1 1 1 1 2 1 2 1 1 1 1
## 9180 3090 10589 2910 106 8886 7320 559 5132 7497 5677 8547 6633
## 1 1 1 1 1 2 2 1 1 1 1 1 1
## 9697 8247 5230 10962 7214 9188 10289 11567 1586 856 815 8458 8149
## 1 1 1 1 1 1 2 1 1 2 1 1 1
## 2825 10792 10464 6427 5465 11304 11256 11934 4561 9430 1556 79 920
## 2 1 2 1 2 1 2 1 1 1 1 1 1
## 2629 1450 8584 2964 10550 10517 307 10734 11436 108 8133 8060 7424
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 6278 9230 9434 1384 11251 4903 11903 9529 6182 2752 8094 5629 1014
## 2 2 1 1 2 2 1 1 1 1 2 1 1
## 1232 8355 2613 10377 3993 5739 3089 2729 6054 729 5711 8927 2376
## 2 1 1 1 1 1 1 1 2 1 1 1 1
## 1414 9038 10469 2078 8487 592 1541 5645 10041 11092 2513 10172 11375
## 1 1 1 1 2 1 1 1 1 1 1 1 1
## 2454 5975 2659 8407 8778 2226 1784 6091 4896 12301 11758 12071 3780
## 1 1 1 2 2 1 1 1 1 1 1 1 1
## 6023 11070 3438 9759 7739 8876 7048 10595 9390 258 5007 7548 11986
## 2 1 1 1 1 1 1 1 2 2 1 1 1
## 5723 6070 12085 5789 6274 8771 9199 460 12173 1501
## 1 1 1 1 2 2 1 1 2 1
##
## Within cluster sum of squares by cluster:
## [1] 14094.32 10041.62
## (between_SS / total_SS = 19.5 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss" "tot.withinss"
## [6] "betweenss" "size" "iter" "ifault"
The solution obtained by employing k-means clustering is challenged by employing hierarchical clustering. K-means clustering only works for continuous data, so the data set's 8 categorical variables were all left unused. Hierarchical clustering can work with both continuous and categorical variables when it is given a suitable dissimilarity matrix. Here, however, we use Gower distances with partitioning around medoids (PAM), and choose the number of clusters by silhouette width.
A popular choice of distance for clustering is Euclidean distance. However, Euclidean distance is only valid for continuous variables, and thus is not applicable here. For a clustering algorithm to yield sensible results, we have to use a distance metric that can handle mixed data types. In this case, we will use Gower distance, which scales each variable's contribution to lie between 0 and 1 and then averages the contributions.
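To make the idea concrete, the sketch below computes a Gower dissimilarity by hand for a few made-up sessions. The data frame `toy` and the helper `gower_pair` are hypothetical and are not part of the analysis. The sketch assumes the usual equal-weight definition: numeric columns contribute a range-normalised absolute difference and categorical columns contribute a simple 0/1 mismatch.

```r
# Toy sessions (hypothetical values, not taken from the dataset)
toy <- data.frame(
  productrelated = c(3, 54, 20),
  exitrates      = c(0.02, 0.20, 0.05),
  month          = factor(c("Mar", "May", "Mar")),
  weekend        = factor(c("TRUE", "FALSE", "TRUE"))
)

# Gower dissimilarity between rows i and j of a mixed-type data frame:
# numeric columns contribute |x_i - x_j| / range(x),
# factor columns contribute 0 (same level) or 1 (different level).
gower_pair <- function(df, i, j) {
  parts <- sapply(names(df), function(col) {
    x <- df[[col]]
    if (is.numeric(x)) {
      abs(x[i] - x[j]) / diff(range(x))
    } else {
      as.numeric(x[i] != x[j])
    }
  })
  mean(parts)  # equal weight for every column
}

d12 <- gower_pair(toy, 1, 2)
d13 <- gower_pair(toy, 1, 3)
print(c(d12 = d12, d13 = d13))
```

Rows 1 and 2 sit at opposite ends of every column, so their dissimilarity is the maximum value of 1. Rows 1 and 3 share both categorical levels, so their dissimilarity is much smaller.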
library(cluster)  # provides daisy() and pam()
# Removing the revenue column prior to clustering
gower_dist <- daisy(customer_dataset1[1:17],
                    metric = "gower",
                    type = list(logratio = 3))  # log-transform column 3 before scaling
# Checking attributes to ensure the correct methods are being used
# (I = interval, N = nominal)
#
summary(gower_dist)
## 4498500 dissimilarities, summarized :
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000252 0.2621000 0.3222500 0.3185400 0.3804800 0.7105000
## Metric : mixed ; Types = I, I, I, I, I, I, I, I, I, I, N, N, N, N, N, N, N
## Number of objects : 3000
# Print the most similar and the most dissimilar pair of records
# as a sanity check on the distance matrix
#
gower_mat <- as.matrix(gower_dist)
# Output most similar pair
#
customer_dataset1[ which(gower_mat == min(gower_mat[gower_mat != min(gower_mat)]), arr.ind = TRUE)[1, ], ]
## administrative administrative_duration informational
## 395 0 0 0
## 1320 0 0 0
## informational_duration productrelated productrelated_duration bouncerates
## 395 0 3 103.0 0
## 1320 0 3 96.5 0
## exitrates pagevalues SpecialDay month operatingsystems browser region
## 395 0.02222222 0 0 Mar 1 1 1
## 1320 0.02222222 0 0 Mar 1 1 1
## traffictype visitortype weekend revenue
## 395 3 Returning_Visitor TRUE FALSE
## 1320 3 Returning_Visitor TRUE FALSE
The records with row IDs 395 and 1320 are the most similar. In the output above they differ only in product-related duration (103.0 vs 96.5), so this seems plausible.
# Output most dissimilar pair
#
customer_dataset1[
  which(gower_mat == max(gower_mat[gower_mat != max(gower_mat)]),
        arr.ind = TRUE)[1, ], ]
## administrative administrative_duration informational
## 3871 0 0.000 0
## 1106 15 1011.361 2
## informational_duration productrelated productrelated_duration bouncerates
## 3871 0.0 1 0.000 0.2
## 1106 171.5 54 1405.131 0.0
## exitrates pagevalues SpecialDay month operatingsystems browser region
## 3871 0.200000000 0.00000 0.8 May 4 1 1
## 1106 0.006127451 37.41814 0.0 Mar 3 2 3
## traffictype visitortype weekend revenue
## 3871 4 Returning_Visitor FALSE FALSE
## 1106 2 New_Visitor TRUE FALSE
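The records with row IDs 3871 and 1106 are the most dissimilar. In the output above they differ in nearly every attribute, including page counts, durations, month, operating system, browser, region, traffic type, visitor type and weekend, so this seems plausible.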
We shall employ partitioning around medoids (PAM) in this step. This is because
it is easy to understand, is more robust to noise and outliers than k-means, and has the added benefit of having an actual observation serve as the exemplar for each cluster.
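The exemplar property can be illustrated with a small sketch. The one-dimensional values below are made up, and the example assumes the `cluster` package is installed. The medoid of each cluster is one of the observations, whereas the cluster mean is pulled towards the stray value.

```r
library(cluster)  # pam()

# Hypothetical toy data: two groups, the second containing a stray value (20)
x <- matrix(c(1, 2, 3, 10, 11, 12, 13, 20), ncol = 1)

fit <- pam(x, k = 2)

# Medoids are rows of the data, not averages
print(fit$medoids)      # medoid values
print(fit$id.med)       # row indices of the medoids
print(fit$clustering)   # cluster label for every row

# For comparison: the mean of each cluster's members
cluster_means <- tapply(x[, 1], fit$clustering, mean)
print(cluster_means)
```

The medoids can be read back into the original data by row index, which is what makes them convenient as "typical" members of each cluster.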
# Calculate silhouette width for many k using PAM
#
sil_width <- c(NA)  # position 1 stays NA: silhouette width is undefined for k = 1
# We shall try values of k ranging from 2 to 10
for (i in 2:10) {
  pam_fit <- pam(gower_dist,
                 diss = TRUE,
                 k = i)
  sil_width[i] <- pam_fit$silinfo$avg.width
}
# Plot silhouette width (higher is better)
plot(1:10, sil_width,
     xlab = "Number of clusters",
     ylab = "Silhouette Width")
lines(1:10, sil_width)
In the silhouette plot the average width is highest at k = 2, so that is our optimum k value.
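The same choice can be read off programmatically instead of from the plot. The sketch below is a minimal, self-contained illustration on made-up data with two well-separated groups. It assumes the `cluster` package is installed, and the names `ks`, `avg_sil` and `best_k` are illustrative only.

```r
library(cluster)  # pam()

# Hypothetical toy data with two well-separated groups
x <- matrix(c(1, 2, 3, 4, 5, 51, 52, 53, 54, 55), ncol = 1)
d <- dist(x)

# Average silhouette width for each candidate k
ks <- 2:5
avg_sil <- sapply(ks, function(k) pam(d, k = k, diss = TRUE)$silinfo$avg.width)
names(avg_sil) <- ks

# The k with the largest average silhouette width
best_k <- ks[which.max(avg_sil)]
print(avg_sil)
print(best_k)
```

Applied to the analysis, the equivalent would presumably be `which.max(sil_width)`, which skips the leading `NA`.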
With k = 2 selected, we run the algorithm with two clusters and interpret the clusters by running summary on each of them.
# Model clustering the customers
#
cust_cl <- pam(gower_dist, diss = TRUE, k = 2)
# Results of clustering
#
customer_results <- customer_dataset1[1:17] %>%
  mutate(cluster = cust_cl$clustering) %>%
  group_by(cluster) %>%
  do(the_summary = summary(.))
# summary of the results of the clustering process
#
customer_results$the_summary
## [[1]]
## administrative administrative_duration informational
## Min. : 0.000 Min. : -1.0 Min. :0.0000
## 1st Qu.: 0.000 1st Qu.: 0.0 1st Qu.:0.0000
## Median : 2.000 Median : 45.0 Median :0.0000
## Mean : 2.894 Mean : 115.2 Mean :0.6675
## 3rd Qu.: 5.000 3rd Qu.: 158.0 3rd Qu.:1.0000
## Max. :22.000 Max. :1922.0 Max. :9.0000
##
## informational_duration productrelated productrelated_duration
## Min. : -1.0 Min. : 0.00 Min. : -1
## 1st Qu.: 0.0 1st Qu.: 8.00 1st Qu.: 227
## Median : 0.0 Median : 18.00 Median : 630
## Mean : 45.5 Mean : 30.67 Mean : 1149
## 3rd Qu.: 0.0 3rd Qu.: 39.00 3rd Qu.: 1459
## Max. :2195.3 Max. :397.00 Max. :11940
##
## bouncerates exitrates pagevalues SpecialDay
## Min. :0.000000 Min. :0.00000 Min. : 0.000 Min. :0.00000
## 1st Qu.:0.000000 1st Qu.:0.01339 1st Qu.: 0.000 1st Qu.:0.00000
## Median :0.004396 Median :0.02302 Median : 0.000 Median :0.00000
## Mean :0.021542 Mean :0.03891 Mean : 6.392 Mean :0.03088
## 3rd Qu.:0.016667 3rd Qu.:0.04000 3rd Qu.: 0.000 3rd Qu.:0.00000
## Max. :0.200000 Max. :0.20000 Max. :255.569 Max. :1.00000
##
## month operatingsystems browser region traffictype
## Nov :310 1 :608 1 :576 1 :305 2 :290
## May :134 3 : 98 2 :146 3 :186 3 :167
## Mar :107 4 : 51 8 : 28 4 : 71 1 :131
## Dec :104 2 : 34 4 : 19 2 : 68 4 : 55
## Oct : 41 8 : 10 5 : 12 6 : 54 10 : 45
## Jul : 32 5 : 1 6 : 7 7 : 35 8 : 35
## (Other): 75 (Other): 1 (Other): 15 (Other): 84 (Other): 80
## visitortype weekend cluster
## New_Visitor :138 FALSE:576 Min. :1
## Other : 9 TRUE :227 1st Qu.:1
## Returning_Visitor:656 Median :1
## Mean :1
## 3rd Qu.:1
## Max. :1
##
##
## [[2]]
## administrative administrative_duration informational
## Min. : 0.000 Min. : -1.0 Min. : 0.0000
## 1st Qu.: 0.000 1st Qu.: 0.0 1st Qu.: 0.0000
## Median : 1.000 Median : 0.0 Median : 0.0000
## Mean : 2.224 Mean : 74.9 Mean : 0.4684
## 3rd Qu.: 3.000 3rd Qu.: 81.0 3rd Qu.: 0.0000
## Max. :24.000 Max. :1764.0 Max. :12.0000
##
## informational_duration productrelated productrelated_duration
## Min. : -1.0 Min. : 0.00 Min. : -1.0
## 1st Qu.: 0.0 1st Qu.: 8.00 1st Qu.: 192.9
## Median : 0.0 Median : 19.00 Median : 618.2
## Mean : 30.6 Mean : 31.84 Mean : 1194.9
## 3rd Qu.: 0.0 3rd Qu.: 38.00 3rd Qu.: 1436.5
## Max. :1830.5 Max. :391.00 Max. :16093.3
##
## bouncerates exitrates pagevalues SpecialDay
## Min. :0.000000 Min. :0.00000 Min. : 0.000 Min. :0.00000
## 1st Qu.:0.000000 1st Qu.:0.01417 1st Qu.: 0.000 1st Qu.:0.00000
## Median :0.002597 Median :0.02500 Median : 0.000 Median :0.00000
## Mean :0.018846 Mean :0.04108 Mean : 6.056 Mean :0.06591
## 3rd Qu.:0.015541 3rd Qu.:0.05000 3rd Qu.: 0.000 3rd Qu.:0.00000
## Max. :0.200000 Max. :0.20000 Max. :239.980 Max. :1.00000
##
## month operatingsystems browser region traffictype
## May :682 2 :1562 2 :1822 1 :843 2 :673
## Nov :424 3 : 544 4 : 148 3 :421 1 :452
## Mar :356 4 : 48 5 : 92 4 :208 3 :339
## Dec :309 1 : 27 10 : 40 2 :203 4 :214
## Sep :100 8 : 11 6 : 31 7 :146 13 :161
## Jul : 86 6 : 4 3 : 26 6 :144 6 : 98
## (Other):240 (Other): 1 (Other): 38 (Other):232 (Other):260
## visitortype weekend cluster
## New_Visitor : 273 FALSE:1711 Min. :2
## Other : 8 TRUE : 486 1st Qu.:2
## Returning_Visitor:1916 Median :2
## Mean :2
## 3rd Qu.:2
## Max. :2
##
The data set has been split into two clusters. One cluster is read as the group of visits that earn the company revenue, and the other as the visits that do not: FALSE (no money earned) = cluster 1, TRUE (money earned) = cluster 2. Because the revenue column was removed before clustering, this reading is an interpretation of the clusters rather than something the algorithm was given.
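One way to check such a reading is to cross-tabulate the cluster labels against the held-out revenue column. The sketch below uses made-up vectors named `cluster` and `revenue`. In the analysis the equivalents would presumably be `cust_cl$clustering` and `customer_dataset1$revenue`, but that mapping is an assumption.

```r
# Hypothetical cluster labels and revenue flags for eight sessions
cluster <- c(1, 1, 1, 2, 2, 2, 2, 1)
revenue <- c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)

# Counts of sessions by cluster (rows) and revenue (columns)
tab <- table(cluster = cluster, revenue = revenue)
print(tab)

# Share of revenue and non-revenue sessions within each cluster
share <- prop.table(tab, margin = 1)
print(round(share, 2))
```

If the clusters really tracked revenue, each row of `share` would be dominated by a single column; rows close to the overall revenue rate would suggest the clusters capture something else.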
# medoids of customer dataset
#
customer_dataset1[cust_cl$medoids, ]
## administrative administrative_duration informational
## 6101 2 38.04 0
## 4165 0 0.00 0
## informational_duration productrelated productrelated_duration bouncerates
## 6101 0 8 886.57 0
## 4165 0 7 70.00 0
## exitrates pagevalues SpecialDay month operatingsystems browser region
## 6101 0.03076923 0 0 Nov 1 1 1
## 4165 0.02857143 0 0 May 2 2 1
## traffictype visitortype weekend revenue
## 6101 2 Returning_Visitor FALSE FALSE
## 4165 2 Returning_Visitor FALSE FALSE
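From the medoids we can see that the exemplar of cluster 1 (row 6101) is a November session on operating system 1 and browser 1, while the exemplar of cluster 2 (row 4165) is a May session on operating system 2 and browser 2. Both medoid sessions are weekday visits by returning visitors, and both have revenue = FALSE.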
Visualization shall be performed by employing t-distributed stochastic neighbor embedding, or t-SNE. This method is a dimension reduction technique that tries to preserve local structure so as to make clusters visible in a 2D or 3D visualization.
It has the ability to handle a custom distance metric like the Gower distance we created above.
# Defining plot
#
tsne_1 <- Rtsne(gower_dist, is_distance = TRUE)
# plotting the clustered data
#
tsne_data <- tsne_1$Y %>%
  data.frame() %>%
  setNames(c("X", "Y")) %>%
  mutate(cluster = factor(cust_cl$clustering),
         name = customer_dataset1[1:17])
# ggplot
#
ggplot(aes(x = X, y = Y), data = tsne_data) +
  geom_point(aes(color = cluster))
From the plot above we can see that there are two clusters, although they are not clearly separated.
# printing the summary of the model
#
print(cust_cl)
## Medoids:
## ID
## [1,] "771" "6101"
## [2,] "1681" "4165"
## Clustering vector:
## 12046 1969 9329 10233 11320 3863 928 8622 4888 10545 3679 2669 5174
## 1 1 2 1 2 2 2 2 2 2 2 1 2
## 137 9001 4422 7051 4313 8477 9857 5535 1989 11753 12222 11680 11051
## 2 2 2 1 2 2 2 2 1 2 2 2 1
## 7434 381 2767 10280 1005 6503 11275 8525 2786 9506 6924 4476 11025
## 1 1 2 1 2 2 2 2 2 1 2 2 1
## 4063 69 950 154 7571 3880 6045 1686 7480 2456 8933 6770 7027
## 2 2 1 2 2 2 1 2 2 2 1 1 2
## 845 796 245 5574 4858 5597 3915 8017 11098 10444 4807 11620 9994
## 2 2 2 1 2 2 2 2 2 2 2 1 2
## 11460 7579 11923 351 1034 8119 5699 8244 5902 8504 5583 10876 5470
## 1 2 2 2 2 2 2 1 2 2 2 1 2
## 7744 8939 5960 11468 11919 1701 316 8383 3418 1646 6569 9408 10404
## 2 1 2 1 2 2 1 2 2 2 1 2 2
## 2591 6438 10373 10000 9625 3921 11293 4465 1205 982 2906 3060 6248
## 2 2 2 1 1 2 2 2 2 1 2 2 1
## 1608 10727 3013 9057 5987 6419 10695 12206 5124 7256 6149 11770 9640
## 2 1 2 2 1 1 2 1 2 2 2 1 1
## 11873 3527 1360 3565 2005 3279 3029 10285 5792 453 1953 8615 1980
## 2 2 2 1 2 2 2 2 2 2 2 2 1
## 820 5274 8619 6483 9935 11503 5750 5344 11219 11027 9469 2256 4363
## 2 1 2 1 2 2 2 2 2 1 2 2 2
## 10917 5336 4570 8929 7869 3577 1330 5120 3604 10420 3959 3742 5799
## 1 2 2 2 2 2 1 1 2 1 2 2 1
## 5081 10356 222 10979 10388 11250 8323 9411 8841 9806 8056 11371 10268
## 2 2 2 2 2 2 2 2 1 2 2 2 2
## 4703 6113 4523 9264 7146 7313 10196 4822 6827 3597 11555 9970 2635
## 1 2 1 2 2 2 2 2 2 2 2 1 2
## 1949 7971 397 1366 436 7906 11709 6241 11554 6525 5893 868 9726
## 2 1 2 2 2 2 1 2 2 2 2 1 2
## 7688 4880 11273 5234 8572 2687 1522 7568 2068 4892 4581 2893 4571
## 2 2 2 2 1 2 2 1 2 2 2 2 2
## 7095 3030 5914 12215 6898 1175 11257 6473 2617 4319 2693 11111 9704
## 2 2 2 2 2 2 2 1 2 2 2 1 2
## 3177 11462 3919 11704 3317 787 5188 1403 8345 10322 8613 6940 10085
## 2 2 1 2 2 2 2 2 2 1 2 2 2
## 10555 6939 1163 4307 8835 4200 4968 6306 6351 9312 10949 7824 860
## 1 2 1 1 2 2 2 1 2 1 2 1 2
## 8513 1051 12217 9306 10642 8865 70 619 6604 10931 3969 6771 6949
## 2 2 2 2 2 1 2 1 1 1 1 2 2
## 6983 2574 5607 4014 8559 9200 985 4901 2569 6047 7308 9122 10799
## 2 1 2 1 2 1 2 1 2 2 2 2 1
## 3061 10733 10887 11213 7749 3854 11469 3278 6970 8607 10035 10329 10161
## 2 1 1 2 2 2 2 2 2 2 2 1 2
## 1018 4262 2880 10251 7285 6163 6356 437 3672 5538 11781 10132 5985
## 2 2 1 2 2 1 2 2 2 1 1 1 2
## 1365 6302 966 5088 11242 4109 689 5898 444 4944 7539 9124 5453
## 2 2 1 2 2 2 2 1 1 2 2 2 2
## 2529 2757 3168 6275 3523 12238 8707 2055 11236 486 6750 526 9758
## 1 1 2 2 2 2 2 2 1 2 1 1 2
## 1565 10174 435 4861 9801 11790 10433 6715 1407 810 864 97 10429
## 1 2 2 2 2 2 2 2 2 2 2 1 1
## 5222 6416 9072 9871 1623 5974 4342 6009 3512 9044 4728 6901 945
## 2 2 2 2 2 1 2 2 2 1 2 2 2
## 5859 6784 2316 6557 7783 7261 11313 7021 7936 8240 7243 254 11129
## 2 1 2 2 1 1 2 2 1 1 2 2 1
## 8245 7179 266 105 1320 6343 4626 9507 7658 9601 10133 3887 7613
## 1 2 2 1 1 2 2 2 2 2 2 1 1
## 5804 5283 2178 935 11708 7690 5198 11595 1286 6048 462 5474 2648
## 2 2 2 1 1 2 1 2 1 1 2 1 1
## 2760 5037 11490 3751 9731 10606 9402 11807 8509 6987 5320 12294 9788
## 1 2 2 2 1 2 2 2 2 2 2 2 2
## 8121 11186 632 5980 6911 681 3773 3111 6657 162 11962 7482 639
## 1 1 2 2 2 2 2 2 2 1 1 2 2
## 9113 455 947 8127 1270 1152 3759 2568 6352 11271 10525 1507 7169
## 1 2 2 1 2 1 2 2 1 1 2 2 1
## 1782 11149 8616 11953 4707 9208 2375 8192 7852 6017 8033 5191 6157
## 2 1 2 2 2 2 2 2 1 1 1 2 1
## 3227 5564 8920 4195 9892 6536 7122 9680 6134 6614 3629 10777 6378
## 2 2 2 2 2 1 2 2 2 1 2 2 2
## 11081 2585 9634 9664 3136 1155 9228 8784 7981 687 11463 624 10868
## 1 2 2 2 2 1 2 2 1 2 2 2 2
## 1470 11662 8975 148 2677 6662 11832 10275 10614 1822 1946 10986 5512
## 2 2 1 2 2 2 2 1 2 2 2 2 1
## 8232 7947 2013 367 5824 5544 7755 2955 2232 9223 1101 10678 3746
## 2 1 2 2 2 1 2 2 2 2 2 1 2
## 7980 7564 7108 47 1178 10245 5040 10460 6663 1125 11854 500 6829
## 1 2 2 1 2 1 1 2 2 2 1 2 2
## 2847 1662 3803 9346 9135 1512 554 12014 745 2868 8103 5260 8680
## 2 2 2 1 1 2 1 2 2 2 1 2 2
## 958 12276 9309 770 3731 6271 3098 4503 8884 10707 10217 6702 4012
## 1 2 2 1 2 1 2 2 2 1 2 1 2
## 198 7259 1140 1042 951 8952 8896 3172 178 5847 2991 6370 2714
## 2 2 1 1 2 2 1 1 1 2 2 2 2
## 12150 4335 11683 8903 1998 10865 3457 5953 6 10645 9530 1253 7385
## 2 2 1 2 2 1 2 2 2 2 2 2 2
## 7220 993 11055 10215 3065 9693 7500 6040 7163 961 541 1943 1484
## 1 2 1 2 2 1 2 1 1 2 2 2 2
## 8658 4805 3789 5530 8484 11303 5814 751 9799 743 1422 6927 11604
## 2 2 2 1 2 2 1 2 1 2 2 2 1
## 1187 11109 3145 350 9276 5744 10810 96 715 9512 1815 466 9930
## 1 1 2 2 1 1 2 1 2 2 2 2 2
## 344 12067 11506 1489 1479 2484 8312 5182 8227 7291 6762 1417 969
## 1 2 2 2 2 2 2 2 1 1 1 2 1
## 2386 4667 1670 2252 2053 7933 718 1900 2195 901 6035 2864 2793
## 2 2 2 2 2 1 2 1 2 2 1 2 1
## 10671 5192 10326 5349 36 5820 6130 6485 12229 7012 3757 6851 11325
## 2 2 2 2 2 2 1 2 2 1 2 1 1
## 2547 6916 9686 11710 3071 7485 4039 5706 10315 11729 8597 12003 176
## 2 2 1 2 2 1 2 2 1 1 1 2 1
## 5952 4181 1531 10870 528 1361 10925 10719 7518 3405 11602 9398 1404
## 2 2 2 2 2 2 2 2 2 2 1 2 1
## 10838 11668 12180 8858 4326 2570 8205 2168 11742 5628 345 1184 8957
## 2 2 2 2 2 2 2 2 2 1 2 1 1
## 5514 1780 12205 8560 2420 8406 5746 11412 9932 451 4952 3540 9455
## 1 2 2 1 2 1 2 1 2 2 2 2 2
## 10028 8931 7595 1307 8992 922 7245 2517 4845 8836 4848 7089 1260
## 2 2 1 2 1 1 2 1 2 2 2 2 2
## 3315 6971 10874 7230 11610 10964 134 494 11315 3558 4371 5054 11194
## 2 2 1 2 2 1 2 2 2 2 2 1 1
## 575 6706 2548 10447 420 6943 4192 3460 8586 1332 3547 8167 2579
## 2 2 1 1 2 2 2 2 1 1 1 2 1
## 1542 5394 10802 3705 7816 4068 4448 3218 5762 11868 3157 5655 12033
## 2 2 1 1 1 2 2 2 2 2 2 2 2
## 7173 1899 2403 5377 7297 12247 10350 754 2512 3997 11395 439 2126
## 2 2 2 2 2 2 1 1 2 2 2 2 2
## 7150 1040 1378 6908 11537 359 8 9478 8942 4925 264 7566 8182
## 1 2 2 2 2 2 2 2 2 1 2 2 2
## 38 9143 2919 6101 5742 3230 8795 3035 4261 4242 10520 6632 6963
## 2 2 1 1 2 2 1 2 2 2 2 2 2
## 3217 241 139 17 8649 9617 6925 4436 7511 7988 11135 1935 8446
## 2 2 2 1 1 2 1 2 2 1 2 2 2
## 11227 6146 10204 7782 3835 8990 3579 8639 9913 6768 5846 2249 2231
## 1 2 2 2 2 2 2 2 1 2 2 2 1
## 2037 3845 11106 4377 2444 6671 11965 9144 4435 5110 4640 5636 7846
## 1 2 1 1 1 2 1 2 2 2 2 2 2
## 336 6331 8861 4597 8696 1554 4949 738 6098 11088 6099 5381 5195
## 2 2 2 2 1 2 2 1 2 2 2 2 2
## 1387 7203 5539 8152 2390 5701 1804 10419 10005 291 9573 1261 7638
## 2 2 2 1 2 2 2 2 1 2 2 2 1
## 11305 1461 2941 10221 261 12136 4015 3131 12001 3344 9709 2114 7302
## 2 2 2 1 1 2 2 2 2 2 1 2 2
## 1150 5936 196 3347 10070 8358 11038 12115 8965 11533 3602 10619 4127
## 2 2 1 1 2 2 2 1 2 1 2 2 2
## 7546 3675 1636 8098 11898 5329 6759 9738 6811 6262 10361 520 11974
## 2 2 2 2 1 2 1 1 2 2 1 2 2
## 7662 10323 5522 7812 6170 3348 5288 2372 8396 5380 4251 10194 1795
## 2 2 2 2 2 2 2 2 2 2 2 2 2
## 8612 8941 7490 9335 2865 4223 6171 11252 9315 1818 10836 9303 11822
## 2 2 2 1 2 2 2 2 1 2 2 2 1
## 4382 9136 1059 9249 3711 7907 2253 4163 2498 10061 11033 2663 6926
## 2 2 1 1 2 2 2 2 2 2 2 2 1
## 4425 8475 10968 9963 5541 2452 10023 11013 834 2234 8863 6125 11638
## 2 2 1 1 1 1 2 2 2 2 1 1 2
## 3439 2152 7601 12007 5078 5513 11120 2567 2644 10055 10334 2429 6929
## 2 2 2 1 2 1 1 2 2 2 2 2 2
## 405 7801 8678 12323 5791 1727 4115 4074 11432 2182 9400 6246 7019
## 2 2 2 2 1 1 2 2 2 1 2 2 2
## 6803 8356 8986 6612 1576 8948 2653 1720 7162 8731 5944 9196 7399
## 2 2 2 1 2 2 1 2 1 2 2 1 1
## 5107 6875 11641 6855 4033 3839 8538 5491 4009 2289 5948 9837 5224
## 1 2 2 1 2 2 2 2 2 2 2 1 2
## 10539 11722 1056 11356 11197 683 3637 652 9527 2321 10909 88 932
## 2 2 2 2 2 2 2 2 1 2 2 1 2
## 9690 6959 12201 9846 4430 11538 7242 695 4303 6155 991 4991 9192
## 2 2 2 2 2 2 2 2 2 2 2 2 1
## 7205 11054 3697 5579 10378 8079 5768 8155 10670 9104 3494 12158 9701
## 2 1 2 2 2 1 1 2 2 1 2 1 2
## 2758 2937 5912 6088 2745 8258 5202 10687 8792 2931 11684 4036 3939
## 1 2 2 2 2 2 2 2 2 2 2 2 2
## 4453 6886 1381 7631 6572 3582 1637 7990 10097 323 3932 8463 914
## 2 2 2 1 2 2 2 2 2 2 2 2 2
## 10237 10742 4423 8185 3015 4388 11529 9132 1083 5278 5480 11716 693
## 2 2 2 2 2 2 1 2 2 2 1 2 2
## 7268 4988 2925 10732 2076 12261 534 4967 1337 3206 11660 1710 3908
## 1 2 2 2 2 1 2 2 2 2 2 2 2
## 10769 558 1837 6383 6204 7681 4616 3980 63 9940 11382 5993 10230
## 1 2 1 2 2 1 2 2 1 1 1 2 2
## 5216 2495 1157 7558 7190 9551 7244 2597 7920 5225 9752 9958 6783
## 2 2 1 1 1 2 2 2 2 2 1 1 1
## 10804 3489 8235 5621 7171 11108 10541 3292 11339 4394 6800 2260 1960
## 1 2 2 2 2 2 2 2 2 2 1 2 2
## 450 7217 9674 8853 11895 8350 3998 11052 11780 8688 11341 7573 11586
## 1 1 1 1 2 2 2 2 1 2 2 2 2
## 10189 8010 6222 11163 6780 12134 7157 11408 1106 11744 1622 2211 6309
## 1 1 1 2 2 2 1 2 2 2 2 1 1
## 5543 3078 9250 5245 1885 10813 10129 11228 2792 389 4193 2606 5362
## 2 2 1 2 2 1 1 2 2 2 1 2 2
## 7113 9043 9611 10150 4152 6073 10616 10811 9872 2158 6985 8888 34
## 1 2 2 2 2 2 2 1 1 2 2 1 1
## 7321 11887 8685 12174 10648 4840 5611 10181 6754 4 3101 510 858
## 1 2 2 2 2 1 1 1 1 2 2 2 2
## 6686 11880 12162 111 3906 12125 3524 12098 8320 11439 10546 10518 628
## 1 2 2 2 1 2 2 2 1 1 2 1 2
## 10252 10428 7837 11064 10083 6745 6775 3388 6713 10224 11674 2973 1068
## 1 2 2 2 1 2 2 2 2 2 2 2 2
## 9678 4577 3607 3178 1495 10911 6141 10040 2854 11943 12227 11087 1846
## 2 1 2 1 2 2 2 1 2 2 2 2 2
## 5693 7029 11658 4351 4627 10790 1105 4434 11706 946 3495 5332 9026
## 1 1 2 2 2 2 2 2 2 2 2 2 2
## 10193 726 8264 6335 6615 3367 12207 8880 3954 921 7628 3739 11201
## 1 2 1 2 2 2 2 1 1 2 2 1 2
## 5767 529 10158 416 2636 3093 9147 12060 5560 5848 4386 2712 11122
## 2 2 2 2 2 2 2 1 2 2 2 1 2
## 6645 611 3319 9946 7426 1715 2371 235 8055 1442 7652 2319 7389
## 2 2 2 2 1 2 2 2 2 2 2 2 2
## 8495 10091 7815 3655 7408 2913 11053 5475 2778 11212 6043 3740 9923
## 2 2 2 2 2 1 2 2 2 1 2 2 1
## 12219 10487 7882 3156 10741 4638 1888 10117 4488 2824 696 827 3532
## 1 2 2 1 2 2 2 2 2 2 2 2 2
## 11401 7923 7969 8704 6998 257 5728 4379 1728 6180 7411 3764 3633
## 2 1 1 2 2 2 2 2 2 2 2 2 2
## 11223 5158 7368 1423 5803 7472 540 6355 5776 2706 12101 11014 430
## 2 2 2 2 2 1 2 2 1 2 1 2 2
## 3257 6185 7168 5724 5449 9471 3894 11989 702 4295 5293 7916 11992
## 2 2 2 2 2 2 2 2 2 2 2 1 2
## 2177 1930 10943 7645 11007 195 11848 10620 2313 9269 9845 10033 1091
## 2 2 2 2 1 1 2 1 2 2 2 2 2
## 9591 3726 11984 1345 8638 3844 6404 1800 414 11302 647 731 626
## 2 1 2 1 2 1 2 2 2 1 2 2 2
## 6822 1400 10024 6465 3750 262 4030 4710 8533 395 1316 6462 5145
## 2 2 2 2 2 2 2 2 2 1 2 2 2
## 7294 10652 6507 6986 2036 7857 881 7264 9548 882 7632 11628 8518
## 2 2 2 2 2 2 2 1 2 2 1 2 2
## 10627 5031 11635 8137 9423 7379 180 3149 2002 1063 7550 6590 6885
## 2 1 1 2 2 1 1 2 2 2 2 2 1
## 10849 109 10020 1239 11585 2506 10018 5520 4302 5505 6322 6177 8972
## 2 2 2 1 2 2 1 1 2 1 2 2 2
## 1336 10871 9275 7867 6110 11820 9295 706 4531 1288 5563 11159 2884
## 2 2 2 2 1 1 1 2 2 2 2 2 2
## 2171 6773 8347 7395 8983 10248 9736 8859 7069 9383 3895 12289 5058
## 2 1 1 1 2 1 2 2 2 1 2 1 2
## 8847 5891 8921 9277 4542 7667 9934 5266 9019 5632 6816 3704 8962
## 2 2 2 2 2 1 1 2 1 2 2 2 2
## 8479 4722 588 4419 10928 4123 3938 2532 3768 2655 6669 1879 4789
## 2 2 2 2 2 1 2 2 2 2 1 1 2
## 8369 11640 429 9549 2603 3379 9242 6175 7396 7768 5933 295 10534
## 1 2 2 1 2 2 2 2 1 2 2 2 1
## 2794 268 8593 5832 2609 2212 5138 1446 9656 10676 10965 11190 1596
## 2 2 1 1 1 2 2 1 2 2 2 2 2
## 6379 844 6942 10990 7189 3865 1444 5823 7272 7286 3244 6025 12248
## 1 2 2 2 2 2 2 2 2 2 2 2 2
## 10101 3202 11233 7137 8004 6894 10916 2840 3440 5589 2907 7471 10239
## 2 2 1 1 2 2 1 2 2 2 2 2 2
## 8300 11738 4035 2946 11835 8129 9238 3031 2837 11948 9489 1238 10348
## 2 2 1 2 1 2 2 2 1 1 1 2 2
## 5154 12314 10491 711 4906 8359 5270 6589 3222 385 6792 6703 5261
## 2 2 2 1 2 2 2 1 2 1 2 1 2
## 10095 8100 8860 5469 5375 8999 8738 11619 11017 655 4962 12024 9405
## 2 2 2 2 2 1 2 2 2 2 2 2 2
## 949 6071 1882 5315 3871 5094 4739 7940 1678 2282 10853 4493 10321
## 2 1 1 2 2 2 1 2 1 2 2 2 1
## 2414 6849 742 4834 4782 2607 11755 7148 3720 6167 6078 4982 6202
## 2 1 2 2 2 1 1 2 2 2 2 2 2
## 7549 4162 1667 1406 1162 4689 7702 6372 9521 11237 6190 253 11327
## 2 2 1 2 2 2 1 1 2 2 1 1 2
## 2254 2466 11199 4247 6637 4345 6861 5429 5576 659 7222 6281 10956
## 2 2 2 2 2 2 2 2 2 2 2 2 1
## 3567 173 2781 1313 6972 1099 3012 7492 12216 9331 11137 12105 1692
## 1 1 2 2 2 1 2 2 1 2 1 1 2
## 4520 5087 11045 5561 2856 9866 7995 7034 7898 6187 4624 6858 6809
## 2 2 1 2 1 2 2 1 2 2 2 2 2
## 12041 1833 2128 8837 2900 10007 1763 2191 8623 3322 5111 3190 3295
## 1 2 1 1 2 2 2 2 2 2 2 2 2
## 8052 10660 8132 452 1235 41 1549 862 6636 10343 4214 4160 9496
## 2 2 2 1 1 2 1 2 2 2 2 2 1
## 4072 6874 125 11860 2619 6140 4700 12093 7274 11142 6194 11175 7391
## 2 2 2 2 2 2 2 2 2 2 2 2 2
## 1088 4920 9017 4165 2358 2329 7131 2150 10103 4695 4529 4280 756
## 2 2 2 2 2 1 2 2 1 1 2 1 1
## 9596 2845 5268 3353 2702 498 1302 9374 3935 1875 1275 641 883
## 2 2 2 2 2 2 1 1 2 2 2 2 2
## 10967 3761 296 12198 1115 2308 9170 10241 6080 8270 5786 10260 4578
## 2 2 2 2 2 2 1 1 2 1 2 2 2
## 5074 3363 10797 4433 10547 448 10489 4470 4931 12183 4040 5895 7024
## 2 1 2 2 1 1 2 2 2 2 2 2 2
## 9284 3070 3185 4090 11359 6680 7780 3973 2882 2132 7590 3126 4926
## 2 2 2 2 2 2 1 2 2 2 2 2 2
## 7737 10743 8540 8928 11569 10169 12009 5775 1660 8494 2003 5313 2780
## 2 2 2 2 1 2 2 2 2 2 2 1 2
## 9773 919 461 9999 7827 5978 11830 6687 10570 3766 7279 8451 9462
## 1 1 2 1 2 2 2 2 2 1 2 2 2
## 2918 1509 2956 11678 2708 7068 2139 10325 11372 5913 5868 5545 3941
## 2 2 2 1 2 2 2 1 2 2 1 2 2
## 2009 9990 9219 7053 4783 7081 7606 9244 5061 2504 1668 7775 8917
## 2 1 2 2 2 2 1 2 2 2 2 2 1
## 1249 6701 1566 3488 6596 1496 9613 4755 11485 3594 4357 11367 10653
## 2 2 2 2 2 2 2 2 1 1 2 1 2
## 1615 12267 5289 3961 1905 9407 9205 2595 292 6511 2804 9579 1944
## 2 1 2 2 2 2 1 1 2 2 2 2 2
## 10831 8081 8682 7327 8364 9330 2025 11929 2012 116 1168 3475 4681
## 2 2 1 2 2 2 2 1 2 1 2 2 2
## 3338 6673 2493 5755 6011 7397 4081 3239 5006 9002 5567 3182 12036
## 2 2 2 2 1 2 2 1 2 1 2 1 2
## 8037 3681 12317 2790 7994 11626 6565 4723 8995 9811 8566 5232 1844
## 2 2 1 1 2 1 2 2 2 1 1 2 2
## 9338 4662 53 9911 4604 12165 3505 4147 4718 10822 1835 3787 9741
## 2 2 2 2 2 2 2 1 1 1 2 2 1
## 7870 3813 5233 8374 11010 9995 7941 1343 10470 5276 3689 1649 12064
## 2 2 2 2 2 1 2 2 2 2 1 1 1
## 4198 816 10435 11072 2998 836 2828 1228 642 5447 11666 9571 4256
## 2 2 1 2 1 2 2 2 1 2 2 2 2
## 5569 1689 11556 6653 4173 6635 6257 8307 9475 6128 5479 1252 6390
## 2 2 2 2 1 2 1 2 2 2 1 2 2
## 4630 3294 2445 8524 7455 1389 1931 12116 9157 1613 9483 240 11667
## 2 2 2 2 1 1 2 2 2 1 2 2 2
## 8879 11612 11301 9979 7008 10699 10895 11956 6544 2855 3956 4530 6466
## 2 2 2 1 2 2 2 2 1 2 1 2 1
## 8925 679 680 7998 6692 4137 3725 10086 11393 8838 3024 1902 8520
## 2 2 2 1 2 1 2 2 2 2 1 1 2
## 1185 151 11456 8053 976 9784 10552 2279 2327 6354 2496 6266 1380
## 1 2 2 1 1 2 2 1 2 1 1 1 1
## 4644 570 7678 4366 4589 2201 11280 6575 7808 6808 2021 8916 10443
## 2 2 2 2 1 2 2 2 2 2 2 2 1
## 1329 1714 1928 10171 3263 7915 12018 238 8527 4049 5384 7698 6658
## 2 1 2 2 2 1 1 2 2 2 2 2 2
## 10188 9245 8070 11767 11290 778 6212 60 1089 4854 6069 8249 4923
## 2 2 2 1 1 2 1 2 1 2 2 2 2
## 8478 3590 6111 9165 8019 4473 9520 11200 11884 4469 2678 9787 10292
## 2 2 2 2 1 2 2 2 2 2 2 1 2
## 7192 5114 4138 10649 522 4139 8466 3144 84 4599 354 5059 6232
## 2 2 2 2 2 1 2 2 2 2 2 2 2
## 6323 12094 3735 2300 3000 8432 8901 7170 2873 1774 8183 6454 3478
## 2 2 2 2 2 2 2 1 2 2 2 2 2
## 5316 5557 5324 12063 146 8715 9468 2091 672 7004 4554 2643 5053
## 2 1 2 2 2 1 2 2 2 2 2 2 2
## 3072 5926 4051 5438 2737 8340 5976 8868 6537 11573 2960 11726 5067
## 2 2 1 2 2 1 2 2 1 2 2 2 2
## 5448 3224 3609 4240 1098 3974 3125 9735 3623 5828 1768 8266 9253
## 2 2 2 2 1 2 2 2 1 2 2 2 2
## 3554 7792 10297 5103 1505 2903 7372 6215 5924 1864 1257 10212 6659
## 2 2 2 2 2 2 2 1 2 2 2 2 2
## 11272 7796 4343 9231 8895 9343 10581 7647 9061 11927 6066 1721 7729
## 1 2 2 2 1 2 2 2 2 2 1 2 1
## 3057 10084 55 5572 6243 3255 9651 9052 11409 9925 7754 4360 6543
## 2 2 2 2 2 2 1 2 2 2 1 2 2
## 2497 11040 1213 5181 9880 9907 6420 7084 140 7865 10725 9587 11068
## 2 2 1 2 2 1 2 2 1 1 2 2 2
## 1991 9525 963 6739 10165 8705 5998 794 11633 6057 651 6458 27
## 1 2 2 2 1 2 2 2 2 2 2 2 2
## 4546 2116 2481 2031 4097 5618 2142 10238 3882 2369 5640 12023 8943
## 2 2 1 2 2 2 2 1 2 2 2 2 1
## 1526 1074 8069 1677 10126 9151 19 9781 3154 1695 8371 5092 1676
## 2 2 2 2 1 2 2 2 2 2 2 2 1
## 6288 579 7403 6022 4712 1124 8833 4569 10302 5108 8231 9920 2170
## 2 2 2 1 2 2 1 2 1 2 2 1 2
## 1873 9387 10749 1374 2558 4815 11578 2596 11364 3656 10665 2439 10718
## 2 1 1 2 2 2 2 2 2 2 2 2 2
## 11411 12044 10664 5157 7254 11789 2661 4489 6909 2598 7392 5496 4918
## 2 2 2 2 2 2 2 2 2 2 2 1 2
## 2501 4070 10794 7665 9181 9184 133 7343 8329 5822 11059 4642 2422
## 1 2 1 1 2 2 2 2 1 1 2 1 1
## 9024 6196 2247 9823 1682 7174 11011 5008 5011 4715 373 4773 8201
## 2 2 2 2 2 2 2 2 2 2 2 1 1
## 6282 3369 917 9906 2924 10313 9030 8447 5478 11259 6482 10880 1794
## 2 2 2 1 2 1 2 1 2 2 1 2 2
## 5219 5492 4873 5416 6678 11603 11577 10407 6369 3552 3390 9706 2292
## 2 2 1 2 2 2 1 2 1 2 2 1 2
## 2842 10082 9377 6197 8905 3334 1130 11985 1811 9035 5821 10883 1750
## 2 2 2 2 2 2 2 1 2 2 2 2 2
## 7118 8634 4646 4136 2968 4285 3336 5850 8302 7504 9567 4000 5036
## 2 1 2 1 2 2 2 2 1 2 2 1 2
## 2 5531 12096 3184 11161 4935 964 9422 10064 3800 3504 5421 631
## 2 2 2 1 2 2 2 1 2 2 2 2 2
## 5294 5318 5622 11095 3966 10773 2174 4842 4963 1603 7640 8324 7604
## 2 2 2 2 2 2 2 2 2 2 2 1 1
## 8826 10746 7059 72 3836 3054 1090 8253 10910 9305 6294 5537 10635
## 2 1 2 2 2 2 2 1 1 2 2 2 2
## 2240 10785 9286 8690 9185 9989 5450 3907 12274 11464 8016 8534 8313
## 2 2 1 2 2 2 2 2 2 2 2 1 1
## 7673 5303 1004 12074 11360 2406 1057 8343 8379 10617 9354 9361 2063
## 2 2 1 2 1 1 1 2 2 2 1 1 2
## 8497 9676 3936 10765 6631 2857 8301 6868 11527 3005 9370 7364 9618
## 2 2 2 2 2 2 2 2 1 2 2 2 1
## 5371 7165 3141 11481 2071 11255 2978 598 12309 7565 7452 2600 876
## 1 1 2 2 1 2 2 2 2 2 2 2 1
## 4235 615 723 1355 11333 2826 3194 2061 12277 1755 6192 6296 12249
## 2 2 1 2 1 2 2 2 2 2 2 2 2
## 1424 4460 803 5430 3266 1362 10938 8241 7600 8174 11480 10395 348
## 1 2 1 1 2 2 2 2 2 2 1 1 1
## 1322 4330 11184 6072 8757 4161 1733 8195 225 9503 3548 11771 9539
## 2 2 1 2 2 2 2 2 2 2 2 1 1
## 2962 8516 4086 4102 4656 1901 2086 5093 1915 205 320 12127 5571
## 2 2 2 2 2 2 2 2 2 2 1 2 2
## 3799 1910 4140 8143 177 8878 11376 7196 1401 5945 9201 7440 42
## 2 2 2 2 1 1 2 2 2 1 2 2 1
## 3897 2764 6769 4760 4771 7147 4080 3904 375 8442 11166 9614 9007
## 2 2 2 2 2 2 2 1 2 2 2 2 2
## 9862 5050 6010 3223 4899 2883 1664 11276 2087 7561 4234 2849 4495
## 2 2 2 2 2 2 1 2 1 2 2 2 2
## 9037 4770 3216 1011 132 8114 11922 2870 8392 1752 11226 3174 2255
## 1 1 2 2 2 1 2 2 1 2 2 1 2
## 4128 5719 978 2166 3970 4732 10479 8848 8656 5171 6211 11536 2196
## 2 1 2 2 1 2 2 1 2 2 2 1 2
## 10530 7211 3105 563 1848 8526 3171 4297 8591 8782 5836 1396 7956
## 2 1 2 2 2 2 2 2 2 1 2 2 2
## 2264 10473 7649 4654 2853 2000 10281 4089 1085 9803 9631 4563 12045
## 1 2 2 2 2 2 1 2 1 1 2 2 1
## 10411 9048 5082 4467 5239 10116 3639 10608 6883 5325 3847 2844 8729
## 2 1 2 2 2 2 2 2 2 2 2 2 2
## 2534 7119 4612 3364 11178 11441 3779 7448 5736 4943 5244 2577 7892
## 1 2 2 1 2 1 2 2 2 2 1 1 2
## 11653 8474 12123 6290 11487 10354 11284 9195 2995 279 6342 3062 6722
## 2 1 1 1 1 1 2 2 2 2 2 2 2
## 7777 8457 3595 11596 4761 4264 2218 386 1429 6502 11346 8702 9427
## 2 2 2 2 1 2 2 2 2 1 2 2 2
## 7011 4414 10809 2233 6106 5105 2660 9101 905 5424 234 1580 2146
## 2 2 2 2 1 2 2 2 2 2 2 2 2
## 7080 7718 7278 8411 9962 1563 11043 8761 12061 2065 2350 11793 3259
## 2 2 2 2 1 2 1 1 2 2 1 2 2
## 3778 470 6582 10032 4318 12281 8065 12166 11550 2219 4131 2141 3118
## 2 1 1 1 2 1 2 2 2 2 2 2 2
## 10222 2654 5 1691 6501 2118 3059 1642 4941 9154 5852 10272 4890
## 1 2 2 2 2 2 1 2 2 2 2 2 2
## 3097 6585 7741 9213 2976 3023 10102 12159 8213 249 3715 11042 4855
## 1 2 2 1 2 2 1 1 1 2 2 2 2
## 5068 5765 1203 11671 8400 378 1985 12055 10852 2427 7669 3301 1743
## 2 2 1 2 2 2 2 2 2 2 2 1 1
## 9302 5809 1907 4596 8994 5510 6996 7176 5311 11357 10357 5112 11888
## 1 2 2 1 1 2 1 1 1 2 2 1 2
## 736 1113 6496 708 8225 68 11489 1524 9273 10763 5866 8336 5624
## 2 2 1 2 1 2 1 2 1 1 1 2 2
## 3050 2676 4688 12135 2474 3550 8898 9116 4224 8720 9121 2119 6233
## 2 2 2 1 2 1 2 1 2 2 2 2 2
## 9419 126 10157 7484 11083 11217 3574 9972 403 4519 4585 4745 2779
## 1 1 2 2 2 2 1 2 1 2 2 2 2
## 7625 4965 11021 487 12218 661 3825 11614 8652 6270 8198 9110 1747
## 1 2 2 2 2 2 2 1 2 1 2 1 2
## 5906 12152 2947 2405 2820 5863 8821 10712 4871 11 5467 5498 9576
## 1 2 2 2 2 1 2 2 1 1 2 1 1
## 9183 8735 1693 8472 11249 3591 9359 3732 9186 11289 7284 5552 12161
## 2 2 2 2 2 2 2 2 2 1 1 1 1
## 5342 9149 8012 11794 4188 7079 1605 10014 6751 8160 8001 1427 634
## 2 1 2 2 2 2 2 2 2 1 2 2 2
## 6562 3360 6989 10240 11655 5674 12185 5243 4729 7430 3598 9459 7825
## 2 2 1 1 1 1 2 2 1 2 1 2 2
## 11465 1258 4016 7370 9103 6135 1420 9509 12097 7159 7819 10999 1601
## 1 2 2 2 1 2 2 2 1 2 1 2 2
## 8796 12310 10062 10236 10060 587 5189 10914 6718 4886 4359 4808 11982
## 2 2 2 1 1 1 2 2 1 2 2 2 2
## 855 1052 5737 4753 4653 10793 2761 3406 5950 2633 4664 2601 3420
## 2 2 1 1 1 1 2 2 2 2 2 2 2
## 9125 5920 5745 6159 2097 6767 4804 9021 3272 6457 7087 4003 2905
## 2 2 2 2 2 2 2 2 2 2 2 2 2
## 1999 6406 9819 6429 8501 8434 130 7961 3018 1661 4887 6008 5877
## 1 2 1 2 1 1 2 2 2 1 1 2 2
## 4044 3208 11681 1839 1816 5962 4134 8554 9725 2904 3173 4354 6469
## 2 2 2 2 2 2 1 2 2 2 2 2 2
## 2202 9385 5358 1309 1195 1306 584 5536 9384 2216 11933 7498 5127
## 2 1 1 2 1 1 1 2 1 1 1 1 2
## 10057 792 5937 6831 4672 10972 9538 12004 4227 1673 7522 4919 3926
## 1 1 2 2 2 2 2 2 2 2 1 2 2
## 9180 3090 10589 2910 106 8886 7320 559 5132 7497 5677 8547 6633
## 1 2 1 2 1 1 2 1 1 2 2 2 2
## 9697 8247 5230 10962 7214 9188 10289 11567 1586 856 815 8458 8149
## 2 1 2 2 2 1 1 1 2 2 2 2 2
## 2825 10792 10464 6427 5465 11304 11256 11934 4561 9430 1556 79 920
## 2 2 2 2 2 2 1 2 2 2 2 1 1
## 2629 1450 8584 2964 10550 10517 307 10734 11436 108 8133 8060 7424
## 2 2 2 2 2 2 2 1 2 1 1 2 1
## 6278 9230 9434 1384 11251 4903 11903 9529 6182 2752 8094 5629 1014
## 1 1 1 2 2 2 2 1 2 1 1 2 2
## 1232 8355 2613 10377 3993 5739 3089 2729 6054 729 5711 8927 2376
## 1 1 1 2 1 2 2 2 2 1 2 2 2
## 1414 9038 10469 2078 8487 592 1541 5645 10041 11092 2513 10172 11375
## 2 2 2 1 2 1 2 2 1 2 2 1 2
## 2454 5975 2659 8407 8778 2226 1784 6091 4896 12301 11758 12071 3780
## 1 2 2 2 1 2 2 2 1 2 1 2 2
## 6023 11070 3438 9759 7739 8876 7048 10595 9390 258 5007 7548 11986
## 2 1 2 1 2 2 2 2 1 1 1 1 2
## 5723 6070 12085 5789 6274 8771 9199 460 12173 1501
## 2 2 2 2 2 1 1 2 1 2
## Objective function:
## build swap
## 0.2141276 0.2141220
##
## Available components:
## [1] "medoids" "id.med" "clustering" "objective" "isolation"
## [6] "clusinfo" "silinfo" "diss" "call"
The K-means model was computationally fast, its results were easy to interpret, and it handled the large data set without difficulty.
However, it could only cluster continuous data, so it did not account for the 7 categorical feature variables in the data set. Furthermore, the algorithm is sensitive to outliers and to differences in feature scale, hence the data had to be scaled.
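Performance of the K-means model:
Within cluster sum of squares by cluster: [1] 14093.503 9872.964 (between_SS / total_SS = 20.1 %)
This implies that the separation between the two clusters accounts for 20.1% of the total variance in the scaled data; the remaining 79.9% is variation within the clusters.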
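The between_SS / total_SS figure quoted above can be recomputed from the components of a fitted `kmeans` object. The following is a minimal sketch on synthetic data, not the study's data set; the object names `x`, `km` and `ratio` are illustrative assumptions.

```r
# Synthetic stand-in data (an assumption): two numeric features,
# scaled before clustering as in the analysis.
set.seed(42)
x <- scale(rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 3), ncol = 2)
))

km <- kmeans(x, centers = 2, nstart = 10)

# The total sum of squares splits into a within-cluster part and a
# between-cluster part; the reported percentage is the between-cluster share.
stopifnot(isTRUE(all.equal(km$totss, km$tot.withinss + km$betweenss)))
ratio <- km$betweenss / km$totss

cat(sprintf("between_SS / total_SS = %.1f %%\n", 100 * ratio))
```

A higher share means the cluster centres are further apart relative to the spread inside each cluster.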
The hierarchical model could handle both continuous and categorical variables. It was also robust to outliers and to the distribution of the data.
However, it was slower at clustering the data set, and its results were harder to interpret.
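Performance metrics: Objective function: build swap 0.2131155 0.2131155
This implies that, on average, an observation lay at a dissimilarity of about 0.213 from the medoid of its cluster, both after the build step and after the swap step.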
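The component list printed with the model ("medoids", "id.med", "objective", "silinfo", "diss") matches what `cluster::pam()` returns when it is given a dissimilarity object. Assuming that is how the model was fitted, the sketch below shows where the objective function value comes from. The toy data frame, its column names and the choice of Gower dissimilarity are assumptions for illustration only.

```r
library(cluster)

# Toy mixed-type data (an assumption): one numeric and one categorical feature.
set.seed(42)
toy <- data.frame(
  duration = c(rnorm(30, mean = 10), rnorm(30, mean = 50)),
  visitor  = factor(sample(c("New", "Returning"), 60, replace = TRUE))
)

# Gower dissimilarity copes with numeric and categorical columns together.
d  <- daisy(toy, metric = "gower")
pm <- pam(d, k = 2, diss = TRUE)

# Named vector with the objective after the build and swap phases.
print(pm$objective)

# Average dissimilarity of each observation to its own medoid, computed by
# hand; this is expected to agree with the "swap" value printed above.
dm      <- as.matrix(d)
by_hand <- mean(dm[cbind(seq_len(nrow(toy)), pm$id.med[pm$clustering])])
print(by_hand)
```

Lower values indicate tighter clusters, but the number depends on the dissimilarity measure, so it cannot be read as a percentage of the data.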
From this it can be concluded that the hierarchical model works best in segmenting customers for the retailer.
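The two performance figures reported for the models are measured on different scales, so one possible way to put them on an equal footing is the average silhouette width, which the medoid-based output already stores under "silinfo". The sketch below is an illustration on synthetic numeric data so that both algorithms can run on the same input; it assumes the `cluster` package and is not the study's own comparison.

```r
library(cluster)

# Synthetic numeric data (an assumption) so both methods see the same input.
set.seed(42)
x <- scale(rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 3), ncol = 2)
))
d <- dist(x)

km <- kmeans(x, centers = 2, nstart = 10)
pm <- pam(d, k = 2, diss = TRUE)

# Average silhouette width: values closer to 1 mean better-separated clusters.
sil_km <- mean(silhouette(km$cluster, d)[, "sil_width"])
sil_pm <- pm$silinfo$avg.width

cat(sprintf("k-means: %.3f  medoid-based: %.3f\n", sil_km, sil_pm))
```

On the real data, the dissimilarity passed to `silhouette()` would need to be the same one used to fit the medoid-based model for the comparison to be fair.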
The marketing team of Kira Plastinina is advised to adopt the hierarchical model to develop better insights into their customer base.
For this study, the data provided relevant information to meet the objectives set by the entrepreneur.
Yes. Developing a machine learning model that can segment the business's clients will help the business figure out where to place the most effort and get the maximum return.