I am a business analyst for an e-commerce company, and I'd like to analyze the behaviour of customers who visited our company website. The objective is to observe and analyze online shoppers' purchase intention by combining PCA and k-means clustering.
Data Set Information:
The dataset consists of feature vectors belonging to 12,330 sessions. The dataset was formed so that each session would belong to a different user in a 1-year period to avoid any tendency to a specific campaign, special day, user profile, or period.
https://archive.ics.uci.edu/ml/datasets/Online+Shoppers+Purchasing+Intention+Dataset
The dataset consists of 10 numerical and 8 categorical attributes:
The ‘Revenue’ attribute can be used as the class label.
“Administrative”, “Administrative Duration”, “Informational”, “Informational Duration”, “Product Related” and “Product Related Duration” represent the number of different types of pages visited by the visitor in that session and total time spent in each of these page categories.
The values of these features are derived from the URL information of the pages visited by the user and updated in real time when a user takes an action, e.g. moving from one page to another.
“Bounce Rate”, “Exit Rate” and “Page Value” features represent the metrics measured by “Google Analytics” for each page in the e-commerce site.
The value of “Bounce Rate” feature for a web page refers to the percentage of visitors who enter the site from that page and then leave (“bounce”) without triggering any other requests to the analytics server during that session.
The value of “Exit Rate” feature for a specific web page is calculated as for all pageviews to the page, the percentage that were the last in the session.
The “Page Value” feature represents the average value for a web page that a user visited before completing an e-commerce transaction.
The “Special Day” feature indicates the closeness of the site visiting time to a specific special day (e.g. Mother’s Day, Valentine’s Day) in which the sessions are more likely to be finalized with transaction. The value of this attribute is determined by considering the dynamics of e-commerce such as the duration between the order date and delivery date. For example, for Valentine’s day, this value takes a nonzero value between February 2 and February 12, zero before and after this date unless it is close to another special day, and its maximum value of 1 on February 8.
The dataset also includes operating system, browser, region, traffic type, visitor type as returning or new visitor, a Boolean value indicating whether the date of the visit is weekend, and month of the year.
In this analysis I only check the clusters against two categorical attributes: Revenue and VisitorType.
# load the library
library(tidyverse)
## -- Attaching packages ------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.1.1 v purrr 0.3.2
## v tibble 2.1.1 v dplyr 0.8.0.1
## v tidyr 0.8.3 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## -- Conflicts ---------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(FactoMineR)
library(factoextra)
## Welcome! Related Books: `Practical Guide To Cluster Analysis in R` at https://goo.gl/13EFCZ
online <- read.csv("data/online_shoppers_intention.csv")
head(online)
## Administrative Administrative_Duration Informational
## 1 0 0 0
## 2 0 0 0
## 3 0 0 0
## 4 0 0 0
## 5 0 0 0
## 6 0 0 0
## Informational_Duration ProductRelated ProductRelated_Duration
## 1 0 1 0.000000
## 2 0 2 64.000000
## 3 0 1 0.000000
## 4 0 2 2.666667
## 5 0 10 627.500000
## 6 0 19 154.216667
## BounceRates ExitRates PageValues SpecialDay Month OperatingSystems
## 1 0.20000000 0.2000000 0 0 Feb 1
## 2 0.00000000 0.1000000 0 0 Feb 2
## 3 0.20000000 0.2000000 0 0 Feb 4
## 4 0.05000000 0.1400000 0 0 Feb 3
## 5 0.02000000 0.0500000 0 0 Feb 3
## 6 0.01578947 0.0245614 0 0 Feb 2
## Browser Region TrafficType VisitorType Weekend Revenue
## 1 1 1 1 Returning_Visitor FALSE FALSE
## 2 2 1 2 Returning_Visitor FALSE FALSE
## 3 1 9 3 Returning_Visitor FALSE FALSE
## 4 2 2 4 Returning_Visitor FALSE FALSE
## 5 3 1 4 Returning_Visitor TRUE FALSE
## 6 2 1 3 Returning_Visitor FALSE FALSE
str(online)
## 'data.frame': 12330 obs. of 18 variables:
## $ Administrative : int 0 0 0 0 0 0 0 1 0 0 ...
## $ Administrative_Duration: num 0 0 0 0 0 0 0 0 0 0 ...
## $ Informational : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Informational_Duration : num 0 0 0 0 0 0 0 0 0 0 ...
## $ ProductRelated : int 1 2 1 2 10 19 1 0 2 3 ...
## $ ProductRelated_Duration: num 0 64 0 2.67 627.5 ...
## $ BounceRates : num 0.2 0 0.2 0.05 0.02 ...
## $ ExitRates : num 0.2 0.1 0.2 0.14 0.05 ...
## $ PageValues : num 0 0 0 0 0 0 0 0 0 0 ...
## $ SpecialDay : num 0 0 0 0 0 0 0.4 0 0.8 0.4 ...
## $ Month : Factor w/ 10 levels "Aug","Dec","Feb",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ OperatingSystems : int 1 2 4 3 3 2 2 1 2 2 ...
## $ Browser : int 1 2 1 2 3 2 4 2 2 4 ...
## $ Region : int 1 1 9 2 1 1 3 1 2 1 ...
## $ TrafficType : int 1 2 3 4 4 3 3 5 3 2 ...
## $ VisitorType : Factor w/ 3 levels "New_Visitor",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ Weekend : logi FALSE FALSE FALSE FALSE TRUE FALSE ...
## $ Revenue : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
summary(online)
## Administrative Administrative_Duration Informational
## Min. : 0.000 Min. : 0.00 Min. : 0.0000
## 1st Qu.: 0.000 1st Qu.: 0.00 1st Qu.: 0.0000
## Median : 1.000 Median : 7.50 Median : 0.0000
## Mean : 2.315 Mean : 80.82 Mean : 0.5036
## 3rd Qu.: 4.000 3rd Qu.: 93.26 3rd Qu.: 0.0000
## Max. :27.000 Max. :3398.75 Max. :24.0000
##
## Informational_Duration ProductRelated ProductRelated_Duration
## Min. : 0.00 Min. : 0.00 Min. : 0.0
## 1st Qu.: 0.00 1st Qu.: 7.00 1st Qu.: 184.1
## Median : 0.00 Median : 18.00 Median : 598.9
## Mean : 34.47 Mean : 31.73 Mean : 1194.8
## 3rd Qu.: 0.00 3rd Qu.: 38.00 3rd Qu.: 1464.2
## Max. :2549.38 Max. :705.00 Max. :63973.5
##
## BounceRates ExitRates PageValues SpecialDay
## Min. :0.000000 Min. :0.00000 Min. : 0.000 Min. :0.00000
## 1st Qu.:0.000000 1st Qu.:0.01429 1st Qu.: 0.000 1st Qu.:0.00000
## Median :0.003112 Median :0.02516 Median : 0.000 Median :0.00000
## Mean :0.022191 Mean :0.04307 Mean : 5.889 Mean :0.06143
## 3rd Qu.:0.016813 3rd Qu.:0.05000 3rd Qu.: 0.000 3rd Qu.:0.00000
## Max. :0.200000 Max. :0.20000 Max. :361.764 Max. :1.00000
##
## Month OperatingSystems Browser Region
## May :3364 Min. :1.000 Min. : 1.000 Min. :1.000
## Nov :2998 1st Qu.:2.000 1st Qu.: 2.000 1st Qu.:1.000
## Mar :1907 Median :2.000 Median : 2.000 Median :3.000
## Dec :1727 Mean :2.124 Mean : 2.357 Mean :3.147
## Oct : 549 3rd Qu.:3.000 3rd Qu.: 2.000 3rd Qu.:4.000
## Sep : 448 Max. :8.000 Max. :13.000 Max. :9.000
## (Other):1337
## TrafficType VisitorType Weekend Revenue
## Min. : 1.00 New_Visitor : 1694 Mode :logical Mode :logical
## 1st Qu.: 2.00 Other : 85 FALSE:9462 FALSE:10422
## Median : 2.00 Returning_Visitor:10551 TRUE :2868 TRUE :1908
## Mean : 4.07
## 3rd Qu.: 4.00
## Max. :20.00
##
If we extract our principal components directly from the raw data above, the result is not going to be useful. Thinking of PCA as a variance-maximizing exercise makes this clearer: when we run PCA on the unscaled data, the amount of variance explained by the different principal components is dominated by the variables measured on the largest ranges, such as the duration columns.
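The effect of scaling can be sketched on a small built-in dataset. This is only an illustration: `USArrests` stands in for the shoppers data (an assumption made so that the snippet runs without the CSV file), and `prop_var` is a helper name introduced here.

```r
# Illustration only: USArrests is a stand-in for the shoppers data.
pca_raw    <- prcomp(USArrests, scale. = FALSE)  # variables kept on their original ranges
pca_scaled <- prcomp(USArrests, scale. = TRUE)   # variables standardised to unit variance

# Hypothetical helper: proportion of variance explained by each component
prop_var <- function(p) p$sdev^2 / sum(p$sdev^2)

round(prop_var(pca_raw), 3)     # PC1 absorbs almost all of the variance
round(prop_var(pca_scaled), 3)  # variance is spread more evenly across the PCs

# The per-variable variances show which column dominates the unscaled PCA
sapply(USArrests, var)
```

Without scaling, the first component mostly reproduces the variable with the largest variance, which is why the analysis here uses `scale()` or `scale.unit = T` before extracting components.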
online_small <- online[1:100, 1:10]
biplot(prcomp(online_small, scale. = TRUE), cex = 0.8)
Use another theme - function fancy_biplot:
source("biplot.R")
fancy_biplot(prcomp(online_small, scale. = TRUE))
# We would like to analyze 4 observations
data.frame(online[c(30,58,67,77),])
## Administrative Administrative_Duration Informational
## 30 1 6.000 1
## 58 4 56.000 2
## 67 4 44.000 0
## 77 10 1005.667 0
## Informational_Duration ProductRelated ProductRelated_Duration
## 30 0 45 1582.7500
## 58 120 36 998.7417
## 67 0 90 6951.9722
## 77 0 36 2111.3417
## BounceRates ExitRates PageValues SpecialDay Month OperatingSystems
## 30 0.043478261 0.05082126 54.17976 0.4 Feb 3
## 58 0.000000000 0.01473647 19.44708 0.2 Feb 2
## 67 0.002150538 0.01501303 0.00000 0.0 Feb 4
## 77 0.004347826 0.01449275 11.43941 0.0 Feb 2
## Browser Region TrafficType VisitorType Weekend Revenue
## 30 2 1 1 Returning_Visitor FALSE FALSE
## 58 2 4 1 Returning_Visitor FALSE FALSE
## 67 1 1 3 Returning_Visitor FALSE FALSE
## 77 6 1 2 Returning_Visitor FALSE TRUE
Based on the biplot, we can conclude: observation 58 has high Informational and Informational_Duration values, and observation 30 is fairly similar to it. Observation 67 has a high ProductRelated_Duration; similarly, observation 77 has high Administrative_Duration and ProductRelated_Duration.
So far we have only used a small subset of 'online'; next we will use all the data.
onlineNum <- online[,1:10]
onlineZ <- scale(onlineNum, center = T, scale = T)
pr <- prcomp(onlineZ)
summary(pr)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 1.844 1.2943 1.0350 1.0054 0.97009 0.96287 0.6496
## Proportion of Variance 0.340 0.1675 0.1071 0.1011 0.09411 0.09271 0.0422
## Cumulative Proportion 0.340 0.5076 0.6147 0.7158 0.80987 0.90258 0.9448
## PC8 PC9 PC10
## Standard deviation 0.59301 0.35055 0.27858
## Proportion of Variance 0.03517 0.01229 0.00776
## Cumulative Proportion 0.97995 0.99224 1.00000
plot(pr, type = "l")
Based on the summary and the elbow method, we decide how many PCs are needed to represent the data, and hence which values of k to test:
- up to PC5, the cumulative proportion is already quite good: 0.80987
- using the elbow method, there are no significant changes after PC3. Considering both the variance and the cumulative proportion, we will test all the possibilities k = 3 to 5.
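The cumulative-proportion reading can be reproduced from the standard deviations printed by `summary(pr)`. The sketch below copies those values by hand; the 0.80 cutoff is an assumption chosen to match the "quite good" level mentioned in the list, not a fixed rule.

```r
# Standard deviations copied from the summary(pr) output
sdev <- c(1.844, 1.2943, 1.0350, 1.0054, 0.97009,
          0.96287, 0.6496, 0.59301, 0.35055, 0.27858)

# Proportion of variance per component and its running total
prop     <- sdev^2 / sum(sdev^2)
cum_prop <- cumsum(prop)
round(cum_prop, 4)

# Smallest number of PCs whose cumulative proportion reaches the cutoff
threshold <- 0.80  # assumed cutoff
n_pcs <- which(cum_prop >= threshold)[1]
n_pcs
```

With the fitted object available, `pr$sdev` can replace the hand-copied vector.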
Before we move on to k-means clustering, we would like to use another function to show the PCA. To use the PCA function, we need to define the qualitative variables as factors.
online <- online %>%
  mutate(
    Weekend = as.factor(Weekend),
    Revenue = as.factor(Revenue),
    OperatingSystems = as.factor(OperatingSystems),
    Browser = as.factor(Browser),
    Region = as.factor(Region),
    TrafficType = as.factor(TrafficType)
  )
prOnlineFacto <- PCA(online, quali.sup = c(11:18), scale.unit = T, graph = F)
plot(prOnlineFacto)
Figure: PCA using quali sup
plot.PCA(prOnlineFacto, choix = "var")
plot.PCA(prOnlineFacto, choix = "ind", habillage = 18, select = "contrib 10", invisible = "quali")
Figure: PCA using quali sup
online_pca <- PCA(online, quali.sup = c(11:18), graph = F, scale.unit = T)
plot.PCA(online_pca, choix = "var")
plot.PCA(online_pca, choix = "ind", habillage = 18, select = "contrib 5", invisible = "quali")
Figure: PCA using quali sup
data.frame(online[c(5153,10641),])
## Administrative Administrative_Duration Informational
## 5153 17 2629.254 24
## 10641 22 1153.682 3
## Informational_Duration ProductRelated ProductRelated_Duration
## 5153 2050.433 705 43171.233
## 10641 108.000 205 4295.305
## BounceRates ExitRates PageValues SpecialDay Month OperatingSystems
## 5153 0.004851285 0.015431438 0.763829 0 May 2
## 10641 0.001746725 0.008801049 177.528825 0 Nov 2
## Browser Region TrafficType VisitorType Weekend Revenue
## 5153 2 1 14 Returning_Visitor TRUE FALSE
## 10641 5 3 3 Returning_Visitor TRUE FALSE
Observation 5153 has high Informational_Duration, ProductRelated_Duration and Administrative_Duration values. Observation 10641 has a low Informational value.
As stated above, we will now look for the best k.
set.seed(100)
# k-means with 3 clusters
online_km <- kmeans(onlineZ, 3) # compare with the elbow method
online$clust <- as.factor(online_km$cluster)
head(online)
## Administrative Administrative_Duration Informational
## 1 0 0 0
## 2 0 0 0
## 3 0 0 0
## 4 0 0 0
## 5 0 0 0
## 6 0 0 0
## Informational_Duration ProductRelated ProductRelated_Duration
## 1 0 1 0.000000
## 2 0 2 64.000000
## 3 0 1 0.000000
## 4 0 2 2.666667
## 5 0 10 627.500000
## 6 0 19 154.216667
## BounceRates ExitRates PageValues SpecialDay Month OperatingSystems
## 1 0.20000000 0.2000000 0 0 Feb 1
## 2 0.00000000 0.1000000 0 0 Feb 2
## 3 0.20000000 0.2000000 0 0 Feb 4
## 4 0.05000000 0.1400000 0 0 Feb 3
## 5 0.02000000 0.0500000 0 0 Feb 3
## 6 0.01578947 0.0245614 0 0 Feb 2
## Browser Region TrafficType VisitorType Weekend Revenue clust
## 1 1 1 1 Returning_Visitor FALSE FALSE 1
## 2 2 1 2 Returning_Visitor FALSE FALSE 1
## 3 1 9 3 Returning_Visitor FALSE FALSE 1
## 4 2 2 4 Returning_Visitor FALSE FALSE 1
## 5 3 1 4 Returning_Visitor TRUE FALSE 1
## 6 2 1 3 Returning_Visitor FALSE FALSE 1
online_km$centers
## Administrative Administrative_Duration Informational
## 1 -0.2332921 -0.1986104 -0.2461845
## 2 -0.4091124 -0.3068585 -0.2469736
## 3 1.4958138 1.2493741 1.4711601
## Informational_Duration ProductRelated ProductRelated_Duration
## 1 -0.1939953 -0.2395940 -0.2202607
## 2 -0.1851529 -0.1616858 -0.1940331
## 3 1.1537882 1.3860745 1.3005984
## BounceRates ExitRates PageValues SpecialDay
## 1 0.03171517 0.05064445 -0.01870258 -0.2916209
## 2 0.27828636 0.38150678 -0.21784532 3.0961874
## 3 -0.33269469 -0.49474113 0.22740737 -0.2257802
online_km$iter
## [1] 3
plot.PCA(online_pca, choix = c("ind"), label = "none", col.ind = online$clust) # choix = individuals
legend("topright", levels(online$clust), pch = 19, col = seq_along(levels(online$clust)))
Figure: PCA result for k = 3
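Since the aim is to check the clusters against Revenue and VisitorType, a cross-tabulation of cluster labels against a known category is a natural next step. The sketch below uses the built-in `iris` data as a stand-in (an assumption, so that it runs without the CSV file); `nstart = 25` is also an added choice, not something the analysis above uses.

```r
set.seed(100)

# Stand-in data: four numeric columns, standardised as onlineZ is above
x  <- scale(iris[, 1:4])
km <- kmeans(x, centers = 3, nstart = 25)

# Counts of each known label inside every cluster
tab <- table(cluster = km$cluster, label = iris$Species)
tab

# Row-wise proportions: composition of each cluster
round(prop.table(tab, margin = 1), 2)
```

On the shoppers data the corresponding calls would be `table(online$clust, online$Revenue)` and `table(online$clust, online$VisitorType)`.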
Check the Elbow using wss function:
wss <- function(data, maxCluster = 10) {
  # For k = 1 the within-group sum of squares equals the total sum of squares
  SSw <- numeric(maxCluster)
  SSw[1] <- (nrow(data) - 1) * sum(apply(data, 2, var))
  # For k >= 2, sum the within-cluster sums of squares returned by kmeans
  for (i in 2:maxCluster) {
    SSw[i] <- sum(kmeans(data, centers = i)$withinss)
  }
  plot(1:maxCluster, SSw, type = "o", xlab = "Number of Clusters", ylab = "Within groups sum of squares", pch = 18)
}
wss(onlineZ) # elbow (wss) method
There is another method to check the best k:
fviz_nbclust(onlineZ, kmeans, method = "silhouette") # silhouette method
Following the elbow from the wss function, we continue by checking k = 5:
online_km5 <- kmeans(onlineZ, 5)
online_km5$clust <- as.factor(online_km5$cluster)
plot.PCA(online_pca, choix = c("ind"), label = "none", col.ind = online_km5$clust) # choix = individuals
legend("topright", levels(online_km5$clust), pch = 19, col = seq_along(levels(online_km5$clust)))
Figure: PCA result for k = 5
online_km5$centers
## Administrative Administrative_Duration Informational
## 1 1.310908346 0.99926824 0.37985074
## 2 -0.392526858 -0.30361589 -0.25154885
## 3 1.418098310 1.04250027 2.85570065
## 4 -0.008206934 -0.02623911 -0.09066038
## 5 -0.687222206 -0.45074395 -0.38881809
## Informational_Duration ProductRelated ProductRelated_Duration
## 1 0.05981531 0.55813884 0.47070281
## 2 -0.19196798 -0.24510112 -0.22695383
## 3 3.10802386 2.40576960 2.39539497
## 4 -0.11178168 -0.03912234 -0.01991354
## 5 -0.24492057 -0.65422899 -0.60036955
## BounceRates ExitRates PageValues SpecialDay
## 1 -0.3270197 -0.4801483 0.004715737 -0.15658862
## 2 -0.2318675 -0.1314659 -0.224938955 0.05376042
## 3 -0.3168492 -0.4735970 0.045631040 -0.16623220
## 4 -0.4010185 -0.5847576 3.498575152 -0.24543338
## 5 3.2443481 2.9667157 -0.317164982 0.17710109
online_km5$iter
## [1] 6
The values above suggest that k = 4 may be better than k = 5, so we try it below:
online_km4 <- kmeans(onlineZ, 4)
online_km4$clust <- as.factor(online_km4$cluster)
plot.PCA(online_pca, choix = c("ind"), label = "none", col.ind = online_km4$clust) # choix = individuals
legend("topright", levels(online_km4$clust), pch = 19, col = 1:4)
Figure: PCA result for k = 4
online_km4$centers
## Administrative Administrative_Duration Informational
## 1 1.4274242 1.0307384 2.7527566
## 2 -0.3908723 -0.3023091 -0.2520712
## 3 1.1999391 0.9131627 0.3178291
## 4 -0.6832144 -0.4490826 -0.3842200
## Informational_Duration ProductRelated ProductRelated_Duration
## 1 2.88843399 2.3721430 2.3532959
## 2 -0.19271784 -0.2376723 -0.2187825
## 3 0.03761746 0.4683853 0.3914884
## 4 -0.24429087 -0.6477109 -0.5973185
## BounceRates ExitRates PageValues SpecialDay
## 1 -0.3142597 -0.4705710 0.08422691 -0.15438352
## 2 -0.2524249 -0.1683578 -0.13622633 0.04090515
## 3 -0.3395200 -0.5021188 0.54799035 -0.18223570
## 4 3.0227322 2.8468297 -0.31716498 0.21394340
online_km4$iter
## [1] 5
Using $iter, we see that k-means with k = 4 took only 5 iterations to converge: by then it had already identified 4 sufficiently distinct clusters, and further iterations would not improve the result.
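The iteration count describes a single run from one random start, so a different seed can converge to a different solution. A minimal sketch of how to inspect this, again on stand-in data (`iris`, an assumption) and with `nstart` and `iter.max` values chosen only for illustration:

```r
set.seed(100)
x <- scale(iris[, 1:4])  # stand-in for onlineZ

# One random start versus the best of 25 random starts
km_single <- kmeans(x, centers = 4, nstart = 1)
km_multi  <- kmeans(x, centers = 4, nstart = 25, iter.max = 50)

# Iterations used and total within-cluster sum of squares for each fit
c(iter = km_single$iter, tot_withinss = km_single$tot.withinss)
c(iter = km_multi$iter,  tot_withinss = km_multi$tot.withinss)

# The total sum of squares splits into within and between parts
km_multi$totss
km_multi$tot.withinss + km_multi$betweenss
```

Comparing `tot.withinss` across several starts gives a rough sense of how stable a given k is before interpreting its centers.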
fviz_screeplot(online_pca, addlabels = TRUE, ylim = c(0, 50))
var_pca <- get_pca_var(online_pca)
var_pca
## Principal Component Analysis Results for variables
## ===================================================
## Name Description
## 1 "$coord" "Coordinates for the variables"
## 2 "$cor" "Correlations between variables and dimensions"
## 3 "$cos2" "Cos2 for the variables"
## 4 "$contrib" "contributions of the variables"
head(var_pca$coord)
## Dim.1 Dim.2 Dim.3 Dim.4
## Administrative 0.7050389 0.06793628 0.2645375 0.3110102
## Administrative_Duration 0.6069612 0.13871617 0.3323191 0.3665404
## Informational 0.6410573 0.36478844 0.1564255 -0.4746090
## Informational_Duration 0.5454962 0.39458007 0.1462845 -0.6015172
## ProductRelated 0.7588367 0.19252869 -0.4088971 0.2489076
## ProductRelated_Duration 0.7624212 0.24605368 -0.3753875 0.2159479
## Dim.5
## Administrative -0.287321899
## Administrative_Duration -0.378159791
## Informational -0.027365516
## Informational_Duration 0.002172933
## ProductRelated 0.272972429
## ProductRelated_Duration 0.269586543
head(var_pca$contrib)
## Dim.1 Dim.2 Dim.3 Dim.4
## Administrative 14.618345 0.2755127 6.532303 9.569755
## Administrative_Duration 10.834125 1.1486620 10.308663 13.292155
## Informational 12.085531 7.9436515 2.284058 22.285560
## Informational_Duration 8.750957 9.2941218 1.997507 35.797086
## ProductRelated 16.934356 2.2127327 15.607020 6.129542
## ProductRelated_Duration 17.094719 3.6140803 13.153804 4.613704
## Dim.5
## Administrative 8.772263e+00
## Administrative_Duration 1.519585e+01
## Informational 7.957589e-02
## Informational_Duration 5.017262e-04
## ProductRelated 7.917932e+00
## ProductRelated_Duration 7.722726e+00
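The coordinates and contributions above are linked: for a scaled PCA, a variable's coordinate on a dimension is its correlation with that component, and its contribution is the squared coordinate as a percentage of the column total. The sketch below rebuilds both with base R on `USArrests` (stand-in data, an assumption) and then checks the relation against the Administrative value printed above; `pr_demo`, `coord`, `contrib` and `admin_dim1` are names introduced here.

```r
# Stand-in data; scale. = TRUE mirrors scale.unit = T in PCA()
pr_demo <- prcomp(USArrests, scale. = TRUE)

# Variable coordinates: loadings multiplied by each component's standard deviation
coord <- sweep(pr_demo$rotation, 2, pr_demo$sdev, "*")

# Contributions: squared coordinates as a percentage of each column total
contrib <- sweep(coord^2, 2, colSums(coord^2), "/") * 100
round(contrib, 2)
colSums(contrib)

# Same relation using the printed values: Administrative on Dim.1,
# with the PC1 standard deviation taken from summary(pr)
admin_dim1 <- 0.7050389^2 / 1.844^2 * 100
admin_dim1
```

The last value agrees with the 14.618345 reported for Administrative on Dim.1, up to rounding of the printed standard deviation.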
# Graph of variables: default plot
fviz_pca_var(online_pca, col.var = "black")
# Control variable colors using their contributions
fviz_pca_var(online_pca, col.var = "contrib",
  gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
  repel = TRUE # Avoid text overlapping
)
# Contributions of variables to PC1
fviz_contrib(online_pca, choice = "var", axes = 1, top = 10)
# Contributions of variables to PC2
fviz_contrib(online_pca, choice = "var", axes = 2, top = 10)
# Note: get_pca_var() is called again here, so the output below repeats the variable results;
# get_pca_ind(online_pca) would return the results for individuals instead
ind_pca <- get_pca_var(online_pca)
ind_pca
## Principal Component Analysis Results for variables
## ===================================================
## Name Description
## 1 "$coord" "Coordinates for the variables"
## 2 "$cor" "Correlations between variables and dimensions"
## 3 "$cos2" "Cos2 for the variables"
## 4 "$contrib" "contributions of the variables"
head(ind_pca$coord)
## Dim.1 Dim.2 Dim.3 Dim.4
## Administrative 0.7050389 0.06793628 0.2645375 0.3110102
## Administrative_Duration 0.6069612 0.13871617 0.3323191 0.3665404
## Informational 0.6410573 0.36478844 0.1564255 -0.4746090
## Informational_Duration 0.5454962 0.39458007 0.1462845 -0.6015172
## ProductRelated 0.7588367 0.19252869 -0.4088971 0.2489076
## ProductRelated_Duration 0.7624212 0.24605368 -0.3753875 0.2159479
## Dim.5
## Administrative -0.287321899
## Administrative_Duration -0.378159791
## Informational -0.027365516
## Informational_Duration 0.002172933
## ProductRelated 0.272972429
## ProductRelated_Duration 0.269586543
head(ind_pca$contrib)
## Dim.1 Dim.2 Dim.3 Dim.4
## Administrative 14.618345 0.2755127 6.532303 9.569755
## Administrative_Duration 10.834125 1.1486620 10.308663 13.292155
## Informational 12.085531 7.9436515 2.284058 22.285560
## Informational_Duration 8.750957 9.2941218 1.997507 35.797086
## ProductRelated 16.934356 2.2127327 15.607020 6.129542
## ProductRelated_Duration 17.094719 3.6140803 13.153804 4.613704
## Dim.5
## Administrative 8.772263e+00
## Administrative_Duration 1.519585e+01
## Informational 7.957589e-02
## Informational_Duration 5.017262e-04
## ProductRelated 7.917932e+00
## ProductRelated_Duration 7.722726e+00
# Graph of individuals
# 1. Use repel = TRUE to avoid overplotting
# 2. Control automatically the color of individuals using the cos2
# cos2 = the quality of the individuals on the factor map
# Use points only
# 3. Use gradient color
fviz_pca_ind(online_pca, col.ind = "cos2",
gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
repel = TRUE # Avoid text overlapping (slow if many points)
)
Figure: Fviz result
Clustering based on Revenue:
fviz_pca_ind(online_pca,
label = "none", # hide individual labels
habillage = online$Revenue, # color by groups
palette = c("#00AFBB", "#E7B800", "#FC4E07"),
addEllipses = TRUE # Concentration ellipses
)
Clustering based on Visitor type:
fviz_pca_ind(online_pca,
label = "none", # hide individual labels
habillage = online$VisitorType, # color by groups
palette = c("#00AFBB", "#E7B800", "#FC4E07"),
addEllipses = TRUE # Concentration ellipses
)
Based on the methods above, we can conclude: