Items Sale

Bill Clustering

The dataset contains sample bills. Each row records the bill date (nDay), the bill amount (totalBill), the time slot (time), and the day of the week (day).
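The nDay values shown in the output (for example 20171010) look like dates stored as YYYYMMDD integers. That reading is an assumption based on the values, not something the data file states. Under that assumption, a minimal sketch of turning such integers into proper dates could look like this; the two sample values are hypothetical stand-ins copied from the observed range.

```r
# Hypothetical sample values taken from the observed range of nDay
nDay <- c(20171010L, 20171018L)

# Convert each integer to text, then parse it as year-month-day
billDate <- as.Date(as.character(nDay), format = "%Y%m%d")

print(billDate)

# Number of days between the first and last sample date
daysCovered <- as.numeric(diff(billDate))
print(daysCovered)
```

With real dates in place, bills can be grouped by day or week without treating nDay as an ordinary number.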

#sampleData

library(dplyr)  # provides %>% and select()

newData<-read.csv(file = "/home/sushil/Desktop/myWork/PVR/FNB/fnbDummy.csv",sep=",")
# Keep only the bill amount for clustering
dat<-newData %>% select(totalBill)

#Inspect data structure
str(newData)

## 'data.frame':    157 obs. of  4 variables:
##  $ nDay     : int  20171010 20171010 20171010 20171010 20171010 20171010 20171010 20171010 20171010 20171010 ...
##  $ totalBill: int  500 600 6000 550 1500 6500 5450 5000 555 545 ...
##  $ time     : Factor w/ 5 levels "breakfast","dinner",..: 3 3 2 3 2 2 4 4 5 5 ...
##  $ day      : Factor w/ 7 levels "friday","monday",..: 6 7 5 1 3 4 2 6 7 5 ...

#Summarise data
summary(newData)

##       nDay            totalBill           time           day    
##  Min.   :20171010   Min.   : 150   breakfast:10   friday   :30  
##  1st Qu.:20171010   1st Qu.: 565   dinner   :30   monday   :12  
##  Median :20171010   Median :1300   evening  :68   saturday :44  
##  Mean   :20171010   Mean   :2366   lunch    :16   sunday   :30  
##  3rd Qu.:20171010   3rd Qu.:4800   walkin   :33   thursday :12  
##  Max.   :20171018   Max.   :6654                  tuesday  :18  
##                                                   wedensday:11

#newData<-read.csv(file = "/home/sushil/Desktop/myWork/PVR/FNB/FNBTransactions1.csv",sep=",")
#dat<-newData %>% select(total)
#dat<-dat %>% mutate(totalBill=total)




km<-kmeans(dat,3)
dat$cluster<-as.character(km$cluster)
convertTable<-table(km$cluster,dat$totalBill)

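# Plot each bill by row index, coloured by its cluster
plot(dat$totalBill,col=km$cluster)
# Draw each of the three cluster centres as a dashed horizontal line in the matching colour
abline(h = km$centers[, 1], col = 1:3, lty = 2)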

cluster1<-dat %>% filter(cluster==1)
cluster2<-dat %>% filter(cluster==2)
cluster3<-dat %>% filter(cluster==3)

minCluster1<-min(cluster1$totalBill)
minCluster2<-min(cluster2$totalBill)
minCluster3<-min(cluster3$totalBill)
maxCluster1<-max(cluster1$totalBill)
maxCluster2<-max(cluster2$totalBill)
maxCluster3<-max(cluster3$totalBill)
dataFrame<-data.frame(minCluster1,maxCluster1,minCluster2,maxCluster2,minCluster3,maxCluster3)

clusterSize<-km$size

print(clusterSize)

## [1] 77 31 49

print(dataFrame)

##   minCluster1 maxCluster1 minCluster2 maxCluster2 minCluster3 maxCluster3
## 1         150        1200        1300        3000        4000        6654
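The per-cluster minimum and maximum above are built one variable at a time. As a sketch of a more compact alternative, the same ranges can be computed for all clusters at once with base R's tapply. The data frame bills and its values are hypothetical, chosen only to mirror the ranges printed above; clusterRanges is likewise an illustrative name.

```r
# Hypothetical bills with cluster labels already assigned
bills <- data.frame(
  totalBill = c(150, 500, 1200, 1300, 3000, 4000, 6654),
  cluster   = c("1", "1", "1", "2", "2", "3", "3")
)

# One row per cluster: smallest bill, largest bill, and number of bills
clusterRanges <- data.frame(
  cluster = sort(unique(bills$cluster)),
  minBill = as.vector(tapply(bills$totalBill, bills$cluster, min)),
  maxBill = as.vector(tapply(bills$totalBill, bills$cluster, max)),
  nBills  = as.vector(table(bills$cluster))
)

print(clusterRanges)
```

This layout keeps one row per cluster, so it does not need new variables if the number of clusters changes.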

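Conclusion

In the run printed above, the largest cluster holds 77 of the 157 bills, with amounts from 150 to 1200. The second largest holds 49 bills, with amounts from 4000 to 6654. The remaining 31 bills lie between 1300 and 3000. The minimum bill amount is Rs 150 and the maximum bill amount is Rs 6654.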

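Cluster sizes we can use from here: cluster 1 has 108 bills, cluster 2 has 34, and cluster 3 has 15. These sizes apparently come from a different run than the one printed above (77, 31, 49).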
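Because kmeans starts from randomly chosen centers, cluster sizes and labels can change from one run to the next. A minimal sketch of making a run repeatable is to fix the random seed just before the call and to try several random starts with nstart. The bill amounts below are hypothetical random values, not the original data, and the seed values are arbitrary.

```r
# Hypothetical bill amounts in three loose groups (not the original data)
set.seed(1)
bills <- c(runif(40, 150, 1200), runif(15, 1300, 3000), runif(25, 4000, 6654))

# Fixing the seed right before kmeans makes the random starting centers repeatable;
# nstart = 25 tries 25 random starts and keeps the best result
set.seed(123)
km1 <- kmeans(bills, centers = 3, nstart = 25)
set.seed(123)
km2 <- kmeans(bills, centers = 3, nstart = 25)

# Same seed, same data, same settings: the assignments match exactly
sameResult <- identical(km1$cluster, km2$cluster)
print(sameResult)

# Cluster sizes listed from the lowest center to the highest
sizesByCenter <- km1$size[order(km1$centers[, 1])]
print(sizesByCenter)
```

Ordering the sizes by center value avoids relying on the cluster numbers, which kmeans assigns arbitrarily.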