K-Means Clustering

Set the working directory to the folder that holds your data files (adjust the path below to your own location).

getwd()

## [1] "E:/R ALGORITHM/UBCF(User-based collaborative filtering)"

setwd("E:/R ALGORITHM/UBCF(User-based collaborative filtering)")

Read the CSV file; header=TRUE uses the first row as the column names.

Age_income<-read.csv("Age_income.csv",header=TRUE)
print(Age_income)

##     Age Income
## 1    50   1273
## 2    24   1591
## 3    39   1680
## 4    46   1179
## 5    38   1107
## 6    42   1286
## 7    23   1328
## 8    36   1526
## 9    34   1188
## 10   28   1227
## 11   29   1738
## 12   45   1194
## 13   30   1834
## 14   28   1321
## 15   26   1263
## 16   31   1935
## 17   48   1287
## 18   42   1276
## 19   33   1228
## 20   40   1445
## 21   29   1468
## 22   25   1498
## 23   36   1211
## 24   27   1257
## 25   42   1477
## 26   33   1466
## 27   24   1922
## 28   34   1722
## 29   26   1957
## 30   48   1701
## 31   25   1586
## 32   37   1354
## 33   27   1994
## 34   40   1255
## 35   48   1930
## 36   33   1741
## 37   40   1569
## 38   29   1638
## 39   29   1425
## 40   44   1871
## 41   42   1006
## 42   24   1922
## 43   35   1594
## 44   44   1904
## 45   29   1529
## 46   29   1819
## 47   37   1203
## 48   48   1822
## 49   28   1623
## 50   29   1853
## 51   42   1365
## 52   34   1840
## 53   41   1003
## 54   39   1800
## 55   37   1490
## 56   38   1531
## 57   48   1697
## 58   45   1127
## 59   39   1434
## 60   44   1515
## 61   36   1650
## 62   36   1219
## 63   39   1723
## 64   44   1459
## 65   25   1424
## 66   30   1900
## 67   40   1125
## 68   46   1145
## 69   35   1387
## 70   41   1346
## 71   23   1403
## 72   24   1631
## 73   28   1938
## 74   26   1059
## 75   46   1204
## 76   30   1349
## 77   44   1978
## 78   32   1491
## 79   28   1605
## 80   39   1842
## 81   30   1891
## 82   38   1952
## 83   40   1934
## 84   33   1829
## 85   41   1066
## 86   24   1712
## 87   41   1909
## 88   48   1586
## 89   29   1315
## 90   45   2000
## 91   33   1241
## 92   50   1092
## 93   39   1159
## 94   50   1537
## 95   33   1959
## 96   49   1625
## 97   44   1716
## 98   38   1551
## 99   42   1233
## 100  23   1452

AP<-read.csv("Amazon_products.csv",header=TRUE)
print(AP)

##    user    Product       Date       City Age Gender
## 1   101     Iphone 13-04-2016     London  25      1
## 2   102      Nokia 10-04-2016        USA  26      0
## 3   103    Samsung 01-04-2016      CHINA  27      1
## 4   104        HTC 04-04-2016  Singapore  28      0
## 5   105         MI 08-04-2016      China  29      1
## 6   106     Lenovo 07-04-2016      Dubai  22      0
## 7   107 blackberry 13-04-2016     Europe  23      1
## 8   108   Micromax 14-04-2016      INDIA  24      0
## 9   109     Celkon 15-04-2016   Srilanka  21      1
## 10  110      Intex 16-04-2016 Austrailla  20      0
## 11  101     Iphone 14-04-2016     London  25      1
## 12  102      Nokia 10-04-2016        USA  26      0
## 13  103    Samsung 02-04-2016      CHINA  27      1
## 14  104        HTC 05-04-2016  Singapore  28      0
## 15  105         MI 09-04-2016      China  29      1
## 16  106     Lenovo 08-04-2016      Dubai  22      0
## 17  102      Nokia 10-04-2016        USA  26      0
## 18  108   Micromax 15-04-2016      INDIA  24      0
## 19  109     Celkon 16-04-2016   Srilanka  21      1
## 20  102      Nokia 10-04-2016        USA  26      0
##                                                URL Revene
## 1                         https://www.Flipkart.com   1689
## 2                           https://www.amazon.com   1341
## 3                         https://www.snapdeal.com   4395
## 4                        https://www.shopclues.com   3818
## 5                       https://www.Slickdeals.net   1425
## 6                             https://www.ebay.com   4893
## 7  https://www.google.co.in/chromecast/get-offers/   4284
## 8                  http://www.newegg.com/global/in   4783
## 9                           https://www.paytm.com/   4677
## 10                           https://www.yahoo.com   2861
## 11                        https://www.Flipkart.com   3515
## 12                          https://www.amazon.com   4537
## 13                        https://www.snapdeal.com   4804
## 14                       https://www.shopclues.com   2057
## 15                      https://www.Slickdeals.net   1055
## 16                            https://www.ebay.com   3519
## 17                          https://www.amazon.com   4383
## 18                           http://www.newegg.com   1891
## 19                          https://www.paytm.com/   3175
## 20                          https://www.amazon.com   1904

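# Drop the non-numeric columns Product, City and URL (columns 2, 4 and 7)
AP1<-AP[,-c(2,4,7)]
# Then drop Date, which is now column 2, leaving user, Age, Gender and Revene
AP1<-AP1[,-c(2)]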

dim(AP1)

## [1] 20  4
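Dropping columns by position depends on the exact column order of the CSV file. A minimal sketch of an order-independent alternative keeps only the numeric columns; the small data frame `toy` is hypothetical and only mimics a few columns of Amazon_products.csv.

```r
# Hypothetical toy data frame mimicking a few columns of Amazon_products.csv
toy <- data.frame(
  user    = c(101, 102, 103),
  Product = c("Iphone", "Nokia", "Samsung"),
  Age     = c(25, 26, 27),
  Revene  = c(1689, 1341, 4395),
  stringsAsFactors = FALSE
)

# Flag each column as numeric or not, then keep only the numeric ones
num_cols <- vapply(toy, is.numeric, logical(1))
toy_num  <- toy[, num_cols, drop = FALSE]

print(names(toy_num))  # the numeric columns: user, Age, Revene
```

Applied to AP, the same two lines should keep user, Age, Gender and Revene, since Product, Date, City and URL are read as text.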

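set.seed(123)  # kmeans() picks random starting centres; fix the seed for reproducible clusters
APclust<-kmeans(AP1,3)
APclust1<-cbind(AP1,cluster=APclust$cluster)
plot(AP1,col=APclust$cluster)  # pairwise scatter plots of the 4 columns, coloured by cluster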

kmeans() clusters the data given in x (here AP1) by the k-means method, which aims to partition the points into k groups such that the sum of squares from the points to their assigned cluster centres is minimized. At the minimum, every cluster centre is the mean of its Voronoi set (the set of data points that are nearest to that centre).
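The two conditions above (each point belongs to its nearest centre, each centre is the mean of its points) can be turned into a minimal sketch of Lloyd's algorithm, which alternates an assignment step and an update step until the assignments stop changing. This is a simplified, swapped-in technique for illustration: by default kmeans() uses the Hartigan-Wong algorithm (kmeans(x, k, algorithm = "Lloyd") selects a built-in Lloyd variant), and the function name lloyd_kmeans is hypothetical.

```r
# Hypothetical helper: a minimal Lloyd-style k-means (not the Hartigan-Wong
# algorithm that kmeans() uses by default)
lloyd_kmeans <- function(x, k, iter.max = 100) {
  x <- as.matrix(x)
  storage.mode(x) <- "double"
  # Start from k distinct data points chosen at random
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  cluster <- integer(nrow(x))
  for (iter in seq_len(iter.max)) {
    # Assignment step: squared Euclidean distance from every point to every centre
    d <- sapply(seq_len(k), function(j) colSums((t(x) - centers[j, ])^2))
    new_cluster <- max.col(-d, ties.method = "first")
    # Stop once no point changes cluster
    if (iter > 1 && all(new_cluster == cluster)) break
    cluster <- new_cluster
    # Update step: move each centre to the mean of its points (its Voronoi set)
    for (j in seq_len(k)) {
      members <- x[cluster == j, , drop = FALSE]
      if (nrow(members) > 0) centers[j, ] <- colMeans(members)
    }
  }
  list(cluster = cluster, centers = centers,
       tot.withinss = sum((x - centers[cluster, , drop = FALSE])^2))
}
```

For example, lloyd_kmeans(Age_income, 3) should give a partition comparable to kmeans(Age_income, 3), although different random starts can end in different local minima.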

Value

kmeans returns an object of class “kmeans”, which has a print and a fitted method. It is a list with at least the following components:

cluster: A vector of integers (from 1:k) indicating the cluster to which each point is allocated.

centers: A matrix of cluster centres.

totss: The total sum of squares.

withinss: Vector of within-cluster sum of squares, one component per cluster.

tot.withinss: Total within-cluster sum of squares, i.e. sum(withinss).

betweenss: The between-cluster sum of squares, i.e. totss - tot.withinss.

size: The number of points in each cluster.

iter: The number of (outer) iterations.

ifault: Integer indicator of a possible algorithm problem (for experts).
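A short sketch of reading these components, assuming R's built-in faithful data set as a stand-in for the CSV files used in this document. It also checks the two sum-of-squares identities listed above and calls the fitted method, which returns the centre of each point's cluster.

```r
# Illustrative only: the built-in 'faithful' data set replaces the CSV files
set.seed(2016)
fit <- kmeans(faithful, centers = 3)

fit$size           # number of points in each cluster
fit$centers        # one row per cluster centre
head(fit$cluster)  # cluster label (1 to 3) of the first few points

# The identities from the component list
all.equal(fit$tot.withinss, sum(fit$withinss))
all.equal(fit$totss, fit$tot.withinss + fit$betweenss)

# fitted() gives, for every point, the centre of the cluster it belongs to
head(fitted(fit, method = "centers"))
```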

Implementing the k-means clustering model

set.seed(123)  # fix the random starting centres so the clusters are reproducible
clust<-kmeans(Age_income,3)
summary(clust)
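summary(clust) only lists the components of the result. Because kmeans() draws its starting centres at random, repeated runs can end in different local minima. A minimal sketch, assuming simulated ages and incomes drawn in the same ranges as the printed table (the data frame sim is hypothetical): set.seed() makes a run reproducible, and the nstart argument tries several random starts and keeps the one with the smallest tot.withinss.

```r
# Hypothetical simulated stand-in for Age_income.csv (same value ranges)
set.seed(123)
sim <- data.frame(Age    = sample(23:50, 100, replace = TRUE),
                  Income = sample(1000:2000, 100, replace = TRUE))

# One random start versus the best of 25 random starts
fit1  <- kmeans(sim, centers = 3, nstart = 1)
fit25 <- kmeans(sim, centers = 3, nstart = 25)

# Compare the total within-cluster sum of squares of the two fits
c(single = fit1$tot.withinss, best_of_25 = fit25$tot.withinss)
```

With the real data this would be kmeans(Age_income, 3, nstart = 25); the best of many starts is usually, though not guaranteed to be, at least as good as a single start.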

##              Length Class  Mode   
## cluster      100    -none- numeric
## centers        6    -none- numeric
## totss          1    -none- numeric
## withinss       3    -none- numeric
## tot.withinss   1    -none- numeric
## betweenss      1    -none- numeric
## size           3    -none- numeric
## iter           1    -none- numeric
## ifault         1    -none- numeric

Age_income1<-cbind(Age_income,cluster=clust$cluster)
plot(Age_income,col=clust$cluster)
# Mark the 3 cluster centres with stars, one colour per cluster
points(clust$centers,col=1:3,pch=8,cex=2)
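Income (roughly 1000 to 2000) spans a much wider numeric range than Age (23 to 50), so the squared distances that k-means minimizes are driven almost entirely by Income. A minimal sketch of one common remedy: standardize both columns with scale() before clustering, then map the centres back to the original units for plotting. It uses a hypothetical simulated data frame sim in place of Age_income.

```r
# Hypothetical simulated stand-in for Age_income.csv (same value ranges)
set.seed(123)
sim <- data.frame(Age    = sample(23:50, 100, replace = TRUE),
                  Income = sample(1000:2000, 100, replace = TRUE))

# Standardize each column to mean 0 and standard deviation 1
sim_scaled <- scale(sim)
fit <- kmeans(sim_scaled, centers = 3, nstart = 25)

# Undo the scaling: multiply by each column's sd, then add back its mean
centers_orig <- sweep(fit$centers, 2, attr(sim_scaled, "scaled:scale"), "*")
centers_orig <- sweep(centers_orig, 2, attr(sim_scaled, "scaled:center"), "+")

plot(sim, col = fit$cluster)
points(centers_orig, col = 1:3, pch = 8, cex = 2)
```

Because the scaling is linear, the back-transformed centres equal the cluster means in the original Age and Income units.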

K Means Clustering

Big Datamatica

16 August 2016
