Introduction

The all.us.city.crime.1970 dataset from the cluster.datasets package records crime rates for 24 U.S. cities along with population statistics. All rate variables are per 100,000 population.

Number of Attributes : 10

Number of Instances : 24

Attribute Characteristics: Character & Integer

Objective

The main aim is to perform cluster analysis using hclust, AGNES, k-means, Mclust, PAM and CLARA, and to draw inferences that segment the cities by their crime rates, viewed alongside their population statistics. Cluster analysis can be used to identify areas with a greater incidence of particular types of crime. By identifying these distinct areas, or "hot spots", where similar crimes have occurred over a period of time, law enforcement resources can be managed more effectively.

Attribute Information

ATTRIBUTES	DESCRIPTION
City	Character vector for the city name
Population	Numeric vector for the population in thousands
white.change	Numeric vector for the % change in inner city white population from 1960 to 1970
black.population	Numeric vector for the black population in thousands
Murder	Numeric vector for the murder rate
Rape	Numeric vector for the rape rate
Robbery	Numeric vector for the robbery rate
Assault	Numeric vector for the assault rate
Burglary	Numeric vector for the burglary rate
Car Theft	Numeric vector for the car theft rate

Loading the data and finding the summary

##      city             population     white.change     black.population
##  Length:24          Min.   : 1268   Min.   :-39.400   Min.   :  39.0  
##  Class :character   1st Qu.: 1416   1st Qu.:-20.875   1st Qu.: 117.5  
##  Mode  :character   Median : 2024   Median :-13.450   Median : 302.0  
##                     Mean   : 2932   Mean   : -8.304   Mean   : 452.8  
##                     3rd Qu.: 2923   3rd Qu.:  6.750   3rd Qu.: 585.5  
##                     Max.   :11529   Max.   : 50.800   Max.   :2080.0  
##      murder            rape          robbery         assault     
##  Min.   : 2.600   Min.   : 5.70   Min.   : 53.0   Min.   : 63.0  
##  1st Qu.: 4.400   1st Qu.:16.40   1st Qu.:142.8   1st Qu.:106.8  
##  Median : 9.350   Median :20.20   Median :243.0   Median :157.0  
##  Mean   : 9.188   Mean   :23.18   Mean   :277.9   Mean   :187.8  
##  3rd Qu.:13.525   3rd Qu.:28.10   3rd Qu.:351.8   3rd Qu.:232.2  
##  Max.   :18.400   Max.   :50.00   Max.   :665.0   Max.   :421.0  
##     burglary      car.theft     
##  Min.   : 499   Min.   : 348.0  
##  1st Qu.: 854   1st Qu.: 523.2  
##  Median :1333   Median : 684.0  
##  Mean   :1313   Mean   : 679.6  
##  3rd Qu.:1660   3rd Qu.: 795.8  
##  Max.   :2164   Max.   :1208.0

## 'data.frame':    24 obs. of  10 variables:
##  $ city            : chr  "Anaheim" "Baltimore" "Boston" "Buffalo" ...
##  $ population      : num  1420 2071 2754 1349 6979 ...
##  $ white.change    : num  50.8 -21.4 -16.5 -20.7 -18.6 -17.2 -26.5 14.2 -29.1 25.5 ...
##  $ black.population: num  39 501 151 118 1306 ...
##  $ murder          : num  2.7 13.2 4.4 5.7 12.9 6.4 14.5 18.4 14.7 16.9 ...
##  $ rape            : num  21.9 34.9 14.8 13.7 25.4 16.8 18.7 41 31.1 27.1 ...
##  $ robbery         : num  94 564 136 145 363 120 288 206 649 335 ...
##  $ assault         : num  103 396 95 111 233 107 132 338 223 183 ...
##  $ burglary        : num  1607 1351 1054 862 830 ...
##  $ car.theft       : num  377 701 984 448 708 ...

The dataset has 10 attributes for each of 24 cities. Summing the mean rates of the six crime types gives roughly 2,500 crimes per 100,000 residents, which means that about 2.5% of the population would have been a victim of crime that year if every crime had a different victim. The main aim is to find out whether cities differ in the crimes committed in them, so we consider only the crime-related dimensions, which are attributes 5 to 10.
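
The figure of about 2,500 crimes per 100,000 residents can be reproduced from the summary output. The sketch below copies the six mean rates printed by summary() and adds them up; it treats the unweighted mean over cities as representative and assumes one victim per crime, which is a simplification.

```r
# Mean rates per 100,000 residents, copied from the summary() output.
mean_rates <- c(murder = 9.188, rape = 23.18, robbery = 277.9,
                assault = 187.8, burglary = 1313, car.theft = 679.6)

# Total across the six crime types, then expressed as a percentage.
total_per_100k <- sum(mean_rates)
share_percent  <- total_per_100k / 100000 * 100

total_per_100k            # 2490.668
round(share_percent, 2)   # 2.49
```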

##           murder  rape robbery assault burglary car.theft
## murder     1.000 0.526   0.638   0.709    0.353     0.495
## rape       0.526 1.000   0.414   0.667    0.694     0.410
## robbery    0.638 0.414   1.000   0.699    0.551     0.559
## assault    0.709 0.667   0.699   1.000    0.596     0.428
## burglary   0.353 0.694   0.551   0.596    1.000     0.382
## car.theft  0.495 0.410   0.559   0.428    0.382     1.000

All pairwise correlations between the crime rates are positive. The association is fairly strong for pairs such as burglary and rape (0.694) and weaker for murder and burglary (0.353), so a city with a higher rate of one crime tends to have higher rates of the others.
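
To see which pairs of crimes are most and least associated, the correlation matrix can be ranked pair by pair. The sketch below re-enters the rounded values printed above (it does not recompute them from the data); the name pair_tbl is only an illustrative choice.

```r
vars <- c("murder", "rape", "robbery", "assault", "burglary", "car.theft")

# Correlation matrix copied from the output above (rounded to 3 decimals).
r <- matrix(c(
  1.000, 0.526, 0.638, 0.709, 0.353, 0.495,
  0.526, 1.000, 0.414, 0.667, 0.694, 0.410,
  0.638, 0.414, 1.000, 0.699, 0.551, 0.559,
  0.709, 0.667, 0.699, 1.000, 0.596, 0.428,
  0.353, 0.694, 0.551, 0.596, 1.000, 0.382,
  0.495, 0.410, 0.559, 0.428, 0.382, 1.000),
  nrow = 6, byrow = TRUE, dimnames = list(vars, vars))

# Keep each pair once (upper triangle) and sort from strongest to weakest.
idx      <- which(upper.tri(r), arr.ind = TRUE)
pair_tbl <- data.frame(a = vars[idx[, "row"]],
                       b = vars[idx[, "col"]],
                       r = r[idx])
pair_tbl <- pair_tbl[order(-pair_tbl$r), ]

head(pair_tbl, 3)   # strongest pairs
tail(pair_tbl, 1)   # weakest pair
```

On these values the strongest pair is murder and assault (0.709) and the weakest is murder and burglary (0.353).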

Algorithm 1: hclust

Using only the continuous variables and dropping the discrete ones, we plot the dendrogram for the all.us.city.crime.1970 data and cut it into three groups. The group membership of each city is shown below.
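
A minimal sketch of this step, assuming Euclidean distances and complete linkage (the linkage used for the original dendrogram is not stated). To stay self-contained it uses only the first five rows printed by str() above rather than all 24 cities, so the groups it produces are illustrative only.

```r
# First five rows of the six crime-rate columns, copied from the str() output.
crime <- data.frame(
  murder    = c(2.7, 13.2, 4.4, 5.7, 12.9),
  rape      = c(21.9, 34.9, 14.8, 13.7, 25.4),
  robbery   = c(94, 564, 136, 145, 363),
  assault   = c(103, 396, 95, 111, 233),
  burglary  = c(1607, 1351, 1054, 862, 830),
  car.theft = c(377, 701, 984, 448, 708)
)

d  <- dist(crime, method = "euclidean")   # pairwise distances between cities
hc <- hclust(d, method = "complete")      # assumed linkage, see lead-in

# Cut the tree into three groups, as in the groups.3 tables of this report.
groups <- cutree(hc, k = 3)
table(groups)

# plot(hc) would draw the dendrogram.
```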

##            
## groups.3_cc Anaheim Baltimore Boston Buffalo Chicago Cincinnatti Cleveland
##           1       1         0      0       1       0           1         0
##           2       0         1      0       0       0           0         0
##           3       0         0      1       0       1           0         1
##            
## groups.3_cc Dallas Detroit Houston Los Angeles Miami Milwaukee Minneapolis
##           1      0       0       0           0     0         1           1
##           2      1       1       0           1     1         0           0
##           3      0       0       1           0     0         0           0
##            
## groups.3_cc New York Newark Paterson Philadelphia Pittsburgh San Diego
##           1        0      0        1            1          1         1
##           2        1      0        0            0          0         0
##           3        0      1        0            0          0         0
##            
## groups.3_cc San Francisco Seattle St Louis Washington
##           1             0       1        0          0
##           2             1       0        0          0
##           3             0       0        1          1

##  [1] "Anaheim"      "Buffalo"      "Cincinnatti"  "Milwaukee"   
##  [5] "Minneapolis"  "Paterson"     "Philadelphia" "Pittsburgh"  
##  [9] "San Diego"    "Seattle"

## [[1]]
##  [1] "Anaheim"      "Buffalo"      "Cincinnatti"  "Milwaukee"   
##  [5] "Minneapolis"  "Paterson"     "Philadelphia" "Pittsburgh"  
##  [9] "San Diego"    "Seattle"     
## 
## [[2]]
## [1] "Baltimore"     "Dallas"        "Detroit"       "Los Angeles"  
## [5] "Miami"         "New York"      "San Francisco"
## 
## [[3]]
## [1] "Boston"     "Chicago"    "Cleveland"  "Houston"    "Newark"    
## [6] "St Louis"   "Washington"

##            
## groups.3_cc 1268 1349 1358 1359 1385 1404 1420 1422 1556 1814 1857 1985
##           1    0    1    1    1    1    1    1    1    0    1    0    0
##           2    1    0    0    0    0    0    0    0    1    0    0    0
##           3    0    0    0    0    0    0    0    0    0    0    1    1
##            
## groups.3_cc 2064 2071 2363 2401 2754 2861 3110 4200 4818 6979 7032 11529
##           1    0    0    0    1    0    0    0    0    1    0    0     0
##           2    0    1    0    0    0    0    1    1    0    0    1     1
##           3    1    0    1    0    1    1    0    0    0    1    0     0

Algorithm 2: Agglomerative Nesting (Hierarchical Clustering)

## Call:     agnes(x = data_usc, metric = "euclidean", method = "complete") 
## Agglomerative coefficient:  0.8619738 
## Order of objects:
##  [1]  1 23  2 24  8 10 20 14 16  9 15 12 11 21  3  7  4 17 18 19  6 22  5
## [24] 13
## Height (summary):
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   66.39  212.60  337.40  448.40  497.40 1787.00 
## 
## Available components:
## [1] "order"  "height" "ac"     "merge"  "diss"   "call"   "method" "data"

## groups_3
##  1  2  3 
##  9 10  5
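
The call printed above can be sketched as follows. The metric and method arguments are taken from that call; the data are again only the first five rows shown by str(), standing in for the data_usc object, whose construction is not shown in this report.

```r
library(cluster)

# First five rows of the six crime-rate columns, copied from the str() output.
crime <- data.frame(
  murder    = c(2.7, 13.2, 4.4, 5.7, 12.9),
  rape      = c(21.9, 34.9, 14.8, 13.7, 25.4),
  robbery   = c(94, 564, 136, 145, 363),
  assault   = c(103, 396, 95, 111, 233),
  burglary  = c(1607, 1351, 1054, 862, 830),
  car.theft = c(377, 701, 984, 448, 708)
)

ag <- agnes(crime, metric = "euclidean", method = "complete")
ag$ac   # agglomerative coefficient, between 0 and 1

# Convert to an hclust tree and cut it into three groups.
groups_3 <- cutree(as.hclust(ag), k = 3)
table(groups_3)
```

An agglomerative coefficient close to 1, such as the 0.86 reported above for the full data, is usually read as a sign of clear clustering structure, although the coefficient tends to grow with the number of observations.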

Algorithm 3: k-means

Taking k = 2 and k = 3 clusters in the k-means algorithm, we obtain the cluster sizes and centers shown below.
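
The two- and three-cluster centers in this section appear to be on a standardized scale (their size-weighted sum is zero for each variable), which suggests the crime rates were scaled before clustering. The sketch below follows that assumption on the first five rows printed by str(); the seed and nstart value are arbitrary choices, not taken from the original analysis.

```r
# First five rows of the six crime-rate columns, copied from the str() output.
crime <- data.frame(
  murder    = c(2.7, 13.2, 4.4, 5.7, 12.9),
  rape      = c(21.9, 34.9, 14.8, 13.7, 25.4),
  robbery   = c(94, 564, 136, 145, 363),
  assault   = c(103, 396, 95, 111, 233),
  burglary  = c(1607, 1351, 1054, 862, 830),
  car.theft = c(377, 701, 984, 448, 708)
)

set.seed(1)                 # k-means starts from random centers
z  <- scale(crime)          # mean 0, standard deviation 1 per column
km <- kmeans(z, centers = 2, nstart = 10)

km$size      # cluster sizes
km$centers   # cluster centers on the standardized scale

# Each scaled column has mean zero, so the size-weighted centers sum to zero.
weighted_sum <- colSums(km$centers * km$size)
round(weighted_sum, 10)
```

Scaling matters here because burglary and car theft rates are in the hundreds or thousands while murder rates are below 20, so unscaled distances would be dominated by the larger variables.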

## [1] 10

##  [1] "city"             "population"       "white.change"    
##  [4] "black.population" "murder"           "rape"            
##  [7] "robbery"          "assault"          "burglary"        
## [10] "car.theft"

## [1] 11 13

##       murder       rape    robbery    assault   burglary  car.theft
## 1 -0.9128346 -0.6991864 -0.8438639 -0.8328348 -0.5708682 -0.7166146
## 2  0.7723985  0.5916192  0.7140387  0.7047064  0.4830424  0.6063662

## [1]  6  7 11

##       murder       rape    robbery     assault   burglary  car.theft
## 1  0.8216938  0.1519491  0.4068364  0.05895427 -0.1660787  0.6362507
## 2  0.7301454  0.9684793  0.9773549  1.25820822  1.0394318  0.5807509
## 3 -0.9128346 -0.6991864 -0.8438639 -0.83283483 -0.5708682 -0.7166146

## K-means clustering with 3 clusters of sizes 17, 3, 4
## 
## Cluster means:
##   population white.change black.population    murder     rape  robbery
## 1   1754.706    -6.076471         207.7059  8.470588 20.51765 215.3529
## 2   8513.333    -7.733333        1470.6667 10.933333 31.76667 445.0000
## 3   3747.250   -18.200000         731.2500 10.925000 28.05000 418.5000
##    assault burglary car.theft
## 1 165.5294 1208.882  629.1176
## 2 296.6667 1544.000  866.6667
## 3 201.0000 1584.250  754.0000
## 
## Clustering vector:
##  [1] 1 1 1 1 2 1 1 1 3 1 2 1 1 1 2 1 1 3 1 1 3 1 1 3
## 
## Within cluster sum of squares by cluster:
## [1]  7816837 15138818  4039346
##  (between_SS / total_SS =  82.3 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"    
## [5] "tot.withinss" "betweenss"    "size"         "iter"        
## [9] "ifault"

##                
##                 1 2 3
##   Anaheim       1 0 0
##   Baltimore     1 0 0
##   Boston        1 0 0
##   Buffalo       1 0 0
##   Chicago       0 1 0
##   Cincinnatti   1 0 0
##   Cleveland     1 0 0
##   Dallas        1 0 0
##   Detroit       0 0 1
##   Houston       1 0 0
##   Los Angeles   0 1 0
##   Miami         1 0 0
##   Milwaukee     1 0 0
##   Minneapolis   1 0 0
##   New York      0 1 0
##   Newark        1 0 0
##   Paterson      1 0 0
##   Philadelphia  0 0 1
##   Pittsburgh    1 0 0
##   San Diego     1 0 0
##   San Francisco 0 0 1
##   Seattle       1 0 0
##   St Louis      1 0 0
##   Washington    0 0 1

Algorithm 4: mclust

## 'Mclust' model object:
##  best model: ellipsoidal, equal shape (VEV) with 4 components

## ----------------------------------------------------
## Gaussian finite mixture model fitted by EM algorithm 
## ----------------------------------------------------
## 
## Mclust VEV (ellipsoidal, equal shape) model with 4 components:
## 
##  log.likelihood  n df       BIC       ICL
##       -607.1122 24 96 -1519.318 -1519.318
## 
## Clustering table:
## 1 2 3 4 
## 7 6 6 5
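
The BIC in the table above can be checked from the other printed quantities. mclust reports BIC as 2 * log-likelihood minus df * log(n), so larger values are better. Note that this model estimates 96 parameters from 24 observations, so the four-component solution should probably be read with caution.

```r
# Values copied from the Mclust summary above.
loglik <- -607.1122   # log.likelihood
n      <- 24          # number of cities
df     <- 96          # number of estimated parameters

# BIC in the mclust convention (higher is better).
bic <- 2 * loglik - df * log(n)
round(bic, 3)   # -1519.318
```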

The Mclust plots show the BIC, classification, uncertainty and density. The ellipses superimposed on the classification and uncertainty plots correspond to the covariances of the components.

Algorithm 5: PAM

Partitioning around medoids (PAM) is applied to attributes 5 to 10 of the dataset, using Euclidean distance to find the representative objects (medoids) of three clusters.
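
A minimal sketch of a PAM fit with the cluster package. It uses the first five rows printed by str() and k = 2, because five observations are too few for a meaningful three-cluster solution; the analysis in this report uses all 24 cities and k = 3.

```r
library(cluster)

# First five rows of the six crime-rate columns, copied from the str() output.
crime <- data.frame(
  murder    = c(2.7, 13.2, 4.4, 5.7, 12.9),
  rape      = c(21.9, 34.9, 14.8, 13.7, 25.4),
  robbery   = c(94, 564, 136, 145, 363),
  assault   = c(103, 396, 95, 111, 233),
  burglary  = c(1607, 1351, 1054, 862, 830),
  car.theft = c(377, 701, 984, 448, 708)
)

fit <- pam(crime, k = 2, metric = "euclidean")

fit$medoids             # each medoid is an actual row of the data
fit$id.med              # row numbers of the medoids
fit$clustering          # cluster label for every row
fit$silinfo$avg.width   # average silhouette width of the solution
```

Unlike k-means centers, medoids are observed cities, which is why the Medoids table in the output carries an ID column.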

## Medoids:
##      ID murder rape robbery assault burglary car.theft
## [1,] 16    9.5 20.5     333     182     1315       667
## [2,] 18    9.3 15.2     173     123      754       534
## [3,] 12   15.6 17.0     427     421     1858       781
## Clustering vector:
##  [1] 1 1 1 2 2 2 2 1 3 1 3 3 2 1 3 1 2 2 2 1 3 2 3 1
## Objective function:
##   build    swap 
## 269.973 269.973 
## 
## Available components:
##  [1] "medoids"    "id.med"     "clustering" "objective"  "isolation" 
##  [6] "clusinfo"   "silinfo"    "diss"       "call"       "data"

## Medoids:
##      ID murder rape robbery assault burglary car.theft
## [1,] 16    9.5 20.5     333     182     1315       667
## [2,] 18    9.3 15.2     173     123      754       534
## [3,] 12   15.6 17.0     427     421     1858       781
## Clustering vector:
##  [1] 1 1 1 2 2 2 2 1 3 1 3 3 2 1 3 1 2 2 2 1 3 2 3 1
## Objective function:
##   build    swap 
## 269.973 269.973 
## 
## Numerical information per cluster:
##      size max_diss  av_diss diameter separation
## [1,]    9 482.4668 281.1015 822.2769   278.6252
## [2,]    9 687.6084 240.3805 901.0808   278.6252
## [3,]    6 481.9436 297.6690 681.6615   412.4179
## 
## Isolated clusters:
##  L-clusters: character(0)
##  L*-clusters: character(0)
## 
## Silhouette plot information:
##    cluster neighbor   sil_width
## 16       1        2  0.48755135
## 20       1        3  0.43396962
## 24       1        3  0.38830077
## 10       1        3  0.38043351
## 2        1        3  0.35380456
## 8        1        3  0.28000182
## 1        1        3  0.23814081
## 14       1        2  0.13758391
## 3        1        2 -0.04308671
## 19       2        1  0.62921603
## 18       2        1  0.62602037
## 17       2        1  0.61044615
## 4        2        1  0.58526430
## 13       2        1  0.54401883
## 6        2        1  0.49983969
## 22       2        1  0.37379915
## 5        2        1  0.32675739
## 7        2        1  0.05267744
## 21       3        1  0.52770625
## 11       3        1  0.49349429
## 9        3        1  0.47305853
## 12       3        1  0.39665116
## 15       3        1  0.36446554
## 23       3        1  0.18961179
## Average silhouette width per cluster:
## [1] 0.2951888 0.4720044 0.4074979
## Average silhouette width of total data set:
## [1] 0.3895719
## 
## 276 dissimilarities, summarized :
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##   66.389  460.020  696.110  731.100  937.650 1786.500 
## Metric :  euclidean 
## Number of objects : 24
## 
## Available components:
##  [1] "medoids"    "id.med"     "clustering" "objective"  "isolation" 
##  [6] "clusinfo"   "silinfo"    "diss"       "call"       "data"

##      murder rape robbery assault burglary car.theft
## [1,]    9.5 20.5     333     182     1315       667
## [2,]    9.3 15.2     173     123      754       534
## [3,]   15.6 17.0     427     421     1858       781

##         
## groups.3 1 2 3
##        1 2 7 1
##        2 2 0 5
##        3 5 2 0

##         
## groups.3 Anaheim Baltimore Boston Buffalo Chicago Cincinnatti Cleveland
##        1       1         0      0       1       0           1         0
##        2       0         1      0       0       0           0         0
##        3       0         0      1       0       1           0         1
##         
## groups.3 Dallas Detroit Houston Los Angeles Miami Milwaukee Minneapolis
##        1      0       0       0           0     0         1           1
##        2      1       1       0           1     1         0           0
##        3      0       0       1           0     0         0           0
##         
## groups.3 New York Newark Paterson Philadelphia Pittsburgh San Diego
##        1        0      0        1            1          1         1
##        2        1      0        0            0          0         0
##        3        0      1        0            0          0         0
##         
## groups.3 San Francisco Seattle St Louis Washington
##        1             0       1        0          0
##        2             1       0        0          0
##        3             0       0        1          1

## [1] "Anaheim"     "Minneapolis"

##  [1] "Baltimore"     "Boston"        "Buffalo"       "Chicago"      
##  [5] "Cincinnatti"   "Cleveland"     "Dallas"        "Detroit"      
##  [9] "Houston"       "Los Angeles"   "Miami"         "Milwaukee"    
## [13] "New York"      "Newark"        "Paterson"      "Philadelphia" 
## [17] "Pittsburgh"    "St Louis"      "San Francisco" "San Diego"    
## [21] "Seattle"       "Washington"

With an average silhouette width of 0.39, the structure is weak and could be artificial. The silhouette width would need to be higher to indicate a reasonable or strong structure.

Algorithm 6: CLARA

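Compared with other partitioning methods such as PAM, CLARA can deal with much larger datasets because it works on samples of the data. It also tries to find k representative objects that are centrally located in their clusters. Here the best sample contains all 24 objects, so the three-cluster result is the same as the PAM result.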

## Call:     clara(x = data[5:10], k = 3) 
## Medoids:
##      murder rape robbery assault burglary car.theft
## [1,]    9.5 20.5     333     182     1315       667
## [2,]    9.3 15.2     173     123      754       534
## [3,]   15.6 17.0     427     421     1858       781
## Objective function:   269.973
## Clustering vector:    int [1:24] 1 1 1 2 2 2 2 1 3 1 3 3 2 1 3 1 2 2 ...
## Cluster sizes:            9 9 6 
## Best sample:
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24
## 
## Available components:
##  [1] "sample"     "medoids"    "i.med"      "clustering" "objective" 
##  [6] "clusinfo"   "diss"       "call"       "silinfo"    "data"

## Object of class 'clara' from call:
##  clara(x = data[5:10], k = 3) 
## Medoids:
##      murder rape robbery assault burglary car.theft
## [1,]    9.5 20.5     333     182     1315       667
## [2,]    9.3 15.2     173     123      754       534
## [3,]   15.6 17.0     427     421     1858       781
## Objective function:    269.973 
## Numerical information per cluster:
##      size max_diss  av_diss isolation
## [1,]    9 482.4668 281.1015 0.8024201
## [2,]    9 687.6084 240.3805 1.1436036
## [3,]    6 481.9436 297.6690 0.7882187
## Average silhouette width per cluster:
## [1] 0.2951888 0.4720044 0.4074979
## Average silhouette width of best sample: 0.3895719 
## 
## Best sample:
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24
## Clustering vector:
##  [1] 1 1 1 2 2 2 2 1 3 1 3 3 2 1 3 1 2 2 2 1 3 2 3 1
## 
## Silhouette plot information for best sample:
##    cluster neighbor   sil_width
## 16       1        2  0.48755135
## 20       1        3  0.43396962
## 24       1        3  0.38830077
## 10       1        3  0.38043351
## 2        1        3  0.35380456
## 8        1        3  0.28000182
## 1        1        3  0.23814081
## 14       1        2  0.13758391
## 3        1        2 -0.04308671
## 19       2        1  0.62921603
## 18       2        1  0.62602037
## 17       2        1  0.61044615
## 4        2        1  0.58526430
## 13       2        1  0.54401883
## 6        2        1  0.49983969
## 22       2        1  0.37379915
## 5        2        1  0.32675739
## 7        2        1  0.05267744
## 21       3        1  0.52770625
## 11       3        1  0.49349429
## 9        3        1  0.47305853
## 12       3        1  0.39665116
## 15       3        1  0.36446554
## 23       3        1  0.18961179
## 
## 276 dissimilarities, summarized :
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##   66.389  460.020  696.110  731.100  937.650 1786.500 
## Metric :  euclidean 
## Number of objects : 24
## 
## Available components:
##  [1] "sample"     "medoids"    "i.med"      "clustering" "objective" 
##  [6] "clusinfo"   "diss"       "call"       "silinfo"    "data"

##      murder rape robbery assault burglary car.theft
## [1,]    9.5 20.5     333     182     1315       667
## [2,]    9.3 15.2     173     123      754       534
## [3,]   15.6 17.0     427     421     1858       781

With k = 3 the average silhouette width is again 0.39, so the structure is weak and could be artificial. The silhouette width would need to be higher to indicate a reasonable or strong structure.

## Call:     clara(x = data[5:10], k = 2) 
## Medoids:
##      murder rape robbery assault burglary car.theft
## [1,]   16.9 27.1     335     183     1532       741
## [2,]    9.3 15.2     173     123      754       534
## Objective function:   330.7949
## Clustering vector:    int [1:24] 1 1 2 2 2 2 2 1 1 1 1 1 2 1 1 1 2 2 ...
## Cluster sizes:            14 10 
## Best sample:
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24
## 
## Available components:
##  [1] "sample"     "medoids"    "i.med"      "clustering" "objective" 
##  [6] "clusinfo"   "diss"       "call"       "silinfo"    "data"

## Object of class 'clara' from call:
##  clara(x = data[5:10], k = 2) 
## Medoids:
##      murder rape robbery assault burglary car.theft
## [1,]   16.9 27.1     335     183     1532       741
## [2,]    9.3 15.2     173     123      754       534
## Objective function:    330.7949 
## Numerical information per cluster:
##      size max_diss  av_diss isolation
## [1,]   14 669.6429 373.7723 0.8131524
## [2,]   10 687.6084 270.6267 0.8349680
## Average silhouette width per cluster:
## [1] 0.3977912 0.5555761
## Average silhouette width of best sample: 0.4635349 
## 
## Best sample:
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24
## Clustering vector:
##  [1] 1 1 2 2 2 2 2 1 1 1 1 1 2 1 1 1 2 2 2 1 1 2 1 1
## 
## Silhouette plot information for best sample:
##    cluster neighbor  sil_width
## 12       1        2  0.5897181
## 11       1        2  0.5636814
## 9        1        2  0.5579632
## 15       1        2  0.5399762
## 21       1        2  0.5365700
## 23       1        2  0.5024918
## 10       1        2  0.4944387
## 8        1        2  0.4511489
## 24       1        2  0.4156059
## 20       1        2  0.4012990
## 2        1        2  0.3011058
## 1        1        2  0.2959944
## 16       1        2  0.1764405
## 14       1        2 -0.2573574
## 18       2        1  0.6948371
## 19       2        1  0.6933319
## 17       2        1  0.6830896
## 4        2        1  0.6631602
## 13       2        1  0.6093847
## 6        2        1  0.5918047
## 22       2        1  0.5274185
## 5        2        1  0.5147949
## 7        2        1  0.3043874
## 3        2        1  0.2735515
## 
## 276 dissimilarities, summarized :
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##   66.389  460.020  696.110  731.100  937.650 1786.500 
## Metric :  euclidean 
## Number of objects : 24
## 
## Available components:
##  [1] "sample"     "medoids"    "i.med"      "clustering" "objective" 
##  [6] "clusinfo"   "diss"       "call"       "silinfo"    "data"

##      murder rape robbery assault burglary car.theft
## [1,]   16.9 27.1     335     183     1532       741
## [2,]    9.3 15.2     173     123      754       534

With k = 2 the average silhouette width increases from 0.39 to 0.46, which makes two clusters the better of the two CLARA solutions.
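
The wording "weak and could be artificial" follows the commonly quoted Kaufman and Rousseeuw reading of the average silhouette width. The sketch below encodes that scale and applies it to the two CLARA results reported above; interpret_sil is a hypothetical helper written for this illustration, not a function from the cluster package.

```r
# Hypothetical helper: label an average silhouette width.
interpret_sil <- function(w) {
  cut(w,
      breaks = c(-1, 0.25, 0.50, 0.70, 1),
      labels = c("no substantial structure", "weak structure",
                 "reasonable structure", "strong structure"),
      include.lowest = TRUE)
}

# Average silhouette widths copied from the CLARA output for k = 3 and k = 2.
avg_width <- c(0.3895719, 0.4635349)

result <- data.frame(k = c(3, 2),
                     avg_width = avg_width,
                     label = interpret_sil(avg_width))
result
```

Both solutions fall in the "weak structure" band (0.26 to 0.50), so the move to two clusters improves the fit without reaching the level usually described as reasonable.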

Plotting Cluster Solutions:

It is always a good idea to look at the cluster results. First, using all the variables except the city and population details in the k-means algorithm, we find that the first two components of the cluster plot explain 75.74% of the point variability.

If we instead draw the cluster plot for PAM using only the attributes murder, robbery and car.theft, the two components explain 88.25% of the point variability.
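
The "point variability" figure is the share of total variance captured by the first two principal components, which is the plane a cluster plot is drawn in. The sketch below computes that share with prcomp on the first five rows printed by str(). It illustrates the idea only: it is not the exact routine behind the plots, and the percentage it gives will differ from the values quoted for the full data.

```r
# First five rows of the six crime-rate columns, copied from the str() output.
crime <- data.frame(
  murder    = c(2.7, 13.2, 4.4, 5.7, 12.9),
  rape      = c(21.9, 34.9, 14.8, 13.7, 25.4),
  robbery   = c(94, 564, 136, 145, 363),
  assault   = c(103, 396, 95, 111, 233),
  burglary  = c(1607, 1351, 1054, 862, 830),
  car.theft = c(377, 701, 984, 448, 708)
)

# Principal components on standardized variables.
pc <- prcomp(crime, scale. = TRUE)

# Proportion of variance per component, then the share of the first two.
explained <- pc$sdev^2 / sum(pc$sdev^2)
share_two <- 100 * sum(explained[1:2])
round(share_two, 2)
```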

Comparison of Algorithms - Clustering Validation:

## 
## Clustering Methods:
##  hierarchical kmeans pam clara agnes 
## 
## Cluster sizes:
##  2 3 4 5 6 
## 
## Validation Measures:
##                                  2       3       4       5       6
##                                                                   
## hierarchical Connectivity   5.8615  9.3044 16.1234 20.0956 21.5956
##              Dunn           0.2659  0.2659  0.3445  0.4874  0.4874
##              Silhouette     0.4635  0.4325  0.4254  0.4394  0.3854
## kmeans       Connectivity   6.2198 10.5294 16.9778 21.4079 23.5329
##              Dunn           0.2452  0.2459  0.4032  0.4634  0.4634
##              Silhouette     0.4815  0.4382  0.4411  0.4284  0.3845
## pam          Connectivity   5.8615 15.8738 18.2028 21.4079 25.6913
##              Dunn           0.2659  0.3092  0.2981  0.4634  0.4634
##              Silhouette     0.4635  0.3896  0.4203  0.4284  0.3878
## clara        Connectivity   5.8615 15.8738 18.2028 21.4079 28.9750
##              Dunn           0.2659  0.3092  0.2981  0.4634  0.2127
##              Silhouette     0.4635  0.3896  0.4203  0.4284  0.3240
## agnes        Connectivity   5.8615  9.3044 16.1234 20.0956 21.5956
##              Dunn           0.2659  0.2659  0.3445  0.4874  0.4874
##              Silhouette     0.4635  0.4325  0.4254  0.4394  0.3854
## 
## Optimal Scores:
## 
##              Score  Method       Clusters
## Connectivity 5.8615 hierarchical 2       
## Dunn         0.4874 hierarchical 5       
## Silhouette   0.4815 kmeans       2
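
When reading the validation table, connectivity should be as small as possible, while the Dunn index and the silhouette width should be as large as possible. The sketch below re-enters the silhouette rows of the table and looks up the best combination, which reproduces the last line of the optimal scores.

```r
methods <- c("hierarchical", "kmeans", "pam", "clara", "agnes")
k       <- as.character(2:6)

# Silhouette widths copied from the validation table (rows: method, cols: k).
sil <- matrix(c(
  0.4635, 0.4325, 0.4254, 0.4394, 0.3854,
  0.4815, 0.4382, 0.4411, 0.4284, 0.3845,
  0.4635, 0.3896, 0.4203, 0.4284, 0.3878,
  0.4635, 0.3896, 0.4203, 0.4284, 0.3240,
  0.4635, 0.4325, 0.4254, 0.4394, 0.3854),
  nrow = 5, byrow = TRUE, dimnames = list(methods, k))

# Position of the largest silhouette width.
best        <- which(sil == max(sil), arr.ind = TRUE)
best_method <- rownames(sil)[best[1, "row"]]
best_k      <- colnames(sil)[best[1, "col"]]

c(method = best_method, clusters = best_k, score = max(sil))
```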

Conclusion:

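The clustering algorithms above were run on the crime rates of 24 U.S. cities, and the resulting groups of cities with similar crime rates can support crime analysis in the US. In the validation comparison, connectivity and silhouette width favour two clusters, while the Dunn index favours five.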

CLUSTERING ANALYSIS - US CITY CRIME 1970 DATASET

Lavanya M, Jenifer PK & Reyne Beatrice

19th March 2016