K-MEANS CLUSTERING

Importar la base de datos

file.choose()

bd12 <- read.csv("C:\\Users\\sofia\\OneDrive\\Documentos\\Usaarrests.csv")

Agregar el lugar como nombre de los renglones

bd13 <- bd12
rownames(bd13)<- bd13$Lugar

Eliminar la columna de Lugar

bd14 <- bd13
bd14 <- subset (bd14, select = -c (Lugar))

Revisar presencia de datos anormales

summary(bd14)
##      Murder          Assault         UrbanPop          Rape      
##  Min.   : 0.800   Min.   : 45.0   Min.   :32.00   Min.   : 7.30  
##  1st Qu.: 4.075   1st Qu.:109.0   1st Qu.:54.50   1st Qu.:15.07  
##  Median : 7.250   Median :159.0   Median :66.00   Median :20.10  
##  Mean   : 7.788   Mean   :170.8   Mean   :65.54   Mean   :21.23  
##  3rd Qu.:11.250   3rd Qu.:249.0   3rd Qu.:77.75   3rd Qu.:26.18  
##  Max.   :17.400   Max.   :337.0   Max.   :91.00   Max.   :46.00  
##     cluster    
##  Min.   :1.00  
##  1st Qu.:1.25  
##  Median :2.00  
##  Mean   :2.32  
##  3rd Qu.:3.00  
##  Max.   :4.00
boxplot(bd14)

Se determinó que hay datos anormales en Rape (Fuera del límitee superior), pero no se eliminarán al ser muy cercano a los demás datos.

k-means Clustering

Paso 1. Normalizar variables

bd15 <- bd14
bd15 <- as.data.frame(scale(bd14))

Paso 2. k-means Clustering # fue numero 4, porque eran 4 segmentos

segmentos <- kmeans(bd15, 4)
segmentos
## K-means clustering with 4 clusters of sizes 5, 16, 21, 8
## 
## Cluster means:
##       Murder    Assault   UrbanPop       Rape    cluster
## 1 -0.5299035 -1.0098676 -1.4742901 -0.8575347 -1.2706509
## 2 -0.4894375 -0.3826001  0.5758298 -0.2616538 -0.3080366
## 3  0.9681443  0.9765436  0.1370530  0.7978278  1.0212880
## 4 -1.2313140 -1.1670594 -0.5899924 -1.0350312 -1.2706509
## 
## Clustering vector:
##        Alabama         Alaska        Arizona       Arkansas     California 
##              3              3              3              3              3 
##       Colorado    Connecticut       Delaware        Florida        Georgia 
##              3              2              2              3              3 
##         Hawaii          Idaho       Illinois        Indiana           Iowa 
##              2              4              3              2              4 
##         Kansas       Kentucky      Louisiana          Maine       Maryland 
##              2              1              3              4              3 
##  Massachusetts       Michigan      Minnesota    Mississippi       Missouri 
##              2              3              4              3              3 
##        Montana       Nebraska         Nevada  New Hampshire     New Jersey 
##              1              4              3              4              2 
##     New Mexico       New York North Carolina   North Dakota           Ohio 
##              3              3              3              4              2 
##       Oklahoma         Oregon   Pennsylvania   Rhode Island South Carolina 
##              2              2              2              2              3 
##   South Dakota      Tennessee          Texas           Utah        Vermont 
##              1              3              3              2              1 
##       Virginia     Washington  West Virginia      Wisconsin        Wyoming 
##              2              2              1              4              2 
## 
## Within cluster sum of squares by cluster:
## [1]  3.984580 16.212213 55.042281  3.875042
##  (between_SS / total_SS =  67.7 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"
asignacion <- cbind(bd14, cluster = segmentos$cluster)
head(asignacion,10)
##             Murder Assault UrbanPop Rape cluster cluster
## Alabama       13.2     236       58 21.2       4       3
## Alaska        10.0     263       48 44.5       3       3
## Arizona        8.1     294       80 31.0       3       3
## Arkansas       8.8     190       50 19.5       4       3
## California     9.0     276       91 40.6       3       3
## Colorado       7.9     204       78 38.7       3       3
## Connecticut    3.3     110       77 11.1       2       2
## Delaware       5.9     238       72 15.8       2       2
## Florida       15.4     335       80 31.9       3       3
## Georgia       17.4     211       60 25.8       4       3

Exportar csv

write.csv(asignacion,"arrestos_segmentados.csv")

Visualizar Segmentos

install.packages(“factoextra”)

library(factoextra)
## Loading required package: ggplot2
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
fviz_cluster(segmentos, data = bd15,
             palette=c("red", "blue", "black", "darkgreen"),
             ellipse.type = "euclid",
             star.plot = T,
             repel = T,
             ggtheme = theme())

Optimizar k

install.packages(“data.table”) install.packages(“cluster”)

library(data.table)
library(cluster)
set.seed(123)
optimizacion <- clusGap(bd15, FUN = kmeans, nstart = 25, K.max = 10, B = 50)
plot(optimizacion, xlab = "Numero de clusters k")

Conclusion

En este analisis de datos realizamos 4 clusters, y encontramos que los estados más cercanos al eje son en los que que hay una mayor cantidad de crimenes, siendo estos principalmente California, Nevada, New York, Arizona y Colorado. En cambio, los que están mas lejos del eje, son los más seguros o los que menos cantidad de crimenes tienen, tales como West virginia, Vermont y North Dakota.

LS0tDQp0aXRsZTogPHNwYW4gc3R5bGU9IkNvbG9yOlJlZCI+ICJVU0FBUlJFU1QiDQphdXRob3I6ICJBbmEgQXJ2aXp1LSBBMDE0MTIyMjAiDQpkYXRlOiAiMjAyMi0wOS0wOCINCm91dHB1dDoNCiAgaHRtbF9kb2N1bWVudDoNCiAgICB0b2M6IHRydWUNCiAgICB0b2NfZmxvYXQ6IHRydWUNCiAgICBjb2RlX2Rvd25sb2FkOiB0cnVlDQotLS0NCg0KIVtdKGh0dHBzOi8vZGVmaW5pY2lvbmFiYy5jb20vd3AtY29udGVudC91cGxvYWRzL2RlcmVjaG8vQXJyZXN0by1Eb21pY2lsaWFyaW8uanBnKQ0KDQojIyBLLU1FQU5TIENMVVNURVJJTkcNCg0KIyMjIyBJbXBvcnRhciBsYSBiYXNlIGRlIGRhdG9zDQpmaWxlLmNob29zZSgpDQpgYGB7cn0NCmJkMTIgPC0gcmVhZC5jc3YoIkM6XFxVc2Vyc1xcc29maWFcXE9uZURyaXZlXFxEb2N1bWVudG9zXFxVc2FhcnJlc3RzLmNzdiIpDQpgYGANCg0KIyMjIyBBZ3JlZ2FyIGVsIGx1Z2FyIGNvbW8gbm9tYnJlIGRlIGxvcyByZW5nbG9uZXMNCmBgYHtyfQ0KYmQxMyA8LSBiZDEyDQpyb3duYW1lcyhiZDEzKTwtIGJkMTMkTHVnYXINCmBgYA0KDQojIyMjIEVsaW1pbmFyIGxhIGNvbHVtbmEgZGUgTHVnYXINCmBgYHtyfQ0KYmQxNCA8LSBiZDEzDQpiZDE0IDwtIHN1YnNldCAoYmQxNCwgc2VsZWN0ID0gLWMgKEx1Z2FyKSkNCmBgYA0KDQojIyMjIFJldmlzYXIgcHJlc2VuY2lhIGRlIGRhdG9zIGFub3JtYWxlcw0KYGBge3J9DQpzdW1tYXJ5KGJkMTQpDQpib3hwbG90KGJkMTQpDQpgYGANCg0KU2UgZGV0ZXJtaW7DsyBxdWUgaGF5IGRhdG9zIGFub3JtYWxlcyBlbiBSYXBlIChGdWVyYSBkZWwgbMOtbWl0ZWUgc3VwZXJpb3IpLCBwZXJvIG5vIHNlIGVsaW1pbmFyw6FuIGFsIHNlciBtdXkgY2VyY2FubyBhIGxvcyBkZW3DoXMgZGF0b3MuDQoNCiMjIyBrLW1lYW5zIENsdXN0ZXJpbmcNCg0KIyMjIyBQYXNvIDEuIE5vcm1hbGl6YXIgdmFyaWFibGVzDQpgYGB7cn0NCmJkMTUgPC0gYmQxNA0KYmQxNSA8LSBhcy5kYXRhLmZyYW1lKHNjYWxlKGJkMTQpKQ0KYGBgDQoNCiMjIyMgUGFzbyAyLiBrLW1lYW5zIENsdXN0ZXJpbmcgIyBmdWUgbnVtZXJvIDQsIHBvcnF1ZSBlcmFuIDQgc2VnbWVudG9zDQpgYGB7cn0NCnNlZ21lbnRvcyA8LSBrbWVhbnMoYmQxNSwgNCkNCnNlZ21lbnRvcw0KDQphc2lnbmFjaW9uIDwtIGNiaW5kKGJkMTQsIGNsdXN0ZXIgPSBzZWdtZW50b3MkY2x1c3RlcikNCmhlYWQoYXNpZ25hY2lvbiwxMCkNCmBgYA0KDQojIyMjIEV4cG9ydGFyIGNzdg0KYGBge3J9DQp3cml0ZS5jc3YoYXNpZ25hY2lvbiwiYXJyZXN0b3Nfc2VnbWVudGFkb3MuY3N2IikNCmBgYA0KDQojIyMjIFZpc3VhbGl6YXIgU2VnbWVudG9zDQppbnN0YWxsLnBhY2thZ2VzKCJmYWN0b2V4dHJhIikNCg0KYGBge3J9DQpsaWJyYXJ5KGZhY3RvZXh0cmEpDQpmdml6X2NsdXN0ZXIoc2VnbWVudG9zLCBkYXRhID0gYmQxNSwNCiAgICAgICAgICAgICBwYWxldHRlPWMoInJlZCIsICJibHVlIiwgImJsYWNrIiwgImRhcmtncmVlbiIpLA0KICAgICAgICAgICAgIGVsbGlwc2UudHlwZSA9ICJldWNsaWQiLA0KICAgICAgICAgICAgIHN0YXIucGxvdCA9IFQsDQogICAgICAgICAgICAgcmVwZWwgPSBULA0KICAgICAgICAgICAgIGdndGhlbWUgPSB0aGVtZSgpKQ0KYGBgDQoNCiMjIyMgT3B0aW1pemFyIGsNCmluc3RhbGwucGFja2FnZXMoImRhdGEudGFibGUiKQ0KaW5zdGFsbC5wYWNrYWdlcygiY2x1c3RlciIpDQpgYGB7cn0NCmxpYnJhcnkoZGF0YS50YWJsZSkNCmxpYnJhcnkoY2x1c3RlcikNCnNldC5zZWVkKDEyMykNCm9wdGltaXphY2lvbiA8LSBjbHVzR2FwKGJkMTUsIEZVTiA9IGttZWFucywgbnN0YXJ0ID0gMjUsIEsubWF4ID0gMTAsIEIgPSA1MCkNCnBsb3Qob3B0aW1pemFjaW9uLCB4bGFiID0gIk51bWVybyBkZSBjbHVzdGVycyBrIikNCmBgYA0KDQojIyMgQ29uY2x1c2lvbg0KRW4gZXN0ZSBhbmFsaXNpcyBkZSBkYXRvcyByZWFsaXphbW9zIDQgY2x1c3RlcnMsIHkgZW5jb250cmFtb3MgcXVlIGxvcyBlc3RhZG9zIG3DoXMgY2VyY2Fub3MgYWwgZWplIHNvbiBlbiBsb3MgcXVlIHF1ZSBoYXkgdW5hIG1heW9yIGNhbnRpZGFkIGRlIGNyaW1lbmVzLCBzaWVuZG8gZXN0b3MgcHJpbmNpcGFsbWVudGUgQ2FsaWZvcm5pYSwgTmV2YWRhLCBOZXcgWW9yaywgQXJpem9uYSB5IENvbG9yYWRvLiBFbiBjYW1iaW8sIGxvcyBxdWUgZXN0w6FuIG1hcyBsZWpvcyBkZWwgZWplLCBzb24gbG9zIG3DoXMgc2VndXJvcyBvIGxvcyBxdWUgbWVub3MgY2FudGlkYWQgZGUgY3JpbWVuZXMgdGllbmVuLCB0YWxlcyBjb21vIFdlc3QgdmlyZ2luaWEsIFZlcm1vbnQgeSBOb3J0aCBEYWtvdGEuDQo=