JuveYell

a

Getting started

data("USArrests")
summary(USArrests)
##      Murder          Assault         UrbanPop          Rape      
##  Min.   : 0.800   Min.   : 45.0   Min.   :32.00   Min.   : 7.30  
##  1st Qu.: 4.075   1st Qu.:109.0   1st Qu.:54.50   1st Qu.:15.07  
##  Median : 7.250   Median :159.0   Median :66.00   Median :20.10  
##  Mean   : 7.788   Mean   :170.8   Mean   :65.54   Mean   :21.23  
##  3rd Qu.:11.250   3rd Qu.:249.0   3rd Qu.:77.75   3rd Qu.:26.18  
##  Max.   :17.400   Max.   :337.0   Max.   :91.00   Max.   :46.00

# Check for outliers

boxplot(USArrests)

# Rape has outliers (above the upper whisker), but they will not be
# removed because they are very close to the rest of the data
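
To see which observations the boxplot flags, the same 1.5 × IQR whisker rule can be applied directly. This is a minimal sketch using base R's `boxplot.stats()`; the names `out_rape` and `flagged` are illustrative only and are not part of the original analysis.

```r
data("USArrests")

# boxplot.stats() applies the same 1.5 * IQR whisker rule that boxplot() draws
out_rape <- boxplot.stats(USArrests$Rape)$out

# Look up which states those flagged values belong to
flagged <- USArrests[USArrests$Rape %in% out_rape, "Rape", drop = FALSE]
print(flagged)
```

Listing the flagged states by name makes it easier to judge whether keeping them, as decided above, is reasonable.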

k-means Clustering

Step 1. Standardize variables

# Standardize each variable to mean 0 and standard deviation 1
bd1 <- as.data.frame(scale(USArrests))
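
As a quick check of what the standardization does, the sketch below compares `scale()` with the same calculation written out by hand. The names `z_auto` and `z_manual` are illustrative assumptions.

```r
data("USArrests")
m <- as.matrix(USArrests)

# scale() subtracts each column's mean and divides by its standard deviation
z_auto <- scale(m)

# The same transformation written out step by step
z_manual <- sweep(m, 2, colMeans(m), "-")
z_manual <- sweep(z_manual, 2, apply(m, 2, sd), "/")

# After scaling, every column has mean ~0 and standard deviation 1
round(colMeans(z_auto), 10)
apply(z_auto, 2, sd)
```

Without this step, Assault (which ranges from 45 to 337) would dominate the Euclidean distances that k-means relies on.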

Step 2. k-means clustering

# No seed is set before this call, so cluster numbering and sizes may differ between runs
segmentos <- kmeans(bd1, 4)
segmentos
## K-means clustering with 4 clusters of sizes 12, 8, 1, 29
## 
## Cluster means:
##       Murder    Assault    UrbanPop        Rape
## 1  0.7106707  1.0338263  0.88383713  1.17633437
## 2  1.4118898  0.8743346 -0.81452109  0.01927104
## 3  0.5078625  1.1068225 -1.21176419  2.48420294
## 4 -0.7010700 -0.7071522 -0.09924526 -0.57773737
## 
## Clustering vector:
##        Alabama         Alaska        Arizona       Arkansas     California 
##              2              3              1              2              1 
##       Colorado    Connecticut       Delaware        Florida        Georgia 
##              1              4              4              1              2 
##         Hawaii          Idaho       Illinois        Indiana           Iowa 
##              4              4              1              4              4 
##         Kansas       Kentucky      Louisiana          Maine       Maryland 
##              4              4              2              4              1 
##  Massachusetts       Michigan      Minnesota    Mississippi       Missouri 
##              4              1              4              2              1 
##        Montana       Nebraska         Nevada  New Hampshire     New Jersey 
##              4              4              1              4              4 
##     New Mexico       New York North Carolina   North Dakota           Ohio 
##              1              1              2              4              4 
##       Oklahoma         Oregon   Pennsylvania   Rhode Island South Carolina 
##              4              4              4              4              2 
##   South Dakota      Tennessee          Texas           Utah        Vermont 
##              4              2              1              4              4 
##       Virginia     Washington  West Virginia      Wisconsin        Wyoming 
##              4              4              4              4              4 
## 
## Within cluster sum of squares by cluster:
## [1] 14.246875  8.316061  0.000000 53.354791
##  (between_SS / total_SS =  61.3 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"
asignacion <- cbind(USArrests, cluster = segmentos$cluster)
head(asignacion,10)
##             Murder Assault UrbanPop Rape cluster
## Alabama       13.2     236       58 21.2       2
## Alaska        10.0     263       48 44.5       3
## Arizona        8.1     294       80 31.0       1
## Arkansas       8.8     190       50 19.5       2
## California     9.0     276       91 40.6       1
## Colorado       7.9     204       78 38.7       1
## Connecticut    3.3     110       77 11.1       4
## Delaware       5.9     238       72 15.8       4
## Florida       15.4     335       80 31.9       1
## Georgia       17.4     211       60 25.8       2
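
Because k-means starts from randomly chosen centers, the cluster numbering and sizes shown above can change from one run to the next. A minimal sketch of a more reproducible variant follows. The seed value, `nstart = 25`, and the names `segmentos_rep` and `perfil` are assumptions made for illustration, and the result will not necessarily match the output printed above.

```r
data("USArrests")
bd1 <- as.data.frame(scale(USArrests))

set.seed(123)  # arbitrary seed, chosen only so the run can be repeated
segmentos_rep <- kmeans(bd1, centers = 4, nstart = 25)  # keep the best of 25 random starts

# Share of the total sum of squares explained by the grouping, in percent
round(100 * segmentos_rep$betweenss / segmentos_rep$totss, 1)

# Cluster profiles in the original units, which are easier to read than z-scores
perfil <- aggregate(USArrests,
                    by = list(cluster = segmentos_rep$cluster),
                    FUN = mean)
print(perfil)
```

Reading the profiles in original units (arrests per 100,000 residents, percent urban population) helps when describing each segment.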

Export CSV

write.csv(asignacion,"datos_con_cluster.csv")
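
By default `write.csv()` stores the row names (here, the state names) as an unnamed first column. The sketch below shows one way to read such a file back with the state names restored. The temporary path `ruta` and the placeholder cluster column are assumptions used only to keep the example self-contained.

```r
data("USArrests")

# Placeholder cluster column, standing in for the real k-means assignment
asignacion_demo <- cbind(USArrests, cluster = 1L)

ruta <- tempfile(fileext = ".csv")  # hypothetical temporary location
write.csv(asignacion_demo, ruta)

# row.names = 1 turns the first column back into row names
leido <- read.csv(ruta, row.names = 1)
head(leido, 3)
```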

Visualize segments

#install.packages("factoextra")
library(factoextra)
## Loading required package: ggplot2
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
fviz_cluster(segmentos, data = bd1,
             palette=c("red", "blue", "black", "darkgreen"),
             ellipse.type = "euclid",
             star.plot = TRUE,
             repel = TRUE,
             ggtheme = theme())
## Too few points to calculate an ellipse
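
With more than two variables, `fviz_cluster()` draws the observations on the first two principal components. The message above is consistent with a cluster that has too few points to fit an ellipse (in the output shown earlier, cluster 3 contains a single state). The following is a rough base-R sketch of that projection, not the factoextra implementation; the seed and the names `km`, `pca`, and `var_exp` are illustrative assumptions.

```r
data("USArrests")
bd1 <- as.data.frame(scale(USArrests))

set.seed(123)  # arbitrary seed for a repeatable illustration
km <- kmeans(bd1, centers = 4, nstart = 25)

# Project the standardized data onto its principal components
pca <- prcomp(bd1)
var_exp <- pca$sdev^2 / sum(pca$sdev^2)
round(100 * var_exp[1:2], 1)  # percent of variance shown on each plot axis

# Scatter plot of the first two components, colored by cluster
plot(pca$x[, 1:2], col = km$cluster, pch = 19,
     xlab = "PC1", ylab = "PC2")
text(pca$x[, 1:2], labels = rownames(bd1), pos = 3, cex = 0.6)

# Cluster sizes: very small clusters cannot support an ellipse
table(km$cluster)
```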

Optimize k

library(cluster)
#install.packages("data.table")
library(data.table)

set.seed(123)
optimizacion <- clusGap(bd1, FUN = kmeans, nstart = 25, K.max = 10, B = 50)
plot(optimizacion, xlab = "Number of clusters k")
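
The gap statistic plot can also be read numerically. A minimal sketch, assuming the same `clusGap()` settings as above, uses `maxSE()` from the cluster package to pick a suggested k with the "firstSEmax" rule; the names `gap_tab` and `k_sugerido` are illustrative.

```r
library(cluster)
data("USArrests")
bd1 <- as.data.frame(scale(USArrests))

set.seed(123)
optimizacion <- clusGap(bd1, FUN = kmeans, nstart = 25, K.max = 10, B = 50)

# Tab holds one row per k with columns logW, E.logW, gap and SE.sim
gap_tab <- optimizacion$Tab

# Smallest k whose gap is within one standard error of the first local maximum
k_sugerido <- maxSE(f = gap_tab[, "gap"],
                    SE.f = gap_tab[, "SE.sim"],
                    method = "firstSEmax")
k_sugerido
```

Comparing `k_sugerido` with the k = 4 used earlier is a simple way to check whether the chosen number of clusters is supported by the gap statistic.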

Conclusion

This tool lets us group U.S. states according to their arrest rates. We chose to use 4 clusters. The states closest to the axis in the cluster plot are the ones with the most crime (California, Nevada, New York, Arizona, and Colorado); all five fall in cluster 1, whose center is above average on all four standardized variables.