Cluster Analysis
A32 - Cluster Analysis - Autonomous Communities
UNIVERSIDAD DE EL SALVADOR
FACULTY OF ECONOMIC SCIENCES
SCHOOL OF ECONOMICS
CLUSTER ANALYSIS APPLICATIONS
COURSE:
METHODS FOR ECONOMIC ANALYSIS
INSTRUCTOR:
MSF. CARLOS ADEMIR PEREZ ALAS
THEORY GROUP:
01
| SUBMITTED BY: | STUDENT ID |
|---|---|
| BENJAMIN ISAAC CASTANEDA SOSA | CS223038 |
Ciudad Universitaria “Dr. Fabio Castillo Figueroa”
November 30, 2025, San Salvador, El Salvador
Example application of cluster analysis
The purpose of this section is to give an integrated view of the steps involved in applying cluster analysis, from setting the objectives to validating the results.
An incentive plan for salespeople
The sales director of a nationwide chain of household-appliance stores is reviewing the incentive plan for the company's salespeople. The director believes that incentives should reflect how difficult each sales area is, with higher incentives in the geographic areas where the living conditions of the inhabitants make selling harder. The director therefore wants to find out whether the autonomous communities can be segmented into homogeneous groups according to household equipment.
The available data are those shown in Table 3.22. The goal is to determine how many groups of autonomous communities with similar levels of equipment can be formed and what distinguishes those groups. The procedure follows the steps described in the unit, which are applied in turn below.
| CC.AA. | Automovil | TV_color | Video | Microondas | Lavavajillas | Telefono |
|---|---|---|---|---|---|---|
| Espana | 69.0 | 97.6 | 62.4 | 32.3 | 17.0 | 85.2 |
| Andalucia | 66.7 | 98.0 | 82.7 | 24.1 | 12.7 | 74.7 |
| Aragon | 67.2 | 97.5 | 56.8 | 43.4 | 20.6 | 88.4 |
| Asturias | 63.7 | 95.2 | 52.1 | 24.4 | 13.3 | 88.1 |
| Baleares | 71.9 | 98.8 | 62.4 | 29.8 | 10.1 | 87.9 |
| Canarias | 72.7 | 96.8 | 68.4 | 27.9 | 5.8 | 75.4 |
| Cantabria | 63.4 | 94.9 | 48.9 | 36.5 | 11.2 | 80.5 |
| Castilla y Leon | 65.8 | 97.1 | 47.7 | 28.1 | 14.0 | 85.0 |
| Cast.-La Mancha | 61.5 | 97.3 | 53.6 | 21.7 | 7.1 | 72.9 |
| Cataluna | 70.4 | 98.1 | 71.1 | 36.8 | 19.8 | 92.2 |
| Com. Valenciana | 72.7 | 98.4 | 68.2 | 26.6 | 12.1 | 84.4 |
| Extremadura | 60.5 | 97.7 | 43.7 | 20.7 | 11.7 | 67.1 |
| Galicia | 65.5 | 91.3 | 42.7 | 13.5 | 14.6 | 85.9 |
| Madrid | 74.0 | 99.4 | 76.3 | 53.9 | 32.3 | 95.7 |
| Murcia | 69.0 | 98.7 | 59.3 | 19.5 | 12.1 | 81.4 |
| Navarra | 76.4 | 99.3 | 60.6 | 44.0 | 20.6 | 87.4 |
| Pais Vasco | 71.3 | 98.3 | 61.6 | 45.7 | 23.7 | 94.3 |
| La Rioja | 64.9 | 98.6 | 54.4 | 44.4 | 17.0 | 83.4 |
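The report does not show how `datos` is loaded; the first chunk starts from an existing object. As a minimal sketch, assuming the values are entered exactly as printed in Table 3.22, the data frame can be built directly in R with the same column names the code uses.

```r
# Table 3.22 entered by hand (one possible way to create `datos`;
# the original loading step is not shown in the report)
datos <- data.frame(
  CC.AA. = c("Espana", "Andalucia", "Aragon", "Asturias", "Baleares",
             "Canarias", "Cantabria", "Castilla y Leon", "Cast.-La Mancha",
             "Cataluna", "Com. Valenciana", "Extremadura", "Galicia",
             "Madrid", "Murcia", "Navarra", "Pais Vasco", "La Rioja"),
  Automovil    = c(69.0, 66.7, 67.2, 63.7, 71.9, 72.7, 63.4, 65.8, 61.5,
                   70.4, 72.7, 60.5, 65.5, 74.0, 69.0, 76.4, 71.3, 64.9),
  TV_color     = c(97.6, 98.0, 97.5, 95.2, 98.8, 96.8, 94.9, 97.1, 97.3,
                   98.1, 98.4, 97.7, 91.3, 99.4, 98.7, 99.3, 98.3, 98.6),
  Video        = c(62.4, 82.7, 56.8, 52.1, 62.4, 68.4, 48.9, 47.7, 53.6,
                   71.1, 68.2, 43.7, 42.7, 76.3, 59.3, 60.6, 61.6, 54.4),
  Microondas   = c(32.3, 24.1, 43.4, 24.4, 29.8, 27.9, 36.5, 28.1, 21.7,
                   36.8, 26.6, 20.7, 13.5, 53.9, 19.5, 44.0, 45.7, 44.4),
  Lavavajillas = c(17.0, 12.7, 20.6, 13.3, 10.1, 5.8, 11.2, 14.0, 7.1,
                   19.8, 12.1, 11.7, 14.6, 32.3, 12.1, 20.6, 23.7, 17.0),
  Telefono     = c(85.2, 74.7, 88.4, 88.1, 87.9, 75.4, 80.5, 85.0, 72.9,
                   92.2, 84.4, 67.1, 85.9, 95.7, 81.4, 87.4, 94.3, 83.4),
  stringsAsFactors = FALSE
)

str(datos)
```

With this object in the session, the chunks that follow can be run as written.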
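# Assumes `datos` is already loaded: column CC.AA. plus the six variables of Table 3.22
# Community names, used as row names of the numeric data
nombres_CC.AA. <- datos$CC.AA.
# Numeric variables only (drop the CC.AA. label column)
datos_numeros <- datos[, -1]
rownames(datos_numeros) <- nombres_CC.AA.
head(datos_numeros)
## Automovil TV_color Video Microondas Lavavajillas Telefono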
## Espana 69.0 97.6 62.4 32.3 17.0 85.2
## Andalucia 66.7 98.0 82.7 24.1 12.7 74.7
## Aragon 67.2 97.5 56.8 43.4 20.6 88.4
## Asturias 63.7 95.2 52.1 24.4 13.3 88.1
## Baleares 71.9 98.8 62.4 29.8 10.1 87.9
## Canarias 72.7 96.8 68.4 27.9 5.8 75.4
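Before any distance-based step, note how differently the variables are spread: TV_color stays between 91.3 and 99.4, while Video ranges from 42.7 to 82.7. The sketch below uses two columns copied from Table 3.22 to compare their standard deviations and to check that `scale()` produces ordinary z-scores, which is why the hierarchical runs below standardize before computing Euclidean distances.

```r
# Two columns copied from Table 3.22
tv    <- c(97.6, 98.0, 97.5, 95.2, 98.8, 96.8, 94.9, 97.1, 97.3,
           98.1, 98.4, 97.7, 91.3, 99.4, 98.7, 99.3, 98.3, 98.6)
video <- c(62.4, 82.7, 56.8, 52.1, 62.4, 68.4, 48.9, 47.7, 53.6,
           71.1, 68.2, 43.7, 42.7, 76.3, 59.3, 60.6, 61.6, 54.4)
X <- cbind(TV_color = tv, Video = video)

# Raw standard deviations: Video is several times more dispersed than
# TV_color, so it would dominate an unstandardized Euclidean distance
desv <- apply(X, 2, sd)
round(desv, 2)

# scale() centres each column on its mean and divides by its standard deviation
Z <- scale(X)
Z_manual <- sweep(sweep(X, 2, colMeans(X)), 2, desv, "/")
all.equal(Z, Z_manual, check.attributes = FALSE)

# After standardization both columns have mean 0 and standard deviation 1
round(colMeans(Z), 10)
apply(Z, 2, sd)
```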
Outlier Detection
We check for outliers because they can seriously distort the detection of the number of groups.
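# Centroid of the data (vector of column means) and sample covariance matrix
centro <- colMeans(datos_numeros)
Sx <- cov(datos_numeros)
# Squared Mahalanobis distance of each community to the centroid
D2 <- mahalanobis(datos_numeros, centro, Sx)
# Upper-tail p-values from a chi-square with df = number of variables (6)
p_values <- pchisq(D2, df = ncol(datos_numeros), lower.tail = FALSE)
outliers <- data.frame(D2 = round(D2, 2),
                       p_value = round(p_values, 2))
# Note: the p-values printed below were computed with df = ncol(datos) = 7, which
# also counted the CC.AA. label column; with df = 6, Galicia's p-value is about 0.04
outliers
## D2 p_value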
## Espana 0.20 1.00
## Andalucia 10.51 0.16
## Aragon 1.91 0.96
## Asturias 4.45 0.73
## Baleares 5.68 0.58
## Canarias 9.59 0.21
## Cantabria 7.18 0.41
## Castilla y Leon 2.20 0.95
## Cast.-La Mancha 3.51 0.83
## Cataluna 2.94 0.89
## Com. Valenciana 2.66 0.91
## Extremadura 10.51 0.16
## Galicia 13.13 0.07
## Madrid 8.35 0.30
## Murcia 4.86 0.68
## Navarra 7.66 0.36
## Pais Vasco 2.32 0.94
## La Rioja 4.33 0.74
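What `mahalanobis()` returns is the quadratic form D2_i = (x_i - xbar)' S^-1 (x_i - xbar), where xbar is the vector of column means and S the sample covariance matrix. Because S is the sample covariance, the D2 values always add up to (n - 1) p; the printed column sums to 101.99, i.e. 17 x 6 = 102 up to rounding, which confirms that p = 6 variables enter the distance and that 6 is the appropriate chi-square df. The sketch below recomputes D2 by hand on a small subset of Table 3.22 (first eight communities and three variables, chosen only to keep it short) and checks both facts.

```r
# Subset of Table 3.22: first eight communities, three variables
X <- matrix(c(69.0, 97.6, 62.4,
              66.7, 98.0, 82.7,
              67.2, 97.5, 56.8,
              63.7, 95.2, 52.1,
              71.9, 98.8, 62.4,
              72.7, 96.8, 68.4,
              63.4, 94.9, 48.9,
              65.8, 97.1, 47.7),
            ncol = 3, byrow = TRUE,
            dimnames = list(c("Espana", "Andalucia", "Aragon", "Asturias",
                              "Baleares", "Canarias", "Cantabria",
                              "Castilla y Leon"),
                            c("Automovil", "TV_color", "Video")))

# D2_i = (x_i - xbar)' S^-1 (x_i - xbar), computed for every row at once
mahalanobis_manual <- function(X) {
  centered <- sweep(X, 2, colMeans(X))       # x_i - xbar for every row
  S_inv <- solve(cov(X))                     # inverse sample covariance
  rowSums((centered %*% S_inv) * centered)   # quadratic form per row
}

D2 <- mahalanobis_manual(X)

# Upper-tail chi-square p-values with df = number of variables (3 here)
p_values <- pchisq(D2, df = ncol(X), lower.tail = FALSE)

round(cbind(D2, p_value = p_values), 3)
sum(D2)   # equals (n - 1) * p = 7 * 3
```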
Hierarchical Methods
We run hierarchical cluster analyses, compare the solutions produced by different agglomeration methods, apply the criteria presented in the unit to identify the appropriate number of groups, and obtain the centroids that will serve as the starting point for the next step.
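The paragraph above says the hierarchical stage should also deliver group centroids to start the next step. A minimal sketch, assuming that next step is a non-hierarchical k-means run (base R `kmeans()`), on hypothetical toy data with two obvious groups: cut a Ward tree into k groups with `cutree()`, average the rows of each group, and pass those means as `centers`. With this report's data, `X` would be the standardized matrix `datos.norm` and `k` the number of groups finally chosen; both are left open here.

```r
# Hypothetical toy data: two well-separated groups of three points each
X <- rbind(c(0, 0), c(0, 1), c(1, 0),
           c(10, 10), c(10, 11), c(11, 10))

# 1. Hierarchical step: Ward's criterion on Euclidean distances
h <- hclust(dist(X), method = "ward.D2")

# 2. Cut the tree into k groups (k = 2 is an assumption for this toy data)
k <- 2
grupos <- cutree(h, k = k)

# 3. Centroid (column means) of each group: a k x p matrix
centroides <- apply(X, 2, function(col) tapply(col, grupos, mean))

# 4. Non-hierarchical refinement started from those centroids
km <- kmeans(X, centers = centroides)

centroides
km$cluster
```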
Centroid Method
# Standardize all numeric columns (excluding the CC.AA. label column)
datos.norm <- scale(datos[, -1])
# Euclidean distance between communities on all six standardized variables
matriz.dis.euclid.norm <- dist(datos.norm, method = 'euclidean', diag = TRUE)
# Squared Euclidean distance, which the centroid method requires
matriz.dis.euclid.norm2 <- (matriz.dis.euclid.norm)^2
# Cluster with the centroid method
hclust.centroid <- hclust(matriz.dis.euclid.norm2, method = "centroid")
# Dendrogram labelled with the names of the autonomous communities
plot(hclust.centroid, labels = datos$CC.AA.)
# Agglomeration history (merge heights and merged pairs) as a data frame.
# Note: the output below was produced with dist(datos.norm[,-1], ...), which
# also dropped Automovil, so it reflects five variables rather than six.
data.frame(hclust.centroid[2:1])
## height merge.1 merge.2
## 1 0.7213206 -5 -11
## 2 0.9981325 -3 -16
## 3 0.8440838 -18 2
## 4 1.3477855 -1 1
## 5 1.4105691 -4 -8
## 6 1.4669617 -15 4
## 7 1.7419417 -17 3
## 8 1.9805706 -9 -12
## 9 2.0811910 -7 5
## 10 2.1261648 -10 7
## 11 3.3880982 -2 -6
## 12 3.4251429 6 9
## 13 3.9690631 10 12
## 14 6.3547406 11 13
## 15 7.1515229 8 14
## 16 15.8956353 -13 15
## 17 19.0340756 -14 16
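Centroid linkage measures the distance between two groups as the distance between their centroids. R's `hclust()` updates it with the Lance-Williams formula, which reproduces that centroid distance only when the input is squared Euclidean distance. That is why the chunk above squares the distances, and why the centroid heights printed above are on a squared scale (the first one, 0.7213, is the square of 0.8493, the first single-linkage height in the next subsection). A small check on three hypothetical points of a line:

```r
# Three hypothetical points on a line: 0, 1 and 5
x <- matrix(c(0, 1, 5), ncol = 1)
d <- dist(x)   # Euclidean: d(1,2) = 1, d(1,3) = 5, d(2,3) = 4

# Correct use: centroid linkage on squared Euclidean distances.
# First merge {0, 1} at 1; its centroid is 0.5, and the squared distance
# from 0.5 to 5 is 4.5^2 = 20.25, which is the second height.
h_sq <- hclust(d^2, method = "centroid")
h_sq$height

# Same call on plain distances: the second height (4.25) no longer equals
# the distance between centroids (4.5)
h_raw <- hclust(d, method = "centroid")
h_raw$height
```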
Nearest-Neighbor Method (Single)
# Standardize all numeric columns (excluding the CC.AA. label column)
datos.norm <- scale(datos[, -1])
# Euclidean distance between communities on all six standardized variables
matriz.dis.euclid.norm <- dist(datos.norm, method = 'euclidean', diag = TRUE)
# Cluster with single linkage (nearest neighbor)
hclust.single <- hclust(matriz.dis.euclid.norm, method = "single")
# Dendrogram labelled with the names of the autonomous communities
plot(hclust.single, labels = datos$CC.AA.)
# Agglomeration history (merge heights and merged pairs) as a data frame.
# Note: the output below was produced with dist(datos.norm[,-1], ...), which
# also dropped Automovil, so it reflects five variables rather than six.
data.frame(hclust.single[2:1])
## height merge.1 merge.2
## 1 0.8493059 -5 -11
## 2 0.9990658 -3 -16
## 3 1.0257279 -18 2
## 4 1.1196469 -17 3
## 5 1.1221288 -15 1
## 6 1.1496370 -1 5
## 7 1.1876738 -4 -8
## 8 1.3349580 4 6
## 9 1.3676542 -10 8
## 10 1.4073275 -9 -12
## 11 1.4955213 7 9
## 12 1.5368736 -6 10
## 13 1.5571732 -7 11
## 14 1.6477410 12 13
## 15 1.8406787 -2 14
## 16 2.1297820 -14 15
## 17 2.4227349 -13 16
Average Linkage Method (Average)
# Standardize all numeric columns (excluding the CC.AA. label column)
datos.norm <- scale(datos[, -1])
# Euclidean distance between communities on all six standardized variables
matriz.dis.euclid.norm <- dist(datos.norm, method = 'euclidean', diag = TRUE)
# Cluster with average linkage
hclust.average <- hclust(matriz.dis.euclid.norm, method = "average")
# Dendrogram labelled with the names of the autonomous communities
plot(hclust.average, labels = datos$CC.AA.)
# Agglomeration history (merge heights and merged pairs) as a data frame.
# Note: the output below was produced with dist(datos.norm[,-1], ...), which
# also dropped Automovil, so it reflects five variables rather than six.
data.frame(hclust.average[2:1])
## height merge.1 merge.2
## 1 0.8493059 -5 -11
## 2 0.9990658 -3 -16
## 3 1.0455731 -18 2
## 4 1.1876738 -4 -8
## 5 1.2319944 -15 1
## 6 1.3597377 -1 5
## 7 1.3676542 -10 -17
## 8 1.4073275 -9 -12
## 9 1.5553866 3 7
## 10 1.5600721 -7 4
## 11 1.8406787 -2 -6
## 12 2.1362102 6 10
## 13 2.4354069 9 12
## 14 2.7226739 8 11
## 15 3.0771236 13 14
## 16 4.3011415 -13 15
## 17 4.5882339 -14 16
Farthest-Neighbor Method (Complete)
# Standardize all numeric columns (excluding the CC.AA. label column)
datos.norm <- scale(datos[, -1])
# Euclidean distance between communities on all six standardized variables
matriz.dis.euclid.norm <- dist(datos.norm, method = 'euclidean', diag = TRUE)
# Cluster with complete linkage (farthest neighbor)
hclust.complete <- hclust(matriz.dis.euclid.norm, method = "complete")
# Dendrogram labelled with the names of the autonomous communities
plot(hclust.complete, labels = datos$CC.AA.)
# Agglomeration history (merge heights and merged pairs) as a data frame.
# Note: the output below was produced with dist(datos.norm[,-1], ...), which
# also dropped Automovil, so it reflects five variables rather than six.
data.frame(hclust.complete[2:1])
## height merge.1 merge.2
## 1 0.8493059 -5 -11
## 2 0.9990658 -3 -16
## 3 1.0654183 -18 2
## 4 1.1876738 -4 -8
## 5 1.3170292 -1 1
## 6 1.3676542 -10 -17
## 7 1.4073275 -9 -12
## 8 1.5629709 -7 4
## 9 1.6125469 -15 5
## 10 1.8406787 -2 -6
## 11 2.1030853 3 6
## 12 2.6649224 9 10
## 13 3.1882462 7 8
## 14 3.6529816 -14 11
## 15 3.7399077 12 13
## 16 5.3329276 -13 15
## 17 7.0481449 14 16
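Single, average and complete linkage differ only in how the distance from a newly formed group to every other group is defined: the minimum, the mean, or the maximum of the pairwise distances. That is why the three histories above coincide in their first two merges (Baleares with Com. Valenciana at 0.8493, Aragon with Navarra at 0.9991), where only single communities are involved, and separate from the third merge on, when La Rioja joins {Aragon, Navarra} at 1.0257, 1.0456 and 1.0654 respectively. The same pattern on hypothetical one-dimensional data:

```r
# Hypothetical 1-D data: 0, 1 and 5 (pairwise distances 1, 5 and 4)
d <- dist(matrix(c(0, 1, 5), ncol = 1))

# After {0, 1} merges at height 1, its distance to the point 5 is
#   single   : min(5, 4)  = 4
#   average  : mean(5, 4) = 4.5
#   complete : max(5, 4)  = 5
alturas <- sapply(c("single", "average", "complete"),
                  function(m) hclust(d, method = m)$height)
alturas
```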
Ward's Method
# Standardize all numeric columns (excluding the CC.AA. label column)
datos.norm <- scale(datos[, -1])
# Euclidean distance between communities on all six standardized variables
matriz.dis.euclid.norm <- dist(datos.norm, method = 'euclidean', diag = TRUE)
# Ward's method: "ward.D2" squares the distances internally,
# so it takes plain (unsquared) Euclidean distances
hclust.ward.D2 <- hclust(matriz.dis.euclid.norm, method = "ward.D2")
# Dendrogram labelled with the names of the autonomous communities
plot(hclust.ward.D2, labels = datos$CC.AA.)
# Agglomeration history (merge heights and merged pairs) as a data frame.
# Note: the output below was produced with hclust(d^2, method = "ward.D2") on
# d = dist(datos.norm[,-1]), i.e. five variables and distances squared twice.
data.frame(hclust.ward.D2[2:1])
## height merge.1 merge.2
## 1 0.7213206 -5 -11
## 2 0.9981325 -3 -16
## 3 1.1246648 -18 2
## 4 1.4105691 -4 -8
## 5 1.7311589 -1 1
## 6 1.8704781 -10 -17
## 7 1.9805706 -9 -12
## 8 2.2170506 -15 5
## 9 2.6897831 -7 4
## 10 3.3880982 -2 -6
## 11 3.9724878 3 6
## 12 7.7336488 8 10
## 13 10.3749303 7 9
## 14 11.2851019 -14 11
## 15 14.9208345 12 13
## 16 23.0495867 -13 15
## 17 43.2022438 14 16
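The Ward history printed above was generated by passing squared Euclidean distances to `method = "ward.D2"`. According to R's `?hclust`, with `"ward.D2"` the dissimilarities are squared before cluster updating, so squared input ends up squared a second time. Ward's criterion is obtained either by giving squared Euclidean distances to `"ward.D"` or plain Euclidean distances to `"ward.D2"`. The sketch below, on hypothetical one-dimensional data, checks that both routes give the same merges and that their heights differ only by a square root.

```r
# Hypothetical 1-D data with distinct pairwise distances
d <- dist(matrix(c(0, 1, 3, 7, 15), ncol = 1))

# Ward's criterion, two equivalent calls
h_D  <- hclust(d^2, method = "ward.D")   # squared distances with "ward.D"
h_D2 <- hclust(d,   method = "ward.D2")  # plain distances; "ward.D2" squares them

# Same sequence of merges...
identical(h_D$merge, h_D2$merge)
# ...and heights that differ only by a square root
all.equal(sqrt(h_D$height), h_D2$height)
```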
Selecting the Number of Clusters in the Solution
Centroid Method
# Note: unlike the hierarchical runs above, NbClust receives the raw
# (unstandardized) data and computes the distances internally
datos.NbClust <- datos_numeros
library(NbClust)
res <- NbClust(datos.NbClust,
               distance = "euclidean",
               min.nc = 2,
               max.nc = 15,
               method = "centroid",
               index = "alllong")
## *** : The Hubert index is a graphical method of determining the number of clusters.
## In the plot of Hubert index, we seek a significant knee that corresponds to a
## significant increase of the value of the measure i.e the significant peak in Hubert
## index second differences plot.
##
## *** : The D index is a graphical method of determining the number of clusters.
## In the plot of D index, we seek a significant knee (the significant peak in Dindex
## second differences plot) that corresponds to a significant increase of the value of
## the measure.
##
## *******************************************************************
## * Among all indices:
## * 6 proposed 2 as the best number of clusters
## * 3 proposed 3 as the best number of clusters
## * 3 proposed 4 as the best number of clusters
## * 5 proposed 5 as the best number of clusters
## * 1 proposed 7 as the best number of clusters
## * 2 proposed 13 as the best number of clusters
## * 8 proposed 15 as the best number of clusters
##
## ***** Conclusion *****
##
## * According to the majority rule, the best number of clusters is 15
##
##
## *******************************************************************
## KL CH Hartigan CCC Scott Marriot TrCovW TraceW
## 2 0.2235 4.2937 2.8811 12.5234 168.6204 3.248655e+15 825042.0502 4858.2106
## 3 0.5278 3.7256 2.5358 6.3228 192.1067 1.982538e+15 494773.1542 4116.8769
## 4 0.2463 3.4990 6.1711 5.0571 223.1054 6.297651e+14 382435.2595 3521.5400
## 5 11.6504 4.9435 0.4256 5.3892 248.4528 2.406744e+14 254825.3744 2444.1665
## 6 0.5876 3.8487 1.2685 4.0611 258.5897 1.973396e+14 243641.1592 2366.6815
## 7 0.1175 3.4446 9.1227 3.1329 271.8817 1.283528e+14 230558.4155 2140.4225
## 8 1.7975 6.0948 7.8420 4.8317 310.6980 1.940222e+13 81769.7539 1170.0562
## 9 5.7537 9.4458 1.6455 6.3465 364.3729 1.244849e+12 16130.2128 655.7873
## 10 0.6339 8.9904 2.6423 5.6311 379.4430 6.653166e+11 10243.4962 554.4190
## 11 1.1425 9.6496 2.6426 5.4025 418.5970 9.143792e+10 4649.9013 416.7650
## 12 0.8279 10.5637 4.4166 5.1720 499.1188 1.241413e+09 3423.1189 302.5483
## 13 2.9486 14.3162 1.6919 5.8770 975.1638 4.800000e-03 1843.1547 174.2683
## 14 0.8534 14.2535 2.3781 4.8274 NaN 0.000000e+00 1120.1709 130.2083
## 15 1.5532 15.9554 1.7852 4.0365 NaN 0.000000e+00 521.6202 81.6600
## Friedman Rubin Cindex DB Silhouette Duda Pseudot2 Beale
## 2 7.184240e+03 97.4761 0.5119 0.4550 0.3914 2.0302 -7.6115 -1.8302
## 3 8.341118e+03 115.0288 0.4860 0.5239 0.2786 2.2512 -7.7810 -1.9957
## 4 1.778822e+04 134.4751 0.4646 0.5493 0.1908 0.6941 5.7303 1.5747
## 5 2.091984e+04 193.7508 0.5680 0.5938 0.2755 136.2556 0.0000 0.0000
## 6 2.143490e+04 200.0942 0.5706 0.5162 0.3098 3.3323 -7.6990 -2.4684
## 7 2.214448e+04 221.2457 0.5648 0.5988 0.2388 0.5466 8.2933 2.9006
## 8 2.615942e+04 404.7321 0.5801 0.5989 0.3940 0.4710 6.7395 3.7042
## 9 8.972745e+04 722.1233 0.6721 0.5256 0.4966 36.4815 -0.9726 -1.8709
## 10 9.726547e+04 854.1542 0.6658 0.4729 0.5246 18.0014 -2.8333 -2.7252
## 11 1.823836e+05 1136.2742 0.6486 0.4498 0.5703 30.2342 -1.9338 -2.4800
## 12 5.460712e+05 1565.2352 0.5977 0.3905 0.6348 0.3520 3.6825 4.7226
## 13 4.297786e+15 2717.4145 0.7959 0.3900 0.6535 189.1116 0.0000 0.0000
## 14 -1.524036e+15 3636.9354 0.7737 0.3279 0.7357 60.4381 -0.9835 -1.8918
## 15 7.239803e+15 5799.1587 0.9363 0.2925 0.7630 380.6423 0.0000 0.0000
## Ratkowsky Ball Ptbiserial Gap Frey McClain Gamma Gplus Tau
## 2 0.2969 2429.1053 0.4857 -0.9273 3.3318 0.0737 0.7119 2.1765 10.7582
## 3 0.2833 1372.2923 0.5325 -1.6679 2.3601 0.1706 0.6778 4.1699 17.5425
## 4 0.3103 880.3850 0.5446 -2.0221 0.8445 0.2894 0.6516 5.7386 21.4641
## 5 0.3428 488.8333 0.6033 -0.7081 -3.0954 0.5619 0.7017 5.6993 26.8105
## 6 0.3155 394.4469 0.5875 -0.8993 8.6015 0.5840 0.6793 6.1307 25.9739
## 7 0.2985 305.7746 0.5168 -0.4308 0.5130 0.8345 0.6047 7.4183 22.6928
## 8 0.3059 146.2570 0.5050 0.7060 0.2928 1.9039 0.7721 3.0131 20.4183
## 9 0.3087 72.8653 0.4735 1.4007 2.0346 3.1152 0.9285 0.5948 15.4510
## 10 0.2952 55.4419 0.4481 1.6991 0.3671 3.5305 0.9178 0.6209 13.8693
## 11 0.2863 37.8877 0.4132 2.0534 0.3510 4.4354 0.9462 0.3203 11.2549
## 12 0.2760 25.2124 0.3781 2.4310 0.2220 5.5272 0.9497 0.2353 8.8758
## 13 0.2712 13.4053 0.3209 2.9959 0.5327 8.0266 0.9955 0.0131 5.7386
## 14 0.2631 9.3006 0.2955 3.4159 0.2725 9.5006 0.9919 0.0196 4.7974
## 15 0.2558 5.4440 0.2381 3.8199 0.5313 14.6238 1.0000 0.0000 2.9412
## Dunn Hubert SDindex Dindex SDbw
## 2 0.4347 4e-04 0.1792 15.0217 0.4231
## 3 0.3955 4e-04 0.1368 13.5193 0.2472
## 4 0.3454 4e-04 0.1293 12.1371 0.1643
## 5 0.4295 4e-04 0.1365 10.0901 0.1361
## 6 0.3851 4e-04 0.1983 9.3985 0.0882
## 7 0.3851 4e-04 0.1956 8.5891 0.0753
## 8 0.4868 4e-04 0.1968 6.2610 0.0668
## 9 0.7193 4e-04 0.1996 4.7092 0.0547
## 10 0.6166 4e-04 0.2013 4.0766 0.0371
## 11 0.6546 4e-04 0.2023 3.3988 0.0311
## 12 0.6546 4e-04 0.2060 2.7435 0.0219
## 13 0.9264 4e-04 0.2216 2.1651 0.0179
## 14 0.9156 5e-04 0.2630 1.6436 0.0109
## 15 1.0795 5e-04 0.2989 1.2287 0.0089
## KL CH Hartigan CCC Scott Marriot TrCovW
## Number_clusters 5.0000 15.0000 7.0000 2.0000 13.000 4.000000e+00 3.0
## Value_Index 11.6504 15.9554 7.8542 12.5234 476.045 9.636823e+14 330268.9
## TraceW Friedman Rubin Cindex DB Silhouette
## Number_clusters 5.0000 1.500000e+01 13.0000 4.0000 15.0000 15.000
## Value_Index 999.8885 8.763839e+15 -232.6584 0.4646 0.2925 0.763
## Duda PseudoT2 Beale Ratkowsky Ball PtBiserial Gap
## Number_clusters 2.0000 2.0000 2.0000 5.0000 3.000 5.0000 2.0000
## Value_Index 2.0302 -7.6115 -1.8302 0.3428 1056.813 0.6033 -0.9273
## Frey McClain Gamma Gplus Tau Dunn Hubert SDindex
## Number_clusters 3.0000 2.0000 15 15 5.0000 15.0000 0 4.0000
## Value_Index 2.3601 0.0737 1 0 26.8105 1.0795 0 0.1293
## Dindex SDbw
## Number_clusters 0 15.0000
## Value_Index 0 0.0089
## Espana Andalucia Aragon Asturias Baleares
## 1 2 3 4 5
## Canarias Cantabria Castilla y Leon Cast.-La Mancha Cataluna
## 6 7 4 8 9
## Com. Valenciana Extremadura Galicia Madrid Murcia
## 5 10 11 12 13
## Navarra Pais Vasco La Rioja
## 14 15 3
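The Silhouette column in the NbClust output is the average silhouette width: for each observation, s = (b - a) / max(a, b), where a is its mean distance to its own group and b its mean distance to the nearest other group, averaged over all observations. The sketch below computes it directly with `silhouette()` from the cluster package (assumed installed; it ships with R as a recommended package) over a short range of k on hypothetical toy data. Keeping the range of k well below the sample size is a deliberate choice: with 18 communities and max.nc = 15, several indices above tend to improve as k approaches n (Silhouette reaches 0.763 at k = 15), which likely helps explain why the majority rule lands on the upper bound.

```r
library(cluster)  # provides silhouette()

# Hypothetical toy data: two well-separated groups of three points each
X <- rbind(c(0, 0), c(0, 1), c(1, 0),
           c(10, 10), c(10, 11), c(11, 10))
d <- dist(X)
h <- hclust(d, method = "ward.D2")

# Average silhouette width for each candidate number of groups
ks <- 2:4
ancho_medio <- sapply(ks, function(k) {
  s <- silhouette(cutree(h, k = k), d)
  mean(s[, "sil_width"])
})
names(ancho_medio) <- ks

ancho_medio
ks[which.max(ancho_medio)]   # k with the largest average width
```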
Nearest-Neighbor Method (Single)
datos.NbClust <- datos_numeros
library(NbClust)
res <- NbClust(datos.NbClust,
               distance = "euclidean",
               min.nc = 2,
               max.nc = 15,
               method = "single",
               index = "alllong")
## *** : The Hubert index is a graphical method of determining the number of clusters.
## In the plot of Hubert index, we seek a significant knee that corresponds to a
## significant increase of the value of the measure i.e the significant peak in Hubert
## index second differences plot.
##
## *** : The D index is a graphical method of determining the number of clusters.
## In the plot of D index, we seek a significant knee (the significant peak in Dindex
## second differences plot) that corresponds to a significant increase of the value of
## the measure.
##
## *******************************************************************
## * Among all indices:
## * 5 proposed 2 as the best number of clusters
## * 4 proposed 3 as the best number of clusters
## * 3 proposed 4 as the best number of clusters
## * 3 proposed 5 as the best number of clusters
## * 1 proposed 7 as the best number of clusters
## * 1 proposed 9 as the best number of clusters
## * 1 proposed 10 as the best number of clusters
## * 2 proposed 13 as the best number of clusters
## * 8 proposed 15 as the best number of clusters
##
## ***** Conclusion *****
##
## * According to the majority rule, the best number of clusters is 15
##
##
## *******************************************************************
## KL CH Hartigan CCC Scott Marriot TrCovW TraceW
## 2 0.2235 4.2937 2.8811 12.5234 168.6204 3.248655e+15 825042.0502 4858.2106
## 3 0.5278 3.7256 2.5358 6.3228 192.1067 1.982538e+15 494773.1542 4116.8769
## 4 0.2463 3.4990 6.1711 5.0571 223.1054 6.297651e+14 382435.2595 3521.5400
## 5 1.2467 4.9435 5.9429 5.3892 248.4528 2.406744e+14 254825.3744 2444.1665
## 6 3.5975 6.4166 2.0863 5.8712 285.6188 4.396140e+13 72706.0555 1677.3640
## 7 10.2938 6.0725 0.6307 5.2509 296.4509 3.278027e+13 52441.8541 1428.9347
## 8 0.2416 5.0850 0.9110 4.0780 311.6154 1.843807e+13 49105.2847 1351.4497
## 9 0.0899 4.4717 11.5314 3.0276 326.4007 1.026337e+13 37645.3697 1238.6107
## 10 5.1822 9.1992 2.7537 5.7401 388.1173 4.109033e+11 13203.1042 542.9483
## 11 1.3200 9.9788 2.3453 5.5655 414.0897 1.174560e+11 7142.5814 403.9167
## 12 0.7269 10.5637 4.4166 5.1720 499.1188 1.241413e+09 3423.1189 302.5483
## 13 2.9486 14.3162 1.6919 5.8770 975.1638 4.800000e-03 1843.1547 174.2683
## 14 0.8534 14.2535 2.3781 4.8274 NaN 0.000000e+00 1120.1709 130.2083
## 15 1.5532 15.9554 1.7852 4.0365 NaN 0.000000e+00 521.6202 81.6600
## Friedman Rubin Cindex DB Silhouette Duda Pseudot2 Beale
## 2 7.184240e+03 97.4761 0.5119 0.4550 0.3914 2.0302 -7.6115 -1.8302
## 3 8.341118e+03 115.0288 0.4860 0.5239 0.2786 2.2512 -7.7810 -1.9957
## 4 1.778822e+04 134.4751 0.4646 0.5493 0.1908 0.6941 5.7303 1.5747
## 5 2.091984e+04 193.7508 0.5680 0.5938 0.2755 0.6760 5.2722 1.6903
## 6 2.646221e+04 282.3235 0.5644 0.8000 0.3187 8.7527 -3.5430 -2.7262
## 7 2.745848e+04 331.4072 0.5500 0.7152 0.3153 132.3449 0.0000 0.0000
## 8 2.969617e+04 350.4084 0.5538 0.6520 0.3626 17.3267 -2.8269 -2.7190
## 9 3.576658e+04 382.3310 0.5637 0.6405 0.3959 0.3184 10.7057 6.8647
## 10 1.012341e+05 872.1996 0.6368 0.5102 0.5305 26.7933 -1.9254 -2.4692
## 11 1.314616e+05 1172.4183 0.6271 0.4477 0.6012 45.5972 -0.9781 -1.8815
## 12 5.460712e+05 1565.2352 0.5977 0.3905 0.6348 0.3520 3.6825 4.7226
## 13 4.297786e+15 2717.4145 0.7959 0.3900 0.6535 189.1116 0.0000 0.0000
## 14 -1.524036e+15 3636.9354 0.7737 0.3279 0.7357 60.4381 -0.9835 -1.8918
## 15 7.239803e+15 5799.1587 0.9363 0.2925 0.7630 380.6423 0.0000 0.0000
## Ratkowsky Ball Ptbiserial Gap Frey McClain Gamma Gplus Tau
## 2 0.2969 2429.1053 0.4857 -0.2176 3.3318 0.0737 0.7119 2.1765 10.7582
## 3 0.2833 1372.2923 0.5325 -0.0308 2.3601 0.1706 0.6778 4.1699 17.5425
## 4 0.3103 880.3850 0.5446 -0.8947 0.8445 0.2894 0.6516 5.7386 21.4641
## 5 0.3428 488.8333 0.6033 -0.7081 1.2917 0.5619 0.7017 5.6993 26.8105
## 6 0.3372 279.5607 0.4830 -0.0344 1.0007 1.8087 0.6994 4.2157 19.6209
## 7 0.3203 204.1335 0.4588 0.4405 -1.2013 2.1591 0.7004 3.7908 17.7255
## 8 0.3016 168.9312 0.4448 0.6409 -1.6056 2.2719 0.6832 3.9150 16.8889
## 9 0.2869 137.6234 0.3968 0.8259 0.0902 2.7821 0.6267 4.1503 13.9346
## 10 0.2950 54.2948 0.4326 1.7200 0.3273 3.9228 0.9237 0.5163 12.4967
## 11 0.2861 36.7197 0.4047 2.1072 0.4248 4.7080 0.9456 0.3007 10.4575
## 12 0.2760 25.2124 0.3781 2.4310 0.2220 5.5272 0.9497 0.2353 8.8758
## 13 0.2712 13.4053 0.3209 2.9959 0.5327 8.0266 0.9955 0.0131 5.7386
## 14 0.2631 9.3006 0.2955 3.4159 0.2725 9.5006 0.9919 0.0196 4.7974
## 15 0.2558 5.4440 0.2381 3.8199 0.5313 14.6238 1.0000 0.0000 2.9412
## Dunn Hubert SDindex Dindex SDbw
## 2 0.4347 4e-04 0.1792 15.0217 0.4231
## 3 0.3955 4e-04 0.1368 13.5193 0.2472
## 4 0.3454 4e-04 0.1293 12.1371 0.1643
## 5 0.4295 4e-04 0.1365 10.0901 0.1361
## 6 0.4631 4e-04 0.1806 8.2565 0.1429
## 7 0.4522 4e-04 0.1661 7.4210 0.1096
## 8 0.4522 4e-04 0.2033 6.7294 0.0770
## 9 0.4484 4e-04 0.2188 6.1110 0.0667
## 10 0.7193 4e-04 0.2146 4.0908 0.0476
## 11 0.6956 4e-04 0.2070 3.3761 0.0349
## 12 0.6546 4e-04 0.2060 2.7435 0.0219
## 13 0.9264 4e-04 0.2216 2.1651 0.0179
## 14 0.9156 5e-04 0.2630 1.6436 0.0109
## 15 1.0795 5e-04 0.2989 1.2287 0.0089
## KL CH Hartigan CCC Scott Marriot TrCovW
## Number_clusters 7.0000 15.0000 9.0000 2.0000 13.000 4.000000e+00 3.0
## Value_Index 10.2938 15.9554 10.6204 12.5234 476.045 9.636823e+14 330268.9
## TraceW Friedman Rubin Cindex DB Silhouette
## Number_clusters 10.0000 1.500000e+01 13.0000 4.0000 15.0000 15.000
## Value_Index 556.6307 8.763839e+15 -232.6584 0.4646 0.2925 0.763
## Duda PseudoT2 Beale Ratkowsky Ball PtBiserial Gap
## Number_clusters 2.0000 2.0000 2.0000 5.0000 3.000 5.0000 3.0000
## Value_Index 2.0302 -7.6115 -1.8302 0.3428 1056.813 0.6033 -0.0308
## Frey McClain Gamma Gplus Tau Dunn Hubert SDindex
## Number_clusters 3.0000 2.0000 15 15 5.0000 15.0000 0 4.0000
## Value_Index 2.3601 0.0737 1 0 26.8105 1.0795 0 0.1293
## Dindex SDbw
## Number_clusters 0 15.0000
## Value_Index 0 0.0089
## Espana Andalucia Aragon Asturias Baleares
## 1 2 3 4 5
## Canarias Cantabria Castilla y Leon Cast.-La Mancha Cataluna
## 6 7 4 8 9
## Com. Valenciana Extremadura Galicia Madrid Murcia
## 5 10 11 12 13
## Navarra Pais Vasco La Rioja
## 14 15 3
Average Linkage Method (Average)
datos.NbClust <- datos_numeros
library(NbClust)
res <- NbClust(datos.NbClust,
               distance = "euclidean",
               min.nc = 2,
               max.nc = 15,
               method = "average",
               index = "alllong")
## *** : The Hubert index is a graphical method of determining the number of clusters.
## In the plot of Hubert index, we seek a significant knee that corresponds to a
## significant increase of the value of the measure i.e the significant peak in Hubert
## index second differences plot.
##
## *** : The D index is a graphical method of determining the number of clusters.
## In the plot of D index, we seek a significant knee (the significant peak in Dindex
## second differences plot) that corresponds to a significant increase of the value of
## the measure.
##
## *******************************************************************
## * Among all indices:
## * 6 proposed 2 as the best number of clusters
## * 6 proposed 3 as the best number of clusters
## * 5 proposed 4 as the best number of clusters
## * 2 proposed 5 as the best number of clusters
## * 1 proposed 13 as the best number of clusters
## * 8 proposed 15 as the best number of clusters
##
## ***** Conclusion *****
##
## * According to the majority rule, the best number of clusters is 15
##
##
## *******************************************************************
## KL CH Hartigan CCC Scott Marriot TrCovW TraceW
## 2 0.2235 4.2937 2.8811 12.5234 168.6204 3.248655e+15 825042.0502 4858.2106
## 3 0.0917 3.7256 9.8914 6.3228 192.1067 1.982538e+15 494773.1542 4116.8769
## 4 1.5379 6.9242 8.5128 6.9179 223.7130 6.088617e+14 316581.7207 2480.8983
## 5 2.5612 9.7306 4.1057 7.8188 250.1128 2.194711e+14 94432.8631 1542.7923
## 6 1.3804 10.2131 3.2677 7.7541 278.2104 6.634613e+13 59705.3285 1172.4890
## 7 1.2410 10.4254 2.8243 7.5500 325.6524 6.472241e+12 37665.4289 921.5423
## 8 1.3018 10.5762 2.3111 7.2751 347.6137 2.495555e+12 19413.0120 733.2723
## 9 1.1280 10.5136 2.1353 6.8487 372.4602 7.943109e+11 12325.3583 595.6183
## 10 1.0512 10.4889 2.1339 6.3671 388.4650 4.030405e+11 9665.7481 481.4017
## 11 0.7141 10.6500 3.5668 5.8828 438.7682 2.981569e+10 5880.1217 380.0333
## 12 1.6135 12.8052 2.6678 6.1278 479.4351 3.705384e+09 3890.4809 251.7533
## 13 1.7365 14.3162 1.6919 5.8770 975.1638 4.800000e-03 1843.1547 174.2683
## 14 0.8534 14.2535 2.3781 4.8274 NaN 0.000000e+00 1120.1709 130.2083
## 15 1.5532 15.9554 1.7852 4.0365 NaN 0.000000e+00 521.6202 81.6600
## Friedman Rubin Cindex DB Silhouette Duda Pseudot2 Beale
## 2 7.184240e+03 97.4761 0.5119 0.4550 0.3914 2.0302 -7.6115 -1.8302
## 3 8.341118e+03 115.0288 0.4860 0.5239 0.2786 0.6026 9.2320 2.3679
## 4 9.047014e+03 190.8822 0.5390 0.8181 0.2924 0.4332 10.4655 4.4738
## 5 9.227215e+03 306.9495 0.5783 0.7573 0.3542 0.5515 3.2527 2.5028
## 6 1.295054e+04 403.8923 0.5135 0.7595 0.3774 15.7979 -1.8734 -2.4025
## 7 6.212024e+04 513.8769 0.5738 0.6398 0.4196 12.7845 -2.7653 -2.6598
## 8 7.733520e+04 645.8164 0.6741 0.5886 0.4365 18.0014 -2.8333 -2.7252
## 9 1.022427e+05 795.0717 0.6712 0.5678 0.4798 24.9376 -1.9198 -2.4620
## 10 1.073772e+05 983.7093 0.6373 0.4953 0.5226 45.5972 -0.9781 -1.8815
## 11 2.475246e+05 1246.0994 0.6125 0.4426 0.5532 0.3520 3.6825 4.7226
## 12 2.914964e+05 1881.0448 0.6942 0.4428 0.5719 132.3449 0.0000 0.0000
## 13 4.297786e+15 2717.4145 0.7959 0.3900 0.6535 189.1116 0.0000 0.0000
## 14 -1.524036e+15 3636.9354 0.7737 0.3279 0.7357 60.4381 -0.9835 -1.8918
## 15 7.239803e+15 5799.1587 0.9363 0.2925 0.7630 380.6423 0.0000 0.0000
## Ratkowsky Ball Ptbiserial Gap Frey McClain Gamma Gplus Tau
## 2 0.2969 2429.1053 0.4857 -0.9273 3.3318 0.0737 0.7119 2.1765 10.7582
## 3 0.2833 1372.2923 0.5325 -1.6679 1.4851 0.1706 0.6778 4.1699 17.5425
## 4 0.3713 620.2246 0.5393 -1.6718 0.5140 0.9443 0.6667 6.0784 24.3137
## 5 0.3661 308.5585 0.5229 -0.2480 0.3129 1.8045 0.7995 2.7059 21.5817
## 6 0.3588 195.4148 0.5138 -0.1877 0.2794 2.2764 0.8777 1.3595 19.5163
## 7 0.3446 131.6489 0.5078 0.4653 0.3942 2.5023 0.9134 0.8758 18.4837
## 8 0.3255 91.6590 0.4864 0.9314 0.4962 2.9337 0.9368 0.5490 16.2876
## 9 0.3123 66.1798 0.4522 1.2920 0.5500 3.5970 0.9544 0.3268 13.6732
## 10 0.2984 48.1402 0.4182 1.6655 0.5094 4.3622 0.9495 0.3007 11.2941
## 11 0.2867 34.5485 0.3920 2.1457 0.2885 5.0816 0.9513 0.2484 9.7124
## 12 0.2808 20.9794 0.3354 2.6181 0.1535 7.2843 0.9843 0.0523 6.5752
## 13 0.2712 13.4053 0.3209 2.9959 0.5327 8.0266 0.9955 0.0131 5.7386
## 14 0.2631 9.3006 0.2955 3.3297 0.2725 9.5006 0.9919 0.0196 4.7974
## 15 0.2558 5.4440 0.2381 3.8199 0.5313 14.6238 1.0000 0.0000 2.9412
## Dunn Hubert SDindex Dindex SDbw
## 2 0.4347 4e-04 0.1792 15.0217 0.4231
## 3 0.3955 4e-04 0.1368 13.5193 0.2472
## 4 0.3797 3e-04 0.1662 10.6044 0.2385
## 5 0.4908 3e-04 0.1633 8.3185 0.1718
## 6 0.4908 4e-04 0.1815 7.1479 0.1606
## 7 0.5770 4e-04 0.1633 6.2288 0.1033
## 8 0.7193 4e-04 0.1691 5.4008 0.0804
## 9 0.6956 4e-04 0.1940 4.7231 0.0682
## 10 0.6956 4e-04 0.2079 4.0677 0.0535
## 11 0.6546 4e-04 0.2054 3.4351 0.0376
## 12 0.7629 4e-04 0.2208 2.8567 0.0319
## 13 0.9264 4e-04 0.2216 2.1651 0.0179
## 14 0.9156 5e-04 0.2630 1.6436 0.0109
## 15 1.0795 5e-04 0.2989 1.2287 0.0089
## KL CH Hartigan CCC Scott Marriot TrCovW
## Number_clusters 5.0000 15.0000 3.0000 2.0000 13.0000 4.000000e+00 3.0
## Value_Index 2.5612 15.9554 7.0103 12.5234 495.7287 9.842859e+14 330268.9
## TraceW Friedman Rubin Cindex DB Silhouette Duda
## Number_clusters 4.0000 1.500000e+01 5.0000 3.000 15.0000 15.000 2.0000
## Value_Index 697.8725 8.763839e+15 -19.1244 0.486 0.2925 0.763 2.0302
## PseudoT2 Beale Ratkowsky Ball PtBiserial Gap Frey
## Number_clusters 2.0000 2.0000 4.0000 3.000 4.0000 2.0000 3.0000
## Value_Index -7.6115 -1.8302 0.3713 1056.813 0.5393 -0.9273 1.4851
## McClain Gamma Gplus Tau Dunn Hubert SDindex Dindex
## Number_clusters 2.0000 15 15 4.0000 15.0000 0 3.0000 0
## Value_Index 0.0737 1 0 24.3137 1.0795 0 0.1368 0
## SDbw
## Number_clusters 15.0000
## Value_Index 0.0089
## Espana Andalucia Aragon Asturias Baleares
## 1 2 3 4 5
## Canarias Cantabria Castilla y Leon Cast.-La Mancha Cataluna
## 6 7 4 8 9
## Com. Valenciana Extremadura Galicia Madrid Murcia
## 5 10 11 12 13
## Navarra Pais Vasco La Rioja
## 14 15 3
Farthest-Neighbor Method (Complete)
datos.NbClust <- datos_numeros
library(NbClust)
res <- NbClust(datos.NbClust,
               distance = "euclidean",
               min.nc = 2,
               max.nc = 15,
               method = "complete",
               index = "alllong")
## *** : The Hubert index is a graphical method of determining the number of clusters.
## In the plot of Hubert index, we seek a significant knee that corresponds to a
## significant increase of the value of the measure i.e the significant peak in Hubert
## index second differences plot.
##
## *** : The D index is a graphical method of determining the number of clusters.
## In the plot of D index, we seek a significant knee (the significant peak in Dindex
## second differences plot) that corresponds to a significant increase of the value of
## the measure.
##
## *******************************************************************
## * Among all indices:
## * 6 proposed 2 as the best number of clusters
## * 9 proposed 3 as the best number of clusters
## * 1 proposed 4 as the best number of clusters
## * 1 proposed 5 as the best number of clusters
## * 1 proposed 7 as the best number of clusters
## * 1 proposed 13 as the best number of clusters
## * 8 proposed 15 as the best number of clusters
##
## ***** Conclusion *****
##
## * According to the majority rule, the best number of clusters is 3
##
##
## *******************************************************************
## KL CH Hartigan CCC Scott Marriot TrCovW TraceW
## 2 1.3012 10.9000 8.6508 13.6679 182.5944 1.494675e+15 611775.8159 3665.1000
## 3 2.9820 11.9269 3.6759 8.7521 207.2198 8.562089e+14 195289.0313 2378.8967
## 4 1.0079 10.3835 3.3382 8.3053 226.6385 5.175283e+14 162881.5552 1910.6633
## 5 0.8999 9.7306 3.5705 7.8188 250.1128 2.194711e+14 94432.8631 1542.7923
## 6 1.0051 9.8184 3.7609 7.5869 269.1563 1.097157e+14 73413.6271 1210.3640
## 7 1.4228 10.4254 2.9240 7.5500 325.6524 6.472241e+12 37665.4289 921.5423
## 8 1.3455 10.6628 2.3317 7.3127 350.9599 2.072206e+12 26403.8461 728.0190
## 9 1.0311 10.6172 2.3692 6.8950 375.7021 6.633938e+11 18843.2293 590.3650
## 10 1.1463 10.8313 2.2159 6.5216 397.8561 2.392023e+11 10044.6904 467.3383
## 11 0.8122 11.0861 3.1758 6.0791 439.9275 2.795598e+10 5744.7480 365.9700
## 12 1.4257 12.8052 2.6678 6.1278 479.4351 3.705384e+09 3890.4809 251.7533
## 13 1.5606 14.3162 1.9308 5.8770 975.1638 4.800000e-03 1843.1547 174.2683
## 14 1.0507 14.7733 2.1582 5.0097 NaN 0.000000e+00 1102.8885 125.7200
## 15 1.4037 15.9554 1.7852 4.0365 NaN 0.000000e+00 521.6202 81.6600
## Friedman Rubin Cindex DB Silhouette Duda Pseudot2 Beale
## 2 7.128190e+03 129.2077 0.3657 1.0902 0.3166 0.5526 8.0957 2.8315
## 3 8.070602e+03 199.0668 0.3854 1.0544 0.3148 4.2423 -3.0571 -2.3523
## 4 8.208753e+03 247.8507 0.6030 0.8711 0.3405 8.6970 -3.5401 -2.7240
## 5 9.227215e+03 306.9495 0.5783 0.7573 0.3542 0.5974 2.6958 2.0743
## 6 9.902408e+03 391.2536 0.5893 0.8123 0.3394 16.1629 -0.9381 -1.8046
## 7 6.212024e+04 513.8769 0.5738 0.6398 0.4196 0.4989 3.0129 2.8979
## 8 6.412791e+04 650.4766 0.6827 0.6672 0.3955 18.0014 -2.8333 -2.7252
## 9 9.316051e+04 802.1466 0.7310 0.6410 0.4244 28.6303 -0.9651 -1.8565
## 10 1.043022e+05 1013.3115 0.7002 0.5778 0.4763 45.5972 -0.9781 -1.8815
## 11 2.475876e+05 1293.9839 0.6812 0.5080 0.5173 24.9376 -1.9198 -2.4620
## 12 2.914964e+05 1881.0448 0.6942 0.4428 0.5719 132.3449 0.0000 0.0000
## 13 4.297786e+15 2717.4145 0.7959 0.3900 0.6535 60.4381 -0.9835 -1.8918
## 14 -2.935833e+15 3766.7778 0.8198 0.3630 0.6808 189.1116 0.0000 0.0000
## 15 7.239803e+15 5799.1587 0.9363 0.2925 0.7630 380.6423 0.0000 0.0000
## Ratkowsky Ball Ptbiserial Gap Frey McClain Gamma Gplus Tau
## 2 0.3962 1832.5500 0.4523 -0.2396 0.4522 0.6085 0.5288 8.9804 20.1569
## 3 0.4334 792.9656 0.4997 -0.7762 0.1456 1.4310 0.6790 5.0980 21.5686
## 4 0.3964 477.6658 0.5184 -1.5993 0.2665 1.5916 0.7442 3.7778 21.9869
## 5 0.3661 308.5585 0.5229 -0.3333 0.3838 1.8045 0.7995 2.7059 21.5817
## 6 0.3448 201.7273 0.5022 -0.1940 0.1217 2.3924 0.8698 1.4052 18.7712
## 7 0.3446 131.6489 0.5078 0.4653 0.8384 2.5023 0.9134 0.8758 18.4837
## 8 0.3306 91.0024 0.4532 0.6569 0.4407 3.3774 0.9193 0.6405 14.6013
## 9 0.3171 65.5961 0.4176 1.0810 0.3187 4.2355 0.9424 0.3660 11.9869
## 10 0.3032 46.7338 0.3987 1.5282 0.4692 4.8067 0.9527 0.2614 10.5359
## 11 0.2913 33.2700 0.3716 2.0658 0.1959 5.6715 0.9580 0.1961 8.9542
## 12 0.2808 20.9794 0.3354 2.6181 0.1535 7.2843 0.9843 0.0523 6.5752
## 13 0.2712 13.4053 0.3209 2.9959 0.4455 8.0266 0.9955 0.0131 5.7386
## 14 0.2630 8.9800 0.2679 3.3648 0.2107 11.5905 0.9966 0.0065 3.8824
## 15 0.2558 5.4440 0.2381 3.8199 0.5313 14.6238 1.0000 0.0000 2.9412
## Dunn Hubert SDindex Dindex SDbw
## 2 0.2825 2e-04 0.2613 13.2067 1.1117
## 3 0.3808 2e-04 0.2131 10.7759 0.7485
## 4 0.4802 3e-04 0.1882 9.4769 0.2645
## 5 0.4908 3e-04 0.1633 8.3185 0.1718
## 6 0.5598 4e-04 0.2008 7.3436 0.1611
## 7 0.5770 4e-04 0.1633 6.2288 0.1033
## 8 0.5566 4e-04 0.2023 5.5272 0.0876
## 9 0.6357 4e-04 0.2016 4.8495 0.0746
## 10 0.6418 4e-04 0.2205 4.1448 0.0583
## 11 0.6506 4e-04 0.2186 3.5121 0.0420
## 12 0.7629 4e-04 0.2208 2.8567 0.0319
## 13 0.9264 4e-04 0.2216 2.1651 0.0179
## 14 0.8976 4e-04 0.3025 1.7502 0.0151
## 15 1.0795 5e-04 0.2989 1.2287 0.0089
## KL CH Hartigan CCC Scott Marriot TrCovW
## Number_clusters 3.000 15.0000 3.0000 2.0000 13.0000 3.000000e+00 3.0
## Value_Index 2.982 15.9554 4.9748 13.6679 495.7287 2.997856e+14 416486.8
## TraceW Friedman Rubin Cindex DB Silhouette Duda
## Number_clusters 3.00 1.500000e+01 3.0000 2.0000 15.0000 15.000 2.0000
## Value_Index 817.97 1.017564e+16 -21.0751 0.3657 0.2925 0.763 0.5526
## PseudoT2 Beale Ratkowsky Ball PtBiserial Gap Frey
## Number_clusters 2.0000 3.0000 3.0000 3.000 5.0000 2.0000 1
## Value_Index 8.0957 -2.3523 0.4334 1039.584 0.5229 -0.2396 NA
## McClain Gamma Gplus Tau Dunn Hubert SDindex Dindex
## Number_clusters 2.0000 15 15 4.0000 15.0000 0 7.0000 0
## Value_Index 0.6085 1 0 21.9869 1.0795 0 0.1633 0
## SDbw
## Number_clusters 15.0000
## Value_Index 0.0089
## Espana Andalucia Aragon Asturias Baleares
## 1 1 2 3 1
## Canarias Cantabria Castilla y Leon Cast.-La Mancha Cataluna
## 1 3 3 3 2
## Com. Valenciana Extremadura Galicia Madrid Murcia
## 1 3 3 2 1
## Navarra Pais Vasco La Rioja
## 2 2 2
Ward method
datos.NbClust <- datos_numeros
library(NbClust)
res <- NbClust(datos.NbClust,
distance = "euclidean",
min.nc = 2,
max.nc = 15,
method = "ward.D2",
index = "alllong")
## *** : The Hubert index is a graphical method of determining the number of clusters.
## In the plot of Hubert index, we seek a significant knee that corresponds to a
## significant increase of the value of the measure i.e the significant peak in Hubert
## index second differences plot.
##
## *** : The D index is a graphical method of determining the number of clusters.
## In the plot of D index, we seek a significant knee (the significant peak in Dindex
## second differences plot) that corresponds to a significant increase of the value of
## the measure.
##
## *******************************************************************
## * Among all indices:
## * 6 proposed 2 as the best number of clusters
## * 9 proposed 3 as the best number of clusters
## * 2 proposed 4 as the best number of clusters
## * 1 proposed 7 as the best number of clusters
## * 1 proposed 13 as the best number of clusters
## * 8 proposed 15 as the best number of clusters
##
## ***** Conclusion *****
##
## * According to the majority rule, the best number of clusters is 3
##
##
## *******************************************************************
## KL CH Hartigan CCC Scott Marriot TrCovW TraceW
## 2 1.3012 10.9000 8.6508 13.6679 182.5944 1.494675e+15 611775.8159 3665.1000
## 3 2.9820 11.9269 3.6759 8.7521 207.2198 8.562089e+14 195289.0313 2378.8967
## 4 0.9974 10.3835 3.3656 8.3053 226.6385 5.175283e+14 162881.5552 1910.6633
## 5 0.7926 9.7511 4.0788 7.8271 248.5619 2.392200e+14 115553.4337 1540.3600
## 6 1.3691 10.2131 3.2677 7.7541 278.2104 6.634613e+13 59705.3285 1172.4890
## 7 1.1985 10.4254 2.9240 7.5500 325.6524 6.472241e+12 37665.4289 921.5423
## 8 1.3455 10.6628 2.3317 7.3127 350.9599 2.072206e+12 26403.8461 728.0190
## 9 1.0311 10.6172 2.3692 6.8950 375.7021 6.633938e+11 18843.2293 590.3650
## 10 0.9984 10.8313 2.5876 6.5216 397.8561 2.392023e+11 10044.6904 467.3383
## 11 1.0601 11.5149 2.8185 6.2651 427.0366 5.721375e+10 7213.7754 353.1217
## 12 1.2542 12.8052 2.6678 6.1278 479.4351 3.705384e+09 3890.4809 251.7533
## 13 1.5606 14.3162 1.9308 5.8770 975.1638 4.800000e-03 1843.1547 174.2683
## 14 1.0507 14.7733 2.1582 5.0097 NaN 0.000000e+00 1102.8885 125.7200
## 15 1.4037 15.9554 1.7852 4.0365 NaN 0.000000e+00 521.6202 81.6600
## Friedman Rubin Cindex DB Silhouette Duda Pseudot2 Beale
## 2 7.128190e+03 129.2077 0.3657 1.0902 0.3166 0.5526 8.0957 2.8315
## 3 8.070602e+03 199.0668 0.3854 1.0544 0.3148 4.2423 -3.0571 -2.3523
## 4 8.208753e+03 247.8507 0.6030 0.8711 0.3405 0.5515 3.2527 2.5028
## 5 1.184051e+04 307.4342 0.5589 0.8463 0.3420 8.6970 -3.5401 -2.7240
## 6 1.295054e+04 403.8923 0.5135 0.7595 0.3774 15.7979 -1.8734 -2.4025
## 7 6.212024e+04 513.8769 0.5738 0.6398 0.4196 0.4989 3.0129 2.8979
## 8 6.412791e+04 650.4766 0.6827 0.6672 0.3955 18.0014 -2.8333 -2.7252
## 9 9.316051e+04 802.1466 0.7310 0.6410 0.4244 28.6303 -0.9651 -1.8565
## 10 1.043022e+05 1013.3115 0.7002 0.5778 0.4763 24.9376 -1.9198 -2.4620
## 11 1.158825e+05 1341.0655 0.6357 0.5084 0.5191 45.5972 -0.9781 -1.8815
## 12 2.914964e+05 1881.0448 0.6942 0.4428 0.5719 132.3449 0.0000 0.0000
## 13 4.297786e+15 2717.4145 0.7959 0.3900 0.6535 60.4381 -0.9835 -1.8918
## 14 -2.935833e+15 3766.7778 0.8198 0.3630 0.6808 189.1116 0.0000 0.0000
## 15 7.239803e+15 5799.1587 0.9363 0.2925 0.7630 380.6423 0.0000 0.0000
## Ratkowsky Ball Ptbiserial Gap Frey McClain Gamma Gplus Tau
## 2 0.3962 1832.5500 0.4523 -0.6455 0.4522 0.6085 0.5288 8.9804 20.1569
## 3 0.4334 792.9656 0.4997 -1.1194 0.1456 1.4310 0.6790 5.0980 21.5686
## 4 0.3964 477.6658 0.5184 -1.4106 0.4834 1.5916 0.7442 3.7778 21.9869
## 5 0.3823 308.0720 0.5017 -1.6984 0.1500 2.0221 0.7872 2.6928 19.9216
## 6 0.3588 195.4148 0.5138 -0.1877 0.2794 2.2764 0.8777 1.3595 19.5163
## 7 0.3446 131.6489 0.5078 0.4653 0.8384 2.5023 0.9134 0.8758 18.4837
## 8 0.3306 91.0024 0.4532 0.9386 0.4407 3.3774 0.9193 0.6405 14.6013
## 9 0.3171 65.5961 0.4176 1.1003 0.3187 4.2355 0.9424 0.3660 11.9869
## 10 0.3032 46.7338 0.3987 1.5413 0.2980 4.8067 0.9527 0.2614 10.5359
## 11 0.2911 32.1020 0.3631 2.1089 0.2182 6.0607 0.9630 0.1569 8.1569
## 12 0.2808 20.9794 0.3354 2.5204 0.1535 7.2843 0.9843 0.0523 6.5752
## 13 0.2712 13.4053 0.3209 2.9959 0.4455 8.0266 0.9955 0.0131 5.7386
## 14 0.2630 8.9800 0.2679 3.3648 0.2107 11.5905 0.9966 0.0065 3.8824
## 15 0.2558 5.4440 0.2381 3.8199 0.5313 14.6238 1.0000 0.0000 2.9412
## Dunn Hubert SDindex Dindex SDbw
## 2 0.2825 2e-04 0.2613 13.2067 1.1117
## 3 0.3808 2e-04 0.2131 10.7759 0.7485
## 4 0.4802 3e-04 0.1882 9.4769 0.2645
## 5 0.4802 3e-04 0.2116 8.3063 0.2324
## 6 0.4908 4e-04 0.1815 7.1479 0.1606
## 7 0.5770 4e-04 0.1633 6.2288 0.1033
## 8 0.5566 4e-04 0.2023 5.5272 0.0876
## 9 0.6357 4e-04 0.2016 4.8495 0.0746
## 10 0.6418 4e-04 0.2205 4.1448 0.0583
## 11 0.6418 4e-04 0.2227 3.4894 0.0458
## 12 0.7629 4e-04 0.2208 2.8567 0.0319
## 13 0.9264 4e-04 0.2216 2.1651 0.0179
## 14 0.8976 4e-04 0.3025 1.7502 0.0151
## 15 1.0795 5e-04 0.2989 1.2287 0.0089
## KL CH Hartigan CCC Scott Marriot TrCovW
## Number_clusters 3.000 15.0000 3.0000 2.0000 13.0000 3.000000e+00 3.0
## Value_Index 2.982 15.9554 4.9748 13.6679 495.7287 2.997856e+14 416486.8
## TraceW Friedman Rubin Cindex DB Silhouette Duda
## Number_clusters 3.00 1.500000e+01 3.0000 2.0000 15.0000 15.000 2.0000
## Value_Index 817.97 1.017564e+16 -21.0751 0.3657 0.2925 0.763 0.5526
## PseudoT2 Beale Ratkowsky Ball PtBiserial Gap Frey
## Number_clusters 2.0000 3.0000 3.0000 3.000 4.0000 2.0000 1
## Value_Index 8.0957 -2.3523 0.4334 1039.584 0.5184 -0.6455 NA
## McClain Gamma Gplus Tau Dunn Hubert SDindex Dindex
## Number_clusters 2.0000 15 15 4.0000 15.0000 0 7.0000 0
## Value_Index 0.6085 1 0 21.9869 1.0795 0 0.1633 0
## SDbw
## Number_clusters 15.0000
## Value_Index 0.0089
## Espana Andalucia Aragon Asturias Baleares
## 1 1 2 3 1
## Canarias Cantabria Castilla y Leon Cast.-La Mancha Cataluna
## 1 3 3 3 2
## Com. Valenciana Extremadura Galicia Madrid Murcia
## 1 3 3 2 1
## Navarra Pais Vasco La Rioja
## 2 2 2
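The majority rule reported above can be checked by tallying the `Number_clusters` row of the Ward summary table. This is a minimal sketch that assumes the 30 recommendations are typed in by hand from that table. In a live session they would come from `res$Best.nc["Number_clusters", ]`. The values 0 (the graphical Hubert and D indices) and 1 (Frey) fall outside `min.nc:max.nc`, so they are dropped before counting votes.

```r
# Recommended number of clusters per index, copied from the
# "Number_clusters" row of the Ward (ward.D2) NbClust summary above.
best_nc <- c(KL = 3, CH = 15, Hartigan = 3, CCC = 2, Scott = 13,
             Marriot = 3, TrCovW = 3, TraceW = 3, Friedman = 15,
             Rubin = 3, Cindex = 2, DB = 15, Silhouette = 15, Duda = 2,
             PseudoT2 = 2, Beale = 3, Ratkowsky = 3, Ball = 3,
             PtBiserial = 4, Gap = 2, Frey = 1, McClain = 2, Gamma = 15,
             Gplus = 15, Tau = 4, Dunn = 15, Hubert = 0, SDindex = 7,
             Dindex = 0, SDbw = 15)

# Keep only proposals inside the searched range (min.nc = 2, max.nc = 15);
# this drops the graphical indices (coded 0) and Frey (coded 1).
valid <- best_nc[best_nc >= 2 & best_nc <= 15]

# Count how many indices support each k and pick the most voted one
votes <- table(valid)
print(votes)
k_majority <- as.integer(names(votes)[which.max(votes)])
k_majority
```

The tally reproduces the console summary: 6 votes for 2, 9 for 3, 2 for 4, 1 each for 7 and 13, and 8 for 15. The analysis below nevertheless cuts the tree at two groups.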
Calculation of centroids
grupo.ward <- cutree(hclust.ward.D2, k = 2, h = NULL)
datos.grupo <- cbind(datos,grupo.ward)
datos.grupo$CC.AA. <- NULL
round(aggregate(datos.grupo,list(grupo.ward),mean),2)
## Group.1 Automovil TV_color Video Microondas Lavavajillas Telefono grupo.ward
## 1 1 66.87 96.82 57.68 25.42 11.81 80.71 1
## 2 2 70.70 98.53 63.47 44.70 22.33 90.23 2
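The steps above chain three operations: Ward clustering, cutting the tree into k groups, and averaging each variable within each group. This is a self-contained sketch of that pipeline. It uses only the first five rows of table 3.22, held in a hypothetical `datos_demo` data frame. The resulting partition and means are therefore illustrative and need not match the full-sample centroids. The sketch also checks that `aggregate(..., mean)` gives the same centroids as `colMeans` applied group by group.

```r
# First five rows of table 3.22 (household equipment, %), for illustration only
datos_demo <- data.frame(
  Automovil    = c(69.0, 66.7, 67.2, 63.7, 71.9),
  TV_color     = c(97.6, 98.0, 97.5, 95.2, 98.8),
  Video        = c(62.4, 82.7, 56.8, 52.1, 62.4),
  Microondas   = c(32.3, 24.1, 43.4, 24.4, 29.8),
  Lavavajillas = c(17.0, 12.7, 20.6, 13.3, 10.1),
  Telefono     = c(85.2, 74.7, 88.4, 88.1, 87.9),
  row.names = c("Espana", "Andalucia", "Aragon", "Asturias", "Baleares")
)

# Ward hierarchical clustering on Euclidean distances, cut into k = 2 groups
hc <- hclust(dist(datos_demo), method = "ward.D2")
grupo <- cutree(hc, k = 2)

# Group centroids: mean of every variable within each group
centroides <- aggregate(datos_demo, by = list(grupo = grupo), FUN = mean)
print(round(centroides, 2))

# Same centroids computed directly: split rows by group, then colMeans
centroides_manual <- t(sapply(split(datos_demo, grupo), colMeans))
print(round(centroides_manual, 2))
```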
Non-hierarchical methods
A non-hierarchical cluster analysis with the k-means method is used to refine the hierarchical solution in terms of within-segment homogeneity and between-segment heterogeneity. The two-group Ward centroids computed above serve as the initial centres.
# Initial centres: the two-group Ward centroids computed above
c1 <- c(66.87,96.82,57.68,25.42,11.81,80.71)
c2 <- c(70.70,98.53,63.47,44.70,22.33,90.23)
solucion <- kmeans(datos_numeros,rbind(c1,c2))
solucion
## K-means clustering with 2 clusters of sizes 12, 6
##
## Cluster means:
## Automovil TV_color Video Microondas Lavavajillas Telefono
## 1 66.86667 96.81667 57.67500 25.425 11.80833 80.70833
## 2 70.70000 98.53333 63.46667 44.700 22.33333 90.23333
##
## Clustering vector:
## Espana Andalucia Aragon Asturias Baleares
## 1 1 2 1 1
## Canarias Cantabria Castilla y Leon Cast.-La Mancha Cataluna
## 1 1 1 1 2
## Com. Valenciana Extremadura Galicia Madrid Murcia
## 1 1 1 2 1
## Navarra Pais Vasco La Rioja
## 2 2 2
##
## Within cluster sum of squares by cluster:
## [1] 2810.6467 854.4533
## (between_SS / total_SS = 40.5 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss" "tot.withinss"
## [6] "betweenss" "size" "iter" "ifault"
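The ratio `between_SS / total_SS` printed above is the share of total variability that the two-group partition explains. In this run the within sum is 2810.6467 + 854.4533 = 3665.1, which coincides with the `TraceW` value for two clusters in the NbClust table. This is a minimal sketch of the decomposition on the first five rows of table 3.22 only. It seeds `kmeans()` with the centroids of a starting partition, as the analysis does with the Ward centroids. The starting partition `inicio` is a hypothetical choice made for illustration. The sketch then recomputes `totss`, `tot.withinss` and `betweenss` by hand.

```r
# First five rows of table 3.22 only (illustration, not the full sample)
X <- matrix(
  c(69.0, 97.6, 62.4, 32.3, 17.0, 85.2,
    66.7, 98.0, 82.7, 24.1, 12.7, 74.7,
    67.2, 97.5, 56.8, 43.4, 20.6, 88.4,
    63.7, 95.2, 52.1, 24.4, 13.3, 88.1,
    71.9, 98.8, 62.4, 29.8, 10.1, 87.9),
  ncol = 6, byrow = TRUE,
  dimnames = list(c("Espana", "Andalucia", "Aragon", "Asturias", "Baleares"),
                  c("Automovil", "TV_color", "Video", "Microondas",
                    "Lavavajillas", "Telefono")))

# Hypothetical starting partition: Aragon alone in group 2
inicio <- c(1, 1, 2, 1, 1)
centros0 <- rbind(colMeans(X[inicio == 1, , drop = FALSE]),
                  colMeans(X[inicio == 2, , drop = FALSE]))

# k-means seeded with the centroids of that partition
fit <- kmeans(X, centers = centros0)

# Total SS: squared deviations from the overall mean of each variable
total_ss <- sum(sweep(X, 2, colMeans(X))^2)
# Within SS: squared deviations from each observation's own cluster centre
within_ss <- sum((X - fit$centers[fit$cluster, ])^2)
# Between SS: the part of the total explained by the partition
between_ss <- total_ss - within_ss

round(100 * between_ss / total_ss, 1)  # percentage shown by print(fit)
```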
# Dendrograms (Ward, average and complete linkage) with the two-group cut marked in red
plot(hclust.ward.D2,labels=datos$CC.AA.)
grupos1 <- cutree(hclust.ward.D2, k = 2)
rect.hclust(hclust.ward.D2, k = 2, border = "red")
plot(hclust.average,labels=datos$CC.AA.)
grupos1 <- cutree(hclust.average, k = 2)
rect.hclust(hclust.average, k = 2, border = "red")
plot(hclust.complete,labels=datos$CC.AA.)
grupos1 <- cutree(hclust.complete, k = 2)
rect.hclust(hclust.complete, k = 2, border = "red")
T-tests for each variable
solucion.cluster <- solucion$cluster
t1<-t.test(Automovil~solucion.cluster, data=datos_numeros)
t2<-t.test(TV_color~solucion.cluster, data=datos_numeros)
t3<-t.test(Video~solucion.cluster, data=datos_numeros)
t4<-t.test(Microondas~solucion.cluster, data=datos_numeros)
t5<-t.test(Lavavajillas~solucion.cluster, data=datos_numeros)
t6<-t.test(Telefono~solucion.cluster, data=datos_numeros)
variables <- c("Automovil", "TV_color", "Video",
"Microondas", "Lavavajillas", "Telefono")
tabla <- data.frame(
Variable = variables,
Grupo1 = NA,
Grupo2 = NA,
t = NA
)
for (i in seq_along(variables)) {
var <- variables[i]
# mean of each group
m1 <- mean(datos_numeros[solucion.cluster == 1, var], na.rm = TRUE)
m2 <- mean(datos_numeros[solucion.cluster == 2, var], na.rm = TRUE)
# Welch two-sample t-test (t.test default, var.equal = FALSE)
tt <- t.test(datos_numeros[, var] ~ solucion.cluster)
# store the values in the table
tabla$Grupo1[i] <- round(m1, 2)
tabla$Grupo2[i] <- round(m2, 2)
tabla$t[i] <- round(tt$statistic, 2)
}
tabla
## Variable Grupo1 Grupo2 t
## 1 Automovil 66.87 70.70 -1.81
## 2 TV_color 96.82 98.53 -2.52
## 3 Video 57.68 63.47 -1.19
## 4 Microondas 25.42 44.70 -6.73
## 5 Lavavajillas 11.81 22.33 -4.48
## 6 Telefono 80.71 90.23 -3.51
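With a formula, `t.test()` runs Welch's two-sample test by default (`var.equal = FALSE`). Each t in the table is the group 1 mean minus the group 2 mean, divided by sqrt(s1²/n1 + s2²/n2). The negative signs therefore only indicate that group 2 has the higher mean for every variable. This is a minimal sketch with hypothetical values, not the survey data. It recomputes the statistic by hand and compares it with `t.test()`.

```r
# Hypothetical percentages for one variable and a two-group labelling
x <- c(24.1, 24.4, 29.8, 32.3, 26.0, 43.4, 45.1, 47.0)
g <- c(1, 1, 1, 1, 1, 2, 2, 2)

m <- tapply(x, g, mean)    # group means
v <- tapply(x, g, var)     # group sample variances
n <- tapply(x, g, length)  # group sizes

# Welch statistic: difference of means over the unpooled standard error
t_manual <- unname((m[1] - m[2]) / sqrt(v[1] / n[1] + v[2] / n[2]))

# Same statistic from t.test (group "1" is the first level)
t_r <- unname(t.test(x ~ g)$statistic)
c(manual = t_manual, t.test = t_r)
```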
library(cluster)
library(factoextra)
library(ggplot2) # provides theme_minimal()
# 'solucion' must be the result of kmeans()
# (or of whichever clustering algorithm was used).
fviz_cluster(
# 1. Clustering object and original data
solucion,
data = datos_numeros,
# 2. Layout and appearance
geom = c("point", "text"), # show points and labels
ellipse.type = "convex", # draw the convex hull of each cluster
repel = TRUE, # keep labels from overlapping
# 3. Colours and titles
palette = "Set2", # other palettes can be used, e.g. "Dark2" or "jco"
ggtheme = theme_minimal(), # clean ggplot2 theme
main = "K-means clusters",
xlab = "Principal Component 1", # axis labels for the first two principal components
ylab = "Principal Component 2"
)
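With more than two variables, `fviz_cluster()` places each observation on the first two principal components. By default (`stand = TRUE`) it standardizes the data first. This is a minimal base-R sketch of that projection, using `prcomp()` on standardized data. It assumes the first five rows of table 3.22 and a hypothetical cluster labelling `cluster`. It reproduces the coordinates only up to the sign of each axis and does not attempt factoextra's styling.

```r
# First five rows of table 3.22 only (illustration, not the full sample)
X <- matrix(
  c(69.0, 97.6, 62.4, 32.3, 17.0, 85.2,
    66.7, 98.0, 82.7, 24.1, 12.7, 74.7,
    67.2, 97.5, 56.8, 43.4, 20.6, 88.4,
    63.7, 95.2, 52.1, 24.4, 13.3, 88.1,
    71.9, 98.8, 62.4, 29.8, 10.1, 87.9),
  ncol = 6, byrow = TRUE,
  dimnames = list(c("Espana", "Andalucia", "Aragon", "Asturias", "Baleares"),
                  c("Automovil", "TV_color", "Video", "Microondas",
                    "Lavavajillas", "Telefono")))
cluster <- c(1, 1, 2, 1, 1)  # hypothetical cluster labels

# PCA on standardized variables; keep the scores on the first two components
pca <- prcomp(X, center = TRUE, scale. = TRUE)
coords <- pca$x[, 1:2]
var_exp <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)  # % variance per PC

plot(coords, col = cluster, pch = 19,
     xlab = paste0("Dim1 (", var_exp[1], "%)"),
     ylab = paste0("Dim2 (", var_exp[2], "%)"),
     main = "K-means clusters (PCA projection)")
text(coords, labels = rownames(X), pos = 3, cex = 0.8)
```

Replacing `xlab`/`ylab` in `fviz_cluster()` with custom labels, as done above, drops these default percentages of explained variance from the axis titles.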