Análisis de Conglomerado

A32-Análisis de Conglomerados - Comunidades Autónomas

UNIVERSIDAD DE EL SALVADOR

FACULTAD DE CIENCIAS ECONÓMICAS

ESCUELA DE ECONOMÍA

APLICACIONES DE CONGLOMERADOS

CÁTEDRA:
MÉTODOS PARA EL ANÁLISIS ECONÓMICO

DOCENTE:
MSF. CARLOS ADEMIR PEREZ ALAS

GRUPO TEÓRICO:
01

PRESENTADO POR:	CARNET
BENJAMIN ISAAC CASTANEDA SOSA	CS223038

Ciudad Universitaria “Dr. Fabio Castillo Figueroa”
30 de noviembre de 2025, San Salvador, El Salvador

Ejemplo de aplicacion del analisis de conglomerados

El objetivo de este epígrafe es ofrecer una visión integrada de los pasos que requiere la aplicación de un análisis de conglomerados, desde el establecimiento de los objetivos hasta la validación de los resultados.

Un plan de incentivos para vendedores

El director de ventas de una cadena de tiendas de electrodomésticos con implantación nacional está estudiando el plan de incentivos de sus vendedores. Considera que los incentivos deben estar ajustados a las dificultades de las distintas zonas de ventas, siendo necesario fijar incentivos más altos en aquellas zonas geográficas en que las condiciones de vida de sus habitantes hacen más difícil las ventas. Por este motivo quiere determinar si las comunidades autónomas se pueden segmentar en grupos homogéneos respecto al equipamiento de los hogares.

Para ello dispone de los datos que aparecen en el cuadro 3.22 y el objetivo es establecer cuántos grupos de comunidades autónomas con niveles de equipamiento similar pueden establecerse y en qué radican las diferencias entre esos grupos. El procedimiento que aplicaremos es el descrito en el tema, a saber:

Equipamento de los hogares en distintas comunidades autonomas
	Porcentaje de hogares que poseen
CC.AA.	Automovil	TV_color	Video	Microondas	Lavavajillas	Telefono
Espana	69.0	97.6	62.4	32.3	17.0	85.2
Andalucia	66.7	98.0	82.7	24.1	12.7	74.7
Aragon	67.2	97.5	56.8	43.4	20.6	88.4
Asturias	63.7	95.2	52.1	24.4	13.3	88.1
Baleares	71.9	98.8	62.4	29.8	10.1	87.9
Canarias	72.7	96.8	68.4	27.9	5.8	75.4
Cantabria	63.4	94.9	48.9	36.5	11.2	80.5
Castilla y Leon	65.8	97.1	47.7	28.1	14.0	85.0
Cast.-La Mancha	61.5	97.3	53.6	21.7	7.1	72.9
Cataluna	70.4	98.1	71.1	36.8	19.8	92.2
Com. Valenciana	72.7	98.4	68.2	26.6	12.1	84.4
Extremadura	60.5	97.7	43.7	20.7	11.7	67.1
Galicia	65.5	91.3	42.7	13.5	14.6	85.9
Madrid	74.0	99.4	76.3	53.9	32.3	95.7
Murcia	69.0	98.7	59.3	19.5	12.1	81.4
Navarra	76.4	99.3	60.6	44.0	20.6	87.4
Pais Vasco	71.3	98.3	61.6	45.7	23.7	94.3
La Rioja	64.9	98.6	54.4	44.4	17.0	83.4

nombres_CC.AA. <-  datos$CC.AA.
datos_numeros <- datos[, -1 ]
rownames(datos_numeros)  <- nombres_CC.AA.
head(datos_numeros)

##           Automovil TV_color Video Microondas Lavavajillas Telefono
## Espana         69.0     97.6  62.4       32.3         17.0     85.2
## Andalucia      66.7     98.0  82.7       24.1         12.7     74.7
## Aragon         67.2     97.5  56.8       43.4         20.6     88.4
## Asturias       63.7     95.2  52.1       24.4         13.3     88.1
## Baleares       71.9     98.8  62.4       29.8         10.1     87.9
## Canarias       72.7     96.8  68.4       27.9          5.8     75.4

Calculo de Outliers

Análisis de la existencia de outliers en la medida en que pueden generar importantes distorsiones en la detección del número de grupos.

mean <- colMeans(datos_numeros)
Sx <- cov(datos_numeros)
D2 <- mahalanobis(datos_numeros, mean, Sx)


# p-values usando Chi-cuadrado con df = número de variables
p_values <- 1 - pchisq(D2, df = ncol(datos))

outliers <- data.frame(D2 = round(D2, 2), 
                       p_value = round(p_values, 2))
outliers

##                    D2 p_value
## Espana           0.20    1.00
## Andalucia       10.51    0.16
## Aragon           1.91    0.96
## Asturias         4.45    0.73
## Baleares         5.68    0.58
## Canarias         9.59    0.21
## Cantabria        7.18    0.41
## Castilla y Leon  2.20    0.95
## Cast.-La Mancha  3.51    0.83
## Cataluna         2.94    0.89
## Com. Valenciana  2.66    0.91
## Extremadura     10.51    0.16
## Galicia         13.13    0.07
## Madrid           8.35    0.30
## Murcia           4.86    0.68
## Navarra          7.66    0.36
## Pais Vasco       2.32    0.94
## La Rioja         4.33    0.74

Metodos Jerarquicos

Realización de un análisis de conglomerados jerárquicos, evaluando la solución de distintos métodos de conglomeración, aplicando los criterios presentados para identificar el número adecuado de grupos y obtención de los centroides que han de servir de partida para el paso siguiente.

Metodo del Centroide (Centroid)

# Normalizar todas las columnas numéricas (excluyendo la columna CCAA)
datos.norm <- scale(datos[,-1])

# Calculo de la distancia euclidea
matriz.dis.euclid.norm <- dist(datos.norm[,-1], method='euclidean', diag=TRUE)

# Calculo de la distancia euclidea al cuadrado (usando la matriz normalizada)
matriz.dis.euclid.norm2 <- (matriz.dis.euclid.norm)^2

# Efectuamos el cluster con metodo del centroi
hclust.centroid <- hclust(matriz.dis.euclid.norm2,method = "centroid")

# Trazamos el dendrograma usando los nombres de las Comunidades Autónomas como etiquetas
plot(hclust.centroid, labels=datos$CC.AA.)

# Saca el historial de aglomeración del objeto de clustering y lo convierte a data frame
data.frame(hclust.centroid[2:1])

##        height merge.1 merge.2
## 1   0.7213206      -5     -11
## 2   0.9981325      -3     -16
## 3   0.8440838     -18       2
## 4   1.3477855      -1       1
## 5   1.4105691      -4      -8
## 6   1.4669617     -15       4
## 7   1.7419417     -17       3
## 8   1.9805706      -9     -12
## 9   2.0811910      -7       5
## 10  2.1261648     -10       7
## 11  3.3880982      -2      -6
## 12  3.4251429       6       9
## 13  3.9690631      10      12
## 14  6.3547406      11      13
## 15  7.1515229       8      14
## 16 15.8956353     -13      15
## 17 19.0340756     -14      16

Metodo del Vecino mas Cercano (Single)

# Normalizar todas las columnas numéricas (excluyendo la columna CCAA)
datos.norm <- scale(datos[,-1])

# Calculo de la distancia euclidea
matriz.dis.euclid.norm <- dist(datos.norm[,-1], method='euclidean', diag=TRUE)

# Efectuamos el cluster con metodo single
hclust.single <- hclust(matriz.dis.euclid.norm,method = "single")

# Trazamos el dendrograma usando los nombres de las Comunidades Autónomas como etiquetas
plot(hclust.single, labels=datos$CC.AA.)

# Saca el historial de aglomeración del objeto de clustering y lo convierte a data frame
data.frame(hclust.single[2:1])

##       height merge.1 merge.2
## 1  0.8493059      -5     -11
## 2  0.9990658      -3     -16
## 3  1.0257279     -18       2
## 4  1.1196469     -17       3
## 5  1.1221288     -15       1
## 6  1.1496370      -1       5
## 7  1.1876738      -4      -8
## 8  1.3349580       4       6
## 9  1.3676542     -10       8
## 10 1.4073275      -9     -12
## 11 1.4955213       7       9
## 12 1.5368736      -6      10
## 13 1.5571732      -7      11
## 14 1.6477410      12      13
## 15 1.8406787      -2      14
## 16 2.1297820     -14      15
## 17 2.4227349     -13      16

Metedo de la Vinculacion Promedio (Average)

# Normalizar todas las columnas numéricas (excluyendo la columna CCAA)
datos.norm <- scale(datos[,-1])

# Calculo de la distancia euclidea
matriz.dis.euclid.norm <- dist(datos.norm[,-1], method='euclidean', diag=TRUE)

# Efectuamos el cluster con metodo average
hclust.average <- hclust(matriz.dis.euclid.norm,method = "average")

# Trazamos el dendrograma usando los nombres de las Comunidades Autónomas como etiquetas
plot(hclust.average, labels=datos$CC.AA.)

# Saca el historial de aglomeración del objeto de clustering y lo convierte a data frame
data.frame(hclust.average[2:1])

##       height merge.1 merge.2
## 1  0.8493059      -5     -11
## 2  0.9990658      -3     -16
## 3  1.0455731     -18       2
## 4  1.1876738      -4      -8
## 5  1.2319944     -15       1
## 6  1.3597377      -1       5
## 7  1.3676542     -10     -17
## 8  1.4073275      -9     -12
## 9  1.5553866       3       7
## 10 1.5600721      -7       4
## 11 1.8406787      -2      -6
## 12 2.1362102       6      10
## 13 2.4354069       9      12
## 14 2.7226739       8      11
## 15 3.0771236      13      14
## 16 4.3011415     -13      15
## 17 4.5882339     -14      16

Metodo del Vecino mas Lejano (Complete)

# Normalizar todas las columnas numéricas (excluyendo la columna CCAA)
datos.norm <- scale(datos[,-1])

# Calculo de la distancia euclidea
matriz.dis.euclid.norm <- dist(datos.norm[,-1], method='euclidean', diag=TRUE)

# Efectuamos el cluster con metodo complete
hclust.complete <- hclust(matriz.dis.euclid.norm,method = "complete")

# Trazamos el dendrograma usando los nombres de las Comunidades Autónomas como etiquetas
plot(hclust.complete, labels=datos$CC.AA.)

# Saca el historial de aglomeración del objeto de clustering y lo convierte a data frame
data.frame(hclust.complete[2:1])

##       height merge.1 merge.2
## 1  0.8493059      -5     -11
## 2  0.9990658      -3     -16
## 3  1.0654183     -18       2
## 4  1.1876738      -4      -8
## 5  1.3170292      -1       1
## 6  1.3676542     -10     -17
## 7  1.4073275      -9     -12
## 8  1.5629709      -7       4
## 9  1.6125469     -15       5
## 10 1.8406787      -2      -6
## 11 2.1030853       3       6
## 12 2.6649224       9      10
## 13 3.1882462       7       8
## 14 3.6529816     -14      11
## 15 3.7399077      12      13
## 16 5.3329276     -13      15
## 17 7.0481449      14      16

Metodo de Ward

# Normalizar todas las columnas numéricas (excluyendo la columna CCAA)
datos.norm <- scale(datos[,-1])

# Calculo de la distancia euclidea
matriz.dis.euclid.norm <- dist(datos.norm[,-1], method='euclidean', diag=TRUE)

# Calculo de la distancia euclidea al cuadrado (usando la matriz normalizada)
matriz.dis.euclid.norm2 <- (matriz.dis.euclid.norm)^2

# Efectuamos el cluster con metodo de Ward
hclust.ward.D2 <- hclust(matriz.dis.euclid.norm2,method = "ward.D2")

# Trazamos el dendrograma usando los nombres de las Comunidades Autónomas como etiquetas
plot(hclust.ward.D2, labels=datos$CC.AA.)

# Saca el historial de aglomeración del objeto de clustering y lo convierte a data frame
data.frame(hclust.ward.D2[2:1])

##        height merge.1 merge.2
## 1   0.7213206      -5     -11
## 2   0.9981325      -3     -16
## 3   1.1246648     -18       2
## 4   1.4105691      -4      -8
## 5   1.7311589      -1       1
## 6   1.8704781     -10     -17
## 7   1.9805706      -9     -12
## 8   2.2170506     -15       5
## 9   2.6897831      -7       4
## 10  3.3880982      -2      -6
## 11  3.9724878       3       6
## 12  7.7336488       8      10
## 13 10.3749303       7       9
## 14 11.2851019     -14      11
## 15 14.9208345      12      13
## 16 23.0495867     -13      15
## 17 43.2022438      14      16

Seleccion del Numero de Conglomerados de la Solucion

Metodo del Centroide (Centroid)

datos.NbClust <- datos_numeros


library(NbClust)

res <- NbClust(datos.NbClust, 
               distance = "euclidean", 
               min.nc = 2, 
               max.nc = 15,  
               method = "centroid", 
               index = "alllong")

## *** : The Hubert index is a graphical method of determining the number of clusters.
##                 In the plot of Hubert index, we seek a significant knee that corresponds to a 
##                 significant increase of the value of the measure i.e the significant peak in Hubert
##                 index second differences plot. 
##

## *** : The D index is a graphical method of determining the number of clusters. 
##                 In the plot of D index, we seek a significant knee (the significant peak in Dindex
##                 second differences plot) that corresponds to a significant increase of the value of
##                 the measure. 
##  
## ******************************************************************* 
## * Among all indices:                                                
## * 6 proposed 2 as the best number of clusters 
## * 3 proposed 3 as the best number of clusters 
## * 3 proposed 4 as the best number of clusters 
## * 5 proposed 5 as the best number of clusters 
## * 1 proposed 7 as the best number of clusters 
## * 2 proposed 13 as the best number of clusters 
## * 8 proposed 15 as the best number of clusters 
## 
##                    ***** Conclusion *****                            
##  
## * According to the majority rule, the best number of clusters is  15 
##  
##  
## *******************************************************************

res$All.index

##         KL      CH Hartigan     CCC    Scott      Marriot      TrCovW    TraceW
## 2   0.2235  4.2937   2.8811 12.5234 168.6204 3.248655e+15 825042.0502 4858.2106
## 3   0.5278  3.7256   2.5358  6.3228 192.1067 1.982538e+15 494773.1542 4116.8769
## 4   0.2463  3.4990   6.1711  5.0571 223.1054 6.297651e+14 382435.2595 3521.5400
## 5  11.6504  4.9435   0.4256  5.3892 248.4528 2.406744e+14 254825.3744 2444.1665
## 6   0.5876  3.8487   1.2685  4.0611 258.5897 1.973396e+14 243641.1592 2366.6815
## 7   0.1175  3.4446   9.1227  3.1329 271.8817 1.283528e+14 230558.4155 2140.4225
## 8   1.7975  6.0948   7.8420  4.8317 310.6980 1.940222e+13  81769.7539 1170.0562
## 9   5.7537  9.4458   1.6455  6.3465 364.3729 1.244849e+12  16130.2128  655.7873
## 10  0.6339  8.9904   2.6423  5.6311 379.4430 6.653166e+11  10243.4962  554.4190
## 11  1.1425  9.6496   2.6426  5.4025 418.5970 9.143792e+10   4649.9013  416.7650
## 12  0.8279 10.5637   4.4166  5.1720 499.1188 1.241413e+09   3423.1189  302.5483
## 13  2.9486 14.3162   1.6919  5.8770 975.1638 4.800000e-03   1843.1547  174.2683
## 14  0.8534 14.2535   2.3781  4.8274      NaN 0.000000e+00   1120.1709  130.2083
## 15  1.5532 15.9554   1.7852  4.0365      NaN 0.000000e+00    521.6202   81.6600
##         Friedman     Rubin Cindex     DB Silhouette     Duda Pseudot2   Beale
## 2   7.184240e+03   97.4761 0.5119 0.4550     0.3914   2.0302  -7.6115 -1.8302
## 3   8.341118e+03  115.0288 0.4860 0.5239     0.2786   2.2512  -7.7810 -1.9957
## 4   1.778822e+04  134.4751 0.4646 0.5493     0.1908   0.6941   5.7303  1.5747
## 5   2.091984e+04  193.7508 0.5680 0.5938     0.2755 136.2556   0.0000  0.0000
## 6   2.143490e+04  200.0942 0.5706 0.5162     0.3098   3.3323  -7.6990 -2.4684
## 7   2.214448e+04  221.2457 0.5648 0.5988     0.2388   0.5466   8.2933  2.9006
## 8   2.615942e+04  404.7321 0.5801 0.5989     0.3940   0.4710   6.7395  3.7042
## 9   8.972745e+04  722.1233 0.6721 0.5256     0.4966  36.4815  -0.9726 -1.8709
## 10  9.726547e+04  854.1542 0.6658 0.4729     0.5246  18.0014  -2.8333 -2.7252
## 11  1.823836e+05 1136.2742 0.6486 0.4498     0.5703  30.2342  -1.9338 -2.4800
## 12  5.460712e+05 1565.2352 0.5977 0.3905     0.6348   0.3520   3.6825  4.7226
## 13  4.297786e+15 2717.4145 0.7959 0.3900     0.6535 189.1116   0.0000  0.0000
## 14 -1.524036e+15 3636.9354 0.7737 0.3279     0.7357  60.4381  -0.9835 -1.8918
## 15  7.239803e+15 5799.1587 0.9363 0.2925     0.7630 380.6423   0.0000  0.0000
##    Ratkowsky      Ball Ptbiserial     Gap    Frey McClain  Gamma  Gplus     Tau
## 2     0.2969 2429.1053     0.4857 -0.9273  3.3318  0.0737 0.7119 2.1765 10.7582
## 3     0.2833 1372.2923     0.5325 -1.6679  2.3601  0.1706 0.6778 4.1699 17.5425
## 4     0.3103  880.3850     0.5446 -2.0221  0.8445  0.2894 0.6516 5.7386 21.4641
## 5     0.3428  488.8333     0.6033 -0.7081 -3.0954  0.5619 0.7017 5.6993 26.8105
## 6     0.3155  394.4469     0.5875 -0.8993  8.6015  0.5840 0.6793 6.1307 25.9739
## 7     0.2985  305.7746     0.5168 -0.4308  0.5130  0.8345 0.6047 7.4183 22.6928
## 8     0.3059  146.2570     0.5050  0.7060  0.2928  1.9039 0.7721 3.0131 20.4183
## 9     0.3087   72.8653     0.4735  1.4007  2.0346  3.1152 0.9285 0.5948 15.4510
## 10    0.2952   55.4419     0.4481  1.6991  0.3671  3.5305 0.9178 0.6209 13.8693
## 11    0.2863   37.8877     0.4132  2.0534  0.3510  4.4354 0.9462 0.3203 11.2549
## 12    0.2760   25.2124     0.3781  2.4310  0.2220  5.5272 0.9497 0.2353  8.8758
## 13    0.2712   13.4053     0.3209  2.9959  0.5327  8.0266 0.9955 0.0131  5.7386
## 14    0.2631    9.3006     0.2955  3.4159  0.2725  9.5006 0.9919 0.0196  4.7974
## 15    0.2558    5.4440     0.2381  3.8199  0.5313 14.6238 1.0000 0.0000  2.9412
##      Dunn Hubert SDindex  Dindex   SDbw
## 2  0.4347  4e-04  0.1792 15.0217 0.4231
## 3  0.3955  4e-04  0.1368 13.5193 0.2472
## 4  0.3454  4e-04  0.1293 12.1371 0.1643
## 5  0.4295  4e-04  0.1365 10.0901 0.1361
## 6  0.3851  4e-04  0.1983  9.3985 0.0882
## 7  0.3851  4e-04  0.1956  8.5891 0.0753
## 8  0.4868  4e-04  0.1968  6.2610 0.0668
## 9  0.7193  4e-04  0.1996  4.7092 0.0547
## 10 0.6166  4e-04  0.2013  4.0766 0.0371
## 11 0.6546  4e-04  0.2023  3.3988 0.0311
## 12 0.6546  4e-04  0.2060  2.7435 0.0219
## 13 0.9264  4e-04  0.2216  2.1651 0.0179
## 14 0.9156  5e-04  0.2630  1.6436 0.0109
## 15 1.0795  5e-04  0.2989  1.2287 0.0089

res$Best.nc

##                      KL      CH Hartigan     CCC   Scott      Marriot   TrCovW
## Number_clusters  5.0000 15.0000   7.0000  2.0000  13.000 4.000000e+00      3.0
## Value_Index     11.6504 15.9554   7.8542 12.5234 476.045 9.636823e+14 330268.9
##                   TraceW     Friedman     Rubin Cindex      DB Silhouette
## Number_clusters   5.0000 1.500000e+01   13.0000 4.0000 15.0000     15.000
## Value_Index     999.8885 8.763839e+15 -232.6584 0.4646  0.2925      0.763
##                   Duda PseudoT2   Beale Ratkowsky     Ball PtBiserial     Gap
## Number_clusters 2.0000   2.0000  2.0000    5.0000    3.000     5.0000  2.0000
## Value_Index     2.0302  -7.6115 -1.8302    0.3428 1056.813     0.6033 -0.9273
##                   Frey McClain Gamma Gplus     Tau    Dunn Hubert SDindex
## Number_clusters 3.0000  2.0000    15    15  5.0000 15.0000      0  4.0000
## Value_Index     2.3601  0.0737     1     0 26.8105  1.0795      0  0.1293
##                 Dindex    SDbw
## Number_clusters      0 15.0000
## Value_Index          0  0.0089

res$Best.pa

##          Espana       Andalucia          Aragon        Asturias        Baleares 
##               1               2               3               4               5 
##        Canarias       Cantabria Castilla y Leon Cast.-La Mancha        Cataluna 
##               6               7               4               8               9 
## Com. Valenciana     Extremadura         Galicia          Madrid          Murcia 
##               5              10              11              12              13 
##         Navarra      Pais Vasco        La Rioja 
##              14              15               3

Metodo del Vecino mas Cercano (Single)

datos.NbClust <- datos_numeros


library(NbClust)

res <- NbClust(datos.NbClust, 
               distance = "euclidean", 
               min.nc = 2, 
               max.nc = 15,  
               method = "single", 
               index = "alllong")

## *** : The Hubert index is a graphical method of determining the number of clusters.
##                 In the plot of Hubert index, we seek a significant knee that corresponds to a 
##                 significant increase of the value of the measure i.e the significant peak in Hubert
##                 index second differences plot. 
##

## *** : The D index is a graphical method of determining the number of clusters. 
##                 In the plot of D index, we seek a significant knee (the significant peak in Dindex
##                 second differences plot) that corresponds to a significant increase of the value of
##                 the measure. 
##  
## ******************************************************************* 
## * Among all indices:                                                
## * 5 proposed 2 as the best number of clusters 
## * 4 proposed 3 as the best number of clusters 
## * 3 proposed 4 as the best number of clusters 
## * 3 proposed 5 as the best number of clusters 
## * 1 proposed 7 as the best number of clusters 
## * 1 proposed 9 as the best number of clusters 
## * 1 proposed 10 as the best number of clusters 
## * 2 proposed 13 as the best number of clusters 
## * 8 proposed 15 as the best number of clusters 
## 
##                    ***** Conclusion *****                            
##  
## * According to the majority rule, the best number of clusters is  15 
##  
##  
## *******************************************************************

res$All.index

##         KL      CH Hartigan     CCC    Scott      Marriot      TrCovW    TraceW
## 2   0.2235  4.2937   2.8811 12.5234 168.6204 3.248655e+15 825042.0502 4858.2106
## 3   0.5278  3.7256   2.5358  6.3228 192.1067 1.982538e+15 494773.1542 4116.8769
## 4   0.2463  3.4990   6.1711  5.0571 223.1054 6.297651e+14 382435.2595 3521.5400
## 5   1.2467  4.9435   5.9429  5.3892 248.4528 2.406744e+14 254825.3744 2444.1665
## 6   3.5975  6.4166   2.0863  5.8712 285.6188 4.396140e+13  72706.0555 1677.3640
## 7  10.2938  6.0725   0.6307  5.2509 296.4509 3.278027e+13  52441.8541 1428.9347
## 8   0.2416  5.0850   0.9110  4.0780 311.6154 1.843807e+13  49105.2847 1351.4497
## 9   0.0899  4.4717  11.5314  3.0276 326.4007 1.026337e+13  37645.3697 1238.6107
## 10  5.1822  9.1992   2.7537  5.7401 388.1173 4.109033e+11  13203.1042  542.9483
## 11  1.3200  9.9788   2.3453  5.5655 414.0897 1.174560e+11   7142.5814  403.9167
## 12  0.7269 10.5637   4.4166  5.1720 499.1188 1.241413e+09   3423.1189  302.5483
## 13  2.9486 14.3162   1.6919  5.8770 975.1638 4.800000e-03   1843.1547  174.2683
## 14  0.8534 14.2535   2.3781  4.8274      NaN 0.000000e+00   1120.1709  130.2083
## 15  1.5532 15.9554   1.7852  4.0365      NaN 0.000000e+00    521.6202   81.6600
##         Friedman     Rubin Cindex     DB Silhouette     Duda Pseudot2   Beale
## 2   7.184240e+03   97.4761 0.5119 0.4550     0.3914   2.0302  -7.6115 -1.8302
## 3   8.341118e+03  115.0288 0.4860 0.5239     0.2786   2.2512  -7.7810 -1.9957
## 4   1.778822e+04  134.4751 0.4646 0.5493     0.1908   0.6941   5.7303  1.5747
## 5   2.091984e+04  193.7508 0.5680 0.5938     0.2755   0.6760   5.2722  1.6903
## 6   2.646221e+04  282.3235 0.5644 0.8000     0.3187   8.7527  -3.5430 -2.7262
## 7   2.745848e+04  331.4072 0.5500 0.7152     0.3153 132.3449   0.0000  0.0000
## 8   2.969617e+04  350.4084 0.5538 0.6520     0.3626  17.3267  -2.8269 -2.7190
## 9   3.576658e+04  382.3310 0.5637 0.6405     0.3959   0.3184  10.7057  6.8647
## 10  1.012341e+05  872.1996 0.6368 0.5102     0.5305  26.7933  -1.9254 -2.4692
## 11  1.314616e+05 1172.4183 0.6271 0.4477     0.6012  45.5972  -0.9781 -1.8815
## 12  5.460712e+05 1565.2352 0.5977 0.3905     0.6348   0.3520   3.6825  4.7226
## 13  4.297786e+15 2717.4145 0.7959 0.3900     0.6535 189.1116   0.0000  0.0000
## 14 -1.524036e+15 3636.9354 0.7737 0.3279     0.7357  60.4381  -0.9835 -1.8918
## 15  7.239803e+15 5799.1587 0.9363 0.2925     0.7630 380.6423   0.0000  0.0000
##    Ratkowsky      Ball Ptbiserial     Gap    Frey McClain  Gamma  Gplus     Tau
## 2     0.2969 2429.1053     0.4857 -0.2176  3.3318  0.0737 0.7119 2.1765 10.7582
## 3     0.2833 1372.2923     0.5325 -0.0308  2.3601  0.1706 0.6778 4.1699 17.5425
## 4     0.3103  880.3850     0.5446 -0.8947  0.8445  0.2894 0.6516 5.7386 21.4641
## 5     0.3428  488.8333     0.6033 -0.7081  1.2917  0.5619 0.7017 5.6993 26.8105
## 6     0.3372  279.5607     0.4830 -0.0344  1.0007  1.8087 0.6994 4.2157 19.6209
## 7     0.3203  204.1335     0.4588  0.4405 -1.2013  2.1591 0.7004 3.7908 17.7255
## 8     0.3016  168.9312     0.4448  0.6409 -1.6056  2.2719 0.6832 3.9150 16.8889
## 9     0.2869  137.6234     0.3968  0.8259  0.0902  2.7821 0.6267 4.1503 13.9346
## 10    0.2950   54.2948     0.4326  1.7200  0.3273  3.9228 0.9237 0.5163 12.4967
## 11    0.2861   36.7197     0.4047  2.1072  0.4248  4.7080 0.9456 0.3007 10.4575
## 12    0.2760   25.2124     0.3781  2.4310  0.2220  5.5272 0.9497 0.2353  8.8758
## 13    0.2712   13.4053     0.3209  2.9959  0.5327  8.0266 0.9955 0.0131  5.7386
## 14    0.2631    9.3006     0.2955  3.4159  0.2725  9.5006 0.9919 0.0196  4.7974
## 15    0.2558    5.4440     0.2381  3.8199  0.5313 14.6238 1.0000 0.0000  2.9412
##      Dunn Hubert SDindex  Dindex   SDbw
## 2  0.4347  4e-04  0.1792 15.0217 0.4231
## 3  0.3955  4e-04  0.1368 13.5193 0.2472
## 4  0.3454  4e-04  0.1293 12.1371 0.1643
## 5  0.4295  4e-04  0.1365 10.0901 0.1361
## 6  0.4631  4e-04  0.1806  8.2565 0.1429
## 7  0.4522  4e-04  0.1661  7.4210 0.1096
## 8  0.4522  4e-04  0.2033  6.7294 0.0770
## 9  0.4484  4e-04  0.2188  6.1110 0.0667
## 10 0.7193  4e-04  0.2146  4.0908 0.0476
## 11 0.6956  4e-04  0.2070  3.3761 0.0349
## 12 0.6546  4e-04  0.2060  2.7435 0.0219
## 13 0.9264  4e-04  0.2216  2.1651 0.0179
## 14 0.9156  5e-04  0.2630  1.6436 0.0109
## 15 1.0795  5e-04  0.2989  1.2287 0.0089

res$Best.nc

##                      KL      CH Hartigan     CCC   Scott      Marriot   TrCovW
## Number_clusters  7.0000 15.0000   9.0000  2.0000  13.000 4.000000e+00      3.0
## Value_Index     10.2938 15.9554  10.6204 12.5234 476.045 9.636823e+14 330268.9
##                   TraceW     Friedman     Rubin Cindex      DB Silhouette
## Number_clusters  10.0000 1.500000e+01   13.0000 4.0000 15.0000     15.000
## Value_Index     556.6307 8.763839e+15 -232.6584 0.4646  0.2925      0.763
##                   Duda PseudoT2   Beale Ratkowsky     Ball PtBiserial     Gap
## Number_clusters 2.0000   2.0000  2.0000    5.0000    3.000     5.0000  3.0000
## Value_Index     2.0302  -7.6115 -1.8302    0.3428 1056.813     0.6033 -0.0308
##                   Frey McClain Gamma Gplus     Tau    Dunn Hubert SDindex
## Number_clusters 3.0000  2.0000    15    15  5.0000 15.0000      0  4.0000
## Value_Index     2.3601  0.0737     1     0 26.8105  1.0795      0  0.1293
##                 Dindex    SDbw
## Number_clusters      0 15.0000
## Value_Index          0  0.0089

res$Best.pa

##          Espana       Andalucia          Aragon        Asturias        Baleares 
##               1               2               3               4               5 
##        Canarias       Cantabria Castilla y Leon Cast.-La Mancha        Cataluna 
##               6               7               4               8               9 
## Com. Valenciana     Extremadura         Galicia          Madrid          Murcia 
##               5              10              11              12              13 
##         Navarra      Pais Vasco        La Rioja 
##              14              15               3

Metodo de la Vinculacion Promedio (Average)

datos.NbClust <- datos_numeros


library(NbClust)

res <- NbClust(datos.NbClust, 
               distance = "euclidean", 
               min.nc = 2, 
               max.nc = 15,  
               method = "average", 
               index = "alllong")

## *** : The Hubert index is a graphical method of determining the number of clusters.
##                 In the plot of Hubert index, we seek a significant knee that corresponds to a 
##                 significant increase of the value of the measure i.e the significant peak in Hubert
##                 index second differences plot. 
##

## *** : The D index is a graphical method of determining the number of clusters. 
##                 In the plot of D index, we seek a significant knee (the significant peak in Dindex
##                 second differences plot) that corresponds to a significant increase of the value of
##                 the measure. 
##  
## ******************************************************************* 
## * Among all indices:                                                
## * 6 proposed 2 as the best number of clusters 
## * 6 proposed 3 as the best number of clusters 
## * 5 proposed 4 as the best number of clusters 
## * 2 proposed 5 as the best number of clusters 
## * 1 proposed 13 as the best number of clusters 
## * 8 proposed 15 as the best number of clusters 
## 
##                    ***** Conclusion *****                            
##  
## * According to the majority rule, the best number of clusters is  15 
##  
##  
## *******************************************************************

res$All.index

##        KL      CH Hartigan     CCC    Scott      Marriot      TrCovW    TraceW
## 2  0.2235  4.2937   2.8811 12.5234 168.6204 3.248655e+15 825042.0502 4858.2106
## 3  0.0917  3.7256   9.8914  6.3228 192.1067 1.982538e+15 494773.1542 4116.8769
## 4  1.5379  6.9242   8.5128  6.9179 223.7130 6.088617e+14 316581.7207 2480.8983
## 5  2.5612  9.7306   4.1057  7.8188 250.1128 2.194711e+14  94432.8631 1542.7923
## 6  1.3804 10.2131   3.2677  7.7541 278.2104 6.634613e+13  59705.3285 1172.4890
## 7  1.2410 10.4254   2.8243  7.5500 325.6524 6.472241e+12  37665.4289  921.5423
## 8  1.3018 10.5762   2.3111  7.2751 347.6137 2.495555e+12  19413.0120  733.2723
## 9  1.1280 10.5136   2.1353  6.8487 372.4602 7.943109e+11  12325.3583  595.6183
## 10 1.0512 10.4889   2.1339  6.3671 388.4650 4.030405e+11   9665.7481  481.4017
## 11 0.7141 10.6500   3.5668  5.8828 438.7682 2.981569e+10   5880.1217  380.0333
## 12 1.6135 12.8052   2.6678  6.1278 479.4351 3.705384e+09   3890.4809  251.7533
## 13 1.7365 14.3162   1.6919  5.8770 975.1638 4.800000e-03   1843.1547  174.2683
## 14 0.8534 14.2535   2.3781  4.8274      NaN 0.000000e+00   1120.1709  130.2083
## 15 1.5532 15.9554   1.7852  4.0365      NaN 0.000000e+00    521.6202   81.6600
##         Friedman     Rubin Cindex     DB Silhouette     Duda Pseudot2   Beale
## 2   7.184240e+03   97.4761 0.5119 0.4550     0.3914   2.0302  -7.6115 -1.8302
## 3   8.341118e+03  115.0288 0.4860 0.5239     0.2786   0.6026   9.2320  2.3679
## 4   9.047014e+03  190.8822 0.5390 0.8181     0.2924   0.4332  10.4655  4.4738
## 5   9.227215e+03  306.9495 0.5783 0.7573     0.3542   0.5515   3.2527  2.5028
## 6   1.295054e+04  403.8923 0.5135 0.7595     0.3774  15.7979  -1.8734 -2.4025
## 7   6.212024e+04  513.8769 0.5738 0.6398     0.4196  12.7845  -2.7653 -2.6598
## 8   7.733520e+04  645.8164 0.6741 0.5886     0.4365  18.0014  -2.8333 -2.7252
## 9   1.022427e+05  795.0717 0.6712 0.5678     0.4798  24.9376  -1.9198 -2.4620
## 10  1.073772e+05  983.7093 0.6373 0.4953     0.5226  45.5972  -0.9781 -1.8815
## 11  2.475246e+05 1246.0994 0.6125 0.4426     0.5532   0.3520   3.6825  4.7226
## 12  2.914964e+05 1881.0448 0.6942 0.4428     0.5719 132.3449   0.0000  0.0000
## 13  4.297786e+15 2717.4145 0.7959 0.3900     0.6535 189.1116   0.0000  0.0000
## 14 -1.524036e+15 3636.9354 0.7737 0.3279     0.7357  60.4381  -0.9835 -1.8918
## 15  7.239803e+15 5799.1587 0.9363 0.2925     0.7630 380.6423   0.0000  0.0000
##    Ratkowsky      Ball Ptbiserial     Gap   Frey McClain  Gamma  Gplus     Tau
## 2     0.2969 2429.1053     0.4857 -0.9273 3.3318  0.0737 0.7119 2.1765 10.7582
## 3     0.2833 1372.2923     0.5325 -1.6679 1.4851  0.1706 0.6778 4.1699 17.5425
## 4     0.3713  620.2246     0.5393 -1.6718 0.5140  0.9443 0.6667 6.0784 24.3137
## 5     0.3661  308.5585     0.5229 -0.2480 0.3129  1.8045 0.7995 2.7059 21.5817
## 6     0.3588  195.4148     0.5138 -0.1877 0.2794  2.2764 0.8777 1.3595 19.5163
## 7     0.3446  131.6489     0.5078  0.4653 0.3942  2.5023 0.9134 0.8758 18.4837
## 8     0.3255   91.6590     0.4864  0.9314 0.4962  2.9337 0.9368 0.5490 16.2876
## 9     0.3123   66.1798     0.4522  1.2920 0.5500  3.5970 0.9544 0.3268 13.6732
## 10    0.2984   48.1402     0.4182  1.6655 0.5094  4.3622 0.9495 0.3007 11.2941
## 11    0.2867   34.5485     0.3920  2.1457 0.2885  5.0816 0.9513 0.2484  9.7124
## 12    0.2808   20.9794     0.3354  2.6181 0.1535  7.2843 0.9843 0.0523  6.5752
## 13    0.2712   13.4053     0.3209  2.9959 0.5327  8.0266 0.9955 0.0131  5.7386
## 14    0.2631    9.3006     0.2955  3.3297 0.2725  9.5006 0.9919 0.0196  4.7974
## 15    0.2558    5.4440     0.2381  3.8199 0.5313 14.6238 1.0000 0.0000  2.9412
##      Dunn Hubert SDindex  Dindex   SDbw
## 2  0.4347  4e-04  0.1792 15.0217 0.4231
## 3  0.3955  4e-04  0.1368 13.5193 0.2472
## 4  0.3797  3e-04  0.1662 10.6044 0.2385
## 5  0.4908  3e-04  0.1633  8.3185 0.1718
## 6  0.4908  4e-04  0.1815  7.1479 0.1606
## 7  0.5770  4e-04  0.1633  6.2288 0.1033
## 8  0.7193  4e-04  0.1691  5.4008 0.0804
## 9  0.6956  4e-04  0.1940  4.7231 0.0682
## 10 0.6956  4e-04  0.2079  4.0677 0.0535
## 11 0.6546  4e-04  0.2054  3.4351 0.0376
## 12 0.7629  4e-04  0.2208  2.8567 0.0319
## 13 0.9264  4e-04  0.2216  2.1651 0.0179
## 14 0.9156  5e-04  0.2630  1.6436 0.0109
## 15 1.0795  5e-04  0.2989  1.2287 0.0089

res$Best.nc

##                     KL      CH Hartigan     CCC    Scott      Marriot   TrCovW
## Number_clusters 5.0000 15.0000   3.0000  2.0000  13.0000 4.000000e+00      3.0
## Value_Index     2.5612 15.9554   7.0103 12.5234 495.7287 9.842859e+14 330268.9
##                   TraceW     Friedman    Rubin Cindex      DB Silhouette   Duda
## Number_clusters   4.0000 1.500000e+01   5.0000  3.000 15.0000     15.000 2.0000
## Value_Index     697.8725 8.763839e+15 -19.1244  0.486  0.2925      0.763 2.0302
##                 PseudoT2   Beale Ratkowsky     Ball PtBiserial     Gap   Frey
## Number_clusters   2.0000  2.0000    4.0000    3.000     4.0000  2.0000 3.0000
## Value_Index      -7.6115 -1.8302    0.3713 1056.813     0.5393 -0.9273 1.4851
##                 McClain Gamma Gplus     Tau    Dunn Hubert SDindex Dindex
## Number_clusters  2.0000    15    15  4.0000 15.0000      0  3.0000      0
## Value_Index      0.0737     1     0 24.3137  1.0795      0  0.1368      0
##                    SDbw
## Number_clusters 15.0000
## Value_Index      0.0089

res$Best.pa

##          Espana       Andalucia          Aragon        Asturias        Baleares 
##               1               2               3               4               5 
##        Canarias       Cantabria Castilla y Leon Cast.-La Mancha        Cataluna 
##               6               7               4               8               9 
## Com. Valenciana     Extremadura         Galicia          Madrid          Murcia 
##               5              10              11              12              13 
##         Navarra      Pais Vasco        La Rioja 
##              14              15               3

Metodo del Vecino mas Lejano (Complete)

datos.NbClust <- datos_numeros


library(NbClust)

res <- NbClust(datos.NbClust, 
               distance = "euclidean", 
               min.nc = 2, 
               max.nc = 15,  
               method = "complete", 
               index = "alllong")

## *** : The Hubert index is a graphical method of determining the number of clusters.
##                 In the plot of Hubert index, we seek a significant knee that corresponds to a 
##                 significant increase of the value of the measure i.e the significant peak in Hubert
##                 index second differences plot. 
##

## *** : The D index is a graphical method of determining the number of clusters. 
##                 In the plot of D index, we seek a significant knee (the significant peak in Dindex
##                 second differences plot) that corresponds to a significant increase of the value of
##                 the measure. 
##  
## ******************************************************************* 
## * Among all indices:                                                
## * 6 proposed 2 as the best number of clusters 
## * 9 proposed 3 as the best number of clusters 
## * 1 proposed 4 as the best number of clusters 
## * 1 proposed 5 as the best number of clusters 
## * 1 proposed 7 as the best number of clusters 
## * 1 proposed 13 as the best number of clusters 
## * 8 proposed 15 as the best number of clusters 
## 
##                    ***** Conclusion *****                            
##  
## * According to the majority rule, the best number of clusters is  3 
##  
##  
## *******************************************************************

res$All.index

##        KL      CH Hartigan     CCC    Scott      Marriot      TrCovW    TraceW
## 2  1.3012 10.9000   8.6508 13.6679 182.5944 1.494675e+15 611775.8159 3665.1000
## 3  2.9820 11.9269   3.6759  8.7521 207.2198 8.562089e+14 195289.0313 2378.8967
## 4  1.0079 10.3835   3.3382  8.3053 226.6385 5.175283e+14 162881.5552 1910.6633
## 5  0.8999  9.7306   3.5705  7.8188 250.1128 2.194711e+14  94432.8631 1542.7923
## 6  1.0051  9.8184   3.7609  7.5869 269.1563 1.097157e+14  73413.6271 1210.3640
## 7  1.4228 10.4254   2.9240  7.5500 325.6524 6.472241e+12  37665.4289  921.5423
## 8  1.3455 10.6628   2.3317  7.3127 350.9599 2.072206e+12  26403.8461  728.0190
## 9  1.0311 10.6172   2.3692  6.8950 375.7021 6.633938e+11  18843.2293  590.3650
## 10 1.1463 10.8313   2.2159  6.5216 397.8561 2.392023e+11  10044.6904  467.3383
## 11 0.8122 11.0861   3.1758  6.0791 439.9275 2.795598e+10   5744.7480  365.9700
## 12 1.4257 12.8052   2.6678  6.1278 479.4351 3.705384e+09   3890.4809  251.7533
## 13 1.5606 14.3162   1.9308  5.8770 975.1638 4.800000e-03   1843.1547  174.2683
## 14 1.0507 14.7733   2.1582  5.0097      NaN 0.000000e+00   1102.8885  125.7200
## 15 1.4037 15.9554   1.7852  4.0365      NaN 0.000000e+00    521.6202   81.6600
##         Friedman     Rubin Cindex     DB Silhouette     Duda Pseudot2   Beale
## 2   7.128190e+03  129.2077 0.3657 1.0902     0.3166   0.5526   8.0957  2.8315
## 3   8.070602e+03  199.0668 0.3854 1.0544     0.3148   4.2423  -3.0571 -2.3523
## 4   8.208753e+03  247.8507 0.6030 0.8711     0.3405   8.6970  -3.5401 -2.7240
## 5   9.227215e+03  306.9495 0.5783 0.7573     0.3542   0.5974   2.6958  2.0743
## 6   9.902408e+03  391.2536 0.5893 0.8123     0.3394  16.1629  -0.9381 -1.8046
## 7   6.212024e+04  513.8769 0.5738 0.6398     0.4196   0.4989   3.0129  2.8979
## 8   6.412791e+04  650.4766 0.6827 0.6672     0.3955  18.0014  -2.8333 -2.7252
## 9   9.316051e+04  802.1466 0.7310 0.6410     0.4244  28.6303  -0.9651 -1.8565
## 10  1.043022e+05 1013.3115 0.7002 0.5778     0.4763  45.5972  -0.9781 -1.8815
## 11  2.475876e+05 1293.9839 0.6812 0.5080     0.5173  24.9376  -1.9198 -2.4620
## 12  2.914964e+05 1881.0448 0.6942 0.4428     0.5719 132.3449   0.0000  0.0000
## 13  4.297786e+15 2717.4145 0.7959 0.3900     0.6535  60.4381  -0.9835 -1.8918
## 14 -2.935833e+15 3766.7778 0.8198 0.3630     0.6808 189.1116   0.0000  0.0000
## 15  7.239803e+15 5799.1587 0.9363 0.2925     0.7630 380.6423   0.0000  0.0000
##    Ratkowsky      Ball Ptbiserial     Gap   Frey McClain  Gamma  Gplus     Tau
## 2     0.3962 1832.5500     0.4523 -0.2396 0.4522  0.6085 0.5288 8.9804 20.1569
## 3     0.4334  792.9656     0.4997 -0.7762 0.1456  1.4310 0.6790 5.0980 21.5686
## 4     0.3964  477.6658     0.5184 -1.5993 0.2665  1.5916 0.7442 3.7778 21.9869
## 5     0.3661  308.5585     0.5229 -0.3333 0.3838  1.8045 0.7995 2.7059 21.5817
## 6     0.3448  201.7273     0.5022 -0.1940 0.1217  2.3924 0.8698 1.4052 18.7712
## 7     0.3446  131.6489     0.5078  0.4653 0.8384  2.5023 0.9134 0.8758 18.4837
## 8     0.3306   91.0024     0.4532  0.6569 0.4407  3.3774 0.9193 0.6405 14.6013
## 9     0.3171   65.5961     0.4176  1.0810 0.3187  4.2355 0.9424 0.3660 11.9869
## 10    0.3032   46.7338     0.3987  1.5282 0.4692  4.8067 0.9527 0.2614 10.5359
## 11    0.2913   33.2700     0.3716  2.0658 0.1959  5.6715 0.9580 0.1961  8.9542
## 12    0.2808   20.9794     0.3354  2.6181 0.1535  7.2843 0.9843 0.0523  6.5752
## 13    0.2712   13.4053     0.3209  2.9959 0.4455  8.0266 0.9955 0.0131  5.7386
## 14    0.2630    8.9800     0.2679  3.3648 0.2107 11.5905 0.9966 0.0065  3.8824
## 15    0.2558    5.4440     0.2381  3.8199 0.5313 14.6238 1.0000 0.0000  2.9412
##      Dunn Hubert SDindex  Dindex   SDbw
## 2  0.2825  2e-04  0.2613 13.2067 1.1117
## 3  0.3808  2e-04  0.2131 10.7759 0.7485
## 4  0.4802  3e-04  0.1882  9.4769 0.2645
## 5  0.4908  3e-04  0.1633  8.3185 0.1718
## 6  0.5598  4e-04  0.2008  7.3436 0.1611
## 7  0.5770  4e-04  0.1633  6.2288 0.1033
## 8  0.5566  4e-04  0.2023  5.5272 0.0876
## 9  0.6357  4e-04  0.2016  4.8495 0.0746
## 10 0.6418  4e-04  0.2205  4.1448 0.0583
## 11 0.6506  4e-04  0.2186  3.5121 0.0420
## 12 0.7629  4e-04  0.2208  2.8567 0.0319
## 13 0.9264  4e-04  0.2216  2.1651 0.0179
## 14 0.8976  4e-04  0.3025  1.7502 0.0151
## 15 1.0795  5e-04  0.2989  1.2287 0.0089

res$Best.nc

##                    KL      CH Hartigan     CCC    Scott      Marriot   TrCovW
## Number_clusters 3.000 15.0000   3.0000  2.0000  13.0000 3.000000e+00      3.0
## Value_Index     2.982 15.9554   4.9748 13.6679 495.7287 2.997856e+14 416486.8
##                 TraceW     Friedman    Rubin Cindex      DB Silhouette   Duda
## Number_clusters   3.00 1.500000e+01   3.0000 2.0000 15.0000     15.000 2.0000
## Value_Index     817.97 1.017564e+16 -21.0751 0.3657  0.2925      0.763 0.5526
##                 PseudoT2   Beale Ratkowsky     Ball PtBiserial     Gap Frey
## Number_clusters   2.0000  3.0000    3.0000    3.000     5.0000  2.0000    1
## Value_Index       8.0957 -2.3523    0.4334 1039.584     0.5229 -0.2396   NA
##                 McClain Gamma Gplus     Tau    Dunn Hubert SDindex Dindex
## Number_clusters  2.0000    15    15  4.0000 15.0000      0  7.0000      0
## Value_Index      0.6085     1     0 21.9869  1.0795      0  0.1633      0
##                    SDbw
## Number_clusters 15.0000
## Value_Index      0.0089

res$Best.pa

##          Espana       Andalucia          Aragon        Asturias        Baleares 
##               1               1               2               3               1 
##        Canarias       Cantabria Castilla y Leon Cast.-La Mancha        Cataluna 
##               1               3               3               3               2 
## Com. Valenciana     Extremadura         Galicia          Madrid          Murcia 
##               1               3               3               2               1 
##         Navarra      Pais Vasco        La Rioja 
##               2               2               2

Metodo Ward

datos.NbClust <- datos_numeros


library(NbClust)

res <- NbClust(datos.NbClust, 
               distance = "euclidean", 
               min.nc = 2, 
               max.nc = 15,  
               method = "ward.D2", 
               index = "alllong")

## *** : The Hubert index is a graphical method of determining the number of clusters.
##                 In the plot of Hubert index, we seek a significant knee that corresponds to a 
##                 significant increase of the value of the measure i.e the significant peak in Hubert
##                 index second differences plot. 
##

## *** : The D index is a graphical method of determining the number of clusters. 
##                 In the plot of D index, we seek a significant knee (the significant peak in Dindex
##                 second differences plot) that corresponds to a significant increase of the value of
##                 the measure. 
##  
## ******************************************************************* 
## * Among all indices:                                                
## * 6 proposed 2 as the best number of clusters 
## * 9 proposed 3 as the best number of clusters 
## * 2 proposed 4 as the best number of clusters 
## * 1 proposed 7 as the best number of clusters 
## * 1 proposed 13 as the best number of clusters 
## * 8 proposed 15 as the best number of clusters 
## 
##                    ***** Conclusion *****                            
##  
## * According to the majority rule, the best number of clusters is  3 
##  
##  
## *******************************************************************

res$All.index

##        KL      CH Hartigan     CCC    Scott      Marriot      TrCovW    TraceW
## 2  1.3012 10.9000   8.6508 13.6679 182.5944 1.494675e+15 611775.8159 3665.1000
## 3  2.9820 11.9269   3.6759  8.7521 207.2198 8.562089e+14 195289.0313 2378.8967
## 4  0.9974 10.3835   3.3656  8.3053 226.6385 5.175283e+14 162881.5552 1910.6633
## 5  0.7926  9.7511   4.0788  7.8271 248.5619 2.392200e+14 115553.4337 1540.3600
## 6  1.3691 10.2131   3.2677  7.7541 278.2104 6.634613e+13  59705.3285 1172.4890
## 7  1.1985 10.4254   2.9240  7.5500 325.6524 6.472241e+12  37665.4289  921.5423
## 8  1.3455 10.6628   2.3317  7.3127 350.9599 2.072206e+12  26403.8461  728.0190
## 9  1.0311 10.6172   2.3692  6.8950 375.7021 6.633938e+11  18843.2293  590.3650
## 10 0.9984 10.8313   2.5876  6.5216 397.8561 2.392023e+11  10044.6904  467.3383
## 11 1.0601 11.5149   2.8185  6.2651 427.0366 5.721375e+10   7213.7754  353.1217
## 12 1.2542 12.8052   2.6678  6.1278 479.4351 3.705384e+09   3890.4809  251.7533
## 13 1.5606 14.3162   1.9308  5.8770 975.1638 4.800000e-03   1843.1547  174.2683
## 14 1.0507 14.7733   2.1582  5.0097      NaN 0.000000e+00   1102.8885  125.7200
## 15 1.4037 15.9554   1.7852  4.0365      NaN 0.000000e+00    521.6202   81.6600
##         Friedman     Rubin Cindex     DB Silhouette     Duda Pseudot2   Beale
## 2   7.128190e+03  129.2077 0.3657 1.0902     0.3166   0.5526   8.0957  2.8315
## 3   8.070602e+03  199.0668 0.3854 1.0544     0.3148   4.2423  -3.0571 -2.3523
## 4   8.208753e+03  247.8507 0.6030 0.8711     0.3405   0.5515   3.2527  2.5028
## 5   1.184051e+04  307.4342 0.5589 0.8463     0.3420   8.6970  -3.5401 -2.7240
## 6   1.295054e+04  403.8923 0.5135 0.7595     0.3774  15.7979  -1.8734 -2.4025
## 7   6.212024e+04  513.8769 0.5738 0.6398     0.4196   0.4989   3.0129  2.8979
## 8   6.412791e+04  650.4766 0.6827 0.6672     0.3955  18.0014  -2.8333 -2.7252
## 9   9.316051e+04  802.1466 0.7310 0.6410     0.4244  28.6303  -0.9651 -1.8565
## 10  1.043022e+05 1013.3115 0.7002 0.5778     0.4763  24.9376  -1.9198 -2.4620
## 11  1.158825e+05 1341.0655 0.6357 0.5084     0.5191  45.5972  -0.9781 -1.8815
## 12  2.914964e+05 1881.0448 0.6942 0.4428     0.5719 132.3449   0.0000  0.0000
## 13  4.297786e+15 2717.4145 0.7959 0.3900     0.6535  60.4381  -0.9835 -1.8918
## 14 -2.935833e+15 3766.7778 0.8198 0.3630     0.6808 189.1116   0.0000  0.0000
## 15  7.239803e+15 5799.1587 0.9363 0.2925     0.7630 380.6423   0.0000  0.0000
##    Ratkowsky      Ball Ptbiserial     Gap   Frey McClain  Gamma  Gplus     Tau
## 2     0.3962 1832.5500     0.4523 -0.6455 0.4522  0.6085 0.5288 8.9804 20.1569
## 3     0.4334  792.9656     0.4997 -1.1194 0.1456  1.4310 0.6790 5.0980 21.5686
## 4     0.3964  477.6658     0.5184 -1.4106 0.4834  1.5916 0.7442 3.7778 21.9869
## 5     0.3823  308.0720     0.5017 -1.6984 0.1500  2.0221 0.7872 2.6928 19.9216
## 6     0.3588  195.4148     0.5138 -0.1877 0.2794  2.2764 0.8777 1.3595 19.5163
## 7     0.3446  131.6489     0.5078  0.4653 0.8384  2.5023 0.9134 0.8758 18.4837
## 8     0.3306   91.0024     0.4532  0.9386 0.4407  3.3774 0.9193 0.6405 14.6013
## 9     0.3171   65.5961     0.4176  1.1003 0.3187  4.2355 0.9424 0.3660 11.9869
## 10    0.3032   46.7338     0.3987  1.5413 0.2980  4.8067 0.9527 0.2614 10.5359
## 11    0.2911   32.1020     0.3631  2.1089 0.2182  6.0607 0.9630 0.1569  8.1569
## 12    0.2808   20.9794     0.3354  2.5204 0.1535  7.2843 0.9843 0.0523  6.5752
## 13    0.2712   13.4053     0.3209  2.9959 0.4455  8.0266 0.9955 0.0131  5.7386
## 14    0.2630    8.9800     0.2679  3.3648 0.2107 11.5905 0.9966 0.0065  3.8824
## 15    0.2558    5.4440     0.2381  3.8199 0.5313 14.6238 1.0000 0.0000  2.9412
##      Dunn Hubert SDindex  Dindex   SDbw
## 2  0.2825  2e-04  0.2613 13.2067 1.1117
## 3  0.3808  2e-04  0.2131 10.7759 0.7485
## 4  0.4802  3e-04  0.1882  9.4769 0.2645
## 5  0.4802  3e-04  0.2116  8.3063 0.2324
## 6  0.4908  4e-04  0.1815  7.1479 0.1606
## 7  0.5770  4e-04  0.1633  6.2288 0.1033
## 8  0.5566  4e-04  0.2023  5.5272 0.0876
## 9  0.6357  4e-04  0.2016  4.8495 0.0746
## 10 0.6418  4e-04  0.2205  4.1448 0.0583
## 11 0.6418  4e-04  0.2227  3.4894 0.0458
## 12 0.7629  4e-04  0.2208  2.8567 0.0319
## 13 0.9264  4e-04  0.2216  2.1651 0.0179
## 14 0.8976  4e-04  0.3025  1.7502 0.0151
## 15 1.0795  5e-04  0.2989  1.2287 0.0089

res$Best.nc

##                    KL      CH Hartigan     CCC    Scott      Marriot   TrCovW
## Number_clusters 3.000 15.0000   3.0000  2.0000  13.0000 3.000000e+00      3.0
## Value_Index     2.982 15.9554   4.9748 13.6679 495.7287 2.997856e+14 416486.8
##                 TraceW     Friedman    Rubin Cindex      DB Silhouette   Duda
## Number_clusters   3.00 1.500000e+01   3.0000 2.0000 15.0000     15.000 2.0000
## Value_Index     817.97 1.017564e+16 -21.0751 0.3657  0.2925      0.763 0.5526
##                 PseudoT2   Beale Ratkowsky     Ball PtBiserial     Gap Frey
## Number_clusters   2.0000  3.0000    3.0000    3.000     4.0000  2.0000    1
## Value_Index       8.0957 -2.3523    0.4334 1039.584     0.5184 -0.6455   NA
##                 McClain Gamma Gplus     Tau    Dunn Hubert SDindex Dindex
## Number_clusters  2.0000    15    15  4.0000 15.0000      0  7.0000      0
## Value_Index      0.6085     1     0 21.9869  1.0795      0  0.1633      0
##                    SDbw
## Number_clusters 15.0000
## Value_Index      0.0089

res$Best.partition

##          Espana       Andalucia          Aragon        Asturias        Baleares 
##               1               1               2               3               1 
##        Canarias       Cantabria Castilla y Leon Cast.-La Mancha        Cataluna 
##               1               3               3               3               2 
## Com. Valenciana     Extremadura         Galicia          Madrid          Murcia 
##               1               3               3               2               1 
##         Navarra      Pais Vasco        La Rioja 
##               2               2               2

###$ Calculo de Centroides

grupo.ward <- cutree(hclust.ward.D2, k = 2, h = NULL)
datos.grupo <- cbind(datos,grupo.ward)
datos.grupo$CC.AA. <- NULL
round(aggregate(datos.grupo,list(grupo.ward),mean),2)

##   Group.1 Automovil TV_color Video Microondas Lavavajillas Telefono grupo.ward
## 1       1     66.87    96.82 57.68      25.42        11.81    80.71          1
## 2       2     70.70    98.53 63.47      44.70        22.33    90.23          2

Metodos No Jerarquicos

Realización de un análisis de conglomerados no jerárquico mediante el método de k-medias para la obtención de una solución óptima en términos de homogeneidad intrasegmentos y heterogeneidad intersegmentos.

c1 <- c(66.87,96.82,56.01,25.43,11.81,80.71)
c2 <- c(70.70,98.53,63.47,44.70,22.43,90.23)
solucion <- kmeans(datos_numeros,rbind(c1,c2))
 
solucion

## K-means clustering with 2 clusters of sizes 12, 6
## 
## Cluster means:
##   Automovil TV_color    Video Microondas Lavavajillas Telefono
## 1  66.86667 96.81667 57.67500     25.425     11.80833 80.70833
## 2  70.70000 98.53333 63.46667     44.700     22.33333 90.23333
## 
## Clustering vector:
##          Espana       Andalucia          Aragon        Asturias        Baleares 
##               1               1               2               1               1 
##        Canarias       Cantabria Castilla y Leon Cast.-La Mancha        Cataluna 
##               1               1               1               1               2 
## Com. Valenciana     Extremadura         Galicia          Madrid          Murcia 
##               1               1               1               2               1 
##         Navarra      Pais Vasco        La Rioja 
##               2               2               2 
## 
## Within cluster sum of squares by cluster:
## [1] 2810.6467  854.4533
##  (between_SS / total_SS =  40.5 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"

plot(hclust.ward.D2,labels=datos$CC.AA.)
grupos1 <- cutree(hclust.ward.D2, k = 2)
rect.hclust(hclust.ward.D2, k = 2, border = "red")

plot(hclust.average,labels=datos$CC.AA.)
grupos1 <- cutree(hclust.average, k = 2)
rect.hclust(hclust.average, k = 2, border = "red")

plot(hclust.complete,labels=datos$CC.AA.)
grupos1 <- cutree(hclust.complete, k = 2)
rect.hclust(hclust.complete, k = 2, border = "red")

Pruebas T para cada variable

solucion.cluster <- solucion$cluster
t1<-t.test(Automovil~solucion.cluster, data=datos_numeros)
t2<-t.test(TV_color~solucion.cluster, data=datos_numeros) 
t3<-t.test(Video~solucion.cluster, data=datos_numeros)       
t4<-t.test(Microondas~solucion.cluster, data=datos_numeros)      
t5<-t.test(Lavavajillas~solucion.cluster, data=datos_numeros)     
t6<-t.test(Telefono~solucion.cluster, data=datos_numeros)

variables <- c("Automovil", "TV_color", "Video", 
               "Microondas", "Lavavajillas", "Telefono")

tabla <- data.frame(
  Variable = variables,
  Grupo1 = NA,
  Grupo2 = NA,
  t = NA
)

for (i in 1:length(variables)) {
  var <- variables[i]
  
  # medias de cada grupo
  m1 <- mean(datos_numeros[solucion.cluster == 1, var], na.rm = TRUE)
  m2 <- mean(datos_numeros[solucion.cluster == 2, var], na.rm = TRUE)
  
  # prueba t
  tt <- t.test(datos_numeros[, var] ~ solucion.cluster)
  
  # guardar valores
  tabla$Grupo1[i] <- round(m1, 2)
  tabla$Grupo2[i] <- round(m2, 2)
  tabla$t[i] <- round(tt$statistic, 2)
}

tabla

##       Variable Grupo1 Grupo2     t
## 1    Automovil  66.87  70.70 -1.81
## 2     TV_color  96.82  98.53 -2.52
## 3        Video  57.68  63.47 -1.19
## 4   Microondas  25.42  44.70 -6.73
## 5 Lavavajillas  11.81  22.33 -4.48
## 6     Telefono  80.71  90.23 -3.51

library(cluster)
library(factoextra)
library(ggplot2) # Necesitas esta librería para el diseño moderno

# El objeto 'solucion' debe ser el resultado de la función kmeans()
# o el algoritmo de clustering que hayas usado.

fviz_cluster(
    # 1. Objeto de clustering y datos originales
    solucion, 
    data = datos_numeros,
    
    # 2. Diseño y Estética
    geom = c("point", "text"), # Muestra puntos y etiquetas
    ellipse.type = "convex", # Tipo de sombreado del clúster (más moderno)
    repel = TRUE, # Evita que las etiquetas se superpongan
    
    # 3. Colores y Títulos
    palette = "Set2", # Puedes cambiar la paleta de colores (ej: "Dark2", "cosmic", "jco")
    ggtheme = theme_minimal(), # Aplica un tema limpio de ggplot2
    main = "Clusters K-means",
    xlab = "Componente Principal 1", # Mejores nombres para las componentes
    ylab = "Componente Principal 2"
)