Perform a clustering analysis and a dimensionality-reduction analysis, and interpret the results of both.
Prior reading:
Main multimedia resources for Module 4. The wine.csv dataset can be downloaded from jaredlander or will be provided by the instructor.
Assignment
Consider the Wine file.
The file contains information on the characteristics of 178 types of wine.
Each column holds one characteristic. Based on this information, the student must present 2 studies:
#---- REQUIRED LIBRARIES ====
# Install missing packages once, from a fresh R session. Calling
# install.packages() on packages that are already loaded fails with
# "Updating loaded packages".
# install.packages(c("tidyverse", "dplyr", "plyr", "writexl"))
library(plyr)        # load plyr before dplyr so dplyr's functions are not masked
library(dplyr)
library(readxl)
library(stringr)
library(writexl)
library(lubridate)
library(data.table)
library(tidyverse)
1.- Read the CSV file into a data.frame
wine <- read_csv("wine.csv")
## Rows: 178 Columns: 14
## ── Column specification ──────────────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (14): Cultivar, Alcohol, Malic acid, Ash, Alcalinity of ash, Magnesium, Total phen...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
names(wine)
## [1] "Cultivar" "Alcohol"
## [3] "Malic acid" "Ash"
## [5] "Alcalinity of ash" "Magnesium"
## [7] "Total phenols" "Flavanoids"
## [9] "Nonflavanoid phenols" "Proanthocyanins"
## [11] "Color intensity" "Hue"
## [13] "OD280/OD315 of diluted wines" "Proline"
The selection is carried out as follows:
numeric_vars <- wine[, sapply(wine, is.numeric)]
correlation_matrix <- cor(numeric_vars)
# install.packages("corrplot")  # run once, in a fresh session, if corrplot is not installed
library(corrplot)
# COMPUTE THE CORRELATION MATRIX
cor_mat <- round(cor(wine),2)
head(cor_mat)
## Cultivar Alcohol Malic acid Ash Alcalinity of ash Magnesium
## Cultivar 1.00 -0.33 0.44 -0.05 0.52 -0.21
## Alcohol -0.33 1.00 0.09 0.21 -0.31 0.27
## Malic acid 0.44 0.09 1.00 0.16 0.29 -0.05
## Ash -0.05 0.21 0.16 1.00 0.44 0.29
## Alcalinity of ash 0.52 -0.31 0.29 0.44 1.00 -0.08
## Magnesium -0.21 0.27 -0.05 0.29 -0.08 1.00
## Total phenols Flavanoids Nonflavanoid phenols Proanthocyanins
## Cultivar -0.72 -0.85 0.49 -0.50
## Alcohol 0.29 0.24 -0.16 0.14
## Malic acid -0.34 -0.41 0.29 -0.22
## Ash 0.13 0.12 0.19 0.01
## Alcalinity of ash -0.32 -0.35 0.36 -0.20
## Magnesium 0.21 0.20 -0.26 0.24
## Color intensity Hue OD280/OD315 of diluted wines Proline
## Cultivar 0.27 -0.62 -0.79 -0.63
## Alcohol 0.55 -0.07 0.07 0.64
## Malic acid 0.25 -0.56 -0.37 -0.19
## Ash 0.26 -0.07 0.00 0.22
## Alcalinity of ash 0.02 -0.27 -0.28 -0.44
## Magnesium 0.20 0.06 0.07 0.39
# The matrix format is not suitable for ggplot, so we reshape it to long format with melt.
# reshape2::melt is called explicitly because data.table's redirect for matrices is deprecated.
cor_mat_melted <- reshape2::melt(cor_mat)
get_lower_tri <- function(x) {
  x[upper.tri(x)] <- NA
  return(x)
}
get_upper_tri <- function(x) {
  x[lower.tri(x)] <- NA
  return(x)
}
# Either triangle works equally well, but we have to pick one.
# na.rm drops the NAs inserted by the helper function.
cor_mat_melted <- reshape2::melt(get_upper_tri(cor_mat), na.rm = TRUE)
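To make the effect of the triangle helper concrete, here is a minimal sketch on a 3 x 3 correlation matrix. It uses three columns of the built-in `mtcars` data as a stand-in (an assumption for illustration, not the wine data), and it reshapes with base R's `as.table()` instead of `melt()`, which is one possible alternative rather than the method used in this report. The names `small_cor`, `keep_upper` and `long` are hypothetical.

```r
# Stand-in data: three columns of the built-in mtcars data set (not the wine data).
small_cor <- round(cor(mtcars[, c("mpg", "hp", "wt")]), 2)

# Same idea as get_upper_tri(): blank out everything below the diagonal.
keep_upper <- function(x) {
  x[lower.tri(x)] <- NA
  x
}

upper_only <- keep_upper(small_cor)

# Base-R reshape to long format: one row per (Var1, Var2, value) cell.
long <- as.data.frame(as.table(upper_only), responseName = "value")

# Drop the blanked cells, which is what na.rm = TRUE does in melt().
long <- long[!is.na(long$value), ]

# A p x p matrix keeps p * (p + 1) / 2 cells, diagonal included.
nrow(long)
```

The resulting data frame has the same `Var1`, `Var2`, `value` columns that the heatmap code expects, so each correlation is drawn once instead of twice.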
ggplot(data = cor_mat_melted, aes(Var2, Var1, fill = value)) +
  geom_tile(color = "white") +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white",
                       midpoint = 0, limits = c(-1, 1), space = "Lab",
                       name = "Correlation value") +
  geom_text(aes(Var2, Var1, label = sprintf("%.2f", value)),
            color = "black", size = 2) + # adjust the label size here
  # Apply the complete theme first; placed last, it would override the tweaks below
  theme_minimal() +
  theme(
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    panel.grid.major = element_blank(),
    panel.border = element_blank(),
    panel.background = element_blank(),
    axis.ticks = element_blank(),
    axis.text.x = element_text(angle = 45, vjust = 1, size = 12, hjust = 1),
    legend.justification = c(1, 0),
    # ggplot2 >= 3.5.0: a numeric legend.position is deprecated
    legend.position = "inside",
    legend.position.inside = c(0.6, 0.7),
    legend.direction = "horizontal") +
  guides(fill = guide_colorbar(barwidth = 2, barheight = 3,
                               title.position = "top", title.hjust = 0.5)) +
  coord_fixed()
Based on the correlation matrix and its values, the following variables are selected.
The fields of interest for the MDS-based clustering analysis of the wines are:
• Cultivar • Alcohol • Total phenols • Flavanoids • Proanthocyanins • Color intensity • Hue • OD280/OD315 of diluted wines • Proline
For the PCA-based dimensionality reduction of the wines, the fields would be:
• Cultivar • Alcohol • Total phenols • Flavanoids • Proanthocyanins • Color intensity • Hue • OD280/OD315 of diluted wines • Proline
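The report does not state the exact rule used to go from the correlation matrix to this list. As a rough sketch of one possible rule (an assumption, not necessarily the one applied here), the snippet below ranks the variables by the absolute value of their correlation with `Cultivar`, using the figures from the first row of the printed `cor_mat`, and keeps those at or above a cutoff. The cutoff `threshold` and the names `cor_with_cultivar`, `ranked` and `selected` are hypothetical.

```r
# Correlation of each variable with Cultivar, copied from the first row
# of the cor_mat output printed earlier in this report.
cor_with_cultivar <- c(
  "Alcohol" = -0.33, "Malic acid" = 0.44, "Ash" = -0.05,
  "Alcalinity of ash" = 0.52, "Magnesium" = -0.21,
  "Total phenols" = -0.72, "Flavanoids" = -0.85,
  "Nonflavanoid phenols" = 0.49, "Proanthocyanins" = -0.50,
  "Color intensity" = 0.27, "Hue" = -0.62,
  "OD280/OD315 of diluted wines" = -0.79, "Proline" = -0.63
)

# Hypothetical rule: keep variables whose |r| is at or above a chosen cutoff.
threshold <- 0.5

# Rank by strength of association, ignoring the sign.
ranked <- sort(abs(cor_with_cultivar), decreasing = TRUE)
selected <- names(ranked)[ranked >= threshold]

selected
```

With this particular cutoff the shortlist differs from the list above: it would add Alcalinity of ash and leave out Alcohol and Color intensity, so the selection in this report evidently also took other considerations into account.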
numeric_vars <- numeric_vars[, c("Cultivar", "Alcohol", "Total phenols",
"Flavanoids", "Proanthocyanins",
"Color intensity", "Hue",
"OD280/OD315 of diluted wines", "Proline")]
Create the Euclidean distance matrix
dist_matrix <- dist(numeric_vars)
dist_matrix
## 1 2 3 4 5 6 7
## 2 15.134451
## 3 120.008383 135.016068
## 4 415.007516 430.018473 295.012928
## 5 330.006152 315.000865 450.003551 745.010444
## 6 385.003650 400.010221 265.006185 30.031950 715.005291
## 7 225.001714 240.005784 105.015062 190.024601 555.002482 160.013030
## 8 230.004168 245.002626 110.019918 185.030006 560.001808 155.016723 5.068876
## 9 20.045014 5.403110 140.013859 435.010095 310.005516 405.003938 245.002111
## 10 20.074989 5.840479 140.014172 435.001851 310.015201 405.001215 245.009810
## 11 445.000824 460.004775 325.001770 30.089613 775.002335 60.011897 220.003540
## 12 215.006800 230.004298 95.021470 200.031532 545.001611 170.015619 10.049960
## 13 255.003225 270.004323 135.006341 160.024508 585.001672 130.009265 30.018459
## 14 85.014972 100.036694 35.044843 330.011476 415.006748 300.005006 140.011947
## 15 482.005591 497.015421 362.007308 67.010011 812.008556 97.008677 257.016121
## 16 245.010140 260.017556 125.019702 170.009275 575.008064 140.004815 20.146531
## 17 215.004747 230.012452 95.013460 200.011295 545.004553 170.001871 10.111998
## 18 65.025958 80.041950 55.026371 350.005155 395.007874 320.000639 160.013164
## 19 615.009596 630.017570 495.012093 200.004937 945.011600 230.009001 390.019495
## 20 220.003108 205.002296 340.002918 635.007595 110.005049 605.003186 445.001168
## 21 285.000287 270.006321 405.002043 700.004050 45.037773 670.001574 510.000958
## 22 295.006475 280.001379 415.003732 710.011463 35.010467 680.006489 520.002613
## 23 30.068186 15.038092 150.019840 445.021292 300.002884 415.013304 255.005834
## 24 50.061787 35.008469 170.017810 465.022604 280.002020 435.014149 275.008027
## 25 220.012991 205.002908 340.010300 635.017375 110.007322 605.010844 445.004454
## 26 235.013771 220.002557 355.007842 650.016845 95.003704 620.010013 460.005203
## 27 130.010146 145.001252 10.138244 285.020793 460.000601 255.010317 95.010115
## 28 220.015555 235.002096 100.032828 195.053677 550.000725 165.034053 5.389805
## 29 150.006851 135.003214 270.005918 565.011107 180.002219 535.005455 375.001829
## 30 30.028122 15.049448 150.011386 445.014168 300.002277 415.007579 255.000977
## 31 220.004146 235.008705 100.003937 195.016143 550.002621 165.004919 5.231329
## 32 450.003589 465.008005 330.003785 35.041902 780.004518 65.004753 225.009895
## 33 75.033758 60.011409 195.012662 490.019755 255.001236 460.010852 300.005126
## 34 170.006404 185.004458 50.029189 245.017444 500.001732 215.006948 55.013239
## 35 30.078963 45.006915 90.026259 385.022954 360.000557 355.012327 195.006676
## 36 145.004338 130.003804 265.002847 560.008750 185.002847 530.003897 370.001526
## 37 185.011717 170.001341 305.006217 600.012351 145.001230 570.005965 410.003316
## 38 40.081927 55.009189 80.032220 375.025236 370.000754 345.013866 185.011493
## 39 45.086855 30.018083 165.020529 460.024631 285.001501 430.014304 270.009924
## 40 305.000903 290.004211 425.002614 720.005732 25.043590 690.002477 530.000567
## 41 270.002160 255.009523 390.000973 685.002930 60.036172 655.000951 495.002680
## 42 30.071719 15.010666 150.014235 445.018677 300.000484 415.009860 255.004957
## 43 30.018701 45.031054 90.012735 385.008396 360.004219 355.003493 195.005218
## 44 385.005352 370.000509 505.003507 800.009868 55.000997 770.005052 610.002152
## 45 180.006150 165.003734 300.001971 595.008702 150.003122 565.004207 405.002957
## 46 15.059535 30.030818 105.019672 400.011455 345.003341 370.005115 210.001880
## 47 1.012472 15.100354 120.011506 415.010729 330.003736 385.005065 225.002604
## 48 80.005668 65.037004 200.003313 495.003739 250.008926 465.000882 305.003573
## 49 5.170155 10.285470 125.006487 420.005645 325.007194 390.001259 230.004479
## 50 195.029924 210.053265 75.077421 220.005769 525.021236 190.013011 30.248309
## 51 85.031097 100.055693 35.034490 330.006545 415.011889 300.004557 140.026900
## 52 200.001579 215.007059 80.006217 215.017452 530.002503 185.007299 25.016956
## 53 125.017590 140.036980 5.442518 290.001867 455.011176 260.001747 100.035355
## 54 310.003420 325.006957 190.006167 105.021026 640.003343 75.006114 85.013731
## 55 5.131423 10.131491 125.007800 420.007726 325.004311 390.002529 230.002654
## 56 55.015873 70.037374 65.006721 360.006415 385.005547 330.002114 170.007332
## 57 95.006110 80.037386 215.005738 510.002696 235.012039 480.000543 320.003294
## 8 9 10 11 12 13 14
## 2
## 3
## 4
## 5
## 6
## 7
## 8
## 9 250.003885
## 10 250.011332 2.365248
## 11 215.006398 465.001360 465.002937
## 12 15.028792 235.002944 235.014382 230.005875
## 13 25.024892 275.002638 275.006139 190.002602 40.010376
## 14 145.018734 105.006536 105.028863 360.001504 130.017332 170.009393
## 15 252.022126 502.007147 502.002253 37.050420 267.020518 227.014551 397.005903
## 16 15.200408 265.011639 265.001524 200.009332 30.103335 10.155255 160.023065
## 17 15.106303 235.002865 235.004456 230.001670 1.592294 40.011766 130.007613
## 18 165.013987 85.019453 85.008473 380.002121 150.014113 190.004358 20.090324
## 19 385.021849 635.010905 635.002828 170.028229 400.021553 360.016219 530.011521
## 20 450.000762 200.004539 200.011775 665.001067 435.001373 475.000648 305.006100
## 21 515.001647 265.003067 265.004906 730.000296 500.002693 540.001084 370.003120
## 22 525.002060 275.009142 275.016643 740.002961 510.002341 550.002244 380.009998
## 23 260.004220 10.230738 10.590883 475.005735 245.006916 285.007863 115.032277
## 24 280.005000 30.114560 30.213697 495.007153 265.006631 305.007253 135.039090
## 25 450.003221 200.014614 200.036084 665.005241 435.004284 475.005564 305.014943
## 26 465.004096 215.014047 215.033465 680.004816 450.004086 490.004776 320.013235
## 27 100.004740 150.008937 150.021383 315.004027 85.008778 125.004598 45.055890
## 28 10.129052 240.010613 240.027338 225.014962 5.190944 35.050468 135.033815
## 29 380.001317 130.006977 130.028920 595.001828 365.002117 405.002041 235.007823
## 30 260.001332 10.094568 10.352618 475.002694 245.002225 285.002785 115.019770
## 31 10.163154 240.003676 240.006993 225.000804 5.255768 35.011043 135.005640
## 32 220.011736 470.004789 470.000704 5.187726 235.010785 195.005115 365.006458
## 33 305.004467 55.031115 55.113585 520.004578 290.003140 330.004807 160.018019
## 34 60.006420 190.004509 190.010793 275.003026 45.011503 85.002321 85.024171
## 35 200.004194 50.033407 50.105796 415.005485 185.002846 225.004910 55.059429
## 36 375.001273 125.008971 125.019013 590.001178 360.002077 400.000901 230.008394
## 37 415.001799 165.009945 165.025469 630.002972 400.001425 440.001645 270.011484
## 38 190.007562 60.040859 60.096074 405.007075 175.005387 215.006358 45.090629
## 39 275.006692 25.120328 25.290510 490.007432 260.005641 300.007276 130.036091
## 40 535.001111 285.001639 285.008253 750.000578 520.001803 560.001113 390.002700
## 41 500.003567 250.006248 250.003198 715.000428 485.004317 525.001383 355.004007
## 42 260.002843 10.167433 10.478421 475.004475 245.002562 285.003620 115.028040
## 43 200.004843 50.021164 50.034838 415.001265 185.008839 225.003620 55.025949
## 44 615.001404 365.004896 365.012773 830.002395 600.001317 640.001529 470.006678
## 45 410.002578 160.011021 160.017251 625.001603 395.003403 435.001546 265.008746
## 46 215.000557 35.018635 35.067618 430.002533 200.002473 240.002068 70.031417
## 47 230.004101 20.023109 20.145836 445.001198 215.005845 255.003776 85.009484
## 48 310.004886 60.018480 60.012253 525.000370 295.006174 335.001897 165.006748
## 49 235.007405 15.056862 15.070776 450.000679 220.006032 260.001701 90.011086
## 50 35.240589 215.034593 215.007578 250.020226 20.427381 60.097643 110.062090
## 51 145.032557 105.038702 105.010875 360.005036 130.033715 170.013339 2.551490
## 52 30.029426 220.003806 220.007297 245.001026 15.054767 55.004920 115.010778
## 53 105.036652 145.021970 145.004390 320.004899 90.050765 130.018687 40.067698
## 54 80.015607 330.003801 330.002140 135.004655 95.013725 55.006086 225.008715
## 55 235.002543 15.064352 15.075865 450.001251 220.003312 260.000502 90.021345
## 56 175.010404 75.020293 75.011996 390.001193 160.010721 200.002610 30.054597
## 57 325.004754 75.014389 75.006673 540.000754 310.006038 350.002154 180.007690
## 15 16 17 18 19 20 21
## 2
## 3
## 4
## 5
## 6
## 7
## 8
## 9
## 10
## 11
## 12
## 13
## 14
## 15
## 16 237.007613
## 17 267.006209 30.034503
## 18 417.003621 180.002684 150.001812
## 19 133.010539 370.004976 400.009003 550.004577
## 20 702.006351 465.005645 435.002621 285.005512 835.008884
## 21 767.003319 530.003898 500.001542 350.003509 900.006031 65.007088
## 22 777.009438 540.008811 510.006089 360.010377 910.012568 75.009769 10.177387
## 23 512.017396 275.024819 245.016559 95.054125 645.021127 190.005680 255.007909
## 24 532.018825 295.022020 265.017304 115.045921 665.021421 170.007667 235.012231
## 25 702.014699 465.016609 435.011085 285.020985 835.018381 1.720029 65.042998
## 26 717.013690 480.015221 450.009973 300.017977 850.017461 15.096622 50.059725
## 27 352.016149 115.028491 85.020282 65.032259 485.017996 350.000355 415.002335
## 28 262.036942 25.241713 5.703183 155.029865 395.034807 440.003047 505.006177
## 29 632.009046 395.010471 365.005158 215.012328 765.012505 70.004105 135.006068
## 30 512.011174 275.014764 245.007942 95.031558 645.015236 190.002523 255.003458
## 31 262.008254 25.071727 5.080659 155.004206 395.012719 440.001705 505.001194
## 32 32.038144 205.001261 235.002273 385.000461 165.013286 670.002706 735.001717
## 33 557.014866 320.019484 290.010991 140.030667 690.018990 145.007559 210.011260
## 34 312.013335 75.024617 45.018367 105.010555 445.014501 390.000719 455.001574
## 35 452.017505 215.023389 185.014683 35.101319 585.020253 250.002947 315.006756
## 36 627.007035 390.006994 360.003616 210.008212 760.009990 75.000577 140.003152
## 37 667.010234 430.008819 400.005302 250.010237 800.012476 35.013646 100.017616
## 38 442.019315 205.024858 175.017984 25.147642 575.021182 260.004403 325.008779
## 39 527.019535 290.023453 260.016435 110.045477 660.021960 175.008881 240.014593
## 40 787.004693 550.005624 520.002005 370.004923 920.007886 85.003745 20.010442
## 41 752.002185 515.002662 485.001424 335.002118 885.004552 50.018000 15.023428
## 42 512.014871 275.017333 245.010614 95.034826 645.017604 190.002828 255.007378
## 43 452.007243 215.011213 185.005636 35.036124 585.009935 250.001596 315.000829
## 44 867.008333 630.007289 600.004230 450.007044 1000.011052 165.003092 100.017640
## 45 662.007107 425.007271 395.004427 245.008105 795.010115 40.007324 105.007617
## 46 467.010129 230.011005 200.005483 50.034314 600.012145 235.001465 300.002331
## 47 482.008087 245.015212 215.005984 65.033232 615.012671 220.002735 285.001423
## 48 562.002810 325.004012 295.001423 145.003724 695.005480 140.005682 205.001134
## 49 487.003085 250.004697 220.000611 70.006963 620.006356 215.005439 280.002553
## 50 287.005454 50.035111 20.194135 130.022064 420.000678 415.018183 480.011564
## 51 397.002959 160.008193 130.013613 20.064147 530.005136 305.010830 370.006497
## 52 282.010009 45.044722 15.045783 135.008870 415.014257 420.000983 485.000582
## 53 357.003017 120.009307 90.015915 60.014133 490.003545 345.008388 410.004120
## 54 172.012420 65.008691 95.003260 245.001223 305.012099 530.001716 595.001208
## 55 487.006246 250.004870 220.002230 70.009957 620.008205 215.001461 280.001598
## 56 427.003981 190.005866 160.003401 10.067120 560.007404 275.003986 340.002001
## 57 577.002364 340.002868 310.001088 160.003446 710.004656 125.009462 190.002144
## 22 23 24 25 26 27 28
## 2
## 3
## 4
## 5
## 6
## 7
## 8
## 9
## 10
## 11
## 12
## 13
## 14
## 15
## 16
## 17
## 18
## 19
## 20
## 21
## 22
## 23 265.003170
## 24 245.001273 20.030569
## 25 75.010267 190.000619 170.002140
## 26 60.009150 205.002970 185.001836 15.022420
## 27 425.001364 160.005830 180.004743 350.003294 365.002720
## 28 515.001405 250.004695 270.001849 440.001849 455.001115 90.009544
## 29 145.005517 120.004266 100.010573 70.011475 85.010501 280.001016 370.002914
## 30 265.002447 1.211322 20.056590 190.005045 205.006054 160.003910 250.004465
## 31 515.003860 250.012092 270.012324 440.008172 455.006455 90.012455 2.413379
## 32 745.004971 480.011571 500.010973 670.009586 685.008556 320.007661 230.022541
## 33 220.003498 45.016046 25.033184 145.004014 160.001969 205.004249 295.001433
## 34 465.002812 200.009582 220.008594 390.005873 405.005311 40.008927 50.030240
## 35 325.001639 60.014078 80.007094 250.002837 265.001795 100.004777 190.000729
## 36 150.003628 115.009024 95.012496 75.018874 90.014859 275.000668 365.003862
## 37 110.005350 155.007944 135.005470 35.034228 50.015855 315.000655 405.000874
## 38 335.002088 70.022489 90.007792 260.004882 275.002273 90.007680 180.000829
## 39 250.003592 15.077954 5.106897 175.004571 190.001669 175.005599 265.000661
## 40 10.138974 275.004635 255.008626 85.021962 70.029550 435.001500 525.004170
## 41 25.090464 240.014049 220.017035 50.083404 35.109063 400.003536 490.008112
## 42 265.001676 1.212477 20.024056 190.003542 205.002278 160.001712 250.000891
## 43 325.005997 60.031695 80.033908 250.010680 265.010397 100.006805 190.015506
## 44 90.003522 355.002370 335.001403 165.004691 150.002836 515.000445 605.000506
## 45 115.004592 150.009102 130.009180 40.041245 55.022432 310.000894 400.003856
## 46 310.004890 45.034811 65.030481 235.008590 250.009368 115.004368 205.007933
## 47 295.006095 30.045317 50.050836 220.008960 235.009094 130.006974 220.011151
## 48 215.011659 50.065135 30.130068 140.029512 155.025519 210.006437 300.013104
## 49 290.009291 25.158965 45.096696 215.021749 230.018383 135.013269 225.016492
## 50 490.022522 225.061177 245.057308 415.037131 430.034804 65.137736 25.546366
## 51 380.012349 115.064488 135.052665 305.027224 320.022694 45.091785 135.053164
## 52 495.002546 230.008839 250.009827 420.006190 435.005742 70.013046 20.123069
## 53 420.013458 155.043597 175.042348 345.023235 360.021305 5.668721 95.078830
## 54 605.004038 340.011116 360.010347 530.008346 545.007519 180.007279 90.036715
## 55 290.005170 25.097508 45.055668 215.013957 230.012635 135.004883 225.010256
## 56 350.006232 85.045240 105.036361 275.016251 290.013517 75.021389 165.022115
## 57 200.015563 65.060738 45.102600 125.038908 140.034626 225.008125 315.013980
## 29 30 31 32 33 34 35
## 2
## 3
## 4
## 5
## 6
## 7
## 8
## 9
## 10
## 11
## 12
## 13
## 14
## 15
## 16
## 17
## 18
## 19
## 20
## 21
## 22
## 23
## 24
## 25
## 26
## 27
## 28
## 29
## 30 120.002878
## 31 370.003249 250.006020
## 32 600.005194 480.006588 230.003716
## 33 75.007709 45.017689 295.007301 525.009369
## 34 320.001880 200.004012 50.015019 280.005166 245.006378
## 35 180.003125 60.011296 190.010470 420.009713 105.001861 140.007079
## 36 5.062973 115.003936 365.001994 595.003091 70.015700 315.001291 175.004577
## 37 35.016206 155.005566 405.003645 635.004776 110.005318 355.001508 215.000787
## 38 190.005758 70.018982 180.012412 410.010301 115.004592 130.009376 10.018688
## 39 105.011202 15.108369 265.011676 495.011510 30.013155 215.008791 75.003506
## 40 155.002383 275.001569 525.001425 755.002762 230.006307 475.001408 335.004131
## 41 120.013842 240.007521 490.000761 720.000770 195.016991 440.002420 300.009779
## 42 120.003950 1.151521 250.007437 480.007905 45.006965 200.004407 60.000879
## 43 180.004081 60.021082 190.004039 420.003701 105.022309 140.004824 2.017622
## 44 235.002092 355.001849 605.002718 835.004252 310.001286 555.001569 415.000364
## 45 30.019655 150.005712 400.001986 630.003228 105.012504 350.002037 210.004951
## 46 165.003591 45.012215 205.006313 435.004810 90.018259 155.001644 15.072644
## 47 150.002602 30.021689 220.003916 450.005740 75.019613 170.005449 30.053577
## 48 70.021602 50.034094 300.001231 530.001048 5.608146 250.003606 110.025778
## 49 145.013188 25.069523 225.001276 455.001151 70.043671 175.005668 35.076789
## 50 345.028845 225.043151 25.211696 255.008468 270.049407 25.268302 165.072471
## 51 235.020605 115.040919 135.011544 365.001809 160.040652 85.038441 55.106941
## 52 350.002491 230.003274 20.020897 250.004678 275.006479 30.022958 170.009165
## 53 275.014640 155.029627 95.018104 325.002431 200.034418 45.055373 95.064676
## 54 460.003833 340.005133 90.006195 140.002331 385.008446 140.003369 280.008914
## 55 145.007363 25.040455 225.002657 455.001670 70.031762 175.001637 35.044579
## 56 205.009465 85.020296 165.002137 395.001121 130.024770 115.008904 25.109912
## 57 55.035420 65.028491 315.002143 545.000959 20.192301 265.003692 125.026634
## 36 37 38 39 40 41 42
## 2
## 3
## 4
## 5
## 6
## 7
## 8
## 9
## 10
## 11
## 12
## 13
## 14
## 15
## 16
## 17
## 18
## 19
## 20
## 21
## 22
## 23
## 24
## 25
## 26
## 27
## 28
## 29
## 30
## 31
## 32
## 33
## 34
## 35
## 36
## 37 40.013980
## 38 185.006424 225.000757
## 39 100.016008 140.003257 85.002296
## 40 160.002237 120.010600 345.006187 260.009800
## 41 125.006450 85.026078 310.010677 225.019110 35.023923
## 42 115.005269 155.000765 70.003420 15.021225 275.004367 240.010818
## 43 175.002763 215.007009 10.257968 75.041147 335.001049 300.001870 60.027666
## 44 240.002168 200.000686 425.000592 340.001224 80.014173 115.019801 355.000161
## 45 35.004968 5.130965 220.005370 135.012025 125.005636 90.009203 150.005127
## 46 160.003452 200.004252 25.066015 60.036605 320.001443 285.005107 45.020950
## 47 145.004477 185.008350 40.060115 45.060115 305.000308 270.004155 30.048627
## 48 65.012456 105.020626 120.027895 35.122312 225.002842 190.000456 50.050211
## 49 140.008536 180.012095 45.068601 40.106703 300.003387 265.001899 25.103948
## 50 340.022336 380.026796 155.079300 240.061478 500.015020 465.008739 225.051025
## 51 230.013265 270.017769 45.130942 130.058346 390.009021 355.003049 115.048573
## 52 345.000971 385.003450 160.013190 245.011479 505.000890 470.001086 230.006341
## 53 270.010987 310.015408 85.076162 170.046432 430.006231 395.002390 155.036251
## 54 455.002178 495.003463 270.009976 355.010970 615.001856 580.000876 340.006791
## 55 140.002826 180.005820 45.042513 40.069555 300.002156 265.002108 25.054030
## 56 200.004954 240.008757 15.197131 100.041980 360.003087 325.000686 85.030113
## 57 50.025003 90.029255 135.028937 50.101418 210.004127 175.001959 65.047141
## 43 44 45 46 47 48 49
## 2
## 3
## 4
## 5
## 6
## 7
## 8
## 9
## 10
## 11
## 12
## 13
## 14
## 15
## 16
## 17
## 18
## 19
## 20
## 21
## 22
## 23
## 24
## 25
## 26
## 27
## 28
## 29
## 30
## 31
## 32
## 33
## 34
## 35
## 36
## 37
## 38
## 39
## 40
## 41
## 42
## 43
## 44 415.003747
## 45 210.003262 205.002507
## 46 15.046215 400.002549 195.005486
## 47 30.016047 385.003608 180.005387 15.049900
## 48 110.003400 305.007541 100.010107 95.011786 80.011079
## 49 35.035061 380.006401 175.008673 20.066636 5.252999 75.005420
## 50 165.038163 580.019193 375.021493 180.041845 195.042563 275.014569 200.019829
## 51 55.055708 470.010828 265.011023 70.061280 85.047960 165.008802 90.014082
## 52 170.004441 585.002360 380.001871 185.004783 200.003799 280.001733 205.002315
## 53 95.016804 510.010284 305.009845 110.027661 125.023172 205.004214 130.012592
## 54 280.003365 695.003094 490.002563 295.003003 310.004947 390.001086 315.001150
## 55 35.017020 380.003416 175.004270 20.021846 5.215784 75.005770 1.038075
## 56 25.046371 440.005057 235.004389 40.037654 55.026997 135.002670 60.003743
## 57 125.006037 290.009894 85.018878 110.009810 95.012104 15.011872 90.003937
## 50 51 52 53 54 55 56
## 2
## 3
## 4
## 5
## 6
## 7
## 8
## 9
## 10
## 11
## 12
## 13
## 14
## 15
## 16
## 17
## 18
## 19
## 20
## 21
## 22
## 23
## 24
## 25
## 26
## 27
## 28
## 29
## 30
## 31
## 32
## 33
## 34
## 35
## 36
## 37
## 38
## 39
## 40
## 41
## 42
## 43
## 44
## 45
## 46
## 47
## 48
## 49
## 50
## 51 110.020478
## 52 6.043252 115.017086
## 53 70.032555 40.042454 75.033002
## 54 115.032842 225.007001 110.006427 185.006490
## 55 200.025229 90.023542 205.001662 130.014863 315.000787
## 56 140.028255 30.028418 145.003477 70.020467 255.001321 60.008814
## 57 290.011899 180.009198 295.002516 220.003754 405.000736 90.005899 150.002716
## [ remaining column blocks (up to column 177) omitted: they contain no entries for the printed rows 2 to 57 ]
## [ reached getOption("max.print") -- omitted 121 rows ]
hc <- hclust(dist_matrix)
k <- 3
clusters <- cutree(hc, k)
numeric_vars$Cluster <- as.factor(clusters)
# Scatter plot of the clusters
ggplot(numeric_vars, aes(x = Alcohol, y = Proline, color = Cluster)) +
  geom_point(size = 3) +
  scale_color_brewer(palette = "Set1") +
  labs(title = "Wine Clusters", x = "Alcohol", y = "Proline") +
  theme_minimal()
hc <- hclust(dist_matrix)
hc
##
## Call:
## hclust(d = dist_matrix)
##
## Cluster method : complete
## Distance : euclidean
## Number of objects: 178
From the observations above, we can infer that the wine characteristics form three clusters. Segmentation is a useful tool for grouping data, and in this case it helped identify the cultivar corresponding to each wine. However, it is important to scale the data before running the analysis so that the magnitude of any single variable does not dominate the process. Given the temporal nature of the database, it is also worth considering handling the information in time slices, whether quarterly or semiannual, in order to summarize the data and improve the results.
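The point about scaling can be illustrated with a minimal sketch. It assumes the built-in `iris` data as a stand-in for `wine.csv` (which may not be available), and the names `x_scaled`, `hc_raw`, `hc_scaled`, `cl_raw`, and `cl_scaled` are hypothetical. It clusters the same observations with and without standardization so the two assignments can be compared against the known labels.

```r
# Stand-in data (assumption): the four numeric iris measurements,
# used here instead of the wine file.
x <- iris[, 1:4]

# Standardize each column to mean 0 and standard deviation 1.
x_scaled <- scale(x)

# Complete-linkage hierarchical clustering on Euclidean distances,
# once on the raw values and once on the standardized values.
hc_raw    <- hclust(dist(x), method = "complete")
hc_scaled <- hclust(dist(x_scaled), method = "complete")

# Cut both trees into three groups.
cl_raw    <- cutree(hc_raw, k = 3)
cl_scaled <- cutree(hc_scaled, k = 3)

# Cross-tabulate each assignment against the known labels.
print(table(raw = cl_raw, species = iris$Species))
print(table(scaled = cl_scaled, species = iris$Species))
```

With the wine data, the same comparison would use the numeric columns in place of `x` and `wine$Cultivar` in place of `iris$Species`; variables measured on large scales, such as Proline, would otherwise dominate the Euclidean distances.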
plot(hc, main = "Cluster Dendrogram", xlab = "", sub = "", cex = 0.9)
numeric_vars <- wine[, sapply(wine, is.numeric)]
pca_result <- prcomp(numeric_vars, scale. = TRUE)
summary(pca_result)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
## Standard deviation 2.3529 1.5802 1.2025 0.96328 0.93675 0.82023 0.74418 0.5916
## Proportion of Variance 0.3954 0.1784 0.1033 0.06628 0.06268 0.04806 0.03956 0.0250
## Cumulative Proportion 0.3954 0.5738 0.6771 0.74336 0.80604 0.85409 0.89365 0.9186
## PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 0.54272 0.51216 0.47524 0.41085 0.35995 0.24044
## Proportion of Variance 0.02104 0.01874 0.01613 0.01206 0.00925 0.00413
## Cumulative Proportion 0.93969 0.95843 0.97456 0.98662 0.99587 1.00000
loadings_matrix <- pca_result$rotation
scores <- pca_result$x
plot(pca_result, main = "Variance Explained by the Principal Components")
pca1 <- prcomp(wine, scale. = TRUE, center = TRUE, retx = TRUE)
pca1.var <- pca1$sdev ^ 2
pca1.var
## [1] 5.53594804 2.49707625 1.44607422 0.92791783 0.87750252 0.67277834 0.55379896
## [8] 0.35003417 0.29454194 0.26230610 0.22584842 0.16879672 0.12956418 0.05781232
var_explicada <- pca1$sdev^2/sum(pca1$sdev^2)
var_explicada
## [1] 0.395424860 0.178362589 0.103291016 0.066279845 0.062678751 0.048055596 0.039557068
## [8] 0.025002441 0.021038710 0.018736150 0.016132030 0.012056908 0.009254584 0.004129451
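As a small worked example, the eigenvalues printed for `pca1.var` can be turned into a cumulative proportion to decide how many components to keep. This is only a sketch: the 80% threshold and the names `eig`, `cum_var`, `threshold`, and `n_comp` are assumptions for illustration, not part of the original analysis.

```r
# Eigenvalues (pca1$sdev^2) copied from the pca1.var output above.
eig <- c(5.53594804, 2.49707625, 1.44607422, 0.92791783, 0.87750252,
         0.67277834, 0.55379896, 0.35003417, 0.29454194, 0.26230610,
         0.22584842, 0.16879672, 0.12956418, 0.05781232)

# Proportion of variance per component and its running total.
prop_var <- eig / sum(eig)
cum_var  <- cumsum(prop_var)

# Hypothetical threshold: keep enough components to explain 80%.
threshold <- 0.80
n_comp <- which(cum_var >= threshold)[1]

print(round(cum_var, 4))
print(n_comp)
```

With these values the threshold is first reached at the fifth component, which agrees with the cumulative proportion of 0.80604 reported for PC5 by `summary(pca_result)`.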
# Visualizing the variance explained by each component helps to better
# understand the data. It lets us see the impact of each principal
# component and whether they are enough to explain the variation in the data.
# Plot the importance (variance) of each component
df3 <- data.frame(varianza_explicada = var_explicada, PC = seq_len(14))
ggplot(data = df3, aes(x = PC, y = varianza_explicada)) +
  geom_col(fill = "blue")
propve <- pca1.var / sum(pca1.var)
propve
## [1] 0.395424860 0.178362589 0.103291016 0.066279845 0.062678751 0.048055596 0.039557068
## [8] 0.025002441 0.021038710 0.018736150 0.016132030 0.012056908 0.009254584 0.004129451
# Plot the variance explained by each of the components
plot(propve, xlab = "Principal Components",
     ylab = "Proportion of variance explained",
     ylim = c(0, 1), type = "b",
     main = "Scree Plot")
According to this plot, the optimal number of clusters for this case
is 3, since that number marks an inflection point (the elbow) in the
curve; that is, additional clusters do not reduce the within-cluster
variance enough to justify further groupings in the data set. Finally,
a comparison of means across segments is made with the aggregate
function, which is useful for summarizing data by one or more factors.
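The elbow reasoning and the comparison of means can be sketched as follows. The example assumes the built-in `iris` data as a stand-in for the wine data, and the helper `within_ss` and the names `k_values`, `wss`, `segments`, and `cluster_means` are hypothetical. It computes the total within-cluster sum of squares for several cuts of one hierarchical tree, then summarizes each of three segments with `aggregate`.

```r
# Stand-in data (assumption): standardized iris measurements.
x  <- scale(iris[, 1:4])
hc <- hclust(dist(x), method = "complete")

# Total within-cluster sum of squares for a given labelling:
# squared deviations of each row from its own cluster centroid.
within_ss <- function(data, labels) {
  groups <- split(as.data.frame(data), labels)
  sum(vapply(groups, function(g) {
    centered <- scale(g, center = TRUE, scale = FALSE)
    sum(centered^2)
  }, numeric(1)))
}

# Evaluate cuts of the same tree for k = 1, ..., 8.
k_values <- 1:8
wss <- vapply(k_values,
              function(k) within_ss(x, cutree(hc, k = k)),
              numeric(1))
print(data.frame(k = k_values, wss = round(wss, 2)))

# Compare the mean of every variable across three segments.
segments <- data.frame(iris[, 1:4], Cluster = factor(cutree(hc, k = 3)))
cluster_means <- aggregate(. ~ Cluster, data = segments, FUN = mean)
print(cluster_means)
```

Calling `plot(k_values, wss, type = "b")` draws the curve; the elbow is the value of k after which the decrease in `wss` flattens out.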
biplot(pca_result, main = "PCA Biplot")
For the scores (observations), we look at possible groupings. Scores that lie close together represent observations with similar characteristics. Observations whose variable values are close to the mean lie nearer the center of the biplot (0, 0). The rest represent normal or extreme variability (outliers). In addition, the relationship between observations and variables can be studied by projecting the observations onto the direction of the vectors. We can therefore conclude that the variables contributing to the principal component PC1 are Alcohol, Color intensity, and Proline, while those contributing to PC2 are Hue, Total phenols, Flavanoids, and Proanthocyanins, among others.
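Reading contributions off a biplot by eye can be checked against the loadings themselves. The sketch below assumes the built-in `iris` data as a stand-in for the wine data, and the helper `top_loadings` is hypothetical; it ranks the variables by the absolute value of their loading on a given component.

```r
# Stand-in data (assumption): PCA on the standardized iris measurements.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
loadings <- pca$rotation

# Names of the n variables with the largest absolute loading
# on the requested component.
top_loadings <- function(loadings, component, n = 2) {
  ranked <- sort(abs(loadings[, component]), decreasing = TRUE)
  names(ranked)[seq_len(n)]
}

top_pc1 <- top_loadings(loadings, "PC1")
top_pc2 <- top_loadings(loadings, "PC2")
print(top_pc1)
print(top_pc2)

# The scores are centered, so an observation with average values
# on every variable sits at the origin of the biplot.
print(round(colMeans(pca$x), 10))
```

Applied to `pca_result$rotation`, the same helper would list the wine variables that weigh most on PC1 and PC2, which gives a numeric check on the interpretation of the vectors.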