

Introduction

This report analyzes the wine dataset, which contains physico-chemical variables and a quality variable. The goal is to explore its main characteristics and evaluate how they relate to wine quality.

Methodology

1. Data loading: the dataset file is read.
2. Exploratory analysis: summary statistics and visualizations are generated.
3. Relationships between variables: correlations and comparative plots are examined.
4. Conclusions: the main findings are summarized.
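Steps 2 and 3 of the methodology can be sketched as follows. This is a minimal illustration rather than the report's own code: the data frame `vinos_demo` is a hypothetical stand-in built from three columns of the first six rows printed by `head(vinos)`, so the resulting numbers are illustrative only.

```r
# Hypothetical stand-in: three columns from the first six rows shown by head(vinos)
vinos_demo <- data.frame(
  Alcohol    = c(14.23, 13.20, 13.16, 14.37, 13.24, 14.20),
  Malic.acid = c(1.71, 1.78, 2.36, 1.95, 2.59, 1.76),
  Ash        = c(2.43, 2.14, 2.67, 2.50, 2.87, 2.45)
)

# Step 2: summary statistics (min, quartiles, mean, max) for every column
stats_summary <- summary(vinos_demo)
print(stats_summary)

# Step 3: pairwise Pearson correlations between the numeric variables
correlations <- cor(vinos_demo)
print(round(correlations, 2))
```

On the full dataset the same two calls would be applied to `vinos` (or to its numeric columns only); with six rows the correlations are far too noisy to interpret.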

library(tidyverse)
library(factoextra)
library(shiny)
vinos <- read.csv("C:/Users/ASUS/OneDrive/Desktop/UNIVERSIDAD CATOLICA/VISUALIZACION/CLASE 6/wine.csv",
                  header = TRUE, sep = ",")
head(vinos)
##   Cultivar Alcohol Malic.acid  Ash Alcalinity.of.ash Magnesium Total.phenols
## 1        1   14.23       1.71 2.43              15.6       127          2.80
## 2        1   13.20       1.78 2.14              11.2       100          2.65
## 3        1   13.16       2.36 2.67              18.6       101          2.80
## 4        1   14.37       1.95 2.50              16.8       113          3.85
## 5        1   13.24       2.59 2.87              21.0       118          2.80
## 6        1   14.20       1.76 2.45              15.2       112          3.27
##   Flavanoids Nonflavanoid.phenols Proanthocyanins Color.intensity  Hue
## 1       3.06                 0.28            2.29            5.64 1.04
## 2       2.76                 0.26            1.28            4.38 1.05
## 3       3.24                 0.30            2.81            5.68 1.03
## 4       3.49                 0.24            2.18            7.80 0.86
## 5       2.69                 0.39            1.82            4.32 1.04
## 6       3.39                 0.34            1.97            6.75 1.05
##   OD280.OD315.of.diluted.wines Proline
## 1                         3.92    1065
## 2                         3.40    1050
## 3                         3.17    1185
## 4                         3.45    1480
## 5                         2.93     735
## 6                         2.85    1450
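# Select the numeric variables (Cultivar is stored as an integer, so it is kept too)
vinos_num <- vinos %>% select(where(is.numeric))
# Standardize each variable to mean 0 and standard deviation 1
vinos_scaled <- scale(vinos_num)
# PCA on the standardized data
pca <- prcomp(vinos_scaled)
summary(pca)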
## Importance of components:
##                           PC1    PC2    PC3     PC4     PC5     PC6     PC7
## Standard deviation     2.3529 1.5802 1.2025 0.96328 0.93675 0.82023 0.74418
## Proportion of Variance 0.3954 0.1784 0.1033 0.06628 0.06268 0.04806 0.03956
## Cumulative Proportion  0.3954 0.5738 0.6771 0.74336 0.80604 0.85409 0.89365
##                           PC8     PC9    PC10    PC11    PC12    PC13    PC14
## Standard deviation     0.5916 0.54272 0.51216 0.47524 0.41085 0.35995 0.24044
## Proportion of Variance 0.0250 0.02104 0.01874 0.01613 0.01206 0.00925 0.00413
## Cumulative Proportion  0.9186 0.93969 0.95843 0.97456 0.98662 0.99587 1.00000
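The "Proportion of Variance" row above follows directly from the standard deviations: each component's variance is its squared standard deviation, divided by the total variance. A minimal sketch, re-entering the printed (rounded) values by hand; the variable names are illustrative and not part of the original analysis:

```r
# Standard deviations copied from the summary(pca) output (rounded as printed)
sdev <- c(2.3529, 1.5802, 1.2025, 0.96328, 0.93675, 0.82023, 0.74418,
          0.5916, 0.54272, 0.51216, 0.47524, 0.41085, 0.35995, 0.24044)

variances <- sdev^2                       # variance captured by each component
prop_var  <- variances / sum(variances)   # "Proportion of Variance"
cum_var   <- cumsum(prop_var)             # "Cumulative Proportion"

print(round(prop_var[1:3], 4))

# Smallest number of components that retains at least 80% of the variance
n_comp_80 <- which(cum_var >= 0.80)[1]
print(n_comp_80)
```

Because the 14 inputs were standardized, the variances sum to roughly 14, and five components are needed to pass 80%, consistent with the cumulative proportion of 0.806 reported at PC5.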
# Scree plot
fviz_eig(pca, addlabels = TRUE, ylim = c(0,50))

fviz_pca_biplot(pca, repel = TRUE, col.var = "red", col.ind = "blue")

# Classical MDS on Euclidean distances between the standardized observations
distancias <- dist(vinos_scaled)
mds <- cmdscale(distancias, k = 2)
mds_df <- as.data.frame(mds)
colnames(mds_df) <- c("Dim1", "Dim2")
ggplot(mds_df, aes(Dim1, Dim2)) +
  geom_point(color = "purple") +
  labs(title = "MDS map", x = "Dimension 1", y = "Dimension 2")
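Because `cmdscale()` is applied to Euclidean distances between standardized rows, the resulting map carries the same information as the first two principal components: classical MDS on Euclidean distances reproduces the PCA scores up to a sign flip per axis. The sketch below checks this on the built-in `USArrests` data, used here only as an assumed stand-in because `wine.csv` is not bundled with R; the `_demo` names are illustrative.

```r
# Assumed stand-in dataset (built into R); any numeric table would do
x <- scale(USArrests)

mds_demo <- cmdscale(dist(x), k = 2)    # classical MDS on Euclidean distances
pca_demo <- prcomp(x)$x[, 1:2]          # scores on the first two components

# Compare absolute values, since each axis is only defined up to its sign
max_diff <- max(abs(abs(unname(mds_demo)) - abs(unname(pca_demo))))
print(max_diff)
```

A value near zero confirms the equivalence, so an MDS map adds new information mainly when a non-Euclidean distance is passed to `cmdscale()`.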


# Hierarchical clustering (Ward's method)
hc <- hclust(distancias, method = "ward.D2")
plot(hc, main = "Hierarchical dendrogram")

# K-means
set.seed(123)
km <- kmeans(vinos_scaled, centers = 3)
fviz_cluster(km, data = vinos_scaled)
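The dendrogram and the k-means solution can be compared by cutting the tree into the same number of groups and cross-tabulating the two assignments. A minimal sketch on the built-in `USArrests` data, used as an assumed stand-in for `vinos_scaled`; `nstart = 25` and the `_demo` names are illustrative choices, not settings from this report:

```r
x <- scale(USArrests)                     # assumed stand-in for vinos_scaled

hc_demo   <- hclust(dist(x), method = "ward.D2")
groups_hc <- cutree(hc_demo, k = 3)       # cut the tree into 3 groups

set.seed(123)
km_demo <- kmeans(x, centers = 3, nstart = 25)  # keep the best of 25 random starts

# Rows: hierarchical groups, columns: k-means clusters
agreement <- table(hclust = groups_hc, kmeans = km_demo$cluster)
print(agreement)
```

If the two methods agree, each row of the table has most of its counts in a single column; cluster labels are arbitrary, so the large counts need not lie on the diagonal.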

ui <- fluidPage(
  titlePanel("Interactive Wine Exploration"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("clusters", "Number of clusters:", min = 2, max = 8, value = 3)
    ),
    mainPanel(
      plotOutput("pcaPlot"),
      plotOutput("mdsPlot")
    )
  )
)

server <- function(input, output) {
  # Re-run k-means whenever the slider changes; both plots share the result
  km <- reactive({
    set.seed(123)
    kmeans(vinos_scaled, centers = input$clusters)
  })

  output$pcaPlot <- renderPlot({
    fviz_cluster(km(), data = vinos_scaled)
  })

  output$mdsPlot <- renderPlot({
    # Color the MDS points by cluster so the plot matches its title
    df <- mds_df
    df$cluster <- factor(km()$cluster)
    ggplot(df, aes(Dim1, Dim2, color = cluster)) +
      geom_point() +
      labs(title = paste("MDS with", input$clusters, "clusters"))
  })
}
shinyApp(ui, server)

Conclusions

Exploring the wine dataset leads to the following observations:

• Most wines fall in the middle quality ratings (5–6).
• Alcohol is positively related to quality: wines with more alcohol tend to be rated higher.
• Volatile acidity and density are negatively related to quality.
• The dataset is skewed toward wines of intermediate quality, which matters for predictive analyses.