Introduction
This document analyzes the wine dataset, which contains physicochemical variables and a quality variable. The goal is to explore the data and visualize relationships.
Load data
Packages attached in this session: shiny, ggplot2, dplyr, FactoMineR, factoextra, tidyverse (2.0.0), DT, plotly, cluster, and scales.
Introduction
This report analyzes the wine dataset, which contains physicochemical variables and a quality variable. The goal is to explore its main characteristics and assess how they relate to wine quality.
Methodology
1. Data loading: the CSV data file is read with read.csv().
2. Exploratory analysis: summary statistics and visualizations are generated.
3. Relationships between variables: correlations and comparative plots are examined.
4. Conclusions: the main findings are summarized.
vinos <- read.csv("C:/Users/ASUS/OneDrive/Desktop/UNIVERSIDAD CATOLICA/VISUALIZACION/CLASE 6/wine.csv",
header = TRUE, sep = ",")
head(vinos)
## Cultivar Alcohol Malic.acid Ash Alcalinity.of.ash Magnesium Total.phenols
## 1 1 14.23 1.71 2.43 15.6 127 2.80
## 2 1 13.20 1.78 2.14 11.2 100 2.65
## 3 1 13.16 2.36 2.67 18.6 101 2.80
## 4 1 14.37 1.95 2.50 16.8 113 3.85
## 5 1 13.24 2.59 2.87 21.0 118 2.80
## 6 1 14.20 1.76 2.45 15.2 112 3.27
## Flavanoids Nonflavanoid.phenols Proanthocyanins Color.intensity Hue
## 1 3.06 0.28 2.29 5.64 1.04
## 2 2.76 0.26 1.28 4.38 1.05
## 3 3.24 0.30 2.81 5.68 1.03
## 4 3.49 0.24 2.18 7.80 0.86
## 5 2.69 0.39 1.82 4.32 1.04
## 6 3.39 0.34 1.97 6.75 1.05
## OD280.OD315.of.diluted.wines Proline
## 1 3.92 1065
## 2 3.40 1050
## 3 3.17 1185
## 4 3.45 1480
## 5 2.93 735
## 6 2.85 1450
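# Select the numeric variables (Cultivar is read as an integer, so it is kept,
# which is why the PCA below reports 14 components)
vinos_num <- vinos %>% select(where(is.numeric))
# Standardize each variable to mean 0 and standard deviation 1
vinos_scaled <- scale(vinos_num)
# Principal component analysis on the standardized data
pca <- prcomp(vinos_scaled)
summary(pca)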
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.3529 1.5802 1.2025 0.96328 0.93675 0.82023 0.74418
## Proportion of Variance 0.3954 0.1784 0.1033 0.06628 0.06268 0.04806 0.03956
## Cumulative Proportion 0.3954 0.5738 0.6771 0.74336 0.80604 0.85409 0.89365
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 0.5916 0.54272 0.51216 0.47524 0.41085 0.35995 0.24044
## Proportion of Variance 0.0250 0.02104 0.01874 0.01613 0.01206 0.00925 0.00413
## Cumulative Proportion 0.9186 0.93969 0.95843 0.97456 0.98662 0.99587 1.00000
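The "Proportion of Variance" and "Cumulative Proportion" rows can be recovered from the standard deviations alone, which helps when deciding how many components to keep. The following is a minimal sketch that copies the printed standard deviations by hand instead of reading them from `pca$sdev`. The 80% threshold and the names `prop_var`, `cum_prop`, and `n_80` are illustrative assumptions, not part of the original analysis.

```r
# Standard deviations copied from the summary(pca) output above
sdev <- c(2.3529, 1.5802, 1.2025, 0.96328, 0.93675, 0.82023, 0.74418,
          0.5916, 0.54272, 0.51216, 0.47524, 0.41085, 0.35995, 0.24044)

variances <- sdev^2                      # variance carried by each component
prop_var  <- variances / sum(variances)  # "Proportion of Variance" row
cum_prop  <- cumsum(prop_var)            # "Cumulative Proportion" row

round(prop_var[1:3], 4)
round(cum_prop[1:3], 4)

# Smallest number of components whose cumulative share reaches 80%
# (the 80% cut-off is an arbitrary, illustrative choice)
n_80 <- which(cum_prop >= 0.80)[1]
n_80
```

Because the input was standardized, the variances sum to roughly the number of variables (14 here), so each proportion is approximately the squared standard deviation divided by 14.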
# Scree plot
fviz_eig(pca, addlabels = TRUE, ylim = c(0,50))
fviz_pca_biplot(pca, repel = TRUE, col.var = "red", col.ind = "blue")
# Euclidean distances between the standardized observations
distancias <- dist(vinos_scaled)
# Classical (metric) MDS in two dimensions
mds <- cmdscale(distancias, k = 2)
mds_df <- as.data.frame(mds)
colnames(mds_df) <- c("Dim1", "Dim2")
ggplot(mds_df, aes(Dim1, Dim2)) +
  geom_point(color = "purple") +
  labs(title = "MDS map", x = "Dimension 1", y = "Dimension 2")
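Classical MDS on Euclidean distances gives the same coordinates as the PCA scores, up to the sign of each axis, so the MDS map and the PCA individuals plot carry the same information. The sketch below checks this on R's built-in `iris` measurements, used here only as a stand-in because the wine file is not bundled with this report. The object names `x`, `mds_demo`, `pca_demo`, and `max_diff` are hypothetical.

```r
# Stand-in data: four numeric iris columns, standardized as in the report
x <- scale(iris[, 1:4])

# Classical MDS on Euclidean distances, two dimensions
mds_demo <- cmdscale(dist(x), k = 2)

# PCA scores on the same standardized data
pca_demo <- prcomp(x)

# Axis signs are arbitrary, so compare absolute values
max_diff <- max(abs(abs(mds_demo) - abs(pca_demo$x[, 1:2])))
max_diff  # should be numerically close to zero
```

The two methods diverge only when MDS is given a non-Euclidean distance, which is the main reason to prefer it over PCA.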
# Hierarchical clustering (Ward's method on Euclidean distances)
hc <- hclust(distancias, method = "ward.D2")
plot(hc, main = "Hierarchical dendrogram")
# K-means with three clusters
set.seed(123)
km <- kmeans(vinos_scaled, centers = 3)
fviz_cluster(km, data = vinos_scaled)
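The dendrogram and the k-means plot are easier to compare once the tree is cut into the same number of groups. The following sketch shows one way to do that with `cutree()` and a cross-tabulation. It runs on the built-in `iris` data as a stand-in for the wine file, and `nstart = 25` is an added assumption that is not used in the original code.

```r
# Stand-in data: four numeric iris columns, standardized
x <- scale(iris[, 1:4])

# Hierarchical clustering with Ward's method, cut into three groups
hc_demo   <- hclust(dist(x), method = "ward.D2")
groups_hc <- cutree(hc_demo, k = 3)

# K-means with three centers; several random starts reduce the risk
# of a poor local optimum
set.seed(123)
km_demo <- kmeans(x, centers = 3, nstart = 25)

# Rows: hierarchical groups; columns: k-means clusters
agreement <- table(hierarchical = groups_hc, kmeans = km_demo$cluster)
agreement
```

Cluster labels are arbitrary in both methods, so agreement shows up as one dominant cell per row, not necessarily on the diagonal.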
ui <- fluidPage(
  titlePanel("Interactive wine exploration"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("clusters", "Number of clusters:", min = 2, max = 8, value = 3)
    ),
    mainPanel(
      plotOutput("pcaPlot"),
      plotOutput("mdsPlot")
    )
  )
)

server <- function(input, output) {
  # Fit k-means once per slider change and share the result between both plots
  km <- reactive({
    set.seed(123)  # fixed seed so the same k always gives the same partition
    kmeans(vinos_scaled, centers = input$clusters)
  })

  output$pcaPlot <- renderPlot({
    fviz_cluster(km(), data = vinos_scaled)
  })

  output$mdsPlot <- renderPlot({
    # Reuse the MDS coordinates computed earlier and color them by cluster,
    # so the plot matches its title
    df <- mds_df
    df$cluster <- factor(km()$cluster)
    ggplot(df, aes(Dim1, Dim2, color = cluster)) +
      geom_point() +
      labs(title = paste("MDS with", input$clusters, "clusters"))
  })
}

shinyApp(ui, server)
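The slider offers between 2 and 8 clusters but gives no guidance on which value to pick. One common, informal aid is to compare the total within-cluster sum of squares across that range and look for the point where the decrease flattens. The sketch below does this in base R on the built-in `iris` data, again as a stand-in for the wine file. The name `wss` and the choice `nstart = 25` are assumptions.

```r
# Stand-in data: four numeric iris columns, standardized
x <- scale(iris[, 1:4])

# Total within-cluster sum of squares for each k offered by the slider
set.seed(123)
ks  <- 2:8
wss <- sapply(ks, function(k) {
  kmeans(x, centers = k, nstart = 25)$tot.withinss
})
names(wss) <- ks

round(wss, 1)

# Simple elbow plot
plot(ks, wss, type = "b",
     xlab = "Number of clusters",
     ylab = "Total within-cluster sum of squares")
```

The elbow is a heuristic, not a test, so it is best read alongside the dendrogram and the cluster plots.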
Conclusions
Exploring the wine dataset shows the following:
• Most wines are concentrated at medium quality ratings (5–6).
• Alcohol is positively related to quality: wines with more alcohol tend to be rated higher.
• Volatile acidity and density are negatively related to quality.
• The dataset is imbalanced toward wines of intermediate quality, which should be taken into account in predictive analyses.
Note that the columns printed by head(vinos) include no quality rating, volatile acidity, or density variable, so these points do not follow from the PCA, MDS, and clustering outputs shown in this report.