Downloading data from the site https://archive.ics.uci.edu/ml/datasets.html. For this example we downloaded the “wine” dataset.
The column names are listed in a document at https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.names, which describes the contents of the dataset. (Open the document and look at section 4, which gives relevant information as well as a list of the attributes, that is, the variables.)
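# Start from a clean workspace
rm(list = ls())
# data.table provides fread() for reading delimited files
library(data.table)
# The data file has no header row, so column names are assigned further down
urlData <- "https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data"
data <- fread(urlData)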
columnas <- c('clase','Alcohol','Acido Malic','Ceniza','Alcalinidad de la Ceniza',
'Magnesio','Fenoloes Totales','Flavanoides','Fenoles Noflavanoides',
'Proscaracyanins','Intensidad Color','Hue','OD280/OD315 of diluted wines','Proline')
data <- data.frame(data)
colnames(data)<- columnas
head(data)
## clase Alcohol Acido Malic Ceniza Alcalinidad de la Ceniza Magnesio
## 1 1 14.23 1.71 2.43 15.6 127
## 2 1 13.20 1.78 2.14 11.2 100
## 3 1 13.16 2.36 2.67 18.6 101
## 4 1 14.37 1.95 2.50 16.8 113
## 5 1 13.24 2.59 2.87 21.0 118
## 6 1 14.20 1.76 2.45 15.2 112
## Fenoloes Totales Flavanoides Fenoles Noflavanoides Proscaracyanins
## 1 2.80 3.06 0.28 2.29
## 2 2.65 2.76 0.26 1.28
## 3 2.80 3.24 0.30 2.81
## 4 3.85 3.49 0.24 2.18
## 5 2.80 2.69 0.39 1.82
## 6 3.27 3.39 0.34 1.97
## Intensidad Color Hue OD280/OD315 of diluted wines Proline
## 1 5.64 1.04 3.92 1065
## 2 4.38 1.05 3.40 1050
## 3 5.68 1.03 3.17 1185
## 4 7.80 0.86 3.45 1480
## 5 4.32 1.04 2.93 735
## 6 6.75 1.05 2.85 1450
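The names can also be attached while reading, instead of in a separate `colnames()` step. The following is a minimal sketch using base R's `read.csv()` on an inline string, so it runs without a download; the two rows and three columns are the first values printed by `head(data)`, and the object names `texto` and `muestra` are hypothetical. `check.names = FALSE` is what keeps a name containing a space, such as `Acido Malic`, unchanged.

```r
# First three fields of the first two rows, as printed by head(data)
texto <- "1,14.23,1.71\n1,13.20,1.78"

# header = FALSE: the file has no header row
# check.names = FALSE: keep names with spaces exactly as given
muestra <- read.csv(
  text        = texto,
  header      = FALSE,
  col.names   = c("clase", "Alcohol", "Acido Malic"),
  check.names = FALSE
)

names(muestra)
```

With the real file, the same arguments would presumably be passed together with `urlData` and the full `columnas` vector.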
The dataset is labeled with three classes. According to wine.names, they correspond to three different cultivars grown in the same region of Italy. We separate the rows belonging to each class.
clase1 <- data[which(data$clase == 1), ]
clase2 <- data[which(data$clase == 2), ]
clase3 <- data[which(data$clase == 3), ]
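As an alternative to three separate subsetting statements, base R's `split()` can build all per-class data frames in one call. The sketch below uses a small toy data frame (`toy`, with made-up values) in place of `data`, so it runs on its own; the name `por_clase` is hypothetical. With the real data the call would presumably be `split(data, data$clase)`.

```r
# Toy stand-in for `data`; the values are illustrative, not taken from the wine dataset
toy <- data.frame(
  clase   = c(1, 1, 2, 2, 3),
  Alcohol = c(14.2, 13.2, 12.3, 12.4, 13.1)
)

# split() returns a named list with one data frame per distinct class value
por_clase <- split(toy, toy$clase)

names(por_clase)        # the class labels, as character strings
nrow(por_clase[["2"]])  # number of rows that belong to class 2
```

Keeping the classes in one list makes it easy to apply the same function to each of them, for example `lapply(por_clase, summary)`.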
Summary statistics are computed for each class:
summary(clase1)
## clase Alcohol Acido Malic Ceniza
## Min. :1 Min. :12.85 Min. :1.350 Min. :2.040
## 1st Qu.:1 1st Qu.:13.40 1st Qu.:1.665 1st Qu.:2.295
## Median :1 Median :13.75 Median :1.770 Median :2.440
## Mean :1 Mean :13.74 Mean :2.011 Mean :2.456
## 3rd Qu.:1 3rd Qu.:14.10 3rd Qu.:1.935 3rd Qu.:2.615
## Max. :1 Max. :14.83 Max. :4.040 Max. :3.220
## Alcalinidad de la Ceniza Magnesio Fenoloes Totales Flavanoides
## Min. :11.20 Min. : 89.0 Min. :2.20 Min. :2.190
## 1st Qu.:16.00 1st Qu.: 98.0 1st Qu.:2.60 1st Qu.:2.680
## Median :16.80 Median :104.0 Median :2.80 Median :2.980
## Mean :17.04 Mean :106.3 Mean :2.84 Mean :2.982
## 3rd Qu.:18.70 3rd Qu.:114.0 3rd Qu.:3.00 3rd Qu.:3.245
## Max. :25.00 Max. :132.0 Max. :3.88 Max. :3.930
## Fenoles Noflavanoides Proscaracyanins Intensidad Color Hue
## Min. :0.170 Min. :1.250 Min. :3.520 Min. :0.820
## 1st Qu.:0.255 1st Qu.:1.640 1st Qu.:4.550 1st Qu.:0.995
## Median :0.290 Median :1.870 Median :5.400 Median :1.070
## Mean :0.290 Mean :1.899 Mean :5.528 Mean :1.062
## 3rd Qu.:0.320 3rd Qu.:2.090 3rd Qu.:6.225 3rd Qu.:1.130
## Max. :0.500 Max. :2.960 Max. :8.900 Max. :1.280
## OD280/OD315 of diluted wines Proline
## Min. :2.510 Min. : 680.0
## 1st Qu.:2.870 1st Qu.: 987.5
## Median :3.170 Median :1095.0
## Mean :3.158 Mean :1115.7
## 3rd Qu.:3.420 3rd Qu.:1280.0
## Max. :4.000 Max. :1680.0
summary(clase2)
## clase Alcohol Acido Malic Ceniza
## Min. :2 Min. :11.03 Min. :0.740 Min. :1.360
## 1st Qu.:2 1st Qu.:11.91 1st Qu.:1.270 1st Qu.:2.000
## Median :2 Median :12.29 Median :1.610 Median :2.240
## Mean :2 Mean :12.28 Mean :1.933 Mean :2.245
## 3rd Qu.:2 3rd Qu.:12.52 3rd Qu.:2.145 3rd Qu.:2.420
## Max. :2 Max. :13.86 Max. :5.800 Max. :3.230
## Alcalinidad de la Ceniza Magnesio Fenoloes Totales
## Min. :10.60 Min. : 70.00 Min. :1.100
## 1st Qu.:18.00 1st Qu.: 85.50 1st Qu.:1.895
## Median :20.00 Median : 88.00 Median :2.200
## Mean :20.24 Mean : 94.55 Mean :2.259
## 3rd Qu.:22.00 3rd Qu.: 99.50 3rd Qu.:2.560
## Max. :30.00 Max. :162.00 Max. :3.520
## Flavanoides Fenoles Noflavanoides Proscaracyanins Intensidad Color
## Min. :0.570 Min. :0.1300 Min. :0.410 Min. :1.280
## 1st Qu.:1.605 1st Qu.:0.2700 1st Qu.:1.350 1st Qu.:2.535
## Median :2.030 Median :0.3700 Median :1.610 Median :2.900
## Mean :2.081 Mean :0.3637 Mean :1.630 Mean :3.087
## 3rd Qu.:2.475 3rd Qu.:0.4300 3rd Qu.:1.885 3rd Qu.:3.400
## Max. :5.080 Max. :0.6600 Max. :3.580 Max. :6.000
## Hue OD280/OD315 of diluted wines Proline
## Min. :0.690 Min. :1.590 Min. :278.0
## 1st Qu.:0.925 1st Qu.:2.440 1st Qu.:406.5
## Median :1.040 Median :2.830 Median :495.0
## Mean :1.056 Mean :2.785 Mean :519.5
## 3rd Qu.:1.205 3rd Qu.:3.160 3rd Qu.:625.0
## Max. :1.710 Max. :3.690 Max. :985.0
summary(clase3)
## clase Alcohol Acido Malic Ceniza
## Min. :3 Min. :12.20 Min. :1.240 Min. :2.100
## 1st Qu.:3 1st Qu.:12.80 1st Qu.:2.587 1st Qu.:2.300
## Median :3 Median :13.16 Median :3.265 Median :2.380
## Mean :3 Mean :13.15 Mean :3.334 Mean :2.437
## 3rd Qu.:3 3rd Qu.:13.51 3rd Qu.:3.958 3rd Qu.:2.603
## Max. :3 Max. :14.34 Max. :5.650 Max. :2.860
## Alcalinidad de la Ceniza Magnesio Fenoloes Totales
## Min. :17.50 Min. : 80.00 Min. :0.980
## 1st Qu.:20.00 1st Qu.: 89.75 1st Qu.:1.407
## Median :21.00 Median : 97.00 Median :1.635
## Mean :21.42 Mean : 99.31 Mean :1.679
## 3rd Qu.:23.00 3rd Qu.:106.00 3rd Qu.:1.808
## Max. :27.00 Max. :123.00 Max. :2.800
## Flavanoides Fenoles Noflavanoides Proscaracyanins Intensidad Color
## Min. :0.3400 Min. :0.1700 Min. :0.550 Min. : 3.850
## 1st Qu.:0.5800 1st Qu.:0.3975 1st Qu.:0.855 1st Qu.: 5.438
## Median :0.6850 Median :0.4700 Median :1.105 Median : 7.550
## Mean :0.7815 Mean :0.4475 Mean :1.154 Mean : 7.396
## 3rd Qu.:0.9200 3rd Qu.:0.5300 3rd Qu.:1.350 3rd Qu.: 9.225
## Max. :1.5700 Max. :0.6300 Max. :2.700 Max. :13.000
## Hue OD280/OD315 of diluted wines Proline
## Min. :0.4800 Min. :1.270 Min. :415.0
## 1st Qu.:0.5875 1st Qu.:1.510 1st Qu.:545.0
## Median :0.6650 Median :1.660 Median :627.5
## Mean :0.6827 Mean :1.684 Mean :629.9
## 3rd Qu.:0.7525 3rd Qu.:1.820 3rd Qu.:695.0
## Max. :0.9600 Max. :2.470 Max. :880.0
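The three summaries above are long to compare by eye. One possible way to place a single statistic side by side is `aggregate()`, sketched here on a toy data frame with made-up values (`toy` and `medias` are hypothetical names). The `by =` interface is used rather than a formula so that column names containing spaces need no special handling.

```r
# Toy stand-in for `data`; the values are illustrative, not taken from the wine dataset
toy <- data.frame(
  clase   = c(1, 1, 2, 2, 3, 3),
  Alcohol = c(14.0, 13.0, 12.0, 12.5, 13.0, 13.5),
  Hue     = c(1.0, 1.1, 1.0, 1.2, 0.6, 0.7)
)

# Mean of every column except `clase`, computed separately for each class
medias <- aggregate(toy[-1], by = list(clase = toy$clase), FUN = mean)

medias  # one row per class, one column per variable
```

With the real data the equivalent call would presumably be `aggregate(data[-1], by = list(clase = data$clase), FUN = mean)`, and `FUN = median` or `FUN = sd` could be swapped in the same way.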
Histograms for each variable within each class:
hist(clase1$Alcohol, main = "Alcohol histogram", xlab = "Alcohol content", ylab = "Frequency")
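To cover every variable rather than only Alcohol, the same call can be placed in a loop over the column names. This is a minimal sketch, not necessarily how the original analysis proceeded: `toy` is a hypothetical stand-in that holds the Alcohol and Magnesio values of the six class 1 rows printed by `head(data)`, and `variables` is a hypothetical name.

```r
# Alcohol and Magnesio for the six rows printed by head(data), all of class 1
toy <- data.frame(
  clase    = c(1, 1, 1, 1, 1, 1),
  Alcohol  = c(14.23, 13.20, 13.16, 14.37, 13.24, 14.20),
  Magnesio = c(127, 100, 101, 113, 118, 112)
)

# Every column except the class label gets its own histogram
variables <- setdiff(names(toy), "clase")

for (v in variables) {
  # [[v]] works for any column name, including names that contain spaces
  hist(toy[[v]], main = paste("Histogram of", v), xlab = v, ylab = "Frequency")
}
```

Replacing `toy` with `clase1`, `clase2` or `clase3` would presumably produce the full set of histograms for that class.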