Detecting Wine Quality
Source:
Paulo Cortez (Univ. Minho), Antonio Cerdeira, Fernando Almeida, Telmo Matos and Jose Reis (CVRVV), 2009
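# The UCI files are semicolon-delimited; without sep = ";" each record loads as a single text column.
winequality.red <- read.csv('C:/Users/edba.liba/Downloads/winequality-red.csv', sep = ";")
winequality.white <- read.csv('C:/Users/edba.liba/Downloads/winequality-white.csv', sep = ";")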
Attribute information:
Input variables (based on physicochemical tests):
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol
Output variable (based on sensory data):
12 - quality (score between 0 and 10)
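By default `read.csv()` passes the header through `make.names()` (`check.names = TRUE`), so the spaces in the attribute names above become dots. The sketch below assumes the files are read with `sep = ";"`. `expected_cols` and `missing_columns()` are illustrative names and are not part of the original analysis. Joining the twelve names with dots reproduces the single merged column name that `colnames()` prints further down, which is the symptom of a delimiter mismatch.

```r
# Expected column names once the semicolon-delimited files are parsed
# (check.names = TRUE turns "fixed acidity" into "fixed.acidity").
expected_cols <- c("fixed.acidity", "volatile.acidity", "citric.acid",
                   "residual.sugar", "chlorides", "free.sulfur.dioxide",
                   "total.sulfur.dioxide", "density", "pH", "sulphates",
                   "alcohol", "quality")

# Hypothetical helper: the expected columns that a data frame lacks.
missing_columns <- function(df) setdiff(expected_cols, names(df))

# A frame read with the wrong delimiter has one merged column, so all
# twelve expected names are reported as missing.
bad <- data.frame(x = 1)
names(bad) <- paste(expected_cols, collapse = ".")
length(missing_columns(bad))  # 12
```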
summary(winequality.red)
## fixed.acidity.volatile.acidity.citric.acid.residual.sugar.chlorides.free.sulfur.dioxide.total.sulfur.dioxide.density.pH.sulphates.alcohol.quality
## 6.7;0.46;0.24;1.7;0.077;18;34;0.9948;3.39;0.6;10.6;6 : 4
## 7.2;0.36;0.46;2.1;0.074;24;44;0.99534;3.4;0.85;11;7 : 4
## 7.2;0.695;0.13;2;0.076;12;20;0.99546;3.29;0.54;10.1;5 : 4
## 7.5;0.51;0.02;1.7;0.084;13;31;0.99538;3.36;0.54;10.5;6: 4
## 11.5;0.18;0.51;4;0.104;4;23;0.9996;3.28;0.97;10.1;6 : 3
## 6.4;0.64;0.21;1.8;0.081;14;31;0.99689;3.59;0.66;9.8;5 : 3
## (Other) :1577
summary(winequality.white)
## fixed.acidity.volatile.acidity.citric.acid.residual.sugar.chlorides.free.sulfur.dioxide.total.sulfur.dioxide.density.pH.sulphates.alcohol.quality
## 7.3;0.19;0.27;13.9;0.057;45;155;0.99807;2.94;0.41;8.8;8: 8
## 7;0.15;0.28;14.7;0.051;29;149;0.99792;2.96;0.39;9;7 : 8
## 6.8;0.18;0.3;12.8;0.062;19;171;0.99808;3;0.52;9;7 : 7
## 7.4;0.16;0.3;13.7;0.056;33;168;0.99825;2.9;0.44;8.7;7 : 7
## 7.4;0.16;0.27;15.5;0.05;25;135;0.9984;2.9;0.43;8.7;7 : 6
## 7.4;0.19;0.3;12.8;0.053;48.5;229;0.9986;3.14;0.49;9.1;7: 6
## (Other) :4856
colnames(winequality.red)
## [1] "fixed.acidity.volatile.acidity.citric.acid.residual.sugar.chlorides.free.sulfur.dioxide.total.sulfur.dioxide.density.pH.sulphates.alcohol.quality"
colnames(winequality.white)
## [1] "fixed.acidity.volatile.acidity.citric.acid.residual.sugar.chlorides.free.sulfur.dioxide.total.sulfur.dioxide.density.pH.sulphates.alcohol.quality"
class(winequality.red)
## [1] "data.frame"
class(winequality.white)
## [1] "data.frame"
head(winequality.red)
## fixed.acidity.volatile.acidity.citric.acid.residual.sugar.chlorides.free.sulfur.dioxide.total.sulfur.dioxide.density.pH.sulphates.alcohol.quality
## 1 7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5
## 2 7.8;0.88;0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5
## 3 7.8;0.76;0.04;2.3;0.092;15;54;0.997;3.26;0.65;9.8;5
## 4 11.2;0.28;0.56;1.9;0.075;17;60;0.998;3.16;0.58;9.8;6
## 5 7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5
## 6 7.4;0.66;0;1.8;0.075;13;40;0.9978;3.51;0.56;9.4;5
head(winequality.white)
## fixed.acidity.volatile.acidity.citric.acid.residual.sugar.chlorides.free.sulfur.dioxide.total.sulfur.dioxide.density.pH.sulphates.alcohol.quality
## 1 7;0.27;0.36;20.7;0.045;45;170;1.001;3;0.45;8.8;6
## 2 6.3;0.3;0.34;1.6;0.049;14;132;0.994;3.3;0.49;9.5;6
## 3 8.1;0.28;0.4;6.9;0.05;30;97;0.9951;3.26;0.44;10.1;6
## 4 7.2;0.23;0.32;8.5;0.058;47;186;0.9956;3.19;0.4;9.9;6
## 5 7.2;0.23;0.32;8.5;0.058;47;186;0.9956;3.19;0.4;9.9;6
## 6 8.1;0.28;0.4;6.9;0.05;30;97;0.9951;3.26;0.44;10.1;6
Observations on the Datasets
1. Two datasets are available for analysis: red wine samples and white wine samples.
2. The two datasets correspond to the red and white variants of the Portuguese "Vinho Verde" wine.
3. The inputs are the results of objective physicochemical tests (e.g. pH values).
4. The output is based on sensory data (median of at least 3 evaluations made by wine experts).
5. The wine quality is graded between 0 (very bad) and 10 (very excellent).
6. These datasets can be viewed as classification or regression tasks.
7. The classes are ordered and not balanced; e.g. there are many more normal wines than excellent or poor ones.
8. Outlier detection algorithms could be used to detect the few excellent or poor wines.
9. Several of the attributes may be correlated, so it makes sense to apply some form of feature selection.
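Once the data has been parsed correctly, points 7 and 9 can be checked directly. The following is a minimal sketch. `quality_share()` and `correlated_pairs()` are hypothetical helpers, and the 0.7 correlation cut-off is an assumed, illustrative value that does not come from the original analysis. The toy inputs are made-up numbers, not wine data.

```r
# Hypothetical helper for point 7: share of samples at each quality score.
quality_share <- function(quality) prop.table(table(quality))

# Hypothetical helper for point 9: attribute pairs whose absolute Pearson
# correlation is at least `threshold` (0.7 is an assumed, illustrative default).
correlated_pairs <- function(df, threshold = 0.7) {
  num <- df[vapply(df, is.numeric, logical(1))]   # numeric columns only
  cm <- cor(num)
  # Upper triangle only: each pair is reported once and the diagonal is skipped.
  idx <- which(abs(cm) >= threshold & upper.tri(cm), arr.ind = TRUE)
  data.frame(var1 = rownames(cm)[idx[, 1]],
             var2 = colnames(cm)[idx[, 2]],
             r = cm[idx],
             stringsAsFactors = FALSE)
}

# Toy inputs (made-up numbers, not wine data):
quality_share(c(5, 5, 6, 6, 6, 7))   # 5: 1/3, 6: 1/2, 7: 1/6
demo <- data.frame(a = 1:5, b = c(2, 4, 6, 8, 10), c = c(2, 5, 1, 4, 3))
correlated_pairs(demo)               # only a-b (r = 1) passes; r(a, c) = 0.1
```

With the correctly parsed data, the calls would be `quality_share(winequality.red$quality)` and `correlated_pairs(winequality.red[setdiff(names(winequality.red), "quality")])`. A flagged pair only suggests that one of the two attributes may be redundant. It is not a feature-selection method in itself.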
str(winequality.red)
## 'data.frame': 1599 obs. of 1 variable:
## $ fixed.acidity.volatile.acidity.citric.acid.residual.sugar.chlorides.free.sulfur.dioxide.total.sulfur.dioxide.density.pH.sulphates.alcohol.quality: Factor w/ 1359 levels "10.1;0.27;0.54;2.3;0.065;7;26;0.99531;3.17;0.53;12.5;6",..: 668 843 839 125 668 665 874 622 824 686 ...
str(winequality.white)
## 'data.frame': 4898 obs. of 1 variable:
## $ fixed.acidity.volatile.acidity.citric.acid.residual.sugar.chlorides.free.sulfur.dioxide.total.sulfur.dioxide.density.pH.sulphates.alcohol.quality: Factor w/ 3961 levels "10.2;0.44;0.88;6.2;0.049;20;124;0.9968;2.99;0.51;9.9;4",..: 3497 919 3596 2594 2594 3596 765 3497 919 3580 ...
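`str()` reports a single factor column whose levels are entire records. This shows that the files were split on the wrong delimiter: the UCI files separate fields with `;`, but `read.csv()` defaults to `sep = ","`. The self-contained sketch below reproduces the problem using two records copied from the `head()` output above. The header line is assumed to follow the UCI files' format, with quoted names separated by semicolons.

```r
# Two records from the head() output above, preceded by a header line
# in the assumed UCI format (quoted names, ";" separators).
sample_csv <- paste(
  '"fixed acidity";"volatile acidity";"citric acid";"residual sugar";"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";"pH";"sulphates";"alcohol";"quality"',
  "7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5",
  "7.8;0.88;0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5",
  sep = "\n")

# Default sep = ",": each record becomes one text column.
wrong <- read.csv(text = sample_csv)
ncol(wrong)  # 1

# sep = ";": twelve numeric columns with dotted names.
right <- read.csv(text = sample_csv, sep = ";")
ncol(right)  # 12
```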
Number of instances (observations): red wine, 1599; white wine, 4898.
Number of attributes: 11 input attributes + 1 output attribute (quality)
Missing Attribute Values: None
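The figures above can be checked programmatically after the files are read with `sep = ";"`. This is a minimal sketch. `dataset_profile()` is a hypothetical helper, and it treats every column except `quality` as an input attribute. The demo frame is made up.

```r
# Hypothetical helper: instances, input attributes (all columns except
# the outcome column) and total missing values in a data frame.
dataset_profile <- function(df, outcome = "quality") {
  c(instances = nrow(df),
    attributes = sum(names(df) != outcome),
    missing = sum(is.na(df)))
}

# Per the figures above, the correctly parsed files should give:
#   dataset_profile(winequality.red)    -> 1599 instances, 11 attributes, 0 missing
#   dataset_profile(winequality.white)  -> 4898 instances, 11 attributes, 0 missing
demo <- data.frame(x = c(1, NA, 3), quality = c(5, 6, 7))
dataset_profile(demo)  # instances 3, attributes 1, missing 1
```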