In this study, I analyze wine data created by Paulo Cortez (Univ. Minho), Antonio Cerdeira, Fernando Almeida, Telmo Matos and Jose Reis (CVRVV). They published two datasets related to the red and white variants of the Portuguese “Vinho Verde” wine. I combined them into one dataset and added a column, “type”, for the wine type (red or white).
Apart from the added type column, the dataset has 12 variables and 6497 observations: 1599 red wines and 4898 white wines. Eleven input variables describe the chemical properties of the wine. The one output variable is a quality score between 0 (very bad) and 10 (very excellent), based on ratings from at least 3 wine experts per wine.
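The combining step described above can be sketched in a few lines. This is only an illustration, not the code used for this study: the two inline CSV strings are tiny placeholders standing in for the red and white files, the semicolon delimiter is an assumption about the file format, and `load_with_type` is a hypothetical helper name.

```python
import csv
import io

# Placeholder stand-ins for the two source files (illustrative values only).
# The semicolon delimiter is an assumption about the file format.
RED_CSV = "fixed acidity;volatile acidity;quality\n7.4;0.70;5\n7.8;0.88;5\n"
WHITE_CSV = "fixed acidity;volatile acidity;quality\n7.0;0.27;6\n"


def load_with_type(text, wine_type):
    """Parse semicolon-delimited CSV text and tag every row with its wine type."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    rows = []
    for row in reader:
        row["type"] = wine_type  # the added "type" column
        rows.append(row)
    return rows


# Stack the two tables into one dataset.
wines = load_with_type(RED_CSV, "red") + load_with_type(WHITE_CSV, "white")
print(len(wines), sorted({w["type"] for w in wines}))
```

With the real files, the same check on the combined row count should give 1599 + 4898 = 6497.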
This study has two parts. In the first part, I compare red and white wines to see how they differ in terms of chemical properties. In the second part, I will explore which chemical properties influence the quality of wines and build models to predict the quality score from chemical properties.
The first chemical property I look into is fixed acidity. Most acids involved with wine are fixed, or nonvolatile (they do not evaporate readily). I draw two plots to show the distribution of the data, followed by summary statistics.
## $red
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.60 7.10 7.90 8.32 9.20 15.90
##
## $white
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.800 6.300 6.800 6.855 7.300 14.200
From the plots and summary statistics, fixed acidity is noticeably higher in red wines than in white wines (median 7.9 versus 6.8). Fixed acidity in white wines varies less (the interquartile range is 1.0, versus 2.1 for red wines), while some red wines have very high fixed acidity, up to 15.9.
Volatile acidity is the amount of acetic acid in wine, which at too high a level can lead to an unpleasant, vinegar taste.
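The per-type summaries shown throughout this part have the six-number layout of R's `summary()`. As a rough Python sketch of the same computation (not the code used here), the helper below groups values by wine type and reports the minimum, quartiles, mean and maximum. The function names and the toy rows are hypothetical; `method="inclusive"` is chosen on the assumption that the quartiles should follow R's default interpolation.

```python
from statistics import mean, quantiles


def six_number_summary(values):
    """Min, quartiles, mean and max, in the layout of the summaries above."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": median,
        "Mean": mean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }


def summary_by_type(rows, column):
    """Group one numeric column by the 'type' field and summarize each group."""
    groups = {}
    for row in rows:
        groups.setdefault(row["type"], []).append(float(row[column]))
    return {wine_type: six_number_summary(v) for wine_type, v in groups.items()}


# Toy rows (hypothetical values, not taken from the dataset).
toy_rows = [
    {"type": "red", "fixed acidity": "7.0"},
    {"type": "red", "fixed acidity": "8.0"},
    {"type": "red", "fixed acidity": "9.0"},
    {"type": "white", "fixed acidity": "6.0"},
    {"type": "white", "fixed acidity": "7.0"},
    {"type": "white", "fixed acidity": "8.0"},
]
for wine_type, stats in summary_by_type(toy_rows, "fixed acidity").items():
    print(wine_type, stats)
```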
## $red
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1200 0.3900 0.5200 0.5278 0.6400 1.5800
##
## $white
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0800 0.2100 0.2600 0.2782 0.3200 1.1000
Like fixed acidity, volatile acidity varies less in white wines, while its level is higher in red wines (median 0.52 versus 0.26). Both red and white wines have some outliers with high volatile acidity.
Citric acid, found in small quantities, can add ‘freshness’ and flavor to wines.
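One common way to make "outlier" concrete is Tukey's rule, which flags values more than 1.5 interquartile ranges above the third quartile. The sketch below applies that rule to the quartiles and maxima reported above for volatile acidity. The choice of this rule is an assumption for illustration; it is not stated to be the criterion behind the plots in this study.

```python
def tukey_upper_fence(q1, q3, k=1.5):
    """Upper outlier threshold: third quartile plus k interquartile ranges."""
    return q3 + k * (q3 - q1)


# Quartiles and maxima copied from the volatile acidity summaries above.
reported = {
    "red": {"q1": 0.39, "q3": 0.64, "max": 1.58},
    "white": {"q1": 0.21, "q3": 0.32, "max": 1.10},
}

for wine_type, s in reported.items():
    fence = tukey_upper_fence(s["q1"], s["q3"])
    # If the maximum lies beyond the fence, at least one value is an outlier.
    print(wine_type, round(fence, 3), s["max"] > fence)
```

Under this rule the upper fences are about 1.015 for red wines and 0.485 for white wines, so the maxima of both types (1.58 and 1.10) lie beyond them.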
## $red
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.090 0.260 0.271 0.420 1.000
##
## $white
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.2700 0.3200 0.3342 0.3900 1.6600
The citric acid of red wines is spread fairly evenly, while that of white wines is roughly normally distributed. White wines also have some outliers.
Residual sugar is the amount of sugar remaining after fermentation stops. It is rare to find wines with less than 1 gram/liter, and wines with more than 45 grams/liter are considered sweet.
## $red
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.900 1.900 2.200 2.539 2.600 15.500
##
## $white
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.600 1.700 5.200 6.391 9.900 65.800
On average, white wines contain much more residual sugar than red wines (mean 6.39 versus 2.54). This is probably the difference most people notice, because white wines usually taste sweeter. The residual sugar of red wines falls in a narrow range apart from several outliers, while that of white wines ranges from 0.6 to 65.8 g/dm³. The median white wine (5.2) has more residual sugar than three quarters of red wines (third quartile 2.6).
Chlorides is the amount of salt in the wine.
## $red
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01200 0.07000 0.07900 0.08747 0.09000 0.61100
##
## $white
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00900 0.03600 0.04300 0.04577 0.05000 0.34600
In general red wines contain more salt than white wines. The distribution of chlorides in both wines is very similar, and both wines have some outliers.
Free sulfur dioxide is the free form of SO2, which exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion. It prevents microbial growth and the oxidation of wine.
## $red
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 7.00 14.00 15.87 21.00 72.00
##
## $white
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 23.00 34.00 35.31 46.00 289.00
The distribution of free sulfur dioxide in red wines is positively skewed, while that of white wines is closer to normal. In general, white wines contain more free sulfur dioxide than red wines (median 34 versus 14).
Total sulfur dioxide is the amount of free and bound forms of SO2. In low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine.
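Skewness can be read off a histogram, but it can also be quantified. A minimal sketch, assuming the moment-based definition (third central moment divided by the 1.5 power of the second): positive values indicate a longer right tail. The function name and both toy samples are hypothetical and are not taken from the wine data.

```python
from statistics import mean


def sample_skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (positive means a long right tail)."""
    n = len(values)
    centre = mean(values)
    m2 = sum((x - centre) ** 2 for x in values) / n
    m3 = sum((x - centre) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Toy samples (hypothetical): one symmetric, one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]

print(sample_skewness(symmetric))     # zero for a symmetric sample
print(sample_skewness(right_tailed))  # positive for a right-skewed sample
```

A quicker, rougher check uses only the summaries above: for red wines the mean of free sulfur dioxide (15.87) exceeds the median (14.00), which is what a right-skewed distribution typically looks like.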
## $red
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.00 22.00 38.00 46.47 62.00 289.00
##
## $white
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.0 108.0 134.0 138.4 167.0 440.0
Since free sulfur dioxide is part of total sulfur dioxide, the distribution and statistics of total sulfur dioxide are very similar to those of free sulfur dioxide: white wines again have higher levels than red wines.
The density of wine is close to that of water, depending on the percent alcohol and sugar content.
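To see how large a part of total sulfur dioxide the free form is, the reported means can be divided directly. This is a back-of-the-envelope sketch: a ratio of means is not the same as the mean of per-wine ratios, so the figures are only indicative. The `free_share` helper is a hypothetical name.

```python
# Means copied from the free and total sulfur dioxide summaries above.
reported_means = {
    "red": {"free": 15.87, "total": 46.47},
    "white": {"free": 35.31, "total": 138.4},
}


def free_share(free, total):
    """Fraction of total SO2 that is in the free form."""
    return free / total


shares = {
    wine_type: free_share(m["free"], m["total"])
    for wine_type, m in reported_means.items()
}
for wine_type, share in shares.items():
    print(f"{wine_type}: free SO2 is about {share:.0%} of total SO2 (ratio of means)")
```

By this rough measure, free SO2 makes up about a third of the total in red wines and about a quarter in white wines.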
## $red
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9901 0.9956 0.9968 0.9967 0.9978 1.0040
##
## $white
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9871 0.9917 0.9937 0.9940 0.9961 1.0390
The densities of red and white wines are very close. The density of red wines is slightly higher than that of white wines (median 0.9968 versus 0.9937).
pH describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3 and 4 on the pH scale.
## $red
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.740 3.210 3.310 3.311 3.400 4.010
##
## $white
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.720 3.090 3.180 3.188 3.280 3.820
The pH of both red and white wines is approximately normally distributed, while the overall pH of red wines is slightly higher than that of white wines.
Sulphates are a wine additive that can contribute to sulfur dioxide gas (SO2) levels; SO2 acts as an antimicrobial and antioxidant.
## $red
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3300 0.5500 0.6200 0.6581 0.7300 2.0000
##
## $white
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2200 0.4100 0.4700 0.4898 0.5500 1.0800
The sulphate levels of red and white wines are very similar, although red wines contain slightly more sulphates than white wines. Red wines also have some outliers.
Alcohol is the percent alcohol content of the wine.
## $red
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.50 10.20 10.42 11.10 14.90
##
## $white
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.00 9.50 10.40 10.51 11.40 14.20
The alcohol levels of red and white wines are very close, except that red wines have some outliers.
## $red
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 5.000 6.000 5.636 6.000 8.000
##
## $white
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 5.000 6.000 5.878 6.000 9.000
Quality, the output variable, is summarized above. Most wines, red and white, scored 5 or 6: for both types the first quartile is 5, and the median and third quartile are 6. Only a few wines received a low or high score, such as 3 or 8.
Finally, one more interesting plot to wrap up part I. In the graph below, red wines clearly cluster in the top-right corner, since they have higher volatile acidity and chlorides.
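Because quality takes only a handful of integer values, a frequency table is often more telling than quartiles. A minimal sketch follows; the score list is a hypothetical toy sample, not the real data, and `score_table` is an illustrative helper name.

```python
from collections import Counter


def score_table(scores):
    """Map each score to its count and its share of all wines, ordered by score."""
    counts = Counter(scores)
    total = len(scores)
    return {s: (counts[s], counts[s] / total) for s in sorted(counts)}


# Hypothetical toy scores, for illustration only.
toy_scores = [5, 6, 5, 6, 7, 5, 6, 3, 8, 6]

table = score_table(toy_scores)
for score, (count, share) in table.items():
    print(score, count, f"{share:.0%}")

# Share of wines that scored 5 or 6.
share_5_or_6 = table[5][1] + table[6][1]
print(f"{share_5_or_6:.0%}")
```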
Thanks for reading.