Introduction

This study analyzes wine data created by Paulo Cortez (Univ. Minho), Antonio Cerdeira, Fernando Almeida, Telmo Matos and Jose Reis (CVRVV) in 2009. Two datasets were published, covering the red and white variants of the Portuguese “Vinho Verde”. These have been merged into one dataset with a new column, “type”, that indicates whether each wine is red or white.
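Below is a minimal Python sketch of the merge step. It assumes each source file is a semicolon-delimited CSV with a header row. The helper names are hypothetical, and small in-memory strings stand in for the files. With the real data you would pass file handles instead (for example to files named `winequality-red.csv` and `winequality-white.csv`, which is also an assumption).

```python
import csv
import io
from typing import Dict, List, TextIO


def read_wine_csv(handle: TextIO, wine_type: str) -> List[Dict[str, str]]:
    """Read one semicolon-delimited wine file and tag every row with its type."""
    rows = []
    for row in csv.DictReader(handle, delimiter=";"):
        row["type"] = wine_type  # the new column: "red" or "white"
        rows.append(row)
    return rows


def merge_wines(red: List[Dict[str, str]], white: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Stack red and white rows into one dataset (both files share the same columns)."""
    return red + white


# Stand-in data; real use would be e.g.
#   with open("winequality-red.csv", newline="") as f: red = read_wine_csv(f, "red")
red_text = '"fixed acidity";"quality"\n7.4;5\n7.8;5\n'
white_text = '"fixed acidity";"quality"\n7.0;6\n'
wines = merge_wines(
    read_wine_csv(io.StringIO(red_text), "red"),
    read_wine_csv(io.StringIO(white_text), "white"),
)
print(len(wines), wines[0])
```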

The dataset has 13 variables and 6497 observations: 1599 red wines and 4898 white wines. Eleven input variables are the chemical properties of each wine. One output variable, quality, is a score based on the ratings of at least 3 wine experts, on a scale from 0 (very bad) to 10 (very excellent). The remaining variable is the new “type” column.
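The counts add up (1599 + 4898 = 6497), and about three quarters of the observations are white wines. This imbalance is worth keeping in mind when pooling both types. Below is a short sketch with a hypothetical helper, `type_shares`, that computes the proportions from a column of type labels.

```python
from collections import Counter
from typing import Dict, Iterable


def type_shares(types: Iterable[str]) -> Dict[str, float]:
    """Fraction of observations belonging to each wine type."""
    counts = Counter(types)
    total = sum(counts.values())
    return {wine_type: n / total for wine_type, n in counts.items()}


# Rebuild the reported counts: 1599 red and 4898 white wines.
shares = type_shares(["red"] * 1599 + ["white"] * 4898)
print(round(shares["red"], 3), round(shares["white"], 3))  # 0.246 0.754
```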

This study is divided into two parts. First, red and white wines are compared to determine how they differ in their chemical properties. Second, we examine which chemical properties influence wine quality and build models that predict the quality score from those properties.

Exploratory Data Analysis

Fixed Acidity

Chemical property 1 is fixed acidity. Most acids involved with wine are fixed, or nonvolatile (they do not evaporate readily). I draw two plots to show the distribution of the data, followed by summary statistics for each wine type.

## $red
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.60    7.10    7.90    8.32    9.20   15.90 
## 
## $white
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.800   6.300   6.800   6.855   7.300  14.200

The plots show that fixed acidity is noticeably higher in red wines than in white wines (median 7.9 vs 6.8). Fixed acidity in white wines is also more tightly concentrated, with half of them between 6.3 and 7.3. By contrast, some red wines have very high fixed acidity, up to 15.9.
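The summary blocks in this section have the format of R `summary()` output split by wine type. Below is a hedged Python equivalent with hypothetical function names. In it, `statistics.quantiles` with `method="inclusive"` gives the same quartiles as R's default quantile type 7.

```python
import statistics
from typing import Dict, Iterable, List, Sequence


def six_number_summary(values: Sequence[float]) -> Dict[str, float]:
    """Min, quartiles, median, mean and max, in the order R's summary() prints them."""
    # "inclusive" interpolation matches R's default quantile type 7.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.fmean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }


def summary_by_type(rows: Iterable[dict], column: str) -> Dict[str, Dict[str, float]]:
    """Group rows by their "type" value and summarise one numeric column per group."""
    groups: Dict[str, List[float]] = {}
    for row in rows:
        groups.setdefault(row["type"], []).append(float(row[column]))
    return {wine_type: six_number_summary(v) for wine_type, v in groups.items()}


print(six_number_summary([1.0, 2.0, 3.0, 4.0]))
```

Applied to the merged rows, `summary_by_type(rows, "fixed acidity")` would produce one dictionary per wine type, mirroring the `$red` and `$white` blocks.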

Volatile Acidity

Volatile acidity is the amount of acetic acid in wine, which gives an unpleasant vinegar taste when its level is too high.

## $red
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1200  0.3900  0.5200  0.5278  0.6400  1.5800 
## 
## $white
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0800  0.2100  0.2600  0.2782  0.3200  1.1000

As with fixed acidity, volatile acidity in white wines is more tightly concentrated than in red wines, and red wines have higher levels overall (median 0.52 vs 0.26). Both red and white wines have some outliers with high volatile acidity.
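The plotting code is not shown. Assuming the plots are standard boxplots whose whiskers reach 1.5 × IQR beyond the quartiles, the high outliers can be confirmed from the printed quartiles alone. For red wines the upper fence is 0.64 + 1.5 × 0.25 = 1.015, well below the maximum of 1.58. For white wines it is 0.32 + 1.5 × 0.11 = 0.485, below the maximum of 1.10. Below is a sketch with a hypothetical helper name.

```python
from typing import Tuple


def tukey_fences(q1: float, q3: float, k: float = 1.5) -> Tuple[float, float]:
    """Whisker limits of a standard boxplot: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Volatile acidity quartiles and maxima copied from the summaries above.
volatile = {"red": (0.39, 0.64, 1.58), "white": (0.21, 0.32, 1.10)}
for wine_type, (q1, q3, maximum) in volatile.items():
    _, upper = tukey_fences(q1, q3)
    # A maximum above the upper fence means at least one high outlier.
    print(f"{wine_type}: upper fence {upper:.3f}, max {maximum}, outlier: {maximum > upper}")
```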

Citric Acid

Citric acid is found in small quantities and can add ‘freshness’ and flavor to wines.

## $red
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.090   0.260   0.271   0.420   1.000 
## 
## $white
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.2700  0.3200  0.3342  0.3900  1.6600

The citric acid content of red wines is spread fairly evenly across its range. That of white wines is roughly normally distributed around 0.3, with values well outside the central range at both ends (from 0 up to 1.66).

Residual Sugar

Residual sugar is the amount of sugar remaining after fermentation stops. It is rare to find wines with less than 1 gram/liter, and wines with more than 45 grams/liter are considered sweet.
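The two thresholds can be applied directly to the residual sugar values (g/dm3, which is the same as g/L). Below is a hedged sketch. The function name and band labels are hypothetical, not categories from the dataset.

```python
def sugar_band(residual_sugar: float) -> str:
    """Place a residual sugar value (g/dm3 = g/L) relative to the quoted thresholds."""
    if residual_sugar > 45:
        return "sweet (>45 g/L)"
    if residual_sugar < 1:
        return "below 1 g/L (rare)"
    return "1-45 g/L"


# Minima, medians and the white maximum from the summary that follows.
print([sugar_band(x) for x in (0.6, 0.9, 2.2, 5.2, 65.8)])
# ['below 1 g/L (rare)', 'below 1 g/L (rare)', '1-45 g/L', '1-45 g/L', 'sweet (>45 g/L)']
```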

## $red
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.900   1.900   2.200   2.539   2.600  15.500 
## 
## $white
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.600   1.700   5.200   6.391   9.900  65.800

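White wines contain much more residual sugar than red wines on average (mean 6.39 vs 2.54 g/dm3), so they can be expected to taste sweeter. Apart from several outliers, residual sugar in red wines is concentrated in a narrow range, with half of them between 1.9 and 2.6 g/dm3. In white wines it ranges from 0.6 to 65.8 g/dm3. The white median (5.2) is above the red third quartile (2.6), so more than half of white wines contain more residual sugar than three quarters of red wines.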

Chlorides

Wines also contain some salt. The amount of salt in a wine is referred to as chlorides.

## $red
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.01200 0.07000 0.07900 0.08747 0.09000 0.61100 
## 
## $white
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00900 0.03600 0.04300 0.04577 0.05000 0.34600

Red wines contain more salt than white wines, roughly twice as much on average (mean 0.087 vs 0.046). Even so, the two distributions have a very similar shape, and both have some outliers.

Free Sulfur Dioxide

Free sulfur dioxide is present to prevent microbial growth and the oxidation of wine. It is the free form of SO2, which exists in equilibrium between molecular SO2 (as a dissolved gas) and the bisulfite ion.
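The balance between molecular SO2 and bisulfite depends on pH. A commonly used enology approximation estimates the molecular part from free SO2 and pH: it is the Henderson-Hasselbalch relation with an SO2 pKa of about 1.81. That pKa is an assumed value that shifts with temperature and alcohol. The sketch below is illustrative and not part of the original analysis. Plugging in the median free SO2 and median pH reported in this section gives only a rough picture.

```python
def molecular_so2(free_so2: float, ph: float, pka: float = 1.81) -> float:
    """Approximate molecular SO2 (same units as free_so2) from free SO2 and pH.

    molecular = free / (1 + 10 ** (pH - pKa)); pKa of about 1.81 is an assumption.
    """
    return free_so2 / (1 + 10 ** (ph - pka))


# Medians from this section: red free SO2 14 at pH 3.31, white 34 at pH 3.18.
print(round(molecular_so2(14, 3.31), 2), round(molecular_so2(34, 3.18), 2))
```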

## $red
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    7.00   14.00   15.87   21.00   72.00 
## 
## $white
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2.00   23.00   34.00   35.31   46.00  289.00

The distribution of free sulfur dioxide in red wines is right-skewed, while that of white wines is closer to normal. White wines also contain much more free sulfur dioxide than red wines (median 34 vs 14).

Total Sulfur Dioxide

Total sulfur dioxide is the amount of free and bound forms of SO2. In low concentrations SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm it becomes evident in the smell and taste of wine.

## $red
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    6.00   22.00   38.00   46.47   62.00  289.00 
## 
## $white
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     9.0   108.0   134.0   138.4   167.0   440.0

Since free sulfur dioxide is part of total sulfur dioxide, total sulfur dioxide follows a similar pattern: white wines contain far more than red wines (median 134 vs 38).
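Because free SO2 is part of total SO2, the bound part of each wine can be derived as total minus free. The same relation lets you flag any row where free exceeds total as a data error. The helper below is a hypothetical sketch and should be applied row by row: subtracting the two medians does not give the median bound SO2. The 50 mg/dm3 flag assumes 1 ppm is roughly 1 mg/dm3.

```python
def so2_breakdown(free_so2: float, total_so2: float, threshold: float = 50.0) -> dict:
    """Split one wine's total SO2 into free and bound parts and flag high free SO2.

    Assumes units of mg/dm3 and treats 1 ppm as roughly 1 mg/dm3 for the threshold.
    """
    if not 0 <= free_so2 <= total_so2:
        raise ValueError("free SO2 must lie between 0 and total SO2")
    return {
        "bound": total_so2 - free_so2,  # bound = total - free
        "free_share": free_so2 / total_so2 if total_so2 else 0.0,
        "free_above_threshold": free_so2 > threshold,
    }


print(so2_breakdown(14, 38))
```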

Density

The density of wine is close to that of water, depending on the percent alcohol and sugar content of the wine.

## $red
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.9901  0.9956  0.9968  0.9967  0.9978  1.0040 
## 
## $white
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.9871  0.9917  0.9937  0.9940  0.9961  1.0390

The densities of red and white wines are very close, with red wines slightly denser on average (mean 0.9967 vs 0.9940).

pH

pH describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3 and 4 on the pH scale.

## $red
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.740   3.210   3.310   3.311   3.400   4.010 
## 
## $white
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.720   3.090   3.180   3.188   3.280   3.820

The pH of both red and white wines is approximately normally distributed, while the overall pH of red wines is slightly higher than that of white wines (median 3.31 vs 3.18).
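pH is logarithmic ([H+] = 10^(-pH) mol/L), so a small pH gap is a larger difference in acidity than it looks. Below is a hedged sketch of the comparison between the median pH values above (3.18 for white, 3.31 for red), with a hypothetical function name. The result, about 1.35, means the median white wine has roughly 35% more hydrogen ions than the median red wine.

```python
def hydrogen_ion_ratio(ph_a: float, ph_b: float) -> float:
    """How many times more H+ a wine at pH ph_a has than a wine at pH ph_b.

    Since [H+] = 10 ** (-pH), the ratio is 10 ** (ph_b - ph_a).
    """
    return 10 ** (ph_b - ph_a)


white_median_ph, red_median_ph = 3.18, 3.31
ratio = hydrogen_ion_ratio(white_median_ph, red_median_ph)
print(f"median white wine has {ratio:.2f}x the H+ of the median red wine")
```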

Sulphates

Sulphates are a wine additive that contributes to sulfur dioxide gas (SO2) levels. They act as an antimicrobial and antioxidant.

## $red
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3300  0.5500  0.6200  0.6581  0.7300  2.0000 
## 
## $white
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.2200  0.4100  0.4700  0.4898  0.5500  1.0800

The sulphate distributions of red and white wines are similar in shape. Red wines contain slightly more sulphates than white wines (median 0.62 vs 0.47) and include some outliers.

Alcohol

Alcohol is the percent alcohol content of the wine.

## $red
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.40    9.50   10.20   10.42   11.10   14.90 
## 
## $white
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.00    9.50   10.40   10.51   11.40   14.20

The alcohol levels of red and white wines are very close (median 10.2 vs 10.4), except that red wines have some outliers.

Quality

## $red
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.000   5.000   6.000   5.636   6.000   8.000 
## 
## $white
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.000   5.000   6.000   5.878   6.000   9.000

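Most wines, red and white, scored 5 or 6: both types have a median of 6 and an interquartile range from 5 to 6. Only a few wines received low or high scores such as 3 or 8, and only white wines scored 9.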
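Because almost all scores are 5 or 6, the extreme scores are sparse, which matters for the modelling part of the study. One option is to group scores into bands and count them. The cut-offs below (4 or less, 5 to 6, 7 or more) and the function names are hypothetical assumptions, not the study's choice.

```python
from collections import Counter
from typing import Iterable


def quality_band(score: int) -> str:
    """Hypothetical three-way grouping of the 0-10 quality score."""
    if score <= 4:
        return "low"
    if score <= 6:
        return "medium"
    return "high"


def band_counts(scores: Iterable[int]) -> Counter:
    """Count how many wines fall into each quality band."""
    return Counter(quality_band(s) for s in scores)


print(band_counts([3, 5, 5, 6, 6, 8, 9]))
```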

Finally, the graph below shows that red wines have higher volatile acidity and chlorides.