1. INTRODUCTION:

Wine (from Latin vinum) is an alcoholic beverage made from grapes, generally Vitis vinifera, fermented without the addition of sugars, acids, enzymes, water, or other nutrients.

Wine has been produced for thousands of years. The earliest known traces of wine are from Georgia (c. 6000 BC), Iran (c. 5000 BC), and Sicily (c. 4000 BC), although there is evidence of a similar alcoholic beverage being consumed earlier in China (c. 7000 BC). The earliest known winery is the 6,100-year-old Areni-1 winery in Armenia. Wine reached the Balkans by 4500 BC and was consumed and celebrated in ancient Greece, Thrace and Rome. Throughout history, wine has been consumed for its intoxicating effects.

Wine has long played an important role in religion. Red wine was associated with blood by the ancient Egyptians and was used by both the Greek cult of Dionysus and the Romans in their Bacchanalia; Judaism also incorporates it in the Kiddush and Christianity in the Eucharist.

Yeast consumes the sugar in the grapes and converts it to ethanol and carbon dioxide. Different varieties of grapes and strains of yeasts produce different styles of wine. These variations result from the complex interactions between the biochemical development of the grape, the reactions involved in fermentation, the terroir, and the production process. Many countries enact legal appellations intended to define styles and qualities of wine. These typically restrict the geographical origin and permitted varieties of grapes, as well as other aspects of wine production. Wines not made from grapes include rice wine and fruit wines such as plum, cherry, pomegranate and elderberry.

This paper addresses the “quality of wine” with respect to its chemical contents, in particular its acids. The first issue is the correlation between the different acids and the quality of wine. More broadly, we investigate which physicochemical parameters are most strongly associated with high-quality wine.

2. Overview of the Study

Our study concerns the quality of wine, a beverage produced all over the world. Wine is an alcoholic beverage and is also regarded as part of a rich culture. Various health benefits have been attributed to wine (http://www.wideopeneats.com/10-health-benefits-get-drinking-daily-glass-wine/). A large number of occupations and professions are part of the wine industry: the people who grow the grapes, prepare the wine, bottle it, sell it, assess it, market it, and finally recommend it to clients and serve it. In this study, we identify the chemical components, and in particular the acids, that are most strongly correlated with wine quality, with the aim of finding which components, and in what proportion, are associated with quality wine.

3. Content:

3.1 Content 1: The two datasets are related to red and white variants of the Portuguese “Vinho Verde” wine. For more details, consult the reference [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (input) and sensory (output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

3.2 Content 2: For more information, read [Cortez et al., 2009].

Input variables (based on physicochemical tests):
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol

Output variable (based on sensory data):
12 - quality (score between 0 and 10)

4. About Data:

4.1 Wine Quality Data Set

4.2 Abstract: Two datasets are included, related to red and white vinho verde wine samples, from the north of Portugal. The goal is to model wine quality based on physicochemical tests.

4.3 Data Set Characteristics:

Data Set Characteristics: Multivariate
Number of Instances: 1599
Area: Business
Attribute Characteristics: Real
Number of Attributes: 12
Date Donated: 2009-10-07
Associated Tasks: Classification, Regression
Missing Values: N/A
Number of Web Hits: 580109

4.4 Source:

Paulo Cortez, University of Minho, Guimarães, Portugal, http://www3.dsi.uminho.pt/pcortez
A. Cerdeira, F. Almeida, T. Matos and J. Reis, Viticulture Commission of the Vinho Verde Region (CVRVV), Porto, Portugal @2009

4.5 Data Set Information:

The two datasets are related to red and white variants of the Portuguese “Vinho Verde” wine. For more details, consult the reference [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (input) and sensory (output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones).
Also, we are not sure if all input variables are relevant.
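
The class imbalance mentioned here can be inspected by tabulating the quality scores. The sketch below is illustrative only: the `quality` vector is a small made-up stand-in (an assumption, not the real data); with the data loaded, `wine.df$quality` would take its place.

```r
# Hypothetical stand-in for wine.df$quality; replace with the real column.
quality <- c(5, 5, 6, 5, 6, 7, 5, 6, 4, 6, 5, 8, 6, 5, 3)

# Absolute and relative frequency of each quality score.
counts <- table(quality)
shares <- round(prop.table(counts), 3)

print(counts)
print(shares)
```

A table dominated by the middle scores would confirm that the extreme classes are rare, which matters when quality is treated as a class label rather than a number.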

5. Model: To test the hypothesis, we propose the following two models:

5.1 Positively correlated features: $\text{Quality} = \beta_0 + \beta_1\,\text{sulphates} + \beta_2\,\text{citric.acid} + \beta_3\,\text{alcohol} + \beta_4\,\text{fixed.acidity} + \epsilon$

5.2 Negatively correlated features: $\text{Quality} = \beta_0 + \beta_1\,\text{total\_sulphur\_dioxide} + \beta_2\,\text{chlorides} + \beta_3\,\text{volatile\_acidity} + \beta_4\,\text{density} + \epsilon$

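# Load the red-wine sample; the file is expected in the working directory.
wine.df <- read.csv("winequality-red.csv")
# Open the data in the spreadsheet-style viewer (interactive sessions only).
View(wine.df)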
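
Before computing statistics, it can help to confirm that the file was parsed into the twelve numeric columns listed in Section 3.2. The following is a minimal sketch; `check_wine` and `expected_cols` are hypothetical names introduced here, and the column names are taken from the summary output in this report.

```r
# Column names as they appear in the summary output of this report.
expected_cols <- c("fixed.acidity", "volatile.acidity", "citric.acid",
                   "residual.sugar", "chlorides", "free.sulfur.dioxide",
                   "total.sulfur.dioxide", "density", "pH", "sulphates",
                   "alcohol", "quality")

# Hypothetical helper: report missing columns, non-numeric data and NA cells.
check_wine <- function(df) {
  list(
    missing_cols = setdiff(expected_cols, names(df)),
    all_numeric  = all(vapply(df, is.numeric, logical(1))),
    n_missing    = sum(is.na(df))
  )
}

# Run the check only when the file is actually present.
path <- "winequality-red.csv"
if (file.exists(path)) {
  print(check_wine(read.csv(path)))
}
```

If `missing_cols` is not empty, a likely cause is a field separator other than the comma that `read.csv` assumes by default.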

Summary stats:

summary(wine.df)
##  fixed.acidity   volatile.acidity  citric.acid    residual.sugar  
##  Min.   : 4.60   Min.   :0.1200   Min.   :0.000   Min.   : 0.900  
##  1st Qu.: 7.10   1st Qu.:0.3900   1st Qu.:0.090   1st Qu.: 1.900  
##  Median : 7.90   Median :0.5200   Median :0.260   Median : 2.200  
##  Mean   : 8.32   Mean   :0.5278   Mean   :0.271   Mean   : 2.539  
##  3rd Qu.: 9.20   3rd Qu.:0.6400   3rd Qu.:0.420   3rd Qu.: 2.600  
##  Max.   :15.90   Max.   :1.5800   Max.   :1.000   Max.   :15.500  
##    chlorides       free.sulfur.dioxide total.sulfur.dioxide
##  Min.   :0.01200   Min.   : 1.00       Min.   :  6.00      
##  1st Qu.:0.07000   1st Qu.: 7.00       1st Qu.: 22.00      
##  Median :0.07900   Median :14.00       Median : 38.00      
##  Mean   :0.08747   Mean   :15.87       Mean   : 46.47      
##  3rd Qu.:0.09000   3rd Qu.:21.00       3rd Qu.: 62.00      
##  Max.   :0.61100   Max.   :72.00       Max.   :289.00      
##     density             pH          sulphates         alcohol     
##  Min.   :0.9901   Min.   :2.740   Min.   :0.3300   Min.   : 8.40  
##  1st Qu.:0.9956   1st Qu.:3.210   1st Qu.:0.5500   1st Qu.: 9.50  
##  Median :0.9968   Median :3.310   Median :0.6200   Median :10.20  
##  Mean   :0.9967   Mean   :3.311   Mean   :0.6581   Mean   :10.42  
##  3rd Qu.:0.9978   3rd Qu.:3.400   3rd Qu.:0.7300   3rd Qu.:11.10  
##  Max.   :1.0037   Max.   :4.010   Max.   :2.0000   Max.   :14.90  
##     quality     
##  Min.   :3.000  
##  1st Qu.:5.000  
##  Median :6.000  
##  Mean   :5.636  
##  3rd Qu.:6.000  
##  Max.   :8.000

Plotting each variable:

hist(wine.df$fixed.acidity, main = "Distribution of Fixed acidity", xlab = "Fixed acidity", ylab = "Frequency", col = "grey")

hist(wine.df$volatile.acidity, main = "Distribution of Volatile acidity", xlab = "Volatile acidity", ylab = "Frequency", col = "grey")

hist(wine.df$citric.acid, main = "Distribution of Citric acid", xlab = "Citric acid", ylab = "Frequency", col = "grey")

hist(wine.df$residual.sugar, main = "Distribution of Residual Sugar", xlab = "Residual Sugar", ylab = "Frequency", col = "grey")

hist(wine.df$chlorides, main = "Distribution of Chlorides", xlab = "Chlorides", ylab = "Frequency", col = "grey")

hist(wine.df$free.sulfur.dioxide, main = "Distribution of Free Sulphur Dioxide", xlab = "Free Sulphur Dioxide", ylab = "Frequency", col = "grey")

hist(wine.df$total.sulfur.dioxide, main = "Distribution of Total Sulphur Dioxide", xlab = "Total Sulphur Dioxide", ylab = "Frequency", col = "grey")

hist(wine.df$density, main = "Distribution of Density", xlab = "Density", ylab = "Frequency", col = "grey")

hist(wine.df$pH, main = "Distribution of pH", xlab = "pH", ylab = "Frequency", col = "grey")

hist(wine.df$sulphates, main = "Distribution of Sulphates", xlab = "Sulphates", ylab = "Frequency", col = "grey")

hist(wine.df$alcohol, main = "Distribution of Alcohol", xlab = "Alcohol", ylab = "Frequency", col = "grey")

hist(wine.df$quality, main = "Distribution of Quality", xlab = "Quality", ylab = "Frequency", col = "grey")
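
The twelve calls above share one pattern, so they could also be written as a loop over the column names. This is only a sketch of that alternative: `demo.df` is a synthetic stand-in (an assumption) with three of the columns, and the titles use the raw column names rather than the hand-written labels.

```r
# Synthetic stand-in for wine.df so the sketch runs on its own.
set.seed(1)
demo.df <- data.frame(
  alcohol = rnorm(50, mean = 10.4, sd = 1),
  pH      = rnorm(50, mean = 3.3, sd = 0.15),
  quality = sample(3:8, 50, replace = TRUE)
)

# One title per column, built from the column name.
titles <- paste("Distribution of", names(demo.df))

# Draw the histograms side by side, then restore the plot layout.
old_par <- par(mfrow = c(1, ncol(demo.df)))
for (i in seq_along(demo.df)) {
  hist(demo.df[[i]], main = titles[i], xlab = names(demo.df)[i],
       ylab = "Frequency", col = "grey")
}
par(old_par)
```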

Boxplot of each variable:

boxplot(wine.df, main = "Distribution of each Variable", xlab = "Variable", ylab = "Distribution")
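
According to the summary statistics, total.sulfur.dioxide reaches 289 while density stays close to 1, so on a shared axis most boxes are squeezed flat. One possible remedy, which is an assumption of this sketch and not part of the original analysis, is to standardize each column before plotting. The data frame `demo.df` is synthetic, drawn uniformly between the reported minima and maxima.

```r
# Synthetic stand-in using the min/max values reported in the summary.
set.seed(2)
demo.df <- data.frame(
  total.sulfur.dioxide = runif(100, min = 6, max = 289),
  density              = runif(100, min = 0.9901, max = 1.0037),
  pH                   = runif(100, min = 2.74, max = 4.01)
)

# Centre each column to mean 0 and rescale it to standard deviation 1.
scaled.df <- as.data.frame(scale(demo.df))

boxplot(scaled.df, main = "Distribution of each Variable (standardized)",
        xlab = "Variable", ylab = "Standard deviations from the mean")
```

After standardization the boxes are comparable in height, at the cost of losing the original units on the vertical axis.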

Generating Correlation matrix:

cor(wine.df)
##                      fixed.acidity volatile.acidity citric.acid
## fixed.acidity           1.00000000     -0.256130895  0.67170343
## volatile.acidity       -0.25613089      1.000000000 -0.55249568
## citric.acid             0.67170343     -0.552495685  1.00000000
## residual.sugar          0.11477672      0.001917882  0.14357716
## chlorides               0.09370519      0.061297772  0.20382291
## free.sulfur.dioxide    -0.15379419     -0.010503827 -0.06097813
## total.sulfur.dioxide   -0.11318144      0.076470005  0.03553302
## density                 0.66804729      0.022026232  0.36494718
## pH                     -0.68297819      0.234937294 -0.54190414
## sulphates               0.18300566     -0.260986685  0.31277004
## alcohol                -0.06166827     -0.202288027  0.10990325
## quality                 0.12405165     -0.390557780  0.22637251
##                      residual.sugar    chlorides free.sulfur.dioxide
## fixed.acidity           0.114776724  0.093705186        -0.153794193
## volatile.acidity        0.001917882  0.061297772        -0.010503827
## citric.acid             0.143577162  0.203822914        -0.060978129
## residual.sugar          1.000000000  0.055609535         0.187048995
## chlorides               0.055609535  1.000000000         0.005562147
## free.sulfur.dioxide     0.187048995  0.005562147         1.000000000
## total.sulfur.dioxide    0.203027882  0.047400468         0.667666450
## density                 0.355283371  0.200632327        -0.021945831
## pH                     -0.085652422 -0.265026131         0.070377499
## sulphates               0.005527121  0.371260481         0.051657572
## alcohol                 0.042075437 -0.221140545        -0.069408354
## quality                 0.013731637 -0.128906560        -0.050656057
##                      total.sulfur.dioxide     density          pH
## fixed.acidity                 -0.11318144  0.66804729 -0.68297819
## volatile.acidity               0.07647000  0.02202623  0.23493729
## citric.acid                    0.03553302  0.36494718 -0.54190414
## residual.sugar                 0.20302788  0.35528337 -0.08565242
## chlorides                      0.04740047  0.20063233 -0.26502613
## free.sulfur.dioxide            0.66766645 -0.02194583  0.07037750
## total.sulfur.dioxide           1.00000000  0.07126948 -0.06649456
## density                        0.07126948  1.00000000 -0.34169933
## pH                            -0.06649456 -0.34169933  1.00000000
## sulphates                      0.04294684  0.14850641 -0.19664760
## alcohol                       -0.20565394 -0.49617977  0.20563251
## quality                       -0.18510029 -0.17491923 -0.05773139
##                         sulphates     alcohol     quality
## fixed.acidity         0.183005664 -0.06166827  0.12405165
## volatile.acidity     -0.260986685 -0.20228803 -0.39055778
## citric.acid           0.312770044  0.10990325  0.22637251
## residual.sugar        0.005527121  0.04207544  0.01373164
## chlorides             0.371260481 -0.22114054 -0.12890656
## free.sulfur.dioxide   0.051657572 -0.06940835 -0.05065606
## total.sulfur.dioxide  0.042946836 -0.20565394 -0.18510029
## density               0.148506412 -0.49617977 -0.17491923
## pH                   -0.196647602  0.20563251 -0.05773139
## sulphates             1.000000000  0.09359475  0.25139708
## alcohol               0.093594750  1.00000000  0.47616632
## quality               0.251397079  0.47616632  1.00000000
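
The quality column of this matrix is the part that drives the choice of model features, and it is easier to read when sorted. The sketch below copies the values from the output above into a named vector. The cutoff of 0.1 is an assumption: the report does not state one, but this value happens to reproduce the two feature groups of Section 5.

```r
# Correlation of each input with quality, copied from the matrix above.
quality_cor <- c(
  fixed.acidity = 0.12405165, volatile.acidity = -0.39055778,
  citric.acid = 0.22637251, residual.sugar = 0.01373164,
  chlorides = -0.12890656, free.sulfur.dioxide = -0.05065606,
  total.sulfur.dioxide = -0.18510029, density = -0.17491923,
  pH = -0.05773139, sulphates = 0.25139708, alcohol = 0.47616632
)

# Rank from most positive to most negative.
ranked <- sort(quality_cor, decreasing = TRUE)
print(round(ranked, 3))

# Assumed cutoff: keep features with |r| above 0.1.
positive <- names(ranked)[ranked > 0.1]
negative <- names(ranked)[ranked < -0.1]
print(positive)
print(negative)
```

With the data loaded, the same vector can be obtained as `cor(wine.df)[, "quality"]` after dropping the `quality` entry itself.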

Plotting Corrgram:

library(corrgram)
corrgram(wine.df, order=TRUE, lower.panel=panel.shade,
         upper.panel=panel.pie, text.panel=panel.txt,
         main="Corrgram of wine content")

Applying OLS regression on all the positively correlated features:

fit1<-lm(wine.df$quality ~ wine.df$sulphates + wine.df$citric.acid + wine.df$alcohol + wine.df$fixed.acidity)
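
The same model can also be written with a `data` argument, which shortens the coefficient names and allows `predict()` to be used on new observations. The sketch below illustrates this on synthetic data; `demo.df`, its generating formula and `new_wine` are assumptions made for the example, so the numbers it prints are not results about wine.

```r
# Synthetic stand-in for wine.df with the four predictors of fit1.
set.seed(42)
n <- 200
demo.df <- data.frame(
  sulphates     = runif(n, 0.33, 2.0),
  citric.acid   = runif(n, 0, 1),
  alcohol       = runif(n, 8.4, 14.9),
  fixed.acidity = runif(n, 4.6, 15.9)
)
demo.df$quality <- 1 + 0.8 * demo.df$sulphates + 0.3 * demo.df$alcohol +
  rnorm(n, sd = 0.7)

# Form used in this report: columns referenced with `$`.
fit_dollar <- lm(demo.df$quality ~ demo.df$sulphates + demo.df$citric.acid +
                   demo.df$alcohol + demo.df$fixed.acidity)

# Equivalent form with a data argument.
fit_data <- lm(quality ~ sulphates + citric.acid + alcohol + fixed.acidity,
               data = demo.df)

# Same estimates; only the coefficient names differ.
same_fit <- isTRUE(all.equal(unname(coef(fit_dollar)), unname(coef(fit_data))))
print(same_fit)

# Prediction for a hypothetical new wine.
new_wine <- data.frame(sulphates = 0.65, citric.acid = 0.27,
                       alcohol = 10.4, fixed.acidity = 8.3)
pred <- predict(fit_data, newdata = new_wine)
print(pred)
```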

Summary stats of model:

summary(fit1)
## 
## Call:
## lm(formula = wine.df$quality ~ wine.df$sulphates + wine.df$citric.acid + 
##     wine.df$alcohol + wine.df$fixed.acidity)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.73513 -0.35206 -0.08876  0.51102  2.16716 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            1.13822    0.21444   5.308 1.26e-07 ***
## wine.df$sulphates      0.82108    0.10639   7.718 2.08e-14 ***
## wine.df$citric.acid    0.31205    0.12479   2.501   0.0125 *  
## wine.df$alcohol        0.34562    0.01644  21.023  < 2e-16 ***
## wine.df$fixed.acidity  0.03250    0.01348   2.411   0.0160 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6831 on 1594 degrees of freedom
## Multiple R-squared:  0.2862, Adjusted R-squared:  0.2844 
## F-statistic: 159.8 on 4 and 1594 DF,  p-value: < 2.2e-16

Displaying the features whose coefficients are significant at the 5% level (all feature estimates are positive):

which(summary(fit1)$coefficients[,4]<0.05)
##           (Intercept)     wine.df$sulphates   wine.df$citric.acid 
##                     1                     2                     3 
##       wine.df$alcohol wine.df$fixed.acidity 
##                     4                     5
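
The selection above also returns the intercept and does not show how close each p-value is to the threshold. The sketch below repeats the test on the p-values printed in the summary of fit1, leaves out the intercept, and compares the 5% and 1% levels. The value for alcohol is reported only as "< 2e-16", so 2e-16 is used as an upper bound (an assumption).

```r
# p-values copied from the summary of fit1; alcohol is an upper bound.
p_values <- c("(Intercept)" = 1.26e-07, sulphates = 2.08e-14,
              citric.acid = 0.0125, alcohol = 2e-16, fixed.acidity = 0.0160)

# Drop the intercept so that only features are tested.
features <- setdiff(names(p_values), "(Intercept)")

sig_05 <- names(which(p_values[features] < 0.05))
sig_01 <- names(which(p_values[features] < 0.01))

print(sig_05)
print(sig_01)
```

Citric acid and fixed acidity pass at the 5% level only, which matches the single star next to them in the summary table.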

Displaying Confidence Interval:

confint(fit1)
##                             2.5 %     97.5 %
## (Intercept)           0.717615117 1.55882995
## wine.df$sulphates     0.612398492 1.02975720
## wine.df$citric.acid   0.067279900 0.55682396
## wine.df$alcohol       0.313376995 0.37787069
## wine.df$fixed.acidity 0.006055191 0.05895211
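
These intervals follow the usual form estimate ± t × standard error, with the t quantile taken at the residual degrees of freedom. As a check, the sketch below rebuilds them from the rounded estimates and standard errors printed in the summary of fit1, so small differences in the last digits are expected.

```r
# Estimates and standard errors copied from the summary of fit1.
est <- c(sulphates = 0.82108, citric.acid = 0.31205,
         alcohol = 0.34562, fixed.acidity = 0.03250)
se  <- c(0.10639, 0.12479, 0.01644, 0.01348)

# Residual degrees of freedom reported in the summary.
df_resid <- 1594

# Two-sided 95% critical value of the t distribution.
t_crit <- qt(0.975, df_resid)

ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(round(ci, 3))
```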

Coefficients of OLS regression:

coefficients(fit1)
##           (Intercept)     wine.df$sulphates   wine.df$citric.acid 
##            1.13822254            0.82107785            0.31205193 
##       wine.df$alcohol wine.df$fixed.acidity 
##            0.34562384            0.03250365
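
As a worked reading of these coefficients, the sketch below plugs the sample means from the summary statistics into the fitted equation. An OLS fit with an intercept passes through the point of means, so the result should land close to the mean quality of 5.636, up to the rounding of the printed means.

```r
# Coefficients of fit1 as printed above.
coefs <- c(intercept = 1.13822254, sulphates = 0.82107785,
           citric.acid = 0.31205193, alcohol = 0.34562384,
           fixed.acidity = 0.03250365)

# Sample means copied from the summary statistics.
means <- c(sulphates = 0.6581, citric.acid = 0.271,
           alcohol = 10.42, fixed.acidity = 8.32)

# Predicted quality for a wine with average values of all four features.
predicted <- coefs[["intercept"]] + sum(coefs[names(means)] * means)
print(round(predicted, 3))

# The same wine with alcohol raised by one unit, other features unchanged.
one_more_alcohol <- predicted + coefs[["alcohol"]]
print(round(one_more_alcohol, 3))
```

Under this model, one additional unit of alcohol is associated with about 0.35 more quality points when the other three features are held fixed.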

Applying OLS regression on all the negatively correlated features:

fit2<-lm(wine.df$quality ~ wine.df$total.sulfur.dioxide + wine.df$chlorides + wine.df$density + wine.df$volatile.acidity)

Summary stats of model:

summary(fit2)
## 
## Call:
## lm(formula = wine.df$quality ~ wine.df$total.sulfur.dioxide + 
##     wine.df$chlorides + wine.df$density + wine.df$volatile.acidity)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.61237 -0.49798 -0.03126  0.45786  2.73440 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   6.749e+01  9.728e+00   6.938 5.78e-12 ***
## wine.df$total.sulfur.dioxide -3.514e-03  5.512e-04  -6.376 2.38e-10 ***
## wine.df$chlorides            -1.214e+00  3.918e-01  -3.099  0.00198 ** 
## wine.df$density              -6.090e+01  9.769e+00  -6.234 5.80e-10 ***
## wine.df$volatile.acidity     -1.678e+00  1.011e-01 -16.596  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7206 on 1594 degrees of freedom
## Multiple R-squared:  0.2059, Adjusted R-squared:  0.2039 
## F-statistic: 103.3 on 4 and 1594 DF,  p-value: < 2.2e-16
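
The adjusted R-squared and the F-statistic in the summaries of both fits follow directly from the multiple R-squared, the number of observations and the number of predictors. The sketch below recomputes them from the printed values; `fit_stats` is a hypothetical helper, and n = 1599 and k = 4 are taken from the data description and the model formulas.

```r
# Hypothetical helper: adjusted R-squared and F-statistic from R-squared.
fit_stats <- function(r2, n, k) {
  df_resid <- n - k - 1
  c(adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid,
    f_stat = (r2 / k) / ((1 - r2) / df_resid))
}

n <- 1599  # number of wines
k <- 4     # predictors in each model

# Multiple R-squared values copied from the two summaries.
fit1_stats <- fit_stats(0.2862, n, k)
fit2_stats <- fit_stats(0.2059, n, k)

print(round(rbind(fit1 = fit1_stats, fit2 = fit2_stats), 4))
```

Both models use the same number of predictors, so the higher adjusted R-squared of fit1 indicates that the positively correlated features explain more of the variation in quality than the negatively correlated ones.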

Displaying the features whose coefficients are significant at the 5% level (all feature estimates are negative):

which(summary(fit2)$coefficients[,4]<0.05)
##                  (Intercept) wine.df$total.sulfur.dioxide 
##                            1                            2 
##            wine.df$chlorides              wine.df$density 
##                            3                            4 
##     wine.df$volatile.acidity 
##                            5

Displaying Confidence Interval:

confint(fit2)
##                                      2.5 %        97.5 %
## (Intercept)                   48.410675818  86.574212573
## wine.df$total.sulfur.dioxide  -0.004595249  -0.002433104
## wine.df$chlorides             -1.982489690  -0.445612658
## wine.df$density              -80.059801286 -41.738565290
## wine.df$volatile.acidity      -1.876739502  -1.480000444

Coefficients of OLS regression:

coefficients(fit2)
##                  (Intercept) wine.df$total.sulfur.dioxide 
##                 67.492444196                 -0.003514176 
##            wine.df$chlorides              wine.df$density 
##                 -1.214051174                -60.899183288 
##     wine.df$volatile.acidity 
##                 -1.678369973
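
6. Result: Over the range of observations in the red-wine sample, the quality of wine is positively correlated with sulphates (r ≈ 0.25), citric acid (r ≈ 0.23) and fixed acidity (r ≈ 0.12). Alcohol shows the strongest positive correlation with quality (r ≈ 0.48). Conversely, quality is negatively correlated with total sulphur dioxide (r ≈ -0.19), density (r ≈ -0.17) and chlorides (r ≈ -0.13). Volatile acidity shows the strongest negative correlation with quality (r ≈ -0.39). In both regressions every feature coefficient is significant at the 5% level, but each model explains only part of the variation in quality (adjusted R-squared of 0.2844 and 0.2039, respectively).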

7. Conclusion: This paper was motivated by the need for research that could improve our understanding of the factors influencing the quality of wine. Its contribution is an investigation of how various chemical components of Vinho Verde wine relate to quality, based on the red-wine sample. We found that many chemical components are positively or negatively correlated with quality, so no single component determines it: quality depends on the right proportion of components.