Wine (from Latin vinum) is an alcoholic beverage made from grapes, generally Vitis vinifera, fermented without the addition of sugars, acids, enzymes, water, or other nutrients.
Wine has been produced for thousands of years. The earliest known traces of wine are from Georgia (c. 6000 BC), Iran (c. 5000 BC), and Sicily (c. 4000 BC), although there is evidence of a similar alcoholic beverage being consumed earlier in China (c. 7000 BC). The earliest known winery is the 6,100-year-old Areni-1 winery in Armenia. Wine reached the Balkans by 4500 BC and was consumed and celebrated in ancient Greece, Thrace and Rome. Throughout history, wine has been consumed for its intoxicating effects.
Wine has long played an important role in religion. Red wine was associated with blood by the ancient Egyptians and was used by both the Greek cult of Dionysus and the Romans in their Bacchanalia; Judaism also incorporates it in the Kiddush and Christianity in the Eucharist.
Yeast consumes the sugar in the grapes and converts it to ethanol and carbon dioxide. Different varieties of grapes and strains of yeasts produce different styles of wine. These variations result from the complex interactions between the biochemical development of the grape, the reactions involved in fermentation, the terroir, and the production process. Many countries enact legal appellations intended to define styles and qualities of wine. These typically restrict the geographical origin and permitted varieties of grapes, as well as other aspects of wine production. Wines not made from grapes include rice wine and fruit wines such as plum, cherry, pomegranate and elderberry.
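In simplified form, the conversion of grape sugar (glucose) described above is the textbook overall equation for alcoholic fermentation; real fermentations also produce minor by-products that this equation ignores:

```latex
\mathrm{C_6H_{12}O_6} \;\longrightarrow\; 2\,\mathrm{C_2H_5OH} + 2\,\mathrm{CO_2}
```

Because ethanol is less dense than water, converting sugar into alcohol lowers the density of the liquid. This is one reason alcohol and density are not independent in the data analysed in this report; the correlation matrix computed later shows them correlated at about -0.50.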
This paper addresses the following issues concerning the “quality of wine” with respect to its various chemical contents and acids. The first issue concerns the correlation between the different acids and the quality of wine. In particular, we investigate which parameter is most closely associated with the best-quality wine.
Our field study concerns the quality of wine, which is produced all over the world. Wine is an alcoholic beverage and is also considered part of a rich culture. Various health benefits are attributed to wine (http://www.wideopeneats.com/10-health-benefits-get-drinking-daily-glass-wine/). A large number of occupations and professions are part of the wine industry, ranging from the individuals who grow the grapes to those who prepare, bottle, sell, assess and market the wine, and finally those who make recommendations to clients and serve it. In this study, we identify the chemical components of wine, in particular the acids, that are most strongly correlated with its quality, and we aim to find out which components are needed, and in what ratio, to make quality wine.
3. Content:
3.1 Content 1: The two datasets are related to red and white variants of the Portuguese “Vinho Verde” wine. For more details, consult the reference [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).
3.2 Content 2: For more information, read [Cortez et al., 2009].
Input variables (based on physicochemical tests):
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol
Output variable (based on sensory data):
12 - quality (score between 0 and 10)
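The R output later in this report refers to these variables as fixed.acidity, total.sulfur.dioxide and so on. A minimal sketch of why, assuming the CSV header spells the names with spaces exactly as in the list above: read.csv() passes the header through make.names() by default (check.names = TRUE), which replaces each space with a dot.

```r
# Column names as listed above (assumed to match the CSV header spelling)
raw_names <- c("fixed acidity", "volatile acidity", "citric acid",
               "residual sugar", "chlorides", "free sulfur dioxide",
               "total sulfur dioxide", "density", "pH", "sulphates",
               "alcohol", "quality")

# make.names() turns each name into a syntactically valid R name;
# here that means spaces become dots, e.g. "fixed acidity" -> "fixed.acidity"
r_names <- make.names(raw_names)
print(r_names)
```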
4.1 Wine Quality Data Set
4.2 Abstract: Two datasets are included, related to red and white vinho verde wine samples, from the north of Portugal. The goal is to model wine quality based on physicochemical tests.
4.3 Data Set Characteristics: Multivariate
Number of Instances: 1599
Area: Business
Attribute Characteristics: Real
Number of Attributes: 12
Date Donated: 2009-10-07
Associated Tasks: Classification, Regression
Missing Values?: N/A
4.4 Source:
Paulo Cortez, University of Minho, Guimarães, Portugal, http://www3.dsi.uminho.pt/pcortez
A. Cerdeira, F. Almeida, T. Matos and J. Reis, Viticulture Commission of the Vinho Verde Region (CVRVV), Porto, Portugal, @2009
4.5 Data Set Information:
The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones).
Also, we are not sure if all input variables are relevant.
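Class imbalance is easy to check before modelling. A minimal sketch, where quality_balance() is a hypothetical helper and the scores are toy values for illustration only (not the real data); with the dataset loaded, the call would be quality_balance(wine.df$quality).

```r
# Hypothetical helper: share of samples at each quality score
quality_balance <- function(quality) {
  round(prop.table(table(quality)), 3)
}

# Toy scores for illustration only, NOT taken from the dataset
toy_quality <- c(5, 5, 6, 5, 6, 6, 5, 7, 4, 5, 6, 8)
quality_balance(toy_quality)
```

Scores that appear rarely (here 4, 7 and 8) are the kind of minority classes the dataset description warns about.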
5.1 Positively correlated features:
$$\text{Quality} = \beta_0 + \beta_1\,\text{sulphates} + \beta_2\,\text{citric acid} + \beta_3\,\text{alcohol} + \beta_4\,\text{fixed acidity} + \varepsilon$$
5.2 Negatively correlated features:
$$\text{Quality} = \beta_0 + \beta_1\,\text{total sulfur dioxide} + \beta_2\,\text{chlorides} + \beta_3\,\text{density} + \beta_4\,\text{volatile acidity} + \varepsilon$$
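Both models are fitted with R's lm(), which estimates the β coefficients by ordinary least squares (OLS). As a minimal sketch of what that estimate is, on simulated data rather than the wine data, the least-squares solution of the normal equations (XᵀX)β̂ = Xᵀy can be computed directly and compared with lm(). lm() itself uses a QR decomposition, but the estimates agree.

```r
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(n, sd = 0.2)   # simulated response

# Design matrix: a leading column of ones gives the intercept beta_0
X <- cbind(1, x1, x2)

# Normal equations: solve (X'X) beta = X'y
beta_hat <- solve(crossprod(X), crossprod(X, y))

fit <- lm(y ~ x1 + x2)
cbind(normal_equations = drop(beta_hat), lm = coef(fit))
```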
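# Note: the copy of this file distributed by UCI separates fields with ";"; if so, add sep = ";"
wine.df <- read.csv("winequality-red.csv")
View(wine.df)  # spreadsheet-style viewer, useful only in an interactive session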
Summary stats:
summary(wine.df)
## fixed.acidity volatile.acidity citric.acid residual.sugar
## Min. : 4.60 Min. :0.1200 Min. :0.000 Min. : 0.900
## 1st Qu.: 7.10 1st Qu.:0.3900 1st Qu.:0.090 1st Qu.: 1.900
## Median : 7.90 Median :0.5200 Median :0.260 Median : 2.200
## Mean : 8.32 Mean :0.5278 Mean :0.271 Mean : 2.539
## 3rd Qu.: 9.20 3rd Qu.:0.6400 3rd Qu.:0.420 3rd Qu.: 2.600
## Max. :15.90 Max. :1.5800 Max. :1.000 Max. :15.500
## chlorides free.sulfur.dioxide total.sulfur.dioxide
## Min. :0.01200 Min. : 1.00 Min. : 6.00
## 1st Qu.:0.07000 1st Qu.: 7.00 1st Qu.: 22.00
## Median :0.07900 Median :14.00 Median : 38.00
## Mean :0.08747 Mean :15.87 Mean : 46.47
## 3rd Qu.:0.09000 3rd Qu.:21.00 3rd Qu.: 62.00
## Max. :0.61100 Max. :72.00 Max. :289.00
## density pH sulphates alcohol
## Min. :0.9901 Min. :2.740 Min. :0.3300 Min. : 8.40
## 1st Qu.:0.9956 1st Qu.:3.210 1st Qu.:0.5500 1st Qu.: 9.50
## Median :0.9968 Median :3.310 Median :0.6200 Median :10.20
## Mean :0.9967 Mean :3.311 Mean :0.6581 Mean :10.42
## 3rd Qu.:0.9978 3rd Qu.:3.400 3rd Qu.:0.7300 3rd Qu.:11.10
## Max. :1.0037 Max. :4.010 Max. :2.0000 Max. :14.90
## quality
## Min. :3.000
## 1st Qu.:5.000
## Median :6.000
## Mean :5.636
## 3rd Qu.:6.000
## Max. :8.000
Plotting each variable:
hist(wine.df$fixed.acidity, main = "Distribution of Fixed Acidity", xlab = "Fixed acidity", ylab = "Frequency", col = "grey")
hist(wine.df$volatile.acidity, main = "Distribution of Volatile Acidity", xlab = "Volatile acidity", ylab = "Frequency", col = "grey")
hist(wine.df$citric.acid, main = "Distribution of Citric Acid", xlab = "Citric acid", ylab = "Frequency", col = "grey")
hist(wine.df$residual.sugar, main = "Distribution of Residual Sugar", xlab = "Residual sugar", ylab = "Frequency", col = "grey")
hist(wine.df$chlorides, main = "Distribution of Chlorides", xlab = "Chlorides", ylab = "Frequency", col = "grey")
hist(wine.df$free.sulfur.dioxide, main = "Distribution of Free Sulphur Dioxide", xlab = "Free sulphur dioxide", ylab = "Frequency", col = "grey")
hist(wine.df$total.sulfur.dioxide, main = "Distribution of Total Sulphur Dioxide", xlab = "Total sulphur dioxide", ylab = "Frequency", col = "grey")
hist(wine.df$density, main = "Distribution of Density", xlab = "Density", ylab = "Frequency", col = "grey")
hist(wine.df$pH, main = "Distribution of pH", xlab = "pH", ylab = "Frequency", col = "grey")
hist(wine.df$sulphates, main = "Distribution of Sulphates", xlab = "Sulphates", ylab = "Frequency", col = "grey")
hist(wine.df$alcohol, main = "Distribution of Alcohol", xlab = "Alcohol", ylab = "Frequency", col = "grey")
hist(wine.df$quality, main = "Distribution of Quality", xlab = "Quality", ylab = "Frequency", col = "grey")
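The twelve calls above differ only in the column and its label, so they can be collapsed into a loop. A minimal sketch, where plot_histograms() is a hypothetical helper and the toy data frame stands in for the real data (with the dataset loaded, the call would be plot_histograms(wine.df)):

```r
# Hypothetical helper: one grey histogram per column, arranged in a grid
plot_histograms <- function(df, n_cols = 4) {
  n_rows  <- ceiling(ncol(df) / n_cols)
  old_par <- par(mfrow = c(n_rows, n_cols))
  on.exit(par(old_par))            # restore the previous plot layout afterwards
  for (v in names(df)) {
    hist(df[[v]], main = paste("Distribution of", v),
         xlab = v, ylab = "Frequency", col = "grey")
  }
  invisible(names(df))             # return the plotted column names
}

# Toy data for illustration only, NOT the dataset
toy <- data.frame(alcohol = c(9.4, 9.8, 10.2, 11.0, 12.5),
                  pH      = c(3.1, 3.3, 3.3, 3.4, 3.5))
plot_histograms(toy, n_cols = 2)
```

The titles use the raw column names (e.g. "Distribution of fixed.acidity"); a named vector of labels could be passed in if the hand-written titles above are preferred.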
Boxplot of each variable:
boxplot(wine.df, main = "Distribution of each variable", xlab = "Variable", ylab = "Value")
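The variables are on very different scales (per the summary above, total sulfur dioxide reaches 289 while density stays near 1), so a single boxplot of the raw columns is dominated by the largest ones. One common alternative, sketched here as a suggestion rather than part of the original analysis, is to standardise each column first. The toy values below are the minimum, quartiles and maximum from the summary, used only as illustration; with the real data the call would be standardise(wine.df).

```r
# Hypothetical helper: centre each column on its mean and divide by its SD,
# so every variable is expressed in standard-deviation units
standardise <- function(df) as.data.frame(scale(df))

# Toy data built from summary(wine.df) quantiles, for illustration only
toy <- data.frame(total.sulfur.dioxide = c(6, 22, 38, 62, 289),
                  density              = c(0.9901, 0.9956, 0.9968, 0.9978, 1.0037))
z <- standardise(toy)
boxplot(z, main = "Standardised distribution of each variable",
        xlab = "Variable", ylab = "Standard deviations from the mean")
```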
Generating Correlation matrix:
cor(wine.df)
## fixed.acidity volatile.acidity citric.acid
## fixed.acidity 1.00000000 -0.256130895 0.67170343
## volatile.acidity -0.25613089 1.000000000 -0.55249568
## citric.acid 0.67170343 -0.552495685 1.00000000
## residual.sugar 0.11477672 0.001917882 0.14357716
## chlorides 0.09370519 0.061297772 0.20382291
## free.sulfur.dioxide -0.15379419 -0.010503827 -0.06097813
## total.sulfur.dioxide -0.11318144 0.076470005 0.03553302
## density 0.66804729 0.022026232 0.36494718
## pH -0.68297819 0.234937294 -0.54190414
## sulphates 0.18300566 -0.260986685 0.31277004
## alcohol -0.06166827 -0.202288027 0.10990325
## quality 0.12405165 -0.390557780 0.22637251
## residual.sugar chlorides free.sulfur.dioxide
## fixed.acidity 0.114776724 0.093705186 -0.153794193
## volatile.acidity 0.001917882 0.061297772 -0.010503827
## citric.acid 0.143577162 0.203822914 -0.060978129
## residual.sugar 1.000000000 0.055609535 0.187048995
## chlorides 0.055609535 1.000000000 0.005562147
## free.sulfur.dioxide 0.187048995 0.005562147 1.000000000
## total.sulfur.dioxide 0.203027882 0.047400468 0.667666450
## density 0.355283371 0.200632327 -0.021945831
## pH -0.085652422 -0.265026131 0.070377499
## sulphates 0.005527121 0.371260481 0.051657572
## alcohol 0.042075437 -0.221140545 -0.069408354
## quality 0.013731637 -0.128906560 -0.050656057
## total.sulfur.dioxide density pH
## fixed.acidity -0.11318144 0.66804729 -0.68297819
## volatile.acidity 0.07647000 0.02202623 0.23493729
## citric.acid 0.03553302 0.36494718 -0.54190414
## residual.sugar 0.20302788 0.35528337 -0.08565242
## chlorides 0.04740047 0.20063233 -0.26502613
## free.sulfur.dioxide 0.66766645 -0.02194583 0.07037750
## total.sulfur.dioxide 1.00000000 0.07126948 -0.06649456
## density 0.07126948 1.00000000 -0.34169933
## pH -0.06649456 -0.34169933 1.00000000
## sulphates 0.04294684 0.14850641 -0.19664760
## alcohol -0.20565394 -0.49617977 0.20563251
## quality -0.18510029 -0.17491923 -0.05773139
## sulphates alcohol quality
## fixed.acidity 0.183005664 -0.06166827 0.12405165
## volatile.acidity -0.260986685 -0.20228803 -0.39055778
## citric.acid 0.312770044 0.10990325 0.22637251
## residual.sugar 0.005527121 0.04207544 0.01373164
## chlorides 0.371260481 -0.22114054 -0.12890656
## free.sulfur.dioxide 0.051657572 -0.06940835 -0.05065606
## total.sulfur.dioxide 0.042946836 -0.20565394 -0.18510029
## density 0.148506412 -0.49617977 -0.17491923
## pH -0.196647602 0.20563251 -0.05773139
## sulphates 1.000000000 0.09359475 0.25139708
## alcohol 0.093594750 1.00000000 0.47616632
## quality 0.251397079 0.47616632 1.00000000
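The regressions that follow each use four features. One way to reproduce that selection, sketched here as an assumption rather than as the authors' stated procedure, is to rank the inputs by the absolute value of their correlation with quality and keep the four strongest of each sign. The values below are copied from the quality column of the matrix above; with the data loaded, the same vector is cor(wine.df)[, "quality"] without its "quality" entry.

```r
# Correlation of each input with quality, copied from cor(wine.df) above
r_quality <- c(fixed.acidity = 0.12405165, volatile.acidity = -0.39055778,
               citric.acid = 0.22637251, residual.sugar = 0.01373164,
               chlorides = -0.12890656, free.sulfur.dioxide = -0.05065606,
               total.sulfur.dioxide = -0.18510029, density = -0.17491923,
               pH = -0.05773139, sulphates = 0.25139708, alcohol = 0.47616632)

# Strongest association first, regardless of sign
ranked   <- r_quality[order(abs(r_quality), decreasing = TRUE)]
positive <- names(ranked)[ranked > 0]
negative <- names(ranked)[ranked < 0]
round(ranked, 3)
head(positive, 4)
head(negative, 4)
```

The four strongest positive features (alcohol, sulphates, citric acid, fixed acidity) and the four strongest negative ones (volatile acidity, total sulfur dioxide, density, chlorides) are exactly the predictors used in fit1 and fit2.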
Plotting Corrgram:
library(corrgram)
corrgram(wine.df, order=TRUE, lower.panel=panel.shade,
upper.panel=panel.pie, text.panel=panel.txt,
main="Corrgram of wine content")
Applying OLS regression on the four most strongly positively correlated features:
fit1<-lm(wine.df$quality ~ wine.df$sulphates + wine.df$citric.acid + wine.df$alcohol + wine.df$fixed.acidity)
Summary stats of model:
summary(fit1)
##
## Call:
## lm(formula = wine.df$quality ~ wine.df$sulphates + wine.df$citric.acid +
## wine.df$alcohol + wine.df$fixed.acidity)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.73513 -0.35206 -0.08876 0.51102 2.16716
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.13822 0.21444 5.308 1.26e-07 ***
## wine.df$sulphates 0.82108 0.10639 7.718 2.08e-14 ***
## wine.df$citric.acid 0.31205 0.12479 2.501 0.0125 *
## wine.df$alcohol 0.34562 0.01644 21.023 < 2e-16 ***
## wine.df$fixed.acidity 0.03250 0.01348 2.411 0.0160 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6831 on 1594 degrees of freedom
## Multiple R-squared: 0.2862, Adjusted R-squared: 0.2844
## F-statistic: 159.8 on 4 and 1594 DF, p-value: < 2.2e-16
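In fit1 the sulphates coefficient (0.82) is larger than the alcohol coefficient (0.35), yet alcohol has by far the larger t value. Raw coefficients depend on each variable's units (alcohol spans 8.4 to 14.9 in the summary, sulphates 0.33 to 2.0), so they are not directly comparable. A common remedy, not used in the original analysis, is to standardise the variables. A minimal sketch on simulated data (the means echo the summary; the spreads and effect sizes are assumptions), showing that the standardised coefficients equal β·sd(x)/sd(y):

```r
set.seed(42)
n <- 200
d <- data.frame(alcohol   = rnorm(n, mean = 10.4, sd = 1.1),
                sulphates = rnorm(n, mean = 0.66, sd = 0.17))
d$quality <- 1 + 0.35 * d$alcohol + 0.8 * d$sulphates + rnorm(n, sd = 0.6)

# Raw fit: coefficients are quality points per unit of each input
raw <- lm(quality ~ alcohol + sulphates, data = d)

# Standardised fit: every column rescaled to mean 0 and SD 1
std <- lm(quality ~ alcohol + sulphates, data = as.data.frame(scale(d)))

# Same numbers from the raw fit: beta_std = beta_raw * sd(x) / sd(y)
beta_std_by_hand <- coef(raw)[-1] * sapply(d[c("alcohol", "sulphates")], sd) / sd(d$quality)
cbind(by_hand = beta_std_by_hand, lm_scaled = coef(std)[-1])
```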
Displaying the terms whose coefficients are significant at the 5% level:
which(summary(fit1)$coefficients[,4]<0.05)
## (Intercept) wine.df$sulphates wine.df$citric.acid
## 1 2 3
## wine.df$alcohol wine.df$fixed.acidity
## 4 5
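Note that which() returns positions in the coefficient table, and that the test is on p-values, so it lists terms that are statistically significant at the 5% level rather than terms with a positive sign. A minimal sketch that returns the term names instead and drops the intercept; significant_terms() is a hypothetical helper and the data are simulated:

```r
set.seed(7)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 * x1 + rnorm(n)          # only x1 truly affects y in this simulation
fit <- lm(y ~ x1 + x2)

# Hypothetical helper: names of non-intercept terms with p-value below alpha
significant_terms <- function(model, alpha = 0.05) {
  p <- summary(model)$coefficients[, "Pr(>|t|)"]
  setdiff(names(p)[p < alpha], "(Intercept)")
}
significant_terms(fit)
```

Applied to the model above, significant_terms(fit1) would return the four wine.df$... predictor names.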
Displaying 95% confidence intervals:
confint(fit1)
## 2.5 % 97.5 %
## (Intercept) 0.717615117 1.55882995
## wine.df$sulphates 0.612398492 1.02975720
## wine.df$citric.acid 0.067279900 0.55682396
## wine.df$alcohol 0.313376995 0.37787069
## wine.df$fixed.acidity 0.006055191 0.05895211
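Each interval is the estimate plus or minus a t critical value times its standard error. For sulphates, 0.82108 ± 1.96 × 0.10639 gives roughly (0.612, 1.030), matching the interval above. A minimal sketch on simulated data confirming that this formula reproduces confint():

```r
set.seed(3)
x <- rnorm(50)
y <- 1 + 0.5 * x + rnorm(50, sd = 0.3)
fit <- lm(y ~ x)

est    <- coef(summary(fit))[, "Estimate"]
se     <- coef(summary(fit))[, "Std. Error"]
t_crit <- qt(0.975, df = df.residual(fit))   # two-sided 95% critical value

by_hand <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
by_hand
confint(fit)
```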
Coefficients of the OLS regression:
coefficients(fit1)
## (Intercept) wine.df$sulphates wine.df$citric.acid
## 1.13822254 0.82107785 0.31205193
## wine.df$alcohol wine.df$fixed.acidity
## 0.34562384 0.03250365
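These coefficients can be used directly to predict quality. As a worked example, consider a hypothetical wine whose inputs sit at the median of each variable (medians taken from summary(wine.df) above):

```r
# Coefficients of fit1, copied from the output above
b <- c(intercept = 1.13822254, sulphates = 0.82107785, citric.acid = 0.31205193,
       alcohol = 0.34562384, fixed.acidity = 0.03250365)

# Hypothetical wine at the median of each input
x <- c(sulphates = 0.62, citric.acid = 0.26, alcohol = 10.2, fixed.acidity = 7.9)

# Intercept plus the sum of coefficient * value over the four inputs
predicted <- b[["intercept"]] + sum(b[names(x)] * x)
round(predicted, 2)   # about 5.51
```

Because fit1 was specified with wine.df$... terms rather than data = wine.df, predict(fit1, newdata = ...) would ignore the new data and return the fitted values instead; refitting as lm(quality ~ sulphates + citric.acid + alcohol + fixed.acidity, data = wine.df) makes predict() usable with new wines.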
Applying OLS regression on the four most strongly negatively correlated features:
fit2<-lm(wine.df$quality ~ wine.df$total.sulfur.dioxide + wine.df$chlorides + wine.df$density + wine.df$volatile.acidity)
Summary stats of model:
summary(fit2)
##
## Call:
## lm(formula = wine.df$quality ~ wine.df$total.sulfur.dioxide +
## wine.df$chlorides + wine.df$density + wine.df$volatile.acidity)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.61237 -0.49798 -0.03126 0.45786 2.73440
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.749e+01 9.728e+00 6.938 5.78e-12 ***
## wine.df$total.sulfur.dioxide -3.514e-03 5.512e-04 -6.376 2.38e-10 ***
## wine.df$chlorides -1.214e+00 3.918e-01 -3.099 0.00198 **
## wine.df$density -6.090e+01 9.769e+00 -6.234 5.80e-10 ***
## wine.df$volatile.acidity -1.678e+00 1.011e-01 -16.596 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7206 on 1594 degrees of freedom
## Multiple R-squared: 0.2059, Adjusted R-squared: 0.2039
## F-statistic: 103.3 on 4 and 1594 DF, p-value: < 2.2e-16
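Several of these predictors are correlated with each other (in the correlation matrix above, density correlates with fixed acidity at about 0.67 and with alcohol at about -0.50), which can inflate coefficient standard errors. A common diagnostic, not part of the original analysis, is the variance inflation factor VIF_j = 1/(1 - R²_j), where R²_j comes from regressing predictor j on the others. A base-R sketch, where vif_manual() is a hypothetical helper and the data are simulated:

```r
# Hypothetical helper: VIF of each column, regressing it on all the others
vif_manual <- function(df) {
  sapply(names(df), function(v) {
    others <- setdiff(names(df), v)
    r2 <- summary(lm(reformulate(others, response = v), data = df))$r.squared
    1 / (1 - r2)
  })
}

set.seed(10)
n  <- 200
x1 <- rnorm(n)
toy <- data.frame(x1 = x1,
                  x2 = x1 + rnorm(n, sd = 0.1),  # nearly a copy of x1
                  x3 = rnorm(n))                 # unrelated to x1 and x2
round(vif_manual(toy), 1)
```

On the real data the call would be vif_manual(wine.df[c("total.sulfur.dioxide", "chlorides", "density", "volatile.acidity")]); values well above 5 to 10 are often read as a warning sign. The car package's vif() provides a packaged version.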
Displaying the terms whose coefficients are significant at the 5% level:
which(summary(fit2)$coefficients[,4]<0.05)
## (Intercept) wine.df$total.sulfur.dioxide
## 1 2
## wine.df$chlorides wine.df$density
## 3 4
## wine.df$volatile.acidity
## 5
Displaying 95% confidence intervals:
confint(fit2)
## 2.5 % 97.5 %
## (Intercept) 48.410675818 86.574212573
## wine.df$total.sulfur.dioxide -0.004595249 -0.002433104
## wine.df$chlorides -1.982489690 -0.445612658
## wine.df$density -80.059801286 -41.738565290
## wine.df$volatile.acidity -1.876739502 -1.480000444
Coefficients of the OLS regression:
coefficients(fit2)
## (Intercept) wine.df$total.sulfur.dioxide
## 67.492444196 -0.003514176
## wine.df$chlorides wine.df$density
## -1.214051174 -60.899183288
## wine.df$volatile.acidity
## -1.678369973
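The density coefficient (about -60.9) looks large only because density barely varies. Assuming density is recorded in g/cm³, as in the dataset's documentation, a short worked calculation puts it on a realistic scale, using the minimum and maximum from summary(wine.df):

```r
# Density coefficient of fit2, copied from the output above
b_density <- -60.899183288

per_step   <- b_density * 0.001              # change in quality per 0.001 g/cm^3
full_range <- b_density * (1.0037 - 0.9901)  # over the observed min-to-max span
round(c(per_step = per_step, full_range = full_range), 3)
```

So moving across the whole observed density range shifts predicted quality by less than one point, holding the other three inputs fixed.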
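Result: Across the observed range of the data, the correlation matrix and both regressions indicate that wine quality is positively correlated with sulphates (r ≈ 0.25), citric acid (r ≈ 0.23) and fixed acidity (r ≈ 0.12). Alcohol has the strongest positive correlation with quality (r ≈ 0.48). Conversely, quality is negatively correlated with total sulphur dioxide (r ≈ -0.19), density (r ≈ -0.17) and chlorides (r ≈ -0.13), and volatile acidity has the strongest negative correlation (r ≈ -0.39). Each four-variable model explains only part of the variation in quality (adjusted R² of 0.284 for the positive model and 0.204 for the negative one).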
Conclusion: This paper was motivated by the need for research that could improve our understanding of the factors influencing the quality of wine. Its contribution is an investigation of how the various chemical components of red Vinho Verde wine relate to its quality. We found that, although many chemical components are positively or negatively correlated with wine quality, good wine depends on having these components in the right proportions.