In this project, we use R to apply exploratory data analysis techniques to the dataset, discover relationships among multiple variables, and create explanatory visualizations that illuminate distributions, outliers, and anomalies.
The dataset
Red Wine Quality
This dataset is publicly available for research. The details are described in [Cortez et al., 2009].
P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.
Available at: Elsevier, Pre-press (pdf), bib.
This tidy data set contains 1,599 red wines with 11 variables on the chemical properties of the wine. At least 3 wine experts rated the quality of each wine, providing a rating between 0 (very bad) and 10 (very excellent).
Attribute information:
For more information, read [Cortez et al., 2009].
Input variables (based on physicochemical tests):
1 - fixed acidity (tartaric acid - g / dm^3)
2 - volatile acidity (acetic acid - g / dm^3)
3 - citric acid (g / dm^3)
4 - residual sugar (g / dm^3)
5 - chlorides (sodium chloride - g / dm^3)
6 - free sulfur dioxide (mg / dm^3)
7 - total sulfur dioxide (mg / dm^3)
8 - density (g / cm^3)
9 - pH
10 - sulphates (potassium sulphate - g / dm^3)
11 - alcohol (% by volume)
Output variable (based on sensory data):
12 - quality (score between 0 and 10)
Missing Attribute Values: None
Description of attributes:
1 - fixed acidity: most acids involved with wine are fixed or nonvolatile (do not evaporate readily).
2 - volatile acidity: the amount of acetic acid in wine, which at too high of levels can lead to an unpleasant, vinegar taste.
3 - citric acid: found in small quantities, citric acid can add ‘freshness’ and flavor to wines.
4 - residual sugar: the amount of sugar remaining after fermentation stops; it’s rare to find wines with less than 1 gram/liter, and wines with greater than 45 grams/liter are considered sweet.
5 - chlorides: the amount of salt in the wine.
6 - free sulfur dioxide: the free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine.
7 - total sulfur dioxide: amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine.
8 - density: the density of wine is close to that of water, depending on the percent alcohol and sugar content.
9 - pH: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3-4 on the pH scale.
10 - sulphates: a wine additive which can contribute to sulfur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant.
11 - alcohol: the percent alcohol content of the wine.
Output variable (based on sensory data):
12 - quality (score between 0 and 10).
## 'data.frame': 1599 obs. of 12 variables:
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity : num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ residual.sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.sulfur.dioxide : num 11 25 15 17 11 13 15 15 9 17 ...
## $ total.sulfur.dioxide: num 34 67 54 60 34 40 59 21 18 102 ...
## $ density : num 0.998 0.997 0.997 0.998 0.998 ...
## $ pH : num 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : Ord.factor w/ 6 levels "3"<"4"<"5"<"6"<..: 3 3 3 4 3 3 3 5 5 3 ...
## fixed.acidity volatile.acidity citric.acid residual.sugar
## Min. : 4.60 Min. :0.1200 Min. :0.000 Min. : 0.900
## 1st Qu.: 7.10 1st Qu.:0.3900 1st Qu.:0.090 1st Qu.: 1.900
## Median : 7.90 Median :0.5200 Median :0.260 Median : 2.200
## Mean : 8.32 Mean :0.5278 Mean :0.271 Mean : 2.539
## 3rd Qu.: 9.20 3rd Qu.:0.6400 3rd Qu.:0.420 3rd Qu.: 2.600
## Max. :15.90 Max. :1.5800 Max. :1.000 Max. :15.500
## chlorides free.sulfur.dioxide total.sulfur.dioxide
## Min. :0.01200 Min. : 1.00 Min. : 6.00
## 1st Qu.:0.07000 1st Qu.: 7.00 1st Qu.: 22.00
## Median :0.07900 Median :14.00 Median : 38.00
## Mean :0.08747 Mean :15.87 Mean : 46.47
## 3rd Qu.:0.09000 3rd Qu.:21.00 3rd Qu.: 62.00
## Max. :0.61100 Max. :72.00 Max. :289.00
## density pH sulphates alcohol quality
## Min. :0.9901 Min. :2.740 Min. :0.3300 Min. : 8.40 3: 10
## 1st Qu.:0.9956 1st Qu.:3.210 1st Qu.:0.5500 1st Qu.: 9.50 4: 53
## Median :0.9968 Median :3.310 Median :0.6200 Median :10.20 5:681
## Mean :0.9967 Mean :3.311 Mean :0.6581 Mean :10.42 6:638
## 3rd Qu.:0.9978 3rd Qu.:3.400 3rd Qu.:0.7300 3rd Qu.:11.10 7:199
## Max. :1.0037 Max. :4.010 Max. :2.0000 Max. :14.90 8: 18
There are 1,599 observations and 12 variables in this dataset. All variables are numeric except quality, which is an ordered factor.
We want to understand which variables have the most influence on the quality of the wine. Let’s start with the quality variable.
Quality
## 3 4 5 6 7 8
## 10 53 681 638 199 18
Quality is roughly bell-shaped and concentrated around 5 and 6, with far fewer wines at the low and high ends. The observed range of quality is [3, 8], which means no wine is rated worse than 3 or better than 8.
Then we will investigate attributes individually.
Fixed Acidity
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.60 7.10 7.90 8.32 9.20 15.90
The median fixed acidity is 7.9, with a peak around 7. The distribution is right-skewed, with some outliers in the higher range.
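The "outliers in the higher range" can be checked against the summary alone. The sketch below applies the common 1.5 x IQR boxplot rule to the fixed acidity quartiles printed above; the choice of that rule is an assumption, since the report does not say how outliers were identified.

```r
# Quartiles and extremes for fixed acidity, copied from the summary output above.
min_value <- 4.60
q1        <- 7.10
q3        <- 9.20
max_value <- 15.90

# Assumption: outliers are defined by the usual 1.5 * IQR boxplot fences.
iqr         <- q3 - q1
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Compare the observed extremes with the fences.
has_low_outliers  <- min_value < lower_fence
has_high_outliers <- max_value > upper_fence

cat("Fences:", lower_fence, "to", upper_fence, "\n")
cat("Low outliers:", has_low_outliers, "| High outliers:", has_high_outliers, "\n")
```

Under this rule the maximum lies beyond the upper fence while the minimum stays inside the lower one, which agrees with outliers appearing only on the high side.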
Volatile Acidity
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1200 0.3900 0.5200 0.5278 0.6400 1.5800
The median volatile acidity is 0.52. The distribution is asymmetric and bimodal, with two peaks around 0.4 and 0.6, and some outliers in the higher range.
Citric Acid
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.090 0.260 0.271 0.420 1.000
The median citric acid is 0.26, with a peak around 0. The distribution is right-skewed, with some outliers around 1.
Residual Sugar
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.900 1.900 2.200 2.539 2.600 15.500
The median residual sugar is 2.2, with a high peak around 2. The distribution has a very long right tail with many outliers.
Chlorides
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01200 0.07000 0.07900 0.08747 0.09000 0.61100
The amount of chlorides in the wines has a median value of 0.079. Most wines have between 0.07 and 0.10 g / dm^3 of chlorides. The distribution looks roughly normal with a long right tail, and there are some outliers in the higher range.
Free Sulfur Dioxide
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 7.00 14.00 15.87 21.00 72.00
The median free sulfur dioxide is 14, with a high peak around 7. The distribution is right-skewed, with some outliers in the higher range.
Total Sulfur Dioxide
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.00 22.00 38.00 46.47 62.00 289.00
The median total sulfur dioxide is 38, with a peak around 30. The distribution is right-skewed, with some outliers in the higher range.
Density
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9901 0.9956 0.9968 0.9967 0.9978 1.0037
The median density is 0.9968, almost identical to the mean of 0.9967. The distribution of density looks approximately normal.
PH
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.740 3.210 3.310 3.311 3.400 4.010
The median pH is 3.31. The distribution of pH looks approximately normal.
Sulphates
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3300 0.5500 0.6200 0.6581 0.7300 2.0000
The median for sulphates is 0.62. The distribution is slightly right-skewed, with many outliers.
Alcohol
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.50 10.20 10.42 11.10 14.90
The median alcohol content is 10.2%, with a peak around 9.5%. The distribution is right-skewed, with some outliers. The range of alcohol in the red wines is [8.4, 14.9], which means no wine has less than 8.4% or more than 14.9% alcohol. Most wines have less than 11% alcohol.
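One quick way to cross-check the skewness remarks in this section is to compare each mean with its median. The sketch below uses only the quartiles, medians, and means printed in the summaries above and scales the gap by the interquartile range. This ratio is a rough heuristic chosen for illustration, not a formal skewness statistic and not necessarily how the plots were read.

```r
# Summary statistics copied from the output above (q1, median, mean, q3).
summ <- data.frame(
  variable = c("fixed.acidity", "volatile.acidity", "citric.acid",
               "residual.sugar", "chlorides", "free.sulfur.dioxide",
               "total.sulfur.dioxide", "density", "pH", "sulphates", "alcohol"),
  q1     = c(7.10, 0.39, 0.09, 1.9, 0.07, 7, 22, 0.9956, 3.21, 0.55, 9.5),
  median = c(7.90, 0.52, 0.26, 2.2, 0.079, 14, 38, 0.9968, 3.31, 0.62, 10.2),
  mean   = c(8.32, 0.5278, 0.271, 2.539, 0.08747, 15.87, 46.47, 0.9967,
             3.311, 0.6581, 10.42),
  q3     = c(9.20, 0.64, 0.42, 2.6, 0.09, 21, 62, 0.9978, 3.40, 0.73, 11.1),
  stringsAsFactors = FALSE
)

# Heuristic: (mean - median) / IQR. Larger positive values hint at a longer right tail.
summ$gap <- (summ$mean - summ$median) / (summ$q3 - summ$q1)

# Sort from the most right-leaning variable to the least.
skew_check <- summ[order(summ$gap, decreasing = TRUE), c("variable", "gap")]
print(skew_check, row.names = FALSE, digits = 2)
```

Residual sugar and chlorides come out on top, while pH and density sit near zero, which is broadly in line with the long tails and near-normal shapes described above.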
What is the structure of your dataset?
There are 1,599 red wine observations and 12 variables in this dataset, with no missing attribute values. All variables are numeric except quality, which is categorical. The 11 numeric variables represent physicochemical measurements: fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, and alcohol. The one categorical variable, quality, represents the rating of the red wine from 0 (very bad) to 10 (very excellent).
What is/are the main feature(s) of interest in your dataset?
The main feature of interest is the quality rating.
What other features in the dataset do you think will help support your investigation into your feature(s) of interest?
I think 5 features contribute to the quality of the wine:
And also in the end all of them are related to each other.
Did you create any new variables from existing variables in the dataset?
No, I did not create any new variables.
Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?
Some of the distributions had unusual peaks, such as citric acid, residual sugar, chlorides, free sulfur dioxide, and alcohol. Others had unusual outliers, such as total sulfur dioxide, chlorides, residual sugar, and sulphates.
The dataset was clean and tidy, with no missing values, so there was no need to adjust or change anything. I only needed to convert the quality variable from integer to an ordered factor, because quality is a categorical variable.
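A minimal sketch of that conversion is shown below. The ten ratings are the first ten quality values implied by the `str()` output above; the variable names `quality_int` and `quality` are hypothetical, and the report's actual code is not shown in the source.

```r
# First ten quality ratings, as implied by the str() output above
# (level codes 3 3 3 4 3 3 3 5 5 3 over the levels "3" < "4" < ... < "8").
quality_int <- c(5L, 5L, 5L, 6L, 5L, 5L, 5L, 7L, 7L, 5L)

# Convert the integer ratings to an ordered factor so that R treats them
# as ranked categories rather than as a continuous measurement.
quality <- factor(quality_int, levels = 3:8, ordered = TRUE)

str(quality)
table(quality)
```

Fixing `levels = 3:8` keeps all six observed rating levels even when a subset of the data happens to miss some of them.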
In this section, we look at the relationships between two variables at a time using plots.
From the correlation matrix, it seems that:
Quality has moderate correlations with alcohol and volatile.acidity: positive with alcohol and negative with volatile.acidity.
Alcohol has a moderate negative correlation with density.
pH has a strong negative correlation with fixed.acidity and a moderate negative correlation with citric.acid.
Density has a strong positive correlation with fixed.acidity.
Total.sulfur.dioxide has a strong positive correlation with free.sulfur.dioxide.
Citric.acid has a strong positive correlation with fixed.acidity and a moderate negative correlation with volatile.acidity.
The relationships between the other pairs of variables are weak, with absolute correlations all below 0.4.
The relationship between quality and the other variables.
Fixed Acidity, Residual Sugar, Chlorides, Free Sulfur Dioxide and Total Sulfur Dioxide all have correlation coefficients with quality between -0.2 and 0.2, which indicates that there is no clear linear relationship. That is also what we see in the boxplots: nothing very clear.
Citric Acid and Sulphates have a positive relationship with quality, with correlation coefficients of 0.23 and 0.25, which is a weak correlation. Both seem to have a positive impact on quality: the more citric acid or sulphates in the wine, the higher the wine tends to be rated.
pH and Density have a negative relationship with quality, with correlation coefficients of -0.06 and -0.17, which is a weak correlation. Both seem to have a negative impact on quality: the lower the pH or density of the wine, the higher the wine tends to be rated.
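To make the ordering of these relationships explicit, the sketch below ranks the variables by the absolute value of their correlation with quality. The coefficients are copied from the `quality` row of the correlation matrix printed in this section; the assumption here is that the matrix was computed with quality treated as a number.

```r
# Correlations with quality, copied from the correlation matrix in this section.
quality_cor <- c(
  fixed.acidity        =  0.12405165,
  volatile.acidity     = -0.39055778,
  citric.acid          =  0.22637251,
  residual.sugar       =  0.01373164,
  chlorides            = -0.12890656,
  free.sulfur.dioxide  = -0.05065606,
  total.sulfur.dioxide = -0.18510029,
  density              = -0.17491923,
  pH                   = -0.05773139,
  sulphates            =  0.25139708,
  alcohol              =  0.47616632
)

# Rank by strength of association, ignoring the sign.
ranked <- quality_cor[order(abs(quality_cor), decreasing = TRUE)]
print(round(ranked, 2))
```

Alcohol and volatile acidity lead the ranking, followed by sulphates and citric acid, which is the same order of importance the text describes.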
The relationships between pairs of variables with strong or moderate correlations.
## fixed.acidity volatile.acidity citric.acid
## fixed.acidity 1.00000000 -0.256130895 0.67170343
## volatile.acidity -0.25613089 1.000000000 -0.55249568
## citric.acid 0.67170343 -0.552495685 1.00000000
## residual.sugar 0.11477672 0.001917882 0.14357716
## chlorides 0.09370519 0.061297772 0.20382291
## free.sulfur.dioxide -0.15379419 -0.010503827 -0.06097813
## total.sulfur.dioxide -0.11318144 0.076470005 0.03553302
## density 0.66804729 0.022026232 0.36494718
## pH -0.68297819 0.234937294 -0.54190414
## sulphates 0.18300566 -0.260986685 0.31277004
## alcohol -0.06166827 -0.202288027 0.10990325
## quality 0.12405165 -0.390557780 0.22637251
## residual.sugar chlorides free.sulfur.dioxide
## fixed.acidity 0.114776724 0.093705186 -0.153794193
## volatile.acidity 0.001917882 0.061297772 -0.010503827
## citric.acid 0.143577162 0.203822914 -0.060978129
## residual.sugar 1.000000000 0.055609535 0.187048995
## chlorides 0.055609535 1.000000000 0.005562147
## free.sulfur.dioxide 0.187048995 0.005562147 1.000000000
## total.sulfur.dioxide 0.203027882 0.047400468 0.667666450
## density 0.355283371 0.200632327 -0.021945831
## pH -0.085652422 -0.265026131 0.070377499
## sulphates 0.005527121 0.371260481 0.051657572
## alcohol 0.042075437 -0.221140545 -0.069408354
## quality 0.013731637 -0.128906560 -0.050656057
## total.sulfur.dioxide density pH
## fixed.acidity -0.11318144 0.66804729 -0.68297819
## volatile.acidity 0.07647000 0.02202623 0.23493729
## citric.acid 0.03553302 0.36494718 -0.54190414
## residual.sugar 0.20302788 0.35528337 -0.08565242
## chlorides 0.04740047 0.20063233 -0.26502613
## free.sulfur.dioxide 0.66766645 -0.02194583 0.07037750
## total.sulfur.dioxide 1.00000000 0.07126948 -0.06649456
## density 0.07126948 1.00000000 -0.34169933
## pH -0.06649456 -0.34169933 1.00000000
## sulphates 0.04294684 0.14850641 -0.19664760
## alcohol -0.20565394 -0.49617977 0.20563251
## quality -0.18510029 -0.17491923 -0.05773139
## sulphates alcohol quality
## fixed.acidity 0.183005664 -0.06166827 0.12405165
## volatile.acidity -0.260986685 -0.20228803 -0.39055778
## citric.acid 0.312770044 0.10990325 0.22637251
## residual.sugar 0.005527121 0.04207544 0.01373164
## chlorides 0.371260481 -0.22114054 -0.12890656
## free.sulfur.dioxide 0.051657572 -0.06940835 -0.05065606
## total.sulfur.dioxide 0.042946836 -0.20565394 -0.18510029
## density 0.148506412 -0.49617977 -0.17491923
## pH -0.196647602 0.20563251 -0.05773139
## sulphates 1.000000000 0.09359475 0.25139708
## alcohol 0.093594750 1.00000000 0.47616632
## quality 0.251397079 0.47616632 1.00000000
Alcohol vs. Quality
The plot shows a positive relationship between alcohol and quality. The correlation coefficient is 0.48, which is a moderate correlation. Alcohol seems to have a positive impact on wine quality.
The more alcohol a wine contains, the higher it tends to be rated.
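The report does not show the code behind this plot, so the following is only a sketch of how a plot of this kind could be drawn with base R. The data frame `toy` is simulated purely for illustration (it is not the wine data), and the slope and noise level are arbitrary assumptions chosen to give a positive trend.

```r
set.seed(42)

# Simulated stand-in data (NOT the wine dataset): ratings 3..8 and an
# alcohol value that rises with the rating plus random noise.
n           <- 300
quality_num <- sample(3:8, n, replace = TRUE)
alcohol     <- 9 + 0.4 * (quality_num - 3) + rnorm(n, sd = 0.8)
toy <- data.frame(
  quality = factor(quality_num, levels = 3:8, ordered = TRUE),
  alcohol = alcohol
)

# Draw to a null device so the sketch also runs without a display.
pdf(NULL)
bp <- boxplot(alcohol ~ quality, data = toy,
              xlab = "quality (score)", ylab = "alcohol (% by volume)",
              main = "Alcohol by quality (simulated data)")
invisible(dev.off())

# Numeric companions to the plot: median alcohol per rating and the
# Pearson correlation with quality treated as a number.
medians <- tapply(toy$alcohol, toy$quality, median)
r_toy   <- cor(quality_num, alcohol)
print(round(medians, 2))
print(round(r_toy, 2))
```

A boxplot per rating suits an ordered factor better than a plain scatter plot, because the ratings take only a few discrete values and points would otherwise overlap heavily.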
Volatile Acidity vs. Quality
The plot shows a negative relationship between Volatile Acidity and quality. The correlation coefficient is -0.39, which is a moderate correlation. Volatile Acidity seems to have a negative impact on wine quality: the more volatile acidity a wine contains, the lower it tends to be rated.
Alcohol vs. Density
The plot shows a negative relationship between Alcohol and Density. The correlation coefficient is -0.50, which is a moderate correlation. The more alcohol a wine contains, the lower its density tends to be.
pH vs. Fixed Acidity
The plot shows a negative relationship between pH and Fixed Acidity. The correlation coefficient is -0.68, which is a strong correlation. Wines with a higher pH tend to have lower fixed acidity.
The lower the pH, the higher the Fixed Acidity in the wine.
pH vs. Citric Acid
The plot shows a negative relationship between pH and Citric Acid. The correlation coefficient is -0.54, which is a moderate correlation. Wines with a higher pH tend to have less citric acid.
The lower the pH, the higher the Citric Acid in the wine.
Density vs. Fixed Acidity
The plot shows a positive relationship between Density and Fixed Acidity. The correlation coefficient is 0.67, which is a strong correlation.
The higher the Density, the higher the Fixed Acidity in the wine.
Total Sulfur Dioxide vs. Free Sulfur Dioxide
The plot shows a positive relationship between Total Sulfur Dioxide and Free Sulfur Dioxide. The correlation coefficient is 0.67, which is a strong correlation. This is expected, because total sulfur dioxide includes the free form.
The more Total Sulfur Dioxide, the higher the Free Sulfur Dioxide in the wine.
Citric Acid vs. Fixed Acidity
The plot shows a positive relationship between Citric Acid and Fixed Acidity. The correlation coefficient is 0.67, which is a strong correlation.
The more Citric Acid, the higher the Fixed Acidity in the wine.
Citric Acid vs. Volatile Acidity
The plot shows a negative relationship between Citric Acid and Volatile Acidity. The correlation coefficient is -0.55, which is a moderate correlation.
The lower the Citric Acid, the higher the Volatile Acidity in the wine.
Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?
Quality is moderately correlated with Alcohol and Volatile Acidity: positively with alcohol and negatively with volatile acidity. Quality is weakly correlated with Citric Acid, Sulphates, pH and Density: positively with citric acid and sulphates, and negatively with pH and density. There is no clear relationship between quality and Fixed Acidity, Residual Sugar, Chlorides, Free Sulfur Dioxide or Total Sulfur Dioxide. The wine gets better with:
- Higher alcohol, citric acid and sulphates.
- Lower volatile acidity, pH and density.
Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?
Yes, I did.
Fixed Acidity has interesting strong relationships with pH, citric acid and density: negative with pH, and positive with citric acid and density. There is another interesting strong positive relationship between Total Sulfur Dioxide and Free Sulfur Dioxide. There are also moderate negative relationships between alcohol and density, pH and citric acid, and citric acid and volatile acidity.
What was the strongest relationship you found?
The strongest relationship we found is between pH and fixed acidity, with a negative correlation of -0.683.
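The strongest pair can also be located programmatically rather than by reading the matrix by eye. The sketch below does this on a four-variable subset of the correlation matrix, with values copied from the output above. Restricting it to four variables is only to keep the example short; the same steps would apply to the full matrix returned by `cor()`.

```r
# Four-variable subset of the correlation matrix, copied from the output above.
vars <- c("fixed.acidity", "citric.acid", "density", "pH")
cor_sub <- matrix(c(
   1.00000000,  0.67170343,  0.66804729, -0.68297819,
   0.67170343,  1.00000000,  0.36494718, -0.54190414,
   0.66804729,  0.36494718,  1.00000000, -0.34169933,
  -0.68297819, -0.54190414, -0.34169933,  1.00000000
), nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

# Keep each pair once: blank out the diagonal and the lower triangle.
masked <- abs(cor_sub)
masked[lower.tri(masked, diag = TRUE)] <- NA

# Locate the largest absolute correlation among the remaining pairs.
idx <- which(masked == max(masked, na.rm = TRUE), arr.ind = TRUE)
strongest   <- c(rownames(cor_sub)[idx[1, "row"]], colnames(cor_sub)[idx[1, "col"]])
strongest_r <- cor_sub[idx[1, "row"], idx[1, "col"]]

cat("Strongest pair:", strongest[1], "and", strongest[2],
    "with r =", round(strongest_r, 3), "\n")
```

Masking the lower triangle avoids reporting every pair twice and keeps the trivial correlation of 1 on the diagonal out of the search.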
Citric Acid vs. Volatile Acidity over Quality
From the plot, high-quality wines tend to have low volatile acidity, while low-quality wines tend to have low citric acid and high volatile acidity.
Alcohol vs. Volatile Acidity over Quality
From the plot, high-quality wines tend to have low volatile acidity and high alcohol, while low-quality wines tend to have low alcohol and high volatile acidity.
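The multivariate plots in this section put two features on the axes and use a third for color. The report's plotting code is not shown, so the following is only a base R sketch of that idea. The data are simulated for illustration (not the wine data), the ranges are borrowed from the summaries above, and the formula for `score` is an arbitrary assumption.

```r
set.seed(1)

# Simulated stand-in data (NOT the wine dataset), using the observed ranges
# of alcohol and volatile acidity from the summaries above.
n                <- 200
alcohol          <- runif(n, 8.4, 14.9)
volatile_acidity <- runif(n, 0.12, 1.58)

# Hypothetical rating: rises with alcohol, falls with volatile acidity.
score   <- 5.5 + 0.3 * (alcohol - 10.4) - 1.5 * (volatile_acidity - 0.53) +
           rnorm(n, sd = 0.5)
quality <- factor(pmin(pmax(round(score), 3), 8), levels = 3:8, ordered = TRUE)

# One color per quality level, then one color per point.
palette_q <- hcl.colors(nlevels(quality))
point_col <- palette_q[as.integer(quality)]

# Draw to a null device so the sketch also runs without a display.
pdf(NULL)
plot(alcohol, volatile_acidity, col = point_col, pch = 19,
     xlab = "alcohol (% by volume)", ylab = "volatile acidity (g / dm^3)",
     main = "Alcohol vs. volatile acidity over quality (simulated data)")
legend("topright", legend = levels(quality), col = palette_q, pch = 19,
       title = "quality")
invisible(dev.off())
```

Using a sequential palette for an ordered factor lets the color gradient itself carry the ranking, so clusters of high and low ratings stand out in the scatter.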
Alcohol vs. Density over Quality
From the plot, high-quality wines tend to have high alcohol and a density below 1, while low-quality wines tend to have low alcohol and a density above 0.994.
Alcohol vs. Citric Acid over Quality
From the plot, high-quality wines tend to have high alcohol and citric acid below 0.75, while low-quality wines tend to have alcohol below 11 and citric acid below 0.75.
Because the citric acid range is the same for both groups, there is no clear conclusion about citric acid combined with alcohol over quality.
Alcohol vs. Sulphates over Quality
From the plot, high-quality wines tend to have high alcohol and sulphates between 0.6 and 1.3, while low-quality wines tend to have low alcohol and sulphates below 0.6.
Sulphates vs. Volatile Acidity over Quality
From the plot, high-quality wines tend to have sulphates between 0.6 and 1.3 and volatile acidity below 0.7, while low-quality wines tend to have sulphates below 0.6 and volatile acidity above 0.7.
Alcohol vs. pH over Quality
From the plot, high-quality wines tend to have high alcohol and a pH between 2.7 and 3.7, while low-quality wines tend to have alcohol below 11 and a pH between 2.7 and 3.7.
Because the pH range is the same for both groups, there is no clear conclusion about pH combined with alcohol over quality.
pH vs. Volatile Acidity over Quality
From the plot, high-quality wines tend to have volatile acidity below 0.9 and a pH below 3.7, while low-quality wines tend to have a pH between 3.2 and 3.7 and volatile acidity above 0.7.
Density vs. Volatile Acidity over Quality
From the plot, high-quality wines tend to have a density below 0.998 and volatile acidity below 0.9, while low-quality wines tend to have a density between 0.994 and 1.002 and volatile acidity above 0.4.
Citric Acid vs. Fixed Acidity over Volatile Acidity
From the plot, the higher the Citric Acid and Fixed Acidity in the wine, the lower the Volatile Acidity, and the lower the Citric Acid and Fixed Acidity, the higher the Volatile Acidity. This matches the negative correlations of volatile acidity with citric acid (-0.55) and fixed acidity (-0.26) in the correlation matrix.
Density vs. Fixed Acidity over Citric Acid
From the plot, the higher the Density and Fixed Acidity in the wine, the higher the Citric Acid, and the lower the Density and Fixed Acidity, the lower the Citric Acid.
Alcohol vs. Density over Fixed Acidity
From the plot, higher alcohol and lower density go with lower Fixed Acidity, while lower alcohol and higher density go with higher Fixed Acidity.
pH vs. Citric Acid over Fixed Acidity
From the plot, fixed acidity is higher when pH is below 3.4 and citric acid is below 0.78, and lower when pH is above 3.4 and citric acid is below 0.78.
pH vs. Citric Acid over Volatile Acidity
From the plot, volatile acidity is higher when Citric Acid is low and pH is between 3 and 3.7, and lower when Citric Acid is between 0.25 and 0.76 and pH is between 3 and 3.7. This is consistent with the negative correlation between citric acid and volatile acidity (-0.55).
pH vs. Fixed Acidity over Density
From the plot, density is low when fixed acidity is below 8 and pH is between 2.7 and 3.7, and high when fixed acidity is above 10 and pH is between 2.7 and 3.7. This is consistent with the positive correlation between density and fixed acidity (0.67).
Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?
After investigating many multivariate plots, we found that quality is higher when a wine combines these properties:
We also found other relationships between features, such as:
Were there any interesting or surprising interactions between features?
Since I do not have much background in the wine field, every relationship between features was surprising and interesting.
Plot One : Quality Of Wine
## 1 2 3 4 5 6
## 10 53 681 638 199 18
Description One
What is the most common quality in the dataset?
We chose this plot to show that 1.13% of the red wines in the dataset are of the best quality (8), 0.63% are of the worst quality (3), and 82.5% are of average quality (5 or 6). We also want to know which factors affect the quality of wine.
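These shares follow directly from the quality counts printed earlier in the report. A short sketch of the arithmetic, with the vector name `quality_counts` chosen here for illustration:

```r
# Number of wines per quality rating, copied from the table of quality above.
quality_counts <- c("3" = 10, "4" = 53, "5" = 681, "6" = 638, "7" = 199, "8" = 18)
total <- sum(quality_counts)

# Percentage of wines at each rating, rounded to two decimals.
pct <- round(100 * quality_counts / total, 2)

# Shares discussed in the text: best (8), worst (3), and average (5 or 6).
share_best    <- pct[["8"]]
share_worst   <- pct[["3"]]
share_average <- round(100 * sum(quality_counts[c("5", "6")]) / total, 2)

print(pct)
cat("Best:", share_best, "% | Worst:", share_worst,
    "% | Average (5 or 6):", share_average, "%\n")
```

Summing the counts for ratings 5 and 6 before dividing avoids accumulating rounding error from adding two already rounded percentages.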
Plot Two : Alcohol And Quality Of Wine
Description Two
Alcohol has the strongest correlation with quality, which makes it the most influential factor on wine quality in this analysis. That is why we chose this plot, which shows the relationship between alcohol and wine quality. For qualities above 5, higher quality goes with higher alcohol content.
Plot Three
Description Three
We chose this plot because we want to see how other factors combine with alcohol to affect wine quality. We observed a negative relationship between quality and both volatile acidity and density, and a positive relationship between quality and alcohol. High-quality wines tend to have low volatile acidity, high alcohol, and a density below 1. Low-quality wines tend to have low alcohol, high volatile acidity, and a density above 0.994. These factors are important because of their noticeable impact on wine quality.
This is a unique and interesting project. I learned a lot while working on it, and I faced many difficulties and challenges on the way to the goal of the project. I spent many hours learning and building.
What are the difficulties and challenges about this project?
The dataset covers red wine, with 1,599 samples and 12 variables. It is a tidy dataset with no missing data. The purpose of the project is to analyze and explore the data using plots. First, we tried to understand what each variable means before starting the analysis. Then we explored the variables individually in the Univariate Plots section. After that, we explored the relationships between pairs of variables, using a correlation matrix, in the Bivariate Plots section. Finally, we explored the relationships among three variables in the Multivariate Plots section.
How to improve the analysis ?
The analysis would be better with a larger number of samples, or if the red and white wine data were combined to study wine more broadly.
Is there any insight for future work?
For future work with this data, we should focus on nonlinear regression modelling for more accurate results.
We found only a few samples of high-quality wine. This analysis can be used to help raise wine quality.
In the end, I really had fun analyzing this dataset with the amazing R language.