Introduction

In this project, we use R to apply exploratory data analysis techniques to the dataset, discover relationships among multiple variables, and create explanatory visualizations that illuminate distributions, outliers, and anomalies.

The dataset

Red Wine Quality

This dataset is publicly available for research. The details are described in [Cortez et al., 2009].

P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.

Available at: Elsevier, Pre-press (pdf), bib.


This tidy data set contains 1,599 red wines with 11 variables on the chemical properties of the wine. At least 3 wine experts rated the quality of each wine, providing a rating between 0 (very bad) and 10 (very excellent).

Attribute information:

For more information, read [Cortez et al., 2009].

Input variables (based on physicochemical tests):

1 - fixed acidity (tartaric acid - g / dm^3)
2 - volatile acidity (acetic acid - g / dm^3)
3 - citric acid (g / dm^3)
4 - residual sugar (g / dm^3)
5 - chlorides (sodium chloride - g / dm^3)
6 - free sulfur dioxide (mg / dm^3)
7 - total sulfur dioxide (mg / dm^3)
8 - density (g / cm^3)
9 - pH
10 - sulphates (potassium sulphate - g / dm^3)
11 - alcohol (% by volume)

Output variable (based on sensory data):

12 - quality (score between 0 and 10)

Missing Attribute Values: None

Description of attributes:

1 - fixed acidity: most acids involved with wine are fixed or nonvolatile (do not evaporate readily).
2 - volatile acidity: the amount of acetic acid in wine, which at too high a level can lead to an unpleasant, vinegar taste.
3 - citric acid: found in small quantities, citric acid can add ‘freshness’ and flavor to wines.
4 - residual sugar: the amount of sugar remaining after fermentation stops; it’s rare to find wines with less than 1 gram/liter, and wines with greater than 45 grams/liter are considered sweet.
5 - chlorides: the amount of salt in the wine.
6 - free sulfur dioxide: the free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine.
7 - total sulfur dioxide: the amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine.
8 - density: the density of wine is close to that of water, depending on the percent alcohol and sugar content.
9 - pH: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3 and 4 on the pH scale.
10 - sulphates: a wine additive which can contribute to sulfur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant.
11 - alcohol: the percent alcohol content of the wine.

Output variable (based on sensory data):

12 - quality (score between 0 and 10).

Exploring Data

## 'data.frame':    1599 obs. of  12 variables:
##  $ fixed.acidity       : num  7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
##  $ volatile.acidity    : num  0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
##  $ citric.acid         : num  0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
##  $ residual.sugar      : num  1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
##  $ chlorides           : num  0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
##  $ free.sulfur.dioxide : num  11 25 15 17 11 13 15 15 9 17 ...
##  $ total.sulfur.dioxide: num  34 67 54 60 34 40 59 21 18 102 ...
##  $ density             : num  0.998 0.997 0.997 0.998 0.998 ...
##  $ pH                  : num  3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
##  $ sulphates           : num  0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
##  $ alcohol             : num  9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
##  $ quality             : Ord.factor w/ 6 levels "3"<"4"<"5"<"6"<..: 3 3 3 4 3 3 3 5 5 3 ...
##  fixed.acidity   volatile.acidity  citric.acid    residual.sugar  
##  Min.   : 4.60   Min.   :0.1200   Min.   :0.000   Min.   : 0.900  
##  1st Qu.: 7.10   1st Qu.:0.3900   1st Qu.:0.090   1st Qu.: 1.900  
##  Median : 7.90   Median :0.5200   Median :0.260   Median : 2.200  
##  Mean   : 8.32   Mean   :0.5278   Mean   :0.271   Mean   : 2.539  
##  3rd Qu.: 9.20   3rd Qu.:0.6400   3rd Qu.:0.420   3rd Qu.: 2.600  
##  Max.   :15.90   Max.   :1.5800   Max.   :1.000   Max.   :15.500  
##    chlorides       free.sulfur.dioxide total.sulfur.dioxide
##  Min.   :0.01200   Min.   : 1.00       Min.   :  6.00      
##  1st Qu.:0.07000   1st Qu.: 7.00       1st Qu.: 22.00      
##  Median :0.07900   Median :14.00       Median : 38.00      
##  Mean   :0.08747   Mean   :15.87       Mean   : 46.47      
##  3rd Qu.:0.09000   3rd Qu.:21.00       3rd Qu.: 62.00      
##  Max.   :0.61100   Max.   :72.00       Max.   :289.00      
##     density             pH          sulphates         alcohol      quality
##  Min.   :0.9901   Min.   :2.740   Min.   :0.3300   Min.   : 8.40   3: 10  
##  1st Qu.:0.9956   1st Qu.:3.210   1st Qu.:0.5500   1st Qu.: 9.50   4: 53  
##  Median :0.9968   Median :3.310   Median :0.6200   Median :10.20   5:681  
##  Mean   :0.9967   Mean   :3.311   Mean   :0.6581   Mean   :10.42   6:638  
##  3rd Qu.:0.9978   3rd Qu.:3.400   3rd Qu.:0.7300   3rd Qu.:11.10   7:199  
##  Max.   :1.0037   Max.   :4.010   Max.   :2.0000   Max.   :14.90   8: 18

There are 1,599 observations and 12 variables in this dataset. All variables are numerical except quality, which is an ordered factor.

Univariate Plots Section

We try to understand which variables have the most influence on the quality of the wine. Let’s start with the quality variable.

Quality

##   3   4   5   6   7   8 
##  10  53 681 638 199  18

Quality is roughly bell-shaped and concentrated around 5 and 6, with far fewer wines at the low and high ends. The observed range of quality is [3, 8], which means no wine is rated worse than 3 or better than 8.

Next, we investigate the attributes individually.

Fixed Acidity

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.60    7.10    7.90    8.32    9.20   15.90

The median fixed acidity is 7.9, with a peak around 7. The distribution is right-skewed, with some outliers in the higher range.

Volatile Acidity

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1200  0.3900  0.5200  0.5278  0.6400  1.5800

The median volatile acidity is 0.52, with two peaks around 0.4 and 0.6. The distribution is asymmetric and bimodal, with some outliers in the higher range.

Citric Acid

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.090   0.260   0.271   0.420   1.000

The median citric acid is 0.26, with a peak around 0. The distribution is right-skewed, with some outliers around 1.

Residual Sugar

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.900   1.900   2.200   2.539   2.600  15.500

The median residual sugar is 2.2, with a high peak around 2. Residual sugar has a very long-tailed distribution with many outliers.
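The outliers noted for residual sugar can be made concrete with the common 1.5 × IQR boxplot rule. This is a minimal sketch that uses only the quartiles printed above; the 1.5 × IQR rule is an assumption, since the report does not state how outliers were identified, and the variable names are illustrative.

```r
# Quartiles and extremes of residual sugar, copied from the summary above.
sugar_min <- 0.9
sugar_q1  <- 1.9
sugar_q3  <- 2.6
sugar_max <- 15.5

# Interquartile range and the conventional boxplot fences (assumed rule).
sugar_iqr   <- sugar_q3 - sugar_q1
lower_fence <- sugar_q1 - 1.5 * sugar_iqr
upper_fence <- sugar_q3 + 1.5 * sugar_iqr

# Do the extremes fall outside the fences?
has_low_outliers  <- sugar_min < lower_fence
has_high_outliers <- sugar_max > upper_fence

c(lower_fence = lower_fence, upper_fence = upper_fence)
c(low = has_low_outliers, high = has_high_outliers)
```

Under this rule only the upper tail is flagged, which is consistent with the long right tail described above.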

Chlorides

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.01200 0.07000 0.07900 0.08747 0.09000 0.61100

The amount of chlorides in the wines has a median value of 0.079. Most wines have between 0.07 and 0.10 g / dm^3 of chlorides. The distribution looks roughly normal with a long right tail, and there are some outliers in the higher range.

Free Sulfur Dioxide

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    7.00   14.00   15.87   21.00   72.00

The median free sulfur dioxide is 14, with a high peak around 7. The distribution is right-skewed, with some outliers in the higher range.

Total Sulfur Dioxide

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    6.00   22.00   38.00   46.47   62.00  289.00

The median total sulfur dioxide is 38, with a peak around 30. The distribution is right-skewed, with some outliers in the higher range.

Density

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.9901  0.9956  0.9968  0.9967  0.9978  1.0037

The median density is 0.9968. The distribution looks approximately normal, and the mean (0.9967) nearly coincides with the median.

pH

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.740   3.210   3.310   3.311   3.400   4.010

The median pH is 3.31. The distribution looks approximately normal.

Sulphates

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3300  0.5500  0.6200  0.6581  0.7300  2.0000

The median for sulphates is 0.62. The distribution is slightly right-skewed, with many outliers.

Alcohol

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.40    9.50   10.20   10.42   11.10   14.90

The median alcohol is 10.2, with a peak around 9.5. The distribution is right-skewed, with some outliers. The range of alcohol is [8.4, 14.9], which means no wine has less than 8.4% or more than 14.9% alcohol. Most wines have less than 11% alcohol.
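The skew described for several variables above can be cross-checked by comparing each mean with its median: a mean well above the median hints at a right tail. The sketch below uses figures copied from the summaries in this section. The 2% cutoff and the names `center`, `rel_gap`, and `hint` are assumptions chosen for illustration, and this is a rough heuristic rather than a formal test.

```r
# Median and mean for a few variables, copied from the summaries above.
center <- data.frame(
  variable = c("fixed.acidity", "residual.sugar", "total.sulfur.dioxide",
               "density", "pH"),
  median   = c(7.90, 2.200, 38.00, 0.9968, 3.310),
  mean     = c(8.32, 2.539, 46.47, 0.9967, 3.311)
)

# Relative gap between mean and median.
center$rel_gap <- (center$mean - center$median) / center$median

# Assumed cutoff: a mean more than 2% above the median suggests right skew.
center$hint <- ifelse(center$rel_gap > 0.02, "right-skewed", "roughly symmetric")

center
```

With these inputs, fixed acidity, residual sugar, and total sulfur dioxide are flagged as right-skewed, while density and pH are not.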

Univariate Analysis

What is the structure of your dataset?

There are 1,599 red wine observations and 12 variables in this dataset, with no missing attribute values. All variables are numerical except quality, which is categorical. The 11 numerical variables represent physicochemical measurements (fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, alcohol). The one categorical variable, quality, represents the rating of the red wine from 0 (very bad) to 10 (very excellent).

What is/are the main feature(s) of interest in your dataset?

The main feature of interest is the quality rating.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

I think 5 features contribute to the quality of the wine:

In the end, all of them are also related to each other.

Did you create any new variables from existing variables in the dataset?

No, I did not create any new variables.

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

Some of the distributions had unusual peaks, such as citric acid, residual sugar, chlorides, free sulfur dioxide, and alcohol. Others had unusual outliers, such as total sulfur dioxide, chlorides, residual sugar, and sulphates.

The dataset was clean and tidy, with no missing values, so there was no need to adjust or change anything. I only needed to convert the quality variable from int to a factor type, because quality is a categorical variable.
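The int-to-factor conversion mentioned above might look like the following sketch. It rebuilds the first ten alcohol and quality values shown in the `str()` output instead of reading the original file, and the name `wine_head` is hypothetical. The levels 3 to 8 match the six levels printed earlier.

```r
# First ten rows of two columns, reconstructed from the str() output above.
wine_head <- data.frame(
  alcohol = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10, 9.5, 10.5),
  quality = c(5L, 5L, 5L, 6L, 5L, 5L, 5L, 7L, 7L, 5L)
)

# Convert the integer scores to an ordered factor with levels 3 < 4 < ... < 8.
wine_head$quality <- factor(wine_head$quality, levels = 3:8, ordered = TRUE)

str(wine_head)
table(wine_head$quality)
```

Passing `levels = 3:8` explicitly keeps all six levels even when a subset of the data, like these ten rows, does not contain every score.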

Bivariate Plots Section

In this section, we are interested in showing the relationships between two variables at a time with plots.

From the correlation matrix, it seems that:

The relationships between the other variables are weak because their absolute correlations are less than 0.5.

The relationship between quality and other variables.

The relationship between two variables with strong or moderate correlations.

##                      fixed.acidity volatile.acidity citric.acid
## fixed.acidity           1.00000000     -0.256130895  0.67170343
## volatile.acidity       -0.25613089      1.000000000 -0.55249568
## citric.acid             0.67170343     -0.552495685  1.00000000
## residual.sugar          0.11477672      0.001917882  0.14357716
## chlorides               0.09370519      0.061297772  0.20382291
## free.sulfur.dioxide    -0.15379419     -0.010503827 -0.06097813
## total.sulfur.dioxide   -0.11318144      0.076470005  0.03553302
## density                 0.66804729      0.022026232  0.36494718
## pH                     -0.68297819      0.234937294 -0.54190414
## sulphates               0.18300566     -0.260986685  0.31277004
## alcohol                -0.06166827     -0.202288027  0.10990325
## quality                 0.12405165     -0.390557780  0.22637251
##                      residual.sugar    chlorides free.sulfur.dioxide
## fixed.acidity           0.114776724  0.093705186        -0.153794193
## volatile.acidity        0.001917882  0.061297772        -0.010503827
## citric.acid             0.143577162  0.203822914        -0.060978129
## residual.sugar          1.000000000  0.055609535         0.187048995
## chlorides               0.055609535  1.000000000         0.005562147
## free.sulfur.dioxide     0.187048995  0.005562147         1.000000000
## total.sulfur.dioxide    0.203027882  0.047400468         0.667666450
## density                 0.355283371  0.200632327        -0.021945831
## pH                     -0.085652422 -0.265026131         0.070377499
## sulphates               0.005527121  0.371260481         0.051657572
## alcohol                 0.042075437 -0.221140545        -0.069408354
## quality                 0.013731637 -0.128906560        -0.050656057
##                      total.sulfur.dioxide     density          pH
## fixed.acidity                 -0.11318144  0.66804729 -0.68297819
## volatile.acidity               0.07647000  0.02202623  0.23493729
## citric.acid                    0.03553302  0.36494718 -0.54190414
## residual.sugar                 0.20302788  0.35528337 -0.08565242
## chlorides                      0.04740047  0.20063233 -0.26502613
## free.sulfur.dioxide            0.66766645 -0.02194583  0.07037750
## total.sulfur.dioxide           1.00000000  0.07126948 -0.06649456
## density                        0.07126948  1.00000000 -0.34169933
## pH                            -0.06649456 -0.34169933  1.00000000
## sulphates                      0.04294684  0.14850641 -0.19664760
## alcohol                       -0.20565394 -0.49617977  0.20563251
## quality                       -0.18510029 -0.17491923 -0.05773139
##                         sulphates     alcohol     quality
## fixed.acidity         0.183005664 -0.06166827  0.12405165
## volatile.acidity     -0.260986685 -0.20228803 -0.39055778
## citric.acid           0.312770044  0.10990325  0.22637251
## residual.sugar        0.005527121  0.04207544  0.01373164
## chlorides             0.371260481 -0.22114054 -0.12890656
## free.sulfur.dioxide   0.051657572 -0.06940835 -0.05065606
## total.sulfur.dioxide  0.042946836 -0.20565394 -0.18510029
## density               0.148506412 -0.49617977 -0.17491923
## pH                   -0.196647602  0.20563251 -0.05773139
## sulphates             1.000000000  0.09359475  0.25139708
## alcohol               0.093594750  1.00000000  0.47616632
## quality               0.251397079  0.47616632  1.00000000
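A correlation matrix like the one above can be produced with `cor()` once quality is numeric again. The sketch below is an assumption about the approach, not the report's actual code: it uses a ten-row sample reconstructed from the `str()` output (the name `wine_head` is hypothetical), so its coefficients will not match the full-data matrix.

```r
# Ten-row sample reconstructed from the str() output; not the full dataset.
wine_head <- data.frame(
  alcohol          = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10, 9.5, 10.5),
  volatile.acidity = c(0.7, 0.88, 0.76, 0.28, 0.7, 0.66, 0.6, 0.65, 0.58, 0.5),
  quality          = factor(c(5, 5, 5, 6, 5, 5, 5, 7, 7, 5),
                            levels = 3:8, ordered = TRUE)
)

# as.numeric() on a factor returns level codes (1..6), not the scores (3..8),
# so convert through as.character() to recover the original values.
wine_num <- transform(wine_head, quality = as.numeric(as.character(quality)))

# Pearson correlation matrix of all numeric columns.
cor_matrix <- cor(wine_num)
round(cor_matrix, 2)
```

Because Pearson correlation is unchanged by shifting a variable, the level codes would give the same coefficients here. Converting through `as.character()` still matters for anything that reports the scores themselves, such as tables or axis labels.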

Alcohol vs. Quality

The plot shows a positive relationship between alcohol and quality. Their correlation coefficient is 0.48, which is a moderate correlation. Alcohol seems to have a positive impact on wine quality.

The more alcohol a wine contains, the higher it tends to be rated.

Volatile Acidity vs. Quality

The plot shows a negative relationship between volatile acidity and quality. Their correlation coefficient is -0.39, which is a moderate correlation. Volatile acidity seems to have a negative impact on wine quality: the more volatile acidity a wine contains, the lower it tends to be rated.

Lower volatile acidity seems to mean higher wine quality.

Alcohol vs. Density

The plot shows a negative relationship between alcohol and density. Their correlation coefficient is -0.50, which is a moderate correlation. The more alcohol a wine contains, the lower its density tends to be.

Lower alcohol seems to mean higher wine density.

pH vs. Fixed Acidity

The plot shows a negative relationship between pH and fixed acidity. Their correlation coefficient is -0.68, which is a strong correlation. Wines with a higher pH tend to have lower fixed acidity.

The lower the pH, the higher the fixed acidity in the wine.

pH vs. Citric Acid

The plot shows a negative relationship between pH and citric acid. Their correlation coefficient is -0.54, which is a moderate correlation. Wines with a higher pH tend to have less citric acid.

The lower the pH, the higher the citric acid in the wine.

Density vs. Fixed Acidity

The plot shows a positive relationship between density and fixed acidity. Their correlation coefficient is 0.67, which is a strong correlation. Wines with a higher density tend to have higher fixed acidity.

The higher the density, the higher the fixed acidity in the wine.

Total Sulfur Dioxide vs. Free Sulfur Dioxide

The plot shows a positive relationship between total sulfur dioxide and free sulfur dioxide. Their correlation coefficient is 0.67, which is a strong correlation. Wines with more total sulfur dioxide tend to have more free sulfur dioxide.

The more total sulfur dioxide, the higher the free sulfur dioxide in the wine.

Citric Acid vs. Fixed Acidity

The plot shows a positive relationship between citric acid and fixed acidity. Their correlation coefficient is 0.67, which is a strong correlation. Wines with more citric acid tend to have higher fixed acidity.

The more citric acid, the higher the fixed acidity in the wine.

Citric Acid vs. Volatile Acidity

The plot shows a negative relationship between citric acid and volatile acidity. Their correlation coefficient is -0.55, which is a moderate correlation. Wines with more citric acid tend to have lower volatile acidity.

The lower the citric acid, the higher the volatile acidity in the wine.

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

Quality is moderately correlated with alcohol (positively) and volatile acidity (negatively). It is weakly correlated with citric acid and sulphates (positively) and with pH and density (negatively). There is no significant relationship between quality and fixed acidity, residual sugar, chlorides, free sulfur dioxide, or total sulfur dioxide. The wine gets better with:

- Higher alcohol, citric acid and sulphates.
- Lower volatile acidity, pH and density.

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

Yes, I did.

Fixed acidity has interesting strong relationships with pH, citric acid, and density: negative with pH, and positive with citric acid and density. There is another interesting strong positive relationship between total sulfur dioxide and free sulfur dioxide. There are also negative relationships with moderate correlation between alcohol and density, pH and citric acid, and citric acid and volatile acidity.

What was the strongest relationship you found?

The strongest relationship we found is between pH and fixed acidity, with a negative correlation of -0.683.
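The strongest pair can also be located programmatically instead of by scanning the matrix. The sketch below is illustrative only: it hard-codes a four-variable subset of the correlation matrix above (values rounded to four decimals), and the names `cor_sub`, `abs_cor`, and `strongest` are hypothetical.

```r
# Subset of the correlation matrix above, rounded to four decimals.
vars <- c("fixed.acidity", "citric.acid", "density", "pH")
cor_sub <- matrix(
  c( 1.0000,  0.6717,  0.6680, -0.6830,
     0.6717,  1.0000,  0.3649, -0.5419,
     0.6680,  0.3649,  1.0000, -0.3417,
    -0.6830, -0.5419, -0.3417,  1.0000),
  nrow = 4, byrow = TRUE, dimnames = list(vars, vars)
)

# Keep only the upper triangle so the diagonal (always 1) and the
# duplicated lower triangle are ignored.
abs_cor <- abs(cor_sub)
abs_cor[!upper.tri(abs_cor)] <- 0

# Row and column position of the largest absolute correlation.
idx <- which(abs_cor == max(abs_cor), arr.ind = TRUE)[1, ]
strongest       <- c(rownames(cor_sub)[idx["row"]], colnames(cor_sub)[idx["col"]])
strongest_value <- cor_sub[idx["row"], idx["col"]]

strongest
strongest_value
```

On this subset the search returns the fixed acidity and pH pair, in line with the answer above. The same steps apply to the full matrix returned by `cor()`.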

Multivariate Plots Section

Citric Acid vs. Volatile Acidity over Quality

The plot shows high-quality wines where volatile acidity is low, and low-quality wines where citric acid is low and volatile acidity is high.

Alcohol vs. Volatile Acidity over Quality

The plot shows high-quality wines where volatile acidity is low and alcohol is high, and low-quality wines where alcohol is low and volatile acidity is high.
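A plot of this kind, two numeric variables with points colored by quality, can be sketched in base R as follows. This is not the report's plotting code: it uses a ten-row sample reconstructed from the `str()` output (the name `wine_head` is hypothetical), and the palette choice is an assumption.

```r
# Ten-row sample reconstructed from the str() output; not the full dataset.
wine_head <- data.frame(
  alcohol          = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10, 9.5, 10.5),
  volatile.acidity = c(0.7, 0.88, 0.76, 0.28, 0.7, 0.66, 0.6, 0.65, 0.58, 0.5),
  quality          = factor(c(5, 5, 5, 6, 5, 5, 5, 7, 7, 5),
                            levels = 3:8, ordered = TRUE)
)

# One color per quality level (3 to 8), then one color per point.
level_colors <- hcl.colors(nlevels(wine_head$quality), palette = "viridis")
point_colors <- level_colors[as.integer(wine_head$quality)]

# Scatter plot of the two features, colored by the third variable.
plot(wine_head$alcohol, wine_head$volatile.acidity,
     col = point_colors, pch = 19,
     xlab = "Alcohol (% by volume)",
     ylab = "Volatile acidity (g / dm^3)",
     main = "Alcohol vs. volatile acidity by quality")
legend("topright", legend = levels(wine_head$quality),
       col = level_colors, pch = 19, title = "Quality")
```

With only ten rows the pattern described above will not be visible; the sketch is meant to show the mechanics of mapping quality to color.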

Alcohol vs. Density over Quality

The plot shows high-quality wines where density is below 1 and alcohol is high, and low-quality wines where alcohol is low and density is above 0.994.

Alcohol vs. Citric Acid over Quality

The plot shows high-quality wines where citric acid is below 0.75 and alcohol is high, and low-quality wines where alcohol is below 11 and citric acid is below 0.75.

There is no clear conclusion about citric acid with alcohol over quality.

Alcohol vs. Sulphates over Quality

The plot shows high-quality wines where sulphates are between 0.6 and 1.3 and alcohol is high, and low-quality wines where alcohol is low and sulphates are below 0.6.

Sulphates vs. Volatile Acidity over Quality

The plot shows high-quality wines where sulphates are between 0.6 and 1.3 and volatile acidity is below 0.7, and low-quality wines where sulphates are below 0.6 and volatile acidity is above 0.7.

Alcohol vs. pH over Quality

The plot shows high-quality wines where pH is between 2.7 and 3.7 and alcohol is high, and low-quality wines where alcohol is below 11 and pH is between 2.7 and 3.7.

There is no clear conclusion about pH with alcohol over quality.

pH vs. Volatile Acidity over Quality

The plot shows high-quality wines where volatile acidity is below 0.9 and pH is below 3.7, and low-quality wines where pH is between 3.2 and 3.7 and volatile acidity is above 0.7.

Density vs. Volatile Acidity over Quality

The plot shows high-quality wines where density is below 0.998 and volatile acidity is below 0.9, and low-quality wines where density is between 0.994 and 1.002 and volatile acidity is above 0.4.

Citric Acid vs. Fixed Acidity over Volatile Acidity

The plot shows that the higher the citric acid and fixed acidity in the wine, the higher the volatile acidity, and the lower the citric acid and fixed acidity, the lower the volatile acidity.

Density vs. Fixed Acidity over Citric Acid

The plot shows that the higher the density and fixed acidity in the wine, the higher the citric acid, and the lower the density and fixed acidity, the lower the citric acid.

Alcohol vs. Density over Fixed Acidity

The plot shows that higher alcohol and lower density go with lower fixed acidity, while lower alcohol and higher density go with higher fixed acidity.

pH vs. Citric Acid over Fixed Acidity

The plot shows higher fixed acidity where pH is below 3.4 and citric acid is below 0.78, and lower fixed acidity where pH is above 3.4 and citric acid is below 0.78.

pH vs. Citric Acid over Volatile Acidity

The plot shows that volatile acidity gets lower as citric acid gets lower with pH between 3 and 3.7, and that volatile acidity gets higher where citric acid is between 0.25 and 0.76 and pH is between 3 and 3.7.

pH vs. Fixed Acidity over Density

The plot shows high density where fixed acidity is below 8 and pH is between 2.7 and 3.7, and low density where fixed acidity is above 10 and pH is between 2.7 and 3.7.

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

After investigating many multivariate plots, we found that quality tends to be higher when these properties are combined:

We also found other relationships between features, such as:

Were there any interesting or surprising interactions between features?

Since I did not have much background in the wine field, every relationship between features was surprising and interesting.


Final Plots and Summary

Plot One : Quality Of Wine

##   1   2   3   4   5   6 
##  10  53 681 638 199  18

Description One

What is the most common quality in the dataset?

We chose this plot to show that the dataset contains 1.13% of the best quality and 0.63% of the worst quality red wines, while 82.5% are of average quality (5 or 6). We also want to know which factors affect the quality of wine.
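The shares quoted above follow directly from the quality counts printed in the Univariate Plots Section. A minimal sketch of that arithmetic, with illustrative variable names:

```r
# Counts per quality score, copied from the quality table earlier in the report.
quality_counts <- c("3" = 10, "4" = 53, "5" = 681, "6" = 638, "7" = 199, "8" = 18)

total <- sum(quality_counts)
pct   <- 100 * quality_counts / total

# Share of the worst (3), best (8), and average (5 or 6) wines, in percent.
pct_worst  <- pct[["3"]]
pct_best   <- pct[["8"]]
pct_middle <- sum(pct[c("5", "6")])

round(c(worst = pct_worst, best = pct_best, middle = pct_middle), 2)
```

The counts sum to the 1,599 wines in the dataset, so the percentages can be checked against the totals reported earlier.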

Plot Two : Alcohol And Quality Of Wine

Description Two

Alcohol has the strongest correlation with quality, which suggests that alcohol is the most influential factor on wine quality. Based on that, we chose this plot, which shows the relationship between alcohol and wine quality. As alcohol increases, quality also increases for wines rated above 5.

Plot Three

Description Three

We chose this plot because we want to see the effect of other factors, together with alcohol, on wine quality. We observed a negative relationship between quality and both volatile acidity and density, and a positive relationship between quality and alcohol. High-quality wines have low volatile acidity, high alcohol, and density below 1. Low-quality wines have low alcohol, high volatile acidity, and density above 0.994. All of these factors are important because of their significant impact on wine quality.


Reflection

This is a unique and interesting project. I learned a lot while working on it, and I faced many difficulties and challenges on the way to the project’s goal. I spent many hours learning and building.

What are the difficulties and challenges about this project?

The dataset covers red wine, with 1,599 samples and 12 variables. It is tidy and has no missing data. The purpose of the project is to analyze and explore the data using plots. First, we tried to understand what each variable means before analyzing it. Then we explored the variables individually in the Univariate Plots Section. After that, we explored the relationships between pairs of variables, using a correlation matrix, in the Bivariate Plots Section. Finally, we explored the relationships among three variables in the Multivariate Plots Section.

How to improve the analysis?

The analysis would be better with a larger number of samples, or if the red and white wine data were combined to study wine more widely.

Is there any insight for future work?

For future work with this data, we should focus on nonlinear regression modelling for more accurate results.

We found only a few samples of high-quality wine. This analysis can be used to help raise wine quality and achieve the best results.

In the end, I really had fun analyzing this dataset with the amazing R language.
