For Assignment 3, the professor asked us to retrieve the data required to answer our research questions.

In Assignment 2, I listed the following questions about the wine data (Two Wine Data Exploration):
1. Are the two datasets well organized?
2. How much data did we obtain?
3. How many variables (rows and columns) are there?
4. How many missing values are there?
5. Are the two datasets reliable?
6. Are the two datasets similar?
7. Do the two datasets duplicate each other?

Libraries used in this EDA

library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(readxl)
library(readr)
setwd("~/Downloads/")

Observe the Wine Dataset – World Wine Data Set

Wine_data <- read_excel("~/Downloads/Wine_data.xlsx")
View(Wine_data)
# Structure and summary of the Dataframe
str(Wine_data)
## Classes 'tbl_df', 'tbl' and 'data.frame':    4898 obs. of  12 variables:
##  $ fixed acidity       : num  7 6.3 8.1 7.2 7.2 8.1 6.2 7 6.3 8.1 ...
##  $ volatile acidity    : num  0.27 0.3 0.28 0.23 0.23 0.28 0.32 0.27 0.3 0.22 ...
##  $ citric acid         : num  0.36 0.34 0.4 0.32 0.32 0.4 0.16 0.36 0.34 0.43 ...
##  $ residual sugar      : num  20.7 1.6 6.9 8.5 8.5 6.9 7 20.7 1.6 1.5 ...
##  $ chlorides           : num  0.045 0.049 0.05 0.058 0.058 0.05 0.045 0.045 0.049 0.044 ...
##  $ free sulfur dioxide : num  45 14 30 47 47 30 30 45 14 28 ...
##  $ total sulfur dioxide: num  170 132 97 186 186 97 136 170 132 129 ...
##  $ density             : num  1.001 0.994 0.995 0.996 0.996 ...
##  $ pH                  : num  3 3.3 3.26 3.19 3.19 3.26 3.18 3 3.3 3.22 ...
##  $ sulphates           : num  0.45 0.49 0.44 0.4 0.4 0.44 0.47 0.45 0.49 0.45 ...
##  $ alcohol             : num  8.8 9.5 10.1 9.9 9.9 10.1 9.6 8.8 9.5 11 ...
##  $ quality             : num  6 6 6 6 6 6 6 6 6 6 ...
summary(Wine_data)
##  fixed acidity    volatile acidity  citric acid     residual sugar  
##  Min.   : 3.800   Min.   :0.0800   Min.   :0.0000   Min.   : 0.600  
##  1st Qu.: 6.300   1st Qu.:0.2100   1st Qu.:0.2700   1st Qu.: 1.700  
##  Median : 6.800   Median :0.2600   Median :0.3200   Median : 5.200  
##  Mean   : 6.855   Mean   :0.2782   Mean   :0.3342   Mean   : 6.391  
##  3rd Qu.: 7.300   3rd Qu.:0.3200   3rd Qu.:0.3900   3rd Qu.: 9.900  
##  Max.   :14.200   Max.   :1.1000   Max.   :1.6600   Max.   :65.800  
##    chlorides       free sulfur dioxide total sulfur dioxide
##  Min.   :0.00900   Min.   :  2.00      Min.   :  9.0       
##  1st Qu.:0.03600   1st Qu.: 23.00      1st Qu.:108.0       
##  Median :0.04300   Median : 34.00      Median :134.0       
##  Mean   :0.04577   Mean   : 35.31      Mean   :138.4       
##  3rd Qu.:0.05000   3rd Qu.: 46.00      3rd Qu.:167.0       
##  Max.   :0.34600   Max.   :289.00      Max.   :440.0       
##     density             pH          sulphates         alcohol     
##  Min.   :0.9871   Min.   :2.720   Min.   :0.2200   Min.   : 8.00  
##  1st Qu.:0.9917   1st Qu.:3.090   1st Qu.:0.4100   1st Qu.: 9.50  
##  Median :0.9937   Median :3.180   Median :0.4700   Median :10.40  
##  Mean   :0.9940   Mean   :3.188   Mean   :0.4898   Mean   :10.51  
##  3rd Qu.:0.9961   3rd Qu.:3.280   3rd Qu.:0.5500   3rd Qu.:11.40  
##  Max.   :1.0390   Max.   :3.820   Max.   :1.0800   Max.   :14.20  
##     quality     
##  Min.   :3.000  
##  1st Qu.:5.000  
##  Median :6.000  
##  Mean   :5.878  
##  3rd Qu.:6.000  
##  Max.   :9.000
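
The questions about dataset size and missing values can also be checked directly rather than read off the summary. The sketch below uses a small made-up data frame named `wine` as a stand-in (the name and the values are assumptions for illustration, not the real data); the same calls should work on `Wine_data`.

```r
# Hypothetical stand-in for a wine data frame; values are made up.
wine <- data.frame(
  alcohol = c(8.8, 9.5, NA),
  quality = c(6, 6, 5)
)

# Number of rows (observations) and columns (variables).
dim(wine)

# Count of missing values in each column.
na_per_column <- colSums(is.na(wine))
na_per_column

# Number of rows with no missing value in any column.
complete_rows <- sum(complete.cases(wine))
complete_rows
```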

Observe the Wine DataSet from Kaggle – Red Wine Dataset

## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
##   X1 = col_integer(),
##   fixed.acidity = col_double(),
##   volatile.acidity = col_double(),
##   citric.acid = col_double(),
##   residual.sugar = col_double(),
##   chlorides = col_double(),
##   free.sulfur.dioxide = col_double(),
##   total.sulfur.dioxide = col_integer(),
##   density = col_double(),
##   pH = col_double(),
##   sulphates = col_double(),
##   alcohol = col_double(),
##   quality = col_integer()
## )
## Warning in rbind(names(probs), probs_f): number of columns of result is not
## a multiple of vector length (arg 1)
## Warning: 2 parsing failures.
## # A tibble: 2 x 5
##     row col                  expected               actual
##   <int> <chr>                <chr>                  <chr>
## 1  1296 total.sulfur.dioxide no trailing characters .5
## 2  1297 total.sulfur.dioxide no trailing characters .5
## # ... with 1 more variables: file <chr>
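
The two parsing failures above suggest that `total.sulfur.dioxide` was guessed as an integer column while a couple of later values have a fractional part. One possible way to avoid this, sketched here on a tiny made-up CSV (the file contents and the name `red_wine_raw` are assumptions, not the Kaggle file), is to declare that column as double when reading.

```r
library(readr)

# Hypothetical miniature of the CSV: an unnamed index column and one
# fractional total.sulfur.dioxide value. The rows are made up.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  ',"fixed.acidity","total.sulfur.dioxide","quality"',
  "1,7.4,34,5",
  "2,7.8,67,5",
  "3,7.0,77.5,6"
), csv_path)

# Declare the column type explicitly instead of relying on type guessing.
red_wine_raw <- read_csv(
  csv_path,
  col_types = cols(total.sulfur.dioxide = col_double())
)

# No value is lost to a parsing failure.
anyNA(red_wine_raw$total.sulfur.dioxide)
```

The unnamed first column is still filled in with a generated name; its exact name depends on the readr version, so it is safer to refer to it by position.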

Remove the first column of the Red Wine Dataset

## # A tibble: 1 x 12
##   `fixed acidity` `volatile acidity` `citric acid` `residual sugar`
##             <dbl>              <dbl>         <dbl>            <dbl>
## 1               7               0.27          0.36             20.7
## # ... with 8 more variables: chlorides <dbl>, `free sulfur dioxide` <dbl>,
## #   `total sulfur dioxide` <dbl>, density <dbl>, pH <dbl>,
## #   sulphates <dbl>, alcohol <dbl>, quality <dbl>
## # A tibble: 6 x 12
##   fixed.acidity volatile.acidity citric.acid residual.sugar chlorides
##           <dbl>            <dbl>       <dbl>          <dbl>     <dbl>
## 1           7.4             0.70        0.00            1.9     0.076
## 2           7.8             0.88        0.00            2.6     0.098
## 3           7.8             0.76        0.04            2.3     0.092
## 4          11.2             0.28        0.56            1.9     0.075
## 5           7.4             0.70        0.00            1.9     0.076
## 6           7.4             0.66        0.00            1.8     0.075
## # ... with 7 more variables: free.sulfur.dioxide <dbl>,
## #   total.sulfur.dioxide <int>, density <dbl>, pH <dbl>, sulphates <dbl>,
## #   alcohol <dbl>, quality <int>
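
The code for this step is not shown in the report. A minimal sketch of dropping the index column by position with dplyr could look like this; the data frame below reuses a few values from the output above, and the names `red_wine_raw` and `red_wine` are assumptions.

```r
library(dplyr)

# Small stand-in with the index column X1 in first position.
red_wine_raw <- data.frame(
  X1 = 1:3,
  fixed.acidity = c(7.4, 7.8, 7.8),
  volatile.acidity = c(0.70, 0.88, 0.76)
)

# Drop the first column by position, whatever it is called.
red_wine <- select(red_wine_raw, -1)

names(red_wine)
```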

Both datasets are well organized. Wine_data (the World Wine dataset) contains 4898 observations of 12 variables, and the Red Wine dataset from Kaggle contains 1599 observations of 13 variables. The World Wine dataset has no missing values. In the Red Wine dataset, two values of total.sulfur.dioxide (rows 1296 and 1297) failed to parse on import, which is why that variable shows NA in the correlation matrices further down. The two datasets were obtained in different years. Although they share the same 12 variables (after removing the first column of the Red Wine dataset), they are not duplicates and do not overlap. Please check the bar charts of each dataset.
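
Whether the two datasets share columns or rows can be tested directly. The following is only a sketch on two made-up miniature data frames (`world_wine` and `red_wine` are hypothetical names and values); it harmonises the column names first, because one dataset uses spaces and the other uses dots.

```r
library(dplyr)

# Hypothetical miniatures of the two datasets; values are made up.
world_wine <- data.frame(
  `fixed acidity` = c(7.0, 6.3, 8.1),
  alcohol = c(8.8, 9.5, 10.1),
  check.names = FALSE
)
red_wine <- data.frame(
  fixed.acidity = c(7.4, 7.8, 7.0),
  alcohol = c(9.4, 9.8, 8.8)
)

# "fixed acidity" becomes "fixed.acidity", matching the other dataset.
names(world_wine) <- make.names(names(world_wine))

# Do both datasets have the same set of variables?
same_columns <- setequal(names(world_wine), names(red_wine))

# Rows that appear in both datasets (zero rows means no overlap).
shared_rows <- dplyr::intersect(world_wine, red_wine)

# Number of rows repeated within a single dataset.
duplicated_within <- sum(duplicated(world_wine))
```

In this toy example one row is shared on purpose, so `shared_rows` has one row; on the real data an empty result would support the claim that the datasets do not overlap.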

Explore the Univariate Plots

Bar Chart of the quality feature in Red Wine Dataset

Bar Chart of the alcohol feature in World Wine Dataset

## Warning: position_stack requires non-overlapping x intervals

## Warning: position_stack requires non-overlapping x intervals

Bar Chart of the alcohol feature in World Wine Dataset

## Warning: position_stack requires non-overlapping x intervals

Bar Chart of the alcohol feature in Red Wine Dataset

## Warning: position_stack requires non-overlapping x intervals
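
The plotting code is not shown in the report. The `position_stack` warning typically appears when bars are drawn on a continuous variable such as alcohol, so one option is a bar chart for the discrete `quality` and a histogram with a fixed bin width for `alcohol`. The sketch below uses the first ten alcohol and quality values printed by `str()` above; the data frame name `wine` and the bin width of 0.1 are assumptions.

```r
library(ggplot2)

# First ten rows of alcohol and quality as printed by str(Wine_data).
wine <- data.frame(
  alcohol = c(8.8, 9.5, 10.1, 9.9, 9.9, 10.1, 9.6, 8.8, 9.5, 11.0),
  quality = c(6, 6, 6, 6, 6, 6, 6, 6, 6, 6)
)

# Bar chart of the discrete quality score.
p_quality <- ggplot(wine, aes(x = factor(quality))) +
  geom_bar() +
  labs(x = "quality", y = "count")

# Histogram of the continuous alcohol variable with equal-width bins.
p_alcohol <- ggplot(wine, aes(x = alcohol)) +
  geom_histogram(binwidth = 0.1) +
  labs(x = "alcohol", y = "count")

p_quality
p_alcohol
```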


Line Chart of the alcohol feature in World Wine Dataset

Line Chart of the alcohol feature in Red Wine Dataset
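
A line chart of the alcohol distribution can be drawn as a frequency polygon, which connects the bin counts of a histogram with a line. This is only one possible reading of "line chart" here; the data frame `wine` and the bin width are assumptions, and the values are the first ten alcohol values printed by `str()` above.

```r
library(ggplot2)

# First ten alcohol values as printed by str(Wine_data).
wine <- data.frame(
  alcohol = c(8.8, 9.5, 10.1, 9.9, 9.9, 10.1, 9.6, 8.8, 9.5, 11.0)
)

# Frequency polygon: counts per 0.1-wide bin joined by a line.
p_line <- ggplot(wine, aes(x = alcohol)) +
  geom_freqpoly(binwidth = 0.1) +
  labs(x = "alcohol", y = "count")

p_line
```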

I would like to find out the correlation between the different features.

Correlation

##                      fixed acidity volatile acidity citric acid
## fixed acidity           1.00000000      -0.02269729  0.28918070
## volatile acidity       -0.02269729       1.00000000 -0.14947181
## citric acid             0.28918070      -0.14947181  1.00000000
## residual sugar          0.08902070       0.06428606  0.09421162
## chlorides               0.02308564       0.07051157  0.11436445
## free sulfur dioxide    -0.04939586      -0.09701194  0.09407722
## total sulfur dioxide    0.09106976       0.08926050  0.12113080
## density                 0.26533101       0.02711385  0.14950257
## pH                     -0.42585829      -0.03191537 -0.16374821
## sulphates              -0.01714299      -0.03572815  0.06233094
## alcohol                -0.12088112       0.06771794 -0.07572873
##                      residual sugar   chlorides free sulfur dioxide
## fixed acidity            0.08902070  0.02308564       -0.0493958591
## volatile acidity         0.06428606  0.07051157       -0.0970119393
## citric acid              0.09421162  0.11436445        0.0940772210
## residual sugar           1.00000000  0.08868454        0.2990983537
## chlorides                0.08868454  1.00000000        0.1013923521
## free sulfur dioxide      0.29909835  0.10139235        1.0000000000
## total sulfur dioxide     0.40143931  0.19891030        0.6155009650
## density                  0.83896645  0.25721132        0.2942104109
## pH                      -0.19413345 -0.09043946       -0.0006177961
## sulphates               -0.02666437  0.01676288        0.0592172458
## alcohol                 -0.45063122 -0.36018871       -0.2501039415
##                      total sulfur dioxide     density            pH
## fixed acidity                 0.091069756  0.26533101 -0.4258582910
## volatile acidity              0.089260504  0.02711385 -0.0319153683
## citric acid                   0.121130798  0.14950257 -0.1637482114
## residual sugar                0.401439311  0.83896645 -0.1941334540
## chlorides                     0.198910300  0.25721132 -0.0904394560
## free sulfur dioxide           0.615500965  0.29421041 -0.0006177961
## total sulfur dioxide          1.000000000  0.52988132  0.0023209718
## density                       0.529881324  1.00000000 -0.0935914935
## pH                            0.002320972 -0.09359149  1.0000000000
## sulphates                     0.134562367  0.07449315  0.1559514973
## alcohol                      -0.448892102 -0.78013762  0.1214320987
##                        sulphates     alcohol
## fixed acidity        -0.01714299 -0.12088112
## volatile acidity     -0.03572815  0.06771794
## citric acid           0.06233094 -0.07572873
## residual sugar       -0.02666437 -0.45063122
## chlorides             0.01676288 -0.36018871
## free sulfur dioxide   0.05921725 -0.25010394
## total sulfur dioxide  0.13456237 -0.44889210
## density               0.07449315 -0.78013762
## pH                    0.15595150  0.12143210
## sulphates             1.00000000 -0.01743277
## alcohol              -0.01743277  1.00000000
##                      fixed acidity volatile acidity citric acid
## fixed acidity           1.00000000     -0.042865228  0.29787793
## volatile acidity       -0.04286523      1.000000000 -0.15040998
## citric acid             0.29787793     -0.150409981  1.00000000
## residual sugar          0.10672494      0.108627441  0.02462098
## chlorides               0.09469118     -0.004934156  0.03265950
## free sulfur dioxide    -0.02454223     -0.081212903  0.08831406
## total sulfur dioxide    0.11264866      0.117613959  0.09321867
## density                 0.27003091      0.010124341  0.09142519
## pH                     -0.41834116     -0.045203569 -0.14619267
## sulphates              -0.01323781     -0.016902303  0.07976628
## alcohol                -0.10682740      0.033966554 -0.02916996
##                      residual sugar    chlorides free sulfur dioxide
## fixed acidity            0.10672494  0.094691176        -0.024542230
## volatile acidity         0.10862744 -0.004934156        -0.081212903
## citric acid              0.02462098  0.032659495         0.088314056
## residual sugar           1.00000000  0.227843904         0.346106737
## chlorides                0.22784390  1.000000000         0.167045505
## free sulfur dioxide      0.34610674  0.167045505         1.000000000
## total sulfur dioxide     0.43125248  0.375243666         0.618616339
## density                  0.78036485  0.508301765         0.327821798
## pH                      -0.18002822 -0.054006467        -0.006273578
## sulphates               -0.00384398  0.093930696         0.052251683
## alcohol                 -0.44525743 -0.570806407        -0.272569338
##                      total sulfur dioxide     density           pH
## fixed acidity                  0.11264866  0.27003091 -0.418341158
## volatile acidity               0.11761396  0.01012434 -0.045203569
## citric acid                    0.09321867  0.09142519 -0.146192675
## residual sugar                 0.43125248  0.78036485 -0.180028223
## chlorides                      0.37524367  0.50830177 -0.054006467
## free sulfur dioxide            0.61861634  0.32782180 -0.006273578
## total sulfur dioxide           1.00000000  0.56382409 -0.011828718
## density                        0.56382409  1.00000000 -0.110060852
## pH                            -0.01182872 -0.11006085  1.000000000
## sulphates                      0.15782480  0.09507867  0.140243305
## alcohol                       -0.47661933 -0.82185508  0.148857249
##                        sulphates     alcohol
## fixed acidity        -0.01323781 -0.10682740
## volatile acidity     -0.01690230  0.03396655
## citric acid           0.07976628 -0.02916996
## residual sugar       -0.00384398 -0.44525743
## chlorides             0.09393070 -0.57080641
## free sulfur dioxide   0.05225168 -0.27256934
## total sulfur dioxide  0.15782480 -0.47661933
## density               0.09507867 -0.82185508
## pH                    0.14024331  0.14885725
## sulphates             1.00000000 -0.04486799
## alcohol              -0.04486799  1.00000000

##                      fixed.acidity volatile.acidity citric.acid
## fixed.acidity           1.00000000     -0.256130895  0.67170343
## volatile.acidity       -0.25613089      1.000000000 -0.55249568
## citric.acid             0.67170343     -0.552495685  1.00000000
## residual.sugar          0.11477672      0.001917882  0.14357716
## chlorides               0.09370519      0.061297772  0.20382291
## free.sulfur.dioxide    -0.15379419     -0.010503827 -0.06097813
## total.sulfur.dioxide            NA               NA          NA
## density                 0.66804729      0.022026232  0.36494718
## pH                     -0.68297819      0.234937294 -0.54190414
## sulphates               0.18300566     -0.260986685  0.31277004
## alcohol                -0.06166827     -0.202288027  0.10990325
##                      residual.sugar    chlorides free.sulfur.dioxide
## fixed.acidity           0.114776724  0.093705186        -0.153794193
## volatile.acidity        0.001917882  0.061297772        -0.010503827
## citric.acid             0.143577162  0.203822914        -0.060978129
## residual.sugar          1.000000000  0.055609535         0.187048995
## chlorides               0.055609535  1.000000000         0.005562147
## free.sulfur.dioxide     0.187048995  0.005562147         1.000000000
## total.sulfur.dioxide             NA           NA                  NA
## density                 0.355283371  0.200632327        -0.021945831
## pH                     -0.085652422 -0.265026131         0.070377499
## sulphates               0.005527121  0.371260481         0.051657572
## alcohol                 0.042075437 -0.221140545        -0.069408354
##                      total.sulfur.dioxide     density          pH
## fixed.acidity                          NA  0.66804729 -0.68297819
## volatile.acidity                       NA  0.02202623  0.23493729
## citric.acid                            NA  0.36494718 -0.54190414
## residual.sugar                         NA  0.35528337 -0.08565242
## chlorides                              NA  0.20063233 -0.26502613
## free.sulfur.dioxide                    NA -0.02194583  0.07037750
## total.sulfur.dioxide                    1          NA          NA
## density                                NA  1.00000000 -0.34169933
## pH                                     NA -0.34169933  1.00000000
## sulphates                              NA  0.14850641 -0.19664760
## alcohol                                NA -0.49617977  0.20563251
##                         sulphates     alcohol
## fixed.acidity         0.183005664 -0.06166827
## volatile.acidity     -0.260986685 -0.20228803
## citric.acid           0.312770044  0.10990325
## residual.sugar        0.005527121  0.04207544
## chlorides             0.371260481 -0.22114054
## free.sulfur.dioxide   0.051657572 -0.06940835
## total.sulfur.dioxide           NA          NA
## density               0.148506412 -0.49617977
## pH                   -0.196647602  0.20563251
## sulphates             1.000000000  0.09359475
## alcohol               0.093594750  1.00000000
##                      fixed.acidity volatile.acidity citric.acid
## fixed.acidity           1.00000000      -0.27828222  0.66170842
## volatile.acidity       -0.27828222       1.00000000 -0.61025947
## citric.acid             0.66170842      -0.61025947  1.00000000
## residual.sugar          0.22070086       0.03238560  0.17641731
## chlorides               0.25090411       0.15877025  0.11257651
## free.sulfur.dioxide    -0.17513656       0.02116264 -0.07645158
## total.sulfur.dioxide            NA               NA          NA
## density                 0.62307076       0.02501412  0.35228526
## pH                     -0.70667359       0.23357152 -0.54802628
## sulphates               0.21265375      -0.32558398  0.33107440
## alcohol                -0.06657566      -0.22493168  0.09645554
##                      residual.sugar     chlorides free.sulfur.dioxide
## fixed.acidity            0.22070086  0.2509041064       -0.1751365613
## volatile.acidity         0.03238560  0.1587702548        0.0211626414
## citric.acid              0.17641731  0.1125765077       -0.0764515753
## residual.sugar           1.00000000  0.2129592419        0.0746178640
## chlorides                0.21295924  1.0000000000        0.0008051686
## free.sulfur.dioxide      0.07461786  0.0008051686        1.0000000000
## total.sulfur.dioxide             NA            NA                  NA
## density                  0.42226586  0.4113896972       -0.0411776800
## pH                      -0.08997095 -0.2343612736        0.1156791779
## sulphates                0.03833200  0.0208254792        0.0458623500
## alcohol                  0.11654813 -0.2845039422       -0.0813673063
##                      total.sulfur.dioxide     density          pH
## fixed.acidity                          NA  0.62307076 -0.70667359
## volatile.acidity                       NA  0.02501412  0.23357152
## citric.acid                            NA  0.35228526 -0.54802628
## residual.sugar                         NA  0.42226586 -0.08997095
## chlorides                              NA  0.41138970 -0.23436127
## free.sulfur.dioxide                    NA -0.04117768  0.11567918
## total.sulfur.dioxide                    1          NA          NA
## density                                NA  1.00000000 -0.31205508
## pH                                     NA -0.31205508  1.00000000
## sulphates                              NA  0.16147823 -0.08030604
## alcohol                                NA -0.46244458  0.17993243
##                        sulphates     alcohol
## fixed.acidity         0.21265375 -0.06657566
## volatile.acidity     -0.32558398 -0.22493168
## citric.acid           0.33107440  0.09645554
## residual.sugar        0.03833200  0.11654813
## chlorides             0.02082548 -0.28450394
## free.sulfur.dioxide   0.04586235 -0.08136731
## total.sulfur.dioxide          NA          NA
## density               0.16147823 -0.46244458
## pH                   -0.08030604  0.17993243
## sulphates             1.00000000  0.20732955
## alcohol               0.20732955  1.00000000
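
The NA entries for total.sulfur.dioxide arise because `cor()` by default returns NA for any pair of variables where one of them contains a missing value. A minimal sketch of the alternatives is shown below; the first two columns reuse values from the Red Wine `head()` output above, while the total.sulfur.dioxide values and the name `red` are made up for illustration.

```r
# Small stand-in; total.sulfur.dioxide values are hypothetical, with one NA.
red <- data.frame(
  fixed.acidity = c(7.4, 7.8, 7.8, 11.2, 7.4, 7.4),
  volatile.acidity = c(0.70, 0.88, 0.76, 0.28, 0.70, 0.66),
  total.sulfur.dioxide = c(34, 67, NA, 60, 34, 40)
)

# Default behaviour: any pair involving the column with NA is NA.
cor_default <- cor(red)

# Drop every row that has a missing value, then correlate.
cor_complete <- cor(red, use = "complete.obs")

# Use all rows available for each pair of variables separately.
cor_pairwise <- cor(red, use = "pairwise.complete.obs")

round(cor_complete, 3)
```

With only two affected rows out of 1599, both options should give nearly the same result on the real data; `method = "spearman"` can be added to either call for a rank-based correlation.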