For Assignment 3, the professor asked us to retrieve the data required to answer our research questions.
In Assignment 2, I listed the following questions about the wine data (Two Wine Data Exploration):
1. Are the two datasets well organized?
2. How much data did we obtain?
3. How many rows and columns (observations and variables) does each dataset have?
4. How many missing values are there?
5. Are the two datasets reliable?
6. Are the two datasets similar?
7. Do the two datasets duplicate each other?
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(readxl)
library(readr)
setwd("~/Downloads/")
Wine_data <- read_excel("~/Downloads/Wine_data.xlsx")
View(Wine_data)
# Structure and summary of the Dataframe
str(Wine_data)
## Classes 'tbl_df', 'tbl' and 'data.frame': 4898 obs. of 12 variables:
## $ fixed acidity : num 7 6.3 8.1 7.2 7.2 8.1 6.2 7 6.3 8.1 ...
## $ volatile acidity : num 0.27 0.3 0.28 0.23 0.23 0.28 0.32 0.27 0.3 0.22 ...
## $ citric acid : num 0.36 0.34 0.4 0.32 0.32 0.4 0.16 0.36 0.34 0.43 ...
## $ residual sugar : num 20.7 1.6 6.9 8.5 8.5 6.9 7 20.7 1.6 1.5 ...
## $ chlorides : num 0.045 0.049 0.05 0.058 0.058 0.05 0.045 0.045 0.049 0.044 ...
## $ free sulfur dioxide : num 45 14 30 47 47 30 30 45 14 28 ...
## $ total sulfur dioxide: num 170 132 97 186 186 97 136 170 132 129 ...
## $ density : num 1.001 0.994 0.995 0.996 0.996 ...
## $ pH : num 3 3.3 3.26 3.19 3.19 3.26 3.18 3 3.3 3.22 ...
## $ sulphates : num 0.45 0.49 0.44 0.4 0.4 0.44 0.47 0.45 0.49 0.45 ...
## $ alcohol : num 8.8 9.5 10.1 9.9 9.9 10.1 9.6 8.8 9.5 11 ...
## $ quality : num 6 6 6 6 6 6 6 6 6 6 ...
summary(Wine_data)
## fixed acidity volatile acidity citric acid residual sugar
## Min. : 3.800 Min. :0.0800 Min. :0.0000 Min. : 0.600
## 1st Qu.: 6.300 1st Qu.:0.2100 1st Qu.:0.2700 1st Qu.: 1.700
## Median : 6.800 Median :0.2600 Median :0.3200 Median : 5.200
## Mean : 6.855 Mean :0.2782 Mean :0.3342 Mean : 6.391
## 3rd Qu.: 7.300 3rd Qu.:0.3200 3rd Qu.:0.3900 3rd Qu.: 9.900
## Max. :14.200 Max. :1.1000 Max. :1.6600 Max. :65.800
## chlorides free sulfur dioxide total sulfur dioxide
## Min. :0.00900 Min. : 2.00 Min. : 9.0
## 1st Qu.:0.03600 1st Qu.: 23.00 1st Qu.:108.0
## Median :0.04300 Median : 34.00 Median :134.0
## Mean :0.04577 Mean : 35.31 Mean :138.4
## 3rd Qu.:0.05000 3rd Qu.: 46.00 3rd Qu.:167.0
## Max. :0.34600 Max. :289.00 Max. :440.0
## density pH sulphates alcohol
## Min. :0.9871 Min. :2.720 Min. :0.2200 Min. : 8.00
## 1st Qu.:0.9917 1st Qu.:3.090 1st Qu.:0.4100 1st Qu.: 9.50
## Median :0.9937 Median :3.180 Median :0.4700 Median :10.40
## Mean :0.9940 Mean :3.188 Mean :0.4898 Mean :10.51
## 3rd Qu.:0.9961 3rd Qu.:3.280 3rd Qu.:0.5500 3rd Qu.:11.40
## Max. :1.0390 Max. :3.820 Max. :1.0800 Max. :14.20
## quality
## Min. :3.000
## 1st Qu.:5.000
## Median :6.000
## Mean :5.878
## 3rd Qu.:6.000
## Max. :9.000
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
## X1 = col_integer(),
## fixed.acidity = col_double(),
## volatile.acidity = col_double(),
## citric.acid = col_double(),
## residual.sugar = col_double(),
## chlorides = col_double(),
## free.sulfur.dioxide = col_double(),
## total.sulfur.dioxide = col_integer(),
## density = col_double(),
## pH = col_double(),
## sulphates = col_double(),
## alcohol = col_double(),
## quality = col_integer()
## )
## Warning in rbind(names(probs), probs_f): number of columns of result is not
## a multiple of vector length (arg 1)
## Warning: 2 parsing failures.
##  row                  col               expected actual
## 1296 total.sulfur.dioxide no trailing characters     .5
## 1297 total.sulfur.dioxide no trailing characters     .5
## # ... with 1 more variables: file <chr>
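The two parsing failures above come from decimal values landing in a column that readr guessed to be integer. One possible way to avoid them is to declare the column as double when reading the file. The sketch below is only an illustration: it writes a tiny temporary CSV with made-up values instead of using the real file, and the object name `red` is an assumption.

```r
library(readr)

# Tiny stand-in CSV (hypothetical values): a decimal appears in a
# column whose other entries look like integers
path <- tempfile(fileext = ".csv")
writeLines(c(
  "fixed.acidity,total.sulfur.dioxide",
  "7.4,34",
  "7.8,77.5"
), path)

# Declare the column types explicitly so the decimal value is kept
red <- read_csv(path, col_types = cols(
  fixed.acidity = col_double(),
  total.sulfur.dioxide = col_double()
))

# An empty result here means nothing failed to parse
problems(red)
```

With the column read as double, the value is stored as a number instead of being replaced by NA.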
## # A tibble: 1 x 12
## `fixed acidity` `volatile acidity` `citric acid` `residual sugar`
## <dbl> <dbl> <dbl> <dbl>
## 1 7 0.27 0.36 20.7
## # ... with 8 more variables: chlorides <dbl>, `free sulfur dioxide` <dbl>,
## # `total sulfur dioxide` <dbl>, density <dbl>, pH <dbl>,
## # sulphates <dbl>, alcohol <dbl>, quality <dbl>
## # A tibble: 6 x 12
## fixed.acidity volatile.acidity citric.acid residual.sugar chlorides
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 7.4 0.70 0.00 1.9 0.076
## 2 7.8 0.88 0.00 2.6 0.098
## 3 7.8 0.76 0.04 2.3 0.092
## 4 11.2 0.28 0.56 1.9 0.075
## 5 7.4 0.70 0.00 1.9 0.076
## 6 7.4 0.66 0.00 1.8 0.075
## # ... with 7 more variables: free.sulfur.dioxide <dbl>,
## # total.sulfur.dioxide <int>, density <dbl>, pH <dbl>, sulphates <dbl>,
## # alcohol <dbl>, quality <int>
Both datasets are well organized. Wine_data contains 4898 observations of 12 variables, and the World Wine Quality dataset contains 1599 observations of 13 variables. Neither dataset has missing values in its source file, although two total.sulfur.dioxide entries in the World Wine Quality dataset (rows 1296 and 1297) failed to parse as integers on import and were read as NA. The two datasets were obtained in different years. Although the two datasets have the same variables (after removing the first column of the World Wine Quality dataset), they are not duplicates and do not overlap. Please check the bar charts of each dataset.
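The questions about missing values, duplicates, and overlap can be checked directly in R. This is a minimal sketch on two toy data frames with made-up values, not on the real files; the names `white` and `red` are assumptions used only for illustration.

```r
# Toy stand-ins for the two wine tables (hypothetical values)
white <- data.frame(alcohol = c(8.8, 9.5, 10.1, 9.5),
                    pH      = c(3.00, 3.30, 3.26, 3.30))
red   <- data.frame(X1      = 1:3,
                    alcohol = c(9.4, 9.8, NA),
                    pH      = c(3.51, 3.20, 3.26))

# Drop the index column so both tables share the same variables
red <- red[, setdiff(names(red), "X1")]

# Missing values per column
na_white <- colSums(is.na(white))
na_red   <- colSums(is.na(red))

# Number of repeated rows inside one table
dup_white <- sum(duplicated(white))

# Rows that appear in both tables (inner join on all shared columns)
shared   <- merge(unique(white), unique(red))
n_shared <- nrow(shared)
```

A value of zero for `n_shared` would support the claim that the two tables do not overlap.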
## Warning: position_stack requires non-overlapping x intervals
## Warning: position_stack requires non-overlapping x intervals
## Warning: position_stack requires non-overlapping x intervals
## Warning: position_stack requires non-overlapping x intervals
#Bar Chart of the alcohol feature in World Wine Dataset
#Bar Chart of the alcohol feature in Red Wine Dataset
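The plotting code is not shown in this report. The position_stack warnings above are of the kind ggplot2 tends to give when bars drawn on a continuous axis overlap, so one possible alternative is to bin the values with a histogram. The sketch below uses toy alcohol values (made up, not taken from either dataset) and is not the code that produced the charts.

```r
library(ggplot2)

# Toy alcohol values (hypothetical)
toy <- data.frame(alcohol = c(9.4, 9.5, 9.5, 10.1, 10.4, 11.0))

# Bin the continuous variable explicitly instead of drawing one bar per value
p <- ggplot(toy, aes(x = alcohol)) +
  geom_histogram(binwidth = 0.5, fill = "steelblue", colour = "white") +
  labs(title = "Alcohol (toy data)", x = "Alcohol", y = "Count")

print(p)
```

The bin width of 0.5 is an arbitrary choice for the toy data and would need adjusting for the real alcohol range.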
Next, I would like to find the correlation between the different features.
## fixed acidity volatile acidity citric acid
## fixed acidity 1.00000000 -0.02269729 0.28918070
## volatile acidity -0.02269729 1.00000000 -0.14947181
## citric acid 0.28918070 -0.14947181 1.00000000
## residual sugar 0.08902070 0.06428606 0.09421162
## chlorides 0.02308564 0.07051157 0.11436445
## free sulfur dioxide -0.04939586 -0.09701194 0.09407722
## total sulfur dioxide 0.09106976 0.08926050 0.12113080
## density 0.26533101 0.02711385 0.14950257
## pH -0.42585829 -0.03191537 -0.16374821
## sulphates -0.01714299 -0.03572815 0.06233094
## alcohol -0.12088112 0.06771794 -0.07572873
## residual sugar chlorides free sulfur dioxide
## fixed acidity 0.08902070 0.02308564 -0.0493958591
## volatile acidity 0.06428606 0.07051157 -0.0970119393
## citric acid 0.09421162 0.11436445 0.0940772210
## residual sugar 1.00000000 0.08868454 0.2990983537
## chlorides 0.08868454 1.00000000 0.1013923521
## free sulfur dioxide 0.29909835 0.10139235 1.0000000000
## total sulfur dioxide 0.40143931 0.19891030 0.6155009650
## density 0.83896645 0.25721132 0.2942104109
## pH -0.19413345 -0.09043946 -0.0006177961
## sulphates -0.02666437 0.01676288 0.0592172458
## alcohol -0.45063122 -0.36018871 -0.2501039415
## total sulfur dioxide density pH
## fixed acidity 0.091069756 0.26533101 -0.4258582910
## volatile acidity 0.089260504 0.02711385 -0.0319153683
## citric acid 0.121130798 0.14950257 -0.1637482114
## residual sugar 0.401439311 0.83896645 -0.1941334540
## chlorides 0.198910300 0.25721132 -0.0904394560
## free sulfur dioxide 0.615500965 0.29421041 -0.0006177961
## total sulfur dioxide 1.000000000 0.52988132 0.0023209718
## density 0.529881324 1.00000000 -0.0935914935
## pH 0.002320972 -0.09359149 1.0000000000
## sulphates 0.134562367 0.07449315 0.1559514973
## alcohol -0.448892102 -0.78013762 0.1214320987
## sulphates alcohol
## fixed acidity -0.01714299 -0.12088112
## volatile acidity -0.03572815 0.06771794
## citric acid 0.06233094 -0.07572873
## residual sugar -0.02666437 -0.45063122
## chlorides 0.01676288 -0.36018871
## free sulfur dioxide 0.05921725 -0.25010394
## total sulfur dioxide 0.13456237 -0.44889210
## density 0.07449315 -0.78013762
## pH 0.15595150 0.12143210
## sulphates 1.00000000 -0.01743277
## alcohol -0.01743277 1.00000000
## fixed acidity volatile acidity citric acid
## fixed acidity 1.00000000 -0.042865228 0.29787793
## volatile acidity -0.04286523 1.000000000 -0.15040998
## citric acid 0.29787793 -0.150409981 1.00000000
## residual sugar 0.10672494 0.108627441 0.02462098
## chlorides 0.09469118 -0.004934156 0.03265950
## free sulfur dioxide -0.02454223 -0.081212903 0.08831406
## total sulfur dioxide 0.11264866 0.117613959 0.09321867
## density 0.27003091 0.010124341 0.09142519
## pH -0.41834116 -0.045203569 -0.14619267
## sulphates -0.01323781 -0.016902303 0.07976628
## alcohol -0.10682740 0.033966554 -0.02916996
## residual sugar chlorides free sulfur dioxide
## fixed acidity 0.10672494 0.094691176 -0.024542230
## volatile acidity 0.10862744 -0.004934156 -0.081212903
## citric acid 0.02462098 0.032659495 0.088314056
## residual sugar 1.00000000 0.227843904 0.346106737
## chlorides 0.22784390 1.000000000 0.167045505
## free sulfur dioxide 0.34610674 0.167045505 1.000000000
## total sulfur dioxide 0.43125248 0.375243666 0.618616339
## density 0.78036485 0.508301765 0.327821798
## pH -0.18002822 -0.054006467 -0.006273578
## sulphates -0.00384398 0.093930696 0.052251683
## alcohol -0.44525743 -0.570806407 -0.272569338
## total sulfur dioxide density pH
## fixed acidity 0.11264866 0.27003091 -0.418341158
## volatile acidity 0.11761396 0.01012434 -0.045203569
## citric acid 0.09321867 0.09142519 -0.146192675
## residual sugar 0.43125248 0.78036485 -0.180028223
## chlorides 0.37524367 0.50830177 -0.054006467
## free sulfur dioxide 0.61861634 0.32782180 -0.006273578
## total sulfur dioxide 1.00000000 0.56382409 -0.011828718
## density 0.56382409 1.00000000 -0.110060852
## pH -0.01182872 -0.11006085 1.000000000
## sulphates 0.15782480 0.09507867 0.140243305
## alcohol -0.47661933 -0.82185508 0.148857249
## sulphates alcohol
## fixed acidity -0.01323781 -0.10682740
## volatile acidity -0.01690230 0.03396655
## citric acid 0.07976628 -0.02916996
## residual sugar -0.00384398 -0.44525743
## chlorides 0.09393070 -0.57080641
## free sulfur dioxide 0.05225168 -0.27256934
## total sulfur dioxide 0.15782480 -0.47661933
## density 0.09507867 -0.82185508
## pH 0.14024331 0.14885725
## sulphates 1.00000000 -0.04486799
## alcohol -0.04486799 1.00000000
## fixed.acidity volatile.acidity citric.acid
## fixed.acidity 1.00000000 -0.256130895 0.67170343
## volatile.acidity -0.25613089 1.000000000 -0.55249568
## citric.acid 0.67170343 -0.552495685 1.00000000
## residual.sugar 0.11477672 0.001917882 0.14357716
## chlorides 0.09370519 0.061297772 0.20382291
## free.sulfur.dioxide -0.15379419 -0.010503827 -0.06097813
## total.sulfur.dioxide NA NA NA
## density 0.66804729 0.022026232 0.36494718
## pH -0.68297819 0.234937294 -0.54190414
## sulphates 0.18300566 -0.260986685 0.31277004
## alcohol -0.06166827 -0.202288027 0.10990325
## residual.sugar chlorides free.sulfur.dioxide
## fixed.acidity 0.114776724 0.093705186 -0.153794193
## volatile.acidity 0.001917882 0.061297772 -0.010503827
## citric.acid 0.143577162 0.203822914 -0.060978129
## residual.sugar 1.000000000 0.055609535 0.187048995
## chlorides 0.055609535 1.000000000 0.005562147
## free.sulfur.dioxide 0.187048995 0.005562147 1.000000000
## total.sulfur.dioxide NA NA NA
## density 0.355283371 0.200632327 -0.021945831
## pH -0.085652422 -0.265026131 0.070377499
## sulphates 0.005527121 0.371260481 0.051657572
## alcohol 0.042075437 -0.221140545 -0.069408354
## total.sulfur.dioxide density pH
## fixed.acidity NA 0.66804729 -0.68297819
## volatile.acidity NA 0.02202623 0.23493729
## citric.acid NA 0.36494718 -0.54190414
## residual.sugar NA 0.35528337 -0.08565242
## chlorides NA 0.20063233 -0.26502613
## free.sulfur.dioxide NA -0.02194583 0.07037750
## total.sulfur.dioxide 1 NA NA
## density NA 1.00000000 -0.34169933
## pH NA -0.34169933 1.00000000
## sulphates NA 0.14850641 -0.19664760
## alcohol NA -0.49617977 0.20563251
## sulphates alcohol
## fixed.acidity 0.183005664 -0.06166827
## volatile.acidity -0.260986685 -0.20228803
## citric.acid 0.312770044 0.10990325
## residual.sugar 0.005527121 0.04207544
## chlorides 0.371260481 -0.22114054
## free.sulfur.dioxide 0.051657572 -0.06940835
## total.sulfur.dioxide NA NA
## density 0.148506412 -0.49617977
## pH -0.196647602 0.20563251
## sulphates 1.000000000 0.09359475
## alcohol 0.093594750 1.00000000
## fixed.acidity volatile.acidity citric.acid
## fixed.acidity 1.00000000 -0.27828222 0.66170842
## volatile.acidity -0.27828222 1.00000000 -0.61025947
## citric.acid 0.66170842 -0.61025947 1.00000000
## residual.sugar 0.22070086 0.03238560 0.17641731
## chlorides 0.25090411 0.15877025 0.11257651
## free.sulfur.dioxide -0.17513656 0.02116264 -0.07645158
## total.sulfur.dioxide NA NA NA
## density 0.62307076 0.02501412 0.35228526
## pH -0.70667359 0.23357152 -0.54802628
## sulphates 0.21265375 -0.32558398 0.33107440
## alcohol -0.06657566 -0.22493168 0.09645554
## residual.sugar chlorides free.sulfur.dioxide
## fixed.acidity 0.22070086 0.2509041064 -0.1751365613
## volatile.acidity 0.03238560 0.1587702548 0.0211626414
## citric.acid 0.17641731 0.1125765077 -0.0764515753
## residual.sugar 1.00000000 0.2129592419 0.0746178640
## chlorides 0.21295924 1.0000000000 0.0008051686
## free.sulfur.dioxide 0.07461786 0.0008051686 1.0000000000
## total.sulfur.dioxide NA NA NA
## density 0.42226586 0.4113896972 -0.0411776800
## pH -0.08997095 -0.2343612736 0.1156791779
## sulphates 0.03833200 0.0208254792 0.0458623500
## alcohol 0.11654813 -0.2845039422 -0.0813673063
## total.sulfur.dioxide density pH
## fixed.acidity NA 0.62307076 -0.70667359
## volatile.acidity NA 0.02501412 0.23357152
## citric.acid NA 0.35228526 -0.54802628
## residual.sugar NA 0.42226586 -0.08997095
## chlorides NA 0.41138970 -0.23436127
## free.sulfur.dioxide NA -0.04117768 0.11567918
## total.sulfur.dioxide 1 NA NA
## density NA 1.00000000 -0.31205508
## pH NA -0.31205508 1.00000000
## sulphates NA 0.16147823 -0.08030604
## alcohol NA -0.46244458 0.17993243
## sulphates alcohol
## fixed.acidity 0.21265375 -0.06657566
## volatile.acidity -0.32558398 -0.22493168
## citric.acid 0.33107440 0.09645554
## residual.sugar 0.03833200 0.11654813
## chlorides 0.02082548 -0.28450394
## free.sulfur.dioxide 0.04586235 -0.08136731
## total.sulfur.dioxide NA NA
## density 0.16147823 -0.46244458
## pH -0.08030604 0.17993243
## sulphates 1.00000000 0.20732955
## alcohol 0.20732955 1.00000000
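The NA entries for total.sulfur.dioxide in the last two matrices are what cor() returns by default when a column contains missing values. The sketch below shows this on toy data (made-up values) and one possible workaround, the `use = "pairwise.complete.obs"` option. Dropping the quality column before the call is an assumption based on the printed matrices, which do not include it.

```r
# Toy data with one missing value (hypothetical numbers)
toy <- data.frame(
  alcohol              = c(9.4, 9.8, 10.5, 11.2, 12.0),
  density              = c(0.9978, 0.9968, 0.9960, 0.9951, 0.9940),
  total.sulfur.dioxide = c(34, 67, NA, 60, 54),
  quality              = c(5L, 5L, 6L, 6L, 7L)
)

# Keep only the feature columns
features <- toy[, setdiff(names(toy), "quality")]

# Default behaviour: any pair involving the NA column is NA
cor_default <- cor(features)

# Use every complete pair of observations for each correlation
cor_pairwise <- cor(features, use = "pairwise.complete.obs")
```

With pairwise deletion, each correlation is computed from the rows where both variables are present, so the coefficients may rest on slightly different numbers of observations.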