This dataset is used to examine how the physicochemical properties of red wine relate to its quality rating.
Reading the data from a .csv file
# read.csv() is base R, so no extra package is needed to load the file
wine <- read.csv("/Users/blessinga/Desktop/winequality-red.csv", header=TRUE, sep=",")
str(wine)
## 'data.frame': 1599 obs. of 12 variables:
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity : num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ residual.sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.sulfur.dioxide : num 11 25 15 17 11 13 15 15 9 17 ...
## $ total.sulfur.dioxide: num 34 67 54 60 34 40 59 21 18 102 ...
## $ density : num 0.998 0.997 0.997 0.998 0.998 ...
## $ pH : num 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : int 5 5 5 6 5 5 5 7 7 5 ...
colnames(wine)
## [1] "fixed.acidity" "volatile.acidity" "citric.acid"
## [4] "residual.sugar" "chlorides" "free.sulfur.dioxide"
## [7] "total.sulfur.dioxide" "density" "pH"
## [10] "sulphates" "alcohol" "quality"
# Checking for missing values
sum(is.na(wine))
## [1] 0
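A single total of zero confirms there are no missing values, but on other data a per-column breakdown is usually more informative. The following is a minimal sketch of that check; the `toy` data frame is a hypothetical stand-in for `wine`, with missing values planted for illustration only.

```r
# Hypothetical toy data standing in for `wine`; NOT the real dataset
toy <- data.frame(
  pH      = c(3.51, NA, 3.26),
  alcohol = c(9.4, 9.8, NA),
  quality = c(5L, 5L, 6L)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Row indices that contain at least one missing value
incomplete_rows <- which(!complete.cases(toy))
print(incomplete_rows)
```

On the real data, `colSums(is.na(wine))` would apply the same per-column check.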
# Summary statistics
summary(wine)
## fixed.acidity volatile.acidity citric.acid residual.sugar
## Min. : 4.60 Min. :0.1200 Min. :0.000 Min. : 0.900
## 1st Qu.: 7.10 1st Qu.:0.3900 1st Qu.:0.090 1st Qu.: 1.900
## Median : 7.90 Median :0.5200 Median :0.260 Median : 2.200
## Mean : 8.32 Mean :0.5278 Mean :0.271 Mean : 2.539
## 3rd Qu.: 9.20 3rd Qu.:0.6400 3rd Qu.:0.420 3rd Qu.: 2.600
## Max. :15.90 Max. :1.5800 Max. :1.000 Max. :15.500
## chlorides free.sulfur.dioxide total.sulfur.dioxide density
## Min. :0.01200 Min. : 1.00 Min. : 6.00 Min. :0.9901
## 1st Qu.:0.07000 1st Qu.: 7.00 1st Qu.: 22.00 1st Qu.:0.9956
## Median :0.07900 Median :14.00 Median : 38.00 Median :0.9968
## Mean :0.08747 Mean :15.87 Mean : 46.47 Mean :0.9967
## 3rd Qu.:0.09000 3rd Qu.:21.00 3rd Qu.: 62.00 3rd Qu.:0.9978
## Max. :0.61100 Max. :72.00 Max. :289.00 Max. :1.0037
## pH sulphates alcohol quality
## Min. :2.740 Min. :0.3300 Min. : 8.40 Min. :3.000
## 1st Qu.:3.210 1st Qu.:0.5500 1st Qu.: 9.50 1st Qu.:5.000
## Median :3.310 Median :0.6200 Median :10.20 Median :6.000
## Mean :3.311 Mean :0.6581 Mean :10.42 Mean :5.636
## 3rd Qu.:3.400 3rd Qu.:0.7300 3rd Qu.:11.10 3rd Qu.:6.000
## Max. :4.010 Max. :2.0000 Max. :14.90 Max. :8.000
library(tidyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Shorten the column names (pH keeps its original name)
wine2 <- rename(wine,
  FxAc = fixed.acidity, VolAc = volatile.acidity, CtAc = citric.acid,
  ResiSug = residual.sugar, Chlo = chlorides,
  FrSulfDio = free.sulfur.dioxide, TtlSulfDio = total.sulfur.dioxide,
  Dens = density, Sulph = sulphates, Alc = alcohol, Qlty = quality)
head(wine2)
## FxAc VolAc CtAc ResiSug Chlo FrSulfDio TtlSulfDio Dens pH Sulph Alc Qlty
## 1 7.4 0.70 0.00 1.9 0.076 11 34 0.9978 3.51 0.56 9.4 5
## 2 7.8 0.88 0.00 2.6 0.098 25 67 0.9968 3.20 0.68 9.8 5
## 3 7.8 0.76 0.04 2.3 0.092 15 54 0.9970 3.26 0.65 9.8 5
## 4 11.2 0.28 0.56 1.9 0.075 17 60 0.9980 3.16 0.58 9.8 6
## 5 7.4 0.70 0.00 1.9 0.076 11 34 0.9978 3.51 0.56 9.4 5
## 6 7.4 0.66 0.00 1.8 0.075 13 40 0.9978 3.51 0.56 9.4 5
Analysis 1: Summary
# Average value of every property, grouped by wine quality
wine2_average_summary <- wine2 %>%
  group_by(Qlty) %>%
  summarise(across(everything(), mean))
# Display the summary
wine2_average_summary
## # A tibble: 6 × 12
## Qlty FxAc VolAc CtAc ResiSug Chlo FrSulfDio TtlSulfDio Dens pH Sulph
## <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 3 8.36 0.884 0.171 2.64 0.122 11 24.9 0.997 3.40 0.57
## 2 4 7.78 0.694 0.174 2.69 0.0907 12.3 36.2 0.997 3.38 0.596
## 3 5 8.17 0.577 0.244 2.53 0.0927 17.0 56.5 0.997 3.30 0.621
## 4 6 8.35 0.497 0.274 2.48 0.0850 15.7 40.9 0.997 3.32 0.675
## 5 7 8.87 0.404 0.375 2.72 0.0766 14.0 35.0 0.996 3.29 0.741
## 6 8 8.57 0.423 0.391 2.58 0.0684 13.3 33.4 0.995 3.27 0.768
## # ℹ 1 more variable: Alc <dbl>
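Quality 7 has the highest average fixed acidity (8.87).
Quality 3 has the highest average volatile acidity (0.884).
Qualities 3 and 4 have the lowest average citric acid (0.171 and 0.174).
Qualities 7 and 4 have the highest average residual sugar (2.72 and 2.69).
Quality 3 has the highest average chlorides (0.122).
Quality 5 has the highest average free sulfur dioxide (17.0); quality 3 has the lowest (11).
Quality 5 has the highest average total sulfur dioxide (56.5); quality 3 has the lowest (24.9).
Average density is nearly identical across qualities (0.995 to 0.997), and average pH varies only slightly (3.27 to 3.40).
Quality 3 has the highest average pH (3.40), the lowest average sulphates (0.57), and a density at the top of the range (0.997).
Quality 8 has the highest average alcohol level.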
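Reading maxima and minima off a wide table by eye is error-prone, so the observations above can be cross-checked programmatically. The sketch below re-enters the per-quality means as printed in the summary table (rounded values) and reports which quality has the highest and lowest mean for each property. `Dens` is left out because its values tie at the printed precision, and `Alc` because the tibble output does not show it. The names `means`, `props`, and `extremes` are illustrative.

```r
# Per-quality means copied from the printed summary table (rounded as shown)
means <- data.frame(
  Qlty       = 3:8,
  FxAc       = c(8.36, 7.78, 8.17, 8.35, 8.87, 8.57),
  VolAc      = c(0.884, 0.694, 0.577, 0.497, 0.404, 0.423),
  CtAc       = c(0.171, 0.174, 0.244, 0.274, 0.375, 0.391),
  ResiSug    = c(2.64, 2.69, 2.53, 2.48, 2.72, 2.58),
  Chlo       = c(0.122, 0.0907, 0.0927, 0.0850, 0.0766, 0.0684),
  FrSulfDio  = c(11, 12.3, 17.0, 15.7, 14.0, 13.3),
  TtlSulfDio = c(24.9, 36.2, 56.5, 40.9, 35.0, 33.4),
  pH         = c(3.40, 3.38, 3.30, 3.32, 3.29, 3.27),
  Sulph      = c(0.57, 0.596, 0.621, 0.675, 0.741, 0.768)
)

# For each property, find the quality rating with the largest and smallest mean
props <- setdiff(names(means), "Qlty")
extremes <- data.frame(
  property  = props,
  highest   = sapply(props, function(p) means$Qlty[which.max(means[[p]])]),
  lowest    = sapply(props, function(p) means$Qlty[which.min(means[[p]])]),
  row.names = NULL
)
print(extremes)
```

With the full data loaded, the same `sapply()` calls could be pointed at `wine2_average_summary` instead of the hand-entered table.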
ANALYSIS FROM THIS SUMMARY:
Lower-quality wines (ratings 3 and 4) tend to have higher volatile
acidity, lower citric acid, higher chlorides, higher density, higher pH,
lower sulphates, and lower alcohol content.
Higher-quality wines (ratings 7 and 8) show the opposite
pattern: lower volatile acidity, higher citric acid, lower
chlorides, lower density, lower pH, higher sulphates, and higher alcohol
content.
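Group means are only as reliable as the number of wines behind them, so it can help to report the group size and spread next to each mean before comparing ratings. The following is a minimal sketch of that idea; `toy` is a hypothetical stand-in for `wine2` with made-up values, and the column names `n`, `mean_Alc`, and `sd_Alc` are illustrative.

```r
library(dplyr)

# Hypothetical toy data standing in for `wine2`; values are illustrative only
toy <- data.frame(
  Qlty = c(5L, 5L, 5L, 6L, 6L, 8L),
  Alc  = c(9.4, 9.8, 9.6, 10.5, 11.1, 12.8)
)

# Report group size and spread alongside the mean for each quality rating
alc_by_quality <- toy %>%
  group_by(Qlty) %>%
  summarise(n = n(), mean_Alc = mean(Alc), sd_Alc = sd(Alc))

print(alc_by_quality)
```

A rating with only one observation gets an `NA` standard deviation here, which is a useful signal that its mean should be read with caution.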
Analysis 2: Bar Graph
library(ggplot2)
wine_sum3 <- wine %>%
  group_by(quality) %>%
  summarise(mean_fixed_acidity = mean(fixed.acidity),
            mean_volatile_acidity = mean(volatile.acidity),
            mean_citric_acid = mean(citric.acid),
            mean_residual_sugar = mean(residual.sugar),
            mean_chlorides = mean(chlorides),
            mean_free_sulfur_dioxide = mean(free.sulfur.dioxide),
            mean_total_sulfur_dioxide = mean(total.sulfur.dioxide),
            mean_density = mean(density),
            mean_pH = mean(pH),
            mean_sulphates = mean(sulphates),
            mean_alcohol = mean(alcohol))
# Reshape to long format: one row per (quality, property) pair
wine_sum_long <- wine_sum3 %>%
  tidyr::pivot_longer(cols = -quality, names_to = "Property", values_to = "Mean_Value")
# Create bar plot
# Set3 provides up to 12 colors, enough for the 11 properties (Set2 stops at 8)
ggplot(wine_sum_long, aes(x = quality, y = Mean_Value, fill = Property)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Average Physicochemical Properties by Wine Quality",
       x = "Wine Quality",
       y = "Mean Value") +
  scale_fill_brewer(palette = "Set3")
ANALYSIS:
This bar graph is supplementary; the main findings come from Analysis 1.
Plotting the average value of each physicochemical property by
wine quality gives a visual comparison of how high- and low-quality
red wines differ. This can help winemakers and customers judge which
properties are associated with the wines they prefer.
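One limitation of a single dodged bar chart is that all properties share one y-axis. In the summary table, mean total sulfur dioxide ranges from 24.9 to 56.5 while mean chlorides never exceed 0.122, so the smaller properties are nearly invisible. A possible alternative, sketched below, gives each property its own panel and y-axis with `facet_wrap(scales = "free_y")`. The data frame re-enters four columns of per-quality means from the printed table; the names `means`, `means_long`, and `p` are illustrative.

```r
library(ggplot2)
library(tidyr)

# Per-quality means for four properties, copied from the printed summary table
means <- data.frame(
  quality              = 3:8,
  volatile_acidity     = c(0.884, 0.694, 0.577, 0.497, 0.404, 0.423),
  chlorides            = c(0.122, 0.0907, 0.0927, 0.0850, 0.0766, 0.0684),
  total_sulfur_dioxide = c(24.9, 36.2, 56.5, 40.9, 35.0, 33.4),
  sulphates            = c(0.57, 0.596, 0.621, 0.675, 0.741, 0.768)
)

# Long format: one row per (quality, property) pair
means_long <- pivot_longer(means, cols = -quality,
                           names_to = "Property", values_to = "Mean_Value")

# One panel per property, each with its own y-axis scale
p <- ggplot(means_long, aes(x = factor(quality), y = Mean_Value)) +
  geom_col() +
  facet_wrap(~ Property, scales = "free_y") +
  labs(title = "Average Physicochemical Properties by Wine Quality",
       x = "Wine Quality",
       y = "Mean Value")
print(p)
```

The same faceting could be applied to `wine_sum_long` to cover all eleven properties.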