This dataset is used to examine how the physicochemical properties of red wine relate to its quality rating.

Reading the data from a .csv file

library(readr)

wine <- read.csv("/Users/blessinga/Desktop/winequality-red.csv", header=TRUE, sep=",")
str(wine)
## 'data.frame':    1599 obs. of  12 variables:
##  $ fixed.acidity       : num  7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
##  $ volatile.acidity    : num  0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
##  $ citric.acid         : num  0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
##  $ residual.sugar      : num  1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
##  $ chlorides           : num  0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
##  $ free.sulfur.dioxide : num  11 25 15 17 11 13 15 15 9 17 ...
##  $ total.sulfur.dioxide: num  34 67 54 60 34 40 59 21 18 102 ...
##  $ density             : num  0.998 0.997 0.997 0.998 0.998 ...
##  $ pH                  : num  3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
##  $ sulphates           : num  0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
##  $ alcohol             : num  9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
##  $ quality             : int  5 5 5 6 5 5 5 7 7 5 ...
colnames(wine)
##  [1] "fixed.acidity"        "volatile.acidity"     "citric.acid"         
##  [4] "residual.sugar"       "chlorides"            "free.sulfur.dioxide" 
##  [7] "total.sulfur.dioxide" "density"              "pH"                  
## [10] "sulphates"            "alcohol"              "quality"
# Checking for missing values
sum(is.na(wine))
## [1] 0
# Summary statistics
summary(wine)
##  fixed.acidity   volatile.acidity  citric.acid    residual.sugar  
##  Min.   : 4.60   Min.   :0.1200   Min.   :0.000   Min.   : 0.900  
##  1st Qu.: 7.10   1st Qu.:0.3900   1st Qu.:0.090   1st Qu.: 1.900  
##  Median : 7.90   Median :0.5200   Median :0.260   Median : 2.200  
##  Mean   : 8.32   Mean   :0.5278   Mean   :0.271   Mean   : 2.539  
##  3rd Qu.: 9.20   3rd Qu.:0.6400   3rd Qu.:0.420   3rd Qu.: 2.600  
##  Max.   :15.90   Max.   :1.5800   Max.   :1.000   Max.   :15.500  
##    chlorides       free.sulfur.dioxide total.sulfur.dioxide    density      
##  Min.   :0.01200   Min.   : 1.00       Min.   :  6.00       Min.   :0.9901  
##  1st Qu.:0.07000   1st Qu.: 7.00       1st Qu.: 22.00       1st Qu.:0.9956  
##  Median :0.07900   Median :14.00       Median : 38.00       Median :0.9968  
##  Mean   :0.08747   Mean   :15.87       Mean   : 46.47       Mean   :0.9967  
##  3rd Qu.:0.09000   3rd Qu.:21.00       3rd Qu.: 62.00       3rd Qu.:0.9978  
##  Max.   :0.61100   Max.   :72.00       Max.   :289.00       Max.   :1.0037  
##        pH          sulphates         alcohol         quality     
##  Min.   :2.740   Min.   :0.3300   Min.   : 8.40   Min.   :3.000  
##  1st Qu.:3.210   1st Qu.:0.5500   1st Qu.: 9.50   1st Qu.:5.000  
##  Median :3.310   Median :0.6200   Median :10.20   Median :6.000  
##  Mean   :3.311   Mean   :0.6581   Mean   :10.42   Mean   :5.636  
##  3rd Qu.:3.400   3rd Qu.:0.7300   3rd Qu.:11.10   3rd Qu.:6.000  
##  Max.   :4.010   Max.   :2.0000   Max.   :14.90   Max.   :8.000

Renaming columns

library(tidyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# Shorten the column names (new_name = old_name); pH keeps its name
wine2 <- rename(wine,
                FxAc = fixed.acidity, VolAc = volatile.acidity,
                CtAc = citric.acid, ResiSug = residual.sugar,
                Chlo = chlorides, FrSulfDio = free.sulfur.dioxide,
                TtlSulfDio = total.sulfur.dioxide, Dens = density,
                Sulph = sulphates, Alc = alcohol, Qlty = quality)
head(wine2)
##   FxAc VolAc CtAc ResiSug  Chlo FrSulfDio TtlSulfDio   Dens   pH Sulph Alc Qlty
## 1  7.4  0.70 0.00     1.9 0.076        11         34 0.9978 3.51  0.56 9.4    5
## 2  7.8  0.88 0.00     2.6 0.098        25         67 0.9968 3.20  0.68 9.8    5
## 3  7.8  0.76 0.04     2.3 0.092        15         54 0.9970 3.26  0.65 9.8    5
## 4 11.2  0.28 0.56     1.9 0.075        17         60 0.9980 3.16  0.58 9.8    6
## 5  7.4  0.70 0.00     1.9 0.076        11         34 0.9978 3.51  0.56 9.4    5
## 6  7.4  0.66 0.00     1.8 0.075        13         40 0.9978 3.51  0.56 9.4    5
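
The renaming step can also be driven by a lookup vector, which keeps the old-to-new mapping in one place. The following is a minimal base R sketch, not the method used above. The names `short_names` and `demo` are illustrative, and `demo` holds only the first row of the dataset as printed by head().

```r
# Mapping from original column names to short names (pH is left unchanged)
short_names <- c(
  fixed.acidity = "FxAc", volatile.acidity = "VolAc", citric.acid = "CtAc",
  residual.sugar = "ResiSug", chlorides = "Chlo",
  free.sulfur.dioxide = "FrSulfDio", total.sulfur.dioxide = "TtlSulfDio",
  density = "Dens", sulphates = "Sulph", alcohol = "Alc", quality = "Qlty"
)

# One-row stand-in for `wine`, values copied from the first row of head()
demo <- data.frame(
  fixed.acidity = 7.4, volatile.acidity = 0.70, citric.acid = 0,
  residual.sugar = 1.9, chlorides = 0.076, free.sulfur.dioxide = 11,
  total.sulfur.dioxide = 34, density = 0.9978, pH = 3.51,
  sulphates = 0.56, alcohol = 9.4, quality = 5L
)

# Rename only the columns that appear in the lookup; others are untouched
hits <- names(demo) %in% names(short_names)
names(demo)[hits] <- short_names[names(demo)[hits]]
print(names(demo))
```

Columns that are missing from the lookup, such as pH, simply keep their original names.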

Analysis of Wine Quality

Analysis 1: Summary

# Average value of each property, grouped by wine quality
wine2_average_summary <- wine2 %>%
  group_by(Qlty) %>%
  summarise(across(everything(), mean))

# Display the summary
wine2_average_summary
## # A tibble: 6 × 12
##    Qlty  FxAc VolAc  CtAc ResiSug   Chlo FrSulfDio TtlSulfDio  Dens    pH Sulph
##   <int> <dbl> <dbl> <dbl>   <dbl>  <dbl>     <dbl>      <dbl> <dbl> <dbl> <dbl>
## 1     3  8.36 0.884 0.171    2.64 0.122       11         24.9 0.997  3.40 0.57 
## 2     4  7.78 0.694 0.174    2.69 0.0907      12.3       36.2 0.997  3.38 0.596
## 3     5  8.17 0.577 0.244    2.53 0.0927      17.0       56.5 0.997  3.30 0.621
## 4     6  8.35 0.497 0.274    2.48 0.0850      15.7       40.9 0.997  3.32 0.675
## 5     7  8.87 0.404 0.375    2.72 0.0766      14.0       35.0 0.996  3.29 0.741
## 6     8  8.57 0.423 0.391    2.58 0.0684      13.3       33.4 0.995  3.27 0.768
## # ℹ 1 more variable: Alc <dbl>

Quality 7 has the highest average fixed acidity (8.87).
Quality 3 has the highest average volatile acidity (0.884).
Qualities 3 and 4 have the lowest average citric acid (0.171 and 0.174).
Qualities 7 and 4 have the highest average residual sugar (2.72 and 2.69).
Quality 3 has the highest average chlorides (0.122).
Quality 5 has the highest average free sulfur dioxide (17.0); quality 3 has the lowest (11).
Quality 5 has the highest average total sulfur dioxide (56.5); quality 3 has the lowest (24.9).
Density and pH are quite similar across all qualities.
Quality 3 has relatively high density and the highest pH (3.40), and the lowest sulphates (0.57).
Quality 8 has the highest average alcohol level.
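
Observations like these are easy to misread off a wide table, so they can be checked programmatically. The sketch below is one way to do that in base R, assuming the group means exactly as printed in the tibble and re-typed here by hand. The names `avg` and `extremes` are illustrative and not part of the original analysis. Dens is left out because its rounded values tie, and Alc because the printout truncates it.

```r
# Group means copied from the printed summary tibble
avg <- data.frame(
  Qlty       = 3:8,
  FxAc       = c(8.36, 7.78, 8.17, 8.35, 8.87, 8.57),
  VolAc      = c(0.884, 0.694, 0.577, 0.497, 0.404, 0.423),
  CtAc       = c(0.171, 0.174, 0.244, 0.274, 0.375, 0.391),
  ResiSug    = c(2.64, 2.69, 2.53, 2.48, 2.72, 2.58),
  Chlo       = c(0.122, 0.0907, 0.0927, 0.0850, 0.0766, 0.0684),
  FrSulfDio  = c(11, 12.3, 17.0, 15.7, 14.0, 13.3),
  TtlSulfDio = c(24.9, 36.2, 56.5, 40.9, 35.0, 33.4),
  pH         = c(3.40, 3.38, 3.30, 3.32, 3.29, 3.27),
  Sulph      = c(0.57, 0.596, 0.621, 0.675, 0.741, 0.768)
)

# For each property, find the quality rating with the highest and lowest mean
props <- setdiff(names(avg), "Qlty")
extremes <- data.frame(
  Property = props,
  Highest  = sapply(props, function(p) avg$Qlty[which.max(avg[[p]])]),
  Lowest   = sapply(props, function(p) avg$Qlty[which.min(avg[[p]])]),
  row.names = NULL
)
print(extremes)
```

On the full data, the same lookup could be run directly against `wine2_average_summary` instead of the hand-typed `avg`.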

ANALYSIS FROM THIS SUMMARY:
Lower-quality wines (ratings 3 and 4) tend to have higher volatile acidity, lower citric acid, higher chlorides, higher density, higher pH, lower sulphates, and lower alcohol content.
Higher-quality wines (ratings 7 and 8) show the opposite characteristics: lower volatile acidity, higher citric acid, lower chlorides, lower density, lower pH, higher sulphates, and higher alcohol content.
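
The extreme ratings may contain far fewer wines than the middle ratings, in which case their group means are less stable than the others. A count per rating shows how much data sits behind each mean. This is a minimal base R sketch: `quality_sample` is an illustrative name holding only the first ten quality values shown in the str() output, standing in for `wine$quality`.

```r
# First ten quality values from the str() output; replace with wine$quality
quality_sample <- c(5L, 5L, 5L, 6L, 5L, 5L, 5L, 7L, 7L, 5L)

# Number of wines at each quality rating
counts <- table(quality_sample)
print(counts)

# The same counts expressed as a share of all wines
shares <- round(prop.table(counts), 2)
print(shares)
```

Ratings with very small counts deserve more caution when their means are compared with the rest.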



Analysis 2: Bar Graph

library(ggplot2)

wine_sum3 <- wine %>%
  group_by(quality) %>%
  summarise(mean_fixed_acidity = mean(fixed.acidity),
            mean_volatile_acidity = mean(volatile.acidity),
            mean_citric_acid = mean(citric.acid),
            mean_residual_sugar = mean(residual.sugar),
            mean_chlorides = mean(chlorides),
            mean_free_sulfur_dioxide = mean(free.sulfur.dioxide),
            mean_total_sulfur_dioxide = mean(total.sulfur.dioxide),
            mean_density = mean(density),
            mean_pH = mean(pH),
            mean_sulphates = mean(sulphates),
            mean_alcohol = mean(alcohol))

wine_sum_long <- wine_sum3 %>%
  tidyr::pivot_longer(cols = -quality, names_to = "Property", values_to = "Mean_Value")

# Create bar plot
ggplot(wine_sum_long, aes(x = quality, y = Mean_Value, fill = Property)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Average Physicochemical Properties by Wine Quality",
       x = "Wine Quality",
       y = "Mean Value") +
  # Set3 offers 12 colors, enough for the 11 properties (Set2 stops at 8)
  scale_fill_brewer(palette = "Set3")

ANALYSIS:
This bar graph is supplementary; the main findings are in Analysis 1.
It shows the average value of each physicochemical property at each quality rating, which makes the differences between high- and low-quality red wines easier to see. Winemakers and customers can use these differences to judge which properties are associated with the wines they prefer.
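
The properties sit on very different scales: average total sulfur dioxide is in the tens, while average chlorides stay below 0.2. On a single shared y-axis the small-valued properties are therefore hard to read. One possible alternative, sketched below, gives each property its own panel and y-axis with `facet_wrap()`. The data frame `means_long` is illustrative: it holds the group means of three properties copied from the summary tibble in Analysis 1, and stands in for `wine_sum_long`.

```r
library(ggplot2)

# Group means for three properties, copied from the Analysis 1 summary
means_long <- data.frame(
  quality    = rep(3:8, times = 3),
  Property   = rep(c("VolAc", "Chlo", "TtlSulfDio"), each = 6),
  Mean_Value = c(
    0.884, 0.694, 0.577, 0.497, 0.404, 0.423,       # VolAc
    0.122, 0.0907, 0.0927, 0.0850, 0.0766, 0.0684,  # Chlo
    24.9, 36.2, 56.5, 40.9, 35.0, 33.4              # TtlSulfDio
  )
)

# One panel per property, each with its own y-axis range
p <- ggplot(means_long, aes(x = factor(quality), y = Mean_Value)) +
  geom_col() +
  facet_wrap(~ Property, scales = "free_y") +
  labs(title = "Average Physicochemical Properties by Wine Quality",
       x = "Wine Quality",
       y = "Mean Value")
print(p)
```

With free y-axes the panels can no longer be compared with each other by bar height, only within a panel across quality ratings.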