This project investigates how different chemical properties affect the quality of the Portuguese “Vinho Verde” red wine. We use a dataset of wine samples to examine the relationship between properties like fixed acidity, volatile acidity, and alcohol content, and the quality ratings given by wine experts. Our goal is to see if these properties can predict wine quality. We use basic statistics, correlation analysis, and multiple linear regression to explore these relationships. The results show significant links between some chemical properties and wine quality, offering useful insights for wine producers to improve their products.
The quality of wine is a complex attribute influenced by various chemical properties. Understanding these influences can help wine producers enhance their products and meet consumer expectations. This project aims to investigate how different chemical properties affect the quality of Portuguese “Vinho Verde” red wine.
The research question guiding this study is: “How do various chemical properties of wine influence its quality, and can we predict wine quality based on these properties?” This question is addressed using a dataset of wine samples, which includes measurements of properties such as fixed acidity, volatile acidity, and alcohol content, along with quality ratings provided by wine experts.
By analyzing the relationships between these chemical properties and the quality ratings, this study seeks to identify significant predictors of wine quality. The findings will provide valuable insights for wine producers, enabling them to make data-driven decisions to improve the quality of their wines. The use of descriptive statistics, correlation analysis, and multiple linear regression ensures a comprehensive examination of the data, allowing for robust conclusions to be drawn.
# load packages (tidyverse provides dplyr, tidyr, ggplot2 and readr)
library(tidyverse)
# load data
wines <- data.frame(read_csv(file = "https://raw.githubusercontent.com/Yedzinovich/FALL2024TIDYVERSE/refs/heads/main/WineQT.csv"))
## Rows: 1143 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (13): fixed acidity, volatile acidity, citric acid, residual sugar, chlo...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# display column names
colnames(wines)
## [1] "fixed.acidity" "volatile.acidity" "citric.acid"
## [4] "residual.sugar" "chlorides" "free.sulfur.dioxide"
## [7] "total.sulfur.dioxide" "density" "pH"
## [10] "sulphates" "alcohol" "quality"
## [13] "Id"
# remove unnecessary columns (id column)
wines <- wines %>% select(-Id)
# rename columns to be more readable (columns without a dot keep their names)
wines <- wines %>% rename(
  fixed_acidity = fixed.acidity,
  volatile_acidity = volatile.acidity,
  citric_acid = citric.acid,
  residual_sugar = residual.sugar,
  free_sulfur_dioxide = free.sulfur.dioxide,
  total_sulfur_dioxide = total.sulfur.dioxide,
  ph = pH
)
head(wines)
## fixed_acidity volatile_acidity citric_acid residual_sugar chlorides
## 1 7.4 0.70 0.00 1.9 0.076
## 2 7.8 0.88 0.00 2.6 0.098
## 3 7.8 0.76 0.04 2.3 0.092
## 4 11.2 0.28 0.56 1.9 0.075
## 5 7.4 0.70 0.00 1.9 0.076
## 6 7.4 0.66 0.00 1.8 0.075
## free_sulfur_dioxide total_sulfur_dioxide density ph sulphates alcohol
## 1 11 34 0.9978 3.51 0.56 9.4
## 2 25 67 0.9968 3.20 0.68 9.8
## 3 15 54 0.9970 3.26 0.65 9.8
## 4 17 60 0.9980 3.16 0.58 9.8
## 5 11 34 0.9978 3.51 0.56 9.4
## 6 13 40 0.9978 3.51 0.56 9.4
## quality
## 1 5
## 2 5
## 3 5
## 4 6
## 5 5
## 6 5
Let’s perform an exploratory data analysis on the wine dataset.
1 - Summary statistics: calculate summary statistics (mean, median, standard deviation etc) for each variable to understand their central tendency and dispersion.
2 - Histograms: create histograms for each variable to visualize their distributions. This helps in identifying the shape of the data, presence of outliers, and skewness.
3 - Box plots: create box plots for each variable to visualize their spread and identify potential outliers.
Note: we exclude the quality variable in this step because we are focusing on visualizing the distributions of the chemical properties of the wine. The quality variable is the dependent variable (the outcome we are trying to predict), and it doesn’t need to be included in the histograms/plots of the independent variables.
summary(wines)
## fixed_acidity volatile_acidity citric_acid residual_sugar
## Min. : 4.600 Min. :0.1200 Min. :0.0000 Min. : 0.900
## 1st Qu.: 7.100 1st Qu.:0.3925 1st Qu.:0.0900 1st Qu.: 1.900
## Median : 7.900 Median :0.5200 Median :0.2500 Median : 2.200
## Mean : 8.311 Mean :0.5313 Mean :0.2684 Mean : 2.532
## 3rd Qu.: 9.100 3rd Qu.:0.6400 3rd Qu.:0.4200 3rd Qu.: 2.600
## Max. :15.900 Max. :1.5800 Max. :1.0000 Max. :15.500
## chlorides free_sulfur_dioxide total_sulfur_dioxide density
## Min. :0.01200 Min. : 1.00 Min. : 6.00 Min. :0.9901
## 1st Qu.:0.07000 1st Qu.: 7.00 1st Qu.: 21.00 1st Qu.:0.9956
## Median :0.07900 Median :13.00 Median : 37.00 Median :0.9967
## Mean :0.08693 Mean :15.62 Mean : 45.91 Mean :0.9967
## 3rd Qu.:0.09000 3rd Qu.:21.00 3rd Qu.: 61.00 3rd Qu.:0.9978
## Max. :0.61100 Max. :68.00 Max. :289.00 Max. :1.0037
## ph sulphates alcohol quality
## Min. :2.740 Min. :0.3300 Min. : 8.40 Min. :3.000
## 1st Qu.:3.205 1st Qu.:0.5500 1st Qu.: 9.50 1st Qu.:5.000
## Median :3.310 Median :0.6200 Median :10.20 Median :6.000
## Mean :3.311 Mean :0.6577 Mean :10.44 Mean :5.657
## 3rd Qu.:3.400 3rd Qu.:0.7300 3rd Qu.:11.10 3rd Qu.:6.000
## Max. :4.010 Max. :2.0000 Max. :14.90 Max. :8.000
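Note that `summary()` reports the quartiles and the mean but not the standard deviation. A minimal sketch of how the standard deviation could be added is shown below. To stay self-contained it uses only three columns of the six rows printed by `head(wines)`; `wines_head` and `spread_stats` are illustrative names, and in the report itself the full `wines` data frame would take their place.

```r
# First six rows of three columns, copied from the head(wines) output above.
# This is only a stand-in for the full `wines` data frame.
wines_head <- data.frame(
  fixed_acidity    = c(7.4, 7.8, 7.8, 11.2, 7.4, 7.4),
  volatile_acidity = c(0.70, 0.88, 0.76, 0.28, 0.70, 0.66),
  alcohol          = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4)
)

# One row per variable with mean, median and standard deviation
spread_stats <- data.frame(
  mean   = sapply(wines_head, mean),
  median = sapply(wines_head, median),
  sd     = sapply(wines_head, sd)
)
spread_stats
```

Applying the same three `sapply()` calls to `wines` would give the dispersion measure for every variable in one table.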
# Histograms for each of the wine chemical properties
wines %>% gather(key = "variable", value = "value", -quality) %>% # gather function is used to reshape the data from wide format to long format.
ggplot(aes(x = value)) +
geom_histogram(bins = 30, fill = "#800020", color = "black") +
facet_wrap(~variable, scales = "free_x") +
theme_minimal() +
labs(title = "Histograms of Wine Chemical Properties", x = "Value", y = "Frequency")
# Box plots for each variable of wine chemical properties
wines %>% gather(key = "variable", value = "value", -quality) %>%
ggplot(aes(x = variable, y = value)) +
geom_boxplot(fill = "#dbf47c", color = "black") +
theme_minimal() +
labs(title = "Box Plots of Wine Chemical Properties", x = "Variable", y = "Value")
This initial exploratory data analysis gives an overview of the dataset and of the basic characteristics and distributions of the wine's chemical properties.
The histograms display the frequency distribution of each chemical property.
The box plots show the spread of the same variables and highlight outliers. Because all variables are drawn on one shared y-axis, those with small numeric ranges (such as density, pH, and chlorides) appear compressed.
Some observations that can be made based on the plots above:
- Skewed Distributions: Many chemical properties are positively skewed (such as chlorides, residual sugar, total sulfur dioxide), indicating that typical wines cluster at low levels of these properties, with a few exceptions.
- Outliers: The box plots reveal significant outliers in some variables, particularly for chlorides, residual sugar, and total sulfur dioxide, which might correspond to specialty wines or experimental samples.
- Tight Ranges: Variables like density and pH show tight clustering, which is consistent with standard winemaking practices.
- Right-Tailed Distributions: These reflect a few instances of atypical wines with unusual chemical properties.
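The visual impressions of skew and outliers can also be checked numerically. The sketch below is not part of the original analysis: `count_outliers()` is a hypothetical helper that applies the usual 1.5 × IQR rule, and the example vector combines the six residual sugar values from `head(wines)` with the maximum (15.5) from `summary(wines)`. The means and medians in the second part are copied from the summary table.

```r
# Hypothetical helper: count values outside the 1.5 * IQR fences
count_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- 1.5 * (q[2] - q[1])
  sum(x < q[1] - fence | x > q[2] + fence, na.rm = TRUE)
}

# Six residual sugar values from head(wines) plus the maximum from summary(wines)
residual_sugar_sample <- c(1.9, 2.6, 2.3, 1.9, 1.9, 1.8, 15.5)
count_outliers(residual_sugar_sample)

# A mean above the median is a simple indicator of right skew.
# Values copied from summary(wines).
skew_check <- data.frame(
  mean   = c(2.532, 0.08693, 45.91),
  median = c(2.200, 0.07900, 37.00),
  row.names = c("residual_sugar", "chlorides", "total_sulfur_dioxide")
)
skew_check$right_skewed <- skew_check$mean > skew_check$median
skew_check
```

On the full data, `sapply(wines, count_outliers)` would give one outlier count per variable.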
In this section, we are going to compute correlation coefficients between quality and other variables to identify any significant relationships.
correlations <- cor(wines)
correlations["quality", ]
## fixed_acidity volatile_acidity citric_acid
## 0.12197010 -0.40739351 0.24082084
## residual_sugar chlorides free_sulfur_dioxide
## 0.02200193 -0.12408453 -0.06325964
## total_sulfur_dioxide density ph
## -0.18333915 -0.17520792 -0.05245303
## sulphates alcohol quality
## 0.25771026 0.48486621 1.00000000
Based on the correlation values between wine quality and the chemical properties, alcohol content (0.48) and sulphates (0.26) are the most positively correlated with wine quality, while volatile acidity (-0.41) has the strongest negative correlation. The other chemical properties show weaker relationships with wine quality.
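To make the ranking explicit, the correlations can be sorted by absolute value. The sketch below copies the values from the output above into a named vector (`quality_cor` is an illustrative name); with the `correlations` matrix available, the same vector could be taken from `correlations["quality", ]` after dropping the `quality` entry.

```r
# Correlations with quality, copied from the output above
quality_cor <- c(
  fixed_acidity = 0.12197010, volatile_acidity = -0.40739351,
  citric_acid = 0.24082084, residual_sugar = 0.02200193,
  chlorides = -0.12408453, free_sulfur_dioxide = -0.06325964,
  total_sulfur_dioxide = -0.18333915, density = -0.17520792,
  ph = -0.05245303, sulphates = 0.25771026, alcohol = 0.48486621
)

# Order by strength of the relationship, keeping the sign for direction
ranked <- quality_cor[order(abs(quality_cor), decreasing = TRUE)]
round(ranked, 2)
```

Sorting by absolute value keeps strong negative relationships, such as volatile acidity, near the top instead of at the bottom of the list.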
From the above correlations, alcohol content has the strongest positive correlation with wine quality. Let’s test the hypothesis that there is a significant positive correlation between alcohol content and the quality of wine.
correlation <- cor(wines$alcohol, wines$quality)
cor_test <- cor.test(wines$alcohol, wines$quality)
cor_test
##
## Pearson's product-moment correlation
##
## data: wines$alcohol and wines$quality
## t = 18.727, df = 1141, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4392310 0.5280056
## sample estimates:
## cor
## 0.4848662
Explanation of Results
- Pearson correlation coefficient: 0.4848662
- P-value: 1.924653e-68
Based on these results, we reject the null hypothesis of no correlation. The p-value is far below 0.05, and the positive coefficient with a 95% confidence interval of 0.44 to 0.53 indicates a significant, moderately strong positive correlation between alcohol content and wine quality.
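As a worked check, the test statistic can be recomputed from the correlation coefficient and the sample size alone. The sketch below uses only values printed in the output above. It also shows the one-sided p-value that matches the directional hypothesis (correlation greater than 0); in `cor.test()` this corresponds to the `alternative = "greater"` argument, which the original call did not use.

```r
# Values taken from the cor.test() output above
r <- 0.4848662   # sample Pearson correlation
n <- 1143        # number of wines (df = n - 2 = 1141)

# Test statistic for H0: true correlation = 0
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value, which cor.test() reports by default
p_two_sided <- 2 * pt(-abs(t_stat), df = n - 2)

# One-sided p-value for the directional hypothesis "correlation > 0"
p_one_sided <- pt(t_stat, df = n - 2, lower.tail = FALSE)

round(t_stat, 3)
```

Up to rounding, this reproduces the t-value of 18.727 reported by `cor.test()`.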
Let’s conduct a t-test to compare the mean alcohol content of high-quality and low-quality wines.
# Define high-quality and low-quality wines
high_quality <- subset(wines, quality >= 7) # why 7? the highest quality rating is 8, so we treat 7 and 8 as high quality
low_quality <- subset(wines, quality < 7)
head(high_quality)
## fixed_acidity volatile_acidity citric_acid residual_sugar chlorides
## 8 7.3 0.65 0.00 1.2 0.065
## 9 7.8 0.58 0.02 2.0 0.073
## 13 8.5 0.28 0.56 1.8 0.092
## 28 8.1 0.38 0.28 2.1 0.066
## 90 8.0 0.59 0.16 1.8 0.065
## 144 9.6 0.32 0.47 1.4 0.056
## free_sulfur_dioxide total_sulfur_dioxide density ph sulphates alcohol
## 8 15 21 0.99460 3.39 0.47 10.0
## 9 9 18 0.99680 3.36 0.57 9.5
## 13 35 103 0.99690 3.30 0.75 10.5
## 28 13 30 0.99680 3.23 0.73 9.7
## 90 3 16 0.99620 3.42 0.92 10.5
## 144 9 24 0.99695 3.22 0.82 10.3
## quality
## 8 7
## 9 7
## 13 7
## 28 7
## 90 7
## 144 7
head(low_quality)
## fixed_acidity volatile_acidity citric_acid residual_sugar chlorides
## 1 7.4 0.70 0.00 1.9 0.076
## 2 7.8 0.88 0.00 2.6 0.098
## 3 7.8 0.76 0.04 2.3 0.092
## 4 11.2 0.28 0.56 1.9 0.075
## 5 7.4 0.70 0.00 1.9 0.076
## 6 7.4 0.66 0.00 1.8 0.075
## free_sulfur_dioxide total_sulfur_dioxide density ph sulphates alcohol
## 1 11 34 0.9978 3.51 0.56 9.4
## 2 25 67 0.9968 3.20 0.68 9.8
## 3 15 54 0.9970 3.26 0.65 9.8
## 4 17 60 0.9980 3.16 0.58 9.8
## 5 11 34 0.9978 3.51 0.56 9.4
## 6 13 40 0.9978 3.51 0.56 9.4
## quality
## 1 5
## 2 5
## 3 5
## 4 6
## 5 5
## 6 5
# T-tests for alcohol
t_test_alcohol <- t.test(high_quality$alcohol, low_quality$alcohol)
t_test_alcohol
##
## Welch Two Sample t-test
##
## data: high_quality$alcohol and low_quality$alcohol
## t = 14.687, df = 210.02, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.09246 1.43119
## sample estimates:
## mean of x mean of y
## 11.52841 10.26658
Sample Estimates:
- Mean alcohol content of high-quality wines: 11.52841
- Mean alcohol content of low-quality wines: 10.26658
The test shows a significant difference in alcohol content between high-quality and low-quality wines. The low p-value (< 2.2e-16) and a 95% confidence interval for the difference in means (1.09 to 1.43) that does not include 0 provide strong evidence that high-quality wines have a higher mean alcohol content.
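The reported confidence interval can be reconstructed from the other numbers in the t-test output, which makes it clearer where it comes from. This is a sketch using only the printed (rounded) values, so small differences in the last digits are expected; the variable names are illustrative.

```r
# Values copied from the Welch t-test output above
mean_high <- 11.52841
mean_low  <- 10.26658
t_stat    <- 14.687
df_welch  <- 210.02

diff_means <- mean_high - mean_low             # point estimate of the difference
std_error  <- diff_means / t_stat              # standard error implied by t
half_width <- qt(0.975, df_welch) * std_error  # 95% margin of error

ci <- c(lower = diff_means - half_width, upper = diff_means + half_width)
round(ci, 3)
```

The interval is centered on the difference in means (about 1.26) and lies entirely above 0, which is why the difference is judged significant.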
Simple Linear Regression
The linear regression model is used to predict wine quality based on alcohol content.
# Simple linear regression for alcohol
model_simple <- lm(quality ~ alcohol, data = wines)
summary(model_simple)
##
## Call:
## lm(formula = quality ~ alcohol, data = wines)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.8224 -0.4000 -0.1725 0.5152 2.5748
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.88701 0.20240 9.323 <2e-16 ***
## alcohol 0.36104 0.01928 18.727 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7051 on 1141 degrees of freedom
## Multiple R-squared: 0.2351, Adjusted R-squared: 0.2344
## F-statistic: 350.7 on 1 and 1141 DF, p-value: < 2.2e-16
A simple linear regression model is fitted to predict wine quality based on alcohol content. The regression results show that for each unit increase in alcohol content, the wine quality is expected to increase by 0.36104 units. The coefficient for alcohol is highly significant, with a t-value of 18.727 and a p-value of less than 2e-16. This indicates that the effect of alcohol content on wine quality is statistically significant. The model’s R-squared value is 0.2351, meaning that approximately 23.51% of the variability in wine quality can be explained by alcohol content alone.
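To illustrate what the fitted line means in practice, the sketch below plugs a few alcohol values into the estimated equation. The coefficients are copied from the output above; `predict_quality()` is a hypothetical helper, and with the fitted model available `predict(model_simple, newdata = ...)` would do the same job. The last lines show that, with a single predictor, R-squared equals the squared Pearson correlation.

```r
# Coefficients copied from summary(model_simple) above
intercept <- 1.88701
slope     <- 0.36104

# Hypothetical helper: fitted quality for a given alcohol content
predict_quality <- function(alcohol) intercept + slope * alcohol

# Fitted quality at three alcohol levels; 9.4 and 10.4 differ by one unit
predict_quality(c(9.4, 10.4, 12.0))

# With one predictor, R-squared is the squared correlation coefficient
r <- 0.4848662
round(r^2, 4)
```

The fitted values for 9.4 and 10.4 differ by exactly the slope, which is the "increase per unit of alcohol" described above.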
While the analysis shows a significant positive correlation between alcohol content and wine quality, it doesn’t mean that higher alcohol content always results in better wine. The correlation coefficient of 0.4848662 indicates a moderate relationship, suggesting that alcohol content is one of several factors influencing wine quality.
Wine quality is determined by a complex interplay of various chemical properties, including acidity, sugar content, tannins, and more. Higher alcohol content can enhance certain flavors and contribute to the overall balance of the wine, but it can also overpower other characteristics if not well-balanced.
Multiple Linear Regression
We build a multiple linear regression model to predict quality from several predictors rather than alcohol alone. Note that citric acid is not included in the model formula.
# Multiple linear regression
model_multiple <- lm(quality ~ fixed_acidity + volatile_acidity + residual_sugar + chlorides + free_sulfur_dioxide + total_sulfur_dioxide + density + ph + sulphates + alcohol, data = wines)
summary(model_multiple)
##
## Call:
## lm(formula = quality ~ fixed_acidity + volatile_acidity + residual_sugar +
## chlorides + free_sulfur_dioxide + total_sulfur_dioxide +
## density + ph + sulphates + alcohol, data = wines)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.45636 -0.36756 -0.04732 0.44086 2.00042
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.212e+01 2.476e+01 0.894 0.371688
## fixed_acidity 1.566e-02 2.869e-02 0.546 0.585157
## volatile_acidity -1.072e+00 1.193e-01 -8.986 < 2e-16 ***
## residual_sugar 1.316e-02 1.845e-02 0.713 0.475870
## chlorides -1.824e+00 4.734e-01 -3.854 0.000123 ***
## free_sulfur_dioxide 2.632e-03 2.529e-03 1.041 0.298244
## total_sulfur_dioxide -2.946e-03 8.114e-04 -3.631 0.000295 ***
## density -1.800e+01 2.527e+01 -0.712 0.476320
## ph -3.979e-01 2.224e-01 -1.789 0.073910 .
## sulphates 8.720e-01 1.334e-01 6.536 9.56e-11 ***
## alcohol 2.759e-01 3.075e-02 8.972 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6404 on 1132 degrees of freedom
## Multiple R-squared: 0.3739, Adjusted R-squared: 0.3684
## F-statistic: 67.61 on 10 and 1132 DF, p-value: < 2.2e-16
The multiple regression analysis reveals several key factors that significantly contribute to wine quality. Among the chemical properties analyzed, volatile acidity, chlorides, total sulfur dioxide, sulphates, and alcohol content emerged as significant predictors. Specifically, volatile acidity has a negative impact on wine quality, with an estimate of -1.072 and a highly significant p-value of less than 2e-16. Similarly, chlorides negatively affect wine quality, with an estimate of -1.824 and a p-value of 0.000123. Total sulfur dioxide also shows a negative impact, with an estimate of -0.002946 and a p-value of 0.000295.
On the positive side, sulphates and alcohol content significantly enhance wine quality. Sulphates have an estimate of 0.8720 and a p-value of 9.56e-11, indicating a strong positive effect. Alcohol content, with an estimate of 0.2759 and a p-value of less than 2e-16, also significantly improves wine quality. These results demonstrate that higher levels of sulphates and alcohol are associated with better quality wines.
Conversely, some factors did not show a significant impact on wine quality in this model. Fixed acidity (estimate: 0.01566, p-value: 0.585157), residual sugar (estimate: 0.01316, p-value: 0.475870), free sulfur dioxide (estimate: 0.002632, p-value: 0.298244), density (estimate: -18.00, p-value: 0.476320), and pH (estimate: -0.3979, p-value: 0.073910) were not statistically significant predictors. This suggests that variations in these properties do not substantially influence the overall quality of wine.
The model explains approximately 37.39% of the variability in wine quality (Multiple R-squared: 0.3739), highlighting that while these chemical properties are important, other factors not included in the model also play a crucial role in determining wine quality. This analysis underscores the complexity of wine quality assessment, where multiple interacting factors contribute to the final evaluation.
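A worked example may help connect the coefficient table to an actual prediction. The sketch below applies the rounded coefficients from the output above to the first wine shown by `head(wines)`, whose recorded quality is 5. Because the printed coefficients are rounded, the result is only approximate. The second part recomputes the adjusted R-squared from the R-squared, the number of observations (1143) and the number of predictors (10). All variable names are illustrative.

```r
# Rounded coefficients copied from summary(model_multiple) above
intercept <- 22.12
coefs <- c(
  fixed_acidity = 0.01566, volatile_acidity = -1.072, residual_sugar = 0.01316,
  chlorides = -1.824, free_sulfur_dioxide = 0.002632,
  total_sulfur_dioxide = -0.002946, density = -18.00, ph = -0.3979,
  sulphates = 0.8720, alcohol = 0.2759
)

# First wine from head(wines); its recorded quality is 5
wine_1 <- c(
  fixed_acidity = 7.4, volatile_acidity = 0.70, residual_sugar = 1.9,
  chlorides = 0.076, free_sulfur_dioxide = 11, total_sulfur_dioxide = 34,
  density = 0.9978, ph = 3.51, sulphates = 0.56, alcohol = 9.4
)

# Fitted value = intercept + sum of (coefficient * measurement)
predicted <- intercept + sum(coefs[names(wine_1)] * wine_1)
round(predicted, 2)

# Adjusted R-squared from R-squared, n observations and p predictors
r2 <- 0.3739
n  <- 1143
p  <- 10
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 4)
```

With the fitted model object available, `predict(model_multiple, newdata = wines[1, ])` would use the unrounded coefficients instead.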
Let’s complement the multiple regression with an analysis of variance (ANOVA) table for the same model. The table shows how much of the variation in wine quality is attributed to each chemical property as it is added to the model.
Note that this is not a one-way ANOVA, which would compare group means across the levels of a single categorical factor. Here aov() is fitted with ten continuous predictors (fixed_acidity, volatile_acidity, residual_sugar, chlorides, free_sulfur_dioxide, total_sulfur_dioxide, density, ph, sulphates, and alcohol) and no interaction terms, so it produces a sequential (Type I) sum-of-squares table: each row tests a predictor after accounting only for the predictors listed before it, and the F-values therefore depend on the order of the terms.
# Perform ANOVA
anova_model <- aov(quality ~ fixed_acidity + volatile_acidity + residual_sugar +
chlorides + free_sulfur_dioxide + total_sulfur_dioxide +
density + ph + sulphates + alcohol, data = wines)
summary(anova_model)
## Df Sum Sq Mean Sq F value Pr(>F)
## fixed_acidity 1 11.0 11.03 26.898 2.54e-07 ***
## volatile_acidity 1 112.4 112.36 273.946 < 2e-16 ***
## residual_sugar 1 0.2 0.20 0.481 0.48814
## chlorides 1 8.3 8.28 20.190 7.73e-06 ***
## free_sulfur_dioxide 1 3.0 2.96 7.208 0.00736 **
## total_sulfur_dioxide 1 16.0 16.04 39.097 5.71e-10 ***
## density 1 49.0 49.02 119.521 < 2e-16 ***
## ph 1 6.4 6.44 15.705 7.87e-05 ***
## sulphates 1 38.0 37.96 92.543 < 2e-16 ***
## alcohol 1 33.0 33.01 80.490 < 2e-16 ***
## Residuals 1132 464.3 0.41
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The ANOVA results indicate that several chemical properties account for a significant share of the variation in wine quality. The F-values and corresponding p-values show the strength and significance of each term; they do not show direction, which is taken from the signs of the regression coefficients. Volatile acidity has the highest F-value (273.946) and a p-value of less than 2e-16, and its regression coefficient is negative. Alcohol content (F-value: 80.490, p-value: < 2e-16) and sulphates (F-value: 92.543, p-value: < 2e-16) are also highly significant, with positive coefficients. Other significant terms include chlorides (F-value: 20.190, p-value: 7.73e-06), total sulfur dioxide (F-value: 39.097, p-value: 5.71e-10), density (F-value: 119.521, p-value: < 2e-16), and pH (F-value: 15.705, p-value: 7.87e-05). Fixed acidity (F-value: 26.898, p-value: 2.54e-07) and free sulfur dioxide (F-value: 7.208, p-value: 0.00736) are significant as well. Residual sugar is not (F-value: 0.481, p-value: 0.48814). Fixed acidity, free sulfur dioxide, density, and pH are significant here but not in the regression output; this is expected, because each ANOVA row adjusts only for the terms entered before it, whereas each regression t-test adjusts for all other predictors.
These results highlight the complex interplay of various chemical properties in determining wine quality, with volatile acidity, alcohol, and sulphates being particularly influential.
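Two quick checks, using only numbers printed above, show how this ANOVA table relates to the regression output. The sketch assumes the rounded values from the two tables, and the variable names are illustrative. First, the sums of squares reproduce the regression R-squared. Second, the F-value matches the squared regression t-value only for the last term (alcohol), which is the one term that is adjusted for all the others in both tables.

```r
# Sums of squares copied from the aov() table above (rounded to one decimal)
ss_terms <- c(
  fixed_acidity = 11.0, volatile_acidity = 112.4, residual_sugar = 0.2,
  chlorides = 8.3, free_sulfur_dioxide = 3.0, total_sulfur_dioxide = 16.0,
  density = 49.0, ph = 6.4, sulphates = 38.0, alcohol = 33.0
)
ss_residual <- 464.3

# Share of the total sum of squares explained by the model;
# this should be close to the Multiple R-squared of 0.3739
r2_from_anova <- sum(ss_terms) / (sum(ss_terms) + ss_residual)
round(r2_from_anova, 4)

# Last term: squared t-value from summary(model_multiple) vs. F = 80.490
t_alcohol <- 8.972
f_alcohol_from_t <- t_alcohol^2
round(f_alcohol_from_t, 2)

# Earlier term: squared t-value is far from the sequential F = 26.898
t_fixed <- 0.546
f_fixed_from_t <- t_fixed^2
round(f_fixed_from_t, 2)
```

Reordering the terms in the `aov()` formula would change the F-values of the individual rows, while the residual row and the overall R-squared would stay the same.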
Correlation Analysis: Measures the strength and direction of the linear relationship between two variables, without considering the influence of other variables.
Multiple Regression: Evaluates the relationship between one dependent variable (wine quality) and multiple independent variables (sulphates, alcohol, acidity, etc.). It provides a coefficient for each independent variable, indicating its individual association with the dependent variable while controlling for the other variables. It is well suited to predicting the dependent variable and understanding the relative importance of each predictor.
ANOVA (Analysis of Variance): In its classic form, ANOVA compares the means of different groups to determine whether there are statistically significant differences between them. In the context of multiple linear regression, the ANOVA table helps assess how much each predictor (sulphates, alcohol, acidity, etc.) contributes to explaining the variability in the dependent variable (wine quality).
- For a quick overview of the relationship between two variables: trust correlation analysis.
- For prediction and detailed insights: trust multiple regression.
- For comparing group means: trust ANOVA.
In our case, where the goal is to understand and predict wine quality based on various chemical properties, multiple regression is the more appropriate and reliable method. It allows us to see the individual contributions of each variable and make informed predictions about wine quality.
The analysis of the wine dataset reveals that various chemical properties significantly influence wine quality. Through correlation analysis, multiple regression, and ANOVA, we identified key factors that contribute to wine quality.
Key Factors:
Sulphates: Sulphates positively influence wine quality. The multiple regression analysis indicated a significant positive effect (Estimate: 0.8720, p-value: 9.56e-11), and the ANOVA results supported this with an F-value of 92.543 and a p-value of < 2e-16.
Alcohol Content: There is a significant positive correlation between alcohol content and wine quality. The Pearson correlation coefficient is 0.4848662, and the p-value is 1.924653e-68, indicating that higher alcohol content is associated with better wine quality. This was further supported by the multiple regression analysis, where alcohol content had a significant positive effect (Estimate: 0.2759, p-value: < 2e-16).
Volatile Acidity: This factor has a strong negative impact on wine quality. The multiple regression analysis showed a significant negative effect (Estimate: -1.072, p-value: < 2e-16), and the ANOVA results confirmed its importance with an F-value of 273.946 and a p-value of < 2e-16.
Chlorides and Total Sulfur Dioxide: Both have significant negative effects on wine quality. Chlorides (Estimate: -1.824, p-value: 0.000123) and total sulfur dioxide (Estimate: -0.002946, p-value: 0.000295) were significant in the multiple regression analysis, and the ANOVA results confirmed their impact with F-values of 20.190 and 39.097, respectively.
Other Factors: Fixed acidity, free sulfur dioxide, density, and pH showed significant effects in the ANOVA results but were not significant in the multiple regression, so their contribution is less clear once the other predictors are accounted for.
Predictive Power: The multiple regression model explained approximately 37.39% of the variability in wine quality (R-squared: 0.3739), indicating that while these chemical properties are important predictors, other factors not included in the model also play a crucial role. The ANOVA results further validated the significance of these properties.
Positive Influences on Wine Quality:
- Higher alcohol content
- Higher sulphates
Negative Influences on Wine Quality:
- Higher volatile acidity
- Higher chlorides
- Higher total sulfur dioxide
Conclusion: These key chemical properties significantly influence wine quality. Higher alcohol content and sulphates are linked to better quality, while higher volatile acidity, chlorides, and total sulfur dioxide have a negative impact. This analysis suggests that wine quality can be predicted based on these chemical properties, though other factors also play a role. These insights are valuable for winemakers aiming to optimize wine quality through careful management of its chemical composition.
The data for this study was not self-collected. It is sourced from Kaggle, specifically from the dataset titled “Wine Quality Dataset” provided by Yasser H. You can access the dataset using the following link: https://www.kaggle.com/datasets/yasserh/wine-quality-dataset/data