This project uses a dataset of 1599 observations on 13 variables describing the chemical composition of various red wines.
Two variables will be compared: citric acid content and volatile acidity (i.e., acetic acid content). These variables correspond to different flavors in red wine and are measured in \(g/dm^3\).
Citric acid content is associated with increased flavor and freshness in wine. Acetic acid content is associated with a sour, vinegary taste when too much is present in the wine.
The purpose of this research is to determine if there is a relationship between citric acid content (a measure of flavor and freshness) and acetic acid content (associated with sourness at high levels).
If the two variables are related, then the level of one acid may be used to estimate the level of the other, which will indicate whether freshness and flavor in red wine are associated with a vinegary or sour flavor (acetic acid) in the wine. This research is useful because one could then associate the flavor and freshness of a wine with a particular degree of sourness, depending on both the direction and the strength of the relationship.
library(readxl)
wineQualityReds <- read_xlsx("C:/Users/Admin/Downloads/wineQualityReds.xlsx")
## New names:
## * `` -> ...1
View(wineQualityReds)
attach(wineQualityReds)
summary(citric.acid)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.090 0.260 0.271 0.420 1.000
sd(citric.acid)
## [1] 0.1948011
The mean citric acid level of the wines is \(0.2710 g/dm^3\) with a standard deviation of \(0.1948 g/dm^3\). The median citric acid content is \(0.2600 g/dm^3\).
summary(volatile.acidity)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1200 0.3900 0.5200 0.5278 0.6400 1.5800
sd(volatile.acidity)
## [1] 0.1790597
The mean volatile acidity level of the wines is \(0.5278 g/dm^3\) with a standard deviation of \(0.1791 g/dm^3\). The median volatile acidity content is \(0.5200 g/dm^3\).
plot(citric.acid, volatile.acidity, main = "Citric Acid vs Acetic Acid Content in Red Wine",
     xlab = "Citric Acid (g/dm^3)", ylab = "Volatile Acidity (g/dm^3)", pch = 20)
According to the scatterplot, as citric acid content increases, volatile acidity (acetic acid content) tends to decrease. To investigate this relationship further, we fit and test a simple linear regression model.
one_model <- lm(citric.acid ~ volatile.acidity, data=wineQualityReds)
one_coef <- coefficients(one_model)
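The resulting regression model is \[ \hat{y} = 0.59 - 0.60x, \] where \(\hat{y}\) is the predicted citric acid content and \(x\) is the volatile acidity, both in \(g/dm^3\).
For a \(1 g/dm^3\) increase in volatile acidity, we expect the citric acid level to decrease by \(0.60 g/dm^3\).
If volatile acidity (i.e., acetic acid content) were \(0 g/dm^3\), we would expect the citric acid level to be \(0.59 g/dm^3\). The smallest observed volatile acidity is \(0.12 g/dm^3\), so this intercept is an extrapolation beyond the data.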
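As a check on the fitted coefficients, the least-squares estimates for a one-predictor model can be recovered from summary statistics: the slope is \(b_1 = r \cdot s_y / s_x\) and the intercept is \(b_0 = \bar{y} - b_1\bar{x}\). The sketch below plugs in the means, standard deviations, and squared correlation reported in this analysis. The variable names are illustrative, and the sign of \(r\) is assumed negative to match the downward trend in the scatterplot.

```r
# Summary statistics copied from the output in this report
mean_citric   <- 0.271      # mean of citric.acid (response, y)
sd_citric     <- 0.1948011  # sd of citric.acid
mean_volatile <- 0.5278     # mean of volatile.acidity (predictor, x)
sd_volatile   <- 0.1790597  # sd of volatile.acidity
r_squared     <- 0.3052515  # cor(volatile.acidity, citric.acid)^2

# Assumption: r is negative, matching the downward trend in the scatterplot
r <- -sqrt(r_squared)

# Least-squares estimates for simple linear regression
slope     <- r * sd_citric / sd_volatile
intercept <- mean_citric - slope * mean_volatile

print(round(c(intercept = intercept, slope = slope), 2))
```

Rounded to two decimals, this gives an intercept of 0.59 and a slope of -0.60, in agreement with the fitted model.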
one_ci <- confint(one_model, level=0.95)
The 95% confidence interval for the slope (the coefficient of volatile acidity) is (-0.65, -0.56).
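The interval returned by `confint()` has the form \(b_1 \pm t_{0.975,\,n-2} \cdot SE(b_1)\). A minimal sketch of that calculation from the summary statistics in this report follows. It relies on the simple-linear-regression identity \(SE(b_1) = (s_y/s_x)\sqrt{(1-r^2)/(n-2)}\); the variable names are illustrative, and the correlation is assumed negative as in the scatterplot.

```r
n           <- 1599       # number of wines
sd_citric   <- 0.1948011  # sd of citric.acid (response)
sd_volatile <- 0.1790597  # sd of volatile.acidity (predictor)
r_squared   <- 0.3052515  # squared correlation from this report

# Assumption: negative correlation, as seen in the scatterplot
slope <- -sqrt(r_squared) * sd_citric / sd_volatile

# Standard error of the slope in simple linear regression
se_slope <- (sd_citric / sd_volatile) * sqrt((1 - r_squared) / (n - 2))

# 95% interval using the t distribution with n - 2 degrees of freedom
t_crit <- qt(0.975, df = n - 2)
ci <- slope + c(-1, 1) * t_crit * se_slope

print(round(ci, 2))
```

Rounded to two decimals, the bounds match the interval (-0.65, -0.56) obtained from `confint()`.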
one_summary <- summary(one_model)
# Coefficient table: estimate, standard error, t value, and p-value for each term
one_t <- one_summary$coefficients
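# Format a p-value for reporting; values that round to 0 at four decimals print as "p < 0.0001"
p.value.string <- function(p.value) {
  p.value <- round(p.value, digits = 4)
  if (p.value == 0) {
    return("p < 0.0001")
  } else {
    return(paste0("p = ", format(p.value, scientific = FALSE)))
  }
}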
Hypotheses
\(H_0: \ \beta_1 = 0\)
\(H_1: \ \beta_1 \ne 0\)
Test Statistic
\(t_0 = -26.49\).
p-value
\(p < 0.0001\).
Rejection Region
Reject \(H_0\) if \(p < \alpha\), where \(\alpha=0.05\).
Conclusion and Interpretation
Reject \(H_0\). There is sufficient evidence to suggest that the slope is nonzero, i.e., that volatile acidity is a significant linear predictor of citric acid content.
cor(volatile.acidity, citric.acid)^2
## [1] 0.3052515
The \(R^2\) value for the model is 0.3053, which means that approximately 30.53% of the variation in citric acid content is explained by its linear relationship with volatile acidity.
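With a single predictor, \(R^2\) equals the squared correlation, which is why `cor()` can be used to obtain it, and the slope's test statistic can be written as \(t_0 = r\sqrt{n-2}/\sqrt{1-r^2}\). The sketch below uses that identity to connect the reported \(R^2\) to the test statistic and p-value. The variable names are illustrative, and the sign of \(r\) is assumed negative as in the scatterplot.

```r
n         <- 1599       # number of wines
r_squared <- 0.3052515  # cor(volatile.acidity, citric.acid)^2 from this report

# Assumption: negative correlation, as seen in the scatterplot
r <- -sqrt(r_squared)

# t statistic for H0: beta_1 = 0 in simple linear regression
t0 <- r * sqrt(n - 2) / sqrt(1 - r_squared)

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value <- 2 * pt(-abs(t0), df = n - 2)

print(round(t0, 2))
print(p_value < 0.0001)
```

The result agrees with the reported \(t_0 = -26.49\) and \(p < 0.0001\).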
library(ggplot2)
# Volatile acidity on x and citric acid on y, so the fitted line matches one_model
print(ggplot(wineQualityReds, aes(x = volatile.acidity, y = citric.acid)) +
  geom_point() +
  stat_smooth(method = "lm") +
  ggtitle("Citric Acid and Acetic Acid Contents in Red Wine") +
  xlab("Acetic Acid Content (g/dm^3)") +
  ylab("Citric Acid Content (g/dm^3)") +
  theme_bw())
## `geom_smooth()` using formula 'y ~ x'
This visualization depicts the regression line and confidence bands for the model \[ \hat{y} = 0.59 - 0.60x. \]
According to the hypothesis test, the slope of the line is nonzero and indicates a negative relationship between citric acid level and volatile acidity (acetic acid level). Therefore, wines with a fresher flavor (higher citric acid) tend to taste less sour or vinegary, although the model explains only about 31% of the variation, so predictions for an individual wine will be imprecise. This research may assist both sommeliers and wineries in cultivating pleasant flavor combinations in red wine.