Rendered output of `summary(df$chlorides)`:

| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 0.01200 | 0.07000 | 0.07900 | 0.08747 | 0.09000 | 0.61100 |

Rendered output of `summary(df$alcohol)`:

| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 8.40 | 9.50 | 10.20 | 10.42 | 11.10 | 14.90 |

Summary of the filtered data, `summary(df_rmout)`, as rendered:

| Variable | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| fixed.acidity | 4.600 | 7.100 | 7.900 | 8.321 | 9.200 | 15.900 |
| volatile.acidity | 0.120 | 0.390 | 0.520 | 0.528 | 0.640 | 1.580 |
| citric.acid | 0.000 | 0.090 | 0.260 | 0.271 | 0.420 | 1.000 |
| residual.sugar | 0.900 | 1.900 | 2.200 | 2.532 | 2.600 | 15.500 |
| chlorides | 0.01200 | 0.07000 | 0.07900 | 0.08748 | 0.09000 | 0.61100 |
| free.sulfur.dioxide | 1.00 | 7.00 | 14.00 | 15.84 | 21.00 | 68.00 |
| total.sulfur.dioxide | 6.0 | 22.0 | 38.0 | 46.4 | 62.0 | 289.0 |
| density | 0.9901 | 0.9956 | 0.9968 | 0.9967 | 0.9978 | 1.0037 |
| pH | 2.740 | 3.210 | 3.310 | 3.311 | 3.400 | 4.010 |
| sulphates | 0.3300 | 0.5500 | 0.6200 | 0.6582 | 0.7300 | 2.0000 |
| alcohol | 8.40 | 9.50 | 10.20 | 10.42 | 11.10 | 14.90 |
| quality | 3.000 | 5.000 | 6.000 | 5.636 | 6.000 | 8.000 |

Alcohol quantiles by wine class, as rendered:

| Class | 0% | 25% | 50% | 75% | 100% |
|---|---|---|---|---|---|
| average | 8.4 | 9.6 | 10.0 | 11.0 | 13.1 |
| good | 8.4 | 9.5 | 10.0 | 10.9 | 14.9 |
| great | 9.2 | 10.8 | 11.6 | 12.2 | 14.0 |

Correlation between density and residual sugar, `cor(df_rmout$density, df_rmout$residual.sugar)`: 0.3608741
---
title: "RED WINE DATASET ANALYSIS"
output:
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
    social: menu
    theme: simplex
    storyboard: TRUE
    source_code: embed
---
```{r setup, include=FALSE}
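# Packages used throughout the dashboard.
libraries <- c("dplyr", "lattice", "ggplot2", "DT", "flexdashboard", "FactoMineR", "GGally")
# library() stops with an error if a package is missing, whereas require() only warns.
invisible(lapply(libraries, library, character.only = TRUE))
# Read the red wine data (1599 rows, 12 columns).
df <- read.csv("winequality_red.csv")
# attach() is kept because a later chunk refers to a column by its bare name.
attach(df)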
```
# INTRO {.tabset}
This analysis explores the red wine dataset in search of deeper insights. The report also aims to draw conclusions about which factors affect the quality of wine.
The dataset has 12 variables: 7 describing ingredients, 3 describing physical properties, 1 for alcohol, and 1 for quality.
The dataset has `1599` rows and `12` columns. The main libraries used are from the tidyverse, chiefly ggplot2. They are used for visualization, which helps us reach conclusions.
The data is in CSV format. A comma-separated values (CSV) file stores tabular data (numbers and text) in plain text. Each line of the file is a data record, and each record consists of one or more fields separated by commas. The columns are as follows:

1. `fixed.acidity`
2. `volatile.acidity`
3. `citric.acid`
4. `residual.sugar`
5. `chlorides`
6. `free.sulfur.dioxide`
7. `total.sulfur.dioxide`
8. `density`
9. `pH`
10. `sulphates`
11. `alcohol`
12. `quality`

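As a small illustration of the CSV layout described above, the sketch below parses a three-row toy CSV from an in-memory string. The values are made up for illustration and are not taken from the wine data; only the three column names are borrowed from the list above.

```r
# Toy CSV text: a header line followed by one record per line (made-up values).
csv_text <- "fixed.acidity,alcohol,quality
7.0,9.5,5
8.1,10.2,6
6.9,11.8,7"

# read.csv() accepts a `text` argument, so no file on disk is needed here.
toy <- read.csv(text = csv_text)

dim(toy)    # number of rows, then number of columns
names(toy)  # column names taken from the header line
```

Run against the real file, `dim(df)` should report the 1599 rows and 12 columns stated above.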
# MENU
### Structure
```{r}
str(df)
```
### SUMMARY
```{r}
summary(df)
```
# UNIVARIATE PLOTS
### PLOTS FOR EACH FACTOR
```{r}
ggplot(df,aes(quality))+geom_bar()+ggtitle("Quality counts graph")
```
Column {data-width=650}
--------------------------
```{r}
ggplot(df,aes(fixed.acidity))+geom_histogram(color="black",fill="white",binwidth = 0.1)+ggtitle("Fixed Acidity Value count")
```
Column {data-width=650}
-----------------------------------------------------------------------
```{r}
ggplot(df,aes(volatile.acidity))+geom_histogram(color="black",fill="blue",binwidth = 0.1)+ggtitle("Volatile Acidity count graph")
```
Column {data-width=650}
-----------------------------------------------------------------------
```{r}
ggplot(df,aes(citric.acid))+geom_histogram(color="red",fill="white",bins = 30)+ggtitle("Citric Acid Count graph")
```
Column {data-width=650}
-----------------------------------------------------------------------
```{r}
ggplot(df,aes(residual.sugar))+geom_histogram(color="red",fill="blue",binwidth = 0.5)+ggtitle("Residual Sugar Count")
```
Column {data-width=650}
-----------------------------------------------------------------------
```{r}
ggplot(df,aes(chlorides))+geom_histogram(color="black",fill="yellow",binwidth = 0.01)+ggtitle("Chloride Count")
```
Column {data-width=650}
-----------------------------------------------------------------------
```{r}
summary(df$chlorides)
```
```{r}
ggplot(df , aes(alcohol)) + geom_histogram(color="brown",fill="orange",binwidth = 0.1)+ggtitle("Alcohol count")
```
Column {data-width=650}
-----------------------------------------------------------------------
```{r}
summary(df$alcohol)
```
```{r}
ggplot(df,aes(free.sulfur.dioxide))+geom_histogram(color="black",fill="blue",binwidth = 1)+ggtitle("Free Sulfur Dioxide count")
```
Column {data-width=650}
-----------------------------------------------------------------------
```{r}
ggplot(df,aes(total.sulfur.dioxide))+geom_histogram(color="black",fill="white",binwidth = 2)+ggtitle("Total Sulfur dioxide count")
```
Column {data-width=650}
-----------------------------------------------------------------------
```{r}
ggplot(df,aes(pH))+ggtitle("pH count ")+geom_histogram(color="black",fill="white",binwidth = 0.05)
```
Column {data-width=650}
-----------------------------------------------------------------------
```{r}
ggplot(df,aes(density))+ggtitle("Density Count")+geom_histogram(color="black",fill="white",binwidth = 0.001)
```
Column {data-width=650}
-----------------------------------------------------------------------
```{r}
ggplot(df,aes(sulphates))+geom_histogram(color="black",fill="white",binwidth = 0.01 )+ggtitle("Sulphates count")
```
# OUTLIERS
### REMOVING THE OUTLIERS FROM PARAMETERS FOR BETTER ANALYSIS
```{r}
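# Filter cumulatively: each step starts from the previous result, so all four
# thresholds apply (filtering df each time would keep only the last one).
df_rmout <- subset(df, residual.sugar < 15)
df_rmout <- subset(df_rmout, total.sulfur.dioxide < 289)
df_rmout <- subset(df_rmout, density < 1.0037)
df_rmout <- subset(df_rmout, free.sulfur.dioxide < 72)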
```
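A minimal sketch of applying several thresholds at once, assuming the intent is for every filter to hold for each retained row. The toy data frame is made up; only the thresholds come from the chunk above.

```r
# Made-up rows: row 1 passes every threshold, rows 2 to 5 each violate exactly one.
toy <- data.frame(
  residual.sugar       = c(2.0, 15.5, 2.2, 1.9, 2.5),
  total.sulfur.dioxide = c(40, 38, 289, 60, 35),
  density              = c(0.9968, 0.9970, 0.9960, 1.0037, 0.9955),
  free.sulfur.dioxide  = c(14, 10, 20, 12, 72)
)

# Combining the conditions with & keeps only rows that pass every filter.
toy_rmout <- subset(
  toy,
  residual.sugar < 15 &
    total.sulfur.dioxide < 289 &
    density < 1.0037 &
    free.sulfur.dioxide < 72
)

nrow(toy_rmout)  # rows remaining after all four filters
```

A single combined condition makes it harder to drop a filter by accident than a sequence of separate `subset()` calls.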
```{r}
summary(df_rmout)
```
# UNIVARIATE PLOTS AFTER OUTLIER REMOVAL
```{r}
ggplot(df_rmout,aes(quality))+geom_bar()+ggtitle("Quality counts graph")
```
Column {data-width=650}
-----------------------------------------------------------------------
```{r}
ggplot(df_rmout,aes(fixed.acidity))+geom_histogram(color="yellow",fill="red",binwidth = 0.1)+ggtitle("Fixed Acidity Value count")
```
Column {data-width=650}
-----------------------------------------------------------------------
```{r}
ggplot(df_rmout,aes(volatile.acidity))+geom_histogram(color="black",fill="white",binwidth = 0.1)+ggtitle("Volatile Acidity count graph")
```
Column {data-width=650}
-----------------------------------------------------------------------
```{r}
ggplot(df_rmout,aes(citric.acid))+geom_histogram(color="black",fill="white",bins = 30)+ggtitle("Citric Acid Count graph")
```
Column {data-width=650}
-----------------------------------------------------------------------
```{r}
ggplot(df_rmout,aes(residual.sugar))+geom_histogram(color="black",fill="white",binwidth = 0.5)+ggtitle("Residual Sugar Count")
```
Column {data-width=650}
-----------------------------------------------------------------------
```{r}
ggplot(df_rmout , aes(alcohol)) + geom_histogram(color="black",fill="white",binwidth = 0.1)+ggtitle("Alcohol count")
```
# WINE QUALITY
### The ratio is free sulfur dioxide / total sulfur dioxide
```{r}
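# Bin quality into three classes: (0,4] average, (4,6] good, (6,9] great.
df_rmout$class <- cut(df_rmout$quality, breaks = c(0, 4, 6, 9), labels = c("average", "good", "great"))
# Share of the total sulfur dioxide that is free.
df_rmout$ratio <- df_rmout$free.sulfur.dioxide / df_rmout$total.sulfur.dioxide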
```
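The binning and ratio steps above can be checked on a few hand-picked values. This is a sketch on made-up inputs, not on the wine data. By default `cut()` uses right-closed intervals, so the breaks `c(0, 4, 6, 9)` define the bins (0,4], (4,6] and (6,9].

```r
# One made-up quality score for each value from 3 to 8.
quality <- c(3, 4, 5, 6, 7, 8)

# Same breaks and labels as in the chunk above.
class <- cut(quality, breaks = c(0, 4, 6, 9), labels = c("average", "good", "great"))

# Scores 3-4 fall in "average", 5-6 in "good", 7-8 in "great".
table(class)

# Ratio of free to total sulfur dioxide for two made-up wines.
free  <- c(10, 15)
total <- c(40, 60)
ratio <- free / total  # both wines have a quarter of their SO2 free
ratio
```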
```{r}
ggplot(df_rmout,aes(class))+geom_bar(col=c("yellow","red","green"),fill=c("blue","green","red"))
```
# BIVARIATE
### CORRELATION MATRIX AND ALCOHOL VS CLASS
```{r}
ggcorr(df_rmout)
```
Column {data-width=650}
-----------------------------------------------------------------------
```{r}
ggplot(df_rmout,aes(class,alcohol))+geom_boxplot()
```
Column {data-width=650}
-----------------------------------------------------------------------
```{r}
# after_stat(density) replaces the deprecated ..density.. notation (ggplot2 >= 3.3.0).
ggplot(df_rmout,aes(alcohol,fill=class))+geom_histogram(aes(y = after_stat(density)),binwidth = 0.1)+geom_density(alpha=0.2)
```
Column {data-width=650}
-----------------------------------------------------------------------
```{r}
ggplot(df_rmout,aes(alcohol,fill=class))+geom_density(alpha=0.2)
```
Column {data-width=650}
-----------------------------------------------------------------------
```{r}
print(quantile(df_rmout[df_rmout$class=="average",]$alcohol))
print(quantile(df_rmout[df_rmout$class=="good",]$alcohol))
print(quantile(df_rmout[df_rmout$class=="great",]$alcohol))
```
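The three per-class `quantile()` calls above can also be done in one step. The sketch below shows one possible approach with base R's `split()` and `sapply()`. The toy data frame is made up for illustration and is not the wine data.

```r
# Made-up alcohol values for two classes.
toy <- data.frame(
  class   = factor(c("good", "good", "good", "great", "great", "great")),
  alcohol = c(9.0, 10.0, 11.0, 11.0, 12.0, 13.0)
)

# split() groups alcohol by class; sapply() applies quantile() to each group
# and binds the results into a matrix (rows: quantiles, columns: classes).
q_by_class <- sapply(split(toy$alcohol, toy$class), quantile)

q_by_class
q_by_class["50%", "great"]  # median alcohol of the toy "great" group
```

Putting the classes side by side in a matrix makes their quantiles easier to compare than three separate printouts.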
### Free Sulfur Dioxide vs class of wine
```{r}
ggplot(df_rmout,aes(class,free.sulfur.dioxide))+geom_boxplot()
```
Column {data-width=650}
-----------------------------------------------------------------------
```{r}
ggplot(df_rmout,aes(residual.sugar,fill = class))+geom_density(alpha =0.5)
```
Column {data-width=650}
-----------------------------------------------------------------------
### DENSITY VS RESIDUAL SUGAR
```{r}
ggplot(df_rmout,aes(residual.sugar,fill = class))+geom_density(alpha = 0.5)
```
Column {data-width=650}
-----------------------------------------------------------------------
```{r}
ggplot(df_rmout,aes(class , residual.sugar))+geom_boxplot()
```
Column {data-width=650}
-----------------------------------------------------------------------
```{r}
cor(df_rmout$density,df_rmout$residual.sugar)
```
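By default, `cor()` returns Pearson's correlation coefficient. As a sketch of what that number measures, the block below computes it by hand on two short made-up vectors and compares the result with `cor()`. The vectors are illustrative only.

```r
# Two short made-up vectors.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Pearson's r: the sum of products of deviations from the means,
# divided by the square root of the product of the sums of squared deviations.
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

r_builtin <- cor(x, y)

# The two values should agree up to floating-point tolerance.
all.equal(r_manual, r_builtin)
```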
Column {data-width=650}
-----------------------------------------------------------------------
### DENSITY VS QUALITY
```{r}
ggplot(df_rmout,aes(density,fill = class))+geom_density(alpha = 0.5)
```
Column {data-width=650}
-----------------------------------------------------------------------
```{r}
ggplot(df_rmout,aes(class ,density))+geom_boxplot()
```
# SCATTER PLOTS
```{r}
ggplot(df_rmout,aes(alcohol,density))+geom_point()
```
Column {data-width=650}
-----------------------------------------------------------------------
```{r}
ggplot(df_rmout,aes(residual.sugar,density))+geom_point()
```
# DATATABLE
### Valuebox
```{r}
# valueBox() takes the value first and the caption second.
value <- sum(df$density)
valueBox(value, caption = "Sum of density values", icon = "fa-user")
```
### DOWNLOAD
```{r}
# The Buttons extension reads its button list from the lowercase `buttons` option.
datatable(df, extensions = "Buttons", options = list(dom = "Bftrip", buttons = c("copy", "print", "csv", "pdf")))
```
# INSIGHTS
1. There is a negative relationship between density and alcohol, which may explain why great wines tend to have lower density and higher alcohol concentration, as well as lower SO2 levels (free and total).
2. The analysis suggests that better wines keep alcohol and SO2 levels under control. Because they contain more alcohol, they tend to have a lower density, and wines with lower density also tend to have lower residual sugar.
3. Alcohol has a large impact on the overall quality of the wine.
4. Apart from alcohol, free and total sulphur dioxide influenced whether a wine was great. The chemical properties of a wine affect its quality.
5. A specific pH range and sulphur dioxide ratio (free / total) were also found to affect wine quality, along with the alcohol concentration.