# INTRO

The analysis explores the red wine dataset in depth and aims to identify the factors that affect wine quality. The dataset has 1,599 rows and 12 variables: 7 ingredient variables, 3 physical-property variables, 1 alcohol variable and 1 quality variable. The dashboard loads dplyr, lattice, ggplot2, DT, flexdashboard, FactoMineR and GGally; the conclusions are drawn mainly from the ggplot2 and GGally visualizations.

The data is stored in CSV format. A comma-separated values (CSV) file stores tabular data (numbers and text) as plain text: each line of the file is one data record, and each record consists of one or more fields separated by commas. The columns are:

1. `fixed.acidity`
2. `volatile.acidity`
3. `citric.acid`
4. `residual.sugar`
5. `chlorides`
6. `free.sulfur.dioxide`
7. `total.sulfur.dioxide`
8. `density`
9. `pH`
10. `sulphates`
11. `alcohol`
12. `quality`

————————————————–

# UNIVARIATE PLOTS

### PLOTS FOR EACH VARIABLE

Summary of `chlorides`:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.01200 0.07000 0.07900 0.08747 0.09000 0.61100 

Summary of `alcohol`:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   8.40    9.50   10.20   10.42   11.10   14.90 

# OUTLIERS

### REMOVING THE OUTLIERS FROM PARAMETERS FOR BETTER ANALYSIS

 fixed.acidity    volatile.acidity  citric.acid    residual.sugar  
 Min.   : 4.600   Min.   :0.120    Min.   :0.000   Min.   : 0.900  
 1st Qu.: 7.100   1st Qu.:0.390    1st Qu.:0.090   1st Qu.: 1.900  
 Median : 7.900   Median :0.520    Median :0.260   Median : 2.200  
 Mean   : 8.321   Mean   :0.528    Mean   :0.271   Mean   : 2.532  
 3rd Qu.: 9.200   3rd Qu.:0.640    3rd Qu.:0.420   3rd Qu.: 2.600  
 Max.   :15.900   Max.   :1.580    Max.   :1.000   Max.   :15.500  
   chlorides       free.sulfur.dioxide total.sulfur.dioxide    density      
 Min.   :0.01200   Min.   : 1.00       Min.   :  6.0        Min.   :0.9901  
 1st Qu.:0.07000   1st Qu.: 7.00       1st Qu.: 22.0        1st Qu.:0.9956  
 Median :0.07900   Median :14.00       Median : 38.0        Median :0.9968  
 Mean   :0.08748   Mean   :15.84       Mean   : 46.4        Mean   :0.9967  
 3rd Qu.:0.09000   3rd Qu.:21.00       3rd Qu.: 62.0        3rd Qu.:0.9978  
 Max.   :0.61100   Max.   :68.00       Max.   :289.0        Max.   :1.0037  
       pH          sulphates         alcohol         quality     
 Min.   :2.740   Min.   :0.3300   Min.   : 8.40   Min.   :3.000  
 1st Qu.:3.210   1st Qu.:0.5500   1st Qu.: 9.50   1st Qu.:5.000  
 Median :3.310   Median :0.6200   Median :10.20   Median :6.000  
 Mean   :3.311   Mean   :0.6582   Mean   :10.42   Mean   :5.636  
 3rd Qu.:3.400   3rd Qu.:0.7300   3rd Qu.:11.10   3rd Qu.:6.000  
 Max.   :4.010   Max.   :2.0000   Max.   :14.90   Max.   :8.000  

# NEW VARIATE

# WINE QUALITY

### The ratio is calculated as free sulfur dioxide / total sulfur dioxide

# BIVARIATE

### ALCOHOL VS WINE CLASS

Alcohol quantiles by class (average, good, great):

  0%  25%  50%  75% 100% 
 8.4  9.6 10.0 11.0 13.1 
  0%  25%  50%  75% 100% 
 8.4  9.5 10.0 10.9 14.9 
  0%  25%  50%  75% 100% 
 9.2 10.8 11.6 12.2 14.0 

### Sulfur Dioxide vs class of wine

### DENSITY VS RESIDUAL SUGAR

Correlation between density and residual sugar:

[1] 0.3608741

### DENSITY VS QUALITY

# SCATTER PLOTS

# DATATABLE

### Valuebox

count of density

### DOWNLOAD

# INSIGHTS

1. Density and alcohol are negatively related. This is consistent with great wines having a lower density and a higher alcohol concentration, as well as lower free and total SO2 levels.

2. The analysis suggests that better wines keep alcohol and SO2 levels under control. Because they contain more alcohol, they tend to have a lower density, and lower density goes together with lower residual sugar.

3. Alcohol is strongly associated with overall quality: the median alcohol content of great wines (11.6% vol) is well above that of average and good wines (10.0% vol for both).

4. Apart from alcohol, free and total sulfur dioxide also appear to influence whether a wine is great; more broadly, the chemical composition of a wine is related to its quality.

5. A specific pH range and the ratio of free to total sulfur dioxide also appear to affect overall quality, alongside alcohol concentration.

---
title: "RED WINE DATASET ANALYSIS"
output:
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
    social: menu
    theme: simplex
    storyboard: true
    source_code: embed
---

```{r setup, include=FALSE}
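# Packages used by the dashboard; library() stops with an error if one is missing
libraries = c("dplyr", "lattice", "ggplot2", "DT", "flexdashboard", "FactoMineR", "GGally")
invisible(lapply(libraries, library, character.only = TRUE))
# Red wine quality data (1599 rows x 12 columns), read from the working directory
df = read.csv("winequality_red.csv")
# attach() lets later chunks refer to columns such as `density` by bare name
attach(df)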
```



# INTRO {.tabset}

The analysis explores the red wine dataset in depth and aims to identify the factors that affect wine quality.
The dataset has 1,599 rows and 12 variables: 7 ingredient variables, 3 physical-property variables, 1 alcohol variable and 1 quality variable.
The dashboard loads dplyr, lattice, ggplot2, DT, flexdashboard, FactoMineR and GGally; the conclusions are drawn mainly from the ggplot2 and GGally visualizations.

The data is stored in CSV format. A comma-separated values (CSV) file stores tabular data (numbers and text) as plain text: each line of the file is one data record, and each record consists of one or more fields separated by commas. The columns are:


1. `fixed.acidity`
2. `volatile.acidity`
3. `citric.acid`
4. `residual.sugar`
5. `chlorides`
6. `free.sulfur.dioxide`
7. `total.sulfur.dioxide`
8. `density`
9. `pH`
10. `sulphates`
11. `alcohol`
12. `quality`
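
As a minimal sketch of how `read.csv()` maps CSV lines onto a data frame, the example below parses a small inline excerpt instead of the real file. The three columns follow the dataset's naming, but the values and the names `csv_text` and `wine` are illustrative assumptions:

```r
# Hypothetical excerpt in the same layout as winequality_red.csv
# (three of the twelve columns, illustrative values)
csv_text <- "fixed.acidity,alcohol,quality
7.4,9.4,5
7.8,9.8,5
11.2,9.8,6"

# The first line is read as the header; each following line becomes one record,
# split into fields at the commas
wine <- read.csv(text = csv_text)

dim(wine)    # 3 rows, 3 columns
names(wine)  # "fixed.acidity" "alcohol" "quality"
str(wine)    # numeric fields are converted to numbers automatically
```

Running `dim(df)` on the real file should report the 1,599 rows and 12 columns described above.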

--------------------------------------------------
# MENU 
### Structure
```{r}
str(df)
```

### SUMMARY

```{r}
summary(df)
```



# UNIVARIATE PLOTS
### PLOTS FOR EACH VARIABLE

```{r}
ggplot(df,aes(quality))+geom_bar()+ggtitle("Quality counts graph")
```

Column {data-width=650}
--------------------------




```{r}
ggplot(df,aes(fixed.acidity))+geom_histogram(color="black",fill="white",binwidth = 0.1)+ggtitle("Fixed Acidity Value count")
```

Column {data-width=650}
-----------------------------------------------------------------------

```{r}
ggplot(df,aes(volatile.acidity))+geom_histogram(color="black",fill="blue",binwidth = 0.1)+ggtitle("Volatile Acidity count graph")
```

Column {data-width=650}
-----------------------------------------------------------------------

```{r}
ggplot(df,aes(citric.acid))+geom_histogram(color="red",fill="white",bins = 30)+ggtitle("Citric Acid Count graph")
```

Column {data-width=650}
-----------------------------------------------------------------------

```{r}
ggplot(df,aes(residual.sugar))+geom_histogram(color="red",fill="blue",binwidth = 0.5)+ggtitle("Residual Sugar Count")
```

Column {data-width=650}
-----------------------------------------------------------------------

```{r}
ggplot(df,aes(chlorides))+geom_histogram(color="black",fill="yellow",binwidth = 0.01)+ggtitle("Chloride Count")
```

Column {data-width=650}
-----------------------------------------------------------------------

```{r}
summary(df$chlorides)
```



```{r}
ggplot(df , aes(alcohol)) + geom_histogram(color="brown",fill="orange",binwidth = 0.1)+ggtitle("Alcohol count")

```

Column {data-width=650}
-----------------------------------------------------------------------

```{r}
summary(df$alcohol)
```



```{r}
ggplot(df,aes(free.sulfur.dioxide))+geom_histogram(color="black",fill="blue",binwidth = 1)+ggtitle("Free Sulfur Dioxide count")
```

Column {data-width=650}
-----------------------------------------------------------------------

```{r}
ggplot(df,aes(total.sulfur.dioxide))+geom_histogram(color="black",fill="white",binwidth = 2)+ggtitle("Total Sulfur dioxide count")
```

Column {data-width=650}
-----------------------------------------------------------------------

```{r}
ggplot(df,aes(pH))+ggtitle("pH count ")+geom_histogram(color="black",fill="white",binwidth = 0.05)
```

Column {data-width=650}
-----------------------------------------------------------------------

```{r}
ggplot(df,aes(density))+ggtitle("Density Count")+geom_histogram(color="black",fill="white",binwidth = 0.001)
```

Column {data-width=650}
-----------------------------------------------------------------------

```{r}
ggplot(df,aes(sulphates))+geom_histogram(color="black",fill="white",binwidth = 0.01 )+ggtitle("Sulphates count")
```





# OUTLIERS  
### REMOVING THE OUTLIERS FROM PARAMETERS FOR BETTER ANALYSIS
```{r}
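# Apply the filters cumulatively: each subset() starts from the previous result,
# so all four conditions hold in df_rmout
df_rmout = subset(df, residual.sugar < 15)
df_rmout = subset(df_rmout, total.sulfur.dioxide < 289)
df_rmout = subset(df_rmout, density < 1.0037)
df_rmout = subset(df_rmout, free.sulfur.dioxide < 72)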
```
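
Because each `subset()` call returns a new data frame, successive filters only accumulate if every call starts from the previous result. The sketch below uses a hypothetical four-row data frame `toy` (not the wine data) to contrast restarting from the original data with cumulative filtering, and shows an equivalent single call that joins the conditions with `&`:

```r
# Hypothetical toy data: row 2 has extreme residual sugar, row 3 extreme total SO2
toy <- data.frame(
  residual.sugar       = c(2.0, 15.5, 2.2, 1.9),
  total.sulfur.dioxide = c(34, 40, 289, 60)
)

# Pattern to avoid: every call restarts from `toy`, so only the last filter sticks
bad <- subset(toy, residual.sugar < 15)
bad <- subset(toy, total.sulfur.dioxide < 289)
nrow(bad)             # 3: the high-sugar row is back

# Cumulative filtering: each call starts from the previous result
good <- subset(toy, residual.sugar < 15)
good <- subset(good, total.sulfur.dioxide < 289)
nrow(good)            # 2

# The same result in a single call, combining the conditions with &
same <- subset(toy, residual.sugar < 15 & total.sulfur.dioxide < 289)
all.equal(good, same) # TRUE
```

The single-call form is harder to get wrong when more thresholds are added later.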


```{r}
summary(df_rmout)
```






# NEW VARIATE 



```{r}
ggplot(df_rmout,aes(quality))+geom_bar()+ggtitle("Quality counts graph")
```

Column {data-width=650}
-----------------------------------------------------------------------

```{r}
ggplot(df_rmout,aes(fixed.acidity))+geom_histogram(color="yellow",fill="red",binwidth = 0.1)+ggtitle("Fixed Acidity Value count")
```

Column {data-width=650}
-----------------------------------------------------------------------



```{r}
ggplot(df_rmout,aes(volatile.acidity))+geom_histogram(color="black",fill="white",binwidth = 0.1)+ggtitle("Volatile Acidity count graph")
```

Column {data-width=650}
-----------------------------------------------------------------------


```{r}
ggplot(df_rmout,aes(citric.acid))+geom_histogram(color="black",fill="white",bins = 30)+ggtitle("Citric Acid Count graph")
```

Column {data-width=650}
-----------------------------------------------------------------------


```{r}
ggplot(df_rmout,aes(residual.sugar))+geom_histogram(color="black",fill="white",binwidth = 0.5)+ggtitle("Residual Sugar Count")
```

Column {data-width=650}
-----------------------------------------------------------------------


```{r}
ggplot(df_rmout , aes(alcohol)) + geom_histogram(color="black",fill="white",binwidth = 0.1)+ggtitle("Alcohol count")
```



# WINE QUALITY

### The ratio is calculated as free sulfur dioxide / total sulfur dioxide

```{r}

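# Quality bands: (0,4] -> "average", (4,6] -> "good", (6,9] -> "great" (cut() intervals are right-closed)
df_rmout$class = cut(df_rmout$quality, breaks = c(0, 4, 6, 9), labels = c("average", "good", "great"))
# Share of the total sulfur dioxide that is present in free form
df_rmout$ratio = df_rmout$free.sulfur.dioxide / df_rmout$total.sulfur.dioxide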
```
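
To make the class boundaries concrete, this sketch applies the same `cut()` breaks to the quality scores 3 to 8 and computes the free/total ratio for a few hypothetical SO2 values. The vectors `scores`, `free` and `total` are illustrative, not taken from the data:

```r
# Quality scores from 3 to 8, the range seen in the red wine data
scores <- 3:8

# cut() builds right-closed intervals by default: (0,4], (4,6], (6,9]
classes <- cut(scores, breaks = c(0, 4, 6, 9),
               labels = c("average", "good", "great"))
data.frame(quality = scores, class = classes)

# Hypothetical SO2 values: the ratio is the free share of total sulfur dioxide
free  <- c(11, 25, 15)
total <- c(34, 67, 54)
round(free / total, 3)
```

Scores 3 and 4 land in "average", 5 and 6 in "good", and 7 and 8 in "great". The ratio always lies between 0 and 1 because free SO2 is part of total SO2.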


```{r}
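# One bar per quality class; fill and outline colours are set per class
ggplot(df_rmout, aes(class, fill = class, color = class)) + geom_bar() +
  scale_fill_manual(values = c(average = "blue", good = "green", great = "red")) +
  scale_color_manual(values = c(average = "yellow", good = "red", great = "green"))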
```





# BIVARIATE  
### ALCOHOL VS WINE CLASS

```{r}
ggcorr(df_rmout[sapply(df_rmout, is.numeric)])
```

Column {data-width=650}
-----------------------------------------------------------------------

```{r}
ggplot(df_rmout,aes(class,alcohol))+geom_boxplot()
```

Column {data-width=650}
-----------------------------------------------------------------------

```{r}
ggplot(df_rmout,aes(alcohol,fill=class))+geom_histogram(aes(y = after_stat(density)),binwidth = 0.1)+geom_density(alpha=0.2)
```

Column {data-width=650}
-----------------------------------------------------------------------

```{r}
ggplot(df_rmout,aes(alcohol,fill=class))+geom_density(alpha=0.2)
```

Column {data-width=650}
-----------------------------------------------------------------------

```{r}
print(quantile(df_rmout[df_rmout$class=="average",]$alcohol))
print(quantile(df_rmout[df_rmout$class=="good",]$alcohol))
print(quantile(df_rmout[df_rmout$class=="great",]$alcohol))
```
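
The three per-class `quantile()` calls can also be expressed as a single grouped call with `tapply()`. Here is a sketch on hypothetical values (`alc` and `wine_class` are made-up vectors, not columns of `df_rmout`):

```r
# Hypothetical alcohol values (% vol) and quality classes
alc <- c(9.4, 9.8, 10.0, 11.2, 12.0, 9.6, 11.8)
wine_class <- factor(c("average", "good", "good", "great", "great", "average", "great"),
                     levels = c("average", "good", "great"))

# quantile() is applied to the alcohol values of each class separately;
# the result is a list with one five-number vector per class
q <- tapply(alc, wine_class, quantile)
q
q[["great"]]["50%"]   # median alcohol of the "great" group: 11.8
```

On the dashboard data, the equivalent call would be `tapply(df_rmout$alcohol, df_rmout$class, quantile)`.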


### Sulfur Dioxide vs class of wine

```{r}
ggplot(df_rmout,aes(class,free.sulfur.dioxide))+geom_boxplot()
```

Column {data-width=650}
-----------------------------------------------------------------------

```{r}
ggplot(df_rmout,aes(class,total.sulfur.dioxide))+geom_boxplot()
```

Column {data-width=650}
-----------------------------------------------------------------------

### DENSITY VS RESIDUAL SUGAR

```{r}
ggplot(df_rmout,aes(residual.sugar,fill = class))+geom_density(alpha = 0.5)
```

Column {data-width=650}
-----------------------------------------------------------------------

```{r}
ggplot(df_rmout,aes(class , residual.sugar))+geom_boxplot()
```

Column {data-width=650}
-----------------------------------------------------------------------

```{r}
cor(df_rmout$density,df_rmout$residual.sugar)
```
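
By default, `cor()` returns Pearson's correlation coefficient. To check what that number means, the sketch below computes Pearson's r by hand on hypothetical paired values (`dens` and `sugar` are illustrative) and compares the result with `cor()`:

```r
# Hypothetical paired measurements of density and residual sugar
dens  <- c(0.9978, 0.9968, 0.9970, 0.9980, 0.9964)
sugar <- c(1.9, 2.6, 2.3, 2.8, 1.8)

# Pearson's r: sum of cross-products of deviations, scaled by the square root of
# the product of the sums of squared deviations (the n - 1 factors cancel)
r_manual <- sum((dens - mean(dens)) * (sugar - mean(sugar))) /
  sqrt(sum((dens - mean(dens))^2) * sum((sugar - mean(sugar))^2))

r_builtin <- cor(dens, sugar)   # cor() defaults to method = "pearson"
all.equal(r_manual, r_builtin)  # TRUE
```

A positive r means denser wines tend to carry more residual sugar. Values near 0 indicate a weak linear relationship.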

Column {data-width=650}
-----------------------------------------------------------------------

### DENSITY VS QUALITY

```{r}
ggplot(df_rmout,aes(density,fill = class))+geom_density(alpha = 0.5)
```

Column {data-width=650}
-----------------------------------------------------------------------

```{r}
ggplot(df_rmout,aes(class ,density))+geom_boxplot()
```



# SCATTER PLOTS 

```{r}
ggplot(df_rmout,aes(alcohol,density))+geom_point()
```

Column {data-width=650}
-----------------------------------------------------------------------

```{r}
ggplot(df_rmout,aes(residual.sugar,density))+geom_point()
```


# DATATABLE 

### Valuebox
```{r}
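# Number of non-missing density measurements; valueBox() takes (value, caption, icon)
value = sum(!is.na(df$density))
valueBox(value, caption = "count of density", icon = "fa-user")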
```



### DOWNLOAD
```{r}
# Interactive table with copy/print/CSV/PDF export buttons (DT Buttons extension)
datatable(df, extensions = 'Buttons',
          options = list(dom = 'Bfrtip', buttons = c('copy', 'print', 'csv', 'pdf')))
```

# INSIGHTS 
1. Density and alcohol are negatively related. This is consistent with great wines having a lower density and a higher alcohol concentration, as well as lower free and total SO2 levels.

2. The analysis suggests that better wines keep alcohol and SO2 levels under control. Because they contain more alcohol, they tend to have a lower density, and lower density goes together with lower residual sugar.

3. Alcohol is strongly associated with overall quality: the median alcohol content of great wines (11.6% vol) is well above that of average and good wines (10.0% vol for both).

4. Apart from alcohol, free and total sulfur dioxide also appear to influence whether a wine is great; more broadly, the chemical composition of a wine is related to its quality.

5. A specific pH range and the ratio of free to total sulfur dioxide also appear to affect overall quality, alongside alcohol concentration.
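
The `ratio` column created on the wine quality page is not plotted anywhere in the dashboard. One possible way to examine insight 5 is to compare per-class medians of pH and of the ratio. The sketch below does this with `aggregate()` on a hypothetical five-row data frame `wines` shaped like `df_rmout`, so the numbers it prints say nothing about the real data:

```r
# Hypothetical rows in the same shape as df_rmout (class, pH, free and total SO2)
wines <- data.frame(
  class = factor(c("average", "good", "good", "great", "great"),
                 levels = c("average", "good", "great")),
  pH = c(3.51, 3.20, 3.26, 3.16, 3.30),
  free.sulfur.dioxide  = c(11, 25, 15, 17, 13),
  total.sulfur.dioxide = c(34, 67, 54, 60, 40)
)
wines$ratio <- wines$free.sulfur.dioxide / wines$total.sulfur.dioxide

# Median pH and median free/total SO2 ratio within each quality class
by_class <- aggregate(cbind(pH, ratio) ~ class, data = wines, FUN = median)
by_class
```

Applied to `df_rmout`, the same `aggregate()` call would give one row per class. Those rows could be checked against the pH range and ratio mentioned above.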