#Introduction
For my last data set I wanted one with a lot of columns and rows. The one I chose is Melvin Matanos's wine dataset. It has many columns, and my focus is on wine type and quality, using the columns quality, quality_label, residual.sugar, alcohol, and wine_type. These data support some simple analysis.
#Step 1 Overview
library(knitr)
library(stringr)
library(tidyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
wine <- read.csv("https://raw.githubusercontent.com/Wilchau/Data607Project2/main/Data_3.csv")
head(wine)
## X fixed.acidity volatile.acidity citric.acid residual.sugar chlorides
## 1 0 7.0 0.17 0.74 12.8 0.05
## 2 1 7.7 0.64 0.21 2.2 0.08
## 3 2 6.8 0.39 0.34 7.4 0.02
## 4 3 6.3 0.28 0.47 11.2 0.04
## 5 4 7.4 0.35 0.20 13.9 0.05
## free.sulfure.dioxide total.sulfur.dioxide density pH sulphates alcohol
## 1 24 126 0.99420 3.26 0.38 12.2
## 2 32 133 0.99560 3.27 0.45 9.9
## 3 38 133 0.99212 3.18 0.44 12.0
## 4 61 183 0.99592 3.12 0.51 9.5
## 5 63 229 0.99888 3.11 0.50 8.9
## quality wine_type quality_label
## 1 8 white high
## 2 5 red low
## 3 7 white medium
## 4 6 white medium
## 5 6 white medium
#Step 2 Pull out the necessary variables
Drinking wine is like a work of art: many variables contribute to the taste. First we use select() to grab the necessary variables: X, residual.sugar, pH, alcohol, quality, wine_type, and quality_label. Then I show the statistical summary and plot pH against the other numeric variables.
wine_df <- select(wine, X, residual.sugar, pH, alcohol, quality, wine_type, quality_label)
summary(wine_df)
## X residual.sugar pH alcohol quality
## Min. :0 Min. : 2.2 Min. :3.110 Min. : 8.9 Min. :5.0
## 1st Qu.:1 1st Qu.: 7.4 1st Qu.:3.120 1st Qu.: 9.5 1st Qu.:6.0
## Median :2 Median :11.2 Median :3.180 Median : 9.9 Median :6.0
## Mean :2 Mean : 9.5 Mean :3.188 Mean :10.5 Mean :6.4
## 3rd Qu.:3 3rd Qu.:12.8 3rd Qu.:3.260 3rd Qu.:12.0 3rd Qu.:7.0
## Max. :4 Max. :13.9 Max. :3.270 Max. :12.2 Max. :8.0
## wine_type quality_label
## Length:5 Length:5
## Class :character Class :character
## Mode :character Mode :character
##
##
##
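In the summary above, wine_type and quality_label only show Length, Class, and Mode because they are character columns. A minimal sketch of one way to get counts instead is to convert them to factors first. The data frame below retypes the five rows printed by head(wine) so the sketch runs without the download; the name wine_fct and the low < medium < high level order are my own assumptions, not part of the original analysis.

```r
# The five rows shown by head(wine), retyped so this runs offline.
wine_df <- data.frame(
  X = 0:4,
  residual.sugar = c(12.8, 2.2, 7.4, 11.2, 13.9),
  pH = c(3.26, 3.27, 3.18, 3.12, 3.11),
  alcohol = c(12.2, 9.9, 12.0, 9.5, 8.9),
  quality = c(8, 5, 7, 6, 6),
  wine_type = c("white", "red", "white", "white", "white"),
  quality_label = c("high", "low", "medium", "medium", "medium"),
  stringsAsFactors = FALSE
)

# wine_fct is a hypothetical name for the factor-converted copy.
wine_fct <- wine_df
wine_fct$wine_type <- factor(wine_fct$wine_type)
# Assumed ordering of the labels: low, then medium, then high.
wine_fct$quality_label <- factor(wine_fct$quality_label,
                                 levels = c("low", "medium", "high"))

# summary() now reports a count per level for the two factor columns.
summary(wine_fct[, c("wine_type", "quality_label")])

# Cross-tabulate wine type against quality label.
table(wine_fct$wine_type, wine_fct$quality_label)
```

The cross-table is a quick way to see how the quality labels are spread across red and white wines.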
# Quality against pH
ggplot(data = wine_df, aes(x = pH, y = quality, group = 1)) +
  geom_line() +
  geom_point()
# Alcohol against pH
ggplot(data = wine_df, aes(x = pH, y = alcohol, group = 1)) +
  geom_line() +
  geom_point()
# Residual sugar against pH
ggplot(data = wine_df, aes(x = pH, y = residual.sugar, group = 1)) +
  geom_line() +
  geom_point()
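The three plots of pH against quality, alcohol, and residual sugar could also be drawn as one faceted figure. This is only a sketch of an alternative, not the method used above: it reshapes the data with tidyr's pivot_longer() and uses facet_wrap() with free y scales. The names wine_long, variable, and value are hypothetical, and the data frame retypes the five rows printed by head(wine).

```r
library(tidyr)
library(ggplot2)

# The numeric columns from the five rows shown by head(wine).
wine_df <- data.frame(
  pH = c(3.26, 3.27, 3.18, 3.12, 3.11),
  quality = c(8, 5, 7, 6, 6),
  alcohol = c(12.2, 9.9, 12.0, 9.5, 8.9),
  residual.sugar = c(12.8, 2.2, 7.4, 11.2, 13.9)
)

# Stack the three y-variables into one column, keeping pH beside each value.
wine_long <- pivot_longer(
  wine_df,
  cols = c(quality, alcohol, residual.sugar),
  names_to = "variable",
  values_to = "value"
)

# One panel per variable; free_y lets each panel keep its own y-axis range.
p <- ggplot(wine_long, aes(x = pH, y = value)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ variable, scales = "free_y")
p
```

Sharing the pH axis across panels makes it easier to compare where each variable rises or falls.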
Based on these plots, and on my reading that pH affects the texture of
the wine, there appears to be an optimal range around pH 3.15-3.25 where
sugar, quality, and alcohol level peak. Once the pH goes past 3.25 the
wine becomes less acidic (a little more basic), and the quality appears
to drop. The statistical summary gives a median pH of 3.180, which falls
inside this range. With only the five rows shown here, this is a
tentative observation rather than a firm result.
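To check the reading of the plots against the numbers, one option is to sort the rows by pH and look at a simple correlation of pH with each variable. This is a rough sketch in base R on the five rows printed by head(wine); the names by_ph and ph_cor are hypothetical, and a correlation computed from five rows is a rough guide at best.

```r
# The numeric columns from the five rows shown by head(wine).
wine_df <- data.frame(
  pH = c(3.26, 3.27, 3.18, 3.12, 3.11),
  quality = c(8, 5, 7, 6, 6),
  alcohol = c(12.2, 9.9, 12.0, 9.5, 8.9),
  residual.sugar = c(12.8, 2.2, 7.4, 11.2, 13.9)
)

# Sort by pH so the rows appear in the same left-to-right order as the plots.
by_ph <- wine_df[order(wine_df$pH), ]
by_ph

# Pearson correlation of pH with each of the other three variables.
ph_cor <- sapply(
  wine_df[c("quality", "alcohol", "residual.sugar")],
  function(col) cor(wine_df$pH, col)
)
round(ph_cor, 2)
```

Reading the sorted table row by row shows exactly which pH values the peaks and drops in the line plots come from.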
#Conclusion
Don’t let your wine become basic, or else the quality will decrease drastically.