Neural Networks
Wine Quality Analysis
You are a data scientist at the Great North American Wine Company (GNAWC). GNAWC is a family-owned business that sells red wines all over North America. The Director of Quality Product, Mrs. Johnson, wants to analyze wine quality based on physicochemical tests. You are given the dataset in the file wine.csv.
Data Source: https://archive.ics.uci.edu/dataset/186/wine+quality
Data Dictionary
This dataset contains the following columns:
Variable Name | Data Type | Description | Constraints/Rules |
---|---|---|---|
Question 1
Load the dataset wine.csv into memory.
Read the dataset into memory
Display the dimensions of the data frame (number of rows and columns)
## [1] 1599 12
Display the column names of the data frame
## [1] "fixed.acidity" "volatile.acidity" "citric.acid"
## [4] "residual.sugar" "chlorides" "free.sulfur.dioxide"
## [7] "total.sulfur.dioxide" "density" "pH"
## [10] "sulphates" "alcohol" "quality"
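The code behind these outputs is not shown. A minimal sketch of the loading step follows, assuming the file is comma-separated with a header row and that the data frame is called `wine` (a hypothetical name, not from the source). So that it runs without the real file, the sketch writes a three-row stand-in CSV with made-up values to a temporary file.

```r
# Stand-in for wine.csv so the sketch runs anywhere (values are made up).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "fixed acidity,pH,alcohol,quality",
  "7.4,3.51,9.4,5",
  "7.8,3.20,9.8,5",
  "11.2,3.16,9.8,6"
), csv_path)

# With the real file this would be read.csv("wine.csv").
# Assumption: comma-separated with a header row; `wine` is a hypothetical name.
wine <- read.csv(csv_path)

dim(wine)    # number of rows and columns
names(wine)  # read.csv turns "fixed acidity" into "fixed.acidity"
```

If the real file uses a different separator, the `sep` argument of `read.csv()` would need to be set accordingly.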
Question 2
Preprocess the inputs
Section A
Standardize the inputs using the scale() function.
Scale all columns except the “quality” column
Display column names of the standardized inputs
## [1] "fixed.acidity" "volatile.acidity" "citric.acid"
## [4] "residual.sugar" "chlorides" "free.sulfur.dioxide"
## [7] "total.sulfur.dioxide" "density" "pH"
## [10] "sulphates" "alcohol"
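The standardization step could look roughly like the sketch below. It uses a toy data frame with made-up values in place of the real wine data; the names `wine`, `input_cols`, and `standard_inputs` are hypothetical. `scale()` centers each column to mean 0 and rescales it to standard deviation 1.

```r
# Toy stand-in for the wine data (values are made up).
wine <- data.frame(
  pH      = c(3.51, 3.20, 3.26, 3.16),
  alcohol = c(9.4, 9.8, 9.8, 10.0),
  quality = c(5, 5, 5, 6)
)

# Standardize every column except the response `quality`.
input_cols <- setdiff(names(wine), "quality")
standard_inputs <- scale(wine[, input_cols])

colnames(standard_inputs)             # "pH" "alcohol"
round(colMeans(standard_inputs), 10)  # column means are 0 after centering
apply(standard_inputs, 2, sd)         # column standard deviations are 1
```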
Section B
Convert the standardized inputs to a data frame using the as.data.frame() function.
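The conversion is needed because `scale()` returns a numeric matrix rather than a data frame. A small sketch with made-up values, where `m` is a hypothetical name for the scaled matrix:

```r
# scale() returns a matrix (toy values, not the real wine data).
m <- scale(cbind(pH = c(3.51, 3.20, 3.26), alcohol = c(9.4, 9.8, 10.0)))
is.matrix(m)                    # TRUE

# Convert to a data frame so columns can be used by name.
standard_data <- as.data.frame(m)
is.data.frame(standard_data)    # TRUE
standard_data$pH                # `$` access works on the data frame
```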
Section C
Split the data into a training set containing 3/4 of the original data and a test set containing the remaining 1/4.
Add back the quality column
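The code for this step is not shown. One way to do it, sketched on toy values and assuming the original data frame is called `wine` (a hypothetical name), is to assign the unscaled response as a new column:

```r
# Toy stand-in for the wine data (values are made up).
wine <- data.frame(
  pH      = c(3.51, 3.20, 3.26),
  alcohol = c(9.4, 9.8, 10.0),
  quality = c(5, 5, 6)
)
standard_data <- as.data.frame(scale(wine[, c("pH", "alcohol")]))

# scale() keeps the row order, so the response can be attached directly.
standard_data$quality <- wine$quality
names(standard_data)   # "pH" "alcohol" "quality"
```

The response is left on its original scale here, so predictions can be read directly as quality scores.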
Split the data into training set and testing set
set.seed(1) # For reproducibility
index <- sample(seq_len(nrow(standard_data)), size = floor(0.75 * nrow(standard_data)))
training_data <- standard_data[index, ]
testing_data <- standard_data[-index, ]
Display the training dataset dimensions
## [1] 1199 12
Display the testing dataset dimensions
## [1] 400 12
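The reported dimensions follow from the 3/4 split: 0.75 × 1599 = 1199.25, which gives 1199 training rows once truncated to a whole number and leaves 400 rows for testing. The arithmetic can be checked as follows, with `n`, `n_train`, and `n_test` as hypothetical names:

```r
n <- 1599                    # rows reported by the dimensions output
n_train <- floor(0.75 * n)   # 1199.25 truncated to 1199
n_test  <- n - n_train       # 400

set.seed(1)
index <- sample(seq_len(n), n_train)
length(index)                        # 1199 sampled row numbers
length(setdiff(seq_len(n), index))   # 400 rows left for the test set
```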
Question 3
Build a neural network model
Section A
The response is quality and the inputs are volatile.acidity, density, pH, and alcohol. Please use 1 hidden layer with 1 neuron.
Build the neural network model
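The model-building code is not shown, and the source does not name the library used. The sketch below is one possibility, assuming the `neuralnet` package; it is fitted on synthetic data with made-up coefficients rather than the real training set. The `stepmax` value is an arbitrary choice to give the optimizer room to converge.

```r
# Assumption: the neuralnet package; the source does not say which library was used.
if (requireNamespace("neuralnet", quietly = TRUE)) {
  set.seed(1)
  # Synthetic standardized inputs standing in for training_data (values are made up).
  n <- 200
  training_data <- data.frame(
    volatile.acidity = rnorm(n),
    density          = rnorm(n),
    pH               = rnorm(n),
    alcohol          = rnorm(n)
  )
  training_data$quality <- 5.6 + 0.3 * training_data$alcohol -
    0.2 * training_data$volatile.acidity + rnorm(n, sd = 0.5)

  # One hidden layer with a single neuron; linear output for a numeric response.
  nn_model <- neuralnet::neuralnet(
    quality ~ volatile.acidity + density + pH + alcohol,
    data = training_data,
    hidden = 1,
    linear.output = TRUE,
    stepmax = 1e6
  )

  # predict() returns a one-column matrix of fitted quality scores.
  predicted <- predict(nn_model, newdata = training_data)
  dim(predicted)
}
```

With `hidden = 1` the network has a single nonlinear unit, so it can only capture a limited departure from a linear fit of quality on the four inputs.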
Section C
Forecast the wine quality in the test dataset.
predicted <- predict(nn_model, newdata = testing_data)
# Display the statistical summary of the predicted values
summary(predicted)
## V1
## Min. :4.900
## 1st Qu.:5.213
## Median :5.469
## Mean :5.613
## 3rd Qu.:6.011
## Max. :6.559