Neural Networks


Wine Quality Analysis


You are a data scientist at the Great North American Wine Company (GNAWC). GNAWC is a family-owned business that sells red wines all over North America. The Director of Quality Product, Mrs. Johnson, wants to analyze wine quality based on physicochemical tests. You are given the dataset in the file wine.csv.

Data Source: https://archive.ics.uci.edu/dataset/186/wine+quality

Data Dictionary

This dataset contains the following columns:

Variable Name Data Type Description Constraints/Rules


Question 1

Load the dataset wine.csv into memory.

Read the dataset into memory

Wine.df <- read.csv("data/wine.csv")

Display the dimensions of the data frame (number of rows and columns)

dim(Wine.df)
## [1] 1599   12

Display the column names of the data frame

colnames(Wine.df)
##  [1] "fixed.acidity"        "volatile.acidity"     "citric.acid"         
##  [4] "residual.sugar"       "chlorides"            "free.sulfur.dioxide" 
##  [7] "total.sulfur.dioxide" "density"              "pH"                  
## [10] "sulphates"            "alcohol"              "quality"


Question 2

Preprocess the inputs

Section A

Standardize the inputs using the scale() function.

Scale all columns except the “quality” column

scaled.wine <- scale(Wine.df[, names(Wine.df) != "quality"])  # drop the response by name

Display column names of the standardized inputs

colnames(scaled.wine)
##  [1] "fixed.acidity"        "volatile.acidity"     "citric.acid"         
##  [4] "residual.sugar"       "chlorides"            "free.sulfur.dioxide" 
##  [7] "total.sulfur.dioxide" "density"              "pH"                  
## [10] "sulphates"            "alcohol"
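As a quick check on what scale() does, the sketch below standardizes a small made-up matrix by hand and compares the result with scale(). It subtracts each column's mean and divides by its sample standard deviation. The toy values are illustrative only and are not taken from wine.csv.

```r
# Toy matrix standing in for two numeric input columns (made-up values)
x <- cbind(a = c(7.4, 7.8, 11.2, 7.9),
           b = c(0.70, 0.88, 0.28, 0.60))

# scale() centers each column by its mean and divides by its sample SD (n - 1)
z_builtin <- scale(x)

# The same z-scores computed by hand: (x - column mean) / column SD
col_means <- colMeans(x)
col_sds <- apply(x, 2, sd)
z_manual <- sweep(sweep(x, 2, col_means, "-"), 2, col_sds, "/")

# Both approaches agree; after scaling every column has mean 0 and SD 1
all.equal(as.vector(z_builtin), as.vector(z_manual))
round(colMeans(z_builtin), 10)
apply(z_builtin, 2, sd)
```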

Section B

Convert the standardized inputs to a data frame using the as.data.frame() function.

standard_data <- as.data.frame(scaled.wine)
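The conversion is needed because scale() returns a matrix, not a data frame. The matrix also carries the column means and standard deviations it used as the attributes "scaled:center" and "scaled:scale". The sketch below, on made-up values, shows both points. If new wines are scored later, they would need to be standardized with those same stored values. The object names and the new observation are hypothetical.

```r
# Toy matrix standing in for the physicochemical inputs (made-up values)
x <- cbind(alcohol = c(9.4, 9.8, 10.5, 11.0),
           pH      = c(3.51, 3.20, 3.26, 3.16))

z <- scale(x)

# scale() returns a matrix, which is why as.data.frame() is applied afterwards
class(z)
df <- as.data.frame(z)

# The column means and SDs used for scaling are stored as attributes
ctr <- attr(z, "scaled:center")
scl <- attr(z, "scaled:scale")

# Hypothetical new observation, standardized with the same center and scale
new_obs <- cbind(alcohol = 10.0, pH = 3.30)
new_z <- scale(new_obs, center = ctr, scale = scl)
new_z
```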

Section C

Split the data into a training set containing 3/4 of the original data and a test set containing the remaining 1/4.

Add back the quality column

standard_data$quality <- Wine.df$quality

Split the data into training set and testing set

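set.seed(1) # For reproducibility
# Draw 75% of the row indices (0.75 * 1599 = 1199.25, truncated to 1199) without replacement
index <- sample(seq_len(nrow(standard_data)), floor(0.75 * nrow(standard_data)))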
training_data <- standard_data[index, ]
testing_data <- standard_data[-index, ]

Display the training dataset dimensions

dim(training_data)
## [1] 1199   12

Display the testing dataset dimensions

dim(testing_data)
## [1] 400  12
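The row counts can be checked by hand. With 1599 rows, 0.75 * 1599 = 1199.25, so the training set keeps 1199 rows and the test set gets the remaining 400, matching the dimensions above. The sketch below repeats that arithmetic and confirms that indexing with index and -index puts every row in exactly one of the two sets. It uses plain row numbers rather than the real data frame.

```r
n <- 1599                      # number of rows reported by dim(Wine.df)
n_train <- floor(0.75 * n)     # 0.75 * 1599 = 1199.25, kept as 1199 rows
n_test <- n - n_train          # remaining rows go to the test set

set.seed(1)                    # the checks below hold for any seed
idx <- sample(seq_len(n), n_train)

# Each row lands in exactly one of the two sets
train_rows <- seq_len(n)[idx]
test_rows <- seq_len(n)[-idx]
length(intersect(train_rows, test_rows))   # no overlap
length(union(train_rows, test_rows)) == n  # every row used
```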


Question 3

Build a neural network model

Section A

The response is quality and the inputs are: volatile.acidity, density, pH, and alcohol. Please use 1 hidden layer with 1 neuron.

Build the neural network model

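library(neuralnet)  # provides neuralnet() and its predict()/plot() methods
# Starting weights are random, so the fit depends on the RNG state from set.seed(1)
nn_model <- neuralnet(quality ~ volatile.acidity + density + pH + alcohol,
                      data = training_data, hidden = c(1))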
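With hidden = c(1) and neuralnet's defaults (logistic activation in the hidden layer, linear.output = TRUE), the fitted model reduces to a single formula: quality_hat = w0 + w1 * sigmoid(b + v1*volatile.acidity + v2*density + v3*pH + v4*alcohol). The sketch below computes that forward pass in base R. The function name forward_1neuron and all weights are hypothetical placeholders, not the values fitted in nn_model.

```r
# Logistic (sigmoid) activation, neuralnet's default act.fct
sigmoid <- function(u) 1 / (1 + exp(-u))

# Forward pass of a 4-input, 1-hidden-neuron network with a linear output unit
# x: matrix with columns volatile.acidity, density, pH, alcohol (standardized)
forward_1neuron <- function(x, b, v, w0, w1) {
  hidden <- sigmoid(b + x %*% v)   # one hidden activation per row, in (0, 1)
  as.vector(w0 + w1 * hidden)      # linear output: predicted quality
}

# Hypothetical weights, for illustration only
b  <- 0.2
v  <- c(-0.8, -0.3, -0.1, 1.1)
w0 <- 4.8
w1 <- 2.0

x <- rbind(c(0, 0, 0, 0),          # an "average" wine (all inputs at their means)
           c(-1, 0, 0, 2))         # lower volatile acidity, higher alcohol
colnames(x) <- c("volatile.acidity", "density", "pH", "alcohol")

forward_1neuron(x, b, v, w0, w1)
```

Because the hidden activation lies strictly between 0 and 1, every prediction from this architecture falls between w0 and w0 + w1 (for a positive w1). This limits how far the forecasts can spread away from the typical quality score.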

Section B

Plot the neural network.

plot(nn_model)

[Figure: Neural network plot]

Section C

Forecast the wine quality in the test dataset.

predicted <- predict(nn_model, newdata = testing_data)

# Display the statistical summary of the predicted values
summary(predicted)
##        V1       
##  Min.   :4.900  
##  1st Qu.:5.213  
##  Median :5.469  
##  Mean   :5.613  
##  3rd Qu.:6.011  
##  Max.   :6.559
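The V1 heading in this summary appears because predict() on the fitted network returns a one-column matrix rather than a plain vector. summary() then treats that column like a data frame column with a default name. The sketch below reproduces the behaviour with a made-up matrix. It also shows how drop() turns the matrix into a numeric vector, which is often easier to compare against the observed scores.

```r
# Made-up predictions shaped like predict() output: one column, no column name
pred_mat <- matrix(c(4.90, 5.21, 5.47, 6.01, 6.56), ncol = 1)

summary(pred_mat)        # the column is labelled V1

# drop() removes the extra dimension and leaves a numeric vector
pred_vec <- drop(pred_mat)
summary(pred_vec)        # the usual Min./1st Qu./Median/Mean/3rd Qu./Max. layout
```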

Section D

Get the observed wine quality of the test dataset.

actual <- testing_data$quality

# Display the statistical summary of the observed wine quality
summary(actual)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.000   5.000   6.000   5.652   6.000   8.000

Section E

Compute the test error (mean squared error, MSE).

mean_squared_error <- mean((actual - predicted)^2)
mean_squared_error
## [1] 0.4573465
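The test error above is the mean squared error, MSE = (1/n) * sum over i of (y_i - yhat_i)^2. Here y_i is an observed quality score and yhat_i is the network's forecast for the same wine. A small helper makes the computation explicit. It also reports the root mean squared error, which is on the same scale as the quality scores. The function name mse_rmse and the toy vectors are hypothetical.

```r
# Mean squared error and its square root for observed vs. predicted values
mse_rmse <- function(actual, predicted) {
  predicted <- as.vector(predicted)          # accept a one-column matrix too
  stopifnot(length(actual) == length(predicted))
  mse <- mean((actual - predicted)^2)
  c(MSE = mse, RMSE = sqrt(mse))
}

# Toy example: errors of 0, 0 and 1 give MSE = 1/3
obs  <- c(5, 6, 7)
pred <- matrix(c(5, 6, 8), ncol = 1)
mse_rmse(obs, pred)
```

Applied to the objects above, mse_rmse(actual, predicted) would return the same MSE of 0.4573465 together with its square root.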