Prediction on the Class of Wine

Thawatchai Phakwithoonchai

3/18/2020

Introduction

This assignment is part of week 4 of the Coursera course Developing Data Products. The peer-assessed assignment has two parts:
1. A Shiny application hosted on RStudio’s servers. (link)
2. A Slidify presentation about the application.

For the application, the following items must be included:
1. Some form of input (widget: textbox, radio button, checkbox, …)
2. Some operation on the UI input in server.R
3. Some reactive output displayed as a result of server calculations
4. You must also include enough documentation so that a novice user could use your application.
5. The documentation should be at the Shiny website itself. Do not post to an external link.
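
The requirements above can be illustrated with a minimal sketch. This is not the submitted application: the widget id `alcohol`, the slider range, and the output id `verdict` are hypothetical, and the only figure taken from this report is the Alcohol median (13.05) from the summary in the Data Preparation section.

```r
library(shiny)

# Hypothetical minimal app: one input widget, one server-side operation,
# one reactive output, and a line of in-app documentation.
ui <- fluidPage(
  titlePanel("Minimal Shiny sketch"),
  sidebarLayout(
    sidebarPanel(
      # Documentation shown inside the app itself
      helpText("Move the slider; the text on the right updates automatically."),
      # Input widget (range chosen for illustration only)
      sliderInput("alcohol", "Alcohol:", min = 11, max = 15, value = 13, step = 0.1)
    ),
    mainPanel(textOutput("verdict"))
  )
)

server <- function(input, output) {
  # Operation on the UI input: compare it with the dataset median
  position <- reactive({
    if (input$alcohol > 13.05) "above" else "at or below"
  })
  # Reactive output that re-renders whenever the slider changes
  output$verdict <- renderText({
    paste("The selected value is", position(), "the dataset median (13.05).")
  })
}

app <- shinyApp(ui = ui, server = server)
# Launch interactively with: runApp(app)
```

In a two-file layout, the `ui` object would live in ui.R and the `server` function in server.R.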

The dataset is the wine dataset from the UCI Machine Learning Repository. It contains the results of a chemical analysis used to determine the origin of wines: 13 constituents were measured in each of three types of wine.

The objective of the application is to predict the class (type) of a wine from its constituents. The application also shows a scatter plot of 3 highly correlated parameters to support further visual analysis.

Data preparation

path <- "D:\\Coursera\\Data Science - Specialization (by Johns Hopkins University)\\9-Developing Data Products\\DevDatProd_AssignWk4\\MyApp"
setwd(path)
# Data reading
raw.data <- read.csv("wine.csv", header = FALSE)
colnames(raw.data) <- c("Class", "Alcohol", "Malic acid", "Ash", "Alkalinity of ash","Magnesium", "Total phenols", "Flavanoids",
                        "Nonflavanoid phenols", "Proanthocyanins", "Color intensity", "Hue", "OD280.OD315", "Proline")
summary(raw.data)
##      Class          Alcohol        Malic acid         Ash       
##  Min.   :1.000   Min.   :11.03   Min.   :0.740   Min.   :1.360  
##  1st Qu.:1.000   1st Qu.:12.36   1st Qu.:1.603   1st Qu.:2.210  
##  Median :2.000   Median :13.05   Median :1.865   Median :2.360  
##  Mean   :1.938   Mean   :13.00   Mean   :2.336   Mean   :2.367  
##  3rd Qu.:3.000   3rd Qu.:13.68   3rd Qu.:3.083   3rd Qu.:2.558  
##  Max.   :3.000   Max.   :14.83   Max.   :5.800   Max.   :3.230  
##  Alkalinity of ash   Magnesium      Total phenols     Flavanoids   
##  Min.   :10.60     Min.   : 70.00   Min.   :0.980   Min.   :0.340  
##  1st Qu.:17.20     1st Qu.: 88.00   1st Qu.:1.742   1st Qu.:1.205  
##  Median :19.50     Median : 98.00   Median :2.355   Median :2.135  
##  Mean   :19.49     Mean   : 99.74   Mean   :2.295   Mean   :2.029  
##  3rd Qu.:21.50     3rd Qu.:107.00   3rd Qu.:2.800   3rd Qu.:2.875  
##  Max.   :30.00     Max.   :162.00   Max.   :3.880   Max.   :5.080  
##  Nonflavanoid phenols Proanthocyanins Color intensity       Hue        
##  Min.   :0.1300       Min.   :0.410   Min.   : 1.280   Min.   :0.4800  
##  1st Qu.:0.2700       1st Qu.:1.250   1st Qu.: 3.220   1st Qu.:0.7825  
##  Median :0.3400       Median :1.555   Median : 4.690   Median :0.9650  
##  Mean   :0.3619       Mean   :1.591   Mean   : 5.058   Mean   :0.9574  
##  3rd Qu.:0.4375       3rd Qu.:1.950   3rd Qu.: 6.200   3rd Qu.:1.1200  
##  Max.   :0.6600       Max.   :3.580   Max.   :13.000   Max.   :1.7100  
##   OD280.OD315       Proline      
##  Min.   :1.270   Min.   : 278.0  
##  1st Qu.:1.938   1st Qu.: 500.5  
##  Median :2.780   Median : 673.5  
##  Mean   :2.612   Mean   : 746.9  
##  3rd Qu.:3.170   3rd Qu.: 985.0  
##  Max.   :4.000   Max.   :1680.0
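
The summary above treats Class as a number (mean 1.938), although it is a label for the wine type. The following is a minimal sketch of one way to handle that, assuming a data frame shaped like `raw.data`. The helper `prepare_wine` and the `toy` rows are hypothetical and are not part of the original analysis.

```r
# Hypothetical helper, not part of the original analysis
prepare_wine <- function(df) {
  # Class is a label (1, 2, 3), not a quantity, so store it as a factor
  df$Class <- factor(df$Class)
  # Make column names syntactic (spaces become dots) for use with `$` and formulas
  names(df) <- make.names(names(df))
  df
}

# Placeholder rows for illustration only; these are not real measurements
toy <- data.frame(Class = c(1, 2, 3, 1),
                  Alcohol = c(13.0, 12.5, 13.2, 14.0),
                  `Malic acid` = c(1.7, 2.0, 3.1, 1.8),
                  check.names = FALSE)

prepared <- prepare_wine(toy)
str(prepared)
table(prepared$Class)  # counts per class instead of a numeric mean
```

Note that `cor()` accepts only numeric columns, so after such a conversion the correlation step would need either the predictor columns alone or a numeric copy of Class.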

Data analysis

# corrplot() comes from the corrplot package
library(corrplot)
# Correlation analysis (Correlation Coeff > 0.67)
cor.data <- cor(raw.data)
corrplot(cor.data, method = "color", type = "lower", addCoef.col = "black")
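
The plot shows the coefficients, but the pairs above the 0.67 cutoff still have to be read off by eye. The following sketch lists them programmatically. The helper `high_cor_pairs` is hypothetical, and it assumes the cutoff applies to the absolute value of the coefficient, which the comment in the code above does not state.

```r
# Hypothetical helper: list variable pairs whose absolute correlation
# exceeds a threshold (assumption: the cutoff applies to |r|)
high_cor_pairs <- function(cor_mat, threshold = 0.67) {
  # Lower triangle only, so each pair appears once and the diagonal is dropped
  idx <- which(lower.tri(cor_mat) & abs(cor_mat) > threshold, arr.ind = TRUE)
  pairs <- data.frame(
    var1 = rownames(cor_mat)[idx[, "row"]],
    var2 = colnames(cor_mat)[idx[, "col"]],
    r    = cor_mat[idx],
    stringsAsFactors = FALSE
  )
  # Strongest relationships first
  pairs[order(-abs(pairs$r)), , drop = FALSE]
}

# mtcars ships with R and is used here only so the sketch runs on its own;
# with the wine data the call would be high_cor_pairs(cor.data)
demo <- high_cor_pairs(cor(mtcars))
head(demo)
```

Rows of the result that involve Class indicate constituents that track the wine type most closely, which is one way to shortlist parameters for a scatter plot.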

Application interface

Shiny application on RStudio’s servers. (link)

A screenshot of the application interface is shown below:

knitr::include_graphics("./MyApp/AppScreen.png")