Red_Wine

R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks. You can embed an R code chunk like this:

library(readr)
library(tidyverse)

## -- Attaching packages ------------------------------------------------------------------------------------------------------------------------------ tidyverse 1.2.1 --

## v ggplot2 3.2.0     v purrr   0.3.2
## v tibble  2.1.3     v dplyr   0.8.3
## v tidyr   0.8.3     v stringr 1.4.0
## v ggplot2 3.2.0     v forcats 0.4.0

## -- Conflicts --------------------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(ggplot2)
library(tidyr)
library(randomForest)

## randomForest 4.6-14

## Type rfNews() to see new features/changes/bug fixes.

## 
## Attaching package: 'randomForest'

## The following object is masked from 'package:dplyr':
## 
##     combine

## The following object is masked from 'package:ggplot2':
## 
##     margin

library(rpart)
library(rpart.plot)
library(caret)

## Loading required package: lattice

## 
## Attaching package: 'caret'

## The following object is masked from 'package:purrr':
## 
##     lift

library(adabag)

## Loading required package: foreach

## 
## Attaching package: 'foreach'

## The following objects are masked from 'package:purrr':
## 
##     accumulate, when

## Loading required package: doParallel

## Loading required package: iterators

## Loading required package: parallel

library(ipred)

## 
## Attaching package: 'ipred'

## The following object is masked from 'package:adabag':
## 
##     bagging

# Read the CSV file (file.choose() opens an interactive file picker)
red <- read.csv(file.choose(), header = TRUE)
red

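# Modifying the variables
red$quality <- as.factor(red$quality)
# A score cannot equal 7 and 8 at once, so the bands are tested with %in% rather than &.
# rating is derived from quality, so leave it out of the predictors when modelling quality.
red$rating <- ifelse(red$quality %in% c("7", "8"), "Excellent", ifelse(red$quality %in% c("5", "6"), "Normal", "Poor"))
red$rating <- as.factor(red$rating)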
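
The rating rule can be checked on a small hand-made vector before it is applied to the full data. The sketch below is illustrative only: `to_rating` and `toy_quality` are hypothetical names that do not appear in the original analysis, and the bands (7 and 8 as Excellent, 5 and 6 as Normal, everything else as Poor) are taken from the `ifelse()` call above.

```r
# Hypothetical helper: map quality scores to the three rating bands.
to_rating <- function(quality) {
  q <- as.character(quality)  # works for numeric or factor input
  rating <- ifelse(q %in% c("7", "8"), "Excellent",
                   ifelse(q %in% c("5", "6"), "Normal", "Poor"))
  # Explicit level order is an assumption; as.factor() would sort alphabetically.
  factor(rating, levels = c("Poor", "Normal", "Excellent"))
}

# Made-up scores covering every band.
toy_quality <- factor(c(3, 5, 6, 7, 8, 4))
toy_rating <- to_rating(toy_quality)

# Cross-tabulate scores against the assigned band.
table(toy_quality, toy_rating)
```

Each score should land in exactly one band; a column of zeros for Excellent or Normal would suggest the conditions are combined with `&` instead of `|` or `%in%`.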

# Update the names of variables (column 5 keeps its original name)
colnames(red)[c(1:4, 6:7)] <- c("fixed_acidity", "volatile_acidity", "citric_acid",
                                "residual_sugar", "free_sulfur_dioxide", "total_sulfur_dioxide")


# Training & Validation
set.seed(123)
train.index <- sample(nrow(red), floor(0.8 * nrow(red)))
train.df <- red[train.index, ]
valid.df <- red[-train.index, ]
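
As a quick sanity check on the 80/20 split, the following sketch repeats the same indexing on a stand-in data frame. Everything named `toy` is made up for illustration (100 rows with random quality labels), so the real `red` data is not needed to run it.

```r
set.seed(123)

# Stand-in data frame; hypothetical, not the wine data.
toy <- data.frame(id = 1:100,
                  quality = factor(sample(3:8, 100, replace = TRUE)))

n <- nrow(toy)
train.index <- sample(n, floor(0.8 * n))  # 80% of row indices, without replacement
train.toy <- toy[train.index, ]
valid.toy <- toy[-train.index, ]

# The two subsets should be disjoint and together cover every row.
nrow(train.toy)
nrow(valid.toy)
length(intersect(train.toy$id, valid.toy$id))

# Class proportions per subset; a purely random split does not force these to match.
round(prop.table(table(train.toy$quality)), 2)
round(prop.table(table(valid.toy$quality)), 2)
```

If the class proportions differ a lot between the two subsets, a stratified split may be worth considering, since rare quality scores can otherwise end up almost entirely in one subset.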



# Random Forest
# rating is derived from quality, so it is excluded from the predictors to avoid leaking the target.
red.rf <- randomForest(quality ~ . - rating, data = train.df, ntree = 500, mtry = 4, nodesize = 8, importance = TRUE)
red.rf

## 
## Call:
##  randomForest(formula = quality ~ . - rating, data = train.df, ntree = 500, mtry = 4, nodesize = 8, importance = TRUE) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 4
## 
##         OOB estimate of  error rate: 32.84%
## Confusion matrix:
##   3 4   5   6  7 8 class.error
## 3 0 0   7   1  0 0   1.0000000
## 4 0 0  29  14  1 0   1.0000000
## 5 0 0 423 105  4 0   0.2048872
## 6 0 0 123 352 32 0   0.3057199
## 7 0 0   9  79 84 0   0.5116279
## 8 0 0   0   9  7 0   1.0000000
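
The error figures in the printout follow directly from the confusion matrix. The sketch below re-enters the matrix by hand (the object names `conf`, `class_error` and `oob_error` are illustrative) and recomputes the per-class and overall out-of-bag error from it.

```r
# OOB confusion matrix copied from the printed output
# (rows = actual quality, columns = predicted quality).
classes <- c("3", "4", "5", "6", "7", "8")
conf <- matrix(c(0, 0,   7,   1,  0, 0,
                 0, 0,  29,  14,  1, 0,
                 0, 0, 423, 105,  4, 0,
                 0, 0, 123, 352, 32, 0,
                 0, 0,   9,  79, 84, 0,
                 0, 0,   0,   9,  7, 0),
               nrow = 6, byrow = TRUE,
               dimnames = list(actual = classes, predicted = classes))

# Per-class error: share of each actual class that was misclassified.
class_error <- 1 - diag(conf) / rowSums(conf)

# Overall OOB error: share of all training rows that were misclassified.
oob_error <- 1 - sum(diag(conf)) / sum(conf)

round(class_error, 4)
round(100 * oob_error, 2)
```

The overall figure rounds to the 32.84% reported by `randomForest`. The per-class values also show that scores 3, 4 and 8 are never predicted at all, so the overall error rate is driven almost entirely by the three common classes (5, 6 and 7).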

Sagar Khurana

March 4, 2018
