This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks. You can embed an R code chunk like this:
library(readr)
library(tidyverse)
## -- Attaching packages ------------------------------------------------------------------------------------------------------------------------------ tidyverse 1.2.1 --
## v ggplot2 3.2.0 v purrr 0.3.2
## v tibble 2.1.3 v dplyr 0.8.3
## v tidyr 0.8.3 v stringr 1.4.0
## v ggplot2 3.2.0 v forcats 0.4.0
## -- Conflicts --------------------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(ggplot2)
library(tidyr)
library(randomForest)
## randomForest 4.6-14
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
## The following object is masked from 'package:dplyr':
##
## combine
## The following object is masked from 'package:ggplot2':
##
## margin
library(rpart)
library(rpart.plot)
library(caret)
## Loading required package: lattice
##
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
##
## lift
library(adabag)
## Loading required package: foreach
##
## Attaching package: 'foreach'
## The following objects are masked from 'package:purrr':
##
## accumulate, when
## Loading required package: doParallel
## Loading required package: iterators
## Loading required package: parallel
library(ipred)
##
## Attaching package: 'ipred'
## The following object is masked from 'package:adabag':
##
## bagging
# Read the CSV (file.choose() opens an interactive file picker)
red <- read.csv(file.choose(), header = TRUE)
red
# Modify the variables
red$quality <- as.factor(red$quality)
# A score cannot be 7 and 8 at once, so the bins are combined with | rather than &
red$rating <- ifelse(red$quality == 7 | red$quality == 8, "Excellent", ifelse(red$quality == 5 | red$quality == 6, "Normal", "Poor"))
red$rating <- as.factor(red$rating)
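The rating bins can be checked on a handful of scores without loading the data. The sketch below is only an illustration: `quality_scores` is a made-up vector, the bins are assumed to be 3-4 = Poor, 5-6 = Normal and 7-8 = Excellent, and `cut()` is shown as an alternative to the nested `ifelse()`, not as the method used in this document.

```r
# Hypothetical scores covering every quality level that appears in the output
quality_scores <- c(3, 4, 5, 6, 7, 8)

# Nested ifelse(): a score is Excellent if it is 7 OR 8 (it can never be both)
rating_ifelse <- ifelse(quality_scores %in% c(7, 8), "Excellent",
                 ifelse(quality_scores %in% c(5, 6), "Normal", "Poor"))

# Same bins with cut(): (-Inf, 4] Poor, (4, 6] Normal, (6, Inf] Excellent
rating_cut <- cut(quality_scores,
                  breaks = c(-Inf, 4, 6, Inf),
                  labels = c("Poor", "Normal", "Excellent"))

# With `&` the test is never TRUE, so every score falls through to "Poor"
rating_and <- ifelse(quality_scores == 7 & quality_scores == 8, "Excellent", "Poor")

print(data.frame(quality_scores, rating_ifelse, rating_cut, rating_and))
```

Running `table(red$rating)` on the real data is a quick way to confirm that all three levels are populated.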
# Update the names of variables (column 5 keeps its original name)
colnames(red)[c(1:4, 6, 7)] <- c("fixed_acidity", "volatile_acidity", "citric_acid", "residual_sugar", "free_sulfur_dioxide", "total_sulfur_dioxide")
# Training & Validation
set.seed(123)
train.index <- sample(seq_len(nrow(red)), floor(nrow(red) * 0.8))
train.df <- red[train.index, ]
valid.df <- red[-train.index, ]
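The 80/20 split can be sanity-checked on a stand-in data frame. This is a minimal sketch under stated assumptions: `toy` is hypothetical, and its 1599 rows are assumed rather than read from the original file (the figure is consistent with the 1279 training rows that appear in the confusion matrix further down).

```r
# Stand-in for `red`: only the row count matters for this check.
# 1599 rows is an assumption, not something read from the original file.
toy <- data.frame(id = seq_len(1599))

set.seed(123)
n <- nrow(toy)
# floor() makes the rounding down explicit: 1599 * 0.8 = 1279.2 becomes 1279
train.index <- sample(seq_len(n), floor(n * 0.8))
train.df <- toy[train.index, , drop = FALSE]
valid.df <- toy[-train.index, , drop = FALSE]

# The two parts should be disjoint and together cover every row
cat("train rows:", nrow(train.df), "\n")
cat("valid rows:", nrow(valid.df), "\n")
cat("overlap:", length(intersect(train.df$id, valid.df$id)), "\n")
```

Negative indexing with `-train.index` is what guarantees that no row lands in both sets.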
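# Random forest classifier for quality
# Caution: `quality ~ .` uses every other column as a predictor, including rating, which is derived from quality; drop rating from train.df before fitting if it should not inform the prediction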
red.rf <- randomForest(quality ~ ., data = train.df, ntree = 500, mtry = 4, nodesize = 8, importance = TRUE)
red.rf
##
## Call:
## randomForest(formula = quality ~ ., data = train.df, ntree = 500, mtry = 4, nodesize = 8, importance = TRUE)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 4
##
## OOB estimate of error rate: 32.84%
## Confusion matrix:
##   3 4   5   6  7 8 class.error
## 3 0 0   7   1  0 0   1.0000000
## 4 0 0  29  14  1 0   1.0000000
## 5 0 0 423 105  4 0   0.2048872
## 6 0 0 123 352 32 0   0.3057199
## 7 0 0   9  79 84 0   0.5116279
## 8 0 0   0   9  7 0   1.0000000
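The error figures in this output follow directly from the confusion matrix. The sketch below re-enters the printed counts by hand (the names `conf`, `class_error` and `oob_error` are illustrative) and recomputes the per-class and overall error, assuming rows are actual classes and columns are predicted classes.

```r
# Confusion matrix copied from the printed random forest output
# (rows = actual quality, columns = predicted quality)
classes <- c("3", "4", "5", "6", "7", "8")
conf <- matrix(c(0, 0,   7,   1,  0, 0,
                 0, 0,  29,  14,  1, 0,
                 0, 0, 423, 105,  4, 0,
                 0, 0, 123, 352, 32, 0,
                 0, 0,   9,  79, 84, 0,
                 0, 0,   0,   9,  7, 0),
               nrow = 6, byrow = TRUE,
               dimnames = list(actual = classes, predicted = classes))

# Per-class error: share of each actual class predicted as something else
class_error <- 1 - diag(conf) / rowSums(conf)

# Overall OOB error: share of all training rows that fall off the diagonal
oob_error <- 1 - sum(diag(conf)) / sum(conf)

print(round(class_error, 7))
cat(sprintf("OOB error: %.2f%%\n", 100 * oob_error))
```

The columns for qualities 3, 4 and 8 are all zero: the forest never predicts those classes, which is why their class error is 1 even though the overall error is about 33%.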