This document was prepared for Data-607 (Fall 2018) in the CUNY SPS MSDS program. It analyzes the UCI Mushroom data set, which describes the physical characteristics of mushroom samples and labels each one as edible or poisonous.

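library(plyr)    # value recoding
library(ggplot2) # charts

# Read the raw UCI file; it has no header row, so column names are assigned below
mushrooms.dataset <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/agaricus-lepiota.data", sep = ",", header = FALSE)

# View() opens a data viewer, which only works in an interactive session (e.g. RStudio)
if (interactive()) View(mushrooms.dataset)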

# Assign descriptive names to the 23 columns (the class label plus 22 attributes)
names(mushrooms.dataset) <- c("class", "cap-shape", "cap-surface", "cap-color", "bruises",
                              "odor", 
                              "gill-attachment", "gill-spacing", "gill-size", "gill-color",
                              "stalk-shape", "stalk-root", "stalk-surface-above-ring",
                              "stalk-surface-below-ring", "stalk-color-above-ring",
                              "stalk-color-below-ring", "veil-type", "veil-color", 
                              "ring-number", "ring-type", "spore-print-color", 
                              "population", "habitat")

The data set is then reformatted so that each column holds descriptive values instead of the single-letter abbreviations used in the raw file (for example, the codes e and p in the class column become edible and poisonous). Every captured variable is converted this way, which makes the data much easier to read and interpret.

Because this conversion covers all 23 variables, the code is long and has been hidden from the HTML document to keep it clean.

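Running the hidden conversion code prints the messages below. They come from plyr's mapvalues() and only indicate that some of the codes being mapped do not occur in the corresponding column of this data set:

## The following `from` values were not present in `x`: d
## The following `from` values were not present in `x`: u, z
## The following `from` values were not present in `x`: u
## The following `from` values were not present in `x`: c, s, z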
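The conversion code itself is hidden, so the following is only a minimal sketch of what such a recoding can look like, assuming plyr's mapvalues() (the function whose messages appear above). It uses a small hypothetical data frame, raw_codes, in place of the real data and recodes just two columns; the code-to-label pairs follow the UCI attribute description for class and odor.

```r
library(plyr)

# Hypothetical stand-in for the raw data, using the single-letter UCI codes
raw_codes <- data.frame(class = c("p", "e", "e", "p"),
                        odor  = c("p", "a", "l", "f"),
                        stringsAsFactors = FALSE)

# Replace each code with its descriptive label
raw_codes$class <- mapvalues(raw_codes$class,
                             from = c("e", "p"),
                             to   = c("edible", "poisonous"))

# Codes absent from this column (c, y, m, n, s) trigger the same kind of
# "not present in `x`" message seen in the output above
raw_codes$odor <- mapvalues(raw_codes$odor,
                            from = c("a", "l", "c", "y", "f", "m", "n", "p", "s"),
                            to   = c("almond", "anise", "creosote", "fishy", "foul",
                                     "musty", "none", "pungent", "spicy"))

raw_codes
```

Passing warn_missing = FALSE to mapvalues() would silence those messages if they are not wanted in the rendered document.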

A subset of columns is created below, keeping only the columns that are relevant to the question “Is there a parameter that determines whether a mushroom with a particular quality is poisonous or not?”

mushrooms.dataset.subset <- mushrooms.dataset[, c("class", "cap-shape", "cap-surface", "cap-color", "odor", "population", "habitat")]

# Split the subset by class so that the attributes of poisonous and edible mushrooms can be compared

mushrooms.dataset.subset.poisonous <- subset(mushrooms.dataset.subset, class == "poisonous")

mushrooms.dataset.subset.edible <- subset(mushrooms.dataset.subset, class == "edible")
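The two subsets above can also be produced in one call with base R's split(), which returns a named list with one data frame per class. A minimal sketch, using a small hypothetical data frame, toy, in place of mushrooms.dataset.subset (values assumed to be already recoded):

```r
# Hypothetical stand-in for mushrooms.dataset.subset
toy <- data.frame(class = c("poisonous", "edible", "edible", "poisonous", "edible"),
                  odor  = c("foul", "almond", "none", "pungent", "anise"),
                  stringsAsFactors = FALSE)

# One data frame per distinct value of class, named after that value
by_class <- split(toy, toy$class)

names(by_class)
nrow(by_class$poisonous)
```

On the real data, by_class$poisonous and by_class$edible would play the roles of the two subsets, and the approach extends unchanged if further grouping is needed later.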

The pie charts below show how odor and habitat are distributed within the poisonous and the edible mushrooms:

# Distribution of odor among the poisonous mushrooms
ggplot(mushrooms.dataset.subset.poisonous, aes(x = factor(1), fill = factor(odor))) +
  geom_bar(width = 1) + coord_polar(theta = "y") +
  labs(x = NULL, y = NULL, fill = "Odor", title = "Odor of poisonous mushrooms")

# Distribution of odor among the edible mushrooms
ggplot(mushrooms.dataset.subset.edible, aes(x = factor(1), fill = factor(odor))) +
  geom_bar(width = 1) + coord_polar(theta = "y") +
  labs(x = NULL, y = NULL, fill = "Odor", title = "Odor of edible mushrooms")

# Distribution of habitat among the poisonous mushrooms
ggplot(mushrooms.dataset.subset.poisonous, aes(x = factor(1), fill = factor(habitat))) +
  geom_bar(width = 1) + coord_polar(theta = "y") +
  labs(x = NULL, y = NULL, fill = "Habitat", title = "Habitat of poisonous mushrooms")

# Distribution of habitat among the edible mushrooms
ggplot(mushrooms.dataset.subset.edible, aes(x = factor(1), fill = factor(habitat))) +
  geom_bar(width = 1) + coord_polar(theta = "y") +
  labs(x = NULL, y = NULL, fill = "Habitat", title = "Habitat of edible mushrooms")
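The pie charts convey each odor's share of a class only visually. A minimal sketch of the same breakdown as numbers, assuming base R's table() and prop.table() and a small hypothetical data frame, toy, in place of the real subset:

```r
# Hypothetical stand-in for mushrooms.dataset.subset (values already recoded)
toy <- data.frame(class = c("poisonous", "poisonous", "edible", "edible"),
                  odor  = c("foul", "pungent", "almond", "none"),
                  stringsAsFactors = FALSE)

# Counts of each odor within each class
counts <- table(odor = toy$odor, class = toy$class)

# margin = 2 divides each column by its total, giving the per-class shares
# that a pie chart of that class displays as slice sizes
shares <- prop.table(counts, margin = 2)
round(shares, 3)
```

Each column of shares sums to 1, so it corresponds to one pie chart; replacing odor with habitat gives the numbers behind the habitat charts.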

Final Word:

1. All foul-smelling mushrooms in the data set are poisonous.

2. Every edible mushroom falls into one of three odor categories: almond, anise, or no smell.
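Both conclusions can be checked numerically by cross-tabulating odor against class and listing the odor levels that occur in only one class. A minimal sketch with a hypothetical helper, pure_levels(), demonstrated on a small made-up data frame; on the real data it would be called as pure_levels(mushrooms.dataset.subset, "odor").

```r
# Hypothetical helper: for one attribute, return the levels that occur in
# exactly one class, named by level, with that class as the value
pure_levels <- function(df, attribute, class_col = "class") {
  counts <- table(df[[attribute]], df[[class_col]])
  # Number of classes in which each level appears at least once
  n_classes <- rowSums(counts > 0)
  pure <- names(n_classes)[n_classes == 1]
  # For each pure level, pick the column (class) holding its non-zero count
  setNames(colnames(counts)[apply(counts[pure, , drop = FALSE], 1, which.max)], pure)
}

# Made-up example data (not the real mushroom records)
toy <- data.frame(class = c("poisonous", "poisonous", "edible", "edible", "poisonous", "edible"),
                  odor  = c("foul", "foul", "almond", "none", "none", "anise"),
                  stringsAsFactors = FALSE)

pure_levels(toy, "odor")
```

In this toy example none appears in both classes, so it is not reported; a level that is missing from the result is one that does not determine the class on its own.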