Week 3 MSDA R Bridge Assignment

Mushroom Dataset

Description: Read selected columns of the mushroom data from the web (GitHub) and transform the dataset.

library(knitr)
mush_rl <- "https://raw.githubusercontent.com/mkds/MSDA/master/Bridge/agaricus-lepiota.data"

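The code below reads columns 1, 6, 9, 10, and 21 of the 23 columns in the file. In read.table, any column whose colClasses entry is "NULL" is skipped. Because hyphens are not valid in R names, read.table (with its default check.names = TRUE) converts names such as "gill-size" to gill.size.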

mushrooms <- read.table(
  file = mush_rl, sep = ",",
  # "NULL" skips a column; columns 1, 6, 9, 10 and 21 are kept as factors
  colClasses = c("factor", rep("NULL", 4), "factor", rep("NULL", 2), "factor", "factor",
                 rep("NULL", 10), "factor", rep("NULL", 2)),
  col.names = c("edible", rep("NULL", 4), "odor", rep("NULL", 2), "gill-size", "gill-color",
                rep("NULL", 10), "spore-print-color", rep("NULL", 2))
)
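To see the column-skipping behaviour in isolation, here is a minimal sketch that parses a small inline table instead of the remote file. The three-row sample and the names sample_text and demo are made up for illustration; only read.table and its text, sep, colClasses, and col.names arguments come from base R.

```r
# Hypothetical three-column sample; not taken from the mushroom file.
sample_text <- "p,x,n\ne,b,k\ne,s,n"

demo <- read.table(
  text = sample_text, sep = ",",
  # The middle column has class "NULL", so it is dropped while reading.
  colClasses = c("factor", "NULL", "factor"),
  # With the default check.names = TRUE, "gill-size" becomes "gill.size".
  col.names = c("edible", "skipped", "gill-size")
)

names(demo)  # only the two kept columns remain
str(demo)
```

The skipped column still needs a placeholder entry in col.names, which is why the original call fills those positions with "NULL" as well.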

Check the dataset

str(mushrooms)
## 'data.frame':    8124 obs. of  5 variables:
##  $ edible           : Factor w/ 2 levels "e","p": 2 1 1 2 1 1 1 1 2 1 ...
##  $ odor             : Factor w/ 9 levels "a","c","f","l",..: 7 1 4 7 6 1 1 4 7 1 ...
##  $ gill.size        : Factor w/ 2 levels "b","n": 2 1 1 2 1 1 1 1 2 1 ...
##  $ gill.color       : Factor w/ 12 levels "b","e","g","h",..: 5 5 6 6 5 6 3 6 8 3 ...
##  $ spore.print.color: Factor w/ 9 levels "b","h","k","n",..: 3 4 4 3 4 3 3 4 3 3 ...
kable(head(mushrooms))
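|edible |odor |gill.size |gill.color |spore.print.color |
|:------|:----|:---------|:----------|:-----------------|
|p      |p    |n         |k          |k                 |
|e      |a    |b         |k          |n                 |
|e      |l    |b         |n          |n                 |
|p      |p    |n         |n          |k                 |
|e      |n    |b         |k          |n                 |
|e      |a    |b         |n          |k                 |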

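Rename the factor levels so that they are more descriptive. The new labels must be listed in the same order as the existing levels, which are the single-letter codes sorted alphabetically (for example, odor has levels a, c, f, l, m, n, p, s, y).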

levels(mushrooms$edible) <- c("edible", "poisonous")
levels(mushrooms$odor) <- c("almond", "creosote", "foul", "anise", "musty", "none", "pungent", "spicy", "fishy")
levels(mushrooms$gill.size) <- c("broad", "narrow")
levels(mushrooms$gill.color) <- c("buff", "red", "gray", "chocolate", "black", "brown", "orange", "pink", "green", "purple", "white", "yellow")
levels(mushrooms$spore.print.color) <- c("buff", "chocolate", "black", "brown", "orange", "green", "purple", "white", "yellow")
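The assignments above depend on the new labels being written in the same order as the sorted single-letter codes. As an alternative sketch, a named lookup vector ties each code to its label explicitly, so the order in which the labels are typed no longer matters. The names odor_labels and odor_demo are illustrative and not part of the original assignment; the code-to-label pairs are the same ones used for odor above.

```r
# Code-to-label pairs for odor, deliberately written out of alphabetical order.
odor_labels <- c(n = "none", a = "almond", l = "anise", c = "creosote",
                 y = "fishy", f = "foul", m = "musty", p = "pungent", s = "spicy")

# Small stand-in factor (illustrative only) using the same single-letter codes.
odor_demo <- factor(c("p", "a", "l", "p", "n", "a"))

# Look up each existing level by name, so every code receives its own label.
levels(odor_demo) <- unname(odor_labels[levels(odor_demo)])

as.character(odor_demo)
```

Applied to the real data, the same idea would read levels(mushrooms$odor) <- unname(odor_labels[levels(mushrooms$odor)]), assuming it runs while the levels are still the original single-letter codes.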

Check the transformed dataset

str(mushrooms)
## 'data.frame':    8124 obs. of  5 variables:
##  $ edible           : Factor w/ 2 levels "edible","poisonous": 2 1 1 2 1 1 1 1 2 1 ...
##  $ odor             : Factor w/ 9 levels "almond","creosote",..: 7 1 4 7 6 1 1 4 7 1 ...
##  $ gill.size        : Factor w/ 2 levels "broad","narrow": 2 1 1 2 1 1 1 1 2 1 ...
##  $ gill.color       : Factor w/ 12 levels "buff","red","gray",..: 5 5 6 6 5 6 3 6 8 3 ...
##  $ spore.print.color: Factor w/ 9 levels "buff","chocolate",..: 3 4 4 3 4 3 3 4 3 3 ...
kable(head(mushrooms))
|edible    |odor    |gill.size |gill.color |spore.print.color |
|:---------|:-------|:---------|:----------|:-----------------|
|poisonous |pungent |narrow    |black      |black             |
|edible    |almond  |broad     |black      |brown             |
|edible    |anise   |broad     |brown      |brown             |
|poisonous |pungent |narrow    |brown      |black             |
|edible    |none    |broad     |black      |brown             |
|edible    |almond  |broad     |brown      |black             |