Description: Read selected columns of the mushroom data from the web (GitHub) and transform the dataset.
library(knitr)
mush_rl <- "https://raw.githubusercontent.com/mkds/MSDA/master/Bridge/agaricus-lepiota.data"
The code below reads columns 1, 6, 9, 10, and 21 from the file. When the class given for a column in colClasses is "NULL", read.table skips that column.
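mushrooms <- read.table(
  file = mush_rl, sep = ",",
  # "factor" keeps a column, "NULL" skips it (23 columns in total)
  colClasses = c("factor", rep("NULL", 4), "factor", rep("NULL", 2),
                 "factor", "factor", rep("NULL", 10), "factor", rep("NULL", 2)),
  # hyphens become dots (gill.size) because check.names is TRUE by default
  col.names = c("edible", rep("NULL", 4), "odor", rep("NULL", 2),
                "gill-size", "gill-color", rep("NULL", 10),
                "spore-print-color", rep("NULL", 2))
)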
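To see the column-skipping behaviour in isolation, here is a minimal sketch that runs on a small inline sample instead of the remote file. The sample rows and the names sample_text and demo are made up for illustration; only read.table and its text, sep, colClasses, and col.names arguments come from base R.

```r
# Three made-up rows with four comma-separated columns (illustrative only).
sample_text <- "p,x,s,n
e,x,y,a
e,b,w,l"

# Keep columns 1 and 4 as factors; "NULL" tells read.table to drop columns 2 and 3.
# col.names still needs one entry per column in the file, including skipped ones.
demo <- read.table(text = sample_text, sep = ",",
                   colClasses = c("factor", "NULL", "NULL", "factor"),
                   col.names  = c("edible", "skip1", "skip2", "odor"))

str(demo)
```

Only edible and odor should remain in demo; the placeholder names given to the skipped columns never appear in the result.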
Check the dataset
str(mushrooms)
## 'data.frame': 8124 obs. of 5 variables:
## $ edible : Factor w/ 2 levels "e","p": 2 1 1 2 1 1 1 1 2 1 ...
## $ odor : Factor w/ 9 levels "a","c","f","l",..: 7 1 4 7 6 1 1 4 7 1 ...
## $ gill.size : Factor w/ 2 levels "b","n": 2 1 1 2 1 1 1 1 2 1 ...
## $ gill.color : Factor w/ 12 levels "b","e","g","h",..: 5 5 6 6 5 6 3 6 8 3 ...
## $ spore.print.color: Factor w/ 9 levels "b","h","k","n",..: 3 4 4 3 4 3 3 4 3 3 ...
kable(head(mushrooms))
edible | odor | gill.size | gill.color | spore.print.color |
---|---|---|---|---|
p | p | n | k | k |
e | a | b | k | n |
e | l | b | n | n |
p | p | n | n | k |
e | n | b | k | n |
e | a | b | n | k |
Rename the factor levels so that they are more descriptive.
# New names must be listed in the order of the existing levels (see str() output above).
levels(mushrooms$edible) <- c("edible", "poisonous")
levels(mushrooms$odor) <- c("almond", "creosote", "foul", "anise", "musty", "none", "pungent", "spicy", "fishy")
levels(mushrooms$gill.size) <- c("broad", "narrow")
levels(mushrooms$gill.color) <- c("buff", "red", "gray", "chocolate", "black", "brown", "orange", "pink", "green", "purple", "white", "yellow")
levels(mushrooms$spore.print.color) <- c("buff", "chocolate", "black", "brown", "orange", "green", "purple", "white", "yellow")
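The assignments above depend on the new names being listed in the same order as the existing levels, which R sorts alphabetically by code. A more defensive variant, sketched here on a small made-up vector, names each code explicitly so the result does not depend on that order. The names odor_names and odor_codes are hypothetical; the code-to-name pairs are the ones used above.

```r
# Lookup from single-letter code to descriptive name (pairs taken from the odor assignment above).
odor_names <- c(a = "almond", c = "creosote", f = "foul", l = "anise",
                m = "musty", n = "none", p = "pungent", s = "spicy", y = "fishy")

# Small made-up factor standing in for mushrooms$odor.
odor_codes <- factor(c("p", "a", "l", "p", "n", "a"))

# Look up each existing level by its code, so the listing order no longer matters.
levels(odor_codes) <- unname(odor_names[levels(odor_codes)])

as.character(odor_codes)
```

The same pattern could be applied to each column of mushrooms with its own lookup vector. A code missing from the lookup would show up as an NA level rather than silently taking a neighbouring name.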
Check the transformed dataset
str(mushrooms)
## 'data.frame': 8124 obs. of 5 variables:
## $ edible : Factor w/ 2 levels "edible","poisonous": 2 1 1 2 1 1 1 1 2 1 ...
## $ odor : Factor w/ 9 levels "almond","creosote",..: 7 1 4 7 6 1 1 4 7 1 ...
## $ gill.size : Factor w/ 2 levels "broad","narrow": 2 1 1 2 1 1 1 1 2 1 ...
## $ gill.color : Factor w/ 12 levels "buff","red","gray",..: 5 5 6 6 5 6 3 6 8 3 ...
## $ spore.print.color: Factor w/ 9 levels "buff","chocolate",..: 3 4 4 3 4 3 3 4 3 3 ...
kable(head(mushrooms))
edible | odor | gill.size | gill.color | spore.print.color |
---|---|---|---|---|
poisonous | pungent | narrow | black | black |
edible | almond | broad | black | brown |
edible | anise | broad | brown | brown |
poisonous | pungent | narrow | brown | black |
edible | none | broad | black | brown |
edible | almond | broad | brown | black |