Author: Romerl Elizes

Yes, I went overboard. I am not as nifty as more seasoned R programmers, but this should do the job.

Part I. Load Data

Data will be loaded from the agaricus-lepiota.data file in the UCI Machine Learning Repository. I will parse the data as a CSV file and run the summary function to show some important information about the raw data.

urlfile <- "https://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/agaricus-lepiota.data"
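# stringsAsFactors = TRUE is needed on R >= 4.0, where the default became FALSE;
# summary() and the levels() calls in Part IV expect factor columns
tableInput <- read.table(file = urlfile, header = FALSE, sep = ",", stringsAsFactors = TRUE)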
summary(tableInput)
##  V1       V2       V3             V4       V5             V6      
##  e:4208   b: 452   f:2320   n      :2284   f:4748   n      :3528  
##  p:3916   c:   4   g:   4   g      :1840   t:3376   f      :2160  
##           f:3152   s:2556   e      :1500            s      : 576  
##           k: 828   y:3244   y      :1072            y      : 576  
##           s:  32            w      :1040            a      : 400  
##           x:3656            b      : 168            l      : 400  
##                             (Other): 220            (Other): 484  
##  V7       V8       V9            V10       V11      V12      V13     
##  a: 210   c:6812   b:5612   b      :1728   e:3516   ?:2480   f: 552  
##  f:7914   w:1312   n:2512   p      :1492   t:4608   b:3776   k:2372  
##                             w      :1202            c: 556   s:5176  
##                             n      :1048            e:1120   y:  24  
##                             g      : 752            r: 192           
##                             h      : 732                             
##                             (Other):1170                             
##  V14           V15            V16       V17      V18      V19     
##  f: 600   w      :4464   w      :4384   p:8124   n:  96   n:  36  
##  k:2304   p      :1872   p      :1872            o:  96   o:7488  
##  s:4936   g      : 576   g      : 576            w:7924   t: 600  
##  y: 284   n      : 448   n      : 512            y:   8           
##           b      : 432   b      : 432                             
##           o      : 192   o      : 192                             
##           (Other): 140   (Other): 156                             
##  V20           V21       V22      V23     
##  e:2776   w      :2388   a: 384   d:3148  
##  f:  48   n      :1968   c: 340   g:2148  
##  l:1296   k      :1872   n: 400   l: 832  
##  n:  36   h      :1632   s:1248   m: 292  
##  p:3968   r      :  72   v:4040   p:1144  
##           b      :  48   y:1712   u: 368  
##           (Other): 144            w: 192
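
To check how the load step behaves without a network connection, here is a minimal sketch that reads a small comma-separated string instead of the URL. The three records and the names sampleText and sampleInput are illustrative assumptions, not part of the original script, and stringsAsFactors = TRUE is set explicitly on the assumption that factor columns are wanted.

```r
# Toy sample: three comma-separated records (illustrative, not the full file)
sampleText <- "p,s,t,f,n\ne,s,t,f,b\ne,s,t,f,b"

# Same kind of read.table call, but reading from a string via text =
sampleInput <- read.table(text = sampleText, header = FALSE, sep = ",",
                          stringsAsFactors = TRUE)

dim(sampleInput)            # number of rows and columns
sapply(sampleInput, class)  # class of each column; factors are expected here
levels(sampleInput$V1)      # distinct codes found in the first column
```

Running the same three checks on tableInput is a quick way to confirm that the full file was read as factors before moving on.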

Part II. Create Data Frame with a subset of 5 Columns

The columns that will be kept are columns 1, 3, 5, 7, and 9 (V1, V3, V5, V7, and V9); the other 18 columns are dropped by name. The output shows the first 10 rows of the resulting data frame. Reference: http://stackoverflow.com/questions/5234117/how-to-drop-columns-by-name-in-a-data-frame

subsetTableInput <- tableInput[, -which(names(tableInput) %in% c("V2","V4","V6","V8","V10","V11","V12","V13","V14","V15","V16","V17","V18","V19","V20","V21","V22","V23"))]
head(subsetTableInput, n=10)
##    V1 V3 V5 V7 V9
## 1   p  s  t  f  n
## 2   e  s  t  f  b
## 3   e  s  t  f  b
## 4   p  y  t  f  n
## 5   e  s  f  f  b
## 6   e  y  t  f  b
## 7   e  s  t  f  b
## 8   e  y  t  f  b
## 9   p  y  t  f  n
## 10  e  s  t  f  b
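
Dropping 18 columns by name works, but keeping the wanted columns is shorter. The sketch below compares the two approaches on a made-up six-column data frame. The name toy and its values are assumptions for illustration only.

```r
# Toy stand-in for tableInput, with column names like read.table's defaults
toy <- data.frame(V1 = c("p", "e"), V2 = c("x", "x"), V3 = c("s", "s"),
                  V4 = c("n", "y"), V5 = c("t", "t"), V6 = c("p", "a"),
                  stringsAsFactors = TRUE)

# Keep columns by position ...
byPosition <- toy[, c(1, 3, 5)]
# ... or by name ...
byName <- toy[, c("V1", "V3", "V5")]
# ... or drop the unwanted ones by name, as in the step above
byDrop <- toy[, -which(names(toy) %in% c("V2", "V4", "V6"))]

# All three approaches should give the same data frame
identical(byPosition, byName)
identical(byPosition, byDrop)
```

On the real data the equivalent would be tableInput[, c(1, 3, 5, 7, 9)]. One caveat with the drop form: if none of the names match, which() returns an empty vector and the negative index then selects no columns at all.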

Part III. Create Meaningful Column Names

The columns will be renamed to meaningful names: Toxicity, CapSurface, Bruises, GillAttachment, and GillSize. The output shows the first 10 rows of the data.

names(subsetTableInput) <- c("Toxicity", "CapSurface", "Bruises", "GillAttachment","GillSize")
head(subsetTableInput, n=10)
##    Toxicity CapSurface Bruises GillAttachment GillSize
## 1         p          s       t              f        n
## 2         e          s       t              f        b
## 3         e          s       t              f        b
## 4         p          y       t              f        n
## 5         e          s       f              f        b
## 6         e          y       t              f        b
## 7         e          s       t              f        b
## 8         e          y       t              f        b
## 9         p          y       t              f        n
## 10        e          s       t              f        b

Part IV. Replace abbreviations in Data with Meaningful Names

A. Change all column values in Toxicity to their appropriate values

subsetTableInput <- within(subsetTableInput, levels(Toxicity)[levels(Toxicity) == "p"] <- "poisonous")
subsetTableInput <- within(subsetTableInput, levels(Toxicity)[levels(Toxicity) == "e"] <- "edible")

B. Change all column values in CapSurface to their appropriate values

subsetTableInput <- within(subsetTableInput, levels(CapSurface)[levels(CapSurface) == "f"] <- "fibrous")
subsetTableInput <- within(subsetTableInput, levels(CapSurface)[levels(CapSurface) == "g"] <- "grooves")
subsetTableInput <- within(subsetTableInput, levels(CapSurface)[levels(CapSurface) == "y"] <- "scaly")
subsetTableInput <- within(subsetTableInput, levels(CapSurface)[levels(CapSurface) == "s"] <- "smooth")

C. Change all column values in Bruises to their appropriate values

subsetTableInput <- within(subsetTableInput, levels(Bruises)[levels(Bruises) == "t"] <- "bruises")
subsetTableInput <- within(subsetTableInput, levels(Bruises)[levels(Bruises) == "f"] <- "no")

D. Change all column values in GillAttachment to their appropriate values

subsetTableInput <- within(subsetTableInput, levels(GillAttachment)[levels(GillAttachment) == "a"] <- "attached")
subsetTableInput <- within(subsetTableInput, levels(GillAttachment)[levels(GillAttachment) == "d"] <- "descending")
subsetTableInput <- within(subsetTableInput, levels(GillAttachment)[levels(GillAttachment) == "f"] <- "free")
subsetTableInput <- within(subsetTableInput, levels(GillAttachment)[levels(GillAttachment) == "n"] <- "notched")

E. Change all column values in GillSize to their appropriate values

subsetTableInput <- within(subsetTableInput, levels(GillSize)[levels(GillSize) == "b"] <- "broad")
subsetTableInput <- within(subsetTableInput, levels(GillSize)[levels(GillSize) == "n"] <- "narrow")
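
The recoding above uses one within() call per abbreviation. A possible shortcut is to rename all levels of a column at once through a lookup vector. This is a minimal sketch on a toy factor, and the names toxicity and toxicityLabels are assumptions for illustration rather than part of the original script.

```r
# Toy factor column using the single-letter codes from the data
toxicity <- factor(c("p", "e", "e", "p"))

# Lookup from code to label; the names are the old codes
toxicityLabels <- c(e = "edible", p = "poisonous")

# Rename every level in one step. A code missing from the lookup
# would become NA, so the lookup should cover every level present.
levels(toxicity) <- unname(toxicityLabels[levels(toxicity)])

toxicity
```

The same pattern could be applied to a data frame column, for example by assigning to levels(subsetTableInput$Toxicity), with one lookup vector per column.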

Part V. Final Output

head(subsetTableInput, n=40)
##     Toxicity CapSurface Bruises GillAttachment GillSize
## 1  poisonous     smooth bruises           free   narrow
## 2     edible     smooth bruises           free    broad
## 3     edible     smooth bruises           free    broad
## 4  poisonous      scaly bruises           free   narrow
## 5     edible     smooth      no           free    broad
## 6     edible      scaly bruises           free    broad
## 7     edible     smooth bruises           free    broad
## 8     edible      scaly bruises           free    broad
## 9  poisonous      scaly bruises           free   narrow
## 10    edible     smooth bruises           free    broad
## 11    edible      scaly bruises           free    broad
## 12    edible      scaly bruises           free    broad
## 13    edible     smooth bruises           free    broad
## 14 poisonous      scaly bruises           free   narrow
## 15    edible    fibrous      no           free    broad
## 16    edible    fibrous      no           free   narrow
## 17    edible    fibrous      no           free    broad
## 18 poisonous     smooth bruises           free   narrow
## 19 poisonous      scaly bruises           free   narrow
## 20 poisonous     smooth bruises           free   narrow
## 21    edible     smooth bruises           free    broad
## 22 poisonous      scaly bruises           free   narrow
## 23    edible      scaly bruises           free    broad
## 24    edible      scaly bruises           free    broad
## 25    edible     smooth bruises           free    broad
## 26 poisonous     smooth bruises           free   narrow
## 27    edible      scaly bruises           free    broad
## 28    edible      scaly bruises           free    broad
## 29    edible    fibrous      no           free   narrow
## 30    edible     smooth bruises           free   narrow
## 31    edible     smooth bruises           free    broad
## 32 poisonous      scaly bruises           free   narrow
## 33    edible      scaly bruises           free    broad
## 34    edible      scaly bruises           free    broad
## 35    edible      scaly bruises           free    broad
## 36    edible    fibrous bruises           free   narrow
## 37    edible    fibrous      no           free   narrow
## 38 poisonous      scaly bruises           free   narrow
## 39    edible    fibrous bruises           free   narrow
## 40    edible     smooth bruises           free    broad
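
The first 40 rows look fully recoded, but head() cannot show whether an abbreviation survives further down. A rough sanity check is to list any factor levels that are still a single character. The sketch below runs on a made-up data frame in which one code was deliberately left unrecoded. The names toy, leftover, and flagged are assumptions for illustration.

```r
# Toy data frame; "b" in GillSize was deliberately left as an abbreviation
toy <- data.frame(Toxicity = factor(c("poisonous", "edible")),
                  GillSize = factor(c("narrow", "b")))

# For each column, collect the levels that are only one character long
leftover <- lapply(toy, function(col) levels(col)[nchar(levels(col)) == 1])

# Keep only the columns that still contain such levels
flagged <- leftover[lengths(leftover) > 0]
flagged
```

Running the same lapply() over subsetTableInput and getting an empty list would suggest that every code present in the data was replaced.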