Yes, I went overboard. I am not as nifty as more seasoned R programmers, but this should do the job.
The data is loaded from the file agaricus-lepiota.data in the UCI Machine Learning Repository. I parse it as a CSV file and run the summary function to show some key information about the raw data.
urlfile <- "https://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/agaricus-lepiota.data"
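# stringsAsFactors = TRUE is needed on R >= 4.0.0, where read.table no longer creates factors by default
tableInput <- read.table(file = urlfile, header = FALSE, sep = ",", stringsAsFactors = TRUE)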
summary(tableInput)
##  V1       V2       V3       V4             V5       V6
##  e:4208   b: 452   f:2320   n      :2284   f:4748   n      :3528
##  p:3916   c:   4   g:   4   g      :1840   t:3376   f      :2160
##           f:3152   s:2556   e      :1500            s      : 576
##           k: 828   y:3244   y      :1072            y      : 576
##           s:  32            w      :1040            a      : 400
##           x:3656            b      : 168            l      : 400
##                             (Other): 220            (Other): 484
##  V7       V8       V9       V10            V11      V12      V13
##  a: 210   c:6812   b:5612   b      :1728   e:3516   ?:2480   f: 552
##  f:7914   w:1312   n:2512   p      :1492   t:4608   b:3776   k:2372
##                             w      :1202            c: 556   s:5176
##                             n      :1048            e:1120   y:  24
##                             g      : 752            r: 192
##                             h      : 732
##                             (Other):1170
##  V14      V15            V16            V17      V18      V19
##  f: 600   w      :4464   w      :4384   p:8124   n:  96   n:  36
##  k:2304   p      :1872   p      :1872            o:  96   o:7488
##  s:4936   g      : 576   g      : 576            w:7924   t: 600
##  y: 284   n      : 448   n      : 512            y:   8
##           b      : 432   b      : 432
##           o      : 192   o      : 192
##           (Other): 140   (Other): 156
##  V20      V21            V22      V23
##  e:2776   w      :2388   a: 384   d:3148
##  f:  48   n      :1968   c: 340   g:2148
##  l:1296   k      :1872   n: 400   l: 832
##  n:  36   h      :1632   s:1248   m: 292
##  p:3968   r      :  72   v:4040   p:1144
##           b      :  48   y:1712   u: 368
##           (Other): 144            w: 192
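The summary lists a `?` level in V12. As far as I can tell from the data set's description, that code marks a missing value rather than a real category. Below is a minimal sketch of how read.table could turn it into NA while loading. It runs on three made-up lines instead of the real file, and `sampleLines` and `sampleInput` are names I invented for the illustration.

```r
# Three made-up records in the same comma-separated layout (not real data)
sampleLines <- c("p,x,s,?", "e,b,y,c", "e,x,s,?")

# header = FALSE makes R name the columns V1, V2, ...
# stringsAsFactors = TRUE stores each column as a factor
# na.strings = "?" converts the "?" placeholder into NA
sampleInput <- read.table(text = sampleLines, header = FALSE, sep = ",",
                          stringsAsFactors = TRUE, na.strings = "?")

str(sampleInput)
colSums(is.na(sampleInput))   # V4 holds the 2 missing values
```

I did not use na.strings for the real table above, so the `?` entries stay as an ordinary factor level there.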
I keep columns 1, 3, 5, 7, and 9 and drop all the others by name. The first 10 rows of the resulting table are shown below. Reference: http://stackoverflow.com/questions/5234117/how-to-drop-columns-by-name-in-a-data-frame
# Names of the columns to drop: V2, V4, V6, V8 and V10 through V23
dropCols <- paste0("V", c(2, 4, 6, 8, 10:23))
subsetTableInput <- tableInput[, -which(names(tableInput) %in% dropCols)]
head(subsetTableInput, n=10)
## V1 V3 V5 V7 V9
## 1 p s t f n
## 2 e s t f b
## 3 e s t f b
## 4 p y t f n
## 5 e s f f b
## 6 e y t f b
## 7 e s t f b
## 8 e y t f b
## 9 p y t f n
## 10 e s t f b
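One caveat with the `-which(...)` idiom: if none of the names match, `which()` returns an empty vector and the subset keeps no columns at all instead of keeping all of them. Here is a small sketch on a made-up data frame (`toy` and its values are my own illustration, not part of the mushroom data) that compares it with a logical mask and with simply naming the columns to keep.

```r
# Toy data frame standing in for tableInput (made-up values, illustration only)
toy <- data.frame(V1 = c("p", "e"), V2 = c("x", "b"),
                  V3 = c("s", "y"), V4 = c("n", "w"))

# Pitfall: "V99" does not exist, so which() is empty and no columns survive
noMatch <- toy[, -which(names(toy) %in% "V99")]
ncol(noMatch)    # 0

# A logical mask handles the no-match case: every column is kept
safeDrop <- toy[, !(names(toy) %in% "V99")]
ncol(safeDrop)   # 4

# Naming the columns to keep is often the shortest option
kept <- toy[, c("V1", "V3")]
names(kept)      # "V1" "V3"
```

For this data set the drop list always matches, so the result is the same either way.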
The columns are renamed to descriptive names: Toxicity, CapSurface, Bruises, GillAttachment, and GillSize. The first 10 rows of the renamed table are shown below.
names(subsetTableInput) <- c("Toxicity", "CapSurface", "Bruises", "GillAttachment","GillSize")
head(subsetTableInput, n=10)
## Toxicity CapSurface Bruises GillAttachment GillSize
## 1 p s t f n
## 2 e s t f b
## 3 e s t f b
## 4 p y t f n
## 5 e s f f b
## 6 e y t f b
## 7 e s t f b
## 8 e y t f b
## 9 p y t f n
## 10 e s t f b
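Assigning to `names()` by position works, but it silently depends on the column order. As a hedged alternative, this sketch renames through a named lookup vector so that each old name is tied to its new one explicitly. The `toy` data frame and the `newNames` vector are illustrative assumptions, and the columns are deliberately out of order.

```r
# Toy data frame with default-style column names in a shuffled order (made-up values)
toy <- data.frame(V3 = c("s", "y"), V1 = c("p", "e"))

# Lookup: old name -> new name
newNames <- c(V1 = "Toxicity", V3 = "CapSurface")

# Look up each existing name; unname() drops the old names from the result
names(toy) <- unname(newNames[names(toy)])
names(toy)   # "CapSurface" "Toxicity"
```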
A. Replace the single-letter codes in Toxicity with their full names
subsetTableInput <- within(subsetTableInput, levels(Toxicity)[levels(Toxicity) == "p"] <- "poisonous")
subsetTableInput <- within(subsetTableInput, levels(Toxicity)[levels(Toxicity) == "e"] <- "edible")
B. Replace the single-letter codes in CapSurface with their full names
subsetTableInput <- within(subsetTableInput, levels(CapSurface)[levels(CapSurface) == "f"] <- "fibrous")
subsetTableInput <- within(subsetTableInput, levels(CapSurface)[levels(CapSurface) == "g"] <- "grooves")
subsetTableInput <- within(subsetTableInput, levels(CapSurface)[levels(CapSurface) == "y"] <- "scaly")
subsetTableInput <- within(subsetTableInput, levels(CapSurface)[levels(CapSurface) == "s"] <- "smooth")
C. Replace the single-letter codes in Bruises with their full names
subsetTableInput <- within(subsetTableInput, levels(Bruises)[levels(Bruises) == "t"] <- "bruises")
subsetTableInput <- within(subsetTableInput, levels(Bruises)[levels(Bruises) == "f"] <- "no")
D. Replace the single-letter codes in GillAttachment with their full names
subsetTableInput <- within(subsetTableInput, levels(GillAttachment)[levels(GillAttachment) == "a"] <- "attached")
subsetTableInput <- within(subsetTableInput, levels(GillAttachment)[levels(GillAttachment) == "d"] <- "descending")
subsetTableInput <- within(subsetTableInput, levels(GillAttachment)[levels(GillAttachment) == "f"] <- "free")
subsetTableInput <- within(subsetTableInput, levels(GillAttachment)[levels(GillAttachment) == "n"] <- "notched")
E. Replace the single-letter codes in GillSize with their full names
subsetTableInput <- within(subsetTableInput, levels(GillSize)[levels(GillSize) == "b"] <- "broad")
subsetTableInput <- within(subsetTableInput, levels(GillSize)[levels(GillSize) == "n"] <- "narrow")
head(subsetTableInput, n=40)
## Toxicity CapSurface Bruises GillAttachment GillSize
## 1 poisonous smooth bruises free narrow
## 2 edible smooth bruises free broad
## 3 edible smooth bruises free broad
## 4 poisonous scaly bruises free narrow
## 5 edible smooth no free broad
## 6 edible scaly bruises free broad
## 7 edible smooth bruises free broad
## 8 edible scaly bruises free broad
## 9 poisonous scaly bruises free narrow
## 10 edible smooth bruises free broad
## 11 edible scaly bruises free broad
## 12 edible scaly bruises free broad
## 13 edible smooth bruises free broad
## 14 poisonous scaly bruises free narrow
## 15 edible fibrous no free broad
## 16 edible fibrous no free narrow
## 17 edible fibrous no free broad
## 18 poisonous smooth bruises free narrow
## 19 poisonous scaly bruises free narrow
## 20 poisonous smooth bruises free narrow
## 21 edible smooth bruises free broad
## 22 poisonous scaly bruises free narrow
## 23 edible scaly bruises free broad
## 24 edible scaly bruises free broad
## 25 edible smooth bruises free broad
## 26 poisonous smooth bruises free narrow
## 27 edible scaly bruises free broad
## 28 edible scaly bruises free broad
## 29 edible fibrous no free narrow
## 30 edible smooth bruises free narrow
## 31 edible smooth bruises free broad
## 32 poisonous scaly bruises free narrow
## 33 edible scaly bruises free broad
## 34 edible scaly bruises free broad
## 35 edible scaly bruises free broad
## 36 edible fibrous bruises free narrow
## 37 edible fibrous no free narrow
## 38 poisonous scaly bruises free narrow
## 39 edible fibrous bruises free narrow
## 40 edible smooth bruises free broad
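The fourteen `within()` calls in steps A to E all follow one pattern, so they could be folded into a small helper. This is only a sketch: `recodeLevels` and its `lookup` argument are names I made up, and the sample factor is toy data rather than the mushroom table.

```r
# Hypothetical helper: rename factor levels through a named lookup vector
# (names = single-letter codes, values = full labels)
recodeLevels <- function(x, lookup) {
  codes <- levels(x)
  known <- codes %in% names(lookup)   # codes without a lookup entry stay untouched
  codes[known] <- lookup[codes[known]]
  levels(x) <- codes
  x
}

# Toy factor standing in for the Toxicity column
toxicity <- factor(c("p", "e", "e", "p"))
toxicity <- recodeLevels(toxicity, c(p = "poisonous", e = "edible"))
as.character(toxicity)   # "poisonous" "edible" "edible" "poisonous"
```

With the real table it would be called once per column, for example `subsetTableInput$Bruises <- recodeLevels(subsetTableInput$Bruises, c(t = "bruises", f = "no"))`.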