I first load the necessary libraries and read the CSV file from GitHub. I then assign the output to a data frame.
library("bitops")
library("RCurl")
url = "https://raw.githubusercontent.com/chrisestevez/MSDA-Bridge/master/mushroom.csv"
Rdata = getURL(url)
MyData = read.csv(text = Rdata,header = FALSE,sep=",")
MyFinalData = data.frame(MyData)
Here is an example of the unmodified data frame.
head(MyFinalData, n=10)
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
## 1 p x s n t p f c n k e e s s w w p w o p
## 2 e x s y t a f c b k e c s s w w p w o p
## 3 e b s w t l f c b n e c s s w w p w o p
## 4 p x y w t p f c n n e e s s w w p w o p
## 5 e x s g f n f w b k t e s s w w p w o e
## 6 e x y y t a f c b n e c s s w w p w o p
## 7 e b s w t a f c b g e c s s w w p w o p
## 8 e b y w t l f c b n e c s s w w p w o p
## 9 p x y w t p f c n p e e s s w w p w o p
## 10 e b s y t a f c b g e c s s w w p w o p
## V21 V22 V23
## 1 k s u
## 2 n n g
## 3 n n m
## 4 k s u
## 5 n a g
## 6 k n g
## 7 k n m
## 8 n s m
## 9 k v g
## 10 k s m
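The `V1` to `V23` headings come from `read.csv` itself: with `header = FALSE` it generates those names instead of reading them from the first line. As a minimal sketch, assuming only the first three rows shown above, the same behaviour can be reproduced without a network connection by passing the text directly. The names `demo_text` and `demo` are illustrative and not part of the original analysis.

```r
# First three rows of the file, copied from the output above (illustrative sample only).
demo_text <- paste(
  "p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u",
  "e,x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g",
  "e,b,s,w,t,l,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,n,m",
  sep = "\n"
)

# header = FALSE makes read.csv generate the column names V1, V2, ..., V23.
demo <- read.csv(text = demo_text, header = FALSE, sep = ",")

dim(demo)         # number of rows and columns
names(demo)[1:4]  # first four generated column names
```

Because `read.csv` already returns a data frame, wrapping its result in `data.frame()` leaves the contents unchanged.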
In this step I subset the data to columns V1, V3, V5, and V9 and rename them.
MyFinalData = subset(MyData,select = c(V1,V3,V5,V9))
colnames(MyFinalData) = c("MushroomType","CapSurface","Bruises","GillSize")
Example of changed column headings.
head(MyFinalData, n=10)
## MushroomType CapSurface Bruises GillSize
## 1 p s t n
## 2 e s t b
## 3 e s t b
## 4 p y t n
## 5 e s f b
## 6 e y t b
## 7 e s t b
## 8 e y t b
## 9 p y t n
## 10 e s t b
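The subset-and-rename step can also be sketched on a small toy data frame. This is an illustrative variant, not the code used above: it selects columns by name with `[` instead of `subset`, and pairs old and new names through a hypothetical `new_names` vector so the renaming does not depend on column order. The `toy` and `kept` objects are made up for the example.

```r
# Toy data frame mimicking four rows of the imported file (the four used columns plus one extra).
toy <- data.frame(
  V1 = c("p", "e", "e", "p"),
  V2 = c("x", "x", "b", "x"),
  V3 = c("s", "s", "s", "y"),
  V5 = c("t", "t", "t", "t"),
  V9 = c("n", "b", "b", "n"),
  stringsAsFactors = FALSE
)

# Same selection as subset(toy, select = c(V1, V3, V5, V9)), with names given as strings.
kept <- toy[, c("V1", "V3", "V5", "V9")]

# Map each old name to its new name explicitly, then apply the mapping.
new_names <- c(V1 = "MushroomType", V3 = "CapSurface", V5 = "Bruises", V9 = "GillSize")
names(kept) <- unname(new_names[names(kept)])

kept
```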
To replace the values within the columns, I used a method posted by JJ85 on Stack Overflow. I modified the implementation so that each column is replaced in one step.
http://stackoverflow.com/questions/23355806/invalid-factor-level-na-generated-r
MyFinalData$MushroomType = c('p'="poisonous",'e'="edible")[ as.character(MyFinalData$MushroomType)]
MyFinalData$CapSurface = c('f'="fibrous",'g'="grooves",'y'="scaly",'s'="smooth")[ as.character(MyFinalData$CapSurface)]
MyFinalData$Bruises = c('t'="bruises",'f'="no")[ as.character(MyFinalData$Bruises)]
MyFinalData$GillSize = c('b'="broad",'n'="narrow")[ as.character(MyFinalData$GillSize)]
Here is an example of the final output.
head(MyFinalData, n=10)
## MushroomType CapSurface Bruises GillSize
## 1 poisonous smooth bruises narrow
## 2 edible smooth bruises broad
## 3 edible smooth bruises broad
## 4 poisonous scaly bruises narrow
## 5 edible smooth no broad
## 6 edible scaly bruises broad
## 7 edible smooth bruises broad
## 8 edible scaly bruises broad
## 9 poisonous scaly bruises narrow
## 10 edible smooth bruises broad
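The replacement step works because a named character vector can act as a lookup table: indexing it with a vector of names returns the matching values. The `as.character()` call matters when the column is a factor, since indexing with a factor uses its integer codes rather than its labels. The sketch below shows the mechanism in isolation, including what happens to a code with no entry in the table. The names `gill_labels`, `codes`, `decoded`, and `unmatched`, and the code "z", are made up for illustration.

```r
# Named character vector used as a lookup table: names are codes, values are labels.
gill_labels <- c(b = "broad", n = "narrow")

# Codes as they might appear in a raw column; "z" is a made-up code with no label.
codes <- c("n", "b", "b", "z")

# Indexing by name returns the label for each code and NA for unknown codes.
decoded <- unname(gill_labels[codes])
decoded

# Quick check for codes that the lookup table did not cover.
unmatched <- unique(codes[is.na(decoded)])
unmatched
```

A check like `unmatched` may be worth running after each replacement, since a code missing from the lookup vector becomes `NA` without raising an error.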