R Week 3

Obtaining mushroom data from a GitHub repository

I first load the necessary libraries and download the CSV file from its raw GitHub URL. I then parse the downloaded text into a data frame.

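# RCurl provides getURL(), used below to download the file
library("bitops")
library("RCurl")

# Raw CSV file in the GitHub repository
url = "https://raw.githubusercontent.com/chrisestevez/MSDA-Bridge/master/mushroom.csv"

# Download the file contents as a single character string
Rdata = getURL(url)

# Parse the text; header = FALSE because the file has no header row
MyData = read.csv(text = Rdata, header = FALSE, sep = ",")
MyFinalData = data.frame(MyData)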
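
The parsing step can be tried without a network connection. The following is a minimal sketch, assuming a tiny inline sample: `sample_text` and `sample_df` are hypothetical names, and the sample holds only the first four fields of the first three rows of the mushroom data.

```r
# Hypothetical sample: first four fields of the first three rows of the data.
sample_text = "p,x,s,n\ne,x,s,y\ne,b,s,w"

# header = FALSE tells read.csv that the first line is data, so R names
# the columns V1, V2, ... automatically.
sample_df = read.csv(text = sample_text, header = FALSE, sep = ",")

print(names(sample_df))
print(nrow(sample_df))
print(is.data.frame(sample_df))
```

Because `read.csv` already returns a data frame, wrapping its result in `data.frame()` is harmless but not strictly required.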

Example 1

Here is an example of the unmodified data frame.

head(MyFinalData, n=10)
##    V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
## 1   p  x  s  n  t  p  f  c  n   k   e   e   s   s   w   w   p   w   o   p
## 2   e  x  s  y  t  a  f  c  b   k   e   c   s   s   w   w   p   w   o   p
## 3   e  b  s  w  t  l  f  c  b   n   e   c   s   s   w   w   p   w   o   p
## 4   p  x  y  w  t  p  f  c  n   n   e   e   s   s   w   w   p   w   o   p
## 5   e  x  s  g  f  n  f  w  b   k   t   e   s   s   w   w   p   w   o   e
## 6   e  x  y  y  t  a  f  c  b   n   e   c   s   s   w   w   p   w   o   p
## 7   e  b  s  w  t  a  f  c  b   g   e   c   s   s   w   w   p   w   o   p
## 8   e  b  y  w  t  l  f  c  b   n   e   c   s   s   w   w   p   w   o   p
## 9   p  x  y  w  t  p  f  c  n   p   e   e   s   s   w   w   p   w   o   p
## 10  e  b  s  y  t  a  f  c  b   g   e   c   s   s   w   w   p   w   o   p
##    V21 V22 V23
## 1    k   s   u
## 2    n   n   g
## 3    n   n   m
## 4    k   s   u
## 5    n   a   g
## 6    k   n   g
## 7    k   n   m
## 8    n   s   m
## 9    k   v   g
## 10   k   s   m

Subsetting data

In this step I keep only columns V1, V3, V5, and V9 and give them descriptive names.

MyFinalData = subset(MyData,select = c(V1,V3,V5,V9)) 

colnames(MyFinalData) = c("MushroomType","CapSurface","Bruises","GillSize")
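
The same select-then-rename pattern can be sketched on a small data frame. This is only an illustration: `toy` and `toy_subset` are hypothetical names, the values are the first five fields of the first two rows of the data, and plain bracket indexing is used here as an alternative to `subset()`.

```r
# Hypothetical toy data frame with the same default column names (V1..V5).
toy = data.frame(V1 = c("p", "e"), V2 = c("x", "x"), V3 = c("s", "s"),
                 V4 = c("n", "y"), V5 = c("t", "t"))

# Keep V1, V3 and V5 by name; the order in the vector is the output order.
toy_subset = toy[, c("V1", "V3", "V5")]

# New names are assigned by position, so they must follow the same order.
colnames(toy_subset) = c("MushroomType", "CapSurface", "Bruises")

print(toy_subset)
```

Since the names are matched by position rather than by the old column name, the vector passed to `colnames()` has to list the new names in the same order as the selected columns.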

Example 2

Here is an example of the subset with the changed column headings.

head(MyFinalData, n=10)
##    MushroomType CapSurface Bruises GillSize
## 1             p          s       t        n
## 2             e          s       t        b
## 3             e          s       t        b
## 4             p          y       t        n
## 5             e          s       f        b
## 6             e          y       t        b
## 7             e          s       t        b
## 8             e          y       t        b
## 9             p          y       t        n
## 10            e          s       t        b

Replacing values

To replace the single-letter codes within the columns I used a method posted by JJ85 on Stack Overflow. I modified the implementation so that each column is recoded in one step.

http://stackoverflow.com/questions/23355806/invalid-factor-level-na-generated-r

MyFinalData$MushroomType = c("p" = "poisonous", "e" = "edible")[as.character(MyFinalData$MushroomType)]
MyFinalData$CapSurface = c("f" = "fibrous", "g" = "grooves", "y" = "scaly", "s" = "smooth")[as.character(MyFinalData$CapSurface)]
MyFinalData$Bruises = c("t" = "bruises", "f" = "no")[as.character(MyFinalData$Bruises)]
MyFinalData$GillSize = c("b" = "broad", "n" = "narrow")[as.character(MyFinalData$GillSize)]
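
The lines above rely on indexing a named character vector by name. Here is a minimal sketch of that mechanism on its own, assuming a hypothetical input vector `codes` that deliberately contains one code (`"x"`) missing from the lookup table; `lookup` and `labels` are illustrative names as well.

```r
# Lookup table: the names are the codes, the values are the readable labels.
lookup = c("p" = "poisonous", "e" = "edible")

# Hypothetical input; "x" is deliberately not in the lookup table.
codes = c("p", "e", "e", "x")

# Indexing by name returns one label per code; unname() drops the
# code names that the result would otherwise carry along.
labels = unname(lookup[codes])

print(labels)

# Codes that are missing from the lookup come back as NA, so counting
# the NAs is a quick check that every code was mapped.
print(sum(is.na(labels)))
```

The `as.character()` call matters when a column is stored as a factor: indexing with a factor uses its integer level codes rather than its labels, so the lookup would pick elements by position instead of by name.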

Example 3

Here is an example of the final output.

head(MyFinalData, n=10)
##    MushroomType CapSurface Bruises GillSize
## 1     poisonous     smooth bruises   narrow
## 2        edible     smooth bruises    broad
## 3        edible     smooth bruises    broad
## 4     poisonous      scaly bruises   narrow
## 5        edible     smooth      no    broad
## 6        edible      scaly bruises    broad
## 7        edible     smooth bruises    broad
## 8        edible      scaly bruises    broad
## 9     poisonous      scaly bruises   narrow
## 10       edible     smooth bruises    broad