Charlie Rosemond - Due 090119

Load car package

I begin by loading the car package, whose recode function I use later to replace the abbreviated values in the data.

library(car)
## Loading required package: carData

Read in comma-delimited file as a data frame

After downloading the mushrooms file (https://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/agaricus-lepiota.data) and uploading it to GitHub, I read the raw file into R as a data frame called “mushrooms”. The file has no header row, so I set header = FALSE, which leaves R to assign the default column names V1, V2, and so on. I then check the top six rows of the first six columns.

mushrooms <- read.csv("https://raw.githubusercontent.com/chrosemo/data607_fall19_week1/master/agaricus-lepiota.data", header = FALSE)
head(mushrooms[1:6])
##   V1 V2 V3 V4 V5 V6
## 1  p  x  s  n  t  p
## 2  e  x  s  y  t  a
## 3  e  b  s  w  t  l
## 4  p  x  y  w  t  p
## 5  e  x  s  g  f  n
## 6  e  x  y  y  t  a
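
As a minimal sketch of what header = FALSE does, the snippet below reads two made-up records from an inline string instead of the URL. The names sample_text and sample_df and the three-field records are illustrative assumptions, not part of the original analysis.

```r
# Two toy records in the same comma-delimited, header-less layout
# (hypothetical values, truncated to three fields).
sample_text <- "p,x,s\ne,b,y"

# text = lets read.csv parse a string instead of a file or URL.
sample_df <- read.csv(text = sample_text, header = FALSE)

names(sample_df)  # default names: "V1" "V2" "V3"
dim(sample_df)    # 2 rows, 3 columns
```

With header = TRUE (the read.csv default), the first record would have been consumed as column names, so the data frame would have come back one row short.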

Update the data frame with appropriate column names

I update the column names of “mushrooms” to reflect the names noted in the agaricus-lepiota.names file (https://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/agaricus-lepiota.names) and again check the top six rows of the first six columns.

colnames(mushrooms) <- c("class", "cap_shape", "cap_surface", "cap_color", "bruises", "odor", "gill_attachment", "gill_spacing", "gill_size", "gill_color", "stalk_shape", "stalk_root", "stalk_surface_above_ring", "stalk_surface_below_ring", "stalk_color_above_ring", "stalk_color_below_ring", "veil_type", "veil_color", "ring_number", "ring_type", "spore_print_color", "population", "habitat")
head(mushrooms[1:6])
##   class cap_shape cap_surface cap_color bruises odor
## 1     p         x           s         n       t    p
## 2     e         x           s         y       t    a
## 3     e         b           s         w       t    l
## 4     p         x           y         w       t    p
## 5     e         x           s         g       f    n
## 6     e         x           y         y       t    a
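
Because a vector of names with the wrong length may not raise an error when assigned, a quick length check before renaming can catch a dropped or duplicated name. The sketch below uses a toy three-column data frame; toy and new_names are hypothetical stand-ins for “mushrooms” and its 23 names.

```r
# Toy data frame with default-style column names (hypothetical values).
toy <- data.frame(V1 = c("p", "e"), V2 = c("x", "b"), V3 = c("s", "y"))

new_names <- c("class", "cap_shape", "cap_surface")

# Stop early if the number of names does not match the number of columns.
stopifnot(length(new_names) == ncol(toy))

colnames(toy) <- new_names
print(colnames(toy))
```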

Subset the data frame

I subset the data frame, creating a new one called “mushrooms_working” that keeps five columns: “class”, “cap_shape”, “cap_color”, “population”, and “habitat”. I then check the new data frame’s bottom six rows.

mushrooms_working <- mushrooms[c("class", "cap_shape", "cap_color", "population", "habitat")]
tail(mushrooms_working)
##      class cap_shape cap_color population habitat
## 8119     p         k         n          v       d
## 8120     e         k         n          c       l
## 8121     e         x         n          v       l
## 8122     e         f         n          c       l
## 8123     p         k         n          v       l
## 8124     e         x         n          c       l

Replace the abbreviations used in the data

Finally, I recode each of the five columns using car’s recode function, replacing each abbreviation with the full value listed in the agaricus-lepiota.names file. I then check the recoded data frame’s bottom six rows.

mushrooms_working$class <- recode(mushrooms_working$class, "'e' = 'edible'; 'p' = 'poisonous'")
mushrooms_working$cap_shape <- recode(mushrooms_working$cap_shape, "'b' = 'bell'; 'c' = 'conical'; 'x' = 'convex'; 'f' = 'flat'; 'k' = 'knobbed'; 's' = 'sunken'")
mushrooms_working$cap_color <- recode(mushrooms_working$cap_color, "'n' = 'brown'; 'b' = 'buff'; 'c' = 'cinnamon'; 'g' = 'gray'; 'r' = 'green'; 'p' = 'pink'; 'u' = 'purple'; 'e' = 'red'; 'w' = 'white'; 'y' = 'yellow'")
mushrooms_working$population <- recode(mushrooms_working$population, "'a' = 'abundant'; 'c' = 'clustered'; 'n' = 'numerous'; 's' = 'scattered'; 'v' = 'several'; 'y' = 'solitary'")
mushrooms_working$habitat <- recode(mushrooms_working$habitat, "'g' = 'grasses'; 'l' = 'leaves'; 'm' = 'meadows'; 'p' = 'paths'; 'u' = 'urban'; 'w' = 'waste'; 'd' = 'woods'")
tail(mushrooms_working)
##          class cap_shape cap_color population habitat
## 8119 poisonous   knobbed     brown    several   woods
## 8120    edible   knobbed     brown  clustered  leaves
## 8121    edible    convex     brown    several  leaves
## 8122    edible      flat     brown  clustered  leaves
## 8123 poisonous   knobbed     brown    several  leaves
## 8124    edible    convex     brown  clustered  leaves
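
For comparison, the same kind of replacement can be sketched in base R with named lookup vectors. This is an alternative to car’s recode, not the method used above. The data frame toy and the vectors class_labels and habitat_labels are hypothetical names, and the three rows are made-up values.

```r
# Toy data frame standing in for mushrooms_working (hypothetical values).
toy <- data.frame(
  class   = c("p", "e", "e"),
  habitat = c("d", "l", "l"),
  stringsAsFactors = FALSE
)

# Lookup vectors: names are the abbreviations, values are the full labels.
class_labels   <- c(e = "edible", p = "poisonous")
habitat_labels <- c(g = "grasses", l = "leaves", m = "meadows", p = "paths",
                    u = "urban", w = "waste", d = "woods")

# Index each lookup vector by abbreviation; as.character() keeps a factor
# column from being interpreted as integer positions.
toy$class   <- unname(class_labels[as.character(toy$class)])
toy$habitat <- unname(habitat_labels[as.character(toy$habitat)])

# An abbreviation missing from its lookup vector would show up as NA here.
stopifnot(!any(is.na(toy)))
print(toy)
```

One design difference is worth noting: with a lookup vector, an unmapped abbreviation becomes NA, whereas recode leaves unmatched values unchanged unless an else clause is supplied. The NA check therefore doubles as a test that every code was covered.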