9/1/2019

The Mushrooms dataset was loaded from the UCI Machine Learning Repository into a data frame with the following code:

mushrooms <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/agaricus-lepiota.data", header = FALSE, sep = ",")

We can check the dimensions of the data with dim(). The output, [1] 8124 23, indicates that there are 8124 rows and 23 columns:

dim(mushrooms)

## [1] 8124   23
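Because the file is read with header = FALSE, R assigns the default column names V1, V2, and so on. The following is a minimal sketch of that behavior on two made-up comma-separated lines (sample_lines and toy are hypothetical stand-ins, not the real data):

```r
# Two hypothetical rows in the same comma-separated, header-less layout
sample_lines <- "p,x,s\ne,b,y"

# header = FALSE treats the first line as data, not as column names
toy <- read.csv(text = sample_lines, header = FALSE, sep = ",")

toy_dim <- dim(toy)      # rows, then columns
toy_names <- names(toy)  # default names assigned by R

print(toy_dim)
print(toy_names)
```

The default names carry no meaning, which is why descriptive names are worth assigning.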

Because the file has no header row, the columns were renamed to reflect their full descriptions:

colnames(mushrooms) <- c("EDIBILITY", "CAP-SHAPE", "CAP-SURFACE", "CAP-COLOR", "BRUISES", "ODOR", "GILL-ATTACHMENT", "GILL-SPACING", "GILL-SIZE", "GILL-COLOR", "STALK-SHAPE", "STALK-ROOT", "STALK-SURFACE-ABOVE-RING", "STALK-SURFACE-BELOW-RING", "STALK-COLOR-ABOVE-RING", "STALK-COLOR-BELOW-RING", "VEIL-TYPE", "VEILCOLOR", "RING-NUMBER", "RING-TYPE", "SPORE-PRINT-COLOR", "POPULATION", "HABITAT")

And now we can see the names:

names(mushrooms)

##  [1] "EDIBILITY"                "CAP-SHAPE"               
##  [3] "CAP-SURFACE"              "CAP-COLOR"               
##  [5] "BRUISES"                  "ODOR"                    
##  [7] "GILL-ATTACHMENT"          "GILL-SPACING"            
##  [9] "GILL-SIZE"                "GILL-COLOR"              
## [11] "STALK-SHAPE"              "STALK-ROOT"              
## [13] "STALK-SURFACE-ABOVE-RING" "STALK-SURFACE-BELOW-RING"
## [15] "STALK-COLOR-ABOVE-RING"   "STALK-COLOR-BELOW-RING"  
## [17] "VEIL-TYPE"                "VEILCOLOR"               
## [19] "RING-NUMBER"              "RING-TYPE"               
## [21] "SPORE-PRINT-COLOR"        "POPULATION"              
## [23] "HABITAT"
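
Most of these names contain hyphens, so they are not syntactic R names, and a column such as CAP-SHAPE has to be wrapped in backticks when accessed with $. The following is a minimal sketch on a small made-up data frame (toy and its values are hypothetical) showing backtick access and, as one possible alternative, replacing the hyphens with underscores:

```r
# Hypothetical two-column data frame standing in for mushrooms
toy <- data.frame(V1 = c("p", "e", "e"), V2 = c("x", "b", "x"),
                  stringsAsFactors = FALSE)
colnames(toy) <- c("EDIBILITY", "CAP-SHAPE")

# Hyphenated names need backticks with $ (or a quoted name with [[ ]])
cap_shape_a <- toy$`CAP-SHAPE`
cap_shape_b <- toy[["CAP-SHAPE"]]

# Optional alternative: swap hyphens for underscores to get syntactic names
clean_names <- gsub("-", "_", names(toy), fixed = TRUE)
print(clean_names)
```

Names without hyphens, such as EDIBILITY or HABITAT, can be used directly in calls like subset() without any quoting.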

For a few variables, some of the single-letter codes in the rows were replaced with their full names. These variables are used in the subset below:

# Anchor each pattern with ^ and $ so that only an exact one-letter code is replaced
mushrooms$EDIBILITY <- sub("^p$", "Poisonous", mushrooms$EDIBILITY)
mushrooms$ODOR <- sub("^m$", "Musty", mushrooms$ODOR)
mushrooms$POPULATION <- sub("^c$", "Clustered", mushrooms$POPULATION)
mushrooms$HABITAT <- sub("^d$", "Woods", mushrooms$HABITAT)
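
The sub() calls above replace one code per variable and leave the remaining codes as single letters. As a sketch of how every code in a column could be expanded at once, a named character vector can serve as a lookup table. The edibility codes (e = edible, p = poisonous) follow the UCI dataset description; the names edibility_labels, codes, and full_labels are illustrative only:

```r
# Lookup table: names are the raw codes, values are the full labels
edibility_labels <- c(e = "Edible", p = "Poisonous")

# Hypothetical sample of raw codes as they appear in the first column
codes <- c("p", "e", "e", "p")

# Index the lookup table by code; unname() drops the code names
full_labels <- unname(edibility_labels[codes])
print(full_labels)
```

Unlike a regular-expression substitution, this lookup matches whole values only, so it cannot accidentally rewrite part of a longer string.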

A subset was created that keeps only the poisonous mushrooms with a musty odor, along with four of the columns:

subset(mushrooms, EDIBILITY == "Poisonous" & ODOR == "Musty", select = c(EDIBILITY, ODOR, POPULATION, HABITAT))

##      EDIBILITY  ODOR POPULATION HABITAT
## 6416 Poisonous Musty  Clustered   Woods
## 6669 Poisonous Musty  Clustered   Woods
## 6856 Poisonous Musty  Clustered   Woods
## 6946 Poisonous Musty  Clustered   Woods
## 6992 Poisonous Musty  Clustered   Woods
## 7035 Poisonous Musty  Clustered   Woods
## 7066 Poisonous Musty  Clustered   Woods
## 7092 Poisonous Musty  Clustered   Woods
## 7101 Poisonous Musty  Clustered   Woods
## 7112 Poisonous Musty  Clustered   Woods
## 7147 Poisonous Musty  Clustered   Woods
## 7167 Poisonous Musty  Clustered   Woods
## 7231 Poisonous Musty  Clustered   Woods
## 7266 Poisonous Musty  Clustered   Woods
## 7286 Poisonous Musty  Clustered   Woods
## 7324 Poisonous Musty  Clustered   Woods
## 7337 Poisonous Musty  Clustered   Woods
## 7343 Poisonous Musty  Clustered   Woods
## 7369 Poisonous Musty  Clustered   Woods
## 7387 Poisonous Musty  Clustered   Woods
## 7470 Poisonous Musty  Clustered   Woods
## 7481 Poisonous Musty  Clustered   Woods
## 7486 Poisonous Musty  Clustered   Woods
## 7537 Poisonous Musty  Clustered   Woods
## 7636 Poisonous Musty  Clustered   Woods
## 7710 Poisonous Musty  Clustered   Woods
## 7715 Poisonous Musty  Clustered   Woods
## 7728 Poisonous Musty  Clustered   Woods
## 7802 Poisonous Musty  Clustered   Woods
## 7806 Poisonous Musty  Clustered   Woods
## 7821 Poisonous Musty  Clustered   Woods
## 7911 Poisonous Musty  Clustered   Woods
## 7942 Poisonous Musty  Clustered   Woods
## 7982 Poisonous Musty  Clustered   Woods
## 8096 Poisonous Musty  Clustered   Woods
## 8115 Poisonous Musty  Clustered   Woods
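
Rather than printing every matching row, the result of subset() can be stored and counted with nrow(). The sketch below uses a small made-up data frame (toy is hypothetical) in place of mushrooms, and also shows the equivalent bracket-indexing form:

```r
# Hypothetical stand-in for the recoded mushrooms data frame
toy <- data.frame(
  EDIBILITY  = c("Poisonous", "e", "Poisonous", "Poisonous"),
  ODOR       = c("Musty", "n", "f", "Musty"),
  POPULATION = c("Clustered", "v", "s", "Clustered"),
  HABITAT    = c("Woods", "g", "u", "Woods"),
  stringsAsFactors = FALSE
)

# Same filter as in the report, stored instead of printed
musty <- subset(toy, EDIBILITY == "Poisonous" & ODOR == "Musty",
                select = c(EDIBILITY, ODOR, POPULATION, HABITAT))

# Equivalent filter written with bracket indexing
keep <- toy$EDIBILITY == "Poisonous" & toy$ODOR == "Musty"
musty_alt <- toy[keep, c("EDIBILITY", "ODOR", "POPULATION", "HABITAT")]

# Number of matching rows
n_musty <- nrow(musty)
print(n_musty)
```

Applied to the full data, nrow() should return 36, matching the number of rows listed above.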

What is the most common veil color among the mushrooms? A barplot of the veil color frequencies answers the question:

barplot(table(mushrooms$VEILCOLOR), ylim=c(0,8000), ylab="Frequency", main="Barplot of Veil Color Distribution")

“White” (coded as w in the data) is the most common veil color by far.
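
The same answer can also be obtained numerically instead of being read from the plot. A minimal sketch, using a made-up vector of veil color codes (veil is hypothetical; in the real data the column is mushrooms$VEILCOLOR):

```r
# Hypothetical veil color codes (w = white, n = brown, o = orange, y = yellow)
veil <- c("w", "w", "n", "w", "o", "w", "y")

# Frequency of each code, the same table that barplot() draws
counts <- table(veil)

# Sort descending to view the ranking, and pull out the top code by name
sorted_counts <- sort(counts, decreasing = TRUE)
most_common <- names(which.max(counts))

print(sorted_counts)
print(most_common)
```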

Assignment 1 DATA607

Mario Pena
