The Mushroom dataset was downloaded from the UCI Machine Learning Repository into a data frame with the following code:
mushrooms <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/agaricus-lepiota.data", header = FALSE, sep = ",")
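The data file has no header row, so `header = FALSE` makes `read.csv()` assign the default names V1, V2, and so on. That is why the columns are renamed further down. A minimal sketch of this behaviour, using a small made-up sample passed through the `text` argument instead of the URL so that it runs offline (the rows are illustrative, not taken from the dataset):

```r
# Made-up, letter-coded rows in the same comma-separated layout
# (hypothetical sample; the real file is much larger)
sample_text <- "p,x,s\ne,b,y"

toy_raw <- read.csv(text = sample_text, header = FALSE, sep = ",",
                    stringsAsFactors = FALSE)

# Without a header row, read.csv() falls back to V1, V2, V3, ...
names(toy_raw)  # "V1" "V2" "V3"
dim(toy_raw)    # 2 3
```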
The dimensions of the data frame can be checked with dim(). The output [1] 8124 23 means the data frame has 8124 rows and 23 columns:
dim(mushrooms)
## [1] 8124 23
The default column names (V1 to V23) were replaced with descriptive names for each attribute:
colnames(mushrooms) <- c("EDIBILITY", "CAP-SHAPE", "CAP-SURFACE", "CAP-COLOR", "BRUISES",
                         "ODOR", "GILL-ATTACHMENT", "GILL-SPACING", "GILL-SIZE", "GILL-COLOR",
                         "STALK-SHAPE", "STALK-ROOT", "STALK-SURFACE-ABOVE-RING",
                         "STALK-SURFACE-BELOW-RING", "STALK-COLOR-ABOVE-RING",
                         "STALK-COLOR-BELOW-RING", "VEIL-TYPE", "VEILCOLOR", "RING-NUMBER",
                         "RING-TYPE", "SPORE-PRINT-COLOR", "POPULATION", "HABITAT")
The new names can be listed with names():
names(mushrooms)
## [1] "EDIBILITY" "CAP-SHAPE"
## [3] "CAP-SURFACE" "CAP-COLOR"
## [5] "BRUISES" "ODOR"
## [7] "GILL-ATTACHMENT" "GILL-SPACING"
## [9] "GILL-SIZE" "GILL-COLOR"
## [11] "STALK-SHAPE" "STALK-ROOT"
## [13] "STALK-SURFACE-ABOVE-RING" "STALK-SURFACE-BELOW-RING"
## [15] "STALK-COLOR-ABOVE-RING" "STALK-COLOR-BELOW-RING"
## [17] "VEIL-TYPE" "VEILCOLOR"
## [19] "RING-NUMBER" "RING-TYPE"
## [21] "SPORE-PRINT-COLOR" "POPULATION"
## [23] "HABITAT"
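Names that contain hyphens, such as `CAP-SHAPE`, are not syntactically valid R names. They must be wrapped in backticks when used with `$` or inside `subset()`; otherwise R parses `CAP-SHAPE` as `CAP` minus `SHAPE`. A minimal sketch on a made-up two-column data frame (the values are illustrative only):

```r
# Hypothetical two-column frame; names<- does not alter hyphenated names
df <- data.frame(c("p", "e"), c("x", "b"), stringsAsFactors = FALSE)
names(df) <- c("EDIBILITY", "CAP-SHAPE")

# Backticks are required for non-syntactic names
df$`CAP-SHAPE`

# The same applies inside subset()
sub_df <- subset(df, `CAP-SHAPE` == "x", select = c(EDIBILITY, `CAP-SHAPE`))

# make.names() shows the syntactic form R would substitute (hyphens become dots)
make.names(c("CAP-SHAPE", "VEIL-TYPE"))  # "CAP.SHAPE" "VEIL.TYPE"
```

Using underscores or dots instead of hyphens in the names would remove the need for backticks.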
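For the four variables used in the subset below, the one-letter codes of interest were replaced with their full names. The patterns are anchored with ^ and $ so that sub() only rewrites values consisting of exactly that letter (an unanchored "d" would also match inside "Woods" if the line were run twice):
mushrooms$EDIBILITY <- sub("^p$", "Poisonous", mushrooms$EDIBILITY)     # p = poisonous
mushrooms$ODOR <- sub("^m$", "Musty", mushrooms$ODOR)                    # m = musty
mushrooms$POPULATION <- sub("^c$", "Clustered", mushrooms$POPULATION)    # c = clustered
mushrooms$HABITAT <- sub("^d$", "Woods", mushrooms$HABITAT)              # d = woods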
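`sub()` performs a regular-expression substitution, so an unanchored pattern can match inside longer strings. An exact-match lookup avoids regular expressions altogether and extends naturally to every code in a column. A sketch, assuming a hypothetical `decode()` helper and a lookup vector that covers only the two EDIBILITY codes (`e` = edible, `p` = poisonous):

```r
# Hypothetical helper (not part of the original analysis): replace each
# code with its label using a named lookup vector. Codes that are not in
# the lookup are returned unchanged.
decode <- function(x, lookup) {
  x <- as.character(x)          # factors would otherwise index by level number
  out <- unname(lookup[x])      # exact name matching; unknown codes give NA
  unmatched <- is.na(out)
  out[unmatched] <- x[unmatched]  # keep unknown codes as they were
  out
}

# Labels for the two EDIBILITY codes
edibility_codes <- c(e = "Edible", p = "Poisonous")

decode(c("p", "e", "p"), edibility_codes)  # "Poisonous" "Edible" "Poisonous"
decode(c("p", "z"), edibility_codes)       # "Poisonous" "z"
```

On the original one-letter codes this would be applied as `mushrooms$EDIBILITY <- decode(mushrooms$EDIBILITY, edibility_codes)`. Adding more entries to the lookup vector decodes more codes without writing further `sub()` calls.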
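A subset was then taken containing only the mushrooms that are both poisonous and musty-smelling, keeping four of the columns. All 36 rows returned have a clustered population and a woods habitat: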
subset(mushrooms, EDIBILITY == "Poisonous" & ODOR == "Musty", select = c(EDIBILITY, ODOR, POPULATION, HABITAT))
## EDIBILITY ODOR POPULATION HABITAT
## 6416 Poisonous Musty Clustered Woods
## 6669 Poisonous Musty Clustered Woods
## 6856 Poisonous Musty Clustered Woods
## 6946 Poisonous Musty Clustered Woods
## 6992 Poisonous Musty Clustered Woods
## 7035 Poisonous Musty Clustered Woods
## 7066 Poisonous Musty Clustered Woods
## 7092 Poisonous Musty Clustered Woods
## 7101 Poisonous Musty Clustered Woods
## 7112 Poisonous Musty Clustered Woods
## 7147 Poisonous Musty Clustered Woods
## 7167 Poisonous Musty Clustered Woods
## 7231 Poisonous Musty Clustered Woods
## 7266 Poisonous Musty Clustered Woods
## 7286 Poisonous Musty Clustered Woods
## 7324 Poisonous Musty Clustered Woods
## 7337 Poisonous Musty Clustered Woods
## 7343 Poisonous Musty Clustered Woods
## 7369 Poisonous Musty Clustered Woods
## 7387 Poisonous Musty Clustered Woods
## 7470 Poisonous Musty Clustered Woods
## 7481 Poisonous Musty Clustered Woods
## 7486 Poisonous Musty Clustered Woods
## 7537 Poisonous Musty Clustered Woods
## 7636 Poisonous Musty Clustered Woods
## 7710 Poisonous Musty Clustered Woods
## 7715 Poisonous Musty Clustered Woods
## 7728 Poisonous Musty Clustered Woods
## 7802 Poisonous Musty Clustered Woods
## 7806 Poisonous Musty Clustered Woods
## 7821 Poisonous Musty Clustered Woods
## 7911 Poisonous Musty Clustered Woods
## 7942 Poisonous Musty Clustered Woods
## 7982 Poisonous Musty Clustered Woods
## 8096 Poisonous Musty Clustered Woods
## 8115 Poisonous Musty Clustered Woods
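The same filter can be written with logical indexing, and `table()` gives a quick check of whether population and habitat vary among the matching rows. A sketch on a small made-up data frame standing in for `mushrooms` (the rows are illustrative, not taken from the dataset; only the four decoded codes are spelled out, as in the text above):

```r
# Made-up rows; undecoded values keep their one-letter codes
toy_rows <- data.frame(
  EDIBILITY  = c("Poisonous", "Poisonous", "e", "Poisonous"),
  ODOR       = c("Musty", "Musty", "Musty", "n"),
  POPULATION = c("Clustered", "Clustered", "s", "v"),
  HABITAT    = c("Woods", "Woods", "g", "u"),
  stringsAsFactors = FALSE
)

# Rows that are both poisonous and musty; which() drops NA conditions,
# matching what subset() does
keep <- which(toy_rows$EDIBILITY == "Poisonous" & toy_rows$ODOR == "Musty")
musty_poisonous <- toy_rows[keep, c("EDIBILITY", "ODOR", "POPULATION", "HABITAT")]

nrow(musty_poisonous)

# Cross-tabulate population against habitat for the selected rows
table(musty_poisonous$POPULATION, musty_poisonous$HABITAT)
```

Without `which()`, plain logical indexing returns a row of `NA`s wherever the condition is `NA`, which is why it is used here.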
Which veil color is the most common? A bar plot of the veil-color frequencies shows the answer:
barplot(table(mushrooms$VEILCOLOR), ylim=c(0,8000), ylab="Frequency", main="Barplot of Veil Color Distribution")
White (coded w in the data) is by far the most common veil color.
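Reading the answer off the plot works, but the frequency table behind it gives the answer directly. A sketch on a made-up vector of veil-color codes (`w` = white, `n` = brown, `o` = orange, `y` = yellow); the values are illustrative, not the real counts:

```r
# Made-up veil-color codes standing in for mushrooms$VEILCOLOR
veil <- c("w", "w", "n", "w", "o", "w", "y")

counts <- table(veil)

# Levels ordered from most to least frequent
sort(counts, decreasing = TRUE)

# Name of the most frequent level
most_common <- names(which.max(counts))
most_common  # "w"
```

On the real data the same two lines would use `table(mushrooms$VEILCOLOR)`. Note that `which.max()` returns only the first level if several are tied.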