Introduction
The main aim of this project is to find associations between mushroom
characteristics and edibility. The dataset includes information about
more than 8000 different mushrooms. Each mushroom is described with 23
characteristics, which are:
1. edibility (edible or poisonous)
2. cap shape (bell, conical, convex, flat, knobbed, sunken)
3. surface of the cap (fibrous, grooves, scaly, smooth)
4. color of the cap (brown, buff, cinnamon, gray, green, pink, purple,
red, white, yellow)
5. bruises? (whether bruises appear or not)
6. odor (almond, anise, creosote, fishy, foul, musty, none, pungent,
spicy)
7. gill attachment (attached, descending, free, notched)
8. gill spacing (close, crowded, distant)
9. gill size (broad, narrow)
10. gill color (black, brown, buff, chocolate, gray, green, orange,
pink, purple, red, white, yellow)
11. stalk shape (enlarging, tapering)
12. stalk root (bulbous, club, cup, equal, rhizomorphs, rooted,
missing)
13. stalk surface above ring (fibrous, scaly, silky, smooth)
14. stalk surface below ring (fibrous, scaly, silky, smooth)
15. stalk color above ring (brown, buff, cinnamon, gray, orange, pink,
red, white, yellow)
16. stalk color below ring (brown, buff, cinnamon, gray, orange, pink,
red, white, yellow)
17. veil type (partial, universal)
18. veil color (brown, orange, white, yellow)
19. ring number (none, one, two)
20. ring type (cobwebby, evanescent, flaring, large, none, pendant,
sheathing, zone)
21. spore print color (black, brown, buff, chocolate, green, orange,
purple, white, yellow)
22. population (abundant, clustered, numerous, scattered, several,
solitary)
23. habitat (grasses, leaves, meadows, paths, urban, waste, woods)
The main issue with the original dataset is that every feature value is
encoded as a single letter.
head(original_data)
## p x s n t p.1 f c n.1 k e e.1 s.1 s.2 w w.1 p.2 w.2 o p.3 k.1 s.3 u
## 1 e x s y t a f c b k e c s s w w p w o p n n g
## 2 e b s w t l f c b n e c s s w w p w o p n n m
## 3 p x y w t p f c n n e e s s w w p w o p k s u
## 4 e x s g f n f w b k t e s s w w p w o e n a g
## 5 e x y y t a f c b n e c s s w w p w o p k n g
## 6 e b s w t a f c b g e c s s w w p w o p k n m
All the features had to be decoded according to the description of the dataset, and some of the values had to be renamed: the color names, for example, repeat across features, and repeated labels are dropped automatically when the data are converted to transaction form. The decoded dataset is shown below.
head(mush)
## edibility cap_shape cap_surface cap_col bruises odor gill_attachment gill_spacing gill_size gill_color stalk_shape
## 1 edible convex_cap smooth.surface yellow.cap bruises almond free close broad black.gill enlarging
## 2 edible bell_cap smooth.surface white.cap bruises anise free close broad brown.gill enlarging
## 3 poisonous convex_cap scaly.surface white.cap bruises pungent free close narrow brown.gill enlarging
## 4 edible convex_cap smooth.surface gray.cap no_bruises none free crowded broad black.gill tapering
## 5 edible convex_cap scaly.surface yellow.cap bruises almond free close broad brown.gill enlarging
## 6 edible bell_cap smooth.surface white.cap bruises almond free close broad gray.gill enlarging
## stalk_root stalk_surface_above_ring stalk_surface_below__ring stalk_color_above_ring stalk_below_ring veil_type veil_color
## 1 club smooth.above.ring smooth.below.ring white.above.ring white.below.ring partial white.veil
## 2 club smooth.above.ring smooth.below.ring white.above.ring white.below.ring partial white.veil
## 3 equal smooth.above.ring smooth.below.ring white.above.ring white.below.ring partial white.veil
## 4 equal smooth.above.ring smooth.below.ring white.above.ring white.below.ring partial white.veil
## 5 club smooth.above.ring smooth.below.ring white.above.ring white.below.ring partial white.veil
## 6 club smooth.above.ring smooth.below.ring white.above.ring white.below.ring partial white.veil
## ring_number ring_type spore_print_color population habitat
## 1 one pendant brown.spore numerous grasses
## 2 one pendant brown.spore numerous meadows
## 3 one pendant black.spore scattered urban
## 4 one evanescent brown.spore abundant grasses
## 5 one pendant black.spore numerous grasses
## 6 one pendant black.spore numerous meadows
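The decoding step described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the lookup vectors `odor_codes` and `color_codes` and the data frame `raw` are hypothetical names, and only the few codes visible in the first rows of the output are included. The sketch also shows why the color labels need a feature-specific suffix.

```r
# Hypothetical lookup tables covering only a few of the letter codes
odor_codes  <- c(a = "almond", l = "anise", p = "pungent", n = "none")
color_codes <- c(n = "brown", w = "white", y = "yellow", g = "gray")

# Three rows in the original one-letter encoding (subset of columns)
raw <- data.frame(
  odor       = c("a", "l", "p"),
  cap_col    = c("y", "w", "w"),
  veil_color = c("w", "w", "w"),
  stringsAsFactors = FALSE
)

# Decode each column; colors get a suffix naming the feature they belong to
decoded <- data.frame(
  odor       = unname(odor_codes[raw$odor]),
  cap_col    = paste0(color_codes[raw$cap_col], ".cap"),
  veil_color = paste0(color_codes[raw$veil_color], ".veil"),
  stringsAsFactors = FALSE
)
print(decoded)

# Without the suffix, row 2 would hold "white" twice and lose one item as a set
without_suffix <- c("anise", "white", "white")
print(unique(without_suffix))

# With the suffix, all three items of row 2 stay distinct
print(unique(unlist(decoded[2, ])))
```

Because a transaction is a set of items, the cap color and the veil color of the second mushroom would collapse into a single "white" item without the suffix.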
The transactions:
inspect(mush_transactions[1:2])
## items
## [1] {almond,
## black.gill,
## broad,
## brown.spore,
## bruises,
## close,
## club,
## convex_cap,
## edible,
## enlarging,
## free,
## grasses,
## numerous,
## one,
## partial,
## pendant,
## smooth.above.ring,
## smooth.below.ring,
## smooth.surface,
## white.above.ring,
## white.below.ring,
## white.veil,
## yellow.cap}
## [2] {anise,
## bell_cap,
## broad,
## brown.gill,
## brown.spore,
## bruises,
## close,
## club,
## edible,
## enlarging,
## free,
## meadows,
## numerous,
## one,
## partial,
## pendant,
## smooth.above.ring,
## smooth.below.ring,
## smooth.surface,
## white.above.ring,
## white.below.ring,
## white.cap,
## white.veil}
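The report does not show how `mush_transactions` was built, so the following is only a sketch of one possible conversion with the `arules` package. The data frame `mush_small` and the intermediate names are hypothetical; another route would be to write the decoded table to a file and read it back with `read.transactions()`.

```r
library(arules)

# Hypothetical three-row excerpt of the decoded data
mush_small <- data.frame(
  edibility = c("edible", "edible", "poisonous"),
  odor      = c("almond", "anise", "pungent"),
  cap_col   = c("yellow.cap", "white.cap", "white.cap"),
  stringsAsFactors = FALSE
)

# One character vector of item labels per row
m <- as.matrix(mush_small)
baskets <- split(m, row(m))

# Coerce the list of baskets to a transactions object
trans <- as(baskets, "transactions")

inspect(trans)
print(itemLabels(trans))
```

Each row becomes one transaction whose items are the decoded labels, which matches the form of the two transactions shown above.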
Let us apply the Apriori algorithm to form the association rules. The item “poisonous” appears in roughly half of the transactions and the item “edible” in the other half, so we restrict the right-hand side of the rules to these two items. Additionally, the dataset may produce too many meaningless rules, so relatively high support and confidence thresholds are needed.
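A minimal sketch of such a call, using `apriori()` from `arules` on a toy set of transactions. The baskets and the thresholds below are assumptions chosen for the toy data; the thresholds actually used for `mushroom_rules` are not stated in this report.

```r
library(arules)

# Toy transactions (hypothetical), each holding the class label and two features
baskets <- list(
  c("edible", "almond", "broad"),
  c("edible", "anise", "broad"),
  c("edible", "almond", "broad"),
  c("poisonous", "foul", "narrow"),
  c("poisonous", "foul", "broad"),
  c("poisonous", "fishy", "narrow")
)
trans <- as(baskets, "transactions")

# Mine rules whose right-hand side is restricted to the two class items;
# every other item may appear only on the left-hand side
toy_rules <- apriori(
  trans,
  parameter  = list(support = 0.3, confidence = 0.9, minlen = 2),
  appearance = list(rhs = c("edible", "poisonous"), default = "lhs"),
  control    = list(verbose = FALSE)
)

inspect(sort(toy_rules, by = "lift"))
```

Restricting the right-hand side through `appearance` keeps only rules that predict edibility, and `minlen = 2` excludes rules with an empty left-hand side.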
inspect(mushroom_rules[1:10])
## lhs rhs support confidence coverage lift count
## [1] {buff.above.ring} => {poisonous} 0.05318232 1.0000000 0.05318232 2.074840 432
## [2] {buff.below.ring} => {poisonous} 0.05318232 1.0000000 0.05318232 2.074840 432
## [3] {brown.above.ring} => {poisonous} 0.05318232 0.9642857 0.05515204 2.000739 432
## [4] {purple.gill} => {edible} 0.05465961 0.9024390 0.06056876 1.742042 444
## [5] {brown.below.ring} => {poisonous} 0.05515204 0.8750000 0.06303090 1.815485 448
## [6] {club} => {edible} 0.06303090 0.9208633 0.06844762 1.777608 512
## [7] {gray.above.ring} => {edible} 0.07090976 1.0000000 0.07090976 1.930371 576
## [8] {gray.below.ring} => {edible} 0.07090976 1.0000000 0.07090976 1.930371 576
## [9] {spicy} => {poisonous} 0.07090976 1.0000000 0.07090976 2.074840 576
## [10] {fishy} => {poisonous} 0.07090976 1.0000000 0.07090976 2.074840 576
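The quality measures in this table can be reproduced by hand. The sketch below does so for rule [3], {brown.above.ring} => {poisonous}. The totals are not stated in the report; they are inferred from the table (432 / 0.05318232 gives 8123 transactions, and a lift of 2.074840 at confidence 1 implies 3915 poisonous transactions), so they should be treated as assumptions.

```r
# Counts inferred from the reported table (assumptions, see text)
n_total     <- 8123  # all transactions
n_poisonous <- 3915  # transactions containing "poisonous"
n_lhs       <- 448   # transactions containing "brown.above.ring"
n_both      <- 432   # transactions containing both items (the count column)

support    <- n_both / n_total                     # P(lhs and rhs)
coverage   <- n_lhs / n_total                      # P(lhs)
confidence <- n_both / n_lhs                       # P(rhs | lhs)
lift       <- confidence / (n_poisonous / n_total) # P(rhs | lhs) / P(rhs)

print(round(c(support = support, confidence = confidence,
              coverage = coverage, lift = lift), 7))
```

A lift of about 2 means that a mushroom with a brown stalk above the ring is roughly twice as likely to be poisonous as a mushroom drawn at random from the dataset.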
Now, let us sort the rules based on their lift, confidence and support.
inspect(sort(mushroom_rules, by = "lift")[1:10])
## lhs rhs support confidence coverage lift count
## [1] {buff.above.ring} => {poisonous} 0.05318232 1 0.05318232 2.07484 432
## [2] {buff.below.ring} => {poisonous} 0.05318232 1 0.05318232 2.07484 432
## [3] {spicy} => {poisonous} 0.07090976 1 0.07090976 2.07484 576
## [4] {fishy} => {poisonous} 0.07090976 1 0.07090976 2.07484 576
## [5] {large} => {poisonous} 0.15954697 1 0.15954697 2.07484 1296
## [6] {buff.gill} => {poisonous} 0.21272929 1 0.21272929 2.07484 1728
## [7] {foul} => {poisonous} 0.26591161 1 0.26591161 2.07484 2160
## [8] {buff.above.ring, large} => {poisonous} 0.05318232 1 0.05318232 2.07484 432
## [9] {buff.above.ring, chocolate.spore} => {poisonous} 0.05318232 1 0.05318232 2.07484 432
## [10] {buff.above.ring, foul} => {poisonous} 0.05318232 1 0.05318232 2.07484 432
inspect(sort(mushroom_rules, by = "confidence")[1:10])
## lhs rhs support confidence coverage lift count
## [1] {buff.above.ring} => {poisonous} 0.05318232 1 0.05318232 2.074840 432
## [2] {buff.below.ring} => {poisonous} 0.05318232 1 0.05318232 2.074840 432
## [3] {gray.above.ring} => {edible} 0.07090976 1 0.07090976 1.930371 576
## [4] {gray.below.ring} => {edible} 0.07090976 1 0.07090976 1.930371 576
## [5] {spicy} => {poisonous} 0.07090976 1 0.07090976 2.074840 576
## [6] {fishy} => {poisonous} 0.07090976 1 0.07090976 2.074840 576
## [7] {large} => {poisonous} 0.15954697 1 0.15954697 2.074840 1296
## [8] {buff.gill} => {poisonous} 0.21272929 1 0.21272929 2.074840 1728
## [9] {foul} => {poisonous} 0.26591161 1 0.26591161 2.074840 2160
## [10] {buff.above.ring, large} => {poisonous} 0.05318232 1 0.05318232 2.074840 432
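Many of the rules above are tied: they share a confidence of 1 and a lift of 2.07484, so their relative order under those two measures carries little information. Support can break such ties. The sketch below ranks a few of the reported single-item rules in base R, with the values copied from the tables above; the data frame `reported` is a hypothetical stand-in for the rules object. On the rules object itself, the analogous call would be `sort(mushroom_rules, by = "support")`.

```r
# A few single-item rules copied from the tables above
reported <- data.frame(
  lhs        = c("buff.above.ring", "gray.above.ring", "spicy",
                 "large", "buff.gill", "foul"),
  rhs        = c("poisonous", "edible", "poisonous",
                 "poisonous", "poisonous", "poisonous"),
  support    = c(0.05318232, 0.07090976, 0.07090976,
                 0.15954697, 0.21272929, 0.26591161),
  confidence = c(1, 1, 1, 1, 1, 1),
  lift       = c(2.074840, 1.930371, 2.074840,
                 2.074840, 2.074840, 2.074840),
  stringsAsFactors = FALSE
)

# Rank by lift, then confidence, then support (all in decreasing order)
ranked <- reported[order(-reported$lift,
                         -reported$confidence,
                         -reported$support), ]
rownames(ranked) <- NULL
print(ranked)
```

With support as the final tie-breaker, the rules that apply to the largest number of mushrooms come first among otherwise equal rules.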
Based on the top ten rules ranked by confidence, lift and support, we can suggest that a mushroom is likely to be poisonous if it has any of the following: a spicy, foul or fishy odor, a buff stalk color above or below the ring, a large ring, or a buff gill color.