Tidyverse Assignment Part 02 Extended from SHaslett-607-Tidyverse-assignment.Rmd
Assignment Overview
Create an example: using one or more tidyverse packages and any dataset from fivethirtyeight.com or Kaggle, create a programming sample “vignette” that demonstrates how to use one or more capabilities of the selected tidyverse package with your selected dataset.
Assignment Response
For this assignment, I have chosen to demonstrate the basic functionality provided by the dplyr package, which is part of the tidyverse.
Datasource
https://www.kaggle.com/residentmario/ramen-ratings/data
Import the Ramen Ratings dataset using the “readr” package.
Check that the data imported successfully.
## # A tibble: 6 x 7
## Review.. Brand Variety Style Country Stars Top.Ten
## <int> <fct> <fct> <fct> <fct> <fct> <fct>
## 1 2580 New Touch "T's Restaurant Tantanme~ Cup Japan 3.75 ""
## 2 2579 Just Way Noodles Spicy Hot Sesame~ Pack Taiwan 1 ""
## 3 2578 Nissin Cup Noodles Chicken Vege~ Cup USA 2.25 ""
## 4 2577 Wei Lih GGE Ramen Snack Tomato F~ Pack Taiwan 2.75 ""
## 5 2576 Ching's ~ Singapore Curry Pack India 3.75 ""
## 6 2575 Samyang ~ Kimchi song Song Ramen Pack South K~ 4.75 ""
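The import chunk itself does not appear in the rendered output above. A minimal sketch of what it might look like, assuming readr's read_csv() and a small inline stand-in for the downloaded file. The four rows are copied from the output above with the Variety and Top.Ten columns left out for brevity. The inline text and the file name in the comment are illustrative assumptions, not the author's code.

```r
library(readr)

# Inline stand-in for the downloaded Kaggle CSV (assumption for illustration).
# With the real file, pass its path instead, e.g. a hypothetical
# read_csv("ramen-ratings.csv").
csv_text <- "Review..,Brand,Style,Country,Stars
2580,New Touch,Cup,Japan,3.75
2579,Just Way,Pack,Taiwan,1
2578,Nissin,Cup,USA,2.25
2577,Wei Lih,Pack,Taiwan,2.75
"

# I() marks the string as literal data rather than a file path.
ramen_ratings <- read_csv(I(csv_text))

# head() returns the first six rows by default (here, all four).
head(ramen_ratings)
```

Note that the output above shows factor (<fct>) columns, whereas read_csv() returns character or numeric columns by default, so the original import call may have differed from this sketch.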
Using the “dplyr” package’s “slice” function, select a specific range of rows from the dataset.
## # A tibble: 10 x 7
## Review.. Brand Variety Style Country Stars Top.Ten
## <int> <fct> <fct> <fct> <fct> <fct> <fct>
## 1 2571 KOKA The Original Spicy Stir-~ Pack Singapo~ 2.5 ""
## 2 2570 Tao Kae~ Creamy tom Yum Kung Flav~ Pack Thailand 5 ""
## 3 2569 Yamachan Yokohama Tonkotsu Shoyu Pack USA 5 ""
## 4 2568 Nongshim Mr. Bibim Stir-Fried Kim~ Pack South K~ 4.25 ""
## 5 2567 Nissin Deka Buto Kimchi Pork Fl~ Bowl Japan 4.5 ""
## 6 2566 Nissin Demae Ramen Bar Noodle A~ Pack Hong Ko~ 5 ""
## 7 2565 KOKA Mushroom Flavour Instant~ Cup Singapo~ 3.5 ""
## 8 2564 TRDP Mario Masala Noodles Pack India 3.75 ""
## 9 2563 Yamachan Tokyo Shoyu Ramen Pack USA 5 ""
## 10 2562 Binh Tay Mi Hai Cua Pack Vietnam 4 ""
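The slice() call behind the output above is not shown. In the rows displayed in this document, row 1 is review 2580 and the review number drops by one per row, so reviews 2571 to 2562 would sit in rows 10 to 19 and the call was presumably slice(10:19). A sketch under that assumption, using a hypothetical 40-row tibble whose row_id column simply records each row's original position:

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in data: row_id records each row's original position.
toy_ratings <- tibble(row_id = 1:40)

# slice() selects rows by integer position; 10:19 keeps rows 10 through 19.
selected_rows <- toy_ratings %>% slice(10:19)
selected_rows
```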
Now use the “slice” function to select two row ranges and four individual rows in a single call.
# Select rows 2 to 9, 26 to 30, and rows 40, 21, 16, and 35.
multiple_selected_rows <- ramen_ratings %>% slice(c(2:9, 26:30, 40, 21, 16, 35))
multiple_selected_rows
## # A tibble: 17 x 7
## Review.. Brand Variety Style Country Stars Top.Ten
## <int> <fct> <fct> <fct> <fct> <fct> <fct>
## 1 2579 Just Way Noodles Spicy Hot Sesame~ Pack Taiwan 1 ""
## 2 2578 Nissin Cup Noodles Chicken Vege~ Cup USA 2.25 ""
## 3 2577 Wei Lih GGE Ramen Snack Tomato F~ Pack Taiwan 2.75 ""
## 4 2576 Ching's~ Singapore Curry Pack India 3.75 ""
## 5 2575 Samyang~ Kimchi song Song Ramen Pack South K~ 4.75 ""
## 6 2574 Acecook Spice Deli Tantan Men Wi~ Cup Japan 4 ""
## 7 2573 Ikeda S~ Nabeyaki Kitsune Udon Tray Japan 3.75 ""
## 8 2572 Ripe'n'~ Hokkaido Soy Sauce Ramen Pack Japan 0.25 ""
## 9 2555 Samyang~ Song Song Kimchi Big Bowl Bowl South K~ 4.25 ""
## 10 2554 Yum-Mie Instant Noodles Beef In ~ Pack Ghana 3.5 ""
## 11 2553 Nissin Hakata Ramen Noodle Whit~ Bowl Japan 4.75 ""
## 12 2552 MyKuali Penang White Curry Rice ~ Bowl Malaysia 5 ""
## 13 2551 KOKA Signature Tom Yum Flavor~ Pack Singapo~ 4 ""
## 14 2541 Nissin Cup Noodles Very Veggie ~ Cup USA 5 ""
## 15 2560 Nissin Cup Noodles Laksa Flavour Cup Hong Ko~ 4.25 ""
## 16 2565 KOKA Mushroom Flavour Instant~ Cup Singapo~ 3.5 ""
## 17 2546 New Tou~ Sugo-Men Kyoto Backfat S~ Bowl Japan 3.75 ""
Use dplyr’s “filter” function to select rows based on specified conditions.
In this example, we will select only the rows that have a 5-star rating.
## # A tibble: 369 x 7
## Review.. Brand Variety Style Country Stars Top.Ten
## <int> <fct> <fct> <fct> <fct> <fct> <fct>
## 1 2570 Tao Kae ~ Creamy tom Yum Kung Fla~ Pack Thailand 5 ""
## 2 2569 Yamachan Yokohama Tonkotsu Shoyu Pack USA 5 ""
## 3 2566 Nissin Demae Ramen Bar Noodle ~ Pack Hong Ko~ 5 ""
## 4 2563 Yamachan Tokyo Shoyu Ramen Pack USA 5 ""
## 5 2559 Jackpot ~ Beef Ramen Pack USA 5 ""
## 6 2558 KOKA Creamy Soup With Crushe~ Cup Singapo~ 5 ""
## 7 2552 MyKuali Penang White Curry Rice~ Bowl Malaysia 5 ""
## 8 2550 Samyang ~ Paegaejang Ramen Pack South K~ 5 ""
## 9 2545 KOKA Instant Noodles Laksa S~ Pack Singapo~ 5 ""
## 10 2543 KOKA Curry Flavour Instant N~ Cup Singapo~ 5 ""
## # ... with 359 more rows
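The filter() call that produced the output above is not shown. A minimal sketch of how it might be written, assuming Stars is stored as a factor as in the printed tibble. The stand-in table below is a hand-copied subset of rows shown earlier in this document, and the names toy_ratings and five_star_ramen are hypothetical.

```r
library(dplyr)
library(tibble)

# Small stand-in table; rows copied from output shown earlier in this document.
toy_ratings <- tibble(
  Brand   = c("New Touch", "Just Way", "Yamachan", "TRDP", "Yamachan", "Binh Tay"),
  Country = c("Japan", "Taiwan", "USA", "India", "USA", "Vietnam"),
  Stars   = factor(c("3.75", "1", "5", "3.75", "5", "4"))
)

# Stars is a factor, so compare against the label "5" rather than a number.
five_star_ramen <- toy_ratings %>% filter(Stars == "5")
five_star_ramen
```

filter() keeps only the rows for which the condition is TRUE. Several conditions can be combined with & or passed as separate arguments.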
Finally, use dplyr’s “select” function to select specific data columns from the dataset.
In this example, we will select the “Brand”, “Variety”, “Country”, “Stars”, and “Top.Ten” columns for brands that have a Top.Ten listing.
# Select only the rows with Top.Ten column entries whilst removing those
# with NA, empty, or "\n" values.
ramen_ratings_filtered <- filter(ramen_ratings, Top.Ten != "" & !is.na(Top.Ten) & Top.Ten != "\n")
ramen_top_ten <- select(ramen_ratings_filtered, c("Brand", "Variety", "Country", "Stars", "Top.Ten"))
ramen_top_ten
## # A tibble: 37 x 5
## Brand Variety Country Stars Top.Ten
## <fct> <fct> <fct> <fct> <fct>
## 1 MAMA Instant Noodles Coconut Milk Flavour Myanmar 5 2016 #~
## 2 Prima Taste Singapore Laksa Wholegrain La Mian Singapo~ 5 2016 #1
## 3 Prima Juzz's Mee Creamy Chicken Flavour Singapo~ 5 2016 #8
## 4 Prima Taste Singapore Curry Wholegrain La Mian Singapo~ 5 2016 #5
## 5 Tseng Noodl~ Scallion With Sichuan Pepper Flavor Taiwan 5 2016 #9
## 6 Wugudaochang Tomato Beef Brisket Flavor Purple P~ China 5 2016 #7
## 7 A-Sha Dry N~ Veggie Noodle Tomato Noodle With Vi~ Taiwan 5 2015 #~
## 8 MyKuali Penang Hokkien Prawn Noodle (New Im~ Malaysia 5 2015 #7
## 9 CarJEN Nyonya Curry Laksa Malaysia 5 2015 #4
## 10 Maruchan Gotsumori Sauce Yakisoba Japan 5 2015 #9
## # ... with 27 more rows
===========================================================================================
Extended SHaslett-607-Tidyverse-assignment.Rmd By Forhad Akbar
Added dplyr function group_by().
group_by capability tutorial with tally
Description: Using group_by() together with tally(), we can count the number of rows in each category.
Usage: group_by(.data, …)
Example: Find the count by Ramen Style
## # A tibble: 8 x 2
## Style n
## <fct> <int>
## 1 "" 2
## 2 Bar 1
## 3 Bowl 481
## 4 Box 6
## 5 Can 1
## 6 Cup 450
## 7 Pack 1531
## 8 Tray 108
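The chunk that produced the counts above is not shown. A sketch of the group_by() and tally() pattern, assuming the result is stored in an object named style (the name the ggplot() call in this document expects). The stand-in table is a hand-copied subset of rows shown earlier, so its counts differ from the full-dataset counts above.

```r
library(dplyr)
library(tibble)

# Stand-in table; Brand and Style values copied from rows shown earlier.
toy_ratings <- tibble(
  Brand = c("New Touch", "Just Way", "Nissin", "Wei Lih", "Nissin", "Yamachan"),
  Style = c("Cup", "Pack", "Cup", "Pack", "Bowl", "Pack")
)

# group_by() defines one group per Style; tally() counts the rows in each
# group and returns the counts in a column named n.
style <- toy_ratings %>%
  group_by(Style) %>%
  tally()
style
```

The shorthand count(toy_ratings, Style) gives the same counts in one call.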
Added the tidyverse (ggplot2) function ggplot() to plot the count for each style.
# Horizontal bar chart of the per-style counts; `style` holds the Style and n
# columns produced by the group_by()/tally() step above.
ggplot(style, aes(x = reorder(Style, n), y = n)) +
  geom_bar(stat = "identity", position = position_dodge(), fill = "steelblue") +
  geom_text(aes(label = n), vjust = 0.5, hjust = 1, position = position_dodge(width = 0.9), color = "black") +
  ggtitle("Ramen Style By Count") +
  xlab("Style") + ylab("Count") +
  coord_flip()