GGPLOT

ggplot2 is a tidyverse package for visualizing data so that our findings are clearer when we present them. In this document I show how to use ggplot2 geoms beyond the usual bar plots and line charts. The demo data set contains ramen ratings, which lets me show how ggplot2 plots both discrete and continuous variables.

Read Data

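library(tidyverse)  # loads ggplot2, dplyr and tidyr, which are all used below
# map_data() further down also needs the maps package to be installed
ramen_ratings <- read.csv("https://raw.githubusercontent.com/moiyajosephs/Data607-Project2/main/ramen-ratings.csv")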

Plot Ramen

Geom Count

When plotting two discrete variables, geom_count is a good choice. geom_count is a variation of geom_point that counts the observations at each combination of x and y and maps that count to the size of the dot. The legend shows the scale: the larger the dot, the more rows share that pair of values.

ggplot(ramen_ratings, aes(Brand, Style, color = Style)) +
  geom_count() +
  # There are too many brands to label, so the x-axis text is hidden
  theme(axis.text.x = element_blank())
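To see what geom_count actually computes, here is a minimal sketch on a small hypothetical data frame (`toy`, not the ramen data). It uses ggplot2's `layer_data()` to pull out the counts behind the dots; they should agree with a plain `table()` of the same two columns.

```r
library(ggplot2)

# Hypothetical toy data: brand "A" has two Cup rows and one Pack row
toy <- data.frame(
  Brand = c("A", "A", "A", "B"),
  Style = c("Cup", "Cup", "Pack", "Cup")
)

p <- ggplot(toy, aes(Brand, Style)) + geom_count()

# geom_count uses stat_sum(), which adds a computed column `n`:
# the number of rows at each (Brand, Style) location
counts <- layer_data(p)
print(counts[, c("x", "y", "n")])

# The same numbers computed directly with base R
print(table(toy$Brand, toy$Style))
```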

Geom Col

geom_col is closely related to geom_bar, since both draw bar charts. The difference is in the bar heights: geom_bar makes each bar proportional to the number of rows in each x group (it counts them), whereas geom_col uses the values in the data as the bar heights.
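A minimal sketch of the difference, using a hypothetical three-row data frame rather than the ramen data: geom_bar() counts rows per brand, while geom_col() stacks the Stars values themselves. `layer_data()` is used only to inspect the computed bar heights.

```r
library(ggplot2)

# Hypothetical toy data: brand "A" has two reviews, brand "B" has one
toy <- data.frame(Brand = c("A", "A", "B"), Stars = c(4, 3, 5))

# geom_bar(): bar height is the number of rows per Brand (stat_count)
p_bar <- ggplot(toy, aes(Brand)) + geom_bar()
bar_heights <- layer_data(p_bar)$count

# geom_col(): bar height comes from Stars (stat_identity); rows that share
# a Brand are stacked, so brand A reaches 4 + 3
p_col <- ggplot(toy, aes(Brand, Stars)) + geom_col()
col_data <- layer_data(p_col)
col_heights <- tapply(col_data$ymax, as.numeric(col_data$x), max)

print(bar_heights)
print(col_heights)
```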

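Below I plot the brands of the first 10 reviews and the ratings they received. Stars is first converted to numeric; any entry that is not a number becomes NA, which is what the warning reports.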

ramen_ratings$Stars <- as.numeric(ramen_ratings$Stars)
## Warning: NAs introduced by coercion
ggplot(head(ramen_ratings, 10), aes(Brand, Stars, fill = Style)) +
  geom_col() +
  # Angle the brand names so they do not overlap
  theme(axis.text.x = element_text(angle = -30, vjust = 1, hjust = 0))
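The coercion warning means some Stars entries could not be read as numbers. A minimal sketch of how to see which values were lost, assuming a hypothetical vector in which "Unrated" stands in for whatever non-numeric text the real column contains:

```r
# Hypothetical Stars values; "Unrated" is an assumed example of non-numeric text
stars_raw <- c("3.75", "Unrated", "5")

# as.numeric() turns unparseable text into NA (the warning is silenced here)
stars_num <- suppressWarnings(as.numeric(stars_raw))

# The original entries that became NA
lost <- stars_raw[is.na(stars_num)]
print(lost)
```

On the real data, the same comparison works between the original text column and its converted copy, as long as it runs before the column is overwritten.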

Geom Map

This is the most interesting of the three: if the data specifies regions, you can map where each point is. The function map_data returns the longitude and latitude of the boundaries in a given map, such as US states or countries. The ramen data is international, but luckily map_data also accepts "world", which returns the coordinates of every country in the world.

map <- map_data("world")

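This data is very large, however, and we do not need every boundary coordinate of each country. So I used the distinct function to keep one row per country (region) in map_data and called the result map.regions.set. Note that distinct keeps the first row of each region, so each country is represented by a point on its border rather than by its centre.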

map.regions.set <- distinct(map, region, .keep_all = TRUE)
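A small sketch of what distinct() keeps, on a hypothetical stand-in for map_data("world"). It also shows one possible alternative that is not the method used here: averaging each region's boundary points to get a rough centre. An average of boundary vertices is only an approximation of a true centroid.

```r
library(dplyr)

# Hypothetical stand-in for map_data("world"): a few boundary points per region
toy_map <- data.frame(
  long   = c(10, 12, 12, 100, 104),
  lat    = c(50, 50, 54, 10, 14),
  region = c("X", "X", "X", "Y", "Y")
)

# What the document does: keep the first row of each region
first_points <- distinct(toy_map, region, .keep_all = TRUE)

# Alternative sketch: average each region's boundary points
centres <- toy_map %>%
  group_by(region) %>%
  summarise(long = mean(long), lat = mean(lat))

print(first_points)
print(centres)
```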

Now that I have the unique regions, I can left join them to the ramen data, matching Country to region. That way we keep the ramen information from the original data set, joined with the coordinates for each review's country.

ramen.map <- left_join(ramen_ratings, map.regions.set, by = c("Country"="region"))
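A left join keeps every ramen row even when its Country has no matching region, and those rows get NA coordinates. A minimal sketch with hypothetical rows (the "Holland" versus "Netherlands" mismatch is invented for illustration) shows how anti_join() can list the unmatched names:

```r
library(dplyr)

# Hypothetical ramen rows and region coordinates
ramen_toy   <- data.frame(Country = c("Japan", "Holland", "Japan"),
                          Style   = c("Cup", "Pack", "Pack"))
regions_toy <- data.frame(region = c("Japan", "Netherlands"),
                          long   = c(123, 4),
                          lat    = c(24, 52))

joined <- left_join(ramen_toy, regions_toy, by = c("Country" = "region"))

# Rows left without coordinates; geom_point() would drop these
n_missing <- sum(is.na(joined$long))

# Country names that found no matching region
unmatched <- anti_join(ramen_toy, regions_toy, by = c("Country" = "region"))
print(n_missing)
print(unmatched)
```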

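ggplot() +
  # Grey world map as the background; geom_map only needs map_id,
  # so x and y are not mapped here
  geom_map(
    data = map, map = map,
    aes(map_id = region),
    color = "white", fill = "lightgray", size = 0.1
  ) +
  # geom_map does not set the axis limits itself, so expand them to the map
  expand_limits(x = map$long, y = map$lat) +
  # One point per review, placed at its country's coordinates
  geom_point(
    data = ramen.map,
    aes(long, lat, color = Style),
    alpha = 0.7
  )
## Warning: Removed 148 rows containing missing values (geom_point).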

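The map above shows where ramen of each style comes from around the world. At a glance, Pack appears to be a popular style. Keep in mind that every review from the same country is drawn at the same point, so points overlap, and the 148 removed rows most likely correspond to countries whose names have no match in map_data.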
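Because the points overlap, a simple count is a more direct check of which style is most common. A minimal sketch on a hypothetical sample; on the real data the call would be count(ramen_ratings, Style, sort = TRUE):

```r
library(dplyr)

# Hypothetical sample of styles (not the real data)
styles_toy <- data.frame(Style = c("Pack", "Cup", "Pack", "Bowl", "Pack"))

# Rows per style, most frequent first
style_counts <- count(styles_toy, Style, sort = TRUE)
print(style_counts)
```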

Conclusion

ggplot2 is a powerful package within the tidyverse that lets you build many kinds of visualizations from your data. With visualizations, data scientists can present key findings in a way that is easy to understand.

References

  1. Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). "Welcome to the tidyverse." Journal of Open Source Software, 4(43), 1686. doi:10.21105/joss.01686

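  2. Ramen Ratings data set (Kaggle): https://www.kaggle.com/datasets/residentmario/ramen-ratings?resource=download

  3. ggplot2 map_data documentation (rdrr.io): https://rdrr.io/cran/ggplot2/man/map_data.html

  4. Making a world map with ggplot2 in R (datavizpyr): https://datavizpyr.com/how-to-make-world-map-with-ggplot2-in-r/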

TidyVerse EXTEND Assignment:

head(ramen_ratings)
##   Review..          Brand
## 1     2580      New Touch
## 2     2579       Just Way
## 3     2578         Nissin
## 4     2577        Wei Lih
## 5     2576 Ching's Secret
## 6     2575  Samyang Foods
##                                                       Variety Style     Country
## 1                                   T's Restaurant Tantanmen    Cup       Japan
## 2 Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles  Pack      Taiwan
## 3                               Cup Noodles Chicken Vegetable   Cup         USA
## 4                               GGE Ramen Snack Tomato Flavor  Pack      Taiwan
## 5                                             Singapore Curry  Pack       India
## 6                                      Kimchi song Song Ramen  Pack South Korea
##   Stars Top.Ten
## 1  3.75        
## 2  1.00        
## 3  2.25        
## 4  2.75        
## 5  3.75        
## 6  4.75

Package dplyr - for manipulating data sets

Change Name of Columns in R with dplyr rename()

# rename() takes new_name = old_name pairs
ramen_ratings <- ramen_ratings %>%
  rename(Reviews = Review..,
         Ratings = Stars,
         Top_Ten = Top.Ten)

head(ramen_ratings)
##   Reviews          Brand
## 1    2580      New Touch
## 2    2579       Just Way
## 3    2578         Nissin
## 4    2577        Wei Lih
## 5    2576 Ching's Secret
## 6    2575  Samyang Foods
##                                                       Variety Style     Country
## 1                                   T's Restaurant Tantanmen    Cup       Japan
## 2 Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles  Pack      Taiwan
## 3                               Cup Noodles Chicken Vegetable   Cup         USA
## 4                               GGE Ramen Snack Tomato Flavor  Pack      Taiwan
## 5                                             Singapore Curry  Pack       India
## 6                                      Kimchi song Song Ramen  Pack South Korea
##   Ratings Top_Ten
## 1    3.75        
## 2    1.00        
## 3    2.25        
## 4    2.75        
## 5    3.75        
## 6    4.75
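The odd original name Review.. comes from read.csv(), which by default passes header names through make.names() and replaces characters that are not valid in R names with dots. A sketch assuming headers like "Review #" and "Top Ten" (inferred from the mangled names, not confirmed from the file):

```r
# Hypothetical two-line CSV with headers assumed to match the original file
csv_text <- "Review #,Top Ten\n1,2016 #1"

# Default check.names = TRUE: invalid characters become dots
mangled <- names(read.csv(text = csv_text))

# check.names = FALSE keeps the headers as written; they then need backticks,
# e.g. rename(Reviews = `Review #`)
kept <- names(read.csv(text = csv_text, check.names = FALSE))

print(mangled)
print(kept)
```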

Finding the brands whose average rating is 5: group by Brand, average the ratings (ignoring NAs), and keep the brands whose truncated average equals 5

summary <- ramen_ratings %>%
  group_by(Brand) %>%
  # Mean rating per brand, truncated to an integer
  summarise(Ratings = as.integer(mean(Ratings, na.rm = TRUE))) %>%
  arrange(desc(Ratings)) %>%
  # Keep only the brands whose truncated mean is 5
  filter(Ratings == 5)

summary
## # A tibble: 24 x 2
##    Brand                Ratings
##    <chr>                  <int>
##  1 ChoripDong                 5
##  2 Daddy                      5
##  3 Daifuku                    5
##  4 Foodmon                    5
##  5 Higashi                    5
##  6 Jackpot Teriyaki           5
##  7 Kiki Noodle                5
##  8 Kimura                     5
##  9 Komforte Chockolates       5
## 10 MyOri                      5
## # ... with 14 more rows
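Because as.integer() truncates rather than rounds, and no rating exceeds 5, Ratings == 5 keeps only brands whose mean is exactly 5, that is, brands where every rated review scored 5. A minimal sketch on hypothetical ratings shows the difference from round():

```r
library(dplyr)

# Hypothetical ratings: brand P always scores 5, brand Q averages 4.9
toy <- data.frame(Brand   = c("P", "P", "Q", "Q"),
                  Ratings = c(5, 5, 5, 4.8))

res <- toy %>%
  group_by(Brand) %>%
  summarise(mean_rating = mean(Ratings, na.rm = TRUE),
            truncated   = as.integer(mean_rating),  # 4.9 becomes 4
            rounded     = round(mean_rating))       # 4.9 becomes 5
print(res)
```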

Package ggplot2 - for visualization

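Using ggplot to visualize the brands with an average rating of 5. Because every brand in summary has the same rating, all bars have the same length; the chart works mainly as a visual list of those brands.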

ggplot(summary, aes(x = Ratings, y = Brand, fill = Ratings)) +
  geom_col(position = "dodge")


Keeping only the rows with a ‘Top_Ten’ entry

top_ten_df <- filter(ramen_ratings, Top_Ten != "")

head(top_ten_df)
##   Reviews         Brand                                         Variety Style
## 1    1964          MAMA            Instant Noodles Coconut Milk Flavour  Pack
## 2    1947   Prima Taste              Singapore Laksa Wholegrain La Mian  Pack
## 3    1925         Prima               Juzz's Mee Creamy Chicken Flavour  Pack
## 4    1907   Prima Taste              Singapore Curry Wholegrain La Mian  Pack
## 5    1828 Tseng Noodles            Scallion With Sichuan Pepper  Flavor  Pack
## 6    1689  Wugudaochang Tomato Beef Brisket Flavor Purple Potato Noodle  Pack
##     Country Ratings  Top_Ten
## 1   Myanmar       5 2016 #10
## 2 Singapore       5  2016 #1
## 3 Singapore       5  2016 #8
## 4 Singapore       5  2016 #5
## 5    Taiwan       5  2016 #9
## 6     China       5  2016 #7

Package tidyr:

Separate a character column into multiple columns with a regular expression or numeric locations

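Split the Top_Ten column into two columns, Year and Ranking. By default, separate() splits at any run of non-alphanumeric characters, so "2016 #10" becomes "2016" and "10"; both new columns are character.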

top_ten_df <- top_ten_df %>% separate(Top_Ten, c("Year", "Ranking"))

head(top_ten_df)
##   Reviews         Brand                                         Variety Style
## 1    1964          MAMA            Instant Noodles Coconut Milk Flavour  Pack
## 2    1947   Prima Taste              Singapore Laksa Wholegrain La Mian  Pack
## 3    1925         Prima               Juzz's Mee Creamy Chicken Flavour  Pack
## 4    1907   Prima Taste              Singapore Curry Wholegrain La Mian  Pack
## 5    1828 Tseng Noodles            Scallion With Sichuan Pepper  Flavor  Pack
## 6    1689  Wugudaochang Tomato Beef Brisket Flavor Purple Potato Noodle  Pack
##     Country Ratings Year Ranking
## 1   Myanmar       5 2016      10
## 2 Singapore       5 2016       1
## 3 Singapore       5 2016       8
## 4 Singapore       5 2016       5
## 5    Taiwan       5 2016       9
## 6     China       5 2016       7
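A minimal sketch of the split on hypothetical values in the same "year #rank" shape. Adding convert = TRUE (not used above) would turn Year and Ranking into integers instead of character:

```r
library(tidyr)

# Hypothetical Top_Ten values in the same "year #rank" shape
toy <- data.frame(Top_Ten = c("2016 #10", "2016 #1"))

# The default sep matches runs of non-alphanumeric characters (here " #");
# convert = TRUE runs type.convert() on the new columns
split_toy <- separate(toy, Top_Ten, c("Year", "Ranking"), convert = TRUE)
str(split_toy)
```

Newer tidyr releases (1.3.0 onward) mark separate() as superseded in favour of separate_wider_delim(), though separate() still works.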

Listing the reviews ranked #10 across the years:

filter(top_ten_df, Ranking == 10)
##   Reviews            Brand
## 1    1964             MAMA
## 2    1638 A-Sha Dry Noodle
## 3    1471             Mama
## 4    1302             Mama
## 5     608             Koka
##                                                      Variety Style   Country
## 1                       Instant Noodles Coconut Milk Flavour  Pack   Myanmar
## 2 Veggie Noodle Tomato Noodle With Vine Ripened Tomato Sauce  Pack    Taiwan
## 3   Instant Noodles Shrimp Creamy Tom Yum Flavour Jumbo Pack  Pack  Thailand
## 4             Instant Noodles Yentafo Tom Yum Mohfai Flavour  Pack  Thailand
## 5                                         Spicy Black Pepper  Pack Singapore
##   Ratings Year Ranking
## 1       5 2016      10
## 2       5 2015      10
## 3       5 2013      10
## 4       5 2014      10
## 5       5 2012      10

Using ggplot to visualize each brand's Top Ten ranking

ggplot(top_ten_df, aes(x = Ranking, y = Brand, fill = Ranking)) +
  geom_col(position = "dodge")
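Because separate() returns character columns unless convert = TRUE, and ggplot2 sorts character values alphabetically on a discrete axis, Ranking likely appears in the order 1, 10, 2, ... in the plot above. One way to get numeric order is to turn Ranking into a factor with numerically sorted levels, sketched here on a hypothetical vector:

```r
# Hypothetical rankings stored as text, as separate() returns them by default
ranking_chr <- c("2", "10", "1")

# Text sorting puts "10" between "1" and "2"
text_order <- sort(ranking_chr)

# Levels sorted as numbers, then converted back to text
num_levels  <- as.character(sort(unique(as.integer(ranking_chr))))
ranking_fct <- factor(ranking_chr, levels = num_levels)

print(text_order)
print(levels(ranking_fct))
```

On the real data this could be something like mutate(top_ten_df, Ranking = factor(Ranking, levels = as.character(1:10))), assuming the rankings run from 1 to 10.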