Restaurant Menus Dataset

This report looks at menu changes across numerous restaurants in Texas. The dataset has 5,000 rows and 19 variables. Below are five visualizations, each looking at a few different factors, along with an analysis of what the data suggests. The restaurants covered vary in cuisine, pricing, and style.

# Load the libraries, read in the data, and report some details about the data.
library(dplyr)
library(ggplot2)
library(lubridate)
library(scales)
library(RColorBrewer)
library(ggthemes)
library(data.table)
library(DescTools)
library(tidyverse)
library(stringr)

restaurantmenuchanges_csv_1_ <- read_csv("C:/Users/BossLadyCharles/Desktop/Daniiii/DS736/restaurantmenuchanges.csv (1).zip")
getwd()
## [1] "C:/Users/BossLadyCharles/Desktop/Daniiii/DS736"
df <- restaurantmenuchanges_csv_1_
df
## # A tibble: 5,000 × 19
##    createdOn           changeOperation market      city    menuItemName         
##    <dttm>              <chr>           <chr>       <chr>   <chr>                
##  1 2022-10-27 23:07:19 Create          Houston, TX Houston 1% Low Fat Milk (110…
##  2 2022-10-27 23:07:26 Delete          Houston, TX Houston 10-Piece Boneless Wi…
##  3 2022-10-27 23:07:26 Delete          Houston, TX Houston 8-Piece Boneless Win…
##  4 2022-10-27 23:07:33 Delete          Houston, TX Houston LUNCH DUET           
##  5 2022-10-27 23:07:33 Delete          Houston, TX Houston FILET MIGNON         
##  6 2022-10-27 23:07:33 Delete          Houston, TX Houston HIBACHI STEAK        
##  7 2022-10-27 23:07:33 Delete          Houston, TX Houston SPICY HIBACHI CHICKEN
##  8 2022-10-27 23:07:33 Delete          Houston, TX Houston HIBACHI CHICKEN      
##  9 2022-10-27 23:07:33 Delete          Houston, TX Houston HIBACHI SHRIMP       
## 10 2022-10-27 23:07:33 Delete          Houston, TX Houston BEEF JULIENNE        
## # ℹ 4,990 more rows
## # ℹ 14 more variables: menuItemDescription <chr>, menuItemCurrentPrice <chr>,
## #   menuItemPreviousPrice <chr>, menuItemImageUrl <chr>,
## #   menuItemCategory <chr>, menuItemAverageRating <chr>,
## #   menuItemRatingCount <chr>, restaurantName <chr>,
## #   restaurantDescription <chr>, restaurantAddress <chr>,
## #   restaurantImageUrl <chr>, restaurantPriceRange <chr>, …
colnames(df)
##  [1] "createdOn"             "changeOperation"       "market"               
##  [4] "city"                  "menuItemName"          "menuItemDescription"  
##  [7] "menuItemCurrentPrice"  "menuItemPreviousPrice" "menuItemImageUrl"     
## [10] "menuItemCategory"      "menuItemAverageRating" "menuItemRatingCount"  
## [13] "restaurantName"        "restaurantDescription" "restaurantAddress"    
## [16] "restaurantImageUrl"    "restaurantPriceRange"  "restaurantLatitude"   
## [19] "restaurantLongitude"
head(df)
## # A tibble: 6 × 19
##   createdOn           changeOperation market      city    menuItemName          
##   <dttm>              <chr>           <chr>       <chr>   <chr>                 
## 1 2022-10-27 23:07:19 Create          Houston, TX Houston 1% Low Fat Milk (110 …
## 2 2022-10-27 23:07:26 Delete          Houston, TX Houston 10-Piece Boneless Win…
## 3 2022-10-27 23:07:26 Delete          Houston, TX Houston 8-Piece Boneless Wing…
## 4 2022-10-27 23:07:33 Delete          Houston, TX Houston LUNCH DUET            
## 5 2022-10-27 23:07:33 Delete          Houston, TX Houston FILET MIGNON          
## 6 2022-10-27 23:07:33 Delete          Houston, TX Houston HIBACHI STEAK         
## # ℹ 14 more variables: menuItemDescription <chr>, menuItemCurrentPrice <chr>,
## #   menuItemPreviousPrice <chr>, menuItemImageUrl <chr>,
## #   menuItemCategory <chr>, menuItemAverageRating <chr>,
## #   menuItemRatingCount <chr>, restaurantName <chr>,
## #   restaurantDescription <chr>, restaurantAddress <chr>,
## #   restaurantImageUrl <chr>, restaurantPriceRange <chr>,
## #   restaurantLatitude <chr>, restaurantLongitude <chr>
tail(df)
## # A tibble: 6 × 19
##   createdOn           changeOperation market      city    menuItemName          
##   <dttm>              <chr>           <chr>       <chr>   <chr>                 
## 1 2022-11-07 21:21:02 Delete          Houston, TX Houston Toffee                
## 2 2022-11-07 21:21:02 Create          Houston, TX Houston Texas Frito Brittle   
## 3 2022-11-07 21:21:02 Create          Houston, TX Houston Almonds               
## 4 2022-11-07 21:21:02 Delete          Houston, TX Houston Banana Chia Pudding   
## 5 2022-11-07 21:21:03 Create          Houston, TX Houston Bluebonnet Raspberry …
## 6 2022-11-07 21:21:03 Create          Houston, TX Houston Cowgirl Cotton Candy …
## # ℹ 14 more variables: menuItemDescription <chr>, menuItemCurrentPrice <chr>,
## #   menuItemPreviousPrice <chr>, menuItemImageUrl <chr>,
## #   menuItemCategory <chr>, menuItemAverageRating <chr>,
## #   menuItemRatingCount <chr>, restaurantName <chr>,
## #   restaurantDescription <chr>, restaurantAddress <chr>,
## #   restaurantImageUrl <chr>, restaurantPriceRange <chr>,
## #   restaurantLatitude <chr>, restaurantLongitude <chr>

Restaurant Menus Visualization

Bar Chart

Analysis: This bar chart, drawn with geom_col, displays the 10 most common menu item prices across Texas. The x-axis shows the price ($) and the y-axis shows the number of menu items, with the exact count printed above each bar. Each count represents how many of the 1,000+ menu items are priced at exactly the value shown on the x-axis.

The most common price is $13.00, shared by 74 entries. This suggests that around $13.00 is a price point people are willing to pay, since so many restaurants have an item at that price. Note that $13.00 is the most frequent price (the mode), not the mean; it sits near the middle of the $2.00 to $26.00 range that most menus span.

price_current <- df |>
  filter(!is.na(`menuItemCurrentPrice`)) |> 
  count(`menuItemCurrentPrice`) |>
  arrange(desc(n)) |>
  head(10)
price_current
## # A tibble: 10 × 2
##    menuItemCurrentPrice     n
##    <chr>                <int>
##  1 $13.00                  74
##  2 $15.00                  71
##  3 $8.00                   69
##  4 $8.99                   62
##  5 $16.00                  55
##  6 $3.00                   55
##  7 $2.49                   52
##  8 $7.00                   51
##  9 $7.99                   50
## 10 $14.00                  48
ggplot(price_current, aes(x = reorder(menuItemCurrentPrice, -n), y = n)) +
  geom_col(fill = "violet", colour = "darkviolet") +
  geom_text(aes(label = n), 
            vjust = -0.5,           # Nudge text above the bar (- is up)
            color = "purple4",   
            fontface = "bold") +
  labs(title = "Top Menu Item Prices", 
       x = "Price ($)", 
       y = "Count of Menu Items") +
  theme_minimal()
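
To check whether the most common price is also close to the average, the price text has to be converted to numbers first, because menuItemCurrentPrice is stored as character. The following is a minimal sketch in base R, using a small made-up vector of prices (an assumption for illustration, not values pulled from the dataset) in the same "$13.00" format:

```r
# Hypothetical sample prices in the same text format as menuItemCurrentPrice
prices_chr <- c("$13.00", "$13.00", "$8.99", "$15.00", "$2.49", "$13.00", "$26.00")

# Strip the dollar sign and any thousands separator, then convert to numeric
prices_num <- as.numeric(gsub("[$,]", "", prices_chr))

# Mode: the value that appears most often
price_mode <- as.numeric(names(which.max(table(prices_num))))

# Mean and median for comparison
price_mean <- mean(prices_num)
price_median <- median(prices_num)

c(mode = price_mode, mean = price_mean, median = price_median)
```

Running the same steps on the real column would show how far apart the mode, mean, and median actually are for this dataset.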

Dumbbell Plot

Analysis: This dumbbell plot uses geom_point to mark the previous price in red and the current price in purple, as noted in the subtitle, for menu items at Asian-style restaurants in Texas. Most prices rose: items at Izakaya, Hando, and Fukuoka Sushi Bar & Grill cost more than they used to. Prices fell at Kim Son Cafe and Flying Fish, by $2.60 and $3.96 respectively. One possible explanation is that those restaurants are popular enough to afford charging less for certain items, though the values plotted here cannot confirm that.

# Use a separate name so the original df is not overwritten
dumbbell_df <- data.frame(
  restaurantName = c("Hando", "Fukuoka Sushi Bar & Grill", "Flying Fish", "Kim Son Cafe", "Izakaya"),
  menuItemPreviousPrice = c(15.00, 12.59, 19.95, 15.60, 6.00), 
  menuItemCurrentPrice = c(20.00, 13.75, 15.99, 13.00, 9.00))


ggplot(dumbbell_df, aes(y = restaurantName)) +
  geom_segment(aes(x = menuItemPreviousPrice, xend = menuItemCurrentPrice, 
                   y = restaurantName, yend = restaurantName), 
               color = "grey", linewidth = 1.5) +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_point(aes(x = menuItemPreviousPrice), color = "#F02036", size = 4) +
  geom_point(aes(x = menuItemCurrentPrice), color = "purple4", size = 4) +
  theme_minimal() +
  labs(
    title = "Restaurant Price Fluctuation Comparison",
    subtitle = "Red = Previous Price  |  Purple = Current Price",
    x = "Price ($)",
    y = "Asian Style Restaurant") +
  scale_x_continuous(labels = scales::dollar_format())
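
The size and direction of each price change can also be computed directly instead of being read off the plot. This is a minimal sketch in base R that reuses the five price pairs from the dumbbell plot; the names price_changes, change, and pct_change are introduced here for illustration only:

```r
# Same five restaurants and prices as in the dumbbell plot
price_changes <- data.frame(
  restaurantName = c("Hando", "Fukuoka Sushi Bar & Grill", "Flying Fish",
                     "Kim Son Cafe", "Izakaya"),
  menuItemPreviousPrice = c(15.00, 12.59, 19.95, 15.60, 6.00),
  menuItemCurrentPrice = c(20.00, 13.75, 15.99, 13.00, 9.00)
)

# Dollar change: positive means the price went up
price_changes$change <- price_changes$menuItemCurrentPrice -
  price_changes$menuItemPreviousPrice

# Percentage change relative to the previous price
price_changes$pct_change <- round(
  100 * price_changes$change / price_changes$menuItemPreviousPrice, 1)

# Sort from largest decrease to largest increase
price_changes[order(price_changes$change), ]
```

Sorting by change puts the two decreases (Flying Fish and Kim Son Cafe) at the top and the largest increase (Hando) at the bottom.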

Pie Chart

Analysis: This pie chart shows the five menu categories with the most entries in the dataset, which covers numerous restaurants in Texas, and labels each slice with the most frequently listed dish in that category. Judging from the categories (“beverages”, “sides”, “drinks”, “breakfast”, and “breakfast tacos”), quick snacks and breakfast items are the most heavily represented. This suggests that many of these restaurants, though they serve breakfast, lunch, and sometimes dinner depending on hours of operation, draw much of their audience from early risers looking for breakfast items to get them going.

This pattern shows in how drinks and sides take the largest share at 23%, followed under Breakfast by the taquito and cheese at 19%. Keep in mind that these percentages are shares of the five categories shown, which were picked out of 1,000+ menu items.

# Read the raw data under a descriptive name for the remaining plots
my_menu_data <- read_csv("C:/Users/BossLadyCharles/Desktop/Daniiii/DS736/restaurantmenuchanges.csv (1).zip")

if (!is.data.frame(my_menu_data)) {
  stop("The data did not load! Check your file name and path.")
}

pie_data <- my_menu_data %>%
  # Drop the "Beverages" category and rows with no category
  dplyr::filter(menuItemCategory != "Beverages") %>% 
  dplyr::filter(!is.na(menuItemCategory)) %>% 
  dplyr::group_by(menuItemCategory) %>%
  # Count rows per category and find the most frequent dish name in each
  dplyr::summarize(
    category_count = n(),
    top_dish = names(which.max(table(menuItemName)))) %>%
  dplyr::arrange(desc(category_count)) %>%
  head(5) %>% 
  # Shares are relative to the five categories kept, not the whole dataset
  mutate(percent_val = category_count / sum(category_count),
         label_text = str_wrap(paste0(menuItemCategory, "\n", top_dish, "\n", percent(percent_val, accuracy = 1)), width = 12))


ggplot(pie_data, aes(x = "", y = category_count, fill = menuItemCategory)) +
  geom_col(width = 1, color = "white") +
  coord_polar("y", start = 0) +
  geom_text(aes(label = label_text), 
            position = position_stack(vjust = 0.5), 
            color = "white", 
            size = 3.2, 
            fontface = "bold",
            lineheight = 1.2) +
  scale_fill_manual(values = c("#E74c7c", "#FF1177", "#FF1155", "#AA0077", "#AA0055")) +
  theme_void() +
  labs(title = "Top 5 Menu Categories & The Star Dishes") +
  theme(legend.position = "none")
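
Because the percentages in the pie are computed after keeping only five categories, they describe shares within those five, not shares of the whole dataset. The sketch below illustrates the difference in base R. The category names and counts are invented for illustration and are not taken from the data:

```r
# Hypothetical category counts, invented for illustration (not from the dataset)
category_counts <- c(Drinks = 230, Sides = 230, Breakfast = 190,
                     "Breakfast Tacos" = 180, Lunch = 170,
                     Dinner = 150, Desserts = 150, Kids = 100, Soups = 100)

# Keep the five largest categories, as head(5) does after arrange(desc(...))
top5 <- head(sort(category_counts, decreasing = TRUE), 5)

# Share within the five kept categories (what the pie labels show)
share_of_top5 <- top5 / sum(top5)

# Share of every category in the data (a different, smaller number)
share_of_all <- top5 / sum(category_counts)

round(100 * cbind(share_of_top5, share_of_all), 1)
```

With these made-up counts, the largest category is 23% of the top five but only about 15% of all entries, which is why the denominator is worth stating next to any percentage quoted from the chart.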

Area Plot

Analysis: This area plot, made with geom_area, covers the three restaurants with the most menu items across all 5,000 rows of data. The y-axis shows that Black Walnut Cafe has the most, with over 200 items at different price points for customers to choose from. The x-axis shows price, so the plot runs from the cheapest items to the most expensive. Items over $30.00 are outliers, as they are not common at any of these restaurants. The areas are stacked, so the thickness of each band shows how many items that restaurant has in each $5 price bin; the number of items differs by restaurant, but the cost of items tends to be the same across all three. The legend on the right-hand side shows which color corresponds to which restaurant.

top_3_restaurants <- my_menu_data %>%
  count(restaurantName) %>%
  arrange(desc(n)) %>%
  head(3) %>%
  pull(restaurantName)
top_3_restaurants
## [1] "Black Walnut Cafe"         "Mo' Better Brews"         
## [3] "Fukuoka Sushi Bar & Grill"
area_data <- my_menu_data %>%
  filter(restaurantName %in% top_3_restaurants) %>%
  mutate(
    clean_price = as.numeric(str_remove_all(menuItemCurrentPrice, "[\\$,]"))) %>%
  filter(!is.na(clean_price))


ggplot(area_data, aes(x = clean_price, fill = restaurantName)) +
  geom_area(stat = "bin", binwidth = 5, alpha = 0.6, position = "stack") +
  coord_cartesian(xlim = c(0, 40)) +
  scale_fill_manual(values = c("#E74c6c", "#AA0077", "#550055")) +
  labs(
    title = "Price Distribution of Top 3 Restaurants",
    x = "Price ($)",
    y = "Number of Menu Items",
    fill = "Restaurant") +
  theme_minimal()
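
To make the stacking concrete, the counts behind an area plot like this one can be tabulated by hand. The sketch below is in base R with made-up restaurants and prices (assumptions for illustration). It uses $5 bins starting at 0, and ggplot2 may place its bin edges differently, so the numbers are a rough analogue of what geom_area(stat = "bin", binwidth = 5) draws rather than an exact reproduction:

```r
# Hypothetical prices for two hypothetical restaurants
prices <- data.frame(
  restaurant = c("A", "A", "A", "B", "B", "B", "B"),
  price = c(3, 7.5, 12, 4, 4.5, 9, 31)
)

# Assign each price to a $5 bin: [0,5), [5,10), ..., [35,40)
prices$bin <- cut(prices$price, breaks = seq(0, 40, by = 5), right = FALSE)

# Items per bin for each restaurant (the thickness of each band)
bin_counts <- table(prices$bin, prices$restaurant)

# Total height of the stacked areas in each bin
stacked_height <- rowSums(bin_counts)

bin_counts
stacked_height
```

In a stacked plot the y-axis reads the total in stacked_height, so an individual restaurant's count in a bin is the thickness of its band, not the height at which the band ends.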

Heat Map

Analysis: To explore whether higher prices have any relationship to ratings, this heat map reuses the area data from the prior area plot, which covers the three restaurants with the most menu items. The x-axis is the cleaned price and the y-axis is menuItemRatingCount, the number of ratings recorded rather than a rating score. The lighter the color of the square, the more items are present at that price. Most items are between $3.00 and $18.00, which suggests that people prefer buying the lower-cost items and that higher prices are not an indicator of better ratings. People likely rate a restaurant on factors other than price, such as affordable prices overall or a large variety of items to choose from. While some restaurants have more than one rating count, possibly depending on location in Texas, Mo’ Better Brews has just one, at 97 ratings from the public.

heatmap_data <- area_data %>%
  filter(!is.na(menuItemRatingCount), !is.na(clean_price)) %>%
  filter(clean_price <= 100)


ggplot(heatmap_data, aes(x = clean_price, y = menuItemRatingCount)) +
  geom_bin2d(bins = 15) + 
  scale_fill_viridis_c(option = "inferno") +
  facet_wrap(~restaurantName) +
  labs(title = "Price vs. Rating by Item Heavy Restaurants",
       x = "Price ($)", y = "Rating Count") +
  theme_minimal() +
  theme(legend.position = "bottom",
        panel.border = element_rect(color = "purple4", fill = NA, linewidth = 0.5),  # linewidth replaces size in ggplot2 >= 3.4.0
        panel.spacing = unit(1, "lines"),
        strip.background = element_rect(fill = "mistyrose", color = "purple4"),
        strip.text = element_text(face = "bold"))
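
A heat map shows where items cluster, but a single number can help summarize whether price and rating count move together. The following is a minimal sketch in base R, assuming both columns arrive as text (as the tibble output earlier indicates) and using made-up values for illustration. A rank-based (Spearman) correlation is used here as one reasonable choice, not as the method behind the chart:

```r
# Hypothetical items with price and rating count stored as text
items <- data.frame(
  price_chr = c("$3.00", "$8.99", "$13.00", "$18.00", "$26.00"),
  rating_count_chr = c("97", "120", "85", "40", "12")
)

# Convert both columns to numeric before comparing them
items$price <- as.numeric(gsub("[$,]", "", items$price_chr))
items$rating_count <- as.numeric(items$rating_count_chr)

# Spearman correlation: near +1 means both rise together,
# near -1 means one falls as the other rises, near 0 means no clear pattern
rho <- cor(items$price, items$rating_count, method = "spearman")
rho
```

On the real data, a value near zero would support the reading that price has little to do with how many ratings an item collects.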

Wrap up

Thank you for viewing these Data Visualizations!