This report looks at menu changes across numerous restaurants in Texas, using a dataset of 5,000 records across 19 variables. Below are five visualizations that each look at a few different factors, each paired with an analysis that makes assumptions based on what the data is showing us. The restaurants in the data vary in cuisine, pricing, and style.
# Load the libraries needed, read in the data, and report some details about the data.
library(dplyr)
library(ggplot2)
library(lubridate)
library(scales)
library(RColorBrewer)
library(ggthemes)
library(data.table)
library(DescTools)
library(tidyverse)
library(stringr)
restaurantmenuchanges_csv_1_ <- read_csv("C:/Users/BossLadyCharles/Desktop/Daniiii/DS736/restaurantmenuchanges.csv (1).zip")
getwd()
## [1] "C:/Users/BossLadyCharles/Desktop/Daniiii/DS736"
df <- restaurantmenuchanges_csv_1_
df
## # A tibble: 5,000 × 19
## createdOn changeOperation market city menuItemName
## <dttm> <chr> <chr> <chr> <chr>
## 1 2022-10-27 23:07:19 Create Houston, TX Houston 1% Low Fat Milk (110…
## 2 2022-10-27 23:07:26 Delete Houston, TX Houston 10-Piece Boneless Wi…
## 3 2022-10-27 23:07:26 Delete Houston, TX Houston 8-Piece Boneless Win…
## 4 2022-10-27 23:07:33 Delete Houston, TX Houston LUNCH DUET
## 5 2022-10-27 23:07:33 Delete Houston, TX Houston FILET MIGNON
## 6 2022-10-27 23:07:33 Delete Houston, TX Houston HIBACHI STEAK
## 7 2022-10-27 23:07:33 Delete Houston, TX Houston SPICY HIBACHI CHICKEN
## 8 2022-10-27 23:07:33 Delete Houston, TX Houston HIBACHI CHICKEN
## 9 2022-10-27 23:07:33 Delete Houston, TX Houston HIBACHI SHRIMP
## 10 2022-10-27 23:07:33 Delete Houston, TX Houston BEEF JULIENNE
## # ℹ 4,990 more rows
## # ℹ 14 more variables: menuItemDescription <chr>, menuItemCurrentPrice <chr>,
## # menuItemPreviousPrice <chr>, menuItemImageUrl <chr>,
## # menuItemCategory <chr>, menuItemAverageRating <chr>,
## # menuItemRatingCount <chr>, restaurantName <chr>,
## # restaurantDescription <chr>, restaurantAddress <chr>,
## # restaurantImageUrl <chr>, restaurantPriceRange <chr>, …
colnames(df)
## [1] "createdOn" "changeOperation" "market"
## [4] "city" "menuItemName" "menuItemDescription"
## [7] "menuItemCurrentPrice" "menuItemPreviousPrice" "menuItemImageUrl"
## [10] "menuItemCategory" "menuItemAverageRating" "menuItemRatingCount"
## [13] "restaurantName" "restaurantDescription" "restaurantAddress"
## [16] "restaurantImageUrl" "restaurantPriceRange" "restaurantLatitude"
## [19] "restaurantLongitude"
head(df)
## # A tibble: 6 × 19
## createdOn changeOperation market city menuItemName
## <dttm> <chr> <chr> <chr> <chr>
## 1 2022-10-27 23:07:19 Create Houston, TX Houston 1% Low Fat Milk (110 …
## 2 2022-10-27 23:07:26 Delete Houston, TX Houston 10-Piece Boneless Win…
## 3 2022-10-27 23:07:26 Delete Houston, TX Houston 8-Piece Boneless Wing…
## 4 2022-10-27 23:07:33 Delete Houston, TX Houston LUNCH DUET
## 5 2022-10-27 23:07:33 Delete Houston, TX Houston FILET MIGNON
## 6 2022-10-27 23:07:33 Delete Houston, TX Houston HIBACHI STEAK
## # ℹ 14 more variables: menuItemDescription <chr>, menuItemCurrentPrice <chr>,
## # menuItemPreviousPrice <chr>, menuItemImageUrl <chr>,
## # menuItemCategory <chr>, menuItemAverageRating <chr>,
## # menuItemRatingCount <chr>, restaurantName <chr>,
## # restaurantDescription <chr>, restaurantAddress <chr>,
## # restaurantImageUrl <chr>, restaurantPriceRange <chr>,
## # restaurantLatitude <chr>, restaurantLongitude <chr>
tail(df)
## # A tibble: 6 × 19
## createdOn changeOperation market city menuItemName
## <dttm> <chr> <chr> <chr> <chr>
## 1 2022-11-07 21:21:02 Delete Houston, TX Houston Toffee
## 2 2022-11-07 21:21:02 Create Houston, TX Houston Texas Frito Brittle
## 3 2022-11-07 21:21:02 Create Houston, TX Houston Almonds
## 4 2022-11-07 21:21:02 Delete Houston, TX Houston Banana Chia Pudding
## 5 2022-11-07 21:21:03 Create Houston, TX Houston Bluebonnet Raspberry …
## 6 2022-11-07 21:21:03 Create Houston, TX Houston Cowgirl Cotton Candy …
## # ℹ 14 more variables: menuItemDescription <chr>, menuItemCurrentPrice <chr>,
## # menuItemPreviousPrice <chr>, menuItemImageUrl <chr>,
## # menuItemCategory <chr>, menuItemAverageRating <chr>,
## # menuItemRatingCount <chr>, restaurantName <chr>,
## # restaurantDescription <chr>, restaurantAddress <chr>,
## # restaurantImageUrl <chr>, restaurantPriceRange <chr>,
## # restaurantLatitude <chr>, restaurantLongitude <chr>
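The output above shows the price columns stored as text (`<chr>`), so values such as "$13.00" cannot be averaged or placed on a numeric axis until they are converted. Here is a minimal sketch of that cleaning step, assuming prices are written like "$13.00" or "$1,250.00". The helper name `clean_price_col` and the sample values are made up for illustration and do not come from the dataset.

```r
# Hypothetical helper: strip "$" and "," from price text, then convert to numeric.
clean_price_col <- function(x) {
  as.numeric(gsub("[$,]", "", x))
}

# Made-up sample values (not taken from the dataset); NA stays NA.
sample_prices <- c("$13.00", "$1,250.00", NA, "$8.99")
cleaned <- clean_price_col(sample_prices)
cleaned
```

The same helper could be applied to both `menuItemCurrentPrice` and `menuItemPreviousPrice`, provided the real values follow the assumed format.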
Analysis: This is a bar chart, built with the function geom_col, displaying the 10 most common menu item prices across Texas. Above each bar, the ‘y’ value, our exact number of menu items, can be seen. The number represents how many menu item records, across 1,000+ options, are priced exactly the same as the value shown below on the ‘x’ axis, Price ($).
It can be seen that the most common price in the data is $13.00, with 74 of the results being that exact price. This suggests that around $13.00 is a price point people are willing to pay, since so many restaurants have an item at that price. Note that $13.00 is the most frequent price (the mode), which is not necessarily the mean; most menus span items costing from $2.00 to $26.00.
price_current <- df |>
filter(!is.na(`menuItemCurrentPrice`)) |>
count(`menuItemCurrentPrice`) |>
arrange(desc(n)) |>
head(10)
price_current
## # A tibble: 10 × 2
## menuItemCurrentPrice n
## <chr> <int>
## 1 $13.00 74
## 2 $15.00 71
## 3 $8.00 69
## 4 $8.99 62
## 5 $16.00 55
## 6 $3.00 55
## 7 $2.49 52
## 8 $7.00 51
## 9 $7.99 50
## 10 $14.00 48
ggplot(price_current, aes(x = reorder(menuItemCurrentPrice, -n), y = n)) +
geom_col(fill = "violet", colour = "darkviolet") +
geom_text(aes(label = n),
vjust = -0.5, # Nudge text above the bar (- is up)
color = "purple4",
fontface = "bold") +
labs(title = "Top Menu Item Prices",
x = "Price ($)",
y = "Count of Menu Items") +
theme_minimal()
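To make the difference between the most frequent price and the average price concrete, here is a small sketch that uses only the ten rows printed in the price_current table above. The object names `top10`, `modal_price`, and `mean_top10` are made up for this illustration. The weighted mean covers only these ten prices, so it is not the mean of the whole dataset.

```r
# The ten most common prices and their counts, copied from the table above.
top10 <- data.frame(
  price = c(13.00, 15.00, 8.00, 8.99, 16.00, 3.00, 2.49, 7.00, 7.99, 14.00),
  n     = c(74, 71, 69, 62, 55, 55, 52, 51, 50, 48)
)

# Mode: the price with the highest count.
modal_price <- top10$price[which.max(top10$n)]

# Count-weighted mean of these ten prices only (not the full dataset).
mean_top10 <- weighted.mean(top10$price, w = top10$n)

modal_price
mean_top10
```

Among these ten prices the weighted mean comes out below the $13.00 mode, which is why the two should not be used interchangeably.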
Analysis: This dumbbell plot uses geom_point to mark, in red and purple as seen in the key under the title, a previous and a current price point for each of five Asian-style restaurants in Texas. Most prices follow a growth trend, meaning items tend to cost more than they used to, as seen with Izakaya, Hando, and Fukuoka. There are some cases, though, where prices decrease, as seen at Kim Son Cafe and Flying Fish, where item prices have dropped by more than $2. One possible explanation is that those restaurants are popular enough to afford lowering how much they charge customers for certain food items, although the data shown here cannot confirm that.
# Use a separate name so the original df read from the CSV is not overwritten.
dumbbell_df <- data.frame(restaurantName = c("Hando", "Fukuoka Sushi Bar & Grill", "Flying Fish", "Kim Son Cafe",
"Izakaya"),
menuItemPreviousPrice = c(15.00, 12.59, 19.95, 15.60, 6.00),
menuItemCurrentPrice = c(20.00, 13.75, 15.99, 13.00, 9.00))
ggplot(dumbbell_df, aes(y = restaurantName)) +
geom_segment(aes(x = menuItemPreviousPrice, xend = menuItemCurrentPrice,
y = restaurantName, yend = restaurantName),
color = "grey", linewidth = 1.5) + # linewidth replaces size for lines in ggplot2 3.4.0+
geom_point(aes(x = menuItemPreviousPrice), color = "#F02036", size = 4) +
geom_point(aes(x = menuItemCurrentPrice), color = "purple4", size = 4) +
theme_minimal() +
labs(
title = "Restaurant Price Fluctuation Comparison",
subtitle = "Red = Previous Price | Purple = Current Price",
x = "Price ($)",
y = "Asian Style Restaurant") +
scale_x_continuous(labels = scales::dollar_format())
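The direction and size of each change in the dumbbell plot can also be checked with simple arithmetic. The sketch below reuses the five price pairs from the code above. The data frame name `price_pairs` and the column names `previous`, `current`, `change`, `pct_change`, and `direction` are made up for this illustration.

```r
# The five previous/current price pairs used in the dumbbell plot.
price_pairs <- data.frame(
  restaurantName = c("Hando", "Fukuoka Sushi Bar & Grill", "Flying Fish",
                     "Kim Son Cafe", "Izakaya"),
  previous = c(15.00, 12.59, 19.95, 15.60, 6.00),
  current  = c(20.00, 13.75, 15.99, 13.00, 9.00)
)

# Dollar change and percent change relative to the previous price.
price_pairs$change     <- price_pairs$current - price_pairs$previous
price_pairs$pct_change <- round(100 * price_pairs$change / price_pairs$previous, 1)
price_pairs$direction  <- ifelse(price_pairs$change > 0, "increase", "decrease")

# Sort from the largest decrease to the largest increase.
price_pairs[order(price_pairs$change), ]
```

Three of the five pairs are increases and two are decreases, which matches what the plot shows.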
Analysis: This pie chart shows the five most common menu item categories in the dataset, which covers numerous restaurants in Texas, and labels each slice with the dish that appears most often in that category. It can be seen from the categories the dishes are under, “beverages”, “sides”, “drinks”, “breakfast”, and “breakfast tacos”, that quick snacks and breakfast items appear the most. This suggests that many of these restaurants, though serving breakfast, lunch, and sometimes dinner items depending on hours of operation, draw much of their audience from people up in the early hours of the day searching for breakfast items to get them going.
This pattern is notable in how drinks and sides make up the largest share at 23% and then, under Breakfast, the taquito and cheese is the next most popular at 19%. Remember that these percentages are shares within the five most common categories only, out of 1,000+ menu items overall.
# Read the full dataset again under a new name for the remaining charts.
my_menu_data <- read_csv("C:/Users/BossLadyCharles/Desktop/Daniiii/DS736/restaurantmenuchanges.csv (1).zip")
if (!is.data.frame(my_menu_data)) {
stop("The data did not load! Check your file name and path.")
}
pie_data <- my_menu_data %>%
dplyr::filter(menuItemCategory != "Beverages") %>%
dplyr::filter(!is.na(menuItemCategory)) %>%
dplyr::group_by(menuItemCategory) %>%
dplyr::summarize(
category_count = n(),
top_dish = names(which.max(table(menuItemName)))) %>%
dplyr::arrange(desc(category_count)) %>%
head(5) %>%
mutate(percent_val = category_count / sum(category_count),
label_text = str_wrap(paste0(menuItemCategory, "\n", top_dish, "\n", percent(percent_val, accuracy = 1)), width = 12))
ggplot(pie_data, aes(x = "", y = category_count, fill = menuItemCategory)) +
geom_col(width = 1, color = "white") +
coord_polar("y", start = 0) +
geom_text(aes(label = label_text),
position = position_stack(vjust = 0.5),
color = "white",
size = 3.2,
fontface = "bold",
lineheight = 1.2) +
scale_fill_manual(values = c("#E74c7c", "#FF1177", "#FF1155", "#AA0077", "#AA0055")) +
theme_void() +
labs(title = "Top 5 Menu Categories & The Star Dishes") +
theme(legend.position = "none")
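Because the percentages in the pie chart are computed after keeping only the top five categories, they describe shares within those five, not shares of the whole dataset. The sketch below shows the difference on made-up counts. The category names A through G and every number are hypothetical and do not come from the data.

```r
# Made-up category counts, already sorted from most to least common.
category_counts <- c(A = 46, B = 46, C = 38, D = 36, E = 34, F = 30, G = 20)

# Keep the five most common categories, as head(5) does in the chart code.
top5 <- head(category_counts, 5)

# Share within the top five (what the pie chart labels show).
share_of_top5 <- top5 / sum(top5)

# Share of all categories (always smaller once other categories exist).
share_of_all <- top5 / sum(category_counts)

round(100 * cbind(share_of_top5, share_of_all), 1)
```

The top-five shares always sum to 100%, so each slice looks larger than the category's share of all records would.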
Analysis: Here is an area plot, made with the function geom_area, that marks the three restaurants with the most menu items out of all 5,000 records in the data. It can be seen from the y-axis that Black Walnut Cafe has the most, with over 200 varying items at different price points for customers to choose from. Price points run along the x-axis, so the area plot spans from the cheapest item to the most expensive. There are some outliers, as items over $30.00 are not common for any of these restaurants. The areas are stacked on top of one another, and although the number of items each restaurant offers differs, the cost of items tends to fall in the same range across the three plotted. Which restaurant is which can be read from the legend on the right-hand side, with each restaurant having its own corresponding color.
top_3_restaurants <- my_menu_data %>%
count(restaurantName) %>%
arrange(desc(n)) %>%
head(3) %>%
pull(restaurantName)
top_3_restaurants
## [1] "Black Walnut Cafe" "Mo' Better Brews"
## [3] "Fukuoka Sushi Bar & Grill"
area_data <- my_menu_data %>%
filter(restaurantName %in% top_3_restaurants) %>%
mutate(
clean_price = as.numeric(str_remove_all(menuItemCurrentPrice, "[\\$,]"))) %>%
filter(!is.na(clean_price))
ggplot(area_data, aes(x = clean_price, fill = restaurantName)) +
geom_area(stat = "bin", binwidth = 5, alpha = 0.6, position = "stack") +
coord_cartesian(xlim = c(0, 40)) +
scale_fill_manual(values = c("#E74c6c", "#AA0077", "#550055")) +
labs(
title = "Price Distribution of Top 3 Restaurants",
x = "Price ($)",
y = "Number of Menu Items",
fill = "Restaurant") +
theme_minimal()
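To show what the stacked area plot is counting, the sketch below bins a few prices into $5-wide ranges and adds the counts across restaurants. All values and the restaurant labels "A", "B", and "C" are made up. The bin edges at multiples of 5 are also an assumption and may not match the edges ggplot2 chooses for `binwidth = 5`.

```r
# Made-up prices for three hypothetical restaurants.
toy <- data.frame(
  restaurantName = c("A", "A", "A", "B", "B", "C"),
  clean_price    = c(3.5, 7.0, 12.0, 4.0, 18.5, 33.0)
)

# Assumed bin edges: $5-wide bins from $0 to $40.
price_breaks <- seq(0, 40, by = 5)
toy$price_bin <- cut(toy$clean_price, breaks = price_breaks, include.lowest = TRUE)

# Items per restaurant per bin (one row per restaurant).
bin_counts <- table(toy$restaurantName, toy$price_bin)

# With position = "stack", the top of the stack in each bin is the column total.
stacked_total <- colSums(bin_counts)

bin_counts
stacked_total
```

In a stacked plot, the height read from the y-axis at a given price is the total across restaurants, so one restaurant's own count is the thickness of its band.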
Analysis: To understand whether expensive prices have any correlation with ratings, a heat map was made from the area data of the prior area plot, which examined the three restaurants with the most menu items against their price points. The lighter the color of the square, the more items are present at that clean price point on the x-axis. It is seen that most items are between $3.00 and $18.00, which suggests that people prefer buying the lower-cost items and that expensive prices are not an indicator of better ratings. People appear to rate the restaurants on factors other than expensive prices, such as how all of them have affordable prices or a large variety of items to choose from. While some restaurants have more than one rating count, depending on location in Texas, Mo’ Better Brews has just one, at 97 ratings from the public.
heatmap_data <- area_data %>%
filter(!is.na(menuItemRatingCount), !is.na(clean_price)) %>%
filter(clean_price <= 100)
ggplot(heatmap_data, aes(x = clean_price, y = menuItemRatingCount)) +
geom_bin2d(bins = 15) +
scale_fill_viridis_c(option = "inferno") +
facet_wrap(~restaurantName) +
labs(title = "Price vs. Rating by Item Heavy Restaurants",
x = "Price ($)", y = "Rating Count") +
theme_minimal() +
theme(legend.position = "bottom",
panel.border = element_rect(color = "purple4", fill = NA, linewidth = 0.5), # linewidth replaces size in ggplot2 3.4.0+
panel.spacing = unit(1, "lines"),
strip.background = element_rect(fill = "mistyrose", color = "purple4"),
strip.text = element_text(face = "bold"))
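A heat map shows where items cluster, but a single number can help when asking whether price and rating count move together. The sketch below converts a text rating count to numeric and computes a Spearman rank correlation. It assumes the rating counts are plain digit strings, and every value in `toy_items` is made up, so the result says nothing about the real dataset.

```r
# Made-up prices and rating counts; rating counts stored as text, like the <chr> column.
toy_items <- data.frame(
  clean_price = c(3, 8, 13, 18, 25),
  menuItemRatingCount = c("97", "40", "55", "12", "5"),
  stringsAsFactors = FALSE
)

# Convert the text counts to numbers (assumes digits only).
toy_items$rating_count <- as.numeric(toy_items$menuItemRatingCount)

# Spearman rank correlation between price and rating count.
rho <- cor(toy_items$clean_price, toy_items$rating_count, method = "spearman")
rho
```

Spearman is used here because it depends only on rank order, which makes it less sensitive to a few very expensive or very heavily rated items.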
Thank you for viewing these Data Visualizations!