The fast-food chain industry is growing faster than ever. Globalization and advances in technology have brought together cuisines from around the world. Drive-through and takeaway service are also gaining popularity as customers' schedules get busier. As in any other business, data is a key asset for the food industry, where it supports both macro-level and granular analyses at different levels, for example the customer level and the store level. Food chains now rely on data to find their most popular dishes, optimize inventory and food storage, attract new customers with discounts, provide customized offers for repeat customers, optimize menu prices, and more.
This project analyzes data from a well-known fast-food chain in the USA across 47 locations and proposes strategies that store managers can use to maximize profit.
Goal:
In this project, we use the transaction data available for the fast-food chain across its stores in the USA. The primary motive of the analysis is to maximize profit. To that end, the project aims to support effective business decisions by looking at trends and patterns in the data. Visualization is one of the most efficient ways to find patterns, so this project explores the patterns hidden in the dataset through effective visualization.
Once patterns are discovered and strategies are grounded in facts, a wise approach is to implement the strategies in one store first and run a consistent process of experimentation before rolling them out elsewhere. Depending on the success in that store, the strategies can then be applied to the other stores. This approach can save operating costs and maximize profit.
In this analysis we focus on store-level data. We explore the characteristics of each store and reveal patterns and trends. The goal is to maximize profit by increasing sales and lowering operating costs and inventory waste.
Data Characteristics
531,503 records
columns:
Constraints and assumptions
fastfood chain data
library("plotly")
library("tidyverse")
library("data.table")
library("gridExtra")
library("knitr")
library("gganimate")
library("maps")
library("lubridate")
library("treemap")
library("treemapify")
# Load the fast-food transaction data
data <- read_csv("Data/fastfood_dataset_challenege.csv")
glimpse(data)
## Observations: 531,503
## Variables: 17
## $ order_id <dbl> 341643, 344179, 463211, 357213, 466331, ...
## $ customer_id <dbl> 125549, 322281, 124745, 285968, 123599, ...
## $ date_created <dttm> 2018-01-07 00:42:57, 2018-01-09 23:02:4...
## $ year <dbl> 2018, 2018, 2018, 2018, 2018, 2018, 2018...
## $ month <dbl> 1, 1, 5, 1, 5, 5, 1, 1, 3, 3, 3, 2, 2, 2...
## $ item_no <dbl> 360, 380, 2010, 1130, 160, 421, 2011, 20...
## $ price <dbl> 10.99, 7.79, 6.89, 2.49, 7.59, 13.98, 9....
## $ qty <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
## $ order_discounts_total <dbl> 2.98, 7.98, 0.00, 0.00, 0.00, 4.09, 0.00...
## $ line_discounts_total <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ tax <dbl> 0.70, 0.60, 1.53, 1.37, 0.58, 0.63, 2.14...
## $ disctotal <dbl> 0.00, 2.70, 0.00, 0.00, 0.00, 4.09, 0.00...
## $ order_total <dbl> 11.690000, 9.979999, 25.700001, 22.91000...
## $ gender <chr> "Male", "Female", "Female", "Male", "Fem...
## $ location_no <dbl> 139, 139, 139, 139, 139, 139, 139, 139, ...
## $ postalcode <chr> "06111", "06111", "06111", "06111", "061...
## $ store_id <dbl> 1153, 1153, 1153, 1153, 1153, 1153, 1153...
## # A tibble: 6 x 17
## order_id customer_id date_created year month item_no price qty
## <dbl> <dbl> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 341643 125549 2018-01-07 00:42:57 2018 1 360 11.0 1
## 2 344179 322281 2018-01-09 23:02:43 2018 1 380 7.79 1
## 3 463211 124745 2018-05-11 22:30:42 2018 5 2010 6.89 1
## 4 357213 285968 2018-01-23 23:36:58 2018 1 1130 2.49 1
## 5 466331 123599 2018-05-15 16:34:02 2018 5 160 7.59 1
## 6 483080 129856 2018-05-30 16:20:31 2018 5 421 14.0 1
## # ... with 9 more variables: order_discounts_total <dbl>,
## # line_discounts_total <dbl>, tax <dbl>, disctotal <dbl>,
## # order_total <dbl>, gender <chr>, location_no <dbl>, postalcode <chr>,
## # store_id <dbl>
In this section of the report, we visualize some of the important overall characteristics. Sales patterns can differ between stores, so a web application was developed with R Shiny to visualize the data for each store by selecting its store ID in a dropdown menu.
Let’s see overall characteristics for all stores.
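The aggregations in this section sum order_total over the rows of the dataset. Each row is a line item, and the sample output suggests that order_total is an order-level amount (for example, a 6.89 item appears on an order totalling 25.70). The sketch below, on hypothetical toy data, shows how collapsing to one row per order changes the sum. It assumes that order_total is repeated on every line of an order, which should be checked against the real data before relying on it.

```r
library(dplyr)

# Toy line-item data (hypothetical values): order 1 has two item rows,
# and its order-level total (20) is repeated on both rows.
items <- tibble(
  order_id     = c(1, 1, 2, 3),
  date_created = as.POSIXct(c("2018-01-07 12:00:00", "2018-01-07 12:00:00",
                              "2018-01-08 18:30:00", "2018-01-14 09:15:00"),
                            tz = "UTC"),
  order_total  = c(20, 20, 5, 10)
)

# Keep one row per order so that each order_total is counted once.
orders <- items %>%
  distinct(order_id, .keep_all = TRUE)

# Weekday totals computed from the order-level table.
weekday_sales <- orders %>%
  mutate(day_of_week = weekdays(as.Date(date_created))) %>%
  group_by(day_of_week) %>%
  summarize(sales_total = sum(order_total))

naive_sum   <- sum(items$order_total)   # counts order 1 twice
deduped_sum <- sum(orders$order_total)  # counts every order once
```

If the assumption holds, the same distinct() step could be placed before any summary that adds up order_total. Summaries of qty or price are item-level and would not need it.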
On which day of the week do most sales occur?
data_weekly <- data %>%
mutate(day_of_week = weekdays(as.Date(date_created))) %>%
group_by( day_of_week)%>%
summarize(sales_total = sum(order_total))
p <- ggplot(data = data_weekly, aes(x= day_of_week, y= sales_total))+
geom_col(aes(fill= day_of_week))+
scale_x_discrete(limits=c('Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'))
p
How did the order amount vary over time?
order_total_monthyear <- data %>%
mutate(store_id = as.character(store_id)) %>%
group_by(year, month, store_id) %>%
summarize(total= sum(order_total))%>%
arrange(desc(year), desc(month))
p<- ggplot(order_total_monthyear, aes(x= month, y=total, group= store_id, color = store_id))+
geom_line()
ggplotly(p)
sales_time <- data %>%
mutate(store_id = as.character(store_id)) %>%
mutate(date_modified = as.Date(date_created)) %>%
group_by(date_modified, store_id) %>%
summarize(total= sum(order_total))%>%
arrange(date_modified)%>%
filter(store_id== '1151')
sales_time
## # A tibble: 182 x 3
## # Groups: date_modified [182]
## date_modified store_id total
## <date> <chr> <dbl>
## 1 2018-01-01 1151 2002.
## 2 2018-01-02 1151 3642.
## 3 2018-01-03 1151 2331.
## 4 2018-01-04 1151 2323.
## 5 2018-01-05 1151 3857.
## 6 2018-01-06 1151 5830.
## 7 2018-01-07 1151 1511.
## 8 2018-01-08 1151 2506.
## 9 2018-01-09 1151 3168.
## 10 2018-01-10 1151 2865.
## # ... with 172 more rows
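Daily totals like these swing strongly from one day to the next, which can hide the underlying trend. One possible way to smooth such a series is a trailing 7-day mean. The sketch below uses hypothetical toy numbers and base R's stats::filter; the column name total_7d is an illustrative choice and not part of the original analysis.

```r
# Toy daily totals for one store (hypothetical numbers).
daily <- data.frame(
  date_modified = seq(as.Date("2018-01-01"), by = "day", length.out = 10),
  total         = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
)

# Trailing 7-day mean: each value averages the current day and the six
# days before it, so the first six days have no full window and are NA.
# stats::filter is called explicitly because dplyr::filter masks it.
daily$total_7d <- as.numeric(
  stats::filter(daily$total, rep(1 / 7, 7), sides = 1)
)
```

A 7-day window covers exactly one of each weekday, so the weekly cycle averages out and what remains is closer to the longer-term movement.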
What are the most popular menu items across all stores?
order_total_items <- data %>%
mutate(item_no = as.character(item_no)) %>%
group_by(item_no) %>%
summarize(totalqty = sum(qty)) %>%
arrange(desc(totalqty))%>%
slice(10:1)
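# Order the bars by quantity sold; a character axis would otherwise sort alphabetically
p<- ggplot(order_total_items, aes(x= reorder(item_no, totalqty), y=totalqty))+
geom_col(fill= "blue")+
xlab("item_no")+
coord_flip()
p
Who are the regular customers?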
regular_customers <- data %>%
mutate(customer_id = as.character(customer_id))%>%
group_by(customer_id, order_id) %>%
summarize(order_total = sum(order_total)) %>%
group_by(customer_id) %>%
summarize(visit= n())%>%
arrange(desc(visit)) %>%
slice(10:1)
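# Order the bars by number of visits; a character axis would otherwise sort alphabetically
p<- ggplot(regular_customers, aes(x= reorder(customer_id, visit), y=visit))+
geom_col(fill= "blue")+
xlab("customer_id")+
coord_flip()
p
What are the peak hours for each day?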
data_weekly_hours <- data %>%
mutate(day_of_week = weekdays(as.Date(date_created))) %>%
# date_created is already parsed as a date-time, so the hour can be extracted directly
mutate(hour = hour(date_created)) %>%
group_by( day_of_week,hour)%>%
summarize(sales_total = sum(order_total))
p <- ggplot(data = data_weekly_hours, aes(x= day_of_week, y= hour, fill= sales_total))+
# One tile per day/hour cell; stacked bars would not place each cell at its hour
geom_tile()+
scale_x_discrete(limits=c('Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'))+
scale_fill_gradient(low = "yellow", high = "blue")
ggplotly(p)
Tree map of menu items
items_frquency <- data %>%
mutate(item_no = as.character(item_no)) %>%
filter(price< 50) %>%
group_by(item_no) %>%
summarize(totalqty = sum(qty), price = mean(price)) %>%
arrange(desc(totalqty))
p<- ggplot(data= items_frquency, mapping =aes(area= totalqty, fill= price, label= item_no))+
geom_treemap()+
geom_treemap_text(fontface = "italic", colour = "white", place = "centre",
grow = TRUE)
p
Geographic representation of stores
library(zipcode)
data(zipcode)
us<-map_data('state')
order_total_by_location <- data %>%
mutate(store_id = as.character(store_id)) %>%
mutate(zip = as.character(postalcode)) %>%
group_by(store_id, zip, location_no ) %>%
summarize(total = sum(order_total))
order_total_by_location<- merge(order_total_by_location, zipcode, by='zip')
p <- ggplot(order_total_by_location,aes(longitude,latitude)) +
geom_polygon(data=us,aes(x=long,y=lat,group=group),color='gray',fill=NA,alpha=.35)+
geom_point(aes(size = total, color= store_id),alpha=.5) +
xlim(-85,-75)+ylim(35,45)
ggplotly(p)
For store-level analysis, visit https://kabita-paul.shinyapps.io/StoreAnalysisApp/
Validation asks whether one has built the right product; verification asks whether one has built the product right. The application's algorithms should carry out the visual encoding and interaction design. System performance is a significant component of accessibility and usability, so it was considered while writing the code and designing the system. Tidy, clean code affects both performance and reproducibility. Variables whose computation could slow down the application are created at the top of the application, in a pre-processing portion of the system. Reproducibility (please see the GitHub URL in the Appendix) and production readiness were also designed with the user in mind.
Evaluating the system through direct human interaction is an extremely complex task. Users may be biased and influenced by their experience, prior knowledge, and perspective. Cognitive ability also differs from person to person, which can lead to disagreement in judgment. Individuals notice different things: one may focus on cosmetics, another on technical details.
The analytical and empirical techniques of Human-Computer Interaction (HCI) evaluate a system through users' interaction with it. Such an evaluation should assess whether the system fulfills all of the functions requested by the user, as defined in the user requirements specification phase, and analyze the system's effect on its final users.
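The pre-processing idea described above can be sketched as follows. This is a minimal illustration and not the code of the deployed application: the table store_summary, the input ID store_id and the output ID monthly are hypothetical names, and the numbers are toy values. The point is only that the expensive aggregation happens once, outside the server function, while the reactive part does a cheap lookup.

```r
library(shiny)

# Hypothetical pre-aggregated table, built once when the app starts
# rather than inside a reactive expression.
store_summary <- data.frame(
  store_id = c("1151", "1151", "1153"),
  month    = c(1, 2, 1),
  total    = c(100, 120, 90)
)

ui <- fluidPage(
  selectInput("store_id", "Store", choices = unique(store_summary$store_id)),
  tableOutput("monthly")
)

server <- function(input, output) {
  # Only this cheap subsetting step reruns when the selection changes.
  output$monthly <- renderTable({
    store_summary[store_summary$store_id == input$store_id, ]
  })
}

# Build the app object; start it with runApp(app) in an interactive session.
app <- shinyApp(ui, server)
```

Objects created outside the server function are computed once per R process and shared across sessions, which is why heavy aggregation is usually placed there.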
https://kabita-paul.shinyapps.io/StoreAnalysisApp/
http://rpubs.com/kabitapaul11/fastfoodchain_analysis/