Fast food chain Analysis - Technical Report

0.1 Introduction
0.2 Domain Problem Characterization
0.3 Data/operation abstraction design
0.4 Encoding/Interaction design
0.5 Key strategies to implement
0.6 Algorithmic design
0.7 User evaluation
0.8 Future work
0.9 Appendix
0.10 References

0.1 Introduction

The fast food industry is growing faster than ever. Globalization and advances in technology have brought together cuisines from around the world, and drive-through and takeaway services are gaining popularity as customers' schedules become busier. As in any other business, data is a key asset for the food industry: it supports both macro-level and granular analyses at different levels, such as the customer level and the store level. Fast food chains now rely on data to find their most popular dishes, optimize inventory and food storage, attract new customers with discounts, provide customized offers to repeat customers, optimize menu prices, and more.

This project analyzes data from a well-known fast food chain in the USA across 47 locations and proposes strategies that store managers can adopt to maximize profit.

Goal:

In this project, we use the transaction data available for the fast food chain across its stores in the USA. The primary motive of the analysis is to maximize profit. To that end, the project aims to support effective business decisions by looking at trends and patterns in the data. Visualization is one of the most efficient ways to find patterns, so the project explores the patterns hidden in the dataset through visualization.

Once patterns are discovered and strategies are formed from the facts, the wise approach is to implement the strategies in one store and experiment consistently before rolling them out to other stores. Depending on the success in that store, the strategies can then be applied to the others. This approach can save operating costs and maximize profit.

In this analysis we focus on store-level data. We explore the characteristics of each store and reveal patterns and trends. The goal is to maximize profit by increasing sales and lowering operating costs and inventory waste.

Primary Question: How can sales be increased for each store?
Sub Question 1: On which day of the week do most sales occur?
Sub Question 2: How did the order amount vary over time?
Sub Question 3: What are the most popular menu items for each store?
Sub Question 4: Who are the regular customers, and when do they visit?
Sub Question 5: What are the peak hours for each day?

Data Characteristics

531,503 records

columns:

order_id
customer_id
date_created
year
month
item_no
price
qty
order_discounts_total
line_discounts_total
tax
disctotal
order_total
gender
location_no
postalcode
store_id

Constraints and assumptions

Menu items are encoded as numbers rather than food item names due to data privacy concerns. A better analysis would be possible if the data contained the real names of the food items.

0.2 Domain Problem Characterization

Maximize profit
Increase sales
Minimize inventory waste
Retain existing/repeat customers
Attract new customers

0.3 Data/operation abstraction design

Dataset: fastfood_dataset_challenge
47 stores
40,365 unique customers
791 menu items
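
Summary counts like the ones above can be obtained with n_distinct(). The following is a minimal sketch on a small made-up tibble; the values are hypothetical and only the column names are taken from the dataset.

```r
library(dplyr)

# Toy stand-in for the transaction table; values are made up for illustration.
toy <- tibble(
  store_id    = c(1151, 1151, 1153, 1153, 1153),
  customer_id = c(10, 11, 10, 12, 12),
  item_no     = c(360, 380, 360, 2010, 160)
)

# Count distinct stores, customers and menu items in one pass.
overview <- toy %>%
  summarize(
    stores    = n_distinct(store_id),
    customers = n_distinct(customer_id),
    items     = n_distinct(item_no)
  )
overview
```

Applied to the full table, the same summarize() call would be expected to return the store, customer and item counts listed above.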

fastfood chain data

library("plotly")
library("tidyverse")
library("data.table")
library("gridExtra")
library("knitr")
library("gganimate")
library("maps")
library("lubridate")
library("treemap")
library("treemapify")

# Load the fast food transaction data
data <- read_csv("Data/fastfood_dataset_challenege.csv")

glimpse(data)

## Observations: 531,503
## Variables: 17
## $ order_id              <dbl> 341643, 344179, 463211, 357213, 466331, ...
## $ customer_id           <dbl> 125549, 322281, 124745, 285968, 123599, ...
## $ date_created          <dttm> 2018-01-07 00:42:57, 2018-01-09 23:02:4...
## $ year                  <dbl> 2018, 2018, 2018, 2018, 2018, 2018, 2018...
## $ month                 <dbl> 1, 1, 5, 1, 5, 5, 1, 1, 3, 3, 3, 2, 2, 2...
## $ item_no               <dbl> 360, 380, 2010, 1130, 160, 421, 2011, 20...
## $ price                 <dbl> 10.99, 7.79, 6.89, 2.49, 7.59, 13.98, 9....
## $ qty                   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
## $ order_discounts_total <dbl> 2.98, 7.98, 0.00, 0.00, 0.00, 4.09, 0.00...
## $ line_discounts_total  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ tax                   <dbl> 0.70, 0.60, 1.53, 1.37, 0.58, 0.63, 2.14...
## $ disctotal             <dbl> 0.00, 2.70, 0.00, 0.00, 0.00, 4.09, 0.00...
## $ order_total           <dbl> 11.690000, 9.979999, 25.700001, 22.91000...
## $ gender                <chr> "Male", "Female", "Female", "Male", "Fem...
## $ location_no           <dbl> 139, 139, 139, 139, 139, 139, 139, 139, ...
## $ postalcode            <chr> "06111", "06111", "06111", "06111", "061...
## $ store_id              <dbl> 1153, 1153, 1153, 1153, 1153, 1153, 1153...

head(data)

## # A tibble: 6 x 17
##   order_id customer_id date_created         year month item_no price   qty
##      <dbl>       <dbl> <dttm>              <dbl> <dbl>   <dbl> <dbl> <dbl>
## 1   341643      125549 2018-01-07 00:42:57  2018     1     360 11.0      1
## 2   344179      322281 2018-01-09 23:02:43  2018     1     380  7.79     1
## 3   463211      124745 2018-05-11 22:30:42  2018     5    2010  6.89     1
## 4   357213      285968 2018-01-23 23:36:58  2018     1    1130  2.49     1
## 5   466331      123599 2018-05-15 16:34:02  2018     5     160  7.59     1
## 6   483080      129856 2018-05-30 16:20:31  2018     5     421 14.0      1
## # ... with 9 more variables: order_discounts_total <dbl>,
## #   line_discounts_total <dbl>, tax <dbl>, disctotal <dbl>,
## #   order_total <dbl>, gender <chr>, location_no <dbl>, postalcode <chr>,
## #   store_id <dbl>

0.4 Encoding/Interaction design

In this section of the report, we visualize some important overall characteristics. Sales patterns can differ between stores, so a web application was developed with R Shiny to visualize the data for each store by selecting a store ID from a dropdown menu.

Let's see the overall characteristics for all stores.

On which day of the week do most sales occur?

data_weekly <- data %>%
  mutate(day_of_week = weekdays(as.Date(date_created))) %>%
  group_by( day_of_week)%>%
  summarize(sales_total = sum(order_total))
p <- ggplot(data = data_weekly, aes(x= day_of_week, y= sales_total))+
  geom_col(aes(fill= day_of_week))+
  scale_x_discrete(limits=c('Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'))
p
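
One caveat when summing order_total: if the table holds one row per ordered item and order_total is repeated on every row of the same order (an assumption, suggested by item_no, qty and order_id appearing on every row), then sum(order_total) counts multi-item orders more than once. A minimal sketch on made-up rows that keeps one row per order before aggregating:

```r
library(dplyr)

# Hypothetical line-item rows: order 1 has two lines carrying the same order_total.
toy <- tibble(
  order_id     = c(1, 1, 2, 3),
  date_created = as.POSIXct(c("2018-01-07 12:00:00", "2018-01-07 12:00:00",
                              "2018-01-08 13:00:00", "2018-01-08 18:30:00"),
                            tz = "UTC"),
  order_total  = c(20, 20, 10, 5)
)

weekly_dedup <- toy %>%
  distinct(order_id, .keep_all = TRUE) %>%   # keep the first row of each order
  mutate(day_of_week = weekdays(as.Date(date_created))) %>%
  group_by(day_of_week) %>%
  summarize(sales_total = sum(order_total))
weekly_dedup
```

Whether this step is needed depends on how order_total is recorded in the real data, so it is worth checking a few multi-line orders first.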

How did the order amount vary over time?

order_total_monthyear <- data %>%
  mutate(store_id = as.character(store_id)) %>%
  group_by(year, month, store_id) %>%
  summarize(total= sum(order_total))%>%
  arrange(desc(year), desc(month))
p<- ggplot(order_total_monthyear, aes(x= month, y=total, group= store_id, color = store_id))+
  geom_line()
ggplotly(p)

sales_time <- data %>%
    mutate(store_id = as.character(store_id)) %>%
    mutate(date_modified = as.Date(date_created)) %>%
    group_by(date_modified, store_id) %>%
    summarize(total= sum(order_total))%>%
    arrange(date_modified)%>%
    filter(store_id== '1151')
sales_time

## # A tibble: 182 x 3
## # Groups:   date_modified [182]
##    date_modified store_id total
##    <date>        <chr>    <dbl>
##  1 2018-01-01    1151     2002.
##  2 2018-01-02    1151     3642.
##  3 2018-01-03    1151     2331.
##  4 2018-01-04    1151     2323.
##  5 2018-01-05    1151     3857.
##  6 2018-01-06    1151     5830.
##  7 2018-01-07    1151     1511.
##  8 2018-01-08    1151     2506.
##  9 2018-01-09    1151     3168.
## 10 2018-01-10    1151     2865.
## # ... with 172 more rows

p<- ggplot(sales_time, aes(x= date_modified, y=total))+
    geom_line(color= "darkblue")
p

What are the most popular menu items for each store?

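# Top 10 items by total quantity sold across all stores
order_total_items <- data %>%
  mutate(item_no = as.character(item_no))  %>%
  group_by(item_no) %>%
  summarize(totalqty = sum(qty)) %>%
  arrange(desc(totalqty))%>%
  slice(1:10)
# reorder() sorts the bars by quantity; a character axis would otherwise sort alphabetically
p<- ggplot(order_total_items, aes(x= reorder(item_no, totalqty), y=totalqty))+
  geom_col(fill= "blue")+
  coord_flip()+
  labs(x = "item_no")
p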

Who are the regular customers?

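# Top 10 customers by number of distinct orders (visits)
regular_customers <- data %>%
    mutate(customer_id = as.character(customer_id))%>%
    group_by(customer_id, order_id) %>%
    summarize(order_total = sum(order_total)) %>%
    group_by(customer_id)  %>%
    summarize(visit= n())%>%
    arrange(desc(visit)) %>%
    slice(1:10)

# reorder() sorts the bars by visit count instead of alphabetically by customer_id
p<- ggplot(regular_customers, aes(x= reorder(customer_id, visit), y=visit))+
  geom_col(fill= "blue")+
  coord_flip()+
  labs(x = "customer_id")
p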

What are the peak hours for each day?

data_weekly_hours <- data %>%
  mutate(day_of_week = weekdays(as.Date(date_created))) %>%
  # date_created is already a date-time, so lubridate::hour() applies directly
  mutate(hour = hour(date_created)) %>%
  group_by( day_of_week,hour)%>%
  summarize(sales_total = sum(order_total))
# One tile per day/hour cell, coloured by total sales
p <- ggplot(data = data_weekly_hours, aes(x= day_of_week, y= hour, fill= sales_total))+
  geom_tile()+
  scale_x_discrete(limits=c('Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'))+
  scale_fill_gradient(low = "yellow", high = "blue")
ggplotly(p)
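
The peak hour for each day can also be read off as a table rather than from the plot. A minimal sketch, assuming dplyr 1.0.0 or later for slice_max() and using made-up rows in place of the real data:

```r
library(dplyr)
library(lubridate)

# Hypothetical orders on a Sunday and a Monday; values are made up.
toy <- tibble(
  date_created = as.POSIXct(c("2018-01-07 12:10:00", "2018-01-07 12:40:00",
                              "2018-01-07 18:05:00", "2018-01-08 08:15:00",
                              "2018-01-08 19:30:00"), tz = "UTC"),
  order_total  = c(10, 15, 20, 5, 30)
)

peak_hours <- toy %>%
  mutate(day_of_week = weekdays(as.Date(date_created)),
         hour = hour(date_created)) %>%
  group_by(day_of_week, hour) %>%
  summarize(sales_total = sum(order_total), .groups = "drop") %>%
  group_by(day_of_week) %>%
  slice_max(sales_total, n = 1) %>%   # hour with the highest sales per day
  ungroup()
peak_hours
```

In the toy data the busiest hour is 12 on the first day and 19 on the second. slice_max() keeps ties by default, so a day can return more than one row.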

Treemap of menu items

items_frquency <- data %>%
  mutate(item_no = as.character(item_no))  %>%
  filter(price< 50) %>%
  group_by(item_no) %>%
  summarize(totalqty = sum(qty), price = mean(price)) %>%
  arrange(desc(totalqty))
p<- ggplot(data= items_frquency, mapping =aes(area= totalqty, fill= price, label= item_no))+
 geom_treemap()+
  geom_treemap_text(fontface = "italic", colour = "white", place = "centre",
                    grow = TRUE)
 
p

Geographic representation of stores

library(zipcode)

data(zipcode)
us<-map_data('state')


order_total_by_location <- data %>%
  mutate(store_id = as.character(store_id)) %>%
  mutate(zip = as.character(postalcode)) %>%
  group_by(store_id, zip, location_no ) %>%
  summarize(total = sum(order_total))


order_total_by_location<- merge(order_total_by_location, zipcode, by='zip')


p <- ggplot(order_total_by_location,aes(longitude,latitude)) +
  geom_polygon(data=us,aes(x=long,y=lat,group=group),color='gray',fill=NA,alpha=.35)+
  geom_point(aes(size = total, color= store_id),alpha=.5) +
  xlim(-85,-75)+ylim(35,45)

ggplotly(p)

For store level analysis, visit https://kabita-paul.shinyapps.io/StoreAnalysisApp/

0.5 Key strategies to implement

Customized discounts (via email or message) for regular customers
Happy hours (special discounts) on the days and hours with the fewest orders
Optimized inventory storage (increase stocks of raw materials for the most ordered items)
Better utilization of the workforce: more staff on duty during peak hours
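
As an illustration of the first strategy, the customers to target with customized discounts could be selected by a visit threshold. This is only a sketch on made-up rows: the cut-off min_visits is a hypothetical parameter, not a value taken from the analysis.

```r
library(dplyr)

# Hypothetical line items; customer 10 has two lines in order 1.
toy <- tibble(
  customer_id = c(10, 10, 10, 11, 12, 12),
  order_id    = c(1, 1, 2, 3, 4, 5)
)

min_visits <- 2  # hypothetical cut-off for a "regular" customer

discount_targets <- toy %>%
  group_by(customer_id) %>%
  summarize(visits = n_distinct(order_id)) %>%  # one visit per distinct order
  filter(visits >= min_visits) %>%
  arrange(desc(visits))
discount_targets
```

Counting distinct order_id values rather than rows keeps multi-item orders from inflating the visit count.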

0.6 Algorithmic design

Validation asks whether one has built the right product, and verification asks whether one has built the product right. The application's algorithms should carry out the visual encoding and interaction design. System performance is a significant component of accessibility and usability, so it was considered during coding and system design. Tidy, well-organized code affects both system performance and reproducibility. Variables that could slow the application down are created at the top of the application, as a pre-processing step. Additionally, reproducibility (please see the Github URL in the Appendix) and production readiness were designed with the user in mind.

0.7 User evaluation

Evaluating a system through direct human interaction is an extremely complex task. Users may be biased and influenced by their experience, prior knowledge, and perspective. Cognitive ability also differs from person to person, which can lead to disagreement in judgment. Individuals may notice different things: one may focus on the cosmetics, another on the technical details.
The analytical and empirical techniques of Human Computer Interaction (HCI) should assess whether the system fulfills all of the functions the user requested in the user requirements specification phase, and analyze the system's effect on its final users.

0.8 Future work

Customers' income levels and the socio-economic situation of each location can be analysed.
Menu item customization and pairing of items, provided that detailed menu data are available.
Analysis of customized combo meals for items that are ordered together (better pricing strategy).
Improve performance: the Shiny app takes about 3-5 seconds to load. Profiling is needed to improve performance.
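
The combo-meal idea starts from counting which items appear together in the same order. A minimal sketch of that first step, using a self-join on order_id over made-up rows; it is an illustration under these assumptions, not part of the current app.

```r
library(dplyr)

# Hypothetical line items: three orders, values made up for illustration.
toy <- tibble(
  order_id = c(1, 1, 2, 2, 2, 3),
  item_no  = c(360, 380, 360, 380, 160, 160)
)

# One row per (order, item), so repeated lines do not inflate the counts.
order_items <- distinct(toy, order_id, item_no)

# Self-join on order_id, keep each unordered pair once, then count orders per pair.
item_pairs <- merge(order_items, order_items, by = "order_id") %>%
  filter(item_no.x < item_no.y) %>%
  count(item_no.x, item_no.y, sort = TRUE)
item_pairs
```

In the toy data, items 360 and 380 are ordered together twice and come out on top. On the full dataset the self-join can grow large, so it may be sensible to restrict it to the most popular items first.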

0.9 Appendix

Please see the shiny app link:

https://kabita-paul.shinyapps.io/StoreAnalysisApp/

Please see the blog link:

http://rpubs.com/kabitapaul11/fastfoodchain_analysis/

Please see the Github link:

https://github.com/kabitapaul11/FastFoodChain/

0.10 References

Article: A nested model for visualization design and validation