Background

The Bread Basket

Like many outstanding American businesses, The Bread Basket literally has its roots in hearth, home, and friends. Here is how it all began, in the words of its founder, Marie Hyde.

In 1989, we expanded our business by moving from our kitchen to a small shop nearby. In 1993, we built and relocated to our main facility, a beautiful 6,000-square-foot building on Signal Mountain in Tennessee. It was then that my husband, Charlie, left his job as a corporate executive to help manage the business. Under his guidance, The Bread Basket is a thriving company with two locations in the Chattanooga area. We deliver our gift baskets locally and ship nationwide.

Dataset

The dataset belongs to “The Bread Basket”, a bakery located in Edinburgh. It has 20507 entries (rows), over 9000 transactions, and 5 columns.

The dataset contains transactions of customers who ordered different items from this bakery online, and it covers the period from 30-10-2016 to 09-04-2017.

The dataset is published on Kaggle.

Importing Data

Make sure the data file is placed in the data_input folder inside our R project directory.

bread <- read.csv("data_input/thebreadbasket.csv")
head(bread)
##   Transaction          Item        date_time period_day weekday_weekend
## 1           1         Bread 30-10-2016 09:58    morning         weekend
## 2           2  Scandinavian 30-10-2016 10:05    morning         weekend
## 3           2  Scandinavian 30-10-2016 10:05    morning         weekend
## 4           3 Hot chocolate 30-10-2016 10:07    morning         weekend
## 5           3           Jam 30-10-2016 10:07    morning         weekend
## 6           3       Cookies 30-10-2016 10:07    morning         weekend

Data Cleansing & Coercion

Check Datatypes

Make sure the data is clean and ready to use.

str(bread)
## 'data.frame':    20507 obs. of  5 variables:
##  $ Transaction    : int  1 2 2 3 3 3 4 5 5 5 ...
##  $ Item           : chr  "Bread" "Scandinavian" "Scandinavian" "Hot chocolate" ...
##  $ date_time      : chr  "30-10-2016 09:58" "30-10-2016 10:05" "30-10-2016 10:05" "30-10-2016 10:07" ...
##  $ period_day     : chr  "morning" "morning" "morning" "morning" ...
##  $ weekday_weekend: chr  "weekend" "weekend" "weekend" "weekend" ...

Some columns do not have an appropriate data type. The columns we should change are:

  1. date_time to date-time
  2. Item, period_day, and weekday_weekend to factor

Change Datatype

Import Packages

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

Change Data Types

bread_clean <- 
  bread %>% 
  mutate(date_time = dmy_hm(date_time),
         Item = as.factor(Item),
         period_day = as.factor(period_day),
         weekday_weekend = as.factor(weekday_weekend),
         Year = year(date_time),
         Month = months(date_time))
glimpse(bread_clean)
## Rows: 20,507
## Columns: 7
## $ Transaction     <int> 1, 2, 2, 3, 3, 3, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8, ~
## $ Item            <fct> Bread, Scandinavian, Scandinavian, Hot chocolate, Jam,~
## $ date_time       <dttm> 2016-10-30 09:58:00, 2016-10-30 10:05:00, 2016-10-30 ~
## $ period_day      <fct> morning, morning, morning, morning, morning, morning, ~
## $ weekday_weekend <fct> weekend, weekend, weekend, weekend, weekend, weekend, ~
## $ Year            <dbl> 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, ~
## $ Month           <chr> "October", "October", "October", "October", "October",~

Each column has now been converted to the desired data type.
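To see what these conversions do on individual values, here is a minimal sketch. It parses one date string in the same day-month-year format as date_time and builds a small factor; the objects x and period are made up for illustration and are not part of the analysis.

```r
library(lubridate)

# Parse a "day-month-year hour:minute" string, the format used by date_time
x <- dmy_hm("30-10-2016 09:58")

year(x)    # 2016
month(x)   # 10
hour(x)    # 9

# as.factor() stores the distinct values as levels, sorted alphabetically
period <- as.factor(c("morning", "afternoon", "evening", "night"))
levels(period)
```

The alphabetical ordering of factor levels is why summary() lists afternoon before morning for period_day.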

Checking Missing Value

colSums(is.na(bread_clean))
##     Transaction            Item       date_time      period_day weekday_weekend 
##               0               0               0               0               0 
##            Year           Month 
##               0               0
anyNA(bread_clean)
## [1] FALSE

Great! There are no missing values.

Now the Bread Basket dataset is ready to be processed and analyzed.

Data Explanation

First, we can use the summary() function to get an overview of the data.

summary(bread_clean)
##   Transaction         Item        date_time                       period_day   
##  Min.   :   1   Coffee  :5471   Min.   :2016-10-30 09:58:00   afternoon:11569  
##  1st Qu.:2552   Bread   :3325   1st Qu.:2016-12-03 14:28:00   evening  :  520  
##  Median :5137   Tea     :1435   Median :2017-01-22 11:18:00   morning  : 8404  
##  Mean   :4976   Cake    :1025   Mean   :2017-01-18 01:31:08   night    :   14  
##  3rd Qu.:7357   Pastry  : 856   3rd Qu.:2017-02-28 16:00:00                    
##  Max.   :9684   Sandwich: 771   Max.   :2017-04-09 15:04:00                    
##                 (Other) :7624                                                  
##  weekday_weekend      Year         Month          
##  weekday:12807   Min.   :2016   Length:20507      
##  weekend: 7700   1st Qu.:2016   Class :character  
##                  Median :2017   Mode  :character  
##                  Mean   :2017                     
##                  3rd Qu.:2017                     
##                  Max.   :2017                     
## 

SUMMARY

  1. The first order was recorded on October 30, 2016, and the last on April 9, 2017.
  2. Transaction IDs run from 1 to 9684. Each row is one item within a transaction, which is why there are more rows than transactions.
  3. 12807 items were sold on weekdays and 7700 on weekends.
  4. Most items are sold in the afternoon (11569) and the fewest at night (14).
  5. Coffee is the best-selling product, with 5471 items sold.
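Since each row of the data is one item within a transaction, row counts and transaction counts are different things. A minimal sketch of the distinction follows; the toy data frame is hypothetical and only mirrors the column names of bread_clean.

```r
# Toy data mirroring the structure of bread_clean (not the real data)
toy <- data.frame(
  Transaction = c(1, 2, 2, 3, 3, 3),
  Item = c("Bread", "Scandinavian", "Scandinavian",
           "Hot chocolate", "Jam", "Cookies"),
  period_day = c("morning", "morning", "morning",
                 "afternoon", "afternoon", "afternoon")
)

# Each row is one item sold, so nrow() counts items
items_sold <- nrow(toy)

# Counting distinct IDs gives the number of transactions
n_transactions <- length(unique(toy$Transaction))

# Items sold per period of the day
items_by_period <- table(toy$period_day)

items_sold       # 6
n_transactions   # 3
```

On the real data, length(unique(bread_clean$Transaction)) would give the exact number of transactions, which can be lower than the highest ID if some IDs are unused.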

Business Needs

Now, to help The Bread Basket increase its sales, we can answer a few questions that may give them useful information.

1. Which products sold the most in 2017?

Answer:

First, we need to subset the data to the rows where Year is 2017.

bread_2017 <- bread_clean[bread_clean$Year == 2017,]

And now, we can create a new data table for looking at which product sold the most in 2017.

# Count rows per item, sort in descending order, and convert to a data frame
bread_most <- as.data.frame(sort(table(bread_2017$Item), decreasing = TRUE))
names(bread_most) <- c("Products", "Quantity")

head(bread_most,10)
##         Products Quantity
## 1         Coffee     3257
## 2          Bread     1935
## 3            Tea      858
## 4           Cake      762
## 5       Sandwich      536
## 6         Pastry      490
## 7        Cookies      374
## 8  Hot chocolate      329
## 9          Juice      259
## 10         Scone      258
library(ggplot2)

ggplot(head(bread_most,10), aes(x = Products, y = Quantity))+
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(title = "Most sold products in 2017")

Hence, Coffee and Bread are the most sold products in 2017.
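The same kind of ranking can also be sketched with dplyr, which is already loaded in this analysis. The example below is only an illustration under assumptions: the toy data frame and the name top_2017 are made up, and count() is used in place of the table() and sort() combination.

```r
library(dplyr)

# Toy data with the two columns needed for this question (not the real data)
toy <- data.frame(
  Item = c("Coffee", "Bread", "Coffee", "Tea", "Coffee", "Bread"),
  Year = c(2017, 2017, 2017, 2016, 2017, 2016)
)

# Keep the 2017 rows, then count rows per item, largest count first
top_2017 <- toy %>%
  filter(Year == 2017) %>%
  count(Item, sort = TRUE, name = "Quantity")

top_2017
```

One difference from table() on a factor is that count() lists only the items that actually occur in the filtered rows.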

2. Which products sold the most on the weekend?

Answer:

First, we need to subset the data to the rows where weekday_weekend is "weekend".

bread_weekend <- bread_clean[bread_clean$weekday_weekend == "weekend",]

And now, we can create a new data table to see which products sold the most on the weekend.

# Count weekend rows per item, sort in descending order, and convert to a data frame
weekend_most <- as.data.frame(sort(table(bread_weekend$Item), decreasing = TRUE))
names(weekend_most) <- c("Products", "Quantity")

head(weekend_most,10)
##         Products Quantity
## 1         Coffee     1928
## 2          Bread     1233
## 3            Tea      459
## 4           Cake      413
## 5         Pastry      290
## 6      Medialuna      277
## 7       Sandwich      259
## 8  Hot chocolate      250
## 9          Scone      203
## 10       Brownie      172
ggplot(head(weekend_most,10), aes(x = Products, y = Quantity))+
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(title = "Most sold products on Weekend")

Hence, Coffee and Bread are the most sold products on the weekend.

3. In what month did the most sales occur?

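Answer:

The table below is built from bread_weekend, the weekend subset created for the previous question, so its counts cover weekend sales only. Using bread_clean$Month instead would count every sale.

# Count rows per month, sort in descending order, and convert to a data frame
month_most <- as.data.frame(sort(table(bread_weekend$Month), decreasing = TRUE))
names(month_most) <- c("Months", "Quantity")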

head(month_most)
##     Months Quantity
## 1 November     1562
## 2 February     1529
## 3    March     1442
## 4  January     1343
## 5 December     1090
## 6    April      564
ggplot(month_most, aes(x = Months, y = Quantity))+
  geom_bar(stat = "identity") +
  labs(title = "Months with The Most Sales")

Hence, November is the month with the most weekend sales.
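A sketch of counting sales per month over a whole table rather than a subset is shown below. It labels each row with a year-month string so that the months sort chronologically; the toy data frame and the YearMonth column are assumptions made for illustration, not part of the original analysis.

```r
# Toy data with a date-time column like bread_clean$date_time (not the real data)
toy <- data.frame(
  date_time = as.POSIXct(c("2016-10-30 09:58", "2016-11-05 10:00",
                           "2016-11-12 14:30", "2017-01-07 11:15"),
                         format = "%Y-%m-%d %H:%M", tz = "UTC")
)

# Label each row by year and month, e.g. "2016-11"; these labels sort chronologically
toy$YearMonth <- format(toy$date_time, "%Y-%m")

# Count rows per year-month and convert to a data frame
month_all <- as.data.frame(table(toy$YearMonth), stringsAsFactors = FALSE)
names(month_all) <- c("YearMonth", "Quantity")

month_all
```

Keeping the year in the label also avoids mixing the same month from two different years when the data spans more than twelve months.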

4. Most sales of Bread occur in what period of the day?

Answer:

First, we need to subset the data to the rows where Item is Bread.

item_bread <- bread_clean[bread_clean$Item == "Bread",]

And now, we can create a new data table to see in which period of the day the most Bread was sold.

# Count Bread rows per period of the day, sort in descending order, and convert to a data frame
bread_most <- as.data.frame(sort(table(item_bread$period_day), decreasing = TRUE))
names(bread_most) <- c("Period Day", "Quantity")

head(bread_most,10)
##   Period Day Quantity
## 1  afternoon     1661
## 2    morning     1610
## 3    evening       54
## 4      night        0
ggplot(bread_most, aes(x = `Period Day`, y = Quantity))+
  geom_bar(stat = "identity") +
  labs(title = "Period of Days with the Most Sales of Bread products")

Hence, the afternoon is the period of the day with the most Bread sales, closely followed by the morning.

5. Most sales of Tea occur in what period of the day?

Answer:

First, we need to subset the data to the rows where Item is Tea.

item_tea <- bread_clean[bread_clean$Item == "Tea",]

And now, we can create a new data table to see in which period of the day the most Tea was sold.

# Count Tea rows per period of the day, sort in descending order, and convert to a data frame
tea_most <- as.data.frame(sort(table(item_tea$period_day), decreasing = TRUE))
names(tea_most) <- c("Period Day", "Quantity")

head(tea_most,10)
##   Period Day Quantity
## 1  afternoon      930
## 2    morning      456
## 3    evening       49
## 4      night        0
ggplot(tea_most, aes(x = `Period Day`, y = Quantity))+
  geom_bar(stat = "identity") +
  labs(title = "Period of Days with the Most Sales of Tea products")

Hence, the afternoon is the period of the day with the most Tea sales.
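The Bread and Tea questions follow the same steps, so they could be wrapped in one function. The sketch below assumes a hypothetical helper named period_counts and a made-up toy data frame; it is not part of the original analysis.

```r
# Toy data with the two columns used above (not the real data)
toy <- data.frame(
  Item = c("Bread", "Tea", "Bread", "Tea", "Tea", "Bread"),
  period_day = factor(c("morning", "afternoon", "afternoon",
                        "afternoon", "morning", "afternoon"),
                      levels = c("morning", "afternoon", "evening", "night"))
)

# Hypothetical helper: count the sales of one item per period of the day
period_counts <- function(data, item) {
  item_rows <- data[data$Item == item, ]
  counts <- as.data.frame(sort(table(item_rows$period_day), decreasing = TRUE))
  names(counts) <- c("Period Day", "Quantity")
  counts
}

period_counts(toy, "Tea")
```

Because period_day is a factor, periods with no sales of the item still appear with a quantity of 0, as in the tables above.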

Conclusion

In the almost six months recorded, from October 2016 to April 2017, The Bread Basket had about 9684 sales transactions (the highest transaction ID in the data). The Bread Basket has several products on its menu, such as Coffee, Bread, Tea, Cake, Pastry, and others. Its best sellers are Coffee, Bread, and Tea, with 5471, 3325, and 1435 items sold, respectively. Most sales occur in the afternoon and the morning, and most of them occur on weekdays.

In 2017, Coffee, Bread, and Tea were the products with the most sales. On the weekend, Coffee, Bread, and Tea were also the products with the most sales.

Reference

  1. The Bread Basket