Like many outstanding American businesses, The Bread Basket literally has its roots in hearth, home, and friends. Here is how it all began, in the words of its founder, Marie Hyde.
In 1989, we expanded our business by moving from our kitchen to a small shop nearby. In 1993, we built and relocated to our main facility, a beautiful 6,000-square-foot building on Signal Mountain in Tennessee. It was then that my husband, Charlie, left his job as a corporate executive to help manage the business. Under his guidance, The Bread Basket has become a thriving company with two locations in the Chattanooga area. We deliver our gift baskets locally and ship nationwide.
The dataset belongs to “The Bread Basket”, a bakery located in Edinburgh. It has 20507 rows, over 9000 transactions, and 5 columns.
The dataset contains transactions of customers who ordered different items from this bakery online, covering the period from 30-10-2016 to 09-04-2017.
The dataset is published on Kaggle.
Make sure the data file thebreadbasket.csv is placed in the data_input folder of the R project, because read.csv() reads it from that relative path.
bread <- read.csv("data_input/thebreadbasket.csv")
head(bread)
## Transaction Item date_time period_day weekday_weekend
## 1 1 Bread 30-10-2016 09:58 morning weekend
## 2 2 Scandinavian 30-10-2016 10:05 morning weekend
## 3 2 Scandinavian 30-10-2016 10:05 morning weekend
## 4 3 Hot chocolate 30-10-2016 10:07 morning weekend
## 5 3 Jam 30-10-2016 10:07 morning weekend
## 6 3 Cookies 30-10-2016 10:07 morning weekend
Check Data Types
Make sure the data is clean and ready to use.
str(bread)
## 'data.frame': 20507 obs. of 5 variables:
## $ Transaction : int 1 2 2 3 3 3 4 5 5 5 ...
## $ Item : chr "Bread" "Scandinavian" "Scandinavian" "Hot chocolate" ...
## $ date_time : chr "30-10-2016 09:58" "30-10-2016 10:05" "30-10-2016 10:05" "30-10-2016 10:07" ...
## $ period_day : chr "morning" "morning" "morning" "morning" ...
## $ weekday_weekend: chr "weekend" "weekend" "weekend" "weekend" ...
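Some columns do not have appropriate data types. The ones we should change are date_time (from character to date-time) and Item, period_day, and weekday_weekend (from character to factor).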
Import Packages
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
Change Data Types
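bread_clean <- bread %>%
  mutate(
    date_time = dmy_hm(date_time),                # "30-10-2016 09:58" -> POSIXct (UTC by default)
    Item = as.factor(Item),                       # categorical text columns -> factors
    period_day = as.factor(period_day),
    weekday_weekend = as.factor(weekday_weekend),
    Year = year(date_time),                       # numeric year, e.g. 2016
    Month = months(date_time)                     # month name as text (locale-dependent)
  )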
glimpse(bread_clean)
## Rows: 20,507
## Columns: 7
## $ Transaction <int> 1, 2, 2, 3, 3, 3, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8, ~
## $ Item <fct> Bread, Scandinavian, Scandinavian, Hot chocolate, Jam,~
## $ date_time <dttm> 2016-10-30 09:58:00, 2016-10-30 10:05:00, 2016-10-30 ~
## $ period_day <fct> morning, morning, morning, morning, morning, morning, ~
## $ weekday_weekend <fct> weekend, weekend, weekend, weekend, weekend, weekend, ~
## $ Year <dbl> 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, ~
## $ Month <chr> "October", "October", "October", "October", "October",~
Each column has now been converted to the desired data type.
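The same conversion can also be done without dplyr or lubridate. The sketch below is a minimal base R version, assuming only a few toy rows copied from head(bread) rather than the full CSV. It assumes tz = "UTC" to match the default of dmy_hm(), and it uses base R's month.name (always English) instead of months(), whose output depends on the session locale.

```r
# Toy rows in the same layout as thebreadbasket.csv (values copied from head(bread)).
toy <- data.frame(
  Transaction = c(1L, 2L, 3L),
  Item = c("Bread", "Scandinavian", "Hot chocolate"),
  date_time = c("30-10-2016 09:58", "30-10-2016 10:05", "30-10-2016 10:07"),
  period_day = c("morning", "morning", "morning"),
  weekday_weekend = c("weekend", "weekend", "weekend"),
  stringsAsFactors = FALSE
)

# Parse "day-month-year hour:minute" strings; tz = "UTC" is an assumption
# chosen to match lubridate::dmy_hm(), which also defaults to UTC.
toy$date_time <- as.POSIXct(toy$date_time, format = "%d-%m-%Y %H:%M", tz = "UTC")

# Convert the categorical text columns to factors in one pass.
cat_cols <- c("Item", "period_day", "weekday_weekend")
toy[cat_cols] <- lapply(toy[cat_cols], factor)

# Derived columns: numeric year and English month name.
toy$Year <- as.numeric(format(toy$date_time, "%Y"))
toy$Month <- month.name[as.integer(format(toy$date_time, "%m"))]

str(toy)
```

Applied to the real data, replacing toy with a copy of bread should give the same column types as bread_clean.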
Checking Missing Values
colSums(is.na(bread_clean))
## Transaction Item date_time period_day weekday_weekend
## 0 0 0 0 0
## Year Month
## 0 0
anyNA(bread_clean)
## [1] FALSE
Great! There are no missing values.
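Note that is.na() only catches true NA values. By default read.csv() keeps blank fields in text columns as empty strings rather than NA, so a blank Item would not be counted above. A minimal sketch of a stricter check, assuming a toy data frame instead of the real bread_clean and a hypothetical helper named blank_or_na():

```r
# Toy data frame with one blank text field and one true NA (not the real CSV).
df <- data.frame(
  Item = c("Bread", "", "Tea"),
  Transaction = c(1L, 2L, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for text or factor columns, flag NA or empty/whitespace-only
# values; for any other column type, flag NA only.
blank_or_na <- function(col) {
  if (is.character(col) || is.factor(col)) {
    is.na(col) | trimws(as.character(col)) == ""
  } else {
    is.na(col)
  }
}

# sapply() returns a logical matrix (one column per data frame column).
colSums(sapply(df, blank_or_na))
```

On the real data this would be called as colSums(sapply(bread_clean, blank_or_na)).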
Now The Bread Basket dataset is ready to be processed and analyzed.
First, we can use the summary() function to get an overview of the data.
summary(bread_clean)
## Transaction Item date_time period_day
## Min. : 1 Coffee :5471 Min. :2016-10-30 09:58:00 afternoon:11569
## 1st Qu.:2552 Bread :3325 1st Qu.:2016-12-03 14:28:00 evening : 520
## Median :5137 Tea :1435 Median :2017-01-22 11:18:00 morning : 8404
## Mean :4976 Cake :1025 Mean :2017-01-18 01:31:08 night : 14
## 3rd Qu.:7357 Pastry : 856 3rd Qu.:2017-02-28 16:00:00
## Max. :9684 Sandwich: 771 Max. :2017-04-09 15:04:00
## (Other) :7624
## weekday_weekend Year Month
## weekday:12807 Min. :2016 Length:20507
## weekend: 7700 1st Qu.:2016 Class :character
## Median :2017 Mode :character
## Mean :2017
## 3rd Qu.:2017
## Max. :2017
##
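The summary shows the first and last timestamps, so the length of the recorded period can be computed directly. The sketch below hard-codes the two timestamps printed by summary(bread_clean) and assumes UTC; on the real data, diff(range(bread_clean$date_time)) gives the same span. Dividing by 30.44 (the average month length in days) is an approximation.

```r
# First and last timestamps as printed by summary(bread_clean).
first_sale <- as.POSIXct("2016-10-30 09:58", tz = "UTC")
last_sale  <- as.POSIXct("2017-04-09 15:04", tz = "UTC")

# Length of the recorded period in days and in average-length months.
span_days <- as.numeric(difftime(last_sale, first_sale, units = "days"))
span_days          # about 161.2 days
span_days / 30.44  # about 5.3 months
```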
SUMMARY
Now, to help The Bread Basket increase its sales, we can answer several questions whose answers may be useful to the business.
Answer:
First, we subset the data to the year 2017.
bread_2017 <- bread_clean[bread_clean$Year == 2017,]
Next, we build a frequency table that counts how many rows each item appears in, to see which products sold the most in 2017.
bread_most <- as.data.frame(sort(table(bread_2017$Item),decreasing = T))
names(bread_most) <- c("Products", "Quantity")
head(bread_most,10)
## Products Quantity
## 1 Coffee 3257
## 2 Bread 1935
## 3 Tea 858
## 4 Cake 762
## 5 Sandwich 536
## 6 Pastry 490
## 7 Cookies 374
## 8 Hot chocolate 329
## 9 Juice 259
## 10 Scone 258
library(ggplot2)
ggplot(head(bread_most,10), aes(x = Products, y = Quantity))+
geom_bar(stat = "identity") +
coord_flip() +
labs(title = "Most sold products in 2017")
Hence, Coffee and Bread are the most sold products in 2017.
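The table, sort, and rename steps above are repeated for every question in this analysis. A minimal sketch of a reusable helper, assuming a hypothetical function name count_top() that is not part of the original analysis, and a toy item vector instead of the real data:

```r
# Hypothetical helper (not part of the original analysis): tabulate a column,
# sort the counts in decreasing order, and return the top n rows as a
# two-column data frame named c(label, "Quantity").
count_top <- function(x, label, n = 10) {
  counts <- sort(table(x), decreasing = TRUE)
  out <- as.data.frame(counts)  # columns: the values (a factor) and Freq
  names(out) <- c(label, "Quantity")
  head(out, n)
}

# Toy item column standing in for bread_2017$Item.
items <- c("Coffee", "Bread", "Coffee", "Tea", "Coffee", "Bread")
count_top(items, "Products", n = 2)
```

On the real data, count_top(bread_2017$Item, "Products") or count_top(bread_weekend$Item, "Products") would produce the top-10 tables shown in this section.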
Answer:
First, we subset the data to weekend transactions (weekday_weekend == "weekend").
bread_weekend <- bread_clean[bread_clean$weekday_weekend == "weekend",]
Next, we build a frequency table to see which products sold the most on weekends.
weekend_most <- as.data.frame(sort(table(bread_weekend$Item),decreasing = T))
names(weekend_most) <- c("Products", "Quantity")
head(weekend_most,10)
## Products Quantity
## 1 Coffee 1928
## 2 Bread 1233
## 3 Tea 459
## 4 Cake 413
## 5 Pastry 290
## 6 Medialuna 277
## 7 Sandwich 259
## 8 Hot chocolate 250
## 9 Scone 203
## 10 Brownie 172
ggplot(head(weekend_most,10), aes(x = Products, y = Quantity))+
geom_bar(stat = "identity") +
coord_flip() +
labs(title = "Most sold products on Weekend")
Hence, Coffee and Bread are the most sold products on Weekend.
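Instead of subsetting weekends and weekdays separately, a two-way table counts both at once, which makes the comparison direct. A minimal sketch, assuming a few toy rows in place of bread_clean; on the real data the call would be table(bread_clean$Item, bread_clean$weekday_weekend).

```r
# Toy rows standing in for bread_clean[, c("Item", "weekday_weekend")].
toy <- data.frame(
  Item = c("Coffee", "Bread", "Coffee", "Tea", "Coffee", "Bread"),
  weekday_weekend = c("weekday", "weekday", "weekend", "weekday", "weekend", "weekend"),
  stringsAsFactors = FALSE
)

# Two-way table: one row per item, one column per day type.
counts <- table(toy$Item, toy$weekday_weekend)

# Order rows by overall popularity (row totals), largest first.
counts <- counts[order(rowSums(counts), decreasing = TRUE), ]
counts
```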
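Answer:
Using the weekend subset (bread_weekend) from the previous question, we count how many rows fall in each month, so the counts below cover weekend sales only.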
month_most <- as.data.frame(sort(table(bread_weekend$Month),decreasing = T))
names(month_most) <- c("Months", "Quantity")
head(month_most)
## Months Quantity
## 1 November 1562
## 2 February 1529
## 3 March 1442
## 4 January 1343
## 5 December 1090
## 6 April 564
ggplot(month_most, aes(x = Months, y = Quantity))+
geom_bar(stat = "identity") +
labs(title = "Months with The Most Sales")
Hence, November is the month with the most weekend sales.
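The bar chart above orders months by count, which hides the time sequence. A minimal sketch of putting the months in chronological order for the span of this dataset (October 2016 to April 2017), assuming the Month column holds English month names (months() output depends on the locale) and using toy labels instead of bread_weekend$Month:

```r
# Toy month labels standing in for bread_weekend$Month.
m <- c("November", "January", "November", "April", "December")

# month.name is base R's vector of English month names; indices 10:12 and 1:4
# give October ... April, the months covered by the data.
study_months <- month.name[c(10:12, 1:4)]

# Months with no rows still appear, with a count of 0.
month_counts <- table(factor(m, levels = study_months))
month_counts
```

For the plot, month_most$Months <- factor(month_most$Months, levels = study_months) would make ggplot draw the bars in calendar order. Keep in mind that October and April are only partly covered by the data, so their totals are not directly comparable with full months.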
In which period of the day is Bread sold the most?
Answer:
First, we subset the data to the Bread item.
item_bread <- bread_clean[bread_clean$Item == "Bread",]
Next, we build a frequency table to see in which period of the day Bread sold the most.
bread_most <- as.data.frame(sort(table(item_bread$period_day),decreasing = T))
names(bread_most) <- c("Period Day", "Quantity")
head(bread_most,10)
## Period Day Quantity
## 1 afternoon 1661
## 2 morning 1610
## 3 evening 54
## 4 night 0
ggplot(bread_most, aes(x = `Period Day`, y = Quantity))+
geom_bar(stat = "identity") +
labs(title = "Period of Days with the Most Sales of Bread products")
Hence, the afternoon is the period of the day with the most Bread sales (1661), narrowly ahead of the morning (1610).
In which period of the day is Tea sold the most?
Answer:
First, we subset the data to the Tea item.
item_tea <- bread_clean[bread_clean$Item == "Tea",]
Next, we build a frequency table to see in which period of the day Tea sold the most.
tea_most <- as.data.frame(sort(table(item_tea$period_day),decreasing = T))
names(tea_most) <- c("Period Day", "Quantity")
head(tea_most,10)
## Period Day Quantity
## 1 afternoon 930
## 2 morning 456
## 3 evening 49
## 4 night 0
ggplot(tea_most, aes(x = `Period Day`, y = Quantity))+
geom_bar(stat = "identity") +
labs(title = "Period of Days with the Most Sales of Tea products")
Hence, the afternoon is the period of the day with the most Tea sales (930), roughly twice the morning (456).
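Raw counts make it hard to compare items that sell in very different volumes. A minimal sketch of computing each item's share of sales per period of the day with prop.table(), assuming toy rows instead of bread_clean; on the real data the call would be prop.table(table(bread_clean$Item, bread_clean$period_day), margin = 1). Applied to the counts above, this view would show Bread split almost evenly between afternoon and morning, while roughly 65% of Tea rows fall in the afternoon.

```r
# Toy rows standing in for bread_clean[, c("Item", "period_day")].
toy <- data.frame(
  Item = c("Bread", "Bread", "Bread", "Tea", "Tea", "Tea", "Tea"),
  period_day = c("morning", "afternoon", "afternoon",
                 "afternoon", "afternoon", "afternoon", "morning"),
  stringsAsFactors = FALSE
)

# Rows are items, columns are periods; margin = 1 divides each row by its total,
# so each row of shares sums to 1.
shares <- prop.table(table(toy$Item, toy$period_day), margin = 1)
round(shares, 2)
```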
Over a little more than five months of records, from October 2016 to April 2017, The Bread Basket recorded over 9000 sales transactions, with Transaction IDs running up to 9684. The menu includes products such as Coffee, Bread, Tea, Cake, Pastry, and others. The best-selling items are Coffee, Bread, and Tea, which appear in 5471, 3325, and 1435 item rows, respectively. Most sales take place in the afternoon (11569 rows), followed by the morning (8404 rows), and most sales happen on weekdays (12807 rows versus 7700 on weekends).
In 2017, Coffee, Bread, and Tea were the best-selling products, and the same three products also led sales on weekends.
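The figure of 9684 is the highest Transaction ID reported by summary(), which equals the number of transactions only if no IDs are skipped. A minimal sketch of the distinction, assuming toy IDs instead of the real column; on the real data, length(unique(bread_clean$Transaction)) gives the number of distinct transactions.

```r
# Toy Transaction IDs: IDs repeat once per item row, and ID 4 never occurs.
transaction_id <- c(1L, 2L, 2L, 3L, 3L, 3L, 5L)

max(transaction_id)             # highest ID: 5
length(unique(transaction_id))  # distinct transactions: 4
```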