Maven is a website with a diversity of datasets for training or data analysis.
Analyze the dataset “Coffee Store” and take insights about the store’s financial health and advise the manager on how to improve profit.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(here)
## here() starts at /mnt/ssd/projetos/dataviz/Coffee+Shop+Sales/R
library(readxl)
library(glue)
library(ggtext)
library(lubridate)
library(skimr)
df=read_xlsx(here('../dados/Coffee Shop Sales.xlsx'))
df |> glimpse()
## Rows: 149,116
## Columns: 11
## $ transaction_id <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16…
## $ transaction_date <dttm> 2023-01-01, 2023-01-01, 2023-01-01, 2023-01-01, 2023…
## $ transaction_time <dttm> 1899-12-31 07:06:11, 1899-12-31 07:08:56, 1899-12-31…
## $ transaction_qty <dbl> 2, 2, 2, 1, 2, 1, 1, 2, 1, 2, 1, 2, 1, 1, 2, 2, 1, 1,…
## $ store_id <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 8,…
## $ store_location <chr> "Lower Manhattan", "Lower Manhattan", "Lower Manhatta…
## $ product_id <dbl> 32, 57, 59, 22, 57, 77, 22, 28, 39, 58, 56, 33, 51, 5…
## $ unit_price <dbl> 3.00, 3.10, 4.50, 2.00, 3.10, 3.00, 2.00, 2.00, 4.25,…
## $ product_category <chr> "Coffee", "Tea", "Drinking Chocolate", "Coffee", "Tea…
## $ product_type <chr> "Gourmet brewed coffee", "Brewed Chai tea", "Hot choc…
## $ product_detail <chr> "Ethiopia Rg", "Spicy Eye Opener Chai Lg", "Dark choc…
df |> skim()
| Name | df |
| Number of rows | 149116 |
| Number of columns | 11 |
| _______________________ | |
| Column type frequency: | |
| character | 4 |
| numeric | 5 |
| POSIXct | 2 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| store_location | 0 | 1 | 7 | 15 | 0 | 3 | 0 |
| product_category | 0 | 1 | 3 | 18 | 0 | 9 | 0 |
| product_type | 0 | 1 | 5 | 21 | 0 | 29 | 0 |
| product_detail | 0 | 1 | 5 | 28 | 0 | 80 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| transaction_id | 0 | 1 | 74737.37 | 43153.60 | 1.0 | 37335.75 | 74727.5 | 112094.25 | 149456 | ▇▇▇▇▇ |
| transaction_qty | 0 | 1 | 1.44 | 0.54 | 1.0 | 1.00 | 1.0 | 2.00 | 8 | ▇▁▁▁▁ |
| store_id | 0 | 1 | 5.34 | 2.07 | 3.0 | 3.00 | 5.0 | 8.00 | 8 | ▇▇▁▁▇ |
| product_id | 0 | 1 | 47.92 | 17.93 | 1.0 | 33.00 | 47.0 | 60.00 | 87 | ▁▆▇▆▅ |
| unit_price | 0 | 1 | 3.38 | 2.66 | 0.8 | 2.50 | 3.0 | 3.75 | 45 | ▇▁▁▁▁ |
Variable type: POSIXct
| skim_variable | n_missing | complete_rate | min | max | median | n_unique |
|---|---|---|---|---|---|---|
| transaction_date | 0 | 1 | 2023-01-01 00:00:00 | 2023-06-30 00:00:00 | 2023-04-24 00:00:00 | 181 |
| transaction_time | 0 | 1 | 1899-12-31 06:00:00 | 1899-12-31 20:59:32 | 1899-12-31 11:15:28 | 25762 |
The ‘money’ column is calculated by multiplying
transaction_qty with unit_price. This will
help to look the revenue of each store.
df$money=df$transaction_qty*df$unit_price
| Category | Description |
|---|---|
| Coffee | A hot drink made from coffee beans. |
| Tea | A drink made from tea leaves. |
| Drinking Chocolate | A sweet chocolate drink, usually hot. |
| Bakery | Baked foods like bread and cakes. |
| Flavours | Things added to change or improve taste. |
| Loose Tea | Tea leaves sold without bags or packaging. |
| Coffee beans | Whole beans used to make coffee. |
| Packaged Chocolate | Chocolate sold in bars or boxes. |
| Branded | Products with a known company name. |
for( product in unique(df$product_category)){
total_money=sum(df$money)
valor_total_do_produto=df|> filter(product_category==product) |> summarise(sum(money))
valor_percentual=round((valor_total_do_produto/total_money)*100,2)
cores = setNames(c("#000000", "#E69F00"), c("Others", product))
g=df |>
mutate(geral= ifelse(product_category == product,product,'Others'),
geral = factor(geral, levels = c("Others",product)) ) |>
group_by(geral,transaction_date) |>
summarise(total=sum(money)) |>
ggplot() +
geom_area(aes(x=transaction_date,y=total,fill=geral))+
scale_fill_manual(values = cores) +
labs(subtitle = glue('The total revenue between January and July was
<b>{total_money}<b>.'),
title = glue('The product category <i><b>{product}</b></i> contributes with
<b>{valor_percentual}%</b> of the revenue')) +
theme(
plot.title = element_markdown(size = 14),
plot.subtitle = element_markdown(size = 10),
panel.background = element_rect(fill = 'white'),
panel.grid = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
legend.title = element_blank(),
legend.position = 'top',
legend.margin = margin(t=1)
) +
scale_y_sqrt() +
ylab('') +
xlab('')
print(g)
}
## `summarise()` has grouped output by 'geral'. You can override using the
## `.groups` argument.
## `summarise()` has grouped output by 'geral'. You can override using the
## `.groups` argument.
## `summarise()` has grouped output by 'geral'. You can override using the
## `.groups` argument.
## `summarise()` has grouped output by 'geral'. You can override using the
## `.groups` argument.
## `summarise()` has grouped output by 'geral'. You can override using the
## `.groups` argument.
## `summarise()` has grouped output by 'geral'. You can override using the
## `.groups` argument.
## `summarise()` has grouped output by 'geral'. You can override using the
## `.groups` argument.
## `summarise()` has grouped output by 'geral'. You can override using the
## `.groups` argument.
## `summarise()` has grouped output by 'geral'. You can override using the
## `.groups` argument.
## Conclusion
Coffee and tea sales together account for more than 50% of total revenue.
cores=c("#000000", "#E69F00",'#808080')
df$year=lubridate::year(df$transaction_date) |> as.factor()
df |>
group_by(store_location,year) |>
summarise(total=sum(money)) |>
ggplot() +
geom_col(aes(x=year,y=total,fill=store_location),position = position_dodge(width = 1),width = 0.7 )+
scale_fill_manual(values = cores) +
labs(title = 'Each store location contributes similarly to annual revenue.')+
theme(
plot.title = element_markdown(size = 14),
plot.subtitle = element_markdown(size = 10),
panel.background = element_rect(fill = 'white'),
panel.grid = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
legend.title = element_blank(),
legend.position = 'top',
legend.margin = margin(t=1)
) +
scale_y_sqrt() +
ylab('') +
xlab('')
## `summarise()` has grouped output by 'store_location'. You can override using
## the `.groups` argument.
# Which product category contributes the most in each store?
for( place in unique(df$store_location)){
category_colors <- c(
"Coffee" = "#6F4E37",
"Tea" = "#386641",
"Drinking Chocolate" = "#A0522D",
"Bakery" = "#C06014",
"Flavours" = "#7B1FA2",
"Loose Tea" = "#1E88E5",
"Coffee beans" = "#3E2723",
"Packaged Chocolate" = "#8E44AD",
"Branded" = "#B71C1C"
)
df_g=df |>
filter( store_location == place)
total_money=sum(df_g$money)
g= df_g|>
group_by(transaction_date,product_category ) |>
summarise(total=sum(money)) |>
ggplot() +
geom_area(aes(x=transaction_date,y=total,fill=reorder(product_category,total)))+
scale_fill_manual(values = category_colors) +
labs(subtitle = glue('The total revenue between January and July was
<b>{total_money}<b>.'),
title = glue('How much does each category contribute to the {place} store')) +
theme(
plot.title = element_markdown(size = 14),
plot.subtitle = element_markdown(size = 10),
panel.background = element_rect(fill = 'white'),
panel.grid = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
legend.title = element_blank(),
legend.position = 'top',
legend.margin = margin(t=1)
) +
scale_y_sqrt() +
ylab('') +
xlab('')
print(g)
}
## `summarise()` has grouped output by 'transaction_date'. You can override using
## the `.groups` argument.
## `summarise()` has grouped output by 'transaction_date'. You can override using
## the `.groups` argument.
## `summarise()` has grouped output by 'transaction_date'. You can override using
## the `.groups` argument.
Coffee and tea sales were the categories of products with the most
sales.
for( place in unique(df$store_location)){
df_g=df |>
filter( store_location == place)
g= df_g|>
group_by(product_type,product_category) |>
summarise(mean_unit_price=mean(unit_price),mean_quantity=mean(transaction_qty),total=sum(money))|>
arrange(-total) |>
head(5)
print(g)
}
## `summarise()` has grouped output by 'product_type'. You can override using the
## `.groups` argument.
## # A tibble: 5 × 5
## # Groups: product_type [5]
## product_type product_category mean_unit_price mean_quantity total
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 Barista Espresso Coffee 3.64 1.60 31051
## 2 Brewed Chai tea Tea 2.93 1.62 24009.
## 3 Gourmet brewed coffee Coffee 2.71 1.64 23201.
## 4 Hot chocolate Drinking Chocolate 4.14 1.60 22494.
## 5 Brewed Black tea Tea 2.75 1.64 16216.
## `summarise()` has grouped output by 'product_type'. You can override using the
## `.groups` argument.
## # A tibble: 5 × 5
## # Groups: product_type [5]
## product_type product_category mean_unit_price mean_quantity total
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 Barista Espresso Coffee 3.55 1.47 32420.
## 2 Brewed Chai tea Tea 2.93 1.50 25645.
## 3 Hot chocolate Drinking Chocolate 4.16 1.51 23586.
## 4 Gourmet brewed coffee Coffee 2.72 1.50 23010.
## 5 Brewed Black tea Tea 2.75 1.51 15576.
## `summarise()` has grouped output by 'product_type'. You can override using the
## `.groups` argument.
## # A tibble: 5 × 5
## # Groups: product_type [5]
## product_type product_category mean_unit_price mean_quantity total
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 Barista Espresso Coffee 3.80 1.49 27935
## 2 Brewed Chai tea Tea 2.95 1.48 27428.
## 3 Hot chocolate Drinking Chocolate 4.14 1.48 26335.
## 4 Gourmet brewed coffee Coffee 2.67 1.48 23823.
## 5 Brewed herbal tea Tea 2.74 1.49 16282
The top products in the three stores are: - Barista Espresso - Brewed Chai tea - Hot chocolate - Gourmet brewed coffee
“Brewed Black and herbal tea profits are too low compared to the others!”
beverage_colors <- c(
"Barista Espresso" = "#3B1F1B",
"Brewed Chai tea" = "#B86023",
"Hot chocolate" = "#7B4D45",
"Gourmet brewed coffee" = "#A67C52",
"Others" = "#6E5A75"
)
for (store in df$store_location |> unique()){
df_g = df |> filter(store_location == store)
g = df_g |>
mutate(product= ifelse(product_type %in% c('Barista Espresso','Brewed Chai tea','Hot chocolate','Gourmet brewed coffee'),product_type, 'Others'),
product= factor(product, levels = c("Others",'Barista Espresso','Brewed Chai tea','Hot chocolate','Gourmet brewed coffee'))) |>
mutate(month=month(transaction_date,label = TRUE)) |>
group_by(month,product) |>
summarise(total_quantity=sum(transaction_qty)) |>
ggplot() +
xlab('')+
ylab('')+
geom_col(aes(x=month,y=total_quantity, fill=product)) +
scale_fill_manual(values=beverage_colors) +
labs(title=glue('The contribution proportion of each product for sales by month in {store}'))+
theme(
plot.title = element_text(size = 14,margin = margin(0,0,3,0)),
panel.background = element_rect(fill = 'white'),
panel.grid = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
legend.title = element_blank(),
legend.position = 'top',
legend.margin = margin(t=1),
legend.text = element_text(size=8)
)
print(g)
}
## `summarise()` has grouped output by 'month'. You can override using the
## `.groups` argument.
## `summarise()` has grouped output by 'month'. You can override using the
## `.groups` argument.
## `summarise()` has grouped output by 'month'. You can override using the
## `.groups` argument.
Sales have increased significantly along the year. Barista Espresso, Brewed Chai tea, and Gourmet brewed coffee have similar sales quantities. Hot chocolate sales are significantly lower.
There is a growing trend to sell more. The stores should increase the
proportion of products:
Barista Espresso,Brewed Chai tea,Hot chocolate,Gourmet brewed coffee.