Introduction

Maven is a website with a diversity of datasets for training or data analysis.

Objective

Analyze the dataset “Coffee Store” and take insights about the store’s financial health and advise the manager on how to improve profit.

Libraries

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(here)

## here() starts at /mnt/ssd/projetos/dataviz/Coffee+Shop+Sales/R

library(readxl)
library(glue)
library(ggtext)
library(lubridate)
library(skimr)

Import dataframe

df=read_xlsx(here('../dados/Coffee Shop Sales.xlsx'))

Glimpse dataframe

df |> glimpse()

## Rows: 149,116
## Columns: 11
## $ transaction_id   <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16…
## $ transaction_date <dttm> 2023-01-01, 2023-01-01, 2023-01-01, 2023-01-01, 2023…
## $ transaction_time <dttm> 1899-12-31 07:06:11, 1899-12-31 07:08:56, 1899-12-31…
## $ transaction_qty  <dbl> 2, 2, 2, 1, 2, 1, 1, 2, 1, 2, 1, 2, 1, 1, 2, 2, 1, 1,…
## $ store_id         <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 8,…
## $ store_location   <chr> "Lower Manhattan", "Lower Manhattan", "Lower Manhatta…
## $ product_id       <dbl> 32, 57, 59, 22, 57, 77, 22, 28, 39, 58, 56, 33, 51, 5…
## $ unit_price       <dbl> 3.00, 3.10, 4.50, 2.00, 3.10, 3.00, 2.00, 2.00, 4.25,…
## $ product_category <chr> "Coffee", "Tea", "Drinking Chocolate", "Coffee", "Tea…
## $ product_type     <chr> "Gourmet brewed coffee", "Brewed Chai tea", "Hot choc…
## $ product_detail   <chr> "Ethiopia Rg", "Spicy Eye Opener Chai Lg", "Dark choc…

Resume of dataframe

df |> skim()

Data summary
Name	df
Number of rows	149116
Number of columns	11
_______________________
Column type frequency:
character	4
numeric	5
POSIXct	2
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
store_location	1	7	15	3
product_category	1	3	18	9
product_type	1	5	21	29
product_detail	1	5	28	80

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
transaction_id	1	74737.37	43153.60	1.0	37335.75	74727.5	112094.25	149456	▇▇▇▇▇
transaction_qty	1	1.44	0.54	1.0	1.00	1.0	2.00	8	▇▁▁▁▁
store_id	1	5.34	2.07	3.0	3.00	5.0	8.00	8	▇▇▁▁▇
product_id	1	47.92	17.93	1.0	33.00	47.0	60.00	87	▁▆▇▆▅
unit_price	1	3.38	2.66	0.8	2.50	3.0	3.75	45	▇▁▁▁▁

Variable type: POSIXct

skim_variable	n_missing	complete_rate	min	max	median	n_unique
transaction_date	0	1	2023-01-01 00:00:00	2023-06-30 00:00:00	2023-04-24 00:00:00	181
transaction_time	0	1	1899-12-31 06:00:00	1899-12-31 20:59:32	1899-12-31 11:15:28	25762

Create the new column ‘money’

The ‘money’ column is calculated by multiplying transaction_qty with unit_price. This will help to look the revenue of each store.

df$money=df$transaction_qty*df$unit_price

What are the coffee categories?

Product Categories (Created with the help of ChatGPT)

Category	Description
Coffee	A hot drink made from coffee beans.
Tea	A drink made from tea leaves.
Drinking Chocolate	A sweet chocolate drink, usually hot.
Bakery	Baked foods like bread and cakes.
Flavours	Things added to change or improve taste.
Loose Tea	Tea leaves sold without bags or packaging.
Coffee beans	Whole beans used to make coffee.
Packaged Chocolate	Chocolate sold in bars or boxes.
Branded	Products with a known company name.

How much does each product category contribute to the semester revenue?

for( product in unique(df$product_category)){
  total_money=sum(df$money)
  valor_total_do_produto=df|> filter(product_category==product) |> summarise(sum(money))
  valor_percentual=round((valor_total_do_produto/total_money)*100,2)
  cores = setNames(c("#000000", "#E69F00"), c("Others", product))
  g=df |> 
    mutate(geral= ifelse(product_category == product,product,'Others'),
           geral = factor(geral, levels = c("Others",product))  ) |>
    group_by(geral,transaction_date) |>
    summarise(total=sum(money)) |> 
    ggplot() +
    geom_area(aes(x=transaction_date,y=total,fill=geral))+
    scale_fill_manual(values = cores) +
    labs(subtitle = glue('The total revenue between January and July was 
<b>{total_money}<b>.'),
         title = glue('The product category <i><b>{product}</b></i> contributes with 
<b>{valor_percentual}%</b> of the revenue')) +
    theme(
      plot.title = element_markdown(size = 14),
      plot.subtitle = element_markdown(size = 10),
      panel.background = element_rect(fill = 'white'),
      panel.grid = element_blank(),
      axis.text.y = element_blank(),
      axis.ticks.y = element_blank(),
      legend.title = element_blank(),
      legend.position = 'top',
      legend.margin = margin(t=1)
      
    ) +
    scale_y_sqrt() +
    ylab('') +
    xlab('')
  print(g)
}

## `summarise()` has grouped output by 'geral'. You can override using the
## `.groups` argument.

## `summarise()` has grouped output by 'geral'. You can override using the
## `.groups` argument.

## `summarise()` has grouped output by 'geral'. You can override using the
## `.groups` argument.

## `summarise()` has grouped output by 'geral'. You can override using the
## `.groups` argument.

## `summarise()` has grouped output by 'geral'. You can override using the
## `.groups` argument.

## `summarise()` has grouped output by 'geral'. You can override using the
## `.groups` argument.

## `summarise()` has grouped output by 'geral'. You can override using the
## `.groups` argument.

## `summarise()` has grouped output by 'geral'. You can override using the
## `.groups` argument.

## `summarise()` has grouped output by 'geral'. You can override using the
## `.groups` argument.

## Conclusion

Coffee and tea sales together account for more than 50% of total revenue.

How much does each location contribute to the annual revenue?

cores=c("#000000", "#E69F00",'#808080')
df$year=lubridate::year(df$transaction_date) |> as.factor()
df |> 
  group_by(store_location,year) |>
  summarise(total=sum(money)) |> 
  ggplot() +
  geom_col(aes(x=year,y=total,fill=store_location),position = position_dodge(width = 1),width  = 0.7 )+
  scale_fill_manual(values = cores) +
  labs(title = 'Each store location contributes similarly to annual revenue.')+
  theme(
    plot.title = element_markdown(size = 14),
    plot.subtitle = element_markdown(size = 10),
    panel.background = element_rect(fill = 'white'),
    panel.grid = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank(),
    legend.title = element_blank(),
    legend.position = 'top',
    legend.margin = margin(t=1)
  ) +
  scale_y_sqrt() +
  ylab('') +
  xlab('')

## `summarise()` has grouped output by 'store_location'. You can override using
## the `.groups` argument.

# Which product category contributes the most in each store?

for( place in unique(df$store_location)){
  category_colors <- c(
  "Coffee"             = "#6F4E37",  
  "Tea"                = "#386641",  
  "Drinking Chocolate" = "#A0522D",  
  "Bakery"             = "#C06014",  
  "Flavours"           = "#7B1FA2",  
  "Loose Tea"          = "#1E88E5",  
  "Coffee beans"       = "#3E2723",  
  "Packaged Chocolate" = "#8E44AD",
  "Branded"            = "#B71C1C"   
)

  df_g=df |> 
     filter( store_location == place)
  
  total_money=sum(df_g$money)
  
  g= df_g|>
    group_by(transaction_date,product_category  ) |>
    summarise(total=sum(money)) |> 
    ggplot() +
    geom_area(aes(x=transaction_date,y=total,fill=reorder(product_category,total)))+
    scale_fill_manual(values = category_colors) +
    labs(subtitle = glue('The total revenue between January and July was 
<b>{total_money}<b>.'),
         title = glue('How much does each category contribute to the {place} store')) +
    theme(
      plot.title = element_markdown(size = 14),
      plot.subtitle = element_markdown(size = 10),
      panel.background = element_rect(fill = 'white'),
      panel.grid = element_blank(),
      axis.text.y = element_blank(),
      axis.ticks.y = element_blank(),
      legend.title = element_blank(),
      legend.position = 'top',
      legend.margin = margin(t=1)
      
    ) +
    scale_y_sqrt() +
    ylab('') +
    xlab('')
  print(g)
}

## `summarise()` has grouped output by 'transaction_date'. You can override using
## the `.groups` argument.

## `summarise()` has grouped output by 'transaction_date'. You can override using
## the `.groups` argument.

## `summarise()` has grouped output by 'transaction_date'. You can override using
## the `.groups` argument.

Coffee and tea sales were the categories of products with the most sales.

What are the top 5 types of products for each store?

for( place in unique(df$store_location)){

  df_g=df |> 
     filter( store_location == place)
  
  g= df_g|>
    group_by(product_type,product_category) |>
    summarise(mean_unit_price=mean(unit_price),mean_quantity=mean(transaction_qty),total=sum(money))|> 
    arrange(-total) |> 
    head(5)

  print(g)
}

## `summarise()` has grouped output by 'product_type'. You can override using the
## `.groups` argument.

## # A tibble: 5 × 5
## # Groups:   product_type [5]
##   product_type          product_category   mean_unit_price mean_quantity  total
##   <chr>                 <chr>                        <dbl>         <dbl>  <dbl>
## 1 Barista Espresso      Coffee                        3.64          1.60 31051 
## 2 Brewed Chai tea       Tea                           2.93          1.62 24009.
## 3 Gourmet brewed coffee Coffee                        2.71          1.64 23201.
## 4 Hot chocolate         Drinking Chocolate            4.14          1.60 22494.
## 5 Brewed Black tea      Tea                           2.75          1.64 16216.

## `summarise()` has grouped output by 'product_type'. You can override using the
## `.groups` argument.

## # A tibble: 5 × 5
## # Groups:   product_type [5]
##   product_type          product_category   mean_unit_price mean_quantity  total
##   <chr>                 <chr>                        <dbl>         <dbl>  <dbl>
## 1 Barista Espresso      Coffee                        3.55          1.47 32420.
## 2 Brewed Chai tea       Tea                           2.93          1.50 25645.
## 3 Hot chocolate         Drinking Chocolate            4.16          1.51 23586.
## 4 Gourmet brewed coffee Coffee                        2.72          1.50 23010.
## 5 Brewed Black tea      Tea                           2.75          1.51 15576.

## `summarise()` has grouped output by 'product_type'. You can override using the
## `.groups` argument.

## # A tibble: 5 × 5
## # Groups:   product_type [5]
##   product_type          product_category   mean_unit_price mean_quantity  total
##   <chr>                 <chr>                        <dbl>         <dbl>  <dbl>
## 1 Barista Espresso      Coffee                        3.80          1.49 27935 
## 2 Brewed Chai tea       Tea                           2.95          1.48 27428.
## 3 Hot chocolate         Drinking Chocolate            4.14          1.48 26335.
## 4 Gourmet brewed coffee Coffee                        2.67          1.48 23823.
## 5 Brewed herbal tea     Tea                           2.74          1.49 16282

The top products in the three stores are: - Barista Espresso - Brewed Chai tea - Hot chocolate - Gourmet brewed coffee

“Brewed Black and herbal tea profits are too low compared to the others!”

“How much does each top product contribute to sales each month?”

beverage_colors <- c(
  "Barista Espresso"      = "#3B1F1B", 
  "Brewed Chai tea"       = "#B86023", 
  "Hot chocolate"         = "#7B4D45",  
  "Gourmet brewed coffee" = "#A67C52",
  "Others"                = "#6E5A75"   
)

for (store in df$store_location |> unique()){
  df_g = df |> filter(store_location == store)
  
  g = df_g |>
    mutate(product= ifelse(product_type %in% c('Barista Espresso','Brewed Chai tea','Hot chocolate','Gourmet brewed coffee'),product_type, 'Others'),
           product= factor(product, levels = c("Others",'Barista Espresso','Brewed Chai tea','Hot chocolate','Gourmet brewed coffee'))) |>
    mutate(month=month(transaction_date,label = TRUE)) |> 
    group_by(month,product) |> 
    summarise(total_quantity=sum(transaction_qty)) |> 
    ggplot() +
    xlab('')+
    ylab('')+
    geom_col(aes(x=month,y=total_quantity, fill=product)) +
    scale_fill_manual(values=beverage_colors) +
    labs(title=glue('The contribution proportion of each product for sales by month in {store}'))+
    theme(
      plot.title = element_text(size = 14,margin = margin(0,0,3,0)),
      panel.background = element_rect(fill = 'white'),
      panel.grid = element_blank(),
      axis.text.y = element_blank(),
      axis.ticks.y = element_blank(),
      legend.title = element_blank(),
      legend.position = 'top',
      legend.margin = margin(t=1),
      legend.text = element_text(size=8)
      
    ) 
  
  print(g)
    
             
}

## `summarise()` has grouped output by 'month'. You can override using the
## `.groups` argument.

## `summarise()` has grouped output by 'month'. You can override using the
## `.groups` argument.

## `summarise()` has grouped output by 'month'. You can override using the
## `.groups` argument.

Sales have increased significantly along the year. Barista Espresso, Brewed Chai tea, and Gourmet brewed coffee have similar sales quantities. Hot chocolate sales are significantly lower.

Conclusion

There is a growing trend to sell more. The stores should increase the proportion of products: Barista Espresso,Brewed Chai tea,Hot chocolate,Gourmet brewed coffee.

Coffee Store- Maven Analytics

Romario

2025-07-02