Roses, Chocolates & Data: Valentine’s Day Sales Insights

Overview

Business Problem

Valentine’s Day is one of the most significant retail holidays, driving substantial consumer spending on gifts, food, and beverages. To capitalize on this seasonal demand, it is essential to analyze different purchasing patterns across different product categories and demographic groups to maximze profit and optimize inventory management. By examining sales trends, price variations and different consumer segments, this analysis identifies key shopping behaviors that can inform decisions regarding inventory management, promotional strategy and pricing models.

Data & Methodology

The analysis utilized transactions data from the completejourney package, including products and demographics data for further analysis. The dataset was cleaned and filtered to focus on five Valentine’s-related product categories: Alcohol, Chocolates, Flowers, Cards & Gifts, and Frozen Meals. Various metrics—total sales, quantity sold, and average price per unit—were analyzed over time. Additionally, spending behavior was segmented by income and age groups to identify key consumer trends. Data visualizations were created to highlight patterns and variations.

Proposed Solution & Business Impact

This analysis provides actionable insights that can help the Regork CEO optimize marketing, pricing, and inventory management decisions for future Valentine’s Day promotions. Key proposed solution includes increased inventory for Chocolates, Flowers and Alcohol during peak demand days (February 10-14) and increased prices on high demand items such as Flowers and Alcohol closer to February 14th to maximize revenue. Early promotions, Bundled offers to increase basket size and demographic specific marketing can also be used to increase profits for Regork.

Packages Required

  • completejourney: Data sets characterizing household level transactions
  • tidyverse: System of packages for data manipulation, exploration, and visualization
  • dplyr: A part of the tidyverse, used for efficient data manipulation, including filtering, summarizing, and joining datasets.
  • ggplot2: A powerful visualization package in tidyverse that follows the Grammar of Graphics approach for creating customizable plots.
  • lubridate: Simplifies working with dates and times in R, making parsing, manipulation, and arithmetic operations easier.
  • knitr: Dynamically generate well-formatted reports by converting R code, tables, and plots into HTML, PDF, or Word documents.
library(completejourney)
## Welcome to the completejourney package! Learn more about these data
## sets at http://bit.ly/completejourney.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(ggplot2)
library(lubridate)
library(knitr)

Data Preparation

For my data prepartion, I began by identifying key product categories associated with Valentine’s Day, including Flowers, Chocolates, Alcohol, Cards & Gifts, and Frozen Meals. These items were selected based on their traditional significance in Valentine’s Day celebrations, ranging from romantic gifts to frozen meals for at-home dining. Once the categories were established, I filtered transaction data to focus on the weeks leading up to Valentine’s Day (February 1-14, 2017), as well as a broader year-long timeframe (January to December 2017) to observe seasonal patterns.

With the key items identified, I prepared the data in R by merging the transactions dataset with product and demographic information to gain insight into purchasing behavior across different consumer segments. To ensure data integrity, missing values in sales values and product categories were removed, and transactions were grouped based on the Valentine’s category they belonged to. I also developed custom functions for data visualization to analyze key metrics, including total sales, quantity sold, and average price per unit. These functions allowed for an in-depth examination of purchasing trends over time, highlighting peak shopping periods and variations across different product categories while reducing code reptition. By structuring the dataset this way, I was able to explore which categories saw the highest sales spikes, when consumers started making purchases, and how shopping behaviors varied across demographics.

library(dplyr)
library(ggplot2)
library(lubridate)
library(completejourney)

# Data Cleaning & Preparation

transactions <- get_transactions()

# Ensure dates are in proper format
transactions <- transactions %>%
  mutate(transaction_timestamp = as.Date(transaction_timestamp))

# Merge transactions with product and demographic data
transactions_cleaned <- transactions %>%
  left_join(products, by = "product_id") %>%
  left_join(demographics, by = "household_id")

transactions_cleaned <- transactions_cleaned %>%
  rename(
    income_bracket = income,
    age_group = age
  ) # Rename necessary columns for clarity

transactions_cleaned <- transactions_cleaned %>%
  filter(!is.na(sales_value), !is.na(product_category)) # Remove missing values in key columns

valentines_groups <- data.frame(
  product_category = c(
    # Flowers
    "FLORAL-FRESH CUT", "FLORAL BALLOONS", "FLORAL-FLOWERING PLANTS", "FLORAL-ACCESSORIES", "FLORAL-FOLIAGE PLANTS", "ROSES",
    # Chocolates
    "CANDY - PACKAGED", "CANDY - CHECKLANE", "CANDY BARS (SINGLES)(INCLUDING)", 
    "SEASONAL CANDY BAGS-CHOCOLATE", "SEASONAL CANDY BAGS NON-CHOCOL", 
    "SEASONAL CANDY BOX-CHOCOLATE", "CANDY BAGS-CHOCOCLATE", "CANDY BOXED CHOCOLATES",
    "GUM (PACKAGED)", "MINTS CANDY & BREATH (NOT LIF)", "NOVELTY CANDY", "CHEWING GUM", 
    "CANDY BAGS-CHOCOCLATE W/FLOUR", "CANDY BARS MULTI PACK W/FLOUR",
    # Alcohol
    "BEERS/ALES", "DOMESTIC WINE", "IMPORTED WINE", "MISC WINE", "LIQUOR",
    # Cards & Gifts
    "GREETING CARDS/WRAP/PARTY SPLY", "VALENTINE",
    # Frozen Meals
    "FRZN MEAT/MEAT DINNERS"
  ),
  valentines_group = c(
    rep("Flowers", 6), rep("Chocolates", 14), rep("Alcohol", 5),
    rep("Cards & Gifts", 2), rep("Frozen Meals", 1)
  )
) # Valentines Grouping

# Merge transactions with Valentine's categories
transactions_cleaned <- transactions_cleaned %>%
  left_join(valentines_groups, by = "product_category")

transactions_cleaned <- transactions_cleaned %>%
  filter(!is.na(valentines_group))

## Functions

#Plot 1

plot_valentines_trend <- function(metric) {
  
  # Ensure valid 
  valid_metrics <- c("total_sales", "total_quantity", "avg_price")
  if (!(metric %in% valid_metrics)) {
    stop("Invalid metric. Choose from 'total_sales', 'total_quantity', or 'avg_price'.")
  }
  
  # Filter transactions for February 1-14
  transactions_feb <- transactions_cleaned %>%
    filter(transaction_timestamp >= as.Date("2017-02-01") & transaction_timestamp <= as.Date("2017-02-14"))
  
  # Aggregate data 
  trend_data <- transactions_feb %>%
    group_by(transaction_timestamp, valentines_group) %>%
    summarise(
      value = case_when(
        metric == "total_sales" ~ sum(sales_value, na.rm = TRUE),
        metric == "total_quantity" ~ sum(quantity, na.rm = TRUE),
        metric == "avg_price" ~ sum(sales_value, na.rm = TRUE) / sum(quantity, na.rm = TRUE)
      ),
      .groups = "drop"
    )
  
  # y-axis label
  y_label <- case_when(
    metric == "total_sales" ~ "Total Sales ($)",
    metric == "total_quantity" ~ "Total Quantity Bought",
    metric == "avg_price" ~ "Average Price per Unit ($)"
  )
  
  # line chart
  ggplot(trend_data, aes(x = transaction_timestamp, y = value, color = valentines_group, group = valentines_group)) +
    geom_line(size = 1.2) +  
    geom_point(size = 3) +  
    scale_color_manual(values = c("#D72638", "#3F88C5", "#F49D37", "#140F2D", "#9A031E")) + 
    scale_x_date(date_breaks = "1 day", date_labels = "%b %d", expand = c(0.02, 0.02)) +  
    labs(
      title = paste(y_label, "Trends (Feb 1-14)"),
      x = "Date",
      y = y_label,
      color = "Product Category"
    ) +
    theme_minimal(base_size = 14) +
    theme(
      axis.text.x = element_text(angle = 45, hjust = 1, size = 12),
      axis.text.y = element_text(size = 12),
      plot.title = element_text(size = 18, face = "bold"),
      legend.title = element_text(size = 12, face = "bold"),
      legend.text = element_text(size = 10)
    )
}

#Plot 2

plot_faceted_trend <- function(metric) {
  
  # Ensure metric is valid
  valid_metrics <- c("total_sales", "total_quantity", "avg_price")
  if (!(metric %in% valid_metrics)) {
    stop("Invalid metric. Choose from 'total_sales', 'total_quantity', or 'avg_price'.")
  }
  
  # Filter transactions for February 1-14
  transactions_feb <- transactions_cleaned %>%
    filter(transaction_timestamp >= as.Date("2017-02-01") & transaction_timestamp <= as.Date("2017-02-14"))
  
  # Aggregate data
  trend_data <- transactions_feb %>%
    group_by(transaction_timestamp, valentines_group) %>%
    summarise(
      value = case_when(
        metric == "total_sales" ~ sum(sales_value, na.rm = TRUE),
        metric == "total_quantity" ~ sum(quantity, na.rm = TRUE),
        metric == "avg_price" ~ sum(sales_value, na.rm = TRUE) / sum(quantity, na.rm = TRUE)
      ),
      .groups = "drop"
    )
  
  #  y-axis label
  y_label <- case_when(
    metric == "total_sales" ~ "Total Sales ($)",
    metric == "total_quantity" ~ "Total Quantity Bought",
    metric == "avg_price" ~ "Average Price per Unit ($)"
  )
  
  # Generate faceted line chart
  ggplot(trend_data, aes(x = transaction_timestamp, y = value, group = valentines_group, color = valentines_group)) +
    geom_line(size = 1.2) +  
    geom_point(size = 3) +  
    scale_color_manual(values = c("#D72638", "#3F88C5", "#F49D37", "#140F2D", "#9A031E")) + 
    scale_x_date(date_breaks = "3 days", date_labels = "%b %d", expand = c(0.02, 0.02)) +  
    labs(
      title = paste(y_label, "Trends (Feb 1-14) Across Categories"),
      x = "Date",
      y = y_label,
      color = "Product Category"
    ) +
    theme_minimal(base_size = 14) +  
    theme(
      axis.text.x = element_text(angle = 45, hjust = 1, size = 10),
      axis.text.y = element_text(size = 12),
      plot.title = element_text(size = 18, face = "bold"),
      plot.subtitle = element_text(size = 14, face = "italic", color = "#666666"),
      legend.position = "none",  
      strip.text = element_text(size = 14, face = "bold")  
    ) +
    facet_wrap(~valentines_group, scales = "free_y")  
}

#Plot 3

plot_category_trend <- function(category_name, metric) {
  
  # Ensure metric is valid
  valid_metrics <- c("total_sales", "total_quantity", "avg_price")
  if (!(metric %in% valid_metrics)) {
    stop("Invalid metric. Choose from 'total_sales', 'total_quantity', or 'avg_price'.")
  }
  
  # Filter data for the selected category and date range
  category_data <- transactions_cleaned %>%
    filter(valentines_group == category_name,
           transaction_timestamp >= as.Date("2017-01-01") & transaction_timestamp <= as.Date("2017-02-28")) %>%
    group_by(transaction_timestamp) %>%
    summarise(
      value = case_when(
        metric == "total_sales" ~ sum(sales_value, na.rm = TRUE),
        metric == "total_quantity" ~ sum(quantity, na.rm = TRUE),
        metric == "avg_price" ~ sum(sales_value, na.rm = TRUE) / sum(quantity, na.rm = TRUE)
      ),
      .groups = "drop"
    )
  
  # y-axis label
  y_label <- case_when(
    metric == "total_sales" ~ "Total Sales ($)",
    metric == "total_quantity" ~ "Total Quantity Bought",
    metric == "avg_price" ~ "Average Price per Unit ($)"
  )
  
  # Plot area chart
  ggplot(category_data, aes(x = transaction_timestamp, y = value)) +
    geom_area(fill = "#FFC0CB", alpha = 0.8) +
    scale_x_date(date_labels = "%b %d", date_breaks = "1 week") +
    labs(
      title = paste(y_label, "Trends (Jan-Feb):", category_name),
      subtitle = paste("Tracking", y_label, "from Jan 1 to Feb 28"),
      x = "Date",
      y = y_label
    ) +
    theme_minimal(base_size = 14) +  
    theme(
      axis.text.x = element_text(angle = 45, hjust = 1, size = 10, face = "bold"),  
      axis.text.y = element_text(size = 12),  
      plot.title = element_text(size = 18, face = "bold"),  
      plot.subtitle = element_text(size = 14, face = "italic", color = "#666666")  
    )
}

#plot 4

plot_category_trend_yearly <- function(category_name, metric) {
  
  valid_metrics <- c("total_sales", "total_quantity", "avg_price")
  if (!(metric %in% valid_metrics)) {
    stop("Invalid metric. Choose from 'total_sales', 'total_quantity', or 'avg_price'.")
  }
  
  category_data <- transactions_cleaned %>%
    filter(valentines_group == category_name,
           transaction_timestamp >= as.Date("2017-01-01") & 
           transaction_timestamp <= as.Date("2017-12-31")) %>%
    group_by(transaction_timestamp) %>%
    summarise(
      value = case_when(
        metric == "total_sales" ~ sum(sales_value, na.rm = TRUE),
        metric == "total_quantity" ~ sum(quantity, na.rm = TRUE),
        metric == "avg_price" ~ sum(sales_value, na.rm = TRUE) / sum(quantity, na.rm = TRUE)
      ),
      .groups = "drop"
    ) %>%
    complete(transaction_timestamp = seq.Date(from = as.Date("2017-01-01"), 
                                              to = as.Date("2017-12-31"), 
                                              by = "day"), 
             fill = list(value = 0))  


  y_label <- case_when(
    metric == "total_sales" ~ "Total Sales ($)",
    metric == "total_quantity" ~ "Total Quantity Bought",
    metric == "avg_price" ~ "Average Price per Unit ($)"
  )

  ggplot(category_data, aes(x = transaction_timestamp, y = value)) +
    geom_area(fill = "#FFC0CB", alpha = 0.8) +
    scale_x_date(
      date_labels = "%b %d", 
      breaks = seq(as.Date("2017-01-01"), as.Date("2017-12-31"), by = "1 month"), 
      limits = c(as.Date("2017-01-01"), as.Date("2017-12-31"))  
    ) +
    labs(
      title = paste(y_label, "Trends (Jan-Dec):", category_name),
      subtitle = paste("Tracking", y_label, "throughout the year"),
      x = "Month",
      y = y_label
    ) +
    theme_minimal(base_size = 14) +  
    theme(
      axis.text.x = element_text(angle = 45, hjust = 1, size = 10, face = "bold"),  
      axis.text.y = element_text(size = 12),  
      plot.title = element_text(size = 18, face = "bold"),  
      plot.subtitle = element_text(size = 14, face = "italic", color = "#666666")  
    )
}

Exploratory Data Analysis

In this section, we analyze Valentine’s Day sales trends by examining consumer behavior and demographic purchasing patterns.

Early vs Last Minute Summary

library(knitr)

# Create data frame
data <- data.frame(
  valentines_group = c("Flowers", "Chocolates & Candy", "Alcohol", "Greeting Cards & Gifts", "Frozen Meat Dinners",
                       "Flowers", "Chocolates & Candy", "Alcohol", "Greeting Cards & Gifts", "Frozen Meat Dinners"),
  total_sales_value = c(806.30, 2476.34, 5529.81, 1512.77, 3323.46, 
                        1059.06, 957.51, 649.07, 959.58, 526.51),
  shopper_type = c("Early (Feb 1–12)", "Early (Feb 1–12)", "Early (Feb 1–12)", "Early (Feb 1–12)", "Early (Feb 1–12)",
                   "Last-Minute (Feb 13–14)", "Last-Minute (Feb 13–14)", "Last-Minute (Feb 13–14)", 
                   "Last-Minute (Feb 13–14)", "Last-Minute (Feb 13–14)")
)

# Display table
kable(data, caption = "Valentine’s Day Sales Summary by Shopper Type")
Valentine’s Day Sales Summary by Shopper Type
valentines_group total_sales_value shopper_type
Flowers 806.30 Early (Feb 1–12)
Chocolates & Candy 2476.34 Early (Feb 1–12)
Alcohol 5529.81 Early (Feb 1–12)
Greeting Cards & Gifts 1512.77 Early (Feb 1–12)
Frozen Meat Dinners 3323.46 Early (Feb 1–12)
Flowers 1059.06 Last-Minute (Feb 13–14)
Chocolates & Candy 957.51 Last-Minute (Feb 13–14)
Alcohol 649.07 Last-Minute (Feb 13–14)
Greeting Cards & Gifts 959.58 Last-Minute (Feb 13–14)
Frozen Meat Dinners 526.51 Last-Minute (Feb 13–14)

Summary

Problem Statement

The objective of this analysis was to understand consumer purchasing behavior surrounding Valentine’s Day by examining sales trends, quantity sold, price fluctuations, and demographic spending patterns across key product categories. The goal was to identify actionable insights that could help Regork Grocery Chain optimize its inventory, pricing strategies, and marketing efforts to maximize revenue and enhance customer experience.

Methodology & Data Used

The analysis utilized transactions data from completejourney package, including demographics and products data. The dataset was cleaned and filtered to focus on five Valentine’s-related product categories: Alcohol, Chocolates, Flowers, Cards & Gifts, and Frozen Meals. Various metrics—total sales, quantity sold, and average price per unit—were analyzed over time. Additionally, spending behavior was segmented by income and age groups to identify key consumer trends. Data visualizations were created to highlight patterns and variations.

Key Insights:

  • Sales across all categories peak on February 14, with Flowers and Chocolates experiencing the sharpest surge.
  • Alcohol and Chocolates show multiple peaks before Valentine’s Day, suggesting pre-celebration purchases and gift planning.
  • Flower prices increase significantly closer to February 14, indicating dynamic pricing strategies or supply-demand effects

Demographic Trends:

  • Higher-income groups spend more on Alcohol and Flowers, while middle-income shoppers contribute significantly to Chocolates and Gifts.
  • Younger consumers (19-24) show higher spending on Alcohol and Frozen Meals, suggesting a preference for at-home celebrations or gatherings.
  • Middle-aged groups (35-54) lead in purchasing Chocolates and Gifts, making them a key target for promotional campaigns.

Implications & Proposal to the Regork CEO:

  • Inventory levels should be adjusted based on demand spikes, with early stock-ups on Alcohol and Chocolates in early February and high availability of Flowers on February 13-14.
  • Premium pricing for Flowers and Alcohol in the final days before Valentine’s Day can maximize revenue from last-minute buyers.
  • Early discounts on Alcohol and Chocolates can attract pre-Valentine’s shoppers.
  • Cross-category bundling (e.g., Chocolates + Alcohol, Flowers + Greeting Cards) can encourage larger purchases.
  • Focus on younger consumers for Alcohol promotions and middle-aged consumers for Chocolates and Gifts.

Limitations & Areas for Improvement:

  • The analysis focuses on a single year (2017), and consumer behavior may vary in different years due to economic conditions, competitive factors, or changing preferences.
  • Factors like competitor promotions, advertising efforts, and weather conditions are not accounted for but may significantly impact purchasing behavior.
  • Data had missing values for demographic data for the transactions meaning that real scenario could be different