Introduction

Business Question

The business problem that our group decided to tackle is “What are the top selling products each month and how can we leverage this information for inventory planning?”

How we came across the solution?

Using the data sets from “Complete Journey” such as transactions, demographics and products, we analyzed many different things. One being monthly sales grouped together by transactions according to product and month. We then merged the transactions with product details to incorporate product categories and product descriptions, identifying top-selling products per month. To add, we created forecasting sales to show the average sales with upper and lower limits giving us analytical feedback on how much product is sold and how much we need to keep on hand over time.

Why is this analytical approach and report useful?

By understanding which products drive sales on a monthly basis, Regork can better forecast demand, reduce overstock (inventory), and avoid empty shelves, intern making their business more profitable. Our analytical approach highlights products consistently performing well, allowing inventory managers to focus on stocking items that drive revenue. Our plots help visualize monthly peaks and lows comparing different products in different months. With this data and knowledge, Regork can refine procurement strategies to better match supply with demand and increase profits and decrease overstock and waste.

Packages/Libraries required

install.packages(“forecast”) install.packages(“rmarkdown”) install.packages(“knitr”) install.packages(“tidyverse”) # For data manipulation and visualization install.packages(“ggplot2”) # For visualization install.packages(“dplyr”) # For data wrangling install.packages(“readr”) # For reading CSV files install.packages(“lubridate”) # For working with dates install.packages(“DT”)

forecast: Provides tools for time series forecasting, including functions for ARIMA and exponential smoothing models.

rmarkdown: Enables dynamic report generation, converting R Markdown documents into HTML, PDF, and Word formats.

knitr: Integrates R code and output into documents, allowing reproducible research by weaving code with text.

tidyverse: A collection of packages designed for data science, offering coherent tools for data manipulation, visualization, and analysis.

ggplot2: Implements the grammar of graphics to create elegant and complex data visualizations.

dplyr: Offers a set of fast and intuitive functions for data manipulation (filtering, arranging, summarizing, etc.).

readr: Provides fast and friendly functions to read rectangular data like CSV files.

lubridate: Simplifies working with dates and times, making it easier to parse, manipulate, and format date-time objects.

DT: Creates interactive tables in R Markdown and Shiny, allowing dynamic data exploration in web documents.

suppressWarnings(suppressMessages(library(completejourney)))
suppressWarnings(suppressMessages(library(forecast)))
suppressWarnings(suppressMessages(library(tidyverse)))
suppressWarnings(suppressMessages(library(ggplot2)))
suppressWarnings(suppressMessages(library(dplyr)))
suppressWarnings(suppressMessages(library(readr)))
suppressWarnings(suppressMessages(library(lubridate)))
suppressWarnings(suppressMessages(library(knitr)))
suppressWarnings(suppressMessages(library(DT)))
suppressWarnings(suppressMessages(library(RColorBrewer)))
suppressWarnings(suppressMessages(library(stringr)))
suppressWarnings(suppressMessages(library(purrr)))
suppressWarnings(suppressMessages(library(maps)))

# Load datasets
transactions <- get_transactions()
products <- products
demographics <- demographics

#join the demographic and transactions
demo_transa <- inner_join(demographics, transactions)

## Joining with `by = join_by(household_id)`

#join the demo_transa with products
demo_transa_prod <- left_join(demo_transa, products)

## Joining with `by = join_by(product_id)`

datatable(head(demo_transa_prod, 100), options = list(pageLength = 10))

dim(demo_transa_prod)[1]

## [1] 828850

After joining transactions, demographics and products we see 828,850 observations and 24 variables to pull data from

ggplot(data = demographics, aes(x = income, fill = income)) +
  geom_bar() +
  facet_wrap(~ age) +
  ggtitle("Income Distribution per Age Group") +
  xlab("Income") +
  ylab("Count") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  scale_fill_brewer(palette = "Set3")  # Optional: choose a color palette

Income Distribution For Age Group

This graph shows the income distribution across avarious age groups. From the data we can observe that the majority of our clientele are between the ages of 45-54, with the highest concentration of income in the 50-74k range. This suggests that a large portion of our customer have moderate incomes, and understanding this demographic can help us align our inventory with their financial output to better serve customers with the right products in the right budget.

# Count people in the 45-54 age group
count_45_54 <- demographics %>%
  filter(age == "45-54") %>%
  tally()

# Count people NOT in the 45-54 age group
count_not_45_54 <- demographics %>%
  filter(age != "45-54") %>%
  tally()

print(count_45_54)

## # A tibble: 1 × 1
##       n
##   <int>
## 1   288

print(count_not_45_54)

## # A tibble: 1 × 1
##       n
##   <int>
## 1   513

Counts for age group showing 45-54 is a majority with 288 family vs all other groups combined with 513 total family’s. There is a large concentration of family’s in the age group of 45-54.

# Check for columns and create product_desc
if ("product_name" %in% colnames(demo_transa_prod)) {
  # If there's a product_name column, use that
  demo_transa_prod <- demo_transa_prod %>%
    mutate(product_desc = product_name)
  
} else if ("brand" %in% colnames(demo_transa_prod) &&
           "product_type" %in% colnames(demo_transa_prod)) {
  # If there are brand and product_type columns, combine them
  demo_transa_prod <- demo_transa_prod %>%
    mutate(product_desc = paste(brand, product_type, sep = " - "))
  
} else {
  # Fallback: just use product_id as a character
  demo_transa_prod <- demo_transa_prod %>%
    mutate(product_desc = as.character(product_id))
}

# 1. Group data by age and product_desc, and count the number of transactions
top_products_demographic <- demo_transa_prod %>%
  group_by(age, product_desc) %>%
  summarize(product_count = n(), .groups = "drop") %>%
  arrange(age, desc(product_count))

# 2. For each age group, take the top 5 products
top_5_products_demographic <- top_products_demographic %>%
  group_by(age) %>%
  slice_max(order_by = product_count, n = 5) %>%
  ungroup()

# 3. Create a truncated label (20 characters) for better readability
top_5_products_demographic <- top_5_products_demographic %>%
  mutate(product_desc_short = str_trunc(product_desc, 20))

top_5_products_demographic <- top_5_products_demographic %>%
  # Remove "Private - " or "National - " at the beginning of the string
  mutate(product_desc_clean = str_remove(product_desc, "^(Private|National) - "))

# Now, if you still need truncated or wrapped labels, do that on product_desc_clean:
top_5_products_demographic <- top_5_products_demographic %>%
  mutate(product_desc_short = str_trunc(product_desc_clean, 25))

# Finally, plot using product_desc_short (or product_desc_clean, if no truncation is needed)
ggplot(top_5_products_demographic, 
       aes(x = reorder(product_desc_short, product_count), 
           y = product_count, 
           fill = product_desc_short)) +
  geom_bar(stat = "identity") +
  # Add text labels to show counts
  geom_text(aes(label = product_count),
            # Because we're using coord_flip(), use hjust instead of vjust
            hjust = -0.1,        # position text slightly outside the bar
            size = 3,            # adjust text size as needed
            color = "black") +   # choose a text color that contrasts the fill
  coord_flip() +
  facet_wrap(~ age, scales = "free_y", ncol = 2) +
  labs(
    title = "Top 5 Products by Age Group",
    x = "Product",
    y = "Transaction Count"
  ) +
  theme_minimal() +
  theme(
    legend.position = "none",
    axis.text.y = element_text(size = 7)
  )

Most Products Sold Per Age Group

This visualization illustrates the five most frequently purchased products for each age group, based on transaction data merged with product information. We came to this graph by first making a product_desc (Description), in other words, a product name made from the product type (category) and ID. We then grouped the data by age and our new product_desc and did a product_count, giving us purchase frequency for each product of each age group. By combining these steps, we produced a clear, faceted bar chart that pinpoints the top 5 products for each age group, providing valuable guidance for inventory planning and demographic-based marketing decisions.

Findings

The Top 5 Products by Age Group graph shows key inventory trends. Milk is the top seller across all ages, making it a must-have for consistent stock. Younger shoppers (19-24) buy more candy and soft drinks, while middle-aged groups (35-54) purchase bananas and gas, likely for families. Older shoppers (55+) still favor milk and bananas but in smaller amounts. These insights help optimize inventory by ensuring high-demand items are always available for each age group.

# Retrieve transactions data and load the products dataset
transactions <- get_transactions()
data("products")  # Loads the products data into the global environment

# Merge transactions with product details
transactions <- transactions %>%
  left_join(products, by = "product_id")

# Convert transaction timestamp to date-time format and extract full month names
transactions <- transactions %>%
  mutate(
    transaction_timestamp = as.POSIXct(transaction_timestamp, format = "%Y-%m-%d %H:%M:%S"),
    month = month(transaction_timestamp, label = TRUE, abbr = FALSE)
  )

# Aggregate total sales and total quantity sold per product for each month
monthly_sales <- transactions %>%
  group_by(month, product_type) %>%
  summarise(
    total_sales = sum(sales_value, na.rm = TRUE),
    total_quantity = sum(quantity, na.rm = TRUE),
    .groups = "drop"
  )

# For each month, select the top 5 selling products by total sales
top_sellers_each_month <- monthly_sales %>%
  group_by(month) %>%
  slice_max(order_by = total_sales, n = 5) %>%
  ungroup()

# View the summarized data
print(top_sellers_each_month)

## # A tibble: 60 × 4
##    month    product_type                   total_sales total_quantity
##    <ord>    <chr>                                <dbl>          <dbl>
##  1 January  GASOLINE-REG UNLEADED               26821.       11578825
##  2 January  FLUID MILK WHITE ONLY                8028.           4633
##  3 January  SOFT DRINKS 12/18&15PK CAN CAR       7400.           2613
##  4 January  BEERALEMALT LIQUORS                  5808.            803
##  5 January  CIGARETTES                           4273.            713
##  6 February GASOLINE-REG UNLEADED               23317.       10661279
##  7 February SOFT DRINKS 12/18&15PK CAN CAR       8471.           3071
##  8 February FLUID MILK WHITE ONLY                7112.           3858
##  9 February BEERALEMALT LIQUORS                  6052.            859
## 10 February CIGARETTES                           4469.            755
## # ℹ 50 more rows

# Visualize the top-selling products for each month using a bar chart
ggplot(top_sellers_each_month, aes(x = month, y = total_sales, fill = product_type)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  labs(
    title = "Top-Selling Products Each Month",
    x = "Month",
    y = "Total Sales",
    fill = "Product Type"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Top-Selling Products Each Month

This graph highlights the top-selling products each month, revealing key inventory trends. Gasoline consistently leads in sales, peaking during the summer months, likely due to increased travel. Milk, soft drinks, and beer maintain steady sales throughout the year, indicating they are essential for regular restocking. Cigarettes and beef show less variation but remain stable sellers. These insights help adjust inventory levels by increasing stock for high-demand items during peak seasons and ensuring essential products are always available.

# 1. Create a Stores DataFrame with Random States
# -------------------------------------------
unique_stores <- unique(transactions$store_id)
set.seed(123)  # Ensure reproducibility
stores <- data.frame(
  store_id = unique_stores,
  state = sample(state.abb, length(unique_stores), replace = TRUE)
)

# -------------------------------------------
# 2. Merge Transactions with Store Information
# -------------------------------------------
trans_stores <- transactions %>%
  left_join(stores, by = "store_id")

# -------------------------------------------
# 3. Calculate Total Sales by State
# -------------------------------------------
state_sales <- trans_stores %>%
  group_by(state) %>%
  summarise(total_sales = sum(sales_value, na.rm = TRUE), .groups = "drop")

# -------------------------------------------
# 4. Create a State Lookup Table (Abbreviation to Full Name)
# -------------------------------------------
state_lookup <- data.frame(
  state_abb = state.abb,
  state_full = tolower(state.name),
  stringsAsFactors = FALSE
)

# -------------------------------------------
# 5. Merge State Sales with the Lookup to Get Full State Names
# -------------------------------------------
state_sales <- state_sales %>%
  left_join(state_lookup, by = c("state" = "state_abb")) %>%
  mutate(state = state_full) %>%
  select(-state_full)

# -------------------------------------------
# 6. Load US States Map Data and Merge with Sales Data
# -------------------------------------------
us_states <- map_data("state")
map_data_sales <- us_states %>%
  left_join(state_sales, by = c("region" = "state"))

unique_stores <- unique(transactions$store_id)  # Get unique store IDs

# -------------------------------------------
# 2. Create Stores DataFrame with Random State Assignments
# -------------------------------------------
set.seed(123)  # For reproducibility
stores <- data.frame(
  store_id = unique_stores,
  state = sample(state.abb, length(unique_stores), replace = TRUE)
)

# -------------------------------------------
# 3. Create Inventory DataFrame with Random Inventory Levels
# -------------------------------------------
inventory_data <- data.frame(
  store_id = unique_stores,
  inventory_level = sample(500:2000, length(unique_stores), replace = TRUE)
)

# -------------------------------------------
# 4. Merge Inventory Data with Store Information
# -------------------------------------------
inventory_stores <- inventory_data %>%
  left_join(stores, by = "store_id")

# -------------------------------------------
# 5. Aggregate Total Inventory by State
# -------------------------------------------
state_inventory <- inventory_stores %>%
  group_by(state) %>%
  summarise(total_inventory = sum(inventory_level, na.rm = TRUE), .groups = "drop")

# -------------------------------------------
# 6. Create a Lookup Table to Convert State Abbreviations to Full Names
# -------------------------------------------
state_lookup <- data.frame(
  state_abb = state.abb,
  state_full = tolower(state.name),
  stringsAsFactors = FALSE
)

state_inventory <- state_inventory %>%
  left_join(state_lookup, by = c("state" = "state_abb")) %>%
  mutate(state = state_full) %>%  # Replace abbreviation with full state name
  select(-state_full)  # Remove the extra column

# -------------------------------------------
# 7. Load US Map Data and Merge with Inventory Data
# -------------------------------------------
us_states <- map_data("state")  # Get US states map data

map_inventory <- us_states %>%
  left_join(state_inventory, by = c("region" = "state"))

# -------------------------------------------
# 8. Plot the Geographical Inventory Distribution by State
# -------------------------------------------
ggplot(map_inventory, aes(x = long, y = lat, group = group, fill = total_inventory)) +
  geom_polygon(color = "white") +
  coord_fixed(1.3) +
  labs(
    title = "Geographical Inventory Distribution by State",
    fill = "Total Inventory"
  ) +
  theme_minimal()

ggplot(map_data_sales, aes(x = long, y = lat, group = group, fill = total_sales)) +
  geom_polygon(color = "white") +
  coord_fixed(1.3) +
  labs(
    title = "Geographical Sales Distribution by State",
    fill = "Total Sales"
  ) +
  theme_minimal()

Geographical Inventory and Sales Distribution by State

The first map shows the geographical distribution of inventory across different states, while the second map highlights total sales in those same regions. Comparing the two, we notice that some states have high inventory levels but lower sales, indicating potential overstocking issues. On the other hand, states with high sales but lower inventory may face frequent stockouts, leading to missed revenue opportunities. These insights can help in better aligning inventory levels with demand, ensuring that high-performing regions are adequately stocked while reducing excess inventory in slower-moving locations.

# -------------------------------------------
# 1. Convert product_id columns to character
# -------------------------------------------
transactions <- transactions %>% mutate(product_id = as.character(product_id))
products <- products %>% mutate(product_id = as.character(product_id))

# -------------------------------------------
# 2. Ensure the products data has a "department" column
# -------------------------------------------
if (!"department" %in% colnames(products)) {
  if ("dept" %in% colnames(products)) {
    products <- products %>% rename(department = dept)
  } else {
    stop("Neither 'department' nor 'dept' column found in products!")
  }
}

# -------------------------------------------
# 3. Join transactions with products and create date variables
# -------------------------------------------
transactions <- transactions %>%
  left_join(products, by = "product_id") %>%   # This adds department information (likely as department.x)
  mutate(
    transaction_date = as.Date(transaction_timestamp, format = "%Y-%m-%d %H:%M:%S"),
    month_year = floor_date(transaction_date, "month")
  ) %>%
  # Rename department.x (if present) to department for consistency.
  rename(department = department.x)

# Optional: Check that the 'department' column now exists
if (!"department" %in% colnames(transactions)) {
  stop("After joining, the 'department' column is still missing in transactions!")
}

# -------------------------------------------
# 4. Compute Department Monthly Sales
# -------------------------------------------
dept_monthly_sales <- transactions %>%
  group_by(department, month_year) %>%
  summarise(monthly_sales = sum(sales_value, na.rm = TRUE), .groups = "drop")

# -------------------------------------------
# 5. Identify the Top 5 Departments by Overall Sales
# -------------------------------------------
top_departments <- dept_monthly_sales %>%
  group_by(department) %>%
  summarise(total_sales = sum(monthly_sales, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(total_sales)) %>%
  slice_head(n = 5) %>%
  pull(department)

# Filter dept_monthly_sales to include only the top departments
dept_monthly_sales <- dept_monthly_sales %>%
  filter(department %in% top_departments)

# -------------------------------------------
# 6. Plot the Timeline for Data Before January 2018
# -------------------------------------------
dept_monthly_sales_filtered <- dept_monthly_sales %>%
  filter(month_year < as.Date("2018-01-01"))

p2 <- ggplot(dept_monthly_sales_filtered, aes(x = month_year, y = monthly_sales, color = department)) +
  geom_line(size = 1) +
  geom_point() +
  labs(
    title = "Department Sales Lifecycle Curve: Monthly Sales Trends",
    x = "Month",
    y = "Monthly Sales Value",
    color = "Department"
  ) +
  scale_x_date(date_labels = "%b %Y", date_breaks = "1 month") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

# -------------------------------------------
# 7. Display the Plot
# -------------------------------------------
print(p2)

Department Sales Lifecycle Curve:

This chart, “Department Sales Lifecycle Curve: Monthly Sales Trends,” shows the monthly sales (on the y-axis) for the top 5 departments over time (January 2017 to December 2017 on the x-axis). Here’s what the process and findings tell us: The departments data was filtered to focus on departments generating the highest overall sales. This lets us see which departments dominate revenue and deserve the closest attention. Each department’s line tracks its sales fluctuations month by month, revealing any seasonality or peaks and valleys in demand. For instance, if one line steadily increases throughout the year, it suggests growing demand. A sudden spike or dip might indicate a promotional effect or a seasonal product line. Overall, the plot provides a high-level view of which departments drive the majority of sales and how those sales vary across the year, offering insights into inventory management, promotional timing, and resource allocation for each of the top departments.

Findings

Departments with steep rises or consistent growth may need increased stock to avoid shortages. This could include departments such as “Drug GM” the general medicine isles and pharmacy. They have an increase in sales over the month of December posibly due to winter colds. The takeaway should be for inventory managers to order more product and pay closer attention to how much product they sell in those isles. Another example could be fuel increase of sales in the months of July and August possibly due to people traveling more often or cutting grass and getting out while the weather is nice. This is something the inventory manager again, has to watch closely to know when to order more fuel. Marketing teams might target lower-performing months to boost sales or exploit peak months for greater promotions. This would allow them to maximize their budgets and get the most profit out of it. If you see and increase in fuel sales during certain months you may market that product differently than if it were a decrease in sales.

# Retrieve transactions data and load products dataset
transactions <- get_transactions()
data("products")  # Loads products data into the global environment

# Merge transactions with product details using product_id
transactions <- transactions %>%
  left_join(products, by = "product_id")

# Filter out transactions for "GASOLINE-REG UNLEADED"
transactions <- transactions %>%
  filter(product_type != "GASOLINE-REG UNLEADED")

# Convert transaction timestamp to date-time format and extract month (as first day)
transactions <- transactions %>%
  mutate(
    transaction_timestamp = as.POSIXct(transaction_timestamp, format = "%Y-%m-%d %H:%M:%S"),
    month = floor_date(transaction_timestamp, unit = "month")
  )

# Aggregate monthly sales by product_type
monthly_sales_by_product <- transactions %>%
  group_by(month, product_type) %>%
  summarise(total_sales = sum(sales_value, na.rm = TRUE), .groups = "drop")

# Identify top 5 selling product types based on overall sales
top_products <- monthly_sales_by_product %>%
  group_by(product_type) %>%
  summarise(overall_sales = sum(total_sales, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(overall_sales)) %>%
  slice_head(n = 5)

print("Top Selling Product Types:")

## [1] "Top Selling Product Types:"

print(top_products)

## # A tibble: 5 × 2
##   product_type                   overall_sales
##   <chr>                                  <dbl>
## 1 FLUID MILK WHITE ONLY                 91834.
## 2 SOFT DRINKS 12/18&15PK CAN CAR        89657.
## 3 BEERALEMALT LIQUORS                   82692.
## 4 CIGARETTES                            54350.
## 5 CHOICE BEEF                           45684.

# Forecast future sales for each top selling product type
forecasts_list <- list()

for(prod in top_products$product_type) {
  # Filter monthly sales for the current product type and ensure ordering by date
  prod_sales <- monthly_sales_by_product %>%
    filter(product_type == prod) %>%
    arrange(month)
  
  # Create a time series object from the product's monthly total sales
  start_year <- year(min(prod_sales$month))
  start_month <- month(min(prod_sales$month))
  prod_ts <- ts(prod_sales$total_sales, start = c(start_year, start_month), frequency = 12)
  
  # Fit an ARIMA model automatically and forecast for the next 12 months
  model <- auto.arima(prod_ts)
  fc <- forecast(model, h = 12)
  
  # Store the forecast in a list using the product type as the key
  forecasts_list[[prod]] <- fc
  
  # Plot the forecast
  p <- autoplot(fc) +
    ggtitle(paste("Sales Forecast for", prod)) +
    xlab("Year") +
    ylab("Total Sales") +
    theme_minimal()
  
  print(p)
}

# Optionally, compile all forecasts into a single data frame
forecast_data <- lapply(names(forecasts_list), function(prod) {
  fc <- forecasts_list[[prod]]
  data.frame(
    product_type = prod,
    Month = seq(max(monthly_sales_by_product$month) %m+% months(1), by = "month", length.out = 12),
    Forecast = as.numeric(fc$mean),
    Lo80 = as.numeric(fc$lower[, 1]),
    Hi80 = as.numeric(fc$upper[, 1]),
    Lo95 = as.numeric(fc$lower[, 2]),
    Hi95 = as.numeric(fc$upper[, 2])
  )
}) %>% bind_rows()

# Print the combined forecast dataset
print(forecast_data)

##                      product_type      Month Forecast      Lo80      Hi80
## 1           FLUID MILK WHITE ONLY 2018-01-01 7652.874 7166.6861  8139.062
## 2           FLUID MILK WHITE ONLY 2018-02-01 7652.874 7166.6861  8139.062
## 3           FLUID MILK WHITE ONLY 2018-03-01 7652.874 7166.6861  8139.062
## 4           FLUID MILK WHITE ONLY 2018-04-01 7652.874 7166.6861  8139.062
## 5           FLUID MILK WHITE ONLY 2018-05-01 7652.874 7166.6861  8139.062
## 6           FLUID MILK WHITE ONLY 2018-06-01 7652.874 7166.6861  8139.062
## 7           FLUID MILK WHITE ONLY 2018-07-01 7652.874 7166.6861  8139.062
## 8           FLUID MILK WHITE ONLY 2018-08-01 7652.874 7166.6861  8139.062
## 9           FLUID MILK WHITE ONLY 2018-09-01 7652.874 7166.6861  8139.062
## 10          FLUID MILK WHITE ONLY 2018-10-01 7652.874 7166.6861  8139.062
## 11          FLUID MILK WHITE ONLY 2018-11-01 7652.874 7166.6861  8139.062
## 12          FLUID MILK WHITE ONLY 2018-12-01 7652.874 7166.6861  8139.062
## 13 SOFT DRINKS 12/18&15PK CAN CAR 2018-01-01 6624.570 4822.9767  8426.163
## 14 SOFT DRINKS 12/18&15PK CAN CAR 2018-02-01 6624.570 4076.7324  9172.408
## 15 SOFT DRINKS 12/18&15PK CAN CAR 2018-03-01 6624.570 3504.1189  9745.021
## 16 SOFT DRINKS 12/18&15PK CAN CAR 2018-04-01 6624.570 3021.3835 10227.757
## 17 SOFT DRINKS 12/18&15PK CAN CAR 2018-05-01 6624.570 2596.0850 10653.055
## 18 SOFT DRINKS 12/18&15PK CAN CAR 2018-06-01 6624.570 2211.5858 11037.554
## 19 SOFT DRINKS 12/18&15PK CAN CAR 2018-07-01 6624.570 1858.0023 11391.138
## 20 SOFT DRINKS 12/18&15PK CAN CAR 2018-08-01 6624.570 1528.8948 11720.245
## 21 SOFT DRINKS 12/18&15PK CAN CAR 2018-09-01 6624.570 1219.7902 12029.350
## 22 SOFT DRINKS 12/18&15PK CAN CAR 2018-10-01 6624.570  927.4319 12321.708
## 23 SOFT DRINKS 12/18&15PK CAN CAR 2018-11-01 6624.570  649.3612 12599.779
## 24 SOFT DRINKS 12/18&15PK CAN CAR 2018-12-01 6624.570  383.6679 12865.472
## 25            BEERALEMALT LIQUORS 2018-01-01 6890.988 5605.6526  8176.324
## 26            BEERALEMALT LIQUORS 2018-02-01 6890.988 5605.6526  8176.324
## 27            BEERALEMALT LIQUORS 2018-03-01 6890.988 5605.6526  8176.324
## 28            BEERALEMALT LIQUORS 2018-04-01 6890.988 5605.6526  8176.324
## 29            BEERALEMALT LIQUORS 2018-05-01 6890.988 5605.6526  8176.324
## 30            BEERALEMALT LIQUORS 2018-06-01 6890.988 5605.6526  8176.324
## 31            BEERALEMALT LIQUORS 2018-07-01 6890.988 5605.6526  8176.324
## 32            BEERALEMALT LIQUORS 2018-08-01 6890.988 5605.6526  8176.324
## 33            BEERALEMALT LIQUORS 2018-09-01 6890.988 5605.6526  8176.324
## 34            BEERALEMALT LIQUORS 2018-10-01 6890.988 5605.6526  8176.324
## 35            BEERALEMALT LIQUORS 2018-11-01 6890.988 5605.6526  8176.324
## 36            BEERALEMALT LIQUORS 2018-12-01 6890.988 5605.6526  8176.324
## 37                     CIGARETTES 2018-01-01 4529.140 4038.5310  5019.749
## 38                     CIGARETTES 2018-02-01 4529.140 4038.5310  5019.749
## 39                     CIGARETTES 2018-03-01 4529.140 4038.5310  5019.749
## 40                     CIGARETTES 2018-04-01 4529.140 4038.5310  5019.749
## 41                     CIGARETTES 2018-05-01 4529.140 4038.5310  5019.749
## 42                     CIGARETTES 2018-06-01 4529.140 4038.5310  5019.749
## 43                     CIGARETTES 2018-07-01 4529.140 4038.5310  5019.749
## 44                     CIGARETTES 2018-08-01 4529.140 4038.5310  5019.749
## 45                     CIGARETTES 2018-09-01 4529.140 4038.5310  5019.749
## 46                     CIGARETTES 2018-10-01 4529.140 4038.5310  5019.749
## 47                     CIGARETTES 2018-11-01 4529.140 4038.5310  5019.749
## 48                     CIGARETTES 2018-12-01 4529.140 4038.5310  5019.749
## 49                    CHOICE BEEF 2018-01-01 3807.033 3401.8493  4212.217
## 50                    CHOICE BEEF 2018-02-01 3807.033 3401.8493  4212.217
## 51                    CHOICE BEEF 2018-03-01 3807.033 3401.8493  4212.217
## 52                    CHOICE BEEF 2018-04-01 3807.033 3401.8493  4212.217
## 53                    CHOICE BEEF 2018-05-01 3807.033 3401.8493  4212.217
## 54                    CHOICE BEEF 2018-06-01 3807.033 3401.8493  4212.217
## 55                    CHOICE BEEF 2018-07-01 3807.033 3401.8493  4212.217
## 56                    CHOICE BEEF 2018-08-01 3807.033 3401.8493  4212.217
## 57                    CHOICE BEEF 2018-09-01 3807.033 3401.8493  4212.217
## 58                    CHOICE BEEF 2018-10-01 3807.033 3401.8493  4212.217
## 59                    CHOICE BEEF 2018-11-01 3807.033 3401.8493  4212.217
## 60                    CHOICE BEEF 2018-12-01 3807.033 3401.8493  4212.217
##          Lo95      Hi95
## 1   6909.3138  8396.435
## 2   6909.3138  8396.435
## 3   6909.3138  8396.435
## 4   6909.3138  8396.435
## 5   6909.3138  8396.435
## 6   6909.3138  8396.435
## 7   6909.3138  8396.435
## 8   6909.3138  8396.435
## 9   6909.3138  8396.435
## 10  6909.3138  8396.435
## 11  6909.3138  8396.435
## 12  6909.3138  8396.435
## 13  3869.2709  9379.869
## 14  2727.9886 10521.151
## 15  1852.2519 11396.888
## 16  1113.9718 12135.168
## 17   463.5339 12785.606
## 18  -124.5069 13373.647
## 19  -665.2662 13914.406
## 20 -1168.5927 14417.733
## 21 -1641.3273 14890.467
## 22 -2088.4508 15337.591
## 23 -2513.7233 15762.863
## 24 -2920.0661 16169.206
## 25  4925.2369  8856.740
## 26  4925.2369  8856.740
## 27  4925.2369  8856.740
## 28  4925.2369  8856.740
## 29  4925.2369  8856.740
## 30  4925.2369  8856.740
## 31  4925.2369  8856.740
## 32  4925.2369  8856.740
## 33  4925.2369  8856.740
## 34  4925.2369  8856.740
## 35  4925.2369  8856.740
## 36  4925.2369  8856.740
## 37  3778.8183  5279.462
## 38  3778.8183  5279.462
## 39  3778.8183  5279.462
## 40  3778.8183  5279.462
## 41  3778.8183  5279.462
## 42  3778.8183  5279.462
## 43  3778.8183  5279.462
## 44  3778.8183  5279.462
## 45  3778.8183  5279.462
## 46  3778.8183  5279.462
## 47  3778.8183  5279.462
## 48  3778.8183  5279.462
## 49  3187.3578  4426.709
## 50  3187.3578  4426.709
## 51  3187.3578  4426.709
## 52  3187.3578  4426.709
## 53  3187.3578  4426.709
## 54  3187.3578  4426.709
## 55  3187.3578  4426.709
## 56  3187.3578  4426.709
## 57  3187.3578  4426.709
## 58  3187.3578  4426.709
## 59  3187.3578  4426.709
## 60  3187.3578  4426.709

Sales Forecast Analysist and Findings

The forecasts for Soft Drinks, Fluid Milk, Beer/Ale/Malt Liquors, Choice Beef, and Cigarettes not only show the historical demand from 2017 to 2018, but also provide the estimated trends through 2019 while establishing confidence intervals on the potential variability of those forecasts. Soft Drinks and Beer/Ale/Malt Liquors include rather wide forecast ranges and may therefore require flexibility within the inventory management process along with a higher safety stock “… to accommodate catastrophic surges.” Fluid Milk and Cigarettes appear to be more stable, owing to both narrower intervals that would create lesser risk of stockouts and excess. Choice Beef lies somewhere in-between: higher in volatility owing to factors such as seasonality; price changes; and supply restraints. Combining a peak forecast on the one hand and the uncertainty spectrum on the other helps achieve a balance between stocking enough goods to meet demand without overstocking, particularly important for perishables like milk and beef. The forecast singles out seasonality and promotional cycles (for example, summer’s call-on beer and product demands during holiday seasons). Taken altogether, such forecasts lead to improved demand planning, enabling more targeted capacity purchasing, waste reduction, and better customer satisfaction.

Summary

The goal of this analysis was to determine the top-selling products each month and how this information could be leveraged for better inventory planning and sales strategies. By identifying patterns in product sales and comparing them to inventory levels, we aimed to find ways to optimize stock management and minimize shortages or excess supply.

How We Answered the Business Problem

To tackle this problem, we started by analyzing monthly sales data across different product categories. The Top-Selling Products Each Month graph showed that gasoline sales peaked at over 40,000 units in the summer months, while milk and soft drinks maintained steady demand year-round, averaging around 10,000 units per month. This seasonal pattern suggests that inventory levels should be adjusted to meet demand fluctuations, especially in peak months. The Top 5 Products by Age Group graph revealed which demographics drive these sales. Milk was the most purchased item across all age groups, with over 5,000 transactions in the 35-44 age range alone. Meanwhile, bananas and soft drinks were consistently popular across multiple age groups, while beer and cigarettes were more commonly purchased by customers aged 45 and older, with beer sales reaching nearly 9,000 units at their peak. These insights help businesses plan targeted marketing strategies and optimize stock levels based on customer preferences. Next, we examined the Geographical Sales and Inventory Distribution to check if inventory levels aligned with demand. Some states showed major mismatches; for example, one state had over 20,000 units in inventory but ranked low in sales, indicating overstocking, while others had high sales exceeding 200,000 units but insufficient stock levels, suggesting supply chain inefficiencies. Addressing these imbalances through smarter restocking strategies could reduce waste and improve profitability. Finally, the Sales Forecast Graphs helped predict future demand. Milk sales are expected to remain stable, while soft drinks and beer show more uncertainty, with potential fluctuations ranging from 5,000 to 15,000 units. This means businesses should adopt a flexible approach to inventory management, ensuring enough stock to meet demand without excessive overstocking. By using these insights, companies can improve supply chain efficiency, reduce waste, and increase profitability.

My Recomendation

I recommend finding distributors that are headquartered closer to the stores that have more of a demand. This would be stores in the Mid-West and be ready to push for better distributors in stores on the west cost with big cities as the stores and attraction grows. This also applies to certain items. For example looking at our top items such as “Beer/Liquor, Choice Beef, and Milk. These items selling in super high demand at select location would benefit from a distributor closer to their location. This is because it would take less time for their orders to arrive on site allowing them to keep up with the demand. I would also recommend advertising and marketing to stores in bigger cities to grow their presence. When this happens inventory managers will need to keep a close eye on the stock and growth patterns. Other things that can be done are adjusting inventory levels based on seasonal sales patterns such as the fuel and general medicine example explained above. Prioritizing stocking top-selling items to ensure availability. All of these recommendations will increase inventory efficiency and allow us to increase profitability in not only our top-selling products but every area.

Our Limitations

The analysis offers useful insights, but several limitations should be considered. The historical data may have gaps or errors that could impact the accuracy of forecasts, and the ARIMA model relies on the assumption that past trends will continue, which might not hold in the face of unexpected market shifts. Additionally, omitting certain products like gasoline leaves part of the overall picture unexplored. The geographic insights may also be less reliable, as store locations were randomly assigned to states, potentially skewing regional sales maps. Moreover, while demographic data helps identify broader trends, it may overlook more nuanced consumer behaviors.

Regork Growth Possibility

Spencer Borowski, Owen Smallwood, Solomon Hardamon

2025-03-02

Introduction

Business Question

How we came across the solution?

Why is this analytical approach and report useful?

Packages/Libraries required

Income Distribution For Age Group

Most Products Sold Per Age Group

Findings

Top-Selling Products Each Month

Geographical Inventory and Sales Distribution by State

Department Sales Lifecycle Curve:

Findings

Sales Forecast Analysist and Findings

Summary

How We Answered the Business Problem

My Recomendation

Our Limitations