Matt Burden
October 5, 2024

Regork Ice Cream Analysis

Introduction

Do you love ice cream? I certainly love ice cream. This dessert is a common household item in the US. Most fond memories involve ice cream. It is at every birthday party and celebratory event. Ice cream is known to complement warm weather as Hot Chocolate (or known as Hot Cocoa) complements cold weather. Maybe I am a statistical outlier as I will eat ice cream year-round. In this analysis I take a deep dive into ice cream sales in 2017 to answer the question of, does weather temperature affect ice cream sales? This analysis will better understand how outside factors such as temperature can affect sales of products.

In my analysis I look at weather trends in the US compared to our sales of ice cream and hot cocoa mixes. Using data from the National Weather Service I showcase daily temperatures from various cities across the US and correlate the temperature to our sales.

Packages Required

completejourney- Data set characterizing household transactions over one year.

ggplot2- Visualization of data package.

dplyr- Combines data frames and allows for filtering.

tidyverse- A group of statistical packages to ease data analysis. This package is needed in almost every analysis computed on R Studios.

library(completejourney)
library(ggplot2)
library(dplyr)
library(tidyverse)

Data Preparation

My analysis includes data of transactions, demographics, and products within “completejourney”. I also found the daily temperature in 2017 for Cincinnati, New York, Seattle, San Antonio, and Tampa when the completejourney data was compiled. These locations are geographically spaced out across the continental US (48 states).

I filtered the product_category for “Ice Cream” and used the associated product_id from the transactions data frame to find when these products were purchased. I then plugged this data into the demographics data frame to find who was buying ice cream. I could then find out which demographic was buying the most ice cream and when.

The temperature data was not raw and I could not find the raw versions to upload into R Studios. The graphs were auto generated from the National Weather Services website (cited below graphs and in references).

library(completejourney)
library(ggplot2)
library(dplyr)
library(tidyverse)

transactions <- get_transactions()
demographics <- demographics
products <- products

Ice_Cream_Sales_per_Month <- products %>%
inner_join(transactions, by = "product_id") %>%
filter(str_detect(product_category, regex("ICE CREAM", ignore_case = TRUE))) %>%
group_by(month = month(transaction_timestamp, label = TRUE)) %>%
summarize(total_sales = sum(sales_value))

knitr::kable(Ice_Cream_Sales_per_Month, format= "markdown")
month total_sales
Jan 3899.99
Feb 3313.17
Mar 3977.91
Apr 3634.69
May 3946.00
Jun 4344.59
Jul 4874.61
Aug 4153.73
Sep 4028.95
Oct 3724.57
Nov 3424.42
Dec 3553.88
library(completejourney)
library(ggplot2)
library(dplyr)
library(tidyverse)

transactions <- get_transactions()
demographics <- demographics
products <- products
Ice_Cream_Sales_by_Age <- products %>%
filter(str_detect(product_category, regex("(ICE CREAM)"))) %>%
inner_join(transactions, by = "product_id") %>%
inner_join(demographics, by = "household_id") %>%
group_by(age) %>%
summarize(total_sales = sum(sales_value, na.rm = TRUE))

knitr::kable(Ice_Cream_Sales_by_Age, format= "markdown")
age total_sales
19-24 1046.35
25-34 3693.99
35-44 7376.07
45-54 10355.26
55-64 1889.73
65+ 1915.93
library(completejourney)
library(ggplot2)
library(dplyr)
library(tidyverse)

transactions <- get_transactions()
demographics <- demographics
products <- products

Ice_Cream_Sales_per_Income_and_Number_of_Kids <- products %>%
filter(str_detect(product_category, regex("ICE CREAM", ignore_case = TRUE))) %>%
inner_join(transactions, by = "product_id") %>%
inner_join(demographics, by = "household_id") %>%
group_by(kids_count, income) %>%
summarise(total_sales = sum(sales_value, na.rm = TRUE))

knitr::kable(Ice_Cream_Sales_per_Income_and_Number_of_Kids, format= "markdown")
kids_count income total_sales
0 Under 15K 895.87
0 15-24K 1475.84
0 25-34K 1726.94
0 35-49K 3104.13
0 50-74K 4357.26
0 75-99K 1564.20
0 100-124K 780.08
0 125-149K 747.88
0 150-174K 773.91
0 175-199K 181.61
0 200-249K 29.19
0 250K+ 437.81
1 Under 15K 267.79
1 15-24K 307.66
1 25-34K 410.32
1 35-49K 1205.68
1 50-74K 656.30
1 75-99K 909.07
1 100-124K 169.58
1 125-149K 460.91
1 150-174K 515.11
1 175-199K 68.79
1 200-249K 79.54
1 250K+ 121.39
2 Under 15K 171.57
2 15-24K 173.55
2 25-34K 179.49
2 35-49K 385.84
2 50-74K 793.59
2 75-99K 156.84
2 100-124K 115.23
2 125-149K 118.05
2 150-174K 63.47
3+ Under 15K 93.71
3+ 15-24K 340.18
3+ 25-34K 291.33
3+ 35-49K 407.63
3+ 50-74K 217.29
3+ 75-99K 482.94
3+ 100-124K 161.28
3+ 125-149K 535.63
3+ 150-174K 231.33
3+ 175-199K 25.01
3+ 200-249K 14.97
3+ 250K+ 71.54
library(completejourney)
library(ggplot2)
library(dplyr)
library(tidyverse)

transactions <- get_transactions()
demographics <- demographics
products <- products

Cocoa_Sales_Per_Month <- products %>%
inner_join(transactions, by = "product_id") %>%
filter(str_detect(product_category, regex("COCOA MIXES", ignore_case = TRUE))) %>%
group_by(month = month(transaction_timestamp, label = TRUE)) %>%
summarize(total_sales = sum(sales_value))

knitr::kable(Cocoa_Sales_Per_Month, format= "markdown")
month total_sales
Jan 529.26
Feb 438.49
Mar 414.49
Apr 322.15
May 351.00
Jun 315.95
Jul 272.09
Aug 297.04
Sep 378.60
Oct 546.31
Nov 564.75
Dec 637.97

Exploratory Data Analysis

transactions <- get_transactions()
demographics <- demographics
products <- products

products %>%
inner_join(transactions, by = "product_id") %>%
filter(str_detect(product_category, regex("ICE CREAM", ignore_case = TRUE))) %>%
group_by(month = month(transaction_timestamp, label = TRUE)) %>%
summarize(total_sales = sum(sales_value)) %>%

ggplot(aes(x = month, y = total_sales,color = month, fill = month)) +
geom_col() +
scale_x_discrete("Month") +
scale_y_continuous("Ice Cream Sales", label = scales::dollar) +
ggtitle(label = "Ice Cream Sales per Month", subtitle = "July has the most ice cream sales")

The data shown above shows a distinct curve in ice cream sales. There is two notable outliers (Jan and Mar).

products %>%
filter(str_detect(product_category, regex("(ICE CREAM)"))) %>%
inner_join(transactions, by = "product_id") %>%
inner_join(demographics, by = "household_id") %>%
group_by(age) %>%
summarize(total_sales = sum(sales_value, na.rm = TRUE)) %>%

ggplot(aes(x = factor(age), y = total_sales, color = factor(age), fill = factor(age))) +
geom_col() +
scale_y_continuous("Total Sales of Ice Cream", labels = scales::dollar) +
scale_x_discrete("Age") +
ggtitle(label = "Ice Cream Sales by Age", subtitle = "Middle aged customers buy the most ice cream")

The bulk of consumers buying ice cream are middle aged between the ages of 35 and 54.

products %>%
filter(str_detect(product_category, regex("ICE CREAM", ignore_case = TRUE))) %>%
inner_join(transactions, by = "product_id") %>%
inner_join(demographics, by = "household_id") %>%
group_by(kids_count, income) %>%
summarise(total_sales = sum(sales_value, na.rm = TRUE)) %>%

ggplot(aes(x = kids_count, y = total_sales, color = kids_count, fill = kids_count)) +
geom_col() +
facet_wrap(~income) +
scale_x_discrete("Number of Kids") +
scale_y_continuous("Total Ice Cream Sales", label = scales::dollar) +
ggtitle(label = "Ice Cream Sales per Income and Number of Kids", subtitle = "Households with 50-74K and no kids spend the most on ice cream")

Households without kids leads every income range in regards to ice cream purchases except one, income ranging from 200-249k.

transactions <- get_transactions()
demographics <- demographics
products <- products

products %>%
inner_join(transactions, by = "product_id") %>%
filter(str_detect(product_category, regex("COCOA MIXES", ignore_case = TRUE))) %>%
group_by(month = month(transaction_timestamp, label = TRUE)) %>%
summarize(total_sales = sum(sales_value)) %>%

ggplot(aes(x = month, y = total_sales,color = month, fill = month)) +
geom_col() +
scale_x_discrete("Month") +
scale_y_continuous("Cocoa Mix Sales", label = scales::dollar) +
ggtitle(label = "Cocoa Mix Sales per Month", subtitle = "December has the most cocoa sales")

This graph is almost the opposite results from Ice Cream Sales. These sales numbers also correlate to the temperatures.

Cincinnati Temperatures in 2017 Source: National Weather Service (https://www.weather.gov/wrh/Climate?wfo=iln)

After mid October, the temperature was not recorded for the Cincinnati area by the National Weather Service.

New York Temperatures in 2017 Source: National Weather Service (https://www.weather.gov/wrh/Climate?wfo=okx)

New York, NY best represents ice cream sales by Regork. This graph shows a couple peaks in temperature in late winter which is also shown in ice cream sales.

San Antonio Temperatures in 2017 Source: National Weather Service (https://www.weather.gov/wrh/Climate?wfo=ewx)

Seattle Temperatures in 2017 Source: National Weather Service (https://www.weather.gov/wrh/Climate?wfo=sew)

Tampa Temperatures in 2017 Source: National Weather Service (https://www.weather.gov/wrh/Climate?wfo=tbw)

All of the temperature graphs show a hump in temperature for the month of July. Some areas might not be as drastic as others.

Summary

After reviewing the data, I have determined that the temperature does affect the sales of ice cream. This can be proven by also looking at hot cocoa mix sales. Note that the weather across the country is not all the same. I also found that most ice cream is being purchased by customers who are 35-54 with no kids and make between 35-74K a year.

My proposed solution to increase ice cream sales during colder months is to add additional marketing to areas of the country where temperature change is not as drastic. With a moderate temperature year-round ice cream sales should be relatively constant. Adding additional marketing would help boost the sales.

I also propose to add seasonal flavors. The fall could feature flavors like apple, pumpkin, and spice. Spring could feature marshmallow and confetti flavors. Finally, the winter could feature flavors like gingerbread, eggnog, and peppermint. Making these flavors more available in the season and less during off season would create a desire for the product without consumers being burnt out on the flavors. The main consumer of ice cream is middle aged customers so unique flavors that remind them of holidays and family could be beneficial.

Note that local mom and pop ice cream parlors usually close from Labor Day to Memorial Day. This would allow us to tap into a revenue stream that may be diverted during peak warm weather.

As a last resort, we now know that ice cream is not selling the best in colder months so this will give us the opportunity to possibly invest in hot cocoa mixes more heavily as we know these sales drastically increase during colder months. This is not the ideal proposed solution, but it gives us flexibility if the above options fail.

In closing, the temperature does impact our ice cream sales. This has been shown and reinforced with the sales of hot cocoa mixes.

Limitations

If the data from the National Weather Service was raw this could be used to pick out the key data (temperature) and leave out unwanted information about record highs/lows in the area. I think another table within completejourney with data about locations of the stores would allow temperature data to be more accurate.

Refrences

Boehmke, B. (2019, September 28). The Complete Journey User Guide. https://cran.r-project.org/web/packages/completejourney/vignettes/completejourney.html

US Department of Commerce, N. (2024, July 30). Climate. National Weather Service. https://www.weather.gov/wrh/climate

Cream, M. I. (2020, December 11). 4 tips for increasing wholesale ice cream sales - my/mochiTM blog. My/MochiTM Ice Cream. https://www.mymochi.com/blog/4-tips-increasing-wholesale-ice-cream-sales/

Moncel, B. (2024, June 20). What is ice cream and how is it made?. The Spruce Eats. https://www.thespruceeats.com/what-is-ice-cream-1328784