Abstract

This project applies classical time series methods to monthly U.S. ice cream sales spanning January 2004–December 2019. We investigate key features including trend and seasonality, transform and difference the data to achieve stationarity, and use ACF/PACF analysis to select and fit candidate ARIMA models. Residual diagnostics are used to determine the best-fitting model, which is then used to generate out-of-sample forecasts with prediction intervals. The results demonstrate that the chosen model captures the seasonal fluctuations and trend dynamics, yielding accurate forecasts and insight into production patterns over time.

Using a Box-Cox transformation, first and seasonal differencing, and ACF/PACF analysis, we identified a SARIMA(2,1,1) x \((0,1,1)_{12}\) model. The final model has statistically significant AR and MA coefficients, exhibits white-noise residuals (Ljung–Box p > 0.05), and produces 95% forecast intervals that closely track the held-out data. We conclude that the chosen SARIMA model captures both the trend and the seasonal pattern, meeting our project objectives of interpretable modeling and reliable short-term forecasting.

Introduction

In this project, I investigate ice cream sales over time using a dataset I found on Kaggle (Monthly Ice Cream Sales Data (1972-2020)). With summer right around the corner and the days getting hotter, the demand for ice cream will inevitably rise, including my own, which is what drew me to this fun dataset. The central goal of this project is to perform a time series analysis and forecast of ice cream sales for an unspecified U.S. city. So, can we build a model that forecasts future ice cream sales from past data? To answer this question, I follow the standard pipeline for classical time series analysis: data processing, visualization, transformation, model identification via ACF/PACF, model fitting, and forecasting.

Data Processing

Data Loading

# Packages for data handling, time series modeling, and plotting
library(readr)
library(tidyverse)
library(lubridate)
library(forecast)
library(tseries)
library(zoo)
library(ggplot2)
library(MASS)      # boxcox()

data <- read_csv("C:/Users/jaych/Downloads/archive (4)/ice_cream.csv", col_types = cols(DATE = col_date(format = "%Y-%m-%d"), IPN31152N = col_double()))

names(data)
## [1] "DATE"      "IPN31152N"

Condensing the Data

ice <- data %>%
  filter(DATE >= as.Date("2004-01-01") & DATE <= as.Date("2019-12-01"))

nrow(ice)  
## [1] 192

The original dataset contained 577 monthly observations, a longer span than I needed. I therefore restricted it to January 2004 through December 2019, leaving 192 observations (16 full years), a convenient window for this analysis.

Splitting the Data (Training and Test Sets)

n_total   <- nrow(ice)
ice_train <- ice[1:(n_total - 24), ]
ice_test  <- ice[(n_total - 23):n_total, ]

Here, I split the “ice” dataset into a training set (for modeling) and a test set (for validation). The training set covers 2004–2017 (the first 168 observations), while the test set covers 2018–2019 (the last 24 observations).
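As a quick sanity check (not shown in the original output), the date ranges of the two sets can be confirmed directly; the expected endpoints below assume the monthly dates are recorded on the first of each month.

# Confirm the split: training should span 2004-2017, test should span 2018-2019
range(ice_train$DATE)   # expected: "2004-01-01" "2017-12-01"
range(ice_test$DATE)    # expected: "2018-01-01" "2019-12-01"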

Creating Time Series Objects

ice_ts <- ts(ice$IPN31152N, start = c(year(ice$DATE[1]), month(ice$DATE[1])), frequency = 12)

train_ts <- ts(ice_train$IPN31152N, start = c(year(ice_train$DATE[1]), month(ice_train$DATE[1])), frequency = 12)

test_ts <- ts(ice_test$IPN31152N, start = c(year(min(ice_test$DATE)), month(min(ice_test$DATE))), frequency = 12)

I have made separate time series objects from each set. Now, we can move on to visualizing the time series.

Visualization

First, we will start with an initial analysis of base time series plots.

plot(train_ts, main = "Monthly Ice Cream Sales", ylab = "Units of Ice Cream", xlab = "Year", col = "black")

trend <- lm(as.numeric(train_ts) ~ time(train_ts))
abline(trend, col="red")
abline(h=mean(train_ts), col="blue")

ice_decomp <- decompose(train_ts)

# Plotting each component separately
par(mfrow = c(4, 1), mar = c(2, 4, 2, 2)) 

plot(ice_decomp$x, type = "l", col = "darkred", main = "Observed", ylab = "Production")

plot(ice_decomp$trend, type = "l", col = "darkblue", main = "Trend", ylab = "Trend")

plot(ice_decomp$seasonal, type = "l", col = "darkgreen", main = "Seasonal", ylab = "Seasonality")

plot(ice_decomp$random, type = "l", col = "gold", main = "Random", ylab = "Residual")

After the basic time series plot, we decompose the series and plot each major component separately to study it further:

  • Trend:
    • The fitted trend confirms a clear downward drift over the training period (2004–2017), with both the seasonal highs and lows gradually moving lower. Early in the period, trend values hover around 145–150, declining to roughly 95–100 by the end.
  • Seasonality:
    • The seasonal component shows a strong annual cycle: each year’s pattern is nearly identical in shape, with positive values in late spring/summer (peak season for ice cream) and negative offsets in winter. This points to the need for seasonal differencing or a SARIMA seasonal term.

From this, my next step is to apply any necessary Box-Cox and/or differencing transformations to achieve stationarity before moving on to ACF/PACF analysis.

Transformations

From this initial analysis, we can clearly see that the strong seasonality, downward trend, and slight change in variance make the raw series unsuitable for our main goal of forecasting.

Box Cox Transformation

The purpose of the Box-Cox transformation is to stabilize the variance across the data set. To begin, we need to find an appropriate value of lambda to use in the transformation.

bcTransform <- boxcox(as.numeric(train_ts) ~ as.numeric(1:length(train_ts)))

lambda = bcTransform$x[which(bcTransform$y == max(bcTransform$y))]

ice_bc = (1/lambda)*(train_ts^lambda-1)

lambda
## [1] 0.4646465
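As a quick cross-check on this estimate (not shown in the original analysis), the forecast package’s BoxCox.lambda() can estimate lambda directly from the training series; a value in the same neighborhood would support the choice above.

# Cross-check of the MASS::boxcox estimate; expect a value reasonably close to 0.46
BoxCox.lambda(train_ts, method = "loglik")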

With our Box-Cox transformation applied using a lambda of about 0.465, we now need to determine whether the transformation actually improves the data. We can do this by inspecting the new time series plot as well as a histogram, which makes the spread easy to compare.

op <- par(mfrow = c(1, 2))
ts.plot(train_ts, main="Original Data", ylab = expression(X[t]), col="blue")
ts.plot(ice_bc, main="Box-Cox Transformed Data", ylab = expression(Y[t]), col="red")

trend <- lm(ice_bc ~ time(ice_bc))
abline(trend, col="green")
abline(h=mean(ice_bc), col="blue")

With the transformed data showing a more stable variance, we can move on to check whether the transformation also improved normality.

op <- par(mfrow = c(1, 2))
hist(train_ts, main= "Original Data", ylab=expression(X[t]), col="blue")
hist(ice_bc, main="Box-Cox Transformed Data", ylab=expression(Y[t]), col="red")

Comparing the histogram of the transformed data to the original, the spread looks noticeably closer to normal, so the transformation has done its job so far. After this variance stabilization, the series still shows a downward trend, so first differencing should remove the remaining linear trend.

Differencing

ice_diff = diff(ice_bc, 1)

op <- par(mfrow = c(1, 2))
acf(ice_diff,lag.max = 60,main = "")
pacf(ice_diff,lag.max = 60,main = "")
title("ACF & PACF: First Differencing", line = -1, outer=TRUE)

Here, we can see spikes at lag 1 and lag 12 of the PACF, indicating seasonality; the first differencing did not remove the annual cycle. Likewise, the ACF still displays a strong seasonal cycle in the differenced data. Stationarity has clearly not yet been reached, so we now difference at lag 12 to remove the seasonal component.

ice_diff2 = diff(ice_diff,12)

op <- par(mfrow = c(1, 2))
acf(ice_diff2,lag.max = 60,main = "")
pacf(ice_diff2,lag.max = 60,main = "")
title("ACF & PACF: Seasonal (Second) Differencing", line = -1, outer=TRUE)

Now, we have a much better understanding of the ACF and PACF patterns to help create our model. With our trend and seasonal components removed from our data, we can decompose and display all components once again to see how our data has changed.

diff_decomp <- decompose(ice_diff2)

par(mfrow = c(4, 1), mar = c(2, 4, 2, 2))

# Plot each component
plot(diff_decomp$x, type = "l", col = "darkred", main = "Observed", ylab = "Production")
plot(diff_decomp$trend, type = "l", col = "darkblue", main = "Trend", ylab = "Trend")
plot(diff_decomp$seasonal, type = "l", col = "darkgreen", main = "Seasonal", ylab = "Seasonality")
plot(diff_decomp$random, type = "l", col = "gold", main = "Random", ylab = "Residual")

Here, we can clearly see that the trend has been almost completely removed and that the seasonality is far weaker than in the original decomposition. We can take this a step further by plotting the transformed series and comparing it to white noise, which is the ideal that our transformed data should resemble.

ts.plot(ice_diff2)
title("Transformed Data")

trend <- lm(ice_diff2 ~ time(ice_diff2))
abline(trend, col="red")
abline(h=mean(ice_diff2) , col='blue')

Clearly, the data are now much more "random": there is little to no visible seasonality, and the series closely resembles white noise. In addition, compared to the original time series plot, the transformed series shows no remaining trend, all of which indicates that it is stationary.
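As an additional, more formal check that was not part of the original analysis, an Augmented Dickey–Fuller test from the already loaded tseries package can be applied to the fully differenced series; a small p-value is consistent with the stationarity we observe visually.

# Supplementary stationarity check (not in the original output)
adf.test(ice_diff2, alternative = "stationary")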

Preliminary Model Identification

Now, to identify candidate models, we plot and analyze the ACF and PACF of the stationary series.

op <- par(mfrow = c(1, 2))
acf(ice_diff2,lag.max = 60,main = "")
pacf(ice_diff2,lag.max = 60,main = "")
title("ACF & PACF: Non-seasonal", line = -1, outer=TRUE)

Modeling with Seasonality (P, D, Q):

  • We applied one seasonal differencing at lag 12, so D = 1.

  • The ACF shows a significant spike at the first seasonal lag (lag 12) but none at later seasonal lags, so a good choice for the seasonal MA order is Q = 1.

  • The PACF shows significant spikes at the first two seasonal lags (lags 12 and 24), so a good choice for the seasonal AR order is P = 1 or 2.

Modeling without Seasonality (p, d, q):

  • One first-differencing indicates that d = 1.

  • The ACF cuts off significantly at lag 1, with all other lags being within the boundaries. So, a good choice for MA could be q = 1 or 0.

  • The PACF shows a significant negative spike at lag 1 and a positive spike at lag 2, so a good choice for AR could be p = 2.

So, we now have a few potential models we can choose from:

  1. SARIMA(2,1,1) x \((1,1,1)_{12}\)

  2. SARIMA(2,1,1) x \((2,1,1)_{12}\)

  3. SARIMA(2,1,0) x \((1,1,1)_{12}\)

Fitting the Models

Model 1

model1 =  Arima(ice_bc, order=c(2,1,1), seasonal = list(order = c(1,1,1), period = 12), method="ML")
model1
## Series: ice_bc 
## ARIMA(2,1,1)(1,1,1)[12] 
## 
## Coefficients:
##           ar1      ar2     ma1    sar1     sma1
##       -0.8095  -0.3079  0.5907  0.2072  -0.8552
## s.e.   0.1843   0.0786  0.1839  0.1249   0.1153
## 
## sigma^2 = 0.1227:  log likelihood = -60.65
## AIC=133.3   AICc=133.87   BIC=151.56

From this, most terms are highly significant (|t| > 2), except the seasonal AR(1) term sar1, whose t-ratio is only about 1.7 (0.2072/0.1249). So, I will try dropping the seasonal AR term (P = 0). First, though, the fitted model can be written as:

\((1+0.8095B+0.3079B^2)(1-0.2072B^{12})(1-B)(1-B^{12})X_t = (1+0.5907B)(1-0.8552B^{12})a_t\)
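These rough t-ratios (for example, about 1.7 for sar1) can be reproduced directly from the fitted object; this is a minimal sketch, assuming the coef and var.coef elements returned by forecast::Arima (which follow the base arima() convention).

# Approximate t-ratios: coefficient estimates divided by their standard errors
# (sar1 should come out around 0.2072 / 0.1249, roughly 1.66)
model1$coef / sqrt(diag(model1$var.coef))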

model1.2 = Arima(ice_bc, order=c(2,1,1), seasonal = list(order = c(0,1,1), period = 12), method="ML")
model1.2
## Series: ice_bc 
## ARIMA(2,1,1)(0,1,1)[12] 
## 
## Coefficients:
##           ar1      ar2     ma1     sma1
##       -0.7811  -0.3193  0.5631  -0.7261
## s.e.   0.1805   0.0780  0.1809   0.0803
## 
## sigma^2 = 0.1262:  log likelihood = -62.06
## AIC=134.13   AICc=134.53   BIC=149.34

Ideally, we would like the AIC to decrease, and here it has actually risen slightly (AICc 134.53 vs. 133.87). However, all of the remaining coefficients are significant, and this model is simpler than before. We can represent it with:

\((1+0.7811B+0.3193B^2)(1-B)(1-B^{12})X_t = (1+0.5631B)(1-0.7261B^{12})a_t\)

Model 2

model2 = Arima(ice_bc, order=c(2,1,1), seasonal = list(order = c(2,1,1), period = 12), method="ML")
model2
## Series: ice_bc 
## ARIMA(2,1,1)(2,1,1)[12] 
## 
## Coefficients:
##           ar1      ar2     ma1    sar1     sar2     sma1
##       -0.8307  -0.3112  0.6083  0.1609  -0.0817  -0.7905
## s.e.   0.1831   0.0787  0.1813  0.1421   0.1151   0.1389
## 
## sigma^2 = 0.1241:  log likelihood = -60.41
## AIC=134.82   AICc=135.58   BIC=156.12

Here, all estimated coefficients are significant except the two seasonal AR terms, sar1 and sar2. We could try dropping them to simplify the model, but dropping both sar1 and sar2 would simply reproduce the modified first model (model1.2).

Model 3

model3 = Arima(ice_bc, order=c(2,1,0), seasonal = list(order = c(1,1,1), period = 12), method="ML")
model3
## Series: ice_bc 
## ARIMA(2,1,0)(1,1,1)[12] 
## 
## Coefficients:
##           ar1      ar2    sar1     sma1
##       -0.2499  -0.1729  0.2005  -0.8433
## s.e.   0.0825   0.0819  0.1261   0.1142
## 
## sigma^2 = 0.1256:  log likelihood = -62.62
## AIC=135.24   AICc=135.64   BIC=150.45

Here, we can see that the AIC and AICc have not decreased; in fact, they are higher than for the previous models. Also, sar1 is again not significant, suggesting it could be dropped.

Final Decision

Based on our findings, we balance goodness of fit against simplicity. Model 1 has a marginally lower AIC/AICc, but its seasonal AR term (sar1) is not significant; the modified first model (model1.2) drops that term, keeps every remaining coefficient significant, and has an AICc that is only slightly higher. Since it is also consistent with our ACF/PACF analysis, we choose model1.2 as our “best” model (the AICc values of all candidates are collected in the sketch after the model equation below).

\((1+0.7811B+0.3193B^2)(1-B)(1-B^{12})X_t = (1+0.5631B)(1-0.7261B^{12})a_t\)
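For reference, the AICc values of all the fitted candidates can be laid out side by side; a minimal sketch, assuming each fitted object’s aicc element as returned by forecast::Arima:

# Side-by-side AICc comparison of the fitted candidate models
data.frame(model = c("model1", "model1.2", "model2", "model3"),
           AICc  = c(model1$aicc, model1.2$aicc, model2$aicc, model3$aicc))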

Diagnostic Checking

checkresiduals(model1.2)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(2,1,1)(0,1,1)[12]
## Q* = 28.455, df = 20, p-value = 0.09906
## 
## Model df: 4.   Total lags used: 24
# Extracting residuals from final model
resid_final <- residuals(model1.2)

qqnorm(resid_final,
       main = "Normal Q–Q Plot")
qqline(resid_final)

pacf(resid_final, lag.max=40,main="")
title("PACF of the Residuals of Final Model")

Here, we have run a variety of diagnostic checks. First, the Ljung–Box test gives a p-value of about 0.099, which is greater than 0.05, so we fail to reject the null hypothesis that the residuals are uncorrelated. Next, the time series plot of the residuals from our chosen model shows no trend or obvious seasonality. Furthermore, in the ACF and PACF of the residuals, barring a few marginal borderline cases, the values at all lags fall within the confidence bands, so the residuals can be treated as white noise. Finally, the histogram and Q–Q plot show no apparent skewness or extreme outliers, so the residuals appear roughly normally distributed. All of these are good signs that the model is adequate, and we can move forward with forecasting.
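As one further check not included in the original output, a Shapiro–Wilk test can put a formal number on the normality reading from the histogram and Q–Q plot; a p-value above 0.05 would be consistent with approximately Gaussian residuals.

# Supplementary normality check on the final model's residuals
shapiro.test(as.numeric(resid_final))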

Forecasting

Now we come to our main objective: forecasting future ice cream sales, from January 2018 to December 2019. We apply our chosen model and then compare the forecasted values with the true test-set values to evaluate accuracy.

pred.ts <- predict(model1.2, n.ahead = 24)

# Approximate 95% prediction interval (point forecast +/- 2 standard errors)
upper <- pred.ts$pred + 2*pred.ts$se
lower <- pred.ts$pred - 2*pred.ts$se

ts.plot(ice_bc,
        xlim = c(2004, 2021),
        ylim = c(min(ice_bc), max(ice_bc)),
        col  = "black",
        main = "Original Data w/ Forecast")
lines(upper, col = "blue", lty = "dashed")
lines(lower, col = "blue", lty = "dashed")
points(pred.ts$pred, col = "red")

fc <- forecast(model1.2, h = 24, level = 95)

plot(fc, main = "Original Data w/ Forecast (Method 2)", xlab = "Year", ylab = "Transformed Index", flty = 2)

I have plotted the forecasts against the original data in two different ways, giving two complementary views. In both, the predicted values fall well within the 95% prediction intervals. However, to compare the predicted values with the true values, we first need to map them back to the original scale by inverting the Box-Cox transformation.
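Concretely, since the forward transformation was \(Y_t = (X_t^{\lambda} - 1)/\lambda\), the forecasts and interval bounds are mapped back with its inverse, \(X_t = (\lambda Y_t + 1)^{1/\lambda}\), which is exactly what the code below computes.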

# Invert the Box-Cox transformation: X = (lambda*Y + 1)^(1/lambda)
orig_pred <- ((pred.ts$pred)*(lambda)+1)^(1/lambda)

# Back-transform the prediction interval bounds as well
up  <- ((upper)*(lambda)+1)^(1/lambda)
low <- ((lower)*(lambda)+1)^(1/lambda)

ts.plot(ice_ts, main = "Original Data (Pre-Box Transform)", xlim = c(2004,2021),ylim = c(min(ice_ts), max(ice_ts)), col="black")
lines(up, col="blue", lty="dashed")
lines(low, col="blue", lty="dashed")
points(orig_pred, col="red")

Here, we can see that the true values also lie within the prediction intervals and that the forecasts closely track the true values. Therefore, we have addressed our objective: with this model, we can reliably forecast future ice cream sales.
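Beyond the visual comparison, forecast accuracy over the 2018–2019 test window can be quantified. Below is a minimal sketch (not part of the original output), assuming orig_pred and test_ts cover the same 24 months in the same order:

# Numeric accuracy of the back-transformed forecasts against the test set
err  <- as.numeric(test_ts) - as.numeric(orig_pred)
rmse <- sqrt(mean(err^2))                      # root mean squared error
mape <- mean(abs(err) / as.numeric(test_ts)) * 100   # mean absolute percentage error
c(RMSE = rmse, MAPE = mape)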

Conclusion

The objective of this project was clear: to build an accurate forecasting model for monthly ice cream sales. The steps included: exploring/processing the data (identifying trends, seasonality, etc.), transforming and differencing the series to achieve stationarity, identifying candidate models through ACF/PACF, validating model selection through residual diagnostics, and finally, forecasting.

All of these goals were met. A Box–Cox transform stabilized the variance, first and seasonal differencing removed the trend and annual cycle, and the ACF/PACF of the resulting stationary series pointed to non-seasonal AR(2) and MA(1) terms plus a seasonal MA(1) term. Comparing candidates by AICc, coefficient significance, and residual diagnostics led us to SARIMA(2,1,1) x \((0,1,1)_{12}\) as the most adequate model.

References

  1. Lecture, labs, and notes from Professor Feldman, PSTAT 174W
  2. Kaggle - Ice cream dataset: https://www.kaggle.com/datasets/abdocan/monthly-ice-cream-sales-data-1972-2020