Time Series and Sales Forecasting

Author

Richmond Silvanus Baye

Published

April 11, 2025

Time Series Analysis

In this blog post, I will walk you through a detailed time series analysis of e-commerce sales data. Understanding historical sales trends is essential for any business looking to make informed decisions about inventory, marketing, and customer engagement. By leveraging daily transaction data, we can uncover seasonality, detect long-term trends, and forecast future revenue with greater confidence.

The goal here is to build forecasting models that not only capture past patterns but also provide actionable insights for business planning. We'll explore two traditional statistical methods, ARIMA and ETS, to model and predict sales dynamics.

About the Data

We will use the Amazon E-commerce Clickstream Transaction Data for this purpose.

The data contains the following variables:

  • UserID: Identifier for the user.

  • SessionID: Identifier for the user’s session.

  • Timestamp: The time at which the event occurred.

  • EventType: The type of event (e.g., page view, product view, add to cart).

  • ProductID: Identifier for the product involved in the event.

  • Amount: The monetary amount associated with the event.

  • Outcome: The outcome of the event (e.g., success, failure).

Let's begin by loading the packages, then the data. We will use pacman to load all the packages in a single call.

Code
pacman::p_load(tidyverse, ggplot2, lubridate, gt, gtsummary, caret, tsibble, feasts,
               xgboost, randomForest, pROC, prophet, forecast, timetk, tidyr)

Descriptive Statistics

Let's get a first glimpse of the data with summary statistics for each variable in the dataset.

Code
# Load the dataset
data <- read.csv('ecommerce_clickstream_transactions 3.csv')
data <- data %>% mutate(Timestamp = ymd_hms(Timestamp))
summary(data)
     UserID         SessionID       Timestamp                  
 Min.   :   1.0   Min.   : 1.00   Min.   :2024-01-01 00:01:35  
 1st Qu.: 251.0   1st Qu.: 3.00   1st Qu.:2024-02-21 04:42:25  
 Median : 501.0   Median : 6.00   Median :2024-04-13 00:51:23  
 Mean   : 500.7   Mean   : 5.51   Mean   :2024-04-12 21:17:37  
 3rd Qu.: 751.0   3rd Qu.: 8.00   3rd Qu.:2024-06-03 07:31:54  
 Max.   :1000.0   Max.   :10.00   Max.   :2024-07-24 10:13:04  
                                                               
  EventType          ProductID             Amount          Outcome         
 Length:74817       Length:74817       Min.   :  5.132   Length:74817      
 Class :character   Class :character   1st Qu.:130.934   Class :character  
 Mode  :character   Mode  :character   Median :253.113   Mode  :character  
                                       Mean   :253.190                     
                                       3rd Qu.:378.832                     
                                       Max.   :499.982                     
                                       NA's   :64135                       
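Note that Amount is missing for 64,135 of the 74,817 rows, which is expected if only purchase-style events carry a monetary value. A quick tabulation of event types (an illustrative sanity check, not part of the original workflow) would make the event mix explicit:

Code
# Count clickstream events by type; Amount should be populated
# only for the events that represent actual transactions.
data %>% 
  count(EventType, sort = TRUE)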

Sales Forecasting

Having done that, we can generate daily sales by summarizing the transaction data: we filter the dataset to rows with a monetary transaction (non-missing Amount), extract the date component from the timestamp, group by calendar day, and sum all transaction values that occurred on each date. Days with no transactions are filled in with zero sales so the series has no gaps.

Code
daily_sales <- data %>% 
  filter(!is.na(Amount)) %>%               # keep only rows with a monetary transaction
  mutate(date = as.Date(Timestamp)) %>%    # extract the calendar date
  group_by(date) %>% 
  summarise(sales = sum(Amount)) %>%       # total sales per day
  complete(date = seq.Date(min(date), max(date), by = "day"), fill = list(sales = 0)) %>% 
  arrange(date)

# Convert to a ts object; frequency = 7 encodes the weekly
# seasonality we expect in daily data.
ts_data <- ts(daily_sales$sales, frequency = 7)

Time Series

Let’s begin by visualizing the daily sales over time.

Code
autoplot(ts_data) + 
  ggtitle("Daily Sales Time Series") + 
  ylab("Amount")

The daily sales time series shows fluctuating sales amounts over the observed period, roughly January through late July 2024. Sales exhibit high short-term variability with frequent peaks and troughs, suggesting volatile daily purchasing behavior. While there is no strong visual trend upward or downward, the data reveal occasional sales surges exceeding $17,500 and dips below $10,000. This variability may point to promotional effects, weekday/weekend patterns, or shifting consumer activity, warranting further decomposition and model-based analysis to uncover the underlying structure and seasonality.
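Before modeling, it is worth probing the weekday/weekend hypothesis directly. The sketch below (an illustrative check using lubridate's wday()) averages sales by day of week:

Code
# Average daily sales by day of week, to see whether weekends
# differ systematically from weekdays.
daily_sales %>% 
  mutate(weekday = wday(date, label = TRUE)) %>% 
  group_by(weekday) %>% 
  summarise(avg_sales = mean(sales))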

Decompose Time Series

Code
decomp <- stl(ts_data, s.window = "periodic")
autoplot(decomp)

To better understand the underlying structure of daily sales, we applied Seasonal-Trend decomposition using Loess (STL). The decomposition breaks the time series into three interpretable components (trend, seasonal, and remainder), which autoplot() displays alongside the observed data:

  • Observed Data: Daily sales show strong short-term volatility, consistent with consumer-level transaction data.

  • Trend Component: Sales remained relatively stable over the observation period, with minor fluctuations and a potential softening near the end.

  • Seasonal Component: A clear, repeating weekly pattern suggests cyclic consumer behavior, likely driven by weekday vs. weekend dynamics.

  • Remainder (Residual): Unexplained variation is small and randomly distributed, indicating the model captures the structure well.

This decomposition confirms the presence of strong seasonality and mild trend.
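To put numbers on what the plot shows, we can compute the trend and seasonal strength measures of Hyndman and Athanasopoulos from the STL components (a minimal sketch; values near 1 indicate a strong component, values near 0 a weak one):

Code
# Strength measures computed from the stl() component matrix.
comps <- decomp$time.series
Fs <- max(0, 1 - var(comps[, "remainder"]) / var(comps[, "seasonal"] + comps[, "remainder"]))
Ft <- max(0, 1 - var(comps[, "remainder"]) / var(comps[, "trend"] + comps[, "remainder"]))
c(seasonal_strength = Fs, trend_strength = Ft)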

ARIMA Model

With the data’s structure understood, we now turn to forecasting. ARIMA (AutoRegressive Integrated Moving Average) is particularly effective when data exhibits temporal dependence and stochastic trends. To account for the identified weekly pattern, we use seasonal ARIMA with automated parameter selection via auto.arima().

Code
arima_model <- auto.arima(ts_data)
arima_forecast <- forecast(arima_model, h = 30)
autoplot(arima_forecast) + ggtitle("ARIMA Forecast")

The ARIMA model was fitted to the historical daily sales data using auto.arima(), which automatically selected the best-fitting seasonal and non-seasonal parameters. The resulting forecast projects sales for the next 30 days, shown in the shaded blue region.

  • The dark blue line represents the predicted daily sales.

  • The shaded bands indicate the forecast uncertainty, with the inner and outer bands capturing 80% and 95% prediction intervals respectively.

  • The forecast appears relatively stable, reflecting the stationarity captured after accounting for trend and seasonality.

The model successfully smooths out short-term noise while preserving the overall level observed in the historical data.
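To verify that the residuals are indeed well behaved, the forecast package provides checkresiduals(), which plots the residuals, their ACF, and a histogram, and runs a Ljung-Box test (a quick diagnostic not shown in the original output):

Code
# Residual diagnostics: time plot, ACF, histogram, Ljung-Box test.
checkresiduals(arima_model)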

Forecast with ETS

The Exponential Smoothing State Space Model (ETS) provides an alternative to ARIMA by directly modeling trend and seasonality as separate components (Error, Trend, Seasonality). It is particularly effective when the series exhibits regular seasonal patterns—as observed in our earlier decomposition.

  • The model selected was an ETS(A,A,A) configuration (additive error, trend, and seasonality).

  • Like ARIMA, the ETS forecast for the next 30 days stays within a relatively stable band but with slightly more responsiveness to recent patterns.

  • Prediction intervals are also shown, gradually widening to reflect forecast uncertainty.

Code
ets_model <- ets(ts_data)
ets_forecast <- forecast(ets_model, h = 30)
autoplot(ets_forecast) + ggtitle("ETS Forecast")
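For a direct visual comparison, both forecasts can be layered onto the historical series (a sketch using autolayer() from the forecast package; PI = FALSE suppresses the interval shading for readability):

Code
# Overlay the ARIMA and ETS point forecasts on the observed series.
autoplot(ts_data) +
  autolayer(arima_forecast, series = "ARIMA", PI = FALSE) +
  autolayer(ets_forecast, series = "ETS", PI = FALSE) +
  ggtitle("ARIMA vs. ETS Forecasts") +
  ylab("Amount")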

Forecast Accuracy

Code
accuracy(arima_forecast)
                    ME     RMSE      MAE       MPE     MAPE      MASE
Training set 0.2158205 2215.429 1752.366 -3.057184 14.11246 0.7019937
                     ACF1
Training set -0.001717544
Code
accuracy(ets_forecast)
                    ME     RMSE      MAE       MPE     MAPE      MASE
Training set 0.5330732 2229.006 1761.505 -3.084051 14.17162 0.7056546
                   ACF1
Training set -0.1084056

To evaluate model performance, we compared ARIMA and ETS forecasts using training set accuracy metrics. Both models demonstrated strong performance, with very similar error rates.

ARIMA produced slightly lower values for both the Root Mean Squared Error (RMSE = 2215.43) and Mean Absolute Error (MAE = 1752.37) compared to ETS (RMSE = 2229.01, MAE = 1761.51). These differences are marginal, suggesting both models fit the historical data well.

In terms of bias, both models show near-zero mean errors (ARIMA ME = 0.22, ETS ME = 0.53), so neither systematically over- or under-predicts. The Mean Absolute Percentage Error (MAPE) for both models hovered around 14%, indicating moderately accurate predictions relative to actual sales values.

Finally, residual diagnostics show minimal autocorrelation in both models, with ARIMA displaying near-white noise behavior (ACF1 ≈ 0), and ETS exhibiting a small negative lag-1 autocorrelation (-0.11).

Conclusion

Overall, ARIMA holds a slight edge in accuracy and residual behavior on the training set. However, the difference is minor, and a final decision between the models should ideally be based on out-of-sample performance.
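One way to run that out-of-sample comparison (a hedged sketch, holding out the final 30 days of the series as a test set):

Code
# Hold out the last 30 days, refit each model on the training window,
# and score the forecasts against the held-out observations.
n     <- length(ts_data)
train <- window(ts_data, end = time(ts_data)[n - 30])
test  <- window(ts_data, start = time(ts_data)[n - 29])

accuracy(forecast(auto.arima(train), h = 30), test)
accuracy(forecast(ets(train), h = 30), test)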