1. Project Overview

Identifying and understanding past trends in sales transactions helps define a better marketing strategy. Forecasting retail sales is essential for managing retail stores, from supplying goods and resources to business development. The provided data set contains weekly purchased quantities of 800 products over 52 weeks.

Since we only have sales transaction data, our first assumption is that we analyze and forecast transactions based solely on their history; the monetary amount of each transaction and other business factors are not considered.

In this project, the original (unnormalized) data is used: none of the methods applied here requires normalization, normalized data may lose characteristics of the original series, and, more importantly, forecasting results are harder to interpret on a normalized scale.
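Throughout this report, the following packages are assumed to be attached; the original does not show its setup chunk, so this list is inferred from the functions used below.

library(forecast)    #ets(), auto.arima(), autoplot(), ggAcf(), ma(), holt()
library(ggplot2)     #xlab(), ylab(), ggtitle()
library(factoextra)  #fviz_cluster()
library(urca)        #ur.kpss()
library(gridExtra)   #grid.arrange()
library(scales)      #percent()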


df_raw <- read.csv("Sales_Transactions_Dataset_Weekly-1.csv", header = TRUE)
df <- data.frame(df_raw[, 2:53], row.names = df_raw$Product_Code)

#Check whether there are any missing values.
sum(is.na(df))
## [1] 0
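For reference, had a method required it, a per-product min-max normalization could be computed as sketched below. This is illustrative only and is not used anywhere in this project.

#Illustrative only: scale each product's 52 weeks to [0, 1]
#(a product with constant sales would divide by zero and need special handling)
df_norm <- t(apply(df, 1, function(x) (x - min(x)) / (max(x) - min(x))))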

2. Product Segmentation

Since it is impractical to analyze all 800 products individually, the first step is product segmentation. We classify the products into groups based on the similarity of their transactions over 52 weeks, using k-means clustering. To determine the number of clusters, an elbow plot is created below.


k.max <- 15
wss <- sapply(1:k.max, function(k){kmeans(df, k, nstart = 25, iter.max = 15)$tot.withinss})

plot(1:k.max, wss, type = "b", pch = 19, main = "Elbow Plot of K-Means Clustering", ylab = "Total Within-cluster Sum of Squares", xlab = "Number of Clusters")


set.seed(101)
k3 <- kmeans(df, centers = 3, nstart = 25)
k3$size
## [1] 197 124 490
k3$betweenss/k3$totss
## [1] 0.8863056

#Cluster visualization
fviz_cluster(k3, df, geom = "point")

According to the elbow plot above, the elbow point is at 3, so it is appropriate to classify the products into three clusters; these contain 197, 124, and 490 products, respectively. The cluster plot shows three clearly separated clusters, and the BSS/TSS ratio of 88.63% also indicates that this clustering fits the data well.

Specifically, 490 products are initially classified into group 1 (cluster 3). Their average weekly transaction count is 1.66. As the histogram below shows, many products in this group average fewer than 1 transaction per week, while the rest appear roughly normally distributed. We therefore split this initial group into two: Group 0, containing the products with average weekly transactions of at most 2, and a new Group 1 containing the rest.


group1 <- data.frame(t(df[k3$cluster == 3,]))
summary(sapply(group1, mean))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.01923 0.19712 0.73077 1.66150 3.25000 6.23077
hist(sapply(group1, mean), main = "Histogram of Product Group 1", xlab = "Number of Transactions")


group0 <- group1[,sapply(group1, mean) <= 2]
group1 <- group1[,sapply(group1, mean) > 2]

group2 <- data.frame(t(df[k3$cluster == 1,]))
group3 <- data.frame(t(df[k3$cluster == 2,]))

#Create pie chart
slices <- c(ncol(group0), ncol(group1), ncol(group2), ncol(group3))
lbls <- c("Group 0:", "Group 1:", "Group 2:", "Group 3:")
lbls <- paste(lbls, percent(slices/sum(slices)))
pie(slices,labels = lbls, col=rainbow(length(lbls)),
   main="Pie Chart of Product Segmentation")

As a result, 37.2% of the products belong to Group 0, 23.2% to Group 1, 24.3% to Group 2, and 15.3% to Group 3.


#Create box plot
group0_mean <- sapply(group0, mean)
group1_mean <- sapply(group1, mean)
group2_mean <- sapply(group2, mean)
group3_mean <- sapply(group3, mean)
boxplot(group0_mean, group1_mean, group2_mean, group3_mean, main = "Box-Plot of Product Segmentation", 
        names = c("Group 0", "Group 1", "Group 2", "Group 3"))


lapply(list(group0_mean, group1_mean, group2_mean, group3_mean), summary)
## [[1]]
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.01923 0.11538 0.25000 0.40525 0.51442 1.98077 
## 
## [[2]]
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.096   2.942   3.712   3.680   4.231   6.231 
## 
## [[3]]
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.288   9.077   9.904  11.506  12.558  22.173 
## 
## [[4]]
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   24.10   32.17   33.04   33.36   34.97   42.69

The summary statistics confirm that the products fall into four well-separated groups: Group 0 has the lowest number of weekly transactions, and Group 3 has the largest.

3. Analysis and Forecast

3.1 Group 0

3.1.1 Group 0 - Data Analysis


summary(group0_mean)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.01923 0.11538 0.25000 0.40525 0.51442 1.98077
par(mfrow = c(1, 2))
hist(group0_mean, main = "Histogram of Group 0", xlab = "Number of Transactions")
boxplot(group0_mean, main = "Box-plot of Group 0", names = c("Group0"))

According to the histogram, the distribution of average weekly transactions in Group 0 is right skewed. Moreover, the box plot shows that the third quartile is around 0.5, meaning the majority of products in this group average less than 1 transaction per week. A few products average more than 1 transaction per week, but they appear as outliers in the box plot, i.e., they make up a very small proportion of the group. We can therefore conclude that the products in Group 0 have very low demand.

For further analysis, we select a representative product whose average weekly transaction count is closest to a target value (the median or the mean). Since this distribution is right skewed, we use the median as the target: the representative product is the one whose average number of transactions differs least from the median.


idx0 <- which.min(abs(group0_mean-summary(group0_mean)['Median']))
print(idx0)
## P214 
##    3

The representative product of Group 0 is P214.

3.1.2 Group 0 - Forecast


#Convert P214 data to a time series
ts0 <- ts(group0[,idx0], frequency = 365.25/7)

autoplot(ts0) + xlab("Week") + ylab("Number of Transaction") +
  ggtitle("Time Plot of P214 in Group 0")

According to the time plot, P214 has zero transactions in most weeks, 1 transaction in a few weeks, and 4 transactions in a single week. Most products in this group have similar plots.

3.1.3 Summary

Given a time series like P214’s, it is unnecessary to perform further decomposition or to develop a forecasting model. If we would like to predict the next year, the most appropriate approach is to reuse the current data; in other words, we assume that a Group 0 product’s transactions will be the same as in the previous year.
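As a sketch of this rule (assuming the forecast package), a seasonal naive forecast with a 52-week season simply repeats the observed year:

#Treat the 52 observed weeks as one season; snaive() then repeats them as the forecast
ts0_year <- ts(group0[, idx0], frequency = 52)
fc0 <- snaive(ts0_year, h = 52)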

For business purposes, we may also reach out to the customers who ordered these products in the previous year, to check whether we can obtain an estimate of future orders.

3.2 Group 1

3.2.1 Group 1 - Data Analysis


summary(group1_mean)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.096   2.942   3.712   3.680   4.231   6.231
par(mfrow = c(1, 2))
hist(group1_mean, main = "Histogram of Group 1", xlab = "Number of Transactions")
boxplot(group1_mean, main = "Box-plot of Group 1", names = c("Group1"))

According to the histogram, the distribution of average weekly transactions is roughly centered around 4. The box plot shows a slight right skew with one outlier. In addition, the third quartile is slightly above 4, meaning about three quarters of the products in Group 1 average fewer than roughly 4.2 transactions per week. We can therefore conclude that the products in Group 1 have low demand.

In this group, since the distribution of average weekly transactions is roughly centered, we select the mean as the target value; the representative product is the one whose average number of transactions differs least from the mean.


idx1 <- which.min(abs(group1_mean-summary(group1_mean)['Mean']))
print(idx1)
## P318 
##   57

The representative product of Group 1 is P318.


summary(group1[,idx1])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   2.000   3.000   3.673   5.000   9.000

boxplot(group1[,idx1], main = "Box-Plot of P318")

The distribution of P318’s transaction data is slightly right skewed. The average number of weekly transactions is 3.673, the third quartile is 5, and there are no outliers.

3.2.2 Group 1 - Decomposition


#Convert P318 data to a time series
ts1 <- ts(group1[,idx1], frequency = 365.25/7)

p1 <- autoplot(ts1) + xlab("Week") + ylab("Number of Transaction") + ggtitle("Time Plot of P318 in Group 1")

p2 <- ggAcf(ts1)

grid.arrange(p1, p2, ncol = 2)

According to the time plot, the number of transactions is mostly between 2 and 6 per week. There is no visible trend or seasonality, and the correlogram suggests that the series is white noise. Hence, there is no need to decompose this series before building a forecasting model.

3.2.3 Group 1 - Forecast

In this project, we mainly apply two methods to each series: ETS (exponential smoothing) and ARIMA (autoregressive integrated moving average). If neither model works well, we investigate other forecasting methods, such as the average, naive, and drift methods, based on the characteristics of the data.

Since we only have 52 weeks of transaction data, long-term forecasting is difficult. The forecast horizon is therefore set to 4 weeks, roughly one month.

  1. ETS Method
fit_ets <- ets(ts1)

summary(fit_ets)
## ETS(A,N,N) 
## 
## Call:
##  ets(y = ts1) 
## 
##   Smoothing parameters:
##     alpha = 0.1373 
## 
##   Initial states:
##     l = 4.9344 
## 
##   sigma:  2.3852
## 
##      AIC     AICc      BIC 
## 299.8299 300.3299 305.6836 
## 
## Training set error measures:
##                      ME     RMSE      MAE  MPE MAPE MASE       ACF1
## Training set -0.1693657 2.338864 1.964891 -Inf  Inf  NaN 0.08297485
checkresiduals(fit_ets)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,N,N)
## Q* = 7.4342, df = 8.4, p-value = 0.532
## 
## Model df: 2.   Total lags used: 10.4
autoplot(forecast(fit_ets, h = 4)) + xlab("Week") + ylab("Number of Transactions") + ggtitle("Group 1: P318 Forecast using ETS Method")

ETS(A,N,N) is simple exponential smoothing with additive errors. This model is expected, because the original series is white noise, with no trend or seasonality. The residual plots show constant variance, zero mean, and no significant autocorrelation, which suggests a reasonable fit; the p-value of the Ljung-Box test likewise indicates that the residuals are independent. However, the prediction interval of the forecasts is unacceptably wide and even includes negative values. Therefore, the ETS model is not a good choice for this series.
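If the ETS forecasts were nevertheless needed, one pragmatic fix for the negative lower bounds is to truncate the intervals at zero, as sketched below; this is an ad hoc adjustment, not a statistical remedy (a count-data model would be the principled alternative).

fc_ets <- forecast(fit_ets, h = 4)
fc_ets$lower <- pmax(fc_ets$lower, 0)  #transaction counts cannot be negative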

  2. ARIMA Method
summary(ur.kpss(ts1))
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 0.3727 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
p1 <- ggAcf(ts1)
p2 <- ggPacf(ts1)
grid.arrange(p1, p2, ncol = 2)

fit_arima <- auto.arima(ts1, seasonal = FALSE)
summary(fit_arima)
## Series: ts1 
## ARIMA(0,0,1) with non-zero mean 
## 
## Coefficients:
##          ma1    mean
##       0.3247  3.6898
## s.e.  0.1597  0.4168
## 
## sigma^2 estimated as 5.403:  log likelihood=-116.68
## AIC=239.37   AICc=239.87   BIC=245.22
## 
## Training set error measures:
##                       ME     RMSE      MAE  MPE MAPE MASE        ACF1
## Training set 0.005398077 2.279339 1.888182 -Inf  Inf  NaN -0.03290114
checkresiduals(fit_arima)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,0,1) with non-zero mean
## Q* = 5.5431, df = 8.4, p-value = 0.7351
## 
## Model df: 2.   Total lags used: 10.4
autoplot(forecast(fit_arima, h = 4)) + xlab("Week") + ylab("Number of Transactions") + ggtitle("Group 1: P318 Forecast using ARIMA Method")

As discussed, P318’s transaction series resembles white noise, which is stationary, so we expect the ARIMA model to perform at least as well as the ETS model. Although white noise needs no differencing, a KPSS unit root test is conducted to verify this decision. The test statistic is smaller than all critical values, confirming that the data is stationary, and the ACF and PACF plots support the same conclusion.

The next step is to develop an ARIMA model using the auto.arima() function. The fitted model is ARIMA(0,0,1) with a non-zero mean, i.e., a constant plus a weak first-order moving-average term, close to what we expect for a near-white-noise series. The residual plots generally meet the requirements, and both the RMSE and AIC are smaller than those of the ETS model, so the ARIMA model fits slightly better. However, the forecasts are very similar to the ETS forecasts, with an equally unacceptably wide prediction interval. Therefore, the ARIMA model is not a good choice for this series either.
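As a cross-check (not part of the original workflow), the selected specification can be refitted manually with Arima() from the forecast package:

#Manually refit auto.arima()'s choice: MA(1) around a non-zero mean
fit_ma1 <- Arima(ts1, order = c(0, 0, 1), include.mean = TRUE)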

  3. Other Methods
p1 <- autoplot(ts1) + autolayer(meanf(ts1, h = 4)) + xlab("Week") + ylab("Number of Transactions") + ggtitle("Group 1: P318 Forecast using Average Method")

p2 <- autoplot(ts1) + autolayer(naive(ts1, h = 4)) + xlab("Week") + ylab("Number of Transactions") + ggtitle("Group 1: P318 Forecast using Naive Method")

p3 <- autoplot(ts1) + autolayer(rwf(ts1, h = 4)) + xlab("Week") + ylab("Number of Transactions") + ggtitle("Group 1: P318 Forecast using Drift Method")

grid.arrange(p1, p2, p3, ncol = 3)

Forecasts using these other methods are plotted above. None of them fits the data better than the ARIMA or ETS models: the average method gives very similar results, while the prediction intervals of the naive and drift methods are even wider. Therefore, these methods are not good options.

3.2.4 Summary

Since the original data of P318 is white noise, it is very difficult to make a good prediction for this series. The point forecasts of both the ARIMA and ETS models are essentially the mean of P318, 3.673. However, because of the wide prediction interval, relying on the point forecasts directly would put the business at risk. It is therefore more practical to plan with the third-quartile value, 5 transactions per week; in this case, we cover roughly 75% of the business need.
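The third-quartile plan can be read directly off the data with a one-line sketch:

#Weekly quantity covering roughly 75% of P318's observed weeks
quantile(group1[, idx1], probs = 0.75)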

3.3 Group 2

3.3.1 Group 2 - Data Analysis


summary(group2_mean)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.288   9.077   9.904  11.506  12.558  22.173
par(mfrow = c(1, 2))
hist(group2_mean, main = "Histogram of Group 2", xlab = "Number of Transactions")
boxplot(group2_mean, main = "Box-plot of Group 2", names = c("Group2"))

According to the histogram and the box plot, the distribution of average weekly transactions is right skewed, with some outliers. In addition, the third quartile is slightly above 12, meaning about three quarters of the products in Group 2 average fewer than roughly 12.6 transactions per week. We can therefore conclude that the products in Group 2 have medium demand.

In this group, since the distribution of average weekly transactions is right skewed, we select the median as the target value; the representative product is the one whose average number of transactions differs least from the median.


idx2 <- which.min(abs(group2_mean-summary(group2_mean)['Median']))
print(idx2)
## P209 
##   67

The representative product of Group 2 is P209.


summary(group2[,idx2])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   4.000   7.750  10.000   9.904  12.000  17.000

boxplot(group2[,idx2], main = "Box-Plot of P209")

The distribution of P209’s transaction data is roughly centered. The average number of weekly transactions is 9.904, the third quartile is 12, and there are no outliers.

3.3.2 Group 2 - Decomposition


#Convert P209 data to time series
ts2 <- ts(group2[,idx2], frequency = 365.25/7)

p1 <- autoplot(ts2) + xlab("Week") + ylab("Number of Transaction") + ggtitle("Time Plot of P209 in Group 2")

p2 <- ggAcf(ts2)

grid.arrange(p1, p2, ncol = 2)

According to the time plot, the number of transactions is mostly between 6 and 13 per week. There is no visible trend or seasonality, and the correlogram suggests that the series is white noise. Hence, there is no need to decompose this series before building a forecasting model.

3.3.3 Group 2 - Forecast

As for Group 1, we mainly apply the ETS and ARIMA methods, and investigate other forecasting methods if neither works well.

The forecast horizon is again 4 weeks, roughly one month.

  1. ETS Method
fit_ets <- ets(ts2)

summary(fit_ets)
## ETS(M,N,N) 
## 
## Call:
##  ets(y = ts2) 
## 
##   Smoothing parameters:
##     alpha = 1e-04 
## 
##   Initial states:
##     l = 9.9036 
## 
##   sigma:  0.3363
## 
##      AIC     AICc      BIC 
## 334.5475 335.0475 340.4013 
## 
## Training set error measures:
##                         ME     RMSE      MAE       MPE     MAPE MASE
## Training set -0.0002554157 3.265715 2.684235 -13.57015 33.07803  NaN
##                    ACF1
## Training set 0.00510893
checkresiduals(fit_ets)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(M,N,N)
## Q* = 8.7113, df = 8.4, p-value = 0.4059
## 
## Model df: 2.   Total lags used: 10.4
autoplot(forecast(fit_ets, h = 4)) + xlab("Week") + ylab("Number of Transactions") + ggtitle("Group 2: P209 Forecast using ETS Method")

ETS(M,N,N) is simple exponential smoothing with multiplicative errors. This model is expected, because the original series is white noise, with no trend or seasonality. The residual plots show constant variance, zero mean, and no significant autocorrelation, which suggests a reasonable fit; the p-value of the Ljung-Box test likewise indicates that the residuals are independent. However, the prediction interval of the forecasts is quite wide. Therefore, the ETS model is not a good choice for this series.
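To check how much the multiplicative-error choice matters here, one could force additive errors and compare information criteria (a sketch; its output is not reproduced here):

#Force simple exponential smoothing with additive errors for comparison
fit_ets_ann <- ets(ts2, model = "ANN")
summary(fit_ets_ann)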

  2. ARIMA Method
summary(ur.kpss(ts2))
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 0.2296 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
p1 <- ggAcf(ts2)
p2 <- ggPacf(ts2)
grid.arrange(p1, p2, ncol = 2)

fit_arima <- auto.arima(ts2, seasonal = FALSE)
summary(fit_arima)
## Series: ts2 
## ARIMA(0,0,0) with non-zero mean 
## 
## Coefficients:
##         mean
##       9.9038
## s.e.  0.4529
## 
## sigma^2 estimated as 10.87:  log likelihood=-135.32
## AIC=274.65   AICc=274.89   BIC=278.55
## 
## Training set error measures:
##                         ME     RMSE      MAE       MPE     MAPE MASE
## Training set -1.229752e-15 3.265552 2.684172 -13.56664 33.07648  NaN
##                     ACF1
## Training set 0.005115978
checkresiduals(fit_arima)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,0,0) with non-zero mean
## Q* = 8.7116, df = 9.4, p-value = 0.503
## 
## Model df: 1.   Total lags used: 10.4
autoplot(forecast(fit_arima, h = 4)) + xlab("Week") + ylab("Number of Transactions") + ggtitle("Group 2: P209 Forecast using ARIMA Method")

As discussed, P209’s transaction series resembles white noise, which is stationary, so we expect the ARIMA model to perform at least as well as the ETS model. Although white noise needs no differencing, a KPSS unit root test is conducted to verify this decision. The test statistic is smaller than all critical values, confirming that the data is stationary, and the ACF and PACF plots support the same conclusion.

The next step is to develop an ARIMA model using the auto.arima() function. The fitted model is ARIMA(0,0,0) with a non-zero mean, which is expected for white noise: no autoregression, differencing, or moving-average terms are needed. The residual plots generally meet the requirements, except that the residual distribution is right skewed. Both the RMSE and AIC are smaller than those of the ETS model, so the ARIMA model fits slightly better. However, the forecasts are very similar to the ETS forecasts, with a similarly wide prediction interval. Therefore, the ARIMA model is not a good choice for this series either.

  3. Other Methods
p1 <- autoplot(ts2) + autolayer(meanf(ts2, h = 4)) + xlab("Week") + ylab("Number of Transactions") + ggtitle("Group 2: P209 Forecast using Average Method")

p2 <- autoplot(ts2) + autolayer(naive(ts2, h = 4)) + xlab("Week") + ylab("Number of Transactions") + ggtitle("Group 2: P209 Forecast using Naive Method")

p3 <- autoplot(ts2) + autolayer(rwf(ts2, h = 4)) + xlab("Week") + ylab("Number of Transactions") + ggtitle("Group 2: P209 Forecast using Drift Method")

grid.arrange(p1, p2, p3, ncol = 3)

Forecasts using these other methods are plotted above. None of them fits the data better than the ARIMA or ETS models: the average method gives very similar results, while the prediction intervals of the naive and drift methods are even wider. Therefore, these methods are not good options.

3.3.4 Summary

Since the original data of P209 is white noise, it is very difficult to make a good prediction for this series. The point forecasts of both the ARIMA and ETS models are essentially the mean of P209, 9.904. However, because of the wide prediction interval, relying on the point forecasts directly would put the business at risk. It is therefore more practical to plan with the third-quartile value, 12 transactions per week; in this case, we cover roughly 75% of the business need.

3.4 Group 3

3.4.1 Group 3 - Data Analysis


summary(group3_mean)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   24.10   32.17   33.04   33.36   34.97   42.69
par(mfrow = c(1, 2))
hist(group3_mean, main = "Histogram of Group 3", xlab = "Number of Transactions")
boxplot(group3_mean, main = "Box-plot of Group 3", names = c("Group3"))

According to the histogram, the distribution of average weekly transactions is roughly centered. The box plot shows a slight right skew with a few outliers. In addition, the first quartile is around 32, meaning most of the products in Group 3 average more than 32 transactions per week. We can therefore conclude that the products in Group 3 have high demand.

In this group, since the distribution of average weekly transactions is roughly centered, we select the mean as the target value; the representative product is the one whose average number of transactions differs least from the mean.


idx3 <- which.min(group3_mean-summary(group3_mean)['Mean'])
print(idx3)
## P533 
##  113

The representative product selected for Group 3 is P533.


summary(group3[,idx3])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    12.0    20.0    23.0    24.1    28.0    41.0

boxplot(group3[,idx3], main = "Box-Plot of P533")

The distribution of P533’s transaction data is slightly right skewed, with one outlier. The average number of weekly transactions is 24.1, and the third quartile is 28.

3.4.2 Group 3 - Decomposition


#Convert P533 data to time series
ts3 <- ts(group3[,idx3], frequency = 365.25/7)

p1 <- autoplot(ts3) + xlab("Week") + ylab("Number of Transaction") + ggtitle("Time Plot of P533 in Group 3")

p2 <- ggAcf(ts3)

grid.arrange(p1, p2, ncol = 2)

According to the time plot, the number of transactions is mostly between 15 and 35 per week. There is a decreasing trend over time but no seasonality, and the correlogram confirms this observation. In addition, there is a sharp increase in the last week, which may inflate the variance of our forecasts.


The original time series consists of a trend component and an irregular component, but no seasonal component. Therefore, it is appropriate to use a simple moving average to estimate (smooth) the trend.

autoplot(ts3, series = "Data") + autolayer(ma(ts3, 5), series = "5-MA" ) + 
  autolayer(ma(ma(ts3, 5), 3), series = "3x5-MA") +
  scale_colour_manual(values=c("Data"="grey50","5-MA"="blue", "3x5-MA" = "red"),
                      breaks=c("Data","5-MA", "3x5-MA"))
## Warning: Removed 4 rows containing missing values (geom_path).
## Warning: Removed 6 rows containing missing values (geom_path).

According to the results above, with an order of 5 the estimated trend is not as smooth as expected. A moving average of the moving average is therefore taken, and the resulting 3x5-MA (a 3-MA of a 5-MA) tracks the trend more smoothly.
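As a side note, a 3x5-MA is equivalent to a single centered 7-term weighted moving average with weights (1, 2, 3, 3, 3, 2, 1)/15; the sketch below computes it directly.

#A 3x5-MA equals a 7-term weighted MA; stats::filter() applies the weights centered
w <- c(1, 2, 3, 3, 3, 2, 1) / 15
ma_3x5 <- stats::filter(ts3, w, sides = 2)  #matches ma(ma(ts3, 5), 3)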

3.4.3 Group 3 - Forecast

As before, we mainly apply the ETS and ARIMA methods, and investigate other forecasting methods if neither works well.

The forecast horizon is again 4 weeks, roughly one month.

  1. ETS Method
fit_ets <- ets(ts3, model = "AAZ")

summary(fit_ets)
## ETS(A,Ad,N) 
## 
## Call:
##  ets(y = ts3, model = "AAZ") 
## 
##   Smoothing parameters:
##     alpha = 2e-04 
##     beta  = 2e-04 
##     phi   = 0.9584 
## 
##   Initial states:
##     l = 30.8384 
##     b = -0.4786 
## 
##   sigma:  6.0183
## 
##      AIC     AICc      BIC 
## 398.8668 400.7334 410.5742 
## 
## Training set error measures:
##                       ME     RMSE      MAE      MPE     MAPE MASE
## Training set -0.05536064 5.721614 4.353331 -5.84661 19.14019  NaN
##                   ACF1
## Training set 0.1276799
checkresiduals(fit_ets)

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,Ad,N)
## Q* = 6.8655, df = 5.4, p-value = 0.2705
## 
## Model df: 5.   Total lags used: 10.4
autoplot(forecast(fit_ets, h = 4)) + xlab("Week") + ylab("Number of Transactions") + ggtitle("Group 3: P533 Forecast using ETS Method")

forecast(fit_ets, h = 4)
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1.996578       20.92721 13.21450 28.63993 9.131630 32.72280
## 2.015743       20.87984 13.16712 28.59255 9.084253 32.67542
## 2.034908       20.83443 13.12171 28.54715 9.038843 32.63001
## 2.054073       20.79091 13.07819 28.50363 8.995319 32.58650

Since the series has a trend component and its variance appears constant over time, the ETS model should include a trend term with additive errors. The fitted model is ETS(A,Ad,N), in which the trend is damped to accommodate changes in the trend. The residual plots show constant variance, zero mean, and no significant autocorrelation, which suggests a reasonable fit; the p-value of the Ljung-Box test likewise indicates that the residuals are independent. The forecasts follow the decreasing trend with an acceptable prediction interval, so the ETS model is a good fit.

  2. ARIMA Method

As discussed, P533’s transaction series has a trend component, which means it is non-stationary; therefore, differencing is required in the ARIMA model.

summary(ur.kpss(ts3))
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 0.57 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
ndiffs(ts3)
## [1] 1
p1 <- ggAcf(ts3)
p2 <- ggPacf(ts3)
grid.arrange(p1, p2, ncol = 2)

The KPSS unit root test indicates that differencing is necessary (the test statistic, 0.57, exceeds the 5% critical value), and ndiffs() confirms that one order of differencing is needed. In the ACF plot, there are two significant spikes and none thereafter; in the PACF, there are three significant spikes and then none. The ACF and PACF therefore lead us to consider an ARIMA(2,1,0) model. For comparison, we also develop a model using the auto.arima() function.

fit_arima1 <- arima(ts3, order = c(2, 1, 0))
fit_arima2 <- arima(ts3, order = c(2, 1, 1))
fit_arima3 <- arima(ts3, order = c(2, 1, 1))

fit_arima_auto <- auto.arima(ts3, seasonal = FALSE)

print(c(fit_arima1$aic, fit_arima2$aic, fit_arima3$aic, fit_arima_auto$aic))
## [1] 333.9637 335.9452 335.9452 331.1298

Based on the AIC values, the best model is the one selected by auto.arima(), which has the smallest AIC.

summary(fit_arima_auto)
## Series: ts3 
## ARIMA(0,1,1) 
## 
## Coefficients:
##           ma1
##       -0.7598
## s.e.   0.1282
## 
## sigma^2 estimated as 35.85:  log likelihood=-163.56
## AIC=331.13   AICc=331.38   BIC=334.99
## 
## Training set error measures:
##                      ME     RMSE      MAE       MPE     MAPE MASE
## Training set -0.1814896 5.871235 4.656843 -5.940685 20.48817  NaN
##                    ACF1
## Training set 0.00640853
checkresiduals(fit_arima_auto)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,1)
## Q* = 7.1526, df = 9.4, p-value = 0.6587
## 
## Model df: 1.   Total lags used: 10.4
autoplot(forecast(fit_arima_auto, h = 4)) + xlab("Week") + ylab("Number of Transactions") + ggtitle("Group 3: P533 Forecast using ARIMA Method")

The best-fitting model is ARIMA(0,1,1), in which the autoregressive order is 0; this may be attributed to the small spikes in the PACF plot. In the forecast plot, the forecasts sit around 25 with a slightly decreasing trend. Given the trend in the historical data, this may overestimate future transactions, an overestimation likely caused by the sharp increase at the end of the series. Therefore, the ETS model fits better.
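To gauge how strongly the final spike drives these forecasts, one could refit after dropping the last observation; this is a rough robustness check, not part of the original analysis, and its output is not shown.

#Refit without the final (spiking) week and compare the resulting forecasts
ts3_trim <- ts(head(as.numeric(ts3), -1), frequency = 365.25/7)
forecast(auto.arima(ts3_trim, seasonal = FALSE), h = 4)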

  3. Other Methods
p1 <- autoplot(ts3) +
  autolayer(holt(ts3, h = 4)) + xlab("Week") + ylab("Number of Transactions") + ggtitle("Group 3: P533 Forecast using Holt's Method")


p2 <- autoplot(ts3) +
  autolayer(holt(ts3, h = 4, damped = TRUE, phi = 0.9)) +
  xlab("Week") + ylab("Number of Transactions") + ggtitle("Group 3: P533 Forecast using Damped Holt's method")

grid.arrange(p1, p2, ncol = 2)

Forecasts using Holt’s method and the damped Holt’s method are plotted above. Neither fits the data better than the ETS model.

3.4.4 Summary

The original data of P533 shows a decreasing trend over time with no seasonality. The ETS model fits the data best and yields an acceptable prediction interval, with forecasts of about 20 transactions per week.

4. Conclusion

In this project, we first classify the 800 products into four segments: 37.2% of the products belong to Group 0, 23.2% to Group 1, 24.3% to Group 2, and 15.3% to Group 3. For each segment, a representative product is then selected by minimizing the difference between its average weekly transactions and a target value (the mean or the median). Based on the analysis and forecasting results, we provide the following recommendations:
  1. Group 0: Demand is very low, fewer than 2 transactions per week on average. It is difficult to make predictions for this group; for forecasting purposes, we reuse the previous year’s data as the reference. In addition, we may contact the customers who ordered these products to get an early estimate of future orders.

  2. Group 1: Demand is low, fewer than about 4 transactions per week on average. The transaction data of the representative product is white noise, which rules out a reliable forecast. To reduce business risk, we therefore use the third quartile, about 5 transactions per week, as the planning quantity; this covers roughly 75% of weekly orders.

  3. Group 2: Demand is medium, about 11.5 transactions per week on average. The transaction data of the representative product is likewise white noise. To reduce business risk, we use the third quartile, 12 transactions per week, as the planning quantity, covering roughly 75% of weekly orders.

  4. Group 3: Demand is high, with the representative product averaging about 24 transactions per week. Its transaction data shows a decreasing trend but no seasonality, and the ETS forecasts are around 20 transactions per week. These rules are collected in the sketch below.
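These recommendations can be collected into a small helper; the sketch below is illustrative, and the function name plan_per_week is ours, not part of the analysis.

#Illustrative weekly planning quantity, given a product's 52-week history and its group
plan_per_week <- function(x, group) {
  if (group == 0) return(x)                          #Group 0: repeat last year's pattern
  if (group %in% c(1, 2)) return(quantile(x, 0.75))  #Groups 1-2: plan at the 3rd quartile
  fit <- ets(ts(x, frequency = 365.25/7))            #Group 3: ETS point forecast
  mean(forecast(fit, h = 4)$mean)
}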