I analyze time series data from a departmental store and attempt to make forecasts. I also discuss the various terms associated with time series- trend, seasonality, and noise. I then compare the performance of the models in forecasting. The results show that a polynomial model with seasonal effects does better than a pure linear model.
The file DepartmentStorSales.csv contains data on the quarterly sales for a department store over a 6-year period from 2000 to 2005
Code
## Read in the data and see first six pointssales <-read_csv("DepartmentStoreSales.csv")head(sales) %>%gt(caption ="Sample Data")
Sample Data
Quarter
Sales
1
50147
2
49325
3
57048
4
76781
5
48617
6
50898
1.1Components of Time Series
I discuss four major components of time series (level, trend, seasonality, and noise). Discuss the meaning of these four major components. (3 points)
1.1.1Level
In time series analysis, level refers to the average value in the series. Usually, firms may work with the level value and then adjust their stock levels for seasonal and trend components.
1.1.2Trend
Trend in time series analysis is the sustained increase or decrease observed in a series that spans accross any seasonal patterns. In the case of the departmental store, there is an overall upward trajectory in sales. This means that, on average, overall sales are increasing even before factoring in seasonality. This trajectory is also quite predictable. The store management can then plan to stock higher levels of stock in the current quarter compared to the corresponding stock levels in the previous quarter.
1.1.3Seasonality
Seasonality refers to the short-term recurring cycles in the series. In other words, these are the periodic ups and downs observed in the data. In the case of our departmental store sales, we can observe a periodic rise and fall in demand that occurs every 3 quarters. The seasonal component is very predictable. In this case, the department should increase the stock during the peak periods and hold relatively lower stock during the off-peak periods.
1.1.4Noise
Noise refers to the random variation in the series. This random variation is unpredictable. Noise arises from the unknown factors that affect demand or factors that are not measurable, the known-knowns and the unknown-unknowns. This is the reason many businesses hold working capital in the form of extra cash or inventory to take care of these unpredictable ups and downs in sales.
1.2Plotting the Data
I show the data in a time-series format and create a well-performed time plot of the data.
Code
## Create a time series datamy_sales_ts <-ts(sales, start =c(2000, 1), frequency =4)## View the dataprint(my_sales_ts)
## Plot the dataautoplot(my_sales_ts[, "Sales"]) +labs(title ="Quarterly Sales, 2000-2005", y ="Sales", x ="Year")
I also create a seasons plot that shows the sales for each year by season. The plot shows that in spite of the seasonality, sales in each subsequent year were higher than the last.
Likewise, the sub-series plot shows the trends in sales for each quarter. For instance, the first panel is a plot of sales in \(2000Q1, 2001Q1 \cdots 2005Q1\). Again the plot shows a general upward trend.
Code
ggsubseriesplot(my_sales_ts[, "Sales"]) +labs(y ="Sales",title ="Subseries Plots by Quarters")
1.3Decomposition
Decompose your time-series data into four major components (level, trend, seasonality, and noise) and plot these four components individually. Ensure to and explain your observations about different trends from your plots. (6 points)
Code
## Decompose the data into level,seasons and trendmy_dec_ts <-decompose(my_sales_ts[, "Sales"])## Plot the decompositionautoplot(my_dec_ts)
The decomposition above consists of 4 plots. The first plot is the observed raw data. The second plot illustrates the general upward trend of sales over the period. The trend shows that quarter after quarter, the general trend is up. The management of the store could plan future inventory on this trend among other factors.
The third plot shows the seasonal component, the short swings in sales. when stocking, the management could consider these dynamics by stocking more when the demand is high compared the the off-peak periods.
The final plot shows the random component of the sales. Here, the data appears to follow a random walk with no apparent pattern. In planning for sales, it is hard to cater for this component except for keeping some buffer stock just in case of an upward but unexpected uptick in demand.
1.4Modelling
I run and discuss and compare different trend models in time series, including the linear trend regression model, exponential trend model, and polynomial trend model.
1.4.1Linear trend regression model
The forecasting equation for the linear trend model is:
\(y = a + \beta t\)
Here, t is the time index. The \(\alpha\) and \(\beta\) (the “intercept” and “slope”) are come from a simple regression in with Y as the dependent variable and the time t as the independent variable.
Although useful, these models perform poorly when the trend is not linear as is the case with our data.In this case a polynomial trend model does better. Even where the trend is linear, the linear model may fail to pick seasonal or cyclical fluctuations.
1.4.2Exponential trend model
Exponential smoothing methods use the weighted averages of past observations to make forecasts of the future. These weights reduce as the observations get older with more recent observations having a higher weight (Hyndman and Athanasopoulos 2018). The technique is easy and quick to implement and produces fairly good forecasts in several cases.
1.4.3Polynomial trend model
Polynomial trend models model the trend in data as a smooth variation in time. This model permits us to fit a either a line or a curve to the data as opposed to the linear model that assumes a purely linear relationship over time. The polynomial trend models provide a simple, flexible forms to describe the local trend components (West and Harrison 1989). These class of models are useful for modelling data that has an increasing or decreasing trend. More specifically, The polynomial trend models can be used in those situations where the relationship between study and explanatory variables is curvilinear.
Given a polynomial regression model in one variable as follows;
This is a second-order model or quadratic model. \(\beta_{1}\) and \(\beta_{2}\) are called the linear effect parameter and quadratic effect parameter, respectively.This class of models can extend to include several variables and their interactions and even higher orders like cubes.
1.5Forecasting
Partition the first 18 records as training sets and the rest of 6 records as validation sets.
1.5.1Linear Model
Fit a linear trend model on training sets. Based on this linear trend model, make a forecast for validation sets and show the results.
Code
## Create training and testing sets ts_training <-window(my_sales_ts, start =c(2000, 1), end =c(2004, 2))ts_testing <-window(my_sales_ts, start =c(2004, 2))
I then fit a linear trend model on the training data.
Code
## Fit a linear trend modelsales_model <-tslm(Sales ~ Quarter, data = ts_training)## Summary of linear trend modelsummary(sales_model)
Call:
tslm(formula = Sales ~ Quarter, data = ts_training)
Residuals:
Min 1Q Median 3Q Max
-10862 -8390 -5160 1665 20742
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 54132 5674 9.54 5.3e-08 ***
Quarter 644 524 1.23 0.24
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 11500 on 16 degrees of freedom
Multiple R-squared: 0.0861, Adjusted R-squared: 0.029
F-statistic: 1.51 on 1 and 16 DF, p-value: 0.237
Code
## Forecast model with linear modelmy_linear_forecast <- forecast::forecast(sales_model, newdata = ts_testing[, "Quarter"])## Print my linear model forecastmy_linear_forecast
autoplot(my_linear_forecast) +labs(title ="Forecasts from Linear Trend Model")
1.5.2Polynomial trend model with seasonal effects
I fit a polynomial trend model with seasonal effects on training sets. Based on this model, make a forecast for validation sets and show the results. Check accuracy measures of the forecasted model and show these accuracy measures. (8 points)
Code
## Fit a model with season a trend my_trend_model <- forecast::tslm(Sales ~ trend + season, data = ts_training)## Summarise the modelsummary(my_trend_model)
Call:
forecast::tslm(formula = Sales ~ trend + season, data = ts_training)
Residuals:
Min 1Q Median 3Q Max
-2289 -938 -373 402 3070
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 46519.5 1065.7 43.65 1.7e-15 ***
trend 557.2 80.1 6.96 1.0e-05 ***
season2 1388.8 1112.8 1.25 0.23
season3 8489.5 1177.3 7.21 6.8e-06 ***
season4 27735.4 1180.0 23.50 4.9e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1750 on 13 degrees of freedom
Multiple R-squared: 0.983, Adjusted R-squared: 0.978
F-statistic: 186 on 4 and 13 DF, p-value: 2.49e-11
Code
## A function to compute mean percent errormape <-function(actual,pred){ mape <-mean(abs((actual - pred)/actual))*100return (mape)}
Code
## Forecasting with the model with season a trend my_trend_forecast <- forecast::forecast(my_trend_model, newdata = ts_testing)my_trend_forecast
In this section I compute the error for both models. Note that the model with trend and seasons tends to do better than the linear trend model.
Code
## capturing the forecastsforecasts <-data.frame(Quarter = (nrow(my_sales_ts) -6):nrow(my_sales_ts),linear_trend =c(65721, 66365, 67008, 67652, 68296, 68940, 69584),trend_season =c(65596, 85399, 58221, 60166, 67824, 87627, 60449))## % Error for the linear trend modelmape(ts_testing[, "Sales"], forecasts$linear_trend)
[1] 15
Code
## % Error for model with trend and seasonsmape(ts_testing[, "Sales"], forecasts$trend_season)
[1] 18
1.6Comparison
Finally, based on the decomposition of four major components from 5.3, I compare your 5.5 and 5.6 models and discuss which model is more appropriate in this particular setting, giving reasons.
The plot below compares the comparison of both models. Note that the model with seasonality and trend (red broken line) captures the data much better than the linear model (blue line). I would choose the model with seasonality and trend given the nature of the data; It has both a trend component and a seasonal component. The linear trend model fares especially poorly in capturing seasons and cycles compared to the model with seasonality and trend.
Code
sales %>%ggplot(aes(x = Quarter, y = Sales)) +geom_line() +geom_line(data = forecasts, aes(y = linear_trend),col ="blue") +geom_line(data = forecasts, aes(y = trend_season),col ="red", linetype ="dashed") +labs(title ="Comaring the Linear Trend Model and the Model with Season and Trend")
References
Hyndman, Rob J., and George Athanasopoulos. 2018. Forecasting: Principles and Practice. Melbourne: OTexts.
West, Mike, and Jeff Harrison. 1989. “Polynomial TrendModels.” In Bayesian Forecasting and DynamicModels, 201–28. New York, NY: Springer New York. https://doi.org/10.1007/978-1-4757-9365-9_7.