Introduction

In this project, we will be working with time series data from the Time Series Data Library (FinYang/tsdl), which encompasses a diverse collection of 648 time series across various domains. Specifically, we will focus on the Microeconomic data subset, which includes 36 time series records.

The dataset spans different frequencies—ranging from very high (0.1) to annual (365)—and covers a variety of subjects relevant to time series analysis. For instance, the Microeconomic data features:

27 series with a frequency of 1 (monthly), 1 series with a frequency of 4 (quarterly), 7 series with a frequency of 12 (yearly), 1 series with a frequency of 365 (daily). This rich dataset will be used to develop and assess forecasting models tailored to microeconomic variables, providing insights and predictions relevant for economic analysis and decision-making.

Source: https://pkg.yangzhuoranyang.com/tsdl/

Business Question

We will explore how to develop a forecasting model for profit using Time Series Analysis, employing four distinct models: ARIMA (AutoRegressive Integrated Moving Average), Holt’s Winter (Exponential Smoothing with Trend and Seasonality), STLF (Seasonal and Trend decomposition using Loess), and TBATS (Trigonometric, Box-Cox, ARMA, Trend, and Seasonal components). By applying these models, we aim to identify the most effective method for predicting future profits based on historical data. This analysis will help us evaluate each model’s performance and accuracy, providing valuable insights for strategic financial planning and decision-making.

1. Data Preparation

1.1 Prerequisites

1.2 Importing Libraries

library(tidyverse)
library(lubridate)
library(forecast)
library(TTR)
library(fpp)
library(tseries)
library(TSstudio)
library(padr)
library(recipes)
library(tidyquant)
library(ggplot2)
library(tsdl)
options(scipen = 100, max.print = 1e+06)

1.3 Importing Dataset

tsdl
## Time Series Data Library: 648 time series  
## 
##                        Frequency
## Subject                 0.1 0.25   1   4   5   6  12  13  52 365 Total
##   Agriculture             0    0  37   0   0   0   3   0   0   0    40
##   Chemistry               0    0   8   0   0   0   0   0   0   0     8
##   Computing               0    0   6   0   0   0   0   0   0   0     6
##   Crime                   0    0   1   0   0   0   2   1   0   0     4
##   Demography              1    0   9   2   0   0   3   0   0   2    17
##   Ecology                 0    0  23   0   0   0   0   0   0   0    23
##   Finance                 0    0  23   5   0   0  20   0   2   1    51
##   Health                  0    0   8   0   0   0   6   0   1   0    15
##   Hydrology               0    0  42   0   0   0  78   1   0   6   127
##   Industry                0    0   9   0   0   0   2   0   1   0    12
##   Labour market           0    0   3   4   0   0  17   0   0   0    24
##   Macroeconomic           0    0  18  33   0   0   5   0   0   0    56
##   Meteorology             0    0  18   0   0   0  17   0   0  12    47
##   Microeconomic           0    0  27   1   0   0   7   0   1   0    36
##   Miscellaneous           0    0   4   0   1   1   3   0   1   0    10
##   Physics                 0    0  12   0   0   0   4   0   0   0    16
##   Production              0    0   4  14   0   0  28   1   1   0    48
##   Sales                   0    0  10   3   0   0  24   0   9   0    46
##   Sport                   0    1   1   0   0   0   0   0   0   0     2
##   Transport and tourism   0    0   1   1   0   0  12   0   0   0    14
##   Tree-rings              0    0  34   0   0   0   1   0   0   0    35
##   Utilities               0    0   2   1   0   0   8   0   0   0    11
##   Total                   1    1 300  64   1   1 240   3  16  21   648

2. Data Wrangling

tdsl_microeconomic <- subset(tsdl,12,"microeconomic")
tdsl_microeconomic
## Time Series Data Library: 7 Microeconomic time series with frequency 12 
## 
##                Frequency
## Subject         12
##   Microeconomic  7
micro_ts <- tdsl_microeconomic[[1]]

2.1 Data Inspections

str(micro_ts)
##  Time-Series [1:132] from 1965 to 1976: 52149 47205 82150 100931 98408 ...
##  - attr(*, "source")= chr "Abraham & Ledolter (1983)"
##  - attr(*, "description")= chr "Monthly U.S. housing starts (privately owned 1-family) 1965 – 1975"
##  - attr(*, "subject")= chr "Microeconomic"
summary(micro_ts)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   33363   64239   79805   79406   92813  135167

2.2 Handling Missing Values

anyNA(micro_ts)
## [1] FALSE
micro_ts
##         Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct
## 1965  52149  47205  82150 100931  98408  97351  96489  88830  80876  85750
## 1966  46561  50361  83236  94343  84748  79828  69068  69362  59404  53530
## 1967  40157  40274  66592  79839  87341  87594  82344  83712  78194  81704
## 1968  45234  55431  79325  97983  86806  81424  86398  82522  80078  85560
## 1969  51300  47909  71941  84982  91301  82741  73523  69465  71504  68039
## 1970  33363  41367  61879  73835  74848  83007  75461  77291  75961  79393
## 1971  54856  58287  91584 116013 115627 116946 107747 111663 102149 102882
## 1972  76185  76306 111358 119840 135167 131870 119078 131324 120491 116990
## 1973  77105  73560 105136 120453 131643 114822 114746 106806  84504  86004
## 1974  43292  57593  76946 102237  96340  99318  90715  79782  73443  69460
## 1975  39791  39959  62498  77777  92782  90284  92782  90655  84517  93826
##         Nov    Dec
## 1965  72351  61198
## 1966  50212  37972
## 1967  69088  47026
## 1968  64819  53847
## 1969  55069  42827
## 1970  67443  69041
## 1971  92904  80362
## 1972  97428  73195
## 1973  70488  46767
## 1974  57898  41041
## 1975  71646  55650

3. Exploratory Data Analysis

To analyze the seasonality of a time series, two primary methods are used: additive and multiplicative. When categorizing a time series object, three key components are essential for forecasting:

Trend (T): Represents the overall movement of the mean across a given time period. Seasonal (S): Reflects the recurring pattern observed within each seasonal cycle. Error (E): Accounts for the variations that cannot be explained by the trend or seasonal components.

micro_ts %>% 
  autoplot()

Based on the provided data, the pattern appears to be more additive rather than multiplicative. This is evident from the absence of a clear exponential trend, with the data fluctuating around the mean without significant overall increases or decreases, and the relatively constant seasonal amplitude throughout the observed period.

micro_ts %>% decompose() %>% 
  autoplot()

The data reveals a consistent upward trend from 1966 until approximately 1973, followed by a steady decline until 1976. A distinct seasonal pattern is evident, with peaks at the beginning of each year and troughs in the middle of the year. The residual component exhibits random fluctuations around zero, with occasional significant deviations.

micro_decom <- decompose(x = micro_ts)
micro_decom$trend %>% autoplot()

The time series data displays a clear cyclical pattern with an annual peak and trough. There is a general upward trend from 1966 to 1972, followed by a downward trend until 1976. The data also exhibits some random fluctuations around these trends.

micro_decom$seasonal %>% autoplot()

The provided image clearly shows a pronounced seasonal pattern with a repeating annual cycle. Peaks typically occur at the beginning of the year, followed by a sharp decline to a low point in the middle of the year, after which the data values rise again to reach the next peak at the start of the following year.

4. Cross Validation

test_micro <- tail(micro_ts, 24)
train_micro <- head(micro_ts, -length(test_micro))

Since our time series data contains both seasonal and trend components, the model will be developed using:

  • ARIMA
  • Holt’s Winter
  • SLTF
  • TBATS

5. Time Series Modeling

5.1 ARIMA

5.1.1 Modeling

model_arima_ts <- stlm(train_micro, s.window = 12, method = "arima")

5.1.2 Forecast

forecast_arima_ts <- forecast(model_arima_ts, h= 24)
forecast_arima_ts
##          Point Forecast    Lo 80     Hi 80     Lo 95     Hi 95
## Jan 1974       45840.55 38569.56  53111.55 34720.525  56960.58
## Feb 1974       47536.99 38419.64  56654.33 33593.215  61480.76
## Mar 1974       76012.01 65363.80  86660.23 59726.972  92297.05
## Apr 1974       90736.77 78751.65 102721.88 72407.109 109066.42
## May 1974       94720.01 81532.84 107907.19 74551.969 114888.06
## Jun 1974       90939.44 76650.98 105227.90 69087.123 112791.76
## Jul 1974       84710.21 69399.47 100020.95 61294.457 108125.97
## Aug 1974       85296.23 69027.32 101565.13 60415.082 110177.37
## Sep 1974       77313.16 60139.46  94486.85 51048.256 103578.06
## Oct 1974       77835.92 59802.77  95869.07 50256.599 105415.24
## Nov 1974       63682.77 44829.31  82536.23 34848.895  92516.65
## Dec 1974       49120.34 29480.80  68759.88 19084.255  79156.42
## Jan 1975       45840.55 25445.21  66235.90 14648.566  77032.54
## Feb 1975       47536.99 26412.86  68661.11 15230.427  79843.55
## Mar 1975       76012.01 54183.43  97840.60 42628.070 109395.95
## Apr 1975       90736.77 68225.75 113247.78 56309.140 125164.39
## May 1975       94720.01 71546.66 117893.37 59279.429 130160.60
## Jun 1975       90939.44 67122.16 114756.72 54514.053 127364.83
## Jul 1975       84710.21 60265.96 109154.46 47325.955 122094.47
## Aug 1975       85296.23 60240.69 110351.76 46977.087 123615.36
## Sep 1975       77313.16 51660.90 102965.42 38081.409 116544.91
## Oct 1975       77835.92 51600.50 104071.34 37712.310 117959.53
## Nov 1975       63682.77 36876.89  90488.66 22686.706 104678.84
## Dec 1975       49120.34 21755.87  76484.81  7269.994  90970.68

5.1.3 Performance

accuracy(forecast_arima_ts, test_micro)
##                     ME     RMSE      MAE        MPE      MAPE      MASE
## Training set -62.90601 5620.809 4489.849 -0.4268869  6.051957 0.3159900
## Test set     530.96663 7958.745 6936.612 -1.0578176 10.198215 0.4881901
##                     ACF1 Theil's U
## Training set -0.01649926        NA
## Test set      0.63277508 0.6409478

5.1.4 Visualization

test_forecast(actual = micro_ts, forecast.obj = forecast_arima_ts, train = train_micro, test = test_micro)

5.2 HW | Holt Winters

5.2.1 Modeling

model_hw_ts <- HoltWinters(train_micro, seasonal = "additive")

5.2.2 Forecast

forecast_hw_ts <- forecast(model_hw_ts, h= 24)

5.2.3 Performance

accuracy(forecast_hw_ts,test_micro)
##                      ME      RMSE       MAE        MPE     MAPE      MASE
## Training set  -224.9028  6929.402  5499.193 -0.3653901  7.61040 0.3870264
## Test set     40414.5789 47748.584 40530.823 56.5673658 56.83588 2.8525088
##                    ACF1 Theil's U
## Training set 0.02445684        NA
## Test set     0.82813482  3.078167

5.2.4 Visualization

test_forecast(actual = micro_ts, forecast.obj = forecast_hw_ts, train = train_micro, test = test_micro)

5.3 SLTF | Seasonal and Trend decomposition using Loess

5.3.1 Model

model_stlf_ts <- stlf(train_micro,h = 24, s.window = 12)

5.3.2 Forecast

forecast_stlf_ts <- forecast(model_stlf_ts, h= 24)

5.3.3 Performance

accuracy(forecast_stlf_ts,test_micro)
##                     ME     RMSE      MAE        MPE      MAPE      MASE
## Training set -51.34777 5620.791 4499.587 -0.4047588  6.070721 0.3166753
## Test set     511.81873 7957.490 6936.612 -1.0860601 10.202868 0.4881901
##                     ACF1 Theil's U
## Training set -0.01668245        NA
## Test set      0.63277508  0.641267

5.3.4 Visualization

test_forecast(actual = micro_ts, forecast.obj = forecast_stlf_ts, train = train_micro, test = test_micro)

5.4 TBATS | Trigonometric, Box-Cox transform, ARMA errors, Trend, and Seasonal components

5.4.1 Model

model_tbats_ts <- train_micro %>% tbats(use.box.cox = T,
                                        use.trend = T,
                                        use.damped.trend = T)

5.4.2 Forecast

forecast_tbats_ts <- forecast(model_tbats_ts, h= 24)

5.4.3 Performance

accuracy(forecast_tbats_ts,test_micro)
##                      ME      RMSE      MAE        MPE      MAPE      MASE
## Training set  -286.4424  5535.963  4350.47 -0.7101978  5.882747 0.3061807
## Test set     13991.7508 16863.352 14083.22 17.0288704 17.240156 0.9911595
##                    ACF1 Theil's U
## Training set 0.06948397        NA
## Test set     0.67523579  0.981795

5.4.4 Visualization

test_forecast(actual = micro_ts, forecast.obj = forecast_tbats_ts, train = train_micro, test = test_micro)

6. Assumtion Best Models

6.1 ARIMA

6.1.1 Auto-Correlation

Box.test(model_arima_ts$residuals, type = "Ljung-Box", lag = 24)
## 
##  Box-Ljung test
## 
## data:  model_arima_ts$residuals
## X-squared = 34.294, df = 24, p-value = 0.07958

6.1.2 Normality

shapiro.test(x = model_arima_ts$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  model_arima_ts$residuals
## W = 0.99551, p-value = 0.9805

6.1.3 Histogram

hist(model_arima_ts$residuals, breaks = 10)

6.2 STLF | Seasonal and Trend decomposition using Loess

6.2.1 Auto-Correlation

Box.test(model_stlf_ts$residuals, type = "Ljung-Box", lag = 24)
## 
##  Box-Ljung test
## 
## data:  model_stlf_ts$residuals
## X-squared = 34.798, df = 24, p-value = 0.07145

6.2.2 Normality

shapiro.test(x = model_stlf_ts$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  model_stlf_ts$residuals
## W = 0.99528, p-value = 0.9747

6.2.3 Histogram

hist(model_stlf_ts$residuals, breaks = 10)

7. Conclusion

Based on the above metrics, ARIMA and STLF are the best models for the test set, with very similar performance. Among them, ARIMA is slightly superior due to lower RMSE and Theil’s U values, though the difference is minimal. Holt-Winters performs the worst among the four models, followed by TBATS.

Analysis

RMSE & MAE

  • ARIMA and STLF exhibit the lowest RMSE and MAE, indicating the smallest prediction errors among the four models.
  • Holt-Winters has significantly higher RMSE and MAE, suggesting that this model is less accurate.
  • TBATS has lower RMSE and MAE compared to Holt-Winters but still higher than ARIMA and STLF.

MAPE

  • ARIMA and STLF also have the lowest MAPE, reflecting smaller percentage errors.
  • Holt-Winters shows a very high MAPE, indicating predictions that are far from accurate.
  • TBATS performs better than Holt-Winters but still has a relatively high MAPE.

ACF1

  • ARIMA and STLF have identical ACF1 values, which, while still relatively high, are lower than those of TBATS and Holt-Winters.
  • Higher ACF1 values for Holt-Winters and TBATS suggest that there is residual autocorrelation, meaning these models do not capture all the patterns in the data.

Theil’s U

  • ARIMA and STLF have very close Theil’s U values, indicating that their predictions are better compared to TBATS and Holt-Winters.
  • Holt-Winters has the highest Theil’s U value, showing the poorest performance.