Introduction to Time Series Analysis
Types of data
Time Series terminology
Time Series Analysis
Visualize the Time Series
Patterns in a Time Series
Additive and Multiplicative Time Series
Decomposition of a Time Series
Stationary and Non-Stationary Time Series
How to make a time series stationary
How to test for stationarity

11.1 Kwiatkowski-Phillips-Schmidt-Shin – KPSS test (trend stationary)

Difference between white noise and a stationary series
Model Building

13.1 Simple Exponential Smoothing
13.2 Double Exponential Smoothing
13.3 Triple Exponential Smoothing
13.4 Seasonal AutoRegressive Integrated Moving Average(SARIMA)

Conclusion and Insights

14.1 Summary
14.2 Potential Improvements

1. Introduction to Time-Series Analysis

Table of Contents

A time-series data is a series of data points or observations recorded at different or regular time intervals. In general, a time series is a sequence of data points taken at equally spaced time intervals. The frequency of recorded data points may be hourly, daily, weekly, monthly, quarterly, or annually.
Time-Series Forecasting is the process of using a statistical model to predict future values of a time-series based on past results.
A time series analysis encompasses statistical methods for analyzing time series data. These methods enable us to extract meaningful statistics, patterns, and other characteristics of the data. Time series are visualized with the help of line charts. So, time series analysis involves understanding inherent aspects of the time series data so that we can create meaningful and accurate forecasts.
Applications of time series are used in statistics, finance, or business applications. A very common example of time series data is the daily closing value of the stock index like NASDAQ or Dow Jones. Other common applications of time series are sales and demand forecasting, weather forecasting, econometrics, signal processing, pattern recognition, and earthquake prediction.

Components of a Time-Series

Trend: The trend shows a general direction of the time series data over a long period of time. A trend can be increasing (upward), decreasing (downward), or horizontal (stationary).
Seasonality: The seasonality component exhibits a trend that repeats with respect to timing, direction, and magnitude. Some examples include an increase in water consumption in summer due to hot weather conditions.
Cyclical Component: These are the trends with no set repetition over a particular period. A cycle refers to the period of ups and downs, booms and slumps in a time series, mostly observed in business cycles. These cycles do not exhibit a seasonal variation but generally occur over a period of 3 to 12 years depending on the nature of the time series.
Irregular Variation: These are fluctuations in the time series data that become evident when trend and cyclical variations are removed. These variations are unpredictable, erratic, and may or may not be random.
ETS Decomposition: ETS Decomposition is used to separate different components of a time series. The term ETS stands for Error, Trend, and Seasonality.

In this notebook, we conduct time series analysis of US Air Passengers over time.

2. Types of data

Table of Contents

As stated above, the time series analysis is the statistical analysis of the time series data. A time series data means that data is recorded at different time periods or intervals. The time series data may be of three types:-

1 Time series data - The observations of the values of a variable recorded at different points in time is called time series data.

2 Cross sectional data - It is the data of one or more variables recorded at the same point in time.

3 Pooled data- It is the combination of time series data and cross sectional data.

3. Time Series terminology

Table of Contents

There are various terms and concepts in time series that we should know. These are as follows:-

1 Dependence- It refers to the association of two observations of the same variable at prior time periods.

2 Stationarity- It shows the mean value of the series that remains constant over the time period. If past effects accumulate and the values increase towards infinity then stationarity is not met.

3 Differencing- Differencing is used to make the series stationary and to control the auto-correlations. There may be some cases in time series analyses where we do not require differencing and over-differenced series can produce wrong estimates.

4 Specification - It may involve the testing of the linear or non-linear relationships of dependent variables by using time series models such as ARIMA models.

5 Exponential Smoothing - Exponential smoothing in time series analysis predicts the one next period value based on the past and current value. It involves averaging of data such that the non-systematic components of each individual case or observation cancel out each other. The exponential smoothing method is used to predict the short term prediction.

6 Curve fitting - Curve fitting regression in time series analysis is used when data is in a non-linear relationship.

7 ARIMA - ARIMA stands for Auto Regressive Integrated Moving Average.

4. Time Series Analysis

Table of Contents

4.1 Basic set up

Table of Contents

install.packages(c("fpp3", "lubridate", "dplyr", "tidyr", "forecast", "tseries", "imputeTS", "Metrics"))

## package 'fpp3' successfully unpacked and MD5 sums checked
## package 'lubridate' successfully unpacked and MD5 sums checked

## package 'dplyr' successfully unpacked and MD5 sums checked

## package 'tidyr' successfully unpacked and MD5 sums checked

## package 'forecast' successfully unpacked and MD5 sums checked

## package 'tseries' successfully unpacked and MD5 sums checked

## package 'imputeTS' successfully unpacked and MD5 sums checked

## package 'Metrics' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\ASUS\AppData\Local\Temp\Rtmp6VQ6Es\downloaded_packages

install.packages("seasonal")

## package 'seasonal' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\ASUS\AppData\Local\Temp\Rtmp6VQ6Es\downloaded_packages

# Loading core time series and data wrangling libraries
library(fpp3)         # Includes tsibble, lubridate, ggplot2, and others for time series
library(feasts)       # For time series decomposition
library(urca)         # For KPSS test
library(zoo)          # For rolling mean and std. deviation
library(tsibble)      # For working with time series data frames
library(readr)        # For reading the csv file
library(lubridate)    # For date-time handling
library(dplyr)        # Data manipulation
library(tidyr)        # Data tidying
library(forecast)     # ARIMA and forecasting models
library(tseries)      # Time series analysis functions
library(imputeTS)     # Missing value handling
library(Metrics)      # Performance evaluation metrics
library(ggplot2)      # For good visuals

4.2 Import data

Table of Contents

# Load the dataset
AirPassengers <- read.csv("~/AirPassengers.csv")

# Statistics of data:
head(AirPassengers)

##     Month X.Passengers
## 1 1949-01          112
## 2 1949-02          118
## 3 1949-03          132
## 4 1949-04          129
## 5 1949-05          121
## 6 1949-06          135

summary(AirPassengers)

##     Month            X.Passengers  
##  Length:144         Min.   :104.0  
##  Class :character   1st Qu.:180.0  
##  Mode  :character   Median :265.5  
##                     Mean   :280.3  
##                     3rd Qu.:360.5  
##                     Max.   :622.0

We should rename the column names.

colnames(AirPassengers) <- c("Date", "Number of Passengers")

head(AirPassengers)

##      Date Number of Passengers
## 1 1949-01                  112
## 2 1949-02                  118
## 3 1949-03                  132
## 4 1949-04                  129
## 5 1949-05                  121
## 6 1949-06                  135

Converting the data frame to a tsibble structure (with proper index, key and value variables)

AirPassengers_tsibble <- AirPassengers %>%
  mutate(
    Date = yearmonth(Date),
    Year = year(Date),
    Month = month(Date, label=TRUE)
  ) %>% # Since its a monthly data..
  as_tsibble(index = Date)

AirPassengers_tsibble

## # A tsibble: 144 x 4 [1M]
##        Date `Number of Passengers`  Year Month
##       <mth>                  <int> <dbl> <ord>
##  1 1949 Jan                    112  1949 Jan  
##  2 1949 Feb                    118  1949 Feb  
##  3 1949 Mar                    132  1949 Mar  
##  4 1949 Apr                    129  1949 Apr  
##  5 1949 May                    121  1949 May  
##  6 1949 Jun                    135  1949 Jun  
##  7 1949 Jul                    148  1949 Jul  
##  8 1949 Aug                    148  1949 Aug  
##  9 1949 Sep                    136  1949 Sep  
## 10 1949 Oct                    119  1949 Oct  
## # ℹ 134 more rows

4.3 Analysis of Data

4.3.1 Handling Missing Values

Check for any missing values and handling them:

sapply(AirPassengers_tsibble, function(x) sum(is.na(x)))

##                 Date Number of Passengers                 Year 
##                    0                    0                    0 
##                Month 
##                    0

No missing values are present in the dataset.

4.3.2 Data Transformation

Apply transformations, if needed, to stabilize variance or remove seasonality.

AirPassengers_tsibble <- AirPassengers_tsibble %>%
  mutate(
    LogPassengers = log(`Number of Passengers`),         # Log transformation
    DiffPassengers = difference(LogPassengers),          # First-order differencing
    SeasonalDiffPassengers = difference(LogPassengers, lag = 12)  # Seasonal differencing
  )

tail(AirPassengers_tsibble)

## # A tsibble: 6 x 7 [1M]
##       Date `Number of Passengers`  Year Month LogPassengers DiffPassengers
##      <mth>                  <int> <dbl> <ord>         <dbl>          <dbl>
## 1 1960 Jul                    622  1960 Jul            6.43         0.151 
## 2 1960 Aug                    606  1960 Aug            6.41        -0.0261
## 3 1960 Sep                    508  1960 Sep            6.23        -0.176 
## 4 1960 Oct                    461  1960 Oct            6.13        -0.0971
## 5 1960 Nov                    390  1960 Nov            5.97        -0.167 
## 6 1960 Dec                    432  1960 Dec            6.07         0.102 
## # ℹ 1 more variable: SeasonalDiffPassengers <dbl>

AirPassengers_tsibble %>%
  pivot_longer(cols = c(`Number of Passengers`, LogPassengers, DiffPassengers, SeasonalDiffPassengers),
               names_to = "Transformation", values_to = "Value") %>%
  ggplot(aes(x = Date, y = Value)) +
  geom_line() +
  facet_wrap(~ Transformation, scales = "free_y") +
  labs(title = "Transformations of US Air Passengers Time Series")

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

Insights from the Transformations

The transformations provide several insights into the US Air Passengers time series data, particularly concerning trends, seasonality, and variance stabilization:

Original Series (Number of Passengers):
- The original series shows a clear upward trend, indicating an increase in the number of passengers over time.
- There is also evident seasonality, with recurring peaks and troughs each year, likely due to seasonal travel patterns.
- The variance appears to increase over time, as later values have larger fluctuations, making it challenging to model directly.
Log Transformation (LogPassengers):
- The log transformation stabilizes the variance, reducing the impact of larger values and creating a more consistent scale over time.
- The trend and seasonality remain, but the variation across time becomes more uniform, which is helpful for models that assume constant variance.
First-Order Differencing (DiffPassengers):
- First-order differencing removes the upward trend, resulting in a series that oscillates around zero.
- This transformation helps in making the data stationary, as it eliminates long-term trends.
- However, some seasonality remains, indicating that an additional seasonal differencing step may be needed.
Seasonal Differencing (SeasonalDiffPassengers):
- Seasonal differencing (with a lag of 12 for monthly data) further removes the seasonal component, resulting in a series with more stable fluctuations and no apparent trend or seasonality.
- This transformation is crucial for making the data stationary, which is often required for time series forecasting models.

Key Takeaways

The Log Transformation effectively stabilizes the variance, making the series more suitable for models that require constant variance.
Differencing and Seasonal Differencing are effective in removing trend and seasonality, which helps make the data stationary.
After these transformations, the series is closer to being stationary, which is ideal for applying models like ARIMA for forecasting.

These insights guide us in preparing the data for modeling, as stationary data typically results in more accurate forecasts and better model performance.

4.3.3 Summary Statistics

yearly_summary <- AirPassengers_tsibble %>%
  index_by(Year) %>%                          # Aggregates data by year
  summarise(
    Mean = mean(`Number of Passengers`, na.rm = TRUE),
    Variance = var(`Number of Passengers`, na.rm = TRUE),
    Standard_Deviation = sd(`Number of Passengers`, na.rm = TRUE),
    Min = min(`Number of Passengers`, na.rm = TRUE),
    Max = max(`Number of Passengers`, na.rm = TRUE),
    Sum = sum(`Number of Passengers`, na.rm = TRUE)
  )

# Display the yearly summary statistics
yearly_summary

## # A tsibble: 12 x 7 [1Y]
##     Year  Mean Variance Standard_Deviation   Min   Max   Sum
##    <dbl> <dbl>    <dbl>              <dbl> <int> <int> <int>
##  1  1949  127.     188.               13.7   104   148  1520
##  2  1950  140.     364.               19.1   114   170  1676
##  3  1951  170.     340.               18.4   145   199  2042
##  4  1952  197      527.               23.0   171   242  2364
##  5  1953  225      810.               28.5   180   272  2700
##  6  1954  239.    1220.               34.9   188   302  2867
##  7  1955  284     1776.               42.1   233   364  3408
##  8  1956  328.    2291.               47.9   271   413  3939
##  9  1957  368.    3351.               57.9   301   467  4421
## 10  1958  381     4164.               64.5   310   505  4572
## 11  1959  428.    4876.               69.8   342   559  5140
## 12  1960  476.    6043.               77.7   390   622  5714

Provides the summary on Passengers data, grouped by each year.

5. Visualize the Time Series

Table of Contents

# Create a box plot for the number of passengers by year
ggplot(AirPassengers_tsibble, aes(x = factor(Year), y = `Number of Passengers`)) +
  geom_boxplot(show.legend = FALSE, outlier.shape = NA) +
  stat_summary(fun = "mean", geom = "point", shape = 20, color = "red", size = 2) +
  labs(title = "Year-wise Box Plot of Passenger Count", x = "Year", y = "Number of Passengers") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_x_discrete(limits = as.character(1949:1960))  # Set x-axis labels from 1949 to 1960

6. Patterns in a Time Series

Table of Contents

Any time series visualization may consist of the following components: Base Level + Trend + Seasonality + Error.

Trend

A trend is observed when there is an increasing or decreasing slope observed in the time series.

Seasonality

A seasonality is observed when there is a distinct repeated pattern observed between regular intervals due to seasonal factors. It could be because of the month of the year, the day of the month, weekdays or even time of the day.

However, It is not mandatory that all time series must have a trend and/or seasonality. A time series may not have a distinct trend but have a seasonality and vice-versa.

plot_df <- function(data, x, y, title = "", xlabel = "Date", ylabel = "Number of Passengers") {
  ggplot(data, aes(x = {{ x }}, y = {{ y }})) +
    geom_line(color = "blue") +
    labs(title = title, x = xlabel, y = ylabel) +
    theme_minimal() +
    theme(plot.title = element_text(hjust = 0.5))  # Center the title
}

# Use the function to plot the data
plot_df(AirPassengers_tsibble, x = Date, y = `Number of Passengers`, title = "Trend and Seasonality")

Cyclic behaviour

Another important thing to consider is the cyclic behaviour. It happens when the rise and fall pattern in the series does not happen in fixed calendar-based intervals. We should not confuse ‘cyclic’ effect with ‘seasonal’ effect.
If the patterns are not of fixed calendar based frequencies, then it is cyclic. Because, unlike the seasonality, cyclic effects are typically influenced by the business and other socio-economic factors.

7. Additive and Multiplicative Time Series

Table of Contents

We may have different combinations of trends and seasonality. Depending on the nature of the trends and seasonality, a time series can be modeled as an additive or multiplicative time series. Each observation in the series can be expressed as either a sum or a product of the components.

Additive time series:

Value = Base Level + Trend + Seasonality + Error

Multiplicative Time Series:

Value = Base Level x Trend x Seasonality x Error

8. Decomposition of a Time Series

Table of Contents

Decomposition of a time series can be performed by considering the series as an additive or multiplicative combination of the base level, trend, seasonal index and the residual term.
The seasonal_decompose in statsmodels implements this conveniently.

# Additive Decomposition using STL
additive_decomposition <- AirPassengers_tsibble %>%
  model(STL(`Number of Passengers` ~ season(window = "periodic"))) %>%
  components()

# Multiplicative Decomposition using X11
multiplicative_decomposition <- AirPassengers_tsibble %>%
  model(X_13ARIMA_SEATS(`Number of Passengers` ~ x11())) %>%
  components()

# Plot Additive Decomposition
autoplot(additive_decomposition) +
  labs(title = "Additive Decomposition", x = "Date", y = "Number of Passengers") +
  theme_minimal()

# Plot Multiplicative Decomposition
autoplot(multiplicative_decomposition) +
  labs(title = "Multiplicative Decomposition", x = "Date", y = "Number of Passengers") +
  theme_minimal()

decomposed <- decompose(ts(AirPassengers_tsibble$`Number of Passengers`, frequency = 12), type = "multiplicative")
plot(decomposed)

- If we look at the residuals of the additive decomposition closely, it has some pattern left over.

The multiplicative decomposition, looks quite random which is good. So ideally, multiplicative decomposition should be preferred for this particular series.

9. Stationary and Non-Stationary Time Series

Table of Contents

Now, we wil discuss Stationary and Non-Stationary Time Series. Stationarity is a property of a time series. A stationary series is one where the values of the series is not a function of time. So, the values are independent of time.
Hence the statistical properties of the series like mean, variance and autocorrelation are constant over time. Autocorrelation of the series is nothing but the correlation of the series with its previous values.
A stationary time series is independent of seasonal effects as well.
Now, we will plot some examples of stationary and non-stationary time series for clarity.

Stationary and Non-Stationary Time Series

Image source: Machine Learning Plus

We can covert any non-stationary time series into a stationary one by applying a suitable transformation. Mostly statistical forecasting methods are designed to work on a stationary time series. The first step in the forecasting process is typically to do some transformation to convert a non-stationary series to stationary.

10. How to make a time series stationary?

Table of Contents

We can apply some sort of transformation to make the time-series stationary. These transformation may include:

Differencing the Series (once or more)
Take the log of the series
Take the nth root of the series
Combination of the above

The most commonly used and convenient method to stationarize the series is by differencing the series at least once until it becomes approximately stationary.

10.1 Introduction to Differencing

Table of Contents

If Y_t is the value at time t, then the first difference of Y = Yt – Yt-1. In simpler terms, differencing the series is nothing but subtracting the next value by the current value.
If the first difference doesn’t make a series stationary, we can go for the second differencing and so on.
- For example, consider the following series: [1, 5, 2, 12, 20]
- First differencing gives: [5-1, 2-5, 12-2, 20-12] = [4, -3, 10, 8]
- Second differencing gives: [-3-4, -10-3, 8-10] = [-7, -13, -2]

10.2 Reasons to convert a non-stationary series into stationary one before forecasting

Table of Contents

There are reasons why we want to convert a non-stationary series into a stationary one. These are given below:

Forecasting a stationary series is relatively easy and the forecasts are more reliable.
An important reason is, autoregressive forecasting models are essentially linear regression models that utilize the lag(s) of the series itself as predictors.
We know that linear regression works best if the predictors (X variables) are not correlated against each other. So, stationarizing the series solves this problem since it removes any persistent autocorrelation, thereby making the predictors(lags of the series) in the forecasting models nearly independent.

11. How to test for stationarity?

Table of Contents

The stationarity of a series can be checked by looking at the plot of the series.
Another method is to split the series into 2 or more contiguous parts and computing the summary statistics like the mean, variance and the autocorrelation. If the stats are quite different, then the series is not likely to be stationary.
There are several quantitative methods we can use to determine if a given series is stationary or not. This can be done using statistical tests called Unit Root Tests. This test checks if a time series is non-stationary and possess a unit root.
There are multiple implementations of Unit Root tests but we will use:

11.1 Kwiatkowski-Phillips-Schmidt-Shin – KPSS test (trend stationary)

Table of Contents

The KPSS test, on the other hand, is used to test for trend stationarity. The null hypothesis and the P-Value interpretation is just the opposite of ADH test.
Interested readers can learn more about the KPSS test from the below links:

https://en.wikipedia.org/wiki/KPSS_test

https://www.machinelearningplus.com/time-series/kpss-test-for-stationarity/

https://www.statisticshowto.com/kpss-test/

https://nwfsc-timeseries.github.io/atsa-labs/sec-boxjenkins-kpss.html

# Defining the stationarity test function:
test_stationarity <- function(timeseries, window = 52) {
  # Calculate rolling mean and rolling standard deviation
  rolmean <- rollmean(timeseries, k = window, fill = NA)
  rolstd <- rollapply(timeseries, width = window, FUN = sd, fill = NA, align = "center")
  
  # Plot original data, rolling mean, and rolling standard deviation
  p <- ggplot() +
    geom_line(aes(x = index(timeseries), y = coredata(timeseries)), color = "blue", linewidth = 0.8, linetype = "solid") +
    geom_line(aes(x = index(timeseries), y = rolmean), color = "red", linewidth = 0.8, linetype = "dashed") +
    geom_line(aes(x = index(timeseries), y = rolstd), color = "black", linewidth = 0.8, linetype = "dotted") +
    labs(title = "Rolling Mean & Standard Deviation",
         x = "Date",
         y = "Number of Passengers") +
    theme_minimal() +
    theme(plot.title = element_text(hjust = 0.5)) +
    guides(color = guide_legend(title = NULL)) +
    theme(legend.position = "bottom") +
    labs(color = "Legend") +
    scale_color_manual(values = c("blue", "red", "black"),
                       labels = c("Original", "Rolling Mean", "Rolling Std"))
  
  # Print the plot:
  print(p)
  
  # Perform the KPSS Test:
  print("Results of KPSS Test:")
  kptest <- ur.kpss(timeseries, type = "tau")
  summary(kptest)
}

test_stationarity(AirPassengers_tsibble$`Number of Passengers`)

## Warning: Removed 51 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Removed 51 rows containing missing values or values outside the scale range
## (`geom_line()`).

## [1] "Results of KPSS Test:"

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: tau with 4 lags. 
## 
## Value of test-statistic is: 0.0961 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.119 0.146  0.176 0.216

By Observing the plot in the result of the KPSS Test, we infer the test statistic is less than critical value so we fail to reject the null hypothesis
This shows that the time series is stationary.

12. Difference between white noise and a stationary series

Table of Contents

Like a stationary series, the white noise is also not a function of time. So, its mean and variance does not change over time. But the difference is that, the white noise is completely random with a mean of 0. In white noise there is no pattern.
Mathematically, a sequence of completely random numbers with mean zero is a white noise.

# Generate 1000 random numbers from a standard normal distribution
rand_numbers <- rnorm(1000)

# Convert to a data frame for plotting
rand_df <- data.frame(Index = 1:1000, Value = rand_numbers)

# Plot the random numbers to represent white noise
ggplot(rand_df, aes(x = Index, y = Value)) +
  geom_line(color = "blue") +
  labs(title = "Random White Noise", x = "Index", y = "Value") +
  theme_minimal()

13. Model Building

Table of Contents

# Splitting the data:
train <- head(AirPassengers_tsibble$`Number of Passengers`, 120)
test <- tail(AirPassengers_tsibble$`Number of Passengers`, 24)

13.1 Simple Exponential Smoothing

# Smoothing Parameter:
alphas <- seq(0.01, 1, by=0.10)

# SES optimizer function to find the best alpha based on MAE
ses_optimizer <- function(train, test, alphas, step = 24) {
  best_alpha <- NULL
  best_mae <- Inf
  
  for (alpha in alphas) {
    # Fit the SES model with the current alpha
    ses_model <- ses(train, alpha = alpha, h = step)
    y_pred <- ses_model$mean
    
    # Calculate MAE between the forecast and the test set
    mae <- mae(test, y_pred)
    
    # Update best_alpha and best_mae if current MAE is lower
    if (mae < best_mae) {
      best_alpha <- alpha
      best_mae <- mae
    }
    cat("Alpha:", round(alpha, 2), "MAE:", round(mae, 4), "\n")
  }
  
  cat("Best Alpha is:", round(best_alpha, 2), "with Best MAE:", round(best_mae, 4), "\n")
  return(list(best_alpha = best_alpha, best_mae = best_mae))
}

# Find the best alpha
opt_results <- ses_optimizer(train, test, alphas)

## Alpha: 0.01 MAE: 190.126 
## Alpha: 0.11 MAE: 82.528 
## Alpha: 0.21 MAE: 82.8979 
## Alpha: 0.31 MAE: 89.8377 
## Alpha: 0.41 MAE: 99.0585 
## Alpha: 0.51 MAE: 107.5558 
## Alpha: 0.61 MAE: 113.7514 
## Alpha: 0.71 MAE: 117.2224 
## Alpha: 0.81 MAE: 118.1776 
## Alpha: 0.91 MAE: 117.2438 
## Best Alpha is: 0.11 with Best MAE: 82.528

best_ses_alpha <- opt_results$best_alpha

# Fit the final SES model with the best alpha
final_ses_model <- ses(train, alpha = best_ses_alpha, h = 24)
y_pred_ses <- final_ses_model$mean
forecast_errors <- test - y_pred_ses
forecast_se <- sd(forecast_errors)

# Prediction intervals
lower_bounds <- y_pred_ses - 1.96 * forecast_se
upper_bounds <- y_pred_ses + 1.96 * forecast_se

# Plotting function
plot_ses <- function(train, test, y_pred, lower_bounds, upper_bounds, title = "Simple Exponential Smoothing Forecast") {
  # Create a combined data frame for plotting
  df <- data.frame(
    Date = c(1:length(train), (length(train) + 1):(length(train) + length(test))),
    Passengers = c(train, test),
    DataType = c(rep("Train", length(train)), rep("Test", length(test)))
  )
  
  # Predicted values data frame
  df_pred <- data.frame(
    Date = (length(train) + 1):(length(train) + length(y_pred)),
    Passengers = y_pred,
    Lower = lower_bounds,
    Upper = upper_bounds,
    DataType = rep("Prediction", length(y_pred))
  )
  
  # Plot the training, test, and predicted data
  p <- ggplot() +
    geom_line(data = df, aes(x = Date, y = Passengers, color = DataType), size = 1) +
    geom_line(data = df_pred, aes(x = Date, y = Passengers, color = DataType), size = 1) +
    geom_ribbon(data = df_pred, aes(x = Date, ymin = Lower, ymax = Upper, fill = DataType), alpha = 0.3) +
    labs(title = title, x = "Time", y = "Number of Passengers") +
    scale_color_manual(values = c("Train" = "blue", "Test" = "green", "Prediction" = "red")) +
    theme_minimal() +
    theme(legend.position = "bottom")
  
  print(p)
}

# Plot the SES forecast
plot_ses(train, test, y_pred_ses, lower_bounds, upper_bounds, title = "Simple Exponential Smoothing Forecast")

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

13.2 Double Exponential Smoothing

# Smoothing parameters:
alphas <- seq(0.01, 1, by = 0.10)
betas <- seq(0.01, 1, by = 0.10)

# Double Exponential Smoothing optimizer function with error handling
des_optimizer <- function(train, test, alphas, betas, step = 24) {
  best_alpha <- NULL
  best_beta <- NULL
  best_mae <- Inf
  
  for (alpha in alphas) {
    for (beta in betas) {
      # Try fitting the model and handle errors
      tryCatch({
        # Fit the Double Exponential Smoothing model with current alpha and beta
        des_model <- holt(train, alpha = alpha, beta = beta, h = step, damped = FALSE, trend = "additive")
        y_pred <- des_model$mean
        
        # Calculate MAE between forecast and test set
        mae <- mae(test, y_pred)
        
        # Update best_alpha, best_beta, and best_mae if current MAE is lower
        if (mae < best_mae) {
          best_alpha <- alpha
          best_beta <- beta
          best_mae <- mae
        }
        cat("Alpha:", round(alpha, 2), "Beta:", round(beta, 2), "MAE:", round(mae, 4), "\n")
      }, error = function(e) {
        # Skip invalid models and print a message
        cat("Alpha:", round(alpha, 2), "Beta:", round(beta, 2), "could not fit a valid model.\n")
      })
    }
  }
  
  cat("Best Alpha:", round(best_alpha, 2), "Best Beta:", round(best_beta, 2), "Best MAE:", round(best_mae, 4), "\n")
  return(list(best_alpha = best_alpha, best_beta = best_beta, best_mae = best_mae))
}

# Find the best alpha and beta
opt_results <- des_optimizer(train, test, alphas, betas)

## Alpha: 0.01 Beta: 0.01 MAE: 58.3642 
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.01 Beta: 0.11 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.01 Beta: 0.21 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.01 Beta: 0.31 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.01 Beta: 0.41 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.01 Beta: 0.51 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.01 Beta: 0.61 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.01 Beta: 0.71 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.01 Beta: 0.81 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.01 Beta: 0.91 could not fit a valid model.
## Alpha: 0.11 Beta: 0.01 MAE: 56.2696 
## Alpha: 0.11 Beta: 0.11 MAE: 68.4861 
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.11 Beta: 0.21 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.11 Beta: 0.31 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.11 Beta: 0.41 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.11 Beta: 0.51 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.11 Beta: 0.61 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.11 Beta: 0.71 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.11 Beta: 0.81 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.11 Beta: 0.91 could not fit a valid model.
## Alpha: 0.21 Beta: 0.01 MAE: 60.0355 
## Alpha: 0.21 Beta: 0.11 MAE: 185.5454 
## Alpha: 0.21 Beta: 0.21 MAE: 865.9462 
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.21 Beta: 0.31 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.21 Beta: 0.41 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.21 Beta: 0.51 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.21 Beta: 0.61 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.21 Beta: 0.71 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.21 Beta: 0.81 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.21 Beta: 0.91 could not fit a valid model.
## Alpha: 0.31 Beta: 0.01 MAE: 69.0191 
## Alpha: 0.31 Beta: 0.11 MAE: 250.9023 
## Alpha: 0.31 Beta: 0.21 MAE: 724.9144 
## Alpha: 0.31 Beta: 0.31 MAE: 1054.183 
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.31 Beta: 0.41 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.31 Beta: 0.51 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.31 Beta: 0.61 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.31 Beta: 0.71 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.31 Beta: 0.81 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.31 Beta: 0.91 could not fit a valid model.
## Alpha: 0.41 Beta: 0.01 MAE: 78.7054 
## Alpha: 0.41 Beta: 0.11 MAE: 272.5327 
## Alpha: 0.41 Beta: 0.21 MAE: 607.6325 
## Alpha: 0.41 Beta: 0.31 MAE: 840.2858 
## Alpha: 0.41 Beta: 0.41 MAE: 842.9667 
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.41 Beta: 0.51 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.41 Beta: 0.61 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.41 Beta: 0.71 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.41 Beta: 0.81 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.41 Beta: 0.91 could not fit a valid model.
## Alpha: 0.51 Beta: 0.01 MAE: 87.9837 
## Alpha: 0.51 Beta: 0.11 MAE: 269.3238 
## Alpha: 0.51 Beta: 0.21 MAE: 513.0025 
## Alpha: 0.51 Beta: 0.31 MAE: 678.7275 
## Alpha: 0.51 Beta: 0.41 MAE: 683.8468 
## Alpha: 0.51 Beta: 0.51 MAE: 568.958 
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.51 Beta: 0.61 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.51 Beta: 0.71 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.51 Beta: 0.81 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.51 Beta: 0.91 could not fit a valid model.
## Alpha: 0.61 Beta: 0.01 MAE: 94.4417 
## Alpha: 0.61 Beta: 0.11 MAE: 253.8162 
## Alpha: 0.61 Beta: 0.21 MAE: 434.8159 
## Alpha: 0.61 Beta: 0.31 MAE: 552.246 
## Alpha: 0.61 Beta: 0.41 MAE: 551.2462 
## Alpha: 0.61 Beta: 0.51 MAE: 448.5929 
## Alpha: 0.61 Beta: 0.61 MAE: 274.5915 
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.61 Beta: 0.71 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.61 Beta: 0.81 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.61 Beta: 0.91 could not fit a valid model.
## Alpha: 0.71 Beta: 0.01 MAE: 97.7789 
## Alpha: 0.71 Beta: 0.11 MAE: 233.3323 
## Alpha: 0.71 Beta: 0.21 MAE: 369.8343 
## Alpha: 0.71 Beta: 0.31 MAE: 453.105 
## Alpha: 0.71 Beta: 0.41 MAE: 447.3636 
## Alpha: 0.71 Beta: 0.51 MAE: 361.3254 
## Alpha: 0.71 Beta: 0.61 MAE: 220.8079 
## Alpha: 0.71 Beta: 0.71 MAE: 69.0392 
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.71 Beta: 0.81 could not fit a valid model.
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.71 Beta: 0.91 could not fit a valid model.
## Alpha: 0.81 Beta: 0.01 MAE: 98.2514 
## Alpha: 0.81 Beta: 0.11 MAE: 211.7285 
## Alpha: 0.81 Beta: 0.21 MAE: 316.1826 
## Alpha: 0.81 Beta: 0.31 MAE: 375.9728 
## Alpha: 0.81 Beta: 0.41 MAE: 368.3713 
## Alpha: 0.81 Beta: 0.51 MAE: 298.9904 
## Alpha: 0.81 Beta: 0.61 MAE: 187.98 
## Alpha: 0.81 Beta: 0.71 MAE: 68.8423 
## Alpha: 0.81 Beta: 0.81 MAE: 102.2852 
## [1] "Model: ETS(A,A,N)"
## Alpha: 0.81 Beta: 0.91 could not fit a valid model.
## Alpha: 0.91 Beta: 0.01 MAE: 96.6515 
## Alpha: 0.91 Beta: 0.11 MAE: 190.9407 
## Alpha: 0.91 Beta: 0.21 MAE: 272.1183 
## Alpha: 0.91 Beta: 0.31 MAE: 315.9012 
## Alpha: 0.91 Beta: 0.41 MAE: 308.1503 
## Alpha: 0.91 Beta: 0.51 MAE: 252.7779 
## Alpha: 0.91 Beta: 0.61 MAE: 164.5791 
## Alpha: 0.91 Beta: 0.71 MAE: 68.7598 
## Alpha: 0.91 Beta: 0.81 MAE: 83.6097 
## Alpha: 0.91 Beta: 0.91 MAE: 159.7716 
## Best Alpha: 0.11 Best Beta: 0.01 Best MAE: 56.2696

best_des_alpha <- opt_results$best_alpha
best_des_beta <- opt_results$best_beta

# Fit the final DES model with the best alpha and beta
final_des_model <- holt(train, alpha = best_des_alpha, beta = best_des_beta, h = 24, damped = FALSE, trend = "additive")
y_pred_des <- final_des_model$mean

# After fitting the final DES model
forecast_errors_des <- test - y_pred_des
forecast_se_des <- sd(forecast_errors_des)

# Prediction intervals
lower_bounds_des <- y_pred_des - 1.96 * forecast_se_des
upper_bounds_des <- y_pred_des + 1.96 * forecast_se_des

# Plotting function for Double Exponential Smoothing
plot_des <- function(train, test, y_pred, lower_bounds, upper_bounds, title = "Double Exponential Smoothing Forecast") {
  # Create a combined data frame for plotting
  df <- data.frame(
    Date = c(1:length(train), (length(train) + 1):(length(train) + length(test))),
    Passengers = c(train, test),
    DataType = c(rep("Train", length(train)), rep("Test", length(test)))
  )
  
  # Predicted values data frame
  df_pred <- data.frame(
    Date = (length(train) + 1):(length(train) + length(y_pred)),
    Passengers = y_pred,
    Lower = lower_bounds,
    Upper = upper_bounds,
    DataType = rep("Prediction", length(y_pred))
  )
  
  # Plot the training, test, and predicted data
  p <- ggplot() +
    geom_line(data = df, aes(x = Date, y = Passengers, color = DataType), size = 1) +
    geom_line(data = df_pred, aes(x = Date, y = Passengers, color = DataType), size = 1) +
    geom_ribbon(data = df_pred, aes(x = Date, ymin = Lower, ymax = Upper, fill = DataType), alpha = 0.3) +
    labs(title = title, x = "Time", y = "Number of Passengers") +
    scale_color_manual(values = c("Train" = "blue", "Test" = "green", "Prediction" = "red")) +
    theme_minimal() +
    theme(legend.position = "bottom")
  
  print(p)
}

# Plot the DES forecast
plot_des(train, test, y_pred_des, lower_bounds_des, upper_bounds_des, title = "DES MODEL")

13.3 Triple Exponential Smoothing

# Smoothing Parameters:
alpha <- seq(0.01, 1, by = 0.10)
beta <- seq(0.01, 1, by = 0.10)
gamma <- seq(0.01, 1, by = 0.10)

# Create combinations of alpha, beta, gamma
abg <- expand.grid(alpha = alpha, beta = beta, gamma = gamma)

# Monthly data: two full cycles
train_ts <- ts(train, frequency = 12)

# Triple Exponential Smoothing optimizer function using HoltWinters
tes_optimizer <- function(train_ts, abg, step = 24, ci_level = c(80, 95)) {
  best_model <- NULL
  best_mae <- Inf
  
  for(i in 1:nrow(abg)) {
    comb <- abg[i, ]
    tes_model <- tryCatch({HoltWinters(train_ts, 
                             alpha = comb$alpha, 
                             beta = comb$beta, 
                             gamma = comb$gamma, 
                             seasonal = "additive")
    }, error = function(e) return(NULL))
    
    if (!is.null(tes_model)) {
      tes_forecast <- forecast(tes_model, h = step, level = ci_level)
      mae <- mean(abs(tes_forecast$mean - test))  # Ensure 'test' is defined or adapt as needed
      
      if (mae < best_mae) {
        best_model <- tes_model
        best_mae <- mae
        best_alpha <- comb$alpha
        best_beta <- comb$beta
        best_gamma <- comb$gamma
      }
    }
  }
  
  cat(sprintf("Best Alpha: %.2f, Best Beta: %.2f, Best Gamma: %.2f, Best MAE: %.4f\n", 
              best_alpha, best_beta, best_gamma, best_mae))
  
  return(list(best_model = best_model, best_mae = best_mae))
}

results <- tes_optimizer(train_ts, abg)

## Best Alpha: 0.21, Best Beta: 0.41, Best Gamma: 0.61, Best MAE: 9.8513

# Final model using the best parameters
final_model <- results$best_model
final_forecast <- forecast(final_model, h = 24)

# Plotting function for Triple Exponential Smoothing
plot_tes <- function(train, test, forecast_obj, title = "Triple Exponential Smoothing Forecast") {
  # Extract prediction and confidence intervals
  y_pred <- forecast_obj$mean
  lower <- forecast_obj$lower[,2]
  upper <- forecast_obj$upper[,2]
  
  # Lengths of each data segment
  train_len <- length(train)
  test_len <- length(test)
  pred_len <- length(y_pred)
  
  train_dates <- 1:train_len
  forecast_dates <- (train_len + 1):(train_len + pred_len)
  
  # Create a combined data frame for plotting
  # Data frames for plotting
  df_train <- data.frame(Date = train_dates, Passengers = train, DataType = "Train")
  df_test <- data.frame(Date = forecast_dates, Passengers = test, DataType = "Test")
  df_pred <- data.frame(Date = forecast_dates, Passengers = y_pred, DataType = "Prediction")
  df_conf_int <- data.frame(Date = forecast_dates, Lower = lower, Upper = upper)
  
  # Combine data frames
  df_all <- rbind(df_train, df_test, df_pred)
  
  # Plot the training, test, and predicted data
  ggplot(df_all, aes(x = Date, y = Passengers, color = DataType)) +
    geom_line() +
    geom_ribbon(data = df_conf_int, aes(ymin = Lower, ymax = Upper, x = Date), fill = "orange", alpha = 0.3, inherit.aes = FALSE) +
    labs(title = title, x = "Time", y = "Number of Passengers") +
    scale_color_manual(values = c("Train" = "blue", "Test" = "green", "Prediction" = "red")) +
    theme_minimal() +
    theme(legend.position = "bottom")
}

plot_tes(train, test, final_forecast, title = "TES Model Forecast with 95% confidence interval")

13.4 Seasonal AutoRegressive Integrated Moving Average(SARIMA)

auto_fit <- auto.arima(train, seasonal=TRUE, stepwise=FALSE, approximation=FALSE, trace=TRUE)

## 
##  ARIMA(0,1,0)                    : 1138.843
##  ARIMA(0,1,0) with drift         : 1140.395
##  ARIMA(0,1,1)                    : 1127.123
##  ARIMA(0,1,1) with drift         : 1128.877
##  ARIMA(0,1,2)                    : 1127.182
##  ARIMA(0,1,2) with drift         : 1128.701
##  ARIMA(0,1,3)                    : 1125.796
##  ARIMA(0,1,3) with drift         : Inf
##  ARIMA(0,1,4)                    : 1108.02
##  ARIMA(0,1,4) with drift         : Inf
##  ARIMA(0,1,5)                    : 1106.023
##  ARIMA(0,1,5) with drift         : Inf
##  ARIMA(1,1,0)                    : 1130.76
##  ARIMA(1,1,0) with drift         : 1132.545
##  ARIMA(1,1,1)                    : 1125.638
##  ARIMA(1,1,1) with drift         : 1127.299
##  ARIMA(1,1,2)                    : 1118.45
##  ARIMA(1,1,2) with drift         : Inf
##  ARIMA(1,1,3)                    : 1120.32
##  ARIMA(1,1,3) with drift         : Inf
##  ARIMA(1,1,4)                    : 1110.116
##  ARIMA(1,1,4) with drift         : Inf
##  ARIMA(2,1,0)                    : 1127.053
##  ARIMA(2,1,0) with drift         : 1128.688
##  ARIMA(2,1,1)                    : 1113.397
##  ARIMA(2,1,1) with drift         : Inf
##  ARIMA(2,1,2)                    : Inf
##  ARIMA(2,1,2) with drift         : 1087.779
##  ARIMA(2,1,3)                    : 1119.766
##  ARIMA(2,1,3) with drift         : Inf
##  ARIMA(3,1,0)                    : 1126.612
##  ARIMA(3,1,0) with drift         : 1128.075
##  ARIMA(3,1,1)                    : 1114.967
##  ARIMA(3,1,1) with drift         : Inf
##  ARIMA(3,1,2)                    : 1113.326
##  ARIMA(3,1,2) with drift         : Inf
##  ARIMA(4,1,0)                    : 1119.263
##  ARIMA(4,1,0) with drift         : 1119.784
##  ARIMA(4,1,1)                    : 1118.672
##  ARIMA(4,1,1) with drift         : 1119.764
##  ARIMA(5,1,0)                    : 1121.282
##  ARIMA(5,1,0) with drift         : 1121.966
## 
## 
## 
##  Best model: ARIMA(2,1,2) with drift

# the selected model
summary(auto_fit)

## Series: train 
## ARIMA(2,1,2) with drift 
## 
## Coefficients:
##          ar1      ar2      ma1     ma2   drift
##       1.6431  -0.9066  -1.8743  0.9688  2.1761
## s.e.  0.0371   0.0361   0.0572  0.0595  0.7107
## 
## sigma^2 = 488.2:  log likelihood = -537.51
## AIC=1087.03   AICc=1087.78   BIC=1103.7
## 
## Training set error measures:
##                     ME     RMSE      MAE        MPE     MAPE     MASE
## Training set 0.3246097 21.53481 17.21691 -0.5203958 7.005998 0.776948
##                    ACF1
## Training set 0.08174572

# Forecasting
forecasted_values <- forecast(auto_fit, h=24)

# Check diagnostics
checkresiduals(auto_fit)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(2,1,2) with drift
## Q* = 47.412, df = 6, p-value = 1.548e-08
## 
## Model df: 4.   Total lags used: 10

# Plotting function for SARIMA
plot_sarima <- function(train, test, best_fit, h = 24, title = "SARIMA Forecast") {
  # Forecast future values
  forecasted_values <- forecast(best_fit, h = h)
  
  # Lengths of each data segment
  train_len <- length(train)
  test_len <- length(test)
  forecast_dates <- (train_len + 1):(train_len + h)
  
  # Create a combined data frame for plotting
  df_train <- data.frame(
    Date = 1:train_len,
    Passengers = train,
    DataType = "Train"
  )
  
  df_test <- data.frame(
    Date = (train_len + 1):(train_len + test_len),
    Passengers = test,
    DataType = "Test"
  )
  
  df_pred <- data.frame(
    Date = (train_len + 1):(train_len + length(forecasted_values$mean)),
    Passengers = forecasted_values$mean,
    DataType = "Prediction"
  )
  
  # Data frames for confidence intervals
  df_conf_int <- data.frame(
    Date = forecast_dates,
    Lower = forecasted_values$lower[, "95%"],
    Upper = forecasted_values$upper[, "95%"]
  )
  
  # Combine data frames
  df_all <- rbind(df_train, df_test, df_pred)
  
  # Plot the training, test, and predicted data
  p <- ggplot(df_all, aes(x = Date, y = Passengers, color = DataType)) +
    geom_line() +
    geom_ribbon(data = df_conf_int, aes(ymin = Lower, ymax = Upper, x = Date), fill = "orange", alpha = 0.3, inherit.aes = FALSE) +
    labs(title = title, x = "Time", y = "Number of Passengers") +
    scale_color_manual(values = c("Train" = "blue", "Test" = "green", "Prediction" = "red")) +
    theme_minimal() +
    theme(legend.position = "bottom")
  
  print(p)
}

plot_sarima(train, test, best_fit = auto_fit)

14. Conclusion and Insights

Table of Contents

14.1 Summary

Summary and Analysis of Time Series Forecasting Models

In the evaluation of various time series forecasting models—Simple Exponential Smoothing (SES), Double Exponential Smoothing (DES), Triple Exponential Smoothing (TES), and Seasonal Autoregressive Integrated Moving Average (SARIMA)—Triple Exponential Smoothing emerged as the most effective model. The assessment was based on the accuracy of forecasting future data points in a given time series, where TES demonstrated superior performance compared to its counterparts.

Reasons for the Superior Performance of TES:

Incorporation of Seasonality: Unlike SES and DES, TES includes a seasonal component, which is crucial for datasets exhibiting regular seasonal variations. This allows TES to adjust its forecasts to align with periodic fluctuations, making it more accurate for data with inherent seasonality.
Level, Trend, and Seasonality Modeling: TES extends beyond DES by not only accounting for the level and trend components of the series but also explicitly modeling the seasonal variations. This comprehensive approach provides a more nuanced understanding and adjustment for changes in the data, enhancing predictive accuracy.
Flexibility in Parameters: TES allows for the adjustment of parameters related to level, trend, and seasonality. This flexibility enables the model to be finely tuned to the specific characteristics of the data, such as smoothing parameters and the type of seasonal patterns (additive or multiplicative), thus optimizing performance.
Robustness to Data Irregularities: The additional parameters and components in TES make it robust against anomalies and abrupt changes in the dataset, unlike simpler models like SES and DES, which might either underfit or overfit in the presence of noise and outliers.

In conclusion, Triple Exponential Smoothing’s superior performance can be attributed to its comprehensive approach to modeling time series data, encompassing level, trend, and seasonality. This holistic model structure allows TES to deliver more accurate forecasts, particularly in data environments where seasonal patterns play a significant role. The adaptability and robustness of TES make it an ideal choice for complex forecasting tasks where simpler models might falter.

14.2 Potential Improvements

Suggestions for Potential Improvements

To enhance the effectiveness of the Triple Exponential Smoothing (TES) model or the overall forecasting approach, consider implementing the following strategies:

Model Ensembling: Combine the strengths of TES with other forecasting models, such as SARIMA or machine learning methods like Random Forests or Gradient Boosting Machines. Ensembling can improve forecast accuracy by reducing variance and bias.
Incorporating Exogenous Variables: If external factors significantly impact the time series, incorporating these as exogenous variables could enhance the model’s accuracy. For example, adding variables related to economic indicators, weather conditions, or special events might yield better forecasts.
Error Analysis: Perform a detailed analysis of forecast errors to pinpoint where the model excels or fails. This insight can guide modifications to the model or its parameters, addressing any discovered biases or systematic errors.
Increasing Data Granularity: If possible, using data with higher frequency (e.g., switching from monthly to weekly data) could provide more detailed insights into seasonal patterns, potentially improving the model’s predictive accuracy.
Model Updating: Regular updates to the model with new data ensure it remains relevant and accurately reflects any shifts in trends or seasonality, crucial for maintaining forecasting accuracy over time.
Robustness Checks: Conduct stress tests and robustness checks by simulating various scenarios to evaluate how the model performs under different conditions. These tests can reveal weaknesses in the model’s assumptions and improve its reliability.
Advanced Time Series Techniques: Investigate advanced time series methodologies such as state space models or Bayesian approaches. These techniques may offer superior flexibility and robustness, particularly in handling complex or non-linear time series data.

By implementing these suggestions, the forecasting process can be significantly enhanced, leading to more reliable and precise predictions.

Time Series Analysis Group-1

Karan Kumawat(220150005)(k.kumawat@iitg.ac.in) Rishab Sonthalia(220150035)(s.rishab@iitg.ac.in) Sahil Kumar(220150017)(k.sahil@iitg.ac.in)

2024-11-09

Table of Contents

1. Introduction to Time-Series Analysis

Components of a Time-Series

2. Types of data

3. Time Series terminology

4. Time Series Analysis

4.1 Basic set up

4.2 Import data

4.3 Analysis of Data

4.3.1 Handling Missing Values

4.3.2 Data Transformation

Insights from the Transformations

Key Takeaways

4.3.3 Summary Statistics

5. Visualize the Time Series

6. Patterns in a Time Series

Trend

Seasonality

Cyclic behaviour

7. Additive and Multiplicative Time Series

Additive time series:

Multiplicative Time Series:

8. Decomposition of a Time Series

9. Stationary and Non-Stationary Time Series

10. How to make a time series stationary?

10.1 Introduction to Differencing

10.2 Reasons to convert a non-stationary series into stationary one before forecasting

11. How to test for stationarity?

11.1 Kwiatkowski-Phillips-Schmidt-Shin – KPSS test (trend stationary)

12. Difference between white noise and a stationary series

13. Model Building

13.1 Simple Exponential Smoothing

13.2 Double Exponential Smoothing

13.3 Triple Exponential Smoothing

13.4 Seasonal AutoRegressive Integrated Moving Average(SARIMA)

14. Conclusion and Insights

14.1 Summary

Summary and Analysis of Time Series Forecasting Models

Reasons for the Superior Performance of TES:

14.2 Potential Improvements

Suggestions for Potential Improvements