Assignment2: Time series analysis and forcasting with Nottem dataset

Load Required Packages:

#install.packages("forecast")
#install.packages("fpp3")
#install.packages("ggfortify")
#install.packages("ggplot2")
#install.packages("tsibble")
#install.packages("tseries")

library(forecast)
library(fpp3)
library(tseries)
library(itsmr)
library(ggfortify)
library(ggplot2)
library(tsibble)
library(seasonal)

Select Dataset:

select_dataset <- function(ID) {
  if(ID %% 6 == 0) {
    print("nottem dataset")
    
    if(ID %% 3 == 0) {
      print("Title 1: Forecasting with seasonal adjustment using classical decomposition - An application to Average Monthly Temperatures at Nottingham, 1920–1939")
    }
      
  } else if(ID %% 6 == 1) {
    print("USAccDeaths")
  } else if(ID %% 6 == 2) {
    print("austres")
  } else if(ID %% 6 == 3) {
    print("UKgas")
  } else if(ID %% 6 == 4) {
    print("AirPassengers")
  } else if(ID %% 6 == 5) {
    print("UKDriverDeaths")
  }
}
select_dataset(18)

## [1] "nottem dataset"
## [1] "Title 1: Forecasting with seasonal adjustment using classical decomposition - An application to Average Monthly Temperatures at Nottingham, 1920–1939"

?nottem

nottem {datasets}: R Documentation: Average Monthly Temperatures at Nottingham, 1920–1939
Description: A time series object containing average air temperatures at Nottingham Castle in degrees Fahrenheit for 20 years.

nottem

##       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
## 1920 40.6 40.8 44.4 46.7 54.1 58.5 57.7 56.4 54.3 50.5 42.9 39.8
## 1921 44.2 39.8 45.1 47.0 54.1 58.7 66.3 59.9 57.0 54.2 39.7 42.8
## 1922 37.5 38.7 39.5 42.1 55.7 57.8 56.8 54.3 54.3 47.1 41.8 41.7
## 1923 41.8 40.1 42.9 45.8 49.2 52.7 64.2 59.6 54.4 49.2 36.3 37.6
## 1924 39.3 37.5 38.3 45.5 53.2 57.7 60.8 58.2 56.4 49.8 44.4 43.6
## 1925 40.0 40.5 40.8 45.1 53.8 59.4 63.5 61.0 53.0 50.0 38.1 36.3
## 1926 39.2 43.4 43.4 48.9 50.6 56.8 62.5 62.0 57.5 46.7 41.6 39.8
## 1927 39.4 38.5 45.3 47.1 51.7 55.0 60.4 60.5 54.7 50.3 42.3 35.2
## 1928 40.8 41.1 42.8 47.3 50.9 56.4 62.2 60.5 55.4 50.2 43.0 37.3
## 1929 34.8 31.3 41.0 43.9 53.1 56.9 62.5 60.3 59.8 49.2 42.9 41.9
## 1930 41.6 37.1 41.2 46.9 51.2 60.4 60.1 61.6 57.0 50.9 43.0 38.8
## 1931 37.1 38.4 38.4 46.5 53.5 58.4 60.6 58.2 53.8 46.6 45.5 40.6
## 1932 42.4 38.4 40.3 44.6 50.9 57.0 62.1 63.5 56.3 47.3 43.6 41.8
## 1933 36.2 39.3 44.5 48.7 54.2 60.8 65.5 64.9 60.1 50.2 42.1 35.8
## 1934 39.4 38.2 40.4 46.9 53.4 59.6 66.5 60.4 59.2 51.2 42.8 45.8
## 1935 40.0 42.6 43.5 47.1 50.0 60.5 64.6 64.0 56.8 48.6 44.2 36.4
## 1936 37.3 35.0 44.0 43.9 52.7 58.6 60.0 61.1 58.1 49.6 41.6 41.3
## 1937 40.8 41.0 38.4 47.4 54.1 58.6 61.4 61.8 56.3 50.9 41.4 37.1
## 1938 42.1 41.2 47.3 46.6 52.4 59.0 59.6 60.4 57.0 50.7 47.8 39.2
## 1939 39.4 40.9 42.4 47.8 52.4 58.0 60.7 61.8 58.2 46.7 46.6 37.8

Outlier detection:

tsoutliers(nottem)

## $index
## integer(0)
## 
## $replacements
## numeric(0)

nottem dataset is cleaned. There is no outliers.

1. Plot the series (use time-plot, ACF plot and seasonal-subseries plot) and describe its main features.

Seasonal-subseries plot:

ggsubseriesplot(nottem)

Description of nottem data’s main features using timeplot and Seasonal-subseries plot:
Timeplot:
1. Trend: There is no apparent trend in the data over this period.
2. Seasonality: Additive seasonality.
3. Stationarity: Nottem data is stationary because mean are same and variance are also same. But will do further tests to confirm its stationarity.

ACF plot: The plot of sample autocorrelation coefficient against lag.Just as correlation measures the extent of a linear relationship between two variables, autocorrelation measures the linear relationship between lagged values of a time series.
1. Trend: When data have a trend, the autocorrelations for the small lags tend to be larger and positive. So, The nottem timeseries exhibits no trend.
2. Seasonality: When data are seasonal, the autocorrelations will be larger at the seasonal lags (at multiples of the seasonal frequency) than for other lags. In nottem data the autocorrelations spike are larger at 1, 6, 12, … showing seasonality.
3. iid noise: Time series that show no autocorrelation are called white noise. It is common to plot these bounds on a graph of the ACF (the blue dashed lines above). If one or more large spikes are outside these bounds, or if substantially more than 5% of spikes are outside these bounds, then the series is probably not white noise. So, nottem time series is not white noise.

Seasonal-subseries plot:
An alternative plot that emphasises the seasonal patterns is where the data for each season are collected together in separate mini time plots.

The horizontal lines indicate the means for each month. This form of plot enables the underlying seasonal pattern to be seen clearly, and also shows the changes in seasonality over time. It is especially useful in identifying changes within particular seasons. In this example, the plot is not particularly revealing; but in some cases, this is the most useful way of viewing seasonal changes over time.Temperature is highest during the middle of the year in July and lowest in the beginning and end of the year that is in February and December.

#### 2. Decompose your data and get the seasonal adjusted data. Use any suitable transformation (if required for your study seasonal adjusted data) and take the suitable number of differences (if required) to get a stationary series.

ADF (Augmented Dickey-Fuller) Test:
In order to test the stationarity of the time series, let’s run the Augmented Dickey-Fuller Test using the adf.test function from the tseries R package.

First set the hypothesis test:
The null hypothesis H0 : that the time series is non stationary
The alternative hypothesis HA : that the time series is stationary

adf.test(nottem)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  nottem
## Dickey-Fuller = -12.998, Lag order = 6, p-value = 0.01
## alternative hypothesis: stationary

As a rule of thumb, where the p-value is less than 5%, we strong evidence against the null hypothesis, so we reject the null hypothesis. As per the test results above, the p-value is 0.01 which is <0.05 therefore we reject the null in favour of the alternative hypothesis that the time series is stationary.

KPSS (Kwiatkowski-Phillips-Schmidt-Shin) Test:
In order to test the stationarity of the time series, let’s run the Kwiatkowski-Phillips-Schmidt-Shin Test using the kpss.test function from the tseries R package.

First set the hypothesis test:
The null hypothesis H0 : that the time series is stationary
The alternative hypothesis HA : that the time series is non stationary

kpss.test(nottem)

## 
##  KPSS Test for Level Stationarity
## 
## data:  nottem
## KPSS Level = 0.032053, Truncation lag parameter = 4, p-value = 0.1

As a rule of thumb, where the p-value is greater than 5%, we do not reject the null hypothesis, so we accept the null hypothesis. As per the test results above, the p-value is 0.1 which is >0.05 therefore we accept the null hypothesis that the time series is stationary.

Stationary time series have constant mean and constant variance. It means that for any observation the value will not be influenced by trend or seasonality.

Any Transformation will not be necessary for nottem data because nottem data variance is constant (stationary).

And we don’t need to take differences to get a stationary series.

Classical Decomposition for additive seasonality to get seasonal adjusted data:

decompose_nottem <- decompose(nottem, type="additive")
plot(decompose_nottem)

Seasonal Data:

seasonal_data <- decompose_nottem$seasonal
#seasonal_data
plot(seasonal_data)

# Substructing seasonality from training data
seasonal_adj_data <- nottem - decompose_nottem$seasonal
#seasonal_adj_data

Splitting the data into training (70%) and test (30%) data sets:

len_nottem <- length(nottem)
cat("Length of nottem dataset = ", len_nottem)

## Length of nottem dataset =  240

cat("\n")

traindatacount <- (70 * len_nottem) / 100
cat("Length of training dataset = ", traindatacount)

## Length of training dataset =  168

cat("\n")

testdatacount <- (30 * len_nottem) / 100
cat("Length of training dataset = ", testdatacount)

## Length of training dataset =  72

training_nottem <- ts(seasonal_adj_data[1:168], frequency = 12, start = c(1920,1))
training_nottem

##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 1920 49.93936 50.69989 51.34660 49.45735 50.64660 49.51349 44.73279 44.94090
## 1921 53.53936 49.69989 52.04660 49.75735 50.64660 49.71349 53.33279 48.44090
## 1922 46.83936 48.59989 46.44660 44.85735 52.24660 48.81349 43.83279 42.84090
## 1923 51.13936 49.99989 49.84660 48.55735 45.74660 43.71349 51.23279 48.14090
## 1924 48.63936 47.39989 45.24660 48.25735 49.74660 48.71349 47.83279 46.74090
## 1925 49.33936 50.39989 47.74660 47.85735 50.34660 50.41349 50.53279 49.54090
## 1926 48.53936 53.29989 50.34660 51.65735 47.14660 47.81349 49.53279 50.54090
## 1927 48.73936 48.39989 52.24660 49.85735 48.24660 46.01349 47.43279 49.04090
## 1928 50.13936 50.99989 49.74660 50.05735 47.44660 47.41349 49.23279 49.04090
## 1929 44.13936 41.19989 47.94660 46.65735 49.64660 47.91349 49.53279 48.84090
## 1930 50.93936 46.99989 48.14660 49.65735 47.74660 51.41349 47.13279 50.14090
## 1931 46.43936 48.29989 45.34660 49.25735 50.04660 49.41349 47.63279 46.74090
## 1932 51.73936 48.29989 47.24660 47.35735 47.44660 48.01349 49.13279 52.04090
## 1933 45.53936 49.19989 51.44660 51.45735 50.74660 51.81349 52.53279 53.44090
##           Sep      Oct      Nov      Dec
## 1920 46.89989 49.84529 49.51765 49.16020
## 1921 49.59989 53.54529 46.31765 52.16020
## 1922 46.89989 46.44529 48.41765 51.06020
## 1923 46.99989 48.54529 42.91765 46.96020
## 1924 48.99989 49.14529 51.01765 52.96020
## 1925 45.59989 49.34529 44.71765 45.66020
## 1926 50.09989 46.04529 48.21765 49.16020
## 1927 47.29989 49.64529 48.91765 44.56020
## 1928 47.99989 49.54529 49.61765 46.66020
## 1929 52.39989 48.54529 49.51765 51.26020
## 1930 49.59989 50.24529 49.61765 48.16020
## 1931 46.39989 45.94529 52.11765 49.96020
## 1932 48.89989 46.64529 50.21765 51.16020
## 1933 52.69989 49.54529 48.71765 45.16020

test_nottem <- ts(seasonal_adj_data[169:240], frequency = 12, start = c(1934,1))
test_nottem

##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 1934 48.73936 48.09989 47.34660 49.65735 49.94660 50.61349 53.53279 48.94090
## 1935 49.33936 52.49989 50.44660 49.85735 46.54660 51.51349 51.63279 52.54090
## 1936 46.63936 44.89989 50.94660 46.65735 49.24660 49.61349 47.03279 49.64090
## 1937 50.13936 50.89989 45.34660 50.15735 50.64660 49.61349 48.43279 50.34090
## 1938 51.43936 51.09989 54.24660 49.35735 48.94660 50.01349 46.63279 48.94090
## 1939 48.73936 50.79989 49.34660 50.55735 48.94660 49.01349 47.73279 50.34090
##           Sep      Oct      Nov      Dec
## 1934 51.79989 50.54529 49.41765 55.16020
## 1935 49.39989 47.94529 50.81765 45.76020
## 1936 50.69989 48.94529 48.21765 50.66020
## 1937 48.89989 50.24529 48.01765 46.46020
## 1938 49.59989 50.04529 54.41765 48.56020
## 1939 50.79989 46.04529 53.21765 47.16020

Choosing a model1 using the ACF & PACF plots:

# Checking ACF & PACF plot for the order of ARIMA model
ggtsdisplay(seasonal_adj_data)

In ACF & PACF plots, there are several spikes have crossed threshold. So we can choose ARMA. So we can choose ARMA. Here we fit auto arima model.

Fit Model1: auto.arima

M1 = auto.arima(training_nottem)
M1

## Series: training_nottem 
## ARIMA(1,0,0)(2,0,0)[12] with non-zero mean 
## 
## Coefficients:
##          ar1     sar1    sar2     mean
##       0.3262  -0.1474  0.1752  48.8294
## s.e.  0.0743   0.0777  0.0852   0.2470
## 
## sigma^2 = 4.613:  log likelihood = -365.41
## AIC=740.82   AICc=741.19   BIC=756.44

Test for the Model1 residuals diagnostic checking:

test(M1$residuals)

## Null hypothesis: Residuals are iid noise.
## Test                        Distribution Statistic   p-value
## Ljung-Box Q                Q ~ chisq(20)     19.49    0.4899
## McLeod-Li Q                Q ~ chisq(20)      9.68    0.9736
## Turning points T  (T-110.7)/5.4 ~ N(0,1)       103    0.1584
## Diff signs S       (S-83.5)/3.8 ~ N(0,1)        80     0.351
## Rank P           (P-7014)/364.5 ~ N(0,1)      7246    0.5245

Model1 Residual Diagnostic Assumptions and useful properties:
1. {et} uncorrelated. all of the spikes from ACF and PACF plots are within threshold bound. So, the residuals are uncorrelated. The lack of correlation suggesting the forecasts are good.

2. {et} have mean zero. The time plot of the residuals shows mean zero. Variation of the residuals stays much the same across the historical data, and no outlier, and therefore the residual variance can be treated as constant.

3. {et} are normally distributed. The Normal Q-Q Plot shows the residuals are normally distributed.

Forecast with Model1:

F1 = forecast::forecast(M1, h = length(test_nottem))
autoplot(F1)

Forecasting with seasonal adjustment:

F1_mean <- F1$mean
Forecast_M1 = F1_mean + seasonal_data[169:240]
Forecast_M1

##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 1934 39.33557 38.40711 41.09721 45.38670 51.74476 57.22868 61.30233 60.17087
## 1935 38.93623 39.07141 42.45720 46.63356 52.69804 58.42537 62.51841 61.11390
## 1936 39.54457 38.81700 41.66040 45.86912 52.12725 57.62311 61.60353 60.14615
## 1937 39.38490 38.97092 42.01620 46.20032 52.37845 57.95109 61.95152 60.45408
## 1938 39.51505 38.90364 41.82411 46.01753 52.24139 57.76215 61.73989 60.23910
## 1939 39.46788 38.94053 41.91478 46.10252 52.30562 57.84748 61.83208 60.32475
##           Sep      Oct      Nov      Dec
## 1934 55.67102 48.99574 42.47143 40.41857
## 1935 56.99005 49.68152 42.15383 38.68621
## 1936 56.01946 49.36938 42.26574 39.75096
## 1937 56.39371 49.53558 42.19359 39.29041
## 1938 56.16845 49.45637 42.22384 39.54489
## 1939 56.26724 49.49718 42.20673 39.42666

Model1 Forecasting Accuracy Measure:

et1 <- test_nottem - F1_mean ### Error

MSE1 <- sum(et1**2)/length(test_nottem) 
RMSE1 <- sqrt(MSE1)
ME1 <- sum(et1)/length(test_nottem)
MAE1 <- sum(abs(et1))/length(test_nottem)

#Percent of Error
PE1 <- 100*(et1)/test_nottem
MPE1 <- sum(PE1)/length(test_nottem) 
MAPE1 <- sum(abs(PE1))/length(test_nottem)

cat("Model1 Mean Square Error, MSE = ", MSE1)

## Model1 Mean Square Error, MSE =  4.647496

cat("\n")

cat("Model1 Root Mean Square Error, RMSE = ", RMSE1)

## Model1 Root Mean Square Error, RMSE =  2.155805

cat("\n")

cat("Model1 Mean Error, ME = ", ME1)

## Model1 Mean Error, ME =  0.7724585

cat("\n")

cat("Model1 Mean Absolute Error, MAE = ", MAE1)

## Model1 Mean Absolute Error, MAE =  1.683961

cat("\n")

cat("Model1 Mean Percent of Error, MPE = ", MPE1)

## Model1 Mean Percent of Error, MPE =  1.393614

cat("\n")

cat("Model1 Mean Absolute Percent of Error, MAPE = ", MAPE1)

## Model1 Mean Absolute Percent of Error, MAPE =  3.356576

Fit MODEL2: ETS

In ACF & PACF plots, there are several spikes have crossed threshold. Now, considering the significant spikes: p = 1, q = 3; ARIMA(1,0,3)

M2 <- ets(training_nottem)
M2

## ETS(M,N,N) 
## 
## Call:
##  ets(y = training_nottem) 
## 
##   Smoothing parameters:
##     alpha = 1e-04 
## 
##   Initial states:
##     l = 48.8045 
## 
##   sigma:  0.0473
## 
##      AIC     AICc      BIC 
## 1145.977 1146.123 1155.349

Test for the Model2 residuals diagnostic checking:

test(M2$residuals)

## Null hypothesis: Residuals are iid noise.
## Test                        Distribution Statistic   p-value
## Ljung-Box Q                Q ~ chisq(20)     33.47      0.03 *
## McLeod-Li Q                Q ~ chisq(20)        10    0.9682
## Turning points T  (T-110.7)/5.4 ~ N(0,1)        97    0.0119 *
## Diff signs S       (S-83.5)/3.8 ~ N(0,1)        83     0.894
## Rank P           (P-7014)/364.5 ~ N(0,1)      7154    0.7009

Model2 Residual Diagnostic Assumptions and useful properties:
1. {et2} slightly correlated. 2 of the spikes from ACF and PACF plots have crossed the threshold bound. So the residuals are correlated. The correlation suggesting the forecasts are not good.
2. {et2} have mean zero. The time plot of the residuals shows mean zero. Variation of the residuals stays much the same across the historical data, and no outlier, and therefore the residual variance can be treated as constant.
3. {et2} are normally distributed. The Normal Q-Q Plot shows the residuals are normally distributed.

Forecast with Model2:

F2 = forecast::forecast(M2, h = length(test_nottem))
autoplot(F2)

Forecasting with seasonal adjustment:

F2_mean <- F2$mean
Forecast_M2 = F2_mean + seasonal_data[169:240]
Forecast_M2

##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 1934 39.46514 38.90461 41.85790 46.04716 52.25790 57.79102 61.77172 60.26361
## 1935 39.46514 38.90461 41.85790 46.04716 52.25790 57.79102 61.77172 60.26361
## 1936 39.46514 38.90461 41.85790 46.04716 52.25790 57.79102 61.77172 60.26361
## 1937 39.46514 38.90461 41.85790 46.04716 52.25790 57.79102 61.77172 60.26361
## 1938 39.46514 38.90461 41.85790 46.04716 52.25790 57.79102 61.77172 60.26361
## 1939 39.46514 38.90461 41.85790 46.04716 52.25790 57.79102 61.77172 60.26361
##           Sep      Oct      Nov      Dec
## 1934 56.20461 49.45922 42.18685 39.44431
## 1935 56.20461 49.45922 42.18685 39.44431
## 1936 56.20461 49.45922 42.18685 39.44431
## 1937 56.20461 49.45922 42.18685 39.44431
## 1938 56.20461 49.45922 42.18685 39.44431
## 1939 56.20461 49.45922 42.18685 39.44431

Model2 Forecasting Accuracy Measure:

et2 <- test_nottem - F2_mean ### Error

MSE2 <- sum(et2**2)/length(test_nottem) 
RMSE2 <- sqrt(MSE2)
ME2 <- sum(et2)/length(test_nottem)
MAE2 <- sum(abs(et2))/length(test_nottem)

#Percent of Error
PE2 <- 100*(et2)/test_nottem
MPE2 <- sum(PE2)/length(test_nottem) 
MAPE2 <- sum(abs(PE2))/length(test_nottem)

cat("Model2 Mean Square Error, MSE = ", MSE2)

## Model2 Mean Square Error, MSE =  4.893781

cat("\n")

cat("Model2 Root Mean Square Error, RMSE = ", RMSE2)

## Model2 Root Mean Square Error, RMSE =  2.212189

cat("\n")

cat("Model2 Mean Error, ME = ", ME2)

## Model2 Mean Error, ME =  0.7857725

cat("\n")

cat("Model2 Mean Absolute Error, MAE = ", MAE2)

## Model2 Mean Absolute Error, MAE =  1.726933

cat("\n")

cat("Model2 Mean Percent of Error, MPE = ", MPE2)

## Model2 Mean Percent of Error, MPE =  1.413725

cat("\n")

cat("Model2 Mean Absolute Percent of Error, MAPE = ", MAPE2)

## Model2 Mean Absolute Percent of Error, MAPE =  3.441328

4. Apply the Boxcox transformation with the optimum lambda, then take suitable diferrences to obtain a stationary series.

box_nottem = BoxCox(nottem, lambda = BoxCox.lambda(nottem))
#box_nottem

Plot the BoxCox transformed nottem series (training data) using time-plot, ACF & PACF plot:

ggtsdisplay(box_nottem)

Take the suitable number of differences (if required) to get a stationary series:

D_box_nottem = diff(box_nottem, differences = 1)

Test stationarity of the boxCox transformed time series: ADF (Augmented Dickey-Fuller) Test:
Again test the stationarity of the time series, let’s run the Augmented Dickey-Fuller Test using the adf.test function from the tseries R package.

First set the hypothesis test:
The null hypothesis H0 : that the time series is non stationary
The alternative hypothesis HA : that the time series is stationary

adf.test(D_box_nottem)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  D_box_nottem
## Dickey-Fuller = -14.844, Lag order = 6, p-value = 0.01
## alternative hypothesis: stationary

kpss.test(D_box_nottem)

## 
##  KPSS Test for Level Stationarity
## 
## data:  D_box_nottem
## KPSS Level = 0.014226, Truncation lag parameter = 4, p-value = 0.1

As a rule of thumb, where the p-value is greater than 5%, we do not reject the null hypothesis, so we accept the null hypothesis. As per the test results above, the p-value is 0.1 which is > 0.05 therefore we accept the null hypothesis that the time series is stationary.

Stationary time series have constant mean and constant variance. It means that for any observation the value will not be influenced by trend or seasonality.
Adjust Seasonality:

decompose_D_box_nottem <- decompose(D_box_nottem)
plot(decompose_D_box_nottem)

Seasonal Data:

seasonal_box_data <- decompose_D_box_nottem$seasonal
#seasonal_data
plot(seasonal_box_data)

# Substructing seasonality from training data
seasonal_adj_box_data <- box_nottem - decompose_D_box_nottem$seasonal
#seasonal_adj_data

Choosing a model using the ACF & PACF plots:

# Checking ACF & PACF plot for the order of ARIMA model
ggtsdisplay(seasonal_adj_box_data)

In ACF & PACF plots, there are several spikes have crossed threshold. So we can choose ARMA. Here we fit auto arima model.

Splitting the BoxCox transformed data into training (70%) and test (30%) datasets:

training_box_nottem=ts(box_nottem[1:168], frequency = 12, start = c(1920,1))
training_box_nottem

##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 1920 4.008455 4.014200 4.113238 4.172560 4.346051 4.438717 4.422376 4.395354
## 1921 4.107941 3.985202 4.131598 4.180090 4.346051 4.442768 4.587685 4.466804
## 1922 3.915767 3.952487 3.976367 4.050896 4.380556 4.424431 4.403732 4.350417
## 1923 4.042525 3.993974 4.072942 4.149687 4.233947 4.315047 4.549301 4.460839
## 1924 3.970441 3.915767 3.940370 4.141968 4.326210 4.422376 4.484530 4.432614
## 1925 3.991057 4.005573 4.014200 4.131598 4.339472 4.456846 4.536239 4.488435
## 1926 3.967467 4.086522 4.086522 4.226740 4.267030 4.403732 4.517338 4.507778
## 1927 3.973407 3.946444 4.136794 4.182590 4.292415 4.365578 4.476683 4.478650
## 1928 4.014200 4.022767 4.070208 4.187575 4.274005 4.395354 4.511611 4.478650
## 1929 3.828850 3.706022 4.019918 4.099953 4.323985 4.405817 4.517338 4.474714
## 1930 4.036912 3.903276 4.025609 4.177585 4.280941 4.476683 4.470765 4.500077
## 1931 3.903276 3.943411 3.943411 4.167514 4.332859 4.436686 4.480613 4.432614
## 1932 4.059210 3.943411 3.999787 4.118512 4.274005 4.407899 4.509696 4.536239
## 1933 3.874693 3.970441 4.115878 4.221912 4.348236 4.484530 4.573201 4.562228
##           Sep      Oct      Nov      Dec
## 1920 4.350417 4.264696 4.072942 3.985202
## 1921 4.407899 4.348236 3.982264 4.070208
## 1922 4.350417 4.182590 4.042525 4.039722
## 1923 4.352594 4.233947 3.877903 3.918869
## 1924 4.395354 4.248234 4.113238 4.091912
## 1925 4.321757 4.252960 3.934265 3.877903
## 1926 4.418257 4.172560 4.036912 3.985202
## 1927 4.359103 4.260015 4.056445 3.842125
## 1928 4.374159 4.257668 4.075670 3.909537
## 1929 4.464819 4.233947 4.072942 4.045322
## 1930 4.407899 4.274005 4.075670 3.955498
## 1931 4.339472 4.170039 4.141968 4.008455
## 1932 4.393251 4.187575 4.091912 4.042525
## 1933 4.470765 4.257668 4.050896 3.861771

test_box_nottem = ts(box_nottem[169:240], frequency = 12, start = c(1934,1))
test_box_nottem

##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 1934 3.973407 3.937321 4.002684 4.177585 4.330646 4.460839 4.591279 4.476683
## 1935 3.991057 4.064721 4.089220 4.182590 4.252960 4.478650 4.556704 4.545583
## 1936 3.909537 3.835506 4.102621 4.099953 4.315047 4.440744 4.468786 4.490383
## 1937 4.014200 4.019918 3.943411 4.190060 4.346051 4.440744 4.496209 4.503934
## 1938 4.050896 4.025609 4.187575 4.170039 4.308301 4.448821 4.460839 4.476683
## 1939 3.973407 4.017062 4.059210 4.199949 4.308301 4.428529 4.482573 4.503934
##           Sep      Oct      Nov      Dec
## 1934 4.452840 4.280941 4.070208 4.149687
## 1935 4.403732 4.219491 4.107941 3.881103
## 1936 4.430573 4.243490 4.036912 4.028445
## 1937 4.393251 4.274005 4.031274 3.903276
## 1938 4.407899 4.269360 4.199949 3.967467
## 1939 4.432614 4.172560 4.170039 3.925051

MODEL3:

M3 = auto.arima(training_box_nottem)
M3

## Series: training_box_nottem 
## ARIMA(0,0,2)(2,1,2)[12] 
## 
## Coefficients:
##          ma1     ma2     sar1    sar2     sma1     sma2
##       0.3098  0.1513  -0.1722  0.1757  -0.8028  -0.0038
## s.e.  0.0795  0.0814   0.4368  0.1693   0.4417   0.3071
## 
## sigma^2 = 0.003691:  log likelihood = 211.41
## AIC=-408.83   AICc=-408.07   BIC=-387.48

Test for the Model3 residuals diagnostic checking:

test(M3$residuals)

## Null hypothesis: Residuals are iid noise.
## Test                        Distribution Statistic   p-value
## Ljung-Box Q                Q ~ chisq(20)     19.51    0.4889
## McLeod-Li Q                Q ~ chisq(20)     23.81    0.2507
## Turning points T  (T-110.7)/5.4 ~ N(0,1)       103    0.1584
## Diff signs S       (S-83.5)/3.8 ~ N(0,1)        83     0.894
## Rank P           (P-7014)/364.5 ~ N(0,1)      7521    0.1643

Model3 Residual Diagnostic Assumptions and useful properties:
1. {et} uncorrelated. all of the spikes from ACF and PACF plots are within threshold bound. So, the residuals are uncorrelated. The lack of correlation suggesting the forecasts are good.
2. {et} have mean zero. The time plot of the residuals shows mean zero. Variation of the residuals stays much the same across the historical data, and one outlier, and therefore the residual variance can be treated as constant.
3. {et} are normally distributed. The Normal Q-Q Plot shows the residuals are normally distributed.

Forecast with Model3:

F3 = forecast::forecast(M3, h = length(test_box_nottem))
autoplot(F3)

Forecasting with seasonal adjustment:

F3_mean <- F3$mean
Forecast_M3 = F3_mean + seasonal_box_data[169:240]
Forecast_M3

##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 1934 3.937776 4.007407 4.133271 4.294836 4.414785 4.500660 4.472877 4.410514
## 1935 3.924256 4.033736 4.169592 4.325743 4.436591 4.525321 4.495911 4.426595
## 1936 3.940933 4.020707 4.146682 4.307186 4.423756 4.509418 4.479515 4.411681
## 1937 3.935687 4.027576 4.157007 4.315811 4.429797 4.516488 4.486385 4.417074
## 1938 3.939520 4.024104 4.151205 4.311066 4.426502 4.512477 4.482322 4.413525
## 1939 3.937938 4.025909 4.154018 4.313398 4.428130 4.514410 4.484228 4.415084
##           Sep      Oct      Nov      Dec
## 1934 4.238019 4.029808 3.994303 3.997726
## 1935 4.265347 4.048890 3.983251 3.943248
## 1936 4.246279 4.038548 3.989198 3.976282
## 1937 4.254363 4.043681 3.986232 3.961024
## 1938 4.249622 4.040980 3.987788 3.969454
## 1939 4.251858 4.042347 3.986999       NA

Model3 Forecasting Accuracy Measure:

et3 <- test_box_nottem - F3_mean ### Error

MSE3 <- sum(et3**2)/length(test_box_nottem) 
RMSE3 <- sqrt(MSE3)
ME3 <- sum(et3)/length(test_box_nottem) 
MAE3 <- sum(abs(et3))/length(test_box_nottem) 

#Percent of Error
PE3 <- 100*(et3)/test_box_nottem
MPE3 <- sum(PE3)/length(test_box_nottem) 
MAPE3 <- sum(abs(PE3))/length(test_box_nottem)

cat("Model3 Mean Square Error, MSE = ", MSE3)

## Model3 Mean Square Error, MSE =  0.003030475

cat("\n")

cat("Model3 Root Mean Square Error, RMSE = ", RMSE3)

## Model3 Root Mean Square Error, RMSE =  0.05504975

cat("\n")

cat("Model3 Mean Error, ME = ", ME3)

## Model3 Mean Error, ME =  0.01698467

cat("\n")

cat("Model3 Mean Absolute Error, MAE = ", MAE3)

## Model3 Mean Absolute Error, MAE =  0.04215661

cat("\n")

cat("Model3 Mean Percent of Error, MPE = ", MPE3)

## Model3 Mean Percent of Error, MPE =  0.3975953

cat("\n")

cat("Model3 Mean Absolute Percent of Error, MAPE = ", MAPE3)

## Model3 Mean Absolute Percent of Error, MAPE =  1.01278

Back Transformation:
Forecasts are calculated on the transformed data rather than the original data. But since we are really interested in forecasts of the original data, not the transformed data. We must reverse the transformation (or back-transform) to obtain forecasts on the original scale.

BT_BC_F3 <- exp(Forecast_M3)
BT_BC_F3

##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 1934 51.30439 55.00406 62.38164 73.32020 82.66408 90.07655 87.60837 82.31176
## 1935 50.61540 56.47151 64.68904 75.62168 84.48645 92.32557 89.64977 83.64608
## 1936 51.46662 55.74051 63.22386 74.23130 83.40898 90.86890 88.19190 82.40789
## 1937 51.19729 56.12471 63.88005 74.87430 83.91436 91.51368 88.79983 82.85350
## 1938 51.39392 55.93020 63.51045 74.51985 83.63832 91.14733 88.43976 82.56001
## 1939 51.31269 56.03121 63.68937 74.69385 83.77464 91.32367 88.60852 82.68877
##           Sep      Oct      Nov      Dec
## 1934 69.27047 56.25009 54.28801 54.47416
## 1935 71.18960 57.33379 53.69131 51.58588
## 1936 69.84507 56.74388 54.01156 53.31843
## 1937 70.41198 57.03591 53.85162 52.51105
## 1938 70.07891 56.88208 53.93544 52.95562
## 1939 70.23582 56.95987 53.89291       NA

plot(BT_BC_F3)

3. Select a model from M1, M2 & M3 models with the minimum value of model selection criteria (AIC, BIC & AICc). Also, check which model had fulfilled more underlying assumptions of residuals.

Model	AIC	BIC	AICc
M1	740.82	756.44	741.19
M2	1145.977	1155.349	1146.123
M3	-408.83	-387.48	-408.07

M1 is the best model because AIC, BIC & AICc are the lowest.

Also M2 model residuals first assumption is violated for showing correlation. M1, M2 & M3 model residuals are uncorrelated. So, residuals assumptions are met.

So, the minimum value of model selection criteria (AIC, BIC, AICc). And model3 had fulfilled more underlying assumptions of residuals. So Model3 (auto.arima) is the best model

4. Forecast with seasonal adjustment. Thereafter, evaluate all those models (M1, M2 and M3) using the forecasting accuracy measures based on the test data set. Back-transforming is required (for those models, where transformation was used) to get forecasts on the original scale. Hence, find the best model.

Model	MSE	RMSE	MAE	MPE	MAPE
M1	4.647496	2.155805	1.683961	1.393614	3.356576
M2	4.893781	2.212189	1.726933	1.413725	3.441328
M3	0.003030475	0.05504975	0.04215661	0.3975953	1.01278

From the above comparison, it is obvious that our M3 has lower error.

So, comparatively M3 is the best model.

#### 5. Using the best model, forecast 10 points ahead. Construct a suitable plot.

From the above calculation, we ca see that M3 is the best model due to AIC, & BIC and the lower error rate comparatively. Now, we will forecast 10 points ahead of M3 model.

Final fitted model3 auto.arima(0,0,2)(2,1,2)[12] with fitted values and forecast for 10 steps ahead

# fitted values
plot(1:240, box_nottem)
lines(1:240, box_nottem, type="l" )

# forecast for 10 steps ahead
final_forecast = predict(M3, n.ahead=10)
final_forecast

## $pred
##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 1934 3.955505 3.921726 4.021066 4.146572 4.296547 4.418176 4.502450 4.493097
##           Sep      Oct
## 1934 4.389014 4.217500
## 
## $se
##             Jan        Feb        Mar        Apr        May        Jun
## 1934 0.06077961 0.06362820 0.06428896 0.06428896 0.06428896 0.06428896
##             Jul        Aug        Sep        Oct
## 1934 0.06428896 0.06428896 0.06428896 0.06428896

lines(241:250, final_forecast$pred, type="o", col="red")
lines(241:250, final_forecast$pred-1.96*final_forecast$se, col="blue")
lines(241:250, final_forecast$pred+1.96*final_forecast$se, col="blue")

Assignment2: Time series analysis and forcasting with Nottem dataset

Jobaida Gulsanara Jarin (20215018)

2022-08-02

1. Plot the series (use time-plot, ACF plot and seasonal-subseries plot) and describe its main features.

Classical Decomposition for additive seasonality to get seasonal adjusted data:

Splitting the data into training (70%) and test (30%) data sets:

Choosing a model1 using the ACF & PACF plots:

3. Select a model from M1, M2 & M3 models with the minimum value of model selection criteria (AIC, BIC & AICc). Also, check which model had fulfilled more underlying assumptions of residuals.