Introduction

Cryptocurrency

Freedom is a powerful notion. It inspires invention and also destruction. Carnage and blessings swirl, pulling strings of change through every nook and cranny of the world. Ideas pop up and get chopped up. One still growing and stumbling is the fancy distributed ledger we call cryptocurrency. It was supposed to give decent people a chance to break away from arm-enforced fiat currency. So far, it keeps us waiting.

AXS

Released in 2018, the NFT game Axie Infinity has had its ups and downs. It enabled a phenomenon of digital colonialism. It was robbed of 620 million USD worth of cryptocurrency. Its Smooth Love Potion (SLP) token crashed. But the game is still running. Besides SLP, it has Axie Infinity Shards (AXS). What is that? While SLP is fully circulated and mainly used to breed Axie monsters, AXS is an in-game currency that lets holders govern, stake, and play.

Objective

In this document, I apply time series analysis and forecasting to predict the price of AXS.

Data

Source

cryptodatadownload.com

Format

CSV

Date Downloaded

2022 June 20

Data Timeframe

2022 January 22, 05:00 to 2022 June 27, 00:00 (UTC), hourly

Glossary

Feature           Description
Unix Timestamp    The unix timestamp, also known as “Epoch Time”; use it to convert to your local timezone
Date              The timestamp converted to NY EST
Symbol            The pair the time series refers to
Open              Opening price of the time period
High              Highest price of the time period
Low               Lowest price of the time period
Close             Closing price of the time period
Volume (Crypto)   Volume in the transacted currency, e.g. for BTC/USDT this is the BTC amount
Volume Base Ccy   Volume in the base/converted currency, e.g. for BTC/USDT this is the USDT amount
Trade Count       The number of unique trades in the given time period
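
Note that the unix values in this file are in milliseconds, not seconds. A minimal sketch of converting one to a local timezone with lubridate (the America/New_York zone here is just an example):

# unix column is in milliseconds, so divide by 1000 before converting
as_datetime(1656288000000 / 1000, tz = "America/New_York")
## [1] "2022-06-26 20:00:00 EDT"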

Preparation

Import

Library

library(tseries)
library(lubridate) 
library(MLmetrics)
library(readr)
library(flextable)
library(fable)
library(tsibble)
library(tidyverse) 
library(feasts)
library(seasonal)
library(fpp3)

Data

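# the first line of cryptodatadownload CSVs is a source banner, not the header, hence skip = 1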
axs_usd <- read_csv("Binance_AXSUSDT_1h.csv", 
    skip = 1)

Preview

head(axs_usd)
## # A tibble: 6 x 10
##            unix date                symbol   open  high   low close `Volume AXS`
##           <dbl> <dttm>              <chr>   <dbl> <dbl> <dbl> <dbl>        <dbl>
## 1 1656288000000 2022-06-27 00:00:00 AXS/US~  16.2  16.4  16    16.3       45607.
## 2 1656284400000 2022-06-26 23:00:00 AXS/US~  16.4  16.5  16.1  16.2       64406.
## 3 1656280800000 2022-06-26 22:00:00 AXS/US~  16.9  16.9  16.2  16.4       73052.
## 4 1656277200000 2022-06-26 21:00:00 AXS/US~  17.1  17.1  16.9  16.9       14764.
## 5 1656273600000 2022-06-26 20:00:00 AXS/US~  16.8  17.2  16.8  17.1       23015.
## 6 1656270000000 2022-06-26 19:00:00 AXS/US~  16.6  16.9  16.6  16.8       28427.
## # ... with 2 more variables: `Volume USDT` <dbl>, tradecount <dbl>

We’ll pick the date and the closing price.

anyNA(axs_usd) 
## [1] FALSE

Great, no missing data.

str(axs_usd)
## spec_tbl_df [3,740 x 10] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ unix       : num [1:3740] 1.66e+12 1.66e+12 1.66e+12 1.66e+12 1.66e+12 ...
##  $ date       : POSIXct[1:3740], format: "2022-06-27 00:00:00" "2022-06-26 23:00:00" ...
##  $ symbol     : chr [1:3740] "AXS/USDT" "AXS/USDT" "AXS/USDT" "AXS/USDT" ...
##  $ open       : num [1:3740] 16.2 16.4 16.9 17.1 16.8 ...
##  $ high       : num [1:3740] 16.4 16.5 16.9 17.1 17.2 ...
##  $ low        : num [1:3740] 16 16.1 16.2 16.9 16.8 ...
##  $ close      : num [1:3740] 16.3 16.2 16.4 16.9 17.1 ...
##  $ Volume AXS : num [1:3740] 45607 64406 73052 14764 23015 ...
##  $ Volume USDT: num [1:3740] 741142 1050515 1202355 250871 391031 ...
##  $ tradecount : num [1:3740] 3183 3665 4804 1214 1829 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   unix = col_double(),
##   ..   date = col_datetime(format = ""),
##   ..   symbol = col_character(),
##   ..   open = col_double(),
##   ..   high = col_double(),
##   ..   low = col_double(),
##   ..   close = col_double(),
##   ..   `Volume AXS` = col_double(),
##   ..   `Volume USDT` = col_double(),
##   ..   tradecount = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

I’ll convert the date column with lubridate for ease of use.

Wrangling

axs_usd_selected  <- axs_usd %>% 
  mutate(date_hour = ymd_hms(date),
         close_price = close) %>% 
  select(date_hour, close_price) %>% 
  arrange(date_hour)
dim(axs_usd_selected)
## [1] 3740    2
head(axs_usd_selected)
## # A tibble: 6 x 2
##   date_hour           close_price
##   <dttm>                    <dbl>
## 1 2022-01-22 05:00:00        54.4
## 2 2022-01-22 06:00:00        53.8
## 3 2022-01-22 07:00:00        52.4
## 4 2022-01-22 08:00:00        52.6
## 5 2022-01-22 09:00:00        50.2
## 6 2022-01-22 10:00:00        51.7
tail(axs_usd_selected)
## # A tibble: 6 x 2
##   date_hour           close_price
##   <dttm>                    <dbl>
## 1 2022-06-26 19:00:00        16.8
## 2 2022-06-26 20:00:00        17.1
## 3 2022-06-26 21:00:00        16.9
## 4 2022-06-26 22:00:00        16.4
## 5 2022-06-26 23:00:00        16.2
## 6 2022-06-27 00:00:00        16.3
start_hour <- min(axs_usd_selected$date_hour)
end_hour <- max(axs_usd_selected$date_hour)
start_hour
## [1] "2022-01-22 05:00:00 UTC"
end_hour
## [1] "2022-06-27 UTC"

Tsibble

Create a time series from axs_usd_selected.

axs_tsb <- axs_usd_selected %>% 
  as_tsibble(index = date_hour)
axs_tsb %>% head()
## # A tsibble: 6 x 2 [1h] <UTC>
##   date_hour           close_price
##   <dttm>                    <dbl>
## 1 2022-01-22 05:00:00        54.4
## 2 2022-01-22 06:00:00        53.8
## 3 2022-01-22 07:00:00        52.4
## 4 2022-01-22 08:00:00        52.6
## 5 2022-01-22 09:00:00        50.2
## 6 2022-01-22 10:00:00        51.7

Check for implicit gaps in the dates.

axs_tsb %>% has_gaps()
## # A tibble: 1 x 1
##   .gaps
##   <lgl>
## 1 FALSE

Check for regularity.

axs_tsb %>% is_regular()
## [1] TRUE

Add day, date, week, and month columns.

axs_tsb <- axs_tsb %>% mutate(
  day = day(date_hour),
  date = date(date_hour),
  week = yearweek(date_hour),
  month = yearmonth(date_hour))
axs_tsb %>% head()
## # A tsibble: 6 x 6 [1h] <UTC>
##   date_hour           close_price   day date           week    month
##   <dttm>                    <dbl> <int> <date>       <week>    <mth>
## 1 2022-01-22 05:00:00        54.4    22 2022-01-22 2022 W03 2022 Jan
## 2 2022-01-22 06:00:00        53.8    22 2022-01-22 2022 W03 2022 Jan
## 3 2022-01-22 07:00:00        52.4    22 2022-01-22 2022 W03 2022 Jan
## 4 2022-01-22 08:00:00        52.6    22 2022-01-22 2022 W03 2022 Jan
## 5 2022-01-22 09:00:00        50.2    22 2022-01-22 2022 W03 2022 Jan
## 6 2022-01-22 10:00:00        51.7    22 2022-01-22 2022 W03 2022 Jan

Exploratory Data Analysis

Time Series

Below, I plot axs_tsb at several time aggregations.

Hourly

hourly_axs <- axs_tsb %>% autoplot(close_price) +
  labs(title = "AXS Hourly Closing Price in 2022",
       y = "Close Price",
       x = NULL)
hourly_axs

Daily

Total

date_axs <- axs_tsb %>% 
  index_by(date) %>% 
  summarise(sum = sum(close_price)) %>% 
  autoplot(sum) +
  labs(title = "AXS Daily Total Closing Price",
       y = "Close Price",
       x = NULL)
date_axs

Average

date_axs_avg <- axs_tsb %>% 
  index_by(date) %>% 
  summarise(avg = mean(close_price)) %>% 
  autoplot(avg) +
  labs(title = "AXS Average Daily Closing Price",
       y = "Close Price",
       x = NULL)
date_axs_avg

Weekly

Total

week_axs <- axs_tsb %>% 
  index_by(week) %>% 
  summarise(sum = sum(close_price)) %>% 
  autoplot(sum) +
  labs(title = "AXS Total Weekly Closing Price",
       y = "Close Price",
       x = NULL)
week_axs

Average

week_axs_avg <- axs_tsb %>% 
  index_by(week) %>% 
  summarise(avg = mean(close_price)) %>% 
  autoplot(avg) +
  labs(title = "AXS Average Weekly Closing Price",
       y = "Close Price",
       x = NULL)
week_axs_avg

Monthly

Total

month_axs <- axs_tsb %>% 
  index_by(month) %>% 
  summarise(sum = sum(close_price)) %>% 
  autoplot(sum) +
  labs(title = "AXS Total Monthly Closing Price",
       y = "Close Price",
       x = NULL)
month_axs

Average

month_axs_avg <- axs_tsb %>% 
  index_by(month) %>% 
  summarise(avg = mean(close_price)) %>% 
  autoplot(avg) +
  labs(title = "AXS Average Monthly Closing Price",
       y = "Close Price",
       x = NULL)
month_axs_avg

Seasonal Comparison

Let’s see whether there’s any seasonality.

Daily

s_daily_axs <- axs_tsb %>% gg_season(close_price, 
                      labels = "none",
                      polar = F,
                      period = "day"
                      ) +
  labs(title = "AXS Daily Comparison",
       y = "Close Price",
       x = NULL)

s_daily_axs

There’s no obvious pattern here.

Weekly

s_weekly_axs <- axs_tsb %>% gg_season(close_price, 
                      labels = "none",
                      polar = F,
                      period = "week"
                      ) +
  labs(title = "AXS Weekly Comparison",
       y = "Close Price",
       x = NULL)

s_weekly_axs

No pattern shows up at the weekly level either.

Monthly

s_monthly_axs <- axs_tsb %>% gg_season(close_price, 
                      labels = "right",
                      polar = F,
                      period = "month"
                      ) +
  labs(title = "AXS Monthly Comparison",
       y = "Close Price",
       x = NULL)

s_monthly_axs

From the plot above, I see:

  • Prices tend to decrease in the middle of the month
  • Prices tend to increase toward the end of the month, except in April
  • But with so few repetitions, this is weak evidence of seasonality

Autocorrelation

In the autocorrelation plots below, each vertical line is a coefficient \(r_k\) that measures the relationship between the series and a version of itself lagged by \(k\) periods. The dashed blue lines mark the threshold beyond which \(r_k\) is considered significantly different from zero.
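
Formally, for a series \(y_1, \dots, y_T\) with mean \(\bar{y}\), the lag-\(k\) autocorrelation is

\[ r_k = \frac{\sum_{t=k+1}^{T} (y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{T} (y_t - \bar{y})^2} \]

and the dashed significance bounds sit at \(\pm 1.96/\sqrt{T}\).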

Hourly Autocorrelation.

axs_tsb %>% 
  ACF(close_price) %>% 
  autoplot()

The ACF decays very slowly, which is the signature of a trended series.

Daily Autocorrelation.

axs_tsb %>% 
  index_by(date) %>% 
  summarise(sum = sum(close_price)) %>% 
  ACF(sum) %>% 
  autoplot()

There’s a trend that drifts down and up, but I’m not so sure about seasonality.

Weekly Autocorrelation.

axs_tsb %>% 
  index_by(week) %>% 
  summarise(sum = sum(close_price)) %>% 
  ACF(sum) %>% 
  autoplot()

A clear downtrend.

Monthly Autocorrelation.

axs_tsb %>% 
  index_by(month) %>% 
  summarise(sum = sum(close_price)) %>% 
  ACF(sum) %>% 
  autoplot()

None of the \(r_k\) values reach significance. A time series like this is called white noise.

Decomposition

Let’s decompose the daily data.

date_axs

Classical Decomposition

axs_tsb %>% 
  index_by(date) %>% 
  summarise(sum = sum(close_price)) %>% 
  model(
    classical_decomposition(sum, type = "additive")
  ) %>% 
  components() %>% 
  autoplot() +
  labs(title = "Classical Additive Decomposition of AXS Daily Closing Price")

There seems to be a seasonal swing of roughly 10 USD up and down. But since classical decomposition cannot handle a seasonal pattern that shifts over time, we need to proceed to the next method.
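
For reference, the additive decomposition writes the series as

\[ y_t = T_t + S_t + R_t \]

where \(T_t\) is the trend-cycle, \(S_t\) the seasonal component, and \(R_t\) the remainder; classical decomposition estimates one fixed seasonal pattern \(S_t\) that repeats unchanged across the whole series.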

STL Decomposition

Seasonal and Trend decomposition using Loess (STL) is versatile and robust, but we have to set trend(window = ?) and season(window = ?), which control how quickly the trend and seasonal components are allowed to change.

According to Cleveland et al., the seasonal window should be odd and at least 7. If we choose NULL, the function picks a value automatically.

axs_tsb %>% 
  index_by(date) %>% 
  summarise(sum = sum(close_price)) %>% 
  model(
    STL(sum ~ season(window = NULL),
        robust = T)
  ) %>% 
  components() %>%
  autoplot() +
  labs(title = "STL Decomposition of AXS Daily Closing Price")

If we set the seasonal window to "periodic" instead, the seasonal pattern is forced to be identical across the whole series.

axs_tsb %>% 
  index_by(date) %>% 
  summarise(sum = sum(close_price)) %>% 
  model(
    STL(sum ~ season(window = "periodic"),
        robust = T)
  ) %>% 
  components() %>% 
  autoplot() +
  labs(title = "STL Decomposition of AXS Daily Closing Price")

But for this time series we have to ask: how fast does cryptocurrency seasonality change? It depends on the coin, of course, but cryptocurrencies are generally quite volatile, so we should use a smaller seasonal window. Also, since the default trend window for monthly data is 21, we should use a number smaller than that too.

axs_tsb %>% 
  index_by(date) %>%
  summarise(sum = sum(close_price)) %>%
  model(
    STL(sum ~ season(window = 7) + trend(window = 5),
        robust = T)
  ) %>% 
  components() %>% 
  autoplot() +
  labs(title = "STL Decomposition of AXS Daily Closing Price")

Modeling

Train-Test Split

axs_daily <- axs_tsb %>% 
  index_by(date) %>%
  summarise(sum = sum(close_price))

head(axs_daily)
## # A tsibble: 6 x 2 [1D]
##   date         sum
##   <date>     <dbl>
## 1 2022-01-22  968.
## 2 2022-01-23 1246.
## 3 2022-01-24 1188.
## 4 2022-01-25 1214.
## 5 2022-01-26 1248.
## 6 2022-01-27 1151.
nrow(axs_daily)
## [1] 157
nrow_test <- round(0.2 * nrow(axs_daily)) 

test_axs <- tail(axs_daily, n = nrow_test) 
train_axs <- head(axs_daily, n = -nrow_test)

Train

Initial Benchmark

fit <- train_axs %>% 
  model(avg = MEAN(sum),
        naive = NAIVE(sum),
        snaive = SNAIVE(sum ~ lag("month")),
        drift = RW(sum ~ drift()))
fit %>% 
  forecast(h = "31 days") %>% 
  autoplot(train_axs, level = NULL) +
  autolayer(test_axs)

accuracy(fit %>% forecast(h = "31 days"),
         test_axs)
## # A tibble: 4 x 10
##   .model .type     ME  RMSE   MAE    MPE  MAPE  MASE RMSSE   ACF1
##   <chr>  <chr>  <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 avg    Test  -715.  722.  715.  -389.  389.    NaN   NaN 0.469 
## 2 drift  Test    26.7  84.4  61.8  -51.8  70.6   NaN   NaN 0.260 
## 3 naive  Test   -40.4 110.   79.0  -94.2 102.    NaN   NaN 0.469 
## 4 snaive Test  -212.  264.  212.  -224.  224.    NaN   NaN 0.0653

From the plot, drift captures the trend and seasonal naive comes closest on seasonality. But by the metrics, drift is the best overall, so let’s use it as the benchmark.
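
For reference, given observations \(y_1, \dots, y_T\), the four benchmarks forecast \(h\) steps ahead as

\[ \text{mean: } \hat{y}_{T+h} = \bar{y} \qquad \text{naive: } \hat{y}_{T+h} = y_T \]

\[ \text{seasonal naive: } \hat{y}_{T+h} = y_{T+h-m} \ (h \le m) \qquad \text{drift: } \hat{y}_{T+h} = y_T + h \, \frac{y_T - y_1}{T - 1} \]

where \(m\) is the seasonal period (one month here, per lag("month")).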

ETS

fit_ets <- train_axs %>% 
  model(
    drift = RW(sum ~ drift()),
    additive = ETS(sum ~ error("A") + trend("A") + season("A")),
    multiplicative = ETS(sum ~ error("M") + trend("A") + season("M"))
    ) 
fit_ets %>% 
  forecast(h = "31 days") %>% 
  autoplot(train_axs, level = NULL) +
  autolayer(test_axs)

accuracy(fit_ets %>% forecast(h = "31 days"),
         test_axs)
## # A tibble: 3 x 10
##   .model         .type    ME  RMSE   MAE    MPE  MAPE  MASE RMSSE  ACF1
##   <chr>          <chr> <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 additive       Test   76.3 105.   89.7 -18.7   57.7   NaN   NaN 0.196
## 2 drift          Test   26.7  84.4  61.8 -51.8   70.6   NaN   NaN 0.260
## 3 multiplicative Test  100.  124.  110.   -4.06  54.8   NaN   NaN 0.226

Judging by the plot alone, the multiplicative ETS has the most appropriate trend, but digging into the metrics, drift still has the lowest RMSE.

STL Decomp + Drift

We’ve seen how the data can be decomposed. Let’s decompose the train_axs date range and then model the seasonally adjusted series with drift.

Apply the same decomposition settings to train_axs and test_axs.

comp_axs_tsb <- train_axs %>% 
  model(
    STL(sum ~ season(window = 7) + trend(window = 5),
        robust = T)
  ) %>% 
  components() %>% select(-.model)

test_axs_s.adjust <- test_axs %>% 
  model(
    STL(sum ~ season(window = 7) + trend(window = 5),
        robust = T)
  ) %>% 
  components() %>% select(-.model)
fit_decomp_drift <- 
  comp_axs_tsb %>%
  model(drift = RW(season_adjust ~ drift())
        ) 
fit_decomp_drift %>% 
  forecast(h = "31 days") %>% 
  autoplot(comp_axs_tsb, level = NULL)

accuracy(fit_decomp_drift %>% forecast(h = "31 days"),
         test_axs_s.adjust)
## # A tibble: 1 x 10
##   .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 drift  Test   34.1  83.4  62.9 -9.78  30.4   NaN   NaN 0.256

Let’s compare.

accuracy(fit_ets %>% forecast(h = "31 days"),
         test_axs) %>% filter(.model == "drift")
## # A tibble: 1 x 10
##   .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 drift  Test   26.7  84.4  61.8 -51.8  70.6   NaN   NaN 0.260

On the seasonally adjusted data, the drift model does better on RMSE and MAPE.
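
As an aside, fabletools can bundle this two-step workflow into a single model with decomposition_model(). A minimal sketch: it models the seasonally adjusted series with drift, re-seasonalizes the forecasts (unspecified seasonal components default to SNAIVE), and scores them against test_axs on the original scale.

dcmp_drift <- decomposition_model(
  STL(sum ~ season(window = 7) + trend(window = 5), robust = TRUE),
  RW(season_adjust ~ drift())
)
train_axs %>% 
  model(stl_drift = dcmp_drift) %>% 
  forecast(h = "31 days") %>% 
  accuracy(test_axs)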

ARIMA

Let’s review the plot.

train_axs %>% autoplot(sum)

Applying ARIMA requires the data to be stationary. Since the plot is clearly non-stationary, I need to do some differencing, and a Box-Cox transformation may be needed to stabilize the variance. But is it seasonal? As we saw in the STL decomposition, the series shows some seasonality early on, but it fades about halfway through.
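
For reference, the classic Box-Cox transformation with parameter \(\lambda\) is

\[ w_t = \begin{cases} \log(y_t) & \text{if } \lambda = 0 \\ (y_t^{\lambda} - 1) / \lambda & \text{otherwise} \end{cases} \]

and the Guerrero feature used below picks the \(\lambda\) that makes the variance most uniform across the series.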

Stabilize

# before transformation
train_axs %>% autoplot(sum)

# get lambda
lambda <- train_axs %>% 
  features(sum, features = guerrero) %>% 
  pull(lambda_guerrero)

# stabilize train_axs
train_axs_stable <- train_axs %>% 
  mutate(sum = box_cox(sum, lambda))
train_axs_stable %>% autoplot(sum)

Don’t forget to transform the test data as well.

test_axs %>% autoplot(sum)

test_axs_stable <- test_axs %>% 
  mutate(sum = box_cox(sum, lambda))
test_axs_stable %>% autoplot(sum)

Stationarity

Check stationarity with unit root tests. To find how many rounds of differencing are needed to remove the trend, we use unitroot_ndiffs; to check whether seasonal differencing is needed, we run unitroot_nsdiffs.

train_axs_stable %>% features(sum, unitroot_ndiffs)
## # A tibble: 1 x 1
##   ndiffs
##    <int>
## 1      2
train_axs_stable %>% features(sum, unitroot_nsdiffs)
## # A tibble: 1 x 1
##   nsdiffs
##     <int>
## 1       0

The results above show that we need two rounds of ordinary differencing and no seasonal differencing.

Let’s check the autocorrelation plots.

train_axs_stable %>%
  gg_tsdisplay(difference(sum, differences = 2), plot_type = "partial")

The value of d is 2, since we needed to apply double differencing. The PACF plot suggests p = 0, 1, or 2, and the ACF plot suggests q = 0 or 1.

As a rule of thumb, the data may follow ARIMA(0, d, q) when the PACF is sinusoidal and the ACF has a significant spike at lag q but none beyond it, which is roughly what we see here.

The ARIMA(p, d, q) models we can build from these are:

  • ARIMA(0, 2, 0)
  • ARIMA(1, 2, 0)
  • ARIMA(2, 2, 0)
  • ARIMA(0, 2, 1)
  • ARIMA(1, 2, 1)
  • ARIMA(2, 2, 1)

Let’s build them all, plus automatic stepwise and full-search ARIMA, and see whether ARIMA(0, 2, 1) really comes out best, as the ACF/PACF patterns suggest.

Fitting

fit_arima <- 
  train_axs_stable %>% 
  model(
    arima020 = ARIMA(sum ~ pdq(0, 2, 0)),
    arima120 = ARIMA(sum ~ pdq(1, 2, 0)),
    arima220 = ARIMA(sum ~ pdq(2, 2, 0)),
    arima021 = ARIMA(sum ~ pdq(0, 2, 1)),
    arima121 = ARIMA(sum ~ pdq(1, 2, 1)),
    arima221 = ARIMA(sum ~ pdq(2, 2, 1)),
    stepwise = ARIMA(sum),
    search = ARIMA(sum, stepwise = F)
  )
fn <- names(fit_arima)
fit_arima %>% pivot_longer(all_of(fn),
                           names_to = "model",
                           values_to = "orders") %>% 
  glance() %>% 
  arrange(AICc)
## # A tibble: 8 x 9
##   model    .model sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
##   <chr>    <chr>   <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
## 1 arima021 orders  0.129   -49.5  103.  103.  109. <cpl [0]> <cpl [1]>
## 2 stepwise orders  0.129   -49.5  103.  103.  109. <cpl [0]> <cpl [1]>
## 3 search   orders  0.129   -49.5  103.  103.  109. <cpl [0]> <cpl [1]>
## 4 arima121 orders  0.128   -48.6  103.  103.  112. <cpl [1]> <cpl [1]>
## 5 arima221 orders  0.129   -48.5  105.  105.  116. <cpl [2]> <cpl [1]>
## 6 arima220 orders  0.157   -60.5  127.  127.  135. <cpl [2]> <cpl [0]>
## 7 arima120 orders  0.174   -66.9  138.  138.  144. <cpl [1]> <cpl [0]>
## 8 arima020 orders  0.202   -76.8  156.  156.  158. <cpl [0]> <cpl [0]>

Indeed, arima021 has the best AICc, with exactly the same fit as the models chosen by both stepwise and search.

fit_arima %>% 
  forecast(h = "31 days") %>% 
  autoplot(train_axs_stable, level = NULL) +
  autolayer(test_axs_stable)

augment(fit_arima) %>% 
  filter(.model == "arima021") %>% 
  autoplot(.fitted) +
  autolayer(fit_arima %>% forecast(h = "31 days") %>% filter(.model == "arima021"),
            level = NULL)

Evaluation

Let’s see their accuracy on the test data set.

accuracy(fit_arima %>% forecast(h = "31 days"),
         test_axs_stable)
## # A tibble: 8 x 10
##   .model   .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE    ACF1
##   <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1 arima020 Test  6.18   6.95  6.18 45.3   45.3   NaN   NaN  0.758 
## 2 arima021 Test  0.958  1.92  1.44  2.40  14.2   NaN   NaN -0.0445
## 3 arima120 Test  4.89   5.54  4.89 34.8   34.8   NaN   NaN  0.645 
## 4 arima121 Test  0.931  1.91  1.42  2.15  14.1   NaN   NaN -0.0440
## 5 arima220 Test  3.71   4.28  3.84 25.1   28.3   NaN   NaN  0.466 
## 6 arima221 Test  0.940  1.91  1.42  2.23  14.1   NaN   NaN -0.0446
## 7 search   Test  0.958  1.92  1.44  2.40  14.2   NaN   NaN -0.0445
## 8 stepwise Test  0.958  1.92  1.44  2.40  14.2   NaN   NaN -0.0445

ARIMA vs STL + Drift

We found that on the test data, arima121 actually performed slightly better than arima021. But is it better than drift on the seasonally adjusted (STL) data?

accuracy(fit_arima %>% forecast(h = "31 days"),
         test_axs_stable) %>% filter(.model=="arima121")
## # A tibble: 1 x 10
##   .model   .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE    ACF1
##   <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1 arima121 Test  0.931  1.91  1.42  2.15  14.1   NaN   NaN -0.0440
accuracy(fit_decomp_drift %>% forecast(h = "31 days"),
         test_axs_s.adjust)
## # A tibble: 1 x 10
##   .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 drift  Test   34.1  83.4  62.9 -9.78  30.4   NaN   NaN 0.256

We conclude that arima121 has the better metrics overall. One caveat: the ARIMA models were scored on the Box-Cox-transformed scale while drift was scored on the seasonally adjusted scale, so only the scale-free MAPE is even roughly comparable between the two.

Prediction

Let’s train on the whole data set to predict the AXS close price for the next month.

Preprocessing

Review the data.

head(axs_daily)
## # A tsibble: 6 x 2 [1D]
##   date         sum
##   <date>     <dbl>
## 1 2022-01-22  968.
## 2 2022-01-23 1246.
## 3 2022-01-24 1188.
## 4 2022-01-25 1214.
## 5 2022-01-26 1248.
## 6 2022-01-27 1151.
tail(axs_daily)
## # A tsibble: 6 x 2 [1D]
##   date         sum
##   <date>     <dbl>
## 1 2022-06-22 342. 
## 2 2022-06-23 347. 
## 3 2022-06-24 393. 
## 4 2022-06-25 426. 
## 5 2022-06-26 412. 
## 6 2022-06-27  16.3
axs_daily %>% autoplot()

Note that the final day, 2022-06-27, contains only a single hour of data, so its daily total (16.3) sits far below the rest of the series.

Stabilize

# get lambda (same training-set lambda as before)
lambda <- train_axs %>% 
  features(sum, features = guerrero) %>% 
  pull(lambda_guerrero)

# stabilize axs_daily
axs_daily_stable <- axs_daily %>% 
  mutate(sum = box_cox(sum, lambda))
axs_daily_stable %>% autoplot(sum)

Stationarity

axs_daily_stable %>% features(sum, unitroot_ndiffs)
## # A tibble: 1 x 1
##   ndiffs
##    <int>
## 1      1
axs_daily_stable %>% features(sum, unitroot_nsdiffs)
## # A tibble: 1 x 1
##   nsdiffs
##     <int>
## 1       0

We need one round of differencing, so d = 1.

Investigate the autocorrelation plots.

axs_daily_stable %>%
  gg_tsdisplay(difference(sum), plot_type = "partial")

There’s no significant spike in either the ACF or the PACF, so both p and q will be 0.

The most likely model is therefore ARIMA(0, 1, 0). It can’t be tested directly, since there are no labels for the future, but we can compare it against the ARIMA(1, 2, 1) that was already validated on the test data.
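
For reference, ARIMA(0, 1, 0) without a constant is a random walk,

\[ y_t = y_{t-1} + \varepsilon_t, \]

whose point forecast is flat at the last observed value: \(\hat{y}_{T+h} = y_T\).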

Fitting

predict_m <- 
  axs_daily_stable %>% 
  model(
    arima010 = ARIMA(sum ~ pdq(0, 1, 0)),
    arima121 = ARIMA(sum ~ pdq(1, 2, 1))
  )
fn <- names(predict_m)
predict_m %>% pivot_longer(all_of(fn),
                           names_to = "model",
                           values_to = "orders") %>% 
  glance() %>% 
  arrange(AICc)
## # A tibble: 2 x 9
##   model    .model sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
##   <chr>    <chr>   <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
## 1 arima010 orders  0.807   -205.  411.  411.  414. <cpl [0]> <cpl [0]>
## 2 arima121 orders  0.815   -204.  416.  416.  428. <cpl [8]> <cpl [1]>

It’s a close one, but arima010 is better overall.

Residual Diagnostics

It is ideal for the innovation residuals of a forecasting model to be:

  • Uncorrelated. Any correlation left in the innovation residuals suggests information that should have been used in computing the forecasts.

  • Zero-mean. If the mean of the innovation residuals is not zero, the forecasts are biased.

Two further desirable, though not essential, properties are normally distributed residuals and constant variance (homoscedasticity).

arima010 <- 
  axs_daily_stable %>% 
  model(
    arima010 = ARIMA(sum ~ pdq(0, 1, 0))
  )

arima121 <- 
  axs_daily_stable %>% 
  model(
    arima121 = ARIMA(sum ~ pdq(1, 2, 1))
  )

Correlation Check

\(H_0\): the residuals have no autocorrelation
\(H_1\): the residuals have autocorrelation
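
The Ljung-Box statistic over \(\ell\) lags is

\[ Q^* = T(T+2) \sum_{k=1}^{\ell} \frac{r_k^2}{T-k}, \]

which is approximately \(\chi^2\)-distributed under \(H_0\); a large \(Q^*\) (small p-value) indicates autocorrelated residuals.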

res <- arima010 %>% 
  augment() %>% 
  pull(.innov)

Box.test(res, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  res
## X-squared = 0.17419, df = 1, p-value = 0.6764

A p-value of 0.68 means we lack evidence to reject the null hypothesis: the arima010 residuals show no significant autocorrelation.

res <- arima121 %>% 
  augment() %>% 
  pull(.innov)

Box.test(res, type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  res
## X-squared = 0.020569, df = 1, p-value = 0.886

A p-value of 0.89 means we lack evidence to reject the null hypothesis: the arima121 residuals show no significant autocorrelation either.

Both models pass the check; if anything, arima121’s higher p-value means its residuals look slightly whiter.

Mean Check

res <- arima010 %>% 
  augment() %>% 
  pull(.innov)

mean(res)
## [1] -0.09393018
res <- arima121 %>% 
  augment() %>% 
  pull(.innov)

mean(res)
## [1] -0.1105915

Neither mean is exactly zero, but both are close, and arima010’s is slightly closer to zero than arima121’s.

Distribution Check

arima010 %>% 
  augment() %>% 
  ggplot(aes(x = .innov)) +
  geom_histogram(binwidth = 0.1) +
  labs(title = "Histogram of ARIMA (0, 1, 0) Residuals")

arima121 %>% 
  augment() %>% 
  ggplot(aes(x = .innov)) +
  geom_histogram(binwidth = 0.1) +
  labs(title = "Histogram of ARIMA (1, 2, 1) Residuals")

The residuals look normally distributed enough for both models.

gg_tsresiduals(arima010, type = "response")

gg_tsresiduals(arima121, type = "response")

Both sets of diagnostics look fine, but arima010 comes out ahead, so let’s predict with it.

Forecasting

arima010 %>% 
  forecast(h = "31 days") %>% 
  autoplot(axs_daily_stable, level = NULL)

arima010 expects AXS to stay flat and low for the next 31 days. Bear in mind that a random walk’s forecast is anchored at the last observation, which here is the artificially low partial day noted earlier.

Summary

I attempted this time series analysis on AXS to understand the situation a little better. The data so far shows a downtrend with only a faint semblance of seasonality. A non-seasonal ARIMA fits the data quite well compared to the other models. Most importantly, I no longer expect the AXS price to rise in the immediate future. Perhaps holders should sell. Perhaps they should keep it and simply forget about it, hoping that someday, far in the future, it will rise again from the ashes.

Take care, folks.