Freedom is a powerful notion. It inspires invention and also destruction. Carnage and blessings swirl, pulling strings of change through every nook and cranny of the world. Ideas pop up and get chopped up. One still growing and stumbling is a fancy distributed ledger we call cryptocurrency. It was supposed to give decent people a chance to break away from arm-enforced fiat currency. So far, it keeps us waiting.
Released in 2018, the NFT game Axie Infinity has had its ups and downs. It enabled a culture of digital colonialism. It was robbed of 620 million USD worth of cryptocurrency. Its Smooth Love Potion (SLP) token crashed. But the game is still going. Besides SLP, it has Axie Infinity Shards (AXS). What is that? While SLP is fully circulated and mainly used to breed Axie monsters, AXS is an in-game currency that lets holders govern, stake, and play.
In this document, I apply time series analysis and forecasting to predict the price of AXS.
- Source: cryptodatadownload.com
- Format: CSV
- Date: 20 June 2022
| Feature | Description |
|---|---|
| Unix Timestamp | The Unix timestamp, also known as "epoch time". Use this to convert to your local timezone. |
| Date | The timestamp converted to New York (EST) time. |
| Symbol | The symbol the time series data refers to. |
| Open | The opening price of the time period. |
| High | The highest price of the time period. |
| Low | The lowest price of the time period. |
| Close | The closing price of the time period. |
| Volume (Crypto) | The volume in the transacted currency, e.g. for BTC/USDT this is the BTC amount. |
| Volume Base Ccy | The volume in the base/quote currency, e.g. for BTC/USDT this is the USDT amount. |
| Trade Count | The number of unique trades in the given time period. |
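As a quick illustration of the timestamp column (a sketch; the values are in milliseconds, so divide by 1000 before parsing; as_datetime() and with_tz() come from lubridate, loaded below):
# first Unix timestamp in the data, in milliseconds
ts_ms <- 1656288000000
utc_time <- as_datetime(ts_ms / 1000)          # POSIXct in UTC: 2022-06-27 00:00:00
with_tz(utc_time, tzone = "America/New_York")  # shift to New York local time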
library(tseries)    # classical time series tests (e.g. unit roots)
library(lubridate)  # date-time parsing and manipulation
library(MLmetrics)  # extra evaluation metrics
library(readr)      # fast CSV import
library(flextable)  # table rendering
library(fable)      # tidy forecasting models
library(tsibble)    # tidy temporal data frames
library(tidyverse)  # data wrangling and ggplot2
library(feasts)     # time series features, decompositions, and graphics
library(seasonal)   # X-13ARIMA-SEATS seasonal adjustment
library(fpp3)       # meta-package that loads the tidyverts stack
# skip = 1 drops the banner row at the top of the downloaded file
axs_usd <- read_csv("Binance_AXSUSDT_1h.csv",
                    skip = 1)
head(axs_usd)
## # A tibble: 6 x 10
## unix date symbol open high low close `Volume AXS`
## <dbl> <dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1656288000000 2022-06-27 00:00:00 AXS/US~ 16.2 16.4 16 16.3 45607.
## 2 1656284400000 2022-06-26 23:00:00 AXS/US~ 16.4 16.5 16.1 16.2 64406.
## 3 1656280800000 2022-06-26 22:00:00 AXS/US~ 16.9 16.9 16.2 16.4 73052.
## 4 1656277200000 2022-06-26 21:00:00 AXS/US~ 17.1 17.1 16.9 16.9 14764.
## 5 1656273600000 2022-06-26 20:00:00 AXS/US~ 16.8 17.2 16.8 17.1 23015.
## 6 1656270000000 2022-06-26 19:00:00 AXS/US~ 16.6 16.9 16.6 16.8 28427.
## # ... with 2 more variables: `Volume USDT` <dbl>, tradecount <dbl>
We'll keep just the timestamp and the closing price.
anyNA(axs_usd)
## [1] FALSE
Great, no missing data.
str(axs_usd)
## spec_tbl_df [3,740 x 10] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ unix : num [1:3740] 1.66e+12 1.66e+12 1.66e+12 1.66e+12 1.66e+12 ...
## $ date : POSIXct[1:3740], format: "2022-06-27 00:00:00" "2022-06-26 23:00:00" ...
## $ symbol : chr [1:3740] "AXS/USDT" "AXS/USDT" "AXS/USDT" "AXS/USDT" ...
## $ open : num [1:3740] 16.2 16.4 16.9 17.1 16.8 ...
## $ high : num [1:3740] 16.4 16.5 16.9 17.1 17.2 ...
## $ low : num [1:3740] 16 16.1 16.2 16.9 16.8 ...
## $ close : num [1:3740] 16.3 16.2 16.4 16.9 17.1 ...
## $ Volume AXS : num [1:3740] 45607 64406 73052 14764 23015 ...
## $ Volume USDT: num [1:3740] 741142 1050515 1202355 250871 391031 ...
## $ tradecount : num [1:3740] 3183 3665 4804 1214 1829 ...
## - attr(*, "spec")=
## .. cols(
## .. unix = col_double(),
## .. date = col_datetime(format = ""),
## .. symbol = col_character(),
## .. open = col_double(),
## .. high = col_double(),
## .. low = col_double(),
## .. close = col_double(),
## .. `Volume AXS` = col_double(),
## .. `Volume USDT` = col_double(),
## .. tradecount = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
I'll parse the date column with lubridate for ease of use.
axs_usd_selected <- axs_usd %>%
mutate(date_hour = ymd_hms(date),
close_price = close) %>%
select(date_hour, close_price) %>%
arrange(date_hour)
dim(axs_usd_selected)
## [1] 3740 2
head(axs_usd_selected)
## # A tibble: 6 x 2
## date_hour close_price
## <dttm> <dbl>
## 1 2022-01-22 05:00:00 54.4
## 2 2022-01-22 06:00:00 53.8
## 3 2022-01-22 07:00:00 52.4
## 4 2022-01-22 08:00:00 52.6
## 5 2022-01-22 09:00:00 50.2
## 6 2022-01-22 10:00:00 51.7
tail(axs_usd_selected)
## # A tibble: 6 x 2
## date_hour close_price
## <dttm> <dbl>
## 1 2022-06-26 19:00:00 16.8
## 2 2022-06-26 20:00:00 17.1
## 3 2022-06-26 21:00:00 16.9
## 4 2022-06-26 22:00:00 16.4
## 5 2022-06-26 23:00:00 16.2
## 6 2022-06-27 00:00:00 16.3
start_hour <- min(axs_usd_selected$date_hour)
end_hour <- max(axs_usd_selected$date_hour)
start_hour
## [1] "2022-01-22 05:00:00 UTC"
end_hour
## [1] "2022-06-27 UTC"
Create a time series (tsibble) from axs_usd_selected.
axs_tsb <- axs_usd_selected %>%
as_tsibble(index = date_hour)
axs_tsb %>% head()
## # A tsibble: 6 x 2 [1h] <UTC>
## date_hour close_price
## <dttm> <dbl>
## 1 2022-01-22 05:00:00 54.4
## 2 2022-01-22 06:00:00 53.8
## 3 2022-01-22 07:00:00 52.4
## 4 2022-01-22 08:00:00 52.6
## 5 2022-01-22 09:00:00 50.2
## 6 2022-01-22 10:00:00 51.7
Check for implicit missing timestamps (gaps).
axs_tsb %>% has_gaps()
## # A tibble: 1 x 1
## .gaps
## <lgl>
## 1 FALSE
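No gaps here, but had has_gaps() returned TRUE, a sketch of one way to handle it (fill_gaps() is from tsibble, fill() from tidyr; the pipeline is hypothetical since this data has no gaps):
# make implicit gaps explicit as NA rows, then carry the last close forward
axs_tsb %>%
  fill_gaps() %>%
  fill(close_price, .direction = "down")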
Check for regularity.
axs_tsb %>% is_regular()
## [1] TRUE
Add day, date, week, and month columns.
axs_tsb <- axs_tsb %>% mutate(
day = day(date_hour),
date = date(date_hour),
week = yearweek(date_hour),
month = yearmonth(date_hour))
axs_tsb %>% head()
## # A tsibble: 6 x 6 [1h] <UTC>
## date_hour close_price day date week month
## <dttm> <dbl> <int> <date> <week> <mth>
## 1 2022-01-22 05:00:00 54.4 22 2022-01-22 2022 W03 2022 Jan
## 2 2022-01-22 06:00:00 53.8 22 2022-01-22 2022 W03 2022 Jan
## 3 2022-01-22 07:00:00 52.4 22 2022-01-22 2022 W03 2022 Jan
## 4 2022-01-22 08:00:00 52.6 22 2022-01-22 2022 W03 2022 Jan
## 5 2022-01-22 09:00:00 50.2 22 2022-01-22 2022 W03 2022 Jan
## 6 2022-01-22 10:00:00 51.7 22 2022-01-22 2022 W03 2022 Jan
Below, I plot axs_tsb at several aggregation levels.
hourly_axs <- axs_tsb %>% autoplot(close_price) +
labs(title = "AXS Hourly Closing Price in 2022",
y = "Close Price",
x = NULL)
hourly_axs
date_axs <- axs_tsb %>%
index_by(date) %>%
summarise(sum = sum(close_price)) %>%
autoplot(sum) +
labs(title = "AXS Daily Total Closing Price",
y = "Close Price",
x = NULL)
date_axs
date_axs_avg <- axs_tsb %>%
index_by(date) %>%
summarise(avg = mean(close_price)) %>%
autoplot(avg) +
labs(title = "AXS Average Daily Closing Price",
y = "Close Price",
x = NULL)
date_axs_avg
week_axs <- axs_tsb %>%
index_by(week) %>%
summarise(sum = sum(close_price)) %>%
autoplot(sum) +
labs(title = "AXS Total Weekly Closing Price",
y = "Close Price",
x = NULL)
week_axs
week_axs_avg <- axs_tsb %>%
index_by(week) %>%
summarise(avg = mean(close_price)) %>%
autoplot(avg) +
labs(title = "AXS Average Weekly Closing Price",
y = "Close Price",
x = NULL)
week_axs_avg
month_axs <- axs_tsb %>%
index_by(month) %>%
summarise(sum = sum(close_price)) %>%
autoplot(sum) +
labs(title = "AXS Total Monthly Closing Price",
y = "Close Price",
x = NULL)
month_axs
month_axs_avg <- axs_tsb %>%
index_by(month) %>%
summarise(avg = mean(close_price)) %>%
autoplot(avg) +
labs(title = "AXS Average Monthly Closing Price",
y = "Close Price",
x = NULL)
month_axs_avg
Let's check whether there's any seasonality.
s_daily_axs <- axs_tsb %>% gg_season(close_price,
label = "none",
polar = F,
period = "day"
) +
labs(title = "AXS Daily Comparison",
y = "Close Price",
x = NULL)
s_daily_axs
There’s no obvious pattern here.
s_weekly_axs <- axs_tsb %>% gg_season(close_price,
label = "none",
polar = F,
period = "week"
) +
labs(title = "AXS Weekly Comparison",
y = "Close Price",
x = NULL)
s_weekly_axs
No pattern found in the weekly view either.
s_monthly_axs <- axs_tsb %>% gg_season(close_price,
label = "right",
polar = F,
period = "month"
) +
labs(title = "AXS Monthly Comparison",
y = "Close Price",
x = NULL)
s_monthly_axs
From the plot above, I don't see a convincing within-month pattern either. Let's get more quantitative with autocorrelation.
In the autocorrelation plots below, the vertical lines are the autocorrelation coefficients \(r_k\): each one measures the relationship between the series and a version of itself lagged by \(k\) steps. The dashed blue lines mark the threshold beyond which an \(r_k\) is considered significantly different from zero.
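For reference, the lag-\(k\) autocorrelation coefficient plotted below is

\[ r_k = \frac{\sum_{t=k+1}^{T} (y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{T} (y_t - \bar{y})^2}, \]

and the dashed bounds sit at roughly \(\pm 1.96/\sqrt{T}\): any \(r_k\) beyond them is significantly different from zero at the 5% level.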
axs_tsb %>%
ACF(close_price) %>%
autoplot()
The autocorrelations decay only slowly, with a slight drop: the signature of a trend.
axs_tsb %>%
  index_by(day) %>%  # note: day() is day-of-month (1-31), so this pools the same day across months
  summarise(sum = sum(close_price)) %>%
  ACF(sum) %>%
  autoplot()
There's a down-and-up pattern, but the seasonality is unclear.
axs_tsb %>%
  index_by(week) %>%
  summarise(sum = sum(close_price)) %>%
  ACF(sum) %>%
  autoplot()
A clear downtrend.
axs_tsb %>%
  index_by(month) %>%
  summarise(sum = sum(close_price)) %>%
  ACF(sum) %>%
  autoplot()
None of the autocorrelations reach the significance bounds, which is unsurprising since aggregating to months leaves only about six points. A series with no significant autocorrelation like this resembles white noise.
Let’s decompose the daily data.
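Classical additive decomposition models the series as

\[ y_t = S_t + T_t + R_t, \]

where \(S_t\) is the seasonal component, \(T_t\) the trend-cycle, and \(R_t\) the remainder.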
date_axs
axs_tsb %>%
  index_by(date) %>%
  summarise(sum = sum(close_price)) %>%
  model(
    classical_decomposition(sum, type = "additive")
  ) %>%
  components() %>%
  autoplot() +
  labs(title = "Classical Additive Decomposition of AXS Daily Closing Price")
The seasonal component seems to oscillate within a band of roughly 10 USD. But since classical decomposition cannot capture seasonality that shifts over time, we move on to the next method.
Seasonal and Trend decomposition using Loess (STL) is versatile and robust, but we have to choose trend(window = ?) and season(window = ?), which control how quickly the trend and seasonal components are allowed to change.
According to Cleveland et al., the window sizes should be odd and at least 7. If we pass NULL, the function chooses automatically.
axs_tsb %>%
index_by(date) %>%
summarise(sum = sum(close_price)) %>%
model(
STL(sum ~ season(window = NULL) ,
robust = T)
) %>%
components() %>%
autoplot() +
labs(title = "STL Decomposition of AXS Daily Closing Price")
If we set the seasonal window to "periodic" instead, the seasonal component is forced to be identical across the whole series.
axs_tsb %>%
index_by(date) %>%
summarise(sum = sum(close_price)) %>%
model(
STL(sum ~ season(window = "periodic"),
robust = T)
) %>%
components() %>%
autoplot() +
labs(title = "STL Decomposition of AXS Daily Closing Price")
But for this time series we have to ask: how fast does cryptocurrency seasonality change? It depends on the coin, but cryptocurrencies are generally quite volatile, so a smaller seasonal window is appropriate. Likewise, since the default trend window for monthly data is 21, we should use something smaller than that too.
axs_tsb %>%
index_by(date) %>%
summarise(sum = sum(close_price)) %>%
model(
STL(sum ~ season(window = 7) + trend(window = 5),
robust = T)
) %>%
components() %>%
autoplot() +
labs(title = "STL Decomposition of AXS Daily Closing Price")
axs_daily <- axs_tsb %>%
index_by(date) %>%
summarise(sum = sum(close_price))
head(axs_daily)
## # A tsibble: 6 x 2 [1D]
## date sum
## <date> <dbl>
## 1 2022-01-22 968.
## 2 2022-01-23 1246.
## 3 2022-01-24 1188.
## 4 2022-01-25 1214.
## 5 2022-01-26 1248.
## 6 2022-01-27 1151.
nrow(axs_daily)
## [1] 157
Split the daily series into training and test sets, holding out the last 20% of days.
nrow_test <- round(0.2 * nrow(axs_daily))    # size of the hold-out set
test_axs <- tail(axs_daily, n = nrow_test)   # last 20% of days
train_axs <- head(axs_daily, n = -nrow_test) # everything before the hold-out
fit <- train_axs %>% model(avg = MEAN(sum),                     # mean of the training data
                           naive = NAIVE(sum),                  # last observed value
                           snaive = SNAIVE(sum ~ lag("month")), # value from one month earlier
                           drift = RW(sum ~ drift())            # random walk with drift
                           )
fit %>%
forecast(h = "31 days") %>%
autoplot(train_axs, level = NULL) +
autolayer(test_axs)
accuracy(fit %>% forecast(h = "31 days"),
test_axs)
## # A tibble: 4 x 10
## .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 avg Test -715. 722. 715. -389. 389. NaN NaN 0.469
## 2 drift Test 26.7 84.4 61.8 -51.8 70.6 NaN NaN 0.260
## 3 naive Test -40.4 110. 79.0 -94.2 102. NaN NaN 0.469
## 4 snaive Test -212. 264. 212. -224. 224. NaN NaN 0.0653
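The MASE and RMSSE columns are NaN because their scaling factor comes from the training data, which accuracy() cannot see when handed only test_axs. Passing the full series instead would fill them in, e.g. this sketch:
accuracy(fit %>% forecast(h = "31 days"),
         axs_daily)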
From the plot, drift captures the trend best while seasonal naive comes closest to the seasonality. The metrics agree that drift is the strongest of the four, so let's use it as the benchmark.
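For reference, the drift method just extends the straight line between the first and last training observations:

\[ \hat{y}_{T+h|T} = y_T + h \cdot \frac{y_T - y_1}{T - 1}. \]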
fit_ets <- train_axs %>%
model(
drift = RW(sum ~ drift()),
additive = ETS(sum ~ error("A") + trend("A") + season("A")),
multiplicative = ETS(sum ~ error("M") + trend("A") + season("M"))
)
fit_ets %>%
forecast(h = "31 days") %>%
autoplot(train_axs, level = NULL) +
autolayer(test_axs)
accuracy(fit_ets %>% forecast(h = "31 days"),
test_axs)
## # A tibble: 3 x 10
## .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 additive Test 76.3 105. 89.7 -18.7 57.7 NaN NaN 0.196
## 2 drift Test 26.7 84.4 61.8 -51.8 70.6 NaN NaN 0.260
## 3 multiplicative Test 100. 124. 110. -4.06 54.8 NaN NaN 0.226
The multiplicative ETS has the most appropriate-looking trend if we only look at the plot, but diving into the metrics, drift still has the lowest RMSE.
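To inspect the estimated smoothing parameters of a fitted ETS model, report() works on a single-model mable (a quick sketch):
fit_ets %>%
  select(multiplicative) %>%  # pick one model out of the mable
  report()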
We've seen how the data can be decomposed. Let's run the decomposition on the train_axs date range and model the seasonally adjusted series with drift.
Apply the same STL configuration to both train_axs and test_axs.
comp_axs_tsb <- train_axs %>%
model(
STL(sum ~ season(window = 7) + trend(window = 5),
robust = T)
) %>%
components() %>% select(-.model)
test_axs_s.adjust <- test_axs %>%
model(
STL(sum ~ season(window = 7) + trend(window = 5),
robust = T)
) %>%
components() %>% select(-.model)
fit_decomp_drift <-
comp_axs_tsb %>%
model(drift = RW(season_adjust ~ drift())
)
fit_decomp_drift %>%
forecast(h = "31 days") %>%
autoplot(comp_axs_tsb, level = NULL)
accuracy(fit_decomp_drift %>% forecast(h = "31 days"),
test_axs_s.adjust)
## # A tibble: 1 x 10
## .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 drift Test 34.1 83.4 62.9 -9.78 30.4 NaN NaN 0.256
Let’s compare.
accuracy(fit_ets %>% forecast(h = "31 days"),
test_axs) %>% filter(.model == "drift")
## # A tibble: 1 x 10
## .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 drift Test 26.7 84.4 61.8 -51.8 70.6 NaN NaN 0.260
On the seasonally adjusted data, the drift model improves in both RMSE and MAPE.
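As a side note, fabletools offers decomposition_model(), which forecasts the seasonally adjusted series with the model you give it and re-seasonalises the forecasts automatically (unspecified seasonal components default to SNAIVE()). A sketch of the same idea in one step:
fit_dcmp <- train_axs %>%
  model(stl_drift = decomposition_model(
    STL(sum ~ season(window = 7) + trend(window = 5), robust = TRUE),
    RW(season_adjust ~ drift())  # model for the seasonally adjusted part
  ))
fit_dcmp %>%
  forecast(h = "31 days") %>%
  autoplot(train_axs, level = NULL)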
Let’s review the plot.
train_axs %>% autoplot(sum)
Applying ARIMA requires stationary data. Since the plot is clearly not stationary, I need to apply some differencing. A Box-Cox transformation may also be needed to stabilize the variance. But is the series seasonal? As we saw in the STL decomposition, it shows some seasonality early on, but it fades about halfway through.
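For reference, for positive \(y_t\) the Box-Cox transformation is

\[ w_t = \begin{cases} \log(y_t) & \text{if } \lambda = 0, \\ (y_t^{\lambda} - 1)/\lambda & \text{otherwise,} \end{cases} \]

with \(\lambda\) chosen here by the Guerrero method.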
# before transformation
train_axs %>% autoplot(sum)
# get lambda with the Guerrero method (features() returns a one-row tibble,
# so pull out the scalar)
lambda <- train_axs %>%
  features(sum, features = guerrero) %>%
  pull(lambda_guerrero)
# stabilize the variance of train_axs
train_axs_stable <- train_axs %>%
  mutate(sum = box_cox(sum, lambda))
train_axs_stable %>% autoplot(sum)
Don’t forget to transform the test data as well.
test_axs %>% autoplot(sum)
test_axs_stable <- test_axs %>%
mutate(sum = box_cox(sum, lambda))
test_axs_stable %>% autoplot(sum)
Check stationarity with unit root tests. unitroot_ndiffs tells us how many differences we need to remove the trend; unitroot_nsdiffs tells us whether we need seasonal differencing.
train_axs_stable %>% features(sum, unitroot_ndiffs)
## # A tibble: 1 x 1
## ndiffs
## <int>
## 1 2
train_axs_stable %>% features(sum, unitroot_nsdiffs)
## # A tibble: 1 x 1
## nsdiffs
## <int>
## 1 0
The results above say we need two differences and no seasonal differencing.
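unitroot_ndiffs applies the KPSS unit root test repeatedly under the hood; the raw statistic and p-value can also be inspected directly (a sketch):
train_axs_stable %>% features(sum, unitroot_kpss)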
Let’s check the autocorrelation plots.
train_axs_stable %>%
gg_tsdisplay(difference(sum, differences = 2), plot_type = "partial")
The PACF is sinusoidal and the ACF has a significant spike at an early lag but none beyond it. A pattern like this (sinusoidal PACF, ACF cutting off after lag \(q\)) suggests the data may follow an ARIMA(0, d, q) model.
The value of d is 2, since we applied double differencing. The PACF suggests p = 0, 1, or 2; the ACF suggests q = 0 or 1.
The candidate ARIMA(p, d, q) models are therefore ARIMA(0, 2, 0), ARIMA(1, 2, 0), ARIMA(2, 2, 0), ARIMA(0, 2, 1), ARIMA(1, 2, 1), and ARIMA(2, 2, 1).
Let's fit them all, plus fable's automatic stepwise and exhaustive searches, and see whether ARIMA(0, 2, 1) comes out on top as the rule of thumb predicts.
fit_arima <-
train_axs_stable %>%
model(
arima020 = ARIMA(sum ~ pdq(0, 2 , 0)),
arima120 = ARIMA(sum ~ pdq(1, 2, 0)),
arima220 = ARIMA(sum ~ pdq(2, 2, 0)),
arima021 = ARIMA(sum ~ pdq(0, 2, 1)),
arima121 = ARIMA(sum ~ pdq(1, 2, 1)),
arima221 = ARIMA(sum ~ pdq(2, 2, 1)),
stepwise = ARIMA(sum),
search = ARIMA(sum, stepwise = F)
)
fn <- names(fit_arima)
fit_arima %>% pivot_longer(fn,
names_to = "model",
values_to = "orders") %>%
glance() %>%
arrange(AICc)
## # A tibble: 8 x 9
## model .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 arima021 orders 0.129 -49.5 103. 103. 109. <cpl [0]> <cpl [1]>
## 2 stepwise orders 0.129 -49.5 103. 103. 109. <cpl [0]> <cpl [1]>
## 3 search orders 0.129 -49.5 103. 103. 109. <cpl [0]> <cpl [1]>
## 4 arima121 orders 0.128 -48.6 103. 103. 112. <cpl [1]> <cpl [1]>
## 5 arima221 orders 0.129 -48.5 105. 105. 116. <cpl [2]> <cpl [1]>
## 6 arima220 orders 0.157 -60.5 127. 127. 135. <cpl [2]> <cpl [0]>
## 7 arima120 orders 0.174 -66.9 138. 138. 144. <cpl [1]> <cpl [0]>
## 8 arima020 orders 0.202 -76.8 156. 156. 158. <cpl [0]> <cpl [0]>
Indeed, arima021 has the best AICc, identical to the models picked by both the stepwise and exhaustive searches.
fit_arima %>%
forecast(h = "31 days") %>%
autoplot(train_axs_stable, level = NULL) +
autolayer(test_axs_stable)
# overlay the arima021 forecasts on that model's fitted values
fit_arima %>%
  select(arima021) %>%
  forecast(h = "31 days") %>%
  autoplot(train_axs_stable, level = NULL) +
  geom_line(data = augment(fit_arima) %>% filter(.model == "arima021"),
            aes(x = date, y = .fitted), colour = "orange")
Let’s see their accuracy on the test data set.
accuracy(fit_arima %>% forecast(h = "31 days"),
test_axs_stable)
## # A tibble: 8 x 10
## .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 arima020 Test 6.18 6.95 6.18 45.3 45.3 NaN NaN 0.758
## 2 arima021 Test 0.958 1.92 1.44 2.40 14.2 NaN NaN -0.0445
## 3 arima120 Test 4.89 5.54 4.89 34.8 34.8 NaN NaN 0.645
## 4 arima121 Test 0.931 1.91 1.42 2.15 14.1 NaN NaN -0.0440
## 5 arima220 Test 3.71 4.28 3.84 25.1 28.3 NaN NaN 0.466
## 6 arima221 Test 0.940 1.91 1.42 2.23 14.1 NaN NaN -0.0446
## 7 search Test 0.958 1.92 1.44 2.40 14.2 NaN NaN -0.0445
## 8 stepwise Test 0.958 1.92 1.44 2.40 14.2 NaN NaN -0.0445
We discovered that on the test data, arima121 actually performed a bit better than arima021. But is it better than drift on the seasonally adjusted data?
accuracy(fit_arima %>% forecast(h = "31 days"),
test_axs_stable) %>% filter(.model=="arima121")
## # A tibble: 1 x 10
## .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 arima121 Test 0.931 1.91 1.42 2.15 14.1 NaN NaN -0.0440
accuracy(fit_decomp_drift %>% forecast(h = "31 days"),
test_axs_s.adjust)
## # A tibble: 1 x 10
## .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 drift Test 34.1 83.4 62.9 -9.78 30.4 NaN NaN 0.256
At face value, arima121 wins across the board, though keep in mind that its metrics are computed on the Box-Cox transformed scale while drift's are on the raw scale, so only the percentage-based MPE and MAPE are even roughly comparable; on those, arima121 still comes out ahead.
Let's train on the whole data set to predict the AXS close price for the next month.
Review the data.
head(axs_daily)
## # A tsibble: 6 x 2 [1D]
## date sum
## <date> <dbl>
## 1 2022-01-22 968.
## 2 2022-01-23 1246.
## 3 2022-01-24 1188.
## 4 2022-01-25 1214.
## 5 2022-01-26 1248.
## 6 2022-01-27 1151.
tail(axs_daily)
## # A tsibble: 6 x 2 [1D]
## date sum
## <date> <dbl>
## 1 2022-06-22 342.
## 2 2022-06-23 347.
## 3 2022-06-24 393.
## 4 2022-06-25 426.
## 5 2022-06-26 412.
## 6 2022-06-27 16.3
Note that the last day, 2022-06-27, contains only its first hour (see the tail above), so its daily sum is artificially tiny; that is the cliff at the far right of the plot.
axs_daily %>% autoplot(sum)
# get lambda with the Guerrero method (again pulling the scalar out of the
# one-row features() tibble)
lambda <- train_axs %>%
  features(sum, features = guerrero) %>%
  pull(lambda_guerrero)
# stabilize the variance of axs_daily
axs_daily_stable <- axs_daily %>%
  mutate(sum = box_cox(sum, lambda))
axs_daily_stable %>% autoplot(sum)
axs_daily_stable %>% features(sum, unitroot_ndiffs)
## # A tibble: 1 x 1
## ndiffs
## <int>
## 1 1
axs_daily_stable %>% features(sum, unitroot_nsdiffs)
## # A tibble: 1 x 1
## nsdiffs
## <int>
## 1 0
We need one difference: d = 1.
Investigate the autocorrelation plots.
axs_daily_stable %>%
gg_tsdisplay(difference(sum), plot_type = "partial")
There's no significant autocorrelation, so p and q will both be 0.
The most likely model is ARIMA(0, 1, 0). It can't be tested the way the earlier models were, since there are no labels for the future, but we can compare it against the ARIMA(1, 2, 1) that was evaluated on the test data.
predict_m <-
  axs_daily_stable %>%
  model(
    arima010 = ARIMA(sum ~ pdq(0, 1, 0)),
    arima121 = ARIMA(sum ~ pdq(1, 2, 1))
  )
fn <- names(predict_m)
predict_m %>% pivot_longer(fn,
names_to = "model",
values_to = "orders") %>%
glance() %>%
arrange(AICc)
## # A tibble: 2 x 9
## model .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 arima010 orders 0.807 -205. 411. 411. 414. <cpl [0]> <cpl [0]>
## 2 arima121 orders 0.815 -204. 416. 416. 428. <cpl [8]> <cpl [1]>
It's a close one, but arima010 comes out ahead. (One caveat: AICc values are only strictly comparable between models with the same order of differencing, since differencing changes the data the likelihood is computed on, so the residual diagnostics below are the safer guide here.)
Ideally, the innovation residuals of a forecasting model satisfy the following:

- Uncorrelated. Any correlation left in the innovation residuals means there is information that should have been used in computing the forecasts.
- Zero mean. If the mean of the innovation residuals is not zero, the forecasts are biased.

Two further desirable (but not essential) properties are normally distributed residuals and constant variance (homoscedasticity).
arima010 <-
axs_daily_stable %>%
model(
arima010 = ARIMA(sum ~ pdq(0, 1, 0))
)
arima121 <-
  axs_daily_stable %>%
  model(
    arima121 = ARIMA(sum ~ pdq(1, 2, 1))
  )
Correlation Check
\(H_0\): the residuals have no autocorrelation

\(H_1\): the residuals have autocorrelation
res <- arima010 %>%
augment() %>%
pull(.innov)
Box.test(res, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: res
## X-squared = 0.17419, df = 1, p-value = 0.6764
The p-value of about 0.68 means there is not enough evidence to reject the null hypothesis, i.e. the arima010 residuals show no autocorrelation.
res <- arima121 %>%
augment() %>%
pull(.innov)
Box.test(res, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: res
## X-squared = 0.020569, df = 1, p-value = 0.886
The p-value of about 0.89 means there is likewise not enough evidence to reject the null hypothesis for the arima121 residuals.
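For the record, the same check can stay inside the tidyverts ecosystem via the ljung_box feature from feasts (lag = 10 is a common choice for non-seasonal data; Box.test() above used its default lag = 1):
arima010 %>%
  augment() %>%
  features(.innov, ljung_box, lag = 10)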
Both models pass the check; neither shows significant residual autocorrelation.
Mean Check
res <- arima010 %>%
augment() %>%
pull(.innov)
mean(res)
## [1] -0.09393018
res <- arima121 %>%
augment() %>%
pull(.innov)
mean(res)
## [1] -0.1105915
Neither mean is exactly zero, but both are close, and arima010's is slightly closer to zero than arima121's.
Distribution Check.
arima010 %>%
augment() %>%
ggplot(aes(x = .innov)) +
geom_histogram(binwidth = 0.1) +
labs(title = "Histogram of ARIMA (0, 1, 0) Residuals")
arima121 %>%
augment() %>%
ggplot(aes(x = .innov)) +
geom_histogram(binwidth = 0.1) +
labs(title = "Histogram of ARIMA (1, 2, 1) Residuals")
Both look approximately normally distributed.
gg_tsresiduals(arima010, type = "response")
gg_tsresiduals(arima121, type = "response")
Both models check out, but arima010 wins on AICc and the residual mean, so let's forecast with it.
arima010 %>%
forecast(h = "31 days") %>%
autoplot(axs_daily_stable, level = NULL)
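Note that the forecast above lives on the Box-Cox scale. A minimal sketch for mapping the point forecasts back to USD, assuming lambda is still the scalar pulled from the Guerrero feature earlier (inv_box_cox() is fabletools' inverse of box_cox()):
arima010 %>%
  forecast(h = "31 days") %>%
  as_tibble() %>%
  mutate(sum_usd = inv_box_cox(.mean, lambda)) %>%  # invert the transformation
  select(date, sum_usd) %>%
  head()
Alternatively, writing the transformation into the model formula, ARIMA(box_cox(sum, lambda) ~ pdq(0, 1, 0)), lets fable back-transform the forecasts automatically.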
arima010 expects AXS to stay flat and low for the next 31 days; as a random walk, it essentially projects the last observed value forward.
I attempted this time series analysis on AXS to understand the situation a little better. The data so far shows a downtrend with some semblance of seasonality-but-not-really. Non-seasonal ARIMA fits the data quite well compared to the other models. Most importantly, I no longer expect the AXS price to rise in the immediate future. Perhaps holders should sell. Perhaps they should keep it and just forget about it, hoping that some day, far in the future, it will rise again from the ashes.
Take care folks.