In this exercise, I will use the time series data of two commodies, iron ore and steel, from Chinese Future market as my target. Because these two product are expected to be highly correlated, I would use models to capture their unexpected price deviation and generate trading signals based on those unexpectations. To select the most suitable model, I will compare the performance of three models, ETS model, ARIMA model, and Neural Network model.
I download iron contract price from the website of Dalian commodity Exchange, http://www.dce.com.cn/, and steel price from Shanghai Futures Exchange, http://www.shfe.com.cn/.
rm(list = ls())
cat("\f")
packages <- c("fpp2", "seasonal", "readxl", "stargazer", "rugarch")
for (i in 1:length(packages)){
if(!packages[i] %in% installed.packages()){
install.packages(packages[i])
}
library(packages[i], character.only = T)
}
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## -- Attaching packages ---------------------------------------------- fpp2 2.4 --
## v ggplot2 3.3.5 v fma 2.4
## v forecast 8.15 v expsmooth 2.3
##
##
## Please cite as:
## Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.2. https://CRAN.R-project.org/package=stargazer
## 载入需要的程辑包:parallel
##
## 载入程辑包:'rugarch'
## The following object is masked from 'package:stats':
##
## sigma
rm(packages)
setwd("C:/Users/admin/Desktop/study/hw3")
steel <- read.delim("steel.txt")
iron <- read.delim("iron.txt")
head(steel)
## 日期 开盘 最高 最低 收盘 成交量 持仓量 结算价
## 1 2013/08/19 3802 3843 3785 3812 1892324 1552992 3814
## 2 2013/08/20 3810 3828 3792 3796 1517702 1548418 3811
## 3 2013/08/21 3796 3807 3786 3789 1150250 1559268 3797
## 4 2013/08/22 3777 3799 3762 3769 1747080 1579348 3777
## 5 2013/08/23 3772 3820 3771 3817 2136050 1553756 3803
## 6 2013/08/26 3827 3848 3802 3813 2048358 1536606 3822
head(iron)
## 日期 开盘 最高 最低 收盘 成交量 持仓量 结算价
## 1 2013/10/18 978 984 962 977 300818 65042 975
## 2 2013/10/21 976 977 960 969 95432 68130 967
## 3 2013/10/22 963 966 948 948 91296 76004 956
## 4 2013/10/23 949 953 936 939 116490 81162 944
## 5 2013/10/24 938 939 916 926 169028 87872 928
## 6 2013/10/25 928 936 917 918 104060 86472 927
Several data points are zero and NA, which could be due to technique errors. Therefore, I delete rows containing NAs and replace zeros with the values of previous data. The plot looks better now.
steel.adj.price <- ts(na.omit(steel$结算价), start = as.Date("2014-01-01")
, end = as.Date("2020-11-30"))
iron.adj.price <- ts(na.omit(iron$结算价), start = as.Date("2014-01-01")
, end = as.Date("2020-11-30"))
autoplot(steel.adj.price, series = "steel")
steel.adj.price[which.min(steel.adj.price)] <- steel.adj.price[which.min(steel.adj.price)-1]
autoplot(steel.adj.price, series = "steel")
autoplot(iron.adj.price, series = "iron")
for(i in which(iron.adj.price == 0)) {
iron.adj.price[i] <- iron.adj.price[i-1]
}
autoplot(iron.adj.price, series = "iron")
Before model selection, I need to check the stationarity of the data. I first difference the log return and use these two log return value to generate a deviation log return, which is the target I would use models to forecast. If the forecast deviation is higher than actual one or lower than the acutal, trading signal will be generated correspondingly.
steel.return <- log10(steel.adj.price / lag(steel.adj.price, 1))
iron.return <- log10(iron.adj.price / lag(iron.adj.price, 1))
ggAcf(steel.return)
ggAcf(iron.return)
steel.iron.return.diff <- steel.return - iron.return
plot(steel.iron.return.diff)
ggAcf(steel.iron.return.diff)
To better evaluate the performance of each model, I also download 100-day price data starting from the end of the training data as my test data. I will compare the accuracy of each model to select the best model. The accuracy test shows that neutral net work models are the best model.
test.steel <- ts(na.omit(steel$结算价), start = as.Date("2020-11-30") )
test.iron <- ts(na.omit(iron$结算价), start = as.Date("2020-11-30")
, end = as.Date("2020-11-30") + 101 )
test.return.diff <- log10(test.steel / lag(test.steel, 1)) - log10(test.iron / lag(test.iron, 1))
ets.fit <- ets(steel.iron.return.diff)
checkresiduals(ets.fit)
##
## Ljung-Box test
##
## data: Residuals from ETS(A,N,N)
## Q* = 117.46, df = 8, p-value < 2.2e-16
##
## Model df: 2. Total lags used: 10
arima.fit <- auto.arima(steel.iron.return.diff)
checkresiduals(arima.fit)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,0,1) with zero mean
## Q* = 16.512, df = 9, p-value = 0.05693
##
## Model df: 1. Total lags used: 10
nt.fit <- nnetar(steel.iron.return.diff)
checkresiduals(nt.fit)
## Warning in modeldf.default(object): Could not find appropriate degrees of
## freedom for this model.
accuracy(forecast(ets.fit, h = 100), test.return.diff)
## ME RMSE MAE MPE MAPE MASE
## Training set 5.710209e-06 0.011220361 0.007531200 Inf Inf 0.7825206
## Test set -7.635275e-04 0.005377468 0.003643917 100.2441 100.6359 0.3786170
## ACF1 Theil's U
## Training set 0.1954390 NA
## Test set 0.1406888 1.004771
accuracy(forecast(arima.fit, h = 100), test.return.diff)
## ME RMSE MAE MPE MAPE MASE
## Training set -6.893402e-05 0.010990489 0.007366322 NaN Inf 0.7653891
## Test set -8.388981e-04 0.005385542 0.003641728 99.7332 99.7332 0.3783895
## ACF1 Theil's U
## Training set -0.002049981 NA
## Test set 0.140082087 1.005565
accuracy(forecast(nt.fit, h = 100), test.return.diff)
## ME RMSE MAE MPE MAPE MASE
## Training set -2.237983e-05 0.010165180 0.007130760 NaN Inf 0.7409134
## Test set -1.042753e-03 0.005459424 0.003687927 99.70581 104.5555 0.3831897
## ACF1 Theil's U
## Training set 0.03720757 NA
## Test set 0.14896713 1.022907
When the forecast log return deviation is higher than the actual one, I interpret that the actual deviation might be underestimated, so I should buy the expensive one and short the cheap one. Therefore, positive log return of steel and negative log return of iron should be added up to generate daily return. When situation is inverse, trading signals are also inverse. The profit of 100 days trading is shown below. We can see a downward profit plot, showing this type of trading algorithms is flawed. I think potential improvement could be setting up a suitable threshold. Only when there is an extreme deviation, I trigger the signal. However, to find the best threshold needs optimazation methods, such as corss validation, which will be my future attemps to fix this model.
fnt <- forecast(nt.fit, h = 100)
autoplot(fnt)
profit <- rep(NA, 101)
for(i in 2:100){
profit[1] = 0
if(fnt$mean[i] > test.return.diff[i]){
profit[i] = profit[i-1] + test.return.diff[i]
} else {
profit[i] = profit[i-1] - test.return.diff[i]
}
}
plot(profit)