Pairs Trading: Performance of a Relative-Value Arbitrage Rule

Pairs Trading Strategies : 概念

當期價差= 𝑠𝑡𝑜𝑐𝑘𝐴 漲跌 −∆× 𝑠𝑡𝑜𝑐𝑘𝐵漲跌

當 Pairs Opening 時,持有直到兩者價差有發生相反情況時,再進行平倉。

Mean Reverting (均數回歸)
- 利率
- 連動性高的股票股價比例

針對資料先畫圖顯示分佈,在挑選可能的模型 Vasicek 可能有負數,CIR一律是正值 GAM (兩個常態分佈 機率組合)

模型使用的資料
- 共整合因子(用價差)
- 財務隨機模型(用比例)

使用比率(ratio)的動機

Bollinger Bands

實務上Bollinger Bands計算方式
- 帶狀中心線=20天移動平均線(20MA) - 帶狀上限=帶狀中心線+兩個標準差
- 帶狀下限=帶狀中心線-兩個標準差

期間 標準差
10 1.9
20 2.0
50 2.1

注意事項 (From John Bollinger, 2001)

交易策略指標設計 (傳統Bollinger Bands)

  1. 由Bollinger bands 定義出發,本策略由過去歷史資料(如20天、30 天、60天),估出具Exponential Vasicek(mean-reverting)的隨機過程參數

  2. 由過去資料所估計之隨機過程,Mote Carlo模擬未來一期(如1天、5分,3分)之所有可能路徑(如10000條路徑)。

  3. 由模擬之所有可能路徑中,計算樣本標準差,與樣本平均數。接著依Bollinger Band指標概念,算出帶狀中心線、帶狀上限與帶狀下限(如上下加減1個標準差或0.8個標準差)。

Vasicek Model

Vasicek模型指出短期利率會遵循隨機微分方程式

It assumes that the instantaneous spot rate (or ”short rate”) follows an Ornstein-Uhlenbeck process with constant coefficients under the statistical or objective measure used for historical estimation \({ dx_t = \alpha (\theta - x_t) dt + \sigma * dW_t }\) -\({ \theta }\): 回歸均值 -\({ \alpha }\): 回歸均職的速度 -\({ \sigma }\): 波動度,數值越大隨機性越明顯

透過 Ito Lemma積分,計算每個時間點的數值
\({ x_t = \theta (1 - e^{-\alpha (t - s)} )+ x_s * e ^ {-\alpha (t - s)} + \sigma e^{-\alpha t} \int_s^t(e^{-\alpha \mu} \, dW_u) }\)

任一時間點資料,可透過回歸模型估算下一個時間點的資料,加上一個隨機因子 \({ x(t_i) = c + b * x(t_{i-1}) + \delta * \epsilon (t_i) \ }\)

透過回歸模型取得
- 截距 \({c = \theta (1 - e^{-\alpha * dt }) }\)
- 斜率 \({b = e^{-\alpha * dt } }\)
- 殘差 \({\delta = \sigma \sqrt{ \frac{ (1- e^{2\alpha * dt} ) } {2 \alpha} } }\)

價格不會有負值,可以透過指數或平方,讓數值強制為正值,計算Vasicek參數時透過取 log / sqrt修正數值

ML_VasicekParams <- function(ratios, dt = 1/365, transform="log" ) {
       if(transform == "log") {
               ratios_new <- log(ratios)
       } else {
               ratios_new <- sqrt(ratios)
       }
       N <- length(ratios_new)
       x <- as.numeric(ratios_new[1: N-1][,1])
       y <- as.numeric(ratios_new[2: N][,1])
       
       data <- data.frame(x0 = x, x1 = y)
       #x1 = c + b * x0
       parm <- lm(x1~x0, data)
       
       #delta standard deviation of residuals
       delta <- sd( residuals(parm) )
       
       c <- as.numeric(parm$coefficients[1])
       b <- as.numeric(parm$coefficients[2])
       
       alpha <- log(b) / (-1 * dt) 
       theta <- c/ (1- exp(-alpha * dt))
       
       sigma <- delta / sqrt(( (1 - exp(-2 * alpha * dt)) / (2 * alpha) ) )
       vasicekParam <- list( alpha = alpha, theta = theta, sigma = sigma)
}

由過去資料所估計之隨機過程,Mote Carlo模擬未來一期(如1天、5分,3分)之所有可能路徑(如10000條路徑)。

VasicekSimulator <- function(r0, parms, dt = 1/365, steps, paths) {
        alpha <- parms[[1]]; 
        theta <- parms[[2]];
        sigma <- parms[[3]];
        
        randNum <- matrix(rnorm(steps*paths),nrow = steps, ncol = paths)
        rMat <- matrix(0,nrow = steps + 1, ncol = paths)
        rMat[1,] <- r0

        lapply(2:(steps+1), function(i) {
                preRate <- rMat[i-1,]
                dr <- alpha*(theta - preRate)*dt + sigma*randNum[i-1,]*sqrt(dt)
                rMat[i,] <<- preRate + dr
        })
        return (rMat)
}

使用quantmod

require(quantmod)
## Loading required package: quantmod
## Loading required package: xts
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## Loading required package: TTR
## Version 0.4-0 included new data defaults. See ?getSymbols.
require(lubridate)
## Loading required package: lubridate

讀取 Apple, Google 股價

symbols <- c("AAPL", "GOOG")
getSymbols(symbols)
##     As of 0.4-0, 'getSymbols' uses env=parent.frame() and
##  auto.assign=TRUE by default.
## 
##  This  behavior  will be  phased out in 0.5-0  when the call  will
##  default to use auto.assign=FALSE. getOption("getSymbols.env") and 
##  getOptions("getSymbols.auto.assign") are now checked for alternate defaults
## 
##  This message is shown once per session and may be disabled by setting 
##  options("getSymbols.warning4.0"=FALSE). See ?getSymbols for more details.
## [1] "AAPL" "GOOG"

檢視資料

summary(GOOG)
##      Index              GOOG.Open       GOOG.High        GOOG.Low    
##  Min.   :2014-03-27   Min.   :494.7   Min.   :496.0   Min.   :487.6  
##  1st Qu.:2014-08-26   1st Qu.:533.8   1st Qu.:538.3   1st Qu.:529.7  
##  Median :2015-01-27   Median :555.1   Median :558.1   Median :548.5  
##  Mean   :2015-01-26   Mean   :571.9   Mean   :576.6   Mean   :566.3  
##  3rd Qu.:2015-06-28   3rd Qu.:587.4   3rd Qu.:589.6   3rd Qu.:582.2  
##  Max.   :2015-11-25   Max.   :757.5   Max.   :762.7   Max.   :751.8  
##    GOOG.Close     GOOG.Volume       GOOG.Adjusted  
##  Min.   :492.6   Min.   :    7900   Min.   :492.6  
##  1st Qu.:534.1   1st Qu.: 1403900   1st Qu.:534.1  
##  Median :555.0   Median : 1714350   Median :555.0  
##  Mean   :571.6   Mean   : 1988608   Mean   :571.6  
##  3rd Qu.:586.7   3rd Qu.: 2220025   3rd Qu.:586.7  
##  Max.   :756.6   Max.   :11164900   Max.   :756.6
summary(AAPL)
##      Index              AAPL.Open        AAPL.High        AAPL.Low    
##  Min.   :2007-01-03   Min.   : 79.39   Min.   : 82.0   Min.   : 78.2  
##  1st Qu.:2009-03-25   1st Qu.:124.50   1st Qu.:126.1   1st Qu.:122.6  
##  Median :2011-06-14   Median :201.16   Median :202.9   Median :198.7  
##  Mean   :2011-06-15   Mean   :280.01   Mean   :282.8   Mean   :276.7  
##  3rd Qu.:2013-09-05   3rd Qu.:427.57   3rd Qu.:431.6   3rd Qu.:423.3  
##  Max.   :2015-11-25   Max.   :702.41   Max.   :705.1   Max.   :699.6  
##    AAPL.Close     AAPL.Volume        AAPL.Adjusted   
##  Min.   : 78.2   Min.   : 14479600   Min.   : 10.40  
##  1st Qu.:124.5   1st Qu.: 75455275   1st Qu.: 22.79  
##  Median :200.6   Median :115645600   Median : 46.92  
##  Mean   :279.9   Mean   :144703609   Mean   : 53.65  
##  3rd Qu.:427.6   3rd Qu.:188165950   3rd Qu.: 76.36  
##  Max.   :702.1   Max.   :843242400   Max.   :131.38
head(GOOG)
##            GOOG.Open GOOG.High GOOG.Low GOOG.Close GOOG.Volume
## 2014-03-27  568.0026  568.0026 552.9225   558.4626       13100
## 2014-03-28  561.2025  566.4326 558.6725   559.9925       41200
## 2014-03-31  566.8926  567.0026 556.9325   556.9725       10800
## 2014-04-01  558.7126  568.4526 558.7126   567.1626        7900
## 2014-04-02  599.9927  604.8328 562.1926   567.0026      147100
## 2014-04-03  569.8526  587.2827 564.1326   569.7426     5099200
##            GOOG.Adjusted
## 2014-03-27      558.4626
## 2014-03-28      559.9925
## 2014-03-31      556.9725
## 2014-04-01      567.1626
## 2014-04-02      567.0026
## 2014-04-03      569.7426
head(AAPL)
##            AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume
## 2007-01-03     86.29     86.58    81.90      83.80   309579900
## 2007-01-04     84.05     85.95    83.82      85.66   211815100
## 2007-01-05     85.77     86.20    84.40      85.05   208685400
## 2007-01-08     85.96     86.53    85.28      85.47   199276700
## 2007-01-09     86.45     92.98    85.15      92.57   837324600
## 2007-01-10     94.75     97.80    93.45      97.00   738220000
##            AAPL.Adjusted
## 2007-01-03      11.14677
## 2007-01-04      11.39418
## 2007-01-05      11.31304
## 2007-01-08      11.36891
## 2007-01-09      12.31332
## 2007-01-10      12.90259
par(mfrow=c(2,2))

plot(index(GOOG), GOOG$GOOG.Adjusted,type="l",col="red")
plot(index(AAPL), AAPL$AAPL.Adjusted,type="l",col="blue")
ratios <- GOOG$GOOG.Adjusted / AAPL$AAPL.Adjusted
plot(index(ratios), ratios, type="l",col="green")

依指定日期區間取得修正股票收盤價

getData <- function(beg, end, data){
        range <-new_interval(beg, end) 
        tData <- data[,6][index(data) %within% range]
        return (tData)
}

依指定日期區間取得配對交易的股價比例

getRatio <- function(beg, end, pair1 = GOOG, pair2 = AAPL){
        tPair2 <- getData(beg, end, pair2)
        tPair1 <- getData(beg, end, pair1)
        
        ratios <- tPair1 / tPair2 
        return (ratios)
}

依照輸入日期,取得歷史區間(60天)的股價比例

getHistoryRatio <- function(start, period = days(60), pair1 = GOOG, pair2 = AAPL){
        #print(start)
        end <- start - days(1)
        #print(end)
        beg <- start - period
        #print(beg)
        
        ratios <- getRatio(beg, end)
        return (ratios)
}

依照模擬股價值取得均值與上下限值,預設兩個標準差 (96信心比例)

getBand <- function(simu, cfvalue = 2) {
        simuR <- exp(simu)
        
        meanRatio <- apply(simuR, 1, mean)
        sdRatio <- apply(simuR, 1, sd)
        highRatio <-meanRatio + cfvalue * sdRatio
        lowRatio <-meanRatio - cfvalue * sdRatio
        #print(paste("meanRatio",meanRatio))
        #print(paste("sdRatio",sdRatio))
        band <- c(meanRatio[2], highRatio[2], lowRatio[2])
        #print(band)
        return (band)
}

回測每個交易日
1 .使用過去60天歷史資料
2. 透過Vasicek取的隨機參數
3. 透過蒙地卡羅透過隨機亂數模擬1000條曲線估算每個交易日的均值與上下限值
4. 彙整配對交易每日的股價與比例

rollBand <- function(startT, endT, steps= 1, pair1 = GOOG, pair2 = AAPL)  {
        dates <- seq(startT, endT, by = "day") 
        dates <- dates[as.Date(dates) %in% as.Date(index(pair1))]
        dates <- dates[as.Date(dates) %in% as.Date(index(pair2))]
        #print(dates)
        #print(class(dates))
        
        data <- data.frame(matrix(NA, nrow=length(dates), ncol=3))
        
        for(i in 1:length(dates)) {
                #print(start)
                ratios <- getHistoryRatio(dates[i])
                #                print(ratios)
                results <- ML_VasicekParams(ratios, dt = 1/365)
                #                print(results) 
                r0 <- log(ratios[length(ratios)]) 
                #print(paste("r0",r0) )
                
                simuR <- VasicekSimulator(r0, parms = results, steps = 1, paths = 1000)
                
                band <- getBand(simu = simuR)
                
                data[i,] <- band
        }
        pairRatio <- getRatio(startT , endT, pair1, pair2)
        data <- cbind.data.frame(pairRatio, data)
        data <- cbind(data, dates)
        tPair1 <- getData(startT, endT, pair1)
        #data.frame(tPari1[,1])
        #print(tPair1[,1])
        data <- cbind.data.frame(data, tPair1[,1])
        tPair2 <- getData(startT, endT, pair2)
        #print(tPair1)
        data <- cbind.data.frame(data, tPair2[,1])
        #print(dim(data))
        names(data) <- c("ratio", "mid", "upper", "lower", "date", "GOOG","AAPL")
        return (data)
        
}

配對交易在股價比例超過上限現值會出現套利空間,透過模擬估算結果,將繪製可能的進場時機

aa <- rollBand(ymd("2014-08-01"), ymd("2015-11-26"))

tail(aa)
##               ratio      mid    upper    lower       date   GOOG   AAPL
## 2015-11-18 6.309148 6.373780 6.588635 6.158926 2015-11-18 740.00 117.29
## 2015-11-19 6.216619 6.304252 6.519886 6.088618 2015-11-19 738.41 118.78
## 2015-11-20 6.341995 6.206973 6.425263 5.988683 2015-11-20 756.60 119.30
## 2015-11-23 6.420212 6.320626 6.544013 6.097239 2015-11-23 755.98 117.75
## 2015-11-24 6.294415 6.402461 6.631260 6.173661 2015-11-24 748.28 118.88
## 2015-11-25 6.338643 6.275871 6.490417 6.061326 2015-11-25 748.15 118.03
par(mfrow=c(1,1))
plot(aa$date, aa$ratio,type="l",col="black")
lines(aa$date, aa$mid,type="l",col="blue")
lines(aa$date, aa$upper,type="l",col="green")
lines(aa$date, aa$lower,type="l",col="red")

byin1 <- which(aa$ratio > aa$upper)
byin2 <- which(aa$ratio < aa$lower)
which(abs(aa$ratio - aa$mid) < 0.01)
##  [1]   4  12  25  33  62  72  91  94 139 151 164 165 183 191 193 202 205
## [18] 211 215 224 227 228 229 233 277 310 313 325 327 328
points(aa[byin1,]$date, aa[byin1,]$ratio, color="green")
## Warning in plot.xy(xy.coords(x, y), type = type, ...): "color" 不是一個繪圖
## 參數
points(aa[byin2,]$date, aa[byin2,]$ratio, color="red")
## Warning in plot.xy(xy.coords(x, y), type = type, ...): "color" 不是一個繪圖
## 參數

報酬率對交易回測不是那麼重要,勝率比較重要,可以提高交易頻率增加獲利

交易策略的頻率是否重要?看資訊取得頻率

R Markdown Math Equation