Introduction

This is the introduction to Quantitative Trading and Investment with R.

Even though we are concentrating on investment and trading, most of the ideas presented here are relevant to other areas of finance, economics and data analysis. We will apply cointegration, GARCH and some other standard statistical tools to solve investment problems; the same techniques could be used to solve marketing, logistics or climate problems. We are using base R, but what we are doing is easy to translate into the tidyverse, Python or any other language.

The module has the following structure:

  1. Introduction: remember R, key packages and a review of assignments.
  2. Investment Performance: reviewing risk and return. The limits to quantification.
  3. Smart Beta and Factor Investment: risk or institutional/behavioural bias?
  4. Hedge Funds: history and evolution of alternative investment. Key strategies.
  5. Assignment 1
  6. Market Inefficiencies: identification, causes and persistence.
  7. Technical Analysis: the trend is your friend and identifying bubbles.
  8. Backtesting: pitfalls and limitations.
  9. Pairs trading: divergence, relative value and cointegration.
  10. GARCH: volatility and options.
  11. Assignment 2

Key packages

We will be using a number of key packages for this module. In this introduction the main ones are:

  • zoo: a basic time-series class built on an ordered index

  • xts: extensible time series, built on zoo and used throughout the module

Remember R

Exercise 1.1

Create a vector of the following dates:

  • 1st Jan 2022
  • 5th Jan 2022
  • 10th Jan 2022
  • 15th Jan 2022

Make sure that they are of the Date class. How do you check for that?
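
A minimal sketch of one possible answer; the object name mydates is just illustrative:

mydates <- as.Date(c("2022-01-01", "2022-01-05", "2022-01-10", "2022-01-15"))
class(mydates)
## [1] "Date"
inherits(mydates, "Date")
## [1] TRUE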

Exercise 1.2

Add some imaginary prices to each of the dates and calculate the return or percentage change for each of the prices.
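
A hedged sketch, continuing from the dates above; the prices are invented purely for illustration:

prices <- c(100, 102, 101, 105)
returns <- diff(prices) / head(prices, -1)  # simple returns: change over previous price
returns                                     # roughly 0.0200, -0.0098, 0.0396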


zoo

zoo is a basic class that combines numeric data with an ordered index. The standard structure of the constructor is:

zoo(x = NULL, order.by = index(x), frequency = NULL, calendar = getOption("zoo.calendar", TRUE))

For example,

require(zoo)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
x.Date <- as.Date("2003-02-01") + c(1,3,7,8,14) 
x <- zoo(x = rnorm(5), order.by = x.Date)
plot(x)

time(x)
## [1] "2003-02-02" "2003-02-04" "2003-02-08" "2003-02-09" "2003-02-15"
str(x)
## 'zoo' series from 2003-02-02 to 2003-02-15
##   Data: num [1:5] 1.413 -1.477 1.237 -0.432 1.358
##   Index:  Date[1:5], format: "2003-02-02" "2003-02-04" "2003-02-08" "2003-02-09" "2003-02-15"

One aspect of zoo that is useful for economics is as.yearmon and as.yearqtr, which store the index as monthly or quarterly dates. For example,

x <- as.yearqtr(2000 + seq(0, 7)/4)
plot(x, rnorm(8), type = 'l', xlab = "Dates")

Exercise 1.3

  • How does the sequence work here?
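
As a hint, a sketch that breaks the expression into steps shows what as.yearqtr receives:

seq(0, 7)              # integers 0 to 7
seq(0, 7) / 4          # 0.00, 0.25, 0.50, ..., 1.75
2000 + seq(0, 7) / 4   # 2000.00, 2000.25, ..., 2001.75

as.yearqtr() reads the fractional part as the quarter, so 2000.00 becomes 2000 Q1, 2000.25 becomes 2000 Q2, and so on.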

Applying functions over time

rollapply will allow you to apply a function over a rolling window, so that the mean or maximum of the last 5 days could be returned. For example,

x <- as.yearmon(2000 + seq(0, 23)/12)
da <- data.frame("Date" = x, "Return" = rnorm(24))
da$MA3 <- rollapply(da$Return, width = 3, FUN = mean, na.pad = TRUE)
head(da)
##       Date     Return        MA3
## 1 Jan 2000 -0.2948665         NA
## 2 Feb 2000 -0.3767053 -0.1429659
## 3 Mar 2000  0.2426741 -0.4039392
## 4 Apr 2000 -1.0777864 -0.5028438
## 5 May 2000 -0.6734193 -0.8818577
## 6 Jun 2000 -0.8943674 -0.4376195

Exercise 1.4

  • Check how the moving average is calculated.

  • How would we change that to something more conventional?
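
One thing to check is that by default rollapply() centres the window (align = "center"). A hedged sketch of one more conventional alternative, a trailing moving average, assuming the da object from the chunk above; the column name MA3r is just illustrative:

da$MA3r <- rollapply(da$Return, width = 3, FUN = mean, fill = NA, align = "right")
head(da)  # each MA3r value uses the current and previous two returns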


xts (extensible time series)

This is a very useful set of functions that we will use throughout the module. There are three components of an xts object: the data (stored as a matrix), a time-based index and optional attributes such as the time zone.

Creating xts objects

require(xts)
xts1 <- xts(x = 1:10, order.by = Sys.Date() + 1:10)
head(xts1)
##            [,1]
## 2024-02-03    1
## 2024-02-04    2
## 2024-02-05    3
## 2024-02-06    4
## 2024-02-07    5
## 2024-02-08    6
str(xts1)
## An xts object on 2024-02-03 / 2024-02-12 containing: 
##   Data:    integer [10, 1]
##   Index:   Date [10] (TZ: "UTC")

or use downloaded data, read in from a csv file:

da <- read.csv('../../Data/SPYTLT.csv')
dax <- xts(x = da[, -1], order.by = as.Date(da$Date, format = "%Y-%m-%d"))
head(dax)
##                 TLT      SPY
## 2002-07-30 39.85169 60.46203
## 2002-07-31 40.34540 60.60829
## 2002-08-01 40.57519 59.02594
## 2002-08-02 40.99071 57.70287
## 2002-08-05 41.17163 55.69501
## 2002-08-06 40.81963 57.56989
str(dax)
## An xts object on 2002-07-30 / 2023-12-22 containing: 
##   Data:    double [5389, 2]
##   Columns: TLT, SPY
##   Index:   Date [5389] (TZ: "UTC")

Exercise 1.5

  • Think about where you are going to place the data file and how you are going to identify it. We remove the date column when bringing the da object into xts because the data inside an xts object are stored as a matrix, which can hold only one class. The dates are characters, so keeping them would force the numbers to become characters as well (which we do not want), as the sketch below illustrates.
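
A small sketch of the coercion problem described above; the toy data frame is invented purely for illustration:

toy <- data.frame(Date = c("2002-07-30", "2002-07-31"),
                  TLT = c(39.85, 40.35), SPY = c(60.46, 60.61))
str(as.matrix(toy))        # every column is coerced to character
str(as.matrix(toy[, -1]))  # dropping the date column keeps the numbers numeric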

Functions in xts

We can take the end points of time periods with the endpoints function. For example, to get the last observation of each week (usually a Friday):

ep1 <- endpoints(dax, on = "weeks")
head(dax[ep1])
##                 TLT      SPY
## 2002-08-02 40.99071 57.70287
## 2002-08-09 41.49423 60.69472
## 2002-08-16 41.58222 61.97788
## 2002-08-23 41.99285 62.89536
## 2002-08-30 42.56974 61.02049
## 2002-09-06 42.93941 59.83707

If you leave out the on argument, the function defaults to the end of the month. It is possible to use everything from milliseconds to years; see the documentation ?endpoints for full details.

ep2 <- endpoints(dax) 
head(dax[ep2])
##                 TLT      SPY
## 2002-07-31 40.34540 60.60829
## 2002-08-30 42.56974 61.02049
## 2002-09-30 44.38285 54.62237
## 2002-10-31 42.74318 59.11691
## 2002-11-29 42.35160 62.76328
## 2002-12-31 44.26818 59.21279

Missing values for xts

There are a number of ways that xts can deal with NA or missing values.

  • na.omit(): keep only the rows with no missing values.

  • na.locf(): fill missing values with the last observed value (last observation carried forward).

  • na.locf(fromLast = TRUE): fill missing values with the next observation.

  • na.approx(): interpolate NAs using linear approximation.

For example,

require(zoo)
set.seed(123)
myseq <- c(rep(NA, 5), rnorm(15))
mydataframe <- data.frame("A" = myseq, "B" = na.locf(myseq, na.rm = FALSE, 
                                                     fromLast = TRUE))
mydataframe
##              A           B
## 1           NA -0.56047565
## 2           NA -0.56047565
## 3           NA -0.56047565
## 4           NA -0.56047565
## 5           NA -0.56047565
## 6  -0.56047565 -0.56047565
## 7  -0.23017749 -0.23017749
## 8   1.55870831  1.55870831
## 9   0.07050839  0.07050839
## 10  0.12928774  0.12928774
## 11  1.71506499  1.71506499
## 12  0.46091621  0.46091621
## 13 -1.26506123 -1.26506123
## 14 -0.68685285 -0.68685285
## 15 -0.44566197 -0.44566197
## 16  1.22408180  1.22408180
## 17  0.35981383  0.35981383
## 18  0.40077145  0.40077145
## 19  0.11068272  0.11068272
## 20 -0.55584113 -0.55584113
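
The example above shows na.locf() filling from the next observation; a quick sketch of na.omit() and the default forward fill, using a small vector invented for illustration:

z <- c(1, 2, NA, 4, 5, 6)
na.omit(z)   # drops the missing element: 1 2 4 5 6
na.locf(z)   # carries the last observation forward: 1 2 2 4 5 6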

or

z <- c(1, 2, NA, 4, 5, 6)
na.approx(z)
## [1] 1 2 3 4 5 6
z <- c(1, 5, NA, NA, 15, 20)
na.approx(z)
## [1]  1.000000  5.000000  8.333333 11.666667 15.000000 20.000000

Exercise 1.6

  • What is the calculation for that interpolation?
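
A sketch of the arithmetic behind the second example above, as a check on whatever answer you arrive at:

# The gap runs from 5 (position 2) to 15 (position 5), i.e. 3 index steps,
# so each missing value steps up by (15 - 5) / (5 - 2) = 10/3
5 + 10/3 * 1   # 8.333333, the interpolated value at position 3
5 + 10/3 * 2   # 11.666667, the interpolated value at position 4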

Appendix

This is a list of some functions and ideas that you will likely come across.

An example of the use of POSIXct to shift a timestamp back by two hours:

mytime <- as.POSIXct("2024-01-30 10:00:00") # create a time object
newtime <- mytime - (2 * 3600) # subtract two hours, i.e. 2 * 3600 seconds
mytime
## [1] "2024-01-30 10:00:00 GMT"
newtime
## [1] "2024-01-30 08:00:00 GMT"