Introduction to quantitative trading and investment

Introduction
Key packages
Remember R
- Exercise 1.1
- Exercise 1.2
zoo
- Exercise 1.3
Applying functions over time
- Exercise 1.4
xts (extensible time series)
Appendix
- Key quant players

Introduction

Though we are concentrating on investment and trading, most of the tools and techniques presented here are relevant to other areas of finance, economics and data analysis. We will focus on time series methods, including cointegration and GARCH. We will also investigate some machine learning techniques, both supervised and unsupervised. We will use base R, but what we are doing is easy to translate into the tidyverse, Python or any other programming language. The most important thing is to understand the basic principles.

The module has the following structure:

Introduction: remember R, key packages and a review of assignments.
Investment Performance: reviewing risk and return. The limits to quantification.
Smart Beta and Factor Investment: risk or institutional/behavioural bias?
Hedge Funds: history and evolution of alternative investment. Key strategies. Machine learning and AI.
Assignment 1
Market Inefficiencies: identification, causes and persistence.
Technical Analysis: the trend is your friend and finding bubbles.
Backtesting: pitfalls and limitations.
Pairs trading: divergence, relative value and cointegration.
GARCH: volatility and options.
Assignment 2

Key packages

We will be using a number of key time series and machine learning packages for this module:

zoo: provides methods for ordered indexed observations. It is particularly aimed at irregular time series.
xts: extends zoo to provide additional time series capabilities.
quantmod: will allow you to specify, build, trade and analyse quantitative financial trading strategies.
PerformanceAnalytics: is a collection of econometric functions for performance and risk analysis.
urca: is used for testing whether time series are stationary and for cointegration.
vars: is used for Vector Autoregression modelling.
rugarch: will fit a variety of GARCH models.
class: is used for classification, including k-nearest neighbour and Learning Vector Quantization.
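
Before starting, it may help to check which of these packages are already on your machine. The following is a minimal sketch that assumes the CRAN package names match the list above; the object names pkgs, installed and missing_pkgs are our own.

```r
# Packages named in the list above (assumed CRAN names).
pkgs <- c("zoo", "xts", "quantmod", "PerformanceAnalytics",
          "urca", "vars", "rugarch", "class")

# requireNamespace() returns FALSE rather than stopping when a package is absent.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]
missing_pkgs

# Uncomment to install whatever is missing (needs an internet connection).
# install.packages(missing_pkgs)
```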

Remember R

Exercise 1.1

Create a vector of the following dates:

1st Jan 2022
5th Jan 2022
10th Jan 2022
15th Jan 2022

Make sure that they are of the Date class. How do you check for that?

Exercise 1.2

Add some imaginary prices to each of the dates and calculate the return or percentage change for each of the prices.

zoo

zoo is a basic class that pairs a set of observations with an ordered index. The standard structure is:

zoo(x = NULL, order.by = index(x), frequency = NULL, calendar = getOption("zoo.calendar", TRUE))

For example,

require(zoo)

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

x.Date <- as.Date("2003-02-01") + c(1,3,7,8,14) 
x <- zoo(x = rnorm(5), order.by = x.Date)
plot(x)

time(x)

## [1] "2003-02-02" "2003-02-04" "2003-02-08" "2003-02-09" "2003-02-15"

str(x)

## 'zoo' series from 2003-02-02 to 2003-02-15
##   Data: num [1:5] -1.9679 -0.667 -0.0703 -1.0724 -0.7531
##   Index:  Date[1:5], format: "2003-02-02" "2003-02-04" "2003-02-08" "2003-02-09" "2003-02-15"

One feature of zoo that is useful for economics is the pair of index classes created with as.yearmon and as.yearqtr, which index data by month or by quarter. For example,

x <- as.yearqtr(2000 + seq(0, 7)/4)
plot(x, rnorm(8), type = 'l', xlab = "Dates")
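
The same two functions can also be applied to ordinary Date objects. This is a small sketch with made-up dates; the object names d, ym and yq are our own.

```r
library(zoo)

# Three made-up dates of class Date.
d <- as.Date(c("2022-01-15", "2022-04-30", "2022-11-01"))

ym <- as.yearmon(d)   # the month each date falls in
yq <- as.yearqtr(d)   # the quarter each date falls in

format(ym, "%Y-%m")   # "2022-01" "2022-04" "2022-11"
format(yq)            # "2022 Q1" "2022 Q2" "2022 Q4"

# Convert back to a Date: frac = 1 gives the last day of each month.
as.Date(ym, frac = 1) # "2022-01-31" "2022-04-30" "2022-11-30"
```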

Exercise 1.3

How does the sequence work here?

Applying functions over time

rollapply applies a function over a rolling window, so that the mean or max of the last 5 days, say, can be returned. For example,

x <- as.yearmon(2000 + seq(0, 23)/12)
da <- data.frame("Date" = x, "Return" = rnorm(24))
da$MA3 <- rollapply(da$Return, width = 3, FUN = mean, fill = NA)  # fill = NA replaces the deprecated na.pad = TRUE
head(da)

##       Date      Return       MA3
## 1 Jan 2000  0.99438170        NA
## 2 Feb 2000  0.08365364 0.3964216
## 3 Mar 2000  0.11122942 0.4624270
## 4 Apr 2000  1.19239807 0.1749535
## 5 May 2000 -0.77876700 0.2365017
## 6 Jun 2000  0.29587406 0.1715383
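
Any function that reduces a window to a single number can be used in place of mean. As a sketch on made-up prices (the names prices, roll_max and roll_mean are our own), here is a 5-observation rolling max next to a rolling mean. The window placement is rollapply's default, which Exercise 1.4 asks you to examine.

```r
library(zoo)

set.seed(1)
prices <- 100 + cumsum(rnorm(20))   # 20 made-up daily prices

# Same call as above, with a wider window and a different FUN.
roll_max  <- rollapply(prices, width = 5, FUN = max,  fill = NA)
roll_mean <- rollapply(prices, width = 5, FUN = mean, fill = NA)

head(data.frame(prices, roll_max, roll_mean), 8)
```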

Exercise 1.4

Check how the moving average is calculated.
How would we change that to something more conventional?

xts (extensible time series)

This is a very useful set of functions that we will use throughout the module. There are three components of an xts object:

coredata: this is a matrix for xts but could be a vector for zoo objects.
index: this can be a date or other chronological class
xtsAttributes: arbitrary attributes or other bits of information that we will mostly not use
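
The three components can be pulled out of any xts object with accessor functions of the same names. A minimal sketch on a made-up object (the attribute name source is purely illustrative):

```r
library(xts)

# A small made-up xts object: 3 dates, 2 columns.
x <- xts(x = matrix(1:6, ncol = 2, dimnames = list(NULL, c("a", "b"))),
         order.by = as.Date("2024-01-01") + 0:2)

# Attach an arbitrary, user-defined attribute.
xtsAttributes(x) <- list(source = "made-up example")

coredata(x)        # the underlying matrix, without the dates
index(x)           # the Date index
xtsAttributes(x)   # the list of user-defined attributes
```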

Creating xts objects

require(xts)
xts1 <- xts(x = 1:10, order.by = Sys.Date() + 1:10)
head(xts1)

##            [,1]
## 2026-01-30    1
## 2026-01-31    2
## 2026-02-01    3
## 2026-02-02    4
## 2026-02-03    5
## 2026-02-04    6

str(xts1)

## An xts object on 2026-01-30 / 2026-02-08 containing: 
##   Data:    integer [10, 1]
##   Index:   Date [10] (TZ: "UTC")

Or read data from a file:

da <- read.csv('../../Data/SPYTLT.csv')
dax <- xts(x = da[, -1], order.by = as.Date(da$Date, format = "%Y-%m-%d"))
head(dax)

##                 TLT      SPY
## 2002-07-30 39.85169 60.46203
## 2002-07-31 40.34540 60.60829
## 2002-08-01 40.57519 59.02594
## 2002-08-02 40.99071 57.70287
## 2002-08-05 41.17163 55.69501
## 2002-08-06 40.81963 57.56989

str(dax)

## An xts object on 2002-07-30 / 2023-12-22 containing: 
##   Data:    double [5389, 2]
##   Columns: TLT, SPY
##   Index:   Date [5389] (TZ: "UTC")

We remove the date column when bringing the da object into xts because the core data of an xts object is a matrix, and all elements of a matrix must be of one type. The dates are read in as characters, so keeping them would force the numbers to be characters as well (which we do not want).
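
The coercion is easy to see in base R. This sketch uses a two-row, made-up data frame (da_small is our own name, and the prices are rounded for illustration) rather than the SPYTLT file:

```r
# Two made-up rows with the same layout as the CSV: a character Date column
# followed by numeric price columns.
da_small <- data.frame(Date = c("2002-07-30", "2002-07-31"),
                       TLT  = c(39.85, 40.35),
                       SPY  = c(60.46, 60.61))

m_all  <- as.matrix(da_small)         # Date column kept
m_nums <- as.matrix(da_small[, -1])   # Date column dropped

typeof(m_all)    # "character": the prices have been turned into text
typeof(m_nums)   # "double": the prices are still numbers
```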

Exercise 1.5

Think about where you are going to store the data and how you are going to identify it. Think of a project structure.

Useful functions in xts

We can find the last observation of each time period with the endpoints function. For example, to get the last observation of each week (usually a Friday):

ep1 <- endpoints(dax, on = "weeks")
head(dax[ep1])

##                 TLT      SPY
## 2002-08-02 40.99071 57.70287
## 2002-08-09 41.49423 60.69472
## 2002-08-16 41.58222 61.97788
## 2002-08-23 41.99285 62.89536
## 2002-08-30 42.56974 61.02049
## 2002-09-06 42.93941 59.83707

If you leave out the on argument, endpoints will default to the end of each month. It is possible to use everything from milliseconds to years. See the documentation (?endpoints) for full details.

ep2 <- endpoints(dax) 
head(dax[ep2])

##                 TLT      SPY
## 2002-07-31 40.34540 60.60829
## 2002-08-30 42.56974 61.02049
## 2002-09-30 44.38285 54.62237
## 2002-10-31 42.74318 59.11691
## 2002-11-29 42.35160 62.76328
## 2002-12-31 44.26818 59.21279
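
What endpoints returns is a vector of row numbers, starting with 0. The same vector can be passed to period.apply to summarise each period. A small sketch on a made-up daily series (daily and ep are our own names):

```r
library(xts)

# Made-up daily series covering Jan to Mar 2024 (31 + 29 + 31 = 91 days).
daily <- xts(x = 1:91, order.by = as.Date("2024-01-01") + 0:90)

ep <- endpoints(daily, on = "months")
ep   # 0 31 60 91: the row number of the last observation in each month

# Apply a function to each block of rows between consecutive end points.
period.apply(daily, INDEX = ep, FUN = mean)   # monthly means: 16, 46, 76
```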

Missing values for xts

There are a number of ways that xts can deal with NA or missing values.

na.omit(): drop the rows that contain missing values.
na.locf(): fill missing values with the last observation.
na.locf(x = x, fromLast = TRUE): fill missing values with the next observation.
na.approx(): interpolate NAs using linear approximation.

For example,

require(zoo)
set.seed(123)
myseq <- c(rep(NA, 5), rnorm(15))
mydataframe <- data.frame("A" = myseq, 
                          "B" = na.locf(myseq, 
                                        na.rm = FALSE, 
                                        fromLast = TRUE))
mydataframe

##              A           B
## 1           NA -0.56047565
## 2           NA -0.56047565
## 3           NA -0.56047565
## 4           NA -0.56047565
## 5           NA -0.56047565
## 6  -0.56047565 -0.56047565
## 7  -0.23017749 -0.23017749
## 8   1.55870831  1.55870831
## 9   0.07050839  0.07050839
## 10  0.12928774  0.12928774
## 11  1.71506499  1.71506499
## 12  0.46091621  0.46091621
## 13 -1.26506123 -1.26506123
## 14 -0.68685285 -0.68685285
## 15 -0.44566197 -0.44566197
## 16  1.22408180  1.22408180
## 17  0.35981383  0.35981383
## 18  0.40077145  0.40077145
## 19  0.11068272  0.11068272
## 20 -0.55584113 -0.55584113

z <- c(1, 2, NA, 4, 5, 6)
na.approx(z)

## [1] 1 2 3 4 5 6

z <- c(1, 5, NA, NA, 15, 20)
na.approx(z)

## [1]  1.000000  5.000000  8.333333 11.666667 15.000000 20.000000
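
The first two functions in the list, which are not shown above, can be sketched on a short made-up vector. Note that na.locf drops leading NAs unless na.rm = FALSE is set, which is why that argument appears in the data frame example.

```r
library(zoo)

z <- c(NA, 1, NA, NA, 4, NA)   # made-up values

na.omit(z)                  # keeps only the observed values: 1 4
na.locf(z)                  # carries the last value forward and drops the leading NA: 1 1 1 4 4
na.locf(z, na.rm = FALSE)   # same fill, but keeps the original length: NA 1 1 1 4 4
```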

Exercise 1.6

What is the calculation for that interpolation?

Appendix

This is a list of some functions and ideas that you will likely come across.

POSIXct and POSIXlt. These are two classes for date-times. POSIXct works like the dates we have been using already, but adds a time of day and a time zone (tz = "UTC" is the default for us). Internally, it is stored as the number of seconds since the beginning of 1970. POSIXlt stores the same information as a list of components, including the day of the week, so that we can remove weekends if necessary.

An example of the use of POSIXct to reduce a timestamp by two hours:

mytime <- as.POSIXct("2024-01-30 10:00:00") # create a time object
newtime <- mytime - (2 * 3600)
mytime

## [1] "2024-01-30 10:00:00 GMT"

newtime

## [1] "2024-01-30 08:00:00 GMT"
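
The POSIXlt list can be used in the same spirit. This is a sketch with made-up timestamps (stamps, wday and weekdays_only are our own names), with the time zone set explicitly to UTC:

```r
# Four made-up timestamps: Friday 26 to Monday 29 January 2024.
stamps <- as.POSIXct(c("2024-01-26 10:00:00", "2024-01-27 10:00:00",
                       "2024-01-28 10:00:00", "2024-01-29 10:00:00"),
                     tz = "UTC")

# POSIXct is a count of seconds since the start of 1970.
as.numeric(as.POSIXct("1970-01-02", tz = "UTC"))   # 86400, one day of seconds

# POSIXlt exposes the components; wday runs from 0 (Sunday) to 6 (Saturday).
wday <- as.POSIXlt(stamps)$wday
wday   # 5 6 0 1

# Keep only Monday to Friday.
weekdays_only <- stamps[!(wday %in% c(0, 6))]
weekdays_only
```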

Key quant players

These are key quantitative investors and firms whose research reports can be mined for ideas and explanations.

Rob Arnott: Research Affiliates
Cliff Asness: AQR
Jane Street
Man Group
AQR Capital Management