This is the introduction to Quantitative Trading and Investment with R.
Even though we are concentrating on investment and trading, most of the ideas presented here are relevant to other areas of finance, economics and data analysis. We will apply cointegration, GARCH and some other standard statistical tools to solve investment problems. We could use the same techniques to solve marketing, logistics or climate problems. We are using base R, but what we are doing is easy to translate into the tidyverse, Python or any other language.
The module has the following structure:
We will be using a number of key packages for this module:
zoo: provides methods for ordered indexed observations. It is particularly aimed at irregular time series.
xts: extends zoo to provide additional time series capabilities.
quantmod: will allow you to specify, build, trade and analyse quantitative financial trading strategies.
PerformanceAnalytics: is a collection of econometric functions for performance and risk analysis.
urca: is used for testing whether time series are stationary and for cointegration.
vars: is used for Vector Autoregression modelling.
rugarch: will fit a variety of GARCH models.
Create a vector of the following dates:
Make sure that they are of the Date class. How do you
check for that?
Add some imaginary prices to each of the dates and calculate the return or percentage change for each of the prices.
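A minimal base R sketch of this exercise. The dates and prices below are made up for illustration (they are not the exercise's own list); `inherits()` is one way to check the class, and the simple return is taken as price_t / price_{t-1} - 1.

```r
# Hypothetical dates and prices, for illustration only
dates <- as.Date(c("2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"))
class(dates)              # "Date"
inherits(dates, "Date")   # TRUE when the vector has the Date class

prices <- c(100, 102, 101, 104)

# Simple return p_t / p_{t-1} - 1; the first date has no previous price, so NA
returns <- c(NA, diff(prices) / head(prices, -1))
pct_change <- 100 * returns

data.frame(Date = dates, Price = prices, Return = returns, PctChange = pct_change)
```

Padding with a leading NA keeps the returns the same length as the dates, so they can sit in the same data frame.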
The zoo class pairs data (a vector or matrix) with an index that orders the observations. The constructor is:
zoo(x = NULL, order.by = index(x), frequency = NULL, calendar = getOption("zoo.calendar", TRUE))
For example,
require(zoo)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
x.Date <- as.Date("2003-02-01") + c(1,3,7,8,14)
x <- zoo(x = rnorm(5), order.by = x.Date)
plot(x)
time(x)
## [1] "2003-02-02" "2003-02-04" "2003-02-08" "2003-02-09" "2003-02-15"
str(x)
## 'zoo' series from 2003-02-02 to 2003-02-15
## Data: num [1:5] 1.413 -1.477 1.237 -0.432 1.358
## Index: Date[1:5], format: "2003-02-02" "2003-02-04" "2003-02-08" "2003-02-09" "2003-02-15"
Two zoo classes that are useful in economics are yearmon and yearqtr, created with
as.yearmon() and as.yearqtr(), which represent monthly and
quarterly dates. For example:
x <- as.yearqtr(2000 + seq(0, 7)/4)
plot(x, rnorm(8), type = 'l', xlab = "Dates")
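One common use of these classes is to aggregate daily data to a monthly or quarterly frequency. A sketch with hypothetical daily values, passing as.yearmon / as.yearqtr to aggregate() so that the function is applied to the index to form the groups:

```r
library(zoo)

# 91 hypothetical daily observations from 1 Jan to 31 Mar 2024 (values 1..91)
d <- seq(as.Date("2024-01-01"), as.Date("2024-03-31"), by = "day")
z <- zoo(seq_along(d), order.by = d)

# as.yearmon / as.yearqtr are applied to each date to define the groups
monthly   <- aggregate(z, as.yearmon, mean)
quarterly <- aggregate(z, as.yearqtr, mean)

monthly    # one value per month, indexed by yearmon
quarterly  # one value for 2024 Q1, indexed by yearqtr
```

Swapping mean for a function that returns the last value in each group would give month-end or quarter-end observations instead of averages.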
rollapply() applies a function over a rolling window, so that,
for instance, the mean or maximum of the last 5 days can be returned
at each date. For example:
x <- as.yearmon(2000 + seq(0, 23)/12)
da <- data.frame("Date" = x, "Return" = rnorm(24))
da$MA3 <- rollapply(da$Return, width = 3, FUN = mean, fill = NA)
head(da)
## Date Return MA3
## 1 Jan 2000 -0.2948665 NA
## 2 Feb 2000 -0.3767053 -0.1429659
## 3 Mar 2000 0.2426741 -0.4039392
## 4 Apr 2000 -1.0777864 -0.5028438
## 5 May 2000 -0.6734193 -0.8818577
## 6 Jun 2000 -0.8943674 -0.4376195
Check how the moving average is calculated.
How would we change that to something more conventional?
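By default rollapply() centres the window (align = "center"), which is why the Feb 2000 value in the output above averages Jan, Feb and Mar. A sketch on a toy vector comparing that with a trailing window (align = "right"), which only uses information available at each date and so avoids look-ahead bias in a trading rule:

```r
library(zoo)

x <- c(1, 2, 3, 4, 5, 6)

# Default align = "center": the window at position t is t-1, t, t+1
centred <- rollapply(x, width = 3, FUN = mean, fill = NA)

# align = "right": the window at position t is t-2, t-1, t (trailing average)
trailing <- rollapply(x, width = 3, FUN = mean, fill = NA, align = "right")

# rollmeanr() is zoo's shortcut for a right-aligned rolling mean
trailing2 <- rollmeanr(x, k = 3, fill = NA)

rbind(centred, trailing, trailing2)
```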
xts is a very useful package that we will use throughout
the module. An xts object has three
components:
coredata: this is a matrix for xts but could be
a vector for zoo objects.
index: this can be a date or other chronological class
xtsAttributes: arbitrary attributes
require(xts)
xts1 <- xts(x = 1:10, order.by = Sys.Date() + 1:10)
head(xts1)
## [,1]
## 2024-02-03 1
## 2024-02-04 2
## 2024-02-05 3
## 2024-02-06 4
## 2024-02-07 5
## 2024-02-08 6
str(xts1)
## An xts object on 2024-02-03 / 2024-02-12 containing:
## Data: integer [10, 1]
## Index: Date [10] (TZ: "UTC")
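The three components listed above can each be accessed directly. A small sketch with a hypothetical three-day price series (the name px and the attribute source are illustrative only), also showing subsetting by ISO-8601 date strings:

```r
library(xts)

# Hypothetical prices on a fixed Date index
px <- xts(x = c(100, 101, 99),
          order.by = as.Date(c("2024-01-02", "2024-01-03", "2024-01-04")))
colnames(px) <- "price"

coredata(px)   # the underlying 3 x 1 matrix
index(px)      # the Date index

# Arbitrary user-defined attributes stored with the object
xtsAttributes(px) <- list(source = "hypothetical example")
xtsAttributes(px)

# Subsetting with date strings
px["2024-01-03"]    # a single day
px["2024-01-03/"]   # from 3 Jan onwards
```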
Alternatively, we can read data from a CSV file and convert it to xts:
da <- read.csv('../../Data/SPYTLT.csv')
dax <- xts(x = da[, -1], order.by = as.Date(da$Date, format = "%Y-%m-%d"))
head(dax)
## TLT SPY
## 2002-07-30 39.85169 60.46203
## 2002-07-31 40.34540 60.60829
## 2002-08-01 40.57519 59.02594
## 2002-08-02 40.99071 57.70287
## 2002-08-05 41.17163 55.69501
## 2002-08-06 40.81963 57.56989
str(dax)
## An xts object on 2002-07-30 / 2023-12-22 containing:
## Data: double [5389, 2]
## Columns: TLT, SPY
## Index: Date [5389] (TZ: "UTC")
We drop the Date column (da[, -1]) when converting the da object into xts because
an xts object is a matrix and can therefore hold only one class. The
dates are characters, and including them would force the prices to be characters
too (which we do not want).
We can take the end points of time periods with the
endpoints function. For example, to get the last trading day of each week (usually
Friday):
ep1 <- endpoints(dax, on = "weeks")
head(dax[ep1])
## TLT SPY
## 2002-08-02 40.99071 57.70287
## 2002-08-09 41.49423 60.69472
## 2002-08-16 41.58222 61.97788
## 2002-08-23 41.99285 62.89536
## 2002-08-30 42.56974 61.02049
## 2002-09-06 42.93941 59.83707
If you leave out the on argument, endpoints defaults to the
end of each month. It is possible to use anything from
milliseconds to years; see the documentation
(?endpoints) for full details.
ep2 <- endpoints(dax)
head(dax[ep2])
## TLT SPY
## 2002-07-31 40.34540 60.60829
## 2002-08-30 42.56974 61.02049
## 2002-09-30 44.38285 54.62237
## 2002-10-31 42.74318 59.11691
## 2002-11-29 42.35160 62.76328
## 2002-12-31 44.26818 59.21279
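The endpoints can also drive per-period calculations. A sketch on hypothetical daily prices (values 1 to 91 over Q1 2024) that takes month-end prices, monthly averages via period.apply(), and simple monthly returns from the month-end prices:

```r
library(xts)

# Hypothetical daily prices 1..91 from 1 Jan to 31 Mar 2024
d <- seq(as.Date("2024-01-01"), as.Date("2024-03-31"), by = "day")
px <- xts(seq_along(d), order.by = d)

ep <- endpoints(px, on = "months")   # 0 followed by the row number of each month end
ep

month_end <- px[ep]                                    # last price in each month
month_avg <- period.apply(px, INDEX = ep, FUN = mean)  # average within each month

# Simple monthly return p_t / p_{t-1} - 1; lag() aligns the previous month end
month_ret <- month_end / lag(month_end, k = 1) - 1
month_ret
```

The leading 0 in ep is what lets period.apply() treat the first month as running from row 1.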
There are a number of ways that xts (through zoo) can deal with
NA or missing values:
na.omit(): removes the rows that contain NAs.
na.locf(): fills each missing value with the last observation carried forward.
na.locf(x = x, fromLast = TRUE): fills missing values with the next observation.
na.approx(): interpolates NAs using linear approximation.
For example:
require(zoo)
set.seed(123)
myseq <- c(rep(NA, 5), rnorm(15))
mydataframe <- data.frame("A" = myseq, "B" = na.locf(myseq, na.rm = FALSE,
fromLast = TRUE))
mydataframe
## A B
## 1 NA -0.56047565
## 2 NA -0.56047565
## 3 NA -0.56047565
## 4 NA -0.56047565
## 5 NA -0.56047565
## 6 -0.56047565 -0.56047565
## 7 -0.23017749 -0.23017749
## 8 1.55870831 1.55870831
## 9 0.07050839 0.07050839
## 10 0.12928774 0.12928774
## 11 1.71506499 1.71506499
## 12 0.46091621 0.46091621
## 13 -1.26506123 -1.26506123
## 14 -0.68685285 -0.68685285
## 15 -0.44566197 -0.44566197
## 16 1.22408180 1.22408180
## 17 0.35981383 0.35981383
## 18 0.40077145 0.40077145
## 19 0.11068272 0.11068272
## 20 -0.55584113 -0.55584113
Alternatively, na.approx() fills the gaps by linear interpolation:
z <- c(1, 2, NA, 4, 5, 6)
na.approx(z)
## [1] 1 2 3 4 5 6
z <- c(1, 5, NA, NA, 15, 20)
na.approx(z)
## [1] 1.000000 5.000000 8.333333 11.666667 15.000000 20.000000
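On a plain vector, na.approx() treats the points as equally spaced. On a zoo or xts object it interpolates against the time index instead, which matters for irregular data. A sketch with three hypothetical, unevenly spaced observations, compared with na.locf() and na.omit():

```r
library(zoo)

# Irregularly spaced hypothetical observations with one missing value
d <- as.Date(c("2024-01-01", "2024-01-02", "2024-01-10"))
z <- zoo(c(1, NA, 10), order.by = d)

# Interpolating against the dates: 2 Jan is 1/9 of the way from 1 Jan to
# 10 Jan, so the fill is 1 + (10 - 1) * 1/9 = 2
filled_by_time <- na.approx(z)

# Interpolating by position: the gap is treated as halfway, (1 + 10) / 2 = 5.5
filled_by_position <- na.approx(c(1, NA, 10))

# For comparison: carry the last value forward, or drop the incomplete row
carried <- na.locf(z)
dropped <- na.omit(z)
```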
This is a list of some functions and ideas that you will likely come across.
tz = UTC is the default for us.
Internally, POSIXct is stored as the number of seconds since the beginning of 1970. The
second version, POSIXlt, holds a list of components that includes the day of the week, so that we
can remove weekends if necessary.
An example of using POSIXct to subtract two hours from a timestamp:
mytime <- as.POSIXct("2024-01-30 10:00:00") # create a time object
newtime <- mytime - (2 * 3600)
mytime
## [1] "2024-01-30 10:00:00 GMT"
newtime
## [1] "2024-01-30 08:00:00 GMT"
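A base R sketch of both representations, using four hypothetical UTC timestamps from Friday 26 to Monday 29 January 2024: the seconds count behind POSIXct, the $wday component of POSIXlt used to drop weekends, and as.difftime() as an alternative to multiplying by 3600:

```r
# Hypothetical timestamps, Friday 26 Jan to Monday 29 Jan 2024
times <- as.POSIXct(c("2024-01-26 10:00:00", "2024-01-27 10:00:00",
                      "2024-01-28 10:00:00", "2024-01-29 10:00:00"), tz = "UTC")

# POSIXct counts seconds since 1970-01-01 00:00:00 UTC
as.numeric(as.POSIXct("1970-01-02 00:00:00", tz = "UTC"))  # 86400, one day

# POSIXlt splits each time into components; $wday runs from 0 (Sunday) to 6 (Saturday)
lt <- as.POSIXlt(times)
lt$wday

# Keep weekdays only by dropping Saturdays (6) and Sundays (0)
weekday_times <- times[!(lt$wday %in% c(0, 6))]
weekday_times

# Subtract two hours with a difftime rather than 2 * 3600 seconds
times[1] - as.difftime(2, units = "hours")
```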
Key quant players to keep an eye on
Rob Arnott: Research Affiliates
Cliff Asness: AQR