Equity Backtesting

Building the Backtest Environment



Project Description

If you are an equity investor, one of the most useful things you can do is backtest your investment strategy against historical data. Backtesting provides insight into the success, complexity, and profitability of the strategy. Every strategy changes as you identify the strong and weak points of the system, and strategy models may drift over time depending on market conditions, risk acceptance, or desired returns. Testing each variation can be very time-consuming when the data has to be gathered again for every run.

To combat that issue, I created a local equities backtest environment to run my strategies against. With a local environment, I can run as many tests with as many variations as I like without waiting for my data to be retrieved from the financial sites. This lets me refine strategies to optimize returns with immediate feedback, saving many hours of program run time.


This project will build a 10-year backtest environment using data from the NYSE and NASDAQ exchanges for the years 2007-2016.

The main package being used is quantmod, the Quantitative Financial Modelling Framework.

Source: https://cran.r-project.org/web/packages/quantmod/index.html


Libraries Required

library(quantmod) # Quantitative financial strategies
library(dplyr)    # Data manipulation
library(knitr)    # Dynamic report generation

Working directory

setwd("U:/Equity Backtesting")

Prepare the Environment

In order to build the backtest environment, a few preparatory steps are required. Each is covered in turn below.

Create Target Backtest Environment

bt_env_2007_2016 <- new.env()

The backtest environment will be the target for the historical pricing data that will be retrieved. The environment will be saved and then used later by the backtesting models.
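As a minimal sketch of how an environment acts as a named container, the following uses base R only. The names demo_env, AAA, and BBB and the prices are made up for illustration; they are not part of the project.

```r
# Toy illustration: an environment used as a named container for data.
demo_env <- new.env()

# Store two small data frames under symbol-like names (made-up values).
assign("AAA", data.frame(Close = c(10.0, 10.5)), envir = demo_env)
assign("BBB", data.frame(Close = c(20.0, 19.5)), envir = demo_env)

# List what the environment holds, then pull one object back out by name.
loaded_names <- sort(ls(demo_env))
aaa_prices <- get("AAA", envir = demo_env)

# Check that the object was not created in the global environment.
leaked <- exists("AAA", envir = globalenv(), inherits = FALSE)
```

Keeping the price objects out of the global environment means thousands of symbols can be stored, listed, and saved as a single unit.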

Create Symbol Array and Date Range Variables

bt_symbols <- character()

Beg_Date <- as.Date("2007-01-01")
End_Date <- as.Date("2016-12-31")

The date range is January 1, 2007 through December 31, 2016. The symbol array, bt_symbols, will hold all of the good symbols that have their historical pricing data successfully retrieved.


Using the quantmod function stockSymbols, we can retrieve all of the symbols listed from the specified exchanges. I have selected NYSE and NASDAQ. AMEX is another option if you wish to evaluate equities listed there.

Retrieve Symbols from NYSE and NASDAQ

all_symbols <- stockSymbols(c("NYSE", "NASDAQ"))
Symbol | Name | LastSale | MarketCap | IPOyear | Sector | Industry | Exchange
AAAP | Advanced Accelerator Applications S.A. | 38.330 | $1.69B | 2015 | Health Care | Major Pharmaceuticals | NASDAQ
AAL | American Airlines Group, Inc. | 47.960 | $23.62B | NA | Transportation | Air Freight/Delivery Services | NASDAQ
AAME | Atlantic American Corporation | 3.800 | $77.58M | NA | Finance | Life Insurance | NASDAQ
AAOI | Applied Optoelectronics, Inc. | 70.490 | $1.34B | 2013 | Technology | Semiconductors | NASDAQ
AAON | AAON, Inc. | 36.150 | $1.9B | NA | Capital Goods | Industrial Machinery/Components | NASDAQ
AAPC | Atlantic Alliance Partnership Corp. | 10.300 | $37.98M | 2015 | Consumer Services | Services-Misc. Amusement & Recreation | NASDAQ
AAPL | Apple Inc. | 153.670 | $801.21B | 1980 | Technology | Computer Manufacturing | NASDAQ
AAWW | Atlas Air Worldwide Holdings | 49.025 | $1.24B | NA | Transportation | Transportation Services | NASDAQ
AAXJ | iShares MSCI All Country Asia ex Japan Index Fund | 67.050 | $3.38B | NA | NA | NA | NASDAQ
AAXN | Axon Enterprise, Inc. | 24.580 | $1.3B | NA | Capital Goods | Ordnance And Accessories | NASDAQ
## [1] "Total Symbols Retrieved:  6375"

The symbol output includes index funds and other securities that are not publicly traded stocks. We will be using the retrieved symbol names in a processing loop to create the backtest environment, so we want to remove any symbols that we are not interested in retrieving data for. To remove the index funds and other non-stock symbols, I filter by complete.cases. That removes any symbol that does not have a Sector or Industry associated with it, as can be seen with the symbol AAXJ above.

For the purposes of creating the backtest environment, the variables LastSale, MarketCap, and IPOyear are not needed. If your backtest strategy focused on market cap size you might want to keep that column.

Part of an investment strategy I have is to look at stocks that are undervalued within their Sector or Industry, so I want to keep those columns as part of my final symbol table.
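One detail about filtering with complete.cases: IPOyear is NA for many ordinary stocks (AAL above, for example), so the unneeded columns have to be dropped before the filter is applied. The sketch below illustrates this with a made-up three-row table. It uses base R subsetting instead of dplyr so that it runs on its own, and the symbols STK1, STK2, and FND1 are hypothetical.

```r
# Toy symbol table with made-up rows: one stock with no IPO year,
# one stock with every field, and one fund with no Sector or Industry.
toy_symbols <- data.frame(
  Symbol   = c("STK1", "STK2", "FND1"),
  IPOyear  = c(NA, 2013, NA),
  Sector   = c("Transportation", "Technology", NA),
  Industry = c("Transportation Services", "Semiconductors", NA),
  stringsAsFactors = FALSE
)

# Filtering on every column also drops STK1, whose only gap is IPOyear.
filtered_all_cols <- toy_symbols[complete.cases(toy_symbols), ]

# Dropping IPOyear first means only the fund (no Sector/Industry) is removed.
kept_cols <- toy_symbols[, c("Symbol", "Sector", "Industry")]
filtered_after_select <- kept_cols[complete.cases(kept_cols), ]
```

In the toy data, filtering first would keep only STK2, while selecting first keeps both stocks and removes only the fund.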

Filter Symbols for All Valid Sectors and Industries

all_symbols <- all_symbols %>%
               select(Symbol, Sector, Industry, Name, Exchange) %>%
               filter(complete.cases(.))
Symbol | Sector | Industry | Name | Exchange
AAAP | Health Care | Major Pharmaceuticals | Advanced Accelerator Applications S.A. | NASDAQ
AAL | Transportation | Air Freight/Delivery Services | American Airlines Group, Inc. | NASDAQ
AAME | Finance | Life Insurance | Atlantic American Corporation | NASDAQ
AAOI | Technology | Semiconductors | Applied Optoelectronics, Inc. | NASDAQ
AAON | Capital Goods | Industrial Machinery/Components | AAON, Inc. | NASDAQ
AAPC | Consumer Services | Services-Misc. Amusement & Recreation | Atlantic Alliance Partnership Corp. | NASDAQ
AAPL | Technology | Computer Manufacturing | Apple Inc. | NASDAQ
AAWW | Transportation | Transportation Services | Atlas Air Worldwide Holdings | NASDAQ
AAXN | Capital Goods | Ordnance And Accessories | Axon Enterprise, Inc. | NASDAQ
ABAC | Consumer Non-Durables | Farming/Seeds/Milling | Aoxin Tianli Group, Inc. | NASDAQ
## [1] "Total Symbols Remaining:  4965"

Now that we have a partially cleaned set of symbols, we can store the symbol names that we will be using in our data retrieval. We create a temporary symbol array for use in our data gathering loop which will use the quantmod function getSymbols to retrieve the historical data.

Create Working Symbol List

symbol_list <- all_symbols$Symbol

Load the Backtest Environment

The function getSymbols will be used to retrieve the historical data for the stock symbols in symbol_list.

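Within the function call we will specify the following: the symbol to retrieve, the target environment (env = bt_env_2007_2016), auto.assign = TRUE so that the data is assigned into that environment, the data source (src = "yahoo"), and the date range (from = Beg_Date, to = End_Date).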

By specifying the backtest environment, the historical data is stored in its own unique environment, as opposed to the global environment. The backtest environment can then be saved locally and used during the backtest strategy analysis. The goal is to perform this build once and perform all testing and analysis on a locally saved environment.

I mentioned earlier that we now have a partially cleaned symbol table. When retrieving the historical data, not all symbols will have a successful lookup, because the remaining symbol list still contains preferred and other stock classes. Calling getSymbols with these symbols produces an error (HTTP status was ‘400 Bad Request’). For that reason, I employ the R function try. This function lets me attempt a function call or evaluate an expression and inspect the result of the attempt instead of halting on an error. A successful call to getSymbols returns the name of the symbol used in the call.

By performing the function call using try and placing the results in get_try, I can make sure the call is successful and retrieves the data requested. Within the processing loop I check the value of get_try against the symbol name. If they are equal, I can consider the call successful and the data has been loaded to the backtest environment. This process allows me to create a single processing loop to load all of the good symbols without having to restart processing when an error is encountered.
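The try pattern can be sketched without any network access. In the example below, fake_fetch is a hypothetical stand-in for getSymbols, and the symbol names are made up. It only illustrates how a failed call becomes a "try-error" object that the loop can test for.

```r
# Hypothetical stand-in for getSymbols(): returns the symbol name on
# success and raises an error for names starting with "BAD".
fake_fetch <- function(symbol) {
  if (grepl("^BAD", symbol)) stop("HTTP status was '400 Bad Request'")
  symbol
}

candidates <- c("AAA", "BAD1", "BBB")
good <- character()

for (sym in candidates) {
  # silent = TRUE keeps the error message from being printed
  result <- try(fake_fetch(sym), silent = TRUE)
  # A failed call yields an object of class "try-error" instead of stopping the loop.
  if (!inherits(result, "try-error") && result == sym) {
    good <- c(good, sym)
  }
}
```

After the loop, good holds only the names whose call succeeded, which is the role bt_symbols plays in the real processing loop.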

There is another type of error that can occur: a timeout when retrieving data. The online sources that supply the data (Yahoo, Google) restrict or limit the number of calls allowed within a certain timeframe. To get around this issue, I place a sleep delay of 0.5 seconds within the loop. Since I’m not worried about the speed of the processing, the delay is not an issue. I’m more concerned with completing the processing without errors or requiring a restart of the loop. The total time to complete the backtest environment will be around 2 hours.
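A fixed delay reduces the chance of a timeout but does not recover from one. One possible variation, which is not what this project does, is to retry a failed call after waiting. The helper retry_call below is hypothetical, and flaky is a toy function that simulates two timeouts before succeeding.

```r
# Hypothetical helper: call f(), and on error wait `wait` seconds and try
# again, giving up after `attempts` tries. Returns NULL if every try fails.
retry_call <- function(f, attempts = 3, wait = 0.5) {
  for (k in seq_len(attempts)) {
    result <- try(f(), silent = TRUE)
    if (!inherits(result, "try-error")) return(result)
    if (k < attempts) Sys.sleep(wait)
  }
  NULL
}

# Toy function that fails twice (simulating a timeout) and then succeeds.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("timeout")
  "ok"
}

# wait = 0 keeps the demonstration fast; a real run would use a longer pause.
outcome <- retry_call(flaky, attempts = 3, wait = 0)
```

The trade-off is a longer run time for symbols that can never be retrieved, since each one is attempted several times before being skipped.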

When a successful call is made, I update the bt_symbols array with the processed symbol. I will use this array to filter all_symbols at the end to have my final cleaned environmental symbol table.

The processing loop requires two indexes: i, which steps through symbol_list, and j, which tracks the next position to fill in bt_symbols.


For this example, the source for the data retrieval will be Yahoo. It is important to note that the code below will only work with quantmod version 0.4-9.

Load Summary Environment

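j <- 1  # next open position in bt_symbols

for (i in seq_along(symbol_list)) {

  # try() captures any retrieval error so the loop keeps running
  get_try <- try(getSymbols(symbol_list[i], env = bt_env_2007_2016, auto.assign = TRUE,
                            src = "yahoo", from = Beg_Date, to = End_Date))

  # On success getSymbols returns the symbol name; on failure get_try is a "try-error"
  if (!inherits(get_try, "try-error") && get_try == symbol_list[i]) {

    bt_symbols[j] <- symbol_list[i]

    j <- j + 1

  }

  # Pause between requests to avoid being throttled by the data source
  Sys.sleep(0.5)

}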

The contents of the environment can be checked to validate that the historical data has been properly loaded. The following example lists only the loaded symbols that begin with “AA” rather than listing all of them. If you want to check for all of the expected symbols, just remove the pattern argument from the command.

Verify Environment Contents

ls(bt_env_2007_2016, pattern = "^AA")
##  [1] "AA"   "AAAP" "AAC"  "AAL"  "AAME" "AAN"  "AAOI" "AAON" "AAP"  "AAPC"
## [11] "AAPL" "AAT"  "AAV"  "AAWW"
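Listing names confirms that objects exist but not that they contain data. A minimal sketch of one further check, assuming it is enough to count rows per object, uses eapply. The environment check_env and its matrices are made-up stand-ins for the price series that getSymbols stores.

```r
# Toy environment with stand-in objects; in the real environment each
# object would be the price series that getSymbols() stored.
check_env <- new.env()
assign("AAA", matrix(1:10, nrow = 5), envir = check_env)
assign("BBB", matrix(1:6,  nrow = 3), envir = check_env)
assign("CCC", matrix(numeric(0), nrow = 0, ncol = 2), envir = check_env)

# Count rows per object, then flag any symbol that loaded with no data.
row_counts <- unlist(eapply(check_env, nrow))
empty_symbols <- names(row_counts)[row_counts == 0]
```

The same call against bt_env_2007_2016 would also show which symbols cover only part of the date range, since those would have fewer rows than the rest.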

Once verified, the environment can be saved as an .Rdata file. This is the environment file that will be loaded during the backtesting projects.

Save Summary Environment

save(bt_env_2007_2016, file = "Environments/bt_env_2007_2016.Rdata")
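For reference, here is a sketch of the round trip a later project would perform. It uses a toy environment and a temporary file so that it runs anywhere; toy_env, AAA, and the prices are made up, and the real project would load the saved file from its own path.

```r
# Toy environment standing in for bt_env_2007_2016.
toy_env <- new.env()
assign("AAA", data.frame(Close = c(10.0, 10.5)), envir = toy_env)

# Save to a temporary file here; the real project uses a fixed path.
env_file <- tempfile(fileext = ".Rdata")
save(toy_env, file = env_file)

# Simulate a later session: remove the object, then load it back.
# load() restores the object under its saved name and returns that name.
rm(toy_env)
restored_name <- load(env_file)
restored_env <- get(restored_name)
aaa <- get("AAA", envir = restored_env)
```

Note that save writes to an existing directory only, so the Environments folder must be created before the save call above is run.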

The final step is to create a subset of all_symbols that reflects the stocks that are in the backtest environment.

Create Backtest Symbol Table

bt_env_symbols <- all_symbols %>%
                  filter(Symbol %in% bt_symbols)
Symbol | Sector | Industry | Name | Exchange
AAAP | Health Care | Major Pharmaceuticals | Advanced Accelerator Applications S.A. | NASDAQ
AAL | Transportation | Air Freight/Delivery Services | American Airlines Group, Inc. | NASDAQ
AAME | Finance | Life Insurance | Atlantic American Corporation | NASDAQ
AAOI | Technology | Semiconductors | Applied Optoelectronics, Inc. | NASDAQ
AAON | Capital Goods | Industrial Machinery/Components | AAON, Inc. | NASDAQ
AAPC | Consumer Services | Services-Misc. Amusement & Recreation | Atlantic Alliance Partnership Corp. | NASDAQ
AAPL | Technology | Computer Manufacturing | Apple Inc. | NASDAQ
AAWW | Transportation | Transportation Services | Atlas Air Worldwide Holdings | NASDAQ
ABAC | Consumer Non-Durables | Farming/Seeds/Milling | Aoxin Tianli Group, Inc. | NASDAQ
ABAX | Capital Goods | Industrial Machinery/Components | ABAXIS, Inc. | NASDAQ
## [1] "Total Backtest Symbols:  4667"

Save Backtest Symbol Table

saveRDS(bt_env_symbols, "Symbols/bt_env_symbols.rds")

Summary

We now have a complete local environment that can be loaded in subsequent backtesting projects without requiring time-consuming external calls to Yahoo or Google. This allows for quick changes to models and consistent testing conditions. By combining the pricing data in the environment with the symbol table, you can test models and compare model results across specific sectors and industries.
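As a rough sketch of how the two pieces fit together, the example below selects the symbols for one sector from a toy symbol table and pulls their data from a toy environment with mget. All names and values here are made up; in practice the table would be bt_env_symbols and the environment bt_env_2007_2016.

```r
# Toy symbol table and environment with made-up names and values.
toy_table <- data.frame(
  Symbol = c("AAA", "BBB", "CCC"),
  Sector = c("Technology", "Finance", "Technology"),
  stringsAsFactors = FALSE
)
sector_env <- new.env()
for (sym in toy_table$Symbol) {
  assign(sym, data.frame(Close = c(1, 2)), envir = sector_env)
}

# Look up the symbols in one sector, then fetch their data from the environment.
tech_symbols <- toy_table$Symbol[toy_table$Sector == "Technology"]
tech_data <- mget(tech_symbols, envir = sector_env)
```

The result is a named list with one element per symbol, which can then be passed to whatever model is being tested.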

In the next project, I will demonstrate how to use the backtest environment just created to test the Relative Strength Index (RSI) as a possible predictor of returns.




sessionInfo()
## R version 3.4.0 (2017-04-21)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 14393)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] knitr_1.16     dplyr_0.5.0    quantmod_0.4-9 TTR_0.23-1    
## [5] xts_0.9-7      zoo_1.8-0     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.11     magrittr_1.5     lattice_0.20-35  R6_2.2.1        
##  [5] rlang_0.1.1      stringr_1.2.0    highr_0.6        tools_3.4.0     
##  [9] grid_3.4.0       DBI_0.6-1        htmltools_0.3.6  yaml_2.1.14     
## [13] lazyeval_0.2.0   rprojroot_1.2    digest_0.6.12    assertthat_0.2.0
## [17] tibble_1.3.3     evaluate_0.10    rmarkdown_1.5    stringi_1.1.5   
## [21] compiler_3.4.0   backports_1.1.0