If you are an equity investor, one of the most useful things you can do is backtest your investment strategy against historical data. Backtesting can provide insight into the success, complexity, and profitability of a strategy. Every strategy changes as you identify its strong and weak points, and strategy models may drift over time depending on market conditions, risk tolerance, or desired returns. Testing those revisions can be very time-consuming when the data has to be gathered again for every run.
To address that issue, I created a local equities backtest environment that I can run my strategies against. With the data stored locally, I can run as many tests with as many variations as I like without waiting for the data to be retrieved from financial sites. This lets me refine strategies to optimize returns with immediate feedback, saving many hours of program run time.
This project builds a 10-year backtest environment using data from the NYSE and NASDAQ exchanges for the years 2007-2016.
The main package being used is quantmod, the Quantitative Financial Modelling Framework.
Source: https://cran.r-project.org/web/packages/quantmod/index.html
Libraries Required
library(quantmod) # Quantitative financial strategies
library(dplyr) # Data manipulation
library(knitr) # Dynamic report generation
Working directory
setwd("U:/Equity Backtesting")
In order to build the backtest environment, there are a few preparatory steps required:
Create Target Backtest Environment
bt_env_2007_2016 <- new.env()
The backtest environment will be the target for the historical pricing data that will be retrieved. The environment will be saved and then used later by the backtesting models.
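The sketch below illustrates what "target" means here. It uses only base R and a made-up series named DEMO, which is a hypothetical object and not real price data. It assigns the object into a new environment by hand, mimicking what getSymbols does with `env =` and `auto.assign = TRUE`. It then checks that the object lives in that environment and not in the global workspace.

```r
# A separate environment acts as a container for downloaded price data
demo_env <- new.env()

# getSymbols(..., env = <environment>, auto.assign = TRUE) loads each series into
# that environment under the symbol's name; here a made-up series is assigned by hand
assign("DEMO", data.frame(Close = c(10.0, 10.5, 10.2)), envir = demo_env)

exists("DEMO", envir = demo_env, inherits = FALSE)     # TRUE
exists("DEMO", envir = globalenv(), inherits = FALSE)  # FALSE: the global workspace stays clean
ls(demo_env)                                           # "DEMO"
```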
Create Symbol Array and Data Range Variables
bt_symbols <- character()
Beg_Date <- as.Date("2007-01-01")
End_Date <- as.Date("2016-12-31")
The date range is January 1, 2007 through December 31, 2016. The symbol array, bt_symbols, will hold every symbol whose historical pricing data is successfully retrieved.
Using the quantmod function stockSymbols, we can retrieve all of the symbols listed on the specified exchanges. I have selected NYSE and NASDAQ; AMEX is another option if you wish to evaluate equities listed there.
Retrieve Symbols from NYSE and NASDAQ
all_symbols <- stockSymbols(c("NYSE", "NASDAQ"))
| Symbol | Name | LastSale | MarketCap | IPOyear | Sector | Industry | Exchange |
|---|---|---|---|---|---|---|---|
| AAAP | Advanced Accelerator Applications S.A. | 38.330 | $1.69B | 2015 | Health Care | Major Pharmaceuticals | NASDAQ |
| AAL | American Airlines Group, Inc. | 47.960 | $23.62B | NA | Transportation | Air Freight/Delivery Services | NASDAQ |
| AAME | Atlantic American Corporation | 3.800 | $77.58M | NA | Finance | Life Insurance | NASDAQ |
| AAOI | Applied Optoelectronics, Inc. | 70.490 | $1.34B | 2013 | Technology | Semiconductors | NASDAQ |
| AAON | AAON, Inc. | 36.150 | $1.9B | NA | Capital Goods | Industrial Machinery/Components | NASDAQ |
| AAPC | Atlantic Alliance Partnership Corp. | 10.300 | $37.98M | 2015 | Consumer Services | Services-Misc. Amusement & Recreation | NASDAQ |
| AAPL | Apple Inc. | 153.670 | $801.21B | 1980 | Technology | Computer Manufacturing | NASDAQ |
| AAWW | Atlas Air Worldwide Holdings | 49.025 | $1.24B | NA | Transportation | Transportation Services | NASDAQ |
| AAXJ | iShares MSCI All Country Asia ex Japan Index Fund | 67.050 | $3.38B | NA | NA | NA | NASDAQ |
| AAXN | Axon Enterprise, Inc. | 24.580 | $1.3B | NA | Capital Goods | Ordnance And Accessories | NASDAQ |
## [1] "Total Symbols Retrieved: 6375"
The symbol output includes index funds and other securities that are not publicly traded stocks. Because these symbol names drive the processing loop that builds the backtest environment, we want to remove the symbols we are not interested in before retrieving any data. To remove the index funds and other non-stock symbols, I filter with complete.cases, which drops any symbol that has no Sector or Industry, such as AAXJ above.
The variables LastSale, MarketCap, and IPOyear are not needed to create the backtest environment, so they are dropped with select before filtering. The order matters: complete.cases removes a row with an NA in any column, and IPOyear is NA for many valid stocks (AAL, for example), so filtering the full table would discard them as well. If your backtest strategy focuses on market capitalization, you might want to keep the MarketCap column.
One of my investment strategies looks for stocks that are undervalued within their sector or industry, so I keep those columns in my final symbol table.
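This base R sketch shows why the order of select and filter matters, since complete.cases drops a row with an NA in any column. The three rows are copied from the stockSymbols output above, with the unused columns trimmed. Indexing with `[` stands in for the dplyr select and filter steps, so this is an illustration and not the pipeline itself.

```r
# Three rows copied from the stockSymbols output shown above
syms <- data.frame(
  Symbol   = c("AAL", "AAPL", "AAXJ"),
  IPOyear  = c(NA, 1980, NA),
  Sector   = c("Transportation", "Technology", NA),
  Industry = c("Air Freight/Delivery Services", "Computer Manufacturing", NA),
  stringsAsFactors = FALSE
)

# Filtering the full table also drops AAL, because its IPOyear is NA
full_filter <- syms[complete.cases(syms), "Symbol"]
full_filter          # "AAPL"

# Keeping only the needed columns first limits the NA check to Sector/Industry
kept <- syms[, c("Symbol", "Sector", "Industry")]
select_then_filter <- kept[complete.cases(kept), "Symbol"]
select_then_filter   # "AAL" "AAPL"
```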
Filter Symbols for All Valid Sectors and Industries
all_symbols <- all_symbols %>%
  select(Symbol, Sector, Industry, Name, Exchange) %>%
  filter(complete.cases(.))
| Symbol | Sector | Industry | Name | Exchange |
|---|---|---|---|---|
| AAAP | Health Care | Major Pharmaceuticals | Advanced Accelerator Applications S.A. | NASDAQ |
| AAL | Transportation | Air Freight/Delivery Services | American Airlines Group, Inc. | NASDAQ |
| AAME | Finance | Life Insurance | Atlantic American Corporation | NASDAQ |
| AAOI | Technology | Semiconductors | Applied Optoelectronics, Inc. | NASDAQ |
| AAON | Capital Goods | Industrial Machinery/Components | AAON, Inc. | NASDAQ |
| AAPC | Consumer Services | Services-Misc. Amusement & Recreation | Atlantic Alliance Partnership Corp. | NASDAQ |
| AAPL | Technology | Computer Manufacturing | Apple Inc. | NASDAQ |
| AAWW | Transportation | Transportation Services | Atlas Air Worldwide Holdings | NASDAQ |
| AAXN | Capital Goods | Ordnance And Accessories | Axon Enterprise, Inc. | NASDAQ |
| ABAC | Consumer Non-Durables | Farming/Seeds/Milling | Aoxin Tianli Group, Inc. | NASDAQ |
## [1] "Total Symbols Remaining: 4965"
Now that we have a partially cleaned set of symbols, we can store the symbol names to be used in the data retrieval. We create a temporary symbol array, symbol_list, for the data-gathering loop, which uses the quantmod function getSymbols to retrieve the historical data.
Create Working Symbol List
symbol_list <- all_symbols$Symbol
The function getSymbols will be used to retrieve the historical data for the stock symbols in symbol_list.
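Within the function call we will specify the following:
- the symbol to retrieve, symbol_list[i]
- env = bt_env_2007_2016, the target backtest environment
- auto.assign = TRUE, so each series is loaded into that environment under its symbol name
- src = "yahoo", the data source
- from = Beg_Date and to = End_Date, the date range

Specifying the backtest environment stores the historical data in its own environment, not in the global environment. The backtest environment can then be saved locally and used during backtest strategy analysis. The goal is to perform this build once and run all testing and analysis against the locally saved environment.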
I mentioned earlier that the symbol table is only partially cleaned. When retrieving the historical data, not every symbol will have a successful lookup, because the remaining list still contains preferred shares and other stock classes. Calling getSymbols on these symbols produces an error (HTTP status was ‘400 Bad Request’). For that reason, I wrap the call in the R function try. If the expression fails, try returns an object of class "try-error" instead of halting the script. With auto.assign = TRUE, a successful getSymbols call returns the name of the symbol it loaded.
Storing the result of try in get_try lets me confirm that each call succeeded and retrieved the requested data. Within the processing loop, I compare get_try with the symbol name. If they are equal, the call succeeded and the data has been loaded into the backtest environment. This allows a single processing loop to load all of the good symbols without restarting when an error is encountered.
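The try pattern can be exercised offline. In the sketch below, fake_getSymbols is a hypothetical stand-in for getSymbols. It returns the symbol name on success and raises an error for one made-up ticker, ABC-P, so no data is downloaded. The sketch also appends to the vector with c() instead of tracking a second index, as an alternative to the j counter used in the loop.

```r
# Hypothetical stand-in for getSymbols(..., auto.assign = TRUE): returns the
# symbol name on success and raises an error for a ticker it "cannot find"
fake_getSymbols <- function(sym) {
  if (sym == "ABC-P") stop("HTTP status was '400 Bad Request'")
  sym
}

demo_symbols <- c("AAPL", "ABC-P", "AAL")  # "ABC-P" is a made-up preferred-share ticker
good_symbols <- character()

for (sym in demo_symbols) {
  # try() catches the error and returns an object of class "try-error"
  get_try <- try(fake_getSymbols(sym), silent = TRUE)
  # A "try-error" never equals the symbol name, so failed lookups are skipped
  if (isTRUE(get_try == sym)) {
    good_symbols <- c(good_symbols, sym)
  }
}

good_symbols  # "AAPL" "AAL"
```

Growing a vector with c() copies it on each append. That cost is negligible at a few thousand symbols and removes the need to keep j in step with the vector.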
Another type of error can also occur: a timeout when retrieving data. The online data sources (Yahoo, Google) restrict or limit the number of calls allowed within a given timeframe. To avoid this, I add a 0.5-second sleep to each pass through the loop. Since I’m not worried about processing speed, the delay is not an issue; I’m more concerned with completing the run without errors or having to restart the loop. Building the full backtest environment takes around 2 hours.
When a call succeeds, I add the processed symbol to the bt_symbols array. At the end, I use this array to filter all_symbols, producing the final, cleaned symbol table for the environment.
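As a rough check on the run time, the sleep alone accounts for a predictable share of it. The arithmetic below assumes one 0.5-second pause for each of the 4,965 symbols remaining after filtering.

```r
n_symbols <- 4965   # symbols remaining after the complete.cases() filter
sleep_sec <- 0.5    # pause added after every getSymbols() call

sleep_min <- n_symbols * sleep_sec / 60
sleep_min           # 41.375 minutes spent sleeping
```

That is about 41 minutes, or roughly a third of the two-hour total. The rest is presumably spent on the requests themselves.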
The processing loop requires two indexes:
- i sets the array index for the symbol list used in getSymbols.
- j sets the array index for the good symbols processed and loaded into bt_symbols.

For this example, the source for the data retrieval will be Yahoo. It is important to note that the code below will only work with quantmod version 0.4-9.
Load Summary Environment
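j <- 1  # next free position in bt_symbols
for (i in seq_along(symbol_list)) {
  # try() returns a "try-error" object instead of stopping the loop on a failed lookup
  get_try <- try(getSymbols(symbol_list[i], env = bt_env_2007_2016, auto.assign = TRUE,
                            src = "yahoo", from = Beg_Date, to = End_Date))
  # On success, getSymbols returns the name of the symbol it loaded
  if (isTRUE(get_try == symbol_list[i])) {
    bt_symbols[j] <- symbol_list[i]
    j <- j + 1
  }
  Sys.sleep(0.5)  # pause between requests to avoid being rate limited
}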
The contents of the environment can be checked to confirm that the historical data loaded properly. The following example lists only the loaded symbols that begin with “AA”, not every loaded symbol. To list every loaded symbol, remove the pattern argument from the command.
Verify Environment Contents
ls(bt_env_2007_2016, pattern = "^AA")
## [1] "AA" "AAAP" "AAC" "AAL" "AAME" "AAN" "AAOI" "AAON" "AAP" "AAPC"
## [11] "AAPL" "AAT" "AAV" "AAWW"
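Beyond spot-checking with a pattern, you can confirm that every symbol recorded as successful is actually present by comparing bt_symbols with ls(). The sketch below runs that comparison against a toy environment, demo_env, a hypothetical stand-in for bt_env_2007_2016 that holds three symbols from the output above.

```r
# Toy environment standing in for bt_env_2007_2016 (symbols taken from the output above)
demo_env <- new.env()
for (s in c("AA", "AAPL", "ABAX")) {
  assign(s, data.frame(Close = 1), envir = demo_env)
}
demo_bt_symbols <- c("AA", "AAPL", "ABAX")  # what the loop recorded as successful

ls(demo_env, pattern = "^AA")               # "AA" "AAPL"

# Every recorded symbol should exist in the environment; an empty result means nothing is missing
missing_symbols <- setdiff(demo_bt_symbols, ls(demo_env))
missing_symbols                             # character(0)
```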
Once verified, the environment can be saved as an .Rdata file. This is the environment file that will be loaded during the backtesting projects.
Save Summary Environment
save(bt_env_2007_2016, file = "Environments/bt_env_2007_2016.Rdata")
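save() fails if the Environments folder does not exist yet, so calling dir.create("Environments", showWarnings = FALSE) beforehand may help. The sketch below shows the round trip a later project would rely on. It uses a temporary file and a toy environment (demo_env with a made-up AAPL series) in place of the real one. load() restores the environment under the name it was saved with and returns that name.

```r
# Build a tiny environment to stand in for bt_env_2007_2016
demo_env <- new.env()
assign("AAPL", data.frame(Close = c(150, 151)), envir = demo_env)

# Save to a temporary file (the article saves to Environments/bt_env_2007_2016.Rdata)
rdata_file <- tempfile(fileext = ".Rdata")
save(demo_env, file = rdata_file)
rm(demo_env)

# In a later session, load() restores the environment under its original name
# and returns that name
restored <- load(rdata_file)
restored                              # "demo_env"
get("AAPL", envir = demo_env)$Close   # 150 151
```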
The final step is to create a subset of all_symbols that reflects the stocks that are in the backtest environment.
Create Backtest Symbol Table
bt_env_symbols <- all_symbols %>%
  filter(Symbol %in% bt_symbols)
| Symbol | Sector | Industry | Name | Exchange |
|---|---|---|---|---|
| AAAP | Health Care | Major Pharmaceuticals | Advanced Accelerator Applications S.A. | NASDAQ |
| AAL | Transportation | Air Freight/Delivery Services | American Airlines Group, Inc. | NASDAQ |
| AAME | Finance | Life Insurance | Atlantic American Corporation | NASDAQ |
| AAOI | Technology | Semiconductors | Applied Optoelectronics, Inc. | NASDAQ |
| AAON | Capital Goods | Industrial Machinery/Components | AAON, Inc. | NASDAQ |
| AAPC | Consumer Services | Services-Misc. Amusement & Recreation | Atlantic Alliance Partnership Corp. | NASDAQ |
| AAPL | Technology | Computer Manufacturing | Apple Inc. | NASDAQ |
| AAWW | Transportation | Transportation Services | Atlas Air Worldwide Holdings | NASDAQ |
| ABAC | Consumer Non-Durables | Farming/Seeds/Milling | Aoxin Tianli Group, Inc. | NASDAQ |
| ABAX | Capital Goods | Industrial Machinery/Components | ABAXIS, Inc. | NASDAQ |
## [1] "Total Backtest Symbols: 4667"
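The filter above keeps only the symbols whose data loaded, so 4,965 - 4,667 = 298 symbols were dropped along the way. The base R sketch below reproduces the %in% filter on three rows from the tables above and uses setdiff to list the dropped symbols. AAXN is treated as a failed lookup because it appears in the filtered table but not in the backtest table.

```r
# Rows taken from the filtered symbol table above
demo_all <- data.frame(
  Symbol = c("AAWW", "AAXN", "ABAC"),
  Sector = c("Transportation", "Capital Goods", "Consumer Non-Durables"),
  stringsAsFactors = FALSE
)
demo_bt_symbols <- c("AAWW", "ABAC")  # AAXN does not appear in the backtest table

# Base R equivalent of filter(Symbol %in% bt_symbols)
demo_bt <- demo_all[demo_all$Symbol %in% demo_bt_symbols, ]
demo_bt$Symbol                              # "AAWW" "ABAC"

# Symbols that passed the sector/industry filter but failed retrieval
setdiff(demo_all$Symbol, demo_bt_symbols)   # "AAXN"
```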
Save Backtest Symbol Table
saveRDS(bt_env_symbols, "Symbols/bt_env_symbols.rds")
We now have a complete local environment that can be loaded in subsequent backtesting projects without time-consuming external calls to Yahoo or Google. This allows models to be changed and retested quickly under consistent conditions. By combining model results with the symbol table, you can compare and contrast them across specific sectors and industries.
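The code below sketches how the two pieces combine. It round-trips a few rows of the symbol table through saveRDS and readRDS, using a temporary file in place of Symbols/bt_env_symbols.rds. It then groups the tickers by sector with split. In a real project, each group could then be passed to mget(..., envir = bt_env_2007_2016) to pull those series from the environment; that last step is not run here.

```r
# A few rows from the backtest symbol table above
demo_symbols <- data.frame(
  Symbol = c("AAOI", "AAPL", "AAL", "AAWW"),
  Sector = c("Technology", "Technology", "Transportation", "Transportation"),
  stringsAsFactors = FALSE
)

# Round-trip through saveRDS()/readRDS(), as a later project would
rds_file <- tempfile(fileext = ".rds")
saveRDS(demo_symbols, rds_file)
symbols_tbl <- readRDS(rds_file)

# Group tickers by sector so each group can be analyzed separately
by_sector <- split(symbols_tbl$Symbol, symbols_tbl$Sector)
by_sector$Technology      # "AAOI" "AAPL"
by_sector$Transportation  # "AAL" "AAWW"
```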
In the next project, I will demonstrate how to use the backtest environment just created to test the Relative Strength Index (RSI) as a possible predictor of returns.
sessionInfo()
## R version 3.4.0 (2017-04-21)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 14393)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] knitr_1.16 dplyr_0.5.0 quantmod_0.4-9 TTR_0.23-1
## [5] xts_0.9-7 zoo_1.8-0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.11 magrittr_1.5 lattice_0.20-35 R6_2.2.1
## [5] rlang_0.1.1 stringr_1.2.0 highr_0.6 tools_3.4.0
## [9] grid_3.4.0 DBI_0.6-1 htmltools_0.3.6 yaml_2.1.14
## [13] lazyeval_0.2.0 rprojroot_1.2 digest_0.6.12 assertthat_0.2.0
## [17] tibble_1.3.3 evaluate_0.10 rmarkdown_1.5 stringi_1.1.5
## [21] compiler_3.4.0 backports_1.1.0