Algorithmic Trading with R

Loading Packages
Setting Up Your Workspace
A Simple Trading Strategy: Trend Following

Loading Packages

In this post, I will show how to use R to collect the stocks listed on loyal3, get historical data from Yahoo and then perform a simple algorithmic trading strategy. Along the way, you will learn some web scraping, a function hitting a finance API and an htmlwidget to make an interactive time series chart.

Setting Up Your Workspace

The TTR package is used to construct “Technical Trading Rules”. The TTR package can perform more sophisticated calculations and is worth exploring. The dygraphs library is a wrapper for a fast, open source JavaScript charting library. It is one of the htmlwidgets that makes R charting more dynamic and part of an html file instead of a static image. Lastly, the lubridate package is used for easy date manipulation

library(rvest)
library(pbapply)
library(TTR)
library(dygraphs)
library(lubridate)
library(tidyquant)
library(timetk)
pacman::p_load(dygraphs,DT)

Data Collection

Before you can look up individual daily stock prices to build your trading algorithm, you need to collect all available stocker tickers.The list of stock tickers or symbols can be scrapped from here. The first thing to do is declare stock.list as a URL string. Next use read_html() so your R session will create an Internet session and collect all the html information on the page as an XML node set. The page CSS has an ID called “.company-name”. Use this as a parameter when calling html_nodes() to select only the XML data associated to this node. Lastly, use html_text() so the actual text values for the company names is collected.

website <- read_html("https://www.marketwatch.com/tools/industry/stocklist.asp?bcind_ind=9535&bcind_period=3mo")

table <- html_table(html_nodes(website, "table")[[4]], fill = TRUE)

stocks.symbols<-table$X2
stocks.names<-table$X3

table1<-table[-1,-1]


colnames(table1)<-table[1,-1]



DT::datatable(table1)

Fig. 30

stock.list<-"https://www.marketwatch.com/tools/industry/stocklist.asp?bcind_ind=9535&bcind_period=3mo"
stocks<-read_html(stock.list)
stocks.names<-html_nodes(stocks,".lk01")
stocks.names<-html_text(stocks.names)

We delete the rows with no entries to reduce the number of rows in the table. First we replace the empty spaces with NA and the remove the rows which contain these NA’s with complete.cases function.

table1[table1==""] <- NA

table1<-table1[complete.cases(table1$Symbol),]


# to keep columns with no NA:

#table1 <- table1[, colSums(complete.cases(table1)) == 0]

#table1 %>%filter(complete.cases(.)) 



DT::datatable(table1)

Fig. 30

start.date<-Sys.Date()
end.date<-Sys.Date()-years(3)

start.date<-gsub('-','', start.date)
end.date<-gsub('-','', end.date)

start.date

## [1] "20180225"

end.date

## [1] "20150225"

# The symbols vector holds our tickers. 
symbols <- c("SPY","EFA", "IJS", "EEM","AGG")

# The prices object will hold our raw price data 
prices <- 
  getSymbols(symbols, src = 'yahoo', from = "2005-01-01", 
             auto.assign = TRUE, warnings = FALSE) %>% 
  map(~Ad(get(.))) %>%   #Extract (transformed) data from a suitable OHLC object. getSymbols('IBM',src='yahoo') Ad(IBM)
  reduce(merge) %>%   #reduce() combines from the left, reduce_right() combines from the right
  `colnames<-`(symbols)


head(prices)

##                 SPY      EFA      IJS      EEM      AGG
## 2005-01-03 92.46423 36.88009 49.51314 17.48657 66.25936
## 2005-01-04 91.33437 36.17309 48.62529 16.94819 66.19465
## 2005-01-05 90.70410 36.14990 47.72091 16.74072 66.16875
## 2005-01-06 91.16528 36.14990 48.00587 16.72934 66.21410
## 2005-01-07 91.03462 35.98765 47.38643 16.76173 66.19465
## 2005-01-10 91.46503 36.14990 47.79525 16.78273 66.16232

library(quantmod)

tickers <- c("AAPL", "MSFT","GOOGL","IBM","FB")
getSymbols(tickers)

## [1] "AAPL"  "MSFT"  "GOOGL" "IBM"   "FB"

closePrices <- do.call(merge, lapply(tickers, function(x) Cl(get(x))))

ParallelMap package

We take advantage of paralleMap package to reduce the time the stock data is read into r.

library(parallelMap)
parallelStartSocket(2) 
parallelStartMulticore(cpus=6)# start in socket mode and create 2 processes on localhost
f = function(x) Cl(get(x))     # define our job
y = parallelMap(f, tickers) # like R's Map but in parallel


mapdata<-do.call(cbind,y)




parallelStop()



mapdata%>%head()

##            AAPL.Close MSFT.Close GOOGL.Close IBM.Close FB.Close
## 2007-01-03   11.97143      29.86    234.0290     97.27       NA
## 2007-01-04   12.23714      29.81    241.8719     98.31       NA
## 2007-01-05   12.15000      29.64    243.8388     97.42       NA
## 2007-01-08   12.21000      29.93    242.0320     98.90       NA
## 2007-01-09   13.22429      29.96    242.9930    100.07       NA
## 2007-01-10   13.85714      29.66    244.9750     98.89       NA

Bioconductor

library(BiocParallel)



f = function(x) Ad(get(x))

options(MulticoreParam=quote(MulticoreParam(workers=4)))

param <- SnowParam(workers = 2, type = "SOCK")

vec=c(tickers[1],tickers[2],tickers[3],tickers[4])

#vec=c(paste0(quote(tickers),"[",1:length(tickers),"]",collapse=","))



multicoreParam <- MulticoreParam(workers = 7)




bio=bplapply(tickers, f, BPPARAM = multicoreParam)

biodata<-do.call(cbind, bio)



biodata%>%head()

##            AAPL.Adjusted MSFT.Adjusted GOOGL.Adjusted IBM.Adjusted
## 2007-01-03      8.104137      22.85825       234.0290     74.69364
## 2007-01-04      8.284014      22.81997       241.8719     75.49223
## 2007-01-05      8.225021      22.68984       243.8388     74.80882
## 2007-01-08      8.265641      22.91184       242.0320     75.94528
## 2007-01-09      8.952269      22.93480       242.9930     76.84376
## 2007-01-10      9.380682      22.70515       244.9750     75.93764
##            FB.Adjusted
## 2007-01-03          NA
## 2007-01-04          NA
## 2007-01-05          NA
## 2007-01-08          NA
## 2007-01-09          NA
## 2007-01-10          NA

AdjustedPrices<-biodata

dateWindow <- c("2016-01-01", "2017-09-01")

dygraph(AdjustedPrices, main = "Value", group = "stock") %>%
  dyRebase(value = 100) %>%
  dyRangeSelector(dateWindow = dateWindow)

Fig. 30

end<-Sys.Date()
start<-Sys.Date()-years(3)


prices  <- tq_get(symbols , get = "stock.prices", from = start,to=end)
prices%>%head()

## # A tibble: 6 x 8
##   symbol date        open  high   low close    volume adjusted
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
## 1 SPY    2015-02-25   212   212   211   212  73061700      199
## 2 SPY    2015-02-26   212   212   211   211  72697900      199
## 3 SPY    2015-02-27   211   212   211   211 108076000      198
## 4 SPY    2015-03-02   211   212   211   212  87491400      199
## 5 SPY    2015-03-03   211   212   210   211 110325800      199
## 6 SPY    2015-03-04   210   210   209   210 105665800      198

pacman::p_load(dygraph)

library(parallel)

# Calculate the number of cores
no_cores <- detectCores() - 1

# Initiate cluster
cl <- makeCluster(no_cores)

f<-function(x) Ad(get(x))

AdjustedPrices <- do.call(merge, bplapply(vec, f, BPPARAM = multicoreParam))

AdjustedPrices%>%head()

##            AAPL.Adjusted MSFT.Adjusted GOOGL.Adjusted IBM.Adjusted
## 2007-01-03      8.104137      22.85825       234.0290     74.69364
## 2007-01-04      8.284014      22.81997       241.8719     75.49223
## 2007-01-05      8.225021      22.68984       243.8388     74.80882
## 2007-01-08      8.265641      22.91184       242.0320     75.94528
## 2007-01-09      8.952269      22.93480       242.9930     76.84376
## 2007-01-10      9.380682      22.70515       244.9750     75.93764

dateWindow <- c("2016-01-01", "2017-09-01")

dygraph(AdjustedPrices, main = "Value", group = "stock") %>%
  dyRebase(value = 100) %>%
  dyRangeSelector(dateWindow = dateWindow)

Fig. 30

library(plotly)
library(quantmod)

getSymbols("AAPL",src='yahoo')

## [1] "AAPL"

# basic example of ohlc charts
df <- data.frame(Date=index(AAPL),coredata(AAPL))
df <- tail(df, 30)

# cutom colors
i <- list(line = list(color = '#FFD700'))
d <- list(line = list(color = '#0000ff'))

p <- df %>%
  plot_ly(x = ~Date, type="ohlc",
          open = ~AAPL.Open, close = ~AAPL.Close,
          high = ~AAPL.High, low = ~AAPL.Low,
          increasing = i, decreasing = d)

p

Fig. 30

library(plotly)

biodatadf<-tk_tbl(biodata, timetk_idx = TRUE)%>%rename(Date=index)

data<-biodatadf%>%filter(Date>"2014-01-11")


p <- plot_ly(data, x = ~Date, y = ~AAPL.Adjusted, name = 'AAPL', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~MSFT.Adjusted, name = 'MSFT', mode = 'lines')%>%
   add_trace(y = ~IBM.Adjusted, name = 'IBM', mode = 'lines')%>%
  add_trace(y = ~GOOGL.Adjusted, name = 'GOOGL', mode = 'lines')%>%
  layout(title = "Visualizing Adjusted Stock Prices",
         xaxis = list(title = "Time"),
         yaxis = list (title = "Adjusted Prices"))
 p

Fig. 30

A Simple Trading Strategy: Trend Following

In the code below, you will visualize a simple momentum trading strategy. Basically, you would want to calculate the 200 day and 50 day moving averages for a stock price.On any given day that the 50 day moving average is above the 200 day moving average, you would buy or hold your position. On days where the 200 day average is more than the 50 day moving average, you would sell your shares. This strategy is called a trend following strategy. The positive or negative nature between the two temporal based averages represents the stock’s momentum. The TTR package provides SMA() for calculating simple moving average.

tail(SMA(AdjustedPrices$AAPL.Adjusted, 200))

##                 SMA
## 2018-02-15 159.0440
## 2018-02-16 159.1823
## 2018-02-20 159.3203
## 2018-02-21 159.4425
## 2018-02-22 159.5518
## 2018-02-23 159.6714

tail(SMA(AdjustedPrices$AAPL.Adjusted, 50))

##                 SMA
## 2018-02-15 170.1216
## 2018-02-16 170.1911
## 2018-02-20 170.2617
## 2018-02-21 170.3104
## 2018-02-22 170.3868
## 2018-02-23 170.4574

data.frame(sma200=SMA(AdjustedPrices$AAPL.Adjusted, 200),sma50=SMA(AdjustedPrices$AAPL.Adjusted, 50))%>%head()

##            SMA SMA.1
## 2007-01-03  NA    NA
## 2007-01-04  NA    NA
## 2007-01-05  NA    NA
## 2007-01-08  NA    NA
## 2007-01-09  NA    NA
## 2007-01-10  NA    NA

sdata<-biodatadf%>%select(-Date)

#sdata<-tk_xts(data,date_var=Date)

df_50=as.data.frame.matrix(apply(sdata, 2, SMA,50))

colnames(df_50)=paste0(colnames(df_50),"_sma50")


df_200=as.data.frame.matrix(apply(sdata, 2, SMA,200))

colnames(df_200)=paste0(colnames(df_200),"_sma200")

df_all<-cbind.data.frame(Date=biodatadf$Date,  df_200,df_50)%>%drop_na()

# sma 50
f50<- function(x) SMA(x,50)

# sma 50
f200<- function(x) SMA(x,200)

#library(pryr)
#data %>% plyr::colwise() %>% f50

df_all<-tk_xts(df_all,date_var=Date)

df_all%>%head()

##            AAPL.Adjusted_sma200 MSFT.Adjusted_sma200 GOOGL.Adjusted_sma200
## 2013-03-07             57.49335             24.96830              341.9314
## 2013-03-08             57.46856             24.96565              342.5098
## 2013-03-11             57.43212             24.96036              343.0621
## 2013-03-12             57.39270             24.95521              343.6297
## 2013-03-13             57.34667             24.95290              344.1699
## 2013-03-14             57.30539             24.95172              344.7151
##            IBM.Adjusted_sma200 FB.Adjusted_sma200 AAPL.Adjusted_sma50
## 2013-03-07            167.0124           25.67010            49.92382
## 2013-03-08            167.0851           25.61875            49.77925
## 2013-03-11            167.1485           25.58930            49.66265
## 2013-03-12            167.2180           25.57345            49.52154
## 2013-03-13            167.2968           25.54885            49.39153
## 2013-03-14            167.3918           25.51890            49.22393
##            MSFT.Adjusted_sma50 GOOGL.Adjusted_sma50 IBM.Adjusted_sma50
## 2013-03-07            23.98699             380.5125           169.7385
## 2013-03-08            24.00741             381.7339           170.0599
## 2013-03-11            24.02904             382.9947           170.3837
## 2013-03-12            24.04962             384.2091           170.7027
## 2013-03-13            24.07753             385.4634           171.0965
## 2013-03-14            24.10652             386.6061           171.5250
##            FB.Adjusted_sma50
## 2013-03-07           28.8342
## 2013-03-08           28.8548
## 2013-03-11           28.8874
## 2013-03-12           28.9230
## 2013-03-13           28.9464
## 2013-03-14           28.9548

dim(df_all)

## [1] 1252   10

The custom function mov.avgs() accepts a single stock data frame to calculate the moving averages. The first line selects the closing prices because it indexes [,4] to create stock.close. Next, the function uses ifelse to check the number of rows in the data frame. Specifically if the nrow in the data frame is less than (2260), then the function will create a data frame of moving averages with “NA”. I chose this number because there is about 250 trading days a year so this will check that the time series is about 2 years or more in length. Loyal3 sometimes can get access to IPOs and if the stock is newly public there will not be enough data for a 200 day moving average. However, if the nrow value is greater than 2260 then the function will create a data frame with the original data along with 200 and 50 day moving averages as new columns. Using colnames, I declare the column names. The last part of the function uses complete.cases to check for values in the 200 day moving average column. Any rows that do not have a value are dropped in the final result.

mov.avgs<-function(df){

  ifelse((nrow(df)<(2*260)),
         x<-data.frame(df, 'NA', 'NA'),
         x<-data.frame( SMA(df, 200), SMA(df, 50)))
  colnames(x)<-c( 'sma_200','sma_50')
  x<-x[complete.cases(x$sma_200),]
  return(x)
}

dplyr::pull(sdata, AAPL.Adjusted)%>%head()

## [1] 8.104137 8.284014 8.225021 8.265641 8.952269 9.380682

This object is passed to dySeries() in the next 2 lines. You can refer to a column by name so dySeries() each plot a line for the “sma_50” and “sma_200” values in lines 2 and 3. This object is forwarded again to the dyRangeSelector() to adjust the selector’s height. Lastly, I added some shading to define periods when you would have wanted to buy or hold the equity and a period when you should have sold your shares or stayed away depending on your position.

var=names(df_all)[str_detect(names(df_all), "AAPL")]
df_all[,var]%>%head()

##            AAPL.Adjusted_sma200 AAPL.Adjusted_sma50
## 2013-03-07             57.49335            49.92382
## 2013-03-08             57.46856            49.77925
## 2013-03-11             57.43212            49.66265
## 2013-03-12             57.39270            49.52154
## 2013-03-13             57.34667            49.39153
## 2013-03-14             57.30539            49.22393

#works with dataframe
#select(df_all, contains("AAPL"))

# equivalent to above
#vars=names(df_all)[grepl('AAPL', names(df_all))]

#df_all[,vars]%>%head()



dateWindow=c("2014-01-01","2018-02-01")

dygraph(df_all[,var],main = 'Apple Moving Averages') %>%
  dySeries('AAPL.Adjusted_sma50', label = 'sma 50') %>%
  dySeries('AAPL.Adjusted_sma200', label = 'sma 200') %>%
  dyRangeSelector(height = 30) %>%
  dyShading(from = '2016-01-01', to = '2016-9-01', color = '#CCEBD6') %>%
  dyShading(from = '2016-9-01', to = '2017-01-01', color = '#FFE6E6')%>%
dyRangeSelector(dateWindow = dateWindow)

Fig. 30

The Apple moving averages with shaded regions for buying/holding versus selling

Algorithmic Trading with R

Nana Boateng

February 25, 2018

Loading Packages

Setting Up Your Workspace

Data Collection

ParallelMap package

Bioconductor

A Simple Trading Strategy: Trend Following