In this post, I will show how to use R to collect the stocks listed on loyal3, get historical data from Yahoo and then perform a simple algorithmic trading strategy. Along the way, you will learn some web scraping, a function hitting a finance API and an htmlwidget to make an interactive time series chart.
The TTR package is used to construct “Technical Trading Rules”. The TTR package can perform more sophisticated calculations and is worth exploring. The dygraphs library is a wrapper for a fast, open source JavaScript charting library. It is one of the htmlwidgets that makes R charting more dynamic and part of an html file instead of a static image. Lastly, the lubridate package is used for easy date manipulation
library(rvest)
library(pbapply)
library(TTR)
library(dygraphs)
library(lubridate)
library(tidyquant)
library(timetk)
pacman::p_load(dygraphs,DT)
Before you can look up individual daily stock prices to build your trading algorithm, you need to collect all available stocker tickers.The list of stock tickers or symbols can be scrapped from here. The first thing to do is declare stock.list as a URL string. Next use read_html() so your R session will create an Internet session and collect all the html information on the page as an XML node set. The page CSS has an ID called “.company-name”. Use this as a parameter when calling html_nodes() to select only the XML data associated to this node. Lastly, use html_text() so the actual text values for the company names is collected.
website <- read_html("https://www.marketwatch.com/tools/industry/stocklist.asp?bcind_ind=9535&bcind_period=3mo")
table <- html_table(html_nodes(website, "table")[[4]], fill = TRUE)
stocks.symbols<-table$X2
stocks.names<-table$X3
table1<-table[-1,-1]
colnames(table1)<-table[1,-1]
DT::datatable(table1)
Fig. 30
stock.list<-"https://www.marketwatch.com/tools/industry/stocklist.asp?bcind_ind=9535&bcind_period=3mo"
stocks<-read_html(stock.list)
stocks.names<-html_nodes(stocks,".lk01")
stocks.names<-html_text(stocks.names)
We delete the rows with no entries to reduce the number of rows in the table. First we replace the empty spaces with NA and the remove the rows which contain these NA’s with complete.cases function.
table1[table1==""] <- NA
table1<-table1[complete.cases(table1$Symbol),]
# to keep columns with no NA:
#table1 <- table1[, colSums(complete.cases(table1)) == 0]
#table1 %>%filter(complete.cases(.))
DT::datatable(table1)
Fig. 30
start.date<-Sys.Date()
end.date<-Sys.Date()-years(3)
start.date<-gsub('-','', start.date)
end.date<-gsub('-','', end.date)
start.date
## [1] "20180225"
end.date
## [1] "20150225"
# The symbols vector holds our tickers.
symbols <- c("SPY","EFA", "IJS", "EEM","AGG")
# The prices object will hold our raw price data
prices <-
getSymbols(symbols, src = 'yahoo', from = "2005-01-01",
auto.assign = TRUE, warnings = FALSE) %>%
map(~Ad(get(.))) %>% #Extract (transformed) data from a suitable OHLC object. getSymbols('IBM',src='yahoo') Ad(IBM)
reduce(merge) %>% #reduce() combines from the left, reduce_right() combines from the right
`colnames<-`(symbols)
head(prices)
## SPY EFA IJS EEM AGG
## 2005-01-03 92.46423 36.88009 49.51314 17.48657 66.25936
## 2005-01-04 91.33437 36.17309 48.62529 16.94819 66.19465
## 2005-01-05 90.70410 36.14990 47.72091 16.74072 66.16875
## 2005-01-06 91.16528 36.14990 48.00587 16.72934 66.21410
## 2005-01-07 91.03462 35.98765 47.38643 16.76173 66.19465
## 2005-01-10 91.46503 36.14990 47.79525 16.78273 66.16232
library(quantmod)
tickers <- c("AAPL", "MSFT","GOOGL","IBM","FB")
getSymbols(tickers)
## [1] "AAPL" "MSFT" "GOOGL" "IBM" "FB"
closePrices <- do.call(merge, lapply(tickers, function(x) Cl(get(x))))
We take advantage of paralleMap package to reduce the time the stock data is read into r.
library(parallelMap)
parallelStartSocket(2)
parallelStartMulticore(cpus=6)# start in socket mode and create 2 processes on localhost
f = function(x) Cl(get(x)) # define our job
y = parallelMap(f, tickers) # like R's Map but in parallel
mapdata<-do.call(cbind,y)
parallelStop()
mapdata%>%head()
## AAPL.Close MSFT.Close GOOGL.Close IBM.Close FB.Close
## 2007-01-03 11.97143 29.86 234.0290 97.27 NA
## 2007-01-04 12.23714 29.81 241.8719 98.31 NA
## 2007-01-05 12.15000 29.64 243.8388 97.42 NA
## 2007-01-08 12.21000 29.93 242.0320 98.90 NA
## 2007-01-09 13.22429 29.96 242.9930 100.07 NA
## 2007-01-10 13.85714 29.66 244.9750 98.89 NA
library(BiocParallel)
f = function(x) Ad(get(x))
options(MulticoreParam=quote(MulticoreParam(workers=4)))
param <- SnowParam(workers = 2, type = "SOCK")
vec=c(tickers[1],tickers[2],tickers[3],tickers[4])
#vec=c(paste0(quote(tickers),"[",1:length(tickers),"]",collapse=","))
multicoreParam <- MulticoreParam(workers = 7)
bio=bplapply(tickers, f, BPPARAM = multicoreParam)
biodata<-do.call(cbind, bio)
biodata%>%head()
## AAPL.Adjusted MSFT.Adjusted GOOGL.Adjusted IBM.Adjusted
## 2007-01-03 8.104137 22.85825 234.0290 74.69364
## 2007-01-04 8.284014 22.81997 241.8719 75.49223
## 2007-01-05 8.225021 22.68984 243.8388 74.80882
## 2007-01-08 8.265641 22.91184 242.0320 75.94528
## 2007-01-09 8.952269 22.93480 242.9930 76.84376
## 2007-01-10 9.380682 22.70515 244.9750 75.93764
## FB.Adjusted
## 2007-01-03 NA
## 2007-01-04 NA
## 2007-01-05 NA
## 2007-01-08 NA
## 2007-01-09 NA
## 2007-01-10 NA
AdjustedPrices<-biodata
dateWindow <- c("2016-01-01", "2017-09-01")
dygraph(AdjustedPrices, main = "Value", group = "stock") %>%
dyRebase(value = 100) %>%
dyRangeSelector(dateWindow = dateWindow)
Fig. 30
end<-Sys.Date()
start<-Sys.Date()-years(3)
prices <- tq_get(symbols , get = "stock.prices", from = start,to=end)
prices%>%head()
## # A tibble: 6 x 8
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SPY 2015-02-25 212 212 211 212 73061700 199
## 2 SPY 2015-02-26 212 212 211 211 72697900 199
## 3 SPY 2015-02-27 211 212 211 211 108076000 198
## 4 SPY 2015-03-02 211 212 211 212 87491400 199
## 5 SPY 2015-03-03 211 212 210 211 110325800 199
## 6 SPY 2015-03-04 210 210 209 210 105665800 198
pacman::p_load(dygraph)
library(parallel)
# Calculate the number of cores
no_cores <- detectCores() - 1
# Initiate cluster
cl <- makeCluster(no_cores)
f<-function(x) Ad(get(x))
AdjustedPrices <- do.call(merge, bplapply(vec, f, BPPARAM = multicoreParam))
AdjustedPrices%>%head()
## AAPL.Adjusted MSFT.Adjusted GOOGL.Adjusted IBM.Adjusted
## 2007-01-03 8.104137 22.85825 234.0290 74.69364
## 2007-01-04 8.284014 22.81997 241.8719 75.49223
## 2007-01-05 8.225021 22.68984 243.8388 74.80882
## 2007-01-08 8.265641 22.91184 242.0320 75.94528
## 2007-01-09 8.952269 22.93480 242.9930 76.84376
## 2007-01-10 9.380682 22.70515 244.9750 75.93764
dateWindow <- c("2016-01-01", "2017-09-01")
dygraph(AdjustedPrices, main = "Value", group = "stock") %>%
dyRebase(value = 100) %>%
dyRangeSelector(dateWindow = dateWindow)
Fig. 30
library(plotly)
library(quantmod)
getSymbols("AAPL",src='yahoo')
## [1] "AAPL"
# basic example of ohlc charts
df <- data.frame(Date=index(AAPL),coredata(AAPL))
df <- tail(df, 30)
# cutom colors
i <- list(line = list(color = '#FFD700'))
d <- list(line = list(color = '#0000ff'))
p <- df %>%
plot_ly(x = ~Date, type="ohlc",
open = ~AAPL.Open, close = ~AAPL.Close,
high = ~AAPL.High, low = ~AAPL.Low,
increasing = i, decreasing = d)
p
Fig. 30
library(plotly)
biodatadf<-tk_tbl(biodata, timetk_idx = TRUE)%>%rename(Date=index)
data<-biodatadf%>%filter(Date>"2014-01-11")
p <- plot_ly(data, x = ~Date, y = ~AAPL.Adjusted, name = 'AAPL', type = 'scatter', mode = 'lines') %>%
add_trace(y = ~MSFT.Adjusted, name = 'MSFT', mode = 'lines')%>%
add_trace(y = ~IBM.Adjusted, name = 'IBM', mode = 'lines')%>%
add_trace(y = ~GOOGL.Adjusted, name = 'GOOGL', mode = 'lines')%>%
layout(title = "Visualizing Adjusted Stock Prices",
xaxis = list(title = "Time"),
yaxis = list (title = "Adjusted Prices"))
p
Fig. 30
In the code below, you will visualize a simple momentum trading strategy. Basically, you would want to calculate the 200 day and 50 day moving averages for a stock price.On any given day that the 50 day moving average is above the 200 day moving average, you would buy or hold your position. On days where the 200 day average is more than the 50 day moving average, you would sell your shares. This strategy is called a trend following strategy. The positive or negative nature between the two temporal based averages represents the stock’s momentum. The TTR package provides SMA() for calculating simple moving average.
tail(SMA(AdjustedPrices$AAPL.Adjusted, 200))
## SMA
## 2018-02-15 159.0440
## 2018-02-16 159.1823
## 2018-02-20 159.3203
## 2018-02-21 159.4425
## 2018-02-22 159.5518
## 2018-02-23 159.6714
tail(SMA(AdjustedPrices$AAPL.Adjusted, 50))
## SMA
## 2018-02-15 170.1216
## 2018-02-16 170.1911
## 2018-02-20 170.2617
## 2018-02-21 170.3104
## 2018-02-22 170.3868
## 2018-02-23 170.4574
data.frame(sma200=SMA(AdjustedPrices$AAPL.Adjusted, 200),sma50=SMA(AdjustedPrices$AAPL.Adjusted, 50))%>%head()
## SMA SMA.1
## 2007-01-03 NA NA
## 2007-01-04 NA NA
## 2007-01-05 NA NA
## 2007-01-08 NA NA
## 2007-01-09 NA NA
## 2007-01-10 NA NA
sdata<-biodatadf%>%select(-Date)
#sdata<-tk_xts(data,date_var=Date)
df_50=as.data.frame.matrix(apply(sdata, 2, SMA,50))
colnames(df_50)=paste0(colnames(df_50),"_sma50")
df_200=as.data.frame.matrix(apply(sdata, 2, SMA,200))
colnames(df_200)=paste0(colnames(df_200),"_sma200")
df_all<-cbind.data.frame(Date=biodatadf$Date, df_200,df_50)%>%drop_na()
# sma 50
f50<- function(x) SMA(x,50)
# sma 50
f200<- function(x) SMA(x,200)
#library(pryr)
#data %>% plyr::colwise() %>% f50
df_all<-tk_xts(df_all,date_var=Date)
df_all%>%head()
## AAPL.Adjusted_sma200 MSFT.Adjusted_sma200 GOOGL.Adjusted_sma200
## 2013-03-07 57.49335 24.96830 341.9314
## 2013-03-08 57.46856 24.96565 342.5098
## 2013-03-11 57.43212 24.96036 343.0621
## 2013-03-12 57.39270 24.95521 343.6297
## 2013-03-13 57.34667 24.95290 344.1699
## 2013-03-14 57.30539 24.95172 344.7151
## IBM.Adjusted_sma200 FB.Adjusted_sma200 AAPL.Adjusted_sma50
## 2013-03-07 167.0124 25.67010 49.92382
## 2013-03-08 167.0851 25.61875 49.77925
## 2013-03-11 167.1485 25.58930 49.66265
## 2013-03-12 167.2180 25.57345 49.52154
## 2013-03-13 167.2968 25.54885 49.39153
## 2013-03-14 167.3918 25.51890 49.22393
## MSFT.Adjusted_sma50 GOOGL.Adjusted_sma50 IBM.Adjusted_sma50
## 2013-03-07 23.98699 380.5125 169.7385
## 2013-03-08 24.00741 381.7339 170.0599
## 2013-03-11 24.02904 382.9947 170.3837
## 2013-03-12 24.04962 384.2091 170.7027
## 2013-03-13 24.07753 385.4634 171.0965
## 2013-03-14 24.10652 386.6061 171.5250
## FB.Adjusted_sma50
## 2013-03-07 28.8342
## 2013-03-08 28.8548
## 2013-03-11 28.8874
## 2013-03-12 28.9230
## 2013-03-13 28.9464
## 2013-03-14 28.9548
dim(df_all)
## [1] 1252 10
The custom function mov.avgs() accepts a single stock data frame to calculate the moving averages. The first line selects the closing prices because it indexes [,4] to create stock.close. Next, the function uses ifelse to check the number of rows in the data frame. Specifically if the nrow in the data frame is less than (2260), then the function will create a data frame of moving averages with “NA”. I chose this number because there is about 250 trading days a year so this will check that the time series is about 2 years or more in length. Loyal3 sometimes can get access to IPOs and if the stock is newly public there will not be enough data for a 200 day moving average. However, if the nrow value is greater than 2260 then the function will create a data frame with the original data along with 200 and 50 day moving averages as new columns. Using colnames, I declare the column names. The last part of the function uses complete.cases to check for values in the 200 day moving average column. Any rows that do not have a value are dropped in the final result.
mov.avgs<-function(df){
ifelse((nrow(df)<(2*260)),
x<-data.frame(df, 'NA', 'NA'),
x<-data.frame( SMA(df, 200), SMA(df, 50)))
colnames(x)<-c( 'sma_200','sma_50')
x<-x[complete.cases(x$sma_200),]
return(x)
}
dplyr::pull(sdata, AAPL.Adjusted)%>%head()
## [1] 8.104137 8.284014 8.225021 8.265641 8.952269 9.380682
This object is passed to dySeries() in the next 2 lines. You can refer to a column by name so dySeries() each plot a line for the “sma_50” and “sma_200” values in lines 2 and 3. This object is forwarded again to the dyRangeSelector() to adjust the selector’s height. Lastly, I added some shading to define periods when you would have wanted to buy or hold the equity and a period when you should have sold your shares or stayed away depending on your position.
var=names(df_all)[str_detect(names(df_all), "AAPL")]
df_all[,var]%>%head()
## AAPL.Adjusted_sma200 AAPL.Adjusted_sma50
## 2013-03-07 57.49335 49.92382
## 2013-03-08 57.46856 49.77925
## 2013-03-11 57.43212 49.66265
## 2013-03-12 57.39270 49.52154
## 2013-03-13 57.34667 49.39153
## 2013-03-14 57.30539 49.22393
#works with dataframe
#select(df_all, contains("AAPL"))
# equivalent to above
#vars=names(df_all)[grepl('AAPL', names(df_all))]
#df_all[,vars]%>%head()
dateWindow=c("2014-01-01","2018-02-01")
dygraph(df_all[,var],main = 'Apple Moving Averages') %>%
dySeries('AAPL.Adjusted_sma50', label = 'sma 50') %>%
dySeries('AAPL.Adjusted_sma200', label = 'sma 200') %>%
dyRangeSelector(height = 30) %>%
dyShading(from = '2016-01-01', to = '2016-9-01', color = '#CCEBD6') %>%
dyShading(from = '2016-9-01', to = '2017-01-01', color = '#FFE6E6')%>%
dyRangeSelector(dateWindow = dateWindow)
Fig. 30
The Apple moving averages with shaded regions for buying/holding versus selling