R Notebook

Today I wanted to walk through a quick example that combines web scraping, calls to the Yahoo Finance API and simple asset analysis. I also want to focus on functional programming and tidy iteration. Today I will be using the R programming language. If you have read any of my posts on LinkedIn or Medium in the past, you may have noticed that I usually program in Python. In general, I prefer Python because it has simpler syntax, wider adoption and is more versatile.

In the R programming language there is a set of packages that make up what is called the tidyverse. These packages are mostly maintained by engineers at RStudio and provide a simple, integrated and uniform way to manipulate data in R.

! add tidyverse picture !

I’ll start by importing the packages that I’ll need. You’ll notice that most of the packages that I use are from the tidyverse.
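A minimal sketch of the imports, assuming the two packages named later in this post (tidyquant and rvest) plus the tidyverse and lubridate for date arithmetic. The exact set of packages is an assumption and may differ from what the notebook originally loaded.

```r
# Core tidyverse packages (dplyr, purrr, tibble, ggplot2, ...)
library(tidyverse)
# Date helpers; attached explicitly because older tidyverse releases do not attach it
library(lubridate)
# Financial data and analysis helpers that return tidy data frames
library(tidyquant)
# HTML scraping
library(rvest)
```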

The next thing that I will do is define a variable with today’s date. I will then subtract 3 months from today’s date. This returns another date object, indicating what day came 3 months before today. I will need this because I want to get the last 3 months of OHLC data for each ticker.

## [1] "2019-07-09"
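The date arithmetic described above can be sketched as follows. The variable names `today` and `from` are my own choices for illustration, and I am assuming lubridate is used for the subtraction; base R date arithmetic would also work.

```r
library(lubridate)

today <- Sys.Date()

# %m-% rolls back to the last valid day of the month instead of returning NA
# (for example, May 31 minus 3 months becomes the last day of February)
from <- today %m-% months(3)

class(from)  # still a Date object
```

The plain `-` operator also works with `months(3)`, but it returns `NA` when the resulting day does not exist, which is why the sketch uses `%m-%`.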

I’m going to use the tidyquant package to get the financial data for all SP500 tickers. The tidyquant package’s core function is tq_get(), which can be used to get various information about stocks. If I pass a string containing a ticker name to tq_get(), it will return Open, High, Low, Close (OHLC) data.
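A minimal sketch of a single-ticker call, assuming network access to Yahoo Finance. The ticker "AAPL" and the 90-day window are illustrative choices, not necessarily what the notebook used. Depending on the installed tidyquant version, the result may or may not carry a symbol column.

```r
library(tidyquant)

# Roughly the last three months of daily prices for one ticker
from <- Sys.Date() - 90
aapl <- tq_get("AAPL", get = "stock.prices", from = from)

# One row per trading day, with open, high, low, close, volume and adjusted columns
head(aapl)
```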

There are a few things to note above:

tq_get() returns a tidy dataframe
The ticker name is not in the dataframe
The ‘%>%’ operator is called a pipe. It passes the object that precedes it as the first argument to the function that follows it.
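To make the pipe concrete, here is a small sketch that writes the same computation twice, once nested and once piped. The vector `x` is just toy input.

```r
library(magrittr)

x <- c(1, 4, 9)

# Nested call: read from the inside out
nested <- sum(sqrt(x))

# Piped call: read left to right. x becomes the first argument of sqrt(),
# and that result becomes the first argument of sum()
piped <- x %>% sqrt() %>% sum()

identical(nested, piped)
```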

I want this OHLC data for all SP500 tickers. In order to do this I will need to do a few things:

Get a list of all SP500 tickers
Iterate over this list and call tq_get on each element of the list, returning a dataframe for each ticker
Combine all these dataframes into one dataframe
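Steps 2 and 3 can be sketched with purrr and dplyr. This is one possible approach rather than the definitive one: `get_all_prices()` and its `fetch` argument are hypothetical names I introduce here, and a failed download is not handled.

```r
library(purrr)
library(dplyr)

# `fetch` defaults to tidyquant::tq_get; any function that takes a ticker and
# returns a data frame can be swapped in (handy for trying this out offline)
get_all_prices <- function(tickers, fetch = tidyquant::tq_get, ...) {
  tickers %>%
    # Step 2: one data frame per ticker, tagged with the ticker it belongs to
    map(function(ticker, ...) mutate(fetch(ticker, ...), symbol = ticker), ...) %>%
    # Step 3: stack the per-ticker data frames into a single data frame
    bind_rows()
}

# Example call (needs network access):
# prices <- get_all_prices(c("AAPL", "MSFT"), from = Sys.Date() - 90)
```

Adding the `symbol` column inside the loop is what keeps the rows attributable to a ticker once everything is stacked together.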

Wow! That sounds a little complicated, right? Luckily, with R, going about this will be pretty simple. Wikipedia has a table of all 505 SP500 tickers (some companies, like Google, have multiple share classes) located at this URL:

https://en.wikipedia.org/wiki/List_of_S%26P_500_companies

To get all the SP500 tickers we are going to scrape this table, using the rvest package. The rvest package is a simple scraping package in R that is very similar to Python’s Beautiful Soup. In programming, scraping is defined as programmatically collecting human-readable content from web pages.

In the code below I scrape the Wikipedia table and create a list of all SP500 tickers. The hardest part of scraping is figuring out the XPath or CSS selector that indicates which HTML nodes to select. I really don’t know much about HTML or CSS, but using Google Chrome I was able to find the correct XPath.
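Here is a hedged sketch of the scraping step. The XPath `//*[@id="constituents"]` and the column name `Symbol` are my assumptions about the page layout, not necessarily the selector found with Chrome, and `extract_tickers()` is a hypothetical helper. The sketch runs on a tiny inline HTML snippet so that it works offline.

```r
library(rvest)

# Pull the ticker column out of the table whose id is "constituents".
# Both the id and the "Symbol" column name are assumptions about the page.
extract_tickers <- function(page) {
  node <- html_node(page, xpath = '//*[@id="constituents"]')
  tbl <- html_table(node)
  tbl[["Symbol"]]
}

# Tiny stand-in for the real page, with the same assumed structure
demo_page <- read_html('<table id="constituents">
  <tr><th>Symbol</th><th>Security</th></tr>
  <tr><td>MMM</td><td>3M</td></tr>
  <tr><td>ABT</td><td>Abbott</td></tr>
</table>')

extract_tickers(demo_page)

# Against the live page (needs network access):
# url <- "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"
# tickers <- extract_tickers(read_html(url))
```

If the page layout changes, only the XPath and the column name inside `extract_tickers()` need updating.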

##    Rows Columns
## 1 32185      11

I’m not sure which function call is throwing this error. It is not thrown by tq_transmute() itself, but by one of the functions that tq_transmute() calls.