Today I want to walk through a quick example that combines scraping, calls to the Yahoo Finance API, and simple asset analysis. I also want to focus on functional programming and tidy iteration. This time I will be using the R programming language. If you have read any of my posts on LinkedIn or Medium, you may have noticed that I usually program in Python. In general, I prefer Python because it has simpler syntax, wider adoption, and more versatility.
In R there is a set of packages that make up what is called the tidyverse. These packages are mostly maintained by engineers at RStudio and provide a simple, integrated, and uniform way to manipulate data in R.
I’ll start by loading the packages that I’ll need. You’ll notice that most of them are from the tidyverse.
Next, I define a variable holding today’s date and subtract 3 months from it. This returns another Date object: the day 3 months before today. I need this because I want the last 3 months of OHLC data for each ticker.
## [1] "2019-07-09"
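A minimal sketch of this date arithmetic in base R, assuming nothing beyond `Sys.Date()` and `seq()`. A Date sequence accepts a negative calendar step such as `"-3 months"`, and the second element of that two-element sequence is the shifted date. The helper name `months_before` is hypothetical.

```r
# Hypothetical helper: the date `n` calendar months before `date`.
# seq() on a Date accepts a negative step such as "-3 months";
# the second element of the sequence is the shifted date.
months_before <- function(date, n = 3) {
  seq(date, by = paste0("-", n, " months"), length.out = 2)[2]
}

today <- Sys.Date()
start_date <- months_before(today)
print(start_date)
```

One design note: for month-end dates, `seq()` counts an invalid day forward, so three months before 2019-05-31 becomes 2019-03-03. The tidyverse's lubridate package offers `today() %m-% months(3)`, which instead rolls back to the last valid day of the month (2019-02-28 in that case).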
I’m going to use the tidyquant package to get the financial data for all SP500 tickers. The tidyquant package’s core function is tq_get(), which can retrieve various kinds of information about stocks. If I pass a string containing a ticker symbol to tq_get(), by default it returns Open, High, Low, Close (OHLC) data, along with volume and adjusted close.
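A sketch of what such a call can look like, assuming tidyquant is installed and Yahoo Finance is reachable. `get = "stock.prices"` is tq_get()'s default, and `from`/`to` bound the date range. The wrapper name `get_ohlc` is hypothetical and not part of tidyquant; calls are namespace-qualified so the function can be defined without loading the package.

```r
# Hypothetical wrapper around tidyquant::tq_get(); calling it needs network access.
get_ohlc <- function(tickers, from, to = Sys.Date()) {
  # get = "stock.prices" returns date, open, high, low, close, volume and adjusted.
  # A character vector of tickers comes back as one long tibble with a symbol column.
  tidyquant::tq_get(tickers, get = "stock.prices", from = from, to = to)
}

# Usage (commented out because it downloads data):
# aapl    <- get_ohlc("AAPL", from = as.Date("2019-07-09"))
# several <- get_ohlc(c("AAPL", "MSFT"), from = as.Date("2019-07-09"))
```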
There are a few things to note above:
I want this OHLC data for all SP500 tickers. To do this, I will need to do a few things:
Wow! That sounds a little complicated, right? Luckily, in R this is pretty simple. Wikipedia has a table of all 505 SP500 tickers (there are more tickers than companies because some companies, like Google, have multiple share classes) at this URL:
https://en.wikipedia.org/wiki/List_of_S%26P_500_companies
To get all the SP500 tickers, we are going to scrape this table using the rvest package. rvest is a simple scraping package for R, very similar to Python’s Beautiful Soup. In programming, scraping means programmatically collecting human-readable content from web pages.
In the code below, I scrape the Wikipedia table and create a list of all SP500 tickers. The hardest part of scraping is figuring out the XPath or CSS selector that identifies which HTML nodes to select. I don’t know much about HTML or CSS, but with Google Chrome’s developer tools (right-click the table, choose Inspect, then right-click the highlighted element and choose Copy > Copy XPath) I was able to find the correct XPath.
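A minimal sketch of this step with rvest. It assumes the table on the Wikipedia page has the id `constituents` and a ticker column headed `Symbol`; both are assumptions that may break as the page is edited. `extract_tickers` is a hypothetical helper. It also swaps dots for hyphens, because Yahoo Finance writes share-class tickers such as Berkshire Hathaway's as BRK-B while Wikipedia lists BRK.B.

```r
# Minimal sketch of scraping the SP500 ticker list with rvest.
# Assumptions: the table's id is "constituents" and its ticker column is "Symbol".
sp500_url <- "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"

# Hypothetical helper: takes a parsed HTML page, returns a character vector of tickers
extract_tickers <- function(page, xpath = '//*[@id="constituents"]') {
  table_node <- rvest::html_node(page, xpath = xpath)
  constituents <- rvest::html_table(table_node)
  tickers <- as.character(constituents$Symbol)
  # Yahoo Finance uses "-" for share classes (BRK-B) where Wikipedia uses "." (BRK.B)
  gsub(".", "-", tickers, fixed = TRUE)
}

# Usage against the live page (requires network access):
# tickers <- extract_tickers(rvest::read_html(sp500_url))
# length(tickers)
```

Taking an already-parsed page rather than a URL keeps the parsing logic testable offline: you can feed it a small HTML string through `rvest::read_html()`.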
## Rows Columns
## 1 32185 11
I’m not sure which function call is throwing this error. It is not thrown by tq_transmute() itself, but by one of the functions that tq_transmute() calls internally.
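When iterating over roughly 500 tickers, one failing symbol or one error deep inside a helper can stop the whole run. A common tidyverse pattern is to wrap the per-ticker function in `purrr::possibly()`, so an error becomes a placeholder value, and then keep only the results that are data frames. This is a sketch under stated assumptions, not the post's code: `fetch_all` is a hypothetical helper, and it assumes purrr and dplyr are installed. It also filters out non-data-frame results because tq_get() may return `NA` with a warning for a failed ticker rather than raising an error.

```r
# Hypothetical helper: apply `fetch` to each ticker, skipping any that fail.
fetch_all <- function(tickers, fetch) {
  # possibly() returns `otherwise` (NULL here) instead of stopping map() on an error
  safe_fetch <- purrr::possibly(fetch, otherwise = NULL)
  results <- purrr::map(tickers, safe_fetch)
  # Keep only proper data frames (drops NULLs from errors and NA placeholders)
  ok <- purrr::map_lgl(results, is.data.frame)
  if (any(!ok)) {
    message("Skipped: ", paste(tickers[!ok], collapse = ", "))
  }
  dplyr::bind_rows(results[ok])
}

# Usage with tidyquant, where `tickers` is the scraped vector (downloads data):
# ohlc <- fetch_all(tickers, function(t) tidyquant::tq_get(t, from = as.Date("2019-07-09")))
```

The skipped tickers are reported with `message()`, so you can inspect them afterwards instead of losing the whole batch to a single failure.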