The objective of this post is to present an alternative way of getting financial data into R, besides great packages like quantmod and TFX (see more options in this post). These packages rely on services such as Yahoo and Google which, while accurate, are not scalable. In my particular case I wanted data for more than 3 fiscal years, something that neither of those services offers.
This post is also aimed at readers without a programming background, since the function I am going to explain might seem simple to those who are used to working with APIs. Even so, if you do have a programming background and work with R, you might still find this function useful.
For this purpose I’m going to use the Intrinio API, which offers free plans suited to students, start-ups and projects in the development phase. I use it for company valuation for Towers Capital Group, and I think it offers the best balance between affordability and scalability. If you want to follow along, you can get a free account and API key on their website.
In lay terms, an API is an interface for accessing a service; in this case we want to retrieve financial information. To do so, we access standardized links that are built according to the information we want. Think of each link as a webpage that contains exactly the information you are after, which you then copy and paste for your own use.
Since serving this data consumes the provider's resources, the API requires a username and a key so that access to the information can be controlled. To authenticate in R we use the httr package as follows:
library(httr)
username <- "YOUR_API_USERNAME" # replace with the username from your Intrinio account
password <- "YOUR_API_KEY"      # replace with your Intrinio API key
link <- "https://api.intrinio.com/financials/standardized?identifier=AAPL&statement=income_statement&fiscal_year=2015&fiscal_period=FY"
tp <- GET(link, authenticate(username, password, type = "basic"))
The “link” variable holds the URL of a specific API resource. If you paste that link into your browser, you will be prompted to authenticate. The tp object holds the result of the request made to that link (I will loosely call it a connection), and the authenticate function supplies the username and key using HTTP basic authentication.
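Writing the username and key directly into a script makes them easy to leak when you share the code. A minimal sketch, assuming you store the credentials in environment variables named INTRINIO_USER and INTRINIO_KEY (hypothetical names chosen for this example, not something Intrinio requires), reads them with Sys.getenv and builds the same authenticate() configuration used above:

```r
library(httr)

# Hypothetical helper: reads the API username and key from environment
# variables (INTRINIO_USER / INTRINIO_KEY are names assumed for this sketch)
# and returns the same basic-auth configuration passed to GET() above.
intrinio_auth <- function(user_var = "INTRINIO_USER", key_var = "INTRINIO_KEY") {
  user <- Sys.getenv(user_var)  # returns "" when the variable is not set
  key  <- Sys.getenv(key_var)
  if (user == "" || key == "") {
    stop("Set the ", user_var, " and ", key_var, " environment variables first.")
  }
  authenticate(user, key, type = "basic")
}

# Usage (makes a network request):
# tp <- GET(link, intrinio_auth())
```

The variables can be set for the current session with Sys.setenv, or permanently in your .Renviron file, so the key never appears in the script itself.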
So the links are built according to what you want to retrieve from the API service. This is important because, in the end, the function we are going to create is simply a link (or API call) creator: it takes the parameters accepted by the API and builds a link from them. At http://docs.intrinio.com you will find the full documentation; for the link we used above, we can change the identifier (AAPL), statement (income_statement), fiscal_year (2015) and fiscal_period (FY) parameters.
Each of these parameters has a set of possible values that specify your call. So to create the calls, we just need to apply the sapply function to the set of values we want to go through. For example, say you want the income statement for the years 2014 to 2016 inclusive: you then need to create 3 links (2014, 2015, 2016), which we can do with the paste0 and sapply functions. First, let’s build a single link with the paste0 function:
year <- 2015
paste0("https://api.intrinio.com/financials/standardized?identifier=AAPL&statement=income_statement&fiscal_year=",year,"&fiscal_period=FY")
## [1] "https://api.intrinio.com/financials/standardized?identifier=AAPL&statement=income_statement&fiscal_year=2015&fiscal_period=FY"
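Since the four query parameters are the only parts of the link that change, the same paste0 idea can be wrapped in a small function so that any of them, not just the year, can be varied. This is a minimal sketch; build_link is a hypothetical name, and the default fiscal_period = "FY" simply mirrors the example link. Check the Intrinio documentation for the values each parameter accepts.

```r
# Hypothetical helper: builds a "standardized financials" link from the
# four query parameters that appear in the example link above.
build_link <- function(identifier, statement, fiscal_year, fiscal_period = "FY") {
  paste0("https://api.intrinio.com/financials/standardized",
         "?identifier=", identifier,
         "&statement=", statement,
         "&fiscal_year=", fiscal_year,
         "&fiscal_period=", fiscal_period)
}

# Rebuilds the same link as the paste0 call above
build_link("AAPL", "income_statement", 2015)
```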
Now we can define a vector of all the years and use the sapply function to create one link per year:
years <- c(2014,2015,2016)
links <- sapply(years, function(year) {
  paste0("https://api.intrinio.com/financials/standardized?identifier=AAPL&statement=income_statement&fiscal_year=", year, "&fiscal_period=FY")
})
Now we have a character vector with one link per year. Next, we have to retrieve the information behind each link programmatically.
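Because paste0 is vectorized (it recycles its arguments element by element), the sapply call above is not strictly necessary: passing the whole years vector gives the same character vector. The idea extends to several parameters at once with expand.grid, which lists every combination of the values you give it. In this sketch the second ticker, MSFT, is only an illustrative value.

```r
years <- c(2014, 2015, 2016)
base <- "https://api.intrinio.com/financials/standardized"

# One paste0 call yields one link per year, no sapply needed
links_vec <- paste0(base, "?identifier=AAPL&statement=income_statement&fiscal_year=",
                    years, "&fiscal_period=FY")

# Every combination of ticker and year (MSFT is just an example ticker)
grid <- expand.grid(identifier = c("AAPL", "MSFT"), fiscal_year = years,
                    stringsAsFactors = FALSE)
links_grid <- paste0(base, "?identifier=", grid$identifier,
                     "&statement=income_statement&fiscal_year=", grid$fiscal_year,
                     "&fiscal_period=FY")
length(links_grid)  # 2 tickers x 3 years = 6 links
```

The sapply version is arguably easier to read for newcomers, so either approach is fine; the vectorized form simply avoids the anonymous function.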
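Earlier we used the GET and authenticate functions to create a connection to a link. Now we have to retrieve the information contained in that connection. The content function extracts the body of the response (here as text), and the fromJSON function from the jsonlite package turns that text into R objects; the final .[[1]] keeps the first element of the parsed list, which holds the statement. The unlist step is harmless but not strictly needed, since content(tp, as = "text") already returns a single character string.
library(magrittr) # provides the pipe %>% operator, for readability's sake
library(jsonlite) # to parse the JSON response into R objects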
content(tp, as = "text") %>% unlist() %>% fromJSON(flatten = FALSE) %>% .[[1]]
## No encoding supplied: defaulting to UTF-8.
## tag value
## 1 operatingrevenue 2.337150e+11
## 2 totalrevenue 2.337150e+11
## 3 operatingcostofrevenue 1.400890e+11
## 4 totalcostofrevenue 1.400890e+11
## 5 totalgrossprofit 9.362600e+10
## 6 sgaexpense 1.432900e+10
## 7 rdexpense 8.067000e+09
## 8 totaloperatingexpenses 2.239600e+10
## 9 totaloperatingincome 7.123000e+10
## 10 otherincome 1.285000e+09
## 11 totalotherincome 1.285000e+09
## 12 totalpretaxincome 7.251500e+10
## 13 incometaxexpense 1.912100e+10
## 14 netincomecontinuing 5.339400e+10
## 15 netincome 5.339400e+10
## 16 netincometocommon 5.339400e+10
## 17 weightedavebasicsharesos 5.753421e+09
## 18 basiceps 9.280000e+00
## 19 weightedavedilutedsharesos 5.793069e+09
## 20 dilutedeps 9.220000e+00
## 21 weightedavebasicdilutedsharesos 5.753700e+09
## 22 basicdilutedeps 9.280000e+00
## 23 cashdividendspershare 1.980000e+00
Remember that the tp object is the previously established connection. The fromJSON function near the end of the pipe parses the information from its original format (JSON) into an R data frame.
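To see what the fromJSON step does without calling the API, here is a sketch on a small hand-written JSON string. Its shape (a data array of tag/value records plus another top-level field) is an assumption modelled on the output above, not a copy of a real Intrinio response; the two values are taken from that output.

```r
library(jsonlite)

# Hand-written stand-in for a response body; the field names and shape are
# assumptions for illustration, the values come from the output above
body <- '{"data": [{"tag": "operatingrevenue", "value": 233715000000},
                   {"tag": "totalrevenue", "value": 233715000000}],
          "identifier": "AAPL"}'

parsed <- fromJSON(body, flatten = FALSE)  # a list: a data frame plus other fields
statement <- parsed[[1]]                   # same role as .[[1]] in the pipe above
str(statement)
```

fromJSON simplifies a JSON array of objects with the same keys into a data frame, which is why the first element comes back with tag and value columns.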
Now we just have to go through each link we created and get its data frame. This is somewhat convoluted, because we first need to create a connection for each link and then get the data frame for each connection, which means two sapply calls. We are better off creating a function, which I will go through in the next post.
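Until then, here is one possible sketch of such a function; it is not the function from the next post. It folds both steps into a single lapply pass and stacks the results. fetch_statement and fetch_all are hypothetical names, and the sketch assumes, as above, that the statement is the first element of the parsed JSON. The fetcher is passed in as an argument so it can be swapped for a fake one when testing without network access.

```r
library(httr)
library(jsonlite)

# Hypothetical: fetch one link and return its tag/value data frame
fetch_statement <- function(link, auth) {
  resp <- GET(link, auth)
  stop_for_status(resp)  # raise an R error on HTTP errors such as 401 or 404
  txt <- content(resp, as = "text", encoding = "UTF-8")
  fromJSON(txt, flatten = FALSE)[[1]]
}

# Hypothetical: apply a fetcher to every link and stack the results,
# tagging each row with the link it came from
fetch_all <- function(links, fetch) {
  frames <- lapply(links, function(link) {
    df <- fetch(link)
    df$link <- link
    df
  })
  do.call(rbind, frames)
}

# Usage (makes one API request per link):
# auth <- authenticate(username, password, type = "basic")
# statements <- fetch_all(links, function(l) fetch_statement(l, auth))
```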