Whenever the unfortunate idea of investing in the stock market forms in my head, I start by playing with the Google Finance stock screener.

This time, instead of burning my money, I thought my time would be better spent building a stock screener of my own.

What follows is a fairly simple tutorial on how to do so. It is the first in a series of posts related to stocks and stock screening.

Step 0: Load the required libraries

library(rvest)     # web scraping: read_html(), html_nodes(), html_table()
library(magrittr)  # pipes and extract2()
library(stringr)   # string handling: str_extract(), str_replace_all()
library(dplyr)     # data manipulation: filter(), inner_join()
library(ggplot2)   # plotting
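
If any of these packages are missing on your machine, a quick one-time install from CRAN takes care of it:

# one-time setup, in case any of the packages are not installed yet
install.packages(c("rvest", "magrittr", "stringr", "dplyr", "ggplot2"))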

Step 1: Get a list of all stock symbols on the TSX

get_symbols <- function(letter){
  
  # Wikipedia keeps one listing page per starting letter
  url <- sprintf(paste0("https://en.wikipedia.org/wiki/",
                        "Companies_listed_on_the_Toronto_Stock_Exchange_(%s)"),
                        letter)

  html <- read_html(url)
  
  # the second table on the page holds the company names and symbols
  df <- html %>% html_nodes("table") %>% extract2(2) %>% html_table()
  
  colnames(df) <- c("stock", "symbol")
  
  # build the TMX Money quote URL for each symbol
  df$link <- paste0("http://web.tmxmoney.com/quote.php?qm_symbol=",
                    df$symbol)
  
  df

}

# loop over letters to get all stocks
all_stocks <- lapply(toupper(letters), get_symbols)

# put all results in one data frame
stocks_df <- do.call(rbind, all_stocks)

head(stocks_df)
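
Since this loop hits 26 Wikipedia pages in quick succession, it doesn't hurt to be polite and pause between requests. A minimal variant of the same loop, assuming a one-second delay is acceptable:

# same loop, with a short pause between requests out of politeness
all_stocks <- lapply(toupper(letters), function(letter){
  Sys.sleep(1)
  get_symbols(letter)
})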

Step 2: Extract basic quote information

Source: http://web.tmxmoney.com

stock_info_basic <- function(symbol){
  print(symbol)  # progress indicator, since scraping every symbol takes a while
  
  url <- paste0("http://web.tmxmoney.com/quote.php?qm_symbol=", symbol)
  
  html <- read_html(url)
  
  # the quote page nests several small tables inside the quote-tabs content
  outer_table <- html %>% html_nodes(".quote-tabs-content table")
  
  info_tables <- outer_table %>% html_nodes("table") %>% html_table()
  
  df <- NULL
  
  # try() keeps the scrape going when a field is missing; df then stays NULL
  try(
    df <- data.frame(
      symbol=symbol,
      
      beta=(info_tables[[1]] %>% filter(X1=="Beta:"))[,2],
      
      dividend=(info_tables[[3]] %>% filter(X1=="Dividend:"))[,2] %>% 
        str_extract("\\d+\\.\\d+") %>% as.numeric(),
      
      div_freq=(info_tables[[3]] %>% filter(X1=="Div. Frequency:"))[,2],
      
      PE=(info_tables[[3]] %>% filter(X1=="P/E Ratio:"))[,2],
      
      EPS=(info_tables[[3]] %>% filter(X1=="EPS:"))[,2],
      
      yield=(info_tables[[4]] %>% filter(X1=="Yield:"))[,2],
      
      market_cap=(info_tables[[4]] %>% filter(X1=="Market Cap:"))[,2] %>% 
        str_replace_all(",", "") %>% as.numeric(),
      
      PB=(info_tables[[4]] %>% filter(X1=="P/B Ratio:"))[,2]
    )
  )
  
  df
}
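
Before looping over every symbol, a quick sanity check on a single one doesn't hurt (AW.UN shows up in the final table below, so it makes a convenient test case):

# quick sanity check on a single symbol before scraping everything
stock_info_basic("AW.UN")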

Notice that sometimes the information is not available for a given stock symbol; in that case the expression inside try() fails, the function returns NULL, and that symbol is simply skipped later on.
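
A tiny illustration of why this causes no trouble downstream: when the per-symbol results are bound together with do.call(rbind, ...), NULL entries are simply dropped.

# NULL entries disappear when the list is bound into one data frame
do.call(rbind, list(data.frame(x = 1), NULL, data.frame(x = 2)))
##   x
## 1 1
## 2 2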

Step 3: Assemble the results

# scrape the basic quote info for every symbol (this takes a while)
stocks_info_basic_df <- do.call(rbind, lapply(stocks_df$symbol, stock_info_basic))

# attach the company names and quote links back onto the scraped numbers
stocks_info_basic_df <- stocks_df %>% inner_join(stocks_info_basic_df, by = "symbol")

head(stocks_info_basic_df)
##   symbol  beta dividend  div_freq   PE   EPS  yield market_cap     PB
## 1  AW.UN 0.628    0.125   Monthly 20.1  1.16  5.199  349990111  3.597
## 2    FAP 0.485    0.040   Monthly   NA -0.06 10.000  250630157  0.939
## 3    AAB 1.917       NA       N/A   NA -0.05     NA   16717597  0.625
## 4    ABT 0.653    0.080 Quarterly 17.9  0.38  4.720  263363045 -4.878
## 5    ADN 0.347    0.250 Quarterly 21.4  0.83  5.634  296979084  1.112
## 6  AEF.A 0.058       NA       N/A   NA -1.69     NA  390425000 40.417
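
Scraping every quote page takes a while, so it's worth caching the assembled data frame to disk. A minimal sketch (the file name here is just a suggestion):

# cache the scraped data so the whole scrape doesn't have to be repeated
saveRDS(stocks_info_basic_df, "stocks_info_basic_df.rds")

# ...and in a later session, reload it with:
# stocks_info_basic_df <- readRDS("stocks_info_basic_df.rds")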

And because no R-related post would be complete without at least a simple chart:

# drop extreme P/E and P/B values so the bulk of the points stays visible
plot_data <- subset(stocks_info_basic_df, PE>1 & PE<100 & PB>0.25 & PB<10)

# label only the cheap-looking corner: low P/E and high book-value-to-price
label_data <- subset(plot_data, PE<15 & 1/PB>1.5)

ggplot(data=plot_data, aes(x=PE, y=1/PB)) +
  geom_point(aes(size=market_cap, col=factor(is.na(dividend))), alpha=0.3) + 
  scale_size(range=c(3,10)) +
  labs(title="TSX Stocks", x="Price/Earnings", y="BookValue/Price",
       col="Dividend", size="Market Cap") +
  geom_text(data=label_data, aes(label=symbol), check_overlap=TRUE)

In the above plot, I've removed outliers and labelled some of the stock symbols in the north-west corner, i.e. stocks with a low Price/Earnings ratio and a high BookValue/Price ratio.
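
For instance, one quick way to pull those labelled candidates out of the plot and into a table, cheapest (by P/E) first:

# list the labelled "value" candidates, lowest P/E first
label_data %>%
  arrange(PE) %>%
  select(symbol, PE, PB, EPS, yield, market_cap)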

In the next installment, I'll be adding lots of interesting financial information to the stock database (revenue, income, expenses, etc.). You guessed it… we will actually build a full-fledged stock screener. Stay tuned!