Intro to Tidyquant in R

tidyquant pulls data directly from several different financial and economic sources into R and integrates the core financial packages zoo, xts, quantmod, TTR, and PerformanceAnalytics with the tidyverse syntax.


Load libraries

  • rm(list=ls()) clears all objects from the global environment
  • gc() frees up memory by cleaning up unused objects; important if you’re working on more memory-intensive analysis
rm(list=ls())
gc()
##          used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
## Ncells 531578 28.4    1184037 63.3         NA   669514 35.8
## Vcells 980710  7.5    8388608 64.0      18432  1851810 14.2
library(tidyquant)
library(tidyverse)
library(DataExplorer)


Pull data

tq_get() pulls web-based financial data from different sources. Some require an API key.

Yahoo Finance

  • stock.prices and stock.prices.japan: open, high, low, close, volume and adjusted stock prices for a stock symbol
  • dividends: dividends for a stock symbol
  • splits: split ratio for a stock symbol

FRED

  • economic.data: economic data from FRED.

The remaining sources require an API key and include crypto and other more specialized datasets. tq_get_options() lists every available option:

tq_get_options()
##  [1] "stock.prices"       "stock.prices.japan" "dividends"         
##  [4] "splits"             "economic.data"      "quandl"            
##  [7] "quandl.datatable"   "tiingo"             "tiingo.iex"        
## [10] "tiingo.crypto"      "alphavantager"      "alphavantage"      
## [13] "rblpapi"
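
As a rough sketch of how a key-based source might be used, the snippet below registers a Tiingo key and requests prices. It assumes the key is stored in an environment variable called TIINGO_API_KEY (a name chosen here for illustration) and that the riingo package, which tidyquant uses for Tiingo, is installed. Nothing is downloaded unless both conditions hold.

```r
library(tidyquant)

# Assumption: the key lives in an environment variable named TIINGO_API_KEY.
# The variable name is hypothetical; use whatever you set in .Renviron.
key <- Sys.getenv("TIINGO_API_KEY")

if (nzchar(key) && requireNamespace("riingo", quietly = TRUE)) {
  # Register the key for this session, then pull prices from Tiingo
  tiingo_api_key(key)
  aapl_tiingo <- tq_get("AAPL", get = "tiingo",
                        from = "2024-01-01", to = "2024-03-01")
  print(head(aapl_tiingo))
} else {
  message("No Tiingo key or riingo package found; skipping the download.")
}
```

Keeping the key in an environment variable rather than in the script avoids committing it to version control.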


FRED

Pull economic data from FRED using the series code shown at the top-right of the chart on the series page.


Canada vs. China imports

tq_get(c("IMPCA", "IMPCH"), get="economic.data")
## # A tibble: 242 × 3
##    symbol date        price
##    <chr>  <date>      <dbl>
##  1 IMPCA  2015-01-01 25844.
##  2 IMPCA  2015-02-01 23266.
##  3 IMPCA  2015-03-01 25778.
##  4 IMPCA  2015-04-01 24763.
##  5 IMPCA  2015-05-01 24283.
##  6 IMPCA  2015-06-01 27352.
##  7 IMPCA  2015-07-01 24718.
##  8 IMPCA  2015-08-01 24817.
##  9 IMPCA  2015-09-01 25289.
## 10 IMPCA  2015-10-01 23669.
## # ℹ 232 more rows
tq_get(c("IMPCA", "IMPCH"), get="economic.data") %>%
  ggplot(aes(x=date, y=price, color=symbol))+
  geom_line() +
  ggtitle("U.S. Imports of Goods by Customs Basis from China and Canada") +
  ylab("Millions of Dollars") +
  scale_y_continuous(labels = scales::dollar_format(prefix="$", suffix = "M"))


Stock Prices

  • open, high, low, and close: the opening, high, low, and closing stock prices that day.
  • [volume](https://www.investopedia.com/articles/technical/02/010702.asp#:~:text=Volume%20measures%20the%20number%20of,prices%20fall%20on%20increasing%20volume.): the number of shares traded that day.
  • adjusted stock price: While the closing price simply refers to the cost of shares at the end of the day, the adjusted closing price takes dividends, stock splits, and new stock offerings into account.
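
To make the difference between close and adjusted concrete, here is a small worked example with made-up numbers. It assumes a hypothetical 2-for-1 split that takes effect on the fourth day and ignores dividends entirely, so it only illustrates the idea and is not how Yahoo Finance computes its adjusted column.

```r
# Hypothetical closing prices; a 2-for-1 split takes effect on day 4
close <- c(100, 102, 104, 52, 53)

# Pre-split prices are scaled by 1/2 so they are comparable to post-split prices
adj_factor <- c(0.5, 0.5, 0.5, 1, 1)
adjusted <- close * adj_factor

# Day-over-day returns on the raw and the adjusted series
raw_return <- close[-1] / close[-length(close)] - 1
adj_return <- adjusted[-1] / adjusted[-length(adjusted)] - 1

data.frame(day = 2:5, raw_return, adj_return)
```

On the split day the raw series shows a 50% drop, while the adjusted series shows no change, which is why adjusted prices are usually the safer input for return calculations.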


Apple

Jan. 1, 1980 is entered as the start date, but Apple went public on Dec. 12, 1980, so this is the earliest date pulled.

tq_get("AAPL", get = "stock.prices",
       from = "1980-01-01")
## # A tibble: 11,149 × 8
##    symbol date        open  high   low close    volume adjusted
##    <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
##  1 AAPL   1980-12-12 0.128 0.129 0.128 0.128 469033600   0.0987
##  2 AAPL   1980-12-15 0.122 0.122 0.122 0.122 175884800   0.0936
##  3 AAPL   1980-12-16 0.113 0.113 0.113 0.113 105728000   0.0867
##  4 AAPL   1980-12-17 0.116 0.116 0.116 0.116  86441600   0.0889
##  5 AAPL   1980-12-18 0.119 0.119 0.119 0.119  73449600   0.0914
##  6 AAPL   1980-12-19 0.126 0.127 0.126 0.126  48630400   0.0970
##  7 AAPL   1980-12-22 0.132 0.133 0.132 0.132  37363200   0.102 
##  8 AAPL   1980-12-23 0.138 0.138 0.138 0.138  46950400   0.106 
##  9 AAPL   1980-12-24 0.145 0.146 0.145 0.145  48003200   0.112 
## 10 AAPL   1980-12-26 0.158 0.159 0.158 0.158  55574400   0.122 
## # ℹ 11,139 more rows


Multiple stocks

stocks <- tq_get(c("NVDA", "AMZN", "META", "AAPL"),
                get = "stock.prices",
                from = "2024-01-01",
                to = "2025-03-07")

head(stocks)
## # A tibble: 6 × 8
##   symbol date        open  high   low close    volume adjusted
##   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
## 1 NVDA   2024-01-02  49.2  49.3  47.6  48.2 411254000     48.2
## 2 NVDA   2024-01-03  47.5  48.2  47.3  47.6 320896000     47.6
## 3 NVDA   2024-01-04  47.8  48.5  47.5  48.0 306535000     48.0
## 4 NVDA   2024-01-05  48.5  49.5  48.3  49.1 415039000     49.1
## 5 NVDA   2024-01-08  49.5  52.3  49.5  52.3 642510000     52.2
## 6 NVDA   2024-01-09  52.4  54.3  51.7  53.1 773100000     53.1

Plotting volume by symbol shows, for instance, that Nvidia has a much higher trading volume than the other tech stocks.

stocks %>%
  ggplot(aes(x=date, y=volume, color=symbol)) +
  geom_line() +
  theme_tq() +
  #facet_wrap(~symbol, scales="free_y", ncol=2) +
  scale_y_continuous(labels = scales::comma_format()) 


Other tq options

tq_exchange_options()
## [1] "AMEX"   "NASDAQ" "NYSE"
tq_fund_source_options()
## [1] "SSGA"
tq_index_options()
## [1] "DOW"       "DOWGLOBAL" "SP400"     "SP500"     "SP600"
nyse <- tq_exchange("NYSE")
## Getting data...
head(nyse)
## # A tibble: 6 × 7
##   symbol company            last.sale.price market.cap country ipo.year industry
##   <chr>  <chr>                        <dbl>      <dbl> <chr>      <int> <chr>   
## 1 A      "Agilent Technolo…           122.     3.48e10 "Unite…     1999 "Biotec…
## 2 AA     "Alcoa Corporatio…            31.0    8.02e 9 "Unite…     2016 "Alumin…
## 3 AACT   "Ares Acquisition…            11.1    0       ""          2023 "Blank …
## 4 AAM    "AA Mission Acqui…            10.2    0       ""          2024 ""      
## 5 AAMI   "Acadian Asset Ma…            23.6    8.87e 8 "Unite…     2014 "Invest…
## 6 AAP    "Advance Auto Par…            36.6    2.19e 9 "Unite…       NA "Auto &…


Mutating & Charts

tq_mutate adds columns to the existing dataframe. tq_transmute works exactly like tq_mutate except it only returns the newly created columns. This is helpful when changing the periodicity in the data, such as from daily to quarterly returns, where the new columns would not have the same number of rows.
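
The difference is easiest to see by comparing row counts. The sketch below uses FANG, the sample data set of daily prices that ships with tidyquant, so it runs without a network connection. The column name sma_20 is chosen here for illustration.

```r
library(tidyquant)
library(dplyr)

data("FANG")  # daily prices for four stocks, bundled with tidyquant

# tq_mutate keeps every original row and column and appends a 20-day moving average
with_sma <- FANG %>%
  group_by(symbol) %>%
  tq_mutate(select = close, mutate_fun = SMA, n = 20, col_rename = "sma_20")

# tq_transmute returns only the new column (plus symbol and date),
# because quarterly returns have far fewer rows than the daily input
quarterly <- FANG %>%
  group_by(symbol) %>%
  tq_transmute(select = close, mutate_fun = periodReturn,
               period = "quarterly", type = "arithmetic")

c(original = nrow(FANG), mutate = nrow(with_sma), transmute = nrow(quarterly))
```

The mutate result has the same number of rows as the input, while the transmute result has one row per symbol per quarter.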

tq_mutate_fun_options()
## $zoo
##  [1] "rollapply"          "rollapplyr"         "rollmax"           
##  [4] "rollmax.default"    "rollmaxr"           "rollmean"          
##  [7] "rollmean.default"   "rollmeanr"          "rollmedian"        
## [10] "rollmedian.default" "rollmedianr"        "rollsum"           
## [13] "rollsum.default"    "rollsumr"          
## 
## $xts
##  [1] "apply.daily"     "apply.monthly"   "apply.quarterly" "apply.weekly"   
##  [5] "apply.yearly"    "diff.xts"        "lag.xts"         "period.apply"   
##  [9] "period.max"      "period.min"      "period.prod"     "period.sum"     
## [13] "periodicity"     "to.daily"        "to.hourly"       "to.minutes"     
## [17] "to.minutes10"    "to.minutes15"    "to.minutes3"     "to.minutes30"   
## [21] "to.minutes5"     "to.monthly"      "to.period"       "to.quarterly"   
## [25] "to.weekly"       "to.yearly"       "to_period"      
## 
## $quantmod
##  [1] "allReturns"      "annualReturn"    "ClCl"            "dailyReturn"    
##  [5] "Delt"            "HiCl"            "Lag"             "LoCl"           
##  [9] "LoHi"            "monthlyReturn"   "Next"            "OpCl"           
## [13] "OpHi"            "OpLo"            "OpOp"            "periodReturn"   
## [17] "quarterlyReturn" "seriesAccel"     "seriesDecel"     "seriesDecr"     
## [21] "seriesHi"        "seriesIncr"      "seriesLo"        "weeklyReturn"   
## [25] "yearlyReturn"   
## 
## $TTR
##  [1] "adjRatios"          "ADX"                "ALMA"              
##  [4] "aroon"              "ATR"                "BBands"            
##  [7] "CCI"                "chaikinAD"          "chaikinVolatility" 
## [10] "CLV"                "CMF"                "CMO"               
## [13] "CTI"                "DEMA"               "DonchianChannel"   
## [16] "DPO"                "DVI"                "EMA"               
## [19] "EMV"                "EVWMA"              "GMMA"              
## [22] "growth"             "HMA"                "keltnerChannels"   
## [25] "KST"                "lags"               "MACD"              
## [28] "MFI"                "momentum"           "OBV"               
## [31] "PBands"             "ROC"                "rollSFM"           
## [34] "RSI"                "runCor"             "runCov"            
## [37] "runMAD"             "runMax"             "runMean"           
## [40] "runMedian"          "runMin"             "runPercentRank"    
## [43] "runSD"              "runSum"             "runVar"            
## [46] "SAR"                "SMA"                "SMI"               
## [49] "SNR"                "stoch"              "TDI"               
## [52] "TRIX"               "ultimateOscillator" "VHF"               
## [55] "VMA"                "volatility"         "VWAP"              
## [58] "VWMA"               "wilderSum"          "williamsAD"        
## [61] "WMA"                "WPR"                "ZigZag"            
## [64] "ZLEMA"             
## 
## $PerformanceAnalytics
## [1] "Return.annualized"        "Return.annualized.excess"
## [3] "Return.clean"             "Return.cumulative"       
## [5] "Return.excess"            "Return.Geltner"          
## [7] "zerofill"


Comparing Returns

tq_get(c("NVDA", "META", "AAPL", "MSFT"),
       get = "stock.prices",
       from = "2024-01-01") %>%
  group_by(symbol) %>%
  tq_transmute(select = close, 
               mutate_fun = periodReturn,
               period = "quarterly",
               type = "arithmetic") %>%
  ggplot(aes(x=date, y=quarterly.returns, fill=symbol)) +
  geom_col(position = "dodge") +
  # facet_wrap(~ symbol, ncol = 2) +
  theme_tq() + 
  scale_fill_tq() +
  scale_y_continuous(labels = scales::percent)
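
The type argument above selects arithmetic returns; periodReturn also accepts type = "log". The toy example below, using made-up prices, sketches the difference in base R.

```r
# Hypothetical prices at the end of three consecutive periods
prices <- c(100, 110, 99)

prev <- prices[-length(prices)]
curr <- prices[-1]

# Arithmetic return: percentage change from one period to the next
arithmetic <- curr / prev - 1

# Log return: natural log of the price ratio
log_ret <- log(curr / prev)

arithmetic
log_ret

# Log returns add up across periods: their sum equals the log of total growth
sum(log_ret)
log(prices[3] / prices[1])
```

Arithmetic returns are easier to read as percentages, which suits a chart like the one above; log returns are convenient when returns need to be summed over time.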


Bar Chart

geom_barchart() draws an OHLC bar chart: each bar shows the day's open, high, low, and closing prices and is color-coded by whether the stock closed up (blue) or down (red).

Source: HowToTrade.com

tidyquant also provides geom_candlestick() for candlestick charts, geom_ma() for moving averages, and geom_bbands() for the more complex Bollinger Bands.

tq_get("NVDA", 
       get = "stock.prices",
       from = "2025-01-01",
       to = "2025-03-07") %>%
  ggplot(aes(x = date, y = close)) +
  geom_barchart(aes(open = open, high = high, low = low, close = close), na.rm = TRUE) +
  labs(title = "Nvidia Daily Stock Prices", y = "Closing Price", x = "") +
  theme_tq() +
  # geom_ma() + # moving average
  scale_y_continuous(labels = scales::dollar_format())
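
The candlestick and moving-average geoms mentioned above can be combined in the same way. This is a minimal sketch using the FANG sample data bundled with tidyquant rather than a live download; the symbol, date window, and 20-day window are arbitrary choices for illustration.

```r
library(tidyquant)
library(dplyr)
library(ggplot2)

data("FANG")  # sample daily prices bundled with tidyquant

# Keep the last few months of one symbol so individual candles stay readable
amzn <- FANG %>%
  dplyr::filter(symbol == "AMZN", date >= as.Date("2016-09-01"))

p <- ggplot(amzn, aes(x = date, y = close)) +
  geom_candlestick(aes(open = open, high = high, low = low, close = close)) +
  geom_ma(ma_fun = SMA, n = 20, color = "darkgreen") +  # 20-day simple moving average
  labs(title = "AMZN Candlesticks with 20-Day Moving Average",
       y = "Closing Price", x = "") +
  theme_tq() +
  scale_y_continuous(labels = scales::dollar_format())

p
```

Because the moving average is computed only on the rows passed to ggplot, the line starts 20 trading days into the window.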


Correlation Matrix

The correlation matrix uses the DataExplorer package.

cor <- tq_get(c("META", "NVDA", # tech
                "GLD", "PPLT", # gold and platinum
                "AAL", # American Airlines
                "CVS", # healthcare
                "XOM", "DBO"), # Exxon and an oil fund
       get = "stock.prices",
       from = "2020-01-01") %>%
  group_by(symbol) %>%
  tq_transmute(select = close, 
               mutate_fun = periodReturn,
               period = "monthly",
               type = "arithmetic") %>%
  select(date, symbol, return = monthly.returns) %>%
  spread(symbol, return)

cor
## # A tibble: 63 × 9
##    date           AAL      CVS     DBO      GLD     META    NVDA    PPLT     XOM
##    <date>       <dbl>    <dbl>   <dbl>    <dbl>    <dbl>   <dbl>   <dbl>   <dbl>
##  1 2020-01-31 -0.0773 -0.0855  -0.151   0.0374  -0.0375  -0.0145 -0.0233 -0.124 
##  2 2020-02-28 -0.290  -0.127   -0.106  -0.00636 -0.0468   0.142  -0.100  -0.172 
##  3 2020-03-31 -0.360   0.00253 -0.245  -0.00222 -0.133   -0.0240 -0.163  -0.262 
##  4 2020-04-30 -0.0148  0.0374  -0.0962  0.0726   0.227    0.109   0.0901  0.224 
##  5 2020-05-29 -0.126   0.0653   0.173   0.0259   0.0996   0.215   0.0681 -0.0215
##  6 2020-06-30  0.245  -0.00915  0.0754  0.0274   0.00880  0.0701 -0.0113 -0.0165
##  7 2020-07-31 -0.149  -0.0312   0.0486  0.108    0.117    0.118   0.0907 -0.0590
##  8 2020-08-31  0.174  -0.0130   0.0491 -0.00324  0.156    0.260   0.0308 -0.0509
##  9 2020-09-30 -0.0582 -0.0599  -0.0650 -0.0417  -0.107    0.0117 -0.0444 -0.140 
## 10 2020-10-30 -0.0822 -0.0396  -0.107  -0.00519  0.00462 -0.0736 -0.0521 -0.0498
## # ℹ 53 more rows
plot_correlation(na.omit(cor), maxcat = 5L)
## Warning in dummify(data, maxcat = maxcat): Ignored all discrete features since
## `maxcat` set to 5 categories!
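
To inspect the numbers behind a plot like this, the same correlations can be computed directly with base R's cor() function. The sketch below uses the FANG sample data bundled with tidyquant instead of the symbols above, and pivot_wider() in place of spread(), so it runs offline; the object names are chosen for illustration.

```r
library(tidyquant)
library(dplyr)
library(tidyr)

data("FANG")  # sample daily prices bundled with tidyquant

# Monthly returns per symbol, reshaped to one column per symbol
returns_wide <- FANG %>%
  group_by(symbol) %>%
  tq_transmute(select = close, mutate_fun = periodReturn,
               period = "monthly", type = "arithmetic") %>%
  ungroup() %>%
  pivot_wider(names_from = symbol, values_from = monthly.returns)

# Drop the date column and compute pairwise Pearson correlations
cor_matrix <- returns_wide %>%
  dplyr::select(-date) %>%
  stats::cor(use = "complete.obs")

round(cor_matrix, 2)
```

Dropping the date column before computing correlations also avoids the discrete-feature warning shown above.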