R Markdown

Description: This meetup is for anyone interested in learning and sharing knowledge about scraping data from Yahoo Finance using R. Yahoo Finance provides a wealth of financial data that can be used for research, analysis, and investment purposes. In this meetup, we will discuss the basics of web scraping, explore the structure of Yahoo Finance pages, walk through the process of scraping data from Yahoo Finance, and analyse that data using R and libraries such as ggplot2, quantmod, and forecast.

Agenda:

Introduction to web scraping and its applications

Overview of R libraries such as ggplot2, quantmod, and forecast

Live coding session on scraping data from Yahoo Finance using R and its libraries

Tips and tricks for efficient web scraping and handling common issues

Very basic time series analysis

Discussion and Q&A session

Who should attend?

Anyone who is interested in learning about web scraping and its application to financial data, from beginners to experienced data analysts and investors. This meetup is open to all skill levels.

Requirements: Participants need a laptop to follow along with the online event. Basic knowledge of R programming is recommended but not required. Internet access is needed to reach Yahoo Finance pages during the live coding session.

Intro to Quantmod

Quantmod is an R package that provides a suite of tools for quantitative financial modeling and analysis. It enables users to access and manipulate financial data from various sources, including Yahoo Finance. In this tutorial, we will walk through the steps of using quantmod to retrieve and analyze Yahoo Finance data.

To start using quantmod and the other free libraries, we first need to load the packages into R. Attaching them prints the following startup messages:

## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## Loading required package: xts
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## ######################### Warning from 'xts' package ##########################
## #                                                                             #
## # The dplyr lag() function breaks how base R's lag() function is supposed to  #
## # work, which breaks lag(my_xts). Calls to lag(my_xts) that you type or       #
## # source() into this session won't work correctly.                            #
## #                                                                             #
## # Use stats::lag() to make sure you're not using dplyr::lag(), or you can add #
## # conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop           #
## # dplyr from breaking base R's lag() function.                                #
## #                                                                             #
## # Code in packages is not affected. It's protected by R's namespace mechanism #
## # Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning.  #
## #                                                                             #
## ###############################################################################
## 
## Attaching package: 'xts'
## The following objects are masked from 'package:dplyr':
## 
##     first, last
## Loading required package: TTR
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
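
The chunk that produced these messages is not shown in the rendered output. Judging from the order of the "Attaching package" lines, it most likely attached dplyr, quantmod (which pulls in xts, zoo, and TTR), and lubridate; that package list is an assumption. A minimal sketch that attaches whichever of these packages are installed and reports the rest:

```r
# Packages inferred from the startup messages above (an assumption, not the
# author's original chunk).
pkgs <- c("dplyr", "quantmod", "lubridate")

# Check which packages are installed without attaching them.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

if (any(!installed)) {
  message("Install first: install.packages(c(",
          paste0('"', pkgs[!installed], '"', collapse = ", "), "))")
}

# Attach the installed packages, suppressing their startup messages.
for (p in pkgs[installed]) {
  suppressPackageStartupMessages(library(p, character.only = TRUE))
}
```

As the xts warning above points out, once dplyr is attached it is safer to call stats::lag() explicitly when lagging xts objects.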

Retrieving Yahoo Finance Data

The first step in using quantmod to retrieve Yahoo Finance data is to specify the ticker symbol for each stock you want to analyze. For example, if you want to retrieve data for Alphabet (Google) and NVIDIA, the ticker symbols are GOOG and NVDA, respectively.

Once you have the ticker symbol, you can use the getSymbols() function to retrieve the data. This function downloads data from various sources, including Yahoo Finance. By default it creates an xts object named after the ticker in your workspace and returns only the ticker name; with auto.assign = FALSE it returns the data itself.

To retrieve data for GOOG and NVDA and convert each result to a data frame with a date column, run the following commands:

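# Download daily GOOG prices from Yahoo Finance. getSymbols() creates an
# xts object named GOOG in the workspace and prints the symbol name.
getSymbols('GOOG', src = 'yahoo',
           from = "2010-01-01", to = Sys.Date())
## [1] "GOOG"
# Convert to a data frame and copy the trading dates (the row names)
# into a proper Date column.
Stock1 <- data.frame(
  GOOG,
  date = as.Date(rownames(data.frame(GOOG)))
)

# Same steps for NVDA.
getSymbols('NVDA', src = 'yahoo',
           from = "2010-01-01", to = Sys.Date())
## [1] "NVDA"
Stock2 <- data.frame(
  NVDA,
  date = as.Date(rownames(data.frame(NVDA)))
)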
head(Stock2)
##            NVDA.Open NVDA.High NVDA.Low NVDA.Close NVDA.Volume NVDA.Adjusted
## 2010-01-04    4.6275    4.6550   4.5275     4.6225    80020400      4.240428
## 2010-01-05    4.6050    4.7400   4.6050     4.6900    72864800      4.302351
## 2010-01-06    4.6875    4.7300   4.6425     4.7200    64916800      4.329869
## 2010-01-07    4.6950    4.7150   4.5925     4.6275    54779200      4.245015
## 2010-01-08    4.5900    4.6700   4.5625     4.6375    47816800      4.254188
## 2010-01-11    4.6625    4.6825   4.5075     4.5725    55661200      4.194561
##                  date
## 2010-01-04 2010-01-04
## 2010-01-05 2010-01-05
## 2010-01-06 2010-01-06
## 2010-01-07 2010-01-07
## 2010-01-08 2010-01-08
## 2010-01-11 2010-01-11

This will download the daily historical data for these two stocks.

Note that we specified the start date using the from argument and the end date using the to argument. We set the end date to Sys.Date(), which retrieves data up to the current date.
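
The download and conversion steps can also be wrapped in two small functions. The sketch below is only one way to do it: fetch_prices() and to_price_frame() are hypothetical helper names, not part of quantmod, and the demonstration uses two NVDA rows copied from the output above so that it runs without internet access.

```r
# Hypothetical helper: download one symbol for a date window and return the
# data instead of assigning it into the workspace (auto.assign = FALSE).
fetch_prices <- function(symbol, from = "2010-01-01", to = Sys.Date()) {
  if (!requireNamespace("quantmod", quietly = TRUE)) {
    stop("Package 'quantmod' is required: install.packages('quantmod')")
  }
  quantmod::getSymbols(symbol, src = "yahoo", from = from, to = to,
                       auto.assign = FALSE)
}

# Hypothetical helper: turn a price table whose row names are dates
# (an xts object or a plain matrix) into a data frame with a `date` column.
to_price_frame <- function(x) {
  df <- data.frame(x)
  df$date <- as.Date(rownames(df))
  rownames(df) <- NULL
  df
}

# Offline demonstration with two NVDA rows copied from the output above.
demo <- matrix(c(4.6225, 4.6900, 4.240428, 4.302351), nrow = 2,
               dimnames = list(c("2010-01-04", "2010-01-05"),
                               c("NVDA.Close", "NVDA.Adjusted")))
demo_df <- to_price_frame(demo)
print(demo_df)

# With internet access (not run here):
# Stock2 <- to_price_frame(fetch_prices("NVDA", from = "2020-01-01"))
```

Returning the data rather than relying on the default workspace assignment makes it easier to loop over several tickers.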

Exploring the Data

Once you have retrieved the data, you can use various functions to explore and manipulate it. Here are a few examples:

Summary Statistics

To get a summary of the data, run the summary() function. To get the first six rows of the data, run the head() function.
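
As a small illustration, the sketch below applies both functions to a data frame built from six NVDA rows copied from the head(Stock2) output earlier in this tutorial. The object name stock is chosen for this example only; with the downloaded data you would pass Stock1 or Stock2 instead.

```r
# Six NVDA rows copied from the head(Stock2) output shown earlier.
stock <- data.frame(
  date = as.Date(c("2010-01-04", "2010-01-05", "2010-01-06",
                   "2010-01-07", "2010-01-08", "2010-01-11")),
  NVDA.Close = c(4.6225, 4.6900, 4.7200, 4.6275, 4.6375, 4.5725),
  NVDA.Volume = c(80020400, 72864800, 64916800,
                  54779200, 47816800, 55661200)
)

# Minimum, quartiles, median, mean, and maximum for every column.
print(summary(stock))

# First rows; head() shows six by default, and n changes that.
first_rows <- head(stock, n = 3)
print(first_rows)

# A single statistic for one column.
close_range <- range(stock$NVDA.Close)
```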

Now let’s dive deeper!

##        Stock1   Stock2
## [1,] 15.61024 4.240428
## [2,] 15.54150 4.302351
## [3,] 15.14972 4.329869
## [4,] 14.79704 4.245015
## [5,] 14.99430 4.254188
## [6,] 14.97163 4.194561
## [1] 3.681289 3.612327 3.498885 3.485744 3.524597 3.569296
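
The chunk behind this output is not shown. The numbers are consistent with binding the two adjusted-close columns side by side and then dividing the GOOG series by the NVDA series, so the sketch below reproduces that under this assumption, using the six values printed above rather than a fresh download.

```r
# Adjusted closes copied from the output above (first six trading days of 2010).
goog_adj <- c(15.61024, 15.54150, 15.14972, 14.79704, 14.99430, 14.97163)
nvda_adj <- c(4.240428, 4.302351, 4.329869, 4.245015, 4.254188, 4.194561)

# Bind the two series into one matrix with the column names seen in the output.
prices <- cbind(Stock1 = goog_adj, Stock2 = nvda_adj)
print(head(prices))

# Price ratio GOOG / NVDA for each day.
ratio <- prices[, "Stock1"] / prices[, "Stock2"]
print(ratio)
```

With the full data frames, the same idea would read Stock1$GOOG.Adjusted / Stock2$NVDA.Adjusted, provided both frames cover the same trading days.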

Now let’s run an event analysis and a time series analysis (under development)

## [1] "GOOG"
## [1] "SPY"
##            SPY.Open SPY.High SPY.Low SPY.Close SPY.Volume SPY.Adjusted
## 2010-01-04   112.37   113.39  111.51    113.33  118944600     87.12997
## 2010-01-05   113.26   113.68  112.85    113.63  111579900     87.36063
## 2010-01-06   113.52   113.99  113.43    113.71  116074400     87.42211
## 2010-01-07   113.50   114.33  113.18    114.19  131091100     87.79115
## 2010-01-08   113.89   114.62  113.66    114.57  126402800     88.08327
## 2010-01-11   115.08   115.13  114.24    114.73  106375700     88.20628
##                  date
## 2010-01-04 2010-01-04
## 2010-01-05 2010-01-05
## 2010-01-06 2010-01-06
## 2010-01-07 2010-01-07
## 2010-01-08 2010-01-08
## 2010-01-11 2010-01-11
##   GOOG.Open GOOG.High GOOG.Low GOOG.Close GOOG.Volume GOOG.Adjusted       date
## 1  15.61522  15.67898 15.54772   15.61024    78541293      15.61024 2010-01-04
## 2  15.62095  15.63739 15.48048   15.54150   120638494      15.54150 2010-01-05
## 3  15.58807  15.58807 15.10239   15.14972   159744526      15.14972 2010-01-06
## 4  15.17811  15.19305 14.76092   14.79704   257533695      14.79704 2010-01-07
## 5  14.74473  15.02493 14.67275   14.99430   189680313      14.99430 2010-01-08
## 6  15.05507  15.05507 14.79554   14.97163   289597429      14.97163 2010-01-11
##   SPY.Open SPY.High SPY.Low SPY.Close SPY.Volume SPY.Adjusted
## 1   112.37   113.39  111.51    113.33  118944600     87.12997
## 2   113.26   113.68  112.85    113.63  111579900     87.36063
## 3   113.52   113.99  113.43    113.71  116074400     87.42211
## 4   113.50   114.33  113.18    114.19  131091100     87.79115
## 5   113.89   114.62  113.66    114.57  126402800     88.08327
## 6   115.08   115.13  114.24    114.73  106375700     88.20628
## 
## Call:
## lm(formula = GOOG.Adjusted ~ SPY.Adjusted, data = time_series)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -19.6519  -4.8145  -0.4098   4.0659  22.1487 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -25.555777   0.269429  -94.85   <2e-16 ***
## SPY.Adjusted   0.344626   0.001036  332.58   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.106 on 3551 degrees of freedom
## Multiple R-squared:  0.9689, Adjusted R-squared:  0.9689 
## F-statistic: 1.106e+05 on 1 and 3551 DF,  p-value: < 2.2e-16
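
The code for this section is also missing from the rendered output. The Call line shows that GOOG.Adjusted was regressed on SPY.Adjusted in a data frame named time_series; how that frame was built is an assumption here. A minimal sketch that joins the two series on date and fits the same model, using only the six rows printed above (so the coefficients will differ from the full-sample results):

```r
dates <- as.Date(c("2010-01-04", "2010-01-05", "2010-01-06",
                   "2010-01-07", "2010-01-08", "2010-01-11"))

# Adjusted closes copied from the output above.
goog <- data.frame(
  date = dates,
  GOOG.Adjusted = c(15.61024, 15.54150, 15.14972, 14.79704, 14.99430, 14.97163)
)
spy <- data.frame(
  date = dates,
  SPY.Adjusted = c(87.12997, 87.36063, 87.42211, 87.79115, 88.08327, 88.20628)
)

# Join on the trading date so only days present in both series are kept.
time_series <- merge(goog, spy, by = "date")

# Same formula as in the Call line above.
fit <- lm(GOOG.Adjusted ~ SPY.Adjusted, data = time_series)
print(coef(fit))
```

Both series trend upward over 2010 to the present, and regressing one trending price level on another tends to produce a very high R-squared regardless of any real relationship. Event studies usually work with returns instead, which is a natural next step for this section.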

References:

Intro to the quantmod package. https://www.quantmod.com/

Using R for Time Series Analysis. https://a-little-book-of-r-for-time-series.readthedocs.io/en/latest/src/timeseries.html