This is your first R project.
Start RStudio (R runs whenever RStudio runs, because RStudio is a “shell” or “interface” for R). After you install and run the Swirl program and complete its first module, install the package quantmod, for example with install.packages("quantmod"). This package lets R pull data directly from the FRED databank. Once it is installed, type the following:
library(quantmod) # Package to pull data from FRED into your R session
## Loading required package: xts
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: TTR
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## Version 0.4-0 included new data defaults. See ?getSymbols.
getSymbols('GDPC1',src='FRED') # Pulling the US Real GDP (quarterly)
## 'getSymbols' currently uses auto.assign=TRUE by default, but will
## use auto.assign=FALSE in 0.5-0. You will still be able to use
## 'loadSymbols' to automatically load data. getOption("getSymbols.env")
## and getOption("getSymbols.auto.assign") will still be checked for
## alternate defaults.
##
## This message is shown once per session and may be disabled by setting
## options("getSymbols.warning4.0"=FALSE). See ?getSymbols for details.
## [1] "GDPC1"
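The message above notes that getSymbols() currently assigns the series to an object named after the symbol (auto.assign=TRUE). If you would rather store the series under a name of your choosing, a minimal sketch is shown below. The helper name fetch_fred is hypothetical and not part of quantmod; only the getSymbols() arguments come from the message and the call above.

```r
# Sketch: return the downloaded series instead of auto-assigning it.
# fetch_fred() is a hypothetical wrapper, defined here for illustration only.
fetch_fred <- function(symbol) {
  # auto.assign = FALSE makes getSymbols() return the xts object directly
  quantmod::getSymbols(symbol, src = "FRED", auto.assign = FALSE)
}

# Usage (requires the quantmod package and an internet connection):
# gdp <- fetch_fred("GDPC1")
# str(gdp)
```

With this form the object is called gdp rather than GDPC1, which can make later code easier to read.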
str(GDPC1) # To see the structure of your data set
## An 'xts' object on 1947-01-01/2020-01-01 containing:
## Data: num [1:293, 1] 2033 2028 2023 2055 2086 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr "GDPC1"
## Indexed by objects of class: [Date] TZ: UTC
## xts Attributes:
## List of 2
## $ src : chr "FRED"
## $ updated: POSIXct[1:1], format: "2020-05-18 21:54:50"
The output shows that your data set is an “xts” (time series) object with two main parts: (i) a Date index marking the first day of each quarter from 1/1/1947 to 1/1/2020, and (ii) a one-column numeric matrix holding the 293 quarterly US Real GDP values. The object also carries two attributes: the source (FRED) and a timestamp recording when the data were updated.
To see the first few and the last few rows of the data:
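To make the parts of an xts object concrete, here is a minimal sketch that builds a four-row object by hand from the first four GDPC1 values and then pulls its pieces apart. The name gdp_demo is hypothetical, and the src attribute is set manually to mimic what the str() output reports; nothing is downloaded.

```r
library(xts)  # loading xts also attaches zoo, which provides index() and coredata()

# First four quarterly values of GDPC1, typed in by hand
dates  <- as.Date(c("1947-01-01", "1947-04-01", "1947-07-01", "1947-10-01"))
values <- c(2033.061, 2027.639, 2023.452, 2055.103)

# An xts object is a data matrix plus a time index
gdp_demo <- xts(matrix(values, ncol = 1, dimnames = list(NULL, "GDPC1")),
                order.by = dates)
xtsAttributes(gdp_demo) <- list(src = "FRED")  # set by hand for illustration

index(gdp_demo)          # the dates
coredata(gdp_demo)       # the values, as a plain matrix
xtsAttributes(gdp_demo)  # extra attributes attached to the object
```

The same three functions can be applied to the downloaded GDPC1 object.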
head(GDPC1) # The first 6 values
## GDPC1
## 1947-01-01 2033.061
## 1947-04-01 2027.639
## 1947-07-01 2023.452
## 1947-10-01 2055.103
## 1948-01-01 2086.017
## 1948-04-01 2120.450
tail(GDPC1) # The last 6 values
## GDPC1
## 2018-10-01 18783.55
## 2019-01-01 18927.28
## 2019-04-01 19021.86
## 2019-07-01 19121.11
## 2019-10-01 19221.97
## 2020-01-01 18987.88
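Besides head() and tail(), an xts object can be sliced by date. The sketch below is illustrative: it rebuilds the six rows printed by tail(GDPC1) as a small object (gdp_tail is a hypothetical name) so that it runs without a download, then selects rows with date strings.

```r
library(xts)  # also attaches zoo

# The last six quarterly values shown above, typed in by hand
dates  <- as.Date(c("2018-10-01", "2019-01-01", "2019-04-01",
                    "2019-07-01", "2019-10-01", "2020-01-01"))
values <- c(18783.55, 18927.28, 19021.86, 19121.11, 19221.97, 18987.88)
gdp_tail <- xts(matrix(values, ncol = 1, dimnames = list(NULL, "GDPC1")),
                order.by = dates)

gdp_2019  <- gdp_tail["2019"]             # every row dated in 2019
gdp_range <- gdp_tail["2019-04/2019-10"]  # April 2019 through October 2019

print(gdp_2019)
print(tail(gdp_tail, 2))                  # tail() also accepts a row count
```

On the full series the same syntax applies, for example GDPC1["2019"].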
Now, plot the data:
plot(GDPC1, main="USA Real GDP (2012 $ billions)")
Now, let’s pull some financial data directly from an online source such as Yahoo! Finance, which is the source used in the run below. To do so, you need to install and load another R package: BatchGetSymbols. To transform the data and create beautiful plots, you also need to install and load tidyverse, a collection of packages that includes ggplot2. Both can be installed with install.packages(c("BatchGetSymbols", "tidyverse")). Once the packages are installed, type the following:
rm(list=ls()) # This clears all objects (including GDPC1) from your R environment
library(BatchGetSymbols)
## Loading required package: rvest
## Loading required package: xml2
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:xts':
##
## first, last
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
##
library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.0 ✓ purrr 0.3.4
## ✓ tibble 3.0.1 ✓ stringr 1.4.0
## ✓ tidyr 1.0.3 ✓ forcats 0.5.0
## ✓ readr 1.3.1
## ── Conflicts ───────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::first() masks xts::first()
## x readr::guess_encoding() masks rvest::guess_encoding()
## x dplyr::lag() masks stats::lag()
## x dplyr::last() masks xts::last()
## x purrr::pluck() masks rvest::pluck()
first.date <- Sys.Date() - 830 # 830 calendar days back: about 2.3 years of daily data
last.date <- Sys.Date()
freq.data <- 'daily'
# Get data for Procter & Gamble, Google (Alphabet), 3M, and the S&P 500 index
tickers <- c('PG', 'GOOG', 'MMM', '^GSPC')
# We put all the data in this object
l.out <- BatchGetSymbols(tickers = tickers,
                         first.date = first.date,
                         last.date = last.date,
                         freq.data = freq.data,
                         cache.folder = file.path(tempdir(),
                                                  'BGS_Cache') ) # cache in tempdir()
##
## Running BatchGetSymbols for:
##
## tickers =PG, GOOG, MMM, ^GSPC
## Downloading data for benchmark ticker
## ^GSPC | yahoo (1|1) | Not Cached | Saving cache
## PG | yahoo (1|4) | Not Cached | Saving cache - Got 100% of valid prices | OK!
## GOOG | yahoo (2|4) | Not Cached | Saving cache - Got 100% of valid prices | OK!
## MMM | yahoo (3|4) | Not Cached | Saving cache - Got 100% of valid prices | Well done!
## ^GSPC | yahoo (4|4) | Found cache file - Got 100% of valid prices | Good job!
str(l.out) # Structure of the data
## List of 2
## $ df.control: tibble [4 × 6] (S3: tbl_df/tbl/data.frame)
## ..$ ticker : chr [1:4] "PG" "GOOG" "MMM" "^GSPC"
## ..$ src : chr [1:4] "yahoo" "yahoo" "yahoo" "yahoo"
## ..$ download.status : chr [1:4] "OK" "OK" "OK" "OK"
## ..$ total.obs : int [1:4] 571 571 571 571
## ..$ perc.benchmark.dates: num [1:4] 1 1 1 1
## ..$ threshold.decision : chr [1:4] "KEEP" "KEEP" "KEEP" "KEEP"
## $ df.tickers:'data.frame': 2284 obs. of 10 variables:
## ..$ price.open : num [1:2284] 82 81 80.6 81.2 81.1 ...
## ..$ price.high : num [1:2284] 82.2 81 81.7 81.8 81.3 ...
## ..$ price.low : num [1:2284] 80.2 78.6 80.6 80.9 80.3 ...
## ..$ price.close : num [1:2284] 80.2 79.9 81.3 81.5 80.7 ...
## ..$ volume : num [1:2284] 14596100 18838800 10769000 9278700 9412000 ...
## ..$ price.adjusted : num [1:2284] 75 74.7 76 76.2 75.4 ...
## ..$ ref.date : Date[1:2284], format: "2018-02-08" "2018-02-09" ...
## ..$ ticker : chr [1:2284] "PG" "PG" "PG" "PG" ...
## ..$ ret.adjusted.prices: num [1:2284] NA -0.00374 0.01764 0.00209 -0.01006 ...
## ..$ ret.closing.prices : num [1:2284] NA -0.00374 0.01764 0.00209 -0.01006 ...
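The df.tickers table is in “long” format: all tickers are stacked in one data frame, and the first return of each series is NA because there is no earlier price to compare with. The sketch below reproduces that layout on made-up numbers using base R only. The tickers AAA and BBB, the prices, and the helper arith_return are all hypothetical, and the simple (arithmetic) return formula is an assumption that is consistent with the values printed above but is not stated in the text.

```r
# Hypothetical toy data laid out like l.out$df.tickers (long format)
prices <- data.frame(
  ticker = rep(c("AAA", "BBB"), each = 3),
  ref.date = rep(as.Date("2020-01-02") + 0:2, times = 2),
  price.adjusted = c(100, 102, 101, 50, 49, 50),
  stringsAsFactors = FALSE
)

# Simple return: p_t / p_(t-1) - 1, with NA for the first observation
arith_return <- function(p) c(NA, p[-1] / p[-length(p)] - 1)

# ave() applies the function separately within each ticker
prices$ret <- ave(prices$price.adjusted, prices$ticker, FUN = arith_return)
print(prices)

# Pull out the rows of a single ticker
aaa <- subset(prices, ticker == "AAA")
print(aaa)
```

Computing the returns within each ticker matters: without the grouping, the first BBB return would be calculated from the last AAA price.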
# Plots of closing prices (use y = price.adjusted for adjusted close prices)
p <- ggplot(l.out$df.tickers, aes(x = ref.date, y = price.close))
p <- p + geom_line()
p <- p + facet_wrap(~ticker, scales = 'free_y')
print(p)
# Plots of daily returns
p2 <- ggplot(l.out$df.tickers, aes(x = ref.date, y = ret.adjusted.prices))
p2 <- p2 + geom_line()
p2 <- p2 + facet_wrap(~ticker, scales = 'free_y')
print(p2)
## Warning: Removed 1 row(s) containing missing values (geom_path).
Nice plots! You’re done for now.