Title: S&P 500 Research Project
## Goal Questions
### Question 1: What are the best and worst daily returns in the sample? These dates are a starting point for looking at what was going on in the world at the time.
### Question 2: How often does the market experience a 5%-20% drawdown over a short period of time, and what causes it?
### Question 3: When in history has volatility been high, and what caused it?
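# Data Selection
## Yahoo Finance data for the S&P 500 index (^GSPC), from 2000-01-01 to the current date
# install.packages("tidyquant")  # run once if tidyquant is not installed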
library(tidyquant)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## Warning: package 'zoo' was built under R version 4.4.3
## ── Attaching core tidyquant packages ─────────────────────── tidyquant 1.0.11 ──
## ✔ PerformanceAnalytics 2.0.8 ✔ TTR 0.24.4
## ✔ quantmod 0.4.28 ✔ xts 0.14.1
## ── Conflicts ────────────────────────────────────────── tidyquant_conflicts() ──
## ✖ zoo::as.Date() masks base::as.Date()
## ✖ zoo::as.Date.numeric() masks base::as.Date.numeric()
## ✖ PerformanceAnalytics::legend() masks graphics::legend()
## ✖ quantmod::summary() masks base::summary()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
##
## ######################### Warning from 'xts' package ##########################
## # #
## # The dplyr lag() function breaks how base R's lag() function is supposed to #
## # work, which breaks lag(my_xts). Calls to lag(my_xts) that you type or #
## # source() into this session won't work correctly. #
## # #
## # Use stats::lag() to make sure you're not using dplyr::lag(), or you can add #
## # conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop #
## # dplyr from breaking base R's lag() function. #
## # #
## # Code in packages is not affected. It's protected by R's namespace mechanism #
## # Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning. #
## # #
## ###############################################################################
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:xts':
##
## first, last
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
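The xts warning above is about name masking: once dplyr is attached, a bare `lag()` call resolves to `dplyr::lag()`, which shifts the values of a vector, while `stats::lag()` shifts the time index of a time series. The sketch below illustrates the difference with made-up numbers rather than the S&P 500 data (the `prices` vector and variable names are hypothetical). Calling each function with its namespace prefix is one way to keep the choice explicit.

```r
library(dplyr)

# Hypothetical prices, for illustration only (not taken from the data set).
prices <- c(100, 102, 101)

# dplyr::lag() shifts the values down one position and pads with NA.
prev_price <- dplyr::lag(prices)

# stats::lag() leaves the values alone and shifts the time index of a ts object.
prices_ts  <- ts(prices, start = 1)
shifted_ts <- stats::lag(prices_ts, k = 1)

prev_price              # NA 100 102
as.numeric(shifted_ts)  # 100 102 101 (values unchanged)
start(shifted_ts)       # the time index now starts one period earlier
```

For a tibble such as `spx`, `dplyr::lag()` inside `mutate()` is likely the variant wanted when comparing a row with the previous trading day.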
spx <- tq_get("^GSPC", from = "2000-01-01", to = Sys.Date())
head(spx)
## # A tibble: 6 × 8
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ^GSPC 2000-01-03 1469. 1478 1438. 1455. 931800000 1455.
## 2 ^GSPC 2000-01-04 1455. 1455. 1397. 1399. 1009000000 1399.
## 3 ^GSPC 2000-01-05 1399. 1413. 1378. 1402. 1085500000 1402.
## 4 ^GSPC 2000-01-06 1402. 1412. 1392. 1403. 1092300000 1403.
## 5 ^GSPC 2000-01-07 1403. 1441. 1401. 1441. 1225200000 1441.
## 6 ^GSPC 2000-01-10 1441. 1464. 1441. 1458. 1064800000 1458.
# Clean Data
## Checking for missing values (NAs) and sorting by date
colSums(is.na(spx))
## symbol date open high low close volume adjusted
## 0 0 0 0 0 0 0 0
spx_arranged <- spx %>%
  arrange(date)
spx_arranged
## # A tibble: 6,578 × 8
## symbol date open high low close volume adjusted
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ^GSPC 2000-01-03 1469. 1478 1438. 1455. 931800000 1455.
## 2 ^GSPC 2000-01-04 1455. 1455. 1397. 1399. 1009000000 1399.
## 3 ^GSPC 2000-01-05 1399. 1413. 1378. 1402. 1085500000 1402.
## 4 ^GSPC 2000-01-06 1402. 1412. 1392. 1403. 1092300000 1403.
## 5 ^GSPC 2000-01-07 1403. 1441. 1401. 1441. 1225200000 1441.
## 6 ^GSPC 2000-01-10 1441. 1464. 1441. 1458. 1064800000 1458.
## 7 ^GSPC 2000-01-11 1458. 1459. 1434. 1439. 1014000000 1439.
## 8 ^GSPC 2000-01-12 1439. 1443. 1427. 1432. 974600000 1432.
## 9 ^GSPC 2000-01-13 1432. 1454. 1432. 1450. 1030400000 1450.
## 10 ^GSPC 2000-01-14 1450. 1473 1450. 1465. 1085900000 1465.
## # ℹ 6,568 more rows
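The cleaning step above relies on reading the printed output by eye. As a rough sketch, the same checks can be written as logical tests so they fail loudly if the data changes. The example uses a small made-up tibble (`toy`, a hypothetical stand-in for `spx`) so that it runs without downloading anything.

```r
library(dplyr)

# Hypothetical toy data standing in for the downloaded `spx` tibble,
# deliberately out of date order.
toy <- tibble(
  date     = as.Date(c("2000-01-05", "2000-01-03", "2000-01-04")),
  adjusted = c(1402, 1455, 1399)
)

toy_arranged <- toy %>%
  arrange(date)

# TRUE when no column contains a missing value.
no_missing <- all(colSums(is.na(toy_arranged)) == 0)

# TRUE when dates are strictly increasing: sorted, with no repeated days.
dates_ok <- all(diff(toy_arranged$date) > 0)

no_missing
dates_ok
```

Replacing `toy` with `spx` and wrapping both flags in `stopifnot()` would apply the same checks to the real data, assuming the column names shown in the output above.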