Overview

I will build a workflow around VOO (Vanguard S&P 500 ETF) historical market data from Yahoo Finance. The goal is to (1) acquire the dataset from an accessible source, (2) load it into R in a clean, well-documented way, and (3) perform a small set of transformations that make the data analysis-ready for later weeks.

Related article (anchor for motivation and context):

Vanguard triumphs over State Street to take largest ETF crown (Financial Times, https://www.ft.com/content/641e9fd7-c989-4831-917b-23b3250be7db)
This article discusses VOO’s scale and why low-cost index ETFs have become dominant.

Dataset and source

Dataset: VOO (Vanguard S&P 500 ETF) historical daily price series
Source: Yahoo Finance VOO historical data: https://finance.yahoo.com/quote/VOO/history/

What the dataset contains

The VOO historical data is a daily time series with the following columns: Date, Open, High, Low, Close, Adj.Close, Volume. “Close” is adjusted for splits only, while “Adj.Close” is adjusted for both splits and dividends, which makes it the better choice for return calculations.
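
To make the column layout and the role of Adj.Close concrete, here is a minimal R sketch. It rests on several assumptions: the data arrives as a CSV whose header spells the column "Adj Close" (which R's read.csv() renames to Adj.Close by default), the prices shown are invented toy values rather than real VOO quotes, and the names csv_text, voo, ret_simple, and ret_log are hypothetical choices made for this illustration.

```r
# Toy CSV in the column layout described above.
# The numbers are made up for illustration; they are NOT real VOO prices.
csv_text <- "Date,Open,High,Low,Close,Adj Close,Volume
2024-01-02,100,102,99,101,100.0,1000
2024-01-03,101,103,100,102,101.0,1200
2024-01-04,102,104,101,103,103.02,1100"

# read.csv() runs headers through make.names() by default (check.names = TRUE),
# so the header "Adj Close" becomes the column name "Adj.Close".
voo <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Parse dates and make sure rows are in chronological order before differencing.
voo$Date <- as.Date(voo$Date)
voo <- voo[order(voo$Date), ]

# Daily returns from the dividend- and split-adjusted series.
# Simple return: P_t / P_{t-1} - 1; log return: log(P_t) - log(P_{t-1}).
n <- nrow(voo)
voo$ret_simple <- c(NA, voo$Adj.Close[-1] / voo$Adj.Close[-n] - 1)
voo$ret_log    <- c(NA, diff(log(voo$Adj.Close)))

print(voo[, c("Date", "Adj.Close", "ret_simple", "ret_log")])
```

The first row has no prior day, so its return is NA. Computing returns from Close instead would leave out dividend distributions, so the resulting returns would understate total return.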

VOO itself is an index ETF designed to track the performance of the S&P 500 Index.

Motivation for selecting this dataset

Planned approach (how I will tackle the problem)

Anticipated data challenges

Conclusion / next steps

After producing a clean, transformed daily dataset for VOO, my next step will be to extend the analysis in later weeks by: