Proposal

For the Final Project, we intend to analyze and predict stock outcomes based on historical end-of-day trading data for the 30 companies in the Dow Jones Industrial Average. For each company, the data covers five years of trading history up to February 2019. Methodologically, we will explore techniques covered in this course, such as linear, binomial, and panel models, as well as techniques outside its scope, such as time series forecasting with autoregressive models. The goal is to predict stock price outcomes either as absolute returns or as positive/negative closings, depending on the selected model.
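As a rough illustration of the two prediction targets mentioned above, the sketch below builds a daily return and an up/down indicator from a closing-price series, then fits a linear model and a binomial (logistic) model on the lagged return. The prices are simulated, and the names `make_targets`, `ret`, `ret_lag1`, and `up` are hypothetical; this is one possible setup under those assumptions, not the project's final specification.

```r
# Simulated closing prices for one hypothetical ticker (not real data).
set.seed(621)
close <- 100 * cumprod(1 + rnorm(250, mean = 0.0005, sd = 0.01))

# Build the modeling frame: daily return, its one-day lag, and an up/down flag.
make_targets <- function(close) {
  n   <- length(close)
  ret <- c(NA, close[-1] / close[-n] - 1)    # return vs. prior close
  data.frame(
    ret      = ret,
    ret_lag1 = c(NA, ret[-n]),               # previous day's return as predictor
    up       = as.integer(ret > 0)           # 1 = positive close, 0 = otherwise
  )
}

# Drop the leading rows where the return or its lag is undefined.
model_df <- na.omit(make_targets(close))

# Linear model for the return itself (an AR(1)-style regression).
fit_lm <- lm(ret ~ ret_lag1, data = model_df)

# Binomial model for the probability of a positive close.
fit_glm <- glm(up ~ ret_lag1, data = model_df, family = binomial)

# Predicted probability of an up day following a 1% gain.
p_up <- predict(fit_glm, newdata = data.frame(ret_lag1 = 0.01), type = "response")
p_up
```

With the real data, the same frame would be built per ticker (for example by grouping on `ticker`) so that lags never cross from one company to another.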

Dataset

A quick summary of the dataset, hosted on Kaggle, shows 36,850 observations across the 30 publicly traded companies. The variables are summarized below:

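library(tidyverse)  # provides read_csv() and the %>% pipe
library(skimr)      # provides skim()
dataset <- read_csv('https://raw.githubusercontent.com/salma71/Data_621/master/Project_Proposal/stocks_combined.csv')
tickers <- read_csv('https://raw.githubusercontent.com/salma71/Data_621/master/Project_Proposal/tickers.csv')
skim(dataset)  # per-column summary statistics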
Data summary
Name dataset
Number of rows 36850
Number of columns 13
_______________________
Column type frequency:
character 3
numeric 10
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
date 0 1 8 10 0 1258 0
ticker 0 1 1 4 0 30 0
label 0 1 5 9 0 1258 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
open 0 1 94.52 52.68 18.17 56.83 84.09 118.30 417.14 ▇▅▁▁▁
high 0 1 95.25 53.14 18.41 57.40 84.67 119.14 421.84 ▇▅▁▁▁
low 0 1 93.76 52.19 18.11 56.19 83.54 117.39 417.11 ▇▅▁▁▁
close 0 1 94.53 52.68 18.18 56.77 84.09 118.32 421.55 ▇▅▁▁▁
volume 0 1 11196571.67 12461717.47 305358.00 3977137.50 7045060.00 13777850.50 618237630.00 ▇▁▁▁▁
unadjustedVolume 0 1 10967241.59 12152154.22 305358.00 3817578.50 6895243.00 13581565.00 618237630.00 ▇▁▁▁▁
change 0 1 0.05 1.50 -23.45 -0.43 0.04 0.56 65.29 ▁▇▁▁▁
changePercent 0 1 0.05 1.30 -14.34 -0.56 0.05 0.70 16.67 ▁▁▇▁▁
vwap 0 1 94.52 52.68 13.79 56.79 84.11 118.24 420.32 ▇▅▁▁▁
changeOverTime 0 1 0.38 0.50 -0.41 0.06 0.24 0.54 3.19 ▇▅▁▁▁

The variables are described as follows:

Variable          Description
date              Trading date
open              Price of the stock at market open
high              Highest price reached during the trading day
low               Lowest price reached during the trading day
close             Price of the stock at market close
volume            Number of shares traded
unadjustedVolume  Number of shares traded, unadjusted for stock splits
change            Change in closing price from the prior trading day's close
changePercent     Percentage change in closing price from the prior trading day's close
vwap              Volume-weighted average price (VWAP): the ratio of total value traded to total volume traded
label             Human-readable (formatted) version of the trading date
changeOverTime    Change of each interval relative to the first value; useful for comparing multiple stocks
ticker            Abbreviation used to uniquely identify publicly traded shares
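The change-related columns can be reproduced from `close` alone, which makes for a simple sanity check. The sketch below uses made-up prices and assumes that `changePercent` is expressed in percentage points and `changeOverTime` as a fraction of the first close. Both assumptions are consistent with the ranges in the summary above but are not confirmed by the data source. The VWAP part uses hypothetical intraday trades, which the dataset itself does not contain.

```r
# Made-up closing prices for a single ticker, in date order.
close <- c(100, 102, 101, 105)
n <- length(close)

prev_close       <- c(NA, close[-n])
change           <- close - prev_close          # absolute change vs. prior close
change_percent   <- 100 * change / prev_close   # percentage points (assumed unit)
change_over_time <- close / close[1] - 1        # fraction relative to first close (assumed)

data.frame(close, change, change_percent, change_over_time)

# VWAP for one day from hypothetical intraday trades:
# total value traded divided by total shares traded.
trade_price <- c(10, 11, 12)
trade_size  <- c(100, 200, 100)
vwap <- sum(trade_price * trade_size) / sum(trade_size)
vwap
```

Here the VWAP works out to 4400 / 400 = 11, and the last `changeOverTime` value is 105 / 100 - 1 = 0.05.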

The companies included in the dataset are:

tickers %>%
  kableExtra::kable() %>%  # render the ticker lookup as an HTML table
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                            full_width = FALSE, position = "left")
Ticker  Company
AAPL    Apple Inc
AXP     American Express Company
BA      Boeing Co
CAT     Caterpillar Inc.
CSCO    Cisco Systems, Inc.
CVX     Chevron Corporation
DIS     Walt Disney Co
DWDP    DowDuPont
GS      The Goldman Sachs Group, Inc.
HD      The Home Depot, Inc.
IBM     International Business Machines Corporation
INTC    Intel Corporation
JNJ     Johnson & Johnson
JPM     JPMorgan Chase & Co.
KO      The Coca-Cola Company
MCD     McDonald’s Corporation
MMM     3M Company
MRK     Merck & Co., Inc.
MSFT    Microsoft Corporation
NKE     NIKE, Inc.
PFE     Pfizer Inc.
PG      The Procter & Gamble Company
TRV     The Travelers Companies, Inc.
UNH     UnitedHealth Group Incorporated
UTX     Raytheon Technologies Corp
V       Visa Inc.
VZ      Verizon Communications Inc.
WBA     Walgreens Boots Alliance, Inc.
WMT     Walmart Inc.
XOM     Exxon Mobil Corporation

Acknowledgements

The original dataset is available on Kaggle: [EOD data for all Dow Jones stocks](https://www.kaggle.com/timoboz/stock-data-dow-jones).