For the Final Project, we intend to analyze and predict stock outcomes using historical end-of-day trading data for the 30 companies in the Dow Jones Industrial Average. For each company, the data covers five years of trading history up to February 2019. Methodologically, we will explore modeling techniques covered in this course, such as linear, binomial, and panel models, as well as techniques outside the course scope, such as time series forecasting with autoregressive models. The goal is to predict stock price outcomes as either absolute returns or positive/negative closings, depending on the selected model.
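To make the two candidate targets concrete, the following is a minimal sketch, not the project's final method. It uses a simulated random-walk price series rather than the Dow Jones data, and the names `model_df`, `ret_lag1`, `fit_logit`, `fit_lm`, and `fit_ar` are hypothetical. It shows how a continuous target (daily change) and a binary target (up/down close) could be derived from closing prices and passed to a linear model, a binomial model, and an order-1 autoregressive model from base R.

```r
set.seed(621)  # arbitrary seed so the simulated series is reproducible

# Simulated closing prices for one hypothetical stock (a random walk, NOT real data).
n     <- 250
close <- 100 + cumsum(rnorm(n, mean = 0.05, sd = 1))

# Continuous target: day-over-day change in the closing price.
ret <- diff(close)            # length n - 1

# Binary target: 1 if the stock closed higher than the prior day, else 0.
up <- as.integer(ret > 0)

# Pair each day's outcome with the previous day's change as the only predictor.
model_df <- data.frame(
  up       = up[-1],
  ret      = ret[-1],
  ret_lag1 = ret[-length(ret)]
)

# Binomial (logistic) model for the positive/negative closing outcome.
fit_logit <- glm(up ~ ret_lag1, data = model_df, family = binomial)

# Linear model for the absolute change.
fit_lm <- lm(ret ~ ret_lag1, data = model_df)

# Autoregressive model of order 1 fitted to the daily changes.
fit_ar <- ar(ret, order.max = 1, aic = FALSE)

# One-step-ahead predictions from each approach.
newdata <- data.frame(ret_lag1 = tail(ret, 1))
p_up    <- predict(fit_logit, newdata = newdata, type = "response")
ret_hat <- predict(fit_lm, newdata = newdata)
ar_hat  <- predict(fit_ar, n.ahead = 1)$pred
```

On a pure random walk none of these models should find much signal; the sketch only illustrates how the targets and model calls fit together.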
A quick summary of the dataset, hosted on Kaggle, shows 36,850 observations across the 30 publicly traded companies. The summary below lists the variables in the data:
library(tidyverse)   # read_csv() and %>%
library(skimr)       # skim()
library(kableExtra)  # kable_styling()
dataset <- read_csv('https://raw.githubusercontent.com/salma71/Data_621/master/Project_Proposal/stocks_combined.csv')
tickers <- read_csv('https://raw.githubusercontent.com/salma71/Data_621/master/Project_Proposal/tickers.csv')
skim(dataset)
Name | dataset |
Number of rows | 36850 |
Number of columns | 13 |
_______________________ | |
Column type frequency: | |
character | 3 |
numeric | 10 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
date | 0 | 1 | 8 | 10 | 0 | 1258 | 0 |
ticker | 0 | 1 | 1 | 4 | 0 | 30 | 0 |
label | 0 | 1 | 5 | 9 | 0 | 1258 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
open | 0 | 1 | 94.52 | 52.68 | 18.17 | 56.83 | 84.09 | 118.30 | 417.14 | ▇▅▁▁▁ |
high | 0 | 1 | 95.25 | 53.14 | 18.41 | 57.40 | 84.67 | 119.14 | 421.84 | ▇▅▁▁▁ |
low | 0 | 1 | 93.76 | 52.19 | 18.11 | 56.19 | 83.54 | 117.39 | 417.11 | ▇▅▁▁▁ |
close | 0 | 1 | 94.53 | 52.68 | 18.18 | 56.77 | 84.09 | 118.32 | 421.55 | ▇▅▁▁▁ |
volume | 0 | 1 | 11196571.67 | 12461717.47 | 305358.00 | 3977137.50 | 7045060.00 | 13777850.50 | 618237630.00 | ▇▁▁▁▁ |
unadjustedVolume | 0 | 1 | 10967241.59 | 12152154.22 | 305358.00 | 3817578.50 | 6895243.00 | 13581565.00 | 618237630.00 | ▇▁▁▁▁ |
change | 0 | 1 | 0.05 | 1.50 | -23.45 | -0.43 | 0.04 | 0.56 | 65.29 | ▁▇▁▁▁ |
changePercent | 0 | 1 | 0.05 | 1.30 | -14.34 | -0.56 | 0.05 | 0.70 | 16.67 | ▁▁▇▁▁ |
vwap | 0 | 1 | 94.52 | 52.68 | 13.79 | 56.79 | 84.11 | 118.24 | 420.32 | ▇▅▁▁▁ |
changeOverTime | 0 | 1 | 0.38 | 0.50 | -0.41 | 0.06 | 0.24 | 0.54 | 3.19 | ▇▅▁▁▁ |
The variable descriptions are as follows:
Variable | Description |
---|---|
date | Trading Date |
open | Price of the stock at market open |
high | Highest price reached in the trade day |
low | Lowest price reached in the trade day |
close | Price of the stock at market close |
volume | Number of shares traded |
unadjustedVolume | Volume for stocks, unadjusted by stock splits |
change | Change in closing price from prior trade day close |
changePercent | Percentage change in closing price from prior trade day close |
vwap | Volume weighted average price (VWAP) is the ratio of the value traded to total volume traded |
label | Trading date as a short, human-readable label |
changeOverTime | Percent change of each interval relative to first value. Useful for comparing multiple stocks. |
ticker | Abbreviation used to uniquely identify publicly traded shares |
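The derived columns can be sanity-checked by recomputing them from `close` within each ticker. The sketch below uses a small made-up data frame with hypothetical tickers `AAA` and `BBB`, not rows from the dataset. It assumes that `changePercent` is expressed in percent and `changeOverTime` as a fraction of the first close, which is what the summary statistics above suggest; the data provider may compute these fields from adjusted prices, so small differences against the real columns would not be surprising.

```r
library(dplyr)

# Toy closing prices for two hypothetical tickers (not taken from the dataset).
toy <- data.frame(
  ticker = rep(c("AAA", "BBB"), each = 4),
  date   = rep(as.Date("2019-01-01") + 0:3, times = 2),
  close  = c(100, 102, 101, 105, 50, 49, 51, 52)
)

derived <- toy %>%
  group_by(ticker) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(
    # Change in closing price from the prior trading day (NA on the first day).
    change         = close - lag(close),
    # Same change as a percentage of the prior close (assumed percent units).
    changePercent  = 100 * change / lag(close),
    # Cumulative change relative to the first close (assumed fractional units).
    changeOverTime = close / first(close) - 1
  ) %>%
  ungroup()

print(derived)
```

Grouping by `ticker` before calling `lag()` keeps the first row of each company from borrowing the last close of the previous company.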
The companies included in the dataset are:
tickers %>%
  kableExtra::kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = FALSE, position = "left")
Ticker | Company |
---|---|
AAPL | Apple Inc |
AXP | American Express Company |
BA | Boeing Co |
CAT | Caterpillar Inc. |
CSCO | Cisco Systems, Inc. |
CVX | Chevron Corporation |
DIS | Walt Disney Co |
DWDP | DowDuPont |
GS | The Goldman Sachs Group, Inc. |
HD | The Home Depot, Inc. |
IBM | International Business Machines Corporation |
INTC | Intel Corporation |
JNJ | Johnson & Johnson |
JPM | JPMorgan Chase & Co. |
KO | The Coca-Cola Company |
MCD | McDonald’s Corporation |
MMM | 3M Company |
MRK | Merck & Co., Inc. |
MSFT | Microsoft Corporation |
NKE | NIKE, Inc. |
PFE | Pfizer Inc. |
PG | The Procter & Gamble Company |
TRV | The Travelers Companies, Inc. |
UNH | UnitedHealth Group Incorporated |
UTX | United Technologies Corporation |
V | Visa Inc. |
VZ | Verizon Communications Inc. |
WBA | Walgreens Boots Alliance, Inc. |
WMT | Walmart Inc. |
XOM | Exxon Mobil Corporation |
The original dataset can be found on Kaggle: [EOD data for all Dow Jones stocks](https://www.kaggle.com/timoboz/stock-data-dow-jones).