In this section we collect the data for the target variable (the variable we want to forecast) and for potentially useful explanatory variables. Since the goal is to forecast, we need at least 250 observations at our disposal. The time horizon this requires depends on the frequency of observation.
That means we could use daily, weekly, monthly (at least 21 years of history) or quarterly data (at least 63 years of history).
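The year counts above follow from dividing the 250-observation minimum by the number of observations per year. A minimal sketch of that arithmetic, assuming roughly 252 trading days and 52 weeks per year (these two figures are our assumptions, not taken from the data):

```r
# Minimum number of observations required for forecasting.
min_obs <- 250

# Observations per year for each frequency.
# 252 trading days and 52 weeks are assumed round figures.
obs_per_year <- c(daily = 252, weekly = 52, monthly = 12, quarterly = 4)

# Years of history needed, rounded up to whole years.
years_needed <- ceiling(min_obs / obs_per_year)
print(years_needed)
```

Monthly data gives 250 / 12 ≈ 20.8, rounded up to 21 years, and quarterly data gives 250 / 4 = 62.5, rounded up to 63 years, which is why daily data is the practical choice here.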
We settled on daily data for the period from 2019-01-01 to 2021-06-10, with 616 observations in total, or nearly two and a half years of trading data.
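As a rough plausibility check on the sample size, we can count the weekdays in the chosen period using base R only. This is only an upper bound on the number of trading days, since exchange holidays are not removed; the variable names are ours and purely illustrative.

```r
# All calendar days in the sample period (both end points included).
days <- seq(as.Date("2019-01-01"), as.Date("2021-06-10"), by = "day")

# Keep Monday to Friday; "%u" returns the ISO weekday number, 1 = Monday.
weekdays_only <- days[!format(days, "%u") %in% c("6", "7")]

# Upper bound on the number of trading days (holidays still included).
n_weekdays <- length(weekdays_only)
print(n_weekdays)

# Approximate length of the period in calendar years.
print(as.numeric(diff(range(days))) / 365.25)
```

The weekday count is somewhat above the number of observations actually downloaded, with the difference accounted for by days on which the market was closed.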
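For the analysis, our target variable is the Microsoft (MSFT) stock. The explanatory variables (used for regression modeling and forecasting and for initial data analysis) include the S&P 500 index (GSPC), the NASDAQ Composite index (IXIC), and the stocks of Apple (AAPL), Alphabet (GOOG), IBM (IBM) and 3M (MMM).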
For the data collection process we use the quantmod library. The last six observations of each downloaded series are:
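A minimal sketch of how such a download could look with quantmod is given here. It assumes Yahoo Finance as the data source and uses a hypothetical helper, `download_prices`, that is not part of the original project. Whether the end date itself is included can depend on the data source, so the `to` argument may need to be moved forward by one day.

```r
# Tickers as written on Yahoo Finance; index symbols carry a leading caret.
symbols <- c("MSFT", "^GSPC", "^IXIC", "AAPL", "GOOG", "IBM", "MMM")

# Hypothetical helper (not from the original project): downloads each symbol
# and returns a named list with one price series per ticker.
download_prices <- function(symbols, from, to) {
  prices <- lapply(symbols, function(sym) {
    quantmod::getSymbols(sym, src = "yahoo", from = from, to = to,
                         auto.assign = FALSE)
  })
  # Drop the caret so that the list names read GSPC and IXIC.
  names(prices) <- sub("^\\^", "", symbols)
  prices
}

# Example call (needs the quantmod package and an internet connection):
# prices <- download_prices(symbols, from = "2019-01-01", to = "2021-06-10")
# tail(prices$MSFT)  # last six rows of the MSFT series
```

Returning a list instead of assigning each series into the global environment keeps all series in one object, which makes it easier to loop over them later.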
## MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted
## 2021-06-03 245.22 246.34 243.00 245.71 25307700 245.71
## 2021-06-04 247.76 251.65 247.51 250.79 25281100 250.79
## 2021-06-07 249.98 254.09 249.81 253.81 23079200 253.81
## 2021-06-08 255.16 256.01 252.51 252.57 22455000 252.57
## 2021-06-09 253.81 255.53 253.21 253.59 17937600 253.59
## 2021-06-10 254.29 257.46 253.67 257.24 24563600 257.24
## GSPC.Open GSPC.High GSPC.Low GSPC.Close GSPC.Volume GSPC.Adjusted
## 2021-06-03 4191.43 4204.39 4167.93 4192.85 4579450000 4192.85
## 2021-06-04 4206.05 4233.45 4206.05 4229.89 3487070000 4229.89
## 2021-06-07 4229.34 4232.34 4215.66 4226.52 3835570000 4226.52
## 2021-06-08 4233.81 4236.74 4208.41 4227.26 3943870000 4227.26
## 2021-06-09 4232.99 4237.09 4218.74 4219.55 3902870000 4219.55
## 2021-06-10 4228.56 4249.74 4220.34 4239.18 3502480000 4239.18
## IXIC.Open IXIC.High IXIC.Low IXIC.Close IXIC.Volume IXIC.Adjusted
## 2021-06-03 13655.75 13684.13 13548.93 13614.51 5367460000 13614.51
## 2021-06-04 13697.25 13826.82 13692.01 13814.49 4341800000 13814.49
## 2021-06-07 13802.82 13889.11 13784.89 13881.72 4602940000 13881.72
## 2021-06-08 13946.32 13981.72 13831.98 13924.91 5894140000 13924.91
## 2021-06-09 13980.23 14003.50 13906.45 13911.75 5607720000 13911.75
## 2021-06-10 13933.88 14031.19 13904.40 14020.33 4889500000 14020.33
## AAPL.Open AAPL.High AAPL.Low AAPL.Close AAPL.Volume AAPL.Adjusted
## 2021-06-03 124.68 124.85 123.13 123.54 76229200 123.54
## 2021-06-04 124.07 126.16 123.85 125.89 75169300 125.89
## 2021-06-07 126.17 126.32 124.83 125.90 71057600 125.90
## 2021-06-08 126.60 128.46 126.21 126.74 74403800 126.74
## 2021-06-09 127.21 127.75 126.52 127.13 56877900 127.13
## 2021-06-10 127.02 128.19 125.94 126.11 71186400 126.11
## GOOG.Open GOOG.High GOOG.Low GOOG.Close GOOG.Volume GOOG.Adjusted
## 2021-06-03 2395.02 2409.745 2382.830 2404.61 917300 2404.61
## 2021-06-04 2422.52 2453.859 2417.770 2451.76 1297400 2451.76
## 2021-06-07 2451.32 2468.000 2441.073 2466.09 1192500 2466.09
## 2021-06-08 2479.90 2494.495 2468.240 2482.85 1253000 2482.85
## 2021-06-09 2499.50 2505.000 2487.330 2491.40 1006300 2491.40
## 2021-06-10 2494.01 2523.260 2494.000 2521.60 1561700 2521.60
## IBM.Open IBM.High IBM.Low IBM.Close IBM.Volume IBM.Adjusted
## 2021-06-03 144.91 145.88 144.04 145.55 4130600 145.55
## 2021-06-04 146.00 147.55 145.76 147.42 3117900 147.42
## 2021-06-07 147.55 148.74 147.17 148.02 3462700 148.02
## 2021-06-08 148.12 150.20 148.12 149.07 5080100 149.07
## 2021-06-09 149.03 151.07 148.82 150.67 5303300 150.67
## 2021-06-10 151.47 152.84 149.76 150.54 4758500 150.54
## MMM.Open MMM.High MMM.Low MMM.Close MMM.Volume MMM.Adjusted
## 2021-06-03 202.50 204.67 201.81 203.67 1902100 203.67
## 2021-06-04 204.11 206.12 203.77 206.05 1868600 206.05
## 2021-06-07 206.35 206.81 203.31 203.73 1531100 203.73
## 2021-06-08 202.00 204.04 201.14 203.59 1701400 203.59
## 2021-06-09 203.56 203.57 201.94 202.74 1707600 202.74
## 2021-06-10 204.15 204.97 202.78 203.13 1953100 203.13
Statistics and Financial Data Analysis
A work by: Nikola Krivacevic, Aleksandar Milinkovic and Milos Milunovic
The entire forecasting project is available on GitHub: https://github.com/mcf-long-short/statistics-stocks-forecasting