BDA Project

James Whelan

March 20, 2017

Predicting Crude Oil Futures Weekly Trends

Introduction:

The stock market is an exchange for shares of publicly traded companies. Many types of instruments are traded on financial markets, including stocks, options, and futures. Oil is a basic resource for the entire economy, since it powers much of the machinery needed to produce goods and provide services. West Texas Intermediate (WTI) crude oil futures are traded on the NYMEX (New York Mercantile Exchange) and commonly represent the price of a barrel of oil in the United States. WTI futures are highly volatile, and OPEC influences crude oil prices by adjusting production. Every week a report is released with metrics on the previous week, including inventory and production in the United States. Many investors trade futures on Wednesday, when the data is released, based on these inventory and production figures. With these factors and other attributes recorded by the market, models can be built to predict the daily trend of crude oil futures.

Statement of Problem:

Money, risk, and gambling are all terms associated with the stock market. If you know the probability of success, you can make an informed decision. Professionals from all around the world trade on the stock market. Many are able to make a rational guess but are still unaware of the exact factors that drive the result. Possible investors in a solution for predicting oil futures include day traders, big banks, investment bankers, brokerage firms, oil producers, and OPEC. All of them have billions of dollars tied to the price of oil, whether through the production of oil or the trading of futures.

My motivation for this project is that for the last four months I have been trading oil futures and have made a 12% profit on my investment. Recently, I lost 5% of my profit because I was unable to predict a big swing before it occurred or respond to it afterward. I hope to learn more about these factors so that I sustain fewer losses and can increase my profit. Therefore I need a risk management model that predicts the direction of the future over the next day based on current data and includes a confidence metric.

A research team from the University of Ballarat previously worked on a model similar to the one I propose. The Ballarat team used a three-layer feed-forward artificial neural network (ANN). First they determined how they planned to measure the performance of the system, which they treated as a risk management tool. From there they determined that the system only had to predict the direction of the price rather than the magnitude of the change. The metrics they used were RMSE, R-squared, MSE, MAE, and SSE. The team also used an information coefficient, which “provides an indication of the prediction compared to the trivial predictor based on the random walk” [3]. For their base data they selected West Texas Intermediate (WTI) futures at daily frequency, with maturities of 1, 2, 3, and 4 months.
The data was preprocessed by converting the futures price to force and momentum (equations 3.3, 3.4). To train and test the model, the team used a 90/10 split.

Previous work related to this problem suggested more features to use. At the most basic level, the crude oil price is set by supply and demand: demand varies by season, and supply is regulated by OPEC. Difficulties of modeling raised by other researchers of oil futures, with the suggested remedies, are detailed below:

- Stochastic and nonlinear data - A logarithmic transformation is the suggested solution
- Chaotic, noisy, large dataset - A moving 3-day average mitigates some of the noise
- Some features unavailable on a daily frequency - Use the data for the entire week
- Data is somewhat old and may be irrelevant compared to newer data - Use two separate datasets: the full data and the previous five years’ data

In my opinion the most important difference between my work and previous work is the confidence metric given with each prediction. The confidence metric will be useful for real-world application. For example, if the model predicts an upwards trend but gives a confidence of 20%, then it is likely that I would not trade that day. If instead the model predicts an upwards trend with 60% confidence, it is more likely that I would trade that day. This will be useful to any trader, since they will know how confident the model is in the predicted direction.
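
The logarithmic transformation, the 3-day moving average, and the confidence-based trading decision described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the final preprocessing: the function names, the sample prices, and the 50% confidence cutoff are all hypothetical, and the transformation shown is a plain natural log of the price.

```python
import math


def log_transform(prices):
    """Natural log of each price (assumes every price is positive)."""
    return [math.log(p) for p in prices]


def moving_average(values, window=3):
    """Trailing moving average; the first window-1 points have no average and are dropped."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def should_trade(confidence, cutoff=0.5):
    """Trade only when the model's confidence reaches the cutoff (cutoff is an assumed value)."""
    return confidence >= cutoff


# Made-up closing prices, for illustration only.
closes = [50.0, 51.0, 49.5, 52.0, 53.0]
smoothed = moving_average(log_transform(closes))

print(len(smoothed))       # 3 smoothed points from 5 closes
print(should_trade(0.20))  # False: 20% confidence, stay out of the market
print(should_trade(0.60))  # True: 60% confidence, consider trading
```

Smoothing after the log transform is one possible ordering; whether to smooth before or after transforming would need to be checked against the real data.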

Proposed Project & Purpose:

The purpose of my project is to better understand crude oil futures and their most heavily weighted features. The goal is to determine the direction of oil futures based on today’s data. The model I propose is a regression model similar to the Ballarat team’s ANN, but it predicts only the daily change rather than both daily and monthly changes, since larger trends in the data are currently much more unpredictable than in the past. I am also proposing a risk management tool, so the model only has to determine the direction of the price instead of the magnitude of the change, since the magnitude is affected much more by speculation. I plan on using the same metrics as the Ballarat team: RMSE, R-squared, MSE, MAE, and SSE.
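
For reference, the five metrics named above can be computed from a list of actual values and a list of predictions as follows. This is a minimal sketch using the standard textbook definitions; the function name and the sample numbers are hypothetical, and it is not taken from the Ballarat team’s work.

```python
import math


def regression_metrics(actual, predicted):
    """Return SSE, MSE, RMSE, MAE and R-squared for paired actual/predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    sse = sum(e * e for e in errors)           # sum of squared errors
    mse = sse / n                              # mean squared error
    rmse = math.sqrt(mse)                      # root mean squared error
    mae = sum(abs(e) for e in errors) / n      # mean absolute error

    mean_actual = sum(actual) / n
    sst = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    if sst == 0:
        raise ValueError("R-squared is undefined when all actual values are equal")
    r2 = 1.0 - sse / sst

    return {"SSE": sse, "MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2}


# Made-up values, for illustration only.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(metrics)
```
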
The Ballarat team and I differ on the later stages of modeling. I plan on using more data than their team: daily WTI data as well as the features listed in the Data and Description section below. I plan on preprocessing the data by normalizing it and attempting a logarithmic transformation on the daily WTI data. Given the size of the data, an 80/20 split may work better than 90/10, as the data is highly volatile and over-training is a possibility. Because my methodology is similar to the Ballarat team’s, differing mainly in data collection, preprocessing, and training, I believe it will perform fairly well.

My proposed project is a risk management tool that predicts the direction of crude oil futures on a daily basis using a time series regression model. This project is relevant because there is value in predicting the direction of any future, and previous research has found adequate solutions. I plan on using time series regression because the data is ordered in time. Other models assume that all data points are independent, which cannot be true here since the direction of a future depends on its previous price. I include more data than the Ballarat team so that I can explore variables that have not been considered as features before. With fewer features I might get a leaner model, but it might not account for all the attributes that affect oil futures. I am using WTI as my primary feature since WTI is the standard industry-wide benchmark for United States crude oil futures. The alternatives to WTI would be foreign countries’ oil futures, as well as the New York oil price, which is less commonly used in the US industry. My approach is sound because I modeled my entire project around the Ballarat team’s project, which was fairly successful. Therefore, my project goal is to reach the same success in predictions as the Ballarat team (78% for one-day and 65% for two-day predictions).

The result of this project will have functional value to me through increased monetary gains and intellectual value to the rest of the students in class. With a working model I can invest in the futures market with more certainty and success than before. The rest of the students in class will gain a new understanding of the stock market and may be surprised to learn that it can be modeled. The field will not be changed: if the model is successful, I would not share it with my competitors, for fear that if everyone used it the market would swing with greater amplitude than before. Therefore it would likely only affect the people in this class.

Data and Description:

The data includes the following attributes:

- Y-Variable - The direction of crude oil futures over the next day, from close to close, as a moving average of three days
- Crude Oil Price - The spot price of crude oil, stock symbol WTI (West Texas Intermediate); the price of a barrel of crude oil in the United States
- Crude Oil Products
  - Gas Price - The price per barrel for gasoline in the United States (a product of crude oil)
  - Heating Oil - The price of heating oil for the US
- US Economy Averages
  - Dow Price - The Dow Jones Transportation Average (30 large publicly owned companies)
  - Nasdaq Price - The National Association of Securities Dealers Automated Quotations
- Exchange Rates of the USD
  - Euro - Importer of Oil
  - China - Importer of Oil
  - Canada - Exporter of Oil
  - Saudi Arabia - Exporter of Oil
  - Russia - Exporter of Oil
- Weekly Report
  - Oil Production in US
  - Oil Imports in US

The Y-value for the trend is 1 or -1: 1 represents an upwards trend and -1 represents a downward trend. The attributes are mainly scalar and somewhat arbitrary, as they are prices set by what the market determines they should be. The rest of the data is in numeric form and is then scaled to a mean of 0 and a standard deviation of 1. The weekly report includes stock information, such as volume and WIP and other types of oil. It also includes weekly imports and exports and 4-week averages.
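
The Y-value and the scaling described above could be built along the following lines. This is a sketch under stated assumptions rather than the final preprocessing: the function names and sample closes are hypothetical, the moving average is a trailing 3-day average, and a day with no change in the smoothed price is labeled -1, which is an arbitrary choice.

```python
import statistics


def direction_labels(closes, window=3):
    """Label each day +1 if the smoothed close rises the next day, otherwise -1.

    The closes are first smoothed with a trailing moving average, so the
    result has len(closes) - window entries.
    """
    smoothed = [
        sum(closes[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(closes))
    ]
    # Compare each smoothed value with the one that follows it.
    return [1 if nxt > cur else -1 for cur, nxt in zip(smoothed, smoothed[1:])]


def standardize(values):
    """Scale values to mean 0 and (population) standard deviation 1."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("cannot standardize a constant series")
    return [(v - mean) / std for v in values]


# Made-up closing prices, for illustration only.
closes = [10.0, 11.0, 12.0, 11.0, 10.0, 9.0]
print(direction_labels(closes))
print(standardize([1.0, 2.0, 3.0]))
```

In a real pipeline the mean and standard deviation would normally be computed on the training portion only and reused on the test portion, so that no information leaks from the test data.
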

Plan of Activities with Deadlines:

Current deadlines for the project are below:

- 3/20 - Updated Project Proposal
- 3/22 - Clean Data
  - Controlling for missing values
  - Creating appropriate headers and labels for the data
  - Delete any undefined data points
- 3/31 - Preprocessing Data
  - Organizing the data in an Excel document
  - Normalizing the data
  - Creating the Y-value
  - Office Hours
- 4/15 - Initial Report
  - Updating this report for my model
  - Adding missing elements currently not in this report
  - Office Hours
- 4/22 - Initial Presentation
  - Formatting a presentation
  - Creating a general presentation
  - Adding specifics that will likely be needed for class
  - Office Hours
- 5/1 - Finalized Report and Presentation
  - Update report based on proposed changes
  - Update presentation based on proposed changes

Data has been taken from the US Department of Energy for crude oil and weekly reports. Yahoo Finance has historical data for exchange rates and US economy averages.

Evaluation:

Deliverables

- A working and documented program
- Sample code
- Example of code being put into use

The evaluation criteria include the following:

- Testing greater than 75% correctness for one-day prediction
  - 3:1 ratio of correct to incorrect predictions
  - High success rate of prediction, larger profits from trading
  - Ballarat team was able to predict with 78% correct
- R-squared greater than 0.5
  - The features explain more than 50% of the variation in the data
  - Much of the variance is due to speculation of the market
- At least 4 factors found to be significant
  - These factors can be monitored during the day, which will help in predicting movement

Why use these evaluation criteria?

Testing greater than 75% correct for one-day prediction

Predicting more than 50% correct would not be a great feat, since predicting up every day would result in a correctness value greater than 52%. To reach 75% correct, the model cannot simply predict upwards each time; it will need to use the data to determine the swing. The effect for day traders will also be 3 gains for every 1 loss, a net of 2 gains every four days, assuming all magnitudes are equal.

R-squared greater than 0.5

This again shows that the features are explaining the model. The fewer arbitrary values, the better. If the data did not explain the model, a trader might as well predict up every day and make small profits.

At least 4 features found to be significant

These four features can be scrutinized throughout the day, so that if changes occur the trader is aware that the result of the model may have changed as well. I would classify these features as “Jaws” features; if they aren’t watched then your money will be eaten by a shark.

Combined

Together these three measures will determine whether this model can be used in the real world by day traders. The model will be correct more often than incorrect, and better than current predictions. The features will explain the model, so it can be understood by day traders. With four significant features, day traders can keep an eye on changes in these features during the day and can leave the market if any large swings occur.
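
The accuracy criterion and the "3 gains, 1 loss" arithmetic above can be checked with a short script. This is only an illustrative sketch: the function names and the sample labels are hypothetical, and the net-gain calculation keeps the stated simplification that every gain and loss has the same magnitude.

```python
def directional_accuracy(actual, predicted):
    """Fraction of days on which the predicted direction matches the actual one."""
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)


def always_up_accuracy(actual):
    """Accuracy of the trivial strategy that predicts +1 (up) every day."""
    return directional_accuracy(actual, [1] * len(actual))


def net_gains(accuracy, days=4):
    """Wins minus losses over a number of days, assuming equal magnitudes."""
    wins = accuracy * days
    losses = days - wins
    return wins - losses


# Made-up directions (+1 up, -1 down), for illustration only.
actual = [1, -1, 1, -1]
predicted = [1, -1, 1, 1]

print(directional_accuracy(actual, predicted))  # 0.75
print(always_up_accuracy(actual))               # 0.5
print(net_gains(0.75))                          # 2.0
```

Comparing the model with the always-up baseline on the same test days makes it clear how much of the accuracy comes from the features rather than from the market's general upward drift.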

Technical Content:

(Once the model has been finalized)