Deeper Analytics And Price Prediction

Requirements

In this assignment we are going to analyze time-series data for equities in our portfolio, building on assignment #2. The app relies heavily on analytics, visuals, and charts. Try to make the data as real-time as possible.

  1. At minimum, compute the following (by adding a column)
  • Daily returns (price today / price yesterday - 1) (Hint: Use Numpy for calculating this)

  • Daily price difference (price today - price yesterday)

  1. Allow the user to see the (at minimum) following time-series data for any of the equities in the portfolio. Make the data as real-time as possible.

Use http://www.nasdaq.com/g00/quotes/historical-quotes.aspx for grabbing historical data

  • Price
  • Global mean (a flat line)
  • 5 day moving average
  • 20 day moving average
  • 5 day moving standard deviation
  • 20 day moving standard deviation
  • Bollinger bands 2 standard deviation
  • Daily returns %
  • Daily return mean %
  • Price range for the day (bars)
  • Daily price difference (high - low)
  • Distribution of prices of a stock (Gaussian distribution, aka bell curve)
  1. Correlation - Find an additional time-series data source that can be used to correlate with stock prices. An example might be weather data:

Using data from WeatherUnderground, we are going to explore the correlation between any equity in our portfolio and weather in New York City. You may wish to load this data up on start up to enable the charting of the equities graphs to be as fast as possible.

https://www.wunderground.com/history/

Hint - Use DataFrames functions like join() to help you pull the data together into one DataFrame. Remember to clean out non-trading days from the weather data.

The correlation (-1 to 1) can be displayed as a column on the P/L table

  1. Machine Learning

Employ a machine learning algorithm of your choice to predict stock prices for any of the equities in the portfolio. The prediction does not have to be correct (but if it is consistently, give me a call and let’s talk :) ) but should demonstrate how you can apply the algorithm. For example, you might add a column to the P/L table labeled “Predicted price” where you state what you think the next price is.

In your README file please make sure you clearly state which algorithm you are using, why you chose that algorithm, and what pros and cons of the algorithm you have learned while employing it in this trading system.

Data Source

Software and Database

Application is developed using

  • Ubuntu 16.04
  • Python 3.6
  • Flask 0.12.2
  • Jinja 2
  • Matplotlib
  • BeautifulSoup
  • SQLite 3
  • SQLAlchemy
  • Bootstrap
  • Docker

Application

The application is developed using Python data structures. Most of the web scraping is done using the requests and BeautifulSoup (bs4) packages. For persistence, data is stored in a SQLite database, which is accessed through the SQLAlchemy object-relational mapper (ORM). Tables are exposed as classes. Web pages use the Bootstrap framework.

Graphs on the website are developed using the Matplotlib package.

The application source code is uploaded to GitHub at https://github.com/akulapa/data602-assignment3. Currently, it is set up as a private repository (course requirement).

The Docker image is created with an empty SQLite database, traders.db. The image is uploaded to Docker Cloud and can be downloaded using the following command

docker pull akulapa/ubuntu1604:data602-assignment3

After downloading, to create a Docker container execute the following

docker run -v /etc/localtime:/etc/localtime:ro -p 8080:5000 akulapa/ubuntu1604:data602-assignment3

The -v option sets the local time inside the Docker container from the host machine's time. The application runs on port 5000 inside the container, which is mapped to port 8080 on the host, so the website is accessed at

http://localhost:8080

When the Docker container is started and the website is accessed for the first time, the complete symbol list is downloaded from the NASDAQ website. The downloaded file is saved to the path shown below and then loaded into the tickerData table of the SQLite database. The symbols can also be refreshed at any time by clicking the Ticker List Update button at the top right.

/usr/src/data602-assignment3/app/instance/temp/tickerlist.csv
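The load-and-validate step described above could look roughly like the following sketch. To stay self-contained it uses the standard-library csv and sqlite3 modules instead of SQLAlchemy, and the column names (Symbol, Name), the two-column table layout, and the helper names are assumptions for illustration, not the application's actual schema.

```python
import csv
import io
import sqlite3


def load_tickers(conn, csv_file):
    """Replace the contents of tickerData with the rows read from csv_file."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tickerData (symbol TEXT PRIMARY KEY, name TEXT)"
    )
    conn.execute("DELETE FROM tickerData")  # a refresh starts from an empty table
    reader = csv.DictReader(csv_file)
    rows = [(r["Symbol"].strip().upper(), r["Name"].strip()) for r in reader]
    conn.executemany("INSERT OR REPLACE INTO tickerData VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


def symbol_exists(conn, symbol):
    """Check a user-entered symbol against the stored ticker list."""
    cur = conn.execute(
        "SELECT 1 FROM tickerData WHERE symbol = ?", (symbol.strip().upper(),)
    )
    return cur.fetchone() is not None


# Hypothetical two-row sample standing in for tickerlist.csv
sample = io.StringIO("Symbol,Name\nAAPL,Apple Inc.\nMSFT,Microsoft Corporation\n")
conn = sqlite3.connect(":memory:")
loaded = load_tickers(conn, sample)
```

A lookup such as symbol_exists(conn, "aapl") is the kind of check that can back the symbol validation on the search box.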

Home Page

The home page for the application is displayed as follows. It gives information about the current portfolio.

Ticker Information

All the metrics about a stock are found on a single page.

Available funds are displayed on each page.

The symbol can be keyed in manually or selected from the searchable drop-down. Pressing the Enter key or clicking the Search button takes the trader to the details page. The symbol is validated against the list downloaded from the NASDAQ website. If the symbol does not exist in the list, the following message is displayed.

Stock information,

Graphs

The graph is a scatter plot displaying the number of shares sold at each price in the last 100 trades.

100 Day menu

Menu has various options,

Closing Price

Candlestick

Daily Returns

Avg. Closing Price & Volume

5 Day SMA & Volatility

5 Day EMA & Volatility

20 Day SMA & Volatility

20 Day EMA & Volatility

The Distribution menu has the following options,

100 Day Closing Price

The graph shows a Gaussian distribution. It is generated with the hist and plot functions from the matplotlib package, with the linspace function from the numpy package used to smooth the curve. The graph also gives the characteristics of the closing price for the last 100 days: mean (\(\mu\)) and standard deviation (\(\sigma\)). They are calculated using the norm function from the scipy package.
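As a minimal sketch of the fitting step, assuming that "the norm function" refers to scipy.stats.norm and that its fit method is used to estimate the parameters; the closing prices below are randomly generated placeholders, not real data.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical closing prices; the application would use the last 100 closes.
rng = np.random.default_rng(0)
closes = rng.normal(loc=50.0, scale=2.0, size=100)

# norm.fit returns the maximum-likelihood estimates of mean and standard deviation.
mu, sigma = norm.fit(closes)

# Evenly spaced x values give a smooth curve to overlay on the histogram.
x = np.linspace(closes.min(), closes.max(), 200)
pdf = norm.pdf(x, mu, sigma)

# With matplotlib imported as plt, the chart could then be drawn with:
#   plt.hist(closes, bins=20, density=True)
#   plt.plot(x, pdf)
```

Passing density=True to hist puts the histogram on the same scale as the fitted probability density, so the two can share one axis.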

100 Day Volume

The above graph is right-skewed because the daily traded volume is usually between 100,000 and 1,000,000. In a few instances the volume was above 1,000,000, reaching close to 8,000,000.

The Bollinger Bands menu has the following options,

The graph contains four different types of data. The blue line displays the moving average (SMA or EMA) for a given period; I used 5-day and 20-day periods over 100 days of closing prices. The green line represents the upper band, and the red line represents the lower band. Both lines are 2 standard deviations away from the blue line. A candlestick graph is used to display the daily OHLC data. For example, SMA(5,2) means that the mean is calculated as a 5-day simple moving average and that the upper and lower bands are 2 standard deviations away from that mean.
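The SMA(5,2) bands described above can be sketched with pandas rolling windows. The function name, column names, and sample prices are illustrative assumptions; note also that pandas computes the sample standard deviation (ddof=1) by default, which may differ slightly from the convention used in the application.

```python
import pandas as pd


def bollinger(close, window=5, k=2.0):
    """Return the SMA middle band and the bands k standard deviations away."""
    mid = close.rolling(window).mean()
    sd = close.rolling(window).std()
    return pd.DataFrame({"mid": mid, "upper": mid + k * sd, "lower": mid - k * sd})


# Hypothetical closing prices
close = pd.Series([10.0, 11.0, 12.0, 11.0, 10.0, 12.0, 13.0])
bands = bollinger(close, window=5, k=2.0)
```

The first window - 1 rows are NaN because a full window of prices is not yet available, which is why the bands start a few days into the chart.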

Tabular data

The 100 Day menu has the following options

The OHLC option displays daily open, high, low, and close data. An additional column, Intraday Price Diff., calculated as High - Low, is added to the table.

Daily Returns

The above table shows

  • Daily Return %, derived using \(\bigg(\frac{\text{Close}}{\text{Previous Day's Close}} - 1\bigg)\times 100\).
  • Daily Price Diff., \(\text{Close} - \text{Previous Day's Close}\)
  • Intraday Price Diff., calculated using the day's \(\text{High} - \text{Low}\)
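The three columns above can be computed with NumPy as in the following sketch. The array names and the four days of sample prices are made up for illustration.

```python
import numpy as np

# Hypothetical high, low, and close values for four trading days
high = np.array([10.5, 11.2, 10.9, 11.8])
low = np.array([9.8, 10.1, 10.0, 10.7])
close = np.array([10.0, 11.0, 10.45, 11.495])

# The first day has no previous close, so its value is left as NaN.
daily_return_pct = np.full(close.shape, np.nan)
daily_return_pct[1:] = (close[1:] / close[:-1] - 1) * 100

daily_price_diff = np.full(close.shape, np.nan)
daily_price_diff[1:] = np.diff(close)  # close today - close yesterday

intraday_price_diff = high - low
```

For these sample prices the daily returns after the first day work out to 10%, -5%, and 10%.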

5 Day SMA and Volatility, 5 Day EMA and Volatility

This table displays the movement of the closing price relative to the 5-day SMA. The volatility column shows how much the price actually moved over the same period.
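A minimal sketch of the SMA, EMA, and volatility columns, assuming that volatility here means the 5-day moving standard deviation of the closing price; the span and adjust settings for the EMA are also assumptions, and the prices are made up.

```python
import pandas as pd

# Hypothetical closing prices
close = pd.Series([10.0, 11.0, 12.0, 11.0, 10.0, 12.0, 13.0])

sma5 = close.rolling(window=5).mean()

# span=5 gives a smoothing factor of 2 / (5 + 1); adjust=False uses the
# recursive form ema[t] = alpha * close[t] + (1 - alpha) * ema[t - 1].
ema5 = close.ewm(span=5, adjust=False).mean()

# Volatility taken as the 5-day moving standard deviation (an assumption).
vol5 = close.rolling(window=5).std()
```

Unlike the SMA, the EMA has a value from the first day onward and weights recent closes more heavily, so it reacts faster to a change in direction.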

The Weather menu option shows today's weather conditions in New York City. The data is used to check whether there is any correlation between closing price and weather conditions.

Correlation

The Pearson method is used to estimate correlation. It is calculated using the corr function from the pandas package. The data displayed shows that there is no strong correlation between weather conditions and daily trading. In the following heat map, red marks a correlation of \(-1\) (inversely related) and green marks a correlation of \(1\) (highly correlated).
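The join-then-correlate step can be sketched as follows. The column names and all values are hypothetical; the weather frame has one row per calendar day, while the price frame only has rows for trading days.

```python
import pandas as pd

# Hypothetical closing prices for one trading week (Mon 2017-11-13 to Fri 2017-11-17)
prices = pd.DataFrame(
    {"close": [10.0, 10.5, 10.2, 10.8, 11.0]},
    index=pd.to_datetime(
        ["2017-11-13", "2017-11-14", "2017-11-15", "2017-11-16", "2017-11-17"]
    ),
)

# Hypothetical daily mean temperatures, including the preceding weekend
weather = pd.DataFrame(
    {"mean_temp_f": [50.0, 48.0, 45.0, 47.0, 44.0, 49.0, 52.0]},
    index=pd.date_range("2017-11-11", periods=7, freq="D"),
)

# An inner join keeps only dates present in both frames,
# which removes the non-trading days from the weather data.
merged = prices.join(weather, how="inner")

corr = merged["close"].corr(merged["mean_temp_f"], method="pearson")
```

Calling merged.corr() instead returns the full correlation matrix, which is the shape of data a heat map needs.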

Prediction

To predict the closing price, I used the linear, poly, rbf, and sigmoid kernels from the SVM module of the sklearn package. I also used the LinearRegression function from the sklearn package.

The method assumes that, under the given conditions, a day's closing price can become any price a few days later. For example, the 11/14/2017 closing price of \(\$1.47\) is paired with the 11/21/2017 closing price of \(\$1.23\). I used the price 5 trading days forward as the value to estimate. Shifting the closing price by 5 trading days in this way leaves the last 5 trading days with no value. Estimating the closing price for those missing 5 days gives the prices for the next 5 trading days. Using those 5 prices, I calculated the mean and standard deviation.

Two years' worth of data is used to predict the price. The data is split 80:20 into training and testing datasets. The last 5 days are used to predict the stock price.

My prediction is that the price should be close to the mean; if not, it should be within \(\pm2\) standard deviations.
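The shift-and-predict procedure described above can be sketched with LinearRegression. This is an illustration under stated assumptions rather than the application's code: the prices are a randomly generated series, the closing price is the only feature, and shuffle=False (a chronological 80:20 split) is a choice made here.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

HORIZON = 5  # trading days ahead

# Hypothetical closing prices standing in for two years of data
rng = np.random.default_rng(1)
close = pd.Series(20 + np.cumsum(rng.normal(0, 0.3, size=500)))

frame = pd.DataFrame({"close": close})
# The target for each day is the close HORIZON trading days later,
# so the last HORIZON rows have no target (NaN).
frame["target"] = frame["close"].shift(-HORIZON)

known = frame.dropna()
future_rows = frame[frame["target"].isna()]

# 80:20 split, kept in chronological order (an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    known[["close"]], known["target"], test_size=0.2, shuffle=False
)
model = LinearRegression().fit(X_train, y_train)
test_r2 = model.score(X_test, y_test)

# Estimating the missing targets gives prices for the next 5 trading days.
predicted = model.predict(future_rows[["close"]])
pred_mean = predicted.mean()
pred_std = predicted.std()
band = (pred_mean - 2 * pred_std, pred_mean + 2 * pred_std)
```

The same frame could be fitted with sklearn.svm.SVR by setting kernel to "linear", "poly", "rbf", or "sigmoid" in place of LinearRegression.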

The rest of the screen displays additional metrics.

  • Average 5 and 20 day volume
  • Avg. Daily Returns %
  • Avg. Daily Price Difference %
  • Three major stock market indices (S&P 500, NASDAQ, NYSE)

Based on the analytical information provided on the page, the trader can either buy or sell a stock. To complete a transaction, the trader selects the type of transaction and the quantity, and then clicks the gavel button. The selected transaction is validated, and the quantity entered

    - cannot be negative
    - has to be numeric
    - cannot be greater than the current balance of shares held when selling
    - cannot cost more than the currently available funds when buying (quantity * current price <= available funds)
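The validation rules above can be expressed as a small function. The function name, parameter names, and error messages below are hypothetical; the sketch only mirrors the four listed checks.

```python
import math


def validate_order(side, quantity_text, price, shares_held, available_funds):
    """Return an error message, or None when the order passes every check."""
    try:
        quantity = float(quantity_text)
    except (TypeError, ValueError):
        return "Quantity has to be numeric."
    if not math.isfinite(quantity):
        # Rejects inputs such as "nan" or "inf", which float() accepts.
        return "Quantity has to be numeric."
    if quantity < 0:
        return "Quantity cannot be negative."
    if side == "sell" and quantity > shares_held:
        return "Quantity cannot be greater than the current balance."
    if side == "buy" and quantity * price > available_funds:
        return "Order cost exceeds available funds."
    return None


# Buying 100 shares at 10.00 with 1000.00 available passes (100 * 10 <= 1000).
ok = validate_order("buy", "100", 10.0, 0, 1000.0)
```

The numeric check runs first because the remaining comparisons only make sense once the input has been parsed into a number.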

If the submitted data fails validation, a message is shown on the screen as displayed below.

Profit & Loss

The P&L page displays the current position details alongside the correlation with weather conditions and the predicted price.

Blotter

Blotter page contains all the transactions.