class: title-slide

.row[
.col-7[
.title[
# Stock Analysis
]
.subtitle[
## Stock Analysis
]
.author[
### Laxmikant Soni <br> [Web-Site](https://laxmikants.github.io) <br> [<i class="fab fa-github"></i>](https://github.com/laxmiaknts) [<i class="fab fa-twitter"></i>](https://twitter.com/laxmikantsoni09)
]
.affiliation[
]
]
.col-5[
.logo[
<!-- -->
]
Slides:<br>
[laxmikants.github.io/datasets/slides](https://laxmikants.github.io/datasets/slides/Plantgrowth.html#1)

Materials:<br>
[github.com/laxmikants/datasets](https://github.com/laxmikants/datasets)
]
]

---
class: inverse, center, middle

# Getting the Dataset

---
class: body

# 1. Getting the dataset

--

.pull-left[
[1] import `quandl`
[2] `quandl`.ApiConfig.api_key = "YBM9uPgpnsaDYPkAn539"
[3] `dataset` = `quandl`.get("NSE/INFY", start_date = "2000-01-01", end_date = "2020-01-16")
[4] `dataset` = `dataset`.dropna()
[5] `dataset` = `dataset`[['Open', 'High', 'Low', 'Close']]
]

--

.pull-right[
| Date | Open | High | Low | Close |
|------------|--------|-------|--------|--------|
| 2018-12-31 | 660.00 | 662.0 | 655.80 | 658.95 |
| 2019-01-01 | 660.95 | 666.3 | 654.15 | 665.05 |

> Where:
> 👉 Date: The trading date
> 👉 Open: The opening price of the stock on that date
> 👉 High: The highest price of the stock on that date
> 👉 Low: The lowest price of the stock on that date
> 👉 Close: The closing price of the stock on that date
]

|Quandl |
|-----------------------|
|Quandl is a marketplace for financial, economic and alternative data delivered in modern formats for today's analysts, including Python, Excel, MATLAB and R|

---
class: inverse, center, middle

# Preparing the Dataset

---
class: body

# 2. Preparing the dataset

--

.pull-left[
[1] import talib
[2] `dataset`['H-L'] = `dataset`['High'] - `dataset`['Low']
[3] `dataset`['O-C'] = `dataset`['Close'] - `dataset`['Open']
[4] `dataset`['3day MA'] = `dataset`['Close'].shift(1).rolling(window = 3).`mean()`
[5] `dataset`['10day MA'] = `dataset`['Close'].shift(1).rolling(window = 10).`mean()`
[6] `dataset`['30day MA'] = `dataset`['Close'].shift(1).rolling(window = 30).`mean()`
[7] `dataset`['Std_dev'] = `dataset`['Close'].rolling(5).`std()`
[8] `dataset`['RSI'] = talib.`RSI`(`dataset`['Close'].values, timeperiod = 9)
[9] `dataset`['Williams %R'] = talib.`WILLR`(dataset['High'].values, dataset['Low'].values, dataset['Close'].values, 7)
]

--

.pull-right[
| Date | 3day MA | ... | RSI | Williams %R |
|------------|------------|-----|----------|-------------|
| 2018-12-31 | 652.600000 | ... | 6.352854 | 34 |
| 2019-01-01 | 657.566667 | ... | 6.352854 | 51 |

> 👉 High minus Low price
> 👉 Close minus Open price
> 👉 Three-day moving average of the previous days' closing prices
> 👉 Ten-day moving average of the previous days' closing prices
> 👉 30-day moving average of the previous days' closing prices
> 👉 Standard deviation of the closing price over a period of 5 days
> 👉 Relative Strength Index (9-day): measures the strength or weakness of a stock or market based on the closing prices of a recent trading period
> 👉 Williams %R (7-day): a technical analysis oscillator showing the current closing price in relation to the high and low of the past N days
]

---
class: body

# 2. Preparing the dataset... cont.

--

.pull-left[
[1] import numpy as np
[2] dataset['Price_Rise'] = np.where(dataset['Close'].shift(-1) > dataset['Close'], 1, 0)
[3] dataset = dataset.dropna()
[4] X = dataset.iloc[:, 4:-1]
[5] y = dataset.iloc[:, -1]
]

--

.pull-right[
> 👉 Define the output value as price rise, a binary variable storing 1 when tomorrow's closing price is greater than today's closing price and 0 otherwise.
> 👉 Clean the dataset by dropping rows with null values
> 👉 Dataframe X stores the input features (every column after Open, High, Low and Close, except the last)
> 👉 Series y stores the value we want to predict, i.e. the price rise (the last column)
]

---
class: inverse, center, middle

# Splitting the Dataset

---
class: body

# 3. Splitting the dataset

--

.pull-left[
[1] split = int(len(dataset)*0.8)
[2] X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]
]

--

.pull-right[
> 👉 Train Set: the first 80% of the rows, used to train the model
> 👉 Test Set: the remaining 20% of the rows, used to evaluate the model on data it has not seen
]

---
class: inverse, center, middle

# Feature Scaling

---
class: body

# 4. Feature scaling

--

.pull-left[
[1] from sklearn.preprocessing import StandardScaler
[2] sc = StandardScaler()
[3] X_train = sc.fit_transform(X_train)
[4] X_test = sc.transform(X_test)
]

--

.pull-right[
> 👉 makes the mean of each input feature equal to zero
> 👉 scales the variance of each input feature to 1
> 👉 ensures that there is no bias while training the model due to the different scales of the input features
> 👉 the scaler is fit on the train set only and then applied unchanged to the test set
]

---
class: inverse, center, middle

# Building the Artificial Neural Network

---
class: body

# 5. Building the Artificial Neural Network

--

.pull-left[
[1] from keras.models import Sequential
[2] from keras.layers import Dense
[3] from keras.layers import Dropout
[4] classifier = Sequential()
]

--

.pull-right[
> 👉 `Sequential` lets us build the neural network one layer at a time
]

---
class: inverse, center, middle

# Building the Artificial Neural Network

---
class: body

# 5. Building the Artificial Neural Network...

--

.pull-left[
[1] classifier.add(Dense(units = 128, kernel_initializer = 'uniform', activation = 'relu', input_dim = X.shape[1]))
[2] classifier.add(Dense(units = 128, kernel_initializer = 'uniform', activation = 'relu'))
[3] classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
]

--

.pull-right[
> 👉 Units: This defines the number of nodes or neurons in that particular layer.
> 👉 Kernel_initializer: This defines the starting values for the weights of the neurons in the layer
> 👉 Activation: This is the activation function for the neurons in the layer
> 👉 Input_dim: This defines the number of inputs to the first hidden layer; we set it equal to the number of columns of our input feature dataframe
]

---
class: inverse, center, middle

# Building the Artificial Neural Network

---
class: body

# 5. Building the Artificial Neural Network...

--

.pull-left[
[1] classifier.compile(optimizer = 'adam', loss = 'mean_squared_error', metrics = ['accuracy'])
[2] classifier.fit(X_train, y_train, batch_size = 10, epochs = 100)
]

--

.pull-right[
> 👉 Optimizer: The optimizer is chosen to be 'adam', which is an extension of stochastic gradient descent.
> 👉 Loss: This defines the loss to be minimized during training. We define this loss to be the mean squared error.
> 👉 Metrics: This defines the list of metrics to be evaluated by the model during the training and testing phases. We have chosen accuracy as our evaluation metric.
> 👉 Fit the neural network that we have created to our train datasets, in batches of 10 rows for 100 epochs
]

---
class: inverse, center, middle

# Predicting the Movement of a Stock

---
class: body

# 6. Predicting the Movement of a Stock...
--

.pull-left[
[1] y_pred = classifier.predict(X_test)
[2] y_pred = (y_pred > 0.5)
[3] dataset['y_pred'] = np.nan
[4] dataset.iloc[(len(dataset) - len(y_pred)):, -1:] = y_pred
[5] trade_dataset = dataset.dropna().copy()
]

--

.pull-right[
> 👉 use the predict() method for making the prediction
> 👉 store the result in a variable named y_pred
> 👉 convert y_pred to binary values by storing the result of the condition y_pred > 0.5
> 👉 create a new column in the dataframe dataset with the column header 'y_pred' and store NaN values in the column
> 👉 store the values of y_pred into this new column, starting from the first row of the test dataset
> 👉 drop the rows that still hold NaN (the training period) and keep the rest as trade_dataset
]

---
class: inverse, center, middle

# Computing Strategy Returns

---
class: body

# 7. Computing Strategy Returns

--

.pull-left[
[1] trade_dataset['Tomorrows Returns'] = 0.
[2] trade_dataset['Tomorrows Returns'] = np.log(trade_dataset['Close']/trade_dataset['Close'].shift(1))
[3] trade_dataset['Tomorrows Returns'] = trade_dataset['Tomorrows Returns'].shift(-1)
]

--

.pull-right[
> 👉 take a long position when the predicted value of y is True and a short position when it is False
> 👉 the return is first computed as the log of today's closing price divided by yesterday's closing price
> 👉 shift these values upwards by one element so that tomorrow's returns are stored against the prices of today
]

---
class: inverse, center, middle

# Computing Strategy Returns

---
class: body

# 7. Computing Strategy Returns...

--

.pull-left[
[1] trade_dataset['Strategy Returns'] = 0.
[2] trade_dataset['Strategy Returns'] = np.where(trade_dataset['y_pred'] == True, trade_dataset['Tomorrows Returns'], - trade_dataset['Tomorrows Returns'])
]

--

.pull-right[
> 👉 compute the Strategy Returns
> 👉 if 'y_pred' is True (a long position), store the value of 'Tomorrows Returns' in the 'Strategy Returns' column; otherwise (a short position), store its negative
]

---
class: inverse, center, middle

# Computing Strategy Returns

---
class: body

# 7. Computing Strategy Returns...

--

.pull-left[
[1] trade_dataset['Cumulative Market Returns'] = np.cumsum(trade_dataset['Tomorrows Returns'])
[2] trade_dataset['Cumulative Strategy Returns'] = np.cumsum(trade_dataset['Strategy Returns'])
]

--

.pull-right[
> 👉 compute the cumulative returns for both the market and the strategy
> 👉 log returns are additive, so the cumulative sum gives the series we plot for market and strategy returns
]

---
class: inverse, center, middle

# Plotting the graph of the results

---
class: body

# 8. Plotting the graph of the results

--

.pull-left[
[1] import matplotlib.pyplot as plt
[2] plt.figure(figsize=(10,5))
[3] plt.plot(trade_dataset['Cumulative Market Returns'], color='r', label='Market Returns')
[4] plt.plot(trade_dataset['Cumulative Strategy Returns'], color='g', label='Strategy Returns')
[5] plt.legend()
[6] plt.show()
]

--

.pull-right[
> 👉 plot the market returns and our strategy returns to visualize how our strategy is performing against the market
]

---
class: inverse, center, middle

# Thanks
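---
class: body

# Appendix: checking the return arithmetic

The strategy-return steps from the Computing Strategy Returns slides can be checked by hand on a few prices. This is a minimal, dependency-free sketch, not the deck's pandas code: the prices, the predictions, and the helper names `log_returns` and `strategy_returns` are hypothetical and chosen only for illustration.

```python
import math
from itertools import accumulate


def log_returns(closes):
    """Return ln(C[i+1] / C[i]) for each consecutive pair of closes.

    Entry i is the return earned from day i to day i+1, i.e. "tomorrow's
    return" as seen from day i (the alignment that shift(-1) produces).
    """
    return [math.log(nxt / cur) for cur, nxt in zip(closes, closes[1:])]


def strategy_returns(tomorrows_returns, predictions):
    """Long (+r) when a rise is predicted, short (-r) otherwise."""
    return [r if pred else -r for r, pred in zip(tomorrows_returns, predictions)]


# Hypothetical closing prices, for illustration only.
closes = [100.0, 102.0, 101.0, 104.0]
# predictions[i]: the guess made on day i that day i+1 closes higher (hypothetical).
predictions = [True, False, True]

tomorrows_returns = log_returns(closes)
cumulative_market = list(accumulate(tomorrows_returns))
cumulative_strategy = list(
    accumulate(strategy_returns(tomorrows_returns, predictions))
)

print("market  :", cumulative_market[-1])
print("strategy:", cumulative_strategy[-1])
```

Because log returns add up, the final cumulative market return equals the log of the last close divided by the first close, which gives a quick sanity check on the `np.cumsum` columns. In this toy series every prediction is correct, so the strategy collects the absolute value of each daily return.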