Stock Market Analysis Using a Python LSTM Model

Re-analysis of a Kaggle notebook: https://www.kaggle.com/code/faressayah/stock-market-analysis-prediction-using-lstm

1.) What was the change in price of the stock over time?
2.) What was the daily return of the stock on average?
3.) What was the moving average of the various stocks?
4.) What was the correlation between different stocks?
5.) How much value do we put at risk by investing in a particular stock?
6.) How can we attempt to predict future stock behavior? (Predicting the closing stock price of Google using an LSTM)

What was the change in price of the stock over time?

In [ ]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
plt.style.use("fivethirtyeight")
%matplotlib inline

# For reading stock data from yahoo
from pandas_datareader.data import DataReader
import yfinance as yf
from pandas_datareader import data as pdr

yf.pdr_override()

# For time stamps
from datetime import datetime

# The tech stocks we'll use for this analysis
tech_list = ['AAPL', 'GOOG', 'MSFT', 'AMZN']

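# Set up the start and end times for the data grab (the most recent year)

end = datetime.now()
# pd.DateOffset handles Feb 29, where datetime(end.year - 1, end.month, end.day) would raise
start = end - pd.DateOffset(years=1)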

for stock in tech_list:
    globals()[stock] = yf.download(stock, start, end)
    

company_list = [AAPL, GOOG, MSFT, AMZN] # type: ignore
company_name = ["APPLE", "GOOGLE", "MICROSOFT", "AMAZON"]

for company, com_name in zip(company_list, company_name):
    company["company_name"] = com_name
    
df = pd.concat(company_list, axis=0)
df.tail(10)
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
Out[ ]:
Open High Low Close Adj Close Volume company_name
Date
2024-04-15 187.429993 188.690002 183.000000 183.619995 183.619995 48052400 AMAZON
2024-04-16 183.270004 184.830002 182.259995 183.320007 183.320007 32891300 AMAZON
2024-04-17 184.309998 184.570007 179.820007 181.279999 181.279999 31359700 AMAZON
2024-04-18 181.470001 182.389999 178.649994 179.220001 179.220001 30723800 AMAZON
2024-04-19 178.740005 179.000000 173.440002 174.630005 174.630005 55950000 AMAZON
2024-04-22 176.940002 178.869995 174.559998 177.229996 177.229996 37924900 AMAZON
2024-04-23 178.080002 179.929993 175.979996 179.539993 179.539993 37046500 AMAZON
2024-04-24 179.940002 180.320007 176.179993 176.589996 176.589996 34185100 AMAZON
2024-04-25 169.679993 173.919998 166.320007 173.669998 173.669998 49249400 AMAZON
2024-04-26 177.800003 180.820007 176.130005 179.619995 179.619995 42033000 AMAZON
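The loop above stores each ticker's DataFrame in a module-level global through `globals()`. A dictionary keyed by ticker is a common alternative that avoids the dynamic names and the `# type: ignore` comment. The sketch below is only an illustration: it uses small hand-made frames in place of `yf.download` so it runs offline, and the names `frames`, `names`, `tagged`, and `combined` and all prices are assumptions, not part of the original notebook.

```python
import pandas as pd

# Toy stand-ins for the frames returned by yf.download (values are made up).
dates = pd.date_range("2024-01-01", periods=3, freq="D")
frames = {
    "AAPL": pd.DataFrame({"Adj Close": [1.0, 2.0, 3.0]}, index=dates),
    "GOOG": pd.DataFrame({"Adj Close": [4.0, 5.0, 6.0]}, index=dates),
}
names = {"AAPL": "APPLE", "GOOG": "GOOGLE"}

# Tag each frame with its company name, then stack them row-wise.
tagged = [frame.assign(company_name=names[ticker]) for ticker, frame in frames.items()]
combined = pd.concat(tagged, axis=0)

print(combined.shape)
print(combined["company_name"].unique())
```

Individual stocks stay reachable as `frames["AAPL"]`, and the list passed to `pd.concat` is built from the same dictionary, so the ticker order and the company names cannot drift apart.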
In [ ]:
print(df.head())
print(df.info())
print(df["company_name"].unique())
                  Open        High         Low       Close   Adj Close  \
Date                                                                     
2023-04-28  168.490005  169.850006  167.880005  169.679993  168.779099   
2023-05-01  169.279999  170.449997  168.639999  169.589996  168.689575   
2023-05-02  170.089996  170.350006  167.539993  168.539993  167.645157   
2023-05-03  169.500000  170.919998  167.160004  167.449997  166.560944   
2023-05-04  164.889999  167.039993  164.309998  165.789993  164.909760   

              Volume company_name  
Date                               
2023-04-28  55209200        APPLE  
2023-05-01  52472900        APPLE  
2023-05-02  48425700        APPLE  
2023-05-03  65136000        APPLE  
2023-05-04  81235400        APPLE  
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1004 entries, 2023-04-28 to 2024-04-26
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Open          1004 non-null   float64
 1   High          1004 non-null   float64
 2   Low           1004 non-null   float64
 3   Close         1004 non-null   float64
 4   Adj Close     1004 non-null   float64
 5   Volume        1004 non-null   int64  
 6   company_name  1004 non-null   object 
dtypes: float64(5), int64(1), object(1)
memory usage: 62.8+ KB
None
['APPLE' 'GOOGLE' 'MICROSOFT' 'AMAZON']

Information About the APPLE Data

In [ ]:
# Summary Stats
AAPL.describe()
Out[ ]:
Open High Low Close Adj Close Volume
count 251.000000 251.000000 251.000000 251.000000 251.000000 2.510000e+02
mean 181.169682 182.628924 179.809004 181.282909 180.891567 5.781501e+07
std 8.735188 8.594316 8.702606 8.674535 8.639346 1.765801e+07
min 164.889999 166.399994 164.080002 165.000000 164.909760 2.404830e+07
25% 173.240005 174.905006 172.050003 173.690002 173.265129 4.678545e+07
50% 180.669998 182.229996 178.550003 180.710007 180.238220 5.366560e+07
75% 189.294998 189.990005 187.695000 189.334999 188.909805 6.401530e+07
max 198.020004 199.619995 197.000000 198.110001 197.857529 1.366826e+08
In [ ]:
AAPL.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 251 entries, 2023-04-28 to 2024-04-26
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Open          251 non-null    float64
 1   High          251 non-null    float64
 2   Low           251 non-null    float64
 3   Close         251 non-null    float64
 4   Adj Close     251 non-null    float64
 5   Volume        251 non-null    int64  
 6   company_name  251 non-null    object 
dtypes: float64(5), int64(1), object(1)
memory usage: 15.7+ KB

Closing prices

In [ ]:
# Let's see a historical view of the closing price
plt.figure(figsize=(15, 10))
plt.subplots_adjust(top=1.25, bottom=1.2)

for i, company in enumerate(company_list, 1):
    plt.subplot(2, 2, i)
    company['Adj Close'].plot()
    plt.ylabel('Adj Close')
    plt.xlabel(None)
    plt.title(f"Closing Price of {tech_list[i - 1]}")
    
plt.tight_layout()
[Figure: adjusted closing price over the past year, one panel each for AAPL, GOOG, MSFT, and AMZN]

Volume of sales

In [ ]:
# Now let's plot the total volume of stock being traded each day
plt.figure(figsize=(15, 10))
plt.subplots_adjust(top=1.25, bottom=1.2)

for i, company in enumerate(company_list, 1):
    plt.subplot(2, 2, i)
    company['Volume'].plot()
    plt.ylabel('Volume')
    plt.xlabel(None)
    plt.title(f"Sales Volume for {tech_list[i - 1]}")
    
plt.tight_layout()
[Figure: daily trading volume over the past year, one panel each for AAPL, GOOG, MSFT, and AMZN]

What was the moving average of the various stocks?

In [ ]:
ma_day = [10, 20, 50]

for ma in ma_day:
    for company in company_list:
        column_name = f"MA for {ma} days"
        company[column_name] = company['Adj Close'].rolling(ma).mean()
        

fig, axes = plt.subplots(nrows=2, ncols=2)
fig.set_figheight(10)
fig.set_figwidth(15)

AAPL[['Adj Close', 'MA for 10 days', 'MA for 20 days', 'MA for 50 days']].plot(ax=axes[0,0])
axes[0,0].set_title('APPLE')

GOOG[['Adj Close', 'MA for 10 days', 'MA for 20 days', 'MA for 50 days']].plot(ax=axes[0,1])
axes[0,1].set_title('GOOGLE')

MSFT[['Adj Close', 'MA for 10 days', 'MA for 20 days', 'MA for 50 days']].plot(ax=axes[1,0])
axes[1,0].set_title('MICROSOFT')

AMZN[['Adj Close', 'MA for 10 days', 'MA for 20 days', 'MA for 50 days']].plot(ax=axes[1,1])
axes[1,1].set_title('AMAZON')

fig.tight_layout()
[Figure: adjusted closing price with 10-, 20-, and 50-day moving averages, one panel each for APPLE, GOOGLE, MICROSOFT, and AMAZON]
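To see what `rolling(ma).mean()` computes, here is a minimal sketch on a made-up five-value series (the prices and the 3-day window are assumptions chosen to keep the arithmetic easy to follow). Each moving-average value is the mean of the current row and the rows before it inside the window, so the first `window - 1` entries are NaN. That is also why the 50-day line starts later than the 10-day line on the plots.

```python
import pandas as pd

prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])

# A 3-day simple moving average: mean of the current value and the two before it.
ma3 = prices.rolling(3).mean()

# The first two entries are NaN because a full 3-value window is not available yet.
print(ma3.tolist())
```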

What was the daily return of the stock on average?

In [ ]:
# We'll use pct_change to find the percent change for each day
for company in company_list:
    company['Daily Return'] = company['Adj Close'].pct_change()

# Then we'll plot the daily return percentage
fig, axes = plt.subplots(nrows=2, ncols=2)
fig.set_figheight(10)
fig.set_figwidth(15)

AAPL['Daily Return'].plot(ax=axes[0,0], legend=True, linestyle='--', marker='o')
axes[0,0].set_title('APPLE')

GOOG['Daily Return'].plot(ax=axes[0,1], legend=True, linestyle='--', marker='o')
axes[0,1].set_title('GOOGLE')

MSFT['Daily Return'].plot(ax=axes[1,0], legend=True, linestyle='--', marker='o')
axes[1,0].set_title('MICROSOFT')

AMZN['Daily Return'].plot(ax=axes[1,1], legend=True, linestyle='--', marker='o')
axes[1,1].set_title('AMAZON')

fig.tight_layout()
[Figure: daily return over time, one panel each for APPLE, GOOGLE, MICROSOFT, and AMAZON]
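As a quick check of what `pct_change()` returns, the sketch below compares it with the explicit formula (price today / price yesterday - 1) on three made-up prices. Note that the result is a fraction, so a value of about 0.1 means a 10% rise, and the first row is NaN because it has no previous price.

```python
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0])

# pct_change() is the fractional change from the previous row.
returns = prices.pct_change()

# The same quantity written out explicitly with shift().
manual = prices / prices.shift(1) - 1

print(returns.tolist())
```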
In [ ]:
plt.figure(figsize=(12, 9))

for i, company in enumerate(company_list, 1):
    plt.subplot(2, 2, i)
    company['Daily Return'].hist(bins=50)
    plt.xlabel('Daily Return')
    plt.ylabel('Counts')
    plt.title(f'{company_name[i - 1]}')
    
plt.tight_layout()
[Figure: histograms of daily return, one panel each for APPLE, GOOGLE, MICROSOFT, and AMAZON]

What was the correlation between different stocks' closing prices?

In [ ]:
# Grab all the closing prices for the tech stock list into one DataFrame

closing_df = pdr.get_data_yahoo(tech_list, start=start, end=end)['Adj Close']

# Make a new tech returns DataFrame
tech_rets = closing_df.pct_change()
tech_rets.head()
[*********************100%%**********************]  4 of 4 completed
Out[ ]:
Ticker AAPL AMZN GOOG MSFT
Date
2023-04-28 NaN NaN NaN NaN
2023-05-01 -0.000530 -0.032243 -0.004713 -0.005533
2023-05-02 -0.006191 0.015483 -0.016062 -0.000491
2023-05-03 -0.006467 0.000193 0.001321 -0.003307
2023-05-04 -0.009913 0.003377 -0.008575 0.003318

Now we can compare the daily percentage returns of two stocks to check how correlated they are. First, let's see a stock compared to itself.

In [ ]:
# Comparing Google to itself should show a perfectly linear relationship
sns.jointplot(x='GOOG', y='GOOG', data=tech_rets, kind='scatter', color='seagreen')
/Users/anaconda3/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead.
  with pd.option_context('mode.use_inf_as_na', True):
Out[ ]:
<seaborn.axisgrid.JointGrid at 0x158d79550>
[Figure: joint plot of GOOG daily returns against themselves]
In [ ]:
# We'll use jointplot to compare the daily returns of Google and Microsoft
sns.jointplot(x='GOOG', y='MSFT', data=tech_rets, kind='scatter')
Out[ ]:
<seaborn.axisgrid.JointGrid at 0x1589a2890>
[Figure: joint plot of GOOG daily returns against MSFT daily returns]
In [ ]:
# We can simply call pairplot on our DataFrame for an automatic visual analysis 
# of all the comparisons

sns.pairplot(tech_rets, kind='reg')
Out[ ]:
<seaborn.axisgrid.PairGrid at 0x158707250>
[Figure: pair plot of daily returns for AAPL, AMZN, GOOG, and MSFT with regression lines]
In [ ]:
# Set up our figure, named return_fig, by calling PairGrid on the returns DataFrame
return_fig = sns.PairGrid(tech_rets.dropna())

# Using map_upper we can specify what the upper triangle will look like.
return_fig.map_upper(plt.scatter, color='purple')

# We can also define the lower triangle in the figure, including the plot type (kde)
# and the color map ('cool_d')
return_fig.map_lower(sns.kdeplot, cmap='cool_d')

# Finally we'll define the diagonal as a series of histogram plots of the daily return
return_fig.map_diag(plt.hist, bins=30)
Out[ ]:
<seaborn.axisgrid.PairGrid at 0x158af9c90>
[Figure: pair grid of daily returns with scatter plots above the diagonal, KDE plots below it, and histograms on the diagonal]
In [ ]:
# Repeat the same grid for the closing prices: name the figure returns_fig and call PairGrid on closing_df
returns_fig = sns.PairGrid(closing_df)

# Using map_upper we can specify what the upper triangle will look like.
returns_fig.map_upper(plt.scatter, color='purple')

# We can also define the lower triangle in the figure, including the plot type (kde) and the color map ('cool_d')
returns_fig.map_lower(sns.kdeplot, cmap='cool_d')

# Finally we'll define the diagonal as a series of histogram plots of the closing price
returns_fig.map_diag(plt.hist, bins=30)
Out[ ]:
<seaborn.axisgrid.PairGrid at 0x159a52b90>
[Figure: pair grid of adjusted closing prices with scatter plots above the diagonal, KDE plots below it, and histograms on the diagonal]
In [ ]:
plt.figure(figsize=(12, 10))

plt.subplot(2, 2, 1)
sns.heatmap(tech_rets.corr(), annot=True, cmap='summer')
plt.title('Correlation of stock return')

plt.subplot(2, 2, 2)
sns.heatmap(closing_df.corr(), annot=True, cmap='summer')
plt.title('Correlation of stock closing price')
Out[ ]:
Text(0.5, 1.0, 'Correlation of stock closing price')
[Figure: correlation heatmaps of stock returns (left) and stock closing prices (right)]
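The two heatmaps answer different questions, and the price correlations are usually much higher than the return correlations. One reason is that prices which simply trend in the same direction are strongly correlated even when their day-to-day moves are not. The sketch below illustrates this with synthetic data. The sine and cosine "returns" are an assumption chosen to be linearly uncorrelated with a shared upward drift, not real market data, and `np.corrcoef` stands in for the Pearson correlation that `DataFrame.corr()` computes by default.

```python
import numpy as np

t = np.arange(200)
# Synthetic daily returns: same upward drift, linearly uncorrelated wiggles.
ret_a = 0.01 + 0.005 * np.sin(t)
ret_b = 0.01 + 0.005 * np.cos(t)

# Price paths implied by compounding those returns from a starting price of 100.
price_a = 100 * np.cumprod(1 + ret_a)
price_b = 100 * np.cumprod(1 + ret_b)

# Pearson correlation of the returns versus the prices.
corr_returns = np.corrcoef(ret_a, ret_b)[0, 1]
corr_prices = np.corrcoef(price_a, price_b)[0, 1]

print(f"returns: {corr_returns:.3f}, prices: {corr_prices:.3f}")
```

The return correlation comes out close to zero while the price correlation is close to one, which is why the return heatmap is usually the more informative of the two when judging how stocks move together.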

How much value do we put at risk by investing in a particular stock?

In [ ]:
rets = tech_rets.dropna()

area = np.pi * 20

plt.figure(figsize=(10, 8))
plt.scatter(rets.mean(), rets.std(), s=area)
plt.xlabel('Expected return')
plt.ylabel('Risk')

for label, x, y in zip(rets.columns, rets.mean(), rets.std()):
    plt.annotate(label, xy=(x, y), xytext=(50, 50), textcoords='offset points', ha='right', va='bottom', 
                 arrowprops=dict(arrowstyle='-', color='blue', connectionstyle='arc3,rad=-0.3'))
[Figure: scatter plot of risk (standard deviation of daily returns) against expected return (mean daily return) for each stock]
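The scatter plot uses the mean of the daily returns as the expected return and their standard deviation as the risk. The sketch below shows both on made-up returns for two hypothetical tickers (`STEADY` and `JUMPY` are invented names) that share the same average return but differ in risk. It also shows one detail worth knowing: pandas uses the sample standard deviation (`ddof=1`) by default, while NumPy defaults to `ddof=0`.

```python
import numpy as np
import pandas as pd

# Made-up daily returns for two hypothetical tickers.
rets = pd.DataFrame({
    "STEADY": [0.01, 0.01, 0.01, 0.01],
    "JUMPY": [0.05, -0.03, 0.04, -0.02],
})

expected = rets.mean()  # average daily return per column
risk = rets.std()       # sample standard deviation (ddof=1) per column

# NumPy's default is the population standard deviation (ddof=0),
# so pass ddof=1 to reproduce the pandas value.
risk_np = np.std(rets["JUMPY"].to_numpy(), ddof=1)

print(expected)
print(risk)
```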

Predicting the closing stock price of Google Inc.

In [ ]:
# Get the stock quote
df = pdr.get_data_yahoo('GOOG', start='2012-01-01', end=datetime.now())
# Show the data
df
[*********************100%%**********************]  1 of 1 completed
Out[ ]:
Open High Low Close Adj Close Volume
Date
2012-01-03 16.262545 16.641375 16.248346 16.573130 16.573130 147611217
2012-01-04 16.563665 16.693678 16.453827 16.644611 16.644611 114989399
2012-01-05 16.491436 16.537264 16.344486 16.413727 16.413727 131808205
2012-01-06 16.417213 16.438385 16.184088 16.189817 16.189817 108119746
2012-01-09 16.102144 16.114599 15.472754 15.503389 15.503389 233776981
... ... ... ... ... ... ...
2024-04-22 156.009995 159.184998 155.660004 157.949997 157.949997 17243900
2024-04-23 158.589996 160.479996 157.964996 159.919998 159.919998 16115400
2024-04-24 159.089996 161.389999 158.820007 161.100006 161.100006 19485700
2024-04-25 153.360001 158.279999 152.768005 157.949997 157.949997 36197800
2024-04-26 175.990005 176.419998 171.399994 173.690002 173.690002 55186700

3099 rows × 6 columns

In [ ]:
plt.figure(figsize=(16,6))
plt.title('Close Price History')
plt.plot(df['Close'])
plt.xlabel('Date', fontsize=18)
plt.ylabel('Close Price USD ($)', fontsize=18)
plt.show()
[Figure: GOOG close price history, Date against Close Price USD ($)]
In [ ]:
# Create a new dataframe with only the 'Close' column
data = df.filter(['Close'])
# Convert the dataframe to a numpy array
dataset = data.values
# Get the number of rows to train the model on
training_data_len = int(np.ceil( len(dataset) * .95 ))

training_data_len
Out[ ]:
2945
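The split is chronological: the first 95% of the rows are used for training and the remaining rows are held out. A short sketch of the arithmetic, using the 3099 rows reported for the downloaded frame above (the name `n_test` is an illustrative assumption):

```python
import math

n_rows = 3099  # rows in the downloaded frame shown above

# Round up so a fractional row count still yields a whole number of training rows.
training_data_len = math.ceil(n_rows * 0.95)

# Everything after the training rows is held out for evaluation.
n_test = n_rows - training_data_len

print(training_data_len, n_test)
```

This gives 2945 training rows and 154 held-out rows, matching the value printed above and the 154 rows in the final table of predictions.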
In [ ]:
# Scale the data
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0,1))
scaled_data = scaler.fit_transform(dataset)

scaled_data
Out[ ]:
array([[0.01658095],
       [0.01702836],
       [0.01558322],
       ...,
       [0.92119725],
       [0.90148085],
       [1.        ]])
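With `feature_range=(0, 1)`, `MinMaxScaler` maps each value x to (x - min) / (max - min). In the cell above the scaler is fitted on the whole dataset, so the minimum and maximum also reflect the later rows that are used for testing. The sketch below is one possible variation, not what the notebook does: it takes the statistics from the training rows only and is written in plain NumPy so the formula is visible. The toy prices and the helper names `scale` and `unscale` are assumptions.

```python
import numpy as np

prices = np.array([[10.0], [12.0], [14.0], [20.0]])
train = prices[:3]  # pretend the last row is held out for testing

# Min-max statistics taken from the training rows only.
lo, hi = train.min(axis=0), train.max(axis=0)

def scale(x):
    """Map x so that the training minimum becomes 0 and the training maximum becomes 1."""
    return (x - lo) / (hi - lo)

def unscale(z):
    """Invert scale(), returning values in the original price units."""
    return z * (hi - lo) + lo

scaled = scale(prices)
print(scaled.ravel())
```

Values beyond the training range then fall outside [0, 1] (here the held-out price of 20 maps to 2.5), which is the price of keeping test information out of the fit.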
In [ ]:
# Create the scaled training data set
train_data = scaled_data[0:int(training_data_len), :]
# Split the data into x_train and y_train data sets
x_train = []
y_train = []

for i in range(60, len(train_data)):
    x_train.append(train_data[i-60:i, 0])
    y_train.append(train_data[i, 0])
    if i<= 61:
        print(x_train)
        print(y_train)
        print()
        
# Convert the x_train and y_train to numpy arrays 
x_train, y_train = np.array(x_train), np.array(y_train)

# Reshape the data
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
# x_train.shape
[array([0.01658095, 0.01702836, 0.01558322, 0.01418174, 0.00988528,
       0.00999128, 0.01043091, 0.0110046 , 0.01027969, 0.01083935,
       0.01151438, 0.01255263, 0.0041998 , 0.00412653, 0.00341098,
       0.00162754, 0.00141084, 0.00326288, 0.00290588, 0.00328314,
       0.00339539, 0.00406261, 0.00581175, 0.00780097, 0.00743929,
       0.00791945, 0.00817043, 0.00730522, 0.00828579, 0.00790541,
       0.00725066, 0.00740032, 0.00710724, 0.00856641, 0.00762169,
       0.0073364 , 0.00792724, 0.00783526, 0.00925078, 0.00922896,
       0.00987592, 0.00969664, 0.00860538, 0.00715712, 0.00744396,
       0.00749698, 0.00642286, 0.00718674, 0.00915568, 0.00887664,
       0.00967794, 0.01028749, 0.01168118, 0.01160479, 0.01261655,
       0.01356283, 0.01302344, 0.01407416, 0.01371405, 0.01507657])]
[0.013930743734310483]

[array([0.01658095, 0.01702836, 0.01558322, 0.01418174, 0.00988528,
       0.00999128, 0.01043091, 0.0110046 , 0.01027969, 0.01083935,
       0.01151438, 0.01255263, 0.0041998 , 0.00412653, 0.00341098,
       0.00162754, 0.00141084, 0.00326288, 0.00290588, 0.00328314,
       0.00339539, 0.00406261, 0.00581175, 0.00780097, 0.00743929,
       0.00791945, 0.00817043, 0.00730522, 0.00828579, 0.00790541,
       0.00725066, 0.00740032, 0.00710724, 0.00856641, 0.00762169,
       0.0073364 , 0.00792724, 0.00783526, 0.00925078, 0.00922896,
       0.00987592, 0.00969664, 0.00860538, 0.00715712, 0.00744396,
       0.00749698, 0.00642286, 0.00718674, 0.00915568, 0.00887664,
       0.00967794, 0.01028749, 0.01168118, 0.01160479, 0.01261655,
       0.01356283, 0.01302344, 0.01407416, 0.01371405, 0.01507657]), array([0.01702836, 0.01558322, 0.01418174, 0.00988528, 0.00999128,
       0.01043091, 0.0110046 , 0.01027969, 0.01083935, 0.01151438,
       0.01255263, 0.0041998 , 0.00412653, 0.00341098, 0.00162754,
       0.00141084, 0.00326288, 0.00290588, 0.00328314, 0.00339539,
       0.00406261, 0.00581175, 0.00780097, 0.00743929, 0.00791945,
       0.00817043, 0.00730522, 0.00828579, 0.00790541, 0.00725066,
       0.00740032, 0.00710724, 0.00856641, 0.00762169, 0.0073364 ,
       0.00792724, 0.00783526, 0.00925078, 0.00922896, 0.00987592,
       0.00969664, 0.00860538, 0.00715712, 0.00744396, 0.00749698,
       0.00642286, 0.00718674, 0.00915568, 0.00887664, 0.00967794,
       0.01028749, 0.01168118, 0.01160479, 0.01261655, 0.01356283,
       0.01302344, 0.01407416, 0.01371405, 0.01507657, 0.01393074])]
[0.013930743734310483, 0.01281297586807846]
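The loop above turns the series into supervised samples: each input is a window of 60 consecutive scaled prices and the target is the value that follows it. A compact sketch of the same idea on a tiny series is shown here. The helper name `make_windows` and the lookback of 3 are assumptions for illustration only.

```python
import numpy as np

def make_windows(series, lookback):
    """Return (X, y): X[k] holds `lookback` consecutive values, y[k] the value after them."""
    X, y = [], []
    for i in range(lookback, len(series)):
        X.append(series[i - lookback:i])
        y.append(series[i])
    # LSTM layers expect input shaped (samples, timesteps, features).
    X = np.array(X).reshape(-1, lookback, 1)
    return X, np.array(y)

series = np.arange(10, dtype=float)  # 0.0, 1.0, ..., 9.0
X, y = make_windows(series, lookback=3)

print(X.shape, y.shape)     # (7, 3, 1) (7,)
print(X[0].ravel(), y[0])   # [0. 1. 2.] 3.0
```

A series of length n yields n - lookback samples, so 2945 training rows with a 60-day window give 2885 samples.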

In [ ]:
from keras.models import Sequential
from keras.layers import Dense, Input, LSTM

# Build the LSTM model
model = Sequential()
# Declare the input shape (60 time steps, 1 feature) with an explicit Input layer
model.add(Input(shape=(x_train.shape[1], 1)))
model.add(LSTM(128, return_sequences=True))
model.add(LSTM(64, return_sequences=False))
model.add(Dense(25))
model.add(Dense(1))

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
model.fit(x_train, y_train, batch_size=1, epochs=1)
2885/2885 ━━━━━━━━━━━━━━━━━━━━ 53s 17ms/step - loss: 0.0016
Out[ ]:
<keras.src.callbacks.history.History at 0x16ba34ad0>
In [ ]:
# Create the testing data set
# Create a new array containing the scaled values from index training_data_len - 60 to the end,
# so that the first test window has a full 60 days of history
test_data = scaled_data[training_data_len - 60: , :]
# Create the data sets x_test and y_test
x_test = []
y_test = dataset[training_data_len:, :]
for i in range(60, len(test_data)):
    x_test.append(test_data[i-60:i, 0])
    
# Convert the data to a numpy array
x_test = np.array(x_test)

# Reshape the data
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1 ))

# Get the model's predicted price values
predictions = model.predict(x_test)
predictions = scaler.inverse_transform(predictions)

# Get the root mean squared error (RMSE)
rmse = np.sqrt(np.mean(((predictions - y_test) ** 2)))
rmse
5/5 ━━━━━━━━━━━━━━━━━━━━ 1s 73ms/step
Out[ ]:
6.551480523450407
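RMSE is the square root of the mean squared difference between predictions and actual values, so it is expressed in the same units as the price (USD here). The calculation above works because `predictions` and `y_test` are both two-dimensional with one column. The sketch below, on made-up numbers, shows the formula and the shape pitfall to watch for if the targets were ever flattened to one dimension. The helper name `rmse` is an assumption.

```python
import numpy as np

predictions = np.array([[1.0], [2.0], [3.0]])  # shape (3, 1), like model.predict output
y_true_2d = np.array([[1.0], [2.0], [5.0]])    # shape (3, 1)
y_true_1d = y_true_2d.ravel()                  # shape (3,)

def rmse(pred, actual):
    """Root mean squared error of two arrays with matching shapes."""
    return float(np.sqrt(np.mean((pred - actual) ** 2)))

good = rmse(predictions, y_true_2d)

# (3, 1) minus (3,) broadcasts to a (3, 3) matrix of all pairwise differences,
# which would silently give a wrong error value.
bad_shape = (predictions - y_true_1d).shape

print(good, bad_shape)
```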
In [ ]:
# Plot the data
train = data[:training_data_len]
# Copy the slice so that adding a column does not trigger SettingWithCopyWarning
valid = data[training_data_len:].copy()
valid['Predictions'] = predictions
# Visualize the data
plt.figure(figsize=(16,6))
plt.title('Model')
plt.xlabel('Date', fontsize=18)
plt.ylabel('Close Price USD ($)', fontsize=18)
plt.plot(train['Close'])
plt.plot(valid[['Close', 'Predictions']])
plt.legend(['Train', 'Val', 'Predictions'], loc='lower right')
plt.show()
[Figure: training, validation, and predicted close prices, Date against Close Price USD ($)]
In [ ]:
#Show the valid and predicted prices
valid
Out[ ]:
Close Predictions
Date
2023-09-18 177.970001 184.988815
2023-09-19 179.070007 184.812439
2023-09-20 175.490005 185.066147
2023-09-21 173.929993 184.940063
2023-09-22 174.789993 184.476044
... ... ...
2024-04-22 165.839996 177.382996
2024-04-23 166.899994 176.453674
2024-04-24 169.020004 175.887527
2024-04-25 169.889999 175.859253
2024-04-26 169.300003 176.175049

154 rows × 2 columns

Summary

In this notebook, you discovered and explored stock data.

Specifically, you learned:

- How to load stock market data from Yahoo Finance using yfinance.
- How to explore and visualize time-series data using Pandas, Matplotlib, and Seaborn.
- How to measure the correlation between stocks.
- How to measure the risk of investing in a particular stock.