Stock Market Analysis Using an LSTM Model in Python¶
Re-analysis of the Kaggle notebook: https://www.kaggle.com/code/faressayah/stock-market-analysis-prediction-using-lstm
1.) What was the change in price of the stock over time?
2.) What was the daily return of the stock on average?
3.) What was the moving average of the various stocks?
4.) What was the correlation between different stocks?
5.) How much value do we put at risk by investing in a particular stock?
6.) How can we attempt to predict future stock behavior? (Predicting the closing stock price of Google (GOOG) using LSTM)
What was the change in price of the stock over time?¶
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
plt.style.use("fivethirtyeight")
%matplotlib inline
# For reading stock data from yahoo
from pandas_datareader.data import DataReader
import yfinance as yf
from pandas_datareader import data as pdr
yf.pdr_override()
# For time stamps
from datetime import datetime
# The tech stocks we'll use for this analysis
tech_list = ['AAPL', 'GOOG', 'MSFT', 'AMZN']
# Set up End and Start times for data grab
end = datetime.now()
start = datetime(end.year - 1, end.month, end.day)
# Download one DataFrame per ticker and expose it under the ticker's name
for stock in tech_list:
    globals()[stock] = yf.download(stock, start, end)
company_list = [AAPL, GOOG, MSFT, AMZN] # type: ignore
company_name = ["APPLE", "GOOGLE", "MICROSOFT", "AMAZON"]
# Tag every row with its company so the frames can be stacked
for company, com_name in zip(company_list, company_name):
    company["company_name"] = com_name
df = pd.concat(company_list, axis=0)
df.tail(10)
[*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed [*********************100%%**********************] 1 of 1 completed
Date | Open | High | Low | Close | Adj Close | Volume | company_name |
---|---|---|---|---|---|---|---|
2024-04-15 | 187.429993 | 188.690002 | 183.000000 | 183.619995 | 183.619995 | 48052400 | AMAZON |
2024-04-16 | 183.270004 | 184.830002 | 182.259995 | 183.320007 | 183.320007 | 32891300 | AMAZON |
2024-04-17 | 184.309998 | 184.570007 | 179.820007 | 181.279999 | 181.279999 | 31359700 | AMAZON |
2024-04-18 | 181.470001 | 182.389999 | 178.649994 | 179.220001 | 179.220001 | 30723800 | AMAZON |
2024-04-19 | 178.740005 | 179.000000 | 173.440002 | 174.630005 | 174.630005 | 55950000 | AMAZON |
2024-04-22 | 176.940002 | 178.869995 | 174.559998 | 177.229996 | 177.229996 | 37924900 | AMAZON |
2024-04-23 | 178.080002 | 179.929993 | 175.979996 | 179.539993 | 179.539993 | 37046500 | AMAZON |
2024-04-24 | 179.940002 | 180.320007 | 176.179993 | 176.589996 | 176.589996 | 34185100 | AMAZON |
2024-04-25 | 169.679993 | 173.919998 | 166.320007 | 173.669998 | 173.669998 | 49249400 | AMAZON |
2024-04-26 | 177.800003 | 180.820007 | 176.130005 | 179.619995 | 179.619995 | 42033000 | AMAZON |
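The download cell above stores each frame through `globals()`. A minimal sketch of an alternative that keeps the frames in a dictionary is shown here. The `fake_download` helper is hypothetical and only stands in for `yf.download` so that the example runs without network access; the synthetic prices are not real market data.

```python
import numpy as np
import pandas as pd


def fake_download(ticker, periods=5):
    """Hypothetical stand-in for yf.download(): returns synthetic prices."""
    rng = np.random.default_rng(sum(ord(c) for c in ticker))
    index = pd.date_range("2024-01-01", periods=periods, freq="B", name="Date")
    close = 100 + rng.normal(0, 1, size=periods).cumsum()
    return pd.DataFrame({"Close": close, "Adj Close": close}, index=index)


# Map each ticker to the display name used in the notebook
tickers = {"AAPL": "APPLE", "GOOG": "GOOGLE", "MSFT": "MICROSOFT", "AMZN": "AMAZON"}

# Keep each frame in a dictionary keyed by ticker instead of writing to globals()
frames = {ticker: fake_download(ticker) for ticker in tickers}

# Tag each frame with its company name, then stack the frames vertically
combined = pd.concat(
    [frame.assign(company_name=tickers[ticker]) for ticker, frame in frames.items()],
    axis=0,
)
print(combined["company_name"].unique())
```

With a dictionary, a single frame is available as `frames["AAPL"]`, and adding a ticker only requires a new entry in `tickers`.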
print(df.head())
print(df.info())
print(df["company_name"].unique())
                  Open        High         Low       Close   Adj Close  \
Date
2023-04-28  168.490005  169.850006  167.880005  169.679993  168.779099
2023-05-01  169.279999  170.449997  168.639999  169.589996  168.689575
2023-05-02  170.089996  170.350006  167.539993  168.539993  167.645157
2023-05-03  169.500000  170.919998  167.160004  167.449997  166.560944
2023-05-04  164.889999  167.039993  164.309998  165.789993  164.909760

              Volume company_name
Date
2023-04-28  55209200        APPLE
2023-05-01  52472900        APPLE
2023-05-02  48425700        APPLE
2023-05-03  65136000        APPLE
2023-05-04  81235400        APPLE
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1004 entries, 2023-04-28 to 2024-04-26
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   Open          1004 non-null   float64
 1   High          1004 non-null   float64
 2   Low           1004 non-null   float64
 3   Close         1004 non-null   float64
 4   Adj Close     1004 non-null   float64
 5   Volume        1004 non-null   int64
 6   company_name  1004 non-null   object
dtypes: float64(5), int64(1), object(1)
memory usage: 62.8+ KB
None
['APPLE' 'GOOGLE' 'MICROSOFT' 'AMAZON']
Information About the APPLE Data
# Summary Stats
AAPL.describe()
 | Open | High | Low | Close | Adj Close | Volume |
---|---|---|---|---|---|---|
count | 251.000000 | 251.000000 | 251.000000 | 251.000000 | 251.000000 | 2.510000e+02 |
mean | 181.169682 | 182.628924 | 179.809004 | 181.282909 | 180.891567 | 5.781501e+07 |
std | 8.735188 | 8.594316 | 8.702606 | 8.674535 | 8.639346 | 1.765801e+07 |
min | 164.889999 | 166.399994 | 164.080002 | 165.000000 | 164.909760 | 2.404830e+07 |
25% | 173.240005 | 174.905006 | 172.050003 | 173.690002 | 173.265129 | 4.678545e+07 |
50% | 180.669998 | 182.229996 | 178.550003 | 180.710007 | 180.238220 | 5.366560e+07 |
75% | 189.294998 | 189.990005 | 187.695000 | 189.334999 | 188.909805 | 6.401530e+07 |
max | 198.020004 | 199.619995 | 197.000000 | 198.110001 | 197.857529 | 1.366826e+08 |
AAPL.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 251 entries, 2023-04-28 to 2024-04-26
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   Open          251 non-null    float64
 1   High          251 non-null    float64
 2   Low           251 non-null    float64
 3   Close         251 non-null    float64
 4   Adj Close     251 non-null    float64
 5   Volume        251 non-null    int64
 6   company_name  251 non-null    object
dtypes: float64(5), int64(1), object(1)
memory usage: 15.7+ KB
Closing prices
# Let's see a historical view of the closing price
plt.figure(figsize=(15, 10))
plt.subplots_adjust(top=1.25, bottom=1.2)
for i, company in enumerate(company_list, 1):
    plt.subplot(2, 2, i)
    company['Adj Close'].plot()
    plt.ylabel('Adj Close')
    plt.xlabel(None)
    plt.title(f"Closing Price of {tech_list[i - 1]}")
plt.tight_layout()
Volume of sales
# Now let's plot the total volume of stock being traded each day
plt.figure(figsize=(15, 10))
plt.subplots_adjust(top=1.25, bottom=1.2)
for i, company in enumerate(company_list, 1):
    plt.subplot(2, 2, i)
    company['Volume'].plot()
    plt.ylabel('Volume')
    plt.xlabel(None)
    plt.title(f"Sales Volume for {tech_list[i - 1]}")
plt.tight_layout()
What was the moving average of the various stocks?¶
ma_day = [10, 20, 50]
# Add one rolling-mean column per window length to every company's frame
for ma in ma_day:
    for company in company_list:
        column_name = f"MA for {ma} days"
        company[column_name] = company['Adj Close'].rolling(ma).mean()
fig, axes = plt.subplots(nrows=2, ncols=2)
fig.set_figheight(10)
fig.set_figwidth(15)
AAPL[['Adj Close', 'MA for 10 days', 'MA for 20 days', 'MA for 50 days']].plot(ax=axes[0,0])
axes[0,0].set_title('APPLE')
GOOG[['Adj Close', 'MA for 10 days', 'MA for 20 days', 'MA for 50 days']].plot(ax=axes[0,1])
axes[0,1].set_title('GOOGLE')
MSFT[['Adj Close', 'MA for 10 days', 'MA for 20 days', 'MA for 50 days']].plot(ax=axes[1,0])
axes[1,0].set_title('MICROSOFT')
AMZN[['Adj Close', 'MA for 10 days', 'MA for 20 days', 'MA for 50 days']].plot(ax=axes[1,1])
axes[1,1].set_title('AMAZON')
fig.tight_layout()
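To make the `rolling(ma).mean()` call above concrete, here is a small sketch on a made-up five-value series (the prices are illustrative, not market data). It shows that a window of length 3 leaves the first two entries as NaN, which is why the moving-average lines in the plots start later than the price line.

```python
import pandas as pd

# Illustrative prices only
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])

# Window of 3: each value is the mean of the current and two previous prices
ma3 = prices.rolling(3).mean()

# min_periods=1 averages over whatever history is available instead of emitting NaN
ma3_partial = prices.rolling(3, min_periods=1).mean()

print(ma3.tolist())
print(ma3_partial.tolist())
```

Whether to use `min_periods=1` is a design choice: it fills the start of the line, but those early values are averages over fewer than 3 days and are not directly comparable with the rest.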
What was the daily return of the stock on average?¶
# We'll use pct_change to find the percent change for each day
for company in company_list:
    company['Daily Return'] = company['Adj Close'].pct_change()
# Then we'll plot the daily return percentage
fig, axes = plt.subplots(nrows=2, ncols=2)
fig.set_figheight(10)
fig.set_figwidth(15)
AAPL['Daily Return'].plot(ax=axes[0,0], legend=True, linestyle='--', marker='o')
axes[0,0].set_title('APPLE')
GOOG['Daily Return'].plot(ax=axes[0,1], legend=True, linestyle='--', marker='o')
axes[0,1].set_title('GOOGLE')
MSFT['Daily Return'].plot(ax=axes[1,0], legend=True, linestyle='--', marker='o')
axes[1,0].set_title('MICROSOFT')
AMZN['Daily Return'].plot(ax=axes[1,1], legend=True, linestyle='--', marker='o')
axes[1,1].set_title('AMAZON')
fig.tight_layout()
plt.figure(figsize=(12, 9))
for i, company in enumerate(company_list, 1):
    plt.subplot(2, 2, i)
    company['Daily Return'].hist(bins=50)
    plt.xlabel('Daily Return')
    plt.ylabel('Counts')
    plt.title(f'{company_name[i - 1]}')
plt.tight_layout()
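The cells above plot the daily returns but do not print the average that the heading asks about. The following sketch, on made-up prices, shows what `pct_change` computes (today's price divided by yesterday's, minus one) and how the average could be taken with `mean()`.

```python
import pandas as pd

# Illustrative prices only: up 2% on day two, down 2% on day three
prices = pd.Series([100.0, 102.0, 99.96])

# pct_change() gives the relative change from the previous row
daily_return = prices.pct_change()

# The same quantity written out explicitly
manual = prices / prices.shift(1) - 1

# mean() skips the leading NaN by default
average_return = daily_return.mean()

print(daily_return.tolist())
print(average_return)
```

Applied to the notebook's data, `AAPL['Daily Return'].mean()` would give the average daily return for Apple over the downloaded year.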
What was the correlation between different stocks' closing prices?¶
# Grab all the closing prices for the tech stock list into one DataFrame
closing_df = pdr.get_data_yahoo(tech_list, start=start, end=end)['Adj Close']
# Make a new tech returns DataFrame
tech_rets = closing_df.pct_change()
tech_rets.head()
[*********************100%%**********************] 4 of 4 completed
Date | AAPL | AMZN | GOOG | MSFT |
---|---|---|---|---|
2023-04-28 | NaN | NaN | NaN | NaN |
2023-05-01 | -0.000530 | -0.032243 | -0.004713 | -0.005533 |
2023-05-02 | -0.006191 | 0.015483 | -0.016062 | -0.000491 |
2023-05-03 | -0.006467 | 0.000193 | 0.001321 | -0.003307 |
2023-05-04 | -0.009913 | 0.003377 | -0.008575 | 0.003318 |
Now we can compare the daily percentage returns of two stocks to check how correlated they are. First, let's see a stock compared to itself.
# Comparing Google to itself should show a perfectly linear relationship
sns.jointplot(x='GOOG', y='GOOG', data=tech_rets, kind='scatter', color='seagreen')
/Users/anaconda3/lib/python3.11/site-packages/seaborn/_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead. with pd.option_context('mode.use_inf_as_na', True):
<seaborn.axisgrid.JointGrid at 0x158d79550>
# We'll use jointplot to compare the daily returns of Google and Microsoft
sns.jointplot(x='GOOG', y='MSFT', data=tech_rets, kind='scatter')
<seaborn.axisgrid.JointGrid at 0x1589a2890>
# We can simply call pairplot on our DataFrame for an automatic visual analysis
# of all the comparisons
sns.pairplot(tech_rets, kind='reg')
<seaborn.axisgrid.PairGrid at 0x158707250>
# Set up our figure by naming it return_fig and calling PairGrid on the DataFrame
return_fig = sns.PairGrid(tech_rets.dropna())
# Using map_upper we can specify what the upper triangle will look like.
return_fig.map_upper(plt.scatter, color='purple')
# We can also define the lower triangle in the figure, including the plot type (kde)
# and the color map (cool_d)
return_fig.map_lower(sns.kdeplot, cmap='cool_d')
# Finally we'll define the diagonal as a series of histogram plots of the daily return
return_fig.map_diag(plt.hist, bins=30)
<seaborn.axisgrid.PairGrid at 0x158af9c90>
# Repeat the grid for closing prices: name it returns_fig and call PairGrid on closing_df
returns_fig = sns.PairGrid(closing_df)
# Using map_upper we can specify what the upper triangle will look like.
returns_fig.map_upper(plt.scatter, color='purple')
# We can also define the lower triangle in the figure, including the plot type (kde) and the color map (cool_d)
returns_fig.map_lower(sns.kdeplot, cmap='cool_d')
# Finally we'll define the diagonal as a series of histogram plots of the closing prices
returns_fig.map_diag(plt.hist, bins=30)
<seaborn.axisgrid.PairGrid at 0x159a52b90>
plt.figure(figsize=(12, 10))
plt.subplot(2, 2, 1)
sns.heatmap(tech_rets.corr(), annot=True, cmap='summer')
plt.title('Correlation of stock return')
plt.subplot(2, 2, 2)
sns.heatmap(closing_df.corr(), annot=True, cmap='summer')
plt.title('Correlation of stock closing price')
Text(0.5, 1.0, 'Correlation of stock closing price')
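The heatmaps above display `DataFrame.corr()`, which by default is the pairwise Pearson correlation. As a minimal sketch of what those numbers mean, the example below builds two synthetic return series that share a common component (the column names `A` and `B` and all values are made up) and checks the pandas result against `np.corrcoef`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic returns: a shared "market" move plus independent noise per stock
market = rng.normal(0, 0.01, size=250)
returns = pd.DataFrame({
    "A": market + rng.normal(0, 0.005, size=250),
    "B": market + rng.normal(0, 0.005, size=250),
})

# Pairwise Pearson correlation, as used for the heatmap
corr_matrix = returns.corr()

# The same coefficient computed directly with NumPy
manual = np.corrcoef(returns["A"], returns["B"])[0, 1]

print(corr_matrix.round(3))
print(round(manual, 3))
```

The diagonal is always 1 because each series is perfectly correlated with itself, which is what the GOOG-versus-GOOG joint plot showed earlier.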
How much value do we put at risk by investing in a particular stock?¶
rets = tech_rets.dropna()
area = np.pi * 20
plt.figure(figsize=(10, 8))
plt.scatter(rets.mean(), rets.std(), s=area)
plt.xlabel('Expected return')
plt.ylabel('Risk')
# Label each point with its ticker
for label, x, y in zip(rets.columns, rets.mean(), rets.std()):
    plt.annotate(label, xy=(x, y), xytext=(50, 50), textcoords='offset points', ha='right', va='bottom',
                 arrowprops=dict(arrowstyle='-', color='blue', connectionstyle='arc3,rad=-0.3'))
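The scatter plot above uses the mean daily return as the expected return and the standard deviation as the risk. A small sketch of the same quantities in table form follows, on synthetic returns with hypothetical column names. The `q05` column is an addition that is not in the notebook: it is the empirical 5% quantile of daily returns, one common way to put a number on "value at risk", included here only as an assumption about what the heading is after.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Synthetic daily returns with the same mean but different volatility
rets = pd.DataFrame({
    "LOW_VOL": rng.normal(0.0005, 0.01, size=500),
    "HIGH_VOL": rng.normal(0.0005, 0.03, size=500),
})

summary = pd.DataFrame({
    "expected_return": rets.mean(),   # x-axis of the scatter plot
    "risk": rets.std(),               # y-axis of the scatter plot
    "q05": rets.quantile(0.05),       # assumption: empirical 5% worst-case daily return
})
print(summary)
```

Read this way, a `q05` of -0.03 would mean that on roughly 5% of days in the sample the stock lost 3% or more.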
Predicting the closing stock price of Google (GOOG):¶
# Get the stock quote
df = pdr.get_data_yahoo('GOOG', start='2012-01-01', end=datetime.now())
# Show the data
df
[*********************100%%**********************] 1 of 1 completed
Date | Open | High | Low | Close | Adj Close | Volume |
---|---|---|---|---|---|---|
2012-01-03 | 16.262545 | 16.641375 | 16.248346 | 16.573130 | 16.573130 | 147611217 |
2012-01-04 | 16.563665 | 16.693678 | 16.453827 | 16.644611 | 16.644611 | 114989399 |
2012-01-05 | 16.491436 | 16.537264 | 16.344486 | 16.413727 | 16.413727 | 131808205 |
2012-01-06 | 16.417213 | 16.438385 | 16.184088 | 16.189817 | 16.189817 | 108119746 |
2012-01-09 | 16.102144 | 16.114599 | 15.472754 | 15.503389 | 15.503389 | 233776981 |
... | ... | ... | ... | ... | ... | ... |
2024-04-22 | 156.009995 | 159.184998 | 155.660004 | 157.949997 | 157.949997 | 17243900 |
2024-04-23 | 158.589996 | 160.479996 | 157.964996 | 159.919998 | 159.919998 | 16115400 |
2024-04-24 | 159.089996 | 161.389999 | 158.820007 | 161.100006 | 161.100006 | 19485700 |
2024-04-25 | 153.360001 | 158.279999 | 152.768005 | 157.949997 | 157.949997 | 36197800 |
2024-04-26 | 175.990005 | 176.419998 | 171.399994 | 173.690002 | 173.690002 | 55186700 |
3099 rows × 6 columns
plt.figure(figsize=(16,6))
plt.title('Close Price History')
plt.plot(df['Close'])
plt.xlabel('Date', fontsize=18)
plt.ylabel('Close Price USD ($)', fontsize=18)
plt.show()
# Create a new dataframe with only the 'Close' column
data = df.filter(['Close'])
# Convert the dataframe to a numpy array
dataset = data.values
# Get the number of rows to train the model on
training_data_len = int(np.ceil( len(dataset) * .95 ))
training_data_len
2945
# Scale the data
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0,1))
scaled_data = scaler.fit_transform(dataset)
scaled_data
array([[0.01658095], [0.01702836], [0.01558322], ..., [0.92119725], [0.90148085], [1. ]])
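For a single column and `feature_range=(0, 1)`, `MinMaxScaler` maps each value x to (x - min) / (max - min), and `inverse_transform` undoes that mapping. The sketch below reproduces the arithmetic with plain NumPy on three made-up prices; the helper names `min_max_scale` and `min_max_invert` are hypothetical and exist only for this illustration.

```python
import numpy as np


def min_max_scale(values):
    """Scale values to [0, 1]; also return the min and max needed to invert."""
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo), lo, hi


def min_max_invert(scaled, lo, hi):
    """Map scaled values back to the original price range."""
    return scaled * (hi - lo) + lo


# Illustrative closing prices as a column vector, like data.values
closes = np.array([[16.0], [20.0], [24.0]])

scaled, lo, hi = min_max_scale(closes)
restored = min_max_invert(scaled, lo, hi)

print(scaled.ravel())
print(restored.ravel())
```

This is why the last row of the scaled array above is exactly 1: the most recent close is the maximum of the series.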
# Create the scaled training data set
train_data = scaled_data[0:int(training_data_len), :]
# Split the data into x_train and y_train data sets
x_train = []
y_train = []
# Each sample is the previous 60 scaled closes; the target is the next close
for i in range(60, len(train_data)):
    x_train.append(train_data[i-60:i, 0])
    y_train.append(train_data[i, 0])
    # Print the first two windows and targets as a sanity check
    if i <= 61:
        print(x_train)
        print(y_train)
        print()
# Convert the x_train and y_train to numpy arrays
x_train, y_train = np.array(x_train), np.array(y_train)
# Reshape the data
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
# x_train.shape
[array([0.01658095, 0.01702836, 0.01558322, 0.01418174, 0.00988528, 0.00999128, 0.01043091, 0.0110046 , 0.01027969, 0.01083935, 0.01151438, 0.01255263, 0.0041998 , 0.00412653, 0.00341098, 0.00162754, 0.00141084, 0.00326288, 0.00290588, 0.00328314, 0.00339539, 0.00406261, 0.00581175, 0.00780097, 0.00743929, 0.00791945, 0.00817043, 0.00730522, 0.00828579, 0.00790541, 0.00725066, 0.00740032, 0.00710724, 0.00856641, 0.00762169, 0.0073364 , 0.00792724, 0.00783526, 0.00925078, 0.00922896, 0.00987592, 0.00969664, 0.00860538, 0.00715712, 0.00744396, 0.00749698, 0.00642286, 0.00718674, 0.00915568, 0.00887664, 0.00967794, 0.01028749, 0.01168118, 0.01160479, 0.01261655, 0.01356283, 0.01302344, 0.01407416, 0.01371405, 0.01507657])] [0.013930743734310483] [array([0.01658095, 0.01702836, 0.01558322, 0.01418174, 0.00988528, 0.00999128, 0.01043091, 0.0110046 , 0.01027969, 0.01083935, 0.01151438, 0.01255263, 0.0041998 , 0.00412653, 0.00341098, 0.00162754, 0.00141084, 0.00326288, 0.00290588, 0.00328314, 0.00339539, 0.00406261, 0.00581175, 0.00780097, 0.00743929, 0.00791945, 0.00817043, 0.00730522, 0.00828579, 0.00790541, 0.00725066, 0.00740032, 0.00710724, 0.00856641, 0.00762169, 0.0073364 , 0.00792724, 0.00783526, 0.00925078, 0.00922896, 0.00987592, 0.00969664, 0.00860538, 0.00715712, 0.00744396, 0.00749698, 0.00642286, 0.00718674, 0.00915568, 0.00887664, 0.00967794, 0.01028749, 0.01168118, 0.01160479, 0.01261655, 0.01356283, 0.01302344, 0.01407416, 0.01371405, 0.01507657]), array([0.01702836, 0.01558322, 0.01418174, 0.00988528, 0.00999128, 0.01043091, 0.0110046 , 0.01027969, 0.01083935, 0.01151438, 0.01255263, 0.0041998 , 0.00412653, 0.00341098, 0.00162754, 0.00141084, 0.00326288, 0.00290588, 0.00328314, 0.00339539, 0.00406261, 0.00581175, 0.00780097, 0.00743929, 0.00791945, 0.00817043, 0.00730522, 0.00828579, 0.00790541, 0.00725066, 0.00740032, 0.00710724, 0.00856641, 0.00762169, 0.0073364 , 0.00792724, 0.00783526, 0.00925078, 0.00922896, 0.00987592, 0.00969664, 0.00860538, 
0.00715712, 0.00744396, 0.00749698, 0.00642286, 0.00718674, 0.00915568, 0.00887664, 0.00967794, 0.01028749, 0.01168118, 0.01160479, 0.01261655, 0.01356283, 0.01302344, 0.01407416, 0.01371405, 0.01507657, 0.01393074])] [0.013930743734310483, 0.01281297586807846]
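The printed arrays above are the first two 60-day windows and their targets. The windowing logic can be sketched as a small function and checked on a short toy series; `make_windows` is a hypothetical helper name, and a lookback of 3 replaces the notebook's 60 so the result is easy to read.

```python
import numpy as np


def make_windows(series, lookback):
    """Turn a 1-D series into (samples, lookback, 1) inputs and next-step targets."""
    x, y = [], []
    for i in range(lookback, len(series)):
        x.append(series[i - lookback:i])  # the `lookback` values before position i
        y.append(series[i])               # the value to predict
    x = np.array(x).reshape(-1, lookback, 1)  # LSTM expects (samples, timesteps, features)
    return x, np.array(y)


series = np.arange(10, dtype=float)
x, y = make_windows(series, lookback=3)

print(x.shape, y.shape)
print(x[0, :, 0], y[0])
```

A series of length n with lookback k yields n - k samples. With the 2945 training rows and the 60-day lookback used here, that is 2885 samples, which matches the 2885 steps reported in the training log at `batch_size=1`.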
from keras.models import Sequential
from keras.layers import Dense, LSTM
# Build the LSTM model
model = Sequential()
model.add(LSTM(128, return_sequences=True, input_shape= (x_train.shape[1], 1)))
model.add(LSTM(64, return_sequences=False))
model.add(Dense(25))
model.add(Dense(1))
# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
# Train the model
model.fit(x_train, y_train, batch_size=1, epochs=1)
/Users/anaconda3/lib/python3.11/site-packages/keras/src/layers/rnn/rnn.py:204: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead. super().__init__(**kwargs)
2885/2885 ━━━━━━━━━━━━━━━━━━━━ 53s 17ms/step - loss: 0.0016
<keras.src.callbacks.history.History at 0x16ba34ad0>
# Create the testing data set
# Create a new array containing the scaled values from index training_data_len - 60 onward,
# so that the first test window has a full 60-day history
test_data = scaled_data[training_data_len - 60: , :]
# Create the data sets x_test and y_test
x_test = []
y_test = dataset[training_data_len:, :]
for i in range(60, len(test_data)):
    x_test.append(test_data[i-60:i, 0])
# Convert the data to a numpy array
x_test = np.array(x_test)
# Reshape the data
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1 ))
# Get the model's predicted price values
predictions = model.predict(x_test)
predictions = scaler.inverse_transform(predictions)
# Get the root mean squared error (RMSE)
rmse = np.sqrt(np.mean(((predictions - y_test) ** 2)))
rmse
5/5 ━━━━━━━━━━━━━━━━━━━━ 1s 73ms/step
6.551480523450407
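The RMSE above is in the same units as the price (US dollars), since the predictions were inverse-transformed before the comparison. A minimal sketch of the calculation on three made-up values is given below. The `rmse` helper is hypothetical; it flattens both inputs first, which guards against accidentally broadcasting an (n, 1) array against an (n,) array into an (n, n) matrix.

```python
import numpy as np


def rmse(predictions, actual):
    """Root mean squared error between two arrays of equal length."""
    predictions = np.asarray(predictions, dtype=float).reshape(-1)
    actual = np.asarray(actual, dtype=float).reshape(-1)
    return float(np.sqrt(np.mean((predictions - actual) ** 2)))


# Illustrative values: errors of +1, -1 and +3 dollars
predicted = np.array([[101.0], [99.0], [103.0]])
actual = np.array([[100.0], [100.0], [100.0]])

value = rmse(predicted, actual)
print(value)
```

Because the errors are squared before averaging, the single 3-dollar miss dominates the result, so RMSE penalizes large errors more than a mean absolute error would.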
# Plot the data
train = data[:training_data_len]
# Copy the slice so that adding a column does not trigger SettingWithCopyWarning
valid = data[training_data_len:].copy()
valid['Predictions'] = predictions
# Visualize the data
plt.figure(figsize=(16,6))
plt.title('Model')
plt.xlabel('Date', fontsize=18)
plt.ylabel('Close Price USD ($)', fontsize=18)
plt.plot(train['Close'])
plt.plot(valid[['Close', 'Predictions']])
plt.legend(['Train', 'Val', 'Predictions'], loc='lower right')
plt.show()
# Show the actual and predicted prices
valid
Date | Close | Predictions |
---|---|---|
2023-09-18 | 177.970001 | 184.988815 |
2023-09-19 | 179.070007 | 184.812439 |
2023-09-20 | 175.490005 | 185.066147 |
2023-09-21 | 173.929993 | 184.940063 |
2023-09-22 | 174.789993 | 184.476044 |
... | ... | ... |
2024-04-22 | 165.839996 | 177.382996 |
2024-04-23 | 166.899994 | 176.453674 |
2024-04-24 | 169.020004 | 175.887527 |
2024-04-25 | 169.889999 | 175.859253 |
2024-04-26 | 169.300003 | 176.175049 |
154 rows × 2 columns
Summary¶
In this notebook, you discovered and explored stock data.
Specifically, you learned:
- How to load stock market data from the Yahoo Finance website using yfinance.
- How to explore and visualize time-series data using Pandas, Matplotlib, and Seaborn.
- How to measure the correlation between stocks.
- How to measure the risk of investing in a particular stock.