import pandas as pd
pd.core.common.is_list_like = pd.api.types.is_list_like
from pandas_datareader import data, wb
import fix_yahoo_finance as yf
yf.pdr_override()
import numpy as np
import datetime
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
#Plotly Method Imports
import plotly
import cufflinks as cf
cf.go_offline()
import warnings
warnings.filterwarnings('ignore')
from IPython.display import HTML
HTML('''<script>
code_show=true;
function code_toggle() {
if (code_show){
$('div.input').hide();
} else {
$('div.input').show();
}
code_show = !code_show
}
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to toggle on/off the raw code."></form>''')
start = datetime.datetime(2006, 1, 1)
end = datetime.datetime(2018, 1, 1)
# Bank of America
BAC = data.get_data_yahoo('BAC', start, end)
# CitiGroup
C = data.get_data_yahoo("C", start, end)
# Goldman Sachs
GS = data.get_data_yahoo("GS", start, end)
# JPMorgan Chase
JPM = data.get_data_yahoo("JPM", start, end)
# Morgan Stanley
MS = data.get_data_yahoo("MS", start, end)
# Wells Fargo
WFC = data.get_data_yahoo("WFC", start, end)
tickers = ['BAC', 'C', 'GS', 'JPM', 'MS', 'WFC']
Combining all the stock price data of the banks from 2006 to 2017 into a single dataset.
bank_stocks = pd.concat([BAC, C, GS, JPM, MS, WFC],axis=1,keys=tickers)
bank_stocks.columns.names = ['Bank Ticker','Stock Info']
Check the head of the bank_stocks dataframe.
bank_stocks.head()
bank_stocks.xs(key='Close',axis=1,level='Stock Info').max()
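As a minimal sketch of what the `keys` argument and `xs` do here, the following uses two tiny made-up price tables. The tickers `AAA` and `BBB` and all the numbers are hypothetical stand-ins, not downloaded data.

```python
import pandas as pd

# Two tiny hypothetical price tables standing in for the downloaded data.
dates = pd.date_range("2006-01-02", periods=3, freq="D")
aaa = pd.DataFrame({"Open": [10.0, 11.0, 12.0], "Close": [10.5, 11.5, 12.5]}, index=dates)
bbb = pd.DataFrame({"Open": [20.0, 19.0, 18.0], "Close": [19.5, 18.5, 21.0]}, index=dates)

# keys= adds an outer column level, so each column is (ticker, field).
toy_tickers = ["AAA", "BBB"]
toy_stocks = pd.concat([aaa, bbb], axis=1, keys=toy_tickers)
toy_stocks.columns.names = ["Bank Ticker", "Stock Info"]

# xs() takes a cross-section: every ticker's 'Close' column at once.
toy_close = toy_stocks.xs(key="Close", axis=1, level="Stock Info")
toy_max_close = toy_close.max()
print(toy_max_close)
```

The result of `xs` has one column per ticker, so `.max()` returns the highest closing price for each one.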
Creating the returns for each bank's stock. Returns are typically defined by:
$$r_t = \frac{p_t - p_{t-1}}{p_{t-1}} = \frac{p_t}{p_{t-1}} - 1$$
The first 5 rows of the returns for these banks:
returns = pd.DataFrame()
for tick in tickers:
    returns[tick+' Return'] = bank_stocks[tick]['Close'].pct_change()
returns.head()
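To check that `pct_change()` matches the return formula above, here is a small sketch on a made-up price series (the values are hypothetical). The first return is always `NaN` because there is no previous price to compare against.

```python
import pandas as pd

# Hypothetical closing prices for four consecutive days.
prices = pd.Series([100.0, 110.0, 99.0, 99.0])

# Return as computed in the notebook.
by_pct_change = prices.pct_change()

# Return written out as p_t / p_{t-1} - 1.
by_formula = prices / prices.shift(1) - 1

print(by_pct_change.tolist())
print(by_formula.tolist())
```

This is also why the pairplot uses `returns[1:]`: the first row holds only missing values.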
This is the pairplot of the returns for the 6 banks. From the graph, we can see that the returns of all the banks are centered at a 0 return rate, but some banks, like Morgan Stanley and CitiGroup, have higher fluctuation (deviation) in their return rates, which means that their stocks carry higher risk.
sns.set(style="ticks", color_codes=True)
sns.pairplot(returns[1:],kind="reg", plot_kws={'line_kws':{'color':'red', 'linestyle': ':'}})
When did the banks have the lowest return rate over the years?
returns.idxmin()
When did the banks have the highest return rate over the years?
returns.idxmax()
Looking at the standard deviation of the returns, CitiGroup is the riskiest over the entire time period.
returns.std()
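The three summaries above (`idxmin`, `idxmax`, `std`) can be illustrated on a small made-up returns table. The column names and values below are hypothetical and chosen only so the answers are easy to read off.

```python
import pandas as pd

# Hypothetical daily returns for two made-up tickers.
dates = pd.to_datetime(["2008-01-02", "2008-01-03", "2008-01-04", "2008-01-07"])
toy_returns = pd.DataFrame(
    {
        "AAA Return": [0.01, -0.20, 0.02, 0.15],
        "BBB Return": [0.00, 0.01, -0.01, 0.00],
    },
    index=dates,
)

worst_day = toy_returns.idxmin()  # date of the lowest return per column
best_day = toy_returns.idxmax()   # date of the highest return per column
spread = toy_returns.std()        # standard deviation as a rough risk measure

print(worst_day, best_day, spread, sep="\n\n")
```

Here `AAA` swings far more than `BBB`, so its standard deviation is larger, which is the sense in which a stock is called riskier in this analysis.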
Among these 6 banks, Bank of America, CitiGroup, JPMorgan Chase, Morgan Stanley, and Wells Fargo saw their return rates spread out during 2008 and 2009. This means that they took a financial hit during these years, which caused large fluctuations in their return rates and made their stocks riskier. Around 2008, the global financial crisis happened; it undoubtedly impacted the banks, and the stock markets dropped dramatically.
Among these 6 banks, we can see that Bank of America and CitiGroup took the biggest hit to their stock prices; on the other hand, Goldman Sachs and JPMorgan Chase were less affected during this time and have had more stable stock prices through the years.
import warnings
warnings.filterwarnings(action='once')
f, axes = plt.subplots(2, 3, figsize=(12, 8), sharex=True)
sns.distplot(returns.loc['2006-01-01':'2006-12-31']['BAC Return'].dropna(),color='red',bins=50,ax=axes[0,0])
sns.distplot(returns.loc['2007-01-01':'2007-12-31']['BAC Return'].dropna(),color='red',bins=50,ax=axes[0,1])
sns.distplot(returns.loc['2008-01-01':'2008-12-31']['BAC Return'].dropna(),color='red',bins=50,ax=axes[0,2])
sns.distplot(returns.loc['2009-01-01':'2009-12-31']['BAC Return'].dropna(),color='red',bins=50,ax=axes[1,0])
sns.distplot(returns.loc['2010-01-01':'2010-12-31']['BAC Return'].dropna(),color='red',bins=50,ax=axes[1,1])
sns.distplot(returns.loc['2011-01-01':'2011-12-31']['BAC Return'].dropna(),color='red',bins=50,ax=axes[1,2])
axes[0,0].set_title("2006")
axes[0,1].set_title("2007")
axes[0,2].set_title("2008")
axes[1,0].set_title("2009")
axes[1,1].set_title("2010")
axes[1,2].set_title("2011")
axes[0,0].set_xlim([-0.4,0.4])
plt.tight_layout()
f, axes = plt.subplots(2, 3, figsize=(12, 8), sharex=True)
sns.distplot(returns.loc['2006-01-01':'2006-12-31']['C Return'].dropna(),color='orange',bins=50,ax=axes[0,0])
sns.distplot(returns.loc['2007-01-01':'2007-12-31']['C Return'].dropna(),color='orange',bins=50,ax=axes[0,1])
sns.distplot(returns.loc['2008-01-01':'2008-12-31']['C Return'].dropna(),color='orange',bins=50,ax=axes[0,2])
sns.distplot(returns.loc['2009-01-01':'2009-12-31']['C Return'].dropna(),color='orange',bins=50,ax=axes[1,0])
sns.distplot(returns.loc['2010-01-01':'2010-12-31']['C Return'].dropna(),color='orange',bins=50,ax=axes[1,1])
sns.distplot(returns.loc['2011-01-01':'2011-12-31']['C Return'].dropna(),color='orange',bins=50,ax=axes[1,2])
axes[0,0].set_title("2006")
axes[0,1].set_title("2007")
axes[0,2].set_title("2008")
axes[1,0].set_title("2009")
axes[1,1].set_title("2010")
axes[1,2].set_title("2011")
axes[0,0].set_xlim([-0.4,0.4])
plt.tight_layout()
f, axes = plt.subplots(2, 3, figsize=(12, 8), sharex=True)
sns.distplot(returns.loc['2006-01-01':'2006-12-31']['GS Return'].dropna(),color='brown',bins=50,ax=axes[0,0])
sns.distplot(returns.loc['2007-01-01':'2007-12-31']['GS Return'].dropna(),color='brown',bins=50,ax=axes[0,1])
sns.distplot(returns.loc['2008-01-01':'2008-12-31']['GS Return'].dropna(),color='brown',bins=50,ax=axes[0,2])
sns.distplot(returns.loc['2009-01-01':'2009-12-31']['GS Return'].dropna(),color='brown',bins=50,ax=axes[1,0])
sns.distplot(returns.loc['2010-01-01':'2010-12-31']['GS Return'].dropna(),color='brown',bins=50,ax=axes[1,1])
sns.distplot(returns.loc['2011-01-01':'2011-12-31']['GS Return'].dropna(),color='brown',bins=50,ax=axes[1,2])
axes[0,0].set_title("2006")
axes[0,1].set_title("2007")
axes[0,2].set_title("2008")
axes[1,0].set_title("2009")
axes[1,1].set_title("2010")
axes[1,2].set_title("2011")
axes[0,0].set_xlim([-0.4,0.4])
plt.tight_layout()
f, axes = plt.subplots(2, 3, figsize=(12, 8), sharex=True)
sns.distplot(returns.loc['2006-01-01':'2006-12-31']['JPM Return'].dropna(),color='green',bins=50,ax=axes[0,0])
sns.distplot(returns.loc['2007-01-01':'2007-12-31']['JPM Return'].dropna(),color='green',bins=50,ax=axes[0,1])
sns.distplot(returns.loc['2008-01-01':'2008-12-31']['JPM Return'].dropna(),color='green',bins=50,ax=axes[0,2])
sns.distplot(returns.loc['2009-01-01':'2009-12-31']['JPM Return'].dropna(),color='green',bins=50,ax=axes[1,0])
sns.distplot(returns.loc['2010-01-01':'2010-12-31']['JPM Return'].dropna(),color='green',bins=50,ax=axes[1,1])
sns.distplot(returns.loc['2011-01-01':'2011-12-31']['JPM Return'].dropna(),color='green',bins=50,ax=axes[1,2])
axes[0,0].set_title("2006")
axes[0,1].set_title("2007")
axes[0,2].set_title("2008")
axes[1,0].set_title("2009")
axes[1,1].set_title("2010")
axes[1,2].set_title("2011")
axes[0,0].set_xlim([-0.4,0.4])
plt.tight_layout()
f, axes = plt.subplots(2, 3, figsize=(12, 8), sharex=True)
sns.distplot(returns.loc['2006-01-01':'2006-12-31']['MS Return'].dropna(),color='blue',bins=50,ax=axes[0,0])
sns.distplot(returns.loc['2007-01-01':'2007-12-31']['MS Return'].dropna(),color='blue',bins=50,ax=axes[0,1])
sns.distplot(returns.loc['2008-01-01':'2008-12-31']['MS Return'].dropna(),color='blue',bins=50,ax=axes[0,2])
sns.distplot(returns.loc['2009-01-01':'2009-12-31']['MS Return'].dropna(),color='blue',bins=50,ax=axes[1,0])
sns.distplot(returns.loc['2010-01-01':'2010-12-31']['MS Return'].dropna(),color='blue',bins=50,ax=axes[1,1])
sns.distplot(returns.loc['2011-01-01':'2011-12-31']['MS Return'].dropna(),color='blue',bins=50,ax=axes[1,2])
axes[0,0].set_title("2006")
axes[0,1].set_title("2007")
axes[0,2].set_title("2008")
axes[1,0].set_title("2009")
axes[1,1].set_title("2010")
axes[1,2].set_title("2011")
axes[0,0].set_xlim([-0.4,0.4])
plt.tight_layout()
f, axes = plt.subplots(2, 3, figsize=(12, 8), sharex=True)
sns.distplot(returns.loc['2006-01-01':'2006-12-31']['WFC Return'].dropna(),color='purple',bins=50,ax=axes[0,0])
sns.distplot(returns.loc['2007-01-01':'2007-12-31']['WFC Return'].dropna(),color='purple',bins=50,ax=axes[0,1])
sns.distplot(returns.loc['2008-01-01':'2008-12-31']['WFC Return'].dropna(),color='purple',bins=50,ax=axes[0,2])
sns.distplot(returns.loc['2009-01-01':'2009-12-31']['WFC Return'].dropna(),color='purple',bins=50,ax=axes[1,0])
sns.distplot(returns.loc['2010-01-01':'2010-12-31']['WFC Return'].dropna(),color='purple',bins=50,ax=axes[1,1])
sns.distplot(returns.loc['2011-01-01':'2011-12-31']['WFC Return'].dropna(),color='purple',bins=50,ax=axes[1,2])
axes[0,0].set_title("2006")
axes[0,1].set_title("2007")
axes[0,2].set_title("2008")
axes[1,0].set_title("2009")
axes[1,1].set_title("2010")
axes[1,2].set_title("2011")
axes[0,0].set_xlim([-0.4,0.4])
plt.tight_layout()
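The year-by-year spread visible in the histograms above can also be summarized as one number per year. The sketch below groups returns by calendar year and takes the standard deviation. It runs on simulated returns (calm in 2007, volatile in 2008), which are an assumption for illustration and not the banks' data.

```python
import numpy as np
import pandas as pd

# Simulated returns: small daily moves in 2007, large ones in 2008.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2007-01-01", "2008-12-31")
scale = np.where(dates.year == 2007, 0.01, 0.05)
toy_returns = pd.DataFrame({"AAA Return": rng.normal(0.0, scale)}, index=dates)

# One standard deviation per calendar year and per column.
yearly_std = toy_returns.groupby(toy_returns.index.year).std()
print(yearly_std)
```

Applied to the notebook's `returns` frame, the same `groupby` call would give a table with one row per year and one column per bank.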
for tick in tickers:
    bank_stocks[tick]['Close'].plot(figsize=(15,6),label=tick)
plt.legend()
To analyze the moving averages for these stocks in 2008, plot the rolling 30-day average against the Close price for each bank for that year.
From the graphs below, we can see that the stock prices of these 6 banks generally dropped in 2008 due to the impact of the financial crisis. However, the stocks of JPMorgan Chase and Wells Fargo performed better than those of the other banks, which dropped dramatically during this time.
f, axes = plt.subplots(3, 2, figsize=(20, 15))
axes[0,0].plot(BAC['Close'].loc['2008-01-01':'2009-01-01'].rolling(window=30).mean(), label='30 Day Avg', color='red',ls=':')
axes[0,0].plot(BAC['Close'].loc['2008-01-01':'2009-01-01'], label='BAC CLOSE')
axes[0,0].set_title('Bank of America')
axes[0,1].plot(C['Close'].loc['2008-01-01':'2009-01-01'].rolling(window=30).mean(), label='30 Day Avg', color='red',ls=':')
axes[0,1].plot(C['Close'].loc['2008-01-01':'2009-01-01'], label='C CLOSE')
axes[0,1].set_title('CitiGroup')
axes[1,0].plot(GS['Close'].loc['2008-01-01':'2009-01-01'].rolling(window=30).mean(), label='30 Day Avg', color='red',ls=':')
axes[1,0].plot(GS['Close'].loc['2008-01-01':'2009-01-01'], label='GS CLOSE')
axes[1,0].set_title('Goldman Sachs')
axes[1,1].plot(JPM['Close'].loc['2008-01-01':'2009-01-01'].rolling(window=30).mean(), label='30 Day Avg', color='red',ls=':')
axes[1,1].plot(JPM['Close'].loc['2008-01-01':'2009-01-01'], label='JPM CLOSE')
axes[1,1].set_title('JPMorgan Chase')
axes[2,0].plot(MS['Close'].loc['2008-01-01':'2009-01-01'].rolling(window=30).mean(), label='30 Day Avg', color='red',ls=':')
axes[2,0].plot(MS['Close'].loc['2008-01-01':'2009-01-01'], label='MS CLOSE')
axes[2,0].set_title('Morgan Stanley')
axes[2,1].plot(WFC['Close'].loc['2008-01-01':'2009-01-01'].rolling(window=30).mean(), label='30 Day Avg', color='red',ls=':')
axes[2,1].plot(WFC['Close'].loc['2008-01-01':'2009-01-01'], label='WFC CLOSE')
axes[2,1].set_title('Wells Fargo')
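One detail of the rolling average is worth a small sketch. A window of N days produces no value until N prices are available, so slicing to 2008 before calling `rolling` leaves the start of the average line empty. The example below uses a made-up series and a 3-day window (both are assumptions chosen to keep the numbers readable) to compare the two orderings.

```python
import pandas as pd

# Hypothetical closing prices spanning the turn of the year.
dates = pd.bdate_range("2007-12-27", periods=8)
close = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], index=dates)

# Order used in the notebook: slice first, then average.
slice_then_roll = close.loc["2008-01-01":].rolling(window=3).mean()

# Alternative: average over the full history, then slice.
roll_then_slice = close.rolling(window=3).mean().loc["2008-01-01":]

print(pd.DataFrame({"slice_then_roll": slice_then_roll,
                    "roll_then_slice": roll_then_slice}))
```

Both orderings agree once the window is full. The second simply fills in the first few points by borrowing prices from late 2007, which may or may not be wanted depending on what the plot is meant to show.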
From the heatmap and clustermap, we can tell that the closing prices of Bank of America, CitiGroup and Morgan Stanley are more highly correlated with each other, which means that their stock prices tend to move together. JPMorgan Chase, Goldman Sachs and Wells Fargo form another group whose stock prices are more highly correlated with each other.
plt.figure(figsize=(9,7.2))
sns.heatmap(bank_stocks.xs(key='Close',axis=1,level='Stock Info').corr(),cmap='RdBu_r',annot=True)
Optional: Use seaborn's clustermap to cluster the correlations together:
sns.clustermap(bank_stocks.xs(key='Close', axis=1, level='Stock Info').corr(),cmap='RdBu_r',annot=True)
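To make the numbers in the heatmap easier to interpret, here is a minimal sketch of `DataFrame.corr()` on three made-up price columns (all hypothetical). A value near 1 means two series rise and fall together, and a value near -1 means they move in opposite directions.

```python
import pandas as pd

# Hypothetical closing prices for three made-up tickers.
toy_close = pd.DataFrame(
    {
        "AAA": [1.0, 2.0, 3.0, 4.0],
        "BBB": [2.0, 4.0, 6.0, 8.0],  # moves exactly with AAA
        "CCC": [4.0, 3.0, 2.0, 1.0],  # moves exactly against AAA
    }
)

# Pairwise Pearson correlation between the columns.
corr = toy_close.corr()
print(corr)
```

The matrix is symmetric with ones on the diagonal, which is the same layout that `sns.heatmap` and `sns.clustermap` draw.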
You can zoom in on these plots to see the details of the time period that you are interested in.
From this plot, we can see that the financial crisis around 2008 hit Bank of America severely. After the economic downturn, Bank of America's stock price has increased slightly since 2012.
BAC[['Open', 'High', 'Low', 'Close']].loc['2006-01-01':'2018-01-01'].iplot(kind='candle')
CitiGroup's stock price dropped dramatically around 2008. They were hugely impacted by the financial crisis: their stock price plummeted from around 545 to around 10, collapsing within these 2 years. After the economic downturn, it went up slightly to around 50.
C[['Open', 'High', 'Low', 'Close']].loc['2006-01-01':'2018-01-01'].iplot(kind='candle')
Goldman Sachs also had a sudden drop around 2008, but after the global financial tsunami, they bounced back and their stock price gradually increased through the years.
GS[['Open', 'High', 'Low', 'Close']].loc['2006-01-01':'2018-01-01'].iplot(kind='candle')
JPMorgan Chase had the best performance among these 6 banks. Although they had a drop around 2009, the impact was not that large. The stock price went back to its previous level and kept growing over the years. It has shown a clear increase since 2012.
JPM[['Open', 'High', 'Low', 'Close']].loc['2006-01-01':'2018-01-01'].iplot(kind='candle')
Morgan Stanley had a serious drop during the financial crisis and then went back up to around 30. It had another slight drop in 2012, as well as in 2016. Overall, however, the stock price has increased slightly since the crisis.
MS[['Open', 'High', 'Low', 'Close']].loc['2006-01-01':'2018-01-01'].iplot(kind='candle')
Wells Fargo also suffered from the economic downturn, so they had a drop around 2009, and then they recovered quickly and went up until 2016. The stock price dropped in 2016 and 2017, which may have been because of the sales scandal.
WFC[['Open', 'High', 'Low', 'Close']].loc['2006-01-01':'2018-01-01'].iplot(kind='candle')
BAC['Close'].loc['2017-01-01':'2018-01-01'].ta_plot(study='boll')
C['Close'].loc['2017-01-01':'2018-01-01'].ta_plot(study='boll')
JPM['Close'].loc['2017-01-01':'2018-01-01'].ta_plot(study='boll')
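The `study='boll'` plots above draw Bollinger bands. As a rough sketch of what such bands are, the code below computes them by hand using the conventional definition: a rolling mean plus and minus a multiple of the rolling standard deviation. The 20-day window and the factor of 2 are assumed conventional settings and may not match what cufflinks uses, and the price series is simulated.

```python
import numpy as np
import pandas as pd

# Simulated closing prices (a random walk around 50); not real data.
rng = np.random.default_rng(1)
dates = pd.bdate_range("2017-01-02", periods=60)
close = pd.Series(50 + rng.normal(0, 1, size=60).cumsum(), index=dates)

# Assumed conventional settings: 20-day window, 2 standard deviations.
window, n_std = 20, 2
middle = close.rolling(window=window).mean()
sd = close.rolling(window=window).std()

bands = pd.DataFrame(
    {
        "close": close,
        "middle": middle,
        "upper": middle + n_std * sd,
        "lower": middle - n_std * sd,
    }
)
print(bands.dropna().head())
```

The bands widen when recent prices are volatile and narrow when they are calm, so a close that touches or crosses a band stands out as unusual relative to the last 20 days.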