import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt
import yfinance as yf
import seaborn as sns
#%matplotlib inline
Stocks Uncharted: Exploring Trends Through Data Visualization in Python
Independent Data Analysis Project
This project delves into stock price data to explore historical trends, volatility, and seasonal patterns using Python’s data visualization and analysis tools. By conducting a comprehensive exploratory data analysis (EDA), we aim to uncover key insights into stock price movements over time, with a particular focus on periods of heightened market activity. Notably, our analysis reveals that the highest levels of volatility occurred during the 2008 global financial crisis, highlighting the profound impact of economic downturns on market behavior. This project emphasizes skill-building in data visualization and the application of Pandas, and it is intended purely as an educational exercise rather than a basis for financial advice or decision-making.
Data analysis, Python, Pandas, Seaborn, Numpy, Descriptive Analysis, Data Science, Machine Learning
Background
In this project, we will conduct an exploratory data analysis (EDA) of stock price data, focusing on visualizing key trends and patterns over time. The primary objective is to practice data visualization techniques and strengthen proficiency in using Python’s Pandas library for data manipulation. This analysis will include a range of visualizations to explore stock price fluctuations, highlight trends, and identify any seasonal patterns. It is important to note that this project is intended solely as a learning exercise in data analytics and should not be considered a comprehensive financial analysis or relied upon for investment advice.
We’ll focus on bank stocks and see how they progressed throughout the financial crisis all the way to October 2024.
Data
We begin by getting data using yfinance
. We will get stock information for the following banks:
- Bank of America (BAC)
- CitiGroup (C)
- Goldman Sachs (GS)
- JPMorgan Chase (JPM)
- Morgan Stanley (MS)
- Wells Fargo (WFC)
I get the stock data from Jan 1st 2006 to October 31 2024 for each of these banks. I set each bank to be a separate dataframe, with the variable name for that bank being its ticker symbol. This involves a few steps:
- Use datetime to Morgan Stanleyset start and end datetime objects.
- Figure out the ticker symbol for each bank.
- Figure out how to use yfinance to grab info on the stock.
## Set time limits
= datetime(2006, 1, 1)
start = datetime(2024, 10, 31) end
# Get the tickers for target companies
# Bank of America (BAC) ----
= yf.Ticker("BAC").history(start = start, end = end)
BAC
"bank"] = "BAC"
BAC[
# Citigroup
= yf.Ticker("C").history(start = start, end = end)
CG
"bank"] = "CG"
CG[# Goldman Sachs (GS)
= yf.Ticker("GS").history(start = start, end = end)
GS
"bank"] = "GS"
GS[# PMorgan Chase (JPM)
= yf.Ticker("JPM").history(start = start, end = end)
JPM
"bank"] = "JPM"
JPM[
# Morgan Stanley (MS)
= yf.Ticker("MS").history(start = start, end = end)
MS
"bank"] = "MS"
MS[
# Wells Fargo (WFC)
= yf.Ticker("WFC").history(start = start, end = end)
WFC
"bank"] = "WFC" WFC[
= pd.concat([BAC, CG, GS, JPM, MS, WFC])
stocks stocks.head()
Open | High | Low | Close | Volume | Dividends | Stock Splits | bank | |
---|---|---|---|---|---|---|---|---|
Date | ||||||||
2006-01-03 00:00:00-05:00 | 31.437709 | 31.611918 | 30.921790 | 31.544916 | 16296700 | 0.0 | 0.0 | BAC |
2006-01-04 00:00:00-05:00 | 31.491308 | 31.652116 | 31.122794 | 31.209898 | 17757900 | 0.0 | 0.0 | BAC |
2006-01-05 00:00:00-05:00 | 31.209895 | 31.377402 | 31.035687 | 31.250095 | 14970700 | 0.0 | 0.0 | BAC |
2006-01-06 00:00:00-05:00 | 31.357304 | 31.431007 | 31.055791 | 31.203197 | 12599800 | 0.0 | 0.0 | BAC |
2006-01-09 00:00:00-05:00 | 31.303702 | 31.471209 | 31.062492 | 31.223297 | 15619400 | 0.0 | 0.0 | BAC |
The result is a dataframe with 8 variables (columns) and 28439 observations (rows of data).
stocks.shape
(28434, 8)
EDA
I start by creating a function to allow us summarise the data by bank.
def summaries(bank):
= stocks.loc[stocks['bank'] == bank, :]
dat return dat.describe()
I then loop over the list of banks and create a summary for each, which allows us to see summaries like maximum, mimimum, median and dispersion of prices over the period.
- Summaries: Bank of America (BAC)
"BAC") summaries(
Open | High | Low | Close | Volume | Dividends | Stock Splits | |
---|---|---|---|---|---|---|---|
count | 4739.000000 | 4739.000000 | 4739.000000 | 4739.000000 | 4.739000e+03 | 4739.000000 | 4739.0 |
mean | 21.668702 | 21.919907 | 21.403028 | 21.657642 | 1.002618e+08 | 0.002773 | 0.0 |
std | 10.782920 | 10.867387 | 10.702971 | 10.788068 | 1.024020e+08 | 0.032446 | 0.0 |
min | 2.564069 | 2.794993 | 2.009106 | 2.500365 | 4.835400e+06 | 0.000000 | 0.0 |
25% | 12.140291 | 12.273633 | 11.969863 | 12.121308 | 4.204220e+07 | 0.000000 | 0.0 |
50% | 21.571707 | 21.905641 | 21.206467 | 21.492443 | 6.683950e+07 | 0.000000 | 0.0 |
75% | 30.939527 | 31.214085 | 30.654451 | 30.942898 | 1.204894e+08 | 0.000000 | 0.0 |
max | 46.384301 | 46.570174 | 45.566466 | 45.891735 | 1.226791e+09 | 0.640000 | 0.0 |
- Summaries: CitiGroup (C)
"CG") summaries(
Open | High | Low | Close | Volume | Dividends | Stock Splits | |
---|---|---|---|---|---|---|---|
count | 4739.000000 | 4739.000000 | 4739.000000 | 4739.000000 | 4.739000e+03 | 4739.000000 | 4739.000000 |
mean | 79.280590 | 80.216864 | 78.210487 | 79.158650 | 2.398256e+07 | 0.014237 | 0.000021 |
std | 93.620117 | 94.420961 | 92.702660 | 93.510291 | 2.190130e+07 | 0.230598 | 0.001453 |
min | 7.847383 | 8.232059 | 7.462706 | 7.847382 | 6.328600e+05 | 0.000000 | 0.000000 |
25% | 36.159763 | 36.488568 | 35.697883 | 36.104925 | 1.263660e+07 | 0.000000 | 0.000000 |
50% | 44.310362 | 44.821505 | 43.708280 | 44.312912 | 1.829400e+07 | 0.000000 | 0.000000 |
75% | 59.690058 | 60.257005 | 58.995303 | 59.651955 | 2.889885e+07 | 0.000000 | 0.000000 |
max | 393.738679 | 396.521284 | 389.209767 | 392.416931 | 3.772638e+08 | 5.400000 | 0.100000 |
- Summaries: Goldman Sachs (GS)
"GS") summaries(
Open | High | Low | Close | Volume | Dividends | Stock Splits | |
---|---|---|---|---|---|---|---|
count | 4739.000000 | 4739.000000 | 4739.000000 | 4739.000000 | 4.739000e+03 | 4739.000000 | 4739.0 |
mean | 185.518849 | 187.606137 | 183.444982 | 185.547886 | 5.583694e+06 | 0.014631 | 0.0 |
std | 94.109512 | 94.949841 | 93.325169 | 94.135548 | 6.555060e+06 | 0.150347 | 0.0 |
min | 41.286103 | 41.698962 | 36.247667 | 39.756989 | 4.601000e+05 | 0.000000 | 0.0 |
25% | 125.218123 | 126.723861 | 123.655421 | 124.965111 | 2.400100e+06 | 0.000000 | 0.0 |
50% | 154.742873 | 156.225312 | 152.767405 | 154.532333 | 3.478100e+06 | 0.000000 | 0.0 |
75% | 206.495709 | 208.284994 | 204.934901 | 206.527725 | 5.998600e+06 | 0.000000 | 0.0 |
max | 538.799988 | 540.510010 | 528.229980 | 529.859985 | 1.145907e+08 | 3.000000 | 0.0 |
- Summaries: JPMorgan Chase (JPM)
"JPM") summaries(
Open | High | Low | Close | Volume | Dividends | Stock Splits | |
---|---|---|---|---|---|---|---|
count | 4739.000000 | 4739.000000 | 4739.000000 | 4739.000000 | 4.739000e+03 | 4739.000000 | 4739.0 |
mean | 68.797871 | 69.488900 | 68.109048 | 68.814119 | 2.284516e+07 | 0.008696 | 0.0 |
std | 48.225127 | 48.614146 | 47.859363 | 48.255914 | 2.006463e+07 | 0.079514 | 0.0 |
min | 10.399280 | 11.698345 | 10.121876 | 10.757875 | 2.926400e+06 | 0.000000 | 0.0 |
25% | 29.273920 | 29.592038 | 28.892567 | 29.253004 | 1.112370e+07 | 0.000000 | 0.0 |
50% | 46.816695 | 47.215364 | 46.420093 | 46.854862 | 1.573690e+07 | 0.000000 | 0.0 |
75% | 96.339535 | 97.336999 | 95.690239 | 96.699444 | 2.766055e+07 | 0.000000 | 0.0 |
max | 225.220001 | 226.750000 | 223.309998 | 225.500000 | 2.172942e+08 | 1.250000 | 0.0 |
- Summaries: Morgan Stanley (MS)
"WFC") summaries(
Open | High | Low | Close | Volume | Dividends | Stock Splits | |
---|---|---|---|---|---|---|---|
count | 4739.000000 | 4739.000000 | 4739.000000 | 4739.000000 | 4.739000e+03 | 4739.000000 | 4739.000000 |
mean | 32.789865 | 33.152898 | 32.426560 | 32.791581 | 2.868360e+07 | 0.004541 | 0.000422 |
std | 11.955833 | 12.011066 | 11.915451 | 11.961528 | 2.786614e+07 | 0.038970 | 0.029053 |
min | 5.830824 | 6.026308 | 5.257853 | 5.473559 | 2.392000e+06 | 0.000000 | 0.000000 |
25% | 21.406942 | 21.671963 | 21.143000 | 21.415617 | 1.498130e+07 | 0.000000 | 0.000000 |
50% | 36.005509 | 36.358258 | 35.558216 | 35.966167 | 2.094040e+07 | 0.000000 | 0.000000 |
75% | 42.471899 | 42.879078 | 42.120639 | 42.502254 | 3.256150e+07 | 0.000000 | 0.000000 |
max | 65.849998 | 66.400002 | 65.239998 | 65.610001 | 4.787366e+08 | 0.510000 | 2.000000 |
- Summaries: Wells Fargo (WFC)
"WFC") summaries(
Open | High | Low | Close | Volume | Dividends | Stock Splits | |
---|---|---|---|---|---|---|---|
count | 4739.000000 | 4739.000000 | 4739.000000 | 4739.000000 | 4.739000e+03 | 4739.000000 | 4739.000000 |
mean | 32.789865 | 33.152898 | 32.426560 | 32.791581 | 2.868360e+07 | 0.004541 | 0.000422 |
std | 11.955833 | 12.011066 | 11.915451 | 11.961528 | 2.786614e+07 | 0.038970 | 0.029053 |
min | 5.830824 | 6.026308 | 5.257853 | 5.473559 | 2.392000e+06 | 0.000000 | 0.000000 |
25% | 21.406942 | 21.671963 | 21.143000 | 21.415617 | 1.498130e+07 | 0.000000 | 0.000000 |
50% | 36.005509 | 36.358258 | 35.558216 | 35.966167 | 2.094040e+07 | 0.000000 | 0.000000 |
75% | 42.471899 | 42.879078 | 42.120639 | 42.502254 | 3.256150e+07 | 0.000000 | 0.000000 |
max | 65.849998 | 66.400002 | 65.239998 | 65.610001 | 4.787366e+08 | 0.510000 | 2.000000 |
= "bank") sns.pairplot(stocks, hue
Finally, we look at the trends in closing prices over the period.
= stocks[["Close", "bank"]].reset_index()
new = "Date", y = "Close", data = new, hue = "bank", palette = "mako") sns.lineplot(x
Detailed Analysis
I start by computing the daily stock returns for each of the stock for the period 2006-2024.
"return"] = np.log(stocks.groupby('bank').Close.pct_change().add(1)) stocks[
Plotting the returns data for each bank over time, we see the high volatility in returns during the 2008 financial crisis. This has smoothened out over the years as the crisis abated.
= stocks.reset_index()
old = "Close", y = "return", hue = "bank", data = old, alpha = 0.3) sns.lineplot(x
let us find out the maximum and mimimum daily return for each company.
'bank')["return"].max() old.groupby(
bank
BAC 0.302096
CG 0.456316
GS 0.234818
JPM 0.223917
MS 0.625850
WFC 0.283407
Name: return, dtype: float64
The lowest daily returns for each of the firms are listed below:
'bank')["return"].min() old.groupby(
bank
BAC -0.342059
CG -0.494696
GS -0.210222
JPM -0.232278
MS -0.299658
WFC -0.272101
Name: return, dtype: float64
Looking into the details, we can check which days had the highest and lowest returns. We see that the highest and lowest daily returns were during the 2008 financial crisis, highlighting the high volatility in stock prices duting this period as the returns plot above shows.
# largest daily returns
10, "return")[["Date", "return"]] old.nlargest(
Date | return | |
---|---|---|
19655 | 2008-10-13 00:00:00-04:00 | 0.625850 |
5468 | 2008-11-24 00:00:00-05:00 | 0.456316 |
5539 | 2009-03-10 00:00:00-04:00 | 0.322773 |
822 | 2009-04-09 00:00:00-04:00 | 0.302096 |
19685 | 2008-11-24 00:00:00-05:00 | 0.286189 |
24332 | 2008-07-16 00:00:00-04:00 | 0.283407 |
24517 | 2009-04-09 00:00:00-04:00 | 0.275350 |
5506 | 2009-01-21 00:00:00-05:00 | 0.270572 |
767 | 2009-01-21 00:00:00-05:00 | 0.269878 |
5543 | 2009-03-16 00:00:00-04:00 | 0.269255 |
# largest daily returns
10, "return")[["Date", "return"]] old.nsmallest(
Date | return | |
---|---|---|
5532 | 2009-02-27 00:00:00-05:00 | -0.494696 |
766 | 2009-01-20 00:00:00-05:00 | -0.342059 |
5466 | 2008-11-20 00:00:00-05:00 | -0.306610 |
695 | 2008-10-07 00:00:00-04:00 | -0.304163 |
19653 | 2008-10-09 00:00:00-04:00 | -0.299658 |
793 | 2009-02-27 00:00:00-05:00 | -0.297758 |
19651 | 2008-10-07 00:00:00-04:00 | -0.286264 |
828 | 2009-04-20 00:00:00-04:00 | -0.278916 |
19637 | 2008-09-17 00:00:00-04:00 | -0.277283 |
24461 | 2009-01-20 00:00:00-05:00 | -0.272101 |
Correlation Analysis
In this section, I do correlation analysis. Specifically, I run the correlation analysis between the stock returns for the different banks.
= stocks.reset_index().pivot_table(index = "Date", columns = "bank", values = "return").corr()
correlations
= "viridis")
sns.heatmap(correlations, cmap "Heatmap of Correlations in Bank Returns, 2006-2024") plt.title(
Text(0.5, 1.0, 'Heatmap of Correlations in Bank Returns, 2006-2024')
= "viridis")
sns.clustermap(correlations, cmap "Clustermap of Correlations in Bank Returns, 2006-2024") plt.title(
Text(0.5, 1.0, 'Clustermap of Correlations in Bank Returns, 2006-2024')
Conclusion
In this project, we conducted an exploratory data analysis (EDA) of stock price data, focusing on visualizing key trends and patterns over time. The primary objective is to practice data visualization techniques and strengthen proficiency in using Python’s Pandas library for data manipulation. This analysis will include a range of visualizations to explore stock price fluctuations, highlight trends, and identify any seasonal patterns. It is important to note that this project is intended solely as a learning exercise in data analytics and should not be considered a comprehensive financial analysis or relied upon for investment advic (Muddana and Vinayakam 2024).