Stocks Uncharted: Exploring Trends Through Data Visualization in Python

Independent Data Analysis Project

Author

Affiliations

John Karuitha, PhD

Karatina University, Department of Business and Economics

University of the Witwatersrand, School of Construction Economics & Management

Published

November 2, 2024

Modified

November 2, 2024

Executive Summary

This project delves into stock price data to explore historical trends, volatility, and seasonal patterns using Python’s data visualization and analysis tools. By conducting a comprehensive exploratory data analysis (EDA), we aim to uncover key insights into stock price movements over time, with a particular focus on periods of heightened market activity. Notably, our analysis reveals that the highest levels of volatility occurred during the 2008 global financial crisis, highlighting the profound impact of economic downturns on market behavior. This project emphasizes skill-building in data visualization and the application of Pandas, and it is intended purely as an educational exercise rather than a basis for financial advice or decision-making.

Keywords

Data analysis, Python, Pandas, Seaborn, Numpy, Descriptive Analysis, Data Science, Machine Learning

Background

In this project, we will conduct an exploratory data analysis (EDA) of stock price data, focusing on visualizing key trends and patterns over time. The primary objective is to practice data visualization techniques and strengthen proficiency in using Python’s Pandas library for data manipulation. This analysis will include a range of visualizations to explore stock price fluctuations, highlight trends, and identify any seasonal patterns. It is important to note that this project is intended solely as a learning exercise in data analytics and should not be considered a comprehensive financial analysis or relied upon for investment advice.

We’ll focus on bank stocks and see how they progressed throughout the financial crisis all the way to October 2024.

import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt
import yfinance as yf
import seaborn as sns
#%matplotlib inline

Data

We begin by getting data using yfinance. We will get stock information for the following banks:

Bank of America (BAC)
CitiGroup (C)
Goldman Sachs (GS)
JPMorgan Chase (JPM)
Morgan Stanley (MS)
Wells Fargo (WFC)

I get the stock data from Jan 1st 2006 to October 31 2024 for each of these banks. I set each bank to be a separate dataframe, with the variable name for that bank being its ticker symbol. This involves a few steps:

Use datetime to Morgan Stanleyset start and end datetime objects.
Figure out the ticker symbol for each bank.
Figure out how to use yfinance to grab info on the stock.

## Set time limits 
start = datetime(2006, 1, 1)
end = datetime(2024, 10, 31)

# Get the tickers for target companies
# Bank of America (BAC) ----
BAC = yf.Ticker("BAC").history(start = start, end = end)

BAC["bank"] = "BAC"

# Citigroup
CG = yf.Ticker("C").history(start = start, end = end)

CG["bank"] = "CG"
# Goldman Sachs (GS)
GS = yf.Ticker("GS").history(start = start, end = end)

GS["bank"] = "GS"
# PMorgan Chase (JPM)
JPM = yf.Ticker("JPM").history(start = start, end = end)

JPM["bank"] = "JPM"

# Morgan Stanley (MS)
MS = yf.Ticker("MS").history(start = start, end = end)

MS["bank"] = "MS"

# Wells Fargo (WFC)
WFC = yf.Ticker("WFC").history(start = start, end = end)

WFC["bank"] = "WFC"

stocks = pd.concat([BAC, CG, GS, JPM, MS, WFC])
stocks.head()

	Open	High	Low	Close	Volume	Dividends	Stock Splits	bank
Date
2006-01-03 00:00:00-05:00	31.437709	31.611918	30.921790	31.544916	16296700	0.0	0.0	BAC
2006-01-04 00:00:00-05:00	31.491308	31.652116	31.122794	31.209898	17757900	0.0	0.0	BAC
2006-01-05 00:00:00-05:00	31.209895	31.377402	31.035687	31.250095	14970700	0.0	0.0	BAC
2006-01-06 00:00:00-05:00	31.357304	31.431007	31.055791	31.203197	12599800	0.0	0.0	BAC
2006-01-09 00:00:00-05:00	31.303702	31.471209	31.062492	31.223297	15619400	0.0	0.0	BAC

The result is a dataframe with 8 variables (columns) and 28439 observations (rows of data).

stocks.shape

(28434, 8)

EDA

I start by creating a function to allow us summarise the data by bank.

def summaries(bank):
  dat = stocks.loc[stocks['bank'] == bank, :]
  return dat.describe()

I then loop over the list of banks and create a summary for each, which allows us to see summaries like maximum, mimimum, median and dispersion of prices over the period.

Summaries: Bank of America (BAC)

summaries("BAC")

	Open	High	Low	Close	Volume	Dividends	Stock Splits
count	4739.000000	4739.000000	4739.000000	4739.000000	4.739000e+03	4739.000000	4739.0
mean	21.668702	21.919907	21.403028	21.657642	1.002618e+08	0.002773	0.0
std	10.782920	10.867387	10.702971	10.788068	1.024020e+08	0.032446	0.0
min	2.564069	2.794993	2.009106	2.500365	4.835400e+06	0.000000	0.0
25%	12.140291	12.273633	11.969863	12.121308	4.204220e+07	0.000000	0.0
50%	21.571707	21.905641	21.206467	21.492443	6.683950e+07	0.000000	0.0
75%	30.939527	31.214085	30.654451	30.942898	1.204894e+08	0.000000	0.0
max	46.384301	46.570174	45.566466	45.891735	1.226791e+09	0.640000	0.0

Summaries: CitiGroup (C)

summaries("CG")

	Open	High	Low	Close	Volume	Dividends	Stock Splits
count	4739.000000	4739.000000	4739.000000	4739.000000	4.739000e+03	4739.000000	4739.000000
mean	79.280590	80.216864	78.210487	79.158650	2.398256e+07	0.014237	0.000021
std	93.620117	94.420961	92.702660	93.510291	2.190130e+07	0.230598	0.001453
min	7.847383	8.232059	7.462706	7.847382	6.328600e+05	0.000000	0.000000
25%	36.159763	36.488568	35.697883	36.104925	1.263660e+07	0.000000	0.000000
50%	44.310362	44.821505	43.708280	44.312912	1.829400e+07	0.000000	0.000000
75%	59.690058	60.257005	58.995303	59.651955	2.889885e+07	0.000000	0.000000
max	393.738679	396.521284	389.209767	392.416931	3.772638e+08	5.400000	0.100000

Summaries: Goldman Sachs (GS)

summaries("GS")

	Open	High	Low	Close	Volume	Dividends	Stock Splits
count	4739.000000	4739.000000	4739.000000	4739.000000	4.739000e+03	4739.000000	4739.0
mean	185.518849	187.606137	183.444982	185.547886	5.583694e+06	0.014631	0.0
std	94.109512	94.949841	93.325169	94.135548	6.555060e+06	0.150347	0.0
min	41.286103	41.698962	36.247667	39.756989	4.601000e+05	0.000000	0.0
25%	125.218123	126.723861	123.655421	124.965111	2.400100e+06	0.000000	0.0
50%	154.742873	156.225312	152.767405	154.532333	3.478100e+06	0.000000	0.0
75%	206.495709	208.284994	204.934901	206.527725	5.998600e+06	0.000000	0.0
max	538.799988	540.510010	528.229980	529.859985	1.145907e+08	3.000000	0.0

Summaries: JPMorgan Chase (JPM)

summaries("JPM")

	Open	High	Low	Close	Volume	Dividends	Stock Splits
count	4739.000000	4739.000000	4739.000000	4739.000000	4.739000e+03	4739.000000	4739.0
mean	68.797871	69.488900	68.109048	68.814119	2.284516e+07	0.008696	0.0
std	48.225127	48.614146	47.859363	48.255914	2.006463e+07	0.079514	0.0
min	10.399280	11.698345	10.121876	10.757875	2.926400e+06	0.000000	0.0
25%	29.273920	29.592038	28.892567	29.253004	1.112370e+07	0.000000	0.0
50%	46.816695	47.215364	46.420093	46.854862	1.573690e+07	0.000000	0.0
75%	96.339535	97.336999	95.690239	96.699444	2.766055e+07	0.000000	0.0
max	225.220001	226.750000	223.309998	225.500000	2.172942e+08	1.250000	0.0

Summaries: Morgan Stanley (MS)

summaries("WFC")

	Open	High	Low	Close	Volume	Dividends	Stock Splits
count	4739.000000	4739.000000	4739.000000	4739.000000	4.739000e+03	4739.000000	4739.000000
mean	32.789865	33.152898	32.426560	32.791581	2.868360e+07	0.004541	0.000422
std	11.955833	12.011066	11.915451	11.961528	2.786614e+07	0.038970	0.029053
min	5.830824	6.026308	5.257853	5.473559	2.392000e+06	0.000000	0.000000
25%	21.406942	21.671963	21.143000	21.415617	1.498130e+07	0.000000	0.000000
50%	36.005509	36.358258	35.558216	35.966167	2.094040e+07	0.000000	0.000000
75%	42.471899	42.879078	42.120639	42.502254	3.256150e+07	0.000000	0.000000
max	65.849998	66.400002	65.239998	65.610001	4.787366e+08	0.510000	2.000000

Summaries: Wells Fargo (WFC)

summaries("WFC")

	Open	High	Low	Close	Volume	Dividends	Stock Splits
count	4739.000000	4739.000000	4739.000000	4739.000000	4.739000e+03	4739.000000	4739.000000
mean	32.789865	33.152898	32.426560	32.791581	2.868360e+07	0.004541	0.000422
std	11.955833	12.011066	11.915451	11.961528	2.786614e+07	0.038970	0.029053
min	5.830824	6.026308	5.257853	5.473559	2.392000e+06	0.000000	0.000000
25%	21.406942	21.671963	21.143000	21.415617	1.498130e+07	0.000000	0.000000
50%	36.005509	36.358258	35.558216	35.966167	2.094040e+07	0.000000	0.000000
75%	42.471899	42.879078	42.120639	42.502254	3.256150e+07	0.000000	0.000000
max	65.849998	66.400002	65.239998	65.610001	4.787366e+08	0.510000	2.000000

sns.pairplot(stocks, hue = "bank")

Finally, we look at the trends in closing prices over the period.

new = stocks[["Close", "bank"]].reset_index()
sns.lineplot(x = "Date", y = "Close", data = new, hue = "bank", palette = "mako")

Detailed Analysis

I start by computing the daily stock returns for each of the stock for the period 2006-2024.

stocks["return"] = np.log(stocks.groupby('bank').Close.pct_change().add(1))

Plotting the returns data for each bank over time, we see the high volatility in returns during the 2008 financial crisis. This has smoothened out over the years as the crisis abated.

old = stocks.reset_index()
sns.lineplot(x = "Close", y = "return", hue = "bank", data = old, alpha = 0.3)

let us find out the maximum and mimimum daily return for each company.

old.groupby('bank')["return"].max()

bank
BAC    0.302096
CG     0.456316
GS     0.234818
JPM    0.223917
MS     0.625850
WFC    0.283407
Name: return, dtype: float64

The lowest daily returns for each of the firms are listed below:

old.groupby('bank')["return"].min()

bank
BAC   -0.342059
CG    -0.494696
GS    -0.210222
JPM   -0.232278
MS    -0.299658
WFC   -0.272101
Name: return, dtype: float64

Looking into the details, we can check which days had the highest and lowest returns. We see that the highest and lowest daily returns were during the 2008 financial crisis, highlighting the high volatility in stock prices duting this period as the returns plot above shows.

# largest daily returns 
old.nlargest(10, "return")[["Date", "return"]]

	Date	return
19655	2008-10-13 00:00:00-04:00	0.625850
5468	2008-11-24 00:00:00-05:00	0.456316
5539	2009-03-10 00:00:00-04:00	0.322773
822	2009-04-09 00:00:00-04:00	0.302096
19685	2008-11-24 00:00:00-05:00	0.286189
24332	2008-07-16 00:00:00-04:00	0.283407
24517	2009-04-09 00:00:00-04:00	0.275350
5506	2009-01-21 00:00:00-05:00	0.270572
767	2009-01-21 00:00:00-05:00	0.269878
5543	2009-03-16 00:00:00-04:00	0.269255

# largest daily returns 
old.nsmallest(10, "return")[["Date", "return"]]

	Date	return
5532	2009-02-27 00:00:00-05:00	-0.494696
766	2009-01-20 00:00:00-05:00	-0.342059
5466	2008-11-20 00:00:00-05:00	-0.306610
695	2008-10-07 00:00:00-04:00	-0.304163
19653	2008-10-09 00:00:00-04:00	-0.299658
793	2009-02-27 00:00:00-05:00	-0.297758
19651	2008-10-07 00:00:00-04:00	-0.286264
828	2009-04-20 00:00:00-04:00	-0.278916
19637	2008-09-17 00:00:00-04:00	-0.277283
24461	2009-01-20 00:00:00-05:00	-0.272101

Correlation Analysis

In this section, I do correlation analysis. Specifically, I run the correlation analysis between the stock returns for the different banks.

correlations = stocks.reset_index().pivot_table(index = "Date", columns = "bank", values = "return").corr()

sns.heatmap(correlations, cmap = "viridis")
plt.title("Heatmap of Correlations in Bank Returns, 2006-2024")

Text(0.5, 1.0, 'Heatmap of Correlations in Bank Returns, 2006-2024')

sns.clustermap(correlations, cmap = "viridis")
plt.title("Clustermap of Correlations in Bank Returns, 2006-2024")

Text(0.5, 1.0, 'Clustermap of Correlations in Bank Returns, 2006-2024')

Conclusion

In this project, we conducted an exploratory data analysis (EDA) of stock price data, focusing on visualizing key trends and patterns over time. The primary objective is to practice data visualization techniques and strengthen proficiency in using Python’s Pandas library for data manipulation. This analysis will include a range of visualizations to explore stock price fluctuations, highlight trends, and identify any seasonal patterns. It is important to note that this project is intended solely as a learning exercise in data analytics and should not be considered a comprehensive financial analysis or relied upon for investment advic (Muddana and Vinayakam 2024).

References

Muddana, A Lakshmi, and Sandhya Vinayakam. 2024. Python for Data Science. Springer.