Stocks Uncharted: Exploring Trends Through Data Visualization in Python

Independent Data Analysis Project

Published

November 2, 2024

Modified

November 2, 2024

Executive Summary

This project delves into stock price data to explore historical trends, volatility, and seasonal patterns using Python’s data visualization and analysis tools. By conducting a comprehensive exploratory data analysis (EDA), we aim to uncover key insights into stock price movements over time, with a particular focus on periods of heightened market activity. Notably, our analysis reveals that the highest levels of volatility occurred during the 2008 global financial crisis, highlighting the profound impact of economic downturns on market behavior. This project emphasizes skill-building in data visualization and the application of Pandas, and it is intended purely as an educational exercise rather than a basis for financial advice or decision-making.

Keywords

Data analysis, Python, Pandas, Seaborn, NumPy, Descriptive Analysis, Data Science, Machine Learning

Background

In this project, we will conduct an exploratory data analysis (EDA) of stock price data, focusing on visualizing key trends and patterns over time. The primary objective is to practice data visualization techniques and strengthen proficiency in using Python’s Pandas library for data manipulation. This analysis will include a range of visualizations to explore stock price fluctuations, highlight trends, and identify any seasonal patterns. It is important to note that this project is intended solely as a learning exercise in data analytics and should not be considered a comprehensive financial analysis or relied upon for investment advice.

We focus on bank stocks and follow how they progressed from before the 2008 financial crisis through October 2024.

import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt
import yfinance as yf
import seaborn as sns
#%matplotlib inline

Data

We begin by downloading the data with yfinance. We retrieve stock information for the following banks:

  • Bank of America (BAC)
  • Citigroup (C)
  • Goldman Sachs (GS)
  • JPMorgan Chase (JPM)
  • Morgan Stanley (MS)
  • Wells Fargo (WFC)

I download daily stock data from January 1, 2006 to October 31, 2024 for each of these banks and store each bank in a separate dataframe named after its ticker symbol (Citigroup, whose ticker is C, is stored as CG). This involves a few steps:

  1. Use datetime to set start and end datetime objects.
  2. Figure out the ticker symbol for each bank.
  3. Figure out how to use yfinance to grab info on the stock.
# Set time limits (yfinance treats `end` as exclusive, so October 31 itself is not downloaded)
start = datetime(2006, 1, 1)
end = datetime(2024, 10, 31)
# Get the tickers for target companies
# Bank of America (BAC) ----
BAC = yf.Ticker("BAC").history(start = start, end = end)

BAC["bank"] = "BAC"

# Citigroup (C), labelled "CG" in this project
CG = yf.Ticker("C").history(start = start, end = end)

CG["bank"] = "CG"
# Goldman Sachs (GS)
GS = yf.Ticker("GS").history(start = start, end = end)

GS["bank"] = "GS"
# JPMorgan Chase (JPM)
JPM = yf.Ticker("JPM").history(start = start, end = end)

JPM["bank"] = "JPM"

# Morgan Stanley (MS)
MS = yf.Ticker("MS").history(start = start, end = end)

MS["bank"] = "MS"

# Wells Fargo (WFC)
WFC = yf.Ticker("WFC").history(start = start, end = end)

WFC["bank"] = "WFC"
stocks = pd.concat([BAC, CG, GS, JPM, MS, WFC])
stocks.head()
Open High Low Close Volume Dividends Stock Splits bank
Date
2006-01-03 00:00:00-05:00 31.437709 31.611918 30.921790 31.544916 16296700 0.0 0.0 BAC
2006-01-04 00:00:00-05:00 31.491308 31.652116 31.122794 31.209898 17757900 0.0 0.0 BAC
2006-01-05 00:00:00-05:00 31.209895 31.377402 31.035687 31.250095 14970700 0.0 0.0 BAC
2006-01-06 00:00:00-05:00 31.357304 31.431007 31.055791 31.203197 12599800 0.0 0.0 BAC
2006-01-09 00:00:00-05:00 31.303702 31.471209 31.062492 31.223297 15619400 0.0 0.0 BAC

The result is a dataframe with 8 variables (columns) and 28434 observations (rows of data).

stocks.shape
(28434, 8)
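The six download-and-label steps above repeat the same pattern, so they could also be written as a loop. The sketch below is one possible version, not the code used for this project: `TICKERS` is a hypothetical label-to-ticker mapping, and to keep it runnable offline the hypothetical `fake_history` function stands in for `yf.Ticker(symbol).history(start=start, end=end)` and returns a few placeholder prices. It also checks that every bank contributes the same number of rows, which is why the total above is 6 × 4,739 = 28,434.

```python
import numpy as np
import pandas as pd

# Hypothetical mapping from the label used in this project to the Yahoo ticker;
# Citigroup is labelled "CG" even though its ticker is "C".
TICKERS = {"BAC": "BAC", "CG": "C", "GS": "GS", "JPM": "JPM", "MS": "MS", "WFC": "WFC"}


def fake_history(symbol: str, n_days: int = 5) -> pd.DataFrame:
    """Offline stand-in (hypothetical) for yf.Ticker(symbol).history(start=start, end=end)."""
    dates = pd.bdate_range("2006-01-03", periods=n_days, name="Date")
    rng = np.random.default_rng(len(symbol))
    close = 30 + rng.standard_normal(n_days).cumsum()  # placeholder prices only
    return pd.DataFrame({"Close": close}, index=dates)


def build_stocks(fetch=fake_history) -> pd.DataFrame:
    """Fetch each ticker, tag it with its label, and stack the frames."""
    frames = [fetch(symbol).assign(bank=label) for label, symbol in TICKERS.items()]
    return pd.concat(frames)


stocks_demo = build_stocks()
rows_per_bank = stocks_demo.groupby("bank").size()
# Every bank should contribute the same number of rows (4739 in the real data),
# so the total row count is simply banks x trading days.
assert rows_per_bank.nunique() == 1
print(rows_per_bank)
print(stocks_demo.shape)
```

Passing the fetch function as a parameter keeps the stacking logic separate from the network call, so the real `yf.Ticker(...).history(...)` call can be swapped in without other changes.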

EDA

I start by creating a function that summarises the data for a single bank.

def summaries(bank):
  dat = stocks.loc[stocks['bank'] == bank, :]
  return dat.describe()

I then call this function for each bank, which gives summary statistics such as the maximum, minimum, median and dispersion (standard deviation) of prices over the period.

  • Summaries: Bank of America (BAC)
summaries("BAC")
Open High Low Close Volume Dividends Stock Splits
count 4739.000000 4739.000000 4739.000000 4739.000000 4.739000e+03 4739.000000 4739.0
mean 21.668702 21.919907 21.403028 21.657642 1.002618e+08 0.002773 0.0
std 10.782920 10.867387 10.702971 10.788068 1.024020e+08 0.032446 0.0
min 2.564069 2.794993 2.009106 2.500365 4.835400e+06 0.000000 0.0
25% 12.140291 12.273633 11.969863 12.121308 4.204220e+07 0.000000 0.0
50% 21.571707 21.905641 21.206467 21.492443 6.683950e+07 0.000000 0.0
75% 30.939527 31.214085 30.654451 30.942898 1.204894e+08 0.000000 0.0
max 46.384301 46.570174 45.566466 45.891735 1.226791e+09 0.640000 0.0
  • Summaries: Citigroup (C)
summaries("CG")
Open High Low Close Volume Dividends Stock Splits
count 4739.000000 4739.000000 4739.000000 4739.000000 4.739000e+03 4739.000000 4739.000000
mean 79.280590 80.216864 78.210487 79.158650 2.398256e+07 0.014237 0.000021
std 93.620117 94.420961 92.702660 93.510291 2.190130e+07 0.230598 0.001453
min 7.847383 8.232059 7.462706 7.847382 6.328600e+05 0.000000 0.000000
25% 36.159763 36.488568 35.697883 36.104925 1.263660e+07 0.000000 0.000000
50% 44.310362 44.821505 43.708280 44.312912 1.829400e+07 0.000000 0.000000
75% 59.690058 60.257005 58.995303 59.651955 2.889885e+07 0.000000 0.000000
max 393.738679 396.521284 389.209767 392.416931 3.772638e+08 5.400000 0.100000
  • Summaries: Goldman Sachs (GS)
summaries("GS")
Open High Low Close Volume Dividends Stock Splits
count 4739.000000 4739.000000 4739.000000 4739.000000 4.739000e+03 4739.000000 4739.0
mean 185.518849 187.606137 183.444982 185.547886 5.583694e+06 0.014631 0.0
std 94.109512 94.949841 93.325169 94.135548 6.555060e+06 0.150347 0.0
min 41.286103 41.698962 36.247667 39.756989 4.601000e+05 0.000000 0.0
25% 125.218123 126.723861 123.655421 124.965111 2.400100e+06 0.000000 0.0
50% 154.742873 156.225312 152.767405 154.532333 3.478100e+06 0.000000 0.0
75% 206.495709 208.284994 204.934901 206.527725 5.998600e+06 0.000000 0.0
max 538.799988 540.510010 528.229980 529.859985 1.145907e+08 3.000000 0.0
  • Summaries: JPMorgan Chase (JPM)
summaries("JPM")
Open High Low Close Volume Dividends Stock Splits
count 4739.000000 4739.000000 4739.000000 4739.000000 4.739000e+03 4739.000000 4739.0
mean 68.797871 69.488900 68.109048 68.814119 2.284516e+07 0.008696 0.0
std 48.225127 48.614146 47.859363 48.255914 2.006463e+07 0.079514 0.0
min 10.399280 11.698345 10.121876 10.757875 2.926400e+06 0.000000 0.0
25% 29.273920 29.592038 28.892567 29.253004 1.112370e+07 0.000000 0.0
50% 46.816695 47.215364 46.420093 46.854862 1.573690e+07 0.000000 0.0
75% 96.339535 97.336999 95.690239 96.699444 2.766055e+07 0.000000 0.0
max 225.220001 226.750000 223.309998 225.500000 2.172942e+08 1.250000 0.0
  • Summaries: Morgan Stanley (MS)
summaries("MS")
  • Summaries: Wells Fargo (WFC)
summaries("WFC")
Open High Low Close Volume Dividends Stock Splits
count 4739.000000 4739.000000 4739.000000 4739.000000 4.739000e+03 4739.000000 4739.000000
mean 32.789865 33.152898 32.426560 32.791581 2.868360e+07 0.004541 0.000422
std 11.955833 12.011066 11.915451 11.961528 2.786614e+07 0.038970 0.029053
min 5.830824 6.026308 5.257853 5.473559 2.392000e+06 0.000000 0.000000
25% 21.406942 21.671963 21.143000 21.415617 1.498130e+07 0.000000 0.000000
50% 36.005509 36.358258 35.558216 35.966167 2.094040e+07 0.000000 0.000000
75% 42.471899 42.879078 42.120639 42.502254 3.256150e+07 0.000000 0.000000
max 65.849998 66.400002 65.239998 65.610001 4.787366e+08 0.510000 2.000000
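Instead of calling `summaries()` once per bank, pandas can produce the same statistics for every bank in a single table with `groupby(...).describe()`. A minimal sketch on a small synthetic frame shaped like `stocks` (on the real data the call would be `stocks.groupby("bank")["Close"].describe()`; restricting it to `Close` is an illustrative choice):

```python
import pandas as pd

# Small synthetic stand-in for the long-format `stocks` frame (one row per bank-day).
demo = pd.DataFrame({
    "bank": ["BAC", "BAC", "BAC", "GS", "GS", "GS"],
    "Close": [30.0, 31.0, 32.0, 150.0, 155.0, 160.0],
})

# One row per bank; columns are count, mean, std, min, 25%, 50%, 75%, max.
close_summary = demo.groupby("bank")["Close"].describe()
print(close_summary)
```

Putting all banks in one table makes it easier to compare, for example, the medians side by side than scrolling through six separate outputs.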
Next, a pair plot shows the pairwise relationships between the numeric columns, coloured by bank.

sns.pairplot(stocks, hue = "bank")

Finally, we look at the trends in closing prices over the period.

new = stocks[["Close", "bank"]].reset_index()
sns.lineplot(x = "Date", y = "Close", data = new, hue = "bank", palette = "mako")
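Because Goldman Sachs trades at several times the price of Bank of America or Wells Fargo (compare the means in the summaries above), raw closing prices put the banks on very different scales. One optional way to compare their trajectories, not used in the original plot, is to rebase each bank's series to 100 on its first trading day. A minimal sketch on toy data; on the real data `demo` would be `new`, and the `rebased` column could replace `Close` as the `y` of the line plot.

```python
import pandas as pd

# Toy long-format frame shaped like `new`; the prices are made up.
demo = pd.DataFrame({
    "Date": pd.to_datetime(["2006-01-03", "2006-01-04", "2006-01-05"] * 2),
    "bank": ["A"] * 3 + ["B"] * 3,
    "Close": [30.0, 33.0, 27.0, 150.0, 165.0, 180.0],
})

# Divide each bank's closes by that bank's first close and scale to 100.
first_close = demo.groupby("bank")["Close"].transform("first")
demo["rebased"] = demo["Close"] / first_close * 100
print(demo)
```

`transform("first")` returns a value for every row, aligned with the original index, so the division works row by row without a merge.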

Detailed Analysis

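I start by computing the daily log returns for each stock over the period 2006-2024. For each bank, the log return on a given day is ln(1 + R), where R is the simple return, i.e. the fractional change in closing price from the previous trading day.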

stocks["return"] = np.log(stocks.groupby('bank').Close.pct_change().add(1))
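The expression above computes r_t = ln(P_t / P_{t-1}) = ln(1 + R_t), where P_t is the closing price and R_t the simple return from `pct_change()`. Grouping by `bank` first matters: without it, the first day of one bank would be compared with the last day of the previous bank in the stacked frame. A minimal check on a toy frame (not the real data) that this matches the equivalent "difference of log prices" formulation within each group:

```python
import numpy as np
import pandas as pd

# Toy long-format frame: two banks stacked one after the other, as in `stocks`.
demo = pd.DataFrame({
    "bank": ["A", "A", "A", "B", "B", "B"],
    "Close": [100.0, 110.0, 99.0, 50.0, 55.0, 60.5],
})

# Approach used in the text: log of (1 + simple return), computed within each bank.
via_pct = np.log(demo.groupby("bank")["Close"].pct_change().add(1))

# Equivalent formulation: difference of log prices, again within each bank.
via_diff = np.log(demo["Close"]).groupby(demo["bank"]).diff()

# The first row of each bank has no previous price, so both give NaN there.
print(pd.DataFrame({"via_pct": via_pct, "via_diff": via_diff}))
assert np.allclose(via_pct, via_diff, equal_nan=True)
```

Log returns are convenient because they add up over time: the log return over several days is the sum of the daily log returns.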

Plotting each bank’s daily returns over time shows how volatile returns were during the 2008 financial crisis. The swings have narrowed in the years since, as the crisis abated.

old = stocks.reset_index()
# Plot returns over time, so Date (not the closing price) goes on the x-axis
sns.lineplot(x = "Date", y = "return", hue = "bank", data = old, alpha = 0.3)
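The returns plot gives a visual impression of volatility; one common way to put a number on it is the rolling standard deviation of daily returns. The sketch below is an illustration, not part of the original analysis: the 21-day window (roughly one trading month) and the √252 annualisation factor are conventional assumptions, and the data are synthetic, with a deliberately noisier middle stretch standing in for a crisis period.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.bdate_range("2008-01-01", periods=300, name="Date")
# Synthetic daily log returns: calm, then a turbulent stretch, then calm again.
scale = np.r_[np.full(100, 0.01), np.full(100, 0.05), np.full(100, 0.01)]
demo = pd.DataFrame({"bank": "DEMO", "return": rng.standard_normal(300) * scale}, index=dates)

WINDOW = 21  # assumed window length, about one trading month

# Rolling standard deviation within each bank, annualised with sqrt(252).
# The result is indexed by (bank, Date).
rolling_vol = (
    demo.groupby("bank")["return"]
    .rolling(WINDOW)
    .std()
    .mul(np.sqrt(252))
)
print(rolling_vol.dropna().describe())
```

On the real data, the same chain would start from `stocks.groupby("bank")["return"]`; calling `reset_index()` on the result gives `bank`, `Date` and `return` columns that can be passed to `sns.lineplot`.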

Let us find the maximum and minimum daily returns for each company. The highest daily returns are:

old.groupby('bank')["return"].max()
bank
BAC    0.302096
CG     0.456316
GS     0.234818
JPM    0.223917
MS     0.625850
WFC    0.283407
Name: return, dtype: float64

The lowest daily returns for each of the firms are listed below:

old.groupby('bank')["return"].min()
bank
BAC   -0.342059
CG    -0.494696
GS    -0.210222
JPM   -0.232278
MS    -0.299658
WFC   -0.272101
Name: return, dtype: float64
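Because these are log returns, they are not the same as percentage price changes. A log return r corresponds to a simple return of exp(r) - 1, which NumPy provides as `np.expm1`. Applying this to the extremes reported above (values copied from the two tables) shows how large these moves were in ordinary percentage terms:

```python
import numpy as np
import pandas as pd

# Extreme daily log returns copied from the max/min tables above.
max_log = pd.Series({"BAC": 0.302096, "CG": 0.456316, "GS": 0.234818,
                     "JPM": 0.223917, "MS": 0.625850, "WFC": 0.283407})
min_log = pd.Series({"BAC": -0.342059, "CG": -0.494696, "GS": -0.210222,
                     "JPM": -0.232278, "MS": -0.299658, "WFC": -0.272101})

# Convert log returns r to simple returns exp(r) - 1, expressed in percent.
extremes_pct = pd.DataFrame({
    "best_day_pct": np.expm1(max_log) * 100,
    "worst_day_pct": np.expm1(min_log) * 100,
}).round(1)
print(extremes_pct)
```

For these figures, Morgan Stanley’s best day corresponds to a price rise of roughly 87% and Citigroup’s worst day to a fall of roughly 39%. On the full data, the same conversion can be applied directly with `np.expm1(old.groupby("bank")["return"].agg(["max", "min"]))`.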

Looking more closely, we can check which days produced the highest and lowest returns. All ten of the largest and all ten of the smallest daily returns fall between July 2008 and April 2009, during the financial crisis, confirming the high volatility in stock prices during this period that the returns plot above shows.

# largest daily returns 
old.nlargest(10, "return")[["Date", "return"]]
Date return
19655 2008-10-13 00:00:00-04:00 0.625850
5468 2008-11-24 00:00:00-05:00 0.456316
5539 2009-03-10 00:00:00-04:00 0.322773
822 2009-04-09 00:00:00-04:00 0.302096
19685 2008-11-24 00:00:00-05:00 0.286189
24332 2008-07-16 00:00:00-04:00 0.283407
24517 2009-04-09 00:00:00-04:00 0.275350
5506 2009-01-21 00:00:00-05:00 0.270572
767 2009-01-21 00:00:00-05:00 0.269878
5543 2009-03-16 00:00:00-04:00 0.269255
# smallest daily returns
old.nsmallest(10, "return")[["Date", "return"]]
Date return
5532 2009-02-27 00:00:00-05:00 -0.494696
766 2009-01-20 00:00:00-05:00 -0.342059
5466 2008-11-20 00:00:00-05:00 -0.306610
695 2008-10-07 00:00:00-04:00 -0.304163
19653 2008-10-09 00:00:00-04:00 -0.299658
793 2009-02-27 00:00:00-05:00 -0.297758
19651 2008-10-07 00:00:00-04:00 -0.286264
828 2009-04-20 00:00:00-04:00 -0.278916
19637 2008-09-17 00:00:00-04:00 -0.277283
24461 2009-01-20 00:00:00-05:00 -0.272101
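The two tables above list dates but not banks. One way to see, for each bank, the day of its largest gain and largest loss is `groupby(...).idxmax()` and `idxmin()`, which return row labels that can be passed back to `.loc`. A minimal sketch on a toy frame with the same columns as `old` (the dates and values are made up; on the real data `demo` would be `old`):

```python
import numpy as np
import pandas as pd

# Toy long-format frame with the same columns as `old`; values are made up.
demo = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"] * 2),
    "bank": ["MS", "MS", "MS", "CG", "CG", "CG"],
    "return": [np.nan, 0.60, -0.10, np.nan, 0.20, -0.45],
})

# Row label of each bank's best and worst day (the NaN first-day returns are skipped).
best = demo.loc[demo.groupby("bank")["return"].idxmax(), ["bank", "Date", "return"]]
worst = demo.loc[demo.groupby("bank")["return"].idxmin(), ["bank", "Date", "return"]]
print(best)
print(worst)
```

This relies on `old` having a unique default index, which `reset_index()` provides.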

Correlation Analysis

In this section, I examine how closely the banks’ stocks move together by computing the pairwise (Pearson) correlations between their daily returns.

correlations = stocks.reset_index().pivot_table(index = "Date", columns = "bank", values = "return").corr()
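The heatmaps make the correlation matrix easy to scan; to read the most and least correlated pairs off it numerically, one option is to list each distinct pair once (skipping the diagonal of 1s and the mirrored half) and sort. A minimal sketch on synthetic returns, where series A and B share a common factor and C is independent; on the real data, `corr` would be replaced by `correlations`:

```python
from itertools import combinations

import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
common = rng.standard_normal(500)  # shared "market" factor driving A and B
returns = pd.DataFrame({
    "A": common + 0.2 * rng.standard_normal(500),
    "B": common + 0.2 * rng.standard_normal(500),
    "C": rng.standard_normal(500),  # independent of A and B
})
corr = returns.corr()  # Pearson by default, like the call above

# List each distinct pair once, then sort from most to least correlated.
pairs = pd.Series(
    {(a, b): corr.loc[a, b] for a, b in combinations(corr.columns, 2)}
).sort_values(ascending=False)
print(pairs)
```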

sns.heatmap(correlations, cmap = "viridis")
plt.title("Heatmap of Correlations in Bank Returns, 2006-2024")

g = sns.clustermap(correlations, cmap = "viridis")
# clustermap builds its own figure with several axes, so set a figure-level title
g.fig.suptitle("Clustermap of Correlations in Bank Returns, 2006-2024", y = 1.02)

Conclusion

In this project, we conducted an exploratory data analysis (EDA) of daily stock prices for six major US banks (BAC, C, GS, JPM, MS and WFC) from 2006 to October 2024, with the primary aim of practicing data visualization and strengthening proficiency with Python’s Pandas library. We summarised prices by bank, plotted closing-price trends, computed daily log returns, and examined the correlations between the banks’ returns. The returns analysis showed that the ten largest and ten smallest daily returns across all six banks fell between July 2008 and April 2009, underscoring how volatile bank stocks were during the financial crisis. This project is intended solely as a learning exercise in data analytics and should not be considered a comprehensive financial analysis or relied upon for investment advice (Muddana and Vinayakam 2024).

References

Muddana, A. Lakshmi, and Sandhya Vinayakam. 2024. Python for Data Science. Springer.