import warnings
import statistics
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import matplotlib.patches as mpatches
from matplotlib.ticker import FuncFormatter

warnings.filterwarnings("ignore")

Financial Indices and Indicators Analysis

For my second deliverable in my IS460 Data Visualization course at Loyola University Maryland, I chose to continue to tap into interests stemming from being a Finance major by analyzing financial indices and indicators. I have spent a lot of time this spring looking at financial and economic data in a wealth management internship where I support financial advisors ; using the data as a guide and not as a rule is an extremely beneficial I have learned. This data set has the date, the prices of various indices (the S&P500, the Dow Jones Industrial Average, the Hang Seng Index, the VIX, etc.) on a given day, some of the indices’ trading volume, economic indicators, and a few other assorted variables. The main index in this data set is the S&P 500, which is often referred to as “the market” and has the largest reputation of any American equity index. It is the stereotypical benchmark that public equity securities are often compared against. Publicly-traded companies such as Google, Microsoft, Apple, Amazon, Nvidia, Berkshire Hathaway, JPMorgan Chase, and numerous more are some of the reputable leaders in this index.

This data set has the following variables: date, vix, sp500, sp500_volume, djia, djia_volume, hsi, ads, us3m, joblessness, epu, GPRD, prev_day

Disclaimer: This analysis of financial indices and indicators data does not solicit competent financial advice. This is a face-value analysis of patterns and trends found in the data.


# Uploading Data from CSV
sAp500Data = pd.read_csv("C:/Users/raymo/OneDrive/Documents/IS460/Python/stock_data.csv")

# Renaming variable names in dataframe
sAp500Data.rename(columns = {'dt' : 'Date',
                             'vix' : 'VIX',
                             'sp500' : 'SP500_Price',
                             'sp500_volume' : 'SP500_Vol',
                             'djia' : 'DJIA_Price',
                             'djia_volume' : 'DJIA_Vol',
                             'hsi' : 'HSI',
                             'ads' : 'ADS',
                             'us3m' : 'US3Month',
                             'joblessness' : 'Joblessness',
                             'epu' : 'EPU',
                             'prev_day' : 'Prev_Close_SP500_Price'}, inplace=True)
                             
# Changing 'Date' data type for deriving future variables
sAp500Data['Date'] = pd.to_datetime(sAp500Data['Date'], format = '%Y-%m-%d')

# Adding more of the data-time variables
sAp500Data['Day'] = sAp500Data.Date.dt.day
sAp500Data['WeekDay'] = sAp500Data.Date.dt.strftime('%a')
sAp500Data['Month'] = sAp500Data.Date.dt.month
sAp500Data['MonthName'] = sAp500Data.Date.dt.strftime('%b')
sAp500Data['Quarter'] = sAp500Data.Date.dt.quarter
sAp500Data['QuarterName'] = 'Quarter ' + sAp500Data.Date.dt.quarter.astype(str)
sAp500Data['Year'] = sAp500Data.Date.dt.year

# Creating a daily return variable for S&P 500 returns
sAp500Data['SP500_Daily_Return'] = (sAp500Data['SP500_Price'] - sAp500Data['Prev_Close_SP500_Price']) / sAp500Data['Prev_Close_SP500_Price']

# Removing 2024 data from sAp500Data
sAp500Data = sAp500Data[sAp500Data.Year != 2024]

Visualization 1: Annualized Daily Returns S&P500 Returns (Bar Graph)

# Visualization 1: Annualized Daily Returns S&P500 Returns (Bar Graph)
daily_returns_df = sAp500Data[['Year', 'SP500_Daily_Return']]

# Grouping daily returnings by year and summing up to calculate annual returns
annual_returns_df = daily_returns_df.groupby(['Year'])['SP500_Daily_Return'].sum().reset_index(name='AnnualReturn')

# Multiplying 'AnnualReturn' for improved visualization quality
annual_returns_df['AnnualReturn'] = annual_returns_df['AnnualReturn']*100
# Visualization 1: Annualized Daily Returns S&P500 Returns (Bar Graph)

# Formatting Functions for Bar Graph

# Coloring function...
def pick_colors_according_to_mean_count(this_data):
  colors = [] # Intentional empty list
  # avg =  this_data.Count.mean()
  for each in this_data:
    if each > 0:
      colors.append('green')
    elif each < 0:
      colors.append('red')
    else:
      colors.append('gray')
  return colors

# Auto-labeling function
def autolabel(these_bars, this_axis, place_of_decimals, symbol):
  for each_bar in these_bars:
    return_height = each_bar.get_height()
    if return_height >= 0:
      direction = 0.25
      display_height = return_height + direction

    else:
      direction = -1.375
      display_height = return_height + direction

    this_axis.text(each_bar.get_x() + each_bar.get_width()/2,
                   display_height,
                   format(return_height, place_of_decimals)+symbol,
                   fontsize=8,
                   color='black',
                   ha='center',
                   va='bottom')
# Visualization 1: Annualized Daily Returns S&P500 Returns (Bar Graph)

# Setting up the colors
sp500_return_colors = pick_colors_according_to_mean_count(annual_returns_df['AnnualReturn'])

Positive  = mpatches.Patch(color='green', label='Positive')
Neutral   = mpatches.Patch(color='red',   label='Neutral')
Negative  = mpatches.Patch(color='gray',  label='Negative')

# Setting up figure plot
fig = plt.figure(figsize=(16,10))
ax = fig.add_subplot(1,1,1)

# Calling the colors for the bar graph
sp500_bars = ax.bar(annual_returns_df['Year'], annual_returns_df['AnnualReturn'], label='S&P 500 Annual Return', color=sp500_return_colors)

# Setting up titles, ticks, ticklabels, labeling, legend, and axisline
plt.title('Bar Graph of S&P 500 Annual Returns between 1990 and2023', fontsize=14)
plt.xlabel('Year', fontsize=14, labelpad=15)
plt.ylabel('S&P 500 Return', fontsize=14, labelpad=15)

ax.set_xticks(annual_returns_df['Year'])
ax.set_xticklabels(annual_returns_df['Year'], rotation=90, fontsize=10)

ax.yaxis.set_major_formatter(FuncFormatter(lambda x, pos:('%1.2f')%(x) + '%'))
autolabel(sp500_bars, ax, '.1f', '%')

ax.legend(loc='lower left', fontsize=10, handles = [Positive, Neutral, Negative]) # mpatches label
plt.axhline(y=0, color='black')

plt.show()

The first visualization captures the annual returns of the S&P 500 - which is made of 500 of the largest publicly traded companies - index from 1990 to 2023. I choose to visualize the data in this manner because of the ever timely conversation that takes place throughout each trading day: how much will the S&P 500 return this year?

Estimates for this year’s return have drastically been altered since I submitted my last deliverable last month due to the brewing conflict with the U.S., Israel, and Iran in the Middle East. In preparation for discussing this question, I wanted to emphasize the importance of looking at historical data to derive trends in financial markets; the S&P has returned an average of ~8-9% during these years. Three areas of importance are as follows: 1995-2002 (the Dot-com Boom and Crash), 2008 (the Global Financial Crisis), and 2020-2023 (COVID and a post-pandemic world).

Within these eight years, the Internet became exponentially accepted and integrated into the business world. Despite five consecutive years of high growth - approximately 20% - of the index, the market eventually crashed downwards in the final three years of the 20th Century. Businesses were forever changed because of the ‘‘Technological Revolution’; at face value, I think that we may be entering a similar trend in terms of future returns, with this period being called the “Informational Revolution” or the “Intelligence Revolution”, due to Artificial Intelligence.

2008 has become historically revered as the worst year for financial markets in recent history. Mortgage-backed securities and the popping of a housing bubble were the key drivers that enabled the infrastructure of the global financial systems to collapse. These two variables coincided with each other and erased more than $19T from financial markets across the globe.

At the turn of the third decade in the 21st Century, our world succumbed to the Coronavirus pandemic; the index dropped ~32% within four weeks between February and March. The market soon recovered, ending the year with a return of 21.1% and the subsequent year (2021) returned 24.7%. The S&P did not maintain this momentum into 2022, as the index returned -18.7% due to rapidly rising inflation, rising geopolitical tensions with Russia invading Ukraine, and financial markets experiencing growing pains in a post-COVID world.

While not visualized due to a lack of data, the S&P 500 returned ~25% in 2024 and ~18% in 2025. The index has been off to a rough start this year, with the index returning ~-4% at the end of March; perhaps this is a red year or the start of consecutive red years?

Visualization 2: Monthly Average Returns between 1990 and 2023 (Waterfall Diagram)

# Visualization 2: Monthly Average Returns between 1990 and 2023 (Waterfall Diagram)

waterfall_data_df = sAp500Data[['Date', 'Month',    'MonthName',    'Year', 'SP500_Daily_Return']]

# Creating data frame with monthly average S&P 500 returns
wf_mean_monthly_df = waterfall_data_df.groupby(['MonthName'])['SP500_Daily_Return'].mean().reset_index(name='MonthlyReturndiv30')
wf_mean_monthly_df['MonthlyReturn'] = wf_mean_monthly_df['MonthlyReturndiv30'] * 30 # Multiply them by 30 to represent
del wf_mean_monthly_df['MonthlyReturndiv30'] # Removes unnecessary column

# Organizing the average monthly returns by month
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

wf_mean_monthly_df['MonthlyReturnScaled'] = wf_mean_monthly_df['MonthlyReturn'] * 100 # Improving visualization

wf_mean_monthly_df.MonthName = pd.Categorical(wf_mean_monthly_df.MonthName, categories=months, ordered=True) # DOES not actually sort!
wf_mean_monthly_df.sort_values(by='MonthName', inplace=True) # Sorting by month order
wf_mean_monthly_df.reset_index(inplace=True, drop=True)
# Visualization 2: Monthly Average Returns between 1990 and 2023 (Waterfall Diagram)

fig = go.Figure(go.Waterfall(name='', orientation='v',
                             x = wf_mean_monthly_df['MonthName'],
                             textposition='outside',
                             measure = ['relative', 'relative', 'relative','relative', 'relative', 'relative',
                                        'relative', 'relative', 'relative','relative', 'relative', 'relative'],

                             y= wf_mean_monthly_df['MonthlyReturnScaled'],
                             text = ['{:.2f}%'.format(each) for each in wf_mean_monthly_df['MonthlyReturnScaled']],

                             decreasing={'marker' : {'color' : 'red'}},
                             increasing={'marker' : {'color' : 'green'}},
                             
                             # modifying the hover template
                             hovertemplate = 'Historical Average Monthly Change for %{x}: %{text}' + '<br>' +
                                             'Cumulative Annual Return in %{x}: ' + '%{y:,.2f}%'))

# Cleaning up some more things here:
fig.layout = go.Layout(yaxis=dict(tickformat ='.2f',  range = [-1, 15])) # Updates tick formating and the displayed range on the y-axis
fig.update_xaxes(title_text='Months', title_font = {'size' : 18})
fig.update_yaxes(title_text='Total Fines (Running Total $M)',
                 title_font = {'size' : 18},
                 dtick=1,
                 ticksuffix ='%',
                 zeroline=True)
                 
fig.update_layout(title= 'Average Monthly Change in the S&P 500 Index between 1990 and 2023 (Waterfall Diagram)<br>' +
                          'Positive Months apprear in Green, Negative Months appear in Red',
                  font = dict(family='Arial',
                              size=16,
                              color='black'),
                  title_x = 0.5,
                  showlegend = False,
                  autosize=True,
                  margin=dict(l=30, r=30, t=90, b=30),
                  template = 'simple_white')
                  
fig.show() # I do not know why this is outputting four times!

Returning to the historical returns of the S&P 500, the third visualization above is an interactive waterfall diagram that captures the average return that the S&P 500 has delivered for each month between 1990 and 2023. Each month has a colored bar, indicating how much the index return averaged between these 34 years. When you hover over each bar, the visualization will display the monthly average return and the cumulative annual return as the months progress.

The first two months of the year are quite historically flat in returns on average, with January and February netting 35 bps in returns. The spring months - March, April, and May - is the first noticeable growth period for the index, with each month returning above 1.25% on average. A possible theory of mine as to why these consecutive three months have been historically strong is that Q4 earnings from the previous year have ought to be released by this point in time. Q4 is typically the strongest three month period in terms of equity performance; I will discuss this further momentarily.

Then, the index stalls once the warmer months begin; on average, the index loses value in June, August, and September. July is the black sheep of the summer months, by returning an average of 2.09% in the past 34 years. The fourth quarter of the fiscal year - October, November, and December - is typically known as one of the better performing periods of the fiscal year. This visualization confirms this historical tendency, with October returning 2.16%, November returning 3.17%, and December returning 2.02%; these three consecutive months return ~7.35% for the year. Two possible reasons that I believe to drive this quarter as the strongest quarter are end-of-year pressure to achieve higher stock return goals and greater consumer spending with Halloween, Thanksgiving, and the Holiday Season.

When comparing this chart to the first visualization of annualized bar charts of S&P 500 returns, there is a discrepancy between these two visualizations pertaining to how much the S&P 500 will return in an average year. The average annual return from the bar chart was between 8-9%, whereas cumulative annual return based on average monthly returns from the waterfall diagram is +13%. Misrepresenting or falsifying return statistics to enhance the performance of a financial security or index is a frowned-upon practice in the Finance industry. While this visualization does offer insights into which months/quarters/seasons have performed well in the past three plus decades, one takeaway to be mindful of is that the S&P 500 does not return ~13% in the long run.

Visualization 3: Standardized Mean Quarterly S&P 500 Drivers (Multi-line Graph)

# Visualization 3: Standardized Mean Quarterly S&P 500 Drivers (Multi-line Graph)

# Creating a new data frame to visualize multiple lines at once
trendlines_df = sAp500Data[['Date', 'SP500_Daily_Return', 'US3Month', 'EPU', 'GPRD', 'VIX', 'Month', 'Quarter', 'QuarterName', 'Year']]

# Grouping all of these variables together by year first and then quarter second
grouped_tls_df = trendlines_df.groupby(['Year', 'Quarter'])[['SP500_Daily_Return', 'US3Month', 'EPU', 'GPRD', 'VIX']].mean().reset_index()

# Creating a variable to serve as x-axis labels; will utilize later...
grouped_tls_df['TickTime'] = grouped_tls_df['Year'].astype(str) + ' Q' + grouped_tls_df['Quarter'].astype(str)
# Visualization 3: Standardized Mean Quarterly S&P 500 Drivers (Multi-line Graph)

# Creating labels for visualization
count = 1
tick_labels = []
standardized_labels = []

for period in grouped_tls_df['TickTime']:
  if count % 2 == 0: # Only collects the even counts and quarters
    standardized_labels.append(period)
    tick_labels.append(count)
  count += 1

Correlation Table of Quarterly Variables

# Visualization 3: Standardized Mean Quarterly S&P 500 Drivers (Multi-line Graph)

# Creating Monthly S&P500 Quarterly Returns
grouped_tls_df['SP500_QReturns'] = grouped_tls_df['SP500_Daily_Return']*90

# Creating GPRD/VIX ratio
grouped_tls_df['GPRDoverVIX'] = grouped_tls_df['GPRD'] / grouped_tls_df['VIX']

# Setting up new data frame
standardized_tls_df = grouped_tls_df[['Year', 'Quarter']]

# ------------------------------------------------------------------------------

# Standardizing all of the variables...

# Syntax for std. dev. function # https://numpy.org/doc/stable/reference/generated/numpy.std.html

# First, GPRDoverVIX
ratio_mean = grouped_tls_df.GPRDoverVIX.mean()
ratio_sd = np.std(grouped_tls_df['GPRDoverVIX'])
standardized_tls_df['S_GPRDoverVIX'] = (grouped_tls_df['GPRDoverVIX'] - ratio_mean)/ratio_sd

# Second, EPU
epu_mean = grouped_tls_df.EPU.mean()
epu_sd = np.std(grouped_tls_df['EPU'])
standardized_tls_df['S_EPU'] = (grouped_tls_df['EPU'] - epu_mean)/epu_sd

# Third, US3Month
us3month_mean = grouped_tls_df.US3Month.mean()
us3month_sd = np.std(grouped_tls_df['US3Month'])
standardized_tls_df['S_US3Month'] = (grouped_tls_df['US3Month'] - us3month_mean)/us3month_sd

# Fourth, S&P500_QReturns
sp500_mean = grouped_tls_df.SP500_QReturns.mean()
sp500_sd = np.std(grouped_tls_df['SP500_QReturns'])
standardized_tls_df['S_SP500_QReturns'] = (grouped_tls_df['SP500_QReturns'] - sp500_mean)/sp500_sd

# Correlation between standardized variables
standardized_tls_df[['S_GPRDoverVIX', 'S_EPU', 'S_US3Month', 'S_SP500_QReturns']].corr()
# https://realpython.com/numpy-scipy-pandas-correlation-python/#pearson-correlation-pandas-implementation
# Visualization 3: Standardized Mean Quarterly S&P 500 Drivers (Multi-line Graph)

# Setting up plot, axis, and basic visualization
fig = plt.figure(figsize=(14,10))
ax = fig.add_subplot(1,1,1)
standardized_tls_df.plot(kind='line',
                         ax=ax,
                         y=['S_GPRDoverVIX', 'S_EPU', 'S_US3Month', 'S_SP500_QReturns'],
                         marker = '8', 
                         markersize = 5  # 'markersize' argument : https://matplotlib.org/stable/tutorials/pyplot.html
                         )

# Adding a chart title and axis titles
plt.title('Standardized Mean Quarterly S&P 500 Drivers (Multi-line Graph)', fontsize=18, pad=10)
plt.xlabel('Year and Quarter', fontsize=14, labelpad=15)
plt.ylabel('Standardized Z-Score (+/-)', fontsize=14, labelpad=15)

# Additional formatting: x-axis, y-axis, and legend
start, end = ax.get_xlim()
ax.xaxis.set_ticks(np.arange(start, end, 2))
_ = ax.set_xticks(ticks=tick_labels,labels=standardized_labels) # https://matplotlib.org/stable/gallery/ticks/ticklabels_rotation.html
plt.xticks(rotation=90, ha='center', fontsize=8)
plt.axhline(y=0, color='black', linewidth=1)
plt.legend(loc='upper left', fontsize=14)

plt.show()

After creating the previous visualization, I was curious to see how certain variables included in the data set are historically viable as indicators for the S&P 500’s performance and relative to each other. One challenge that I initially created when developing this visualization was that these three variables all have different scales; I standardized these variables with their Z-scores to bring them all onto the same scale for effective visualization.

The GPRD/VIX (blue line) ratio is the quotient for the geopolitical risk index and the volatility index of the S&P500; my intention with creating this variable was to see how geopolitical risk translated into trading volatility in the market. Larger spikes above +2 or below -1 represent the lows of the Dot-Com crash, 9/11, the Global Financial Crisis, and the Coronavirus pandemic.

The EPU (Economic Policy Uncertainty; orange line) variable is an index that intends to quantify global economic uncertainty that is related to policy; in similar nature to the GPRD/VIX ratio, this variable has surged in response to 9/11, the Global Financial Crisis, and the Coronavirus pandemic. In 1990, the EPU index started the year at 100; the index surged to 170 after 9/11; EPU spiked to ~187 in October 2008 and remained elevated above 100 until July 2013; the index rose into the high 100s / low 200s before the 2020s began and rose to the low 300s in the spring of 2020.

The 3 Month Treasury variable (green line) represents the annualized yield that the United States Treasury will pay to 3 month treasury bill holders. There are different durations of treasuries; one/two/three/six month(s) and one/two/three/five/seven/ten are some of the most common kinds of treasuries. Treasury yields change daily in response to market sentiment, economic data, and other factors as well. Standardized yields were quite high in the early and late 1990s; yields declined for a few years in the early 2000s before climbing up to a Z-score of 1. At the time of the Global Financial Crisis, treasury yields sharply fell and remained quite flat for about eight years until 2016. Yields slowly increased until 2020, where rates dropped back near zero due to the Coronavirus pandemic. After 2020, rates surged to highs that had not been seen in nearly 30 years due to spiking inflation and post-COVID world struggling to regain its footing in financial markets.

With respect to the correlation table, all three of these variables move inversely to each other in varying degrees: the GPRD/VIX ratio has a weak negative correlation to the EPU variable (-0.244) and a very weak negative correlation to the US 3 month treasury variable (-0.039); the EPU variable has a moderate negative correlation to US 3 month treasury variable (-0.506). All of these variables have quite a weak relationship to mean quarterly S&P 500 returns (GPRD/VIX : 0.127, EPU: -0.012, US3M: 0.006), which indicate they all have lackluster correlations. One thing I would be curious to see if how the total quarterly returns differ from the average quarterly returns; I believe that this does have some influence on the lack of correlation between these variables and S&P returns.

Visualization 4: VIX Volatility Levels by Year (Stacked Bar Plot)

# Visualization 4: VIX Volatility Levels by Year (Stacked Bar Plot)

vix_df = sAp500Data[['Year', 'MonthName', 'VIX']]
vix_df

###################
### FROM CLAUDE ###
# My comments below

# Sets the conditions for the four levels
conditions = [vix_df['VIX'] < 15, 
              vix_df['VIX'] < 25,
              vix_df['VIX'] < 30,
              vix_df['VIX'] >= 30]

# Used the following link to determine thresholds: https://www.td.com/ca/en/investing/direct-investing/articles/understanding-vix

# Places labels onto the four levels
choices = ['Low', 'Medium', 'High', 'Extreme'] 

# Creates the new variables with the np.select function
vix_df['VIX_Level'] = np.select(conditions, choices, default=pd.NA)

####################

grpd_stckd_df = vix_df.groupby(['Year', 'VIX_Level'])['MonthName'].count().reset_index(name='Level_Count')
grpd_stckd_df

pvtd_grpd_stckd_df = grpd_stckd_df.pivot(index='Year', columns='VIX_Level', values='Level_Count') # Reformats the data frame
pvtd_grpd_stckd_df

levels_order = ['Low', 'Medium', 'High', 'Extreme']
pvtd_grpd_stckd_df = pvtd_grpd_stckd_df.reindex(columns=(levels_order))
pvtd_grpd_stckd_df
# Visualization 4: VIX Volatility Levels by Year (Stacked Bar Plot)

# Additional visualization supports

# My own color map; the warmer the color, the more volatile the S&P 500 is displaying
stckd_bars_colors = {'Low'     : 'mediumseagreen',
                     'Medium'  : 'gold',
                     'High'    : 'orange',
                     'Extreme' : 'red'            } 

# Incrementting the y-axis ticks by 25 days intervals
y_ticks = list(range(0, 276, 25))

# Setting up plot, axis, and basic visualization
fig = plt.figure(figsize=(16,10))
ax = fig.add_subplot(1,1,1)
pvtd_grpd_stckd_df.plot(kind='bar', stacked=True, color= stckd_bars_colors, ax=ax)

# Adding a chart title and axis titles
plt.title('VIX Volatility Levels by Year (Stacked Bar Plot)', fontsize=18)
plt.xlabel('Year', fontsize=14, labelpad=10)
plt.ylabel('Days per Year', fontsize=14, labelpad=10)

# Additional formatting: x-axis, y-axis, and legend
plt.xticks(fontsize=10, rotation=90, ha='center')
plt.yticks(fontsize=10, ticks = y_ticks)
plt.ylim(0, 295)
plt.legend(loc='best')

plt.show()

Visualization 4: VIX Volatility Levels by Year (Stacked Bar Plot)

The VIX is the Volatility Index for the S&P 500, the main index that has been under focus in this assignment; this index tracks the expected variability in the S&P 500’s price change, which means it is a predictive indicator of the S&P 500 returns. This visualization displays the annual volatility of the VIX with four levels: low, medium, high, and extreme. The following colors represent the value thresholds the index has recorded over the past three decades: green indicates under 15, yellow indicates under 25, orange represents under 30, and red represents 30 and above.

In tandem with the first visualization depicting annual S&P 500 returns, this visualization speaks to which years the market has yielded favorable returns and sizable losses. I expect this index does have a stronger correlation than the other variables I examined in the prior visualization (GPRD/VIX, EPU, and US3M). While stacked bars with an abundance of warmer colors do not directly indicate a poor year for S&P 500 returns, this indicates that there is an elevated amount of risk that securities in the market exhibit.

Visualizing the data in this manner displays the VIX to be slightly cyclical in nature. Year/periods of more stability and consistency have lengthy green bars for their base, indicating lots of days with low volatility; years/periods with lengthy yellow bars often have at least a fourth to a half of the bar made up of orange and red, indicating that medium volatility is becoming the new normal and high and extreme volatility is becoming more probable. The following events reflect the change in annual VIX level counts and S&P 500 returns: many years in the early 1990s before the Dot-Com crash, the Dot-com Crash and 9/11, a brief pocket of stability, the Global Financial Crisis, stability for the remainder of the 2010s, elevated volatility at the turn of the decade with the Coronavirus pandemic in 2020 and the global bear market of 2022.

Visualization 5: Top and Bottom 10 Years of S&P500 Returns (Bump Chart)

# Visualization 5: Top and Bottom 10 Years of S&P500 Returns (Bump Chart)

# # From sorted_annual returns:
# # Best 5 years: 2023, 1991, 2021, 2009, 2003
# # Worst 5 years: 2008, 2002, 2022, 2001, 2000

bump_df = sAp500Data.groupby(['Year','MonthName'])['SP500_Daily_Return'].sum().reset_index(name='SP500MonthlyReturn')
bump_df = bump_df.pivot(index='Year', columns='MonthName', values='SP500MonthlyReturn')

month_order = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec',]

bump_df = bump_df.reindex(columns= month_order)
bump_df
# Visualization 5: Top and Bottom 10 Years of S&P500 Returns (Bump Chart)

# Best 5 years: 2023, 1991, 2021, 2009, 2003

# top5_years_bum_df = bump_df.iloc[[8,29,23,7,5], ]
top5_years_bum_df = bump_df.iloc[[5,7,23,29,8], ]

top5_years_bum_df

top5_years_bump_df_ranked = top5_years_bum_df.rank(0, ascending=False, method='min') # 0 means row by row, 1 means column by column
top5_years_bump_df_ranked = top5_years_bump_df_ranked.T
top5_years_bump_df_ranked

# Worst 5 years: 2008, 2002, 2022, 2001, 2000

# bottom5_years_bum_df = bump_df.iloc[[10,11,12,18,32], ]
bottom5_years_bum_df = bump_df.iloc[[18,12,32,11,10], ]
bottom5_years_bum_df

bottom5_years_bump_df_ranked = bottom5_years_bum_df.rank(0, ascending=False, method='min') # 0 means row by row, 1 means column by column
bottom5_years_bump_df_ranked = bottom5_years_bump_df_ranked.T
bottom5_years_bump_df_ranked
# Visualization 5: Top and Bottom 10 Years of S&P500 Returns (Bump Chart)

# Plotting a Bump Chart with and without text labels inside the markers
fig = plt.figure(figsize=(16,8))
fig.suptitle('Bump Charts of Top 5 and Bottom 5 Years of Annual S&P 500 Returns', fontsize=24)
ax1= fig.add_subplot(1,2,1) # Left frame: Top 5 Years of Returns ranked Monthly
ax2 = fig.add_subplot(1,2,2) # Right frame: Bottom 5 Years of Returns ranked Monthly

# ------------------------------------------------------------------------------
# Left frame: Top 10 Years ranked
top5_years_bump_df_ranked.plot(kind='line', ax=ax1, marker='o',
                                      markeredgewidth=2, linewidth=2,
                                      markersize=30, markerfacecolor='white')

ax1.invert_yaxis() # Inverts the y-axis
ax1_num_rows = top5_years_bump_df_ranked.shape[0]
ax1_num_cols = top5_years_bump_df_ranked.shape[1]

ax1.set_xticks(np.arange(ax1_num_rows), month_order, fontsize=14)
ax1.set_yticks(range(1,ax1_num_cols+1,1))
ax1.set_ylabel('Monthly Ranking')
ax1.set_title('Top 5 Years of S&P 500 Returns ranked Monthly')

handles, labels = ax1.get_legend_handles_labels()
ax1_handles = [handles[0],handles[1], handles[2], handles[3], handles[4]]
ax1_labels =  [labels[0],labels[1], labels[2], labels[3], labels[4]]
ax1.legend(ax1_handles, ax1_labels, bbox_to_anchor=(-0.06, 1.01), fontsize=10,
           labelspacing=1, markerscale=0.4, borderpad=1, handletextpad=0.8)

# Text for left frame
i = 0 # For counting rows
j = 0 # For counting columns

for eachcol in top5_years_bump_df_ranked.columns:
  for eachrow in top5_years_bump_df_ranked.index:
    this_rank_inf_fig1 = top5_years_bump_df_ranked.iloc[i,j]
    ax1.text(i, this_rank_inf_fig1,
            str(round(top5_years_bum_df.iloc[j,i]*100,1)) + '%',

             ha = 'center', va = 'center', fontsize=8, fontweight='bold')
    i += 1
  j +=1
  i = 0
# # ------------------------------------------------------------------------------

# # ------------------------------------------------------------------------------
# # Right frame: Top 10 Years ranked
bottom5_years_bump_df_ranked.plot(kind='line', ax=ax2, marker='o',
                                      markeredgewidth=2, linewidth=2,
                                      markersize=30, markerfacecolor='white')

ax2.invert_yaxis() # Inverts the y-axis
ax2_num_rows = bottom5_years_bump_df_ranked.shape[0]
ax2_num_cols = bottom5_years_bump_df_ranked.shape[1]

ax2.set_xticks(np.arange(ax2_num_rows), month_order, fontsize=14)
ax2.set_yticks(range(1,ax2_num_cols+1,1))
ax2.set_ylabel('Monthly Ranking')
ax2.set_title('Bottom 5 Years of S&P 500 Returns ranked Monthly')



handles, labels = ax2.get_legend_handles_labels()
ax2_handles = [handles[0],handles[1], handles[2], handles[3], handles[4]]
ax2_labels =  [labels[0],labels[1], labels[2], labels[3], labels[4]]
ax2.legend(ax2_handles, ax2_labels, bbox_to_anchor=(1.01, 1.01), fontsize=10,
           labelspacing=1, markerscale=0.4, borderpad=1, handletextpad=0.8)

# Text for left frame
m = 0 # For counting rows
n = 0 # For counting columns

for eachcol in bottom5_years_bump_df_ranked.columns:
  for eachrow in bottom5_years_bump_df_ranked.index:
    this_rank_inf_fig2 = bottom5_years_bump_df_ranked.iloc[m,n]
    ax2.text(m, this_rank_inf_fig2,
             str(round(bottom5_years_bum_df.iloc[n,m]*100,1)) + '%',
             ha = 'center', va = 'center', fontsize=8, fontweight='bold')
    m += 1
  n +=1
  m = 0
# ------------------------------------------------------------------------------

plt.show()

The final visualization above displays two charts focusing on the ranking of monthly returns for the best five years and worst five years the S&P 500 has returned between 1990 and 2023.

The bump chart on the left focuses on the years 1998, 2019, 2013, 1997, and 1995; these were the five strongest years for S&P 500 returns. While each of these years returned between 25-29%, they all had different month-over-month trends. 1995 was the best year for S&P 500 returns in the entire data set, returning 29.7% (number from Visualization 1); the index captured between ~2-4% in ten of twelve months, with the two remaining months decreasing by less than 50bps in each respective month. While it only had one month where the year was ranked first, this was in the month of August where the index saw drops in valuation, ranging from -0.0% (1995) to -15.3% (1998).

This visualization depicts 1998 as a noticeably volatile year, with this year being ranked last four of twelve months and ranked first in monthly returns six of twelve times. A prime instance of this year’s volatility: after the aforementioned 15.3% decrease in August 1998, the S&P 500 returned +5.5% for four consecutive months to close the year. 1997 was another noticeably volatile year, with monthly ranking and returns fluctuating frequently. This index did not have any double digit monthly losses; seven of the twelve months returned between 4-7%, which bolstered S&P 500 returns for the year.

One historical macrotrend that I realized from this visualization is that all of the years were either before or after a period of historical underperformance. 1995 has the early 1990s, 1997 & 1998 has the Dot-Com Crash, 2013 has the Global Financial Crisis, and 2019 had the Coronavirus pandemic. I theorize that returns of this magnitude are a result of poor market performance, not an indicator to predict when markets will turn into bear markets.

The bump chart on the right focuses on the years 2008, 2002, 2022, 2001, 2000; these were the five weakest years for S&P 500 returns. When initially comparing this bump chart to the previous one, I assert that there is more variance in the rankings and returns. The lengths of changes in month-over-month rankings pass the eye test in appearing to be longer than the lines in the top five years of returns.2008 (blue line) is the worst year of all years in terms of S&P 500 returns, amounting to -40.1% (number from Visualization 1). Only three of twelve months yielded positive returns and half of 2008’s months were ranked either fourth and/or fifth.

While making this visualization, I spoke with my finance professor and he asked me which period was worse: 2008 or 2000 to 2002? While we are comparing one poor year to three poor years, he pointed out that this three-year period was the only one in the 35 years observed with consecutive negative performance. Further, these three periods amount to more total losses in the three years than in 2008 alone. 2000 (purple line), 2001 (red line), and 2002 (orange line) were all volatile years of varying degrees. 2001 and 2002 each had three bottom ranked months of returns and periodically fluctuate between rankings of first through third. Building on this theme of volatility, 2022 (green line) displayed highly volatile monthly returns, with seven months returning greater than 5% (two months) and/or less than -5% (five months).

Key takeaways: change in month-over-month performance of the S&P 500 differs drastically depending on the magnitude of performance. Both bump charts had one or two months each where the S&P 500 decreased by double digit percentages and one month each where the top returning year was negative. While visually intense, long-term investors could benefit if further research were to be conducted on the difference in monthly returns by year.

Conclusion: Takeaways, Recommnedations, and Next Steps

After completing this assignment, I have the following takeaways, recommendations, and next steps. Visualizations with time a variable strengthens the quality of visualizations. I believe that I was not able to account for time in all of my visualizations for my last final deliverable and wanted to do such for this assignment. Visualizing financial effectively proved to be a bit more challenging and reward than I originally anticipated. One regret I had was not taking the extra steps to graphically compare the S&P to the DJIA or the HSI; I chose out of simplicity to focus on the S&P 500 because I could easily calculate the daily return as a percentage. I recommend to find a bigger data set - variables wise - as I felt a little limited to use what I had and break out the data into different time periods. One problem with having 30+ years of data is that visualizing data in days or weeks (sometimes even months) may actually be ineffective; utilizing quarters and years to examine macrotrends was the most successful method I discovered. My recommendations are that, for those who are interested,this data set is a starting point in examine financial indices and indicators. Using APIs to passively recent real-time data can be an effective method to visualize data from the latest points of 2024 - which I filtered out initially - to the present day in 2026. Looking at the change in S&P 500 returns in relationship to commodities, fixed income, and alternative assets are some promising paths to look into when comparing equity benchmarks. In short, I enjoyed this assignment and am looking forward to the next one!