Top Billionaires - Data Visualization in Python

import os
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = 'C:/users/pptallon/appdata/local/Anaconda3/Library/plugins/platforms'

import numpy as np
import pandas as pd
import seaborn as sns 
from matplotlib.ticker import FuncFormatter
import matplotlib.pyplot as plt
import plotly.graph_objects as go

Introduction

This project works with a data set that provides different types of information about billionaires around the world. I present a series of visualizations, each followed by an analysis explaining what it shows about the data. It is also important to note that rows with N/A values in the columns used for this analysis were removed, both for accuracy and for a better understanding of the data.
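
The N/A removal mentioned above can be sketched on a toy frame. This is a minimal illustration, not the project's actual data: the rows, the `toy` frame, and the `required` list are made up, and only the column names `net_worth`, `age`, and `Children` are borrowed from the data set.

```python
import numpy as np
import pandas as pd

# Toy rows, invented for illustration only (not Forbes figures).
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "net_worth": [10.0, np.nan, 7.5, 3.2],
    "age": [50, 61, np.nan, 45],
    "Children": [2, 3, 1, np.nan],
})

# Keep only rows that have a value in every column the analysis relies on.
required = ["net_worth", "age"]
clean = toy.dropna(subset=required).reset_index(drop=True)

print(f"{len(toy) - len(clean)} of {len(toy)} rows dropped")
print(clean)
```

Note that a row with a gap only in a column outside `required` (row D here) is kept, so the choice of columns decides how much data survives the cleaning step.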

Dataset

The data set provides information such as net worth, rank, marital status, source of wealth, number of children, whether they dropped out of college, whether they are self-made, country of origin and residency, industry, etc. All of this is interesting information that can be analyzed in order to understand something about the success of these people.

Findings

The first interesting finding was that Elon Musk is the billionaire ranked number one, with the highest net worth, and, surprisingly, he has ten children with multiple women, which was mind-blowing to me. The second interesting finding was that Elon Musk, despite having the highest net worth of all the billionaires, has a philanthropy score of 1, which means he has given less than 5% of his fortune to charity, whereas Bill Gates, with a lower net worth, has a philanthropy score of 5, which means he has given more than 20% of his fortune to charity. Another finding worth keeping in mind is that, among the top United States industries, Technology has the total net worth furthest above their average, at almost $450 billion above it.

Scatter Plot: Top 10 Billionaires’ Net Worth x Age

Visualization Analysis:

In this visualization, we can see the top 10 billionaires’ net worth plotted against their ages. Right off the bat, Elon Musk stands out with the highest net worth, and he sits at around 50 years old, close in age to three other billionaires, including Sergey Brin and Larry Page. The rest of the names are spread out across the plot, with five on the “younger” side and five on the “older” side. Elon Musk is arguably an outlier: Mark Zuckerberg, the youngest, has the lowest net worth of the ten, which sounds about right, while Warren Buffett, the oldest at over 90, sits right in the middle of the net worth range with a little less than $120 billion.


path = "C:/users/pptallon/desktop/"

filename = "forbes_2640_billionaires.csv"

# reading only the columns from the data set that I will use #
columns = ['rank', 'net_worth', 'age', 'name', 'Philanthropy Score', 'industry', 'country', 'Marital Status', 'Source of Wealth', 'Self-Made Score', 'Children', 'source']
df = pd.read_csv(path + filename, usecols = columns)

# getting rid of rows with N/As in the columns used for the analysis #
required = ['net_worth', 'age', 'Source of Wealth', 'Self-Made Score', 'Philanthropy Score', 'Marital Status', 'Children', 'source']
df = df.dropna(subset = required)

# sorting the df to select top ten billionaires #
df = df.sort_values(by='net_worth', ascending=False)
topten = df.head(10)

# plotting the scatter plot (each billionaire is drawn in the loop below so it gets its own legend entry) #
plt.figure(figsize = (15,10))
plt.ylabel("Net Worth (in Billions)", fontsize=16)
plt.xlabel("Age", fontsize=16)
plt.title("Top Ten Billionaires Age vs. Net Worth", fontsize=25)

for i, row in topten.iterrows():
    plt.scatter(row['age'], row['net_worth'], label=row['name'], s=200)

plt.legend(loc='best', fontsize= "small")
plt.show()

Bar Chart: Top 10 Billionaires’ Net Worth x Philanthropy Score

Visualization Analysis: This visualization displays the top ten billionaires alongside their philanthropy scores, which represent the percentage of their fortune donated to charities or other institutions. The billionaires with the highest philanthropy score, 5, are Bill Gates and Warren Buffett. A score of 5 means they have donated over 20% of their fortune, which is impressive because they do not hold the highest net worth in the visualization. The billionaires who donate the least of their fortune are predominantly the ones with the highest net worth, which can say a lot about them as people.

import os
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = 'c:/users/pptallon/Anaconda3/Library/plugins/platforms'

# Plot the bar plot for the top ten billionaires
plt.figure(figsize=(10, 14))
bar_width = 0.4

index = range(len(topten))
plt.bar(index, topten['net_worth'], bar_width, label='Net Worth (in Billions)', color = 'orange');
plt.bar([i + bar_width for i in index], topten['Philanthropy Score'], bar_width, label='Philanthropy Score', color = 'purple');

plt.xlabel("Names of Billionaires", fontsize = 10);
plt.ylabel("Net Worth (in Billions) / Philanthropy Score", fontsize = 10);
plt.title("Top Ten Billionaires' Net Worth and Philanthropy Score", fontsize = 14);
plt.xticks([i + bar_width / 2 for i in index], topten['name'], fontsize = 10, rotation = 45);
plt.legend(loc = 'upper right', frameon = True, borderpad = 1);

# Add labels for each bar
for i, (nw, ps) in enumerate(zip(topten['net_worth'], topten['Philanthropy Score'])):
    plt.text(i, nw + 1, str(nw), ha='center', va='bottom', fontsize=8, color='black')
    plt.text(i + bar_width, ps + 1, str(ps), ha='center', va='bottom', fontsize=8, color='black');

plt.show()
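
Because net worth runs into the hundreds of billions while the philanthropy score tops out at 5, score bars drawn on the same axis are barely visible. One possible alternative, sketched here with made-up values, is to give each measure its own y-axis using matplotlib's `twinx`. The names, numbers, and the `Agg` backend choice are assumptions for illustration, not part of the original analysis.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical values, for illustration only.
names = ["A", "B", "C"]
net_worth = [180.0, 114.0, 104.0]  # billions
scores = [1, 2, 5]                 # philanthropy score

width = 0.4
positions = list(range(len(names)))

fig, ax_worth = plt.subplots(figsize=(8, 5))
ax_score = ax_worth.twinx()  # second y-axis sharing the same x-axis

# Net worth on the left axis, philanthropy score on the right axis.
ax_worth.bar([p - width / 2 for p in positions], net_worth, width,
             color="orange", label="Net Worth (in Billions)")
ax_score.bar([p + width / 2 for p in positions], scores, width,
             color="purple", label="Philanthropy Score")

ax_worth.set_ylabel("Net Worth (in Billions)")
ax_score.set_ylabel("Philanthropy Score")
ax_score.set_ylim(0, 5.5)
ax_worth.set_xticks(positions)
ax_worth.set_xticklabels(names)

# Combine the legend entries of both axes into one legend.
handles_w, labels_w = ax_worth.get_legend_handles_labels()
handles_s, labels_s = ax_score.get_legend_handles_labels()
legend_handles = handles_w + handles_s
ax_worth.legend(legend_handles, labels_w + labels_s, loc="upper right")
fig.tight_layout()
```

With separate axes, the y-axis labels can name each measure accurately, at the cost of a chart that needs a little more care to read.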

Pie Chart: Top One Billionaire by Marital Status

Visualization Analysis: This visualization shows the top billionaire for each marital status in the data set. Elon Musk appears in pretty much every visualization because he has the highest net worth. His marital status is “Single”, even though he was married before and has ten children with multiple women. The lowest net worth among these top billionaires belongs to Julia Koch and family, whose marital status is “Widowed”.


# Select the top billionaire (highest net worth) for each marital status
top_one_idx = df.groupby('Marital Status')['net_worth'].idxmax()
top_one_df = df.loc[top_one_idx].reset_index(drop=True)

# Collect the name, age, and net worth of the top billionaire for each marital status, largest first
maritalstats_df = top_one_df.groupby(['Marital Status', 'age', 'name'])[['net_worth']].sum().reset_index()
maritalstats_df = maritalstats_df.sort_values(by='net_worth', ascending=False).reset_index(drop=True)

# Filter for specific marital statuses ('Single', 'Married', 'Divorced', 'Widowed')
selected_statuses = ['Single', 'Married', 'Divorced', 'Widowed']
maritalstats_df = maritalstats_df[maritalstats_df['Marital Status'].isin(selected_statuses)]

# Create a pie chart with more space at the bottom
fig, ax = plt.subplots(figsize=(15, 10))
plt.subplots_adjust(bottom=0.2)

# Extract data for the pie chart
labels = maritalstats_df.apply(lambda row: f"{row['name']} (${row['net_worth']:,.0f})", axis=1)
sizes = maritalstats_df['net_worth']

# Plot the pie chart
ax.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=90, pctdistance = 0.85);
ax.axis('equal');
plt.title('Top One Billionaire by Marital Status', fontsize= 18);
ax.legend(labels=maritalstats_df['Marital Status'], title='Marital Status', bbox_to_anchor=(1.1, 1));

# Show the pie chart
plt.show()
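
The selection behind this chart, one billionaire per marital status, can be sketched on toy data. This is an illustrative sketch using `groupby(...).idxmax()` as one possible way to pick the top row per group. The rows and the `share_pct` column are hypothetical.

```python
import pandas as pd

# Toy rows, invented for illustration only.
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "Marital Status": ["Single", "Married", "Married", "Widowed", "Single"],
    "net_worth": [180.0, 114.0, 107.0, 59.0, 20.0],
})

# idxmax returns, for each group, the index label of the row with the largest net worth.
top_idx = toy.groupby("Marital Status")["net_worth"].idxmax()
top_per_status = (
    toy.loc[top_idx]
    .sort_values("net_worth", ascending=False)
    .reset_index(drop=True)
)

# Each pie slice is that person's share of the selected group's combined net worth.
total = top_per_status["net_worth"].sum()
top_per_status["share_pct"] = (100 * top_per_status["net_worth"] / total).round(1)
print(top_per_status)
```

The percentages printed on the pie are shares of the combined net worth of the selected people only, not of all billionaires with that marital status.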

Waterfall Diagram: Top 5 Deviation between Net Worth x Average Net Worth by Industry in the United States

Visualization Analysis: This waterfall diagram takes the top 5 industries by total net worth in the United States and compares each one with the average net worth across those five industries: Technology, Finance & Investments, Fashion & Retail, Automotive, and Media and Entertainment. Green marks an industry whose net worth is above the average of these industries, and red marks an industry whose net worth is below it. As we can see, the industries with a higher-than-average net worth are Technology and Finance & Investments. The ones below the average are Fashion & Retail, Automotive, and Media and Entertainment.


# creating a waterfall diagram
industries_df = df[df['country'] == 'United States'].groupby(['industry'])['net_worth'].sum().reset_index(name='TotalNetWorth')

# Sort the DataFrame by 'TotalNetWorth' in descending order
industries_df = industries_df.sort_values(by='TotalNetWorth', ascending=False)

# Select only the top 5 industries
industries_df = industries_df.head(5)

# Calculate the average net worth across the top 5 industries and each industry's deviation from it
average_net_usa = round(industries_df['TotalNetWorth'].mean(), 1)
industries_df['AverageNetWorth'] = average_net_usa
industries_df['Deviation'] = industries_df.TotalNetWorth - industries_df.AverageNetWorth
industries_df.reset_index(drop = True, inplace = True)

fig = go.Figure()
fig.add_trace(go.Waterfall(name='', orientation='v', x=industries_df['industry'], textposition='outside',
    # one 'relative' entry per industry, so each bar is stacked on the running total
    measure=['relative'] * len(industries_df),
    y=industries_df['Deviation'],
    text=['${:.2f}B'.format(each) for each in industries_df['TotalNetWorth']])
    )

# Set layout of diagram

fig.update_layout(
    yaxis=dict(tickprefix='$', ticksuffix='B'),
    xaxis_title='Industry',
    yaxis_title='Net Worth (in Billions)',
    xaxis_title_font={'size': 18},
    yaxis_title_font={'size': 18},
    title_text='Top 5 Deviation between Net Worth x Average Net Worth by Industry in the United States <br>' +
               'Net Worth above Average appear in Green, Below the Average appear in Red',
    font=dict(family='Arial', size=16, color='black'),
    template='simple_white',
    showlegend=False,
    autosize=True,
    title_x=0.5,
    title_font=dict(size=20),
    margin=dict(l=30, r=30, t=100, b=30),
    yaxis_range=[0,industries_df['TotalNetWorth'].max()]
    );

fig.show();

import plotly.io as pio
pio.write_html(fig, path+"plotly_result.html", auto_open = False)
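
As a worked example of the deviation logic behind the waterfall, the sketch below uses made-up industry totals (not the Forbes figures). It shows that deviations from the mean always sum to zero, which is why a waterfall of relative bars climbs with the above-average industries and returns to the baseline after the below-average ones.

```python
from statistics import mean

# Hypothetical industry totals in billions of dollars, for illustration only.
totals = {
    "Industry A": 900.0,
    "Industry B": 600.0,
    "Industry C": 400.0,
    "Industry D": 350.0,
    "Industry E": 250.0,
}

average = mean(list(totals.values()))
deviations = {name: value - average for name, value in totals.items()}

# A relative waterfall stacks each bar on the running total of the previous bars.
running = []
level = 0.0
for deviation in deviations.values():
    level += deviation
    running.append(level)

for name, deviation, top in zip(deviations, deviations.values(), running):
    direction = "above" if deviation > 0 else "below"
    print(f"{name}: {deviation:+.1f}B ({direction} average), bar ends at {top:.1f}B")
```

Positive deviations are the bars that would appear in green and negative ones in red.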

Heatmap: Top 200 Billionaires’ Net Worth by Industry x Number of Children

Visualization Analysis: This heat map shows the billionaires’ net worth by the industry they belong to and their number of children, in order to see whether those factors tell us anything about their net worth. The section that interests me the most is the one showing three billionaires out of 200 who have ten children, which I did not expect at all. I would have thought this case would be a single outlier, not three of them. One of them stands out with the highest net worth in the whole map, while the other two are very low on the color bar. Looking at the heat map as a whole, I would say there are no big discrepancies in net worth; it seems fairly level across industries. The most common number of children for billionaires to have appears to be between two and four.


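heatmap_df = df.nlargest(200, 'net_worth')[['industry', 'Children', 'net_worth']]
# each cell holds the mean net worth of the billionaires sharing that industry and number of children (pivot_table's default)
heatmap_data = pd.pivot_table(heatmap_df, index='industry', columns='Children', values='net_worth', aggfunc='mean')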

comma_format= FuncFormatter(lambda x, p: format(x, ','))

# Create the heatmap
plt.figure(figsize=(10, 6))
hmap = sns.heatmap(heatmap_data, annot=True, cmap='viridis', fmt='.0f', linewidths=.5, annot_kws={'size': 11},
           cbar_kws={'format': comma_format, 'orientation': 'vertical', 'extend': 'neither'})

# Set plot labels and title
plt.ylabel('Industry', fontsize=18, labelpad=10)
plt.xlabel('Number of Children', fontsize=18, labelpad=10)
plt.title('Top 200 Billionaires Net Worth Heat Map by Industry and Number of Children', fontsize=18, pad=15)
plt.yticks(size=14);
plt.xticks(size=14);

cbar = hmap.collections[0].colorbar
colorbar = [*range(0, 180, 20)]
cbar.set_ticks(colorbar);
cbar.set_label('Net Worth (in Billions)', rotation = 270, fontsize=14, color='black', labelpad=20);

# Show the heatmap
plt.show()
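
When reading the heat map, it helps to remember that `pd.pivot_table` aggregates with the mean by default, so a cell can stand for several billionaires. The toy sketch below, with made-up rows, shows the averaged cell next to a count of how many rows fed into it.

```python
import pandas as pd

# Toy rows, invented for illustration only.
toy = pd.DataFrame({
    "industry": ["Tech", "Tech", "Tech", "Retail"],
    "Children": [2, 2, 4, 2],
    "net_worth": [100.0, 50.0, 30.0, 40.0],
})

# Mean net worth per (industry, number of children) cell, as plotted in a heat map.
cell_mean = pd.pivot_table(toy, index="industry", columns="Children",
                           values="net_worth", aggfunc="mean")

# Number of billionaires behind each cell.
cell_count = pd.pivot_table(toy, index="industry", columns="Children",
                            values="net_worth", aggfunc="count")

print(cell_mean)
print(cell_count)
```

Combinations with no rows (Retail with four children here) come out as missing values, which appear as blank cells in a seaborn heat map.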

Conclusion

This is the end of my data visualization. A final key takeaway from this analysis is that Elon Musk appears in almost every visualization involving the highest net worth values, because he ranks first in our data set with the highest net worth overall. It was interesting to visualize different factors that somehow play a role in these people’s lives and in their success in becoming billionaires.

Thank you!