knitr::include_graphics("https://static.wixstatic.com/media/7f663d_5e1aab377ea2480ebb3811c12a80d907~mv2.jpg")

Introduction

In my own professional development, I have thought a lot about the qualities leaders possess and what makes a leader. I’ve reflected a lot on the leaders around me and those I admire in my organization and career, but I have never really looked into leaders outside of my industry or in the public eye. Since information on many leaders in the corporate space is easy to find, I’ve taken a data set from Kaggle (https://www.kaggle.com/ash316/forbes-top-2000-companies) and added to it. This data set is the Forbes Top 2000 companies from 2017 (the latest available from this Kaggle data set). To add information to this data set, I began by narrowing the Top 2000 companies list down to only the companies based in the United States. From there, I added columns for each company’s CEO, their birthdate, age, astrological sign, gender, race and birthplace. Once I had over 200 data points, I was ready to begin my analysis.

While this information won’t necessarily give me the qualities of the leaders, it will tell me some surface level information about top corporate leaders in the United States. The questions I will be looking to answer and visualize with the following charts are:

• What are the basic demographics of the top CEOs?

• How does industry segment correlate to the overall profit in each sector?

• How do those demographics of the top CEOs correlate to their company sector?

• What is the astrological sun sign distribution for the top CEOs?

• Does the astrological sun sign of CEOs correlate to their company sector?

Descriptive Statistics

To begin our descriptive statistics, it’s important to understand what has been included in this data set. The originally downloaded data set included:

• Company rank (On the Forbes 2017 list)

• Company

• Country

• Sales

• Profits

• Assets

• Market Value

• Sector

• Industry

Then, to dig a little deeper beyond the companies and add information about their CEOs, I narrowed the scope to only the companies from the United States and kept just the top 205. From that point, I began researching and adding data. I’ve added:

• CEO

• Birthdate

• Sun Sign

• Age

• Gender

• Race

• Country (CEO’s Nationality)

While adding this information, I was unable to find some CEOs’ full birthdates. The year of birth was available for every CEO, but the exact date was not always available. This limited my ability to determine some CEOs’ sun signs, but it still provided age information. The final uploaded data set appears as:

import os
# Machine-specific workaround so the Qt-based plotting backend can find its platform plugins
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = 'C:/ProgramData/Anaconda3/Library/plugins/platforms'

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings

# Hide library warnings in the rendered output
warnings.filterwarnings("ignore")

# Load the U.S.-only subset; ' Rank' and 'CEO ' include the stray spaces from the CSV header
path = "U:/"
filename = path + 'Forbes_Top2000_2017_USONLY_200.csv'
df = pd.read_csv(filename, encoding='latin1', usecols = [' Rank', 'Company', 'Profits', 'Market Value', 'Sector', 'Industry', 'CEO ', 'Birthdate', 'SunSign', 'Age', 'Gender', 'Race', 'Born'])

df.head()
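A sun sign depends on both the month and the day of birth, which is why CEOs with only a birth year have no sign in the data. The sketch below derives a sign from a month and day. It is only an illustration. The cutoff dates are an assumption: they are commonly quoted, approximate boundaries that can shift by a day from year to year, and they are not necessarily how the SunSign column was filled in. The `sun_sign` helper is hypothetical.

```python
from typing import Optional

# ASSUMPTION: approximate, commonly quoted start dates (month, day) for each sign.
# Actual boundaries can shift by about a day depending on the year.
SIGN_STARTS = [
    ((1, 20), "Aquarius"), ((2, 19), "Pisces"), ((3, 21), "Aries"),
    ((4, 20), "Taurus"), ((5, 21), "Gemini"), ((6, 21), "Cancer"),
    ((7, 23), "Leo"), ((8, 23), "Virgo"), ((9, 23), "Libra"),
    ((10, 23), "Scorpio"), ((11, 22), "Sagittarius"), ((12, 22), "Capricorn"),
]


def sun_sign(month: Optional[int], day: Optional[int]) -> str:
    """Hypothetical helper: map a birth month/day to a sun sign.

    Returns "Not Available" when the month or day is unknown, matching the
    label used later for CEOs with only a birth year.
    """
    if month is None or day is None:
        return "Not Available"
    sign = "Capricorn"  # Jan 1 to Jan 19 falls in Capricorn
    for start, name in SIGN_STARTS:
        # Tuples compare month first, then day
        if (month, day) >= start:
            sign = name
    return sign


print(sun_sign(8, 30))       # Virgo
print(sun_sign(1, 5))        # Capricorn
print(sun_sign(None, None))  # Not Available
```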

Demographic Analysis

The demographic analysis continues the descriptive statistics for the data set. It also lets us begin exploring the CEO information, which will shape the later analysis and data visualizations. The demographic analysis briefly looks at age, gender and race. Because this list does not appear to be very diverse, I am treating both gender and race as binary. Gender is shown as male or female. In the descriptive statistics, race is shown as white or not white, and the included pie chart then breaks it out by individual race.

Age Analysis

Using basic descriptive statistics, we can learn a little more about the CEOs and how their basic demographics are distributed. First, we have age. As I was adding this data to our data set, I noticed that many CEOs were born within a narrow range of years, but there were also some outliers. Using the min, max, average and median functions, we can see what this looks like for our data set below.

The oldest CEO in the list is Warren Buffett (90) and the youngest is Mark Zuckerberg (36), with quite a few CEOs falling at the median age of 60.

# Summary statistics for CEO age, using pandas Series methods
OldestCEO = df['Age'].max()
print("The oldest CEO:", OldestCEO)  
## The oldest CEO: 90
YoungestCEO = df['Age'].min()
print("The youngest CEO:", YoungestCEO)  
## The youngest CEO: 36
AverageCEO = df['Age'].mean()
print("The average age for CEOs:", "{:.1f}".format(AverageCEO))  
## The average age for CEOs: 59.6
MedianCEO = df['Age'].median()
print("The median CEO age:", "{:.0f}".format(MedianCEO))
## The median CEO age: 60
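The outliers mentioned above are one reason to report the median alongside the average. The minimal sketch below uses Python's standard `statistics` module on a small list of hypothetical ages (not the Forbes data). It shows how a single very old CEO pulls the mean up while the median stays put.

```python
import statistics

# Hypothetical ages for illustration only (not taken from the Forbes data set)
ages = [55, 58, 60, 60, 62]
ages_with_outlier = ages + [90]  # add one unusually old CEO

for label, values in [("without outlier", ages), ("with outlier", ages_with_outlier)]:
    print(
        label,
        "min:", min(values),
        "max:", max(values),
        "mean:", round(statistics.mean(values), 1),
        "median:", statistics.median(values),
    )
```

Here the mean rises from 59 to about 64.2, while the median stays at 60.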

Race Analysis

As we continue to look at the demographics of these top CEOs, race is where diversity is noticeably lower. Among the CEOs in the data set, if we look at race simply as white versus not white, there are 183 white CEOs and 21 who are not white. Breaking this down further, the pie chart below shows that white CEOs make up 89.7% of the CEOs, with the remaining 10.3% being African American, Asian, Latin and Middle Eastern.

# Flag each CEO as white (True) or not white (False)
is_white = df['Race'] == "White"

num_rows_white = int(is_white.sum())
num_rows_notwhite = int((~is_white).sum())

print('White CEOs in the dataset : ', num_rows_white ) 
## White CEOs in the dataset :  183
print('Non-White CEOs in the dataset : ', num_rows_notwhite )
## Non-White CEOs in the dataset :  21

# Count CEOs per race for the pie chart
Full_Race_df = df.groupby(['Race'])['Race'].count().reset_index(name='RaceCount')
Full_Race_df = Full_Race_df.set_index('Race')

fig = plt.figure(figsize=(4, 4))
ax = fig.add_subplot(1, 1, 1)
def my_autopct(pct):
    return ('%1.1f%%' % pct) if pct > 3 else ''

ax.pie(Full_Race_df['RaceCount'], labels=Full_Race_df.index, colors = ['indigo', 'darkmagenta', 'mediumorchid','deeppink', 'pink'],
       autopct=my_autopct, pctdistance = 0.8, labeldistance = 1.07, 
        wedgeprops = dict(edgecolor='w'), textprops= {'fontsize':8},
        shadow=False, startangle=55)
txt="Percentages in the pie chart above that are below 3% are hidden"

plt.title('Race Distribution of Forbes \n Top CEOs ', fontsize=10)
fig.text(.05,.05,txt)
ax.axis('equal')  
plt.show()
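The `my_autopct` function above hides labels for slices at or below 3% so that small slices don't overlap. The standalone sketch below mirrors that rule and applies it to the white / not-white counts printed earlier (183 and 21). The helper names `slice_label` and `percentages` are hypothetical.

```python
def slice_label(pct: float, threshold: float = 3.0) -> str:
    """Same rule as my_autopct: show one decimal place only above the threshold."""
    return ('%1.1f%%' % pct) if pct > threshold else ''


def percentages(counts: dict) -> dict:
    """Convert raw counts into percentages of the total."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}


# Counts taken from the printed output above
shares = percentages({"White": 183, "Not white": 21})
for name, pct in shares.items():
    print(name, slice_label(pct))
```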

Gender Analysis (Male vs. Female)

Gender is where this demographic breakdown gets more striking in terms of lack of diversity. It’s often noted in the media, and is perhaps a stereotype, that older white men make up the leadership of most organizations. We have already seen this largely borne out for race, where 89.7% of the CEOs in this data set are white. However, a whopping 94.1% of the CEOs are male (192 of 204). Based on the data alone, this supports the stereotype and media depiction of CEOs even more strongly. We can see this breakdown in the pie chart below.


# Flag each CEO as male (True) or not (False); CEOs not recorded as male are counted as female
is_male = df['Gender'] == "Male"

num_rows_male = int(is_male.sum())
num_rows_female = int((~is_male).sum())

print('Male CEOs in the dataset : ', num_rows_male ) 
## Male CEOs in the dataset :  192
print('Female CEOs in the dataset : ', num_rows_female )
## Female CEOs in the dataset :  12

# Derive the pie slices from the counts rather than hard-coding percentages
pieLabels = 'Female', 'Male'
sizes = [num_rows_female, num_rows_male]

fig = plt.figure(figsize=(4, 4))
ax = fig.add_subplot(1, 1, 1)

ax.pie(sizes, labels=pieLabels, colors = ['mediumorchid', 'indigo'], autopct='%1.1f%%', pctdistance = 0.8, labeldistance = 1.07, 
        wedgeprops = dict(edgecolor='w'), textprops= {'fontsize':8},
        shadow=False, startangle=55)
plt.title('Gender Distribution of Forbes \n Top CEOs ', fontsize=10)
ax.axis('equal')  
plt.show()
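For reference, the gender shares follow directly from the counts printed above (192 male and 12 female CEOs, 204 in total):

$$
\frac{192}{204} \approx 0.941 \approx 94.1\%, \qquad \frac{12}{204} \approx 0.059 \approx 5.9\%
$$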

Demographic and Sector Correlations

Now that we’ve broadly looked at the basic demographics of the data set and the U.S. CEOs at the top of the Forbes list, are there any relationships we can draw? Based on some general ideas I have about the companies on this list, my hypothesis is that a company’s sector is strongly related to the demographics of its CEO. For example, I imagine that the CEO of a Financials company is likely to be more straightforward and analytical, and to skew older, than a CEO in the Information Technology sector. The following visualizations explore these ideas.

Profit by Sector based on Industry

Something I’ve noticed about this data set is that it includes both a sector and an industry for each company. However, I wonder whether this distinction really matters in the grand scheme of things. It makes sense that some sectors earn different profits than others, even at the top of the Forbes list, but does industry have a further impact?

As we can see from the stacked bar chart below, the answer appears to be yes. Although the industry breakdown is quite granular, it makes a significant difference in how profit is distributed within each sector.

For example, in the Financials sector, most of the profit comes from the Major Banks industry, followed by Life & Health Insurance, then Regional Banks and, close behind, Consumer Financial Services. We see similar breakdowns in the Health Care sector. There, most of the profit comes from Pharmaceuticals, Medical Equipment & Supplies and Biotechs, followed by Managed Health Care and Health Care Services.

By contrast, in a sector like mine, Utilities, most of the profit comes from a single industry, Electric Utilities.

stacked_df = df.groupby(['Sector', 'Industry'])['Profits'].sum().reset_index(name='TotalIndustryProfits')

stacked_df2 = stacked_df.pivot(index='Sector', columns='Industry', values='TotalIndustryProfits')

stacked_df2['TotalbySector'] = stacked_df2.sum(axis=1)

stacked_df2_sorted = stacked_df2.sort_values('TotalbySector', ascending=False)

del stacked_df2_sorted["TotalbySector"]
 

from matplotlib.ticker import FuncFormatter

fig = plt.figure(figsize=(12, 12))
ax = fig.add_subplot(1, 1, 1)

stacked_df2_sorted.plot(kind='bar', stacked=True, ax=ax, cmap='RdPu')
plt.ylabel('Total Profit by Sector', fontsize=16, labelpad=10)
plt.title('Profit by Industry for Each Sector \n  Stacked Bar Plot', fontsize=18)
plt.xticks(rotation=-10, horizontalalignment = 'center', fontsize=8)
plt.yticks(fontsize=12)
ax.set_xlabel('Sector', fontsize=16)

ax.yaxis.set_major_formatter(FuncFormatter( lambda x, pos:('$%1.0fM')%(x) ))
ax.legend(bbox_to_anchor=(.88, 0.042), fontsize=8)

plt.show()
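The `groupby`/`pivot` steps above sum profit for every (sector, industry) pair and then order sectors by their total. Below is a minimal, dependency-free sketch of the same aggregation. It uses a few hypothetical records rather than the Forbes figures, and the helper names are illustrative.

```python
from collections import defaultdict

# Hypothetical (sector, industry, profit) records for illustration only
rows = [
    ("Financials", "Major Banks", 20.0),
    ("Financials", "Major Banks", 15.0),
    ("Financials", "Regional Banks", 5.0),
    ("Utilities", "Electric Utilities", 4.0),
    ("Utilities", "Natural Gas Utilities", 1.0),
]


def profit_table(records):
    """Sum profit per sector and industry (like groupby + pivot)."""
    table = defaultdict(lambda: defaultdict(float))
    for sector, industry, profit in records:
        table[sector][industry] += profit
    return table


def sectors_by_total(table):
    """Order sectors by total profit, largest first (like sort_values on TotalbySector)."""
    return sorted(table, key=lambda sector: sum(table[sector].values()), reverse=True)


table = profit_table(rows)
for sector in sectors_by_total(table):
    total = sum(table[sector].values())
    for industry, profit in table[sector].items():
        print(f"{sector} / {industry}: {profit:.1f} ({100 * profit / total:.0f}% of sector)")
```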

Age Compared to Company Sector

Now that we know industries within each sector can have a significant impact, the remaining charts look at sector only. There are too many industries to produce meaningful visualizations, so I will simply keep the industry effect in mind. My next assumption is that age varies by sector as well. If you recall, my hypothesis was that Financials companies would skew older while sectors like Information Technology would skew younger. To test this hypothesis and see whether we can draw any conclusions, I’ve created the line chart below.

The chart is grouped by sector so that each sector has its own line. The X axis shows the age group of the company CEOs, and the Y axis shows the number of CEOs in that age group. The most prominent pattern is that most Financials companies have CEOs in their 60s, and the oldest CEO, 90-year-old Warren Buffett, is also in this sector.

While many sectors have most of their CEOs in their 60s, sectors like Consumer Staples, Health Care, Information Technology, Telecommunication Services and Utilities skew a bit younger, with most of their CEOs in their 50s. This is somewhat surprising to me and goes against my hypothesis that only technology would skew slightly younger than the norm. That still holds, but the chart below shows a few more sectors that I did not consider.

# Keep only the columns needed for the age analysis
Age_df = df[['Sector', 'Age']].copy()
# Bin ages by decade; bins are right-inclusive, so (29, 39] -> '30s', ..., (89, 99] -> '90+'
Age_df['AgeGroup'] = pd.cut(x=Age_df['Age'], bins=[29, 39, 49, 59, 69, 79, 89, 99], 
                            labels=['30s', '40s', '50s', '60s', '70s', '80s', '90+'])
# Count CEOs per (sector, age group); observed=False keeps empty age groups as zero counts
Age_df_Summary = (Age_df.groupby(['Sector', 'AgeGroup'], observed=False)['AgeGroup']
                  .count().reset_index(name='AgeGroupCount'))

fig = plt.figure(figsize = (20, 15))
ax = fig.add_subplot(1, 1, 1)

my_colors = {'Consumer Discretionary':'indigo',
            'Consumer Staples':'darkmagenta',
            'Energy':'darkorchid',
            'Financials':'mediumvioletred',
            'Health Care':'palevioletred',
            'Industrials':'deeppink',
            'Information Technology':'hotpink',
            'Materials':'pink',
            'Telecommunication Services':'bisque',
            'Utilities':'linen'}

# Group by the column name (not a one-element list) so each key is a plain sector string
for key, grp in Age_df_Summary.groupby('Sector'):
    grp.plot(ax=ax, kind='line', x='AgeGroup', y ='AgeGroupCount', color=my_colors[key], label=key, linewidth=3, marker='8')
    
plt.title('CEO Age Group by Sector', fontsize=18)
ax.set_xlabel('Age Group', fontsize=16)
ax.set_ylabel('Count of CEOs by Age Group', fontsize=16, labelpad=10)
ax.tick_params(axis='x', labelsize=12, rotation=0)
ax.tick_params(axis='y', labelsize=12, rotation=0)
    
plt.show()
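`pd.cut` uses right-inclusive bins by default, so an age of 39 lands in the '30s' group and 90 lands in '90+'. The sketch below reproduces that binning with the standard `bisect` module and tallies CEOs per (sector, age group). The records are hypothetical, and `age_group` is an illustrative helper, not part of the original code.

```python
from bisect import bisect_left
from collections import Counter
from typing import Optional

# Same edges and labels as the pd.cut call above
EDGES = [29, 39, 49, 59, 69, 79, 89, 99]
LABELS = ['30s', '40s', '50s', '60s', '70s', '80s', '90+']


def age_group(age: int) -> Optional[str]:
    """Right-inclusive binning: (29, 39] -> '30s', ..., (89, 99] -> '90+'."""
    idx = bisect_left(EDGES, age)
    if idx == 0 or idx == len(EDGES):
        return None  # outside (29, 99], where pd.cut would return NaN
    return LABELS[idx - 1]


# Hypothetical (sector, age) records for illustration only
records = [("Financials", 61), ("Financials", 64), ("Financials", 90),
           ("Information Technology", 36), ("Information Technology", 52)]
counts = Counter((sector, age_group(age)) for sector, age in records)
print(counts)
```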

Astrological Signs of CEOs

As noted in the introduction, this analysis so far doesn’t tell us much about the actual traits of the top corporate leaders in the U.S. beyond their demographics and how this information relates to sector and profit. For a surface-level look at traits, however, I find this section to be the most exciting (because of my personal interest in astrology) and potentially the most telling part of the analysis. The following charts use the astrological signs of CEOs to look at traits commonly associated with specific sun signs and translate those into possible traits of these leaders.

Astrological Sign by Volume

As I embarked on this analysis, I did some preliminary research to see whether any star signs are perceived to be more successful than others. The Business Insider article “The most common zodiac signs of the world’s successful CEOs — and what else they have in common” (https://www.businessinsider.com/the-most-common-zodiac-signs-of-worlds-most-successful-ceos-2020-3) found that the five most common zodiac signs were Taurus, Leo, Pisces, Scorpio and Virgo, in that order.

However, the ranking is different in this data set of top U.S. companies on the Forbes list. Here, the most common zodiac signs are Cancer, Libra, Pisces and Capricorn, in that order, with a three-way tie for fifth place among Taurus, Sagittarius and Leo.
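A "three-way tie for fifth place" follows standard competition ranking: tied signs share a rank, and the next rank is skipped. Below is a small sketch of that rule. The counts are hypothetical (not the tallies from this data set), and `competition_rank` is an illustrative helper.

```python
def competition_rank(counts: dict) -> dict:
    """Rank labels by count, highest first; tied counts share a rank (1, 2, 2, 4 style)."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    ranks = {}
    prev_count = None
    prev_rank = 0
    for position, (label, count) in enumerate(ordered, start=1):
        if count != prev_count:
            # A new count value starts a new rank at its list position
            prev_rank = position
            prev_count = count
        ranks[label] = prev_rank
    return ranks


# Hypothetical counts, NOT the tallies from this data set
print(competition_rank({"Cancer": 9, "Libra": 8, "Leo": 6, "Taurus": 6, "Aries": 4}))
# {'Cancer': 1, 'Libra': 2, 'Leo': 3, 'Taurus': 3, 'Aries': 5}
```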

I’m going to chalk these differences up to different data sets, timing and how Business Insider defined successful CEOs. Even in our data set, there is a ranking, but the total count per zodiac sign does not differ hugely. Using our top five zodiac signs and their commonly associated positive traits (https://i.thehoroscope.co/), we can speculate that many of our CEOs have the following qualities:

  1. Cancer- Tenacious, dependable, and persuasive

  2. Libra- Tactful, eloquent, and charming

  3. Pisces- Sensible, compassionate, and intuitive

  4. Capricorn- Faithful, responsible, and ambitious

  5. (tie) Taurus- Practical, loyal, and trustworthy

  5. (tie) Sagittarius- Generous, frank, and enthusiastic

  5. (tie) Leo- Magnanimous, elegant, and dedicated

All of these traits would seemingly make for a good leader and/or translate to success in business, with some traits fitting corporate success more than others. For example, based on just one horoscope website’s positive traits for these signs, I would expect Cancers, Pisces, Capricorns and Tauruses to be the most successful as CEOs.

# Data cleanup: label CEOs without a full birthdate (sun sign unknown)
df['SunSign'] = df['SunSign'].fillna("Not Available")

# Keep only CEOs whose sun sign is known; .copy() avoids modifying a view of df
SunSignsAvailable = df[df['SunSign'] != "Not Available"].copy()
SunSignsAvailable['Birthdate'] = pd.to_datetime(SunSignsAvailable['Birthdate'], format='%m/%d/%Y')

# Birthdate Time Transformation
SunSignsAvailable['Day'] = SunSignsAvailable.Birthdate.dt.day
SunSignsAvailable['Month'] = SunSignsAvailable.Birthdate.dt.month
SunSignsAvailable['Year'] = SunSignsAvailable.Birthdate.dt.year
SunSignsAvailable['WeekDay'] = SunSignsAvailable.Birthdate.dt.strftime('%a')
SunSignsAvailable['MonthName'] = SunSignsAvailable.Birthdate.dt.strftime('%b')

# Donut Chart / Nested Pie Chart Data - Build up Dataframe

Inner_Pie =  SunSignsAvailable.groupby(['SunSign','MonthName']).size().reset_index(name='MonthCount')

Inner_Pie['MonthName'] = pd.Categorical(Inner_Pie['MonthName'], ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"])

Inner_Pie_Sorted = Inner_Pie.sort_values("MonthName")

Outer_Pie =  SunSignsAvailable.groupby(['SunSign']).size().reset_index(name='SignCount')

Outer_Pie['SunSign'] = pd.Categorical(Outer_Pie['SunSign'], ["Aquarius", "Pisces", "Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo", "Libra", "Scorpio", "Sagittarius", "Capricorn"])

Outer_Pie_Sorted = Outer_Pie.sort_values("SunSign")
# Set up inside and outside reference numbers for colors - reference numbers draw from Color Map (CMAP)

number_outside_colors = len(Outer_Pie.SunSign.unique())
outside_color_ref_number = np.arange(number_outside_colors)

number_inside_colors = len(Inner_Pie.MonthName.unique())
all_color_ref_number = np.arange(number_outside_colors + number_inside_colors)

inside_color_ref_number = []
for each in all_color_ref_number:
    if each not in outside_color_ref_number:
        inside_color_ref_number.append(each)
sign_pie_outside_colors = [ 'indigo', 'darkmagenta','deeppink','pink']
element_sign = ['Air Signs', 'Water Signs', 'Fire Signs', 'Earth Signs']

# Create a new figure and keep a handle to its axes (not the previous chart's figure)
fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(1, 1, 1)

colormap = plt.get_cmap("tab20c")
outer_colors = colormap(outside_color_ref_number)

all_CEOs = Outer_Pie.SignCount.sum()

# Draw the outer ring on this figure's axes; the four colors repeat across the 12 signs
Outer_Pie_Sorted.groupby(['SunSign'])['SignCount'].sum().plot(
       kind='pie', ax=ax, radius=1, colors = sign_pie_outside_colors, pctdistance = 0.85, labeldistance = 1.05,
       wedgeprops = dict(edgecolor='w'), textprops= {'fontsize':12},
       autopct = lambda p: '{:.2f}%\n({:.0f})'.format(p,(p/100)*all_CEOs),
       startangle=90)

inner_colors = colormap(inside_color_ref_number)
#Inner_Pie_Sorted.groupby(['MonthName'])['MonthCount'].sum().plot(kind='pie', radius=0.7, colors = inner_colors, pctdistance = 0.55, labeldistance = 0.8, wedgeprops = dict(edgecolor='w'), textprops= {'fontsize':13}, labels = Inner_Pie_Sorted.MonthName, autopct = '%1.2f%%', startangle=45)

# Cut out the center to turn the pie into a donut
hole = plt.Circle((0,0), 0.45, fc='white')
ax.add_artist(hole)

ax.yaxis.set_visible(False)
plt.title('Forbes Top U.S. Companies: \n CEOs Astrological Signs', fontsize=18)
plt.legend(element_sign, loc="upper right")

ax.axis('equal')
plt.tight_layout()

plt.show()
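The donut chart passes only four colors for twelve signs. Matplotlib cycles through that list, and because the signs are ordered starting from Aquarius, each color lines up with one element (air, water, fire, earth). The legend relies on this, and it holds as long as all twelve signs are drawn in that order. Below is a small sketch of the position-mod-4 mapping; the `element_of` helper is hypothetical.

```python
# Same ordering as the categorical SunSign order used for the donut chart
SIGN_ORDER = ["Aquarius", "Pisces", "Aries", "Taurus", "Gemini", "Cancer",
              "Leo", "Virgo", "Libra", "Scorpio", "Sagittarius", "Capricorn"]
# Same order as element_sign / sign_pie_outside_colors in the chart code
ELEMENTS = ["Air", "Water", "Fire", "Earth"]


def element_of(sign: str) -> str:
    """Colors repeat every 4 wedges, so a sign's element is its position mod 4."""
    return ELEMENTS[SIGN_ORDER.index(sign) % len(ELEMENTS)]


for sign in SIGN_ORDER:
    print(sign, element_of(sign))
```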

Astrological Sign by Sector

After the analysis of zodiac signs by volume, there wasn’t a massive difference or one sign that was really reigning supreme. Looking more closely at the traits of these signs, it struck me that all kinds of signs could be suited for corporate success. Perhaps, though, the company’s sector matters more in determining which traits prevail in leadership. Using the same data, we will look at astrological signs and the number of CEOs with each zodiac sign in a given company sector.

In the scatter chart below, we can see that Capricorns are the clear front-runner in the Financials sector, with seven CEOs being Capricorns. This is followed by Capricorns again in the Consumer Discretionary sector, then Leos, Pisces and Virgos in Financials. It makes sense that Financials appears so often in this chart, since there are many Financials companies in this data set. Looking at the traits of just the CEOs in the Financials sector, the most common signs are:

• Capricorn- Faithful, responsible, and ambitious

• Leo- Magnanimous, elegant and dedicated

• Pisces- Sensible, compassionate, and intuitive

• Virgo- Meticulous, elegant and persuasive

It makes sense that these signs are most prominent in the Financials sector. Depending on the sign, their key traits include being responsible, ambitious, dedicated, sensible, intuitive, meticulous and persuasive. On the other hand, the signs least represented among Financials CEOs (based on this data) are:

• Aries- Versatile, courageous and spontaneous

• Sagittarius- Generous, frank and enthusiastic

Again, just from a surface level, I could see how these signs’ prevailing traits, while still positive, would be a less natural fit for the CEO of a Financials company.

bar_sector_signs = SunSignsAvailable.groupby(['Sector', 'SunSign'])['SunSign'].count().reset_index(name='SignCount')
bar_sector_signs = bar_sector_signs.fillna(0)

#MostLikely_sector_signs = bar_sector_signs.idxmax(axis=1)
plt.figure(figsize=(20, 15))

plt.scatter(bar_sector_signs['SunSign'], bar_sector_signs['Sector'], marker='8', cmap='RdPu',
           c=bar_sector_signs['SignCount'], s=(bar_sector_signs['SignCount']*700), edgecolors='black')

plt.title('CEO Sun Sign by Company Sector', fontsize=18)
plt.xlabel('Sun Sign of Company CEOs', fontsize=16)
plt.ylabel('Company Sector', fontsize=16)

cbar = plt.colorbar()
cbar.set_label('Number of CEOs', rotation=270, fontsize=16, color='black', labelpad=20)

plt.show()
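The scatter chart's bubble sizes are simply counts of CEOs for each (sector, sign) pair. The commented-out `idxmax` line hints at picking the most common sign per sector. A stdlib sketch of that idea, which also keeps ties, could look like the following. The records are hypothetical, and the helper names are illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical (sector, sun sign) records for illustration only
records = [
    ("Financials", "Capricorn"), ("Financials", "Capricorn"), ("Financials", "Leo"),
    ("Utilities", "Pisces"), ("Utilities", "Libra"),
]


def signs_by_sector(pairs):
    """Count CEOs per sun sign within each sector."""
    counts = defaultdict(Counter)
    for sector, sign in pairs:
        counts[sector][sign] += 1
    return counts


def most_common_signs(counter):
    """All signs tied for the highest count, sorted alphabetically."""
    top = max(counter.values())
    return sorted(sign for sign, n in counter.items() if n == top)


by_sector = signs_by_sector(records)
for sector, counter in by_sector.items():
    print(sector, most_common_signs(counter))
```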

Conclusion

Looking back at this analysis, a few basic patterns emerge about the CEOs of the top U.S. companies on the 2017 Forbes list. From a demographic perspective, most of the CEOs in the data set are white men in their 50s and 60s, as the pie charts on gender and race and the line chart for age show. The stacked bar chart also tells us that some sectors are much more profitable than others, and that some industries within each sector are more profitable as well. Finally, when we try to infer the traits of these CEOs from their zodiac signs, the most common signs are associated with traits such as being dependable, persuasive, sensible, ambitious and responsible, to name a few. Looking further, there are also some differences by sector, both in the demographics of leaders and in the traits that may suit leading a company in a specific sector. The Business Insider ranking of the most common zodiac signs among successful CEOs also did not hold for our data set. While zodiac signs are not a reliable way to determine the traits of the top CEOs in our country, this was definitely a fun way to use and look at the data.