Introduction

This project examines whether observable CEO background characteristics are associated with differences in firm revenue among Fortune 100 companies. The response variable is company revenue, measured as revenue_m, and because revenue is strongly right skewed, I also use log_revenue in plots and summaries. The analysis focuses on whether variables such as tenure, athletic background, gender, Ivy League education, and academic major are associated with variation in firm revenue.

Data

The dataset contains 100 Fortune 100 CEOs and the companies they lead, so each row represents one CEO-company pair. The population of interest is CEOs of Fortune 100 firms in 2024.

Building the dataset was a tedious manual process because the information was not available in one place. To collect undergraduate school, postgraduate school, academic major, and tenure, I first searched each CEO on LinkedIn. When that information was not available there, I used the executive biography or leadership page on the company website. Information on collegiate sports participation and revenue came from a dataset published by Psychology Today. Because this is a targeted dataset rather than a random sample, the results should be interpreted as describing Fortune 100 CEOs rather than all corporate executives.

Data preparation

Before beginning the analysis, I checked that the data were in a rectangular format with one row per CEO and one column per variable. I verified that the variables had the expected types, converted the indicator variables into factors with readable labels, and created log_revenue to better handle the skewness in company revenue.

I then checked for missing values and unusual features. The only variables with missing values were postgrad_college and postgrad_major_category. In these cases, the missing values reflected CEOs who did not attend postgraduate school, so I replaced those values with "None" to make the dataset easier to interpret. The quantitative variables, especially revenue and tenure, include some large values, but these appear to be real observations rather than obvious errors, so they were kept in the analysis.

After replacing these values, there are no remaining missing values in the dataset.
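The cleaning steps described above can be sketched in a few lines. This is a minimal illustration rather than the code used for the report: the two records are taken from the rows displayed in the next section, representing a missing value as Python's None is an assumption, and the helper names clean and count_missing are hypothetical.

```python
import math

# Two illustrative records; field names follow the dataset's columns.
# Python's None stands in for a missing value (an assumed encoding).
rows = [
    {"ceo_name": "Douglas McMillon", "postgrad_college": "University of Tulsa",
     "postgrad_major_category": "Business", "revenue_m": 648125},
    {"ceo_name": "Andrew Witty", "postgrad_college": None,
     "postgrad_major_category": None, "revenue_m": 371622},
]

POSTGRAD_COLUMNS = ("postgrad_college", "postgrad_major_category")


def clean(records):
    """Return copies with missing postgraduate fields set to the string
    "None" and a natural-log revenue column added."""
    cleaned = []
    for record in records:
        row = dict(record)  # copy so the raw records stay untouched
        for col in POSTGRAD_COLUMNS:
            if row[col] is None:
                row[col] = "None"
        row["log_revenue"] = math.log(row["revenue_m"])
        cleaned.append(row)
    return cleaned


def count_missing(records):
    """Count the cells that are still missing across all records."""
    return sum(value is None for row in records for value in row.values())


cleaned_rows = clean(rows)
print("missing before:", count_missing(rows))
print("missing after:", count_missing(cleaned_rows))
```

Using the string "None" rather than dropping these rows keeps all 100 CEOs in the analysis and lets "no postgraduate degree" appear as its own category in summaries.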

First rows of the data

The first eight rows of the data frame are shown below. The full dataset contains 100 observations.

First eight rows of the Fortune 100 CEO dataset
ceo_name | female_indicator | athlete_indicator | undergrad_college | undergrad_major_category | postgrad_college | postgrad_major_category | ivy_undergrad | ivy_postgrad | company | revenue_m | fortune_rank | tenure | log_revenue
Douglas McMillon | Male | Athlete | University of Arkansas | Business | University of Tulsa | Business | Non-Ivy | Non-Ivy | Walmart | 648125 | 1 | 10 | 13.38184
Andrew Jassy | Male | Athlete | Harvard University | Unknown | Harvard Business School | Business | Ivy | Ivy | Amazon | 574785 | 2 | 3 | 13.26175
Tim Cook | Male | Athlete | Auburn University | Engineering | Duke University | Business | Non-Ivy | Non-Ivy | Apple | 383285 | 3 | 13 | 12.85653
Andrew Witty | Male | Athlete | University of Nottingham | Economics | None | None | Non-Ivy | Non-Ivy | UnitedHealth Group | 371622 | 4 | 4 | 12.82563
Warren Buffett | Male | Non-Athlete | University of Nebraska | Business | Columbia University | Economics | Non-Ivy | Ivy | Berkshire Hathaway | 364482 | 5 | 54 | 12.80623
Karen Lynch | Female | Athlete | Boston College | Business | Boston University | Business | Non-Ivy | Non-Ivy | CVS Health | 357776 | 6 | 3 | 12.78766
Darren Woods | Male | Athlete | Texas A&M University | Engineering | Northwestern University | Business | Non-Ivy | Non-Ivy | Exxon Mobil | 344582 | 7 | 7 | 12.75009
Sundar Pichai | Male | Athlete | Indian Institute of Technology Kharagpur | Engineering | Stanford University | Mixed | Non-Ivy | Ivy | Alphabet | 307394 | 8 | 9 | 12.63589

Each row represents one Fortune 100 CEO. The main columns used in this analysis are revenue_m, company revenue in millions of dollars; log_revenue, the natural log of revenue; tenure, years as CEO; female_indicator, CEO gender; athlete_indicator, whether the CEO has an athletic background; ivy_undergrad, whether the CEO attended an Ivy League college for undergraduate study; ivy_postgrad, whether the CEO attended an Ivy League institution for postgraduate study; undergrad_major_category, broad undergraduate major category; and postgrad_major_category, broad postgraduate major category.

Graphical and numerical summaries

Because this dataset contains several variables, it is important to be selective rather than show every possible graph or summary. I focus on the variables that are most relevant to the research question: company revenue, log revenue, tenure, gender, athletic background, Ivy League undergraduate education, and undergraduate major category. This keeps the analysis coherent and highlights the variables that are most informative for understanding possible differences in firm success.

For quantitative variables, the main features I focus on are center, spread, skewness, and possible outliers. For categorical variables, I focus on counts and whether the distribution is balanced or concentrated in only a few groups. I also summarize revenue carefully because it is the main measure of firm success in this project.

Revenue

Numerical summary of revenue
n mean_revenue median_revenue sd_revenue min_revenue max_revenue
100 122435.7 80296 107869.7 43452 648125

Revenue is strongly right skewed, with a few companies having much larger revenues than the rest. The mean (about $122.4 billion) is well above the median (about $80.3 billion), which is another sign of right skewness, and the maximum of roughly $648 billion is about eight times the median. The histogram shows that most firms are grouped at lower revenue values, while a small number of very large companies create a long right tail.

Log revenue

Numerical summary of log revenue
mean_log_revenue median_log_revenue sd_log_revenue min_log_revenue max_log_revenue
11.47 11.29 0.65 10.68 13.38

Taking the log of revenue makes the distribution noticeably more symmetric and easier to analyze. The extreme right tail seen in the original revenue variable is reduced, so the histogram now shows a more balanced shape, although the mean (11.47) still sits slightly above the median (11.29), so some right skew remains. This transformation makes comparisons across groups clearer and gives a better overall summary of how revenue is distributed.
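As a quick illustration of what the transformation does, the sketch below recomputes the natural log for the eight rows displayed earlier and compares the spread of revenue on the two scales. The minimum and maximum come from the numerical summaries above; the variable names are illustrative, and this is not the code used to build the dataset.

```python
import math

# (revenue_m, reported log_revenue) for the eight rows displayed earlier.
displayed = [
    (648125, 13.38184), (574785, 13.26175), (383285, 12.85653),
    (371622, 12.82563), (364482, 12.80623), (357776, 12.78766),
    (344582, 12.75009), (307394, 12.63589),
]

# Largest gap between a recomputed natural log and the reported value.
max_gap = max(abs(math.log(revenue) - reported) for revenue, reported in displayed)
print(f"largest gap from reported log_revenue: {max_gap:.6f}")

# Minimum and maximum revenue from the numerical summary of revenue.
min_revenue, max_revenue = 43452, 648125

# On the raw scale the largest firm is many times the smallest; on the
# log scale that ratio turns into a difference of a few units.
ratio = max_revenue / min_revenue
log_spread = math.log(max_revenue) - math.log(min_revenue)
print(f"max/min ratio on the raw scale: {ratio:.1f}")
print(f"max - min on the log scale: {log_spread:.2f}")
```

Because a difference in logs equals the log of a ratio, equal distances on the log scale correspond to equal percentage differences in revenue, which is why group comparisons are easier to read after the transformation.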

Tenure

Numerical summary of tenure
mean_tenure median_tenure sd_tenure min_tenure max_tenure
7.85 5.5 7.75 1 54

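Tenure is also right skewed: the mean (7.85 years) is above the median (5.5 years), and the maximum of 54 years, which belongs to Warren Buffett in the rows shown earlier, is far from the bulk of the data. Most CEOs have relatively short tenures, while a smaller number have been in the role for many years. The histogram shows a clear concentration at lower values and a few larger observations in the tail, which suggests that long-serving CEOs are less common in this group.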
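The mean-versus-median comparisons in the three summaries above can be put on a common scale with Pearson's second (median) skewness coefficient, 3 * (mean - median) / sd. This statistic is not part of the original analysis; it is added here only as a rough check, computed from the rounded values in the summary tables, so the log revenue figure in particular is coarse.

```python
# (mean, median, sd) copied from the numerical summaries above.
summaries = {
    "revenue_m": (122435.7, 80296, 107869.7),
    "log_revenue": (11.47, 11.29, 0.65),
    "tenure": (7.85, 5.5, 7.75),
}


def pearson_median_skewness(mean, median, sd):
    """Pearson's second skewness coefficient: positive values indicate
    a longer right tail, values near zero indicate rough symmetry."""
    return 3 * (mean - median) / sd


skewness = {name: pearson_median_skewness(*stats) for name, stats in summaries.items()}

for name, value in skewness.items():
    print(f"{name}: {value:.2f}")
```

All three coefficients are positive, and the value for log revenue is smaller than the value for raw revenue, which is consistent with the log transformation reducing the right skew without removing it entirely.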

Gender

Counts for CEO gender
female_indicator n
Male 89
Female 11

The gender distribution is highly unbalanced, with men making up the large majority of CEOs in the dataset. This is useful context for the rest of the analysis because it shows that some background characteristics are much more common than others among Fortune 100 CEOs. It also means any comparisons involving gender should be interpreted carefully because the groups are very uneven in size.

Athlete background

Counts for athlete background
athlete_indicator n
Non-Athlete 32
Athlete 68

This bar chart shows that most CEOs in the dataset (68 of 100) are classified as athletes, while a substantial minority (32) have no athletic background. Since the categories are not evenly split, comparisons across these groups should be interpreted carefully. Still, the plot is useful because it quickly shows how common athletic participation is among Fortune 100 CEOs.

Ivy League undergraduate education

Counts for Ivy League undergraduate education
ivy_undergrad n
Non-Ivy 86
Ivy 14

Most CEOs in the dataset did not attend an Ivy League school for their undergraduate degree. The distribution is highly unbalanced, with 86 CEOs in the non-Ivy group and only 14 in the Ivy group. This suggests that while Ivy League education may be notable, it is far from the typical undergraduate path among these CEOs, and comparisons involving the small Ivy group should be interpreted carefully.

Undergraduate major category

Counts for undergraduate major category
undergrad_major_category n
Business 44
Economics 5
Engineering 22
Humanities 9
Mixed 6
Science 9
Unknown 5

The distribution of undergraduate majors is concentrated in a few categories, especially business and engineering. Other categories appear less often, which suggests that some academic backgrounds are much more common than others among Fortune 100 CEOs. The bar chart makes the imbalance across categories easy to see.

Relationships between variables

To understand how CEO background characteristics relate to firm success, I focus on a small number of relationship plots rather than plotting every possible pair of variables. These plots were chosen because they connect directly to the research question and compare the response variable, log revenue, to several meaningful CEO characteristics. The scatterplot is useful for looking at the association between two quantitative variables, while the boxplots are useful for comparing a quantitative outcome across categorical groups.

Tenure and log revenue

I chose this scatterplot because both tenure and revenue are central to the research question, and it allows me to examine whether more years in the CEO role are associated with differences in firm revenue. The plot suggests a slight positive relationship, meaning firms led by longer tenured CEOs may have somewhat higher revenues on average. However, the points are fairly spread out, so the association appears weak rather than strong.

Athlete background and log revenue

I included this boxplot because athletic background was one of the main CEO characteristics collected in the dataset. The distributions overlap heavily, which suggests that firms led by former athletes and non-athletes have fairly similar revenue patterns. While there may be some difference in center, the plot does not show a large or obvious separation between the two groups.

Ivy League undergraduate education and log revenue

This boxplot was chosen to examine whether CEOs with Ivy League undergraduate education tend to lead higher revenue firms. The two groups look fairly similar, with substantial overlap in both spread and center. This suggests that Ivy League undergraduate education does not have a strong visible relationship with revenue in this dataset.

Undergraduate major category and log revenue

I also included undergraduate major because it is another meaningful background characteristic that varies across CEOs. Some categories appear to have slightly different centers, but there is still a large amount of overlap, and no category stands out as dramatically different from the others. This suggests that undergraduate major may not be strongly related to firm revenue on its own.

Regression summary

As a final descriptive check, I fit a simple linear regression using log_revenue as the response and tenure as the predictor. I include this model only to summarize the same pattern shown in the scatterplot above. Since this project is primarily exploratory, I use the regression table as a compact numerical summary rather than as the main focus of the report.

Simple regression of log revenue on tenure
term estimate std.error statistic p.value
(Intercept) 11.3945 0.0927 122.9529 0.0000
tenure 0.0094 0.0084 1.1118 0.2689

The estimated slope for tenure is positive (0.0094), which matches the slight upward trend in the scatterplot. Because the response is on the log scale, this corresponds to roughly 0.9% higher revenue per additional year of tenure. At the same time, the slope is small relative to its standard error (p-value of about 0.27), and the scatterplot shows substantial variability around the trend. Taken together, the visual and numerical evidence suggest that tenure has at most a weak association with revenue in this dataset.
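The coefficient table can also be read on the original revenue scale. The sketch below is a back-of-the-envelope calculation from the rounded estimate and standard error in the table, not output from the fitted model. The critical value 1.984 is an assumption: it approximates the 97.5th percentile of a t distribution with 98 degrees of freedom (100 observations minus 2 estimated coefficients).

```python
import math

# Rounded values copied from the regression table above.
estimate = 0.0094   # slope for tenure on the log revenue scale
std_error = 0.0084

# Assumed critical value: approximate 97.5th percentile of t with 98 df.
T_CRIT = 1.984

# The t statistic is the estimate divided by its standard error. It
# differs slightly from the table because the inputs here are rounded.
t_stat = estimate / std_error

# A slope b on the log scale multiplies revenue by exp(b) per unit of x.
pct_per_year = (math.exp(estimate) - 1) * 100
pct_per_decade = (math.exp(10 * estimate) - 1) * 100

# Approximate 95% confidence interval for the slope.
ci_low = estimate - T_CRIT * std_error
ci_high = estimate + T_CRIT * std_error

print(f"t statistic: {t_stat:.2f}")
print(f"implied revenue difference per year of tenure: {pct_per_year:.2f}%")
print(f"implied revenue difference per ten years: {pct_per_decade:.1f}%")
print(f"approximate 95% CI for the slope: ({ci_low:.4f}, {ci_high:.4f})")
```

The approximate interval includes zero, which agrees with the p-value in the table: the data are compatible with no linear association between tenure and log revenue.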

Conclusion

Overall, the descriptive analysis shows that revenue varies widely across Fortune 100 firms and is strongly right skewed, which makes the log transformation useful for interpretation. Among the CEO characteristics considered here, none appears to separate firms into clearly distinct revenue groups.

The relationship plots show substantial overlap and considerable within-group variability, suggesting that background traits such as athletic status, Ivy League education, and undergraduate major are not strongly associated with revenue on their own. Tenure shows the clearest positive pattern, but even that relationship is modest and, in the simple regression, not statistically distinguishable from zero (p-value of about 0.27). Because the data are observational and limited to Fortune 100 CEOs, these findings should be interpreted as descriptive associations rather than causal effects.