This project examines whether observable CEO background characteristics, such as tenure, athletic background, gender, Ivy League education, and academic major, are associated with differences in firm revenue among Fortune 100 companies. The response variable is company revenue, measured as revenue_m. Because revenue is strongly right skewed, I also use log_revenue in plots and summaries.
The dataset contains 100 Fortune 100 CEOs and the companies they lead, so each row represents one CEO-company pair. The population of interest is CEOs of Fortune 100 firms in 2024.
Building the dataset was a tedious manual process because the information was not available in one place. To collect undergraduate school, postgraduate school, academic major, and tenure, I first searched each CEO on LinkedIn. When that information was not available there, I used the executive biography or leadership page on the company website. Information on collegiate sports participation and revenue came from a dataset published by Psychology Today. Because this is a targeted dataset rather than a random sample, the results should be interpreted as describing Fortune 100 CEOs rather than all corporate executives.
Before beginning the analysis, I checked that the data were in a
rectangular format with one row per CEO and one column per variable. I
verified that the variables had the expected types, converted the
indicator variables into factors with readable labels, and created
log_revenue to better handle the skewness in company
revenue.
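The preparation steps described above can be sketched in Python as follows. This is an illustrative sketch rather than the code used for the report: the 0/1 encoding of the raw indicator and the helper name `prepare_row` are assumptions, while the two revenue values are taken from the data table shown in this report.

```python
import math

# Assumed raw encoding: the report does not say how the indicators were stored
# before relabeling, so the 0/1 codes here are hypothetical.
ATHLETE_LABELS = {0: "Non-Athlete", 1: "Athlete"}


def prepare_row(row):
    """Return a copy of `row` with a readable label and a log_revenue field."""
    out = dict(row)
    out["athlete_indicator"] = ATHLETE_LABELS[row["athlete_indicator"]]
    # Natural log of revenue (in millions of dollars) to reduce right skew.
    out["log_revenue"] = math.log(row["revenue_m"])
    return out


raw_rows = [
    {"company": "Walmart", "revenue_m": 648125, "athlete_indicator": 1},
    {"company": "Berkshire Hathaway", "revenue_m": 364482, "athlete_indicator": 0},
]

prepared = [prepare_row(r) for r in raw_rows]
for r in prepared:
    print(r["company"], r["athlete_indicator"], round(r["log_revenue"], 5))
```

The computed logs should agree with the log_revenue column in the data table up to rounding.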
I then checked for missing values and unusual features. The only
variables with missing values were postgrad_college and
postgrad_major_category. In these cases, the missing values
reflected CEOs who did not attend postgraduate school, so I replaced
those values with "None" to make the dataset easier to
interpret. The quantitative variables, especially revenue and tenure,
include some large values, but these appear to be real observations
rather than obvious errors, so they were kept in the analysis.
After replacing the missing postgraduate values with "None", no missing values remain in the dataset.
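A minimal sketch of this recoding step is shown below. It assumes the raw missing entries are represented as Python `None`; the helper names `fill_postgrad` and `count_missing` are hypothetical, and the two rows are abbreviated versions of rows from the data table.

```python
# Assumption: a missing postgraduate entry is stored as Python None in the raw data.
rows = [
    {"ceo_name": "Tim Cook", "postgrad_college": "Duke University",
     "postgrad_major_category": "Business"},
    {"ceo_name": "Andrew Witty", "postgrad_college": None,
     "postgrad_major_category": None},
]

POSTGRAD_COLUMNS = ("postgrad_college", "postgrad_major_category")


def fill_postgrad(row):
    """Return a copy of `row` with missing postgraduate fields set to "None"."""
    out = dict(row)
    for col in POSTGRAD_COLUMNS:
        if out[col] is None:
            out[col] = "None"
    return out


def count_missing(table):
    """Count cells that are still missing across all rows."""
    return sum(value is None for row in table for value in row.values())


cleaned = [fill_postgrad(r) for r in rows]
print(count_missing(rows), count_missing(cleaned))
```

Using the explicit label "None" keeps CEOs without a postgraduate degree as their own category instead of dropping them from grouped summaries.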
The first several rows of the data frame are shown below. There are 100 total observations in the dataset.
| ceo_name | female_indicator | athlete_indicator | undergrad_college | undergrad_major_category | postgrad_college | postgrad_major_category | ivy_undergrad | ivy_postgrad | company | revenue_m | fortune_rank | tenure | log_revenue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Douglas McMillon | Male | Athlete | University of Arkansas | Business | University of Tulsa | Business | Non-Ivy | Non-Ivy | Walmart | 648125 | 1 | 10 | 13.38184 |
| Andrew Jassy | Male | Athlete | Harvard University | Unknown | Harvard Business School | Business | Ivy | Ivy | Amazon | 574785 | 2 | 3 | 13.26175 |
| Tim Cook | Male | Athlete | Auburn University | Engineering | Duke University | Business | Non-Ivy | Non-Ivy | Apple | 383285 | 3 | 13 | 12.85653 |
| Andrew Witty | Male | Athlete | University of Nottingham | Economics | None | None | Non-Ivy | Non-Ivy | UnitedHealth Group | 371622 | 4 | 4 | 12.82563 |
| Warren Buffett | Male | Non-Athlete | University of Nebraska | Business | Columbia University | Economics | Non-Ivy | Ivy | Berkshire Hathaway | 364482 | 5 | 54 | 12.80623 |
| Karen Lynch | Female | Athlete | Boston College | Business | Boston University | Business | Non-Ivy | Non-Ivy | CVS Health | 357776 | 6 | 3 | 12.78766 |
| Darren Woods | Male | Athlete | Texas A&M University | Engineering | Northwestern University | Business | Non-Ivy | Non-Ivy | Exxon Mobil | 344582 | 7 | 7 | 12.75009 |
| Sundar Pichai | Male | Athlete | Indian Institute of Technology Kharagpur | Engineering | Stanford University | Mixed | Non-Ivy | Ivy | Alphabet | 307394 | 8 | 9 | 12.63589 |
Each row represents one Fortune 100 CEO. The main columns used in
this analysis are revenue_m, company revenue in millions of
dollars; log_revenue, the natural log of revenue;
tenure, years as CEO; female_indicator, CEO
gender; athlete_indicator, whether the CEO has an athletic
background; ivy_undergrad, whether the CEO attended an Ivy
League college for undergraduate study; ivy_postgrad,
whether the CEO attended an Ivy League institution for postgraduate
study; undergrad_major_category, broad undergraduate major
category; and postgrad_major_category, broad postgraduate
major category.
Because this dataset contains several variables, it is important to be selective rather than show every possible graph or summary. I focus on the variables that are most relevant to the research question: company revenue, log revenue, tenure, gender, athletic background, Ivy League undergraduate education, and undergraduate major category. This keeps the analysis coherent and highlights the variables that are most informative for understanding possible differences in firm success.
For quantitative variables, the main features I focus on are center, spread, skewness, and possible outliers. For categorical variables, I focus on counts and whether the distribution is balanced or concentrated in only a few groups. I also summarize revenue carefully because it is the main measure of firm success in this project.
| n | mean_revenue | median_revenue | sd_revenue | min_revenue | max_revenue |
|---|---|---|---|---|---|
| 100 | 122435.7 | 80296 | 107869.7 | 43452 | 648125 |
Revenue is strongly right skewed, with a few companies having much larger revenues than the rest. The mean (about $122.4 billion) is noticeably larger than the median (about $80.3 billion), which is another sign of right skewness. The histogram shows that most firms are grouped at lower revenue values, while a small number of very large companies create a long right tail.
| mean_log_revenue | median_log_revenue | sd_log_revenue | min_log_revenue | max_log_revenue |
|---|---|---|---|---|
| 11.47 | 11.29 | 0.65 | 10.68 | 13.38 |
Taking the log of revenue makes the distribution more symmetric and easier to analyze. The extreme right tail seen in the original revenue variable is compressed, so the histogram shows a more balanced shape, although the mean (11.47) still sits above the median (11.29), so some right skew remains. This transformation makes comparisons across groups clearer and gives a better overall summary of how revenue is distributed.
| mean_tenure | median_tenure | sd_tenure | min_tenure | max_tenure |
|---|---|---|---|---|
| 7.85 | 5.5 | 7.75 | 1 | 54 |
Tenure is also right skewed. Most CEOs have relatively short tenures, while a smaller number have been in the role for many years. The histogram shows a clear concentration at lower values and a few larger observations in the tail, which suggests that long serving CEOs are less common in this group.
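One way to cross-check the skewness described for revenue, log revenue, and tenure is Pearson's second skewness coefficient, 3 × (mean − median) / sd, computed from the summary tables above. The sketch below is an illustrative addition under that assumption: the report itself relies on histograms, the function name `pearson_second_skewness` is hypothetical, and because the inputs are rounded summary values the results are approximate.

```python
def pearson_second_skewness(mean, median, sd):
    """Pearson's second skewness coefficient: 3 * (mean - median) / sd."""
    return 3 * (mean - median) / sd


# Summary statistics copied from the three summary tables in this report.
summaries = {
    "revenue_m": {"mean": 122435.7, "median": 80296, "sd": 107869.7},
    "log_revenue": {"mean": 11.47, "median": 11.29, "sd": 0.65},
    "tenure": {"mean": 7.85, "median": 5.5, "sd": 7.75},
}

skew = {
    name: pearson_second_skewness(s["mean"], s["median"], s["sd"])
    for name, s in summaries.items()
}

for name, value in skew.items():
    print(f"{name}: {value:.2f}")
```

Positive values point to right skew, and values closer to zero point to a more symmetric distribution. By this rough measure the log transformation reduces the skew in revenue without removing it entirely.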
| female_indicator | n |
|---|---|
| Male | 89 |
| Female | 11 |
The gender distribution is highly unbalanced, with men making up the large majority of CEOs in the dataset. This is useful context for the rest of the analysis because it shows that some background characteristics are much more common than others among Fortune 100 CEOs. It also means any comparisons involving gender should be interpreted carefully because the groups are very uneven in size.
| athlete_indicator | n |
|---|---|
| Non-Athlete | 32 |
| Athlete | 68 |
This bar chart shows that most CEOs in the dataset are classified as athletes (68 of 100), while a substantial minority (32) are classified as non-athletes. Since the categories are not evenly split, comparisons across these groups should be interpreted carefully. Still, the plot is useful because it quickly shows how common athletic participation is among Fortune 100 CEOs.
| ivy_undergrad | n |
|---|---|
| Non-Ivy | 86 |
| Ivy | 14 |
Most CEOs in the dataset did not attend an Ivy League school for their undergraduate degree. The distribution is fairly unbalanced, with the non-Ivy group much larger than the Ivy group. This suggests that while Ivy League education may be notable, it is not the most common undergraduate path among these CEOs.
| undergrad_major_category | n |
|---|---|
| Business | 44 |
| Economics | 5 |
| Engineering | 22 |
| Humanities | 9 |
| Mixed | 6 |
| Science | 9 |
| Unknown | 5 |
The distribution of undergraduate majors is concentrated in a few categories, especially business and engineering, which together account for 66 of the 100 CEOs. Other categories appear less often, which suggests that some academic backgrounds are much more common than others among Fortune 100 CEOs. The bar chart makes the imbalance across categories easy to see.
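The concentration across major categories can be quantified directly from the counts in the table above. The short sketch below is illustrative only; the counts come from the table, while the variable names are my own.

```python
# Counts copied from the undergraduate major table.
major_counts = {
    "Business": 44,
    "Economics": 5,
    "Engineering": 22,
    "Humanities": 9,
    "Mixed": 6,
    "Science": 9,
    "Unknown": 5,
}

total = sum(major_counts.values())

# Share of CEOs in each category, from largest to smallest.
ranked = sorted(major_counts.items(), key=lambda item: item[1], reverse=True)
shares = {category: count / total for category, count in ranked}

# Combined share of the two most common categories.
top_two_share = sum(count for _, count in ranked[:2]) / total

for category, share in shares.items():
    print(f"{category}: {share:.0%}")
print(f"Top two categories: {top_two_share:.0%}")
```

Reporting shares alongside counts makes it easier to see that the smaller categories each contain fewer than ten CEOs, so group comparisons involving them rest on very little data.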
To understand how CEO background characteristics relate to firm success, I focus on a small number of relationship plots rather than plotting every possible pair of variables. These plots were chosen because they connect directly to the research question and compare the response variable, log revenue, to several meaningful CEO characteristics. The scatterplot is useful for looking at the association between two quantitative variables, while the boxplots are useful for comparing a quantitative outcome across categorical groups.
I chose this scatterplot because both tenure and revenue are central to the research question, and it allows me to examine whether more years in the CEO role are associated with differences in firm revenue. The plot suggests a slight positive relationship, meaning firms led by longer tenured CEOs may have somewhat higher revenues on average. However, the points are fairly spread out, so the association appears weak rather than strong.
I included this boxplot because athletic background was one of the main CEO characteristics collected in the dataset. The distributions overlap heavily, which suggests that firms led by former athletes and non-athletes have fairly similar revenue patterns. While there may be some difference in center, the plot does not show a large or obvious separation between the two groups.
This boxplot was chosen to examine whether CEOs with Ivy League undergraduate education tend to lead higher revenue firms. The two groups look fairly similar, with substantial overlap in both spread and center. This suggests that Ivy League undergraduate education does not have a strong visible relationship with revenue in this dataset.
I also included undergraduate major because it is another meaningful background characteristic that varies across CEOs. Some categories appear to have slightly different centers, but there is still a large amount of overlap, and no category stands out as dramatically different from the others. This suggests that undergraduate major may not be strongly related to firm revenue on its own.
As a final descriptive check, I fit a simple linear regression using
log_revenue as the response and tenure as the
predictor. I include this model only to summarize the same pattern shown
in the scatterplot above. Since this project is primarily exploratory, I
use the regression table as a compact numerical summary rather than as
the main focus of the report.
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 11.3945 | 0.0927 | 122.9529 | 0.0000 |
| tenure | 0.0094 | 0.0084 | 1.1118 | 0.2689 |
The estimated slope for tenure is positive (0.0094), which matches the slight upward trend in the scatterplot. Because the response is log revenue, this corresponds to roughly 0.9% higher revenue per additional year of tenure. The estimate is small relative to its standard error (p = 0.27), and the scatterplot shows substantial variability around the trend. Taken together, the visual and numerical evidence suggest at most a weak positive association between tenure and revenue.
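The arithmetic behind reading this regression table can be sketched as follows. The estimate and standard error are the rounded values from the table; the critical value 1.984 is an assumed approximation to the 97.5th percentile of a t distribution with 98 degrees of freedom (100 observations minus 2 coefficients), so the interval is approximate and is not a result reported by the original analysis.

```python
import math

# Rounded values from the tenure row of the regression table.
estimate = 0.0094
std_error = 0.0084

# t statistic: estimate divided by its standard error. The table reports 1.1118
# from unrounded values, so this rounded version differs slightly.
t_stat = estimate / std_error

# On a log scale, a slope b corresponds to a (exp(b) - 1) * 100 percent change
# in revenue per one-year increase in tenure.
pct_change_per_year = (math.exp(estimate) - 1) * 100

# Assumed approximate critical value for a 95% interval with 98 degrees of freedom.
T_CRIT = 1.984
ci_low = estimate - T_CRIT * std_error
ci_high = estimate + T_CRIT * std_error

print(f"t statistic: {t_stat:.3f}")
print(f"Percent change per year: {pct_change_per_year:.2f}%")
print(f"Approximate 95% interval for the slope: ({ci_low:.4f}, {ci_high:.4f})")
```

An interval that spans zero is consistent with the p-value in the table: the data do not rule out a slope of zero.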
Overall, the descriptive analysis shows that revenue varies widely across Fortune 100 firms and is strongly right skewed, which makes the log transformation useful for interpretation. Among the CEO characteristics considered here, none appears to separate firms into clearly distinct revenue groups.
The relationship plots show substantial overlap and considerable within-group variability, suggesting that background traits such as athletic status, Ivy League education, and undergraduate major are not strongly associated with revenue on their own. Tenure shows the clearest positive pattern, but even that relationship is modest and is not statistically distinguishable from zero in the simple regression (p = 0.27). Because the data are observational and limited to Fortune 100 CEOs, these findings should be interpreted as descriptive associations rather than causal effects.