This is a dataset of CEO salaries, education, tenure, and industries. The data is available from this link: https://asayanalytics.com/ceo_salaries-csv.
Importing Data
We are loading the data directly from the host website.
## Parsed with column specification:
## cols(
## .default = col_double(),
## WideIndustry = col_character(),
## Company = col_character(),
## CEO = col_character(),
## CityofBirth = col_character(),
## StateofBirth = col_character(),
## Undergrad = col_character(),
## UGDegree = col_character(),
## Graduate = col_character(),
## GradDegree = col_character(),
## Bonus = col_character(),
## Industry = col_character()
## )
## See spec(...) for full column specifications.
Cleaning Data
We counted the missing values in each column and created a new data frame that excludes the 10 observations (of 800 total) with no total compensation value.
##      TotalComp   WideIndustry        Company            CEO    CityofBirth
##             10              0              0              0              7
##   StateofBirth            Age      Undergrad       UGDegree         UGDate
##             56              0             82             84             82
##     AgeOfUnder       Graduate     GradDegree           MBA?     MasterPhd?
##             82            396              0              0              0
##         G_date     AgeOfGradu      YearsFirm       YearsCEO         Salary
##            396            396              0              0             11
##          Bonus          Other        StGains    Compfor5Yrs     StockOwned
##            131             54            500            176              0
##          Sales        Profits ReturnOver5Yrs       Industry   IndustryCode
##              0              4             22              0              0
## [1] NA
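The cleaning step described above can be sketched in base R as follows. This is a minimal illustration, not the code used for this report: the data frame `ceo` is a made-up stand-in with only three of the real column names, and its values are invented.

```r
# Toy stand-in for the CEO data. Column names follow the report;
# the values are invented for illustration only.
ceo <- data.frame(
  TotalComp = c(1200000, NA, 850000, 3100000),
  Age       = c(61, 55, NA, 58),
  YearsCEO  = c(10, 3, 7, 12)
)

# Count the missing values in every column.
na_counts <- colSums(is.na(ceo))
print(na_counts)

# Keep only the rows that have a total compensation value.
ceo_clean <- ceo[!is.na(ceo$TotalComp), ]

# Summary functions return NA when any value is missing,
# unless the rows are dropped first or na.rm = TRUE is set.
mean(ceo$TotalComp)
mean(ceo_clean$TotalComp)
```

Dropping rows only on `TotalComp` leaves missing values in other columns in place, so later summaries of those columns would still need `na.rm = TRUE`.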
Dummy Variables
We added three interesting dummy variables to the data frame.
1. We categorized every CEO as “Young!” if they were under 58 years old (the average CEO age) or “Old :(” if at or above 58 years old.
2. We added a column ranking CEOs by their total compensation. The most highly compensated CEO is Mr. Eisner of Disney.
3. We categorized CEOs as “Bad” if their ‘Return over 5 Yrs’ was negative and “Good” if their return was positive.
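One way to build these three columns in base R is sketched below. The new column names (`AgeGroup`, `CompRank`, `Performance`) and the toy values are assumptions made for illustration. The report does not say how a return of exactly zero was labeled, so this sketch arbitrarily treats it as “Good”.

```r
# Toy stand-in for the CEO data; values are invented.
ceo <- data.frame(
  CEO            = c("A", "B", "C", "D"),
  Age            = c(52, 58, 63, 57),
  TotalComp      = c(900000, 2500000, 1200000, 4000000),
  ReturnOver5Yrs = c(12.5, -3.0, 8.1, -0.4)
)

# 1. Age category: under 58 is "Young!", 58 and over is "Old :(".
ceo$AgeGroup <- ifelse(ceo$Age < 58, "Young!", "Old :(")

# 2. Compensation rank: negate so that rank 1 is the highest paid.
ceo$CompRank <- rank(-ceo$TotalComp, ties.method = "min")

# 3. Performance: negative five-year return is "Bad", otherwise "Good"
#    (assumption: a return of exactly zero counts as "Good").
ceo$Performance <- ifelse(ceo$ReturnOver5Yrs < 0, "Bad", "Good")

print(ceo)
```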
Summary Statistics
These are some summary statistics for the variables I think are most interesting.
1. The mean total compensation is $2,818,743, with a minimum of $28,816 and a maximum of $203,020,000.
2. About 26% of CEOs have an MBA.
3. The average time at the firm is 22 years and the average time as CEO is 8 years.
4. The table output below shows the mean, max, min, and standard deviation of total compensation by industry.
## # A tibble: 19 x 5
##    WideIndustry       meansal    maxsal minsal  StDevSal
##    <chr>                <dbl>     <dbl>  <dbl>     <dbl>
##  1 Aerospacedefense  5325846.  33689900 600300  7992740.
##  2 Business          2923050.  23136400 341057  4547738.
##  3 Capital goods     1832233.   9093260 530000  1882191.
##  4 Chemicals         1946326.   6680930 536794  1338473.
##  5 ComputersComm     2793552   12837900  52000  2847982.
##  6 Construction      3136825.  10568400 454497  3103086.
##  7 Consumer          2960651.  14176500 401124  3152536.
##  8 Energy            2046005.   7782330 311792  1761505.
##  9 Entertainment     3781097.  16755600 250000  4631554.
## 10 Financial         2229594.  53110900  28816  4684130.
## 11 Food              2740661.  20658300 325360  4162935.
## 12 Forest            1800940.  12203300 428299  2590824.
## 13 Health            2686604.  32582300 281063  4692687.
## 14 Insurance         3334761.  38675400 100000  5902776.
## 15 Metals            1729998.   6972760 416667  1631993.
## 16 Retailing         3301795.  32228000 423278  5625890.
## 17 Transport         2750049.   6333720 420079  2199752.
## 18 Travel           17331232. 203020000 526881 52233080.
## 19 Utility            812676.   2239950 280396   390729.
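A per-industry table like the one above can be computed by grouping on `WideIndustry` and summarising `TotalComp`. The tibble output suggests the original used the tidyverse; the sketch below uses base R `aggregate()` instead so that it runs without extra packages, and the two-industry data frame is invented.

```r
# Toy stand-in for the cleaned CEO data; values are invented.
ceo <- data.frame(
  WideIndustry = c("Energy", "Energy", "Utility", "Utility", "Utility"),
  TotalComp    = c(1000000, 3000000, 500000, 700000, 900000)
)

# Compute four statistics per industry. aggregate() returns them as a
# matrix column, so do.call(data.frame, ...) flattens it into plain columns.
stats <- aggregate(
  TotalComp ~ WideIndustry,
  data = ceo,
  FUN = function(x) c(meansal = mean(x), maxsal = max(x),
                      minsal = min(x), StDevSal = sd(x))
)
stats <- do.call(data.frame, stats)

# Strip the "TotalComp." prefix so the names match the report's table.
names(stats) <- sub("^TotalComp\\.", "", names(stats))

print(stats)
```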
Summary Visualizations
We were interested in several exploratory data visualizations.
1. Histogram of CEO Salaries. The distribution is roughly bell-shaped, with a few outliers earning very large salaries.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Question 1
Question: How does compensation differ for CEOs with master's degrees, and how does it relate to when they graduated? This is interesting because the value of a college education is a perennial question, so we'll see what these CEOs' records have to say about it.
Approach: As part of my data discovery, I want to look at how many CEOs have a master's degree, how long they've been a CEO, and what their compensation is.
Visualization: This shows that 24% of CEOs have a master's degree or PhD other than an MBA.
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Results: I think this visualization is interesting because it shows that compensation varies most widely for the oldest and most recent graduates, while those in the middle are much more tightly clustered.
Statistical Test: An interesting statistical test would be a correlation test, to show how graduation year (or years since graduation) correlates with compensation.
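The proposed correlation test could look roughly like this in base R. It is a sketch on invented numbers: `G_date` is assumed here to hold a two-digit graduation year, which the report does not confirm.

```r
# Toy stand-in; G_date is assumed to be a two-digit graduation year
# and all values are invented.
ceo <- data.frame(
  G_date    = c(58, 61, 64, 66, 70, 73, 77, 81),
  TotalComp = c(4100000, 1200000, 1900000, 2200000,
                1600000, 2400000, 900000, 3800000)
)

# Pearson correlation with a significance test.
# cor.test() drops incomplete pairs automatically.
ct <- cor.test(ceo$G_date, ceo$TotalComp, method = "pearson")

ct$estimate  # correlation coefficient
ct$p.value   # p-value for the null hypothesis of zero correlation
```

Because compensation is heavily skewed by a few very large values, `method = "spearman"` (rank correlation) may be a more robust choice than Pearson. A U-shaped pattern like the one described in the results would also show up as a weak linear correlation even if the relationship is real.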
Question 2
Question: Does average CEO tenure differ by industry? Which industries tend to retain their CEOs longest? Shortest?
Approach: Calculate the average tenure of CEOs by industry in a table and show it in a bar graph.
Visualization and Table:
## # A tibble: 19 x 2
##    WideIndustry     YearsCEO
##    <chr>               <dbl>
##  1 Aerospacedefense     5.67
##  2 Business            13.6
##  3 Capital goods        7.68
##  4 Chemicals            6.16
##  5 ComputersComm        7.09
##  6 Construction         6.45
##  7 Consumer             9.68
##  8 Energy               6.29
##  9 Entertainment        9.46
## 10 Financial            8.42
## 11 Food                 8.24
## 12 Forest               9.16
## 13 Health               7.60
## 14 Insurance            9
## 15 Metals               6.56
## 16 Retailing           11.8
## 17 Transport            5.73
## 18 Travel               9.33
## 19 Utility              5.46
Results: The analysis consists of a bar chart and a table of the average tenure by industry. Both confirm that the Business and Retailing industries have the longest average CEO tenure. Utility and Aerospace have the shortest average CEO tenure. All average tenures are greater than 5 years and less than 14 years.
Statistical Test: Use an ANOVA test to analyze the difference in means. I would expect Business and Retailing to differ significantly from the other industries, but no statistically significant difference among the remaining 17 industries.
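A minimal sketch of that ANOVA in base R follows. The three-industry data frame is invented, so its p-values say nothing about the real data. A one-way ANOVA only tests whether any industry means differ; a post-hoc comparison such as Tukey's HSD, added here as one possible choice, is needed to say which pairs differ.

```r
# Toy stand-in with three industries; tenure values are invented.
ceo <- data.frame(
  WideIndustry = rep(c("Business", "Retailing", "Utility"), each = 5),
  YearsCEO     = c(12, 15, 14, 13, 16,
                   11, 12, 13, 10, 12,
                    5,  6,  4,  7,  5)
)

# Make the grouping variable an explicit factor.
ceo$WideIndustry <- factor(ceo$WideIndustry)

# One-way ANOVA: does mean tenure differ across industries?
fit <- aov(YearsCEO ~ WideIndustry, data = ceo)
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]
print(summary(fit))

# Post-hoc pairwise comparisons with Tukey's HSD.
tukey <- TukeyHSD(fit)
print(tukey)
```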
Question 3
Question: Does the number of years as a CEO affect how much stock they own within their company?
Approach: I intend to include the CEO names, the years as CEO, and the stock owned. I will group by the stock owned and make a jitter plot (geom_jitter) to visualize the data.
I hypothesize that the longer someone has been with the company, the larger the percentage of that company's stock they will own.
Visualization:
Results: The results support my hypothesis. Generally, more years in the CEO position mean a larger percentage of stock owned. The graph shows this by displaying a higher percentage as years as CEO increase.
Statistical Tests: More statistical tests could be run on the CEOs in their first few years, because the data set is skewed toward CEOs early in their tenure. This could mean creating new variables for ranges of years as CEO. With that grouping, I expect to see less crowding of data points close to zero. There is an increase, just not a noticeable one yet.
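The year-range variable suggested above could be created with `cut()`. The sketch below uses invented values; the column name `TenureBand` and the break points (5, 10, and 20 years) are assumptions chosen only for illustration.

```r
# Toy stand-in; StockOwned is treated as a percentage and values are invented.
ceo <- data.frame(
  YearsCEO   = c(1, 2, 4, 6, 9, 12, 20, 30),
  StockOwned = c(0.01, 0.02, 0.05, 0.10, 0.30, 1.20, 4.00, 9.00)
)

# Bin years as CEO into ranges (hypothetical break points).
ceo$TenureBand <- cut(
  ceo$YearsCEO,
  breaks = c(0, 5, 10, 20, Inf),
  labels = c("0-5", "6-10", "11-20", "21+"),
  include.lowest = TRUE
)

# Number of CEOs in each band, then median stock ownership per band.
band_counts <- table(ceo$TenureBand)
band_median <- tapply(ceo$StockOwned, ceo$TenureBand, median)

print(band_counts)
print(band_median)
```

The median is used rather than the mean because a few CEOs with very large holdings would otherwise dominate each band.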