CEO salaries have drawn increasingly negative attention from shareholders and media outlets. The scope of this assignment is to identify which variables contribute to CEO salaries and bonuses. Each row is an observation of a corporate CEO, and each variable describes the CEO and/or the firm.
In my approach I estimated salary and bonus separately, since neither seemed to be an accurate predictor of the other. Adding the two estimates together is then the most natural way to predict total compensation.
For Salary, the interaction between Sales and Profits and the interaction between Age and YearsCEO have the most influence on the prediction. For Bonus, Compfor5Yrs has the largest standardized coefficient and explains the most.
##
## Call:
## lm(formula = Salary ~ Age * YearsFirm + Age * YearsCEO + YearsFirm *
## YearsCEO + StGains + Compfor5Yrs + poly(Sales, 3, raw = TRUE) *
## Profits, data = ceo_salary)
##
## Standardized Coefficients::
## (Intercept) Age
## 0.00000000 0.04579288
## YearsFirm YearsCEO
## 1.43602782 -1.66437129
## StGains Compfor5Yrs
## -0.23949606 0.26978291
## poly(Sales, 3, raw = TRUE)1 poly(Sales, 3, raw = TRUE)2
## 1.88665860 -4.84469295
## poly(Sales, 3, raw = TRUE)3 Profits
## 5.22425860 0.44663280
## Age:YearsFirm Age:YearsCEO
## -1.41876482 2.73956436
## YearsFirm:YearsCEO poly(Sales, 3, raw = TRUE)1:Profits
## -0.79927243 -1.36777581
## poly(Sales, 3, raw = TRUE)2:Profits poly(Sales, 3, raw = TRUE)3:Profits
## 4.15508416 -4.89859395
##
## Call:
## lm(formula = Bonus ~ YearsFirm + Other + Compfor5Yrs + Profits +
## ReturnOver5Yrs, data = ceo_salary)
##
## Standardized Coefficients::
## (Intercept) YearsFirm Other Compfor5Yrs Profits
## 0.0000000 0.1106899 0.1126141 0.2557700 0.1879157
## ReturnOver5Yrs
## 0.1634008
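The coefficients above appear to be standardized in the usual way: each raw slope b is rescaled by sd(x)/sd(y), which puts every predictor in standard-deviation units and makes the intercept 0. The sketch below shows that rescaling for a single predictor. The function names and the toy data are hypothetical and are not taken from ceo_salary.

```python
from statistics import mean, stdev

def ols_slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def standardized_slope(x, y):
    """Express the slope in standard-deviation units: b * sd(x) / sd(y)."""
    return ols_slope(x, y) * stdev(x) / stdev(y)

# Hypothetical toy data, NOT taken from ceo_salary:
# years as CEO and salary in $1000s for six imaginary CEOs.
years_ceo = [1, 3, 4, 7, 10, 12]
salary = [450, 520, 610, 700, 820, 900]

print(f"raw slope: {ols_slope(years_ceo, salary):.2f} ($1000s per year)")
print(f"standardized slope: {standardized_slope(years_ceo, salary):.3f}")
```

In the multiple-regression models above the raw slopes come from the joint fit rather than from a one-variable helper like this, but the rescaling by sd(x)/sd(y) is the same. Raw polynomial terms such as Sales, Sales² and Sales³ tend to be highly correlated with each other, so their large standardized coefficients of alternating sign are best read together rather than one at a time.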
The Salary model accounts for about 60% of the variance in Salary, while the Bonus model accounts for only about 19% of the variance in Bonus.
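"Variance accounted for" is the R² statistic, 1 − SS_res/SS_tot. Because the Salary model uses 15 terms and the Bonus model only 5, the adjusted R², which penalizes extra predictors, is also worth comparing. This is a minimal sketch with hypothetical function names and made-up values. It does not recompute the 60% and 19% figures.

```python
from statistics import mean

def r_squared(y, y_hat):
    """Share of the variance in y accounted for by the fitted values y_hat."""
    my = mean(y)
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))  # residual sum of squares
    ss_tot = sum((obs - my) ** 2 for obs in y)                     # total sum of squares
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Penalize R^2 for p predictors fitted on n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical observed and fitted values, not from ceo_salary
observed = [2, 4, 6, 8]
fitted = [3, 3, 7, 7]
r2 = r_squared(observed, fitted)
print(f"R^2 = {r2:.2f}")                                     # R^2 = 0.80
print(f"adjusted R^2 = {adjusted_r_squared(r2, 4, 1):.2f}")  # adjusted R^2 = 0.70
```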
Based on the Normal Q-Q plots, both models' residuals depart from the reference line above +1.5 standard deviations, where the models overfit. The departure is much larger for Bonus, consistent with its low explanatory power. The Salary model also underfits slightly below -2 standard deviations.
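A Q-Q plot pairs the sorted residuals with the quantiles a normal distribution would produce, so a tail departure can be measured as the gap from the y = x line. The sketch below assumes simple z-scored residuals rather than the leverage-adjusted standardized residuals that R's plot() uses for lm objects. It borrows the plotting positions of R's ppoints(). Function names and residual values are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def qq_points(residuals):
    """Pair normal theoretical quantiles with sorted, z-scored residuals."""
    n = len(residuals)
    a = 3 / 8 if n <= 10 else 1 / 2            # plotting-position offset, as in R's ppoints()
    probs = [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]
    theoretical = [NormalDist().inv_cdf(p) for p in probs]
    m, s = mean(residuals), stdev(residuals)
    sample = sorted((r - m) / s for r in residuals)
    return list(zip(theoretical, sample))

def tail_gaps(points, cutoff):
    """For points at or beyond +/-cutoff on the theoretical axis, return (theoretical, gap from y = x)."""
    return [(t, s - t) for t, s in points if abs(t) >= cutoff]

# Hypothetical residuals with one large positive value, not from ceo_salary
residuals = [-2.0, -1.0, 0.0, 1.0, 2.0, 10.0]
for t, gap in tail_gaps(qq_points(residuals), cutoff=1.0):
    print(f"theoretical {t:+.2f}: sample sits {gap:+.2f} SD from the line")
```

A positive gap at the upper end means the largest residuals are bigger than a normal distribution would predict, which is the heavy right tail described for the Bonus model.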
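Homoskedasticity holds for the dataset, and transforming the independent variables with polynomial and interaction terms made the data closer to normally distributed. I also checked both models for autocorrelation with a Durbin-Watson test. The first test (D-W = 1.97, p = 0.83) shows no evidence of autocorrelation, while the second (D-W = 1.77, p = 0.032) suggests mild positive autocorrelation at the 5% level.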
## lag Autocorrelation D-W Statistic p-value
## 1 -0.002822324 1.972921 0.83
## Alternative hypothesis: rho != 0
## lag Autocorrelation D-W Statistic p-value
## 1 0.1158413 1.766219 0.032
## Alternative hypothesis: rho != 0
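The output above appears to come from durbinWatsonTest() in R's car package. The Durbin-Watson statistic is d = Σ(e_t − e_{t−1})² / Σe_t², and the reported lag-1 autocorrelation is Σe_t·e_{t−1} / Σe_t². For long series, d ≈ 2(1 − r₁), which is why values near 2 indicate no autocorrelation. As far as we know, car estimates the p-value by resampling by default, so this sketch reports only the two statistics. The function name and residual series are hypothetical.

```python
def durbin_watson(residuals):
    """Lag-1 autocorrelation and Durbin-Watson statistic for residuals in row order."""
    den = sum(e * e for e in residuals)
    pairs = list(zip(residuals[1:], residuals[:-1]))      # (e_t, e_{t-1})
    dw = sum((cur - prev) ** 2 for cur, prev in pairs) / den
    r1 = sum(cur * prev for cur, prev in pairs) / den
    return r1, dw

# Hypothetical residual series, not from ceo_salary
for name, res in [("alternating", [1, -1, 1, -1]), ("drifting", [1, 2, 2, 1])]:
    r1, dw = durbin_watson(res)
    print(f"{name}: autocorrelation {r1:+.3f}, D-W {dw:.3f}, 2*(1 - r1) = {2 * (1 - r1):.3f}")
```

Because the statistic compares each residual with the previous row, it is only meaningful when the row order carries information, for example when the data are sorted by firm or by date. For cross-sectional rows in arbitrary order, a significant result may reflect how the file happens to be sorted.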
When a variable has missing values, R's lm() by default drops every row that contains a missing value in any model variable, so those observations do not contribute to the estimates at all. As analysts, we should not accept this silently. Instead, we should correct the missing values or transform the data to address the issue. In addition to the transformations already performed, we could try applying log transformations to variables, or add dummy variables to the model that flag when a value is missing.
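One common form of the dummy-variable idea is the missing-indicator method. It fills each gap with a placeholder value and adds a 0/1 column marking which rows were filled. Both columns then enter the regression, so the row is kept and the flag's coefficient absorbs any systematic difference in the filled rows. The sketch below assumes mean imputation for the placeholder, which is a choice made here rather than one taken from the analysis. The function name and column values are hypothetical. In R the same flag can be built with is.na().

```python
from statistics import mean

def impute_with_indicator(values):
    """Replace None with the mean of observed values and return a 0/1 missing flag per row."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)                       # assumption: mean imputation as the fill value
    imputed = [fill if v is None else v for v in values]
    missing = [1 if v is None else 0 for v in values]
    return imputed, missing

# Hypothetical column with gaps, not from ceo_salary
stock_gains = [120.0, None, 80.0, None, 100.0]
imputed, missing = impute_with_indicator(stock_gains)
print(imputed)   # [120.0, 100.0, 80.0, 100.0, 100.0]
print(missing)   # [0, 1, 0, 1, 0]
```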
I could imagine a variable for prior role(s), since a COO, for example, may be more likely to be promoted to CEO. In addition, I would be interested in historical CEO compensation data for each company, as a company may have a limit on CEO compensation packages. Other useful variables would be whether the company is international, whether it is publicly traded, and where it is located.