I think salary and bonuses should be combined. My guess is that salary will always make up a small portion of total compensation, whereas bonuses of all kinds will constitute the majority. Based on this data set, I think we want to address the question “Why do CEOs make so much money, and how is that determined?” I don’t know if we can fully answer the first part of that question, but we can get close to an approximation of how their compensation is explained.

compmodel <- lm (log(total.comp) ~ log(Profits) + poly(Compfor5Yrs, degree = 3, raw = F)  + log(StockOwned)  + log(I(YearsCEO/YearsFirm)), data = comp)
summary(compmodel)
## 
## Call:
## lm(formula = log(total.comp) ~ log(Profits) + poly(Compfor5Yrs, 
##     degree = 3, raw = F) + log(StockOwned) + log(I(YearsCEO/YearsFirm)), 
##     data = comp)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.46975 -0.27369  0.00373  0.27835  2.19532 
## 
## Coefficients:
##                                          Estimate Std. Error t value
## (Intercept)                             13.105329   0.095969 136.558
## log(Profits)                             0.220313   0.019347  11.388
## poly(Compfor5Yrs, degree = 3, raw = F)1  5.147667   0.563554   9.134
## poly(Compfor5Yrs, degree = 3, raw = F)2 -4.671657   0.522454  -8.942
## poly(Compfor5Yrs, degree = 3, raw = F)3  2.687438   0.540177   4.975
## log(StockOwned)                          0.029915   0.008677   3.448
## log(I(YearsCEO/YearsFirm))               0.061986   0.021083   2.940
##                                         Pr(>|t|)    
## (Intercept)                              < 2e-16 ***
## log(Profits)                             < 2e-16 ***
## poly(Compfor5Yrs, degree = 3, raw = F)1  < 2e-16 ***
## poly(Compfor5Yrs, degree = 3, raw = F)2  < 2e-16 ***
## poly(Compfor5Yrs, degree = 3, raw = F)3 8.66e-07 ***
## log(StockOwned)                         0.000607 ***
## log(I(YearsCEO/YearsFirm))              0.003414 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.485 on 566 degrees of freedom
##   (59 observations deleted due to missingness)
## Multiple R-squared:  0.4703, Adjusted R-squared:  0.4647 
## F-statistic: 83.77 on 6 and 566 DF,  p-value: < 2.2e-16
vif(compmodel)
##                                            GVIF Df GVIF^(1/(2*Df))
## log(Profits)                           1.255702  1        1.120581
## poly(Compfor5Yrs, degree = 3, raw = F) 1.192303  3        1.029748
## log(StockOwned)                        1.158036  1        1.076121
## log(I(YearsCEO/YearsFirm))             1.118842  1        1.057753
dwtest(compmodel)
## 
##  Durbin-Watson test
## 
## data:  compmodel
## DW = 1.7658, p-value = 0.002348
## alternative hypothesis: true autocorrelation is greater than 0
crPlots(compmodel)

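Based on the linear model above, the variable that explains the most variation in CEO compensation appears to be profits: log(Profits) has the largest t-statistic of the predictors (11.39). The log-log coefficient of 0.22 says that a 1% increase in profits is associated with roughly a 0.22% increase in compensation, holding the other predictors constant.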
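To make the elasticity reading concrete, here is a small sketch that plugs the reported coefficient into the log-log relationship. The coefficient value is copied from the summary output above; the names `beta_profits` and `pct_change` are illustrative and not part of the original analysis.

```r
# Coefficient on log(Profits), copied from the summary output above
beta_profits <- 0.220313

# In a log-log model, multiplying Profits by (1 + p) multiplies the
# predicted compensation by (1 + p)^beta, holding the other terms fixed.
# pct_change() is a hypothetical helper that returns the change in percent.
pct_change <- function(p, beta) ((1 + p)^beta - 1) * 100

pct_change(0.01, beta_profits)  # 1% rise in profits: roughly 0.22
pct_change(0.10, beta_profits)  # 10% rise in profits
```

For small changes the result is close to the coefficient itself, which is why the coefficient is usually read directly as an elasticity. For larger changes the exact formula gives a slightly smaller number than multiplying the coefficient by the percentage.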

The best-fitting model explains about 47% of the variation in log total compensation (R-squared = 0.4703, adjusted R-squared = 0.4647).

R discards observations with missing data by default; here 59 observations were deleted. This is not ideal because we lose observations that could improve our model fit, so it is better to try to address the missing values whenever possible.
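One way to see where the dropped rows come from is to count missing values per column and compare the number of rows in the data with the number actually used in the fit. The sketch below uses a small made-up data frame (`toy`) in place of `comp`, and a reduced formula, so the numbers are purely illustrative.

```r
# Toy data standing in for `comp`; the values are invented for illustration
toy <- data.frame(
  total.comp = c(1.2e6, 8.5e5, 2.3e6, 9.9e5, 1.7e6, 3.1e6),
  Profits    = c(120, NA, 310, 45, 200, 520),
  StockOwned = c(0.5, 1.2, NA, 0.8, 2.1, 0.3)
)

# Missing values per column show which variables drive the loss of rows
na_counts <- colSums(is.na(toy))
na_counts

# Only rows with no missing values are kept by lm() under the default
# na.action (na.omit)
n_complete <- sum(complete.cases(toy))

fit <- lm(log(total.comp) ~ log(Profits) + log(StockOwned), data = toy)

nobs(fit)              # rows actually used in the fit
nrow(toy) - nobs(fit)  # rows dropped because of missing values
```

Running the same checks on `comp` with the variables from the model above would show which predictors account for the 59 deleted observations, which is a reasonable first step before deciding how to handle them.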

I think some other variables that are not accounted for would be whether or not the CEO is a founder, a measure of his relationship with the board of directors, whether or not he is the chairman of the board, how powerful he is in general terms (that’s pretty vague and hard to measure, but intuitively one would think that the CEO of J&J, P&G, JP Morgan, or FB has more power than the CEO of a middle-market manufacturing firm), whether or not the firm pays dividends, and the firm’s operating cash flows over the years.