Introduction

The consulting industry is known for its competitive compensation structures, fast-paced work environments, and demanding workloads. This study aims to explore salary trends and working conditions across leading consulting firms, with a particular focus on how factors such as firm classification (Big 3 vs. Big 5 vs. other firms), geographic location (Canada vs. the USA), and work hours influence overall earnings, including salaries and bonuses.
- Big 3 (MBB - McKinsey & Company, Boston Consulting Group, Bain & Company): Considered the most prestigious firms, these companies typically offer higher base salaries, larger performance-based bonuses, and a more rigorous workload.
- Big 5 (Deloitte, PwC, EY, KPMG, Accenture): While still competitive, these firms generally have slightly lower compensation packages than the Big 3 but may offer better work-life balance and career flexibility.

Installing Necessary Libraries

options(repos = c(CRAN = "https://cloud.r-project.org/"))
file.edit("~/.Rprofile")
install.packages("readxl")
## package 'readxl' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\aitha\AppData\Local\Temp\Rtmp4mcl2p\downloaded_packages
install.packages("ggplot")
install.packages("dplyr")
## package 'dplyr' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\aitha\AppData\Local\Temp\Rtmp4mcl2p\downloaded_packages
install.packages("tidyr")
## package 'tidyr' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\aitha\AppData\Local\Temp\Rtmp4mcl2p\downloaded_packages
install.packages("car")
## package 'car' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\aitha\AppData\Local\Temp\Rtmp4mcl2p\downloaded_packages
install.packages("scales")
## package 'scales' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\aitha\AppData\Local\Temp\Rtmp4mcl2p\downloaded_packages
install.packages("gridExtra")
## package 'gridExtra' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\aitha\AppData\Local\Temp\Rtmp4mcl2p\downloaded_packages
install.packages("corrplot")
## package 'corrplot' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\aitha\AppData\Local\Temp\Rtmp4mcl2p\downloaded_packages
library(readxl)
library(dplyr)
library(ggplot2)
library(tidyr)
library(car)
library(scales)
library(gridExtra)
library(corrplot)

Load the Dataset

file_path <- "C:/Users/aitha/Downloads/W38325-XLS-ENG (1).xlsx"  # Adjust the path if needed
df <- read_excel(file_path, sheet = "OfferData")

Clean column names (remove spaces)

colnames(df) <- trimws(colnames(df))

Display Column Names

colnames(df)
##  [1] "Base_salary"          "Bonus"                "Total_salary"        
##  [4] "Big_3"                "Big_5"                "Worked_hours_less_50"
##  [7] "Worked_hours_50_59"   "Worked_hours_60_69"   "Worked_hours_70+"    
## [10] "LOCATION"

Descriptive Statistics for Big 3 and Big 5 Firms

# Descriptive Statistics for Big 3 and Big 5 firms.

salary_stats <- df %>%
  mutate(Firm = case_when(
    Big_3 == 1 ~ "Big 3",
    Big_5 == 1 ~ "Big 5",
    TRUE ~ "Other"
  )) %>%
  filter(Firm %in% c("Big 3","Big 5")) %>%
  group_by(Firm) %>%
  summarise(
    count = n(),
    Mean_Salary = mean(Total_salary, na.rm = TRUE),
    Median_Salary = median(Total_salary, na.rm = TRUE),
    Std_Dev_Salary = sd(Total_salary, na.rm = TRUE),
    Min_Salary = min(Total_salary, na.rm = TRUE),
    Max_Salary = max(Total_salary, na.rm = TRUE)
  )

# Print results
print(salary_stats)
## # A tibble: 2 × 7
##   Firm  count Mean_Salary Median_Salary Std_Dev_Salary Min_Salary Max_Salary
##   <chr> <int>       <dbl>         <dbl>          <dbl>      <dbl>      <dbl>
## 1 Big 3   185     102614.         97000         61719.      6070      420000
## 2 Big 5   927      70399.         74700         37653.      4658.     210000

Salary Varaitions and Outliers in between the firms.

# Create a new column to classify firms
df <- df %>%
  mutate(Firm_Type = ifelse(Big_3 == 1, "Big 3", "Big 5"))

# Define colors for better visualization
big3_color <- "#1f77b4"  # Blue
big5_color <- "#ff7f0e"  # Orange

# Boxplot to compare salary distribution in Big 3 vs. Big 5
ggplot(df, aes(x = Firm_Type, y = Total_salary, fill = Firm_Type)) +
  geom_boxplot(alpha = 0.9, outlier.shape = 16, outlier.size = 2) +
  scale_y_continuous(labels = scales::dollar_format()) +  # Format salary axis
  scale_fill_manual(values = c("Big 3" = big3_color, "Big 5" = big5_color)) +
  labs(title = "Salary Variations & Outliers",
       x = "Firm Type",
       y = "Salary ($)") +
  theme_minimal()

KEY INSIGHTS FROM DESCRIPTIVE STATISTICS & BOXPLOT W.R.T FIRM TYPE :

  • Salary Levels and Distribution:

    • Big 3 firms offer significantly higher salaries than Big 5 firms, with a mean salary of $102,614 compared to $70,399.

    • The median salary also follows a similar trend: $97,000 (Big 3) vs. $74,700 (Big 5).

  • Salary Variation and Dispersion:

    • The standard deviation is higher for Big 3 ($61,719) compared to Big 5 ($37,653), indicating that salary distribution in Big 3 firms is more spread out..

    • The box plot visually confirms this as the Big 3 firms show a wider interquartile range (IQR) and longer whiskers.

  • Presence of outliers:

    • Both groups show outliers, but Big 3 firms have a few extremely high salaries exceeding $400,000, which significantly skews their distribution whereas Big 5 firms also have some outliers, but they are more concentrated around the $200,000 mark, indicating a narrower salary range.

Total Salary Statistics by Location.

salary_stats <- df %>%
  group_by(LOCATION) %>%
  summarise(
    Average_Salary = mean(Total_salary, na.rm = TRUE),
    Median_Salary = median(Total_salary, na.rm = TRUE),
    Salary_Std_Dev = sd(Total_salary, na.rm = TRUE),
    Number_of_Entries = n()
  )

# Print salary statistics
print(salary_stats)
## # A tibble: 2 × 5
##   LOCATION Average_Salary Median_Salary Salary_Std_Dev Number_of_Entries
##   <chr>             <dbl>         <dbl>          <dbl>             <int>
## 1 CANADA           71188.        75150          34868.               484
## 2 USA              79280.        80250.         50019.               628

Box Plot for Total Salary Distribution by Location.

ggplot(df, aes(x = LOCATION, y = Total_salary, fill = LOCATION)) +
  geom_boxplot() +
  theme_minimal() +
  labs(title = "Salary Distribution by Location",
       x = "Location",
       y = "Total Salary ($)") +  # Update Y-axis label
  scale_y_continuous(labels = dollar) 

KEY INSIGHTS FROM DESCRIPTIVE STATISTICS & BOXPLOT W.R.T LOCATION :

  • Salary Levels:

    • USA salaries are generally higher than Canada’s:

      • Mean Salary: USA ($79,280) vs. Canada ($71,188) → ~11% higher in the USA.

      • Median Salary: USA ($80,250) vs. Canada ($75,150) → USA remains higher, but the gap is narrower.

  • Salary Variation and Dispersion:

    • USA has a wider salary range, indicated by - Higher standard deviation ($50,019 vs. $34,868) → Salaries in the USA are more spread out.

    • The box plot shows a wider interquartile range (IQR) for the USA, meaning more variation in mid-range salaries.

    • Greater variability in the USA suggests wider salary opportunities but also larger income inequality, Canada’s salaries are more consistent, with fewer extreme outliers.

    • For professionals seeking higher earning potential, the USA may offer better opportunities, but it comes with greater salary fluctuations.

ANOVA Test: Does Location Affect Salary?

# Null Hypothesis (H₀): Salaries are the same across all locations.
# Alternative Hypothesis (H₁): Salaries are not the same across all locations (i.e., there are significant differences).

anova_result <- aov(Total_salary ~ LOCATION, data = df)
summary(anova_result)
##               Df    Sum Sq   Mean Sq F value  Pr(>F)   
## LOCATION       1 1.790e+10 1.790e+10   9.216 0.00246 **
## Residuals   1110 2.156e+12 1.942e+09                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

KEY ANOVA RESULTS:

  • Since p < 0.05, the result is statistically significant. This means there is strong evidence that salary differences between the USA and Canada are not due to random chance.

  • We reject the null hypothesis, meaning that the average salary in Canada and the USA is significantly different.

Tukey’s HSD Test

# Perform Tukey's HSD test
tukey_result <- TukeyHSD(anova_result)

# Print the results
print(tukey_result)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Total_salary ~ LOCATION, data = df)
## 
## $LOCATION
##                diff      lwr     upr     p adj
## USA-CANADA 8092.226 2861.955 13322.5 0.0024553
# Convert Tukey results to a data frame for better readability
tukey_df <- as.data.frame(tukey_result$LOCATION)

# Sort by adjusted p-value
tukey_df <- tukey_df[order(tukey_df$`p adj`), ]
print(tukey_df)
##                diff      lwr     upr       p adj
## USA-CANADA 8092.226 2861.955 13322.5 0.002455327

KEY TUKEY HSD RESULTS FOR DEEPER INSIGHTS:

  • There is strong statistical evidence that professionals in the USA earn significantly more than their Canadian counterparts.

  • The exact difference may vary between $2,862 and $13,322, but on average, it is around $8,092.

Correlation Matrix

# Select relevant numerical columns with backticks for special characters
cor_data <- df %>% dplyr::select(Base_salary, Bonus, Total_salary, Big_3, Big_5,
                                   `Worked_hours_less_50`, `Worked_hours_50_59`, 
                                   `Worked_hours_60_69`, `Worked_hours_70+`)


# Compute correlation matrix
cor_matrix <- cor(cor_data, use = "complete.obs")

# Plot correlation heatmap
corrplot(cor_matrix, method = "color", type = "upper",
         tl.col = "black", tl.cex = 0.8, addCoef.col = "black", number.cex = 0.7)

KEY OUTPUTS FROM THE CORRELATION HEATMAP:

  • Worked Hours & Salary:

    • Work hours do not show strong correlations with salary components.

    • This suggests that higher work hours do not directly translate into significantly higher salaries.

  • Firm Type vs. Work Hours:

    • Big 3 & Worked 60–69 Hours (0.34): Employees at Big 3 firms tend to work longer hours.

    • Big 5 & Worked 60–69 Hours (-0.34): Employees at Big 5 firms are less likely to work extended hours.

Work hours are not strongly correlated with salary, meaning compensation is not necessarily based on effort.

Three Way Annova:

data <- read_excel("C:/Users/aitha/Downloads/W38325-XLS-ENG (1).xlsx")

# Convert categorical variables to factors
data$LOCATION <- as.factor(data$LOCATION)

data$Worked_hours <- case_when(
  data$Worked_hours_less_50 == 1 ~ "<50",
  data$Worked_hours_50_59 == 1 ~ "50-59",
  data$Worked_hours_60_69 == 1 ~ "60-69",
  data$`Worked_hours_70+` == 1 ~ "70+"
)

data$Worked_hours <- as.factor(data$Worked_hours)

# Create a new categorical variable for Firm Type
data$Firm_Type <- case_when(
  data$Big_3 == 1 ~ "Big 3",
  data$Big_5 == 1 ~ "Big 5",
  TRUE ~ "Other"
)
data$Firm_Type <- factor(data$Firm_Type, levels = c("Big 3", "Big 5", "Other"))


# Load necessary library
library(car)  # For ANOVA Type III tests

# Run the Three-Way ANOVA
anova_model <- aov(Total_salary ~ Firm_Type * LOCATION * Worked_hours, data = data)

# Display the ANOVA table
summary(anova_model)
##                                   Df    Sum Sq   Mean Sq F value   Pr(>F)    
## Firm_Type                          1 1.601e+11 1.601e+11  92.393  < 2e-16 ***
## LOCATION                           1 1.468e+10 1.468e+10   8.471  0.00368 ** 
## Worked_hours                       3 7.226e+09 2.409e+09   1.390  0.24418    
## Firm_Type:LOCATION                 1 6.245e+10 6.245e+10  36.049 2.61e-09 ***
## Firm_Type:Worked_hours             3 1.628e+10 5.426e+09   3.132  0.02486 *  
## LOCATION:Worked_hours              3 6.194e+09 2.065e+09   1.192  0.31162    
## Firm_Type:LOCATION:Worked_hours    3 8.275e+09 2.758e+09   1.592  0.18955    
## Residuals                       1096 1.899e+12 1.732e+09                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

KEY OUTPUTS FROM ANOVA ANALYSIS

  • Significant Factors Affecting Salary:

    • Firm Type (F = 92.393, p < 2e-16): Highly significant effect on salary.

    • Location (F = 8.471, p = 0.00368): Statistically significant effect.

    • Firm Type × Location Interaction (F = 36.049, p = 2.61e-09):

      • Strong interaction effect between firm type and location.
      • Example: Big 3 firms in the USA may pay significantly more than Big 3 firms in Canada.
    • Firm Type × Worked Hours Interaction (F = 3.132, p = 0.02486):

      • Significant interaction effect between firm type and hours worked.
      • The salary impact of longer work hours depends on whether a person works at a Big 3 or Big 5 firm.
  • Non-Significant Factors Affecting Salary:

    • Worked Hours (F=1.390, p = 0.24418):

      • Not statistically significant in affecting salary alone.
      • This suggests simply working more hours does not necessarily lead to higher salaries.
    • Location × Worked Hours (F = 1.192, p = 0.31162):

      • No significant interaction; the effect of working more hours is similar in both the USA and Canada.
    • Firm Type × Location × Worked Hours (F = 1.592, p = 0.18955):

      • No significant three-way interaction.

CONCLUSION:

Based on the analysis, several key insights emerge regarding salary distribution, firm selection, location, and working hours. The findings indicate that employees in the USA tend to earn higher salaries than those in Canada, with a statistically significant difference in average salaries. Additionally, firm type plays a crucial role in compensation, as employees working in top-tier firms (Big 3 or Big 5) generally receive higher salaries and bonuses.

Furthermore, the correlation analysis suggests that total salary is positively influenced by base salary and bonuses, while working longer hours does not necessarily translate into higher earnings. Employees in top firms may need to work extended hours, but this does not always result in proportional salary increases.

Overall, the analysis suggests that employees should carefully evaluate job offers based on firm reputation, location, and expected working hours to maximize their earnings and work-life balance. Selecting a role in a prestigious firm with competitive compensation, in a location with higher salary prospects, is advisable for career growth and financial stability.