The consulting industry is known for its competitive compensation
structures, fast-paced work environments, and demanding workloads. This
study aims to explore salary trends and working conditions across
leading consulting firms, with a particular focus on how factors such as
firm classification (Big 3 vs. Big 5 vs. other firms), geographic
location (Canada vs. the USA), and work hours influence overall
earnings, including salaries and bonuses.
- Big 3 (MBB - McKinsey & Company, Boston Consulting Group,
Bain & Company): Considered the most prestigious firms,
these companies typically offer higher base salaries, larger
performance-based bonuses, and a more rigorous workload.
- Big 5 (Deloitte, PwC, EY, KPMG, Accenture): While
still competitive, these firms generally have slightly lower
compensation packages than the Big 3 but may offer better work-life
balance and career flexibility.
options(repos = c(CRAN = "https://cloud.r-project.org/"))
file.edit("~/.Rprofile")
install.packages("readxl")
## package 'readxl' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\aitha\AppData\Local\Temp\Rtmp4mcl2p\downloaded_packages
install.packages("ggplot")
install.packages("dplyr")
## package 'dplyr' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\aitha\AppData\Local\Temp\Rtmp4mcl2p\downloaded_packages
install.packages("tidyr")
## package 'tidyr' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\aitha\AppData\Local\Temp\Rtmp4mcl2p\downloaded_packages
install.packages("car")
## package 'car' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\aitha\AppData\Local\Temp\Rtmp4mcl2p\downloaded_packages
install.packages("scales")
## package 'scales' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\aitha\AppData\Local\Temp\Rtmp4mcl2p\downloaded_packages
install.packages("gridExtra")
## package 'gridExtra' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\aitha\AppData\Local\Temp\Rtmp4mcl2p\downloaded_packages
install.packages("corrplot")
## package 'corrplot' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\aitha\AppData\Local\Temp\Rtmp4mcl2p\downloaded_packages
library(readxl)
library(dplyr)
library(ggplot2)
library(tidyr)
library(car)
library(scales)
library(gridExtra)
library(corrplot)
file_path <- "C:/Users/aitha/Downloads/W38325-XLS-ENG (1).xlsx" # Adjust the path if needed
df <- read_excel(file_path, sheet = "OfferData")
colnames(df) <- trimws(colnames(df))
colnames(df)
## [1] "Base_salary" "Bonus" "Total_salary"
## [4] "Big_3" "Big_5" "Worked_hours_less_50"
## [7] "Worked_hours_50_59" "Worked_hours_60_69" "Worked_hours_70+"
## [10] "LOCATION"
# Descriptive Statistics for Big 3 and Big 5 firms.
salary_stats <- df %>%
mutate(Firm = case_when(
Big_3 == 1 ~ "Big 3",
Big_5 == 1 ~ "Big 5",
TRUE ~ "Other"
)) %>%
filter(Firm %in% c("Big 3","Big 5")) %>%
group_by(Firm) %>%
summarise(
count = n(),
Mean_Salary = mean(Total_salary, na.rm = TRUE),
Median_Salary = median(Total_salary, na.rm = TRUE),
Std_Dev_Salary = sd(Total_salary, na.rm = TRUE),
Min_Salary = min(Total_salary, na.rm = TRUE),
Max_Salary = max(Total_salary, na.rm = TRUE)
)
# Print results
print(salary_stats)
## # A tibble: 2 × 7
## Firm count Mean_Salary Median_Salary Std_Dev_Salary Min_Salary Max_Salary
## <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Big 3 185 102614. 97000 61719. 6070 420000
## 2 Big 5 927 70399. 74700 37653. 4658. 210000
# Create a new column to classify firms
df <- df %>%
mutate(Firm_Type = ifelse(Big_3 == 1, "Big 3", "Big 5"))
# Define colors for better visualization
big3_color <- "#1f77b4" # Blue
big5_color <- "#ff7f0e" # Orange
# Boxplot to compare salary distribution in Big 3 vs. Big 5
ggplot(df, aes(x = Firm_Type, y = Total_salary, fill = Firm_Type)) +
geom_boxplot(alpha = 0.9, outlier.shape = 16, outlier.size = 2) +
scale_y_continuous(labels = scales::dollar_format()) + # Format salary axis
scale_fill_manual(values = c("Big 3" = big3_color, "Big 5" = big5_color)) +
labs(title = "Salary Variations & Outliers",
x = "Firm Type",
y = "Salary ($)") +
theme_minimal()
Salary Levels and Distribution:
Big 3 firms offer significantly higher salaries than Big 5 firms, with a mean salary of $102,614 compared to $70,399.
The median salary also follows a similar trend: $97,000 (Big 3) vs. $74,700 (Big 5).
Salary Variation and Dispersion:
The standard deviation is higher for Big 3 ($61,719) compared to Big 5 ($37,653), indicating that salary distribution in Big 3 firms is more spread out..
The box plot visually confirms this as the Big 3 firms show a wider interquartile range (IQR) and longer whiskers.
Presence of outliers:
salary_stats <- df %>%
group_by(LOCATION) %>%
summarise(
Average_Salary = mean(Total_salary, na.rm = TRUE),
Median_Salary = median(Total_salary, na.rm = TRUE),
Salary_Std_Dev = sd(Total_salary, na.rm = TRUE),
Number_of_Entries = n()
)
# Print salary statistics
print(salary_stats)
## # A tibble: 2 × 5
## LOCATION Average_Salary Median_Salary Salary_Std_Dev Number_of_Entries
## <chr> <dbl> <dbl> <dbl> <int>
## 1 CANADA 71188. 75150 34868. 484
## 2 USA 79280. 80250. 50019. 628
ggplot(df, aes(x = LOCATION, y = Total_salary, fill = LOCATION)) +
geom_boxplot() +
theme_minimal() +
labs(title = "Salary Distribution by Location",
x = "Location",
y = "Total Salary ($)") + # Update Y-axis label
scale_y_continuous(labels = dollar)
Salary Levels:
USA salaries are generally higher than Canada’s:
Mean Salary: USA ($79,280) vs. Canada ($71,188) → ~11% higher in the USA.
Median Salary: USA ($80,250) vs. Canada ($75,150) → USA remains higher, but the gap is narrower.
Salary Variation and Dispersion:
USA has a wider salary range, indicated by - Higher standard deviation ($50,019 vs. $34,868) → Salaries in the USA are more spread out.
The box plot shows a wider interquartile range (IQR) for the USA, meaning more variation in mid-range salaries.
Greater variability in the USA suggests wider salary opportunities but also larger income inequality, Canada’s salaries are more consistent, with fewer extreme outliers.
For professionals seeking higher earning potential, the USA may offer better opportunities, but it comes with greater salary fluctuations.
# Null Hypothesis (H₀): Salaries are the same across all locations.
# Alternative Hypothesis (H₁): Salaries are not the same across all locations (i.e., there are significant differences).
anova_result <- aov(Total_salary ~ LOCATION, data = df)
summary(anova_result)
## Df Sum Sq Mean Sq F value Pr(>F)
## LOCATION 1 1.790e+10 1.790e+10 9.216 0.00246 **
## Residuals 1110 2.156e+12 1.942e+09
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since p < 0.05, the result is statistically significant. This means there is strong evidence that salary differences between the USA and Canada are not due to random chance.
We reject the null hypothesis, meaning that the average salary in Canada and the USA is significantly different.
# Perform Tukey's HSD test
tukey_result <- TukeyHSD(anova_result)
# Print the results
print(tukey_result)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Total_salary ~ LOCATION, data = df)
##
## $LOCATION
## diff lwr upr p adj
## USA-CANADA 8092.226 2861.955 13322.5 0.0024553
# Convert Tukey results to a data frame for better readability
tukey_df <- as.data.frame(tukey_result$LOCATION)
# Sort by adjusted p-value
tukey_df <- tukey_df[order(tukey_df$`p adj`), ]
print(tukey_df)
## diff lwr upr p adj
## USA-CANADA 8092.226 2861.955 13322.5 0.002455327
There is strong statistical evidence that professionals in the USA earn significantly more than their Canadian counterparts.
The exact difference may vary between $2,862 and $13,322, but on average, it is around $8,092.
# Select relevant numerical columns with backticks for special characters
cor_data <- df %>% dplyr::select(Base_salary, Bonus, Total_salary, Big_3, Big_5,
`Worked_hours_less_50`, `Worked_hours_50_59`,
`Worked_hours_60_69`, `Worked_hours_70+`)
# Compute correlation matrix
cor_matrix <- cor(cor_data, use = "complete.obs")
# Plot correlation heatmap
corrplot(cor_matrix, method = "color", type = "upper",
tl.col = "black", tl.cex = 0.8, addCoef.col = "black", number.cex = 0.7)
Worked Hours & Salary:
Work hours do not show strong correlations with salary components.
This suggests that higher work hours do not directly translate into significantly higher salaries.
Firm Type vs. Work Hours:
Big 3 & Worked 60–69 Hours (0.34): Employees at Big 3 firms tend to work longer hours.
Big 5 & Worked 60–69 Hours (-0.34): Employees at Big 5 firms are less likely to work extended hours.
Work hours are not strongly correlated with salary, meaning compensation is not necessarily based on effort.
data <- read_excel("C:/Users/aitha/Downloads/W38325-XLS-ENG (1).xlsx")
# Convert categorical variables to factors
data$LOCATION <- as.factor(data$LOCATION)
data$Worked_hours <- case_when(
data$Worked_hours_less_50 == 1 ~ "<50",
data$Worked_hours_50_59 == 1 ~ "50-59",
data$Worked_hours_60_69 == 1 ~ "60-69",
data$`Worked_hours_70+` == 1 ~ "70+"
)
data$Worked_hours <- as.factor(data$Worked_hours)
# Create a new categorical variable for Firm Type
data$Firm_Type <- case_when(
data$Big_3 == 1 ~ "Big 3",
data$Big_5 == 1 ~ "Big 5",
TRUE ~ "Other"
)
data$Firm_Type <- factor(data$Firm_Type, levels = c("Big 3", "Big 5", "Other"))
# Load necessary library
library(car) # For ANOVA Type III tests
# Run the Three-Way ANOVA
anova_model <- aov(Total_salary ~ Firm_Type * LOCATION * Worked_hours, data = data)
# Display the ANOVA table
summary(anova_model)
## Df Sum Sq Mean Sq F value Pr(>F)
## Firm_Type 1 1.601e+11 1.601e+11 92.393 < 2e-16 ***
## LOCATION 1 1.468e+10 1.468e+10 8.471 0.00368 **
## Worked_hours 3 7.226e+09 2.409e+09 1.390 0.24418
## Firm_Type:LOCATION 1 6.245e+10 6.245e+10 36.049 2.61e-09 ***
## Firm_Type:Worked_hours 3 1.628e+10 5.426e+09 3.132 0.02486 *
## LOCATION:Worked_hours 3 6.194e+09 2.065e+09 1.192 0.31162
## Firm_Type:LOCATION:Worked_hours 3 8.275e+09 2.758e+09 1.592 0.18955
## Residuals 1096 1.899e+12 1.732e+09
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Significant Factors Affecting Salary:
Firm Type (F = 92.393, p < 2e-16): Highly significant effect on salary.
Location (F = 8.471, p = 0.00368): Statistically significant effect.
Firm Type × Location Interaction (F = 36.049, p = 2.61e-09):
Firm Type × Worked Hours Interaction (F = 3.132, p = 0.02486):
Non-Significant Factors Affecting Salary:
Worked Hours (F=1.390, p = 0.24418):
Location × Worked Hours (F = 1.192, p = 0.31162):
Firm Type × Location × Worked Hours (F = 1.592, p = 0.18955):
Based on the analysis, several key insights emerge regarding salary distribution, firm selection, location, and working hours. The findings indicate that employees in the USA tend to earn higher salaries than those in Canada, with a statistically significant difference in average salaries. Additionally, firm type plays a crucial role in compensation, as employees working in top-tier firms (Big 3 or Big 5) generally receive higher salaries and bonuses.
Furthermore, the correlation analysis suggests that total salary is positively influenced by base salary and bonuses, while working longer hours does not necessarily translate into higher earnings. Employees in top firms may need to work extended hours, but this does not always result in proportional salary increases.
Overall, the analysis suggests that employees should carefully evaluate job offers based on firm reputation, location, and expected working hours to maximize their earnings and work-life balance. Selecting a role in a prestigious firm with competitive compensation, in a location with higher salary prospects, is advisable for career growth and financial stability.