2026-03-29

Dataset Overview and Source

Data Source: “Salary by Job Title and Country” dataset from Kaggle

Description: This dataset contains salary information for individuals from various industries and geographic regions. It is used here to understand the factors that influence a person’s salary.

Variables: The dataset contains the following information for 6680 individuals.

  • Age: Individual’s age in years
  • Gender: Male or Female
  • Education Level: Highest educational level (0 = High School, 1 = Bachelor Degree, 2 = Master Degree, 3 = PhD)
  • Job Title: 129 unique job positions
  • Years of Experience: Total number of years in the industry
  • Salary: Salary compensation per year in U.S. dollars
  • Country: Employee’s work location (United States, United Kingdom, Canada, China, Australia)
  • Race: Ethnic background/identification of individual (White, Hispanic, Asian, Korean, Chinese, Australian, Welsh, African American, Mixed, Black)
  • Senior: 1 means Senior Position and 0 means not in a Senior Position

R Code for Data Preparation

This code shows how I loaded and prepared the Salary dataset for analysis:

# Load Required Libraries
library(ggplot2)
library(plotly)
library(dplyr)

# Load Dataset
salary_data <- read.csv("JobSalary.csv")

# Data Cleaning - keep only rows with Salary greater than 600
# (i.e. remove Salary values of 600 or less)
salary_data <- salary_data[salary_data$Salary > 600, ]

# Change Education Level from numeric codes to a labeled factor for analysis
salary_data$Education.Level <-
  factor(salary_data$Education.Level,
         levels = c(0, 1, 2, 3),
         labels = c("High School", "Bachelor Degree",
                    "Master Degree", "PhD"))

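To make the two preparation steps concrete, here is a minimal sketch on a few made-up rows (they are not taken from JobSalary.csv). It applies the same filter and the same recoding as the code above. The extra education code 4 is a hypothetical value added only to show that any code outside 0-3 becomes NA after the conversion.

```r
# Toy rows, made up for illustration only
toy <- data.frame(
  Salary          = c(350, 600, 55000, 120000, 98000),
  Education.Level = c(1, 0, 2, 3, 4)  # 4 is deliberately outside the 0-3 coding
)

# Same filter as above: only rows with Salary strictly greater than 600 remain
toy <- toy[toy$Salary > 600, ]

# Same recoding as above: numeric codes become labeled factor levels
toy$Education.Level <- factor(toy$Education.Level,
                              levels = c(0, 1, 2, 3),
                              labels = c("High School", "Bachelor Degree",
                                         "Master Degree", "PhD"))

# Rows remaining after the filter
nrow(toy)

# Counts per education level; codes outside 0-3 are reported as NA
table(toy$Education.Level, useNA = "ifany")
```

Running the same two checks on salary_data after the preparation step is a quick way to confirm how many rows the filter kept and whether every education code was matched.
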
Plotly 3D Plot: Salary by Age, Experience, and Education

Plotly 3D Plot: Analysis and Discussion

Key Findings:

  • Age Pattern: Salary generally increases with age. Younger individuals (20-30 years old) fall in a lower salary range of around 20k-100k U.S. dollars, whereas older individuals (40-50 years old) fall in a higher range of around 150k-180k U.S. dollars. After around 50 years of age, salary growth appears to plateau: most individuals have reached a stable maximum income level, and salaries above 200k U.S. dollars are rare.
  • Years of Experience Pattern: Salary generally increases with years of experience, with the fastest growth during the first 10-15 years. After around 15 years of experience, salary growth slows down, indicating that additional experience adds progressively less to an individual’s salary.
  • Education Level Pattern: The higher education levels (Master and PhD) have higher salary ranges of around 120k-180k U.S. dollars, whereas the lower education levels (High School and Bachelor) tend to fall in low to mid salary ranges of around 20k-120k U.S. dollars. However, at higher experience levels, individuals with a Bachelor’s or Master’s degree earn salaries comparable to those with a PhD.
  • Combined Pattern: The highest earners generally have the most years of experience, are older in age, and have a higher education level.

Overall, age, years of experience, and education level are all factors that contribute to an individual’s salary. Entry-level employees (younger age, less work experience, and lower education) tend to have salaries below 100k U.S. dollars, while experienced employees (older age, extensive work experience, and higher education) tend to have salaries above 100k U.S. dollars.
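
The code behind the 3D plot is not shown in this section, so the following is only a rough sketch of how a plot of this kind could be built with plotly. The rows are made up for illustration, and the column names Age and Years.of.Experience are assumptions about how read.csv() names those columns (Salary and Education.Level appear in the preparation code).

```r
library(plotly)

# Toy rows, made up for illustration only.
# Age and Years.of.Experience are assumed column names.
toy <- data.frame(
  Age                 = c(25, 32, 41, 48, 55),
  Years.of.Experience = c(2, 7, 15, 20, 28),
  Salary              = c(60000, 95000, 150000, 175000, 180000),
  Education.Level     = factor(c("Bachelor Degree", "Master Degree", "PhD",
                                 "Master Degree", "Bachelor Degree"))
)

# 3D scatter: one marker per individual, colored by education level
fig <- plot_ly(toy,
               x = ~Age, y = ~Years.of.Experience, z = ~Salary,
               color = ~Education.Level,
               type = "scatter3d", mode = "markers")

# Axis titles for the 3D scene
fig <- layout(fig, scene = list(
  xaxis = list(title = "Age (years)"),
  yaxis = list(title = "Years of Experience"),
  zaxis = list(title = "Salary (USD)")
))

fig
```

Replacing toy with salary_data should give the full plot, provided the column names match.
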

Plotly BoxPlot: Salary Distribution by Race and Gender

Plotly BoxPlot: Analysis and Discussion

Key Findings:

  • The median salary for men across all races is approximately 125k U.S. dollars.
  • Among men, Black individuals have the highest median salary at 130k U.S. dollars, while Hispanic individuals have the lowest median salary at 110k U.S. dollars.
  • The median salary for women across all races is approximately 100k U.S. dollars.
  • Among women, Korean individuals have the highest median salary at 110.7k U.S. dollars, while Hispanic individuals have the lowest median salary at 95k U.S. dollars.
  • The boxplot shows that men have a higher salary in general than women, as indicated by the higher median salary for men than women across all races.
  • The boxplot shows that the salary distribution for men is more variable than for women, as indicated by the longer whiskers. Specifically, the upper whiskers extend higher for men than women for all races, showing that men have higher maximum salaries. For example, among Asians, the maximum salary for men reaches 250k U.S. dollars whereas for women it reaches 200k U.S. dollars.

Overall, the boxplot indicates that salary differs more by Gender than by Race. The boxplot shows a gender gap in salary, with men generally earning more than women across all races, as demonstrated by the higher median and maximum salaries. In contrast, salaries across the different racial groups remain relatively similar, with only modest variation compared to the gender differences.
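
The medians quoted above were read from the boxplot. As a hedged sketch, the same kind of summary could be computed directly as a table of medians by Race and Gender. The rows below are made up for illustration, and Race and Gender are assumed column names.

```r
# Toy rows, made up for illustration only.
# Race and Gender are assumed column names.
toy <- data.frame(
  Race   = c("Asian", "Asian", "Asian", "Asian",
             "White", "White", "White", "White"),
  Gender = c("Male", "Male", "Female", "Female",
             "Male", "Male", "Female", "Female"),
  Salary = c(120000, 130000, 95000, 105000,
             110000, 140000, 95000, 115000)
)

# Median salary for every Race x Gender cell (rows = Race, columns = Gender)
med <- with(toy, tapply(Salary, list(Race, Gender), median))
med

# Male minus female median within each race
gap <- med[, "Male"] - med[, "Female"]
gap
```

On the real data, a gap that stays positive for every race would match the pattern described in the findings.
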

Statistical Analysis: ANOVA Test

# Two-way ANOVA test with Race and Gender as independent variables
# (main effects only, no Race:Gender interaction term)
anova_two_way <- aov(Salary ~ Race + Gender, data = salary_data)
summary(anova_two_way)
##               Df    Sum Sq   Mean Sq F value Pr(>F)    
## Race           9 3.257e+10 3.619e+09   1.323  0.219    
## Gender         1 3.021e+11 3.021e+11 110.416 <2e-16 ***
## Residuals   6669 1.825e+13 2.736e+09                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Statistical Analysis: ANOVA Test

The two-way ANOVA statistical test was used to examine the effects of Race and Gender on Salary.

The effect of Race on Salary was not statistically significant.

  • The p-value of 0.219 is greater than the 0.05 significance threshold, so we fail to reject the null hypothesis and conclude that there is insufficient evidence that the mean salaries differ among the Race groups.

The effect of Gender on Salary was statistically significant.

  • The p-value, reported as less than 2e-16, is below the 0.05 significance threshold, so we reject the null hypothesis and conclude that there is a statistically significant difference between the mean salaries of males and females.

The results of the two-way ANOVA test show that there is a statistically significant difference in average salaries between Genders, but not between Races.
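
Statistical significance does not by itself say how large an effect is, and with several thousand observations even a modest difference can produce a very small p-value. As a supplementary sketch that is not part of the original analysis, the share of total variation attributed to each term (eta squared) can be worked out from the sums of squares printed in the ANOVA table. Because aov() reports sequential sums of squares, the Gender share here is the variation explained after Race has been accounted for.

```r
# Sums of squares copied from the summary(anova_two_way) output
ss <- c(Race = 3.257e10, Gender = 3.021e11, Residuals = 1.825e13)

# Eta squared: each term's sum of squares as a share of the total
eta_sq <- ss / sum(ss)
round(eta_sq, 4)

# F statistic for Gender recomputed from the printed mean squares
# (Mean Sq of Gender divided by Mean Sq of Residuals)
f_gender <- 3.021e11 / 2.736e9
f_gender
```

Under these figures, Gender accounts for roughly 1.6% of the total variation in salary and Race for roughly 0.2%, so the gender difference is statistically clear while most of the variation in salary is left to other factors.
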

Ggplot Bar Graph: Average Salary by Country

Ggplot Bar Graph: Analysis and Discussion

Key Findings:

  • The bar graph suggests that these five countries have comparable salaries for each gender, with males averaging roughly 120k-124k U.S. dollars and females averaging roughly 105k-111k U.S. dollars.
  • Canada shows the highest average salary for males at 123,982 U.S. dollars whereas China shows the highest average salary for females at 111,291 U.S. dollars.
  • The United States shows the lowest average salary for both males and females at 119,845 and 105,162 U.S. dollars respectively.
  • In every country, males have a higher average salary than females, earning around 10k-15k U.S. dollars more on average. For example, in Australia, the bar graph shows that males earn an average of 120,897 U.S. dollars whereas females earn 107,914 U.S. dollars.

The bar graph shows a gender gap in salary across all five countries, with males consistently earning more than females. The average salary varies only slightly across the countries, indicating similar market and industry compensation in these selected countries.
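
The size of the gap can be checked with simple arithmetic on the averages quoted above. The sketch below uses only the two countries for which both the male and the female average are quoted (United States and Australia) and draws a grouped bar chart in the same spirit as the slide. It is an illustration, not the code used for the original graph.

```r
library(ggplot2)

# Averages quoted in the key findings (U.S. dollars)
avg <- data.frame(
  Country = c("United States", "United States", "Australia", "Australia"),
  Gender  = c("Male", "Female", "Male", "Female"),
  Salary  = c(119845, 105162, 120897, 107914)
)

# Arrange as a Country x Gender table, then take male minus female
tab <- xtabs(Salary ~ Country + Gender, data = avg)
gap <- tab[, "Male"] - tab[, "Female"]
gap

# Grouped bar chart: one pair of bars per country
p <- ggplot(avg, aes(x = Country, y = Salary, fill = Gender)) +
  geom_col(position = "dodge") +
  labs(y = "Average salary (USD)")
p
```

Both differences (12,983 for Australia and 14,683 for the United States) fall inside the 10k-15k range stated in the findings. On the full data, the averages themselves could be obtained with something like aggregate(Salary ~ Country + Gender, data = salary_data, FUN = mean), assuming the columns are named Country and Gender.
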

Ggplot Scatterplot: Salary vs. Years of Experience by Gender

Ggplot Scatterplot: Analysis and Discussion

Key Findings:

  • There is a positive correlation between years of experience and salary, as salary generally increases for both genders with more years of experience.
  • At each level of experience, there is variability present in salary, indicating that there are factors beyond experience that influence a person’s salary, such as job position, educational level, and seniority. For example, at 10 years of experience, the scatterplot shows salaries ranging from 50k to 180k U.S. dollars.
  • Salary growth appears to plateau (level off) for both genders. After roughly 20 years of experience, most individuals’ salaries stabilize around 180k U.S. dollars, suggesting that additional experience eventually contributes less to increasing salary.
  • There are outliers for both genders. Male outliers tend to show extremely high salaries, reaching around 250k U.S. dollars, whereas female outliers show relatively low salaries of around 60k U.S. dollars.

Overall, while experience positively influences salary for both genders, the variability in the scatterplot indicates that other factors contribute to the salary variance between individuals.

Note: The points in the scatterplot fall into vertical bands rather than a continuous cloud because Years of Experience takes a limited set of discrete values.
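
When the x variable takes only a limited set of values, a small amount of horizontal jitter can make overlapping points easier to see. The sketch below shows one way to do this with ggplot2, assuming that Years of Experience is recorded in discrete steps. The data are simulated purely for illustration, with a made-up curve that rises and then levels off, and the column names are assumptions. A rank-based (Spearman) correlation is included because it measures monotonic rather than strictly linear association, which suits a relationship that plateaus.

```r
library(ggplot2)

set.seed(1)  # reproducible toy data
n <- 200

# Simulated data, for illustration only
toy <- data.frame(
  Years.of.Experience = sample(0:25, n, replace = TRUE),
  Gender              = sample(c("Male", "Female"), n, replace = TRUE)
)

# Made-up salary curve that rises quickly and then levels off, plus noise
toy$Salary <- 40000 + 140000 * (1 - exp(-toy$Years.of.Experience / 8)) +
  rnorm(n, sd = 15000)

# Jitter only horizontally so the salary values themselves are not altered
p <- ggplot(toy, aes(x = Years.of.Experience, y = Salary, colour = Gender)) +
  geom_jitter(width = 0.25, height = 0, alpha = 0.6) +
  labs(x = "Years of Experience", y = "Salary (USD)")
p

# Rank-based correlation between experience and salary
rho <- cor(toy$Years.of.Experience, toy$Salary, method = "spearman")
rho
```
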

Conclusions and Future Insights

Summary:

  • Primary contributors to Salary: Of the variables analyzed in this presentation, Gender, Years of Experience, and Educational Level are the most influential factors in a person’s salary.
  • Gender Salary Gap: Across all five countries and all ten racial groups, a consistent gender gap was observed in salaries, with men earning higher salaries on average than women. This is supported by the result of the ANOVA test.
  • Years of Work Experience: While individuals with more years of work experience tend to have higher salaries, this effect plateaus after 15-20 years of experience, suggesting that eventually experience is valued less than the person’s role/job.
  • Race and Country: Race did not have a statistically significant effect on salary in the ANOVA test, and average salaries varied only slightly across the five countries. This suggests that, within this specific dataset, individuals are compensated similarly regardless of ethnic/cultural background and geographic location.
  • Education: Higher educational levels positively correlate with higher salaries. However, in combination with greater years of work experience, individuals with lower educational backgrounds can achieve higher earnings.

Future Insights: To understand the difference in salary between men and women, future studies should analyze the types of job positions that each gender occupies. This could help explain the gender gap, as women may more often occupy lower-paid administrative roles whereas men may more often occupy higher-paid director/managerial roles.

Sources and R Tools