The Data

The Bureau of Labor Statistics compiles salary information across occupations and industries. The data can be browsed at https://www.bls.gov/oes/current/oes_nat.htm, and a richer data file is also available for download. Look for the “downloadable XLS file”, which provides not just average salary levels but also the 10th, 25th, 75th, and 90th percentile wages for each category. This is the data set I am going to work with. Let’s read the Excel file into R.

library(readxl)  # provides read_excel()
df.BLS <- read_excel("national_M2019_dl.xlsx")
# Alternative using the xlsx package:
# df1 <- read.xlsx2("national_M2019_dl.xlsx", sheetIndex = 1, startRow = 2)

Columns

This brings in a data set with one row per occupational category and many columns, most of which we don’t need. Let’s keep a subset of them: occ_title (title), o_group (group), tot_emp (total employment), a_mean (annual mean salary), and a_pct10, a_pct25, a_median, a_pct75, a_pct90 (the 10th, 25th, 50th, 75th, and 90th percentile annual salaries).
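The column selection described above can be sketched in base R as follows. The toy data frame stands in for the real sheet and every value in it is made up. The sketch also assumes that the wage columns may arrive as text because the file can contain non-numeric markers (such as "#"), so it converts them to numbers and lets those markers become NA.

```r
# Toy stand-in for the BLS sheet; all values are made up for illustration.
df.BLS <- data.frame(
  occ_code  = c("00-0000", "11-0000", "11-1011"),
  occ_title = c("All Occupations", "Management Occupations", "Chief Executives"),
  o_group   = c("total", "major", "detailed"),
  tot_emp   = c("1000", "80", "2"),
  a_mean    = c("50000", "120000", "190000"),
  a_pct10   = c("20000", "50000", "60000"),
  a_pct25   = c("30000", "70000", "110000"),
  a_median  = c("40000", "100000", "180000"),
  a_pct75   = c("60000", "150000", "#"),
  a_pct90   = c("100000", "#", "#"),
  stringsAsFactors = FALSE
)

# Keep only the columns used in the rest of the analysis.
keep <- c("occ_title", "o_group", "tot_emp", "a_mean",
          "a_pct10", "a_pct25", "a_median", "a_pct75", "a_pct90")
df <- df.BLS[, keep]

# Convert everything except the two label columns to numeric.
# Non-numeric markers (assumed here to be "#") turn into NA.
num_cols <- setdiff(keep, c("occ_title", "o_group"))
df[num_cols] <- lapply(df[num_cols],
                       function(x) suppressWarnings(as.numeric(x)))
```

If the real file already reads these columns in as numbers, the conversion step is harmless and can be dropped.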

Categories of Data Rows

Jobs can be organized into a hierarchy, so that detailed occupational categories are rolled up into higher-level categories (indicated in the o_group column, with values such as “major” and “minor”). Since it is not logical to cross these hierarchical boundaries when displaying comparisons, we’ll split the data set into three subsets based on the values of the o_group column: just the “major” categories, then the “minor” categories, and finally the rows where o_group is “broad” or “detailed”.
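A minimal sketch of this three-way split, using a made-up data frame and assuming the o_group labels are stored in lowercase. The names df.major, df.minor, and df.detailed match the ones used in the examples that follow.

```r
# Toy rows, one per hierarchy level; all values are made up.
df <- data.frame(
  occ_title = c("Management Occupations", "Top Executives",
                "Chief Executives (broad)", "Chief Executives"),
  o_group   = c("major", "minor", "broad", "detailed"),
  a_mean    = c(120000, 130000, 190000, 190000),
  stringsAsFactors = FALSE
)

# One subset per level of the hierarchy, so comparisons never mix levels.
df.major    <- df[df$o_group == "major", ]
df.minor    <- df[df$o_group == "minor", ]
df.detailed <- df[df$o_group %in% c("broad", "detailed"), ]
```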

Examples of Data

By Major Occupational Category

head(df.major)

By Minor Occupational Category

head(df.minor)

By Broad/Detailed Occupational Category

head(df.detailed) 

Analysis of Jobs: How Numerous vs How Well-Paying?

An interesting question to ask is how common high-paying jobs are compared with low-paying ones. More generally, how many jobs exist at different salary levels? Here is a density plot created by looking at just the major occupational categories. The left panel is a density function that indicates how numerous jobs are at different salary levels. The right panel shows the cumulative distribution function (i.e., what fraction of jobs falls below each salary level).

The chart shows that the modal (most common) annual salary is around $40,000. About 5% of jobs pay over $100,000 a year, and only about 2% pay over $120,000 a year.
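The two panels described above can be approximated along the following lines. This is a sketch, not necessarily how the original plots were built: it assumes each category is weighted by its total employment (tot_emp) and uses the category's mean salary (a_mean) as its salary level. The data values are made up.

```r
# Toy major-category data; all values are made up.
df.major <- data.frame(
  a_mean  = c(30000, 40000, 45000, 60000, 90000, 120000),
  tot_emp = c(900, 1500, 1200, 600, 300, 100)
)

# Employment shares, used as weights (they sum to 1).
w <- df.major$tot_emp / sum(df.major$tot_emp)

# Left panel: employment-weighted kernel density of mean salary.
# The bandwidth is chosen from the unweighted salaries.
dens <- density(df.major$a_mean, weights = w,
                bw = bw.nrd0(df.major$a_mean))

# Right panel: cumulative share of jobs at or below each salary level.
ord <- order(df.major$a_mean)
cdf <- data.frame(salary = df.major$a_mean[ord],
                  share  = cumsum(w[ord]))

# Share of jobs in categories whose mean salary exceeds $100,000.
share_over_100k <- sum(w[df.major$a_mean > 100000])

par(mfrow = c(1, 2))
plot(dens, main = "Density", xlab = "Annual mean salary")
plot(cdf$salary, cdf$share, type = "s", main = "Cumulative share",
     xlab = "Annual mean salary", ylab = "Fraction of jobs")
```

The same code applies to the minor and detailed subsets by swapping in the corresponding data frame.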

Here is a similar plot, this time using the minor categories.

And finally, a more fine-grained view using the detailed job categories.

Analysis by MAJOR occupation categories

Let’s take a quick look at the employment categories, specifically the total number employed in each category and the average annual salary in each category.

To make it a little more informative, we will also get a sense of the variation within each category (rather than looking at just the mean salary). So, let’s compute the ratio of the 75th percentile to the 25th percentile annual salary, and reflect that in the size of the dots.
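As a sketch of this step, the ratio can be computed as a new column and passed to the point-size argument of a scatter plot. The column name spread is a hypothetical choice, the data values are made up, and base graphics are used here, which may differ from the plotting library behind the original charts.

```r
# Toy major-category data; all values are made up.
df.major <- data.frame(
  occ_title = c("Category A", "Category B", "Category C"),
  tot_emp   = c(9000, 3000, 500),
  a_mean    = c(28000, 60000, 110000),
  a_pct25   = c(20000, 40000, 50000),
  a_pct75   = c(30000, 70000, 100000)
)

# Within-category variation: 75th percentile divided by 25th percentile.
df.major$spread <- df.major$a_pct75 / df.major$a_pct25

# Employment on the horizontal axis, mean salary on the vertical axis,
# dot size proportional to the spread ratio.
plot(df.major$tot_emp, df.major$a_mean,
     cex = df.major$spread, pch = 19,
     xlab = "Total employment", ylab = "Annual mean salary")
```

A ratio close to 1 means the middle half of earners in a category are paid about the same, while a larger ratio (a bigger dot) signals wider pay variation.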

One thing to note, not surprisingly, is that the more numerous jobs (the ones to the right on the horizontal axis) pay less. Or, in other words, high-paying jobs (the ones higher up vertically) are less common.

Now let’s add the names of these categories onto the plot. This is a good view because there are only 21 Major categories, so they can all be represented on the plot. Of course, each category spans many levels of jobs (for instance, healthcare can include a machine technician, nurse, physician, and surgeon, among many others), so there is a lot of variation buried within each category. Still, the chart shows which categories come out ahead overall in terms of income, and which ones, although quite numerous, offer little prospect of a high salary.
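Labelling the points might look like the sketch below, which uses base graphics and made-up values. It also picks out the category with the highest mean salary as a simple check on what the chart shows.

```r
# Toy major-category data; all values are made up.
df.major <- data.frame(
  occ_title = c("Management", "Healthcare Support", "Food Preparation"),
  tot_emp   = c(800, 4000, 13000),
  a_mean    = c(120000, 32000, 26000),
  stringsAsFactors = FALSE
)

# Scatter plot with a text label above each point.
plot(df.major$tot_emp, df.major$a_mean, pch = 19,
     xlab = "Total employment", ylab = "Annual mean salary")
text(df.major$tot_emp, df.major$a_mean,
     labels = df.major$occ_title, pos = 3, cex = 0.7, xpd = TRUE)

# Category with the highest mean salary.
top_category <- df.major$occ_title[which.max(df.major$a_mean)]
```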

Analysis by MINOR occupation categories

Here is the same plot for the minor categories.

And now with names.

Analysis by BROAD and DETAILED occupation categories

Let’s take a quick look at the detailed employment categories. This is a large set with huge variation, so we will split it into “high”, “medium”, and “low” groups based on annual mean salary, and then split each group further into categories that employ more than, or fewer than, 50,000 people. We then visualize each of them separately.
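One way to sketch this grouping is with cut() for the salary bands and a simple comparison for the employment threshold. The salary cut-offs of $40,000 and $80,000 are hypothetical, since the thresholds are not stated here; only the 50,000 employment threshold comes from the text. The column names salary_group and size_group and all data values are made up for illustration.

```r
# Toy detailed-category data; all values are made up.
df.detailed <- data.frame(
  occ_title = c("Job A", "Job B", "Job C", "Job D", "Job E", "Job F"),
  tot_emp   = c(120000, 20000, 300000, 10000, 900000, 40000),
  a_mean    = c(150000, 110000, 60000, 55000, 25000, 28000),
  stringsAsFactors = FALSE
)

# Salary bands; the 40,000 and 80,000 cut-offs are hypothetical.
df.detailed$salary_group <- cut(df.detailed$a_mean,
                                breaks = c(-Inf, 40000, 80000, Inf),
                                labels = c("low", "medium", "high"))

# Employment size relative to the 50,000 threshold from the text.
df.detailed$size_group <- ifelse(df.detailed$tot_emp > 50000,
                                 "large", "small")

# One data frame per (salary band, size) combination, e.g. "high.large".
groups <- split(df.detailed,
                list(df.detailed$salary_group, df.detailed$size_group),
                drop = TRUE)
```

Each element of groups can then be plotted on its own, which keeps the number of labelled points per chart manageable.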

High Salary Occupations

Now let’s add the names of these categories onto the plot.

Medium Salary Occupations

Now let’s add the names of these categories onto the plot.

Low Salary Occupations

Low salary occupations with > 50,000 employed.

Low salary occupations with < 50,000 employed.

Lowest Salary Occupations

Lowest salary occupations with > 50,000 employed.

Lowest salary occupations with < 50,000 employed.