The Center for Disease Control and Prevention (CDC), through the National Center for Health Statistics (NCHS), released data about health care employment and wages within the United States between 2000-2020. The selected occupations range between two categories of health care practitioners and technical roles such as physician assistants and pharmacy technicians and health care support roles such as nursing assistants and psychiatric aides.
Employment figures are number of filled positions. This includes both full- and part-time wage and salary positions. Estimates do not include the self-employed, owners and partners in unincorporated firms, household workers, or unpaid family workers. This data excludes occupations such as dentists, physicians, and chiropractors, which have a large percentage of workers who are self-employed. Wages reported is calculated as a mean hourly wage rate for an occupation, where the total wages that all workers in the occupation earn in an hour divided by the total number of employees in the occupation.
Data Source: Table HCEmpl
After tidying up the Excel file into long format, the analysis will look to see if there is a relationship between the percentage change of employment versus the percentage change of mean hourly wages.
library(tidyverse)
library(rio)
library(janitor)
To import the data into R, the rio library allows it to read a URL that directly links to an Excel file and transform it into a data frame
url <- 'https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Publications/Health_US/hus20-21tables/hcempl.xlsx'
data = import(url)
knitr::kable(head(data, 3))
| Table HCEmpl. Health care employment and wages, by selected occupations: United States, selected years 2000–2020 | …2 | …3 | …4 | …5 | …6 | …7 | …8 | …9 | …10 | …11 | …12 | …13 | …14 | …15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Excel version (with more data years and standard errors when available): https://www.cdc.gov/nchs/hus/contents2020-2021.htm#Table-HCEmpl | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| [Data are based on a semiannual survey of nonfarm establishments] | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| NA | Employment\1 | NA | NA | NA | NA | NA | NA | Mean hourly wage (dollars)\2 | NA | NA | NA | NA | NA | NA |
Removing rows that does not provide figures from the table. Along with the janitor library, it can take a specific row and use it as the column headers, while also cleaning their names to a more appropriate syntax.
updated_data <-
data |>
filter(!row_number() %in% c(1:3, 49:51)) |> # drop non-data rows
row_to_names(row_number = 1) |> # first row as column names
clean_names() # clean names
knitr::kable(head(updated_data, 3))
| occupation_title | x2000 | x2005 | x2009 | x2010 | x2015 | x2016 | x2020 | x2000_2 | x2005_2 | x2009_2 | x2010_2 | x2015_2 | x2016_2 | x2020_2 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Health care practitioners and technical occupations | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| 3 | Audiologists | 11530 | 10030 | 12590 | 12860 | 12070 | 12310 | 13300 | 22.92 | 27.72 | 32.14 | 33.58 | 37.22 | 38.119999999999997 | 42.9 |
| 4 | Cardiovascular technologists and technicians | 40080 | 43560 | 48070 | 48720 | 51400 | 53760 | 55980 | 16.809999999999999 | 19.989999999999998 | 23.91 | 24.38 | 26.97 | 27.45 | 29.3 |
These next two sections subsets the data into two data frames that focus on employment and wage figures. This allows the data to be easily pivoted separately from multiple year columns into one long-format standard.
updated_employment <-
updated_data |>
select(c(1:8)) |>
gather('year', 'employment', -c('occupation_title')) |>
rename(occupation = occupation_title)
updated_employment$year <-
updated_employment$year |>
parse_number()
knitr::kable(head(updated_employment, 3))
| occupation | year | employment |
|---|---|---|
| Health care practitioners and technical occupations | 2000 | NA |
| Audiologists | 2000 | 11530 |
| Cardiovascular technologists and technicians | 2000 | 40080 |
updated_wages <-
updated_data |>
select(c(1, 9:15)) |>
gather('year', 'mean_hourly_wage', -c('occupation_title')) |>
rename(occupation = occupation_title, mean_wage = mean_hourly_wage)
updated_wages$year <-
updated_wages$year |>
parse_number()
knitr::kable(head(updated_wages, 3))
| occupation | year | mean_wage |
|---|---|---|
| Health care practitioners and technical occupations | 2000 | NA |
| Audiologists | 2000 | 22.92 |
| Cardiovascular technologists and technicians | 2000 | 16.809999999999999 |
Here, the data is joined back together, where year has its own unique column and we have the figures to compare a year and occupation easier.
employment_wages <-
inner_join(updated_employment, updated_wages)
knitr::kable(head(employment_wages, 3))
| occupation | year | employment | mean_wage |
|---|---|---|---|
| Health care practitioners and technical occupations | 2000 | NA | NA |
| Audiologists | 2000 | 11530 | 22.92 |
| Cardiovascular technologists and technicians | 2000 | 40080 | 16.809999999999999 |
Some of the occupation names have special characters and numbers as seen below.
knitr::kable(employment_wages[11:16, 1])
| x |
|---|
| Magnetic resonance imaging technologists\3 |
| Medical dosimetrists, medical records specialists, and health technologists and technicians, all other |
| Nuclear medicine technologists |
| Nurse anesthetists\4 |
| Nurse midwives\4 |
| Nurse practitioners\4 |
The employment_wages table is cleaned up removing unwanted characters and type casting integer and float values where appropriate
employment_wages <-
employment_wages |>
mutate(occupation = str_replace(occupation, "\\\\[:digit:]", "")) |> # clean occupation titles
mutate(employment = str_replace(employment, "[^[:alnum:]]+", "")) |> # clean employment
mutate(employment = as.integer(employment)) |>
mutate(mean_wage = str_replace(mean_wage, "[^[:alnum:]\\.]+", "")) |> # clean wages
mutate(mean_wage = as.numeric(mean_wage)) |>
drop_na(employment) # drop rows that does not have any employment data
knitr::kable(head(employment_wages, 3))
| occupation | year | employment | mean_wage |
|---|---|---|---|
| Audiologists | 2000 | 11530 | 22.92 |
| Cardiovascular technologists and technicians | 2000 | 40080 | 16.81 |
| Dental hygienists | 2000 | 148460 | 24.99 |
To calculate the percent change of employment and wage figures, the lag() function was used, grouping by occupation_title.
employment_wages <-
employment_wages |>
group_by(occupation) |>
mutate(emp_delta = employment - lag(employment),
wage_delta = mean_wage - lag(mean_wage),
emp_pct_chg = round(((employment - lag(employment)) / lag(employment)), 4),
wage_pct_chg = round(((mean_wage - lag(mean_wage)) / lag(mean_wage)), 4)) |>
arrange(occupation)
knitr::kable(head(employment_wages))
| occupation | year | employment | mean_wage | emp_delta | wage_delta | emp_pct_chg | wage_pct_chg |
|---|---|---|---|---|---|---|---|
| Audiologists | 2000 | 11530 | 22.92 | NA | NA | NA | NA |
| Audiologists | 2005 | 10030 | 27.72 | -1500 | 4.80 | -0.1301 | 0.2094 |
| Audiologists | 2009 | 12590 | 32.14 | 2560 | 4.42 | 0.2552 | 0.1595 |
| Audiologists | 2010 | 12860 | 33.58 | 270 | 1.44 | 0.0214 | 0.0448 |
| Audiologists | 2015 | 12070 | 37.22 | -790 | 3.64 | -0.0614 | 0.1084 |
| Audiologists | 2016 | 12310 | 38.12 | 240 | 0.90 | 0.0199 | 0.0242 |
Filtering for the highest increase in percentage change in wages for each occupation regardless of year, we can see that Nuclear medicine technologists had the largest wage percentage change at 34.97%. Does these maximum increases within each occupation relate to how many more people are being employed from previous years?
top_occ <-
employment_wages |>
filter(!is.na(emp_delta)) |>
group_by(occupation) |>
filter(wage_pct_chg == max(wage_pct_chg)) |>
arrange(desc(wage_pct_chg)) |>
select(occupation, emp_pct_chg, wage_pct_chg)
knitr::kable(top_occ)
| occupation | emp_pct_chg | wage_pct_chg |
|---|---|---|
| Nuclear medicine technologists | 0.0139 | 0.3497 |
| Occupational therapy assistants | 0.2040 | 0.2776 |
| Pharmacists | 0.0803 | 0.2764 |
| Radiologic technologists and technicians | 0.0726 | 0.2605 |
| Massage therapists | 0.5301 | 0.2463 |
| Physical therapist assistants | 0.0866 | 0.2308 |
| Registered nurses | 0.0815 | 0.2259 |
| Radiation therapists | 0.1027 | 0.2154 |
| Respiratory therapists | 0.1530 | 0.2107 |
| Diagnostic medical sonographers | 0.3725 | 0.2097 |
| Audiologists | -0.1301 | 0.2094 |
| Pharmacy aides | -0.2328 | 0.2022 |
| Speech-language pathologists | 0.1425 | 0.1965 |
| Cardiovascular technologists and technicians | 0.1035 | 0.1961 |
| Occupational therapists | 0.1191 | 0.1961 |
| Physician assistants | 0.2139 | 0.1934 |
| Licensed practical and licensed vocational nurses | 0.0450 | 0.1884 |
| Recreational therapists | -0.1366 | 0.1876 |
| Dietitians and nutritionists | 0.1353 | 0.1775 |
| Occupational therapy aides | -0.3003 | 0.1775 |
| Pharmacy technicians | 0.3972 | 0.1744 |
| Opticians, dispensing | 0.0527 | 0.1681 |
| Dental hygienists | 0.0854 | 0.1665 |
| Physical therapists | 0.1534 | 0.1661 |
| Medical equipment preparers | 0.2756 | 0.1629 |
| Nursing assistants | 0.0926 | 0.1623 |
| Medical transcriptionists | -0.0714 | 0.1609 |
| Emergency medical technicians and paramedics | 0.1069 | 0.1608 |
| Psychiatric aides | -0.2353 | 0.1576 |
| Nurse anesthetists | 0.0527 | 0.1534 |
| Psychiatric technicians | -0.1955 | 0.1512 |
| Dental assistants | 0.0861 | 0.1346 |
| Nurse midwives | 0.1356 | 0.1284 |
| Medical assistants | 0.2959 | 0.1256 |
| Dietetic technicians | 0.0307 | 0.1246 |
| Orderlies | -0.1770 | 0.1129 |
| Physical therapist aides | 0.1011 | 0.0973 |
| Magnetic resonance imaging technologists | 0.0954 | 0.0970 |
| Nurse practitioners | 0.4064 | 0.0944 |
| Clinical laboratory technologists and technicians | -0.0021 | 0.0712 |
Plotting these changes, we can see that there is not a relationship where as employee rates increase, so does wages.
top_occ |>
ggplot(aes(x = emp_pct_chg, y = wage_pct_chg)) +
geom_point(stat = 'identity')