Code
library(tidyverse)
library(gt)
library(gtExtras)
jobs <- read_csv("https://jsuleiman.com/datasets/me_grad_employment.csv")library(tidyverse)
library(gt)
library(gtExtras)
jobs <- read_csv("https://jsuleiman.com/datasets/me_grad_employment.csv")#create a new table with numeric data from existing numeric as chr data
jobs_tbl <- jobs |>
mutate(
growth_rate = parse_number(growth_rate),
median_wage = parse_number(median_wage),
entry_wage = parse_number(entry_wage)) |>
select(!wage_year) #chatgpt for parse_numberjobs_table <- jobs_tbl |>
gt() |>
fmt_number(
columns = c(base_employment, projected_employment, median_wage, entry_wage),
decimals = 0,
use_seps = TRUE,
accounting = TRUE
) |>
tab_spanner(label = "Wages (US$)", columns = c(median_wage, entry_wage))|>
cols_label(
occupation = "Occupation",
base_employment = "Base Employment",
projected_employment = "Projected Employment",
growth_rate = "Growth Rate (%)",
annual_openings = "Annual Openings",
median_wage = "Median",
entry_wage = "Entry"
) |>
tab_header(
title = "Occupational Outlook",
subtitle = md("Growth and Wage Expectations"),
) |>
tab_options(heading.align = "left") |>
tab_footnote(md("**Note:** Wage Data as of 2023"))|>
gt_theme_espn()jobs_table <- jobs_table|>
fmt(
columns = c(median_wage, entry_wage),
fns = function(x) paste0(round(x/1000), "k")
) #ChatGPT for function and trouble shooting# adding values for outliers
growth_min = -.750 - (1.5*6.5)
growth_max = 5.75 + (1.5*6.5)
med_wage_min = 52055 - (1.5*14882)
med_wage_max = 89323 + (1.5*14882)
ent_wage_min = 52055 - (1.5*15460)
ent_wage_max = 67515 + (1.5*15460)jobs_table <- jobs_table |>
tab_style (
style = cell_text(color = "green4", weight = "bold"),
locations = cells_body(
columns = growth_rate,
rows = growth_rate > growth_max
)
)|>
tab_style(
style = cell_text(color = "red4", weight = "bold"),
locations = cells_body(
columns = growth_rate,
rows = growth_rate < growth_min
)
)|>
tab_style (
style = cell_text(color = "green4", weight = "bold"),
locations = cells_body(
columns = median_wage,
rows = median_wage > med_wage_max
)
)|>
tab_style(
style = cell_text(color = "red4", weight = "bold"),
locations = cells_body(
columns = median_wage,
rows = median_wage < med_wage_min
)
)|>
tab_style (
style = cell_text(color = "green4", weight = "bold"),
locations = cells_body(
columns = entry_wage,
rows = entry_wage > ent_wage_max
)
)|>
tab_style(
style = cell_text(color = "red4", weight = "bold"),
locations = cells_body(
columns = entry_wage,
rows = entry_wage < ent_wage_min
)
)
jobs_table| Occupational Outlook | ||||||
|---|---|---|---|---|---|---|
| Growth and Wage Expectations | ||||||
| Occupation | Base Employment | Projected Employment | Growth Rate (%) | Annual Openings |
Wages (US$)
|
|
| Median | Entry | |||||
| Lawyers | 2,725 | 2,862 | 5 | 121 | 99k | 66k |
| Educational, Guidance, and Career Counselors and Advisors | 1,679 | 1,712 | 2 | 122 | 57k | 43k |
| Education Administrators, Kindergarten through Secondary | 1,673 | 1,683 | 1 | 111 | 99k | 75k |
| Physical Therapists | 1,580 | 1,667 | 6 | 71 | 91k | 75k |
| Mental Health and Substance Abuse Social Workers | 1,459 | 1,460 | 0 | 100 | 66k | 49k |
| Pharmacists | 1,366 | 1,388 | 2 | 53 | 135k | 102k |
| Nurse Practitioners | 1,332 | 1,804 | 35 | 117 | 123k | 100k |
| Occupational Therapists | 1,161 | 1,200 | 3 | 68 | 80k | 64k |
| Librarians and Media Collections Specialists | 914 | 904 | -1 | 83 | 59k | 41k |
| Instructional Coordinators | 895 | 895 | 0 | 76 | 74k | 54k |
| Psychologists, All Other | 893 | 920 | 3 | 60 | 86k | 63k |
| Physician Assistants | 795 | 971 | 22 | 61 | 132k | 109k |
| Speech-Language Pathologists | 764 | 852 | 12 | 52 | 80k | 60k |
| Education Administrators, Postsecondary | 752 | 738 | -2 | 48 | 82k | 60k |
| Health Specialties Teachers, Postsecondary | 706 | 797 | 13 | 67 | 84k | 60k |
| Art, Drama, and Music Teachers, Postsecondary | 587 | 575 | -2 | 45 | 78k | 51k |
| Veterinarians | 536 | 571 | 7 | 22 | 128k | 90k |
| Healthcare Social Workers | 484 | 495 | 2 | 42 | 64k | 55k |
| Acupuncturists | 432 | 432 | 0 | 26 | 65k | 45k |
| English Language and Literature Teachers, Postsecondary | 428 | 411 | -4 | 30 | 82k | 61k |
| Biochemists and Biophysicists | 414 | 388 | -6 | 25 | 84k | 68k |
| Education Teachers, Postsecondary | 410 | 404 | -1 | 31 | 76k | 51k |
| Nursing Instructors and Teachers, Postsecondary | 404 | 452 | 12 | 38 | 77k | 56k |
| Postsecondary Teachers, All Other | 378 | 371 | -2 | 28 | 72k | 44k |
| Business Teachers, Postsecondary | 309 | 313 | 1 | 24 | 81k | 57k |
| Biological Science Teachers, Postsecondary | 303 | 311 | 3 | 25 | 79k | 59k |
| Note: Wage Data as of 2023 | ||||||
The gt() package does a good job of offsetting the headers from the body of the table in a very clean, visually friendly way. Additionally, I renamed each variable to easily understood terminology and units using cols_labels(). I used tab_spanner to add a spanner header over median and entry wages, in order to reduce redundancy.
I used the gt_theme_espn() which replaces the horizontal gridlines with an alternating shading. I prefer this style of tables when I am making them, because it removes excess lines while still allowing the reader’s eyes to discern each row individually. The theme also tastefully bolds and capitalizes the header labels.
The default alignment in gt() is set to “auto”, which aligns columns by their data type. This has left-aligned the Occupation column because its text, and right-aligned the remaining columns as they are numeric. If we wanted to change this, we could use cols_align(), and specifically align columns as we prefer.
The gt_theme_espn() has left aligned our header for us, but without the theme, we have tab_options(heading.align = “left”) which would ensure that the table title and subtitle both are left aligned.
The decimals were removed using fmt_number( decimals = 0). I made the decision to display the wages rounded to the thousands, as this seemed to be the most relevant for comparing annual salaries. I also pasted “k” to the end of each number in the same function, which indicates that these wages are in the thousands. While one of the rules is to eliminate unit repetition, it would be potentially confusing to the average reader to see “99” instead of “99k”, even if it was noted in the spanner header. If the audience of this were specifically in the financial fields, I might consider a different display.
In order to find the outliers in the growth rates and wages, I took the interquartile range using summary() and IQR(). I then applied 1.5*IQR to the quartiles for each of these variables, and used a comparison operator in tab_styles(rows=) to determine if an observation should be highlighted. I used cell_text() to apply a color (green for positive, red for negative) and bold weight in order to high each outlier value.
Tidyverse ggplot2 reference
https://ggplot2.tidyverse.org/reference/index.html
gt Tables
https://gt.rstudio.com/articles/gt.html
ChatGPT was used for some troubleshooting, and to learn remind me of some function usage. Session link below: