Assignment 4

Author

Alex St.Hilaire

Code
library(tidyverse)
library(gt)
library(gtExtras)
jobs <- read_csv("https://jsuleiman.com/datasets/me_grad_employment.csv")
Code
#create a new table with numeric data from existing numeric as chr data
jobs_tbl <- jobs |>
  mutate( 
    growth_rate = parse_number(growth_rate),
    median_wage = parse_number(median_wage),
    entry_wage = parse_number(entry_wage)) |>
  select(!wage_year) #chatgpt for parse_number
Code
jobs_table <- jobs_tbl |>
  gt() |>
  fmt_number(
    columns = c(base_employment, projected_employment, median_wage, entry_wage),
    decimals = 0,
    use_seps = TRUE,
    accounting = TRUE
  ) |>
  tab_spanner(label = "Wages (US$)", columns = c(median_wage, entry_wage))|>
  cols_label(
    occupation = "Occupation",
    base_employment = "Base Employment",
    projected_employment = "Projected Employment",
    growth_rate = "Growth Rate (%)",
    annual_openings = "Annual Openings",
    median_wage = "Median",
    entry_wage = "Entry"
  ) |>
  tab_header(
    title = "Occupational Outlook",
    subtitle = md("Growth and Wage Expectations"),
  ) |>
  tab_options(heading.align = "left") |>
  tab_footnote(md("**Note:** Wage Data as of 2023"))|>
  gt_theme_espn()
Code
jobs_table <- jobs_table|>
  fmt(
    columns = c(median_wage, entry_wage),
    fns = function(x) paste0(round(x/1000), "k")
  ) #ChatGPT for function and trouble shooting
Code
# adding values for outliers
growth_min = -.750 - (1.5*6.5)
growth_max = 5.75 + (1.5*6.5)
med_wage_min = 52055 - (1.5*14882)
med_wage_max = 89323 + (1.5*14882)
ent_wage_min = 52055 - (1.5*15460)
ent_wage_max = 67515 + (1.5*15460)
Code
jobs_table <- jobs_table |>
  tab_style (
    style = cell_text(color = "green4", weight = "bold"),
    locations = cells_body(
      columns = growth_rate,
      rows = growth_rate > growth_max
    )
  )|>
  tab_style(
    style = cell_text(color = "red4", weight = "bold"),
    locations = cells_body(
      columns = growth_rate,
      rows = growth_rate < growth_min
    )
  )|>
    tab_style (
    style = cell_text(color = "green4", weight = "bold"),
    locations = cells_body(
      columns = median_wage,
      rows = median_wage > med_wage_max
    )
  )|>
  tab_style(
    style = cell_text(color = "red4", weight = "bold"),
    locations = cells_body(
      columns = median_wage,
      rows = median_wage < med_wage_min
    )
  )|>
    tab_style (
    style = cell_text(color = "green4", weight = "bold"),
    locations = cells_body(
      columns = entry_wage,
      rows = entry_wage > ent_wage_max
    )
  )|>
  tab_style(
    style = cell_text(color = "red4", weight = "bold"),
    locations = cells_body(
      columns = entry_wage,
      rows = entry_wage < ent_wage_min
    )
  )

jobs_table
Occupational Outlook
Growth and Wage Expectations
Occupation Base Employment Projected Employment Growth Rate (%) Annual Openings
Wages (US$)
Median Entry
Lawyers 2,725 2,862 5 121 99k 66k
Educational, Guidance, and Career Counselors and Advisors 1,679 1,712 2 122 57k 43k
Education Administrators, Kindergarten through Secondary 1,673 1,683 1 111 99k 75k
Physical Therapists 1,580 1,667 6 71 91k 75k
Mental Health and Substance Abuse Social Workers 1,459 1,460 0 100 66k 49k
Pharmacists 1,366 1,388 2 53 135k 102k
Nurse Practitioners 1,332 1,804 35 117 123k 100k
Occupational Therapists 1,161 1,200 3 68 80k 64k
Librarians and Media Collections Specialists 914 904 -1 83 59k 41k
Instructional Coordinators 895 895 0 76 74k 54k
Psychologists, All Other 893 920 3 60 86k 63k
Physician Assistants 795 971 22 61 132k 109k
Speech-Language Pathologists 764 852 12 52 80k 60k
Education Administrators, Postsecondary 752 738 -2 48 82k 60k
Health Specialties Teachers, Postsecondary 706 797 13 67 84k 60k
Art, Drama, and Music Teachers, Postsecondary 587 575 -2 45 78k 51k
Veterinarians 536 571 7 22 128k 90k
Healthcare Social Workers 484 495 2 42 64k 55k
Acupuncturists 432 432 0 26 65k 45k
English Language and Literature Teachers, Postsecondary 428 411 -4 30 82k 61k
Biochemists and Biophysicists 414 388 -6 25 84k 68k
Education Teachers, Postsecondary 410 404 -1 31 76k 51k
Nursing Instructors and Teachers, Postsecondary 404 452 12 38 77k 56k
Postsecondary Teachers, All Other 378 371 -2 28 72k 44k
Business Teachers, Postsecondary 309 313 1 24 81k 57k
Biological Science Teachers, Postsecondary 303 311 3 25 79k 59k
Note: Wage Data as of 2023

Table Narrative

Rule 1: Offset the headers from the body

The gt() package does a good job of offsetting the headers from the body of the table in a very clean, visually friendly way. Additionally, I renamed each variable to easily understood terminology and units using cols_labels(). I used tab_spanner to add a spanner header over median and entry wages, in order to reduce redundancy.

Rule 2: Use subtle dividers instead of heavy gridlines

I used the gt_theme_espn() which replaces the horizontal gridlines with an alternating shading. I prefer this style of tables when I am making them, because it removes excess lines while still allowing the reader’s eyes to discern each row individually. The theme also tastefully bolds and capitalizes the header labels.

Rule 3: Right-align numbers and headers

The default alignment in gt() is set to “auto”, which aligns columns by their data type. This has left-aligned the Occupation column because its text, and right-aligned the remaining columns as they are numeric. If we wanted to change this, we could use cols_align(), and specifically align columns as we prefer.

Rule 4: Left-align text and header

The gt_theme_espn() has left aligned our header for us, but without the theme, we have tab_options(heading.align = “left”) which would ensure that the table title and subtitle both are left aligned.

Rule 5: Select the appropriate level of precision

The decimals were removed using fmt_number( decimals = 0). I made the decision to display the wages rounded to the thousands, as this seemed to be the most relevant for comparing annual salaries. I also pasted “k” to the end of each number in the same function, which indicates that these wages are in the thousands. While one of the rules is to eliminate unit repetition, it would be potentially confusing to the average reader to see “99” instead of “99k”, even if it was noted in the spanner header. If the audience of this were specifically in the financial fields, I might consider a different display.

Rule 8: Highlight outliers

In order to find the outliers in the growth rates and wages, I took the interquartile range using summary() and IQR(). I then applied 1.5*IQR to the quartiles for each of these variables, and used a comparison operator in tab_styles(rows=) to determine if an observation should be highlighted. I used cell_text() to apply a color (green for positive, red for negative) and bold weight in order to high each outlier value.

Generative AI & Sources:

Tidyverse ggplot2 reference

https://ggplot2.tidyverse.org/reference/index.html

gt Tables

https://gt.rstudio.com/articles/gt.html

ChatGPT was used for some troubleshooting, and to learn remind me of some function usage. Session link below:

https://chatgpt.com/c/67f1c59d-00d8-800d-829f-b331591559b0