Assignment 4

Author

Alex St.Hilaire

Code

library(tidyverse)
library(gt)
library(gtExtras)
jobs <- read_csv("https://jsuleiman.com/datasets/me_grad_employment.csv")

Code

#create a new table with numeric data from existing numeric as chr data
jobs_tbl <- jobs |>
  mutate( 
    growth_rate = parse_number(growth_rate),
    median_wage = parse_number(median_wage),
    entry_wage = parse_number(entry_wage)) |>
  select(!wage_year) #chatgpt for parse_number

Code

jobs_table <- jobs_tbl |>
  gt() |>
  fmt_number(
    columns = c(base_employment, projected_employment, median_wage, entry_wage),
    decimals = 0,
    use_seps = TRUE,
    accounting = TRUE
  ) |>
  tab_spanner(label = "Wages (US$)", columns = c(median_wage, entry_wage))|>
  cols_label(
    occupation = "Occupation",
    base_employment = "Base Employment",
    projected_employment = "Projected Employment",
    growth_rate = "Growth Rate (%)",
    annual_openings = "Annual Openings",
    median_wage = "Median",
    entry_wage = "Entry"
  ) |>
  tab_header(
    title = "Occupational Outlook",
    subtitle = md("Growth and Wage Expectations"),
  ) |>
  tab_options(heading.align = "left") |>
  tab_footnote(md("**Note:** Wage Data as of 2023"))|>
  gt_theme_espn()

Code

jobs_table <- jobs_table|>
  fmt(
    columns = c(median_wage, entry_wage),
    fns = function(x) paste0(round(x/1000), "k")
  ) #ChatGPT for function and trouble shooting

Code

# adding values for outliers
growth_min = -.750 - (1.5*6.5)
growth_max = 5.75 + (1.5*6.5)
med_wage_min = 52055 - (1.5*14882)
med_wage_max = 89323 + (1.5*14882)
ent_wage_min = 52055 - (1.5*15460)
ent_wage_max = 67515 + (1.5*15460)

Code

jobs_table <- jobs_table |>
  tab_style (
    style = cell_text(color = "green4", weight = "bold"),
    locations = cells_body(
      columns = growth_rate,
      rows = growth_rate > growth_max
    )
  )|>
  tab_style(
    style = cell_text(color = "red4", weight = "bold"),
    locations = cells_body(
      columns = growth_rate,
      rows = growth_rate < growth_min
    )
  )|>
    tab_style (
    style = cell_text(color = "green4", weight = "bold"),
    locations = cells_body(
      columns = median_wage,
      rows = median_wage > med_wage_max
    )
  )|>
  tab_style(
    style = cell_text(color = "red4", weight = "bold"),
    locations = cells_body(
      columns = median_wage,
      rows = median_wage < med_wage_min
    )
  )|>
    tab_style (
    style = cell_text(color = "green4", weight = "bold"),
    locations = cells_body(
      columns = entry_wage,
      rows = entry_wage > ent_wage_max
    )
  )|>
  tab_style(
    style = cell_text(color = "red4", weight = "bold"),
    locations = cells_body(
      columns = entry_wage,
      rows = entry_wage < ent_wage_min
    )
  )

jobs_table

Occupational Outlook
Growth and Wage Expectations
Occupation	Base Employment	Projected Employment	Growth Rate (%)	Annual Openings	Wages (US$)
Occupation	Base Employment	Projected Employment	Growth Rate (%)	Annual Openings	Median	Entry
Lawyers	2,725	2,862	5	121	99k	66k
Educational, Guidance, and Career Counselors and Advisors	1,679	1,712	2	122	57k	43k
Education Administrators, Kindergarten through Secondary	1,673	1,683	1	111	99k	75k
Physical Therapists	1,580	1,667	6	71	91k	75k
Mental Health and Substance Abuse Social Workers	1,459	1,460	0	100	66k	49k
Pharmacists	1,366	1,388	2	53	135k	102k
Nurse Practitioners	1,332	1,804	35	117	123k	100k
Occupational Therapists	1,161	1,200	3	68	80k	64k
Librarians and Media Collections Specialists	914	904	-1	83	59k	41k
Instructional Coordinators	895	895	0	76	74k	54k
Psychologists, All Other	893	920	3	60	86k	63k
Physician Assistants	795	971	22	61	132k	109k
Speech-Language Pathologists	764	852	12	52	80k	60k
Education Administrators, Postsecondary	752	738	-2	48	82k	60k
Health Specialties Teachers, Postsecondary	706	797	13	67	84k	60k
Art, Drama, and Music Teachers, Postsecondary	587	575	-2	45	78k	51k
Veterinarians	536	571	7	22	128k	90k
Healthcare Social Workers	484	495	2	42	64k	55k
Acupuncturists	432	432	0	26	65k	45k
English Language and Literature Teachers, Postsecondary	428	411	-4	30	82k	61k
Biochemists and Biophysicists	414	388	-6	25	84k	68k
Education Teachers, Postsecondary	410	404	-1	31	76k	51k
Nursing Instructors and Teachers, Postsecondary	404	452	12	38	77k	56k
Postsecondary Teachers, All Other	378	371	-2	28	72k	44k
Business Teachers, Postsecondary	309	313	1	24	81k	57k
Biological Science Teachers, Postsecondary	303	311	3	25	79k	59k
Note: Wage Data as of 2023

Table Narrative

Rule 1: Offset the headers from the body

The gt() package does a good job of offsetting the headers from the body of the table in a very clean, visually friendly way. Additionally, I renamed each variable to easily understood terminology and units using cols_labels(). I used tab_spanner to add a spanner header over median and entry wages, in order to reduce redundancy.

Rule 2: Use subtle dividers instead of heavy gridlines

I used the gt_theme_espn() which replaces the horizontal gridlines with an alternating shading. I prefer this style of tables when I am making them, because it removes excess lines while still allowing the reader’s eyes to discern each row individually. The theme also tastefully bolds and capitalizes the header labels.

Rule 3: Right-align numbers and headers

The default alignment in gt() is set to “auto”, which aligns columns by their data type. This has left-aligned the Occupation column because its text, and right-aligned the remaining columns as they are numeric. If we wanted to change this, we could use cols_align(), and specifically align columns as we prefer.

Rule 4: Left-align text and header

The gt_theme_espn() has left aligned our header for us, but without the theme, we have tab_options(heading.align = “left”) which would ensure that the table title and subtitle both are left aligned.

Rule 5: Select the appropriate level of precision

The decimals were removed using fmt_number( decimals = 0). I made the decision to display the wages rounded to the thousands, as this seemed to be the most relevant for comparing annual salaries. I also pasted “k” to the end of each number in the same function, which indicates that these wages are in the thousands. While one of the rules is to eliminate unit repetition, it would be potentially confusing to the average reader to see “99” instead of “99k”, even if it was noted in the spanner header. If the audience of this were specifically in the financial fields, I might consider a different display.

Rule 8: Highlight outliers

In order to find the outliers in the growth rates and wages, I took the interquartile range using summary() and IQR(). I then applied 1.5*IQR to the quartiles for each of these variables, and used a comparison operator in tab_styles(rows=) to determine if an observation should be highlighted. I used cell_text() to apply a color (green for positive, red for negative) and bold weight in order to high each outlier value.

Generative AI & Sources:

Tidyverse ggplot2 reference

https://ggplot2.tidyverse.org/reference/index.html

gt Tables

https://gt.rstudio.com/articles/gt.html

ChatGPT was used for some troubleshooting, and to learn remind me of some function usage. Session link below:

https://chatgpt.com/c/67f1c59d-00d8-800d-829f-b331591559b0