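library(tidyverse)  # readr, dplyr, tidyr, and friends
library(gt)         # for making pretty tables
# Read the Maine workforce subset from the CSV URL
me_workforce <- read_csv("https://jsuleiman.com/datasets/maine_workforce_data.csv")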
Lab 2
Overview
To complete this lab, you will need:
- a quarto.pub account (free)
- a GitHub Copilot account, approved and enabled in posit.cloud
From Brightspace, download lab02.qmd and lab02_table.png and put both files in the directory where RStudio is looking for files (your working directory). Open lab02.qmd and complete the exercises. Below is an annotated guide to assist you. There is also a video in the Brightspace Todo section for this module.
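If you are not sure which folder that is, a quick check from the R console can help. This is a small sketch using base R's getwd() and file.exists(); the two file names are the ones listed above.

```r
# Print the folder R (and RStudio) is currently reading files from
getwd()

# One TRUE/FALSE per file: TRUE means the file is in that folder
file.exists(c("lab02.qmd", "lab02_table.png"))
```

If either value is FALSE, move the file into the folder printed by getwd(), or point RStudio at the right folder with Session > Set Working Directory.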
We will be using a subset of data from the Maine Center for Workforce Information for this lab, so let's start by loading the tidyverse family of packages and gt (for making pretty tables), and then read in the data. We'll use the message: false chunk option to suppress the startup messages printed when loading tidyverse and gt.
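In a .qmd file, chunk options such as this one go at the top of the chunk as #| comment lines. A minimal sketch of the setup chunk is below; in the .qmd the fence opens with {r} rather than r. Outside Quarto these lines still run as ordinary R, because R treats #| lines as comments.

```r
#| message: false
# Attach the tidyverse and gt; the chunk option above hides the
# startup messages these packages print when they load
library(tidyverse)
library(gt)
```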
Exercises
There are eight exercises in this lab; grading is shown in Section 4 at the end of the document. We will attempt to recreate the table in lab02_table.png from the data.
Exercise 1
It is always helpful to glimpse our data first. Full details are on the maine.gov website, but the essentials are provided here.
glimpse(me_workforce)
Rows: 144
Columns: 7
$ year <dbl> 2023, 2023, 2023, 2023, 2023, 2023, 2022, 2022, 202…
$ age_group <chr> "16-24", "25-34", "35-44", "45-54", "55-64", "65+",…
$ number <dbl> NA, 164, 182, 159, 200, 328, 128, 159, 184, 156, 19…
$ in_labor_force <dbl> NA, 139, 152, 119, 136, 61, 81, 129, 151, 124, 125,…
$ employed <dbl> NA, 135, 149, 116, 134, 59, 73, 125, 148, 121, 123,…
$ unemployed <dbl> NA, 4, 3, 3, 2, 2, 8, 4, 4, 3, 2, 3, NA, 7, 7, 3, 7…
$ not_in_labor_force <dbl> NA, 25, 30, 40, 64, 267, 47, 30, 33, 32, 73, 266, N…
We can see the following columns:
- year
- age_group
- number: the number of people in that age group (in thousands); all numbers below are also in thousands.
- in_labor_force: the number of people participating in the labor force (i.e., people employed + people unemployed and looking for work).
- employed: number employed
- unemployed: number unemployed
- not_in_labor_force: number not in the labor force (not actively seeking employment)
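The definitions above imply two accounting identities: in_labor_force = employed + unemployed, and number = in_labor_force + not_in_labor_force. The sketch below checks both on a hand-typed copy of the first ten rows of the glimpse() output above. The year and age group for rows 7 to 10 are inferred by matching their rates to the 2022 row of the table in Exercise 6, and the names workforce_sample, lf_gap, and pop_gap are made up for this illustration.

```r
library(dplyr)
library(tibble)

# Hand-typed from glimpse(me_workforce): all 2023 age groups, then
# 2022 16-24 through 45-54. Values are in thousands.
workforce_sample <- tibble(
  year = c(rep(2023, 6), rep(2022, 4)),
  age_group = c("16-24", "25-34", "35-44", "45-54", "55-64", "65+",
                "16-24", "25-34", "35-44", "45-54"),
  number = c(NA, 164, 182, 159, 200, 328, 128, 159, 184, 156),
  in_labor_force = c(NA, 139, 152, 119, 136, 61, 81, 129, 151, 124),
  employed = c(NA, 135, 149, 116, 134, 59, 73, 125, 148, 121),
  unemployed = c(NA, 4, 3, 3, 2, 2, 8, 4, 4, 3),
  not_in_labor_force = c(NA, 25, 30, 40, 64, 267, 47, 30, 33, 32)
)

# A gap of 0 means the identity holds exactly for that row
gaps <- workforce_sample |>
  mutate(
    lf_gap = in_labor_force - (employed + unemployed),
    pop_gap = number - (in_labor_force + not_in_labor_force)
  )

gaps |> select(year, age_group, lf_gap, pop_gap)
```

In this slice pop_gap is 0 for every complete row, while lf_gap is -1 for 2022 35-44 (151 versus 148 + 4). Because the figures are reported in thousands, a mismatch this small is most likely rounding, which is one reason to use in_labor_force directly rather than rebuilding it from its parts.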
Looking at the table we want to recreate, and knowing that the participation rate is defined as the number in the labor force divided by the total number of people, we will need the following columns: year, age_group, number, and in_labor_force.
Create a tibble named participation_force as a subset of me_workforce that contains only the data we need. We'll start you out with a partial statement to complete.
# Keep only the columns needed to compute the participation rate
participation_force <- me_workforce |>
  select(year, age_group, number, in_labor_force)
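Because these four columns sit next to each other at the start of me_workforce (see the glimpse() output), tidyselect's range operator year:in_labor_force selects the same set. The sketch below runs on a two-row stand-in, workforce_rows (a hypothetical name; values copied from the glimpse() output), so it works without downloading the CSV.

```r
library(dplyr)
library(tibble)

# Two 2023 rows copied from glimpse(me_workforce), all seven columns in order
workforce_rows <- tibble(
  year = c(2023, 2023),
  age_group = c("25-34", "35-44"),
  number = c(164, 182),
  in_labor_force = c(139, 152),
  employed = c(135, 149),
  unemployed = c(4, 3),
  not_in_labor_force = c(25, 30)
)

# Explicit list of columns, as in the exercise
by_name <- workforce_rows |> select(year, age_group, number, in_labor_force)

# Range selection: every column from year through in_labor_force
by_range <- workforce_rows |> select(year:in_labor_force)

identical(by_name, by_range)
```

The range is shorter to type, but the explicit list keeps working if the column order in the CSV ever changes.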
Exercise 2
Use the mutate function to add a column to participation_force called participation_rate, which is in_labor_force / number.
# Participation rate = people in the labor force / total people, per row
participation_force <- participation_force |>
  mutate(participation_rate = in_labor_force / number)
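As a check on the formula, take two 2023 rows from the glimpse() output. For 25-34, 139 / 164 ≈ 0.8476, which matches the 2023 25-34 cell in the table further down. The 16-24 row has NA for both inputs, and dividing NA by NA gives NA, which is why that cell ends up empty. The name rate_demo is just for this sketch.

```r
library(dplyr)
library(tibble)

# 2023 values for two age groups, copied from glimpse(me_workforce) (thousands)
rate_demo <- tibble(
  year = c(2023, 2023),
  age_group = c("16-24", "25-34"),
  number = c(NA, 164),
  in_labor_force = c(NA, 139)
)

rate_demo <- rate_demo |>
  mutate(participation_rate = in_labor_force / number)

# NA for 16-24; about 0.848 for 25-34
rate_demo$participation_rate
```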
Exercise 3
Before we use pivot_wider to create our table, we want to eliminate any data we don't need for our chart. Now that we have calculated participation_rate, create a new tibble called participation_force_chart that contains year, age_group, and participation_rate from participation_force.
# Keep only the columns that appear in the final table
participation_force_chart <- participation_force |>
  select(year, age_group, participation_rate)
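Exercises 2 and 3 can also be collapsed into one step. mutate() has a .keep argument (dplyr 1.0.0 and later), and .keep = "unused" drops the columns that were used to compute the new one. This is an optional alternative, sketched on a small hand-typed input named inputs (hypothetical); the two-step version in the exercises is easier to check as you go.

```r
library(dplyr)
library(tibble)

# 2023 inputs for two age groups, copied from glimpse(me_workforce)
inputs <- tibble(
  year = c(2023, 2023),
  age_group = c("25-34", "35-44"),
  number = c(164, 182),
  in_labor_force = c(139, 152)
)

# number and in_labor_force feed the calculation, so .keep = "unused"
# removes them and leaves year, age_group, participation_rate
chart_ready <- inputs |>
  mutate(participation_rate = in_labor_force / number, .keep = "unused")

names(chart_ready)
```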
Exercise 4
Now we can use pivot_wider to reshape the tibble so it has the same layout as our chart: one row per year and one column per age group.
Hints
- In pivot_wider, use the parameter names_from = age_group to create a column for each age_group.
- In pivot_wider, use the parameter values_from = participation_rate to fill the age_group columns with the participation rates we calculated earlier.
- Use glimpse(participation_force_chart) to check your progress.
# One row per year, one column per age group
participation_force_chart <- participation_force_chart |>
  pivot_wider(names_from = age_group, values_from = participation_rate)
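To see what pivot_wider() does to the shape, here is a sketch on a small long-format tibble built from rates in the table in Exercise 6 (2022 and 2023, three age groups only). long_rates and wide_rates are hypothetical, cut-down stand-ins for participation_force_chart before and after the pivot.

```r
library(tibble)
library(tidyr)

# Long format: one row per year and age group (rates from the Exercise 6 table)
long_rates <- tibble(
  year = c(2023, 2023, 2023, 2022, 2022, 2022),
  age_group = c("25-34", "35-44", "45-54", "25-34", "35-44", "45-54"),
  participation_rate = c(0.8475610, 0.8351648, 0.7484277,
                         0.8113208, 0.8206522, 0.7948718)
)

# Wide format: one row per year, one column per age group
wide_rates <- long_rates |>
  pivot_wider(names_from = age_group, values_from = participation_rate)

wide_rates
```

Six long rows become two wide rows, and the new column names come from the values of age_group. Because those names start with digits and contain hyphens, they have to be quoted in code, which is why Exercise 7 lists them as strings inside c().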
Exercise 5
You might still need to filter the data to restrict it to the last five years. If you haven't already done so, do it in the code chunk below.
# Keep the latest year and the four years before it
participation_force_chart <- participation_force_chart |>
  filter(year >= max(year) - 4)
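Inside filter(), max(year) is computed over the whole column, so the condition keeps the most recent year and the four before it. A minimal sketch on a made-up run of years (year_demo is hypothetical, not the lab data):

```r
library(dplyr)
library(tibble)

# Hypothetical one-column tibble holding consecutive years
year_demo <- tibble(year = 2010:2023)

# Latest year and the four before it: five years in total
last_five <- year_demo |> filter(year >= max(year) - 4)

last_five$year
# → 2019 2020 2021 2022 2023
```

This relies on every year being present; if one were missing, the filter would return fewer than five rows. On the wide table, which has one row per year, slice_max(year, n = 5) is an alternative that always returns the five most recent years that exist in the data.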
Exercise 6
We haven't gone into depth on using gt for pretty tables, so let's take a look at what the default formatting offers. Note: this table won't have data until you complete the prior exercises.
participation_force_chart |> gt()
year | 16-24 | 25-34 | 35-44 | 45-54 | 55-64 | 65+ |
---|---|---|---|---|---|---|
2023 | NA | 0.8475610 | 0.8351648 | 0.7484277 | 0.6800000 | 0.1859756 |
2022 | 0.6328125 | 0.8113208 | 0.8206522 | 0.7948718 | 0.6313131 | 0.1914894 |
2021 | NA | 0.8343949 | 0.8430233 | 0.8053691 | 0.6698565 | 0.2000000 |
2020 | NA | 0.8496732 | 0.8060606 | 0.8322148 | 0.6805556 | 0.2040134 |
2019 | 0.6758621 | 0.8684211 | 0.8421053 | 0.8322981 | 0.6759259 | 0.1795775 |
There are three formats we need to apply:
- The participation-rate values should be percentages with one decimal place.
- NA values should appear as blanks.
- The table should be captioned.
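To make the first two formats concrete, the sketch below reproduces them in base R for the 2023 row of the default table above: multiply by 100, round to one decimal place, append a percent sign, and turn NA into an empty string. It only illustrates the intended result; it is not how gt implements fmt_percent() or sub_missing().

```r
# 2023 participation rates from the default gt table (16-24 is NA)
rates_2023 <- c(NA, 0.8475610, 0.8351648, 0.7484277, 0.6800000, 0.1859756)

# Percent with one decimal place; NA becomes a blank string
formatted <- ifelse(is.na(rates_2023), "",
                    sprintf("%.1f%%", 100 * rates_2023))

formatted
# → "" "84.8%" "83.5%" "74.8%" "68.0%" "18.6%"
```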
Exercise 7
The caption must be created within the function gt(caption = "insert caption name here"). The other two functions can be piped (e.g., gt() |> function_name()).
- For percent formatting, use fmt_percent(columns = c("16-24", ..., "list all applicable columns here"), decimals = 1).
- For replacing NA with blank, use sub_missing(missing_text = "").
participation_force_chart |>
  gt(caption = "Participation Rate by Age Group in Maine Workforce") |>
  fmt_percent(columns = c("16-24", "25-34", "35-44", "45-54", "55-64", "65+"),
              decimals = 1) |>
  sub_missing(missing_text = "")
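Listing all six age-group columns works, but since every column except year should be a percentage, a tidyselect expression such as -year should select the same set. This assumes a gt version whose columns arguments accept tidyselect expressions (recent releases do). The sketch uses demo_chart, a hypothetical one-row stand-in holding the 2023 values from the default table.

```r
library(tibble)
library(gt)

# One-row stand-in for participation_force_chart (2023 values from the table)
demo_chart <- tibble(
  year = 2023,
  `16-24` = NA_real_,
  `25-34` = 0.8475610,
  `35-44` = 0.8351648,
  `45-54` = 0.7484277,
  `55-64` = 0.6800000,
  `65+` = 0.1859756
)

demo_table <- demo_chart |>
  gt(caption = "Participation Rate by Age Group in Maine Workforce") |>
  # every column except year becomes a percentage with one decimal place
  fmt_percent(columns = -year, decimals = 1) |>
  # show NA cells as blanks instead of gt's default missing-value marker
  sub_missing(missing_text = "")

demo_table
```

Selecting by exclusion keeps working if more age groups are added later, while the explicit list makes it obvious exactly which columns are formatted.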
Exercise 8
To submit your lab:
- Change the author name to your name in the YAML portion at the top of this document
- Render your document to HTML and publish it to RPubs.
- Submit the link to your RPubs document in the Brightspace comments section for this lab.
- Click on the “Add a File” button and upload your .qmd file for this assignment to Brightspace.
Grading
Exercise | Points |
---|---|
Exercise 1 | 10 |
Exercise 2 | 10 |
Exercise 3 | 10 |
Exercise 4 | 10 |
Exercise 5 | 10 |
Exercise 6 | 10 |
Exercise 7 | 10 |
Exercise 8 | 30 |
Total | 100 |