Lab 2

Author

Anna StPierre

Overview

To complete this lab, you will need:

  • a quarto.pub account (free)
  • a GitHub Copilot account, approved and enabled in posit.cloud

From Brightspace, download lab02.qmd and lab02_table.png, and put both files in your RStudio working directory. Open lab02.qmd and complete the exercises. The annotated guide below will assist you, and there is also a video in the Brightspace Todo section for this module.

We will be using a subset of data from the Maine Center for Workforce Information for this lab, so let’s start by loading the tidyverse family of packages and gt (for making pretty tables), then reading in the data. We’ll use the message: false option to suppress the startup messages from loading tidyverse and gt.

library(tidyverse)
library(gt)
me_workforce <- read_csv("https://jsuleiman.com/datasets/maine_workforce_data.csv")

Exercises

There are eight exercises in this lab. Grading is shown in Section 4 at the end of the document. We will be attempting to recreate the table in lab02_table.png from the data.

Exercise 1

It is always helpful to glimpse our data. The maine.gov website documents the columns in detail, but the essentials are provided here.

glimpse(me_workforce)
Rows: 144
Columns: 7
$ year               <dbl> 2023, 2023, 2023, 2023, 2023, 2023, 2022, 2022, 202…
$ age_group          <chr> "16-24", "25-34", "35-44", "45-54", "55-64", "65+",…
$ number             <dbl> NA, 164, 182, 159, 200, 328, 128, 159, 184, 156, 19…
$ in_labor_force     <dbl> NA, 139, 152, 119, 136, 61, 81, 129, 151, 124, 125,…
$ employed           <dbl> NA, 135, 149, 116, 134, 59, 73, 125, 148, 121, 123,…
$ unemployed         <dbl> NA, 4, 3, 3, 2, 2, 8, 4, 4, 3, 2, 3, NA, 7, 7, 3, 7…
$ not_in_labor_force <dbl> NA, 25, 30, 40, 64, 267, 47, 30, 33, 32, 73, 266, N…

We can see the following columns:

  • year

  • age_group

  • number - the number of people in that age group, in thousands. All of the columns below are also in thousands.

  • in_labor_force - the number of people participating in the labor force (i.e., people employed + people unemployed and looking for work).

  • employed - number employed

  • unemployed - number unemployed

  • not_in_labor_force - not actively seeking employment

The participation rate is defined as the number of people in the labor force divided by the total number of people. Looking at the table we are recreating, we therefore need the following columns: year, age_group, number, and in_labor_force.

Create a tibble named participation_force as a subset of me_workforce that contains only the data we need. We’ll start you out with a partial statement to complete.

participation_force <- me_workforce |> 
  select(year, age_group, number, in_labor_force)

Exercise 2

Use the mutate function to add a column to participation_force called participation_rate which is in_labor_force / number.

participation_force <- participation_force |> 
  mutate(participation_rate = in_labor_force / number)
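As a quick sanity check on the calculation, the sketch below applies the same mutate to a two-row toy tibble. The name toy and the choice of rows are assumptions for illustration only. The values are copied from the first two rows of the glimpse output above (2023, ages 16-24 and 25-34). It also shows that a missing number carries through as an NA rate.

```r
library(dplyr)

# Toy rows copied from the glimpse() output above (values in thousands).
# `toy` is a hypothetical name used only for this illustration.
toy <- tibble(
  age_group      = c("16-24", "25-34"),
  number         = c(NA, 164),
  in_labor_force = c(NA, 139)
)

toy_rate <- toy |>
  mutate(participation_rate = in_labor_force / number)

# The 16-24 row stays NA; the 25-34 row is 139 / 164
toy_rate
```

The 25-34 result, 139 / 164, should agree with the 2023 value that appears in the gt table later in the lab.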

Exercise 3

Before we pivot_wider to create our table, we want to eliminate any data we don’t need for our chart. Now that we have calculated participation_rate, create a new tibble called participation_force_chart that contains year, age_group, and participation_rate from participation_force.

participation_force_chart <- participation_force |> 
  select(year, age_group, participation_rate)

Exercise 4

Now we can use pivot_wider to make the tibble contain similar data to our chart.

Hints

  • In pivot_wider use the parameter names_from = age_group to create a column for each age_group.

  • In pivot_wider use the parameter values_from = participation_rate to fill the age_group categories with the participation rates we calculated earlier.

  • glimpse the participation_force_chart to check your progress.

participation_force_chart <- participation_force_chart |> 
  pivot_wider(names_from = age_group, values_from = participation_rate)
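To see the reshaping on a small scale, here is a minimal sketch using a made-up long tibble. The names toy_long and toy_wide and the rounded rates are illustrative assumptions, not the lab data. Each distinct age_group becomes its own column, and each year collapses to a single row.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format data: one row per year/age_group pair
toy_long <- tibble(
  year               = c(2022, 2022, 2023, 2023),
  age_group          = c("16-24", "25-34", "16-24", "25-34"),
  participation_rate = c(0.63, 0.81, NA, 0.85)
)

# One column per age group, one row per year
toy_wide <- toy_long |>
  pivot_wider(names_from = age_group, values_from = participation_rate)

toy_wide
```

The result has two rows (2022 and 2023) and the columns year, 16-24, and 25-34. The NA stays in place in the 2023 row.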

Exercise 5

You might still need to filter the data to restrict it to the last five years of data. If you haven’t already done so, do it in the code chunk below.

participation_force_chart <- participation_force_chart |>
  filter(year >= max(year) - 4)
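The condition year >= max(year) - 4 keeps five years because both endpoints are included. A small sketch on a made-up range of years (toy_years is a hypothetical name) makes the count easy to verify:

```r
library(dplyr)

# Hypothetical tibble covering nine consecutive years
toy_years <- tibble(year = 2015:2023)

# max(year) is 2023, so the cutoff is 2019 and both ends are kept
recent <- toy_years |>
  filter(year >= max(year) - 4)

recent$year
```

Subtracting 5 instead of 4 would keep six years, which is a common off-by-one mistake with this kind of filter.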

Exercise 6

We haven’t gone into depth on using gt for pretty tables, so let’s take a look at what the default formatting offers. Note: this table won’t have data until you complete the prior exercises.

participation_force_chart |> gt()
year  16-24      25-34      35-44      45-54      55-64      65+
2023  NA         0.8475610  0.8351648  0.7484277  0.6800000  0.1859756
2022  0.6328125  0.8113208  0.8206522  0.7948718  0.6313131  0.1914894
2021  NA         0.8343949  0.8430233  0.8053691  0.6698565  0.2000000
2020  NA         0.8496732  0.8060606  0.8322148  0.6805556  0.2040134
2019  0.6758621  0.8684211  0.8421053  0.8322981  0.6759259  0.1795775

There are three formats we need to apply.

  1. The participation-rate values should be percentages with one decimal place.
  2. NA values should appear as blanks.
  3. The table should be captioned.
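To preview what the first two formats should look like, the sketch below mimics them in base R on a few values from the 2023 and 2022 rows of the table above. This is only an approximation for checking your expectations. It does not use gt, and the names rates and labels are assumptions for illustration.

```r
# A few rates taken from the table above, including one missing value
rates <- c(NA, 0.8475610, 0.6328125, 0.1859756)

# Percent with one decimal place; NA shown as a blank string
labels <- ifelse(is.na(rates), "", sprintf("%.1f%%", 100 * rates))

labels
# "" "84.8%" "63.3%" "18.6%"
```

In the finished gt table, the 2023 row should therefore begin with a blank cell followed by 84.8%.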

Exercise 7

The caption must be created within the gt() function: gt(caption = "insert caption name here"). The other two functions can be piped, e.g., gt() |> function_name().

  • For percent formatting, use fmt_percent(columns = c("16-24", ..., "list all applicable columns here"), decimals = 1)

  • For replacing NA with blank, use sub_missing(missing_text = "")

participation_force_chart |>
  gt(caption = "Participation Rate by Age Group in Maine Workforce") |>
  fmt_percent(columns = c("16-24", "25-34", "35-44", "45-54", "55-64", "65+"), decimals = 1) |>
  sub_missing(missing_text = "")

Exercise 8

To submit your lab:

  • Change the author name to your name in the YAML portion at the top of this document
  • Render your document to html and publish it to RPubs.
  • Submit the link to your RPubs document in the Brightspace comments section for this lab.
  • Click on the “Add a File” button and upload your .qmd file for this assignment to Brightspace.

Grading

Exercise    Points
Exercise 1  10
Exercise 2  10
Exercise 3  10
Exercise 4  10
Exercise 5  10
Exercise 6  10
Exercise 7  10
Exercise 8  30
Total       100