Lab 2

Author

Zac Howe

Overview

To complete this lab, you will need:

  • a quarto.pub account (free)
  • a GitHub Copilot account, approved and enabled in posit.cloud

From Brightspace, download lab02.qmd and lab02_table.png. Put both files in the RStudio working directory (the folder where RStudio looks for files). Open lab02.qmd and complete the exercises. Below is an annotated guide to assist you. There is also a video in the Brightspace Todo section for this module.

We will be using a subset of data from the Maine Center for Workforce Information for this lab. Let’s start by loading the tidyverse family of packages and gt (for making pretty tables), then read in the data. We’ll use the message: false chunk option to suppress the startup messages from loading tidyverse and gt.

library(tidyverse)
library(gt)
me_workforce <- read_csv("https://jsuleiman.com/datasets/maine_workforce_data.csv")
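
In the .qmd source, the message: false option goes at the top of the code chunk as a `#|` comment line. A minimal sketch of what the chunk body might look like, assuming Quarto's hash-pipe option syntax (in the .qmd file the chunk would open with ```{r} rather than the plain fence shown here):

```r
#| message: false
# The `#|` line above is a Quarto chunk option; plain R treats it as a comment,
# so this block also runs unchanged in the console.
library(tidyverse)
library(gt)
```

The option only affects what appears in the rendered document. The packages load the same way either way.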

Exercises

There are eight exercises in this lab. Grading is shown in the Grading section at the end of the document. We will try to recreate the table shown in lab02_table.png from the data.

Exercise 1

It is always helpful to glimpse our data. The maine.gov website documents the columns, but the details we need are also provided here.

glimpse(me_workforce)
Rows: 144
Columns: 7
$ year               <dbl> 2023, 2023, 2023, 2023, 2023, 2023, 2022, 2022, 202…
$ age_group          <chr> "16-24", "25-34", "35-44", "45-54", "55-64", "65+",…
$ number             <dbl> NA, 164, 182, 159, 200, 328, 128, 159, 184, 156, 19…
$ in_labor_force     <dbl> NA, 139, 152, 119, 136, 61, 81, 129, 151, 124, 125,…
$ employed           <dbl> NA, 135, 149, 116, 134, 59, 73, 125, 148, 121, 123,…
$ unemployed         <dbl> NA, 4, 3, 3, 2, 2, 8, 4, 4, 3, 2, 3, NA, 7, 7, 3, 7…
$ not_in_labor_force <dbl> NA, 25, 30, 40, 64, 267, 47, 30, 33, 32, 73, 266, N…

We can see the following columns:

  • year

  • age_group

  • number - the number of people in that age group, in thousands. All counts below are also in thousands.

  • in_labor_force - the number of people participating in the labor force (i.e., people employed + people unemployed and looking for work).

  • employed - number employed

  • unemployed - number unemployed

  • not_in_labor_force - not actively seeking employment
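
The column definitions above imply two accounting identities: in_labor_force should equal employed + unemployed, and number should equal in_labor_force + not_in_labor_force. The sketch below checks both on three rows copied by hand from the glimpse output. The names sample_rows and checks are illustrative only, and because the counts are rounded to thousands, other rows in the full data may not match exactly.

```r
library(dplyr)

# Sample rows copied from the glimpse() output above (values in thousands).
sample_rows <- tibble(
  year               = c(2023, 2023, 2022),
  age_group          = c("25-34", "65+", "16-24"),
  number             = c(164, 328, 128),
  in_labor_force     = c(139, 61, 81),
  employed           = c(135, 59, 73),
  unemployed         = c(4, 2, 8),
  not_in_labor_force = c(25, 267, 47)
)

# Flag whether each row satisfies the two identities.
checks <- sample_rows |>
  mutate(
    labor_force_ok = in_labor_force == employed + unemployed,
    total_ok       = number == in_labor_force + not_in_labor_force
  )

print(checks)
```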

Looking at the table we are recreating, the participation rate is defined as the number in the labor force divided by the total number of people. That means we will need the following columns: year, age_group, number, and in_labor_force.

Create a tibble named participation_force as a subset of me_workforce that contains only the data we need. We’ll start you out with a partial statement to complete.

# Create a tibble named participation_force as subset of me_workforce.
participation_force <- me_workforce |> 
  select(year, age_group, number, in_labor_force) # complete this select.
print(participation_force)
# A tibble: 144 × 4
    year age_group number in_labor_force
   <dbl> <chr>      <dbl>          <dbl>
 1  2023 16-24         NA             NA
 2  2023 25-34        164            139
 3  2023 35-44        182            152
 4  2023 45-54        159            119
 5  2023 55-64        200            136
 6  2023 65+          328             61
 7  2022 16-24        128             81
 8  2022 25-34        159            129
 9  2022 35-44        184            151
10  2022 45-54        156            124
# ℹ 134 more rows

Exercise 2

Use the mutate function to add a column to participation_force called participation_rate which is in_labor_force / number.

participation_force <- participation_force |> 
  mutate(participation_rate = in_labor_force / number) # a proportion between 0 and 1
print(participation_force)
# A tibble: 144 × 5
    year age_group number in_labor_force participation_rate
   <dbl> <chr>      <dbl>          <dbl>              <dbl>
 1  2023 16-24         NA             NA             NA    
 2  2023 25-34        164            139              0.848
 3  2023 35-44        182            152              0.835
 4  2023 45-54        159            119              0.748
 5  2023 55-64        200            136              0.68 
 6  2023 65+          328             61              0.186
 7  2022 16-24        128             81              0.633
 8  2022 25-34        159            129              0.811
 9  2022 35-44        184            151              0.821
10  2022 45-54        156            124              0.795
# ℹ 134 more rows
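
Note that the first row has NA for both number and in_labor_force, so its participation_rate is also NA. The sketch below reproduces that on three rows copied from the output above. The names rows and rates are illustrative only.

```r
library(dplyr)

# Three rows from the 2023 data; the 16-24 group has missing counts.
rows <- tibble(
  age_group      = c("16-24", "25-34", "65+"),
  number         = c(NA, 164, 328),
  in_labor_force = c(NA, 139, 61)
)

# Division involving NA yields NA, so the missing row stays missing.
rates <- rows |>
  mutate(participation_rate = in_labor_force / number)

print(rates)

# Summaries need na.rm = TRUE once NA values are present.
mean(rates$participation_rate, na.rm = TRUE)
```

This is why blank cells show up for some 16-24 rows later in the lab: the NA values carry through every step.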

Exercise 3

Before we use pivot_wider to create our table, we want to eliminate any data we don’t need. Now that we have calculated participation_rate, create a new tibble called participation_force_chart that contains year, age_group, and participation_rate from participation_force.

# Recompute participation_rate on a 0-100 percentage scale
# (this overwrites the 0-1 proportion from Exercise 2)
participation_force <- participation_force |> 
  mutate(participation_rate = (in_labor_force / number) * 100)

# Keep only the columns needed for the table
participation_force_chart <- participation_force |> 
  select(year, age_group, participation_rate)

# Print participation_force to check the rescaled rates
print(participation_force)
# A tibble: 144 × 5
    year age_group number in_labor_force participation_rate
   <dbl> <chr>      <dbl>          <dbl>              <dbl>
 1  2023 16-24         NA             NA               NA  
 2  2023 25-34        164            139               84.8
 3  2023 35-44        182            152               83.5
 4  2023 45-54        159            119               74.8
 5  2023 55-64        200            136               68  
 6  2023 65+          328             61               18.6
 7  2022 16-24        128             81               63.3
 8  2022 25-34        159            129               81.1
 9  2022 35-44        184            151               82.1
10  2022 45-54        156            124               79.5
# ℹ 134 more rows

Exercise 4

Now we can use pivot_wider to reshape the tibble so its layout matches the table we are recreating.

Hints

  • In pivot_wider use the parameter names_from = age_group to create a column for each age_group.

  • In pivot_wider use the parameter values_from = participation_rate to fill the age_group categories with the participation rates we calculated earlier.

  • glimpse the participation_force_chart to see where you are.
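
The hints above can be sketched on a four-row toy tibble before running them on the real data. The values are copied from the 2022 and 2023 rows shown earlier, and the names long and wide are illustrative only.

```r
library(dplyr)
library(tidyr)

# Long format: one row per year and age group.
long <- tibble(
  year               = c(2023, 2023, 2022, 2022),
  age_group          = c("25-34", "65+", "25-34", "65+"),
  participation_rate = c(84.8, 18.6, 81.1, 19.1)
)

# Wide format: one row per year, one column per age group.
wide <- long |>
  pivot_wider(names_from = age_group, values_from = participation_rate)

print(wide)
```

Each distinct value of age_group becomes a column name, and year, the only remaining column, identifies the rows.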

# Use pivot_wider to create a column for each age_group and
# fill those columns with the participation rates calculated earlier
participation_force_chart <- participation_force_chart |> 
  pivot_wider(names_from = age_group, values_from = participation_rate)
print(participation_force_chart)
# A tibble: 24 × 7
    year `16-24` `25-34` `35-44` `45-54` `55-64` `65+`
   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl>
 1  2023    NA      84.8    83.5    74.8    68    18.6
 2  2022    63.3    81.1    82.1    79.5    63.1  19.1
 3  2021    NA      83.4    84.3    80.5    67.0  20  
 4  2020    NA      85.0    80.6    83.2    68.1  20.4
 5  2019    67.6    86.8    84.2    83.2    67.6  18.0
 6  2018    64.5    84.0    85.3    83.8    70.0  17.7
 7  2017    61.3    83.8    84.1    81.2    73.3  19.7
 8  2016    62.3    80      83.6    82.2    68.8  20.8
 9  2015    59.4    79.3    80.2    82.7    67.7  21.2
10  2014    64.6    82.4    82.6    80.5    69.2  21.4
# ℹ 14 more rows

Exercise 5

You might still need to filter the data to restrict it to the last five years of data. If you haven’t already done so, do it in the code chunk below.

# Filter the data to 2019 and later (the last five years)
participation_force_chart <- participation_force_chart |> 
  filter(year >= 2019)
# Print to check results
print(participation_force_chart)
# A tibble: 5 × 7
   year `16-24` `25-34` `35-44` `45-54` `55-64` `65+`
  <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl>
1  2023    NA      84.8    83.5    74.8    68    18.6
2  2022    63.3    81.1    82.1    79.5    63.1  19.1
3  2021    NA      83.4    84.3    80.5    67.0  20  
4  2020    NA      85.0    80.6    83.2    68.1  20.4
5  2019    67.6    86.8    84.2    83.2    67.6  18.0
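
The cutoff of 2019 is hard-coded above. As an alternative sketch, the cutoff can be derived from the data so the filter keeps working if a newer year is added. This assumes one row per year, as in the pivoted tibble. The names years and last_five are illustrative only.

```r
library(dplyr)

# Stand-in for the year column of the pivoted tibble.
years <- tibble(year = 2014:2023)

# Keep the most recent five years, newest first.
last_five <- years |>
  filter(year >= max(year) - 4) |>
  arrange(desc(year))

print(last_five)
```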

Exercise 6

We haven’t gone into depth on using gt for pretty tables, so let’s take a look at what the default formatting offers. Note: this table won’t have data until you complete the prior exercises.

# Use gt(participation_force_chart) to explore the default table formatting
gt(participation_force_chart)
year 16-24 25-34 35-44 45-54 55-64 65+
2023 NA 84.75610 83.51648 74.84277 68.00000 18.59756
2022 63.28125 81.13208 82.06522 79.48718 63.13131 19.14894
2021 NA 83.43949 84.30233 80.53691 66.98565 20.00000
2020 NA 84.96732 80.60606 83.22148 68.05556 20.40134
2019 67.58621 86.84211 84.21053 83.22981 67.59259 17.95775

There are three formats we need to apply.

  1. The participation-rate values should be percentages with one decimal place.
  2. NA values should appear as blanks.
  3. The table should be captioned.

Exercise 7

The caption must be created within the gt function: gt(caption = "insert caption name here"). The other two functions can be piped (e.g., gt() |> function_name()).

  • For percent formatting, use fmt_percent(columns = c("16-24", ..., "list all applicable columns here"), decimals = 1)

  • For replacing NA with blank, use sub_missing(missing_text = "")
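
Keep in mind that gt's percent formatter multiplies values by 100 by default, so it expects proportions such as 0.848 rather than values already on a 0-100 scale. The sketch below uses vec_fmt_percent(), the vector counterpart of fmt_percent() in gt, to compare the two cases. The variable names are illustrative only.

```r
library(gt)

proportion <- 139 / 164          # about 0.848, a proportion
percentage <- proportion * 100   # about 84.8, already on a 0-100 scale

# Default behavior: the value is multiplied by 100 before formatting.
from_proportion <- vec_fmt_percent(proportion, decimals = 1, output = "plain")

# Passing a 0-100 value with the default scaling multiplies by 100 a second time.
scaled_twice <- vec_fmt_percent(percentage, decimals = 1, output = "plain")

# scale_values = FALSE formats the 0-100 value as-is.
not_rescaled <- vec_fmt_percent(percentage, decimals = 1,
                                scale_values = FALSE, output = "plain")

print(c(from_proportion, scaled_twice, not_rescaled))
```

If the rates were multiplied by 100 earlier, either pass scale_values = FALSE to fmt_percent() or keep the rates as proportions.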

# Set the caption inside gt(), then pipe the formatting functions
participation_force_chart |> 
  gt(caption = "Maine Workforce Participation Rates by Age Group") |> 
  # participation_rate was already multiplied by 100 in Exercise 3, so
  # scale_values = FALSE keeps fmt_percent() from multiplying by 100 again
  fmt_percent(columns = c("16-24", "25-34", "35-44", "45-54", "55-64", "65+"),
              decimals = 1, scale_values = FALSE) |> 
  sub_missing(missing_text = "")
Maine Workforce Participation Rates by Age Group
year  16-24  25-34  35-44  45-54  55-64  65+
2023         84.8%  83.5%  74.8%  68.0%  18.6%
2022  63.3%  81.1%  82.1%  79.5%  63.1%  19.1%
2021         83.4%  84.3%  80.5%  67.0%  20.0%
2020         85.0%  80.6%  83.2%  68.1%  20.4%
2019  67.6%  86.8%  84.2%  83.2%  67.6%  18.0%

Exercise 8

To submit your lab:

  • Change the author name to your name in the YAML portion at the top of this document
  • Render your document to HTML and publish it to RPubs.
  • Submit the link to your RPubs document in the Brightspace comments section for this lab.
  • Click on the “Add a File” button and upload your .qmd file for this assignment to Brightspace.

Grading

Exercise Points
Exercise 1 10
Exercise 2 10
Exercise 3 10
Exercise 4 10
Exercise 5 10
Exercise 6 10
Exercise 7 10
Exercise 8 30
Total 100