library(tidyverse)
library(gt)
me_workforce <- read_csv("https://jsuleiman.com/datasets/maine_workforce_data.csv")
Lab 2
Overview
To complete this lab, you will need:
- a quarto.pub account (free)
- a GitHub Copilot account, approved and enabled in posit.cloud
From Brightspace, download lab02.qmd and lab02_table.png and put both files in the directory where RStudio looks for files. Open the lab02.qmd file and complete the exercises. Below is an annotated guide to assist you. There is also a video in the Brightspace Todo section for this module.
We will be using a subset of data from the Maine Center for Workforce Information for this lab, so let’s start by loading the tidyverse family of packages and gt (for making pretty tables), then reading in the data. We’ll use the message: false option to suppress the output messages from loading tidyverse and gt.
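As a rough sketch of how the message: false option is usually written, the chunk option goes on a `#|` comment line at the top of the code chunk in the .qmd file. The body below is what would sit inside the chunk; the layout is an assumption about how lab02.qmd is set up, not a copy of it.

```r
#| message: false
# The line above is a Quarto chunk option. It only takes effect inside a code
# chunk of a .qmd file; in a plain R script it is an ordinary comment.
library(tidyverse)
library(gt)
```

With this option set, the rendered document still runs the chunk but leaves out the package start-up messages.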
Exercises
There are eight exercises in this lab. Grading is shown in Section 4 at the end of the document. We will be attempting to recreate this table from the data.
Exercise 1
It is always helpful to glimpse our data. We can refer to the maine.gov website for details but they will be provided here.
glimpse(me_workforce)
Rows: 144
Columns: 7
$ year <dbl> 2023, 2023, 2023, 2023, 2023, 2023, 2022, 2022, 202…
$ age_group <chr> "16-24", "25-34", "35-44", "45-54", "55-64", "65+",…
$ number <dbl> NA, 164, 182, 159, 200, 328, 128, 159, 184, 156, 19…
$ in_labor_force <dbl> NA, 139, 152, 119, 136, 61, 81, 129, 151, 124, 125,…
$ employed <dbl> NA, 135, 149, 116, 134, 59, 73, 125, 148, 121, 123,…
$ unemployed <dbl> NA, 4, 3, 3, 2, 2, 8, 4, 4, 3, 2, 3, NA, 7, 7, 3, 7…
$ not_in_labor_force <dbl> NA, 25, 30, 40, 64, 267, 47, 30, 33, 32, 73, 266, N…
We can see the following columns:
- year
- age_group
- number: the number of people in that age group (in thousands); all numbers below are also in thousands.
- in_labor_force: the number of people participating in the labor force (i.e., people employed + people unemployed and looking for work).
- employed: the number employed
- unemployed: the number unemployed
- not_in_labor_force: the number not actively seeking employment
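The column definitions can be sanity-checked against the data. The sketch below retypes five 2023 rows from the glimpse output rather than downloading the file, and the names sample_rows, checks, labor_force_ok, and population_ok are made up for this illustration.

```r
library(dplyr)

# Five 2023 rows copied from the glimpse() output (values in thousands).
sample_rows <- tibble(
  age_group          = c("25-34", "35-44", "45-54", "55-64", "65+"),
  number             = c(164, 182, 159, 200, 328),
  in_labor_force     = c(139, 152, 119, 136, 61),
  employed           = c(135, 149, 116, 134, 59),
  unemployed         = c(4, 3, 3, 2, 2),
  not_in_labor_force = c(25, 30, 40, 64, 267)
)

# Check the two identities implied by the column definitions.
checks <- sample_rows |>
  mutate(
    labor_force_ok = employed + unemployed == in_labor_force,
    population_ok  = in_labor_force + not_in_labor_force == number
  )

print(checks)
```

Both checks hold for these five rows. Because the figures are rounded to thousands, other rows can be off by one: in the 2022 row for ages 35-44, employed (148) plus unemployed (4) is 152, while in_labor_force is 151.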
Looking at the table we are trying to recreate, and since the participation rate is defined as the number in the labor force divided by the total number of people, we know we will need the following columns: year, age_group, number, and in_labor_force.
Create a tibble named participation_force as a subset of me_workforce that contains only the data we need. We’ll start you out with a partial statement to complete.
# Create a tibble named participation_force as a subset of me_workforce.
participation_force <- me_workforce |>
  select(year, age_group, number, in_labor_force) # complete this select.
print(participation_force)
# A tibble: 144 × 4
year age_group number in_labor_force
<dbl> <chr> <dbl> <dbl>
1 2023 16-24 NA NA
2 2023 25-34 164 139
3 2023 35-44 182 152
4 2023 45-54 159 119
5 2023 55-64 200 136
6 2023 65+ 328 61
7 2022 16-24 128 81
8 2022 25-34 159 129
9 2022 35-44 184 151
10 2022 45-54 156 124
# ℹ 134 more rows
Exercise 2
Use the mutate function to add a column to participation_force called participation_rate, which is in_labor_force / number.
participation_force <- participation_force |>
  mutate(participation_rate = in_labor_force / number) # complete this mutate.
print(participation_force)
# A tibble: 144 × 5
year age_group number in_labor_force participation_rate
<dbl> <chr> <dbl> <dbl> <dbl>
1 2023 16-24 NA NA NA
2 2023 25-34 164 139 0.848
3 2023 35-44 182 152 0.835
4 2023 45-54 159 119 0.748
5 2023 55-64 200 136 0.68
6 2023 65+ 328 61 0.186
7 2022 16-24 128 81 0.633
8 2022 25-34 159 129 0.811
9 2022 35-44 184 151 0.821
10 2022 45-54 156 124 0.795
# ℹ 134 more rows
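The NA in the first row of this output follows from how R handles missing values: dividing NA by NA gives NA, so no special handling is needed inside mutate. A minimal sketch using two rows retyped from the output (the tibble name rates is an assumption for this example):

```r
library(dplyr)

# Two 2023 rows: the 16-24 group has missing counts, the 25-34 group does not.
rates <- tibble(
  age_group      = c("16-24", "25-34"),
  number         = c(NA, 164),
  in_labor_force = c(NA, 139)
) |>
  mutate(participation_rate = in_labor_force / number)

print(rates)
```

The 25-34 row works out to 139 / 164, or about 0.848, matching the tibble above, while the 16-24 row stays NA.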
Exercise 3
Before we use pivot_wider to create our table, we want to eliminate any data we don’t need for our chart. Now that we have calculated participation_rate, create a new tibble called participation_force_chart that contains year, age_group, and participation_rate from participation_force.
# Recompute the participation rate, this time as a percentage (0-100)
participation_force <- participation_force |>
  mutate(participation_rate = (in_labor_force / number) * 100)
# Now, select the necessary columns
participation_force_chart <- participation_force |>
  select(year, age_group, participation_rate)
# Print to check results
print(participation_force)
# A tibble: 144 × 5
year age_group number in_labor_force participation_rate
<dbl> <chr> <dbl> <dbl> <dbl>
1 2023 16-24 NA NA NA
2 2023 25-34 164 139 84.8
3 2023 35-44 182 152 83.5
4 2023 45-54 159 119 74.8
5 2023 55-64 200 136 68
6 2023 65+ 328 61 18.6
7 2022 16-24 128 81 63.3
8 2022 25-34 159 129 81.1
9 2022 35-44 184 151 82.1
10 2022 45-54 156 124 79.5
# ℹ 134 more rows
Exercise 4
Now we can use pivot_wider to make the tibble contain data similar to our chart.
Hints
- In pivot_wider, use the parameter names_from = age_group to create a column for each age_group.
- In pivot_wider, use the parameter values_from = participation_rate to fill the age_group columns with the participation rates we calculated earlier.
- glimpse the participation_force_chart tibble to see where you are.
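To see what the two parameters in the hints do, here is a small self-contained sketch on four rows retyped from the lab data. The names long_rates and wide_rates are assumptions for this illustration.

```r
library(tidyr)
library(tibble)

# Long format: one row per year and age group.
long_rates <- tribble(
  ~year, ~age_group, ~participation_rate,
  2023,  "16-24",    NA,
  2023,  "25-34",    84.8,
  2022,  "16-24",    63.3,
  2022,  "25-34",    81.1
)

# Wide format: one row per year, one column per age group.
wide_rates <- long_rates |>
  pivot_wider(names_from = age_group, values_from = participation_rate)

print(wide_rates)
```

The result has one row per year and the columns year, `16-24`, and `25-34`. The new column names need backticks when referenced in code because they are not syntactic R names.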
# Use pivot_wider to create a column for each age_group and
# fill those columns with the participation rates calculated earlier
participation_force_chart <- participation_force_chart |>
  pivot_wider(names_from = age_group, values_from = participation_rate)
print(participation_force_chart)
# A tibble: 24 × 7
year `16-24` `25-34` `35-44` `45-54` `55-64` `65+`
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2023 NA 84.8 83.5 74.8 68 18.6
2 2022 63.3 81.1 82.1 79.5 63.1 19.1
3 2021 NA 83.4 84.3 80.5 67.0 20
4 2020 NA 85.0 80.6 83.2 68.1 20.4
5 2019 67.6 86.8 84.2 83.2 67.6 18.0
6 2018 64.5 84.0 85.3 83.8 70.0 17.7
7 2017 61.3 83.8 84.1 81.2 73.3 19.7
8 2016 62.3 80 83.6 82.2 68.8 20.8
9 2015 59.4 79.3 80.2 82.7 67.7 21.2
10 2014 64.6 82.4 82.6 80.5 69.2 21.4
# ℹ 14 more rows
Exercise 5
You might still need to filter the data to restrict it to the last five years of data. If you haven’t already done so, do it in the code chunk below.
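One way to express "the last five years" without hard-coding a year is to compare against the most recent year in the data. This is only a sketch under that assumption: the tibble yearly and its column rate_65_plus are invented for the example, with values retyped from the 65+ column of the Exercise 4 output.

```r
library(dplyr)

# Ten years of 65+ participation rates (percent), most recent first.
yearly <- tibble(
  year         = 2023:2014,
  rate_65_plus = c(18.6, 19.1, 20, 20.4, 18.0, 17.7, 19.7, 20.8, 21.2, 21.4)
)

# Keep the five most recent years relative to the latest year present.
last_five <- yearly |>
  filter(year > max(year) - 5)

print(last_five)
```

With 2023 as the latest year this keeps 2019 through 2023, the same rows as filtering on year >= 2019, and it keeps working if a newer year is added to the data.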
# Keep 2019 onward, i.e. the last five years of data (2019-2023)
participation_force_chart <- participation_force_chart |>
  filter(year >= 2019)
# Print to check results
print(participation_force_chart)
# A tibble: 5 × 7
year `16-24` `25-34` `35-44` `45-54` `55-64` `65+`
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2023 NA 84.8 83.5 74.8 68 18.6
2 2022 63.3 81.1 82.1 79.5 63.1 19.1
3 2021 NA 83.4 84.3 80.5 67.0 20
4 2020 NA 85.0 80.6 83.2 68.1 20.4
5 2019 67.6 86.8 84.2 83.2 67.6 18.0
Exercise 6
We haven’t gone into depth on using gt for pretty tables, so let’s take a look at what the default formatting offers. Note: this table won’t have data until you complete the prior exercises.
# Use gt(participation_force_chart) to explore the default table formatting
gt(participation_force_chart)
year | 16-24 | 25-34 | 35-44 | 45-54 | 55-64 | 65+ |
---|---|---|---|---|---|---|
2023 | NA | 84.75610 | 83.51648 | 74.84277 | 68.00000 | 18.59756 |
2022 | 63.28125 | 81.13208 | 82.06522 | 79.48718 | 63.13131 | 19.14894 |
2021 | NA | 83.43949 | 84.30233 | 80.53691 | 66.98565 | 20.00000 |
2020 | NA | 84.96732 | 80.60606 | 83.22148 | 68.05556 | 20.40134 |
2019 | 67.58621 | 86.84211 | 84.21053 | 83.22981 | 67.59259 | 17.95775 |
There are three formats we need to apply.
- The participation-rate values should be percentages with one decimal place.
- NA values should appear as blanks.
- The table should be captioned.
Exercise 7
The caption must be created within the function gt(caption = "insert caption name here"). The other two formats can be applied with piped functions (e.g., gt() |> function_name()).
For percent formatting, use
fmt_percent(columns = c("16-24", ..., "list all applicable columns here"), decimals = 1)
For replacing NA with blank, use
sub_missing(missing_text = "")
# Use gt(caption = 'insert caption name here')
# The rates were already multiplied by 100 in Exercise 3, so scale_values = FALSE
# stops fmt_percent from multiplying them by 100 a second time.
participation_force_chart |>
  gt(caption = "Maine Workforce Participation Rates by Age Group") |>
  fmt_percent(
    columns = c("16-24", "25-34", "35-44", "45-54", "55-64", "65+"),
    decimals = 1,
    scale_values = FALSE
  ) |>
  sub_missing(missing_text = "")
year | 16-24 | 25-34 | 35-44 | 45-54 | 55-64 | 65+ |
---|---|---|---|---|---|---|
2023 | | 84.8% | 83.5% | 74.8% | 68.0% | 18.6% |
2022 | 63.3% | 81.1% | 82.1% | 79.5% | 63.1% | 19.1% |
2021 | | 83.4% | 84.3% | 80.5% | 67.0% | 20.0% |
2020 | | 85.0% | 80.6% | 83.2% | 68.1% | 20.4% |
2019 | 67.6% | 86.8% | 84.2% | 83.2% | 67.6% | 18.0% |
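By default, fmt_percent multiplies values by 100 before adding the percent sign (its scale_values argument defaults to TRUE), so it matters whether a column holds proportions such as 0.848 or values already scaled to 0-100 such as 84.8. The sketch below contrasts the two cases; the data frame rates, its column names, and the caption text are invented for this illustration.

```r
library(gt)

# The same two rates stored as proportions and as already-scaled percentages.
rates <- data.frame(
  year       = c(2023, 2022),
  proportion = c(0.848, 0.811),
  percentage = c(84.8, 81.1)
)

rate_table <- gt(rates, caption = "Illustrative participation rates") |>
  # Proportions: let fmt_percent multiply by 100 (the default).
  fmt_percent(columns = proportion, decimals = 1) |>
  # Already-scaled values: turn the multiplication off.
  fmt_percent(columns = percentage, decimals = 1, scale_values = FALSE) |>
  sub_missing(missing_text = "")
```

Printing rate_table should render both columns with the same percentages. Applying the default scaling to the percentage column instead would give values in the thousands of percent.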
Exercise 8
To submit your lab:
- Change the author name to your name in the YAML portion at the top of this document
- Render your document to HTML and publish it to RPubs.
- Submit the link to your RPubs document in the Brightspace comments section for this lab.
- Click on the “Add a File” button and upload your .qmd file for this assignment to Brightspace.
Grading
Exercise | Points |
---|---|
Exercise 1 | 10 |
Exercise 2 | 10 |
Exercise 3 | 10 |
Exercise 4 | 10 |
Exercise 5 | 10 |
Exercise 6 | 10 |
Exercise 7 | 10 |
Exercise 8 | 30 |
Total | 100 |