HW 4 Instructions

Due 2/26/2025

Introduction

This assignment will give you experience with:

  • Creating an R Project Directory and data and img folders (Review)

  • Saving and modifying a provided R Quarto Template file (Review)

  • Importing and Cleaning Data

  • Modifying and using R functions

  • Joining datasets using full_join

  • Reshaping and modifying data for a plot (Review)

  • Creating a fully formatted plot

  • Knitting an R Quarto file (Review)

  • Creating a README file (Review)

Instructions

HW 4 - First Steps

Steps to Follow:

  1. Create an R project named HW 4 <first name> <last name>

  2. Create two folders in your R project labeled data and img.

  • Save these three .csv files to your data folder:
    • bls_unemp_rate.csv
    • bls_import_index.csv
    • bls_export_index.csv
  1. Download the provided file, HW4 Template.qmd.
  • In the header section of this file:

    • change the title to be HW 4.

    • Specify yourself as the author.

    • Click File >“Save As…” and change file name to be HW4_FirstName_LastName.qmd


NOTES

  • Provided header text below shows the correct format.

  • This header text also creates a floating Table of Contents (toc).

---
title: "HW 4"
author: "Penelope Pooler"
subtitle: "Due 2/26/2025"
date: last-modified
lightbox: true
toc: true
toc-depth: 3
toc-location: left
toc-title: "Table of Contents"
toc-expand: 1
format:
  html:
    code-line-numbers: true
    code-fold: true
    code-tools: true
execute:
  echo: fenced 
---


  1. Create a new chunk under the Setup header and text and add the following text to the body of the chunk.
#|label: setup

# this line specifies options for default options for all R Chunks
knitr::opts_chunk$set(echo=T, highlight=T)

# suppress scientific notation
options(scipen=100)

# install helper package (pacman) if needed
if (!require("pacman")) install.packages("pacman", repos = "http://lib.stat.cmu.edu/R/CRAN/")

# install and load required packages
# pacman should be first package in parentheses and then list others
pacman::p_load(pacman, tidyverse, gridExtra, magrittr, 
               ggthemes)

# verify packages (comment out in finished documents)
p_loaded()


  1. Click the green triangle or type Ctrl/Cmd + Shift + Enter to run this setup chunk.

HW 4 - Part 0

HW 4 - Part 1

Chunk 2: Create bls_tidy function from provided code.

Steps to Follow:

  1. Remove # in front of bls_tidy <- function(...){ and the following lines.

  2. Copy code from read_csv(... to rename("unemp_rate" = "value") and paste it within the body of the function.

Notice the three function inputs are:

  • data_file
  • skip_num
  • var_name
  1. Make the following replacements to turn the code used for one dataset into a function that can be used for any bls dataset in this format.
  • Replace the name of the dataset, "bls_unemp_rate.csv", with the text, data_file (with no quotes) in the read_csv command.

  • Replace 11 with skip_num in the read_csv command.

  • Replace the last line of the function with this line:

    • rename({{var_name}} := "value")

NOTES:

  • The rename command is modified to work within a function.

  • This command uses the var_name input.

  1. Run the function code and examine the saved function by clicking on it in the Global Environment to verify it is correct.

  2. Use the function to import the "bls_unemp_rate.csv" dataset and save it as unemp, by specifying these inputs, separated by commas:

  • data_file = "data/bls_unemp_rate.csv", skip_num = 11, var_name = "unemp_rate"
BB Question 1

Match each input of the bls_tidy function to the correct description.

  • data_file =

  • skip_num =

  • var_name =

BB Question 2

In this imported example dataset, the number of rows skipped (skip_num) was 11 rows and you can verify this by examining the bls_unemp_rate.csv file.

In the next part, we will use the bls_tidy function to import bls_export_index.csv and bls_import_index.csv.

Before doing that, examine these two data files in Excel to answer this question:

Fill in the blank:

  • ____ rows will need to be skipped when these two files are imported.

HW 4 - Part 2

Chunk 3: Use function to import two datasets.

Steps to Follow:

  1. Run function to import bls_export_index.csv and save it as export_index.
  • For this dataset (bls_export_index.csv), the function inputs are:

    • data_file = "data/bls_export_index.csv", skip_num = ____, var_name = "exp_indx"

    • Replace the ____with the answer from BB Question 2.

  1. Run function to import data/bls_import_index.csv and save it as import_index.
  • Replace the inputs to be appropriate for the bls_import_index.csv dataset.
  1. Use summary to examine the numerical variable in each dataset.
  • e.g. summary(export_index$exp_indx)

OR export_index |> pull(exp_indx) |> summary()

BB Question 3

Each imported tidy dataset has

  • ____ observations

  • ____ variables

BB Question 4

Verify that two separate variables/datasets have been imported correctly.

You can examine datasets, by clicking on them in the Global Environment or by using the summary command.

  • The minimum value of the export index, is ____.

  • The maximum value of the export index is ____.

  • The minimum value of the import index is ____.

  • The maximum value of the import index is ____.

Note: Answering this question is helpful to verify that the two distinct datasets were imported with the correct names.

HW 4 - Part 3

Chunk 4: Join datasets and create date variable.

Steps to Follow:

  1. Use full_join to create new dataset named export_import that includes
  • Year

  • month

  • the export index variable, exp_indx

  • the import index variable, imp_indx

Notes: There are two matching variables in these two datasets and R will use those variables for the join by default.

  1. Use paste and ym on the Year and month variables within a mutate command to create a date variable:
  • date = ym(paste(Year, month))

NOTES:

  • The command paste concatenates the characters from the Year and month variables, e.g. 2023 Jan

  • The command ym specifies that that Year (y) and month (m) in the pasted text will be converted to a date.

BB Question 5

What two variables are common to both datasets and used by R to do the full_join?

Enter variable names exactly as they appear in both datasets. R is case-sensitive and so is this answer.

  • ____

  • ____

BB Question 6

As mentioned above, the lubridate command ym is able to create a date variable without day, using only the Year and month variables in the data.

What day of the month does this created date variable show by default?

For example, the earliest date in the dataset is

  • 2014-01-____.

HW 4 - Part 4

Chunk 5: Reshape, format data for plot

Steps to Follow:

Create a new dataset, exp_imp_plt and complete the following steps:

  1. Useselect to keep three variables: date, exp_indx,imp_indx

  2. Use pivot_longer so that you have three columns:

  • date column (already)

  • type column showing type of index, export or import

  • value column showing value of export index or import index

  1. Convert type to a factor variable in a mutate statement with these levels and labels:
  • levels = c("exp_indx", "imp_indx")

  • labels = c("Export", "Import")

Note that the same order must be used in the levels and labels options.

HW 4 - Part 5

Chunk 6: Create Formatted Line Plot

Steps to Follow:

0. Remove , eval = F from header of Chunk 6

  • This option is there so that the instructions can be rendered while this code is still incomplete.
  1. Create unformatted line plot with 2 lines: 1 line for each index:
  • Replace ____ with name of reshaped data with type factor variable from previous chunk.

  • Complete geom_line command by adding aesthetic, aes().

  • Inside aes(): x = date, y = value, color = type

  • After aes(): add a comma and then specify linewidth = 1 to create thicker lines.

2.Add the following 4 options by following the geom_line command with a plus (+) and then adding a plus (+) after each option

# can opt for different theme
theme_classic()                                           

# moves legend to bottom
theme(legend.position="bottom")                         

# specifies colors (many other options vailable)
scale_color_manual(values=c("lightblue","blue"))        
                                                        
# formats x axis to show each year as 4 digits                                       
scale_x_date(date_breaks = "year", date_labels = "%Y")
  1. Complete labs command using example plot below.
  • All text should be in quotes.

  • color = "" has been completed for you to suppress label for legend.

  • title and subtitle text are shown at the top.

  • x and y axis labels are shown on the respective axes.

  • For caption = Omit 2nd line. Only need Data Source: www.bls.gov

NOTE: In the future, (such as in your project), if you are creating multiple plots from the data source, you do not need to include a data source caption in every single plot. Instead you can mention the data source in accompanying side panel.

Example of Fully Formatted Line Plot

BB Question 7:

Examine the plot and then fill in the blanks with the correct years.

NOte: Tick marks for years indicate the BEGINNING of each year.

  • The Import Price Index was higher than the Export Price Index from ____ to ____.

  • The Import and Export Price Indices were approximately equal from ____ to ____.

  • The Export Price Index has been higher than the Import Price Index since early ____.

HW 4 - Final Steps

  1. Save your completed HW4 R Quarto (qmd.) file to your HW 4 project folder.

  2. Render your HW 4 Quarto file File.

  • This will be an .html file
  1. Answer all Blackboard questions associated with this assignment.

4.Create a README.txt file using the template provided in HW 4 folder.

  • All three .csv files used for this assignment should be:

    • included in your data folder

    • listed in your README under DATA FILES in data folder:

  1. Zip your entire HW 4 Project Directory into a compressed File and submit it.
  • The zipped R Project should be named HW 4 FirstName LastName.

  • The zipped project directory should contain:

    • The completed and renamed README.txt file

    • The .Rproj file

    • The completed and renamed Quarto (.qmd) file.

    • The rendered HTML (.html) file.

    • An empty img folder.

      NOTE: You can use ggsave to export your final plot to your img folder but that is not required.

    • A data folder with all three of the provided .csv data files.

Grading Criteria

(10 pts.) Blackboard questions for HW 4 are worth 1 or 2 points.

(3 pts.) Completing HW 4 - First Steps as specified

  • Project is created correctly and data and img folders are created.

  • Template is saved to project folder is renamed.

  • All three datasets are saved to data folder.

(3 pts.) Part 1: Full credit for creating bls_tidy function and using it to import and clean bls_unemp_rate.csv

(2 pts.) Part 2: Full credit for correctly using the bls_tidy function you created to import the two separate datasets.

(2 pts.) Part 3: Full credit for correctly joining the two datasets and creating a new date variable.

(2 pts.) Part 4: Full credit for correctly reshaping data and creating index factor variable

(2 pts.) Part 5: Full credit for creating fully formatted plot as specified

(3 pts.) Completing the HW 4 - Final Steps and correctly submitting you zipped project directory.

  • 1 point for creating a correct README file that is saved in your project folder.

  • 1 point for having your .qmd, .html, project folder and all three .csv files in your data folder.

  • 1 point for zipping and submitting your project correctly