HW 4 Instructions
Due 2/26/2025
Introduction
This assignment will give you experience with:
- Creating an R Project Directory and data and img folders (Review)
- Saving and modifying a provided R Quarto Template file (Review)
- Importing and Cleaning Data
- Modifying and using R functions
- Joining datasets using full_join
- Reshaping and modifying data for a plot (Review)
- Creating a fully formatted plot
- Knitting an R Quarto file (Review)
- Creating a README file (Review)
Instructions
HW 4 - First Steps
Steps to Follow:
- Create an R project named HW 4 <first name> <last name>.
- Create two folders in your R project labeled data and img.
- Save these three .csv files to your data folder:
  - bls_unemp_rate.csv
  - bls_import_index.csv
  - bls_export_index.csv
- Download the provided file, HW4 Template.qmd.
  - In the header section of this file, change the title to HW 4 and specify yourself as the author.
  - Click File > "Save As…" and change the file name to HW4_FirstName_LastName.qmd.
NOTES:
- The header text below shows the correct format.
- This header text also creates a floating Table of Contents (toc).
---
title: "HW 4"
author: "Penelope Pooler"
subtitle: "Due 2/26/2025"
date: last-modified
lightbox: true
toc: true
toc-depth: 3
toc-location: left
toc-title: "Table of Contents"
toc-expand: 1
format:
  html:
    code-line-numbers: true
    code-fold: true
    code-tools: true
execute:
  echo: fenced
---
- Create a new chunk under the Setup header and text, and add the following text to the body of the chunk.
#| label: setup
# set default options for all R chunks
knitr::opts_chunk$set(echo = TRUE, highlight = TRUE)
# suppress scientific notation
options(scipen=100)
# install helper package (pacman) if needed
if (!require("pacman")) install.packages("pacman", repos = "http://lib.stat.cmu.edu/R/CRAN/")
# install and load required packages
# pacman should be first package in parentheses and then list others
pacman::p_load(pacman, tidyverse, gridExtra, magrittr,
ggthemes)
# verify packages (comment out in finished documents)
p_loaded()
- Click the green triangle or type Ctrl/Cmd + Shift + Enter to run this setup chunk.
HW 4 - Part 0
Examine Export and Import datasets from BLS
Examine the provided .csv files and note the number of rows to be skipped when importing each dataset.
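You can count the header rows in Excel. You can also check them from R. The minimal sketch below writes a small stand-in file and prints its first raw lines with line numbers. The toy file, its contents, and the names toy_path and header_line are hypothetical. The sketch also assumes the real header row begins with "Year,", which you should confirm in the actual BLS files before relying on it.

```r
# Hypothetical stand-in for a BLS download: metadata lines sit above
# the real header row (real files have more lines and more columns)
toy_path <- tempfile(fileext = ".csv")
writeLines(c("Toy Series Report",
             "Series Id: TOY0001",
             "Year,Jan,Feb",
             "2023,3.4,3.6"),
           toy_path)

# Print the first raw lines of the file with their line numbers
raw_lines <- readLines(toy_path, n = 10)
cat(paste(seq_along(raw_lines), raw_lines), sep = "\n")

# Assumption: the header row starts with "Year,"; every line above it is skipped
header_line <- grep("^Year,", raw_lines)[1]
skip_num <- header_line - 1
skip_num
```

The line number of the header row minus one is the value you would pass as the number of rows to skip.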
HW 4 - Part 1
Chunk 2: Create bls_tidy function from provided code.
Steps to Follow:
- Remove the # in front of bls_tidy <- function(...){ and the following lines.
- Copy the code from read_csv(... to rename("unemp_rate" = "value") and paste it within the body of the function.
Notice the three function inputs are:
- data_file
- skip_num
- var_name
- Make the following replacements to turn the code used for one dataset into a function that can be used for any bls dataset in this format.
  - Replace the name of the dataset, "bls_unemp_rate.csv", with the text data_file (with no quotes) in the read_csv command.
  - Replace 11 with skip_num in the read_csv command.
  - Replace the last line of the function with this line:
    rename({{var_name}} := "value")
NOTES:
- The rename command is modified to work within a function.
- This command uses the var_name input.
Run the function code and examine the saved function by clicking on it in the Global Environment to verify it is correct.
Use the function to import the "bls_unemp_rate.csv" dataset and save it as unemp by specifying these inputs, separated by commas:
data_file = "data/bls_unemp_rate.csv", skip_num = 11, var_name = "unemp_rate"
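The Part 1 idea can be sketched on a toy file. The idea is to replace hard-coded values (file name, rows to skip, new column name) with function inputs. The helper read_skip_rename and the toy file are hypothetical and are not the template's bls_tidy. The sketch also swaps in a different rlang injection form, !!var_name := "value", in place of the template's rename({{var_name}} := "value"). Both forms are meant to use the string stored in var_name as the new column name. It assumes R 4.1 or later for the |> pipe.

```r
library(readr)
library(dplyr)

# Hypothetical toy file: two metadata lines, then a header row and data
toy_path <- tempfile(fileext = ".csv")
writeLines(c("Series Id: TOY0001",
             "Source: toy example",
             "Year,value",
             "2023,3.5",
             "2024,3.9"),
           toy_path)

# Hypothetical helper (not the template's bls_tidy): the file name, rows to
# skip, and new column name are inputs instead of hard-coded values
read_skip_rename <- function(data_file, skip_num, var_name) {
  read_csv(data_file, skip = skip_num, show_col_types = FALSE) |>
    # !! injects the string stored in var_name as the new column name
    rename(!!var_name := "value")
}

# Inputs are passed by name, separated by commas, as in the instructions
toy <- read_skip_rename(data_file = toy_path, skip_num = 2, var_name = "unemp_rate")
toy
```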
BB Question 1
Match each input of the bls_tidy function to the correct description.
data_file =
skip_num =
var_name =
BB Question 2
In this imported example dataset, the number of rows skipped (skip_num) was 11, and you can verify this by examining the bls_unemp_rate.csv file.
In the next part, we will use the bls_tidy function to import bls_export_index.csv and bls_import_index.csv.
Before doing that, examine these two data files in Excel to answer this question:
Fill in the blank: ____ rows will need to be skipped when these two files are imported.
HW 4 - Part 2
Chunk 3: Use function to import two datasets.
Steps to Follow:
- Run the function to import bls_export_index.csv and save it as export_index.
  For this dataset (bls_export_index.csv), the function inputs are:
  data_file = "data/bls_export_index.csv", skip_num = ____, var_name = "exp_indx"
  Replace the ____ with the answer from BB Question 2.
- Run the function to import data/bls_import_index.csv and save it as import_index.
  - Replace the inputs to be appropriate for the bls_import_index.csv dataset (use var_name = "imp_indx", the import index name used in Part 3).
- Use summary to examine the numerical variable in each dataset, e.g.
  summary(export_index$exp_indx)
  OR export_index |> pull(exp_indx) |> summary()
BB Question 3
Each imported tidy dataset has ____ observations and ____ variables.
BB Question 4
Verify that the two separate variables/datasets have been imported correctly.
You can examine the datasets by clicking on them in the Global Environment or by using the summary command.
The minimum value of the export index is ____.
The maximum value of the export index is ____.
The minimum value of the import index is ____.
The maximum value of the import index is ____.
Note: Answering this question helps verify that the two distinct datasets were imported with the correct names.
HW 4 - Part 3
Chunk 4: Join datasets and create date variable.
Steps to Follow:
- Use full_join to create a new dataset named export_import that includes:
  - Year
  - month
  - the export index variable, exp_indx
  - the import index variable, imp_indx
  Notes: There are two matching variables in these two datasets, and R will use those variables for the join by default.
- Use paste and ym on the Year and month variables within a mutate command to create a date variable:
  date = ym(paste(Year, month))
NOTES:
- The command paste concatenates the characters from the Year and month variables, separated by a space, e.g. 2023 Jan
- The command ym specifies that the pasted text contains a year (y) followed by a month (m) and converts that text to a date.
BB Question 5
What two variables are common to both datasets and used by R to do the full_join?
Enter variable names exactly as they appear in both datasets. R is case-sensitive and so is this answer.
____
____
BB Question 6
As mentioned above, the lubridate command ym is able to create a date variable without a day, using only the Year and month variables in the data.
What day of the month does this created date variable show by default?
For example, the earliest date in the dataset is 2014-01-____.
HW 4 - Part 4
Chunk 5: Reshape, format data for plot
Steps to Follow:
Create a new dataset, exp_imp_plt, and complete the following steps:
- Use select to keep three variables: date, exp_indx, imp_indx
- Use pivot_longer so that you have three columns:
  - date column (already present)
  - type column showing the type of index, export or import
  - value column showing the value of the export index or import index
- Convert type to a factor variable in a mutate statement with these levels and labels:
  levels = c("exp_indx", "imp_indx")
  labels = c("Export", "Import")
  Note that the same order must be used in the levels and labels options.
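Here is a minimal sketch of the Part 4 reshaping on a two-row toy dataset. The dates and index values are hypothetical. pivot_longer stacks the two index columns: the old column names go into type (names_to) and their numbers go into value (values_to). factor() then matches each level to the label in the same position, which is why both vectors must use the same order.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide toy data, one row per date
toy_wide <- tibble(
  date     = as.Date(c("2023-01-01", "2023-02-01")),
  exp_indx = c(101.5, 102.0),
  imp_indx = c(99.8, 100.4)
)

toy_long <- toy_wide |>
  select(date, exp_indx, imp_indx) |>
  # old column names go to `type`, their values go to `value`
  pivot_longer(cols = c(exp_indx, imp_indx),
               names_to = "type", values_to = "value") |>
  # levels and labels are paired by position
  mutate(type = factor(type,
                       levels = c("exp_indx", "imp_indx"),
                       labels = c("Export", "Import")))

toy_long
```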
HW 4 - Part 5
Chunk 6: Create Formatted Line Plot
Steps to Follow:
0. Remove , eval = F from the header of Chunk 6.
   - This option is there so that the instructions can be rendered while this code is still incomplete.
1. Create an unformatted line plot with 2 lines, 1 line for each index:
   - Replace ____ with the name of the reshaped data with the type factor variable from the previous chunk.
   - Complete the geom_line command by adding an aesthetic, aes().
   - Inside aes(): x = date, y = value, color = type
   - After aes(): add a comma and then specify linewidth = 1 to create thicker lines.
2. Add the following 4 options by following the geom_line command with a plus (+) and then adding a plus (+) after each option:
# can opt for different theme
theme_classic()
# moves legend to bottom
theme(legend.position="bottom")
# specifies colors (many other options available)
scale_color_manual(values=c("lightblue","blue"))
# formats x axis to show each year as 4 digits
scale_x_date(date_breaks = "year", date_labels = "%Y")
3. Complete the labs command using the example plot below.
   - All text should be in quotes.
   - color = "" has been completed for you to suppress the legend label.
   - title and subtitle text are shown at the top.
   - x and y axis labels are shown on the respective axes.
   - For caption =, omit the 2nd line. You only need: Data Source: www.bls.gov
NOTE: In the future (such as in your project), if you are creating multiple plots from the same data source, you do not need to include a data source caption in every plot. Instead, you can mention the data source in an accompanying side panel.
Example of Fully Formatted Line Plot
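The minimal sketch below assembles the Part 5 layers in order on a small hypothetical long-format dataset. The titles, axis labels, caption, and values are placeholders, not the required text. It assumes ggplot2 3.4.0 or later, where geom_line accepts linewidth. The template's exact chunk layout may differ. Each layer is joined with +, and color = "" in labs removes the legend title.

```r
library(ggplot2)

# Hypothetical long-format toy data with a two-level factor `type`
toy_plot_data <- data.frame(
  date  = rep(as.Date(c("2022-01-01", "2023-01-01", "2024-01-01")), times = 2),
  type  = factor(rep(c("Export", "Import"), each = 3)),
  value = c(100, 104, 103, 101, 102, 104)
)

toy_plot <- ggplot(toy_plot_data) +
  # one line per level of `type`; linewidth = 1 gives thicker lines
  geom_line(aes(x = date, y = value, color = type), linewidth = 1) +
  theme_classic() +
  theme(legend.position = "bottom") +
  # colors are assigned to factor levels in order: Export, then Import
  scale_color_manual(values = c("lightblue", "blue")) +
  scale_x_date(date_breaks = "year", date_labels = "%Y") +
  # placeholder text; color = "" suppresses the legend title
  labs(title = "Toy Title", subtitle = "Toy subtitle",
       x = "Date", y = "Index Value", color = "",
       caption = "Data Source: toy example")

toy_plot
```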
BB Question 7
Examine the plot and then fill in the blanks with the correct years.
Note: Tick marks for years indicate the BEGINNING of each year.
The Import Price Index was higher than the Export Price Index from ____ to ____.
The Import and Export Price Indices were approximately equal from ____ to ____.
The Export Price Index has been higher than the Import Price Index since early ____.
HW 4 - Final Steps
1. Save your completed HW4 R Quarto (.qmd) file to your HW 4 project folder.
2. Render your HW 4 Quarto file.
   - This will create an .html file.
3. Answer all Blackboard questions associated with this assignment.
4. Create a README.txt file using the template provided in the HW 4 folder.
   All three .csv files used for this assignment should be:
   - included in your data folder
   - listed in your README under DATA FILES in data folder
5. Zip your entire HW 4 Project Directory into a compressed file and submit it.
   - The zipped R Project should be named HW 4 FirstName LastName.
   - The zipped project directory should contain:
     - The completed and renamed README.txt file
     - The .Rproj file
     - The completed and renamed Quarto (.qmd) file
     - The rendered HTML (.html) file
     - An empty img folder
       NOTE: You can use ggsave to export your final plot to your img folder, but that is not required.
     - A data folder with all three of the provided .csv data files
Grading Criteria
(10 pts.) Blackboard questions for HW 4 are each worth 1 or 2 points.
(3 pts.) Completing HW 4 - First Steps as specified
- Project is created correctly and data and img folders are created.
- Template is saved to the project folder and renamed.
- All three datasets are saved to the data folder.
(3 pts.) Part 1: Full credit for creating the bls_tidy function and using it to import and clean bls_unemp_rate.csv
(2 pts.) Part 2: Full credit for correctly using the bls_tidy function you created to import the two separate datasets.
(2 pts.) Part 3: Full credit for correctly joining the two datasets and creating a new date variable.
(2 pts.) Part 4: Full credit for correctly reshaping data and creating index factor variable
(2 pts.) Part 5: Full credit for creating fully formatted plot as specified
(3 pts.) Completing the HW 4 - Final Steps and correctly submitting your zipped project directory.
- 1 point for creating a correct README file that is saved in your project folder.
- 1 point for having your .qmd and .html files in your project folder and all three .csv files in your data folder.
- 1 point for zipping and submitting your project correctly.