HW 2 Instructions
Due Wednesday, September 10, 2025 at 11:59 PM
Purpose
This assignment will give you experience with:
creating an R Project Directory with `data` and `img` folders.
saving, editing, and using a Quarto (`.qmd`) file (Review).
knitting (rendering) an R Quarto file to create an HTML file.
creating a README file.
working with a larger dataset.
using the `dplyr` package to `select` variables and `slice` and `filter` data.
creating a basic plot with minimal formatting.
Instructions
HW 2 - First Steps
Create an R project named `HW 2 <first name> <last name>`.
- File > New Project > New Directory > R Project
- In the box, name this project: `HW 2 <first name> <last name>`
- My project name: `HW 2 Penelope Pooler`
- Click Create Project.
- Note that if you create an R Project that is NOT a Quarto project, a Quarto file is not created.
Create `img` and `data` folders within the R Project.
Download the provided file, `HW2_Template.qmd`, from the Homework Assignments page of the 455 website.
Save the downloaded `HW2_Template.qmd` file to your R project.
Change the file name to `HW2_FirstName_LastName.qmd`.
- For example, I would change the template file to be named `HW2_Penelope_Pooler.qmd`.
- There should be no spaces in the file name of a Quarto (`.qmd`) file.
Change the title in the file header to ‘HW 2’.
Specify yourself as the author.
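The folder and file-name steps above can also be done from the R console instead of the Files pane. The following is a minimal sketch, assuming the working directory is the project root (as it is when the .Rproj file is open). The new file name is the example name used on this page, so substitute your own.

```r
# Assumption: the working directory is the root of the HW 2 R project.
# Create the img and data folders only if they do not already exist.
for (folder in c("img", "data")) {
  if (!dir.exists(folder)) dir.create(folder)
}

# Rename the downloaded template (only if it is present in the project).
# The new name has no spaces, as required for a Quarto (.qmd) file.
if (file.exists("HW2_Template.qmd")) {
  file.rename("HW2_Template.qmd", "HW2_Penelope_Pooler.qmd")
}

# Confirm that both folders exist
dir.exists(c("img", "data"))
```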
NOTES
Provided header text below shows the correct format.
This header text also creates a floating Table of Contents (toc) and will show chunk labels.
Note that the options below will make the code chunks and code chunk labels visible in the output. We will change these options in later assignments and in the group project.
---
title: "HW 2"
author: "Penelope Pooler"
date: last-modified
toc: true
toc-depth: 3
toc-location: left
toc-title: "Table of Contents"
toc-expand: 1
format:
  html:
    code-line-numbers: true
    code-fold: true
    code-tools: true
execute:
  echo: fenced
---
- Create a new chunk under the `Setup` header and text and add the following text to the body of the chunk:
#|label: setup
# this line specifies default options for all R chunks
knitr::opts_chunk$set(echo=T,
                      highlight=T)
# suppress scientific notation
options(scipen=100)
# install helper package (pacman) if needed
if (!require("pacman")) install.packages("pacman", repos = "http://lib.stat.cmu.edu/R/CRAN/")
# install and load required packages
# pacman should be first package in parentheses and then list others
pacman::p_load(pacman, tidyverse, gridExtra, magrittr)
# verify packages (comment out in finished documents)
p_loaded()
- Click the green triangle on the right side of the setup chunk to run this code.
HW 2 - Part 1: glimpse and unique
Chunk 2: Examining the diamonds Data
Notes:
This chunk reviews the `glimpse` and `unique` commands from Week 1.
`diamonds` is a large R dataset that is part of the `ggplot2` package in the `tidyverse` package suite.
Provided code for HW 2 - Part 1 (Chunk 2) WILL NOT RUN until the `tidyverse` package suite is loaded by running the `setup` chunk with the provided code.
When you run the `glimpse()` command you will see the variable type `<ord>`, which is an ordered factor variable.
Steps to Follow:
Run the R code in the provided R chunk for HW 2 - Part 1 (Chunk 2) which reviews:
- how to save a dataset and examine it using `glimpse`.
- how to examine the levels of a variable using `unique`.
- traditional and piped code to do the same task.
Use the `unique` command with or without piping to examine the levels of:
- the `clarity` variable in the `diamonds` R dataset.
- the `color` variable in the `diamonds` R dataset.
Note that the new `unique` commands that you write should be ADDED to Chunk 2.
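As a sketch of the pattern in these steps, the code below applies `glimpse` and `unique` to the `cut` variable, which is not one of the two variables assigned above. It loads `dplyr` and `ggplot2` directly so that it runs on its own; in your HW 2 file the setup chunk already loads them. Using `pull` in the piped version is only one possible approach and may differ from the provided Chunk 2 code.

```r
library(dplyr)    # glimpse, pull, and other data verbs
library(ggplot2)  # contains the diamonds dataset

# save a copy of the dataset and examine it
my_diamonds <- diamonds
glimpse(my_diamonds)

# traditional code: values of cut in the order they first appear
cut_traditional <- unique(my_diamonds$cut)

# piped code that does the same task (pull extracts one variable)
cut_piped <- my_diamonds |>
  pull(cut) |>
  unique()

cut_traditional
identical(cut_traditional, cut_piped)

# <ord> in the glimpse output means an ordered factor
is.ordered(my_diamonds$cut)
```

Note that `unique` lists values in the order they first appear in the data, which is not necessarily the same as the level order printed on the `Levels:` line.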
Answer these Blackboard Questions:
BB Question 1
How many rows (observations) and columns (variables) are in the `my_diamonds` dataset, which is a copy of the `diamonds` R dataset that you have saved to your Global Environment?
- ____ rows
- ____ columns
BB Question 2
Order the levels of the `clarity` variable from first to last based on the output from using the `unique` command with this variable.
BB Question 3
Fill in the blanks. The color variable in the diamonds dataset has factor levels that are alphabetical.
- The first level of diamond color is ____.
- The last level of diamond color is ____.
HW 2 - Part 2: select
Chunk 3: Selecting variables in a dataset
Notes:
This chunk demonstrates using the `select` command to select variables in a dataset.
In the provided code for HW 2 - Part 2 (Chunk 3), there are three code examples that select the first 7 variables in the `my_diamonds` dataset and save them as a new dataset:
- `my_diamonds1` is created by specifying variables to INCLUDE.
- `my_diamonds2` is created by specifying variables to EXCLUDE.
- `select` is ALSO used to reorder the variables (`price` is first).
Steps to Follow:
Create `my_diamonds3` using the `select` command to only INCLUDE the first FIVE variables:
- Variables included: `price, carat, cut, color, clarity`
Create `my_diamonds4`, which will be identical to `my_diamonds3`, but is created by EXCLUDING the last FIVE variables using the `!` operator and the `c(...)` function to group the variables:
- Variables excluded: `depth, table, x, y, z`
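The INCLUDE and EXCLUDE approaches can be sketched on a different set of variables than the ones assigned above. In this example the dataset names `dm_include` and `dm_exclude` are hypothetical, and the packages are loaded directly so that the code runs on its own.

```r
library(dplyr)
library(ggplot2)  # contains the diamonds dataset

my_diamonds <- diamonds

# INCLUDE: name the variables to keep
dm_include <- my_diamonds |>
  select(carat, cut, color)

# EXCLUDE: name the variables to drop, grouped with c(...) and negated with !
dm_exclude <- my_diamonds |>
  select(!c(clarity, depth, table, price, x, y, z))

# both approaches keep the same three variables
names(dm_include)
identical(names(dm_include), names(dm_exclude))
```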
HW 2 - Part 3: slice
Chunk 4: Selecting rows by row number
Notes:
This chunk demonstrates using the `slice` command to select observations (rows).
In the provided code for HW 2 - Part 3 (Chunk 4) and subsequent chunks, you will continue to build on your code from Part 2 (Chunk 3) with more commands.
Using piping in your code makes this process more efficient.
Steps to Follow:
Copy the code you wrote to create `my_diamonds3` in Chunk 3 and paste this code into Chunk 4.
Use the examples to add on to your code and select rows 1001 through 30000 and 45001 through 50000.
This dataset should still be named `my_diamonds3`.
Piping will make your coding more efficient and easier to read.
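Adding `slice` to an existing piped `select` command can be sketched as follows. The row ranges here are deliberately small and are not the ranges assigned above, and the dataset name `dm_rows` is hypothetical.

```r
library(dplyr)
library(ggplot2)  # contains the diamonds dataset

my_diamonds <- diamonds

# keep three variables, then keep rows 1 through 10 and rows 101 through 110
dm_rows <- my_diamonds |>
  select(price, carat, cut) |>
  slice(1:10, 101:110)

# 20 rows and 3 columns remain
dim(dm_rows)
```

Each range of row numbers is written as `start:end`, and the ranges are separated by commas inside a single `slice` command.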
Answer the following Blackboard Question to verify that your dataset is correct:
BB Question 4
After successfully completing the R code in Chunk 4 of HW 2, the `my_diamonds3` dataset is smaller than the original dataset.
`my_diamonds3` has:
- fewer variables (columns) after using the `select` command as specified.
- fewer observations (rows) after using the `slice` command as specified.
After completing Chunk 4, the `my_diamonds3` dataset has:
- ____ rows.
- ____ columns.
HW 2 - Part 4: filter and summary
Chunk 5: Filtering data by value and summarizing
Notes:
This chunk demonstrates:
- using the `filter` command to select rows by variable values.
- using the `summary` command to summarize variables.
The `filter` command enables us to select observations by one or more values of one or more variables.
- You can use multiple consecutive `filter` commands or you can use the and operator, `&`, or the or operator, `|`.
- You can use `filter` to INCLUDE rows or EXCLUDE rows with `!`.
The provided code includes multiple examples of how to complete the same two filtering tasks.
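The equivalent ways of filtering described in these notes can be sketched as below. This is an illustration only, not the provided Chunk 5 code: the carat cutoff and cut categories differ from the assigned ones, and the dataset names beginning with `dm_` are hypothetical.

```r
library(dplyr)
library(ggplot2)  # contains the diamonds dataset

my_diamonds <- diamonds

# two consecutive filter commands
dm_two_steps <- my_diamonds |>
  filter(carat >= 2) |>
  filter(cut %in% c("Fair", "Good"))

# one filter command using the and operator, &
dm_and <- my_diamonds |>
  filter(carat >= 2 & cut %in% c("Fair", "Good"))

# the or operator, |, used to INCLUDE two cut categories
dm_or <- my_diamonds |>
  filter(carat >= 2) |>
  filter(cut == "Fair" | cut == "Good")

# ! used to EXCLUDE the other three cut categories
dm_not <- my_diamonds |>
  filter(carat >= 2) |>
  filter(!(cut %in% c("Very Good", "Premium", "Ideal")))

# all four datasets contain the same rows
c(nrow(dm_two_steps), nrow(dm_and), nrow(dm_or), nrow(dm_not))

# summary of a factor variable gives the number of observations in each level
summary(dm_and$cut)
```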
Steps to Follow:
Copy the code you wrote in Chunk 4 and then paste it into Chunk 5.
Add the specified slice command to subset the `my_diamonds3` dataset in Chunk 5.
Use one of the examples in the provided R code for Chunk 5 to complete these TWO specified filter tasks:
- Filter `my_diamonds3` to diamonds weighing 1.25 or more carats.
- Filter `my_diamonds3` to these cut categories: `Very Good`, `Premium`, `Ideal`.
Use the example `summary` command code to summarize the factor variable `clarity` in the final `my_diamonds3` dataset.
Answer the following Blackboard Questions:
BB Question 5
In Chunk 5 of HW 2, you use the `my_diamonds3` dataset from Chunk 4, and then you filter the data by `carat` and by `cut` category.
- How many observations are in this final `my_diamonds3` dataset?
BB Question 6
Fill in the blanks to indicate how many observations are in each of the three most valuable categories in the `my_diamonds3` dataset.
- There are ____ observations in the `VVS2` level of the `clarity` variable.
- There are ____ observations in the `VVS1` level of the `clarity` variable.
- There are ____ observations in the `IF` level of the `clarity` variable.
HW 2 - Part 5: Creating plots with ggplot
Chunk 6: Creating Basic Plots
Notes:
The provided R code demonstrates using
- `ggplot` to make some basic plots.
- `grid.arrange` to present multiple plots in a grid or column.
In Part 5, you will create a chunk and copy the provided R code into the chunk you create.
Steps to Follow:
Create a new chunk (Chunk 6) under the HW 2 - Part 5 heading in your HW 2 Quarto file (created from the provided template).
Copy and paste the provided R code below into the chunk you created.
After the label fence add this text: `creating plots with ggplot`
- Leave one space after the colon: `#|label: creating plots with ggplot`
- This step is included and required so that students know how to label chunks in Quarto files.
Use the example scatter plot code below to create a saved plot named `scatter_cut`.
- This will be similar to the example code for the `scatter_clarity` scatter plot.
- Remove `#` at the beginning of this line, `# scatter_cut <-`, to start the code.
- Replace `color=clarity` with `color=cut`.
Use the example code provided below to create a saved plot named `scatter_color`.
- This will be similar to the code for the `scatter_clarity` scatter plot.
- Remove `#` at the beginning of this line, `# scatter_color <-`, to start the code.
- Replace `color=clarity` with `color=color`.
Use the provided `grid.arrange` command to create a 2x2 grid of all four scatter plots.
- Remove `#` at the beginning of both lines of the `grid.arrange(..., ncol=2)` command.
- This code will only work if you use the provided names to save your scatterplots.
Use the provided example boxplot code below to create a saved plot named `box_clarity`.
- This will be similar to the provided code for the side-by-side grouped boxplot, `box_color`.
- Note that there is a `ggsave` command after the `box_color` code that will export this plot to the `img` folder you created.
Use the provided `grid.arrange` command to create a stacked column of the two boxplot figures.
- Remove `#` at the beginning of the line with `grid.arrange(..., ncol=1)`.
- This code will only work if you use the provided name to save your new boxplot figure.
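By default, `ggsave` exports the most recently displayed plot, so the order of the commands in the provided code matters. The sketch below shows the optional `plot` argument, which names the plot to export explicitly. The names `demo_data` and `demo_box` are hypothetical, and the file is written to a temporary folder rather than to `img`.

```r
library(ggplot2)  # contains the diamonds dataset

# small sample so the sketch runs quickly
demo_data <- diamonds[1:500, ]

# enclosing the code in parentheses saves the plot AND shows it
(demo_box <- ggplot(demo_data) +
    geom_boxplot(aes(x = cut, y = price, fill = color)) +
    theme_classic())

# name the plot to export instead of relying on the last plot shown
out_file <- file.path(tempdir(), "demo_boxplot.png")
ggsave(filename = out_file, plot = demo_box, width = 6, height = 4)

# confirm the file was created
file.exists(out_file)
```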
Answer the following Blackboard Questions:
BB Question 7
Compare the three scatter plots to determine which one of the three variables, `clarity`, `cut`, or `color`, shows the least evidence of a relationship with `price` or `carat`, i.e., shows no trending color pattern.
BB Question 8
Fill in the blank:
Compare the two boxplots of the `my_diamonds3` dataset to determine which variable, `color` or `clarity`, has one category that is substantially lower in price than the other categories.
- The ____ level in the ____ variable includes diamonds that are substantially lower in price than the other levels.
Provided R code for Chunk 6
- Create chunk then copy and paste code below into it.
#|label:
#### scatterplots ####
# scatter_none is the most basic scatter plot of carat vs. price
# no other variables are included
# to view this plot by itself (not required), enclose code in parentheses
scatter_none <- my_diamonds |>
  ggplot() +
  geom_point(aes(x=carat, y=price))
# scatter_clarity adds the option color=clarity to the aes (aesthetic)
# observations are color coded by diamond clarity level
# theme_classic() added to remove background
scatter_clarity <- my_diamonds |>
  ggplot() +
  geom_point(aes(x=carat, y=price, color=clarity)) +
  theme_classic()
# create plot named scatter_cut using the above code
# change color=clarity to color=cut
# scatter_cut <-
# create plot named scatter_color using the above code
# change color=clarity to color=color
# scatter_color <-
# plot all 4 plots above in a 2x2 grid and answer Blackboard Questions
# grid.arrange(scatter_none, scatter_clarity,
#              scatter_color, scatter_cut, ncol=2)
#### boxplots ####
# below is a plot of grouped side-by-side boxplots
# Within each cut category there is a separate boxplot for each color
# this is one good way to examine categorical data
# code is enclosed in parentheses
# plot is saved as box_color AND is shown on the screen
(box_color <- my_diamonds3 |>
  ggplot() +
  geom_boxplot(aes(x=cut, y=price, fill=color)) +
  theme_classic())
ggsave(filename="img/HW2_Diamond_Color_Boxplots.png",
       width = 6, height = 4)
# create a plot of grouped side-by-side boxplots that show
# boxplots for each clarity category within each cut category
# same plot as above but change fill=color to fill=clarity
# name this plot box_clarity
# (box_clarity <- )
# plot these two sets of plots in a stacked column (ncol=1)
# grid.arrange(box_color, box_clarity, ncol=1)
HW 2 - Final Steps
Save your completed HW 2 R Quarto File (`.qmd`) within your project folder.
Render your `.qmd` file to create a `.html` file.
Answer all 8 Blackboard questions associated with this assignment.
- Reminder: You are welcome and encouraged to work together and practice sharing Quarto (`.qmd`) files but each student should submit their own zipped R project and Blackboard assignment.
Create a README file for HW 2 using the README template provided with the HW 2 files.
Zip your entire Project Directory into a compressed file and submit it.
The zipped project directory should contain:
The HW 2 project folder with the following inside:
- the HW2 Quarto (`.qmd`) and HTML files labeled with your full name.
- the `img` folder with the exported plot file
- the empty `data` folder
- The complete and accurate `README.txt` file that is saved with the same name as the project, e.g. `HW 2 Penelope Pooler README.txt`
- The .Rproj file
Grading Criteria
(8 pts.) Each Blackboard question for HW 2 is worth 1 point.
(2 pts.) Completing HW 2 - First Steps as specified.
(2 pts.) Part 1: Full credit for:
- Correctly completing the chunk labeled `examining the diamonds data`
- Each `unique` command is 1 point.
(2 pts.) Part 2: Full credit for:
- Creating the two identical datasets using `select` to include variables and exclude variables.
- Each `select` command is 1 point.
(2 pts.) Part 3: Full credit for:
- Creating the specified dataset using the `select` command and the `slice` command.
(2 pts.) Part 4: Full credit for:
- Creating the specified dataset using the `select`, `slice`, and `filter` commands.
(3 pts.) Part 5: Full credit for:
1 point for creating both specified scatterplots
1 point for creating the specified 2x2 grid of scatterplots
1 point for creating the specified grouped boxplots and the column of 2 grouped boxplots
(4 pts.) Completing the HW 2 - Final Steps and correctly submitting your zipped project directory.
1 point for creating a correct README file
1 point for having both the .qmd and .html files
1 point for having the empty `data` folder and `img` folder with one plot in it
1 point for zipping and submitting your project correctly