HW 2 Instructions
Due Wednesday, September 10, 2025 at 11:59 PM
Purpose
This assignment will give you experience with:
creating an R Project Directory with data and img folders.
saving, editing and using a Quarto (.qmd) file (Review).
knitting (rendering) an R Quarto file to create an HTML file.
creating a README file.
working with a larger dataset.
using the dplyr package to select variables and slice and filter data.
creating a basic plot with minimal formatting.
Instructions
HW 2 - First Steps
Create an R project named HW 2 <first name> <last name>.
- File > New Project > New Directory > R Project
- In the box, name this project: HW 2 <first name> <last name>
- My project name: HW 2 Penelope Pooler
- Click Create Project.
- Note that if you create an R Project that is NOT a Quarto project, a Quarto file is not created.
Create img and data folders within the R Project.
Download the provided file, HW2_Template.qmd, from the Homework Assignments page of the 455 website.
Save the downloaded HW2_Template.qmd file to your R project.
Change the file name to HW2_FirstName_LastName.qmd.
- For example, I would change the template file to be named HW2_Penelope_Pooler.qmd.
- There should be no spaces in the file name of a Quarto (.qmd) file.
Change the title in the file header to ‘HW 2’.
Specify yourself as the author.
NOTES
Provided header text below shows the correct format.
This header text also creates a floating Table of Contents (toc) and will show chunk labels.
Note that the options below will make the code chunks and code chunk labels visible in the output. We will change these options in later assignments and in the group project.
---
title: "HW 2"
author: "Penelope Pooler"
date: last-modified
toc: true
toc-depth: 3
toc-location: left
toc-title: "Table of Contents"
toc-expand: 1
format:
  html:
    code-line-numbers: true
    code-fold: true
    code-tools: true
execute:
  echo: fenced
---
- Create a new chunk under the Setup header and text, and add the following code to the body of the chunk:
#|label: setup
# this line specifies default options for all R chunks
knitr::opts_chunk$set(echo=TRUE,
                      highlight=TRUE)
# suppress scientific notation
options(scipen=100)
# install helper package (pacman) if needed
if (!require("pacman")) install.packages("pacman", repos = "http://lib.stat.cmu.edu/R/CRAN/")
# install and load required packages
# pacman should be first package in parentheses and then list others
pacman::p_load(pacman, tidyverse, gridExtra, magrittr)
# verify packages (comment out in finished documents)
p_loaded()
- Click the green triangle on the right side of the setup chunk to run this code.
HW 2 - Part 1: glimpse and unique
Chunk 2: Examining the diamonds Data
Notes:
This chunk reviews the glimpse and unique commands from Week 1.
diamonds is a large R dataset that is part of the ggplot2 package in the tidyverse package suite.
Provided code for HW 2 - Part 1 (Chunk 2) WILL NOT RUN until the tidyverse package suite is loaded by running the setup chunk with the provided code.
When you run the glimpse() command you will see the variable type <ord>, which is an ordered factor variable.
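As a rough illustration of the two commands reviewed here, the sketch below copies diamonds, glimpses it, and lists the distinct values of the cut variable in both traditional and piped form. It is not the provided Chunk 2 code. Using pull() in the piped version, the names cut_traditional and cut_piped, and loading dplyr and ggplot2 directly (instead of through the setup chunk) are all assumptions made to keep the example self-contained.

```r
# Illustrative sketch only; not the provided Chunk 2 code.
library(dplyr)    # glimpse(), pull()
library(ggplot2)  # supplies the diamonds dataset

# save a copy of the dataset to the Global Environment
my_diamonds <- diamonds

# examine variable names and types (<ord> marks an ordered factor)
glimpse(my_diamonds)

# traditional code: distinct values of the cut variable
cut_traditional <- unique(my_diamonds$cut)
cut_traditional

# piped code for the same task (pull() extracts one column as a vector)
cut_piped <- my_diamonds |> pull(cut) |> unique()
cut_piped
```

The same pattern should carry over to any other factor variable in the dataset by swapping the variable name.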
Steps to Follow:
Run the R code in the provided R chunk for HW 2 - Part 1 (Chunk 2), which reviews:
- how to save a dataset and examine it using glimpse.
- how to examine the levels of a variable using unique.
- traditional and piped code to do the same task.
Use the unique command with or without piping to examine the levels of:
- the clarity variable in the diamonds R dataset.
- the color variable in the diamonds R dataset.
Note that the new unique commands that you write should be ADDED to Chunk 2.
Answer these Blackboard Questions:
BB Question 1
How many rows (observations) and columns (variables) are in the my_diamonds dataset which is a copy of the diamonds R dataset that you have saved to your Global Environment?
____ rows ____ columns
BB Question 2
Order the levels of the clarity variable from first to last based on the output from using the unique command with this variable.
BB Question 3
Fill in the blanks. The color variable in the diamonds dataset has factor levels that are alphabetical.
The first level of diamond color is ____.
The last level of diamond color is ____.
HW 2 - Part 2: select
Chunk 3: Selecting variables in a dataset
Notes:
This chunk demonstrates using the select command to select variables in a dataset.
In the provided code for HW 2 - Part 2 (Chunk 3), there are three code examples that select the first 7 variables in the my_diamonds dataset and save them as a new dataset:
- my_diamonds1 is created by specifying variables to INCLUDE.
- my_diamonds2 is created by specifying variables to EXCLUDE.
- select is ALSO used to reorder the variables (price is first).
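The include, exclude, and reorder ideas can be sketched on a different set of variables than the ones the assignment asks for. The example below is an assumption-laden illustration, not the provided Chunk 3 code: the dataset names dims_include, dims_exclude, and dims_reordered are hypothetical, and the three dimension variables x, y, and z were chosen only for demonstration.

```r
# Illustrative sketch only; not the provided Chunk 3 code.
library(dplyr)
library(ggplot2)  # supplies the diamonds dataset

my_diamonds <- diamonds

# INCLUDE: name the variables to keep
dims_include <- my_diamonds |>
  select(x, y, z)

# EXCLUDE: drop every other variable using ! and c(...)
dims_exclude <- my_diamonds |>
  select(!c(carat, cut, color, clarity, depth, table, price))

# REORDER: the order given to select() becomes the column order
dims_reordered <- my_diamonds |>
  select(z, x, y)

# both approaches should keep the same variables
names(dims_include)
names(dims_exclude)
names(dims_reordered)
```

Excluding tends to be shorter when only a few variables are dropped, while including is usually clearer when only a few are kept.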
Steps to Follow:
Create my_diamonds3 using the select command to INCLUDE only the first FIVE variables:
- Variables included: price, carat, cut, color, clarity
Create my_diamonds4, which will be identical to my_diamonds3 but is created by EXCLUDING the last FIVE variables using the ! operator and the c(...) function to group the variables:
- Variables excluded: depth, table, x, y, z
HW 2 - Part 3: slice
Chunk 4: Selecting rows by row number
Notes:
This chunk demonstrates using the slice command to select observations (rows).
In the provided code for HW 2 - Part 3 (Chunk 4) and subsequent chunks, you will continue to build on your code from Part 2 (Chunk 3) with more commands.
Using piping in your code makes this process more efficient.
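A minimal sketch of chaining select and slice in one pipeline is shown below. The row ranges and variables are deliberately different from the ones specified in this assignment, and the name small_diamonds is hypothetical. Wrapping the two ranges in c(...) is one way to pass several ranges to slice; the provided examples may do this differently.

```r
# Illustrative sketch only; not the provided Chunk 4 code.
library(dplyr)
library(ggplot2)  # supplies the diamonds dataset

small_diamonds <- diamonds |>
  select(carat, price) |>     # keep two variables (columns)
  slice(c(1:10, 101:105))     # keep rows 1 to 10 and rows 101 to 105

# rows and columns remaining: 10 + 5 = 15 rows, 2 columns
dim(small_diamonds)
```

Because each step feeds the next through the pipe, adding another command later only requires appending one more line.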
Steps to Follow:
Copy the code you wrote to create my_diamonds3 in Chunk 3 and paste this code into Chunk 4.
Use the examples to add on to your code and select rows 1001 through 30000 and 45001 through 50000.
- This dataset should still be named my_diamonds3.
- Piping will make your code more efficient and easier to read.
Answer the following Blackboard Question to verify that your dataset is correct:
BB Question 4
After successfully completing the R code in Chunk 4 of HW 2, the my_diamonds3 dataset is smaller than the original dataset.
my_diamonds3 has:
fewer variables (columns) after using the select command as specified.
fewer observations (rows) after using the slice command as specified.
After completing Chunk 4, the my_diamonds3 dataset has:
____ rows. ____ columns.
HW 2 - Part 4: filter and summary
Chunk 5: Filtering data by value and summarizing
Notes:
This chunk demonstrates:
using the filter command to select rows by variable values.
using the summary command to summarize variables.
The filter command enables us to select observations by one or more values of one or more variables.
You can use multiple consecutive filter commands or you can use the and operator, &, or the or operator, |.
You can use filter to INCLUDE rows or EXCLUDE rows with !.
The provided code includes multiple examples of how to complete the same two filtering tasks.
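The equivalent ways of filtering described above can be sketched as follows. This is not the provided Chunk 5 code: the conditions (a price threshold and two color levels) and the names pricey_a, pricey_b, and pricey_c are hypothetical and were chosen so that they differ from the tasks in this assignment.

```r
# Illustrative sketch only; not the provided Chunk 5 code.
library(dplyr)
library(ggplot2)  # supplies the diamonds dataset

# option 1: two consecutive filter commands
pricey_a <- diamonds |>
  filter(price >= 15000) |>
  filter(color %in% c("D", "E"))

# option 2: one filter command using the and operator, &
#           (the or operator, |, combines the two color levels)
pricey_b <- diamonds |>
  filter(price >= 15000 & (color == "D" | color == "E"))

# option 3: EXCLUDE the unwanted color levels with !
pricey_c <- diamonds |>
  filter(price >= 15000) |>
  filter(!(color %in% c("F", "G", "H", "I", "J")))

# summary() of a factor variable counts the observations in each level
summary(pricey_a$color)
```

All three options should return the same rows, so the choice is mostly about which version reads most clearly.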
Steps to Follow:
Copy the code you wrote in Chunk 4 and then paste it into Chunk 5.
Add the specified slice command to subset the my_diamonds3 dataset in Chunk 5.
Use one of the examples in the provided R code for Chunk 5 to complete these TWO specified filter tasks:
- Filter my_diamonds3 to diamonds weighing 1.25 or more carats.
- Filter my_diamonds3 to these cut categories: Very Good, Premium, Ideal
Use the example summary command code to summarize the factor variable clarity in the final my_diamonds3 dataset.
Answer the following Blackboard Questions:
BB Question 5
In Chunk 5 of HW 2, you use the my_diamonds3 dataset from Chunk 4, and then you filter the data by carat and by cut category.
- How many observations are in this final my_diamonds3 dataset?
BB Question 6
Fill in the blanks to indicate how many observations are in each of the three most valuable clarity categories in the my_diamonds3 dataset.
There are ____ observations in the VVS2 level of the clarity variable.
There are ____ observations in the VVS1 level of the clarity variable.
There are ____ observations in the IF level of the clarity variable.
HW 2 - Part 5: Creating plots with ggplot
Chunk 6: Creating Basic Plots
Notes:
The provided R code demonstrates using:
- ggplot to make some basic plots.
- grid.arrange to present multiple plots in a grid or column.
In Part 5, you will create a chunk and copy the provided R code into the chunk you create.
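Before working with the provided code, it may help to see the same ggplot and grid.arrange pattern on a small, unrelated dataset. The sketch below uses the built-in mtcars data instead of diamonds. The names cars, plot_plain, plot_cyl, and out_file are hypothetical, and the plot is saved to a temporary folder here only so the example does not depend on an img folder existing.

```r
# Illustrative sketch only; uses mtcars, not the assignment data.
library(ggplot2)
library(gridExtra)  # grid.arrange()

# converting cyl to a factor gives one discrete color per category
cars <- mtcars
cars$cyl <- factor(cars$cyl)

# basic scatter plot with no grouping variable
plot_plain <- ggplot(cars) +
  geom_point(aes(x = wt, y = mpg))

# same plot with points color coded by cyl and the background removed
plot_cyl <- ggplot(cars) +
  geom_point(aes(x = wt, y = mpg, color = cyl)) +
  theme_classic()

# present both saved plots side by side (ncol = 1 would stack them)
grid.arrange(plot_plain, plot_cyl, ncol = 2)

# export one plot; naming it with plot = avoids relying on the last plot shown
out_file <- file.path(tempdir(), "example_scatter.png")
ggsave(filename = out_file, plot = plot_cyl, width = 6, height = 4)
```

When ggsave is called without plot =, it saves the most recently displayed plot, which is why the order of commands matters in that form.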
Steps to Follow:
Create a new chunk (Chunk 6) under the HW 2 - Part 5 heading in your HW 2 Quarto file (created from the provided template).
Copy and paste the provided R code below into the chunk you created.
After the label fence, add this text: creating plots with ggplot
- Leave one space after the colon: #|label: creating plots with ggplot
- This step is included and required so that students know how to label chunks in Quarto files.
Use the example scatter plot code below to create a saved plot named scatter_cut.
- This will be similar to the example code for the scatter_clarity scatter plot.
- Remove # at the beginning of the line # scatter_cut <- to start the code.
- Replace color=clarity with color=cut.
Use the example code provided below to create a saved plot named scatter_color.
- This will be similar to the code for the scatter_clarity scatter plot.
- Remove # at the beginning of the line # scatter_color <- to start the code.
- Replace color=clarity with color=color.
Use the provided grid.arrange command to create a 2x2 grid of all four scatter plots.
- Remove # at the beginning of each line of the grid.arrange(..., ncol=2) command.
- This code will only work if you use the provided names to save your scatterplots.
Use the provided example boxplot code below to create a saved plot named box_clarity.
- This will be similar to the provided code for the side-by-side grouped boxplot, box_color.
- Note that there is a ggsave command after the box_color code that will export this plot to the img folder you created.
Use the provided grid.arrange command to create a stacked column of the two boxplot figures.
- Remove # at the beginning of the line with grid.arrange(..., ncol=1)
- This code will only work if you use the provided name to save your new boxplot figure.
Answer the following Blackboard Questions:
BB Question 7
Compare the three scatter plots to determine which one of the three variables, clarity, cut, or color, shows the least evidence of a relationship with price or carats, i.e., shows no trending color pattern.
BB Question 8
Fill in the blank:
Compare the two boxplots of the my_diamonds3 dataset to determine which variable, color or clarity, has one category that is substantially lower in price than the other categories.
- The ____ level in the ____ variable includes diamonds that are substantially lower in price than the other levels.
Provided R code for Chunk 6
- Create chunk then copy and paste code below into it.
#|label:
#### scatterplots ####
# scatter_none is the most basic scatter plot of carat vs. price
# no other variables are included
# to view this plot by itself (not required), enclose code in parentheses
scatter_none <- my_diamonds |>
  ggplot() +
  geom_point(aes(x=carat, y=price))
# scatter_clarity adds the option color=clarity to the aes (aesthetic)
# observations are color coded by diamond clarity level
# theme_classic() added to remove background
scatter_clarity <- my_diamonds |>
  ggplot() +
  geom_point(aes(x=carat, y=price, color=clarity)) +
  theme_classic()
# create plot named scatter_cut using the above code
# change color=clarity to color=cut
# scatter_cut <-
# create plot named scatter_color using the above code
# change color=clarity to color=color
# scatter_color <-
# plot all 4 plots above in a 2x2 grid and answer Blackboard Questions
# grid.arrange(scatter_none, scatter_clarity,
#              scatter_color, scatter_cut, ncol=2)
#### boxplots ####
# below is a plot of grouped side-by-side boxplots
# Within each cut category there is a separate boxplot for each color
# this is one good way to examine categorical data
# code is enclosed in parentheses
# plot is saved as box_color AND is shown on the screen
(box_color <- my_diamonds3 |>
    ggplot() +
    geom_boxplot(aes(x=cut, y=price, fill=color)) +
    theme_classic())
ggsave(filename="img/HW2_Diamond_Color_Boxplots.png",
       width = 6, height = 4)
# create a plot of grouped side-by-side boxplots that show
# boxplots for each clarity category within each cut category
# same plot as above but change fill=color to fill=clarity
# name this plot box_clarity
# (box_clarity <- )
# plot these two sets of plots in a stacked column (ncol=1)
# grid.arrange(box_color, box_clarity, ncol=1)
HW 2 - Final Steps
Save your completed HW 2 R Quarto file (.qmd) within your project folder.
Render your .qmd file to create an .html file.
Answer all 8 Blackboard questions associated with this assignment.
- Reminder: You are welcome and encouraged to work together and practice sharing Quarto (.qmd) files, but each student should submit their own zipped R project and Blackboard assignment.
Create a README file for HW 2 using the README template provided with the HW 2 files.
Zip your entire Project Directory into a compressed file and submit it.
The zipped project directory should contain:
The HW 2 project folder with the following inside:
- the HW2 Quarto (.qmd) and HTML files labeled with your full name.
- the img folder with the exported plot file.
- the empty data folder.
- the complete and accurate README.txt file that is saved with the same name as the project, e.g. HW 2 Penelope Pooler README.txt
- the .Rproj file
Grading Criteria
(8 pts.) Each Blackboard question for HW 2 is worth 1 point.
(2 pts.) Completing HW 2 - First Steps as specified.
(2 pts.) Part 1: Full credit for:
Correctly completing the chunk labeled examining the diamonds data
- Each unique command is 1 point.
(2 pts.) Part 2: Full credit for:
Creating the two identical datasets using select to include variables and exclude variables.
- Each select command is 1 point.
(2 pts.) Part 3: Full credit for:
- Creating the specified dataset using the select command and the slice command.
(2 pts.) Part 4: Full credit for:
- Creating the specified dataset using the select, slice, and filter commands.
(3 pts.) Part 5: Full credit for:
1 point for creating both specified scatterplots
1 point for creating the specified 2x2 grid of scatterplots
1 point for creating the specified grouped boxplots and the column of 2 grouped boxplots
(4 pts.) Completing the HW 2 - Final Steps and correctly submitting your zipped project directory.
1 point for creating a correct README file
1 point for having both the .qmd and .html files
1 point for having the empty data folder and img folder with one plot in it
1 point for zipping and submitting your project correctly