HW 3 Instructions
Due Wednesday, September 17, 2025 at 11:59 PM
Instructions
HW 3 - First Steps
Create an R project named HW 3 <first name> <last name>.
- File > New Project > New Directory > R Project
- In the box, name this project: HW 3 <first name> <last name>
- My project name: HW 3 Penelope Pooler
- Click Create Project.
- Note that if you create an R Project that is NOT a Quarto project, a Quarto file is not created.
Create img and data folders within the R Project.
Download the provided file, HW3_Template.qmd, from the Homework Assignments page of the 455 website.
Save the downloaded HW3_Template.qmd file to your R project.
Change the file name to be HW3_FirstName_LastName.qmd.
- For example, I would change the template file to be named HW3_Penelope_Pooler.qmd.
- There should be no spaces in the file name of a Quarto (.qmd) file.
Change the title in the file header to be ‘HW 3’.
Specify yourself as the author.
- Save the provided data file Box_Office_Mojo_Week3_HW3.csv into your data folder within this HW 3 project.
- Note that these data are already clean and usable, but they will be modified in the steps below.
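If you prefer to create the two folders from the R Console instead of the Files pane, a minimal sketch is shown here. It assumes the working directory is your HW 3 project folder, which is the default when the project is open. The name project_root is hypothetical and used only for illustration.

```r
# Assumption: the working directory is the HW 3 project folder
project_root <- "."

# Create each folder only if it does not already exist
for (folder in c("img", "data")) {
  path <- file.path(project_root, folder)
  if (!dir.exists(path)) dir.create(path)
}

# Confirm both folders are present
dir.exists(file.path(project_root, c("img", "data")))
```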
NOTES
Provided header text below shows the correct format.
This header text also creates a floating Table of Contents (toc) and will show chunk labels.
Note that the options below will make the code chunks and code chunk labels visible in the output. We will change these options in later assignments and in the group project.
---
title: "HW 3"
author: "Penelope Pooler"
date: last-modified
toc: true
toc-depth: 3
toc-location: left
toc-title: "Table of Contents"
toc-expand: 1
format:
  html:
    code-line-numbers: true
    code-fold: true
    code-tools: true
execute:
  echo: fenced
---
- Create a new chunk under the Setup header and text and add the following text to the body of the chunk:
#| label: setup
# this line sets default options for all R chunks
knitr::opts_chunk$set(echo=T,
                      highlight=T)
# suppress scientific notation
options(scipen=100)
# install helper package (pacman) if needed
if (!require("pacman")) install.packages("pacman", repos = "http://lib.stat.cmu.edu/R/CRAN/")
# install and load required packages
# pacman should be first package in parentheses and then list others
pacman::p_load(pacman, tidyverse, gridExtra, magrittr, kableExtra)
# verify packages (comment out in finished documents)
p_loaded()
- Click the green triangle or type Ctrl/Cmd + Shift + Enter to run this setup chunk.
HW 3 - Part 1
Chunk 2: Import and Modify Categorical Variables
Steps to Follow:
Import Box_Office_Mojo_Week3_HW3.csv and save it as mojo_23 using the provided R code in Chunk 2.
Remove # (comment indicator) from in front of the select command and use ! to omit the text variable, num1.
Remove # (comment indicator) from in front of incomplete code lines in the lower part of this chunk.
Use examples that show how to create a factor variable to complete the code with a mutate statement that creates the factor variable quartF from quart:
- levels: c(1,2,3,4)
- labels: c("1st Qtr", "2nd Qtr", "3rd Qtr", "4th Qtr")
- Examine data with glimpse and answer Blackboard Question:
BB Question 1
In Part 1 (Chunk 2) you will exclude the text variable num1 and create three factor variables, monthF, wkdayF, and quartF.
Examine the output from glimpse after creating the third factor variable, quartF.
The dataset, mojo_23_mod now has
____ rows and
____ columns
and includes a date variable,
____ character <chr> variables,
____ numeric <dbl> variables, and
____ factor <fct> variables.
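The Part 1 steps (omit a text variable with !, then build a labeled factor inside mutate) can be sketched on a small toy dataset. The data values and the object names toy and toy_mod are hypothetical. This is not the Box Office Mojo file, and the provided template code may be organized differently.

```r
library(dplyr)

# Toy data (hypothetical values, not the homework dataset)
toy <- tibble(
  quart = c(1, 2, 3, 4, 1),
  num1  = c("Film A", "Film B", "Film C", "Film D", "Film E")
)

toy_mod <- toy |>
  select(!num1) |>   # ! omits the text variable
  mutate(quartF = factor(quart,
                         levels = c(1, 2, 3, 4),
                         labels = c("1st Qtr", "2nd Qtr", "3rd Qtr", "4th Qtr")))

# glimpse lists each column with its type, e.g. <dbl> or <fct>
glimpse(toy_mod)
```

Each value in levels is matched to the label in the same position, so the order of the two vectors matters.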
HW 3 - Part 2
Chunk 3: Modify and Create Numerical Variables
Steps to Follow:
Remove # (comment indicator) from in front of incomplete code lines in the lower portion of this chunk.
Use mutate to do the following:
- Coerce Number of Releases to be an integer variable: num_releases = as.integer(num_releases)
- Create num1_pct = num1gross/top10gross*100 and round to 2 decimal places
- Answer Blackboard Question:
BB Question 2
The correct command used to convert a numeric variable to an integer variable is
_____.
When you glimpse the data after completing Part 2 (Chunk 3), the type for the num_releases variable is shown as
____ instead of <dbl>.
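As a rough illustration of the two Part 2 mutate steps, here is a sketch on toy data. The numbers and the object names toy and toy_mod are hypothetical. The column names follow the instructions above, but the template may arrange the code differently.

```r
library(dplyr)

# Toy data (hypothetical values); num_releases starts out as <dbl>
toy <- tibble(
  num_releases = c(35, 42, 40),
  num1gross    = c(250, 100, 75),
  top10gross   = c(1000, 300, 600)
)

toy_mod <- toy |>
  mutate(num_releases = as.integer(num_releases),                    # coerce to integer
         num1_pct = round(num1gross / top10gross * 100, 2))          # percent, 2 decimals

glimpse(toy_mod)
```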
HW 3 - Part 3
Chunk 4: Group and Summarize Data
Steps to Follow:
- Remove # (comment indicator) from in front of incomplete code lines in this chunk.
- Note: The first step of selecting the variables for the summary table has been completed for you.
Complete the group_by command to group the data by quartF, wkdayF.
Complete the summarize command to create the following summary variables.
Recall that na.rm=T is used to remove missing values before calculating summary statistics.
- max_num, the maximum number of releases (num_releases): max_num = max(num_releases, na.rm=T)
- mean_num1grossM, mean of the number 1 gross (num1grossM) rounded to 2 decimal places: mean_num1grossM = mean(num1grossM, na.rm=T) |> round(2)
- Answer Blackboard Question:
BB Question 3
Your grouped and summarized data have
____ rows and
____ columns with
____ summary numeric variables.
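The grouping and summarizing pattern in Part 3 can be sketched on toy data as follows. The values and the object names toy and toy_smry are hypothetical, and the .groups = "drop" argument is an addition here (not stated in the instructions) that simply returns an ungrouped result.

```r
library(dplyr)

# Toy data (hypothetical values) with one missing num_releases
toy <- tibble(
  quartF       = factor(c("1st Qtr", "1st Qtr", "1st Qtr", "2nd Qtr")),
  wkdayF       = factor(c("F", "F", "Sa", "F")),
  num_releases = c(30L, NA, 38L, 41L),
  num1grossM   = c(1.234, 2.345, 3.0, 4.5)
)

toy_smry <- toy |>
  group_by(quartF, wkdayF) |>
  summarize(max_num = max(num_releases, na.rm = T),                     # NA is dropped first
            mean_num1grossM = mean(num1grossM, na.rm = T) |> round(2),
            .groups = "drop")

toy_smry
```

The result has one row per combination of quartF and wkdayF that appears in the data, plus one column for each summary variable.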
HW 3 - Part 4
Chunk 5: Reshape Data to Create a Table
Steps to Follow:
Use the summary dataset mojo_qtr_smry you created in Part 3 (Chunk 4).
Remove # (comment indicator) from in front of incomplete code lines in this chunk.
Use pivot_wider to create a wider table with 1 row for each quarter and 1 column for each weekday.
Here is some code to assist you:
id_cols=quartF, names_from=wkdayF, values_from=mean_num1grossM
Create a presentation version of your table with kable.
Fill in the blanks in the Blackboard Question:
BB Question 4
The mean daily gross ($ millions) for Fridays in each quarter was:
1st Qtr: $____ million
2nd Qtr: $____ million
3rd Qtr: $____ million
4th Qtr: $____ million
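To show how the pivot_wider arguments in Part 4 fit together, here is a sketch on a toy summary table. The values and the object names toy_smry and toy_wide are hypothetical. kable is called from knitr with its defaults, and the template may add formatting options.

```r
library(dplyr)
library(tidyr)
library(knitr)

# Toy summary data (hypothetical values): 2 quarters x 2 weekdays
toy_smry <- tibble(
  quartF          = factor(c("1st Qtr", "1st Qtr", "2nd Qtr", "2nd Qtr")),
  wkdayF          = factor(c("F", "Sa", "F", "Sa"), levels = c("F", "Sa")),
  max_num         = c(40L, 38L, 45L, 41L),
  mean_num1grossM = c(1.5, 2.5, 3.5, 4.5)
)

# One row per quarter, one column per weekday; max_num is dropped because
# it is not named in id_cols, names_from, or values_from
toy_wide <- toy_smry |>
  pivot_wider(id_cols = quartF, names_from = wkdayF, values_from = mean_num1grossM)

# Basic presentation table
kable(toy_wide)
```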
HW 3 - Part 5
Chunk 6: Reshaping and Plotting Data
Steps to Follow:
Use the wide dataset mojo_qtr_wide you created in Part 4 (Chunk 5).
Remove # (comment indicator) from in front of incomplete code lines in this chunk.
Use pivot_longer to reshape mojo_qtr_wide to be long again with these specifications:
cols=M:Su, names_to="Day", values_to="mean_num1grossM"
- cols=M:Su means the columns labeled M through Su (Monday to Sunday) will become 1 column
- names_to="Day" means there will be a column named Day that lists all of the days, M:Su (M, T, W, Th, F, Sa, Su)
- values_to="mean_num1grossM" means the values will be in one long column named mean_num1grossM
NOTE: Reshaping the data turned our ‘Day’ data into a character variable (again). I provide the code to make it a factor variable (again).
- Complete the geom_bar statement to create the bar plot as follows:
aes(x=Qtr, y=mean_num1grossM, fill=Day)
Outside of the aesthetic (aes), after the comma: stat="identity", position="dodge"
- stat="identity" tells R to use the data values themselves instead of the default (number of observations).
- position="dodge" specifies that the data are displayed in side-by-side bars instead of stacked bars.
Modify and run the provided ggsave code to export the completed plot to the img folder in your HW 3 Project folder.
- Filename example: HW3_Barplot_Penelope_Pooler.png
Answer Blackboard Questions:
BB Question 5
For each option include the quotes in your answer.
stat=____ tells R to create the barplot using the numeric values in the data INSTEAD of the number of observations, which is the default.
position=____ indicates that the bars should be side-by-side, instead of stacked, which is the default.
BB Question 6
Based on the barplot created in Part 5 (Chunk 6), and the table created in Part 4 (Chunk 5), which day of the week has the LOWEST mean daily gross for the top film in the first three quarters?
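The Part 5 reshape and plot steps can be sketched on a toy wide table. The values, the three day columns, and the object names toy_wide, toy_long, and toy_plot are hypothetical. The ggsave line is commented out, its file name is a placeholder, and it assumes an img folder already exists in the project.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy wide data (hypothetical values): one row per quarter, one column per day
toy_wide <- tibble(
  Qtr = c("1st Qtr", "2nd Qtr"),
  Th  = c(1.0, 2.0),
  F   = c(3.0, 4.0),
  Sa  = c(5.0, 6.0)
)

# Collapse the day columns into a Day column and a value column,
# then make Day a factor again so the days keep their order
toy_long <- toy_wide |>
  pivot_longer(cols = Th:Sa, names_to = "Day", values_to = "mean_num1grossM") |>
  mutate(Day = factor(Day, levels = c("Th", "F", "Sa")))

# stat = "identity" plots the values; position = "dodge" puts bars side by side
toy_plot <- toy_long |>
  ggplot() +
  geom_bar(aes(x = Qtr, y = mean_num1grossM, fill = Day),
           stat = "identity", position = "dodge")

# ggsave("img/HW3_Barplot_FirstName_LastName.png", plot = toy_plot)
```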
HW 3 - Final Steps
- Save your HW 3 Quarto File (.qmd) within your project folder.
- Feel free to add additional notes to yourself above or within each chunk.
Knit your Quarto File to create an HTML file (.html).
Answer all Blackboard questions associated with this assignment.
You are welcome (encouraged) to work together.
Each student should submit their own Blackboard assignment and zipped R Project.
- Create a README file using the template provided.
The dataset Box_Office_Mojo_Week3_HW3.csv should be saved in your data folder and listed in your README.txt file.
The plot you created in Chunk 6 should be saved in your img folder and listed in your README.txt file.
- Zip your entire Project Directory into a compressed file and submit it.
- The zipped R Project should be named HW 3 FirstName LastName.
- The zipped project directory should contain:
  - The completed README.txt file.
  - The .Rproj file
  - The completed, correctly named, Quarto (.qmd) and rendered HTML (.html) files.
  - A data folder that contains the .csv file.
  - An img folder that contains the exported .png file of the final plot.
Grading Criteria
(8 pts.) Each Blackboard question for HW 3 is worth 1 or 2 points.
(2 pts.) Completing HW 3 - First Steps as specified.
(2 pts.) Part 1: Full credit for:
- correctly excluding num1 from the dataset
- correctly creating a factor variable, quartF
(2 pts.) Part 2: Full credit for:
- correctly coercing num_releases to be an integer variable
- correctly creating the variable num1_pct
(2 pts.) Part 3: Full credit for:
- correctly grouping and summarizing the data to create two summary variables: max_num and mean_num1grossM
(2 pts.) Part 4: Full credit for:
- correctly using pivot_wider to reshape the data to wide format
- creating a basic presentation table using kable
(2 pts.) Part 5: Full credit for:
- correctly using pivot_longer to reshape the data to long format
- completing the barplot code to create a barplot
(4 pts.) Completing the HW 3 - Final Steps and correctly submitting your zipped project directory:
- 1 point for creating a correct README file
- 1 point for having the completed .qmd, .html files in the project folder
- 1 point for having
  - the .csv file in the data folder
  - the exported .png file in the img folder
- 1 point for zipping and submitting your project correctly