HW 3 Instructions
Due Wednesday, February 5, 2025 at 11:59 PM
Instructions
HW 3 - First Steps
- Create an R Project named HW 3 <first name> <last name>.
  - File > New Project > New Directory > R Project
  - In the box, name this project: HW 3 <first name> <last name>
  - My project name: HW 3 Penelope Pooler
  - Click Create Project.
  - Note that if you create an R Project that is NOT a Quarto project, a Quarto file is not created.
- Create img and data folders within the R Project.
- Download the provided file, HW3_Template.qmd, from the Homework Assignments page of the 455 website.
- Save the downloaded HW3_Template.qmd file to your R Project.
- Change the file name to HW3_FirstName_LastName.qmd.
  - For example, I would rename the template file HW3_Penelope_Pooler.qmd.
  - There should be no spaces in the name of a Quarto (.qmd) file.
- Change the title in the file header to ‘HW 3’.
- Specify yourself as the author.
- Save the provided data file, Box_Office_Mojo_Week3_HW3.csv, into your data folder within this HW 3 project.
  - Note: these data are already clean and usable, but they will be modified in the steps below.
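If you would rather use the R Console than the RStudio Files pane, you can also create the two folders with base R. This is a minimal sketch. It assumes the HW 3 project is open, which makes the project folder the working directory.

```r
# Assumes the working directory is the HW 3 project folder (true when the .Rproj is open)
for (folder in c("img", "data")) {
  # showWarnings = FALSE: no warning if the folder already exists
  dir.create(folder, showWarnings = FALSE)
}

# Check that both folders now exist (one TRUE/FALSE per folder)
dir.exists(c("img", "data"))
```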
NOTES
The header text provided below shows the correct format.
This header text also creates a floating Table of Contents (toc) and will show chunk labels.
Note that the options below make the code chunks and code chunk labels visible in the output. We will change these options in later assignments and in the group project.
---
title: "HW 3"
author: "Penelope Pooler"
date: last-modified
toc: true
toc-depth: 3
toc-location: left
toc-title: "Table of Contents"
toc-expand: 1
format:
  html:
    code-line-numbers: true
    code-fold: true
    code-tools: true
execute:
  echo: fenced
---
- Create a new chunk under the Setup header and text, and add the following text to the body of the chunk:
#| label: setup
# set default options for all R chunks
knitr::opts_chunk$set(echo=T,
                      highlight=T)
# suppress scientific notation
options(scipen=100)
# install helper package (pacman) if needed
if (!require("pacman")) install.packages("pacman", repos = "http://lib.stat.cmu.edu/R/CRAN/")
# install and load required packages
# pacman should be the first package in parentheses, followed by the others
pacman::p_load(pacman, tidyverse, gridExtra, magrittr, kableExtra)
# verify packages (comment out in finished documents)
p_loaded()
- Click the green triangle or type Ctrl/Cmd + Shift + Enter to run this setup chunk.
HW 3 - Part 1
Chunk 2: Import and Modify Categorical Variables
Steps to Follow:
- Import Box_Office_Mojo_Week3_HW3.csv and save it as mojo_23 using the provided R code in Chunk 2.
- Remove the # (comment indicator) from in front of the select command and use ! to omit the text variable, num1.
- Remove the # (comment indicator) from in front of the incomplete code lines in the lower part of this chunk.
- Use the examples that show how to create a factor variable to complete the code with a mutate statement that creates the factor variable quartF from quart:
  - levels: c(1,2,3,4)
  - labels: c("1st Qtr", "2nd Qtr", "3rd Qtr", "4th Qtr")
- Examine the data with glimpse and answer the Blackboard question:
BB Question 1
In Part 1 (Chunk 2) you will exclude the text variable num1 and create three factor variables, monthF, wkdayF, and quartF.
Examine the output from glimpse after creating the third factor variable, quartF.
The dataset, mojo_23_mod, now has ____ rows and ____ columns and includes a date variable, ____ character <chr> variables, ____ numeric <dbl> variables, and ____ factor <fct> variables.
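The two Part 1 edits are dropping num1 with ! inside select and building quartF with factor. Both can be sketched on a small made-up tibble. The values below are hypothetical stand-ins, not rows from Box_Office_Mojo_Week3_HW3.csv, and the template's own code may be laid out differently.

```r
library(dplyr)

# Hypothetical toy data standing in for mojo_23 (values are made up)
toy <- tibble(
  date  = as.Date(c("2023-01-06", "2023-04-07", "2023-07-07", "2023-10-06")),
  num1  = c("Title A", "Title B", "Title C", "Title D"),  # text variable to drop
  quart = c(1, 2, 3, 4)
)

toy_mod <- toy |>
  select(!num1) |>                          # ! omits num1, keeps everything else
  mutate(quartF = factor(quart,
                         levels = c(1, 2, 3, 4),
                         labels = c("1st Qtr", "2nd Qtr", "3rd Qtr", "4th Qtr")))

glimpse(toy_mod)
```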
HW 3 - Part 2
Chunk 3: Modify and Create Numerical Variables
Steps to Follow:
- Remove the # (comment indicator) from in front of the incomplete code lines in the lower portion of this chunk.
- Use mutate to do the following:
  - Coerce Number of Releases to be an integer variable: num_releases = as.integer(num_releases)
  - Create num1_pct = num1gross/top10gross*100 and round it to 2 decimal places.
- Answer the Blackboard question:
BB Question 2
The correct command used to convert a numeric variable to an integer variable is _____.
When you glimpse the data after completing Part 2 (Chunk 3), the type for the num_releases variable is shown as ____ instead of <dbl>.
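A single mutate call can handle both Part 2 changes. The toy numbers below are hypothetical. Only the variable names come from the instructions above.

```r
library(dplyr)

# Hypothetical toy values (made up); column names follow the Part 2 instructions
toy <- tibble(
  num_releases = c(12, 9, 15),
  num1gross    = c(2.5, 4.0, 1.2),
  top10gross   = c(10.0, 12.0, 7.5)
)

toy_mod <- toy |>
  mutate(num_releases = as.integer(num_releases),                  # coerce to integer
         num1_pct = (num1gross / top10gross * 100) |> round(2))    # percent, 2 decimals

glimpse(toy_mod)
```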
HW 3 - Part 3
Chunk 4: Group and Summarize Data
Steps to Follow:
- Remove the # (comment indicator) from in front of the incomplete code lines in this chunk.
- Note: The first step of selecting the variables for the summary table has been completed for you.
- Complete the group_by command to group the data by quartF and wkdayF.
- Complete the summarize command to create the following summary variables. Recall that na.rm=T is used to remove missing values before calculating summary statistics.
  - max_num, the maximum number of releases (num_releases): max_num = max(num_releases, na.rm=T)
  - mean_num1grossM, the mean of the number 1 gross (num1grossM), rounded to 2 decimal places: mean_num1grossM = mean(num1grossM, na.rm=T) |> round(2)
- Answer the Blackboard question:
BB Question 3
Your grouped and summarized data have ____ rows and ____ columns with ____ summary numeric variables.
HW 3 - Part 4
Chunk 5: Reshape Data to Create a Table
Steps to Follow:
- Use the summary dataset mojo_qtr_smry you created in Part 3 (Chunk 4).
- Remove the # (comment indicator) from in front of the incomplete code lines in this chunk.
- Use pivot_wider to create a wider table with 1 row for each quarter and 1 column for each weekday. Here is some code to assist you:
  id_cols=quartF, names_from=wkdayF, values_from=mean_num1grossM
- Create a presentation version of your table with kable.
- Fill in the blanks in the Blackboard question:
BB Question 4
The mean daily gross ($ millions) for Fridays in each quarter was:
1st Qtr: $____ million
2nd Qtr: $____ million
3rd Qtr: $____ million
4th Qtr: $____ million
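In the sketch below, pivot_wider spreads a small made-up summary table into one row per quarter and one column per weekday, and kable then renders it. The data are hypothetical and cover only two weekdays. kable is called from knitr here, on the assumption that the template may load it differently (for example, through kableExtra).

```r
library(dplyr)
library(tidyr)
library(knitr)

# Hypothetical long summary table (made-up values), shaped like mojo_qtr_smry
toy_smry <- tibble(
  quartF = factor(rep(c("1st Qtr", "2nd Qtr"), each = 2)),
  wkdayF = factor(rep(c("F", "Sa"), times = 2)),
  mean_num1grossM = c(4.1, 3.8, 5.2, 4.9)
)

# One row per quarter (id_cols), one column per weekday (names_from)
toy_wide <- toy_smry |>
  pivot_wider(id_cols = quartF, names_from = wkdayF, values_from = mean_num1grossM)

kable(toy_wide)
```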
HW 3 - Part 5
Chunk 6: Reshaping and Plotting Data
Steps to Follow:
- Use the wide dataset mojo_qtr_wide you created in Part 4 (Chunk 5).
- Remove the # (comment indicator) from in front of the incomplete code lines in this chunk.
- Use pivot_longer to reshape mojo_qtr_wide to be long again with these specifications:
  cols=M:Su, names_to="Day", values_to="mean_num1grossM"
  - cols=M:Su selects the columns from M to Su (Monday to Sunday); these seven columns will be stacked into 1 column.
  - names_to="Day" means there will be a column named Day that lists all of the days, M:Su (M, T, W, Th, F, Sa, Su).
  - values_to="mean_num1grossM" means the values will be in one long column named mean_num1grossM.
NOTE: Reshaping the data turned our ‘Day’ data into a character variable (again). I provide the code to make it a factor variable (again).
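Here pivot_longer reverses the Part 4 reshape on a made-up wide table, using the specifications listed above. The factor step is an assumed version of the conversion the template provides, with the weekday order M through Su. It is not necessarily the template's exact code.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table (made-up values): 1 row per quarter, 1 column per weekday
toy_wide <- tibble(
  quartF = factor(c("1st Qtr", "2nd Qtr")),
  M  = c(2.1, 2.4), T  = c(2.0, 2.2), W  = c(2.3, 2.5), Th = c(2.2, 2.6),
  F  = c(4.1, 5.2), Sa = c(3.8, 4.9), Su = c(3.0, 3.7)
)

toy_long <- toy_wide |>
  pivot_longer(cols = M:Su, names_to = "Day", values_to = "mean_num1grossM") |>
  # pivot_longer returns Day as <chr>; assumed conversion back to an ordered-by-week factor
  mutate(Day = factor(Day, levels = c("M", "T", "W", "Th", "F", "Sa", "Su")))

toy_long
```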
- Complete the geom_bar statement to create the bar plot as follows:
  - aes(x=Qtr, y=mean_num1grossM, fill=Day)
  - outside of the aesthetic (aes), after the comma: stat="identity", position="dodge"
  - stat="identity" tells R to use the data values themselves instead of the default (number of observations).
  - position="dodge" specifies that the data are displayed in side-by-side bars instead of stacked bars.
- Modify and run the provided ggsave code to export the completed plot to the img folder in your HW 3 project folder.
  - Filename example: HW3_Barplot_Penelope_Pooler.png
- Answer the Blackboard questions:
BB Question 5
For each option, include the quotes in your answer.
stat=____
tells R to create the barplot using the numeric values in the data INSTEAD of the number of observations, which is the default.
position=____
indicates that the bars should be side-by-side, instead of stacked, which is the default.
BB Question 6
Based on the barplot created in Part 5 (Chunk 6) and the table created in Part 4 (Chunk 5), which day of the week has the LOWEST mean daily gross for the top film in the first three quarters?
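The bar plot described in Part 5 uses geom_bar with the aes() mapping, stat="identity", and position="dodge" settings listed above, and ggsave then exports it. Below is a sketch on made-up long data with a Qtr column, as the aes() specification expects. The plot size (6 x 4 inches) is an assumption, and the output path reuses the filename example with the img folder.

```r
library(ggplot2)

# Hypothetical long-format data (made-up values) with the columns used in aes()
toy_long <- data.frame(
  Qtr = rep(c("1st Qtr", "2nd Qtr"), each = 3),
  Day = factor(rep(c("F", "Sa", "Su"), times = 2), levels = c("F", "Sa", "Su")),
  mean_num1grossM = c(4.1, 3.8, 3.0, 5.2, 4.9, 3.7)
)

bar_plot <- ggplot(toy_long) +
  geom_bar(aes(x = Qtr, y = mean_num1grossM, fill = Day),
           stat = "identity",      # bar heights are the data values, not counts
           position = "dodge")     # side-by-side bars instead of stacked

# Make sure img/ exists, then export (size is an assumed choice)
dir.create("img", showWarnings = FALSE)
ggsave("img/HW3_Barplot_Penelope_Pooler.png", plot = bar_plot, width = 6, height = 4)
```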
HW 3 - Final Steps
- Save your HW 3 Quarto file (.qmd) within your project folder.
- Feel free to add additional notes to yourself above or within each chunk.
- Render your Quarto file to create an HTML file (.html).
- Answer all Blackboard questions associated with this assignment.
- You are welcome (encouraged) to work together. Each student should submit their own Blackboard assignment and zipped R Project.
- Create a README file using the template provided.
  - The dataset Box_Office_Mojo_Week3_HW3.csv should be saved in your data folder and listed in your README.txt file.
  - The plot you created in Chunk 6 should be saved in your img folder and listed in your README.txt file.
- Zip your entire project directory into a compressed file and submit it.
  - The zipped R Project should be named HW 3 FirstName LastName.
  - The zipped project directory should contain:
    - The completed README.txt file.
    - The .Rproj file.
    - The completed, correctly named Quarto (.qmd) and rendered HTML (.html) files.
    - A data folder that contains the .csv file.
    - An img folder that contains the exported .png file of the final plot.
Grading Criteria
(8 pts.) Each Blackboard question for HW 3 is worth 1 or 2 points.
(2 pts.) Completing HW 3 - First Steps as specified.
(2 pts.) Part 1: Full credit for:
- correctly excluding num1 from the dataset
- correctly creating a factor variable, quartF
(2 pts.) Part 2: Full credit for:
- correctly coercing num_releases to be an integer variable
- correctly creating the variable num1_pct
(2 pts.) Part 3: Full credit for:
- correctly grouping and summarizing the data to create two summary variables: max_num and mean_num1grossM
(2 pts.) Part 4: Full credit for:
- correctly using pivot_wider to reshape the data to wide format
- creating a basic presentation table using kable
(2 pts.) Part 5: Full credit for:
- correctly using pivot_longer to reshape the data to long format
- completing the barplot code to create a barplot
(4 pts.) Completing the HW 3 - Final Steps and correctly submitting your zipped project directory:
- 1 point for creating a correct README file
- 1 point for having the completed .qmd and .html files in the project folder
- 1 point for having:
  - the .csv file in the data folder
  - the exported .png file in the img folder
- 1 point for zipping and submitting your project correctly