HW 3 Instructions

Due Wednesday, February 5, 2025 at 11:59 PM

Instructions

HW 3 - First Steps

  1. Create an R project named HW 3 <first name> <last name>.

    1. File > New Project > New Directory > R Project

    2. In box, name this project:

    • HW 3 <first name> <last name>

    • My project name: HW 3 Penelope Pooler

    3. Click Create Project.
    • Note that if you create an R Project that is NOT a Quarto project, a Quarto file is not created.
  2. Create img and data folders within the R Project.

  3. Download the provided file, HW3_Template.qmd from the Homework Assignments page of the 455 website.

  4. Save the downloaded HW3_Template.qmd file to your R project.

  • Change file name to be HW3_FirstName_LastName.qmd.

    • For example, I would change the template file to be named HW3_Penelope_Pooler.qmd.

    • There should be no spaces in a file name of a Quarto (.qmd) file.

    • Change title in the file header to be ‘HW 3’.

    • Specify yourself as the author.

  5. Save the provided data file Box_Office_Mojo_Week3_HW3.csv into your data folder within this HW 3 project.
  • Note that these data are already clean and usable, but they will be modified in the steps below.

NOTES

  • Provided header text below shows the correct format.

  • This header text also creates a floating Table of Contents (toc) and will show chunk labels.

  • Note that the options below will make the code chunks and code chunk labels visible in the output. We will change these options in later assignments and in the group project.

---
title: "HW 3"
author: "Penelope Pooler"
date: last-modified
toc: true
toc-depth: 3
toc-location: left
toc-title: "Table of Contents"
toc-expand: 1
format:
  html:
    code-line-numbers: true
    code-fold: true
    code-tools: true
execute:
  echo: fenced 
---
  6. Create a new chunk under the Setup header and text and add the following text to the body of the chunk:
#| label: setup

# this line specifies default options for all R chunks
knitr::opts_chunk$set(echo=T,  
                      highlight=T)

# suppress scientific notation
options(scipen=100)

# install helper package (pacman) if needed
if (!require("pacman")) install.packages("pacman", repos = "http://lib.stat.cmu.edu/R/CRAN/")

# install and load required packages
# pacman should be first package in parentheses and then list others
pacman::p_load(pacman, tidyverse, gridExtra, magrittr, kableExtra)

# verify packages (comment out in finished documents)
p_loaded()
  7. Click the green triangle or type Ctrl/Cmd + Shift + Enter to run this setup chunk.

HW 3 - Part 1

Chunk 2: Import and Modify Categorical Variables

Steps to Follow:

  1. Import Box_Office_Mojo_Week3_HW3.csv and save it as mojo_23 using the provided R code in Chunk 2.

  2. Remove # (comment indicator) from in front of the select command and use ! to omit the text variable, num1.

  3. Remove # (comment indicator) from in front of incomplete code lines in the lower part of this chunk.

  4. Following the provided examples of how to create a factor variable, complete the mutate statement to create the factor variable quartF from quart:

  • levels: c(1,2,3,4)
  • labels: c("1st Qtr", "2nd Qtr", "3rd Qtr", "4th Qtr")
  5. Examine data with glimpse and answer Blackboard Question:
BB Question 1

In Part 1 (Chunk 2) you will exclude the text variable num1 and create three factor variables, monthF, wkdayF, and quartF.

Examine the output from glimpse after creating the third factor variable, quartF.

The dataset, mojo_23_mod, now has

____ rows and

____ columns

and includes a date variable,

____ character <chr> variables,

____ numeric <dbl> variables, and

____ factor <fct> variables.
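
The select and factor steps in Chunk 2 can be sketched on toy data. This is only an illustration, not the template code: the data frame toy and its values are made up, the column note stands in for a text variable to omit, and only the names quart and quartF and the levels and labels come from the steps above.

```r
library(dplyr)

# Toy data (hypothetical values); 'note' stands in for a text column to drop
toy <- tibble(
  quart = c(1, 2, 3, 4, 2),
  note  = c("a", "b", "c", "d", "e")
)

toy_mod <- toy |>
  select(!note) |>                      # ! omits the named column
  mutate(quartF = factor(quart,         # convert numeric codes to a labeled factor
                         levels = c(1, 2, 3, 4),
                         labels = c("1st Qtr", "2nd Qtr", "3rd Qtr", "4th Qtr")))

# glimpse lists each column with its type, e.g. <dbl> or <fct>
glimpse(toy_mod)
```

In the glimpse output, the row and column counts appear at the top and the type of each variable appears in angle brackets next to its name.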

HW 3 - Part 2

Chunk 3: Modify and Create Numerical Variables

Steps to Follow:

  1. Remove # (comment indicator) from in front of incomplete code lines in the lower portion of this chunk.

  2. Use mutate to do the following:

  • Coerce Number of Releases to be an integer variable:
    • num_releases = as.integer(num_releases)
  • Create num1_pct = num1gross/top10gross*100 and round to 2 decimal places
  3. Answer Blackboard Question:
BB Question 2

The correct command used to convert a numeric variable to an integer variable is

_____.

When you glimpse the data after completing Part 2 (Chunk 3), the type for the num_releases variable is shown as

____ instead of <dbl>.
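
A minimal sketch of the two mutate steps in Chunk 3, using made-up values. Only the variable names and the formula come from the steps above; the data frame toy is hypothetical, and the template may chain the commands differently.

```r
library(dplyr)

# Toy data (hypothetical values); numeric columns are stored as double by default
toy <- tibble(
  num_releases = c(40, 35, 42),
  num1gross    = c(10, 5, 2.5),
  top10gross   = c(30, 20, 10)
)

toy_mod <- toy |>
  mutate(
    num_releases = as.integer(num_releases),                   # coerce double to integer
    num1_pct     = (num1gross / top10gross * 100) |> round(2)  # percent, 2 decimal places
  )

# compare the type shown for num_releases before and after the mutate
glimpse(toy_mod)
```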

HW 3 - Part 3

Chunk 4: Group and Summarize Data

Steps to Follow:

  1. Remove # (comment indicator) from in front of incomplete code lines in this chunk.
  • Note: The first step of selecting the variables for the summary table has been completed for you.
  2. Complete the group_by command to group the data by quartF, wkdayF

  3. Complete the summarize command to create the following summary variables.

  • Recall that na.rm=T is used to remove missing values before calculating summary statistics.

  • max_num, the maximum number of releases (num_releases):

    • max_num = max(num_releases, na.rm=T)
  • mean_num1grossM, mean of the number 1 gross (num1grossM) rounded to 2 decimal places

    • mean_num1grossM = mean(num1grossM, na.rm=T) |> round(2)
  4. Answer Blackboard Question:


BB Question 3

Your grouped and summarized data have

____ rows and

____ columns with

____ summary numeric variables.
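
The grouping and summarizing pattern in Chunk 4 can be tried on a small example first. The data frame toy and all of its values are hypothetical, and the .groups = "drop" argument is an addition here (it returns an ungrouped result) that the template may not use.

```r
library(dplyr)

# Toy data (hypothetical values) with one missing value in each numeric column
toy <- tibble(
  quartF       = factor(c("1st Qtr", "1st Qtr", "1st Qtr", "2nd Qtr", "2nd Qtr")),
  wkdayF       = factor(c("F", "F", "Sa", "F", "F")),
  num_releases = c(40L, 45L, 38L, NA, 50L),
  num1grossM   = c(10, 12.5, 20, 8, NA)
)

toy_smry <- toy |>
  group_by(quartF, wkdayF) |>
  summarize(
    max_num         = max(num_releases, na.rm = TRUE),              # NA ignored
    mean_num1grossM = mean(num1grossM, na.rm = TRUE) |> round(2),   # NA ignored
    .groups = "drop"
  )

# one row per quartF/wkdayF combination present in the data
toy_smry
```

The result has one row for each combination of the grouping variables that occurs in the data, with the grouping variables first and the summary variables after them.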

HW 3 - Part 4

Chunk 5: Reshape Data to Create a Table

Steps to Follow:

Use the summary dataset mojo_qtr_smry you created in Part 3 (Chunk 4)

  1. Remove # (comment indicator) from in front of incomplete code lines in this chunk.

  2. Use pivot_wider to create a wider table with

  • 1 row for each quarter and 1 column for each weekday.

  • Here is some code to assist you:

    • id_cols=quartF, names_from=wkdayF, values_from=mean_num1grossM
  3. Create a presentation version of your table with kable.

  4. Fill in the blanks in the Blackboard Question:

BB Question 4

The mean daily gross ($ millions) for Fridays in each quarter was:

  • 1st Qtr: $____ million

  • 2nd Qtr: $____ million

  • 3rd Qtr: $____ million

  • 4th Qtr: $____ million
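
As a sketch of the reshaping in Chunk 5, the pivot_wider arguments listed above can be applied to a toy summary table. The data frame toy_smry, its values, and the caption text are hypothetical; kable is called from knitr here, which is one way to produce a basic presentation table.

```r
library(dplyr)
library(tidyr)
library(knitr)

# Toy summary data (hypothetical values): 2 quarters x 2 weekdays, long format
toy_smry <- tibble(
  quartF          = factor(rep(c("1st Qtr", "2nd Qtr"), each = 2)),
  wkdayF          = factor(rep(c("F", "Sa"), times = 2), levels = c("F", "Sa")),
  mean_num1grossM = c(11.25, 20, 8, 15.5)
)

# one row per quarter, one column per weekday
toy_wide <- toy_smry |>
  pivot_wider(id_cols = quartF, names_from = wkdayF, values_from = mean_num1grossM)

# basic presentation table
toy_wide |>
  kable(caption = "Toy example: mean gross ($ millions) by quarter and weekday")
```

Each value of wkdayF becomes a column name, so the Friday values for every quarter can be read down a single column.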

HW 3 - Part 5

Chunk 6: Reshaping and Plotting Data

Steps to Follow:

Use the wide dataset mojo_qtr_wide you created in Part 4 (Chunk 5)

  1. Remove # (comment indicator) from in front of incomplete code lines in this chunk.

  2. Use pivot_longer to reshape mojo_qtr_wide to be long again with these specifications:

  • cols=M:Su, names_to="Day", values_to="mean_num1grossM"

  • cols=M:Su means the column labels from M to Su (Monday to Sunday) will be gathered into 1 column

  • names_to="Day" means there will be a column named Day that lists all of the days, M:Su (M, T, W, Th, F, Sa, Su)

  • values_to="mean_num1grossM" means the values will be in one long column named mean_num1grossM

  • NOTE: Reshaping the data turned our ‘Day’ data into a character variable (again). I provide the code to make it a factor variable (again).

  3. Complete the geom_bar statement to create the bar plot as follows:
  • aes(x=Qtr, y=mean_num1grossM, fill=Day)

  • outside of aesthetic (aes), after the comma:

    • stat="identity", position="dodge"
  • stat="identity" tells R to use the data values themselves instead of default (number of observations).

  • position="dodge" specifies that the data are displayed in side-by-side bars instead of stacked bars.

  4. Modify and run provided ggsave code to export completed plot to the img folder in your HW 3 Project folder.

    • Filename example: HW3_Barplot_Penelope_Pooler.png
  5. Answer Blackboard Questions:

BB Question 5

For each option include the quotes in your answer.

stat=____ tells R to create the barplot using the numeric values in the data INSTEAD of the number of observations, which is the default.

position=____ indicates that the bars should be side-by-side, instead of stacked, which is the default.

BB Question 6

Based on the barplot created in Part 5 (Chunk 6), and the table created in Part 4 (Chunk 5), which day of the week has the LOWEST mean daily gross for the top film in the first three quarters?
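
The reshape-then-plot sequence in Chunk 6 can be sketched with a toy wide table. All values are made up, and the quarter column is assumed to be named Qtr to match the aes() specification above; the template may name or prepare it differently. The ggsave export step is left out of this sketch.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy wide data (hypothetical values): one row per quarter, one column per weekday
toy_wide <- tibble(
  Qtr = c("1st Qtr", "2nd Qtr"),
  M  = c(3, 2.5),   T  = c(4, 3),     W  = c(3.5, 2), Th = c(5, 4),
  F  = c(11.25, 8), Sa = c(20, 15.5), Su = c(14, 10)
)

toy_long <- toy_wide |>
  pivot_longer(cols = M:Su, names_to = "Day", values_to = "mean_num1grossM") |>
  # pivot_longer returns Day as character; restore weekday order as a factor
  mutate(Day = factor(Day, levels = c("M", "T", "W", "Th", "F", "Sa", "Su")))

toy_plot <- toy_long |>
  ggplot() +
  geom_bar(aes(x = Qtr, y = mean_num1grossM, fill = Day),
           stat = "identity",     # bar heights come from the y values
           position = "dodge")    # bars side by side rather than stacked

toy_plot
```

Converting Day back to a factor with explicit levels keeps the bars in weekday order; a character variable would be plotted alphabetically.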

HW 3 - Final Steps

  1. Save your HW 3 Quarto File (.qmd) within your project folder.
  • Feel free to add additional notes to yourself above or within each chunk.
  2. Render your Quarto File to create an HTML file (.html).

  3. Answer all Blackboard questions associated with this assignment.

  • You are welcome (encouraged) to work together.

  • Each student should submit their own Blackboard assignment and zipped R Project.

  4. Create a README file using the template provided.
  • The dataset Box_Office_Mojo_Week3_HW3.csv should be saved in your data folder and listed in your README.txt file.

  • The plot you created in Chunk 6 should be saved in your img folder and listed in your README.txt file.

  5. Zip your entire Project Directory into a compressed file and submit it.
  • The zipped R Project should be named HW 3 FirstName LastName.
  • The zipped project directory should contain:
    • The completed README.txt file.
    • The .Rproj file
    • The completed, correctly named, Quarto (.qmd) and rendered HTML (.html) files.
    • A data folder that contains the .csv file.
    • An img folder that contains the exported .png file of the final plot.

Grading Criteria

(8 pts.) Each Blackboard question for HW 3 is worth 1 or 2 points.

(2 pts.) Completing HW 3 - First Steps as specified.

(2 pts.) Part 1: Full credit for:

  • correctly excluding num1 from the dataset
  • correctly creating a factor variable, quartF

(2 pts.) Part 2: Full credit for:

  • correctly coercing num_releases to be an integer variable
  • correctly creating the variable num1_pct

(2 pts.) Part 3: Full credit for:

  • correctly grouping and summarizing the data to create two summary variables:
  • max_num and mean_num1grossM

(2 pts.) Part 4: Full credit for:

  • correctly using pivot_wider to reshape the data to wide format
  • creating a basic presentation table using kable

(2 pts.) Part 5: Full credit for:

  • correctly using pivot_longer to reshape the data to long format
  • completing the barplot code to create a barplot


(4 pts.) Completing the HW 3 - Final Steps and correctly submitting your zipped project directory:

  • 1 point for creating a correct README file
  • 1 point for having the completed .qmd and .html files in the project folder
  • 1 point for having
    • the .csv file in the data folder
    • the exported .png file in the img folder
  • 1 point for zipping and submitting your project correctly