HW 5 - Part 1 Instructions

Due 3/26/25

Introduction

HW Assignment 5 - Part 1 will give you experience with:

  • creating/modifying a Quarto dashboard based on a similar example.

  • citing source data.

  • executing data management tasks such as:

    • reshaping data using pivot_wider and pivot_longer (Review).

    • filtering rows and selecting variables (Review).

    • modifying variable formats using ifelse and factor (ifelse is new).

    • summarizing and presenting data in a table using kable (Review).

Instructions

HW 5.1 - First Steps

Steps to Follow:

  1. Download, unzip, and save to your laptop the provided HW 5 - Part 1 R project.

  2. Rename the project folder to include your name, e.g. HW 5 - Part 1 - Penelope Pooler.

  • This R project has the data saved to the data folder and has a custom.scss file that can be used to modify dashboard defaults (advanced).

  • Notice that a template Quarto (.qmd) file is also included in the R project.

  3. Create a duplicate of this template Quarto file within the project.
  • Select the Quarto file, click the More option in the Files pane, and then click Copy....

  • You will delete the original template when you are done, but it is good to have in case you corrupt the provided code.

  4. Change the file name of the copied .qmd file to be the following with your name: HW5_Part1_Dashboard_FirstName_LastName.

  5. In the header of the dashboard, change the title to be HW 5 - Part 1 - FirstName LastName.

    • Note that in the video demo playlist I add my name to the .qmd file in the ‘Final Steps’ video.
  6. Copy and paste chunks one through five from the provided text file, setup_data_mgmt_chunks.txt, directly under the header, before the line that says ## Nextflix and Amazon Stock Values

  • Run all five of these chunks.

  • This code was covered in class.

  • Notice that the chunk options for a dashboard are different because all code and extraneous output are hidden.

  7. At the end of Chunk 5:
  • Remove the # before nflx_mv <-

  • Use the code directly above this line as an example to filter the data to only movies:

  • filter condition: type == "Movie"
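
The filtering step above can be sketched on a small made-up dataset. This is only an illustration of the filter pattern, assuming the dplyr package is loaded; toy_titles and toy_mv are hypothetical names and are not the homework data.

```r
library(dplyr)

# Hypothetical stand-in for the Netflix titles data (not the real file)
toy_titles <- tibble(
  title = c("Show A", "Film B", "Film C", "Show D"),
  type  = c("TV Show", "Movie", "Movie", "TV Show")
)

# Keep only the rows where type is "Movie"
toy_mv <- toy_titles |>
  filter(type == "Movie")

# dim() reports the number of rows, then the number of columns
dim(toy_mv)
```

The same dim() call (or the Environment pane) can be used to check the size of a filtered dataset.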

BB Question 1

Fill in the blanks:

The nflx_mv dataset has ____ rows and ____ columns.

HW 5.1 - Page 1

Stock Data

Steps to Follow:

The first page of the example dashboard compares Netflix stock data to Amazon stock data. You are going to change the Amazon stock information to AMC because your dashboard will focus on movie content.

  1. Change the page header to read # Nextflix and AMC Stock Values

  2. In Chunk 6, import stock data, change AMZN to AMC throughout this chunk.

  3. In the text after ## Row, update the text to indicate the dashboard will show AMC stock data instead of Amazon stock data.

  4. In Chunk 9, the third value box chunk, change Amazon to AMC and AMZN to AMC.

  5. In Chunk 11, pg1 amzn stock trends, change amzn to amc and AMZN to AMC throughout this chunk, including in the header.

BB Question 2

What type of dataset are the NFLX data when they are imported into R from Yahoo Finance?

BB Question 3

Given that these data are a time series, where is the time or date information located in the dataset?

BB Question 4

On June 2, 2021, AMC’s stock was at its highest value in this timespan. On June 2, 2021, which stock, Netflix or AMC, was valued higher?

  • Use the plots to answer this question.

HW 5.1 - Page 2 - Part 1

Bar Chart Data Mgmt.

Steps to Follow:

  1. Change tv to mv in chunk headers and throughout these two chunks and change the page header to:

    • # Bar Chart of Movie Trends.
  2. In Chunk 12, pg2 nflx mv release period data mgmt:

  1. Complete the release_period = ifelse() statement in the mutate command to group data from "2001-2005" and "2006-2010" into one category, "2001-2010":

mutate(release_period = ifelse(release_period %in% c("2001-2005", "2006-2010"), "2001-2010", release_period))

  1. Notice that in the Netflix TV dashboard, data are filtered to the three most recent release periods: “2001-2010”, “2011-2015”, and “2016-2021”.
  • In the filter command in the data management chunk, add one more release period: "1981-2000".
  1. Create a factor variable, min_ageF from the variable min_age with these factor levels, levels = c(0, 7, 13, 17).

  2. The current levels and labels for genreF are not in order of prevalence for movies.

  • The correct order (based on most recent time period) for the movies data from most prevalent to least is:

    1. International
    2. Drama
    3. Comedies
    4. Documentaries
    5. Kids
    6. Action and Adventure
  • current levels: "international", "dramas", "action_adventr", "comedies", "kids", "docs"

  • current labels: "Int","Dr","A/A","C","K","Do"

  • Reorder the genre levels and labels in the R code so that the categories are in the order of prevalence shown above.

  1. In Chunk 13, change tv to mv in the header so that the header title is pg2 nflx mv release period bar chart, and then make the following changes to the labs command in the plot code:
  • In the plot subtitle in the labs command:

    • Update the order of the genres. Note there are 3 spaces between each genre in the subtitle.

    • Change Docuseries to Documentaries

  • In the plot title and y-axis label, change ‘TV Shows’ to ‘Movies’

NOTES:

  • After completing the pg2 nflx mv release period data mgmt Chunk (Chunk 12), remove eval=F from BOTH the bar plot chunk (Chunk 13) and the summary table chunk (Chunk 14).

  • Figure dimensions, fig.dim = c(10, 5) should be left as is to utilize available space.
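
The Chunk 12 data management steps above (ifelse, filter, and factor) can be sketched on made-up data. This is a minimal illustration, assuming dplyr is loaded; the toy and toy_plot datasets, their values, and the three-genre subset are hypothetical and are not the homework answer.

```r
library(dplyr)

# Hypothetical toy data; values are made up for illustration
toy <- tibble(
  release_period = c("1981-2000", "2001-2005", "2006-2010", "2011-2015", "1961-1980"),
  min_age        = c(13, 0, 17, 7, 13),
  genre          = c("kids", "docs", "dramas", "kids", "docs")
)

toy_plot <- toy |>
  # ifelse(test, yes, no): merge two periods into one, leave the others unchanged
  mutate(release_period = ifelse(release_period %in% c("2001-2005", "2006-2010"),
                                 "2001-2010", release_period)) |>
  # keep only the listed release periods
  filter(release_period %in% c("1981-2000", "2001-2010", "2011-2015")) |>
  mutate(
    # numeric variable converted to a factor with an explicit level order
    min_ageF = factor(min_age, levels = c(0, 7, 13, 17)),
    # levels and labels are matched by position, so reorder them together
    genreF = factor(genre,
                    levels = c("dramas", "docs", "kids"),
                    labels = c("Dr", "Do", "K"))
  )

levels(toy_plot$genreF)
table(toy_plot$release_period)
```

Because factor() pairs each level with the label in the same position, moving a level without moving its label would mislabel the bars, which is why the grading criteria require that labels match levels.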

HW 5.1 - Page 2 - Part 2

Summary Table

Steps to Follow:

  1. Change tv to mv in this chunk (Chunk 14).

  2. Complete the summary table code chunk so that the summary table appears in the right side panel next to the bar plot:

  1. Complete select command to select these variables:
  • release_period, genreF, n
  1. Complete group_by command to group data by these variables: release_period, genreF

  2. Complete summarize command to sum n (number of movies): n=sum(n)

  3. Complete the pivot_wider command to:

  • Maintain release_period as is: id_cols = release_period

  • Create a column for each genre: names_from = genreF

  • Use the values from n for each genre column: values_from = n

  • Note that these options in pivot_wider are all separated by commas.

  1. Enter the name of the summary dataset, nflx_smry1, in the kable() command to output the table.
  • See the completed example from class using Netflix TV Data and the demo video.

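
The select, group_by, summarize, pivot_wider, and kable sequence described above can be sketched on made-up counts. This is a hedged illustration, assuming dplyr, tidyr, and knitr are installed; toy_counts and toy_smry are hypothetical names, and the .groups = "drop" argument is an addition here that may not appear in the provided chunk.

```r
library(dplyr)
library(tidyr)
library(knitr)

# Hypothetical counts by release period, genre, and minimum age
toy_counts <- tibble(
  release_period = c("2011-2015", "2011-2015", "2011-2015", "2016-2021", "2016-2021"),
  genreF   = factor(c("Dr", "Dr", "C", "Dr", "C"), levels = c("Dr", "C")),
  min_ageF = factor(c(13, 17, 13, 13, 7), levels = c(0, 7, 13, 17)),
  n        = c(5, 3, 2, 10, 4)
)

toy_smry <- toy_counts |>
  select(release_period, genreF, n) |>          # drop min_ageF
  group_by(release_period, genreF) |>           # one group per period and genre
  summarize(n = sum(n), .groups = "drop") |>    # total count within each group
  pivot_wider(id_cols = release_period,         # one row per release period
              names_from = genreF,              # one column per genre
              values_from = n)                  # cells hold the summed counts

kable(toy_smry)
```

In this toy case the two "Dr" rows for 2011-2015 are summed before reshaping, so the wide table has one row per release period and one column per genre.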
BB Question 5

The final filtered dataset used to create the barplot is nflx_mv_plot1. This dataset has:

  • ____ categories in the release_period variable

  • ____ categories in min_ageF, the minimum age factor variable

  • ____ categories in genreF, the genre factor variable


BB Question 6:

Based on the barplot and summary table, which genre has the most movies in the three most recent release periods?

HW 5.1 - Page 3

Area Plot

Steps to Follow:

  1. Change tv to mv in these last two chunks (Chunk 15 and 16) and change the page header to:

    • # Netflix Movies Added Each Year.
  2. Run provided data management code chunk, Chunk 15, pg3 nflx mv area plot data mgmt.

  3. Answer Blackboard Question 7 (BB Question 7) based on the dataset used to create the plot in Panel 3, nflx_mv_plot2.

BB Question 7:

After completing the data management steps, the final dataset used for the plot, nflx_mv_plot2, has

  • ____ rows.

  • ____ columns.

  • ____ different years in the year_added variable.


  1. Complete the geom_area() statement as follows:
  1. Add the aesthetic command within the parentheses: aes().

  2. Within the aesthetic command, aes(), specify the following:

  • x is year_added: x = year_added

  • y is total: y = total

  • fill is min_ageF: fill = min_ageF

  • NOTE: x, y, and fill should be separated by commas.

  • See completed example from class using Netflix TV Data and demo video

  1. Complete the scale_x_continuous() command with a breaks option:
  1. Within the parentheses add: breaks =
  • x-axis should show every year from 2013 to 2021

  • One solution: use seq() command, e.g. seq(2013, 2021, 1)

  • See completed example from class using Netflix TV Data and demo video

  1. In the plot title and in the y-axis label in the labs command change ‘TV Shows’ to ‘Movies’.

  2. Remember to remove eval=F from this chunk (Chunk 16, pg3 nflx mv area plot).
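
The geom_area() and scale_x_continuous() steps above can be sketched with made-up totals. This is an illustration only, assuming ggplot2 is loaded; toy_area, toy_area_plot, the totals, and the label text are hypothetical and do not reproduce the provided plot code.

```r
library(ggplot2)

# Hypothetical toy data: one total per year and minimum-age category
toy_area <- expand.grid(year_added = 2013:2021, min_age = c(0, 7, 13, 17))
toy_area$total    <- toy_area$year_added - 2012 + toy_area$min_age  # made-up totals
toy_area$min_ageF <- factor(toy_area$min_age, levels = c(0, 7, 13, 17))

toy_area_plot <- ggplot(toy_area) +
  # x, y, and fill are set inside aes() and separated by commas
  geom_area(aes(x = year_added, y = total, fill = min_ageF)) +
  # seq(2013, 2021, 1) puts an axis break at every year
  scale_x_continuous(breaks = seq(2013, 2021, 1)) +
  labs(title = "Toy Area Plot", x = "Year Added",
       y = "Number of Movies", fill = "Minimum Age")

toy_area_plot
```

Without the breaks option, ggplot2 chooses its own tick positions, which usually skips some years on a range like this one.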

Optional Extra Credit (2 pts.)

NOTE: There is no partial credit on this extra credit, but this is not required.

  • The purpose of this Extra Credit is to experiment with dashboard themes, plot themes, and colors to examine choices and see what works well.

  • For 2 Extra Points:

  1. Change the dashboard theme.
  1. Change these two aspects in the Page 2 and Page 3 plots (plots must match).
  • the plot theme (chosen theme should NOT be theme_classic OR default)

  • the palette = option in the scale_fill_brewer commands (should not be “Spectral” OR default palette)

  • The plot theme for both plots must match and fit each plot, i.e., not obscure any plot elements in either plot.

  • The palette chosen must show all 4 colors clearly and cannot be the R default.

  • There is no “right answer”, but if you choose a theme that makes some of the plot elements (e.g., legend, titles) not visible, you will not get credit.

  • If you choose a palette with colors that are not clearly visible or distinguishable, you will not get credit.
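
Changing a plot theme and a fill palette can be sketched as follows. This is a minimal example on made-up data, assuming ggplot2 is loaded; theme_bw() and the "Dark2" palette are arbitrary picks used to show the syntax, not recommended choices, and toy_bars and toy_themed are hypothetical names.

```r
library(ggplot2)

# Hypothetical counts for two genres and four minimum-age categories
toy_bars <- data.frame(
  genre    = rep(c("Dr", "C"), each = 4),
  min_ageF = factor(rep(c(0, 7, 13, 17), times = 2), levels = c(0, 7, 13, 17)),
  n        = c(2, 4, 6, 8, 1, 3, 5, 7)
)

toy_themed <- ggplot(toy_bars) +
  geom_col(aes(x = genre, y = n, fill = min_ageF)) +  # stacked bars by min age
  scale_fill_brewer(palette = "Dark2") +              # ColorBrewer palette by name
  theme_bw()                                          # one of ggplot2's built-in themes

toy_themed
```

Applying the same theme and palette lines to both the Page 2 and Page 3 plots keeps them matching, as the extra credit requires.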


HW 5.1 - Final Steps

  1. Once all code is complete and runs without errors, render the Quarto (.qmd) file to create the dashboard.
  • Don’t forget to remove eval = F from chunk headers (Chunks 13, 14, and 16).
  2. Verify that your project folder includes:
  • HW 5 - Part 1 Quarto (.qmd) file to create dashboard saved with your name.

  • HW 5 - Part 1 Dashboard (.html) file saved with your name.

  • a custom.scss file that adjusts the box size and font size of the value boxes.

  • a data folder that contains the data file, netflix_titles.csv.

  • an empty img folder.

  • an .Rproj file.

  3. Save the provided README template to your project folder and update it to list all of the files and folders above.
  • You do not have to list files or folders that are not listed above.
  4. Zip (compress) the project directory to submit it.

  5. Answer all Blackboard Questions (7 Questions).

Grading Criteria

  • (14 pts.) Each Blackboard question for this assignment is worth 2 points.

Dashboard Creation Steps:

  • (3 pts.) Completing HW 5.1 - First Steps as specified.

  • (2 pts.) Page 1 - Stock Page:

    • Full credit for

      • correctly updating Chunks 6, 9, and 11 from Amazon data to AMC data.

      • timespan should be 2013-01-01 to 2021-12-31

  • (4 pts.) Page 2 - Part 1 - Bar Plot:

    • Full credit for correctly following all steps to

      • Create the barplot showing each movie genre in a separate bar

      • Have movie genres ordered by prevalence in 2016-2021 release period

      • Have bars correctly labeled as specified (labels must match levels)

      • Have stacked colors showing movies for each minimum age category

      • Have a different panel for each release period (4 panels)

      • have all plot text and accompanying text appearing correctly in dashboard

  • (4 pts.) Page 2 - Part 2 - Summary Table:

    • Full credit for correctly following all steps to

      • create correctly formatted and labeled table

      • place table and accompanying text correctly in right panel next to plot

  • (4 pts.) Page 3 - Area Plot:

    • Full credit for correctly following all steps to

      • create an area plot with a correctly labeled X-axis (each year showing)

      • show each minimum age category

      • have all parts of the plot labeled correctly

      • have accompanying text appearing correctly in dashboard

  • (2 pts.) Completing OPTIONAL EXTRA CREDIT as specified.

    • There is no partial credit for the extra credit but this is not required.
  • (2 pts.) Completing the HW 5 - Part 1 - Final Steps as specified and correctly submitting your zipped project directory.