knitr::opts_chunk$set(echo = FALSE,
                      warning = FALSE,
                      message = FALSE,
                      fig.align = "center")

## Load the required package: tidyverse
library(tidyverse)

## Read in the Dr. Who data from GitHub
drwho <- 
  read.csv("https://raw.githubusercontent.com/Shammalamala/DS-2870-Data-Sets/main/drwho.csv") |> 
  # Keeping only the unique episodes (some episodes are listed twice)
  distinct(title, season, episode, day, month, year, viewers, .keep_all = TRUE) |> 
  # Ordering the actors by the order in which they played the Doctor and putting first and last names on separate lines
  mutate(
    dr_actor = str_replace(dr_actor, " ", "\n") |> fct_reorder(dr_number),
    type = str_replace(type, "episode", "regular")
  ) |> 
  # Removing the row with NA for the doctor
  filter(!is.na(dr_actor))

Data Description

The drwho data set contains information on 175 episodes of the revival of the television show “Dr. Who”. There are 14 variables (columns), but this homework assignment will focus on the following columns:

  1. season, episode: The season number and episode number (within each season) for the episode
  2. type: Whether it is a “regular” season episode or a TV “special” episode
  3. dr_actor: The name of the actor who played the Doctor in the episode (5 different actors)
  4. dr_number: The number of the iteration of the Doctor (9/10/11/12/13)
  5. writer: The primary writer for the episode
  6. director: The director of the episode
  7. duration: The length of the episode (in minutes)
  8. viewers: The number of viewers for the initial broadcast in the UK (in millions)
  9. rating: The rating of the episode (0 - 100)

Question 1: Box plot of rating by actor

Create a set of boxplots comparing the rating of the episode by the actor.

It should match the completed graph in Brightspace attached to the assignment.

The color of the box should be “#003b6f” and the color of the lines and points of the box plots should be “orange2”.
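As a reminder of how constant colors are set on a box plot, here is a minimal sketch on ggplot2's built-in mpg data (not the drwho data). It assumes that "color of the box" refers to the `fill` argument and that the lines and points are controlled by `color`; the name gg_box_demo is only illustrative.

```r
library(ggplot2)

# Illustration on the built-in mpg data, not the drwho data.
# Constant colors go outside aes(): fill colors the inside of each box,
# color sets the outline, whiskers, median line, and outlier points.
gg_box_demo <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot(fill = "#003b6f", color = "orange2")

gg_box_demo
```

Because both values sit outside aes(), they apply to every box and no legend is created.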

Question 2: Viewers and Rating by Actor and Type

Next, you’ll create a scatterplot to compare viewers, rating, actor, and type (regular vs. special).

Part 2a: Basic scatterplot with trend line

Create a scatterplot with an added trend line that has the following aesthetics mapped to the corresponding columns:

  • x = viewers*1000000
  • y = rating
  • color = actor
  • shape = episode type

Set the size of the points to 1.5. Save the graph as gg_drwho_2a (make sure it still appears below the code chunk).
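The pattern described above can be sketched on the built-in mpg data (not the drwho data). The column choices, the name gg_demo_2a, and the single linear trend line are assumptions made for illustration; the graph in Brightspace may use a different smoother or a different placement of the aesthetics.

```r
library(ggplot2)

# Illustration on the built-in mpg data, not the drwho data.
# A calculation such as displ * 1000 can be written directly inside aes().
gg_demo_2a <- ggplot(mpg, aes(x = displ * 1000, y = hwy)) +
  # color and shape are mapped to columns; size is a constant, so it sits outside aes()
  geom_point(aes(color = drv, shape = factor(year)), size = 1.5) +
  # Assumption: one overall linear trend line without a confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Printing the saved object makes the plot appear below the chunk
gg_demo_2a
```

Mapping color and shape inside geom_point() rather than ggplot() keeps geom_smooth() from drawing one line per group; moving them into ggplot() would give a separate trend line for each color.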

Part 2b: Changing the labels and themes

Using the gg_drwho_2a graph, make and save the changes so the plot appears like what is in Brightspace.

The title of the plot should be bolded, centered, and colored with the same color as the box plots in question 1.

Save it as gg_drwho_2b and make sure it is displayed beneath the code chunk.
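A hedged sketch of the general approach, again on the built-in mpg data: labels are changed with labs(), and the title styling is set through the plot.title element of theme(). The label text, theme_bw(), and the object names are placeholders, not the values shown in Brightspace.

```r
library(ggplot2)

# Illustration on the built-in mpg data, not the drwho data
base_demo <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point(size = 1.5)

gg_demo_2b <- base_demo +
  # Placeholder label text, chosen only for this example
  labs(title = "Highway Mileage by Engine Size",
       x = "Engine displacement (litres)",
       y = "Highway miles per gallon",
       color = "Drive train") +
  # A complete theme goes first so the tweaks below are not overwritten
  theme_bw() +
  # face = "bold" bolds the title, hjust = 0.5 centers it, color sets the text color
  theme(plot.title = element_text(face = "bold", hjust = 0.5, color = "#003b6f"))

gg_demo_2b
```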

Part 2c: Changing the way the aesthetics are represented

Use gg_drwho_2b and the correct scale functions, along with their respective arguments, to make the changes seen in Brightspace.

The hex codes for the colors used are: “#1a2530”, “#a1856a”, “#b44a48”, “#e29d42”, “#2bafc8”

Disclaimer: The actors’ names aren’t coded as “First Last” but as “First” in order to have the first and last names appear on two separate lines, i.e., David Tennant is coded as “David”.

The numbers for the shapes used are 19 and 8, respectively.

Save the results as gg_drwho_2c and display it beneath the code chunk.
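For reference, manual color and shape choices are usually supplied through scale_color_manual() and scale_shape_manual(). The sketch below uses the built-in mpg data, whose drv column has only three levels, so only three of the five hex codes appear; the legend titles and the name gg_demo_2c are assumptions for illustration.

```r
library(ggplot2)

# Illustration on the built-in mpg data, not the drwho data
gg_demo_2c <- ggplot(mpg, aes(x = displ, y = hwy,
                              color = drv, shape = factor(year))) +
  geom_point(size = 1.5) +
  # Naming the values ties each color to a specific level of the mapped column
  scale_color_manual(name = "Drive train",
                     values = c("4" = "#1a2530", "f" = "#a1856a", "r" = "#b44a48")) +
  # Unnamed values are assigned in the order of the factor levels (1999, then 2008)
  scale_shape_manual(name = "Model year", values = c(19, 8))

gg_demo_2c
```

Named values must match the data exactly as stored, so any line breaks embedded in the level names have to be part of the names used in the vector.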

Part 2d: Small Multiples

Using gg_drwho_2c, create a separate graph for each actor.

Hide the guide for color but not shape using the guides() function (like in part 2b).

Move the legend into the blank space in the bottom right corner of the plot. To move the legend inside the graph, give the correct argument a pair of (x, y) coordinates, where both x (left to right) and y (bottom to top) range from 0 to 1.

See how the graph should appear in Brightspace.
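The three steps above (one panel per group, hiding one guide, and placing the legend inside the plot) can be sketched on the built-in mpg data. The coordinates c(0.85, 0.15) and the object names are assumptions for this example. The sketch also assumes that ggplot2 3.5.0 and later expect legend.position = "inside" together with legend.position.inside, while earlier versions take the coordinates directly in legend.position, so it branches on the installed version.

```r
library(ggplot2)

# Choose the legend-placement syntax that matches the installed ggplot2 version
inside_legend <- if (packageVersion("ggplot2") >= "3.5.0") {
  theme(legend.position = "inside",
        legend.position.inside = c(0.85, 0.15))
} else {
  theme(legend.position = c(0.85, 0.15))
}

# Illustration on the built-in mpg data, not the drwho data.
# class has 7 levels, so the 3 x 3 facet grid leaves the bottom right corner empty.
gg_demo_2d <- ggplot(mpg, aes(x = displ, y = hwy,
                              color = class, shape = factor(year))) +
  geom_point(size = 1.5) +
  facet_wrap(vars(class)) +
  labs(shape = "Model year") +
  # Drop the color legend (the panel titles already identify each group) but keep shape
  guides(color = "none") +
  inside_legend

gg_demo_2d
```

The coordinates are relative to the whole plotting area, so (0, 0) is the bottom left corner and (1, 1) is the top right corner.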