Instructions

Submission

Please knit your file as an HTML document (so the interactive plots display) and submit your assignment to bCourses.

Tips

  • The questions in this exercise are meant to complement Lab Module: Interactive Plots & Tables in Week 10.

  • This exercise uses the ca_aqi.rds dataset that we cleaned in Module: Scatterplots.

  • Please double-check that your code does not run off the page in your knitted HTML document. If it does, go back to your code and press Enter to break long lines, so that part of the code chunk continues on the next line.
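As a small illustration of the last tip, the sketch below writes the same call twice: once on a single long line and once with one argument per line. The label text is only an example; any long call can be wrapped the same way, since R lets an expression continue onto the next line while a parenthesis is still open.

```r
# One long line: likely to run past the right margin in the knitted document
long_label <- paste("Distribution of Daily AQI", "for the Top 5 California Cities", "(Boxplot)")

# The same call with one argument per line
wrapped_label <- paste(
  "Distribution of Daily AQI",
  "for the Top 5 California Cities",
  "(Boxplot)"
)

# Both versions produce the same string
identical(long_label, wrapped_label)
```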

Read in the Data

# Week parameter
week = params$week

# Read the clean rds (R dataset) file with `read_rds`
ca_aqi = read_rds(here(week, "data", "top5_cities.rds"))

Run this code

These steps create the dataframe top5_df, which holds the AQI measurements in ca_aqi for the 5 California cities with the highest population-weighted average AQI.

# Here we generate the names of cities we are interested in
top5cities <- ca_aqi %>%
  # Group by city
  group_by(city_ascii) %>%
  # Population-weighted average of AQI
  summarise(mean_aqi = weighted.mean(aqi, population, na.rm = TRUE)) %>%
  # Sort dataframe by mean aqi, in descending order
  dplyr::arrange(desc(mean_aqi)) %>%
  # Take the top 5
  head(5) %>%
  # Select just the city names
  select(city_ascii) %>%
  # Convert the dataframe into a vector
  as_vector()

# Create dataframe with all unaggregated AQI measurements
top5_df <- ca_aqi %>%
  # Filter for cities within our list
  filter(city_ascii %in% top5cities,
         # Filter out extreme aqi values
         aqi < 1000)
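To make the population-weighted average in the summarise step more concrete, here is a minimal sketch with made-up numbers. The vectors toy_aqi and toy_pop are hypothetical and are not taken from ca_aqi; they only show how the weights and na.rm = TRUE interact.

```r
# Made-up AQI readings and populations for three hypothetical records
toy_aqi <- c(100, NA, 200)
toy_pop <- c(1, 2, 3)

# Unweighted mean, ignoring the missing reading: (100 + 200) / 2
plain_mean <- mean(toy_aqi, na.rm = TRUE)

# Population-weighted mean; na.rm = TRUE drops the missing reading
# together with its weight: (100 * 1 + 200 * 3) / (1 + 3)
weighted_mean <- weighted.mean(toy_aqi, toy_pop, na.rm = TRUE)

plain_mean     # 150
weighted_mean  # 175
```

The record with the larger population pulls the weighted average toward its own AQI value.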

Run Package Updates and Versions

Before building the plots, update ggplot2 and plotly.

# Update ggplot2 & update plotly
install.packages("ggplot2")
install.packages("plotly")

Then check the installed versions of ggplot2 and plotly.

# Check package versions
#packageVersion("ggplot2") # should be ‘3.5.1’
#packageVersion("plotly") # should be ‘4.10.4’
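One possible way to run this check without reinstalling on every knit is sketched below. The helper has_min_version is hypothetical (it is not part of the lab code), and it treats the version numbers from the comments above as minimum versions, which is an assumption; it relies only on requireNamespace and packageVersion, which ship with R.

```r
# Hypothetical helper (not from the lab module): TRUE when `pkg` is installed
# at `min_version` or newer, FALSE when it is missing or older.
has_min_version <- function(pkg, min_version) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)
  }
  utils::packageVersion(pkg) >= min_version
}

# Version numbers taken from the comments above, assumed to be minimums
required <- c(ggplot2 = "3.5.1", plotly = "4.10.4")

for (pkg in names(required)) {
  if (has_min_version(pkg, required[[pkg]])) {
    message(pkg, " is installed at version ", required[[pkg]], " or newer")
  } else {
    message(pkg, " is missing or older than ", required[[pkg]],
            "; consider running install.packages(\"", pkg, "\")")
  }
}
```

The loop only reports the status and installs nothing, so it is safe to leave in a chunk that runs when the document is knitted.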

Question 1

Use the top5_df dataframe we’ve prepared above (the 5 cities with the highest average AQI in California) from Lab Module: Interactive Plots & Tables to create either an interactive violin plot or an interactive box plot that depicts the spread of AQI measurements across the five cities, using ggplot with a ggplotly conversion.

Take note of how the tooltip is generated. Refer to the code from Lab Module: Interactive Plots & Tables to customize the tooltip if you deem it necessary.
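As a rough illustration of how a custom tooltip string can be assembled, the sketch below builds the text by hand. The helper make_tooltip and the example city and value are hypothetical; the one detail it relies on is that plotly renders hover text as HTML, so "<br>" starts a new line in the tooltip.

```r
# Hypothetical helper: builds a two-line tooltip string for one observation
make_tooltip <- function(city, aqi) {
  paste0("City: ", city, "<br>AQI: ", round(aqi, 1))
}

# Example with made-up values
example_tip <- make_tooltip("Fresno", 87.456)
example_tip  # "City: Fresno<br>AQI: 87.5"
```

A string like this can be mapped to the text aesthetic in ggplot and then selected with tooltip = "text" in ggplotly.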

library(ggplot2)
library(plotly)

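# Keep only the columns needed for the plot
box_df <- top5_df %>%
  # Drop any leftover grouping
  dplyr::ungroup() %>%
  dplyr::select(city_ascii, aqi) %>%
  # Make sure city is categorical and AQI is numeric
  dplyr::mutate(
    city_ascii = as.factor(city_ascii),
    aqi        = as.numeric(aqi)
  ) %>%
  as.data.frame()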

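p_box <- ggplot(
  box_df,
  aes(
    x    = city_ascii,
    y    = aqi,
    fill = city_ascii,
    # Group explicitly: the discrete `text` aesthetic would otherwise
    # split each city into one group per distinct tooltip string
    group = city_ascii,
    text = paste("City:", city_ascii,
                 "<br>AQI:", aqi)
  )
) +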
  geom_boxplot(alpha = 0.6, outlier.alpha = 0.3) +
  labs(
    x = "City",
    y = "Daily AQI",
    title = "Distribution of Daily AQI for Top 5 CA Cities (Boxplot)"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    legend.position = "none",
    plot.title      = element_text(face = "bold")
  )

ggplotly(p_box, tooltip = "text")

Question 2

Build the same plot as you did in Question 1 using native plotly object construction.

library(dplyr)

p_box_plotly <- plot_ly(
  data   = box_df,
  x      = ~city_ascii,
  y      = ~aqi,
  color  = ~city_ascii,
  type   = "box",
  boxpoints = "outliers",
  hoverinfo = "text",
  text  = ~paste(
    "City:", city_ascii,
    "<br>AQI:", round(aqi, 1)
  )
) %>%
  layout(
    title  = "Distribution of Daily AQI for Top 5 CA Cities (Boxplot)",
    xaxis  = list(title = "City"),
    yaxis  = list(title = "Daily AQI"),
    showlegend = FALSE
  )

p_box_plotly