TidyTuesday1119.knit

About TidyTuesday:

TidyTuesday is a weekly social data project. All are welcome to participate! Please remember to share the code used to generate your results.

TidyTuesday is organized by the Data Science Learning Community. Join their Slack for free online help with R, Python, and other data-related topics, or to participate in a data-related book club!

Goals:

Our overarching goal for TidyTuesday is to make it easier to learn to work with data by providing real-world datasets.

Our goal for 2023-2024 is to increase usage of #TidyTuesday within classrooms. We aim to be used in at least 10 courses by September 2024. If you are using TidyTuesday to teach data-related skills, please let us know!

How to Participate:

- Data is posted to social media every Monday morning. Follow the instructions in the new post for how to download the data.
- Explore the data, watching for interesting relationships. Avoid drawing conclusions about causation as many moderating variables may not be captured.
- Create a visualization, model, shiny app, or another data-science-related output using R or another programming language.
- Share your output and the code used to generate it on social media with the #TidyTuesday hashtag.
- Curate a dataset for a future TidyTuesday!

This Week’s Dataset: Bob’s Burgers Dialogue

This week we’re exploring Bob’s Burgers dialogue! Thank you to Steven Ponce for the data and a blog post demonstrating how to visualize the data.

Access the dataset:

episode_metrics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-11-19/episode_metrics.csv')

## Rows: 272 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (8): season, episode, dialogue_density, avg_length, sentiment_variance, ...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
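The message above suggests two ways to quiet it. As a minimal sketch, assuming all eight columns are numeric as the column specification reports, a small helper can declare the types up front. The name `read_episode_metrics` is hypothetical and not part of the original post.

```r
library(readr)

# Hypothetical helper, not part of the original post: read the metrics CSV
# with every column parsed as double. Supplying col_types also suppresses
# the column-specification message.
read_episode_metrics <- function(path) {
  read_csv(path, col_types = cols(.default = col_double()))
}
```

Calling `read_episode_metrics()` with the URL above should return the same data frame without the message. Passing `show_col_types = FALSE` to `read_csv()` is the lighter alternative if you would rather let readr guess the types.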

See the {bobsburgersR} R Package for the original transcript data, as well as additional information about each episode!

My Analysis Code to Compute Season Averages

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

season_averages <- episode_metrics %>%
  group_by(season) %>%
  # Pass na.rm through an anonymous function; the `...` argument of across()
  # is deprecated as of dplyr 1.1.0.
  summarise(across(where(is.numeric), \(x) mean(x, na.rm = TRUE)))

season_averages
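Note that `where(is.numeric)` also averages the `episode` column, which is an episode number rather than a dialogue metric. A hedged variant, assuming `episode` is the only non-metric column besides `season`, drops it and adds an episode count per season. The names `season_summary` and `n_episodes` are illustrative, not from the original analysis.

```r
library(dplyr)

# Illustrative helper: per-season means of every column except `episode`,
# plus the number of episodes in each season.
season_summary <- function(metrics) {
  metrics %>%
    group_by(season) %>%
    summarise(
      across(-episode, \(x) mean(x, na.rm = TRUE)),
      n_episodes = n()
    )
}

# Runs only if the dataset has already been loaded as shown earlier.
if (exists("episode_metrics")) {
  season_summary(episode_metrics)
}
```

The episode count makes it easier to judge how much weight a season average deserves, since seasons with few episodes give noisier means.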

Code for My Chart

library(ggplot2)

# Set3 provides at most 12 colors, so interpolate it to one color per season.
n_seasons <- dplyr::n_distinct(episode_metrics$season)
season_colors <- grDevices::colorRampPalette(RColorBrewer::brewer.pal(12, "Set3"))(n_seasons)

ggplot(episode_metrics, aes(x = factor(season), y = sentiment_variance, fill = factor(season))) +
  # Hide boxplot outliers: geom_jitter() already draws every episode.
  geom_boxplot(alpha = 0.7, color = "black", size = 1.2, outlier.shape = NA) +
  geom_jitter(aes(color = dialogue_density), size = 3, width = 0.2, alpha = 0.8) +
  scale_fill_manual(values = season_colors) +
  scale_color_gradient(low = "purple", high = "red") +
  labs(
    title = "Crazy Chart: Boxplots, Jitter, and Madness",
    subtitle = "Featuring Dialogue Density Chaos",
    x = "Season",
    y = "Sentiment Variance",
    fill = "Season",
    color = "Dialogue Density"
  ) +
  theme_minimal(base_size = 15) +
  theme(
    plot.title = element_text(face = "bold", size = 20, color = "darkred"),
    legend.position = "top"
  )

Discussion:

How have dialogue metrics changed over the seasons?
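One way to approach this question is to plot each season average as its own small line chart. The sketch below assumes a data frame shaped like `season_averages` (one row per season, one column per metric) and that tidyr is installed. `plot_season_trends` is a hypothetical helper name.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical helper: reshape per-season averages to long format and draw
# one panel per metric, each with its own y scale.
plot_season_trends <- function(averages) {
  averages %>%
    # The mean episode number is not a dialogue metric, so drop it if present.
    select(-any_of("episode")) %>%
    pivot_longer(-season, names_to = "metric", values_to = "value") %>%
    ggplot(aes(x = season, y = value)) +
    geom_line() +
    geom_point() +
    facet_wrap(~ metric, scales = "free_y") +
    labs(x = "Season", y = "Season average")
}
```

Calling `plot_season_trends(season_averages)` should give one panel per metric. The free y scales keep metrics with very different ranges readable, at the cost of making the panels not directly comparable to each other.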

Can you find any patterns not shown in Steven Ponce’s original visualization?
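A possible starting point for finding new patterns is a correlation matrix of the episode-level metrics. This is a minimal sketch, assuming `season` and `episode` are identifiers and every other column is a numeric metric. `metric_correlations` is an illustrative name. As noted under How to Participate, correlations here are descriptive and should not be read as causal.

```r
library(dplyr)

# Illustrative helper: pairwise Pearson correlations between all metric
# columns, ignoring the season and episode identifiers.
metric_correlations <- function(metrics) {
  metrics %>%
    select(-any_of(c("season", "episode"))) %>%
    cor(use = "pairwise.complete.obs")
}

# Runs only if the dataset has already been loaded as shown earlier.
if (exists("episode_metrics")) {
  round(metric_correlations(episode_metrics), 2)
}
```

Pairs with a large absolute correlation are candidates for a closer look, for example with a scatter plot colored by season.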