Load packages
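The code for this step is not shown here. A minimal sketch, assuming the functions used in the rest of the post come from the tidyverse, tidytuesdayR and ggeasy packages (the package set is inferred from those function names, not stated in the post):

```r
# Assumed package set, inferred from the functions used later in the post
library(tidyverse)     # glimpse(), filter(), select(), group_by(), summarise(), ggplot()
library(tidytuesdayR)  # tt_load()
library(ggeasy)        # easy_remove_legend()
```

If any of these are missing, install.packages() with the package name in quotes will install them.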

Load the weekly data

The tt_load() function will pull this week's data into RStudio. This week there are two datasets. tt_load() returns them together in a list object, so you can pull each one out using list$dataframe to get separate dataframes for the billboard and audio data.

tt <- tt_load("2021-09-14")

## --- Compiling #TidyTuesday Information for 2021-09-14 ----

## --- There are 2 files available ---

## --- Starting Download ---

## 
##  Downloading file 1 of 2: `billboard.csv`
##  Downloading file 2 of 2: `audio_features.csv`

## --- Download complete ---

billboard <- tt$billboard

audio <- tt$audio_features
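The list$dataframe pattern works on any named list in R. A minimal sketch using a made-up list (toy_list and its contents are hypothetical, not the Tidy Tuesday data):

```r
# Hypothetical named list holding two small data frames
toy_list <- list(
  first  = data.frame(id = 1:2),
  second = data.frame(id = 3:5)
)

# $ pulls out one element by name
first_df <- toy_list$first

# [[ ]] does the same and also accepts a name stored in a variable
which_one <- "second"
second_df <- toy_list[[which_one]]

names(toy_list)
nrow(second_df)
```

names() is a quick way to see which elements are available before pulling them out.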

Glimpse Data

The glimpse() function is a nice way to get an idea of the variables in each dataframe and what kind of data R thinks each variable is.

glimpse(billboard)

## Rows: 327,895
## Columns: 10
## $ url                    <chr> "http://www.billboard.com/charts/hot-100/1965-0…
## $ week_id                <chr> "7/17/1965", "7/24/1965", "7/31/1965", "8/7/196…
## $ week_position          <dbl> 34, 22, 14, 10, 8, 8, 14, 36, 97, 90, 97, 97, 9…
## $ song                   <chr> "Don't Just Stand There", "Don't Just Stand The…
## $ performer              <chr> "Patty Duke", "Patty Duke", "Patty Duke", "Patt…
## $ song_id                <chr> "Don't Just Stand TherePatty Duke", "Don't Just…
## $ instance               <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ previous_week_position <dbl> 45, 34, 22, 14, 10, 8, 8, 14, NA, 97, 90, 97, 9…
## $ peak_position          <dbl> 34, 22, 14, 10, 8, 8, 8, 8, 97, 90, 90, 90, 90,…
## $ weeks_on_chart         <dbl> 4, 5, 6, 7, 8, 9, 10, 11, 1, 2, 3, 4, 5, 6, 1, …

glimpse(audio)

## Rows: 29,503
## Columns: 22
## $ song_id                   <chr> "-twistin'-White Silver SandsBill Black's Co…
## $ performer                 <chr> "Bill Black's Combo", "Augie Rios", "Andy Wi…
## $ song                      <chr> "-twistin'-White Silver Sands", "¿Dònde Està…
## $ spotify_genre             <chr> "[]", "['novelty']", "['adult standards', 'b…
## $ spotify_track_id          <chr> NA, NA, "3tvqPPpXyIgKrm4PR9HCf0", "1fHHq3qHU…
## $ spotify_track_preview_url <chr> NA, NA, "https://p.scdn.co/mp3-preview/cef48…
## $ spotify_track_duration_ms <dbl> NA, NA, 166106, 172066, 211066, 208186, 2055…
## $ spotify_track_explicit    <lgl> NA, NA, FALSE, FALSE, FALSE, FALSE, TRUE, FA…
## $ spotify_track_album       <chr> NA, NA, "The Essential Andy Williams", "Comp…
## $ danceability              <dbl> NA, NA, 0.154, 0.588, 0.759, 0.613, NA, 0.64…
## $ energy                    <dbl> NA, NA, 0.185, 0.672, 0.699, 0.764, NA, 0.68…
## $ key                       <dbl> NA, NA, 5, 11, 0, 2, NA, 2, NA, NA, 7, NA, 1…
## $ loudness                  <dbl> NA, NA, -14.063, -17.278, -5.745, -6.509, NA…
## $ mode                      <dbl> NA, NA, 1, 0, 0, 1, NA, 0, NA, NA, 1, NA, 0,…
## $ speechiness               <dbl> NA, NA, 0.0315, 0.0361, 0.0307, 0.1360, NA, …
## $ acousticness              <dbl> NA, NA, 0.91100, 0.00256, 0.20200, 0.05270, …
## $ instrumentalness          <dbl> NA, NA, 2.67e-04, 7.45e-01, 1.31e-04, 0.00e+…
## $ liveness                  <dbl> NA, NA, 0.1120, 0.1450, 0.4430, 0.1970, NA, …
## $ valence                   <dbl> NA, NA, 0.150, 0.801, 0.907, 0.417, NA, 0.95…
## $ tempo                     <dbl> NA, NA, 83.969, 121.962, 92.960, 160.015, NA…
## $ time_signature            <dbl> NA, NA, 4, 4, 4, 4, NA, 4, NA, NA, 4, NA, 4,…
## $ spotify_track_popularity  <dbl> NA, NA, 38, 11, 77, 73, 61, 40, NA, NA, 31, …

Wrangle

The danceability variable in the audio dataframe piques my interest. I wonder whether there are differences in the “danceability” of my favourite artists.

First, I used the unique() function to see which performers are in the audio dataset. Once I confirmed that Britney, Taylor and Billie are there, I filtered the data to include only songs from those 3 performers. The filter() function makes your data smaller by including only some of the rows in the bigger dataset. Then I used select() to choose only the columns that I was interested in. The names() function is a quick way to print the names of the variables in your dataset.

favs <- audio %>%
  filter(performer %in% c("Britney Spears", "Taylor Swift", "Billie Eilish")) %>%
  select(performer, song, danceability, spotify_track_popularity)

names(favs)

## [1] "performer"                "song"                    
## [3] "danceability"             "spotify_track_popularity"
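The unique() check described above is not shown in the code. A minimal sketch of that step, using a small made-up data frame (audio_toy and its rows are hypothetical) in place of the real audio data:

```r
# Toy stand-in for the audio dataframe (made-up rows, for illustration only)
audio_toy <- data.frame(
  performer = c("Britney Spears", "Taylor Swift", "Taylor Swift", "Patty Duke"),
  song      = c("Song A", "Song B", "Song C", "Song D")
)

# List each performer once
performers <- unique(audio_toy$performer)

# Check whether the artists of interest are present
wanted <- c("Britney Spears", "Taylor Swift", "Billie Eilish")
wanted %in% performers
```

In this toy example the last value is FALSE because Billie Eilish is not in the made-up rows. With the real data, the same check would use audio$performer.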

I am interested to see whether songs by my favourite artists differ in their danceability, so here I use group_by() and summarise() to calculate mean danceability scores separately for each artist, averaging across all of their songs.

summary <- favs %>%
  group_by(performer) %>%
  summarise(mean_dance = mean(danceability, na.rm = TRUE))

summary

## # A tibble: 3 × 2
##   performer      mean_dance
##   <chr>               <dbl>
## 1 Billie Eilish       0.625
## 2 Britney Spears      0.714
## 3 Taylor Swift        0.590
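The na.rm = TRUE argument matters because the glimpse() output shows missing danceability values. A minimal sketch of the difference, using made-up numbers:

```r
# Made-up scores with one missing value
scores <- c(0.5, NA, 0.7)

# Without na.rm, a single NA makes the whole mean NA
mean(scores)

# With na.rm = TRUE, missing values are dropped before averaging
mean(scores, na.rm = TRUE)
```

Without na.rm = TRUE, any performer with even one missing danceability score would get NA as their mean.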

Visualize

It looks like Britney’s songs are more danceable than Taylor’s or Billie’s, so let’s plot that in a column graph. Note that in this situation I typically use geom_col() instead of geom_bar() because the default for geom_col() is to make the height of the bar a value in your dataset, whereas geom_bar() tries to count frequencies (which is clever, but typically not what I need).
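A minimal sketch of that difference, using a made-up summary table (toy_summary is hypothetical) with one row per group:

```r
library(ggplot2)

# Toy summary table (made-up values): one row per group, value already computed
toy_summary <- data.frame(group = c("a", "b", "c"), value = c(0.2, 0.5, 0.8))

# geom_col() takes bar heights straight from the y aesthetic
p_col <- ggplot(toy_summary, aes(x = group, y = value)) +
  geom_col()

# geom_bar() counts rows per x value, so it takes no y aesthetic;
# every bar here has height 1 because each group has exactly one row
p_bar <- ggplot(toy_summary, aes(x = group)) +
  geom_bar()

# geom_bar(stat = "identity") gives the same bar heights as geom_col()
p_bar_identity <- ggplot(toy_summary, aes(x = group, y = value)) +
  geom_bar(stat = "identity")
```

Printing p_col and p_bar side by side shows why geom_col() is the better fit for data that has already been summarised.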

Here I have used easy_remove_legend() from the ggeasy package. You can install it by typing install.packages("ggeasy") into your console. I made the plot APA style-ish by using theme_classic() and fixing the floating bars with scale_y_continuous().

Learn how to put standard error bars on this plot here: http://jenrichmond.rbind.io/post/apa-figures/

summary %>% 
  ggplot(aes(x = performer, y = mean_dance, fill = performer)) +
  geom_col() +
  theme_classic() +
  easy_remove_legend() +
  scale_y_continuous(expand = c(0,0), limits = c(0,1)) +
  labs(title = "Mean danceability scores for Jenny's favourite artists",
       y = "Mean danceability", x = "Performer")
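For the standard error bars mentioned above, here is a minimal sketch on made-up data. It assumes the standard error is computed as the standard deviation divided by the square root of the number of non-missing scores. It is not necessarily the approach taken in the linked post, and the toy tibble and its column values are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Toy data (made-up scores), including one missing value
toy <- tibble(
  performer    = rep(c("A", "B"), each = 3),
  danceability = c(0.5, 0.6, 0.7, 0.8, NA, 0.6)
)

# Mean, number of non-missing scores, and standard error per performer
toy_summary <- toy %>%
  group_by(performer) %>%
  summarise(
    mean_dance = mean(danceability, na.rm = TRUE),
    n          = sum(!is.na(danceability)),
    se         = sd(danceability, na.rm = TRUE) / sqrt(n)
  )

# Columns with error bars spanning mean +/- one standard error
p_se <- ggplot(toy_summary, aes(x = performer, y = mean_dance)) +
  geom_col() +
  geom_errorbar(aes(ymin = mean_dance - se, ymax = mean_dance + se), width = 0.2) +
  theme_classic()
```

The same summarise() call could be applied to favs in place of toy.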

Save Image

Use ggsave() to export a PNG for sharing on Slack or Twitter.

# This will save your most recent plot
ggsave("danceability.png")

## Saving 7 x 5 in image
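The "Saving 7 x 5 in image" message appears because no size was given, so ggsave() fell back to the size of the current graphics device. A minimal sketch of setting the plot, size and resolution explicitly, using a made-up plot and a temporary file path (both hypothetical):

```r
library(ggplot2)

# Made-up plot to save
p <- ggplot(data.frame(x = c("a", "b"), y = c(1, 2)), aes(x = x, y = y)) +
  geom_col()

# Write to a temporary folder so the example does not clutter the project
out_file <- file.path(tempdir(), "danceability_example.png")

# Name the plot and set the size and resolution explicitly
ggsave(out_file, plot = p, width = 7, height = 5, units = "in", dpi = 300)
```

Passing plot = explicitly avoids relying on whichever plot happened to be drawn most recently.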

Tidy Tuesday

Billboard data

2021-09-15