The tt_load() function (from the tidytuesdayR package) will pull this week's data into RStudio. This week there are two datasets; you can pull each of them out of the list object using list$dataframe to get separate dataframes for the billboard and audio data.
tt <- tt_load("2021-09-14")
## --- Compiling #TidyTuesday Information for 2021-09-14 ----
## --- There are 2 files available ---
## --- Starting Download ---
##
## Downloading file 1 of 2: `billboard.csv`
## Downloading file 2 of 2: `audio_features.csv`
## --- Download complete ---
billboard <- tt$billboard
audio <- tt$audio_features
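If you are not sure what the dataframes inside a list are called, names() will tell you. Here is a minimal sketch using a toy list; `toy`, `billboard_toy` and `audio_toy` are made-up names for illustration, and the toy list only stands in for the object that tt_load() returns.

```r
# Toy list standing in for the tt object (made-up values, illustration only)
toy <- list(
  billboard = data.frame(song = c("A", "B"), week_position = c(1, 2)),
  audio_features = data.frame(song = c("A", "B"), danceability = c(0.5, 0.7))
)

# See which dataframes are stored in the list
names(toy)

# Pull one out with $
billboard_toy <- toy$billboard

# [[ ]] does the same job and is handy when the name is stored as a string
audio_toy <- toy[["audio_features"]]
```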
The glimpse() function is a nice way to get an idea of the variables in each dataframe and what kind of data R thinks each variable is.
glimpse(billboard)
## Rows: 327,895
## Columns: 10
## $ url <chr> "http://www.billboard.com/charts/hot-100/1965-0…
## $ week_id <chr> "7/17/1965", "7/24/1965", "7/31/1965", "8/7/196…
## $ week_position <dbl> 34, 22, 14, 10, 8, 8, 14, 36, 97, 90, 97, 97, 9…
## $ song <chr> "Don't Just Stand There", "Don't Just Stand The…
## $ performer <chr> "Patty Duke", "Patty Duke", "Patty Duke", "Patt…
## $ song_id <chr> "Don't Just Stand TherePatty Duke", "Don't Just…
## $ instance <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ previous_week_position <dbl> 45, 34, 22, 14, 10, 8, 8, 14, NA, 97, 90, 97, 9…
## $ peak_position <dbl> 34, 22, 14, 10, 8, 8, 8, 8, 97, 90, 90, 90, 90,…
## $ weeks_on_chart <dbl> 4, 5, 6, 7, 8, 9, 10, 11, 1, 2, 3, 4, 5, 6, 1, …
glimpse(audio)
## Rows: 29,503
## Columns: 22
## $ song_id <chr> "-twistin'-White Silver SandsBill Black's Co…
## $ performer <chr> "Bill Black's Combo", "Augie Rios", "Andy Wi…
## $ song <chr> "-twistin'-White Silver Sands", "¿Dònde Està…
## $ spotify_genre <chr> "[]", "['novelty']", "['adult standards', 'b…
## $ spotify_track_id <chr> NA, NA, "3tvqPPpXyIgKrm4PR9HCf0", "1fHHq3qHU…
## $ spotify_track_preview_url <chr> NA, NA, "https://p.scdn.co/mp3-preview/cef48…
## $ spotify_track_duration_ms <dbl> NA, NA, 166106, 172066, 211066, 208186, 2055…
## $ spotify_track_explicit <lgl> NA, NA, FALSE, FALSE, FALSE, FALSE, TRUE, FA…
## $ spotify_track_album <chr> NA, NA, "The Essential Andy Williams", "Comp…
## $ danceability <dbl> NA, NA, 0.154, 0.588, 0.759, 0.613, NA, 0.64…
## $ energy <dbl> NA, NA, 0.185, 0.672, 0.699, 0.764, NA, 0.68…
## $ key <dbl> NA, NA, 5, 11, 0, 2, NA, 2, NA, NA, 7, NA, 1…
## $ loudness <dbl> NA, NA, -14.063, -17.278, -5.745, -6.509, NA…
## $ mode <dbl> NA, NA, 1, 0, 0, 1, NA, 0, NA, NA, 1, NA, 0,…
## $ speechiness <dbl> NA, NA, 0.0315, 0.0361, 0.0307, 0.1360, NA, …
## $ acousticness <dbl> NA, NA, 0.91100, 0.00256, 0.20200, 0.05270, …
## $ instrumentalness <dbl> NA, NA, 2.67e-04, 7.45e-01, 1.31e-04, 0.00e+…
## $ liveness <dbl> NA, NA, 0.1120, 0.1450, 0.4430, 0.1970, NA, …
## $ valence <dbl> NA, NA, 0.150, 0.801, 0.907, 0.417, NA, 0.95…
## $ tempo <dbl> NA, NA, 83.969, 121.962, 92.960, 160.015, NA…
## $ time_signature <dbl> NA, NA, 4, 4, 4, 4, NA, 4, NA, NA, 4, NA, 4,…
## $ spotify_track_popularity <dbl> NA, NA, 38, 11, 77, 73, 61, 40, NA, NA, 31, …
The danceability variable in the audio dataframe piques my interest. I wonder whether there are differences in the “danceability” of my favourite artists.
First, I used the unique() function to see which performers are in the audio dataset. Once I confirmed that Britney, Taylor and Billie are there, I filtered the data to include only songs from those three performers. The filter() function makes your data smaller by including only some of the rows in the bigger dataset. Then I used select() to choose only the columns that I was interested in. The names() function is a quick way to print the names of the variables in your dataset.
favs <- audio %>%
  filter(performer %in% c("Britney Spears", "Taylor Swift", "Billie Eilish")) %>%
  select(performer, song, danceability, spotify_track_popularity)
names(favs)
## [1] "performer" "song"
## [3] "danceability" "spotify_track_popularity"
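The unique() check mentioned above does not appear in the code, so here is a rough sketch of what that step could look like. It runs on a small made-up dataframe (`audio_toy`) rather than the real audio data, and the names `performers`, `wanted` and `favs_toy` are just for illustration.

```r
library(dplyr)

# Toy stand-in for the audio dataframe (made-up values, illustration only)
audio_toy <- data.frame(
  performer = c("Britney Spears", "Taylor Swift", "Patty Duke", "Taylor Swift"),
  song = c("Song A", "Song B", "Song C", "Song D"),
  danceability = c(0.70, 0.60, 0.50, 0.55)
)

# List each performer once
performers <- unique(audio_toy$performer)
performers

# Check whether the artists of interest appear in the data
# (one TRUE/FALSE per name in wanted)
wanted <- c("Britney Spears", "Taylor Swift", "Billie Eilish")
wanted %in% performers

# Keep only rows for those performers, and only two of the columns
favs_toy <- audio_toy %>%
  filter(performer %in% wanted) %>%
  select(performer, danceability)
```

With the real data, unique(audio$performer) returns a very long vector, so testing names with %in% is usually easier than scrolling through the printed output.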
I am interested to see whether songs by my favourite artists differ in their danceability, so here I use group_by() and summarise() to calculate mean danceability scores separately for each artist, averaging across all of their songs.
summary <- favs %>%
  group_by(performer) %>%
  summarise(mean_dance = mean(danceability, na.rm = TRUE))
summary
## # A tibble: 3 × 2
## performer mean_dance
## <chr> <dbl>
## 1 Billie Eilish 0.625
## 2 Britney Spears 0.714
## 3 Taylor Swift 0.590
It looks like Britney’s songs are more danceable than Taylor’s or Billie’s; let’s plot that in a column graph. Note that in this situation I typically use geom_col() instead of geom_bar(), because geom_col() makes the height of each bar a value in your dataset, whereas geom_bar() by default counts the number of rows in each group (which is clever, but typically not what I need).
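To see the difference between the two geoms, here is a small sketch on made-up summary values (`toy_summary` and the plot names are hypothetical, not part of the analysis above).

```r
library(ggplot2)

# Made-up summary values, illustration only
toy_summary <- data.frame(
  performer = c("A", "B", "C"),
  mean_dance = c(0.6, 0.7, 0.5)
)

# geom_col(): bar height comes from the y variable
p_col <- ggplot(toy_summary, aes(x = performer, y = mean_dance)) +
  geom_col()

# geom_bar(): no y needed; bar height is the number of rows per performer
# (here every bar has height 1, because each performer has one row)
p_bar <- ggplot(toy_summary, aes(x = performer)) +
  geom_bar()

# geom_bar(stat = "identity") behaves like geom_col()
p_identity <- ggplot(toy_summary, aes(x = performer, y = mean_dance)) +
  geom_bar(stat = "identity")
```

Printing p_col and p_bar side by side makes it clear why geom_col() is the better fit when you have already calculated the means yourself.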
Here I have used easy_remove_legend() from the ggeasy package. You can install it by typing install.packages("ggeasy") into your console. I made the plot APA style-ish by using theme_classic() and fixing the floating bars with scale_y_continuous().
Learn how to put standard error bars on this plot here: http://jenrichmond.rbind.io/post/apa-figures/
summary %>%
  ggplot(aes(x = performer, y = mean_dance, fill = performer)) +
  geom_col() +
  theme_classic() +
  easy_remove_legend() +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 1)) +
  labs(title = "Mean danceability scores for Jenny's favourite artists",
       y = "Mean danceability", x = "Performer")
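For a rough idea of what adding standard error bars involves, here is one possible sketch. It is not necessarily the approach used in the linked post: it uses made-up song-level scores (`toy_favs`), assumes the standard error is the standard deviation divided by the square root of the number of songs with a score, and draws the bars with geom_errorbar().

```r
library(dplyr)
library(ggplot2)

# Made-up song-level scores, illustration only
toy_favs <- data.frame(
  performer = rep(c("A", "B"), each = 4),
  danceability = c(0.50, 0.60, 0.70, NA, 0.65, 0.70, 0.75, 0.80)
)

toy_se <- toy_favs %>%
  group_by(performer) %>%
  summarise(
    mean_dance = mean(danceability, na.rm = TRUE),
    n = sum(!is.na(danceability)),                 # songs that have a score
    se = sd(danceability, na.rm = TRUE) / sqrt(n)  # standard error of the mean
  )

# Same kind of column plot, with error bars of one standard error either side
p_se <- toy_se %>%
  ggplot(aes(x = performer, y = mean_dance, fill = performer)) +
  geom_col() +
  geom_errorbar(aes(ymin = mean_dance - se, ymax = mean_dance + se), width = 0.2) +
  theme_classic() +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 1))
```

Counting only the non-missing scores for n matters here, because some songs in the audio data have NA for danceability.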
Use ggsave() to export a PNG for sharing on Slack or Twitter.
# This will save your most recent plot
ggsave("danceability.png")
## Saving 7 x 5 in image
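If you would rather not rely on the "most recent plot" behaviour, you can save a plot to an object and tell ggsave() which one to export and how big to make it. This is a small sketch with a made-up plot (`toy_plot`) and a hypothetical file name; the size and dpi values are arbitrary choices, not recommendations.

```r
library(ggplot2)

# Made-up plot, illustration only
toy_plot <- ggplot(data.frame(x = c("A", "B"), y = c(0.6, 0.7)), aes(x = x, y = y)) +
  geom_col()

# Hypothetical output path; tempdir() keeps the example out of your project folder
out_file <- file.path(tempdir(), "danceability_toy.png")

# Name the plot explicitly and set the size in inches
ggsave(out_file, plot = toy_plot, width = 6, height = 4, units = "in", dpi = 300)
```

Setting width and height yourself also avoids the "Saving 7 x 5 in image" message, which appears when ggsave() falls back to the size of the current graphics device.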