As an avid music fan and concert-goer, I’m always looking to discover new artists. I wanted an easier way to find artists playing at some of my favorite Chicago venues, so I put together the code below. I run it every day (I’m still working out how to schedule it), and the resulting playlists cover every concert at the selected venues in the next 30 days.
The code leverages two different APIs - JamBase and Spotify. The code also leverages the tinyspotifyr package - shoutout to Troy Hernandez - instead of spotifyr, as I found it easier to work with.
I haven’t fully reconciled the playlists to the venues, but I’ve spot checked a few. It’s possible that some artists aren’t included or the wrong artist with the same/similar name is included, but for the most part they should be accurate.
I’ve named the playlists “Next30: ‘Venue Name’” on Spotify:
* Bottom Lounge
* Byline Bank Aragon Ballroom
* Chop Shop
* Concord Music Hall
* House of Blues
* Lincoln Hall
* Martyrs
* Metro
* Park West
* Radius
* Schubas
* The Riviera Theatre
* The Salt Shed
* The Vic Theatre
My next project is to create playlists based on genres. I’m able to extract genres from JamBase, but ran into a few issues using tinyspotifyr. Most of the artists have multiple genres included on JamBase, so some of the playlists might be duplicative, but it should be simple enough to create using the venue code.
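As a rough sketch of what the genre version might look like, the snippet below expands artists with several genres into one row per artist/genre pair and then groups artists by genre. The artist names and genres are made up for illustration, and `artist_genres`, `genre_long`, and `artists_by_genre` are hypothetical names, not part of the code further down.

```r
# Hypothetical artists with one or more genres each (made-up data).
artist_genres <- list(
  "Band A" = c("indie", "rock"),
  "Band B" = c("rock"),
  "Band C" = c("folk", "indie")
)

# Expand to one row per artist/genre pair.
genre_long <- data.frame(
  artist = rep(names(artist_genres), lengths(artist_genres)),
  genre  = unlist(artist_genres, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Group artists by genre: each element would feed one genre playlist.
artists_by_genre <- split(genre_long$artist, genre_long$genre)
artists_by_genre
```

An artist with two genres shows up in two groups, which is where the overlap between genre playlists would come from.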
I also want to explore creating a model based on audio_features from the Spotify API that clusters/predicts the genres. From what I’ve seen online, it’s a bit difficult to do this accurately, but worth exploring nonetheless.
Libraries
library(tinyspotifyr)
library(httr)
library(lubridate)
library(tidyverse)
library(taskscheduleR)
library(dplyr)
About: Established in 1998, JamBase is the premier website for fans of live music. By providing the largest database of show listings and ticket information, authoritative content, community, and personalization tools for fans, JamBase connects music fans with the music they love and empowers them to go see live music. Originally founded by fans for fans, JamBase quickly evolved to serve not only devoted music enthusiasts, but all lovers of live music, while providing a platform for musicians to be discovered. Today, more than a half million people participate in the JamBase community every month and rely on the website to find the most accurate show listings available for 220,000 artists across 50 genres, performing in 139,000 venues worldwide. JamBase is headquartered in San Francisco and on the Web at http://www.jambase.com
API Docs can be found here
# Base URL for the JamBase events endpoint
url <- "https://www.jambase.com/jb-api/v1/events"
# sets date variables - grabbing the next 30 days of concerts
current_date <- Sys.Date()
future_date <- Sys.Date() + 30
# this list includes the venues I'm interested in. There are a ton more, but I went with these to start.
included_venues <- "Bottom%Lounge|Byline%Bank%Aragon%Ballroom|Chop%Shop|Concord%Music%Hall|House%of%Blues|Lincoln%Hall|Martyrs|Metro|Park%West|Radius|Schubas|The%Riviera%Theatre|The%Salt%Shed|The%Vic%Theatre"
# gets the number of shows for the future date
countString <- list(apikey = "e9bc7d9d-0946-43ec-a95e-b8f19cbbb6b1",
geoCityId ="jambase:4230765", # this is the Chicago geoCityId
venueName = included_venues,
eventDateFrom = current_date,
eventDateTo = future_date
)
# call to the API
countResponse <- VERB("GET",
url,
query = countString,
content_type("application/octet-stream"),
accept("application/json"))
# the parsed results
countContent <- content(countResponse,"parsed")
# provides the number of shows at the selected venues in the next 30 days. Used later in the code.
numShows <- countContent$pagination$totalItems
# creates an empty list to store the results
all_results <- list()
# Defines the common parameters.
common_params <- list(
apikey = "e9bc7d9d-0946-43ec-a95e-b8f19cbbb6b1",
geoCityId = "jambase:4230765",
perPage = 100,
venueName = included_venues,
eventDateFrom = current_date,
eventDateTo = future_date,
expandExternalIdentifiers = "true"
)
# Specifies the number of pages to fetch. Only 100 shows can be populated at a time, so the value here is used for the loop below.
num_pages <- ceiling(numShows/100)
# Loops through each page
for (page in seq_len(num_pages)) {
# establishes the page parameter
query_params <- c(common_params, page = page)
# Sends the API request
response <- VERB("GET", url, query = query_params, content_type("application/octet-stream"), accept("application/json"))
# Parses and stores the data
all_results[[page]] <- content(response, "parsed")
}
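The paging arithmetic above can be checked without calling the API. The sketch below is a minimal illustration, assuming a page size of 100; `pages_needed()` and `build_page_queries()` are hypothetical helper names, not part of JamBase or httr. It uses `seq_len()` because `1:0` counts down in R rather than producing an empty sequence.

```r
# Number of pages needed to cover `total_items` at `per_page` items each.
# Returns 0 when there is nothing to fetch.
pages_needed <- function(total_items, per_page = 100) {
  as.integer(ceiling(total_items / per_page))
}

# Builds one query list per page from the shared parameters.
# seq_len() yields an empty sequence for zero pages.
build_page_queries <- function(common_params, total_items, per_page = 100) {
  lapply(seq_len(pages_needed(total_items, per_page)), function(page) {
    c(common_params, list(perPage = per_page, page = page))
  })
}

# Example: 250 shows need three requests.
queries <- build_page_queries(list(geoCityId = "jambase:4230765"), total_items = 250)
length(queries)
queries[[3]]$page
```

Each element of `queries` could then be passed as the `query` argument of the request shown above.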
This code extracts the following:
* Date of Concert
* Venue
* Artist/Artists
* Genre/Genres
* Spotify ID
# Creates empty vectors to store the data
endDates <- character(0)
locationNames <- character(0)
performerNames <- list()
genreNames <- list()
spotifyIdentifiers <- list()
# Loops through all_results
for (result in all_results) {
# Accesses events
events <- result$events
# Loops through each event
for (event in events) {
# Checks if eventStatus is 'scheduled' - some of them might be canceled
if (event$eventStatus == 'scheduled') {
# Extracts endDate, which is the date of the show
endDate <- event$endDate
endDates <- c(endDates, endDate)
# Extracts location name - aka the venue
locationName <- event$location$name
locationNames <- c(locationNames, locationName)
# Extract performer names and genres - artist names and their genres
performers <- event$performer
# Initialize lists to store performer names, genres, and Spotify identifiers for this event
performerNamesList <- character(0)
genresListEvent <- list()
spotifyIdentifiersList <- list() # Initialize a list for Spotify identifiers
# Most shows will have more than one band/artist, so this loops through each performer.
for (performer in performers) {
performerName <- performer$name
performerNamesList <- c(performerNamesList, performerName)
# Most artists will have more than one genre, so this stores each performer's full list of genres.
genres <- performer$genre
genresListEvent <- c(genresListEvent, list(genres))
# This extracts the Spotify Identifier - some performers won't have one listed.
if (!is.null(performer$`x-externalIdentifiers`)) {
# There are multiple externalIdentifiers for each band; this loop finds the one named Spotify
spotify_entry <- lapply(performer$`x-externalIdentifiers`, function(entry) {
if (entry$source == "spotify") {
return(entry$identifier[[1]])
}
})
# This removes the null entries
spotify_entry <- Filter(function(x) !is.null(x), spotify_entry)
# this updates the spotify Identifiers list.
spotifyIdentifiersList <- c(spotifyIdentifiersList, list(spotify_entry))
} else {
# If there are no externalIdentifiers for a performer, an empty placeholder is appended so the list stays aligned with the performers.
spotifyIdentifiersList <- c(spotifyIdentifiersList, list(list()))
}
}
# Stores the lists of performer names, genres, and Spotify identifiers for each event
performerNames <- c(performerNames, list(performerNamesList))
genreNames <- c(genreNames, list(genresListEvent))
spotifyIdentifiers <- c(spotifyIdentifiers, list(spotifyIdentifiersList))
}
}
}
# Combines the extracted data into a data frame
extractedData <- tibble(
endDate = endDates,
locationName = locationNames,
performerName = performerNames,
genre = genreNames,
spotifyIdentifier = spotifyIdentifiers
)
extractedData1 <- extractedData %>%
# unnests the artist, genre, and spotify ID
unnest(c(performerName,genre,spotifyIdentifier),keep_empty = TRUE) %>%
# further unnesting is needed for genre and spotify ID
unnest(c(genre,spotifyIdentifier),keep_empty = TRUE) %>%
# this checks the length of the value in genre and if it's greater than 0, it extracts the genre
mutate(genre = sapply(genre, function(x) if (length(x) > 0) x[[1]] else NA_character_)) %>%
# same as above, except it does it for Spotify ID
mutate(spotifyIdentifier = sapply(spotifyIdentifier, function(x) if (length(x) > 0) x[[1]] else NA_character_))
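To make the identifier lookup easier to follow, here is a small self-contained sketch that runs on mock data instead of a live response. It assumes performers are shaped the way the loop above reads them (a `x-externalIdentifiers` list whose entries have `source` and `identifier`); the values and the helper name `get_spotify_id()` are made up for illustration.

```r
# Returns the first Spotify identifier for a performer, or NA if none is listed.
get_spotify_id <- function(performer) {
  ids <- performer[["x-externalIdentifiers"]]
  if (is.null(ids)) return(NA_character_)
  for (entry in ids) {
    if (identical(entry$source, "spotify") && length(entry$identifier) > 0) {
      return(as.character(entry$identifier[[1]]))
    }
  }
  NA_character_
}

# Mock performers shaped like the parsed response (values are made up).
performers <- list(
  list(name = "Band A",
       `x-externalIdentifiers` = list(
         list(source = "musicbrainz", identifier = list("mb-123")),
         list(source = "spotify", identifier = list("sp-abc"))
       )),
  list(name = "Band B")  # no external identifiers at all
)

# One identifier (or NA) per performer, so lengths always line up.
spotify_ids <- vapply(performers, get_spotify_id, character(1))
spotify_ids
```

Returning `NA` for performers without an identifier keeps the result the same length as the performer list, which is what the later unnesting relies on.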
I opted to use the tinyspotifyr package instead of spotifyr, as I found it a bit simpler for manipulating the data. Documentation can be found here:
You’ll need to access the Spotify for Developers site and create a client ID and client secret:
# Sys.setenv(SPOTIFY_CLIENT_ID = '')
# Sys.setenv(SPOTIFY_CLIENT_SECRET = '')
#
# access_token <- get_spotify_access_token()
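One way to keep the credentials out of the script is to store them as environment variables (for example in `~/.Renviron`, which R reads at startup) and fail early if they are missing. The helper below is a hypothetical sketch, not part of tinyspotifyr or spotifyr.

```r
# Reads a required environment variable and fails early if it is unset.
require_env <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable ", name, " is not set", call. = FALSE)
  }
  value
}

# Usage (commented out so the sketch runs without real credentials):
# client_id     <- require_env("SPOTIFY_CLIENT_ID")
# client_secret <- require_env("SPOTIFY_CLIENT_SECRET")
```

The same pattern would work for the JamBase API key.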
Gets the ids of the artists from the JamBase API call where the Spotify Id is missing
missing_ids <- extractedData1 %>%
filter(is.na(spotifyIdentifier))
Temporarily loads spotifyr. I had issues using search_spotify() in tinyspotifyr. It gets detached at the end of the code.
library(spotifyr)
Attempts to locate the missing Spotify IDs
# Creates an empty dataframe to store the results
results_df <- data.frame(Name = character(0), Spotify = character(0))
# Iterates through each row in missing_ids
for (i in seq_len(nrow(missing_ids))) {
# Get the value from the current row in missing_ids
artist_name <- missing_ids$performerName[i]
# Runs the Spotify search for the missing ID
a <- spotifyr::search_spotify(
q = artist_name,
type = 'artist',
market = 'US'
)
# Checks if the search results are empty
if (nrow(a) == 0) {
# If there are no results, skip to the next iteration
next
}
# Filters the results to include only the matching artist with followers
filtered_results <- a %>%
filter(name == artist_name & followers.total > 0) %>%
select(name, popularity, followers.total, id, external_urls.spotify)
# Finds the matching artist with the most followers. It's not fail-safe, but it's the best thing I could think of.
max_followers <- filtered_results %>%
group_by(name) %>%
slice_max(order_by = followers.total) %>%
ungroup()
# Appends the results to the results dataframe
results_df <- rbind(results_df, max_followers)
}
detach("package:spotifyr", unload = TRUE)
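The matching rule used in the search loop (exact name match, at least one follower, then the most followers) can be tried on mock data without calling Spotify. This is a base R sketch; `pick_best_match()` is a hypothetical helper and the search results are made up, with the same column names the loop relies on.

```r
# Picks the exact-name match with the most followers, or NULL if none.
pick_best_match <- function(results, artist_name) {
  matches <- results[results$name == artist_name & results$followers.total > 0, ,
                     drop = FALSE]
  if (nrow(matches) == 0) return(NULL)
  matches[which.max(matches$followers.total), , drop = FALSE]
}

# Mock search results (made-up values).
search_results <- data.frame(
  name = c("Metro Band", "Metro Band", "Metro Band Tribute"),
  followers.total = c(120, 54000, 900),
  id = c("id-small", "id-big", "id-tribute"),
  stringsAsFactors = FALSE
)

best <- pick_best_match(search_results, "Metro Band")
best$id
```

Two artists sharing a name are resolved in favour of the larger following, which is exactly the case where the wrong artist can still end up on a playlist.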
final_df <- extractedData1 %>%
left_join(results_df, by = c("performerName" = "name")) %>%
# replaces the missing IDs in the JamBase API with the collected IDs from the Spotify API
mutate(Spotify = coalesce(id, spotifyIdentifier)) %>%
filter(!is.na(Spotify)) %>%
select(c(1:4, spotifyIdentifier = Spotify))
# Initialize an empty data frame to store the top tracks
top_tracks_df <- data.frame(
artist_name = character(),
artist_id = character(),
duration_ms = integer(),
id = character(),
name = character(),
popularity = integer(),
stringsAsFactors = FALSE
)
# removes duplicate artists - it's possible that an artist might play at two different venues in the next 30 days, but that case is accounted for later.
final_df_dedeuped <- final_df %>%
distinct(performerName, .keep_all = TRUE)
# Iterates through each artist's Spotify ID
for (spotify_id in final_df_dedeuped$spotifyIdentifier) {
if (!is.na(spotify_id)) {
# Gets the top tracks for each artist
artist_info <- spotifyr::get_artist(spotify_id)
top_tracks <- spotifyr::get_artist_top_tracks(spotify_id)
# Checks if top_tracks is empty
if (length(top_tracks) == 0) {
top_tracks <- data.frame(
artist_name = artist_info$name,
artist_id = spotify_id,
duration_ms = NA,
id = NA,
name = NA,
popularity = NA,
stringsAsFactors = FALSE
)
} else {
top_tracks <- top_tracks %>%
mutate(artist_name = artist_info$name) %>%
mutate(artist_id = spotify_id) %>%
select(artist_name, artist_id, duration_ms, id, name, popularity) %>%
top_n(5, popularity)
}
# Adds the top tracks to the data frame
top_tracks_df <- bind_rows(top_tracks_df, top_tracks)
}
}
final_df1 <- right_join(final_df, top_tracks_df, by = c("spotifyIdentifier" = "artist_id")) %>%
select(venue = locationName, date = endDate, artist = performerName, artist_id = spotifyIdentifier,song = name, song_id = id, genre,duration_ms, popularity)
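Note that `top_n()` keeps ties, so an artist can contribute more than five tracks when popularity scores are equal. If a hard cap matters, something like the following base R sketch could be used instead; `top_n_per_artist()` is a hypothetical helper and the track data is made up.

```r
# Keeps at most `n` tracks per artist, highest popularity first.
top_n_per_artist <- function(tracks, n = 5) {
  # Sort by artist, then by descending popularity.
  ordered <- tracks[order(tracks$artist_id, -tracks$popularity), , drop = FALSE]
  # Rank each track within its artist (1 = most popular).
  rank_in_artist <- ave(seq_len(nrow(ordered)), ordered$artist_id, FUN = seq_along)
  ordered[rank_in_artist <= n, , drop = FALSE]
}

# Mock tracks (made-up values).
tracks <- data.frame(
  artist_id = c("a", "a", "a", "b"),
  name = c("t1", "t2", "t3", "t4"),
  popularity = c(10, 50, 30, 5),
  stringsAsFactors = FALSE
)

top2 <- top_n_per_artist(tracks, n = 2)
top2$name
```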
I ran into some capacity limitations with my current ID and SECRET, so I generated new ones.
# Sys.setenv(SPOTIFY_CLIENT_ID = '')
# Sys.setenv(SPOTIFY_CLIENT_SECRET = '')
#
# access_token <- get_spotify_access_token()
# copies the combined results into df, which the rest of the code uses
df <- final_df1
# Gets the distinct venues, artists, and songs
song_loop_df <- df %>% distinct(venue,artist,song, .keep_all = TRUE)
# generates a list of my current playlists - It's capped at 50. If you have more than 50, a loop may need to be used to ensure that they're found.
my_playlists <- get_my_playlists(limit = 50)
# saves the old counts of songs on each playlist. More of a test to make sure the code works correctly.
old_counts <- my_playlists %>% select(name, tracks.total) %>% filter(grepl('Next30',name))
# sets venue and date variables
unique_venue <- unique(song_loop_df$venue)
date_range1 <- Sys.Date()
date_range2 <- Sys.Date() + 30
# the loop that creates a playlist for each venue based on the artists with shows in the next 30 days.
for (uv in unique_venue) {
# defines the playlist name
playlist_name <- paste('Next30:',uv, sep = " ") # the name of the venue
# Logical to get the playlist that matches
playlist_logical <- (my_playlists$name == playlist_name)
# Other variables, eventually landing on the playlist_id
ind <- which(playlist_logical)
dr <- my_playlists[ind, ]
playlist_id <- sub("spotify:playlist:", "", dr$uri)
# Number of tracks on the playlist
old_playlist_tracks <- old_counts[grep(uv, old_counts$name), c("tracks.total")]
# batches of 100 are the max the function can handle. The batch size is used below
batch_size <- 100
# Calculates the number of iterations needed for the loop below
num_iterations <- ceiling(old_playlist_tracks / batch_size)
# creates an empty vector
old_playlist_track_ids <- character(0)
# Loops through the iterations
for (i in seq_len(num_iterations)) {
songs <- get_playlist_tracks(playlist_id, offset = (i*100-100))
track_ids <- paste("spotify:track:", songs$track.id, sep="") %>%unlist()
# Adds the track URIs from the current batch to the 'old_playlist_track_ids' vector
old_playlist_track_ids <- c(old_playlist_track_ids, track_ids)
}
# gets updated tracks for the playlist
new_playlist_track_ids <- song_loop_df %>%
filter(venue %in% uv) %>%
select(song_id) %>%
mutate(song_id = paste("spotify:track:",song_id,sep="")) %>%
unlist()
# identifies the songs that are different
old_songs <- setdiff(old_playlist_track_ids,new_playlist_track_ids)
new_songs <- setdiff(new_playlist_track_ids,old_playlist_track_ids)
# Adds new songs
if (length(new_songs) == 0) {
# Skips the code if there are no new songs to be added.
cat("No new songs to add.\n")
} else {
# Calculates the batch size
batch_size <- 100
# Add items to the playlist in batches if needed
for (i in seq(1, length(new_songs), by = batch_size)) {
batch <- new_songs[i:min(i + batch_size - 1, length(new_songs))]
add_items_to_playlist(playlist_id, batch)
}
}
# removes the old songs
if (length(old_songs) == 0) {
# Skips the code if there are no old songs to be removed.
cat("No old songs to remove.\n")
} else {
# Calculates the batch size
batch_size <- 100
# Removes items from the playlist in batches if needed
for (i in seq(1, length(old_songs), by = batch_size)) {
batch <- old_songs[i:min(i + batch_size - 1, length(old_songs))]
tinyspotifyr::remove_tracks_from_playlist(playlist_id, batch)
}
}
# updates the playlist details
change_playlist_details(playlist_id,
description = paste('Top songs on Spotify for bands playing at',
uv,
'between',
format(as.Date(date_range1, format = "%d/%m/%Y"), "%b %d"),
'-',
format(as.Date(date_range2, format = "%d/%m/%Y"), "%b %d"), sep = ' '))
}
# gets the playlist names again
my_playlists <-get_my_playlists(limit = 50)
# creates the new counts for each playlist
new_counts <- my_playlists %>% select(name, tracks.total) %>% filter(grepl('Next30',name))
# compares old vs new to look for anything out of the ordinary. There shouldn't be too many changes day over day unless one of the venues has a festival/all day concert.
old_vs_new <- left_join(old_counts, new_counts, by = 'name', keep = TRUE)
old_vs_new
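The core of the update loop is a set difference followed by batching. The sketch below isolates that logic in base R so it can be checked without touching a real playlist; `chunk()` and `plan_sync()` are hypothetical helpers, the track URIs are made up, and the batch size of 100 mirrors the limit used above.

```r
# Splits a vector into consecutive chunks of at most `size` elements.
chunk <- function(x, size = 100) {
  if (length(x) == 0) return(list())
  unname(split(x, ceiling(seq_along(x) / size)))
}

# Works out which track URIs to add and which to remove, already batched.
plan_sync <- function(current, desired, size = 100) {
  list(
    add    = chunk(setdiff(desired, current), size),
    remove = chunk(setdiff(current, desired), size)
  )
}

# Made-up URIs: "a" has dropped out of the 30-day window, "d" and "e" are new.
current <- paste0("spotify:track:", c("a", "b", "c"))
desired <- paste0("spotify:track:", c("b", "c", "d", "e"))

plan <- plan_sync(current, desired, size = 1)
plan$add
plan$remove
```

Each element of `plan$add` and `plan$remove` would correspond to one add or remove call in the loop above. Because only the differences are sent, tracks that stay on the playlist are left untouched.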