Introduction

As an avid music fan and concert-goer, I’m always looking to discover new artists. I wanted an easier way to discover artists playing at some of my favorite Chicago venues, so I put together the code below. I run it every day (I’m still working out how to schedule it), and it picks up every concert at the selected venues over the next 30 days.

The code leverages two different APIs - JamBase and Spotify. The code also leverages the tinyspotifyr package - shoutout to Troy Hernandez - instead of spotifyr, as I found it easier to work with.

I haven’t fully reconciled the playlists to the venues, but I’ve spot checked a few. It’s possible that some artists aren’t included or the wrong artist with the same/similar name is included, but for the most part they should be accurate.

I’ve named the playlists “Next30: ‘Venue Name’” on Spotify:
* Bottom Lounge
* Byline Bank Aragon Ballroom
* Chop Shop
* Concord Music Hall
* House of Blues
* Lincoln Hall
* Martyrs
* Metro
* Park West
* Radius
* Schubas
* The Riviera Theatre
* The Salt Shed
* The Vic Theatre

My next project is to create playlists based on genres. I’m able to extract genres from JamBase, but ran into a few issues using tinyspotifyr. Most of the artists have multiple genres included on JamBase, so some of the playlists might be duplicative, but it should be simple enough to create using the venue code.

I also want to explore creating a model based on audio_features from the Spotify API that clusters/predicts the genres. From what I’ve seen online, it’s a bit difficult to do this accurately, but worth exploring nonetheless.

Libraries

library(tinyspotifyr)
library(httr)
library(lubridate)
library(tidyverse)
library(taskscheduleR)
library(dplyr)

JamBase API

About:
Established in 1998, JamBase is the premier website for fans of live music. By providing the largest database of show listings and ticket information, authoritative content, community, and personalization tools for fans, JamBase connects music fans with the music they love and empowers them to go see live music. Originally founded by fans for fans, JamBase quickly evolved to serve not only devoted music enthusiasts, but all lovers of live music, while providing a platform for musicians to be discovered. Today, more than a half million people participate in the JamBase community every month and rely on the website to find the most accurate show listings available for 220,000 artists across 50 genres, performing in 139,000 venues worldwide. JamBase is headquartered in San Francisco and on the Web at http://www.jambase.com

JamBase API Call

API Docs can be found here

# This page is the link to their API
url <- "https://www.jambase.com/jb-api/v1/events"

# sets date variables - grabbing the next 30 days of concerts
current_date <- Sys.Date()
future_date <- Sys.Date() + 30

# this list includes the venues I'm interested in.  There are a ton more, but I went with these to start.
included_venues <- "Bottom%Lounge|Byline%Bank%Aragon%Ballroom|Chop%Shop|Concord%Music%Hall|House%of%Blues|Lincoln%Hall|Martyrs|Metro|Park%West|Radius|Schubas|The%Riviera%Theatre|The%Salt%Shed|The%Vic%Theatre"

# gets the number of shows for the future date
countString <- list(apikey = "e9bc7d9d-0946-43ec-a95e-b8f19cbbb6b1",
                    geoCityId ="jambase:4230765", # this is the Chicago geoCityId
                    venueName = included_venues,
                    eventDateFrom = current_date,
                    eventDateTo = future_date
                    )
# call to the API
countResponse <- VERB("GET",
                      url,
                      query = countString,
                      content_type("application/octet-stream"),
                      accept("application/json"))

# the parsed results
countContent <- content(countResponse,"parsed")

# provides the number of shows at the selected venues in the next 30 days.  Used later in the code.
numShows <- countContent$pagination$totalItems
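
The API key in the query list is written directly into the script. One possible alternative, shown here only as a sketch, is to read it from an environment variable. The helper `get_api_key()` and the variable name `JAMBASE_API_KEY` are hypothetical names chosen for illustration; neither comes from JamBase or httr.

```r
# Hypothetical helper: reads a key from an environment variable and stops if it is missing.
# "JAMBASE_API_KEY" is an assumed name, not something JamBase or httr defines.
get_api_key <- function(var = "JAMBASE_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set.", call. = FALSE)
  }
  key
}

# Quick demonstration with a placeholder value (a real key could live in ~/.Renviron)
Sys.setenv(JAMBASE_API_KEY = "placeholder-key")
demo_params <- list(apikey = get_api_key(), geoCityId = "jambase:4230765")
```

The same pattern would apply to the Spotify client ID and secret, which the Spotify packages already read from `Sys.setenv()` values.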

Generates Upcoming Concerts

# creates an empty list to store the results
all_results <- list()

# Defines the common parameters.
common_params <- list(
  apikey = "e9bc7d9d-0946-43ec-a95e-b8f19cbbb6b1",
  geoCityId = "jambase:4230765",
  perPage = 100,
  venueName = included_venues,
  eventDateFrom = current_date,
  eventDateTo = future_date,
  expandExternalIdentifiers = "true"
)

# Specifies the number of pages to fetch.  Only 100 shows can be populated at a time, so the value here is used for the loop below.
num_pages <- ceiling(numShows/100)

# Loops through each page
for (page in seq_len(num_pages)) {
  # establishes the page parameter
  query_params <- c(common_params, page = page)
  
  # Sends the API request
  response <- VERB("GET", url, query = query_params, content_type("application/octet-stream"), accept("application/json"))
  
  # Parses and stores the data
  all_results[[page]] <- content(response, "parsed")
}
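
The page arithmetic can be checked without calling the API. This is a small base R sketch; `pages_needed()` is a hypothetical helper name, and the totals are made-up numbers rather than real JamBase counts. It also shows why `seq_len()` is a safer way to build the page sequence than `1:n` when there are no shows at all.

```r
# Hypothetical helper: the page numbers needed for a given total and page size
pages_needed <- function(total_items, per_page = 100) {
  seq_len(ceiling(total_items / per_page))
}

pages_237 <- pages_needed(237)  # three pages: 100 + 100 + 37 shows
pages_100 <- pages_needed(100)  # exactly one full page
pages_0   <- pages_needed(0)    # empty sequence, so a for loop would not run

# By contrast, the colon operator counts down when the upper bound is 0
colon_0 <- 1:ceiling(0 / 100)   # gives 1 0, so a loop would run twice
```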

Extracts Concert Data

This code will extract the below:
* Date of Concert
* Venue
* Artist/Artists
* Genre/Genres
* Spotify ID

# Creates empty vectors to store the data
endDates <- character(0)
locationNames <- character(0)
performerNames <- list()  
genreNames <- list()
spotifyIdentifiers <- list()  

# Loops through all_results
for (result in all_results) {
  # Accesses events
  events <- result$events
  
  # Loops through each event
  for (event in events) {
    
    # Checks if eventStatus is 'scheduled' - some of them might be canceled
    if (event$eventStatus == 'scheduled') {
      
      # Extracts endDate, which is the date of the show
      endDate <- event$endDate
      endDates <- c(endDates, endDate)
      
      # Extracts location name - aka the venue
      locationName <- event$location$name
      locationNames <- c(locationNames, locationName)
      
      # Extract performer names and genres - artist names and their genres
      performers <- event$performer
      
      # Initialize lists to store performer names, genres, and Spotify identifiers for this event
      performerNamesList <- character(0)
      genresListEvent <- list()
      spotifyIdentifiersList <- list()  # Initialize a list for Spotify identifiers
      
      # Most shows will have more than one band/artist, so this loops through each performer.
      for (performer in performers) {
        performerName <- performer$name
        performerNamesList <- c(performerNamesList, performerName)
        
        # Most artists will have more than one genre, so this stores each performer's full list of genres.
        genres <- performer$genre
        genresListEvent <- c(genresListEvent, list(genres))
        
        # This extracts the Spotify Identifier - some performers won't have one listed.
        if (!is.null(performer$`x-externalIdentifiers`)) {
          # There are multiple externalIdentifiers for each band; this finds the one whose source is Spotify
          spotify_entry <- lapply(performer$`x-externalIdentifiers`, function(entry) {
            if (entry$source == "spotify") {
              return(entry$identifier[[1]])
            }
          })
          # This removes the null entries
          spotify_entry <- Filter(function(x) !is.null(x), spotify_entry)
          
          # this updates the spotify Identifiers list.
          spotifyIdentifiersList <- c(spotifyIdentifiersList, list(spotify_entry))
        } else {
          # If there are no externalIdentifiers for a performer, an empty placeholder is added so the list stays aligned with the performer names.
          spotifyIdentifiersList <- c(spotifyIdentifiersList, list(list()))
        }
      }
      
      # Stores the lists of performer names, genres, and Spotify identifiers for each event
      performerNames <- c(performerNames, list(performerNamesList))
      genreNames <- c(genreNames, list(genresListEvent))
      spotifyIdentifiers <- c(spotifyIdentifiers, list(spotifyIdentifiersList))
    }
  }
}
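
When parallel lists are built up inside a loop like this, each list needs exactly one entry per performer so that the columns line up when they are unnested later. A small base R illustration of one thing to watch for: `c()` with an empty `list()` adds nothing, while wrapping the empty list keeps a slot. The values here are invented for the example.

```r
# One performer already has an ID stored
ids <- list("id_for_band_a")

# Appending an empty list adds no element, so the length stays at 1
no_placeholder <- c(ids, list())

# Wrapping the empty list adds one (empty) element, so the length becomes 2
with_placeholder <- c(ids, list(list()))

n_without <- length(no_placeholder)
n_with <- length(with_placeholder)
```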

# Combines the extracted data into a data frame
extractedData <- tibble(
  endDate = endDates,
  locationName = locationNames,
  performerName = performerNames,
  genre = genreNames,
  spotifyIdentifier = spotifyIdentifiers
)

extractedData1 <- extractedData %>%
  # unnests the artist, genre, and spotify ID
  unnest(c(performerName,genre,spotifyIdentifier),keep_empty = TRUE) %>%
  
  # further unnesting is needed for genre and spotify ID
  unnest(c(genre,spotifyIdentifier),keep_empty = TRUE) %>%
  
  # this checks the length of the value in genre and if it's greater than 0, it extracts the genre
  mutate(genre = sapply(genre, function(x) if (length(x) > 0) x[[1]] else NA_character_)) %>%
  
  # same as above, except it does it for Spotify ID
  mutate(spotifyIdentifier = sapply(spotifyIdentifier, function(x) if (length(x) > 0) x[[1]] else NA_character_))
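
To see what `keep_empty = TRUE` does in `unnest()`, here is a toy example with invented events, not real JamBase data. The third event has no performers, and it only survives the unnesting when `keep_empty = TRUE` is set.

```r
library(tibble)
library(tidyr)

# Toy data: the third event has an empty performer list
toy_events <- tibble(
  endDate = c("2024-06-01", "2024-06-02", "2024-06-03"),
  performerName = list(c("Band A", "Band B"), "Band C", character(0))
)

# Keeps the empty event as a single row with a missing performer
with_empty <- unnest(toy_events, performerName, keep_empty = TRUE)

# Default behaviour: the empty event is dropped
without_empty <- unnest(toy_events, performerName)
```

In the toy data, `with_empty` has one more row than `without_empty`, and that extra row has `NA` as the performer.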

Spotify API

I opted to use the tinyspotifyr package instead of spotifyr, as I found it to be a bit simpler to manipulate the data. Documentation can be found here:

Sets up the Environment

You’ll need to access the Spotify for Developers site and create a client ID and client secret.

# Sys.setenv(SPOTIFY_CLIENT_ID = '')
# Sys.setenv(SPOTIFY_CLIENT_SECRET = '')
# 
# access_token <- get_spotify_access_token()

Missing Spotify IDs

Gets the artists from the JamBase API call whose Spotify ID is missing

missing_ids <- extractedData1 %>%
  filter(is.na(spotifyIdentifier))

Temporarily loads spotifyr. I had issues using search_spotify() in tinyspotifyr. It gets detached at the end of the code.

library(spotifyr)

Attempts to locate the missing Spotify IDs

# Creates an empty dataframe to store the results (same columns as the search output kept below)
results_df <- data.frame(
  name = character(0),
  popularity = integer(0),
  followers.total = integer(0),
  id = character(0),
  external_urls.spotify = character(0),
  stringsAsFactors = FALSE
)

# Iterates through each distinct artist name in missing_ids
for (artist_name in unique(missing_ids$performerName)) {
  
  # Runs the Spotify search for the missing ID
  search_results <- spotifyr::search_spotify(
    q = artist_name,
    type = 'artist',
    market = 'US'
  )
  
  # Checks if the search results are empty
  if (nrow(search_results) == 0) {
    # If there are no results, skip to the next iteration
    next
  }
  
  # Filters the results to include only exact name matches with followers
  filtered_results <- search_results %>% 
    filter(name == artist_name & followers.total > 0) %>%
    select(name, popularity, followers.total, id, external_urls.spotify)
  
  # Keeps the match with the most followers.  It's not fail safe, but the best thing I could think of.
  max_followers <- filtered_results %>%
    slice_max(order_by = followers.total, n = 1, with_ties = FALSE)
  
  # Appends the results to the results dataframe
  results_df <- rbind(results_df, max_followers)
}
detach("package:spotifyr", unload = TRUE)
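
The matching rule (exact name, at least one follower, most followers wins) can be tried on a toy search result. The column names mirror the ones used above, but the rows are invented and no Spotify call is made.

```r
library(dplyr)
library(tibble)

# Invented search results: two artists share a name, plus a near match
toy_search <- tibble(
  name = c("Metro Band", "Metro Band", "Metro Band Tribute"),
  followers.total = c(120L, 45000L, 900L),
  id = c("id_small", "id_big", "id_tribute")
)

# Exact name match with followers, then the single row with the most followers
best_match <- toy_search %>%
  filter(name == "Metro Band", followers.total > 0) %>%
  slice_max(order_by = followers.total, n = 1, with_ties = FALSE)
```

Here `best_match` holds the `id_big` row. As noted in the code comment, picking by followers is a heuristic and can still select the wrong artist when a smaller act shares a name with a bigger one.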

Creates Dataframe

final_df <- extractedData1 %>%
  left_join(results_df, by = c("performerName" = "name")) %>%
  
  # replaces the missing IDs in the JamBase API with the collected IDs from the Spotify API
  mutate(Spotify = coalesce(id, spotifyIdentifier)) %>%
  filter(!is.na(Spotify)) %>%
  select(c(1:4, spotifyIdentifier = Spotify))
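
For readers unfamiliar with `coalesce()`, it takes the first non-missing value at each position. A tiny sketch with invented IDs, where `searched_id` stands in for the `id` column from the search and `jambase_id` for `spotifyIdentifier`:

```r
library(dplyr)

# Invented IDs: the search only fills in what JamBase was missing
jambase_id  <- c("abc", NA, NA)
searched_id <- c(NA, "xyz", NA)

# First non-missing value wins at each position
combined <- coalesce(searched_id, jambase_id)
```

The third artist has no ID from either source, so it stays `NA` and would be removed by the `filter(!is.na(Spotify))` step.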

Top 5 Tracks

  • This generates the top 5 tracks for each artist
  • Ties are included, so some may have more than 5
  • More or fewer than 5 can be selected

# Initialize an empty data frame to store the top tracks
top_tracks_df <- data.frame(
  artist_name = character(),
  artist_id = character(),
  duration_ms = integer(),
  id = character(),
  name = character(),
  popularity = integer(),
  stringsAsFactors = FALSE
)

# removes duplicate artists - it's possible that an artist might play at two different venues in the next 30 days, but that step is accounted for later.
final_df_deduped <- final_df %>%
  distinct(performerName, .keep_all = TRUE)

# Iterates through each artist's Spotify ID
for (spotify_id in final_df_deduped$spotifyIdentifier) {
  if (!is.na(spotify_id)) {
    
    # Gets the artist details and the top tracks for each artist
    artist_info <- spotifyr::get_artist(spotify_id)
    top_tracks <- spotifyr::get_artist_top_tracks(spotify_id)
    
    # Checks if top_tracks is empty
    if (length(top_tracks) == 0) {
      top_tracks <- data.frame(
        artist_name = artist_info$name,
        artist_id = spotify_id,
        duration_ms = NA_integer_,
        id = NA_character_,
        name = NA_character_,
        popularity = NA_integer_,
        stringsAsFactors = FALSE
      )
    } else {
      top_tracks <- top_tracks %>%
        mutate(artist_name = artist_info$name) %>%
        mutate(artist_id = spotify_id) %>%
        select(artist_name, artist_id, duration_ms, id, name, popularity) %>%
        # keeps the 5 most popular tracks; ties are kept, so more than 5 rows are possible
        slice_max(popularity, n = 5)
    }
    
    # Adds the top tracks to the data frame
    top_tracks_df <- bind_rows(top_tracks_df, top_tracks)
  }
}
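
The note about ties can be illustrated with `slice_max()` on invented popularity scores (not real Spotify values). With the default `with_ties = TRUE`, a tie at the cutoff brings in an extra row.

```r
library(dplyr)
library(tibble)

# Seven invented tracks; two tracks tie at the fifth-place score of 60
toy_tracks <- tibble(
  name = paste("Track", 1:7),
  popularity = c(90L, 80L, 80L, 70L, 60L, 60L, 50L)
)

# Ties kept (the default): both tracks scoring 60 are included
top_with_ties <- slice_max(toy_tracks, popularity, n = 5)

# Ties dropped: exactly five rows
top_no_ties <- slice_max(toy_tracks, popularity, n = 5, with_ties = FALSE)
```

Changing `n` is all that is needed to select more or fewer tracks per artist.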

Spotify Playlist Creation

Creates Dataframe

final_df1 <- right_join(final_df, top_tracks_df, by = c("spotifyIdentifier" = "artist_id")) %>%
  select(venue = locationName, date = endDate, artist = performerName, artist_id = spotifyIdentifier,song = name, song_id = id, genre,duration_ms, popularity)

Sets up a New Environment

I ran into some capacity limitations with my current client ID and secret, so I generated a new pair.

# Sys.setenv(SPOTIFY_CLIENT_ID = '')
# Sys.setenv(SPOTIFY_CLIENT_SECRET = '')
# 
# access_token <- get_spotify_access_token()

Updates the Playlists

  • I have additional code to create the initial playlist, but since these already exist, I just need to update them
  • This code is from the README.rmd file from the creator of the package

# final_df1 is the input for the steps below
df <- final_df1

# Gets the distinct venues, artists, and songs
song_loop_df <- df %>% distinct(venue,artist,song, .keep_all = TRUE)

# generates a list of my current playlists - It's capped at 50.  If you have more than 50, a loop may need to be used to ensure that they're found.
my_playlists <- get_my_playlists(limit = 50)

# saves the old counts of songs on each playlist.  More of a test to make sure the code works correctly.
old_counts <- my_playlists %>% select(name, tracks.total) %>% filter(grepl('Next30',name))

# sets venue and date variables
unique_venue <- unique(song_loop_df$venue)
date_range1 <- Sys.Date()
date_range2 <- Sys.Date() + 30

# the loop that creates a playlist for each venue based on the artists with shows in the next 30 days.
for (uv in unique_venue) {
  
  # defines the playlist name
  playlist_name <- paste('Next30:',uv, sep = " ")  # the name of the venue
  
  # Logical to get the playlist that matches
  playlist_logical <- (my_playlists$name == playlist_name)
  
  # Other variables, eventually landing on the playlist_id
  ind <- which(playlist_logical)
  dr <- my_playlists[ind, ]
  playlist_id <- sub("spotify:playlist:", "", dr$uri)

  # Number of tracks currently on the playlist
  old_playlist_tracks <- old_counts$tracks.total[old_counts$name == playlist_name]
  
  # batches of 100 are the max the function can handle.  The batch size is used below
  batch_size <- 100
  
  # Calculates the number of iterations needed for the loop below
  num_iterations <- ceiling(old_playlist_tracks / batch_size)
  
  # creates an empty vector
  old_playlist_track_ids <- character(0)
  
  # Loops through the iterations; seq_len() skips the loop when the playlist is empty
  for (i in seq_len(num_iterations)) {
    songs <- get_playlist_tracks(playlist_id, offset = (i - 1) * batch_size)
    track_ids <- paste("spotify:track:", songs$track.id, sep = "")
    
    # Adds the track URIs from the current batch to the 'old_playlist_track_ids' vector
    old_playlist_track_ids <- c(old_playlist_track_ids, track_ids)
  }

  # gets updated tracks for the playlist
  new_playlist_track_ids <- song_loop_df %>% 
    filter(venue %in% uv) %>% 
    select(song_id) %>% 
    mutate(song_id = paste("spotify:track:",song_id,sep="")) %>%
    unlist()
  
  # identifies the songs that are different
  old_songs <- setdiff(old_playlist_track_ids,new_playlist_track_ids)
  new_songs <- setdiff(new_playlist_track_ids,old_playlist_track_ids)
    
  # Adds new songs
  if (length(new_songs) == 0) {
    
    # Skips the code if there are no new songs to be added.
    cat("No new songs to add.\n")
    } else {
    
    # Calculates the batch size
    batch_size <- 100
    
    # Add items to the playlist in batches if needed
    for (i in seq(1, length(new_songs), by = batch_size)) {
      batch <- new_songs[i:min(i + batch_size - 1, length(new_songs))]
      add_items_to_playlist(playlist_id, batch)
    }
  }

  # removes the old songs
  if (length(old_songs) == 0) {
   
    # Skips the code if there are no old songs to be removed.
    cat("No old songs to remove.\n")
    } else {
    
    # Calculates the batch size
    batch_size <- 100
    
    # Removes items from the playlist in batches if needed
    for (i in seq(1, length(old_songs), by = batch_size)) {
      batch <- old_songs[i:min(i + batch_size - 1, length(old_songs))]
      tinyspotifyr::remove_tracks_from_playlist(playlist_id, batch)
    }
  }
  
  # updates the playlist details
  change_playlist_details(playlist_id,
                          description = paste('Top songs on Spotify for bands playing at',
                                              uv,
                                              'between',
                                              format(date_range1, "%b %d"),
                                              '-',
                                              format(date_range2, "%b %d"), sep = ' '))
}
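
The update logic in the loop comes down to two set differences plus batching. Here is a base R sketch with invented track URIs and no Spotify calls; `make_batches()` is a hypothetical helper written for this example, not a function from tinyspotifyr.

```r
# Invented track URIs for one playlist
old_ids <- c("spotify:track:a", "spotify:track:b", "spotify:track:c")
new_ids <- c("spotify:track:b", "spotify:track:c", "spotify:track:d", "spotify:track:e")

# On the playlist but no longer wanted
to_remove <- setdiff(old_ids, new_ids)

# Wanted but not yet on the playlist
to_add <- setdiff(new_ids, old_ids)

# Hypothetical helper: splits a vector into chunks of at most batch_size items
make_batches <- function(x, batch_size = 100) {
  split(x, ceiling(seq_along(x) / batch_size))
}

# 250 made-up URIs split into chunks of 100, 100, and 50
many_ids <- paste0("spotify:track:", seq_len(250))
batches <- make_batches(many_ids)
batch_sizes <- lengths(batches)
```

Each element of `batches` could then be passed to the add or remove call in turn, which is what the `seq(1, length(...), by = batch_size)` loops do with index ranges.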

# gets the playlist names again
my_playlists <- get_my_playlists(limit = 50)

# creates the new counts for each playlist
new_counts <- my_playlists %>% select(name, tracks.total) %>% filter(grepl('Next30',name))

# compares old vs new to look for anything out of the ordinary.  There shouldn't be too many changes day over day unless one of the venues has a festival/all day concert.
old_vs_new <- left_join(old_counts, new_counts, by = 'name', keep = TRUE)
old_vs_new