Introduction

This document explores what the students' experience of “releasing songs” might be like. It adopts the conceit that all songs are released on the same day, so you can go back in time and release more songs on that day. More on this later; I'm not yet making an argument for it, I just didn't have time to code a simulation with different release dates.

What is modeled here:

  1. There are three genres: pop, rock, and jazz. Each has a different value for its first-week listens and its duration in weeks. Noise is then added, so the individual songs vary.

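  2. Songs can be “on trend” or not. As it stands, a trend kicks in on week 6 and lasts for 4 weeks, and it favors “sad” songs, so sad songs get a boost during that time. (In the code below this is just a TRUE/FALSE flag passed in when the song is made; the mood itself isn't used yet.)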

  3. There are two types of plots: one with individual songs and another that averages by genre. The code for averaging by genre doesn't seem to be working right, and I haven't had time to fix it. But at least you can look at the graph.

Calling the function make_song_data is like building and releasing a song.

Just to be clear, I don't really care if we end up using my model or not. I just want to get to a place where I believe we can create semi-random songs with discernible patterns to them, based on their characteristics.

Constants

This next section sets up the constants used in the simulations. There are three genres of songs: POP, ROCK, and JAZZ. Each starts with a different number of listens in week 1: 50 for POP, 25 for ROCK, and 20 for JAZZ. Each also has a duration, the number of weeks it lasts, set to 8, 12, and 4 weeks respectively. The simulation can easily be modified by changing these parameters.

POP <- "pop"
JAZZ <- "jazz"
ROCK <- "rock"
HAPPY <- "happy"
SAD  <- "sad"
name_vector <- c(POP,ROCK,JAZZ)
starts <- c(50, 25, 20)
names(starts) <- name_vector
durations <- c(8,12,4)
names(durations) <- name_vector

Simulation set up

The code in here is based on my first file, so if you want to understand how it works, go there. Otherwise skip over this.

library(tidyverse)
# maximum number of weeks a song lasts, i.e. the length of the simulation in weeks
MAX_DURATION <- 16 
# standard deviation, around a mean value of 1, for the amount of noise
# .25 seems to be a pretty good value
RANDOMNESS_FACTOR <- .25
# creates a declining slope to 0
generate_decline <- function(last_week,max_weeks) {
  # initialize to all zeros
  result <- rep(0,max_weeks) 
  # add the declining sequence, needs to be one longer, because the decline includes zero
  result[1:(last_week+1)] <- seq(1,0,length.out=(last_week+1))
  result
}
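# trend boost vector, starting in the 6th week and lasting four weeks
trend_boost <- c(rep(1,length.out=5),
                 rep(3,length.out=4),
                 rep(1,length.out=MAX_DURATION - 9))
# trend_df is used by the plotting functions below but is not defined anywhere in this file.
# ASSUMPTION: one row per week, with the boost multiplier as the height of the trend bar.
trend_df <- data.frame(week = 1:MAX_DURATION, trend = trend_boost)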
clear_song_data <- function() {
  data.frame(song_name=character(),
             song_trend=logical(),
             song_genre=character(),
             week=integer(),
             listens = integer(), 
             cumulative_listens=integer())
}
make_song_data <- function (name,genre,trend) {
  
  # retrieve parameters from the vectors, based on the genre
  start_at <- starts[genre]
  duration <- durations[genre]

  # randomness is a vector with a random multiplier for each week
  randomness <- rnorm(MAX_DURATION,mean=1,sd=RANDOMNESS_FACTOR)

  # this is a vector of declining values
  declining <- generate_decline(duration,MAX_DURATION)

  # our vector of listens multiplies a flat initial value by randomness by declining
  listens <- start_at * randomness * declining

  # add the effect of the trend, if the song is affected by the trend, again multiplying
  if (trend) listens <- trend_boost * listens

  # make these into integers
  listens <- floor(listens)

  # we use this built-in R function, cumsum, to accumulate the listens
  cum_listens <- cumsum(listens)

  # ok, now we can make a data frame for this song; the caller adds it to the overall data frame
  data.frame(song_name = name,
             song_genre = genre,
             song_trend = trend,
             week = 1:MAX_DURATION,
             listens = listens,
             cumulative_listens = cum_listens)
}
plot_songs <- function (song_data) {
  ggplot() +
        geom_col(data=song_data,aes(x=week,y=listens,fill=song_genre)) + 
        geom_smooth(data=song_data,aes(x=week,y=listens,fill=song_name),
                    fill=NA,color="dimgray",method="loess") +
        geom_col(data=trend_df,aes(x=week,y=trend)) +
        coord_cartesian(ylim=c(0,80)) +
        facet_wrap( ~ song_name)
}
 
plot_genre <- function (song_data) {
  # average the listens across all songs of the same genre, week by week
  grouped_data <- song_data %>%
    group_by(song_genre, week) %>%
    summarize(mean_listens = mean(listens))

  ggplot() +
    geom_col(data=grouped_data,aes(x=week,y=mean_listens,fill=song_genre)) +
    geom_smooth(data=grouped_data,aes(x=week,y=mean_listens),
                fill=NA,color="dimgray",method="loess") +
    geom_col(data=trend_df,aes(x=week,y=trend)) +
    coord_cartesian(ylim=c(0,80)) +
    facet_wrap( ~ song_genre)
}
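
To make the shape of the model easier to check by hand, here is a minimal sketch of the same calculation with the noise switched off. The helper name expected_listens is hypothetical and not part of the code above; the constants are copied from the Constants section, and the boost weeks (6 to 9) mirror trend_boost.

```r
# Noise-free version of the listens calculation (a sketch, not the simulation itself).
starts    <- c(pop = 50, rock = 25, jazz = 20)
durations <- c(pop = 8,  rock = 12, jazz = 4)
max_weeks <- 16

# expected_listens is a hypothetical helper introduced for illustration only
expected_listens <- function(genre, trend = FALSE) {
  duration <- durations[[genre]]
  # straight-line decline from 1 down to 0, then zeros for the remaining weeks
  decline <- rep(0, max_weeks)
  decline[1:(duration + 1)] <- seq(1, 0, length.out = duration + 1)
  # multiplier of 3 in weeks 6 to 9 for on-trend songs, 1 otherwise
  boost <- rep(1, max_weeks)
  if (trend) boost[6:9] <- 3
  floor(starts[[genre]] * decline * boost)
}

expected_listens("jazz")
# 20 15 10 5 0 0 0 0 0 0 0 0 0 0 0 0
expected_listens("pop", trend = TRUE)
# 50 43 37 31 25 56 37 18 0 0 0 0 0 0 0 0
```

With the noise off, an on-trend pop song gets more listens in week 6 (56) than in its release week (50). That is the kind of pattern a student would have to spot through the noise.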

Let’s start exploring

First, I’ll make one pop song called “pop yer”, with genre “pop” and no response to the trend, plus one rock song and one jazz song.

Can you see which genre of song tends to have more initial listens? Which lasts longer on the charts?

song_listens_df <- suppressWarnings(
  bind_rows(
      make_song_data("pop yer",POP,FALSE),
      make_song_data("rock on",ROCK,FALSE),
      make_song_data("jazzy jones",JAZZ,FALSE)
  ))
plot_songs(song_listens_df)

An Idea: pop songs are better.

So I’m going to release more pop songs to see whether they all seem to be taller…

song_listens_df <- suppressWarnings(
  bind_rows(
      song_listens_df,
      make_song_data("pop wop",POP,FALSE),
      make_song_data("pop a lot",POP,FALSE),
      make_song_data("pop be gone",POP,FALSE)
  ))
plot_songs(song_listens_df)

Yes, it seems to me that the pop songs all follow a similar pattern, and it looks different from the rock song's.

But just to be sure, let’s plot the average of ALL the pop songs against our rock and jazz songs. Please note that this did not seem to work quite right: I’m not sure why, but the averages for pop should be higher. Anyway, the point is to show a plot by genre.

plot_genre(song_listens_df)

So that last plot above is the AVERAGE of all the POP song listens against the average of all the rock and jazz song listens – but there’s only one rock and one jazz song.
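
One way to tell whether the averages are off in the data or only in the plot is to compute them without ggplot and compare the numbers to the bar heights. The sketch below does this on a tiny hand-made data frame (the toy values and the names toy and genre_means are made up for illustration), using base R's aggregate. It does not diagnose the problem; it only gives an independent number to compare against.

```r
# Toy data: two pop songs and one rock song, two weeks each (made-up values).
toy <- data.frame(
  song_name  = rep(c("a", "b", "c"), each = 2),
  song_genre = rep(c("pop", "pop", "rock"), each = 2),
  week       = rep(1:2, times = 3),
  listens    = c(50, 40, 30, 20, 25, 22)
)

# Mean listens per genre per week, computed with base R only.
genre_means <- aggregate(listens ~ song_genre + week, data = toy, FUN = mean)
print(genre_means)

# Pop in week 1 should be the mean of 50 and 30.
subset(genre_means, song_genre == "pop" & week == 1)$listens
# 40
```

Running the same aggregate call on song_listens_df should give numbers that can be read against the bars in the genre plot.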

Reflections

Based on this experiment, I am thinking we need to be careful about the level of pattern finding we are expecting of students. There’s a lot going on in these graphs, and they aren’t necessarily all that easy to read. Just reading one feature off the graphs might be an accomplishment.

I do somewhat favor using bars for the number of song listens in a week, rather than going immediately to lines. I think a tall bar for a lot of listens may be easier for a kid to make sense of. I find the smoothed lines summarizing each bar very helpful to my interpretation – much better than lines connecting the tops.

I didn’t yet make a simulation that shows songs being released at different times, but you can imagine it by mentally shifting some of the plots to the right. That being said, it may be asking quite a lot of students to make sense of data based on phenomena that start at different points in time. I am wondering whether there should be:

  1. a game context which has a turn-by-turn flow of time, and this is where you earn dollars/followers/scores

  2. but also a simpler context with a simpler flow of time, where you can conduct experiments with less going on. The idea would be that what you learn from 2 might inform 1.
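
The “shift the plot to the right” idea mentioned above could be sketched as a small helper that pads a song's weekly listens with zeros before its release week. The name shift_release and its arguments are hypothetical; nothing like it exists in the code above.

```r
# Shift a vector of weekly listens so the song is released in release_week.
# Weeks before the release get 0 listens; anything pushed past the end of
# the simulation window is cut off.
# shift_release is a hypothetical helper introduced for illustration only.
shift_release <- function(listens, release_week) {
  n <- length(listens)
  stopifnot(release_week >= 1, release_week <= n)
  c(rep(0, release_week - 1), listens)[1:n]
}

shift_release(c(20, 15, 10, 5, 0, 0), release_week = 3)
# 0 0 20 15 10 5
```

If this were used, the trend boost would presumably need to be applied after the shift, since the trend belongs to calendar weeks rather than to weeks since release.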

I also wondered about the complexity of our thoughts about trends. We may need to simplify this so that only certain trends (changing in time during the game) matter, and there are just a few. I was thinking that the properties of genres stay pretty fixed. All the variability would be in response to mood or topic. And I was thinking that we might want to simplify these so that there is one topic and one mood which occasionally is “on trend”, while the other ones are just neutral. Hence, perhaps happy songs always perform about the same, but sad songs periodically get a big boost. I was just getting worried that if everything is potentially changing, it could take a long time to figure it all out.
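
As a rough sketch of the “one mood periodically on trend” idea, the fixed trend boost vector could be generated by a function so the trend recurs every so many weeks. The name make_trend_boost and its parameters are assumptions for illustration; the multiplier of 3 is taken from the existing trend boost.

```r
# Build a vector of weekly multipliers: `multiplier` while a trend is active, 1 otherwise.
# A trend starts at first_week, lasts trend_length weeks, and recurs every `period` weeks.
# make_trend_boost is a hypothetical helper introduced for illustration only.
make_trend_boost <- function(max_weeks, first_week, trend_length, period, multiplier = 3) {
  weeks <- seq_len(max_weeks)
  # position of each week within the repeating cycle, 0 at the start of a trend
  offset <- (weeks - first_week) %% period
  on_trend <- weeks >= first_week & offset < trend_length
  ifelse(on_trend, multiplier, 1)
}

# One trend in weeks 6 to 9, matching the current simulation
make_trend_boost(16, first_week = 6, trend_length = 4, period = 16)
# 1 1 1 1 1 3 3 3 3 1 1 1 1 1 1 1

# The same trend coming back every 8 weeks
make_trend_boost(16, first_week = 6, trend_length = 4, period = 8)
# 1 1 1 1 1 3 3 3 3 1 1 1 1 3 3 3
```

Under this sketch only sad songs would be multiplied by the vector; happy songs would keep a multiplier of 1 throughout.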

