General Approach

My general approach to song-listen modeling is multiplicative. That is I’m going to start with a vector that lasts 16 weeks in which the song is listened to the same number of times every week. Then I am going to MULTIPY that vector by other vectors that are centered on 1.0, because multiplying by 1 will have no effect. The other vectors are:

  1. a declining vector, from 1 to 0, so the listens fade out
  2. a trend vector, which multiplies song listens in certain weeks
  3. a noise vector, so it doesn’t all look too smooth

Constants

Here are constants that drive the game simulation

library(tidyverse)
MAX_DURATION <- 16 ## maximum number of weeks a song lasts
#how many songs to make
N_SONGS <- 6
SONG_NAMES <- c("Rockin","Rollin","Hippin","Hoppen","Jazz","DeFunk")
#If true, the song responds to the trend
SONG_ON_TREND <- c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE)
#how many listens each starts at
LISTENS_START_AT <- c(40,40,60,25,20,30)
#how long the duration of each song is, in weeks
SONG_DURATIONS <- c(12,5,15,16,3,10)

Generate Decline

Function to generate a linear decline from 1 to zero over a certain number of weeks, staying at zero until a max number of weeks. Later we can deal with a curving decline. The graph shows the generic decline

generate_decline <- function(last_week,max_weeks) {
  #first make a sequence that declines from a first value to zero
  #we make it one too long and then clip the last value, which is zero
  #this gives us all non-zero values
  result <- seq(1,0,length.out=last_week+1)[c(1:last_week)]
  #then if its too short, we add zeros to the end
  if (last_week < max_weeks) result <- c(result,rep(0,length.out=max_weeks-last_week))
  result
}
linear_decline <- generate_decline(5,10)
plot(linear_decline,type="l")

Trend Boost

A trend boost is the boost a song gets if it is “on trend” in a given week. Eventually there could be different trends; for now, just one. For now, I’m going to say the trend kicks in week 6 and lasts 4 weeks – and it triples the listens for songs that are on trend. It is just a vector of 16 numbers with 1.0 for the normal weeks and 3.0 for the trend weeks.

trend_boost <- c(rep(1,length.out=5),
                 rep(3,length.out=4),
                 rep(1,length.out=7))
barplot(trend_boost)

Generate a data frame with song listen data

# our data frame has columns for the song name, what week it is, listens and cumulative listens
song_listens_df = data.frame(song_name=character(),
                             week=integer(),
                             listens = integer(), 
                             cumulative_listens=integer())
# populate the song listens data frame
for(song in 1:N_SONGS) {
  
 # all songs start out about the same (randomness will make them vary a bit)
 start_at <- LISTENS_START_AT[song]
 
 # randomness is a vector with a random multiplier for each week
 randomness <- rnorm(MAX_DURATION,mean=1,sd=.25)
 
 # this is a vector of declining values
 declining <- generate_decline(SONG_DURATIONS[song],MAX_DURATION)
 
 # of vector of listens multiplies a flat line at the start value by randomness by declining
 listens <- start_at * randomness * declining
 
 # add the effect of the trend, if the song is effected by the trend
 if (SONG_ON_TREND[song]) listens <- trend_boost * listens
 
 #make these into integers
 listens <- floor (listens)
                      
 # we use this built-in R function, cumsum, to accumulate the listens
 cum_listens <- cumsum(listens)
 #ok, now we can make a dataframe for this song, and add it to our overall dataframe
 new_df = data.frame(song_name = SONG_NAMES[song],
                      week = 1:MAX_DURATION,
                      listens = listens,
                      cumulative_listens = cum_listens)
 song_listens_df <- rbind(song_listens_df,new_df)
}
song_listens_df

plot weekly listens

First, I’ll plot it the way Matthew did, but I have to say I like the column plot with the smoothed function better! Can you see which songs are getting a boost in week 6?

ggplot(data=song_listens_df,aes(x=week,y=listens,color=song_name)) +
        geom_line() + 
        coord_cartesian(ylim=c(0,100)) +
        facet_wrap( ~ song_name)

ggplot(data=song_listens_df,aes(x=week,y=listens,fill=song_name)) +
        geom_col() + geom_smooth(fill=NA,color="dimgray",method="loess") +
        coord_cartesian(ylim=c(0,100)) +
        facet_wrap( ~ song_name)

plot cumuluative listens

Can you see which songs are getting a boost in week 6? Look for the steeper slopes from week 6 to 10.

ggplot(data=song_listens_df,aes(x=week,y=cumulative_listens,color=song_name)) +
        geom_line(size=1) +
        coord_cartesian(ylim=c(0,800))

Showing where the boost is

Does graphing where the trend boost is at the bottom help? Note only two songs respond to the boost, though it shows in every plot. Who is responding?

trend_df <- data.frame(week=1:16,trend=trend_boost-1)
ggplot() +
        geom_col(data=song_listens_df,aes(x=week,y=listens,fill=song_name)) + 
        geom_smooth(data=song_listens_df,aes(x=week,y=listens,fill=song_name),
                    fill=NA,color="dimgray",method="loess") +
         geom_col(data=trend_df,aes(x=week,y=trend)) +
        coord_cartesian(ylim=c(0,80)) +
        facet_wrap( ~ song_name)

ggplot() +
        geom_line(data=song_listens_df,
                  aes(x=week,y=cumulative_listens,color=song_name),size = 1) +
        geom_col(data=trend_df,aes(x=week,y=trend)) +
        coord_cartesian(ylim=c(0,800))

And the answer is….

The songs that respond to the boost are Rockin and Hoppen. Hippen looks like it does (steep slope in the relevant weeks, but its just a song that’s doing well over all). Jazz is the world’s best music (that no one listens to). Rollin burned out quickly; DeFunk started out the same as jazz, but it lasted a lot longer in the market.

---
title: "Music Studio Game -- Simulated Data"
output: html_notebook
---

# General Approach
My general approach to song-listen modeling is multiplicative. That is I'm going to start with a vector that lasts 16 weeks in which the song is listened to the same number of times every week. Then I am going to MULTIPY that vector by other vectors that are centered on 1.0, because multiplying by 1 will have no effect. The other vectors are:


1. a declining vector, from 1 to 0, so the listens fade out
2. a trend vector, which multiplies song listens in certain weeks
3. a noise vector, so it doesn't all look too smooth

# Constants
Here are constants that drive the game simulation

```{r}
library(tidyverse)


# maximum number of weeks a song lasts, e.g. the length of the simulation in weeks
MAX_DURATION <- 16 

#how many songs to make
N_SONGS <- 6

#name to use for the songs
SONG_NAMES <- c("Rockin","Rollin","Hippin","Hoppen","Jazz","DeFunk")

#If true, the song responds to the trend
#So Rockin and Hoppen should respond
SONG_ON_TREND <- c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE)


#how many listens each song starts at, in week one
LISTENS_START_AT <- c(40,40,60,25,20,30)

#how long the duration of each song is, in weeks
SONG_DURATIONS <- c(12,5,15,16,3,10)

```


# Generate Decline 

Function to generate a linear decline from 1 to zero over a certain number of weeks, staying at zero until a max number of weeks. Later we can deal with a curving decline. The graph shows the generic decline

```{r}
generate_decline <- function(last_week,max_weeks) {
  #first make a sequence that declines from a first value to zero
  #we make it one too long and then clip the last value, which is zero
  #this gives us all non-zero values
  result <- seq(1,0,length.out=last_week+1)[c(1:last_week)]
  #then if its too short, we add zeros to the end
  if (last_week < max_weeks) result <- c(result,rep(0,length.out=max_weeks-last_week))
  result
}

#this function will pick a random duration for our song, in weeks
RandDuration <- function() {
  round(runif(1,MIN_DURATION,MAX_DURATION))
}

linear_decline <- generate_decline(5,10)
plot(linear_decline,type="l")
```

# Trend Boost

A trend boost is the boost a song gets if it is "on trend" in  a given week. Eventually there could be different trends; for now, just one. For now, I'm going to say the trend kicks in week 6 and lasts 4 weeks -- and it triples the listens for songs that are on trend. It is just a vector of 16 numbers with 1.0 for the normal weeks and 3.0 for the trend weeks.

```{r}
trend_boost <- c(rep(1,length.out=5),
                 rep(3,length.out=4),
                 rep(1,length.out=7))
barplot(trend_boost)
```




# Generate a data frame with song listen data
# 

```{r}
# our data frame has columns for the song name, what week it is, listens and cumulative listens
song_listens_df = data.frame(song_name=character(),
                             week=integer(),
                             listens = integer(), 
                             cumulative_listens=integer())



# populate the song listens data frame. The for loop makes n songs.
for(song in 1:N_SONGS) {
  
 # all songs start out about the same in terms of listens
 start_at <- LISTENS_START_AT[song]
 
 # randomness is a vector with a random multiplier for each week
 randomness <- rnorm(MAX_DURATION,mean=1,sd=.25)
 
 # this is a vector of declining values
 declining <- generate_decline(SONG_DURATIONS[song],MAX_DURATION)
 
 # our vector of listens multiplies a flat initial value by randomness by declining
 listens <- start_at * randomness * declining
 
 # add the effect of the trend, if the song is effected by the trend, again multiplying
 if (SONG_ON_TREND[song]) listens <- trend_boost * listens
 
 #make these into integers
 listens <- floor (listens)
                      
 # we use this built-in R function, cumsum, to accumulate the listens
 cum_listens <- cumsum(listens)

 #ok, now we can make a dataframe for this song, and add it to our overall dataframe
 new_df = data.frame(song_name = SONG_NAMES[song],
                      week = 1:MAX_DURATION,
                      listens = listens,
                      cumulative_listens = cum_listens)
 
 song_listens_df <- rbind(song_listens_df,new_df)
 
}

#this is the resulting n-song data frame
song_listens_df
```


#plot weekly listens

First, I'll plot it the way Matthew did, but I have to say I like the column plot with the smoothed function better! Can you see which songs are getting a boost in week 6?

```{r}
ggplot(data=song_listens_df,aes(x=week,y=listens,color=song_name)) +
        geom_line() + 
        coord_cartesian(ylim=c(0,100)) +
        facet_wrap( ~ song_name)
```


```{r}

ggplot(data=song_listens_df,aes(x=week,y=listens,fill=song_name)) +
        geom_col() + 
        geom_smooth(fill=NA,color="dimgray",method="loess") +
        coord_cartesian(ylim=c(0,100)) +
        facet_wrap( ~ song_name)
```


#plot cumuluative listens

Can you see which songs are getting a boost in week 6? Look for the steeper slopes from week 6 to 10.

```{r}
ggplot(data=song_listens_df,aes(x=week,y=cumulative_listens,color=song_name)) +
        geom_line(size=1) +
        coord_cartesian(ylim=c(0,800))
```
# Showing where the boost is

Does graphing where the trend boost is at the bottom help? Note only two songs respond to the boost, though it shows in every plot. Who is responding?


```{r}

trend_df <- data.frame(week=1:16,trend=trend_boost-1)

ggplot() +
        geom_col(data=song_listens_df,aes(x=week,y=listens,fill=song_name)) + 
        geom_smooth(data=song_listens_df,aes(x=week,y=listens,fill=song_name),
                    fill=NA,color="dimgray",method="loess") +
         geom_col(data=trend_df,aes(x=week,y=trend)) +
        coord_cartesian(ylim=c(0,80)) +
        facet_wrap( ~ song_name)
```

```{r}


ggplot() +
        geom_line(data=song_listens_df,
                  aes(x=week,y=cumulative_listens,color=song_name),size = 1) +
        geom_col(data=trend_df,aes(x=week,y=trend)) +
        coord_cartesian(ylim=c(0,800))
```

#And the answer is....

The songs that respond to the boost are Rockin and Hoppen. Hippen looks like it does (steep slope in the relevant weeks, but its just a song that's doing well over all). Jazz is the world's best music (that no one listens to). Rollin burned out quickly; DeFunk started out the same as jazz, but it lasted a lot longer in the market.


