The dataset I used covers songs on Spotify from 1921 to 2020. It contains track information and the scoring system provided by Spotify.
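library(forecast)  # nnetar(), forecast(), accuracy(), autoplot() for ts objects
library(ggplot2)   # ggplot(), facet_wrap(), geom_smooth()
library(reshape2)  # melt(); assumed source of melt(), the original does not say
Songs <- read.csv("data_by_year.csv")
Songsts <- ts(Songs, start = c(1921), frequency = 1)
Songsmelt <- melt(Songs, id.vars = 'year')
ggplot(Songsmelt, aes(x = year, y = value)) + geom_point() + facet_wrap(~variable, scales = "free") + geom_smooth(se = FALSE)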
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Looking at the data, liveness looks interesting to me. As given in the data description, liveness is 'The relative duration of the track sounding as a live performance'. This means that, as time goes on, songs are composed to sound good in live performance.
SongLiveness <- ts(Songs[,8], start = c(1921), frequency = 1)
autoplot(SongLiveness, main='Song Liveness')
acf(SongLiveness)
pacf(SongLiveness)
train.songs<-window(SongLiveness, end=c(2014))
test.songs<-window(SongLiveness,start=c(2015))
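As a quick check of the split, the sketch below applies the same `window()` calls to a placeholder annual series covering 1921 to 2020. The series `y` and its values are made up for illustration; only the date arithmetic carries over to the real data.

```r
# Placeholder annual series, 1921-2020 (values are illustrative, not Spotify data)
y <- ts(seq_len(100), start = 1921, frequency = 1)

# Same split as above: train through 2014, test from 2015 onward
train <- window(y, end = 2014)
test  <- window(y, start = 2015)

length(train)  # 94 yearly observations (1921-2014)
length(test)   # 6 yearly observations (2015-2020)
```

The held-out set is therefore only six years long, so any accuracy measure computed on it rests on very few points.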
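# model1 uses the default hidden-layer size; model2 to model5 set it explicitly
model1 <- nnetar(train.songs, lambda = "auto")
model2 <- nnetar(train.songs, lambda = "auto", size = 2)
model3 <- nnetar(train.songs, lambda = "auto", size = 3)
model4 <- nnetar(train.songs, lambda = "auto", size = 4)
model5 <- nnetar(train.songs, lambda = "auto", size = 5)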
fcast.nnet1=forecast(model1, PI=TRUE, h=10)
fcast.nnet2=forecast(model2, PI=TRUE, h=10)
fcast.nnet3=forecast(model3, PI=TRUE, h=10)
fcast.nnet4=forecast(model4, PI=TRUE, h=10)
fcast.nnet5=forecast(model5, PI=TRUE, h=10)
autoplot(train.songs) +
  autolayer(fcast.nnet1$mean) +
  autolayer(fcast.nnet2$mean) +
  autolayer(fcast.nnet3$mean) +
  autolayer(fcast.nnet4$mean) +
  autolayer(fcast.nnet5$mean)
It is surprising to me that the forecasts all trend upward. This is unexpected, since liveness appears to have been declining over the years.
accuracy(fcast.nnet1, test.songs)
##                         ME       RMSE        MAE           MPE      MAPE
## Training set  0.0009903014 0.01445182 0.01053071  3.601221e-05  5.002212
## Test set     -0.0189616307 0.02052054 0.01896163 -1.061692e+01 10.616924
##                   MASE       ACF1 Theil's U
## Training set 0.9027068 0.01085036        NA
## Test set     1.6254175 0.17290689  2.391484
accuracy(fcast.nnet2, test.songs)
##                         ME       RMSE        MAE          MPE      MAPE
## Training set  0.0009808389 0.01403829 0.01027089   0.01259662  4.875538
## Test set     -0.0189750727 0.02039496 0.01897507 -10.61771943 10.617719
##                   MASE        ACF1 Theil's U
## Training set 0.8804347 -0.01781574        NA
## Test set     1.6265697  0.14066000  2.366077
accuracy(fcast.nnet3, test.songs)
##                         ME       RMSE         MAE         MPE      MAPE
## Training set  0.0009256189 0.01330489 0.009906566   0.0141037  4.700987
## Test set     -0.0185918429 0.02001633 0.018591843 -10.4052049 10.405205
##                   MASE       ACF1 Theil's U
## Training set 0.8492047 -0.0327423        NA
## Test set     1.5937187  0.1340379  2.322602
accuracy(fcast.nnet4, test.songs)
##                         ME       RMSE         MAE          MPE      MAPE
## Training set  0.0009714529 0.01277705 0.009584494   0.06248577  4.542079
## Test set     -0.0188406000 0.02027827 0.018840600 -10.54388282 10.543883
##                   MASE        ACF1 Theil's U
## Training set 0.8215962 -0.02300076        NA
## Test set     1.6150426  0.13411009  2.355198
accuracy(fcast.nnet5, test.songs)
##                         ME       RMSE         MAE          MPE      MAPE
## Training set  0.0008577006 0.01224152 0.009320447   0.03689095  4.416738
## Test set     -0.0181533464 0.01955676 0.018153346 -10.16065676 10.160657
##                   MASE        ACF1 Theil's U
## Training set 0.7989618 -0.06000015        NA
## Test set     1.5561302  0.11055956  2.268502
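The results for the five models are very similar. On test-set RMSE, model5 (size = 5) is lowest at 0.0196, narrowly ahead of model3, NNAR(1,3), at 0.0200. The test-set ME is negative for every model, which means the forecasts sit above the observed values from 2015 onward. Since the differences are so small, I look more closely at model3, NNAR(1,3). It forecasts a slight increase in song liveness over the next 10 years.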
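The comparison can be checked by collecting the test-set RMSE values reported in the `accuracy()` output above and sorting them. This is a minimal sketch in base R; the vector `test_rmse` is a hand-copied summary introduced here for illustration, not an object from the original analysis.

```r
# Test-set RMSE values copied from the accuracy() output above
test_rmse <- c(
  model1 = 0.02052054,
  model2 = 0.02039496,
  model3 = 0.02001633,
  model4 = 0.02027827,
  model5 = 0.01955676
)

# Sort ascending so the lowest-error model comes first
ranked <- sort(test_rmse)
print(ranked)

# Name of the model with the smallest test-set RMSE
best <- names(which.min(test_rmse))
print(best)

# Gap between the largest and smallest test-set RMSE
rmse_spread <- diff(range(test_rmse))
print(rmse_spread)
```

Because `nnetar()` starts from random weights and the test set is short, a gap this small between models may not survive a rerun, so the ranking is best read as indicative.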
model3
## Series: train.songs
## Model: NNAR(1,3)
## Call: nnetar(y = train.songs, size = 3, lambda = "auto")
##
## Average of 20 networks, each of which is
## a 1-3-1 network with 10 weights
## options were - linear output units
##
## sigma^2 estimated as 0.09627
autoplot(fcast.nnet3)