Discussion 7

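The dataset I used contains songs on Spotify from 1921 to 2020, with track information and the audio-feature scores that Spotify provides for each track. The file used below, data_by_year.csv, summarizes these scores by year, with one row per year.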

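library(ggplot2)   # ggplot(), facet_wrap(), autoplot() generic
library(forecast)  # nnetar(), forecast(), accuracy(), autoplot()/autolayer() for ts
library(reshape2)  # melt()

Songs <- read.csv("data_by_year.csv")

Songsts <- ts(Songs, start = c(1921), frequency = 1)
# Reshape to long format (one row per year and variable) so each variable gets its own panel
Songsmelt <- melt(Songs, id.vars = 'year')

ggplot(Songsmelt, aes(x = year, y = value)) + geom_point() +
  facet_wrap(~variable, scales = "free") + geom_smooth(se = FALSE)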

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Looking at the plots, liveness stands out to me. According to the data description, liveness is ‘the relative duration of the track sounding as a live performance’. A higher value therefore means more of the track sounds live, so a rising series would mean that songs are increasingly produced to sound like live performances.

# Column 8 of data_by_year.csv holds the liveness score for each year
SongLiveness <- ts(Songs[,8], start = c(1921), frequency = 1)
autoplot(SongLiveness, main='Song Liveness')

acf(SongLiveness)

pacf(SongLiveness)
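The first bar after lag 0 in the ACF plot is the lag-1 autocorrelation. As a minimal sketch of what `acf()` computes, the hypothetical helper `lag1_acf()` below reproduces that value in base R; the toy series is made up for illustration and is not the Spotify data.

```r
# Hypothetical helper: lag-1 sample autocorrelation, using the same
# estimator as stats::acf() (deviations from the overall mean; the
# 1/n factors in numerator and denominator cancel)
lag1_acf <- function(y) {
  y <- as.numeric(y)
  n <- length(y)
  m <- mean(y)
  sum((y[-1] - m) * (y[-n] - m)) / sum((y - m)^2)
}

# Toy annual series (illustrative values only)
toy <- ts(c(0.21, 0.20, 0.22, 0.19, 0.18, 0.20, 0.17, 0.16, 0.18, 0.15),
          start = 2011, frequency = 1)

lag1_acf(toy)
acf(toy, lag.max = 1, plot = FALSE)$acf[2]  # same value from stats::acf()
```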

# Train on 1921-2014 and hold out 2015-2020 as the test set
train.songs <- window(SongLiveness, end = c(2014))
test.songs <- window(SongLiveness, start = c(2015))
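With annual data from 1921 to 2020, this split keeps 94 years for training and 6 years for testing. Since the forecasts run 10 years ahead (to 2024) but the data stop in 2020, only the first six forecasts can be scored against held-out values. A quick sanity check of the split on a placeholder series covering the same years (the values 1 to 100 stand in for the real liveness scores):

```r
# Placeholder annual series spanning 1921-2020 (100 values, not real data)
y <- ts(seq_len(100), start = 1921, frequency = 1)

train <- window(y, end = 2014)    # 1921-2014
test  <- window(y, start = 2015)  # 2015-2020

length(train)                     # 94
length(test)                      # 6
c(start(test)[1], end(test)[1])   # 2015 2020
```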

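# NNAR models: model1 uses nnetar()'s default hidden-layer size, model2-model5 use 2-5 nodes.
# lambda = "auto" lets nnetar() pick a Box-Cox transformation before fitting.
# Network weights start at random values, so results vary between runs unless set.seed() is called first.
model1 = nnetar(train.songs, lambda = "auto")
model2 = nnetar(train.songs, lambda = "auto", size = 2)
model3 = nnetar(train.songs, lambda = "auto", size = 3)
model4 = nnetar(train.songs, lambda = "auto", size = 4)
model5 = nnetar(train.songs, lambda = "auto", size = 5)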
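With `lambda = "auto"`, the forecast package selects the Box-Cox parameter using `BoxCox.lambda()`, fits the network on the transformed series, and back-transforms the forecasts to the original scale. The transformation itself is simple. Below is a minimal base-R sketch of it and its inverse; `box_cox()` and `inv_box_cox()` are hypothetical helper names (not the package's `BoxCox()`/`InvBoxCox()`) and only handle positive data such as liveness scores.

```r
# Box-Cox transform for positive data:
#   (y^lambda - 1) / lambda   if lambda != 0
#   log(y)                    if lambda == 0
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# Inverse transform, mapping values back to the original scale
inv_box_cox <- function(z, lambda) {
  if (lambda == 0) exp(z) else (lambda * z + 1)^(1 / lambda)
}

y <- c(0.15, 0.20, 0.25)
z <- box_cox(y, lambda = 0.5)
inv_box_cox(z, lambda = 0.5)  # recovers 0.15 0.20 0.25
```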


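# 10-step-ahead forecasts from the end of the training data; PI = TRUE adds simulated prediction intervals
fcast.nnet1 = forecast(model1, PI = TRUE, h = 10)
fcast.nnet2 = forecast(model2, PI = TRUE, h = 10)
fcast.nnet3 = forecast(model3, PI = TRUE, h = 10)
fcast.nnet4 = forecast(model4, PI = TRUE, h = 10)
fcast.nnet5 = forecast(model5, PI = TRUE, h = 10)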
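For an NNAR model, multi-step point forecasts are produced recursively: each one-step forecast is fed back in as an input for the next step. With `PI = TRUE`, `forecast()` instead simulates many future sample paths (by default with normally distributed errors) and reads the intervals off their quantiles. The sketch below illustrates both ideas for a single-lag model; `step_fn` is a hypothetical linear stand-in for the fitted network, and this is not the forecast package's own code.

```r
# Recursive point forecasts for a one-lag model: y[t+1] = step_fn(y[t])
recursive_forecast <- function(step_fn, last, h) {
  out <- numeric(h)
  for (i in seq_len(h)) {
    last <- step_fn(last)  # feed the previous forecast back in as input
    out[i] <- last
  }
  out
}

# Simulated intervals: add N(0, sigma^2) noise at every step, propagate it
# through the recursion, then take per-horizon quantiles across paths
simulate_pi <- function(step_fn, last, h, sigma, npaths = 1000, level = 0.95) {
  paths <- matrix(NA_real_, nrow = npaths, ncol = h)
  for (p in seq_len(npaths)) {
    y <- last
    for (i in seq_len(h)) {
      y <- step_fn(y) + rnorm(1, sd = sigma)
      paths[p, i] <- y
    }
  }
  alpha <- (1 - level) / 2
  apply(paths, 2, quantile, probs = c(alpha, 1 - alpha))  # 2 x h matrix
}

# Hypothetical one-step map standing in for a fitted network
step_fn <- function(y) 0.05 + 0.8 * y

recursive_forecast(step_fn, last = 0.2, h = 3)  # 0.21 0.218 0.2244
set.seed(42)
bands <- simulate_pi(step_fn, last = 0.2, h = 3, sigma = 0.01)
bands
```

In this toy map the recursion climbs toward its fixed point, 0.25, whatever the recent direction of the data. A fitted NNAR(1, k) network can behave in a similar way, which is one possible explanation for forecasts that trend upward.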

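autoplot(train.songs) +
  autolayer(fcast.nnet1$mean, series = "model1") +
  autolayer(fcast.nnet2$mean, series = "model2") +
  autolayer(fcast.nnet3$mean, series = "model3") +
  autolayer(fcast.nnet4$mean, series = "model4") +
  autolayer(fcast.nnet5$mean, series = "model5")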

It is surprising to me that the forecasts all trend upward. This is unexpected, since liveness appears to have been declining over the years.
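One way to check the impression of declining liveness numerically, rather than by eye, is the slope of an ordinary least-squares trend line. Here is a minimal sketch using a hypothetical helper `trend_slope()`. On the real data, a call would look like `trend_slope(SongLiveness)`, and a negative result would support the "decline" reading. The toy series is illustrative only, and a straight line ignores the curvature that the loess smoother can show.

```r
# Hypothetical helper: slope of a least-squares linear trend,
# in units of the series per year for annual data
trend_slope <- function(y) {
  t <- as.numeric(time(y))
  unname(coef(lm(as.numeric(y) ~ t))[2])
}

# Toy series falling by exactly 0.002 per year (illustrative only)
toy <- ts(0.30 - 0.002 * (0:19), start = 2001, frequency = 1)
trend_slope(toy)  # -0.002
```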

accuracy(fcast.nnet1, test.songs)

##                         ME       RMSE        MAE           MPE      MAPE
## Training set  0.0009903014 0.01445182 0.01053071  3.601221e-05  5.002212
## Test set     -0.0189616307 0.02052054 0.01896163 -1.061692e+01 10.616924
##                   MASE       ACF1 Theil's U
## Training set 0.9027068 0.01085036        NA
## Test set     1.6254175 0.17290689  2.391484

accuracy(fcast.nnet2, test.songs)

##                         ME       RMSE        MAE          MPE      MAPE
## Training set  0.0009808389 0.01403829 0.01027089   0.01259662  4.875538
## Test set     -0.0189750727 0.02039496 0.01897507 -10.61771943 10.617719
##                   MASE        ACF1 Theil's U
## Training set 0.8804347 -0.01781574        NA
## Test set     1.6265697  0.14066000  2.366077

accuracy(fcast.nnet3, test.songs)

##                         ME       RMSE         MAE         MPE      MAPE
## Training set  0.0009256189 0.01330489 0.009906566   0.0141037  4.700987
## Test set     -0.0185918429 0.02001633 0.018591843 -10.4052049 10.405205
##                   MASE       ACF1 Theil's U
## Training set 0.8492047 -0.0327423        NA
## Test set     1.5937187  0.1340379  2.322602

accuracy(fcast.nnet4, test.songs)

##                         ME       RMSE         MAE          MPE      MAPE
## Training set  0.0009714529 0.01277705 0.009584494   0.06248577  4.542079
## Test set     -0.0188406000 0.02027827 0.018840600 -10.54388282 10.543883
##                   MASE        ACF1 Theil's U
## Training set 0.8215962 -0.02300076        NA
## Test set     1.6150426  0.13411009  2.355198

accuracy(fcast.nnet5, test.songs)

##                         ME       RMSE         MAE          MPE      MAPE
## Training set  0.0008577006 0.01224152 0.009320447   0.03689095  4.416738
## Test set     -0.0181533464 0.01955676 0.018153346 -10.16065676 10.160657
##                   MASE        ACF1 Theil's U
## Training set 0.7989618 -0.06000015        NA
## Test set     1.5561302  0.11055956  2.268502

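The five models perform very similarly. By test-set RMSE, model5 (size = 5) is best at 0.0196, and model3, an NNAR(1,3), is a close second at 0.0200. Model5 also has the lowest test-set MAE, MAPE and MASE. All five models have a negative test-set ME of about -0.019, meaning they over-forecast liveness in 2015-2020, which is consistent with their upward-trending forecasts. The model3 fit examined below forecasts a slight increase in song liveness over the 10-year forecast horizon (2015-2024).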
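Putting the test-set RMSE values from the five `accuracy()` tables above into one vector makes the comparison explicit. Here the values are copied from the printed output. In a live session, each value could be extracted directly, for example `accuracy(fcast.nnet3, test.songs)["Test set", "RMSE"]`, since `accuracy()` returns a matrix with those row and column names.

```r
# Test-set RMSE copied from the accuracy() output above
test_rmse <- c(model1 = 0.02052054, model2 = 0.02039496,
               model3 = 0.02001633, model4 = 0.02027827,
               model5 = 0.01955676)

sort(test_rmse)              # models ranked from lowest to highest RMSE
names(which.min(test_rmse))  # "model5"
```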

model3

## Series: train.songs 
## Model:  NNAR(1,3) 
## Call:   nnetar(y = train.songs, size = 3, lambda = "auto")
## 
## Average of 20 networks, each of which is
## a 1-3-1 network with 10 weights
## options were - linear output units 
## 
## sigma^2 estimated as 0.09627
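The "1-3-1 network with 10 weights" line can be checked by hand. Each of the 3 hidden nodes has one weight per input plus a bias, and the output node has one weight per hidden node plus a bias, giving (1 + 1) × 3 + (3 + 1) × 1 = 10. Below is a small sketch of that arithmetic. The helper name is hypothetical, and the formula assumes a single hidden layer with biases and no skip-layer connections.

```r
# Hypothetical helper: number of weights in a single-hidden-layer
# feed-forward network with bias terms and no skip-layer connections
nnet_weights <- function(inputs, hidden, outputs = 1) {
  (inputs + 1) * hidden + (hidden + 1) * outputs
}

nnet_weights(1, 3)  # 10, matching the 1-3-1 network printed above
```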

autoplot(fcast.nnet3)


Yu Mu

12/9/2020