Task 1

4240 Data Literacy

Anastasiya Konovalchuk

2025-03-18

Introduction

  • Objective: Assess the impact of playlist placement on song streaming using a Difference-in-Differences (DiD) approach.
  • Methodology: Compare changes in weekly streams for treated vs. untreated songs before and after playlist addition.

Data & Methodology

  • Dataset: Weekly Spotify streaming data on 62 songs.
  • Treatment Group: 31 songs added to the Winter Acoustic Playlist on 15 December 2023.
  • Control Group: 31 comparable songs that were not added, selected based on song characteristics and success buckets.
  • Key Variable: delta_streams, the change in a song's streams from the previous week, tracked from 3 weeks before to 3 weeks after the playlist addition.

Difference-in-Differences Model

\[ \begin{aligned} \Delta\text{Streams}_{i,t} &= \beta_0 + \beta_1\,\text{Treated}_i + \beta_2\,\text{WeekDiff}_t \\ &\quad + \beta_3\,(\text{Treated}_i \times \text{WeekDiff}_t) + \epsilon_{i,t} \end{aligned} \]

  • Treated: Indicator for playlist inclusion (1 for the treatment group, 0 for the control group).
  • WeekDiff: Number of weeks relative to the playlist addition (negative before, positive after, 0 for the week of the addition).
  • Treated × WeekDiff: Interaction term capturing playlist impact.
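The `feols` call in the R section estimates a more flexible event-study version of this model. As we read the code, `i(week_diff, treated, 0)` gives each week its own treated-by-week coefficient with week 0 as the reference, and the `isrc + week_diff` fixed effects take the place of \(\beta_0\), \(\beta_1\) and \(\beta_2\) (the Treated main effect is absorbed by the song fixed effect because it never changes within a song). Under that reading, the estimated equation is:

\[ \Delta\text{Streams}_{i,t} = \alpha_i + \lambda_t + \sum_{k \in \{-2,-1,1,2,3\}} \delta_k \bigl(\text{Treated}_i \times \mathbf{1}[\text{WeekDiff}_t = k]\bigr) + \epsilon_{i,t} \]

Here \(\alpha_i\) is a song (isrc) fixed effect, \(\lambda_t\) a week fixed effect, and \(\delta_k\) the treated-versus-control difference in weekly change at week \(k\) relative to week 0; these are the rows labelled `week_diff::k:treated` in the output. Week \(-3\) has no coefficient because its change from the previous week cannot be computed.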

Difference-in-Differences Model in R:

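library(dplyr)

# Week-over-week change in streams, computed within each song (isrc);
# each song's first week has no previous week, so its delta is NA
df_weekly1 <- df_weekly %>%
  arrange(isrc, week_diff) %>%
  group_by(isrc) %>%
  mutate(delta_streams = streams - lag(streams)) %>%
  ungroup()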
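One quick way to check the lag step, and to see why the model summary reports 372 observations with 6 week fixed effects, is to run the same pipeline on a tiny made-up panel. The two songs and their stream counts below are hypothetical; the sketch assumes `week_diff` runs from -3 to 3, so each song's first week (-3) gets `NA`.

```r
library(dplyr)

# Hypothetical toy panel: 2 songs x 7 weeks (week_diff -3..3); values are made up
toy <- data.frame(
  isrc      = rep(c("SONG_A", "SONG_B"), each = 7),
  week_diff = rep(-3:3, times = 2),
  streams   = c(10, 11, 12, 12, 15, 16, 16,   # SONG_A
                20, 21, 21, 22, 22, 23, 23)   # SONG_B
)

# Same pipeline as above: lag() is evaluated separately within each song
toy_delta <- toy %>%
  arrange(isrc, week_diff) %>%
  group_by(isrc) %>%
  mutate(delta_streams = streams - lag(streams)) %>%
  ungroup()

print(toy_delta)

# Week -3 has no previous week, so only 6 usable rows remain per song
n_usable <- sum(!is.na(toy_delta$delta_streams))
print(n_usable)
```

With 62 songs and 6 usable weeks each, the same logic gives 62 × 6 = 372 rows, matching the reported observation count; `feols` removes observations with `NA` values before estimation.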

Summary

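library(fixest)

# Event-study DiD: treated x week dummies (week 0 = reference),
# song (isrc) and week fixed effects, standard errors clustered by song
model_did_changes <- feols(delta_streams ~ i(week_diff, treated, 0) |
                             isrc + week_diff, 
                           data = df_weekly1, 
                           vcov = ~ isrc)
summary(model_did_changes)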
OLS estimation, Dep. Var.: delta_streams
Observations: 372
Fixed-effects: isrc: 62,  week_diff: 6
Standard-errors: Clustered (isrc) 
                       Estimate Std. Error   t value    Pr(>|t|)    
week_diff::-2:treated  0.009701   0.033916  0.286019 0.775832697    
week_diff::-1:treated  0.009358   0.027741  0.337349 0.737011657    
week_diff::1:treated   0.252793   0.057353  4.407688 0.000043069 ***
week_diff::2:treated   0.079110   0.032363  2.444421 0.017413742 *  
week_diff::3:treated  -0.006267   0.038342 -0.163458 0.870698390    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 0.118151     Adj. R2: 0.171736
                 Within R2: 0.131172
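To make explicit what the `i(week_diff, treated, 0)` term and the two fixed effects estimate, the sketch below builds the same design by hand with base R's `lm()` on a simulated, noise-free panel. Everything in it is hypothetical: the song and week effects are random or made up, and the planted treatment effects (0.25 in week 1, 0.08 in week 2) are chosen for illustration only. Because there is no noise, the regression should return exactly the planted values.

```r
# Hypothetical, noise-free simulated panel: 62 songs (first 31 treated),
# weeks -2..3 (week -3 is already lost to the lag step)
set.seed(42)
n_songs <- 62
weeks   <- -2:3
panel <- expand.grid(isrc = seq_len(n_songs), week_diff = weeks)
panel$treated <- as.integer(panel$isrc <= 31)

# Made-up ingredients: song-specific growth, common week shocks, and
# planted treatment effects (week 0 is the reference, so its effect is 0)
song_fe     <- rnorm(n_songs, sd = 0.05)
week_fe     <- c(0.02, 0.01, 0, -0.01, 0, 0.01)
true_effect <- c("-2" = 0, "-1" = 0, "0" = 0, "1" = 0.25, "2" = 0.08, "3" = 0)

panel$delta_streams <- song_fe[panel$isrc] +
  week_fe[match(panel$week_diff, weeks)] +
  panel$treated * unname(true_effect[as.character(panel$week_diff)])

# Treated x week dummies for every week except 0, mirroring the
# i(week_diff, treated, 0) term (names: tw_m2, tw_m1, tw_1, tw_2, tw_3)
for (k in setdiff(weeks, 0)) {
  panel[[paste0("tw_", sub("-", "m", k))]] <- panel$treated * (panel$week_diff == k)
}

# Song and week dummies play the role of the isrc + week_diff fixed effects
fit <- lm(delta_streams ~ factor(isrc) + factor(week_diff) +
            tw_m2 + tw_m1 + tw_1 + tw_2 + tw_3, data = panel)
est <- coef(fit)[c("tw_m2", "tw_m1", "tw_1", "tw_2", "tw_3")]
print(round(est, 4))  # should match the planted 0, 0, 0.25, 0.08, 0
```

`lm()` with song and week dummies yields the same point estimates as the fixed-effects estimator in `feols`, but its default standard errors are not clustered by song, so only the coefficients, not the p-values, are directly comparable.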

Visualization

iplot(model_did_changes)

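Figure 1: Estimated playlist effect on the weekly change in streams, by week relative to playlist addition (week 0 = reference), with confidence intervals.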
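`iplot()` draws each `week_diff::k:treated` coefficient with a confidence interval and shows the reference week 0 at zero. As a rough cross-check of Figure 1, the base R sketch below redraws the same kind of chart from the estimates and standard errors in the summary table. It uses the normal critical value 1.96, so its intervals are approximations and may differ slightly from the ones `iplot()` draws.

```r
# Estimates and standard errors copied from the summary table;
# week 0 is the reference week, fixed at 0 with no interval
week <- c(-2, -1, 0, 1, 2, 3)
est  <- c(0.009701, 0.009358, 0, 0.252793, 0.079110, -0.006267)
se   <- c(0.033916, 0.027741, 0, 0.057353, 0.032363, 0.038342)

# Approximate 95% intervals using the normal critical value 1.96
lower <- est - 1.96 * se
upper <- est + 1.96 * se

plot(week, est, pch = 19, ylim = range(lower, upper),
     xlab = "Weeks relative to playlist addition",
     ylab = "Effect on delta_streams")
keep <- se > 0  # skip the reference week, which has no interval
arrows(week[keep], lower[keep], week[keep], upper[keep],
       angle = 90, code = 3, length = 0.05)
abline(h = 0, lty = 2)  # no-effect line
abline(v = 0, lty = 3)  # week of playlist addition
```

Only the intervals for weeks 1 and 2 lie entirely above zero, in line with the significance stars in the table.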

Conclusion

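As shown in the model summary and in Figure 1, streaming growth for the playlisted songs surges in the first week after playlist inclusion, then declines and approaches zero by the third week. Before inclusion, the estimates for weeks -2 and -1 are close to zero and not significant (p ~ 0.78 and ~0.74), consistent with the parallel pre-treatment trends that the DiD design relies on.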

  • Strong Initial Impact: The week immediately after playlist placement shows the largest effect: the week-over-week increase in streams is an estimated ~0.253 larger than for control songs, and it is highly significant (p ~ 0.000043).
  • Gradual Decline: In the second week, the estimate drops to ~0.079 but is still significant at the 5% level (p ~ 0.017).
  • No Sustained Growth: By the third week, the estimate is slightly negative (-0.0063) and not statistically significant (p ~ 0.871), indicating no further growth beyond the initial boost.

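These findings suggest that playlist placement mainly produces a short-lived burst of streaming growth rather than sustained growth. Because the outcome is the week-over-week change, an effect near zero in week 3 means the treated songs stopped gaining streams faster than the controls, not necessarily that their stream levels fell back.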
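Because the coefficients are effects on weekly changes, their running sum gives a rough sense of how far the treated songs' stream level moved relative to the controls after week 0. The sketch below simply adds up the reported post-treatment point estimates. It is an approximation under stated assumptions: each coefficient is read as the extra change relative to the week-0 reference, and the covariance between estimates (needed for a standard error of the sum) is not available from the printed table, so no interval is given.

```r
# Post-treatment estimates copied from the summary table (weeks 1-3);
# each is the treated-minus-control difference in delta_streams vs. week 0
post_effects <- c(week1 = 0.252793, week2 = 0.079110, week3 = -0.006267)

# Running total: approximate cumulative shift in the stream level,
# in whatever units `streams` is measured
cumulative <- cumsum(post_effects)
print(round(cumulative, 6))
```

The running total rises to about 0.33 by week 2 and then stays roughly flat, which fits the reading that playlist placement lifted the stream level once rather than setting off continued growth.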