2025-11-08

Introduction

  • Many people believe longer songs are more artistic or engaging, but are they actually more popular?
  • We’ll use a dataset of songs containing:
    • duration_ms → length of song in milliseconds (converted to minutes, duration_min, for modeling)
    • popularity → Spotify popularity score (0–100)
  • Let’s explore whether song length influences popularity.
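
The model in the following slides uses duration in minutes, while the dataset stores milliseconds. A minimal sketch of that conversion, assuming each song is a plain dictionary; the two example rows are hypothetical and not taken from the actual dataset:

```python
MS_PER_MINUTE = 60_000


def ms_to_minutes(duration_ms):
    """Convert a track length from milliseconds to minutes."""
    return duration_ms / MS_PER_MINUTE


# Hypothetical example rows, not taken from the actual dataset.
songs = [
    {"duration_ms": 210_000, "popularity": 55},
    {"duration_ms": 180_000, "popularity": 42},
]

# Add the derived column used by the regression model.
for song in songs:
    song["duration_min"] = ms_to_minutes(song["duration_ms"])
```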

The Population Regression Model

We assume a simple linear model relating song popularity and duration:

\[ \text{popularity} = \beta_0 + \beta_1 \cdot \text{duration\_min} + \varepsilon \]

\[ \mathbb{E}[\varepsilon] = 0, \quad \mathrm{Var}(\varepsilon) = \sigma^2 \]

  • \(\beta_0\): intercept — expected popularity when duration = 0
  • \(\beta_1\): slope — change in popularity per extra minute
  • \(\varepsilon\): random error term capturing other influences

We assume errors have zero mean and constant variance.
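
To make the population model concrete, here is a small simulation sketch. It assumes normally distributed errors, which is stronger than the zero-mean, constant-variance assumption stated above, and the parameter values passed in are purely illustrative rather than estimates from the data:

```python
import random


def simulate_popularity(durations_min, beta0, beta1, sigma, seed=0):
    """Draw popularity = beta0 + beta1 * duration_min + eps for each duration.

    Assumption (not from the slides): eps ~ Normal(0, sigma^2).
    The simulated values are not clipped to the 0-100 score range.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in durations_min]


# Illustrative parameter values only.
simulated = simulate_popularity([2.0, 3.5, 5.0], beta0=30.0, beta1=-1.0, sigma=20.0)
```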

Scatter Plot: Duration vs Popularity

The red line is the fitted regression line. A mild downward slope suggests longer songs might be slightly less popular.

Fitted Regression Equation

We model song popularity as:

\[ \text{popularity} = \beta_0 + \beta_1\,\text{duration\_min} + \varepsilon \]

Estimated (fitted) model:

\[ \widehat{\text{popularity}} = 33.34 + 0.01 \times \text{duration\_min} \]

Estimated error variance:

\[ \widehat{\sigma}^2 = 499.50 \]

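The fitted line quantifies how predicted popularity changes with song length: the slope is the change in predicted score per additional minute. A negative slope would mean popularity decreases as duration increases; here the estimated coefficient is only 0.01 points per minute, which is negligible on a 0–100 scale, especially next to a residual standard deviation of about \(\sqrt{499.50} \approx 22.3\).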
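
As a rough illustration of what the reported coefficients imply, the sketch below plugs them into a prediction function. The function and constant names are hypothetical; the numbers are the ones reported in the fitted equation and the error-variance estimate:

```python
import math

# Coefficients as reported in the fitted equation.
BETA0_HAT = 33.34
BETA1_HAT = 0.01
SIGMA2_HAT = 499.50  # reported estimate of the error variance


def predict_popularity(duration_min, beta0=BETA0_HAT, beta1=BETA1_HAT):
    """Predicted popularity score for a song of the given length in minutes."""
    return beta0 + beta1 * duration_min


# Predicted difference between a 6-minute and a 3-minute song.
diff = predict_popularity(6.0) - predict_popularity(3.0)

# Residual standard deviation, for comparison with the size of diff.
sigma_hat = math.sqrt(SIGMA2_HAT)
```

With these values, the predicted difference across three minutes is a few hundredths of a point, while the residual standard deviation is over 22 points, so the scatter around the line dominates any effect of duration.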

Residual Plot: Checking Model Fit

Residuals are scattered around zero with no visible trend or funnel shape → the linear form and the constant-variance assumption appear reasonable.
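
A visual check can be complemented with a crude numeric one. The sketch below computes residuals for given coefficients and compares their mean squared size in the lower and upper halves of the duration range. This is an informal diagnostic chosen for illustration, not a formal test and not necessarily what was used to produce the plot:

```python
def residuals(x, y, beta0, beta1):
    """Observed minus fitted values for the line beta0 + beta1 * x."""
    return [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]


def mean_square(values):
    """Average of squared values."""
    return sum(v * v for v in values) / len(values)


def spread_by_half(x, resid):
    """Mean squared residual for the lower and upper halves of x.

    Roughly equal values are consistent with constant error variance;
    a large gap hints at a funnel shape. Needs at least two points.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    mid = len(order) // 2
    low = [resid[i] for i in order[:mid]]
    high = [resid[i] for i in order[mid:]]
    return mean_square(low), mean_square(high)
```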

Estimating Regression Parameters

\[ \begin{split} \hat{\beta}_1 &= \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})} {\sum_i (x_i - \bar{x})^2}, \\ \mathrm{SE}(\hat{\beta}_1) &= \sqrt{\frac{\hat{\sigma}^2}{\sum_i (x_i - \bar{x})^2}}, \\ t &= \frac{\hat{\beta}_1}{\mathrm{SE}(\hat{\beta}_1)}, \quad \text{with } n-2 \text{ degrees of freedom}, \\ \hat{\sigma}^2 &= \frac{1}{n-2}\sum_i \hat{\varepsilon}_i^2 \end{split} \]

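These formulas give the slope estimate, its standard error, and the t-statistic for testing \(H_0: \beta_1 = 0\); the intercept estimate follows as \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\), and \(\hat{\varepsilon}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\) are the residuals. A large \(|t|\) with a small p-value indicates a statistically significant slope, although significance alone does not make the effect large.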
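
The formulas above can be sketched directly in plain Python. The function name and the toy data in the usage lines are hypothetical; in practice a statistics library would normally do this, and the p-value (which needs the t distribution with n-2 degrees of freedom) is left out here:

```python
import math


def fit_simple_ols(x, y):
    """Least-squares estimates for y = beta0 + beta1 * x + error."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points for n - 2 degrees of freedom")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar

    # Residual variance with n - 2 degrees of freedom.
    resid = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(e * e for e in resid) / (n - 2)

    se_beta1 = math.sqrt(sigma2 / sxx)
    # Note: a perfect fit gives se_beta1 == 0 and is not handled here.
    t_stat = beta1 / se_beta1
    return {"beta0": beta0, "beta1": beta1, "sigma2": sigma2,
            "se_beta1": se_beta1, "t": t_stat, "df": n - 2}


# Toy data for illustration only.
fit = fit_simple_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```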

Average Popularity by Duration

Average popularity tends to drop slightly for songs beyond 4–5 minutes. In this dataset, shorter tracks are generally somewhat more popular.
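
One way to compute such averages is to group songs into one-minute bins. The sketch below assumes that binning scheme, which may differ from the one behind the slide, and the function name is hypothetical:

```python
from collections import defaultdict


def mean_popularity_by_minute(durations_min, popularity):
    """Average popularity per whole-minute duration bin.

    A song lasting 3.0 to just under 4.0 minutes falls into bin 3.
    """
    totals = defaultdict(lambda: [0.0, 0])  # bin -> [sum, count]
    for d, p in zip(durations_min, popularity):
        bucket = int(d)  # floor for non-negative durations
        totals[bucket][0] += p
        totals[bucket][1] += 1
    return {b: s / c for b, (s, c) in sorted(totals.items())}
```

Bins with very few songs give noisy averages, so it helps to look at the per-bin counts before reading much into a drop at long durations.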

3D Visualization

This 3D plot suggests that duration interacts with other musical features such as danceability and energy: longer songs aren’t always less popular if they’re energetic or danceable.

Conclusion

  • The data suggest a weak negative relationship between song length and popularity.
  • Longer songs tend to have slightly lower average popularity, but the effect size is small.
  • The low \(R^2\) indicates that song duration alone does not explain much of the variation in popularity.
  • Other factors like danceability, energy, and artist popularity likely play a larger role.
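
For reference, \(R^2\) is the share of the variation in popularity that the fitted line accounts for. A minimal sketch of that calculation, with a hypothetical function name and no data from the study:

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot
```

A value near 0 means the fitted line predicts barely better than the overall mean popularity, which is the situation described in the conclusion.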