Since its launch in 1958, the Billboard Hot 100 has served as the standard measure of song popularity in the United States, ranking songs by a combination of radio airplay, sales, and, more recently, online streaming. For many artists, reaching the #1 spot is a major goal, and staying there for several weeks is an even bigger achievement. Some songs hold the #1 spot for months, while others drop after a single week. This raises the question: what makes certain songs stay at the top longer than others?
In this project, we explore one feature, song length, and whether it affects the number of weeks a song spends at #1. Song length is an interesting feature because its effect on streaming is unclear: shorter songs may hold listeners' attention better, while longer songs may immerse listeners more fully in the music. Our guiding question is therefore: Is there a relationship between the length of a song and the number of weeks it stayed at #1 on the Billboard Hot 100?
Thesis: We find no statistically significant evidence of a relationship between the length of a song and the number of weeks it stayed at #1 on the Billboard Hot 100.
Raw Data: The dataset used for this project was compiled by Chris Dalla Riva from publicly available Billboard Hot 100 chart archives covering 1958 to 2025. It includes every song that reached the #1 spot during those years.
Weeks at Number One: This represents the total number of weeks a particular song remained at #1 on the Billboard Hot 100 chart.
Length (sec): The duration of the song in seconds.
Rows: Each row in the dataset represents one song that reached number one on the Billboard Hot 100 between 1958 and 2025.
Sample Size: We used a random sample of 100 songs from the full dataset, which contains a little over 1,000 songs. Each entry includes the song title, artist, the date it first reached #1, its length in seconds, and the total number of weeks it stayed at the top. A sample of this size is large enough to support inference about the slope while keeping the analysis manageable.
For this project, we focused on two main variables: the length of each song and the number of weeks it stayed at number one.
We started by preparing the dataset to make sure it was clean, complete, and ready for analysis. First, we renamed several columns to follow a consistent, readable format. For example, “Weeks.at.Number.One” and “Length..Sec.” were renamed to weeks_at_one and length_sec so they would be easier to work with in code.
Next, we removed any rows with missing values in either of the two variables we’re focusing on: song length and the number of weeks a song spent at #1. Since both are required for our regression and graphs, incomplete rows were excluded from the analysis.
After cleaning, we selected a random sample of 100 songs from the dataset. The full dataset spans several decades of Billboard #1 hits, so using a smaller sample makes the analysis easier to manage while still capturing meaningful trends. We also set a fixed random seed (set.seed(240)) to ensure that our results are reproducible.
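The cleaning and sampling steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's R code. Rows are assumed to be dictionaries of strings, as Python's `csv.DictReader` yields, and blank or `NA` cells are treated as missing. The helper names `to_number`, `clean_rows`, and `draw_sample` are hypothetical. Python's `random.Random(240)` is a different generator from R's `set.seed(240)`, so it will not reproduce our 100-song sample; it only illustrates the idea of a fixed seed.

```python
import random

# Renames described in the text: original header -> code-friendly name.
RENAMES = {"Weeks.at.Number.One": "weeks_at_one", "Length..Sec.": "length_sec"}


def to_number(value):
    """Parse a numeric cell; None, blank, or 'NA' cells count as missing."""
    if value is None:
        return None
    text = str(value).strip()
    if text in ("", "NA"):
        return None
    return float(text)


def clean_rows(rows):
    """Rename columns, then keep only rows with both analysis variables present."""
    cleaned = []
    for row in rows:
        renamed = {RENAMES.get(key, key): val for key, val in row.items()}
        weeks = to_number(renamed.get("weeks_at_one"))
        length = to_number(renamed.get("length_sec"))
        if weeks is None or length is None:
            continue  # incomplete rows are excluded from the analysis
        renamed["weeks_at_one"] = int(weeks)
        renamed["length_sec"] = length
        cleaned.append(renamed)
    return cleaned


def draw_sample(rows, n=100, seed=240):
    """Reproducible simple random sample (without replacement) of up to n rows."""
    rng = random.Random(seed)
    return rng.sample(rows, min(n, len(rows)))


if __name__ == "__main__":
    # Hypothetical rows for illustration; Song B is missing its weeks value.
    raw = [
        {"Song": "Song A", "Weeks.at.Number.One": "3", "Length..Sec.": "215"},
        {"Song": "Song B", "Weeks.at.Number.One": "", "Length..Sec.": "190"},
        {"Song": "Song C", "Weeks.at.Number.One": "1", "Length..Sec.": "248"},
    ]
    songs = clean_rows(raw)
    print(len(songs), draw_sample(songs, n=2))
```

Sampling without replacement via `Random.sample` mirrors a simple random sample. Reusing the same seed returns the same subset on every run.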
The first graph shows how song length relates to the number of weeks a song stayed at number one. Each dot represents a song, plotted by its length in seconds on the x-axis and the number of weeks it stayed at the top on the y-axis. A smoothed trend line is added to help visualize the pattern. The plot shows a very weak negative trend: longer songs tend to stay at #1 for slightly fewer weeks. However, the points are widely scattered with no clear structure. Songs of similar length can perform very differently on the charts, suggesting that length alone isn’t a strong predictor of chart longevity.
To get a better sense of the sample, we also plotted a histogram of song lengths. Most songs fall between 180 and 270 seconds, or roughly 3 to 4.5 minutes. This clustering is typical for mainstream pop music, and we see only a few songs that are much shorter or longer than that. The shape is fairly symmetric and bell-shaped, with no extreme outliers. This tight range limits how much variation the model can pick up from length alone, which may explain why the regression didn’t show a strong relationship.
Based on these cleaning steps and the structure of the sample, we believe this subset is a reasonable representation of the broader set of Billboard #1 songs. Our visualizations help confirm that the cleaned data makes sense and that the variables behave as expected. From here, we can move forward with our statistical analysis.
In this study, we investigate whether there is a statistically significant linear relationship between the length of a song (in seconds) and the number of weeks the song was ranked #1 on the Billboard Hot 100. To analyze this, we fit a simple linear regression model to a random sample of 100 songs, estimate the regression equation, and test the slope.
The parameter of interest is the true slope of the population regression line:
\[ \beta_1 = \text{the true change in expected weeks at \#1 for each additional second of song length.} \]
We also estimate the true intercept:
\[ \beta_0 = \text{the expected number of weeks at \#1 when song length } = 0. \]
(Note: \(\beta_0\) has limited practical use because a song cannot have a length of 0 seconds.)
We use a simple linear regression t-test on the slope coefficient \(\beta_1\) to test whether song length is a significant predictor of weeks at #1.
Let:
\[ X = \text{Song Length (seconds)}, \qquad Y = \text{Weeks at \#1}. \]
We assume:
The relationship between X and Y is linear, as opposed to a curved/non-linear relationship.
The residual errors are normally distributed around 0.
The residual errors have constant variance (standard deviation) that does not change with X. This assumption was not met for our dataset.
We evaluated these assumptions using a residual plot. Only the first two (linearity and normally distributed residuals) were reasonably satisfied for this sample.
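We checked constant variance visually with a residual plot. One possible numerical complement, not a method used in this project, is Koenker's studentized version of the Breusch-Pagan test for a single predictor. It regresses the squared residuals on X and compares n·R² with a chi-square distribution with 1 degree of freedom. The sketch below assumes plain Python lists of numbers. The function names `ols_fit` and `breusch_pagan` and the toy data are hypothetical.

```python
import math


def ols_fit(xs, ys):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b * x_bar, b


def breusch_pagan(xs, ys):
    """Koenker's studentized Breusch-Pagan test with one predictor.

    Returns (LM, p), where LM = n * R^2 from regressing the squared OLS
    residuals on x. Under constant variance, LM is approximately
    chi-square with 1 degree of freedom.
    """
    a, b = ols_fit(xs, ys)
    sq_resid = [(y - (a + b * x)) ** 2 for x, y in zip(xs, ys)]
    c, d = ols_fit(xs, sq_resid)
    mean_sq = sum(sq_resid) / len(sq_resid)
    sst = sum((e - mean_sq) ** 2 for e in sq_resid)
    sse = sum((e - (c + d * x)) ** 2 for x, e in zip(xs, sq_resid))
    r2 = max(0.0, 1 - sse / sst) if sst > 0 else 0.0  # clamp float round-off
    lm = len(xs) * r2
    # Chi-square(1) upper tail: P(X > lm) = P(|Z| > sqrt(lm)) = erfc(sqrt(lm / 2)).
    p = math.erfc(math.sqrt(lm / 2))
    return lm, p


if __name__ == "__main__":
    xs = list(range(1, 41))
    # Hypothetical data whose scatter around the line grows with x.
    ys = [2 + 0.5 * x + 0.5 * x * (-1) ** x for x in xs]
    print(breusch_pagan(xs, ys))
```

A small p-value indicates that the spread of the residuals changes with X. This is the kind of pattern a funnel-shaped residual plot shows.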
We test whether song length predicts weeks at #1:
\[ H_0 : \beta_1 = 0 \quad \text{(no relationship)} \]
\[ H_{A} : \beta_1 \neq 0 \quad \text{(there is a linear relationship)} \]
This is a two-sided hypothesis test, because we do not assume in advance whether longer songs stay at #1 for more or fewer weeks.
The test statistic for the slope is:
\[ t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} \]
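where \(\hat{\beta}_1\) is the estimated slope from the sample and \(SE(\hat{\beta}_1) = s / \sqrt{\sum_i (x_i - \bar{x})^2}\) is its estimated standard error, with \(s\) the residual standard error of the fitted model.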
Under \(H_0\), the test statistic follows a t-distribution with:
\[ df = n - 2 = 98 \]
since \(n = 100\).
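To make the test-statistic formulas above concrete, here is a minimal Python sketch of the least-squares fit and the slope t statistic. It uses the standard formulas \(\hat{\beta}_1 = S_{xy}/S_{xx}\), \(s^2 = SSE/(n-2)\), and \(SE(\hat{\beta}_1) = s/\sqrt{S_{xx}}\). The function name `slope_t_test` and the toy data are hypothetical. In R, `summary(lm(weeks_at_one ~ length_sec, data = songs))` reports the same quantities, where `songs` stands in for a data frame holding our cleaned sample.

```python
import math


def slope_t_test(xs, ys):
    """Fit y = b0 + b1*x by least squares and test H0: beta1 = 0.

    Returns the estimates, SE(b1), the t statistic, and df = n - 2.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))  # residual standard error
    se_b1 = s / math.sqrt(sxx)    # standard error of the slope
    return {"b0": b0, "b1": b1, "se_b1": se_b1, "t": b1 / se_b1, "df": n - 2}


if __name__ == "__main__":
    # Toy data, not from the Billboard dataset.
    print(slope_t_test([1, 2, 3, 4, 5], [1, 3, 2, 5, 4]))
```

On the toy data, the slope estimate is 0.8 with a standard error of about 0.346. That gives t ≈ 2.31 on 3 degrees of freedom.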
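The estimated regression line has the form
\[ \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X \]
where \(\hat{Y}\) is the predicted number of weeks at #1 for a song of length \(X\) seconds, and \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are the least-squares estimates of the intercept and slope. The test results for the slope are: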
\[ t = -0.602 \]
\[ df = 98 \]
\[ p = 0.54830 \]
\[ \text{CI}_{95\%}(\beta_1) = [-0.01471615,\; 0.007862305] \]
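As a consistency check on these results, we can back out the slope estimate and its standard error from the reported 95% confidence interval. The interval is \(\hat{\beta}_1 \pm t^*_{0.975,\,98}\, SE(\hat{\beta}_1)\), so its midpoint is the estimate and its half-width divided by \(t^*\) is the standard error. From these we can recompute t and the two-sided p-value. The sketch uses only the Python standard library. The t-distribution CDF is approximated with Simpson's rule and the critical value is found by bisection. The function names are hypothetical; in practice R's `pt` and `qt` functions would do this directly.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): symmetry about 0 plus Simpson's rule on [0, |x|] (steps must be even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def two_sided_p(t: float, df: int) -> float:
    """Two-sided p-value for an observed t statistic."""
    return 2 * (1 - t_cdf(abs(t), df))


def t_critical(conf: float, df: int) -> float:
    """Critical value t* with P(-t* <= T <= t*) = conf, found by bisection."""
    target = 1 - (1 - conf) / 2
    lo, hi = 0.0, 20.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Reported 95% CI for the slope and df = n - 2 = 98 (values from the text).
ci_low, ci_high = -0.01471615, 0.007862305
df = 98

b1_hat = (ci_low + ci_high) / 2          # the CI is centered on the estimate
t_star = t_critical(0.95, df)            # about 1.984 for df = 98
se_b1 = (ci_high - ci_low) / 2 / t_star  # half-width divided by t*
t_stat = b1_hat / se_b1
p_value = two_sided_p(t_stat, df)

print(f"b1_hat={b1_hat:.5f}  SE={se_b1:.5f}  t={t_stat:.3f}  p={p_value:.4f}")
```

Run as written, this should give \(\hat{\beta}_1 \approx -0.00343\), \(SE(\hat{\beta}_1) \approx 0.00569\), t ≈ −0.602, and p ≈ 0.548, in line with the reported results.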
Based on the simple linear regression analysis, we did not find significant evidence that the length of a song is associated with the number of weeks it spent at #1 on the Billboard Hot 100. The p-value (0.548) was well above the 0.05 significance level, and the 95% confidence interval for the slope included 0, so we fail to reject \(H_0\). The data show no clear linear relationship between the two variables.
Our scatterplot of song length versus the number of weeks at #1 shows a large amount of variability and only a very weak overall trend. Although the regression line has a slightly negative slope, the points are widely scattered, and songs with nearly identical lengths can have very different chart outcomes. This suggests that song length alone is not a meaningful predictor of how long a song stays at #1.
Importantly, the 95% confidence interval for the slope ranges from −0.0147 to 0.0079. This means we are 95% confident that for each additional second of song length, the true change in expected weeks at #1 is between a decrease of about 0.015 weeks and an increase of about 0.008 weeks. Scaled to a full minute (60 seconds), this is between a decrease of about 0.88 weeks and an increase of about 0.47 weeks. The interval includes 0, and even its endpoints amount to less than one week per additional minute of song length. So the results suggest that if song length has any effect at all, it is small.
The histogram of song lengths further explains why the relationship appears so weak. Most #1 songs cluster tightly between 180 and 240 seconds (roughly 3 to 4 minutes), which limits the variation in the predictor. With such a narrow range of song lengths, it becomes difficult for the model to detect meaningful differences in chart performance. Taken together, these results show that while song lengths vary somewhat, these differences do not appear to translate into meaningful differences in chart longevity.
Limited predictors: Our model only uses song length, but chart performance is influenced by many other variables: genre, collaborations, artist fame, marketing campaigns, release timing, social media presence, and more. Ignoring these factors makes it difficult to isolate the effect of length.
Sample variation: We used a random sample of 100 songs rather than the full dataset. A different random sample could produce slightly different patterns or slopes.
Historical changes: The dataset spans several decades, during which the music industry changed significantly. Typical song lengths and listener habits have evolved across eras, but our model does not account for decade or release year.
Regression assumptions: A simple linear regression assumes a straight-line relationship and a constant spread of residuals across all x-values. Our scatterplot shows uneven variability and potential nonlinearity, raising concerns about whether these assumptions hold.
Restricted range: Because all songs in this dataset reached #1, the sample is already biased toward highly successful songs. This limited range of chart performance can weaken detected relationships.
Include more variables such as release year, genre, collaboration status, streaming-era indicators, or artist popularity to build a more informative model.
Analyze trends by decade to see whether song length mattered more in certain eras (e.g., vinyl era vs. streaming era).
Use models suited to count outcomes. A negative binomial regression, for example, might capture the distribution of weeks at #1 better than linear regression.
Compare #1 hits to lower-ranked songs to explore whether length affects overall chart performance, not just among the most successful songs.
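The negative binomial suggestion can be motivated with a quick check. Poisson regression assumes that a count's variance equals its mean. A sample variance well above the mean (overdispersion) is one common informal reason to prefer a negative binomial model. A quick check on weeks at #1 could look like the sketch below. The function name `dispersion_ratio` and the example counts are hypothetical, not values from our dataset.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Sample variance divided by sample mean (about 1 for Poisson-like counts)."""
    if len(counts) < 2:
        raise ValueError("need at least two counts")
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; ratio is undefined")
    return variance(counts) / m


if __name__ == "__main__":
    # Hypothetical weeks-at-#1 counts, for illustration only.
    example_weeks = [1, 1, 2, 1, 3, 1, 5, 2, 10, 1]
    print(round(dispersion_ratio(example_weeks), 2))
```

A ratio well above 1 on the real sample would support trying a negative binomial model. A ratio near 1 would make a Poisson model a reasonable starting point.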
Overall, our results suggest that song length alone does not meaningfully predict how long a song will remain at #1 on the Billboard charts. Chart success is driven by a combination of musical, cultural, and industry factors that extend far beyond the duration of the track.