Since the late 1950s, the Billboard Hot 100 has been the standard measure of song popularity, based on radio play, weekly sales, and online streams. For many artists, reaching the #1 spot on the chart is the main goal, and staying there for several weeks is an even bigger achievement. Some songs stay on the chart for months, whereas others drop after only a week. This raises a question: what makes certain songs stay at the top longer than others?
In this project we will be exploring one feature - the length of a song - and whether it affects the number of weeks at the top. Our guiding question is: Is there a relationship between the length of a song and the number of weeks it stayed at #1 on the Billboard Hot 100?
We find no evidence that there is a relationship between the length of a song and the number of weeks it stayed at #1 on the Billboard Hot 100.
Raw Data: The dataset used for this project was compiled by Chris Dalla Riva from publicly available Billboard Hot 100 chart archives from 1958 to 2025. The dataset includes every song that reached the #1 spot on the chart during these years.
Weeks at Number One: This represents the total number of weeks a particular song remained at #1 on the Billboard Hot 100 chart.
Length (sec): The duration of the song in seconds.
Rows: Each row in the dataset represents one song that reached number one on the Billboard Hot 100 between 1958 and 2025.
Sample Size: We used a random sample of 100 songs from the full dataset. Each entry includes the song title, artist, date it first reached #1, its length in seconds, and the total number of weeks it stayed at the top. This sample is large enough to support meaningful inference while keeping the analysis manageable.
For this project, we focused on two main variables: the length of each song and the number of weeks it stayed at number one.
We started by preparing the dataset to make sure everything was clean, complete, and ready for analysis. First, we renamed some of the columns to follow a consistent and readable format. For example, “Weeks.at.Number.One” and “Length..Sec.” were renamed to weeks_at_one and length_sec so they would be easier to work with in code.
Next, we removed any rows with missing values in either of the two variables we’re focusing on: song length and the number of weeks a song spent at #1. Since both are required for our regression and graphs, incomplete rows were excluded from the analysis.
After cleaning, we selected a random sample of 100 songs from the dataset. The full dataset spans several decades of Billboard #1 hits, so using a smaller sample makes the analysis easier to manage while still capturing meaningful trends. We also set a fixed random seed (set.seed(240)) to ensure that our results are reproducible.
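The preparation steps above can be sketched in R as follows. This is a minimal sketch, not the original script: the small `raw` data frame holds made-up stand-in values (the real file is not reproduced here), `Song` is a hypothetical column name, and the sample size is reduced to 5 so the toy example runs, whereas the analysis itself drew 100 songs.

```r
# Toy stand-in for the raw dataset (values are made up for illustration).
raw <- data.frame(
  Song = paste("Song", 1:8),
  Weeks.at.Number.One = c(1, 3, NA, 2, 6, 1, 4, 2),
  Length..Sec. = c(150, 210, 195, NA, 245, 180, 230, 205)
)

# Step 1: rename the two columns of interest to readable snake_case names.
names(raw)[names(raw) == "Weeks.at.Number.One"] <- "weeks_at_one"
names(raw)[names(raw) == "Length..Sec."] <- "length_sec"

# Step 2: drop rows that are missing either variable.
clean <- raw[complete.cases(raw[, c("weeks_at_one", "length_sec")]), ]

# Step 3: draw a reproducible random sample (the report used n = 100).
set.seed(240)
n_sample <- 5  # assumption: shrunk to fit the toy data
songs <- clean[sample(nrow(clean), n_sample), ]
```

Setting the seed immediately before `sample()` means that rerunning the script selects the same rows each time.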
The first graph shows how song length relates to the number of weeks a song stayed at number one. Each dot represents a song, plotted by its length in seconds on the x-axis and the number of weeks it stayed at the top on the y-axis. We added a smooth trend line to help visualize the pattern. The plot shows a very weak negative trend: longer songs tend to stay at #1 for slightly fewer weeks, but the points are widely scattered with no clear structure. Songs of similar length can perform very differently on the charts, suggesting that length alone isn’t a strong predictor of chart longevity.
To get a better sense of the sample, we also plotted a histogram of song lengths. Most songs fall between 180 and 270 seconds, or roughly 3 to 4.5 minutes. This clustering is typical for mainstream pop music, and we see only a few songs that are much shorter or longer than that. The shape is fairly symmetric and bell-shaped, with no extreme outliers. This tight range limits how much variation the model can pick up from length alone, which may explain why the regression didn’t show a strong relationship.
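The two plots described above could be produced along the following lines. This is a sketch using base R graphics on simulated data; the original figures may have been drawn with a different package, and the simulated `songs` data frame is an assumption for illustration, not the real sample.

```r
# Simulated stand-in for the 100-song sample (not the real data).
set.seed(240)
songs <- data.frame(length_sec = round(rnorm(100, mean = 225, sd = 40)))
songs$weeks_at_one <- 1 + rpois(100, lambda = 2)

# Scatterplot of weeks at #1 against song length.
plot(songs$length_sec, songs$weeks_at_one,
     xlab = "Song length (seconds)", ylab = "Weeks at #1")

# Smooth trend line (lowess is one common smoother; the report does not
# say which smoother was used).
trend <- lowess(songs$length_sec, songs$weeks_at_one)
lines(trend, col = "blue", lwd = 2)

# Histogram of song lengths.
length_hist <- hist(songs$length_sec, breaks = 15,
                    xlab = "Song length (seconds)",
                    main = "Distribution of song lengths")
```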
Based on these cleaning steps and the structure of the sample, we believe this subset is a reasonable representation of the broader set of Billboard #1 songs. Our visualizations help confirm that the cleaned data makes sense and that the variables behave as expected. From here, we can move forward with our statistical analysis.
In this study, we investigate whether there is a statistically significant linear relationship between the length of a song (in seconds) and the number of weeks the song was ranked #1 on the Billboard Hot 100. To analyze this, we construct a simple linear regression model using a random sample of 100 songs and estimate the regression equation and p-value.
The parameter of interest is the true slope of the population regression line:
\[ \beta_1 = \text{the true change in expected weeks at #1 for each additional second of song length.} \]
We also estimate the true intercept:
\[ \beta_0 = \text{the expected number of weeks at #1 when song length } = 0. \]
(Note: \(\beta_0\) has limited practical use because a 0-second song is not possible.)
We use a simple linear regression t-test on the slope coefficient \(\beta_1\) to test whether song length is a significant predictor of weeks at #1.
Let:
\[ X = \text{Song Length (seconds)}, \qquad Y = \text{Weeks at #1}. \]
We assume:
The relationship between X and Y is linear, as opposed to a curved/non-linear relationship.
The residual errors are normally distributed around 0.
The residual errors have constant variance/standard deviation, which does not change with X.
We evaluated these assumptions using a residual plot and they are reasonably satisfied for this sample.
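A residual check of this kind might look like the sketch below. It runs on simulated stand-in data (an assumption, not the real sample), and the normal Q-Q plot is an extra diagnostic that the report itself does not mention.

```r
# Simulated stand-in for the 100-song sample (not the real data).
set.seed(240)
songs <- data.frame(length_sec = round(rnorm(100, mean = 225, sd = 40)))
songs$weeks_at_one <- 1 + rpois(100, lambda = 2)

fit <- lm(weeks_at_one ~ length_sec, data = songs)

# Residuals vs. fitted values: curvature suggests non-linearity,
# a funnel shape suggests non-constant variance.
plot(fitted(fit), resid(fit),
     xlab = "Fitted weeks at #1", ylab = "Residual")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points far from the line suggest non-normal residuals.
qqnorm(resid(fit))
qqline(resid(fit))
```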
We test whether song length predicts weeks at #1:
\[ H_0 : \beta_1 = 0 \quad \text{(no relationship)} \]
\[ H_a : \beta_1 \neq 0 \quad \text{(there is a linear relationship)} \]
This is a two-sided hypothesis test, because we do not assume in advance whether longer songs stay at #1 for more or fewer weeks.
The test statistic for the slope is:
\[ t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} \]
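where \(\hat{\beta}_1\) is the slope estimated from the sample and \(SE(\hat{\beta}_1)\) is its standard error.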
Under \(H_0\), the test statistic follows a t-distribution with:
\[ df = n - 2 = 98 \]
since \(n = 100\).
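To make the formula concrete, the sketch below computes the slope's t statistic by hand and compares it with the value that `summary()` reports for an `lm` fit. The data are simulated stand-ins (an assumption), so the numbers will not match the report's results; only the mechanics carry over.

```r
# Simulated stand-in for the 100-song sample (not the real data).
set.seed(240)
songs <- data.frame(length_sec = round(rnorm(100, mean = 225, sd = 40)))
songs$weeks_at_one <- 1 + rpois(100, lambda = 2)

fit <- lm(weeks_at_one ~ length_sec, data = songs)
coefs <- coef(summary(fit))  # matrix of estimates, SEs, t values, p-values

# t = estimated slope / standard error of the slope
slope_hat <- coefs["length_sec", "Estimate"]
slope_se  <- coefs["length_sec", "Std. Error"]
t_manual  <- slope_hat / slope_se

# Residual degrees of freedom: n - 2 for simple linear regression.
df_resid <- df.residual(fit)
```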
\[ \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X \]
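where \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are the intercept and slope estimated from our sample. For the slope, the test gives: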
\[ t = -0.602 \]
\[ df = 98 \]
\[ p = 0.54830 \]
Because the p-value is much larger than the 0.05 significance level, we fail to reject the null hypothesis.
There is not enough evidence to conclude that song length predicts the number of weeks at #1.
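The decision can be checked from the reported test statistic alone. The sketch below recomputes the two-sided p-value from t = -0.602 with 98 degrees of freedom. Because the reported t is rounded, the result should land close to, but not exactly on, the reported p-value.

```r
t_stat   <- -0.602  # reported test statistic (rounded)
df_resid <- 98      # n - 2

# Two-sided p-value: probability of a t statistic at least this extreme
# in either tail under the null hypothesis.
p_value <- 2 * pt(-abs(t_stat), df_resid)

# Equivalent decision rule: compare |t| with the critical value at alpha = 0.05.
t_crit <- qt(0.975, df_resid)
reject_null <- abs(t_stat) > t_crit
```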
\[ \text{CI}_{95\%}(\beta_1) = [-0.01471615,\; 0.007862305] \]
Since the confidence interval includes 0, we cannot conclude that there is a non-zero relationship between song length and weeks at #1.
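As a consistency check, the interval can be related back to the test statistic. Assuming it is the standard t-interval \(\hat{\beta}_1 \pm t^{*} \cdot SE(\hat{\beta}_1)\) with 98 degrees of freedom (so \(t^{*} \approx 1.984\)), its midpoint and half-width recover the slope estimate and standard error approximately:

\[ \hat{\beta}_1 \approx \frac{-0.01471615 + 0.007862305}{2} \approx -0.00343 \]
\[ SE(\hat{\beta}_1) \approx \frac{0.007862305 - (-0.01471615)}{2 \times 1.984} \approx 0.00569 \]
\[ t \approx \frac{-0.00343}{0.00569} \approx -0.60 \]

This agrees with the reported \(t = -0.602\). In practical terms, the point estimate corresponds to roughly 0.2 fewer weeks at #1 per additional minute of song length (\(60 \times 0.00343 \approx 0.21\)), an effect too small to distinguish from zero in this sample.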
Based on the simple linear regression analysis, we did not find statistically significant evidence that the length of a song is associated with the number of weeks it stayed at #1 on the Billboard Hot 100. Both the p-value and the 95% confidence interval for the slope suggest that there is no clear linear relationship between the two variables.
Our scatterplot of song length (seconds) versus number of weeks at #1 shows a large amount of variability and only a very weak overall pattern. The regression line has a slightly negative slope, suggesting that longer songs tend to stay at #1 for slightly fewer weeks on average. However, the points are widely scattered around the line, meaning the length of a song is not a strong or reliable predictor of chart longevity. Two songs with nearly identical lengths may have completely different chart runs, indicating that many other factors (artist popularity, release timing, trends, genre, promotion, etc.) likely play a much bigger role.
The histogram of song lengths helps us understand why the relationship appears so weak. Most #1 songs in our sample cluster tightly between 180 and 240 seconds (around 3 - 4 minutes), which is the standard pop song length. The roughly bell-shaped distribution tells us that Billboard #1 songs tend to be fairly similar in length, with only a few very short (<150 seconds) or very long (>300 seconds) songs appearing in the sample. Because the majority of songs fall within a narrow length range, there simply isn’t enough variation to reveal a strong relationship with weeks at #1. Taken together, these plots suggest that while song length does vary somewhat across hits, it does not meaningfully explain why some songs remain at #1 longer than others. The small differences we observe in length are unlikely to drive large differences in chart performance.
Limited predictors: Our model uses only song length, but chart performance is influenced by many other variables: genre, collaborations, artist fame, marketing campaigns, release timing, social media presence, and more. Ignoring these factors makes it difficult to isolate the effect of length.
Sample variation: We used a random sample of 100 songs rather than the full dataset. A different random sample could produce slightly different patterns or slopes.
Historical changes: The dataset spans several decades, during which the music industry changed significantly. Typical song lengths and listener habits have evolved across eras, but our model does not account for decade or release year.
Regression assumptions: A simple linear regression assumes a straight-line relationship and constant spread of data across all x-values. Our scatterplot shows uneven variability and potential nonlinearity, raising concerns about whether these assumptions hold.
Restricted range: Because all songs in this dataset reached #1, the sample is already biased toward highly successful songs. This limited range of chart performance can weaken detected relationships.
Include more variables such as release year, genre, collaboration status, streaming-era indicators, or artist popularity to build a more informative model.
Analyze trends by decade to see whether song length mattered more in certain eras (e.g., vinyl era vs. streaming era).
Use models suited for count outcomes. For example, a negative binomial regression might capture the distribution of weeks at #1 better than linear regression.
Compare #1 hits to lower-ranked songs to explore whether length affects overall chart performance, not just among the most successful songs.
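As one illustration of these directions, a negative binomial model could be fit roughly as sketched below. This is a sketch under several assumptions: the data are simulated, the model uses `glm.nb()` from the MASS package, and the outcome is shifted to weeks beyond the first (since every song in the dataset has at least one week at #1). That shift is one possible modeling choice, not something taken from the analysis above.

```r
library(MASS)  # provides glm.nb()

# Simulated stand-in data (not the real sample).
set.seed(240)
songs <- data.frame(length_sec = round(rnorm(100, mean = 225, sd = 40)))
# Overdispersed counts; weeks at #1 is always at least 1.
songs$weeks_at_one <- 1 + rnbinom(100, size = 1.5, mu = 2)

# Model the weeks beyond the first, so the count outcome starts at 0.
songs$extra_weeks <- songs$weeks_at_one - 1
nb_fit <- glm.nb(extra_weeks ~ length_sec, data = songs)

# Coefficients are on the log scale; exponentiate to get rate ratios.
rate_ratios <- exp(coef(nb_fit))
```

A rate ratio near 1 for `length_sec` would mean that each additional second of length barely changes the expected number of extra weeks at #1.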
Overall, our results suggest that song length alone does not meaningfully predict how long a song will remain at #1 on the Billboard charts. Chart success is driven by a combination of musical, cultural, and industry factors that extend far beyond the duration of the track.