Primary Analysis Objective

  1. Determine whether the number of unique songs reaching #1 on the Billboard Hot 100 chart each year has changed over time.

Secondary Analysis Objective

  1. Determine whether the number of artists with #1 hits on the Billboard Hot 100 chart each year has changed over time.

Background

The Billboard Hot 100 chart ranks each week’s most popular songs across all genres. Rankings combine radio airplay audience impressions and sales data, both measured by Nielsen Music, with streaming activity data from various online music sources. A common argument in music discussion circles is that modern pop music is worse than the music of previous decades because it lacks variety. While this claim is difficult, if not impossible, to assess directly, the number of unique songs and artists reaching #1 on the Hot 100 each year provides a proxy for the relative diversity of popular music over time.

Data Sources

A dataset of weekly Hot 100 chart data is used throughout the statistical analyses. Data from 1958 are excluded because that year is incomplete (the chart began in August 1958), so the analyses cover 1959 through 2015. Because only each week’s #1 song is of interest, only those rows are displayed below.

Weekly #1 Hits Dataset

display_output(ones, out_type)

The weekly data were then aggregated into an annual dataset containing, for each year, the number of unique #1 songs and the number of unique artists with #1 songs. This annual dataset is displayed below; the summary statistics and regression analyses that follow are all computed from it.
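A minimal base R sketch of this aggregation step is shown here. The column names `Year`, `Song`, and `Artist` and the toy rows are assumptions standing in for the document’s `ones` data frame; identifying a song by its title and artist together is one reasonable choice, not necessarily the one used in the original analysis.

```r
# Hypothetical weekly #1 rows; column names and values are illustrative only.
ones_demo <- data.frame(
  Year   = c(1958, 1959, 1959, 1959, 1960, 1960),
  Song   = c("Song Z", "Song A", "Song A", "Song B", "Song C", "Song D"),
  Artist = c("Artist 9", "Artist 1", "Artist 1", "Artist 2", "Artist 3", "Artist 3"),
  stringsAsFactors = FALSE
)

# Exclude the incomplete 1958 chart year.
ones_demo <- ones_demo[ones_demo$Year != 1958, ]

count_unique <- function(x) length(unique(x))

# A song is identified by its title-artist pair, so two different songs that
# share a title are not merged; a song that holds #1 for several weeks is
# counted once per year.
song_id   <- paste(ones_demo$Song, ones_demo$Artist, sep = " | ")
num_songs <- tapply(song_id, ones_demo$Year, count_unique)
artists   <- tapply(ones_demo$Artist, ones_demo$Year, count_unique)

# One row per year, using the column names that appear in the model formulas.
counts_demo <- data.frame(
  Year     = as.numeric(names(num_songs)),
  NumSongs = as.vector(num_songs),
  Artists  = as.vector(artists[names(num_songs)])
)
print(counts_demo)
#   Year NumSongs Artists
# 1 1959        2       2
# 2 1960        2       1
```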

Annual #1 Hits Dataset

display_output(counts, out_type)

Analysis Methods

Assumptions

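All inferences are conducted using \(\alpha = 0.05\) unless stated otherwise. No adjustments for multiplicity are made because this is an exploratory analysis. Discrete variables are summarized with frequencies and proportions. Continuous variables are summarized with the mean, median, standard deviation, coefficient of variation (standard deviation divided by the mean), quantiles (0th, 25th, 50th, 75th, and 100th percentiles), minimum, and maximum, as listed below for each variable: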

Year

mean = 1987
median = 1987
standard deviation = 16.60
coefficient of variation = 0.0084 (not meaningful for calendar year, whose zero point is arbitrary)
quantiles = 1959, 1973, 1987, 2001, 2015
minimum = 1959
maximum = 2015

Unique Songs Reaching #1 for each Year

mean = 19.25
median = 19
standard deviation = 7.23
coefficient of variation = 0.38
quantiles = 8, 14, 19, 24, 36
minimum = 8
maximum = 36

Number of Artists with Songs Reaching #1 for each Year

mean = 17.18
median = 16
standard deviation = 6.26
coefficient of variation = 0.36
quantiles = 7, 13, 16, 21, 35
minimum = 7
maximum = 35
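The statistics above can be computed with base R as in the following sketch. The helper name `summarize_continuous()` is hypothetical, and the quantiles assume R’s default `quantile()` method (type 7). Because Year is simply the integer sequence 1959 through 2015, its values can be checked directly.

```r
# Hypothetical helper returning the summary statistics listed above.
summarize_continuous <- function(x) {
  s <- sd(x)                      # sample standard deviation (n - 1 denominator)
  list(
    mean      = mean(x),
    median    = median(x),
    sd        = s,
    cv        = s / mean(x),      # coefficient of variation
    quantiles = quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1)),  # default type 7
    minimum   = min(x),
    maximum   = max(x)
  )
}

# Year takes every integer value from 1959 to 2015.
year_stats <- summarize_continuous(1959:2015)
round(year_stats$sd, 2)       # 16.6
round(year_stats$cv, 4)       # 0.0084
unname(year_stats$quantiles)  # 1959 1973 1987 2001 2015
```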

Primary Objective Analysis

The primary objective analysis uses simple linear regression (ordinary least squares) to model the annual number of unique songs reaching #1 on the Billboard Hot 100 chart as a linear function of year.
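Written out, the primary model fitted by `lm(NumSongs ~ Year, data = counts)` is the following; the secondary model has the same form with `Artists` as the response. The independent, normally distributed errors are the standard assumption behind the t and F tests reported by `summary()`, not something stated elsewhere in this document. With 57 years (1959 through 2015) and two estimated coefficients, the slope test has 55 residual degrees of freedom.

\[
\text{NumSongs}_i = \beta_0 + \beta_1\,\text{Year}_i + \varepsilon_i, \qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2), \qquad i = 1, \dots, 57
\]

\[
H_0: \beta_1 = 0 \quad \text{vs.} \quad H_1: \beta_1 \neq 0, \qquad t = \frac{\hat\beta_1}{\widehat{\mathrm{SE}}(\hat\beta_1)} \sim t_{55} \ \text{under } H_0
\]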

Secondary Objective Analysis

The secondary objective analysis uses simple linear regression (ordinary least squares) to model the annual number of artists with songs reaching #1 on the Billboard Hot 100 chart as a linear function of year.
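Both fits can be reproduced with `lm()` using the formulas shown in the regression output (`NumSongs ~ Year` and `Artists ~ Year`). Because the real `counts` data are not reproduced in this text, the sketch runs on a small hypothetical data frame and collects the quantities discussed in the conclusions: slope, t statistic, R², adjusted R², and F.

```r
# Hypothetical annual counts standing in for the document's `counts` data frame.
annual_demo <- data.frame(
  Year     = 2000:2009,
  NumSongs = c(30, 28, 29, 25, 26, 22, 23, 20, 21, 18),
  Artists  = c(26, 25, 25, 23, 22, 20, 21, 18, 19, 17)
)

# Same model formulas as in the printed Call lines.
fits <- list(
  NumSongs = lm(NumSongs ~ Year, data = annual_demo),
  Artists  = lm(Artists ~ Year, data = annual_demo)
)

# One row per model: slope, its t statistic, R-squared, adjusted R-squared, F.
trend_table <- t(sapply(fits, function(fit) {
  s <- summary(fit)
  c(slope    = unname(coef(fit)["Year"]),
    t_value  = s$coefficients["Year", "t value"],
    r_sq     = s$r.squared,
    adj_r_sq = s$adj.r.squared,
    F_value  = unname(s$fstatistic["value"]))
}))
print(round(trend_table, 3))
```

With a single predictor, the overall F statistic equals the square of the slope’s t statistic, which is why each printed summary reports the same p-value for both.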

Primary Objective Results

# Regression of the annual number of unique #1 songs on year,
# followed by a scatterplot with the fitted least-squares line.

summary(results1)
## 
## Call:
## lm(formula = NumSongs ~ Year, data = counts)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -10.520  -3.972  -1.399   3.049  14.203 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 464.47889  100.16016   4.637 2.22e-05 ***
## Year         -0.22407    0.05041  -4.445 4.30e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.261 on 55 degrees of freedom
## Multiple R-squared:  0.2643, Adjusted R-squared:  0.2509 
## F-statistic: 19.76 on 1 and 55 DF,  p-value: 4.3e-05
plot_1

Secondary Objective Results

# Regression of the annual number of artists with #1 songs on year,
# followed by a scatterplot with the fitted least-squares line.
summary(results2)
## 
## Call:
## lm(formula = Artists ~ Year, data = counts)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -8.288 -4.063 -1.097  2.990 15.451 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 379.98196   88.36052   4.300 7.03e-05 ***
## Year         -0.18259    0.04447  -4.106 0.000135 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.523 on 55 degrees of freedom
## Multiple R-squared:  0.2346, Adjusted R-squared:  0.2207 
## F-statistic: 16.86 on 1 and 55 DF,  p-value: 0.0001346
plot_2
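As a consistency check, the t statistics, p-values, and F statistics in both tables can be recomputed from the printed estimates and standard errors with base R’s `pt()` and `qt()`. The sketch also adds 95% confidence intervals for the slopes, which `summary()` does not print; `confint()` applied to the fitted objects would give the same intervals up to rounding of the printed values.

```r
# Slope estimates and standard errors copied from the two summary() tables.
slopes <- data.frame(
  model    = c("NumSongs ~ Year", "Artists ~ Year"),
  estimate = c(-0.22407, -0.18259),
  se       = c(0.05041, 0.04447),
  stringsAsFactors = FALSE
)
df_resid <- 55  # 57 years minus 2 estimated coefficients

slopes$t_value  <- slopes$estimate / slopes$se
slopes$p_value  <- 2 * pt(-abs(slopes$t_value), df = df_resid)  # two-sided test
slopes$F_value  <- slopes$t_value^2           # one predictor: F = t^2
t_crit          <- qt(0.975, df = df_resid)   # critical value for a 95% CI
slopes$ci_lower <- slopes$estimate - t_crit * slopes$se
slopes$ci_upper <- slopes$estimate + t_crit * slopes$se
print(slopes, digits = 4)
```

Both intervals lie entirely below zero, which agrees with the significant negative slopes.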

Conclusions and Discussion

Both regression slopes were statistically significant at \(\alpha = 0.05\) (p < 0.001 in each analysis).

On average, the fitted model predicts 0.224 fewer unique songs reaching the #1 spot on the Billboard Hot 100 chart for each additional year. Year accounts for 26.4% of the variance in the annual number of unique #1 songs (R² = 0.264; adjusted R² = 0.251).

On average, the fitted model predicts 0.183 fewer artists with songs reaching the #1 spot on the Billboard Hot 100 chart for each additional year. Year accounts for 23.5% of the variance in the annual number of artists with #1 songs (R² = 0.235; adjusted R² = 0.221).
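To make the slopes concrete, the fitted equations can be evaluated at the first and last years of the data using only the printed coefficients. The sketch also shows how adjusted R² is derived from R² (with n = 57 years and one predictor); the unadjusted R² is the quantity interpreted as the share of variance explained.

```r
# Fitted intercepts and slopes copied from the two regression tables.
coefs <- list(
  NumSongs = c(intercept = 464.47889, slope = -0.22407),
  Artists  = c(intercept = 379.98196, slope = -0.18259)
)
years <- c(1959, 2015)

# Fitted value = intercept + slope * year, for the first and last years.
predicted <- sapply(coefs, function(b) b[["intercept"]] + b[["slope"]] * years)
rownames(predicted) <- years
round(predicted, 1)
#      NumSongs Artists
# 1959     25.5    22.3
# 2015     13.0    12.1

# Adjusted R-squared from R-squared for n observations and p predictors.
adj_r2 <- function(r2, n = 57, p = 1) 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2(c(0.2643, 0.2346)), 4)
# [1] 0.2509 0.2207
```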

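In sum, the annual numbers of unique songs and of artists reaching #1 on the Billboard Hot 100 declined from 1959 to 2015; to the extent that these counts are a valid proxy for diversity, the top of the chart has become less diverse over time.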

Limitations of the study include naming conventions within the dataset: for example, “Artist A” and “Artist A featuring Artist B” are counted as two distinct artists in this analysis. Additionally, this analysis covers only the Hot 100 chart; trends may differ on other Billboard charts or on other music charts. Finally, Billboard’s exact methodology for ranking songs on the Hot 100 is not fully public and has changed over time (streaming data, for instance, could not have contributed to the early charts), so chart positions from different eras were not necessarily determined by the same method.
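The naming limitation can be illustrated with a small hypothetical helper, `lead_artist()`, which is not part of the original analysis: it strips a trailing “featuring” clause so that featured credits collapse to the lead artist, and the artist count changes accordingly. It is a sketch only; real credits also use forms such as “feat.”, “&”, or “with” that would need their own rules.

```r
# Hypothetical helper: drop a trailing " featuring ..." clause (any letter case)
# so that featured credits collapse to the credited lead artist.
lead_artist <- function(artist) {
  trimws(sub("\\s+featuring\\s.*$", "", artist, ignore.case = TRUE))
}

credits <- c("Artist A", "Artist A featuring Artist B", "Artist C Featuring Artist D")
lead_artist(credits)                  # "Artist A" "Artist A" "Artist C"
length(unique(credits))               # 3 distinct credit strings
length(unique(lead_artist(credits)))  # 2 distinct lead artists
```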

Future directions for this research could include similar analyses of Billboard’s genre-specific charts, analyses of individual artists’ careers, and analyses of music charts from other sources (e.g., Spotify, Tidal, or Apple Music).

All of the statistical analyses in this document were performed using R version 3.2.4 Revised (2016-03-16 r70336). R package versions were managed with the packrat dependency management system.

sessionInfo()
## R version 3.2.4 Revised (2016-03-16 r70336)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggplot2_2.0.0 tidyr_0.4.1   dplyr_0.4.3   DT_0.1        knitr_1.13   
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.3      magrittr_1.5     munsell_0.4.3    colorspace_1.2-6
##  [5] R6_2.1.2         stringr_1.0.0    plyr_1.8.3       tools_3.2.4     
##  [9] parallel_3.2.4   grid_3.2.4       gtable_0.1.2     DBI_0.3.1       
## [13] htmltools_0.3    lazyeval_0.1.10  yaml_2.1.13      assertthat_0.1  
## [17] digest_0.6.9     formatR_1.4      htmlwidgets_0.6  evaluate_0.9    
## [21] rmarkdown_1.0    labeling_0.3     stringi_1.0-1    scales_0.3.0    
## [25] jsonlite_0.9.19