Justin Munoz
04/06/17
The Billboard 200 albums chart ranks albums of any genre and is open to all artists. Nielsen's SoundScan system is used to measure album sales, and this information is then used to rank albums on the Billboard 200. Before streaming was included, these album sales were based on digital and physical copies. In recent years, streaming services have become a standard platform, changing the way consumers discover and listen to music. Beginning with the chart dated December 13th, 2014, streaming counts were introduced into the SoundScan system and, consequently, into the Billboard 200 albums chart (Billboard Staff, 2014).
The aim of this assignment is to review whether the genres of chart-topping albums differ before and after the introduction of streaming counts. A Chi-square test of association will be used to assess whether any association between genre and the introduction of streaming counts is statistically significant. After importing the data into R, a series of visual representations and a suitable hypothesis test are used to review this association.
The data were compiled from Wikipedia, which lists the chart-topping album for each week of the Billboard 200 (by year). Because streaming counts have only been included since December 13th, 2014, the data consist of all chart-topping albums in the 2 years after the introduction of streaming counts and all chart-topping albums in the 2 years before it (Billboard Staff, 2014). In total, 136 albums are used in this investigation: 73 before and 63 after. The variables are YEAR (when charted), ALBUM NAME, ARTIST, GENRE, SALES (first-week album sales) and STREAM_introduction (BEFORE or AFTER the introduction of streaming counts). The genre and sales of each album were also collected from Wikipedia, as each album has its own page with relevant information sourced directly from Billboard.com. Because the Billboard 200 albums chart covers all genres and the albums were selected over two periods of equal length, this selection process limits bias in which albums were chosen.
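# magrittr provides %>%; RColorBrewer provides brewer.pal() (used for the bar chart)
library(magrittr)
library(RColorBrewer)
billboard <- read.csv("/Users/Justin/Desktop/billboard.csv")
# BEFORE/AFTER the introduction of streaming counts, in chronological order
billboard$STREAM_introduction <- billboard$STREAM_introduction %>%
  factor(levels = c("BEFORE", "AFTER"), ordered = TRUE)
# Genre is nominal, so the level order only controls the display order of rows
billboard$GENRE <- billboard$GENRE %>%
  factor(levels = c("METAL", "R&B", "COUNTRY", "HIP HOP", "POP", "ROCK"))
# Contingency table of genre by period
table(billboard$GENRE, billboard$STREAM_introduction)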
##
##           BEFORE AFTER
##   METAL        3     3
##   R&B          8    10
##   COUNTRY     12     4
##   HIP HOP     11    18
##   POP         20    11
##   ROCK        19    17
# Share of each genre within each period; the Sum row confirms each column totals 1
table(billboard$GENRE, billboard$STREAM_introduction) %>% prop.table(margin = 2) %>% addmargins(margin = 1) %>% round(3)
##
##           BEFORE AFTER
##   METAL    0.041 0.048
##   R&B      0.110 0.159
##   COUNTRY  0.164 0.063
##   HIP HOP  0.151 0.286
##   POP      0.274 0.175
##   ROCK     0.260 0.270
##   Sum      1.000 1.000
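# Column proportions (named genre_props to avoid masking the base table() function)
genre_props <- table(billboard$GENRE, billboard$STREAM_introduction) %>% prop.table(margin = 2)
# Clustered bars: one cluster per period on the x-axis, one bar (and legend entry) per genre
barplot(genre_props, ylab = "Proportion Within Group", legend.text = TRUE,
        args.legend = list(x = 5, y = 0.4, bty = "n"),
        ylim = c(0, .35), beside = TRUE, xlab = "Introduction of streaming counts",
        col = brewer.pal(6, name = "Blues"))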
The association between genre and the introduction of streaming counts is shown using a clustered bar chart of genre proportions within each period. If there were no association, each genre would make up a similar share of chart-topping albums BEFORE and AFTER the introduction of streaming counts. The bar chart suggests some differences. HIP HOP (15.1% to 28.6%) and R&B (11.0% to 15.9%) account for a larger share of chart-topping albums after the introduction of streaming counts. POP (27.4% to 17.5%) and COUNTRY (16.4% to 6.3%) account for a smaller share. A Chi-square test of association will determine whether this association is statistically significant.
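The shift in genre shares can be quantified directly from the observed counts. The following is a minimal sketch that assumes the counts are typed in from the contingency table rather than read from billboard.csv. The names `counts`, `shares` and `shift` are illustrative only. The sketch computes each genre's share of chart-topping albums within each period, then the change from BEFORE to AFTER in percentage points.

```r
# Observed counts of chart-topping albums, copied from the contingency table
counts <- matrix(
  c(3,  8, 12, 11, 20, 19,   # BEFORE (filled column-wise)
    3, 10,  4, 18, 11, 17),  # AFTER
  ncol = 2,
  dimnames = list(GENRE = c("METAL", "R&B", "COUNTRY", "HIP HOP", "POP", "ROCK"),
                  PERIOD = c("BEFORE", "AFTER"))
)

# Share of each genre within each period (each column sums to 1)
shares <- prop.table(counts, margin = 2)

# Change in share in percentage points (positive = larger share after streaming)
shift <- round(100 * (shares[, "AFTER"] - shares[, "BEFORE"]), 1)
shift[order(shift)]
```

With these counts, HIP HOP's share rises by about 13.5 percentage points and R&B's by about 4.9. COUNTRY's share falls by about 10.1 and POP's by about 9.9. METAL and ROCK change by about 1 point or less. Whether shifts of this size could plausibly arise by chance is what the Chi-square test addresses.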
The statistical hypotheses for this Chi-square test of association can be written as follows:
H0: There is no association between genre and the introduction of streaming counts.
HA: There is an association between genre and the introduction of streaming counts.
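Assumptions:
The observations (albums) are independent of one another.
No more than 25% of the expected cell counts are below 5, and no expected count is below 1.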
Decision Rules:
Reject H0 if p-value < 0.05 (significance level α = 0.05).
Otherwise, fail to reject H0.
chi2 <- chisq.test(table(billboard$STREAM_introduction,billboard$GENRE))
chi2
##
## Pearson's Chi-squared test
##
## data: table(billboard$STREAM_introduction, billboard$GENRE)
## X-squared = 7.9435, df = 5, p-value = 0.1594
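For reference, the statistic reported by chisq.test() is Pearson's statistic. No continuity correction is applied, because R only uses that correction for 2 × 2 tables. In the expressions below, O_ij and E_ij are the observed and expected counts for row i (period) and column j (genre). R_i and C_j are the row and column totals, and N = 136. The table has r = 2 rows and c = 6 columns, which gives the 5 degrees of freedom shown in the output.

```latex
\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij}-E_{ij})^2}{E_{ij}},
\qquad E_{ij} = \frac{R_i\,C_j}{N},
\qquad \mathrm{df} = (r-1)(c-1) = (2-1)(6-1) = 5

% Example: expected count for METAL, BEFORE
E_{11} = \frac{73 \times 6}{136} \approx 3.22
```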
The observed counts were:
chi2$observed
##
##          METAL R&B COUNTRY HIP HOP POP ROCK
##   BEFORE     3   8      12      11  20   19
##   AFTER      3  10       4      18  11   17
The expected counts under H0 were:
chi2$expected %>% round(2)
##
##          METAL  R&B COUNTRY HIP HOP   POP  ROCK
##   BEFORE  3.22 9.66    8.59   15.57 16.64 19.32
##   AFTER   2.78 8.34    7.41   13.43 14.36 16.68
Two of the twelve expected counts are below 5, both in the METAL column (3.22 and 2.78). This is less than 25% of the cells, and no expected count is below 1, so this assumption is met. The Chi-square test performed above gives a Chi-square statistic of χ² = 7.94 with df = 5 and p-value = 0.159. As this p-value is greater than the 0.05 level of significance, we fail to reject H0.
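The assumption check and the decision rule can be reproduced by hand as a cross-check on chisq.test(). The following is a minimal sketch that assumes the observed counts are typed in from chi2$observed rather than recomputed from billboard.csv. The variable names are illustrative only. The sketch builds the expected counts as (row total × column total) / N and measures the share of expected counts below 5. It then computes Pearson's statistic and its upper-tail p-value with pchisq().

```r
# Observed counts (rows = period, columns = genre), copied from chi2$observed
observed <- matrix(
  c(3,  8, 12, 11, 20, 19,   # BEFORE
    3, 10,  4, 18, 11, 17),  # AFTER
  nrow = 2, byrow = TRUE,
  dimnames = list(c("BEFORE", "AFTER"),
                  c("METAL", "R&B", "COUNTRY", "HIP HOP", "POP", "ROCK"))
)

# Expected counts under H0 (independence): row total x column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Assumption check: share of cells with an expected count below 5, and the smallest one
share_below_5 <- mean(expected < 5)
min_expected <- min(expected)

# Pearson's chi-square statistic, degrees of freedom and upper-tail p-value
x2 <- sum((observed - expected)^2 / expected)
dof <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(x2, df = dof, lower.tail = FALSE)

# Decision rule at the 0.05 significance level
alpha <- 0.05
decision <- if (p_value < alpha) "reject H0" else "fail to reject H0"

round(c(share_below_5 = share_below_5, min_expected = min_expected,
        x2 = x2, df = dof, p_value = p_value), 4)
decision
```

With these counts, 2 of the 12 cells (about 16.7%) have an expected count below 5, and the smallest expected count is about 2.78. The sketch gives χ² ≈ 7.94 on 5 degrees of freedom and p ≈ 0.159, matching the chisq.test() output, so the decision is to fail to reject H0.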
A Chi-square test of association was used to test for a statistically significant association between genre and the introduction of streaming counts. The investigation did not find a statistically significant association, χ²(5) = 7.94, p = 0.159. The differences in genre proportions seen in the bar chart may reflect natural sampling variability, assuming genre and the introduction of streaming counts are independent. One limitation of this study is that only chart-topping albums were included, so the results mainly reflect popular music. A large random sample of albums from before and after the introduction of streaming counts may produce different results.
Billboard Staff (2014), Billboard 200 Makeover: Album Chart to Incorporate Streams & Track Sales, [online] Billboard [Accessed 4 Jun. 2017].