Week 7 | Data Dive — Hypothesis Testing

Loading my Billboard Hot 100 Number Ones Dataset

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(dplyr)

tuesdata <- tidytuesdayR::tt_load(2025, week = 34)

## ---- Compiling #TidyTuesday Information for 2025-08-26 ----
## --- There are 2 files available ---
## 
## 
## ── Downloading files ───────────────────────────────────────────────────────────
## 
##   1 of 2: "billboard.csv"
##   2 of 2: "topics.csv"

billboard <- tuesdata$billboard
topics <- tuesdata$topics

Head of the Dataset

head(billboard)

## # A tibble: 6 × 105
##   song   artist date                weeks_at_number_one non_consecutive rating_1
##   <chr>  <chr>  <dttm>                            <dbl>           <dbl>    <dbl>
## 1 Poor … Ricky… 1958-08-04 00:00:00                   2               0        4
## 2 Nel B… Domen… 1958-08-18 00:00:00                   5               1        7
## 3 Littl… The E… 1958-08-25 00:00:00                   1               0        5
## 4 It's … Tommy… 1958-09-29 00:00:00                   6               0        3
## 5 It's … Conwa… 1958-11-10 00:00:00                   2               1        7
## 6 Tom D… The K… 1958-11-17 00:00:00                   1               0        5
## # ℹ 99 more variables: rating_2 <dbl>, rating_3 <dbl>, overall_rating <dbl>,
## #   divisiveness <dbl>, label <chr>, parent_label <chr>, cdr_genre <chr>,
## #   cdr_style <chr>, discogs_genre <chr>, discogs_style <chr>,
## #   artist_structure <dbl>, featured_artists <chr>,
## #   multiple_lead_vocalists <dbl>, group_named_after_non_lead_singer <dbl>,
## #   talent_contestant <chr>, posthumous <dbl>, artist_place_of_origin <chr>,
## #   front_person_age <dbl>, artist_male <dbl>, artist_white <dbl>, …

head(topics)

## # A tibble: 6 × 1
##   lyrical_topics   
##   <chr>            
## 1 Addiction        
## 2 Anger            
## 3 Appreciation     
## 4 Badassery        
## 5 Bad Behavior     
## 6 Bad Relationships

Cleaning the cdr_genre column and creating primary_genre

billboard <- billboard |>
  mutate(
    primary_genre = str_split_i(cdr_genre, ";", 1)
  )

billboard |>
  select(cdr_genre, primary_genre) |>
  distinct()

## # A tibble: 33 × 2
##    cdr_genre          primary_genre
##    <chr>              <chr>        
##  1 Pop;Rock           Pop          
##  2 Pop                Pop          
##  3 Rock               Rock         
##  4 Folk/Country       Folk/Country 
##  5 Folk/Country;March Folk/Country 
##  6 Pop;Folk/Country   Pop          
##  7 Jazz               Jazz         
##  8 Funk/Soul;Rock     Funk/Soul    
##  9 Polka              Polka        
## 10 Funk/Soul          Funk/Soul    
## # ℹ 23 more rows
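
As a quick cross-check on the split above, the same "keep everything before the first semicolon" rule can be sketched in base R. The toy vector below is made up from genre strings visible in the output, plus one NA; it is an illustration rather than part of the original analysis, and it assumes the first listed genre is the primary one.

```r
# Toy genre strings copied from the output above, plus an NA to show missing-value behaviour
genres <- c("Pop;Rock", "Pop", "Folk/Country;March", "Funk/Soul;Rock", NA)

# Remove the first semicolon and everything after it;
# strings without a semicolon are returned unchanged, and NA stays NA
primary <- sub(";.*$", "", genres)

print(data.frame(cdr_genre = genres, primary_genre = primary))
```

If the two approaches ever disagree on the real column, that would point to stray whitespace or a different separator in cdr_genre.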

Converting ‘date’ to ‘year’ for Future Analysis

billboard <- billboard |>
  mutate(
    date = as_date(date),  # date is already a date-time, so just drop the time component
    year = year(date)
  )

billboard |>
  count(year)

## # A tibble: 68 × 2
##     year     n
##    <dbl> <int>
##  1  1958     8
##  2  1959    15
##  3  1960    19
##  4  1961    21
##  5  1962    19
##  6  1963    20
##  7  1964    23
##  8  1965    25
##  9  1966    27
## 10  1967    18
## # ℹ 58 more rows

Main Variables and Ways to Define Groups A & B for my Null Hypotheses

Null Hypothesis 1: Neyman-Pearson Framework

Research Question: Do #1 songs released in 2010 or later spend more weeks at number one on the Billboard Hot 100 than #1 songs released before 2010?

Null and Alternative Hypotheses:

Null: Mean weeks at number one is equal for pre-2010 and post-2010 songs
Alternative: Mean weeks at number one differs between the eras (tested two-sided; the research question anticipates a higher mean for post-2010 songs)

Main Variable (continuous):

weeks_at_number_one

Way to Define Groups A & B:

Group A: songs released before 2010
Group B: songs released in 2010 or later

Establishing the Eras (pre and post 2010)

billboard <- billboard |>
  mutate(era = if_else(year < 2010, "pre_2010", "post_2010"))

Neyman-Pearson Design Choices and Explanation

alpha <- 0.05

power <- 0.80

effect_size <- 0.3

library(pwr)

pwr.t.test(d = effect_size, sig.level = alpha, power = power, type = "two.sample")

## 
##      Two-sample t test power calculation 
## 
##               n = 175.3847
##               d = 0.3
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

billboard |> count(era)

## # A tibble: 2 × 2
##   era           n
##   <chr>     <int>
## 1 post_2010   198
## 2 pre_2010    979

Justification for Alpha, Power, Effect Size, and Why the Study is Sufficient to Continue the Hypothesis Test

I chose an alpha of 0.05 to balance the risk of a false positive against the goal of identifying meaningful differences in chart longevity. Incorrectly concluding that the two eras (pre and post 2010) differ in weeks at number one would be undesirable but would not carry extreme consequences, so the conventional 0.05 level is appropriate here. I set power to 0.80 so that the test has a high probability of detecting a true difference if one exists. Given the cultural and industry relevance of chart performance, failing to detect a real difference between eras (a Type II error) would limit how useful this analysis is, so adequate power was a priority. For the effect size, I used a Cohen’s d of 0.3, a small-to-medium effect, because even modest differences in weeks at number one may be meaningful in the music industry, where small changes in exposure can have significant cumulative effects. The power analysis above indicates that each group needs at least 176 observations (175.38 rounded up) to achieve the desired power. The count by era shows 198 post_2010 songs and 979 pre_2010 songs, so both groups meet the requirement and the sample is sufficient to conduct the planned hypothesis test.
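
The power calculation above assumes two groups of equal size, while the actual groups are 198 and 979. A minimal base-R sketch of the power for unequal group sizes follows, using the noncentral t distribution and assuming equal variances. The helper name `achieved_power` is hypothetical and introduced only for this illustration.

```r
# Power of a two-sided, two-sample t-test with possibly unequal group sizes,
# computed from the noncentral t distribution.
# Assumes equal variances; d is the standardized effect size (Cohen's d).
achieved_power <- function(n1, n2, d, alpha = 0.05) {
  df <- n1 + n2 - 2
  ncp <- d * sqrt(n1 * n2 / (n1 + n2))  # noncentrality parameter
  t_crit <- qt(1 - alpha / 2, df)       # two-sided critical value
  # Probability of landing in either rejection region under the alternative
  pt(t_crit, df, ncp = ncp, lower.tail = FALSE) + pt(-t_crit, df, ncp = ncp)
}

# Power at the planned design: 176 songs per group, d = 0.3
power_planned <- achieved_power(176, 176, d = 0.3)

# Power at the observed group sizes for the same effect size
power_actual <- achieved_power(198, 979, d = 0.3)

# Smallest effect size detectable with 0.80 power at the observed group sizes
smallest_d <- uniroot(
  function(d) achieved_power(198, 979, d) - 0.80,
  interval = c(0.01, 1)
)$root

print(c(planned = power_planned, actual = power_actual, smallest_d = smallest_d))
```

Because the pre_2010 group is much larger than the minimum, the power at the observed sizes should come out above the 0.80 target, which supports the conclusion that the sample is sufficient.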

T-test for ‘weeks_at_number_one’ and ‘era’ —> post_2010 & pre_2010

t.test(weeks_at_number_one ~ era, data = billboard)

## 
##  Welch Two Sample t-test
## 
## data:  weeks_at_number_one by era
## t = 4.2554, df = 227.29, p-value = 3.052e-05
## alternative hypothesis: true difference in means between group post_2010 and group pre_2010 is not equal to 0
## 95 percent confidence interval:
##  0.6366849 1.7347825
## sample estimates:
## mean in group post_2010  mean in group pre_2010 
##                3.924242                2.738509

Conclusion for Neyman-Pearson Null Hypothesis

Using a Welch two-sample t-test with an alpha of 0.05, we reject the null hypothesis that songs released before 2010 and songs released in 2010 or later spend the same average number of weeks at number one on the Billboard Hot 100. The p-value of 3.052 x 10^-5 is far below the chosen significance level. Songs released in 2010 or later spent more weeks at number one on average (3.92) than songs released before 2010 (2.74). The estimated difference in means is about 1.19 weeks, with a 95% confidence interval from 0.64 to 1.73 weeks, so the data are consistent with a true mean difference anywhere in that range. Because both groups exceed the 176 observations needed for 0.80 power to detect an effect size of 0.3, the study was adequately powered as designed, and the observed difference is unlikely to be explained by sampling variability alone.
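
The figures quoted in this conclusion can be rebuilt from the printed t-test output alone. The sketch below uses only the reported means, t statistic, and degrees of freedom. It also shows the one-sided p-value that would correspond to the directional research question; that is an extra illustration, since the test actually run was two-sided.

```r
# Values copied from the Welch t-test output above
mean_post <- 3.924242
mean_pre  <- 2.738509
t_stat    <- 4.2554
df        <- 227.29

# Difference in means and the standard error implied by t = diff / se
diff_means <- mean_post - mean_pre
se_diff    <- diff_means / t_stat

# 95% confidence interval: estimate plus or minus the critical t value times the standard error
ci <- diff_means + c(-1, 1) * qt(0.975, df) * se_diff

# Two-sided p-value (as reported) and the one-sided version (post_2010 > pre_2010)
p_two_sided <- 2 * pt(-abs(t_stat), df)
p_one_sided <- pt(t_stat, df, lower.tail = FALSE)

print(round(c(diff = diff_means, lower = ci[1], upper = ci[2]), 2))
print(c(two_sided = p_two_sided, one_sided = p_one_sided))
```

Since the observed difference is in the anticipated direction, the one-sided p-value is half the two-sided one, so the conclusion does not depend on which alternative is used.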

Null Hypothesis 2: Fisher’s Significance Testing Framework

Research Question: Are #1 songs released in 2010 or later more likely to involve featured artists (collaborations) than #1 songs released before 2010?

Null and Alternative Hypotheses:

Null: The presence of featured artists is independent of era (pre vs. post 2010)
Alternative: The presence of featured artists differs by era

Main Variable (binary):

collab: ‘Has Feature’ or ‘No Feature’ (created from ‘featured_artists’, where NA means no featured artist and any other value means a featured artist)

Way to Define Groups A & B:

Group A: songs released before 2010
Group B: songs released in or after 2010

billboard <- billboard |>
  mutate(
    collab = if_else(is.na(featured_artists), "No Feature", "Has Feature")
  )

tab <- table(billboard$era, billboard$collab)
tab

##            
##             Has Feature No Feature
##   post_2010          68        130
##   pre_2010          121        858

fisher.test(tab)

## 
##  Fisher's Exact Test for Count Data
## 
## data:  tab
## p-value = 1.708e-12
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  2.566209 5.327473
## sample estimates:
## odds ratio 
##   3.703794

Conclusion for Fisher’s Significance Testing Hypothesis

To determine whether collaborations among Billboard Hot 100 #1 songs differ by era, I conducted a Fisher’s Exact Test comparing songs released before 2010 to those released in 2010 or later. To reiterate, the null hypothesis states that collaboration status, meaning the presence of a featured artist, is independent of era. The resulting p-value, 1.708 x 10^-12, provides very strong evidence against the null hypothesis, so we reject it and conclude that collaboration rates differ between eras. In the table above, 68 of 198 post-2010 songs (about 34%) have a featured artist, compared with 121 of 979 pre-2010 songs (about 12%). The estimated odds ratio of roughly 3.7 means that the odds of a #1 song including a featured artist are about 3.7 times as high for songs released in 2010 or later as for songs released before 2010. The 95% confidence interval for the odds ratio, 2.57 to 5.33, does not include 1, which further supports this conclusion. Because the analysis covers a large number of songs across multiple decades, we can be confident that the difference reflects a meaningful shift in collaboration patterns.
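
An odds ratio is not the same as a ratio of proportions, so it can help to compute both from the contingency table. The sketch below rebuilds the table from the counts printed above and is an illustration, not part of the original analysis. Note that `fisher.test()` reports a conditional maximum likelihood estimate of the odds ratio, which can differ slightly from the simple cross-product ratio.

```r
# Contingency table rebuilt from the counts printed above
tab <- matrix(
  c(68, 121, 130, 858),
  nrow = 2,
  dimnames = list(
    era    = c("post_2010", "pre_2010"),
    collab = c("Has Feature", "No Feature")
  )
)

# Share of songs with a featured artist in each era
prop_feature <- tab[, "Has Feature"] / rowSums(tab)

# Odds of a feature in each era, and the cross-product (sample) odds ratio
odds      <- tab[, "Has Feature"] / tab[, "No Feature"]
sample_or <- unname(odds["post_2010"] / odds["pre_2010"])

# Ratio of proportions: how many times as likely a feature is post-2010
risk_ratio <- unname(prop_feature["post_2010"] / prop_feature["pre_2010"])

# Fisher's exact test on the same table
ft <- fisher.test(tab)

print(round(prop_feature, 3))
print(round(c(sample_or = sample_or, risk_ratio = risk_ratio,
              fisher_or = unname(ft$estimate)), 2))
```

The ratio of proportions is the quantity that answers "how many times more likely", while the odds ratio compares odds, so the two should be reported with different wording.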

Building Two Visualizations that Best Illustrate the Results

Null Hypothesis 1: Neyman-Pearson Framework Visualization

library(ggplot2)

ggplot(billboard, aes(x = era, y = weeks_at_number_one, fill = era)) +
  geom_boxplot(alpha = 0.6, outlier.alpha = 0.3) +
  stat_summary(fun = mean, geom = "point", shape = 23, size = 3, fill = "white") +
  labs(
    title = "Weeks at Number One by Era: Post-2010 vs. Pre-2010",
    x = "Era",
    y = "Weeks at Number One"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

Null Hypothesis 2: Fisher’s Significance Testing Framework Visualization

billboard <- billboard |>
  mutate(collab = if_else(is.na(featured_artists), "No Feature", "Has Feature"))

prop_data <- billboard |>
  group_by(era, collab) |>
  summarise(n = n(), .groups = "drop") |>
  group_by(era) |>
  mutate(prop = n / sum(n))

ggplot(prop_data, aes(x = era, y = prop, fill = collab)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  labs(
    title = "Share of Number One Songs with Featured Artists by Era",
    x = "Era",
    y = "Percentage of Songs",
    fill = "Feature Status"
  ) +
  theme_minimal()

Week7DataDive

Grant Starnes

2026-02-24
