I made this notebook in the fall of 2023 to get data points for my Data Dives podcast episode about Taylor Swift’s success. It analyzes data from the Billboard Hot 100.

2024 UPDATE: You can listen to the episode here.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.8     ✔ dplyr   1.1.0
## ✔ tidyr   1.2.0     ✔ stringr 1.4.1
## ✔ readr   2.1.2     ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(janitor)
## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(lubridate)
## 
## Attaching package: 'lubridate'
## 
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

Import the data

hot100 <- read_csv("data-raw/hot-100-current.csv")
## Rows: 340000 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): title, performer
## dbl  (4): current_week, last_week, peak_pos, wks_on_chart
## date (1): chart_week
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

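The console message above suggests declaring the column types. This is a minimal sketch, assuming the seven columns listed in the spec above. The temp file and its two rows are hypothetical stand-ins for data-raw/hot-100-current.csv.

```r
library(readr)

# Hypothetical two-row sample written to a temp file so the example runs
# without the real data-raw/hot-100-current.csv
sample_path <- tempfile(fileext = ".csv")
writeLines(c(
  "chart_week,current_week,title,performer,last_week,peak_pos,wks_on_chart",
  "2023-09-30,3,Song A,Artist A,3,3,20",
  "2023-09-30,50,Song B,Artist B,,50,1"
), sample_path)

# Declaring the types up front skips readr's guessing and silences the message
hot100_types <- cols(
  chart_week   = col_date(),
  current_week = col_double(),
  title        = col_character(),
  performer    = col_character(),
  last_week    = col_double(),
  peak_pos     = col_double(),
  wks_on_chart = col_double()
)

# An empty last_week field (a new entry) is read as NA
sample_data <- read_csv(sample_path, col_types = hot100_types)
sample_data
```

With the real file, passing `col_types = hot100_types` to the `read_csv()` call above should give the same columns without the message.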
Get a summary to see the date range.

I was making this podcast in September 2023, and I wanted to see the exact date the data was last updated so I could tell my listeners and make sure I was using the most up-to-date data available at the time.

summary(hot100)
##    chart_week          current_week      title            performer        
##  Min.   :1958-08-04   Min.   :  1.0   Length:340000      Length:340000     
##  1st Qu.:1974-11-21   1st Qu.: 26.0   Class :character   Class :character  
##  Median :1991-03-05   Median : 51.0   Mode  :character   Mode  :character  
##  Mean   :1991-03-05   Mean   : 50.5                                        
##  3rd Qu.:2007-06-17   3rd Qu.: 75.0                                        
##  Max.   :2023-09-30   Max.   :100.0                                        
##                                                                            
##    last_week         peak_pos       wks_on_chart   
##  Min.   :  0.00   Min.   :  1.00   Min.   : 1.000  
##  1st Qu.: 23.00   1st Qu.: 13.00   1st Qu.: 4.000  
##  Median : 47.00   Median : 38.00   Median : 7.000  
##  Mean   : 47.34   Mean   : 40.74   Mean   : 9.274  
##  3rd Qu.: 71.00   3rd Qu.: 65.00   3rd Qu.:13.000  
##  Max.   :100.00   Max.   :100.00   Max.   :91.000  
##  NA's   :32460

The data ranges from August 4, 1958, to September 30, 2023.
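
`summary()` shows the range among many other statistics. To get just the first and last chart weeks, `min()` and `max()` inside `summarise()` work too. A sketch on a hypothetical three-row stand-in for `hot100`:

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for hot100 with only the chart_week column
toy_weeks <- tibble(
  chart_week = as.Date(c("1958-08-04", "1990-01-06", "2023-09-30"))
)

# min() and max() return the earliest and latest chart weeks
date_range <- toy_weeks |>
  summarise(first_week = min(chart_week), latest_week = max(chart_week))
date_range
```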

Filter to see just Taylor Swift

taylor <- hot100 |> filter(performer == "Taylor Swift")
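
The exact match above keeps only rows whose performer string is exactly "Taylor Swift". If collaborations are credited in a combined string such as "Taylor Swift Featuring ..." (an assumption about how this dataset formats credits), those rows are dropped, and every count below covers solo credits only. A sketch of a broader filter, on hypothetical performer strings:

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical performer strings; the real credit format may differ
toy_chart <- tibble(
  title = c("Song A", "Song B", "Song C", "Song D"),
  performer = c("Taylor Swift", "Taylor Swift Featuring Artist X",
                "Artist Y Featuring Taylor Swift", "Artist Z")
)

# Exact match keeps only solo credits
solo <- toy_chart |> filter(performer == "Taylor Swift")

# str_detect() with fixed() matches the name anywhere in the credit string
any_credit <- toy_chart |> filter(str_detect(performer, fixed("Taylor Swift")))
```

Which filter is right depends on whether the episode counts featured appearances as Taylor's songs.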

Which artists have had the most appearances overall?

hot100 |>
  group_by(performer) |>
  summarise(appearances = n()) |>
  arrange(desc(appearances))

Taylor Swift has the most appearances (weekly chart entries), followed by Elton John and Madonna.
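
Each row in the data is one song in one chart week, so "appearances" counts chart-weeks, not songs. dplyr's `count()` is a shorthand for the same group, count, and sort pipeline. A sketch on hypothetical rows:

```r
library(dplyr)
library(tibble)

# Hypothetical weekly rows: one row per song per chart week
toy_chart <- tibble(
  performer = c("Artist A", "Artist A", "Artist A", "Artist B", "Artist B", "Artist C")
)

# count(sort = TRUE) replaces group_by() + summarise(n()) + arrange(desc())
appearances <- toy_chart |> count(performer, name = "appearances", sort = TRUE)
appearances
```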

How many songs has Taylor had in the top 10?

top10 <- taylor |> 
  filter(peak_pos <= 10)

top10 |> 
  group_by(title) |> 
  summarise(appearances = n()) |> 
  arrange(desc(appearances))

Taylor has had 37 songs in the top 10.
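
The answer above is the number of rows in the summary table. `n_distinct()` returns that count directly, counting each title once however many weeks it charted. A sketch on hypothetical rows for one artist:

```r
library(dplyr)
library(tibble)

# Hypothetical rows: peak_pos is the best position the song has reached so far
toy_artist <- tibble(
  title    = c("Song A", "Song A", "Song B", "Song C", "Song C"),
  peak_pos = c(12, 4, 30, 9, 9)
)

# Keep weeks where the song had peaked at 10 or better, then count titles once
n_top10 <- toy_artist |>
  filter(peak_pos <= 10) |>
  summarise(songs = n_distinct(title)) |>
  pull(songs)
n_top10
```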

What was her most recent song in the top 10?

top10 |> 
  arrange(desc(chart_week))

Cruel Summer was the most recent top 10 hit.
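
Sorting puts the newest rows first, but several top 10 songs can share the latest week. `slice_max()` keeps every row tied for the latest chart week. A sketch on hypothetical rows:

```r
library(dplyr)
library(tibble)

# Hypothetical top 10 rows for one artist
toy_top10 <- tibble(
  title      = c("Song A", "Song B", "Song B", "Song C"),
  chart_week = as.Date(c("2023-01-07", "2023-05-06", "2023-09-30", "2023-09-30"))
)

# slice_max() keeps all rows tied for the most recent chart week
latest <- toy_top10 |> slice_max(chart_week, n = 1)
latest
```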

Which top 10 hit made Taylor surpass Madonna?

For each of Taylor’s top 10 hits, use slice_min() to keep the first week its peak position was in the top 10, then arrange by chart week so that the 37th row is her 37th top 10 hit.

top10 |> 
  select(title, chart_week) |> 
  group_by(title) |> 
  slice_min(chart_week) |> 
  arrange(chart_week)

I Can See You (Taylor’s Version) (From The Vault) was her 37th top 10 hit, putting her one ahead of Madonna’s 36.
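
Instead of counting down to row 37 by eye, you can number the hits and filter. This is a sketch on hypothetical rows, where `hit_number` is an illustrative column name. Note the `ungroup()`: without it, `row_number()` would restart at 1 inside every title group.

```r
library(dplyr)
library(tibble)

# Hypothetical top 10 rows: three songs, some on the list for several weeks
toy_top10 <- tibble(
  title      = c("Song A", "Song A", "Song B", "Song C", "Song C"),
  chart_week = as.Date(c("2020-01-04", "2020-01-11", "2021-03-06",
                         "2019-06-01", "2019-06-08"))
)

# First week per song, then number the songs in chronological order
hit_order <- toy_top10 |>
  group_by(title) |>
  slice_min(chart_week, n = 1) |>
  ungroup() |>
  arrange(chart_week) |>
  mutate(hit_number = row_number())

# The nth hit; the notebook's question would use hit_number == 37
nth_hit <- hit_order |> filter(hit_number == 2) |> pull(title)
nth_hit
```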

How many #1 hits has Taylor had?

taylor |> 
  filter(peak_pos == 1) |> 
  group_by(title) |> 
  summarise(appearances = n()) |> 
  arrange(desc(appearances))

Taylor has had 8 songs reach number 1.

How many songs in the top 10 has Madonna had?

madonna10 <- hot100 |> 
  filter(performer == "Madonna", peak_pos <= 10) |> 
  group_by(title) |> 
  summarise(appearances = n()) |> 
  arrange(desc(appearances))

madonna10

Madonna has had 36 songs in the top 10.
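
The Taylor and Madonna counts repeat the same filter-and-count pipeline. A hypothetical helper, `count_top_songs()` (not part of the original notebook), wraps it so one call answers the top 10 or #1 question for any performer. The toy data below is an assumption.

```r
library(dplyr)
library(tibble)

# Hypothetical helper: distinct titles by one performer whose peak
# position reached `cutoff` or better
count_top_songs <- function(chart, artist, cutoff = 10) {
  chart |>
    filter(performer == artist, peak_pos <= cutoff) |>
    summarise(songs = n_distinct(title)) |>
    pull(songs)
}

# Hypothetical rows
toy_chart <- tibble(
  performer = c("Artist A", "Artist A", "Artist A", "Artist B"),
  title     = c("Song 1", "Song 1", "Song 2", "Song 3"),
  peak_pos  = c(5, 1, 7, 8)
)

count_top_songs(toy_chart, "Artist A")              # songs in the top 10
count_top_songs(toy_chart, "Artist A", cutoff = 1)  # songs that hit number 1
```

On the real data, `count_top_songs(hot100, "Madonna")` would repeat the pipeline above under the same exact-match assumption about performer names.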

What years were these songs in the top 10?

I will use this statistic when I discuss Madonna being known as the “queen of pop” and whether Taylor is the new holder of that title.

hot100 |> 
  filter(performer == "Madonna", peak_pos <= 10) 

Madonna had songs in the top 10 from 1984 to 2006.
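
lubridate is loaded at the top but not used yet. Its `year()` function can turn the rows into a year span directly. One caveat: `peak_pos <= 10` also keeps weeks after a song dropped out of the top 10 while it stayed on the chart. Filtering on `current_week <= 10` keeps only weeks actually spent in the top 10. A sketch on hypothetical rows:

```r
library(dplyr)
library(lubridate)
library(tibble)

# Hypothetical rows for one performer
toy_rows <- tibble(
  title        = c("Song A", "Song B", "Song C"),
  current_week = c(3, 12, 8),
  chart_week   = as.Date(c("1990-05-05", "1996-01-06", "2001-09-15"))
)

# Keep weeks actually in the top 10, then take the first and last year
year_span <- toy_rows |>
  filter(current_week <= 10) |>
  summarise(first_year = min(year(chart_week)), last_year = max(year(chart_week)))
year_span
```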

How many unique songs has Taylor had on the chart overall?

taylor |>
  group_by(title) |> 
  summarise(appearances = n()) |> 
  arrange(desc(appearances))

Taylor has had 185 songs on the chart overall.

What was her most recent song on the chart?

taylor |>
  select(title, chart_week) |> 
  group_by(title) |> 
  slice_min(chart_week) |> 
  arrange(chart_week)

Taylor’s most recent song to debut on the chart was “When Emma Falls In Love.”
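
The pipeline above lists every song’s debut week in order, so the answer is the last row. Taking each title’s earliest week with `summarise()` and then `slice_max()` returns just that song. A sketch on hypothetical rows:

```r
library(dplyr)
library(tibble)

# Hypothetical weekly rows for one artist
toy_artist <- tibble(
  title      = c("Song A", "Song A", "Song B", "Song C"),
  chart_week = as.Date(c("2023-01-07", "2023-01-14", "2023-06-03", "2023-08-19"))
)

# Debut week per title, then the title with the latest debut
latest_debut <- toy_artist |>
  group_by(title) |>
  summarise(debut = min(chart_week)) |>
  slice_max(debut, n = 1)
latest_debut
```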

How does Taylor compare to other artists when it comes to the number of songs on the chart?

hot100 |> 
  distinct(title, performer) |> 
  group_by(performer) |> 
  summarise(songs = n()) |> 
  arrange(desc(songs))

Taylor edges out Glee Cast by just two songs. Drake, who ranked first in this category in 2022, comes in third with 125 songs on the chart.
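
To get each performer’s place as a number, add a rank column. `min_rank()` gives tied performers the same rank. This is a sketch on hypothetical rows; on the real data, a `filter(performer == "Taylor Swift")` at the end would pull out her row.

```r
library(dplyr)
library(tibble)

# Hypothetical weekly rows; S1 appears twice but is one song
toy_chart <- tibble(
  title     = c("S1", "S1", "S2", "S3", "S4", "S5", "S6"),
  performer = c("Artist A", "Artist A", "Artist A",
                "Artist B", "Artist B", "Artist B", "Artist C")
)

# distinct() counts each title/performer pair once before counting songs
ranked <- toy_chart |>
  distinct(title, performer) |>
  count(performer, name = "songs") |>
  mutate(rank = min_rank(desc(songs))) |>
  arrange(rank)
ranked
```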