Music is part of daily life, and as the music market has grown and more of it is measured, the industry has become a rich source of data to explore.

This analysis examines how the audio characteristics of songs on the Billboard Hot 100 have changed over time, with a focus on identifying significant shifts or fluctuations. By examining these details, we can better understand the interplay between music industry dynamics, cultural trends, technological innovations, and audience preferences that has shaped the Hot 100 throughout its history.

Why am I doing this project?

As someone fascinated by music’s impact on culture, I am drawn to the Billboard Hot 100 charts as a rich source of data and insights. For me, the charts are not simply a list of popular songs, but a window into the cultural zeitgeist. By tracking changes in the charts over time, I hope to identify trends and patterns that reflect the evolving tastes of audiences and to see how music reflects and responds to broader cultural shifts. I’m excited to use this project to gain a deeper understanding of the ways in which music reflects and shapes our world.

Why I chose to use RStudio:

This past semester, I learned how to use RStudio in QTM 150, and I found the platform to be incredibly versatile and user-friendly. I wanted to take the opportunity to practice and utilize the tools of RStudio for a project that would allow me to explore my own interests in greater depth.

One of the reasons I appreciate using RStudio is its powerful data analysis capabilities. R offers an extensive library of packages that make it easy to manipulate and visualize data, and RStudio’s flexibility allows me to work more efficiently on data projects.

What are the Billboard Charts?

In 1958, Billboard’s Hot 100 started as a tool for industry insiders, but it was groundbreaking because it amalgamated three distinct sources of information: sales of singles, jukebox plays, and Top 40 radio broadcasts. These measures revealed diverse aspects of a song’s appeal: sales reflected ardent supporters who purchased an artist’s work in the initial week, while airplay represented a more relaxed barometer of a song’s resonance over time.

Today, the Billboard chart has outgrown its role as a tool for industry insiders; in the digital age it lacks the detailed data that ratings executives and radio program directors need. Instead, it has become a universally accepted standard that connects the industry with everyday people, much like the Dow Jones stock index (Molanphy, 2008). Moreover, it serves as a source of pride for artists, who can share their accomplishments on social media through the chart’s statistics.

Cleaning Data sets!

These are the libraries used in RStudio to clean and visualize the data for this project.

library(ggplot2)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ tibble  3.1.8     ✔ dplyr   1.1.0
## ✔ tidyr   1.3.0     ✔ stringr 1.5.0
## ✔ readr   2.1.3     ✔ forcats 1.0.0
## ✔ purrr   1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(dplyr)

Next, the two source files are read in: the Spotify audio features and the weekly Hot 100 chart data.

spotify_audio_features <- read.csv("Hot 100 Audio Features.csv")
billboard_hot100 <- read.csv("Hot Stuff.csv")

Data Cleaning: The two data sets are merged on song and performer. For further analysis we then remove rows containing NA values and drop the unnecessary columns, namely “SongID.x”, “spotify_genre”, “spotify_track_id”, “spotify_track_preview_url”, “url”, “SongID.y” and “Instance”.

billboard_total <- merge(spotify_audio_features, billboard_hot100, by=c("Song", "Performer"))

billboard_total <- na.omit(billboard_total)
billboard_total <- billboard_total[,-c(3, 4, 5, 6, 23, 26, 27)]
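The positional drop above depends on the exact column order of the merged data. As a minimal sketch, assuming the column names listed in the text, the same columns could be removed by name with `select()` and `any_of()`. The data frame `merged_toy` is a made-up stand-in, not the project data.

```r
library(dplyr)

# Hypothetical stand-in for the merged data; only the column names matter here.
merged_toy <- data.frame(
  Song = c("A", "B"),
  Performer = c("X", "Y"),
  SongID.x = c("AX", "BY"),
  spotify_genre = c("['pop']", "['rock']"),
  danceability = c(0.6, 0.7),
  url = c("u1", "u2"),
  SongID.y = c("AX", "BY"),
  Instance = c(1L, 1L)
)

# Columns the text names as unnecessary.
drop_cols <- c("SongID.x", "spotify_genre", "spotify_track_id",
               "spotify_track_preview_url", "url", "SongID.y", "Instance")

# any_of() silently skips names that are absent, so the call is safe to re-run.
cleaned_toy <- merged_toy %>% select(-any_of(drop_cols))
names(cleaned_toy)
```

Dropping by name keeps working if the source files gain or reorder columns, which a numeric index does not.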

#Renaming Columns
billboard_total <- billboard_total %>% 
  rename(
    song = Song,
    performer = Performer, 
    track_album = spotify_track_album,
    track_explicit = spotify_track_explicit,
    track_duration_ms = spotify_track_duration_ms,
    date = WeekID,
    week_position = Week.Position,
    previous_week_position = Previous.Week.Position,
    peak_position = Peak.Position,
    weeks_on_chart = Weeks.on.Chart
    )

#rename mode and key to combine them
billboard_total$mode <- factor(billboard_total$mode, levels = c(0:1), labels = c("minor", "major"))
billboard_total$key <- factor(billboard_total$key, levels = c(0:11), labels = c("C", "C#","D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"))
billboard_total <- unite(billboard_total , key_signature, key, mode, sep = " ")

#convert milliseconds to seconds
billboard_total$track_duration_sec <- billboard_total$track_duration_ms / 1000
#drop column 5, which is track_album at this point; track_duration_ms is kept for the duration plot
billboard_total <- billboard_total[,-c(5)]
billboard_total <- billboard_total %>% 
  relocate(track_duration_sec, .after = time_signature)

#Creation of song_performer variable
song_performer <- billboard_total %>%
  select(song, performer) %>%
  unite("song_performer", song:performer, sep = " | ")
billboard_total <- cbind(billboard_total, song_performer)
billboard_total <- billboard_total %>% 
  relocate(song_performer, .before = song)

#Year Variable
billboard_total$date <-  as.Date(billboard_total$date, format = "%m/%d/%Y")

billboard_total$year <- as.numeric(format(billboard_total$date, "%Y"))

billboard_total_distinct  <- distinct(billboard_total, song_performer, .keep_all = TRUE)
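`distinct()` keeps the first row it meets for each song, and that row is not guaranteed to be the song's earliest chart week, so a song that charted across a year boundary could be assigned to either year. A hedged alternative, sketched here on a made-up data frame (`toy` is hypothetical), picks the earliest chart week per song explicitly with `slice_min()`:

```r
library(dplyr)

# Hypothetical chart rows: song S1 charts across the 1982/1983 boundary,
# and its rows are deliberately out of date order.
toy <- data.frame(
  song_performer = c("S1 | P1", "S1 | P1", "S2 | P2", "S2 | P2"),
  date = as.Date(c("1983-01-08", "1982-12-25", "1990-03-03", "1990-03-10")),
  energy = c(0.7, 0.7, 0.4, 0.4)
)

# Keep one row per song: the earliest chart week, then derive the year from it.
first_week <- toy %>%
  group_by(song_performer) %>%
  slice_min(date, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  mutate(year = as.numeric(format(date, "%Y")))

first_week
```

Whether the debut year or some other year (for example, the peak week) is the right one to assign is a design choice; this sketch only makes the choice explicit.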

Final Working Data Set

#Final working dataset
str(billboard_total)
## 'data.frame':    259816 obs. of  24 variables:
##  $ song_performer          : chr  "'65 Love Affair | Paul Davis" "'65 Love Affair | Paul Davis" "'65 Love Affair | Paul Davis" "'65 Love Affair | Paul Davis" ...
##  $ song                    : chr  "'65 Love Affair" "'65 Love Affair" "'65 Love Affair" "'65 Love Affair" ...
##  $ performer               : chr  "Paul Davis" "Paul Davis" "Paul Davis" "Paul Davis" ...
##  $ track_duration_ms       : int  219813 219813 219813 219813 219813 219813 219813 219813 219813 219813 ...
##  $ track_explicit          : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ danceability            : num  0.647 0.647 0.647 0.647 0.647 0.647 0.647 0.647 0.647 0.647 ...
##  $ energy                  : num  0.686 0.686 0.686 0.686 0.686 0.686 0.686 0.686 0.686 0.686 ...
##  $ key_signature           : chr  "D minor" "D minor" "D minor" "D minor" ...
##  $ loudness                : num  -4.25 -4.25 -4.25 -4.25 -4.25 ...
##  $ speechiness             : num  0.0274 0.0274 0.0274 0.0274 0.0274 0.0274 0.0274 0.0274 0.0274 0.0274 ...
##  $ acousticness            : num  0.432 0.432 0.432 0.432 0.432 0.432 0.432 0.432 0.432 0.432 ...
##  $ instrumentalness        : num  6.19e-06 6.19e-06 6.19e-06 6.19e-06 6.19e-06 6.19e-06 6.19e-06 6.19e-06 6.19e-06 6.19e-06 ...
##  $ liveness                : num  0.133 0.133 0.133 0.133 0.133 0.133 0.133 0.133 0.133 0.133 ...
##  $ valence                 : num  0.952 0.952 0.952 0.952 0.952 0.952 0.952 0.952 0.952 0.952 ...
##  $ tempo                   : num  156 156 156 156 156 ...
##  $ time_signature          : int  4 4 4 4 4 4 4 4 4 4 ...
##  $ track_duration_sec      : num  220 220 220 220 220 ...
##  $ spotify_track_popularity: int  40 40 40 40 40 40 40 40 40 40 ...
##  $ date                    : Date, format: "1982-05-08" "1982-06-26" ...
##  $ week_position           : int  8 84 23 6 70 100 6 7 64 18 ...
##  $ previous_week_position  : int  9 70 36 6 18 95 7 8 82 9 ...
##  $ peak_position           : int  8 6 23 6 6 6 6 7 64 6 ...
##  $ weeks_on_chart          : int  11 18 5 14 17 20 13 12 2 16 ...
##  $ year                    : num  1982 1982 1982 1982 1982 ...

The audio-feature variables in the final data set are defined in the Audio Features documentation of the Spotify Web API:

Acousticness: A confidence measure from 0.0 to 1.0, indicating whether a track is acoustic, where a value of 1.0 represents high confidence in the acoustic nature of the track.

Energy: A measure from 0.0 to 1.0, which represents the perceptual measure of intensity and activity in a song.

Danceability: Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.

Loudness: Represents the overall loudness of a track in decibels (dB), averaged across the entire track, and useful for comparing the relative loudness of tracks.

Speechiness: Detects the presence of spoken words in a track, with values above 0.66 indicating that the track is probably made entirely of spoken words, values between 0.33 and 0.66 describing tracks that may contain both music and speech, and values below 0.33 representing music and other non-speech-like tracks.

Duration: The length of a track, stored in milliseconds (track_duration_ms) and also converted to seconds (track_duration_sec).
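The speechiness thresholds in the definitions above can be turned into labelled bands with `cut()`. This is a minimal sketch on made-up values. The band labels are my own wording, and assigning a value of exactly 0.33 or 0.66 to the lower band is an assumption, since the definition does not say which side the boundaries fall on.

```r
# Hypothetical speechiness values, one per track.
speechiness <- c(0.03, 0.27, 0.45, 0.90)

# Bands follow the thresholds in the text: below 0.33, 0.33 to 0.66, above 0.66.
# With the default right = TRUE, each interval includes its upper bound.
band <- cut(
  speechiness,
  breaks = c(-Inf, 0.33, 0.66, Inf),
  labels = c("music / non-speech", "music and speech", "mostly spoken word")
)

table(band)
```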

Acousticness of Songs from 1958-2019

acousticness_trend <- billboard_total_distinct %>%
  select(acousticness, year) %>%
  group_by(year) %>%
  summarize(acousticness_mean = mean(acousticness))
ggplot(acousticness_trend, aes(x = year, y = acousticness_mean)) +
  geom_line(color = "Green", linewidth = 1) + # yearly mean; linewidth replaces the deprecated size
  stat_smooth(
    color = "Red", fill = "Red",
    method = "loess" # adding the smooth line
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, size = 12, face = "bold")) +
  labs(title = "Acousticness of Hot 100 Songs (1958-2019)", x = "Year", y = "Mean Acousticness")
## `geom_smooth()` using formula = 'y ~ x'

An interesting observation is the significant drop in acousticness in recent years. This trend could be attributed to the advancements in music production technology and the popularity of electronic and synthesized sounds. It appears that modern music production prioritizes the use of digital instruments and software-generated sounds over traditional acoustic instruments, leading to a departure from the traditional acoustic sound in popular music. As a result, the acousticness metric serves as a valuable marker for tracking the evolution of popular music styles and trends. This shift in acousticness also suggests a shift in the broader cultural landscape, as contemporary music reflects and responds to the changing times and technological advancements.
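The same yearly-mean plot is built for each audio feature in this report. One way to avoid repeating the code is a small helper, sketched here. `yearly_mean()` and `plot_feature_trend()` are hypothetical names that are not part of the original analysis, and `toy` is made-up data standing in for `billboard_total_distinct`.

```r
library(dplyr)
library(ggplot2)

# Mean of one feature per year; `feature` is a column name given as a string.
yearly_mean <- function(data, feature) {
  data %>%
    group_by(year) %>%
    summarize(feature_mean = mean(.data[[feature]]), .groups = "drop")
}

# Line of yearly means plus a loess smooth, styled like the plots in this report.
plot_feature_trend <- function(data, feature, label) {
  ggplot(yearly_mean(data, feature), aes(x = year, y = feature_mean)) +
    geom_line(color = "green", linewidth = 1) +
    stat_smooth(color = "red", fill = "red", method = "loess", formula = y ~ x) +
    theme_minimal() +
    theme(plot.title = element_text(hjust = 0.5, size = 12, face = "bold")) +
    labs(title = paste(label, "of Hot 100 Songs (1958-2019)"),
         x = "Year", y = paste("Mean", label))
}

# Made-up data: five tracks per year for twenty years.
set.seed(1)
toy <- data.frame(year = rep(1958:1977, each = 5), acousticness = runif(100))

trend <- yearly_mean(toy, "acousticness")
p <- plot_feature_trend(toy, "acousticness", "Acousticness")
```

On the real data the call would look like `plot_feature_trend(billboard_total_distinct, "energy", "Energy")`, assuming the column names produced by the cleaning steps.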

Energy of Songs from 1958-2019

energy_trend <- billboard_total_distinct %>%
  select(energy, year) %>%
  group_by(year) %>%
  summarize(energy_mean = mean(energy))
ggplot(energy_trend, aes(x = year, y = energy_mean)) +
  geom_line(color = "Green", linewidth = 1) + # yearly mean; linewidth replaces the deprecated size
  stat_smooth(
    color = "Red", fill = "Red",
    method = "loess" # adding the smooth line
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, size = 12, face = "bold")) +
  labs(title = "Energy of Hot 100 Songs (1958-2019)", x = "Year", y = "Mean Energy")
## `geom_smooth()` using formula = 'y ~ x'

Over the years, the mean energy of Billboard hits has risen markedly, indicating a shift towards more intense, active-sounding songs. This trend may reflect several factors, including changing consumer preferences and advancements in music production technology.

Danceability of Songs from 1958-2019

danceability_trend <- billboard_total_distinct %>%
  select(danceability, year) %>%
  group_by(year) %>%
  summarize(danceability_mean = mean(danceability))
ggplot(danceability_trend, aes(x = year, y = danceability_mean)) +
  geom_line(color = "Green", linewidth = 1) + # yearly mean; linewidth replaces the deprecated size
  stat_smooth(
    color = "Red", fill = "Red",
    method = "loess" # adding the smooth line
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, size = 12, face = "bold")) +
  labs(title = "Danceability of Hot 100 Songs (1958-2019)", x = "Year", y = "Mean Danceability")
## `geom_smooth()` using formula = 'y ~ x'

Over the years, the mean danceability of Billboard songs has increased noticeably, rising from 0.55 in 1958 to nearly 0.67 in 2019. This suggests a shift in listeners' preferences towards more danceable music.

One possible explanation for the dip in danceability around 2010 is the short-lived popularity of dubstep music (Vaughn, 2022). Dubstep is a genre of electronic dance music characterized by heavy basslines and a slower tempo, which may have contributed to lower danceability scores. If so, the overall preference for danceable music did not necessarily decrease; rather, a temporary shift in popular music trends pulled the scores down.

Loudness of Songs from 1958-2019

loudness_trend <- billboard_total_distinct %>%
  select(loudness, year) %>%
  group_by(year) %>%
  summarize(loudness_mean = mean(loudness))

ggplot(loudness_trend, aes(x = year, y = loudness_mean)) +
  geom_line(color = "Green", linewidth = 1) + # yearly mean; linewidth replaces the deprecated size
  stat_smooth(
    color = "Red", fill = "Red",
    method = "loess" # adding the smooth line
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, size = 12, face = "bold")) +
  labs(title = "Loudness of Hot 100 Songs (1958-2019)", x = "Year", y = "Mean Loudness (dB)")
## `geom_smooth()` using formula = 'y ~ x'

Over time, the music industry has seen a trend towards increasing volume in popular music, commonly referred to as the “loudness war”. This phenomenon emerged in the 1990s as engineers began boosting the volume of tracks during the mixing and mastering process in order to make them stand out on the radio and in CD mixes. However, the added volume is achieved through heavier compression, which can reduce dynamic range and overall sound quality.

Speechiness of Songs from 1958-2019

speechiness_trend <- billboard_total_distinct %>%
  select(speechiness, year) %>%
  group_by(year) %>%
  summarize(speechiness_mean = mean(speechiness))

ggplot(speechiness_trend, aes(x = year, y = speechiness_mean)) +
  geom_line(color = "Green", linewidth = 1) + # yearly mean; linewidth replaces the deprecated size
  stat_smooth(
    color = "Red", fill = "Red",
    method = "loess" # adding the smooth line
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, size = 12, face = "bold")) +
  labs(title = "Speechiness of Hot 100 Songs (1958-2019)", x = "Year", y = "Mean Speechiness")
## `geom_smooth()` using formula = 'y ~ x'

A substantial majority of the tracks have a speechiness value between 0 and 0.1. By the definition given earlier, values below 0.33 describe music and other non-speech-like tracks, so a low score means the vocals are predominantly sung rather than spoken. It does not by itself indicate fewer lyrics or a greater emphasis on instrumentation; the separate instrumentalness feature is the one that addresses whether a track contains vocals at all.
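The yearly-mean plot does not show how individual tracks are distributed, so the share of tracks at or below 0.1 is worth checking directly. A minimal sketch on made-up values follows. On the project data the vector would presumably be `billboard_total_distinct$speechiness`, which is an assumption about how the check would be run, not a reported result.

```r
# Hypothetical speechiness values, one per track.
speechiness <- c(0.03, 0.04, 0.05, 0.08, 0.12, 0.35)

# The mean of a logical vector is the proportion of TRUE values.
share_low <- mean(speechiness <= 0.1)
share_low
```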

In the 1980s, the speechiness of popular music began to decline as electronic instrumentation became more prevalent. This shift in music production may have contributed to a decreased emphasis on lyrics and spoken-word sections.

In the early 2000s, the speechiness of popular music increased again as hip-hop and rap gained in popularity. These genres are characterized by heavy use of spoken-word delivery, with lyrics that often touch on themes such as social justice, personal struggle, and political commentary.

Duration of Songs from 1958-2019

track_duration_ms_trend <- billboard_total_distinct %>%
  select(track_duration_ms, year) %>%
  group_by(year) %>%
  summarize(track_duration_ms_mean = mean(track_duration_ms))

ggplot(track_duration_ms_trend, aes(x = year, y = track_duration_ms_mean)) +
  geom_line(color = "Green", linewidth = 1) + # yearly mean; linewidth replaces the deprecated size
  stat_smooth(
    color = "Red", fill = "Red",
    method = "loess" # adding the smooth line
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, size = 12, face = "bold")) +
  labs(title = "Duration of Hot 100 Songs (1958-2019)", x = "Year", y = "Mean Duration (ms)")
## `geom_smooth()` using formula = 'y ~ x'

On average, a song lasts about 225,000 milliseconds (3 minutes and 45 seconds).
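The conversion from milliseconds to minutes and seconds is simple arithmetic: divide by 1000 to get seconds, then split into whole minutes and the remainder. A small sketch follows; `ms_to_min_sec()` is a hypothetical helper, not part of the original analysis.

```r
# Format a duration in milliseconds as "m:ss", rounding to the nearest second.
ms_to_min_sec <- function(ms) {
  total_sec <- round(ms / 1000)
  sprintf("%d:%02d",
          as.integer(total_sec %/% 60),  # whole minutes
          as.integer(total_sec %% 60))   # leftover seconds
}

ms_to_min_sec(225000)  # "3:45", since 225 s = 3 * 60 + 45
```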

As observed, the duration of songs on the Billboard charts showed an upward trend until the 1990s, after which it began to decline. This discovery is consistent with earlier studies on the evolution of track duration over time. One such example of this research comes from UK record label Ostereo, which has found that the length of the average number one song has decreased by nearly 20% in the past two decades. According to Ostereo, this trend may be influenced by streaming platform algorithms, which tend to favor shorter songs (Bemrose, 2019). This is because streaming algorithms may perceive listeners skipping a song before it has ended as a sign of dissatisfaction, which can negatively impact the song’s performance on the platform.

Summary: The analysis of the Billboard Hot 100 charts reveals an interplay between music industry dynamics, cultural trends, technological innovations, and audience preferences. The decline in acousticness observed in recent years may be attributable to advancements in music production technology and the increasing popularity of electronic and synthesized sounds. The shift towards danceable music suggests an evolution in consumer preferences, while the fluctuating speechiness of tracks appears to correspond to the rise and fall of certain music genres.

This comprehensive analysis of the Billboard Hot 100 charts offers valuable insights into the ways in which music reflects and responds to broader cultural shifts and changes. By examining these intricate details, we gain a deeper understanding of the complex relationships between various factors that shape the trajectory of popular music throughout its history. Furthermore, the study highlights the importance of considering the multifaceted influences that contribute to the ever-evolving landscape of the music industry, providing a nuanced perspective on the role of music as a powerful cultural force in our daily lives.

Works Cited:

Bemrose, B. (2019, May 20). Song length: the Spotify effect. PRS for Music: royalties, music copyright and licensing. https://www.prsformusic.com/m-magazine/features/song-length-the-spotify-effect

Molanphy, C. (2008, August 1). Bulls, Bears, And Bullets: 50 Years Of The “Billboard” Hot 100 - Chris Molanphy. Chris Molanphy. https://chris.molanphy.com/bulls-bears-and-bullets-50-years-of-the-billboard-hot-100/

Vaughn, B. (2022, October 5). The Decline Of Dubstep: Why The Popular Genre Is Losing Its Edge | Ben Vaughn. Ben Vaughn. https://www.benvaughn.com/the-decline-of-dubstep-why-the-popular-genre-is-losing-its-edge/