Final-project.knit

Intro

The purpose of this analysis is to uncover the hidden structure behind the top 20 tracks of 2022 (as ranked by Spotify) by analyzing both the songs’ lyrics and the inherent qualities of their composition.

This project is a purely descriptive analysis that “sketches” an outline of what a song needed in order to have a chance at success in 2022. The outline is meant as a reference to compare new songs against when predicting; it is not a predictive model in itself.

The project could be extended by training a Long Short-Term Memory (LSTM) model to build a lyric generator.

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.2     ✔ purrr   1.0.1
## ✔ tibble  3.2.1     ✔ dplyr   1.1.2
## ✔ tidyr   1.3.0     ✔ stringr 1.5.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

library(tidytext) #text mining
library(rvest) #web scrape

## 
## Attaching package: 'rvest'
## 
## The following object is masked from 'package:readr':
## 
##     guess_encoding

library(httr) #web scrape
library(wordcloud) #word cloud visualization

## Loading required package: RColorBrewer

Extracting Lyrics from Genius

The lyrics extracted from Genius correspond to the songs in Spotify’s list of the most-streamed songs in the U.S. for 2022. Each song’s lyrics are then split into sections (intro, verse, pre-chorus, chorus, and so on), and each section is assigned to its own vector. This is intended to help a later model generate lyrics for specific sections of a song.
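The per-section extraction described above repeats the same regular expression for every song. As a minimal sketch (the helper name `extract_section` and the toy string are hypothetical, not part of the original analysis), the pattern can be wrapped in one function that returns the first block tagged with a given label, up to the next `[` or the end of the text:

```r
library(stringr)

# Hypothetical helper: first "[<label> ...]" block up to the next "[" or end of text.
# Assumes `label` contains no regex metacharacters and the text has no line breaks.
extract_section <- function(lyrics, label) {
  pattern <- paste0("\\[", label, ".*?\\].*?(?=\\[|$)")
  str_extract(lyrics, pattern)
}

# Toy input, not real lyrics.
toy <- "[Intro]Hello there[Verse 1]First line here[Chorus]Sing along"

verse1 <- extract_section(toy, "Verse 1")  # section followed by another tag
chorus <- extract_section(toy, "Chorus")   # last section, ends at end of text
bridge <- extract_section(toy, "Bridge")   # absent section gives NA
```

Allowing the lookahead to stop at the end of the string as well as at `[` means a final section such as an outro is still captured when nothing follows it.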

As It Was

hstyles_extracted <- read_html("https://genius.com/Harry-styles-as-it-was-lyrics")
hstyles_lyrics <- html_nodes(hstyles_extracted, ".Dzxov") %>% html_text() 
hstyles_lyrics <- paste(hstyles_lyrics[1], hstyles_lyrics[2])


hstyles_intro <- str_extract(hstyles_lyrics, "\\[Intro.*?\\].*?\\[")
hstyles_verse1 <- str_extract(hstyles_lyrics, "\\[Verse 1.*?\\].*?(?=\\[)")
hstyles_prechorus <- str_extract(hstyles_lyrics, "\\[Pre-Chorus.*?\\].*?(?=\\[)")
hstyles_chorus <- str_extract(hstyles_lyrics, "\\[Chorus.*?\\].*?(?=\\[)")
hstyles_postchorus <- str_extract(hstyles_lyrics, "\\[Post-Chorus.*?\\].*?(?=\\[)")
hstyles_verse2 <- str_extract(hstyles_lyrics, "\\[Verse 2.*?\\].*?(?=\\[)")
hstyles_bridge <- str_extract(hstyles_lyrics, "\\[Bridge.*?\\].*?(?=\\[)")
hstyles_outro <- str_extract(hstyles_lyrics, "\\[Outro.*?\\].*")

Heat Waves

ganimals_extracted <- read_html("https://genius.com/Glass-animals-heat-waves-lyrics")
ganimals_lyrics <- html_nodes(ganimals_extracted, ".Dzxov") %>% html_text() 
ganimals_lyrics <- paste(ganimals_lyrics[1], ganimals_lyrics[2], ganimals_lyrics[3])

ganimals_intro <- str_extract(ganimals_lyrics, "\\[Intro.*?\\].*?\\[")
ganimals_verse1 <- str_extract(ganimals_lyrics, "\\[Verse 1.*?\\].*?(?=\\[)")
ganimals_prechorus <- str_extract(ganimals_lyrics, "\\[Pre-Chorus.*?\\].*?(?=\\[)")
ganimals_chorus <- str_extract(ganimals_lyrics, "\\[Chorus.*?\\].*?(?=\\[)")
ganimals_postchorus <- str_extract(ganimals_lyrics, "\\[Post-Chorus.*?\\].*?(?=\\[)")
ganimals_verse2 <- str_extract(ganimals_lyrics, "\\[Verse 2.*?\\].*?(?=\\[)")
ganimals_bridge <- str_extract(ganimals_lyrics, "\\[Bridge.*?\\].*?(?=\\[)")
ganimals_outro <- str_extract(ganimals_lyrics, "\\[Outro.*?\\].*")

Bad Habit

slacy_extracted <- read_html("https://genius.com/Steve-lacy-bad-habit-lyrics")
slacy_lyrics <- html_nodes(slacy_extracted, ".Dzxov") %>% html_text() 
slacy_lyrics <- paste(slacy_lyrics[1], slacy_lyrics[2], slacy_lyrics[3])

slacy_intro <- str_extract(slacy_lyrics, "\\[Intro.*?\\].*?\\[")
slacy_verse1 <- str_extract(slacy_lyrics, "\\[Verse 1.*?\\].*?(?=\\[)")
slacy_prechorus <- str_extract(slacy_lyrics, "\\[Pre-Chorus\\].*?(?=\\[)")
slacy_chorus <- str_extract(slacy_lyrics, "\\[Chorus.*?\\].*?(?=\\[)")
slacy_postchorus <-  str_extract(slacy_lyrics, "\\[Post-Chorus.*?\\].*?(?=\\[)")
slacy_verse2 <- str_extract(slacy_lyrics, "\\[Verse 2.*?\\].*?(?=\\[)")
slacy_bridge <- str_extract(slacy_lyrics, "\\[Bridge.*?\\].*?(?=\\[)")
slacy_instrumental <- str_extract(slacy_lyrics, "\\[Instrumental Break.*?\\].*")
slacy_outro <- str_extract(slacy_lyrics, "\\[Outro.*?\\].*")

First Class

jharlow_extracted <- read_html("https://genius.com/Jack-harlow-first-class-lyrics")
jharlow_lyrics <- html_nodes(jharlow_extracted, ".Dzxov") %>% html_text() 
jharlow_lyrics <- paste(jharlow_lyrics[1], jharlow_lyrics[2])

jharlow_intro <- str_extract(jharlow_lyrics, "\\[Intro.*?\\].*?\\[")
jharlow_verse1 <- str_extract(jharlow_lyrics, "\\[Verse 1.*?\\].*?(?=\\[)")
jharlow_prechorus <- str_extract(jharlow_lyrics, "\\[Pre-Chorus\\].*?(?=\\[)")
jharlow_chorus <- str_extract(jharlow_lyrics, "\\[Chorus.*?\\].*?(?=\\[)")
jharlow_postchorus <- str_extract(jharlow_lyrics, "\\[Post-Chorus\\].*?(?=\\[)")
jharlow_verse2 <- str_extract(jharlow_lyrics, "\\[Verse 2.*?\\].*?(?=\\[)")
jharlow_bridge <- str_extract(jharlow_lyrics, "\\[Bridge.*?\\].*?(?=\\[)")
jharlow_outro <- str_extract(jharlow_lyrics, "\\[Outro.*?\\].*")

STAY

klaroi_extracted <- read_html("https://genius.com/The-kid-laroi-and-justin-bieber-stay-lyrics")
klaroi_lyrics <- html_nodes(klaroi_extracted, ".Dzxov") %>% html_text() 
klaroi_lyrics <- paste(klaroi_lyrics[1], klaroi_lyrics[2])

klaroi_intro <- str_extract(klaroi_lyrics, "\\[Intro.*?\\].*?\\[")
klaroi_verse1 <- str_extract(klaroi_lyrics, "\\[Verse 1.*?\\].*?(?=\\[)")
klaroi_prechorus <- str_extract(klaroi_lyrics, "\\[Pre-Chorus.*?\\].*?(?=\\[)")
klaroi_chorus <- str_extract(klaroi_lyrics, "\\[Chorus.*?\\].*?(?=\\[)")
klaroi_postchorus <- str_extract(klaroi_lyrics, "\\[Post-Chorus\\].*?(?=\\[)")
klaroi_verse2 <- str_extract(klaroi_lyrics, "\\[Verse 2.*?\\].*?(?=\\[)")
klaroi_bridge <- str_extract(klaroi_lyrics, "\\[Bridge.*?\\].*?(?=\\[)")
klaroi_outro <- str_extract(klaroi_lyrics, "\\[Outro.*?\\].*")

Running Up That Hill (A Deal With God)

kbush_extracted <- read_html("https://genius.com/Kate-bush-running-up-that-hill-a-deal-with-god-lyrics")
kbush_lyrics <- html_nodes(kbush_extracted, ".Dzxov") %>% html_text() 
kbush_lyrics <- paste(kbush_lyrics[1], kbush_lyrics[2], kbush_lyrics[3])

kbush_intro <- str_extract(kbush_lyrics, "\\[Intro.*?\\].*?(?=\\[)")
kbush_verse1 <- str_extract(kbush_lyrics, "\\[Verse 1.*?\\].*?(?=\\[)")
kbush_prechorus <- str_extract(kbush_lyrics, "\\[Pre-Chorus.*?\\].*?(?=\\[)")
kbush_chorus <- str_extract(kbush_lyrics, "\\[Chorus.*?\\].*?(?=\\[)")
kbush_verse2 <- str_extract(kbush_lyrics, "\\[Verse 2.*?\\].*?(?=\\[)")
kbush_postchorus <- str_extract(kbush_lyrics, "\\[Post-Chorus.*?\\].*?(?=\\[)")
kbush_bridge <- str_extract(kbush_lyrics, "\\[Bridge.*?\\].*?(?=\\[)")
kbush_outro <- str_extract(kbush_lyrics, "\\[Outro.*?\\].*?(?=\\[)")

Sweater Weather

neighbourhood_extracted <- read_html("https://genius.com/The-neighbourhood-sweater-weather-lyrics")
neighbourhood_lyrics <- html_nodes(neighbourhood_extracted, ".Dzxov") %>% html_text() 
neighbourhood_lyrics <- paste(neighbourhood_lyrics[1], neighbourhood_lyrics[2], neighbourhood_lyrics[3])

neighbourhood_intro <- str_extract(neighbourhood_lyrics, "\\[Intro.*?\\].*?(?=\\[)")
neighbourhood_verse1 <- str_extract(neighbourhood_lyrics, "\\[Verse 1.*?\\].*?(?=\\[)")
neighbourhood_prechorus <- str_extract(neighbourhood_lyrics, "\\[Pre-Chorus.*?\\].*?(?=\\[)")
neighbourhood_chorus <- str_extract(neighbourhood_lyrics, "\\[Chorus.*?\\].*?(?=\\[)")
neighbourhood_postchorus <- str_extract(neighbourhood_lyrics, "\\[Post-Chorus.*?\\].*?(?=\\[)")
neighbourhood_verse2 <- str_extract(neighbourhood_lyrics, "\\[Verse 2.*?\\].*?(?=\\[)")
neighbourhood_bridge <- str_extract(neighbourhood_lyrics, "\\[Bridge.*?\\].*?(?=\\[)")
neighbourhood_outro <- str_extract(neighbourhood_lyrics, "\\[Outro.*?\\].*?(?=\\[)")

Dark Red

slacy2_extracted <- read_html("https://genius.com/Steve-lacy-dark-red-lyrics")
slacy2_lyrics <- html_nodes(slacy2_extracted, ".Dzxov") %>% html_text() 
slacy2_lyrics <- paste(slacy2_lyrics[1], slacy2_lyrics[2])

slacy2_intro <- str_extract(slacy2_lyrics, "\\[Intro.*?\\].*?(?=\\[)")
slacy2_verse1 <- str_extract(slacy2_lyrics, "\\[Verse 1.*?\\].*?(?=\\[)")
slacy2_prechorus <- str_extract(slacy2_lyrics, "\\[Pre-Chorus.*?\\].*?(?=\\[)")
slacy2_chorus <- str_extract(slacy2_lyrics, "\\[Chorus.*?\\].*?(?=\\[)")
slacy2_verse2 <- str_extract(slacy2_lyrics, "\\[Verse 2.*?\\].*?(?=\\[)")
slacy2_postchorus <- str_extract(slacy2_lyrics, "\\[Post-Chorus.*?\\].*?(?=\\[)")
slacy2_bridge <- str_extract(slacy2_lyrics, "\\[Bridge.*?\\].*?(?=\\[)")
slacy2_outro <- str_extract(slacy2_lyrics, "\\[Outro.*?\\].*?(?=\\[)")

We Don’t Talk About Bruno

encanto_extracted <- read_html("https://genius.com/Carolina-gaitan-mauro-castillo-adassa-rhenzy-feliz-diane-guerrero-and-stephanie-beatriz-we-dont-talk-about-bruno-lyrics")
encanto_lyrics <- html_nodes(encanto_extracted, ".Dzxov") %>% html_text() 
encanto_lyrics <- paste(encanto_lyrics[1], encanto_lyrics[2], encanto_lyrics[3])
encanto_intro <- str_extract(encanto_lyrics, "\\[Intro.*?\\].*?(?=\\[)")
encanto_verse1 <- str_extract(encanto_lyrics, "\\[Verse 1.*?\\].*?(?=\\[)")
encanto_prechorus <- str_extract(encanto_lyrics, "\\[Pre-Chorus.*?\\].*?(?=\\[)")
encanto_chorus <- str_extract(encanto_lyrics, "\\[Chorus.*?\\].*?(?=\\[)")
encanto_verse2 <- str_extract(encanto_lyrics, "\\[Verse 2.*?\\].*?(?=\\[)")
encanto_postchorus <- str_extract(encanto_lyrics, "\\[Post-Chorus.*?\\].*?(?=\\[)")
encanto_bridge <- str_extract(encanto_lyrics, "\\[Bridge.*?\\].*?(?=\\[)")
encanto_outro <- str_extract(encanto_lyrics, "\\[Outro.*?\\].*?(?=\\[)")

Industry Baby

lilnasx_extracted <- read_html("https://genius.com/Lil-nas-x-and-jack-harlow-industry-baby-lyrics")
lilnasx_lyrics <- html_nodes(lilnasx_extracted, ".Dzxov") %>% html_text() 
lilnasx_lyrics <- paste(lilnasx_lyrics[1], lilnasx_lyrics[2], lilnasx_lyrics[3])

lilnasx_intro <- str_extract(lilnasx_lyrics, "\\[Intro.*?\\].*?(?=\\[)")
lilnasx_verse1 <- str_extract(lilnasx_lyrics, "\\[Verse 1.*?\\].*?(?=\\[)")
lilnasx_prechorus <- str_extract(lilnasx_lyrics, "\\[Pre-Chorus.*?\\].*?(?=\\[)")
lilnasx_chorus <- str_extract(lilnasx_lyrics, "\\[Chorus.*?\\].*?(?=\\[)")
lilnasx_verse2 <- str_extract(lilnasx_lyrics, "\\[Verse 2.*?\\].*?(?=\\[)")
lilnasx_postchorus <- str_extract(lilnasx_lyrics, "\\[Post-Chorus.*?\\].*?(?=\\[)")
lilnasx_bridge <- str_extract(lilnasx_lyrics, "\\[Bridge.*?\\].*?(?=\\[)")
lilnasx_outro <- str_extract(lilnasx_lyrics, "\\[Outro.*?\\].*?(?=\\[)")

Glimpse of Us

joji_extracted <- read_html("https://genius.com/Joji-glimpse-of-us-lyrics")
joji_lyrics <- html_nodes(joji_extracted, ".Dzxov") %>% html_text() 
joji_lyrics <- paste(joji_lyrics[1], joji_lyrics[2], joji_lyrics[3])

joji_intro <- str_extract(joji_lyrics, "\\[Intro.*?\\].*?(?=\\[)")
joji_verse1 <- str_extract(joji_lyrics, "\\[Verse 1.*?\\].*?(?=\\[)")
joji_prechorus <- str_extract(joji_lyrics, "\\[Pre-Chorus.*?\\].*?(?=\\[)")
joji_chorus <- str_extract(joji_lyrics, "\\[Chorus.*?\\].*?(?=\\[)")
joji_verse2 <- str_extract(joji_lyrics, "\\[Verse 2.*?\\].*?(?=\\[)")
joji_postchorus <- str_extract(joji_lyrics, "\\[Post-Chorus.*?\\].*?(?=\\[)")
joji_bridge <- str_extract(joji_lyrics, "\\[Bridge.*?\\].*?(?=\\[)")
joji_outro <- str_extract(joji_lyrics, "\\[Outro.*?\\].*?(?=\\[)")

No Role Modelz

jcole_extracted <- read_html("https://genius.com/J-cole-no-role-modelz-lyrics")
jcole_lyrics <- html_nodes(jcole_extracted, ".Dzxov") %>% html_text() 
jcole_lyrics <- paste(jcole_lyrics[1], jcole_lyrics[2], jcole_lyrics[3])

jcole_intro <- str_extract(jcole_lyrics, "\\[Intro.*?\\].*?(?=\\[)")
jcole_verse1 <- str_extract(jcole_lyrics, "\\[Verse 1.*?\\].*?(?=\\[)")
jcole_prechorus <- str_extract(jcole_lyrics, "\\[Pre-Chorus.*?\\].*?(?=\\[)")
jcole_chorus <- str_extract(jcole_lyrics, "\\[Chorus.*?\\].*?(?=\\[)")
jcole_verse2 <- str_extract(jcole_lyrics, "\\[Verse 2.*?\\].*?(?=\\[)")
jcole_postchorus <- str_extract(jcole_lyrics, "\\[Post-Chorus.*?\\].*?(?=\\[)")
jcole_bridge <- str_extract(jcole_lyrics, "\\[Bridge\\].*?(?=\\[)")
jcole_outro <- str_extract(jcole_lyrics, "\\[Outro.*?\\].*?(?=\\[)")

Super Gremlin

kodak_extracted <- read_html("https://genius.com/Kodak-black-super-gremlin-lyrics")
kodak_lyrics <- html_nodes(kodak_extracted, ".Dzxov") %>% html_text() 
kodak_lyrics <- paste(kodak_lyrics[1], kodak_lyrics[2], kodak_lyrics[3])

kodak_intro <- str_extract(kodak_lyrics, "\\[Intro.*?\\].*?(?=\\[)")
kodak_verse1 <- str_extract(kodak_lyrics, "\\[Verse 1.*?\\].*?(?=\\[)")
kodak_prechorus <- str_extract(kodak_lyrics, "\\[Pre-Chorus.*?\\].*?(?=\\[)")
kodak_chorus <- str_extract(kodak_lyrics, "\\[Chorus.*?\\].*?(?=\\[)")
kodak_verse2 <- str_extract(kodak_lyrics, "\\[Verse 2.*?\\].*?(?=\\[)")
kodak_postchorus <- str_extract(kodak_lyrics, "\\[Post-Chorus.*?\\].*?(?=\\[)")
kodak_bridge <- str_extract(kodak_lyrics, "\\[Bridge.*?\\].*?(?=\\[)")
kodak_outro <- str_extract(kodak_lyrics, "\\[Outro.*?\\].*?(?=\\[)")

Knife Talk

drake_extracted <- read_html("https://genius.com/Drake-knife-talk-lyrics")
drake_lyrics <- html_nodes(drake_extracted, ".Dzxov") %>% html_text() 
drake_lyrics <- paste(drake_lyrics[1], drake_lyrics[2], drake_lyrics[3])

drake_intro <- str_extract(drake_lyrics, "\\[Intro.*?\\].*?(?=\\[)")
drake_verse1 <- str_extract(drake_lyrics, "\\[Verse 1.*?\\].*?(?=\\[)")
drake_prechorus <- str_extract(drake_lyrics, "\\[Pre-Chorus.*?\\].*?(?=\\[)")
drake_chorus <- str_extract(drake_lyrics, "\\[Chorus.*?\\].*?(?=\\[)")
drake_verse2 <- str_extract(drake_lyrics, "\\[Verse 2.*?\\].*?(?=\\[)")
drake_postchorus <- str_extract(drake_lyrics, "\\[Post-Chorus.*?\\].*?(?=\\[)")
drake_bridge <- str_extract(drake_lyrics, "\\[Bridge.*?\\].*?(?=\\[)")
drake_outro <- str_extract(drake_lyrics, "\\[Outro.*?\\].*?(?=\\[)")

WAIT FOR U

future_extracted <- read_html("https://genius.com/Future-wait-for-u-lyrics")
future_lyrics <- html_nodes(future_extracted, ".Dzxov") %>% html_text() 
future_lyrics <- paste(future_lyrics[1], future_lyrics[2], future_lyrics[3])

future_intro <- str_extract(future_lyrics, "\\[Intro.*?\\].*?(?=\\[)")
future_verse1 <- str_extract(future_lyrics, "\\[Verse 1.*?\\].*?(?=\\[)")
future_prechorus <- str_extract(future_lyrics, "\\[Pre-Chorus.*?\\].*?(?=\\[)")
future_chorus <- str_extract(future_lyrics, "\\[Chorus.*?\\].*?(?=\\[)")
future_verse2 <- str_extract(future_lyrics, "\\[Verse 2.*?\\].*?(?=\\[)")
future_postchorus <- str_extract(future_lyrics, "\\[Post-Chorus.*?\\].*?(?=\\[)")
future_bridge <- str_extract(future_lyrics, "\\[Bridge.*?\\].*?(?=\\[)")
future_outro <- str_extract(future_lyrics, "\\[Outro.*?\\].*?(?=\\[)")

About Damn Time

lizzo_extracted <- read_html("https://genius.com/Lizzo-about-damn-time-lyrics")
lizzo_lyrics <- html_nodes(lizzo_extracted, ".Dzxov") %>% html_text() 
lizzo_lyrics <- paste(lizzo_lyrics[1], lizzo_lyrics[2], lizzo_lyrics[3])

lizzo_intro <- str_extract(lizzo_lyrics, "\\[Intro.*?\\].*?(?=\\[)")
lizzo_verse1 <- str_extract(lizzo_lyrics, "\\[Verse 1.*?\\].*?(?=\\[)")
lizzo_prechorus <- str_extract(lizzo_lyrics, "\\[Pre-Chorus.*?\\].*?(?=\\[)")
lizzo_chorus <- str_extract(lizzo_lyrics, "\\[Chorus.*?\\].*?(?=\\[)")
lizzo_verse2 <- str_extract(lizzo_lyrics, "\\[Verse 2.*?\\].*?(?=\\[)")
lizzo_postchorus <- str_extract(lizzo_lyrics, "\\[Post-Chorus.*?\\].*?(?=\\[)")
lizzo_bridge <- str_extract(lizzo_lyrics, "\\[Bridge.*?\\].*?(?=\\[)")
lizzo_outro <- str_extract(lizzo_lyrics, "\\[Outro.*?\\].*?(?=\\[)")

Enemy

idragons_extracted <- read_html("https://genius.com/Imagine-dragons-and-jid-enemy-lyrics")
idragons_lyrics <- html_nodes(idragons_extracted, ".Dzxov") %>% html_text() 
idragons_lyrics <- paste(idragons_lyrics[1], idragons_lyrics[2], idragons_lyrics[3])

idragons_intro <- str_extract(idragons_lyrics, "\\[Intro.*?\\].*?(?=\\[)")
idragons_verse1 <- str_extract(idragons_lyrics, "\\[Verse 1.*?\\].*?(?=\\[)")
idragons_prechorus <- str_extract(idragons_lyrics, "\\[Pre-Chorus.*?\\].*?(?=\\[)")
idragons_chorus <- str_extract(idragons_lyrics, "\\[Chorus.*?\\].*?(?=\\[)")
idragons_verse2 <- str_extract(idragons_lyrics, "\\[Verse 2.*?\\].*?(?=\\[)")
idragons_postchorus <- str_extract(idragons_lyrics, "\\[Post-Chorus.*?\\].*?(?=\\[)")
idragons_bridge <- str_extract(idragons_lyrics, "\\[Bridge.*?\\].*?(?=\\[)")
idragons_outro <- str_extract(idragons_lyrics, "\\[Outro\\].*?(?=\\[)")

505

amonkeys_extracted <- read_html("https://genius.com/Arctic-monkeys-505-lyrics")
amonkeys_lyrics <- html_nodes(amonkeys_extracted, ".Dzxov") %>% html_text() 
amonkeys_lyrics <- paste(amonkeys_lyrics[1], amonkeys_lyrics[2], amonkeys_lyrics[3])

amonkeys_intro <- str_extract(amonkeys_lyrics, "\\[Intro.*?\\].*?(?=\\[)")
amonkeys_verse1 <- str_extract(amonkeys_lyrics, "\\[Verse 1.*?\\].*?(?=\\[)")
amonkeys_prechorus <- str_extract(amonkeys_lyrics, "\\[Pre-Chorus.*?\\].*?(?=\\[)")
amonkeys_chorus <- str_extract(amonkeys_lyrics, "\\[Chorus.*?\\].*?(?=\\[)")
amonkeys_verse2 <- str_extract(amonkeys_lyrics, "\\[Verse 2.*?\\].*?(?=\\[)")
amonkeys_postchorus <- str_extract(amonkeys_lyrics, "\\[Post-Chorus.*?\\].*?(?=\\[)")
amonkeys_bridge <- str_extract(amonkeys_lyrics, "\\[Bridge.*?\\].*?(?=\\[)")
amonkeys_outro <- str_extract(amonkeys_lyrics, "\\[Outro.*?\\].*?(?=\\[)")

I Love You So

walters_extracted <- read_html("https://genius.com/The-walters-i-love-you-so-lyrics")
walters_lyrics <- html_nodes(walters_extracted, ".Dzxov") %>% html_text() 
walters_lyrics <- paste(walters_lyrics[1], walters_lyrics[2], walters_lyrics[3])

walters_intro <- str_extract(walters_lyrics, "\\[Intro.*?\\].*?(?=\\[)")
walters_verse1 <- str_extract(walters_lyrics, "\\[Verse 1.*?\\].*?(?=\\[)")
walters_prechorus <- str_extract(walters_lyrics, "\\[Pre-Chorus.*?\\].*?(?=\\[)")
walters_chorus <- str_extract(walters_lyrics, "\\[Chorus.*?\\].*?(?=\\[)")
walters_verse2 <- str_extract(walters_lyrics, "\\[Verse 2.*?\\].*?(?=\\[)")
walters_postchorus <- str_extract(walters_lyrics, "\\[Post-Chorus.*?\\].*?(?=\\[)")
walters_bridge <- str_extract(walters_lyrics, "\\[Bridge.*?\\].*?(?=\\[)")
walters_outro <- str_extract(walters_lyrics, "\\[Outro.*?\\].*?(?=\\[)")

good 4 u

orodrigo_extracted <- read_html("https://genius.com/Olivia-rodrigo-good-4-u-lyrics")
orodrigo_lyrics <- html_nodes(orodrigo_extracted, ".Dzxov") %>% html_text() 
orodrigo_lyrics <- paste(orodrigo_lyrics[1], orodrigo_lyrics[2], orodrigo_lyrics[3])

orodrigo_intro <- str_extract(orodrigo_lyrics, "\\[Intro.*?\\].*?(?=\\[)")
orodrigo_verse1 <- str_extract(orodrigo_lyrics, "\\[Verse 1.*?\\].*?(?=\\[)")
orodrigo_prechorus <- str_extract(orodrigo_lyrics, "\\[Pre-Chorus.*?\\].*?(?=\\[)")
orodrigo_chorus <- str_extract(orodrigo_lyrics, "\\[Chorus.*?\\].*?(?=\\[)")
orodrigo_verse2 <- str_extract(orodrigo_lyrics, "\\[Verse 2.*?\\].*?(?=\\[)")
orodrigo_postchorus <- str_extract(orodrigo_lyrics, "\\[Post-Chorus.*?\\].*?(?=\\[)")
orodrigo_bridge <- str_extract(orodrigo_lyrics, "\\[Bridge.*?\\].*?(?=\\[)")
orodrigo_outro <- str_extract(orodrigo_lyrics, "\\[Outro.*?\\].*?(?=\\[)")

Combining Lyrics by Section

intro <- c(hstyles_intro, ganimals_intro, slacy_intro, jharlow_intro, klaroi_intro, kbush_intro, neighbourhood_intro, slacy2_intro, encanto_intro, lilnasx_intro, joji_intro, jcole_intro, kodak_intro, drake_intro, future_intro, lizzo_intro, idragons_intro, amonkeys_intro, walters_intro, orodrigo_intro)
intro <- str_remove_all(intro, "\\[Intro.*?]|,|\\[,|\\(|\\)")
intro <-  gsub("([A-Z])", "\n\\1", intro)

verse1 <- c(hstyles_verse1, ganimals_verse1, slacy_verse1, jharlow_verse1, klaroi_verse1, kbush_verse1, neighbourhood_verse1, slacy2_verse1, encanto_verse1, lilnasx_verse1, joji_verse1, jcole_verse1, kodak_verse1, drake_verse1, future_verse1, lizzo_verse1, idragons_verse1, amonkeys_verse1, walters_verse1, orodrigo_verse1)
verse1 <- str_remove_all(verse1, "\\[Verse 1.*?]|,|\\[,|\\(|\\)")
verse1 <- gsub("([A-Z])", " \\1", verse1)

pre_chorus <- c(hstyles_prechorus, ganimals_prechorus, slacy_prechorus, jharlow_prechorus, klaroi_prechorus, kbush_prechorus, neighbourhood_prechorus, slacy2_prechorus, encanto_prechorus, lilnasx_prechorus, joji_prechorus, jcole_prechorus, kodak_prechorus, drake_prechorus, future_prechorus, lizzo_prechorus, idragons_prechorus, amonkeys_prechorus, walters_prechorus, orodrigo_prechorus)
pre_chorus <- str_remove_all(pre_chorus, "\\[Pre-Chorus.*?]|,|\\[,|\\(|\\)")
pre_chorus <-  gsub("([A-Z])", "\n\\1", pre_chorus)

chorus <- c(hstyles_chorus, ganimals_chorus, slacy_chorus, jharlow_chorus, klaroi_chorus, kbush_chorus, neighbourhood_chorus, slacy2_chorus, encanto_chorus, lilnasx_chorus, joji_chorus, jcole_chorus, kodak_chorus, drake_chorus, future_chorus, lizzo_chorus, idragons_chorus, amonkeys_chorus, walters_chorus, orodrigo_chorus)
chorus <- str_remove_all(chorus, "\\[Chorus.*?]|[^[:alnum:]\\s]")
chorus <-  gsub("([A-Z])", "\n\\1", chorus)

post_chorus <- c(hstyles_postchorus, ganimals_postchorus, slacy_postchorus, jharlow_postchorus, klaroi_postchorus, kbush_postchorus, neighbourhood_postchorus, slacy2_postchorus, encanto_postchorus, lilnasx_postchorus, joji_postchorus, jcole_postchorus, kodak_postchorus, drake_postchorus, future_postchorus, lizzo_postchorus, idragons_postchorus, amonkeys_postchorus, walters_postchorus, orodrigo_postchorus)
post_chorus <- str_remove_all(post_chorus, "\\[Post-Chorus.*?]|[^[:alnum:]\\s]")
post_chorus <-  gsub("([A-Z])", "\n\\1", post_chorus)

verse2 <- c(hstyles_verse2, ganimals_verse2, slacy_verse2, jharlow_verse2, klaroi_verse2, kbush_verse2, neighbourhood_verse2, slacy2_verse2, encanto_verse2, lilnasx_verse2, joji_verse2, jcole_verse2, kodak_verse2, drake_verse2, future_verse2, lizzo_verse2, idragons_verse2, amonkeys_verse2, walters_verse2, orodrigo_verse2)
verse2 <- str_remove_all(verse2, "\\[Verse 2.*?]|\\(|\\)|\"|,|\\[,")
verse2 <- gsub("([A-Z])", " \\1", verse2)

bridge  <- c(hstyles_bridge, ganimals_bridge, slacy_bridge, jharlow_bridge, klaroi_bridge, kbush_bridge, neighbourhood_bridge, slacy2_bridge, encanto_bridge, lilnasx_bridge, joji_bridge, jcole_bridge, kodak_bridge, drake_bridge, future_bridge, lizzo_bridge, idragons_bridge, amonkeys_bridge, walters_bridge, orodrigo_bridge)
bridge <- str_remove_all(bridge, "\\[Bridge.*?]|,|\\[,|\\(|\\)")
bridge <-  gsub("([A-Z])", "\n\\1", bridge)

outro <- c(hstyles_outro, ganimals_outro, slacy_outro, jharlow_outro, klaroi_outro, kbush_outro, neighbourhood_outro, slacy2_outro, encanto_outro, lilnasx_outro, joji_outro, jcole_outro, kodak_outro, drake_outro, future_outro, lizzo_outro, idragons_outro, amonkeys_outro, walters_outro, orodrigo_outro)
outro <- str_remove_all(outro, "\\[Outro.*?]|,|\\[,|\\(|\\)")
outro <-  gsub("([A-Z])", "\n\\1", outro)
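The `gsub("([A-Z])", ...)` calls above put a break before every capital letter, presumably because `html_text()` glues lyric lines together with no separator. That also splits on capitals in the middle of a line, such as proper nouns. As a hedged alternative, assuming rvest 1.0 or later and assuming the lyric lines are separated by `<br>` tags (the markup below is a made-up stand-in, not the Genius page), `html_text2()` keeps those breaks as newlines:

```r
library(rvest)

# Made-up markup standing in for a lyrics container; the class name is hypothetical.
page <- minimal_html('<div class="lyrics">First line<br>Second line</div>')
node <- html_element(page, ".lyrics")

glued <- html_text(node)   # text nodes concatenated, <br> contributes nothing
split <- html_text2(node)  # <br> is converted to a newline
```

With line breaks preserved at extraction time, the section regexes would need to tolerate newlines (for example with the `(?s)` flag), so this is a design trade-off rather than a drop-in change.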

data <- data.frame(
  Artist = c("Harry Styles", "Glass Animals", "Steve Lacy", "Jack Harlow", "The Kid Laroi", "Kate Bush", "The Neighbourhood", "Steve Lacy", "Carolina Gaitan", "Lil Nas X", "Joji", "J. Cole", "Kodak Black", "Drake", "Future", "Lizzo", "Imagine Dragons", "Arctic Monkeys", "The Walters", "Olivia Rodrigo"),
  Song = c("As It Was", "Heat Waves", "Bad Habit", "First Class", "STAY (feat. Justin Bieber)", "Running Up That Hill (A Deal With God)", "Sweater Weather", "Dark Red", "We Don’t Talk About Bruno", "Industry Baby (feat. Jack Harlow)", "Glimpse of Us", "No Role Modelz", "Super Gremlin", "Knife Talk (with 21 Savage ft. Project Pat)", "WAIT FOR U (feat. Drake & Tems)", "About Damn Time", "Enemy (with JID)", "505", "I Love You So", "good 4 u")
)

data$intro <- intro
data$verse1 <- verse1
data$prechorus <- pre_chorus
data$chorus <- chorus
data$postchorus <- post_chorus
data$verse2 <- verse2
data$bridge <- bridge
data$outro <- outro
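The wide table above is assembled from one vector per song and section. As a rough sketch of the same layout built in a single pass (the toy lyric strings, song names, and section list are hypothetical), `sapply()` over the section labels gives one column per section and one row per song:

```r
library(stringr)

# Toy stand-ins for the scraped lyric strings (not real lyrics).
lyrics_by_song <- c(
  song_a = "[Intro]La la[Verse 1]One two[Chorus]Hey hey",
  song_b = "[Verse 1]Three four[Chorus]Ho ho[Outro]Bye"
)

# Column name -> tag label as it appears in the text.
sections <- c(intro = "Intro", verse1 = "Verse 1", chorus = "Chorus", outro = "Outro")

# Each section label yields a character vector with one entry per song;
# sapply() binds them into a matrix whose columns are named after `sections`.
wide <- as.data.frame(
  sapply(sections, function(label) {
    str_extract(lyrics_by_song, paste0("\\[", label, ".*?\\].*?(?=\\[|$)"))
  }),
  stringsAsFactors = FALSE
)
wide$Song <- names(lyrics_by_song)
```

Sections a song does not have come back as `NA`, which matches how `str_extract()` behaves in the per-song code.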

Wide -> Long DataFrame

pivot_data <- data %>%
  pivot_longer(cols = -c(Artist, Song), names_to = "Section", values_to = "Lyrics") %>%
  mutate(Lyrics = str_split(Lyrics, pattern = "\\s+")) %>%
  unnest(cols = c(Lyrics))

pivot_data <- pivot_data %>%
filter(Lyrics != "") 
pivot_data

## # A tibble: 5,036 × 4
##    Artist       Song      Section Lyrics   
##    <chr>        <chr>     <chr>   <chr>    
##  1 Harry Styles As It Was intro   Come     
##  2 Harry Styles As It Was intro   on       
##  3 Harry Styles As It Was intro   Harry    
##  4 Harry Styles As It Was intro   we       
##  5 Harry Styles As It Was intro   wanna    
##  6 Harry Styles As It Was intro   say      
##  7 Harry Styles As It Was intro   goodnight
##  8 Harry Styles As It Was intro   to       
##  9 Harry Styles As It Was intro   you[     
## 10 Harry Styles As It Was verse1  Holdin'  
## # ℹ 5,026 more rows
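To make the reshaping step easier to follow, here is a small self-contained sketch of the same wide-to-long tokenization on toy data (the table contents are invented). It drops missing sections explicitly instead of relying on the `Lyrics != ""` filter to discard `NA` rows:

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy wide table: one row per song, one column per section (not real lyrics).
toy <- tibble(
  Song   = c("song_a", "song_b"),
  intro  = c("La la  la", NA),
  chorus = c("Hey hey", "Ho  ho ho")
)

long <- toy %>%
  pivot_longer(cols = -Song, names_to = "Section", values_to = "Lyrics") %>%
  filter(!is.na(Lyrics)) %>%                      # sections the song does not have
  mutate(Lyrics = str_split(Lyrics, "\\s+")) %>%  # list-column of tokens
  unnest(cols = c(Lyrics)) %>%                    # one row per token
  filter(Lyrics != "")
```

Each row of `long` is a single token tagged with its song and section, which is the shape the later counts and joins rely on.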

Removing the Stop Words

pivot_data_cleaned <- pivot_data # copy of pivot_data from which the stop words are removed
stopwords <- tolower(stop_words$word)

pivot_data_cleaned$Lyrics <- ifelse(tolower(pivot_data$Lyrics) %in% stopwords, "", pivot_data$Lyrics)
pivot_data_cleaned <- pivot_data_cleaned %>%
filter(Lyrics != "") 


more_stopwords <- c("yo", "yeah", "i'ma", "ooh-woah", "l.", "a.", "youre", "im", "id", "dont", "alright", "woah", "ooh", "oh-oh-oh", "a—[", "ah", "ayy", "d-", "baby", "uh", "hah", "nigga", "niggas", "damn", "shit", "y'all", "ch", "'bout", "gettin'", "comin'", "babe", "mmm")

pivot_data_cleaned <- pivot_data_cleaned %>%
  filter(!Lyrics %in% more_stopwords)
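The `more_stopwords` filter compares tokens exactly as written, so capitalized variants are not caught until the tokens are lower-cased; this is consistent with "yeah" and "ayy" still showing up in the intro frequency table later on. A minimal base R sketch of a case-insensitive comparison, using invented tokens and a shortened stop list:

```r
# Toy tokens and a toy stop-word list (both hypothetical).
tokens <- c("Yeah", "the", "Heat", "waves", "OOH", "time")
stop_list <- c("yeah", "the", "ooh")

# Lower-case only for the comparison so the original casing is kept in the result.
kept <- tokens[!tolower(tokens) %in% stop_list]
```

The same idea carries over to the dplyr pipeline by writing the condition as `!tolower(Lyrics) %in% more_stopwords`.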

Intro

library(knitr)
library(kableExtra)

## 
## Attaching package: 'kableExtra'

## The following object is masked from 'package:dplyr':
## 
##     group_rows

avg_words_with_stopwords <- pivot_data %>%
  filter(Section == "intro") %>%
  mutate(Lyrics = tolower(Lyrics)) %>%
  group_by(Artist) %>%
  summarise(average_words = mean(length(Lyrics))) %>%
  summarise(mean_average_words_with_stopwords = mean(average_words))
 avg_words_with_stopwords

## # A tibble: 1 × 1
##   mean_average_words_with_stopwords
##                               <dbl>
## 1                              30.2

avg_words_no_stopwords <- pivot_data_cleaned %>%
  filter(Section == "intro") %>%
  mutate(Lyrics = tolower(Lyrics)) %>%
  group_by(Artist) %>%
  summarise(average_words = mean(length(Lyrics))) %>%
  summarise(mean_average_words = mean(average_words))
avg_words_no_stopwords

## # A tibble: 1 × 1
##   mean_average_words
##                <dbl>
## 1               13.7

avg_unique_words_verse <- pivot_data_cleaned %>%
  filter(Section == "intro") %>%
  mutate(Lyrics = tolower(Lyrics)) %>%
  group_by(Artist) %>%
  summarise(average_unique_words = mean(length(unique(Lyrics)))) %>%
  summarise(mean_average_unique_words = mean(average_unique_words))
avg_unique_words_verse

## # A tibble: 1 × 1
##   mean_average_unique_words
##                       <dbl>
## 1                      10.8

dtaf <- c(avg_words_with_stopwords, avg_words_no_stopwords, avg_unique_words_verse)
dtaf <- data.frame(dtaf)
dtaf <- pivot_longer(dtaf, everything(), names_to = "Average", values_to = "Value")
dtaf$Average <- c("Average Including Stopwords", "Average Excluding Stopwords", "Average Unique Words (excluding stopwords)")

kable(dtaf, format = "html", caption = "Summary Table") %>%
  kable_styling("striped", full_width = FALSE)

Summary Table
Average	Value
Average Including Stopwords	30.18182
Average Excluding Stopwords	13.66667
Average Unique Words (excluding stopwords)	10.77778
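The three averages in this table are recomputed with near-identical pipelines for every section. As a sketch only (the helper name `summarise_section` and the toy tokens are hypothetical), the word and unique-word averages can come from one function. Note that this version groups by `Song`, which is an assumption that differs from the `group_by(Artist)` used in this analysis, where an artist with two songs is pooled into one group:

```r
library(dplyr)

# Hypothetical helper: per-song token counts for one section, averaged across songs.
summarise_section <- function(tokens, section) {
  tokens %>%
    filter(Section == section) %>%
    mutate(Lyrics = tolower(Lyrics)) %>%
    group_by(Song) %>%
    summarise(words = n(),
              unique_words = n_distinct(Lyrics),
              .groups = "drop") %>%
    summarise(mean_words = mean(words),
              mean_unique_words = mean(unique_words))
}

# Toy long-format tokens (not real lyrics).
toy <- tibble(
  Song    = c("a", "a", "a", "b", "b"),
  Section = "intro",
  Lyrics  = c("La", "la", "hey", "Go", "now")
)

res <- summarise_section(toy, "intro")
```

Calling it once on the uncleaned tokens and once on the stop-word-filtered tokens would give the with-stop-word and without-stop-word figures for any section.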

df <- data.frame(
  Component = c("Average Words (+ SW)", 
                "Average Words (-SW)", 
                "Average Unique Words (-SW)"),
  Count = c(avg_words_with_stopwords$mean_average_words_with_stopwords, avg_words_no_stopwords$mean_average_words, avg_unique_words_verse$mean_average_unique_words)
)

 ggplot(df, aes(x = Component, y = Count, fill = Component)) +
  geom_bar(stat = "identity") +
  labs(title = "Intro Word Composition", y = "Count") +
  theme_minimal() +
  theme(plot.margin = margin(5, 5, 5, 5, "mm"),
        legend.position = "none",
        plot.title = element_text(hjust = 0.5, vjust = 0.5, size = 14, margin = margin(5, 0, 5, 0)),
        plot.title.position = "plot")

 pivot_data_intro <- pivot_data_cleaned %>%
  filter(Section == "intro") %>%
  mutate(Lyrics = tolower(Lyrics)) %>%
  group_by(Lyrics) %>%
  summarise(n = n()) %>%
  arrange(desc(n)) 
pivot_data_intro

## # A tibble: 92 × 2
##    Lyrics      n
##    <chr>   <int>
##  1 gang        4
##  2 heat        4
##  3 jacob       4
##  4 wait        4
##  5 couple      3
##  6 time        3
##  7 yeah        3
##  8 ayy         2
##  9 d-          2
## 10 dummies     2
## # ℹ 82 more rows

pivot_data_intro <- pivot_data_intro %>%
  filter(!Lyrics %in% more_stopwords)

pivot_data_intro %>%
  filter(n > 1) %>%
  ggplot(aes(Lyrics, n)) +
  geom_bar(stat = "identity", fill = "#22BBFE") +
  xlab("Lyrics") +
  ylab("Frequency") +
  ggtitle("Words Frequently Used in Intros") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) +
  coord_flip()

pivot_data_intro %>%
  rename(word = Lyrics) %>%
  inner_join(get_sentiments("bing"), by = "word")

## # A tibble: 9 × 3
##   word        n sentiment
##   <chr>   <int> <chr>    
## 1 shimmer     2 negative 
## 2 bleed       1 negative 
## 3 cheat       1 negative 
## 4 error       1 negative 
## 5 fell        1 negative 
## 6 lose        1 negative 
## 7 pretty      1 positive 
## 8 quicker     1 positive 
## 9 tops        1 positive
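The join above lists each matched word with its label. To read the overall balance at a glance, the counts can be summed per sentiment. The sketch below uses a tiny invented lexicon in place of `get_sentiments("bing")` so that it runs on its own; the words and counts are illustrative, not results from this analysis:

```r
library(dplyr)

# Toy word counts and a toy lexicon (both hypothetical stand-ins).
word_counts <- tibble(
  word = c("sweet", "hurt", "time", "perfect"),
  n    = c(3L, 2L, 4L, 2L)
)
toy_lexicon <- tibble(
  word      = c("sweet", "perfect", "hurt"),
  sentiment = c("positive", "positive", "negative")
)

# Keep only words present in the lexicon, then weight each sentiment by frequency.
sentiment_totals <- word_counts %>%
  inner_join(toy_lexicon, by = "word") %>%
  count(sentiment, wt = n, name = "occurrences")
```

Weighting by `n` counts every occurrence of a word rather than each distinct word once, which matters when a few words repeat often.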

Verse 1

avg_words_with_stopwords <- pivot_data %>%
  filter(Section == "verse1") %>%
  mutate(Lyrics = tolower(Lyrics)) %>%
  group_by(Artist) %>%
  summarise(average_words = mean(length(Lyrics))) %>%
  summarise(mean_average_words_with_stopwords = mean(average_words))
 avg_words_with_stopwords

## # A tibble: 1 × 1
##   mean_average_words_with_stopwords
##                               <dbl>
## 1                              65.4

avg_words_no_stopwords <- pivot_data_cleaned %>%
  filter(Section == "verse1") %>%
  mutate(Lyrics = tolower(Lyrics)) %>%
  group_by(Artist) %>%
  summarise(average_words = mean(length(Lyrics))) %>%
  summarise(mean_average_words = mean(average_words))
avg_words_no_stopwords

## # A tibble: 1 × 1
##   mean_average_words
##                <dbl>
## 1               20.4

avg_unique_words_verse <- pivot_data_cleaned %>%
  filter(Section == "verse1") %>%
  mutate(Lyrics = tolower(Lyrics)) %>%
  group_by(Artist) %>%
  summarise(average_unique_words = mean(length(unique(Lyrics)))) %>%
  summarise(mean_average_unique_words = mean(average_unique_words))
avg_unique_words_verse

## # A tibble: 1 × 1
##   mean_average_unique_words
##                       <dbl>
## 1                      18.3

dtaf <- c(avg_words_with_stopwords, avg_words_no_stopwords, avg_unique_words_verse)
dtaf <- data.frame(dtaf)
dtaf <- pivot_longer(dtaf, everything(), names_to = "Average", values_to = "Value")
dtaf$Average <- c("Average Including Stopwords", "Average Excluding Stopwords", "Average Unique Words (excluding stopwords)")

kable(dtaf, format = "html", caption = "Summary Table") %>%
  kable_styling("striped", full_width = FALSE)

Summary Table
Average	Value
Average Including Stopwords	65.36842
Average Excluding Stopwords	20.42105
Average Unique Words (excluding stopwords)	18.26316

df <- data.frame(
  Component = c("Average Words (+ SW)", 
                "Average Words (-SW)", 
                "Average Unique Words (-SW)"),
  Count = c(avg_words_with_stopwords$mean_average_words_with_stopwords, avg_words_no_stopwords$mean_average_words, avg_unique_words_verse$mean_average_unique_words)
)

ggplot(df, aes(x = Component, y = Count, fill = Component)) +
  geom_bar(stat = "identity") +
  labs(title = "Verse1 Word Composition", y = "Count") +
  theme_minimal() +
  theme(plot.margin = margin(5, 5, 5, 5, "mm"),
        legend.position = "none",
        plot.title = element_text(hjust = 0.5, vjust = 0.5, size = 14, margin = margin(10, 0, 10, 0)),
        plot.title.position = "plot")

 pivot_data_verse1 <- pivot_data_cleaned %>%
  filter(Section == "verse1") %>%
  mutate(Lyrics = tolower(Lyrics)) %>%
  group_by(Lyrics) %>%
  summarise(n = n()) %>%
  arrange(desc(n)) 

pivot_data_verse1 <- pivot_data_verse1 %>%
  filter(!Lyrics %in% more_stopwords)


pivot_data_verse1 %>%
  filter(n > 2) %>%
  ggplot(aes(Lyrics, n)) +
  geom_bar(stat = "identity", fill = "#22BBFE") +
  xlab("Lyrics") +
  ylab("Frequency") +
  ggtitle("Words Frequently Used in Verse 1") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) +
  coord_flip()

pivot_data_verse1 %>%
  filter(n > 2)

## # A tibble: 12 × 2
##    Lyrics       n
##    <chr>    <int>
##  1 gang         8
##  2 feel         6
##  3 day          4
##  4 girl         4
##  5 time         4
##  6 wanna        4
##  7 world        4
##  8 guess        3
##  9 leave        3
## 10 move         3
## 11 spinnin'     3
## 12 sweet        3
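Because `Lyrics` is a character column, ggplot2 orders the bars alphabetically rather than by frequency. The sketch below is one possible variation, not the original chart: it rebuilds the twelve rows printed above as a small data frame (`verse1_top` is a name introduced here for illustration) and uses `reorder()` so the most frequent word appears at the top after `coord_flip()`.

```r
library(ggplot2)

# The twelve verse 1 words with n > 2, copied from the printed tibble above
verse1_top <- data.frame(
  Lyrics = c("gang", "feel", "day", "girl", "time", "wanna", "world",
             "guess", "leave", "move", "spinnin'", "sweet"),
  n = c(8, 6, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3)
)

# reorder() turns Lyrics into a factor whose levels are sorted by n,
# so the bars are drawn in order of frequency instead of alphabetically
p <- ggplot(verse1_top, aes(x = reorder(Lyrics, n), y = n)) +
  geom_col(fill = "#22BBFE") +
  labs(x = "Lyrics", y = "Frequency",
       title = "Words Frequently Used in Verse 1") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) +
  coord_flip()
p
```

`geom_col()` is equivalent to `geom_bar(stat = "identity")`, so the only behavioural change is the bar order.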

pivot_data_verse1 %>%
  rename(word = Lyrics) %>%
  inner_join(get_sentiments("bing"), by = "word")

## # A tibble: 34 × 3
##    word        n sentiment
##    <chr>   <int> <chr>    
##  1 sweet       3 positive 
##  2 bad         2 negative 
##  3 bitch       2 negative 
##  4 fuck        2 negative 
##  5 hard        2 negative 
##  6 helped      2 positive 
##  7 hurt        2 negative 
##  8 perfect     2 positive 
##  9 wasted      2 negative 
## 10 adore       1 positive 
## # ℹ 24 more rows

Pre-Chorus

Across the pre-choruses of the sampled songs, a pre-chorus averages 22.4 words with stop words included and 4.3 words with stop words removed, 3.6 of which are unique.
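The same three averages are computed for every section, so the calculation can be wrapped in one function. The sketch below is an illustration under assumptions, not the code used for the report: `section_word_stats()`, `toy_all` and `toy_cleaned` are hypothetical names, and the toy tibbles only mimic the `Artist`, `Section` and `Lyrics` columns of `pivot_data` and `pivot_data_cleaned`. Like the report's code, it groups by `Artist`, which pools the songs of any artist who appears more than once.

```r
library(dplyr)

# Toy stand-ins for pivot_data (all words) and pivot_data_cleaned (stop words removed)
toy_all <- tibble::tibble(
  Artist  = c(rep("A", 5), rep("B", 3)),
  Section = "prechorus",
  Lyrics  = c("I", "feel", "the", "Heat", "heat", "we", "wait", "here")
)
toy_cleaned <- filter(toy_all, !tolower(Lyrics) %in% c("i", "the", "we", "here"))

# Hypothetical helper: per-artist word counts for one section, averaged over artists
section_word_stats <- function(all_words, cleaned_words, section) {
  per_artist <- function(data) {
    data %>%
      filter(.data$Section == .env$section) %>%
      mutate(Lyrics = tolower(Lyrics)) %>%
      group_by(Artist) %>%
      summarise(words = n(), unique_words = n_distinct(Lyrics), .groups = "drop")
  }
  with_sw <- per_artist(all_words)
  no_sw   <- per_artist(cleaned_words)
  tibble::tibble(
    Average = c("Average Including Stopwords",
                "Average Excluding Stopwords",
                "Average Unique Words (excluding stopwords)"),
    Value   = c(mean(with_sw$words), mean(no_sw$words), mean(no_sw$unique_words))
  )
}

stats <- section_word_stats(toy_all, toy_cleaned, "prechorus")
stats
```

On the toy data, artist A has 5 words (3 after cleaning, 2 of them unique) and artist B has 3 words (1 after cleaning), so the three values are 4, 2 and 1.5.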

avg_words_with_stopwords <- pivot_data %>%
  filter(Section == "prechorus") %>%
  mutate(Lyrics = tolower(Lyrics)) %>%
  group_by(Artist) %>%
  summarise(average_words = mean(length(Lyrics))) %>%
  summarise(mean_average_words_with_stopwords = mean(average_words))
 avg_words_with_stopwords

## # A tibble: 1 × 1
##   mean_average_words_with_stopwords
##                               <dbl>
## 1                              22.4

avg_words_no_stopwords <- pivot_data_cleaned %>%
  filter(Section == "prechorus") %>%
  mutate(Lyrics = tolower(Lyrics)) %>%
  group_by(Artist) %>%
  summarise(average_words = mean(length(Lyrics))) %>%
  summarise(mean_average_words = mean(average_words))
avg_words_no_stopwords

## # A tibble: 1 × 1
##   mean_average_words
##                <dbl>
## 1               4.29

avg_unique_words_verse <- pivot_data_cleaned %>%
  filter(Section == "prechorus") %>%
  mutate(Lyrics = tolower(Lyrics)) %>%
  group_by(Artist) %>%
  summarise(average_unique_words = mean(length(unique(Lyrics)))) %>%
  summarise(mean_average_unique_words = mean(average_unique_words))
avg_unique_words_verse

## # A tibble: 1 × 1
##   mean_average_unique_words
##                       <dbl>
## 1                      3.57

dtaf <- c(avg_words_with_stopwords, avg_words_no_stopwords, avg_unique_words_verse)
dtaf <- data.frame(dtaf)
dtaf <- pivot_longer(dtaf, everything(), names_to = "Average", values_to = "Value")
dtaf$Average <- c("Average Including Stopwords", "Average Excluding Stopwords", "Average Unique Words (excluding stopwords)")


kable(dtaf, format = "html", caption = "Summary Table") %>%
  kable_styling("striped", full_width = FALSE)

Summary Table
Average	Value
Average Including Stopwords	22.375000
Average Excluding Stopwords	4.285714
Average Unique Words (excluding stopwords)	3.571429

df <- data.frame(
  Component = c("Average Words (+ SW)", 
                "Average Words (-SW)", 
                "Average Unique Words (-SW)"),
  Count = c(avg_words_with_stopwords$mean_average_words_with_stopwords, avg_words_no_stopwords$mean_average_words, avg_unique_words_verse$mean_average_unique_words)
)

ggplot(df, aes(x = Component, y = Count, fill = Component)) +
  geom_bar(stat = "identity") +
  labs(title = "Pre-Chorus Word Composition", y = "Count") +
  theme_minimal() +
  theme(plot.margin = margin(5, 5, 5, 5, "mm"),
        legend.position = "none",
        plot.title = element_text(hjust = 0.5, vjust = 0.5, size = 14, margin = margin(10, 0, 10, 0)),
        plot.title.position = "plot")

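pivot_data_prechorus <- pivot_data_cleaned %>%
  filter(Section == "prechorus") %>%
  mutate(Lyrics = tolower(Lyrics)) %>%
  group_by(Lyrics) %>%
  summarise(n = n()) %>%
  arrange(desc(n)) 
# NOTE: this prints pivot_data_intro (the Intro counts), so the table below is not the
# pre-chorus table; pivot_data_prechorus was presumably intended here.
pivot_data_intro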

## # A tibble: 87 × 2
##    Lyrics      n
##    <chr>   <int>
##  1 gang        4
##  2 heat        4
##  3 jacob       4
##  4 wait        4
##  5 couple      3
##  6 time        3
##  7 dummies     2
##  8 night       2
##  9 road        2
## 10 shimmer     2
## # ℹ 77 more rows

pivot_data_prechorus <- pivot_data_prechorus %>%
  filter(!Lyrics %in% more_stopwords)

pivot_data_prechorus %>%
  filter(n > 1) %>%
  ggplot(aes(Lyrics, n)) +
  geom_bar(stat = "identity", fill = "#22BBFE") +
  xlab("Lyrics") +
  ylab("Frequency") +
  ggtitle("Words Frequently Used in Pre-Chorus") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) +
  coord_flip()

pivot_data_prechorus %>%
  rename(word = Lyrics) %>%
  inner_join(get_sentiments("bing"), by = "word")

## # A tibble: 7 × 3
##   word      n sentiment
##   <chr> <int> <chr>    
## 1 love      2 positive 
## 2 bitch     1 negative 
## 3 fine      1 positive 
## 4 funny     1 negative 
## 5 hate      1 negative 
## 6 lame      1 negative 
## 7 lost      1 negative

Chorus

avg_words_with_stopwords <- pivot_data %>%
  filter(Section == "chorus") %>%
  mutate(Lyrics = tolower(Lyrics)) %>%
  group_by(Artist) %>%
  summarise(average_words = mean(length(Lyrics))) %>%
  summarise(mean_average_words_with_stopwords = mean(average_words))
 avg_words_with_stopwords

## # A tibble: 1 × 1
##   mean_average_words_with_stopwords
##                               <dbl>
## 1                              53.6

avg_words_no_stopwords <- pivot_data_cleaned %>%
  filter(Section == "chorus") %>%
  mutate(Lyrics = tolower(Lyrics)) %>%
  group_by(Artist) %>%
  summarise(average_words = mean(length(Lyrics))) %>%
  summarise(mean_average_words = mean(average_words))
avg_words_no_stopwords

## # A tibble: 1 × 1
##   mean_average_words
##                <dbl>
## 1               14.3

avg_unique_words_verse <- pivot_data_cleaned %>%
  filter(Section == "chorus") %>%
  mutate(Lyrics = tolower(Lyrics)) %>%
  group_by(Artist) %>%
  summarise(average_unique_words = mean(length(unique(Lyrics)))) %>%
  summarise(mean_average_unique_words = mean(average_unique_words))
avg_unique_words_verse

## # A tibble: 1 × 1
##   mean_average_unique_words
##                       <dbl>
## 1                      9.26

dtaf <- c(avg_words_with_stopwords, avg_words_no_stopwords, avg_unique_words_verse)
dtaf <- data.frame(dtaf)
dtaf <- pivot_longer(dtaf, everything(), names_to = "Average", values_to = "Value")
dtaf$Average <- c("Average Including Stopwords", "Average Excluding Stopwords", "Average Unique Words (excluding stopwords)")


kable(dtaf, format = "html", caption = "Summary Table") %>%
  kable_styling("striped", full_width = FALSE)

Summary Table
Average	Value
Average Including Stopwords	53.631579
Average Excluding Stopwords	14.263158
Average Unique Words (excluding stopwords)	9.263158

df <- data.frame(
  Component = c("Average Words (+ SW)", 
                "Average Words (-SW)", 
                "Average Unique Words (-SW)"),
  Count = c(avg_words_with_stopwords$mean_average_words_with_stopwords, avg_words_no_stopwords$mean_average_words, avg_unique_words_verse$mean_average_unique_words)
)

ggplot(df, aes(x = Component, y = Count, fill = Component)) +
  geom_bar(stat = "identity") +
  labs(title = "Chorus Word Composition", y = "Count") +
  theme_minimal() +
  theme(plot.margin = margin(5, 5, 5, 5, "mm"),
        legend.position = "none",
        plot.title = element_text(hjust = 0.5, vjust = 0.5, size = 14, margin = margin(10, 0, 10, 0)),
        plot.title.position = "plot")

 pivot_data_chorus <- pivot_data_cleaned %>%
  filter(Section == "chorus") %>%
  mutate(Lyrics = tolower(Lyrics)) %>%
  group_by(Lyrics) %>%
  summarise(n = n()) %>%
  arrange(desc(n)) 
pivot_data_chorus

## # A tibble: 153 × 2
##    Lyrics     n
##    <chr>  <int>
##  1 im        18
##  2 gang       8
##  3 wait       7
##  4 dont       6
##  5 wanna      6
##  6 time       5
##  7 class      4
##  8 love       4
##  9 save       4
## 10 saved      4
## # ℹ 143 more rows

pivot_data_chorus <- pivot_data_chorus %>%
  filter(!Lyrics %in% more_stopwords)

pivot_data_chorus %>%
  filter(n > 2) %>%
  ggplot(aes(Lyrics, n)) +
   geom_bar(stat = "identity", fill = "#22BBFE") +
  xlab("Lyrics") +
  ylab("Frequency") +
  ggtitle("Words Frequently Used in Chorus") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) +
  coord_flip()

pivot_data_chorus %>%
  filter(n > 2) %>%
  rename(word = Lyrics) %>%
  inner_join(get_sentiments("nrc"), by = "word")

## # A tibble: 11 × 3
##    word      n sentiment   
##    <chr> <int> <chr>       
##  1 gang      8 anger       
##  2 gang      8 fear        
##  3 gang      8 negative    
##  4 wait      7 anticipation
##  5 wait      7 negative    
##  6 time      5 anticipation
##  7 love      4 joy         
##  8 love      4 positive    
##  9 save      4 joy         
## 10 save      4 positive    
## 11 save      4 trust
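The joined table lists one row per word and sentiment, which makes it hard to see which sentiments dominate. As a rough sketch, the rows printed above can be tallied per sentiment, weighting each word by its frequency. The tibble `chorus_nrc` is typed in by hand from that output so the example runs without downloading the NRC lexicon; the name is introduced here for illustration.

```r
library(dplyr)

# Rows copied from the printed chorus/NRC join above
chorus_nrc <- tibble::tribble(
  ~word,  ~n, ~sentiment,
  "gang",  8, "anger",
  "gang",  8, "fear",
  "gang",  8, "negative",
  "wait",  7, "anticipation",
  "wait",  7, "negative",
  "time",  5, "anticipation",
  "love",  4, "joy",
  "love",  4, "positive",
  "save",  4, "joy",
  "save",  4, "positive",
  "save",  4, "trust"
)

# Sum word frequencies within each sentiment category
chorus_totals <- chorus_nrc %>%
  count(sentiment, wt = n, name = "total", sort = TRUE)
chorus_totals
```

In this tally "negative" has the largest total (15, from "gang" and "wait"), followed by "anticipation" (12). The totals cover only words with n > 2, as in the filter above.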

Post-Chorus

Across the post-choruses of the sampled songs, a post-chorus averages 39.7 words with stop words included and 8.3 words with stop words removed, 7.3 of which are unique.

avg_words_with_stopwords <- pivot_data %>%
  filter(Section == "postchorus") %>%
  mutate(Lyrics = tolower(Lyrics)) %>%
  group_by(Artist) %>%
  summarise(average_words = mean(length(Lyrics))) %>%
  summarise(mean_average_words_with_stopwords = mean(average_words))
 avg_words_with_stopwords

## # A tibble: 1 × 1
##   mean_average_words_with_stopwords
##                               <dbl>
## 1                              39.7

avg_words_no_stopwords <- pivot_data_cleaned %>%
  filter(Section == "postchorus") %>%
  mutate(Lyrics = tolower(Lyrics)) %>%
  group_by(Artist) %>%
  summarise(average_words = mean(length(Lyrics))) %>%
  summarise(mean_average_words = mean(average_words))
avg_words_no_stopwords

## # A tibble: 1 × 1
##   mean_average_words
##                <dbl>
## 1               8.33

avg_unique_words_verse <- pivot_data_cleaned %>%
  filter(Section == "postchorus") %>%
  mutate(Lyrics = tolower(Lyrics)) %>%
  group_by(Artist) %>%
  summarise(average_unique_words = mean(length(unique(Lyrics)))) %>%
  summarise(mean_average_unique_words = mean(average_unique_words))
avg_unique_words_verse

## # A tibble: 1 × 1
##   mean_average_unique_words
##                       <dbl>
## 1                      7.33

dtaf <- c(avg_words_with_stopwords, avg_words_no_stopwords, avg_unique_words_verse)
dtaf <- data.frame(dtaf)
dtaf <- pivot_longer(dtaf, everything(), names_to = "Average", values_to = "Value")
dtaf$Average <- c("Average Including Stopwords", "Average Excluding Stopwords", "Average Unique Words (excluding stopwords)")


kable(dtaf, format = "html", caption = "Summary Table") %>%
  kable_styling("striped", full_width = FALSE)

Summary Table
Average	Value
Average Including Stopwords	39.666667
Average Excluding Stopwords	8.333333
Average Unique Words (excluding stopwords)	7.333333

df <- data.frame(
  Component = c("Average Words (+ SW)", 
                "Average Words (-SW)", 
                "Average Unique Words (-SW)"),
  Count = c(avg_words_with_stopwords$mean_average_words_with_stopwords, avg_words_no_stopwords$mean_average_words, avg_unique_words_verse$mean_average_unique_words)
)

ggplot(df, aes(x = Component, y = Count, fill = Component)) +
  geom_bar(stat = "identity") +
  labs(title = "Post-Chorus Word Composition", y = "Count") +
  theme_minimal() +
  theme(plot.margin = margin(5, 5, 5, 5, "mm"),
        legend.position = "none",
        plot.title = element_text(hjust = 0.5, vjust = 0.5, size = 14, margin = margin(10, 0, 10, 0)),
        plot.title.position = "plot")

 pivot_data_postchorus <- pivot_data_cleaned %>%
  filter(Section == "postchorus") %>%
  mutate(Lyrics = tolower(Lyrics)) %>%
  group_by(Lyrics) %>%
  summarise(n = n()) %>%
  arrange(desc(n)) 
pivot_data_postchorus

## # A tibble: 21 × 2
##    Lyrics      n
##    <chr>   <int>
##  1 im          3
##  2 yeah        3
##  3 ate         1
##  4 bet         1
##  5 bitch       1
##  6 blick       1
##  7 caught      1
##  8 fake        1
##  9 gremlin     1
## 10 kit         1
## # ℹ 11 more rows

pivot_data_postchorus <- pivot_data_postchorus %>%
  filter(!Lyrics %in% more_stopwords)


pivot_data_postchorus %>%
  filter(n > 1) %>%
  ggplot(aes(Lyrics, n)) +
  geom_bar(stat = "identity") +
  xlab("Lyrics") +
  ylab("Frequency") +
  ggtitle("Words Frequently Used in Post-Chorus") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) +
  coord_flip()

pivot_data_postchorus %>%
  rename(word = Lyrics) %>%
  inner_join(get_sentiments("bing"), by = "word")

## # A tibble: 3 × 3
##   word        n sentiment
##   <chr>   <int> <chr>    
## 1 bitch       1 negative 
## 2 fake        1 negative 
## 3 unhappy     1 negative

Verse 2

On average, the second verse of the sampled songs contains 86.3 words with stop words included and 27.9 words with stop words removed, 25.6 of which are unique.

avg_words_with_stopwords <- pivot_data %>%
  filter(Section == "verse2") %>%
  mutate(Lyrics = tolower(Lyrics)) %>%
  group_by(Artist) %>%
  summarise(average_words = mean(length(Lyrics))) %>%
  summarise(mean_average_words_with_stopwords = mean(average_words))
 avg_words_with_stopwords

## # A tibble: 1 × 1
##   mean_average_words_with_stopwords
##                               <dbl>
## 1                              86.3

avg_words_no_stopwords <- pivot_data_cleaned %>%
  filter(Section == "verse2") %>%
  mutate(Lyrics = tolower(Lyrics)) %>%
  group_by(Artist) %>%
  summarise(average_words = mean(length(Lyrics))) %>%
  summarise(mean_average_words = mean(average_words))
avg_words_no_stopwords

## # A tibble: 1 × 1
##   mean_average_words
##                <dbl>
## 1               27.9

avg_unique_words_verse <- pivot_data_cleaned %>%
  filter(Section == "verse2") %>%
  mutate(Lyrics = tolower(Lyrics)) %>%
  group_by(Artist) %>%
  summarise(average_unique_words = mean(length(unique(Lyrics)))) %>%
  summarise(mean_average_unique_words = mean(average_unique_words))
avg_unique_words_verse

## # A tibble: 1 × 1
##   mean_average_unique_words
##                       <dbl>
## 1                      25.6

dtaf <- c(avg_words_with_stopwords, avg_words_no_stopwords, avg_unique_words_verse)
dtaf <- data.frame(dtaf)
dtaf <- pivot_longer(dtaf, everything(), names_to = "Average", values_to = "Value")
dtaf$Average <- c("Average Including Stopwords", "Average Excluding Stopwords", "Average Unique Words (excluding stopwords)")


kable(dtaf, format = "html", caption = "Summary Table") %>%
  kable_styling("striped", full_width = FALSE)

Summary Table
Average	Value
Average Including Stopwords	86.31579
Average Excluding Stopwords	27.89474
Average Unique Words (excluding stopwords)	25.57895

df <- data.frame(
  Component = c("Average Words (+ SW)", 
                "Average Words (-SW)", 
                "Average Unique Words (-SW)"),
  Count = c(avg_words_with_stopwords$mean_average_words_with_stopwords, avg_words_no_stopwords$mean_average_words, avg_unique_words_verse$mean_average_unique_words)
)

ggplot(df, aes(x = Component, y = Count, fill = Component)) +
  geom_bar(stat = "identity") +
  labs(title = "Verse 2 Word Composition", y = "Count") +
  theme_minimal() +
  theme(plot.margin = margin(5, 5, 5, 5, "mm"),
        legend.position = "none",
        plot.title = element_text(hjust = 0.5, vjust = 0.5, size = 14, margin = margin(10, 0, 10, 0)),
        plot.title.position = "plot")

 pivot_data_verse2 <- pivot_data_cleaned %>%
  filter(Section == "verse2") %>%
  mutate(Lyrics = tolower(Lyrics)) %>%
  group_by(Lyrics) %>%
  summarise(n = n()) %>%
  arrange(desc(n)) 

pivot_data_verse2 <- pivot_data_verse2 %>%
  filter(!Lyrics %in% more_stopwords)

pivot_data_verse2 %>%
  filter(n > 2) %>%
  ggplot(aes(Lyrics, n)) +
   geom_bar(stat = "identity", fill = "#22BBFE") +
  xlab("Lyrics") +
  ylab("Frequency") +
  ggtitle("Words Frequently Used in Verse 2") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) +
  coord_flip()

pivot_data_verse2 %>%
  rename(word = Lyrics) %>%
  inner_join(get_sentiments("nrc"), by = "word")

## # A tibble: 208 × 3
##    word      n sentiment   
##    <chr> <int> <chr>       
##  1 wait      9 anticipation
##  2 wait      9 negative    
##  3 time      7 anticipation
##  4 guess     4 surprise    
##  5 leave     3 negative    
##  6 leave     3 sadness     
##  7 leave     3 surprise    
##  8 love      3 joy         
##  9 love      3 positive    
## 10 start     3 anticipation
## # ℹ 198 more rows

Bridge

Across the bridges of the sampled songs, a bridge averages 50.1 words with stop words included and 14.5 words with stop words removed, 9.25 of which are unique.

avg_words_with_stopwords <- pivot_data %>%
  filter(Section == "bridge") %>%
  mutate(Lyrics = tolower(Lyrics)) %>%
  group_by(Artist) %>%
  summarise(average_words = mean(length(Lyrics))) %>%
  summarise(mean_average_words_with_stopwords = mean(average_words))
 avg_words_with_stopwords

## # A tibble: 1 × 1
##   mean_average_words_with_stopwords
##                               <dbl>
## 1                              50.1

avg_words_no_stopwords <- pivot_data_cleaned %>%
  filter(Section == "bridge") %>%
  mutate(Lyrics = tolower(Lyrics)) %>%
  group_by(Artist) %>%
  summarise(average_words = mean(length(Lyrics))) %>%
  summarise(mean_average_words = mean(average_words))
avg_words_no_stopwords

## # A tibble: 1 × 1
##   mean_average_words
##                <dbl>
## 1               14.5

avg_unique_words_verse <- pivot_data_cleaned %>%
  filter(Section == "bridge") %>%
  mutate(Lyrics = tolower(Lyrics)) %>%
  group_by(Artist) %>%
  summarise(average_unique_words = mean(length(unique(Lyrics)))) %>%
  summarise(mean_average_unique_words = mean(average_unique_words))
avg_unique_words_verse

## # A tibble: 1 × 1
##   mean_average_unique_words
##                       <dbl>
## 1                      9.25

dtaf <- c(avg_words_with_stopwords, avg_words_no_stopwords, avg_unique_words_verse)
dtaf <- data.frame(dtaf)
dtaf <- pivot_longer(dtaf, everything(), names_to = "Average", values_to = "Value")
dtaf$Average <- c("Average Including Stopwords", "Average Excluding Stopwords", "Average Unique Words (excluding stopwords)")


kable(dtaf, format = "html", caption = "Summary Table") %>%
  kable_styling("striped", full_width = FALSE)

Summary Table
Average	Value
Average Including Stopwords	50.125
Average Excluding Stopwords	14.500
Average Unique Words (excluding stopwords)	9.250

df <- data.frame(
  Component = c("Average Words (+ SW)", 
                "Average Words (-SW)", 
                "Average Unique Words (-SW)"),
  Count = c(avg_words_with_stopwords$mean_average_words_with_stopwords, avg_words_no_stopwords$mean_average_words, avg_unique_words_verse$mean_average_unique_words)
)

ggplot(df, aes(x = Component, y = Count, fill = Component)) +
  geom_bar(stat = "identity") +
  labs(title = "Bridge Word Composition", y = "Count") +
  theme_minimal() +
  theme(plot.margin = margin(5, 5, 5, 5, "mm"),
        legend.position = "none",
        plot.title = element_text(hjust = 0.5, vjust = 0.5, size = 14, margin = margin(10, 0, 10, 0)),
        plot.title.position = "plot")

 pivot_data_bridge <- pivot_data_cleaned %>%
  filter(Section == "bridge") %>%
  mutate(Lyrics = tolower(Lyrics)) %>%
  group_by(Lyrics) %>%
  summarise(n = n()) %>%
  arrange(desc(n)) 
pivot_data_bridge

## # A tibble: 74 × 2
##    Lyrics        n
##    <chr>     <int>
##  1 tonight      14
##  2 woah          5
##  3 emotional     4
##  4 time          3
##  5 bad           2
##  6 cared         2
##  7 comin'        2
##  8 darlin'       2
##  9 decide        2
## 10 fakin'        2
## # ℹ 64 more rows

pivot_data_bridge <- pivot_data_bridge %>%
  filter(!Lyrics %in% more_stopwords)

pivot_data_bridge %>%
  filter(n > 2) %>%
  ggplot(aes(Lyrics, n)) +
   geom_bar(stat = "identity", fill = "#22BBFE") +
  xlab("Lyrics") +
  ylab("Frequency") +
  ggtitle("Words Frequently Used in Bridge") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) +
  coord_flip()

pivot_data_bridge %>%
  rename(word = Lyrics) %>%
  inner_join(get_sentiments("bing"), by = "word")

## # A tibble: 14 × 3
##    word            n sentiment
##    <chr>       <int> <chr>    
##  1 bad             2 negative 
##  2 woo             2 positive 
##  3 wound           2 negative 
##  4 wrong           2 negative 
##  5 angel           1 positive 
##  6 apathy          1 negative 
##  7 comfortable     1 positive 
##  8 fine            1 positive 
##  9 fuck            1 negative 
## 10 hard            1 negative 
## 11 perfectly       1 positive 
## 12 smile           1 positive 
## 13 steal           1 negative 
## 14 wow             1 positive

Outro

Across the outros of the sampled songs, an outro averages 34.7 words with stop words included and 12 words with stop words removed, 8 of which are unique.

avg_words_with_stopwords <- pivot_data %>%
  filter(Section == "outro") %>%
  mutate(Lyrics = tolower(Lyrics)) %>%
  group_by(Artist) %>%
  summarise(average_words = mean(length(Lyrics))) %>%
  summarise(mean_average_words_with_stopwords = mean(average_words))
 avg_words_with_stopwords

## # A tibble: 1 × 1
##   mean_average_words_with_stopwords
##                               <dbl>
## 1                              34.7

avg_words_no_stopwords <- pivot_data_cleaned %>%
  filter(Section == "outro") %>%
  mutate(Lyrics = tolower(Lyrics)) %>%
  group_by(Artist) %>%
  summarise(average_words = mean(length(Lyrics))) %>%
  summarise(mean_average_words = mean(average_words))
avg_words_no_stopwords

## # A tibble: 1 × 1
##   mean_average_words
##                <dbl>
## 1                 12

avg_unique_words_verse <- pivot_data_cleaned %>%
  filter(Section == "outro") %>%
  mutate(Lyrics = tolower(Lyrics)) %>%
  group_by(Artist) %>%
  summarise(average_unique_words = mean(length(unique(Lyrics)))) %>%
  summarise(mean_average_unique_words = mean(average_unique_words))
avg_unique_words_verse

## # A tibble: 1 × 1
##   mean_average_unique_words
##                       <dbl>
## 1                         8

dtaf <- c(avg_words_with_stopwords, avg_words_no_stopwords, avg_unique_words_verse)
dtaf <- data.frame(dtaf)
dtaf <- pivot_longer(dtaf, everything(), names_to = "Average", values_to = "Value")
dtaf$Average <- c("Average Including Stopwords", "Average Excluding Stopwords", "Average Unique Words (excluding stopwords)")


kable(dtaf, format = "html", caption = "Summary Table") %>%
  kable_styling("striped", full_width = FALSE)

Summary Table
Average	Value
Average Including Stopwords	34.66667
Average Excluding Stopwords	12.00000
Average Unique Words (excluding stopwords)	8.00000

df <- data.frame(
  Component = c("Average Words (+ SW)", 
                "Average Words (-SW)", 
                "Average Unique Words (-SW)"),
  Count = c(avg_words_with_stopwords$mean_average_words_with_stopwords, avg_words_no_stopwords$mean_average_words, avg_unique_words_verse$mean_average_unique_words)
)

ggplot(df, aes(x = Component, y = Count, fill = Component)) +
  geom_bar(stat = "identity") +
  labs(title = "Outro Word Composition", y = "Count") +
  theme_minimal() +
  theme(plot.margin = margin(5, 5, 5, 5, "mm"),
        legend.position = "none",
        plot.title = element_text(hjust = 0.5, vjust = 0.5, size = 14, margin = margin(10, 0, 10, 0)),
        plot.title.position = "plot")

 pivot_data_outro <- pivot_data_cleaned %>%
  filter(Section == "outro") %>%
  mutate(Lyrics = tolower(Lyrics)) %>%
  group_by(Lyrics) %>%
  summarise(n = n()) %>%
  arrange(desc(n)) 
pivot_data_outro

## # A tibble: 24 × 2
##    Lyrics       n
##    <chr>    <int>
##  1 heat         4
##  2 biscuits     2
##  3 gravy        2
##  4 mirror       2
##  5 road         2
##  6 shimmer      2
##  7 swimmin'     2
##  8 vision       2
##  9 waves        2
## 10 wigglin'     2
## # ℹ 14 more rows

pivot_data_outro <- pivot_data_outro %>%
  filter(!Lyrics %in% more_stopwords)

pivot_data_outro %>%
  filter(n > 1) %>%
  ggplot(aes(Lyrics, n)) +
   geom_bar(stat = "identity", fill = "#22BBFE") +
  xlab("Lyrics") +
  ylab("Frequency") +
  ggtitle("Words Frequently Used in Outro") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) +
  coord_flip()

pivot_data_outro %>%
  rename(word = Lyrics) %>%
  inner_join(get_sentiments("bing"), by = "word")

## # A tibble: 7 × 3
##   word        n sentiment
##   <chr>   <int> <chr>    
## 1 shimmer     2 negative 
## 2 beg         1 negative 
## 3 crazy       1 negative 
## 4 fuck        1 negative 
## 5 lose        1 negative 
## 6 miss        1 negative 
## 7 stupid      1 negative

Compositional Qualities to the Sampled Music

Extracting Song Qualities from Spotify API

Using the Spotify API, the compositional qualities of each song (danceability, energy, tempo, key, etc.) are extracted.

library(spotifyr)

## 
## Attaching package: 'spotifyr'

## The following object is masked from 'package:tidytext':
## 
##     tidy

library(knitr)

Sys.setenv(SPOTIFY_CLIENT_ID = "8a9ee4f456f244fba50f78ca701c4bdd") 
Sys.setenv(SPOTIFY_CLIENT_SECRET = "d7aab5410fa342859cb7a30953003e06")
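Hard-coding the client ID and secret in a knitted report exposes them to every reader. One common alternative, sketched here under the assumption that both variables are defined outside the document (for example in `~/.Renviron`), is to read them from the environment and fail early if they are missing. `require_env()` is a hypothetical helper introduced for this example; the commented usage relies on `spotifyr::get_spotify_access_token()`, which takes `client_id` and `client_secret` arguments.

```r
# Hypothetical helper: return a required environment variable or stop with a clear error
require_env <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable ", name, " is not set", call. = FALSE)
  }
  value
}

# Usage (assumes SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET are set in ~/.Renviron):
# access_token <- spotifyr::get_spotify_access_token(
#   client_id     = require_env("SPOTIFY_CLIENT_ID"),
#   client_secret = require_env("SPOTIFY_CLIENT_SECRET")
# )
```

With this pattern the report contains no secrets, and a missing credential produces an explicit error instead of a failed API call.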

# Harry Styles, "As It Was"
hstyles_qual <- get_track_audio_features("4LRPiXqCikLlN15c3yImP7")
hstyles_qual$Artist <- "Harry Styles"
hstyles_qual$Song <- "As It Was"
hstyles_qual <- hstyles_qual[, c(ncol(hstyles_qual) - 1, ncol(hstyles_qual), 1:(ncol(hstyles_qual) - 2))]

#Glass Animals "Heat Waves"
ganimals_qual <- get_track_audio_features("3USxtqRwSYz57Ewm6wWRMp")
ganimals_qual$Artist <- "Glass Animals"
ganimals_qual$Song <- "Heat Waves"
ganimals_qual <- ganimals_qual[, c(ncol(ganimals_qual) - 1, ncol(ganimals_qual), 1:(ncol(ganimals_qual) - 2))]

#Steve Lacy "Bad Habit"
slacy_qual <- get_track_audio_features("3EaJDYHA0KnX88JvDhL9oa")
slacy_qual$Artist <- "Steve Lacy"
slacy_qual$Song <- "Bad Habit"
slacy_qual <- slacy_qual[, c(ncol(slacy_qual) - 1, ncol(slacy_qual), 1:(ncol(slacy_qual) - 2))]

#Jack Harlow "First Class"
jharlow_qual <- get_track_audio_features("0wHFktze2PHC5jDt3B17DC")
jharlow_qual$Artist <- "Jack Harlow"
jharlow_qual$Song <- "First Class"
jharlow_qual <- jharlow_qual[, c(ncol(jharlow_qual) - 1, ncol(jharlow_qual), 1:(ncol(jharlow_qual) - 2))]
jharlow_qual

## # A tibble: 1 × 20
##   Artist Song  danceability energy   key loudness  mode speechiness acousticness
##   <chr>  <chr>        <dbl>  <dbl> <int>    <dbl> <int>       <dbl>        <dbl>
## 1 Jack … Firs…        0.902  0.582     5    -5.90     0       0.109        0.111
## # ℹ 11 more variables: instrumentalness <dbl>, liveness <dbl>, valence <dbl>,
## #   tempo <dbl>, type <chr>, id <chr>, uri <chr>, track_href <chr>,
## #   analysis_url <chr>, duration_ms <int>, time_signature <int>

# The Kid Laroi, "STAY" (feat. Justin Bieber)
klaroi_qual <- get_track_audio_features("5HCyWlXZPP0y6Gqq8TgA20")
klaroi_qual$Artist <- "The Kid Laroi"
klaroi_qual$Song <- "STAY (feat. Justin Bieber)"
klaroi_qual <- klaroi_qual[, c(ncol(klaroi_qual) - 1, ncol(klaroi_qual), 1:(ncol(klaroi_qual) - 2))]

kbush_qual <- get_track_audio_features("1PtQJZVZIdWIYdARpZRDFO")
kbush_qual$Artist <- "Kate Bush"
kbush_qual$Song <- "Running Up That Hill (A Deal With God)"
kbush_qual <- kbush_qual[, c(ncol(kbush_qual) - 1, ncol(kbush_qual), 1:(ncol(kbush_qual) - 2))]

neighbourhood_qual <- get_track_audio_features("2QjOHCTQ1Jl3zawyYOpxh6")
neighbourhood_qual$Artist <- "the Neighbourhood"
neighbourhood_qual$Song <- "Sweater Weather"
neighbourhood_qual <- neighbourhood_qual[, c(ncol(neighbourhood_qual) - 1, ncol(neighbourhood_qual), 1:(ncol(neighbourhood_qual) - 2))]

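# NOTE: this track ID is identical to the one used for "Bad Habit" above, so these
# are not the features for "Dark Red"; substitute the correct track ID.
slacy2_qual <- get_track_audio_features("3EaJDYHA0KnX88JvDhL9oa")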
slacy2_qual$Artist <- "Steve Lacy"
slacy2_qual$Song <- "Dark Red"
slacy2_qual <- slacy2_qual[, c(ncol(slacy2_qual) - 1, ncol(slacy2_qual), 1:(ncol(slacy2_qual) - 2))]

# NOTE: Spotify track IDs are 22 characters long; the ID below has 21 and looks truncated.
encanto_qual <- get_track_audio_features("2xJxFP6TqMuO4Yt0eOkMz")
encanto_qual$Artist <- "Carolina Gaitan"
encanto_qual$Song <- "We Don’t Talk About Bruno"
encanto_qual <- encanto_qual[, c(ncol(encanto_qual) - 1, ncol(encanto_qual), 1:(ncol(encanto_qual) - 2))]

lilnasx_qual <- get_track_audio_features("5Z9KJZvQzH6PFmb8SNkxuk")
lilnasx_qual$Artist <- "Lil Nas X"
lilnasx_qual$Song <- "Industry Baby (feat. Jack Harlow)"
lilnasx_qual <- lilnasx_qual[, c(ncol(lilnasx_qual) - 1, ncol(lilnasx_qual), 1:(ncol(lilnasx_qual) - 2))]

joji_qual <- get_track_audio_features("4ewazQLXFTDC8XvCbhvtXs")
joji_qual$Artist <- "Joji"
joji_qual$Song <- "Glimpse of Us"
joji_qual <- joji_qual[, c(ncol(joji_qual) - 1, ncol(joji_qual), 1:(ncol(joji_qual) - 2))]

jcole_qual <- get_track_audio_features("68Dni7IE4VyPkTOH9mRWHr")
jcole_qual$Artist <- "J.Cole"
jcole_qual$Song <- "No Role Modelz"
jcole_qual <- jcole_qual[, c(ncol(jcole_qual) - 1, ncol(jcole_qual), 1:(ncol(jcole_qual) - 2))]

kodak_qual <- get_track_audio_features("1Y5Jvi3eLi4Chwqch9GMem")
kodak_qual$Artist <- "Kodak Black"
kodak_qual$Song <- "Super Gremlin"
kodak_qual <- kodak_qual[, c(ncol(kodak_qual) - 1, ncol(kodak_qual), 1:(ncol(kodak_qual) - 2))]
kodak_qual

## # A tibble: 1 × 20
##   Artist Song  danceability energy   key loudness  mode speechiness acousticness
##   <chr>  <chr>        <dbl>  <dbl> <int>    <dbl> <int>       <dbl>        <dbl>
## 1 Kodak… Supe…        0.825  0.414     2    -6.63     1       0.144      0.00265
## # ℹ 11 more variables: instrumentalness <int>, liveness <dbl>, valence <dbl>,
## #   tempo <dbl>, type <chr>, id <chr>, uri <chr>, track_href <chr>,
## #   analysis_url <chr>, duration_ms <int>, time_signature <int>

drake_qual <- get_track_audio_features("2BcMwX1MPV6ZHP4tUT9uq6")
drake_qual$Artist <- "Drake"
drake_qual$Song <- "Knife Talk (with 21 Savage ft. Project Pat)"
drake_qual <- drake_qual[, c(ncol(drake_qual) - 1, ncol(drake_qual), 1:(ncol(drake_qual) - 2))]

future_qual <- get_track_audio_features("59nOXPmaKlBfGMDeOVGrIK")
future_qual$Artist <- "Future"
future_qual$Song <- "WAIT FOR U (feat. Drake & Tems)"
future_qual <- future_qual[, c(ncol(future_qual) - 1, ncol(future_qual), 1:(ncol(future_qual) - 2))]

lizzo_qual <- get_track_audio_features("6HMtHNpW6YPi1hrw9tgF8P")
lizzo_qual$Artist <- "Lizzo"
lizzo_qual$Song <- "About Damn Time"
lizzo_qual <- lizzo_qual[, c(ncol(lizzo_qual) - 1, ncol(lizzo_qual), 1:(ncol(lizzo_qual) - 2))]

idragons_qual <- get_track_audio_features("3CIyK1V4JEJkg02E4EJnDl")
idragons_qual$Artist <- "Imagine Dragons"
idragons_qual$Song <- "Enemy (with JID) "
idragons_qual <- idragons_qual[, c(ncol(idragons_qual) - 1, ncol(idragons_qual), 1:(ncol(idragons_qual) - 2))]

amonkeys_qual <- get_track_audio_features("58ge6dfP91o9oXMzq3XkIS")
amonkeys_qual$Artist <- "Arctic Monkeys"
amonkeys_qual$Song <- "505"
amonkeys_qual <- amonkeys_qual[, c(ncol(amonkeys_qual) - 1, ncol(amonkeys_qual), 1:(ncol(amonkeys_qual) - 2))]

walters_qual <- get_track_audio_features("4SqWKzw0CbA05TGszDgMlc")
walters_qual$Artist <- "The Walters"
walters_qual$Song <- "I Love You So"
walters_qual <- walters_qual[, c(ncol(walters_qual) - 1, ncol(walters_qual), 1:(ncol(walters_qual) - 2))]

orodrigo_qual <- get_track_audio_features("4ZtFanR9U6ndgddUvNcjcG")
orodrigo_qual$Artist <- "Olivia Rodrigo"
orodrigo_qual$Song <- "good 4 u"
orodrigo_qual <- orodrigo_qual[, c(ncol(orodrigo_qual) - 1, ncol(orodrigo_qual), 1:(ncol(orodrigo_qual) - 2))]

Combining rows

API_extract <- rbind(hstyles_qual, ganimals_qual, slacy_qual, jharlow_qual, klaroi_qual)
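Each song above repeats the same steps: attach `Artist` and `Song`, then move those two columns to the front. A minimal sketch of how this could be folded into one helper and combined with `bind_rows()` follows. The names `label_track()`, `feat_a`, `feat_b` and `API_extract_sketch` are hypothetical, and the toy tibbles stand in for `get_track_audio_features()` results so the example runs without API access.

```r
library(dplyr)

# Toy stand-ins for two get_track_audio_features() results (values are illustrative)
feat_a <- tibble::tibble(danceability = 0.52, energy = 0.73, id = "track_a")
feat_b <- tibble::tibble(danceability = 0.90, energy = 0.58, id = "track_b")

# Hypothetical helper: add Artist and Song, then move both to the front
label_track <- function(features, artist, song) {
  features %>%
    mutate(Artist = artist, Song = song) %>%
    relocate(Artist, Song)
}

# Stack any number of labelled tracks into one table
API_extract_sketch <- bind_rows(
  label_track(feat_a, "Artist A", "Song A"),
  label_track(feat_b, "Artist B", "Song B")
)
API_extract_sketch
```

`relocate()` moves the named columns to the front by default, which replaces the index arithmetic on `ncol()`.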

Reading Data extracted from Tunebat

tunebat_extracted <- read.csv("https://raw.githubusercontent.com/genmid13/data607/main/Tunebat%20Extraction.csv")
tunebat_extracted

##               Artist                                       Song Duration BPM
## 1      Harry Styles                                  As It Was      2:47 174
## 2      Glass Animals                                 Heat Waves     3:59  81
## 3         Steve Lacy                                  Bad Habit     3:52 169
## 4        Jack Harlow                                First Class     2:54 107
## 5      The Kid Laroi                 STAY (feat. Justin Beiber)     2:22 170
## 6          Kate Bush     Running Up That Hill (A Deal With God)     4:59 108
## 7  the Neighbourhood                            Sweater Weather     4:00 124
## 8         Steve Lacy                                   Dark Red     2:53 172
## 9    Carolina Gaitan                  We Don’t Talk About Bruno     3:36 206
## 10         Lil Nas X          Industry Baby (feat. Jack Harlow)     3:32 150
## 11              Joji                              Glimpse of Us     3:53 170
## 12            J.Cole                             No Role Modelz     4:53 100
## 13       Kodak Black                              Super Gremlin     3:21  73
## 14             Drake Knife Talk (with 21 Savage ft. Project Pat     4:03 146
## 15            Future            WAIT FOR U (feat. Drake & Tems)     3:10  83
## 16             Lizzo                            About Damn Time     3:12 109
## 17   Imagine Dragons                           Enemy (with JID)     2:53  77
## 18    Arctic Monkeys                                        505     4:14 140
## 19       The Walters                              I Love You So     2:40  76
## 20    Olivia Rodrigo                                   good 4 u     2:58 167
##    Release.Date Explicit
## 1       3/31/22       No
## 2        8/7/20       No
## 3       6/29/22      Yes
## 4        5/6/22      Yes
## 5        7/9/21      Yes
## 6       9/16/85       No
## 7       4/19/13       No
## 8       2/20/17       No
## 9      11/19/21       No
## 10      7/23/21      Yes
## 11      6/10/22       No
## 12      12/9/14      Yes
## 13     10/30/21      Yes
## 14       9/3/21      Yes
## 15      4/29/22      Yes
## 16      4/14/22      Yes
## 17     10/28/21       No
## 18      4/24/07       No
## 19     11/28/14       No
## 20      5/21/21      Yes
##                                                         Album
## 1                                                      Single
## 2                                                   Dreamland
## 3                                                      Single
## 4                                Come Home The Kids Miss Your
## 5                                                      Single
## 6                                              Hounds Of Love
## 7                                                 I Love You.
## 8                                                      Single
## 9                Encanto (Original Motion Picture Soundtrack)
## 10                                                     Single
## 11                                                     Single
## 12                                    2014 Forest Hills Drive
## 13 Sniper Gang Presents Syko Bob & Snapkatt: Nightmare Babies
## 14                                        Certified Lover Boy
## 15                                          I NEVER LIKED YOU
## 16                                                     Single
## 17                                                     Single
## 18               Favourite Worst Nightmare (Standard Version)
## 19                                                     Single
## 20                                                       SOUR
##                              Label
## 1                         Columbia
## 2                  Polydor Records
## 3          L-M Records/RCA Records
## 4           Genertion Now/Atlantic
## 5                         Columbia
## 6                      Fish People
## 7                         Columbia
## 8                    Three Quarter
## 9              Walt Disney Records
## 10                        Columbia
## 11   88rising Music/Warner Records
## 12          Roc Nation Records LLC
## 13            Atlantic/Sniper Gang
## 14                             OVO
## 15                  Epic/Freebandz
## 16              Nice Life/Atlantic
## 17 KIDinaKORNER/Interscope Records
## 18           Domino/Warner Records
## 19                  Warner Records
## 20               Olivia Rodrigo PS

# cbind() pairs rows by position, so API_extract needs one row per song in the same order as tunebat_extracted
data <- cbind(tunebat_extracted, API_extract)
data <- data[, -c(9, 10)]  # drop the duplicate Artist and Song columns contributed by API_extract
data <- data %>%
  select(Artist, Song, duration_ms, BPM, Release.Date, Explicit, Album, Label, danceability, energy, key, loudness, mode, speechiness, acousticness, instrumentalness, liveness, valence, tempo, time_signature)
data$duration_ms <- round(data$duration_ms / 1000)  # convert milliseconds to whole seconds
data

##               Artist                                       Song duration_ms BPM
## 1      Harry Styles                                  As It Was          167 174
## 2      Glass Animals                                 Heat Waves         239  81
## 3         Steve Lacy                                  Bad Habit         173 169
## 4        Jack Harlow                                First Class         174 107
## 5      The Kid Laroi                 STAY (feat. Justin Beiber)         142 170
## 6          Kate Bush     Running Up That Hill (A Deal With God)         167 108
## 7  the Neighbourhood                            Sweater Weather         239 124
## 8         Steve Lacy                                   Dark Red         173 172
## 9    Carolina Gaitan                  We Don’t Talk About Bruno         174 206
## 10         Lil Nas X          Industry Baby (feat. Jack Harlow)         142 150
## 11              Joji                              Glimpse of Us         167 170
## 12            J.Cole                             No Role Modelz         239 100
## 13       Kodak Black                              Super Gremlin         173  73
## 14             Drake Knife Talk (with 21 Savage ft. Project Pat         174 146
## 15            Future            WAIT FOR U (feat. Drake & Tems)         142  83
## 16             Lizzo                            About Damn Time         167 109
## 17   Imagine Dragons                           Enemy (with JID)         239  77
## 18    Arctic Monkeys                                        505         173 140
## 19       The Walters                              I Love You So         174  76
## 20    Olivia Rodrigo                                   good 4 u         142 167
##    Release.Date Explicit
## 1       3/31/22       No
## 2        8/7/20       No
## 3       6/29/22      Yes
## 4        5/6/22      Yes
## 5        7/9/21      Yes
## 6       9/16/85       No
## 7       4/19/13       No
## 8       2/20/17       No
## 9      11/19/21       No
## 10      7/23/21      Yes
## 11      6/10/22       No
## 12      12/9/14      Yes
## 13     10/30/21      Yes
## 14       9/3/21      Yes
## 15      4/29/22      Yes
## 16      4/14/22      Yes
## 17     10/28/21       No
## 18      4/24/07       No
## 19     11/28/14       No
## 20      5/21/21      Yes
##                                                         Album
## 1                                                      Single
## 2                                                   Dreamland
## 3                                                      Single
## 4                                Come Home The Kids Miss Your
## 5                                                      Single
## 6                                              Hounds Of Love
## 7                                                 I Love You.
## 8                                                      Single
## 9                Encanto (Original Motion Picture Soundtrack)
## 10                                                     Single
## 11                                                     Single
## 12                                    2014 Forest Hills Drive
## 13 Sniper Gang Presents Syko Bob & Snapkatt: Nightmare Babies
## 14                                        Certified Lover Boy
## 15                                          I NEVER LIKED YOU
## 16                                                     Single
## 17                                                     Single
## 18               Favourite Worst Nightmare (Standard Version)
## 19                                                     Single
## 20                                                       SOUR
##                              Label danceability energy key loudness mode
## 1                         Columbia        0.520  0.731   6   -5.338    0
## 2                  Polydor Records        0.761  0.525  11   -6.900    1
## 3          L-M Records/RCA Records        0.603  0.784   6   -4.023    1
## 4           Genertion Now/Atlantic        0.902  0.582   5   -5.902    0
## 5                         Columbia        0.591  0.764   1   -5.484    1
## 6                      Fish People        0.520  0.731   6   -5.338    0
## 7                         Columbia        0.761  0.525  11   -6.900    1
## 8                    Three Quarter        0.603  0.784   6   -4.023    1
## 9              Walt Disney Records        0.902  0.582   5   -5.902    0
## 10                        Columbia        0.591  0.764   1   -5.484    1
## 11   88rising Music/Warner Records        0.520  0.731   6   -5.338    0
## 12          Roc Nation Records LLC        0.761  0.525  11   -6.900    1
## 13            Atlantic/Sniper Gang        0.603  0.784   6   -4.023    1
## 14                             OVO        0.902  0.582   5   -5.902    0
## 15                  Epic/Freebandz        0.591  0.764   1   -5.484    1
## 16              Nice Life/Atlantic        0.520  0.731   6   -5.338    0
## 17 KIDinaKORNER/Interscope Records        0.761  0.525  11   -6.900    1
## 18           Domino/Warner Records        0.603  0.784   6   -4.023    1
## 19                  Warner Records        0.902  0.582   5   -5.902    0
## 20               Olivia Rodrigo PS        0.591  0.764   1   -5.484    1
##    speechiness acousticness instrumentalness liveness valence   tempo
## 1       0.0557       0.3420         1.01e-03   0.3110   0.662 173.930
## 2       0.0944       0.4400         6.70e-06   0.0921   0.531  80.870
## 3       0.0620       0.4460         8.32e-06   0.1190   0.769 172.041
## 4       0.1090       0.1110         3.18e-06   0.1110   0.332 107.005
## 5       0.0483       0.0383         0.00e+00   0.1030   0.478 169.928
## 6       0.0557       0.3420         1.01e-03   0.3110   0.662 173.930
## 7       0.0944       0.4400         6.70e-06   0.0921   0.531  80.870
## 8       0.0620       0.4460         8.32e-06   0.1190   0.769 172.041
## 9       0.1090       0.1110         3.18e-06   0.1110   0.332 107.005
## 10      0.0483       0.0383         0.00e+00   0.1030   0.478 169.928
## 11      0.0557       0.3420         1.01e-03   0.3110   0.662 173.930
## 12      0.0944       0.4400         6.70e-06   0.0921   0.531  80.870
## 13      0.0620       0.4460         8.32e-06   0.1190   0.769 172.041
## 14      0.1090       0.1110         3.18e-06   0.1110   0.332 107.005
## 15      0.0483       0.0383         0.00e+00   0.1030   0.478 169.928
## 16      0.0557       0.3420         1.01e-03   0.3110   0.662 173.930
## 17      0.0944       0.4400         6.70e-06   0.0921   0.531  80.870
## 18      0.0620       0.4460         8.32e-06   0.1190   0.769 172.041
## 19      0.1090       0.1110         3.18e-06   0.1110   0.332 107.005
## 20      0.0483       0.0383         0.00e+00   0.1030   0.478 169.928
##    time_signature
## 1               4
## 2               4
## 3               4
## 4               4
## 5               4
## 6               4
## 7               4
## 8               4
## 9               4
## 10              4
## 11              4
## 12              4
## 13              4
## 14              4
## 15              4
## 16              4
## 17              4
## 18              4
## 19              4
## 20              4
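In the table above, the audio-feature columns repeat the same five rows across all 20 songs, which is what `cbind()` produces when one data frame has a row count that is an exact multiple of the other's. One way to guard against this is to check row counts and join on the song labels instead of relying on position. The sketch below uses small made-up data frames (`meta` and `feats` are hypothetical stand-ins for `tunebat_extracted` and `API_extract`) and base R's `merge()`.

```r
# Illustrative stand-ins: three songs of metadata, features for only two of them.
meta <- data.frame(
  Artist = c("A", "B", "C"),
  Song = c("s1", "s2", "s3"),
  BPM = c(100, 120, 140),
  stringsAsFactors = FALSE
)
feats <- data.frame(
  Artist = c("A", "B"),
  Song = c("s1", "s2"),
  energy = c(0.5, 0.7),
  stringsAsFactors = FALSE
)

# A quick sanity check before any positional combine.
same_rows <- nrow(meta) == nrow(feats)

# Join on the label columns; all.x = TRUE keeps every metadata row and
# fills missing features with NA instead of recycling other songs' values.
joined <- merge(meta, feats, by = c("Artist", "Song"), all.x = TRUE)
```

With a keyed join, a song without fetched features shows up as `NA` rather than silently borrowing another song's values.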

data$duration_ms <- as.numeric(data$duration_ms)
mean_duration <- mean(data$duration_ms, na.rm = TRUE)
cat("The average duration of the most streamed songs in 2022 is", mean_duration, "seconds with songs ranging", sd(data$duration_ms, na.rm = TRUE), "seconds below or above the mean")

## The average duration of the most streamed songs in 2022 is 179 seconds with songs ranging 33.00399 seconds below or above the mean
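Since the column is named `duration_ms` but is reported in seconds, a pair of small conversion helpers can make the units explicit and allow a direct comparison with Tunebat's `m:ss` strings. This is a sketch only; `ms_to_seconds()` and `seconds_to_clock()` are hypothetical names and the millisecond values are illustrative.

```r
# Convert milliseconds to whole seconds.
ms_to_seconds <- function(ms) round(ms / 1000)

# Format whole seconds as "m:ss".
seconds_to_clock <- function(s) {
  s <- as.integer(s)
  sprintf("%d:%02d", s %/% 60L, s %% 60L)
}

example_ms <- c(167303, 238805)  # illustrative values, not taken from the API
example_s <- ms_to_seconds(example_ms)
example_clock <- seconds_to_clock(example_s)
```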

ggplot(data, aes(x = duration_ms)) +
  geom_histogram(fill = "lightblue", color = "black", bins = 4) +
  geom_vline(xintercept = mean_duration, color = "red", linetype = "dashed", linewidth = 1)


mean_tempo <- mean(data$tempo, na.rm = TRUE)
cat("The average tempo of the most streamed songs in 2022 is", mean_tempo, "with songs ranging", sd(data$tempo, na.rm = TRUE), "below or above the mean")

## The average tempo of the most streamed songs in 2022 is 140.7548 with songs ranging 40.1464 below or above the mean

ggplot(data, aes(x = tempo)) +
  geom_histogram(fill = "lightblue", color = "black", bins = 10) +
  geom_vline(xintercept = mean_tempo, color = "red", linetype = "dashed", linewidth = 1)

mean_danceability <- mean(data$danceability, na.rm = TRUE)
cat("The average danceability of the most streamed songs in 2022 is", mean_danceability, "with songs ranging", sd(data$danceability, na.rm = TRUE), "below or above the mean")

## The average danceability of the most streamed songs in 2022 is 0.6754 with songs ranging 0.1415621 below or above the mean

ggplot(data, aes(x = danceability)) +
  geom_histogram(fill = "lightblue", color = "black", bins = 3) +
  geom_vline(xintercept = mean_danceability, color = "red", linetype = "dashed", linewidth = 1)

mean_energy <- mean(data$energy, na.rm = TRUE)
cat("The average energy of the most streamed songs in 2022 is", mean_energy, "with songs ranging", sd(data$energy, na.rm = TRUE), "below or above the mean")

## The average energy of the most streamed songs in 2022 is 0.6772 with songs ranging 0.1066848 below or above the mean

ggplot(data, aes(x = energy)) +
  geom_histogram(fill = "lightblue", color = "black", bins = 3) +
  geom_vline(xintercept = mean_energy, color = "red", linetype = "dashed", linewidth = 1)

mean_key <- mean(data$key, na.rm = TRUE)
cat("The average key of the most streamed songs in 2022 is", mean_key, "with songs ranging", sd(data$key, na.rm = TRUE), "below or above the mean")

## The average key of the most streamed songs in 2022 is 5.8 with songs ranging 3.270281 below or above the mean

ggplot(data, aes(x = key)) +
  geom_histogram(fill = "lightblue", color = "black", bins = 3) +
  geom_vline(xintercept = mean_key, color = "red", linetype = "dashed", linewidth = 1)
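Spotify reports `key` in pitch-class notation (0 = C, 1 = C♯/D♭, and so on up to 11 = B), so it behaves more like a category than a quantity, and a count per key may be easier to interpret alongside the mean. A minimal sketch, using the five distinct `key` values visible in the output above as sample input (`pitch_names` and `key_counts` are hypothetical names):

```r
# Pitch-class names indexed from 0 (C) to 11 (B).
pitch_names <- c("C", "C#/Db", "D", "D#/Eb", "E", "F",
                 "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B")

keys <- c(6, 11, 6, 5, 1)  # sample values copied from the key column above

# Map each integer to its name (+1 because R vectors are 1-indexed) and count.
key_counts <- table(factor(pitch_names[keys + 1], levels = pitch_names))

most_common_key <- names(which.max(key_counts))
```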

mean_loudness <- mean(data$loudness, na.rm = TRUE)
cat("The average loudness of the most streamed songs in 2022 is", mean_loudness, "with songs ranging", sd(data$loudness, na.rm = TRUE), "below or above the mean")

## The average loudness of the most streamed songs in 2022 is -5.5294 with songs ranging 0.9542494 below or above the mean

ggplot(data, aes(x = loudness)) +
  geom_histogram(fill = "lightblue", color = "black", bins = 3) +
  geom_vline(xintercept = mean_loudness, color = "red", linetype = "dashed", linewidth = 1)

mean_speechiness <- mean(data$speechiness, na.rm = TRUE)
cat("The average speechiness of the most streamed songs in 2022 is", mean_speechiness, "with songs ranging", sd(data$speechiness, na.rm = TRUE), "below or above the mean")

## The average speechiness of the most streamed songs in 2022 is 0.07388 with songs ranging 0.02419425 below or above the mean

ggplot(data, aes(x = speechiness)) +
  geom_histogram(fill = "lightblue", color = "black", bins = 3) +
  geom_vline(xintercept = mean_speechiness, color = "red", linetype = "dashed", linewidth = 1)

mean_acousticness <- mean(data$acousticness, na.rm = TRUE)
cat("The average acousticness of the most streamed songs in 2022 is", mean_acousticness, "with songs ranging", sd(data$acousticness, na.rm = TRUE), "below or above the mean")

## The average acousticness of the most streamed songs in 2022 is 0.27546 with songs ranging 0.1740398 below or above the mean

ggplot(data, aes(x = acousticness)) +
  geom_histogram(fill = "lightblue", color = "black", bins = 3) +
  geom_vline(xintercept = mean_acousticness, color = "red", linetype = "dashed", linewidth = 1)

mean_instrumentalness <- mean(data$instrumentalness, na.rm = TRUE)
cat("The average instrumentalness of the most streamed songs in 2022 is", mean_instrumentalness, "with songs ranging", sd(data$instrumentalness, na.rm = TRUE), "below or above the mean")

## The average instrumentalness of the most streamed songs in 2022 is 0.00020564 with songs ranging 0.0004126385 below or above the mean

ggplot(data, aes(x = instrumentalness)) +
  geom_histogram(fill = "lightblue", color = "black", bins = 3) +
  geom_vline(xintercept = mean_instrumentalness, color = "red", linetype = "dashed", linewidth = 1)

mean_liveness <- mean(data$liveness, na.rm = TRUE)
cat("The average liveness of the most streamed songs in 2022 is", mean_liveness, "with songs ranging", sd(data$liveness, na.rm = TRUE), "below or above the mean")

## The average liveness of the most streamed songs in 2022 is 0.14722 with songs ranging 0.08451193 below or above the mean

ggplot(data, aes(x = liveness)) +
  geom_histogram(fill = "lightblue", color = "black", bins = 3) +
  geom_vline(xintercept = mean_liveness, color = "red", linetype = "dashed", linewidth = 1)

mean_valence <- mean(data$valence, na.rm = TRUE)
cat("The average valence of the most streamed songs in 2022 is", mean_valence, "with songs ranging", sd(data$valence, na.rm = TRUE), "below or above the mean")

## The average valence of the most streamed songs in 2022 is 0.5544 with songs ranging 0.1545637 below or above the mean

ggplot(data, aes(x = valence)) +
  geom_histogram(fill = "lightblue", color = "black", bins = 3) +
  geom_vline(xintercept = mean_valence, color = "red", linetype = "dashed", linewidth = 1)

avg_qual <- data.frame(mean_duration, mean_tempo, mean_danceability, mean_energy, mean_key, mean_loudness, mean_speechiness, mean_acousticness, mean_instrumentalness, mean_liveness, mean_valence)
avg_qual <- pivot_longer(avg_qual, everything(), names_to = "Qualities", values_to = "Average")
avg_qual$Qualities <- c("Duration (in seconds)", "Tempo", "Danceability", "Energy", "Key", "Loudness", "Speechiness", "Acousticness", "Instrumentalness", "Liveness", "Valence")

library(knitr) # kable()
library(kableExtra) # kable_styling()

kable(avg_qual, format = "html", caption = "Summary Table") %>%
  kable_styling("striped", full_width = FALSE)

Summary Table
Qualities	Average
Duration (in seconds)	179.0000000
Tempo	140.7548000
Danceability	0.6754000
Energy	0.6772000
Key	5.8000000
Loudness	-5.5294000
Speechiness	0.0738800
Acousticness	0.2754600
Instrumentalness	0.0002056
Liveness	0.1472200
Valence	0.5544000
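The per-feature means and standard deviations computed one at a time above could also be gathered in a single pass. The sketch below uses base R's `vapply()` on a small made-up data frame; `features`, `cols`, and `summary_tbl` are hypothetical names, and the values are illustrative rather than taken from the analysis.

```r
# Illustrative stand-in for the numeric feature columns of `data`.
features <- data.frame(
  tempo = c(100, 120, 140),
  energy = c(0.5, 0.6, 0.7)
)

cols <- c("tempo", "energy")

# One row per feature, with its mean and standard deviation.
summary_tbl <- data.frame(
  Qualities = cols,
  Average = unname(vapply(features[cols], mean, numeric(1), na.rm = TRUE)),
  SD = unname(vapply(features[cols], sd, numeric(1), na.rm = TRUE)),
  stringsAsFactors = FALSE
)
```

Adding a feature to the summary then only requires extending `cols`.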

Conclusion

Intro (Theme: Summer Experience) (optional) If used, the intro should be the shortest section while using the most non-filler language. Most of that language should be repeated only once here.

Verse 1 (Theme: Romance/Love) The first verse is the second-longest section, and only 1/3 of it should use non-filler language. However, most of that language should be distinct.

Chorus (Theme: Prioritizing Relationships) The chorus should be longer than the intro/outro but shorter than the verses. The chorus has the most filler language and the least distinct non-filler language.

Verse 2 (Theme: Temporal Longing) The second verse uses the most words. Almost all of its non-filler language (which takes up only 1/3 of the section) is distinct.

Outro (Theme: Summer Vacation) (optional) The outro uses more words than the intro (and fewer than the other sections) but is less complex in its non-filler language.