I will use the Kaggle dataset below to identify popular songs and artists on Spotify to use in my questions.
library(readr)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.1 ✔ purrr 1.0.1
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.4.2 ✔ tibble 3.2.1
✔ lubridate 1.9.2 ✔ tidyr 1.3.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(rvest)
Attaching package: 'rvest'
The following object is masked from 'package:readr':
guess_encoding
library(xml2)
library(httr)
library(magrittr)
Attaching package: 'magrittr'
The following object is masked from 'package:purrr':
set_names
The following object is masked from 'package:tidyr':
extract
Rows: 9999 Columns: 35
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (16): Track URI, Track Name, Artist URI(s), Artist Name(s), Album URI, ...
dbl (16): Disc Number, Track Number, Track Duration (ms), Popularity, Dance...
lgl (2): Explicit, Album Genres
dttm (1): Added At
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Question 1: According to the “songs” dataset, Olivia Rodrigo, Dua Lipa, Miley Cyrus, OneRepublic, and Taylor Swift all have songs among the 10 most popular. Which of these artists has the most positive lyrics, and which has the most negative?
I will answer this question by gathering lyrics for 20 songs from each of these artists, for a total of 100 songs. Then, I will remove stop words and compare the artists’ sentiment using the NRC emotion lexicon. Finally, I will put my findings in a chart that shows which artists use the most words related to anger, joy, disgust, and other emotions in their lyrics.
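The steps above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the lyric snippets, the `stop_words` vector, and the `toy_nrc` table are all made up for the example and stand in for the scraped lyrics, a real stop-word list, and the NRC lexicon (which `tidytext::get_sentiments("nrc")` would normally supply).

```r
# Toy data: hypothetical lyric snippets, not lyrics from the scraped songs.
lyrics <- data.frame(
  artist = c("Artist A", "Artist A", "Artist B"),
  text   = c("I love the happy sunshine",
             "You hurt me and I cry",
             "We dance and love the night"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-ins for a stop-word list and the NRC lexicon.
stop_words <- c("i", "the", "you", "me", "and", "we")
toy_nrc <- data.frame(
  word      = c("love", "love", "happy", "hurt", "hurt", "cry", "dance"),
  sentiment = c("joy", "positive", "joy", "anger", "negative", "sadness", "joy"),
  stringsAsFactors = FALSE
)

# 1. Tokenise: lowercase, then split on anything that is not a letter or apostrophe.
tokens <- do.call(rbind, lapply(seq_len(nrow(lyrics)), function(i) {
  words <- strsplit(tolower(lyrics$text[i]), "[^a-z']+")[[1]]
  data.frame(artist = lyrics$artist[i], word = words[words != ""],
             stringsAsFactors = FALSE)
}))

# 2. Remove stop words.
tokens <- tokens[!tokens$word %in% stop_words, ]

# 3. Attach emotions. A word can carry several emotions, and words that are
#    not in the lexicon are dropped by the inner join.
tagged <- merge(tokens, toy_nrc, by = "word")

# 4. Count tagged words per artist and emotion.
emotion_counts <- as.data.frame(
  table(artist = tagged$artist, sentiment = tagged$sentiment),
  responseName = "n", stringsAsFactors = FALSE
)
emotion_counts
```

Because one word can map to several emotions, the counts across emotions can add up to more than the number of words in the lyrics.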
The artists have similar ratios of emotions within their lyrics, but their raw counts differ: although I scraped 20 songs for every artist, the songs vary in length and wordiness. This shows us that Taylor Swift has some of the most verbose songs, while OneRepublic’s and Dua Lipa’s songs are shorter and less wordy.
Beyond that, it is worth noting that Miley Cyrus outranks Taylor Swift (and everyone else) in joy-related lyrics, even though Taylor’s songs are wordier than hers. OneRepublic scored the lowest on joy- and positive-related words, which suggests that many of their songs are melancholic.
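To separate the effect of wordiness from the emotional mix, raw counts can be converted to shares within each artist. The sketch below uses made-up counts for two placeholder artists, not the scraped results; it only illustrates how two artists with very different totals can have identical emotion ratios.

```r
# Hypothetical emotion counts per artist (made-up numbers, not the scraped results).
counts <- data.frame(
  artist    = rep(c("Artist A", "Artist B"), each = 3),
  sentiment = rep(c("joy", "sadness", "anger"), times = 2),
  n         = c(60, 30, 10, 12, 6, 2),
  stringsAsFactors = FALSE
)

# Total emotion-tagged words per artist, aligned row by row with `counts`.
artist_total <- ave(counts$n, counts$artist, FUN = sum)

# Share of each emotion within an artist's tagged words.
counts$share <- counts$n / artist_total
counts
```

Here "Artist A" has five times as many tagged words as "Artist B", yet both have the same shares, which is the pattern a chart of raw counts can obscure.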
Question 2: According to the “songs” dataset, “Vampire” by Olivia Rodrigo and “Flowers” by Miley Cyrus are both among the top 3 most popular songs. They are also comparable because they are about the same length and both by solo female artists. How does the sentiment of these two songs compare as each song progresses?
I will answer this question by gathering the lyrics for “Vampire” and “Flowers” from Genius. Then, I will remove their stop words and use the Bing lexicon to classify the remaining words as positive or negative. I will also create an index for each song that records where each scored word falls relative to the rest of the song’s lyrics. Lastly, I will put my findings in a column chart to show where the positive and negative lyrics lie in each song.
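A minimal sketch of the word index, under stated assumptions: the lyric line and the `toy_bing` table are invented for the example (the real Bing lexicon is available through `tidytext::get_sentiments("bing")`), the index is computed after stop words are removed, and splitting the song into three sections is an arbitrary choice made for illustration.

```r
# Hypothetical lyric line and toy lexicon; neither comes from the real songs or from Bing.
song_text <- "sweet love then bitter pain then bright hope"
toy_bing <- data.frame(
  word  = c("sweet", "love", "bitter", "pain", "bright", "hope"),
  score = c(1, 1, -1, -1, 1, 1),   # +1 = positive, -1 = negative
  stringsAsFactors = FALSE
)
stop_words <- c("then")

# Tokenise and drop stop words.
words <- strsplit(tolower(song_text), "[^a-z']+")[[1]]
words <- words[words != "" & !words %in% stop_words]

# Index each remaining word by its order in the song.
tokens <- data.frame(word = words, index = seq_along(words),
                     stringsAsFactors = FALSE)

# Assign each word to one of `n_sections` equal-width sections of the song.
n_sections <- 3
tokens$section <- ceiling(tokens$index * n_sections / nrow(tokens))

# Keep only words found in the lexicon, restoring song order after the join.
scored <- merge(tokens, toy_bing, by = "word")
scored <- scored[order(scored$index), ]

# Net sentiment per section: positive words add 1, negative words subtract 1.
section_score <- tapply(scored$score, scored$section, sum)
section_score
```

A section score of 2 means two positive words landed in the same stretch of the song, which is how a column in such a chart can rise above 1.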
The chart above shows that, overall, “Flowers” is a more positive song than “Vampire.” One might guess as much from the titles alone, before running any analysis. Notably, “Vampire” has a far higher number of negative words, and they are scattered throughout the song. “Flowers,” on the other hand, has many positive lyrics concentrated at four specific hot spots. It has fewer negative words and even reaches a double positive score at a couple of points in the song.
Question 3: The “songs” dataset includes information on the 100 most popular songs. How do the sentiment profiles of these successful songs compare to each other?
I will answer this question by gathering the lyrics for the 100 most popular songs in the “songs” dataset. Then, I will remove their stop words and use the NRC emotion lexicon to find which emotions the lyrics are associated with. Finally, I will compare the emotions in each of the songs to one another in a table, which is loaded below.
Rows: 100 Columns: 47
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (16): Track URI, Track Name, Artist URI(s), Artist Name(s), Album URI, ...
dbl (29): Disc Number, Track Number, Track Duration (ms), Popularity, Dance...
lgl (1): Explicit
dttm (1): Added At
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
This chart shows that, of the top 100 streamed songs on Spotify, “Last Night” by Morgan Wallen had the highest positivity score, where positivity = positive lyrics - negative lyrics. It also had the most joy-related lyrics. “Heather” by Conan Gray had the most sadness-, fear-, anger-, and disgust-related lyrics. “Anti-Hero” by Taylor Swift had the most trust-related lyrics, “Humble” by Kendrick Lamar had the most surprise-related lyrics, and “Livin’ on a Prayer” by Bon Jovi had the most anticipation-related lyrics.
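The positivity score and the per-emotion leaders can be computed along these lines. The table below is a hypothetical stand-in with placeholder titles and made-up counts, and the column names are assumptions rather than the names in the actual 47-column file.

```r
# Hypothetical per-song emotion counts (made-up numbers and placeholder titles).
song_emotions <- data.frame(
  track    = c("Song A", "Song B", "Song C"),
  positive = c(30, 12, 20),
  negative = c(5, 25, 18),
  joy      = c(22, 4, 9),
  sadness  = c(3, 19, 7),
  stringsAsFactors = FALSE
)

# Positivity as defined in the text: positive lyrics minus negative lyrics.
song_emotions$positivity <- song_emotions$positive - song_emotions$negative

# For each score column, find the track with the highest value.
# which.max() returns the first track in case of a tie.
score_cols <- c("positivity", "joy", "sadness")
top_tracks <- sapply(score_cols, function(col) {
  song_emotions$track[which.max(song_emotions[[col]])]
})
top_tracks
```

Since these are raw counts, a longer song has more chances to lead any category, so dividing by each song's word count first would be a reasonable variation.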
“Heather” was only the 45th most popular of the 100 songs, though it stood out strongly on several of the negative emotions. “Last Night,” on the other hand, ranked 27th in popularity and related more to joy and positivity. Each of these songs has a distinct emotional profile that may have played into its success.