For almost two years, a friend who produces music and I have been looking for a tool: a chord generator that helps beginners produce music.
Example of a magical chord : https://youtu.be/nhivdSMdMxc
Similar project : https://autochords.com/
Our initial idea consisted of two parts:
1: Import and analysis of trends
2: Reverse process:
- R, Python, or both
- How to handle songs in a database (external links, MIDI, WAV … ???)
=> MIDI refs : https://www.kaggle.com/code/wfaria/midi-music-data-extraction-using-music21/notebook
- How to compare different databases
- The same four chords can appear in different orders and the songs still sound good:
“The Call” through “Neopolitan Dreams” are G, D, Em, C
“As Long As You Love Me” is C, G, Em, D
“Round Here” is C, D, Em, G
“Drunk” is Em, D, G, C
“We Are Never Ever Getting Back Together” is C, G, D, Em
“I Knew You Were Trouble” is G, D, Em, C (verses) and Em, C, D, G (chorus)
=> I, iii and vi have a tonic function and can be interchanged
ii and IV have a subdominant function and can be interchanged
V and vii° have a dominant function and can be interchanged
DIATONIC CHORDS OF A NATURAL MINOR KEY = i ii° III iv v VI VII
DIATONIC CHORDS OF A MAJOR KEY = I ii iii IV V vi vii°
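The degree notation above can be tried out with a few lines of base R. This is only a minimal sketch: the helper names `diatonic_chords()` and `to_degrees()` are hypothetical, notes are spelled with sharps only, and the songs are assumed to be in G major (the source lists their chords but not their keys).

```r
# Pitch classes, spelled with sharps only (assumed simplification: no flats).
NOTES <- c("C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B")

# Semitone offsets of the seven major-scale degrees, with the triad
# quality and Roman numeral of each degree: I ii iii IV V vi vii°.
MAJOR_STEPS   <- c(0, 2, 4, 5, 7, 9, 11)
MAJOR_SUFFIX  <- c("", "m", "m", "", "", "m", "dim")
MAJOR_NUMERAL <- c("I", "ii", "iii", "IV", "V", "vi", "vii°")

# Hypothetical helper: the seven diatonic triads of a major key,
# named by their degree.
diatonic_chords <- function(tonic) {
  root <- match(tonic, NOTES) - 1
  stopifnot(!is.na(root))
  roots <- NOTES[(root + MAJOR_STEPS) %% 12 + 1]
  setNames(paste0(roots, MAJOR_SUFFIX), MAJOR_NUMERAL)
}

# Hypothetical helper: translate chord names into degrees of a major key.
# Chords outside the key come back as NA.
to_degrees <- function(chords, tonic) {
  scale <- diatonic_chords(tonic)
  names(scale)[match(chords, scale)]
}

diatonic_chords("G")
to_degrees(c("G", "D", "Em", "C"), "G")  # "The Call", assuming G major
to_degrees(c("C", "G", "Em", "D"), "G")  # "As Long As You Love Me", assuming G major
```

Read this way, the two songs use the same four degrees (I, IV, V, vi) in a different order, which is the point made by the list of songs above.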
After a few benchmarks we still have not found an app that is well optimized to suggest chords based on current trends, so we believe there is room on the market of “music software and FL Studio plug-ins” for an app like this.
The main goals of the app are:
- Based on inputs/filters such as genre, favorite artist, country, favorite keys, etc.
=> to:
- Suggest a key
- Play the best chords in random order
- Export the chords as .midi files
- Give the degrees https://www.youtube.com/watch?v=v3YbEL-_eoI&t=97s&ab_channel=DavidBennettPiano
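One possible way to "play the best chords randomly" is to start from a known progression and swap each degree for another degree with the same harmonic function. The sketch below illustrates that idea only; it is not the app's actual logic, and `FUNCTIONS` and `vary_progression()` are hypothetical names. It works on major-key degrees and does not yet turn them into chord names or MIDI.

```r
# Functional groups of the major-scale degrees.
FUNCTIONS <- list(
  tonic       = c("I", "iii", "vi"),
  subdominant = c("ii", "IV"),
  dominant    = c("V", "vii°")
)

# Hypothetical helper: replace every degree by a randomly chosen degree
# of the same function (the degree itself can be drawn again).
vary_progression <- function(degrees) {
  vapply(degrees, function(d) {
    same_function <- Filter(function(group) d %in% group, FUNCTIONS)
    stopifnot(length(same_function) == 1)  # stop on an unknown degree
    sample(same_function[[1]], 1)
  }, character(1), USE.NAMES = FALSE)
}

set.seed(1)  # only to make the random draw repeatable
vary_progression(c("I", "V", "vi", "IV"))
```

Each call returns a progression of the same length whose chords keep the tonic / dominant / subdominant role of the original at every position.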
Last summer I created a first database of notes, chords and scales with FL Studio. => .wav / .fl / .mp3
To import the Spotify top charts I have a few options:
- Databases on Kaggle/GitHub (!! the key is not present in every database)
- Creating my own database with the Spotify API (then I am sure to have the keys)
- A mix of the two previous options
The big problem is importing chords and melodies. The only website I found is https://www.boiteachansons.net/ and it mostly covers French pop and rock songs …
Pep’s Excel files with harmonization, degrees and chord progressions.
https://www.youtube.com/watch?v=v3YbEL-_eoI&t=97s&ab_channel=DavidBennettPiano
For this project, my objectives are to:
- Import and analyze the Spotify top charts, to see what could be useful later on
- Import and analyze the chords from boiteachansons.net
- Write the reverse code that outputs the recurrent chords (if there are any) from the data imported from boiteachansons.net, and then randomizes them using degrees and harmonization.
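As an illustration of what "recurrent chords" could mean in practice, the sketch below counts consecutive chord pairs across a few songs. It is a toy example under stated assumptions: the data are the progressions quoted earlier in this write-up rather than the scraped file, and `chord_pairs()` is a hypothetical helper.

```r
# Toy data: chord progressions quoted earlier in this write-up.
songs <- list(
  "The Call"                = c("G", "D", "Em", "C"),
  "As Long As You Love Me"  = c("C", "G", "Em", "D"),
  "Round Here"              = c("C", "D", "Em", "G"),
  "Drunk"                   = c("Em", "D", "G", "C"),
  "I Knew You Were Trouble" = c("G", "D", "Em", "C")
)

# Hypothetical helper: all consecutive chord pairs of one progression.
chord_pairs <- function(chords) {
  paste(head(chords, -1), tail(chords, -1), sep = " -> ")
}

# Count every pair over all songs, most frequent first.
pair_counts <- sort(table(unlist(lapply(songs, chord_pairs))), decreasing = TRUE)
pair_counts
```

The same counting could be done on degrees instead of chord names, so that songs in different keys become comparable.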
Danceability: Describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
Energy: Is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.
Key: The key the track is in. Integers map to pitches using standard Pitch Class notation. E.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on.
Loudness: The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.
Mode: Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0.
Speechiness: Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.
Acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.
Instrumentalness: Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.
Liveness: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.
Valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
Tempo: The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
Duration_ms: The duration of the track in milliseconds.
Time_signature: A notational convention that specifies how many beats are in each bar (or measure). The time signature ranges from 3 to 7, indicating time signatures of “3/4” to “7/4”.
==> The most useful features for suggesting the best chords are key, mode, tempo and time signature.
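Since key and mode are stored as integers, a small lookup makes them readable. This is a minimal sketch: `key_name()` is a hypothetical helper, and treating any value outside 0 to 11 as "unknown" is an assumption (the Spotify documentation uses -1 when no key was detected).

```r
# Pitch-class notation: 0 = C, 1 = C#/Db, 2 = D, ...
PITCH_CLASSES <- c("C", "C#/Db", "D", "D#/Eb", "E", "F",
                   "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B")

# Hypothetical helper: turn the integer key and mode columns into a label.
key_name <- function(key, mode) {
  key[!key %in% 0:11] <- NA  # anything outside 0..11 is treated as unknown
  label <- paste(PITCH_CLASSES[key + 1], ifelse(mode == 1, "major", "minor"))
  label[is.na(key)] <- NA_character_
  label
}

key_name(c(0, 7, 9), c(1, 1, 0))
```

Applied to a data frame, this could be used as `key_name(df$key, df$mode)` to add a readable column next to the numeric ones.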
To extract the data I decided to scrape the website. I wanted to code it myself first, but I found two online tools for quick first attempts: phantombuster.com and webscraper.io
With webscraper.io I scraped the link of each song in the site's top 50: https://www.boiteachansons.net/partitions/top50Chansons
After that, I stored the links in a Google Sheet: https://docs.google.com/spreadsheets/d/16qCU1v3u2dbpp70rAiHiKkpfbDSRwpIrMofoQxriX1Q/edit#gid=0
Finally, I downloaded the chords and the key of each song with Phantombuster. The configuration looked like this in JSON:
{
  "urls": [
    {
      "link": "https://docs.google.com/spreadsheets/d/16qCU1v3u2dbpp70rAiHiKkpfbDSRwpIrMofoQxriX1Q/edit?usp=sharing",
      "selectors": [
        { "selector": "#diagAccords div.dNomAcc", "label": "notes" },
        { "selector": "li.liTonalElmtSlct", "label": "tonalité" }
      ],
      "timeToWaitSelector": 5000
    }
  ],
  "csvName": "notes et tona top50"
}
I just had to choose the CSS selectors to copy what I wanted from the links in the Google Sheet.
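For reference, the same two CSS selectors could also be applied directly from R with the rvest package instead of an online tool. This is only a sketch: the HTML fragment below is an invented stand-in for a song page (the real markup of boiteachansons.net may differ), and only the two selectors come from the configuration above.

```r
library(rvest)

# Stand-in for one downloaded song page. This markup is an assumption
# made for illustration, not a copy of the real site.
page <- minimal_html('
  <ul>
    <li class="liTonalElmtSlct">G</li>
    <li class="liTonalElmt">A</li>
  </ul>
  <div id="diagAccords">
    <div class="dNomAcc">G</div>
    <div class="dNomAcc">D</div>
    <div class="dNomAcc">Em</div>
    <div class="dNomAcc">C</div>
  </div>
')

# Same selectors as in the Phantombuster configuration.
chords <- html_text2(html_elements(page, "#diagAccords div.dNomAcc"))
key    <- html_text2(html_elements(page, "li.liTonalElmtSlct"))

chords
key
```

For a real page, `minimal_html()` would be replaced by `read_html(url)`, looping over the links stored in the sheet.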
Result with the raw data:
print(notes.et.tona.top50)
https://developer.spotify.com/console/get-audio-features-several-tracks/?ids=79t297h5zlQXmcNc9Vxb24
Anyone can easily get the features of a song through this easy-to-use interface, but only one song at a time.
Links for reference:
https://developer.spotify.com/documentation/general/guides/authorization/
https://developer.spotify.com/documentation/general/guides/authorization/code-flow/
https://developer.spotify.com/documentation/general/guides/authorization/app-settings/
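install.packages('spotifyr')
library(spotifyr)  # load the package so that get_spotify_access_token() is found
# The credentials are read from these environment variables; use your own values and keep them private
Sys.setenv(SPOTIFY_CLIENT_ID = 'your-client-id')
Sys.setenv(SPOTIFY_CLIENT_SECRET = 'your-client-secret')
access_token <- get_spotify_access_token()
ERROR: when I ran this, the call failed, maybe because the package is too old.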
spotifyKey <- "your-client-id"
spotifySecret <- "your-client-secret"
install.packages('Rspotify')
install.packages('httr')
install.packages('jsonlite')
library(Rspotify)
library(httr)
library(jsonlite)
my_oauth <- spotifyOAuth(app_id="xxxx",client_id="yyyy",client_secret="zzzz")
save(my_oauth, file="my_oauth")
load("my_oauth")
tiago <- getUser(user_id="t.mendesdantas",token=my_oauth)
This gave a second error, so let’s try another way for the moment.
I found interesting datasets such as these, but they lack the key feature, which is one of the most important:
https://www.kaggle.com/datasets/leonardopena/top-spotify-songs-from-20102019-by-year
https://www.kaggle.com/datasets/iamsumat/spotify-top-2000s-mega-dataset
Best ones :
https://www.kaggle.com/datasets/maharshipandya/-spotify-tracks-dataset
https://www.kaggle.com/code/aeryan/spotify-music-analysis/data
This one has different genres, a lot of tracks, and above all the features key, mode, tempo and time signature.
The real goal is still to write code that uses the Spotify API.
=> Maybe later in Python, with https://github.com/tgel0/spotify-data/blob/master/notebooks/SpotifyDataRetrieval.ipynb
head(notes.et.tona.top50)
After some cleaning with Excel I obtained this file
top50v3 <- top50v2 %>%
  # manually rename columns
  # NEW name = OLD name
  rename(url      = V1,
         selector = V2,
         chords   = V4,
         key      = V5)
names(top50v3)
[1] "url" "selector" "V3" "chords" "key"
top50v3$V3 <- NULL
names(top50v3)
[1] "url" "selector" "chords" "key"
top50v3 <- top50v3[-1,]
head(top50v3)
dataset2 <- read.csv("~/datasetspotify/dataspotify2/dataset2.csv")
head(dataset2)
For the next steps I will follow the process described at: https://epirhandbook.com/en/cleaning-data-and-core-functions.html
install.packages("here")
install.packages("janitor")
install.packages("lubridate")
install.packages("matchmaker")
install.packages("epikit")  # originally typed as "epitkit", which is why R answered that the package is not available for this version of R
install.packages("magrittr")
Error in install.packages : Updating loaded packages
library(rio)
Registered S3 method overwritten by 'data.table':
method from
print.data.table
The following rio suggested packages are not installed: ‘arrow’, ‘feather’, ‘fst’, ‘hexView’, ‘pzfx’, ‘readODS’, ‘rmatio’
Use 'install_formats()' to install them
skimr::skim(dataset2)
── Data Summary ────────────────────────
Values
Name dataset2
Number of rows 114000
Number of columns 21
_______________________
Column type frequency:
character 6
numeric 15
________________________
Group variables None
names(dataset2)
[1] "X" "track_id" "artists" "album_name" "track_name" "popularity" "duration_ms"
[8] "explicit" "danceability" "energy" "key" "loudness" "mode" "speechiness"
[15] "acousticness" "instrumentalness" "liveness" "valence" "tempo" "time_signature" "track_genre"
dataset2 %>%
select(track_name,key,mode,tempo,time_signature,track_genre,popularity) %>%
names()
[1] "track_name" "key" "mode"
[4] "tempo" "time_signature" "track_genre"
[7] "popularity"
ds2_by_genre <- dataset2 %>%
group_by(track_genre)
ds2_by_genre
count(dataset2,track_genre)
NA
ds2_by_genre %>%
group_by(track_genre) %>% # group data by unique values in column track_genre
summarise(n_rows = n())
dataset2 %>%
  filter(track_genre == "jazz") %>%  # keep jazz tracks only; count(track_genre = "jazz", key) would relabel every row as jazz
  count(key)
keyjazz_summary <- dataset2 %>%
  filter(track_genre == "jazz") %>%  # keep jazz tracks only
  count(key) %>%                     # count tracks per key (produces "n" column)
  mutate(                            # share of each key among jazz tracks
    percent = scales::percent(n / sum(n)))
# print
keyjazz_summary
dataset2 %>%
  count(track_genre, key) %>%  # group and tabulate counts by genre and key
  ggplot() +                   # pass new data frame to ggplot
  geom_col(                    # create bar plot
    mapping = aes(
      x = track_genre,         # map genre to the x-axis
      fill = factor(key),      # map key to the fill, as a discrete variable (0 to 11)
      y = n))                  # map the number of tracks to the bar height
key_by_genre <- dataset2 %>%  # begin with the Spotify dataset
  group_by(track_genre) %>%   # group by genre
  count(key) %>%              # count tracks per key within each genre; the genre grouping is kept
  mutate(percent = scales::percent(n / sum(n)))  # calculate percent - note the denominator is the genre group
key_by_genre
SPOTIFY: check Kaggle and other sources
=> export the best keys per genre, only for songs with popularity > 0.6
BOITEACHANSONS: check the course material
=> output best chords and best keys
=> how to create chords with notes.wav
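The Spotify step of this to-do list could look like the sketch below. It runs on a small invented data frame with the same column names as dataset2. The threshold of 60 is an assumption: it reads "popularity > 0.6" as 60 on the 0 to 100 popularity scale used in that dataset.

```r
library(dplyr)

# Invented toy data with the same column names as dataset2.
tracks <- data.frame(
  track_genre = c("jazz", "jazz", "jazz", "jazz", "pop", "pop", "pop"),
  key         = c(0, 0, 5, 5, 7, 7, 2),
  popularity  = c(80, 70, 90, 20, 65, 75, 30)
)

best_keys <- tracks %>%
  filter(popularity > 60) %>%                        # keep popular tracks only (assumed threshold)
  count(track_genre, key, name = "n_tracks") %>%     # tracks per genre and key
  group_by(track_genre) %>%
  slice_max(n_tracks, n = 1, with_ties = FALSE) %>%  # most frequent key of each genre
  ungroup()

best_keys
```

With `with_ties = FALSE` only one key is kept per genre even when several keys are equally frequent; dropping that argument would keep all tied keys.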
top50v3 %>%
count(key)