Final Project Proposal

What we plan to do

We plan to look for correlations between people’s MBTI personality type and the mean music metadata of their playlists. Then, using that metadata (danceability, energy, loudness, speechiness, acousticness, liveness, valence, tempo, instrumentalness), we plan to find matching songs in the “Spotify’s trending music 2023” library, using either the Spotify API or a static Kaggle dataset.

What we plan to achieve

We plan to find patterns between people’s MBTI personality type and their music taste, and to show our audience some example songs and artists. We also plan to check whether our results confirm articles like hypothesis and hypothesis 2.

Data and audience

Target audience

Teenagers and young adults who want to learn their friends’ music tastes from their MBTI type or other knowledge of their personality, so that they can play songs their friends might like. (This might require a more predictive model built with machine learning.)

Datasets

The first dataset we are using is mbti-spotify-dataset by TL.

This combined dataset contains 4,081 rows, and each row holds aggregated information for one Spotify playlist. The data is crowdsourced from Spotify users.

The second dataset we are using is Most Streamed Songs on Spotify in 2023 collected by Nidula Elgiriyewithana on Kaggle.

Important variables

Spotify Music Metadata

The Spotify audio features are described as follows:

acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.
danceability: Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
energy: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.
instrumentalness: Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content.
liveness: Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.
loudness: The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.
speechiness: Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.
tempo: The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
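
The speechiness thresholds above translate directly into labels. Below is a minimal base-R sketch. It assumes the column is on the 0.0 to 1.0 scale. The function `speechiness_band` and its label names are hypothetical. The description only says "above 0.66" and "below 0.33", so the sketch places the boundary values 0.33 and 0.66 in the middle band.

```r
# Hypothetical helper: label tracks with the speechiness bands described above.
# Assumes speechiness is on the 0.0-1.0 scale; NA input gives NA output.
speechiness_band <- function(speechiness) {
  band <- ifelse(speechiness > 0.66, "spoken word",
          ifelse(speechiness >= 0.33, "music and speech", "music"))
  factor(band, levels = c("music", "music and speech", "spoken word"))
}

# Example: speechiness_band(c(0.05, 0.45, 0.9))
```

Returning a factor with fixed levels keeps all three bands in counts and plots, even when one band is empty.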

Background

MBTI stands for Myers-Briggs Type Indicator. It was developed by Katharine Cook Briggs and Isabel Briggs Myers in the 1940s, based on the psychological theories of Carl Jung. The idea behind MBTI is that individuals can be categorized into 16 distinct personality types based on four basic preferences: extraversion (E) vs. introversion (I), sensing (S) vs. intuition (N), thinking (T) vs. feeling (F), and judging (J) vs. perceiving (P). Each of these preferences can manifest in different ways, resulting in a complex system of personality types that can help individuals better understand themselves and others.
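
Each four-letter type combines the four dichotomies. To compare E vs I (or S vs N, and so on) on their own, the type can be split into one column per dichotomy. Below is a minimal base-R sketch. It assumes the dataset stores plain four-letter codes such as "INFP". The function `mbti_traits` and its column names are hypothetical. Codes with extra suffixes, and any other malformed codes, come back as NA.

```r
# Hypothetical helper: split four-letter MBTI codes (e.g. "INFP") into the four
# dichotomies so each one can be analysed separately.
mbti_traits <- function(type) {
  type <- toupper(trimws(type))
  valid <- grepl("^[EI][SN][TF][JP]$", type)  # FALSE for NA or malformed codes
  letter <- function(pos) ifelse(valid, substr(type, pos, pos), NA_character_)
  data.frame(
    type = ifelse(valid, type, NA_character_),
    EI = letter(1),  # extraversion vs introversion
    SN = letter(2),  # sensing vs intuition
    TF = letter(3),  # thinking vs feeling
    JP = letter(4),  # judging vs perceiving
    stringsAsFactors = FALSE
  )
}

# Example: mbti_traits(c("INFP", "estj"))
```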

The MBTI has been widely used in various fields, including psychology, career counseling, and personal development. Now, we’re curious whether MBTI correlates with music taste.

4 questions to explore

Do MBTI traits (E vs I, N vs S, F vs T, P vs J) correlate with intangible aspects of music taste (danceability, energy, acousticness)?
Do MBTI traits (E vs I, N vs S, F vs T, P vs J) correlate with tangible aspects of music taste (loudness, tempo, speechiness, duration, key, time signature)?
Do MBTI traits (E vs I, N vs S, F vs T, P vs J) correlate with specific genres?
What are the top 3 favorite songs and artists for each of the 16 personality types?

Technical Description

How will you be reading in your data (i.e., are you using an API, or is it a static .csv/.json file)?

We will either read two static Kaggle datasets or use the Spotify API to access a dynamic dataset.

What kind of data processing (reshaping, reformatting, etc.) will you need to do to your data?

We plan to connect the two datasets through the Spotify audio-feature columns they share. Alternatively, Amelia is experimenting with the Spotify API to get a dynamic dataset from the Spotify music library.
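
The shared audio features are continuous. A playlist's mean feature values will almost never exactly equal any single song's values, so an exact join on those columns would find few or no matches. One way to connect the datasets is nearest-neighbour matching. First standardize the shared features. Then, for each MBTI profile, pick the songs closest in Euclidean distance. Below is a minimal base-R sketch. The function `closest_songs`, the column names, and the toy usage are hypothetical. Sometimes one source stores a feature on a different scale (for example, a percentage from 0 to 100 instead of 0.0 to 1.0). In that case, convert it to the other source's scale before matching.

```r
# Hypothetical sketch: return the n songs whose audio features are closest
# to one MBTI group's mean profile (a named numeric vector or one-row data frame).
closest_songs <- function(profile, songs, features, n = 3) {
  x <- as.matrix(songs[, features, drop = FALSE])
  # Standardize each feature with the song library's mean and SD, so a feature
  # on a large scale (e.g. tempo in BPM) does not dominate the distance.
  centers <- colMeans(x)
  scales <- apply(x, 2, sd)
  z_songs <- scale(x, center = centers, scale = scales)
  z_profile <- (unlist(profile[features]) - centers) / scales
  # Euclidean distance from the profile to every song, smallest first.
  d <- sqrt(rowSums(sweep(z_songs, 2, z_profile)^2))
  songs[order(d)[seq_len(min(n, nrow(songs)))], , drop = FALSE]
}

# Example with made-up values:
# closest_songs(c(danceability = 0.85, energy = 0.85, tempo = 126), songs,
#               c("danceability", "energy", "tempo"), n = 2)
```

Standardizing with the song library's statistics is one choice among several. Scaling each feature by a fixed known range would also work.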

What (major/new) libraries you will be using in this project (no need to list common libraries that are used in many projects such as dplyr)

library(spotifyr)
library(tidyverse)
library(fmsb)

What questions, if any, will you be answering with statistical analysis/machine learning?

Do MBTI traits (E vs I, N vs S, F vs T, P vs J) correlate with intangible aspects of music taste (danceability, energy, acousticness)?
Do MBTI traits (E vs I, N vs S, F vs T, P vs J) correlate with tangible aspects of music taste (loudness, tempo, speechiness, duration, key, time signature)?
Do MBTI traits (E vs I, N vs S, F vs T, P vs J) correlate with specific genres?
What are some possible top 5 favorite songs and artists for each of the 16 personality types, drawn from the trending music of 2023?
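
Questions 1 and 2 compare two groups of playlists for each dichotomy. A simple starting point is a Welch two-sample t-test on each audio feature. Several features are tested at once, so the p-values also need a multiple-comparison correction. Below is a minimal base-R sketch. The function `trait_tests` and the column names are hypothetical. The t-test is only one possible choice; a rank-based test such as `wilcox.test` is an alternative for skewed features.

```r
# Hypothetical sketch for questions 1 and 2: compare the two sides of one MBTI
# dichotomy (e.g. E vs I) on each audio feature with a Welch two-sample t-test.
trait_tests <- function(playlists, trait, features) {
  g <- playlists[[trait]]
  sides <- sort(unique(g))  # e.g. c("E", "I"); sort() drops NA
  stopifnot(length(sides) == 2)
  rows <- lapply(features, function(f) {
    x <- playlists[[f]][which(g == sides[1])]
    y <- playlists[[f]][which(g == sides[2])]
    tt <- t.test(x, y)  # var.equal = FALSE by default, i.e. Welch's test
    data.frame(
      feature = f,
      mean_diff = mean(x, na.rm = TRUE) - mean(y, na.rm = TRUE),  # sides[1] minus sides[2]
      p_value = tt$p.value
    )
  })
  out <- do.call(rbind, rows)
  # Several features are tested at once, so also report Holm-adjusted p-values.
  out$p_adjusted <- p.adjust(out$p_value, method = "holm")
  out
}

# Example: trait_tests(playlists, "EI", c("danceability", "energy", "acousticness"))
```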

We also hope to find out whether we can train a machine learning model that recommends songs from Spotify in real time, based on the MBTI type a user enters.

Major challenges we anticipate

Getting a dynamic dataset from Spotify through the API
Generalizing the data visualization, especially the radar chart
Making sense of the variables (they all have different units and scales)
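
One way to handle the different units (dB for loudness, BPM for tempo, 0.0 to 1.0 scores for the rest) is to rescale every feature to 0 to 1 before plotting. Below is a minimal base-R sketch that applies min-max scaling to per-group means. It arranges them in the layout that `fmsb::radarchart()` expects by default: row 1 holds axis maxima, row 2 holds axis minima, and each following row is one group. The helper names `rescale01` and `radar_frame` and the input `type_means` are hypothetical.

```r
# Min-max rescale a numeric vector to the 0-1 range.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[2] == rng[1]) return(rep(0.5, length(x)))  # constant column: place mid-axis
  (x - rng[1]) / (rng[2] - rng[1])
}

# Hypothetical helper: take per-group feature means (one row per MBTI type,
# one column per feature) and add the max/min rows fmsb::radarchart() uses.
radar_frame <- function(means) {
  scaled <- means
  scaled[] <- lapply(means, rescale01)
  bounds <- as.data.frame(rbind(max = rep(1, ncol(scaled)), min = rep(0, ncol(scaled))))
  names(bounds) <- names(scaled)
  rbind(bounds, scaled)
}

# Usage, once fmsb is installed:
# fmsb::radarchart(radar_frame(type_means))
```

Scaling across only a handful of group means stretches small differences to the full axis. A more conservative option is to rescale by each feature's known range, for example -60 to 0 dB for loudness.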

Visualizations

For questions 1 and 2, we envision generating pivot tables that show each personality trait’s preference for each music element.
For question 3, we can use a radar chart to show each MBTI type’s preference for each music genre.
For question 4, we plan to create a network graph or simply output the top songs and artists for users.
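
The pivot tables for questions 1 and 2 can be sketched as the mean of each audio feature for each side of each dichotomy, stacked into one long table. Below is a minimal base-R sketch. It assumes the playlist data already has one column per dichotomy (for example, `EI` holding "E" or "I"). The function `trait_pivot` and the column names are hypothetical.

```r
# Hypothetical sketch: mean of each audio feature for each side of each
# MBTI dichotomy, stacked into one table (one row per side).
trait_pivot <- function(playlists, traits, features) {
  do.call(rbind, lapply(traits, function(tr) {
    p <- aggregate(playlists[features], by = playlists[tr], FUN = mean, na.rm = TRUE)
    data.frame(dichotomy = tr, side = p[[tr]], p[features])
  }))
}

# Example: trait_pivot(playlists, c("EI", "SN", "TF", "JP"), c("danceability", "energy"))
```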