What we plan to do

We plan to look for correlations between people’s MBTI personality types and their music taste, measured by the mean of each audio feature across their playlists. Then, using these audio features (danceability, energy, loudness, speechiness, acousticness, liveness, valence, tempo, instrumentalness), we can find specific songs in the “Spotify’s trending music 2023” music library, using either the Spotify API or a static Kaggle dataset.

What we plan to achieve

We plan to find patterns between people’s MBTI personality types and their music taste, and to recommend songs and artists to our audience based on the music elements correlated with the MBTI type they enter. We also plan to check whether our results confirm articles such as hypothesis and hypothesis 2.

Data and audience

Target audience

Teenagers and young adults who want to know their friends’ music tastes based on their friends’ MBTI types or knowledge of their personalities, so that they can play songs their friends might like. (This might require a more predictive model using machine learning.)

A noticeable number of teens and young adults are familiar with the MBTI personality classification, but many will need more detailed explanations of each of the functions in the MBTI classification to fully comprehend our data. Furthermore, though a large portion of the target audience has used Spotify or other music players, few will have had the chance to explore what the variables in our music metadata mean.

We will need to explain in detail the concepts MBTI uses. If we present significant findings about a specific music variable and listener MBTI, we will also need to explain that variable; if we only present broader relationships between general music taste and listener MBTI, this will not be necessary.

Datasets

The first dataset we are using is mbti-spotify-dataset by Trung Le:

library(readr)
library(dplyr)
mbti <- read_delim("../MBTI-Recommender/MBTI Spotify data/combined_mbti_df.csv")
## Rows: 4081 Columns: 46
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): mbti, function_pair
## dbl (44): danceability_mean, danceability_stdev, energy_mean, energy_stdev, ...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Preview 30 randomly chosen playlists (slice_sample() supersedes sample_n())
mbti %>% 
  slice_sample(n = 30)
## # A tibble: 30 × 46
##    mbti  function_pair danceability_mean danceability_stdev energy_mean
##    <chr> <chr>                     <dbl>              <dbl>       <dbl>
##  1 ENFJ  NF                        0.599              0.153       0.558
##  2 ENFP  NF                        0.654              0.144       0.589
##  3 ENFJ  NF                        0.684              0.124       0.691
##  4 ENFJ  NF                        0.602              0.120       0.539
##  5 ENFJ  NF                        0.583              0.109       0.601
##  6 ISTP  SP                        0.706              0.141       0.651
##  7 INFP  NF                        0.534              0.147       0.532
##  8 ESTJ  SJ                        0.492              0.164       0.709
##  9 ISTP  SP                        0.538              0.172       0.557
## 10 INFJ  NF                        0.568              0.131       0.596
## # ℹ 20 more rows
## # ℹ 41 more variables: energy_stdev <dbl>, loudness_mean <dbl>,
## #   loudness_stdev <dbl>, mode_mean <dbl>, mode_stdev <dbl>,
## #   speechiness_mean <dbl>, speechiness_stdev <dbl>, acousticness_mean <dbl>,
## #   acousticness_stdev <dbl>, liveness_mean <dbl>, liveness_stdev <dbl>,
## #   valence_mean <dbl>, valence_stdev <dbl>, tempo_mean <dbl>,
## #   tempo_stdev <dbl>, instrumentalness_mean <dbl>, …

This combined dataset contains 4081 rows; each row holds aggregated information for one Spotify playlist. The data is crowdsourced from Spotify users.

The second dataset we are using is Most Streamed Songs on Spotify in 2023 collected by Nidula Elgiriyewithana on Kaggle.

Both datasets are from Kaggle, an online repository that allows users to share models and datasets for the community to use.

Because of the nature of the website, dataset authors update their data on their own schedule. The first dataset was updated a year ago, and the second two months ago. These datasets were collected with reliable techniques, such as querying the Spotify API, and with computational methods that minimize human input error, which gives them good integrity for our purposes.

Important variables

Spotify Music Metadata

The Spotify audio features are described as follows:

  • acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.

  • danceability: Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.

  • energy: Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.

  • instrumentalness: Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly “vocal”. The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content.

  • loudness: The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.

  • tempo: The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.

  • valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
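Most of these features already lie between 0.0 and 1.0, but loudness (roughly -60 to 0 dB) and tempo (BPM) do not, so they cannot share an axis with the others as-is. Below is a minimal sketch of min-max rescaling, assuming the `_mean` column names shown in the sample output above; `rescale01()` is a hypothetical helper and the three playlists are made-up values, not rows from the dataset.

```r
library(dplyr)

# Hypothetical helper: linearly map a numeric vector onto [0, 1]
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) return(rep(0, length(x)))  # constant input: avoid dividing by zero
  (x - rng[1]) / (rng[2] - rng[1])
}

# Toy playlist-level means (made-up numbers, not from the dataset)
playlists <- tibble(
  mbti          = c("ENFJ", "ISTP", "INFP"),
  loudness_mean = c(-7.5, -5.0, -12.5),
  tempo_mean    = c(118, 126, 102)
)

# Add rescaled copies of the two features that are not already on a 0-1 scale
playlists_scaled <- playlists %>%
  mutate(across(c(loudness_mean, tempo_mean), rescale01, .names = "{.col}_01"))

playlists_scaled
```

Note that min-max scaling depends on which rows are included, so the rescaled values should be computed once over the full dataset rather than separately for each plot.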

Background

MBTI stands for Myers-Briggs Type Indicator. It was developed by Katharine Cook Briggs and Isabel Briggs Myers in the 1940s, based on the psychological theories of Carl Jung. The idea behind MBTI is that individuals can be categorized into 16 distinct personality types based on four basic preferences: extraversion (E) vs. introversion (I), sensing (S) vs. intuition (N), thinking (T) vs. feeling (F), and judging (J) vs. perceiving (P). Each of these preferences can manifest in different ways, resulting in a complex system of personality types that can help individuals better understand themselves and others.

The MBTI has been widely used in various fields, including psychology, career counseling, and personal development. Now, we’re curious if MBTI correlates with music taste.

4 questions to explore

We want to explain to our audience any relationships between the ways they tend to think and the music that they or others might like, based on their habits and views of the world. People take MBTI tests with anything from light interest to a deep passion for getting to know themselves better. The answers to these questions may serve different people in different ways, from helping them explore themselves to discovering and healing parts of themselves they could not reach before. They might discover new communities and new worlds of music enjoyed by others who share some of their view of the world, or celebrate the idiosyncrasies of each individual’s music taste.

  1. Is there a correlation between musical positiveness (valence) and people’s MBTI types?
  2. Do the different function pairs (NT, NF, SJ, SP) correlate with intangible/feeling music taste (danceability, energy, valence) and tangible/factual music taste (loudness, tempo, duration, instrumentalness)?
  3. What are the music preferences of each personality type across six dimensions?
  4. What might be each of the 16 personality types’ top 3 favorite songs and artists right now (2023)? (If we are able to use the Spotify API, this result would be dynamic.)
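Because MBTI type is categorical rather than numeric, question 1 is probably best tested as a difference in mean valence across the 16 types. Below is a minimal sketch using a one-way ANOVA (`aov()`), with the rank-based Kruskal-Wallis test (`kruskal.test()`) as an alternative; this is our suggested technique, not one the plan commits to, and the simulated `sim` data frame stands in for the real `mbti` data.

```r
set.seed(42)

# Simulated stand-in for the real data: 16 types x 20 playlists each
types <- c("INTJ", "INTP", "ENTJ", "ENTP", "INFJ", "INFP", "ENFJ", "ENFP",
           "ISTJ", "ISFJ", "ESTJ", "ESFJ", "ISTP", "ISFP", "ESTP", "ESFP")
sim <- data.frame(
  mbti         = factor(rep(types, each = 20)),
  valence_mean = rnorm(16 * 20, mean = 0.45, sd = 0.08)
)

# One-way ANOVA: does mean valence differ between MBTI types?
fit <- aov(valence_mean ~ mbti, data = sim)
anova_p <- summary(fit)[[1]][["Pr(>F)"]][1]

# Rank-based alternative that does not assume normally distributed residuals
kw_p <- kruskal.test(valence_mean ~ mbti, data = sim)$p.value

c(anova = anova_p, kruskal = kw_p)
```

A small p-value would only indicate that at least one type differs from the others; with about 4000 playlists even tiny differences can be significant, so effect sizes are worth reporting alongside it.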

Technical Description

  1. How will you be reading in your data (i.e., are you using an API, or is it a static .csv/.json file)?

We will read in the two static Kaggle datasets as .csv files.

Alternatively, Amelia is experimenting with the Spotify API to get a dynamic dataset from the Spotify music library.

  2. What kind of data processing (reshaping, reformatting, etc.) will you need to do to your data?

Grouping using facet

We plan to group the 16 personalities at three levels: (1) the 16 unique MBTI personalities; (2) 4 function pairs: Analysts (NT, purple), Diplomats (NF, green), Sentinels (SJ, blue), and Explorers (SP, yellow); (3) introverted vs. extraverted.
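The second and third grouping levels can be derived directly from the four-letter type. Below is a minimal sketch; `add_groupings()` and `group_colours` are hypothetical names, and the Analysts/Diplomats/Sentinels/Explorers mapping (NT/NF/SJ/SP) follows the common 16-type convention. The dataset’s `function_pair` column already appears to hold these two-letter codes (NF, SP, SJ in the sample above), so recoding that column would work too.

```r
library(dplyr)

# Hypothetical helper: derive grouping levels 2 and 3 from the four-letter
# type (level 1 is the `mbti` column itself)
add_groupings <- function(df) {
  df %>%
    mutate(
      second = substr(mbti, 2, 2),  # N or S
      third  = substr(mbti, 3, 3),  # T or F
      fourth = substr(mbti, 4, 4),  # J or P
      group = case_when(
        second == "N" & third == "T"  ~ "Analysts",   # NT
        second == "N" & third == "F"  ~ "Diplomats",  # NF
        second == "S" & fourth == "J" ~ "Sentinels",  # SJ
        second == "S" & fourth == "P" ~ "Explorers"   # SP
      ),
      attitude = if_else(substr(mbti, 1, 1) == "E", "Extraverted", "Introverted")
    ) %>%
    select(-second, -third, -fourth)  # drop the helper letter columns
}

# Colours from the plan, for scale_colour_manual() or scale_fill_manual()
group_colours <- c(Analysts = "purple", Diplomats = "green",
                   Sentinels = "blue", Explorers = "yellow")

example <- add_groupings(tibble(mbti = c("INTJ", "ENFP", "ESTJ", "ISTP")))
example
```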

Reformatting

There are roughly 300 individuals for each personality type, and plotting all of them on one scatterplot would be messy. Therefore, Meiyao plans to reformat the data into 10 groups of about 30 individuals per personality type and plot each group’s mean.
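One way to build these groups is to shuffle each type’s rows into 10 roughly equal subgroups and average every feature within each subgroup. Below is a minimal sketch on simulated data; the two types, 300 rows each, and the feature values are made up. Assigning rows at random rather than taking consecutive blocks of 30 keeps any ordering in the file from leaking into the groups. Real types will have different counts, so group sizes will vary slightly by type.

```r
library(dplyr)

set.seed(2023)

# Simulated stand-in: two types with 300 playlists each (made-up values)
sim <- tibble(
  mbti         = rep(c("INFP", "ESTJ"), each = 300),
  valence_mean = runif(600, 0.2, 0.7),
  energy_mean  = runif(600, 0.4, 0.8)
)

grouped_means <- sim %>%
  group_by(mbti) %>%
  # Randomly assign each row of a type to one of 10 near-equal subgroups
  mutate(subgroup = sample(rep_len(1:10, n()))) %>%
  group_by(mbti, subgroup) %>%
  summarise(
    n_playlists = n(),
    across(ends_with("_mean"), mean),  # average every feature column
    .groups = "drop"
  )

grouped_means
```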

  3. What (major/new) libraries will you be using in this project (no need to list common libraries that are used in many projects, such as dplyr)?
  4. What questions, if any, will you be answering with statistical analysis/machine learning?

We also hope to find out whether we can use machine learning to train a model that recommends songs from Spotify in real time based on the MBTI type a user enters.

Major challenges we anticipate

  1. Getting a dynamic dataset from Spotify through the API and linking it with the data in mbti-spotify. We will also see whether users can enter their MBTI type and then search the dynamic Spotify data for the top 3 songs they may like.
  2. Cleaning and organizing data groups
  3. Creating data visualizations, especially radar charts
  4. Making sense of the variables (they use different units and scales)

Visualizations

Question 1

  • Each data point is a group of 30 individuals with the same personality type. We group them this way because there are about 300 participants for each personality type in the dataset.
  • Then, we will take the mean of valence (the music element that predicts the positivity of a song) for each group and plot the group means as a scatterplot.
  • By using aes(valence_mean, col = mbti) with scale_colour_manual(), we can create a color mapping according to the variable mbti.
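Below is a minimal sketch of this scatterplot on simulated group means (10 per type, made-up values), with type on the x-axis and jitter to keep points from overlapping. Because colour is mapped through `col`/`colour`, the matching manual scale is `scale_colour_manual()`; `scale_fill_manual()` only affects the `fill` aesthetic. The `hcl.colors(16, "Dark 3")` palette is an arbitrary placeholder for the team’s own 16 colours.

```r
library(dplyr)
library(ggplot2)

set.seed(1)

# Simulated stand-in for 10 group means per type (made-up values)
types <- c("INTJ", "INTP", "ENTJ", "ENTP", "INFJ", "INFP", "ENFJ", "ENFP",
           "ISTJ", "ISFJ", "ESTJ", "ESFJ", "ISTP", "ISFP", "ESTP", "ESFP")
group_means <- tibble(
  mbti         = factor(rep(types, each = 10), levels = types),
  valence_mean = rnorm(160, mean = 0.45, sd = 0.05)
)

p <- ggplot(group_means, aes(x = mbti, y = valence_mean, colour = mbti)) +
  geom_jitter(width = 0.15, height = 0) +  # jitter horizontally only
  scale_colour_manual(values = hcl.colors(16, "Dark 3"), guide = "none") +
  labs(x = "MBTI type", y = "Mean valence (group of ~30 playlists)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

p
```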

Question 2

  • We plan to draw 2 box plots, one for the feeling elements and one for the factual elements, with x being each music element (discrete, categorical) and y being the playlist mean of that element.
  • Again, color will map to the way the general public is usually introduced to MBTI: the 4 function pairs Analysts (purple), Diplomats (green), Sentinels (blue), and Explorers (yellow).
  • Use geom_boxplot() with fill mapped to function pair, which places the four boxes side by side (dodged) within each element, to contrast their differences.
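Below is a minimal sketch of the feeling-element box plot, assuming a `group` column holding the four function-pair names; the 160 playlists and their values are simulated. The data is first reshaped to long form so each music element becomes an x-axis category. The factual elements (loudness, tempo) are on different scales and would need their own plot or rescaling first.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(7)

# Simulated stand-in: 40 playlists per function-pair group (made-up values)
sim <- tibble(
  group = rep(c("Analysts", "Diplomats", "Sentinels", "Explorers"), each = 40),
  danceability_mean = runif(160, 0.4, 0.8),
  energy_mean       = runif(160, 0.4, 0.8),
  valence_mean      = runif(160, 0.2, 0.7)
)

# One row per (playlist, element) so each element becomes an x category
long <- sim %>%
  pivot_longer(ends_with("_mean"), names_to = "feature", values_to = "value")

p <- ggplot(long, aes(x = feature, y = value, fill = group)) +
  geom_boxplot() +  # boxes for the four groups are dodged within each element
  scale_fill_manual(values = c(Analysts = "purple", Diplomats = "green",
                               Sentinels = "blue", Explorers = "yellow")) +
  labs(x = "Music element", y = "Playlist mean", fill = "Function pair")

p
```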

Question 3

  • We plan to use radar charts with six dimensions (danceability, energy, instrumentalness, loudness, tempo, duration) to show each MBTI type’s preference for each of these music elements.
  • Alternatively, we can try to connect music metadata with music genre and plot radar charts with genre dimensions (pop, musical theater, indie, EDM, rap). However, there might not be a clear and direct relationship between the audio features and these genres.
  • facet_wrap(): since the personality types are categorical, this function can display 16 subplots, one for each type.
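ggplot2 has no built-in radar geom, and `coord_polar()` draws the closing edge of a polygon awkwardly, so one workaround is to compute each vertex’s Cartesian position by hand and draw ordinary polygons. Below is a minimal sketch of that approach, assuming the features have already been rescaled to 0-1; the two profiles and all values are made up, and `duration` in particular is not among the columns shown in the sample output above.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

features <- c("danceability", "energy", "instrumentalness",
              "loudness", "tempo", "duration")

# Made-up profiles, already rescaled to 0-1 (not real results)
profiles <- tibble(
  mbti             = c("INFP", "ESTJ"),
  danceability     = c(0.45, 0.70),
  energy           = c(0.40, 0.75),
  instrumentalness = c(0.30, 0.10),
  loudness         = c(0.35, 0.80),
  tempo            = c(0.50, 0.60),
  duration         = c(0.55, 0.40)
)

# One spoke per feature, evenly spaced clockwise from 12 o'clock
spokes <- tibble(feature = features,
                 angle   = 2 * pi * (seq_along(features) - 1) / length(features))

# Convert each (feature, value) pair to Cartesian radar coordinates
radar <- profiles %>%
  pivot_longer(all_of(features), names_to = "feature", values_to = "value") %>%
  left_join(spokes, by = "feature") %>%
  arrange(mbti, angle) %>%  # polygon vertices in angular order
  mutate(x = value * sin(angle), y = value * cos(angle))

# Outer frame (value = 1) and axis labels; no mbti column, so every facet gets them
frame  <- spokes %>% mutate(x = sin(angle), y = cos(angle))
labels <- spokes %>% mutate(x = 1.2 * sin(angle), y = 1.2 * cos(angle))

p <- ggplot(radar, aes(x, y)) +
  geom_polygon(data = frame, fill = NA, colour = "grey70") +
  geom_polygon(fill = "steelblue", alpha = 0.4, colour = "steelblue") +
  geom_text(data = labels, aes(label = feature), size = 3) +
  coord_equal() +
  facet_wrap(~ mbti) +
  theme_void()

p
```

With all 16 types in `profiles`, the same `facet_wrap(~ mbti)` call produces the 16 subplots.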

Question 4

  • We plan to create an interactive interface that gives users their top 3 recommended songs and artists based on their preferred music metadata.
  • Alternatively, if this plan does not work out, we can show the distribution of each MBTI type’s preference for each music element.
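A simple non-machine-learning baseline for the top-3 recommendation is a nearest-neighbour search: compute each type’s average feature profile, then return the songs whose features lie closest to it. Below is a minimal sketch using Euclidean distance; `recommend_songs()`, the song titles, artists, and all values are hypothetical. It assumes both tables use the same 0-1 scale; if the song dataset stores features differently (for example as percentages), they would need converting first.

```r
library(dplyr)

# Hypothetical helper: the k songs closest to one type's average feature
# profile, using Euclidean distance over the chosen feature columns
recommend_songs <- function(type, profiles, songs, features, k = 3) {
  target <- profiles %>% filter(mbti == type)
  stopifnot(nrow(target) == 1)  # expect exactly one profile per type
  target_vec <- unlist(target[features])
  song_mat <- as.matrix(songs[features])
  # Subtract the target from every song row, square, and sum across features
  distances <- sqrt(rowSums(sweep(song_mat, 2, target_vec)^2))
  songs %>%
    mutate(distance = distances) %>%
    arrange(distance) %>%
    slice_head(n = k)
}

# Made-up type profiles and song features (not real data)
profiles <- tibble(mbti = c("INFP", "ESTJ"),
                   danceability = c(0.50, 0.75),
                   energy       = c(0.45, 0.80),
                   valence      = c(0.35, 0.70))
songs <- tibble(track  = c("Song A", "Song B", "Song C", "Song D", "Song E"),
                artist = c("Artist 1", "Artist 2", "Artist 3", "Artist 4", "Artist 5"),
                danceability = c(0.52, 0.80, 0.40, 0.70, 0.55),
                energy       = c(0.40, 0.85, 0.30, 0.75, 0.50),
                valence      = c(0.30, 0.72, 0.20, 0.65, 0.40))

recs <- recommend_songs("INFP", profiles, songs,
                        features = c("danceability", "energy", "valence"))
recs
```

For the INFP profile above this returns Song A, Song E, and Song C, in that order.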