Introduction

Motivation

If you are a Millennial or a member of Gen Z on social media during the first week of December, you have probably seen your friends reposting their Spotify Wrapped. Spotify Wrapped lets users share the five artists and songs they listened to the most, along with a branded “listening personality”. Similarly, the MBTI has become popular among young adults. We started to wonder whether people with a certain MBTI personality type would have a certain kind of Spotify Wrapped.

Context: What is MBTI?

MBTI stands for Myers-Briggs Type Indicator, developed by Katharine Cook Briggs and Isabel Briggs Myers in the 1940s based on the psychological theories of Carl Jung. The idea behind MBTI is that individuals can be categorized into 16 distinct personality types based on four basic preferences: extraversion (E) vs. introversion (I), sensing (S) vs. intuition (N), thinking (T) vs. feeling (F), and judging (J) vs. perceiving (P).

Extraversion (E) vs. introversion (I) measures whether a person gains more energy from the company of other people or from being alone. Sensing (S) vs. intuition (N) measures whether a person relies on past experience or on gut feeling to interpret the world. Thinking (T) vs. feeling (F) measures whether a person values logic or emotion more when making decisions. Judging (J) vs. perceiving (P) is the extent to which an individual plans ahead or needs a structured plan; judging types tend to need formal, settled expectations for the tasks ahead, while perceiving types are more likely to be comfortable with a rough outline and to approach things as they come. These tendencies and values are expressed not only in personal decision-making but also in an individual’s worldview and mindset, affecting how they perceive the world and how they approach other people’s situations. All combinations of these four facets yield a total of 16 personalities, each written as four letters, such as ENFP.

There are also broader categories, namely the four function pairs: NF, NT, SJ, and SP. Types within a pair tend to be more similar to each other, so the pairs offer a broader view of personality. Notably, the pairs do not combine the same two facets: intuition (N) types are grouped by whether they lean towards thinking (T) or feeling (F), while sensing (S) types are grouped by whether they lean towards judging (J) or perceiving (P). This follows from how N and S combine with the other facets. For example, it is broadly assumed that an INFP and an INFJ can be very hard to tell apart, because for intuitive types the N and F or N and T combination shapes the personality more distinctly than, say, an N and J combination. For sensing (S) types this is not the case; instead, an ESFJ and an ESFP will differ more than an ESFJ and an ESTJ.

Questions that we want to answer:

1. Is there a correlation between musical positiveness (valence) and people’s MBTI types? In other words, do certain MBTI types generally like to listen to happier or sadder songs, and are there any patterns related to any of the facets?
2. Do different function pairs (NF, NT, SJ, SP) correlate with intangible/feeling music taste (danceability, energy, valence) and tangible/factual music taste (loudness, tempo, duration, instrumentalness)? With this question we try to see whether other aspects of music reveal and relate to one’s personality type.
3. What are the music preferences of each personality type across six dimensions of measurement? In other words, is there a repeatable pattern between MBTI and music taste, strong enough that a preference can be seen and predicted, so that recommendations can be made to people based on their MBTI?
4. What might be each of the 16 personality types’ top three favorite songs and artists now (2023)? Here we attempt to identify which currently popular songs may most appeal to each personality type. This could work both ways: a song’s appeal to a personality type may also increase its popularity.

Why do we do this?

This is a fascinating topic because the human mind is a very complex system, which makes fully understanding ourselves and others seem both reachable and unthinkable. From an outsider’s perspective, it feels incredible that thinkers such as Carl Jung were able to draw out several distinct essences and devise explanations for complicated human nature. The MBTI gives the general public a glimpse of a potentially deeper grasp of who they are, something many people seek in a modern world where self-discovery has slowly become an important desire in life.

Though many people have now been exposed to the idea of the 16 personalities in MBTI, the original theory is a far more convoluted model that captures much more of the nuance of Jung’s thinking, ranging from developmental changes and predictions of how facets of the personality morph throughout life to shadow personalities, hidden masks, and personas. The MBTI is much simpler: it gives non-fanatics some of the rudimentary ideas of Jung’s theory to suit their own needs, while retaining some accuracy and reliability in giving people a glimpse of their inner personality and mindset. Through the MBTI test, people can read up on who they are more likely to be compatible with and create deeper bonds with others by examining what the websites propose their personalities are like.

Much like the MBTI, music taste is a very nuanced interest. Spotify’s song metadata describes aspects of music numerically; valence, for example, describes how happy a track sounds. This gives a multidimensional way to classify music and therefore, in a way, the people who enjoy listening to it. Understanding music taste is understanding yet another crucial dimension of a person. The role that music plays for an individual, and the story behind the music they listen to, is inherently shaped by who they are, their own stories, their preferences, and what they feel connected to. Thus, we planned to draw correlations and find patterns between people’s MBTI personality type and their music taste, measured by the mean values of the music metadata of their playlists. Then, using music metadata (danceability, energy, loudness, speechiness, acousticness, liveness, valence, tempo, instrumentalness), we can pick specific songs from the “Spotify’s trending music 2023” library to recommend to our audience based on the MBTI type they enter.

Data Explanation

Load Original Dataset

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ purrr     1.0.2
## ✔ forcats   1.0.0     ✔ readr     2.1.4
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## Rows: 4081 Columns: 46
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): mbti, function_pair
## dbl (44): danceability_mean, danceability_stdev, energy_mean, energy_stdev, ...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 953 Columns: 24
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (5): track_name, artist(s)_name, streams, key, mode
## dbl (17): artist_count, released_year, released_month, released_day, in_spot...
## num  (2): in_deezer_playlists, in_shazam_charts
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Original Dataset Explanation

The first dataset we are using is mbti-spotify-dataset by Trung Le. This combined dataset contains 4081 rows and 46 columns of data. Each row is an observation of aggregated mean music metadata for a Spotify playlist. The data is crowdsourced from Spotify users: it consists of playlists whose names include a self-identified MBTI type. While the dataset is reasonable and sound overall, it is susceptible to biases rooted in how expressive each personality type is. The dataset also has errors, such as labeling a playlist named “esty, entj, intj, istj, istp” as ISTJ, which could weaken or distort the patterns we find.

The second dataset we are using is Most Streamed Songs on Spotify in 2023, collected by Nidula Elgiriyewithana on Kaggle. This dataset contains 953 rows and 24 columns of data. Each row is a song with Spotify metadata, such as artist name, release date, and audio feature measurements. The dataset has some encoding errors, which show up as ï¿½. We need to filter out songs whose names contain this garbled text so that they do not appear in our recommended songs for Question 4.

Both datasets are from Kaggle, an online platform where users share models and datasets with the community for non-commercial use. The first dataset was updated a year ago, and the second two months ago. These datasets were retrieved using reliable techniques, such as scraping the Spotify API and computational methods that minimize human input error, which gives them good integrity for our purposes.

The MBTI dataset looks good, with no null or unreasonable data. The first three columns should consist of characters without any numeric values, so any entry containing digits is considered invalid. According to our check, there is no invalid data in the first three columns. The musical feature columns should contain only numeric values, so any entry that is not entirely numeric is considered invalid. According to our check, there is no invalid data in the remaining columns. Given how the dataset was computed, the music features should all lie in the range 0 to 1. The maximum and minimum values confirm that all values are within that range.

The Spotify data does not look as good. The first two columns are song names and artists, which should be plain character strings. However, when we loaded the data, some of them could not be converted to a wide string. We found that some Spanish characters could not be displayed. They had been replaced by replacement characters such as �, ï,¿,½, so a song would be displayed as ‘Frï¿½ï¿½gil’, which the audience cannot read. This is probably due to how the dataset was encoded, and we could not find the right encoding to read it. Therefore, we decided to filter out all rows with an unreadable song name or artist name. There are 70 song names and 48 artist names with replacement characters. For streams, there is one song whose stream value starts with “BPM110”, which is not a valid count, so we decided to filter it out as well. The music trait values look good, with no characters in them, and their range of 0-100 is reasonable.

Data cleaning

We selected the variables we wanted, as mentioned above. We also added a column with the common name of each MBTI type, which is used in the Shiny app. We renamed the variables so that they are clearer and easier to manipulate.

## # A tibble: 6 × 10
##   mbti  function_pair mbti_name danceability valence energy acousticness
##   <chr> <chr>         <chr>            <dbl>   <dbl>  <dbl>        <dbl>
## 1 INFP  NF            Mediator         0.558   0.449  0.553       0.293 
## 2 INFP  NF            Mediator         0.588   0.448  0.556       0.341 
## 3 INFP  NF            Mediator         0.677   0.265  0.851       0.0581
## 4 INFP  NF            Mediator         0.517   0.338  0.513       0.442 
## 5 INFP  NF            Mediator         0.560   0.351  0.446       0.480 
## 6 INFP  NF            Mediator         0.614   0.508  0.728       0.188 
## # ℹ 3 more variables: instrumentalness <dbl>, liveness <dbl>, speechiness <dbl>

We checked whether the dataset has invalid or unreasonable data. The first three columns should consist of characters only, and the rest should consist of numbers only. Based on how the data was collected, the maximum and minimum value of each music attribute should fall in the range 0 to 1, so we checked the range to make sure there is no invalid data.

## $Check_for_numeric
##          mbti function_pair     mbti_name 
##         FALSE         FALSE         FALSE 
## 
## $Check_for_character
##     danceability          valence           energy     acousticness 
##            FALSE            FALSE            FALSE            FALSE 
## instrumentalness         liveness      speechiness 
##            FALSE            FALSE            FALSE 
## 
## $MinValues
##     danceability          valence           energy     acousticness 
##                0                0                0                0 
## instrumentalness         liveness      speechiness 
##                0                0                0 
## 
## $MaxValues
##     danceability          valence           energy     acousticness 
##        0.8402222        0.8344000        0.9662000        0.9831800 
## instrumentalness         liveness      speechiness 
##        0.9168333        0.6671960        0.8996600
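The checks printed above can be reproduced with a few lines of base R. The following is only a sketch under stated assumptions: `mbti_toy` is a made-up stand-in for the cleaned MBTI table, the digit test `[0-9]` is our guess at how "contains numeric values" was checked, and the list names simply mirror the printed output.

```r
# Toy stand-in for the cleaned MBTI data (values are illustrative only).
mbti_toy <- data.frame(
  mbti          = c("INFP", "ENTJ", "ESFP"),
  function_pair = c("NF", "NT", "SP"),
  mbti_name     = c("Mediator", "Commander", "Entertainer"),
  danceability  = c(0.558, 0.632, 0.638),
  valence       = c(0.449, 0.495, 0.555)
)

label_cols   <- c("mbti", "function_pair", "mbti_name")
feature_cols <- setdiff(names(mbti_toy), label_cols)

checks <- list(
  # TRUE would mean a label column contains at least one digit.
  Check_for_numeric = sapply(mbti_toy[label_cols],
                             function(x) any(grepl("[0-9]", x))),
  # TRUE would mean a feature column is not stored as a number.
  Check_for_character = sapply(mbti_toy[feature_cols],
                               function(x) !is.numeric(x)),
  # Smallest and largest value per feature, to compare against 0 and 1.
  MinValues = sapply(mbti_toy[feature_cols], min),
  MaxValues = sapply(mbti_toy[feature_cols], max)
)
checks
```

A column passes when both flags are `FALSE` and its minimum and maximum stay inside the expected range.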

For the Spotify data, we likewise selected the variables we wanted (mentioned above) and renamed them.

## # A tibble: 6 × 10
##   track_name             artist streams danceability valence energy acousticness
##   <chr>                  <chr>  <chr>          <dbl>   <dbl>  <dbl>        <dbl>
## 1 Seven (feat. Latto) (… Latto… 141381…           80      89     83           31
## 2 LALA                   Myke … 133716…           71      61     74            7
## 3 vampire                Olivi… 140003…           51      32     53           17
## 4 Cruel Summer           Taylo… 800840…           55      58     72           11
## 5 WHERE SHE GOES         Bad B… 303236…           65      23     80           14
## 6 Sprinter               Dave,… 183706…           92      66     58           19
## # ℹ 3 more variables: instrumentalness <dbl>, liveness <dbl>, speechiness <dbl>

We checked the data against the following standard. The first two columns should consist of characters only, with no unreadable symbols. The remaining columns should consist of numbers only. Based on how the data was collected, the music attribute values should fall in the range 0-100, so we found the maximum and minimum values of the 7 music attributes.

## $Check_for_pattern
## track_name     artist 
##      FALSE      FALSE 
## 
## $Count_invalid
## track_name     artist 
##         70         48 
## 
## $Check_for_character
##          streams     danceability          valence           energy 
##             TRUE            FALSE            FALSE            FALSE 
##     acousticness instrumentalness         liveness      speechiness 
##            FALSE            FALSE            FALSE            FALSE 
## 
## $MinValues
##          streams     danceability          valence           energy 
##      "100409613"             "23"              "4"              "9" 
##     acousticness instrumentalness         liveness      speechiness 
##              "0"              "0"              "3"              "2" 
## 
## $MaxValues
##                                                                                                  streams 
## "BPM110KeyAModeMajorDanceability53Valence75Energy69Acousticness7Instrumentalness0Liveness17Speechiness3" 
##                                                                                             danceability 
##                                                                                                     "96" 
##                                                                                                  valence 
##                                                                                                     "97" 
##                                                                                                   energy 
##                                                                                                     "97" 
##                                                                                             acousticness 
##                                                                                                     "97" 
##                                                                                         instrumentalness 
##                                                                                                     "91" 
##                                                                                                 liveness 
##                                                                                                     "97" 
##                                                                                              speechiness 
##                                                                                                     "64"

The first two columns are song names and artists, which should be plain character strings. However, when we loaded the data, some of them could not be converted to a wide string. Digging further into the problem, we found that some Spanish characters could not be displayed and had been replaced by replacement characters such as �, ï,¿,½. A song would then be displayed as ‘Frï¿½ï¿½gil’, which the audience cannot read. This is probably due to how the dataset was encoded, and we could not find the right encoding to read it. There are 70 song names and 48 artist names with replacement characters. These symbols do not make sense, so we decided to filter out all rows with an unreadable song name or artist name.

Based on the summary above, we also noticed that one row has a non-numeric value in the streams column, so we filtered it out as well. The music trait values look good, with no characters in them, and their range of 0-100 is reasonable.
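The filtering described above could look roughly like the sketch below. It is not the report's actual code: `songs_toy` is invented data, and the pattern `bad_text` (the three-character sequence ï¿½ or the Unicode replacement character) is an assumption about what counts as unreadable. Matching the whole sequence rather than a lone "ï" avoids dropping legitimate names that happen to contain that letter.

```r
library(dplyr)
library(stringr)

# Invented rows: one garbled title and one non-numeric stream count.
songs_toy <- data.frame(
  track_name = c("Song A", "Fr\u00ef\u00bf\u00bd\u00ef\u00bf\u00bdgil",
                 "Song C", "Song D"),
  artist     = c("Artist A", "Artist B", "Artist C", "Artist D"),
  streams    = c("1000", "2000", "BPM110", "3000")
)

# Assumed definition of encoding debris (hypothetical pattern).
bad_text <- "\u00ef\u00bf\u00bd|\ufffd"

songs_clean <- songs_toy %>%
  # Drop rows whose title or artist contains encoding debris.
  filter(!str_detect(track_name, bad_text),
         !str_detect(artist, bad_text)) %>%
  # Keep only rows whose stream count is made of digits, then convert it.
  filter(str_detect(streams, "^[0-9]+$")) %>%
  mutate(streams = as.numeric(streams))

songs_clean
```

Filtering on digits before calling `as.numeric()` avoids the coercion warning that the malformed stream value would otherwise trigger.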

Our final dataset

Each row in the song recommendation dataset represents a possible favorite song for one MBTI type. We have 4081 observations of MBTI types and 724 songs from Spotify. To keep the running time manageable, we limited the analysis to the 300 songs with the most streams. The final dataset includes two variables: mbti holds the 16 MBTI types and song_name holds their possible favorite songs. The detailed selection method is described in the Methods section. There are no missing or implausible values and no quality issues.

Each row in the artist recommendation dataset represents a possible favorite artist for one MBTI type. We have 4081 observations of MBTI types and 724 songs from Spotify. Because of running time and our limited ability, we were only able to analyze songs with a single artist, which leaves 489 observations from Spotify. The final dataset includes two variables: mbti holds the 16 MBTI types and artist holds the music creators. There are no missing or implausible values and no quality issues.

Methods

Question 1: Do certain MBTI types generally like to listen to happier or sadder songs?

## `summarise()` has grouped output by 'mbti'. You can override using the
## `.groups` argument.

## # A tibble: 6 × 4
## # Groups:   mbti [2]
##   mbti  group     n four_cats
##   <chr> <dbl> <dbl> <chr>    
## 1 ESTJ    133 0.487 ST       
## 2 ESTJ    134 0.544 ST       
## 3 ESTJ    135 0.503 ST       
## 4 ESTJ    136 0.485 ST       
## 5 ESTJ    137 0.556 ST       
## 6 ESTP    106 0.531 ST

In the first graph we visualized each type’s preferred musical valence. To keep the dot plot readable, we used the ceiling(row_number()) function to group rows and thin out the data points, so that the graph is not overwhelmed with data. Then we calculated the mean valence of each MBTI type from the dataframe. We further organized the data by function pair, adding a column to the dataframe that holds the function pair name. We then color-coded each MBTI type accordingly, which makes the dot plot more visually informative.
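As an illustration of the thinning step, the sketch below averages valence over consecutive blocks of playlists and also computes the per-type mean. The report only names `ceiling(row_number())`; dividing by a block size (`chunk_size`, a hypothetical parameter) is our assumption about how the groups were formed, and `playlists_toy` is random made-up data.

```r
library(dplyr)

set.seed(1)
# Made-up playlists: 10 per type, with random valence values.
playlists_toy <- data.frame(
  mbti    = rep(c("ESTJ", "INFP"), each = 10),
  valence = runif(20)
)

chunk_size <- 5  # hypothetical block size; not stated in the report

# One dot per block of `chunk_size` consecutive playlists.
chunk_means <- playlists_toy %>%
  mutate(group = ceiling(row_number() / chunk_size)) %>%
  group_by(mbti, group) %>%
  summarise(mean_valence = mean(valence), .groups = "drop")

# One overall mean per MBTI type.
type_means <- playlists_toy %>%
  group_by(mbti) %>%
  summarise(mean_valence = mean(valence))

chunk_means
type_means
```

Averaging within blocks keeps the spread of the data visible while plotting far fewer points than one per playlist.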

Question 2: Are functioning pairs an influencing factor?

## # A tibble: 6 × 3
##   function_pair attribute         mean_value
##   <chr>         <chr>                  <dbl>
## 1 NF            mean_danceability      0.581
## 2 NF            mean_energy            0.604
## 3 NF            mean_valence           0.461
## 4 NT            mean_danceability      0.600
## 5 NT            mean_energy            0.657
## 6 NT            mean_valence           0.475

We grouped the data by function pair and computed the mean values of the subjective music attributes danceability, energy, and valence. We reshaped these columns into a longer format, so that the attribute names sit in a single attribute column. Then we used ggplot to plot the graph. We customized the plot with stat = "identity" and position = "dodge" to place the bars next to each other, used facet_wrap(~ attribute, scales = "free_y") to create a separate panel for each attribute, and used scale_fill_manual() to apply custom colors based on the color conventionally associated with each MBTI group.

## # A tibble: 6 × 3
##   function_pair attribute             mean_value
##   <chr>         <chr>                      <dbl>
## 1 NF            mean_speechiness          0.0684
## 2 NF            mean_acousticness         0.301 
## 3 NF            mean_instrumentalness     0.0743
## 4 NT            mean_speechiness          0.0867
## 5 NT            mean_acousticness         0.208 
## 6 NT            mean_instrumentalness     0.0949

Then we did the same for the objective music attributes: speechiness, acousticness, and instrumentalness.
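The grouping, reshaping, and plotting steps for the function pairs can be sketched as follows. The data in `pairs_toy` and the hex codes in `pair_colors` are placeholders chosen for illustration, not the values used in the report; the ggplot2 calls are the ones named in the text.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Made-up playlists, two per function pair.
pairs_toy <- data.frame(
  function_pair = rep(c("NF", "NT", "SJ", "SP"), each = 2),
  danceability  = c(0.57, 0.59, 0.59, 0.61, 0.56, 0.58, 0.62, 0.64),
  energy        = c(0.60, 0.61, 0.65, 0.66, 0.59, 0.60, 0.68, 0.70),
  valence       = c(0.45, 0.47, 0.47, 0.48, 0.44, 0.46, 0.52, 0.54)
)

# Mean per pair, then one row per (pair, attribute).
pair_means <- pairs_toy %>%
  group_by(function_pair) %>%
  summarise(
    mean_danceability = mean(danceability),
    mean_energy       = mean(energy),
    mean_valence      = mean(valence)
  ) %>%
  pivot_longer(-function_pair,
               names_to = "attribute", values_to = "mean_value")

# Hypothetical palette, one color per function pair.
pair_colors <- c(NF = "#33a474", NT = "#88619a",
                 SJ = "#4298b4", SP = "#e4ae3a")

p <- ggplot(pair_means,
            aes(x = function_pair, y = mean_value, fill = function_pair)) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(~ attribute, scales = "free_y") +
  scale_fill_manual(values = pair_colors)

p
```

Swapping the three summarised columns for speechiness, acousticness, and instrumentalness gives the second set of panels.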

Question 3: Radar Chart

## # A tibble: 6 × 7
##   mbti  Danceability Energy Valence Instrumentalness Acousticness Speechiness
##   <chr>        <dbl>  <dbl>   <dbl>            <dbl>        <dbl>       <dbl>
## 1 ENFJ         0.588  0.613   0.477           0.0601        0.277      0.0710
## 2 ENFP         0.617  0.665   0.530           0.0559        0.231      0.0730
## 3 ENTJ         0.632  0.677   0.495           0.0518        0.180      0.100 
## 4 ENTP         0.632  0.688   0.531           0.0672        0.162      0.0942
## 5 ESFJ         0.597  0.628   0.500           0.0439        0.269      0.0720
## 6 ESFP         0.638  0.692   0.555           0.0414        0.187      0.0830

For the radar chart we calculated the mean values of specific musical attributes grouped by MBTI type. We then created a vector of colors, one per MBTI type, to use in the radar charts. We also determined a maximum and minimum value for each attribute and stored them in new dataframes. We iterated through the MBTI types to create 16 radar charts, each built by combining the maximum, minimum, and average value of each attribute; in the radarchart function, we used different arguments to set various cosmetic aspects of the chart.
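A sketch of how the radar chart inputs could be assembled is shown below. It assumes the `radarchart()` function mentioned above is the one from the fmsb package, which expects the first row of its input to hold the axis maxima and the second row the minima. The two rows of `mbti_means` are copied from the table above; taking the maxima and minima from the data itself, and the color choice, are our assumptions.

```r
# Two of the sixteen rows from the table of means above.
mbti_means <- data.frame(
  mbti             = c("ENFJ", "ENFP"),
  Danceability     = c(0.588, 0.617),
  Energy           = c(0.613, 0.665),
  Valence          = c(0.477, 0.530),
  Instrumentalness = c(0.0601, 0.0559),
  Acousticness     = c(0.277, 0.231),
  Speechiness      = c(0.0710, 0.0730)
)

attrs   <- setdiff(names(mbti_means), "mbti")
# Axis limits per attribute (assumed here to come from the data).
max_row <- as.data.frame(lapply(mbti_means[attrs], max))
min_row <- as.data.frame(lapply(mbti_means[attrs], min))

# One three-row data frame per type: maxima, minima, then the type's means.
radar_inputs <- lapply(seq_len(nrow(mbti_means)), function(i) {
  rbind(max_row, min_row, mbti_means[i, attrs])
})
names(radar_inputs) <- mbti_means$mbti

# Draw the charts only if fmsb is installed.
if (requireNamespace("fmsb", quietly = TRUE)) {
  for (type in names(radar_inputs)) {
    fmsb::radarchart(radar_inputs[[type]], title = type,
                     pcol = "steelblue",
                     pfcol = adjustcolor("steelblue", alpha.f = 0.4))
  }
}
```

With all 16 types in `mbti_means`, the same loop would produce the 16 charts.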

Question 4: So… Spotify Wrapped?

## # A tibble: 6 × 8
##   mbti  danceability valence energy acousticness instrumentalness liveness
##   <chr>        <dbl>   <dbl>  <dbl>        <dbl>            <dbl>    <dbl>
## 1 ENFJ          41.3    45.5   43.3       50.2              19.3      15.1
## 2 ENFP          75.3    82.3   75.5       30.3              15.0      61.5
## 3 ENTJ          92.6    57.9   83.0        8.00             10.8      70.9
## 4 ENTP          92.4    82.8   89.8        0.299            26.7      93.8
## 5 ESFJ          52.4    61.4   52.7       46.8               2.58     38.0
## 6 ESFP         100     100     92.1       11.1               0        76.5
## # ℹ 1 more variable: speechiness <dbl>

## # A tibble: 6 × 8
##   track_name  danceability valence energy acousticness instrumentalness liveness
##   <chr>              <dbl>   <dbl>  <dbl>        <dbl>            <dbl>    <dbl>
## 1 Blinding L…           50      38     80            0                0        9
## 2 Shape of Y…           83      93     65           58                0        9
## 3 Someone Yo…           50      45     41           75                0       11
## 4 Dance Monk…           82      54     59           69                0       18
## 5 One Dance             77      36     63            1                0       36
## 6 STAY (with…           59      48     76            4                0       10
## # ℹ 1 more variable: speechiness <dbl>

We used the same processed MBTI data as in Question 3. For the Spotify dataset, we filtered out songs with multiple artists and kept only songs with a single artist. We grouped the data by artist, converted streams from character to number, calculated each artist’s total streams, and chose the 100 artists with the most streams. We then kept only the rows of the original dataset that belong to these 100 artists. Finally, we grouped the filtered dataset by artist and calculated the mean of each artist’s 7 music attributes.

## # A tibble: 6 × 8
##   artist      danceability valence energy acousticness instrumentalness liveness
##   <chr>              <dbl>   <dbl>  <dbl>        <dbl>            <dbl>    <dbl>
## 1 Adele               61.8    41.8   63.5         20                  0      8.5
## 2 Aerosmith           39      24     43           39                  0     23  
## 3 Alec Benja…         65      51     55           73                  0     14  
## 4 Andy Willi…         24      76     60           77                  0     12  
## 5 Arctic Mon…         48      44     42           12                  2     11  
## 6 Ariana Gra…         59.5    59.5   71.5         24.5                0     19.5
## # ℹ 1 more variable: speechiness <dbl>
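The artist table above can be derived with a pipeline along these lines. This is a sketch on invented data: `songs_toy` and `top_n_artists` are hypothetical (the report keeps 100 artists), and only two attributes are averaged to keep it short. The `artist_count` column does exist in the original dataset.

```r
library(dplyr)

# Invented songs; one has two artists and is dropped.
songs_toy <- data.frame(
  artist       = c("Artist A", "Artist A", "Artist B",
                   "Artist C, Artist D", "Artist E"),
  artist_count = c(1, 1, 1, 2, 1),
  streams      = c("300", "200", "400", "900", "100"),
  danceability = c(60, 64, 50, 80, 30),
  valence      = c(40, 44, 70, 90, 20)
)

top_n_artists <- 2  # the report uses 100

# Keep single-artist songs and make streams numeric.
solo <- songs_toy %>%
  filter(artist_count == 1) %>%
  mutate(streams = as.numeric(streams))

# Artists ranked by total streams.
top_artists <- solo %>%
  group_by(artist) %>%
  summarise(total_streams = sum(streams)) %>%
  slice_max(total_streams, n = top_n_artists)

# Mean attributes for the retained artists only.
artist_means <- solo %>%
  semi_join(top_artists, by = "artist") %>%
  group_by(artist) %>%
  summarise(across(c(danceability, valence), mean))

artist_means
```

`semi_join()` keeps the songs of the selected artists without adding the stream totals to the result.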

The same implementation as above is used to recommend songs.
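The matching step itself, picking for each MBTI profile the song whose attributes are closest in Euclidean distance, might be sketched as follows. The helper `recommend()` and its `top_n` argument are hypothetical names, the song titles are placeholders, and only three of the seven attributes are used. The numbers are taken from the first rows of the tables above.

```r
features <- c("danceability", "valence", "energy")

# Two MBTI profiles on the 0-100 scale.
mbti_profiles <- data.frame(
  mbti         = c("ENFJ", "ESFP"),
  danceability = c(41.3, 100),
  valence      = c(45.5, 100),
  energy       = c(43.3, 92.1)
)

# Three songs with placeholder titles.
songs_toy <- data.frame(
  track_name   = c("Song A", "Song B", "Song C"),
  danceability = c(50, 83, 50),
  valence      = c(38, 93, 45),
  energy       = c(80, 65, 41)
)

# Return the `top_n` songs closest to one profile (hypothetical helper).
recommend <- function(profile, songs, features, top_n = 1) {
  # Subtract the profile from every song, column by column.
  diffs <- sweep(as.matrix(songs[features]), 2, unlist(profile[features]))
  dist  <- sqrt(rowSums(diffs^2))
  songs$track_name[order(dist)][seq_len(top_n)]
}

recommendations <- data.frame(
  mbti      = mbti_profiles$mbti,
  song_name = sapply(seq_len(nrow(mbti_profiles)), function(i) {
    recommend(mbti_profiles[i, ], songs_toy, features)
  })
)

recommendations
```

Setting `top_n = 3` would return the three nearest songs per type; replacing `songs_toy` with per-artist means gives the artist version.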

Results

Question 1: Do certain MBTI types generally like to listen to happier or sadder songs?

We first grouped the mbti-spotify data by the 16 MBTI types and computed a summary variable for their mean musical valence. Valence is a measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry). The x-axis of the plot lists each MBTI type, and the y-axis shows the mean valence of the playlists. Each function pair has its own color, and extraverted (E) types are positioned on the left with introverted (I) types on the right. This arrangement makes a trend clearly visible: extraverted types have a higher mean valence than introverted types. In general, perceiving (P) and sensing (S) types correlate with a higher mean valence, whereas judging (J) and intuition (N) types correlate with a lower mean valence. We were curious whether different combinations of axes have different effects on people’s music taste, so we generated our second plot based on the four function pair groups.

Question 2: Are functioning pairs an influencing factor?

In this plot, we grouped playlists by function pair (NF, NT, SJ, SP) and drew inferences about further characteristics of music. We chose three subjective measurements (danceability, energy, valence) and three objective measurements (acousticness, instrumentalness, speechiness) to classify music taste, which is abstract. First, looking at mean valence again in the graph on the right, we see that SJ has the lowest mean valence, followed closely by NF; NT has a slightly higher mean valence, and SP is clearly higher than the rest. The two graphs on the left show, respectively, the mean danceability of the songs, which describes how suitable they are for dancing (a higher value being more suitable), and the mean energy of the songs, which describes how energetic they are perceived to be (a higher value meaning more energetic, such as louder and/or faster). Surprisingly, danceability and energy show exactly the same pattern across the four function pairs: from least to greatest the order is SJ, NF, NT, and SP, and the gaps are similar too, with SJ and NF close together, NT slightly higher, and SP higher than NT.

Question 3: Music preferences of each personality type

Lastly, we used radar charts for each of the 16 MBTI types to infer individual preferences across the six music characteristics we picked. There is a distinct pattern of introverted (I) types being characterized by a large amount of instrumentalness; the only slight exceptions are the introverted SP types (ISxP), which do not lean towards instrumentalness as much. In contrast, the extraverted (E) types very clearly lean away from instrumentalness and towards every other attribute; no extraverted type scores lower on any other attribute than it does on instrumentalness. It is also clear that each of the sixteen personality types is distinct, with some similarities within function pairs.

Question 4: So… Spotify Wrapped?

Song Recommender:

## # A tibble: 6 × 2
##   mbti  song_name                                                  
##   <chr> <chr>                                                      
## 1 ENFJ  Lover                                                      
## 2 ENFP  Enemy (with JID) - from the series Arcane League of Legends
## 3 ENTJ  Stan                                                       
## 4 ENTP  Super Freaky Girl                                          
## 5 ESFJ  Ain't Shit                                                 
## 6 ESFP  Super Freaky Girl

Artist Recommender:

## # A tibble: 6 × 2
##   mbti  artist       
##   <chr> <chr>        
## 1 ENFJ  Labrinth     
## 2 ENFP  Lizzo        
## 3 ENTJ  BLACKPINK    
## 4 ENTP  Dua Lipa     
## 5 ESFJ  Justin Bieber
## 6 ESFP  Lizzo

By minimizing the Euclidean distance between the MBTI data and the mean music metadata from the Spotify library, we found the top three artists and songs that each MBTI type might gravitate toward. Young adults who find both Spotify Wrapped and MBTI interesting might use this to discover songs they enjoy. It is also an attempt to make the patterns we found concrete and to check them with people around us who have different MBTI personality types, since mean music metadata might be too abstract to discuss.

Findings

What does data tell us?

The results suggest that there is a correlation between music taste and how people view and interact with the world. For example, extraverted people derive satisfaction from external stimuli, which corresponds to songs with higher energy and danceability. In contrast, introverted people derive satisfaction from introspection, which corresponds to songs with instrumental and calm elements. Moreover, people with judging (J) and intuition (N) preferences tend to plan and think more. People report choosing to listen to sad music more often when they are alone, when they are in emotional distress or feeling lonely, and when they are in reflective or introspective moods (Taruffi and Koelsch, 2014), which might explain these types’ preference for songs with lower valence. The diverse shapes of the radar charts also show that people with different personality types have different music preferences.

Limitations

However, our analysis has many limitations. On the technical side, our original datasets use different scales for the music metadata: the mbti-spotify combined dataset records values from 0 to 1, whereas the Spotify 2023 dataset records them as percentages. This may introduce inaccuracy during the mathematical transformation. The dataset we worked on consists of playlists created by people who self-identified their MBTI types, which makes it susceptible to stereotypes and popular culture. At the same time, the mean music metadata are aggregate values over 300+ individual playlists, meaning that individual preferences may vary. Music taste is also a nuanced reflection of one’s personality, emotions, and experiences. Confounding variables like culture, identity, and age also contribute to the human diversity of music taste. Finally, correlation does not imply causation, so these observations should be interpreted cautiously.
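To make the scale mismatch concrete, the sketch below puts 0 to 1 values onto a 0 to 100 scale in two ways. Which transformation the analysis actually used is not stated, so both are shown as possibilities: `rescale_0_100()` is a hypothetical min-max helper, and the input values are the first six mean valences from the Question 3 table.

```r
# Mean valence of the first six MBTI types (0-1 scale).
valence_01 <- c(0.477, 0.530, 0.495, 0.531, 0.500, 0.555)

# Option 1: plain unit conversion to percent.
simple_pct <- valence_01 * 100

# Option 2: min-max rescaling, so the lowest type maps to 0
# and the highest to 100 (hypothetical helper).
rescale_0_100 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1]) * 100
}
minmax_pct <- rescale_0_100(valence_01)

round(simple_pct, 1)
round(minmax_pct, 1)
```

Min-max rescaling exaggerates small differences between types, while the plain conversion keeps them small relative to the spread of individual songs, so the choice can change which songs end up nearest to each type.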

Future work

We can refine the prediction method by using a more sophisticated algorithm to calculate the similarity score. We can also predict each type’s five favorites, as Spotify Wrapped does, to give users more options and insights.

Additionally, if we can obtain a less biased dataset, we can build a more accurate model with the potential to estimate the mean music metadata of each country based on its proportion of MBTI personalities. We want to do this because we are also curious why certain kinds of music become popular in one country but not in another. Maybe American pop songs are more danceable than Chinese pop songs because the American public is more extraverted than the Chinese public? This might be fun to explore.

Summary/takeaway message:

Using metadata about songs on Spotify and basic data manipulation, we found a broad correlation and pattern between MBTI personality types and music taste. We used scatterplots, bar plots, radar charts, and programming to demonstrate our findings, and we learned how to use Shiny to create interactive websites. Although the data may be biased, the story it tells matches our real-life experience with music preferences. The takeaway from this project is that we should celebrate the diversity and nuance of human feelings.

Info201 Final

2023-12-08
