The Rhythm of Bob Marley: A Data-Driven Exploration

Introduction:

Bob Marley (6 February 1945 – 11 May 1981) [1], born Robert Nesta Marley, stands as a legendary figure in the world of reggae music and cultural expression. His work is deeply intertwined with the Rastafari movement, which emerged in Jamaica in the 1930s. This movement, which venerates Haile Selassie I of Ethiopia as the returned messiah, focuses on themes of black empowerment, spiritual growth, and resistance against oppression [2]. Marley’s music served as a powerful vehicle for these messages, making him not just a musician, but a global symbol of social justice and cultural pride.

On the methodology:

I extracted data from Spotify’s Web API using the spotifyr package in R. By setting up my client credentials (SPOTIFY_CLIENT_ID and SPOTIFY_CLIENT_SECRET), I authenticated and generated an access token to retrieve artist data. Using the function get_artist_audio_features(), I collected Bob Marley’s track-level audio features, including danceability, energy, and valence. The extracted dataset was then saved as a CSV file (my_artist.csv) for further analysis and visualization. This process enabled a data-driven exploration of Bob Marley’s musical characteristics across different albums and eras.

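library(spotifyr)   # Spotify Web API wrapper
library(tidyverse)  # readr, dplyr, stringr, ggplot2
library(gt)         # summary tables
library(broom)      # augment() for model diagnostics
# Placeholders: substitute your own credentials and keep real values out of shared code
Sys.setenv(SPOTIFY_CLIENT_ID = "<your-client-id>")
Sys.setenv(SPOTIFY_CLIENT_SECRET = "<your-client-secret>")
access_token <- get_spotify_access_token()
my_artist <- get_artist_audio_features("Bob Marley")
write_csv(my_artist, "my_artist.csv")
artist <- read_csv("my_artist.csv", show_col_types = FALSE)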
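
As a hedged aside on the credential step: a common way to avoid writing the client ID and secret into the script is to define them outside it (for example in a user-level .Renviron file) and only check that they are present before requesting a token. The helper below is a minimal sketch of that check. The name missing_env_vars() is hypothetical and introduced here for illustration; it is not part of spotifyr.

```r
# Hypothetical helper (not part of spotifyr): return the names of
# environment variables that are unset or empty.
missing_env_vars <- function(vars) {
  values <- Sys.getenv(vars, unset = "")
  vars[values == ""]
}

required <- c("SPOTIFY_CLIENT_ID", "SPOTIFY_CLIENT_SECRET")
missing <- missing_env_vars(required)
if (length(missing) > 0) {
  message("Set these before requesting a token: ",
          paste(missing, collapse = ", "))
}
```

If nothing is reported as missing, get_spotify_access_token() can be called without arguments, as in the code above.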

How do the musical characteristics (danceability and valence) of Bob Marley’s original tracks compare to those of remixes and live recordings?

For this project I focus on two variables, danceability and valence. Analyzing them could offer insights into the emotional and rhythmic elements that make Bob Marley’s work resonate so deeply. Danceability describes how suitable a track is for dancing, on a scale from 0 to 1. Valence, on the other hand, measures the emotional tone of the music on the same 0 to 1 scale: high values sound more positive (happy, cheerful), and low values sound more negative (sad, angry).

By comparing these characteristics across original tracks, remixes, and live recordings, we can better understand how different interpretations and settings influence the emotional impact of Marley’s music.

Exploratory analysis:

Note: Given that Bob Marley passed away in 1981, all album releases after this date are either remixes or compilations of previously recorded songs. This presents a challenge in analyzing trends based on album release years, as these posthumous releases do not reflect new work by Marley himself.

After researching each album, I identified the ones in the dataset that are remixes, and I used that list to create my categories.

remixes <- c(
  "Africa Unite",
  "Bob Marley with the Chineke! Orchestra",
  "Exodus 40",
  "Legend Remixed",
  "Roots, Rock, Remixed: The Complete Sessions",
  "Chant Down Babylon",
  "Dreams Of Freedom (Ambient Translations Of Bob Marley In Dub)"
)

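# Columns are picked by position, so this depends on the column order of my_artist.csv.
# The selection must include track_name, album_name, danceability and valence, which are used below.
remix_bob <- artist %>%
  filter(album_name %in% remixes) %>%
  select(c(9,10,17,16,18,19,30,36)) %>%
  mutate(category = "Remix")

# Everything that is not on a remix album starts out as "Original"
bob_orig <- artist %>%
  filter(!album_name %in% remixes) %>%
  select(c(9,10,17,16,18,19,30,36)) %>%
  mutate(category = "Original")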

# Extracting only live recordings
bob_orig_live <- bob_orig %>%
  filter(str_detect(track_name, regex("live", ignore_case = TRUE))) %>%
  mutate(category = "Live")

# Making sure that bob_orig doesn't include live recordings, for comparing categories
bob_orig <- bob_orig %>%
  filter(!str_detect(track_name, regex("live", ignore_case = TRUE)))
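
One caveat with matching the bare substring "live": it also matches titles where those letters are part of a longer word, such as "Lively Up Yourself". The sketch below compares the substring pattern with a word-boundary pattern on a few illustrative track names (made up for this example, not taken from the dataset). It uses base R's grepl() so that it runs without extra packages; with stringr the equivalent pattern would be regex("\\blive\\b", ignore_case = TRUE). Whether any track in my_artist.csv is actually affected is not checked here.

```r
# Illustrative titles only (assumption: not read from my_artist.csv)
titles <- c(
  "Is This Love",
  "Lively Up Yourself",
  "No Woman, No Cry - Live",
  "Get Up, Stand Up (Live At The Roxy)"
)

# Substring match, as in the filter above: flags "Lively" as well
loose <- grepl("live", titles, ignore.case = TRUE)

# Word-boundary match: only flags "live" as a standalone word
strict <- grepl("\\blive\\b", titles, ignore.case = TRUE, perl = TRUE)

data.frame(title = titles, loose = loose, strict = strict)
```

Switching to the stricter pattern would change the category counts, so the tables and tests in this report would need to be rerun.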

I’m curious to see how danceability in albums varies across the three categories:

Null Hypothesis (H0): There is no significant difference in danceability between the categories (Original, Live, Remix) of Bob Marley’s tracks.

Alternative Hypothesis (H1): There is a significant difference in danceability between the categories (Original, Live, Remix) of Bob Marley’s tracks.

combined_data <- bind_rows(remix_bob, bob_orig, bob_orig_live)
summary_by_category <- combined_data %>%
  group_by(category) %>%
  summarize(
    avg_danceability = mean(danceability, na.rm = TRUE),
    avg_valence = mean(valence, na.rm = TRUE),
    n = n()
  )

# Create a gt table
summary_combined_gt <- summary_by_category %>%
  rename(Category = category,
         Avg_dance = avg_danceability,
         Avg_valence = avg_valence,
         Count = n) %>%
  gt()
summary_combined_gt
Category  Avg_dance  Avg_valence  Count
Live      0.6032304  0.7085913    230
Original  0.7683247  0.7565609    271
Remix     0.7041236  0.6414157    89
# Average danceability and valence per album within each category
summary_combined <- combined_data %>%
  group_by(album_name, category) %>%
  summarize(avg_danceability = mean(danceability, na.rm = TRUE),
            avg_valence = mean(valence, na.rm = TRUE),
            .groups = "drop")
ggplot(summary_combined, aes(x = category, 
                             y = avg_danceability, 
                             fill = category)) +
  geom_boxplot(show.legend = FALSE) +
  geom_jitter(alpha=0.7,
              show.legend = FALSE)+
  labs(title = "Box Plot of danceability by album for Original, Live and Remixes",
       x = "Category",
       y = "Danceability") +
  scale_fill_brewer(palette="Dark2")+
  theme_minimal()

Fig.1: Box Plot of danceability by album across categories

Interpretation:

  • From the graph we can see that the median album-level danceability scores for the three categories lie between 0.65 and 0.7, which indicates a fairly high level of danceability in all three.

Overall, original tracks are more consistent in danceability, while live tracks and remixes show more variability (spread). This may be due to differences in sound between live performances, or to the different production approaches used in remixes.

Let’s perform an ANOVA and a Tukey HSD test to see the differences in danceability across the three categories:

aov_danceability <- aov(danceability ~ category, data = combined_data)
summary(aov_danceability)
             Df Sum Sq Mean Sq F value Pr(>F)    
category      2  3.401  1.7006   135.9 <2e-16 ***
Residuals   587  7.343  0.0125                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
tukey_bob <- TukeyHSD(aov_danceability)
tukey_bob
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = danceability ~ category, data = combined_data)

$category
                      diff         lwr         upr   p adj
Original-Live   0.16509429  0.14153267  0.18865590 0.0e+00
Remix-Live      0.10089316  0.06808581  0.13370051 0.0e+00
Remix-Original -0.06420113 -0.09630862 -0.03209364 9.8e-06

Interpretation:

  • The ANOVA results show a very high F value (135.9) and a p-value far below 0.05 (p < 2e-16), so we reject the null hypothesis (H0): mean danceability differs significantly between the categories (Original, Live, Remix) of Bob Marley’s tracks.

  • The Tukey multiple comparisons show where these differences lie:

1. Original vs. Live: The difference in mean danceability is 0.165 (95% CI 0.142 to 0.189), so Original tracks have significantly higher danceability than Live tracks.

2. Remix vs. Live: The difference is 0.101 (95% CI 0.068 to 0.134), so Remix tracks also have significantly higher danceability than Live tracks.

3. Remix vs. Original: The difference is -0.064 (95% CI -0.096 to -0.032, adjusted p = 9.8e-06), so Original tracks have significantly higher danceability than Remix tracks.

In short, mean danceability ranks Original > Remix > Live, and all three pairwise differences are significant. These differences might be attributed to the controlled production environment of studio recordings, the experimental nature of remixes, and the variability inherent in live performances.
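
To put a size on the category effect, the sums of squares printed in the ANOVA table can be turned into an effect size. The sketch below is an addition to the original analysis: it recomputes the F statistic from the reported (rounded) table values as a consistency check and derives eta-squared, the share of total variation in danceability attributable to category.

```r
# Values copied from the ANOVA table above (rounded as printed)
ss_category <- 3.401
df_category <- 2
ss_residual <- 7.343
df_residual <- 587

# F = mean square between categories / mean square within categories
f_value <- (ss_category / df_category) / (ss_residual / df_residual)

# Eta-squared = between-category sum of squares / total sum of squares
eta_squared <- ss_category / (ss_category + ss_residual)

round(c(F = f_value, eta_squared = eta_squared), 3)
```

The recomputed F agrees with the reported 135.9, and eta-squared comes out at roughly 0.32, meaning that category accounts for about a third of the variation in track-level danceability.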

Let’s see how valence relates to danceability:

Hypotheses:

Null Hypothesis (H0): There is no significant relationship between valence and danceability across all categories.

Alternative Hypothesis (H1): There is a significant relationship between valence and danceability across all categories.

# Scatter plot of all tracks with a single fitted regression line
ggplot(combined_data, aes(x = valence, 
                          y = danceability)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm",
              se = FALSE) +
  labs(title = "Danceability vs. Valence",
       x = "Valence",
       y = "Danceability") +
  theme_minimal()

Fig.2: Relationship between Valence and Danceability in Bob Marley's Tracks

model_1 <- lm(danceability ~ valence, 
              data = combined_data)
summary(model_1)

Call:
lm(formula = danceability ~ valence, data = combined_data)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.42960 -0.09175  0.01099  0.09680  0.33378 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.47267    0.02329   20.30   <2e-16 ***
valence      0.30759    0.03151    9.76   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1254 on 588 degrees of freedom
Multiple R-squared:  0.1394,    Adjusted R-squared:  0.138 
F-statistic: 95.26 on 1 and 588 DF,  p-value: < 2.2e-16
augmented_data <- augment(model_1)

ggplot(augmented_data, aes(x = .fitted, 
                           y = .resid)) +
  geom_point(alpha = 0.6) +
  geom_hline(yintercept = 0, color = "red") +
  labs(title = "Residual Plot for Danceability vs. Valence",
       x = "Fitted Values",
       y = "Residuals") +
  theme_minimal()

Fig.3: Residual plot for the danceability vs. valence model

Interpretation:

  • The p-value for the slope is less than 2e-16, far below the 0.05 significance level, so we reject the null hypothesis: there is a significant positive relationship between valence and danceability. Each 0.1 increase in valence is associated with an increase of about 0.03 in danceability (slope = 0.308). The relationship is modest, however: with R-squared = 0.139, valence explains only about 14% of the variation in danceability. Because the model pools all tracks, it describes the overall trend and does not test the relationship within each category separately.
  • The residual plot (Fig.3) is used to check the model assumptions:
  1. The residuals appear randomly scattered around the horizontal line at 0, indicating that the linearity assumption is likely satisfied.
  2. The variance of the residuals appears to be constant across different levels of fitted values, suggesting that the homoscedasticity assumption is met.
  3. No funnel or curved shape is visible, which is consistent with both points above.
  4. There are no extreme outliers or influential points that deviate markedly from the rest.

Overall, the residual plot suggests that the linear regression model is appropriate for analyzing the relationship between danceability and valence in the dataset.
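
To make the fitted line concrete, the reported coefficients can be used to compute predicted danceability at a few valence levels. This is a small worked sketch added for illustration: the valence values 0.2, 0.5 and 0.9 are arbitrary choices, predict_danceability() is a hypothetical helper name, and the coefficients are copied from the model summary above rather than re-estimated.

```r
# Coefficients and R-squared copied from summary(model_1)
intercept <- 0.47267
slope     <- 0.30759
r_squared <- 0.1394

# Hypothetical helper: predicted danceability for a given valence
predict_danceability <- function(valence) intercept + slope * valence

valence_grid <- c(0.2, 0.5, 0.9)
data.frame(valence = valence_grid,
           predicted_danceability = predict_danceability(valence_grid))

# In a simple regression with a positive slope, the correlation is
# the positive square root of R-squared
r <- sqrt(r_squared)
round(r, 2)
```

A track with valence 0.5 is predicted to have a danceability of about 0.63, and the implied correlation of about 0.37 is moderate, so individual tracks can sit well above or below the line.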

Conclusion:

Overall, this project revealed significant differences in danceability between Bob Marley’s original tracks, remixes, and live recordings, with original tracks showing the highest danceability. Additionally, valence is positively associated with danceability in the pooled data. These findings provide insights into the emotional and rhythmic elements of Bob Marley’s music, helping us understand how different interpretations and settings affect its appeal and emotional impact.

My guess is that Original tracks, produced in a controlled studio environment, tend to be more danceable and have a more positive emotional tone. Remixes, while still danceable, could vary because of the production changes made by the remixers. Live recordings, capturing the raw essence of performances, appear to show the most variability in danceability.

This suggests that different settings and production choices shape how danceable and upbeat the same music sounds.