ISCI 301 Presentation

Katelyn Peroni, Caitlin, and Aidan

Introduction

We hypothesize that songs with higher energy levels and danceability levels will be more popular, as measured by their rankings on the Billboard charts.

Dataset Overview

The dataset used in this analysis consists of approximately 600 songs that were among the top songs of the year from 2010 to 2019. It includes features such as:

Song Name
Artist
Release Year
Energy
Popularity
Danceability

Data Analysis Method

```r
# Load necessary libraries
library(dplyr)
library(ggplot2)
library(readr)
library(wordcloud2)

# Loading our dataset from the Downloads folder
music_data <- read_csv("/Users/katelynperoni/Downloads/spotify_top_music.csv")

# Viewing the first few rows of the dataset
head(music_data)

# Checking the structure of the dataset
str(music_data)

# Selecting relevant columns for energy
selected_data <- music_data %>% select(nrgy, pop)

# Calculating correlation between energy and popularity
correlation <- cor(selected_data$nrgy, selected_data$pop, use = "complete.obs")
print(paste("Correlation between energy and popularity:", correlation))

# Creating a scatter plot to visualize the relationship
ggplot(selected_data, aes(x = nrgy, y = pop)) +
  geom_point(alpha = 0.6) +
  labs(title = "Energy vs. Popularity", x = "Energy Level", y = "Popularity Rank") +
  theme_minimal() +
  geom_smooth(method = "lm", color = "#1ED760") # Adds a linear regression line

# Selecting relevant columns for danceability
selected_data <- music_data %>% select(dnce, pop)

# Calculating correlation between danceability and popularity
correlation <- cor(selected_data$dnce, selected_data$pop, use = "complete.obs")
print(paste("Correlation between danceability and popularity:", correlation))

# Creating a scatter plot to visualize the relationship
ggplot(selected_data, aes(x = dnce, y = pop)) +
  geom_point(alpha = 0.6) +
  labs(title = "Danceability vs. Popularity", x = "Danceability Level", y = "Popularity Rank") +
  theme_minimal() +
  geom_smooth(method = "lm", color = "#1ED760") # Adds a linear regression line

# Counting occurrences of each artist
artist_counts <- music_data %>%
  group_by(artist) %>%
  summarise(frequency = n()) %>%
  arrange(desc(frequency))

# Creating and displaying the word cloud (Spotify-style color palette)
wordcloud2(
  data = artist_counts,
  size = 1.5,
  color = rep(c("#1DB954", "gray", "#FFFFFF", "#282828", "#1ED760"),
              length.out = nrow(artist_counts)),
  backgroundColor = "#191414",
  shape = "circle"
)

# Creating a bar graph of the top 20 artists by song count
artist_song_counts <- music_data %>%
  group_by(artist) %>%
  summarise(song_count = n()) %>%
  arrange(desc(song_count)) %>%
  slice_head(n = 20)

ggplot(data = artist_song_counts, aes(x = reorder(artist, -song_count), y = song_count)) +
  geom_bar(stat = "identity", fill = "#1DB954") + # Spotify green color
  labs(title = "Top 20 Artists by Number of Songs", x = "Artist", y = "Number of Songs") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```

Findings

We did not find a strong correlation between energy levels and popularity, as we had hoped.

There was a slight positive correlation between danceability levels and popularity, but it was weaker than we had expected.
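
One way to put a number on how "slight" a correlation is would be to report the coefficient together with a confidence interval and a p-value. The sketch below uses base R's `cor.test` for this. It runs on simulated stand-in data, not our dataset: the vectors `dnce` and `pop` only borrow the column names, and the slope and noise level are assumptions chosen for illustration, so the printed numbers are not our results.

```r
# Simulated stand-in data (NOT the real dataset): 600 songs with a weak
# positive relationship between danceability and popularity.
set.seed(301)
n <- 600
dnce <- rnorm(n, mean = 65, sd = 13)
pop  <- 60 + 0.1 * dnce + rnorm(n, sd = 14)

# Pearson correlation test: gives r, a 95% confidence interval, and a p-value
result <- cor.test(dnce, pop, method = "pearson")

r_value <- unname(result$estimate)
ci      <- result$conf.int
p_value <- result$p.value

cat(sprintf("r = %.3f, 95%% CI [%.3f, %.3f], p = %.4f\n",
            r_value, ci[1], ci[2], p_value))
```

Running the same call on `music_data$dnce` and `music_data$pop` would show whether the interval for our data excludes zero.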

Discussion

The analysis does not indicate a strong correlation between energy and popularity, but that does not necessarily mean the two are unrelated. The analysis did indicate a slight positive correlation between danceability and popularity. This means that, within our dataset, songs with higher danceability tended to be slightly more popular. To determine whether our hypothesis holds, we would have to look at a larger, more diverse set of data. It is also essential to consider that popularity can be influenced by various factors, including marketing, artist visibility, and cultural trends, which may not be captured solely by energy or danceability levels.

Limitations and Future Directions

Limitations:

Dataset Size: While the dataset contains around 600 songs, a more extensive dataset could provide a more comprehensive view.
Causation vs. Correlation: The correlation observed does not imply causation; other factors may influence popularity.
Biased Data: Our dataset only included popular songs from the Billboard charts, so we were analyzing popularity among already successful tracks. Our results may have been biased because we didn’t consider any less popular songs for comparison.
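
The effect of analyzing only successful tracks can be illustrated with a small simulation. The sketch below is hypothetical: it invents 10,000 songs with an assumed moderate link between energy and popularity, then keeps only the top 5% by popularity to mimic a "hits only" dataset. None of these values come from our data.

```r
# Hypothetical population of songs (simulated, not real data)
set.seed(42)
n <- 10000
energy     <- rnorm(n)
popularity <- 0.5 * energy + rnorm(n)  # assumed true relationship

# Correlation across all songs, popular or not
full_r <- cor(energy, popularity)

# Keep only the top 5% most popular songs, mimicking a chart-only dataset
cutoff <- quantile(popularity, 0.95)
hits   <- popularity >= cutoff
restricted_r <- cor(energy[hits], popularity[hits])

cat(sprintf("All songs: r = %.3f\nHits only: r = %.3f\n", full_r, restricted_r))
```

In this simulation the correlation among the hits comes out weaker than in the full population, which suggests that a chart-only sample could understate a relationship that exists across all songs.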

Future Directions:

Future research could explore other musical features, such as tempo, loudness, or genre, to provide a more detailed understanding of what contributes to a song’s success.
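
A possible starting point for that kind of follow-up is a multiple regression that considers several features at once. The sketch below uses base R's `lm` on simulated data. The column names `bpm` (tempo) and `dB` (loudness) are assumptions about how such features might be named, and every value is generated for illustration only.

```r
# Simulated stand-in data (NOT the real dataset)
set.seed(7)
n <- 600
songs <- data.frame(
  nrgy = rnorm(n, mean = 70,   sd = 16),  # energy
  dnce = rnorm(n, mean = 65,   sd = 13),  # danceability
  bpm  = rnorm(n, mean = 118,  sd = 25),  # tempo (hypothetical column name)
  dB   = rnorm(n, mean = -5.5, sd = 2)    # loudness (hypothetical column name)
)

# Assumed relationship: popularity depends weakly on energy and danceability
songs$pop <- 50 + 0.05 * songs$nrgy + 0.15 * songs$dnce + rnorm(n, sd = 14)

# Fit a linear model with all four features as predictors
model <- lm(pop ~ nrgy + dnce + bpm + dB, data = songs)

# Coefficient table: estimate, standard error, t value, p-value per feature
coef_table <- summary(model)$coefficients
print(round(coef_table, 3))
```

A categorical feature such as genre could be added to the formula as a factor, although a genre with very few songs would give unstable estimates.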