Hello, welcome to my first project!

Essay Part A

In this coding project I will use the dataset called “spotifysongs.csv”. This dataset is sourced from Link to source. It covers many different artists and their songs, and includes each song’s genre, popularity, year, duration, and more! For my project, I wanted to explore the top 5 artists for one particular year (2014). I determined this by identifying their top songs and how popular they were during that year. The variables I will be using are artist, song, year, and popularity. Popularity is a quantitative variable, and it is the one I will use to measure and rank these artists. To clean this dataset, I first filter for the year 2014, which leaves only the rows for that year. Next, I find the top 5 artists for that year by measuring their popularity. I did this by creating a new variable called top_artists, which uses the group_by, summarize, and arrange functions to group each artist’s songs together and add up their popularity points. After this, using the same functions, I found each artist’s top song and matched it to that artist. As a result, I cleaned my dataset by keeping only the data I needed to embark on this journey to find the top 5 artists and songs of 2014. Please see each chunk and its explanation for more details.

Project Breakdown

In this project, I will examine the top 5 artists for a particular year and identify each one’s most popular song.

What is my dataset?

Great question! My dataset is about Spotify songs! In this dataset I will focus on a few variables/columns: year, artist, song, and popularity. My x-axis will be the songs, my y-axis will be popularity, and my legend will show each artist’s name by color on the graph.

It’s time to load any library I’ll need

These just provide me with the tools/functions I’ll need.

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(tinytex)

I also need to set my working directory

This is because I downloaded my dataset, so I need to tell RStudio where to find that data and load it into the program.

setwd("/Users/mikea/Desktop/Data 110 ")
Spotify <- read_csv("spotifysongs.csv")

Ok, now it’s time to pick a year, but first let’s see what years are included in the Spotify dataset.

unique(Spotify$year)

##  [1] 2000 1999 2001 2011 2002 2016 1998 2018 2004 2010 2015 2006 2008 2019 2003
## [16] 2013 2005 2012 2020 2007 2009 2017 2014
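
unique() lists the years, but not how many songs each year has. As a minimal sketch, assuming a small made-up tibble called toy_songs in place of the real Spotify data, count() shows the number of rows per year:

```r
library(dplyr)

# Hypothetical stand-in for the Spotify data (values are made up)
toy_songs <- tibble(
  song = c("A", "B", "C", "D", "E"),
  year = c(2013, 2014, 2014, 2015, 2014)
)

# Count the rows for each year, with the largest group first
songs_per_year <- toy_songs %>%
  count(year, sort = TRUE)

songs_per_year
```

This can help when picking a year, since a year with very few songs would make a top 5 less meaningful.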

Alright! I’ll choose the year 2014 for nostalgia purposes

Reason? This was my freshman year in high school and I wanted to find out what songs were most popular at the time and see if I also listened to those artists/songs. Plus, nostalgia.

Here I created a new variable called spotify_2014

The reason? I need to narrow the artists and other information down to only that year. So, I used the filter function to look at the column “year” and return only the rows that pertain to 2014.

spotify_2014 <- Spotify %>%
  filter(year == 2014)
spotify_2014

## # A tibble: 104 × 18
##    artist  song  duration_ms explicit  year popularity danceability energy   key
##    <chr>   <chr>       <dbl> <lgl>    <dbl>      <dbl>        <dbl>  <dbl> <dbl>
##  1 Azeali… "212"      204956 TRUE      2014          0        0.847  0.769    11
##  2 Klangk… "Son…      238120 FALSE     2014         68        0.579  0.549     5
##  3 Ellie … "Bur…      231211 FALSE     2014         70        0.559  0.777     1
##  4 Storm … "Loo…      150400 FALSE     2014          0        0.832  0.815     0
##  5 Pharre… "Hap…      232720 FALSE     2014         79        0.647  0.822     5
##  6 ScHool… "Col…      299960 TRUE      2014          0        0.826  0.571    11
##  7 Iggy A… "Fan…      199938 TRUE      2014         69        0.912  0.716    10
##  8 Maroon… "Ani…      231013 FALSE     2014         79        0.279  0.742     4
##  9 Sam Sm… "Sta…      172723 FALSE     2014         80        0.418  0.42      0
## 10 MAGIC!  "Rud…      224840 FALSE     2014         80        0.773  0.758     1
## # ℹ 94 more rows
## # ℹ 9 more variables: loudness <dbl>, mode <dbl>, speechiness <dbl>,
## #   acousticness <dbl>, instrumentalness <dbl>, liveness <dbl>, valence <dbl>,
## #   tempo <dbl>, genre <chr>

Ok, now we have the artists for the year 2014. Now it’s time to find the 5 most popular artists.

Why? Remember, I’m looking for the top 5 artists, and the variable spotify_2014 holds all the artists from 2014. I created a new variable called top_artists. The group_by function groups each artist’s songs together, and summarise adds up their popularity points. After that, arrange sorts the totals in descending order and top_n returns the 5 artists with the most popularity, ranked 1 to 5 from most to least popular.

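top_artists <- spotify_2014 %>%
  group_by(artist) %>%                                # one group per artist
  summarise(total_popularity = sum(popularity)) %>%   # add up each artist's popularity
  arrange(desc(total_popularity)) %>%                 # highest total first
  top_n(5)                                            # keep the 5 highest totals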

## Selecting by total_popularity

top_artists

## # A tibble: 5 × 2
##   artist        total_popularity
##   <chr>                    <dbl>
## 1 Taylor Swift               431
## 2 Ariana Grande              300
## 3 Calvin Harris              246
## 4 Ed Sheeran                 222
## 5 Sam Smith                  221
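
In current dplyr, top_n() is marked as superseded in favor of slice_max(). As a minimal sketch, assuming a small made-up tibble called toy_2014 (not the real data), the same ranking step could be written like this, with the ranking column named explicitly:

```r
library(dplyr)

# Hypothetical data: one row per song (values are made up)
toy_2014 <- tibble(
  artist     = c("A", "A", "B", "C", "C", "D"),
  popularity = c(80, 70, 90, 60, 50, 40)
)

top_two <- toy_2014 %>%
  group_by(artist) %>%
  summarise(total_popularity = sum(popularity)) %>%
  # Keep the 2 highest totals; the result comes back sorted high to low
  slice_max(total_popularity, n = 2)

top_two
```

Because slice_max() already returns rows from highest to lowest, a separate arrange(desc(...)) step is not needed in this version.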

Quick comment: OMG, I love these artists.

Now it’s time to find the songs that made these artists the top 5

I’ll just create a new variable and filter for the songs that belong to those top 5 artists by referring to my previous variable top_artists and looking at its artist column.

filtered_songs <- spotify_2014 %>%
  filter(artist %in% top_artists$artist) %>%
  distinct(song)

filtered_songs

## # A tibble: 19 × 1
##    song                          
##    <chr>                         
##  1 Stay With Me                  
##  2 Bad Blood                     
##  3 Under Control (feat. Hurts)   
##  4 Blame (feat. John Newman)     
##  5 Summer                        
##  6 Break Free                    
##  7 Don't                         
##  8 Love Me Harder                
##  9 Style                         
## 10 Problem                       
## 11 Shake It Off                  
## 12 Sing                          
## 13 Money On My Mind              
## 14 Thinking out Loud             
## 15 Outside (feat. Ellie Goulding)
## 16 Blank Space                   
## 17 Like I Can                    
## 18 Wildest Dreams                
## 19 One Last Time
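
distinct(song) keeps only the song column, so the list above no longer says which artist each song belongs to. As a minimal sketch, assuming a made-up tibble toy_2014 and a made-up vector top_names standing in for top_artists$artist, passing both columns to distinct() keeps the artist next to each song:

```r
library(dplyr)

# Hypothetical data with one repeated row (values are made up)
toy_2014 <- tibble(
  artist     = c("A", "A", "B", "B"),
  song       = c("S1", "S1", "S2", "S3"),
  popularity = c(70, 70, 80, 60)
)

# Hypothetical stand-in for top_artists$artist
top_names <- c("A", "B")

songs_with_artist <- toy_2014 %>%
  filter(artist %in% top_names) %>%
  distinct(artist, song)   # one row per artist/song pair

songs_with_artist
```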

Now I want to find their top song by popularity and match that song to the artist.

The previous variable showed me all of the songs by those artists in 2014. Now I need to find their TOP song. I created a new variable, top_songs_by_artist. I again filter for the same 5 artists using the top_artists variable. Next, I group by artist and song and calculate the sum of each song’s popularity. Then, same as before, I arrange in descending order so each artist’s top songs come first. Lastly, I group by artist, choose each artist’s number one song, and print.

This was the hardest part. Now the heavy lifting has been done and it’s time to plot!

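top_songs_by_artist <- spotify_2014 %>%
  filter(artist %in% top_artists$artist) %>%          # keep only the top 5 artists
  group_by(artist, song) %>%                          # one group per artist/song pair
  summarise(total_popularity = sum(popularity)) %>%   # add up each song's popularity
  arrange(artist, desc(total_popularity)) %>%         # best song first within each artist
  group_by(artist) %>%
  top_n(1)                                            # keep each artist's highest total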

## `summarise()` has grouped output by 'artist'. You can override using the
## `.groups` argument.
## Selecting by total_popularity

top_songs_by_artist

## # A tibble: 5 × 3
## # Groups:   artist [5]
##   artist        song                           total_popularity
##   <chr>         <chr>                                     <dbl>
## 1 Ariana Grande Love Me Harder                              148
## 2 Calvin Harris Outside (feat. Ellie Goulding)               78
## 3 Ed Sheeran    Thinking out Loud                            81
## 4 Sam Smith     Stay With Me                                 80
## 5 Taylor Swift  Style                                       138
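
Since Spotify popularity scores range from 0 to 100, totals such as 148 and 138 suggest that those songs appear in more than one row of the 2014 data and that their scores were added together. As a minimal sketch, assuming a made-up tibble toy_2014 (not the real data), repeated rows could be dropped with distinct() before picking each artist’s top song:

```r
library(dplyr)

# Hypothetical data: song S1 appears twice with the same score (values are made up)
toy_2014 <- tibble(
  artist     = c("A", "A", "A", "B", "B"),
  song       = c("S1", "S1", "S2", "S3", "S4"),
  popularity = c(74, 74, 80, 81, 60)
)

top_song_per_artist <- toy_2014 %>%
  distinct(artist, song, popularity) %>%   # drop exact repeats of a row
  group_by(artist) %>%
  slice_max(popularity, n = 1) %>%         # highest-scoring song per artist
  ungroup()

top_song_per_artist
```

In this toy data, summing would rank S1 first for artist A (74 + 74 = 148), while the de-duplicated version picks S2 with a score of 80. Whether the real data should be de-duplicated this way is a judgment call that depends on why the repeated rows are there.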

Now let’s plot!

Great! The heavy lifting’s been done. Now it’s just a matter of putting it all into a graph. First I used the ggplot function and passed it the top_songs_by_artist variable. Next, using aes(), I put song on my x-axis and total_popularity on my y-axis, and filled the bars by artist.

The rest is just me fixing the aesthetics: adding x and y labels and a title, wrapping the song names, and moving the legend to the bottom so the artist/song names fit on the graph. This makes the visualization pleasing to the eye.

ggplot(top_songs_by_artist, aes(x = song, y = total_popularity, fill = artist)) +
  geom_col() +                                   # bar heights come straight from total_popularity
  scale_fill_discrete(name = "Artist Names") +
  xlab("Top Songs of the year") +
  ylab("Popularity Ratings") +
  ggtitle("Top 5 Songs by Artist in the year 2014") +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 5)) +   # wrap long song names
  theme(legend.position = "bottom")
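
By default the bars are drawn in alphabetical order of the song names. As a minimal sketch, assuming a small made-up data frame toy_top_songs in place of top_songs_by_artist, reorder() can sort the bars from most to least popular so the ranking is easier to read:

```r
library(ggplot2)

# Hypothetical stand-in for top_songs_by_artist (values are made up)
toy_top_songs <- data.frame(
  artist           = c("A", "B", "C"),
  song             = c("S1", "S2", "S3"),
  total_popularity = c(78, 148, 81)
)

# reorder() sorts the songs by the negated total, so the tallest bar comes first
p <- ggplot(
  toy_top_songs,
  aes(x = reorder(song, -total_popularity), y = total_popularity, fill = artist)
) +
  geom_col() +
  labs(x = "Top songs of the year", y = "Popularity", fill = "Artist")

p
```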

Essay Part B and C

This visualization shows the top 5 artists of 2014 and each one’s top song. By summed popularity, the number 1 song was “Love Me Harder” by Ariana Grande. In 2nd place was “Style” by Taylor Swift. In 3rd place was “Thinking out Loud” by Ed Sheeran. In 4th place was “Stay With Me” by Sam Smith. Lastly, in 5th place was “Outside” by Calvin Harris. Thus, we have completed our project goal of identifying the top artists and their most popular songs in 2014 on Spotify. This was a very fun project to explore; however, for my next project I would like to explore the genre aspect of Spotify. I would like to compare the most popular genres between 2000 and 2010. I think this would be a good project to do next because it would be interesting to see if there are any repeating genres or drastic changes. My prediction is that I will see a spike in the rap genre, due to its increasing popularity among teens. These are some new things that I would like to include as I continue to explore the Spotify dataset.

Spotify Project

Mike Alfaro

2023-06-15