Task

Using three datasets identified in the discussion board items or your own dataset, create a .CSV file that includes all of the information in the dataset; read the information from your CSV file into R and use tidyr and dplyr as needed to tidy and transform the data; perform analysis; and present conclusions.

Introduction

Anthony Fantano is a music critic who creates YouTube videos reviewing music projects from all genres. His review scores have come under attack from the artists he reviews, most recently from Yasiin Bey of the hip hop group Black Star. In this project, I look at his ratings of music projects to see whether he has preferences for certain project types (album, mixtape, track) or genres, and whether his tastes are influenced by popularity (Spotify ranking).

Dataset

I use an Anthony Fantano album review dataset from Kaggle. From a random sample of 20 rows, I created the projects dataset. For the second and third datasets, I created separate files that list the genre and Spotify rank of the artists in the projects dataset.

Load libraries

library(tidyr)
library(dplyr)

Load datasets

data <- 'https://raw.githubusercontent.com/curiostegui/CUNY-SPS/main/Data%20607/Project%202/sample_reviews.csv'
projects <- read.csv(file = data, header = TRUE, sep = ",")

data2 <- 'https://raw.githubusercontent.com/curiostegui/CUNY-SPS/main/Data%20607/Project%202/genre_artist.csv'
genre <- read.csv(file = data2, header = TRUE, sep = ",")

data3 <- 'https://raw.githubusercontent.com/curiostegui/CUNY-SPS/main/Data%20607/Project%202/rank_artists.csv'
rank <- read.csv(file = data3, header = TRUE, sep = ",")

Tidy Data

Projects dataset

Looking at the projects dataset, we see three problems: the wide format, NA values, and an extra column (X.1) that needs to be removed.

as_tibble(projects)
## # A tibble: 20 x 9
##      X.1     X  year tracks artist                    rating album mixtape track
##    <int> <int> <int>  <int> <chr>                      <int> <chr> <chr>   <chr>
##  1     1   119  2014      3 weezer                         7 <NA>  <NA>    memo~
##  2     2   259  2010     11 women                          9 publ~ <NA>    <NA> 
##  3     3   260  2010     10 röyksopp                       5 seni~ <NA>    <NA> 
##  4     4   712  2012     11 mac demarco                    2 2     <NA>    <NA> 
##  5     5   825  2012     10 kindness                       7 worl~ <NA>    <NA> 
##  6     6  1028  2013     11 bibio                          7 silv~ <NA>    <NA> 
##  7     7  1346  2015     12 health                         5 deat~ <NA>    <NA> 
##  8     8  1404  2015     11 erykah badu                    6 <NA>  but yo~ <NA> 
##  9     9  1548  2022      1 m.i.a.                         4 a.i.~ <NA>    <NA> 
## 10    10  1778  2017     11 gas                            2 nark~ <NA>    <NA> 
## 11    11  1894  2018     23 lil wayne                      6 tha ~ <NA>    <NA> 
## 12    12  1950  2018     16 trippie redd                   7 <NA>  a love~ <NA> 
## 13    13  2026  2021     23 young thug                     5 slim~ <NA>    <NA> 
## 14    14  2059  2018     16 blood orange                   8 negr~ <NA>    <NA> 
## 15    15  2111  2019     12 doja cat                       4 hot ~ <NA>    <NA> 
## 16    16  2334  2020     12 tame impala                    6 the ~ <NA>    <NA> 
## 17    17  2553  2021     20 summer walker                  7 stil~ <NA>    <NA> 
## 18    18  2595  2021     23 young stoner life, young~      5 slim~ <NA>    <NA> 
## 19    19  2860  2022     12 ibibio sound machine           7 elec~ <NA>    <NA> 
## 20    20  3000  2022     15 aurora                         6 the ~ <NA>    <NA>

Below I use the gather function to collapse the album, mixtape, and track columns into two new columns, one for the project type and one for the project name. I then drop the rows that have no project name and remove the extra column with the select function.

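# Collapse the album, mixtape, and track columns (7 to 9) into key/value pairs
projects <- gather(projects, "project_type", "project_name", 7:9)
projects <- projects %>%
  # Keep only the rows that carry a project name
  filter(!is.na(project_name)) %>%
  select('X','year','tracks','artist','rating','project_type','project_name')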
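
The same reshaping can also be written with pivot_longer, which the tidyr documentation recommends over gather for new code. The following is a minimal sketch on a toy table, not the code used in this project: the data frame toy and the names "album A" and "mixtape B" are placeholders I made up for illustration.

```r
library(tidyr)

# Toy stand-in for the wide projects table; "album A" and "mixtape B"
# are placeholder names, not values from the real dataset.
toy <- data.frame(
  artist  = c("weezer", "women", "erykah badu"),
  rating  = c(7L, 9L, 6L),
  album   = c(NA, "album A", NA),
  mixtape = c(NA, NA, "mixtape B"),
  track   = c("memories", NA, NA),
  stringsAsFactors = FALSE
)

# Stack the three project columns into type/name pairs and drop the
# empty cells in the same step.
toy_long <- pivot_longer(
  toy,
  cols = c(album, mixtape, track),
  names_to = "project_type",
  values_to = "project_name",
  values_drop_na = TRUE
)
print(toy_long)
```

With values_drop_na = TRUE the separate filter step is no longer needed, since rows with a missing project name are never created.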

Genre dataset

Here I will turn the wide format into a tidy dataframe.

as_tibble(genre[,1:9])
## # A tibble: 1 x 9
##   ï..   weezer women rÃ.yksopp mac.demarco kindness bibio health erykah.badu
##   <chr> <chr>  <chr> <chr>     <chr>       <chr>    <chr> <chr>  <chr>      
## 1 Genre rock   rock  edm       indie       indie    indie indie  r&b

I use the gather and select functions to tidy the data.

genre <- genre %>%
  gather("artist","genre",2:21)
genre <- select(genre,"artist","genre")

Rank dataset

As with the previous dataset, I will tidy the Spotify rank dataframe.

as_tibble(rank[,1:9])
## # A tibble: 1 x 9
##   ï..   weezer women rÃ.yksopp mac.demarco kindness bibio health erykah.badu
##   <chr>  <int> <lgl> <lgl>     <lgl>       <lgl>    <lgl> <lgl>  <lgl>      
## 1 Rank     459 NA    NA        NA          NA       NA    NA     NA

I use the gather and select functions to tidy the data.

rank <- rank %>%
  gather("artist","rank",2:21)
rank <- select(rank,"artist","rank")

Fix typos

The artist name röyksopp was garbled on import (it appears as rÃ.yksopp in both the rank and genre datasets), so I corrected it by hand.

genre$artist[3] <- "röyksopp"
rank$artist[3] <- "röyksopp"
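
The "ï.." column and the garbled name both look like import artifacts: a byte order mark at the start of the file and UTF-8 text read under a different encoding. The sketch below shows one way such artifacts might be avoided at read time. It is an assumption-laden illustration rather than the code used here: the temporary file, its contents, and the "field" header are made up, and the example sticks to ASCII names so it behaves the same in any locale.

```r
# Write a tiny CSV that starts with a UTF-8 byte order mark, mimicking the
# symptom seen above. The file contents and the "field" header are made up.
tmp <- tempfile(fileext = ".csv")
writeBin(
  c(as.raw(c(0xEF, 0xBB, 0xBF)),
    charToRaw("field,weezer,mac demarco\nGenre,rock,indie\n")),
  tmp
)

# fileEncoding = "UTF-8-BOM" drops the byte order mark;
# check.names = FALSE keeps "mac demarco" instead of "mac.demarco".
demo <- read.csv(tmp, fileEncoding = "UTF-8-BOM", check.names = FALSE)
print(names(demo))
```

Reading the headers this way would change the column names that the later steps rely on, so the manual corrections in this report would need to be revisited if the import were changed.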

Join

After tidying the data, I decided to join all the datasets so that I can conduct my analysis.

joined <- left_join(genre,rank,by='artist')

Some artist names now contain periods where the originals had spaces or commas, because read.csv converts column headers into syntactically valid R names by default. This poses a problem when joining with the projects set, where the artist names are separated by spaces.

as_tibble(joined)
## # A tibble: 20 x 3
##    artist                               genre       rank
##    <chr>                                <chr>      <int>
##  1 weezer                               rock         459
##  2 women                                rock          NA
##  3 röyksopp                             edm           NA
##  4 mac.demarco                          indie         NA
##  5 kindness                             indie         NA
##  6 bibio                                indie         NA
##  7 health                               indie         NA
##  8 erykah.badu                          r&b           NA
##  9 m.i.a.                               alternate     NA
## 10 gas                                  edm           NA
## 11 lil.wayne                            rap          104
## 12 trippie.redd                         rap          239
## 13 young.thug                           rap          110
## 14 blood.orange                         alternate     NA
## 15 doja.cat                             rap           11
## 16 tame.impala                          indie        177
## 17 summer.walker                        r&b          264
## 18 young.stoner.life..young.thug..gunna rap           NA
## 19 ibibio.sound.machine                 electronic    NA
## 20 aurora                               pop          490

Below I replace the periods in the affected artist names with the original spaces and commas.

joined$artist[4] <- "mac demarco"
joined$artist[8] <- "erykah badu"
joined$artist[11] <- "lil wayne"
joined$artist[12] <- "trippie redd"
joined$artist[13] <- "young thug"
joined$artist[14] <- "blood orange"
joined$artist[15] <- "doja cat"
joined$artist[16] <- "tame impala"
joined$artist[17] <- "summer walker"
joined$artist[18] <- "young stoner life, young thug, gunna"
joined$artist[19] <- "ibibio sound machine"
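
Editing names by row index works for 20 rows but is easy to get wrong on a larger sample. As a possible alternative, the sketch below rebuilds the clean names by lookup. It assumes the periods come from make.names (the function read.csv applies to headers by default) and that the artist names are unique; the vectors clean, mangled, and restored are hypothetical names for this illustration.

```r
# Names as they appear in the projects table (taken from the output above).
clean <- c("mac demarco", "m.i.a.",
           "young stoner life, young thug, gunna", "aurora")

# Names as they appear after the wide genre and rank files are read in.
mangled <- c("aurora", "young.stoner.life..young.thug..gunna",
             "m.i.a.", "mac.demarco")

# make.names() applies the same substitution to the clean names, so the
# result can serve as a lookup key from mangled name back to clean name.
restored <- clean[match(mangled, make.names(clean))]
print(restored)
```

A plain search-and-replace of periods would not be safe here, because m.i.a. contains periods that belong to the name. The lookup leaves it untouched.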

Final dataset

finalset <- left_join(projects,joined,by='artist')
as_tibble(finalset)
## # A tibble: 20 x 9
##        X  year tracks artist        rating project_type project_name genre  rank
##    <int> <int>  <int> <chr>          <int> <chr>        <chr>        <chr> <int>
##  1   259  2010     11 women              9 album        public stra~ rock     NA
##  2   260  2010     10 röyksopp           5 album        senior       edm      NA
##  3   712  2012     11 mac demarco        2 album        2            indie    NA
##  4   825  2012     10 kindness           7 album        world, you ~ indie    NA
##  5  1028  2013     11 bibio              7 album        silver wilk~ indie    NA
##  6  1346  2015     12 health             5 album        death magic  indie    NA
##  7  1548  2022      1 m.i.a.             4 album        a.i.m.       alte~    NA
##  8  1778  2017     11 gas                2 album        narkopop     edm      NA
##  9  1894  2018     23 lil wayne          6 album        tha carter v rap     104
## 10  2026  2021     23 young thug         5 album        slime langu~ rap     110
## 11  2059  2018     16 blood orange       8 album        negro swan   alte~    NA
## 12  2111  2019     12 doja cat           4 album        hot pink     rap      11
## 13  2334  2020     12 tame impala        6 album        the slow ru~ indie   177
## 14  2553  2021     20 summer walker      7 album        still over ~ r&b     264
## 15  2595  2021     23 young stoner~      5 album        slime langu~ rap      NA
## 16  2860  2022     12 ibibio sound~      7 album        electricity  elec~    NA
## 17  3000  2022     15 aurora             6 album        the gods we~ pop     490
## 18  1404  2015     11 erykah badu        6 mixtape      but you cai~ r&b      NA
## 19  1950  2018     16 trippie redd       7 mixtape      a love lett~ rap     239
## 20   119  2014      3 weezer             7 track        memories     rock    459
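
Because left_join silently fills unmatched rows with NA, a quick check for artists that failed to match can be useful before the analysis. The following sketch uses toy tables; projects_toy, lookup_toy, and the artist "unmatched artist" are made up to show what a miss looks like.

```r
library(dplyr)

# Toy tables; "unmatched artist" is a made-up name used to show a miss.
projects_toy <- data.frame(
  artist = c("weezer", "mac demarco", "unmatched artist"),
  rating = c(7L, 2L, 5L),
  stringsAsFactors = FALSE
)
lookup_toy <- data.frame(
  artist = c("weezer", "mac demarco"),
  genre  = c("rock", "indie"),
  stringsAsFactors = FALSE
)

# Rows of projects_toy that have no partner in lookup_toy.
missing <- anti_join(projects_toy, lookup_toy, by = "artist")
print(missing)
```

Applied to the real tables, an empty result would indicate that every artist in projects found a match in joined.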

Analysis

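In his five highest-rated reviews as sorted below, every artist except Summer Walker is absent from Spotify’s top 500 streaming ranking, which suggests they are not very popular. Note, however, that six projects are tied at a rating of 7, so which of them land in the top five depends on sort order, and two of the other tied artists (Trippie Redd and Weezer) do appear in the ranking. The genres of the top five reviews are rock, alternative, indie, and R&B.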

as_tibble(finalset) %>% arrange(desc(rating))
## # A tibble: 20 x 9
##        X  year tracks artist        rating project_type project_name genre  rank
##    <int> <int>  <int> <chr>          <int> <chr>        <chr>        <chr> <int>
##  1   259  2010     11 women              9 album        public stra~ rock     NA
##  2  2059  2018     16 blood orange       8 album        negro swan   alte~    NA
##  3   825  2012     10 kindness           7 album        world, you ~ indie    NA
##  4  1028  2013     11 bibio              7 album        silver wilk~ indie    NA
##  5  2553  2021     20 summer walker      7 album        still over ~ r&b     264
##  6  2860  2022     12 ibibio sound~      7 album        electricity  elec~    NA
##  7  1950  2018     16 trippie redd       7 mixtape      a love lett~ rap     239
##  8   119  2014      3 weezer             7 track        memories     rock    459
##  9  1894  2018     23 lil wayne          6 album        tha carter v rap     104
## 10  2334  2020     12 tame impala        6 album        the slow ru~ indie   177
## 11  3000  2022     15 aurora             6 album        the gods we~ pop     490
## 12  1404  2015     11 erykah badu        6 mixtape      but you cai~ r&b      NA
## 13   260  2010     10 röyksopp           5 album        senior       edm      NA
## 14  1346  2015     12 health             5 album        death magic  indie    NA
## 15  2026  2021     23 young thug         5 album        slime langu~ rap     110
## 16  2595  2021     23 young stoner~      5 album        slime langu~ rap      NA
## 17  1548  2022      1 m.i.a.             4 album        a.i.m.       alte~    NA
## 18  2111  2019     12 doja cat           4 album        hot pink     rap      11
## 19   712  2012     11 mac demarco        2 album        2            indie    NA
## 20  1778  2017     11 gas                2 album        narkopop     edm      NA
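
Reading popularity off the top few rows is sensitive to ties, so it may help to compare all ranked and unranked artists directly. The sketch below does this with the ratings and ranks copied from the table above; the data frame reviews and the column ranked are names introduced for this illustration only.

```r
library(dplyr)

# Ratings and Spotify ranks copied row by row from the sorted table above
# (NA means the artist has no rank in the dataset).
reviews <- data.frame(
  rating = c(9, 8, 7, 7, 7, 7, 7, 7, 6, 6, 6, 6, 5, 5, 5, 5, 4, 4, 2, 2),
  rank   = c(NA, NA, NA, NA, 264, NA, 239, 459, 104, 177, 490, NA,
             NA, NA, 110, NA, NA, 11, NA, NA)
)

# Mean rating and group size for artists with and without a Spotify rank.
by_popularity <- reviews %>%
  mutate(ranked = !is.na(rank)) %>%
  group_by(ranked) %>%
  summarise(mean_rating = mean(rating), n = n())
print(by_popularity)
```

Reporting n next to each mean keeps the group sizes visible, which matters in a sample of only 20 reviews.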

The genre with the highest mean rating is rock. The lowest-rated genre is EDM.

as_tibble(finalset) %>% group_by(genre) %>%
  summarise(mean_rating=mean(rating)) %>%
  arrange(desc(mean_rating))
## # A tibble: 8 x 2
##   genre      mean_rating
##   <chr>            <dbl>
## 1 rock               8  
## 2 electronic         7  
## 3 r&b                6.5
## 4 alternate          6  
## 5 pop                6  
## 6 indie              5.4
## 7 rap                5.4
## 8 edm                3.5
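
A mean over one or two reviews is fragile, so it can be worth printing the group size next to each mean. This is a small sketch of that idea using the genre and rating columns copied from the final dataset above; the data frame reviews is a name introduced for this example.

```r
library(dplyr)

# Genre and rating copied row by row from the final dataset above.
reviews <- data.frame(
  genre = c("rock", "edm", "indie", "indie", "indie", "indie", "alternate",
            "edm", "rap", "rap", "alternate", "rap", "indie", "r&b", "rap",
            "electronic", "pop", "r&b", "rap", "rock"),
  rating = c(9, 5, 2, 7, 7, 5, 4, 2, 6, 5, 8, 4, 6, 7, 5, 7, 6, 6, 7, 7),
  stringsAsFactors = FALSE
)

# Mean rating per genre together with the number of reviews behind it.
genre_summary <- reviews %>%
  group_by(genre) %>%
  summarise(mean_rating = mean(rating), n = n()) %>%
  arrange(desc(mean_rating))
print(genre_summary)
```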

The project type with the highest mean rating is singles (tracks). Albums had the lowest mean rating.

as_tibble(finalset) %>% group_by(project_type) %>%
  summarise(mean_rating=mean(rating)) %>%
  arrange(desc(mean_rating))
## # A tibble: 3 x 2
##   project_type mean_rating
##   <chr>              <dbl>
## 1 track               7   
## 2 mixtape             6.5 
## 3 album               5.59

Conclusion

We can observe that in this dataset, Anthony likes the rock genre; the artists with the best reviews tend not to be on Spotify’s top streaming charts, and his favorite project type is the individual track. It is important to note that this is a very small subset of the thousands of reviews he has done.

Within this small dataset, albums are overwhelmingly the most common project type (17 of the 20 reviews). We can also observe that his highest-rated genre, rock, accounts for only 2 of the 20 reviews. With so few observations per group, a single high rating can skew the summary.

finalset %>%
  count(project_type) 
##   project_type  n
## 1        album 17
## 2      mixtape  2
## 3        track  1
finalset %>%
  count(genre)
##        genre n
## 1  alternate 2
## 2        edm 2
## 3 electronic 1
## 4      indie 5
## 5        pop 1
## 6        r&b 2
## 7        rap 5
## 8       rock 2

Because of the limitations listed above, we cannot definitively state what his musical tastes are. If I were to conduct the study again, I would choose a larger dataset.