Using three datasets identified in the discussion board items, or your own dataset, create a .CSV file that includes all of the information in the dataset; read the information from your CSV file into R; use tidyr and dplyr as needed to tidy and transform the data; perform analysis; and present conclusions.
Anthony Fantano is a music critic who creates YouTube content that details his reviews of music projects from all genres. His review scores have come under attack from the artists he reviews, most recently Yasiin Bey of the hip hop group Black Star. In this project, I look at his ratings of music projects to see whether he has any preference for particular project types (album, mixtape, track) or genres, or whether his tastes are influenced by popularity (Spotify ranking).
I use an Anthony Fantano album review dataset from Kaggle. From a random sample of 20 rows, I created the projects dataset. For the second and third datasets, I created separate files that list the genre and Spotify rank of each artist in the projects dataset.
library(tidyr)
library(dplyr)
# Projects: the 20-row random sample of reviews from the Kaggle dataset
data <- 'https://raw.githubusercontent.com/curiostegui/CUNY-SPS/main/Data%20607/Project%202/sample_reviews.csv'
projects <- read.csv(file = data, header = TRUE, sep = ",")
# Genre of each artist: one row, one column per artist
data2 <- 'https://raw.githubusercontent.com/curiostegui/CUNY-SPS/main/Data%20607/Project%202/genre_artist.csv'
genre <- read.csv(file = data2, header = TRUE, sep = ",")
# Spotify rank of each artist: same one-row layout
data3 <- 'https://raw.githubusercontent.com/curiostegui/CUNY-SPS/main/Data%20607/Project%202/rank_artists.csv'
rank <- read.csv(file = data3, header = TRUE, sep = ",")
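The `ï..` header and the `rÃ.yksopp` column in the genre and rank previews below are consistent with files saved as UTF-8 with a byte-order mark (BOM), and the periods in names such as `mac.demarco` come from `read.csv`'s default `check.names = TRUE`. One possible alternative, not the code run in this report, is to pass `fileEncoding = "UTF-8-BOM"` and `check.names = FALSE` when reading. The sketch below writes a small hypothetical CSV to a temporary file so it runs offline, and it assumes R is running in a UTF-8 locale so the accented name can be represented.

```r
# Hypothetical miniature of genre_artist.csv, written to a temporary file so
# the example runs offline: a UTF-8 byte-order mark (BOM), an accented artist
# name, and an artist name containing a space.
tmp <- tempfile(fileext = ".csv")
csv_text <- enc2utf8("label,weezer,r\u00f6yksopp,mac demarco\nGenre,rock,edm,indie\n")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw(csv_text)), tmp)

# fileEncoding = "UTF-8-BOM" drops the BOM and decodes the file as UTF-8;
# check.names = FALSE keeps "mac demarco" instead of renaming it "mac.demarco".
genre_raw <- read.csv(tmp, fileEncoding = "UTF-8-BOM", check.names = FALSE)
names(genre_raw)
```

Reading the real files this way would likely make the manual name fixes later in the report unnecessary.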
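Looking at the projects dataset, we see three problems: the project name is spread across three columns (album, mixtape, track) in a wide format, most of those cells are missing (NA), and there is an extra index column (X.1) that needs to be removed.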
as_tibble(projects)
## # A tibble: 20 x 9
## X.1 X year tracks artist rating album mixtape track
## <int> <int> <int> <int> <chr> <int> <chr> <chr> <chr>
## 1 1 119 2014 3 weezer 7 <NA> <NA> memo~
## 2 2 259 2010 11 women 9 publ~ <NA> <NA>
## 3 3 260 2010 10 röyksopp 5 seni~ <NA> <NA>
## 4 4 712 2012 11 mac demarco 2 2 <NA> <NA>
## 5 5 825 2012 10 kindness 7 worl~ <NA> <NA>
## 6 6 1028 2013 11 bibio 7 silv~ <NA> <NA>
## 7 7 1346 2015 12 health 5 deat~ <NA> <NA>
## 8 8 1404 2015 11 erykah badu 6 <NA> but yo~ <NA>
## 9 9 1548 2022 1 m.i.a. 4 a.i.~ <NA> <NA>
## 10 10 1778 2017 11 gas 2 nark~ <NA> <NA>
## 11 11 1894 2018 23 lil wayne 6 tha ~ <NA> <NA>
## 12 12 1950 2018 16 trippie redd 7 <NA> a love~ <NA>
## 13 13 2026 2021 23 young thug 5 slim~ <NA> <NA>
## 14 14 2059 2018 16 blood orange 8 negr~ <NA> <NA>
## 15 15 2111 2019 12 doja cat 4 hot ~ <NA> <NA>
## 16 16 2334 2020 12 tame impala 6 the ~ <NA> <NA>
## 17 17 2553 2021 20 summer walker 7 stil~ <NA> <NA>
## 18 18 2595 2021 23 young stoner life, young~ 5 slim~ <NA> <NA>
## 19 19 2860 2022 12 ibibio sound machine 7 elec~ <NA> <NA>
## 20 20 3000 2022 15 aurora 6 the ~ <NA> <NA>
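Below I use the gather function to collapse the album, mixtape, and track columns into two new columns: project_type, which holds the original column name, and project_name, which holds its value. I then drop the rows with a missing project name and remove the extra column with the select function.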
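# Stack the album, mixtape and track columns into project_type / project_name pairs
projects <- gather(projects, "project_type", "project_name", album:track)
projects <- projects %>%
  filter(!is.na(project_name)) %>%   # each review fills in only one of the three columns
  select(X, year, tracks, artist, rating, project_type, project_name)   # drops the X.1 index column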
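`gather()` still works but is superseded in current tidyr; `pivot_longer()` is the recommended replacement, and its `values_drop_na = TRUE` argument folds in the missing-value filter. A minimal sketch on a three-row table: the weezer and gas rows come from the sample above, while the `artist_x` / `mixtape_x` row is hypothetical, added so that all three project types appear.

```r
library(dplyr)
library(tidyr)

# Small wide table in the same shape as sample_reviews.csv. The weezer and gas
# rows come from the sample above; the artist_x row is hypothetical.
wide <- tibble(
  artist  = c("weezer", "gas", "artist_x"),
  rating  = c(7L, 2L, 5L),
  album   = c(NA, "narkopop", NA),
  mixtape = c(NA, NA, "mixtape_x"),
  track   = c("memories", NA, NA)
)

# pivot_longer() replaces gather(); values_drop_na = TRUE removes the NA cells,
# so no separate filter(!is.na(...)) step is needed.
long <- wide %>%
  pivot_longer(
    cols = album:track,
    names_to = "project_type",
    values_to = "project_name",
    values_drop_na = TRUE
  )

long
```

One difference from `gather()`: `pivot_longer()` keeps each review's rows together, whereas `gather()` stacks all album rows first, then mixtapes, then tracks, which is why the final joined table below lists albums first.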
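Next, I tidy the genre dataset, which is also in a wide format: each artist is a column, and the single row holds that artist's genre. The first nine columns are shown below.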
as_tibble(genre[,1:9])
## # A tibble: 1 x 9
## ï.. weezer women rÃ.yksopp mac.demarco kindness bibio health erykah.badu
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Genre rock rock edm indie indie indie indie r&b
I use the gather and select functions to reshape it into two columns, artist and genre.
genre <- genre %>%
gather("artist","genre",2:21)
genre <- select(genre,"artist","genre")
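A `pivot_longer()` version of this step can select every column except the row-label column, instead of relying on positions `2:21`, so it keeps working if the number of artists changes. A sketch, assuming the first column is named `label` (in the real file it was read as `ï..`) and using three artists and genres from the preview above.

```r
library(dplyr)
library(tidyr)

# One-row wide table shaped like genre_artist.csv; "label" stands in for the
# real first (row-label) column.
genre_wide <- tibble(label = "Genre", weezer = "rock", women = "rock", kindness = "indie")

# Pivot every column except the label, then keep only artist and genre.
genre_long <- genre_wide %>%
  pivot_longer(-label, names_to = "artist", values_to = "genre") %>%
  select(artist, genre)

genre_long
```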
As with the genre dataset, I also tidy the Spotify rank data frame, which has the same one-row layout.
as_tibble(rank[,1:9])
## # A tibble: 1 x 9
## ï.. weezer women rÃ.yksopp mac.demarco kindness bibio health erykah.badu
## <chr> <int> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl>
## 1 Rank 459 NA NA NA NA NA NA NA
Again, gather and select reshape it into two columns, artist and rank.
rank <- rank %>%
gather("artist","rank",2:21)
rank <- select(rank,"artist","rank")
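Because the genre and rank files share the same one-row layout, the two reshaping steps could be wrapped in one function. `tidy_one_row()` below is a hypothetical helper, not part of tidyr or dplyr. The rank example also shows why the rank column ends up as integers: unranked artists are read as all-`NA` logical columns, and `pivot_longer()` combines logical and integer values into a single integer column. The values are taken from the previews above.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: reshape a one-row "label + one column per artist"
# table into artist/value pairs, dropping the label column.
tidy_one_row <- function(df, value_name) {
  df %>%
    pivot_longer(-1, names_to = "artist", values_to = value_name) %>%
    select(artist, all_of(value_name))
}

# Shaped like rank_artists.csv: ranked artists are integers, while unranked
# artists come in as logical NA columns.
rank_wide <- tibble(label = "Rank", weezer = 459L, women = NA, kindness = NA)
rank_long <- tidy_one_row(rank_wide, "rank")

# The same helper works for the genre file.
genre_long <- tidy_one_row(tibble(label = "Genre", weezer = "rock", women = "rock"), "genre")

rank_long
genre_long
```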
One artist name, röyksopp, was garbled on import (its ö came through as Ã followed by a period), so I corrected it in both the rank and genre datasets.
genre$artist[3] = "röyksopp"
rank$artist[3] = "röyksopp"
After tidying the data, I join the datasets so that the analysis can work from a single table, starting with genre and rank.
joined <- left_join(genre,rank,by='artist')
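Some of the artist names contain periods where the projects dataset has spaces (for example, mac.demarco versus mac demarco). The periods come from read.csv, which by default converts column headers into syntactically valid R names. Because left_join matches keys exactly, these artists would not find a match when joined with the projects dataset.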
as_tibble(joined)
## # A tibble: 20 x 3
## artist genre rank
## <chr> <chr> <int>
## 1 weezer rock 459
## 2 women rock NA
## 3 röyksopp edm NA
## 4 mac.demarco indie NA
## 5 kindness indie NA
## 6 bibio indie NA
## 7 health indie NA
## 8 erykah.badu r&b NA
## 9 m.i.a. alternate NA
## 10 gas edm NA
## 11 lil.wayne rap 104
## 12 trippie.redd rap 239
## 13 young.thug rap 110
## 14 blood.orange alternate NA
## 15 doja.cat rap 11
## 16 tame.impala indie 177
## 17 summer.walker r&b 264
## 18 young.stoner.life..young.thug..gunna rap NA
## 19 ibibio.sound.machine electronic NA
## 20 aurora pop 490
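Before joining, `anti_join()` can list the keys that will fail to match, which is a quick way to catch problems like these periods. A sketch with four keys taken from the projects table and the joined table above; the genres come from the output above.

```r
library(dplyr)

# Keys as they appear in the two tables (a few rows from the outputs above).
projects_keys <- tibble(artist = c("weezer", "mac demarco", "m.i.a.", "lil wayne"))
joined_keys   <- tibble(artist = c("weezer", "mac.demarco", "m.i.a.", "lil.wayne"),
                        genre  = c("rock", "indie", "alternate", "rap"))

# anti_join() returns the rows of the first table with no match in the second,
# i.e. the artists that would get NA genre and rank after a left_join().
unmatched <- anti_join(projects_keys, joined_keys, by = "artist")
unmatched
```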
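Below I replace those periods with the original spaces (and, for the Young Stoner Life collaboration, the commas). The periods in m.i.a. are part of the name, so it is left unchanged.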
joined$artist[4] = "mac demarco"
joined$artist[8] = "erykah badu"
joined$artist[11] = "lil wayne"
joined$artist[12] = "trippie redd"
joined$artist[13] = "young thug"
joined$artist[14] = "blood orange"
joined$artist[15] = "doja cat"
joined$artist[16] = "tame impala"
joined$artist[17] = "summer walker"
joined$artist[18] = "young stoner life, young thug, gunna"
joined$artist[19] = "ibibio sound machine"
finalset <- left_join(projects,joined,by='artist')
as_tibble(finalset)
## # A tibble: 20 x 9
## X year tracks artist rating project_type project_name genre rank
## <int> <int> <int> <chr> <int> <chr> <chr> <chr> <int>
## 1 259 2010 11 women 9 album public stra~ rock NA
## 2 260 2010 10 röyksopp 5 album senior edm NA
## 3 712 2012 11 mac demarco 2 album 2 indie NA
## 4 825 2012 10 kindness 7 album world, you ~ indie NA
## 5 1028 2013 11 bibio 7 album silver wilk~ indie NA
## 6 1346 2015 12 health 5 album death magic indie NA
## 7 1548 2022 1 m.i.a. 4 album a.i.m. alte~ NA
## 8 1778 2017 11 gas 2 album narkopop edm NA
## 9 1894 2018 23 lil wayne 6 album tha carter v rap 104
## 10 2026 2021 23 young thug 5 album slime langu~ rap 110
## 11 2059 2018 16 blood orange 8 album negro swan alte~ NA
## 12 2111 2019 12 doja cat 4 album hot pink rap 11
## 13 2334 2020 12 tame impala 6 album the slow ru~ indie 177
## 14 2553 2021 20 summer walker 7 album still over ~ r&b 264
## 15 2595 2021 23 young stoner~ 5 album slime langu~ rap NA
## 16 2860 2022 12 ibibio sound~ 7 album electricity elec~ NA
## 17 3000 2022 15 aurora 6 album the gods we~ pop 490
## 18 1404 2015 11 erykah badu 6 mixtape but you cai~ r&b NA
## 19 1950 2018 16 trippie redd 7 mixtape a love lett~ rap 239
## 20 119 2014 3 weezer 7 track memories rock 459
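The name fixes earlier are tied to row positions (`joined$artist[4]` and so on), so they would silently change the wrong artist if the row order changed. A sketch of a name-keyed alternative using dplyr's `recode()` with a spliced lookup vector; only two of the eleven fixes are shown. A blanket `gsub()` of every period would not work here, because `m.i.a.` legitimately contains periods.

```r
library(dplyr)

# Artist keys as read from the genre/rank files (a subset of the joined table).
artist <- c("weezer", "mac.demarco", "m.i.a.", "young.stoner.life..young.thug..gunna")

# Lookup vector: names are the current keys, values are the corrected keys.
# Only two of the eleven fixes are listed here.
fixes <- c(
  "mac.demarco" = "mac demarco",
  "young.stoner.life..young.thug..gunna" = "young stoner life, young thug, gunna"
)

# recode() replaces matching values by name and leaves unlisted values
# (such as "m.i.a.") unchanged.
fixed <- recode(artist, !!!fixes)
fixed
```

Applied to the report's data, this would be `joined$artist <- recode(joined$artist, !!!fixes)` with all eleven fixes listed.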
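Among his five highest-rated reviews, Summer Walker is the only artist who appears in Spotify’s top 500 streaming ranking; the other four artists have no rank, which suggests they are less popular. The genres of these five reviews are rock, alternative, indie, and R&B. Note that six reviews are tied at a rating of 7, so which three of them fill out the top five depends on how ties are broken; two of the other tied artists, Trippie Redd and Weezer, do have Spotify ranks.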
as_tibble(finalset) %>% arrange(desc(rating))
## # A tibble: 20 x 9
## X year tracks artist rating project_type project_name genre rank
## <int> <int> <int> <chr> <int> <chr> <chr> <chr> <int>
## 1 259 2010 11 women 9 album public stra~ rock NA
## 2 2059 2018 16 blood orange 8 album negro swan alte~ NA
## 3 825 2012 10 kindness 7 album world, you ~ indie NA
## 4 1028 2013 11 bibio 7 album silver wilk~ indie NA
## 5 2553 2021 20 summer walker 7 album still over ~ r&b 264
## 6 2860 2022 12 ibibio sound~ 7 album electricity elec~ NA
## 7 1950 2018 16 trippie redd 7 mixtape a love lett~ rap 239
## 8 119 2014 3 weezer 7 track memories rock 459
## 9 1894 2018 23 lil wayne 6 album tha carter v rap 104
## 10 2334 2020 12 tame impala 6 album the slow ru~ indie 177
## 11 3000 2022 15 aurora 6 album the gods we~ pop 490
## 12 1404 2015 11 erykah badu 6 mixtape but you cai~ r&b NA
## 13 260 2010 10 röyksopp 5 album senior edm NA
## 14 1346 2015 12 health 5 album death magic indie NA
## 15 2026 2021 23 young thug 5 album slime langu~ rap 110
## 16 2595 2021 23 young stoner~ 5 album slime langu~ rap NA
## 17 1548 2022 1 m.i.a. 4 album a.i.m. alte~ NA
## 18 2111 2019 12 doja cat 4 album hot pink rap 11
## 19 712 2012 11 mac demarco 2 album 2 indie NA
## 20 1778 2017 11 gas 2 album narkopop edm NA
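Because six reviews share a rating of 7, `slice_max()` offers a quick way to see how ties affect a "top five": it keeps ties by default. The ratings and ranks below are copied from the sorted output above, where NA means the artist had no value in the rank file.

```r
library(dplyr)

# Ratings and Spotify ranks for the 20 sampled reviews, in the sorted order above.
reviews <- tibble(
  artist = c("women", "blood orange", "kindness", "bibio", "summer walker",
             "ibibio sound machine", "trippie redd", "weezer", "lil wayne",
             "tame impala", "aurora", "erykah badu", "r\u00f6yksopp", "health",
             "young thug", "young stoner life, young thug, gunna", "m.i.a.",
             "doja cat", "mac demarco", "gas"),
  rating = c(9, 8, 7, 7, 7, 7, 7, 7, 6, 6, 6, 6, 5, 5, 5, 5, 4, 4, 2, 2),
  rank   = c(NA, NA, NA, NA, 264, NA, 239, 459, 104, 177, 490, NA, NA, NA,
             110, NA, NA, 11, NA, NA)
)

# with_ties = TRUE (the default) keeps every review tied with the 5th value,
# so asking for 5 rows returns all reviews rated 7 or higher.
top5_with_ties <- slice_max(reviews, rating, n = 5)
top5_no_ties   <- slice_max(reviews, rating, n = 5, with_ties = FALSE)

nrow(top5_with_ties)               # 8
sum(!is.na(top5_with_ties$rank))   # 3 ranked artists among them
```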
By mean rating, rock is his highest-rated genre (8) and EDM his lowest (3.5).
as_tibble(finalset) %>% group_by(genre) %>%
  summarise(mean_rating = mean(rating)) %>%
  arrange(desc(mean_rating))
## # A tibble: 8 x 2
## genre mean_rating
## <chr> <dbl>
## 1 rock 8
## 2 electronic 7
## 3 r&b 6.5
## 4 alternate 6
## 5 pop 6
## 6 indie 5.4
## 7 rap 5.4
## 8 edm 3.5
By mean rating, single tracks score highest (7) and albums lowest (about 5.6).
as_tibble(finalset) %>% group_by(project_type) %>%
  summarise(mean_rating = mean(rating)) %>%
  arrange(desc(mean_rating))
## # A tibble: 3 x 2
## project_type mean_rating
## <chr> <dbl>
## 1 track 7
## 2 mixtape 6.5
## 3 album 5.59
In this sample, Anthony favors the rock genre, the artists with his best reviews tend not to appear on Spotify’s top streaming chart, and his favorite project type is individual tracks. It is important to note that this is a very small subset of the thousands of album reviews he has done.
Within this small sample, albums dominate: 17 of the 20 reviews are albums, compared with 2 mixtapes and a single track. Rock, his highest-rated genre, accounts for only 2 of the 20 reviews. With groups this small, one high or low score can skew a group's mean.
finalset %>%
count(project_type)
## project_type n
## 1 album 17
## 2 mixtape 2
## 3 track 1
finalset %>%
count(genre)
## genre n
## 1 alternate 2
## 2 edm 2
## 3 electronic 1
## 4 indie 5
## 5 pop 1
## 6 r&b 2
## 7 rap 5
## 8 rock 2
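A sketch of a summary that reports each group's size next to its mean, so that means resting on one or two reviews are easy to spot. The ratings, project types, and genres are copied from the final joined table above, in its row order.

```r
library(dplyr)

# Rating, project type and genre for the 20 sampled reviews (finalset row order).
reviews <- tibble(
  rating = c(9, 5, 2, 7, 7, 5, 4, 2, 6, 5, 8, 4, 6, 7, 5, 7, 6, 6, 7, 7),
  project_type = c(rep("album", 17), "mixtape", "mixtape", "track"),
  genre = c("rock", "edm", "indie", "indie", "indie", "indie", "alternate",
            "edm", "rap", "rap", "alternate", "rap", "indie", "r&b", "rap",
            "electronic", "pop", "r&b", "rap", "rock")
)

# Group size (n) alongside the mean rating for each project type and genre.
by_type <- reviews %>%
  group_by(project_type) %>%
  summarise(n = n(), mean_rating = mean(rating)) %>%
  arrange(desc(mean_rating))

by_genre <- reviews %>%
  group_by(genre) %>%
  summarise(n = n(), mean_rating = mean(rating)) %>%
  arrange(desc(mean_rating))

by_type
by_genre
```

In this sample the track mean of 7 rests on a single review, and every genre except indie and rap has two reviews or fewer.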
Because of these limitations, we cannot definitively state what his musical tastes are. If I were to conduct the study again, I would use a larger sample of his reviews.