For this project, I used a data set adapted from the Million Song Dataset, a “freely-available collection of audio features and metadata for a million contemporary popular music tracks” provided by The Echo Nest. This adaptation was developed by Ryan Whitcomb for another project. I chose to work with it because it had already been reduced from 1,000,000 observations to 10,000.
I want to investigate how a few characteristics of songs relate to their “song hotness” ranking, so I pose the questions: How do duration and tempo contribute to a song’s hotness ranking? And did this change over time?
First, I imported the data and stored it in a variable named music so that it is easy to refer to. I set the argument stringsAsFactors = FALSE so that text columns are imported as character vectors rather than factors.
music <- read.csv("music2.csv", stringsAsFactors = FALSE)
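To see what stringsAsFactors controls, here is a minimal sketch that reads a tiny inline CSV built from two rows of the head() output shown further down (only three of the 35 columns are kept, for illustration). Since R 4.0.0 the default is already FALSE, so setting it explicitly mainly keeps the script behaving the same on older R versions.

```r
# Two rows copied from the head() output, reduced to three columns for illustration
csv_text <- paste("title,artist.name,year",
                  "Soul Deep,The Box Tops,1969",
                  "Something Girls,Adam Ant,1982",
                  sep = "\n")

as_chars   <- read.csv(text = csv_text, stringsAsFactors = FALSE)
as_factors <- read.csv(text = csv_text, stringsAsFactors = TRUE)

class(as_chars$title)    # "character"
class(as_factors$title)  # "factor"
class(as_chars$year)     # "integer": numeric columns are unaffected by the argument
```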
I will be using functions from the dplyr, tidyr, ggplot2, and gganimate packages. Any of them can be installed with install.packages() if it is not already on the device.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(ggplot2)
library(gganimate)
Since read.csv() returns a data frame, a structure that can hold columns of different data types, I expect music to be one. I can confirm this with the class() function. I will also check the dimensions and structure to get a better overview of the data set.
#checking characteristics of the dataset
class(music)
## [1] "data.frame"
#checking the number of rows and columns
nrow(music)
## [1] 10000
ncol(music)
## [1] 35
#this can also be done all at once with dim()
dim(music)
## [1] 10000 35
Next, I check the structure of the data frame with str(), which lists the name and data type of each column along with its first few values.
#checking the structure
str(music)
## 'data.frame': 10000 obs. of 35 variables:
## $ artist.hotttnesss : num 0.402 0.417 0.343 0.454 0.402 ...
## $ artist.id : chr "ARD7TVE1187B99BFB1" "ARMJAGH1187FB546F3" "ARKRRTF1187B9984DA" "AR7G5I41187FB4CE6C" ...
## $ artist.name : chr "Casual" "The Box Tops" "Sonora Santanera" "Adam Ant" ...
## $ artist_mbtags : chr "" "classic pop and rock" "" "uk" ...
## $ artist_mbtags_count : num 0 1 0 1 0 0 0 0 0 0 ...
## $ bars_confidence : num 0.643 0.007 0.98 0.017 0.175 0.121 0.709 0.142 0.806 0.047 ...
## $ bars_start : num 0.585 0.711 0.732 1.306 1.064 ...
## $ beats_confidence : num 0.834 1 0.98 0.809 0.883 0.438 0.709 0.234 0.44 1 ...
## $ beats_start : num 0.585 0.206 0.732 0.81 0.136 ...
## $ duration : num 219 148 177 233 210 ...
## $ end_of_fade_in : num 0.247 0.148 0.282 0 0.066 ...
## $ familiarity : num 0.582 0.631 0.487 0.63 0.651 ...
## $ key : num 1 6 8 0 2 5 1 4 4 7 ...
## $ key_confidence : num 0.736 0.169 0.643 0.751 0.092 0.635 0 0 0.717 0.053 ...
## $ latitude : num 37.2 35.1 37.2 37.2 37.2 ...
## $ location : chr "California - LA" "Memphis, TN" "Not available" "London, England" ...
## $ longitude : num -63.9 -90 -63.9 -63.9 -63.9 ...
## $ loudness : num -11.2 -9.84 -9.69 -9.01 -4.5 ...
## $ mode : int 0 0 1 1 1 1 1 0 1 0 ...
## $ mode_confidence : num 0.636 0.43 0.565 0.749 0.371 0.557 0 0.16 0.652 0.473 ...
## $ release.id : int 300848 300822 514953 287650 611336 41838 25824 8876 358182 692313 ...
## $ release.name : chr "Fear Itself" "Dimensions" "Las Numero 1 De La Sonora Santanera" "Friend Or Foe" ...
## $ similar : chr "ARV4KO21187FB38008" "ARSZWK21187B9B26D7" "ARFSJUG11C8A421AAD" "AR4R0741187FB39AF2" ...
## $ song.hotttnesss : num 0.602 NA NA NA 0.605 ...
## $ song.id : chr "SOMZWCG12A8C13C480" "SOCIWDW12A8C13D406" "SOXVLOJ12AB0189215" "SONHOTT12A8C13493C" ...
## $ start_of_fade_out : num 219 138 172 217 199 ...
## $ tatums_confidence : num 0.779 0.969 0.482 0.601 1 0.136 0.467 0.292 0.121 1 ...
## $ tatums_start : num 0.285 0.206 0.421 0.563 0.136 ...
## $ tempo : num 92.2 121.3 100.1 119.3 129.7 ...
## $ terms : chr "hip hop" "blue-eyed soul" "salsa" "pop rock" ...
## $ terms_freq : num 1 1 1 0.989 0.887 ...
## $ time_signature : num 4 4 1 4 4 3 1 3 4 4 ...
## $ time_signature_confidence: num 0.778 0.384 0 0 0.562 0.454 0 0.408 0.487 0.878 ...
## $ title : chr "I Didn't Mean To" "Soul Deep" "Amor De Cabaret" "Something Girls" ...
## $ year : int 0 1969 0 1982 2007 0 0 0 1984 0 ...
I also want to check the head and tail of the data. Setting the second argument of head() and tail() to 10 shows the first and last 10 observations of the data frame, respectively.
#Head of data
head(music, 10)
## artist.hotttnesss artist.id artist.name
## 1 0.4019975 ARD7TVE1187B99BFB1 Casual
## 2 0.4174996 ARMJAGH1187FB546F3 The Box Tops
## 3 0.3434284 ARKRRTF1187B9984DA Sonora Santanera
## 4 0.4542312 AR7G5I41187FB4CE6C Adam Ant
## 5 0.4017237 ARXR32B1187FB57099 Gob
## 6 0.3854706 ARKFYS91187B98E58F Jeff And Sheri Easter
## 7 0.2619412 ARD0S291187B9B7BF5 Rated R
## 8 0.6055071 AR10USD1187B99F3F1 Tweeterfriendly Music
## 9 0.3322757 AR8ZCNI1187B9A069B Planet P Project
## 10 0.4227056 ARNTLGG11E2835DDB9 Clp
## artist_mbtags artist_mbtags_count bars_confidence bars_start
## 1 0 0.643 0.58521
## 2 classic pop and rock 1 0.007 0.71054
## 3 0 0.980 0.73152
## 4 uk 1 0.017 1.30621
## 5 0 0.175 1.06368
## 6 0 0.121 1.17118
## 7 0 0.709 0.27253
## 8 0 0.142 0.65428
## 9 0 0.806 1.91886
## 10 0 0.047 0.62445
## beats_confidence beats_start duration end_of_fade_in familiarity key
## 1 0.834 0.58521 218.9318 0.247 0.5817938 1
## 2 1.000 0.20627 148.0355 0.148 0.6306300 6
## 3 0.980 0.73152 177.4755 0.282 0.4873568 8
## 4 0.809 0.81002 233.4036 0.000 0.6303823 0
## 5 0.883 0.13576 209.6061 0.066 0.6510457 2
## 6 0.438 0.74856 267.7024 2.264 0.5352927 5
## 7 0.709 0.27253 114.7816 0.096 0.5564956 1
## 8 0.234 0.65428 189.5702 0.319 0.8011364 4
## 9 0.440 1.22595 269.8183 5.300 0.4266679 4
## 10 1.000 0.09933 266.3963 0.084 0.5505137 7
## key_confidence latitude location longitude loudness
## 1 0.736 37.15736 California - LA -63.93336 -11.197
## 2 0.169 35.14968 Memphis, TN -90.04892 -9.843
## 3 0.643 37.15736 Not available -63.93336 -9.689
## 4 0.751 37.15736 London, England -63.93336 -9.013
## 5 0.092 37.15736 Not available -63.93336 -4.501
## 6 0.635 37.15736 Not available -63.93336 -9.323
## 7 0.000 37.15736 Ohio -63.93336 -17.302
## 8 0.000 37.15736 Burlington, Ontario, Canada -63.93336 -11.642
## 9 0.717 37.15736 Not available -63.93336 -13.496
## 10 0.053 37.15736 Not available -63.93336 -6.697
## mode mode_confidence release.id release.name
## 1 0 0.636 300848 Fear Itself
## 2 0 0.430 300822 Dimensions
## 3 1 0.565 514953 Las Numero 1 De La Sonora Santanera
## 4 1 0.749 287650 Friend Or Foe
## 5 1 0.371 611336 Muertos Vivos
## 6 1 0.557 41838 Ordinary Day
## 7 1 0.000 25824 Da Ghetto Psychic
## 8 0 0.160 8876 Gin & Phonic
## 9 1 0.652 358182 Pink World
## 10 0 0.473 692313 Superinstrumental
## similar song.hotttnesss song.id start_of_fade_out
## 1 ARV4KO21187FB38008 0.6021200 SOMZWCG12A8C13C480 218.932
## 2 ARSZWK21187B9B26D7 NA SOCIWDW12A8C13D406 137.915
## 3 ARFSJUG11C8A421AAD NA SOXVLOJ12AB0189215 172.304
## 4 AR4R0741187FB39AF2 NA SONHOTT12A8C13493C 217.124
## 5 ARUA62A1187B99D9B0 0.6045007 SOFSOCN12A8C143F5D 198.699
## 6 ARHNMEZ11F50C4706C NA SOYMRWW12A6D4FAB14 254.270
## 7 ARF93II1187B99F981 NA SOMJBYD12A6D4F8557 114.782
## 8 ARJXL4Z1187B9A5920 NA SOHKNRJ12A6701D1F8 181.023
## 9 ARWVP631187FB4D016 0.2658610 SOIAZJW12AB01853F1 258.990
## 10 ARAR1XA11C8A415BE5 NA SOUDSGM12AC9618304 261.747
## tatums_confidence tatums_start tempo terms terms_freq
## 1 0.779 0.28519 92.198 hip hop 1.0000000
## 2 0.969 0.20627 121.274 blue-eyed soul 1.0000000
## 3 0.482 0.42132 100.070 salsa 1.0000000
## 4 0.601 0.56254 119.293 pop rock 0.9885839
## 5 1.000 0.13576 129.738 pop punk 0.8872883
## 6 0.136 0.53929 147.782 southern gospel 1.0000000
## 7 0.467 0.05611 111.787 breakbeat 1.0000000
## 8 0.292 0.36129 101.430 post-hardcore 0.9998180
## 9 0.121 1.22595 86.643 new wave 0.9597662
## 10 1.000 0.09933 114.041 breakcore 0.9156017
## time_signature time_signature_confidence
## 1 4 0.778
## 2 4 0.384
## 3 1 0.000
## 4 4 0.000
## 5 4 0.562
## 6 3 0.454
## 7 1 0.000
## 8 3 0.408
## 9 4 0.487
## 10 4 0.878
## title year
## 1 I Didn't Mean To 0
## 2 Soul Deep 1969
## 3 Amor De Cabaret 0
## 4 Something Girls 1982
## 5 Face the Ashes 2007
## 6 The Moon And I (Ordinary Day Album Version) 0
## 7 Keepin It Real (Skit) 0
## 8 Drop of Rain 0
## 9 Pink World 1984
## 10 Insatiable (Instrumental Version) 0
#Tail of data
tail(music, 10)
## artist.hotttnesss artist.id artist.name
## 9991 0.36368784 ARUUP4L1187B9B72EB Diamanda Galas
## 9992 0.43094204 ARI4S0E1187B9B06C0 David Arkenstone
## 9993 0.04903438 AROIHOI122988FEB8E Mario Rosenstock
## 9994 0.33153475 ARQ91R31187FB38A88 Grandpa Jones
## 9995 0.40114474 ARDK0551187FB5AC48 Blind Willie Johnson
## 9996 0.49982631 AR4C6V01187FB3BAF4 Moonspell
## 9997 0.40977900 AR9JLBU1187B9AAEC4 Danny Williams
## 9998 0.28990293 ARS1DCR1187B9A4A56 Winston Reedy
## 9999 0.21682889 ARAGMIV11F4C843F78 Myrick "Freeze" Guillory
## 10000 0.50924310 ARYXOV81187B99831D Seventh Day Slumber
## artist_mbtags artist_mbtags_count bars_confidence bars_start
## 9991 avant-garde 2 0.140 0.53770
## 9992 filk 1 0.010 1.63812
## 9993 0 0.454 0.83502
## 9994 0 0.020 0.29732
## 9995 0 0.409 0.44729
## 9996 black metal 1 0.460 0.87991
## 9997 south african 1 0.103 1.63576
## 9998 0 0.003 0.78745
## 9999 0 0.542 0.28192
## 10000 0 0.054 1.20349
## beats_confidence beats_start duration end_of_fade_in familiarity key
## 9991 0.759 0.53770 162.9514 0.442 0.5982371 5
## 9992 0.345 1.63812 314.5922 3.895 0.6393965 9
## 9993 1.000 0.83502 187.1408 0.311 0.3345428 7
## 9994 0.741 0.29732 141.7399 0.491 0.4506460 2
## 9995 0.409 0.44729 172.2510 0.097 0.6073270 0
## 9996 0.719 0.44279 386.1938 0.177 0.7225706 7
## 9997 0.936 0.07692 168.0191 0.403 0.5116634 8
## 9998 1.000 0.27924 193.7236 0.173 0.4335076 1
## 9999 0.574 0.28192 300.8257 0.000 0.3344565 0
## 10000 0.749 0.40494 209.7367 0.125 0.6091820 2
## key_confidence latitude location longitude
## 9991 0.860 32.71568 San Diego, CA -117.16172
## 9992 0.658 35.83073 Tennessee -85.97874
## 9993 0.546 37.15736 Not available -63.93336
## 9994 0.101 37.82245 Niagra, KY -85.69091
## 9995 0.000 31.30627 Marlin, TX -96.89774
## 9996 0.374 39.55792 Portugal -7.84481
## 9997 0.223 -33.96243 Port Elizabeth, South Africa 25.62326
## 9998 0.931 37.15736 Not available -63.93336
## 9999 0.297 37.15736 Not available -63.93336
## 10000 0.315 37.15736 Not available -63.93336
## loudness mode mode_confidence release.id
## 9991 -12.673 1 0.562 171957
## 9992 -14.881 1 0.648 149791
## 9993 -8.065 1 0.630 637025
## 9994 -11.756 1 0.394 760072
## 9995 -16.159 1 0.200 597439
## 9996 -8.087 1 0.540 691752
## 9997 -14.517 1 0.398 41649
## 9998 -12.087 1 0.565 346402
## 9999 -12.574 1 0.503 86259
## 10000 -5.324 0 0.406 64501
## release.name similar song.hotttnesss
## 9991 The Sporting Life AR8W31W1187B9A6F5C 0.3231121
## 9992 Return Of The Guardians ARTMSZO1187B98F20E 0.3603706
## 9993 Gift Grub 10 AR48TAR1187B98DC55 NA
## 9994 The Unforgettable Grandpa Jones AR40YBH1187FB38A1A 0.0000000
## 9995 Praise God I'm Satisfied ARE36MM1187B991E50 NA
## 9996 Sin / Pecado ARW5R811187B98BC96 0.5940796
## 9997 Collection ARO59WV1187FB53267 0.3347065
## 9998 Reality ARUBX2Y1187B99CD25 NA
## 9999 Nouveau Zydeco ARGZCCX1187B98C5F6 0.0000000
## 10000 Once Upon A Shattered Life AR9RYZP1187FB36C6A 0.7803116
## song.id start_of_fade_out tatums_confidence tatums_start
## 9991 SOILDRV12A8C13EB77 154.024 0.291 0.35029
## 9992 SOBUUYV12A58A7DA27 296.153 0.147 1.63812
## 9993 SOJARSR12AB0184939 182.671 0.551 0.33701
## 9994 SOUWMIW12AB0184748 136.615 0.965 0.05143
## 9995 SOVMTAW12A8C13B071 167.184 0.509 0.14462
## 9996 SOLXXPY12A67ADABA0 386.194 0.224 0.22262
## 9997 SOAYONI12A6D4F85C8 163.463 0.604 0.07692
## 9998 SOJZLAJ12AB017E8A2 186.015 1.000 0.27924
## 9999 SORZSCJ12A8C132446 300.826 0.421 0.28192
## 10000 SOFAOMI12A6D4FA2D8 193.167 0.710 0.20535
## tempo terms terms_freq time_signature
## 9991 162.133 no wave 1.0000000 4
## 9992 141.975 celtic 0.9929859 4
## 9993 90.050 irish 0.9412616 4
## 9994 119.271 bluegrass 1.0000000 4
## 9995 95.677 texas blues 0.9723530 1
## 9996 140.185 sympho black metal 0.9997654 4
## 9997 77.072 ballad 0.9154172 3
## 9998 118.123 lovers rock 0.9601150 4
## 9999 137.663 zydeco 1.0000000 4
## 10000 150.575 christian rock 0.9262196 4
## time_signature_confidence title year
## 9991 0.668 Dark End Of The Street 0
## 9992 0.659 The Forgotten Lands 1996
## 9993 0.433 Munster Song (Best of 2009) 0
## 9994 0.150 Down In Dixie 0
## 9995 0.000 God Don't Never Change 1989
## 9996 0.099 The Hanged Man 1998
## 9997 0.597 The Wonderful World Of The Young 1998
## 9998 0.205 Sentimental Man 0
## 9999 0.000 Zydeco In D-Minor 0
## 10000 0.317 Shattered Life 2005
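I filtered the data set to keep only songs with a song hotness greater than 0 and a year greater than 0: a year of 0 indicates that the release year is unknown, and a song hotness of 0 or NA gives no usable ranking (filter() also drops rows where the condition evaluates to NA). I also selected only the columns I am most interested in: the title, artist, and year, plus song hotness and the two variables whose influence on it I want to examine, tempo and duration. This extraction improves the usability and readability of the data frame.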
#creating the simple_music extraction
simple_music <- filter(music, song.hotttnesss > 0 & year > 0) %>% select(title, artist.name, year, song.hotttnesss, tempo, duration)
#checking the simple_music extraction
str(simple_music)
## 'data.frame': 2712 obs. of 6 variables:
## $ title : chr "Face the Ashes" "Pink World" "Floating" "Caught In A Dream" ...
## $ artist.name : chr "Gob" "Planet P Project" "Blue Rodeo" "Tesla" ...
## $ year : int 2007 1984 1987 2004 2004 1985 1972 1964 2007 2003 ...
## $ song.hotttnesss: num 0.605 0.266 0.405 0.684 0.667 ...
## $ tempo : num 129.7 86.6 119.8 150.1 166.9 ...
## $ duration : num 210 270 491 290 208 ...
head(simple_music)
## title artist.name year
## 1 Face the Ashes Gob 2007
## 2 Pink World Planet P Project 1984
## 3 Floating Blue Rodeo 1987
## 4 Caught In A Dream Tesla 2004
## 5 Setting Fire to Sleeping Giants The Dillinger Escape Plan 2004
## 6 James (Hold The Ladder Steady) SUE THOMPSON 1985
## song.hotttnesss tempo duration
## 1 0.6045007 129.738 209.6061
## 2 0.2658610 86.643 269.8183
## 3 0.4051157 119.826 491.1277
## 4 0.6841362 150.062 290.2983
## 5 0.6665278 166.862 207.7775
## 6 0.4952936 137.522 124.8649
tail(simple_music)
## title artist.name year
## 2707 Don't Let Her Pull You Down New Found Glory 2009
## 2708 Harboring An Apparition Mouth Of The Architect 2006
## 2709 The Forgotten Lands David Arkenstone 1996
## 2710 The Hanged Man Moonspell 1998
## 2711 The Wonderful World Of The Young Danny Williams 1998
## 2712 Shattered Life Seventh Day Slumber 2005
## song.hotttnesss tempo duration
## 2707 0.6631940 192.070 208.2738
## 2708 0.6909710 124.562 475.2191
## 2709 0.3603706 141.975 314.5922
## 2710 0.5940796 140.185 386.1938
## 2711 0.3347065 77.072 168.0191
## 2712 0.7803116 150.575 209.7367
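The NA handling mentioned above is easy to miss, so here is a small sketch on a made-up data frame (the titles and values are hypothetical, chosen to mimic a missing hotness, a zero hotness, and an unknown year). It contrasts dplyr's filter() with base-R bracket subsetting, which keeps NA conditions as rows of NAs.

```r
library(dplyr)

# Hypothetical rows: "b" has a missing hotness, "c" a zero hotness, "d" an unknown year (0)
toy <- data.frame(
  title = c("a", "b", "c", "d"),
  year = c(2007, 1969, 2004, 0),
  song.hotttnesss = c(0.60, NA, 0, 0.27),
  stringsAsFactors = FALSE
)

# filter() keeps only rows where the condition is TRUE, so the NA row "b" is dropped too
kept <- filter(toy, song.hotttnesss > 0 & year > 0)
kept$title   # "a"

# Base-R subsetting differs: the NA condition for "b" produces a row filled with NAs
toy[toy$song.hotttnesss > 0 & toy$year > 0, ]
```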
Duration may be easier to interpret in minutes than in seconds, so I used mutate() to add a new column, duration.mins, to the data frame.
simple_music2 <- simple_music %>% mutate(duration.mins = duration/60)
head(simple_music2)
## title artist.name year
## 1 Face the Ashes Gob 2007
## 2 Pink World Planet P Project 1984
## 3 Floating Blue Rodeo 1987
## 4 Caught In A Dream Tesla 2004
## 5 Setting Fire to Sleeping Giants The Dillinger Escape Plan 2004
## 6 James (Hold The Ladder Steady) SUE THOMPSON 1985
## song.hotttnesss tempo duration duration.mins
## 1 0.6045007 129.738 209.6061 3.493435
## 2 0.2658610 86.643 269.8183 4.496972
## 3 0.4051157 119.826 491.1277 8.185462
## 4 0.6841362 150.062 290.2983 4.838305
## 5 0.6665278 166.862 207.7775 3.462959
## 6 0.4952936 137.522 124.8649 2.081081
How many artists are represented in this data set?
#number of unique artists
select(simple_music2, artist.name) %>% unique %>% nrow
## [1] 1535
How many years are represented in this data set?
#number of unique years
select(simple_music2, year) %>% unique %>% nrow
## [1] 54
Which years?
#all years represented in data set
unique(simple_music2$year)
## [1] 2007 1984 1987 2004 1985 1972 1964 2003 1995 2006 1974 2009 1997 1986
## [15] 1966 1992 1990 2000 2005 2002 2001 1991 1994 1971 1999 1988 2010 1989
## [29] 1998 1983 1982 2008 1993 1963 1996 1980 1975 1973 1977 1967 1969 1981
## [43] 1954 1962 1976 1968 1979 1956 1978 1970 1959 1960 1965 1961
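dplyr's n_distinct() gives the same count as the select() %>% unique %>% nrow pipeline in a single call, and wrapping unique() in sort() lists the years chronologically. A sketch on a hypothetical miniature data frame (the artist and year combinations are made up); on the real data, n_distinct(simple_music2$artist.name) should return the same count as the pipeline above.

```r
library(dplyr)

# Hypothetical miniature version of simple_music2 (combinations made up)
toy <- data.frame(
  artist.name = c("Gob", "Tesla", "Gob", "Blue Rodeo"),
  year = c(2007, 2004, 2004, 1987),
  stringsAsFactors = FALSE
)

n_distinct(toy$artist.name)   # 3 unique artists
n_distinct(toy$year)          # 3 unique years
sort(unique(toy$year))        # 1987 2004 2007, in chronological order
```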
According to an analysis of the same data for the InfoVis Final Project by Tom Englehardt, average song duration increased between 1960 and 2010. We can quickly compare the summary statistics for those two years to see whether this holds in our filtered data.
#checking the summary statistics of 1960
y1960<-filter(simple_music2, year==1960)
summary(y1960)
## title artist.name year song.hotttnesss
## Length:4 Length:4 Min. :1960 Min. :0.2441
## Class :character Class :character 1st Qu.:1960 1st Qu.:0.3885
## Mode :character Mode :character Median :1960 Median :0.5011
## Mean :1960 Mean :0.4841
## 3rd Qu.:1960 3rd Qu.:0.5967
## Max. :1960 Max. :0.6903
## tempo duration duration.mins
## Min. : 87.81 Min. :143.4 Min. :2.389
## 1st Qu.: 88.80 1st Qu.:155.2 1st Qu.:2.587
## Median : 98.81 Median :171.9 Median :2.865
## Mean :102.06 Mean :168.9 Mean :2.814
## 3rd Qu.:112.07 3rd Qu.:185.6 3rd Qu.:3.093
## Max. :122.80 Max. :188.3 Max. :3.138
#checking the summary statistics of 2010
y2010<-filter(simple_music2, year==2010)
summary(y2010)
## title artist.name year song.hotttnesss
## Length:51 Length:51 Min. :2010 Min. :0.2490
## Class :character Class :character 1st Qu.:2010 1st Qu.:0.4843
## Mode :character Mode :character Median :2010 Median :0.6112
## Mean :2010 Mean :0.5965
## 3rd Qu.:2010 3rd Qu.:0.6814
## Max. :2010 Max. :1.0000
## tempo duration duration.mins
## Min. : 56.18 Min. : 46.21 Min. :0.7702
## 1st Qu.: 96.75 1st Qu.:201.69 1st Qu.:3.3615
## Median :120.08 Median :234.27 Median :3.9044
## Mean :120.63 Mean :235.86 Mean :3.9311
## 3rd Qu.:141.24 3rd Qu.:264.87 3rd Qu.:4.4145
## Max. :200.00 Max. :550.97 Max. :9.1829
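The average duration of a song was 2.814 minutes in 1960 and 3.931 minutes in 2010, so it does appear to have increased. However, only 4 songs from 1960 remain in the filtered data, compared with 51 from 2010, so this comparison is suggestive rather than conclusive.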
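Rather than filtering one year at a time, group_by() and summarise() can compute the mean duration for every year at once, and n() shows how many songs each mean is based on. The small data frame below is made up to illustrate the pattern; on the real data you would pipe simple_music2 in place of toy.

```r
library(dplyr)

# Hypothetical songs standing in for simple_music2 (durations in minutes, made up)
toy <- data.frame(
  year = c(1960, 1960, 2010, 2010, 2010),
  duration.mins = c(2.5, 3.1, 3.6, 4.2, 3.9)
)

# One row per year: number of songs, mean and median duration
by_year <- toy %>%
  group_by(year) %>%
  summarise(n_songs = n(),
            mean_mins = mean(duration.mins),
            median_mins = median(duration.mins))
by_year   # 1960: 2 songs, mean 2.8; 2010: 3 songs, mean 3.9
```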
Next, I want to look at the highest-ranking songs by song hotness. I can sort the data frame in descending order of song hotness with arrange() and desc().
ranking <- arrange(simple_music2, desc(song.hotttnesss))
head(ranking, 10)
## title
## 1 Nothin' On You [feat. Bruno Mars] (Album Version)
## 2 Immigrant Song (Album Version)
## 3 If Today Was Your Last Day (Album Version)
## 4 Harder To Breathe
## 5 Blue Orchid
## 6 Just Say Yes
## 7 They Reminisce Over You (Single Version)
## 8 Inertiatic Esp
## 9 The Loco-Motion
## 10 Innocence
## artist.name year song.hotttnesss tempo duration
## 1 B.o.B 2010 1.0000000 104.038 269.6355
## 2 Led Zeppelin 1970 1.0000000 150.569 145.0575
## 3 Nickelback 2008 0.9843467 89.958 248.0583
## 4 Maroon 5 2002 0.9798372 149.917 173.6616
## 5 The White Stripes 2005 0.9723869 152.785 160.3130
## 6 Snow Patrol 2009 0.9459947 107.998 284.1073
## 7 Pete Rock & C.L. Smooth 1992 0.9322742 101.779 286.6934
## 8 The Mars Volta 2003 0.9286168 112.486 266.4746
## 9 Little Eva 1962 0.9283671 129.781 145.1620
## 10 Avril Lavigne 2007 0.9271325 138.364 232.2020
## duration.mins
## 1 4.493924
## 2 2.417625
## 3 4.134305
## 4 2.894360
## 5 2.671884
## 6 4.735122
## 7 4.778224
## 8 4.441244
## 9 2.419367
## 10 3.870033
tail(ranking, 10)
## title
## 2703 Got You Moving
## 2704 Wonderland
## 2705 Awake (Album Version)
## 2706 The Homecoming Song (Album Version)
## 2707 Willie And The Pig
## 2708 Ammunition
## 2709 Tiempos Dificiles
## 2710 Sambangole / Tres Golpes Na' Mas
## 2711 Sudanese Dance
## 2712 Pinned To The Ground (album version)
## artist.name year song.hotttnesss tempo
## 2703 DJ DLG feat. MC Flipside 2006 0.2120454 128.007
## 2704 Robin Fox 2001 0.2065508 140.021
## 2705 Onesidezero 2001 0.2063969 86.844
## 2706 Owsley 1999 0.2037120 83.689
## 2707 The Grease Band 1971 0.2028319 174.960
## 2708 The Jane Shermans 2008 0.2022170 140.143
## 2709 Juan Carlos Baglietto 1982 0.2013874 97.342
## 2710 Colombiafrica - The Mystic Orchestra 2007 0.1996828 186.121
## 2711 Xcultures 2000 0.1992383 125.013
## 2712 Buzzhorn 2002 0.1938578 144.836
## duration duration.mins
## 2703 377.6257 6.293761
## 2704 515.6567 8.594278
## 2705 228.8061 3.813435
## 2706 179.6436 2.994061
## 2707 254.8502 4.247503
## 2708 235.4934 3.924890
## 2709 232.6199 3.876999
## 2710 318.7979 5.313299
## 2711 273.5277 4.558795
## 2712 239.9865 3.999775
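As an alternative to arrange() followed by head() or tail(), dplyr (version 1.0.0 or later) provides slice_max() and slice_min(). A sketch on hypothetical scores: because two songs tie at 1.0 in the real ranking, the sketch also ties two values and sets with_ties = FALSE so that exactly n rows come back.

```r
library(dplyr)

# Hypothetical hotness scores; "a" and "c" tie at 1.0
toy <- data.frame(
  title = c("a", "b", "c", "d", "e"),
  song.hotttnesss = c(1.0, 0.45, 1.0, 0.98, 0.2),
  stringsAsFactors = FALSE
)

# Top 3 by hotness; with_ties = FALSE caps the result at exactly n rows
top3 <- slice_max(toy, song.hotttnesss, n = 3, with_ties = FALSE)

# Bottom 2 by hotness
bottom2 <- slice_min(toy, song.hotttnesss, n = 2)
```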
We can visualize the distribution of song hotness with a histogram. I set the bin width to 0.01 so that the distribution is shown in finer detail, and added fill and outline colors to make the bars easier to read.
#creating and printing the hist for song hotness
hist_hotness<- simple_music2 %>% ggplot(aes(song.hotttnesss)) +
geom_histogram(binwidth=0.01, fill=I("blue"), col=I("pink"), alpha=.6) +
labs(x = "Song Hotness Score", y = "Count")
hist_hotness
The distribution looks roughly bell-shaped, though it has several outliers. If it followed a normal distribution, then I would expect the mean, median, and mode to be approximately equal. We can check the mean and median with the summary() function.
summary(simple_music2$song.hotttnesss)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1939 0.3760 0.5034 0.5021 0.6236 1.0000
Let’s check the mode!
#creating the mode function
getmode <- function(v) {
uniqv <- unique(v)
uniqv[which.max(tabulate(match(v, uniqv)))]
}
getmode(simple_music2$song.hotttnesss)
## [1] 0.265861
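The mean for song hotness is 0.5021, the median is 0.5034, and the mode is 0.2659. The mean and median are nearly equal, but the mode is far from both, which suggests that the distribution departs from normal. Because song hotness is a continuous variable, however, this mode is simply the exact value that repeats most often, so this check should be interpreted with caution.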
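For continuous data, a more informative "mode" is the peak of a kernel density estimate, and shapiro.test() gives a formal normality test (it accepts between 3 and 5000 observations, so the 2,712 filtered songs qualify). The sketch below uses simulated normal scores as a stand-in for song.hotttnesss, so its numbers say nothing about the real data; it only shows how the two approaches to the mode differ.

```r
# Simulated continuous scores (a made-up stand-in for song.hotttnesss)
set.seed(42)
x <- rnorm(3000, mean = 0.5, sd = 0.15)

# Exact-value mode, as defined above: nearly every value is unique,
# so this just returns the first value that reaches the maximum count
getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}
getmode(x)

# Density-based mode: the x position where the kernel density estimate peaks
d <- density(x)
density_mode <- d$x[which.max(d$y)]
density_mode   # close to the true centre of 0.5

# Formal normality test; with large samples it flags even small departures
shapiro.test(x)
```

On the real data, qqnorm(simple_music2$song.hotttnesss) followed by qqline() would give a visual complement to the test.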
We can visualize this summary information with a boxplot.
#creating and printing the boxplot for song hotness
boxplot(simple_music2$song.hotttnesss, col = "blue")
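The points a base-R boxplot draws beyond its whiskers can be listed with boxplot.stats(), which applies the same rule as boxplot(): values more than 1.5 times the box length beyond the box are flagged. A sketch on a hypothetical vector; on the real data, boxplot.stats(simple_music2$song.hotttnesss)$out would list the flagged scores.

```r
# Hypothetical scores; 1.00 lies far above the rest
x <- c(0.40, 0.45, 0.48, 0.50, 0.52, 0.55, 0.60, 1.00)

bp <- boxplot.stats(x)
bp$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bp$out    # values drawn as individual points beyond the whiskers: 1.00
```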
A scatterplot is a good way to see the relationship between song duration and song hotness.
with(simple_music2, plot(duration.mins, song.hotttnesss, xlab="Duration (mins)", ylab="Song Hotness"))
The ggplot2 package can produce more polished graphs of how song duration relates to song hotness. I set the point color to blue and the opacity to .4 so that overlapping points remain visible.
ggplot(data = simple_music2) +
geom_point(mapping = aes(x = duration.mins, y = song.hotttnesss), col=I("blue"), alpha=.4) +
labs(title = "Song Hotness vs Duration",
x = "Duration (mins)", y = "Song Hotness")
Shorter songs appear to have higher song hotness rankings, and most songs in the data set are shorter than 6 minutes, so the pattern at the longer end rests on comparatively few points. Now let’s see what happens when we add tempo (in BPM) as a third dimension using the size aesthetic.
ggplot(data = simple_music2) +
geom_point(mapping = aes(x = duration.mins, y = song.hotttnesss, size=tempo), col=I("blue"), alpha=.4) +
labs(title = "Song Hotness vs Duration",
x = "Duration (mins)", y = "Song Hotness")
It appears that many of the songs with higher song hotness rankings had faster tempos, although differences in point size are hard to judge by eye. Most songs in the data also appear to have tempos above 100 BPM.
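The plots suggest relationships, and correlations plus a linear model can put numbers on them. This is a minimal sketch that assumes a linear relationship, a simplification I am introducing rather than something the plots establish. Because music2.csv is not loaded here, the sketch simulates a data frame with the same column names in which a negative duration effect and a positive tempo effect are built in, so its output only demonstrates the mechanics; on the real data, replace toy with simple_music2.

```r
# Simulated data with the same columns as simple_music2 (values made up)
set.seed(1)
n <- 200
toy <- data.frame(
  duration.mins = runif(n, 2, 8),
  tempo = runif(n, 60, 200)
)
# Assumption for the demo only: hotness falls with duration and rises with tempo
toy$song.hotttnesss <- 0.7 - 0.04 * toy$duration.mins + 0.001 * toy$tempo +
  rnorm(n, sd = 0.05)

# Pairwise Pearson correlations between the three variables
cor(toy[, c("song.hotttnesss", "duration.mins", "tempo")])

# Linear model with both predictors; summary() reports estimates, standard errors, p-values
fit <- lm(song.hotttnesss ~ duration.mins + tempo, data = toy)
summary(fit)
```

Fitting both predictors together estimates the association of each while holding the other fixed, which the size aesthetic cannot show reliably.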
It is also worth looking at how song duration changed over time. I set the points to red with an opacity of .4 so that they contrast with the trend line from geom_smooth(), which, as its message below reports, fits a smoothed curve (method = 'gam') rather than a straight regression line.
ggplot(data = simple_music2) +
geom_point(mapping = aes(x = year, y = duration.mins), col=I("red"), alpha=.4) +
geom_smooth(mapping = aes(x = year, y = duration.mins)) +
labs(title = "Song Duration Over Time",
x = "Year", y = "Duration (mins)")
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
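The smoothed line suggests that song duration has remained relatively stable over time, which tempers the 1960 versus 2010 comparison above. The plot also shows that the data frame contains far more songs from later years than from earlier ones.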
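The smoothed curve gives a visual impression; fitting a straight line gives a single number for the change in duration per year, and counting songs per decade shows how unevenly the years are sampled. A minimal sketch on simulated data, assuming a linear trend is an adequate summary (the years and durations below are made up to mimic the imbalance toward later years).

```r
library(dplyr)

# Simulated songs: few early ones, many later ones, as in simple_music2
set.seed(7)
toy <- data.frame(year = c(rep(1965, 5), rep(2005, 50)))
toy$duration.mins <- 3 + 0.01 * (toy$year - 1965) + rnorm(nrow(toy), sd = 0.5)

# Linear trend: the year coefficient is the estimated change in minutes per year
trend <- lm(duration.mins ~ year, data = toy)
coef(trend)["year"]

# Songs per decade (year %/% 10 is integer division)
decades <- toy %>% count(decade = 10 * (year %/% 10))
decades   # 1960: 5 songs, 2000: 50 songs
```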
Another way to look at this data is to split the plot into facets, one per year. Each panel shows song hotness against duration for that year, with tempo again represented by point size. Pretty neat, although a little hard to read.
ggplot(data = simple_music2) +
geom_point(mapping = aes(x = duration.mins, y = song.hotttnesss, size=tempo), col=I("blue"), alpha=.4) +
facet_wrap(~ year, ncol = 10)
We can recreate this effect with an animation that steps through the years, showing how the plot changed over time.
#static plot
p <- ggplot(simple_music2) +
geom_point(mapping = aes(x = duration.mins, y = song.hotttnesss, size=tempo), col=I("blue"), alpha=.4) +
labs(x = "Duration (mins)", y = "Song Hotness")
p
#transition plot
p + transition_time(year) +
labs(title = "Year: {frame_time}")