Introduction

For this project, I used a dataset adapted from The Million Song Dataset, a “freely-available collection of audio features and metadata for a million contemporary popular music tracks” provided by The Echo Nest. This adaptation was developed for another project by Ryan Whitcomb. I chose to work with it because it had already reduced the number of observations from 1,000,000 to 10,000.

1. Formulating the Question

I want to investigate a couple of aspects of songs in relation to their “song hotness” ranking. I decided to pose the question: How do duration and tempo contribute to a song’s hotness ranking, and did this change over time?

2. Reading the Data

First, I imported the data and stored it in the “music” variable so it is easier to reference. I set the argument stringsAsFactors = FALSE so that text columns are imported as character vectors rather than factors.

music <- read.csv("music2.csv", stringsAsFactors = FALSE)
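To see what stringsAsFactors changes, here is a minimal sketch that reads a tiny CSV from an in-memory string instead of a file. The two rows are modeled on the head of the data, not read from music2.csv. Since R 4.0.0 the default is already FALSE, so setting it explicitly mainly keeps the code consistent on older versions of R.

```r
# Two rows of CSV text, read from a string instead of a file
csv_text <- "title,year\nSoul Deep,1969\nPink World,1984"

as_chars   <- read.csv(text = csv_text, stringsAsFactors = FALSE)
as_factors <- read.csv(text = csv_text, stringsAsFactors = TRUE)

class(as_chars$title)     # "character"
class(as_factors$title)   # "factor"
levels(as_factors$title)  # the distinct titles, stored as factor levels
```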

I will be using functions from the dplyr, tidyr, ggplot2, and gganimate libraries. Any of these packages that are not already installed can be installed with install.packages().

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
library(ggplot2)
library(gganimate)
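If you prefer to install only what is missing, one option is a small helper built on base R's requireNamespace(). This is a sketch: find_missing_packages() is a hypothetical name, not part of any package, and the install step is left commented out so that running the block does not modify your library.

```r
# Hypothetical helper: returns the packages in `pkgs` that cannot be loaded
find_missing_packages <- function(pkgs) {
  # requireNamespace() returns FALSE (quietly) when a package is not installed
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("dplyr", "tidyr", "ggplot2", "gganimate")
to_install <- find_missing_packages(needed)
to_install  # character(0) if everything is already installed

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```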

3. Checking the Packaging

I expect the imported data to be a data frame, since read.csv() returns one and the data contains columns of different types. I can confirm this with the class() function. I will also check the dimensions and structure to get a better overview of the data set.

#checking characteristics of the dataset
class(music)
## [1] "data.frame"
#checking the number of rows and columns
nrow(music)
## [1] 10000
ncol(music)
## [1] 35
#this can also be done all at once with dim()
dim(music)
## [1] 10000    35
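class() describes the object as a whole. To see which types the individual columns hold, one option is to apply class() to each column and tally the results. The sketch below uses a hypothetical three-column data frame; applied to music, the same call would tally the 35 columns by type.

```r
# Hypothetical three-column data frame modeled on the music columns
toy <- data.frame(
  title = c("Soul Deep", "Pink World"),
  year  = c(1969L, 1984L),
  tempo = c(121.274, 86.643),
  stringsAsFactors = FALSE
)

# class() of each column (first element, in case a column has several classes)
col_types <- vapply(toy, function(col) class(col)[1], character(1))
col_types         # named by column: title, year, tempo
table(col_types)  # how many columns of each type
```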

4. Observing the Structure

I want to check the structure of the data frame to see the names of the columns and the data type contained in each column.

#checking the structure
str(music)
## 'data.frame':    10000 obs. of  35 variables:
##  $ artist.hotttnesss        : num  0.402 0.417 0.343 0.454 0.402 ...
##  $ artist.id                : chr  "ARD7TVE1187B99BFB1" "ARMJAGH1187FB546F3" "ARKRRTF1187B9984DA" "AR7G5I41187FB4CE6C" ...
##  $ artist.name              : chr  "Casual" "The Box Tops" "Sonora Santanera" "Adam Ant" ...
##  $ artist_mbtags            : chr  "" "classic pop and rock" "" "uk" ...
##  $ artist_mbtags_count      : num  0 1 0 1 0 0 0 0 0 0 ...
##  $ bars_confidence          : num  0.643 0.007 0.98 0.017 0.175 0.121 0.709 0.142 0.806 0.047 ...
##  $ bars_start               : num  0.585 0.711 0.732 1.306 1.064 ...
##  $ beats_confidence         : num  0.834 1 0.98 0.809 0.883 0.438 0.709 0.234 0.44 1 ...
##  $ beats_start              : num  0.585 0.206 0.732 0.81 0.136 ...
##  $ duration                 : num  219 148 177 233 210 ...
##  $ end_of_fade_in           : num  0.247 0.148 0.282 0 0.066 ...
##  $ familiarity              : num  0.582 0.631 0.487 0.63 0.651 ...
##  $ key                      : num  1 6 8 0 2 5 1 4 4 7 ...
##  $ key_confidence           : num  0.736 0.169 0.643 0.751 0.092 0.635 0 0 0.717 0.053 ...
##  $ latitude                 : num  37.2 35.1 37.2 37.2 37.2 ...
##  $ location                 : chr  "California - LA" "Memphis, TN" "Not available" "London, England" ...
##  $ longitude                : num  -63.9 -90 -63.9 -63.9 -63.9 ...
##  $ loudness                 : num  -11.2 -9.84 -9.69 -9.01 -4.5 ...
##  $ mode                     : int  0 0 1 1 1 1 1 0 1 0 ...
##  $ mode_confidence          : num  0.636 0.43 0.565 0.749 0.371 0.557 0 0.16 0.652 0.473 ...
##  $ release.id               : int  300848 300822 514953 287650 611336 41838 25824 8876 358182 692313 ...
##  $ release.name             : chr  "Fear Itself" "Dimensions" "Las Numero 1 De La Sonora Santanera" "Friend Or Foe" ...
##  $ similar                  : chr  "ARV4KO21187FB38008" "ARSZWK21187B9B26D7" "ARFSJUG11C8A421AAD" "AR4R0741187FB39AF2" ...
##  $ song.hotttnesss          : num  0.602 NA NA NA 0.605 ...
##  $ song.id                  : chr  "SOMZWCG12A8C13C480" "SOCIWDW12A8C13D406" "SOXVLOJ12AB0189215" "SONHOTT12A8C13493C" ...
##  $ start_of_fade_out        : num  219 138 172 217 199 ...
##  $ tatums_confidence        : num  0.779 0.969 0.482 0.601 1 0.136 0.467 0.292 0.121 1 ...
##  $ tatums_start             : num  0.285 0.206 0.421 0.563 0.136 ...
##  $ tempo                    : num  92.2 121.3 100.1 119.3 129.7 ...
##  $ terms                    : chr  "hip hop" "blue-eyed soul" "salsa" "pop rock" ...
##  $ terms_freq               : num  1 1 1 0.989 0.887 ...
##  $ time_signature           : num  4 4 1 4 4 3 1 3 4 4 ...
##  $ time_signature_confidence: num  0.778 0.384 0 0 0.562 0.454 0 0.408 0.487 0.878 ...
##  $ title                    : chr  "I Didn't Mean To" "Soul Deep" "Amor De Cabaret" "Something Girls" ...
##  $ year                     : int  0 1969 0 1982 2007 0 0 0 1984 0 ...

5. Checking the Head and Tail

I am interested in checking the head and tail of the data. Setting the second argument of head() and tail() to 10 shows the first and last 10 observations of the data frame, respectively.

#Head of data
head(music, 10)
##    artist.hotttnesss          artist.id           artist.name
## 1          0.4019975 ARD7TVE1187B99BFB1                Casual
## 2          0.4174996 ARMJAGH1187FB546F3          The Box Tops
## 3          0.3434284 ARKRRTF1187B9984DA      Sonora Santanera
## 4          0.4542312 AR7G5I41187FB4CE6C              Adam Ant
## 5          0.4017237 ARXR32B1187FB57099                   Gob
## 6          0.3854706 ARKFYS91187B98E58F Jeff And Sheri Easter
## 7          0.2619412 ARD0S291187B9B7BF5               Rated R
## 8          0.6055071 AR10USD1187B99F3F1 Tweeterfriendly Music
## 9          0.3322757 AR8ZCNI1187B9A069B      Planet P Project
## 10         0.4227056 ARNTLGG11E2835DDB9                   Clp
##           artist_mbtags artist_mbtags_count bars_confidence bars_start
## 1                                         0           0.643    0.58521
## 2  classic pop and rock                   1           0.007    0.71054
## 3                                         0           0.980    0.73152
## 4                    uk                   1           0.017    1.30621
## 5                                         0           0.175    1.06368
## 6                                         0           0.121    1.17118
## 7                                         0           0.709    0.27253
## 8                                         0           0.142    0.65428
## 9                                         0           0.806    1.91886
## 10                                        0           0.047    0.62445
##    beats_confidence beats_start duration end_of_fade_in familiarity key
## 1             0.834     0.58521 218.9318          0.247   0.5817938   1
## 2             1.000     0.20627 148.0355          0.148   0.6306300   6
## 3             0.980     0.73152 177.4755          0.282   0.4873568   8
## 4             0.809     0.81002 233.4036          0.000   0.6303823   0
## 5             0.883     0.13576 209.6061          0.066   0.6510457   2
## 6             0.438     0.74856 267.7024          2.264   0.5352927   5
## 7             0.709     0.27253 114.7816          0.096   0.5564956   1
## 8             0.234     0.65428 189.5702          0.319   0.8011364   4
## 9             0.440     1.22595 269.8183          5.300   0.4266679   4
## 10            1.000     0.09933 266.3963          0.084   0.5505137   7
##    key_confidence latitude                    location longitude loudness
## 1           0.736 37.15736             California - LA -63.93336  -11.197
## 2           0.169 35.14968                 Memphis, TN -90.04892   -9.843
## 3           0.643 37.15736               Not available -63.93336   -9.689
## 4           0.751 37.15736             London, England -63.93336   -9.013
## 5           0.092 37.15736               Not available -63.93336   -4.501
## 6           0.635 37.15736               Not available -63.93336   -9.323
## 7           0.000 37.15736                        Ohio -63.93336  -17.302
## 8           0.000 37.15736 Burlington, Ontario, Canada -63.93336  -11.642
## 9           0.717 37.15736               Not available -63.93336  -13.496
## 10          0.053 37.15736               Not available -63.93336   -6.697
##    mode mode_confidence release.id                        release.name
## 1     0           0.636     300848                         Fear Itself
## 2     0           0.430     300822                          Dimensions
## 3     1           0.565     514953 Las Numero 1 De La Sonora Santanera
## 4     1           0.749     287650                       Friend Or Foe
## 5     1           0.371     611336                       Muertos Vivos
## 6     1           0.557      41838                        Ordinary Day
## 7     1           0.000      25824                   Da Ghetto Psychic
## 8     0           0.160       8876                        Gin & Phonic
## 9     1           0.652     358182                          Pink World
## 10    0           0.473     692313                   Superinstrumental
##               similar song.hotttnesss            song.id start_of_fade_out
## 1  ARV4KO21187FB38008       0.6021200 SOMZWCG12A8C13C480           218.932
## 2  ARSZWK21187B9B26D7              NA SOCIWDW12A8C13D406           137.915
## 3  ARFSJUG11C8A421AAD              NA SOXVLOJ12AB0189215           172.304
## 4  AR4R0741187FB39AF2              NA SONHOTT12A8C13493C           217.124
## 5  ARUA62A1187B99D9B0       0.6045007 SOFSOCN12A8C143F5D           198.699
## 6  ARHNMEZ11F50C4706C              NA SOYMRWW12A6D4FAB14           254.270
## 7  ARF93II1187B99F981              NA SOMJBYD12A6D4F8557           114.782
## 8  ARJXL4Z1187B9A5920              NA SOHKNRJ12A6701D1F8           181.023
## 9  ARWVP631187FB4D016       0.2658610 SOIAZJW12AB01853F1           258.990
## 10 ARAR1XA11C8A415BE5              NA SOUDSGM12AC9618304           261.747
##    tatums_confidence tatums_start   tempo           terms terms_freq
## 1              0.779      0.28519  92.198         hip hop  1.0000000
## 2              0.969      0.20627 121.274  blue-eyed soul  1.0000000
## 3              0.482      0.42132 100.070           salsa  1.0000000
## 4              0.601      0.56254 119.293        pop rock  0.9885839
## 5              1.000      0.13576 129.738        pop punk  0.8872883
## 6              0.136      0.53929 147.782 southern gospel  1.0000000
## 7              0.467      0.05611 111.787       breakbeat  1.0000000
## 8              0.292      0.36129 101.430   post-hardcore  0.9998180
## 9              0.121      1.22595  86.643        new wave  0.9597662
## 10             1.000      0.09933 114.041       breakcore  0.9156017
##    time_signature time_signature_confidence
## 1               4                     0.778
## 2               4                     0.384
## 3               1                     0.000
## 4               4                     0.000
## 5               4                     0.562
## 6               3                     0.454
## 7               1                     0.000
## 8               3                     0.408
## 9               4                     0.487
## 10              4                     0.878
##                                          title year
## 1                             I Didn't Mean To    0
## 2                                    Soul Deep 1969
## 3                              Amor De Cabaret    0
## 4                              Something Girls 1982
## 5                               Face the Ashes 2007
## 6  The Moon And I (Ordinary Day Album Version)    0
## 7                        Keepin It Real (Skit)    0
## 8                                 Drop of Rain    0
## 9                                   Pink World 1984
## 10           Insatiable (Instrumental Version)    0
#Tail of data
tail(music, 10)
##       artist.hotttnesss          artist.id              artist.name
## 9991         0.36368784 ARUUP4L1187B9B72EB           Diamanda Galas
## 9992         0.43094204 ARI4S0E1187B9B06C0         David Arkenstone
## 9993         0.04903438 AROIHOI122988FEB8E         Mario Rosenstock
## 9994         0.33153475 ARQ91R31187FB38A88            Grandpa Jones
## 9995         0.40114474 ARDK0551187FB5AC48     Blind Willie Johnson
## 9996         0.49982631 AR4C6V01187FB3BAF4                Moonspell
## 9997         0.40977900 AR9JLBU1187B9AAEC4           Danny Williams
## 9998         0.28990293 ARS1DCR1187B9A4A56            Winston Reedy
## 9999         0.21682889 ARAGMIV11F4C843F78 Myrick "Freeze" Guillory
## 10000        0.50924310 ARYXOV81187B99831D      Seventh Day Slumber
##       artist_mbtags artist_mbtags_count bars_confidence bars_start
## 9991    avant-garde                   2           0.140    0.53770
## 9992           filk                   1           0.010    1.63812
## 9993                                  0           0.454    0.83502
## 9994                                  0           0.020    0.29732
## 9995                                  0           0.409    0.44729
## 9996    black metal                   1           0.460    0.87991
## 9997  south african                   1           0.103    1.63576
## 9998                                  0           0.003    0.78745
## 9999                                  0           0.542    0.28192
## 10000                                 0           0.054    1.20349
##       beats_confidence beats_start duration end_of_fade_in familiarity key
## 9991             0.759     0.53770 162.9514          0.442   0.5982371   5
## 9992             0.345     1.63812 314.5922          3.895   0.6393965   9
## 9993             1.000     0.83502 187.1408          0.311   0.3345428   7
## 9994             0.741     0.29732 141.7399          0.491   0.4506460   2
## 9995             0.409     0.44729 172.2510          0.097   0.6073270   0
## 9996             0.719     0.44279 386.1938          0.177   0.7225706   7
## 9997             0.936     0.07692 168.0191          0.403   0.5116634   8
## 9998             1.000     0.27924 193.7236          0.173   0.4335076   1
## 9999             0.574     0.28192 300.8257          0.000   0.3344565   0
## 10000            0.749     0.40494 209.7367          0.125   0.6091820   2
##       key_confidence  latitude                     location  longitude
## 9991           0.860  32.71568                San Diego, CA -117.16172
## 9992           0.658  35.83073                    Tennessee  -85.97874
## 9993           0.546  37.15736                Not available  -63.93336
## 9994           0.101  37.82245                   Niagra, KY  -85.69091
## 9995           0.000  31.30627                   Marlin, TX  -96.89774
## 9996           0.374  39.55792                     Portugal   -7.84481
## 9997           0.223 -33.96243 Port Elizabeth, South Africa   25.62326
## 9998           0.931  37.15736                Not available  -63.93336
## 9999           0.297  37.15736                Not available  -63.93336
## 10000          0.315  37.15736                Not available  -63.93336
##       loudness mode mode_confidence release.id
## 9991   -12.673    1           0.562     171957
## 9992   -14.881    1           0.648     149791
## 9993    -8.065    1           0.630     637025
## 9994   -11.756    1           0.394     760072
## 9995   -16.159    1           0.200     597439
## 9996    -8.087    1           0.540     691752
## 9997   -14.517    1           0.398      41649
## 9998   -12.087    1           0.565     346402
## 9999   -12.574    1           0.503      86259
## 10000   -5.324    0           0.406      64501
##                          release.name            similar song.hotttnesss
## 9991                The Sporting Life AR8W31W1187B9A6F5C       0.3231121
## 9992          Return Of The Guardians ARTMSZO1187B98F20E       0.3603706
## 9993                     Gift Grub 10 AR48TAR1187B98DC55              NA
## 9994  The Unforgettable Grandpa Jones AR40YBH1187FB38A1A       0.0000000
## 9995         Praise God I'm Satisfied ARE36MM1187B991E50              NA
## 9996                     Sin / Pecado ARW5R811187B98BC96       0.5940796
## 9997                       Collection ARO59WV1187FB53267       0.3347065
## 9998                          Reality ARUBX2Y1187B99CD25              NA
## 9999                   Nouveau Zydeco ARGZCCX1187B98C5F6       0.0000000
## 10000      Once Upon A Shattered Life AR9RYZP1187FB36C6A       0.7803116
##                  song.id start_of_fade_out tatums_confidence tatums_start
## 9991  SOILDRV12A8C13EB77           154.024             0.291      0.35029
## 9992  SOBUUYV12A58A7DA27           296.153             0.147      1.63812
## 9993  SOJARSR12AB0184939           182.671             0.551      0.33701
## 9994  SOUWMIW12AB0184748           136.615             0.965      0.05143
## 9995  SOVMTAW12A8C13B071           167.184             0.509      0.14462
## 9996  SOLXXPY12A67ADABA0           386.194             0.224      0.22262
## 9997  SOAYONI12A6D4F85C8           163.463             0.604      0.07692
## 9998  SOJZLAJ12AB017E8A2           186.015             1.000      0.27924
## 9999  SORZSCJ12A8C132446           300.826             0.421      0.28192
## 10000 SOFAOMI12A6D4FA2D8           193.167             0.710      0.20535
##         tempo              terms terms_freq time_signature
## 9991  162.133            no wave  1.0000000              4
## 9992  141.975             celtic  0.9929859              4
## 9993   90.050              irish  0.9412616              4
## 9994  119.271          bluegrass  1.0000000              4
## 9995   95.677        texas blues  0.9723530              1
## 9996  140.185 sympho black metal  0.9997654              4
## 9997   77.072             ballad  0.9154172              3
## 9998  118.123        lovers rock  0.9601150              4
## 9999  137.663             zydeco  1.0000000              4
## 10000 150.575     christian rock  0.9262196              4
##       time_signature_confidence                            title year
## 9991                      0.668           Dark End Of The Street    0
## 9992                      0.659              The Forgotten Lands 1996
## 9993                      0.433      Munster Song (Best of 2009)    0
## 9994                      0.150                    Down In Dixie    0
## 9995                      0.000           God Don't Never Change 1989
## 9996                      0.099                   The Hanged Man 1998
## 9997                      0.597 The Wonderful World Of The Young 1998
## 9998                      0.205                  Sentimental Man    0
## 9999                      0.000                Zydeco In D-Minor    0
## 10000                     0.317                   Shattered Life 2005

Selecting Specific Columns

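I filtered the data set to keep only songs with a song hotness greater than 0 and a year greater than 0, as zero values in these columns are not useful to me (a year of 0, for example, is not a real release year). Rows with a missing (NA) song hotness are dropped as well, because filter() keeps only rows where the condition is TRUE. I also selected the columns for the variables I am most interested in, to see how they may have influenced song hotness. This extraction improves the usability and readability of the data frame.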

#creating the simple_music extraction
simple_music <- music %>%
  filter(song.hotttnesss > 0 & year > 0) %>%
  select(title, artist.name, year, song.hotttnesss, tempo, duration)

#checking the simple_music extraction
str(simple_music)
## 'data.frame':    2712 obs. of  6 variables:
##  $ title          : chr  "Face the Ashes" "Pink World" "Floating" "Caught In A Dream" ...
##  $ artist.name    : chr  "Gob" "Planet P Project" "Blue Rodeo" "Tesla" ...
##  $ year           : int  2007 1984 1987 2004 2004 1985 1972 1964 2007 2003 ...
##  $ song.hotttnesss: num  0.605 0.266 0.405 0.684 0.667 ...
##  $ tempo          : num  129.7 86.6 119.8 150.1 166.9 ...
##  $ duration       : num  210 270 491 290 208 ...
head(simple_music)
##                             title               artist.name year
## 1                  Face the Ashes                       Gob 2007
## 2                      Pink World          Planet P Project 1984
## 3                        Floating                Blue Rodeo 1987
## 4               Caught In A Dream                     Tesla 2004
## 5 Setting Fire to Sleeping Giants The Dillinger Escape Plan 2004
## 6  James (Hold The Ladder Steady)              SUE THOMPSON 1985
##   song.hotttnesss   tempo duration
## 1       0.6045007 129.738 209.6061
## 2       0.2658610  86.643 269.8183
## 3       0.4051157 119.826 491.1277
## 4       0.6841362 150.062 290.2983
## 5       0.6665278 166.862 207.7775
## 6       0.4952936 137.522 124.8649
tail(simple_music)
##                                 title            artist.name year
## 2707      Don't Let Her Pull You Down        New Found Glory 2009
## 2708          Harboring An Apparition Mouth Of The Architect 2006
## 2709              The Forgotten Lands       David Arkenstone 1996
## 2710                   The Hanged Man              Moonspell 1998
## 2711 The Wonderful World Of The Young         Danny Williams 1998
## 2712                   Shattered Life    Seventh Day Slumber 2005
##      song.hotttnesss   tempo duration
## 2707       0.6631940 192.070 208.2738
## 2708       0.6909710 124.562 475.2191
## 2709       0.3603706 141.975 314.5922
## 2710       0.5940796 140.185 386.1938
## 2711       0.3347065  77.072 168.0191
## 2712       0.7803116 150.575 209.7367
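A detail worth knowing about the filtering step: dplyr's filter() keeps only rows where the condition is TRUE, so a row whose condition evaluates to NA (for example, a missing song.hotttnesss) is dropped. Base R's subset() treats NA the same way. A minimal sketch on hypothetical rows A to D:

```r
# Hypothetical rows: one missing hotness, one unknown year, one zero hotness
toy <- data.frame(
  title = c("A", "B", "C", "D"),
  year = c(2007L, 1969L, 0L, 1984L),
  song.hotttnesss = c(0.60, NA, 0.35, 0),
  stringsAsFactors = FALSE
)

sum(is.na(toy$song.hotttnesss))  # rows with no hotness value

# NA & TRUE is NA, which subset() (like dplyr::filter()) treats as "drop"
kept <- subset(toy, song.hotttnesss > 0 & year > 0)
kept  # only row A survives
```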

Converting Duration into Minutes

The duration variable is recorded in seconds, which I find harder to interpret than minutes, so I used mutate() to add a duration.mins column and stored the result as simple_music2.

simple_music2 <- simple_music %>% mutate(duration.mins = duration / 60)
head(simple_music2)
##                             title               artist.name year
## 1                  Face the Ashes                       Gob 2007
## 2                      Pink World          Planet P Project 1984
## 3                        Floating                Blue Rodeo 1987
## 4               Caught In A Dream                     Tesla 2004
## 5 Setting Fire to Sleeping Giants The Dillinger Escape Plan 2004
## 6  James (Hold The Ladder Steady)              SUE THOMPSON 1985
##   song.hotttnesss   tempo duration duration.mins
## 1       0.6045007 129.738 209.6061      3.493435
## 2       0.2658610  86.643 269.8183      4.496972
## 3       0.4051157 119.826 491.1277      8.185462
## 4       0.6841362 150.062 290.2983      4.838305
## 5       0.6665278 166.862 207.7775      3.462959
## 6       0.4952936 137.522 124.8649      2.081081
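Decimal minutes such as 3.493435 can be hard to read at a glance. If a minutes:seconds label is preferred, a small helper like the hypothetical format_mmss() below would work. It rounds to the nearest whole second first, so a label never shows 60 seconds. The inputs are durations from the head of the data.

```r
# Hypothetical helper: convert seconds to "m:ss" labels
format_mmss <- function(seconds) {
  total <- round(seconds)  # nearest whole second
  sprintf("%d:%02d", as.integer(total %/% 60), as.integer(total %% 60))
}

format_mmss(c(209.6061, 269.8183, 124.8649))
# "3:30" "4:30" "2:05"
```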

6. Checking the Count

How many artists are represented in this data set?

#number of unique artists
select(simple_music2, artist.name) %>% unique %>% nrow 
## [1] 1535

How many years are represented in this data set?

#number of unique years
select(simple_music2, year) %>% unique %>% nrow 
## [1] 54

Which years?

#all years represented in data set
unique(simple_music2$year)
##  [1] 2007 1984 1987 2004 1985 1972 1964 2003 1995 2006 1974 2009 1997 1986
## [15] 1966 1992 1990 2000 2005 2002 2001 1991 1994 1971 1999 1988 2010 1989
## [29] 1998 1983 1982 2008 1993 1963 1996 1980 1975 1973 1977 1967 1969 1981
## [43] 1954 1962 1976 1968 1979 1956 1978 1970 1959 1960 1965 1961
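unique() lists the years in the order they first appear. To see them in ascending order, together with how many songs each year contributes (which matters when comparing individual years), table() is one option. With the real data the call would be table(simple_music2$year). A sketch on a hypothetical vector of years:

```r
# Hypothetical vector of release years
years <- c(2007L, 1984L, 2007L, 1960L, 2010L, 2010L, 2010L)

sort(unique(years))          # the distinct years in ascending order
year_counts <- table(years)  # songs per year; names are the years
year_counts
```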

7. Validating the Data

According to an analysis of the same data for the InfoVis Final Project by Tom Englehardt, average song duration increased between 1960 and 2010. We can quickly compare the summary statistics for these two years to see whether this holds here.

#checking the summary statistics of 1960
y1960<-filter(simple_music2, year==1960)
summary(y1960)
##     title           artist.name             year      song.hotttnesss 
##  Length:4           Length:4           Min.   :1960   Min.   :0.2441  
##  Class :character   Class :character   1st Qu.:1960   1st Qu.:0.3885  
##  Mode  :character   Mode  :character   Median :1960   Median :0.5011  
##                                        Mean   :1960   Mean   :0.4841  
##                                        3rd Qu.:1960   3rd Qu.:0.5967  
##                                        Max.   :1960   Max.   :0.6903  
##      tempo           duration     duration.mins  
##  Min.   : 87.81   Min.   :143.4   Min.   :2.389  
##  1st Qu.: 88.80   1st Qu.:155.2   1st Qu.:2.587  
##  Median : 98.81   Median :171.9   Median :2.865  
##  Mean   :102.06   Mean   :168.9   Mean   :2.814  
##  3rd Qu.:112.07   3rd Qu.:185.6   3rd Qu.:3.093  
##  Max.   :122.80   Max.   :188.3   Max.   :3.138
#checking the summary statistics of 2010
y2010<-filter(simple_music2, year==2010)
summary(y2010)
##     title           artist.name             year      song.hotttnesss 
##  Length:51          Length:51          Min.   :2010   Min.   :0.2490  
##  Class :character   Class :character   1st Qu.:2010   1st Qu.:0.4843  
##  Mode  :character   Mode  :character   Median :2010   Median :0.6112  
##                                        Mean   :2010   Mean   :0.5965  
##                                        3rd Qu.:2010   3rd Qu.:0.6814  
##                                        Max.   :2010   Max.   :1.0000  
##      tempo           duration      duration.mins   
##  Min.   : 56.18   Min.   : 46.21   Min.   :0.7702  
##  1st Qu.: 96.75   1st Qu.:201.69   1st Qu.:3.3615  
##  Median :120.08   Median :234.27   Median :3.9044  
##  Mean   :120.63   Mean   :235.86   Mean   :3.9311  
##  3rd Qu.:141.24   3rd Qu.:264.87   3rd Qu.:4.4145  
##  Max.   :200.00   Max.   :550.97   Max.   :9.1829

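The average duration of a song was about 2.81 minutes in 1960 and about 3.93 minutes in 2010, so it does appear to have increased. Note, however, that 1960 is represented by only 4 songs in this subset (compared with 51 for 2010), so this is a rough check rather than a firm confirmation.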
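Rather than filtering one year at a time, one option is to compute the mean duration for every year at once, alongside the number of songs behind each mean. Below is a sketch with base R's aggregate() on hypothetical values; with the real data, the same calls would use simple_music2.

```r
# Hypothetical songs from two years, durations in seconds
toy <- data.frame(
  year     = c(1960, 1960, 2010, 2010, 2010),
  duration = c(150, 180, 200, 240, 280)
)
toy$duration.mins <- toy$duration / 60

# Mean duration (in minutes) per year; rows come out sorted by year
by_year <- aggregate(duration.mins ~ year, data = toy, FUN = mean)
# Number of songs per year, in the same sorted order
by_year$n <- as.vector(table(toy$year))
by_year
```

Keeping the count column next to each mean makes it easy to spot years, like 1960 here, where the average rests on very few songs.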

8. Try the easy solution first

I would like to take a glance at the highest-ranking songs by song hotness. I can do so by sorting with arrange() and desc().

ranking <- arrange(simple_music2, desc(song.hotttnesss))
head(ranking, 10)
##                                                title
## 1  Nothin' On You [feat. Bruno Mars] (Album Version)
## 2                     Immigrant Song (Album Version)
## 3         If Today Was Your Last Day (Album Version)
## 4                                  Harder To Breathe
## 5                                        Blue Orchid
## 6                                       Just Say Yes
## 7           They Reminisce Over You (Single Version)
## 8                                     Inertiatic Esp
## 9                                    The Loco-Motion
## 10                                         Innocence
##                artist.name year song.hotttnesss   tempo duration
## 1                    B.o.B 2010       1.0000000 104.038 269.6355
## 2             Led Zeppelin 1970       1.0000000 150.569 145.0575
## 3               Nickelback 2008       0.9843467  89.958 248.0583
## 4                 Maroon 5 2002       0.9798372 149.917 173.6616
## 5        The White Stripes 2005       0.9723869 152.785 160.3130
## 6              Snow Patrol 2009       0.9459947 107.998 284.1073
## 7  Pete Rock & C.L. Smooth 1992       0.9322742 101.779 286.6934
## 8           The Mars Volta 2003       0.9286168 112.486 266.4746
## 9               Little Eva 1962       0.9283671 129.781 145.1620
## 10           Avril Lavigne 2007       0.9271325 138.364 232.2020
##    duration.mins
## 1       4.493924
## 2       2.417625
## 3       4.134305
## 4       2.894360
## 5       2.671884
## 6       4.735122
## 7       4.778224
## 8       4.441244
## 9       2.419367
## 10      3.870033
tail(ranking, 10)
##                                     title
## 2703                       Got You Moving
## 2704                           Wonderland
## 2705                Awake (Album Version)
## 2706  The Homecoming Song (Album Version)
## 2707                   Willie And The Pig
## 2708                           Ammunition
## 2709                    Tiempos Dificiles
## 2710     Sambangole / Tres Golpes Na' Mas
## 2711                       Sudanese Dance
## 2712 Pinned To The Ground (album version)
##                               artist.name year song.hotttnesss   tempo
## 2703             DJ DLG feat. MC Flipside 2006       0.2120454 128.007
## 2704                            Robin Fox 2001       0.2065508 140.021
## 2705                          Onesidezero 2001       0.2063969  86.844
## 2706                               Owsley 1999       0.2037120  83.689
## 2707                      The Grease Band 1971       0.2028319 174.960
## 2708                    The Jane Shermans 2008       0.2022170 140.143
## 2709                Juan Carlos Baglietto 1982       0.2013874  97.342
## 2710 Colombiafrica - The Mystic Orchestra 2007       0.1996828 186.121
## 2711                            Xcultures 2000       0.1992383 125.013
## 2712                             Buzzhorn 2002       0.1938578 144.836
##      duration duration.mins
## 2703 377.6257      6.293761
## 2704 515.6567      8.594278
## 2705 228.8061      3.813435
## 2706 179.6436      2.994061
## 2707 254.8502      4.247503
## 2708 235.4934      3.924890
## 2709 232.6199      3.876999
## 2710 318.7979      5.313299
## 2711 273.5277      4.558795
## 2712 239.9865      3.999775
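The top two songs (by B.o.B and Led Zeppelin) are tied at a song hotness of 1.0. If a deterministic tie-break is wanted, a second sort key can be added, for example arrange(simple_music2, desc(song.hotttnesss), title). The same idea in base R with order(), on hypothetical rows:

```r
# Hypothetical rows, including a tie at the top
toy <- data.frame(
  title = c("C", "B", "A", "D"),
  song.hotttnesss = c(1.0, 0.75, 1.0, 0.2),
  stringsAsFactors = FALSE
)

# Sort by hotness (descending), then by title (ascending) to break ties
ranked <- toy[order(-toy$song.hotttnesss, toy$title), ]
head(ranked, 3)  # top three rows
```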

Histogram Visualization: Song Hotness

We can visualize the distribution of song hotness with a histogram. I set the binwidth to 0.01 so the histogram shows finer detail, and I added color to make it easier to read.

#creating and printing the hist for song hotness
hist_hotness <- simple_music2 %>%
  ggplot(aes(song.hotttnesss)) +
  geom_histogram(binwidth = 0.01, fill = "blue", col = "pink", alpha = 0.6) +
  labs(x = "Song Hotness Score", y = "Count")
hist_hotness

It appears to be close to a normal distribution, but has several outliers. If it followed a normal distribution, then I would expect the mean, median, and mode to be approximately equal. We can check the mean and median with the summary() function.

summary(simple_music2$song.hotttnesss)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1939  0.3760  0.5034  0.5021  0.6236  1.0000

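Let’s check the mode! Base R has no built-in function for the statistical mode (its mode() function returns the type of an object, such as "numeric"), so I wrote a small helper.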

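#creating the mode function: returns the most frequent value in v
getmode <- function(v) {
  uniqv <- unique(v)                           # distinct values, in order of first appearance
  uniqv[which.max(tabulate(match(v, uniqv)))]  # most frequent one (ties go to the earliest)
}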
getmode(simple_music2$song.hotttnesss)
## [1] 0.265861

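The mean for song hotness is 0.5021, the median is 0.5034, and the mode is 0.2659. The mode sits well below the mean and median, which suggests that the distribution differs from a normal one. Because song hotness is a continuous variable, though, the most frequently repeated exact value is a fragile estimate of where the distribution peaks, and the nearly equal mean and median are consistent with a roughly symmetric shape, so this is weaker evidence than it first appears.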
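For a continuous variable, a more stable alternative to the exact-value mode is the midpoint of the fullest histogram bin. The sketch below uses base R's hist() without plotting; binned_mode() is a hypothetical helper, and the input values are made up. With the real data, breaks = seq(0, 1, by = 0.01) would mirror the 0.01 binwidth of the histogram, since song hotness lies between 0 and 1.

```r
# Hypothetical helper: midpoint of the most populated histogram bin
binned_mode <- function(x, breaks) {
  h <- hist(x, breaks = breaks, plot = FALSE)  # bin counts only, no plot
  h$mids[which.max(h$counts)]
}

# Hypothetical values: three of them fall in the (0.1, 0.2] bin
x <- c(0.11, 0.12, 0.13, 0.35, 0.52)
binned_mode(x, breaks = seq(0, 1, by = 0.1))  # about 0.15

# With the real data:
# binned_mode(simple_music2$song.hotttnesss, breaks = seq(0, 1, by = 0.01))
```

The result depends on the bin width chosen, so it is best read as a rough location of the peak rather than an exact value.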

Boxplot Visualization: Song Hotness

We can visualize this summary information with a boxplot.

#creating and printing the boxplot for song hotness
boxplot(simple_music2$song.hotttnesss, col = "blue")
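To list the points the boxplot draws as outliers, rather than reading them off the plot, base R's boxplot.stats() returns them in its out element. By default, these are values lying more than 1.5 times the hinge spread beyond the hinges. A sketch on hypothetical values; with the real data the call would be boxplot.stats(simple_music2$song.hotttnesss)$out.

```r
# Hypothetical hotness values with one unusually high score
x <- c(0.40, 0.45, 0.50, 0.50, 0.55, 0.60, 1.00)

bp <- boxplot.stats(x)
bp$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bp$out    # values drawn individually as outliers
```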

Scatterplot Visualization: Effect of Song Duration on Song Hotness

A good way to see how song duration relates to song hotness is a scatterplot.

with(simple_music2, plot(duration.mins, song.hotttnesss, xlab="Duration (mins)", ylab="Song Hotness"))

9. Challenge your solution

Using ggplot2 to Recreate the Scatterplot

The ggplot2 package can give us more polished graphs for looking at how the duration of a song relates to song hotness. I set the color to blue and the opacity to 0.4 so that overlapping points are easier to see.

ggplot(data = simple_music2) +
  geom_point(mapping = aes(x = duration.mins, y = song.hotttnesss), col = "blue", alpha = 0.4) +
  labs(title = "Song Hotness vs Duration",
       x = "Duration (mins)", y = "Song Hotness")

Adding the Tempo Dimension

It looks like shorter songs tend to have higher song hotness rankings, and most songs are shorter than 6 minutes. Now let’s see what happens when we add tempo (in BPM) as a third dimension using the size aesthetic.

ggplot(data = simple_music2) +
  geom_point(mapping = aes(x = duration.mins, y = song.hotttnesss, size = tempo), col = "blue", alpha = 0.4) +
  labs(title = "Song Hotness vs Duration",
       x = "Duration (mins)", y = "Song Hotness")

It appears that many of the songs with higher song hotness rankings have faster tempos. However, the majority of songs in the data have tempos over 100 BPM, so faster tempos may simply be common at every level of hotness.
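Impressions from a scatterplot can be checked with a number. One option is a rank (Spearman) correlation, which measures whether hotness tends to rise or fall with duration or tempo without assuming a straight-line relationship. Alongside it, the share of songs above 100 BPM can be computed directly. A sketch on hypothetical values; with the real data, the columns of simple_music2 would be used instead.

```r
# Hypothetical songs: hotness, duration (mins) and tempo (BPM)
toy <- data.frame(
  song.hotttnesss = c(0.90, 0.70, 0.50, 0.30),
  duration.mins   = c(2.5, 3.0, 4.0, 5.5),
  tempo           = c(150, 130, 110, 90)
)

# Spearman correlation: -1 = perfectly decreasing, +1 = perfectly increasing
rho_duration <- cor(toy$duration.mins, toy$song.hotttnesss, method = "spearman")
rho_tempo    <- cor(toy$tempo, toy$song.hotttnesss, method = "spearman")
c(duration = rho_duration, tempo = rho_tempo)

# Share of songs faster than 100 BPM
mean(toy$tempo > 100)
```

A correlation close to 0 on the real data would suggest the visual impression is weaker than it looks.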

Scatterplot Visualization: Song Duration Over Time

Another interesting question is how song duration changed over time. I set the points to red and the opacity to 0.4 so they contrast with the smoothed trend line created by the geom_smooth() function.

ggplot(data = simple_music2, mapping = aes(x = year, y = duration.mins)) +
  geom_point(col = "red", alpha = 0.4) +
  geom_smooth() +
  labs(title = "Song Duration Over Time",
       x = "Year", y = "Duration (mins)")
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

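The smoothed trend suggests that song duration has stayed relatively stable over time, and later years contribute many more songs to the data frame. This looks somewhat at odds with the 1960-versus-2010 comparison in the validation step, but that comparison rested on only 4 songs from 1960, so the trend line, which draws on every year, may be the better guide.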
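Since geom_smooth() fits a flexible curve (a GAM here, per the message above), a single straight-line slope can be a useful complement for judging whether duration is trending up or down overall. A sketch using lm() on hypothetical values; with the real data the call would be lm(duration.mins ~ year, data = simple_music2).

```r
# Hypothetical yearly durations (mins) rising by roughly 0.5 per year
toy <- data.frame(
  year = c(2000, 2001, 2002, 2003),
  duration.mins = c(3.0, 3.6, 3.9, 4.5)
)

fit <- lm(duration.mins ~ year, data = toy)
coef(fit)["year"]  # estimated change in duration (mins) per year: 0.48
```

A slope near 0 would support the "relatively stable" reading; its uncertainty can be inspected with summary(fit).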

10. Follow up

Another way to look at this data is to split the plot into facets, one per year. This shows the relationship between song duration and song hotness within each year, with tempo again represented by point size. Pretty neat, although a little hard to read.

ggplot(data = simple_music2) +
  geom_point(mapping = aes(x = duration.mins, y = song.hotttnesss, size = tempo), col = "blue", alpha = 0.4) +
  facet_wrap(~ year, ncol = 10)

We can show the same year-by-year view as an animation, using gganimate to step through the years.

#static plot
p <- ggplot(simple_music2) +
  geom_point(mapping = aes(x = duration.mins, y = song.hotttnesss, size = tempo), col = "blue", alpha = 0.4) +
  labs(x = "Duration (mins)", y = "Song Hotness")
p

#transition plot
p + transition_time(year) +
  labs(title = "Year: {frame_time}")