A Data Analysis and Visualization of Music Chart History

STAT 545A - Homework 6
Christian Birch Okkels
October 21, 2013

Introduction

Here, we analyze and visualize music chart history. The dataset comes from the so-called Whitburn Project, a huge undertaking by music enthusiasts to preserve and share high-quality recordings of popular songs since the 1890s. The project has spawned a vast spreadsheet with data on almost 40,000 songs (as of 2013) that have been hits on the Billboard Chart since 1890. The dataset contains more than a hundred columns of raw data, including song lengths, artists, songwriters, albums, labels, peak positions, number of weeks on the charts, and even week-by-week chart positions. This presents a ton of options for data analysis and visualization; given the relatively short time allotted for this assignment, we will only scratch the surface of the possibilities. Although various versions of the dataset exist, none of them are easy to come by. However, we were lucky to dig one up from the mound of information that is the Internet. Of course, the dataset comes with plenty of shortcomings, mostly missing data for certain variables. As a result, it needs cleaning and preparation before we can go toe-to-toe with the fun stuff. This cleaning procedure is described in detail later. For now, let us have a look at some of what the dataset has to offer.

Description of Dataset

As mentioned above, the dataset contains countless columns of raw data for each of the many observations. Some of the more interesting columns (for our case) are described in the table below. As some of the original variable names can be difficult to decipher, the variable names below are those specified in the cleaned version of the data.

Variable name           Description
Year                    Year in which the single first hit its highest weekly position.
Yearly.Rank             Yearly rankings. Formula: Highest position / Number of weeks at highest / Number of weeks in Top 10/40/100.
Prefix                  Year and Rank combined for sorting purposes.
nWeeksChart             Number of weeks the single charted.
nWeeksChartTop40        Number of weeks the single charted in the Top 40 (position 40 or better).
nWeeksChartTop10        Number of weeks the single charted in the Top 10 (position 10 or better).
nWeeksChartPeak         Number of weeks the single charted at its highest position.
High                    The peak position of the single.
Artist                  The artist.
Artist.Inverted         The artist (Last name, First name).
Featured                Featured artist(s).
UnFeatured              Additional artists not listed as featured.
Album                   Title of the album that the single originally came from.
Track                   Title of the song.
Time                    Length of the song.
Artist.ID               A number ID to distinguish artists.
Label.Number            Name of the label.
Genre                   Type of music.
Written.By              The writers of the song.
ScorePoints             Scoring system where points are given for every week on the charts, and by how high it charted each week; 100 points for no. 1, 99 for no. 2, etc.
Date.Entered            The month/day/year the song first hit the charts.
Date.Peaked             The month/day/year the song first hit its highest peak position.
X1st.Week - X66th.Week  Chart ranking history: “1st Week” is the ranking of the song upon entering the chart; “2nd Week” is the ranking position of the song in the second week (if it's still on the chart); etc.

Data Cleaning and Preparation

A separate R script, data_cleanPrepare.R, performs the initial data cleaning and preparation.

First, the script loads some necessary R packages: lattice, plyr, and xtable:

library(lattice)
library(plyr)
library(xtable)

It then proceeds to load, or source, two functions from separate scripts:

source("func_timeToSec.R")
source("func_htmlPrint.R")

The first script, func_timeToSec.R, contains a function that converts time formats of “hh:mm:ss” into seconds. This conversion is critical, since we need numeric values to compute summary statistics such as means, minima, and maxima. The second script, func_htmlPrint.R, holds a function to print data.frames as HTML tables.
The data cleaning script then loads the raw data from the text file charts.txt:

charts_orig = read.delim("charts.txt")
# str(charts_orig) # basic sanity check.

The second line performs a basic sanity check. It has been commented out since the output takes up a lot of space.
Next, we start to cut to the bone by keeping only the most interesting columns:

charts <- subset(charts_orig, select = c(Year, Yearly.Rank, Prefix, CH, X40, 
    X10, PK, High, Artist, Featured, Album, Track, Time, Genre, Temp.1, Date.Entered, 
    Date.Peaked))

All of the many other columns are thereby excluded in the new data.frame charts. Now, as seen in the code above, some variables have rather odd names; e.g. CH, X40, etc. We therefore rename them to something more meaningful:

charts <- rename(charts, c(CH = "nWeeksChart", X40 = "nWeeksChartTop40", X10 = "nWeeksChartTop10", 
    PK = "nWeeksChartPeak", Temp.1 = "ScorePoints"))

Many observations are missing data for certain variables. This is particularly so for older songs from the 1940s and earlier. Therefore, we keep only the data for the years 1950-2013:

charts <- subset(charts, Year > 1949)

Now, the current column Time contains song lengths in the format “mm:ss”. For many purposes, we would like to work with a more manageable format. We thus create a new column, Time.num, which contains the song length in seconds, and then add it to our data.frame. This is where we use the aforementioned function in the script func_timeToSec.R.

charts$Time.num <- sapply(charts$Time, func_timeToSec)
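
The script func_timeToSec.R itself is not reproduced in this report. As a rough illustration only, a conversion of this kind could look like the sketch below. The name timeToSecSketch and its handling of blanks are assumptions made for this example, not the actual contents of func_timeToSec.R.

```r
# Hypothetical stand-in for func_timeToSec (the original is not shown here).
# Accepts "mm:ss" or "hh:mm:ss" given as a character string or a factor.
timeToSecSketch <- function(time) {
    time <- as.character(time)  # a factor would otherwise yield level codes
    if (is.na(time) || time == "") {
        return(NA_integer_)  # assumption: blank entries become NA
    }
    parts <- as.integer(strsplit(time, ":", fixed = TRUE)[[1]])
    # Weight the fields from the right: seconds, minutes, hours.
    weights <- 60^(rev(seq_along(parts)) - 1)
    as.integer(sum(parts * weights))
}

timeToSecSketch("4:26")     # 4 * 60 + 26 = 266
timeToSecSketch("1:02:03")  # 3600 + 2 * 60 + 3 = 3723
```

Converting to character first matters because read.delim() may store the Time column as a factor, in which case arithmetic on the raw values would operate on level codes rather than on the times.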

As mentioned, some observations are missing data for certain variables, even after we cut away the older data. The main variable that we consider here is the song length. Therefore, we find all blank entries in the Time column, replace them with NA, and then eliminate all observations where Time is NA:

is.na(charts$Time) <- which(charts$Time == "")
charts <- subset(charts, !is.na(charts$Time))

The nWeeksChart column has also presented plenty of problems. It contains both blank elements and strange n/a (not even “NA”). Moreover, we would like it to be numeric. These cleaning procedures took a long time to figure out for this particular case, but we finally found the solution:

# Remove all observations with blanks or "n/a" entries in the nWeeksChart column.
charts <- subset(charts, nWeeksChart != "")
charts <- subset(charts, nWeeksChart != "n/a")
charts$nWeeksChart <- factor(charts$nWeeksChart)  # update factor levels.
# Convert nWeeksChart from Factor to Numeric:
charts$nWeeksChart <- as.numeric(levels(charts$nWeeksChart))[as.integer(charts$nWeeksChart)]
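
The reason for the detour through levels() is that calling as.numeric() directly on a factor returns the internal level codes, not the values the labels spell out. A small sketch with made-up values (the vector f below is purely illustrative) shows the difference:

```r
# Toy factor; the values are made up for illustration.
f <- factor(c("48", "7", "24"))

codes     <- as.numeric(f)                         # internal level codes
viaLevels <- as.numeric(levels(f))[as.integer(f)]  # idiom used above
viaChar   <- as.numeric(as.character(f))           # equivalent, simpler to read

print(codes)      # not the original values
print(viaLevels)  # 48 7 24
print(viaChar)    # 48 7 24
```

Both working variants give the same result; the levels() form converts each distinct label only once, which can be slightly faster on long factors with few levels.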

Finally, we write the cleaned data to the file charts_clean.tsv:

write.table(charts, "charts_clean.tsv", quote = FALSE, sep = "\t", row.names = FALSE)
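
As a quick check of this write/read pattern, the following sketch writes a tiny data.frame to a temporary file and reads it back. The object names toy, tmpFile, and back are assumptions for this example; the two rows are taken from the tables later in this report.

```r
# Round trip through a tab-separated file, mirroring the call above.
toy <- data.frame(Year = c(1964L, 2013L),
    Track = c("Little Boxes", "Cups"),
    Time.num = c(62L, 76L),
    stringsAsFactors = FALSE)

tmpFile <- tempfile(fileext = ".tsv")
write.table(toy, tmpFile, quote = FALSE, sep = "\t", row.names = FALSE)
back <- read.delim(tmpFile, stringsAsFactors = FALSE)

# Compare column by column.
sameTrack <- identical(back$Track, toy$Track)
sameTime  <- identical(back$Time.num, toy$Time.num)
unlink(tmpFile)
```

Note that quote = FALSE is only safe as long as no field contains a tab or a line break; a title containing a double quote could also be misread by read.delim(), whose default quote character is the double quote.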

Data Aggregation and Plotting

This part is the central one; here, we perform a variety of data aggregation and plotting tasks and write the data tables and figures to file. The R script to perform all of this is data_aggregatePlot.R. It starts out by loading the necessary libraries and sourcing the two functions. Moreover, it reads the cleaned data saved by the script described above.

charts <- read.delim("charts_clean.tsv")
str(charts)  # basic sanity check.
## 'data.frame':    28142 obs. of  18 variables:
##  $ Year            : int  2013 2013 2013 2013 2013 2013 2013 2013 2013 2013 ...
##  $ Yearly.Rank     : Factor w/ 890 levels "1","10","100",..: 127 206 230 244 271 295 314 330 339 352 ...
##  $ Prefix          : Factor w/ 28142 levels "1950_001","1950_002",..: 27812 27813 27814 27815 27816 27817 27818 27819 27820 27821 ...
##  $ nWeeksChart     : int  48 40 47 24 33 55 25 49 22 24 ...
##  $ nWeeksChartTop40: int  17 33 29 19 17 35 15 41 6 18 ...
##  $ nWeeksChartTop10: int  0 13 0 10 0 20 0 21 0 1 ...
##  $ nWeeksChartPeak : int  1 1 2 2 1 4 1 6 1 1 ...
##  $ High            : Factor w/ 103 levels "","--","0","1",..: 17 60 11 49 12 27 22 4 26 93 ...
##  $ Artist          : Factor w/ 7290 levels "'Til Tuesday",..: 5075 5357 3116 3761 2108 3116 6033 4397 4160 5148 ...
##  $ Featured        : Factor w/ 1856 levels ""," Feat. Master P, Destiny's Child, O'Dell, Mo B. Dick",..: 1 1 1 1088 1 1 1417 189 647 1 ...
##  $ Album           : Factor w/ 6668 levels "","#1's","#1s\311and Then Some",..: 3723 6656 1224 675 17 1224 6656 6656 6656 6116 ...
##  $ Track           : Factor w/ 22807 levels "#1","#1 Dee Jay",..: 11329 7476 10001 1905 18377 14935 6755 19766 13542 20213 ...
##  $ Time            : Factor w/ 357 levels "1:02","1:16",..: 185 128 158 146 177 106 121 154 141 166 ...
##  $ Genre           : Factor w/ 39 levels "","Adult Contemporary",..: 1 32 1 1 1 1 1 1 1 1 ...
##  $ ScorePoints     : Factor w/ 2686 levels "","#REF!","0",..: 1346 1688 1555 911 935 1855 475 1918 2681 810 ...
##  $ Date.Entered    : Factor w/ 3285 levels "02/20/2010","04/21/2012",..: 1755 2448 2322 2709 2637 2818 3268 3069 3141 513 ...
##  $ Date.Peaked     : Factor w/ 3284 levels "","1/1/00","1/1/05",..: 1155 91 162 232 27 2698 1232 1197 27 1155 ...
##  $ Time.num        : int  266 209 239 227 258 187 202 235 222 247 ...

The output above gives us a good overview of the cleaned, prepared data. For instance, we see the new Time.num column; it has integer elements which are exactly the song lengths in seconds (this can be confirmed by comparing the song lengths to those in the “mm:ss” format in the Time column). The nWeeksChart column, which was earlier a factor, has also successfully been converted; it was converted to numeric, but since it only contained integers, it is now saved as such.

In the following, we consider an array of different data aggregation and plotting tasks.

Distribution of Song Lengths

We start with the simple task of visualizing the distribution of song durations.

histogram(~Time.num, charts, nint = 50, col = "blue", main = "Song length distribution", 
    xlab = "Song length (seconds)")

[Figure: Song length distribution]

We see two peaks at roughly 150 and 225 seconds, corresponding to 2:30 and 3:45 minutes, respectively. It is interesting to see that the left dropoff is significantly steeper than the one on the right; apparently, for these more unusual song lengths, it is more common for songs to be longer.
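
The conversion from seconds back to the “mm:ss” format can be checked with integer division and remainder. The helper name secToMinSec below is an assumption for this sketch and is not part of the original scripts.

```r
# Hypothetical helper: format a duration in seconds as "m:ss".
secToMinSec <- function(seconds) {
    s <- as.integer(floor(seconds))  # drop fractional seconds
    sprintf("%d:%02d", s %/% 60L, s %% 60L)
}

secToMinSec(150)  # "2:30"
secToMinSec(225)  # "3:45"
```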

Song Length vs. Year

In this part we consider how song lengths have evolved through time. This is done by making a scatterplot of duration against year. In order to avoid overplotting, we specify the alpha argument. Moreover, a line is drawn through the average song length for each year.

startYear <- min(charts$Year) # starting year (used for plotting x-range).
endYear <- max(charts$Year)   # final year (used for plotting x-range).
songLengthVsYear_xyPlot <- xyplot(Time.num ~ Year, charts, 
       main = "Song length vs. year", xlab = "Year", ylab = "Song length (seconds)", grid = TRUE,
       alpha = 0.5, # combat overplotting via alpha.
       type = c("p", "a"), col.line = "darkorange", lwd = 3,  # draw averages.
       scales = list(y = list(at = seq(0, 600, 30)),
                     x = list(at = seq(startYear, endYear, 5))))
print(songLengthVsYear_xyPlot)

[Figure: Song length vs. year (scatterplot with yearly average line)]

This shows something very interesting: having increased through the 1960s and up to the beginning of the 1990s, the average song length peaked just before 1995 and has slowly decreased since then. (A similar plot can be made using stripplot, as exemplified in the data aggregation script.)
Now, we can also combat overplotting via smoothScatter in the panel argument, as done below:

songLengthVsYear_xyPlot2 <- xyplot(Time.num ~ Year, charts, main = "Song length vs. year", 
    xlab = "Year", ylab = "Song length (seconds)", grid = TRUE, scales = list(y = list(at = seq(0, 
        600, 30)), x = list(at = seq(startYear, endYear, 5))), panel = panel.smoothScatter)  # combat overplotting via smoothScatter.
print(songLengthVsYear_xyPlot2)
## KernSmooth 2.23 loaded Copyright M. P. Wand 1997-2009 (loaded the
## KernSmooth namespace)

[Figure: Song length vs. year (smoothScatter)]

This gives a nice view of where the majority of the data points are located. From the blue brush-like stroke, we see the same behaviour as described above.

Song Length vs. Decade

This part is much like the previous, except we consider decades instead of years. This is mainly used as an exercise to learn more about R. In this case, considering decades instead of years can lead to some very interesting possibilities; for example, it allows us to more easily condition on the decade variable as a factor (conditioning on the year variable would simply be too much, as there are too many levels), whereby we can obtain a lot of interesting plots.
First, we use cut() to make a new factor called Decade. The factor levels are renamed directly in the function call. Then we use the decades in the plot:

Decade <- cut(charts$Year, 6, labels = c("1950s", "1960s", "1970s", "1980s", 
    "1990s", "2000s"))
songLengthVsDecade_stripplot <- stripplot(Time.num ~ Decade, charts, grid = TRUE, 
    main = "Song length vs. decade", xlab = "Decade", ylab = "Song length (seconds)", 
    jitter.data = TRUE, panel = panel.smoothScatter, scales = list(y = list(at = seq(0, 
        600, 30), rot = c(0, 0), cex = 0.8), x = list(rot = c(0, 0), cex = 0.8)))
print(songLengthVsDecade_stripplot)

[Figure: Song length vs. decade]

The same behaviour described above is also hinted at here. Another interesting thing is the data for the 1960s; the big, blue, faded dot appears darker than those for the other decades. This means that more points are gathered there. Consequently, it was more common in the 1960s for songs to be of roughly similar durations. Later on, especially in the past two decades, there seems to be a larger spread.
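
One caveat about the binning: cut() with a bin count of 6 splits the range 1950-2013 into six equal-width intervals of roughly 10.5 years each, so the labels only approximate calendar decades (for example, 1960 lands in the bin labelled “1950s”, and 2010-2013 land in “2000s”). The sketch below contrasts this with a calendar-decade label computed by rounding the year down to a multiple of ten. The vector years and the variable names are assumptions for this illustration, not part of the original script.

```r
# A few boundary years spanning the same range as the data (1950-2013).
years <- c(1950, 1960, 1970, 2010, 2013)
labels6 <- c("1950s", "1960s", "1970s", "1980s", "1990s", "2000s")

# Equal-width bins, as in the cut() call above.
equalWidth <- as.character(cut(years, 6, labels = labels6))

# Calendar decades: round the year down to a multiple of 10.
calendar <- paste0(floor(years / 10) * 10, "s")

print(data.frame(Year = years, equalWidth, calendar))
```

Using calendar decades would add a seventh level, “2010s”, and would shift some points between neighbouring groups, so the plot above would change slightly.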

Shortest and Longest Songs Each Year

Let us find the shortest and longest songs for each year. We want not only the minimum and maximum song lengths, but also some info about these songs, e.g. song title, artist, etc. This data aggregation is performed via the plyr library. In the ddply() call we supply our own customized function:

minSongLengthInfoEachYear <- ddply(charts, ~Year, function(x) {
    theMin <- which.min(x$Time.num)
    shortestSongInfo <- x[theMin, c("Year", "Track", "Artist", "Time.num", "Time")]
    shortestSongInfo <- rename(shortestSongInfo, c(Time.num = "TimeInSeconds"))
})
maxSongLengthInfoEachYear <- ddply(charts, ~Year, function(x) {
    theMax <- which.max(x$Time.num)
    longestSongInfo <- x[theMax, c("Year", "Track", "Artist", "Time.num", "Time")]
    longestSongInfo <- rename(longestSongInfo, c(Time.num = "TimeInSeconds"))
})
write.table(minSongLengthInfoEachYear, "table_minSongLengthInfoEachYear.txt", 
    quote = FALSE, sep = "\t", row.names = FALSE)  # write to file.
write.table(maxSongLengthInfoEachYear, "table_maxSongLengthInfoEachYear.txt", 
    quote = FALSE, sep = "\t", row.names = FALSE)  # write to file.

The above code only defines the data.frames and writes them to file.
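
For readers without plyr, the same split-apply-combine idea can be sketched in base R. This is an alternative formulation, not the code used in the script, and the toy data below are made up.

```r
# Made-up toy data: two songs in each of two years.
toy <- data.frame(Year = c(1964, 1964, 2013, 2013),
    Track = c("Song A", "Song B", "Song C", "Song D"),
    Time.num = c(150, 62, 76, 240),
    stringsAsFactors = FALSE)

# Split by year, keep the row with the smallest Time.num, and recombine.
shortestPerYear <- do.call(rbind, lapply(split(toy, toy$Year), function(x) {
    x[which.min(x$Time.num), , drop = FALSE]
}))
rownames(shortestPerYear) <- NULL
print(shortestPerYear)
```

As in the ddply() version, which.min() returns only the first minimum, so a tie within a year yields a single row.
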
NOTE: For the sake of exercise, let us try to read in data from these files and then show it in HTML tables. The tables are rather long, so we will just do it for the minimum song lengths. (To see the longest songs and their info, uncomment the second line.)

htmlPrint(read.delim("table_minSongLengthInfoEachYear.txt"))
Year Track Artist TimeInSeconds Time
1950 A Bushel and a Peck Margaret Whiting & Jimmy Wakely 120 2:00
1951 Jingle Bells Les Paul 94 1:34
1952 Meet Mister Callaghan Les Paul 107 1:47
1953 The Typewriter Leroy Anderson & His Pops Concert Orchestra 95 1:35
1954 Oh, That’ll be Joyful Four Lads 155 2:35
1955 Ballad Of Davy Crockett Fess Parker 99 1:39
1956 Dear Elvis (Page 1) Audrey 93 1:33
1957 Santa And The Satellite (Part I) Buchanan & Goodman 83 1:23
1958 Bluebirds Over The Mountain Ersel Hickey 86 1:26
1959 Some Kind-A Earthquake Duane Eddy 77 1:17
1960 What Do You Want? Bobby Vee 94 1:34
1961 Let’s Get Together Hayley Mills 88 1:28
1962 Sugar Blues Ace Cannon 90 1:30
1963 Ten Little Indians Beach Boys, The 85 1:25
1964 Little Boxes Womenfolk, The 62 1:02
1965 Sunshine, Lollipops And Rainbows Lesley Gore 97 1:37
1966 Please Tell Me Why Dave Clark Five 90 1:30
1967 Long Legged Girl (With The Short Dress On) Elvis Presley 86 1:26
1968 Tip-Toe Thru The Tulips With Me Tiny Tim 108 1:48
1969 She’s A Lady John Sebastian 105 1:45
1970 Theme Music for The film 2001 A Space Odyssey Berlin Philharmonic 98 1:38
1971 Rags To Riches Elvis Presley 114 1:54
1972 Those Were The Days Carroll O'Connor & Jean Stapleton (As The Bunkers) 87 1:27
1973 Dueling Tubas (Theme From Belligerence) Martin Mull 86 1:26
1974 Energy Crisis ‘74 Dickie Goodman 120 2:00
1975 Sneaky Snake Tom T. Hall 117 1:57
1976 Hurt Elvis Presley 125 2:05
1977 Telephone Man Meri Wilson 118 1:58
1978 Do You Wanna Dance Ramones 115 1:55
1979 Good Timin’ Beach Boys, The 130 2:10
1980 Theme From The Dukes Of Hazzard (Good Ol' Boys) Waylon 126 2:06
1981 Almost Saturday Night Dave Edmunds 131 2:11
1982 Come Go With Me Beach Boys, The 126 2:06
1983 Holiday Road Lindsey Buckingham 131 2:11
1984 Sunshine In The Shade Fixx, The 146 2:26
1985 Miami Vice Theme Jan Hammer 146 2:26
1986 In Between Days Cure, The 136 2:16
1987 Come On, Let’s Go Los Lobos 129 2:09
1988 Hippy Hippy Shake Georgia Satellites, The 105 1:45
1989 Pop Singer John Cougar Mellencamp 165 2:45
1990 Drag My Bad Name Down 4 Of Us, The 170 2:50
1991 The Star Spangled Banner Whitney Houston 129 2:09
1992 All Shook Up Billy Joel 125 2:05
1993 Chattahoochee Alan Jackson 144 2:24
1994 Bizarre Love Triangle Frente! 117 1:57
1995 Roll To Me Del Amitri 127 2:07
1996 Esa Nena Linda Artie The 1 Man Party 156 2:36
1997 Little Bitty Alan Jackson 155 2:35
1998 We Shouldn’t Really Be Doing This George Strait 149 2:29
1999 Crazy Little Thing Called Love Dwight Yoakam 142 2:22
2000 www.memory Alan Jackson 156 2:36
2001 The Star Spangled Banner Whitney Houston 129 2:09
2002 Some Days You Gotta Dance Dixie Chicks, The 150 2:30
2003 Faint Linkin Park 162 2:42
2004 Drinkin' Bone Tracy Byrd 129 2:09
2005 Naked Marques Houston 130 2:10
2006 What I’ve Been Looking For (Reprise) Zac Efron 79 1:19
2007 Not Fade Away Sheryl Crow 125 2:05
2008 Anyone Else But You Michael Cera & Ellen Page 116 1:56
2009 It’s My Life / Confessions Part II Glee Cast 111 1:51
2010 Sing! Glee Cast 111 1:51
2011 Isn’t She Lovely Glee Cast 98 1:38
2012 Yesterday Adam Levine 131 2:11
2013 Cups Anna Kendrick 76 1:16
# htmlPrint(read.delim('table_maxSongLengthInfoEachYear.txt'))

We can easily find the shortest and longest songs for ALL the years considered:

htmlPrint(minSongLengthInfoEachYear[which.min(minSongLengthInfoEachYear$TimeInSeconds), 
    c("Year", "Track", "Artist", "TimeInSeconds", "Time")])
Year Track Artist TimeInSeconds Time
1964 Little Boxes Womenfolk, The 62 1:02
htmlPrint(maxSongLengthInfoEachYear[which.max(maxSongLengthInfoEachYear$TimeInSeconds), 
    c("Year", "Track", "Artist", "TimeInSeconds", "Time")])
Year Track Artist TimeInSeconds Time
1976 A Better Place To Be (Live) (Parts 1 & 2) Harry Chapin 570 9:30

On a related note, we can also find the average song length for ALL years:

sprintf("Average song length for years %d-%d = %4.2f seconds.", min(charts$Year), 
    max(charts$Year), mean(charts$Time.num, na.rm = TRUE))
## [1] "Average song length for years 1950-2013 = 200.79 seconds."

Average, Minimum, and Maximum Song Length Each Year

Above, we looked at tables and song info for the shortest and longest songs. We now take a step forward and try to visualize the minimum and maximum song lengths through time. We also include the average. In the function in the ddply() call below, these statistics are included as the levels of a factor in the resulting data.frame:

songLengthStatsEachYear2 <- ddply(charts, ~Year, function(x) {
    cLevels <- c("min", "max", "avg")
    data.frame(stat = factor(cLevels, levels = cLevels), songLength = c(range(x$Time.num, 
        na.rm = TRUE), mean(x$Time.num, na.rm = TRUE)))
})
write.table(songLengthStatsEachYear2, "table_songLengthStatsEachYear2.txt", 
    quote = FALSE, sep = "\t", row.names = FALSE)

NOTE: Above, the data.frame is only saved and written to file, because we once again want to toy with reading it back in and showing it:

htmlPrint(read.delim("table_songLengthStatsEachYear2.txt"))
Year stat songLength
1950 min 120
1950 max 213
1950 avg 170
1951 min 94
1951 max 246
1951 avg 167
1952 min 107
1952 max 367
1952 avg 165
1953 min 95
1953 max 405
1953 avg 165
1954 min 155
1954 max 155
1954 avg 155
1955 min 99
1955 max 370
1955 avg 156
1956 min 93
1956 max 335
1956 avg 153
1957 min 83
1957 max 220
1957 avg 147
1958 min 86
1958 max 410
1958 avg 143
1959 min 77
1959 max 249
1959 avg 143
1960 min 94
1960 max 282
1960 avg 148
1961 min 88
1961 max 495
1961 avg 148
1962 min 90
1962 max 354
1962 avg 150
1963 min 85
1963 max 271
1963 avg 149
1964 min 62
1964 max 219
1964 avg 149
1965 min 97
1965 max 360
1965 avg 155
1966 min 90
1966 max 333
1966 avg 157
1967 min 86
1967 max 298
1967 avg 161
1968 min 108
1968 max 440
1968 avg 170
1969 min 105
1969 max 444
1969 avg 180
1970 min 98
1970 max 413
1970 avg 186
1971 min 114
1971 max 410
1971 avg 186
1972 min 87
1972 max 516
1972 avg 200
1973 min 86
1973 max 391
1973 avg 201
1974 min 120
1974 max 390
1974 avg 199
1975 min 117
1975 max 444
1975 avg 200
1976 min 125
1976 max 570
1976 avg 207
1977 min 118
1977 max 391
1977 avg 208
1978 min 115
1978 max 475
1978 avg 213
1979 min 130
1979 max 437
1979 avg 218
1980 min 126
1980 max 396
1980 avg 217
1981 min 131
1981 max 393
1981 avg 219
1982 min 126
1982 max 347
1982 avg 217
1983 min 131
1983 max 367
1983 avg 227
1984 min 146
1984 max 372
1984 avg 233
1985 min 146
1985 max 382
1985 avg 236
1986 min 136
1986 max 375
1986 avg 240
1987 min 129
1987 max 352
1987 avg 238
1988 min 105
1988 max 361
1988 avg 241
1989 min 165
1989 max 444
1989 avg 245
1990 min 170
1990 max 400
1990 avg 248
1991 min 129
1991 max 489
1991 avg 248
1992 min 125
1992 max 536
1992 avg 256
1993 min 144
1993 max 423
1993 avg 254
1994 min 117
1994 max 392
1994 avg 245
1995 min 127
1995 max 455
1995 avg 250
1996 min 156
1996 max 410
1996 avg 250
1997 min 155
1997 max 443
1997 avg 248
1998 min 149
1998 max 392
1998 avg 240
1999 min 142
1999 max 429
1999 avg 235
2000 min 156
2000 max 470
2000 avg 242
2001 min 129
2001 max 415
2001 avg 238
2002 min 150
2002 max 370
2002 avg 239
2003 min 162
2003 max 468
2003 avg 241
2004 min 129
2004 max 394
2004 avg 235
2005 min 130
2005 max 349
2005 avg 232
2006 min 79
2006 max 380
2006 avg 227
2007 min 125
2007 max 448
2007 avg 231
2008 min 116
2008 max 515
2008 avg 236
2009 min 111
2009 max 392
2009 avg 229
2010 min 111
2010 max 416
2010 avg 231
2011 min 98
2011 max 436
2011 avg 227
2012 min 131
2012 max 502
2012 avg 231
2013 min 76
2013 max 484
2013 avg 232

The data in this data.frame can be used for plotting:

minMaxAvgSongLengthVsYear <- xyplot(songLength ~ Year, songLengthStatsEachYear2, 
    main = "min, max, and average song length vs. year", ylab = "Song length (seconds)", 
    group = stat, type = "b", grid = "h", as.table = TRUE, auto.key = list(columns = 3))
# print(minMaxAvgSongLengthVsYear)
png("plot_minMaxAvgSongLengthVsYear.png")
print(minMaxAvgSongLengthVsYear)
dev.off()
## pdf 
##   2

It is on purpose that we don't immediately print the plot in the above code chunk; we just save it to file.
NOTE: We now want to embed the pre-made plot in this document:
[Figure: min., max., and avg. song length vs. year]

From the green graph we see that the average song length increased from the 1960s to the beginning of the 1990s, and has decreased slightly since then. This matches what we discussed in some of the earlier sections. Moreover, it is interesting to notice how the minimum song length appears to vary much less from year to year than the maximum song length. (It should be noted that the point in 1954, where the max, min, and avg all equal 155 seconds, is likely the only data point for that year and should thus be taken with a grain of salt, or more likely just be disregarded entirely.)

Number of Songs Per Year

We now consider how the number of songs listed on the charts has evolved from year to year. We skip the years 1955 and earlier, since these contain so few data points and so many missing values that they would distort the song counts.

nSongsEachYear <- ddply(subset(charts, Year > 1955), ~Year, summarize, nSongs = length(Prefix))
# the variable Prefix is unique for each song, which is optimal for this
# case.
nSongsVsYear <- xyplot(nSongs ~ Year, nSongsEachYear, main = "No. of songs on the charts vs. year", 
    ylab = "No. of songs", type = "b", grid = "h")
# print(nSongsVsYear)
png("plot_nSongsVsYear.png")
print(nSongsVsYear)
dev.off()
## pdf 
##   2

NOTE: Again, we only define the plot and print it to file; we want to bring it back in just like before by embedding the pre-made plot in this document:
[Figure: no. of songs on charts vs. year]

This plot shows something very interesting, telling us a lot about the diversity of songs on the chart through time. We see a clear peak in the mid-to-late 1960s; more than 700 songs were on the chart in each year from 1964 to 1967. From the 1970s and all the way to the beginning of the 2000s, the number of songs decreased overall; this indicates less diversity on the charts in this period, i.e. that the same songs tended to dominate. From about 2004, the number of songs increases suddenly, peaking at about 500 in 2011. The sudden drop after this could be real, or it could be attributed to the dataset not having been updated with all the new songs for the most recent years of 2012 and 2013.

Proportion of Songs with a Duration Longer than the Total Average Duration

Here, we consider the number and the proportion of songs that have a duration longer than a certain threshold. This threshold is set to be the average song length for the entire time period, but it can be changed at will. The resulting data.frame is shown in the table below.

threshold <- mean(charts$Time.num, na.rm = TRUE)
nSongsLongerThanAvgEachYear <- ddply(subset(charts, Year > 1955), ~Year, function(x) {
    count <- sum(x$Time.num >= threshold, na.rm = TRUE)
    total <- nrow(x)
    prop <- count/total
    data.frame(Count = count, Total = total, Proportion = prop)
})
htmlPrint(nSongsLongerThanAvgEachYear, digits = 2)
Year Count Total Proportion
1956 11 505 0.02
1957 6 496 0.01
1958 4 530 0.01
1959 5 576 0.01
1960 15 602 0.02
1961 12 681 0.02
1962 14 676 0.02
1963 13 658 0.02
1964 5 718 0.01
1965 27 717 0.04
1966 20 743 0.03
1967 45 739 0.06
1968 83 686 0.12
1969 138 672 0.21
1970 161 653 0.25
1971 161 635 0.25
1972 236 591 0.40
1973 237 536 0.44
1974 223 496 0.45
1975 250 568 0.44
1976 277 534 0.52
1977 280 473 0.59
1978 292 453 0.64
1979 346 476 0.73
1980 343 474 0.72
1981 292 408 0.72
1982 310 424 0.73
1983 373 452 0.83
1984 391 435 0.90
1985 370 405 0.91
1986 374 397 0.94
1987 359 398 0.90
1988 355 387 0.92
1989 365 392 0.93
1990 359 376 0.95
1991 360 385 0.94
1992 342 371 0.92
1993 321 349 0.92
1994 306 345 0.89
1995 333 357 0.93
1996 302 324 0.93
1997 320 341 0.94
1998 312 346 0.90
1999 262 315 0.83
2000 281 317 0.89
2001 261 301 0.87
2002 269 295 0.91
2003 281 312 0.90
2004 262 306 0.86
2005 297 342 0.87
2006 295 363 0.81
2007 288 349 0.83
2008 329 396 0.83
2009 339 436 0.78
2010 369 483 0.76
2011 380 497 0.76
2012 303 374 0.81
2013 259 331 0.78
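
The Proportion column is simply Count divided by Total; for 1956, for example, 11/505 is about 0.02. A compact base-R sketch of the same idea, on made-up toy data, uses the fact that the mean of a logical vector is the proportion of TRUE values. The names toy, toyThreshold, and propLonger are assumptions for this illustration.

```r
# Made-up toy data: three songs in 1960 and two in 1990.
toy <- data.frame(Year = c(1960, 1960, 1960, 1990, 1990),
    Time.num = c(140, 150, 210, 250, 190))
toyThreshold <- 200  # illustrative threshold in seconds

# Proportion of songs per year at or above the threshold.
propLonger <- tapply(toy$Time.num >= toyThreshold, toy$Year, mean, na.rm = TRUE)
print(propLonger)
```

One difference from the ddply() version above: with na.rm = TRUE, songs with a missing length are dropped from the denominator, whereas nrow(x) counts them. On data without missing lengths the two agree.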

We then plot the proportion of songs vs. year:

propSongsLongerThanAvgVsYear <- xyplot(Proportion ~ Year, nSongsLongerThanAvgEachYear, 
    main = paste0("Proportion of songs with length >= ", round(threshold), 
        " s (= avg. over all years)"),  # round the threshold for a readable title.
    ylab = "Proportion of songs", type = "b", grid = "h")
print(propSongsLongerThanAvgVsYear)

[Figure: Proportion of songs with length >= average vs. year]

So, we are comparing the song lengths to the average duration over all years, which is about 3:20 minutes. Evidently, very few older songs are longer than this threshold. There is a reason for this: in the 1960s and earlier, songs were released in the so-called 45 RPM format, which had a capacity of about 3 minutes. It is thus no wonder that the left end of the graph shows very low proportions for each year. Toward the end of the 1960s, these constraints were relaxed, and this is exactly what we see in the plot; from the end of the 1960s, the proportion of songs longer than 3:20 increases. The peak seems to have been reached around 1990, when almost all songs on the chart were longer than 3:20 minutes. Since then, the proportion has been decreasing.

Longest Charting Songs

In this part we investigate which songs charted the longest in the Top 100, Top 40, and Top 10, as well as which song charted longest at its highest position. We begin by considering the entire time period. We also extract some additional information about the longest charting songs, e.g. artist, song title, etc.

# longest charting song in top 100 and its info:
htmlPrint(charts[which.max(charts$nWeeksChart), c("Track", "Artist", "Time", 
    "Date.Entered", "High", "Date.Peaked", "nWeeksChart")])
Track Artist Time Date.Entered High Date.Peaked nWeeksChart
I’m Yours Jason Mraz 4:03 5/3/08 6 9/20/08 76
# in top 40:
htmlPrint(charts[which.max(charts$nWeeksChartTop40), c("Track", "Artist", "Time", 
    "Date.Entered", "High", "Date.Peaked", "nWeeksChartTop40")])
Track Artist Time Date.Entered High Date.Peaked nWeeksChartTop40
I’m Yours Jason Mraz 4:03 5/3/08 6 9/20/08 62
# in top 10:
htmlPrint(charts[which.max(charts$nWeeksChartTop10), c("Track", "Artist", "Time", 
    "Date.Entered", "High", "Date.Peaked", "nWeeksChartTop10")])
Track Artist Time Date.Entered High Date.Peaked nWeeksChartTop10
How Do I Live LeAnn Rimes 4:18 6/21/97 2 12/13/97 32
# song that charted longest at its peak/highest position:
htmlPrint(charts[which.max(charts$nWeeksChartPeak), c("Track", "Artist", "Time", 
    "Date.Entered", "High", "Date.Peaked", "nWeeksChartPeak")])
Track Artist Time Date.Entered High Date.Peaked nWeeksChartPeak
One Sweet Day Mariah Carey 4:42 12/2/95 1 12/2/95 16

It is quite impressive what the outputs above tell us. First of all, “I'm Yours” by Jason Mraz was in the Top 100 for 76 weeks, and in the Top 40 for as long as 62 weeks, even though its highest position was only 6. More impressive, perhaps, is that “One Sweet Day” with Mariah Carey held the no. 1 position for 16 weeks; that is almost 4 months without being pushed off the pole position!
One can also look at earlier years, as is done in the data aggregation and plotting script file. This raises the question of whether songs are charting longer nowadays.
We can investigate this:

nWeeksInTop100vsYear <- xyplot(nWeeksChart ~ Year, subset(charts, Year > 1955), 
    grid = "h", main = "No. of weeks a song has charted in Top 100 vs. Year", 
    ylab = "No. of weeks", type = c("p", "a"), col.line = "darkorange", lwd = 3, 
    alpha = 0.5)
print(nWeeksInTop100vsYear)

[Figure: No. of weeks a song has charted in Top 100 vs. year]

Some recent songs do seem to be charting much longer (as seen by the scattered points in the top right), but the average (orange line) is decreasing.

Chart Position, Weeks on Chart, etc. vs. Song Length

In this part we investigate whether/how song length is related to success on the charts; i.e. should we make our new song long or short? For this purpose we create a large data.frame through a customized function in a ddply() call:

# longest charting songs (in Top 100, 40, 10) and their lengths for each
# year:
longestChartingSongsEachYear <- ddply(charts, ~Year, function(x) {
    max_nWeeksChart <- max(x$nWeeksChart)
    theMax_nWeeksChart <- which.max(x$nWeeksChart)
    max_nWeeksChartTop40 <- max(x$nWeeksChartTop40)
    theMax_nWeeksChartTop40 <- which.max(x$nWeeksChartTop40)
    max_nWeeksChartTop10 <- max(x$nWeeksChartTop10)
    theMax_nWeeksChartTop10 <- which.max(x$nWeeksChartTop10)
    cLevels <- c("weeksInTop100", "weeksInTop40", "weeksInTop10")
    data.frame(successMeasure = factor(cLevels, levels = cLevels), 
        nWeeks = c(max_nWeeksChart, max_nWeeksChartTop40, max_nWeeksChartTop10), 
        Track = c(as.character(x$Track[theMax_nWeeksChart]), 
            as.character(x$Track[theMax_nWeeksChartTop40]), 
            as.character(x$Track[theMax_nWeeksChartTop10])), 
        Artist = c(as.character(x$Artist[theMax_nWeeksChart]), 
            as.character(x$Artist[theMax_nWeeksChartTop40]), 
            as.character(x$Artist[theMax_nWeeksChartTop10])), 
        songLength = c(x$Time.num[theMax_nWeeksChart], 
            x$Time.num[theMax_nWeeksChartTop40], 
            x$Time.num[theMax_nWeeksChartTop10]))
})
write.table(longestChartingSongsEachYear, "table_longestChartingSongsEachYear.txt", 
    quote = FALSE, sep = "\t", row.names = FALSE)

The last two lines of the code above write the data.frame to file. Let us read the table back in and print it as HTML (though it is quite large):

htmlPrint(read.delim("table_longestChartingSongsEachYear.txt"))
Year successMeasure nWeeks Track Artist songLength
1950 weeksInTop100 27 The Third Man Theme Anton Karas 131
1950 weeksInTop40 The Third Man Theme Anton Karas 131
1950 weeksInTop10 The Third Man Theme Anton Karas 131
1951 weeksInTop100 34 Be My Love Mario Lanza 208
1951 weeksInTop40 Be My Love Mario Lanza 208
1951 weeksInTop10 Be My Love Mario Lanza 208
1952 weeksInTop100 38 Blue Tango Leroy Anderson & His Pops Concert Orchestra 171
1952 weeksInTop40 Blue Tango Leroy Anderson & His Pops Concert Orchestra 171
1952 weeksInTop10 Blue Tango Leroy Anderson & His Pops Concert Orchestra 171
1953 weeksInTop100 31 Vaya Con Dios (May God Be With You) Les Paul & Mary Ford 171
1953 weeksInTop40 Vaya Con Dios (May God Be With You) Les Paul & Mary Ford 171
1953 weeksInTop10 Vaya Con Dios (May God Be With You) Les Paul & Mary Ford 171
1954 weeksInTop100 1 Oh, That’ll be Joyful Four Lads 155
1954 weeksInTop40 Oh, That’ll be Joyful Four Lads 155
1954 weeksInTop10 Oh, That’ll be Joyful Four Lads 155
1955 weeksInTop100 27 Melody Of Love Billy Vaughn 175
1955 weeksInTop40 Melody Of Love Billy Vaughn 175
1955 weeksInTop10 Cherry Pink And Apple Blossom White Perez Prado and His Orchestra 176
1956 weeksInTop100 31 Canadian Sunset Hugo Winterhalter 170
1956 weeksInTop40 24 Lisbon Antigua Nelson Riddle and His Orchestra 153
1956 weeksInTop10 21 Don’t Be Cruel Elvis Presley 123
1957 weeksInTop100 39 Wonderful! Wonderful! Johnny Mathis 167
1957 weeksInTop40 So Rare Jimmy Dorsey 150
1957 weeksInTop10 So Rare Jimmy Dorsey 150
1958 weeksInTop100 30 All The Way Frank Sinatra 170
1958 weeksInTop40 Chantilly Lace Big Bopper 140
1958 weeksInTop10 Patricia Perez Prado and His Orchestra 138
1959 weeksInTop100 26 Mack The Knife Bobby Darin 184
1959 weeksInTop40 22 Mack The Knife Bobby Darin 184
1959 weeksInTop10 16 Mack The Knife Bobby Darin 184
1960 weeksInTop100 27 Running Bear Johnny Preston 153
1960 weeksInTop40 20 He’ll Have To Go Jim Reeves 136
1960 weeksInTop10 12 The Theme From A Summer Place Percy Faith 144
1961 weeksInTop100 26 Moon River Henry Mancini, His Orchestra and Chorus 161
1961 weeksInTop40 18 Exodus Ferrante and Teicher 174
1961 weeksInTop10 12 Tossin' And Turnin' Bobby Lewis 150
1962 weeksInTop100 23 Limbo Rock Chubby Checker 142
1962 weeksInTop40 18 The Twist Chubby Checker 152
1962 weeksInTop10 13 The Twist Chubby Checker 152
1963 weeksInTop100 20 Up On The Roof Drifters, The 154
1963 weeksInTop40 13 Sugar Shack Jimmy Gilmer 121
1963 weeksInTop10 10 Sugar Shack Jimmy Gilmer 121
1964 weeksInTop100 22 Hello, Dolly! Louis Armstrong 142
1964 weeksInTop40 19 Hello, Dolly! Louis Armstrong 142
1964 weeksInTop10 13 Hello, Dolly! Louis Armstrong 142
1965 weeksInTop100 18 Wooly Bully Sam the Sham and the Pharaohs 140
1965 weeksInTop40 14 Wooly Bully Sam the Sham and the Pharaohs 140
1965 weeksInTop10 10 I Can’t Help Myself Four Tops 163
1966 weeksInTop100 21 Born Free Roger Williams 142
1966 weeksInTop40 14 Devil With A Blue Dress On & Good Golly Miss Molly Mitch Ryder 194
1966 weeksInTop10 12 I’m A Believer Monkees, The 161
1967 weeksInTop100 18 Boogaloo Down Broadway Fantastic Johnny C, The 161
1967 weeksInTop40 15 To Sir With Love Lulu 164
1967 weeksInTop10 10 Daydream Believer Monkees, The 177
1968 weeksInTop100 26 Sunshine Of Your Love Cream, The 183
1968 weeksInTop40 19 Hey Jude Beatles, The 431
1968 weeksInTop10 14 Hey Jude Beatles, The 431
1969 weeksInTop100 22 Sugar, Sugar Archies, The 168
1969 weeksInTop40 Sugar, Sugar Archies, The 168
1969 weeksInTop10 Sugar, Sugar Archies, The 168
1970 weeksInTop100 23 Yellow River Christie 160
1970 weeksInTop40 Raindrops Keep Fallin' On My Head B.J. Thomas 182
1970 weeksInTop10 Raindrops Keep Fallin' On My Head B.J. Thomas 182
1971 weeksInTop100 26 I’ve Found Someone Of My Own Free Movement, The 225
1971 weeksInTop40 Knock Three Times Dawn 176
1971 weeksInTop10 Joy To The World Three Dog Night 197
1972 weeksInTop100 22 I Am Woman Helen Reddy 188
1972 weeksInTop40 American Pie (Parts 1 and 2) Don McLean 516
1972 weeksInTop10 The First Time Ever I Saw Your Face Roberta Flack 255
1973 weeksInTop100 38 Why Me Kris Kristofferson 205
1973 weeksInTop40 Why Me Kris Kristofferson 205
1973 weeksInTop10 Let’s Get It On Marvin Gaye 238
1974 weeksInTop100 28 One Hell Of A Woman Mac Davis 172
1974 weeksInTop40 Come And Get Your Love Redbone 210
1974 weeksInTop10 The Way We Were Barbra Streisand 209
1975 weeksInTop100 32 Feelings Morris Albert 226
1975 weeksInTop40 Rhinestone Cowboy Glen Campbell 188
1975 weeksInTop10 One Of These Nights Eagles 208
1976 weeksInTop100 28 A Fifth Of Beethoven Walter Murphy 182
1976 weeksInTop40 A Fifth Of Beethoven Walter Murphy 182
1976 weeksInTop10 Tonight’s The Night (Gonna Be Alright) Rod Stewart 235
1977 weeksInTop100 33 How Deep Is Your Love Bee Gees 210
1977 weeksInTop40 How Deep Is Your Love Bee Gees 210
1977 weeksInTop10 How Deep Is Your Love Bee Gees 210
1978 weeksInTop100 40 I Go Crazy Paul Davis 234
1978 weeksInTop40 I Go Crazy Paul Davis 234
1978 weeksInTop10 Le Freak Chic 210
1979 weeksInTop100 27 I Will Survive Gloria Gaynor 195
1979 weeksInTop40 Pop Muzik M 200
1979 weeksInTop10 Hot Stuff Donna Summer 227
1980 weeksInTop100 31 Another One Bites The Dust Queen 212
1980 weeksInTop40 Do That To Me One More Time Captain and Tennille 229
1980 weeksInTop10 Another One Bites The Dust Queen 212
1981 weeksInTop100 32 Jessie’s Girl Rick Springfield 194
1981 weeksInTop40 22 Jessie’s Girl Rick Springfield 194
1981 weeksInTop10 15 Physical Olivia Newton-John 223
1982 weeksInTop100 43 Tainted Love Soft Cell 158
1982 weeksInTop40 22 Hurts So Good John Cougar 215
1982 weeksInTop10 16 Hurts So Good John Cougar 215
1983 weeksInTop100 32 Baby, Come To Me Patti Austin 210
1983 weeksInTop40 21 You And I Eddie Rabbitt 238
1983 weeksInTop10 14 Flashdance…What A Feeling Irene Cara 235
1984 weeksInTop100 30 Borderline Madonna 238
1984 weeksInTop40 What’s Love Got To Do With It Tina Turner 229
1984 weeksInTop10 When Doves Cry Prince 229
1985 weeksInTop100 29 I Miss You Klymaxx 244
1985 weeksInTop40 Careless Whisper Wham! 300
1985 weeksInTop10 Say You, Say Me Lionel Richie 239
1986 weeksInTop100 27 Something About You Level 42 228
1986 weeksInTop40 17 That’s What Friends Are For Dionne & Friends 254
1986 weeksInTop10 10 That’s What Friends Are For Dionne & Friends 254
1987 weeksInTop100 30 In My Dreams REO Speedwagon 260
1987 weeksInTop40 16 Shake You Down Gregory Abbott 244
1987 weeksInTop10 9 Faith George Michael 194
1988 weeksInTop100 30 I’ll Always Love You Taylor Dayne 258
1988 weeksInTop40 17 Need You Tonight INXS 181
1988 weeksInTop10 8 Every Rose Has Its Thorn Poison 260
1989 weeksInTop100 39 Bust A Move Young M.C. 260
1989 weeksInTop40 20 Bust A Move Young M.C. 260
1989 weeksInTop10 10 Another Day In Paradise Phil Collins 284
1990 weeksInTop100 30 Close To You Maxi Priest 235
1990 weeksInTop40 19 From A Distance Bette Midler 275
1990 weeksInTop10 10 Because I Love You (The Postman Song) Stevie B 255
1991 weeksInTop100 29 High Enough Damn Yankees 250
1991 weeksInTop40 20 Emotions Mariah Carey 249
1991 weeksInTop10 11 (Everything I Do) I Do It For You Bryan Adams 243
1992 weeksInTop100 37 Just Another Day Jon Secada 251
1992 weeksInTop40 Just Another Day Jon Secada 251
1992 weeksInTop10 End Of The Road Boyz II Men 350
1993 weeksInTop100 45 Whoomp! There It Is Tag Team 267
1993 weeksInTop40 Whoomp! There It Is Tag Team 267
1993 weeksInTop10 Whoomp! There It Is Tag Team 267
1994 weeksInTop100 45 Another Night Real McCoy 233
1994 weeksInTop40 Another Night Real McCoy 233
1994 weeksInTop10 Another Night Real McCoy 233
1995 weeksInTop100 49 Run-Around Blues Traveler 252
1995 weeksInTop40 Gangsta’s Paradise Coolio 240
1995 weeksInTop10 Gangsta’s Paradise Coolio 240
1996 weeksInTop100 60 Macarena (Bayside Boys Mix) Los Del Rio 234
1996 weeksInTop40 You’re Makin' Me High Toni Braxton 269
1996 weeksInTop10 Un-Break My Heart Toni Braxton 264
1997 weeksInTop100 69 How Do I Live LeAnn Rimes 258
1997 weeksInTop40 How Do I Live LeAnn Rimes 258
1997 weeksInTop10 How Do I Live LeAnn Rimes 258
1998 weeksInTop100 56 I Don’t Want To Wait Paula Cole 247
1998 weeksInTop40 52 Truly Madly Deeply Savage Garden 279
1998 weeksInTop10 26 Truly Madly Deeply Savage Garden 279
1999 weeksInTop100 58 Smooth Santana 244
1999 weeksInTop40 50 Smooth Santana 244
1999 weeksInTop10 30 Smooth Santana 244
2000 weeksInTop100 57 Higher Creed 316
2000 weeksInTop40 43 Amazed Lonestar 265
2000 weeksInTop10 19 Everything You Want Vertical Horizon 241
2001 weeksInTop100 56 The Way You Love Me Faith Hill 186
2001 weeksInTop40 45 Hanging By A Moment Lifehouse 213
2001 weeksInTop10 23 How You Remind Me Nickelback 223
2002 weeksInTop100 45 Wherever You Will Go Calling, The 208
2002 weeksInTop40 40 Wherever You Will Go Calling, The 208
2002 weeksInTop10 19 Dilemma Nelly 287
2003 weeksInTop100 54 Unwell matchbox twenty 228
2003 weeksInTop40 42 Here Without You 3 Doors Down 238
2003 weeksInTop10 17 Hey Ya! OutKast 249
2004 weeksInTop100 50 Someday Nickelback 207
2004 weeksInTop40 41 Yeah! Usher 250
2004 weeksInTop10 24 Yeah! Usher 250
2005 weeksInTop100 62 You And Me Lifehouse 195
2005 weeksInTop40 44 You And Me Lifehouse 195
2005 weeksInTop10 23 We Belong Together Mariah Carey 201
2006 weeksInTop100 58 How To Save A Life Fray, The 261
2006 weeksInTop40 36 How To Save A Life Fray, The 261
2006 weeksInTop10 19 How To Save A Life Fray, The 261
2007 weeksInTop100 64 Before He Cheats Carrie Underwood 200
2007 weeksInTop40 53 Before He Cheats Carrie Underwood 200
2007 weeksInTop10 25 Apologize Timbaland 184
2008 weeksInTop100 76 I’m Yours Jason Mraz 243
2008 weeksInTop40 62 I’m Yours Jason Mraz 243
2008 weeksInTop10 23 Low Flo Rida 232
2009 weeksInTop100 57 Use Somebody Kings Of Leon 230
2009 weeksInTop40 47 I Gotta Feeling Black Eyed Peas, The 289
2009 weeksInTop10 24 Down Jay Sean 212
2010 weeksInTop100 60 Need You Now Lady Antebellum 237
2010 weeksInTop40 50 Need You Now Lady Antebellum 237
2010 weeksInTop10 22 Just The Way You Are Bruno Mars 220
2011 weeksInTop100 68 Party Rock Anthem LMFAO 263
2011 weeksInTop40 53 Rolling In The Deep Adele 228
2011 weeksInTop10 29 Party Rock Anthem LMFAO 263
2012 weeksInTop100 62 Ho Hey Lumineers, The 163
2012 weeksInTop40 44 Somebody That I Used To Know Gotye 244
2012 weeksInTop10 24 Somebody That I Used To Know Gotye 244
2013 weeksInTop100 55 Radioactive Imagine Dragons 187
2013 weeksInTop40 41 Thrift Shop Macklemore 235
2013 weeksInTop10 21 Thrift Shop Macklemore 235
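Several nWeeks cells in the table above are blank even though a track is listed. One plausible explanation, assuming the blank cells are NA values, is that max() returns NA as soon as any value in the column is missing, whereas which.max() skips missing values and therefore still identifies a track. A minimal sketch with made-up numbers:

```r
# Made-up weekly counts for one year, with one missing value.
weeks <- c(12, NA, 20, 7)

maxPlain <- max(weeks)                 # NA, because one value is missing
idxMax <- which.max(weeks)             # 3, since which.max() ignores NA
maxNoNA <- max(weeks, na.rm = TRUE)    # 20, missing values dropped first

print(c(maxPlain = maxPlain, idxMax = idxMax, maxNoNA = maxNoNA))
```

Passing na.rm = TRUE to max() would fill such cells with the largest observed value; whether that is appropriate depends on why the values are missing in the first place.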

What's more interesting, perhaps, is to make a plot:

nWeeksOnChartsVsSongLength <- xyplot(nWeeks ~ songLength, longestChartingSongsEachYear, 
    groups = successMeasure, 
    main = "Longest charting songs (3 different measures) for each year vs. song length", 
    xlab = "Song length (seconds)", ylab = "No. of weeks", grid = "h", auto.key = list(columns = 3))
print(nWeeksOnChartsVsSongLength)

Figure: Longest charting songs (3 different measures) for each year vs. song length

pdf("plot_nWeeksOnChartsVsSongLength.pdf")
print(nWeeksOnChartsVsSongLength)
dev.off()
## pdf 
##   2

Finally, we can calculate the average song length based on the longest charting songs; this would give us the “best” song length for success:

bestSongLengthForSucess <- mean(longestChartingSongsEachYear$songLength)
sprintf("Best song length for staying on charts = %4.2f seconds.", bestSongLengthForSucess)
## [1] "Best song length for staying on charts = 213.27 seconds."

So, if we were to make a song and wanted it to last long on the charts, a good starting point might be a length of about 213 seconds, i.e. roughly 3 minutes and 33 seconds.
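Song lengths in seconds are easier to read as minutes and seconds. The helper below is a small sketch; the function name secondsToMinSec is hypothetical and not part of the original scripts.

```r
# Convert a length in seconds to a "m:ss" string (rounded to whole seconds).
secondsToMinSec <- function(s) {
    total <- as.integer(round(s))
    sprintf("%d:%02d", total %/% 60L, total %% 60L)
}

bestLength <- secondsToMinSec(213.27)   # mean length reported above
heyJude <- secondsToMinSec(431)         # "Hey Jude" from the table
print(c(bestLength, heyJude))
```

Applied to the mean of 213.27 seconds this gives "3:33", and the 431 seconds listed for “Hey Jude” become "7:11".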

Number and Proportion of Songs Charting Longer than 10 Weeks in Top 100 vs. Year

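Here, we look at the number and the proportion of songs that have charted in the Top 100 for at least a certain number of weeks (the benchmark, set to 10 below). We do this for every year after 1955. Note that the code counts songs with nWeeksChart >= benchmark, so a song with exactly 10 weeks on the chart is included.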

benchmark <- 10
nSongsChartLongerEachYear <- ddply(subset(charts, Year > 1955), ~Year, function(x) {
    count <- sum(x$nWeeksChart >= benchmark, na.rm = TRUE)
    total <- nrow(x)
    prop <- count/total
    data.frame(Count = count, Total = total, Proportion = prop)
})
# write table to file:
write.table(nSongsChartLongerEachYear, "table_nSongsChartLongerEachYear.txt", 
    quote = FALSE, sep = "\t", row.names = FALSE)
# print HTML table:
htmlPrint(nSongsChartLongerEachYear, digits = 2)
Year Count Total Proportion
1956 255 505 0.50
1957 252 496 0.51
1958 249 530 0.47
1959 252 576 0.44
1960 251 602 0.42
1961 210 681 0.31
1962 243 676 0.36
1963 225 658 0.34
1964 203 718 0.28
1965 198 717 0.28
1966 180 743 0.24
1967 178 739 0.24
1968 197 686 0.29
1969 222 672 0.33
1970 220 653 0.34
1971 242 635 0.38
1972 252 591 0.43
1973 262 536 0.49
1974 241 496 0.49
1975 239 568 0.42
1976 223 534 0.42
1977 233 473 0.49
1978 263 453 0.58
1979 251 476 0.53
1980 271 474 0.57
1981 240 408 0.59
1982 260 424 0.61
1983 267 452 0.59
1984 263 435 0.60
1985 276 405 0.68
1986 268 397 0.68
1987 275 398 0.69
1988 269 387 0.70
1989 263 392 0.67
1990 265 376 0.70
1991 274 385 0.71
1992 264 371 0.71
1993 256 349 0.73
1994 249 345 0.72
1995 256 357 0.72
1996 247 324 0.76
1997 255 341 0.75
1998 240 346 0.69
1999 250 315 0.79
2000 242 317 0.76
2001 257 301 0.85
2002 242 295 0.82
2003 247 312 0.79
2004 245 306 0.80
2005 240 342 0.70
2006 219 363 0.60
2007 223 349 0.64
2008 239 396 0.60
2009 223 436 0.51
2010 227 483 0.47
2011 221 497 0.44
2012 204 374 0.55
2013 170 331 0.51
# plot:
propSongsChartLongerVsYear <- xyplot(Proportion ~ Year, nSongsChartLongerEachYear, 
    main = paste("Proportion of songs charting in Top 100\nlonger than", 
        benchmark, "weeks vs. Year"), ylab = "Proportion", type = "b", grid = "h")
print(propSongsChartLongerVsYear)

Figure: Proportion of songs charting in Top 100 longer than 10 weeks vs. Year

# write plot to file:
pdf("plot_propSongsChartLongerVsYear.pdf")
print(propSongsChartLongerVsYear)
dev.off()

## pdf 
##   2

This is quite interesting to see: the proportion of songs charting in the Top 100 for 10 weeks or more fell from the mid-1950s to the end of the 1960s (from about 0.50 in 1956 to 0.24 in 1966 and 1967). Afterwards, we observe an overall increase lasting into the early 2000s, with a peak of 0.85 in 2001. Since then the trend has been mostly decreasing, down to 0.44 in 2011 with a partial recovery in 2012 and 2013, indicating that songs have recently tended to spend fewer weeks in the Top 100.
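The per-year count and proportion can also be sketched in base R, which makes it easy to see how the choice between >= and > at the benchmark changes the count. The data frame toy and its values are made up for illustration; the column names CountAtLeast and CountLonger are hypothetical.

```r
benchmark <- 10

# Toy stand-in for the charts data frame; the values are made up.
toy <- data.frame(
    Year = c(2000, 2000, 2000, 2000, 2001, 2001),
    nWeeksChart = c(5, 10, 12, NA, 9, 20)
)

# One row per year: counts under both definitions, plus the total.
perYear <- do.call(rbind, lapply(split(toy, toy$Year), function(x) {
    data.frame(Year = x$Year[1],
        CountAtLeast = sum(x$nWeeksChart >= benchmark, na.rm = TRUE),
        CountLonger = sum(x$nWeeksChart > benchmark, na.rm = TRUE),
        Total = nrow(x))   # includes rows with missing nWeeksChart
}))
perYear$Proportion <- perYear$CountAtLeast / perYear$Total
print(perYear)
```

For the toy year 2000, two songs reach at least 10 weeks but only one exceeds 10 weeks. The total counts every row, including the one with a missing value, just as nrow(x) does in the ddply() call above.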

In the data aggregation and plotting script, data_aggregatePlot.R, we have also considered the variable ScorePoints that contains a certain score (see the beginning sections for details) for each song. We leave it out here for the sake of brevity (though it seems we are a far cry from brevity with this report…) and leave it at the mention.

Final Notes

As mentioned in the beginning, the dataset considered here provides an almost endless array of opportunities for data aggregation and visualization. The only limiting factors are the finite amount of time we have available in our lives and the dataset's shortcomings with regard to, e.g., missing observations.
One thing we would have liked to look at was the Genre variable/column. This would be a great factor on which we could condition, thus making e.g. nice multi-panel plots out of some of the plots already made above. However, there are only very few observations in the dataset for which the Genre is available.

There are other cool analyses and visualizations of the data in the Whitburn Project:

Regarding code externalization, I have tried to read code chunks from my R scripts into this R Markdown document, but without success so far. It seems most of the Internet resources on this topic deal with .Rnw files, which use a slightly different syntax for code chunks than what is used here for .Rmd. My failed attempt is below. The referenced code chunk with the label my-label is at the bottom of the data_aggregatePlot.R script.

read_chunk('data_aggregatePlot.R')
<<my-label>>=
@
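For comparison, here is a sketch of how chunk externalization is commonly set up for .Rmd files, assuming a reasonably recent version of knitr. The script marks each chunk with a "## ---- label ----" line, and the document contains an empty chunk carrying the same label. The temporary script and its two lines of code are hypothetical stand-ins for data_aggregatePlot.R.

```r
library(knitr)

# Hypothetical stand-in for data_aggregatePlot.R: the chunk is marked by a
# "## ---- label ----" line in the script.
script <- tempfile(fileext = ".R")
writeLines(c("## ---- my-label ----", "x <- 1 + 1", "print(x)"), script)
scriptPath <- normalizePath(script, winslash = "/")

# Minimal .Rmd content: a setup chunk that calls read_chunk(), followed by
# an empty chunk whose label matches the one in the script.
fence <- strrep("`", 3)
rmd <- c(
    paste0(fence, "{r setup, include=FALSE}"),
    sprintf("knitr::read_chunk(\"%s\")", scriptPath),
    fence,
    "",
    paste0(fence, "{r my-label}"),
    fence
)

# Knit the text; the empty chunk is filled with the code from the script.
out <- knit(text = rmd, quiet = TRUE)
cat(out, sep = "\n")
```

The key difference from the .Rnw attempt is that the .Rmd document refers to the chunk by an empty fenced chunk with the label in its header rather than by the <<my-label>>= ... @ syntax.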

However, in the analysis and visualizations above, I have embedded pre-existing figures into this R Markdown document as well as imported pre-existing data from files and worked with it.

As of Monday, Oct 21, just before the deadline, I am also working on a Git repository.
UPDATE on Monday, Oct 21: The Git repository is up and running!

Christian Birch Okkels
October 21, 2013