What do you get on the 13th day of Christmas?

A response to Bob Rudis’s Ode to Christmas Shopping. What would happen if Christmas had an extra day? Let’s try to find out by throwing a bunch of Christmas carols at a deep learning network to see what it suggests as an extension of

day13 <- "on the thirteenth day of christmas my true love gave to me" # what?

I’m only going to say this once: This is programming as a joke! The deep learning model is garbage, and no sensible results are or can be expected. But it’s a heap of fun nevertheless! Christmas carol texts were obtained from all English-language carols given at http://www.lyricsmode.com/lyrics/c/christmas_carols/, and dumped in a /texts/ directory.
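The scraping code is not shown here; a rough sketch of how one carol might be fetched with rvest follows, in which the CSS selector and the URL fragment are guesses rather than verified details of the site.

library (rvest)
# hypothetical scraper: the "#lyrics_text" selector is an assumption,
# not verified against lyricsmode.com
get_carol <- function (u)
{
    read_html (u) %>%
        html_node ("#lyrics_text") %>%
        html_text ()
}
# writeLines (get_carol ("http://www.lyricsmode.com/lyrics/c/christmas_carols/..."),
#             "texts/carol01.txt")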


Some text pre-processing

First convert these texts to vectors of words.

library (magrittr)

# read one carol file and convert it to a lower-case vector of words
process_one <- function (f)
{
    con <- file (f, open = "r")
    txt <- readLines (con, warn = FALSE)
    close (con)
    paste (txt, collapse = " ") %>%
        gsub ("[[:punct:]]|\\-\\-", "", .) %>% # strip punctuation
        gsub ("^\\s+", "", .) %>% # strip leading whitespace (anchored pattern, kept separate)
        tolower () %>%
        strsplit (split = "\\s+") %>%
        extract2 (1) # strsplit returns a list; take its only element
}
f <- list.files ("./texts", full.names = TRUE)
txts <- lapply (f, process_one)
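To see what the cleaning does, here is the same pipeline applied to a made-up line (not from the corpus); note that removing punctuation fuses hyphenated words:

"On the first day of Christmas, my true-love gave to me --" %>%
    gsub ("[[:punct:]]|\\-\\-", "", .) %>%
    tolower () %>%
    strsplit (split = "\\s+") %>%
    extract2 (1)
## [1] "on" "the" "first" "day" "of" "christmas" "my" "truelove" "gave" "to" "me"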

Then get some stats for setting up the network

lens <- vapply (txts, length, numeric (1))
minlen <- min (lens) # 47
nwords <- sum (lens) # 7515
words <- unique (unlist (txts))
message (format (nwords, big.mark = ","), " words in total with ",
         format (length (words), big.mark = ","), " unique, so each unique word appears ",
         formatC (nwords / length (words), format = "f", digits = 2),
         " times on average, and the vocabulary is ",
         formatC (length (words) / mean (lens), format = "f", digits = 2),
         " times the length of an average song")
## 7,515 words in total with 1,376 unique, so each unique word appears 5.46 times on average, and the vocabulary is 8.06 times the length of an average song

Process our day13 string, add it to words, and convert all txts to integer symbols

day13 <- day13 %>% strsplit (split = "\\s") %>%
    extract2 (1)
words <- unique (c (words, day13)) # extend the vocabulary with any new words
txts <- lapply (txts, function (i) match (i, words)) # words -> integer symbols
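A quick sanity check that the integer encoding can be inverted:

identical (words [match (day13, words)], day13) # should be TRUE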

Then convert txts to rolling sequences of symbols defined by the length of the shortest carol (which is 47 words)

# roll a text into all overlapping sequences of len words, one per column
rolltxt <- function (txt, len = minlen)
{
    res <- list ()
    while (length (txt) > len)
    {
        res [[length (res) + 1]] <- txt [1:len]
        txt <- txt [2:length (txt)] # shift the window forward one word
    }
    do.call (cbind, res)
}
dat <- lapply (txts, rolltxt) %>% do.call (cbind, .)
dat <- dat / length (words) # scale symbols to (0, 1]
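To see what rolltxt produces, apply it to a toy sequence of six symbols with a window of four; the strict > comparison means the final complete window is dropped:

rolltxt (1:6, len = 4)
##      [,1] [,2]
## [1,]    1    2
## [2,]    2    3
## [3,]    3    4
## [4,]    4    5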

And separate into training and test components

ntest <- 3 # predict the final 3 words of each sequence
train <- t (dat [1:(nrow (dat) - ntest), ]) # first 44 words of each sequence
test <- t (dat [(nrow (dat) - ntest + 1):nrow (dat), ]) # final 3 words

(yeah yeah, the test data are also in the train data, I know …)
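A quick check of the resulting shapes; the number of rows equals the total number of rolling sequences, which depends on the corpus:

dim (train) # <number of sequences> x 44
dim (test) # <number of sequences> x 3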


keras time

Define a kerasR model (using the TensorFlow backend here) as the simplest of all possible sequential Long Short-Term Memory (LSTM) layers.

library (kerasR)
n <- ncol (train) # 44
n_neurons <- 500
batch_size <- 32
mod <- Sequential ()
mod$add (LSTM (n_neurons, input_shape = c (1, n)))
mod$add (Dense (ncol (test))) # 3 output words
keras_compile (mod, loss = "mean_squared_error", optimizer = "adam")
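kerasR wraps the underlying python object via reticulate, which is why mod$add () works above; the same mechanism should (an assumption, but consistent with those calls) allow the architecture to be inspected directly:

mod$summary () # layer output shapes should be (None, 500) then (None, 3)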

We need to convert our 2-D training data from the carol texts into a 3-D tensor:

train3 <- array (NA, dim = c (nrow (train), 1, ncol (train))) # make tensor
train3 [, 1, ] <- train

Then train the model on the carols, saving it afterwards so that it can be re-loaded for prediction:

keras_fit (mod, train3, test, batch_size = batch_size, epochs = 10, verbose = 1,
           validation_split = 0.1)
keras_save (mod, "the_model.h5")

Prediction

First set up our prediction text, which is the (final portion of the) text of the twelve days of Christmas, plus:

paste (day13, collapse = " ")
## [1] "on the thirteenth day of christmas my true love gave to me"
len <- minlen - ntest
txt <- process_one (f = "test.txt") # full text of the 12 days
txt <- c (txt, day13) %>%
    tail (len) %>% # final 44 words, matching the network's input length
    match (words) %>% # convert to integer symbols
    divide_by (length (words)) %>% # scale as for the training data
    array (dim = c (1, 1, length (.)))
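Any word not in the training vocabulary would become NA here, so a check is worthwhile (the twelve-days text ought to be covered, since that carol is part of the scraped corpus):

any (is.na (txt)) # should be FALSE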

Use that to predict, first re-loading the model trained above

mod <- keras_load ("the_model.h5")
x <- keras_predict (mod, txt)

Then just match that prediction back onto the words

x <- words [round (x * length (words))] # scale back up and match indices to words
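Nothing constrains the raw predictions to lie within (0, 1], so round () could yield indices of 0 or beyond the vocabulary; a slightly more defensive variant (an addition, not part of the original) clamps the indices first:

idx <- round (keras_predict (mod, txt) * length (words))
x <- words [pmin (pmax (idx, 1), length (words))] # clamp to valid indices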

And so what did I get?

With a bit of visual embellishment in Christmas card form …

library (extrafont)
fonttable <- fonts () # check that "Sawasdee" is among the registered fonts
plot.new ()
par (mar = rep (0, 4), bg = "gray5", family = "Sawasdee")
plot (NULL, NULL, xlim = c (0, 1), ylim = c (0, 1),
      xlab = "", ylab = "", xaxt = "n", yaxt = "n", frame = FALSE)
text (x = 0.5, y = 0.97, adj = 0.5, cex = 2, col = "gray95",
      labels = "On the thirteenth day of Christmas")
text (x = 0.5, y = 0.9, adj = 0.5, cex = 2, col = "gray95",
      labels = "my true love gave to me ...")
text (x = 0.5, y = 0.45 + (-1:1) * 0.25, adj = 0.5, labels = x,
      cex = 10, col = "gray95") # the three predicted words
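To write the card to a file rather than to the screen (the filename and dimensions here are arbitrary):

png ("day13_card.png", width = 800, height = 600)
# ... repeat the par () and plotting calls above ...
dev.off ()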

There you go. If you don’t understand what ‘nigh by bed’ means, ask the machine.