Inspired by Dan Nguyen’s experiments, I decided to try out IBM Watson’s speech-to-text service from R.

To do this, you need to sign up for a Bluemix account and turn on the speech-to-text service. Then, when you go to your dashboard, you’ll receive a username and password for the service. Store these in your ~/.Renviron file.
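
Before making any requests, it can help to confirm that R actually sees both credentials. Here is a minimal sketch, assuming the variable names used in the request later in this post (BLUEMIX_TTS_USER and BLUEMIX_TTS_PWD); the helper name missing_env_vars is made up for illustration.

```r
# Hypothetical helper: return the names of any environment variables
# that are unset or empty.
missing_env_vars <- function(vars) {
  values <- Sys.getenv(vars, unset = NA, names = TRUE)
  names(values)[is.na(values) | values == ""]
}

# Variable names assumed to match the ones used in the POST request.
missing <- missing_env_vars(c("BLUEMIX_TTS_USER", "BLUEMIX_TTS_PWD"))
if (length(missing) > 0) {
  message("Not set (add to ~/.Renviron and restart R): ",
          paste(missing, collapse = ", "))
}
```

R reads ~/.Renviron at startup, so a freshly edited file only takes effect after restarting the session.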

I recorded some test text using Apple’s QuickTime program, but this saved the audio as .m4a. The service doesn’t accept this file type, but it was easy to convert the file to OGG with ffmpeg:

 ffmpeg -i testvoice.m4a testvoice.ogg
 

(On OS X, you can install ffmpeg via brew install ffmpeg.)
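
If you would rather stay inside R for the conversion, the same ffmpeg command can be run with system2(). This is only a sketch: it assumes ffmpeg is installed and on your PATH, and convert_to_ogg and its run argument are hypothetical names, not part of any package.

```r
# Hypothetical helper: convert an audio file to OGG by shelling out to ffmpeg.
# Assumes ffmpeg is installed and on the PATH.
convert_to_ogg <- function(input, run = TRUE) {
  # Swap the file extension for .ogg, e.g. testvoice.m4a -> testvoice.ogg
  output <- paste0(tools::file_path_sans_ext(input), ".ogg")
  if (run) {
    status <- system2("ffmpeg", args = c("-i", shQuote(input), shQuote(output)))
    if (status != 0) stop("ffmpeg exited with status ", status)
  }
  output
}

# Dry run: only computes the output file name, without calling ffmpeg.
convert_to_ogg("testvoice.m4a", run = FALSE)
```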

Now, to R! First, load some packages:

library(httr)
library(purrr)
library(magrittr)

Now we can submit the audio file via one POST request:

a = POST('https://stream.watsonplatform.net/speech-to-text/api/v1/recognize',
     authenticate(Sys.getenv("BLUEMIX_TTS_USER"), Sys.getenv("BLUEMIX_TTS_PWD")),  #Set these values in .Renviron
     encode = "multipart",  #necessary for files >4MB
     content_type("audio/ogg"), #Whatever your audio format is
     query = list(continuous="true"), #Transcribe the whole file, not just until the first pause
     body=list(file=upload_file("testvoice.ogg")))  #The file to upload.
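
A request like this can fail quietly, for instance when the credentials are wrong, so it is worth checking the HTTP status before parsing anything. One way to do that is to wrap the same call in a small function and let httr's stop_for_status() raise an error. The wrapper name transcribe_file and its arguments are my own invention; the request itself mirrors the one above.

```r
# Hypothetical wrapper around the request above; requires the httr package.
transcribe_file <- function(path, type = "audio/ogg") {
  resp <- httr::POST(
    "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize",
    httr::authenticate(Sys.getenv("BLUEMIX_TTS_USER"),
                       Sys.getenv("BLUEMIX_TTS_PWD")),
    encode = "multipart",
    httr::content_type(type),
    query = list(continuous = "true"),
    body = list(file = httr::upload_file(path))
  )
  # Turn HTTP error statuses (4xx, 5xx) into R errors instead of silent failures.
  httr::stop_for_status(resp)
  # Return the parsed response body.
  httr::content(resp)
}
```

Calling transcribe_file("testvoice.ogg") should then either return the parsed response or stop with the HTTP error.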

Now extract just the text, discarding the confidence scores and other metadata:

content(a) %>%
  use_series(results) %>% 
  map_chr(~ .x$alternatives[[1]]$transcript[[1]]) %>% 
  paste(collapse = "\n") %>% 
  cat
## this is a test of some voice recording 
## you know just 
## recording some sound now so that's it 
## okay

I think I said “okey-dokey”, not “okay”, but hey, pretty close!
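
If you want to keep the confidence scores that the pipeline above throws away, something along these lines should work. It is a sketch that assumes the parsed response has the shape the pipeline relies on (results, then alternatives, then transcript) with a confidence field next to each transcript. The mock list stands in for content(a), and its transcripts and numbers are invented for illustration, not real service output.

```r
# Made-up stand-in for content(a); the transcripts and confidence values are
# invented for illustration only.
mock <- list(results = list(
  list(alternatives = list(list(transcript = "this is a test ", confidence = 0.9))),
  list(alternatives = list(list(transcript = "okay ", confidence = 0.5)))
))

# Hypothetical helper: one row per recognised chunk, keeping the top
# alternative's transcript and its confidence.
top_alternatives <- function(parsed) {
  data.frame(
    transcript = vapply(parsed$results,
                        function(r) r$alternatives[[1]]$transcript,
                        character(1)),
    confidence = vapply(parsed$results,
                        function(r) r$alternatives[[1]]$confidence,
                        numeric(1)),
    stringsAsFactors = FALSE
  )
}

top_alternatives(mock)
```

Sorting that data frame by confidence is a quick way to spot the chunks most likely to be wrong.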