Inspired by Dan Nguyen’s experiments, I decided to try out IBM Watson’s speech-to-text service from R.
To do this, you need to sign up for a Bluemix account and turn on the speech-to-text service. Then, when you go to your dashboard, you’ll receive a username and password for the service. Store these in your ~/.Renviron file.
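Before making any requests, it can help to confirm that R actually sees the credentials. The sketch below assumes the variable names used later in this post (BLUEMIX_TTS_USER and BLUEMIX_TTS_PWD), each stored in ~/.Renviron as a NAME=value line. `require_env` is a hypothetical helper written for this illustration, not part of any package.

```r
# Hypothetical helper: fetch an environment variable or fail with a clear message.
require_env <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable ", name,
         " is not set; add it to ~/.Renviron and restart R.",
         call. = FALSE)
  }
  value
}

# Usage (assumes both variables are defined in ~/.Renviron):
# user <- require_env("BLUEMIX_TTS_USER")
# pwd  <- require_env("BLUEMIX_TTS_PWD")
```

R only reads ~/.Renviron at startup, so a restart is needed after editing the file.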
I recorded some test text using Apple’s QuickTime, which saves the audio as .m4a. IBM doesn’t accept this file type, but it was easy to convert it to OGG with ffmpeg:
ffmpeg -i testvoice.m4a testvoice.ogg
(On OS X, you can install ffmpeg via brew install ffmpeg.)
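If you would rather stay inside R, the same conversion can be run with `system2()`. This is only a sketch: `ogg_name` and `convert_audio` are hypothetical helpers, and it assumes ffmpeg is already installed and on the PATH.

```r
# Hypothetical helper: swap a file's extension for .ogg.
ogg_name <- function(path) sub("\\.[^.]*$", ".ogg", path)

# Hypothetical helper: convert an audio file by calling ffmpeg from R.
convert_audio <- function(input, output = ogg_name(input)) {
  if (!nzchar(Sys.which("ffmpeg"))) {
    stop("ffmpeg was not found on the PATH.", call. = FALSE)
  }
  # Equivalent to running: ffmpeg -i input output
  status <- system2("ffmpeg", args = c("-i", shQuote(input), shQuote(output)))
  if (status != 0) {
    stop("ffmpeg exited with status ", status, call. = FALSE)
  }
  invisible(output)
}

# Usage (assumes testvoice.m4a exists in the working directory):
# convert_audio("testvoice.m4a")
```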
Now, to R! First, load some packages:
library(httr)
library(purrr)
library(magrittr)
Now we can submit the audio file via one POST request:
a <- POST("https://stream.watsonplatform.net/speech-to-text/api/v1/recognize",
  authenticate(Sys.getenv("BLUEMIX_TTS_USER"), Sys.getenv("BLUEMIX_TTS_PWD")), # Set these values in .Renviron
  encode = "multipart", # Necessary for files >4MB
  content_type("audio/ogg"), # Whatever your audio format is
  query = list(continuous = "true"), # Transcribe the whole file, not just until the first pause
  body = list(file = upload_file("testvoice.ogg"))) # The file to upload
Now extract just the transcript text, discarding the confidence values and other metadata:
content(a) %>%
use_series(results) %>%
map_chr(~ .x$alternatives[[1]]$transcript[[1]]) %>%
paste(collapse = "\n") %>%
cat
## this is a test of some voice recording
## you know just
## recording some sound now so that's it
## okay
I think I said “okey-dokey”, not “okay”, but hey, pretty close!
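If you want to keep the confidence values rather than discard them, something like the following might work. It is a sketch against a mock response: the nesting (results, then alternatives, then transcript) follows the extraction code above, while the `confidence` field name, the numbers, and the `transcript_table` helper are assumptions made for illustration. It uses base R only, so it runs without the packages loaded earlier.

```r
# Mock of a parsed response. The confidence values here are made up.
mock <- list(results = list(
  list(alternatives = list(list(transcript = "this is a test ", confidence = 0.91))),
  list(alternatives = list(list(transcript = "okay ", confidence = 0.42)))
))

# Hypothetical helper: one row per result, using the top alternative.
transcript_table <- function(parsed) {
  top <- lapply(parsed$results, function(r) r$alternatives[[1]])
  data.frame(
    transcript = vapply(top, function(a) trimws(a$transcript), character(1)),
    confidence = vapply(top, function(a) as.numeric(a$confidence), numeric(1)),
    stringsAsFactors = FALSE
  )
}

tbl <- transcript_table(mock)
print(tbl)
```

With a real response, `transcript_table(content(a))` would be the equivalent call, provided the service returns the fields assumed here.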