SPEECH TO TEXT - R Programming Language Interface for IBM Watson Services
IBM’s Watson Speech to Text service converts audio to text using neural network models. In this project we use a short R script to transcribe a short audio file in which I said “hi! hi! what is my name!”
Before you begin, you will need:
An IBM Bluemix demo account
A dialog app
Credentials for that service, and confirmation that you can reach the service with cURL
Some sample WAV files, or a way to record audio for the Watson STT service
RStudio
Back to our R code. We start by loading the necessary libraries:
library(RCurl) # install.packages("RCurl")
## Loading required package: bitops
library(httr)
Then we enter our STT service credentials from Bluemix:
# "Speech-To-Text-Orange" credentials
url <- "https://stream.watsonplatform.net/speech-to-text/api"
username <-"44300586-b32f-4f3e-ad64-bc226aff5e43" # you need your own - STT service credentials from bluemix
password <- "wcykvxsj4PbI" # you need your own - STT service credentials from bluemix
username_password = paste(username,":",password,sep="")
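Hard-coding credentials in a script makes them easy to leak when the script is shared. One possible alternative, sketched below, reads them from environment variables. The variable names WATSON_STT_USERNAME and WATSON_STT_PASSWORD and the helper get_stt_credentials() are hypothetical, chosen for this example; they are not part of Bluemix or of any R package.

```r
# Hypothetical helper: read STT credentials from environment variables
# instead of hard-coding them. The variable names are assumptions.
get_stt_credentials <- function(user_var = "WATSON_STT_USERNAME",
                                pass_var = "WATSON_STT_PASSWORD") {
  username <- Sys.getenv(user_var, unset = "")
  password <- Sys.getenv(pass_var, unset = "")
  # Fail early with a readable message if either value is missing
  if (!nzchar(username) || !nzchar(password)) {
    stop("Set ", user_var, " and ", pass_var, " before running this script.")
  }
  list(username = username, password = password)
}
```

With the two variables set (for example in ~/.Renviron), creds <- get_stt_credentials() followed by username <- creds$username and password <- creds$password would supply the values that the rest of the script expects.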
Test cURL and credentials against the API endpoint
# TEST CURL AND CREDS# API Endpoint https://stream.watsonplatform.net/speech-to-text/api
# curl -u $USERNAME:$PASSWORD "https://stream.watsonplatform.net/speech-to-text/api/v1/models" # WORKS
Test connectivity and return models available
### FUNCTION to test connectivity and return models available
watson.speech_to_text.getmodels <- function() {
  return(GET(url = paste(url, "/v1/models", sep = ""),
             authenticate(username, password)))
}
## function done.
watson.speech_to_text.getmodels() # returns list of 10+ models ##### works
## Response [https://stream.watsonplatform.net/speech-to-text/api/v1/models]
## Date: 2018-09-06 20:41
## Status: 200
## Content-Type: application/json;charset=utf-8
## Size: 6.44 kB
## {
## "models": [
## {
## "name": "pt-BR_NarrowbandModel",
## "language": "pt-BR",
## "url": "https://stream.watsonplatform.net/speech-to-text/api/v1...
## "rate": 8000,
## "supported_features": {
## "custom_language_model": false,
## "speaker_labels": false
## ...
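The printed response shows only the first few lines of the JSON body. To list every model name, the body can be parsed instead of read by eye. This is a minimal sketch, assuming the body has the "models" array shown above and that the jsonlite package is installed; list_model_names() is a hypothetical helper, not part of httr or of the Watson API.

```r
# Hypothetical helper: pull the "name" field out of each entry in "models".
list_model_names <- function(json_text) {
  # simplifyVector = FALSE keeps the result as plain nested lists
  parsed <- jsonlite::fromJSON(json_text, simplifyVector = FALSE)
  vapply(parsed$models, function(m) m$name, character(1))
}

# Offline example: a trimmed-down body in the same shape as the output above
sample_models <- '{"models": [{"name": "pt-BR_NarrowbandModel", "language": "pt-BR", "rate": 8000}]}'
list_model_names(sample_models)
```

Against the live service, the same helper could be fed content(watson.speech_to_text.getmodels(), "text", encoding = "UTF-8").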
Analyze AUDIO WAV file with IBM Watson Speech to Text service
# command line call works: curl -u $USERNAME:$PASSWORD -H "content-type: audio/wav" --data-binary @"testvoice.wav" "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize"
###### FUNCTION - Analyze AUDIO WAV file with IBM Watson Speech to Text service
watson.speech_to_text.recognize <- function(audio_file) {
  return(POST(url = paste(url, "/v1/recognize", sep = ""),
              authenticate(username, password),
              add_headers("Content-Type" = "audio/wav"),
              body = upload_file(audio_file) # send the raw file as the request body
  ))
} # works # hope this helps you with syntax!
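The function above sends the request without checking that the file exists or that the service answered successfully. A possible variant, sketched here, checks both. The name recognize_checked() and its argument list are assumptions made for this example; the httr calls it uses (POST, authenticate, add_headers, upload_file, stop_for_status) are the package's own.

```r
# Hypothetical wrapper: validate the file, send it, and fail loudly on HTTP errors.
recognize_checked <- function(audio_file, base_url, username, password) {
  # Catch a wrong path before making any network call
  if (!file.exists(audio_file)) {
    stop("Audio file not found: ", audio_file,
         " (working directory: ", getwd(), ")")
  }
  response <- httr::POST(
    url = paste0(base_url, "/v1/recognize"),
    httr::authenticate(username, password),
    httr::add_headers("Content-Type" = "audio/wav"),
    body = httr::upload_file(audio_file)
  )
  httr::stop_for_status(response) # turns 4xx/5xx responses into R errors
  response
}
```

Passing the credentials and base URL as arguments, rather than reading them from global variables, makes the function easier to reuse with a different service instance.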
TIDY UP the STT response - just export the TRANSCRIPT ONLY
#### FUNCTION TO TIDY UP the STT response - just export the TRANSCRIPT ONLY
stt_transcript_only <- function(raw)
{
  data <- as.data.frame(strsplit(as.character(raw), "\\n")) # one row per line of the JSON text
  data <- data[c(7), ] # for now, grab just the transcript line (line 7 of this response layout)
  data <- paste(data) # kill levels - fyi this nukes confidence % info (may want later)
  data <- gsub("  ", "", data) # remove runs of two spaces (the JSON indentation); cannot use ALL [[:punct:]] here
  data <- gsub("\\\\", "", data) # remove backslashes
  data <- gsub("\"", "", data) # remove double quotes
  data <- gsub("transcript", "", data) # remove the "transcript" key
  data <- gsub(":", "", data) # remove the colon - later: improve this tidy step.
return(data)
}
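Picking line 7 of the raw text works only while the response keeps exactly this layout, and it discards the confidence value. One alternative is to parse the JSON and read the fields by name. This is a minimal sketch, assuming the jsonlite package is installed and the body has the "results" / "alternatives" structure shown in the output further down; stt_transcript_parsed() is a hypothetical name.

```r
# Hypothetical helper: return the top transcript and its confidence
# for every result segment in the response body.
stt_transcript_parsed <- function(json_text) {
  parsed <- jsonlite::fromJSON(json_text, simplifyVector = FALSE)
  # Keep only the first (highest-ranked) alternative of each result
  best <- lapply(parsed$results, function(r) r$alternatives[[1]])
  data.frame(
    transcript = vapply(best, function(a) a$transcript, character(1)),
    # Use NA when an alternative carries no confidence value
    confidence = vapply(best, function(a) {
      if (is.null(a$confidence)) NA_real_ else a$confidence
    }, numeric(1)),
    stringsAsFactors = FALSE
  )
}

# Offline example: a body in the same shape as the service response
sample_body <- '{"results": [{"alternatives": [{"confidence": 0.787, "transcript": "hi hi hi what is my name "}], "final": true}]}'
stt_transcript_parsed(sample_body)
```

With a live response, the input would be content(response, "text", encoding = "UTF-8"); supplying the encoding explicitly also avoids the "No encoding supplied" message.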
Make sure your audio/WAV file is in the working directory
## TESTS OK
response <- watson.speech_to_text.recognize("testvoice3.wav") # make sure your WAV is in the working directory; run getwd() to check
response # takes about 5 seconds
## Response [https://stream.watsonplatform.net/speech-to-text/api/v1/recognize]
## Date: 2018-09-06 20:41
## Status: 200
## Content-Type: application/json
## Size: 249 B
## {
## "results": [
## {
## "alternatives": [
## {
## "confidence": 0.787,
## "transcript": "hi hi hi what is my name "
## }
## ],
## "final": true
## ...
#content(response,"text") # raw results
transcript <- stt_transcript_only(content(response,"text"))
## No encoding supplied: defaulting to UTF-8.
transcript # the extracted core transcript (everything else was stripped out)
## [1] " hi hi hi what is my name "
With a confidence level of about 79%, the system accurately interpreted what I said!