Leveraging IBM’s Watson for Speech Recognition

Prerequisites

Before you begin, you will need:

An IBM Bluemix demo account
A dialog app
Credentials to that service (confirm that you can reach the service with curl, using some sample WAV files or another way of sending audio to the Watson STT service)
RStudio

Back to our R code. We start by loading the necessary libraries:

library(RCurl) # install.packages("RCurl")

## Loading required package: bitops

library(httr)

Then we enter our STT service credentials from Bluemix:

# "Speech-To-Text-Orange" service credentials
url <- "https://stream.watsonplatform.net/speech-to-text/api"
username <-"44300586-b32f-4f3e-ad64-bc226aff5e43" # you need your own - STT service credentials from bluemix
password <- "wcykvxsj4PbI"  # you need your own - STT service credentials from bluemix
username_password = paste(username,":",password,sep="")
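Hard-coding credentials in a script makes them easy to leak. As a minimal sketch, assuming you store them in environment variables instead, the helper below reads them at run time. The function name `get_stt_credentials` and the variable names `WATSON_STT_USERNAME` and `WATSON_STT_PASSWORD` are made up for this illustration and are not part of any IBM tooling.

```r
# Hypothetical helper: read the STT credentials from environment variables.
# The variable names are assumptions; use whatever names you set yourself.
get_stt_credentials <- function() {
  read_required <- function(name) {
    value <- Sys.getenv(name, unset = "")
    if (!nzchar(value)) {
      stop("Environment variable not set: ", name)
    }
    value
  }
  list(
    username = read_required("WATSON_STT_USERNAME"),
    password = read_required("WATSON_STT_PASSWORD")
  )
}

# Usage, once the variables are set (for example in ~/.Renviron):
# creds <- get_stt_credentials()
# username <- creds$username
# password <- creds$password
```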

Test curl and credentials against the API endpoint

# TEST CURL AND CREDS# API Endpoint  https://stream.watsonplatform.net/speech-to-text/api
# curl -u $USERNAME:$PASSWORD "https://stream.watsonplatform.net/speech-to-text/api/v1/models" # WORKS

Test connectivity and return models available

### Function to test connectivity and return the available models
watson.speech_to_text.getmodels <- function() {
  GET(url = paste(url, "/v1/models", sep = ""),
      authenticate(username, password))
}
watson.speech_to_text.getmodels() # returns a list of 10+ models

## Response [https://stream.watsonplatform.net/speech-to-text/api/v1/models]
##   Date: 2018-09-06 20:41
##   Status: 200
##   Content-Type: application/json;charset=utf-8
##   Size: 6.44 kB
## {
##    "models": [
##       {
##          "name": "pt-BR_NarrowbandModel", 
##          "language": "pt-BR", 
##          "url": "https://stream.watsonplatform.net/speech-to-text/api/v1...
##          "rate": 8000, 
##          "supported_features": {
##             "custom_language_model": false, 
##             "speaker_labels": false
## ...
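The printed response is truncated, so it can help to list just the model names. The sketch below assumes the jsonlite package is installed (httr depends on it) and parses a small sample that contains only the fields visible in the output above. The helper name `stt_model_names` is hypothetical.

```r
library(jsonlite)

# Hypothetical helper: return the model names found in the /v1/models JSON text
stt_model_names <- function(models_json) {
  parsed <- fromJSON(models_json, simplifyVector = FALSE)
  vapply(parsed$models, function(m) m$name, character(1))
}

# Sample built from the fields shown in the truncated output above
sample_models_json <- '{"models": [{"name": "pt-BR_NarrowbandModel",
  "language": "pt-BR", "rate": 8000,
  "supported_features": {"custom_language_model": false,
                         "speaker_labels": false}}]}'
stt_model_names(sample_models_json)
```

Against the live service, the same helper could be fed `content(watson.speech_to_text.getmodels(), "text", encoding = "UTF-8")`.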

Analyze an audio WAV file with the IBM Watson Speech to Text service

# command line call works: curl -u $USERNAME:$PASSWORD -H "content-type: audio/wav" --data-binary @"testvoice.wav" "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize"

### Function to analyze an audio WAV file with the IBM Watson Speech to Text service
watson.speech_to_text.recognize <- function(audio_file) {
  POST(url = paste(url, "/v1/recognize", sep = ""),
       authenticate(username, password),
       add_headers("Content-Type" = "audio/wav"),
       body = upload_file(audio_file)) # send the WAV file as the request body
}
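A missing file or a rejected request is easier to diagnose if it fails early. As a rough sketch, the hypothetical wrapper `recognize_checked` below checks that the file exists before uploading and then raises an R error for any non-success HTTP status using httr's `stop_for_status()`. The recognize function is passed in as an argument so the wrapper does not assume anything about how it is defined.

```r
# Hypothetical wrapper: fail early on a missing file or an HTTP error status.
# recognize_fn is any function that takes a file path and returns an httr response.
recognize_checked <- function(audio_file, recognize_fn) {
  if (!file.exists(audio_file)) {
    stop("Audio file not found: ", audio_file,
         " (working directory: ", getwd(), ")")
  }
  response <- recognize_fn(audio_file)
  httr::stop_for_status(response) # errors on 4xx/5xx responses
  response
}

# Usage with the function defined above:
# response <- recognize_checked("testvoice3.wav", watson.speech_to_text.recognize)
```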

Tidy up the STT response: export the transcript only

### Function to tidy up the STT response: export the transcript only
stt_transcript_only <- function(raw) 
{
  data <- as.data.frame(strsplit(as.character(raw),"\\n")) # one row per line of the response
  data <- data[c(7), ] # for now, keep only line 7 (the transcript); this discards the confidence info (may want it later)
  data <- paste(data) # convert to a plain character string (drops factor levels)
  data <- gsub("  ","",data) # remove double spaces; cannot use [[:punct:]] for everything here
  data <- gsub("\\\\","",data) # remove backslashes
  data <- gsub("\"","",data) # remove double quotes
  data <- gsub("transcript","",data) # remove the "transcript" key
  data <- gsub(":","",data) # remove the colon; later: improve this tidy step
  return(data) 
}
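Picking line 7 out of the response text is fragile, since it depends on the exact layout of the JSON. A sturdier sketch, assuming the jsonlite package is installed (httr depends on it), parses the JSON and reads the first alternative of the first result, keeping the confidence as well. The helper name `stt_parse_json` is hypothetical, and the sample string contains only the fields shown in this post.

```r
library(jsonlite)

# Hypothetical helper: extract transcript and confidence from the raw JSON text.
# Returns NA values when the service returned no results.
stt_parse_json <- function(raw_json) {
  parsed <- fromJSON(raw_json, simplifyVector = FALSE)
  if (length(parsed$results) == 0) {
    return(list(transcript = NA_character_, confidence = NA_real_))
  }
  best <- parsed$results[[1]]$alternatives[[1]]
  list(transcript = trimws(best$transcript), confidence = best$confidence)
}

# Sample with the same shape as the recognize response in this post
sample_json <- '{"results": [{"alternatives": [{"confidence": 0.787,
  "transcript": "hi hi hi what is my name "}], "final": true}]}'
result <- stt_parse_json(sample_json)
result$transcript
result$confidence
```

With a live response, the input would be `content(response, "text", encoding = "UTF-8")`.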

Make sure your audio/WAV file is in the working directory

## TESTS OK
response <- watson.speech_to_text.recognize("testvoice3.wav") # make sure your WAV is in the working directory; use getwd() to check
response # takes about 5 seconds

## Response [https://stream.watsonplatform.net/speech-to-text/api/v1/recognize]
##   Date: 2018-09-06 20:41
##   Status: 200
##   Content-Type: application/json
##   Size: 249 B
##  {
##    "results": [
##       {
##          "alternatives": [
##             {
##                "confidence": 0.787, 
##                "transcript": "hi hi hi what is my name "
##             }
##          ], 
##          "final": true
## ...

#content(response,"text") # raw results
transcript <- stt_transcript_only(content(response,"text"))

## No encoding supplied: defaulting to UTF-8.

transcript # the extracted transcription (everything else was stripped out)

## [1] "  hi hi hi what is my name "

With a confidence level of 79%, the system accurately interpreted what I said!
