With Shopify’s latest update, we can see that voice search keeps gaining popularity, and that trend is not going to end any time soon. As it grows, so does the demand for speech-to-text conversion APIs. If you have an app and have been looking for new features, you’ve come to the right place!
How do Speech to Text algorithms work?
As this is a complex topic, we won’t go too deep into it, but we’ll cover the basics so you can understand how it works. At the core of these algorithms are AI and neural networks, which allow fast learning and word recognition.
- We will assume that the ASR (Automatic Speech Recognition) system runs in the cloud. When we say a word, that word (the input) is transformed into a digital representation (bits), which is sent through the API to the extraction component.
- The extraction component is the part of the speech recognition system that filters out noise and extracts the important parts of the sequence, which are then sent to the main part of the system: the decoder.
- The decoder is the heart of the system. It is made of neural networks trained on millions of voices, words, and sentences in different languages and accents. For every piece of speech data in this network there is a corresponding text, so when our speech input enters the decoder, it is compared with similar speech inputs, and the speech data with the highest matching probability returns its text as the output.
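To make the decoder’s final step concrete, here is a toy sketch of picking the most probable transcript. The candidate list, its scores, and the name `pickBestTranscript` are made up for illustration; a real decoder scores its hypotheses with a trained neural network rather than a hard-coded array.

```javascript
// Toy illustration only: a real ASR decoder computes these scores with a neural network.
// `candidates` and `pickBestTranscript` are hypothetical names, not part of any library.
function pickBestTranscript(candidates) {
  if (candidates.length === 0) {
    return null; // nothing was recognized
  }
  // Keep the candidate with the highest probability and return its text.
  return candidates.reduce((best, current) =>
    current.probability > best.probability ? current : best
  ).text;
}

// Made-up scores for a single spoken phrase.
const candidates = [
  { text: 'red shoes', probability: 0.91 },
  { text: 'read shoes', probability: 0.06 },
  { text: 'red choose', probability: 0.03 },
];

console.log(pickBestTranscript(candidates)); // red shoes
```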
Which speech to text APIs are available for use?
Our goal is to use ready-made solutions and here is the list of just a few:
There are many more, including free ones, but for our purposes we decided to use the Google Speech API. There are important reasons behind this decision:
- Current open source speech recognition APIs lag behind those of the large companies, so open source libraries often aren’t supported across the major browsers. Every open source API that I came across lacks support in Safari, and that’s a large number of people we don’t want to lose in our shops. You can’t develop or upgrade a Shopify app with a feature that’s not supported on iOS.
- Google’s API is very accurate and reduces word errors.
- It can be configured to fit your application: you can set the sampling frequency and tune it for short or long searches.
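As a rough sketch of that configurability, the helper below builds a streaming request for either short search phrases or longer speech. `buildStreamingRequest` is a hypothetical helper. `model` and `singleUtterance` are fields of the Speech-to-Text v1 API as I understand them, so check the current reference for the model names available to your project.

```javascript
// Hypothetical helper: builds the request object passed to client.streamingRecognize().
// Assumes the v1 fields `model` and `singleUtterance`; verify them against the API reference.
function buildStreamingRequest({
  shortQuery = true,
  sampleRateHertz = 16000,
  languageCode = 'en-US',
} = {}) {
  return {
    config: {
      encoding: 'LINEAR16',
      sampleRateHertz,
      languageCode,
      // 'command_and_search' targets short queries; 'default' is the general-purpose model.
      model: shortQuery ? 'command_and_search' : 'default',
    },
    // For short searches, stop listening after the first utterance.
    singleUtterance: shortQuery,
    interimResults: false,
  };
}

console.log(buildStreamingRequest({ shortQuery: true }).config.model);
console.log(buildStreamingRequest({ shortQuery: false, sampleRateHertz: 44100 }).config);
```

Keeping the sample rate in one place matters because the recorder and the API request have to agree on it.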
How to integrate?
- Register for Google Cloud services so you’re able to use their API
- Create a Cloud Platform project
- Enable the Speech-to-Text API
- Create authentication credentials so you’re able to access the API
- Add the code
const recorder = require('node-record-lpcm16');
// Imports the Google Cloud client library
const speech = require('@google-cloud/speech');

// Creates a client
const client = new speech.SpeechClient();

const encoding = 'LINEAR16';
const sampleRateHertz = 16000;
const languageCode = 'en-US';

const request = {
  config: {
    encoding: encoding,
    sampleRateHertz: sampleRateHertz,
    languageCode: languageCode,
  },
  interimResults: false, // If you want interim results, set this to true
};

// Create a recognize stream
const recognizeStream = client
  .streamingRecognize(request)
  .on('error', console.error)
  .on('data', data =>
    process.stdout.write(
      data.results[0] && data.results[0].alternatives[0]
        ? `Transcription: ${data.results[0].alternatives[0].transcript}\n`
        : `\n\nReached transcription time limit, press Ctrl+C\n`
    )
  );

// Start recording and send the microphone input to the Speech API.
// Ensure SoX is installed, see https://www.npmjs.com/package/node-record-lpcm16#dependencies
recorder
  .record({
    sampleRate: sampleRateHertz, // node-record-lpcm16 1.x names this option "sampleRate"
    threshold: 0,
    verbose: false,
    recorder: 'sox', // Try also "arecord" or "rec"
    silence: '10.0',
  })
  .stream()
  .on('error', console.error)
  .pipe(recognizeStream);

console.log('Listening, press Ctrl+C to stop.');
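If you need the transcript as a value rather than console output, the check inside the `data` handler can be pulled into a small function. `extractTranscript` is a hypothetical helper name, and the sample object below only mimics the response fields the code above reads (`results`, `alternatives`, `transcript`).

```javascript
// Hypothetical helper: returns the top transcript from a streaming response, or null.
function extractTranscript(data) {
  const result = data && data.results && data.results[0];
  const alternative = result && result.alternatives && result.alternatives[0];
  return alternative ? alternative.transcript : null;
}

// Hand-written sample shaped like the fields used above, not a real API response.
const sample = { results: [{ alternatives: [{ transcript: 'red running shoes' }] }] };

console.log(extractTranscript(sample));
console.log(extractTranscript({ results: [] }));
```

Returning `null` for an empty result lets the caller decide what to do when nothing was recognized, instead of printing a message from inside the stream.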
Test it all!
That’s it!
In our case, we used Node for a short implementation of voice-to-text recognition, as many existing Shopify apps are written in Node, but you can use any programming language that fits your needs.
As the end product, this feature lets you combine the output of the previous code with Shopify API endpoints, which allows you to make apps and shops more interactive in ways you never thought possible. You just wait: 5 years from now, I’ll be writing a blog post about how to input speech telepathically, but until then, you’ll just have to speak your search out loud!
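As one hedged example of combining a transcript with a shop, the sketch below turns recognized speech into a storefront search URL. The shop domain is a placeholder, `buildSearchUrl` is a hypothetical helper, and it assumes the standard `/search?q=` storefront route; other Shopify endpoints would need their own request shape.

```javascript
// Hypothetical helper: turns recognized speech into a storefront search URL.
// Assumes the shop exposes the standard /search route with a `q` query parameter.
function buildSearchUrl(shopDomain, transcript) {
  const url = new URL('/search', `https://${shopDomain}`);
  // searchParams handles the URL encoding of the spoken phrase.
  url.searchParams.set('q', transcript.trim());
  return url.toString();
}

// 'example-shop.myshopify.com' is a placeholder domain.
console.log(buildSearchUrl('example-shop.myshopify.com', ' red running shoes '));
```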