Speech recognition (ASR / Speech-to-Text)
In short
Speech recognition (ASR, speech-to-text) automatically converts spoken language into written text. It is the first step that lets an AI phone assistant understand what a caller says.
From sound to text
An ASR system breaks down the audio signal, identifies sounds and words and assembles them into text. Modern models use neural networks and draw on context to tell similar-sounding words apart correctly.
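As a rough illustration of how context can separate similar-sounding words, here is a toy sketch. It is not how a neural ASR model works internally: the lookup table, its counts and the names `BIGRAM_COUNTS`, `pick_word` and `transcribe` are all made up for this example. It only shows the idea that the preceding word makes one candidate more plausible than another.

```python
# Toy illustration only: real ASR systems use neural networks, not a lookup table.
# Every count below is invented for this example.
BIGRAM_COUNTS = {
    ("want", "to"): 60,
    ("want", "two"): 5,
    ("a", "flight"): 40,
    ("a", "fright"): 1,
}


def pick_word(previous_word, candidates):
    """Return the candidate that most often follows previous_word in the toy counts."""
    return max(candidates, key=lambda word: BIGRAM_COUNTS.get((previous_word, word), 0))


def transcribe(slots):
    """Build a sentence from per-word candidate lists of similar-sounding options."""
    words = []
    previous = "<s>"  # hypothetical start-of-sentence marker
    for candidates in slots:
        word = pick_word(previous, candidates)
        words.append(word)
        previous = word
    return " ".join(words)


if __name__ == "__main__":
    heard = [["I"], ["want"], ["two", "to", "too"], ["book"], ["a"], ["fright", "flight"]]
    print(transcribe(heard))
```

With these invented counts, "to" wins after "want" and "flight" wins after "a", so the candidates assemble into a sensible sentence.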
Why quality matters
If the ASR system mishears the caller, even the best assistant will reply off the mark. Good speech recognition copes with background noise, accents and poor line quality, all of which matter especially over the phone.
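One common way to put a number on recognition quality is the word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the ASR output into the correct transcript, divided by the number of words in the correct transcript. The sketch below is a minimal version under simple assumptions (lowercasing and whitespace splitting only); the function name `word_error_rate` and the sample sentences are chosen for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two texts, divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion: reference word missing from output
                curr[j - 1] + 1,                  # insertion: extra word in output
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    said = "please cancel my appointment"
    heard = "please council my appointment"
    print(word_error_rate(said, heard))
```

Here one of four words is wrong, so the rate is one quarter. A single misheard word like this can be enough to change what the assistant thinks the caller wants.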
Frequently asked questions
Is speech recognition the same as speech synthesis?
No, it is the counterpart. ASR turns speech into text (speech-to-text); speech synthesis turns text into speech (text-to-speech).
Does speech recognition work with background noise?
Modern systems are robust against noise, but very loud noise or several people talking at once can reduce accuracy.

