Glossary

Latency

Delay between the end of the caller's speech and the start of the assistant's response. Under 700 ms feels natural; over 1500 ms feels broken. Made up of STT, LLM and TTS time plus the phone system's audio pipeline.

Latency in a telephony context is the time between the end of the caller’s utterance and the first syllable of the assistant’s response. It is additive: STT processing + LLM inference + TTS synthesis + the phone system’s audio pipeline.
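To make the additive model concrete, here is a minimal sketch in Python. The class name `TurnLatency`, its field names, and the sample millisecond values are illustrative assumptions, not measurements from any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TurnLatency:
    """Per-stage latency for one conversational turn, in milliseconds.

    Hypothetical structure for illustration; the stages mirror the
    four components named in the text.
    """
    stt_ms: float        # speech-to-text processing
    llm_ms: float        # LLM inference
    tts_ms: float        # text-to-speech synthesis
    telephony_ms: float  # phone system's audio pipeline

    def total_ms(self) -> float:
        # Latency is additive: the caller waits for every stage in turn.
        return self.stt_ms + self.llm_ms + self.tts_ms + self.telephony_ms


# Made-up sample values, chosen only to show the arithmetic.
turn = TurnLatency(stt_ms=150, llm_ms=300, tts_ms=120, telephony_ms=80)
print(turn.total_ms())  # 650
```

Because the stages add up, a saving in any one of them reduces the total by the same amount, which is why it helps to track each stage separately.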

Field thresholds: under 700 ms feels natural; 700–1500 ms is perceptible; above 1500 ms produces "hello, are you still there?" awkwardness. Streaming STT and streaming TTS are mandatory: batch processing waits for the complete utterance or the complete response before emitting anything, so it cannot meet these targets.
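The thresholds above can be expressed as a small classifier, for example to label turns in a monitoring dashboard. This is a sketch only: the function name `classify_latency` and the label strings are assumptions, and the boundaries are taken directly from the figures in the text.

```python
def classify_latency(total_ms: float) -> str:
    """Map an end-to-end turn latency to the field thresholds in the text.

    Hypothetical helper: the label names are illustrative.
    """
    if total_ms < 700:
        return "natural"      # under 700 ms
    if total_ms <= 1500:
        return "perceptible"  # 700 to 1500 ms
    return "broken"           # above 1500 ms


for sample_ms in (650, 1100, 1800):
    print(sample_ms, classify_latency(sample_ms))
```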

Optimisation starts with measurement in production: find out where the milliseconds go. Model size, inference region, the codec on the phone leg, and caching of frequent responses are the highest-leverage factors.
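One way to find out where the milliseconds go is to record per-stage timings for each turn and compare the medians. The sketch below assumes each turn is logged as a dictionary of stage name to milliseconds; the function name `slowest_stage`, the stage keys, and the sample data are all hypothetical.

```python
from statistics import median


def slowest_stage(turns: list[dict[str, float]]) -> tuple[str, float]:
    """Return the stage with the highest median latency, and that median.

    Assumes every turn dictionary uses the same stage keys.
    """
    stages = turns[0].keys()
    medians = {stage: median(t[stage] for t in turns) for stage in stages}
    name = max(medians, key=medians.get)
    return name, medians[name]


# Made-up timings in milliseconds, for illustration only.
sample_turns = [
    {"stt": 150, "llm": 400, "tts": 120, "telephony": 80},
    {"stt": 170, "llm": 900, "tts": 110, "telephony": 85},
    {"stt": 160, "llm": 500, "tts": 130, "telephony": 90},
]
print(slowest_stage(sample_turns))  # ('llm', 500)
```

The median is used here rather than the mean so that a single slow turn does not dominate the picture; in practice a higher percentile may matter more, since callers notice the worst turns.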
