Glossary

Latency

Delay between the end of the caller's speech and the assistant's response. Under 700 ms feels natural; over 1500 ms feels broken. Composed of STT, LLM, and TTS time plus the phone system's audio pipeline.

Latency in a telephony context is the time between the end of the caller’s utterance and the first syllable of the assistant’s response. It is additive: STT processing + LLM inference + TTS synthesis + the phone system’s audio pipeline.
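The additive model can be sketched as a simple per-turn budget. The class name, field names, and millisecond figures below are illustrative assumptions, not measurements from any real deployment.

```python
from dataclasses import dataclass


@dataclass
class LatencyBudget:
    """Per-turn latency components in milliseconds (all values hypothetical)."""

    stt_ms: float        # speech-to-text: time to finalise the caller's utterance
    llm_ms: float        # LLM inference: time until the first usable output
    tts_ms: float        # text-to-speech: time until the first audio is ready
    telephony_ms: float  # phone system audio pipeline (codec, buffering, network)

    def total_ms(self) -> float:
        # Latency is additive: every stage adds to the delay the caller hears.
        return self.stt_ms + self.llm_ms + self.tts_ms + self.telephony_ms


# Hypothetical figures chosen only to illustrate the sum.
budget = LatencyBudget(stt_ms=150, llm_ms=300, tts_ms=120, telephony_ms=80)
print(budget.total_ms())  # 650
```

Because the components add up, shaving time from any single stage lowers the total by the same amount, which is why a per-stage breakdown is more useful than a single end-to-end number.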

Field thresholds: under 700 ms feels natural; 700–1500 ms is perceptible; above 1500 ms produces "hello, are you still there?" awkwardness. Streaming STT and streaming TTS are mandatory: batch processing has to wait for the complete utterance or the complete response before it emits anything, so it cannot meet these targets.
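As a minimal sketch, the thresholds above can be encoded as a classifier for measured turn latencies. The function name and the returned labels are assumptions made for illustration; the boundaries follow the text (700 ms and 1500 ms both fall into the "perceptible" band).

```python
def classify_latency(latency_ms: float) -> str:
    """Map a measured response latency to the perceived-quality band."""
    if latency_ms < 0:
        raise ValueError("latency cannot be negative")
    if latency_ms < 700:
        return "natural"      # under 700 ms
    if latency_ms <= 1500:
        return "perceptible"  # 700-1500 ms
    return "broken"           # above 1500 ms


for sample_ms in (450, 700, 1500, 1800):
    print(sample_ms, classify_latency(sample_ms))
```

A classifier like this could be applied to per-call measurements to track what share of turns lands in each band.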

Optimisation starts with measurement in production: where do the milliseconds go? Model size, inference region, the codec on the phone leg, and caching of frequent responses are the highest-leverage factors.
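One way to find where the milliseconds go is to time each stage separately. The sketch below assumes a hypothetical `StageTimer` helper and uses `time.sleep` as a stand-in for real STT, LLM, and TTS calls; in production the same context manager would wrap the actual calls.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named pipeline stage (hypothetical helper)."""

    def __init__(self) -> None:
        self.durations_ms: dict[str, float] = {}

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record elapsed time even if the stage raises.
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.durations_ms[name] = self.durations_ms.get(name, 0.0) + elapsed_ms

    def slowest(self) -> str:
        # The stage with the largest accumulated time is the first candidate to optimise.
        return max(self.durations_ms, key=self.durations_ms.get)


timer = StageTimer()
with timer.stage("stt"):
    time.sleep(0.005)  # placeholder for a real STT call
with timer.stage("llm"):
    time.sleep(0.05)   # placeholder for a real LLM call
with timer.stage("tts"):
    time.sleep(0.005)  # placeholder for a real TTS call

print(timer.slowest())
```

With per-stage numbers in hand, the factors named above (model size, region, codec, caching) can be evaluated against the stage they actually affect.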

