Glossary

WER (Word Error Rate)

Standard STT accuracy metric: (insertions + deletions + substitutions) / total words in the reference transcript. German general speech ≈ 5 %, domain-specific German (medical/legal) ≈ 8–15 %.

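Word Error Rate (WER) is the standard metric for speech-to-text quality. It counts the word-level errors in the recognized text (insertions, deletions and substitutions) and divides them by the number of words in the reference transcription. A WER of 5 % means roughly one error per twenty reference words. Because insertions count as errors, WER can exceed 100 % on very poor output.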
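The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a production scorer: the function name word_error_rate and the normalization (lowercasing, whitespace tokenization) are assumptions made for the example, and real evaluations usually also normalize punctuation and numbers before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, computed row by row.
    # prev[j] = minimum edits to turn the first j hypothesis words
    # into the reference words processed so far.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing in hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("b" -> "x") and one deletion ("d") over four reference words.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Dividing by the reference length, not the hypothesis length, is what lets the value rise above 1.0 when the recognizer inserts many extra words.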

WER only becomes meaningful when measured on real calls from your own use case, not on the clean studio material a vendor publishes. Accents, telephone bandwidth (8 kHz), background noise and domain vocabulary (medication names, product SKUs) routinely push real-world WER to 2–3× the figure in vendor benchmarks.

In day-to-day operation, entity-level WER often matters more than global WER: is the patient name correct? Is the phone number right? A pipeline with slot validation ("I understood ‘Meier’, is that correct?") can deliver reliable outcomes even with a higher raw WER.
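One way to make the entity-level view concrete is to score only the slots that matter for the call outcome. The sketch below is an assumption-laden illustration: the slot names (patient_name, phone), the function entity_error_rate and the normalization rule are all hypothetical and not taken from any specific product or library.

```python
def normalize(value: str) -> str:
    # Assumed normalization: ignore case and all whitespace,
    # so "030 1234567" and "0301234567" compare as equal.
    return "".join(value.casefold().split())


def entity_error_rate(reference_slots: dict[str, str],
                      recognized_slots: dict[str, str]) -> float:
    """Share of reference slots whose recognized value is missing or wrong."""
    if not reference_slots:
        raise ValueError("reference_slots must not be empty")
    errors = sum(
        1
        for name, expected in reference_slots.items()
        if normalize(recognized_slots.get(name, "")) != normalize(expected)
    )
    return errors / len(reference_slots)


# Hypothetical call: the name was misheard, the phone number was correct.
reference = {"patient_name": "Meier", "phone": "030 1234567"}
recognized = {"patient_name": "Maier", "phone": "0301234567"}
print(entity_error_rate(reference, recognized))  # 0.5
```

A slot that fails this check is a natural candidate for a confirmation prompt of the kind quoted above, which is how a pipeline can recover from recognition errors before they affect the outcome.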

FAQ
What WER is good in a telephony context?
For standard German over the telephone channel, 8–12 % is realistic for modern models. Below 5 % is rare in telephony and usually only reachable on content close to the model's training domain.

