What WER is good in a telephony context?

For standard German over the telephone channel, 8–12 % is realistic for modern models. Below 5 % is rare in telephony and usually only reachable on near-training-domain content.

WER (Word Error Rate) — Glossary

Word Error Rate (WER) is the standard metric for speech-to-text quality. It measures the share of words in a reference transcription that are wrong due to insertions, deletions or substitutions. A WER of 5 % means roughly every twentieth word is incorrect.

WER only becomes meaningful when measured on real calls from your own use case — not on the clean studio material a vendor publishes. Accents, telephone bandwidth (8 kHz), background noise and domain vocabulary (medication names, product SKUs) routinely push real-world WER 2–3× higher than vendor benchmarks.

Operationally more important than global WER is often entity-level WER: is the patient name correct? Is the phone number right? A pipeline with slot validation ("I understood ‘Meier’, is that correct?") can ship reliable outcomes even with higher raw WER.

WER (Word Error Rate)

Next step

Cookies & Privacy