AI Phone Assistant A–Z

Glossary

Voice AI, telephony and GDPR terms — explained concisely for decision makers and engineering teams.

DPA (Data Processing Agreement)

Contract under Art. 28 GDPR between controller and processor. Defines the purpose, scope and safeguards of data processing. Mandatory whenever a SaaS provider processes personal data on behalf of its customers.

Related: GDPR, EU Data Residency

Barge-In

Ability of callers to interrupt the assistant mid-sentence. Considered a marker of natural conversation; typically implemented by keeping STT and voice-activity detection running while the assistant is speaking.

Related: Voice AI, Latency

Concurrent Calls

Number of phone calls that can be handled in parallel. Determines how the system scales under load, which is critical during campaigns or emergencies. Usually plan-dependent.

Related: Inbound vs. Outbound, SLA (Service Level Agreement)

CRM Integration

Connection between the phone assistant and a customer relationship management system (e.g. HubSpot, Salesforce, Pipedrive). Typically creates contacts and activities automatically after each call.

Related: Webhook

GDPR

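EU General Data Protection Regulation. Governs the processing of personal data in the EU. Requires a legal basis, purpose limitation and a DPA whenever a processor is involved. It does not mandate EU servers as such, but it restricts transfers to third countries (Chapter V), which is why EU hosting is often chosen for sensitive data.

Related: DPA (Data Processing Agreement), EU Data Residency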

EU Data Residency

Storage and processing of data exclusively in EU data centers. Reduces post-Schrems-II third-country transfer risk and is a precondition for many GDPR-compliant deployments.

Related: GDPR, DPA (Data Processing Agreement)

Inbound vs. Outbound

Inbound = assistant takes incoming calls (booking, support). Outbound = assistant places calls (confirmations, surveys). Compliance requirements differ between modes.

Related: AI Phone Assistant, Concurrent Calls

IVR (Interactive Voice Response)

Legacy voice menu system with keypad input ("press 1 for ..."). AI phone assistants replace IVR with free-form speech and intent recognition.

Related: AI Phone Assistant, Voice AI

AI Phone Assistant

Software that handles inbound and outbound calls autonomously — using speech-to-text, a language model for response generation and text-to-speech. Hands off to humans when needed.

Related: Voice AI, LLM (Large Language Model), TTS (Text-to-Speech)

Latency

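Delay between the end of caller speech and the start of the assistant's response. As a rule of thumb, under 700 ms feels natural and over 1500 ms feels broken. Composed mainly of STT, LLM and TTS time, plus network transit and end-of-speech detection.

Related: Barge-In, TTS (Text-to-Speech), STT / ASR (Speech-to-Text)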
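The latency definition above can be read as a simple per-turn budget. The following is a minimal sketch, not a measurement tool: the stage timings are invented illustrative values, and the "noticeable" label for the range between the two thresholds is an assumption added here.

```python
from dataclasses import dataclass


@dataclass
class TurnLatency:
    """Per-turn stage timings in milliseconds (illustrative values only)."""
    stt_ms: float
    llm_ms: float
    tts_ms: float

    def total_ms(self) -> float:
        # Sum of the three stages named in the glossary entry.
        return self.stt_ms + self.llm_ms + self.tts_ms


def rate(total_ms: float) -> str:
    """Classify a response delay using the 700 ms / 1500 ms rule of thumb."""
    if total_ms < 700:
        return "natural"
    if total_ms > 1500:
        return "broken"
    return "noticeable"  # assumed label for the in-between range


turn = TurnLatency(stt_ms=150, llm_ms=350, tts_ms=120)
print(turn.total_ms(), rate(turn.total_ms()))
```

Tracking the stages separately shows which component to optimize first when a turn exceeds the budget.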

LLM (Large Language Model)

Large language model (e.g. GPT-4, Claude, Llama) used to generate responses. In phone contexts it is often combined with RAG to access company-specific knowledge.

Related: RAG (Retrieval-Augmented Generation), NLU (Natural Language Understanding)

NLU (Natural Language Understanding)

Component that maps caller utterances to structured intents and entities ("appointment Tuesday 10am" → intent=book, slot=tue-10). Today usually handled by LLMs.

Related: LLM (Large Language Model), STT / ASR (Speech-to-Text)
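To make the intent/slot mapping concrete, here is a toy rule-based sketch of the example from the NLU entry. The `ParsedUtterance` type, the `parse` function and the keyword rules are hypothetical; production systems would use a trained model or an LLM rather than string matching.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedUtterance:
    intent: str
    slot: Optional[str]


DAYS = {
    "monday": "mon", "tuesday": "tue", "wednesday": "wed", "thursday": "thu",
    "friday": "fri", "saturday": "sat", "sunday": "sun",
}


def parse(utterance: str) -> ParsedUtterance:
    """Map free text to an intent and a day-hour slot using toy keyword rules."""
    text = utterance.lower()
    intent = "book" if "appointment" in text else "unknown"
    day = next((abbr for name, abbr in DAYS.items() if name in text), None)
    hour_match = re.search(r"\b(\d{1,2})\s*(am|pm)\b", text)
    slot = None
    if day and hour_match:
        hour = int(hour_match.group(1)) % 12  # 12am -> 0, 12pm -> 12 after the shift
        if hour_match.group(2) == "pm":
            hour += 12
        slot = f"{day}-{hour}"
    return ParsedUtterance(intent, slot)


print(parse("appointment Tuesday 10am"))
```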

RAG (Retrieval-Augmented Generation)

Architecture in which relevant documents are retrieved from a knowledge base and passed to the LLM before it answers. Enables up-to-date, company-specific responses without fine-tuning.

Related: LLM (Large Language Model), CRM Integration
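The retrieve-then-answer flow described for RAG can be sketched as follows. This is a deliberately simplified illustration: it ranks documents by keyword overlap, whereas real deployments usually use embedding-based search. The knowledge-base entries and the function names `retrieve` and `build_prompt` are made up for the example.

```python
import re
from typing import List, Set

# Hypothetical company knowledge base.
KNOWLEDGE_BASE = [
    "Opening hours: Monday to Friday, 8:00 to 18:00.",
    "Parking is available behind the building.",
    "Prescriptions can be collected the next working day.",
]


def tokenize(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, docs: List[str], k: int = 1) -> List[str]:
    """Return the k documents sharing the most words with the question."""
    q = tokenize(question)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, docs: List[str]) -> str:
    """Place the retrieved context in front of the question for the LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(question, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


print(build_prompt("What are your opening hours?", KNOWLEDGE_BASE))
```

The resulting prompt would then be sent to the LLM; that call is omitted because it depends on the provider.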

SIP Trunk

Internet-based phone line that routes calls for your numbers to the AI assistant. Built on SIP, the standard VoIP signalling protocol. Existing numbers are frequently ported from carriers such as Deutsche Telekom, Sipgate or Vodafone.

Related: Inbound vs. Outbound, Concurrent Calls

SLA (Service Level Agreement)

Contractually guaranteed service quality: availability (e.g. 99.9 %), response time, recovery time. Essential for business-critical deployments, often with penalties on breach.

Related: Concurrent Calls, EU Data Residency
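An availability percentage is easier to judge once it is converted into permitted downtime. The sketch below does that arithmetic; the 30-day period is an assumption, since actual SLAs define their own measurement window.

```python
def allowed_downtime_minutes(availability_percent: float, period_days: float = 30) -> float:
    """Minutes of downtime permitted by an availability target over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target} % over 30 days allows about {minutes:.1f} minutes of downtime")
```

For the 99.9 % example from the entry, this comes to roughly 43 minutes per 30 days.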

STT / ASR (Speech-to-Text)

Converts spoken language to text. Also called ASR (Automatic Speech Recognition). STT quality largely determines how well the assistant understands callers, so models tuned for the target language are essential.

Related: TTS (Text-to-Speech), NLU (Natural Language Understanding), Latency

TTS (Text-to-Speech)

Converts text into spoken audio. Modern neural TTS sounds nearly human. Engines differ in latency, language coverage and voice-cloning capability.

Related: STT / ASR (Speech-to-Text), Voice Cloning, Latency

Voice AI

Umbrella term for AI systems that understand and produce speech. Encompasses STT, NLU/LLM and TTS. AI phone assistants are a concrete application of Voice AI.

Related: AI Phone Assistant, LLM (Large Language Model)

Voice Cloning

Synthesis of a voice from a sample (typically 30 s–10 min). Enables consistent brand voice. Requires GDPR and consent-law review before deployment.

Related: TTS (Text-to-Speech), Voice AI

Webhook

HTTP callback that notifies a third-party system on call events (call ended, appointment booked). Most common integration technique alongside direct APIs.

Related: CRM Integration
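A receiving system typically checks that a webhook is authentic and then branches on the event type. The sketch below illustrates this with an HMAC-SHA256 signature check. The event names (`call.ended`, `appointment.booked`), the payload fields and the shared secret are hypothetical, and whether a given provider signs its webhooks this way is an assumption to verify against its documentation.

```python
import hashlib
import hmac
import json

SHARED_SECRET = b"example-secret"  # hypothetical; agreed with the provider in practice


def sign(body: bytes, secret: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def handle_webhook(body: bytes, signature: str, secret: bytes = SHARED_SECRET) -> str:
    """Verify the signature, then decide what to do with the call event."""
    if not hmac.compare_digest(sign(body, secret), signature):
        return "rejected"
    event = json.loads(body)
    if event.get("type") == "call.ended":
        return f"log call {event['call_id']}"
    if event.get("type") == "appointment.booked":
        return f"create CRM activity for {event['caller']}"
    return "ignored"


body = json.dumps({"type": "appointment.booked", "caller": "+49891234567"}).encode()
print(handle_webhook(body, sign(body, SHARED_SECRET)))
print(handle_webhook(body, "invalid-signature"))
```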

DNC List (Do-Not-Call)

List of phone numbers that may not be called outbound. In Germany this is effectively enforced via consent rules under the UWG; in Switzerland via the asterisk entry in the phone book (Art. 3 lit. u Swiss UWG).

Related: Opt-In, UWG (German Unfair Competition Act)

DPIA (Data Protection Impact Assessment)

Risk assessment under Art. 35 GDPR. Mandatory for high-risk processing (e.g. systematic call recording). Structured risk and mitigation analysis before go-live.

Related: GDPR, DPA (Data Processing Agreement)

E.164

ITU-T standard for the international phone-number format: a leading "+", the country code and at most 15 digits in total (e.g. +49891234567, often displayed as +49 89 1234567). The format commonly expected for SIP routing and webhook payloads.

Related: PSTN (Public Switched Telephone Network), Number Porting
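The E.164 rules can be checked with a short validator. This is a minimal sketch: the helper name `to_e164` is made up, it only strips spaces and hyphens used for display, and it does not convert national formats (such as a leading 0) into international ones.

```python
import re

# "+", a non-zero first digit, then up to 14 more digits (15 digits maximum).
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")


def to_e164(raw: str) -> str:
    """Strip display separators and validate the result against E.164."""
    compact = re.sub(r"[\s\-]", "", raw)
    if not E164_PATTERN.fullmatch(compact):
        raise ValueError(f"not a valid E.164 number: {raw!r}")
    return compact


print(to_e164("+49 89 1234567"))
```

Normalizing numbers once, at the system boundary, avoids mismatches when the same caller appears in SIP headers, webhook payloads and CRM records.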

EU AI Act

EU Regulation 2024/1689 governing AI systems. Tiered model from "minimal" to "unacceptable" risk. AI phone assistants typically classify as "limited risk" with a transparency duty (disclose AI to caller).

Related: GDPR, DPIA (Data Protection Impact Assessment)

Function Calling / Tool Use

LLM capability to emit structured function calls instead of free text (e.g. bookAppointment(tuesday, 10:00)). Foundation of reliable CRM/EHR integration in phone contexts.

Related: LLM (Large Language Model), CRM Integration
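On the application side, a structured function call is usually received as data, validated and routed to real code. The sketch below assumes a JSON shape with `name` and `arguments` keys, which is a common convention but not confirmed for any particular provider; `book_appointment` and `dispatch` are hypothetical names.

```python
import json
from datetime import datetime


def book_appointment(day: str, time: str) -> str:
    """Hypothetical booking function; a real one would call a calendar or CRM."""
    datetime.strptime(time, "%H:%M")  # raises ValueError on a malformed time
    return f"booked {day} {time}"


# Registry of functions the model is allowed to call.
TOOLS = {"bookAppointment": book_appointment}


def dispatch(tool_call_json: str) -> str:
    """Parse the model's structured output and invoke the matching function."""
    call = json.loads(tool_call_json)
    handler = TOOLS.get(call.get("name"))
    if handler is None:
        raise ValueError(f"unknown tool: {call.get('name')!r}")
    return handler(**call["arguments"])


model_output = '{"name": "bookAppointment", "arguments": {"day": "tuesday", "time": "10:00"}}'
print(dispatch(model_output))
```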

Handoff / Escalation

Controlled transfer from AI assistant to a human — typically on triage signals, complaints, or explicit caller request. Quality marker of any production integration.

Related: IVR (Interactive Voice Response), Intent Recognition

Intent Recognition

Classification of caller intent into predefined categories (booking, prescription, complaint, info). Traditionally a trained classifier; today usually an LLM with few-shot prompting.

Related: NLU (Natural Language Understanding), LLM (Large Language Model)

MOS (Mean Opinion Score)

Audio-quality rating on a 1–5 scale, originally from human listeners, today commonly via POLQA/PESQ algorithms. MOS ≥ 4.0 is considered telephony-grade.

Related: Latency, TTS (Text-to-Speech)

Number Porting

Carrying an existing phone number when switching providers. Legally guaranteed in Germany (§ 59 TKG). Typical lead time 5–15 working days depending on the losing carrier.

Related: SIP Trunk, E.164

Opt-In

Explicit, prior consent to calls or data processing. Required for outbound marketing calls (§ 7 (2) UWG in Germany). Must be documented, granular and revocable.

Related: GDPR, DNC List (Do-Not-Call)

Prompt Injection (Voice)

Attack technique where a caller tries to override the system prompt ("Ignore your instructions…"). Voice-specific hardening: tool allowlists, validation of tool-call arguments, and refusing tool calls when an anomaly is detected.

Related: LLM (Large Language Model), Function Calling / Tool Use
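The hardening measures named above can be combined into one guard that runs before any tool call is executed. This is a sketch under stated assumptions: the tool names, the permitted argument sets and the 64-character limit are invented for illustration, and a real deployment would add checks specific to its own tools.

```python
from typing import Dict, Set, Tuple

# Hypothetical allowlist: tool name -> argument names the tool may receive.
ALLOWED_ARGS: Dict[str, Set[str]] = {
    "bookAppointment": {"day", "time"},
    "cancelAppointment": {"booking_id"},
}


def guard_tool_call(name: str, arguments: dict) -> Tuple[bool, str]:
    """Return (allowed, reason); refuse anything off the allowlist or anomalous."""
    if name not in ALLOWED_ARGS:
        return False, "tool not on allowlist"
    unexpected = set(arguments) - ALLOWED_ARGS[name]
    if unexpected:
        return False, f"unexpected arguments: {sorted(unexpected)}"
    if any(not isinstance(v, str) or len(v) > 64 for v in arguments.values()):
        return False, "anomalous argument value"
    return True, "ok"


print(guard_tool_call("bookAppointment", {"day": "tuesday", "time": "10:00"}))
print(guard_tool_call("exportAllContacts", {}))
```

Because the guard sits outside the model, a caller who talks the LLM into requesting a forbidden action still cannot trigger it.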

PSTN (Public Switched Telephone Network)

The legacy landline network with central exchanges. AI assistants reach it via SIP trunks and gateways. Because PSTN numbers are addressed in E.164 format, PSTN connectivity is what lets the assistant reach, and be reached from, any phone worldwide.

Related: SIP Trunk, E.164

Realtime API

Streaming interfaces (e.g. OpenAI Realtime, Google Live API) that process audio directly, without the intermediate STT→text→TTS step. Can reduce latency to below 500 ms.

Related: LLM (Large Language Model), Latency

SSML (Speech Synthesis Markup Language)

XML markup for TTS: pronunciation, pauses, emphasis, phone-number breakdown. W3C standard. Essential for clean pronunciation of domain terms and foreign proper names.

Related: TTS (Text-to-Speech)
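As an illustration of SSML in practice, the sketch below assembles a short confirmation message with emphasis, a pause and a phone number read digit by digit. The helper name `confirmation_ssml` is made up, and support for `say-as` values such as `telephone` varies between TTS engines, so treat that attribute as an assumption to check against your engine's documentation.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def confirmation_ssml(name: str, phone: str) -> str:
    """Build an SSML document; user-supplied text is XML-escaped."""
    return (
        "<speak>"
        f"Thank you, <emphasis>{escape(name)}</emphasis>."
        '<break time="300ms"/>'
        "We will call you back on "
        f'<say-as interpret-as="telephone">{escape(phone)}</say-as>.'
        "</speak>"
    )


ssml = confirmation_ssml("Müller & Sons", "+49891234567")
ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed XML
print(ssml)
```

Escaping matters because names containing "&" or "<" would otherwise produce invalid markup that a TTS engine may reject.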

TKG (German Telecommunications Act)

German statute regulating telecommunications. Relevant for AI assistants: § 9a TKG on traffic data and § 59 TKG on number porting. Advertising calls are governed separately by § 7 UWG.

Related: GDPR, Opt-In

Turn-Taking

Management of speaker/listener role switches. Beyond raw barge-in: detection of pauses, backchannels ("mhm"), avoidance of double-talk. Key to conversational naturalness.

Related: Barge-In, VAD (Voice Activity Detection)

UWG (German Unfair Competition Act)

German statute governing, among other things, advertising calls. § 7 (2) UWG prohibits advertising calls to consumers without prior express consent. Fines of up to € 300,000 per violation.

Related: Opt-In, DNC List (Do-Not-Call)

VAD (Voice Activity Detection)

Detection of whether the audio currently contains speech or just silence/background noise. Prerequisite for barge-in, turn-taking and efficient STT (no processing during silence).

Related: STT / ASR (Speech-to-Text), Barge-In
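The simplest form of VAD compares the energy of each short audio frame with a threshold. The sketch below shows that idea only; production systems typically use trained models that cope far better with background noise. The 8 kHz sample rate, 20 ms frame size and 0.05 threshold are assumptions chosen for the example.

```python
import math
from typing import List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of samples in the range [-1, 1]."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech(samples: Sequence[float], frame_size: int = 160,
                  threshold: float = 0.05) -> List[bool]:
    """Flag each full frame as active (True) or silent (False)."""
    flags = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        flags.append(rms(frame) > threshold)
    return flags


# 160 samples at an assumed 8 kHz sample rate correspond to 20 ms.
silence = [0.0] * 160
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(160)]
print(detect_speech(silence + tone + silence))
```

Only frames flagged as active would be forwarded to STT, which is where the efficiency gain mentioned in the entry comes from.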

Voicemail Detection (AMD)

Detection of whether an outbound call was answered by a human or a voicemail box. Also called Answering Machine Detection (AMD). Detection speed trades off against accuracy (typically 90–97 %).

Related: Inbound vs. Outbound, TTS (Text-to-Speech)

WER (Word Error Rate)

Standard STT accuracy metric: (substitutions + deletions + insertions) divided by the number of words in the reference transcript. Lower is better. Typical values: German general speech ≈ 5 %, domain-specific German (medical/legal) ≈ 8–15 %.

Related: STT / ASR (Speech-to-Text), NLU (Natural Language Understanding)
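WER is computed as the word-level edit distance between the STT output and a reference transcript, divided by the reference length. The following is a minimal sketch using whitespace tokenization; real evaluations usually normalize casing, punctuation and numbers first, and the example sentences are invented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


reference = "book an appointment on tuesday"
hypothesis = "book appointment on thursday"
print(word_error_rate(reference, hypothesis))
```

In this example one word is dropped and one is substituted, so two errors over five reference words give a WER of 40 %. Because insertions count as errors, WER can exceed 100 %.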

As of 3 May 2026. Definitions are reviewed quarterly. · v2026-05-03