DPA (Data Processing Agreement)
Contract under Art. 28 GDPR between controller and processor. Defines purpose, scope and safeguards of data processing. Mandatory for any SaaS handling personal data.
Voice AI, telephony and GDPR terms — explained concisely for decision makers and engineering teams.
Contract under Art. 28 GDPR between controller and processor. Defines purpose, scope and safeguards of data processing. Mandatory for any SaaS handling personal data.
Ability of callers to interrupt the assistant mid-sentence. Considered a marker of natural conversation; implemented via parallel STT with voice-activity detection.
Number of phone calls handleable in parallel. Defines scaling under load — critical during campaigns or emergencies. Usually plan-dependent.
Connection between phone assistant and customer-relationship system (HubSpot, Salesforce, Pipedrive). Auto-creates contacts and activities after each call.
EU General Data Protection Regulation. Governs processing of personal data in the EU. Requires legal basis, purpose limitation, DPA on processing, and EU servers for sensitive data.
Storage and processing of data exclusively in EU data centers. Reduces post-Schrems-II third-country transfer risk and is a precondition for many GDPR-compliant deployments.
Inbound = assistant takes incoming calls (booking, support). Outbound = assistant places calls (confirmations, surveys). Compliance requirements differ between modes.
Legacy voice menu system with keypad input ("press 1 for ..."). AI phone assistants replace IVR with free-form speech and intent recognition.
Software that handles inbound and outbound calls autonomously — using speech-to-text, a language model for response generation and text-to-speech. Hands off to humans when needed.
Delay between end of caller speech and assistant response. Under 700 ms feels natural, over 1500 ms feels broken. Composed of STT, LLM and TTS time.
Large language model (GPT-4, Claude, Llama) used to generate responses. Combined with RAG in phone contexts to access company-specific knowledge.
Component that maps caller utterances to structured intents and entities ("appointment Tuesday 10am" → intent=book, slot=tue-10). Today usually handled by LLMs.
Architecture where the LLM fetches relevant documents from a knowledge base before answering. Enables up-to-date, company-specific responses without fine-tuning.
Internet-based phone line forwarding numbers to the AI assistant. Standard VoIP protocol. Frequently ported from existing carriers (Deutsche Telekom, Sipgate, Vodafone).
Contractually guaranteed service quality: availability (e.g. 99.9 %), response time, recovery time. Mandatory for business-critical deployments, often with penalties on breach.
Converts spoken language to text. Also called ASR (Automatic Speech Recognition). Quality drives understanding rate; specialized models per language are essential.
Converts text into spoken audio. Modern neural TTS sounds nearly human. Differs in latency, language coverage and voice-cloning capability.
Umbrella term for AI systems that understand and produce speech. Encompasses STT, NLU/LLM and TTS. AI phone assistants are a concrete application of Voice AI.
Synthesis of a voice from a sample (typically 30 s–10 min). Enables consistent brand voice. Requires GDPR and consent-law review before deployment.
HTTP callback that notifies a third-party system on call events (call ended, appointment booked). Most common integration technique alongside direct APIs.
List of phone numbers that may not be called outbound. In Germany effectively enforced via consent rules under UWG; in Switzerland the asterisk entry in the phone book (Art. 3 lit. u UWG/CH).
Risk assessment under Art. 35 GDPR. Mandatory for high-risk processing (e.g. systematic call recording). Structured risk and mitigation analysis before go-live.
ITU-T standard for international phone-number format with "+", country code and max 15 digits (e.g. +49 89 1234567). Mandatory format for SIP routing and webhook payloads.
EU Regulation 2024/1689 governing AI systems. Tiered model from "minimal" to "unacceptable" risk. AI phone assistants typically classify as "limited risk" with a transparency duty (disclose AI to caller).
LLM capability to emit structured function calls instead of free text (e.g. bookAppointment(tuesday, 10:00)). Foundation of reliable CRM/EHR integration in phone contexts.
Controlled transfer from AI assistant to a human — typically on triage signals, complaints, or explicit caller request. Quality marker of any production integration.
Classification of caller intent into predefined categories (booking, prescription, complaint, info). Classically a classifier; today usually LLM few-shot.
Audio-quality rating on a 1–5 scale, originally from human listeners, today commonly via POLQA/PESQ algorithms. MOS ≥ 4.0 is considered telephony-grade.
Carrying an existing phone number when switching providers. Legally guaranteed in Germany (§ 59 TKG). Typical lead time 5–15 working days depending on the losing carrier.
Explicit, prior consent to calls or data processing. Required for outbound marketing calls (§ 7 (2) UWG in Germany). Must be documented, granular and revocable.
Attack technique where a caller tries to override the system prompt ("Ignore your instructions…"). Voice-specific hardening: allowlists, tool-use validation, refuse tool calls on anomaly.
The legacy landline network with central exchanges. AI assistants reach it via SIP trunks and gateways. PSTN reachability is what makes E.164 delivery globally meaningful.
Streaming interfaces (e.g. OpenAI Realtime, Google Live API) that process audio directly — without the STT→text→TTS intermediate step. Reduces latency below 500 ms.
XML markup for TTS: pronunciation, pauses, emphasis, phone-number breakdown. W3C standard. Essential for clean pronunciation of domain terms and foreign proper names.
German statute regulating telecommunications. Relevant for AI assistants: § 7 UWG on advertising, § 9a TKG on traffic data, § 59 TKG on number porting.
Management of speaker/listener role switches. Beyond raw barge-in: detection of pauses, backchannels ("mhm"), avoidance of double-talk. Key to conversational naturalness.
German statute governing, among other things, advertising calls. § 7 (2) UWG prohibits cold calls to private individuals without explicit consent. Fines up to € 300,000 per violation.
Detection of whether the audio currently contains speech or just silence/background noise. Prerequisite for barge-in, turn-taking and efficient STT (no processing during silence).
Detection of whether an outbound call was answered by a human or a voicemail box. Also Answering Machine Detection. Latency and accuracy (typ. 90–97 %) are the trade-off.
Standard STT accuracy metric. Share of incorrectly recognized words (insertions + deletions + substitutions) / total words. German general speech ≈ 5 %, domain German (medical/legal) ≈ 8–15 %.
Connect these terms to concrete solutions for your industry.
We use cookies to provide you with the best possible experience on our website. Some of them are technically necessary, others help us improve the website.