Real-time Voice Interaction between AI for Service and Agent Platform¶
When integrating a voice channel with the Agent Platform through the Automation Node in AI for Service, two modes of voice interaction are available:
- Real-Time Voice
- ASR/TTS
Both modes determine how user audio is processed, how agent responses are generated, and how they're presented to the user in the form of audio or speech.
Real-Time Voice Interactions¶
Real-Time Voice enables natural, conversational interaction using multi-modal language models that support real-time voice input and output.
How It Works¶
- When Real-Time Voice is enabled in AI for Service, it must also be enabled in the Agent Platform for the selected Agentic App.
- The Agent Platform will use the configured real-time voice model to process user audio and generate audio responses.
- If Real-Time Voice isn't enabled on the Agent Platform side, the system can't process audio requests correctly and will return errors.
- When Real-Time Voice is disabled in the Automation Node on the AI for Service side, ASR/TTS will be used by default for audio responses.
TTS Streaming¶
TTS Streaming provides a hybrid approach in which the agent’s text output is streamed back progressively, and AI for Service converts it to speech using ASR/TTS.
How It Works¶
- If real-time voice is disabled, the system defaults to ASR-TTS, that converts text to speech.
- Agent Platform uses the text based LLM model to generate the answer and sends it back to the AI for Service Platform, which uses ASR/TTS to convert it to audio and present it to user.
- If TTS streaming is disabled, the system delivers the audio response to the user only after the complete output is generated.
- When TTS Streaming is turned on, the text output is streamed back to AI for Service as soon as it's generated. This reduces the latency involved in answering user queries.
- AI for Service converts streamed text to speech in real time using TTS engines (Deepgram, ElevenLabs).
Effect of Wait-time Experience Configured in Agent Platform¶
The agent platform allows admins to configure the wait-time experience to provide a better experience for users when processing responses is delayed. Note that it's not applicable to real time voice. Learn More.
- The wait-time experience configured in Agent Platform is triggered.
- After the threshold is reached, the Agent Platform sends a filler or system-generated message, as configured.
- The message is presented to the user via TTS as it's received by AI for Service. (if streaming is enabled).
Configurations¶
For real-time voice interactions:
- Enable Real Time Voice in the AI for Service Automation node.

- Ensure real-time voice is enabled in the Agentic App, and that an AI model supporting real-time voice is configured for it.

Note: Wait Time Experience in Agent Platform isn't available for real-time voice interactions.
For TTS:
