# Voice & Chat Integration Guide
Engram provides two integrated communication channels for interacting with AI agents:
| Channel | Endpoint | Model | Auth | Use Case |
|---|---|---|---|---|
| Chat | API Gateway | gpt-5.1-chat | API Key | Text-based conversations |
| Voice | Azure AI Services | gpt-realtime | DefaultAzureCredential | Real-time voice interactions |
## Security Model (NIST AI RMF Compliant)
| Level | Environment | Chat Auth | VoiceLive Auth |
|---|---|---|---|
| 1-2 | POC/Staging | API Key (APIM) | Azure CLI / DefaultAzureCredential |
| 3-5 | Production/Enterprise | API Key (APIM) | Managed Identity (DefaultAzureCredential) |
`DefaultAzureCredential` automatically selects the best credential for each environment (see the token sketch after this list):
- Local Dev: Azure CLI (`az login`)
- Azure Container Apps: Managed Identity
- AKS: Workload Identity
- VMs: System-assigned Managed Identity
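For example, a minimal token-acquisition sketch with `@azure/identity`; the `cognitiveservices.azure.com` scope is an assumption for Azure AI Services, so confirm it against your resource:

```javascript
// Minimal sketch: DefaultAzureCredential picks Azure CLI, Managed Identity,
// etc. automatically, as described in the list above.
import { DefaultAzureCredential } from '@azure/identity';

const credential = new DefaultAzureCredential();
const accessToken = await credential.getToken(
  'https://cognitiveservices.azure.com/.default' // assumed scope for Azure AI Services
);
// accessToken.token holds the bearer token; accessToken.expiresOnTimestamp its expiry.
```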
## Chat Integration (API Gateway)

### Configuration
The chat system uses an OpenAI-compatible API gateway for text-based agent interactions.
```bash
# Environment Variables
AZURE_AI_ENDPOINT=https://zimax-gw.azure-api.net/zimax/openai/v1
AZURE_AI_KEY=<your-api-key>
AZURE_AI_DEPLOYMENT=gpt-5.1-chat
```
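Because the gateway is OpenAI-compatible, it can also be exercised directly with a standard OpenAI client. A minimal sketch from Node using the `openai` package (the prompt is illustrative):

```javascript
// Call the OpenAI-compatible gateway using the environment variables above.
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: process.env.AZURE_AI_ENDPOINT,
  apiKey: process.env.AZURE_AI_KEY,
});

const completion = await client.chat.completions.create({
  model: process.env.AZURE_AI_DEPLOYMENT, // gpt-5.1-chat
  messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(completion.choices[0].message.content);
```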
### API Endpoints

#### REST Chat (Single Turn)

```http
POST /api/v1/chat
Content-Type: application/json

{
  "content": "What are the key requirements for this project?",
  "agent_id": "elena",
  "session_id": "optional-session-id"
}
```
Response:
```json
{
  "message_id": "uuid",
  "content": "Based on my analysis...",
  "agent_id": "elena",
  "agent_name": "Dr. Elena Vasquez",
  "timestamp": "2025-12-15T10:30:00Z",
  "session_id": "session-123"
}
```
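For illustration, a client-side sketch of the same call; the localhost URL matches the local development setup later in this guide, and it is assumed that sending the returned `session_id` on the next request continues the conversation:

```javascript
// Single-turn chat request against the REST endpoint above.
async function askAgent(content, agentId, sessionId) {
  const res = await fetch('http://localhost:8082/api/v1/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ content, agent_id: agentId, session_id: sessionId }),
  });
  const reply = await res.json();
  // Assumption: reusing reply.session_id keeps follow-up turns in the same session.
  return reply;
}
```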
#### WebSocket Chat (Streaming)

```javascript
// Connect to streaming chat
const ws = new WebSocket('ws://localhost:8082/api/v1/chat/ws/session-123');

// Send message
ws.send(JSON.stringify({
  type: 'message',
  content: 'Analyze project risks',
  agent_id: 'marcus'
}));

// Receive streaming response
ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  switch (data.type) {
    case 'typing':
      // Agent is typing
      break;
    case 'chunk':
      // Append text chunk: data.content
      break;
    case 'complete':
      // Response finished: data.message_id
      break;
  }
};
```
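To reassemble the stream, the `chunk` payloads can simply be concatenated until `complete` arrives; a small sketch that would replace the handler above:

```javascript
// Accumulate streaming chunks into the final message text.
let buffer = '';
ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.type === 'chunk') buffer += data.content;
  if (data.type === 'complete') {
    console.log(`Message ${data.message_id}:`, buffer);
    buffer = '';
  }
};
```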
### Agents
| Agent | ID | Expertise |
|---|---|---|
| Elena | elena | Business Analysis, Requirements Engineering |
| Marcus | marcus | Project Management, Risk Assessment |
## VoiceLive Integration (Real-time Voice)

### Configuration

VoiceLive provides real-time speech-to-speech conversations using Azure AI Services with NIST AI RMF-compliant authentication.
```bash
# Environment Variables
AZURE_VOICELIVE_ENDPOINT=https://zimax.services.ai.azure.com
AZURE_VOICELIVE_MODEL=gpt-realtime
AZURE_VOICELIVE_VOICE=en-US-Ava:DragonHDLatestNeural
MARCUS_VOICELIVE_VOICE=en-US-GuyNeural

# Optional: API key for POC/staging (not needed if using Azure CLI or Managed Identity)
# AZURE_VOICELIVE_KEY=<optional-for-poc-only>
```
### Authentication

VoiceLive uses `DefaultAzureCredential`, which automatically selects the appropriate credential:
| Environment | Credential Used |
|---|---|
| Local Development | Azure CLI (az login) |
| Azure Container Apps | Managed Identity |
| Azure Kubernetes Service | Workload Identity |
| Azure VMs | System Managed Identity |
| GitHub Actions | Federated Identity |
**Enterprise (Levels 3-5):** Managed Identity is required; no API keys in production.
### API Endpoints

#### Voice Configuration

```http
GET /api/v1/voice/config/{agent_id}
```
Response:
```json
{
  "agent_id": "elena",
  "voice_name": "en-US-Ava:DragonHDLatestNeural",
  "model": "gpt-realtime",
  "endpoint_configured": true
}
```
#### Voice Status

```http
GET /api/v1/voice/status
```
Response:
```json
{
  "voicelive_configured": true,
  "endpoint": "https://zimax.services.ai.azure.com...",
  "model": "gpt-realtime",
  "active_sessions": 0,
  "agents": {
    "elena": {"voice": "en-US-Ava:DragonHDLatestNeural"},
    "marcus": {"voice": "en-US-GuyNeural"}
  }
}
```
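A client can check this endpoint before opening the audio socket; a minimal sketch (the fallback behavior is illustrative):

```javascript
// Verify VoiceLive is configured before starting a voice session.
fetch('http://localhost:8082/api/v1/voice/status')
  .then((r) => r.json())
  .then((status) => {
    if (!status.voicelive_configured) {
      console.warn('VoiceLive not configured; fall back to text chat.');
    }
  });
```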
#### VoiceLive WebSocket (Real-time Audio)

```javascript
// Connect to VoiceLive
const ws = new WebSocket('ws://localhost:8082/api/v1/voice/voicelive/session-123');

ws.onopen = () => {
  // Set agent
  ws.send(JSON.stringify({ type: 'agent', agent_id: 'elena' }));
};

// Send audio (PCM16, 24kHz, mono)
function sendAudio(pcm16Base64) {
  ws.send(JSON.stringify({
    type: 'audio',
    data: pcm16Base64
  }));
}

// Switch agent mid-conversation
function switchAgent(agentId) {
  ws.send(JSON.stringify({
    type: 'agent',
    agent_id: agentId // 'elena' or 'marcus'
  }));
}

// Cancel current response
function cancelResponse() {
  ws.send(JSON.stringify({ type: 'cancel' }));
}

// Handle responses
ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  switch (data.type) {
    case 'transcription':
      // data.status: 'listening' | 'processing' | 'complete'
      // data.text: transcribed text (when complete)
      break;
    case 'audio':
      // data.data: base64 audio response
      // data.format: 'audio/pcm16'
      playAudio(data.data);
      break;
    case 'agent_switched':
      // data.agent_id: new active agent
      break;
    case 'error':
      // data.message: error description
      break;
  }
};
```
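The handler above calls `playAudio`, which is not defined in this guide. A minimal sketch of one possible implementation, assuming the 24 kHz mono PCM16 format described below; note it schedules each chunk immediately, so a production client would queue chunks to avoid overlap:

```javascript
// Decode a base64 PCM16 chunk and play it through the Web Audio API.
const playbackCtx = new AudioContext({ sampleRate: 24000 });

function playAudio(base64Pcm16) {
  const bytes = Uint8Array.from(atob(base64Pcm16), (c) => c.charCodeAt(0));
  const pcm16 = new Int16Array(bytes.buffer);
  const buffer = playbackCtx.createBuffer(1, pcm16.length, 24000); // mono, 24 kHz
  const channel = buffer.getChannelData(0);
  for (let i = 0; i < pcm16.length; i++) {
    channel[i] = pcm16[i] / 0x8000; // scale int16 to [-1, 1)
  }
  const src = playbackCtx.createBufferSource();
  src.buffer = buffer;
  src.connect(playbackCtx.destination);
  src.start(); // naive: plays immediately rather than queueing
}
```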
### Audio Format Requirements
| Property | Value |
|---|---|
| Format | PCM16 (signed 16-bit) |
| Sample Rate | 24,000 Hz (required for low latency) |
| Channels | Mono (1 channel) |
| Encoding | Base64 |
| Chunk Size | 1200 samples (50ms) |
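Producing audio in this format from the browser takes some glue code. A sketch using `getUserMedia` and a `ScriptProcessorNode` (deprecated but compact; an `AudioWorklet` is the modern replacement), buffering mic samples into exact 1200-sample chunks; browser support for `AudioContext({ sampleRate: 24000 })` is assumed:

```javascript
// Capture the microphone and stream 50 ms base64 PCM16 chunks over the socket.
async function startMicrophone(ws) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext({ sampleRate: 24000 }); // resample to 24 kHz
  const source = ctx.createMediaStreamSource(stream);
  const processor = ctx.createScriptProcessor(4096, 1, 1); // mono in/out
  let pending = new Float32Array(0);

  processor.onaudioprocess = (e) => {
    const input = e.inputBuffer.getChannelData(0);
    // Append the new samples to the pending buffer.
    const merged = new Float32Array(pending.length + input.length);
    merged.set(pending);
    merged.set(input, pending.length);
    pending = merged;

    // Emit fixed 1200-sample (50 ms) chunks, per the table above.
    while (pending.length >= 1200) {
      const chunk = pending.subarray(0, 1200);
      pending = pending.slice(1200);
      const pcm16 = new Int16Array(1200);
      for (let i = 0; i < 1200; i++) {
        const s = Math.max(-1, Math.min(1, chunk[i])); // clamp before conversion
        pcm16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
      }
      const b64 = btoa(String.fromCharCode(...new Uint8Array(pcm16.buffer)));
      ws.send(JSON.stringify({ type: 'audio', data: b64 }));
    }
  };

  source.connect(processor);
  processor.connect(ctx.destination); // keeps the processor running
}
```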
### Memory Enrichment
The Voice system integrates with memory in two directions:
#### 1. Pre-Session Context (Read)
Upon connection, the system retrieves up to 20 relevant facts from the user’s Zep memory graph and injects them into the agent’s system instructions. This gives the agent awareness of the user’s context (role, current projects, past conversations).
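For illustration only, a hedged sketch of this read path; `fetchUserFacts` and the prompt format are hypothetical stand-ins, not the actual Zep integration:

```javascript
// Hypothetical pre-session enrichment: fold up to 20 facts into the
// agent's system instructions. fetchUserFacts() is an illustrative helper,
// not a real API in this codebase.
async function buildSystemInstructions(userId, baseInstructions) {
  const facts = await fetchUserFacts(userId, { limit: 20 });
  const context = facts.map((f) => `- ${f}`).join('\n');
  return `${baseInstructions}\n\nKnown user context:\n${context}`;
}
```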
#### 2. Post-Session Persistence (Write) — v2 Architecture

> [!NOTE]
> VoiceLive v2 decouples audio streaming from memory enrichment. See the VoiceLive Configuration SOP for details.
In the evolved architecture:
```javascript
// Browser: On receiving final transcript
ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.type === 'transcription' && data.status === 'complete') {
    // Async fire-and-forget to memory
    fetch('/api/v1/memory/enrich', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        text: data.text,
        session_id: sessionId,
        speaker: data.speaker, // 'user' or 'assistant'
        agent_id: currentAgentId
      })
    });
  }
};
```
**Key principle:** Audio playback happens immediately. Memory enrichment flows behind it asynchronously. If memory persistence fails, the voice experience is unaffected.
### Voice Personalities
| Agent | Voice | Personality |
|---|---|---|
| Elena | en-US-Ava:DragonHDLatestNeural | Warm, measured, professional |
| Marcus | en-US-GuyNeural | Confident, energetic, direct |
## Frontend Integration

### VoiceChat Component
The frontend includes a ready-to-use VoiceChat component:
```jsx
import VoiceChat from './components/VoiceChat';

function App() {
  return (
    <VoiceChat
      agentId="elena"
      onMessage={(msg) => console.log('Voice message:', msg)}
      onVisemes={(visemes) => { /* Lip-sync data */ }}
      onStatusChange={(status) => console.log('Status:', status)}
    />
  );
}
```
### Push-to-Talk Usage
- Hold the voice button to speak
- Release to send audio
- Agent responds with synthesized speech
- Switch agents anytime during conversation
## Local Development

### Start Backend

```bash
cd backend

# Set environment variables (or use .env file)
export AZURE_AI_ENDPOINT=https://zimax-gw.azure-api.net/zimax/openai/v1
export AZURE_AI_KEY=<your-key>
export AZURE_AI_DEPLOYMENT=gpt-5.1-chat
export AZURE_VOICELIVE_ENDPOINT=https://zimax.services.ai.azure.com
export AZURE_VOICELIVE_KEY=<your-key>
export AZURE_VOICELIVE_MODEL=gpt-realtime

# Start server
uvicorn backend.api.main:app --host 0.0.0.0 --port 8082 --reload
```
### Start Frontend

```bash
cd frontend
export VITE_API_URL=http://localhost:8082
export VITE_WS_URL=ws://localhost:8082
npm run dev
```
### Test Chat

```bash
curl -X POST http://localhost:8082/api/v1/chat \
  -H "Content-Type: application/json" \
  -d '{"content": "Hello Elena!", "agent_id": "elena"}'
```
### Test Voice Status

```bash
curl http://localhost:8082/api/v1/voice/status
```
## Troubleshooting

### Chat Issues
| Issue | Solution |
|---|---|
| 401 Unauthorized | Verify `AZURE_AI_KEY` is set correctly |
| Model not found | Verify `AZURE_AI_DEPLOYMENT=gpt-5.1-chat` |
| Temperature error | The gateway may not support a custom temperature; omit the parameter |
### Voice Issues
| Issue | Solution |
|---|---|
| “VoiceLive not configured” | Set `AZURE_VOICELIVE_ENDPOINT` and authenticate (`az login`, Managed Identity, or `AZURE_VOICELIVE_KEY` for POC) |
| Connection failed | Verify the endpoint supports real-time audio |
| No audio response | Check microphone permissions in browser |
### WebSocket Connection

```javascript
// Debug WebSocket connection
ws.onerror = (error) => console.error('WebSocket error:', error);
ws.onclose = (event) => console.log('WebSocket closed:', event.code, event.reason);
```
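If connections drop repeatedly, a reconnect wrapper with backoff helps separate transient network problems from configuration errors; a generic sketch, not part of the shipped client:

```javascript
// Reconnect with capped exponential backoff.
function connectWithRetry(url, attempt = 0) {
  const ws = new WebSocket(url);
  ws.onopen = () => { attempt = 0; }; // reset backoff once connected
  ws.onclose = () => {
    const delay = Math.min(30000, 1000 * 2 ** attempt); // 1s, 2s, ... capped at 30s
    setTimeout(() => connectWithRetry(url, attempt + 1), delay);
  };
  return ws;
}
```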
## Related Documentation
- Architecture - Platform design and context schema
- Agents - Elena and Marcus agent details
- Local Testing Guide - Development setup