Voice & Chat Routing + Episode Metadata Fix

Document Version: 1.0
Last Updated: December 31, 2025
Status: Implementation Complete


Problem

A user reported that episodes and stories load correctly while logged in with a Google account, but two issues remain:

  1. Backend routing for voice and chat needs to be fixed
  2. Episode metadata needs to include user and project information correctly

Root Cause

  1. Chat WebSocket: used a hardcoded dev_security context instead of authenticating the user from the JWT token
  2. Voice persist_conversation_turn: used hardcoded user IDs instead of the authenticated user
  3. Episode metadata: user identity metadata (email, display_name) was not included consistently at every session creation point

Solution

1. Fixed Chat WebSocket Authentication

File: backend/api/routers/chat.py

Changes:

  • Extract JWT token from query parameter (like voice WebSocket does)
  • Validate token and extract user identity
  • Create SecurityContext with authenticated user
  • Pass authenticated user to connection manager

Before:

dev_security = SecurityContext(user_id="ws-user", tenant_id="ws-tenant", roles=[Role.ANALYST], scopes=["*"])
await manager.connect(websocket, session_id, dev_security)

After:

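# Extract the token from the query parameter
token_param = websocket.query_params.get("token")
# Validate token_param (validation elided here). The values used below
# (user_id, tenant_id, roles, scopes, email, display_name) come from the
# validated token, replacing the hardcoded dev values.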
security = SecurityContext(
    user_id=user_id,
    tenant_id=tenant_id,
    roles=roles,
    scopes=scopes,
    session_id=session_id,
    email=email,
    display_name=display_name,
)
await manager.connect(websocket, session_id, security)
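
The "After" snippet skips the step between reading the token and building the context. A minimal sketch of that step follows, assuming the validated token yields a dictionary of claims. The claim names (oid, tid, scp, preferred_username, name, roles), the validate callable, the security_from_token helper, and the simplified SecurityContext stand-in are all illustrative assumptions, not the project's actual code.

```python
from dataclasses import dataclass, field
from typing import Callable, Mapping, Optional


@dataclass
class SecurityContext:
    """Simplified stand-in for the real SecurityContext (hypothetical)."""
    user_id: str
    tenant_id: str
    roles: list = field(default_factory=list)
    scopes: list = field(default_factory=list)
    session_id: Optional[str] = None
    email: Optional[str] = None
    display_name: Optional[str] = None


def security_from_token(
    token: Optional[str],
    session_id: str,
    validate: Callable[[str], Mapping[str, object]],
) -> SecurityContext:
    """Build a SecurityContext from a JWT passed as ?token=...

    `validate` is a hypothetical callable that verifies the token and
    returns its claims; it is expected to raise on an invalid token.
    """
    if not token:
        # No token: refuse instead of falling back to a hardcoded dev user.
        raise PermissionError("missing token query parameter")
    claims = validate(token)
    return SecurityContext(
        user_id=str(claims["oid"]),      # assumed claim name
        tenant_id=str(claims["tid"]),    # assumed claim name
        roles=list(claims.get("roles", [])),
        scopes=str(claims.get("scp", "")).split(),
        session_id=session_id,
        email=claims.get("preferred_username"),
        display_name=claims.get("name"),
    )
```

In a WebSocket handler, a PermissionError raised here would typically be turned into closing the connection before manager.connect is called, so an unauthenticated client never reaches the connection manager.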

2. Fixed Voice persist_conversation_turn Authentication

File: backend/api/routers/voice.py

Changes:

  • Added user: SecurityContext = Depends(get_current_user) parameter
  • Use authenticated user instead of hardcoded user IDs
  • Include full user metadata in session creation

Before:

async def persist_conversation_turn(turn: ConversationTurn):
    # Hardcoded user IDs
    security = SecurityContext(user_id="voice-user", ...)

After:

async def persist_conversation_turn(turn: ConversationTurn, user: SecurityContext = Depends(get_current_user)):
    # Use authenticated user
    security = SecurityContext(
        user_id=user.user_id,
        tenant_id=user.tenant_id,
        email=user.email,
        display_name=user.display_name,
        ...
    )

3. Enhanced Episode Metadata

Files:

  • backend/api/routers/voice.py - Voice WebSocket and persist_conversation_turn
  • backend/memory/client.py - persist_conversation method

Changes:

  • Ensure all session creation includes user identity metadata:
    • tenant_id (always included)
    • email (if available)
    • display_name (if available)
  • Updated persist_conversation to include user metadata in episode metadata

Before:

session_metadata = {
    "turn_count": context.episodic.total_turns,
}

After:

session_metadata = {
    "turn_count": context.episodic.total_turns,
    "tenant_id": context.security.tenant_id,
}
if context.security.email:
    session_metadata["email"] = context.security.email
if context.security.display_name:
    session_metadata["display_name"] = context.security.display_name
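
Because the same fields must appear at every session creation point, one way to keep them consistent is a single shared helper. The sketch below is only an illustration: build_session_metadata is a hypothetical name, and the security argument is assumed to expose tenant_id, email, and display_name as attributes, as in the snippet above.

```python
from types import SimpleNamespace
from typing import Any, Dict


def build_session_metadata(security: Any, turn_count: int) -> Dict[str, Any]:
    """Return session metadata: tenant_id always, identity fields if set."""
    metadata: Dict[str, Any] = {
        "turn_count": turn_count,
        "tenant_id": security.tenant_id,
    }
    for key in ("email", "display_name"):
        value = getattr(security, key, None)
        if value:  # skip None and empty strings
            metadata[key] = value
    return metadata


# Example with a stand-in security object that has no display name
security = SimpleNamespace(
    tenant_id="tenant-1", email="user@example.com", display_name=None
)
print(build_session_metadata(security, turn_count=5))
# → {'turn_count': 5, 'tenant_id': 'tenant-1', 'email': 'user@example.com'}
```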

User Identity Flow

Chat Flow

  1. REST Endpoint (POST /api/v1/chat):
    • Uses get_current_user dependency → extracts JWT from Authorization header
    • Creates SecurityContext with authenticated user
    • Passes to get_or_create_session() → creates EnterpriseContext
    • enrich_context() ensures Zep session has full user metadata
    • persist_conversation() includes user metadata in episode metadata
  2. WebSocket Endpoint (WS /api/v1/chat/ws/{session_id}):
    • Extracts JWT token from query parameter (?token=...)
    • Validates token and creates SecurityContext with authenticated user
    • Passes to connection manager
    • Same flow as REST endpoint for memory operations

Voice Flow

  1. WebSocket Endpoint (WS /api/v1/voice/voicelive/{session_id}):
    • Extracts JWT token from query parameter (?token=...)
    • Validates token and creates SecurityContext with authenticated user
    • Creates EnterpriseContext with authenticated user
    • _ensure_memory_session() includes full user metadata
  2. Persist Turn Endpoint (POST /api/v1/voice/conversation/turn):
    • Uses get_current_user dependency → extracts JWT from Authorization header
    • Creates SecurityContext with authenticated user
    • Includes full user metadata in session creation
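
Both flows obtain the JWT from one of two places: the Authorization header for REST endpoints, or the token query parameter for WebSocket endpoints. The following sketch shows the two extraction paths using only the standard library. The function names are hypothetical, and the real get_current_user dependency and WebSocket handlers may differ.

```python
from typing import Mapping, Optional
from urllib.parse import parse_qs, urlsplit


def token_from_header(headers: Mapping[str, str]) -> Optional[str]:
    """Return the bearer token from an Authorization header, if present.

    Note: a plain dict lookup is case-sensitive; real frameworks expose
    case-insensitive header mappings.
    """
    value = headers.get("Authorization", "")
    scheme, _, token = value.partition(" ")
    if scheme.lower() == "bearer" and token.strip():
        return token.strip()
    return None


def token_from_query(url: str) -> Optional[str]:
    """Return the first `token` query parameter from a URL, if present."""
    values = parse_qs(urlsplit(url).query).get("token")
    return values[0] if values else None
```

Whichever path supplies the token, the validation step and the resulting SecurityContext should be the same, so that REST and WebSocket requests attribute episodes to the same user.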

Episode Metadata Structure

Episodes now include the following metadata:

{
  "session_id": "session-xxx",
  "user_id": "d240186f-f80e-4369-9296-57fef571cd93",
  "metadata": {
    "tenant_id": "6684288a-b805-4161-bf41-ba2121e51c90",
    "email": "derek.brent.moore@engramai.onmicrosoft.com",
    "display_name": "derek brent moore",
    "agent_id": "elena",
    "channel": "chat|voice|voice-direct",
    "turn_count": 5,
    "summary": "Conversation summary...",
    "topics": ["topic1", "topic2"]
  }
}
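
When verifying the fix, a small check can flag episodes that lack the identity fields this change is meant to guarantee. The sketch below assumes an episode has the shape shown above; the missing_identity_fields helper is hypothetical and not part of the codebase. Only tenant_id is treated as required inside metadata, since email and display_name are included only when available.

```python
from typing import Any, List, Mapping

REQUIRED_TOP_LEVEL = ("session_id", "user_id")
REQUIRED_METADATA = ("tenant_id",)  # always included per this fix


def missing_identity_fields(episode: Mapping[str, Any]) -> List[str]:
    """List the required identity fields that are absent or empty."""
    missing = [key for key in REQUIRED_TOP_LEVEL if not episode.get(key)]
    metadata = episode.get("metadata") or {}
    missing += [
        f"metadata.{key}" for key in REQUIRED_METADATA if not metadata.get(key)
    ]
    return missing
```

An empty list means the episode carries the required attribution; any entries name the fields to investigate.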

Project/Department Metadata (Future)

For enterprise boundaries, project/department information can be added:

  1. Custom JWT Claims: Add project_id and department_id as custom claims in Entra ID
  2. Extract in Auth Middleware: Read custom claims from JWT token
  3. Add to SecurityContext: Extend SecurityContext to include project_id and department_id
  4. Include in Metadata: Add to session metadata when creating episodes

Example:

# In auth.py - extract custom claims
project_id = token.get("project_id")
department_id = token.get("department_id")

# In SecurityContext
project_id: Optional[str] = None
department_id: Optional[str] = None

# In session metadata
if security.project_id:
    session_metadata["project_id"] = security.project_id
if security.department_id:
    session_metadata["department_id"] = security.department_id

Verification

Test Chat Authentication

# Get JWT token from browser console or MSAL
TOKEN="<your-jwt-token>"

# Test REST endpoint
curl -X POST https://api.engram.work/api/v1/chat \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"content": "hi", "session_id": "test-session"}'

# Test WebSocket (use wscat or similar)
wscat -c "wss://api.engram.work/api/v1/chat/ws/test-session?token=$TOKEN"

Test Voice Authentication

# Test WebSocket
wscat -c "wss://api.engram.work/api/v1/voice/voicelive/test-session?token=$TOKEN"

# Test persist turn endpoint
curl -X POST https://api.engram.work/api/v1/voice/conversation/turn \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "test-session",
    "agent_id": "elena",
    "role": "user",
    "content": "Hello"
  }'

Verify Episode Metadata

# List episodes (should show user's episodes only)
curl -X GET https://api.engram.work/api/v1/memory/episodes \
  -H "Authorization: Bearer $TOKEN"

# Get specific episode transcript
curl -X GET https://api.engram.work/api/v1/memory/episodes/test-session \
  -H "Authorization: Bearer $TOKEN"

Related Documentation

  • docs/architecture/user-identity-flow-comprehensive.md - Complete user identity flow
  • docs/architecture/security-context-enterprise-architecture.md - SecurityContext architecture
  • docs/troubleshooting/user-identity-consistency-fix.md - User identity consistency fixes
  • backend/api/middleware/auth.py - Authentication middleware
  • backend/core/context.py - SecurityContext definition

Summary

  • Chat WebSocket: now authenticates users from the JWT token in the query parameter
  • Voice persist_conversation_turn: now uses the authenticated user from SecurityContext
  • Episode Metadata: now consistently includes user_id and tenant_id, plus email and display_name when available
  • Project Metadata: not yet implemented; the approach is outlined, and it requires custom JWT claims and an extended SecurityContext

All voice and chat routes now authenticate users and include full user metadata in episodes, ensuring proper user attribution. Enforcing project/department boundaries additionally requires the custom JWT claims described under Project/Department Metadata (Future).