ADR-003: Priority Context Injection for Anti-Hallucination

Date: 2026-02-15 Status: Accepted Deciders: Jeff Mosley, Brian Moore

Context

When working on large, multi-session projects, LLMs often "forget" or contradict key architectural decisions made earlier in the conversation. This is a critical problem because:

  1. Context Window Limitations: Even with 128K+ context windows, older messages get truncated
  2. Hallucination Risk: LLMs may invent details that contradict established decisions
  3. Consistency Issues: Different sessions may suggest conflicting approaches to the same problem

Commercial tools like Augment and Cursor solve this with proprietary context engines that maintain "grounding" - ensuring the LLM always has access to critical context.

Decision

Implement a Priority Context Injection system that:

  1. Always injects first: Priority context is prepended to every LLM request when project_id is set
  2. Never truncated: Priority context is treated as sacrosanct - conversation history is truncated before priority context
  3. Authoritative: Marked explicitly as "do not contradict" guidance for the LLM
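
The "never truncated" rule above can be sketched as follows. This is a minimal illustration, not the actual implementation: the function name, the character-based budget, and the plain-string message format are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// buildPrompt sketches the "never truncated" rule: the priority context
// is kept whole, while conversation history is dropped oldest-first
// until the remainder fits the budget. Illustrative only; the real
// handler works on token counts and structured messages.
func buildPrompt(priority string, history []string, budget int) string {
	remaining := budget - len(priority)
	kept := []string{}
	// Walk history newest-first, keeping messages while they still fit.
	for i := len(history) - 1; i >= 0; i-- {
		if remaining < len(history[i]) {
			break
		}
		remaining -= len(history[i])
		kept = append([]string{history[i]}, kept...)
	}
	return priority + "\n" + strings.Join(kept, "\n")
}

func main() {
	priority := "# PRIORITY CONTEXT"
	history := []string{"oldest message", "middle message", "newest message"}
	// With a tight budget, the oldest message is dropped first; the
	// priority context always survives intact.
	fmt.Println(buildPrompt(priority, history, len(priority)+30))
}
```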

Priority Context Structure

# PRIORITY CONTEXT
## This context is authoritative and must not be contradicted.

### Project Brief
{project goals, constraints, tech stack}

### Key Decisions (Do Not Contradict)
1. **{Decision Title}**
   - Rationale: {why this decision was made}
   - Context: {additional context}
   - Category: {architecture/infrastructure/security/etc.}

Implementation

Database Schema (mem_decisions table):

  • id (UUID primary key)
  • project_id (references mem_projects)
  • title (decision summary)
  • rationale (why it was decided)
  • context (additional context)
  • category (architecture/infrastructure/security/etc.)
  • supersedes_id (for decision updates)
  • superseded (boolean, false by default)
  • created_at, updated_at timestamps

API Endpoints:

  • POST /api/v1/decisions - Create decision
  • GET /api/v1/projects/{id}/decisions - List project decisions
  • GET /api/v1/decisions/{id} - Get single decision
  • PATCH /api/v1/decisions/{id} - Update/supersede decision
  • GET /api/v1/projects/{id}/priority-context - Get formatted priority context

Chat Handler Integration:

  • Modified cloudChatHandler and cloudChatStreamHandler in cloud_llm.go
  • When project_id is set, calls memoryMgr.GetPriorityContext(ctx, projectID)
  • Prepends the priority context to the system prompt before the LLM call
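
A simplified sketch of that integration point follows. The interface and function names here are illustrative (and the signature drops the context.Context used by the real MemoryManager); only the names mentioned in this ADR come from the codebase.

```go
package main

import "fmt"

// PriorityContextSource stands in for the MemoryManager's
// GetPriorityContext; simplified, illustrative interface.
type PriorityContextSource interface {
	GetPriorityContext(projectID string) (string, error)
}

// withPriorityContext mirrors the handler behavior: when project_id is
// set, fetch the formatted priority context and prepend it to the
// system prompt; otherwise pass the prompt through unchanged.
// Failing open on memory errors is an assumption of this sketch.
func withPriorityContext(src PriorityContextSource, projectID, systemPrompt string) string {
	if projectID == "" {
		return systemPrompt
	}
	pc, err := src.GetPriorityContext(projectID)
	if err != nil || pc == "" {
		return systemPrompt // sketch choice: never block the chat on memory errors
	}
	return pc + "\n\n" + systemPrompt
}

// stubSource is a test double returning a fixed priority context.
type stubSource struct{ ctx string }

func (s stubSource) GetPriorityContext(string) (string, error) { return s.ctx, nil }

func main() {
	src := stubSource{ctx: "# PRIORITY CONTEXT\n## This context is authoritative and must not be contradicted."}
	fmt.Println(withPriorityContext(src, "proj-123", "You are a helpful assistant."))
}
```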

Embedding Strategy

Decision: Use GPU Ollama with snowflake-arctic-embed:335m instead of Voyage AI.

  • GPU: RTX 4080 SUPER at 192.168.10.56:11434
  • Model: snowflake-arctic-embed:335m (1024 dimensions, SOTA retrieval performance)
  • Latency: 90-150ms per embedding
  • Cost: $0 (local GPU)

This decision was made earlier in the project to avoid Voyage AI costs since we have a powerful GPU available.
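
Calling the GPU Ollama host for an embedding can be sketched as below. The request/response shapes follow Ollama's /api/embeddings endpoint as we understand it; the function names are illustrative, and main only builds the payload so the sketch runs without network access.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// embedRequest matches Ollama's /api/embeddings request payload.
type embedRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
}

// embedResponse holds the 1024-dimensional vector returned by
// snowflake-arctic-embed:335m.
type embedResponse struct {
	Embedding []float64 `json:"embedding"`
}

func embedRequestJSON(model, prompt string) ([]byte, error) {
	return json.Marshal(embedRequest{Model: model, Prompt: prompt})
}

// EmbedOllama posts to the GPU host. Not invoked in main, so the sketch
// compiles and runs without reaching 192.168.10.56.
func EmbedOllama(host, model, prompt string) ([]float64, error) {
	body, err := embedRequestJSON(model, prompt)
	if err != nil {
		return nil, err
	}
	resp, err := http.Post(host+"/api/embeddings", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var out embedResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return out.Embedding, nil
}

func main() {
	b, _ := embedRequestJSON("snowflake-arctic-embed:335m", "priority context injection")
	fmt.Println(string(b))
	// Real call: EmbedOllama("http://192.168.10.56:11434", "snowflake-arctic-embed:335m", text)
}
```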

Consequences

Positive

  1. Consistency: LLM always has access to key decisions, preventing contradictions
  2. Cost Savings: Using local GPU instead of Voyage AI saves money
  3. Reduced Hallucinations: Grounding context reduces invented details
  4. Better Multi-Session Experience: Projects maintain coherence across sessions

Negative

  1. Token Overhead: Priority context uses tokens that could be used for conversation
  2. Maintenance: Decisions need to be recorded and kept up-to-date
  3. Complexity: Additional database schema and API endpoints

Neutral

  1. Future Enhancement: Phase 3 will add RAG-based grounding (fetch actual code before LLM response)
  2. Decision Versioning: Superseded decisions are kept for audit trail

Implementation Details

Key Files Modified

  • cortex-api/cloud_llm.go - Added priority context injection to both handlers
  • cortex-api/main.go - Updated route registration to pass MemoryManager
  • cortex-api/memory_manager.go - Added decision CRUD and GetPriorityContext()
  • cortex-api/schema/session_projects_tasks.sql - Added mem_decisions table

Testing

Verified working by:

  1. Creating a test project
  2. Adding a decision about GPU Ollama vs Voyage AI
  3. Making a cloud chat request with project_id set
  4. Confirming the LLM response referenced the stored decision correctly

Deployment

Deployed to both API servers (192.168.11.132, 192.168.11.133) on 2026-02-15.

Alternatives Considered

  1. Full Context Caching: Store entire conversation history - Rejected due to token limits
  2. Voyage AI Embeddings: Cloud-based - Rejected due to cost, GPU is free
  3. Manual Context Injection: User pastes context - Rejected, too manual
  4. Session Summaries Only: Auto-generate summaries - Rejected, loses decision details

See Also