# ADR-003: Priority Context Injection for Anti-Hallucination

**Date:** 2026-02-15
**Status:** Accepted
**Deciders:** Jeff Mosley, Brian Moore

## Context
When working on large, multi-session projects, LLMs often "forget" or contradict key architectural decisions made earlier in the conversation. This is a critical problem because:
- Context Window Limitations: Even with 128K+ context windows, older messages get truncated
- Hallucination Risk: LLMs may invent details that contradict established decisions
- Consistency Issues: Different sessions may suggest conflicting approaches to the same problem
Commercial tools like Augment and Cursor solve this with proprietary context engines that maintain "grounding" - ensuring the LLM always has access to critical context.
## Decision
Implement a Priority Context Injection system that:
- Always injected first: Priority context is prepended to every LLM request when `project_id` is set
- Never truncated: Priority context is treated as sacrosanct; conversation history is truncated before priority context
- Authoritative: Marked explicitly as "do not contradict" guidance for the LLM
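The truncation-order rule above can be sketched as follows. `Message`, `buildPrompt`, and the word-count token budget are illustrative assumptions, not the actual cortex-api types:

```go
package main

import (
	"fmt"
	"strings"
)

// Message is a simplified chat message; the real types in cortex-api may differ.
type Message struct {
	Role    string
	Content string
}

// buildPrompt prepends the priority context and, when the budget is tight,
// drops the oldest history messages first. The priority context itself is
// never truncated. Tokens are approximated by word count for illustration.
func buildPrompt(priorityContext string, history []Message, budget int) []Message {
	used := len(strings.Fields(priorityContext))
	var kept []Message
	for i := len(history) - 1; i >= 0; i-- { // walk newest to oldest
		cost := len(strings.Fields(history[i].Content))
		if used+cost > budget {
			break // everything older is truncated
		}
		used += cost
		kept = append([]Message{history[i]}, kept...)
	}
	return append([]Message{{Role: "system", Content: priorityContext}}, kept...)
}

func main() {
	pc := "# PRIORITY CONTEXT\nUse GPU Ollama for embeddings."
	history := []Message{
		{"user", "an old message that can safely be dropped"},
		{"assistant", "ok"},
		{"user", "latest question"},
	}
	msgs := buildPrompt(pc, history, 12)
	fmt.Println(len(msgs), msgs[0].Role) // priority context survives as the first message
}
```

The key design point is the direction of the walk: history is consumed newest-first against whatever budget remains after the priority context is charged, so the context block can never be the thing that gets cut.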
### Priority Context Structure

```markdown
# PRIORITY CONTEXT
## This context is authoritative and must not be contradicted.
### Project Brief
{project goals, constraints, tech stack}
### Key Decisions (Do Not Contradict)
1. **{Decision Title}**
   - Rationale: {why this decision was made}
   - Context: {additional context}
   - Category: {architecture/infrastructure/security/etc.}
```
## Implementation

Database Schema (`mem_decisions` table):
- `id` (UUID primary key)
- `project_id` (references `mem_projects`)
- `title` (decision summary)
- `rationale` (why it was decided)
- `context` (additional context)
- `category` (architecture/infrastructure/security/etc.)
- `supersedes_id` (for decision updates)
- `superseded` (boolean, false by default)
- `created_at`, `updated_at` timestamps
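In Go, a row of this table and the supersede operation might map as sketched below. `MemDecision` and `supersede` are illustrative; the actual struct in `memory_manager.go` may be named and typed differently:

```go
package main

import (
	"fmt"
	"time"
)

// MemDecision is an illustrative mapping of the mem_decisions columns.
type MemDecision struct {
	ID           string // UUID primary key
	ProjectID    string // references mem_projects
	Title        string // decision summary
	Rationale    string // why it was decided
	Context      string // additional context
	Category     string // architecture/infrastructure/security/etc.
	SupersedesID string // empty unless this decision replaces an earlier one
	Superseded   bool   // false by default
	CreatedAt    time.Time
	UpdatedAt    time.Time
}

// supersede links a replacement decision to the one it retires; the old
// row stays in the table for the audit trail but is marked superseded,
// which excludes it from future priority context.
func supersede(old, replacement *MemDecision) {
	replacement.SupersedesID = old.ID
	old.Superseded = true
	old.UpdatedAt = time.Now()
}

func main() {
	old := &MemDecision{ID: "d1", Title: "Use Voyage AI embeddings"}
	repl := &MemDecision{ID: "d2", Title: "Use GPU Ollama embeddings"}
	supersede(old, repl)
	fmt.Println(old.Superseded, repl.SupersedesID) // true d1
}
```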
API Endpoints:
- `POST /api/v1/decisions` - Create decision
- `GET /api/v1/projects/{id}/decisions` - List project decisions
- `GET /api/v1/decisions/{id}` - Get single decision
- `PATCH /api/v1/decisions/{id}` - Update/supersede decision
- `GET /api/v1/projects/{id}/priority-context` - Get formatted priority context
Chat Handler Integration:
- Modified `cloudChatHandler` and `cloudChatStreamHandler` in `cloud_llm.go`
- When `project_id` is set, calls `memoryMgr.GetPriorityContext(ctx, projectID)`
- Prepends priority context to the system prompt before the LLM call
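The prepend step itself reduces to a small pure function. This is a sketch of the idea, not the code in `cloud_llm.go`; the empty-string check covers requests with no `project_id` or no stored decisions:

```go
package main

import "fmt"

// withPriorityContext prepends the priority context to the system prompt.
// In the real handler the first argument would come from
// memoryMgr.GetPriorityContext(ctx, projectID); an empty string (no
// project_id, or no decisions yet) leaves the prompt unchanged.
func withPriorityContext(priorityCtx, systemPrompt string) string {
	if priorityCtx == "" {
		return systemPrompt
	}
	return priorityCtx + "\n\n" + systemPrompt
}

func main() {
	fmt.Println(withPriorityContext("# PRIORITY CONTEXT\n...", "You are a helpful assistant."))
}
```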
## Embedding Strategy

Decision: Use GPU Ollama with `snowflake-arctic-embed:335m` instead of Voyage AI.
- GPU: RTX 4080 SUPER at `192.168.10.56:11434`
- Model: `snowflake-arctic-embed:335m` (1024 dimensions, strong retrieval benchmark performance)
- Latency: 90-150ms per embedding
- Cost: $0 (runs on the local GPU)
This decision was made earlier in the project to avoid Voyage AI costs since we have a powerful GPU available.
## Consequences

### Positive
- Consistency: LLM always has access to key decisions, preventing contradictions
- Cost Savings: Using local GPU instead of Voyage AI saves money
- Reduced Hallucinations: Grounding context reduces invented details
- Better Multi-Session Experience: Projects maintain coherence across sessions
### Negative
- Token Overhead: Priority context uses tokens that could be used for conversation
- Maintenance: Decisions need to be recorded and kept up-to-date
- Complexity: Additional database schema and API endpoints
### Neutral
- Future Enhancement: Phase 3 will add RAG-based grounding (fetch actual code before LLM response)
- Decision Versioning: Superseded decisions are kept for audit trail
## Implementation Details

### Key Files Modified
| File | Changes |
|---|---|
| `cortex-api/cloud_llm.go` | Added priority context injection to both handlers |
| `cortex-api/main.go` | Updated route registration to pass `MemoryManager` |
| `cortex-api/memory_manager.go` | Added decision CRUD and `GetPriorityContext()` |
| `cortex-api/schema/session_projects_tasks.sql` | Added `mem_decisions` table |
### Testing

Verified working by:
1. Creating a test project
2. Adding a decision about GPU Ollama vs Voyage AI
3. Making a cloud chat request with `project_id`
4. Confirming the LLM response referenced the stored decision correctly
## Deployment
Deployed to both API servers (192.168.11.132, 192.168.11.133) on 2026-02-15.
## Alternatives Considered

- Full Context Caching: Store the entire conversation history. Rejected due to token limits.
- Voyage AI Embeddings: Cloud-based embeddings. Rejected due to cost; the local GPU is free.
- Manual Context Injection: User pastes context by hand. Rejected as too manual.
- Session Summaries Only: Auto-generate summaries. Rejected because summaries lose decision details.