# ADR-003: Priority Context Injection for Anti-Hallucination

**Date:** 2026-02-15
**Status:** Accepted
**Deciders:** Jeff Mosley, Brian Moore

## Context
When working on large, multi-session projects, LLMs often "forget" or contradict key architectural decisions made earlier in the conversation. This is a critical problem because:
- Context Window Limitations: Even with 128K+ context windows, older messages get truncated
- Hallucination Risk: LLMs may invent details that contradict established decisions
- Consistency Issues: Different sessions may suggest conflicting approaches to the same problem
Commercial tools like Augment and Cursor solve this with proprietary context engines that maintain "grounding" - ensuring the LLM always has access to critical context.
## Decision
Implement a Priority Context Injection system that:
- Always injected first: Priority context is prepended to every LLM request when `project_id` is set
- Never truncated: Priority context is treated as sacrosanct; conversation history is truncated before priority context
- Authoritative: Marked explicitly as "do not contradict" guidance for the LLM
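The truncation-order rule above can be sketched as follows. `Message`, `buildPrompt`, and the word-count token budget are illustrative assumptions, not the actual cortex-api types:

```go
package main

import (
	"fmt"
	"strings"
)

// Message is a simplified chat message; the real types in cortex-api may differ.
type Message struct {
	Role    string
	Content string
}

// buildPrompt prepends the priority context and, when the budget is tight,
// drops the oldest history messages first. The priority context itself is
// never truncated. Tokens are approximated by word count for illustration.
func buildPrompt(priorityContext string, history []Message, budget int) []Message {
	used := len(strings.Fields(priorityContext))
	var kept []Message
	for i := len(history) - 1; i >= 0; i-- { // walk newest to oldest
		cost := len(strings.Fields(history[i].Content))
		if used+cost > budget {
			break // everything older is truncated
		}
		used += cost
		kept = append([]Message{history[i]}, kept...)
	}
	return append([]Message{{Role: "system", Content: priorityContext}}, kept...)
}

func main() {
	pc := "# PRIORITY CONTEXT\nUse GPU Ollama for embeddings."
	history := []Message{
		{"user", "an old message that can safely be dropped"},
		{"assistant", "ok"},
		{"user", "latest question"},
	}
	msgs := buildPrompt(pc, history, 12)
	fmt.Println(len(msgs), msgs[0].Role) // priority context survives as the first message
}
```

The key design point is the direction of the walk: history is consumed newest-first against whatever budget remains after the priority context is charged, so the context block can never be the thing that gets cut.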
### Priority Context Structure

```markdown
# PRIORITY CONTEXT
## This context is authoritative and must not be contradicted.
### Project Brief
{project goals, constraints, tech stack}
### Key Decisions (Do Not Contradict)
1. **{Decision Title}**
   - Rationale: {why this decision was made}
   - Context: {additional context}
   - Category: {architecture/infrastructure/security/etc.}
```
## Implementation

Database Schema (`mem_decisions` table):
- `id` (UUID primary key)
- `project_id` (references `mem_projects`)
- `title` (decision summary)
- `rationale` (why it was decided)
- `context` (additional context)
- `category` (architecture/infrastructure/security/etc.)
- `supersedes_id` (for decision updates)
- `superseded` (boolean, false by default)
- `created_at`, `updated_at` timestamps
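In Go, a row of this table and the supersede operation might map as sketched below. `MemDecision` and `supersede` are illustrative; the actual struct in `memory_manager.go` may be named and typed differently:

```go
package main

import (
	"fmt"
	"time"
)

// MemDecision is an illustrative mapping of the mem_decisions columns.
type MemDecision struct {
	ID           string // UUID primary key
	ProjectID    string // references mem_projects
	Title        string // decision summary
	Rationale    string // why it was decided
	Context      string // additional context
	Category     string // architecture/infrastructure/security/etc.
	SupersedesID string // empty unless this decision replaces an earlier one
	Superseded   bool   // false by default
	CreatedAt    time.Time
	UpdatedAt    time.Time
}

// supersede links a replacement decision to the one it retires; the old
// row stays in the table for the audit trail but is marked superseded,
// which excludes it from future priority context.
func supersede(old, replacement *MemDecision) {
	replacement.SupersedesID = old.ID
	old.Superseded = true
	old.UpdatedAt = time.Now()
}

func main() {
	old := &MemDecision{ID: "d1", Title: "Use Voyage AI embeddings"}
	repl := &MemDecision{ID: "d2", Title: "Use GPU Ollama embeddings"}
	supersede(old, repl)
	fmt.Println(old.Superseded, repl.SupersedesID) // true d1
}
```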
API Endpoints:
- `POST /api/v1/decisions` - Create decision
- `GET /api/v1/projects/{id}/decisions` - List project decisions
- `GET /api/v1/decisions/{id}` - Get single decision
- `PATCH /api/v1/decisions/{id}` - Update/supersede decision
- `GET /api/v1/projects/{id}/priority-context` - Get formatted priority context
Chat Handler Integration:
- Modified `cloudChatHandler` and `cloudChatStreamHandler` in `cloud_llm.go`
- When `project_id` is set, calls `memoryMgr.GetPriorityContext(ctx, projectID)`
- Prepends priority context to the system prompt before the LLM call
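The prepend step itself reduces to a small pure function. This is a sketch of the idea, not the code in `cloud_llm.go`; the empty-string check covers requests with no `project_id` or no stored decisions:

```go
package main

import "fmt"

// withPriorityContext prepends the priority context to the system prompt.
// In the real handler the first argument would come from
// memoryMgr.GetPriorityContext(ctx, projectID); an empty string (no
// project_id, or no decisions yet) leaves the prompt unchanged.
func withPriorityContext(priorityCtx, systemPrompt string) string {
	if priorityCtx == "" {
		return systemPrompt
	}
	return priorityCtx + "\n\n" + systemPrompt
}

func main() {
	fmt.Println(withPriorityContext("# PRIORITY CONTEXT\n...", "You are a helpful assistant."))
}
```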
## Embedding Strategy

Decision: Use GPU Ollama with `snowflake-arctic-embed:335m` instead of Voyage AI.
- GPU: RTX 4080 SUPER at `192.168.10.56:11434`
- Model: `snowflake-arctic-embed:335m` (1024 dimensions, strong retrieval benchmark performance)
- Latency: 90-150ms per embedding
- Cost: $0 (runs on the local GPU)
This decision was made earlier in the project to avoid Voyage AI costs since we have a powerful GPU available.
## Consequences

### Positive
- Consistency: LLM always has access to key decisions, preventing contradictions
- Cost Savings: Using local GPU instead of Voyage AI saves money
- Reduced Hallucinations: Grounding context reduces invented details
- Better Multi-Session Experience: Projects maintain coherence across sessions
### Negative
- Token Overhead: Priority context uses tokens that could be used for conversation
- Maintenance: Decisions need to be recorded and kept up-to-date
- Complexity: Additional database schema and API endpoints
### Neutral
- Future Enhancement: Phase 3 will add RAG-based grounding (fetch actual code before LLM response)
- Decision Versioning: Superseded decisions are kept for audit trail
## Implementation Details

### Key Files Modified
| File | Changes |
|---|---|
| `cortex-api/cloud_llm.go` | Added priority context injection to both handlers |
| `cortex-api/main.go` | Updated route registration to pass `MemoryManager` |
| `cortex-api/memory_manager.go` | Added decision CRUD and `GetPriorityContext()` |
| `cortex-api/schema/session_projects_tasks.sql` | Added `mem_decisions` table |
### Testing

Verified working by:
1. Creating a test project
2. Adding a decision about GPU Ollama vs Voyage AI
3. Making a cloud chat request with `project_id`
4. Confirming the LLM response referenced the stored decision correctly
## Deployment
Deployed to both API servers (192.168.11.132, 192.168.11.133) on 2026-02-15.
## Alternatives Considered

- Full Context Caching: Store the entire conversation history. Rejected due to token limits.
- Voyage AI Embeddings: Cloud-based embeddings. Rejected due to cost; the local GPU is free.
- Manual Context Injection: User pastes context by hand. Rejected as too manual.
- Session Summaries Only: Auto-generate summaries. Rejected because summaries lose decision details.