ADR-007: Refactoring - Long-Term Architecture for Shared Components¶

Status: ✅ COMPLETED
Date: 2026-02-16
Completed: 2026-02-17
Decision Makers: Brian Moore, AI Team

Implementation Status¶

Phase	Duration	Status	Commit
Phase 1: Create pkg/	~2 hours	✅ COMPLETE	8ade7a1
Phase 2.1: Migrate cortex-api	~1 hour	✅ COMPLETE	eb8ade6
Phase 2.2: Migrate cortex-indexer-v2	~1 hour	✅ COMPLETE	869a8b7
Phase 2.3: Migrate cortex-context	~2 hours	✅ COMPLETE	f76c57d
Phase 3: Thread-safety tests	~1 hour	✅ COMPLETE	2aa6019
Phase 4: Clean code polish	~30 min	✅ COMPLETE	(this commit)

Files Deleted (Code Reduction)¶

File	Lines	Replaced By
cortex-api/voyage_embeddings.go	158	pkg/embedder/voyage.go
cortex-api/voyage_reranker.go	376	pkg/reranker/voyage.go
cortex-indexer-v2/voyage_embedder.go	276	pkg/embedder/voyage.go
cortex-context/internal/embedder/*	929	pkg/embedder/*
cortex-context/internal/reranker/*	558	pkg/reranker/*
Total Deleted	2,297

Files Kept (Documented Fallbacks)¶

File	Lines	Purpose
cortex-api/gpu_embedder.go	189	GPU Ollama fallback (offline/dev)
cortex-indexer-v2/gpu_embedder.go	224	GPU Ollama fallback (offline/dev)
cortex-api/reranker.go	251	LLM reranker fallback (Voyage outages)

These files are kept as documented fallbacks with clear header comments explaining their purpose and when to use them. See "GPU Strategy" section below.

Net Code Reduction¶

Lines deleted: 2,297
Lines added (pkg/): ~600
Lines added (adapters): ~150
Net reduction: ~1,547 lines (~67% of the 2,297 deleted lines)

Context¶

The Cortex codebase has grown organically with multiple competing implementations:
- cortex-api - Main API server (module: cortex-api)
- cortex-context - MCP context engine (module: github.com/emshvac/cortex-context)
- cortex-indexer-v2 - Code indexer (module: cortex-indexer)

Problems:
1. Duplicated code - 3 embedder implementations, 2 reranker implementations (~2,200 lines)
2. Inconsistent module naming - Some use local names, some use GitHub paths
3. No code sharing - Can't import between modules due to naming inconsistency
4. Drift risk - Fixes in one place don't propagate to others

Decision¶

Create a Go workspace with shared packages to enable proper code sharing while maintaining separate deployable services.

Long-Term Architecture¶

cortex/                              # Go workspace root
├── go.work                          # Go workspace file
├── pkg/                             # SHARED PACKAGES (new)
│   ├── go.mod                       # module github.com/emshvac/cortex-pkg
│   ├── embedder/
│   │   ├── embedder.go              # Interface
│   │   ├── voyage.go                # Voyage AI (production)
│   │   ├── ollama.go                # Ollama (fallback/dev)
│   │   └── retry.go                 # Shared retry logic
│   ├── reranker/
│   │   ├── reranker.go              # Interface
│   │   └── voyage.go                # Voyage AI
│   └── ratelimit/
│       └── tokenbucket.go           # Thread-safe rate limiter
│
├── cortex-api/                      # API Server
│   ├── go.mod                       # module github.com/emshvac/cortex-api
│   └── (imports pkg/embedder, pkg/reranker)
│
├── cortex-context/                  # Context Engine (MCP Server)
│   ├── go.mod                       # module github.com/emshvac/cortex-context
│   └── (imports pkg/embedder, pkg/reranker)
│
└── cortex-indexer-v2/               # Code Indexer
    ├── go.mod                       # module github.com/emshvac/cortex-indexer
    └── (imports pkg/embedder)

Why Go Workspace?¶

Local development - Changes to pkg/ immediately available to all services
Proper versioning - Each service can pin to specific pkg version in prod
Clean separation - Shared code is explicit, not buried in /internal
No MCP latency - Direct function calls for embeddings, not network hops

Data Flow After Refactoring¶

┌─────────────────────────────────────────────────────────────────────┐
│                        cortex-api (API Server)                       │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────────────────┐  │
│  │ pkg/embedder│    │pkg/reranker │    │ MCP Client              │  │
│  │ (direct)    │    │ (direct)    │    │ (search orchestration)  │  │
│  └──────┬──────┘    └──────┬──────┘    └───────────┬─────────────┘  │
└─────────┼──────────────────┼───────────────────────┼────────────────┘
          │                  │                       │
          ▼                  ▼                       ▼
┌─────────────────┐  ┌─────────────────┐   ┌────────────────────────┐
│   Voyage AI     │  │   Voyage AI     │   │   cortex-context       │
│   Embeddings    │  │   Reranking     │   │   (MCP Server)         │
│   API           │  │   API           │   │   - Search orchestration│
└─────────────────┘  └─────────────────┘   │   - Query expansion    │
                                           │   - RSE extraction     │
                                           │   - Result packing     │
                                           └───────────┬────────────┘
                                                       │
                                                       ▼
                                           ┌────────────────────────┐
                                           │   Qdrant HA Cluster    │
                                           │   (Vector + BM25)      │
                                           └────────────────────────┘

Key insight: pkg/embedder and pkg/reranker are used directly (no network hop). MCP is only for complex search orchestration.

Refactoring Phases¶

Phase 1: Create Shared Package Structure (FOUNDATION)¶

Duration: ~2-3 hours
Risk: Low (additive change)

Create cortex/go.work workspace file
Create cortex/pkg/ with shared packages
Move canonical implementations from cortex-context to pkg/
Update cortex-context to import from pkg/
Verify all tests pass

# Create workspace
cd cortex
go work init
go work use ./cortex-api ./cortex-context ./cortex-indexer-v2 ./pkg

# Create pkg module
mkdir -p pkg/embedder pkg/reranker pkg/ratelimit
cd pkg && go mod init github.com/emshvac/cortex-pkg

Phase 2: Migrate Services to Shared Packages (CONSOLIDATION)¶

Duration: ~4-6 hours
Risk: Medium (breaking changes)

Service	Current Files	Migration
cortex-api	gpu_embedder.go, voyage_embeddings.go, reranker.go, voyage_reranker.go	Replace with `import "github.com/emshvac/cortex-pkg/embedder"`
cortex-indexer-v2	gpu_embedder.go, voyage_embedder.go	Replace with `import "github.com/emshvac/cortex-pkg/embedder"`
cortex-context	internal/embedder/, internal/reranker/	Move to pkg/, keep thin wrappers if needed

Files to delete after migration:

File	Lines	Reason
cortex-api/gpu_embedder.go	178	Replaced by pkg/embedder
cortex-api/voyage_embeddings.go	158	Replaced by pkg/embedder
cortex-api/reranker.go	236	Replaced by pkg/reranker
cortex-api/voyage_reranker.go	376	Replaced by pkg/reranker
cortex-indexer-v2/gpu_embedder.go	213	Replaced by pkg/embedder
cortex-indexer-v2/voyage_embedder.go	276	Replaced by pkg/embedder
Total	1,437

Phase 3: Fix Structural Issues (HARDENING)¶

Duration: ~2-3 hours
Risk: Low

Rate limiter consolidation - Single thread-safe implementation in pkg/ratelimit
HTTP client pooling - Shared client with proper connection reuse
Error handling standardization - Consistent error types across services
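
As a rough illustration of the HTTP client pooling item, the sketch below builds one `http.Client` that all Voyage AI callers could share so that idle connections are reused. The function name `NewSharedHTTPClient` and every limit in it are assumptions chosen for the example, not values taken from the Cortex code.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// NewSharedHTTPClient returns a single client meant to be reused by every
// caller, so idle connections are pooled instead of reopened per request.
// All limits below are illustrative assumptions, not project settings.
func NewSharedHTTPClient() *http.Client {
	transport := &http.Transport{
		MaxIdleConns:        100,              // total idle connections kept open
		MaxIdleConnsPerHost: 10,               // idle connections kept per host
		IdleConnTimeout:     90 * time.Second, // close idle connections after this
	}
	return &http.Client{
		Transport: transport,
		Timeout:   30 * time.Second, // upper bound for a whole request
	}
}

func main() {
	client := NewSharedHTTPClient()
	t := client.Transport.(*http.Transport)
	fmt.Println(t.MaxIdleConnsPerHost, client.Timeout)
}
```

Creating the client once and passing it to both the embedder and the reranker is what makes the pool effective; a new `http.Client` with a new transport per call would not share connections.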

Phase 4: Clean Code (POLISH)¶

Duration: ~1-2 hours
Risk: Low

Extract magic numbers to config (hardcoded IPs, timeouts, batch sizes)
Standardize logging patterns
Add comprehensive metrics to shared packages

Shared Package Structure (NEW)¶

`pkg/embedder/` - Embedding Generation¶

File	Purpose	Features
`embedder.go`	Interface definition	`Embedder`, `BatchEmbedder` interfaces
`voyage.go`	Voyage AI voyage-code-3	Rate limiting (300 RPM), batching (128), retry, float32
`ollama.go`	Ollama fallback	GPU/CPU inference, configurable model
`retry.go`	Shared retry logic	Exponential backoff, jitter
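
The table names the `Embedder` and `BatchEmbedder` interfaces and a batch size of 128 but does not show their shape. The following is a minimal sketch under stated assumptions: the method signatures and the helper `splitBatches` are hypothetical and only illustrate how inputs might be cut into batches before being sent.

```go
package main

import (
	"context"
	"fmt"
)

// Embedder and BatchEmbedder are named in the table; the method
// signatures here are assumptions made for illustration only.
type Embedder interface {
	Embed(ctx context.Context, text string) ([]float32, error)
}

type BatchEmbedder interface {
	Embedder
	EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)
}

// maxBatchSize mirrors the batch size of 128 listed for voyage.go.
const maxBatchSize = 128

// splitBatches (hypothetical helper) cuts texts into consecutive slices
// of at most size items, preserving order.
func splitBatches(texts []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var batches [][]string
	for start := 0; start < len(texts); start += size {
		end := start + size
		if end > len(texts) {
			end = len(texts)
		}
		batches = append(batches, texts[start:end])
	}
	return batches
}

func main() {
	texts := make([]string, 300)
	for _, batch := range splitBatches(texts, maxBatchSize) {
		fmt.Println(len(batch)) // sizes of the successive batches
	}
}
```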

`pkg/reranker/` - Cross-Encoder Reranking¶

File	Purpose	Features
`reranker.go`	Interface definition	`Reranker` interface, config types
`voyage.go`	Voyage AI rerank-2.5	Rate limiting, retry, truncation handling
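
Both Voyage clients list retry among their features, with exponential backoff and jitter as the shared strategy. One way this could look is sketched below; the names `backoffDelay` and `retry`, the 50% jitter ceiling, and the parameters are assumptions, not the contents of `retry.go`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// backoffDelay returns the wait before retry number attempt (0-based):
// base doubled attempt times, capped at maxDelay, plus up to 50% random jitter.
func backoffDelay(attempt int, base, maxDelay time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt && d < maxDelay; i++ {
		d *= 2
	}
	if d > maxDelay {
		d = maxDelay
	}
	jitter := time.Duration(rand.Int63n(int64(d)/2 + 1))
	return d + jitter
}

// retry calls fn until it succeeds, attempts run out, or ctx is cancelled.
func retry(ctx context.Context, attempts int, base, maxDelay time.Duration, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i == attempts-1 {
			break // no point sleeping after the final attempt
		}
		select {
		case <-time.After(backoffDelay(i, base, maxDelay)):
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return fmt.Errorf("after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := retry(context.Background(), 5, time.Millisecond, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errors.New("transient failure")
		}
		return nil
	})
	fmt.Println(calls, err)
}
```

The jitter spreads retries from concurrent callers so they do not all hit the API at the same instant after a shared failure.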

`pkg/ratelimit/` - Thread-Safe Rate Limiting¶

File	Purpose	Features
`tokenbucket.go`	Token bucket implementation	Thread-safe, configurable RPM, burst support
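
To make the listed features concrete (thread-safe, configurable RPM, burst support), here is a minimal token bucket sketch. The type `TokenBucket`, its fields, and `NewTokenBucket` are illustrative assumptions and are not taken from `pkg/ratelimit`.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// TokenBucket is a mutex-guarded token bucket (illustrative only).
type TokenBucket struct {
	mu         sync.Mutex
	capacity   float64 // burst size
	tokens     float64 // tokens currently available
	refillRate float64 // tokens added per second
	last       time.Time
}

// NewTokenBucket starts full, refilling at rpm requests per minute.
func NewTokenBucket(rpm, burst int) *TokenBucket {
	return &TokenBucket{
		capacity:   float64(burst),
		tokens:     float64(burst),
		refillRate: float64(rpm) / 60.0,
		last:       time.Now(),
	}
}

// Allow reports whether one request may proceed now, consuming a token if so.
func (b *TokenBucket) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()

	now := time.Now()
	b.tokens += now.Sub(b.last).Seconds() * b.refillRate
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.last = now

	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

func main() {
	limiter := NewTokenBucket(300, 5) // 300 RPM as listed for voyage.go; burst of 5 is assumed
	var wg sync.WaitGroup
	var mu sync.Mutex
	allowed := 0
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if limiter.Allow() {
				mu.Lock()
				allowed++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	fmt.Println("allowed:", allowed)
}
```

Running a program like this with `go run -race` is one way to check that concurrent callers do not race on the bucket state.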

GPU Strategy (PRESERVED FOR FUTURE)¶

Current Decision: GPU Ollama is NOT used for embeddings because:
1. Voyage AI voyage-code-3 achieves 97.6% recall vs 68.3% for local models
2. Cost ($0.06/M tokens) is acceptable for our scale
3. Eliminates model compatibility issues (same model for index + query)

Future GPU Use Cases:

Use Case	Model	Status	Notes
Query Expansion	qwen2.5-coder:7b	✅ ACTIVE	Generates 3 query variations
Local Embeddings (dev)	snowflake-arctic-embed:335m	📋 AVAILABLE	For offline/dev scenarios
Code Generation	deepseek-coder:6.7b	📋 AVAILABLE	Local inference option
Reranking (fallback)	qwen2.5-coder:7b	📋 AVAILABLE	If Voyage API is down

GPU Infrastructure:
- Windows machine: 192.168.10.56:11434 (RTX 4080 SUPER)
- Models available: snowflake-arctic-embed:335m, mxbai-embed-large, qwen2.5-coder:7b, deepseek-coder:6.7b

When to Revisit GPU Embeddings:
1. If Voyage AI costs exceed $50/month
2. If we need offline/air-gapped operation
3. If local model quality improves significantly
4. If we need lower latency (<50ms vs ~200ms)
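
The $50/month threshold can be translated into a token volume using the $0.06/M token price quoted earlier in this section. The helper name below is hypothetical; the calculation is plain division.

```go
package main

import "fmt"

// breakEvenTokensM returns how many million tokens per month fit within
// budgetUSD at a price of pricePerMillionUSD per million tokens.
func breakEvenTokensM(budgetUSD, pricePerMillionUSD float64) float64 {
	return budgetUSD / pricePerMillionUSD
}

func main() {
	// $50 budget at $0.06 per million tokens.
	fmt.Printf("%.0fM tokens/month\n", breakEvenTokensM(50, 0.06)) // prints "833M tokens/month"
}
```

In other words, the cost trigger corresponds to roughly 833 million embedded tokens per month at the quoted price.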

Migration Steps (Detailed)¶

Step 1: Create Go Workspace and pkg/ Module¶

cd cortex

# Create workspace file
cat > go.work << 'EOF'
go 1.24

use (
    ./cortex-api
    ./cortex-context
    ./cortex-indexer-v2
    ./pkg
)
EOF

# Create pkg module
mkdir -p pkg/embedder pkg/reranker pkg/ratelimit
cd pkg
go mod init github.com/emshvac/cortex-pkg
cd ..

Step 2: Copy Canonical Implementations to pkg/¶

# Copy from cortex-context (the best implementations)
cp cortex-context/internal/embedder/embedder.go pkg/embedder/
cp cortex-context/internal/embedder/voyage.go pkg/embedder/
cp cortex-context/internal/embedder/retry.go pkg/embedder/
cp cortex-context/internal/reranker/reranker.go pkg/reranker/
cp cortex-context/internal/reranker/voyage.go pkg/reranker/

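# Package clauses are unchanged (package embedder / package reranker), so no rewrite is needed.
# List any copied file whose package clause does not match (expect no output):
grep -L '^package embedder' pkg/embedder/*.go
grep -L '^package reranker' pkg/reranker/*.go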

Step 3: Create Rate Limiter in pkg/ratelimit¶

Extract and consolidate the token bucket implementations into a single, thread-safe version.

Step 4: Update cortex-api/main.go¶

// Replace local implementations with shared packages
import (
    "os"

    "github.com/emshvac/cortex-pkg/embedder"
    "github.com/emshvac/cortex-pkg/reranker"
)

// Initialize embedder
voyageEmbedder := embedder.NewVoyageEmbedder(embedder.VoyageConfig{
    APIKey: os.Getenv("VOYAGE_API_KEY"),
    Model:  "voyage-code-3",
})

// Initialize reranker
voyageReranker := reranker.NewVoyageReranker(reranker.VoyageConfig{
    APIKey: os.Getenv("VOYAGE_API_KEY"),
    Model:  "rerank-2.5",
})

Step 5: Update cortex-context to Use pkg/¶

// Replace internal imports
import (
    "github.com/emshvac/cortex-pkg/embedder"
    "github.com/emshvac/cortex-pkg/reranker"
)

Step 6: Update cortex-indexer-v2¶

import "github.com/emshvac/cortex-pkg/embedder"

// Replace GPUEmbedder/VoyageEmbedder with:
emb := embedder.NewVoyageEmbedder(embedder.VoyageConfig{...})

Step 7: Delete Redundant Files¶

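# Only after all services are migrated and tested
# As implemented, both gpu_embedder.go files and cortex-api/reranker.go were kept as documented fallbacks (see "Files Kept")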
rm cortex-api/gpu_embedder.go
rm cortex-api/voyage_embeddings.go
rm cortex-api/reranker.go
rm cortex-api/voyage_reranker.go
rm cortex-indexer-v2/gpu_embedder.go
rm cortex-indexer-v2/voyage_embedder.go

# Optionally remove cortex-context internal packages if fully migrated
# (or keep as thin wrappers for backward compatibility)

Step 8: Update go.mod Files¶

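# Add pkg dependency to each service
# (@latest is fetched from the published module, not from ./pkg; for local builds, go.work already resolves ./pkg)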
cd cortex-api && go get github.com/emshvac/cortex-pkg@latest
cd ../cortex-context && go get github.com/emshvac/cortex-pkg@latest
cd ../cortex-indexer-v2 && go get github.com/emshvac/cortex-pkg@latest

Step 9: Verify and Test¶

# Build all services
cd cortex
go work sync
go build ./cortex-api/...
go build ./cortex-context/...
go build ./cortex-indexer-v2/...

# Run tests
go test ./pkg/...
go test ./cortex-api/...
go test ./cortex-context/...
go test ./cortex-indexer-v2/...

Consequences¶

Positive¶

Single source of truth - One implementation for embeddings, reranking, rate limiting
Easier maintenance - Fix bugs in one place, propagates to all services
Consistent behavior - Same rate limiting, retry logic, error handling everywhere
Code reduction - ~1,400 lines of duplicated code planned for removal (actual net reduction: ~1,547 lines; see "Net Code Reduction")
Better testing - Shared packages get more thorough testing
Clear dependencies - Go workspace makes dependencies explicit

Negative¶

New package to maintain - cortex-pkg is a new artifact
Breaking change - Services need updates to import from pkg/
Coordination required - Changes to pkg/ affect all services

Risks¶

Build breakage during migration - Mitigated by Go workspace (local development works)
Version drift - Mitigated by workspace and semantic versioning
Circular dependencies - Avoid by keeping pkg/ dependency-free

Rollback Plan¶

During migration - Keep old files until new imports verified working
After migration - Deleted files preserved in git history:
```
git checkout HEAD~1 -- cortex-api/gpu_embedder.go
```
Full rollback - Revert to pre-refactoring commit and re-copy implementations

Success Criteria¶

All services build successfully with Go workspace
All tests pass
No duplicate embedder/reranker code (Voyage AI implementations consolidated)
Rate limiter is thread-safe (verified by race detector - commit 2aa6019)
CI/CD pipelines updated and passing (fixed in commit c7f136a)
Fallback implementations documented with clear header comments

Timeline (Actual)¶

Phase	Estimated	Actual	Status
Phase 1: Create pkg/	2-3 hours	~2 hours	✅ COMPLETE
Phase 2: Migrate services	4-6 hours	~4 hours	✅ COMPLETE
Phase 3: Hardening (tests)	2-3 hours	~1 hour	✅ COMPLETE
Phase 4: Polish	1-2 hours	~30 min	✅ COMPLETE
Total	9-14 hours	~7.5 hours	✅

Lessons Learned¶

Adapter pattern works well - cortex-api and cortex-indexer-v2 needed float64 while pkg/ uses float32. Thin adapter wrappers solved this cleanly.
Go workspace simplifies local dev - Changes to pkg/ immediately available without version bumps or go mod updates.
Type compatibility matters - Ensure return types match expectations before migrating. pkg/reranker.Result vs internal/reranker.RerankResult required updates to downstream code.
Keep fallbacks documented - GPU embedder and LLM reranker are valuable for offline/outage scenarios. Clear header comments prevent accidental deletion.
CI/CD catches issues early - Pipeline failure after Phase 2.3 revealed unused replace directives and workspace interference. Fixed before production impact.
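
The adapter mentioned in the first lesson can be sketched as follows. This is a minimal illustration, assuming a hypothetical `Float32Embedder` interface in place of the real pkg/embedder type; `Float64Adapter`, `widen`, and `fakeEmbedder` are names invented for the example.

```go
package main

import (
	"context"
	"fmt"
)

// Float32Embedder stands in for the shared pkg/embedder type
// (hypothetical signature).
type Float32Embedder interface {
	Embed(ctx context.Context, text string) ([]float32, error)
}

// widen converts a float32 vector to float64 element by element.
func widen(v []float32) []float64 {
	out := make([]float64, len(v))
	for i, x := range v {
		out[i] = float64(x)
	}
	return out
}

// Float64Adapter wraps a float32 embedder for callers that expect float64.
type Float64Adapter struct {
	inner Float32Embedder
}

// Embed delegates to the wrapped embedder and widens the result.
func (a Float64Adapter) Embed(ctx context.Context, text string) ([]float64, error) {
	v, err := a.inner.Embed(ctx, text)
	if err != nil {
		return nil, err
	}
	return widen(v), nil
}

// fakeEmbedder returns a fixed vector so the example runs without an API key.
type fakeEmbedder struct{}

func (fakeEmbedder) Embed(context.Context, string) ([]float32, error) {
	return []float32{0.5, -1.25}, nil
}

func main() {
	adapter := Float64Adapter{inner: fakeEmbedder{}}
	vec, err := adapter.Embed(context.Background(), "func main() {}")
	fmt.Println(vec, err)
}
```

Keeping the conversion in a thin wrapper lets the shared package stay on float32 while each service keeps its existing float64 call sites.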

Related Documents¶

ADR-002: Voyage AI Embeddings Strategy
ADR-005: Hybrid Search Weight Configuration
ADR-006: Incremental Indexing
cortex-context/CORTEX_CONTEXT_INTEGRATION.md

Files Deleted (Post-Migration)¶

File	Lines	Replaced By	Status
cortex-api/voyage_embeddings.go	158	pkg/embedder/voyage.go	✅ DELETED
cortex-api/voyage_reranker.go	376	pkg/reranker/voyage.go	✅ DELETED
cortex-indexer-v2/voyage_embedder.go	276	pkg/embedder/voyage.go	✅ DELETED
cortex-context/internal/embedder/*	929	pkg/embedder/*	✅ DELETED
cortex-context/internal/reranker/*	558	pkg/reranker/*	✅ DELETED
Total Deleted	2,297

Files Kept (Documented Fallbacks)¶

File	Lines	Reason
cortex-api/gpu_embedder.go	189	GPU Ollama fallback for offline/dev
cortex-indexer-v2/gpu_embedder.go	224	GPU Ollama fallback for offline/dev
cortex-api/reranker.go	251	LLM reranker fallback for Voyage outages

These files have been documented with clear header comments explaining: - Their fallback status (not primary implementation) - When to use them (offline, dev, outages) - Important caveats (model consistency for embeddings)
