ADR-007: Refactoring - Long-Term Architecture for Shared Components
Status: ✅ COMPLETED
Date: 2026-02-16
Completed: 2026-02-17
Decision Makers: Brian Moore, AI Team
Implementation Status
| Phase | Duration | Status | Commit |
|---|---|---|---|
| Phase 1: Create pkg/ | ~2 hours | ✅ COMPLETE | 8ade7a1 |
| Phase 2.1: Migrate cortex-api | ~1 hour | ✅ COMPLETE | eb8ade6 |
| Phase 2.2: Migrate cortex-indexer-v2 | ~1 hour | ✅ COMPLETE | 869a8b7 |
| Phase 2.3: Migrate cortex-context | ~2 hours | ✅ COMPLETE | f76c57d |
| Phase 3: Thread-safety tests | ~1 hour | ✅ COMPLETE | 2aa6019 |
| Phase 4: Clean code polish | ~30 min | ✅ COMPLETE | (this commit) |
Files Deleted (Code Reduction)
| File | Lines | Replaced By |
|---|---|---|
| cortex-api/voyage_embeddings.go | 158 | pkg/embedder/voyage.go |
| cortex-api/voyage_reranker.go | 376 | pkg/reranker/voyage.go |
| cortex-indexer-v2/voyage_embedder.go | 276 | pkg/embedder/voyage.go |
| cortex-context/internal/embedder/* | 929 | pkg/embedder/* |
| cortex-context/internal/reranker/* | 558 | pkg/reranker/* |
| Total Deleted | 2,297 | |
Files Kept (Documented Fallbacks)
| File | Lines | Purpose |
|---|---|---|
| cortex-api/gpu_embedder.go | 189 | GPU Ollama fallback (offline/dev) |
| cortex-indexer-v2/gpu_embedder.go | 224 | GPU Ollama fallback (offline/dev) |
| cortex-api/reranker.go | 251 | LLM reranker fallback (Voyage outages) |
These files are kept as documented fallbacks with clear header comments explaining their purpose and when to use them. See "GPU Strategy" section below.
Net Code Reduction
- Lines deleted: 2,297
- Lines added (pkg/): ~600
- Lines added (adapters): ~150
- Net reduction: ~1,547 lines (2,297 deleted minus ~750 added, roughly 67% of the deleted duplicated code)
Context
The Cortex codebase has grown organically with multiple competing implementations:
- cortex-api - Main API server (module: cortex-api)
- cortex-context - MCP context engine (module: github.com/emshvac/cortex-context)
- cortex-indexer-v2 - Code indexer (module: cortex-indexer)
Problems:
1. Duplicated code - 3 embedder implementations, 2 reranker implementations (~2,200 lines)
2. Inconsistent module naming - Some use local names, some use GitHub paths
3. No code sharing - Can't import between modules due to naming inconsistency
4. Drift risk - Fixes in one place don't propagate to others
Decision
Create a Go workspace with shared packages to enable proper code sharing while maintaining separate deployable services.
Long-Term Architecture
cortex/ # Go workspace root
├── go.work # Go workspace file
├── pkg/ # SHARED PACKAGES (new)
│ ├── go.mod # module github.com/emshvac/cortex-pkg
│ ├── embedder/
│ │ ├── embedder.go # Interface
│ │ ├── voyage.go # Voyage AI (production)
│ │ ├── ollama.go # Ollama (fallback/dev)
│ │ └── retry.go # Shared retry logic
│ ├── reranker/
│ │ ├── reranker.go # Interface
│ │ └── voyage.go # Voyage AI
│ └── ratelimit/
│ └── tokenbucket.go # Thread-safe rate limiter
│
├── cortex-api/ # API Server
│ ├── go.mod # module github.com/emshvac/cortex-api
│ └── (imports pkg/embedder, pkg/reranker)
│
├── cortex-context/ # Context Engine (MCP Server)
│ ├── go.mod # module github.com/emshvac/cortex-context
│ └── (imports pkg/embedder, pkg/reranker)
│
└── cortex-indexer-v2/ # Code Indexer
  ├── go.mod # module github.com/emshvac/cortex-indexer
  └── (imports pkg/embedder)
Why Go Workspace?
- Local development - Changes to pkg/ are immediately available to all services
- Proper versioning - Each service can pin to a specific pkg version in prod
- Clean separation - Shared code is explicit, not buried in /internal
- No MCP latency - Direct function calls for embeddings, not network hops
Data Flow After Refactoring
┌─────────────────────────────────────────────────────────────────────┐
│ cortex-api (API Server) │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐ │
│ │ pkg/embedder│ │pkg/reranker │ │ MCP Client │ │
│ │ (direct) │ │ (direct) │ │ (search orchestration) │ │
│ └──────┬──────┘ └──────┬──────┘ └───────────┬─────────────┘ │
└─────────┼──────────────────┼───────────────────────┼────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌────────────────────────┐
│ Voyage AI │ │ Voyage AI │ │ cortex-context │
│ Embeddings │ │ Reranking │ │ (MCP Server) │
│ API │ │ API │ │ - Search orchestration│
└─────────────────┘ └─────────────────┘ │ - Query expansion │
│ - RSE extraction │
│ - Result packing │
└───────────┬────────────┘
│
▼
┌────────────────────────┐
│ Qdrant HA Cluster │
│ (Vector + BM25) │
└────────────────────────┘
Key insight: pkg/embedder and pkg/reranker are used directly (no network hop). MCP is only for complex search orchestration.
Refactoring Phases
Phase 1: Create Shared Package Structure (FOUNDATION)
Duration: ~2-3 hours
Risk: Low (additive change)
- Create cortex/go.work workspace file
- Create cortex/pkg/ with shared packages
- Move canonical implementations from cortex-context to pkg/
- Update cortex-context to import from pkg/
- Verify all tests pass
# Create workspace
cd cortex
go work init
go work use ./cortex-api ./cortex-context ./cortex-indexer-v2 ./pkg
# Create pkg module
mkdir -p pkg/embedder pkg/reranker pkg/ratelimit
cd pkg && go mod init github.com/emshvac/cortex-pkg
Phase 2: Migrate Services to Shared Packages (CONSOLIDATION)
Duration: ~4-6 hours
Risk: Medium (breaking changes)
| Service | Current Files | Migration |
|---|---|---|
| cortex-api | gpu_embedder.go, voyage_embeddings.go, reranker.go, voyage_reranker.go | Replace with import "github.com/emshvac/cortex-pkg/embedder" |
| cortex-indexer-v2 | gpu_embedder.go, voyage_embedder.go | Replace with import "github.com/emshvac/cortex-pkg/embedder" |
| cortex-context | internal/embedder/, internal/reranker/ | Move to pkg/, keep thin wrappers if needed |
Files to delete after migration:
| File | Lines | Reason |
|---|---|---|
| cortex-api/gpu_embedder.go | 178 | Replaced by pkg/embedder |
| cortex-api/voyage_embeddings.go | 158 | Replaced by pkg/embedder |
| cortex-api/reranker.go | 236 | Replaced by pkg/reranker |
| cortex-api/voyage_reranker.go | 376 | Replaced by pkg/reranker |
| cortex-indexer-v2/gpu_embedder.go | 213 | Replaced by pkg/embedder |
| cortex-indexer-v2/voyage_embedder.go | 276 | Replaced by pkg/embedder |
| Total | 1,437 | |
Phase 3: Fix Structural Issues (HARDENING)
Duration: ~2-3 hours
Risk: Low
- Rate limiter consolidation - Single thread-safe implementation in pkg/ratelimit
- HTTP client pooling - Shared client with proper connection reuse
- Error handling standardization - Consistent error types across services
Phase 4: Clean Code (POLISH)
Duration: ~1-2 hours
Risk: Low
- Extract magic numbers to config (hardcoded IPs, timeouts, batch sizes)
- Standardize logging patterns
- Add comprehensive metrics to shared packages
Shared Package Structure (NEW)
pkg/embedder/ - Embedding Generation
| File | Purpose | Features |
|---|---|---|
| embedder.go | Interface definition | Embedder, BatchEmbedder interfaces |
| voyage.go | Voyage AI voyage-code-3 | Rate limiting (300 RPM), batching (128), retry, float32 |
| ollama.go | Ollama fallback | GPU/CPU inference, configurable model |
| retry.go | Shared retry logic | Exponential backoff, jitter |
pkg/reranker/ - Cross-Encoder Reranking
| File | Purpose | Features |
|---|---|---|
| reranker.go | Interface definition | Reranker interface, config types |
| voyage.go | Voyage AI rerank-2.5 | Rate limiting, retry, truncation handling |
pkg/ratelimit/ - Thread-Safe Rate Limiting
| File | Purpose | Features |
|---|---|---|
| tokenbucket.go | Token bucket implementation | Thread-safe, configurable RPM, burst support |
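For readers unfamiliar with the technique, a thread-safe token bucket with configurable RPM and burst can be sketched as follows. The type and method names are assumptions for this example and the real tokenbucket.go may differ (for instance, it may block until a token is available rather than returning false).

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// TokenBucket is an illustrative rate limiter: tokens refill continuously at
// rate per second, up to capacity, and each request consumes one token.
type TokenBucket struct {
	mu       sync.Mutex
	tokens   float64
	capacity float64 // burst size
	rate     float64 // tokens added per second
	last     time.Time
}

// NewTokenBucket creates a bucket allowing rpm requests per minute on
// average, with bursts of up to burst requests.
func NewTokenBucket(rpm, burst int) *TokenBucket {
	return &TokenBucket{
		tokens:   float64(burst),
		capacity: float64(burst),
		rate:     float64(rpm) / 60.0,
		last:     time.Now(),
	}
}

// Allow reports whether one request may proceed now, consuming a token if so.
// The mutex makes it safe to call from many goroutines.
func (b *TokenBucket) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	now := time.Now()
	b.tokens += now.Sub(b.last).Seconds() * b.rate
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.last = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

func main() {
	tb := NewTokenBucket(300, 5)
	var wg sync.WaitGroup
	var mu sync.Mutex
	allowed := 0
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if tb.Allow() {
				mu.Lock()
				allowed++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	fmt.Println("allowed in initial burst:", allowed)
}
```

Running a sketch like this under `go run -race` is a quick way to check that the bucket state is never accessed without the lock.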
GPU Strategy (PRESERVED FOR FUTURE)
Current Decision: GPU Ollama is NOT used for embeddings because:
1. Voyage AI voyage-code-3 achieves 97.6% recall vs 68.3% for local models
2. Cost ($0.06/M tokens) is acceptable for our scale
3. Eliminates model compatibility issues (same model for index + query)
Future GPU Use Cases:
| Use Case | Model | Status | Notes |
|---|---|---|---|
| Query Expansion | qwen2.5-coder:7b | ✅ ACTIVE | Generates 3 query variations |
| Local Embeddings (dev) | snowflake-arctic-embed:335m | 📋 AVAILABLE | For offline/dev scenarios |
| Code Generation | deepseek-coder:6.7b | 📋 AVAILABLE | Local inference option |
| Reranking (fallback) | qwen2.5-coder:7b | 📋 AVAILABLE | If Voyage API is down |
GPU Infrastructure:
- Windows machine: 192.168.10.56:11434 (RTX 4080 SUPER)
- Models available: snowflake-arctic-embed:335m, mxbai-embed-large, qwen2.5-coder:7b, deepseek-coder:6.7b

When to Revisit GPU Embeddings:
1. If Voyage AI costs exceed $50/month
2. If we need offline/air-gapped operation
3. If local model quality improves significantly
4. If we need lower latency (<50ms vs ~200ms)
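To put the $50/month threshold in perspective, the arithmetic below converts it into a token volume using the $0.06 per million tokens figure quoted above. The helper name is made up for the example, and the calculation assumes embedding tokens are the only Voyage cost.

```go
package main

import "fmt"

// breakEvenTokensM returns how many million tokens per month a budget buys
// at a given price per million tokens.
func breakEvenTokensM(budgetUSD, pricePerMillionUSD float64) float64 {
	return budgetUSD / pricePerMillionUSD
}

func main() {
	// $50 / $0.06 per million tokens
	fmt.Printf("%.1f million tokens/month\n", breakEvenTokensM(50, 0.06))
	// prints "833.3 million tokens/month"
}
```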
Migration Steps (Detailed)
Step 1: Create Go Workspace and pkg/ Module
cd cortex
# Create workspace file
cat > go.work << 'EOF'
go 1.24
use (
./cortex-api
./cortex-context
./cortex-indexer-v2
./pkg
)
EOF
# Create pkg module
mkdir -p pkg/embedder pkg/reranker pkg/ratelimit
cd pkg
go mod init github.com/emshvac/cortex-pkg
cd ..
Step 2: Copy Canonical Implementations to pkg/
# Copy from cortex-context (the best implementations)
cp cortex-context/internal/embedder/embedder.go pkg/embedder/
cp cortex-context/internal/embedder/voyage.go pkg/embedder/
cp cortex-context/internal/embedder/retry.go pkg/embedder/
cp cortex-context/internal/reranker/reranker.go pkg/reranker/
cp cortex-context/internal/reranker/voyage.go pkg/reranker/
# Package declarations need no change: the copied files already declare
# "package embedder" and "package reranker".
# Check for imports that still point at cortex-context internals:
grep -rn "cortex-context/internal" pkg/
Step 3: Create Rate Limiter in pkg/ratelimit
Extract and consolidate the token bucket implementations into a single, thread-safe version.
Step 4: Update cortex-api/main.go
// Replace local implementations with shared packages
import (
    "os"

    "github.com/emshvac/cortex-pkg/embedder"
    "github.com/emshvac/cortex-pkg/reranker"
)
// Initialize embedder (inside main or a setup function)
voyageEmbedder := embedder.NewVoyageEmbedder(embedder.VoyageConfig{
    APIKey: os.Getenv("VOYAGE_API_KEY"),
    Model:  "voyage-code-3",
})
// Initialize reranker
voyageReranker := reranker.NewVoyageReranker(reranker.VoyageConfig{
    APIKey: os.Getenv("VOYAGE_API_KEY"),
    Model:  "rerank-2.5",
})
Step 5: Update cortex-context to Use pkg/
// Replace internal imports
import (
    "github.com/emshvac/cortex-pkg/embedder"
    "github.com/emshvac/cortex-pkg/reranker"
)
Step 6: Update cortex-indexer-v2
import "github.com/emshvac/cortex-pkg/embedder"
// Replace GPUEmbedder/VoyageEmbedder with:
emb := embedder.NewVoyageEmbedder(embedder.VoyageConfig{...})
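Step 7: Delete Redundant Files
# Only after all services are migrated and tested.
# Note: this is the original plan. As executed, both gpu_embedder.go files and
# cortex-api/reranker.go were kept as documented fallbacks (see "Files Kept").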
rm cortex-api/gpu_embedder.go
rm cortex-api/voyage_embeddings.go
rm cortex-api/reranker.go
rm cortex-api/voyage_reranker.go
rm cortex-indexer-v2/gpu_embedder.go
rm cortex-indexer-v2/voyage_embedder.go
# Optionally remove cortex-context internal packages if fully migrated
# (or keep as thin wrappers for backward compatibility)
Step 8: Update go.mod Files
# Add pkg dependency to each service
cd cortex-api && go get github.com/emshvac/cortex-pkg@latest
cd ../cortex-context && go get github.com/emshvac/cortex-pkg@latest
cd ../cortex-indexer-v2 && go get github.com/emshvac/cortex-pkg@latest
Step 9: Verify and Test
# Build all services
cd cortex
go work sync
go build ./cortex-api/...
go build ./cortex-context/...
go build ./cortex-indexer-v2/...
# Run tests
go test ./pkg/...
go test ./cortex-api/...
go test ./cortex-context/...
go test ./cortex-indexer-v2/...
Consequences
Positive
- Single source of truth - One implementation for embeddings, reranking, rate limiting
- Easier maintenance - Fix bugs in one place, propagates to all services
- Consistent behavior - Same rate limiting, retry logic, error handling everywhere
- Code reduction - ~1,547 net lines removed (2,297 lines of duplicated code deleted; the original estimate was ~1,400)
- Better testing - Shared packages get more thorough testing
- Clear dependencies - Go workspace makes dependencies explicit
Negative
- New package to maintain - cortex-pkg is a new artifact
- Breaking change - Services need updates to import from pkg/
- Coordination required - Changes to pkg/ affect all services
Risks
- Build breakage during migration - Mitigated by Go workspace (local development works)
- Version drift - Mitigated by workspace and semantic versioning
- Circular dependencies - Avoid by keeping pkg/ dependency-free
Rollback Plan
- During migration - Keep old files until new imports verified working
- After migration - Deleted files preserved in git history:
- Full rollback - Revert to pre-refactoring commit and re-copy implementations
Success Criteria
- All services build successfully with Go workspace
- All tests pass
- No duplicate embedder/reranker code (Voyage AI implementations consolidated)
- Rate limiter is thread-safe (verified by race detector - commit 2aa6019)
- CI/CD pipelines updated and passing (fixed in commit c7f136a)
- Fallback implementations documented with clear header comments
Timeline (Actual)
| Phase | Estimated | Actual | Status |
|---|---|---|---|
| Phase 1: Create pkg/ | 2-3 hours | ~2 hours | ✅ COMPLETE |
| Phase 2: Migrate services | 4-6 hours | ~4 hours | ✅ COMPLETE |
| Phase 3: Hardening (tests) | 2-3 hours | ~1 hour | ✅ COMPLETE |
| Phase 4: Polish | 1-2 hours | ~30 min | ✅ COMPLETE |
| Total | 9-14 hours | ~7.5 hours | ✅ |
Lessons Learned
- Adapter pattern works well - cortex-api and cortex-indexer-v2 needed float64 while pkg/ uses float32. Thin adapter wrappers solved this cleanly.
- Go workspace simplifies local dev - Changes to pkg/ are immediately available without version bumps or go mod updates.
- Type compatibility matters - Ensure return types match expectations before migrating. pkg/reranker.Result vs internal/reranker.RerankResult required updates to downstream code.
- Keep fallbacks documented - GPU embedder and LLM reranker are valuable for offline/outage scenarios. Clear header comments prevent accidental deletion.
- CI/CD catches issues early - Pipeline failure after Phase 2.3 revealed unused replace directives and workspace interference. Fixed before production impact.
Related Documents
- ADR-002: Voyage AI Embeddings Strategy
- ADR-005: Hybrid Search Weight Configuration
- ADR-006: Incremental Indexing
- cortex-context/CORTEX_CONTEXT_INTEGRATION.md
Files Deleted (Post-Migration)
| File | Lines | Replaced By | Status |
|---|---|---|---|
| cortex-api/voyage_embeddings.go | 158 | pkg/embedder/voyage.go | ✅ DELETED |
| cortex-api/voyage_reranker.go | 376 | pkg/reranker/voyage.go | ✅ DELETED |
| cortex-indexer-v2/voyage_embedder.go | 276 | pkg/embedder/voyage.go | ✅ DELETED |
| cortex-context/internal/embedder/* | 929 | pkg/embedder/* | ✅ DELETED |
| cortex-context/internal/reranker/* | 558 | pkg/reranker/* | ✅ DELETED |
| Total Deleted | 2,297 | | |
Files Kept (Documented Fallbacks)
| File | Lines | Reason |
|---|---|---|
| cortex-api/gpu_embedder.go | 189 | GPU Ollama fallback for offline/dev |
| cortex-indexer-v2/gpu_embedder.go | 224 | GPU Ollama fallback for offline/dev |
| cortex-api/reranker.go | 251 | LLM reranker fallback for Voyage outages |
These files have been documented with clear header comments explaining:
- Their fallback status (not primary implementation)
- When to use them (offline, dev, outages)
- Important caveats (model consistency for embeddings)