
ADR-007: Refactoring - Long-Term Architecture for Shared Components

Status: ✅ COMPLETED
Date: 2026-02-16
Completed: 2026-02-17
Decision Makers: Brian Moore, AI Team

Implementation Status

| Phase | Duration | Status | Commit |
|-------|----------|--------|--------|
| Phase 1: Create pkg/ | ~2 hours | ✅ COMPLETE | 8ade7a1 |
| Phase 2.1: Migrate cortex-api | ~1 hour | ✅ COMPLETE | eb8ade6 |
| Phase 2.2: Migrate cortex-indexer-v2 | ~1 hour | ✅ COMPLETE | 869a8b7 |
| Phase 2.3: Migrate cortex-context | ~2 hours | ✅ COMPLETE | f76c57d |
| Phase 3: Thread-safety tests | ~1 hour | ✅ COMPLETE | 2aa6019 |
| Phase 4: Clean code polish | ~30 min | ✅ COMPLETE | (this commit) |

Files Deleted (Code Reduction)

| File | Lines | Replaced By |
|------|-------|-------------|
| cortex-api/voyage_embeddings.go | 158 | pkg/embedder/voyage.go |
| cortex-api/voyage_reranker.go | 376 | pkg/reranker/voyage.go |
| cortex-indexer-v2/voyage_embedder.go | 276 | pkg/embedder/voyage.go |
| cortex-context/internal/embedder/* | 929 | pkg/embedder/* |
| cortex-context/internal/reranker/* | 558 | pkg/reranker/* |
| Total Deleted | 2,297 | |

Files Kept (Documented Fallbacks)

| File | Lines | Purpose |
|------|-------|---------|
| cortex-api/gpu_embedder.go | 189 | GPU Ollama fallback (offline/dev) |
| cortex-indexer-v2/gpu_embedder.go | 224 | GPU Ollama fallback (offline/dev) |
| cortex-api/reranker.go | 251 | LLM reranker fallback (Voyage outages) |

These files are kept as documented fallbacks with clear header comments explaining their purpose and when to use them. See "GPU Strategy" section below.

Net Code Reduction

  • Lines deleted: 2,297
  • Lines added (pkg/): ~600
  • Lines added (adapters): ~150
  • Net reduction: ~1,547 lines (2,297 - 600 - 150, or ~67% of the deleted duplicated code)

Context

The Cortex codebase has grown organically into three services with competing implementations:

  • cortex-api - Main API server (module: cortex-api)
  • cortex-context - MCP context engine (module: github.com/emshvac/cortex-context)
  • cortex-indexer-v2 - Code indexer (module: cortex-indexer)

Problems:

  1. Duplicated code - 3 embedder implementations, 2 reranker implementations (~2,200 lines)
  2. Inconsistent module naming - Some use local names, some use GitHub paths
  3. No code sharing - Can't import between modules due to naming inconsistency
  4. Drift risk - Fixes in one place don't propagate to others

Decision

Create a Go workspace with shared packages to enable proper code sharing while maintaining separate deployable services.

Long-Term Architecture

cortex/                              # Go workspace root
├── go.work                          # Go workspace file
├── pkg/                             # SHARED PACKAGES (new)
│   ├── go.mod                       # module github.com/emshvac/cortex-pkg
│   ├── embedder/
│   │   ├── embedder.go              # Interface
│   │   ├── voyage.go                # Voyage AI (production)
│   │   ├── ollama.go                # Ollama (fallback/dev)
│   │   └── retry.go                 # Shared retry logic
│   ├── reranker/
│   │   ├── reranker.go              # Interface
│   │   └── voyage.go                # Voyage AI
│   └── ratelimit/
│       └── tokenbucket.go           # Thread-safe rate limiter
├── cortex-api/                      # API Server
│   ├── go.mod                       # module github.com/emshvac/cortex-api
│   └── (imports pkg/embedder, pkg/reranker)
├── cortex-context/                  # Context Engine (MCP Server)
│   ├── go.mod                       # module github.com/emshvac/cortex-context
│   └── (imports pkg/embedder, pkg/reranker)
└── cortex-indexer-v2/               # Code Indexer
    ├── go.mod                       # module github.com/emshvac/cortex-indexer
    └── (imports pkg/embedder)

Why Go Workspace?

  1. Local development - Changes to pkg/ immediately available to all services
  2. Proper versioning - Each service can pin to specific pkg version in prod
  3. Clean separation - Shared code is explicit, not buried in /internal
  4. No MCP latency - Direct function calls for embeddings, not network hops

Data Flow After Refactoring

┌─────────────────────────────────────────────────────────────────────┐
│                        cortex-api (API Server)                       │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────────────────┐  │
│  │ pkg/embedder│    │pkg/reranker │    │ MCP Client              │  │
│  │ (direct)    │    │ (direct)    │    │ (search orchestration)  │  │
│  └──────┬──────┘    └──────┬──────┘    └───────────┬─────────────┘  │
└─────────┼──────────────────┼───────────────────────┼────────────────┘
          │                  │                       │
          ▼                  ▼                       ▼
┌─────────────────┐  ┌─────────────────┐   ┌────────────────────────┐
│   Voyage AI     │  │   Voyage AI     │   │   cortex-context       │
│   Embeddings    │  │   Reranking     │   │   (MCP Server)         │
│   API           │  │   API           │   │   - Search orchestration│
└─────────────────┘  └─────────────────┘   │   - Query expansion    │
                                           │   - RSE extraction     │
                                           │   - Result packing     │
                                           └───────────┬────────────┘
                                                       │
                                                       ▼
                                           ┌────────────────────────┐
                                           │   Qdrant HA Cluster    │
                                           │   (Vector + BM25)      │
                                           └────────────────────────┘

Key insight: pkg/embedder and pkg/reranker are used directly (no network hop). MCP is only for complex search orchestration.

Refactoring Phases

Phase 1: Create Shared Package Structure (FOUNDATION)

Duration: ~2-3 hours
Risk: Low (additive change)

  1. Create cortex/go.work workspace file
  2. Create cortex/pkg/ with shared packages
  3. Move canonical implementations from cortex-context to pkg/
  4. Update cortex-context to import from pkg/
  5. Verify all tests pass

# Create workspace
cd cortex
go work init
go work use ./cortex-api ./cortex-context ./cortex-indexer-v2 ./pkg

# Create pkg module
mkdir -p pkg/embedder pkg/reranker pkg/ratelimit
cd pkg && go mod init github.com/emshvac/cortex-pkg

Phase 2: Migrate Services to Shared Packages (CONSOLIDATION)

Duration: ~4-6 hours
Risk: Medium (breaking changes)

| Service | Current Files | Migration |
|---------|---------------|-----------|
| cortex-api | gpu_embedder.go, voyage_embeddings.go, reranker.go, voyage_reranker.go | Replace with imports of "github.com/emshvac/cortex-pkg/embedder" and "github.com/emshvac/cortex-pkg/reranker" |
| cortex-indexer-v2 | gpu_embedder.go, voyage_embedder.go | Replace with import "github.com/emshvac/cortex-pkg/embedder" |
| cortex-context | internal/embedder/, internal/reranker/ | Move to pkg/, keep thin wrappers if needed |

Files to delete after migration:

| File | Lines | Reason |
|------|-------|--------|
| cortex-api/gpu_embedder.go | 178 | Replaced by pkg/embedder |
| cortex-api/voyage_embeddings.go | 158 | Replaced by pkg/embedder |
| cortex-api/reranker.go | 236 | Replaced by pkg/reranker |
| cortex-api/voyage_reranker.go | 376 | Replaced by pkg/reranker |
| cortex-indexer-v2/gpu_embedder.go | 213 | Replaced by pkg/embedder |
| cortex-indexer-v2/voyage_embedder.go | 276 | Replaced by pkg/embedder |
| Total | 1,437 | |

Phase 3: Fix Structural Issues (HARDENING)

Duration: ~2-3 hours
Risk: Low

  1. Rate limiter consolidation - Single thread-safe implementation in pkg/ratelimit
  2. HTTP client pooling - Shared client with proper connection reuse
  3. Error handling standardization - Consistent error types across services

Phase 4: Clean Code (POLISH)

Duration: ~1-2 hours
Risk: Low

  1. Extract magic numbers to config (hardcoded IPs, timeouts, batch sizes)
  2. Standardize logging patterns
  3. Add comprehensive metrics to shared packages

Shared Package Structure (NEW)

pkg/embedder/ - Embedding Generation

| File | Purpose | Features |
|------|---------|----------|
| embedder.go | Interface definition | Embedder, BatchEmbedder interfaces |
| voyage.go | Voyage AI voyage-code-3 | Rate limiting (300 RPM), batching (128), retry, float32 |
| ollama.go | Ollama fallback | GPU/CPU inference, configurable model |
| retry.go | Shared retry logic | Exponential backoff, jitter |

pkg/reranker/ - Cross-Encoder Reranking

| File | Purpose | Features |
|------|---------|----------|
| reranker.go | Interface definition | Reranker interface, config types |
| voyage.go | Voyage AI rerank-2.5 | Rate limiting, retry, truncation handling |

pkg/ratelimit/ - Thread-Safe Rate Limiting

| File | Purpose | Features |
|------|---------|----------|
| tokenbucket.go | Token bucket implementation | Thread-safe, configurable RPM, burst support |
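For readers unfamiliar with the technique, a thread-safe token bucket with configurable RPM and burst might look like the sketch below. The type, constructor, and method names are illustrative and are not the actual pkg/ratelimit API; `countAllowed` is a hypothetical helper that exercises the bucket from many goroutines.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// TokenBucket is a mutex-guarded token bucket. Names are illustrative,
// not the actual pkg/ratelimit API.
type TokenBucket struct {
	mu       sync.Mutex
	tokens   float64
	capacity float64 // burst size
	perSec   float64 // refill rate in tokens per second
	last     time.Time
}

// NewTokenBucket starts full, so an initial burst is allowed immediately.
func NewTokenBucket(rpm, burst int) *TokenBucket {
	return &TokenBucket{
		tokens:   float64(burst),
		capacity: float64(burst),
		perSec:   float64(rpm) / 60.0,
		last:     time.Now(),
	}
}

// Allow takes one token if available and reports whether it succeeded.
func (b *TokenBucket) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()

	// Refill based on elapsed time, never exceeding the burst capacity.
	now := time.Now()
	b.tokens += now.Sub(b.last).Seconds() * b.perSec
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.last = now

	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// countAllowed fires n concurrent Allow calls and counts the successes.
func countAllowed(b *TokenBucket, n int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	allowed := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if b.Allow() {
				mu.Lock()
				allowed++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return allowed
}

func main() {
	b := NewTokenBucket(300, 10) // 300 RPM with a burst of 10
	fmt.Println(countAllowed(b, 100))
}
```

Running a program like this under `go run -race` is one way to check that all access to the bucket's state goes through the mutex.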

GPU Strategy (PRESERVED FOR FUTURE)

Current Decision: GPU Ollama is NOT used for embeddings because:

  1. Voyage AI voyage-code-3 achieves 97.6% recall vs 68.3% for local models
  2. Cost ($0.06/M tokens) is acceptable for our scale
  3. Eliminates model compatibility issues (same model for index + query)

Future GPU Use Cases:

| Use Case | Model | Status | Notes |
|----------|-------|--------|-------|
| Query Expansion | qwen2.5-coder:7b | ✅ ACTIVE | Generates 3 query variations |
| Local Embeddings (dev) | snowflake-arctic-embed:335m | 📋 AVAILABLE | For offline/dev scenarios |
| Code Generation | deepseek-coder:6.7b | 📋 AVAILABLE | Local inference option |
| Reranking (fallback) | qwen2.5-coder:7b | 📋 AVAILABLE | If Voyage API is down |

GPU Infrastructure:

  • Windows machine: 192.168.10.56:11434 (RTX 4080 SUPER)
  • Models available: snowflake-arctic-embed:335m, mxbai-embed-large, qwen2.5-coder:7b, deepseek-coder:6.7b

When to Revisit GPU Embeddings:

  1. If Voyage AI costs exceed $50/month
  2. If we need offline/air-gapped operation
  3. If local model quality improves significantly
  4. If we need lower latency (<50ms vs ~200ms)
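To put the $50/month trigger in terms of token volume, the following worked calculation uses only the two figures stated in this section ($0.06 per million tokens and a $50 monthly threshold). The function names are illustrative.

```go
package main

import "fmt"

const (
	pricePerMillionTokens = 0.06 // USD, the Voyage AI cost quoted in this section
	monthlyBudget         = 50.0 // USD, the "revisit" threshold
)

// monthlyCost returns the spend in USD for a monthly token volume.
func monthlyCost(tokens float64) float64 {
	return tokens / 1e6 * pricePerMillionTokens
}

// breakEvenTokens returns the monthly token volume at which spend reaches the budget.
func breakEvenTokens() float64 {
	return monthlyBudget / pricePerMillionTokens * 1e6
}

func main() {
	fmt.Printf("break-even: %.0fM tokens/month\n", breakEvenTokens()/1e6)
	fmt.Printf("100M tokens cost: $%.2f\n", monthlyCost(100e6))
	// prints:
	// break-even: 833M tokens/month
	// 100M tokens cost: $6.00
}
```

In other words, the cost trigger is reached at roughly 833 million embedded tokens per month.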

Migration Steps (Detailed)

Step 1: Create Go Workspace and pkg/ Module

cd cortex

# Create workspace file
cat > go.work << 'EOF'
go 1.24

use (
    ./cortex-api
    ./cortex-context
    ./cortex-indexer-v2
    ./pkg
)
EOF

# Create pkg module
mkdir -p pkg/embedder pkg/reranker pkg/ratelimit
cd pkg
go mod init github.com/emshvac/cortex-pkg
cd ..

Step 2: Copy Canonical Implementations to pkg/

# Copy from cortex-context (the best implementations)
cp cortex-context/internal/embedder/embedder.go pkg/embedder/
cp cortex-context/internal/embedder/voyage.go pkg/embedder/
cp cortex-context/internal/embedder/retry.go pkg/embedder/
cp cortex-context/internal/reranker/reranker.go pkg/reranker/
cp cortex-context/internal/reranker/voyage.go pkg/reranker/

# Package declarations need no change: the copied files already declare
# `package embedder` / `package reranker`. Confirm, then fix any imports
# that still reference cortex-context/internal/:
grep -h '^package ' pkg/embedder/*.go pkg/reranker/*.go | sort -u

Step 3: Create Rate Limiter in pkg/ratelimit

Extract and consolidate the token bucket implementations into a single, thread-safe version.

Step 4: Update cortex-api/main.go

// Replace local implementations with shared packages
import (
    "os"

    "github.com/emshvac/cortex-pkg/embedder"
    "github.com/emshvac/cortex-pkg/reranker"
)

// Initialize embedder
voyageEmbedder := embedder.NewVoyageEmbedder(embedder.VoyageConfig{
    APIKey: os.Getenv("VOYAGE_API_KEY"),
    Model:  "voyage-code-3",
})

// Initialize reranker
voyageReranker := reranker.NewVoyageReranker(reranker.VoyageConfig{
    APIKey: os.Getenv("VOYAGE_API_KEY"),
    Model:  "rerank-2.5",
})

Step 5: Update cortex-context to Use pkg/

// Replace internal imports
import (
    "github.com/emshvac/cortex-pkg/embedder"
    "github.com/emshvac/cortex-pkg/reranker"
)

Step 6: Update cortex-indexer-v2

import "github.com/emshvac/cortex-pkg/embedder"

// Replace GPUEmbedder/VoyageEmbedder with:
emb := embedder.NewVoyageEmbedder(embedder.VoyageConfig{...})

Step 7: Delete Redundant Files

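# Only after all services are migrated and tested.
# As implemented, both gpu_embedder.go files and cortex-api/reranker.go were
# kept as documented fallbacks (see "Files Kept"); only the Voyage files were deleted.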
rm cortex-api/gpu_embedder.go
rm cortex-api/voyage_embeddings.go
rm cortex-api/reranker.go
rm cortex-api/voyage_reranker.go
rm cortex-indexer-v2/gpu_embedder.go
rm cortex-indexer-v2/voyage_embedder.go

# Optionally remove cortex-context internal packages if fully migrated
# (or keep as thin wrappers for backward compatibility)

Step 8: Update go.mod Files

# Add pkg dependency to each service
cd cortex-api && go get github.com/emshvac/cortex-pkg@latest
cd ../cortex-context && go get github.com/emshvac/cortex-pkg@latest
cd ../cortex-indexer-v2 && go get github.com/emshvac/cortex-pkg@latest

Step 9: Verify and Test

# Build all services
cd cortex
go work sync
go build ./cortex-api/...
go build ./cortex-context/...
go build ./cortex-indexer-v2/...

# Run tests
go test ./pkg/...
go test ./cortex-api/...
go test ./cortex-context/...
go test ./cortex-indexer-v2/...

Consequences

Positive

  1. Single source of truth - One implementation for embeddings, reranking, rate limiting
  2. Easier maintenance - Fix bugs in one place, propagates to all services
  3. Consistent behavior - Same rate limiting, retry logic, error handling everywhere
  4. Code reduction - ~1,400 lines of duplicated code planned for removal; 2,297 lines actually deleted (~1,547 net after adding pkg/ and adapters)
  5. Better testing - Shared packages get more thorough testing
  6. Clear dependencies - Go workspace makes dependencies explicit

Negative

  1. New package to maintain - cortex-pkg is a new artifact
  2. Breaking change - Services need updates to import from pkg/
  3. Coordination required - Changes to pkg/ affect all services

Risks

  1. Build breakage during migration - Mitigated by Go workspace (local development works)
  2. Version drift - Mitigated by workspace and semantic versioning
  3. Circular dependencies - Avoid by keeping pkg/ dependency-free

Rollback Plan

  1. During migration - Keep old files until new imports verified working
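  2. After migration - Deleted files are preserved in git history. Restore one from the parent of the commit that deleted it (COMMIT is that commit's hash):
    git checkout COMMIT~1 -- cortex-api/voyage_embeddings.go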
    
  3. Full rollback - Revert to pre-refactoring commit and re-copy implementations

Success Criteria

  • All services build successfully with Go workspace
  • All tests pass
  • No duplicate embedder/reranker code (Voyage AI implementations consolidated)
  • Rate limiter is thread-safe (verified by race detector - commit 2aa6019)
  • CI/CD pipelines updated and passing (fixed in commit c7f136a)
  • Fallback implementations documented with clear header comments

Timeline (Actual)

| Phase | Estimated | Actual | Status |
|-------|-----------|--------|--------|
| Phase 1: Create pkg/ | 2-3 hours | ~2 hours | ✅ COMPLETE |
| Phase 2: Migrate services | 4-6 hours | ~4 hours | ✅ COMPLETE |
| Phase 3: Hardening (tests) | 2-3 hours | ~1 hour | ✅ COMPLETE |
| Phase 4: Polish | 1-2 hours | ~30 min | ✅ COMPLETE |
| Total | 9-14 hours | ~7.5 hours | |

Lessons Learned

  1. Adapter pattern works well - cortex-api and cortex-indexer-v2 needed float64 while pkg/ uses float32. Thin adapter wrappers solved this cleanly.

  2. Go workspace simplifies local dev - Changes to pkg/ immediately available without version bumps or go mod updates.

  3. Type compatibility matters - Ensure return types match expectations before migrating. pkg/reranker.Result vs internal/reranker.RerankResult required updates to downstream code.

  4. Keep fallbacks documented - GPU embedder and LLM reranker are valuable for offline/outage scenarios. Clear header comments prevent accidental deletion.

  5. CI/CD catches issues early - Pipeline failure after Phase 2.3 revealed unused replace directives and workspace interference. Fixed before production impact.

Related Documents

  • ADR-002: Voyage AI Embeddings Strategy
  • ADR-005: Hybrid Search Weight Configuration
  • ADR-006: Incremental Indexing
  • cortex-context/CORTEX_CONTEXT_INTEGRATION.md

Files Deleted (Post-Migration)

| File | Lines | Replaced By | Status |
|------|-------|-------------|--------|
| cortex-api/voyage_embeddings.go | 158 | pkg/embedder/voyage.go | ✅ DELETED |
| cortex-api/voyage_reranker.go | 376 | pkg/reranker/voyage.go | ✅ DELETED |
| cortex-indexer-v2/voyage_embedder.go | 276 | pkg/embedder/voyage.go | ✅ DELETED |
| cortex-context/internal/embedder/* | 929 | pkg/embedder/* | ✅ DELETED |
| cortex-context/internal/reranker/* | 558 | pkg/reranker/* | ✅ DELETED |
| Total Deleted | 2,297 | | |

Files Kept (Documented Fallbacks)

| File | Lines | Reason |
|------|-------|--------|
| cortex-api/gpu_embedder.go | 189 | GPU Ollama fallback for offline/dev |
| cortex-indexer-v2/gpu_embedder.go | 224 | GPU Ollama fallback for offline/dev |
| cortex-api/reranker.go | 251 | LLM reranker fallback for Voyage outages |

These files have been documented with clear header comments explaining:

  • Their fallback status (not primary implementation)
  • When to use them (offline, dev, outages)
  • Important caveats (model consistency for embeddings)