Overview
This document contains performance benchmarks for the Cortex AI system.

| Metric | Value | Target |
|---|---|---|
| P50 latency | 45ms | <100ms |
| P95 latency | 120ms | <200ms |
| P99 latency | 250ms | <500ms |
| Throughput | 500 req/s | >200 req/s |

| Repository Size | Index Time | Memory Usage |
|---|---|---|
| 10K files | 2 min | 1 GB |
| 50K files | 8 min | 3 GB |
| 100K files | 18 min | 6 GB |
| 500K files | 90 min | 15 GB |
Embedding Generation

| Model | Speed | Quality Score |
|---|---|---|
| Voyage Code 3 | 1000 tokens/s | 0.92 |
| text-embedding-3-large | 1500 tokens/s | 0.89 |
| Local Ollama (nomic) | 500 tokens/s | 0.85 |

| Collection Size | Search Latency | Memory |
|---|---|---|
| 1M vectors | 15ms | 4 GB |
| 10M vectors | 35ms | 35 GB |
| 50M vectors | 80ms | 175 GB |
Test Environment
- Hardware: 3-node Proxmox cluster, 64 GB RAM each
- Qdrant: 3-node cluster with replication
- PostgreSQL: HA cluster with streaming replication
- Network: 10 Gbps internal
Running Benchmarks

    # Run search benchmarks
    go test -bench=. ./cortex-context/...

    # Run indexing benchmarks
    go test -bench=. ./cortex-indexer/...
Historical Data
Benchmarks are recorded in Prometheus and visualized in Grafana.
Dashboard: https://grafana.emshvac.co/d/cortex-perf