Monitoring Guide
Overview
Cortex AI uses Prometheus for metrics collection and Grafana for visualization.
Endpoints
| Service | Metrics Endpoint |
| --- | --- |
| cortex-api | :8080/metrics |
| cortex-context | :8081/metrics |
| cortex-indexer | :8082/metrics |
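To spot-check one of the endpoints listed above without going through Prometheus, a small script can fetch the page and parse the text exposition format. This is a minimal sketch: the host `localhost` and the helper names `metrics_url`, `parse_metrics`, and `scrape` are assumptions for illustration, and the parser assumes samples carry no trailing timestamp.

```python
from typing import Dict
from urllib.request import urlopen

HOST = "localhost"  # assumption: the guide lists only ports and paths
PORTS = {"cortex-api": 8080, "cortex-context": 8081, "cortex-indexer": 8082}


def metrics_url(service: str, host: str = HOST) -> str:
    """Build the metrics URL for a service from the table above."""
    return f"http://{host}:{PORTS[service]}/metrics"


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse Prometheus text exposition into {series: value}.

    Simplification: assumes each sample line is '<series> <value>' with no
    trailing timestamp. Comment lines (# HELP, # TYPE) are skipped.
    """
    samples: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the last space so label values containing spaces survive.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def scrape(service: str, host: str = HOST) -> Dict[str, float]:
    """Fetch and parse one service's metrics page (performs a network call)."""
    with urlopen(metrics_url(service, host), timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Calling `scrape("cortex-api")` on a machine that can reach the service returns a dictionary keyed by series name, which is handy for confirming that a metric such as `http_requests_total` is actually exported.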
Key Metrics
API Metrics
# Request rate
rate(http_requests_total{service="cortex-api"}[5m])
# Error rate
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))
# Latency percentiles
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))
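The queries above can also be run outside Grafana through the Prometheus HTTP API instant-query endpoint (`/api/v1/query`). The following is a sketch under assumptions: the base URL `http://localhost:9090` is a placeholder for wherever Prometheus is reachable, and the helper names are hypothetical.

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen

PROMETHEUS_URL = "http://localhost:9090"  # assumption: not stated in this guide


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL, URL-encoding the PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def extract_values(payload: dict) -> List[float]:
    """Pull the sample values out of an instant-query (vector) response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    # Each result item carries value = [unix_timestamp, "value as string"].
    return [float(item["value"][1]) for item in payload["data"]["result"]]


def instant_query(promql: str, base_url: str = PROMETHEUS_URL) -> List[float]:
    """Run an instant query against Prometheus (performs a network call)."""
    with urlopen(build_query_url(base_url, promql), timeout=10) as resp:
        return extract_values(json.load(resp))
```

For example, `instant_query('rate(http_requests_total{service="cortex-api"}[5m])')` would return one value per matching series, in requests per second.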
Search Metrics
# Search latency
histogram_quantile(0.95, rate(search_duration_seconds_bucket[5m]))
# Search throughput
rate(search_requests_total[5m])
Indexing Metrics
# Files indexed per minute
rate(indexed_files_total[1m]) * 60
# Indexing errors
rate(indexing_errors_total[5m])
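The `* 60` in the files-per-minute query exists because `rate()` returns a per-second average. The arithmetic can be illustrated with two counter samples; this sketch is a simplification with a hypothetical helper name, and it ignores the counter-reset handling and window extrapolation that Prometheus itself performs.

```python
from typing import Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def per_minute_rate(first: Sample, last: Sample) -> float:
    """Average per-minute increase of a counter between two samples."""
    (t0, v0), (t1, v1) = first, last
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    per_second = (v1 - v0) / (t1 - t0)  # what rate() approximates
    return per_second * 60  # convert per-second to per-minute


# indexed_files_total rose from 100 to 130 over 60 seconds:
# 30 files / 60 s = 0.5 files per second = 30 files per minute.
print(per_minute_rate((0.0, 100.0), (60.0, 130.0)))
```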
Grafana Dashboards
| Dashboard | URL |
| --- | --- |
| Overview | https://grafana.emshvac.co/d/cortex-overview |
| API Performance | https://grafana.emshvac.co/d/cortex-api |
| Search Quality | https://grafana.emshvac.co/d/cortex-search |
| Infrastructure | https://grafana.emshvac.co/d/cortex-infra |
Alerting
Alerts are configured in Prometheus and routed through Alertmanager to:
- Slack: #cortex-alerts
- PagerDuty: For critical alerts
Critical Alerts
| Alert | Condition | Severity |
| --- | --- | --- |
| High Error Rate | Error rate > 1% for 5 min | Critical |
| API Down | No heartbeat for 1 min | Critical |
| Qdrant Unhealthy | < 2 healthy nodes | Critical |
Warning Alerts
| Alert | Condition | Severity |
| --- | --- | --- |
| High Latency | P95 > 500 ms for 10 min | Warning |
| Index Lag | > 30 min since last index | Warning |
| Disk Space | > 80% full | Warning |
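Conditions such as "Error rate > 1% for 5 min" and "P95 > 500 ms for 10 min" fire only when the threshold is exceeded continuously for the whole period; a single dip below the threshold restarts the clock. The sketch below models that behavior in a simplified way. The function name and sample data are hypothetical, and the actual rule definitions are not shown in this guide.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, observed value)


def alert_fires(samples: List[Sample], threshold: float, hold_seconds: float) -> bool:
    """Return True if value > threshold has held continuously for at least
    hold_seconds, measured up to the most recent sample."""
    pending_since = None
    for ts, value in samples:
        if value > threshold:
            if pending_since is None:
                pending_since = ts  # condition just became true
        else:
            pending_since = None  # any dip resets the pending timer
    if pending_since is None:
        return False
    return samples[-1][0] - pending_since >= hold_seconds


# Error ratio sampled once a minute; the threshold is 1% and the hold is 5 minutes.
steady = [(t, 0.02) for t in range(0, 301, 60)]
dipped = [(0, 0.02), (60, 0.02), (120, 0.005), (180, 0.02), (240, 0.02), (300, 0.02)]
print(alert_fires(steady, 0.01, 300))  # condition held for the full 300 s
print(alert_fires(dipped, 0.01, 300))  # dip at t=120 reset the timer
```

Requiring the condition to hold for a period trades a slower page for fewer false alarms from short spikes, which is presumably why the warning-level latency alert uses a longer window than the critical ones.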