Monitoring Guide

Overview

Cortex AI uses Prometheus for metrics collection and Grafana for visualization.

Endpoints

Service          Metrics Endpoint
cortex-api       :8080/metrics
cortex-context   :8081/metrics
cortex-indexer   :8082/metrics
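
As a rough illustration of how Prometheus could be pointed at these endpoints, the fragment below sketches a static scrape configuration. The hostnames (taken to be the same as the service names) and the job names are assumptions, not values from this guide. The queries in this guide select on service="cortex-api"; whether that label is emitted by the application or attached at scrape time is not stated here, so the `labels` entries are one possible way to provide it.

```yaml
# prometheus.yml (fragment): a sketch, not the deployed configuration.
# Hostnames and job names below are assumptions.
scrape_configs:
  - job_name: cortex-api
    metrics_path: /metrics              # Prometheus default, shown for clarity
    static_configs:
      - targets: ["cortex-api:8080"]    # assumed hostname
        labels:
          service: cortex-api           # assumed source of the service label
  - job_name: cortex-context
    static_configs:
      - targets: ["cortex-context:8081"]
        labels:
          service: cortex-context
  - job_name: cortex-indexer
    static_configs:
      - targets: ["cortex-indexer:8082"]
        labels:
          service: cortex-indexer
```

If the services already export their own `service` label, the `labels` entries can be dropped.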

Key Metrics

API Metrics

# Request rate
rate(http_requests_total{service="cortex-api"}[5m])

# Error rate (fraction of all requests that returned 5xx)
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))

# P99 request latency (change 0.99 for other percentiles)
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))

Search Metrics

# P95 search latency
histogram_quantile(0.95, rate(search_duration_seconds_bucket[5m]))

# Search throughput
rate(search_requests_total[5m])

Indexing Metrics

# Files indexed per minute
rate(indexed_files_total[1m]) * 60

# Indexing errors
rate(indexing_errors_total[5m])

Grafana Dashboards

Dashboard         URL
Overview          https://grafana.emshvac.co/d/cortex-overview
API Performance   https://grafana.emshvac.co/d/cortex-api
Search Quality    https://grafana.emshvac.co/d/cortex-search
Infrastructure    https://grafana.emshvac.co/d/cortex-infra

Alerting

Alerts are configured in Prometheus Alertmanager and sent to:

- Slack: #cortex-alerts
- PagerDuty: for critical alerts
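
One way this routing could be expressed in Alertmanager is sketched below. It assumes that every alert rule carries a `severity` label, and that critical alerts should reach both Slack and PagerDuty. The receiver names, webhook URL, and routing key are placeholders, not values from this guide.

```yaml
# alertmanager.yml (fragment): a sketch under the assumptions stated above.
route:
  receiver: cortex-slack              # default: every alert goes to Slack
  routes:
    - matchers:
        - severity="critical"         # assumes rules set a severity label
      receiver: cortex-critical       # critical alerts also page via PagerDuty

receivers:
  - name: cortex-slack                # hypothetical receiver name
    slack_configs:
      - channel: "#cortex-alerts"
        api_url: "https://hooks.slack.com/services/REPLACE_ME"   # placeholder
  - name: cortex-critical             # hypothetical receiver name
    slack_configs:
      - channel: "#cortex-alerts"
        api_url: "https://hooks.slack.com/services/REPLACE_ME"   # placeholder
    pagerduty_configs:
      - routing_key: "REPLACE_ME"     # placeholder Events API v2 key
```

Putting both notifiers on a single receiver keeps the routing tree to one child route. Two sibling routes with `continue: true` on the first would be an equivalent alternative.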

Critical Alerts

Alert              Condition                    Severity
High Error Rate    Error rate > 1% for 5 min    Critical
API Down           No heartbeat for 1 min       Critical
Qdrant Unhealthy   < 2 healthy nodes            Critical
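
The conditions above might translate into Prometheus alerting rules roughly as follows. This is a sketch, not the deployed rule file. It assumes that "heartbeat" maps to the built-in `up` metric, that scrape jobs named `cortex-api` and `qdrant` exist, and that each Qdrant node is a separate scrape target. The group and alert names are illustrative.

```yaml
# cortex-critical.rules.yml: illustrative only; job names are assumptions.
groups:
  - name: cortex-critical
    rules:
      - alert: HighErrorRate
        # Fraction of all requests returning 5xx, aggregated across series.
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.01
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "5xx error rate above 1% for 5 minutes"

      - alert: ApiDown
        # up is 0 when a scrape fails; assumes a job named cortex-api.
        expr: up{job="cortex-api"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "cortex-api has failed scrapes for 1 minute"

      - alert: QdrantUnhealthy
        # Counts reachable Qdrant targets; assumes a job named qdrant.
        expr: sum(up{job="qdrant"}) < 2
        labels:
          severity: critical
        annotations:
          summary: "Fewer than 2 healthy Qdrant nodes"
```

The table gives no duration for the Qdrant condition, so that rule has no `for` clause and fires as soon as the expression is true.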

Warning Alerts

Alert          Condition                    Severity
High Latency   P95 > 500 ms for 10 min      Warning
Index Lag      > 30 min since last index    Warning
Disk Space     > 80% full                   Warning
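
A possible rule file for these warnings is sketched below, with several assumptions. The latency alert is taken to refer to API request latency. Index lag is approximated as "no files indexed in the last 30 minutes" using `indexed_files_total`, since this guide does not name a last-index timestamp metric. The disk check assumes node_exporter filesystem metrics are being scraped. The group and alert names are illustrative.

```yaml
# cortex-warning.rules.yml: illustrative only; see assumptions above.
groups:
  - name: cortex-warning
    rules:
      - alert: HighLatency
        # P95 over all series; 0.5 is 500 ms because the histogram is in seconds.
        expr: |
          histogram_quantile(
            0.95,
            sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
          ) > 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "P95 request latency above 500 ms for 10 minutes"

      - alert: IndexLag
        # Approximation: the counter has not increased in 30 minutes.
        expr: sum(increase(indexed_files_total[30m])) == 0
        labels:
          severity: warning
        annotations:
          summary: "No files indexed in the last 30 minutes"

      - alert: DiskSpace
        # Assumes node_exporter; tmpfs mounts are excluded.
        expr: |
          (1 - node_filesystem_avail_bytes{fstype!="tmpfs"}
             / node_filesystem_size_bytes{fstype!="tmpfs"}) > 0.80
        labels:
          severity: warning
        annotations:
          summary: "Filesystem more than 80% full"
```

The index lag approximation will also fire during quiet periods when there is nothing to index. A gauge that records the time of the last successful index run would be more precise if the indexer exposes one.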