New · Autonomous anomaly detection · Now in GA

Observability for the AI Agent Era

Not dashboards. Autonomous agents that understand, reason, and act on your telemetry.

  • Ingest everything.
  • Understand instantly.
  • Act automatically.
  • 2.4M signals/sec
  • <30s MTTD
  • 36 MCP tools
  • 99.2% trace join rate
  • 8 persona agents
01 Ingest
02 Understand
03 Reason
04 Act
05 Learn

Dashboards are legacy.

The old model was built for humans querying static charts. The new model is built for agents that never stop watching.

Before

The Old Way

Manual ops
01 Instrument: Manual config
02 Dashboard: Weeks of setup
03 🔍 Query: PromQL · LogQL
04 📟 Alert: 3am on-call page
05 Investigate: 40+ min manual
MTTD 40+ minutes · Human-dependent · Context lost on handoff

Now

The SignalCortex Way

Autonomous
01 📥 Ingest: All signals · OTLP
02 🧠 Understand: Auto-context built
03 💡 Reason: AI explains why
04 Act: Runbooks execute
05 📈 Learn: Baselines sharpen
MTTD <30 seconds · Fully autonomous · Self-improving system

Every app. Every agent. One intelligent core.

Any telemetry source flows in. SignalCortex understands it. Persona AI agents surface the right insight to the right person — through natural conversation.

Data Sources

LLM Apps: GPT · Claude · Gemini
AI Agents: LangChain · AutoGen · SK
☁️ Cloud Services: AWS · Azure · GCP
🔗 APIs & Microservices: REST · gRPC · GraphQL
📦 OTel SDK: Python · Go · JS · Java
🦾 Robotic Agents: Humanoid · Drone · AV
🌐 Multi-Agent: A2A · Swarms · Long-running
OTLP / gRPC / HTTP

SignalCortex

AI-Native Core

Ingesting telemetry… Detecting anomalies… Correlating signals… Updating baselines… Reasoning with LLM…
ClickHouse · Redis · Keycloak · MCP Tools · ABAC · Kafka

Persona AI Agents

SRE Agent · Active
User: Why is p99 latency spiking?
SRE Agent: Root cause: cart-service OOM. 3 pods degraded.

Developer Agent · Active
User: Show me error traces for checkout.
Developer Agent: 42 spans with HTTP 500 in last 15m.

Security Agent · Alert
User: Any anomalous auth patterns?
Security Agent: Login burst from 3 IPs detected.

Cost Optimizer · Active
User: Which LLM model costs the most?
Cost Optimizer: GPT-4o: $148/wk. Switch saves 61%.

Incident Responder · Incident
User: Run incident playbook INC-204.
Incident Responder: Playbook running. PagerDuty notified.

AI Engineer · Active
User: Why did eval scores drop overnight?
AI Engineer: Hallucination rate up 18% on v0.4.2. Rollback?

Admin Agent · Active
User: Issue an API key for the staging tenant.
Admin Agent: Key issued · mapped to ai-engineer group.

Operator Agent · Active
User: Status across reliability, cost, and security?
Operator Agent: 2 anomalies · cost trending +4% · 0 sec alerts.

5 Anomaly detectors
7d Rolling baseline window
8 Persona agents
OTLP Open standard ingest

Two operating modes. One platform core.

Switch between observing AI systems and using AI to operate systems. Both run on the same telemetry fabric.

Capture the internal behavior of AI systems from prompt to tool-chain output with full telemetry context.

  • 01 Trace every LLM call with latency, tokens, model, and cost.
  • 02 Inspect tool invocations in strict execution order.
  • 03 Debug RAG relevance drift with retrieval-level context.
  • 04 Correlate multi-agent orchestration across service boundaries.
2.4M Signals / sec
99.2% Trace Join Rate
420ms P99 Query
OpenTelemetry · OpenInference · Langfuse · Trace Graph
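
As a rough illustration of the first point, each traced LLM call can be thought of as a small record carrying model, latency, tokens, and cost, which then rolls up into a per-model view. This is a minimal sketch only: the `LLMCall` fields, the `cost_share_by_model` helper, and the sample values are hypothetical and are not the SignalCortex schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMCall:
    """One traced LLM call (hypothetical schema, for illustration only)."""
    model: str
    latency_ms: float
    prompt_tokens: int
    completion_tokens: int
    cost_usd: float


def cost_share_by_model(calls):
    """Return {model: (total_cost_usd, share_of_total)} for the given calls."""
    totals = defaultdict(float)
    for call in calls:
        totals[call.model] += call.cost_usd
    grand_total = sum(totals.values())
    if grand_total == 0:
        # Avoid dividing by zero when no call carried any cost.
        return {model: (0.0, 0.0) for model in totals}
    return {model: (cost, cost / grand_total) for model, cost in totals.items()}


# Made-up sample data: two calls to one model, one call to another.
calls = [
    LLMCall("model-a", 820.0, 1500, 600, 0.03),
    LLMCall("model-a", 910.0, 1700, 400, 0.03),
    LLMCall("model-b", 430.0, 900, 500, 0.02),
]
shares = cost_share_by_model(calls)
```

Keeping cost on the same record as latency and tokens is what lets a single question ("which model costs the most?") be answered from trace data alone.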

Deploy persona-specific AI agents that detect, investigate, and recommend or execute actions under approval policies.

  • 01 Detect anomalies continuously using z-score, EWMA, and heartbeat models.
  • 02 Investigate root cause automatically using traces, logs, and baselines.
  • 03 Propose remediations with confidence and blast-radius scoring.
  • 04 Require human approval for high-risk actions with full audit trail.
< 30s MTTD
87% Auto Triage
-63% False Positives
SRE Agent · Developer Agent · Security Agent · +5 Personas
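
The three detector families named above can be sketched in a few lines each. This is a simplified illustration under stated assumptions, not the SignalCortex implementation: the function names, the z-score threshold of 3.0, the EWMA `alpha`, and the heartbeat `grace` factor are all hypothetical choices, and the sample baseline is made up.

```python
import math
import statistics


def zscore(value, baseline):
    """Standard score of `value` against a baseline sample (needs >= 2 points)."""
    mean = statistics.fmean(baseline)
    stdev = statistics.stdev(baseline)
    if stdev == 0:
        # Flat baseline: any deviation at all is infinitely surprising.
        return 0.0 if value == mean else math.copysign(math.inf, value - mean)
    return (value - mean) / stdev


def ewma(values, alpha=0.3):
    """Exponentially weighted moving average; a higher alpha reacts faster."""
    smoothed = values[0]
    for v in values[1:]:
        smoothed = alpha * v + (1 - alpha) * smoothed
    return smoothed


def heartbeat_missed(last_seen_s, now_s, interval_s, grace=2.0):
    """True if no signal arrived within `grace` expected intervals."""
    return (now_s - last_seen_s) > grace * interval_s


# Made-up p99 latency baseline around 120 ms, then a 4.8 s observation.
baseline_ms = [118, 121, 119, 122, 120, 117, 123]
spike = zscore(4800, baseline_ms)
is_anomaly = abs(spike) > 3.0  # hypothetical threshold
```

The z-score catches sudden outliers, the EWMA tracks a drifting level, and the heartbeat check covers the case where a service simply stops reporting.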

Every role gets an expert.

Not a generic chatbot. Purpose-built AI agents with scoped data access, role-aligned language, and the right tools for each job.

SRE Agent · Monitoring 47 services
User: p99 latency on checkout-svc spiked at 03:42 UTC. What's the root cause?
SRE Agent: Root cause isolated. Vector DB saturation in embeddings-svc is cascading upstream to checkout.
  • Root cause: embeddings-svc p99 → 4.8s · index at 94% capacity
  • Blast radius: checkout-svc · recommendation-api · user-gateway
  • Fix: Scale index ×2 · Reduce ef_search 512 → 128
  • Confidence: 96% · 5 corroborating spans across 3 traces
User: Open the incident and page on-call.
SRE Agent: INC-2847 opened · Severity P1 · On-call paged via PagerDuty · Acknowledged in 2m 14s

Developer Agent · Tracing 12 services
User: Show me all HTTP 500s in payment-svc last 30 min with stack traces.
Developer Agent: 47 error spans across 3 root traces. All failures share a common attribute: a missing Stripe header.
  • Top error: NullPointerException @ PaymentProcessor.charge():L148
  • Endpoint: POST /api/v1/charge · 2.3% error rate · 47 spans
  • Common attr: header X-Stripe-Key: undefined on all failing requests
  • Root trace: trace-8f2a3c · 12 spans · 890ms total
User: Which deploy introduced this?
Developer Agent: Deploy d7f9e2a (2h ago by @alex) removed Stripe key injection from middleware. Reverting fixes 100% of failures.

AI Engineer Agent · Tracking 4 AI pipelines
User: Break down my LLM costs this week by model and identify waste.
AI Engineer Agent: Total spend: $2,847 this week (+40% WoW). Three models driving 94% of cost.
  • GPT-4o: $1,640 · avg 2,100 tokens/call · 58% of spend
  • Claude 3.5: $820 · avg 1,400 tokens/call · 29% of spend
  • Main waste: customer-agent injects full history → avg 6,800 tokens/turn
  • Opportunity: Sliding window history → save ~$620/wk immediately
User: Set a $5k/month budget alert on GPT-4o.
AI Engineer Agent: Budget alert set: notify at $4,000 (80%) and $5,000 (100%). At current burn, projected hit in 9 days.

Security Agent · 3 anomalies active
User: Any unusual auth or access patterns in the last 6 hours?
Security Agent: 3 suspicious patterns detected. One requires immediate action.
  • Auth burst: 847 login attempts from 3 IPs in 90s · z-score 8.4
  • Stale account: service-account-legacy accessed prod (dormant 47 days)
  • ABAC blocked: 2 cross-tenant data requests blocked by policy engine
  • Risk score: HIGH · 2 events require immediate remediation
User: Block those IPs and open a security incident.
Security Agent: IPs 45.33.32.156, 198.20.69.74, 162.55.12.8 blocked. SEC-0091 opened · MITRE ATT&CK T1110 (Brute Force).

Cost Agent · Tracking $2.8k weekly
User: LLM costs are up 40% this week. What changed?
Cost Agent: Three changes drove the spike. Two are fixable today.
  • New endpoint: /api/chat uses GPT-4o uncapped → +$480/day since Thu
  • Retry storm: payment-ai retrying ×5 on timeout → 5× token spend
  • Context bloat: rag-agent injects full docs (avg 8,200 tokens/call)
  • Run rate: $18,400/mo projected · $6,200 immediately preventable
User: Create GitHub tasks for the two red items.
Cost Agent: Issues created: #4821 (token cap on /api/chat) · #4822 (retry backoff in payment-ai), both assigned to @backend-team.

Incident Responder · 1 active P0 incident
User: payment-svc is 503ing. Customers can't checkout. Declare P0 and triage.
Incident Responder: INC-2851 declared P0. Automated triage complete. Runbook matched with 94% confidence.
  • Impact: 3,200 users affected · $12,400/min revenue loss
  • Root signal: payment-svc RAM 98% · 2/4 pods in CrashLoopBackOff
  • Runbook: RB-047 (memory pressure OOM cycle) · matched 94%
  • Auto-actions: On-call paged · Slack notified · Status page updated
User: Execute the runbook.
Incident Responder: RB-047 executing: memory limits raised +50% · pods restarting · checkout-svc green · payment-svc ETA 60s.

Admin Agent · Tenant healthy · 12 users
User: Issue an API key for the staging tenant and map it to the ai-engineer group.
Admin Agent: Key issued. ABAC mapping applied. Onboarding state advanced to AGENT_INSTRUMENTED.
  • API key: sc_live_•••••••••••••••6c2f · HMAC-SHA256 stored
  • Tenant: staging · quota 5M signals/day · 30d retention
  • Group mapping: IdP group ai-engineer → persona ai-engineer (ABAC)
  • Audit: audit_events · actor admin@acme.io · reason=onboarding
User: Set a soft quota alert at 80%.
Admin Agent: Quota alert configured · will notify #observability at 4M signals/day.

Operator Agent · Cross-cutting view
User: Give me a single status across reliability, cost, and security right now.
Operator Agent: Cross-domain summary built from 3 persona feeds. One actionable item.
  • Reliability: 2 anomalies · P95 stable · 0 active incidents
  • Cost: +4% WoW · GPT-4o burn elevated · within budget
  • Security: 0 alerts · 1 dormant svc-account flagged for review
  • Top action: Review svc-account-legacy (47d dormant) before next change window
User: Schedule the review for tomorrow 10am.
Operator Agent: Review item REV-118 created · owner @sec-lead · due tomorrow 10:00 · cross-linked to audit_events.

All personas share a unified telemetry layer with tenant-isolated data and ABAC policy enforcement.

Explore Persona Docs →

A manifesto you can operate.

This is not branding copy. Each principle maps directly to runtime behavior, security boundaries, and incident ergonomics.

Design tension

Dashboards make humans navigate tooling instead of solving incidents.

Doctrine

A question should be enough to trigger context gathering, reasoning, and response.

Operating rule

Natural language first, instrumentation-native under the hood.

  • Ask, do not click
  • Human-readable evidence
  • Zero query language

An interactive operations map.

Explore each layer to see how telemetry moves from raw signals to approved, auditable action.

Telemetry Fabric

Capture everything without changing every team workflow.

OTLP gRPC and HTTP receive logs, metrics, traces, and AI semantics. Data is normalized once, then distributed for query and automation.

Core components

  • OTLP gRPC :4317
  • OTLP HTTP :4318
  • otel-ingest
  • ClickHouse shards
  • Kafka or Event Hubs

Operational outcomes

  • Full-fidelity ingestion
  • Back-pressure handling
  • Tenant-scoped partitioning

3 shards × 2 replicas

Distributed ClickHouse topology

Strict tenant isolation

tenant_id enforced end-to-end

Policy-checked tooling

ABAC guards all persona actions

OIDC enterprise auth

Keycloak and Entra ID support
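
To make the tenant-isolation and ABAC guarantees above more concrete, here is a minimal sketch of how such a check might compose: reject any cross-tenant request first, then consult an attribute-based policy for the persona. The `Subject` and `Resource` types, the `POLICY` table, and `is_allowed` are hypothetical names for illustration and do not describe the actual SignalCortex policy engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    """Who is asking: the caller's tenant and persona (hypothetical attributes)."""
    tenant_id: str
    persona: str


@dataclass(frozen=True)
class Resource:
    """What is being touched: the owning tenant and the kind of data or tool."""
    tenant_id: str
    kind: str


# Hypothetical policy table: (persona, action, resource kind) triples that are allowed.
POLICY = {
    ("sre", "read", "traces"),
    ("sre", "execute", "runbook"),
    ("security", "read", "auth_logs"),
}


def is_allowed(subject, action, resource):
    """Deny cross-tenant access first, then consult the persona-level policy."""
    if subject.tenant_id != resource.tenant_id:
        return False
    return (subject.persona, action, resource.kind) in POLICY


sre = Subject(tenant_id="acme", persona="sre")
same_tenant_read = is_allowed(sre, "read", Resource("acme", "traces"))
cross_tenant_read = is_allowed(sre, "read", Resource("other-co", "traces"))
```

Checking `tenant_id` before anything else means a permissive persona rule can never widen access beyond the caller's own tenant.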

Built for developers.

OTLP-native ingest. OpenTelemetry SDKs work out of the box. No proprietary agents. No lock-in.

Zero-config ingest

Point any OTLP exporter at SignalCortex. Works with existing instrumentation.

Ask, don't query

Natural language interface via persona agents. No PromQL, LogQL, or custom DSL.

MCP-ready

Query your observability data directly from Cursor, VS Code, and Claude.

Enterprise-grade

Multi-tenant isolation, ABAC policies, Keycloak SSO, JWKS JWT validation.

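configure otel-collector
# Configure your OTel collector
$ cat signalcortex.yaml
receivers:                # the collector needs at least one receiver
  otlp:
    protocols:
      grpc:
      http:
exporters:
  otlp:
    endpoint: "ingest.signalcortex.ai:4317"
    headers:
      x-api-key: "${SCX_API_KEY}"
service:                  # pipelines wire receivers to exporters
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      exporters: [otlp]
# Start your collector
$ otelcol --config=signalcortex.yaml
✓ Exporter started · endpoint=ingest.signalcortex.ai:4317
✓ Logs pipeline active · retention=30d
✓ Traces pipeline active · 3 shards ready
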
vscode · copilot chat (mcp)
# 1. Wire SignalCortex into your IDE · .vscode/mcp.json
{
  "servers": {
    "signalcortex": {
      "type": "http",
      "url": "https://mcp.signalcortex.ai",
      "headers": { "X-API-Key": "${SCX_API_KEY}" }
    }
  }
}
# 2. Ask Copilot in plain English (calls 36 MCP tools server-side)
@signalcortex why did checkout p99 spike at 03:42 UTC?
→ search_spans · service=checkout · window=03:30–04:00 UTC
→ list_anomalies · tenant-scoped · ABAC enforced
Found 3 corroborating signals
  embeddings-svc p99 4.8s (baseline 120ms)
  vector index at 94% capacity
  blast radius: recommendation-api · search-api
Trace IDs returned · cite from your editor · no DSL required

Run observability that thinks.

Stop watching dashboards. Start working with agents.

OpenTelemetry native
No proprietary collectors
Multi-tenant · Enterprise ready