New Autonomous anomaly detection · Now in GA

Observability for
the AI Agent Era

Not dashboards. Autonomous agents that understand,
reason, and act on your telemetry.

Ingest everything.
Understand instantly.
Act automatically.

Start Free → See How It Works

2.4Msignals/sec
<30sMTTD
36MCP tools
99.2%trace join rate
8persona agents

signalcortex — ai-chat-agent live

Why did latency spike at 03:42 UTC for checkout?

Prompt · production-us-east · 20m window

AI Chat Agent analyzing 240 spans 14 services 3 anomalies ▋

Root cause: vector index saturation in embeddings-svc pushed p99 to 4.8s.

Blast radiusrecommendation-api · search-api · user-gateway

FixScale index nodes + tune HNSW ef_search

Confidence94% · 3 corroborating signals

Open incident Scale now View traces

01 Ingest

02 Understand

03 Reason

04 Act

05 Learn

The Shift

Dashboards are legacy.

The old model was built for humans querying static charts. The new model is built for agents that never stop watching.

Before

The Old Way

Manual ops

01 Instrument Manual config

02 Dashboard Weeks of setup

03 🔍 Query PromQL · LogQL

04 📟 Alert 3am on-call page

05 ⏳ Investigate 40+ min manual

MTTD 40+ minutes · Human-dependent · Context lost on handoff

Now

The SignalCortex Way

Autonomous

01 📥 Ingest All signals · OTLP

02 🧠 Understand Auto-context built

03 💡 Reason AI explains why

04 ⚡ Act Runbooks execute

05 📈 Learn Baselines sharpen

MTTD <30 seconds · Fully autonomous · Self-improving system

How It Works

Every app. Every agent. One intelligent core.

Any telemetry source flows in. SignalCortex understands it. Persona AI agents surface the right insight to the right person — through natural conversation.

Data Sources

LLM AppsGPT · Claude · Gemini

⚡

AI AgentsLangChain · AutoGen · SK

☁️

Cloud ServicesAWS · Azure · GCP

🔗

APIs & MicroservicesREST · gRPC · GraphQL

📦

OTel SDKPython · Go · JS · Java

🦾

Robotic AgentsHumanoid · Drone · AV

🌐

Multi-AgentA2A · Swarms · Long-running

OTLP / gRPC / HTTP

SignalCortex

AI-Native Core

Ingesting telemetry… Detecting anomalies… Correlating signals… Updating baselines… Reasoning with LLM…

ClickHouse Redis Keycloak MCP Tools ABAC Kafka

Persona AI Agents

SRE

SRE Agent Active

Why is p99 latency spiking?

Root cause: cart-service OOM. 3 pods degraded.

DEV

Developer Agent Active

Show me error traces for checkout.

42 spans with HTTP 500 in last 15m.

SEC

Security Agent Alert

Any anomalous auth patterns?

COST

Cost Optimizer Active

Which LLM model costs the most?

GPT-4o: $148/wk. Switch saves 61%.

INC

Incident Responder Incident

Run incident playbook INC-204.

Playbook running. PagerDuty notified.

AIE

AI Engineer Active

Why did eval scores drop overnight?

Hallucination rate up 18% on v0.4.2. Rollback?

ADM

Admin Agent Active

Issue an API key for the staging tenant.

Key issued · mapped to ai-engineer group.

OPS

Operator Agent Active

Status across reliability, cost, and security?

2 anomalies · cost trending +4% · 0 sec alerts.

5Anomaly detectors

7dRolling baseline window

8Persona agents

OTLPOpen standard ingest

The Dual Mission

Two operating modes. One platform core.

Switch between observing AI systems and using AI to operate systems. Both run on the same telemetry fabric.

Capture the internal behavior of AI systems from prompt to tool-chain output with full telemetry context.

01 Trace every LLM call with latency, tokens, model, and cost.
02 Inspect tool invocations in strict execution order.
03 Debug RAG relevance drift with retrieval-level context.
04 Correlate multi-agent orchestration across service boundaries.

2.4M Signals / sec

99.2% Trace Join Rate

420ms P99 Query

OpenTelemetryOpenInferenceLangfuseTrace Graph

Deploy persona-specific AI agents that detect, investigate, and recommend or execute actions under approval policies.

01 Detect anomalies continuously using z-score, EWMA, and heartbeat models.
02 Investigate root cause automatically using traces, logs, and baselines.
03 Propose remediations with confidence and blast-radius scoring.
04 Require human approval for high-risk actions with full audit trail.

< 30s MTTD

87% Auto Triage

-63% False Positives

SRE AgentDeveloper AgentSecurity Agent+3 Personas

Persona-Based Operations

Every role gets an expert.

Not a generic chatbot. Purpose-built AI agents with scoped data access, role-aligned language, and the right tools for each job.

SRE

SRE Agent Monitoring 47 services

p99 latency on checkout-svc spiked at 03:42 UTC. What's the root cause?

SRE

Root cause isolated. Vector DB saturation in embeddings-svc is cascading upstream to checkout.

Root cause embeddings-svc p99 → 4.8s · index at 94% capacity

Blast radius checkout-svc · recommendation-api · user-gateway

Fix Scale index ×2 · Reduce ef_search 512 → 128

Confidence 96% · 5 corroborating spans across 3 traces

Open the incident and page on-call.

SRE

INC-2847 opened · Severity P1 · On-call paged via PagerDuty · Acknowledged in 2m 14s

DEV

Developer Agent Tracing 12 services

Show me all HTTP 500s in payment-svc last 30 min with stack traces.

DEV

47 error spans across 3 root traces. All failures share a common attribute — mising Stripe header.

Top error NullPointerException @ PaymentProcessor.charge():L148

Endpoint POST /api/v1/charge · 2.3% error rate · 47 spans

Common attr header X-Stripe-Key: undefined on all failing requests

Root trace trace-8f2a3c · 12 spans · 890ms total

Which deploy introduced this?

DEV

Deploy d7f9e2a (2h ago by @alex) removed Stripe key injection from middleware. Reverting fixes 100% of failures.

AI Engineer Agent Tracking 4 AI pipelines

Break down my LLM costs this week by model and identify waste.

Total spend: $2,847 this week (+40% WoW). Three models driving 94% of cost.

GPT-4o $1,640 · avg 2,100 tokens/call · 42% of spend

Claude 3.5 $820 · avg 1,400 tokens/call · 29% of spend

Main waste customer-agent injects full history → avg 6,800 tokens/turn

Opportunity Sliding window history → save ~$620/wk immediately

Set a $5k/month budget alert on GPT-4o.

Budget alert set: notify at $4,000 (80%) and $5,000 (100%). At current burn, projected hit in 9 days.

SEC

Security Agent 3 anomalies active

Any unusual auth or access patterns in the last 6 hours?

SEC

3 suspicious patterns detected. One requires immediate action.

Auth burst 847 login attempts from 3 IPs in 90s · z-score 8.4

Stale account service-account-legacy accessed prod (dormant 47 days)

ABAC blocked 2 cross-tenant data requests blocked by policy engine

Risk score HIGH · 2 events require immediate remediation

Block those IPs and open a security incident.

SEC

IPs 45.33.32.156, 198.20.69.74, 162.55.12.8 blocked. SEC-0091 opened · MITRE ATT&CK T1110 (Brute Force).

COST

Cost Agent Tracking $2.8k weekly

LLM costs are up 40% this week. What changed?

COST

Three changes drove the spike. Two are fixable today.

New endpoint /api/chat uses GPT-4o uncapped → +$480/day since Thu

Retry storm payment-ai retrying ×5 on timeout → 5× token spend

Context bloat rag-agent injects full docs (avg 8,200 tokens/call)

Run rate $18,400/mo projected · $6,200 immediately preventable

Create GitHub tasks for the two red items.

COST

Issues created: #4821 (token cap on /api/chat) · #4822 (retry backoff in payment-ai) — assigned to @backend-team.

INC

Incident Responder 1 active P0 incident

payment-svc is 503ing. Customers can't checkout. Declare P0 and triage.

INC

INC-2851 declared P0. Automated triage complete. Runbook matched with 94% confidence.

Impact 3,200 users affected · $12,400/min revenue loss

Root signal payment-svc RAM 98% · 2/4 pods in CrashLoopBackOff

Runbook RB-047: Memory pressure OOM cycle — matched 94%

Auto-actions On-call paged · Slack notified · Status page updated

Execute the runbook.

INC

RB-047 executing: memory limits raised +50% · pods restarting · checkout-svc green · payment-svc ETA 60s.

ADM

Admin Agent Tenant healthy · 12 users

Issue an API key for the staging tenant and map it to the ai-engineer group.

ADM

Key issued. ABAC mapping applied. Onboarding state advanced to AGENT_INSTRUMENTED.

API key sc_live_•••••••••••••••6c2f · HMAC-SHA256 stored

Tenant staging · quota 5M signals/day · 30d retention

Group mapping IdP group ai-engineer → persona ai-engineer (ABAC)

Audit audit_events · actor admin@acme.io · reason=onboarding

Set a soft quota alert at 80%.

ADM

Quota alert configured · will notify #observability at 4M signals/day.

OPS

Operator Agent Cross-cutting view

Give me a single status across reliability, cost, and security right now.

OPS

Cross-domain summary built from 3 persona feeds. One actionable item.

Reliability 2 anomalies · P95 stable · 0 active incidents

Cost +4% WoW · GPT-4o burn elevated · within budget

Security 0 alerts · 1 dormant svc-account flagged for review

Top action Review svc-account-legacy (47d dormant) before next change window

Schedule the review for tomorrow 10am.

OPS

Review item REV-118 created · owner @sec-lead · due tomorrow 10:00 · cross-linked to audit_events.

All personas share a unified telemetry layer with tenant-isolated data and ABAC policy enforcement.

Explore Persona Docs →

Our Principles

A manifesto you can operate.

This is not branding copy. Each principle maps directly to runtime behavior, security boundaries, and incident ergonomics.

Design tension

Dashboards make humans navigate tooling instead of solving incidents.

Doctrine

A question should be enough to trigger context gathering, reasoning, and response.

Operating rule

Natural language first, instrumentation-native under the hood.

Ask, do not click
Human-readable evidence
Zero query language

Architecture

An interactive operations map.

Explore each layer to see how telemetry moves from raw signals to approved, auditable action.

Telemetry Fabric

Capture everything without changing every team workflow.

OTLP gRPC and HTTP receive logs, metrics, traces, and AI semantics. Data is normalized once, then distributed for query and automation.

Core components

OTLP gRPC :4317
OTLP HTTP :4318
otel-ingest
ClickHouse shards
Kafka or Event Hubs

Operational outcomes

Full-fidelity ingestion
Back-pressure handling
Tenant-scoped partitioning

3 shards x 2 replicas

Distributed ClickHouse topology

Strict tenant isolation

tenant_id enforced end-to-end

Policy-checked tooling

ABAC guards all persona actions

OIDC enterprise auth

Keycloak and Entra ID support

Developer Experience

Built for developers.

OTLP-native ingest. OpenTelemetry SDKs work out of the box. No proprietary agents. No lock-in.

Zero-config ingest

Point any OTLP exporter at SignalCortex. Works with existing instrumentation.

Ask, don't query

Natural language interface via persona agents. No PromQL, LogQL, or custom DSL.

MCP-ready

Query your observability data directly from Cursor, VS Code, and Claude.

Enterprise-grade

Multi-tenant isolation, ABAC policies, Keycloak SSO, JWKS JWT validation.

View Documentation → API Reference →

configure otel-collector

# Configure your OTel collector

cat signalcortex.yaml

exporters:

otlp:

endpoint: "ingest.signalcortex.ai:4317"

headers:

x-api-key: "${SCX_API_KEY}"

# Start your collector

otelcol --config=signalcortex.yaml

✓ Exporter started · endpoint=ingest.signalcortex.ai:4317

✓ Logs pipeline active · retention=30d

✓ Traces pipeline active · 3 shards ready

vscode — copilot chat (mcp)

# 1. Wire SignalCortex into your IDE · .vscode/mcp.json

{

"servers": {

"signalcortex": {

"type": "http",

"url": "https://mcp.signalcortex.ai",

"headers": { "X-API-Key": "${SCX_API_KEY}" }

}

# 2. Ask Copilot in plain English (calls 36 MCP tools server-side)

@signalcortex why did checkout p99 spike at 03:42 UTC?

→ search_spans · service=checkout · window=03:30–04:00 UTC

→ list_anomalies · tenant-scoped · ABAC enforced

Found 3 corroborating signals

embeddings-svc p99 4.8s (baseline 120ms)

vector index at 94% capacity

blast radius: recommendation-api · search-api

Trace IDs returned · cite from your editor · no DSL required

Get Started

Run observability
that thinks.

Stop watching dashboards. Start working with agents.

Create Account → Request Enterprise Demo Sign In

OpenTelemetry native

No proprietary collectors

Multi-tenant · Enterprise ready

Observability for
the AI Agent Era

Dashboards are legacy.

The Old Way

The SignalCortex Way

Every app. Every agent. One intelligent core.

Two operating modes. One platform core.

Every role gets an expert.

A manifesto you can operate.

An interactive operations map.

Capture everything without changing every team workflow.

Continuously correlate, baseline, and prioritize change.

Turn telemetry into decisions with auditable reasoning.

Close the loop with safe automation and human controls.

Close the loop on agent quality with continuous evaluation.

Built for developers.

Run observability
that thinks.

Observability for the AI Agent Era

Dashboards are legacy.

The Old Way

The SignalCortex Way

Every app. Every agent. One intelligent core.

Two operating modes. One platform core.

Every role gets an expert.

Capture everything without changing every team workflow.

Continuously correlate, baseline, and prioritize change.

Turn telemetry into decisions with auditable reasoning.

Close the loop with safe automation and human controls.

Close the loop on agent quality with continuous evaluation.

Built for developers.

Run observability that thinks.

Observability for
the AI Agent Era

Run observability
that thinks.