Sentinel
AI observability platform that monitors agent reliability and catches drift
The Problem
AI systems degrade silently. A model that performed at 95% accuracy last month might be at 78% today — and nobody notices until customers complain. Most teams monitor infrastructure (CPU, memory, uptime) but not AI behavior (response quality, hallucination rates, retrieval relevance).
The gap is observability. Not infrastructure monitoring — AI-specific observability that tracks what matters: is the system still giving good answers?
The Architecture
Sentinel is an observability platform purpose-built for AI agent pipelines. It connects to any LLM-based system via lightweight instrumentation and provides three layers of monitoring:
The first layer is tracing. Every LLM call, tool invocation, and agent decision is captured as a structured trace through LangFuse integration. Traces include full input/output pairs, latency, token usage, and custom metadata. The instrumentation adds less than 5 ms of overhead per call.
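To make the shape of a captured trace concrete, here is a minimal sketch using only the standard library. It does not use the LangFuse SDK; `TraceRecord`, `TRACE_SINK`, `traced`, and `fake_llm_call` are hypothetical names invented for illustration, and the in-memory list stands in for whatever backend actually stores the traces.

```python
import time
import uuid
from dataclasses import dataclass, field
from functools import wraps
from typing import Any, Callable


@dataclass
class TraceRecord:
    """One captured call: input/output pair, latency, and free-form metadata."""
    trace_id: str
    name: str
    input: dict[str, Any]
    output: Any
    latency_ms: float
    metadata: dict[str, Any] = field(default_factory=dict)


# Hypothetical in-memory sink; a real deployment would ship records to a tracing backend.
TRACE_SINK: list[TraceRecord] = []


def traced(name: str, **metadata: Any) -> Callable:
    """Wrap a function so that each successful call is appended to TRACE_SINK."""
    def decorator(fn: Callable) -> Callable:
        @wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            output = fn(*args, **kwargs)  # exceptions propagate and are not recorded here
            latency_ms = (time.perf_counter() - start) * 1000.0
            TRACE_SINK.append(TraceRecord(
                trace_id=uuid.uuid4().hex,
                name=name,
                input={"args": args, "kwargs": kwargs},
                output=output,
                latency_ms=latency_ms,
                metadata=dict(metadata),
            ))
            return output
        return wrapper
    return decorator


@traced("summarize", domain="finance")
def fake_llm_call(prompt: str) -> str:
    # Stand-in for a real model call.
    return prompt.upper()
```

Calling `fake_llm_call("hello")` returns the result as usual and leaves one `TraceRecord` in `TRACE_SINK`. Token usage would fit naturally as another metadata field once a real model client reports it.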
The second layer is evaluation. Automated evaluators run on sampled traces, using Pydantic AI for structured output parsing. Evaluators check factual grounding, response relevance, instruction adherence, and format compliance. Each evaluation produces a typed score that feeds into trend analysis.
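One way to run evaluators on a sample of traces is to hash the trace ID, so the same trace is always either in or out of the sample. The sketch below assumes that approach, which the text above does not specify. `EvalScore`, `is_sampled`, `format_compliance`, and `evaluate_sampled` are hypothetical names, and the format check is a toy rule (does the output parse as a JSON object), not Sentinel's actual evaluator.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class EvalScore:
    """Typed result of one evaluator run on one trace."""
    evaluator: str
    trace_id: str
    score: float  # 0.0 (fail) to 1.0 (pass)


def is_sampled(trace_id: str, rate: float) -> bool:
    """Deterministically select roughly `rate` of all traces by hashing the ID."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return bucket < rate


def format_compliance(trace_id: str, output: str) -> EvalScore:
    """Toy evaluator: 1.0 if the output parses as a JSON object, else 0.0."""
    try:
        ok = isinstance(json.loads(output), dict)
    except json.JSONDecodeError:
        ok = False
    return EvalScore("format_compliance", trace_id, 1.0 if ok else 0.0)


def evaluate_sampled(traces: Iterable[tuple[str, str]], rate: float) -> list[EvalScore]:
    """Run the evaluator on the sampled subset of (trace_id, output) pairs."""
    return [format_compliance(tid, out) for tid, out in traces if is_sampled(tid, rate)]
```

Because the sampling decision depends only on the trace ID, re-running evaluation over the same traces scores the same subset, which keeps trend lines comparable between runs.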
The third layer is alerting. When quality metrics cross configurable thresholds, Sentinel fires alerts. These are not generic "something is wrong" notices but specific alerts like "retrieval relevance for the finance domain dropped below 85% over the last 4 hours." Dashboards are built on FastAPI endpoints with real-time WebSocket updates.
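A threshold rule of this kind can be sketched as a check over a sliding time window. The example below is an assumption about how such a rule might work: it alerts when the mean score for one metric and domain falls below the threshold within the window. `ScorePoint`, `AlertRule`, and `check_rule` are hypothetical names, and using the mean rather than a minimum or percentile is an illustrative choice.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class ScorePoint:
    timestamp: datetime
    domain: str
    metric: str
    value: float  # 0.0 to 1.0


@dataclass(frozen=True)
class AlertRule:
    metric: str
    domain: str
    threshold: float   # alert when the windowed mean is below this
    window: timedelta  # how far back to look


def check_rule(rule: AlertRule, points: Iterable[ScorePoint], now: datetime) -> Optional[str]:
    """Return an alert message if the rule is violated, otherwise None."""
    cutoff = now - rule.window
    values = [
        p.value for p in points
        if p.metric == rule.metric and p.domain == rule.domain
        and cutoff <= p.timestamp <= now
    ]
    if not values:
        return None  # no data in the window; this sketch stays silent
    mean = sum(values) / len(values)
    if mean >= rule.threshold:
        return None
    hours = rule.window.total_seconds() / 3600
    return (
        f"{rule.metric} for the {rule.domain} domain dropped below "
        f"{rule.threshold:.0%} over the last {hours:g} hours (mean {mean:.1%})"
    )
```

Staying silent on an empty window is a simplification. A production rule would likely treat missing data as its own alert condition, since a pipeline that stops producing scores is also a failure.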
Technical Decisions
Structured, typed outputs. When an evaluator assesses response quality, I need a typed score object — not a string that might say "7/10" or "seven out of ten" or "pretty good." Pydantic AI enforces output schemas, making downstream aggregation reliable.
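The difference between a typed score and a free-form string can be shown with a small validated record. This sketch uses a standard-library dataclass as a stand-in so that it runs without dependencies; it is not Pydantic AI, where a model class would declare the same constraints. `QualityScore`, `parse_score`, and `mean_relevance` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class QualityScore:
    """Validated evaluator output: a number in [0, 1] plus a grounding flag."""
    relevance: float
    grounded: bool

    def __post_init__(self) -> None:
        # bool is a subclass of int, so it must be excluded explicitly.
        if isinstance(self.relevance, bool) or not isinstance(self.relevance, (int, float)):
            raise TypeError("relevance must be a number")
        if not 0.0 <= self.relevance <= 1.0:
            raise ValueError("relevance must be between 0 and 1")
        if not isinstance(self.grounded, bool):
            raise TypeError("grounded must be a bool")


def parse_score(raw: str) -> QualityScore:
    """Parse an evaluator's JSON reply, rejecting anything off-schema."""
    data = json.loads(raw)
    return QualityScore(relevance=data["relevance"], grounded=data["grounded"])


def mean_relevance(scores: Sequence[QualityScore]) -> float:
    """Aggregation is safe because every score has already been validated."""
    return sum(s.relevance for s in scores) / len(scores)
```

A reply such as `{"relevance": "7/10", "grounded": true}` is rejected at parse time with a `TypeError`, so it never reaches the aggregation step.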
Open source and self-hostable. Sentinel is designed for enterprises that cannot send trace data to third-party SaaS, so the tracing backend has to run on their side. LangFuse is open source and runs in the client's infrastructure with full data sovereignty.
Trace data is relational at its core: spans belong to traces, traces belong to sessions, and sessions belong to users. PostgreSQL handles this naturally, with foreign keys for the hierarchy, JSONB columns for flexible metadata, and indexes on timestamp columns for time-series queries.
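The span, trace, session, and user hierarchy can be sketched as four tables linked by foreign keys. To keep the example runnable without a database server, it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL. Metadata is stored as JSON text rather than JSONB, and timestamps are ISO-8601 strings. The table and column names, and the helpers `open_store` and `spans_for_user`, are all hypothetical.

```python
import json
import sqlite3

# SQLite stand-in for the PostgreSQL schema; `metadata` would be a JSONB column there.
SCHEMA = """
CREATE TABLE users    (id TEXT PRIMARY KEY);
CREATE TABLE sessions (id TEXT PRIMARY KEY,
                       user_id TEXT NOT NULL REFERENCES users(id));
CREATE TABLE traces   (id TEXT PRIMARY KEY,
                       session_id TEXT NOT NULL REFERENCES sessions(id),
                       started_at TEXT NOT NULL);
CREATE TABLE spans    (id TEXT PRIMARY KEY,
                       trace_id TEXT NOT NULL REFERENCES traces(id),
                       name TEXT NOT NULL,
                       started_at TEXT NOT NULL,
                       latency_ms REAL NOT NULL,
                       metadata TEXT NOT NULL DEFAULT '{}');
CREATE INDEX idx_spans_started_at ON spans (started_at);
"""


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with the trace hierarchy tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def spans_for_user(conn: sqlite3.Connection, user_id: str, since: str) -> list[tuple]:
    """Return (name, latency_ms, metadata) for a user's spans started at or after `since`."""
    rows = conn.execute(
        """
        SELECT spans.name, spans.latency_ms, spans.metadata
        FROM spans
        JOIN traces   ON traces.id = spans.trace_id
        JOIN sessions ON sessions.id = traces.session_id
        WHERE sessions.user_id = ? AND spans.started_at >= ?
        ORDER BY spans.started_at
        """,
        (user_id, since),
    ).fetchall()
    return [(name, latency, json.loads(meta)) for name, latency, meta in rows]
```

The query walks the hierarchy with two joins and filters on the indexed timestamp column, which is the access pattern a time-windowed dashboard or alert would rely on.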