VAIL — Runtime Verification for Security Leaders

The problem

AI systems fail differently than traditional software.

Silently, probabilistically, and without a stack trace. Traditional security tools monitor network traffic and endpoints — but AI introduces three failure modes that standard controls do not cover.

Provenance is unverifiable

A model claims an identity, a lineage, a benchmark score. Without behavioral evidence, there is no independent way to confirm any of it. Cursor marketed Composer 2 as in-house before acknowledging it was built on Moonshot AI's Kimi K2.5. Meta submitted a chat-optimized Llama 4 Maverick variant to LMArena while the publicly released model performed differently.

Behavioral Fingerprinting — model identity & lineage

Endpoints drift silently

Providers update, quantize, route, or swap models behind stable API names. Standard health checks see nothing. Three overlapping silent changes to Claude degraded coding performance for six weeks before Anthropic published a postmortem. OpenAI's GPT-5.3-Codex was caught routing Pro subscribers to GPT-5.2 while the CLI displayed the wrong model name.

Stability Monitor — endpoint drift & routing

Unexpected agent behaviors

Skill files, memory files, MCP tool descriptions, and behavioral configs directly steer agent actions. A single malicious edit persists across sessions. Cisco researchers demonstrated that injected instructions in Claude Code's memory file silently altered agent behavior. Malicious MCP configurations in repositories could execute code with developer permissions across four major coding agents.

Agent Behavior Tracking — config & trajectory changes

This is already happening

Documented incidents. Each one detectable by verification.

Every scenario below has been publicly documented. In each case, continuous behavioral monitoring — comparing live behavior against an approved baseline — would have produced timestamped evidence for security and compliance teams.

Cursor / Kimi · 2026

Composer 2 marketed as in-house — built on Kimi K2.5.

Cursor marketed its Composer 2 coding model as in-house before acknowledging it was built on Moonshot AI's Kimi K2.5. The product label stayed constant while the underlying model relationship was misrepresented.

Behavioral Fingerprinting would make the Kimi relationship auditable. Stability Monitor would flag if the served model identity or behavior changed while the product label stayed constant.

Behavioral Fingerprinting + Stability Monitor

OpenAI Codex · 2026

GPT-5.3-Codex silently routed to GPT-5.2.

Users reported selecting GPT-5.3-Codex while response metadata and behavior indicated requests were served by GPT-5.2. The CLI displayed the wrong model name throughout.

Stability Monitor detects the behavioral change behind the same product surface. Semantic fingerprints can independently compare the served model against known identities.

Stability Monitor — silent model routing

Anthropic · Mar–Apr 2026

Three overlapping changes caused six weeks of degradation.

Reports traced Claude Code quality complaints to overlapping changes in reasoning effort, caching behavior, and system-prompt verbosity limits — before Anthropic published a postmortem.

Stability Monitor would have turned a long user-report-driven regression into timestamped change events. Agent Behavior Tracking applies to prompt and config diffs across the stack.

Stability Monitor + Agent Behavior Tracking

TrustFall · May 2026

Malicious MCP configs across four coding agents.

Malicious MCP configuration files in repositories could execute helper programs with developer permissions across Claude Code, Gemini CLI, Cursor CLI, and GitHub Copilot CLI.

Agent Behavior Tracking maps directly to persistent MCP and config diffs that add risky tools. Runtime traces show repository provenance and config activation before agent work begins.

Agent Behavior Tracking — MCP supply-chain exploit

Meta · 2025

Llama 4 Maverick arena variant ≠ public release.

LM Arena confirmed the version of Meta's Llama 4 Maverick listed on their arena was a customized model optimized for human preference — not the standard release users would deploy.

Behavioral Fingerprinting comparison between the arena version and the actual release would show measurable divergence — proving they were not the same model.

Behavioral Fingerprinting — benchmark identity mismatch

Google GenAI · 2026

Gemini aliases resolve to models you cannot audit.

Users reported that Gemini alias APIs return alias names without exposing the concrete resolved model version, making production behavior hard to audit.

Stability Monitor is the black-box substitute for a missing resolved_model field: it detects when alias behavior changes and records stability periods even if the provider only returns the alias.

Stability Monitor — alias transparency gap

The platform

Verification infrastructure for every deployment.

Three verification methods mapped to the three failure modes — provenance, endpoint drift, and agent behavior. Each produces timestamped evidence your security and compliance teams can audit.

/ 01 · Provenance

Behavioral Fingerprinting

Provenance claims are only as good as the evidence behind them. Behavioral Fingerprinting extracts a semantic fingerprint from any model's input-output behavior. Compare fingerprints to reveal fine-tuning relationships, distillation lineage, quantization variants, and false identity claims before they become license, compliance, or security liabilities.

Detects False identity claims · Undisclosed fine-tuning · Lineage mismatch

/ 02 · Endpoint drift

Stability Monitoring

Continuously monitors endpoints to detect changes — model swaps, version updates, quantization changes, inference stack shifts, and parameter drift. Produces an audit trail of stability periods and change events usable by infrastructure ops, security, and compliance.

Detects Silent model routing · Provider divergence · Guardrail degradation

/ 03 · Agent behavior

Agent Behavior Tracking

Track agent behavior tendencies as they adapt from within the production environment. Detect persistent changes to skill files, memory files, MCP configurations, and system prompts — the configuration layer that steers agent actions across sessions.

Detects MCP config tampering · Memory file injection · Autonomy drift

How it works

Zero-touch deployment. Immediate coverage.

VAIL operates as an independent monitoring layer — no agents, no SDKs, no code changes. Probes run from VAIL infrastructure against your AI endpoints' existing API surface. Deploys in minutes, not sprints.

VAIL never accesses your production logs, application traffic, or private enterprise data. Our monitors run completely independently of your application-specific traffic to and from any model endpoint. We issue our own synthetic probes from separate infrastructure — your real user data, prompts, and responses are never seen, stored, or processed by VAIL.

01

Enumerate

Catalog every AI model endpoint in your environment. API providers, self-hosted models, embedded integrations.

02

Baseline

Capture a behavioral fingerprint of each endpoint at the point of approval. This becomes the known-good state.

03

Monitor

Hourly probes compare live behavior against the baseline. Divergence triggers alerts through your SIEM/SOAR.

04

Investigate

When a change is detected, behavioral fingerprints and stability records provide the forensic evidence for triage.

24/7

Continuous behavioral monitoring. Hourly probes. No gaps.

0

Access to production logs, user data, or enterprise traffic

<5min

Time to first baseline. Add an endpoint, get coverage immediately.

3rd

Party-independent infrastructure. No conflict of interest.

Framework alignment

Maps directly to the frameworks your team already uses.

VAIL monitoring generates evidence that maps to existing security and compliance frameworks — not a parallel compliance program, but evidence that feeds into what you already run.

01

MITRE ATLAS

Directly addresses ML supply chain compromise (AML.T0010), model poisoning (AML.T0019), and evasion techniques across the AI attack lifecycle.

Lineage tracing + Behavioral Fingerprinting

02

NIST AI RMF

Produces evidence for MAP (risk identification), MANAGE (post-deployment monitoring), and GOVERN (accountability and documentation) functions.

Continuous attestation + Behavioral Fingerprinting

03

OWASP LLM Top 10

Addresses LLM03 (Supply Chain Vulnerabilities), LLM05 (Improper Output Handling via behavioral drift), and LLM09 (Overreliance on unverified model identity).

All three layers

04

ISO/IEC 42001

Generates timestamped evidence for risk assessment (Section 6), performance evaluation (Section 9), and continual improvement (Section 10).

Attestation records + Behavioral evidence

05

EU AI Act

Supplies conformity evidence for GPAI model provenance obligations, Article 12 logging requirements, and Article 72 post-market monitoring.

Full lifecycle evidence chain

06

Cisco AI Security Framework

Maps to 3 objectives and 7+ techniques across supply chain compromise, integrity degradation, and persistence — detailed mapping below.

VAIL fingerprints in Cisco Provenance Explorer

Cisco AI Security Framework — where runtime monitoring applies

Continuous behavioral monitoring maps to three Cisco framework objectives.

Cisco's Integrated AI Security and Safety Framework is a lifecycle-aware taxonomy covering 19 attacker objectives, 40 techniques, and 112+ subtechniques. The framework explicitly calls for "runtime monitoring and guardrail deployment" and "provenance standards for establishing trust and traceability" as core operationalization requirements. Here's where VAIL's continuous behavioral monitoring provides the detection layer.

OB-009 · Supply Chain Compromise

Detecting model swaps, evasion, and dependency replacement.

AITech-9.1 Model or Agentic System Manipulation — When a model behind an endpoint is swapped or manipulated, behavioral fingerprinting detects the divergence from the approved baseline. The model weights may change, but the behavioral signature changes with them.

AITech-9.2 Detection Evasion (Backdoors and Trojans, Obfuscation) — Attacks designed to evade static analysis are still detectable through behavioral monitoring. A backdoored model will produce different output distributions than the clean version when probed systematically.

AITech-9.3 Dependency / Plugin Compromise (Dependency Replacement / Rug Pull) — When a provider silently replaces the model behind an API — the "rug pull" scenario — continuous monitoring catches the behavioral shift within the first probe cycle.

Stability Monitor detects 9.1 swaps and 9.3 rug pulls behaviorally. Behavioral Fingerprinting detects 9.2 backdoors at runtime.

OB-007 · Sabotage / Integrity Degradation

Detecting reasoning corruption and behavioral drift.

AITech-7.1 Reasoning Corruption — When a model's reasoning capabilities degrade — through quantization, infrastructure changes, or adversarial modification — the behavioral fingerprint captures the shift in output quality and decision patterns.

AITech-7.3 Data Source Abuse and Manipulation (Corrupted Third-Party Data) — When a model's RAG pipeline or data sources are corrupted, the model's behavioral outputs change. Continuous monitoring detects this as a divergence from the approved behavioral baseline, even when the model weights themselves haven't changed.

Behavioral Fingerprinting captures reasoning quality. Stability Monitor detects integrity degradation over time.

OB-005 · Persistence

Detecting configuration tampering and profile modification.

AITech-5.2 Configuration Persistence / Agent Profile Tampering — When an attacker modifies system prompts, agent configurations, or behavioral parameters to establish persistent access, the model's response patterns change. Behavioral monitoring detects this because a tampered configuration produces measurably different output distributions than the approved state.

This is the exact scenario documented in the McKinsey Lilli breach — where an attacker with database access could silently alter how the AI responded to every employee. Continuous behavioral monitoring provides the detection layer that configuration auditing alone cannot.

Stability Monitor detects behavioral changes from tampered configurations, system prompts, or agent profiles.

Framework Operationalization · Section 3

The framework explicitly requires runtime monitoring infrastructure.

The Cisco framework's operationalization section identifies six core defense domains. Two of them directly require the kind of continuous monitoring VAIL provides:

Runtime security — The framework calls for "creating policies or guardrails to detect against prompt injections and jailbreaks" and continuous assessment of model behavior at the endpoint level.

Supply chain security — The framework calls for "analyzing and tracking model integrity, auditing dependencies and model provenance for tampering." It further calls for "provenance standards for establishing trust and traceability" including "model origins, training data lineage, and complete dependency chains."

VAIL provides the runtime monitoring and supply chain provenance infrastructure that the Cisco framework identifies as necessary but does not itself provide — filling the gap between threat identification and continuous detection.

VAIL operationalizes Cisco framework requirements for runtime security and supply chain integrity monitoring.

Methodology in operation

Don't take our word for it. See it running live.

Stability Arena is our public dashboard tracking endpoint stability across major models and inference providers. Verify the methodology yourself before any conversation.

Live · Updated hourly

Stability Arena

A continuously updating view of model behavior across providers — including the kind of cross-provider divergence that's invisible to traditional monitoring.

Endpoint Stability — Same Model, 3 Providers Live preview

Open Stability Arena

Peer-reviewed research

Behavioral Fingerprints for LLM Endpoint Stability and Identity

The peer-reviewed methodology behind VAIL's continuous monitoring. Demonstrates detection of changes to model family, version, inference stack, quantization, and behavioral parameters — including substantial cross-provider stability differences for the same model.

Accepted at ACM CAIS '26, System Demonstrations.

Authors — Jonah Leshin, Manish Shah, Ian Timmis, Daniel Kang

Venue — ACM CAIS '26, System Demonstrations

Read the paper

From security practitioners

Why this matters to security leaders.

"When we audit a high-risk AI system, the first question is always: can you prove this is the same model you certified? Most organizations can't. They have documentation from a point in time, but no continuous evidence that the deployed system hasn't drifted."

Ovi Pinzaru — Founding Partner & CEO, AxiLayer AI

"Documentation can be faked. Metadata can be stripped. A model card can claim 'trained from scratch' when the weights are a modified copy. The only way to verify provenance is to examine the weights themselves."

Amy Chang — Head of AI Threat Intelligence & Security Research, Cisco

"These workloads cover conversational agents and a large portion of these use cases rely heavily on tool calling, where even small behavioral shifts can cause downstream automation issues. Stability at the inference layer is becoming a key concern."

Luiz Lima — AI Lead, Clipboard

"The AI supply chain is the new software supply chain. We learned from SolarWinds that you can't trust what a vendor tells you about their product — you have to verify independently. The same principle applies to every model endpoint."

VAIL Research — Security Evolution of Core Technologies

Verify your AI systems.