VAIL — Complete Model Provenance & Continuous Assurance for High-Risk AI Systems

The evidence your conformity file needs

Three layers of evidence. Minimal lift, any scale.

01

Provenance Scan

A static lineage report verifying where the model came from — direct derivative, fine-tune, quantization, merger, or independently trained. Delivered as audit-ready evidence at procurement.

02

Behavioral Fingerprint

A behavioral baseline captured at intake — an auditable record of what the model actually does at the moment it's approved. Exportable alongside the provenance scan into your conformity file.

03

Continuous Attestation

Once deployed, hourly behavioral comparisons against the intake baseline produce timestamped attestation records — formal evidence that the model in production remains the model you approved.

A Model Assurance Toolkit that works alongside your existing compliance stack.

You already have a GRC platform for evidence vaults, control mapping, and audit workflows. The Model Assurance Toolkit complements that stack by producing the evidence itself — structured, timestamped evidence that drops into whatever platform you already use.

As your AI inventory grows from a handful of models to hundreds, the same toolkit scales with you — no additional headcount, no manual evidence collection.

Why comprehensive coverage matters

Compliance requires answers at every layer.

Regulators expect you to know where a model came from, what it does, and whether it's changed. Each of these is a different kind of evidence — and the Model Assurance Toolkit covers all three.

Supply chain verification

Confirmed origin, before anything runs.

A model's documentation can say anything. A provenance scan examines the model itself to confirm whether it's derived from an approved base — catching mislabeled models, undisclosed fine-tuning, and supply-chain risks that metadata alone can't reveal.

This is the first gate: if the lineage isn't verified, no conformity evidence is issued.

Provenance scan establishes where the model came from.

Behavioral assurance

Verified behavior, from approval through production.

A model with confirmed lineage can still behave in ways that diverge from its parent — through fine-tuning, quantization, or post-training modifications. And once deployed, inference providers can change what's running without notice.

Behavioral fingerprinting captures what the model actually does at approval, then continuously attests that it hasn't changed after deployment.

Behavioral fingerprint + continuous attestation tracks what the model does and when it changes.

The evidence it generates

Three categories of compliance evidence, across the full lifecycle.

/ 01 · Procurement

Verified lineage and independent identity confirmation.

A provenance scan examines the model itself — not its documentation — to confirm where it came from and independently verify its identity. The result classifies the model as a direct derivative, fine-tune, quantization, merger, or independently trained. Resistant to metadata stripping, renaming, and other forms of provenance hiding.

Addresses EU AI Act GPAI Provenance · EU AI Act Art. 17 · NIST AI RMF Supply Chain · ISO/IEC 42001 Risk Assessment

/ 02 · Intake

Behavioral baseline captured at the moment of approval.

A behavioral fingerprint records what the model actually does — not what its documentation says it does — at the moment your organization approves it. Generated from independent infrastructure with no operational stake in the system being measured. This becomes the audited baseline that all subsequent monitoring compares against.

Addresses NIST AI RMF MAP · ISO/IEC 42001 §9.2 · NIST AI RMF GOVERN

/ 03 · Production

Continuous attestation that triggers investigation on change.

Once deployed, the model is re-fingerprinted hourly and compared against its approved baseline. Every comparison produces a formal attestation record: evidence that the deployed system remains the system you approved. When a meaningful change is detected, application-specific evaluations can be triggered automatically, closing the loop from detection to investigation that Article 72 of the EU AI Act requires.

Addresses EU AI Act Art. 12 & 72 · NIST SP 800-53 · ISO/IEC 42001 §9.1 & §10 · NIST AI RMF MANAGE
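As a rough illustration of the production step described above, the sketch below builds one attestation record per comparison and runs follow-up hooks only when the divergence crosses a threshold. The record fields, the `attest` function, the threshold, and the hook mechanism are all hypothetical names chosen for this example; they are not the Model Assurance Toolkit's actual schema or API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List


@dataclass(frozen=True)
class Attestation:
    """One hourly comparison result (hypothetical record shape)."""
    model_id: str
    checked_at: str     # ISO 8601 UTC timestamp
    divergence: float   # distance between current and baseline fingerprints
    changed: bool       # True when divergence exceeds the threshold


def attest(model_id: str, divergence: float, threshold: float,
           on_change: List[Callable[[Attestation], None]]) -> Attestation:
    """Build an attestation record and run follow-up hooks on change."""
    record = Attestation(
        model_id=model_id,
        checked_at=datetime.now(timezone.utc).isoformat(),
        divergence=divergence,
        changed=divergence > threshold,
    )
    if record.changed:
        # Each hook stands in for an application-specific evaluation.
        for hook in on_change:
            hook(record)
    return record
```

A record is produced on every run, so an unchanged model still leaves evidence; the hooks fire only on the records marked as changed.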

Methodology in operation

A live look at what's currently being monitored.

Stability Arena is our public dashboard tracking endpoint stability across major models and inference providers. Open it in a new tab and verify the methodology yourself before any conversation.

Live · Updated hourly

Stability Arena

A continuously updating view of model behavior across providers — including the kind of cross-provider divergence you see below for a single open-source model served by three different inference companies.

[Chart: Endpoint Stability, same model served by 3 providers (live preview)]

Open Stability Arena

Accepted · ACM CAIS 2026 · Peer-reviewed research

Behavioral Fingerprints for LLM Endpoint Stability and Identity

This paper describes the methodology behind the Model Assurance Toolkit's continuous attestation capability. Stability Monitor periodically fingerprints endpoints by sampling outputs from a fixed prompt set and comparing the output distributions over time, which detects changes to model family, version, inference stack, quantization, and behavioral parameters.

In real-world monitoring of the same model hosted by multiple providers, the paper demonstrates substantial provider-to-provider and within-provider stability differences — the kind of undisclosed changes that compliance frameworks require you to detect.
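To make "sampling outputs from a fixed prompt set and comparing distributions" concrete, here is a minimal sketch. It assumes each fingerprint is a mapping from prompt to a list of sampled outputs, measures the gap with total variation distance, and uses a permutation test to judge whether the gap exceeds ordinary sampling noise. Both the distance and the test are stand-ins picked for illustration; the paper's actual statistics may differ, and every function name here is hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence

Samples = Dict[str, List[str]]  # prompt -> sampled outputs


def tv_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Total variation distance between two empirical output distributions."""
    ca, cb = Counter(a), Counter(b)
    return 0.5 * sum(abs(ca[k] / len(a) - cb[k] / len(b)) for k in set(ca) | set(cb))


def fingerprint_distance(baseline: Samples, current: Samples) -> float:
    """Mean per-prompt distance over the fixed prompt set."""
    return sum(tv_distance(baseline[p], current[p]) for p in baseline) / len(baseline)


def permutation_p_value(baseline: Samples, current: Samples,
                        rounds: int = 200, seed: int = 0) -> float:
    """Share of label shuffles (add-one corrected) at least as far apart as observed."""
    rng = random.Random(seed)
    observed = fingerprint_distance(baseline, current)
    hits = 0
    for _ in range(rounds):
        shuffled_a, shuffled_b = {}, {}
        for p in baseline:
            # Pool both samples, then re-split at random into two groups.
            pooled = baseline[p] + current[p]
            rng.shuffle(pooled)
            n = len(baseline[p])
            shuffled_a[p], shuffled_b[p] = pooled[:n], pooled[n:]
        if fingerprint_distance(shuffled_a, shuffled_b) >= observed:
            hits += 1
    return (hits + 1) / (rounds + 1)
```

A small p-value suggests the endpoint's output distribution has moved away from the baseline; a large one suggests the difference is consistent with the model's normal non-determinism.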

Authors — Jonah Leshin, Manish Shah, Ian Timmis, Daniel Kang

Venue — ACM Conference on AI Systems (CAIS) 2026, System Demonstrations

Read the paper on arXiv

Part of the Cisco AI Security ecosystem

VAIL behavioral fingerprints, inside Cisco Provenance Explorer.

VAIL is providing behavioral fingerprints to Cisco's Provenance Explorer — a resource for GRC professionals featuring open-source models with security and compliance information, powered by the Cisco AI Security Framework.

Cisco Provenance Explorer

Model security and compliance data, in one place.

Cisco's Provenance Explorer will feature open-source models alongside provenance scores, lineage classifications, and VAIL behavioral fingerprints — giving compliance teams a single reference for evaluating models before procurement.

When you see a behavioral fingerprint in Provenance Explorer, it's the same VAIL fingerprint used in the Model Assurance Toolkit's intake and continuous attestation workflows.

Coming soon — Provenance Explorer is not yet launched.

Cisco AI Security Framework

Aligned to Cisco's AI supply-chain threat taxonomy.

The Model Assurance Toolkit maps to the Cisco AI Security Framework's supply-chain controls, including detection of supply-chain compromise, verification of dependencies, and protection against artifact tampering and runtime integrity threats.

Model checkpoints and model endpoints are treated as supply-chain dependencies. Provenance scans and behavioral fingerprints provide evidence for these controls.

Cisco AI Defense — OB-009 Supply Chain Compromise · AITech-9.2 Detection Evasion · AITech-9.3 Dependency Compromise

How it works

Establish baseline. Probe. Compare. Record.

Stability Monitor runs against the AI system's existing API surface — no SDKs, no agents, no production code changes. Every probe and comparison is preserved as a timestamped, immutable record suitable for inclusion in conformity files and audit evidence.

01

Baseline

Capture an initial behavioral fingerprint at the moment of certification or production approval.

02

Probe

Independent infrastructure issues recurring probes — typically hourly — to the live endpoint.

03

Compare

Each new fingerprint is compared statistically against the baseline. Probabilistic methods account for non-determinism, so ordinary sampling variation is not flagged as a change.

04

Record

Every probe, comparison, and detected change event is logged with timestamps and exportable as audit evidence.
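One common way to keep the records from the steps above tamper-evident is a hash chain, in which each entry's hash covers both its own content and the previous entry's hash. The sketch below assumes that approach; it is not a description of how Stability Monitor stores records, and the `append_record` and `verify_chain` helpers and their field names are hypothetical.

```python
import hashlib
import json
from typing import Dict, List


def append_record(log: List[Dict], event: str, timestamp: str, payload: Dict) -> Dict:
    """Append an entry whose hash covers its content and the previous hash."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    body = {"event": event, "timestamp": timestamp,
            "payload": payload, "prev_hash": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()
    entry = {**body, "hash": digest}
    log.append(entry)
    return entry


def verify_chain(log: List[Dict]) -> bool:
    """Recompute every hash; an edited or reordered entry breaks the chain."""
    prev_hash = "0" * 64
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True
```

A chain like this detects after-the-fact edits rather than preventing them, and the entries serialize to plain JSON for export into an evidence store.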

24/7

Continuous coverage. Hourly fingerprints. Indefinitely.

0

SDK installs, code changes, or production traffic touched

3rd

Party-independent — probes run from VAIL infrastructure, not yours


Why this matters in practice

From the practitioners running these systems.

"These workloads cover conversational agents and a large portion of these use cases rely heavily on tool calling and structured workflows, where even small behavioral shifts can cause downstream automation issues. Stability at the inference layer is becoming a key concern for us — not just model quality."

Luiz Lima — AI Lead, Clipboard

"When we audit a high-risk AI system, the first question is always: can you prove this is the same model you certified? Most organizations can't. They have documentation from a point in time, but no continuous evidence that the deployed system hasn't drifted. That gap is where regulatory exposure lives."

Ovi Pinzaru — Founding Partner & CEO, AxiLayer AI

"Documentation can be faked. Metadata can be stripped. A model card can claim 'trained from scratch' when the weights are a modified copy. The only way to verify provenance is to examine the weights themselves — embedding geometry, normalization layers, energy profiles. That's what Model Provenance Kit does."

Amy Chang — Head of AI Threat Intelligence & Security Research, Cisco

Know what the model is. Know when it changes.