A lightweight Model Assurance Toolkit that produces the compliance evidence AI regulations require, working alongside your existing GRC platform and scaling as your AI inventory grows. Audit-ready documentation for EU AI Act Articles 12 and 72, ISO/IEC 42001, and NIST AI RMF.
You already have a GRC platform for evidence vaults, control mapping, and audit workflows. The Model Assurance Toolkit complements that stack by producing the evidence itself: structured, timestamped records that drop into whatever platform you already use.
As your AI inventory grows from a handful of models to hundreds, the same toolkit scales with you — no additional headcount, no manual evidence collection.
Regulators expect you to know where a model came from, what it does, and whether it's changed. Each of these is a different kind of evidence — and the Model Assurance Toolkit covers all three.
A model's documentation can say anything. A provenance scan examines the model itself to confirm whether it's derived from an approved base — catching mislabeled models, undisclosed fine-tuning, and supply-chain risks that metadata alone can't reveal.
This is the first gate: if the lineage isn't verified, no conformity evidence is issued.
A model with confirmed lineage can still behave in ways that diverge from its parent — through fine-tuning, quantization, or post-training modifications. And once deployed, inference providers can change what's running without notice.
Behavioral fingerprinting captures what the model actually does at approval, then continuously attests that it hasn't changed after deployment.
A provenance scan examines the model itself, not its documentation, to confirm where it came from and independently verify its identity. The scan classifies the model as a direct derivative, fine-tune, quantization, or merge, or as independently trained. It is designed to resist metadata stripping, renaming, and other attempts to hide provenance.
A behavioral fingerprint records what the model actually does, not what its documentation says it does, at the moment your organization approves it. The fingerprint is generated on independent infrastructure with no operational stake in the system being measured, and it becomes the audited baseline that all subsequent monitoring compares against.
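To illustrate what "recording what the model does" can mean, the sketch below samples a fixed prompt set several times and hashes the collected outputs into a baseline identifier. It is a minimal sketch under stated assumptions and is not VAIL's fingerprinting method. The names `capture_baseline`, the `query` callable, and `samples_per_prompt` are all hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Callable, Dict, List


def capture_baseline(
    query: Callable[[str], str],
    prompts: List[str],
    samples_per_prompt: int = 5,
) -> Dict:
    """Sample each prompt repeatedly and bundle the outputs into a baseline record.

    `query` is a hypothetical callable that sends one prompt to the model
    endpoint and returns its text output.
    """
    samples = {p: [query(p) for _ in range(samples_per_prompt)] for p in prompts}
    # A content hash of the sampled outputs serves as a stable baseline identifier.
    canonical = json.dumps(samples, sort_keys=True, separators=(",", ":"))
    baseline_id = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {
        "baseline_id": baseline_id,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "samples": samples,
    }
```

When the sampling temperature is above zero, a live endpoint returns different outputs across calls. For that reason, a baseline built this way stores a distribution of samples for each prompt rather than a single answer.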
Once deployed, the model is re-fingerprinted hourly against its approved baseline. Every comparison produces a formal attestation record — evidence that the deployed system remains the system you approved. When a meaningful change is detected, application-specific evaluations can be automatically triggered — closing the loop from detection to investigation that Article 72 requires.
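The compare-and-attest step can be sketched as follows. The sketch assumes that each hourly check reduces to a single divergence score between the current and baseline fingerprints. `AttestationRecord`, `attest`, the 0.1 threshold, and the `on_change` hook are hypothetical illustrations and not the toolkit's actual API. The `on_change` hook stands in for automatically triggering application-specific evaluations.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass(frozen=True)
class AttestationRecord:
    """One comparison of a deployed endpoint against its approved baseline (hypothetical schema)."""
    model_id: str
    baseline_id: str
    checked_at: str   # ISO-8601 UTC timestamp of the comparison
    distance: float   # divergence between current and baseline fingerprints
    threshold: float  # assumed cut-off for a "meaningful" change
    status: str       # "consistent" or "changed"


def attest(
    model_id: str,
    baseline_id: str,
    distance: float,
    threshold: float = 0.1,
    on_change: Optional[Callable[[AttestationRecord], None]] = None,
    now: Optional[datetime] = None,
) -> AttestationRecord:
    """Produce an attestation record; call on_change when distance exceeds threshold."""
    if now is None:
        now = datetime.now(timezone.utc)
    status = "changed" if distance > threshold else "consistent"
    record = AttestationRecord(model_id, baseline_id, now.isoformat(), distance, threshold, status)
    # Hypothetical hook standing in for triggering application-specific evaluations.
    if status == "changed" and on_change is not None:
        on_change(record)
    return record
```

Every call returns a record, including calls where nothing changed. Those records are what make the absence of drift auditable.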
Stability Arena is our public dashboard tracking endpoint stability across major models and inference providers. Open it in a new tab and verify the methodology yourself before you talk to us.
A continuously updating view of model behavior across providers, including the cross-provider divergence observed when a single open-source model is served by three different inference companies.
The accompanying paper describes the methodology behind the Model Assurance Toolkit's continuous attestation capability. Stability Monitor periodically fingerprints endpoints by sampling outputs from a fixed prompt set and comparing the resulting output distributions over time. This detects changes to model family, version, inference stack, quantization, and behavioral parameters.
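One simple way to compare output distributions over a fixed prompt set is to compute a divergence between baseline and current samples for each prompt, then average the results. The sketch below uses Jensen-Shannon divergence over exact-match output strings. That choice of metric and the function names are assumptions for illustration only; they are not the statistic used in the Stability Monitor paper.

```python
from collections import Counter
from math import log2
from typing import Dict, List


def empirical_dist(samples: List[str]) -> Dict[str, float]:
    """Convert sampled outputs for one prompt into a probability distribution."""
    counts = Counter(samples)
    total = len(samples)
    return {output: n / total for output, n in counts.items()}


def js_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits: 0.0 if identical, 1.0 if supports are disjoint."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_m(a: Dict[str, float]) -> float:
        # KL(a || m); every key with positive mass in a also has positive mass in m.
        return sum(a[k] * log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def fingerprint_distance(
    baseline: Dict[str, List[str]], current: Dict[str, List[str]]
) -> float:
    """Average per-prompt JS divergence over the fixed prompt set (the keys of `baseline`)."""
    scores = [
        js_divergence(empirical_dist(baseline[p]), empirical_dist(current[p]))
        for p in baseline
    ]
    return sum(scores) / len(scores)
```

Identical samples score 0.0 and completely disjoint outputs score 1.0. Because sampling noise alone produces nonzero scores, any alert threshold would need to be calibrated above zero.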
In real-world monitoring of the same model hosted by multiple providers, the paper demonstrates substantial provider-to-provider and within-provider stability differences — the kind of undisclosed changes that compliance frameworks require you to detect.
VAIL is providing behavioral fingerprints to Cisco's Provenance Explorer — a resource for GRC professionals featuring open-source models with security and compliance information, powered by the Cisco AI Security Framework.
Cisco's Provenance Explorer will feature open-source models alongside provenance scores, lineage classifications, and VAIL behavioral fingerprints — giving compliance teams a single reference for evaluating models before procurement.
When you see a behavioral fingerprint in Provenance Explorer, it's the same VAIL fingerprint used in the Model Assurance Toolkit's intake and continuous attestation workflows.
The Model Assurance Toolkit maps to the Cisco AI Security Framework's supply-chain controls, which address supply-chain compromise, dependency verification, artifact tampering, and runtime integrity threats.
Model checkpoints and model endpoints are treated as supply-chain dependencies. Provenance scans and behavioral fingerprints provide evidence for these controls.
Stability Monitor runs against the AI system's existing API surface — no SDKs, no agents, no production code changes. Every probe and comparison is preserved as a timestamped, immutable record suitable for inclusion in conformity files and audit evidence.
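Tamper-evident evidence logs are commonly built by hash-chaining entries. With this technique, editing any past record breaks the hash of that entry and every entry after it. The sketch below shows the general technique using Python's standard `hashlib`. `EvidenceLog` is a hypothetical name, and nothing here describes how Stability Monitor actually stores its records.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, payload: Dict[str, Any]) -> str:
    """Hash the previous entry's hash together with a canonical JSON payload."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class EvidenceLog:
    """Append-only, hash-chained log of probe and comparison records (illustrative only)."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, Any]] = []

    def append(self, payload: Dict[str, Any]) -> str:
        # Store a private copy so later changes to the caller's dict do not alias the log.
        payload = json.loads(json.dumps(payload))
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        entry = {"prev": prev, "payload": payload, "hash": _entry_hash(prev, payload)}
        self.entries.append(entry)
        return entry["hash"]

    def verify(self) -> bool:
        """Recompute the chain; any edited, reordered, or removed entry makes this False."""
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != _entry_hash(prev, entry["payload"]):
                return False
            prev = entry["hash"]
        return True
```

A hash chain makes tampering detectable rather than impossible. Anchoring the latest hash somewhere outside the log's own storage is what lets an auditor trust the check.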
"These workloads cover conversational agents, and a large portion of these use cases rely heavily on tool calling and structured workflows, where even small behavioral shifts can cause downstream automation issues. Stability at the inference layer is becoming a key concern for us, not just model quality."
"When we audit a high-risk AI system, the first question is always: can you prove this is the same model you certified? Most organizations can't. They have documentation from a point in time, but no continuous evidence that the deployed system hasn't drifted. That gap is where regulatory exposure lives."
"Documentation can be faked. Metadata can be stripped. A model card can claim 'trained from scratch' when the weights are a modified copy. The only way to verify provenance is to examine the weights themselves — embedding geometry, normalization layers, energy profiles. That's what Model Provenance Kit does."
Capability evaluation, red-teaming, and domain-specific safety testing all assume you know what model is running. This pipeline makes that assumption verifiable — from procurement through production, indefinitely.