Verification tools for agent builders
Open-source CLI and SDK for multi-model adversarial verification. Catch hallucinations before your users do.
Run claims through Claude, GPT, Gemini, Grok, and DeepSeek simultaneously. Models challenge each other. Consensus emerges or disagreement is flagged.
Verify that agent outputs are grounded in their source material. Detect fabrication, drift, and unsupported claims with trace-level granularity.
Quick single-model checks for development. Full adversarial pipeline for production. You control the cost/confidence tradeoff.
Use pot-cli in your terminal and CI pipelines. Embed pot-sdk directly in your agent code. Same engine, two interfaces.
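One way to act on the cost/confidence tradeoff is to pick the verification mode from the deployment context. The sketch below is illustrative only: `chooseMode` and `ModeContext` are hypothetical names and are not part of pot-sdk. It also assumes that 'lite' corresponds to the quick single-model check and 'standard' to the full adversarial pipeline.

```typescript
// Hypothetical helper, not part of pot-sdk.
// Assumption: 'lite' = quick single-model check, 'standard' = full adversarial pipeline.
type Mode = 'lite' | 'standard';

interface ModeContext {
  environment: 'development' | 'ci' | 'production';
  userFacing: boolean; // true when the output is shown to end users
}

function chooseMode(ctx: ModeContext): Mode {
  // Pay for the full pipeline only where a wrong answer reaches users.
  if (ctx.environment === 'production' || ctx.userFacing) {
    return 'standard';
  }
  return 'lite';
}
```

The returned value could then be passed as the `mode` option when calling the SDK.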
Scan the input for adversarial patterns and prompt injection before verification begins.
N diverse models produce independent proposals. Each evaluates the claim separately: no shared context, no groupthink.
A red-team model attacks the proposals. Objections are classified by materiality and fact-checked across providers.
Dual synthesizer aggregates proposals and critique into a verdict. Confidence is calibrated via reasoning-based aggregation.
Return ALLOW, BLOCK, or UNCERTAIN, with per-model evidence, a confidence score, and an optional signed attestation.
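The final stages can be pictured as a function from independent proposals and red-team objections to a verdict. The sketch below is a simplified illustration, not the engine's actual logic. All type and function names are hypothetical. A unanimity rule and a plain mean stand in for the reasoning-based aggregation described above.

```typescript
// Illustrative only: hypothetical types, not the pot-sdk API.
type Verdict = 'ALLOW' | 'BLOCK' | 'UNCERTAIN';

interface Proposal {
  model: string;
  supported: boolean;  // does this model judge the claim supported?
  confidence: number;  // 0.0 to 1.0
}

interface Objection {
  text: string;
  material: boolean;   // would it change the outcome if true?
  confirmed: boolean;  // did the cross-provider fact-check uphold it?
}

interface PipelineResult {
  verdict: Verdict;
  confidence: number;
  evidence: Proposal[];
}

function synthesize(proposals: Proposal[], objections: Objection[]): PipelineResult {
  if (proposals.length === 0) {
    return { verdict: 'UNCERTAIN', confidence: 0, evidence: [] };
  }

  const supporting = proposals.filter((p) => p.supported).length;
  const agreement = supporting / proposals.length;
  // Simplification: a plain mean replaces calibrated aggregation.
  const confidence =
    proposals.reduce((sum, p) => sum + p.confidence, 0) / proposals.length;
  // Only objections that are both material and confirmed count against the claim.
  const upheld = objections.some((o) => o.material && o.confirmed);

  let verdict: Verdict;
  if (upheld || agreement === 0) {
    verdict = 'BLOCK';
  } else if (agreement === 1) {
    verdict = 'ALLOW';
  } else {
    verdict = 'UNCERTAIN'; // models disagree: flag it instead of guessing
  }

  return { verdict, confidence, evidence: proposals };
}
```

Keeping the per-model proposals in the result mirrors the per-model evidence returned alongside the verdict.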
# Install globally
npm install -g pot-cli

# Verify a single claim against its trace
pot-cli verify \
  --claim "The Eiffel Tower is 330 metres tall" \
  --trace "According to the official Eiffel Tower website, the structure stands at 330 metres including the antenna."

# Output:
# ✓ ALLOW (confidence: 0.94)
# 3/3 models agree. Source verified.

# Full adversarial pipeline
pot-cli verify \
  --claim "GPT-4 scores 90% on the bar exam" \
  --tier standard \
  --output json
import { verify } from 'pot-sdk';

const agentOutput = `The contract implements a reentrancy guard correctly. The mutex lock is acquired before the external call and released after.`;

// Wrapped in an async function so that `await` and `return` are valid.
async function checkAgentOutput() {
  const result = await verify(agentOutput, {
    claim: 'Reentrancy guard is correctly implemented',
    mode: 'standard', // 'lite' | 'standard'
    providers: [
      { name: 'anthropic', model: 'claude-sonnet-4-6', apiKey: process.env.ANTHROPIC_API_KEY },
      { name: 'openai', model: 'gpt-4o', apiKey: process.env.OPENAI_API_KEY },
      { name: 'deepseek', model: 'deepseek-v4-flash', apiKey: process.env.DEEPSEEK_API_KEY },
    ],
  });

  if (result.verdict === 'BLOCK') {
    // Hallucination or unsupported claim detected
    console.warn('Blocked:', result.confidence);
    // fallback is assumed to be your application's own handler; it is not defined here.
    return fallback(result);
  }

  // result.verdict: 'ALLOW' | 'BLOCK' | 'UNCERTAIN'
  // result.confidence: 0.0 to 1.0
  // result.mdi: model disagreement index
}
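The SDK example handles only BLOCK. A caller will usually also want a path for UNCERTAIN and for low-confidence ALLOW results. The sketch below shows one possible routing policy over the three verdicts. `routeVerdict`, the `Action` names, and the 0.8 threshold are hypothetical and not part of pot-sdk. The result shape is reduced to the three fields named in the comments above.

```typescript
// Illustrative routing policy: names and threshold are assumptions.
type Verdict = 'ALLOW' | 'BLOCK' | 'UNCERTAIN';

interface VerificationResult {
  verdict: Verdict;
  confidence: number; // 0.0 to 1.0
  mdi: number;        // model disagreement index (carried along, not interpreted here)
}

type Action = 'deliver' | 'fallback' | 'human_review';

function routeVerdict(result: VerificationResult, minConfidence = 0.8): Action {
  if (result.verdict === 'BLOCK') {
    return 'fallback';
  }
  if (result.verdict === 'UNCERTAIN') {
    return 'human_review';
  }
  // ALLOW: deliver only when confidence clears the caller's own bar.
  return result.confidence >= minConfidence ? 'deliver' : 'human_review';
}
```

Treating UNCERTAIN as its own branch, separate from BLOCK, keeps flagged disagreements visible instead of silently discarding the output.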