
OpenAI has launched a new benchmark, EVMbench, to evaluate how effectively AI agents can detect, patch and exploit vulnerabilities in crypto smart contracts.
The benchmark, developed with Paradigm and OtterSec, assessed how much value AI models could theoretically extract from 120 curated smart contract vulnerabilities sourced from 40 audits.
Anthropic’s Claude Opus 4.6 ranked first with an average “detect award” of $37,824, followed by OpenAI’s OC-GPT-5.2 at $31,623 and Google’s Gemini 3 Pro at $25,112.
“Smart contracts secure billions of dollars in assets, and AI agents are likely to be transformative for both attackers and defenders,” OpenAI said, adding that evaluating models in “economically meaningful environments” is increasingly important.
The initiative comes after attackers stole $3.4 billion in crypto funds in 2025, underscoring the need for stronger automated security tools.
OpenAI also pointed to the expected growth of agent-driven stablecoin payments, echoing comments from Circle chief executive Jeremy Allaire and former Binance boss Changpeng Zhao that crypto could become a native currency layer for AI agents.
Dragonfly managing partner Haseeb Qureshi said crypto transactions still feel “terrifying” due to security risks, arguing that AI-intermediated, self-driving wallets may ultimately make onchain activity more intuitive and secure for everyday users.