OpenAI tests AI agents on crypto flaws

Grafa
Written by Isaac Francis

OpenAI has launched a new benchmark, EVMbench, to evaluate how effectively AI agents can detect, patch and exploit vulnerabilities in crypto smart contracts.

The benchmark, developed with Paradigm and OtterSec, assessed how much value AI models could theoretically extract from 120 curated smart contract vulnerabilities sourced from 40 audits.

Anthropic’s Claude Opus 4.6 ranked first with an average “detect award” of $37,824, followed by OpenAI’s OC-GPT-5.2 at $31,623 and Google’s Gemini 3 Pro at $25,112.

“Smart contracts secure billions of dollars in assets, and AI agents are likely to be transformative for both attackers and defenders,” OpenAI said, adding that evaluating models in “economically meaningful environments” is increasingly important.

The initiative follows a year in which attackers stole $3.4 billion in crypto funds (2025), underscoring the need for stronger automated security tools.

OpenAI also pointed to the expected growth of agent-driven stablecoin payments, echoing comments from Circle chief executive Jeremy Allaire and former Binance boss Changpeng Zhao that crypto could become a native currency layer for AI agents.

Dragonfly managing partner Haseeb Qureshi said crypto transactions still feel “terrifying” due to security risks, arguing that AI-intermediated, self-driving wallets may ultimately make onchain activity more intuitive and secure for everyday users.

