1. What is OpenAI’s new EVMbench benchmark?

OpenAI launched EVMbench to test how well AI agents detect, patch and exploit vulnerabilities in crypto smart contracts. The benchmark evaluates models in “economically meaningful environments,” meaning scenarios where real money could be at risk. It is built on 120 curated smart contract vulnerabilities drawn from 40 audits.

2. Which AI model performed best in OpenAI’s smart contract test?

Anthropic’s Claude Opus 4.6 ranked first, achieving an average “detect award” of $37,824. OpenAI’s OC-GPT-5.2 and Google’s Gemini 3 Pro followed at $31,623 and $25,112, respectively. The scores reflect how much value an AI agent could theoretically identify or exploit.

3. Why is OpenAI testing AI agents on crypto smart contracts?

Smart contracts secure billions of dollars in digital assets. OpenAI said AI agents are likely to be transformative for both attackers and defenders. Testing models in high-stakes environments helps measure their real-world risk and defensive capabilities.

4. How serious is the smart contract security problem in crypto?

Attackers stole $3.4 billion in crypto funds in 2025, slightly more than in 2024. The figure highlights persistent vulnerabilities in DeFi and onchain systems. Better automated detection tools could reduce future exploit risks.

5. Could AI agents become attackers as well as defenders?

Yes. OpenAI explicitly acknowledged that AI agents could assist both attackers and defenders. Stronger offensive capabilities mean security research must keep pace with potential misuse.

6. What role could stablecoins play in AI-driven transactions?

OpenAI said it expects “agentic stablecoin payments” to grow. Circle CEO Jeremy Allaire has predicted billions of AI agents could transact using stablecoins within five years. Former Binance CEO Changpeng Zhao has also described crypto as the “native currency for AI agents.”

7. How might AI change everyday crypto wallet usage?

Dragonfly’s Haseeb Qureshi suggested that AI-powered, self-driving wallets could manage transactions on behalf of users. These systems could reduce fear around signing large transfers by automatically screening risks. This is a forward-looking view, not a current standard.

8. What is the main takeaway for crypto investors?

AI is becoming more embedded in blockchain security and transaction infrastructure. Better AI auditing tools could lower exploit risk over time. However, more powerful AI could also increase attack sophistication.

Grafa — OpenAI tests AI agents on crypto flaws


Written by Isaac Francis. Published Feb. 19, 2026

OpenAI has launched a new benchmark, EVMbench, to evaluate how effectively AI agents can detect, patch and exploit vulnerabilities in crypto smart contracts.

The benchmark, developed with Paradigm and OtterSec, assessed how much value AI models could theoretically extract from 120 curated smart contract vulnerabilities sourced from 40 audits.

Anthropic’s Claude Opus 4.6 ranked first with an average “detect award” of $37,824, followed by OpenAI’s OC-GPT-5.2 at $31,623 and Google’s Gemini 3 Pro at $25,112.

“Smart contracts secure billions of dollars in assets, and AI agents are likely to be transformative for both attackers and defenders,” OpenAI said, adding that evaluating models in “economically meaningful environments” is increasingly important.

The initiative comes after attackers stole $3.4 billion in crypto funds in 2025, underscoring the need for stronger automated security tools.

OpenAI also pointed to the expected growth of agent-driven stablecoin payments, echoing comments from Circle chief executive Jeremy Allaire and former Binance boss Changpeng Zhao that crypto could become a native currency layer for AI agents.

Dragonfly managing partner Haseeb Qureshi said crypto transactions still feel “terrifying” due to security risks, arguing that AI-intermediated, self-driving wallets may ultimately make onchain activity more intuitive and secure for everyday users.