Can AI Auditors Solve Smart Contract Security or Just Create New Attack Surfaces?
In February 2026, OpenAI and Paradigm released evmbench, an open benchmark for evaluating AI agents’ ability to detect, patch, and exploit high-severity smart contract vulnerabilities. The results were striking: GPT-5.3-Codex achieved a 71.0% exploit success rate on 117 curated vulnerabilities from 40 audits—a dramatic improvement from less than 20% just six months ago. At face value, this represents extraordinary progress in automated security analysis.
Yet here’s the paradox that keeps me up at night: despite this proliferation of AI-powered security tools, the OWASP Smart Contract Top 10: 2026 documents $905.4 million in losses from 122 deduplicated incidents in 2025 alone. If AI auditing capabilities are advancing so rapidly, why are we still bleeding hundreds of millions of dollars annually?
The Current Landscape of AI Security Tools
The market has responded to the audit bottleneck with an explosion of AI-powered solutions:
- Sherlock AI claims to provide “researcher-level reasoning” trained on insights from top security researchers
- Veritas Protocol advertises a 94.9% accuracy rate in finding critical issues using the Qwen2.5-Coder architecture
- AuditAgent and numerous other tools promise to reduce analysis time from weeks to hours
The economic case is compelling: AI tools offer 40-60% cost savings by handling initial vulnerability screening at $5-10K versus $50-100K for full human audits. For resource-constrained protocols, this accessibility could be transformative.
What AI Does Well (and What It Doesn’t)
Research shows AI excels at pattern-matching vulnerabilities:
- Reentrancy attacks
- Access control flaws
- Integer overflows/underflows
- Known vulnerability signatures
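To make "pattern-matching" concrete, here is a toy sketch of signature-based scanning in Python. The regex rules, the snippet, and the signature names are all illustrative inventions; production tools work on ASTs and dataflow graphs, not raw regexes, but the shape of the approach is the same: known bug classes reduce to matchable signatures.

```python
import re

# Toy Solidity snippet with a classic reentrancy shape:
# an external call (.call{value: ...}) before the balance update.
VULNERABLE_SRC = """
function withdraw(uint256 amount) external {
    require(balances[msg.sender] >= amount);
    (bool ok, ) = msg.sender.call{value: amount}("");
    require(ok);
    balances[msg.sender] -= amount;
}
"""

# Hypothetical signature rules: (name, regex) pairs.
SIGNATURES = [
    ("reentrancy-call-before-write",
     re.compile(r"\.call\{value:.*?\}.*?balances\[[^\]]+\]\s*-=", re.S)),
    ("tx-origin-auth",
     re.compile(r"require\s*\(\s*tx\.origin")),
]

def scan(source: str) -> list[str]:
    """Return the names of all signatures that match the source."""
    return [name for name, pat in SIGNATURES if pat.search(source)]

print(scan(VULNERABLE_SRC))  # the reentrancy signature fires
```

A business-logic flaw, by contrast, has no such textual fingerprint: the code can be syntactically unremarkable while the economics are broken, which is why it resists this style of detection.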
However, AI struggles with:
- Business logic validation requiring economic understanding
- Novel attack vectors without historical training data
- Complex multi-step exploits chaining multiple vulnerabilities
- Protocol-specific game theory and incentive design flaws
The OWASP 2026 data validates this limitation: while reentrancy (an easily pattern-matched bug) dropped to #8, business logic bugs rose to #2, and access control flaws remained #1. The vulnerabilities causing the most damage are precisely those requiring deep contextual understanding—exactly where AI falls short.
The Adversarial Arms Race
Here’s what concerns me most: we’re creating a new attack surface. As AI auditing becomes standard practice, sophisticated attackers will begin adversarially optimizing exploits to evade AI detection patterns—similar to how malware authors evade antivirus signatures.
evmbench’s 71% success rate means 29% of known vulnerabilities escape AI detection. In adversarial scenarios, that failure rate could be significantly worse. A model trained on historical exploits may struggle against novel attack techniques specifically designed to exploit AI blind spots.
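The 29% miss rate compounds quickly. A back-of-envelope sketch, assuming each bug is detected independently at the headline 71% rate (a simplification, since real detections correlate):

```python
# If each bug is caught independently with probability p = 0.71
# (the evmbench headline rate -- a simplifying assumption),
# the chance that *every* bug in a contract is caught shrinks
# geometrically with the number of bugs present.
def all_caught(p: float, n_bugs: int) -> float:
    return p ** n_bugs

for n in (1, 3, 5):
    print(f"{n} bugs -> all caught with probability {all_caught(0.71, n):.3f}")
```

Under these assumptions, a contract carrying five high-severity bugs has only about an 18% chance of emerging fully clean from an AI-only pass.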
The Hybrid Approach Question
Current data suggests hybrid approaches (AI screening + human expert review) catch 95%+ of vulnerabilities versus 70-85% for AI-only or 60-70% for manual-only audits. This argues for AI as an augmentation tool rather than a replacement.
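A quick sanity check on those numbers, assuming (naively) that the AI pass and the human pass miss bugs independently:

```python
# Combined catch rate of two independent review passes:
# a bug escapes only if both passes miss it.
def combined_catch(ai_rate: float, human_rate: float) -> float:
    return 1 - (1 - ai_rate) * (1 - human_rate)

# Midpoints of the ranges quoted above: 77.5% AI-only, 65% manual-only.
print(f"{combined_catch(0.775, 0.65):.3f}")
```

Independence alone gets you to roughly 92%, short of the 95%+ the hybrid data reports. If both figures hold, the passes are complementary rather than redundant: human reviewers disproportionately catch the business-logic and incentive-design flaws that AI misses, which is exactly the augmentation argument.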
But here’s the practical dilemma: if protocols adopt AI-only audits to cut costs, do we create a false sense of security that’s worse than no audit at all? Users may assume “AI audited” provides comparable assurance to human audits when the risk profiles are fundamentally different.
Questions for the Community
- Should protocols be required to disclose whether audits were AI-assisted, AI-only, or human-only? Transparency seems critical for users evaluating risk.
- How do we prevent AI auditing from becoming security theater? Passing an AI audit that catches 70% of vulnerabilities still leaves 30% undetected—potentially catastrophic in high-value DeFi.
- What role should AI auditing play in security standards? Should it be mandatory pre-deployment screening with human review required for critical components? Or should AI remain optional supplementary tooling?
- How do insurance protocols and audit firms handle AI-audited code? Are premiums and liability different for AI versus human audits?
The technology is impressive and advancing rapidly. But given that we’re still losing nearly a billion dollars annually despite better tools, I remain skeptical that AI auditing alone solves our security crisis. Every line of code is a potential vulnerability—and I’m not convinced AI can think like an attacker with enough creativity and contextual understanding to find them all.
What’s your experience with AI security tools? Are they living up to the hype, or are we building toward a new class of vulnerabilities we don’t yet understand?