I’ve spent the past week performing a preliminary security assessment of the three AI agent infrastructure platforms that launched between February 11 and 18. What follows is not FUD — it’s a structured threat analysis from someone who has found critical vulnerabilities in three major DeFi protocols and has participated in incident response for exploits totaling over $200M in losses.
The industry is excited about AI agents getting wallets. I’m concerned about what happens when those wallets get exploited.
The Novel Attack Surface
Traditional DeFi security focuses on smart contract vulnerabilities: reentrancy, integer overflows, access control failures. These are well-understood, auditable, and testable. AI agent wallets introduce an entirely new attack surface that existing security models don’t address.
The fundamental problem: LLMs are not deterministic systems. Smart contracts execute the same way every time given the same inputs. LLMs don’t. An AI agent’s decision to execute a trade, bridge assets, or approve a transaction depends on its prompt, its context window, its temperature setting, and subtle variations in its training data. This means the security properties of an agent wallet depend on the security properties of the LLM — and LLM security is in its infancy.
Specific Attack Vectors
1. Prompt Injection via On-Chain Data
This is the attack I’m most worried about. AI agents read on-chain data to make decisions. An attacker can embed malicious instructions in on-chain data that the agent processes:
- Token metadata injection: Crafting token names, symbols, or metadata that contain prompt injection payloads. When an agent queries token info to evaluate a trade, the malicious metadata enters its context window.
- Transaction memo injection: Including prompt override instructions in transaction memos or calldata that the agent processes as part of its analysis.
- Smart contract event log injection: Emitting events with crafted string parameters designed to manipulate agent behavior.
If an agent using Phantom’s MCP server processes malicious token metadata and interprets it as an instruction to ‘swap all USDC for [attacker-controlled token],’ the MCP server will faithfully execute the swap because it has full wallet access without per-token restrictions.
2. Cross-Platform Escalation Attacks
As I noted in the other thread, the combination of Coinbase wallets + Phantom signing + deBridge cross-chain execution creates a composite attack surface. But let me describe a specific scenario:
- Attacker creates a legitimate-looking AI agent service that other agents interact with via x402
- The service includes subtle prompt manipulation in its API responses
- When Agent A pays for the service via x402 and processes the response, the injected prompt causes Agent A to initiate a cross-chain bridge via deBridge
- Assets are bridged to a chain where the attacker controls the receiving address
- Total time from initial contact to fund theft: seconds
Each individual platform’s security model is intact in this scenario. Coinbase’s spending limits might limit the per-transaction damage. But the attack can be repeated thousands of times across thousands of agents before anyone detects the pattern.
3. Oracle Manipulation for Agent Decision-Making
AI agents that use on-chain oracles for trading decisions inherit all existing oracle manipulation risks, plus new ones:
- Flash loan attacks that temporarily skew prices to trigger agent trades
- MEV attacks specifically targeting agent transaction patterns (which Chris noted are predictable)
- Deliberately creating short-term arbitrage opportunities that lure agents into unprofitable positions
4. The ‘Slow Drain’ Attack
This is the hardest to detect and potentially the most damaging. Rather than dramatic exploits, an attacker subtly influences an agent’s decision-making over time:
- Feeding slightly biased data through x402-paid API services
- Creating social engineering campaigns that target agent training data
- Manipulating the compute infrastructure that agents run on
An agent that loses 0.1% on each trade due to subtly corrupted data will drain its wallet over weeks without triggering any spending limits or anomaly detection.
Assessment of Current Guardrails
Coinbase’s guardrails: B- Spending limits and session caps are real protections, but they address the wrong threat model. They prevent a compromised agent from doing catastrophic damage in a single transaction, but they don’t prevent gradual value extraction within limits. The TEE isolation for private keys is strong, but private key security was never the primary threat vector for agent wallets — the threat is at the intent layer, which the TEE doesn’t protect.
Phantom’s guardrails: D The binary MCP permission model is inadequate for financial operations. No per-token restrictions, no transaction amount limits, no time-based access controls. This needs fundamental redesign before consumer deployment.
deBridge’s guardrails: C+ MEV-aware routing protects transaction execution, and non-custodial design limits platform risk. But there’s no protection against an agent being tricked into initiating a malicious cross-chain operation.
Recommendations
- Implement behavioral anomaly detection at the wallet infrastructure layer, not just the LLM layer. Monitor for unusual transaction patterns, destination addresses, timing anomalies.
- Adopt a tiered approval model. Low-value operations can be fully autonomous. Medium-value operations require a second agent (not LLM) verification. High-value operations require human signing via hardware wallet.
- Create agent-specific allowlists for token interactions, contract addresses, and chain destinations. An agent should never interact with an unknown contract without explicit human approval.
- Fund bug bounties specifically for agent wallet exploits. The current bug bounty ecosystem doesn’t incentivize researchers to look for agent-specific vulnerabilities.
- Publish formal threat models for each platform. Coinbase, Phantom, and deBridge should each publish their threat models for peer review by the security community.
The infrastructure that launched this week is impressive engineering. But it was built for functionality, not for adversarial environments. And every line of code deployed in crypto operates in an adversarial environment.