ZK-ML Lets You Prove an AI Model Ran Correctly Without Revealing the Model or the Data - The Missing Link Between On-Chain AI Agents and Actual Trust

The Trust Gap in On-Chain AI

We keep hearing about AI agents managing DeFi positions, running prediction markets, and generating oracle feeds. But here is the uncomfortable question nobody is answering well enough: how do you actually trust that the AI did what it claims?

Right now, most on-chain AI systems work like this: an off-chain model runs inference, produces an output, and pushes that result on-chain. Maybe there is a multisig. Maybe there is a reputation score for the operator. But at its core, you are trusting someone who says “my model produced X” without any cryptographic evidence that this is true. You cannot verify the model architecture. You cannot verify the weights. You cannot verify the input data. You are essentially taking their word for it.

This is where Zero-Knowledge Machine Learning (ZK-ML) enters the picture, and I believe it is the single most important primitive for making on-chain AI trustworthy.

What ZK-ML Actually Does

At a high level, ZK-ML lets you translate a neural network’s forward pass into a cryptographic arithmetic circuit. When the model runs inference, it generates a zero-knowledge proof (typically a zk-SNARK or zk-STARK) that mathematically certifies: this specific output was derived from this specific model architecture and these specific weights, applied to this specific input. The verifier can confirm this statement without ever seeing the model weights or the input data.

Think of it as a sealed envelope. You hand someone a sealed envelope containing your model and data. The ZK proof is a certificate attached to the outside that says “the answer inside is 42, and I can prove the computation was done correctly,” without anyone ever opening the envelope.

The key technical challenge has been the conversion from floating-point arithmetic (what ML models use) to finite field arithmetic (what ZK proofs use). This quantization step necessarily introduces precision loss. Frameworks like EZKL handle this by exporting models to ONNX format and then compiling them into Halo2-based proof circuits. You define your acceptable precision tolerance and the circuit handles the rest.
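
A toy illustration of the float-to-field conversion (not EZKL's actual pipeline — the prime and scale factor here are placeholders): floats are scaled to integers, negatives wrap around the field modulus, and the round-trip error is bounded by half a quantization step.

```python
# Toy fixed-point encoding into a prime field; P and SCALE are illustrative.
P = 2**31 - 1          # small prime standing in for the proof system's field modulus
SCALE = 2**12          # fixed-point scale factor; real frameworks expose this as a knob

def to_field(x: float) -> int:
    """Encode a float as a scaled integer in the field (negatives wrap to p - v)."""
    return round(x * SCALE) % P

def from_field(f: int) -> float:
    """Decode, treating values above p // 2 as negative."""
    v = f if f <= P // 2 else f - P
    return v / SCALE

weights = [0.137, -2.5, 0.00042]
roundtrip = [from_field(to_field(w)) for w in weights]
max_err = max(abs(a - b) for a, b in zip(weights, roundtrip))
assert max_err <= 0.5 / SCALE   # precision loss bounded by half a quantization step
```

Raising SCALE tightens the error bound but enlarges the circuit's value range, which is exactly the precision-versus-cost tolerance knob described above.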

Where We Are in 2026

The progress in the last 18 months has been substantial. A few highlights that changed my perspective:

  • Recursive SNARKs are becoming standard. Most ZK-ML frameworks now support proof folding, which means proof size does not scale linearly with model complexity anymore. A ResNet-50 inference proof that used to be over 1 GB can now compress to under 100 KB. This makes on-chain verification economically feasible.

  • Proof generation times are dropping into practical territory. We are seeing 1-5 second proof generation for medium-complexity models, which means a DeFi agent can observe the market, compute a decision, generate a proof, and execute a trade all within a single block window on many chains.

  • Lagrange’s DeepProve and similar systems now offer dynamic zk-SNARKs that can efficiently update proofs when underlying data changes without recomputing from scratch. This is critical for continuously running AI agents.

  • On-chain verification is getting cheaper. Stellar just shipped native Groth16 zk-SNARK verification in smart contracts using BN254 cryptography. Ethereum L2s with built-in ZK verification have been doing this for a while, and the trend is clearly toward every major chain supporting native proof verification.

The Three Killer Use Cases

1. Trustless Oracle Feeds Powered by AI

Imagine an oracle that aggregates data from multiple sources, runs a trained anomaly detection model to filter out manipulation attempts, and produces a price feed. With ZK-ML, the oracle can prove it ran the exact model it committed to, on the actual data it received, and produced the reported output. No trust in the operator needed. This completely changes the oracle trust model from reputation-based to proof-based.

2. Verifiable Prediction Markets

Prediction markets that use AI resolution agents currently depend on the honesty of whoever runs the model. With ZK-ML, the resolution model’s architecture and weights can be committed on-chain at market creation. When resolution time comes, the operator runs the model on the relevant data and generates a ZK proof that the resolution matches the committed model. Anyone can verify. No disputes needed.
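
A minimal sketch of the commit-then-resolve flow, with a SHA-256 hash standing in for the on-chain commitment (the inference proof itself is out of scope here and only noted in a comment; the model structure is hypothetical):

```python
import hashlib
import json

def model_commitment(arch: dict, weights: list) -> str:
    """Hash the architecture and weights into a single commitment string."""
    blob = json.dumps({"arch": arch, "weights": weights}, sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()

# Market creation: the resolution model is committed before any bets are placed.
arch = {"type": "logistic", "features": 3}      # hypothetical model description
weights = [0.4, -1.2, 0.7]
committed = model_commitment(arch, weights)

def resolve(arch, weights, commitment) -> bool:
    """Reject resolution unless the presented model matches the commitment.
    (A real system would also verify a ZK proof of the inference itself.)"""
    return model_commitment(arch, weights) == commitment

assert resolve(arch, weights, committed)
assert not resolve(arch, [0.4, -1.2, 0.8], committed)   # tampered weights fail
```
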

3. Auditable Automated Trading

This is the one I am most excited about. DeFi protocols increasingly use AI for rebalancing, liquidation decisions, and risk parameter adjustments. ZK-ML allows every single decision to carry a cryptographic receipt proving it came from the approved model, not from some backdoor or manual override. The proof generates in 1-5 seconds, the on-chain verification costs a fraction of a regular transaction, and the entire audit trail is immutable.

The Hard Problems That Remain

I do not want to oversell this. There are real limitations:

  • Large language models are still out of reach. We can prove inference for CNNs, small transformers, and various regression or classification models. But GPT-class models with billions of parameters remain computationally infeasible for ZK proving: proving time and memory grow with circuit size, and at that parameter count both are orders of magnitude beyond what practical hardware can deliver.

  • The quantization accuracy trade-off is non-trivial. Converting from float32 to the fixed-point arithmetic that ZK circuits require means your proved model may produce slightly different outputs than the original. For some applications this is fine. For high-precision financial models, the error bounds need careful analysis.

  • Tooling maturity is still early. EZKL and a handful of other frameworks are usable, but the developer experience is nowhere near as smooth as deploying a regular smart contract. Circuit compilation times can be long, debugging is painful, and the documentation assumes deep familiarity with both ML and cryptography.

What I Want to Discuss

I have been building ZK-ML circuits for verifiable inference on DeFi risk models, and the results are promising but the engineering effort is significant. I am curious about this community’s experience and thinking:

  1. For those building AI agents on-chain: what trust model are you currently using, and would ZK proofs of inference change your architecture?
  2. For oracle builders: how do you see ZK-ML fitting into existing oracle networks? Complementary to existing validator staking, or a replacement?
  3. For DeFi protocol designers: what is the minimum proof generation time that would make ZK-ML practical for your use case?

The gap between “AI on blockchain” marketing and actual verifiable AI is enormous right now. ZK-ML is, in my view, the only technology that can credibly close it. But we need the infrastructure, tooling, and most importantly the demand from protocol teams to make it happen.

Would love to hear from builders who are working at this intersection or considering it.

Really solid write-up, Zoe. You have articulated the core problem better than most of what I have seen in this space.

I have been knee-deep in zkEVM implementation work for the past year, and the parallels to ZK-ML are striking. The fundamental challenge is the same: taking computation designed for one execution environment and faithfully representing it in arithmetic circuits. With zkEVM, we are converting EVM opcodes. With ZK-ML, you are converting matrix multiplications, activation functions, and normalization layers. Both hit the same wall when it comes to non-native arithmetic.

A few thoughts from the infrastructure perspective:

On the recursive SNARK point - this is genuinely transformative and I want to emphasize why. Before proof folding, you had to generate a monolithic proof for the entire inference pass. This meant your circuit size was bounded by the total number of operations in your model. With recursive composition, you can break the computation into chunks, prove each chunk independently, and then fold the proofs together. The practical implication is that you can parallelise proof generation across multiple machines. For a trading agent that needs sub-second proofs, this is not a nice-to-have, it is the difference between feasible and impossible.
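
The chunk-and-fold shape can be sketched with hashes standing in for the per-chunk SNARKs and the folding accumulator — a real folding scheme (Nova-style) is far more involved, but the structural point survives: chunks are provable independently and the accumulator stays constant-size.

```python
import hashlib

def prove_chunk(chunk_ops: list) -> str:
    """Stand-in 'proof' for one chunk of the forward pass (a hash here;
    a real system would emit a SNARK over the chunk's constraints)."""
    return hashlib.sha256("|".join(chunk_ops).encode()).hexdigest()

def fold(acc: str, proof: str) -> str:
    """Fold the next chunk proof into a constant-size accumulator."""
    return hashlib.sha256((acc + proof).encode()).hexdigest()

# Split a 12-layer forward pass into 3 chunks of 4 layers each.
layers = [f"layer_{i}" for i in range(12)]
chunks = [layers[i:i + 4] for i in range(0, 12, 4)]

# Each chunk can be proved on a different machine, in parallel...
chunk_proofs = [prove_chunk(c) for c in chunks]

# ...then folded sequentially into one fixed-size accumulator.
final = "init"
for p in chunk_proofs:
    final = fold(final, p)

assert len(final) == 64   # accumulator size is constant, regardless of model depth
```
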

On the Halo2 backend in EZKL - I have reviewed the circuit construction and it is clever engineering. They use a lookup-table approach for non-linear activation functions like ReLU and sigmoid, which avoids having to express these operations as polynomial constraints directly. The trade-off is that the lookup tables add to the proving key size, but for models under a few million parameters this is manageable. If anyone here has tried compiling a transformer attention mechanism into Halo2 constraints, I would be curious about your experience with the multi-head attention decomposition.
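
A toy version of the lookup idea (the domain and scale are illustrative, and a plain dict stands in for the committed table): the non-linearity is precomputed over the fixed-point input domain, so the circuit only needs to show each (input, output) pair appears in the table rather than constraining max() polynomially.

```python
# Toy lookup-table activation over a small fixed-point domain.
SCALE = 2**4
DOMAIN = range(-8 * SCALE, 8 * SCALE)          # representable fixed-point inputs

relu_table = {x: max(x, 0) for x in DOMAIN}    # precomputed (input, output) pairs

def circuit_relu(x: int) -> int:
    """ReLU via table lookup: no polynomial constraint for max() needed."""
    return relu_table[x]

assert circuit_relu(37) == 37
assert circuit_relu(-5) == 0
# The table size is what inflates the proving key, as noted above.
assert len(relu_table) == 16 * SCALE
```
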

Where I see the real bottleneck is not in proof generation but in the trusted setup and circuit compilation. For EZKL specifically, the circuit compilation step for a moderately complex model can take 15-30 minutes and requires significant memory. You do this once per model version, but if your model updates frequently (as most production ML systems do), this overhead becomes a real workflow issue. Lagrange’s approach with dynamic zk-SNARKs is interesting precisely because it addresses this - you should not need a full recompilation when only the weights change but the architecture stays the same.

One area I think deserves more attention is the verifier contract gas costs across different chains. On Ethereum mainnet, verifying a Groth16 proof costs roughly 230k gas, which at current prices is maybe $2-5. On L2s it is pennies. But the verification cost is fixed regardless of what you proved, which means ZK-ML verification has the same cost whether you proved a linear regression or a ResNet. That flat-cost property is incredibly powerful for scaling.

Question for you, Zoe: have you experimented with proving inference for ensemble models? I am thinking about random forests or gradient boosted trees specifically, since decision tree operations map more naturally to arithmetic circuits than neural network operations do.
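
To make the "maps more naturally" intuition concrete, here is one standard arithmetization of a tree: each leaf value is gated by a product of branch indicator bits, so the whole prediction is a single polynomial-style expression. (The comparison bits themselves would still need range-check gadgets in a real circuit; the tree and its thresholds below are made up.)

```python
def step(b: bool) -> int:
    """Comparison bit as a 0/1 value (a real circuit proves this with range checks)."""
    return 1 if b else 0

def tree_predict(x1: float, x2: float) -> float:
    """Depth-2 decision tree evaluated as one arithmetic expression:
    each leaf is selected by a product of branch indicator bits."""
    b1 = step(x1 < 5.0)        # root split
    b2 = step(x2 < 2.0)        # left-subtree split
    b3 = step(x2 < 7.0)        # right-subtree split
    leaves = [10.0, 20.0, 30.0, 40.0]
    return (b1 * b2 * leaves[0]
            + b1 * (1 - b2) * leaves[1]
            + (1 - b1) * b3 * leaves[2]
            + (1 - b1) * (1 - b3) * leaves[3])

assert tree_predict(3.0, 1.0) == 10.0   # left-left leaf
assert tree_predict(9.0, 8.0) == 40.0   # right-right leaf
```

An ensemble (forest or boosted trees) is then just a sum of such expressions, which is why tree models feel circuit-friendly compared to deep networks full of non-linear activations.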

Zoe, thank you for a thorough technical breakdown. I want to approach this from the security angle because, as much as I am excited about ZK-ML, the attack surface here is nuanced and I think the community needs to be clear-eyed about it.

The soundness of the proof does not guarantee the soundness of the model.

This is the critical distinction people keep glossing over. A ZK-ML proof certifies that a specific computation was executed correctly. It does NOT certify that the model itself is good, accurate, or unbiased. You can generate a perfectly valid ZK proof that a deliberately poisoned model produced a specific output. The proof is mathematically flawless. The model is malicious. The proof says nothing about model quality, only computational integrity.

So what are we actually trusting with ZK-ML? We are trusting that:

  1. The committed model hash matches the model that was audited or agreed upon
  2. The computation was executed faithfully
  3. The stated input was the actual input used

Point 1 is where the real security work lives, and it is entirely outside the scope of ZK-ML itself. You need a robust model governance framework: who approved this model? When was it last audited? What is the update process? How do stakeholders verify the model hash corresponds to the model they reviewed?

On the quantization attack surface - this concerns me significantly. The conversion from float32 to fixed-point arithmetic creates a gap between the “real” model and the “proved” model. An adversary could potentially craft inputs that produce acceptable outputs in the original model but exploit the quantization boundary to produce different outputs in the ZK circuit. Has anyone done formal analysis of adversarial robustness specifically targeting the quantization step? I have seen work on adversarial examples for standard ML models, but the quantization-aware adversarial setting is a different threat model entirely.
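
The boundary gap is easy to demonstrate with a toy threshold model (the weight, scale, and threshold below are contrived to expose it): an input sitting just inside the rounding error flips the decision between the original model and its quantized twin, and a valid proof would attest to the quantized answer.

```python
SCALE = 2**4                      # deliberately coarse scale to expose the gap

def quantize(w: float) -> float:
    return round(w * SCALE) / SCALE

w = 0.3                            # original float weight
wq = quantize(w)                   # weight as the circuit sees it: 0.3125

def approve(weight: float, x: float, threshold: float = 0.93) -> bool:
    return weight * x >= threshold

# An input crafted near the rounding boundary: the original model rejects,
# the quantized (proved) model approves -- a perfectly valid proof of the
# "wrong" decision relative to the float model.
x = 3.0
assert approve(w, x) is False      # float model:   0.9    < 0.93
assert approve(wq, x) is True      # circuit model: 0.9375 >= 0.93
```

Higher scale factors shrink the exploitable band but never remove it entirely, which is why the formal analysis asked for above matters.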

Trusted setup risks - Brian touched on this but I want to be more explicit. For Groth16-based systems, the trusted setup ceremony is a single point of failure. If the toxic waste is not properly destroyed, someone could forge proofs. For a DeFi protocol managing millions in TVL based on ZK-ML oracle feeds, this is not an acceptable risk profile without either (a) a very large, well-audited ceremony, or (b) moving to a transparent setup system like STARKs or Halo2 with IPA commitments.

What I would want to see before recommending ZK-ML for production DeFi:

  • Formal verification of the circuit compiler itself, not just the proofs it generates
  • Independent audits of the quantization precision bounds for financial-grade models
  • A clear model governance and hash commitment protocol
  • Bug bounty programs specifically targeting the prover implementation

Trust but verify, then verify again. The cryptography is beautiful, but cryptography alone has never been sufficient for security. The human and process layers around ZK-ML matter just as much as the math inside it.

Zoe asked specifically about the oracle perspective, so let me give the practitioner view from someone who has spent years trying to make off-chain data trustworthy on-chain.

The oracle trust model today is fundamentally economic. Chainlink and similar networks rely on staked validators who aggregate data from multiple sources. If a validator reports bad data, they get slashed. The assumption is that the economic cost of misbehaving outweighs the potential profit. It works reasonably well for simple price feeds, but it has a glaring weakness: it cannot verify the computation that transforms raw data into the final output.

Consider a real scenario I dealt with last year. We had an oracle feed for a derivatives protocol that needed a volatility index derived from options data across five exchanges. The computation involved fitting a volatility surface model, interpolating implied volatilities at specific strikes and maturities, and then computing a weighted index. The validators were attesting to the final number, but none of them could verify that the mathematical transformation was done correctly. They were trusting the operator who ran the computation. This is exactly the gap ZK-ML fills.

Where ZK-ML is immediately practical for oracles:

  1. Anomaly detection filters. Every production oracle has a data quality pipeline that filters out stale data, detects manipulation attempts, and handles exchange outages. These filters are often ML models (even simple ones like isolation forests or autoencoders). ZK-ML can prove that the filtering step used the committed model and did not cherry-pick which data points to include. This alone would have prevented at least two oracle manipulation incidents I know of.

  2. Derived data feeds. Moving beyond simple spot prices, protocols need feeds for implied volatility, correlation indices, risk scores, and other derived metrics that require non-trivial computation. ZK-ML makes these feeds verifiable at the computation level, not just the attestation level.

  3. Cross-source aggregation logic. When you aggregate prices from ten exchanges, the aggregation method matters. Median? Trimmed mean? VWAP? An ML-based outlier-resistant aggregation? ZK-ML can prove exactly which aggregation method was applied.
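
As a small sketch of point 3: a trimmed mean is fully deterministic, so a proof can bind the published feed to this exact rule and rule out cherry-picking (the quotes below are invented).

```python
def trimmed_mean(prices: list, trim: int = 1) -> float:
    """Drop the `trim` lowest and highest quotes, average the rest.
    Deterministic, so a proof can bind the output to this exact rule."""
    if len(prices) <= 2 * trim:
        raise ValueError("not enough quotes to trim")
    kept = sorted(prices)[trim:len(prices) - trim]
    return sum(kept) / len(kept)

quotes = [100.1, 100.2, 100.0, 99.9, 250.0]    # one manipulated outlier
assert abs(trimmed_mean(quotes) - 100.1) < 1e-9  # outlier discarded by the rule
```

Swapping in a median, VWAP, or ML-based aggregator changes the circuit, but the verifiable property is the same: the committed rule, and no other, produced the number.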

My honest assessment of feasibility:

For simple models (anomaly detectors, linear aggregation, small neural nets under 1M parameters), this is production-ready today if you are willing to invest in the engineering. Proof generation in the 1-5 second range is compatible with most oracle update frequencies. The gas cost for on-chain verification is acceptable, especially on L2s.

For more complex models, the bottleneck is the circuit compilation time Brian mentioned. In an oracle context, model updates are less frequent than in a trading system, so the 15-30 minute compilation overhead per model version is manageable. You compile once, verify thousands of times.

The question I keep coming back to: should ZK-ML replace staked validation, or should it complement it? My current thinking is that the strongest architecture layers both. Validators stake economic collateral AND the operator provides a ZK proof of correct computation. The validators then only need to verify the proof (cheap, deterministic) rather than independently replicate the computation (expensive, requires access to the same data sources). This actually makes the validator role simpler and more scalable while providing stronger guarantees.

Would be very interested to hear if anyone has tried integrating ZK-ML proofs into an existing Chainlink or Pyth-style oracle architecture. The plumbing seems straightforward but I suspect there are subtle timing issues I am not seeing.

Coming at this from the DeFi protocol design side, and specifically from the perspective of someone who builds automated yield optimization strategies that rely on ML models for decision-making.

Zoe, to answer your question directly: the minimum proof generation time that would make ZK-ML practical for my use case is under 10 seconds. Let me explain why.

Our yield optimizer rebalances across lending protocols, DEX liquidity pools, and staking positions. The rebalancing decisions are driven by a neural network that takes as input current APYs, pool utilization rates, gas prices, and historical volatility data, and outputs an allocation vector. We retrain weekly and run inference on every block where the portfolio value deviation exceeds a threshold. That means we might run inference 50-200 times per day depending on market conditions.

Right now, our trust model is embarrassingly centralized. We run the model on our own infrastructure, sign the transactions with a multisig, and our users trust us because of our track record and the fact that our treasury is co-invested. But honestly, if I were a user with $500K in our vaults, I would want more than “trust us, bro.”

Here is what ZK-ML would concretely change for us:

  1. Governance bypass protection. Our biggest existential risk is not market volatility, it is an insider compromising the model or manually overriding it. ZK-ML makes every rebalancing decision auditable. If the model hash does not match the committed version, the proof fails. If someone tries to push a manual allocation that did not come from the model, there is no valid proof. This is the single most valuable property for any DeFi protocol using AI.

  2. User-verifiable strategy execution. We could publish the model commitment on-chain and let any user independently verify that every historical rebalancing decision came from the approved model. This transforms our product from “trust the team” to “trust the math.” For institutional capital, this distinction is worth billions in addressable market.

  3. Regulatory compliance. This one is under-discussed. As DeFi faces increasing regulatory scrutiny, being able to prove that your automated trading system followed the exact algorithm it disclosed is enormously valuable. ZK-ML essentially gives you an immutable compliance audit trail for free.

The quantization issue Sophia raised is real and I want to add a DeFi-specific perspective. For our yield optimizer, an allocation that is off by 0.1% is meaningless - the gas costs of rebalancing usually dwarf that. But for a liquidation model, being off by even 0.01% on a collateralization ratio could mean the difference between liquidating and not liquidating a $10M position. The tolerance depends entirely on the application, and DeFi builders need to define these bounds explicitly before adopting ZK-ML.
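
A back-of-envelope version of that bounds analysis (every number here is hypothetical): given the circuit's fixed-point scale and the number of quantized terms entering a collateralization ratio, you can compute a crude worst-case error band around the liquidation threshold and check it against your stated tolerance before trusting the proofs.

```python
# Hypothetical numbers: is a quantized liquidation model's worst-case
# error acceptable at this fixed-point scale?
SCALE = 2**14                       # circuit's fixed-point scale factor
quant_step = 1.0 / SCALE            # worst-case per-value rounding error
n_terms = 8                         # quantized values summed into the ratio

worst_case_ratio_error = n_terms * quant_step   # crude additive bound

liq_threshold = 1.50                # liquidate below 150% collateralization
tolerance = 0.0001                  # the 0.01% figure from the example above

# Band around the threshold where the float and circuit models may disagree:
ambiguous_band = (liq_threshold - worst_case_ratio_error,
                  liq_threshold + worst_case_ratio_error)

assert worst_case_ratio_error > tolerance   # at this scale: NOT acceptable
```

The point is that the acceptable SCALE falls out of the application's tolerance, not the other way around; a liquidation engine would need a much finer scale than a rebalancer.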

What I want to build: A vault contract where the rebalancing function only executes if accompanied by a valid ZK proof that the allocation came from the committed model. The model commitment is stored on-chain, updatable only through a governance vote with a timelock. Every historical proof is stored (or at least the verification result), creating a complete audit trail. Users can verify any past decision.

The 1-5 second proof generation Zoe mentioned would work perfectly for our use case. We do not need sub-second proofs because our rebalancing is not latency-sensitive the way an MEV bot would be. Even 10 seconds would be fine. The real question is whether the tooling is mature enough to integrate into a production Solidity + Python stack without a dedicated ZK engineering team.

If anyone here has actually shipped a ZK-ML verified DeFi product to mainnet, I would love to compare notes on the integration experience.