Verifiable On-Chain AI with zkML and Cryptographic Proofs
Introduction: The Need for Verifiable AI on Blockchain
As AI systems grow in influence, ensuring their outputs are trustworthy becomes critical. Traditional methods rely on institutional assurances (essentially “just trust us”), which offer no cryptographic guarantees. This is especially problematic in decentralized contexts like blockchains, where a smart contract or user must trust an AI-derived result without being able to re-run a heavy model on-chain. Zero-knowledge Machine Learning (zkML) addresses this by allowing cryptographic verification of ML computations. In essence, zkML enables a prover to generate a succinct proof that “the output $Y$ came from running model $M$ on input $X$” – without revealing $X$ or the internal details of $M$. These zero-knowledge proofs (ZKPs) can be verified by anyone (or any contract) efficiently, shifting AI trust from “policy to proof”.
On-chain verifiability of AI means a blockchain can incorporate advanced computations (like neural network inferences) by verifying a proof of correct execution instead of performing the compute itself. This has broad implications: smart contracts can make decisions based on AI predictions, decentralized autonomous agents can prove they followed their algorithms, and cross-chain or off-chain compute services can provide verifiable outputs rather than unverifiable oracles. Ultimately, zkML offers a path to trustless and privacy-preserving AI – for example, proving an AI model’s decisions are correct and authorized without exposing private data or proprietary model weights. This is key for applications ranging from secure healthcare analytics to blockchain gaming and DeFi oracles.
How zkML Works: Compressing ML Inference into Succinct Proofs
At a high level, zkML combines cryptographic proof systems with ML inference so that a complex model evaluation can be “compressed” into a small proof. Internally, the ML model (e.g. a neural network) is represented as a circuit or program consisting of many arithmetic operations (matrix multiplications, activation functions, etc.). Rather than revealing all intermediate values, a prover performs the full computation off-chain and then uses a zero-knowledge proof protocol to attest that every step was done correctly. The verifier, given only the proof and some public data (like the final output and an identifier for the model), can be cryptographically convinced of the correctness without re-executing the model.
To achieve this, zkML frameworks typically transform the model computation into a format amenable to ZKPs:
- Circuit Compilation: In SNARK-based approaches, the computation graph of the model is compiled into an arithmetic circuit or set of polynomial constraints. Each layer of the neural network (convolutions, matrix multiplies, nonlinear activations) becomes a sub-circuit with constraints ensuring the outputs are correct given the inputs. Because neural nets involve non-linear operations (ReLUs, Sigmoids, etc.) not naturally suited to polynomials, techniques like lookup tables are used to handle these efficiently. For example, a ReLU (output = max(0, input)) can be enforced by a custom constraint or lookup that verifies output equals input if input≥0 else zero. The end result is a set of cryptographic constraints that the prover must satisfy, which implicitly proves the model ran correctly.
- Execution Trace & Virtual Machines: An alternative is to treat the model inference as a program trace, as done in zkVM approaches. For instance, the JOLT zkVM targets the RISC-V instruction set; one can compile the ML model (or the code that computes it) to RISC-V and then prove each CPU instruction executed properly. JOLT introduces a “lookup singularity” technique, replacing expensive arithmetic constraints with fast table lookups for each valid CPU operation. Every operation (add, multiply, bitwise op, etc.) is checked via a lookup in a giant table of pre-computed valid outcomes, using a specialized argument (Lasso/SHOUT) to keep this efficient. This drastically reduces the prover workload: even complex 64-bit operations become a single table lookup in the proof instead of many arithmetic constraints.
- Interactive Protocols (GKR Sum-Check): A third approach uses interactive proofs like GKR (Goldwasser–Kalai–Rothblum) to verify a layered computation. Here the model’s computation is viewed as a layered arithmetic circuit (each neural network layer is one layer of the circuit graph). The prover runs the model normally but then engages in a sum-check protocol to prove that each layer’s outputs are correct given its inputs. In Lagrange’s approach (DeepProve, detailed next), the prover and verifier perform an interactive polynomial protocol (made non-interactive via Fiat-Shamir) that checks the consistency of each layer’s computations without re-doing them. This sum-check method avoids generating a monolithic static circuit; instead it verifies the computations step by step with minimal cryptographic operations (mostly hashing or polynomial evaluations). A toy sum-check example follows this list.
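To make the sum-check idea concrete, here is a minimal, self-contained Python sketch (illustrative only, not DeepProve’s code): the prover claims the sum of a multilinear polynomial over the Boolean hypercube, and the verifier checks one variable per round, ending with a single evaluation at a random point. In a GKR-style system this check is applied layer by layer, with that final evaluation replaced by a commitment opening.

```python
"""Toy sum-check protocol over a small prime field (illustrative sketch).
The prover convinces the verifier that S = sum_{x in {0,1}^n} f(x) for a
multilinear polynomial f, given only one evaluation of f at the end."""
import random

P = 2**61 - 1  # prime modulus for field arithmetic (demo choice)

def mle_eval(table, point):
    """Evaluate the multilinear extension of `table` (evaluations of f over
    {0,1}^n, first variable = most significant index bit) at a field point."""
    vals = list(table)
    for r in point:
        half = len(vals) // 2
        # fold one variable: f(r, y) = (1-r)*f(0, y) + r*f(1, y)
        vals = [((1 - r) * vals[i] + r * vals[half + i]) % P for i in range(half)]
    return vals[0]

def prove_and_verify(table):
    n = len(table).bit_length() - 1             # number of variables (len = 2^n)
    claim = sum(table) % P                      # prover's claimed sum S
    partial = list(table)                       # prover's working table
    challenges = []
    for _ in range(n):
        half = len(partial) // 2
        # Prover: send the round polynomial g(X) via g(0) and g(1)
        # (degree <= 1 because f is multilinear in each variable).
        g0 = sum(partial[:half]) % P
        g1 = sum(partial[half:]) % P
        # Verifier: consistency check, then a random challenge.
        assert (g0 + g1) % P == claim, "round check failed"
        r = random.randrange(P)
        challenges.append(r)
        claim = ((1 - r) * g0 + r * g1) % P     # next round's claim = g(r)
        # Prover folds its table with the challenge for the next round.
        partial = [((1 - r) * partial[i] + r * partial[half + i]) % P
                   for i in range(half)]
    # Final check: one evaluation of f's multilinear extension at (r_1..r_n).
    assert mle_eval(table, challenges) == claim, "final evaluation check failed"
    return True

# Example: f over {0,1}^3 given by 8 evaluations (e.g. one layer's wiring values).
table = [3, 1, 4, 1, 5, 9, 2, 6]
assert prove_and_verify(table)
print("sum-check verified; claimed sum =", sum(table) % P)
```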
Regardless of approach, the outcome is a succinct proof (typically a few kilobytes to a few tens of kilobytes) that attests to the correctness of the entire inference. The proof is zero-knowledge, meaning any secret inputs (private data or model parameters) can be kept hidden – they influence the proof but are not revealed to verifiers. Only the intended public outputs or assertions are revealed. This allows scenarios like “prove that model $M$ when applied to patient data $X$ yields diagnosis $Y$, without revealing $X$ or the model’s weights.”
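Written as a proof relation (notation illustrative), with $\mathsf{com}(M)$ a public commitment to the model’s weights, the prover shows knowledge of a witness $(M, X)$ such that

$$\mathcal{R} \;=\; \bigl\{\, (\mathsf{com}(M),\, Y \;;\; M,\, X) \;:\; \mathrm{Commit}(M) = \mathsf{com}(M) \;\wedge\; M(X) = Y \,\bigr\},$$

where $X$ (and even $Y$, if only a property of it is asserted) can live in the private witness rather than the public statement.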
Enabling on-chain verification: Once a proof is generated, it can be posted to a blockchain. Smart contracts can include verification logic to check the proof, often using precompiled cryptographic primitives. For example, Ethereum has precompiles for elliptic-curve pairing operations (BN254, and more recently BLS12-381) that many zk-SNARK verifiers rely on, making on-chain verification of SNARK proofs efficient. STARKs (hash-based proofs) are larger, but can still be verified on-chain with careful optimization or by accepting some extra trust assumptions (StarkWare’s L2, for instance, verifies STARK proofs on Ethereum with an on-chain verifier contract, albeit at higher gas cost than SNARKs). The key is that the chain does not need to execute the ML model – it only runs a verification step that is much cheaper than the original compute. In summary, zkML compresses expensive AI inference into a small proof that blockchains (or any verifier) can check in milliseconds to seconds.
Lagrange DeepProve: Architecture and Performance of a zkML Breakthrough
DeepProve by Lagrange Labs is a state-of-the-art zkML inference framework focusing on speed and scalability. Launched in 2025, DeepProve introduced a new proving system that is dramatically faster than prior solutions like Ezkl. Its design centers on the GKR interactive proof protocol with sum-check and specialized optimizations for neural network circuits. Here’s how DeepProve works and achieves its performance:
- One-Time Preprocessing: Developers start with a trained neural network (currently supported types include multilayer perceptrons and popular CNN architectures). The model is exported to ONNX format, a standard graph representation. DeepProve’s tool then parses the ONNX model and quantizes it (converts weights to fixed-point/integer form) for efficient field arithmetic. In this phase, it also generates the proving and verification keys for the cryptographic protocol. This setup is done once per model and does not need to be repeated per inference. DeepProve emphasizes ease of integration: “Export your model to ONNX → one-time setup → generate proofs → verify anywhere”. A minimal sketch of the quantization and model-commitment steps appears after this list.
- Proving (Inference + Proof Generation): After setup, a prover (which could be run by a user, a service, or Lagrange’s decentralized prover network) takes a new input $X$ and runs the model $M$ on it, obtaining output $Y$. During this execution, DeepProve records an execution trace of each layer’s computations. Instead of translating every multiplication into a static circuit upfront (as SNARK approaches do), DeepProve uses the linear-time GKR protocol to verify each layer on the fly. For each network layer, the prover commits to the layer’s inputs and outputs (e.g., via cryptographic hashes or polynomial commitments) and then engages in a sum-check argument to prove that the outputs indeed result from the inputs as per the layer’s function. The sum-check protocol iteratively convinces the verifier of the correctness of a sum of evaluations of a polynomial that encodes the layer’s computation, without revealing the actual values. Non-linear operations (like ReLU, softmax) are handled efficiently through lookup arguments in DeepProve – if an activation’s output was computed, DeepProve can prove that each output corresponds to a valid input-output pair from a precomputed table for that function. Layer by layer, proofs are generated and then aggregated into one succinct proof covering the whole model’s forward pass. The heavy lifting of cryptography is minimized – DeepProve’s prover mostly performs normal numeric computations (the actual inference) plus some light cryptographic commitments, rather than solving a giant system of constraints.
- Verification: The verifier uses the final succinct proof along with a few public values – typically the model’s committed identifier (a cryptographic commitment to $M$’s weights), the input $X$ (if not private), and the claimed output $Y$ – to check correctness. Verification in DeepProve’s system involves verifying the sum-check protocol’s transcript and the final polynomial or hash commitments. This is more involved than verifying a classic SNARK (which might be a few pairings), but it’s vastly cheaper than re-running the model. In Lagrange’s benchmarks, verifying a DeepProve proof for a medium CNN takes on the order of 0.5 seconds in software. That is ~0.5s to confirm, for example, that a convolutional network with hundreds of thousands of parameters ran correctly – over 500× faster than naively re-computing that CNN on a GPU for verification. (In fact, DeepProve measured up to 521× faster verification for CNNs and 671× for MLPs compared to re-execution.) The proof size is small enough to transmit on-chain (tens of KB), and verification could be performed in a smart contract if needed, although 0.5s of computation might require careful gas optimization or layer-2 execution.
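The sketch below illustrates two of the steps above in plain Python: fixed-point quantization of weights and a hash commitment serving as the model’s public identifier. The scale factor, field size, and hashing scheme are illustrative assumptions, not the zkml library’s actual parameters or API.

```python
"""Illustrative sketch of model quantization and commitment (assumed scheme,
not the zkml library's API)."""
import hashlib
import numpy as np

FIELD_PRIME = 2**61 - 1   # demo modulus standing in for the proof system's field
SCALE = 2**12             # fixed-point scale: weight w is encoded as round(w * SCALE)

def quantize(weights: np.ndarray) -> np.ndarray:
    """Map float weights to field elements via fixed-point encoding."""
    q = np.round(weights * SCALE).astype(np.int64)
    return np.mod(q, FIELD_PRIME)          # negative values wrap into the field

def commit_model(layers: list[np.ndarray]) -> str:
    """Hash all quantized parameters into a single model identifier."""
    h = hashlib.sha256()
    for layer in layers:
        h.update(str(layer.shape).encode())
        h.update(quantize(layer).tobytes())
    return h.hexdigest()

# Example: a tiny 2-layer MLP's weights.
rng = np.random.default_rng(0)
w1, w2 = rng.normal(size=(16, 8)), rng.normal(size=(8, 2))
model_id = commit_model([w1, w2])
print("model commitment:", model_id)
# A verifier that knows `model_id` can later check that a proof refers to
# exactly this set of weights without ever seeing the weights themselves
# (a real system would use a binding, and possibly hiding, commitment scheme).
```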
Architecture and Tooling: DeepProve is implemented in Rust and provides a toolkit (the zkml library) for developers. It natively supports ONNX model graphs, making it compatible with models from PyTorch or TensorFlow (after exporting). The proving process currently targets models up to a few million parameters (tests include a 4M-parameter dense network). DeepProve leverages a combination of cryptographic components: a multilinear polynomial commitment (to commit to layer outputs), the sum-check protocol for verifying computations, and lookup arguments for non-linear ops. Notably, Lagrange’s open-source repository acknowledges it builds on prior work (the sum-check and GKR implementation from Scroll’s Ceno project), indicating an intersection of zkML with zero-knowledge rollup research.
To achieve real-time scalability, Lagrange pairs DeepProve with its Prover Network – a decentralized network of specialized ZK provers. Heavy proof generation can be offloaded to this network: when an application needs an inference proved, it sends the job to Lagrange’s network, where many operators (staked on EigenLayer for security) compute proofs and return the result. This network economically incentivizes reliable proof generation (malicious or failed jobs get the operator slashed). By distributing work across provers (and potentially leveraging GPUs or ASICs), the Lagrange Prover Network hides the complexity and cost from end-users. The result is a fast, scalable, and decentralized zkML service: “verifiable AI inferences fast and affordable”.
Performance Milestones: DeepProve’s claims are backed by benchmarks against the prior state-of-the-art, Ezkl. For a CNN with ~264k parameters (CIFAR-10 scale model), DeepProve’s proving time was ~1.24 seconds versus ~196 seconds for Ezkl – about 158× faster. For a larger dense network with 4 million parameters, DeepProve proved an inference in ~2.3 seconds vs ~126.8 seconds for Ezkl (~54× faster). Verification times also dropped: DeepProve verified the 264k CNN proof in ~0.6s, whereas verifying the Ezkl proof (Halo2-based) took over 5 minutes on CPU in that test. The speedups come from DeepProve’s near-linear complexity: its prover scales roughly O(n) with the number of operations, whereas circuit-based SNARK provers often have superlinear overhead (from FFTs and polynomial commitment operations). In fact, DeepProve’s prover throughput can be within an order of magnitude of plain inference runtime – recent GKR systems can be <10× slower than raw execution for large matrix multiplications, an impressive achievement in ZK. This makes real-time or on-demand proofs more feasible, paving the way for verifiable AI in interactive applications.
Use Cases: Lagrange is already collaborating with Web3 and AI projects to apply zkML. Example use cases include: verifiable NFT traits (proving an AI-generated evolution of a game character or collectible is computed by the authorized model), provenance of AI content (proving an image or text was generated by a specific model, to combat deepfakes), DeFi risk models (proving a model’s output that assesses financial risk without revealing proprietary data), and private AI inference in healthcare or finance (where a hospital can get AI predictions with a proof, ensuring correctness without exposing patient data). By making AI outputs verifiable and privacy-preserving, DeepProve opens the door to “AI you can trust” in decentralized systems – moving from an era of “blind trust in black-box models” to one of “objective guarantees”.
SNARK-Based zkML: Ezkl and the Halo2 Approach
The traditional approach to zkML uses zk-SNARKs (Succinct Non-interactive Arguments of Knowledge) to prove neural network inference. Ezkl (by ZKonduit/Modulus Labs) is a leading example of this approach. It builds on the Halo2 proving system (a PLONK-style SNARK, typically instantiated with KZG polynomial commitments over the BN254 curve). Ezkl provides a tooling chain where a developer can take a PyTorch or TensorFlow model, export it to ONNX, and have Ezkl compile it into a custom arithmetic circuit automatically.
How it works: Each layer of the neural network is converted into constraints:
- Linear layers (dense or convolution) become collections of multiplication-add constraints that enforce the dot-products between inputs, weights, and outputs.
- Non-linear layers (like ReLU, sigmoid, etc.) are handled via lookups or piecewise constraints because such functions are not polynomial. For instance, a ReLU can be implemented by a boolean selector $b$ with constraints ensuring $y = x \cdot b$, $b \in \{0,1\}$, and $b = 1$ iff $x > 0$ (one way to do it), or more efficiently by a lookup table mapping $x \mapsto \max(0,x)$ for a range of $x$ values. Halo2’s lookup arguments allow mapping 16-bit (or smaller) chunks of values, so large domains (like all 32-bit values) are usually “chunked” into several smaller lookups. This chunking increases the number of constraints. A toy check of both formulations appears after this list.
- Big integer ops or divisions (if any) are similarly broken into small pieces. The result is a large set of R1CS/PLONK constraints tailored to the specific model architecture.
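As a toy illustration of the two formulations above (selector-bit constraints versus a lookup table), the following Python checks a claimed ReLU witness both ways; real circuits express these as Halo2 gates and lookup arguments rather than Python assertions.

```python
"""Toy consistency checks for a ReLU witness (illustrative, not circuit code)."""

def relu_constraints_ok(x: int, y: int, b: int) -> bool:
    """Selector-bit formulation: b is a boolean flag meant to be 1 iff x > 0.
    The constraints y = x*b, b*(b-1) = 0, and (1-b)*y = 0 force y = max(0, x)
    when b is set honestly (a real circuit adds a range/sign check on x)."""
    return y == x * b and b * (b - 1) == 0 and (1 - b) * y == 0 and (x > 0) == bool(b)

def build_relu_table(bits: int = 8):
    """Lookup formulation: precompute all valid (x, max(0, x)) pairs for a
    signed `bits`-bit input domain; wider inputs would be chunked into
    several smaller lookups, as noted above."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1))
    return {(x, max(0, x)) for x in range(lo, hi)}

RELU_TABLE = build_relu_table()

def relu_lookup_ok(x: int, y: int) -> bool:
    return (x, y) in RELU_TABLE

# An honest witness passes both checks; a cheating witness (ReLU(-5) = 7) fails.
assert relu_constraints_ok(12, 12, 1) and relu_constraints_ok(-5, 0, 0)
assert relu_lookup_ok(12, 12) and relu_lookup_ok(-5, 0)
assert not relu_lookup_ok(-5, 7)
print("ReLU witness checks passed")
```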
Ezkl then uses Halo2 to generate a proof that these constraints hold given the secret inputs (model weights, private inputs) and public outputs.

Tooling and integration: One advantage of the SNARK approach is that it leverages well-known primitives. Halo2 is battle-tested in production systems (it powers Zcash and several zkEVM rollups), and on-chain verifiers for it are readily available. Ezkl’s proofs use the BN254 curve, whose pairing operations Ethereum exposes as precompiles, making it straightforward to verify an Ezkl proof in a smart contract. The team has also provided user-friendly APIs; for example, data scientists can work with their models in Python and use Ezkl’s CLI to produce proofs, without deep knowledge of circuits.
Strengths: Ezkl’s approach benefits from the generality and ecosystem of SNARKs. It supports reasonably complex models and has already seen “practical integrations (from DeFi risk models to gaming AI)”, proving real-world ML tasks. Because it operates at the level of the model’s computation graph, it can apply ML-specific optimizations: e.g. pruning insignificant weights or quantizing parameters to reduce circuit size. It also means model confidentiality is natural – the weights can be treated as private witness data, so the verifier only sees that some valid model produced the output, or at best a commitment to the model. The verification of SNARK proofs is extremely fast (typically a few milliseconds or less on-chain), and proof sizes are small (a few kilobytes), which is ideal for blockchain usage.
Weaknesses: Performance is the Achilles’ heel. Circuit-based proving imposes large overheads, especially as models grow. It has been noted that, historically, SNARK circuits could be a million times more work for the prover than just running the model itself. Halo2 and Ezkl optimize this, but operations like large matrix multiplications still generate an enormous number of constraints. If a model has millions of parameters, the prover must handle a correspondingly large number of constraints (millions), performing heavy FFTs and multi-exponentiations in the process. This leads to high proving times (often minutes or hours for non-trivial models) and high memory usage. For example, proving even a relatively small CNN (e.g. a few hundred thousand parameters) can take tens of minutes with Ezkl on a single machine. The team behind DeepProve cited that Ezkl took hours for certain model proofs that DeepProve can do in minutes. Large models might not even fit in memory or require splitting into multiple proofs (which then need recursive aggregation). While Halo2 is “moderately optimized”, any need to “chunk” lookups or handle wide-bit operations translates to extra overhead. In summary, scalability is limited – Ezkl works well for small-to-medium models (and indeed outperformed some earlier alternatives like naive STARK-based VMs in benchmarks), but struggles as model size grows beyond a point.
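A rough back-of-the-envelope count makes the scaling problem concrete; the layer sizes and per-operation costs below are illustrative assumptions, not measurements of Ezkl.

```python
"""Rough constraint counting for a hypothetical MLP (illustrative numbers;
per-multiplication and chunking costs vary by circuit design)."""

LOOKUP_CHUNKS_PER_ACTIVATION = 2   # e.g. a 32-bit value split into two 16-bit lookups

def dense_layer_constraints(n_in: int, n_out: int) -> int:
    mults = n_in * n_out              # roughly one multiply-add constraint per weight
    activations = n_out * LOOKUP_CHUNKS_PER_ACTIVATION
    return mults + activations

layers = [(784, 512), (512, 512), (512, 10)]   # hypothetical MLP dimensions
total = sum(dense_layer_constraints(i, o) for i, o in layers)
params = sum(i * o for i, o in layers)
print(f"~{params:,} parameters -> ~{total:,} constraints")
# ~668,672 parameters -> ~670,740 constraints here, before any overhead for
# range checks or copy constraints — so a million-parameter model quickly
# means millions of constraints for the prover to satisfy.
```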
Despite these challenges, Ezkl and similar SNARK-based zkML libraries are important stepping stones. They proved that verified ML inference is possible on-chain and have active usage. Notably, projects like Modulus Labs demonstrated verifying an 18-million-parameter model on-chain using SNARKs (with heavy optimization). The cost was non-trivial, but it shows the trajectory. Moreover, the Mina Protocol has its own zkML toolkit that uses SNARKs to allow smart contracts on Mina (which are SNARK-based) to verify ML model execution. This indicates growing multi-platform support for SNARK-based zkML.
STARK-Based Approaches: Transparent and Programmable ZK for ML
zk-STARKs (Scalable Transparent ARguments of Knowledge) offer another route to zkML. STARKs use hash-based cryptography (like FRI for polynomial commitments) and avoid any trusted setup. They often operate by simulating a CPU or VM and proving the execution trace is correct. In the context of ML, one can either build a custom STARK for the neural network or use a general-purpose STARK VM to run the model code.
General STARK VMs (RISC Zero, Cairo): A straightforward approach is to write inference code and run it in a STARK VM. For example, RISC Zero provides a RISC-V environment where any code (e.g., a C++ or Rust implementation of a neural network) can be executed and proven via a STARK. Similarly, StarkWare’s Cairo language can express arbitrary computations (like an LSTM or CNN inference) which are then proved by the StarkNet STARK prover. The advantage is flexibility – you don’t need to design custom circuits for each model. However, early benchmarks showed that naive STARK VMs were slower than optimized SNARK circuits for ML. In one test, a Halo2-based proof (Ezkl) was about 3× faster than a STARK-based approach on Cairo, and even 66× faster than a RISC-V STARK VM on a certain benchmark in 2024. This gap is due to the overhead of simulating every low-level instruction in a STARK and the larger constants in STARK proofs (hashing is fast, but you need a lot of it; STARK proof sizes are bigger, etc.). That said, STARK VMs are improving and have the benefit of transparent setup (no trusted setup) and post-quantum security. As STARK-friendly hardware and protocols advance, proving speeds will improve.
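For intuition, the snippet below is the kind of integer-only inference loop one would port into a zkVM guest (in practice written in Rust and compiled to RISC-V for RISC Zero, or in Cairo for StarkNet; the dimensions and fixed-point scale here are made up). Counting its arithmetic operations shows why proving every VM instruction is costly.

```python
"""Sketch of the integer-only inference code a zkVM guest would run
(illustrative; real guests would be Rust/Cairo, not Python)."""

def fixed_point_dense(x, weights, bias, scale=2**12):
    """Integer matrix-vector product + ReLU, avoiding floats so that every
    step maps onto simple VM instructions."""
    out, ops = [], 0
    for row, b in zip(weights, bias):
        acc = b * scale                   # bias (quantized at `scale`) lifted to scale^2
        for xi, wi in zip(x, row):
            acc += xi * wi                # products are at scale^2
            ops += 2                      # one multiply, one add
        acc //= scale                     # rescale back to `scale`
        out.append(max(0, acc))           # ReLU
        ops += 2                          # rescale + ReLU
    return out, ops

x = [3 * 2**12, -1 * 2**12, 2 * 2**12]                   # quantized inputs
w = [[2**12, 0, -2**12], [2**11, 2**11, 2**11]]          # quantized 2x3 weights
b = [2**12, 0]                                           # quantized biases
y, op_count = fixed_point_dense(x, w, b)
print(y, f"({op_count} arithmetic ops to prove, before any hashing overhead)")
# Every one of those ops becomes one or more proved VM instructions, which is
# why generic zkVMs lag hand-built circuits unless lookups (JOLT) or
# layer-level protocols (GKR) collapse that per-instruction cost.
```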
DeepProve’s approach vs STARK: Interestingly, DeepProve’s use of GKR and sum-check yields a proof more akin to a STARK in spirit – it is an interactive, hash-based proof with no need for a structured reference string. The trade-off is that its proofs are larger and verification is heavier than a SNARK’s. Yet DeepProve shows that careful protocol design (specialized to ML’s layered structure) can vastly outperform both generic STARK VMs and SNARK circuits in proving time. We can consider DeepProve a bespoke STARK-style zkML prover (Lagrange describes it as a zkSNARK because the proofs are succinct, but it lacks a traditional SNARK’s constant-size, millisecond verification – a ~0.5 s verify is much heavier than a typical SNARK verify). Traditional STARK proofs (like StarkNet’s) often involve tens of thousands of field operations to verify, whereas a SNARK verifies in maybe a few dozen. Thus, one trade-off is evident: SNARKs yield smaller proofs and faster verifiers, while STARKs (or GKR) offer easier scaling and no trusted setup at the cost of proof size and verification speed.
Emerging improvements: The JOLT zkVM (discussed earlier) actually outputs SNARKs (using PLONKish commitments), but it embodies ideas that could be applied in a STARK context too (Lasso-style lookups could in principle be combined with FRI commitments). StarkWare and others are researching ways to speed up proving of common operations (e.g. custom gates or hints in Cairo for big-integer ops). There’s also Circomlib-ML by Privacy & Scaling Explorations (PSE), which provides Circom templates for CNN layers and similar building blocks – that’s SNARK-oriented, but conceptually similar templates could be made for STARK languages.
In practice, non-Ethereum ecosystems leveraging STARKs include StarkNet (which could allow on-chain verification of ML if someone writes a verifier, though cost is high) and Risc0’s Bonsai service (which is an off-chain proving service that emits STARK proofs which can be verified on various chains). As of 2025, most zkML demos on blockchain have favored SNARKs (due to verifier efficiency), but STARK approaches remain attractive for their transparency and potential in high-security or quantum-resistant settings. For example, a decentralized compute network might use STARKs to let anyone verify work without a trusted setup, useful for longevity. Also, some specialized ML tasks might exploit STARK-friendly structures: e.g. computations heavily using XOR/bit operations could be faster in STARKs (since those are cheap in boolean algebra and hashing) than in SNARK field arithmetic.
Summary of SNARK vs STARK for ML:
- Performance: SNARKs (like Halo2) have huge per-gate proving overhead but benefit from powerful optimizations and small verification constants; generic STARKs have larger constant overhead but scale more linearly and avoid expensive cryptography like pairings. DeepProve shows that customizing the approach (sum-check) yields near-linear proving time, but with a STARK-like proof. JOLT shows that even a general VM can be made faster with heavy use of lookups. Empirically, for models with up to millions of operations, a well-optimized SNARK (Ezkl) can handle them but may take tens of minutes, whereas DeepProve (GKR) can do it in seconds. STARK VMs in 2024 generally fell between the two, or behind optimized SNARK circuits, unless specialized (Risc0 was slower in tests, and Cairo was slower without custom hints).
- Verification: SNARK proofs verify quickest (milliseconds, and minimal data on-chain ~ a few hundred bytes to a few KB). STARK proofs are larger (dozens of KB) and take longer (tens of ms to seconds) to verify due to many hashing steps. In blockchain terms, a SNARK verify might cost e.g. ~200k gas, whereas a STARK verify could cost millions of gas – often too high for L1, acceptable on L2 or with succinct verification schemes.
- Setup and Security: SNARKs like Groth16 require a trusted setup per circuit (unfriendly for arbitrary models), but universal SNARKs (PLONK, Halo2) have a one-time setup that can be reused for any circuit up to a certain size. STARKs need no setup, rely only on hash functions (in the random-oracle model) together with information-theoretic soundness arguments, and are post-quantum secure. This makes STARKs appealing for longevity – proofs remain secure even if quantum computers emerge, whereas current elliptic-curve-based SNARKs (e.g. over BN254 or BLS12-381) would be broken by quantum attacks.
We will consolidate these differences in a comparison table shortly.
FHE for ML (FHE-of-ML): Private Computation vs. Verifiable Computation
Fully Homomorphic Encryption (FHE) is a cryptographic technique that allows computations to be performed directly on encrypted data. In the context of ML, FHE can enable a form of privacy-preserving inference: for example, a client can send encrypted input to a model host, the host runs the neural network on the ciphertext without decrypting it, and sends back an encrypted result which the client can decrypt. This ensures data confidentiality – the model owner learns nothing about the input (and potentially the client learns only the output, not the model’s internals if they only get output). However, FHE by itself does not produce a proof of correctness in the same way ZKPs do. The client must trust that the model owner actually performed the computation honestly (the ciphertext could have been manipulated). Usually, if the client has the model or expects a certain distribution of outputs, blatant cheating can be detected, but subtle errors or use of a wrong model version would not be evident just from the encrypted output.
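To illustrate the idea of computing on ciphertexts, here is a toy additively homomorphic scheme (textbook Paillier with tiny, insecure demo parameters). It is not FHE – real schemes such as BFV, CKKS, or TFHE also support multiplications and are far more expensive – but it shows how a server can evaluate, say, a linear model’s score on encrypted features without ever decrypting them.

```python
"""Toy additively homomorphic encryption (textbook Paillier, demo-sized keys;
NOT secure, NOT full FHE): a server scores encrypted features blindly."""
import math
import random

# --- keygen (demo primes; real deployments use 2048+ bit moduli) ---
p, q = 1789, 1867
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)                      # valid because we fix g = n + 1

def encrypt(m: int) -> int:
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    u = pow(c, lam, n2)
    return (((u - 1) // n) * mu) % n

def add_cipher(c1: int, c2: int) -> int:      # Enc(a) (+) Enc(b) = Enc(a + b)
    return (c1 * c2) % n2

def scale_cipher(c: int, k: int) -> int:      # k (*) Enc(a) = Enc(k * a)
    return pow(c, k, n2)

# Client encrypts its features; the server computes a weighted sum blindly.
features, weights = [3, 7, 2], [5, 1, 4]
enc = [encrypt(x) for x in features]
enc_score = encrypt(0)
for c, w in zip(enc, weights):
    enc_score = add_cipher(enc_score, scale_cipher(c, w))
assert decrypt(enc_score) == sum(x * w for x, w in zip(features, weights))  # = 30
print("encrypted linear-model score decrypts to", decrypt(enc_score))
```

Note that nothing here proves the server used the right weights; that correctness gap is exactly what the following paragraphs discuss.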
Trade-offs in performance: FHE is notoriously heavy in computation. Running deep learning inference under FHE incurs orders-of-magnitude slowdown. Early experiments (e.g., CryptoNets in 2016) took tens of seconds to evaluate a tiny CNN on encrypted data. By 2024, improvements like CKKS (for approximate arithmetic) and better libraries (Microsoft SEAL, Zama’s Concrete) have reduced this overhead, but it remains large. For example, a user reported that using Zama’s Concrete-ML to run a CIFAR-10 classifier took 25–30 minutes per inference on their hardware. After optimizations, Zama’s team achieved ~40 seconds for that inference on a 192-core server. Even 40s is extremely slow compared to a plaintext inference (which might be 0.01s), showing a ~$10^3$–$10^4\times$ overhead. Larger models or higher precision increase the cost further. Additionally, FHE operations consume a lot of memory and require occasional bootstrapping (a noise-reduction step) which is computationally expensive. In summary, scalability is a major issue – state-of-the-art FHE might handle a small CNN or simple logistic regression, but scaling to large CNNs or Transformers is beyond current practical limits.
Privacy advantages: FHE’s big appeal is data privacy. The input can remain completely encrypted throughout the process. This means an untrusted server can compute on a client’s private data without learning anything about it. Conversely, if the model is sensitive (proprietary), one could envisage encrypting the model parameters and having the client perform FHE inference on their side – but this is less common because if the client has to do the heavy FHE compute, it negates the idea of offloading to a powerful server. Typically, the model is public or held by server in the clear, and the data is encrypted by the client’s key. Model privacy in that scenario is not provided by default (the server knows the model; the client learns outputs but not weights). There are more exotic setups (like secure two-party computation or multi-key FHE) where both model and data can be kept private from each other, but those incur even more complexity. In contrast, zkML via ZKPs can ensure model privacy and data privacy at once – the prover can have both the model and data as secret witness, only revealing what’s needed to the verifier.
No on-chain verification needed (and none possible): With FHE, the result comes out encrypted to the client. The client then decrypts it to obtain the actual prediction. If we want to use that result on-chain, the client (or whoever holds the decryption key) would have to publish the plaintext result and convince others it’s correct. But at that point, trust is back in the loop – unless combined with a ZKP. In principle, one could combine FHE and ZKP: e.g., use FHE to keep data private during compute, and then generate a ZK-proof that the plaintext result corresponds to a correct computation. However, combining them means you pay the performance penalty of FHE and ZKP – extremely impractical with today’s tech. So, in practice FHE-of-ML and zkML serve different use cases:
- FHE-of-ML: Ideal when the goal is confidentiality between two parties (client and server). For instance, a cloud service can host an ML model and users can query it with their sensitive data without revealing the data to the cloud (and if the model is sensitive, perhaps deploy it via FHE-friendly encodings). This is great for privacy-preserving ML services (medical predictions, etc.). The user still has to trust the service to faithfully run the model (since no proof), but at least any data leakage is prevented. Some projects like Zama are even exploring an “FHE-enabled EVM (fhEVM)” where smart contracts could operate on encrypted inputs, but verifying those computations on-chain would require the contract to somehow enforce correct computation – an open challenge likely requiring ZK proofs or specialized secure hardware.
- zkML (ZKPs): Ideal when the goal is verifiability and public auditability. If you want anyone (or any contract) to be sure that “Model $M$ was evaluated correctly on $X$ and produced $Y$”, ZKPs are the solution. They also provide privacy as a bonus (you can hide $X$ or $Y$ or $M$ if needed by treating them as private inputs to the proof), but their primary feature is the proof of correct execution.
A complementary relationship: It’s worth noting that ZKPs protect the verifier (they learn nothing about secrets, only that the computation was correctly done), whereas FHE protects the prover’s data from the computing party. In some scenarios, these could be combined – for example, a network of untrusted nodes could use FHE to compute on users’ private data and then provide ZK proofs to the users (or blockchain) that the computations were done according to the protocol. This would cover both privacy and correctness, but the performance cost is enormous with today’s algorithms. More feasible in the near term are hybrids like Trusted Execution Environments (TEE) plus ZKP or Functional Encryption plus ZKP – these are beyond our scope, but they aim to provide something similar (TEEs keep data/model secret during compute, then a ZKP can attest the TEE did the right thing).
In summary, FHE-of-ML prioritizes confidentiality of inputs/outputs, while zkML prioritizes verifiable correctness (with possible privacy). Table 1 below contrasts the key properties:
Approach | Prover Performance (Inference & Proof) | Proof Size & Verification | Privacy Features | Trusted Setup? | Post-Quantum? |
---|---|---|---|---|---|
zk-SNARK (Halo2, Groth16, PLONK, etc.) | Heavy prover overhead (up to 10^6× normal runtime without optimizations; in practice 10^3–10^5×). Optimized for specific model/circuit; proving time in minutes for medium models, hours for large. Recent zkML SNARKs (DeepProve with GKR) vastly improve this (near-linear overhead, e.g. seconds instead of minutes for million-param models). | Very small proofs (often < 100 KB, sometimes ~a few KB). Verification is fast: a few pairings or polynomial evals (typically < 50 ms on-chain). DeepProve’s GKR-based proofs are larger (tens–hundreds KB) and verify in ~0.5 s (still much faster than re-running the model). | Data confidentiality: Yes – inputs can be private in proof (not revealed). Model privacy: Yes – prover can commit to model weights and not reveal them. Output hiding: Optional – proof can be of a statement without revealing output (e.g. “output has property P”). However, if the output itself is needed on-chain, it typically becomes public. Overall, SNARKs offer full zero-knowledge flexibility (hide whichever parts you want). | Depends on scheme. Groth16 requires a trusted setup per circuit; PLONK/Halo2 (as used by Ezkl) use a universal one-time setup. DeepProve’s sum-check/GKR is transparent (no setup) – a bonus of that design. | Elliptic-curve SNARKs (BN254, BLS12-381) are not PQ-safe (vulnerable to quantum attacks on elliptic-curve discrete log). Some newer SNARKs use PQ-safe commitments, but Halo2/PLONK as used in Ezkl are not PQ-safe. GKR (DeepProve) uses hash commitments (e.g. Poseidon/Merkle), which are conjectured PQ-safe (relying on hash preimage resistance). |
zk-STARK (FRI, hash-based proof) | Prover overhead is high but scales more linearly. Typically 10^2–10^4× slower than native for large tasks, with room to parallelize. General STARK VMs (Risc0, Cairo) saw slower performance vs SNARK for ML in 2024 (e.g. 3×–66× slower than Halo2 in some cases). Specialized STARKs (or GKR) can approach linear overhead and outperform SNARKs for large circuits. | Proofs are larger: often tens of KB (growing with circuit size/log(n)). The verifier performs many hash/Merkle and FRI low-degree checks – verification time is polylogarithmic in the circuit size (e.g. ~50 ms to 500 ms depending on proof size). On-chain, this is costlier (StarkWare’s L1 verifier can take millions of gas per proof). Some STARKs support recursive proofs to compress size, at cost of prover time. | Data & Model privacy: A STARK can be made zero-knowledge by randomizing trace data (adding blinding to polynomial evaluations), so it can hide private inputs similarly to SNARK. Many STARK implementations focus on integrity, but zk-STARK variants do allow privacy. So yes, they can hide inputs/models like SNARKs. Output hiding: likewise possible in theory (prover doesn’t declare the output as public), but rarely used since usually the output is what we want to reveal/verify. | No trusted setup. Transparency is a hallmark of STARKs – only public randomness is required (derivable via Fiat-Shamir). This makes them attractive for open-ended use (any model, any time, no per-model ceremony). | Yes, STARKs rely on hash and information-theoretic security assumptions (like the random oracle model and the difficulty of certain codeword decoding in FRI). These are believed to be secure against quantum adversaries. STARK proofs are thus PQ-resistant, an advantage for future-proofing verifiable AI. |
FHE for ML (Fully Homomorphic Encryption applied to inference) | Prover = party doing computation on encrypted data. The computation time is extremely high: 10^3–10^5× slower than plaintext inference is common. High-end hardware (many-core servers, FPGA, etc.) can mitigate this. Some optimizations (low-precision inference, leveled FHE parameters) can reduce overhead but there is a fundamental performance hit. FHE is currently practical for small models or simple linear models; deep networks remain challenging beyond toy sizes. | No proof generated. The result is an encrypted output. Verification in the sense of checking correctness is not provided by FHE alone – one trusts the computing party to not cheat. (If combined with secure hardware, one might get an attestation; otherwise, a malicious server could return an incorrect encrypted result that the client would decrypt to wrong output without knowing the difference). | Data confidentiality: Yes – the input is encrypted, so the computing party learns nothing about it. Model privacy: If the model owner is doing the compute on encrypted input, the model is in plaintext on their side (not protected). If roles are reversed (client holds model encrypted and server computes), model could be kept encrypted, but this scenario is less common. There are techniques like secure two-party ML that combine FHE/MPC to protect both, but these go beyond plain FHE. Output hiding: By default, the output of the computation is encrypted (only decryptable by the party with the secret key, usually the input owner). So the output is hidden from the computing server. If we want the output public, the client can decrypt and reveal it. | No setup needed. Each user generates their own key pair for encryption. Trust relies on keys remaining secret. | The security of FHE schemes (e.g. BFV, CKKS, TFHE) is based on lattice problems (Learning With Errors), which are believed to be resistant to quantum attacks (at least no efficient quantum algorithm is known). So FHE is generally considered post-quantum secure. |
Table 1: Comparison of zk-SNARK, zk-STARK, and FHE approaches for machine learning inference (performance and privacy trade-offs).
Use Cases and Implications for Web3 Applications
The convergence of AI and blockchain via zkML unlocks powerful new application patterns in Web3:
- Decentralized Autonomous Agents & On-Chain Decision-Making: Smart contracts or DAOs can incorporate AI-driven decisions with guarantees of correctness. For example, imagine a DAO that uses a neural network to analyze market conditions before executing trades. With zkML, the DAO’s smart contract can require a zkSNARK proof that the authorized ML model (with a known hash commitment) was run on the latest data and produced the recommended action, before the action is accepted. This prevents malicious actors from injecting a fake prediction – the chain verifies the AI’s computation. Over time, one could even have fully on-chain autonomous agents (contracts that query off-chain AI or contain simplified models) making decisions in DeFi or games, with all their moves proven correct and policy-compliant via zk proofs. This raises the trust in autonomous agents, since their “thinking” is transparent and verifiable rather than a black box. (A minimal sketch of such a verification gate appears after this list.)
- Verifiable Compute Markets: Projects like Lagrange are effectively creating verifiable computation marketplaces – developers can outsource heavy ML inference to a network of provers and get back a proof with the result. This is analogous to decentralized cloud computing, but with built-in trust: you don’t need to trust the server, only the proof. It’s a paradigm shift for oracles and off-chain computation. Oracle networks and decentralized sequencing or compute layers could use this to provide data feeds or analytics feeds with cryptographic guarantees. For instance, an oracle could supply “the result of model X on input Y” and anyone can verify the attached proof on-chain, rather than trusting the oracle’s word. This could enable verifiable AI-as-a-service on blockchain: any contract can request a computation (like “score these credit risks with my private model”) and accept the answer only with a valid proof. Projects such as Gensyn are exploring decentralized training and inference marketplaces using these verification techniques.
- NFTs and Gaming – Provenance and Evolution: In blockchain games or NFT collectibles, zkML can prove traits or game moves were generated by legitimate AI models. For example, a game might allow an AI to evolve an NFT pet’s attributes. Without ZK, a clever user might modify the AI or the outcome to get a superior pet. With zkML, the game can require a proof that “pet’s new stats were computed by the official evolution model on the pet’s old stats”, preventing cheating. Similarly for generative art NFTs: an artist could release a generative model as a commitment; later, when minting NFTs, prove each image was produced by that model given some seed, guaranteeing authenticity (and even doing so without revealing the exact model to the public, preserving the artist’s IP). This provenance verification ensures authenticity in a manner akin to verifiable randomness – except here it’s verifiable creativity.
- Privacy-Preserving AI in Sensitive Domains: zkML allows confirmation of outcomes without exposing inputs. In healthcare, a patient’s data could be run through an AI diagnostic model by a cloud provider; the hospital receives a diagnosis and a proof that the model (which could be privately held by a pharmaceutical company) was run correctly on the patient data. The patient data remains private (only an encrypted or committed form was used in the proof), and the model weights remain proprietary – yet the result is trusted. Regulators or insurance could also verify that only approved models were used. In finance, a company could prove to an auditor or regulator that its risk model was applied to its internal data and produced certain metrics without revealing the underlying sensitive financial data. This enables compliance and oversight with cryptographic assurances rather than manual trust.
- Cross-Chain and Off-Chain Interoperability: Because zero-knowledge proofs are fundamentally portable, zkML can facilitate cross-chain AI results. One chain might have an AI-intensive application running off-chain; it can post a proof of the result to a different blockchain, which will trustlessly accept it. For instance, consider a multi-chain DAO using an AI to aggregate sentiment across social media (off-chain data). The AI analysis (complex NLP on large data) is done off-chain by a service that then posts a proof to a small blockchain (or multiple chains) that “analysis was done correctly and output sentiment score = 0.85”. All chains can verify and use that result in their governance logic, without each needing to rerun the analysis. This kind of interoperable verifiable compute is what Lagrange’s network aims to support, by serving multiple rollups or L1s simultaneously. It removes the need for trusted bridges or oracle assumptions when moving results between chains.
- AI Alignment and Governance: On a more forward-looking note, zkML has been highlighted as a tool for AI governance and safety. Lagrange’s vision statements, for example, argue that as AI systems become more powerful (even superintelligent), cryptographic verification will be essential to ensure they follow agreed rules. By requiring AI models to produce proofs of their reasoning or constraints, humans retain a degree of control – “you cannot trust what you cannot verify”. While this is speculative and involves social as much as technical aspects, the technology could enforce that an AI agent running autonomously still proves it is using an approved model and hasn’t been tampered with. Decentralized AI networks might use on-chain proofs to verify contributions (e.g., a network of nodes collaboratively training a model can prove each update was computed faithfully). Thus zkML could play a role in ensuring AI systems remain accountable to human-defined protocols even in decentralized or uncontrolled environments.
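A minimal sketch of the verification gate described in the first bullet above, in Python: the contract-like object only accepts an action if the accompanying proof refers to the approved model commitment and verifies. The verify_inference_proof function and the public-input layout are hypothetical placeholders, not any real verifier’s API.

```python
"""Hypothetical sketch of an on-chain-style gate that only acts on AI outputs
accompanied by a valid proof. Nothing here is a real verifier API."""
from dataclasses import dataclass

def verify_inference_proof(proof: bytes, public_inputs: dict) -> bool:
    """Placeholder: a real deployment would call a SNARK/STARK verifier here."""
    return False   # reject everything until a real verifier is wired in

@dataclass
class AiGovernedVault:
    approved_model_commitment: str            # hash of the authorized model's weights
    latest_action: str | None = None

    def submit_action(self, action: str, proof: bytes, public_inputs: dict) -> bool:
        # 1. The proof must be about the authorized model, not some substitute.
        if public_inputs.get("model_commitment") != self.approved_model_commitment:
            return False
        # 2. The claimed model output must actually endorse the proposed action.
        if public_inputs.get("recommended_action") != action:
            return False
        # 3. The proof itself must verify (model ran correctly on the stated input).
        if not verify_inference_proof(proof, public_inputs):
            return False
        self.latest_action = action           # only now does the action take effect
        return True

# Usage: the gate refuses actions whose proof does not check out.
vault = AiGovernedVault(approved_model_commitment="0xabc123")      # hypothetical commitment
ok = vault.submit_action(
    action="rebalance-portfolio",
    proof=b"\x00" * 192,                                           # dummy proof bytes
    public_inputs={"model_commitment": "0xabc123",
                   "recommended_action": "rebalance-portfolio",
                   "input_data_hash": "0xdeadbeef"},
)
print("action accepted?", ok)   # False until a real verifier is plugged in
```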
In conclusion, zkML and verifiable on-chain AI represent a convergence of advanced cryptography and machine learning that stands to enhance trust, transparency, and privacy in AI applications. By comparing the major approaches – zk-SNARKs, zk-STARKs, and FHE – we see a spectrum of trade-offs between performance and privacy, each suitable for different scenarios. SNARK-based frameworks like Ezkl and innovations like Lagrange’s DeepProve have made it feasible to prove substantial neural network inferences with practical effort, opening the door to real-world deployments of verifiable AI. STARK-based and VM-based approaches promise greater flexibility and post-quantum security, which will become important as the field matures. FHE, while not a solution for verifiability, addresses the complementary need of confidential ML computation, and in combination with ZKPs or in specific private contexts it can empower users to leverage AI without sacrificing data privacy.
The implications for Web3 are significant: we can foresee smart contracts reacting to AI predictions, knowing they are correct; markets for compute where results are trustlessly sold; digital identities (like Worldcoin’s proof-of-personhood via iris AI) protected by zkML to confirm someone is human without leaking their biometric image; and generally a new class of “provable intelligence” that enriches blockchain applications. Many challenges remain – performance for very large models, developer ergonomics, and the need for specialized hardware – but the trajectory is clear. As one report noted, “today’s ZKPs can support small models, but moderate to large models break the paradigm”; however, rapid advances (50×–150× speedups with DeepProve over prior art) are pushing that boundary outward. With ongoing research (e.g., on hardware acceleration and distributed proving), we can expect progressively larger and more complex AI models to become provable. zkML might soon evolve from niche demos to an essential component of trusted AI infrastructure, ensuring that as AI becomes ubiquitous, it does so in a way that is auditable, decentralized, and aligned with user privacy and security.