7 posts tagged with "cryptographic proofs"

Cryptographic proof systems

Gensyn's Judge Tackles AI's Biggest Trust Gap: Who Evaluates the Evaluators?

March 27, 2026 · 9 min read

Software Engineer

GPT-4 disagrees with itself 40% of the time when asked to judge the same response twice. Bard hallucinated 91% of its references in medical systematic reviews. And the benchmarks meant to keep AI honest? Models are increasingly optimized to game them. The entire AI evaluation stack — the infrastructure that tells us whether a model is good, safe, or truthful — rests on foundations that are opaque, non-reproducible, and silently shifting under our feet.

Gensyn, the decentralized machine-learning protocol backed by $50 million from a16z crypto, CoinFund, and Protocol Labs, thinks it has a structural fix. Its new system, called Judge, brings cryptographically verifiable AI evaluation to production — replacing black-box API calls with deterministic, challengeable, on-chain proofs of model quality. If it works at scale, it could reshape how the AI industry establishes trust.

AgentKit: Bridging the Trust Gap in Agentic Commerce

March 20, 2026 · 9 min read

Dora Noda

Software Engineer

When an AI agent books a restaurant, buys concert tickets, or negotiates a price on your behalf, the website on the other end faces a question it has never had to ask before: is there actually a human behind this software?

On March 17, 2026, Sam Altman's World and Coinbase answered with AgentKit — a developer toolkit that lets AI agents carry cryptographic proof of human backing, embedded directly into the payment layer of the internet.

The timing is no accident. McKinsey projects agentic commerce — transactions initiated and completed by autonomous AI programs — could reach $3 trillion to $5 trillion globally by 2030. Morgan Stanley estimates $190 billion to $385 billion in U.S. e-commerce spending alone will flow through AI agents by the end of the decade. But as these agents multiply, so does the attack surface. One person running a thousand bots to scalp tickets, drain limited inventory, or game loyalty programs looks identical to a thousand legitimate customers — unless you can verify the humans behind the machines.

The ZK-ML Revolution: How Cryptographic Proofs Are Reinventing DeFi Risk Assessment

March 12, 2026 · 14 min read

Dora Noda

Software Engineer

When a DeFi lending protocol liquidates a position, how can you be certain the risk calculation was correct? What if the model was flawed, manipulated, or simply opaque? For years, DeFi has operated on a paradox: protocols demand transparency for on-chain execution, yet the AI models making critical risk decisions remain black boxes. Zero-knowledge machine learning (ZK-ML) is finally solving this trust gap—and the implications for institutional DeFi adoption in 2026 are profound.

The Trust Crisis in DeFi Risk Models

DeFi's explosive growth to over $50 billion in total value locked has created a new problem: institutional capital demands verifiable risk assessments, but current solutions force an unacceptable trade-off between transparency and confidentiality.

Traditional oracle-based risk systems expose protocols to three critical vulnerabilities. First, latency kills capital efficiency. In high-volatility events, slow or inaccurate price feeds prevent lending protocols from liquidating positions in time, leading to bad debt cascades. Legacy push-based oracles force protocols to use conservative loan-to-value ratios—typically 50-70%—to compensate for update delays, directly reducing borrower capital efficiency.

Second, manipulation remains endemic. Without cryptographic verification of how risk scores are calculated, protocols rely on trust in centralized data providers. A compromised oracle can trigger false liquidations or, worse, allow undercollateralized positions to persist until systemic failure.

Third, proprietary models create regulatory nightmares. Institutional participants need to prove their risk assessments are sound without exposing proprietary algorithms. Banks can't deploy lending protocols where risk logic is fully public, yet regulators won't accept opaque "trust us" systems. This regulatory catch-22 has stalled institutional DeFi integration.

The numbers tell the story: DeFi liquidation events in 2025 resulted in over $2.3 billion in cascading losses, with 40% attributed to oracle latency and manipulation vulnerabilities. Institutional participants are waiting on the sidelines—not because they doubt blockchain's potential, but because they can't accept the current risk infrastructure.

Enter Zero-Knowledge Machine Learning

ZK-ML represents a paradigm shift: it enables AI-generated risk assessments to be cryptographically verified without revealing underlying data or model parameters. Think of it as a mathematical proof that says, "This liquidation forecast was computed correctly using our proprietary model and your encrypted data"—without exposing either.

The technology works by converting machine learning inference into zero-knowledge proofs. When a DeFi protocol needs to assess liquidation risk, the ZK-ML system:

Runs the AI model on encrypted user data (collateral positions, trading history, wallet behavior)
Generates a cryptographic proof that the computation was performed correctly
Publishes the proof on-chain for anyone to verify, without revealing the model architecture or sensitive user data
Triggers smart contract actions (like liquidations) based on verifiably correct risk scores

This isn't theoretical. Projects like EZKL, Modulus Labs, and Gensyn are already demonstrating production-grade ZK-ML frameworks. EZKL's recent benchmarks show verification speeds 65.88x faster than earlier ZK systems, with support for models up to 18 million parameters. Modulus Labs proved on-chain inference of complex neural networks, while Gensyn is building decentralized training infrastructure with built-in verification.

The real-world impact is already visible. ORA's Marine liquidation system uses zkOracle-based implementations to perform trustless liquidations on Compound Finance. By introducing zero-latency oracle updates that trigger exactly when liquidations become possible, Marine enables lending protocols to offer higher LTV ratios—up to 85-90%—while maintaining safety margins that would be reckless with legacy oracles.

Privacy-Preserving Credit Scoring: The Institutional Unlock

For institutional DeFi adoption, credit scoring is the Holy Grail. Traditional finance relies on FICO scores and credit bureaus, but these systems are fundamentally incompatible with blockchain's pseudonymous design. How do you assess creditworthiness without KYC? How do you prove a borrower's repayment history without exposing their transaction graph?

ZK-ML solves this through privacy-preserving credit scoring. Research from IEEE and Springer demonstrates complete credit score systems using blockchain and zero-knowledge proofs. The architecture works by:

Encrypting credit data across multiple DeFi protocols (repayment history, liquidation events, wallet age, transaction patterns)
Running ML credit models on this encrypted data using homomorphic encryption or secure multi-party computation
Generating zero-knowledge proofs that a specific wallet address has a certain credit score range, without revealing which protocols contributed data or the wallet's full history
Creating portable on-chain attestations that let users carry their verified creditworthiness across platforms

This isn't just privacy theater—it's regulatory necessity. A recent study published in Science Direct demonstrated that blockchain-based verification layers with cryptographic Proof-of-SQL mechanisms enable institutions to validate borrower credentials while maintaining GDPR compliance. The VeriNet framework achieved this in both deepfake detection and fintech credit scoring applications, proving the approach works at scale.

The business case is compelling: institutional lenders can now deploy capital in DeFi lending pools with verifiable risk segmentation. Instead of treating all anonymous borrowers as high-risk (and charging 15-25% APY to compensate), protocols can offer differentiated rates—8% for verified low-risk wallets, 12% for medium-risk, 20% for high-risk—all while maintaining user privacy and regulatory compliance.

ZK-ML vs. Traditional Oracles: The Performance Gap

The speed advantage of ZK-ML over legacy oracle systems is staggering. Traditional price oracles update on anything from a per-second to an hourly cadence depending on the implementation: Chainlink feeds, for example, typically write a new price when it deviates by a set threshold (on the order of 1-3%) or when an hourly heartbeat elapses, whichever comes first. During the March 2024 volatility spike, Ethereum gas prices rose to 500+ gwei, causing oracle update delays of 10-15 minutes.
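The deviation-or-heartbeat rule used by push-based feeds can be sketched in a few lines. This is a minimal illustration rather than any oracle network's actual code: the `FeedState` type, the `should_update` function, and the 1% and 3600-second defaults are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class FeedState:
    last_price: float     # last price written on-chain
    last_update_ts: int   # unix seconds of that write


def should_update(state: FeedState, observed_price: float, now_ts: int,
                  deviation_threshold: float = 0.01,
                  heartbeat_s: int = 3600) -> bool:
    """Return True when a push-based feed would write a new price."""
    # Relative move since the last on-chain value.
    deviation = abs(observed_price - state.last_price) / state.last_price
    # True once the heartbeat interval has fully elapsed.
    stale = now_ts - state.last_update_ts >= heartbeat_s
    return deviation >= deviation_threshold or stale


state = FeedState(last_price=2000.0, last_update_ts=0)
print(should_update(state, 2010.0, now_ts=600))   # 0.5% move, 10 minutes old
print(should_update(state, 2025.0, now_ts=600))   # 1.25% move
print(should_update(state, 2001.0, now_ts=3600))  # heartbeat elapsed
```

Between triggers the on-chain price can lag the market by up to the deviation threshold, which is one reason protocols relying on such feeds keep LTV ratios conservative.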

ZK-ML systems eliminate this latency by computing risk assessments on-demand with cryptographic proof generation taking 100-500 milliseconds for typical DeFi risk models. Marine's zkOracle implementation demonstrated this in production: liquidations triggered within 1-2 blocks of positions becoming undercollateralized, versus 10-50 blocks for oracle-dependent systems.

The capital efficiency gains are measurable. Conservative estimates suggest ZK-ML-enabled lending protocols can safely increase LTV ratios by 15-20 percentage points. For a $1 billion TVL protocol, this translates to $150-200 million in additional borrowing capacity, which at the 8-20% borrowing rates discussed above is worth roughly $12-40 million in annual interest revenue that legacy infrastructure leaves on the table.
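As a rough check on that arithmetic, the sketch below works through the numbers. It assumes the full $1 billion TVL is collateral, that the added capacity is fully borrowed, and that borrowers pay somewhere in the 8-20% APY range quoted earlier for risk-tiered lending. The function names are illustrative only.

```python
def added_borrow_capacity(tvl_usd: float, ltv_gain_pp: float) -> float:
    """Extra borrowing unlocked by raising LTV by `ltv_gain_pp` percentage points."""
    return tvl_usd * ltv_gain_pp / 100


def annual_interest(borrowed_usd: float, apy_pct: float) -> float:
    """Simple (non-compounding) interest earned over one year."""
    return borrowed_usd * apy_pct / 100


tvl = 1_000_000_000
for gain_pp in (15, 20):
    extra = added_borrow_capacity(tvl, gain_pp)
    low, high = annual_interest(extra, 8), annual_interest(extra, 20)
    print(f"+{gain_pp} pp LTV: ${extra / 1e6:.0f}M capacity, "
          f"${low / 1e6:.0f}M-${high / 1e6:.0f}M interest per year")
```

Under these assumptions the added capacity is $150-200 million and the interest it earns is on the order of $12-40 million per year.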

Beyond speed, ZK-ML offers manipulation resistance that oracles can't match. Traditional price feeds can be spoofed through flash loan attacks, validator collusion, or API key compromises. ZK-ML risk models are evaluated off-chain, but every result carries a proof that is verified on-chain and covers the entire computation. An attacker would need to break the underlying zero-knowledge proof system (which would require breaking core cryptographic assumptions like discrete logarithm hardness) rather than just compromising a single oracle feed.

The Financial Stability Board's 2023 report on DeFi risks explicitly identified oracle manipulation as a systemic vulnerability. ZK-ML directly addresses this: when liquidation decisions are based on cryptographically proven risk models rather than trust-based price feeds, the attack surface shrinks by orders of magnitude.

Why Institutions Need Transparent Yet Confidential Models

The institutional DeFi adoption bottleneck isn't technology—it's trust infrastructure. When J.P. Morgan or State Street evaluate DeFi lending protocols, their due diligence teams ask: "How do you calculate liquidation risk?" "Can we audit your model?" "How do you prevent gaming?"

With traditional DeFi protocols, the answers are unsatisfying:

Fully transparent models: Open-source risk logic means competitors can front-run liquidations, market makers can game the system, and proprietary competitive advantages evaporate
Black-box models: Institutional compliance teams reject systems where risk calculations can't be audited
Oracle dependency: Reliance on external price feeds introduces counterparty risk that banks can't accept

ZK-ML breaks this impasse. Institutions can now deploy protocols with selectively transparent risk models:

Auditable verification: Regulators and auditors can verify that liquidation decisions follow the claimed algorithm, without seeing proprietary parameters
Competitive protection: Model architecture and training data remain confidential, preserving competitive advantages
On-chain accountability: Every risk decision generates an immutable cryptographic proof, creating perfect audit trails for compliance
Cross-protocol portability: Users can prove creditworthiness without revealing which protocols they've used

The regulatory implications are profound. The Enterprise Ethereum Alliance's DeFi Risk Assessment Guidelines (Version 1) explicitly call for "verifiable computation frameworks that preserve confidentiality while enabling audit." ZK-ML is one of the few technologies positioned to meet this specification.

Georgetown's recent policy paper on institutional DeFi integration identified the compliance challenge: "Rather than retrofitting traditional financial regulation onto intermediary-less systems, emerging solutions embed compliance capabilities directly into DeFi infrastructure." ZK-ML does exactly this—it's compliance-native architecture, not a bolted-on afterthought.

The 2026 Breakout: From Theory to Production

The inflection point is here. While ZK-ML concepts have existed since 2021, practical implementations are only now reaching production maturity. The evidence:

Infrastructure maturation: EZKL demonstrated support for attention mechanisms—barely feasible in 2024, now optimized for production use. Modulus Labs proved on-chain inference for 18 million parameter models, crossing the threshold where real-world credit models become viable.

Capital deployment: Gensyn raised significant funding to build decentralized AI training with cryptographic verification. Institutions aren't funding research projects—they're funding production infrastructure.

Ecosystem integration: Zero-knowledge proof technology has moved from cryptography research to blockchain-scale applications. Chainalysis and TRM Labs are building ZK-compatible compliance tools. The infrastructure layer is maturing.

Developer tooling: The barrier to implementing ZK-ML has collapsed. What required cryptography PhDs in 2023 can now be implemented by standard blockchain developers using EZKL, Modulus, or emerging frameworks. When developers can ship ZK-ML systems in weeks instead of years, adoption accelerates exponentially.

The trajectory mirrors DeFi's own evolution. In 2020, DeFi was a research curiosity with $1 billion TVL. By 2021, infrastructure matured and TVL exploded 50x to $50 billion. ZK-ML is tracking the same curve—2024 was research and proofs-of-concept, 2025 saw first production deployments, and 2026 is the breakout year.

Market signals confirm this. The PayFi sector (programmable payment infrastructure) reached $2.27 billion market cap with $148 million daily volume. Institutions are rotating capital from speculative DeFi to revenue-generating payment infrastructure—and they're demanding the risk management tools to make that capital deployment safe. ZK-ML is the missing piece.

The Road Ahead: Challenges and Opportunities

Despite the momentum, ZK-ML faces real technical and adoption hurdles. Computational overhead remains significant: generating zero-knowledge proofs for complex ML models requires 10-1000x more computation than standard inference. EZKL's 65x speedup over earlier ZK systems is impressive, but proving is still far slower than native execution. At a 65x overhead, for example, a risk calculation that takes 10ms natively requires 650ms with ZK proofs.

For liquidation systems that act on block-time scales of seconds, this latency is acceptable. For high-frequency trading where microseconds matter, or for real-time applications requiring thousands of inferences per second, current ZK-ML systems struggle. The industry needs another 5-10x performance improvement before ZK-ML becomes viable for all DeFi use cases.
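To see why throughput is the harder constraint, here is a back-of-the-envelope sketch using the 650 ms proving time from the example above. It assumes each prover handles one proof at a time with no batching or aggregation, which real systems may improve on. The function name is hypothetical.

```python
import math


def provers_needed(target_proofs_per_s: int, proof_latency_ms: int) -> int:
    """Parallel provers required if each one produces a single proof at a time."""
    return math.ceil(target_proofs_per_s * proof_latency_ms / 1000)


print(provers_needed(1, 650))      # one proof per second
print(provers_needed(1000, 650))   # a thousand proofs per second
print(provers_needed(1000, 65))    # same load after a 10x speedup
```

At 650 ms per proof, sustaining 1,000 proofs per second takes about 650 provers running in parallel. A 10x faster prover brings that down to 65.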

Model complexity limits are real. While Modulus Labs demonstrated 18 million parameters, cutting-edge AI models now exceed 100 billion parameters, and the largest are reported to reach into the trillions. Current ZK-ML systems can't prove computations at that scale. For DeFi risk models, typically 1-50 million parameters, this isn't a blocker. But for frontier AI applications, ZK-ML needs fundamental algorithmic breakthroughs.

Standardization remains fragmented. EZKL, Modulus, Gensyn, and Worldcoin's Orion all use different proof systems, circuit designs, and verification mechanisms. This fragmentation creates integration nightmares: a DeFi protocol using EZKL proofs can't easily verify Modulus-generated credit scores without running multiple verification systems.

The industry needs ZK-ML standards similar to how ERC-20 standardized tokens or EIP-1559 standardized gas fees. The Enterprise Ethereum Alliance is working on this, but comprehensive standards won't arrive until late 2026 or 2027.

Yet the opportunities dwarf these challenges. Cross-chain credit scoring becomes possible when ZK proofs can attest to wallet behavior across multiple blockchains without revealing the underlying transaction graph. A user could prove "I have never been liquidated across Ethereum, Polygon, and Arbitrum" with a single cryptographic proof.

Automated risk-based lending transforms from concept to reality. Imagine depositing collateral into a DeFi protocol and instantly receiving a credit line calibrated to your verifiable on-chain history—no manual approval, no centralized credit bureau, just math and cryptography.

Regulatory compliance automation becomes tractable. Instead of hiring compliance teams to manually review DeFi transactions, institutions deploy ZK-ML systems that cryptographically prove AML/KYC compliance without revealing user identities to the blockchain.

The vision is a financial system that's simultaneously more transparent (every decision is verifiably correct) and more private (sensitive data never leaves encrypted form) than anything possible in traditional finance or current DeFi.

Why This Matters Beyond DeFi

The implications extend far beyond lending protocols and liquidations. Any system requiring verifiable AI decisions with privacy preservation becomes a ZK-ML use case:

Healthcare AI: Prove a diagnosis was made correctly without revealing patient records
Supply chain: Verify ESG compliance through ML audits without exposing proprietary supplier networks
Insurance: Calculate premiums using AI risk models while keeping policyholder data confidential
Voting systems: Use ML to detect fraudulent ballots while preserving voter privacy

But DeFi is the proving ground. It has the economic incentives (billions in TVL at risk), the technical sophistication (cryptography-native developers), and the regulatory pressure (institutional adoption depends on it) to drive ZK-ML from research to production.

When ZK-ML becomes standard infrastructure in DeFi lending—expected by Q4 2026 based on current development velocity—the technology will be production-tested and ready for deployment across every sector where trustworthy AI matters.

The Bottom Line

Zero-knowledge machine learning isn't just a technical upgrade—it's the trust infrastructure that institutional DeFi has been waiting for. By enabling cryptographically verifiable risk assessments that preserve both proprietary model confidentiality and user privacy, ZK-ML solves the regulatory paradox that has stalled billions in institutional capital.

The timeline is clear: 2024 was research, 2025 saw first production deployments, and 2026 is the breakout year. With frameworks like EZKL achieving 65x performance improvements, protocols like Marine demonstrating zero-latency liquidations, and institutional demand crystallizing around compliant risk infrastructure, the conditions for explosive adoption are aligned.

For DeFi protocols, the strategic question isn't whether to adopt ZK-ML—it's whether to lead the transition or watch competitors capture the institutional capital that comes with verifiable, privacy-preserving risk management. For institutions evaluating DeFi exposure, ZK-ML-enabled protocols represent the first generation of blockchain-based finance that meets the compliance, auditability, and risk management standards that fiduciary duty demands.

The risk assessment revolution is here. The only question is who builds it first.

Filecoin's Onchain Cloud Transformation: From Cold Storage to Programmable Infrastructure

February 26, 2026 · 11 min read

Dora Noda

Software Engineer

While AWS charges $23 per terabyte monthly for standard storage, Filecoin costs $0.19 for the same capacity. But cost alone never wins infrastructure wars. The real question is whether decentralized storage can match centralized cloud providers in the metrics that actually matter: speed, reliability, and developer experience. On November 18, 2025, Filecoin made its answer clear with the launch of Onchain Cloud—a fundamental transformation that turns 2.1 exbibytes of archival storage into programmable, verifiable infrastructure designed for AI workloads and real-time applications.

This isn't incremental improvement. It's Filecoin's pivot from "blockchain storage network" to "decentralized cloud platform," complete with automated payments, cryptographic verification, and performance guarantees. After months of testing with over 100 developer teams, the mainnet launched in January 2026, positioning Filecoin to capture a meaningful share of the $12 billion AI infrastructure market.

The Onchain Cloud Architecture: Three Pillars of Programmable Storage

Filecoin Onchain Cloud introduces three core services that collectively enable developers to build on verifiable, decentralized infrastructure without the complexity traditionally associated with blockchain storage.

Filecoin Warm Storage Service keeps data online and provably available through continuous onchain proofs. Unlike cold archival storage that requires retrieval delays, warm storage maintains data in an accessible state while still leveraging Filecoin's cryptographic verification. This addresses the primary limitation that kept Filecoin confined to backup and archival use cases—data wasn't fast enough for active workloads.

Filecoin Pay automates usage-based payments through smart contracts, settling transactions only when delivery is confirmed onchain. This is fundamental infrastructure for pay-as-you-go cloud services: payments flow automatically as services are proven, eliminating manual invoicing, credit systems, and trust assumptions. Thousands of payment channels have already processed transactions through the testnet phase.
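The settle-on-proof idea can be illustrated with a toy model. This is not the Filecoin Pay contract or the Synapse SDK: the `PaymentRail` class, its fields, and the integer price units are hypothetical, and the sketch only shows funds being released for periods whose delivery has been confirmed.

```python
from dataclasses import dataclass


@dataclass
class PaymentRail:
    rate_per_epoch: int        # price units owed per proven epoch
    deposit: int               # funds the client has locked up front
    paid: int = 0              # total released to the provider so far
    settled_through: int = 0   # last epoch that has been paid for

    def settle(self, proven_through: int) -> int:
        """Release payment only for epochs with confirmed proofs."""
        epochs_due = max(0, proven_through - self.settled_through)
        epochs_affordable = (self.deposit - self.paid) // self.rate_per_epoch
        epochs = min(epochs_due, epochs_affordable)
        amount = epochs * self.rate_per_epoch
        self.paid += amount
        self.settled_through += epochs
        return amount


rail = PaymentRail(rate_per_epoch=5, deposit=100)
print(rail.settle(proven_through=4))    # pays for epochs 1-4
print(rail.settle(proven_through=4))    # nothing new proven, nothing paid
print(rail.settle(proven_through=30))   # capped by the remaining deposit
```

The provider is never paid ahead of proof, and the client never pays more than the locked deposit.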

Filecoin Beam enables measured, incentivized data retrievals with performance-based incentives. Storage providers compete not just on storage capacity but on retrieval speed and reliability. This creates a retrieval market where providers are rewarded for performance, directly addressing the historical weakness of decentralized storage: unpredictable retrieval times.

Developers access these services through the Synapse SDK, which abstracts the complexity of direct Filecoin protocol interaction. Early integrations come from the ERC-8004 community, Ethereum Name Service (ENS), KYVE, Monad, Safe, Akave, and Storacha—projects that need verifiable storage for everything from blockchain state to decentralized identity.

Cryptographic Proofs: The Technical Foundation of Verifiable Storage

What differentiates Filecoin from centralized cloud providers isn't just decentralization—it's cryptographic proof that storage commitments are being honored. This matters for AI training datasets that need provenance guarantees, compliance-heavy industries that require audit trails, and any application where data integrity is non-negotiable.

Proof-of-Replication (PoRep) generates a unique copy of a sector's original data through a computationally intensive sealing process. This proves that a storage provider is storing a physically unique copy of the client's data, not just pretending to store it or storing a single copy for multiple clients. The sealed sector undergoes slow encoding, making it infeasible for dishonest providers to regenerate data on-demand to fake storage.

The sealing process produces a Multi-SNARK proof and a set of commitments (CommR) that link the sealed sector to the original unsealed data. These commitments are publicly verifiable on the blockchain, creating an immutable record of storage deals.

Proof-of-Spacetime (PoSt) proves continuous storage over time through regular cryptographic challenges. Storage providers face a 30-minute deadline to respond to WindowPoSt challenges by submitting zk-SNARK proofs that verify they still possess the exact bytes they committed to storing. This happens continuously—not just at the initiation of a storage deal, but throughout its entire duration.

The verification process randomly selects leaf nodes from the encoded replica and runs Merkle inclusion proofs to show that the provider has the specific bytes that should be there. Providers then use the privately stored CommRLast to prove they know a root for the replica that both agrees with the inclusion proofs and can derive the publicly-known CommR. The final stage compresses these proofs into a single zk-SNARK for efficient onchain verification.
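A Merkle inclusion proof of the kind described above can be sketched as follows. This is a simplified binary SHA-256 tree over a power-of-two number of leaves. Filecoin's actual trees, hash functions, and the final SNARK compression differ, and the function names here are illustrative.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Return every tree level, leaf hashes first and the root level last."""
    n = len(leaves)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch needs a power-of-two number of leaves")
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def prove(levels: list[list[bytes]], index: int) -> list[bytes]:
    """Collect the sibling hash at each level on the path from leaf to root."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])   # sibling sits next to us in the pair
        index //= 2
    return proof


def verify(leaf: bytes, index: int, proof: list[bytes], root: bytes) -> bool:
    """Recompute the root from the leaf and its sibling path."""
    node = h(leaf)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root


leaves = [f"chunk-{i}".encode() for i in range(8)]
levels = build_levels(leaves)
root = levels[-1][0]

challenged = 5                       # stand-in for a randomly selected leaf
proof = prove(levels, challenged)
print(verify(leaves[challenged], challenged, proof, root))   # True
print(verify(b"tampered", challenged, proof, root))          # False
```

The verifier needs only the root, the challenged leaf, and a logarithmic number of sibling hashes, which is what makes spot-checking a large replica cheap.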

Failure to submit WindowPoSt proofs within the 30-minute window triggers slashing: the storage provider loses a portion of their collateral (burned to the f099 address), and their storage power is reduced. This creates economic consequences for storage failures, aligning provider incentives with network reliability.
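The 30-minute window fits into a larger schedule. The sketch below assumes the parameters as generally documented for Filecoin: 30-second epochs and a 24-hour proving period split into 48 deadlines of 60 epochs each. The helper names are hypothetical, and the penalty logic itself is left out.

```python
from typing import Optional

EPOCH_SECONDS = 30
PROVING_PERIOD_EPOCHS = 2880   # 24 hours of 30-second epochs
DEADLINE_EPOCHS = 60           # one 30-minute window
DEADLINES_PER_PERIOD = PROVING_PERIOD_EPOCHS // DEADLINE_EPOCHS


def deadline_index(epoch: int, period_start: int) -> int:
    """Which 30-minute deadline an epoch falls into within its proving period."""
    offset = (epoch - period_start) % PROVING_PERIOD_EPOCHS
    return offset // DEADLINE_EPOCHS


def deadline_window(index: int, period_start: int) -> range:
    """Epochs during which the proof for deadline `index` is accepted."""
    open_epoch = period_start + index * DEADLINE_EPOCHS
    return range(open_epoch, open_epoch + DEADLINE_EPOCHS)


def missed(index: int, period_start: int, proof_epoch: Optional[int]) -> bool:
    """True when no proof arrived inside the assigned window."""
    return proof_epoch is None or proof_epoch not in deadline_window(index, period_start)


print(DEADLINES_PER_PERIOD)                   # 48
print(DEADLINE_EPOCHS * EPOCH_SECONDS // 60)  # 30 (minutes)
print(missed(2, 0, proof_epoch=125))          # False: inside epochs 120-179
print(missed(2, 0, proof_epoch=180))          # True: one epoch too late
```

Each group of sectors is assigned to one of these deadlines, so a provider's proving work is spread across the day rather than concentrated at a single moment.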

This two-layer proof system (PoRep for initial verification, PoSt for continuous validation) creates verifiable storage that centralized clouds simply cannot offer. When AWS says they're storing your data, you trust their infrastructure and legal agreements. When Filecoin says it, you have a cryptographic proof that is renewed on a fixed schedule: each sector must be re-proven once per 24-hour proving period, inside its assigned 30-minute window.

AI Infrastructure Market: Where Decentralized Storage Meets Real Demand

The timing of Filecoin Onchain Cloud's launch aligns with a fundamental shift in AI infrastructure requirements. As artificial intelligence transitions from research curiosity to production infrastructure reshaping entire industries, the storage needs become clear and massive.

AI models require massive datasets for training. Modern large language models train on hundreds of billions of tokens. Computer vision models need millions of labeled images. Recommendation systems ingest user behavior data at scale. These datasets don't fit in local storage—they need cloud infrastructure. But they also need provenance guarantees: poisoned training data creates poisoned models, and there's no cryptographic way to verify data integrity on AWS.

Continuous data access for inference. Once trained, AI models need constant access to reference data for serving predictions. Retrieval-augmented generation (RAG) systems query knowledge bases to ground language model outputs. Real-time recommendation engines pull user profiles and item catalogs. These aren't one-time retrievals—they're continuous, high-frequency access patterns that demand fast, reliable storage.

Verifiable data provenance to prevent model poisoning. When a financial institution trains a fraud detection model, they need to know the training data wasn't tampered with. When a healthcare AI analyzes patient records, provenance matters for compliance and liability. Filecoin's PoRep and PoSt proofs create an audit trail that centralized storage can't replicate without introducing trusted intermediaries.

Decentralized storage to avoid concentration risks. Relying on a single cloud provider creates systemic risk. AWS outages have taken down significant portions of the internet. Google Cloud disruptions impact millions of services. For AI infrastructure that underpins critical systems, geographic and organizational distribution isn't a philosophical preference—it's a risk management requirement.

Filecoin's network holds 2.1 exbibytes of committed storage with an additional 7.6 EiB of raw capacity available. Network utilization has grown to 36% (up from 32% in Q2 2025), with active stored data near 1,110 petabytes. Around 2,500 datasets were onboarded in 2025, showing steady enterprise adoption.

The economic case is compelling: Filecoin averages $0.19 per terabyte monthly versus AWS's roughly $23 for the same capacity—a 99% cost reduction. But the real value proposition isn't just cheaper storage. It's verifiable storage at scale with programmable infrastructure, delivered through developer-friendly tools.
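The headline percentage follows directly from the two list prices. A quick sketch, where the 100 TB workload is an arbitrary example and retrieval, egress, and request fees are ignored:

```python
AWS_USD_PER_TB_MONTH = 23.00
FILECOIN_USD_PER_TB_MONTH = 0.19


def cost_reduction(baseline: float, alternative: float) -> float:
    """Fractional saving of `alternative` relative to `baseline`."""
    return 1 - alternative / baseline


def annual_cost(usd_per_tb_month: float, terabytes: float) -> float:
    return usd_per_tb_month * terabytes * 12


reduction = cost_reduction(AWS_USD_PER_TB_MONTH, FILECOIN_USD_PER_TB_MONTH)
print(f"{reduction:.1%} cheaper per terabyte")

# Storage-only cost of an example 100 TB dataset held for a year.
print(f"AWS:      ${annual_cost(AWS_USD_PER_TB_MONTH, 100):,.0f}")
print(f"Filecoin: ${annual_cost(FILECOIN_USD_PER_TB_MONTH, 100):,.0f}")
```

The saving works out to about 99.2%, or roughly a 121x price difference, before accounting for any retrieval or caching costs.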

Competing Against Centralized Cloud: Where Filecoin Stands in 2026

The question isn't whether decentralized storage has advantages—verifiable proofs, censorship resistance, cost efficiency are clear. The question is whether those advantages matter enough to overcome the remaining disadvantages: primarily that Filecoin storage and retrieval is still slower and more complex than centralized alternatives.

Performance gap narrowing but not closed. AWS S3 serves reads with millisecond-scale latency, down to single-digit milliseconds on its S3 Express One Zone class. Filecoin Warm Storage and Beam retrievals can't match that yet. But many workloads don't need millisecond latency. AI training runs access large datasets in sequential batch reads. Archival storage for compliance doesn't prioritize speed. Content distribution networks cache frequently accessed data regardless of origin storage speed.

The Onchain Cloud upgrade introduces sub-minute finality for storage commitments, a significant improvement over previous multi-hour sealing times. This doesn't compete with AWS for latency-critical applications, but it opens up new use cases that were previously impractical on Filecoin.

Developer experience improving through abstraction. Direct Filecoin protocol interaction requires understanding sectors, sealing, WindowPoSt challenges, and payment channels—concepts foreign to developers accustomed to AWS's simple API: create bucket, upload object, set permissions. The Synapse SDK abstracts this complexity, providing familiar interfaces while handling cryptographic proof verification in the background.

Early adoption from ENS, KYVE, Monad, and Safe suggests the developer experience has crossed a usability threshold. These aren't blockchain-native storage projects experimenting with Filecoin for ideological reasons—they're infrastructure projects with real storage needs choosing verifiable decentralized storage over centralized alternatives.

Reliability through economic incentives versus contractual SLAs. AWS designs S3 Standard for 99.999999999% (11 nines) durability by replicating data across multiple Availability Zones, and backs availability with contractual service level agreements. Filecoin achieves reliability through economic incentives: storage providers who fail WindowPoSt challenges lose collateral and storage power. This creates different risk profiles, one backed by corporate guarantees, the other by cryptographic proofs and financial penalties.

For applications that need both cryptographic verification and high availability, the optimal architecture likely involves Filecoin for verifiable storage of record plus CDN caching for fast retrieval. This hybrid approach leverages Filecoin's strengths (verifiability, cost, decentralization) while mitigating its weaknesses (retrieval speed) through edge caching.
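One way to picture the hybrid pattern is a read-through cache that checks every origin fetch against the content hash the caller asked for. The sketch below is an assumption-laden illustration: a SHA-256 hex digest stands in for a real content identifier, the origin is an in-memory dictionary rather than a Filecoin retrieval, and `VerifiedCache` is a hypothetical name.

```python
import hashlib
from typing import Callable, Dict


class VerifiedCache:
    """Read-through cache that only stores bytes matching the requested hash."""

    def __init__(self, fetch_origin: Callable[[str], bytes]):
        self._fetch_origin = fetch_origin
        self._cache: Dict[str, bytes] = {}
        self.origin_reads = 0

    def get(self, content_hash: str) -> bytes:
        data = self._cache.get(content_hash)
        if data is None:
            # Slow path: go to the storage of record, then verify before caching.
            data = self._fetch_origin(content_hash)
            self.origin_reads += 1
            if hashlib.sha256(data).hexdigest() != content_hash:
                raise ValueError("origin returned bytes that do not match the hash")
            self._cache[content_hash] = data
        return data


# Stand-in for the slow, verifiable origin.
blob = b"training-shard-0001"
key = hashlib.sha256(blob).hexdigest()
origin = {key: blob}

cache = VerifiedCache(origin.__getitem__)
cache.get(key)   # first read hits the origin
cache.get(key)   # second read is served from the edge cache
print(cache.origin_reads)   # 1
```

Because the key is derived from the content, the cache layer does not need to be trusted: a wrong or tampered response fails the hash check before it is ever served.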

Market positioning: not replacing AWS, but serving different needs. Filecoin isn't going to replace AWS for general-purpose cloud computing. But it doesn't need to. The addressable market is applications where verifiable storage, censorship resistance, or decentralization provide value beyond cost savings: AI training datasets with provenance requirements, blockchain state that needs permanent availability, scientific research data that requires long-term integrity guarantees, compliance-heavy industries that need cryptographic audit trails.

The $12 billion AI infrastructure market represents a subset of total cloud spending where Filecoin's value proposition is strongest. Capturing even 5% of that market would represent $600 million in annual storage demand—meaningful growth from current utilization levels.

From 2.1 EiB to the Future of Verifiable Infrastructure

Filecoin's total committed storage capacity has actually declined through 2025—from 3.8 exbibytes in Q1 to 3.3 EiB in Q2 to 3.0 EiB by Q3—as inefficient storage providers exited following the Network v27 "Golden Week" upgrade. This capacity decline while utilization increased (from 30% to 36%) suggests a maturing market: lower total capacity but higher paid storage as a percentage.
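The utilization figures can be cross-checked against the stored-data number quoted earlier. The sketch assumes that utilization means active data divided by committed capacity, and that the 36% figure applies to the 3.0 EiB reported for Q3.

```python
PIB_PER_EIB = 1024   # binary units: 1 EiB = 1024 PiB


def active_data_pib(committed_eib: float, utilization: float) -> float:
    """Stored data implied by a committed capacity and a utilization rate."""
    return committed_eib * PIB_PER_EIB * utilization


q3 = active_data_pib(3.0, 0.36)
print(f"{q3:,.0f} PiB active, about {q3 / PIB_PER_EIB:.2f} EiB")
```

Under those assumptions the result is roughly 1,106 PiB, in line with the "near 1,110 petabytes" of active data cited above and just over one exbibyte.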

The network expects over 1 exbibyte in paid storage deals by the end of 2025, representing a transition from speculative capacity provisioning to actual customer demand. This matters more than raw capacity numbers—utilization indicates real value delivery, not just miners onboarding storage hoping for future demand.

The Onchain Cloud transformation positions Filecoin for a different growth trajectory: not maximizing total storage capacity, but maximizing storage utilization through services that developers actually need. Warm storage, verifiable retrieval, and automated payments address the barriers that kept Filecoin confined to niche archival use cases.

Early mainnet adoption will be the critical test. Developer teams have tested on testnet, but production deployments with real data and real payments will reveal whether the performance, reliability, and developer experience meet the standards required for infrastructure decisions. The projects already experimenting—ENS for decentralized identity storage, KYVE for blockchain data archives, Safe for multi-signature wallet infrastructure—suggest cautious optimism.

The AI infrastructure market opportunity is real, but not guaranteed. Filecoin faces competition from centralized cloud providers with massive head starts in performance and developer ecosystems, plus decentralized storage competitors like Arweave (permanent storage) and Storj (performance-focused S3 alternative). Winning requires execution: delivering reliability that meets production standards, maintaining competitive pricing as the network scales, and continuing to improve developer tools and documentation.

Filecoin's transformation from "blockchain storage" to "programmable onchain cloud" represents a necessary evolution. The question in 2026 isn't whether decentralized storage has theoretical advantages—it clearly does. The question is whether those advantages translate into developer adoption and customer demand at scale. The cryptographic proofs are in place. The economic incentives are aligned. Now comes the hard part: building a cloud platform that developers trust with production workloads.

BlockEden.xyz provides enterprise-grade infrastructure for blockchain developers building on verifiable foundations. Explore our API marketplace to access the infrastructure you need for applications designed to last.

Gensyn's Judge: How Bitwise-Exact Reproducibility Is Ending the Era of Opaque AI APIs

February 11, 2026 · 18 min read

Dora Noda

Software Engineer

Every time you query ChatGPT, Claude, or Gemini, you're trusting an invisible black box. The model version? Unknown. The exact weights? Proprietary. Whether the output was generated by the model you think you're using, or a silently updated variant? Impossible to verify. For casual users asking about recipes or trivia, this opacity is merely annoying. For high-stakes AI decision-making—financial trading algorithms, medical diagnoses, legal contract analysis—it's a fundamental crisis of trust.

Gensyn's Judge, launched in late 2025 and entering production in 2026, offers a radical alternative: cryptographically verifiable AI evaluation where every inference is reproducible down to the bit. Instead of trusting OpenAI or Anthropic to serve the correct model, Judge enables anyone to verify that a specific, pre-agreed AI model executed deterministically against real-world inputs—with cryptographic proofs ensuring the results can't be faked.

The technical breakthrough is Verde, Gensyn's verification system that eliminates floating-point nondeterminism—the bane of AI reproducibility. By enforcing bitwise-exact computation across devices, Verde ensures that running the same model on an NVIDIA A100 in London and an AMD MI250 in Tokyo yields identical results, provable on-chain. This unlocks verifiable AI for decentralized finance, autonomous agents, and any application where transparency isn't optional—it's existential.

The Opaque API Problem: Trust Without Verification

The AI industry runs on APIs. Developers integrate OpenAI's GPT-4, Anthropic's Claude, or Google's Gemini via REST endpoints, sending prompts and receiving responses. But these APIs are fundamentally opaque:

Version uncertainty: When you call gpt-4, which exact version are you getting? GPT-4-0314? GPT-4-0613? A silently updated variant? Providers frequently deploy patches without public announcements, changing model behavior overnight.

No audit trail: API responses include no cryptographic proof of which model generated them. If OpenAI serves a censored or biased variant for specific geographies or customers, users have no way to detect it.

Silent degradation: Providers can "lobotomize" models to reduce costs—downgrading inference quality while maintaining the same API contract. Users report GPT-4 becoming "dumber" over time, but without transparent versioning, such claims remain anecdotal.

Nondeterministic outputs: Even querying the same model twice with identical inputs can yield different results due to temperature settings, batching, or hardware-level floating-point rounding errors. This makes auditing impossible—how do you verify correctness when outputs aren't reproducible?

For casual applications, these issues are inconveniences. For high-stakes decision-making, they're blockers. Consider:

Algorithmic trading: A hedge fund deploys an AI agent managing $50 million in DeFi positions. The agent relies on GPT-4 to analyze market sentiment from X posts. If the model silently updates mid-trading session, sentiment scores shift unpredictably—triggering unintended liquidations. The fund has no proof the model misbehaved; OpenAI's logs aren't publicly auditable.

Medical diagnostics: A hospital uses an AI model to recommend cancer treatments. Regulations require doctors to document decision-making processes. But if the AI model version can't be verified, the audit trail is incomplete. A malpractice lawsuit could hinge on proving which model generated the recommendation—impossible with opaque APIs.

DAO governance: A decentralized organization uses an AI agent to vote on treasury proposals. Community members demand proof the agent used the approved model—not a tampered variant that favors specific outcomes. Without cryptographic verification, the vote lacks legitimacy.

This is the trust gap Gensyn targets: as AI becomes embedded in critical decision-making, the inability to verify model authenticity and behavior becomes a "fundamental blocker to deploying agentic AI in high-stakes environments."

Judge: The Verifiable AI Evaluation Protocol

Judge solves the opacity problem by executing pre-agreed, deterministic AI models against real-world inputs and committing results to a blockchain where anyone can challenge them. Here's how the protocol works:

1. Model commitment: Participants agree on an AI model—its architecture, weights, and inference configuration. This model is hashed and committed on-chain. The hash serves as a cryptographic fingerprint: any deviation from the agreed model produces a different hash.

2. Deterministic execution: Judge runs the model using Gensyn's Reproducible Runtime, which guarantees bitwise-exact reproducibility across devices. This eliminates floating-point nondeterminism—a critical innovation we'll explore shortly.

3. Public commitment: After inference, Judge posts the output (or a hash of it) on-chain. This creates a permanent, auditable record of what the model produced for a given input.

4. Challenge period: Anyone can challenge the result by re-executing the model independently. If their output differs, they submit a fraud proof. Verde's refereed delegation mechanism pinpoints the exact operator in the computational graph where results diverge.

5. Slashing for fraud: If a challenger proves Judge produced incorrect results, the original executor is penalized (slashing staked tokens). This aligns economic incentives: executors maximize profit by running models correctly.

Judge transforms AI evaluation from "trust the API provider" to "verify the cryptographic proof." The model's behavior is public, auditable, and enforceable—no longer hidden behind proprietary endpoints.
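The commit-and-challenge flow above can be sketched as follows. This is an illustrative model only: the function names, the JSON record layout, and the choice of SHA-256 are assumptions, and none of it reflects Gensyn's actual contract interface or hashing scheme.

```python
import hashlib
import json


def commit_model(weights: bytes, config: dict) -> str:
    """Fingerprint the agreed model: weight bytes plus a canonical JSON config."""
    h = hashlib.sha256()
    h.update(hashlib.sha256(weights).digest())
    h.update(json.dumps(config, sort_keys=True).encode())
    return h.hexdigest()


def commit_output(model_hash: str, prompt: str, output: str) -> str:
    """Bind an output to the model and input that produced it."""
    record = json.dumps(
        {"model": model_hash, "prompt": prompt, "output": output}, sort_keys=True
    )
    return hashlib.sha256(record.encode()).hexdigest()


def is_fraud(posted: str, model_hash: str, prompt: str, rerun_output: str) -> bool:
    """True if a challenger's deterministic re-execution contradicts the posted result."""
    return commit_output(model_hash, prompt, rerun_output) != posted


# Step 1: participants agree on a model and commit its hash.
model_hash = commit_model(b"\x00\x01\x02", {"temperature": 0, "max_tokens": 16})

# Step 3: an executor posts a commitment to its output.
honest_post = commit_output(model_hash, "Is 7 prime?", "yes")
forged_post = commit_output(model_hash, "Is 7 prime?", "no")

# Step 4: a challenger re-runs the model and gets "yes".
honest_challenged = is_fraud(honest_post, model_hash, "Is 7 prime?", "yes")
forged_challenged = is_fraud(forged_post, model_hash, "Is 7 prime?", "yes")
```

The comparison in `is_fraud` is only meaningful because execution is assumed to be bitwise deterministic; with nondeterministic inference, an honest executor and an honest challenger could produce different hashes.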

Verde: Eliminating Floating-Point Nondeterminism

The core technical challenge in verifiable AI is determinism. Neural networks perform billions of floating-point operations during inference. On modern GPUs, these operations aren't perfectly reproducible:

Non-associativity: Floating-point addition isn't associative. (a + b) + c might yield a different result than a + (b + c) due to rounding errors. GPUs parallelize sums across thousands of cores, and the order in which partial sums accumulate varies by hardware and driver version.

Kernel scheduling variability: GPU kernels (like matrix multiplication or attention) can execute in different orders depending on workload, driver optimizations, or hardware architecture. Even running the same model on the same GPU twice can yield different results if kernel scheduling differs.

Batch-size dependency: Research has found that LLM inference is system-level nondeterministic because output depends on batch size. Many kernels (matmul, RMSNorm, attention) change numerical output based on how many samples are processed together—an inference with batch size 1 produces different values than the same input in a batch of 8.

These issues make standard AI models unsuitable for blockchain verification. If two validators re-run the same inference and get slightly different outputs, who's correct? Without determinism, consensus is impossible.
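A three-number example shows how small the discrepancy is and why it still breaks hash-based consensus. The `output_hash` helper is a hypothetical stand-in for whatever digest a validator would compare.

```python
import hashlib
import struct

a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # one accumulation order
right = a + (b + c)   # the same numbers, grouped differently


def output_hash(x: float) -> str:
    """Hash the exact 64-bit representation, as a validator comparing outputs might."""
    return hashlib.sha256(struct.pack("<d", x)).hexdigest()


print(left, right)                               # values differ in the last bit
print(left == right)                             # False
print(output_hash(left) == output_hash(right))   # False
```

The two results differ by roughly one part in 10^16, which is irrelevant to model quality but enough to make the two hashes completely different.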

Verde solves this with RepOps (Reproducible Operators)—a library that eliminates hardware nondeterminism by controlling the order of floating-point operations on all devices. Here's how it works:

Canonical reduction orders: RepOps enforces a deterministic order for summing partial results in operations like matrix multiplication. Instead of letting the GPU scheduler decide, RepOps explicitly specifies: "sum column 0, then column 1, then column 2..." across all hardware. This ensures (a + b) + c is always computed in the same sequence.

Custom CUDA kernels: Gensyn developed optimized kernels that prioritize reproducibility over raw speed. RepOps matrix multiplications incur less than 30% overhead compared to standard cuBLAS—a reasonable trade-off for determinism.

Driver and version pinning: Verde uses version-pinned GPU drivers and canonical configurations, ensuring that the same model executing on different hardware produces identical bitwise outputs. A model running on an NVIDIA A100 in one datacenter matches the output from an AMD MI250 in another, bit for bit.

This is the breakthrough enabling Judge's verification: bitwise-exact reproducibility means validators can independently confirm results without trusting executors. If the hash matches, the inference is correct—mathematically provable.
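The effect of a fixed reduction order can be illustrated without a GPU. The sketch below is not RepOps, whose kernels the article does not describe; `canonical_sum` and `chunked_sum` are hypothetical helpers that contrast one fixed accumulation order with a reduction whose order depends on how many workers split the input.

```python
def canonical_sum(values):
    """Always accumulate left to right, regardless of how the work was distributed."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, n_chunks):
    """Mimic a parallel reduction: each worker sums a slice, then partials are combined."""
    size = -(-len(values) // n_chunks)  # ceiling division
    partials = [canonical_sum(values[i:i + size]) for i in range(0, len(values), size)]
    return canonical_sum(partials)


values = [1e16, 1.0, -1e16, 1.0]

fixed_order = canonical_sum(values)    # 1.0
two_workers = chunked_sum(values, 2)   # 0.0: both 1.0 terms are absorbed by rounding
```

The inputs are identical, yet the result depends on the number of chunks. Pinning the order removes that dependency, at the cost of giving up some freedom to parallelize, which is the overhead the article attributes to reproducible kernels.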

Refereed Delegation: Efficient Verification Without Full Recomputation

Even with deterministic execution, verifying AI inference naively is expensive. A 70-billion-parameter model generating 1,000 tokens might require 10 GPU-hours. If validators must re-run every inference to verify correctness, verification cost equals execution cost—defeating the purpose of decentralization.

Verde's refereed delegation mechanism makes verification exponentially cheaper:

Multiple untrusted executors: Instead of one executor, Judge assigns tasks to multiple independent providers. Each runs the same inference and submits results.

Disagreement triggers investigation: If all executors agree, the result is accepted—no further verification needed. If outputs differ, Verde initiates a challenge game.

Binary search over computation graph: Verde doesn't re-run the entire inference. Instead, it performs binary search over the model's computational graph to find the first operator where results diverge. This pinpoints the exact layer (e.g., "attention layer 47, head 8") causing the discrepancy.

Minimal referee computation: A referee (which can be a smart contract or validator with limited compute) checks only the disputed operator, not the entire forward pass. For a 70B-parameter model with 80 layers, the binary search needs at most 7 rounds (⌈log₂ 80⌉ = 7) to isolate that operator, and the referee re-executes only that single step.

This approach is over 1,350% more efficient than naive replication (where every validator re-runs everything). Gensyn combines cryptographic proofs, game theory, and optimized processes to guarantee correct execution without redundant computation.

The result: Judge can verify AI workloads at scale, enabling decentralized inference networks where thousands of untrusted nodes contribute compute—and dishonest executors are caught and penalized.
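The bisection step can be sketched as a search over two execution traces. This is a simplified model under stated assumptions: each executor is assumed to publish a running hash after every layer (so that once two runs diverge, every later entry differs too), and the names `trace` and `first_divergence` are hypothetical rather than part of Verde.

```python
import hashlib


def trace(layer_outputs):
    """Running hash after each layer; a divergence propagates to all later entries."""
    state, hashes = b"", []
    for out in layer_outputs:
        state = hashlib.sha256(state + out).digest()
        hashes.append(state)
    return hashes


def first_divergence(trace_a, trace_b):
    """Binary search for the first index where two traces differ.

    Assumes equal-length traces that disagree at their final entry.
    Returns (index, number_of_comparison_rounds).
    """
    lo, hi, rounds = 0, len(trace_a) - 1, 0
    while lo < hi:
        mid = (lo + hi) // 2
        rounds += 1
        if trace_a[mid] != trace_b[mid]:
            hi = mid        # divergence is at mid or earlier
        else:
            lo = mid + 1    # traces still agree here, so look later
    return lo, rounds


honest = [f"layer-{i}".encode() for i in range(80)]
faulty = list(honest)
faulty[47] = b"corrupted"   # one executor deviates at layer 47

layer, rounds = first_divergence(trace(honest), trace(faulty))
print(layer, rounds)        # disputed layer and rounds used (at most 7 for 80 layers)
```

Once the disputed index is known, the referee only needs the agreed input to that layer and the two claimed outputs, so its workload is a single operator rather than the full forward pass.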

High-Stakes AI Decision-Making: Why Transparency Matters

Judge's target market isn't casual chatbots—it's applications where verifiability isn't a nice-to-have, but a regulatory or economic requirement. Here are scenarios where opaque APIs fail catastrophically:

Decentralized finance (DeFi): Autonomous trading agents manage billions in assets. If an agent uses an AI model to decide when to rebalance portfolios, users need proof the model wasn't tampered with. Judge enables on-chain verification: the agent commits to a specific model hash, executes trades based on its outputs, and anyone can challenge the decision logic. This transparency prevents rug pulls where malicious agents claim "the AI told me to liquidate" without evidence.

Regulatory compliance: Financial institutions deploying AI for credit scoring, fraud detection, or anti-money laundering (AML) face audits. Regulators demand explanations: "Why did the model flag this transaction?" Opaque APIs provide no audit trail. Judge creates an immutable record of model version, inputs, and outputs—satisfying compliance requirements.

Algorithmic governance: Decentralized autonomous organizations (DAOs) use AI agents to propose or vote on governance decisions. Community members must verify the agent used the approved model—not a hacked variant. With Judge, the DAO encodes the model hash in its smart contract, and every decision includes a cryptographic proof of correctness.

Medical and legal AI: Healthcare and legal systems require accountability. A doctor diagnosing cancer with AI assistance needs to document the exact model version used. A lawyer drafting contracts with AI must prove the output came from a vetted, unbiased model. Judge's on-chain audit trail provides this evidence.

Prediction markets and oracles: Projects like Polymarket use AI to resolve bet outcomes (e.g., "Will this event happen?"). If resolution depends on an AI model analyzing news articles, participants need proof the model wasn't manipulated. Judge verifies the oracle's AI inference, preventing disputes.

In each case, the common thread is that trust without transparency is insufficient. As VeritasChain notes, AI systems need "cryptographic flight recorders": immutable logs proving what happened when disputes arise.

The Zero-Knowledge Proof Alternative: Comparing Verde and ZKML

Judge isn't the only approach to verifiable AI. Zero-Knowledge Machine Learning (ZKML) achieves similar goals using zk-SNARKs: cryptographic proofs that a computation was performed correctly without revealing inputs or weights.

How does Verde compare to ZKML?

Verification cost: ZKML requires ~1,000× more computation than the original inference to generate proofs (research estimates). A 70B-parameter model needing 10 GPU-hours for inference might require 10,000 GPU-hours to prove. Verde's refereed delegation scales logarithmically instead: isolating a dispute in an 80-layer model takes about 7 bisection rounds rather than a full re-run, so resolving a dispute costs a fraction of the original inference rather than a 1,000× multiple of it.

Prover complexity: ZKML demands specialized hardware (like custom ASICs for zk-SNARK circuits) to generate proofs efficiently. Verde works on commodity GPUs—any miner with a gaming PC can participate.

Privacy trade-offs: ZKML's strength is privacy—proofs reveal nothing about inputs or model weights. Verde's deterministic execution is transparent: inputs and outputs are public (though weights can be encrypted). For high-stakes decision-making, transparency is often desirable. A DAO voting on treasury allocation wants public audit trails, not hidden proofs.

Proving scope: ZKML is practically limited to inference—proving training is infeasible at current computational costs. Verde supports both inference and training verification (Gensyn's broader protocol verifies distributed training).

Real-world adoption: ZKML projects like Modulus Labs have achieved breakthroughs (verifying 18M-parameter models on-chain), but remain limited to smaller models. Verde's deterministic runtime handles 70B+ parameter models in production.

ZKML excels where privacy is paramount—like verifying biometric authentication (Worldcoin) without exposing iris scans. Verde excels where transparency is the goal—proving a specific public model executed correctly. Both approaches are complementary, not competing.

The Gensyn Ecosystem: From Judge to Decentralized Training

Judge is one component of Gensyn's broader vision: a decentralized network for machine learning compute. The protocol includes:

Execution layer: Consistent ML execution across heterogeneous hardware (consumer GPUs, enterprise clusters, edge devices). Gensyn standardizes inference and training workloads, ensuring compatibility.

Verification layer (Verde): Trustless verification using refereed delegation. Dishonest executors are detected and penalized.

Peer-to-peer communication: Workload distribution across devices without centralized coordination. Miners receive tasks, execute them, and submit proofs directly to the blockchain.

Decentralized coordination: Smart contracts on an Ethereum rollup identify participants, allocate tasks, and process payments permissionlessly.

Gensyn's Public Testnet launched in March 2025, with mainnet planned for 2026. The $AI token public sale occurred in December 2025, establishing economic incentives for miners and validators.

Judge fits into this ecosystem as the evaluation layer: while Gensyn's core protocol handles training and inference, Judge ensures those outputs are verifiable. This creates a flywheel:

Developers train models on Gensyn's decentralized network (cheaper than AWS due to underutilized consumer GPUs contributing compute).

Models are deployed with Judge guaranteeing evaluation integrity. Applications consume inference via Gensyn's APIs, but unlike OpenAI, every output includes a cryptographic proof.

Validators earn fees by checking proofs and catching fraud, aligning economic incentives with network security.

Trust scales as more applications adopt verifiable AI, reducing reliance on centralized providers.

The endgame: AI training and inference that's provably correct, decentralized, and accessible to anyone—not just Big Tech.

Challenges and Open Questions

Judge's approach is groundbreaking, but several challenges remain:

Performance overhead: RepOps' 30% slowdown is acceptable for verification, but if every inference must run deterministically, latency-sensitive applications (real-time trading, autonomous vehicles) might prefer faster, non-verifiable alternatives. Gensyn's roadmap likely includes optimizing RepOps further—but there's a fundamental trade-off between speed and determinism.

Driver version fragmentation: Verde assumes version-pinned drivers, but GPU manufacturers release updates constantly. If some miners use CUDA 12.4 and others use 12.5, bitwise reproducibility breaks. Gensyn must enforce strict version management—complicating miner onboarding.

Model weight secrecy: Judge's transparency is a feature for public models but a bug for proprietary ones. The on-chain commitment is only a hash and reveals nothing on its own, but challengers need the weights to re-execute the model, so a hedge fund deploying a valuable trading model on Judge would have to share those weights with potential competitors. ZKML-based alternatives might be preferred for secret models, which suggests Judge targets open or semi-open AI applications.

Dispute resolution latency: If a challenger claims fraud, resolving the dispute via binary search requires multiple on-chain transactions (each round narrows the search space). High-frequency applications can't wait hours for finality. Gensyn might introduce optimistic verification (assume correctness unless challenged within a window) to reduce latency.

Sybil resistance in refereed delegation: If multiple executors must agree, what prevents a single entity from controlling all executors via Sybil identities? Gensyn likely uses stake-weighted selection (high-reputation validators are chosen preferentially) plus slashing to deter collusion—but the economic thresholds must be carefully calibrated.

These aren't showstoppers—they're engineering challenges. The core innovation (deterministic AI + cryptographic verification) is sound. Execution details will mature as the testnet transitions to mainnet.

The Road to Verifiable AI: Adoption Pathways and Market Fit

Judge's success depends on adoption. Which applications will deploy verifiable AI first?

DeFi protocols with autonomous agents: Aave, Compound, or Uniswap DAOs could integrate Judge-verified agents for treasury management. The community votes to approve a model hash, and all agent decisions include proofs. This transparency builds trust—critical for DeFi's legitimacy.

Prediction markets and oracles: Platforms like Polymarket or Chainlink could use Judge to resolve bets or deliver price feeds. AI models analyzing sentiment, news, or on-chain activity would produce verifiable outputs—eliminating disputes over oracle manipulation.

Decentralized identity and KYC: Projects requiring AI-based identity verification (age estimation from selfies, document authenticity checks) benefit from Judge's audit trail. Regulators accept cryptographic proofs of compliance without trusting centralized identity providers.

Content moderation for social media: Decentralized social networks (Farcaster, Lens Protocol) could deploy Judge-verified AI moderators. Community members verify the moderation model isn't biased or censored—ensuring platform neutrality.

AI-as-a-Service platforms: Developers building AI applications can offer "verifiable inference" as a premium feature. Users pay extra for proofs, differentiating services from opaque alternatives.

The commonality: applications where trust is expensive (due to regulation, decentralization, or high stakes) and verification cost is acceptable (compared to the value of certainty).

Judge won't replace OpenAI for consumer chatbots—users don't care if GPT-4 is verifiable when asking for recipe ideas. But for financial algorithms, medical tools, and governance systems, verifiable AI is the future.

Verifiability as the New Standard

Gensyn's Judge represents a paradigm shift: AI evaluation is moving from "trust the provider" to "verify the proof." The technical foundation—bitwise-exact reproducibility via Verde, efficient verification through refereed delegation, and on-chain audit trails—makes this transition practical, not just aspirational.

The implications ripple far beyond Gensyn. If verifiable AI becomes standard, centralized providers lose their moats. OpenAI's value proposition isn't just GPT-4's capabilities—it's the convenience of not managing infrastructure. But if Gensyn proves decentralized AI can match centralized performance with added verifiability, developers have no reason to lock into proprietary APIs.

The race is on. ZKML projects (Modulus Labs, Worldcoin's biometric system) are betting on zero-knowledge proofs. Deterministic runtimes (Gensyn's Verde, EigenAI) are betting on reproducibility. Optimistic approaches (blockchain AI oracles) are betting on fraud proofs. Each path has trade-offs—but the destination is the same: AI systems where outputs are provable, not just plausible.

For high-stakes decision-making, this isn't optional. Regulators won't accept "trust us" from AI providers in finance, healthcare, or legal applications. DAOs won't delegate treasury management to black-box agents. And as autonomous AI systems grow more powerful, the public will demand transparency.

Judge is the first production-ready system delivering on this promise. The testnet is live. The cryptographic foundations are solid. The market—$27 billion in AI agent crypto, billions in DeFi assets managed by algorithms, and regulatory pressure mounting—is ready.

The era of opaque AI APIs is ending. The age of verifiable intelligence is beginning. And Gensyn's Judge is lighting the way.

Nillion's Blacklight Goes Live: How ERC-8004 is Building the Trust Layer for Autonomous AI Agents

February 11, 2026 · 12 min read

Dora Noda

Software Engineer

On February 2, 2026, the AI agent economy took a critical step forward. Nillion launched Blacklight, a verification layer implementing the ERC-8004 standard to solve one of blockchain's most pressing questions: how do you trust an AI agent you've never met?

The answer isn't a simple reputation score or a centralized registry. It's a five-step verification process backed by cryptographic proofs, programmable audits, and a network of community-operated nodes. As autonomous agents increasingly execute trades, manage treasuries, and coordinate cross-chain activities, Blacklight represents the infrastructure enabling trustless AI coordination at scale.

The Trust Problem AI Agents Can't Solve Alone

The numbers tell the story. AI agents now contribute 30% of Polymarket's trading volume, handle DeFi yield strategies across multiple protocols, and autonomously execute complex workflows. But there's a fundamental bottleneck: how do agents verify each other's trustworthiness without pre-existing relationships?

Traditional systems rely on centralized authorities issuing credentials. Web3's promise is different—trustless verification through cryptography and consensus. Yet until ERC-8004, there was no standardized way for agents to prove their authenticity, track their behavior, or validate their decision-making logic on-chain.

This isn't just a theoretical problem. As Davide Crapis explains, "ERC-8004 enables decentralized AI agent interactions, establishes trustless commerce, and enhances reputation systems on Ethereum." Without it, agent-to-agent commerce remains confined to walled gardens or requires manual oversight—defeating the purpose of autonomy.

ERC-8004: The Three-Registry Trust Infrastructure

The ERC-8004 standard, which went live on Ethereum mainnet on January 29, 2026, establishes a modular trust layer through three on-chain registries:

Identity Registry: Uses ERC-721 to provide portable agent identifiers. Each agent receives a non-fungible token representing its unique on-chain identity, enabling cross-platform recognition and preventing identity spoofing.

Reputation Registry: Collects standardized feedback and ratings. Unlike centralized review systems, feedback is recorded on-chain with cryptographic signatures, creating an immutable audit trail. Anyone can crawl this history and build custom reputation algorithms.

Validation Registry: Supports cryptographic and economic verification of agent work. This is where programmable audits happen—validators can re-execute computations, verify zero-knowledge proofs, or leverage Trusted Execution Environments (TEEs) to confirm an agent acted correctly.

The brilliance of ERC-8004 is its unopinionated design. As the technical specification notes, the standard supports various validation techniques: "stake-secured re-execution of tasks (inspired by systems like EigenLayer), verification of zero-knowledge machine learning (zkML) proofs, and attestations from Trusted Execution Environments."

This flexibility matters. A DeFi arbitrage agent might use zkML proofs to verify its trading logic without revealing alpha. A supply chain agent might use TEE attestations to prove it accessed real-world data correctly. A cross-chain bridge agent might rely on crypto-economic validation with slashing to ensure honest execution.
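To make the division of labor between the three registries concrete, here is a small in-memory model. It is a sketch only: the `TrustLayer` class, its method names, and the scoring rule are hypothetical and do not mirror the standard's on-chain interface.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TrustLayer:
    """Hypothetical in-memory stand-in for the three ERC-8004 registries."""
    owners: dict = field(default_factory=dict)       # Identity: agent_id -> owner
    feedback: dict = field(default_factory=dict)     # Reputation: agent_id -> [(reviewer, score)]
    validations: dict = field(default_factory=dict)  # Validation: agent_id -> [(task, method, passed)]

    def register(self, owner: str) -> int:
        agent_id = len(self.owners) + 1  # sequential IDs, like minting the next token
        self.owners[agent_id] = owner
        self.feedback[agent_id] = []
        self.validations[agent_id] = []
        return agent_id

    def give_feedback(self, agent_id: int, reviewer: str, score: int) -> None:
        self.feedback[agent_id].append((reviewer, score))  # KeyError if unregistered

    def record_validation(self, agent_id: int, task: str, method: str, passed: bool) -> None:
        self.validations[agent_id].append((task, method, passed))

    def mean_score(self, agent_id: int) -> Optional[float]:
        """One possible reputation algorithm; anyone can compute their own from the raw history."""
        scores = [s for _, s in self.feedback[agent_id]]
        return sum(scores) / len(scores) if scores else None


layer = TrustLayer()
arb_agent = layer.register("0xA11CE")
layer.give_feedback(arb_agent, "0xB0B", 90)
layer.give_feedback(arb_agent, "0xCAFE", 70)
layer.record_validation(arb_agent, "task-1", "zkml-proof", True)
layer.record_validation(arb_agent, "task-2", "tee-attestation", False)
```

The `method` field is free-form on purpose: it reflects the unopinionated design described above, where re-execution, zkML proofs, and TEE attestations can all be recorded against the same identity.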

Blacklight's Five-Step Verification Process

Nillion's implementation of ERC-8004 on Blacklight adds a crucial layer: community-operated verification nodes. Here's how the process works:

1. Agent Registration: An agent registers its identity in the Identity Registry, receiving an ERC-721 NFT. This creates a unique on-chain identifier tied to the agent's public key.

2. Verification Request Initiation: When an agent performs an action requiring validation (e.g., executing a trade, transferring funds, or updating state), it submits a verification request to Blacklight.

3. Committee Assignment: Blacklight's protocol randomly assigns a committee of verification nodes to audit the request. These nodes are operated by community members who stake 70,000 NIL tokens, aligning incentives for network integrity.

4. Node Checks: Committee members re-execute the computation or validate cryptographic proofs. If validators detect incorrect behavior, they can slash the agent's stake (in systems using crypto-economic validation) or flag the identity in the Reputation Registry.

5. On-Chain Reporting: Results are posted on-chain. The Validation Registry records whether the agent's work was verified, creating permanent proof of execution. The Reputation Registry updates accordingly.

This process is asynchronous and non-blocking: agents don't wait for verification to complete routine tasks, but high-stakes actions (large transfers, cross-chain operations) can require upfront validation.
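Steps 3 and 4 can be sketched as a stake-filtered random draw followed by a vote. Only the 70,000 NIL threshold comes from the text above; the seeding by request ID, the committee size, and the simple-majority rule are assumptions made for illustration, not Blacklight's actual selection or quorum logic.

```python
import random

MIN_STAKE = 70_000  # NIL staked per verification node, per the article


def assign_committee(nodes: dict, request_id: str, size: int) -> list:
    """Pick `size` eligible nodes pseudo-randomly, seeded by the request ID."""
    eligible = sorted(name for name, stake in nodes.items() if stake >= MIN_STAKE)
    if len(eligible) < size:
        raise ValueError("not enough staked nodes to form a committee")
    rng = random.Random(request_id)  # deterministic seed so the sketch is reproducible
    return rng.sample(eligible, size)


def verdict(votes: dict) -> bool:
    """Accept the agent's work if a strict majority of committee members verified it."""
    return sum(votes.values()) * 2 > len(votes)


nodes = {
    "node-a": 70_000,
    "node-b": 120_000,
    "node-c": 69_999,   # under-staked, never selected
    "node-d": 70_000,
    "node-e": 85_000,
}

committee = assign_committee(nodes, "req-42", size=3)
votes = {member: True for member in committee}
votes[committee[0]] = False          # one dissenting node
accepted = verdict(votes)            # two of three verified, so the work is accepted
```

A production system would draw randomness from a source that no single participant can predict or bias; seeding from the request ID is used here only to keep the example deterministic.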

Programmable Audits: Beyond Binary Trust

Blacklight's most ambitious feature is "programmable verification"—the ability to audit how an agent makes decisions, not just what it does.

Consider a DeFi agent managing a treasury. Traditional audits verify that funds moved correctly. Programmable audits verify:

Decision-making logic consistency: Did the agent follow its stated investment strategy, or did it deviate?
Multi-step workflow execution: If the agent was supposed to rebalance portfolios across three chains, did it complete all steps?
Security constraints: Did the agent respect gas limits, slippage tolerances, and exposure caps?

This is possible because ERC-8004's Validation Registry supports arbitrary proof systems. An agent can commit to a decision-making algorithm on-chain (e.g., a hash of its neural network weights or a zk-SNARK circuit representing its logic), then prove each action conforms to that algorithm without revealing proprietary details.

Nillion's roadmap explicitly targets these use cases: "Nillion plans to expand Blacklight's capabilities to 'programmable verification,' enabling decentralized audits of complex behaviors such as agent decision-making logic consistency, multi-step workflow execution, and security constraints."

This shifts verification from reactive (catching errors after the fact) to proactive (enforcing correct behavior by design).
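A programmable audit of security constraints can be sketched as a committed policy plus a per-action check. The field names, limits, and functions below are hypothetical, and this version checks the policy in the clear; a privacy-preserving deployment would replace the open comparison with a zero-knowledge proof or TEE attestation, which the sketch does not attempt.

```python
import hashlib
import json


def commit_policy(policy: dict) -> str:
    """Hash of the canonical policy, the value an agent would commit on-chain."""
    return hashlib.sha256(json.dumps(policy, sort_keys=True).encode()).hexdigest()


def audit_action(policy: dict, committed_hash: str, action: dict) -> list:
    """Return the list of violations; an empty list means the action conforms."""
    if commit_policy(policy) != committed_hash:
        return ["policy does not match the committed hash"]
    violations = []
    if action["slippage_bps"] > policy["max_slippage_bps"]:
        violations.append("slippage tolerance exceeded")
    if action["gas_used"] > policy["gas_limit"]:
        violations.append("gas limit exceeded")
    if action["exposure_after"] > policy["max_exposure"]:
        violations.append("exposure cap exceeded")
    return violations


policy = {"max_slippage_bps": 50, "gas_limit": 300_000, "max_exposure": 0.25}
committed = commit_policy(policy)

ok_action = {"slippage_bps": 30, "gas_used": 210_000, "exposure_after": 0.20}
bad_action = {"slippage_bps": 80, "gas_used": 210_000, "exposure_after": 0.40}

ok_result = audit_action(policy, committed, ok_action)
bad_result = audit_action(policy, committed, bad_action)
swapped = audit_action({**policy, "max_exposure": 0.9}, committed, bad_action)
```

Committing the policy hash first is what makes the audit proactive: an agent cannot loosen its exposure cap after the fact without the mismatch being detected.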

Nillion's underlying technology—Nil Message Compute (NMC)—adds a privacy dimension to agent verification. Unlike traditional blockchains where all data is public, Nillion's "blind computation" enables operations on encrypted data without decryption.

Here's why this matters for agents: an AI agent might need to verify its trading strategy without revealing alpha to competitors. Or prove it accessed confidential medical records correctly without exposing patient data. Or demonstrate compliance with regulatory constraints without disclosing proprietary business logic.

Nillion's NMC achieves this through multi-party computation (MPC), where nodes collaboratively generate "blinding factors"—correlated randomness used to encrypt data. As DAIC Capital explains, "Nodes generate the key network resource needed to process data—a type of correlated randomness referred to as a blinding factor—with each node storing its share of the blinding factor securely, distributing trust across the network in a quantum-safe way."

This architecture is quantum-resistant by design. Even if a quantum computer breaks today's elliptic curve cryptography, distributed blinding factors remain secure because no single node possesses enough information to decrypt data.

For AI agents, this means verification doesn't require sacrificing confidentiality. An agent can prove it executed a task correctly while keeping its methods, data sources, and decision-making logic private.

The $4.3 Billion Agent Economy Infrastructure Play

Blacklight's launch comes as the blockchain-AI sector enters hypergrowth. The market is projected to grow from $680 million (2025) to $4.3 billion (2034) at a 22.9% CAGR, while the broader confidential computing market reaches $350 billion by 2032.

But Nillion isn't just betting on market expansion—it's positioning itself as critical infrastructure. The agent economy's bottleneck isn't compute or storage; it's trust at scale. As KuCoin's 2026 outlook notes, three key trends are reshaping AI identity and value flow:

Agent-Wrapping-Agent systems: Agents coordinating with other agents to execute complex multi-step tasks. This requires standardized identity and verification—exactly what ERC-8004 provides.

KYA (Know Your Agent): Financial infrastructure demanding agent credentials. Regulators won't approve autonomous agents managing funds without proof of correct behavior. Blacklight's programmable audits directly address this.

Nano-payments: Agents need to settle micropayments efficiently. The x402 payment protocol, which processed over 20 million transactions in January 2026, complements ERC-8004 by handling settlement while Blacklight handles trust.

Together, these standards reached production readiness within weeks of each other—a coordination breakthrough signaling infrastructure maturation.

Ethereum's Agent-First Future

ERC-8004's adoption extends far beyond Nillion. As of early 2026, multiple projects have integrated the standard:

Oasis Network: Implementing ERC-8004 for confidential computing with TEE-based validation
The Graph: Supporting ERC-8004 and x402 to enable verifiable agent interactions in decentralized indexing
MetaMask: Exploring agent wallets with built-in ERC-8004 identity
Coinbase: Integrating ERC-8004 for institutional agent custody solutions

This rapid adoption reflects a broader shift in Ethereum's roadmap. Vitalik Buterin has repeatedly emphasized that blockchain's role is becoming "just the plumbing" for AI agents—not the consumer-facing layer, but the trust infrastructure enabling autonomous coordination.

Nillion's Blacklight accelerates this vision by making verification programmable, privacy-preserving, and decentralized. Instead of relying on centralized oracles or human reviewers, agents can prove their correctness cryptographically.

What Comes Next: Mainnet Integration and Ecosystem Expansion

Nillion's 2026 roadmap prioritizes Ethereum compatibility and sustainable decentralization. The Ethereum bridge went live in February 2026, followed by native smart contracts for staking and private computation.

Community members staking 70,000 NIL tokens can operate Blacklight verification nodes, earning rewards while maintaining network integrity. This design mirrors Ethereum's validator economics but adds a verification-specific role.

The next milestones include:

Expanded zkML support: Integrating with projects like Modulus Labs to verify AI inference on-chain
Cross-chain verification: Enabling Blacklight to verify agents operating across Ethereum, Cosmos, and Solana
Institutional partnerships: Collaborations with Coinbase and Alibaba Cloud for enterprise agent deployment
Regulatory compliance tools: Building KYA frameworks for financial services adoption

Perhaps most importantly, Nillion is developing nilGPT—a fully private AI chatbot demonstrating how blind computation enables confidential agent interactions. This isn't just a demo; it's a blueprint for agents handling sensitive data in healthcare, finance, and government.

The Trustless Coordination Endgame

Blacklight's launch marks a pivot point for the agent economy. Before ERC-8004, agents operated in silos—trusted within their own ecosystems but unable to coordinate across platforms without human intermediaries. After ERC-8004, agents can verify each other's identity, audit each other's behavior, and settle payments autonomously.

This unlocks entirely new categories of applications:

Decentralized hedge funds: Agents managing portfolios across chains, with verifiable investment strategies and transparent performance audits
Autonomous supply chains: Agents coordinating logistics, payments, and compliance without centralized oversight
AI-powered DAOs: Organizations governed by agents that vote, propose, and execute based on cryptographically verified decision-making logic
Cross-protocol liquidity management: Agents rebalancing assets across DeFi protocols with programmable risk constraints

The common thread? All require trustless coordination—the ability for agents to work together without pre-existing relationships or centralized trust anchors.

Nillion's Blacklight provides exactly that. By combining ERC-8004's identity and reputation infrastructure with programmable verification and blind computation, it creates a trust layer scalable enough for the trillion-agent economy on the horizon.

As blockchain becomes the plumbing for AI agents and global finance, the question isn't whether we need verification infrastructure—it's who builds it, and whether it's decentralized or controlled by a few gatekeepers. Blacklight's community-operated nodes and open standard make the case for the former.

The age of autonomous on-chain actors is here. The infrastructure is live. The only question left is what gets built on top.

Verifiable On-Chain AI with zkML and Cryptographic Proofs

April 22, 2025 · 36 min read

Dora Noda

Software Engineer

Introduction: The Need for Verifiable AI on Blockchain

As AI systems grow in influence, ensuring their outputs are trustworthy becomes critical. Traditional methods rely on institutional assurances (essentially “just trust us”), which offer no cryptographic guarantees. This is especially problematic in decentralized contexts like blockchains, where a smart contract or user must trust an AI-derived result without being able to re-run a heavy model on-chain. Zero-knowledge Machine Learning (zkML) addresses this by allowing cryptographic verification of ML computations. In essence, zkML enables a prover to generate a succinct proof that “the output $Y$ came from running model $M$ on input $X$” – without revealing $X$ or the internal details of $M$. These zero-knowledge proofs (ZKPs) can be verified by anyone (or any contract) efficiently, shifting AI trust from “policy to proof”.

On-chain verifiability of AI means a blockchain can incorporate advanced computations (like neural network inferences) by verifying a proof of correct execution instead of performing the compute itself. This has broad implications: smart contracts can make decisions based on AI predictions, decentralized autonomous agents can prove they followed their algorithms, and cross-chain or off-chain compute services can provide verifiable outputs rather than unverifiable oracles. Ultimately, zkML offers a path to trustless and privacy-preserving AI – for example, proving an AI model’s decisions are correct and authorized without exposing private data or proprietary model weights. This is key for applications ranging from secure healthcare analytics to blockchain gaming and DeFi oracles.

How zkML Works: Compressing ML Inference into Succinct Proofs

At a high level, zkML combines cryptographic proof systems with ML inference so that a complex model evaluation can be “compressed” into a small proof. Internally, the ML model (e.g. a neural network) is represented as a circuit or program consisting of many arithmetic operations (matrix multiplications, activation functions, etc.). Rather than revealing all intermediate values, a prover performs the full computation off-chain and then uses a zero-knowledge proof protocol to attest that every step was done correctly. The verifier, given only the proof and some public data (like the final output and an identifier for the model), can be cryptographically convinced of the correctness without re-executing the model.

To achieve this, zkML frameworks typically transform the model computation into a format amenable to ZKPs:

Circuit Compilation: In SNARK-based approaches, the computation graph of the model is compiled into an arithmetic circuit or set of polynomial constraints. Each layer of the neural network (convolutions, matrix multiplies, nonlinear activations) becomes a sub-circuit with constraints ensuring the outputs are correct given the inputs. Because neural nets involve non-linear operations (ReLUs, Sigmoids, etc.) not naturally suited to polynomials, techniques like lookup tables are used to handle these efficiently. For example, a ReLU (output = max(0, input)) can be enforced by a custom constraint or lookup that verifies output equals input if input≥0 else zero. The end result is a set of cryptographic constraints that the prover must satisfy, which implicitly proves the model ran correctly.
Execution Trace & Virtual Machines: An alternative is to treat the model inference as a program trace, as done in zkVM approaches. For instance, the JOLT zkVM targets the RISC-V instruction set; one can compile the ML model (or the code that computes it) to RISC-V and then prove each CPU instruction executed properly. JOLT introduces a “lookup singularity” technique, replacing expensive arithmetic constraints with fast table lookups for each valid CPU operation. Every operation (add, multiply, bitwise op, etc.) is checked via a lookup in a giant table of pre-computed valid outcomes, using a specialized argument (Lasso/SHOUT) to keep this efficient. This drastically reduces the prover workload: even complex 64-bit operations become a single table lookup in the proof instead of many arithmetic constraints.
Interactive Protocols (GKR Sum-Check): A third approach uses interactive proofs like GKR (Goldwasser–Kalai–Rothblum) to verify a layered computation. Here the model’s computation is viewed as a layered arithmetic circuit (each neural network layer is one layer of the circuit graph). The prover runs the model normally but then engages in a sum-check protocol to prove that each layer’s outputs are correct given its inputs. In Lagrange’s approach (DeepProve, detailed next), the prover and verifier perform an interactive polynomial protocol (made non-interactive via Fiat-Shamir) that checks consistency of each layer’s computations without re-doing them. This sum-check method avoids generating a monolithic static circuit; instead it verifies the consistency of computations in a step-by-step manner with minimal cryptographic operations (mostly hashing or polynomial evaluations).
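
The lookup idea mentioned for ReLU can be sketched as a plain table-membership check. This is only an illustration under stated assumptions: an 8-bit signed input range and hypothetical helper names. In a real lookup argument the membership is proven cryptographically without revealing the pairs, which this sketch does not attempt.

```python
def build_relu_table(bits: int = 8) -> set[tuple[int, int]]:
    """Precompute every valid (input, output) pair for a signed `bits`-bit ReLU."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1))
    return {(x, max(0, x)) for x in range(lo, hi)}

def check_relu_layer(inputs: list[int], outputs: list[int],
                     table: set[tuple[int, int]]) -> bool:
    """Accept only if every claimed (input, output) pair appears in the table."""
    return len(inputs) == len(outputs) and all(
        (x, y) in table for x, y in zip(inputs, outputs)
    )

RELU_TABLE = build_relu_table(8)
honest = check_relu_layer([-3, 0, 7], [0, 0, 7], RELU_TABLE)
cheating = check_relu_layer([-3, 0, 7], [3, 0, 7], RELU_TABLE)
```

The honest trace passes, while the trace claiming ReLU(-3) = 3 is rejected because that pair is not in the table.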

Regardless of approach, the outcome is a succinct proof (typically a few kilobytes to a few tens of kilobytes) that attests to the correctness of the entire inference. The proof is zero-knowledge, meaning any secret inputs (private data or model parameters) can be kept hidden – they influence the proof but are not revealed to verifiers. Only the intended public outputs or assertions are revealed. This allows scenarios like “prove that model $M$ when applied to patient data $X$ yields diagnosis $Y$, without revealing $X$ or the model’s weights.”

Enabling on-chain verification: Once a proof is generated, it can be posted to a blockchain. Smart contracts can include verification logic to check the proof, often using precompiled cryptographic primitives. For example, Ethereum has precompiles for pairing operations on the BN254 (alt_bn128) curve, which many zk-SNARK verifiers rely on, making on-chain verification of SNARK proofs efficient. STARKs (hash-based proofs) are larger, but can still be verified on-chain with careful optimization or, in some designs, with additional trust assumptions (StarkWare’s L2, for instance, verifies STARK proofs on Ethereum through an on-chain verifier contract, albeit at a higher gas cost than SNARKs). The key is that the chain does not need to execute the ML model; it only runs a verification that is much cheaper than the original compute. In summary, zkML compresses expensive AI inference into a small proof that blockchains (or any verifier) can check in milliseconds to seconds.

Lagrange DeepProve: Architecture and Performance of a zkML Breakthrough

DeepProve by Lagrange Labs is a state-of-the-art zkML inference framework focusing on speed and scalability. Launched in 2025, DeepProve introduced a new proving system that is dramatically faster than prior solutions like Ezkl. Its design centers on the GKR interactive proof protocol with sum-check and specialized optimizations for neural network circuits. Here’s how DeepProve works and achieves its performance:

One-Time Preprocessing: Developers start with a trained neural network (currently supported types include multilayer perceptrons and popular CNN architectures). The model is exported to ONNX format, a standard graph representation. DeepProve’s tool then parses the ONNX model and quantizes it (converts weights to fixed-point/integer form) for efficient field arithmetic. In this phase, it also generates the proving and verification keys for the cryptographic protocol. This setup is done once per model and does not need to be repeated per inference. DeepProve emphasizes ease of integration: “Export your model to ONNX → one-time setup → generate proofs → verify anywhere”.
Proving (Inference + Proof Generation): After setup, a prover (which could be run by a user, a service, or Lagrange’s decentralized prover network) takes a new input $X$ and runs the model $M$ on it, obtaining output $Y$. During this execution, DeepProve records an execution trace of each layer’s computations. Instead of translating every multiplication into a static circuit upfront (as SNARK approaches do), DeepProve uses the linear-time GKR protocol to verify each layer on the fly. For each network layer, the prover commits to the layer’s inputs and outputs (e.g., via cryptographic hashes or polynomial commitments) and then engages in a sum-check argument to prove that the outputs indeed result from the inputs as per the layer’s function. The sum-check protocol iteratively convinces the verifier of the correctness of a sum of evaluations of a polynomial that encodes the layer’s computation, without revealing the actual values. Non-linear operations (like ReLU, softmax) are handled efficiently through lookup arguments in DeepProve – if an activation’s output was computed, DeepProve can prove that each output corresponds to a valid input-output pair from a precomputed table for that function. Layer by layer, proofs are generated and then aggregated into one succinct proof covering the whole model’s forward pass. The heavy lifting of cryptography is minimized – DeepProve’s prover mostly performs normal numeric computations (the actual inference) plus some light cryptographic commitments, rather than solving a giant system of constraints.
Verification: The verifier uses the final succinct proof along with a few public values – typically the model’s committed identifier (a cryptographic commitment to $M$’s weights), the input $X$ (if not private), and the claimed output $Y$ – to check correctness. Verification in DeepProve’s system involves verifying the sum-check protocol’s transcript and the final polynomial or hash commitments. This is more involved than verifying a classic SNARK (which might be a few pairings), but it’s vastly cheaper than re-running the model. In Lagrange’s benchmarks, verifying a DeepProve proof for a medium CNN takes on the order of 0.5 seconds in software. That is ~0.5s to confirm, for example, that a convolutional network with hundreds of thousands of parameters ran correctly – over 500× faster than naively re-computing that CNN on a GPU for verification. (In fact, DeepProve measured up to 521× faster verification for CNNs and 671× for MLPs compared to re-execution.) The proof size is small enough to transmit on-chain (tens of KB), and verification could be performed in a smart contract if needed, although 0.5s of computation might require careful gas optimization or layer-2 execution.
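
The quantization step in preprocessing can be sketched as follows. The 8 fractional bits and the helper names are assumptions for illustration; DeepProve's actual quantization scheme is not described in this article.

```python
def quantize(values: list[float], frac_bits: int = 8) -> list[int]:
    """Map floats to fixed-point integers with `frac_bits` fractional bits."""
    scale = 1 << frac_bits
    return [round(v * scale) for v in values]

def fixed_point_dot(qw: list[int], qx: list[int]) -> int:
    """Integer dot product; the result carries twice the fractional bits."""
    return sum(w * x for w, x in zip(qw, qx))

FRAC_BITS = 8
weights = [0.5, -1.25, 2.0]
inputs = [1.0, 0.5, -0.75]

qw = quantize(weights, FRAC_BITS)
qx = quantize(inputs, FRAC_BITS)
acc = fixed_point_dot(qw, qx)

# Rescale by 2^(2 * FRAC_BITS) to compare against the floating-point result.
approx = acc / float(1 << (2 * FRAC_BITS))
exact = sum(w * x for w, x in zip(weights, inputs))
```

Once weights and activations are integers, every multiply-add is exact arithmetic that maps directly onto field operations. The price is a rounding error for values that are not multiples of 2^-8; the values above were chosen so that `approx` and `exact` agree.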

Architecture and Tooling: DeepProve is implemented in Rust and provides a toolkit (the zkml library) for developers. It natively supports ONNX model graphs, making it compatible with models from PyTorch or TensorFlow (after exporting). The proving process currently targets models up to a few million parameters (tests include a 4M-parameter dense network). DeepProve leverages a combination of cryptographic components: a multilinear polynomial commitment (to commit to layer outputs), the sum-check protocol for verifying computations, and lookup arguments for non-linear ops. Notably, Lagrange’s open-source repository acknowledges it builds on prior work (the sum-check and GKR implementation from Scroll’s Ceno project), indicating an intersection of zkML with zero-knowledge rollup research.
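
To make the sum-check component concrete, here is a small sketch that proves the sum of a multilinear polynomial over the Boolean hypercube, with Fiat-Shamir challenges derived by hashing the transcript. It is an illustration under several assumptions and not DeepProve's code: the field modulus and function names are hypothetical, the verifier evaluates the polynomial directly from the table instead of opening a polynomial commitment, and the GKR layer-wiring polynomials are omitted.

```python
import hashlib

P = 2**61 - 1  # prime field modulus, chosen for illustration

def challenge(transcript: list[int]) -> int:
    """Fiat-Shamir: derive the verifier's challenge by hashing the transcript."""
    data = b"".join(v.to_bytes(8, "big") for v in transcript)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % P

def mle_eval(table: list[int], point: list[int]) -> int:
    """Evaluate the multilinear extension of `table` (first variable = top bit)."""
    n, total = len(point), 0
    for idx, value in enumerate(table):
        term = value % P
        for i, r in enumerate(point):
            bit = (idx >> (n - 1 - i)) & 1
            term = term * (r if bit else (1 - r)) % P
        total = (total + term) % P
    return total

def prove(table: list[int]) -> tuple[int, list[tuple[int, int]]]:
    """Return the claimed hypercube sum and one (s0, s1) message per variable."""
    table = [v % P for v in table]
    claimed = sum(table) % P
    transcript, rounds = [claimed], []
    while len(table) > 1:
        half = len(table) // 2
        lo, hi = table[:half], table[half:]      # current variable = 0 / 1
        s0, s1 = sum(lo) % P, sum(hi) % P
        rounds.append((s0, s1))
        transcript += [s0, s1]
        r = challenge(transcript)
        table = [(a + r * (b - a)) % P for a, b in zip(lo, hi)]  # fix variable to r
    return claimed, rounds

def verify(claimed: int, rounds: list[tuple[int, int]], oracle_table: list[int]) -> bool:
    """Check each round against the running claim, then one oracle evaluation."""
    if len(oracle_table) != 1 << len(rounds):
        return False
    transcript, claim, point = [claimed], claimed % P, []
    for s0, s1 in rounds:
        if (s0 + s1) % P != claim:
            return False
        transcript += [s0, s1]
        r = challenge(transcript)
        point.append(r)
        claim = (s0 + r * (s1 - s0)) % P
    return claim == mle_eval(oracle_table, point)

table = [3, 1, 4, 1, 5, 9, 2, 6]   # length must be a power of two
claimed, rounds = prove(table)
ok = verify(claimed, rounds, table)
```

The prover's work is a handful of linear passes over the table, which is the source of the near-linear proving cost discussed in this section. The verifier does only a constant amount of work per round, plus a single evaluation at a random point.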

To achieve real-time scalability, Lagrange pairs DeepProve with its Prover Network – a decentralized network of specialized ZK provers. Heavy proof generation can be offloaded to this network: when an application needs an inference proved, it sends the job to Lagrange’s network, where many operators (staked on EigenLayer for security) compute proofs and return the result. This network economically incentivizes reliable proof generation (malicious or failed jobs get the operator slashed). By distributing work across provers (and potentially leveraging GPUs or ASICs), the Lagrange Prover Network hides the complexity and cost from end-users. The result is a fast, scalable, and decentralized zkML service: “verifiable AI inferences fast and affordable”.

Performance Milestones: DeepProve’s claims are backed by benchmarks against the prior state-of-the-art, Ezkl. For a CNN with ~264k parameters (CIFAR-10 scale model), DeepProve’s proving time was ~1.24 seconds versus ~196 seconds for Ezkl – about 158× faster. For a larger dense network with 4 million parameters, DeepProve proved an inference in ~2.3 seconds vs ~126.8 seconds for Ezkl (~54× faster). Verification times also dropped: DeepProve verified the 264k CNN proof in ~0.6s, whereas verifying the Ezkl proof (Halo2-based) took over 5 minutes on CPU in that test. The speedups come from DeepProve’s near-linear complexity: its prover scales roughly O(n) with the number of operations, whereas circuit-based SNARK provers often have superlinear overhead (FFT and polynomial commitments scaling). In fact, DeepProve’s prover throughput can be within an order of magnitude of plain inference runtime – recent GKR systems can be <10× slower than raw execution for large matrix multiplications, an impressive achievement in ZK. This makes real-time or on-demand proofs more feasible, paving the way for verifiable AI in interactive applications.

Use Cases: Lagrange is already collaborating with Web3 and AI projects to apply zkML. Example use cases include: verifiable NFT traits (proving an AI-generated evolution of a game character or collectible is computed by the authorized model), provenance of AI content (proving an image or text was generated by a specific model, to combat deepfakes), DeFi risk models (proving a model’s output that assesses financial risk without revealing proprietary data), and private AI inference in healthcare or finance (where a hospital can get AI predictions with a proof, ensuring correctness without exposing patient data). By making AI outputs verifiable and privacy-preserving, DeepProve opens the door to “AI you can trust” in decentralized systems – moving from an era of “blind trust in black-box models” to one of “objective guarantees”.

SNARK-Based zkML: Ezkl and the Halo2 Approach

The traditional approach to zkML uses zk-SNARKs (Succinct Non-interactive Arguments of Knowledge) to prove neural network inference. Ezkl (by Zkonduit) is a leading example of this approach. It builds on the Halo2 proving system (a PLONK-style SNARK, used by Ezkl with polynomial commitments over the BN254 curve). Ezkl provides a toolchain where a developer can take a PyTorch or TensorFlow model, export it to ONNX, and have Ezkl compile it into a custom arithmetic circuit automatically.

How it works: Each layer of the neural network is converted into constraints:

Linear layers (dense or convolution) become collections of multiplication-add constraints that enforce the dot-products between inputs, weights, and outputs.
Non-linear layers (like ReLU, sigmoid, etc.) are handled via lookups or piecewise constraints because such functions are not polynomial. For instance, a ReLU can be implemented by a boolean selector $b$ with constraints ensuring $y = x \cdot b$, $b(1-b) = 0$ (so $b \in \{0,1\}$), and $b=1$ exactly when $x>0$ (one way to do it), or more efficiently by a lookup table mapping $x \mapsto \max(0,x)$ for a range of $x$ values. Halo2’s lookup arguments allow mapping 16-bit (or smaller) chunks of values, so large domains (like all 32-bit values) are usually “chunked” into several smaller lookups. This chunking increases the number of constraints.
Big integer ops or divisions (if any) are similarly broken into small pieces. The result is a large set of PLONK-style constraints tailored to the specific model architecture.
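
The chunking described above can be sketched as a range check on a 32-bit value using two 16-bit lookups. The helper names are hypothetical and the set membership stands in for a real lookup argument; this is not Ezkl's or Halo2's API.

```python
CHUNK_BITS = 16
CHUNK_TABLE = set(range(1 << CHUNK_BITS))  # every valid 16-bit value

def decompose(value: int, n_chunks: int = 2) -> list[int]:
    """Split a non-negative integer into little-endian 16-bit chunks."""
    mask = (1 << CHUNK_BITS) - 1
    return [(value >> (CHUNK_BITS * i)) & mask for i in range(n_chunks)]

def check_range(value: int, chunks: list[int]) -> bool:
    """Two constraints: every chunk is in the table, and chunks recompose to value."""
    in_table = all(c in CHUNK_TABLE for c in chunks)
    recomposed = sum(c << (CHUNK_BITS * i) for i, c in enumerate(chunks))
    return in_table and recomposed == value

value = 0xDEADBEEF
chunks = decompose(value)              # [0xBEEF, 0xDEAD]
valid = check_range(value, chunks)

# These forged chunks still recompose to the same value, but one is out of range.
forged = check_range(value, [0x1BEEF, 0xDEAC])
```

The forged case shows why both constraints are needed: the recomposition equation alone would accept it, and only the per-chunk lookup rejects the 17-bit chunk. Each extra chunk adds one lookup plus a term in the recomposition constraint, which is the overhead the text refers to.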

Ezkl then uses Halo2 to generate a proof that these constraints hold given the secret inputs (model weights, private inputs) and public outputs.

Tooling and integration: One advantage of the SNARK approach is that it leverages well-known primitives. Halo2 is already used in production systems (e.g. Zcash and several zkEVM rollups), so it’s battle-tested and has an on-chain verifier readily available. Ezkl’s EVM-targeted proofs use the BN254 curve, which Ethereum can verify via precompiles, making it straightforward to verify an Ezkl proof in a smart contract. The team has also provided user-friendly APIs; for example, data scientists can work with their models in Python and use Ezkl’s CLI to produce proofs, without deep knowledge of circuits.

Strengths: Ezkl’s approach benefits from the generality and ecosystem of SNARKs. It supports reasonably complex models and has already seen “practical integrations (from DeFi risk models to gaming AI)”, proving real-world ML tasks. Because it operates at the level of the model’s computation graph, it can apply ML-specific optimizations: e.g. pruning insignificant weights or quantizing parameters to reduce circuit size. It also means model confidentiality is natural – the weights can be treated as private witness data, so the verifier only sees that some valid model produced the output, or at best a commitment to the model. The verification of SNARK proofs is extremely fast (typically a few milliseconds or less on-chain), and proof sizes are small (a few kilobytes), which is ideal for blockchain usage.

Weaknesses: Performance is the Achilles’ heel. Circuit-based proving imposes large overheads, especially as models grow. It’s noted that historically, SNARK circuits could be a million times more work for the prover than just running the model itself. Halo2 and Ezkl optimize this, but still, operations like large matrix multiplications generate tons of constraints. If a model has millions of parameters, the prover must handle correspondingly millions of constraints, performing heavy FFTs and multiexponentiation in the process. This leads to high proving times (often minutes or hours for non-trivial models) and high memory usage. For example, proving even a relatively small CNN (e.g. a few hundred thousand parameters) can take tens of minutes with Ezkl on a single machine. The team behind DeepProve cited that Ezkl took hours for certain model proofs that DeepProve can do in minutes. Large models might not even fit in memory or require splitting into multiple proofs (which then need recursive aggregation). While Halo2 is “moderately optimized”, any need to “chunk” lookups or handle wide-bit operations translates to extra overhead. In summary, scalability is limited – Ezkl works well for small-to-medium models (and indeed outperformed some earlier alternatives like naive Stark-based VMs in benchmarks), but struggles as model size grows beyond a point.

Despite these challenges, Ezkl and similar SNARK-based zkML libraries are important stepping stones. They proved that verified ML inference is possible on-chain and have active usage. Notably, projects like Modulus Labs demonstrated verifying an 18-million-parameter model on-chain using SNARKs (with heavy optimization). The cost was non-trivial, but it shows the trajectory. Moreover, the Mina Protocol has its own zkML toolkit that uses SNARKs to allow smart contracts on Mina (which are SNARK-based) to verify ML model execution. This indicates growing multi-platform support for SNARK-based zkML.

STARK-Based Approaches: Transparent and Programmable ZK for ML

zk-STARKs (Scalable Transparent ARguments of Knowledge) offer another route to zkML. STARKs use hash-based cryptography (like FRI for polynomial commitments) and avoid any trusted setup. They often operate by simulating a CPU or VM and proving the execution trace is correct. In the context of ML, one can either build a custom STARK for the neural network or use a general-purpose STARK VM to run the model code.

General STARK VMs (RISC Zero, Cairo): A straightforward approach is to write inference code and run it in a STARK VM. For example, Risc0 provides a RISC-V environment where any code (e.g., C++ or Rust implementation of a neural network) can be executed and proven via a STARK. Similarly, StarkWare’s Cairo language can express arbitrary computations (like an LSTM or CNN inference) which are then proved by the StarkNet STARK prover. The advantage is flexibility – you don’t need to design custom circuits for each model. However, early benchmarks showed that naive STARK VMs were slower compared to optimized SNARK circuits for ML. In one test, a Halo2-based proof (Ezkl) was about 3× faster than a STARK-based approach on Cairo, and even 66× faster than a RISC-V STARK VM on a certain benchmark in 2024. This gap is due to the overhead of simulating every low-level instruction in a STARK and the larger constants in STARK proofs (hashing is fast but you need a lot of it; STARK proof sizes are bigger, etc.). However, STARK VMs are improving and have the benefit of transparent setup (no trusted setup) and post-quantum security. As STARK-friendly hardware and protocols advance, proving speeds will improve.

DeepProve’s approach vs STARK: Interestingly, DeepProve’s use of GKR and sum-check yields a proof more akin to a STARK in spirit – it’s an interactive, hash-based proof with no need for a structured reference string. The trade-off is that its proofs are larger and verification is heavier than a SNARK. Yet, DeepProve shows that careful protocol design (specialized to ML’s layered structure) can vastly outperform both generic STARK VMs and SNARK circuits in proving time. We can consider DeepProve a bespoke STARK-style zkML prover: although Lagrange uses the term zkSNARK because the proofs are succinct, it lacks a traditional SNARK’s small, near-constant verification cost, since a ~0.5s verify is far slower than a typical SNARK verify. Traditional STARK proofs (like StarkNet’s) often involve tens of thousands of field operations to verify, whereas a pairing-based SNARK verifier needs only a handful of pairing checks. Thus, one trade-off is evident: SNARKs yield smaller proofs and faster verifiers, while STARKs (or GKR) offer easier scaling and no trusted setup at the cost of proof size and verify speed.

Emerging improvements: The JOLT zkVM (discussed earlier under JOLTx) actually outputs SNARKs (using PLONKish commitments), but it embodies ideas that could be applied in a STARK context too (Lasso lookups could theoretically be used with FRI commitments). StarkWare and others are researching ways to speed up proving of common operations (like using custom gates or hints in Cairo for big-integer ops). There’s also Circomlib-ML by Privacy & Scaling Explorations (PSE), which provides Circom templates for CNN layers and similar components; it is SNARK-oriented, but conceptually similar templates could be made for STARK languages.

In practice, non-Ethereum ecosystems leveraging STARKs include StarkNet (which could allow on-chain verification of ML if someone writes a verifier, though cost is high) and Risc0’s Bonsai service (which is an off-chain proving service that emits STARK proofs which can be verified on various chains). As of 2025, most zkML demos on blockchain have favored SNARKs (due to verifier efficiency), but STARK approaches remain attractive for their transparency and potential in high-security or quantum-resistant settings. For example, a decentralized compute network might use STARKs to let anyone verify work without a trusted setup, useful for longevity. Also, some specialized ML tasks might exploit STARK-friendly structures: e.g. computations heavily using XOR/bit operations could be faster in STARKs (since those are cheap in boolean algebra and hashing) than in SNARK field arithmetic.

Summary of SNARK vs STARK for ML:

Performance: SNARKs (like Halo2) have huge proving overhead per gate but benefit from powerful optimizations and small constants for verify; STARKs (generic) have larger constant overhead but scale more linearly and avoid expensive crypto like pairings. DeepProve shows that customizing the approach (sum-check) yields near-linear proving time (fast) but with a STARK-like proof. JOLT shows that even a general VM can be made faster with heavy use of lookups. Empirically, for models up to millions of operations: a well-optimized SNARK (Ezkl) can handle it but might take tens of minutes, whereas DeepProve (GKR) can do it in seconds. STARK VMs in 2024 were likely in between or worse than SNARKs unless specialized (Risc0 was slower in tests, Cairo was slower without custom hints).
Verification: SNARK proofs verify quickest (milliseconds, and minimal data on-chain ~ a few hundred bytes to a few KB). STARK proofs are larger (dozens of KB) and take longer (tens of ms to seconds) to verify due to many hashing steps. In blockchain terms, a SNARK verify might cost e.g. ~200k gas, whereas a STARK verify could cost millions of gas – often too high for L1, acceptable on L2 or with succinct verification schemes.
Setup and Security: SNARKs like Groth16 require a trusted setup per circuit (unfriendly for arbitrary models), but universal SNARKs (PLONK, Halo2) have a one-time setup that can be reused for any circuit up to a certain size. STARKs need no setup and use only hash assumptions (plus classical polynomial complexity assumptions), and are post-quantum secure. This makes STARKs appealing for longevity – proofs remain secure even if quantum computers emerge, whereas current pairing-based SNARKs (for example those over BN254 or BLS12-381) would be broken by quantum attacks.

We will consolidate these differences in a comparison table shortly.

FHE for ML (FHE-o-ML): Private Computation vs. Verifiable Computation

Fully Homomorphic Encryption (FHE) is a cryptographic technique that allows computations to be performed directly on encrypted data. In the context of ML, FHE can enable a form of privacy-preserving inference: for example, a client can send encrypted input to a model host, the host runs the neural network on the ciphertext without decrypting it, and sends back an encrypted result which the client can decrypt. This ensures data confidentiality – the model owner learns nothing about the input (and potentially the client learns only the output, not the model’s internals if they only get output). However, FHE by itself does not produce a proof of correctness in the same way ZKPs do. The client must trust that the model owner actually performed the computation honestly (the ciphertext could have been manipulated). Usually, if the client has the model or expects a certain distribution of outputs, blatant cheating can be detected, but subtle errors or use of a wrong model version would not be evident just from the encrypted output.
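
The client/server flow above can be illustrated with a toy Paillier cryptosystem. Note the swap: Paillier is only additively homomorphic, not fully homomorphic, but that is enough to evaluate a linear layer with plaintext integer weights on encrypted inputs. The key size, the fixed generator `N + 1`, and all names below are assumptions for the sketch; the primes are far too small for real use.

```python
import math
import secrets

# Toy key material: two known Mersenne primes. Far too small for real security.
P_PRIME, Q_PRIME = 2**31 - 1, 2**61 - 1
N = P_PRIME * Q_PRIME            # public key
N2 = N * N
LAM = (P_PRIME - 1) * (Q_PRIME - 1) // math.gcd(P_PRIME - 1, Q_PRIME - 1)  # private
MU = pow(LAM, -1, N)             # private; valid because the generator is N + 1

def encrypt(m: int) -> int:
    """Client: Enc(m) = (1 + m*N) * r^N mod N^2 for a random unit r."""
    while True:
        r = secrets.randbelow(N)
        if r > 0 and math.gcd(r, N) == 1:
            break
    return (1 + m * N) % N2 * pow(r, N, N2) % N2

def decrypt(c: int) -> int:
    """Client: recover m using the private values LAM and MU."""
    u = pow(c, LAM, N2)
    return (u - 1) // N * MU % N

def encrypted_dot(weights: list[int], enc_inputs: list[int]) -> int:
    """Server: compute Enc(sum w_i * x_i) without ever decrypting x_i."""
    acc = 1  # the ciphertext 1 is a trivial encryption of 0
    for w, c in zip(weights, enc_inputs):
        acc = acc * pow(c, w, N2) % N2   # Enc(a) * Enc(x)^w = Enc(a + w*x)
    return acc

inputs = [3, 1, 4]                       # client's private data
weights = [2, 7, 1]                      # server's plaintext model weights
enc_inputs = [encrypt(x) for x in inputs]
result = decrypt(encrypted_dot(weights, enc_inputs))
```

The server never sees `inputs`, yet nothing stops it from returning the encryption of some other value. That missing correctness guarantee is exactly the gap between private computation and verifiable computation discussed here.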

Trade-offs in performance: FHE is notoriously heavy in computation. Running deep learning inference under FHE incurs orders-of-magnitude slowdown. Early experiments (e.g., CryptoNets in 2016) took tens of seconds to evaluate a tiny CNN on encrypted data. By 2024, improvements like CKKS (for approximate arithmetic) and better libraries (Microsoft SEAL, Zama’s Concrete) have reduced this overhead, but it remains large. For example, a user reported that using Zama’s Concrete-ML to run a CIFAR-10 classifier took 25–30 minutes per inference on their hardware. After optimizations, Zama’s team achieved ~40 seconds for that inference on a 192-core server. Even 40 s is extremely slow compared to a plaintext inference (which might take 0.01 s), an overhead of roughly $10^3$–$10^4\times$. Larger models or higher precision increase the cost further. Additionally, FHE operations consume a lot of memory and require occasional bootstrapping (a noise-reduction step), which is computationally expensive. In summary, scalability is a major issue: state-of-the-art FHE might handle a small CNN or simple logistic regression, but scaling to large CNNs or Transformers is beyond current practical limits.
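As a quick check of the overhead figure, using the timings quoted above and treating the 0.01 s plaintext baseline as the rough assumption it is (the runs were also on different hardware, so this is indicative only):

$$
\text{overhead}_{\text{optimized}} = \frac{t_{\text{FHE}}}{t_{\text{plain}}} = \frac{40\ \text{s}}{0.01\ \text{s}} = 4 \times 10^{3}
$$

$$
\text{overhead}_{\text{unoptimized}} = \frac{25 \times 60\ \text{s}}{0.01\ \text{s}} \;\text{to}\; \frac{30 \times 60\ \text{s}}{0.01\ \text{s}} = 1.5 \times 10^{5} \;\text{to}\; 1.8 \times 10^{5}
$$

The optimized run lands inside the $10^3$–$10^4\times$ band, while the unoptimized report sits more than an order of magnitude above it.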

Privacy advantages: FHE’s big appeal is data privacy. The input can remain completely encrypted throughout the process. This means an untrusted server can compute on a client’s private data without learning anything about it. Conversely, if the model is sensitive (proprietary), one could envisage encrypting the model parameters and having the client perform FHE inference on their side, but this is less common because making the client do the heavy FHE compute defeats the purpose of offloading to a powerful server. Typically, the model is public or held by the server in the clear, and the data is encrypted under the client’s key. Model privacy in that scenario is not provided by default (the server knows the model; the client learns outputs but not weights). There are more exotic setups (like secure two-party computation or multi-key FHE) where both model and data can be kept private from each other, but those incur even more complexity. In contrast, zkML via ZKPs can ensure model privacy and data privacy at once: the prover can hold both the model and the data as a secret witness, revealing only what’s needed to the verifier.

No on-chain verification needed (and none possible): With FHE, the result comes out encrypted to the client. The client then decrypts it to obtain the actual prediction. If we want to use that result on-chain, the client (or whoever holds the decryption key) would have to publish the plaintext result and convince others it’s correct. But at that point, trust is back in the loop – unless combined with a ZKP. In principle, one could combine FHE and ZKP: e.g., use FHE to keep data private during compute, and then generate a ZK-proof that the plaintext result corresponds to a correct computation. However, combining them means you pay the performance penalty of FHE and ZKP – extremely impractical with today’s tech. So, in practice FHE-of-ML and zkML serve different use cases:

FHE-of-ML: Ideal when the goal is confidentiality between two parties (client and server). For instance, a cloud service can host an ML model and users can query it with their sensitive data without revealing the data to the cloud (and if the model is sensitive, perhaps deploy it via FHE-friendly encodings). This is great for privacy-preserving ML services (medical predictions, etc.). The user still has to trust the service to faithfully run the model (since no proof), but at least any data leakage is prevented. Some projects like Zama are even exploring an “FHE-enabled EVM (fhEVM)” where smart contracts could operate on encrypted inputs, but verifying those computations on-chain would require the contract to somehow enforce correct computation – an open challenge likely requiring ZK proofs or specialized secure hardware.
zkML (ZKPs): Ideal when the goal is verifiability and public auditability. If you want anyone (or any contract) to be sure that “Model $M$ was evaluated correctly on $X$ and produced $Y$”, ZKPs are the solution. They also provide privacy as a bonus (you can hide $X$ or $Y$ or $M$ if needed by treating them as private inputs to the proof), but their primary feature is the proof of correct execution.

A complementary relationship: It’s worth noting that ZKPs protect the prover’s secrets from the verifier (the verifier learns nothing about them, only that the computation was done correctly), whereas FHE protects the client’s data from the party doing the computation. In some scenarios, these could be combined. For example, a network of untrusted nodes could use FHE to compute on users’ private data and then provide ZK proofs to the users (or a blockchain) that the computations were done according to the protocol. This would cover both privacy and correctness, but the performance cost is enormous with today’s algorithms. More feasible in the near term are hybrids like Trusted Execution Environments (TEE) plus ZKP or Functional Encryption plus ZKP. These are beyond our scope, but they aim to provide something similar (TEEs keep data/model secret during compute, then a ZKP can attest the TEE did the right thing).

In summary, FHE-of-ML prioritizes confidentiality of inputs/outputs, while zkML prioritizes verifiable correctness (with possible privacy). Table 1 below contrasts the key properties:

Approach	Prover Performance (Inference & Proof)	Proof Size & Verification	Privacy Features	Trusted Setup?	Post-Quantum?
zk-SNARK (Halo2, Groth16, PLONK, etc.)	Heavy prover overhead (up to 10^6× normal runtime without optimizations; in practice 10^3–10^5×). Optimized for specific model/circuit; proving time in minutes for medium models, hours for large. Recent zkML SNARKs (DeepProve with GKR) vastly improve this (near-linear overhead, e.g. seconds instead of minutes for million-param models).	Very small proofs (often < 100 KB, sometimes ~a few KB). Verification is fast: a few pairings or polynomial evals (typically < 50 ms on-chain). DeepProve’s GKR-based proofs are larger (tens–hundreds KB) and verify in ~0.5 s (still much faster than re-running the model).	Data confidentiality: Yes – inputs can be private in proof (not revealed). Model privacy: Yes – prover can commit to model weights and not reveal them. Output hiding: Optional – proof can be of a statement without revealing output (e.g. “output has property P”). However, if the output itself is needed on-chain, it typically becomes public. Overall, SNARKs offer full zero-knowledge flexibility (hide whichever parts you want).	Depends on scheme. Groth16 requires a trusted setup per circuit; PLONK and Halo2 with KZG commitments (the configuration Ezkl uses) need only a universal, one-time setup. DeepProve’s sum-check GKR is transparent (no setup) – a bonus of that design.	Classical SNARKs (BLS12-381 curves) are not PQ-safe (vulnerable to quantum attacks on elliptic curve discrete log). Some newer SNARKs use PQ-safe commitments, but Halo2/PLONK as used in Ezkl are not PQ-safe. GKR (DeepProve) uses hash commitments (e.g. Poseidon/Merkle) which are conjectured PQ-safe (relying on hash collision resistance).
zk-STARK (FRI, hash-based proof)	Prover overhead is high but more linear scaling. Typically 10^2–10^4× slower than native for large tasks, with room to parallelize. General STARK VMs (Risc0, Cairo) saw slower performance vs SNARK for ML in 2024 (e.g. 3×–66× slower than Halo2 in some cases). Specialized STARKs (or GKR) can approach linear overhead and outperform SNARKs for large circuits.	Proofs are larger: often tens of KB (growing with circuit size/log(n)). Verifier must do multiple hash and FFT checks – verification time ~O(n^ε) for small ε (e.g. ~50 ms to 500 ms depending on proof size). On-chain, this is costlier (StarkWare’s L1 verifier can take millions of gas per proof). Some STARKs support recursive proofs to compress size, at cost of prover time.	Data & Model privacy: A STARK can be made zero-knowledge by randomizing trace data (adding blinding to polynomial evaluations), so it can hide private inputs similarly to SNARK. Many STARK implementations focus on integrity, but zk-STARK variants do allow privacy. So yes, they can hide inputs/models like SNARKs. Output hiding: likewise possible in theory (prover doesn’t declare the output as public), but rarely used since usually the output is what we want to reveal/verify.	No trusted setup. Transparency is a hallmark of STARKs – only require common random string (which Fiat-Shamir can derive). This makes them attractive for open-ended use (any model, any time, no per-model ceremony).	Yes, STARKs rely on hash and information-theoretic security assumptions (like random oracle and difficulty of certain codeword decoding in FRI). These are believed to be secure against quantum adversaries. STARK proofs are thus PQ-resistant, an advantage for future-proofing verifiable AI.
FHE for ML (Fully Homomorphic Encryption applied to inference)	Prover = party doing computation on encrypted data. The computation time is extremely high: 10^3–10^5× slower than plaintext inference is common. High-end hardware (many-core servers, FPGA, etc.) can mitigate this. Some optimizations (low-precision inference, leveled FHE parameters) can reduce overhead but there is a fundamental performance hit. FHE is currently practical for small models or simple linear models; deep networks remain challenging beyond toy sizes.	No proof generated. The result is an encrypted output. Verification in the sense of checking correctness is not provided by FHE alone – one trusts the computing party to not cheat. (If combined with secure hardware, one might get an attestation; otherwise, a malicious server could return an incorrect encrypted result that the client would decrypt to wrong output without knowing the difference).	Data confidentiality: Yes – the input is encrypted, so the computing party learns nothing about it. Model privacy: If the model owner is doing the compute on encrypted input, the model is in plaintext on their side (not protected). If roles are reversed (client holds model encrypted and server computes), model could be kept encrypted, but this scenario is less common. There are techniques like secure two-party ML that combine FHE/MPC to protect both, but these go beyond plain FHE. Output hiding: By default, the output of the computation is encrypted (only decryptable by the party with the secret key, usually the input owner). So the output is hidden from the computing server. If we want the output public, the client can decrypt and reveal it.	No setup needed. Each user generates their own key pair for encryption. Trust relies on keys remaining secret.	The security of FHE schemes (e.g. BFV, CKKS, TFHE) is based on lattice problems (Learning With Errors), which are believed to be resistant to quantum attacks (at least no efficient quantum algorithm is known). So FHE is generally considered post-quantum secure.

Table 1: Comparison of zk-SNARK, zk-STARK, and FHE approaches for machine learning inference (performance and privacy trade-offs).

Use Cases and Implications for Web3 Applications

The convergence of AI and blockchain via zkML unlocks powerful new application patterns in Web3:

Decentralized Autonomous Agents & On-Chain Decision-Making: Smart contracts or DAOs can incorporate AI-driven decisions with guarantees of correctness. For example, imagine a DAO that uses a neural network to analyze market conditions before executing trades. With zkML, the DAO’s smart contract can require a zkSNARK proof that the authorized ML model (with a known hash commitment) was run on the latest data and produced the recommended action, before the action is accepted. This prevents malicious actors from injecting a fake prediction – the chain verifies the AI’s computation. Over time, one could even have fully on-chain autonomous agents (contracts that query off-chain AI or contain simplified models) making decisions in DeFi or games, with all their moves proven correct and policy-compliant via zk proofs. This raises the trust in autonomous agents, since their “thinking” is transparent and verifiable rather than a black-box.
Verifiable Compute Markets: Projects like Lagrange are effectively creating verifiable computation marketplaces – developers can outsource heavy ML inference to a network of provers and get back a proof with the result. This is analogous to decentralized cloud computing, but with built-in trust: you don’t need to trust the server, only the proof. It’s a paradigm shift for oracles and off-chain computation. Protocols like Ethereum’s upcoming DSC (decentralized sequencing layer) or oracle networks could use this to provide data feeds or analytic feeds with cryptographic guarantees. For instance, an oracle could supply “the result of model X on input Y” and anyone can verify the attached proof on-chain, rather than trusting the oracle’s word. This could enable verifiable AI-as-a-service on blockchain: any contract can request a computation (like “score these credit risks with my private model”) and accept the answer only with a valid proof. Projects such as Gensyn are exploring decentralized training and inference marketplaces using these verification techniques.
NFTs and Gaming – Provenance and Evolution: In blockchain games or NFT collectibles, zkML can prove traits or game moves were generated by legitimate AI models. For example, a game might allow an AI to evolve an NFT pet’s attributes. Without ZK, a clever user might modify the AI or the outcome to get a superior pet. With zkML, the game can require a proof that “pet’s new stats were computed by the official evolution model on the pet’s old stats”, preventing cheating. Similarly for generative art NFTs: an artist could release a generative model as a commitment; later, when minting NFTs, prove each image was produced by that model given some seed, guaranteeing authenticity (and even doing so without revealing the exact model to the public, preserving the artist’s IP). This provenance verification ensures authenticity in a manner akin to verifiable randomness – except here it’s verifiable creativity.
Privacy-Preserving AI in Sensitive Domains: zkML allows confirmation of outcomes without exposing inputs. In healthcare, a patient’s data could be run through an AI diagnostic model by a cloud provider; the hospital receives a diagnosis and a proof that the model (which could be privately held by a pharmaceutical company) was run correctly on the patient data. The patient data remains private (only an encrypted or committed form was used in the proof), and the model weights remain proprietary – yet the result is trusted. Regulators or insurance could also verify that only approved models were used. In finance, a company could prove to an auditor or regulator that its risk model was applied to its internal data and produced certain metrics without revealing the underlying sensitive financial data. This enables compliance and oversight with cryptographic assurances rather than manual trust.
Cross-Chain and Off-Chain Interoperability: Because zero-knowledge proofs are fundamentally portable, zkML can facilitate cross-chain AI results. One chain might have an AI-intensive application running off-chain; it can post a proof of the result to a different blockchain, which will trustlessly accept it. For instance, consider a multi-chain DAO using an AI to aggregate sentiment across social media (off-chain data). The AI analysis (complex NLP on large data) is done off-chain by a service that then posts a proof to a small blockchain (or multiple chains) that “analysis was done correctly and output sentiment score = 0.85”. All chains can verify and use that result in their governance logic, without each needing to rerun the analysis. This kind of interoperable verifiable compute is what Lagrange’s network aims to support, by serving multiple rollups or L1s simultaneously. It removes the need for trusted bridges or oracle assumptions when moving results between chains.
AI Alignment and Governance: On a more forward-looking note, zkML has been highlighted as a tool for AI governance and safety. Lagrange’s vision statements, for example, argue that as AI systems become more powerful (even superintelligent), cryptographic verification will be essential to ensure they follow agreed rules. By requiring AI models to produce proofs of their reasoning or constraints, humans retain a degree of control – “you cannot trust what you cannot verify”. While this is speculative and involves social as much as technical aspects, the technology could enforce that an AI agent running autonomously still proves it is using an approved model and hasn’t been tampered with. Decentralized AI networks might use on-chain proofs to verify contributions (e.g., a network of nodes collaboratively training a model can prove each update was computed faithfully). Thus zkML could play a role in ensuring AI systems remain accountable to human-defined protocols even in decentralized or uncontrolled environments.
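Several of the patterns above (the DAO trade gate, the oracle feed, the NFT evolution check) share one acceptance rule: the claim must name the approved model commitment, refer to the expected input, and carry a proof that verifies against those public values. The sketch below shows only that gating logic. `commit_to_weights`, `Claim`, `accept_action`, and the `verify_proof` callback are hypothetical names, SHA-256 is used purely because it ships with Python (production zkML systems tend to prefer circuit-friendly hashes), and `placeholder_verifier` is a stand-in, not a proof system.

```python
import hashlib
import struct
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

PublicInputs = Tuple[str, str, float]


def commit_to_weights(weights: Sequence[float]) -> str:
    """Hash commitment to model weights (little-endian float64 encoding)."""
    payload = b"".join(struct.pack("<d", w) for w in weights)
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class Claim:
    model_commitment: str  # which model the prover says it ran
    input_hash: str        # which input the prover says it used
    output: float          # the claimed model output
    proof: bytes           # opaque proof blob from the (off-chain) prover


def accept_action(
    claim: Claim,
    approved_commitment: str,
    latest_input_hash: str,
    verify_proof: Callable[[bytes, PublicInputs], bool],
) -> bool:
    """Accept the claimed output only if every public input matches policy
    and the proof verifies against exactly those public inputs."""
    if claim.model_commitment != approved_commitment:
        return False  # not the authorized model
    if claim.input_hash != latest_input_hash:
        return False  # stale or substituted data
    public_inputs = (claim.model_commitment, claim.input_hash, claim.output)
    return verify_proof(claim.proof, public_inputs)


def placeholder_verifier(proof: bytes, public_inputs: PublicInputs) -> bool:
    """Stand-in for a real SNARK/STARK verifier. Provides NO security."""
    return proof == b"valid"


approved = commit_to_weights([0.25, -1.5, 3.0])
latest = hashlib.sha256(b"market-snapshot-001").hexdigest()

honest = Claim(approved, latest, 1.0, b"valid")
forged = Claim(commit_to_weights([9.9, 9.9, 9.9]), latest, 1.0, b"valid")

print(accept_action(honest, approved, latest, placeholder_verifier))
print(accept_action(forged, approved, latest, placeholder_verifier))
```

The design point is that the commitment and input hash are public inputs to the proof, so a valid proof for a different model or an older dataset is rejected before the verifier is even called.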

In conclusion, zkML and verifiable on-chain AI represent a convergence of advanced cryptography and machine learning that stands to enhance trust, transparency, and privacy in AI applications. By comparing the major approaches – zk-SNARKs, zk-STARKs, and FHE – we see a spectrum of trade-offs between performance and privacy, each suitable for different scenarios. SNARK-based frameworks like Ezkl and innovations like Lagrange’s DeepProve have made it feasible to prove substantial neural network inferences with practical effort, opening the door to real-world deployments of verifiable AI. STARK-based and VM-based approaches promise greater flexibility and post-quantum security, which will become important as the field matures. FHE, while not a solution for verifiability, addresses the complementary need of confidential ML computation, and in combination with ZKPs or in specific private contexts it can empower users to leverage AI without sacrificing data privacy.

The implications for Web3 are significant: we can foresee smart contracts reacting to AI predictions, knowing they are correct; markets for compute where results are trustlessly sold; digital identities (like Worldcoin’s proof-of-personhood via iris AI) protected by zkML to confirm someone is human without leaking their biometric image; and generally a new class of “provable intelligence” that enriches blockchain applications. Many challenges remain – performance for very large models, developer ergonomics, and the need for specialized hardware – but the trajectory is clear. As one report noted, “today’s ZKPs can support small models, but moderate to large models break the paradigm”; however, rapid advances (50×–150× speedups with DeepProve over prior art) are pushing that boundary outward. With ongoing research (e.g., on hardware acceleration and distributed proving), we can expect progressively larger and more complex AI models to become provable. zkML might soon evolve from niche demos to an essential component of trusted AI infrastructure, ensuring that as AI becomes ubiquitous, it does so in a way that is auditable, decentralized, and aligned with user privacy and security.
