Building Indexing Infrastructure for Institutional-Grade dApps: Lessons from the Trenches

As Web3 moves from retail experimentation to institutional adoption, the indexing infrastructure requirements change dramatically. The hobbyist-friendly subgraph that worked for your hackathon project will not survive an institutional audit, and the degraded performance that retail users tolerate will get you fired from an enterprise contract.

I have spent the past year building data infrastructure for a protocol that serves both retail and institutional users. Here is what I have learned about what “institutional-grade” actually means for indexing, and how the current landscape of indexing platforms measures up.

What Institutional Clients Actually Require

When an institution evaluates your dApp’s data infrastructure, they are looking at dimensions that most crypto-native builders do not think about:

1. Service Level Agreements (SLAs)

Institutional clients want contractual guarantees around:

  • Uptime: 99.95% or higher (that is less than 4.4 hours of downtime per year)
  • Query latency: p95 under 100ms, p99 under 500ms
  • Data freshness: Maximum lag of N seconds from on-chain confirmation to queryable data
  • Error rates: Less than 0.1% of queries returning errors
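These targets are just arithmetic over your query logs, and it is worth automating that arithmetic so you know when you are burning through your error budget. A minimal sketch (helper names are my own, not from any SLA tooling):

```typescript
// Convert an uptime SLA percentage into an annual downtime budget (hours).
function downtimeBudgetHours(uptimePct: number): number {
  const hoursPerYear = 365 * 24; // 8760
  return hoursPerYear * (1 - uptimePct / 100);
}

// Compute a latency percentile (e.g. p95, p99) from a sample of query
// timings, using the nearest-rank method on a sorted copy.
function percentile(samplesMs: number[], p: number): number {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

// 99.95% uptime leaves roughly 4.38 hours of downtime per year.
const annualBudget = downtimeBudgetHours(99.95);
```

Feed `percentile` a rolling window of query timings and alert when p95 crosses 100ms or p99 crosses 500ms, well before a client's own monitoring notices.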

Try getting these guarantees from The Graph’s decentralized network. The network is designed for censorship resistance, not SLA compliance. Individual indexers have no contractual relationship with the developers querying their data.

Ormi and Goldsky can offer SLAs because they are centralized services with dedicated infrastructure. This is one of the clearest arguments for centralized indexing in institutional contexts.

2. Data Correctness and Audit Trails

Institutional applications need to prove that the data they serve is correct. This means:

  • Deterministic indexing: The same blockchain state must always produce the same indexed result
  • Audit trails: Every piece of indexed data should be traceable back to its source block and transaction
  • Reconciliation reports: Regular automated checks comparing indexed data against direct chain state
  • Version control: Changes to indexing logic must be tracked, reviewed, and reversible
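A reconciliation job can be quite small: sample rows from the indexed store and compare each against a direct chain read pinned to the same block. This sketch assumes hypothetical shapes (`IndexedRow`, a `chainBalance` fetcher wrapping something like `eth_call` at a block height); substitute your own.

```typescript
// A row as reported by the indexer, pinned to the block it was derived from.
interface IndexedRow {
  account: string;
  balance: bigint;
  block: number;
}

interface Mismatch {
  account: string;
  indexed: bigint;
  onChain: bigint;
  block: number;
}

// Compare indexed balances against direct chain reads at the same block.
// An empty result means the sample reconciled cleanly.
async function reconcile(
  rows: IndexedRow[],
  chainBalance: (account: string, block: number) => Promise<bigint>,
): Promise<Mismatch[]> {
  const mismatches: Mismatch[] = [];
  for (const row of rows) {
    const onChain = await chainBalance(row.account, row.block);
    if (onChain !== row.balance) {
      mismatches.push({
        account: row.account,
        indexed: row.balance,
        onChain,
        block: row.block,
      });
    }
  }
  return mismatches;
}
```

Pinning both sides to the same block number matters: comparing indexed state against the chain head will produce false mismatches during normal indexing lag.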

The Graph’s subgraph model provides some of this through its deterministic execution environment. But the decentralized network adds uncertainty because different indexers might be at different sync states, producing slightly different results for the same query at the same time.

For our institutional deployment, we ended up running our own dedicated Graph node (not using the decentralized network) alongside our Ormi production endpoint. This gives us deterministic indexing with full control over the execution environment, plus an independent source to reconcile against.

3. Disaster Recovery and Business Continuity

What happens when your indexing provider goes down? For retail users, a “please try again later” message is annoying. For an institutional trading desk, it could mean missed trades, compliance violations, or regulatory penalties.

Our disaster recovery strategy involves:

  • Primary: Ormi for sub-30ms production queries
  • Secondary: Self-hosted Graph node with identical subgraphs, synced in real time
  • Tertiary: Direct RPC fallback for critical-path queries (slower but always available)
  • Data store: Goldsky Mirror feeding a PostgreSQL replica that can serve cached data even if all indexers are down
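The routing logic behind those layers is straightforward: try each endpoint in priority order with a per-attempt timeout, and fall through on failure. A sketch of that pattern (layer names and fetchers are placeholders, not real client code):

```typescript
type QueryFn<T> = () => Promise<T>;

// Run a query with a hard deadline; the timer is cleared on settlement so
// a late timeout never fires as an unhandled rejection.
async function withTimeout<T>(fn: QueryFn<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([fn(), timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// Try each layer in priority order (primary -> secondary -> tertiary),
// falling through to the next on error or timeout.
async function queryWithFailover<T>(
  layers: { name: string; timeoutMs: number; query: QueryFn<T> }[],
): Promise<T> {
  const errors: string[] = [];
  for (const layer of layers) {
    try {
      return await withTimeout(layer.query, layer.timeoutMs);
    } catch (err) {
      errors.push(`${layer.name}: ${(err as Error).message}`);
    }
  }
  throw new Error(`all layers failed: ${errors.join("; ")}`);
}
```

Give the fast primary a tight timeout so a degraded provider fails over in tens of milliseconds rather than hanging the request.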

This multi-provider approach is expensive and complex, but it gives us the better-than-99.95% availability that institutional clients demand: no single provider outage can take the read path down.

4. Compliance and Data Governance

Institutions need to answer questions like:

  • Where is the indexed data physically stored? (data residency requirements)
  • Who has access to query logs? (privacy regulations)
  • Can the indexing provider be compelled to stop serving data? (regulatory risk)
  • How is personally identifiable information handled? (GDPR, CCPA)

The Graph’s decentralized network actually scores well on some of these: no single entity controls the data, and query logs are distributed. But it scores poorly on data residency (you cannot guarantee which indexer in which jurisdiction serves your query).

Centralized providers like Ormi can offer data residency guarantees, SOC 2 compliance, and detailed access controls. These matter for institutional adoption.

5. Performance Under Load

Institutional applications often have bursty traffic patterns. A market event can spike query volume 10-100x within seconds. Your indexing infrastructure needs to handle these spikes without degradation.

Our load testing revealed significant differences:

  • Ormi: Handled our 10x spike test (40,000 RPS) with minimal latency increase. Their infrastructure appears to auto-scale.
  • The Graph decentralized: Performance degraded significantly under spike load. Indexer selection became unreliable, and many queries timed out.
  • Self-hosted Graph node: Predictable performance limited by our hardware. We provision for 3x normal load, which means 10x spikes cause degradation.
  • Goldsky Mirror: Since reads are against our own database, spike handling depends on our database infrastructure. PostgreSQL with read replicas handles spikes well.

The Architecture We Settled On

After months of evaluation, here is our production architecture:

On-Chain Events
    |
    +--> Goldsky Mirror --> PostgreSQL (analytics, reporting, cached reads)
    |
    +--> Ormi Subgraphs --> Primary query endpoint (real-time, SLA-backed)
    |
    +--> Self-hosted Graph Node --> Fallback endpoint (disaster recovery)
    |
    +--> Direct RPC --> Emergency fallback (always available, slower)

Each layer serves a different purpose:

  • Goldsky Mirror for continuous data replication and analytics
  • Ormi for production query serving with performance SLAs
  • Self-hosted Graph Node for independent verification and disaster recovery
  • Direct RPC as the last resort that can never go down (as long as the chain is alive)

Cost Breakdown

For transparency, here is what this costs us monthly:

  • Ormi: ~$1,200/month (production query volume)
  • Goldsky Mirror: ~$800/month (continuous streaming)
  • Self-hosted Graph Node: ~$400/month (VPS + storage)
  • Direct RPC: ~$300/month (archive node access through BlockEden)
  • Total: ~$2,700/month

This sounds expensive until you compare it to traditional financial data infrastructure. Bloomberg Terminal subscriptions start at $2,000/month per seat. Institutional-grade market data feeds cost tens of thousands monthly. In context, $2,700 for a multi-layered, resilient blockchain data infrastructure is remarkably affordable.

Lessons for Builders Targeting Institutional Users

  1. Design for failure from day one. Every component will fail. Your architecture must degrade gracefully.
  2. Separate your read path from your write path. Reads should never depend on a single provider.
  3. Invest in reconciliation tooling. Automated data validation between your indexed data and chain state is not optional for institutional use.
  4. Document everything. Institutional clients want architecture diagrams, runbooks, and incident response procedures.
  5. Plan for multi-chain expansion. Today it is Ethereum and two L2s. In six months, your institutional client will want Solana, Cosmos, and whatever chain their newest fund is targeting.

The indexing wars are creating better infrastructure for everyone. But institutional-grade applications need to think beyond “which platform is cheapest” and design for the resilience, compliance, and performance standards that serious capital demands.

Sophia, this is incredibly valuable. As a regulatory consultant who works with institutional crypto clients, I want to emphasize and expand on the compliance dimension you touched on.

The regulatory landscape for blockchain data infrastructure is evolving rapidly, and most builders are not prepared.

Here are the compliance requirements I am seeing institutional clients mandate before they will integrate with any dApp:

Data Residency and Sovereignty

Under GDPR (EU), PIPL (China), and emerging US state privacy laws, institutional clients need to know where their query data is processed and stored. When a European bank queries your indexing infrastructure, the query itself may contain information subject to data protection laws (e.g., wallet addresses linked to their clients).

With The Graph’s decentralized network, you fundamentally cannot guarantee data residency. Your query might be routed to an indexer in any jurisdiction. This is a non-starter for many institutional compliance teams.

Ormi and Goldsky, as centralized providers, can specify data center locations and sign Data Processing Agreements (DPAs). This is a significant advantage for institutional adoption.

SOC 2 and Security Certifications

Institutional clients increasingly require SOC 2 Type II certification from their infrastructure providers. This certification demonstrates that a company has implemented and maintained adequate security controls over time.

The Graph’s decentralized network cannot obtain SOC 2 certification because there is no single entity responsible for security controls. Individual indexers could potentially be certified, but the network as a whole cannot.

I am seeing centralized indexing providers (including Ormi and Goldsky) pursuing SOC 2 certification specifically to serve the institutional market. This is another area where centralization has a clear advantage.

Anti-Money Laundering (AML) Considerations

This is an emerging concern: if your indexing infrastructure is used to track wallet activity for AML compliance purposes, the accuracy and completeness of the indexed data becomes a regulatory requirement, not just a technical preference.

Institutions using blockchain analytics for AML need to demonstrate that their data sources are reliable and auditable. A decentralized network of anonymous indexers does not meet this standard. A centralized provider with contractual obligations, audit trails, and liability does.

My Recommendation for Builders:

If you are building for institutional clients, choose your indexing infrastructure based on compliance requirements first and performance second. The fastest queries in the world do not matter if your client’s compliance team vetoes the integration.

The multi-layered architecture you described is the right approach. Use compliant centralized services for the production path that institutions interact with, and maintain decentralized fallbacks for resilience and censorship resistance.

Sophia, I appreciate the thoroughness, but I want to push back on the cost and complexity angle from a startup perspective.

Your architecture is impressive: Ormi plus Goldsky Mirror plus self-hosted Graph node plus direct RPC. Four layers of redundancy. $2,700/month. Detailed SLAs and disaster recovery procedures.

For an institutional-grade product, this makes total sense. For 95% of startups reading this, it is massive overkill.

Here is my concern: posts like this can create analysis paralysis for early-stage builders. I have seen startup founders spend months evaluating indexing platforms, building complex multi-provider architectures, and optimizing for institutional requirements when they have zero institutional clients and maybe 50 daily active users.

The startup reality check:

  • If you are pre-seed or seed stage, use one indexing provider. Period. Ormi if you need speed, The Graph if you are cost-sensitive, Goldsky if you need streaming.
  • Your disaster recovery plan at this stage is “if the indexer goes down, show a maintenance page and fix it.” That is fine. Nobody is going to sue a startup with 200 users for 4 hours of downtime.
  • Spend your limited engineering hours building features that attract users, not building redundant infrastructure for users you do not have yet.

When to start thinking about institutional-grade architecture:

  1. You have signed (or are negotiating) your first institutional client
  2. Your query volume exceeds what a single provider can handle
  3. A single provider outage has cost you measurable revenue
  4. Your compliance team (or your client’s compliance team) requires multi-provider redundancy

Until you hit these milestones, a single Ormi deployment with basic monitoring is sufficient. You can always add layers of redundancy later. You cannot recover the months of engineering time spent building infrastructure for hypothetical institutional clients.

That said, one thing from your post that EVERY startup should implement from day one:

Abstract your indexing provider behind an interface. Whether you use a simple factory pattern or a more sophisticated adapter layer, make sure your application code does not directly call Ormi’s API or Goldsky’s webhooks. When you are ready to add redundancy or switch providers, the migration is trivial if the provider is abstracted away.
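Concretely, that abstraction can be a one-page interface plus a factory. Everything below is an illustrative stub (the method shapes and adapter internals are invented, not any provider's real SDK):

```typescript
// The only indexing surface application code is allowed to see.
interface TokenTransfer {
  txHash: string;
  from: string;
  to: string;
  amount: bigint;
}

interface IndexerClient {
  getTransfers(address: string, limit: number): Promise<TokenTransfer[]>;
}

// One adapter per provider. The real version would call the provider's
// GraphQL endpoint or SDK; this sketch just marks where that call goes.
class OrmiAdapter implements IndexerClient {
  async getTransfers(_address: string, _limit: number): Promise<TokenTransfer[]> {
    throw new Error("provider call not wired up in this sketch");
  }
}

// Swapping providers becomes a config change, not an application rewrite.
function createIndexer(provider: "ormi" | "goldsky" | "self-hosted"): IndexerClient {
  switch (provider) {
    case "ormi":
      return new OrmiAdapter();
    default:
      throw new Error(`no adapter for ${provider}`);
  }
}
```

The payoff comes later: adding a fallback layer means adding a second adapter behind the same interface, with zero changes to the code that consumes it.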

This is the startup-friendly version of your multi-layer architecture: design for swappability, build for your current scale.

Great discussion. I want to add the wallet infrastructure perspective, because wallet developers have a unique relationship with indexing that is different from typical dApp backends.

Wallets are the most latency-sensitive consumers of indexed data.

When a user opens their wallet, they expect to see their current balances and recent transactions immediately. Not in 500ms. Not in 200ms. Immediately. The benchmark users measure against is their banking app, which typically renders account data in under 100ms.

For a multi-chain wallet, this means we need:

  • Token balance queries across 10+ chains, each returning in under 50ms
  • Transaction history with proper token metadata and human-readable labels
  • NFT ownership data with thumbnails and collection info
  • DeFi position data (staked amounts, LP positions, lending balances)
  • Real-time price data for portfolio valuation

Each of these data types has historically been served by different indexers, APIs, and data sources. The indexing fragmentation across the wallet stack is painful.

How we handle indexing today:

We use a hybrid approach similar to what Sophia described, but optimized for wallet-specific needs:

  1. Ormi subgraphs for token transfer and balance indexing on EVM chains. The sub-30ms latency is critical for the instant-load experience.

  2. Direct RPC calls for real-time balance verification. We never trust indexed data for displaying current balances – we always verify against the chain. The indexed data provides the transaction history context.

  3. Goldsky webhooks for push notifications. When a user receives a transfer, we want to notify them within seconds, not wait for them to refresh their wallet.

  4. Custom indexers for DeFi position tracking. None of the general-purpose indexers handle the complexity of tracking positions across dozens of DeFi protocols well. We maintain protocol-specific indexers for Aave, Uniswap, Compound, and others.
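For the push-notification path (item 3 above), one detail worth building in from the start is deduplication: webhook providers retry deliveries on failure, so the same event can arrive more than once. A minimal receiver sketch (the payload shape is hypothetical, not Goldsky's actual schema; production would persist the seen-set rather than hold it in memory):

```typescript
interface TransferEvent {
  id: string;     // unique event id, e.g. txHash-logIndex
  to: string;     // recipient wallet address
  amount: string; // decimal string to avoid float precision loss
}

// In-memory dedup store; a real deployment would back this with Redis
// or a database so restarts do not re-notify users.
const seen = new Set<string>();

// Returns true if the event was fresh and a notification was sent.
function handleWebhook(
  payload: TransferEvent,
  notify: (e: TransferEvent) => void,
): boolean {
  if (seen.has(payload.id)) return false; // retry of an already-handled delivery
  seen.add(payload.id);
  notify(payload);
  return true;
}
```

Keying on something like transaction hash plus log index makes the dedup robust across provider retries and even across overlapping indexing pipelines.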

The multi-chain challenge is severely underappreciated.

For every new chain we add to our wallet, we need to:

  • Deploy subgraphs or configure indexing pipelines
  • Verify data correctness for that chain’s specific token standards
  • Handle chain-specific quirks (different event signatures, non-standard ERC-20s, etc.)
  • Maintain ongoing operational monitoring

With the L2 proliferation (Base, Blast, Scroll, Linea, zkSync, Mantle, Mode, and more every month), the operational burden of maintaining indexing across all these chains is becoming unsustainable. We desperately need indexing platforms that can add new chains rapidly with minimal configuration.

SubQuery’s multi-chain approach is appealing for this reason, but their EVM query performance does not meet our latency requirements. Ormi is fast but slow to add new chains. There is no perfect solution yet.

Account abstraction adds another layer of complexity.

With ERC-4337 smart contract wallets, the traditional model of indexing EOA (externally owned account) transfers breaks down. Smart wallet transactions involve UserOperation events, Bundler interactions, and Paymaster flows that standard ERC-20 transfer indexing does not capture. None of the major indexing platforms have first-class support for account abstraction event indexing, which is a growing gap as smart wallets gain adoption.

Sophia, your multi-layered architecture is the right direction. I would love to see the indexing platforms invest more in wallet-specific data APIs, because wallets are the primary touchpoint for every user in Web3.

Fascinating thread. I want to introduce a perspective that is not being discussed yet: how zero-knowledge proofs could fundamentally change the indexing infrastructure landscape.

The indexing verification problem.

The core tension in this entire “indexing wars” debate is trust. When you query an indexer – whether it is The Graph’s decentralized network, Ormi, or Goldsky – you are trusting that the returned data accurately reflects the on-chain state. Even The Graph’s decentralized network with its dispute resolution mechanism only provides economic guarantees (slashing dishonest indexers), not cryptographic guarantees.

ZK proofs can solve this.

Imagine an indexer that returns not just the query result, but a zero-knowledge proof that the result was correctly derived from the actual blockchain state. The client can verify this proof in milliseconds without trusting the indexer at all. No staking, no curation, no dispute resolution – just math.

This is not purely theoretical. Several research groups and startups are working on verifiable computation for blockchain indexing:

  1. Axiom is building ZK coprocessors that can prove arbitrary computations against historical Ethereum state. You could use Axiom to prove that your indexer correctly computed the total value locked in a DeFi protocol by verifying the computation against the actual storage slots on-chain.

  2. Herodotus focuses on cross-chain state proofs. They can cryptographically prove that data from one chain accurately reflects the state of another chain, which is directly relevant to cross-chain indexing.

  3. Lagrange is building ZK-powered data infrastructure that aims to provide verifiable indexing at scale.

What ZK-verified indexing would change:

  • Trust model: You no longer need to trust the indexer. The proof guarantees correctness. This eliminates the need for The Graph’s complex staking and dispute resolution mechanism.
  • Decentralization without overhead: A single centralized indexer with ZK proofs provides the same trust guarantees as a decentralized network of indexers, but with the performance of a centralized system. The decentralization can happen at the verification layer instead of the execution layer.
  • Institutional compliance: ZK proofs provide cryptographic audit trails. An institution can prove that the data they used for a trading decision was mathematically guaranteed to be correct. This is a higher bar than any SLA from a centralized provider.

The current limitations:

I should be honest about where we are today. ZK-verified indexing at production scale faces significant challenges:

  • Proving time: Generating ZK proofs for complex indexing computations is still slow. Proving that you correctly processed a million Transfer events could take minutes or hours, not milliseconds.
  • Proof size and verification cost: While verification is fast, the proofs themselves add bandwidth overhead. For high-frequency queries, this overhead matters.
  • Development maturity: The tooling for building ZK-verified data pipelines is still in early stages. It is research-grade, not production-grade.

My prediction for the next 2-3 years:

The indexing landscape will converge on a model where centralized providers handle execution (because they are faster and cheaper) while ZK proofs provide verification (because they are trustless and auditable). This gives you the best of both worlds: Ormi’s sub-30ms queries with The Graph’s trustless guarantees.

This would make the current “decentralized vs. centralized” debate obsolete. You do not need a decentralized network of indexers if you can cryptographically verify that a single indexer is being honest.

Sophia, your multi-layered architecture is the pragmatic solution for today. But in a few years, I believe a single indexer with ZK verification will replace the need for multiple redundant providers. The trust comes from math, not from redundancy.