After running my team’s data pipelines on The Graph, Subsquid, and self-hosted solutions for the past year, I’ve finally gathered enough real-world data to share what actually scales in production.
The Indexing Landscape in 2026
If you’re building anything serious in Web3, you need indexed data. The question is: how do you get it? The three main approaches:
- The Graph (Decentralized, subgraph-based)
- Subsquid/SQD (Flexible squids, near zero-cost data access)
- Self-Hosted (Run your own indexer stack)
Each has dramatically different tradeoffs. Let me break down what we learned.
The Graph: The Pioneer That’s Showing Its Age
The Graph pioneered declarative blockchain indexing and deserves credit for making this space accessible. But here’s the reality in 2026:
What Works
- Mature ecosystem: Huge library of existing subgraphs
- Decentralized: GRT-based network is genuinely decentralized
- GraphQL API: Clean query interface for frontend devs
What Doesn’t
- Performance: Our Uniswap V3 indexing benchmarks showed The Graph running 10-50x slower than Subsquid for historical data
- Rigidity: Subgraphs are compiled to WASM and run in a black box. Making changes often requires complete redeployment
- Hosted Service Death: The Graph's free Hosted Service is fully sunset. You're now on the decentralized network ($$) or migrating elsewhere
```
// The Graph subgraph architecture (simplified)
// Data flows: Archive Node → WASM Black Box → Postgres → GraphQL
//
// You can't easily:
// - Batch process multiple blocks efficiently
// - Filter at the data source level
// - Transform data with custom logic outside WASM constraints
```
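For contrast, the subgraph model is one handler invocation per matching event. Here's a schematic mock of that shape in plain TypeScript — the real thing is AssemblyScript compiled to WASM using `@graphprotocol/graph-ts` and schema-generated entity classes; the event type and store below are made-up stand-ins:

```typescript
// Schematic mock of the subgraph per-event programming model.
// Real subgraphs use AssemblyScript and generated entities; these
// simplified types exist only to show the call-per-event structure.
interface PoolCreatedEvent { pool: string; token0: string; token1: string }

const store = new Map<string, PoolCreatedEvent>()

// The runtime calls this once per matching log - no batching,
// no cross-block view, no access to anything outside the sandbox
function handlePoolCreated(event: PoolCreatedEvent): void {
  store.set(event.pool, event)
}

const sampleEvents: PoolCreatedEvent[] = [
  {pool: '0xpool1', token0: '0xA', token1: '0xB'},
  {pool: '0xpool2', token0: '0xC', token1: '0xD'},
]
for (const e of sampleEvents) handlePoolCreated(e)

console.log(store.size) // 2
```

That event-at-a-time loop is exactly what makes historical backfills slow: you pay the per-invocation overhead for every single log.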
Cost Reality
We spent approximately $2,400/month for moderate DeFi dashboard queries. The GRT token economics add unpredictability—your costs fluctuate with token price and indexer competition.
Subsquid (SQD): The Performance Beast
Subsquid has emerged as the serious alternative, and the performance numbers don’t lie.
Architecture Advantage
```typescript
// Subsquid architecture
// Data flows: Subsquid Network (Archive) → Squid SDK (Node.js) → Your Database → Your API
//
// Key differences:
// 1. Squids are regular Node.js apps - full flexibility
// 2. Batch processing: up to 150k blocks/second
// 3. Filter at source: only fetch data you need
// 4. Custom transforms: any TypeScript logic you want

import {EvmBatchProcessor} from '@subsquid/evm-processor'
import {events} from './abi/factory' // generated with squid-evm-typegen

const UNISWAP_V3_FACTORY = '0x1F98431c8aD98523631AE4a59f267346ea31F984'.toLowerCase()

const processor = new EvmBatchProcessor()
  .setGateway('https://v2.archive.sqd.io/ethereum-mainnet')
  .setRpcEndpoint(process.env.RPC_URL)
  .addLog({
    address: [UNISWAP_V3_FACTORY],
    topic0: [events.PoolCreated.topic],
  })
  .setFields({
    log: {
      transactionHash: true,
    },
  })
```
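The batch model is a big part of that 150k blocks/second figure: the SDK pulls whole block ranges per request instead of firing a callback per event. A toy illustration of the windowing idea (hypothetical helper, not part of the SDK):

```typescript
// Hypothetical illustration of batch windowing - NOT SDK code.
// Splits a block range into fixed-size batches, the way a batch
// processor fetches data in bulk rather than block-by-block.
function batchRanges(from: number, to: number, size: number): {from: number; to: number}[] {
  const out: {from: number; to: number}[] = []
  for (let start = from; start <= to; start += size) {
    out.push({from: start, to: Math.min(start + size - 1, to)})
  }
  return out
}

// 1M blocks in 100k-block batches -> 10 bulk fetches
// instead of a million per-event handler invocations
const ranges = batchRanges(0, 999_999, 100_000)
console.log(ranges.length) // 10
console.log(ranges[9]) // { from: 900000, to: 999999 }
```

In a real squid, your handler receives each batch via `processor.run(...)` and can aggregate across all blocks in it before touching the database.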
Our Benchmarks
| Task | The Graph | Subsquid | Self-Hosted |
|---|---|---|---|
| Uniswap V3 full index | 12+ hours | 45 minutes | 3-4 hours |
| Daily incremental sync | 2-3 min lag | <30 sec lag | ~1 min lag |
| Complex aggregations | Timeout issues | Handled | Depends on setup |
| Monthly cost (moderate load) | $2,400 | $400-800 | $1,200+ (infra) |
What I Love
- Speed and cost: Batch processing makes historical backfills fast, and archive data access is near zero-cost
- Flexibility: It’s just Node.js. Debug, log, transform however you want
- Multi-chain: Same squid can index 100+ networks
- Subsquid Cloud: Deploy with zero-downtime migrations
What Concerns Me
- Data Freshness: There's a delay between on-chain events and indexed data, because the centralized archive layer has to ingest and pre-index new blocks before your squid can process them. For real-time apps, this matters
- Learning Curve: More flexible = more decisions to make
- Younger Ecosystem: Fewer ready-made templates than The Graph
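One simple way to measure the freshness lag reported in the benchmarks above is to compare the chain head's timestamp with the newest block your indexer has committed. A sketch (simplified block shape; the numbers are examples):

```typescript
// Hypothetical lag probe - block shape simplified for illustration.
interface BlockInfo { number: number; timestamp: number } // timestamp in unix seconds

// How far behind the chain head is the indexed state, in seconds
function indexingLagSeconds(head: BlockInfo, lastIndexed: BlockInfo): number {
  return Math.max(0, head.timestamp - lastIndexed.timestamp)
}

// e.g. head block minted at t=1_700_000_036, last indexed block at t=1_700_000_012
const lag = indexingLagSeconds(
  {number: 19_000_003, timestamp: 1_700_000_036},
  {number: 19_000_001, timestamp: 1_700_000_012},
)
console.log(lag) // 24
```

Polling this from a cron job and graphing it is enough to catch freshness regressions before users do.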
Self-Hosted: The Control Freak’s Choice
For the brave (or budget-constrained), self-hosted indexing is always an option.
When It Makes Sense
- Regulatory requirements: Some enterprises can’t send data to third parties
- Custom data needs: Highly specialized indexing logic
- Cost optimization at scale: If you’re indexing 10+ chains heavily
Our Stack
```yaml
# Self-hosted indexing infrastructure (docker-compose.yml)
services:
  # Archive node (Erigon for Ethereum)
  archive-node:
    image: thorax/erigon:latest
    volumes:
      - ./data/erigon:/data
    # Needs 2TB+ SSD, serious hardware

  # Custom indexer
  indexer:
    build: ./indexer
    depends_on:
      - archive-node
      - postgres
    environment:
      - RPC_URL=http://archive-node:8545
      - DATABASE_URL=postgresql://postgres:postgres@postgres:5432/indexer

  # TimescaleDB for time-series queries
  postgres:
    image: timescale/timescaledb:latest
    volumes:
      - ./data/postgres:/var/lib/postgresql/data
```
Real Costs
| Component | Monthly Cost |
|---|---|
| Archive Node (Ethereum) | $300-500 (dedicated server) |
| Database (managed) | $200-400 |
| Compute (indexer) | $100-200 |
| DevOps Time | Priceless |
| Total | $600-1,100 + time |
The Hidden Costs
What nobody tells you about self-hosting:
- Chain upgrades break things: Pectra just shipped. Your indexer probably needs updates
- Data consistency: One missed block = corrupted state. Rebuilding is painful
- On-call duties: Archive node goes down at 3 AM? That’s your problem
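The missed-block failure mode is at least cheap to detect: before persisting a block, check that its number and `parentHash` chain onto the last block you stored. A sketch with simplified types and made-up hashes:

```typescript
// Continuity guard sketch - simplified block shape, fabricated hashes.
interface Block { number: number; hash: string; parentHash: string }

// Throws if `next` doesn't chain directly onto `prev` (gap or reorg).
// In production you'd rewind to the last known-good block and
// re-index from there rather than continue with corrupted state.
function checkContinuity(prev: Block, next: Block): void {
  if (next.number !== prev.number + 1 || next.parentHash !== prev.hash) {
    throw new Error(`gap or reorg detected at block ${next.number}`)
  }
}

const a: Block = {number: 100, hash: '0xaaa', parentHash: '0x999'}
const b: Block = {number: 101, hash: '0xbbb', parentHash: '0xaaa'}
checkContinuity(a, b) // ok: chains cleanly

const gap: Block = {number: 103, hash: '0xccc', parentHash: '0xbbb'}
// checkContinuity(b, gap) would throw: block 102 was never indexed
```

The same check catches reorgs, since a replaced block changes the `parentHash` your stored tip no longer matches.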
My Recommendation Matrix
| Scenario | Recommendation |
|---|---|
| Startup, need to ship fast | Subsquid Cloud |
| Decentralization matters deeply | The Graph Network |
| Enterprise with compliance needs | Self-hosted + Subsquid SDK |
| Real-time trading/MEV | Self-hosted with custom stack |
| Multi-chain aggregator | Subsquid (built for this) |
| Querying existing protocol data | The Graph (if subgraph exists) |
The Consolidation Trend
Something worth watching: the RPC provider market is consolidating, and indexing is following. Smaller providers are disappearing. If your favorite indexing solution is a small startup, have a backup plan.
The winners seem to be:
- Subsquid: Technical performance, dev experience
- The Graph: Decentralization narrative, ecosystem effects
- Space and Time: Enterprise play, SQL compatibility
Questions for the Community
- What’s your latency requirement? If you need sub-second data freshness, most indexers won’t cut it
- How much customization do you need? Subgraph constraints vs squid flexibility
- What’s your team’s skill set? GraphQL-first vs TypeScript-first
I’m genuinely curious what others are using. Are you rolling your own, paying for hosted solutions, or doing something hybrid?
Data collected from production workloads indexing Ethereum, Arbitrum, and Base. Your mileage may vary.